Files
ten31-transcripts/Ten31Transcripts/Adapters/MeetAdapter.swift
T
Grant Gilliam 39beccf7f4 Fix Meet visual: reject solid avatar tiles + screen-share OCR
Root cause of the "4 people → 2 speakers" Meet call: the colored-border detector
read solid camera-off avatar tiles (orange "J", magenta "G") as active speakers
for the ENTIRE call. Those whole-call phantom spans dominated backend name
attribution, collapsing every remote voice onto one name — and the giant filled
bbox also swallowed screen-share text (WERUNBTC.COM ×49) as a speaker.

Validated against 9 real fixtures (harness over the real MeetAdapter):

Detection:
- FrameSampler.thinColoredPoints: coloured counterpart of thinWhitePoints — keeps
  thin border/ring/pill edges, drops solid colour fills.
- GridCallAnalyzer.isHollow: reject a highlight component whose interior is filled
  (a solid tile) vs a hollow ring (a real border). Config.maxInteriorFill (0.2 default).
- MeetAdapter: detect thin BLUE edges only (hue 180–240°, measured from the
  fixtures), maxInteriorFill 0.3 (real Meet rings ≈0.2–0.3, solid tiles ≈0.36).
- Result on fixtures: John Arnold/Grant Gilliam (solid tiles) now NEVER detected;
  Matt Odell/Mark detected when their blue cue is present. Sparse but never wrong —
  correct for a naming hint over audio diarization.

OCR name hygiene:
- isLikelyName rejects domain-like screen-share text ("WERUNBTC.COM", OCR'd ".GOM").
- cleaned() strips trailing punctuation ("Mark." → "Mark").
- TimelineBuilder.canonicalizeByFrequency folds rare OCR misspellings into a
  dominant near-twin name ("Matt Odel"/"MattOdell" → "Matt Odell", "Mare" → "Mark").

Tests: hollow-ring, extended OCR filter, fuzzy-merge. 65 pass.
2026-06-08 16:18:52 -05:00

59 lines
2.8 KiB
Swift
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
import Foundation
import CoreVideo
/// Google Meet adapter (browser tab capture is at the browser-window level).
///
/// Meet's active-speaker cue is a **coloured (Google-blue) ring/glow** around the
/// speaking participant's tile, plus animated audio bars in the tile's mic chip.
/// The participant **name sits in the tile's bottom-LEFT corner**, so the tile is
/// estimated extending up and to the right of the name.
///
/// Detection *logic* is validated on synthetic frames; the geometry constants are a
/// first pass and will be calibrated against real Meet screenshots. Meet runs in a
/// browser, so there's no Accessibility name source we rely on OCR only.
struct MeetAdapter: AppAdapter {
// Browsers that can host a Meet tab. The window, not the app, is what we capture;
// CallDetector decides a browser window is a Meet call by its title.
static let bundleIDs = [
"com.google.Chrome", "org.mozilla.firefox", "com.apple.Safari",
"company.thebrowser.Browser", "com.brave.Browser", "com.microsoft.edgemac",
"com.google.Chrome.canary", "org.chromium.Chromium",
]
let adapterVersion = "meet-0.1.0"
let preferredFPS = 3
private let analyzer: GridCallAnalyzer
init() {
var config = GridCallAnalyzer.Config()
config.nameAnchor = .bottomLeft
config.detectColoredBorder = true // Google-blue speaking ring/glow
config.detectWhiteBorder = false
// The bright ring (#1a73e8) is ~0.89 sat but the lighter glow (#8ab4f8) is
// ~0.44, below the 0.5 default lower the threshold so the glow registers.
config.colorSaturation = 0.35
// Meet's active cue is a thin BLUE (210°) ring + audio pill. Detect thin blue
// EDGES only, gated to blue: this rejects solid camera-off avatar tiles (orange
// 30°, magenta 340°), which otherwise read as "speaking" for the whole call
// and collapse every remote voice onto one name. Validated on real fixtures.
config.coloredBorderThinOnly = true
config.colorHueRange = 180...240
// Meet's blue border is faint; real rings measure 0.200.30 interior fill while
// solid tiles measure 0.36, so allow a higher fill here than the 0.2 default to
// recover real borders without readmitting the solid-tile false positives.
config.maxInteriorFill = 0.3
config.tileExpandX = 3.0
config.tileExpandY = 5.0
self.analyzer = GridCallAnalyzer(config: config)
}
func analyze(frame: CVPixelBuffer, at t: TimeInterval) -> [SpeakerObservation] {
analyzer.analyze(pixelBuffer: frame, at: t)
}
// Exposed for fixture/synthetic tests.
func analyze(cgImage: CGImage, at t: TimeInterval) -> [SpeakerObservation] {
analyzer.analyze(cgImage: cgImage, at: t)
}
}