Files
ten31-transcripts/Ten31TranscriptsTests
Grant Gilliam 7f16b29f56 Filter OCR to participant-name labels (kill visual-timeline noise)
Real Meet capture revealed the visual pipeline was treating ALL on-screen text as
participant names: meeting URL, clock, 'Add others' button, lobby 'Your meeting's
ready' dialog, 'Joined as …@gmail.com', etc. 46 of 52 'visual segments' in a real
session were phantom speakers. (The backend was unaffected — it diarizes from audio
and ignores names that match no voice cluster — but the visual_timeline.json and the
segment count were junk.)

GridCallAnalyzer.isLikelyName now gates OCR strings to things shaped like a name:
2–30 chars, 1–3 Title-Cased alphabetic words, no digits/URL/email/glyph punctuation.
Errs toward dropping (a missed name just loses a hint; audio diarization still runs).
Unit-tested against the EXACT 19 OCR strings from the real session: keeps the 5
real names, drops all 14 chrome strings. 28/28 XCTest.
2026-06-06 12:01:57 -05:00
..