Investigating Grant's real 38-min group call: 'Marty' was a GARBAGE cluster (192
segs, 0.37s mean duration, 186 of them ≤2 words, 125 single words flanked by the same
other speaker). These are diarization micro-fragments split off mid-sentence, which the
LLM then named 'Marty'. 'Message' and 'HI' were the same kind of cluster.
- SpeakerReconciler.smoothFragments: dissolve non-self clusters whose MEDIAN segment
duration is ≤ 1s (and that have ≥3 segs), reassigning each fragment to the
temporally-nearest real speaker. (Median, not max, so one stray long segment can't
rescue a fragment cluster; using max was the bug in the first cut.) On the real call:
7 speakers (3 junk) → 4 real (Marty/Message/HI absorbed into Grant/Jonathan/Me/MH).
Runs before LLM naming.
- LLM naming guardrails: forbid assigning the self name or ANY already-taken name to
another voice (fixes 'Grant', the user's own name, being pinned on a remote speaker).
The prompt now demands self-intro or direct-address evidence (mention ≠ presence),
'precision over coverage', and one name per speaker.
- Open saved session now offers Open Editor vs Re-process, so newer logic can be
applied to past calls (+ always-visible progress from the prior fix).
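
The smoothFragments rule above can be sketched roughly as follows. This is a minimal
illustration, not the app's code: the Segment type, the parameter names, and the
"gap between intervals" distance are all assumptions made for the example. Only the
median ≤ 1s / ≥3 segs / nearest-real-speaker rule comes from the description above.

```swift
import Foundation

// Hypothetical segment model; the app's real type is not shown in this message.
struct Segment {
    var speaker: String
    var start: Double   // seconds
    var end: Double     // seconds
    var duration: Double { end - start }
}

// Median of a non-empty array.
func median(_ values: [Double]) -> Double {
    let s = values.sorted()
    let mid = s.count / 2
    return s.count % 2 == 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid]
}

// Dissolve non-self clusters whose median duration is <= maxMedian (and that have
// at least minSegments segments), reassigning each fragment to the speaker of the
// temporally nearest remaining segment.
func smoothFragments(_ segments: [Segment], selfName: String,
                     maxMedian: Double = 1.0, minSegments: Int = 3) -> [Segment] {
    let bySpeaker = Dictionary(grouping: segments, by: { $0.speaker })
    var junk = Set<String>()
    for (name, segs) in bySpeaker where name != selfName {
        if segs.count >= minSegments && median(segs.map { $0.duration }) <= maxMedian {
            junk.insert(name)
        }
    }
    let real = segments.filter { !junk.contains($0.speaker) }
    // Nothing to dissolve, or nobody left to absorb the fragments: leave input as-is.
    guard !junk.isEmpty, !real.isEmpty else { return segments }

    return segments.map { seg in
        guard junk.contains(seg.speaker) else { return seg }
        // Time gap between two intervals (0 when they touch or overlap).
        func gap(_ other: Segment) -> Double {
            max(0, max(other.start - seg.end, seg.start - other.end))
        }
        var nearest = real[0]
        for candidate in real.dropFirst() where gap(candidate) < gap(nearest) {
            nearest = candidate
        }
        var fixed = seg
        fixed.speaker = nearest.speaker
        return fixed
    }
}
```

Because the test is on the median, a cluster with durations like 0.2s, 0.3s, 0.4s and
one stray 4s segment still has a median of 0.35s and is dissolved, whereas a max-based
test would have kept it.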
NOTE: the self-name guardrail only works if the app KNOWS the user's name. selfName is
still 'Me', so set it in Settings (e.g. 'Grant') so the LLM can't reuse it. 62/62 XCTest.
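
As an illustration of the naming guardrails, here is a hedged sketch of a filter
applied to LLM name proposals. The function name, the proposal tuple shape, and the
case-insensitive comparison are assumptions for the example; the message above only
states the rules (no self name, no already-taken name, one name per speaker).

```swift
import Foundation

// Hypothetical post-filter for LLM name proposals. Returns cluster id -> accepted name.
// Proposals are processed in order, so earlier proposals win on conflicts.
func applyNamingGuardrails(proposals: [(cluster: String, name: String)],
                           selfName: String,
                           takenNames: Set<String>) -> [String: String] {
    // Assumption: names are compared case-insensitively.
    var used = Set(takenNames.map { $0.lowercased() })
    used.insert(selfName.lowercased())

    var accepted: [String: String] = [:]
    for p in proposals {
        let key = p.name.trimmingCharacters(in: .whitespaces).lowercased()
        // Reject empty names, the self name, already-taken names, and a second
        // name for a cluster that already has one.
        guard !key.isEmpty, !used.contains(key), accepted[p.cluster] == nil else {
            continue
        }
        accepted[p.cluster] = p.name
        used.insert(key)
    }
    return accepted
}
```

With selfName set to 'Grant', a proposal naming a remote cluster 'Grant' is dropped.
With selfName left at 'Me', the same proposal passes this filter, which is why the
setting matters.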
Reconciliation (the marry-the-signals layer): after transcription and before the recap,
SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar
(cosine ≥ 0.82), which fixes a person split across chunks (the real 1-on-1 failure: one
remote came back as 'MH' + 'Unknown_0'); and (2) NAMES the remaining non-self clusters
from transcript CONTENT via the gateway LLM (people addressed by name, self-intros).
Naming is conservative and confidence-gated, and keeps the placeholder when no name is
revealed. The mic-channel self is protected and never reassigned. Voice does the
segmentation, the fingerprint merge fixes splits, and the LLM adds the content signal
that the visual and voiceprint signals lack.
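
The merge step (1) can be sketched as below. This is only an illustration under
assumptions: voiceprints are taken to be plain [Double] embeddings keyed by cluster
name, the function names are made up, and the greedy "first match in sorted order"
grouping is a simplification that may differ from what SpeakerReconciler does. The
0.82 threshold and the self exclusion come from the description above.

```swift
import Foundation

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "voiceprints must have the same dimension")
    var dot = 0.0, na = 0.0, nb = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    guard na > 0, nb > 0 else { return 0 }
    return dot / (na.squareRoot() * nb.squareRoot())
}

// Returns cluster name -> canonical cluster name for every non-self cluster.
// Clusters are visited in sorted order; each one joins the first earlier canonical
// cluster whose voiceprint is similar enough, otherwise it becomes canonical itself.
func mergePlan(voiceprints: [String: [Double]], selfName: String,
               threshold: Double = 0.82) -> [String: String] {
    let names = voiceprints.keys.filter { $0 != selfName }.sorted()
    var canonical: [String] = []
    var mapping: [String: String] = [:]
    for name in names {
        guard let print = voiceprints[name] else { continue }
        let target = canonical.first { other in
            guard let otherPrint = voiceprints[other] else { return false }
            return cosine(otherPrint, print) >= threshold
        }
        if let target = target {
            mapping[name] = target
        } else {
            canonical.append(name)
            mapping[name] = name
        }
    }
    return mapping
}
```

The self cluster never appears in the mapping, so it can neither absorb nor be absorbed
by a remote voice, matching the "protected and never reassigned" rule.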
- SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites
speakers.json before recap. SessionController.finishBackend shares one model lookup
for reconcile + recap. Gated by settings.reconcileSpeakers (default on).
- Open saved session: menu 'Open saved session…' → folder picker. Edits it if already
transcribed, else reconstructs inputs from disk (visual_timeline vision segs +
channel self-spans) and runs transcribe → reconcile → recap, then opens the editor.
Lets you evaluate/correct ANY past call, not just the in-memory last one.
Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread
initials, lowercase/center names), so reconciliation and the editor (which teaches
voiceprints on confirm) carry speaker naming there; the editor remains the human
arbiter. 59/59 XCTest.