The status line only rendered inside the last-in-memory-session block, so 'Open
saved session' processed invisibly — looked like nothing happened. Now: the
transcript status (with a spinner) is always shown, the processing(0,0) reconcile
phase reads 'Working… (this can take a few minutes)', and invalid picks surface an
alert (not a recorded session / already processing / unreadable transcript) instead
of doing nothing.
Reconciliation (the marry-the-signals layer): after transcription, before the recap,
SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar
(cosine >= 0.82) — fixes a person split across chunks (the real 1-on-1 failure: one
remote came back as 'MH' + 'Unknown_0'); and (2) NAMES remaining non-self clusters
from transcript CONTENT via the gateway LLM (people addressed by name / self-intros),
conservative + confidence-gated, keeping the placeholder when unrevealed. The
mic-channel self is protected and never reassigned. Voice does the segmentation; the
fingerprint-merge fixes splits; the LLM adds the content signal visual/voiceprint lack.
- SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites
speakers.json before recap. SessionController.finishBackend shares one model lookup
for reconcile + recap. Gated by settings.reconcileSpeakers (default on).
- Open saved session: menu 'Open saved session…' → folder picker. Edits it if already
transcribed, else reconstructs inputs from disk (visual_timeline vision segs +
channel self-spans) and runs transcribe → reconcile → recap, then opens the editor.
Lets you evaluate/correct ANY past call, not just the in-memory last one.
Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread
initials, lowercase/center names) — so reconciliation + the editor (which teaches
voiceprints on confirm) carry it; the editor remains the human arbiter. 59/59 XCTest.
Native editor to fix speaker-ID errors after transcription (modeled on recap-relay's
correction UX): rename a speaker in the legend, merge two speakers, or reassign an
individual transcript line. Saving rewrites speakers.json, re-renders transcript.md +
recap.html, and updates the voiceprint memory — so a correction compounds: naming an
"Unknown" speaker teaches that voice for future calls.
- SpeakerEditing (pure, tested): replaceSpeaker (rename = merge-onto-existing),
reassign, netNameMap (compose ops), and remap (apply a name map to a recap's
structured fields + whole-word free text, so summaries/extras update without re-LLM).
- RecapEditModel (@MainActor): loads speakers.json (+ optional recap.json +
cluster_fingerprints.json); on save writes the resolved speakers.json, re-renders,
and reconciles voiceprints — merge keeps the survivor's print; rename/name-an-Unknown
enrolls the cluster's fingerprint under the new name.
- TranscriptEditorView (SwiftUI) + EditorWindow (AppKit window for the LSUIElement app);
menu gains "Edit speakers".
- Pipeline now persists cluster_fingerprints.json (every cluster incl. Unknown) and
recap.json (RecapFile) so the editor can learn voices + re-render offline.
- RecapModels made Codable; TranscriptAssembler exposes allFingerprints;
VoiceprintStore gains enroll() + merge().
52/52 XCTest (6 new, incl. a full rename→artifacts→voiceprint round-trip on disk).
New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging
recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible
/v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start
from already-named speakers (label-merge), so recap-relay's speaker clustering +
name-inference are skipped entirely.
- GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via
/api/endpoints, TLS-skip reuse, 503 retry, sequential.
- RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript →
time-windowed analyze (single window for short calls, 18min/2min overlap for long)
→ stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/
open_questions/key_quotes). Defensive JSON parsing of LLM output.
- RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html
(topic sections w/ collapsible transcripts, extras panels, speaker color chips,
full timestamped speaker-attributed transcript, print styles).
- SessionController.buildRecap: best-effort after speakers.json (gated by
settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if
the gateway has no LLM. Settings toggle added.
Validated END-TO-END on the real Meet session against the live gateway: dual-channel
transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin'
correctly attributed to the remote speaker. 46/46 XCTest (10 new).
Visual capture falls back to audio-only silently, so the user couldn't tell if
it attached on a real call. SessionInfo now carries visualSegmentCount (nil =
audio-only; a count = visual ran, with that many vision-detected speaker
segments), shown in the menu as '… · N visual segments' or '… · audio-only'.
Makes the pending live-call validation unambiguous.
AudioRecorder captures system audio (ScreenCaptureKit) + mic (AVAudioEngine) on a
single serial ioQueue, one shared monotonic t0, time-driven writers (pad gaps /
trim overlaps) so tracks stay aligned, and an energy mic-VAD for 'self' spans.
AudioMixer sums the aligned tracks into mixed_mono_16k.wav. SessionController
drives a serialized start/stop state machine, writes the session folder +
self_vad.json, exposes live level meters, and finalizes on quit.
Hardening from review: ioQueue single-domain (no races), stop() never hangs
(mic-first teardown + bounded stopCapture), layout-agnostic mic deep-copy,
discard-only video output to keep SCStream alive, VAD lockstep on committed
frames, stable signing team in project.yml, single-instance enforcement.