Reconciliation (the marry-the-signals layer): after transcription, before the recap,
SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar
(cosine >= 0.82) — fixes a person split across chunks (the real 1-on-1 failure: one
remote came back as 'MH' + 'Unknown_0'); and (2) NAMES remaining non-self clusters
from transcript CONTENT via the gateway LLM (people addressed by name / self-intros),
conservative + confidence-gated, keeping the placeholder when unrevealed. The
mic-channel self is protected and never reassigned. Voice does the segmentation; the
fingerprint-merge fixes splits; the LLM adds the content signal visual/voiceprint lack.
- SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites
speakers.json before recap. SessionController.finishBackend shares one model lookup
for reconcile + recap. Gated by settings.reconcileSpeakers (default on).
- Open saved session: menu 'Open saved session…' → folder picker. Edits it if already
transcribed, else reconstructs inputs from disk (visual_timeline vision segs +
channel self-spans) and runs transcribe → reconcile → recap, then opens the editor.
Lets you evaluate/correct ANY past call, not just the in-memory last one.
Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread
initials, lowercase/center names) — so reconciliation + the editor (which teaches
voiceprints on confirm) carry it; the editor remains the human arbiter. 59/59 XCTest.
Takeaways categories are no longer hardcoded — they're editable templates. A
template = the always-on TLDR + an ordered list of sections, each with a title, a
type (attributed items / bulleted list / paragraph), and an instruction (the prompt
text for that category). The analyzer assembles the LLM prompt FROM the template
and parses generically, so adding/removing/renaming a category needs zero code and
the output always renders.
- RecapTemplate / TemplateSection / SectionKind + TopicGranularity; built-in
defaults (Internal Meeting, 1:1, Company/Sales Call), all editable.
- Generic extras: RecapExtras{tldr, primarySpeakers, sections:[RenderedSection]} +
RecapItem{text,who,when,note} replaces the fixed MeetingExtras. Analyzer builds
per-section sec_N fields + parses by kind; renderer + remap are generic.
- Topic granularity (coarse/auto/fine) answers 'should chunking be configurable' —
it scales the target topic count; raw window sizes stay as tuned defaults.
- AppSettings persists templates + defaultTemplateId (seeded once). Settings gets a
default-template picker + 'Manage…' → TemplatesView (CRUD, edit sections/
instructions, set default, **Preview prompt** for full transparency).
- Recap editor gains a template picker; Regenerate uses the chosen template. Auto
recap uses the default template.
54/54 XCTest (template prompt build, generic parse/remap/render updated).
New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging
recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible
/v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start
from already-named speakers (label-merge), so recap-relay's speaker clustering +
name-inference are skipped entirely.
- GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via
/api/endpoints, TLS-skip reuse, 503 retry, sequential.
- RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript →
time-windowed analyze (single window for short calls, 18min/2min overlap for long)
→ stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/
open_questions/key_quotes). Defensive JSON parsing of LLM output.
- RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html
(topic sections w/ collapsible transcripts, extras panels, speaker color chips,
full timestamped speaker-attributed transcript, print styles).
- SessionController.buildRecap: best-effort after speakers.json (gated by
settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if
the gateway has no LLM. Settings toggle added.
Validated END-TO-END on the real Meet session against the live gateway: dual-channel
transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin'
correctly attributed to the remote speaker. 46/46 XCTest (10 new).