Chunk size was hardcoded at 2.5-min bodies. Add a Settings control:
Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer
simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large
calls, at some cost to speed and cross-chunk voice matching.
- ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4
participants were detected, else 150s; overlap + single-chunk threshold scale
with the body length.
- AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation.
- TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it.
- SessionController resolves the body from the setting + the session's detected
participant count (visual_timeline participants) for both send + re-process.
- Participant roster now counts EVERY tile OCR'd, not just who spoke
(TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto
call-size signal is meaningful even though speaking-detection is sparse.
Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.
Settings was a NavigationLink pushed inside the 320pt menu-bar popover, so the
grouped form was cramped and most controls sat below a non-obvious scroll (and
showed a confusing "< Settings" back arrow). Add SettingsWindow (same standalone
NSWindow pattern as the Editor/Templates windows) and open it from the menu-bar
"Settings…" button. Drop the now-unused NavigationStack and the 320pt cap so the
form uses real window width with normal macOS spacing; window is resizable.
The Settings "Adapters" toggles wrote adapterEnabled but nothing in the capture
path ever read it, so flipping one off did nothing — and the caption still said
"Inert in Phase 0". The adapters (Zoom/Teams/Signal/Meet) are all live now.
SessionController.startVisual now skips visual capture when the detected app's
adapter is toggled off (records audio-only; transcription still runs). Update the
section caption to describe the real behavior.
The name field was the first row of the third "Transcription" section, below
the fold — users couldn't find where to set their name (it's the setting that
labels the mic channel and reserves the name so the LLM never assigns it to
another speaker). Move it into a dedicated "Your name" section at the top of
Settings, and show an orange nudge while it's still the placeholder "Me"/empty.
Reconciliation (the marry-the-signals layer): after transcription, before the recap,
SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar
(cosine >= 0.82) — fixes a person split across chunks (the real 1-on-1 failure: one
remote came back as 'MH' + 'Unknown_0'); and (2) NAMES remaining non-self clusters
from transcript CONTENT via the gateway LLM (people addressed by name / self-intros),
conservative + confidence-gated, keeping the placeholder when unrevealed. The
mic-channel self is protected and never reassigned. Voice does the segmentation; the
fingerprint-merge fixes splits; the LLM adds the content signal visual/voiceprint lack.
- SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites
speakers.json before recap. SessionController.finishBackend shares one model lookup
for reconcile + recap. Gated by settings.reconcileSpeakers (default on).
- Open saved session: menu 'Open saved session…' → folder picker. Edits it if already
transcribed, else reconstructs inputs from disk (visual_timeline vision segs +
channel self-spans) and runs transcribe → reconcile → recap, then opens the editor.
Lets you evaluate/correct ANY past call, not just the in-memory last one.
Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread
initials, lowercase/center names) — so reconciliation + the editor (which teaches
voiceprints on confirm) carry it; the editor remains the human arbiter. 59/59 XCTest.
Takeaways categories are no longer hardcoded — they're editable templates. A
template = the always-on TLDR + an ordered list of sections, each with a title, a
type (attributed items / bulleted list / paragraph), and an instruction (the prompt
text for that category). The analyzer assembles the LLM prompt FROM the template
and parses generically, so adding/removing/renaming a category needs zero code and
the output always renders.
- RecapTemplate / TemplateSection / SectionKind + TopicGranularity; built-in
defaults (Internal Meeting, 1:1, Company/Sales Call), all editable.
- Generic extras: RecapExtras{tldr, primarySpeakers, sections:[RenderedSection]} +
RecapItem{text,who,when,note} replaces the fixed MeetingExtras. Analyzer builds
per-section sec_N fields + parses by kind; renderer + remap are generic.
- Topic granularity (coarse/auto/fine) answers 'should chunking be configurable' —
it scales the target topic count; raw window sizes stay as tuned defaults.
- AppSettings persists templates + defaultTemplateId (seeded once). Settings gets a
default-template picker + 'Manage…' → TemplatesView (CRUD, edit sections/
instructions, set default, **Preview prompt** for full transparency).
- Recap editor gains a template picker; Regenerate uses the chosen template. Auto
recap uses the default template.
54/54 XCTest (template prompt build, generic parse/remap/render updated).
New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging
recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible
/v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start
from already-named speakers (label-merge), so recap-relay's speaker clustering +
name-inference are skipped entirely.
- GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via
/api/endpoints, TLS-skip reuse, 503 retry, sequential.
- RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript →
time-windowed analyze (single window for short calls, 18min/2min overlap for long)
→ stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/
open_questions/key_quotes). Defensive JSON parsing of LLM output.
- RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html
(topic sections w/ collapsible transcripts, extras panels, speaker color chips,
full timestamped speaker-attributed transcript, print styles).
- SessionController.buildRecap: best-effort after speakers.json (gated by
settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if
the gateway has no LLM. Settings toggle added.
Validated END-TO-END on the real Meet session against the live gateway: dual-channel
transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin'
correctly attributed to the remote speaker. 46/46 XCTest (10 new).