ten31-transcripts

Author	SHA1	Message	Date
Grant Gilliam	a5c227ef1c	Prompt for a meeting name on stop; rename the session folder When a recording finishes, ask for a meeting name and rename the session folder from the auto stamp `<yyyy-MM-dd'T'HH-mm-ss>_<app>` to the readable `<date>_<name>_<app>` (dropping HH-MM-SS), so sessions/ is easy to scan. Skipping or leaving it blank keeps the timestamped name. The rename runs after the recorder and visual capture finish (files closed) and before finish() captures the folder for backend processing, so the renamed folder is what flows downstream; finish() re-derives the track URLs from the possibly-moved folder. The quit path never prompts, and a quit with the prompt open ends its modal so termination isn't blocked. Naming/parsing logic lives in a pure, unit-tested SessionNaming; recapTitle moves there and now understands both folder forms.	2026-06-17 21:51:05 -05:00
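A minimal Swift sketch of the naming rules this commit describes. It assumes folder names are plain strings and that the app name never contains an underscore. `SessionNamingSketch`, its methods, the path-separator sanitizing, and the `recapTitle` format are all illustrative assumptions, not the project's actual `SessionNaming` API.

```swift
/// Hypothetical stand-in for SessionNaming: pure string logic, no file I/O.
enum SessionNamingSketch {
    struct Parsed: Equatable {
        let date: String     // yyyy-MM-dd
        let name: String?    // nil for the auto-stamped form
        let app: String
    }

    /// Trim leading/trailing whitespace without Foundation.
    static func trimmed(_ s: String) -> String {
        guard let first = s.firstIndex(where: { !$0.isWhitespace }),
              let last = s.lastIndex(where: { !$0.isWhitespace }) else { return "" }
        return String(s[first...last])
    }

    /// Assumption: path separators are replaced so the name stays one folder component.
    static func sanitized(_ s: String) -> String {
        String(s.map { (c: Character) -> Character in (c == "/" || c == ":") ? "-" : c })
    }

    /// Accepts `<yyyy-MM-dd'T'HH-mm-ss>_<app>` or `<date>_<name>_<app>`.
    static func parse(_ folder: String) -> Parsed? {
        guard let firstSep = folder.firstIndex(of: "_"),
              let lastSep = folder.lastIndex(of: "_") else { return nil }
        let head = folder[..<firstSep]
        let app = String(folder[folder.index(after: lastSep)...])
        guard head.count >= 10, !app.isEmpty else { return nil }
        let date = String(head.prefix(10))
        if head.firstIndex(of: "T") != nil {
            // Auto stamp: exactly one separator, between timestamp and app.
            return firstSep == lastSep ? Parsed(date: date, name: nil, app: app) : nil
        }
        // Readable form: the name sits between the first and last separators.
        guard firstSep != lastSep else { return nil }
        let name = String(folder[folder.index(after: firstSep)..<lastSep])
        return Parsed(date: date, name: name, app: app)
    }

    /// New folder name, or nil to keep the timestamped one (blank input or unparseable folder).
    static func renamed(_ folder: String, meetingName: String) -> String? {
        let name = sanitized(trimmed(meetingName))
        guard !name.isEmpty, let p = parse(folder), p.name == nil else { return nil }
        return "\(p.date)_\(name)_\(p.app)"
    }

    /// Title that works for both folder forms (the format is an assumption).
    static func recapTitle(_ folder: String) -> String {
        guard let p = parse(folder) else { return folder }
        if let name = p.name { return "\(name) · \(p.app) · \(p.date)" }
        return "\(p.app) call · \(p.date)"
    }
}
```

Returning `nil` from `renamed` maps onto "keep the timestamped name", which matches the skip/blank behavior described above.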
Grant Gilliam	a3e3406b28	Make diarization chunk length configurable (Auto + presets) Chunk size was hardcoded at 2.5-min bodies. Add a Settings control: Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large calls, at some cost to speed and cross-chunk voice matching. - ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4 participants were detected, else 150s; overlap + single-chunk threshold scale with the body length. - AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation. - TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it. - SessionController resolves the body from the setting + the session's detected participant count (visual_timeline participants) for both send + re-process. - Participant roster now counts EVERY tile OCR'd, not just who spoke (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto call-size signal is meaningful even though speaking-detection is sparse. Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.	2026-06-09 10:15:16 -05:00
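The mode resolution and length scaling can be sketched in Swift as follows. Only the preset lengths (150/60/90 s) and the Auto rule (more than 4 participants gives 60 s, else 150 s) come from the commit. The 10% overlap and the 1.5x single-chunk threshold are placeholder ratios, and all type names are hypothetical.

```swift
/// Hypothetical mirror of ChunkMode: maps a setting (plus participant count) to body seconds.
enum ChunkModeSketch {
    case auto, standard, largeGroup, fine

    func bodySeconds(participants: Int?) -> Double {
        switch self {
        case .standard: return 150
        case .largeGroup: return 60
        case .fine: return 90
        case .auto:
            // Go short only when the roster shows more than 4 people.
            if let n = participants, n > 4 { return 60 }
            return 150
        }
    }
}

struct ChunkRange: Equatable { let start: Double; let end: Double }

/// Overlap and single-chunk threshold scale with the body; both ratios are assumptions.
struct ChunkPlanSketch {
    let body: Double
    var overlap: Double { body / 10 }                // assumed: 10% of the body on each side
    var singleChunkThreshold: Double { body * 1.5 }  // assumed: up to 1.5x body stays one chunk

    func chunks(duration: Double) -> [ChunkRange] {
        guard duration > singleChunkThreshold else { return [ChunkRange(start: 0, end: duration)] }
        var out: [ChunkRange] = []
        var bodyStart = 0.0
        while bodyStart < duration {
            // Pad each body with overlap on both sides, clamped to the recording.
            out.append(ChunkRange(start: max(0, bodyStart - overlap),
                                  end: min(duration, bodyStart + body + overlap)))
            bodyStart += body
        }
        return out
    }
}
```

Deriving overlap and threshold from the body, rather than fixing them, keeps a 60 s body from carrying the same absolute overlap as a 150 s one.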
Grant Gilliam	5c80e827a1	Revert adjacent same-speaker segment collapse User found the merged transcript lines harder to read — too many sentences joined into one statement. Remove SpeakerReconciler.mergeAdjacent, its wiring in finishBackend (restore the no-LLM early return), and its tests. Back to one segment per diarized utterance.	2026-06-08 15:52:27 -05:00
Grant Gilliam	8f82e9c0a1	Make adapter toggles actually gate screen-reading The Settings "Adapters" toggles wrote adapterEnabled but nothing in the capture path ever read it, so flipping one off did nothing — and the caption still said "Inert in Phase 0". The adapters (Zoom/Teams/Signal/Meet) are all live now. SessionController.startVisual now skips visual capture when the detected app's adapter is toggled off (records audio-only; transcription still runs). Update the section caption to describe the real behavior.	2026-06-08 13:30:31 -05:00
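The gate reduces to a small pure decision made before visual capture starts. In this sketch `adapterEnabled` is assumed to be a dictionary keyed by app name, and `captureMode(detectedApp:adapterEnabled:)` is a hypothetical helper. Audio recording and transcription are unaffected either way; only the visual step is skipped.

```swift
/// Hypothetical decision taken before starting visual capture.
enum CaptureMode: Equatable { case audioAndVisual, audioOnly }

func captureMode(detectedApp: String?, adapterEnabled: [String: Bool]) -> CaptureMode {
    // No recognized app means no adapter to read the screen with.
    guard let app = detectedApp else { return .audioOnly }
    // Assumption: an app with no stored toggle counts as enabled.
    return adapterEnabled[app, default: true] ? .audioAndVisual : .audioOnly
}
```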
Grant Gilliam	a95f27ecd1	Collapse adjacent same-speaker segments after reconciliation Fragments reabsorbed by smoothFragments (e.g. "I" then "need to switch it back") were left as separate transcript lines. Add SpeakerReconciler.mergeAdjacent to join consecutive same-speaker segments within 2s, concatenating their text. Wire it into SessionController.finishBackend AFTER reconcile/LLM naming. The collapse needs no LLM, so finishBackend no longer early-returns when the gateway has no chat model — it runs the collapse and re-persists speakers.json unconditionally, gating only the reconcile and recap passes on the model.	2026-06-08 13:19:05 -05:00
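A sketch of the adjacent-segment collapse described here (later reverted in 5c80e827a1). `Segment` and this `mergeAdjacent` are hypothetical stand-ins for the SpeakerReconciler code; the 2 s gap comes from the commit.

```swift
/// Hypothetical diarized utterance.
struct Segment: Equatable {
    var speaker: String
    var start: Double
    var end: Double
    var text: String
}

/// Join consecutive same-speaker segments separated by at most `maxGap` seconds.
func mergeAdjacent(_ segments: [Segment], maxGap: Double = 2.0) -> [Segment] {
    var out: [Segment] = []
    for seg in segments.sorted(by: { $0.start < $1.start }) {
        if var last = out.last, last.speaker == seg.speaker, seg.start - last.end <= maxGap {
            // Extend the previous line and concatenate its text.
            last.end = max(last.end, seg.end)
            last.text += " " + seg.text
            out[out.count - 1] = last
        } else {
            out.append(seg)
        }
    }
    return out
}
```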
Grant Gilliam	1c133c8970	Fix mis-attributed fragments + LLM naming guardrails + re-process saved sessions Investigating Grant's real 38-min group call: 'Marty' was a GARBAGE cluster (192 segs, 0.37s mean, 186 ≤2 words, 125 single words flanked by the same other speaker — diarization micro-fragments split mid-sentence, then LLM-named 'Marty'). Same for 'Message'/'HI'. - SpeakerReconciler.smoothFragments: dissolve non-self clusters whose MEDIAN segment duration ≤ 1s (≥3 segs) — reassign each fragment to the temporally-nearest real speaker. (Median, not max, so one stray long segment can't rescue a fragment cluster — the bug in the first cut.) On the real call: 7 speakers (3 junk) → 4 real (Marty/Message/HI absorbed into Grant/Jonathan/Me/MH). Runs before LLM naming. - LLM naming guardrails: forbid assigning the self name or ANY already-taken name to another voice (fixes 'Grant' = the user's name pinned on a remote speaker); prompt demands self-intro / direct-address evidence (mention ≠ presence), 'precision over coverage', one name per speaker. - Open saved session now offers Open Editor vs Re-process, so newer logic can be applied to past calls (+ always-visible progress from the prior fix). NOTE: the self-name guardrail needs the app to KNOW the user's name — selfName is still 'Me', so set it in Settings (e.g. 'Grant') so the LLM can't reuse it. 62/62 XCTest.	2026-06-08 12:45:17 -05:00
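The fragment-dissolving rule can be sketched as follows. The thresholds (median duration at most 1 s, at least 3 segments) and the protection of the self cluster come from the commit. The `Seg` type, measuring "temporally nearest" as the distance from a fragment's midpoint to a real segment's nearest edge, and the function shapes are assumptions.

```swift
/// Hypothetical diarized segment.
struct Seg { var speaker: String; let start: Double; let end: Double }

func median(_ xs: [Double]) -> Double {
    let s = xs.sorted()
    let n = s.count
    return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2
}

/// Distance from time t to a segment (0 if t falls inside it).
func gap(_ t: Double, to s: Seg) -> Double {
    if t < s.start { return s.start - t }
    if t > s.end { return t - s.end }
    return 0
}

/// Dissolve non-self clusters made of micro-fragments into the nearest real speaker.
func smoothFragments(_ segs: [Seg], selfName: String,
                     maxMedian: Double = 1.0, minSegs: Int = 3) -> [Seg] {
    var junk = Set<String>()
    for (name, group) in Dictionary(grouping: segs, by: { $0.speaker })
        where name != selfName && group.count >= minSegs {
        // Median, not max: one long stray segment must not rescue a fragment cluster.
        if median(group.map { $0.end - $0.start }) <= maxMedian { junk.insert(name) }
    }
    let real = segs.filter { !junk.contains($0.speaker) }
    guard !real.isEmpty else { return segs }
    return segs.map { (seg: Seg) -> Seg in
        guard junk.contains(seg.speaker) else { return seg }
        let mid = (seg.start + seg.end) / 2
        let nearest = real.min { gap(mid, to: $0) < gap(mid, to: $1) }!
        var fixed = seg
        fixed.speaker = nearest.speaker
        return fixed
    }
}
```

Running this before LLM naming means the model never sees a "Marty"-style fragment cluster to name in the first place.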
Grant Gilliam	9a18664429	Open saved session: visible progress + clear errors (no silent no-op) The status line only rendered inside the last-in-memory-session block, so 'Open saved session' processed invisibly — looked like nothing happened. Now: the transcript status (with a spinner) is always shown, the processing(0,0) reconcile phase reads 'Working… (this can take a few minutes)', and invalid picks surface an alert (not a recorded session / already processing / unreadable transcript) instead of doing nothing.	2026-06-08 12:16:52 -05:00
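The status text and the pick validation can be modeled as plain enums. The `Working…` string and the three error cases come from the commit; the type names, the other status strings, and `checkOpen` are hypothetical.

```swift
/// Hypothetical status shown on the always-visible transcript line.
enum TranscriptStatus: Equatable {
    case idle
    case processing(done: Int, total: Int)
    case failed(String)

    var text: String {
        switch self {
        case .idle: return ""
        // (0, 0) is the reconcile phase, which has no chunk count to show.
        case .processing(done: 0, total: 0): return "Working… (this can take a few minutes)"
        case let .processing(done, total): return "Processing \(done)/\(total)…"
        case let .failed(reason): return "Failed: \(reason)"
        }
    }
}

enum OpenSessionError: Error, Equatable {
    case notARecordedSession, alreadyProcessing, unreadableTranscript
}

/// Returns the alert to show, or nil when the pick is valid.
/// `transcriptParses == nil` means the folder has not been transcribed yet.
func checkOpen(isRecordedSession: Bool, isBusy: Bool, transcriptParses: Bool?) -> OpenSessionError? {
    if !isRecordedSession { return .notARecordedSession }
    if isBusy { return .alreadyProcessing }
    if transcriptParses == false { return .unreadableTranscript }
    return nil
}
```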
Grant Gilliam	6d0c8be8c9	Speaker reconciliation + open/re-process any saved session Reconciliation (the marry-the-signals layer): after transcription, before the recap, SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar (cosine >= 0.82) — fixes a person split across chunks (the real 1-on-1 failure: one remote came back as 'MH' + 'Unknown_0'); and (2) NAMES remaining non-self clusters from transcript CONTENT via the gateway LLM (people addressed by name / self-intros), conservative + confidence-gated, keeping the placeholder when unrevealed. The mic-channel self is protected and never reassigned. Voice does the segmentation; the fingerprint-merge fixes splits; the LLM adds the content signal visual/voiceprint lack. - SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites speakers.json before recap. SessionController.finishBackend shares one model lookup for reconcile + recap. Gated by settings.reconcileSpeakers (default on). - Open saved session: menu 'Open saved session…' → folder picker. Edits it if already transcribed, else reconstructs inputs from disk (visual_timeline vision segs + channel self-spans) and runs transcribe → reconcile → recap, then opens the editor. Lets you evaluate/correct ANY past call, not just the in-memory last one. Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread initials, lowercase/center names) — so reconciliation + the editor (which teaches voiceprints on confirm) carry it; the editor remains the human arbiter. 59/59 XCTest.	2026-06-08 11:54:41 -05:00
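The fingerprint merge is a pairwise cosine check over non-self clusters. The 0.82 threshold and the rule that the self cluster is never merged come from the commit. Representing fingerprints as `[Double]`, the greedy pairing order, and keeping the alphabetically first name as the survivor are assumptions of this sketch.

```swift
/// Cosine similarity between two voice fingerprints.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "fingerprints must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Map each absorbed cluster name to its survivor. The self cluster is excluded entirely.
func mergePlan(_ fingerprints: [String: [Double]], selfName: String,
               threshold: Double = 0.82) -> [String: String] {
    let names = fingerprints.keys.filter { $0 != selfName }.sorted()
    var absorbedInto: [String: String] = [:]
    for (i, a) in names.enumerated() where absorbedInto[a] == nil {
        for b in names[(i + 1)...] where absorbedInto[b] == nil {
            if cosine(fingerprints[a]!, fingerprints[b]!) >= threshold {
                absorbedInto[b] = a
            }
        }
    }
    return absorbedInto
}
```

In the example below the self print is nearly parallel to "MH" but is never considered, while the split "MH" + "Unknown_0" pair collapses into one speaker.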
Grant Gilliam	c539b78a58	Configurable recap templates (categories per meeting type, in Settings) Takeaways categories are no longer hardcoded — they're editable templates. A template = the always-on TLDR + an ordered list of sections, each with a title, a type (attributed items / bulleted list / paragraph), and an instruction (the prompt text for that category). The analyzer assembles the LLM prompt FROM the template and parses generically, so adding/removing/renaming a category needs zero code and the output always renders. - RecapTemplate / TemplateSection / SectionKind + TopicGranularity; built-in defaults (Internal Meeting, 1:1, Company/Sales Call), all editable. - Generic extras: RecapExtras{tldr, primarySpeakers, sections:[RenderedSection]} + RecapItem{text,who,when,note} replaces the fixed MeetingExtras. Analyzer builds per-section sec_N fields + parses by kind; renderer + remap are generic. - Topic granularity (coarse/auto/fine) answers 'should chunking be configurable' — it scales the target topic count; raw window sizes stay as tuned defaults. - AppSettings persists templates + defaultTemplateId (seeded once). Settings gets a default-template picker + 'Manage…' → TemplatesView (CRUD, edit sections/ instructions, set default, Preview prompt for full transparency). - Recap editor gains a template picker; Regenerate uses the chosen template. Auto recap uses the default template. 54/54 XCTest (template prompt build, generic parse/remap/render updated).	2026-06-06 19:26:03 -05:00
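Assembling the prompt from a template might look like the sketch below: one `sec_N` field per section, with a JSON shape chosen by the section's kind. The `sec_N` naming, the three kinds, and the `text`/`who`/`when`/`note` item fields come from the commit; the prompt wording and these type definitions are illustrative, not the analyzer's real prompt.

```swift
/// Hypothetical mirrors of SectionKind / TemplateSection / RecapTemplate.
enum SectionKind { case attributedItems, bulletList, paragraph }

struct TemplateSection {
    let title: String
    let kind: SectionKind
    let instruction: String   // the category's prompt text
}

struct RecapTemplate { let name: String; let sections: [TemplateSection] }

/// JSON shape the model should return for a section, by kind.
func jsonShape(_ kind: SectionKind) -> String {
    switch kind {
    case .attributedItems: return "[{\"text\": str, \"who\": str, \"when\": str, \"note\": str}]"
    case .bulletList: return "[str]"
    case .paragraph: return "str"
    }
}

/// Build the field list from the template: the always-on TLDR, then one sec_N per section.
func buildFieldSpec(_ template: RecapTemplate) -> String {
    var lines = ["Return one JSON object with these fields:",
                 "- \"tldr\": str. A short summary of the meeting."]
    for (i, section) in template.sections.enumerated() {
        lines.append("- \"sec_\(i)\": \(jsonShape(section.kind)). \(section.title): \(section.instruction)")
    }
    return lines.joined(separator: "\n")
}
```

Because the parser can read `sec_0`, `sec_1`, and so on back by index and dispatch on `kind`, adding or renaming a category only changes template data.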
Grant Gilliam	10ddf9992a	Recap editor: Regenerate recap (re-run LLM on corrected transcript) Adds a 'Regenerate recap' action so corrected speaker names flow into freshly written summaries/extras (not just find-replaced). regenerate() commits the corrections (rewrite speakers.json + reconcile voiceprints), re-runs RecapAnalyzer on the corrected transcript via the gateway LLM, and rewrites recap.json + transcript.md + recap.html. save() and regenerate() share commitCorrections(); both rebaseline the speaker set afterward so further edits map cleanly. Editor view gains the button + progress spinner; RecapEditModel takes the gateway baseURL/skipTLS. 52/52 XCTest; builds clean.	2026-06-06 16:48:18 -05:00
Grant Gilliam	4c086251d9	Speaker corrections: rename / merge / reassign + voice learning Native editor to fix speaker-ID errors after transcription (modeled on recap-relay's correction UX): rename a speaker in the legend, merge two speakers, or reassign an individual transcript line. Saving rewrites speakers.json, re-renders transcript.md + recap.html, and updates the voiceprint memory — so a correction compounds: naming an "Unknown" speaker teaches that voice for future calls. - SpeakerEditing (pure, tested): replaceSpeaker (rename = merge-onto-existing), reassign, netNameMap (compose ops), and remap (apply a name map to a recap's structured fields + whole-word free text, so summaries/extras update without re-LLM). - RecapEditModel (@MainActor): loads speakers.json (+ optional recap.json + cluster_fingerprints.json); on save writes the resolved speakers.json, re-renders, and reconciles voiceprints — merge keeps the survivor's print; rename/name-an-Unknown enrolls the cluster's fingerprint under the new name. - TranscriptEditorView (SwiftUI) + EditorWindow (AppKit window for the LSUIElement app); menu gains "Edit speakers". - Pipeline now persists cluster_fingerprints.json (every cluster incl. Unknown) and recap.json (RecapFile) so the editor can learn voices + re-render offline. - RecapModels made Codable; TranscriptAssembler exposes allFingerprints; VoiceprintStore gains enroll() + merge(). 52/52 XCTest (6 new, incl. a full rename→artifacts→voiceprint round-trip on disk).	2026-06-06 15:12:23 -05:00
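The two pure pieces, composing corrections into a net name map and applying it to free text on whole-word boundaries, could look like this. `Rename`, both signatures, and the tokenization rule are assumptions for illustration; the commit only names `netNameMap` and `remap` and describes what they do.

```swift
/// One correction: rename `from` to `to`. A merge is a rename onto a name that already exists.
struct Rename { let from: String; let to: String }

/// Compose corrections into one original-name -> final-name map (changed names only).
func netNameMap(_ ops: [Rename], originals: [String]) -> [String: String] {
    var current = Dictionary(uniqueKeysWithValues: originals.map { ($0, $0) })
    for op in ops {
        // The loop iterates a copy of `current`, so updating it inside is safe.
        for (orig, name) in current where name == op.from { current[orig] = op.to }
    }
    return current.filter { $0.key != $0.value }
}

/// Replace whole-word occurrences of mapped names in free text.
/// Limitation: a "word" is a run of letters, digits, or "_", so multi-word names are not handled.
func remapWholeWords(_ text: String, _ map: [String: String]) -> String {
    var out = ""
    var word = ""
    func flush() {
        out += map[word] ?? word
        word = ""
    }
    for ch in text {
        if ch.isLetter || ch.isNumber || ch == "_" {
            word.append(ch)
        } else {
            flush()
            out.append(ch)
        }
    }
    flush()
    return out
}
```

Whole-word matching is what keeps a rename of "MH" from rewriting "MHz" inside a summary.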
Grant Gilliam	85bfdf2b56	Recap: readable transcript + topic sections + meeting extras (gateway LLM) New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible /v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start from already-named speakers (label-merge), so recap-relay's speaker clustering + name-inference are skipped entirely. - GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via /api/endpoints, TLS-skip reuse, 503 retry, sequential. - RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript → time-windowed analyze (single window for short calls, 18min/2min overlap for long) → stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/ open_questions/key_quotes). Defensive JSON parsing of LLM output. - RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html (topic sections w/ collapsible transcripts, extras panels, speaker color chips, full timestamped speaker-attributed transcript, print styles). - SessionController.buildRecap: best-effort after speakers.json (gated by settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if the gateway has no LLM. Settings toggle added. Validated END-TO-END on the real Meet session against the live gateway: dual-channel transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin' correctly attributed to the remote speaker. 46/46 XCTest (10 new).	2026-06-06 14:36:18 -05:00
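The numbered transcript format and the windowing rule can be sketched as below. The `[N] (MM:SS) Name: text` format and the 18 min / 2 min overlap come from the commit; treating "short" as "fits in one window" is an assumption, and both function names are hypothetical.

```swift
/// "[N] (MM:SS) Name: text" line for the numbered transcript fed to the LLM.
func transcriptLine(index: Int, start: Double, speaker: String, text: String) -> String {
    let total = Int(start)                    // truncate to whole seconds
    let mm = total / 60, ss = total % 60
    let two: (Int) -> String = { $0 < 10 ? "0\($0)" : "\($0)" }
    return "[\(index)] (\(two(mm)):\(two(ss))) \(speaker): \(text)"
}

/// One window for short calls, else 18-min windows that overlap by 2 min.
func analysisWindows(duration: Double, window: Double = 18 * 60,
                     overlap: Double = 2 * 60) -> [ClosedRange<Double>] {
    guard duration > window else { return [0...duration] }
    var out: [ClosedRange<Double>] = []
    var start = 0.0
    while true {
        let end = min(start + window, duration)
        out.append(start...end)
        if end >= duration { break }
        start = end - overlap             // step back so boundary-spanning topics are seen twice
    }
    return out
}
```

The overlap is why the stitch/dedup step exists: a topic near a boundary can come back from two windows.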
Grant Gilliam	53d7fcdac0	Client: dual-channel label-merge (mic_file + system_file) The backend shipped dual-channel mode; wire the client to it. We already capture mic (you) and system (others) separately, so send them as two files instead of the mono mix — fixing the misattribution at the source. - SparkControlClient: labelMergeDual(mic_file, system_file, self_name, self_vad); multipart generalized to N files; shared POST/retry/decode extracted. - SessionPackager.rebasedSelfVadData: chunk-local [{start,end}] for self_vad; sliceAudio reused for both tracks. - TranscriptPipeline.process: dual-channel chunking (slice mic+system, rebase timeline + self_vad per chunk) when system audio is healthy; mono mixed-file fallback (self folded into the timeline) otherwise. - VisualCapture.finish: write the full visual_timeline.json (remote + self merged) but return REMOTE (vision) segments only — self travels via the mic channel. - TranscriptAssembler: rank mic_channel highest (the user's own track wins). - VoiceprintStore: store the clean mic_channel self voiceprint. - SessionController: pass mic/system URLs + remote timeline + channel self-spans + self_name + systemHealthy; self_vad.json now reflects the channel-verified spans. Validated END-TO-END against the live backend on the real misattributing session: 'Go Bitcoin' (remote) is now attributed to Unknown_0, NOT the user; the user's own lines come back source=mic_channel; per-channel ASR recovered fuller remote text. 36/36 XCTest (4 new: self_vad rebase, mic_channel ranking + voiceprint storage).	2026-06-06 13:15:29 -05:00
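Rebasing `self_vad` per chunk amounts to clipping each session-time span to the chunk and subtracting the chunk start. A sketch with a hypothetical `TimeSpan` type and function name; the commit names `SessionPackager.rebasedSelfVadData`, whose signature is not shown, and JSON encoding to `[{start,end}]` is left out here.

```swift
/// A self-speech span in seconds.
struct TimeSpan: Equatable { let start: Double; let end: Double }

/// Clip session-time spans to one chunk and shift them to chunk-local time (0 = chunk start).
func rebasedSelfSpans(_ spans: [TimeSpan], chunkStart: Double, chunkEnd: Double) -> [TimeSpan] {
    spans.compactMap { (s: TimeSpan) -> TimeSpan? in
        let lo = max(s.start, chunkStart)
        let hi = min(s.end, chunkEnd)
        // Spans that do not intersect the chunk are dropped.
        return hi > lo ? TimeSpan(start: lo - chunkStart, end: hi - chunkStart) : nil
    }
}
```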
Grant Gilliam	2191486506	Channel-verified self identity: the mic track is you Grant's insight + proven on real session audio: we capture self (mic) and others (system) as separate tracks, then throw the separation away by mixing to mono — so the backend has to re-guess who's who. Analysis of a real call showed the channels are cleanly separated (envelope corr 0.015, NO echo); Caitlyn's 'Go Bitcoin' was 11.8x louder in system than mic, yet the mono mix + noisy visual named it 'Grant'. ChannelSelfVAD marks self-speech as windows where the mic is active AND louder than system (mic > system x1.5). Benefits: (1) self is identified by CHANNEL, not by the on-screen name — set one name in Settings, no per-platform matching; (2) a remote speaker (or room echo) can never be mislabeled as self. Computed at finalize from the two finished WAVs; the live capture path is untouched. Falls back to mic-VAD if tracks can't be read. SessionController feeds these spans to the backend timeline. Validated on the real session: 16 self spans; 'Go Bitcoin' (72-74s) correctly EXCLUDED, Grant's 49.9-53.3s / 62.6-64s correctly INCLUDED. 33/33 XCTest (5 new).	2026-06-06 12:24:29 -05:00
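The channel comparison can be sketched over two aligned sample arrays. Only the `mic > system x1.5` rule comes from the commit. The 100 ms window, the `activeRMS` floor that stands in for "mic is active", and all names are assumptions; a real implementation would read the two finished WAVs rather than take arrays.

```swift
struct TimeSpan: Equatable { var start: Double; var end: Double }

/// Root-mean-square level of a window of samples.
func rms(_ x: ArraySlice<Float>) -> Float {
    guard !x.isEmpty else { return 0 }
    let sumSquares = x.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumSquares / Float(x.count)).squareRoot()
}

/// Self-speech = windows where the mic is active AND louder than system by `ratio`.
func channelSelfSpans(mic: [Float], system: [Float], sampleRate: Int,
                      windowSamples: Int = 1600,   // assumed: 100 ms at 16 kHz
                      activeRMS: Float = 0.01,     // assumed "mic is active" floor
                      ratio: Float = 1.5) -> [TimeSpan] {
    let n = min(mic.count, system.count)
    var spans: [TimeSpan] = []
    var i = 0
    while i < n {
        let j = min(i + windowSamples, n)
        let m = rms(mic[i..<j]), s = rms(system[i..<j])
        if m > activeRMS && m > s * ratio {
            let t0 = Double(i) / Double(sampleRate)
            let t1 = Double(j) / Double(sampleRate)
            // Extend the previous span when this window directly follows it.
            if let last = spans.last, last.end == t0 {
                spans[spans.count - 1].end = t1
            } else {
                spans.append(TimeSpan(start: t0, end: t1))
            }
        }
        i = j
    }
    return spans
}
```

In the test, the middle stretch has an active mic but a much louder system track (the "Go Bitcoin" situation), so it is excluded.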
Grant Gilliam	3785f6bdd0	Surface whether visual capture ran on the last session Visual capture falls back to audio-only silently, so the user couldn't tell if it attached on a real call. SessionInfo now carries visualSegmentCount (nil = audio-only; a count = visual ran, with that many vision-detected speaker segments), shown in the menu as '… · N visual segments' or '… · audio-only'. Makes the pending live-call validation unambiguous.	2026-06-06 10:21:44 -05:00
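The distinction the commit draws between `nil` and a count can be shown with a tiny formatter. `visualSuffix` is a hypothetical helper, and the leading `… · ` part of the menu line is left out.

```swift
/// Menu suffix for the last session: nil means visual capture never attached.
func visualSuffix(_ visualSegmentCount: Int?) -> String {
    guard let n = visualSegmentCount else { return "audio-only" }
    // 0 is still meaningful: visual ran but detected no speaker segments.
    return "\(n) visual segments"
}
```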
Grant Gilliam	880b56e426	Wire visual capture into the recording lifecycle (failure-isolated) Visual capture now runs alongside audio: on call start the session picks the app's adapter, captures the call window on the SAME monotonic clock as the audio (AudioRecorder.sharedT0Host), and on stop writes visual_timeline.json and hands the backend the visual segments with mic-VAD self-spans merged. Any visual failure (no adapter, no window, Screen Recording denied) leaves the session recording audio-only — the proven path is never blocked or broken. - CallDetector now emits DetectedCall{app, bundleID, windowID}: the exact CGWindowID of the matched Meet browser window (native apps → nil → largest). - VisualCapture wraps VisualObserver + AdapterRegistry, writes visual_timeline.json. - AudioRecorder.sharedT0Host() exposes the shared t0 for frame alignment. Hardened per a 3-lens adversarial review (concurrency / failure-isolation / data-flow), all 6 confirmed findings fixed: - P0 (critical): startVisual could adopt a stale capture into a DIFFERENT session (cross-session SCStream leak + visual_timeline.json written to the wrong folder). Now gated on session identity — generation + recorder ===, still .recording — with fail-closed adoption; otherwise the stream is cancelled. - P1: observer captured the browser's largest window, not the detected Meet window. Now targets the exact CGWindowID (pickWindowIndex, unit-tested), largest-area only as fallback. - P2: a startVisual orphaned by a concurrent stop could leak a stream on quit. inFlightVisual is registered before the await and drained in prepareForTermination. - P3: trailing visual gap/segment ends could exceed duration_sec. Clamped in VisualCapture (clampSegments/clampGaps, unit-tested). - P4: capture pixel size used NSScreen.main scale; now uses the scale of the display actually hosting the window (OCR clarity on secondary displays). - VisualObserver.stop() bounds stopCapture() with a 3s timeout (mirrors audio) so a wedged stream can't hang finalization. 25/25 XCTest pass. Live validation on real calls still pending.	2026-06-06 10:18:52 -05:00
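Two of the unit-tested helpers named in the findings, window picking and segment clamping, reduce to small pure functions. The `WindowInfo`/`TimedSegment` types and these signatures are hypothetical; the selection rule (exact CGWindowID, else largest area) and the clamp to `duration_sec` come from the commit.

```swift
/// Hypothetical minimal window description (a real one would come from ScreenCaptureKit's SCWindow).
struct WindowInfo { let id: UInt32; let width: Double; let height: Double }

/// Exact window-ID match first; otherwise fall back to the largest-area window.
func pickWindowIndex(_ windows: [WindowInfo], targetID: UInt32?) -> Int? {
    if let target = targetID, let i = windows.firstIndex(where: { $0.id == target }) {
        return i
    }
    return windows.indices.max { windows[$0].width * windows[$0].height < windows[$1].width * windows[$1].height }
}

struct TimedSegment: Equatable { let start: Double; let end: Double }

/// Drop segments that start at/after the recording's end; clamp trailing ends to it.
func clampSegments(_ segs: [TimedSegment], duration: Double) -> [TimedSegment] {
    segs.compactMap { (s: TimedSegment) -> TimedSegment? in
        guard s.start < duration else { return nil }
        return TimedSegment(start: s.start, end: min(s.end, duration))
    }
}
```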
Grant Gilliam	863136aeec	Phases 2-6: detection, visual timeline, backend hand-off, voiceprints Phase 2 (call detection): CallDetector using CoreAudio per-process mic attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet, ignoring our own recording; auto-record toggle. Built; pending live multi-app confirmation by the user. Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation, TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema 1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR + saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver (window capture; frames released, never saved; minimized->visual_gap, idle != gap). Synthetic-frame tested; adapter geometry pending real Signal fixtures + live VisualObserver validation. Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential, TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated END-TO-END against the live backend (chunk -> label-merge -> speakers.json). Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status, auto-send toggle (default off) + self-name setting. All adversarial-review findings fixed. App + XCTest suite build; tests pass.	2026-06-06 00:15:49 -05:00
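Hysteresis in this context means the active speaker changes only after several consecutive frames agree, so one misread frame cannot flip attribution. A minimal sketch over per-frame OCR guesses; `applyHysteresis`, the `hold` count, and the treatment of `nil` (no highlighted tile) are assumptions, not TimelineBuilder's actual rules.

```swift
/// Hypothetical debounce of per-frame active-speaker guesses:
/// switch only after `hold` consecutive agreeing frames.
/// `nil` (no tile highlighted) also needs `hold` frames to clear the current speaker.
func applyHysteresis(_ frames: [String?], hold: Int = 2) -> [String?] {
    var current: String? = nil
    var candidate: String? = nil
    var streak = 0
    var out: [String?] = []
    for guess in frames {
        if guess == current {
            candidate = nil; streak = 0          // agreement cancels any pending switch
        } else if guess == candidate {
            streak += 1
        } else {
            candidate = guess; streak = 1        // a new challenger
        }
        if streak >= hold {
            current = candidate; candidate = nil; streak = 0
        }
        out.append(current)
    }
    return out
}
```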
Grant Gilliam	fd7e1a5907	Phase 1: dual-track audio capture → mixed-mono 16 kHz WAV + mic VAD AudioRecorder captures system audio (ScreenCaptureKit) + mic (AVAudioEngine) on a single serial ioQueue, one shared monotonic t0, time-driven writers (pad gaps / trim overlaps) so tracks stay aligned, and an energy mic-VAD for 'self' spans. AudioMixer sums the aligned tracks into mixed_mono_16k.wav. SessionController drives a serialized start/stop state machine, writes the session folder + self_vad.json, exposes live level meters, and finalizes on quit. Hardening from review: ioQueue single-domain (no races), stop() never hangs (mic-first teardown + bounded stopCapture), layout-agnostic mic deep-copy, discard-only video output to keep SCStream alive, VAD lockstep on committed frames, stable signing team in project.yml, single-instance enforcement.	2026-06-05 21:30:11 -05:00
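The time-driven writer and the mixer can be sketched on plain sample arrays. `alignedAppend` and `mixToMono` are hypothetical names. Padding gaps and trimming overlaps comes from the commit; clipping the sum to [-1, 1] is an assumption (the commit only says the aligned tracks are summed), and resampling to 16 kHz is omitted.

```swift
/// Append `buffer`, which should start at `bufferStart` seconds on the shared clock, so the
/// track stays aligned: pad a gap with silence, or drop samples that were already written.
func alignedAppend(_ track: inout [Float], buffer: [Float], bufferStart: Double, sampleRate: Double) {
    let expected = Int((bufferStart * sampleRate).rounded())   // sample index this buffer belongs at
    if expected >= track.count {
        track.append(contentsOf: repeatElement(Float(0), count: expected - track.count))  // gap
        track.append(contentsOf: buffer)
    } else {
        let overlap = track.count - expected                    // samples already covered
        if overlap < buffer.count {
            track.append(contentsOf: buffer[overlap...])        // trim the overlap
        }
    }
}

/// Sum two aligned tracks sample by sample; the shorter one counts as silence past its end.
func mixToMono(_ a: [Float], _ b: [Float]) -> [Float] {
    (0..<max(a.count, b.count)).map { (i: Int) -> Float in
        let sum = (i < a.count ? a[i] : 0) + (i < b.count ? b[i] : 0)
        return max(-1, min(1, sum))   // assumed hard clip to the valid sample range
    }
}
```

Deriving each write position from the shared t0 rather than from how many samples have arrived is what keeps the two tracks aligned even when one source drops or repeats buffers.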

18 Commits