ten31-transcripts

Author	SHA1	Message	Date
Grant Gilliam	35ba6ecf05	Drop unused AppleEvents usage string; de-stale Phase-N comments The NSAppleEventsUsageDescription usage string was dead — the app has no AppleEvents/AppleScript code path (Meet detection reads window titles), so the permission prompt never fired; remove it. Rephrase the leftover "Phase N" build-plan references in source comments (one of which falsely claimed "no audio, capture, or call detection yet"), and complete the AGENTS.md Audio/Detection layout listings.	2026-06-16 22:15:44 -05:00
Grant Gilliam	a3e3406b28	Make diarization chunk length configurable (Auto + presets) Chunk size was hardcoded at 2.5-min bodies. Add a Settings control: Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large calls, at some cost to speed and cross-chunk voice matching. - ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4 participants were detected, else 150s; overlap + single-chunk threshold scale with the body length. - AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation. - TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it. - SessionController resolves the body from the setting + the session's detected participant count (visual_timeline participants) for both send + re-process. - Participant roster now counts EVERY tile OCR'd, not just who spoke (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto call-size signal is meaningful even though speaking-detection is sparse. Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.	2026-06-09 10:15:16 -05:00
Grant Gilliam	1b6bb8ab67	Drop stuck whole-call visual spans at processing time Defense-in-depth + salvage for sessions captured before the adapter fix: drop any vision-source span whose single unbroken duration covers ≥60% of the call. No one speaks that long without a break, so it's a stuck/false active-speaker cue that would dominate backend name attribution. Self (mic_vad) spans are never dropped. Applied to both the live and re-process paths. Test added; 66 pass.	2026-06-08 16:21:45 -05:00
Grant Gilliam	ab910cf742	Chunk overlap + overlap-aware stitching Chunks were contiguous (start = prev end) with a naïve offset-concat stitch — no overlap. That cut sentences at boundaries, denied the diarizer context at edges, and let one voice split across chunks (the MH/Unknown_0 problem). Now each ~150s body is sliced with a 15s margin on both sides ([bodyStart-15, bodyEnd+15]); the stitcher keeps a segment only in the chunk that owns its MIDPOINT (body region) and drops it from the neighbour's margin — so boundary-spanning speech is seen whole by the backend and kept exactly once. - SessionPackager.PlannedChunk gains bodyStart/bodyEnd; planChunks adds overlapSeconds. - TranscriptAssembler.ChunkResult carries body bounds (defaults keep-all → no-overlap behaviour preserved for existing callers); assemble dedups by midpoint-in-body. - TranscriptPipeline passes body bounds through. Complements (doesn't replace) the fragment-smoothing + reconciliation safety nets; this is the upstream fix. ~+20% backend audio per interior chunk. 63/63 XCTest (new: overlap window layout + boundary-segment dedup).	2026-06-08 13:03:56 -05:00
Grant Gilliam	4c086251d9	Speaker corrections: rename / merge / reassign + voice learning Native editor to fix speaker-ID errors after transcription (modeled on recap-relay's correction UX): rename a speaker in the legend, merge two speakers, or reassign an individual transcript line. Saving rewrites speakers.json, re-renders transcript.md + recap.html, and updates the voiceprint memory — so a correction compounds: naming an "Unknown" speaker teaches that voice for future calls. - SpeakerEditing (pure, tested): replaceSpeaker (rename = merge-onto-existing), reassign, netNameMap (compose ops), and remap (apply a name map to a recap's structured fields + whole-word free text, so summaries/extras update without re-LLM). - RecapEditModel (@MainActor): loads speakers.json (+ optional recap.json + cluster_fingerprints.json); on save writes the resolved speakers.json, re-renders, and reconciles voiceprints — merge keeps the survivor's print; rename/name-an-Unknown enrolls the cluster's fingerprint under the new name. - TranscriptEditorView (SwiftUI) + EditorWindow (AppKit window for the LSUIElement app); menu gains "Edit speakers". - Pipeline now persists cluster_fingerprints.json (every cluster incl. Unknown) and recap.json (RecapFile) so the editor can learn voices + re-render offline. - RecapModels made Codable; TranscriptAssembler exposes allFingerprints; VoiceprintStore gains enroll() + merge(). 52/52 XCTest (6 new, incl. a full rename→artifacts→voiceprint round-trip on disk).	2026-06-06 15:12:23 -05:00
Grant Gilliam	53d7fcdac0	Client: dual-channel label-merge (mic_file + system_file) The backend shipped dual-channel mode; wire the client to it. We already capture mic (you) and system (others) separately, so send them as two files instead of the mono mix — fixing the misattribution at the source. - SparkControlClient: labelMergeDual(mic_file, system_file, self_name, self_vad); multipart generalized to N files; shared POST/retry/decode extracted. - SessionPackager.rebasedSelfVadData: chunk-local [{start,end}] for self_vad; sliceAudio reused for both tracks. - TranscriptPipeline.process: dual-channel chunking (slice mic+system, rebase timeline + self_vad per chunk) when system audio is healthy; mono mixed-file fallback (self folded into the timeline) otherwise. - VisualCapture.finish: write the full visual_timeline.json (remote + self merged) but return REMOTE (vision) segments only — self travels via the mic channel. - TranscriptAssembler: rank mic_channel highest (the user's own track wins). - VoiceprintStore: store the clean mic_channel self voiceprint. - SessionController: pass mic/system URLs + remote timeline + channel self-spans + self_name + systemHealthy; self_vad.json now reflects the channel-verified spans. Validated END-TO-END against the live backend on the real misattributing session: 'Go Bitcoin' (remote) is now attributed to Unknown_0, NOT the user; the user's own lines come back source=mic_channel; per-channel ASR recovered fuller remote text. 36/36 XCTest (4 new: self_vad rebase, mic_channel ranking + voiceprint storage).	2026-06-06 13:15:29 -05:00
Grant Gilliam	863136aeec	Phases 2-6: detection, visual timeline, backend hand-off, voiceprints Phase 2 (call detection): CallDetector using CoreAudio per-process mic attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet, ignoring our own recording; auto-record toggle. Built; pending live multi-app confirmation by the user. Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation, TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema 1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR + saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver (window capture; frames released, never saved; minimized->visual_gap, idle != gap). Synthetic-frame tested; adapter geometry pending real Signal fixtures + live VisualObserver validation. Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential, TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated END-TO-END against the live backend (chunk -> label-merge -> speakers.json). Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status, auto-send toggle (default off) + self-name setting. All adversarial-review findings fixed. App + XCTest suite build; tests pass.	2026-06-06 00:15:49 -05:00

7 Commits