The NSAppleEventsUsageDescription usage string was dead — the app has no AppleEvents/AppleScript code path (Meet detection reads window titles), so the permission prompt never fired; remove it. Rephrase the leftover "Phase N" build-plan references in source comments (one of which falsely claimed "no audio, capture, or call detection yet"), and complete the AGENTS.md Audio/Detection layout listings.
The app shipped with certificate validation bypassed globally and on by
default — InsecureTrustDelegate trusted any cert from any host. That was
the evaluation's P1: anyone on the LAN could MITM call audio, transcripts,
and voiceprints.
The backend's Start9 cert already validates under normal system trust when
the StartOS Root CA is installed in the keychain (confirmed: URLSession
default validation returns 200 against the backend and its fallback), so the
bypass is unnecessary:
- skip-TLS now defaults to off
- when explicitly enabled, the bypass is scoped to the configured host via
InsecureTrustDelegate.allowsTrustOverride, never "trust any server"
- the host gate is pure and unit-tested (InsecureTrustDelegateTests)
Docs reconciled: AGENTS.md backend/TLS line and Current state.
Native editor to fix speaker-ID errors after transcription (modeled on recap-relay's
correction UX): rename a speaker in the legend, merge two speakers, or reassign an
individual transcript line. Saving rewrites speakers.json, re-renders transcript.md +
recap.html, and updates the voiceprint memory — so a correction compounds: naming an
"Unknown" speaker teaches that voice for future calls.
- SpeakerEditing (pure, tested): replaceSpeaker (rename = merge-onto-existing),
reassign, netNameMap (compose ops), and remap (apply a name map to a recap's
structured fields + whole-word free text, so summaries/extras update without re-LLM).
- RecapEditModel (@MainActor): loads speakers.json (+ optional recap.json +
cluster_fingerprints.json); on save writes the resolved speakers.json, re-renders,
and reconciles voiceprints — merge keeps the survivor's print; rename/name-an-Unknown
enrolls the cluster's fingerprint under the new name.
- TranscriptEditorView (SwiftUI) + EditorWindow (AppKit window for the LSUIElement app);
menu gains "Edit speakers".
- Pipeline now persists cluster_fingerprints.json (every cluster incl. Unknown) and
recap.json (RecapFile) so the editor can learn voices + re-render offline.
- RecapModels made Codable; TranscriptAssembler exposes allFingerprints;
VoiceprintStore gains enroll() + merge().
52/52 XCTest (6 new, incl. a full rename→artifacts→voiceprint round-trip on disk).
New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging
recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible
/v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start
from already-named speakers (label-merge), so recap-relay's speaker clustering +
name-inference are skipped entirely.
- GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via
/api/endpoints, TLS-skip reuse, 503 retry, sequential.
- RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript →
time-windowed analyze (single window for short calls, 18min/2min overlap for long)
→ stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/
open_questions/key_quotes). Defensive JSON parsing of LLM output.
- RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html
(topic sections w/ collapsible transcripts, extras panels, speaker color chips,
full timestamped speaker-attributed transcript, print styles).
- SessionController.buildRecap: best-effort after speakers.json (gated by
settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if
the gateway has no LLM. Settings toggle added.
Validated END-TO-END on the real Meet session against the live gateway: dual-channel
transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin'
correctly attributed to the remote speaker. 46/46 XCTest (10 new).
The backend shipped dual-channel mode; wire the client to it. We already capture
mic (you) and system (others) separately, so send them as two files instead of the
mono mix — fixing the misattribution at the source.
- SparkControlClient: labelMergeDual(mic_file, system_file, self_name, self_vad);
multipart generalized to N files; shared POST/retry/decode extracted.
- SessionPackager.rebasedSelfVadData: chunk-local [{start,end}] for self_vad;
sliceAudio reused for both tracks.
- TranscriptPipeline.process: dual-channel chunking (slice mic+system, rebase
timeline + self_vad per chunk) when system audio is healthy; mono mixed-file
fallback (self folded into the timeline) otherwise.
- VisualCapture.finish: write the full visual_timeline.json (remote + self merged)
but return REMOTE (vision) segments only — self travels via the mic channel.
- TranscriptAssembler: rank mic_channel highest (the user's own track wins).
- VoiceprintStore: store the clean mic_channel self voiceprint.
- SessionController: pass mic/system URLs + remote timeline + channel self-spans +
self_name + systemHealthy; self_vad.json now reflects the channel-verified spans.
Validated END-TO-END against the live backend on the real misattributing session:
'Go Bitcoin' (remote) is now attributed to Unknown_0, NOT the user; the user's own
lines come back source=mic_channel; per-channel ASR recovered fuller remote text.
36/36 XCTest (4 new: self_vad rebase, mic_channel ranking + voiceprint storage).