Commit Graph

11 Commits

Author SHA1 Message Date
Grant Gilliam a2328de4d3 Surface "Your name" as its own top Settings section
The name field was the first row of the third "Transcription" section, below
the fold — users couldn't find where to set their name (it's the setting that
labels the mic channel and reserves the name so the LLM never assigns it to
another speaker). Move it into a dedicated "Your name" section at the top of
Settings, and show an orange nudge while it's still the placeholder "Me"/empty.
2026-06-08 13:25:33 -05:00
Grant Gilliam 9a18664429 Open saved session: visible progress + clear errors (no silent no-op)
The status line only rendered inside the last-in-memory-session block, so 'Open
saved session' processed invisibly — looked like nothing happened. Now: the
transcript status (with a spinner) is always shown, the processing(0,0) reconcile
phase reads 'Working… (this can take a few minutes)', and invalid picks surface an
alert (not a recorded session / already processing / unreadable transcript) instead
of doing nothing.
2026-06-08 12:16:52 -05:00
Grant Gilliam 6d0c8be8c9 Speaker reconciliation + open/re-process any saved session
Reconciliation (the marry-the-signals layer): after transcription, before the recap,
SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar
(cosine >= 0.82) — fixes a person split across chunks (the real 1-on-1 failure: one
remote came back as 'MH' + 'Unknown_0'); and (2) NAMES remaining non-self clusters
from transcript CONTENT via the gateway LLM (people addressed by name / self-intros),
conservative + confidence-gated, keeping the placeholder when unrevealed. The
mic-channel self is protected and never reassigned. Voice does the segmentation; the
fingerprint-merge fixes splits; the LLM adds the content signal visual/voiceprint lack.

- SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites
  speakers.json before recap. SessionController.finishBackend shares one model lookup
  for reconcile + recap. Gated by settings.reconcileSpeakers (default on).
- Open saved session: menu 'Open saved session…' → folder picker. Edits it if already
  transcribed, else reconstructs inputs from disk (visual_timeline vision segs +
  channel self-spans) and runs transcribe → reconcile → recap, then opens the editor.
  Lets you evaluate/correct ANY past call, not just the in-memory last one.

Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread
initials, lowercase/center names) — so reconciliation + the editor (which teaches
voiceprints on confirm) carry it; the editor remains the human arbiter. 59/59 XCTest.
2026-06-08 11:54:41 -05:00
Grant Gilliam c539b78a58 Configurable recap templates (categories per meeting type, in Settings)
Takeaways categories are no longer hardcoded — they're editable templates. A
template = the always-on TLDR + an ordered list of sections, each with a title, a
type (attributed items / bulleted list / paragraph), and an instruction (the prompt
text for that category). The analyzer assembles the LLM prompt FROM the template
and parses generically, so adding/removing/renaming a category needs zero code and
the output always renders.

- RecapTemplate / TemplateSection / SectionKind + TopicGranularity; built-in
  defaults (Internal Meeting, 1:1, Company/Sales Call), all editable.
- Generic extras: RecapExtras{tldr, primarySpeakers, sections:[RenderedSection]} +
  RecapItem{text,who,when,note} replaces the fixed MeetingExtras. Analyzer builds
  per-section sec_N fields + parses by kind; renderer + remap are generic.
- Topic granularity (coarse/auto/fine) answers 'should chunking be configurable' —
  it scales the target topic count; raw window sizes stay as tuned defaults.
- AppSettings persists templates + defaultTemplateId (seeded once). Settings gets a
  default-template picker + 'Manage…' → TemplatesView (CRUD, edit sections/
  instructions, set default, **Preview prompt** for full transparency).
- Recap editor gains a template picker; Regenerate uses the chosen template. Auto
  recap uses the default template.

54/54 XCTest (template prompt build, generic parse/remap/render updated).
2026-06-06 19:26:03 -05:00
Grant Gilliam 10ddf9992a Recap editor: Regenerate recap (re-run LLM on corrected transcript)
Adds a 'Regenerate recap' action so corrected speaker names flow into freshly
written summaries/extras (not just find-replaced). regenerate() commits the
corrections (rewrite speakers.json + reconcile voiceprints), re-runs RecapAnalyzer
on the corrected transcript via the gateway LLM, and rewrites recap.json +
transcript.md + recap.html. save() and regenerate() share commitCorrections();
both rebaseline the speaker set afterward so further edits map cleanly. Editor view
gains the button + progress spinner; RecapEditModel takes the gateway baseURL/skipTLS.

52/52 XCTest; builds clean.
2026-06-06 16:48:18 -05:00
Grant Gilliam 4c086251d9 Speaker corrections: rename / merge / reassign + voice learning
Native editor to fix speaker-ID errors after transcription (modeled on recap-relay's
correction UX): rename a speaker in the legend, merge two speakers, or reassign an
individual transcript line. Saving rewrites speakers.json, re-renders transcript.md +
recap.html, and updates the voiceprint memory — so a correction compounds: naming an
"Unknown" speaker teaches that voice for future calls.

- SpeakerEditing (pure, tested): replaceSpeaker (rename = merge-onto-existing),
  reassign, netNameMap (compose ops), and remap (apply a name map to a recap's
  structured fields + whole-word free text, so summaries/extras update without re-LLM).
- RecapEditModel (@MainActor): loads speakers.json (+ optional recap.json +
  cluster_fingerprints.json); on save writes the resolved speakers.json, re-renders,
  and reconciles voiceprints — merge keeps the survivor's print; rename/name-an-Unknown
  enrolls the cluster's fingerprint under the new name.
- TranscriptEditorView (SwiftUI) + EditorWindow (AppKit window for the LSUIElement app);
  menu gains "Edit speakers".
- Pipeline now persists cluster_fingerprints.json (every cluster incl. Unknown) and
  recap.json (RecapFile) so the editor can learn voices + re-render offline.
- RecapModels made Codable; TranscriptAssembler exposes allFingerprints;
  VoiceprintStore gains enroll() + merge().

52/52 XCTest (6 new, incl. a full rename→artifacts→voiceprint round-trip on disk).
2026-06-06 15:12:23 -05:00
Grant Gilliam 85bfdf2b56 Recap: readable transcript + topic sections + meeting extras (gateway LLM)
New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging
recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible
/v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start
from already-named speakers (label-merge), so recap-relay's speaker clustering +
name-inference are skipped entirely.

- GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via
  /api/endpoints, TLS-skip reuse, 503 retry, sequential.
- RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript →
  time-windowed analyze (single window for short calls, 18min/2min overlap for long)
  → stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/
  open_questions/key_quotes). Defensive JSON parsing of LLM output.
- RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html
  (topic sections w/ collapsible transcripts, extras panels, speaker color chips,
  full timestamped speaker-attributed transcript, print styles).
- SessionController.buildRecap: best-effort after speakers.json (gated by
  settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if
  the gateway has no LLM. Settings toggle added.

Validated END-TO-END on the real Meet session against the live gateway: dual-channel
transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin'
correctly attributed to the remote speaker. 46/46 XCTest (10 new).
2026-06-06 14:36:18 -05:00
Grant Gilliam 3785f6bdd0 Surface whether visual capture ran on the last session
Visual capture falls back to audio-only silently, so the user couldn't tell if
it attached on a real call. SessionInfo now carries visualSegmentCount (nil =
audio-only; a count = visual ran, with that many vision-detected speaker
segments), shown in the menu as '… · N visual segments' or '… · audio-only'.
Makes the pending live-call validation unambiguous.
2026-06-06 10:21:44 -05:00
Grant Gilliam 863136aeec Phases 2-6: detection, visual timeline, backend hand-off, voiceprints
Phase 2 (call detection): CallDetector using CoreAudio per-process mic
attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet,
ignoring our own recording; auto-record toggle. Built; pending live multi-app
confirmation by the user.

Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation,
TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema
1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR +
saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver
(window capture; frames released, never saved; minimized->visual_gap, idle != gap).
Synthetic-frame tested; adapter geometry pending real Signal fixtures + live
VisualObserver validation.

Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential,
TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline
slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated
END-TO-END against the live backend (chunk -> label-merge -> speakers.json).

Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named
fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status,
auto-send toggle (default off) + self-name setting.

All adversarial-review findings fixed. App + XCTest suite build; tests pass.
2026-06-06 00:15:49 -05:00
Grant Gilliam fd7e1a5907 Phase 1: dual-track audio capture → mixed-mono 16 kHz WAV + mic VAD
AudioRecorder captures system audio (ScreenCaptureKit) + mic (AVAudioEngine) on a
single serial ioQueue, one shared monotonic t0, time-driven writers (pad gaps /
trim overlaps) so tracks stay aligned, and an energy mic-VAD for 'self' spans.
AudioMixer sums the aligned tracks into mixed_mono_16k.wav. SessionController
drives a serialized start/stop state machine, writes the session folder +
self_vad.json, exposes live level meters, and finalizes on quit.

Hardening from review: ioQueue single-domain (no races), stop() never hangs
(mic-first teardown + bounded stopCapture), layout-agnostic mic deep-copy,
discard-only video output to keep SCStream alive, VAD lockstep on committed
frames, stable signing team in project.yml, single-instance enforcement.
2026-06-05 21:30:11 -05:00
Grant Gilliam b2ae3a62b9 Phase 0: menu-bar scaffold, permissions, backend health check
Native SwiftUI menu-bar app (LSUIElement, macOS 13+), generated from project.yml
via XcodeGen. Includes:
- PermissionsManager (Microphone / Screen Recording / Accessibility) + UI
- SparkControlHealth: GET /api/status over self-signed TLS (InsecureTrustDelegate)
- AppSettings persistence (host, TLS-skip, output folder, adapter toggles)
- Menu-bar panel + Settings, app sandbox & hardened runtime off (LAN tool)
2026-06-05 19:33:53 -05:00