Reconcile docs/ specs with the shipped app

Document the dual-channel label-merge path (mic_file/system_file/self_name/self_vad) and the recap phase (transcript.md + recap.html via the backend LLM) across docs/01-03; correct docs/02 §2.10 to the UI actually shipped; mark docs/01 §7 open items as settled; remove the dead AUDIO_API.md references; note the manifest sha256 fields are not emitted; mark docs/04 as a complete/historical build log. Also drop the last stale "Phase 0" UI string in MenuBarView and retire the now-done doc-debt items in ROADMAP.
Grant Gilliam
2026-06-16 22:09:04 -05:00
parent 85ea8fde45
commit dda4322de7
6 changed files with 106 additions and 56 deletions
+23 -6
@@ -64,6 +64,9 @@ pattern, the macOS APIs, and the SparkControl integration (now fully specified).
└────────────────┘ └────────────────────┘
```
+(After `speakers.json`, a recap phase renders `transcript.md` + `recap.html` via
+the backend LLM — see §2.11.)
## 2. Modules
### 2.1 `CallDetector`
@@ -176,8 +179,10 @@ Write the session folder and, if the call is longer than ~3 min, produce a
```
### 2.7 `SparkControlClient`
-Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`** with
-`file`, `timeline`, `known_voiceprints`, `transcribe=true`.
+Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`**. Sends
+**dual-channel** (`mic_file` + `system_file` + `self_name` + `self_vad`) when the
+system track is healthy, else the **mono** `file`; always with `timeline`,
+`known_voiceprints`, `transcribe=true`.
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
Start9 cert) or trust the Root CA. **No auth on the LAN.**
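
The dual-channel versus mono choice described for `SparkControlClient` could be sketched roughly as below. This is a minimal illustration, not the shipped client: `SessionAudio`, `FormPart`, and `labelMergeParts` are hypothetical names, and the assumption that every field travels as a form part with a string value is not confirmed by the spec. Only the field names (`mic_file`, `system_file`, `self_name`, `self_vad`, `file`, `timeline`, `known_voiceprints`, `transcribe`) come from the text above.

```swift
import Foundation

/// Hypothetical summary of the audio a finished session has on disk.
struct SessionAudio {
    let monoFile: URL
    let micFile: URL?
    let systemFile: URL?
    let systemTrackHealthy: Bool
}

/// One field of the label-merge request: a file or a text value (assumed shape).
enum FormPart: Equatable {
    case file(name: String, url: URL)
    case text(name: String, value: String)

    var name: String {
        switch self {
        case .file(let name, _), .text(let name, _): return name
        }
    }
}

/// Picks dual-channel fields when the system track is healthy and both
/// tracks exist; otherwise falls back to the mono `file` field.
func labelMergeParts(audio: SessionAudio,
                     selfName: String,
                     selfVAD: String,
                     timeline: String,
                     knownVoiceprints: String) -> [FormPart] {
    var parts: [FormPart] = []
    if audio.systemTrackHealthy,
       let mic = audio.micFile,
       let system = audio.systemFile {
        parts.append(.file(name: "mic_file", url: mic))
        parts.append(.file(name: "system_file", url: system))
        parts.append(.text(name: "self_name", value: selfName))
        parts.append(.text(name: "self_vad", value: selfVAD))
    } else {
        parts.append(.file(name: "file", url: audio.monoFile))
    }
    // These three fields are sent on both paths.
    parts.append(.text(name: "timeline", value: timeline))
    parts.append(.text(name: "known_voiceprints", value: knownVoiceprints))
    parts.append(.text(name: "transcribe", value: "true"))
    return parts
}
```

Keeping the selection in a pure function like this would make the fallback rule easy to unit-test without touching the network; how the parts are then encoded and sent is left out on purpose.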
@@ -210,10 +215,22 @@ Local persistence of named voiceprints — the compounding-identity layer.
- Editable/clearable from the menu-bar UI (rename, delete a person, reset).
### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
-Status (idle / detected / recording / uploading), manual start/stop, recent
-sessions (open folder, resend, delete), adapter toggles, **backend host + a
-health check** (`GET /api/status`), output folder, voiceprint manager, and a
-permissions checklist (Screen Recording, Microphone, Accessibility).
+Status (idle / detected / recording / finishing), manual start/stop with live
+mic/system level meters, and the **last session** — reveal in Finder, resend
+("Send to backend"), open recap, and edit speakers — plus "Open saved session…"
+to reprocess an existing folder. Also a **backend host + health check**
+(`GET /api/status`), adapter toggles, output folder, and a permissions checklist
+(Microphone, Screen Recording, Accessibility). (No multi-session list or
+voiceprint-manager UI yet — those are in `ROADMAP.md`.)
+### 2.11 Recap (`RecapAnalyzer`, `RecapRenderer`)
+After `speakers.json`, the recap phase turns the named transcript into the
+human-readable deliverables. `RecapAnalyzer` calls the backend LLM
+(`POST /v1/chat/completions`, Qwen3) for topics + meeting extras; `RecapRenderer`
+writes `transcript.md` (one line per diarized utterance) and `recap.html` (+ a
+`recap.json` sidecar). The in-app speaker editor (`SpeakerEditing` /
+`RecapEditModel`) rewrites names across all outputs after the fact. All
+language-model work stays on the backend; the app orchestrates and renders.
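
As an illustration of the `RecapAnalyzer` call, the request to `POST /v1/chat/completions` might be assembled along these lines. This is a sketch under assumptions: the body shape (`model` plus a `messages` array of `role`/`content` pairs) follows the common OpenAI-style chat schema, which the spec does not spell out, and `ChatMessage`, `ChatRequest`, `makeRecapRequest`, the prompt wording, and the base URL in the usage note are all hypothetical.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Assumed OpenAI-style message and request shapes (not taken from the spec).
struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

struct ChatRequest: Codable, Equatable {
    let model: String
    let messages: [ChatMessage]
}

/// Builds, but does not send, the recap request for a named transcript.
func makeRecapRequest(baseURL: URL,
                      model: String,
                      transcript: String) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/chat/completions"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    // Hypothetical prompt: the real one would ask for topics + meeting extras.
    let body = ChatRequest(model: model, messages: [
        ChatMessage(role: "system",
                    content: "Summarize the meeting transcript into topics and action items."),
        ChatMessage(role: "user", content: transcript)
    ])
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

A caller would pass something like `makeRecapRequest(baseURL: URL(string: "https://backend.example")!, model: "qwen3", transcript: text)` (placeholder host and model id) and hand the result to the same sequential, self-signed-TLS session used for audio uploads.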
## 3. macOS frameworks & permissions