Reconcile docs/ specs with the shipped app
Document the dual-channel label-merge path (mic_file/system_file/self_name/self_vad) and the recap phase (transcript.md + recap.html via the backend LLM) across docs/01-03; correct docs/02 §2.10 to match the UI actually shipped; mark docs/01 §7 open items as settled; remove the dead AUDIO_API.md references; note the manifest sha256 fields are not emitted; mark docs/04 as a complete/historical build log. Also drop the last stale "Phase 0" UI string in MenuBarView and retire the now-done doc-debt items in ROADMAP.
+23
-6
@@ -64,6 +64,9 @@ pattern, the macOS APIs, and the SparkControl integration (now fully specified).
└────────────────┘ └────────────────────┘
```
+(After `speakers.json`, a recap phase renders `transcript.md` + `recap.html` via
+the backend LLM — see §2.11.)
+
## 2. Modules
### 2.1 `CallDetector`
@@ -176,8 +179,10 @@ Write the session folder and, if the call is longer than ~3 min, produce a
```
### 2.7 `SparkControlClient`
-Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`** with
-`file`, `timeline`, `known_voiceprints`, `transcribe=true`.
+Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`**. Sends
+**dual-channel** (`mic_file` + `system_file` + `self_name` + `self_vad`) when the
+system track is healthy, else the **mono** `file`; always with `timeline`,
+`known_voiceprints`, `transcribe=true`.
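
The dual-channel versus mono choice described above could be sketched as a pure function that returns the form parts to send. This is only an illustration: `SessionAudio`, `FormPart`, and `labelMergeParts` are hypothetical names, and passing `self_vad`, `timeline`, and `known_voiceprints` as JSON text fields is an assumption, not something the spec states.

```swift
import Foundation

/// Hypothetical summary of the audio a finished session left on disk.
struct SessionAudio {
    var monoFile: URL
    var micFile: URL?
    var systemFile: URL?
    var systemTrackHealthy: Bool
}

/// One multipart/form-data part: either a file upload or a plain text field.
enum FormPart: Equatable {
    case file(name: String, url: URL)
    case text(name: String, value: String)
}

/// Picks the dual-channel fields when the system track is healthy and both
/// tracks exist; otherwise falls back to the mono `file`. The three common
/// fields are appended in both cases.
func labelMergeParts(audio: SessionAudio,
                     selfName: String,
                     selfVADJSON: String,          // assumption: sent as JSON text
                     timelineJSON: String,         // assumption: sent as JSON text
                     knownVoiceprintsJSON: String  // assumption: sent as JSON text
) -> [FormPart] {
    var parts: [FormPart] = []
    if audio.systemTrackHealthy, let mic = audio.micFile, let system = audio.systemFile {
        parts += [
            .file(name: "mic_file", url: mic),
            .file(name: "system_file", url: system),
            .text(name: "self_name", value: selfName),
            .text(name: "self_vad", value: selfVADJSON),
        ]
    } else {
        parts.append(.file(name: "file", url: audio.monoFile))
    }
    parts += [
        .text(name: "timeline", value: timelineJSON),
        .text(name: "known_voiceprints", value: knownVoiceprintsJSON),
        .text(name: "transcribe", value: "true"),
    ]
    return parts
}
```

Keeping the selection separate from the network call means the fallback rule can be unit-tested without touching `URLSession`.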
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
  Start9 cert) or trust the Root CA. **No auth on the LAN.**
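
As a sketch of the "sequential only" rule, a client might turn a `503 + Retry-After` reply into a wait before resending the same request. `retryDelay(for:fallback:)` is a hypothetical helper, and the 5-second fallback is an assumption; the spec does not say what the backend puts in the header.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // HTTPURLResponse lives here on non-Apple platforms
#endif

/// Returns how long to wait before retrying, or nil when the response is not
/// a 503 and no retry is needed.
/// Only the delay-in-seconds form of Retry-After is parsed; an HTTP-date
/// value, a negative number, or a missing header falls back to `fallback`.
func retryDelay(for response: HTTPURLResponse, fallback: TimeInterval = 5) -> TimeInterval? {
    guard response.statusCode == 503 else { return nil }
    if let raw = response.value(forHTTPHeaderField: "Retry-After"),
       let seconds = TimeInterval(raw.trimmingCharacters(in: .whitespaces)),
       seconds >= 0 {
        return seconds
    }
    return fallback
}
```

A caller that keeps a single request in flight would await each upload in turn and, when this returns a delay, sleep for that long before sending the same request again.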
@@ -210,10 +215,22 @@ Local persistence of named voiceprints — the compounding-identity layer.
- Editable/clearable from the menu-bar UI (rename, delete a person, reset).
### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
-Status (idle / detected / recording / uploading), manual start/stop, recent
-sessions (open folder, resend, delete), adapter toggles, **backend host + a
-health check** (`GET /api/status`), output folder, voiceprint manager, and a
-permissions checklist (Screen Recording, Microphone, Accessibility).
+Status (idle / detected / recording / finishing), manual start/stop with live
+mic/system level meters, and the **last session** — reveal in Finder, resend
+("Send to backend"), open recap, and edit speakers — plus "Open saved session…"
+to reprocess an existing folder. Also a **backend host + health check**
+(`GET /api/status`), adapter toggles, output folder, and a permissions checklist
+(Microphone, Screen Recording, Accessibility). (No multi-session list or
+voiceprint-manager UI yet — those are in `ROADMAP.md`.)
+
+### 2.11 Recap (`RecapAnalyzer`, `RecapRenderer`)
+After `speakers.json`, the recap phase turns the named transcript into the
+human-readable deliverables. `RecapAnalyzer` calls the backend LLM
+(`POST /v1/chat/completions`, Qwen3) for topics + meeting extras; `RecapRenderer`
+writes `transcript.md` (one line per diarized utterance) and `recap.html` (+ a
+`recap.json` sidecar). The in-app speaker editor (`SpeakerEditing` /
+`RecapEditModel`) rewrites names across all outputs after the fact. All
+language-model work stays on the backend; the app orchestrates and renders.
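
To make "one line per diarized utterance" concrete, here is a minimal sketch of how a transcript renderer might work, including the after-the-fact speaker rename. The `Utterance` type, the `renames` map, and the line format are all assumptions for illustration; the real `RecapRenderer` output format is not shown in this diff.

```swift
import Foundation

/// Hypothetical diarized utterance: start time in seconds, speaker label, text.
struct Utterance {
    var start: TimeInterval
    var speaker: String
    var text: String
}

/// Formats seconds as HH:MM:SS; negative or non-finite input is treated as 0.
func timestamp(_ seconds: TimeInterval) -> String {
    let total = seconds.isFinite ? max(0, Int(seconds.rounded(.down))) : 0
    func pad(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
    return "\(pad(total / 3600)):\(pad((total % 3600) / 60)):\(pad(total % 60))"
}

/// Renders one Markdown line per utterance, ordered by start time.
/// `renames` maps a diarization label to a display name, mimicking what a
/// speaker editor could apply when names change after the fact.
func renderTranscript(_ utterances: [Utterance], renames: [String: String] = [:]) -> String {
    utterances
        .sorted { $0.start < $1.start }
        .map { u in
            let name = renames[u.speaker] ?? u.speaker
            // Newlines are flattened so each utterance stays on a single line.
            let text = u.text.replacingOccurrences(of: "\n", with: " ")
            return "**\(name)** [\(timestamp(u.start))]: \(text)"
        }
        .joined(separator: "\n")
}
```

Because renaming is a lookup at render time, re-running the renderer with an updated `renames` map is enough to refresh the transcript without another backend call.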
## 3. macOS frameworks & permissions