Reconcile docs/ specs with the shipped app

Document the dual-channel label-merge path (mic_file/system_file/self_name/self_vad) and the recap phase (transcript.md + recap.html via the backend LLM) across docs/01-03; correct docs/02 §2.10 to the UI actually shipped; mark docs/01 §7 open items as settled; remove the dead AUDIO_API.md references; note the manifest sha256 fields are not emitted; mark docs/04 as a complete/historical build log. Also drop the last stale "Phase 0" UI string in MenuBarView and retire the now-done doc-debt items in ROADMAP.
Grant Gilliam
2026-06-16 22:09:04 -05:00
parent 85ea8fde45
commit dda4322de7
6 changed files with 106 additions and 56 deletions
+2 -3
@@ -27,10 +27,9 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
## Quality / debt (from the 2026-06-13 independent eval — full queue + evidence in `EVALUATION.md`) ## Quality / debt (from the 2026-06-13 independent eval — full queue + evidence in `EVALUATION.md`)
- Guard `RecapAnalyzer.mmss()` (`:137`) against NaN/∞ — a malformed backend `duration` aborts the app at recap render (eval P2). Cheap; fold into the next backend change. - Guard `RecapAnalyzer.mmss()` (`:137`) against NaN/∞ — a malformed backend `duration` aborts the app at recap render (eval P2). Cheap; fold into the next backend change.
- Rewrite the stale README: it claims "Phase 0 / no audio capture" for a shipped Phase-6 app; the `AppSettings.swift:7` comment and the `README.md:49` skip-TLS "on by default" line are also stale (eval P2).
- Add `SessionController` state-machine tests (`pendingAutoStop`, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3). - Add `SessionController` state-machine tests (`pendingAutoStop`, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3).
- Reconcile `docs/` specs with reality: the dual-channel API fields (`mic_file`/`system_file`/`self_name`/`self_vad`) and the recap/LLM phase are undocumented; `docs/01` §7 lists already-resolved open items; `docs/02` §2.10 claims absent MenuBarUI features (eval P3). - Optional: sweep the stale "Phase N" references in source comments (e.g. `SparkControlHealth.swift:7` "arrives in Phase 5", `Ten31TranscriptsApp.swift:6` "Phase 0 only") — historical, not false, but dated. `docs/04_BUILD_PLAN.md` is now marked COMPLETE/historical and is the map for these.
- Smaller P3s in `EVALUATION.md`: incomplete AGENTS Layout listings, unwritten `manifest.json` sha256 contract, unused `NSAppleEventsUsageDescription`, unauthenticated LAN backend (consider a bearer token). - Smaller P3s in `EVALUATION.md`: incomplete AGENTS Layout listings, unwritten `manifest.json` sha256 contract (now documented as not-emitted in `docs/03` §2), unused `NSAppleEventsUsageDescription`, unauthenticated LAN backend (consider a bearer token).
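The `mmss()` guard named in the list above could look roughly like the following. This is a minimal sketch, not the shipped `RecapAnalyzer` code: the signature, the `"--:--"` fallback, and the clamp value are assumptions made for illustration. The underlying Swift behaviour is real, though: `Int(_:)` traps at runtime when handed a NaN, infinite, or out-of-range `Double`.

```swift
/// Pads a non-negative integer to at least two digits ("7" -> "07").
func pad2(_ n: Int) -> String {
    return n < 10 ? "0\(n)" : "\(n)"
}

/// Formats a duration in seconds as "mm:ss".
/// Hypothetical signature; the real RecapAnalyzer.mmss() may differ.
func mmss(_ seconds: Double) -> String {
    // Reject NaN, ±infinity, and negatives before any Int conversion,
    // because Int(Double) traps on non-finite input.
    guard seconds.isFinite, seconds >= 0 else { return "--:--" }
    // Clamp very large values so the conversion can never overflow Int.
    let total = Int(min(seconds, 359_999))
    return "\(pad2(total / 60)):\(pad2(total % 60))"
}
```

Returning a placeholder keeps the recap render alive when the backend sends a malformed `duration`; whether a placeholder or a zero is the better display is a product decision.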
## Deferred decisions ## Deferred decisions
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path). - Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
+1 -1
@@ -173,7 +173,7 @@ struct MenuBarView: View {
private var header: some View { private var header: some View {
VStack(alignment: .leading, spacing: 2) { VStack(alignment: .leading, spacing: 2) {
Text("Ten31 Transcripts").font(.headline) Text("Ten31 Transcripts").font(.headline)
Text("Phase 0 · setup & status") Text("Setup & status")
.font(.caption) .font(.caption)
.foregroundStyle(.secondary) .foregroundStyle(.secondary)
} }
+46 -35
@@ -7,9 +7,9 @@
> returns named transcript segments. A growing **voiceprint library** recovers > returns named transcript segments. A growing **voiceprint library** recovers
> speakers even when the visual cue is missing. > speakers even when the visual cue is missing.
Master context document. Read this first, then `02_ARCHITECTURE.md`, Master context document. Read this first, then `02_ARCHITECTURE.md` and
`03_DATA_CONTRACTS.md`, `04_BUILD_PLAN.md`. The SparkControl API is now fully `03_DATA_CONTRACTS.md`. The SparkControl API is fully specified in
specified — see `03_DATA_CONTRACTS.md` (and the source `AUDIO_API.md`). `03_DATA_CONTRACTS.md`.
--- ---
@@ -20,25 +20,30 @@ A lightweight, always-running **menu-bar app on macOS** that:
1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams, 1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams,
or Signal. or Signal.
2. **Records two local audio tracks** — system audio (everyone else) and the 2. **Records two local audio tracks** — system audio (everyone else) and the
user's microphone (the user) — and **mixes them to one 16 kHz mono WAV** for user's microphone (the user). It sends the backend **dual-channel**
the backend. (`mic_file` + `system_file`) when the system track is healthy, falling back to
a **mixed-mono 16 kHz WAV** otherwise.
3. **Watches the call window** at ~24 fps and, per app, reads participant 3. **Watches the call window** at ~24 fps and, per app, reads participant
**names** and the **active-speaker cue**, producing a **names** and the **active-speaker cue**, producing a
`(start, end, name, confidence)` **visual timeline** — its best guess at who `(start, end, name, confidence)` **visual timeline** — its best guess at who
was talking when. was talking when.
4. **Discards every video frame after extraction.** No video is ever written to 4. **Discards every video frame after extraction.** No video is ever written to
disk. Only audio + the derived timeline persist locally. disk. Only audio + the derived timeline persist locally.
5. On call end, **POSTs the mixed audio + the visual timeline (+ the known 5. On call end, **POSTs the audio + the visual timeline (+ the known voiceprint
voiceprint library) to `POST /api/audio/label-merge`** on SparkControl, which library) to `POST /api/audio/label-merge`** on SparkControl, which returns
returns **named, speaker-attributed transcript segments** and a **voiceprint **named, speaker-attributed transcript segments** and a **voiceprint per
per speaker**. speaker**.
6. **Persists the returned voiceprints** keyed by name, so the next call can pass 6. **Persists the returned voiceprints** keyed by name, so the next call can pass
them as `known_voiceprints` and recover a speaker by voice when the visual cue them as `known_voiceprints` and recover a speaker by voice when the visual cue
is absent (camera off, a bad OCR frame). is absent (camera off, a bad OCR frame).
7. **Renders the result locally** — a readable `transcript.md` plus an HTML
`recap.html` (topics + meeting extras, generated via the backend's LLM
endpoint), with an in-app editor for fixing speaker names after the fact.
The app's job ends at receiving and storing the named segments from SparkControl. The app's job ends at producing the named transcript and recap from SparkControl's
**All transcription, diarization, and the name-merge happen on the backend.** Do segments. **All transcription, diarization, name-merge, and LLM analysis happen on
not build transcription, diarization, or the merge vote in this app. the backend.** Do not build transcription, diarization, or the merge vote in this
app.
## 2. Why the visual timeline still matters (the core idea) ## 2. Why the visual timeline still matters (the core idea)
@@ -68,19 +73,25 @@ few calls the system can name regulars even with cameras off.
**In scope (this app):** **In scope (this app):**
- Call detection for Meet / Zoom / Teams / Signal. - Call detection for Meet / Zoom / Teams / Signal.
- Dual-track local audio capture + mix-to-mono for the backend. - Dual-track local audio capture; **dual-channel send** (mic + system) with a
mix-to-mono fallback for the backend.
- Low-fps window capture → OCR (names) + active-speaker cue detection. - Low-fps window capture → OCR (names) + active-speaker cue detection.
- Per-app "adapter" modules encapsulating each app's UI quirks. - Per-app "adapter" modules encapsulating each app's UI quirks.
- Building the visual timeline; **mic-VAD self-labeling** (the mic track is the - Building the visual timeline; **mic-VAD self-labeling** (the mic track is the
user, so hot-mic spans pre-seed the user's name into the timeline). user, so hot-mic spans pre-seed the user's name into the timeline).
- Chunking long calls (~23 min) and calling `label-merge` **sequentially**. - Chunking long calls (~23 min) and calling `label-merge` **sequentially**.
- A local **voiceprint store** (persist + replay named voiceprints). - A local **voiceprint store** (persist + replay named voiceprints).
- Storing the backend's named transcript segments locally. - Storing the backend's named segments and **rendering** them — `transcript.md`
- A minimal menu-bar UI: status, manual start/stop, recent sessions, adapter plus an HTML `recap.html` (recap analysis via the backend LLM) — with an in-app
toggles, backend host/health, output folder. speaker-name editor.
- A minimal menu-bar UI: status, manual start/stop, the last session (reveal,
resend, open recap, edit speakers), adapter toggles, backend host/health,
output folder.
**Out of scope (owned by the backend):** **Out of scope (owned by the backend):**
- Transcription, diarization, the name-merge vote, summarization/analysis. - Transcription, diarization, the name-merge vote, and LLM summarization — these
run on the backend; the app only orchestrates the recap call and renders the
result.
**Explicitly not doing:** saving video; cloud anything. Everything stays on the **Explicitly not doing:** saving video; cloud anything. Everything stays on the
operator's LAN. operator's LAN.
@@ -91,14 +102,14 @@ operator's LAN.
|---|---|---| |---|---|---|
| Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. | | Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. |
| Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. | | Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. |
| Backend audio format | **Mixed-mono 16 kHz WAV** | Diarizer separates speakers from one mixed stream; 16 kHz is ideal. | | Backend audio format | **Dual-channel (mic + system)** when the system track is healthy, else **mixed-mono 16 kHz WAV** | Separate tracks let the backend attribute the user's mic channel directly; the diarizer can still split the mono fallback. |
| Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. | | Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. |
| Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. | | Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. |
| Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. | | Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. |
| Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. | | Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. |
| Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). | | Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). |
| Long calls | Chunk ~23 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. | | Long calls | Chunk ~23 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. |
| Transport / TLS | `multipart/form-data`, file field `file`; self-signed Start9 cert (skip verify or trust the Root CA); **no auth on LAN** | Matches every other SparkControl endpoint. | | Transport / TLS | `multipart/form-data`, file field `file` (mono) or `mic_file` + `system_file` (dual-channel); self-signed Start9 cert (trust the Root CA — supported default; host-scoped skip-verify is an off-by-default escape hatch); **no auth on LAN** | Matches every other SparkControl endpoint. |
| Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. | | Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. |
### On forking Hyprnote ### On forking Hyprnote
@@ -128,25 +139,25 @@ SparkControl, on the operator's Start9 LAN, fronting two DGX Sparks:
- **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize + - **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize +
name from the visual timeline (+ voiceprint fallback), optionally transcribe, name from the visual timeline (+ voiceprint fallback), optionally transcribe,
in one synchronous call. in one synchronous call.
- **LLM (recap):** Qwen3 via OpenAI-compatible `POST /v1/chat/completions`
generates the readable recap (topics + meeting extras) from the transcript.
- Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`. - Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`.
Full request/response shapes, curl examples, limits, and error formats are in Full request/response shapes, curl examples, limits, and error formats are in
`03_DATA_CONTRACTS.md`. `03_DATA_CONTRACTS.md`.
## 7. Remaining open items (small) ## 7. Settled decisions (were open at brief time)
1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred 1. **Base URL.** A private LAN host — a `.local` mDNS name (preferred over a raw
over a raw IP, since it survives IP changes) — configured in Settings or via the IP, since it survives IP changes) — configured in Settings or via the
`SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as `SPARK_BACKEND_URL` env var, never committed. A neutral placeholder ships as the
the default; keep it editable in settings. Service-discovery at default and stays editable in Settings. Service-discovery at `GET /api/endpoints`.
`GET /api/endpoints`. 2. **Send trigger.** Auto-send on call end is a setting (`autoSendOnStop`), **off
2. **Send trigger** — assume auto-POST on call end; expose a "hold for review" by default** — the user reviews the session and sends manually unless they opt in.
toggle if the user wants to eyeball the timeline first. 3. **Retention.** The session folder is kept after a successful hand-off (output
3. **Retention** — keep the session folder after a successful hand-off, or prune location is configurable); nothing is pruned automatically.
audio and keep only `speakers.json` + voiceprints? Default: keep everything, 4. **Voiceprint update policy.** Store/refresh the latest high-confidence vector
user-configurable. per name (`02_ARCHITECTURE.md §2.9`); a per-name running average is a possible
4. **Voiceprint update policy** — overwrite vs running-average a person's stored later refinement.
voiceprint across calls (see `02_ARCHITECTURE.md §2.9`). Start simple 5. **Signing.** A stable identity via `Config/Signing.xcconfig` (gitignored) keeps
(store/refresh latest high-confidence), refine later. macOS from re-prompting for permissions on each rebuild.
5. **Signing** — stable identity so macOS doesn't re-prompt for permissions on
each rebuild.
+23 -6
@@ -64,6 +64,9 @@ pattern, the macOS APIs, and the SparkControl integration (now fully specified).
└────────────────┘ └────────────────────┘ └────────────────┘ └────────────────────┘
``` ```
(After `speakers.json`, a recap phase renders `transcript.md` + `recap.html` via
the backend LLM — see §2.11.)
## 2. Modules ## 2. Modules
### 2.1 `CallDetector` ### 2.1 `CallDetector`
@@ -176,8 +179,10 @@ Write the session folder and, if the call is longer than ~3 min, produce a
``` ```
### 2.7 `SparkControlClient` ### 2.7 `SparkControlClient`
Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`** with Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`**. Sends
`file`, `timeline`, `known_voiceprints`, `transcribe=true`. **dual-channel** (`mic_file` + `system_file` + `self_name` + `self_vad`) when the
system track is healthy, else the **mono** `file`; always with `timeline`,
`known_voiceprints`, `transcribe=true`.
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`). - **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the - **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
Start9 cert) or trust the Root CA. **No auth on the LAN.** Start9 cert) or trust the Root CA. **No auth on the LAN.**
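The dual-channel versus mono choice described in §2.7 can be sketched as a pure function that picks the multipart field names. This is an illustration only: `SessionAudio`, `audioFileFields`, and `textFields` are hypothetical names, and "healthy" is taken as an input flag because the docs in this diff do not define how it is measured. The field names and WAV file names come from the docs themselves.

```swift
/// Hypothetical description of the audio a finished session has on disk.
struct SessionAudio {
    let micPath: String
    let systemPath: String
    let mixedMonoPath: String
    let systemTrackHealthy: Bool
}

/// Maps multipart file-field names to local paths for one label-merge request.
func audioFileFields(for audio: SessionAudio) -> [String: String] {
    if audio.systemTrackHealthy {
        // Dual-channel path: send both tracks separately.
        return ["mic_file": audio.micPath, "system_file": audio.systemPath]
    }
    // Mono fallback: send the single mixed 16 kHz WAV.
    return ["file": audio.mixedMonoPath]
}

/// Non-file fields; `self_name` only accompanies the dual-channel path.
func textFields(selfName: String, dualChannel: Bool) -> [String: String] {
    var fields = ["transcribe": "true"]
    if dualChannel { fields["self_name"] = selfName }
    return fields
}
```

Keeping the decision in a pure function makes it easy to unit-test without touching `URLSession`; `timeline`, `known_voiceprints`, and `self_vad` would be added by the caller.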
@@ -210,10 +215,22 @@ Local persistence of named voiceprints — the compounding-identity layer.
- Editable/clearable from the menu-bar UI (rename, delete a person, reset). - Editable/clearable from the menu-bar UI (rename, delete a person, reset).
### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`) ### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
Status (idle / detected / recording / uploading), manual start/stop, recent Status (idle / detected / recording / finishing), manual start/stop with live
sessions (open folder, resend, delete), adapter toggles, **backend host + a mic/system level meters, and the **last session** — reveal in Finder, resend
health check** (`GET /api/status`), output folder, voiceprint manager, and a ("Send to backend"), open recap, and edit speakers — plus "Open saved session…"
permissions checklist (Screen Recording, Microphone, Accessibility). to reprocess an existing folder. Also a **backend host + health check**
(`GET /api/status`), adapter toggles, output folder, and a permissions checklist
(Microphone, Screen Recording, Accessibility). (No multi-session list or
voiceprint-manager UI yet — those are in `ROADMAP.md`.)
### 2.11 Recap (`RecapAnalyzer`, `RecapRenderer`)
After `speakers.json`, the recap phase turns the named transcript into the
human-readable deliverables. `RecapAnalyzer` calls the backend LLM
(`POST /v1/chat/completions`, Qwen3) for topics + meeting extras; `RecapRenderer`
writes `transcript.md` (one line per diarized utterance) and `recap.html` (+ a
`recap.json` sidecar). The in-app speaker editor (`SpeakerEditing` /
`RecapEditModel`) rewrites names across all outputs after the fact. All
language-model work stays on the backend; the app orchestrates and renders.
## 3. macOS frameworks & permissions ## 3. macOS frameworks & permissions
+28 -11
@@ -1,7 +1,7 @@
# Data Contracts — Ten31 Transcripts # Data Contracts — Ten31 Transcripts
Companion to docs 01/02. Defines the files the app produces/stores and the **real Companion to docs 01/02. Defines the files the app produces/stores and the **real
SparkControl contract** (source of truth: `AUDIO_API.md`). The `label-merge` SparkControl contract** (verified against the live backend). The `label-merge`
endpoint is the app's primary integration point. endpoint is the app's primary integration point.
--- ---
@@ -69,8 +69,10 @@ When chunking, **slice to the chunk window and rebase to chunk-local seconds**
"app_version": "0.1.0" "app_version": "0.1.0"
} }
``` ```
(`mixed_mono_16k.wav` is the one the backend gets; the separate tracks are kept (On the dual-channel path the backend gets `mic.wav` + `system.wav` directly; on
locally — the mic track is the user's known identity / VAD source.) the mono fallback it gets `mixed_mono_16k.wav`. The mic track is the user's known
identity / VAD source. **Note:** the per-file `sha256` fields above are part of the
intended contract but are **not currently emitted** by the pipeline.)
--- ---
@@ -83,15 +85,17 @@ locally — the mic track is the user's known identity / VAD source.)
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
the default. the default.
- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession` - **TLS:** Start9 self-signed Root CA. Supported path: install the Start9 Root CA
delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install into the System keychain (default trust then succeeds). Skip-verification is an
the Start9 Root CA into the trust store. **off-by-default, host-scoped** escape hatch (`InsecureTrustDelegate`, scoped to
the configured backend host), not the default.
- **Auth:** **none on the LAN.** No token/key today. - **Auth:** **none on the LAN.** No token/key today.
- **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription), - **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription),
~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio ~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio
trips a GPU FFT race → `503 + Retry-After`. trips a GPU FFT race → `503 + Retry-After`.
- **Transport:** `multipart/form-data`, audio file field name **`file`** (bytes, - **Transport:** `multipart/form-data`. Audio file field is **`file`** on the mono
not base64/path). path, or **`mic_file`** + **`system_file`** on the dual-channel path (bytes, not
base64/path).
- **All endpoints are synchronous** (no job IDs / polling). - **All endpoints are synchronous** (no job IDs / polling).
- **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 + - **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 +
Retry-After` transient (retry after the interval). Retry-After` transient (retry after the interval).
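The `503 + Retry-After` rule in the list above could be handled by a small helper that decides how long to wait before resending. A sketch under stated assumptions: `retryDelay` is a hypothetical name, the 5-second fallback is invented for illustration, and only the delta-seconds form of `Retry-After` is parsed (the header may also carry an HTTP date, which this sketch treats as unparseable).

```swift
import Foundation

/// Returns how long to wait before retrying, or nil when the status is not a
/// transient 503. Hypothetical helper; not taken from the app's source.
func retryDelay(statusCode: Int,
                retryAfter: String?,
                fallback: TimeInterval = 5) -> TimeInterval? {
    guard statusCode == 503 else { return nil }
    // Accept only a finite, non-negative number of seconds.
    if let raw = retryAfter?.trimmingCharacters(in: .whitespaces),
       let seconds = TimeInterval(raw),
       seconds.isFinite, seconds >= 0 {
        return seconds
    }
    // Header missing or unparseable: fall back to a fixed wait (assumed value).
    return fallback
}
```

Because audio requests are sent one at a time, the caller would simply sleep for the returned interval and resend the same chunk before moving on to the next.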
@@ -105,11 +109,16 @@ Diarize + name clusters from the visual timeline (majority temporal overlap),
with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** — with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** —
the app owns the timeline and the voiceprint library. the app owns the timeline and the voiceprint library.
**Multipart fields:** **Multipart fields** — two audio shapes: **mono** (`file`) or **dual-channel**
(`mic_file` + `system_file`, preferred when the system track is healthy):
| field | required | notes | | field | required | notes |
|---|---|---| |---|---|---|
| `file` | **yes** | mixed-mono WAV (the chunk, when chunking) | | `file` | mono path | mixed-mono WAV (the chunk, when chunking) |
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1) | | `mic_file` | dual path | the user's mic track (chunk) — attributed to `self_name` |
| `system_file` | dual path | the remote/system track (chunk) |
| `self_name` | dual path | the user's name; the mic channel is attributed to them |
| `self_vad` | no | chunk-local windows where the mic is genuinely the user (active + louder than system) |
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1); on the dual path it names only the remote speakers |
| `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` | | `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` |
| `transcribe` | no | `"true"` to also return per-segment text (default false) | | `transcribe` | no | `"true"` to also return per-segment text (default false) |
| `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) | | `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) |
@@ -213,3 +222,11 @@ Loaded → `known_voiceprints` on every `label-merge` call. Updated from respons
`fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never `fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never
stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with
`overlap_confidence ≥ ~0.8`; consider per-name running mean later. `overlap_confidence ≥ ~0.8`; consider per-name running mean later.
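The starting update policy (store the latest vector at or above the confidence threshold, never store `Unknown_N`) might be sketched as below. `SpeakerResult` and `updatedStore` are hypothetical, the real response shape is defined elsewhere in this document, and the sketch leaves out the `visual`/`voiceprint` source distinction for brevity.

```swift
/// Hypothetical, simplified view of one speaker in a label-merge response.
struct SpeakerResult {
    let name: String
    let overlapConfidence: Double
    let fingerprint: [Float]
}

/// Returns a copy of the store with qualifying voiceprints refreshed.
func updatedStore(_ store: [String: [Float]],
                  with results: [SpeakerResult],
                  minConfidence: Double = 0.8) -> [String: [Float]] {
    var next = store
    for result in results {
        // Skip unnamed clusters, low-confidence matches, and malformed vectors
        // (the contract describes 192 floats per voiceprint).
        guard !result.name.hasPrefix("Unknown_"),
              result.overlapConfidence >= minConfidence,
              result.fingerprint.count == 192 else { continue }
        // "Store latest": the newest qualifying vector replaces the old one.
        next[result.name] = result.fingerprint
    }
    return next
}
```

A per-name running mean, mentioned as a later refinement, would replace the plain assignment with an element-wise average of the stored and incoming vectors.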
## 8. Recap outputs (`transcript.md`, `recap.{html,json}`)
After `speakers.json` is assembled, the recap phase renders the human-readable
deliverables: a `transcript.md` (one line per diarized utterance) and an HTML
`recap.html`, backed by a structured `recap.json`. The recap's topic/summary
content is generated by the **backend LLM** (`POST /v1/chat/completions`, Qwen3);
the app owns the rendering and the in-app **speaker-name editor**, which can rewrite
names across `speakers.json`, the transcript, and the recap after the fact.
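The after-the-fact rename described above amounts to rewriting the speaker label on every segment and re-rendering the outputs. A minimal sketch, assuming a hypothetical `Segment` type and an invented Markdown line format; the actual `transcript.md` layout and the `SpeakerEditing` API are not shown in this diff.

```swift
/// Hypothetical minimal transcript segment.
struct Segment: Equatable {
    var speaker: String
    var text: String
}

/// Returns the segments with every occurrence of `old` relabelled as `new`.
func renamingSpeaker(_ old: String, to new: String, in segments: [Segment]) -> [Segment] {
    return segments.map { segment in
        var copy = segment
        if copy.speaker == old { copy.speaker = new }
        return copy
    }
}

/// Renders one Markdown line per utterance (the line format is an assumption).
func transcriptLines(_ segments: [Segment]) -> [String] {
    return segments.map { "**\($0.speaker):** \($0.text)" }
}
```

Re-rendering from the renamed segments, rather than doing a text search and replace in the output files, avoids touching places where the old name appears inside someone's spoken words.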
+6
@@ -1,5 +1,11 @@
# Build Plan — Ten31 Transcripts # Build Plan — Ten31 Transcripts
> **Status: COMPLETE (historical).** Phases 0–6 shipped and the app is in daily
> use; a recap phase (transcript + HTML recap via the backend LLM) was added after
> this plan was written. Kept as the original build log and as the map for the
> "Phase N" references in the code comments. Forward-looking work lives in
> `ROADMAP.md`; current status in `AGENTS.md`.
Companion to docs 01–03. Phased plan for the Claude Code session, each phase with Companion to docs 01–03. Phased plan for the Claude Code session, each phase with
a demoable milestone. Build in order; the risky/novel work (visual adapters) is a demoable milestone. Build in order; the risky/novel work (visual adapters) is
isolated for independent tuning. The SparkControl contract is now known isolated for independent tuning. The SparkControl contract is now known