Reconcile docs/ specs with the shipped app
Document the dual-channel label-merge path (mic_file/system_file/self_name/self_vad) and the recap phase (transcript.md + recap.html via the backend LLM) across docs/01-03; correct docs/02 §2.10 to the UI actually shipped; mark docs/01 §7 open items as settled; remove the dead AUDIO_API.md references; note the manifest sha256 fields are not emitted; mark docs/04 as a complete/historical build log. Also drop the last stale "Phase 0" UI string in MenuBarView and retire the now-done doc-debt items in ROADMAP.
+28
-11
@@ -1,7 +1,7 @@
 # Data Contracts — Ten31 Transcripts
 
 Companion to docs 01/02. Defines the files the app produces/stores and the **real
-SparkControl contract** (source of truth: `AUDIO_API.md`). The `label-merge`
+SparkControl contract** (verified against the live backend). The `label-merge`
 endpoint is the app's primary integration point.
 
 ---
@@ -69,8 +69,10 @@ When chunking, **slice to the chunk window and rebase to chunk-local seconds**
 "app_version": "0.1.0"
 }
 ```
-(`mixed_mono_16k.wav` is the one the backend gets; the separate tracks are kept
-locally — the mic track is the user's known identity / VAD source.)
+(On the dual-channel path the backend gets `mic.wav` + `system.wav` directly; on
+the mono fallback it gets `mixed_mono_16k.wav`. The mic track is the user's known
+identity / VAD source. **Note:** the per-file `sha256` fields above are part of the
+intended contract but are **not currently emitted** by the pipeline.)
 
 ---
 
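Review note: the hunk above records that the per-file `sha256` fields are specified but not emitted. If they are filled in later, a streaming digest avoids loading a whole WAV into memory. This is a Python sketch for illustration only (the app itself is not Python); `file_sha256` is a hypothetical helper name, and how the digest is keyed into the manifest is not shown in this diff.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

The same function would apply unchanged to `mic.wav`, `system.wav`, and `mixed_mono_16k.wav`.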
@@ -83,15 +85,17 @@ locally — the mic track is the user's known identity / VAD source.)
 endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
 change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
 the default.
-- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
-delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
-the Start9 Root CA into the trust store.
+- **TLS:** Start9 self-signed Root CA. Supported path: install the Start9 Root CA
+into the System keychain (default trust then succeeds). Skip-verification is an
+**off-by-default, host-scoped** escape hatch (`InsecureTrustDelegate`, scoped to
+the configured backend host), not the default.
 - **Auth:** **none on the LAN.** No token/key today.
 - **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription),
 ~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio
 trips a GPU FFT race → `503 + Retry-After`.
-- **Transport:** `multipart/form-data`, audio file field name **`file`** (bytes,
-not base64/path).
+- **Transport:** `multipart/form-data`. Audio file field is **`file`** on the mono
+path, or **`mic_file`** + **`system_file`** on the dual-channel path (bytes, not
+base64/path).
 - **All endpoints are synchronous** (no job IDs / polling).
 - **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 +
 Retry-After` transient (retry after the interval).
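Review note: the Limits and Errors bullets together imply a client loop that sends one audio request at a time and waits out a `503` before retrying. The Python sketch below illustrates that loop under stated assumptions: `send` is a hypothetical transport callback returning `(status, headers, body)`, the header key is looked up as exactly `Retry-After`, and the retry cap and 5-second fallback delay are arbitrary choices, not values from the docs.

```python
import time
from typing import Callable, Iterable, List, Tuple

# (status_code, headers, body) as returned by the hypothetical transport callback.
Response = Tuple[int, dict, bytes]


def retry_delay(headers: dict, default: float = 5.0) -> float:
    """Read Retry-After as delta-seconds; use the default if absent or not numeric."""
    try:
        return max(0.0, float(headers.get("Retry-After", default)))
    except (TypeError, ValueError):
        # Retry-After may also be an HTTP date; this sketch does not parse that form.
        return default


def send_sequentially(
    jobs: Iterable[bytes],
    send: Callable[[bytes], Response],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 3,
) -> List[Response]:
    """Send jobs strictly one after another, retrying a job only on 503."""
    results: List[Response] = []
    for job in jobs:
        for attempt in range(1, max_attempts + 1):
            status, headers, body = send(job)
            if status != 503 or attempt == max_attempts:
                break
            sleep(retry_delay(headers))
        results.append((status, headers, body))
    return results
```

Because the next job is not started until the current one has returned a final response, no two audio requests are ever in flight together, which is the condition the Limits bullet asks for. `400` and `413` are returned to the caller without retrying.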
@@ -105,11 +109,16 @@ Diarize + name clusters from the visual timeline (majority temporal overlap),
 with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** —
 the app owns the timeline and the voiceprint library.
 
-**Multipart fields:**
+**Multipart fields** — two audio shapes: **mono** (`file`) or **dual-channel**
+(`mic_file` + `system_file`, preferred when the system track is healthy):
 | field | required | notes |
 |---|---|---|
-| `file` | **yes** | mixed-mono WAV (the chunk, when chunking) |
-| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1) |
+| `file` | mono path | mixed-mono WAV (the chunk, when chunking) |
+| `mic_file` | dual path | the user's mic track (chunk) — attributed to `self_name` |
+| `system_file` | dual path | the remote/system track (chunk) |
+| `self_name` | dual path | the user's name; the mic channel is attributed to them |
+| `self_vad` | no | chunk-local windows where the mic is genuinely the user (active + louder than system) |
+| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1); on the dual path it names only the remote speakers |
 | `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` |
 | `transcribe` | no | `"true"` to also return per-segment text (default false) |
 | `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) |
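Review note: the field table above can be read as a small decision rule: use the dual-channel fields when both tracks are available, otherwise fall back to `file`. The Python sketch below builds the text and file parts for one request. The function name and its parameters are hypothetical, and JSON-encoding `self_vad` the same way as `timeline` is an assumption, since the diff does not show its wire format.

```python
import json
from typing import Dict, List, Optional, Tuple


def label_merge_fields(
    timeline: List[dict],
    mixed: Optional[bytes] = None,
    mic: Optional[bytes] = None,
    system: Optional[bytes] = None,
    self_name: Optional[str] = None,
    self_vad: Optional[List[dict]] = None,
    known_voiceprints: Optional[Dict[str, List[float]]] = None,
    transcribe: bool = False,
) -> Tuple[Dict[str, str], Dict[str, bytes]]:
    """Return (text_fields, file_fields) for one label-merge request."""
    data = {"timeline": json.dumps(timeline)}
    if mic is not None and system is not None:
        # Dual-channel path: the mic track is attributed to self_name.
        if not self_name:
            raise ValueError("dual-channel path needs self_name")
        files = {"mic_file": mic, "system_file": system}
        data["self_name"] = self_name
        if self_vad is not None:
            # Assumption: self_vad is sent as a JSON array, like timeline.
            data["self_vad"] = json.dumps(self_vad)
    elif mixed is not None:
        # Mono fallback: a single mixed-mono WAV under the field name "file".
        files = {"file": mixed}
    else:
        raise ValueError("need either mixed, or both mic and system")
    if known_voiceprints:
        data["known_voiceprints"] = json.dumps(known_voiceprints)
    if transcribe:
        data["transcribe"] = "true"
    return data, files
```

The two dictionaries map directly onto the text and file parts of a `multipart/form-data` body in most HTTP clients. `min_overlap` is left out so the backend default applies.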
@@ -213,3 +222,11 @@ Loaded → `known_voiceprints` on every `label-merge` call. Updated from respons
 `fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never
 stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with
 `overlap_confidence ≥ ~0.8`; consider per-name running mean later.
+
+## 8. Recap outputs (`transcript.md`, `recap.{html,json}`)
+After `speakers.json` is assembled, the recap phase renders the human-readable
+deliverables: a `transcript.md` (one line per diarized utterance) and an HTML
+`recap.html`, backed by a structured `recap.json`. The recap's topic/summary
+content is generated by the **backend LLM** (`POST /v1/chat/completions`, Qwen3);
+the app owns the rendering and the in-app **speaker-name editor**, which can rewrite
+names across `speakers.json`, the transcript, and the recap after the fact.
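Review note: the new §8 describes `transcript.md` as one line per diarized utterance, with names that can be rewritten after the fact. The Python sketch below shows one way those two ideas fit together. The utterance keys (`speaker`, `start`, `text`) and the line layout are assumptions made for illustration; the diff does not show the `speakers.json` schema or the real transcript format.

```python
from typing import Dict, List, Optional


def mmss(seconds: float) -> str:
    """Format a non-negative offset in seconds as MM:SS, truncating fractions."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_transcript(
    utterances: List[dict],
    renames: Optional[Dict[str, str]] = None,
) -> str:
    """Render one Markdown line per utterance, applying speaker renames first."""
    renames = renames or {}
    lines = []
    for utt in utterances:
        # A rename such as {"Unknown_1": "Bob"} replaces the stored label.
        name = renames.get(utt["speaker"], utt["speaker"])
        lines.append(f"**{name}** [{mmss(utt['start'])}]: {utt['text'].strip()}")
    return "\n".join(lines) + "\n"
```

Applying the rename map at render time, rather than editing stored text, would let the same mapping be reused for `speakers.json` and the recap.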