docs(roadmap): defer parakeet long-audio guard; record rationale + impl shortcut

2026-06-15 17:44:48 -05:00
parent 6238ac88f7
commit d9c098262f
1 changed files with 1 additions and 1 deletions
@@ -3,7 +3,7 @@
 Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
 ## Near term
- parakeet-asr `--memory` cap, shipped via the Reapply-patches action (guards against swap-thrash on very long audio).
+- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
 - Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
 ## Audio quality