docs: split CLAUDE.md into path-scoped .claude/rules; fix dev/test commands

- CLAUDE.md trimmed to whole-repo facts (58 lines); subsystem guidance moved to .claude/rules/{startos-package,fastapi-image,redaction,audio-speech}.md with paths: frontmatter so each loads only when matching files are touched
- .gitignore: track .claude/rules/ while keeping the rest of .claude/ (settings.local.json) ignored
- test-audio-with-speakers.sh: require audio-file arg in docs, replace owner-specific SPARK_CONTROL/VLLM defaults with generic ones (localhost dev server + Spark Control vLLM proxy), discover the loaded LLM via /api/status since /v1/models lists audio models only
- document REDACTION_MAP_DB + CONNECTIVITY_LOG as required for local dev (/data only exists in the container)
- prettier pass over startos/actions (formatting drift)
2026-06-11 19:12:23 -05:00
parent 7e8175d857
commit 9ef9226e0a
9 changed files with 175 additions and 84 deletions
@@ -0,0 +1,35 @@
+---
+paths:
+  - "image/app/audio_proxy.py"
+  - "image/app/speech_models.py"
+  - "image/app/deep_health.py"
+  - "image/parakeet_patches/**"
+  - "scripts/test-audio-with-speakers.sh"
+  - "docs/AUDIO_API.md"
+---
+
+# Audio / speech stack (Parakeet STT + Sortformer diarizer + Kokoro TTS on Spark 2)
+
+## Changing the parakeet-asr container
+
+- `image/parakeet_patches/` (`main.py`, `diarizer.py`) is an overlay copied into the `parakeet-asr` container by the "Reapply speech-model patches" dashboard action (`image/app/speech_models.py`). This is the **only** durable way to change that container — `docker exec` / pip changes inside it die on `docker rm`.
+- **Never install `cuda-python` in parakeet-asr** to "fix" the startup warning about CUDA graphs being disabled. The warning is harmless; enabling the graph path crashes real decoding with an illegal memory access on this GPU/CUDA-13 stack (GB10/sm_121). The slow path has served 11k+ requests with zero failures, so leave it alone.
+- Pin/constrain torch versions when pip-installing anything into NGC-based containers on the Sparks (ABI breaks otherwise); expect ARM64 wheel gaps and source builds (`--no-build-isolation` for torchaudio). Applies to `spark_embed` too.
+
+## Testing audio endpoints
+
+- Test with **real speech** (e.g. `say -o /tmp/t.wav --data-format=LEI16@16000 "<a couple of sentences>"`), not tones/silence — zero-token audio skips the decoder paths where crashes live.
+- Send audio requests to Spark 2 **sequentially** in tests/scripts. Parallel audio requests can race (cuFFT → 503), and the single GPU serializes them anyway.
+- End-to-end suite (hits the LIVE cluster):
+
+```bash
+./scripts/test-audio-with-speakers.sh <audio-file>   # from repo root
+```
+
+`SPARK_CONTROL` defaults to `http://127.0.0.1:9999` (a running local dev server); to test an installed package instead, set it to that package's URL.
+
+## API quirk
+
+Spark Control's `/v1/models` lists *audio* models (STT model + Kokoro voices) by design — **not** the loaded LLM. Discover the LLM via `/api/status` (`vllm.current_model`).
+
+Diarizer caps at 4 speakers (Sortformer `diar_sortformer_4spk-v1`).
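
The `/api/status` lookup described under "API quirk" could be sketched in Python as follows. This is a minimal sketch, not the repo's implementation: it assumes `/api/status` returns a JSON object with a nested `vllm` object that holds `current_model`, and the helper names `current_llm` and `fetch_current_llm` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional


def current_llm(status: dict[str, Any]) -> Optional[str]:
    """Extract vllm.current_model from a decoded /api/status payload.

    Assumed payload shape: {"vllm": {"current_model": "<name>"}}.
    Returns None when the field is missing or empty (e.g. no LLM loaded).
    """
    vllm = status.get("vllm")
    if not isinstance(vllm, dict):
        return None
    model = vllm.get("current_model")
    return model if isinstance(model, str) and model else None


def fetch_current_llm(
    base_url: str = "http://127.0.0.1:9999",  # documented SPARK_CONTROL default
    timeout: float = 5.0,
) -> Optional[str]:
    """Query Spark Control's /api/status and return the loaded LLM, if any."""
    url = f"{base_url.rstrip('/')}/api/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return current_llm(json.load(resp))
```

Keeping the parsing in `current_llm` separate from the HTTP call lets the field lookup be checked without a live Spark Control instance; `fetch_current_llm()` needs a dev server listening on the default URL.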