9ef9226e0a
- CLAUDE.md trimmed to whole-repo facts (58 lines); subsystem guidance
moved to .claude/rules/{startos-package,fastapi-image,redaction,
audio-speech}.md with paths: frontmatter so each loads only when
matching files are touched
- .gitignore: track .claude/rules/ while keeping the rest of .claude/
(settings.local.json) ignored
- test-audio-with-speakers.sh: require audio-file arg in docs, replace
owner-specific SPARK_CONTROL/VLLM defaults with generic ones
(localhost dev server + Spark Control vLLM proxy), discover the
loaded LLM via /api/status since /v1/models lists audio models only
- document REDACTION_MAP_DB + CONNECTIVITY_LOG as required for local
dev (/data only exists in the container)
- prettier pass over startos/actions (formatting drift)
Audio / speech stack (Parakeet STT + Sortformer diarizer + Kokoro TTS on Spark 2)
Changing the parakeet-asr container
- image/parakeet_patches/ (main.py, diarizer.py) is an overlay copied into the parakeet-asr container by the "Reapply speech-model patches" dashboard action (image/app/speech_models.py). This is the only durable way to change that container: docker exec / pip changes inside it die on docker rm.
- Never install cuda-python in parakeet-asr to "fix" the startup warning about CUDA graphs being disabled. The warning is harmless; enabling the graph path crashes real decode with illegal memory access on this GPU/CUDA-13 stack (GB10/sm_121). The slow path served 11k+ requests with zero failures, so leave it alone.
- Pin/constrain torch versions when pip-installing anything into NGC-based containers on the Sparks (ABI breaks otherwise); expect ARM64 wheel gaps and source builds (--no-build-isolation for torchaudio). Applies to spark_embed too.
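One way to apply the torch-pinning advice is to freeze whatever torch version the container already ships into a pip constraints file before installing anything else. The sketch below is illustrative only: the helper names, the default constraints.txt filename, and the choice to pin just torch are assumptions, not part of this repo.

```python
import subprocess
import sys
from importlib import metadata
from pathlib import Path


def pinned_constraints(packages):
    """Return 'name==version' lines for each package that is currently installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: nothing to pin.
            continue
    return lines


def pip_install_command(requirements, constraints_file):
    """Build a pip command that installs requirements under a constraints file."""
    return [
        sys.executable, "-m", "pip", "install",
        "--constraint", str(constraints_file),
        *requirements,
    ]


def install_with_pins(requirements, pinned=("torch",), constraints_file="constraints.txt"):
    """Hypothetical helper: write the pins, then run pip under them."""
    path = Path(constraints_file)
    path.write_text("\n".join(pinned_constraints(pinned)) + "\n")
    # check=True makes a failed install raise instead of passing silently.
    subprocess.run(pip_install_command(requirements, path), check=True)
```

With the constraint in place, pip should fail the install rather than replace the container's torch build, which surfaces the ABI conflict up front.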
Testing audio endpoints
- Test with real speech (e.g. say -o /tmp/t.wav --data-format=LEI16@16000 "<a couple of sentences>"), not tones/silence: zero-token audio skips the decoder paths where crashes live.
- Send audio requests to Spark 2 sequentially in tests/scripts. Parallel audio requests can race (cuFFT → 503), and the single GPU serializes them anyway.
- End-to-end suite (hits the LIVE cluster):
./scripts/test-audio-with-speakers.sh <audio-file> # from repo root
SPARK_CONTROL defaults to http://127.0.0.1:9999 (a running local dev server); point it at the installed package URL otherwise.
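For ad-hoc scripts outside the suite, the "one request at a time" rule can be kept explicit with a plain loop. This is a minimal sketch: send_sequentially is a hypothetical helper, and the send callable stands in for whatever function actually posts a file to the audio endpoint (no endpoint path is assumed here).

```python
def send_sequentially(audio_paths, send):
    """Call send(path) for each file in order, one request in flight at a time."""
    results = []
    for path in audio_paths:
        # No threads or async gather: wait for each response before the next call.
        results.append((path, send(path)))
    return results
```

Passing the sender in as an argument keeps the loop testable without touching the live cluster.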
API quirk
Spark Control's /v1/models lists audio models (STT model + Kokoro voices) by design — not the loaded LLM. Discover the LLM via /api/status (vllm.current_model).
Diarizer caps at 4 speakers (Sortformer diar_sortformer_4spk-v1).
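The LLM discovery via /api/status described above could look roughly like this. It is a sketch that assumes the endpoint returns a JSON object with a nested vllm object and needs no authentication; the function names are illustrative, and the default base URL is the local dev server address mentioned earlier.

```python
import json
from urllib.request import urlopen


def current_llm(status):
    """Extract vllm.current_model from a decoded /api/status payload, or None."""
    vllm = status.get("vllm")
    if not isinstance(vllm, dict):
        # Assumed shape not present (e.g. no LLM section in the payload).
        return None
    return vllm.get("current_model")


def fetch_current_llm(base_url="http://127.0.0.1:9999", timeout=10):
    """GET {base_url}/api/status and return the loaded LLM name, if any."""
    with urlopen(f"{base_url.rstrip('/')}/api/status", timeout=timeout) as resp:
        return current_llm(json.load(resp))
```

A None result means the payload had no vllm.current_model value, which a script can treat as "no LLM loaded" instead of falling back to /v1/models.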