v0.12.0:0 - WhisperX as a one-click dashboard install + managed service

Replaces the manual rsync+build+run with a proper spark-control feature.
First in the audio path that doesn't require shell access on Spark 2.

What's in the box
─────────────────
* image/whisperx_container/ - the build context (Dockerfile, requirements,
  app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT
  + pyannote 3.1 for diarization + wav2vec2 forced alignment. Single
  endpoint /v1/audio/transcribe-with-speakers returns the exact same shape
  spark-control's existing endpoint does, so the recap-relay PR spec needs
  no changes when we cut over.
* image/app/whisperx_install.py - install manager. ships build context to
  Spark 2 over SSH, runs `docker build`, runs `docker run` with 40 GB
  memory cap (vs Sortformer's unbounded which thrashed Spark 2 on a 90-min
  file), polls /health until both Whisper + pyannote report loaded.
* Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX
  when its /health reports diarizer_loaded=true, falls back to the legacy
  Parakeet + Sortformer path otherwise. Same response shape either way.
  Clean cutover, easy rollback (`docker rm whisperx-asr`).
* Dashboard (Audio / Speech tab):
  - "Add WhisperX" banner appears when not installed, with a primary
    "Install WhisperX" button. One click triggers the install.
  - Build progress dialog with phase + elapsed timer + live build log via
    SSE (`/api/whisperx/install/{job_id}/stream`).
  - After install, WhisperX auto-registers as a managed service alongside
    Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart).
  - Banner self-hides once /api/whisperx/status reports healthy.

New endpoints
─────────────
  GET  /api/whisperx/status
  POST /api/whisperx/install
  GET  /api/whisperx/install/{job_id}
  GET  /api/whisperx/install/{job_id}/stream   (SSE phase + log)

Config additions (env)
──────────────────────
  WHISPERX_HOST       (defaults to spark2_host)
  WHISPERX_USER       (defaults to spark2_user)
  WHISPERX_CONTAINER  (default: whisperx-asr)
  WHISPERX_PORT       (default: 8002)
  WHISPERX_MODEL      (default: medium; tiny/base/small/medium/large-v3)

Dockerfile
──────────
Added COPY whisperx_container /app/whisperx_container so the runtime
install manager can read the build context from inside the spark-control
image and ship it over SSH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:02:26 -05:00
parent cfc1c408d4
commit 5a0bfba6a3
14 changed files with 1033 additions and 3 deletions
@@ -0,0 +1,74 @@
+# WhisperX container for Spark 2
+
+Replaces the custom Parakeet wrapper + Sortformer overlay (v0.10/v0.11) with a
+single mainline pipeline:
+
+- **faster-whisper** (CTranslate2-optimized) for STT
+- **pyannote.audio 3.1** for speaker diarization (sliding-window — handles
+  long files in bounded memory, fixes the Sortformer OOM on 90-min audio)
+- **wav2vec2 forced alignment** for word-level timestamps
+
+Exposes the same API surface spark-control already proxies to, so the cutover
+is a one-URL change in the audio proxy:
+
+- `GET  /health` — readiness probe
+- `GET  /v1/models` — model list
+- `POST /v1/audio/transcriptions` — OpenAI-shaped STT
+- `POST /v1/audio/transcribe-with-speakers` — merged diarized transcript
+  (matches spark-control's response shape exactly)
+
+## Deploy to Spark 2
+
+```bash
+# 1. Copy this directory to Spark 2
+rsync -av --delete image/whisperx_container/ <spark-user>@<spark-2-ip>:~/whisperx-build/
+
+# 2. SSH in and build
+ssh <spark-user>@<spark-2-ip>
+cd ~/whisperx-build
+docker build -t whisperx-asr:latest .
+
+# 3. Run alongside the existing parakeet-asr (which stays on port 8000 for now)
+docker run -d --restart unless-stopped --name whisperx-asr \
+  --gpus all --memory=40g \
+  -p 8002:8002 \
+  -v whisperx-models:/root/.cache/huggingface \
+  -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
+  -e WHISPER_MODEL=medium \
+  whisperx-asr:latest
+
+# 4. Watch first-start logs (model load + first health check)
+docker logs -f whisperx-asr
+```
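After step 4, readiness can be checked programmatically instead of by watching logs. The following is a minimal sketch of the kind of polling the commit message describes ("polls /health until both Whisper + pyannote report loaded"), not the actual install manager code. It assumes `/health` returns a JSON object with boolean flags: `diarizer_loaded` is named in the commit message, while `model_loaded`, `fetch_health`, and `wait_until_ready` are hypothetical names chosen for illustration.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# "diarizer_loaded" comes from the commit message; "model_loaded" is an
# assumed name for the Whisper flag and may differ in the real service.
READY_KEYS = ("model_loaded", "diarizer_loaded")


def is_ready(payload: dict) -> bool:
    """True only when every expected flag is present and exactly True."""
    return all(payload.get(key) is True for key in READY_KEYS)


def fetch_health(url: str, timeout: float = 5.0) -> Optional[dict]:
    """GET the health URL; return parsed JSON, or None if unreachable or malformed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; bad JSON raises ValueError.
        return None


def wait_until_ready(
    fetch: Callable[[], Optional[dict]],
    attempts: int = 60,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll up to `attempts` times; return True as soon as the service is ready."""
    for attempt in range(attempts):
        payload = fetch()
        if isinstance(payload, dict) and is_ready(payload):
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

On the host running the container, a call such as `wait_until_ready(lambda: fetch_health("http://localhost:8002/health"))` would block until both models report loaded or the attempts run out. Passing `fetch` and `sleep` as parameters keeps the loop testable without a network.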
+
+## Model size knobs
+
+Set via the `WHISPER_MODEL` env var (defaults to `medium`). Options:
+
+| Model | Parameters | Speed (GB10) | Quality |
+|---|---|---|---|
+| `tiny`  | ~39M  | ~120x rt | low |
+| `base`  | ~74M  | ~80x rt  | ok |
+| `small` | ~244M | ~50x rt  | good |
+| `medium`| ~769M | ~30x rt  | excellent (**default**) |
+| `large-v3`| ~1.5B | ~15x rt | best |
+
+For a 90-min file, `medium` takes ~3 min for STT plus ~9 min for diarization, or ~12 min total.
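The estimate above follows from dividing audio length by the real-time factor. Here is a rough sketch of that arithmetic using the speeds from the table. The diarization factor of 10x is not stated in the README; it is inferred from the "~9 min for 90 min" figure, and the sketch assumes both stages scale linearly with audio length, which may not hold in practice.

```python
# Approximate STT real-time factors from the table above (GB10).
STT_REALTIME_FACTOR = {
    "tiny": 120,
    "base": 80,
    "small": 50,
    "medium": 30,
    "large-v3": 15,
}

# Assumption: inferred from "~9 min diarize" for a 90-min file (90 / 9 = 10).
DIARIZE_REALTIME_FACTOR = 10


def estimate_minutes(audio_minutes: float, model: str = "medium"):
    """Return (stt, diarize, total) processing minutes for one file."""
    stt = audio_minutes / STT_REALTIME_FACTOR[model]
    diarize = audio_minutes / DIARIZE_REALTIME_FACTOR
    return stt, diarize, stt + diarize


if __name__ == "__main__":
    for name in STT_REALTIME_FACTOR:
        stt, diarize, total = estimate_minutes(90, name)
        print(f"{name:9s} stt={stt:.1f} min  diarize={diarize:.1f} min  total={total:.1f} min")
```

Under these assumptions diarization dominates for every model size, so switching to a smaller Whisper model shortens a 90-min job by a few minutes at most.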
+
+## Memory budget
+
+The `--memory=40g` cap is intentional. Spark 2 has 122 GB of unified memory, of
+which ~35 GB is consumed by parakeet-asr + magpie-tts. The 40 GB cap leaves
+comfortable headroom for both the model weights (~5 GB) and pyannote's
+in-memory features (~5–15 GB for a 90-min audio file). If WhisperX hits a
+pathological input it gets OOM-killed cleanly instead of swap-thrashing the
+whole Spark — the symptom we hit with the unbounded Sortformer container.
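The budget can be checked with simple arithmetic. The sketch below uses only the approximate figures quoted in this section; the variable and function names are illustrative.

```python
TOTAL_GB = 122             # Spark 2 unified memory
OTHER_SERVICES_GB = 35     # parakeet-asr + magpie-tts (approximate)
CAP_GB = 40                # docker run --memory=40g
WEIGHTS_GB = 5             # model weights (approximate)
FEATURES_GB_RANGE = (5, 15)  # pyannote in-memory features for 90-min audio


def memory_budget():
    """Return (headroom inside the cap at worst case, GB left outside the cap)."""
    worst_case_use = WEIGHTS_GB + FEATURES_GB_RANGE[1]
    headroom_inside_cap = CAP_GB - worst_case_use
    left_for_system = TOTAL_GB - OTHER_SERVICES_GB - CAP_GB
    return headroom_inside_cap, left_for_system


if __name__ == "__main__":
    inside, outside = memory_budget()
    print(f"headroom inside the 40 GB cap: {inside} GB")
    print(f"left for the rest of the system: {outside} GB")
```

With these numbers, the worst-case expected use (~20 GB) is about half the cap, and roughly 47 GB stays available to the rest of the machine even if the container grows all the way to its limit.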
+
+## Rollback to Parakeet+Sortformer
+
+```bash
+docker stop whisperx-asr && docker rm whisperx-asr
+```
+
+The parakeet-asr container stays running throughout — spark-control's proxy
+URL switch is reversible via config or version downgrade.
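The commit message describes the proxy as preferring WhisperX when its `/health` reports `diarizer_loaded=true` and falling back to the legacy path otherwise. A minimal sketch of that selection rule follows. `choose_backend` and the returned labels are hypothetical and not taken from the spark-control source.

```python
from typing import Optional


def choose_backend(health: Optional[dict]) -> str:
    """Pick the transcription backend from a WhisperX /health payload.

    `health` is the parsed JSON body, or None when the container is
    unreachable (for example after `docker rm whisperx-asr`).
    """
    if isinstance(health, dict) and health.get("diarizer_loaded") is True:
        return "whisperx"
    # Missing container, failed request, or diarizer still loading.
    return "legacy"  # Parakeet + Sortformer path
```

Under this rule, removing the container is enough to route traffic back to Parakeet + Sortformer, because a failed health request is treated the same as "not ready".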