v0.12.0:0 - WhisperX as a one-click dashboard install + managed service
Replaces the manual rsync + build + run procedure with a proper
spark-control feature. This is the first piece of the audio path that
doesn't require shell access on Spark 2.
What's in the box
─────────────────
* image/whisperx_container/ - the build context (Dockerfile, requirements,
app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT +
pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint
/v1/audio/transcribe-with-speakers returns the exact same shape spark-
control's existing endpoint does, so the recap-relay PR spec needs no
changes when we cut over.
* image/app/whisperx_install.py - install manager. Ships the build context
to Spark 2 over SSH, runs `docker build`, runs `docker run` with a 40 GB
memory cap (vs. Sortformer's unbounded container, which thrashed Spark 2
on a 90-min file), then polls /health until both Whisper and pyannote
report loaded.
* Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX
when its /health reports diarizer_loaded=true, falls back to the legacy
Parakeet + Sortformer path otherwise. Same response shape either way.
Clean cutover, easy rollback (`docker rm -f whisperx-asr`).
* Dashboard (Audio / Speech tab):
- "Add WhisperX" banner appears when not installed, with a primary
"Install WhisperX" button. One click triggers the install.
- Build progress dialog with phase + elapsed timer + live build log via
SSE (`/api/whisperx/install/{job_id}/stream`).
- After install, WhisperX auto-registers as a managed service alongside
Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart).
- Banner self-hides once /api/whisperx/status reports healthy.
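The proxy's preference rule described above can be sketched as follows. This is a minimal illustration, not spark-control's actual code: `pick_backend` and the two backend labels are hypothetical names, and only the `diarizer_loaded` field is taken from the description above.

```python
from typing import Any, Mapping, Optional


def pick_backend(health: Optional[Mapping[str, Any]]) -> str:
    """Return which transcription backend the proxy should use.

    `health` is the parsed JSON body of WhisperX's /health, or None if the
    probe failed (container missing, timeout, non-200 response).
    """
    # Only an explicit boolean True counts; a missing or falsy flag means
    # the diarizer is not ready, so the legacy path handles the request.
    if health is not None and health.get("diarizer_loaded") is True:
        return "whisperx"
    return "parakeet+sortformer"
```

Treating a failed probe the same as `diarizer_loaded=false` is what makes removing the container a sufficient rollback in this sketch.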
New endpoints
─────────────
GET /api/whisperx/status
POST /api/whisperx/install
GET /api/whisperx/install/{job_id}
GET /api/whisperx/install/{job_id}/stream (SSE phase + log)
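A client could consume the stream endpoint roughly as sketched below. The parser follows standard Server-Sent Events framing (`event:` and `data:` fields, with a blank line ending each event). The payload format of this particular endpoint is not specified here, so `parse_sse` is a hypothetical helper and any event names a caller matches on are assumptions.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            # A single leading space after the colon is not part of the value.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Because the function takes any iterable of lines, it can be fed from a streaming HTTP response or from a list in a unit test.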
Config additions (env)
──────────────────────
WHISPERX_HOST (defaults to spark2_host)
WHISPERX_USER (defaults to spark2_user)
WHISPERX_CONTAINER (default: whisperx-asr)
WHISPERX_PORT (default: 8002)
WHISPERX_MODEL (default: medium; tiny/base/small/medium/large-v3)
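One way the defaults above could be resolved, shown as a hedged sketch: `load_whisperx_config` and `WhisperXConfig` are hypothetical names, `spark2_host` and `spark2_user` are passed in as plain arguments because how spark-control stores them is not shown here, and rejecting unknown model names is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping

ALLOWED_MODELS = ("tiny", "base", "small", "medium", "large-v3")


@dataclass(frozen=True)
class WhisperXConfig:
    host: str
    user: str
    container: str
    port: int
    model: str


def load_whisperx_config(spark2_host: str, spark2_user: str,
                         env: Mapping[str, str] = os.environ) -> WhisperXConfig:
    """Read WHISPERX_* variables, falling back to the documented defaults."""
    model = env.get("WHISPERX_MODEL", "medium")
    if model not in ALLOWED_MODELS:
        # Assumed validation step; the real manager may behave differently.
        raise ValueError(f"unsupported WHISPERX_MODEL: {model!r}")
    return WhisperXConfig(
        host=env.get("WHISPERX_HOST", spark2_host),
        user=env.get("WHISPERX_USER", spark2_user),
        container=env.get("WHISPERX_CONTAINER", "whisperx-asr"),
        port=int(env.get("WHISPERX_PORT", "8002")),
        model=model,
    )
```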
Dockerfile
──────────
Added COPY whisperx_container /app/whisperx_container so the runtime
install manager can read the build context from inside the spark-control
image and ship it over SSH.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,74 @@
# WhisperX container for Spark 2

Replaces the custom Parakeet wrapper + Sortformer overlay (v0.10/v0.11) with a
single mainline pipeline:

- **faster-whisper** (CTranslate2-optimized) for STT
- **pyannote.audio 3.1** for speaker diarization (sliding-window: handles
  long files in bounded memory, fixes the Sortformer OOM on 90-min audio)
- **wav2vec2 forced alignment** for word-level timestamps

Exposes the same API surface spark-control already proxies to, so the cutover
is a one-URL change in the audio proxy:

- `GET /health` - readiness probe
- `GET /v1/models` - model list
- `POST /v1/audio/transcriptions` - OpenAI-shaped STT
- `POST /v1/audio/transcribe-with-speakers` - merged diarized transcript
  (matches spark-control's response shape exactly)

## Deploy to Spark 2

```bash
# 1. Copy this directory to Spark 2
rsync -av --delete image/whisperx_container/ <spark-user>@<spark-2-ip>:~/whisperx-build/

# 2. SSH in and build
ssh <spark-user>@<spark-2-ip>
cd ~/whisperx-build
docker build -t whisperx-asr:latest .

# 3. Run alongside the existing parakeet-asr (which stays on 8000 for now)
docker run -d --restart unless-stopped --name whisperx-asr \
  --gpus all --memory=40g \
  -p 8002:8002 \
  -v whisperx-models:/root/.cache/huggingface \
  -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
  -e WHISPER_MODEL=medium \
  whisperx-asr:latest

# 4. Watch first-start logs (model load + first health check)
docker logs -f whisperx-asr
```
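Instead of tailing logs, readiness can also be checked by polling `/health`. The sketch below is an illustration under assumptions: `diarizer_loaded` is the field the audio proxy is described as checking, but `whisper_loaded` is a hypothetical field name for the STT model flag, and the `localhost` URL assumes the script runs on Spark 2 itself. The `fetch` and `sleep` callables are injected so the loop can be exercised without a running container.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def fetch_health(url: str = "http://localhost:8002/health") -> Dict[str, Any]:
    """GET the health endpoint and parse its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(fetch: Callable[[], Dict[str, Any]] = fetch_health,
                     timeout_s: float = 600.0, interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until both models report loaded, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            health = fetch()
        except (OSError, ValueError):
            health = {}  # connection refused or non-JSON body while starting
        # "whisper_loaded" is an assumed field name; adjust to the real payload.
        if health.get("whisper_loaded") and health.get("diarizer_loaded"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` with no arguments polls the default URL every 5 seconds for up to 10 minutes; both values are arbitrary starting points.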
## Model size knobs

Set via the `WHISPER_MODEL` env var. Defaults to `medium`. Options:

| Model | Parameters | Speed (GB10, x real time) | Quality |
|---|---|---|---|
| `tiny` | ~39M | ~120x rt | low |
| `base` | ~74M | ~80x rt | ok |
| `small` | ~244M | ~50x rt | good |
| `medium` | ~769M | ~30x rt | excellent (**default**) |
| `large-v3` | ~1.5B | ~15x rt | best |

For a 90-min file, `medium` takes ~3 min STT + ~9 min diarization ≈ 12 min total.
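The estimate above is simple arithmetic on the table's real-time factors. A small sketch, assuming diarization time scales linearly with audio length at the ~9 min per 90 min quoted above (an extrapolation, not a measured property); `estimate_minutes` is an illustrative name.

```python
# Real-time speedup factors from the table above (approximate, GB10).
STT_SPEEDUP = {"tiny": 120, "base": 80, "small": 50, "medium": 30, "large-v3": 15}

# ~9 min of diarization for 90 min of audio; assumed to scale linearly.
DIARIZE_RATIO = 9 / 90


def estimate_minutes(audio_min: float, model: str = "medium") -> float:
    """Rough end-to-end processing time in minutes for one file."""
    stt = audio_min / STT_SPEEDUP[model]
    diarize = audio_min * DIARIZE_RATIO
    return stt + diarize


# 90 min of audio on medium: 3 min STT + 9 min diarization.
print(round(estimate_minutes(90), 1))
```

Under the same assumption, diarization dominates for every model size, so switching from `medium` to `tiny` would save little on long files.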
## Memory budget

The `--memory=40g` cap is intentional. Spark 2 has 122 GB of unified memory, of
which ~35 GB is consumed by parakeet-asr + magpie-tts. The 40 GB cap leaves
comfortable headroom for both the model weights (~5 GB) and pyannote's
in-memory features (~5–15 GB for a 90-min file). If WhisperX hits a
pathological input, it gets OOM-killed cleanly instead of swap-thrashing the
whole Spark, which is the symptom we hit with the unbounded Sortformer container.
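The budget can be checked with a few lines of arithmetic. The figures below are the approximate numbers quoted above; `budget` is an illustrative helper, not part of the container.

```python
TOTAL_GB = 122          # Spark 2 unified memory
OTHER_SERVICES_GB = 35  # parakeet-asr + magpie-tts (approximate)
CAP_GB = 40             # docker run --memory=40g
WEIGHTS_GB = 5          # model weights (approximate)
FEATURES_GB = (5, 15)   # pyannote in-memory features for a 90-min file


def budget() -> dict:
    """Summarize headroom inside the container cap and on the host."""
    peak = WEIGHTS_GB + FEATURES_GB[1]  # worst case of the quoted range
    return {
        "expected_peak_gb": peak,
        "headroom_under_cap_gb": CAP_GB - peak,
        "host_free_at_cap_gb": TOTAL_GB - OTHER_SERVICES_GB - CAP_GB,
    }
```

With these inputs the expected peak is about half the cap, and the host still has memory to spare even if the container grows all the way to 40 GB.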
## Rollback to Parakeet+Sortformer

```bash
docker stop whisperx-asr && docker rm whisperx-asr
```

The parakeet-asr container stays running throughout; spark-control's proxy
URL switch is reversible via config or version downgrade.