v0.12.0:0 - WhisperX as a one-click dashboard install + managed service
Replaces the manual rsync + build + run workflow with a proper spark-control
feature. This is the first feature in the audio path that does not require
shell access on Spark 2.
What's in the box
─────────────────
* image/whisperx_container/ - the build context (Dockerfile, requirements,
  app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT +
  pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint
  /v1/audio/transcribe-with-speakers returns the exact same shape as
  spark-control's existing endpoint, so the recap-relay PR spec needs no
  changes when we cut over.
* image/app/whisperx_install.py - install manager. Ships the build context
  to Spark 2 over SSH, runs `docker build`, runs `docker run` with a 40 GB
  memory cap (Sortformer ran uncapped and thrashed Spark 2 on a 90-min
  file), then polls /health until both Whisper and pyannote report loaded.
* Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX
  when its /health reports diarizer_loaded=true, and falls back to the
  legacy Parakeet + Sortformer path otherwise. Same response shape either
  way. Clean cutover, easy rollback (`docker rm -f whisperx-asr`; the
  container is normally running, so a plain `docker rm` would be refused).
* Dashboard (Audio / Speech tab):
  - "Add WhisperX" banner appears when not installed, with a primary
    "Install WhisperX" button. One click triggers the install.
  - Build progress dialog with phase + elapsed timer + live build log via
    SSE (`/api/whisperx/install/{job_id}/stream`).
  - After install, WhisperX auto-registers as a managed service alongside
    Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart).
  - Banner self-hides once /api/whisperx/status reports healthy.
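The health gate described above (used by the install manager's polling and by the proxy's backend choice) could be sketched roughly as follows. This is illustrative only: `diarizer_loaded` comes from the notes above, while `whisper_loaded`, the function names, the backend labels, and the injectable `fetch_health` callable are assumptions, not spark-control's actual code.

```python
import time
from typing import Callable, Optional

Health = Optional[dict]  # parsed /health JSON, or None if unreachable


def is_ready(health: Health) -> bool:
    """True only when both models report loaded.

    `diarizer_loaded` is named in the commit notes; `whisper_loaded` is an
    assumed field name for the STT half.
    """
    if not health:
        return False
    return bool(health.get("whisper_loaded")) and bool(health.get("diarizer_loaded"))


def choose_backend(health: Health) -> str:
    """Prefer WhisperX when its diarizer is up, else the legacy path."""
    if health and health.get("diarizer_loaded") is True:
        return "whisperx"
    return "parakeet+sortformer"


def wait_until_ready(
    fetch_health: Callable[[], Health],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch_health() until is_ready() or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready(fetch_health()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Keeping the HTTP call behind `fetch_health` means the gate can be exercised without a running container; the real fetcher would GET the container's /health and return None on connection errors.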
New endpoints
─────────────
GET   /api/whisperx/status
POST  /api/whisperx/install
GET   /api/whisperx/install/{job_id}
GET   /api/whisperx/install/{job_id}/stream   (SSE phase + log)
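For anyone consuming the stream endpoint outside the dashboard, a minimal Server-Sent Events parser might look like the sketch below. It follows the generic SSE wire format (`event:` / `data:` fields, blank line ends an event); the event names `phase` and `log` in the usage note are assumptions, since the notes above only say the stream carries phase and log updates.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line dispatches the event, if it carried any data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, typically a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Fed the decoded lines of a streaming response, this would yield pairs such as `("phase", "building")` or `("log", "...")` if the server uses those event names.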
Config additions (env)
──────────────────────
WHISPERX_HOST       (defaults to spark2_host)
WHISPERX_USER       (defaults to spark2_user)
WHISPERX_CONTAINER  (default: whisperx-asr)
WHISPERX_PORT       (default: 8002)
WHISPERX_MODEL      (default: medium; one of tiny/base/small/medium/large-v3)
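One way these settings could be resolved, with the Spark 2 fallbacks and model validation made explicit, is sketched here. The variable names and defaults are the ones listed above; the `WhisperXConfig` dataclass and `load_whisperx_config` function are hypothetical and only illustrate the fallback rules.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_MODELS = ("tiny", "base", "small", "medium", "large-v3")


@dataclass(frozen=True)
class WhisperXConfig:
    host: str
    user: str
    container: str
    port: int
    model: str


def load_whisperx_config(
    spark2_host: str,
    spark2_user: str,
    env: Mapping[str, str] = os.environ,
) -> WhisperXConfig:
    """Resolve WHISPERX_* variables, falling back to the Spark 2 settings."""
    model = env.get("WHISPERX_MODEL", "medium")
    if model not in VALID_MODELS:
        raise ValueError(f"WHISPERX_MODEL must be one of {VALID_MODELS}, got {model!r}")
    return WhisperXConfig(
        host=env.get("WHISPERX_HOST") or spark2_host,
        user=env.get("WHISPERX_USER") or spark2_user,
        container=env.get("WHISPERX_CONTAINER", "whisperx-asr"),
        port=int(env.get("WHISPERX_PORT", "8002")),
        model=model,
    )
```

Rejecting an unknown model name at load time surfaces a typo before a long image build starts rather than partway through it.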
Dockerfile
──────────
Added COPY whisperx_container /app/whisperx_container so the runtime
install manager can read the build context from inside the spark-control
image and ship it over SSH.
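How the install manager actually ships the context is not shown in this commit. As one plausible approach, the baked-in directory could be packed into a tarball and piped to `tar` on the remote side. Everything below (function names, the remote directory argument, the use of tar over ssh) is an assumption for illustration.

```python
import io
import shlex
import tarfile
from pathlib import Path
from typing import List

# Path created by the COPY line described above.
BUILD_CONTEXT = Path("/app/whisperx_container")


def pack_build_context(context_dir: Path = BUILD_CONTEXT) -> bytes:
    """Return the build context as an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname="." keeps paths relative so they unpack into any directory.
        tar.add(str(context_dir), arcname=".")
    return buf.getvalue()


def ssh_unpack_argv(user: str, host: str, remote_dir: str) -> List[str]:
    """Build an ssh command that unpacks a tarball from stdin (hypothetical)."""
    quoted = shlex.quote(remote_dir)
    return ["ssh", f"{user}@{host}", f"mkdir -p {quoted} && tar -xzf - -C {quoted}"]
```

The two pieces would combine as `subprocess.run(ssh_unpack_argv(user, host, remote_dir), input=pack_build_context(), check=True)`, followed by `docker build` in the remote directory.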
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,51 @@
# WhisperX ASR + diarization container for Spark 2 (Blackwell GB10, sm_120).
#
# Replaces the custom Parakeet wrapper + Sortformer overlay with a single
# mainline pipeline: faster-whisper for transcription + pyannote.audio 3.1
# for diarization + wav2vec2 forced alignment for word-level timestamps.
#
# Build (on Spark 2, where Blackwell + nvcr.io credentials are available):
#   docker build -t whisperx-asr:latest .
#
# Run:
#   docker run -d --restart unless-stopped --name whisperx-asr \
#     --gpus all --memory=40g \
#     -p 8002:8002 \
#     -v whisperx-models:/root/.cache/huggingface \
#     -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
#     -e WHISPER_MODEL=medium \
#     whisperx-asr:latest
#
# The memory cap is intentional: even if WhisperX hits a pathological input,
# it gets OOM-killed cleanly instead of swap-thrashing the whole Spark.

FROM nvcr.io/nvidia/pytorch:25.11-py3

# WhisperX runs ffmpeg under the hood for audio decoding
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Install whisperx + the FastAPI wrapper deps. --break-system-packages because
# the NGC PyTorch image has its own managed Python that's flagged "system".
COPY requirements.txt /tmp/requirements.txt
RUN pip install --break-system-packages --no-cache-dir -r /tmp/requirements.txt

# Pre-warm the default Whisper + alignment models at build time so first-call
# latency on a fresh container is small. (~3 GB cached into the image; if you
# want a smaller image, comment this out and accept the first-call download.)
ARG WHISPER_MODEL=medium
ENV WHISPER_MODEL=${WHISPER_MODEL}
RUN python3 -c "import whisperx; whisperx.load_model('${WHISPER_MODEL}', 'cpu', compute_type='int8')" \
    && python3 -c "import whisperx; whisperx.load_align_model(language_code='en', device='cpu')"

WORKDIR /opt/whisperx
COPY app /opt/whisperx/app

# Expose for spark-control's proxy on Spark 2
EXPOSE 8002

HEALTHCHECK --interval=30s --timeout=10s --start-period=180s \
    CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8002/health')" || exit 1

CMD ["python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8002", "--workers", "1"]
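
The Run comment at the top of this Dockerfile translates fairly directly into an argument list that an install manager could hand to `subprocess.run`. The sketch below mirrors those flags; the function name and its parameters are hypothetical, and whether whisperx_install.py builds the command this way is an assumption.

```python
from typing import List


def docker_run_argv(
    hf_token: str,
    model: str = "medium",
    port: int = 8002,
    container: str = "whisperx-asr",
    image: str = "whisperx-asr:latest",
    memory: str = "40g",
) -> List[str]:
    """Build the `docker run` argv from the Dockerfile's Run comment."""
    return [
        "docker", "run", "-d",
        "--restart", "unless-stopped",
        "--name", container,
        "--gpus", "all",
        f"--memory={memory}",  # hard cap: OOM-kill instead of swap-thrashing
        # Host port is configurable; the container side stays 8002 because
        # the image's CMD pins uvicorn to that port.
        "-p", f"{port}:8002",
        "-v", "whisperx-models:/root/.cache/huggingface",
        "-e", f"HF_TOKEN={hf_token}",
        "-e", f"WHISPER_MODEL={model}",
        image,
    ]
```

Passing a list rather than a shell string avoids quoting problems with the token value, though the token is still visible in the process list while the command runs.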