98aeef8779
WhisperX docker build was crashing at the model-prewarm step:

    OSError: undefined symbol: torch_library_impl

Root cause: the NGC PyTorch base ships custom builds of torch + torchaudio + torchvision, matched together for Blackwell (sm_120). When pip installed whisperx, it pulled the latest stock torchaudio wheel as a transitive dependency. That wheel was compiled against a different libtorch and won't load against NGC's.

Fix: at build time, capture NGC's actual torch/torchaudio/torchvision versions into /tmp/torch-constraints.txt, then pass that file via `pip install -c` to every subsequent install. pip can't swap out any of the three packages, so the ABI stays consistent. whisperx and pyannote are happy with torch>=2.0, which NGC's 2.10.0a0 satisfies easily. The pinned versions are printed to the build log so you can see them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
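The constraints trick can also be written without importing torch at all. Below is a minimal Python sketch, not part of this commit. It pins whatever torch, torchaudio and torchvision are already installed, but it reads their versions with `importlib.metadata` instead of importing the modules. The helper names (`installed_pins`, `render_constraints`, `write_constraints`) are hypothetical. The sketch assumes the NGC packages carry standard distribution metadata whose version string matches `torch.__version__`.

```python
# Minimal sketch (hypothetical helpers, not part of this commit): build a pip
# constraints file that pins the torch stack already installed in the image.
# Assumption: versions come from distribution metadata rather than from
# importing torch/torchaudio/torchvision.
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# The three packages NGC builds together. Pinning all of them stops pip from
# replacing any one with a stock wheel built against a different libtorch.
TORCH_STACK = ("torch", "torchaudio", "torchvision")


def installed_pins(packages=TORCH_STACK):
    """Return {name: installed version} for each package.

    Raises RuntimeError if a package is missing, because a constraints file
    that silently omits one of them would leave that package unpinned.
    """
    pins = {}
    for name in packages:
        try:
            pins[name] = version(name)
        except PackageNotFoundError as exc:
            raise RuntimeError(
                f"{name} is not installed; refusing to write partial constraints"
            ) from exc
    return pins


def render_constraints(pins):
    """Format pins as pip constraint lines, one 'name==version' per line."""
    return "".join(f"{name}=={ver}\n" for name, ver in pins.items())


def write_constraints(path, packages=TORCH_STACK):
    """Write the constraints file and return its text (e.g. for the build log)."""
    text = render_constraints(installed_pins(packages))
    Path(path).write_text(text)
    return text
```

Inside the image, `write_constraints('/tmp/torch-constraints.txt')` could stand in for the inline `python3 -c` step, followed by the same `pip install -c /tmp/torch-constraints.txt -r /tmp/requirements.txt`. Reading metadata avoids loading libtorch just to learn a version string. pip also checks constraints against installed distribution metadata, so the pins match what pip treats as installed.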
65 lines
2.9 KiB
Docker
# WhisperX ASR + diarization container for Spark 2 (Blackwell GB10, sm_120).
#
# Replaces the custom Parakeet wrapper + Sortformer overlay with a single
# mainline pipeline: faster-whisper for transcription + pyannote.audio 3.1
# for diarization + wav2vec2 forced alignment for word-level timestamps.
#
# Build (on Spark 2, where Blackwell + nvcr.io credentials are available):
#   docker build -t whisperx-asr:latest .
#
# Run:
#   docker run -d --restart unless-stopped --name whisperx-asr \
#     --gpus all --memory=40g --memory-swap=40g \
#     -p 8002:8002 \
#     -v whisperx-models:/root/.cache/huggingface \
#     -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
#     -e WHISPER_MODEL=medium \
#     whisperx-asr:latest
#
# The memory cap is intentional: with --memory-swap equal to --memory the
# container gets no swap, so if WhisperX hits a pathological input it is
# OOM-killed cleanly instead of swap-thrashing the whole Spark.

FROM nvcr.io/nvidia/pytorch:25.11-py3

# WhisperX runs ffmpeg under the hood for audio decoding
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# CRITICAL: the NGC base image ships custom builds of torch + torchaudio +
# torchvision compiled together for Blackwell (sm_120). If pip pulls a stock
# torchaudio wheel as a transitive dep of whisperx/pyannote, the resulting
# ABI mismatch crashes at import time:
#   "undefined symbol: torch_library_impl"
# Generate a constraints.txt from whatever versions NGC actually shipped,
# then pass it to every pip install so pip cannot swap out any of the three.
RUN python3 -c "import torch, torchaudio, torchvision; \
        import sys; \
        sys.stdout.write(f'torch=={torch.__version__}\ntorchaudio=={torchaudio.__version__}\ntorchvision=={torchvision.__version__}\n')" \
    > /tmp/torch-constraints.txt \
    && echo '── pinned torch versions ──' && cat /tmp/torch-constraints.txt

# Install whisperx + the FastAPI wrapper deps under the torch constraint.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --break-system-packages --no-cache-dir \
    -c /tmp/torch-constraints.txt -r /tmp/requirements.txt

# Pre-warm the default Whisper + alignment models at build time so first-call
# latency on a fresh container is small. (~3 GB cached into the image; if you
# want a smaller image, comment this out and accept the first-call download.)
ARG WHISPER_MODEL=medium
ENV WHISPER_MODEL=${WHISPER_MODEL}
RUN python3 -c "import whisperx; whisperx.load_model('${WHISPER_MODEL}', 'cpu', compute_type='int8')" \
    && python3 -c "import whisperx; whisperx.load_align_model(language_code='en', device='cpu')"

WORKDIR /opt/whisperx
COPY app /opt/whisperx/app

# Expose for spark-control's proxy on Spark 2
EXPOSE 8002

HEALTHCHECK --interval=30s --timeout=10s --start-period=180s \
    CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8002/health')" || exit 1

CMD ["python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8002", "--workers", "1"]