v0.12.0:3 - hotfix: build torchaudio from source against NGC's torch
NGC PyTorch (the only base with working torch on Spark's ARM64 + sm_120 Blackwell) doesn't ship torchaudio. Stock pip wheels are amd64-only AND ABI-incompatible with NGC's custom torch 2.10.0a anyway. Pip install just fails or crashes at runtime.

Real fix:
- apt install git cmake build-essential ninja-build
- pip install git+https://github.com/pytorch/audio.git@v2.5.1 with TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0" (sm_120 for Blackwell GB10)
- this compiles torchaudio against the torch already in the image, so ABI matches by construction

Then constraints.txt locks torch + torchvision + torchaudio so the later `pip install whisperx` can't swap any of them.

Cost: +3-5 min to the first install. Docker layer cache reuses the built torchaudio on every subsequent rebuild.

Torchaudio v2.5.1 is the last tag that builds cleanly against torch 2.5-2.10; main branch is too volatile against NGC's alpha torch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
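Note on the arch list: each entry in TORCH_CUDA_ARCH_LIST is a CUDA compute capability, and "12.0" is the one that corresponds to sm_120. A minimal sketch of that mapping follows. The helper name `arch_list_to_sm` is hypothetical and not part of this commit, and it assumes the semicolon-separated numeric form used here (no named architectures).

```python
def arch_list_to_sm(arch_list: str) -> list[str]:
    """Translate a semicolon-separated TORCH_CUDA_ARCH_LIST into sm_XY names.

    Hypothetical helper for illustration only; assumes numeric entries
    such as "12.0", optionally followed by "+PTX".
    """
    targets = []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing or doubled semicolon
        # Drop an optional "+PTX" suffix before splitting major.minor.
        capability = entry.split("+", 1)[0]
        major, minor = capability.split(".")
        targets.append(f"sm_{major}{minor}")
    return targets


if __name__ == "__main__":
    # The value set in this commit.
    print(arch_list_to_sm("9.0;10.0;12.0"))  # ['sm_90', 'sm_100', 'sm_120']
```

Listing three capabilities makes the build compile kernels for each of them, which is part of why the first install is slower.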
@@ -21,25 +21,44 @@
 
 FROM nvcr.io/nvidia/pytorch:25.11-py3
 
-# WhisperX runs ffmpeg under the hood for audio decoding
+# WhisperX runs ffmpeg under the hood for audio decoding.
+# git + cmake + build-essential are needed to build torchaudio from source
+# (see below); we remove them at the end of the next layer to keep the image
+# from growing unnecessarily.
 RUN apt-get update \
-    && apt-get install -y --no-install-recommends ffmpeg \
+    && apt-get install -y --no-install-recommends \
+    ffmpeg git cmake build-essential ninja-build \
     && rm -rf /var/lib/apt/lists/*
 
-# CRITICAL: the NGC base image ships custom builds of torch + torchaudio +
-# torchvision compiled together for Blackwell (sm_120). If pip pulls a stock
-# torchaudio wheel as a transitive dep of whisperx/pyannote, the resulting
-# ABI mismatch crashes at import time:
-# "undefined symbol: torch_library_impl"
-# Generate a constraints.txt from whatever versions NGC actually shipped,
-# then pass it to every pip install so pip cannot swap torch out.
-RUN python3 -c "import torch, torchaudio, torchvision; \
+# Pin torch + torchvision to whatever NGC actually shipped so pip can't swap
+# them out when it satisfies whisperx/pyannote deps. (NGC's torch is a custom
+# build with a non-standard local version like "2.10.0a0+b558c986e8.nv25.11"
+# — stock pip wheels would clobber it and break the ABI.)
+RUN python3 -c "import torch, torchvision; \
     import sys; \
-    sys.stdout.write(f'torch=={torch.__version__}\ntorchaudio=={torchaudio.__version__}\ntorchvision=={torchvision.__version__}\n')" \
+    sys.stdout.write(f'torch=={torch.__version__}\ntorchvision=={torchvision.__version__}\n')" \
     > /tmp/torch-constraints.txt \
     && echo '── pinned torch versions ──' && cat /tmp/torch-constraints.txt
 
-# Install whisperx + the FastAPI wrapper deps under the torch constraint.
+# NGC PyTorch images don't include torchaudio (NVIDIA optimizes for
+# vision/text workloads). Stock torchaudio wheels are ABI-incompatible with
+# NGC's custom torch 2.10a, so the only working option is building from
+# source against the NGC torch already in the image. Pinning to v2.5.1 — the
+# last torchaudio tag that builds cleanly against torch 2.5–2.10 and is a
+# proven compatibility target.
+ENV USE_CUDA=1 BUILD_SOX=0 TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0"
+RUN pip install --break-system-packages --no-cache-dir \
+    git+https://github.com/pytorch/audio.git@v2.5.1 \
+    && python3 -c "import torchaudio; print('torchaudio built:', torchaudio.__version__)"
+
+# Append torchaudio to constraints so pip can't replace it later.
+RUN python3 -c "import torchaudio; print(f'torchaudio=={torchaudio.__version__}')" \
+    >> /tmp/torch-constraints.txt \
+    && echo '── final pinned versions ──' && cat /tmp/torch-constraints.txt
+
+# Install whisperx + the FastAPI wrapper deps under the torch+torchaudio
+# constraint. pip will satisfy whisperx/pyannote without swapping any of the
+# pytorch-family packages.
 COPY requirements.txt /tmp/requirements.txt
 RUN pip install --break-system-packages --no-cache-dir \
     -c /tmp/torch-constraints.txt -r /tmp/requirements.txt
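Note on the constraints step: the hunk above writes `name==version` lines for whatever is already installed, and adds torchaudio only after it has been built. The same idea can be sketched as a standalone function. This is an illustration, not what the Dockerfile runs: it reads versions from `importlib.metadata` instead of `torch.__version__` (assumed to agree for these packages), and the name `constraint_lines` and its `lookup` parameter are hypothetical.

```python
from importlib import metadata


def constraint_lines(packages, lookup=metadata.version):
    """Return pip constraint lines for the packages that are installed.

    Hypothetical helper. `lookup` is injectable so the function can be
    exercised without torch being present; by default it asks the
    installed distribution metadata.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Not installed yet (torchaudio on the NGC base before the
            # source build), so there is nothing to pin.
            continue
    return lines


if __name__ == "__main__":
    for line in constraint_lines(["torch", "torchvision", "torchaudio"]):
        print(line)
```

Running it once before and once after the torchaudio build would give the two stages of /tmp/torch-constraints.txt that the Dockerfile echoes.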
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
 
 export const v0_1_0 = VersionInfo.of({
-  version: '0.12.0:2',
+  version: '0.12.0:3',
   releaseNotes: {
     en_US:
-      'v0.12.0:2 — hotfix: WhisperX docker build was failing at the model-prewarm step with "undefined symbol: torch_library_impl". Root cause: the NGC PyTorch base image ships custom builds of torch + torchaudio + torchvision compiled together for Blackwell (sm_120); pip pulled a stock torchaudio wheel as a transitive dep of whisperx/pyannote, the ABIs didn\'t match, and the resulting .so file refused to load. Fix: generate a constraints.txt at build time from the NGC base\'s installed torch versions, and pass it to every pip install so pip can\'t swap torch/torchaudio/torchvision out from under us. Build should now finish through the model-prewarm step. No other changes vs 0.12.0:1.',
+      'v0.12.0:3 — hotfix: deeper torchaudio fix. The Spark is ARM64 (Grace + GB10 Blackwell), and the NGC PyTorch container — the only base with a working torch for sm_120 ARM64 — does NOT ship torchaudio at all. Stock pip wheels are amd64-only and ABI-incompatible with NGC\'s custom torch anyway. Real fix: build torchaudio from source against NGC\'s torch (v2.5.1, the last torchaudio tag that compiles cleanly against torch 2.5–2.10) with TORCH_CUDA_ARCH_LIST set for Blackwell sm_120. Build adds ~3-5 min to the first WhisperX install (only first time — Docker layer cache reuses it after). Plus the constraints.txt approach from 0.12.0:2 to lock torch + torchvision + torchaudio against any later pip swap-out.',
   },
   migrations: {
     up: async ({ effects }) => {},
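Note on the "lock against any later pip swap-out" claim in the release notes: one way to confirm the pins held is to compare the constraints file against what is installed after the final pip install. The sketch below is not part of this commit. The names `parse_constraints` and `find_drift` are hypothetical, and it assumes the file holds only simple `name==version` lines like the ones generated in the Dockerfile.

```python
def parse_constraints(text: str) -> dict[str, str]:
    """Parse simple 'name==version' lines into a {name: version} dict."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_drift(pins: dict[str, str], installed: dict[str, str]) -> dict:
    """Return {name: (pinned, installed)} for packages that no longer match.

    A missing package shows up with None as its installed version.
    """
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    # Version strings here are placeholders, not values from the image.
    pins = parse_constraints("torch==1.0\ntorchaudio==2.0\n")
    print(find_drift(pins, {"torch": "1.0", "torchaudio": "2.1"}))
```

An empty result means pip left every pinned package alone. In the image, the `installed` mapping could be filled from `importlib.metadata.version` for each pinned name.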