v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer
After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we
never got a working Docker build. The fundamental constraint isn't
patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that
runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch
2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere.
WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't
satisfy without rebuilding torchaudio against torch 2.10's alpha API.
Walking away cleanly is better than another night of chasing.
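The version mismatch described above can be detected before attempting an install. The sketch below is illustrative only and not part of this commit: `parse_torch_version` and `has_stock_torchaudio_match` are hypothetical helpers, and the rule they encode (pre-built torchaudio wheels target final torch releases, so an alpha build has no match) is an assumption drawn from the failure described here.

```python
import re
from typing import NamedTuple, Optional

# Subset of PEP 440: release digits, optional a/b/rc pre-release, optional +local tag.
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:(?P<pre>a|b|rc)(?P<pre_n>\d+))?"
    r"(?:\+(?P<local>[A-Za-z0-9.]+))?$"
)


class TorchVersion(NamedTuple):
    release: str          # e.g. "2.10.0"
    pre: Optional[str]    # e.g. "a0", or None for a final release
    local: Optional[str]  # e.g. "b558c98" or "cu121", or None


def parse_torch_version(version: str) -> TorchVersion:
    """Split a torch version string into release, pre-release and local parts."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    pre = f"{m['pre']}{m['pre_n']}" if m["pre"] else None
    return TorchVersion(m["release"], pre, m["local"])


def has_stock_torchaudio_match(version: str) -> bool:
    """Assumption: only final (non pre-release) torch builds have a matching wheel."""
    return parse_torch_version(version).pre is None
```

Under that assumption, `has_stock_torchaudio_match("2.10.0a0+b558c98")` is false, which is the situation on the NGC ARM64 base image.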
Removed from the codebase:
- image/whisperx_container/* (Dockerfile + requirements + app/main.py)
- image/app/whisperx_install.py (install manager + SSH ship-context logic)
- image/Dockerfile COPY whisperx_container
- WHISPERX_* config keys in config.py
- whisperx service entry in services.py
- WhisperX-preferred branch in audio_proxy.py
- /api/whisperx/* endpoints in server.py
- install banner + progress dialog in index.html
- render + handlers in app.js
- .whisperx-install styles in style.css
Spark 2 cleaned in tandem (user-authorized): container removed,
~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of
builder cache reclaimed. parakeet-asr and magpie-tts unaffected and
healthy throughout.
The audio path is back to exactly what shipped in v0.11.0:3:
    POST /api/audio/transcribe-with-speakers
      → Parakeet (transcription) + Sortformer (diarization) in parallel
      → merged by timestamp into speaker-labeled blocks
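The "merged by timestamp" step in the audio path above can be sketched as follows. This is a minimal illustration, not the code in the repository: the segment and turn shapes (`start`, `end`, `text`, `speaker` keys), the `merge_by_timestamp` name, and the "largest overlap wins" rule are all assumptions.

```python
from typing import Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def merge_by_timestamp(segments: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Label each transcript segment with the speaker whose turn overlaps it most,
    then join consecutive segments from the same speaker into one block."""
    blocks: List[Dict] = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        speaker = "unknown"  # used when no diarization turn overlaps the segment
        best = 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best:
                best, speaker = shared, turn["speaker"]
        if blocks and blocks[-1]["speaker"] == speaker:
            # Same speaker as the previous block: extend it.
            blocks[-1]["end"] = seg["end"]
            blocks[-1]["text"] += " " + seg["text"]
        else:
            blocks.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return blocks
```

Because the two services are called independently, the merge only needs their outputs to share a common time base.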
v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour
was meant to address:
  1. memory cap on the parakeet-asr container so a long-audio crash
     can't swap-thrash Spark 2 again
  2. a chunking proxy in /api/audio/transcribe-with-speakers that
     splits inputs >10 min before Sortformer
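For the planned chunking proxy, the split itself could look like the sketch below. It only computes chunk boundaries. The `plan_chunks` name, the fixed-size non-overlapping windows, and the 600-second default (taken from the ">10 min" threshold above) are assumptions rather than the eventual implementation.

```python
from typing import List, Tuple


def plan_chunks(duration_s: float, max_chunk_s: float = 600.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds covering [0, duration_s],
    with no chunk longer than max_chunk_s."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

Inputs at or under the limit come back as a single chunk, so short audio would follow the same path as before. A real proxy would likely also have to reconcile speaker labels across chunks, since separate diarization calls need not assign the same label to the same voice.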
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -209,17 +209,6 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
             raise HTTPException(r.status_code, r.text[:500])
         return Response(content=r.content, media_type=r.headers.get("content-type", "application/json"))
 
-    def _whisperx_base() -> str:
-        return f"http://{settings.whisperx_host}:{settings.whisperx_port}"
-
-    async def _whisperx_healthy() -> bool:
-        try:
-            async with httpx.AsyncClient(timeout=2.0) as client:
-                r = await client.get(f"{_whisperx_base()}/health")
-                return r.status_code == 200 and bool(r.json().get("diarizer_loaded"))
-        except Exception:
-            return False
-
     # ---- /api/audio/transcribe-with-speakers (STT + diarization, merged) ----
     @router.post("/api/audio/transcribe-with-speakers")
     async def transcribe_with_speakers(
@@ -256,23 +245,8 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
         filename = file.filename or "audio.wav"
         content_type = file.content_type or "application/octet-stream"
 
-        # Prefer WhisperX (single-pipeline, handles long audio properly) when it's
-        # installed and healthy. Fall back to Parakeet + Sortformer otherwise.
-        if await _whisperx_healthy():
-            files = {"file": (filename, body, content_type)}
-            try:
-                async with httpx.AsyncClient(timeout=1800.0) as client:
-                    r = await client.post(
-                        f"{_whisperx_base()}/v1/audio/transcribe-with-speakers",
-                        files=files,
-                    )
-            except httpx.HTTPError as e:
-                raise HTTPException(502, f"whisperx unreachable: {e}")
-            if r.status_code != 200:
-                raise HTTPException(r.status_code, r.text[:500])
-            return r.json()
-
-        # ── Legacy fallback: Parakeet ASR + Sortformer diarizer in parallel ──
+        # Parakeet ASR + Sortformer diarizer in parallel. (A WhisperX detour
+        # lived here briefly — reverted in v0.13.0:0; see release notes.)
         async def _call_transcribe(client: httpx.AsyncClient) -> dict:
             files = {"file": (filename, body, content_type)}
             data = {"response_format": "verbose_json"}