v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer

After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we never got a working docker build. The fundamental constraint isn't patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch 2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere. WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't satisfy without rebuilding torchaudio against torch 2.10's alpha API. Walking away cleanly is better than another night of chasing.

Removed from the codebase:
- image/whisperx_container/* (Dockerfile + requirements + app/main.py)
- image/app/whisperx_install.py (install manager + SSH ship-context logic)
- image/Dockerfile COPY whisperx_container
- WHISPERX_* config keys in config.py
- whisperx service entry in services.py
- WhisperX-preferred branch in audio_proxy.py
- /api/whisperx/* endpoints in server.py
- install banner + progress dialog in index.html
- render + handlers in app.js
- .whisperx-install styles in style.css

Spark 2 cleaned in tandem (user-authorized): container removed, ~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of builder cache reclaimed. parakeet-asr and magpie-tts unaffected and healthy throughout.

The audio path is back to exactly what shipped in v0.11.0:3:

  POST /api/audio/transcribe-with-speakers
    → Parakeet (transcription) + Sortformer (diarization) in parallel
    → merged by timestamp into speaker-labeled blocks

v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour was meant to address:
1. memory cap on the parakeet-asr container so a long-audio crash can't swap-thrash Spark 2 again
2. a chunking proxy in /api/audio/transcribe-with-speakers that splits inputs >10 min before Sortformer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
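
For readers unfamiliar with the restored audio path, the "merged by timestamp into speaker-labeled blocks" step can be sketched roughly as below. This is an illustrative sketch, not the code in audio_proxy.py: the `Segment` and `Turn` types, the `merge_by_timestamp` name, the "unknown" label, and the largest-overlap rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span from the ASR side (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """One speaker turn from the diarizer side (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def merge_by_timestamp(segments, turns, unknown="unknown"):
    """Label each segment with the speaker whose turn overlaps it most,
    then fold consecutive same-speaker segments into one block."""
    blocks = []
    for seg in sorted(segments, key=lambda s: s.start):
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        if blocks and blocks[-1]["speaker"] == best_speaker:
            # Same speaker keeps talking: extend the current block.
            blocks[-1]["text"] += " " + seg.text
            blocks[-1]["end"] = seg.end
        else:
            blocks.append({
                "speaker": best_speaker,
                "start": seg.start,
                "end": seg.end,
                "text": seg.text,
            })
    return blocks
```

Segments that overlap no diarizer turn fall back to the assumed "unknown" label rather than being dropped, so no transcribed text is lost in the merge.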
2026-05-19 08:03:19 -05:00
parent a24610ad2a
commit 95524f4983
14 changed files with 14 additions and 1086 deletions
@@ -209,17 +209,6 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
            raise HTTPException(r.status_code, r.text[:500])
        return Response(content=r.content, media_type=r.headers.get("content-type", "application/json"))

-    def _whisperx_base() -> str:
-        return f"http://{settings.whisperx_host}:{settings.whisperx_port}"
-
-    async def _whisperx_healthy() -> bool:
-        try:
-            async with httpx.AsyncClient(timeout=2.0) as client:
-                r = await client.get(f"{_whisperx_base()}/health")
-            return r.status_code == 200 and bool(r.json().get("diarizer_loaded"))
-        except Exception:
-            return False
-
    # ---- /api/audio/transcribe-with-speakers (STT + diarization, merged) ----
    @router.post("/api/audio/transcribe-with-speakers")
    async def transcribe_with_speakers(
@@ -256,23 +245,8 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
        filename = file.filename or "audio.wav"
        content_type = file.content_type or "application/octet-stream"

-        # Prefer WhisperX (single-pipeline, handles long audio properly) when it's
-        # installed and healthy. Fall back to Parakeet + Sortformer otherwise.
-        if await _whisperx_healthy():
-            files = {"file": (filename, body, content_type)}
-            try:
-                async with httpx.AsyncClient(timeout=1800.0) as client:
-                    r = await client.post(
-                        f"{_whisperx_base()}/v1/audio/transcribe-with-speakers",
-                        files=files,
-                    )
-            except httpx.HTTPError as e:
-                raise HTTPException(502, f"whisperx unreachable: {e}")
-            if r.status_code != 200:
-                raise HTTPException(r.status_code, r.text[:500])
-            return r.json()
-
-        # ── Legacy fallback: Parakeet ASR + Sortformer diarizer in parallel ──
+        # Parakeet ASR + Sortformer diarizer in parallel. (A WhisperX detour
+        # lived here briefly — reverted in v0.13.0:0; see release notes.)
        async def _call_transcribe(client: httpx.AsyncClient) -> dict:
            files = {"file": (filename, body, content_type)}
            data = {"response_format": "verbose_json"}