v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer

After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we never got a working docker build. The fundamental constraint isn't patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch 2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere. WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't satisfy without rebuilding torchaudio against torch 2.10's alpha API. Walking away cleanly is better than another night of chasing.

Removed from the codebase:
- image/whisperx_container/* (Dockerfile + requirements + app/main.py)
- image/app/whisperx_install.py (install manager + SSH ship-context logic)
- image/Dockerfile COPY whisperx_container
- WHISPERX_* config keys in config.py
- whisperx service entry in services.py
- WhisperX-preferred branch in audio_proxy.py
- /api/whisperx/* endpoints in server.py
- install banner + progress dialog in index.html
- render + handlers in app.js
- .whisperx-install styles in style.css

Spark 2 cleaned in tandem (user-authorized): container removed, ~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of builder cache reclaimed. parakeet-asr and magpie-tts unaffected and healthy throughout.

The audio path is back to exactly what shipped in v0.11.0:3:

  POST /api/audio/transcribe-with-speakers
    → Parakeet (transcription) + Sortformer (diarization) in parallel
    → merged by timestamp into speaker-labeled blocks

v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour was meant to address:
1. memory cap on the parakeet-asr container so a long-audio crash can't swap-thrash Spark 2 again
2. a chunking proxy in /api/audio/transcribe-with-speakers that splits inputs >10 min before Sortformer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:03:19 -05:00
parent a24610ad2a
commit 95524f4983
14 changed files with 14 additions and 1086 deletions
@@ -103,46 +103,6 @@

    <div class="tab-content" id="tab-audio" role="tabpanel" aria-labelledby="tab-audio-trigger">

-    <section id="whisperx-install-card" class="whisperx-install hidden">
-      <div class="wx-install-body">
-        <div class="wx-install-title">
-          <strong>Add WhisperX</strong>
-          <span class="tag ok">recommended</span>
-        </div>
-        <p class="muted small">
-          WhisperX is a single-container speech pipeline (faster-whisper for transcription + pyannote 3.1 for diarization)
-          designed to handle long audio cleanly. Replaces the Parakeet + Sortformer combo we patched together,
-          which crashed on a 90-min meeting. Pulled and built directly on Spark 2 (~10–15 min first time;
-          you only do this once).
-        </p>
-        <p class="muted small">
-          Requires a Hugging Face token at <code>~/.cache/huggingface/token</code> on Spark 2 (already set up).
-        </p>
-        <div class="wx-install-actions">
-          <button id="wx-install" class="btn primary">Install WhisperX</button>
-        </div>
-      </div>
-    </section>
-
-    <dialog id="whisperx-progress-dialog" class="modal">
-      <form method="dialog" class="modal-form">
-        <h3 id="wx-prog-title">Installing WhisperX…</h3>
-        <div class="phase-row">
-          <span class="spinner"></span>
-          <div class="phase" id="wx-prog-phase">Starting…</div>
-          <span class="spacer"></span>
-          <span class="timer" id="wx-prog-elapsed">0:00</span>
-        </div>
-        <details open>
-          <summary class="muted small">Build log</summary>
-          <pre id="wx-prog-log" class="log"></pre>
-        </details>
-        <div class="modal-actions">
-          <button type="button" id="wx-prog-close" class="btn">Close</button>
-        </div>
-      </form>
-    </dialog>
-
    <section id="services-panel" class="services hidden">
      <div class="section-header">
        <h2 class="section-title">Always-on services</h2>