v0.12.0:0 - WhisperX as a one-click dashboard install + managed service

Replaces the manual rsync+build+run with a proper spark-control feature. First in the audio path that doesn't require shell access on Spark 2. What's in the box ───────────────── * image/whisperx_container/ - the build context (Dockerfile, requirements, app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT + pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint /v1/audio/transcribe-with-speakers returns the exact same shape spark- control's existing endpoint does, so the recap-relay PR spec needs no changes when we cut over. * image/app/whisperx_install.py - install manager. ships build context to Spark 2 over SSH, runs `docker build`, runs `docker run` with 40 GB memory cap (vs Sortformer's unbounded which thrashed Spark 2 on a 90-min file), polls /health until both Whisper + pyannote report loaded. * Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX when its /health reports diarizer_loaded=true, falls back to the legacy Parakeet + Sortformer path otherwise. Same response shape either way. Clean cutover, easy rollback (`docker rm whisperx-asr`). * Dashboard (Audio / Speech tab): - "Add WhisperX" banner appears when not installed, with a primary "Install WhisperX" button. One click triggers the install. - Build progress dialog with phase + elapsed timer + live build log via SSE (`/api/whisperx/install/{job_id}/stream`). - After install, WhisperX auto-registers as a managed service alongside Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart). - Banner self-hides once /api/whisperx/status reports healthy. New endpoints ───────────── GET /api/whisperx/status POST /api/whisperx/install GET /api/whisperx/install/{job_id} GET /api/whisperx/install/{job_id}/stream (SSE phase + log) Config additions (env) ────────────────────── WHISPERX_HOST (defaults to spark2_host) WHISPERX_USER (defaults to spark2_user) WHISPERX_CONTAINER (default: whisperx-asr) WHISPERX_PORT (default: 8002) WHISPERX_MODEL (default: medium; tiny/base/small/medium/large-v3) Dockerfile ────────── Added COPY whisperx_container /app/whisperx_container so the runtime install manager can read the build context from inside the spark-control image and ship it over SSH. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:02:26 -05:00
parent cfc1c408d4
commit 5a0bfba6a3
14 changed files with 1033 additions and 3 deletions
@@ -18,6 +18,12 @@ COPY models.yaml /app/models.yaml
 # time — survives docker rm + redeploy of the parakeet container.
 COPY parakeet_patches /app/parakeet_patches

+# WhisperX container build context (Dockerfile + requirements.txt + app/).
+# The "Install WhisperX" action in spark-control ships these files to Spark 2
+# over SSH, then runs `docker build` + `docker run` there. The container
+# becomes a managed always-on service alongside parakeet-asr and magpie-tts.
+COPY whisperx_container /app/whisperx_container
+
 RUN pip install --no-cache-dir -e .

 ENV BIND_PORT=9999
@@ -209,6 +209,17 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
            raise HTTPException(r.status_code, r.text[:500])
        return Response(content=r.content, media_type=r.headers.get("content-type", "application/json"))

+    def _whisperx_base() -> str:
+        return f"http://{settings.whisperx_host}:{settings.whisperx_port}"
+
+    async def _whisperx_healthy() -> bool:
+        try:
+            async with httpx.AsyncClient(timeout=2.0) as client:
+                r = await client.get(f"{_whisperx_base()}/health")
+            return r.status_code == 200 and bool(r.json().get("diarizer_loaded"))
+        except Exception:
+            return False
+
    # ---- /api/audio/transcribe-with-speakers (STT + diarization, merged) ----
    @router.post("/api/audio/transcribe-with-speakers")
    async def transcribe_with_speakers(
@@ -245,6 +256,23 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
        filename = file.filename or "audio.wav"
        content_type = file.content_type or "application/octet-stream"

+        # Prefer WhisperX (single-pipeline, handles long audio properly) when it's
+        # installed and healthy. Fall back to Parakeet + Sortformer otherwise.
+        if await _whisperx_healthy():
+            files = {"file": (filename, body, content_type)}
+            try:
+                async with httpx.AsyncClient(timeout=1800.0) as client:
+                    r = await client.post(
+                        f"{_whisperx_base()}/v1/audio/transcribe-with-speakers",
+                        files=files,
+                    )
+            except httpx.HTTPError as e:
+                raise HTTPException(502, f"whisperx unreachable: {e}")
+            if r.status_code != 200:
+                raise HTTPException(r.status_code, r.text[:500])
+            return r.json()
+
+        # ── Legacy fallback: Parakeet ASR + Sortformer diarizer in parallel ──
        async def _call_transcribe(client: httpx.AsyncClient) -> dict:
            files = {"file": (filename, body, content_type)}
            data = {"response_format": "verbose_json"}
@@ -35,6 +35,11 @@ class Settings:
    magpie_host: str
    magpie_user: str
    magpie_container: str
+    whisperx_host: str
+    whisperx_user: str
+    whisperx_container: str
+    whisperx_port: int
+    whisperx_model: str
    ssh_key_path: str
    ssh_known_hosts: str
    models_yaml: str
@@ -49,7 +54,7 @@ class Settings:
    def from_env(cls) -> "Settings":
        spark2_host = _env("SPARK2_HOST")
        spark2_user = _env("SPARK2_USER")
-        # Parakeet and Magpie default to Spark 2 unless explicitly overridden.
+        # Parakeet, Magpie, and WhisperX all default to Spark 2 unless overridden.
        return cls(
            spark1_host=_env("SPARK1_HOST"),
            spark1_user=_env("SPARK1_USER"),
@@ -61,6 +66,11 @@ class Settings:
            magpie_host=_env("MAGPIE_HOST") or spark2_host,
            magpie_user=_env("MAGPIE_USER") or spark2_user,
            magpie_container=_env("MAGPIE_CONTAINER") or "magpie-tts",
+            whisperx_host=_env("WHISPERX_HOST") or spark2_host,
+            whisperx_user=_env("WHISPERX_USER") or spark2_user,
+            whisperx_container=_env("WHISPERX_CONTAINER") or "whisperx-asr",
+            whisperx_port=int(_env("WHISPERX_PORT", "8002")),
+            whisperx_model=_env("WHISPERX_MODEL", "medium"),
            ssh_key_path=_env("SSH_KEY_PATH"),
            ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
            models_yaml=_resolve_models_yaml(),
@@ -24,6 +24,7 @@ from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_
 from .services import docker_state, run_action, services_from_settings
 from .speech_models import SpeechModelsManager
 from .ssh import ssh_run
+from .whisperx_install import WhisperXInstaller
 from .swap import SwapManager
 from .updates import UpdateManager, get_update_status
 from .validate import validate_launch
@@ -39,6 +40,7 @@ hardware_probe = HardwareProbe(settings)
 nim_manager = NimManager(settings)
 deep_health = DeepHealth(settings)
 speech_models = SpeechModelsManager(settings)
+whisperx_installer = WhisperXInstaller(settings)

 app = FastAPI(title="spark-control", version="0.1.0")

@@ -535,6 +537,70 @@ async def post_speech_models_restart() -> dict:
    return result


+# ---- WhisperX install (Phase 2 of the WhisperX migration) ----
+
+@app.get("/api/whisperx/status")
+async def get_whisperx_status() -> dict:
+    """Is WhisperX installed + healthy on Spark 2 right now?"""
+    return await whisperx_installer.status()
+
+
+@app.post("/api/whisperx/install")
+async def post_whisperx_install() -> dict:
+    """One-click install: ships the WhisperX build context from inside
+    spark-control to Spark 2, runs `docker build` + `docker run`, polls
+    /health until both models are loaded. Streams progress via the matching
+    GET /api/whisperx/install/{job_id}/stream SSE endpoint."""
+    try:
+        job = await whisperx_installer.trigger()
+    except RuntimeError as e:
+        raise HTTPException(409, str(e))
+    return {"job_id": job.id, "started_at": job.started_at}
+
+
+@app.get("/api/whisperx/install/{job_id}")
+async def get_whisperx_install(job_id: str) -> dict:
+    job = whisperx_installer.get(job_id)
+    if not job:
+        raise HTTPException(404, "unknown job")
+    return {
+        "id": job.id,
+        "state": job.state,
+        "phase": job.phase,
+        "lines": job.lines,
+        "started_at": job.started_at,
+        "finished_at": job.finished_at,
+        "returncode": job.returncode,
+    }
+
+
+@app.get("/api/whisperx/install/{job_id}/stream")
+async def stream_whisperx_install(job_id: str) -> StreamingResponse:
+    job = whisperx_installer.get(job_id)
+    if not job:
+        raise HTTPException(404, "unknown job")
+
+    async def event_stream():
+        last_idx = 0
+        last_phase = ""
+        last_state = ""
+        while True:
+            new_lines = job.lines[last_idx:]
+            last_idx = len(job.lines)
+            for line in new_lines:
+                yield f"data: {json.dumps({'line': line})}\n\n"
+            if job.phase != last_phase or job.state != last_state:
+                yield f"event: phase\ndata: {json.dumps({'phase': job.phase, 'state': job.state})}\n\n"
+                last_phase = job.phase
+                last_state = job.state
+            if job.finished_at:
+                yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
+                return
+            await asyncio.sleep(0.6)
+
+    return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+
@app.get("/api/endpoints")
 async def get_endpoints() -> dict:
    """Service-discovery summary. Stable shape; other apps on the LAN can poll this
@@ -65,6 +65,14 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
            container=s.magpie_container,
            port=s.magpie_port,
        ),
+        "whisperx": ServiceDef(
+            name="whisperx",
+            kind="stt+diarize",
+            host=s.whisperx_host,
+            user=s.whisperx_user,
+            container=s.whisperx_container,
+            port=s.whisperx_port,
+        ),
    }
    for entry in load_custom_services():
        key = entry.get("key")
@@ -664,6 +664,117 @@ async function onSpeechModelsRestart() {
  }
 }

+// ===================== WhisperX install (v0.12) =====================
+
+const wxState = {
+  job_id: null,
+  eventsource: null,
+  timer_handle: null,
+  started_at: null,
+};
+
+async function renderWhisperXBanner() {
+  const card = el('#whisperx-install-card');
+  if (!card) return;
+  let status;
+  try {
+    status = await fetchJSON('/api/whisperx/status');
+  } catch {
+    card.classList.add('hidden');
+    return;
+  }
+  if (status.installed && status.healthy) {
+    card.classList.add('hidden');
+  } else if (status.configured) {
+    card.classList.remove('hidden');
+  } else {
+    card.classList.add('hidden');
+  }
+}
+
+async function onWhisperXInstall() {
+  if (wxState.job_id) {
+    // Just re-attach to the running job
+    showWhisperXDialog();
+    return;
+  }
+  if (!confirm('Install WhisperX on Spark 2? This builds a new Docker image (~10–15 min first time, mostly downloading pyannote + whisper weights). Parakeet/Magpie stay untouched.')) return;
+  try {
+    const r = await fetchJSON('/api/whisperx/install', { method: 'POST' });
+    attachToWhisperXInstall(r.job_id);
+  } catch (e) {
+    alert('Failed to start WhisperX install: ' + e.message);
+  }
+}
+
+function showWhisperXDialog() {
+  el('#whisperx-progress-dialog').showModal();
+}
+
+function attachToWhisperXInstall(jobId) {
+  wxState.job_id = jobId;
+  el('#wx-prog-title').textContent = 'Installing WhisperX…';
+  el('#wx-prog-phase').textContent = 'Starting…';
+  el('#wx-prog-log').textContent = '';
+  showWhisperXDialog();
+
+  // Tick a timer
+  wxState.started_at = Date.now();
+  if (wxState.timer_handle) clearInterval(wxState.timer_handle);
+  wxState.timer_handle = setInterval(() => {
+    const sec = Math.max(0, Math.floor((Date.now() - wxState.started_at) / 1000));
+    const m = Math.floor(sec / 60);
+    el('#wx-prog-elapsed').textContent = `${m}:${(sec % 60).toString().padStart(2, '0')}`;
+  }, 500);
+
+  // Backfill snapshot then connect SSE
+  fetchJSON(`/api/whisperx/install/${jobId}`).then((snap) => {
+    el('#wx-prog-phase').textContent = snap.phase || 'Working…';
+    el('#wx-prog-log').textContent = (snap.lines || []).join('\n');
+    el('#wx-prog-log').scrollTop = el('#wx-prog-log').scrollHeight;
+    if (snap.finished_at) {
+      handleWhisperXDone(snap);
+      return;
+    }
+    const es = new EventSource(`/api/whisperx/install/${jobId}/stream`);
+    wxState.eventsource = es;
+    es.onmessage = (ev) => {
+      try {
+        const log = el('#wx-prog-log');
+        log.textContent += JSON.parse(ev.data).line + '\n';
+        log.scrollTop = log.scrollHeight;
+      } catch {}
+    };
+    es.addEventListener('phase', (ev) => {
+      try { el('#wx-prog-phase').textContent = JSON.parse(ev.data).phase; } catch {}
+    });
+    es.addEventListener('done', (ev) => {
+      try { handleWhisperXDone(JSON.parse(ev.data)); } catch {}
+      es.close();
+      wxState.eventsource = null;
+    });
+    es.onerror = () => { es.close(); wxState.eventsource = null; };
+  }).catch(() => {});
+}
+
+function handleWhisperXDone(d) {
+  if (wxState.timer_handle) { clearInterval(wxState.timer_handle); wxState.timer_handle = null; }
+  wxState.job_id = null;
+  const rc = d.returncode;
+  if (d.state === 'failed' || (rc !== 0 && rc != null)) {
+    el('#wx-prog-title').textContent = `WhisperX install failed (rc=${rc})`;
+    el('#wx-prog-phase').textContent = 'Failed — check the build log below';
+  } else {
+    el('#wx-prog-title').textContent = 'WhisperX installed';
+    el('#wx-prog-phase').textContent = 'Ready ✓ — appears in Always-on services below';
+    // Refresh services + banner state
+    setTimeout(() => {
+      renderServices();
+      renderWhisperXBanner();
+    }, 1000);
+  }
+}
+
 async function onServiceAction(key) {
  if (state.service_action_in_flight) return;
  const [name, action] = key.split(':');
@@ -1860,6 +1971,11 @@ async function init() {
  } catch {}
  setupDashboardTabs();
  setupEndpointCollapse();
+  // WhisperX install button
+  const wxBtn = el('#wx-install');
+  if (wxBtn) wxBtn.addEventListener('click', onWhisperXInstall);
+  const wxCloseBtn = el('#wx-prog-close');
+  if (wxCloseBtn) wxCloseBtn.addEventListener('click', () => el('#whisperx-progress-dialog').close());
  await loadModels();
  await pollStatus();
  await renderServices();
@@ -1869,11 +1985,14 @@ async function init() {
  loadDiskStatus();
  // Speech-model patches panel — slow over SSH, runs after first paint.
  renderSpeechModels();
+  // WhisperX install banner — show only when not yet installed/healthy.
+  renderWhisperXBanner();
  setInterval(pollStatus, 5000);
  setInterval(pollHardware, 8000);    // every 8s
  setInterval(pollUpdates, 300000);  // every 5 min
  setInterval(loadDiskStatus, 60000); // every 60s — disk state changes rarely
  setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
+  setInterval(renderWhisperXBanner, 60000); // every 60s — auto-hides banner after install
 }

 init();
@@ -103,6 +103,46 @@

    <div class="tab-content" id="tab-audio" role="tabpanel" aria-labelledby="tab-audio-trigger">

+    <section id="whisperx-install-card" class="whisperx-install hidden">
+      <div class="wx-install-body">
+        <div class="wx-install-title">
+          <strong>Add WhisperX</strong>
+          <span class="tag ok">recommended</span>
+        </div>
+        <p class="muted small">
+          WhisperX is a single-container speech pipeline (faster-whisper for transcription + pyannote 3.1 for diarization)
+          designed to handle long audio cleanly. Replaces the Parakeet + Sortformer combo we patched together,
+          which crashed on a 90-min meeting. Pulled and built directly on Spark 2 (~10–15 min first time;
+          you only do this once).
+        </p>
+        <p class="muted small">
+          Requires a Hugging Face token at <code>~/.cache/huggingface/token</code> on Spark 2 (already set up).
+        </p>
+        <div class="wx-install-actions">
+          <button id="wx-install" class="btn primary">Install WhisperX</button>
+        </div>
+      </div>
+    </section>
+
+    <dialog id="whisperx-progress-dialog" class="modal">
+      <form method="dialog" class="modal-form">
+        <h3 id="wx-prog-title">Installing WhisperX…</h3>
+        <div class="phase-row">
+          <span class="spinner"></span>
+          <div class="phase" id="wx-prog-phase">Starting…</div>
+          <span class="spacer"></span>
+          <span class="timer" id="wx-prog-elapsed">0:00</span>
+        </div>
+        <details open>
+          <summary class="muted small">Build log</summary>
+          <pre id="wx-prog-log" class="log"></pre>
+        </details>
+        <div class="modal-actions">
+          <button type="button" id="wx-prog-close" class="btn">Close</button>
+        </div>
+      </form>
+    </dialog>
+
    <section id="services-panel" class="services hidden">
      <div class="section-header">
        <h2 class="section-title">Always-on services</h2>
@@ -906,3 +906,16 @@ main {
 }
 .tab-content { display: none; }
 .tab-content.active { display: block; }
+
+/* ===== WhisperX install banner (v0.12) ===== */
+.whisperx-install {
+  background: var(--surface);
+  border: 1px solid var(--info);
+  border-radius: var(--radius);
+  padding: 16px 18px;
+  margin-bottom: 20px;
+}
+.wx-install-body { display: flex; flex-direction: column; gap: 10px; }
+.wx-install-title { display: flex; align-items: center; gap: 10px; }
+.wx-install-title strong { font-size: 15px; color: var(--text); }
+.wx-install-actions { display: flex; gap: 10px; margin-top: 4px; }
@@ -0,0 +1,255 @@
+"""WhisperX install action — ships the build context from inside spark-control
+to Spark 2 over SSH, then runs `docker build` + `docker run` on Spark 2 and
+streams progress back as SSE.
+
+Pattern mirrors NimManager (see nim.py) but for a locally-built container
+rather than an `nvcr.io` pull. Build context lives at
+/app/whisperx_container/ inside the spark-control Docker image (set up by
+the Dockerfile COPY directive).
+
+Endpoints:
+  POST /api/whisperx/install           — kick off
+  GET  /api/whisperx/install/{job_id}  — snapshot
+  GET  /api/whisperx/install/{job_id}/stream — SSE phase + log lines
+  GET  /api/whisperx/status            — installed + healthy?
+"""
+from __future__ import annotations
+import asyncio
+import shlex
+import uuid
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Optional
+
+import httpx
+
+from .config import Settings
+from .ssh import _base_args, ssh_run, ssh_stream, StreamHandle
+
+
+# Build context shipped inside the spark-control image (Dockerfile COPYs it).
+BUILD_CONTEXT_DIR = Path(__file__).resolve().parent.parent / "whisperx_container"
+
+# Files we ship to Spark 2's build dir. Mapped local-name → remote-relative-path.
+BUILD_FILES = {
+    "Dockerfile": "Dockerfile",
+    "requirements.txt": "requirements.txt",
+    "README.md": "README.md",
+    "app/main.py": "app/main.py",
+}
+
+
+@dataclass
+class WhisperXInstallJob:
+    id: str
+    started_at: str
+    state: str = "starting"        # starting | sending | building | running | done | failed
+    phase: str = "Starting…"
+    lines: list[str] = field(default_factory=list)
+    returncode: Optional[int] = None
+    finished_at: Optional[str] = None
+
+    def append(self, line: str) -> None:
+        self.lines.append(line)
+        if len(self.lines) > 1500:
+            del self.lines[: len(self.lines) - 1500]
+
+
+class WhisperXInstaller:
+    def __init__(self, settings: Settings) -> None:
+        self.settings = settings
+        self.lock = asyncio.Lock()
+        self.jobs: dict[str, WhisperXInstallJob] = {}
+        self.current_job_id: Optional[str] = None
+
+    def get(self, job_id: str) -> WhisperXInstallJob | None:
+        return self.jobs.get(job_id)
+
+    async def status(self) -> dict:
+        """Probe whether WhisperX is installed + healthy on its configured host."""
+        s = self.settings
+        host_present = bool(s.whisperx_host and s.whisperx_user)
+        if not host_present:
+            return {"configured": False, "installed": False, "healthy": False}
+        # Probe HTTP health
+        url = f"http://{s.whisperx_host}:{s.whisperx_port}/health"
+        try:
+            async with httpx.AsyncClient(timeout=3.0) as client:
+                r = await client.get(url)
+            if r.status_code == 200:
+                body = r.json()
+                return {
+                    "configured": True,
+                    "installed": True,
+                    "healthy": True,
+                    "model": body.get("model"),
+                    "device": body.get("device"),
+                    "diarizer_loaded": body.get("diarizer_loaded", False),
+                }
+        except Exception:
+            pass
+        # No HTTP — check if the container exists at all
+        container_present = await self._container_exists()
+        return {
+            "configured": True,
+            "installed": container_present,
+            "healthy": False,
+            "current_job_id": self.current_job_id,
+        }
+
+    async def _container_exists(self) -> bool:
+        s = self.settings
+        cmd = f"docker ps -a --filter name=^{s.whisperx_container}$ --format '{{{{.Names}}}}'"
+        rc, out, _ = await ssh_run(s.whisperx_host, s.whisperx_user, cmd, s, timeout=10)
+        return rc == 0 and s.whisperx_container in out
+
+    async def trigger(self) -> WhisperXInstallJob:
+        if self.lock.locked():
+            raise RuntimeError("a WhisperX install is already in progress")
+        s = self.settings
+        if not s.whisperx_host or not s.whisperx_user:
+            raise RuntimeError("whisperx host/user not configured")
+        for local_name in BUILD_FILES:
+            if not (BUILD_CONTEXT_DIR / local_name).exists():
+                raise RuntimeError(f"build context file missing inside spark-control image: {local_name}")
+        job = WhisperXInstallJob(
+            id=uuid.uuid4().hex[:8],
+            started_at=datetime.now(timezone.utc).isoformat(),
+        )
+        self.jobs[job.id] = job
+        self.current_job_id = job.id
+        asyncio.create_task(self._run(job))
+        return job
+
+    async def _run(self, job: WhisperXInstallJob) -> None:
+        async with self.lock:
+            try:
+                await self._do(job)
+                if job.state != "failed":
+                    job.state = "done"
+                    job.returncode = 0
+                    job.phase = "Done — WhisperX is running on port 8002"
+            except Exception as e:
+                job.append(f"[error] {type(e).__name__}: {e}")
+                job.state = "failed"
+                if job.returncode is None:
+                    job.returncode = 1
+            finally:
+                job.finished_at = datetime.now(timezone.utc).isoformat()
+                if self.current_job_id == job.id:
+                    self.current_job_id = None
+
+    async def _ssh_pipe(self, host: str, user: str, remote_cmd: str,
+                       payload: bytes, timeout: float = 60.0) -> tuple[bool, str, str]:
+        """ssh user@host <remote_cmd> with payload piped to stdin."""
+        args = _base_args(self.settings) + [f"{user}@{host}", remote_cmd]
+        proc = await asyncio.create_subprocess_exec(
+            *args,
+            stdin=asyncio.subprocess.PIPE,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
+        )
+        try:
+            stdout_b, stderr_b = await asyncio.wait_for(
+                proc.communicate(input=payload), timeout=timeout
+            )
+        except asyncio.TimeoutError:
+            proc.kill(); await proc.wait()
+            return False, "", f"timeout after {timeout}s"
+        return proc.returncode == 0, stdout_b.decode(errors="replace"), stderr_b.decode(errors="replace")
+
+    async def _do(self, job: WhisperXInstallJob) -> None:
+        s = self.settings
+        host = s.whisperx_host
+        user = s.whisperx_user
+        build_dir = "~/whisperx-build"
+
+        # ── Phase 1: stage build context on Spark 2 ──
+        job.state = "sending"
+        job.phase = "Sending build context to Spark 2…"
+        job.append(f"$ ssh {user}@{host} 'mkdir -p {build_dir}/app'")
+        rc, out, err = await ssh_run(host, user, f"mkdir -p {build_dir}/app && rm -f {build_dir}/Dockerfile {build_dir}/requirements.txt {build_dir}/README.md {build_dir}/app/main.py", s, timeout=10)
+        if rc != 0:
+            job.append(f"[mkdir failed] {err.strip()}")
+            raise RuntimeError("failed to create build directory")
+        for local_name, remote_rel in BUILD_FILES.items():
+            local_path = BUILD_CONTEXT_DIR / local_name
+            body = local_path.read_bytes()
+            remote_path = f"{build_dir}/{remote_rel}"
+            cmd = f"cat > {shlex.quote(remote_path)}"
+            ok, out, err = await self._ssh_pipe(host, user, cmd, body, timeout=30)
+            if not ok:
+                job.append(f"[scp {local_name} failed] {err.strip()[:200]}")
+                raise RuntimeError(f"failed to ship {local_name}")
+            job.append(f"  → {remote_path} ({len(body)} bytes)")
+
+        # ── Phase 2: docker build ──
+        job.state = "building"
+        job.phase = "Building Docker image on Spark 2 (this is the slow part — 5–15 min if base layers aren't cached)…"
+        build_cmd = (
+            f"set -e; "
+            f"cd {build_dir}; "
+            f"echo '=== docker build -t {s.whisperx_container}:latest . ==='; "
+            f"docker build -t {s.whisperx_container}:latest ."
+        )
+        job.append(f"$ {build_cmd}")
+        handle = StreamHandle()
+        async for line in ssh_stream(host, user, build_cmd, s, handle=handle):
+            job.append(line)
+            if "Step " in line and "/" in line:
+                # docker build progress: "Step 5/10 : RUN pip install ..."
+                job.phase = f"Building: {line.strip()[:120]}"
+            elif "Successfully built" in line or "naming to" in line:
+                job.phase = "Image built — preparing to start container…"
+        if (handle.returncode or 0) != 0:
+            job.returncode = handle.returncode
+            raise RuntimeError(f"docker build failed (rc={handle.returncode})")
+
+        # ── Phase 3: docker run ──
+        job.state = "running"
+        job.phase = "Starting container…"
+        run_cmd = (
+            f"set -e; "
+            f"echo '=== removing any prior {s.whisperx_container} container ==='; "
+            f"docker rm -f {s.whisperx_container} 2>/dev/null || true; "
+            f"echo '=== docker run -d --restart unless-stopped --name {s.whisperx_container} ==='; "
+            f"HF_TOKEN=$(cat ~/.cache/huggingface/token 2>/dev/null || true); "
+            f"if [ -z \"$HF_TOKEN\" ]; then echo 'WARN: no HF_TOKEN found at ~/.cache/huggingface/token — diarization will be disabled until you set one'; fi; "
+            f"docker run -d --restart unless-stopped "
+            f"--name {s.whisperx_container} "
+            f"--gpus all --memory=40g "
+            f"-p {s.whisperx_port}:{s.whisperx_port} "
+            f"-v whisperx-models:/root/.cache/huggingface "
+            f"-e HF_TOKEN=\"$HF_TOKEN\" "
+            f"-e WHISPER_MODEL={s.whisperx_model} "
+            f"{s.whisperx_container}:latest"
+        )
+        job.append(f"$ {run_cmd}")
+        rc, out, err = await ssh_run(host, user, run_cmd, s, timeout=60)
+        if rc != 0:
+            job.append(f"[docker run failed rc={rc}] {(err or out).strip()[:300]}")
+            raise RuntimeError("docker run failed")
+        job.append(out.strip())
+
+        # ── Phase 4: wait for /health to report ready ──
+        job.phase = "Container is starting; loading whisper + alignment + pyannote models (~60–120 s on first boot)…"
+        url = f"http://{s.whisperx_host}:{s.whisperx_port}/health"
+        ready = False
+        for i in range(60):           # up to ~180 s
+            await asyncio.sleep(3)
+            try:
+                async with httpx.AsyncClient(timeout=4.0) as client:
+                    r = await client.get(url)
+                if r.status_code == 200:
+                    body = r.json()
+                    if body.get("status") == "ready":
+                        ready = True
+                        job.append(f"[ready] {body}")
+                        break
+                    job.phase = f"Loading models (transcribe={body.get('transcribe_loaded')}, align={body.get('align_loaded')}, diarize={body.get('diarizer_loaded')})…"
+            except Exception:
+                pass
+        if not ready:
+            raise RuntimeError("container started but /health did not report ready within ~180 s — check `docker logs whisperx-asr` on Spark 2")
+        job.phase = "Done — WhisperX is healthy and reachable on port 8002"
@@ -0,0 +1,51 @@
+# WhisperX ASR + diarization container for Spark 2 (Blackwell GB10, sm_120).
+#
+# Replaces the custom Parakeet wrapper + Sortformer overlay with a single
+# mainline pipeline: faster-whisper for transcription + pyannote.audio 3.1
+# for diarization + wav2vec2 forced alignment for word-level timestamps.
+#
+# Build (on Spark 2, where Blackwell + nvcr.io credentials are available):
+#   docker build -t whisperx-asr:latest .
+#
+# Run:
+#   docker run -d --restart unless-stopped --name whisperx-asr \
+#     --gpus all --memory=40g \
+#     -p 8002:8002 \
+#     -v whisperx-models:/root/.cache/huggingface \
+#     -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
+#     -e WHISPER_MODEL=medium \
+#     whisperx-asr:latest
+#
+# The memory cap is intentional: even if WhisperX hits a pathological input,
+# it gets OOM-killed cleanly instead of swap-thrashing the whole Spark.
+
+FROM nvcr.io/nvidia/pytorch:25.11-py3
+
+# WhisperX runs ffmpeg under the hood for audio decoding
+RUN apt-get update \
+ && apt-get install -y --no-install-recommends ffmpeg \
+ && rm -rf /var/lib/apt/lists/*
+
+# Install whisperx + the FastAPI wrapper deps. --break-system-packages because
+# the NGC PyTorch image has its own managed Python that's flagged "system".
+COPY requirements.txt /tmp/requirements.txt
+RUN pip install --break-system-packages --no-cache-dir -r /tmp/requirements.txt
+
+# Pre-warm the default Whisper + alignment models at build time so first-call
+# latency on a fresh container is small. (~3 GB cached into the image; if you
+# want a smaller image, comment this out and accept the first-call download.)
+ARG WHISPER_MODEL=medium
+ENV WHISPER_MODEL=${WHISPER_MODEL}
+RUN python3 -c "import whisperx; whisperx.load_model('${WHISPER_MODEL}', 'cpu', compute_type='int8')" \
+ && python3 -c "import whisperx; whisperx.load_align_model(language_code='en', device='cpu')"
+
+WORKDIR /opt/whisperx
+COPY app /opt/whisperx/app
+
+# Expose for spark-control's proxy on Spark 2
+EXPOSE 8002
+
+HEALTHCHECK --interval=30s --timeout=10s --start-period=180s \
+  CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8002/health')" || exit 1
+
+CMD ["python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8002", "--workers", "1"]
@@ -0,0 +1,74 @@
+# WhisperX container for Spark 2
+
+Replaces the custom Parakeet wrapper + Sortformer overlay (v0.10/v0.11) with a
+single mainline pipeline:
+
+- **faster-whisper** (CTranslate2-optimized) for STT
+- **pyannote.audio 3.1** for speaker diarization (sliding-window — handles
+  long files in bounded memory, fixes the Sortformer OOM on 90-min audio)
+- **wav2vec2 forced alignment** for word-level timestamps
+
+Exposes the same API surface spark-control already proxies to, so the cutover
+is a one-URL change in the audio proxy:
+
+- `GET  /health` — readiness probe
+- `GET  /v1/models` — model list
+- `POST /v1/audio/transcriptions` — OpenAI-shaped STT
+- `POST /v1/audio/transcribe-with-speakers` — merged diarized transcript
+  (matches spark-control's response shape exactly)
+
+## Deploy to Spark 2
+
+```bash
+# 1. Copy this directory to Spark 2
+rsync -av --delete image/whisperx_container/ <spark-user>@<spark-2-ip>:~/whisperx-build/
+
+# 2. SSH in and build
+ssh <spark-user>@<spark-2-ip>
+cd ~/whisperx-build
+docker build -t whisperx-asr:latest .
+
+# 3. Run alongside the existing parakeet-asr (which stays on 8000 for now)
+docker run -d --restart unless-stopped --name whisperx-asr \
+  --gpus all --memory=40g \
+  -p 8002:8002 \
+  -v whisperx-models:/root/.cache/huggingface \
+  -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
+  -e WHISPER_MODEL=medium \
+  whisperx-asr:latest
+
+# 4. Watch first-start logs (model load + first health check)
+docker logs -f whisperx-asr
+```
+
+## Model size knobs
+
+`WHISPER_MODEL` env var. Defaults to `medium`. Options:
+
+| Model | Size | Speed (GB10) | Quality |
+|---|---|---|---|
+| `tiny`  | ~75M  | ~120x rt | low |
+| `base`  | ~74M  | ~80x rt  | ok |
+| `small` | ~244M | ~50x rt  | good |
+| `medium`| ~769M | ~30x rt  | excellent (**default**) |
+| `large-v3`| ~1.5B | ~15x rt | best |
+
+For a 90-min file, medium takes ~3 min STT + ~9 min diarize ≈ ~12 min total.
+
+## Memory budget
+
+The `--memory=40g` cap is intentional. Spark 2 has 122 GB unified, of which
+~35 GB is consumed by parakeet-asr + magpie-tts. The 40 GB cap leaves
+comfortable headroom for both the model weights (~5 GB) and pyannote's
+in-memory features (~5–15 GB for a 90-min audio). If WhisperX hits a
+pathological input it gets OOM-killed cleanly instead of swap-thrashing the
+whole Spark — the symptom we hit with the unbounded Sortformer container.
+
+## Rollback to Parakeet+Sortformer
+
+```bash
+docker stop whisperx-asr && docker rm whisperx-asr
+```
+
+The parakeet-asr container stays running throughout — spark-control's proxy
+URL switch is reversible via config or version downgrade.
@@ -0,0 +1,355 @@
+"""WhisperX FastAPI wrapper — STT + speaker diarization in a single endpoint.
+
+Endpoints (designed to be drop-in compatible with the existing spark-control
+audio API surface, so the proxy just changes its upstream URL):
+
+  GET  /                                 — service info
+  GET  /health                           — readiness probe
+  GET  /v1/models                        — list loaded models
+  POST /v1/audio/transcriptions          — OpenAI-shaped STT (no speakers)
+  POST /v1/audio/transcribe-with-speakers — merged diarized transcript
+
+The /transcribe-with-speakers response shape EXACTLY matches what
+spark-control's /api/audio/transcribe-with-speakers returns today (the one
+that recap-relay's PR spec was written against), so swapping the upstream
+from Parakeet+Sortformer to WhisperX is a one-URL change in the proxy.
+"""
+from __future__ import annotations
+import os
+import time
+import tempfile
+import logging
+from contextlib import asynccontextmanager
+from typing import Optional
+
+import torch
+import whisperx
+from fastapi import FastAPI, File, Form, UploadFile, HTTPException
+from fastapi.responses import JSONResponse
+from fastapi.middleware.cors import CORSMiddleware
+
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+logger = logging.getLogger("whisperx-api")
+
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+COMPUTE_TYPE = os.getenv("COMPUTE_TYPE", "float16" if DEVICE == "cuda" else "int8")
+WHISPER_MODEL = os.getenv("WHISPER_MODEL", "medium")
+DEFAULT_LANG = os.getenv("DEFAULT_LANGUAGE", "en")
+BATCH_SIZE = int(os.getenv("BATCH_SIZE", "16"))
+HF_TOKEN = os.getenv("HF_TOKEN") or None
+
+
+class WhisperXEngine:
+    def __init__(self) -> None:
+        self.transcribe_model = None
+        self.align_model = None
+        self.align_metadata = None
+        self.diarize_model = None
+        self._loaded = False
+
+    def load(self) -> None:
+        if self._loaded:
+            return
+        logger.info(f"Loading whisper-{WHISPER_MODEL} on {DEVICE} ({COMPUTE_TYPE})")
+        self.transcribe_model = whisperx.load_model(
+            WHISPER_MODEL, DEVICE, compute_type=COMPUTE_TYPE
+        )
+        logger.info(f"Loading alignment model for {DEFAULT_LANG}")
+        self.align_model, self.align_metadata = whisperx.load_align_model(
+            language_code=DEFAULT_LANG, device=DEVICE
+        )
+        if HF_TOKEN:
+            logger.info("Loading pyannote diarization pipeline (3.1)")
+            try:
+                self.diarize_model = whisperx.DiarizationPipeline(
+                    use_auth_token=HF_TOKEN, device=DEVICE
+                )
+            except Exception as e:
+                logger.exception(f"Diarization pipeline failed to load: {e}")
+                self.diarize_model = None
+        else:
+            logger.warning(
+                "HF_TOKEN not set — diarization disabled. /transcribe-with-speakers "
+                "will return 503. /transcriptions still works."
+            )
+        self._loaded = True
+        logger.info("WhisperX engine ready")
+
+    def transcribe(self, audio_bytes: bytes, filename: str, want_timestamps: bool = True) -> dict:
+        if not self._loaded:
+            self.load()
+        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
+            tmp.write(audio_bytes)
+            tmp_path = tmp.name
+        try:
+            audio = whisperx.load_audio(tmp_path)
+            duration = float(audio.shape[0]) / 16000.0
+            result = self.transcribe_model.transcribe(
+                audio, batch_size=BATCH_SIZE, language=DEFAULT_LANG
+            )
+            language = result.get("language") or DEFAULT_LANG
+            if want_timestamps:
+                aligned = whisperx.align(
+                    result["segments"],
+                    self.align_model,
+                    self.align_metadata,
+                    audio,
+                    DEVICE,
+                    return_char_alignments=False,
+                )
+                segments = aligned.get("segments", [])
+            else:
+                segments = result.get("segments", [])
+            full_text = " ".join(s.get("text", "").strip() for s in segments).strip()
+            return {
+                "duration": duration,
+                "language": language,
+                "text": full_text,
+                "segments": segments,
+                "audio_path": tmp_path,
+                "audio": audio,  # caller can reuse for diarization without re-loading
+            }
+        finally:
+            # NOTE: caller is responsible for unlinking the temp file. We expose it
+            # in the return dict so diarization can run on the same audio without
+            # disk re-IO. The unlink happens in the request handler's finally.
+            pass
+
+    def diarize(self, audio) -> dict:
+        if self.diarize_model is None:
+            raise RuntimeError(
+                "Diarization pipeline not loaded (HF_TOKEN missing or load failed)"
+            )
+        diar = self.diarize_model(audio)
+        return diar
+
+
+engine = WhisperXEngine()
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    engine.load()
+    yield
+
+
+app = FastAPI(
+    title="WhisperX ASR + Diarization",
+    version="1.0.0",
+    lifespan=lifespan,
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+
+@app.get("/")
+async def root() -> dict:
+    return {
+        "service": "whisperx",
+        "device": DEVICE,
+        "models": {
+            "transcription": f"whisper-{WHISPER_MODEL}",
+            "alignment": f"wav2vec2-{DEFAULT_LANG}",
+            "diarization": "pyannote-speaker-diarization-3.1" if engine.diarize_model else None,
+        },
+        "endpoints": {
+            "transcriptions": "/v1/audio/transcriptions",
+            "transcribe_with_speakers": "/v1/audio/transcribe-with-speakers",
+            "models": "/v1/models",
+            "health": "/health",
+        },
+    }
+
+
+@app.get("/health")
+async def health() -> dict:
+    return {
+        "status": "ready" if engine._loaded else "loading",
+        "transcribe_loaded": engine.transcribe_model is not None,
+        "align_loaded": engine.align_model is not None,
+        "diarizer_loaded": engine.diarize_model is not None,
+        "model": f"whisper-{WHISPER_MODEL}",
+        "device": DEVICE,
+    }
+
+
+@app.get("/v1/models")
+async def list_models() -> dict:
+    data = [
+        {"id": f"whisper-{WHISPER_MODEL}", "object": "model", "owned_by": "openai", "kind": "stt"},
+    ]
+    if engine.diarize_model is not None:
+        data.append(
+            {"id": "pyannote-speaker-diarization-3.1", "object": "model",
+             "owned_by": "pyannote", "kind": "diarization"}
+        )
+    return {"object": "list", "data": data}
+
+
+def _normalize_speaker(label: str) -> str:
+    """WhisperX/pyannote uses 'SPEAKER_00' / 'SPEAKER_01' / ... — normalize to
+    the same 'Speaker_0' shape spark-control's existing endpoint returns."""
+    if not label:
+        return "Speaker_unknown"
+    if label.upper().startswith("SPEAKER_"):
+        idx = label.split("_", 1)[1].lstrip("0") or "0"
+        return f"Speaker_{idx}"
+    return label
+
+
+def _segments_to_blocks(segments: list[dict]) -> list[dict]:
+    """Convert WhisperX's per-utterance segments into the
+    [{start_ms, end_ms, speaker, text}, ...] block shape spark-control returns
+    today. Groups consecutive same-speaker segments into one block."""
+    blocks: list[dict] = []
+    cur = None
+    for s in segments:
+        spk_raw = s.get("speaker") or "Speaker_unknown"
+        spk = _normalize_speaker(spk_raw)
+        text = (s.get("text") or "").strip()
+        start_ms = int(float(s.get("start", 0)) * 1000)
+        end_ms = int(float(s.get("end", 0)) * 1000)
+        if not text:
+            continue
+        if cur is None or cur["speaker"] != spk or start_ms - cur["end_ms"] > 1500:
+            if cur is not None:
+                blocks.append(cur)
+            cur = {"start_ms": start_ms, "end_ms": end_ms, "speaker": spk, "text": text}
+        else:
+            cur["text"] = (cur["text"] + " " + text).strip()
+            cur["end_ms"] = end_ms
+    if cur is not None:
+        blocks.append(cur)
+    return blocks
+
+
+@app.post("/v1/audio/transcriptions")
+async def transcribe(
+    file: UploadFile = File(...),
+    model: Optional[str] = Form(default=None),
+    language: Optional[str] = Form(default=None),
+    response_format: Optional[str] = Form(default="json"),
+    temperature: Optional[float] = Form(default=None),
+    prompt: Optional[str] = Form(default=None),
+):
+    if not engine._loaded:
+        raise HTTPException(status_code=503, detail="Engine loading")
+    audio_bytes = await file.read()
+    if not audio_bytes:
+        raise HTTPException(status_code=400, detail="Empty file")
+
+    start_t = time.time()
+    audio_path = None
+    try:
+        result = engine.transcribe(
+            audio_bytes,
+            file.filename or "audio.wav",
+            want_timestamps=(response_format == "verbose_json"),
+        )
+        audio_path = result.pop("audio_path", None)
+        result.pop("audio", None)
+    except Exception as e:
+        logger.exception("Transcription failed")
+        raise HTTPException(status_code=500, detail=f"Failed: {e}")
+    finally:
+        if audio_path:
+            try: os.unlink(audio_path)
+            except OSError: pass
+
+    elapsed = time.time() - start_t
+    duration = result.get("duration", 0.0)
+    logger.info(f"Transcribed {duration:.1f}s in {elapsed:.1f}s ({duration/elapsed:.0f}x rt)")
+
+    if response_format == "text":
+        return JSONResponse(content=result["text"], media_type="text/plain")
+    if response_format == "verbose_json":
+        words = []
+        for s in result.get("segments", []):
+            for w in s.get("words", []) or []:
+                words.append({
+                    "word": w.get("word"),
+                    "start": w.get("start"),
+                    "end": w.get("end"),
+                    "score": w.get("score"),
+                })
+        return {
+            "task": "transcribe",
+            "language": result.get("language", "en"),
+            "duration": duration,
+            "text": result["text"],
+            "segments": [
+                {"start": s.get("start"), "end": s.get("end"), "text": s.get("text", "").strip()}
+                for s in result.get("segments", [])
+            ],
+            "words": words,
+        }
+    return {"text": result["text"]}
+
+
+@app.post("/v1/audio/transcribe-with-speakers")
+async def transcribe_with_speakers(file: UploadFile = File(...)) -> dict:
+    """Merged STT + diarization. Response shape matches spark-control's
+    /api/audio/transcribe-with-speakers exactly — recap-relay's PR spec
+    needs no changes when we cut over."""
+    if not engine._loaded:
+        raise HTTPException(status_code=503, detail="Engine loading")
+    if engine.diarize_model is None:
+        raise HTTPException(
+            status_code=503,
+            detail="Diarization unavailable — HF_TOKEN not set or pyannote failed to load",
+        )
+    audio_bytes = await file.read()
+    if not audio_bytes:
+        raise HTTPException(status_code=400, detail="Empty file")
+
+    start_t = time.time()
+    audio_path = None
+    try:
+        result = engine.transcribe(
+            audio_bytes, file.filename or "audio.wav", want_timestamps=True
+        )
+        audio_path = result.pop("audio_path", None)
+        audio = result.pop("audio")
+        # Diarize on the in-memory audio (no second decode)
+        logger.info("Running pyannote diarization…")
+        diar = engine.diarize(audio)
+        # whisperx.assign_word_speakers writes speaker labels into the
+        # aligned segments + their nested words
+        result_with_speakers = whisperx.assign_word_speakers(
+            diar, {"segments": result["segments"]}
+        )
+        segments_in = result_with_speakers.get("segments", [])
+        blocks = _segments_to_blocks(segments_in)
+        speakers = sorted({b["speaker"] for b in blocks if b["speaker"] != "Speaker_unknown"})
+    except Exception as e:
+        logger.exception("Diarized transcription failed")
+        raise HTTPException(status_code=500, detail=f"Failed: {e}")
+    finally:
+        if audio_path:
+            try: os.unlink(audio_path)
+            except OSError: pass
+
+    elapsed = time.time() - start_t
+    duration = result.get("duration", 0.0)
+    logger.info(
+        f"Transcribed+diarized {duration:.1f}s in {elapsed:.1f}s "
+        f"({duration/elapsed:.0f}x rt), {len(speakers)} speakers, {len(blocks)} blocks"
+    )
+    return {
+        "duration": duration,
+        "language": result.get("language", "en"),
+        "speakers_detected": speakers,
+        "segments": blocks,
+        "models": {
+            "transcription": f"whisper-{WHISPER_MODEL}",
+            "diarization": "pyannote-speaker-diarization-3.1",
+        },
+    }
@@ -0,0 +1,5 @@
+whisperx==3.4.3
+fastapi>=0.115
+uvicorn[standard]>=0.32
+python-multipart>=0.0.9
+soundfile>=0.12
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'

 export const v0_1_0 = VersionInfo.of({
-  version: '0.11.0:3',
+  version: '0.12.0:0',
  releaseNotes: {
    en_US:
-      'v0.11.0:3 — button sizing fix. The "Reapply patches", "Restart container", "Switch to this", and "Download" buttons inherited 15px from the body font. Only the service-card action buttons (Start/Restart/Stop on parakeet/magpie) had an explicit 12px override — exactly the size you liked. Changed the base .btn to 12px font + 6px 12px padding so every action button across the dashboard matches the service-card button footprint. Per-context overrides (.service-actions .btn, .nim-card .btn, etc.) are now redundant but kept in place; they no longer make a visible difference.',
+      'v0.12.0 — WhisperX as a one-click dashboard install. The Audio / Speech tab now shows an "Add WhisperX" banner the first time you open it (when WhisperX isn\'t installed). Clicking it ships the build context to Spark 2 over SSH, runs docker build (~10–15 min first time), runs docker run with a 40 GB memory cap (so a long-audio pathological case gets OOM-killed cleanly instead of swap-thrashing the whole Spark — what bit us with Sortformer on a 90-min file), and polls /health until both Whisper + pyannote 3.1 report loaded. Progress streams live in a build-log dialog with phase + elapsed timer. Once installed, WhisperX auto-appears as a managed service alongside Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart on wedge — same lifecycle as the others). The /api/audio/transcribe-with-speakers endpoint now prefers WhisperX when it\'s healthy and falls back to the legacy Parakeet + Sortformer path otherwise — clean cutover, no client-side changes, easy rollback. New endpoints: GET /api/whisperx/status, POST /api/whisperx/install, GET /api/whisperx/install/{job_id}, GET /api/whisperx/install/{job_id}/stream (SSE).',
  },
  migrations: {
    up: async ({ effects }) => {},