v0.12.0:0 - WhisperX as a one-click dashboard install + managed service

Replaces the manual rsync+build+run with a proper spark-control feature.
First in the audio path that doesn't require shell access on Spark 2.

What's in the box
─────────────────
* image/whisperx_container/   - the build context (Dockerfile, requirements,
  app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT +
  pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint
  /v1/audio/transcribe-with-speakers returns the exact same shape spark-
  control's existing endpoint does, so the recap-relay PR spec needs no
  changes when we cut over.

* image/app/whisperx_install.py - install manager. ships build context to
  Spark 2 over SSH, runs `docker build`, runs `docker run` with 40 GB
  memory cap (vs Sortformer's unbounded which thrashed Spark 2 on a 90-min
  file), polls /health until both Whisper + pyannote report loaded.

* Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX
  when its /health reports diarizer_loaded=true, falls back to the legacy
  Parakeet + Sortformer path otherwise. Same response shape either way.
  Clean cutover, easy rollback (`docker rm whisperx-asr`).

* Dashboard (Audio / Speech tab):
  - "Add WhisperX" banner appears when not installed, with a primary
    "Install WhisperX" button. One click triggers the install.
  - Build progress dialog with phase + elapsed timer + live build log via
    SSE (`/api/whisperx/install/{job_id}/stream`).
  - After install, WhisperX auto-registers as a managed service alongside
    Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart).
  - Banner self-hides once /api/whisperx/status reports healthy.

New endpoints
─────────────
  GET  /api/whisperx/status
  POST /api/whisperx/install
  GET  /api/whisperx/install/{job_id}
  GET  /api/whisperx/install/{job_id}/stream  (SSE phase + log)

Config additions (env)
──────────────────────
  WHISPERX_HOST       (defaults to spark2_host)
  WHISPERX_USER       (defaults to spark2_user)
  WHISPERX_CONTAINER  (default: whisperx-asr)
  WHISPERX_PORT       (default: 8002)
  WHISPERX_MODEL      (default: medium; tiny/base/small/medium/large-v3)

Dockerfile
──────────
Added COPY whisperx_container /app/whisperx_container so the runtime
install manager can read the build context from inside the spark-control
image and ship it over SSH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Keysat
2026-05-18 21:02:26 -05:00
parent cfc1c408d4
commit 5a0bfba6a3
14 changed files with 1033 additions and 3 deletions
+2 -2
View File
@@ -1,10 +1,10 @@
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({
version: '0.11.0:3',
version: '0.12.0:0',
releaseNotes: {
en_US:
'v0.11.0:3button sizing fix. The "Reapply patches", "Restart container", "Switch to this", and "Download" buttons inherited 15px from the body font. Only the service-card action buttons (Start/Restart/Stop on parakeet/magpie) had an explicit 12px override — exactly the size you liked. Changed the base .btn to 12px font + 6px 12px padding so every action button across the dashboard matches the service-card button footprint. Per-context overrides (.service-actions .btn, .nim-card .btn, etc.) are now redundant but kept in place; they no longer make a visible difference.',
'v0.12.0 — WhisperX as a one-click dashboard install. The Audio / Speech tab now shows an "Add WhisperX" banner the first time you open it (when WhisperX isn\'t installed). Clicking it ships the build context to Spark 2 over SSH, runs docker build (~1015 min first time), runs docker run with a 40 GB memory cap (so a long-audio pathological case gets OOM-killed cleanly instead of swap-thrashing the whole Spark — what bit us with Sortformer on a 90-min file), and polls /health until both Whisper + pyannote 3.1 report loaded. Progress streams live in a build-log dialog with phase + elapsed timer. Once installed, WhisperX auto-appears as a managed service alongside Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart on wedge — same lifecycle as the others). The /api/audio/transcribe-with-speakers endpoint now prefers WhisperX when it\'s healthy and falls back to the legacy Parakeet + Sortformer path otherwise — clean cutover, no client-side changes, easy rollback. New endpoints: GET /api/whisperx/status, POST /api/whisperx/install, GET /api/whisperx/install/{job_id}, GET /api/whisperx/install/{job_id}/stream (SSE).',
},
migrations: {
up: async ({ effects }) => {},