
Keysat 713cd09cc2 v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers

Adds a new pipeline for diarized transcription that any client (recap-relay,
ad-hoc curl, future Mac-side tools) can call. Pure data pipeline, no LLM
or UI included — name resolution / analysis happen downstream where prompts
and rendering are configurable.

Architecture:
  Spark 2 / parakeet-asr container:
    + /opt/parakeet/app/diarizer.py        (new: SortformerDiarizer class)
    + /opt/parakeet/app/main.py            (patched: loads diarizer, adds
                                            /v1/audio/diarize endpoint)
    Model: nvidia/diar_sortformer_4spk-v1  (~150 MB, ungated, NeMo native)

  Spark Control:
    + POST /api/audio/transcribe-with-speakers
      Body: multipart file
      Returns: {
        duration, language, speakers_detected,
        segments: [{start_ms, end_ms, speaker, text}, ...],
        models: {transcription, diarization}
      }
      Runs Parakeet ASR + Sortformer in parallel, merges words to speaker
      turns by timestamp, groups into speaker-change blocks (breaks also
      on >1.5s silence gaps).
    + If Parakeet 500s mid-pipeline, kicks deep-health probe and returns
      503/Retry-After: 60 — same wedge-recovery pattern as v0.9.0:2.
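
The merge step described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code shipped in this commit: the `Word` and `Turn` types, the `speaker_at` and `merge` helpers, the max-overlap assignment rule, and the `speaker_0`-style labels are all assumptions. Only the 1.5 s gap threshold and the output keys (`start_ms`, `end_ms`, `speaker`, `text`) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

GAP_MS = 1500  # break a block on silences longer than 1.5 s (from the commit text)


@dataclass
class Word:  # hypothetical shape for one ASR word with timestamps
    start_ms: int
    end_ms: int
    text: str


@dataclass
class Turn:  # hypothetical shape for one diarizer speaker turn
    start_ms: int
    end_ms: int
    speaker: str


def speaker_at(word: Word, turns: List[Turn]) -> str:
    """Pick the turn that overlaps the word the most; fall back to 'unknown'."""
    best, best_overlap = "unknown", 0
    for t in turns:
        overlap = min(word.end_ms, t.end_ms) - max(word.start_ms, t.start_ms)
        if overlap > best_overlap:
            best, best_overlap = t.speaker, overlap
    return best


def merge(words: List[Word], turns: List[Turn]) -> List[Dict]:
    """Group words into blocks, splitting on a speaker change or a long gap."""
    segments: List[Dict] = []
    for w in sorted(words, key=lambda x: x.start_ms):
        spk = speaker_at(w, turns)
        last = segments[-1] if segments else None
        same_block = (
            last is not None
            and last["speaker"] == spk
            and w.start_ms - last["end_ms"] <= GAP_MS
        )
        if same_block:
            # Extend the current block with this word.
            last["end_ms"] = max(last["end_ms"], w.end_ms)
            last["text"] += " " + w.text
        else:
            segments.append(
                {"start_ms": w.start_ms, "end_ms": w.end_ms, "speaker": spk, "text": w.text}
            )
    return segments
```

Assigning each word by maximum overlap is one simple choice; a midpoint lookup would also work and may behave differently on words that straddle a turn boundary.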

Apply Sortformer patches to the running Parakeet container with:
  bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user>

Patches are reversible — apply.sh backs up the original main.py inside the
container at main.py.pre-sortformer before overwriting. Restore by copying
that file back and removing diarizer.py, then docker restart.

v0.11 follow-up: dashboard "Speech Models" panel to swap/update model
versions from the UI instead of needing to re-run apply.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 15:14:48 -05:00

File | Last commit | Date
image | v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers | 2026-05-18 15:14:48 -05:00
package | v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers | 2026-05-18 15:14:48 -05:00
.gitignore | Initial scaffold: image/ FastAPI app, models.yaml, docs | 2026-05-12 09:29:13 -05:00
claude-code-starter-prompt.md | Add safe optimization flags to gemma4 + qwen36 (fastsafetensors, prefix-caching, fp8 kv) | 2026-05-12 09:49:08 -05:00
known-issues.md | v0.8.0:3 - add --max-num-batched-tokens=16384 to vision models (gemma4, qwen3-vl) | 2026-05-12 14:47:32 -05:00
LICENSE | Pack spark-control_x86_64.s9pk (55 MB) | 2026-05-12 09:52:53 -05:00
README.md | v0.6.0 - Service-level connectivity tracking + passive failure-report endpoint | 2026-05-12 13:19:27 -05:00
runbook.md | Add per-model descriptions + repo-cleanup polish | 2026-05-12 10:19:09 -05:00

README.md

spark-control

A browser-based control panel for a dual-DGX-Spark vLLM cluster. Designed to run as a StartOS 0.4 package on a Start9 server on the same LAN as the Sparks.

What it does

Shows which LLM is currently loaded on the cluster (:8888/v1/models).
Click to swap to a different model: this stops the current one, launches the new one, and streams logs to the UI until "Application startup complete." appears.
Surfaces health for Parakeet (STT, :8000) and Magpie (TTS, :9000) on Spark 2.

Architecture

[Browser/phone] ──► [StartOS reverse proxy] ──► [spark-control container]
                                                       │  (SSH over LAN)
                                                       ▼
                                                  [Spark 1] ──► launch-cluster.sh
                                                       │
                                                       ▼
                                                  [Spark 2]

Two layers in this repo:

image/ — a self-contained FastAPI app + static UI. Runs anywhere with uvicorn and an SSH client. Useful for development.
package/ — a thin StartOS 0.4 wrapper that packages the image, exposes the UI on the LAN, and gives the user actions to configure SSH access to the Sparks.

Quick start (local dev, no StartOS yet)

cd image
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
export SPARK1_HOST=<spark-1-ip>
export SPARK1_USER=<spark-user>
export SPARK2_HOST=<spark-2-ip>
export SPARK2_USER=<spark-user>
export SSH_KEY_PATH="$HOME/Library/Application Support/NVIDIA/Sync/config/nvsync.key"
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload

Open http://localhost:9999.

Note: use the IP <spark-1-ip> for Spark 1, not <spark-1-host>.local. mDNS resolves to IPv6 first and httpx hangs on it because vLLM only binds IPv4.
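
If only a hostname is at hand, one way to sidestep the IPv6-first behavior is to resolve it to an IPv4 address up front and export that. A minimal sketch, assuming the hostname is resolvable through the OS resolver (whether `.local` mDNS names resolve this way depends on the platform); `resolve_ipv4` is a hypothetical helper, not part of this repo:

```python
import socket


def resolve_ipv4(host: str) -> str:
    """Return the first IPv4 address for host, ignoring any IPv6 results."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    if not infos:
        raise OSError(f"no IPv4 address found for {host!r}")
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return infos[0][4][0]
```

The returned address can then be used as the value of SPARK1_HOST in place of the hostname.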

Build the StartOS package

cd package
npm i        # one-time
make x86     # produces spark-control_x86_64.s9pk (~55 MB)

Requires start-cli, Node ≥ 22, Docker. The build runs tsc + ncc for the TS bundle, then docker build on image/Dockerfile, then start-cli s9pk pack to produce the .s9pk.

To sideload onto your Start9: make install (needs host: set in ~/.startos/config.yaml), or upload the .s9pk via the Start9 web UI's sideload feature.

Post-install setup (one-time per Start9 install)

1. Open the Spark Control service → Actions → Show Public Key → copy the line.
2. SSH to each Spark and append the line to ~/.ssh/authorized_keys for the <spark-user> user.
3. Actions → Configure Sparks → enter <spark-1-ip> / <spark-user> for Spark 1 and <spark-2-ip> / <spark-user> for Spark 2.
4. Start the service and open the Web UI. The current model and health should show within ~5 s.

Repo layout

image/ — Docker image source (FastAPI app + models.yaml)
package/ — StartOS 0.4 package source
runbook.md — operating notes
known-issues.md — known quirks and workarounds
LICENSE — MIT

Service discovery API

Other services on your LAN can hit GET /api/endpoints to learn where the current model lives without hardcoding Spark IPs. Stable JSON shape:

{
  "vllm":    { "ready": true,  "base_url": "http://<spark-1-ip>:8888/v1", "model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4", "openai_compat": true },
  "parakeet":{ "ready": true,  "base_url": "http://<spark-2-ip>:8000",   "kind": "stt", "model": "nvidia/parakeet-tdt-0.6b-v3" },
  "magpie":  { "ready": false, "base_url": "http://<spark-2-ip>:9000",   "kind": "tts" }
}

base_url is filled in whenever Configure Sparks has been completed, even if the underlying service isn't currently up. Route traffic to a base_url only when the same entry also reports ready: true.
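
A client might apply that rule along these lines. This is a sketch over the documented JSON shape only; `ready_base_url` is a hypothetical helper, and fetching the payload from GET /api/endpoints is left to whatever HTTP client the caller already uses.

```python
import json
from typing import Optional


def ready_base_url(endpoints: dict, service: str) -> Optional[str]:
    """Return base_url for service only when it is both configured and ready."""
    entry = endpoints.get(service) or {}
    if entry.get("ready") is True and entry.get("base_url"):
        return entry["base_url"]
    return None


# Sample payload mirroring the documented shape (placeholder hosts kept as-is).
sample = json.loads("""
{
  "vllm":     {"ready": true,  "base_url": "http://<spark-1-ip>:8888/v1",
               "model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4", "openai_compat": true},
  "parakeet": {"ready": true,  "base_url": "http://<spark-2-ip>:8000",
               "kind": "stt", "model": "nvidia/parakeet-tdt-0.6b-v3"},
  "magpie":   {"ready": false, "base_url": "http://<spark-2-ip>:9000", "kind": "tts"}
}
""")

print(ready_base_url(sample, "vllm"))    # http://<spark-1-ip>:8888/v1
print(ready_base_url(sample, "magpie"))  # None (configured but not ready)
```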

Reporting failures from external apps

Spark Control polls every 5 s, so a brief blip in Parakeet/Magpie/vLLM availability can slip between polls and never make it into the connectivity log. To capture short failures, an external app (e.g. Open WebUI) can POST whenever a call fails (or succeeds):

curl -X POST http://<dashboard-url>/api/health-event \
  -H 'content-type: application/json' \
  -d '{
    "service": "parakeet",
    "ok": false,
    "source": "open-webui",
    "error": "HTTP 503",
    "ms": 420
  }'

Fields: service (required), ok (required), source (optional, free-form), error (optional), ms (optional latency). Each POST appends a report event to the connectivity log alongside the polling-based transition events.
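
The same report can be sent from Python. The sketch below uses only the standard library and the fields listed above; `build_health_event` and `post_health_event` are hypothetical helper names, and the response status handling is an assumption, since the endpoint's reply is not documented here.

```python
import json
import urllib.request
from typing import Optional


def build_health_event(
    service: str,
    ok: bool,
    source: Optional[str] = None,
    error: Optional[str] = None,
    ms: Optional[int] = None,
) -> dict:
    """Build the JSON body: service and ok are required, the rest optional."""
    event = {"service": service, "ok": bool(ok)}
    for key, value in (("source", source), ("error", error), ("ms", ms)):
        if value is not None:  # omit optional fields that were not supplied
            event[key] = value
    return event


def post_health_event(dashboard_url: str, event: dict, timeout: float = 5.0) -> int:
    """POST the event to /api/health-event and return the HTTP status code."""
    req = urllib.request.Request(
        dashboard_url.rstrip("/") + "/api/health-event",
        data=json.dumps(event).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building the body separately from sending it keeps the reporting call easy to wrap in a try/except, so a dashboard outage never breaks the app doing the reporting.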

Status

v0.2.3 — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.

What v0.2 added on top of v0.1

Service discovery API (/api/endpoints) for other LAN services
Magpie crash fix documented (chown the model-cache volume to uid 1000)
Always-on services panel with Start/Stop/Restart for Parakeet + Magpie, plus per-service host configuration in Configure Sparks (so Parakeet/Magpie can live on Spark 1, Spark 2, or anywhere)
Model download from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
spark-vllm-docker update check — banner shows "N commits behind upstream"; Apply Update runs git pull && ./build-and-copy.sh -c over SSH with a streamed log
Per-model Advanced settings — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to /data/models-overrides.yaml so they survive package updates. Bundled and custom models alike.

v0.3+ roadmap (loose): richer dashboard (SSH/GPU/tokens-per-sec), Open WebUI deep-link integration, optional auth, multi-cluster.

Releases 2

0.23.0:0 Latest

2026-06-18 03:27:48 +00:00

Languages: Python 65.6%, JavaScript 15.9%, CSS 5.3%, HTML 4.6%, TypeScript 4.3%, Other 4.3%