Keysat 1e1e1cb568 v0.27.1:0 - fix model download: prepend ~/.local/bin so SSH finds uvx
hf-download.sh shells out to uvx (the uv installer drops it in ~/.local/bin),
but the non-interactive SSH session doesn't source the user's profile, so
~/.local/bin was off PATH and downloads died with "uvx: command not found".
build_download_command now prepends $HOME/.local/bin. Adds test_download.py.
2026-06-18 16:44:07 -05:00
2026-05-12 09:52:53 -05:00

spark-control

A browser-based control panel for a dual-DGX-Spark vLLM cluster. Designed to run as a StartOS 0.4 package on a Start9 server on the same LAN as the Sparks.

If you've just received this package from someone, start with HANDOFF.md — it has the prereq checklist and a step-by-step install guide written for a fresh user.

What it does

  • Shows which LLM is currently loaded on the cluster (<spark1-host>:8888/v1/models).
  • Click to swap to a different model — stops the current one, launches the new one, streams logs to the UI until Application startup complete. appears.
  • Surfaces health for Parakeet (STT, :8000) and Kokoro (TTS, :8880) on Spark 2.
  • Proxies OpenAI-compatible chat-completions, transcribe, diarize, and TTS through one trusted host so external apps only need to know about Spark Control.

Architecture

[Browser/phone] ──► [StartOS reverse proxy] ──► [spark-control container]
                                                       │  (SSH over LAN)
                                                       ▼
                                                  [Spark 1] ──► launch-cluster.sh
                                                       │
                                                       ▼
                                                  [Spark 2]

Two layers in this repo:

  • image/ — a self-contained FastAPI app + static UI. Runs anywhere with uvicorn and an SSH client. Useful for development.
  • package/ — a thin StartOS 0.4 wrapper that packages the image, exposes the UI on the LAN, and gives the user actions to configure SSH access to the Sparks.

Quick start (local dev, no StartOS yet)

cd image
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
export SPARK1_HOST=<spark-1-ip>
export SPARK1_USER=<your-ssh-user>
export SPARK2_HOST=<spark-2-ip>
export SPARK2_USER=<your-ssh-user>
export SSH_KEY_PATH=<path-to-your-private-key>
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload

Open http://localhost:9999.

Note: prefer the IP for Spark 1 over a .local hostname. mDNS can resolve to IPv6 first, and httpx will hang on it because vLLM only binds IPv4.

Build the StartOS package

cd package
npm i        # one-time
make x86     # produces spark-control_x86_64.s9pk (~55 MB)
# or
make aarch64 # for ARM-based Start9 servers

Requires start-cli, Node ≥ 22, Docker. The build runs tsc + ncc for the TS bundle, then docker build on image/Dockerfile, then start-cli s9pk pack to produce the .s9pk.

To sideload onto your Start9: make install (needs host: set in ~/.startos/config.yaml), or upload the .s9pk via the Start9 web UI's sideload feature.

Post-install setup (one-time per Start9 install)

  1. Open the Spark Control service → ActionsShow Public Key → copy the produced one-liner.
  2. Run that one-liner from any machine that already has SSH access to your Sparks. It appends the package's pubkey to ~/.ssh/authorized_keys on each Spark.
  3. ActionsConfigure Sparks → enter your Spark 1 / Spark 2 IPs and the SSH username you use to log into them.
  4. Start the service. Open the Web UI — current model + health should show within ~5 s.

See HANDOFF.md for a fuller prereq checklist and the hardware-side setup required before this package is useful.

Repo layout

  • image/ — Docker image source (FastAPI app + models.yaml)
  • package/ — StartOS 0.4 package source
  • HANDOFF.md — prereqs + first-time install guide for a fresh user
  • runbook.md — operating notes
  • known-issues.md — known quirks and workarounds
  • LICENSE — MIT

Service discovery API

Other services on your LAN can hit GET /api/endpoints to learn where the current model lives without hardcoding Spark IPs. Stable JSON shape:

{
  "vllm":    { "ready": true,  "base_url": "http://<spark1-host>:8888/v1", "model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4", "openai_compat": true },
  "parakeet":{ "ready": true,  "base_url": "http://<spark2-host>:8000",   "kind": "stt", "model": "nvidia/parakeet-tdt-0.6b-v3" },
  "kokoro":  { "ready": true,  "base_url": "http://<spark2-host>:8880",   "kind": "tts" }
}

base_url is filled in whenever Configure Sparks has been completed (even if the underlying service isn't currently up). Pair the URL with ready: true to safely route traffic.

Reporting failures from external apps

Spark Control polls every 5 s, so a brief blip in Parakeet/Kokoro/vLLM availability can slip between polls and never make it into the connectivity log. To capture short failures, an external app (e.g. Open WebUI) can POST whenever a call fails (or succeeds):

curl -X POST http://<dashboard-url>/api/health-event \
  -H 'content-type: application/json' \
  -d '{
    "service": "parakeet",
    "ok": false,
    "source": "open-webui",
    "error": "HTTP 503",
    "ms": 420
  }'

Fields: service (required), ok (required), source (optional, free-form), error (optional), ms (optional latency). Each POST appends a report event to the connectivity log alongside the polling-based transition events.

Status

s9pk version 0.26.0:0 — installed and verified on a Start9 server. The LLM menu is whatever's downloaded on the Sparks (scanned live, not hard-coded); bundled launch recipes (qwen3-vl, gemma4, gemma4-26b, qwen36) tell it how to launch known models, and anything else gets a "needs setup" card that infers + saves its settings on first use.

What v0.2 added on top of v0.1

  • Service discovery API (/api/endpoints) for other LAN services
  • Kokoro-82M TTS replaces Magpie/Riva NIM as the default TTS backend (v0.14.0). Magpie's decoder had a ~30-50% truncation rate on multi-sentence inputs and ate 49 GB of GPU memory; Kokoro is 24/24 reliable at every input length tested, uses 1.3 GB GPU, and renders in ~1s. See HANDOFF.md and the release notes for the migration story.
  • Always-on services panel with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host/port/container configuration in the in-app ⚙ Settings gear (so they can live on Spark 1, Spark 2, or anywhere, on any port)
  • Model download from the dashboard — paste an HF repo (with autocomplete for known models), pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion the model appears on the menu automatically; if it's unrecognized, a pre-filled "set up this model" dialog offers to configure it.
  • spark-vllm-docker update check — banner shows "N commits behind upstream"; Apply Update runs git pull && ./build-and-copy.sh -c over SSH with a streamed log
  • Per-model Advanced settings — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to /data/models-overrides.yaml so they survive package updates. Bundled and custom models alike.
  • Diarization with speaker fingerprints via Sortformer + TitaNet, exposed at /api/audio/diarize-chunk for chunked workflows
  • OpenAI chat-completions proxy (/v1/chat/completions, /v1/completions) — forwards to the loaded vLLM so external apps need only one trusted host

v0.3+ roadmap (loose): richer dashboard (SSH/GPU/tokens-per-sec), Open WebUI deep-link integration, optional auth, multi-cluster.

S
Description
No description provided
Readme MIT 2.3 MiB
0.27.1:0 Latest
2026-06-18 21:46:24 +00:00
Languages
Python 68%
JavaScript 15.9%
CSS 4.9%
HTML 4.5%
TypeScript 3%
Other 3.7%