7e0759846f
Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks"
action (now just the 4 required fields) and into a dashboard ⚙ Settings gear,
backed by a /data/app_settings.json overlay keyed by env-var names. One shared
mutable Settings instance + Settings.reload() applies edits live without a
restart; existing installs' values migrate automatically on first boot.
Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now
configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the
/api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its
url/secret reload live too.
129 lines
7.4 KiB
Markdown
129 lines
7.4 KiB
Markdown
# spark-control
|
|
|
|
A browser-based control panel for a dual-DGX-Spark vLLM cluster. Designed to run as a [StartOS 0.4](https://docs.start9.com/packaging/0.4.0.x/) package on a Start9 server on the same LAN as the Sparks.
|
|
|
|
> **If you've just received this package from someone**, start with [HANDOFF.md](./HANDOFF.md) — it has the prereq checklist and a step-by-step install guide written for a fresh user.
|
|
|
|
## What it does
|
|
|
|
- Shows which LLM is currently loaded on the cluster (`<spark1-host>:8888/v1/models`).
|
|
- Click to swap to a different model — stops the current one, launches the new one, streams logs to the UI until `Application startup complete.` appears.
|
|
- Surfaces health for Parakeet (STT, `:8000`) and Kokoro (TTS, `:8880`) on Spark 2.
|
|
- Proxies OpenAI-compatible chat-completions, transcribe, diarize, and TTS through one trusted host so external apps only need to know about Spark Control.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
[Browser/phone] ──► [StartOS reverse proxy] ──► [spark-control container]
|
|
│ (SSH over LAN)
|
|
▼
|
|
[Spark 1] ──► launch-cluster.sh
|
|
│
|
|
▼
|
|
[Spark 2]
|
|
```
|
|
|
|
Two layers in this repo:
|
|
|
|
- `image/` — a self-contained FastAPI app + static UI. Runs anywhere with `uvicorn` and an SSH client. Useful for development.
|
|
- `package/` — a thin StartOS 0.4 wrapper that packages the image, exposes the UI on the LAN, and gives the user actions to configure SSH access to the Sparks.
|
|
|
|
## Quick start (local dev, no StartOS yet)
|
|
|
|
```bash
|
|
cd image
|
|
python3 -m venv .venv && source .venv/bin/activate
|
|
pip install -e .
|
|
export SPARK1_HOST=<spark-1-ip>
|
|
export SPARK1_USER=<your-ssh-user>
|
|
export SPARK2_HOST=<spark-2-ip>
|
|
export SPARK2_USER=<your-ssh-user>
|
|
export SSH_KEY_PATH=<path-to-your-private-key>
|
|
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload
|
|
```
|
|
|
|
Open <http://localhost:9999>.
|
|
|
|
> **Note:** prefer the **IP** for Spark 1 over a `.local` hostname. mDNS can resolve to IPv6 first, and `httpx` will hang on it because vLLM only binds IPv4.
|
|
|
|
## Build the StartOS package
|
|
|
|
```bash
|
|
cd package
|
|
npm i # one-time
|
|
make x86 # produces spark-control_x86_64.s9pk (~55 MB)
|
|
# or
|
|
make aarch64 # for ARM-based Start9 servers
|
|
```
|
|
|
|
Requires [`start-cli`](https://docs.start9.com/latest/developer-guide/sdk/installing-the-sdk), Node ≥ 22, Docker. The build runs `tsc` + `ncc` for the TS bundle, then `docker build` on `image/Dockerfile`, then `start-cli s9pk pack` to produce the `.s9pk`.
|
|
|
|
To sideload onto your Start9: `make install` (needs `host:` set in `~/.startos/config.yaml`), or upload the `.s9pk` via the Start9 web UI's sideload feature.
|
|
|
|
## Post-install setup (one-time per Start9 install)
|
|
|
|
1. Open the Spark Control service → **Actions** → **Show Public Key** → copy the produced one-liner.
|
|
2. Run that one-liner from any machine that already has SSH access to your Sparks. It appends the package's pubkey to `~/.ssh/authorized_keys` on each Spark.
|
|
3. **Actions** → **Configure Sparks** → enter your Spark 1 / Spark 2 IPs and the SSH username you use to log into them.
|
|
4. Start the service. Open the Web UI — current model + health should show within ~5 s.
|
|
|
|
See [HANDOFF.md](./HANDOFF.md) for a fuller prereq checklist and the hardware-side setup required *before* this package is useful.
|
|
|
|
## Repo layout
|
|
|
|
- `image/` — Docker image source (FastAPI app + `models.yaml`)
|
|
- `package/` — StartOS 0.4 package source
|
|
- `HANDOFF.md` — prereqs + first-time install guide for a fresh user
|
|
- `runbook.md` — operating notes
|
|
- `known-issues.md` — known quirks and workarounds
|
|
- `LICENSE` — MIT
|
|
|
|
## Service discovery API
|
|
|
|
Other services on your LAN can hit `GET /api/endpoints` to learn where the current model lives without hardcoding Spark IPs. Stable JSON shape:
|
|
|
|
```json
|
|
{
|
|
"vllm": { "ready": true, "base_url": "http://<spark1-host>:8888/v1", "model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4", "openai_compat": true },
|
|
"parakeet":{ "ready": true, "base_url": "http://<spark2-host>:8000", "kind": "stt", "model": "nvidia/parakeet-tdt-0.6b-v3" },
|
|
"kokoro": { "ready": true, "base_url": "http://<spark2-host>:8880", "kind": "tts" }
|
|
}
|
|
```
|
|
|
|
`base_url` is filled in whenever Configure Sparks has been completed (even if the underlying service isn't currently up). Pair the URL with `ready: true` to safely route traffic.
|
|
|
|
## Reporting failures from external apps
|
|
|
|
Spark Control polls every 5 s, so a brief blip in Parakeet/Kokoro/vLLM availability can slip between polls and never make it into the connectivity log. To capture short failures, an external app (e.g. Open WebUI) can POST whenever a call fails (or succeeds):
|
|
|
|
```bash
|
|
curl -X POST http://<dashboard-url>/api/health-event \
|
|
-H 'content-type: application/json' \
|
|
-d '{
|
|
"service": "parakeet",
|
|
"ok": false,
|
|
"source": "open-webui",
|
|
"error": "HTTP 503",
|
|
"ms": 420
|
|
}'
|
|
```
|
|
|
|
Fields: `service` (required), `ok` (required), `source` (optional, free-form), `error` (optional), `ms` (optional latency). Each POST appends a `report` event to the connectivity log alongside the polling-based transition events.
|
|
|
|
## Status
|
|
|
|
**s9pk version 0.26.0:0** — installed and verified on a Start9 server. The LLM menu is whatever's downloaded on the Sparks (scanned live, not hard-coded); bundled *launch recipes* (qwen3-vl, gemma4, gemma4-26b, qwen36) tell it how to launch known models, and anything else gets a "needs setup" card that infers + saves its settings on first use.
|
|
|
|
### What v0.2 added on top of v0.1
|
|
|
|
- **Service discovery API** (`/api/endpoints`) for other LAN services
|
|
- **Kokoro-82M TTS** replaces Magpie/Riva NIM as the default TTS backend (v0.14.0). Magpie's decoder had a ~30-50% truncation rate on multi-sentence inputs and ate 49 GB of GPU memory; Kokoro is 24/24 reliable at every input length tested, uses 1.3 GB GPU, and renders in ~1s. See HANDOFF.md and the release notes for the migration story.
|
|
- **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host/port/container configuration in the in-app **⚙ Settings** gear (so they can live on Spark 1, Spark 2, or anywhere, on any port)
|
|
- **Model download** from the dashboard — paste an HF repo (with autocomplete for known models), pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion the model appears on the menu automatically; if it's unrecognized, a pre-filled "set up this model" dialog offers to configure it.
|
|
- **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
|
|
- **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
|
|
- **Diarization with speaker fingerprints** via Sortformer + TitaNet, exposed at `/api/audio/diarize-chunk` for chunked workflows
|
|
- **OpenAI chat-completions proxy** (`/v1/chat/completions`, `/v1/completions`) — forwards to the loaded vLLM so external apps need only one trusted host
|
|
|
|
v0.3+ roadmap (loose): richer dashboard (SSH/GPU/tokens-per-sec), Open WebUI deep-link integration, optional auth, multi-cluster.
|