The dashboard menu is now the set of models actually downloaded on the Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as launch recipes matched to an on-disk model by repo; an on-disk model with no recipe is flagged needs_setup and its launch settings are inferred from its config.json for a one-time operator confirmation (discovery.py). - delete now removes weights AND the menu card (delete_from_disk sweeps all hosts; the delete endpoint resolves keys via the live menu) - new GET /api/models/suggest; /api/models returns the menu + a recipes list (download autocomplete); GET /api/models/disk-status removed - dropped the two legacy Qwen recipes (235B FP8, 2.5 72B) - tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge)
11 KiB
Spark Control — handoff guide
You've received a spark-control.s9pk file. This guide gets you from "fresh install" to "working dashboard" in about an hour, most of which is waiting for downloads.
What this is
Spark Control is a StartOS 0.4 package that runs on your Start9 server and gives you a browser dashboard for a dual-DGX-Spark vLLM cluster. From the dashboard you can:
- See which LLM is currently loaded
- Swap to a different LLM with one click (live log streaming until ready)
- Download new LLM weights from HuggingFace
- Install and monitor audio services (Parakeet STT, Kokoro TTS, Sortformer diarization)
- Expose OpenAI-compatible endpoints (
/v1/chat/completions,/v1/audio/transcriptions,/v1/audio/speech, etc.) to other apps on your LAN through a single trusted host
It does not run any models itself — it's a controller. The actual GPU work happens on your two Sparks. Spark Control SSHes into Spark 1 to invoke launch-cluster.sh, and HTTP-polls both Sparks for health.
Prerequisites before installing the s9pk
You need all of the following set up first. The s9pk assumes they exist.
Hardware
- A Start9 server running StartOS 0.4.x with sideload-install enabled.
- Two NVIDIA DGX Sparks (or similar boxes with NVIDIA GPUs + Docker). One will be "Spark 1" (head node) and one will be "Spark 2" (worker node + audio services). They must be on the same LAN as the Start9 server.
Spark 1 (the head node)
-
A Linux user account you can SSH into (any username —
ubuntu,nvidia, your own — just be consistent). Note the username; you'll enter it later. -
Docker + NVIDIA Container Toolkit installed and working.
-
~/spark-vllm-docker/cloned from the community repo:git clone https://github.com/eugr/spark-vllm-docker ~/spark-vllm-docker cd ~/spark-vllm-docker ./build-and-copy.sh -c # builds the vLLM container imageThe path matters. Spark Control hardcodes
~/spark-vllm-dockeras the working directory for cluster commands. If you clone it elsewhere, the dashboard's swap and download actions will silently fail. -
A HuggingFace cache at
~/.cache/huggingface/hub/. Either pre-download one model now, or use the dashboard's "Download a new model" button after install.
Spark 2 (the worker node)
- Same Linux user account as Spark 1, with passwordless SSH from Spark 1 working.
- Docker + NVIDIA Container Toolkit installed.
- That's it — the rest can be installed through the Spark Control dashboard once it's running.
Optional but recommended
- An NVIDIA NGC personal API key if you want to install Parakeet (STT) from
nvcr.io. Free: https://ngc.nvidia.com/setup/personal-key. Starts withnvapi-.... (Not needed for Kokoro — it's Apache 2.0 and pulls from a public GitHub Container Registry image with no auth.)
Install steps
1. Sideload the s9pk
In your Start9 web UI, go to Sideload Service and upload the spark-control_*.s9pk file (x86_64 or aarch64 depending on your Start9). Install it.
2. Start the service once
The first start generates an ed25519 SSH keypair inside the package volume. Wait until the service shows "Running" status — should take only a few seconds.
3. Show the public key and install it on both Sparks
- Open Spark Control → Actions → Show Public Key.
- If you haven't run Configure Sparks yet, you'll just see the raw key. Skip to step 4, then come back here.
- Once Configure Sparks is filled in, this action produces a ready-to-paste install command (a multi-line
ssh ... 'echo ... >> authorized_keys'block). Copy the entire block. - Run it in a terminal on a machine that already has SSH access to your Sparks. You'll be prompted for each Spark's SSH password once. After it completes, the Start9 server can SSH into both Sparks.
4. Configure Sparks
- Open Spark Control → Actions → Configure Sparks.
- Fill in:
- Spark 1 hostname or IP — prefer the IP (e.g.
192.168.1.x) over.localhostnames; vLLM only binds IPv4 and mDNS can resolve to IPv6 first. - Spark 1 SSH user — whatever username you set up on Spark 1.
- Spark 2 hostname or IP + SSH user — same idea.
- Optional Parakeet/Kokoro overrides — leave blank if those services run on Spark 2 (the normal case).
- Optional Open WebUI URL — paste your Open WebUI LAN URL to get a deep-link button in the dashboard next to the current model.
- Optional NGC API key — paste it here if you have one.
- Spark 1 hostname or IP — prefer the IP (e.g.
Save.
5. Re-run Show Public Key (if you skipped earlier)
Now that hosts are configured, Show Public Key will give you the paste-ready install command. Run it as described in step 3.
6. Open the Web UI
From the Spark Control service page, click the Web UI button. You should see:
- A top status bar with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
- An LLM tab whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use + Download a new model to fetch one — it appears here when it finishes.
- An Audio / Speech tab with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.
If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, you're in.
7. Load your first LLM
Click "Switch to this" on any model card. The dashboard will:
- SSH into Spark 1, stop any running vLLM container.
- Run
launch-cluster.shwith the model's bundled flags. - Stream
docker logs -fback to your browser untilApplication startup complete.appears. - Mark the new model as active.
Typical times: solo-mode models (Qwen3.6, Gemma 4) take ~3–5 min. Cluster-mode models (Qwen3-VL 235B) take ~5–8 min — they have to coordinate across both Sparks via Ray.
8. (Optional) install audio services
From the Audio / Speech tab, click Install Parakeet. This pulls and starts the parakeet-asr container on Spark 2 with appropriate settings. Takes ~2–3 min for the first install.
For diarization with speaker fingerprints, also click Reapply patches — that overlays Sortformer + TitaNet support onto the parakeet container. The patches survive docker restart but are wiped by docker rm; if you ever recreate the container, re-run Reapply patches.
Kokoro TTS is similar — pull ghcr.io/remsky/kokoro-fastapi-gpu:latest on Spark 2 and run with --gpus all -p 8880:8880. No NGC key required (Kokoro is Apache 2.0). Boots in ~5 seconds and uses only ~1.3 GB of GPU memory. (A one-click Kokoro install action is planned for a near-future release; for now you can install it manually or Spark Control will pick it up automatically once it's running on port 8880.)
Endpoints exposed to your other apps
Once Spark Control is healthy, your other LAN apps can hit it as a single trusted backend:
| Path | Backend | Notes |
|---|---|---|
GET /api/endpoints |
(self) | Service discovery — JSON of base_urls + ready flags. Hit this first so you don't have to hardcode Spark IPs in other apps. |
POST /v1/chat/completions |
vLLM on Spark 1 | OpenAI-compatible; supports stream: true |
POST /v1/completions |
vLLM on Spark 1 | Legacy OpenAI completions |
POST /v1/audio/transcriptions |
Parakeet on Spark 2 | OpenAI-compatible STT |
POST /v1/audio/speech |
Kokoro on Spark 2 | OpenAI-compatible TTS. Default voice bm_george; pass voice to pick any of Kokoro's 67 voices. Reliable at any input length (no chunking/retry needed). |
POST /api/audio/diarize-chunk |
Sortformer + TitaNet | Per-chunk diarization with voice fingerprints for cross-chunk re-clustering |
POST /api/audio/transcribe-with-speakers |
Parakeet + Sortformer | One-shot transcribe + diarize, merged |
All of these inherit Spark Control's TLS cert and StartOS access controls. You only need one allowlist entry in downstream apps.
Operational notes
- vLLM does not auto-load a model after a power loss. When your Sparks reboot, the dashboard will show "no model loaded" — you click "Switch to this" on whichever LLM you want. Parakeet/Kokoro auto-restart with their containers (Kokoro is
--restart unless-stoppedand Parakeet runs the same way). - Single-slot chunked workflows. If you're calling
/v1/audio/transcriptionsor/api/audio/diarize-chunkin chunked workflows, send chunks sequentially, not in parallel. Parallel requests can trigger a known cuFFT race on the Spark 2 GPU that returns a 503 + Retry-After. Spark Control recovers automatically but each retry costs ~60s. - Context window: the bundled Qwen3.6 entry runs at 64K total tokens (input + output combined). Adjust per-model via the Advanced button on each card.
- Update path: model-catalog overrides and custom services live in
/data/*inside the volume; they survive s9pk updates.
Resources
README.md— repo overview, build instructions, dev environmentrunbook.md— model-swap recipes and operating notesknown-issues.md— debugging fixes (Mamba block-size, vision token budget, historical Magpie notes, etc.)- Source:
image/is the FastAPI app;package/is the StartOS wrapper. The s9pk build iscd package && make x86(oraarch64).
If you're an AI agent helping with this install
A few things worth knowing:
- The codebase is two halves:
image/is a standalone FastAPI app you can run withuvicorn app.server:appfor local dev.package/is the StartOS wrapper. Changes to either should be coordinated. - All connection info comes from environment variables in
image/app/config.py, populated frompackage/startos/fileModels/sparkConfig.yaml.tsvia the Configure Sparks action. No IPs, usernames, or paths are hardcoded in runtime code. - The path
~/spark-vllm-dockeris hardcoded inswap.py,download.py,updates.py, andmodels.py. If the user has cloned the upstream repo elsewhere, either fix the path or symlink it. - Persistent state lives at
/data/inside the container:config.yaml,models-overrides.yaml,services-overrides.yaml,connectivity.json,ssh/. These survive package updates. - The dashboard polls every 5 s; check
image/app/health.pyandimage/app/connectivity.pyfor the probing logic. External apps can also POST failures to/api/health-eventto log between-poll blips. - Debugging audio issues: SSH into Spark 2 and run
docker logs --tail 100 parakeet-asr. cuFFT errors usually mean parallel requests; see the operational note above. - Debugging LLM swaps: the swap log is streamed in the browser, but the underlying
docker logs -f vllm_nodeon Spark 1 is the ground truth. - The package supports both
x86_64andaarch64builds. Match your Start9 server architecture.