7e0759846f
Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks"
action (now just the 4 required fields) and into a dashboard ⚙ Settings gear,
backed by a /data/app_settings.json overlay keyed by env-var names. One shared
mutable Settings instance + Settings.reload() applies edits live without a
restart; existing installs' values migrate automatically on first boot.
Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now
configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the
/api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its
url/secret reload live too.
168 lines
11 KiB
Markdown
168 lines
11 KiB
Markdown
# Spark Control — handoff guide
|
||
|
||
You've received a `spark-control.s9pk` file. This guide gets you from "fresh install" to "working dashboard" in about an hour, most of which is waiting for downloads.
|
||
|
||
## What this is
|
||
|
||
Spark Control is a StartOS 0.4 package that runs on your Start9 server and gives you a browser dashboard for a **dual-DGX-Spark vLLM cluster**. From the dashboard you can:
|
||
|
||
- See which LLM is currently loaded
|
||
- Swap to a different LLM with one click (live log streaming until ready)
|
||
- Download new LLM weights from HuggingFace
|
||
- Install and monitor audio services (Parakeet STT, Kokoro TTS, Sortformer diarization)
|
||
- Expose OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/audio/transcriptions`, `/v1/audio/speech`, etc.) to other apps on your LAN through a single trusted host
|
||
|
||
It does **not** run any models itself — it's a controller. The actual GPU work happens on your two Sparks. Spark Control SSHes into Spark 1 to invoke `launch-cluster.sh`, and HTTP-polls both Sparks for health.
|
||
|
||
---
|
||
|
||
## Prerequisites before installing the s9pk
|
||
|
||
You need all of the following set up **first**. The s9pk assumes they exist.
|
||
|
||
### Hardware
|
||
|
||
- A **Start9 server** running StartOS 0.4.x with sideload-install enabled.
|
||
- **Two NVIDIA DGX Sparks** (or similar boxes with NVIDIA GPUs + Docker). One will be "Spark 1" (head node) and one will be "Spark 2" (worker node + audio services). They must be on the same LAN as the Start9 server.
|
||
|
||
### Spark 1 (the head node)
|
||
|
||
- A Linux user account you can SSH into (any username — `ubuntu`, `nvidia`, your own — just be consistent). Note the username; you'll enter it later.
|
||
- **Docker + NVIDIA Container Toolkit** installed and working.
|
||
- **`~/spark-vllm-docker/`** cloned from the community repo:
|
||
|
||
```bash
|
||
git clone https://github.com/eugr/spark-vllm-docker ~/spark-vllm-docker
|
||
cd ~/spark-vllm-docker
|
||
./build-and-copy.sh -c # builds the vLLM container image
|
||
```
|
||
|
||
> **The path matters.** Spark Control hardcodes `~/spark-vllm-docker` as the working directory for cluster commands. If you clone it elsewhere, the dashboard's swap and download actions will silently fail.
|
||
|
||
- A HuggingFace cache at `~/.cache/huggingface/hub/`. Either pre-download one model now, or use the dashboard's "Download a new model" button after install.
|
||
|
||
### Spark 2 (the worker node)
|
||
|
||
- Same Linux user account as Spark 1, with passwordless SSH from Spark 1 working.
|
||
- **Docker + NVIDIA Container Toolkit** installed.
|
||
- That's it — the rest can be installed through the Spark Control dashboard once it's running.
|
||
|
||
### Optional but recommended
|
||
|
||
- An **NVIDIA NGC personal API key** if you want to install Parakeet (STT) from `nvcr.io`. Free: <https://ngc.nvidia.com/setup/personal-key>. Starts with `nvapi-...`. (Not needed for Kokoro — it's Apache 2.0 and pulls from a public GitHub Container Registry image with no auth.)
|
||
|
||
---
|
||
|
||
## Install steps
|
||
|
||
### 1. Sideload the s9pk
|
||
|
||
In your Start9 web UI, go to **Sideload Service** and upload the `spark-control_*.s9pk` file (x86_64 or aarch64 depending on your Start9). Install it.
|
||
|
||
### 2. Start the service once
|
||
|
||
The first start generates an ed25519 SSH keypair inside the package volume. Wait until the service shows "Running" status — should take only a few seconds.
|
||
|
||
### 3. Show the public key and install it on both Sparks
|
||
|
||
- Open Spark Control → **Actions → Show Public Key**.
|
||
- If you haven't run Configure Sparks yet, you'll just see the raw key. Skip to step 4, then come back here.
|
||
- Once Configure Sparks is filled in, this action produces a **ready-to-paste install command** (a multi-line `ssh ... 'echo ... >> authorized_keys'` block). Copy the entire block.
|
||
- Run it in a terminal on a machine that already has SSH access to your Sparks. You'll be prompted for each Spark's SSH password once. After it completes, the Start9 server can SSH into both Sparks.
|
||
|
||
### 4. Configure Sparks
|
||
|
||
- Open Spark Control → **Actions → Configure Sparks**.
|
||
- Fill in just the four required fields:
|
||
- **Spark 1 hostname or IP** — prefer the **IP** (e.g. `192.168.1.x`) over `.local` hostnames; vLLM only binds IPv4 and mDNS can resolve to IPv6 first.
|
||
- **Spark 1 SSH user** — whatever username you set up on Spark 1.
|
||
- **Spark 2 hostname or IP** + **SSH user** — same idea.
|
||
|
||
Save.
|
||
|
||
Everything else is optional and lives in the dashboard, not this action: open Spark Control and click **⚙ Settings** in the top bar to set vLLM/service **ports** (e.g. if your vLLM runs on 8000 rather than the default 8888, or you moved Parakeet off 8000), container names, support-service hosts, an **Open WebUI URL** (adds a deep-link button), an **NGC API key**, and a swap webhook. Changes there apply immediately and are included in StartOS backups.
|
||
|
||
### 5. Re-run Show Public Key (if you skipped earlier)
|
||
|
||
Now that hosts are configured, Show Public Key will give you the paste-ready install command. Run it as described in step 3.
|
||
|
||
### 6. Open the Web UI
|
||
|
||
From the Spark Control service page, click the Web UI button. You should see:
|
||
|
||
- A **top status bar** with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
|
||
- An **LLM tab** whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use **+ Download a new model** to fetch one — it appears here when it finishes.
|
||
- An **Audio / Speech tab** with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.
|
||
|
||
If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, **you're in**.
|
||
|
||
### 7. Load your first LLM
|
||
|
||
Click **"Switch to this"** on any model card. The dashboard will:
|
||
|
||
1. SSH into Spark 1, stop any running vLLM container.
|
||
2. Run `launch-cluster.sh` with the model's bundled flags.
|
||
3. Stream `docker logs -f` back to your browser until `Application startup complete.` appears.
|
||
4. Mark the new model as active.
|
||
|
||
Typical times: solo-mode models (Qwen3.6, Gemma 4) take ~3–5 min. Cluster-mode models (Qwen3-VL 235B) take ~5–8 min — they have to coordinate across both Sparks via Ray.
|
||
|
||
### 8. (Optional) install audio services
|
||
|
||
From the Audio / Speech tab, click **Install Parakeet**. This pulls and starts the parakeet-asr container on Spark 2 with appropriate settings. Takes ~2–3 min for the first install.
|
||
|
||
For diarization with speaker fingerprints, also click **Reapply patches** — that overlays Sortformer + TitaNet support onto the parakeet container. The patches survive `docker restart` but are wiped by `docker rm`; if you ever recreate the container, re-run Reapply patches.
|
||
|
||
Kokoro TTS is similar — pull `ghcr.io/remsky/kokoro-fastapi-gpu:latest` on Spark 2 and run with `--gpus all -p 8880:8880`. No NGC key required (Kokoro is Apache 2.0). Boots in ~5 seconds and uses only ~1.3 GB of GPU memory. (A one-click Kokoro install action is planned for a near-future release; for now you can install it manually or Spark Control will pick it up automatically once it's running on port 8880.)
|
||
|
||
---
|
||
|
||
## Endpoints exposed to your other apps
|
||
|
||
Once Spark Control is healthy, your other LAN apps can hit it as a single trusted backend:
|
||
|
||
| Path | Backend | Notes |
|
||
|---|---|---|
|
||
| `GET /api/endpoints` | (self) | Service discovery — JSON of base_urls + ready flags. Hit this first so you don't have to hardcode Spark IPs in other apps. |
|
||
| `POST /v1/chat/completions` | vLLM on Spark 1 | OpenAI-compatible; supports `stream: true` |
|
||
| `POST /v1/completions` | vLLM on Spark 1 | Legacy OpenAI completions |
|
||
| `POST /v1/audio/transcriptions` | Parakeet on Spark 2 | OpenAI-compatible STT |
|
||
| `POST /v1/audio/speech` | Kokoro on Spark 2 | OpenAI-compatible TTS. Default voice `bm_george`; pass `voice` to pick any of Kokoro's 67 voices. Reliable at any input length (no chunking/retry needed). |
|
||
| `POST /api/audio/diarize-chunk` | Sortformer + TitaNet | Per-chunk diarization with voice fingerprints for cross-chunk re-clustering |
|
||
| `POST /api/audio/transcribe-with-speakers` | Parakeet + Sortformer | One-shot transcribe + diarize, merged |
|
||
|
||
All of these inherit Spark Control's TLS cert and StartOS access controls. You only need one allowlist entry in downstream apps.
|
||
|
||
---
|
||
|
||
## Operational notes
|
||
|
||
- **vLLM does not auto-load a model after a power loss.** When your Sparks reboot, the dashboard will show "no model loaded" — you click "Switch to this" on whichever LLM you want. Parakeet/Kokoro auto-restart with their containers (Kokoro is `--restart unless-stopped` and Parakeet runs the same way).
|
||
- **Single-slot chunked workflows.** If you're calling `/v1/audio/transcriptions` or `/api/audio/diarize-chunk` in chunked workflows, send chunks **sequentially**, not in parallel. Parallel requests can trigger a known cuFFT race on the Spark 2 GPU that returns a 503 + Retry-After. Spark Control recovers automatically but each retry costs ~60s.
|
||
- **Context window**: the bundled Qwen3.6 entry runs at 64K total tokens (input + output combined). Adjust per-model via the Advanced button on each card.
|
||
- **Update path**: model-catalog overrides and custom services live in `/data/*` inside the volume; they survive s9pk updates.
|
||
|
||
---
|
||
|
||
## Resources
|
||
|
||
- `README.md` — repo overview, build instructions, dev environment
|
||
- `runbook.md` — model-swap recipes and operating notes
|
||
- `known-issues.md` — debugging fixes (Mamba block-size, vision token budget, historical Magpie notes, etc.)
|
||
- Source: `image/` is the FastAPI app; `package/` is the StartOS wrapper. The s9pk build is `cd package && make x86` (or `aarch64`).
|
||
|
||
---
|
||
|
||
## If you're an AI agent helping with this install
|
||
|
||
A few things worth knowing:
|
||
|
||
- The codebase is **two halves**: `image/` is a standalone FastAPI app you can run with `uvicorn app.server:app` for local dev. `package/` is the StartOS wrapper. Changes to either should be coordinated.
|
||
- **All connection info** comes from environment variables in `image/app/config.py`. The four required fields are populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action; the optional knobs are overlaid from the in-app `⚙ Settings` store (`/data/app_settings.json`, see `image/app/app_settings.py`). No IPs, usernames, or paths are hardcoded in runtime code.
|
||
- The **path `~/spark-vllm-docker`** *is* hardcoded in `swap.py`, `download.py`, `updates.py`, and `models.py`. If the user has cloned the upstream repo elsewhere, either fix the path or symlink it.
|
||
- **Persistent state** lives at `/data/` inside the container: `config.yaml`, `models-overrides.yaml`, `services-overrides.yaml`, `connectivity.json`, `ssh/`. These survive package updates.
|
||
- The dashboard polls every 5 s; check `image/app/health.py` and `image/app/connectivity.py` for the probing logic. External apps can also POST failures to `/api/health-event` to log between-poll blips.
|
||
- Debugging audio issues: SSH into Spark 2 and run `docker logs --tail 100 parakeet-asr`. cuFFT errors usually mean parallel requests; see the operational note above.
|
||
- Debugging LLM swaps: the swap log is streamed in the browser, but the underlying `docker logs -f vllm_node` on Spark 1 is the ground truth.
|
||
- The package supports both `x86_64` and `aarch64` builds. Match your Start9 server architecture.
|