# Spark Control: handoff guide

You've received a `spark-control.s9pk` file. This guide gets you from "fresh install" to "working dashboard" in about an hour, most of which is waiting for downloads.

## What this is

Spark Control is a StartOS 0.4 package that runs on your Start9 server and gives you a browser dashboard for a **dual-DGX-Spark vLLM cluster**. From the dashboard you can:

- See which LLM is currently loaded
- Swap to a different LLM with one click (live log streaming until ready)
- Download new LLM weights from HuggingFace
- Install and monitor audio services (Parakeet STT, Kokoro TTS, Sortformer diarization)
- Expose OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/audio/transcriptions`, `/v1/audio/speech`, etc.) to other apps on your LAN through a single trusted host

It does **not** run any models itself; it's a controller. The actual GPU work happens on your two Sparks. Spark Control SSHes into Spark 1 to invoke `launch-cluster.sh`, and HTTP-polls both Sparks for health.

---

## Prerequisites before installing the s9pk

You need all of the following set up **first**. The s9pk assumes they exist.

### Hardware

- A **Start9 server** running StartOS 0.4.x with sideload-install enabled.
- **Two NVIDIA DGX Sparks** (or similar boxes with NVIDIA GPUs + Docker). One will be "Spark 1" (head node) and one will be "Spark 2" (worker node + audio services). They must be on the same LAN as the Start9 server.

### Spark 1 (the head node)

- A Linux user account you can SSH into (any username, such as `ubuntu`, `nvidia`, or your own; just be consistent). Note the username; you'll enter it later.
- **Docker + NVIDIA Container Toolkit** installed and working.
- **`~/spark-vllm-docker/`** cloned from the community repo:

  ```bash
  git clone https://github.com/eugr/spark-vllm-docker ~/spark-vllm-docker
  cd ~/spark-vllm-docker
  ./build-and-copy.sh -c   # builds the vLLM container image
  ```

  > **The path matters.** Spark Control hardcodes `~/spark-vllm-docker` as the working directory for cluster commands. If you clone it elsewhere, the dashboard's swap and download actions will silently fail.
- A HuggingFace cache at `~/.cache/huggingface/hub/`. Either pre-download one model now, or use the dashboard's "Download a new model" button after install.

### Spark 2 (the worker node)

- Same Linux user account as Spark 1, with passwordless SSH from Spark 1 working.
- **Docker + NVIDIA Container Toolkit** installed.
- That's it. The rest can be installed through the Spark Control dashboard once it's running.

### Optional but recommended

- An **NVIDIA NGC personal API key** if you want to install Parakeet (STT) from `nvcr.io`. Keys are free and start with `nvapi-...`. (Not needed for Kokoro: it's Apache 2.0 and pulls from a public GitHub Container Registry image with no auth.)

---

## Install steps

### 1. Sideload the s9pk

In your Start9 web UI, go to **Sideload Service** and upload the `spark-control_*.s9pk` file (x86_64 or aarch64 depending on your Start9). Install it.

### 2. Start the service once

The first start generates an ed25519 SSH keypair inside the package volume. Wait until the service shows "Running" status; this should take only a few seconds.

### 3. Show the public key and install it on both Sparks

- Open Spark Control → **Actions → Show Public Key**.
- If you haven't run Configure Sparks yet, you'll just see the raw key. Skip to step 4, then come back here.
- Once Configure Sparks is filled in, this action produces a **ready-to-paste install command** (a multi-line `ssh ... 'echo ... >> authorized_keys'` block). Copy the entire block.
- Run it in a terminal on a machine that already has SSH access to your Sparks.
  You'll be prompted for each Spark's SSH password once.

After it completes, the Start9 server can SSH into both Sparks.

### 4. Configure Sparks

- Open Spark Control → **Actions → Configure Sparks**.
- Fill in just the four required fields:
  - **Spark 1 hostname or IP**: prefer the **IP** (e.g. `192.168.1.x`) over `.local` hostnames; vLLM only binds IPv4 and mDNS can resolve to IPv6 first.
  - **Spark 1 SSH user**: whatever username you set up on Spark 1.
  - **Spark 2 hostname or IP** + **SSH user**: same idea.
- Save.

Everything else is optional and lives in the dashboard, not this action: open Spark Control and click **⚙ Settings** in the top bar to set vLLM/service **ports** (e.g. if your vLLM runs on 8000 rather than the default 8888, or you moved Parakeet off 8000), container names, support-service hosts, an **Open WebUI URL** (adds a deep-link button), an **NGC API key**, and a swap webhook. Changes there apply immediately and are included in StartOS backups.

### 5. Re-run Show Public Key (if you skipped earlier)

Now that hosts are configured, Show Public Key will give you the paste-ready install command. Run it as described in step 3.

### 6. Open the Web UI

From the Spark Control service page, click the Web UI button. You should see:

- A **top status bar** with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
- An **LLM tab** whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use **+ Download a new model** to fetch one; it appears here when it finishes.
- An **Audio / Speech tab** with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.

If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, **you're in**.

### 7. Load your first LLM

Click **"Switch to this"** on any model card. The dashboard will:

1. SSH into Spark 1, stop any running vLLM container.
2. Run `launch-cluster.sh` with the model's bundled flags.
3. Stream `docker logs -f` back to your browser until `Application startup complete.` appears.
4. Mark the new model as active.

Typical times: solo-mode models (Qwen3.6, Gemma 4) take ~3–5 min. Cluster-mode models (Qwen3-VL 235B) take ~5–8 min because they have to coordinate across both Sparks via Ray.

### 8. (Optional) install audio services

From the Audio / Speech tab, click **Install Parakeet**. This pulls and starts the parakeet-asr container on Spark 2 with appropriate settings. Takes ~2–3 min for the first install.

For diarization with speaker fingerprints, also click **Reapply patches**, which overlays Sortformer + TitaNet support onto the parakeet container. The patches survive `docker restart` but are wiped by `docker rm`; if you ever recreate the container, re-run Reapply patches.

Kokoro TTS is similar: pull `ghcr.io/remsky/kokoro-fastapi-gpu:latest` on Spark 2 and run with `--gpus all -p 8880:8880`. No NGC key required (Kokoro is Apache 2.0). Boots in ~5 seconds and uses only ~1.3 GB of GPU memory. (A one-click Kokoro install action is planned for a near-future release; for now you can install it manually or Spark Control will pick it up automatically once it's running on port 8880.)

---

## Endpoints exposed to your other apps

Once Spark Control is healthy, your other LAN apps can hit it as a single trusted backend:

| Path | Backend | Notes |
|---|---|---|
| `GET /api/endpoints` | (self) | Service discovery: JSON of base_urls + ready flags. Hit this first so you don't have to hardcode Spark IPs in other apps. |
| `POST /v1/chat/completions` | vLLM on Spark 1 | OpenAI-compatible; supports `stream: true` |
| `POST /v1/completions` | vLLM on Spark 1 | Legacy OpenAI completions |
| `POST /v1/audio/transcriptions` | Parakeet on Spark 2 | OpenAI-compatible STT |
| `POST /v1/audio/speech` | Kokoro on Spark 2 | OpenAI-compatible TTS. Default voice `bm_george`; pass `voice` to pick any of Kokoro's 67 voices. Reliable at any input length (no chunking/retry needed). |
| `POST /api/audio/diarize-chunk` | Sortformer + TitaNet | Per-chunk diarization with voice fingerprints for cross-chunk re-clustering |
| `POST /api/audio/transcribe-with-speakers` | Parakeet + Sortformer | One-shot transcribe + diarize, merged |

All of these inherit Spark Control's TLS cert and StartOS access controls. You only need one allowlist entry in downstream apps.

---

## Operational notes

- **vLLM does not auto-load a model after a power loss.** When your Sparks reboot, the dashboard will show "no model loaded"; click "Switch to this" on whichever LLM you want. Parakeet/Kokoro auto-restart with their containers (Kokoro is `--restart unless-stopped` and Parakeet runs the same way).
- **Single-slot chunked workflows.** If you're calling `/v1/audio/transcriptions` or `/api/audio/diarize-chunk` in chunked workflows, send chunks **sequentially**, not in parallel. Parallel requests can trigger a known cuFFT race on the Spark 2 GPU that returns a 503 + Retry-After. Spark Control recovers automatically but each retry costs ~60s.
- **Context window**: the bundled Qwen3.6 entry runs at 64K total tokens (input + output combined). Adjust per-model via the Advanced button on each card.
- **Update path**: model-catalog overrides and custom services live in `/data/*` inside the volume; they survive s9pk updates.
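
The sequential-chunk note above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not Spark Control's own client: `BASE_URL`, the chunk file names, and the function names `post_chunk` and `send_chunks` are hypothetical; the `file` form field follows the OpenAI transcription convention and your Parakeet build may require additional fields; and `Retry-After` is assumed to carry a number of seconds.

```bash
#!/usr/bin/env bash
# Sketch only: BASE_URL and the chunk file names are placeholders.
BASE_URL="${BASE_URL:-https://spark-control.example.lan}"

# Upload one chunk and print "<http_status> <retry_after_seconds>".
# The `file` form field follows the OpenAI transcription convention (assumed).
post_chunk() {
  local file="$1" headers status retry
  headers="$(mktemp)"
  status="$(curl -sS -o "${file}.json" -D "$headers" -w '%{http_code}' \
    -F "file=@${file}" "${BASE_URL}/v1/audio/transcriptions")"
  # Header names are case-insensitive; "+ 0" also drops the trailing CR.
  retry="$(awk 'tolower($1) == "retry-after:" { print $2 + 0 }' "$headers")"
  rm -f "$headers"
  echo "${status} ${retry:-0}"
}

# Send chunks strictly one at a time; on 503, wait and retry the same chunk.
send_chunks() {
  local file result status retry attempt
  for file in "$@"; do
    status=""
    for attempt in 1 2 3; do
      result="$(post_chunk "$file")"
      status="${result%% *}"
      retry="${result#* }"
      [ "$status" = "200" ] && break
      [ "$status" = "503" ] || return 1   # any other status: give up
      sleep "$retry"
    done
    [ "$status" = "200" ] || return 1     # still failing after 3 attempts
  done
}
```

Calling `send_chunks chunk_000.wav chunk_001.wav` would upload the files in order and write each response next to its chunk as `<chunk>.json`. The point of the design is that chunk N+1 is never started before chunk N has returned, which keeps requests off the parallel path described in the note.
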

---

## Resources

- `README.md`: repo overview, build instructions, dev environment
- `runbook.md`: model-swap recipes and operating notes
- `known-issues.md`: debugging fixes (Mamba block-size, vision token budget, historical Magpie notes, etc.)
- Source: `image/` is the FastAPI app; `package/` is the StartOS wrapper. The s9pk build is `cd package && make x86` (or `aarch64`).

---

## If you're an AI agent helping with this install

A few things worth knowing:

- The codebase is **two halves**: `image/` is a standalone FastAPI app you can run with `uvicorn app.server:app` for local dev. `package/` is the StartOS wrapper. Changes to either should be coordinated.
- **All connection info** comes from environment variables in `image/app/config.py`. The four required fields are populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action; the optional knobs are overlaid from the in-app `⚙ Settings` store (`/data/app_settings.json`, see `image/app/app_settings.py`). Apart from the one path noted in the next bullet, no IPs, usernames, or paths are hardcoded in runtime code.
- The **path `~/spark-vllm-docker`** *is* hardcoded in `swap.py`, `download.py`, `updates.py`, and `models.py`. If the user has cloned the upstream repo elsewhere, either fix the path or symlink it.
- **Persistent state** lives at `/data/` inside the container: `config.yaml`, `models-overrides.yaml`, `services-overrides.yaml`, `connectivity.json`, `ssh/`. These survive package updates.
- The dashboard polls every 5 s; check `image/app/health.py` and `image/app/connectivity.py` for the probing logic. External apps can also POST failures to `/api/health-event` to log between-poll blips.
- Debugging audio issues: SSH into Spark 2 and run `docker logs --tail 100 parakeet-asr`. cuFFT errors usually mean parallel requests; see the operational note above.
- Debugging LLM swaps: the swap log is streamed in the browser, but the underlying `docker logs -f vllm_node` on Spark 1 is the ground truth.
- The package supports both `x86_64` and `aarch64` builds. Match your Start9 server architecture.
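
For the hardcoded `~/spark-vllm-docker` path mentioned in the list above, the symlink option could look roughly like the helper below. It is a hedged sketch, not part of the package: the function name `link_spark_repo` and the example clone location are hypothetical, and it would be run on the Spark itself as the SSH user Spark Control logs in as.

```bash
#!/usr/bin/env bash
# Sketch only: link_spark_repo is a hypothetical helper, not shipped with Spark Control.
# Usage: link_spark_repo <actual clone dir> [expected path, default ~/spark-vllm-docker]
link_spark_repo() {
  local actual expected
  expected="${2:-$HOME/spark-vllm-docker}"
  # Resolve the real clone to an absolute, symlink-free path.
  actual="$(cd "$1" 2>/dev/null && pwd -P)" || {
    echo "no such directory: $1" >&2
    return 1
  }
  if [ -e "$expected" ] || [ -L "$expected" ]; then
    # Never overwrite what is already there; succeed only if it
    # already resolves to the same directory.
    [ "$(cd "$expected" 2>/dev/null && pwd -P)" = "$actual" ]
    return
  fi
  ln -s "$actual" "$expected"
}
```

For example, `link_spark_repo /opt/src/spark-vllm-docker` (a made-up location) would create the symlink once and be a no-op on later runs. It returns non-zero if something different already occupies the expected path, so nothing gets clobbered.
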