# spark-control runbook

Operating notes for running and maintaining the cluster via spark-control.

## Prerequisites (per Spark)

spark-control is a **controller**, not a runtime. Each Spark in your cluster must already have the upstream `eugr/spark-vllm-docker` project set up:

1. Clone `https://github.com/eugr/spark-vllm-docker` to `~/spark-vllm-docker` on Spark 1 (the head node).
2. Build the vLLM container: `./build-and-copy.sh -c` (on a cluster) or `./build-and-copy.sh` (solo).
3. Pre-download any models you want in the catalog: `./hf-download.sh -c --copy-parallel`.
4. Verify: `./launch-cluster.sh status` returns sensibly.
5. Set up passwordless SSH from your Start9 server's spark-control container to each Spark (use the Show Public Key action; see README.md "Post-install setup").

Sharing this package with someone else who has a similar dual-DGX-Spark setup: they do the same per-Spark prerequisites, then sideload the `.s9pk` on their Start9 and run the setup actions.

## Recent successful swaps

- **2026-05-12: gemma4 → qwen36** via `POST /api/swap` from laptop dev server. ~5:30 to "Application startup complete." Inference works (`/v1/chat/completions` returns reasoning content via `reasoning` field). `--moe_backend=flashinfer_cutlass` confirmed valid by vLLM (logged "Using 'FLASHINFER_CUTLASS' NvFp4 MoE backend").

## Optimization flags (added 2026-05-12)

Aligned `gemma4` and `qwen36` `vllm_args` with the project's sibling recipes (`qwen3.5-35b-a3b-fp8.yaml`, `gemma4-26b-a4b.yaml`):

- `--load-format=fastsafetensors`: faster cold-start weight load.
- `--enable-prefix-caching`: reuse cached prefix tokens (e.g. system prompt) across requests.
- `--kv-cache-dtype=fp8`: store KV cache in 8-bit FP; halves memory used per active context.

These take effect on the **next swap to that model**. If a swap fails after this change with errors mentioning fastsafetensors/prefix-caching/fp8, revert the entry in `models.yaml` and retry.
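To decide whether a failed swap points at one of these flags, a rough log filter can help. The sketch below is an assumption-laden convenience, not part of spark-control: `flag_related_errors` is a hypothetical helper name, and the keyword patterns are guesses at how such errors might be worded. A match is a hint to revert the `models.yaml` entry, not proof.

```bash
# Hypothetical helper: read log text on stdin and print only lines that both
# look like errors and mention one of the three optimization flags.
# Exit status is 0 when at least one such line is found, non-zero otherwise.
flag_related_errors() {
  grep -iE 'error|exception|traceback' \
    | grep -iE 'fastsafetensors|prefix[-_ ]?caching|fp8'
}

# Example (run on the Spark; vllm_node is the container name used elsewhere in this runbook):
#   docker logs --tail 500 vllm_node 2>&1 | flag_related_errors
```

Informational lines that merely mention `fp8` are dropped by the first filter, so ordinary startup output should not trigger it.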
## Day-to-day

- The UI lives at `http://.local:9999` once the StartOS package is installed and configured.
- Status auto-refreshes every 5 s.
- A swap takes 3–6 minutes depending on the model. Don't close the tab, but if you do, the swap continues; reopen and you'll re-attach to the log stream.

## matrix-bridge bot tile (optional)

If you run the matrix-bridge bot container on a Spark, set its SSH user in **Configure Sparks** (e.g. the user that owns `~/matrix-bridge`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a `running` badge means the container is up, not necessarily that the bot is connected.

The **Update** button runs `git fetch && git reset --hard origin/ && docker compose up -d --build` as that SSH user. For it to reach your git remote:

1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:

   ```
   Host
     Port
     IdentityFile ~/.ssh/id_ed25519
     IdentitiesOnly yes
   ```
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.

## Adding a new model

1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph covering what the model is, what it's good for, and how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
2. Confirm the weights are on the Spark: `ssh @ 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh ` on Spark 1.
3. Rebuild + redeploy the package: `cd package && make x86 && make install`.

If `description` is omitted, the card simply hides that section; there is no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.

### Local / fine-tuned models (v0.23.0+)

A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a `--chat-template` must live **inside** `local_path`.

**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v :"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail. Re-check this before pulling launch-cluster.sh updates.

## Manual swap fallback

If the UI is unavailable and you need to swap by hand:

```bash
ssh @
cd ~/spark-vllm-docker
./launch-cluster.sh stop
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
  --port 8888 --host 0.0.0.0 --gpu-memory-utilization 0.8 \
  --max-model-len 32768 --reasoning-parser gemma4 \
  --tool-call-parser gemma4 --enable-auto-tool-choice
docker logs -f vllm_node   # wait for "Application startup complete."
```

## Sideload (`make install`) can't reach the server

Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`.
Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up: `curl -sk https:///rpc/v1` returns 200.

- **Don't** work around it with `start-cli -H https:// package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
- **Fix:** make the name resolve again, then re-run `make install`:
  - `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
  - `echo " immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).

Note this only blocks installing to *your own* Start9; building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).

## Diagnostics

```bash
# Is vLLM serving?
curl -s http://:8888/v1/models | jq .

# Cluster status (containers up?)
ssh @ 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'

# Tail current model's logs
ssh @ 'docker logs --tail 200 -f vllm_node'

# Parakeet
curl -s http://:8000/health

# Kokoro TTS (v0.14.0+)
curl -s http://:8880/health
```

## Hard reset

If launch-cluster.sh gets stuck:

```bash
ssh @
cd ~/spark-vllm-docker
./launch-cluster.sh stop
docker ps -aq | xargs -r docker rm -f
# then relaunch your preferred model
```
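After a hard reset or manual relaunch, the `/v1/models` check from Diagnostics can be polled until vLLM answers, rather than watching the logs. The following is a minimal sketch under stated assumptions: `probe` and `wait_for_ready` are hypothetical helper names, `SPARK_HOST` stands in for your Spark's address, and the 600 s default timeout is only a guess chosen to sit above the 3–6 minute swap time noted earlier.

```bash
# Hypothetical helper: succeeds when the URL answers with a non-error HTTP status.
probe() {
  curl -sf --max-time 5 -o /dev/null "$1"
}

# wait_for_ready URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls URL until probe succeeds; returns 1 once the timeout has elapsed.
wait_for_ready() {
  local url=$1
  local timeout=${2:-600}   # assumed default, not a spark-control setting
  local interval=${3:-5}
  local deadline
  deadline=$(( $(date +%s) + timeout ))
  until probe "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "ready: $url"
}

# Example (SPARK_HOST is a placeholder you set yourself):
#   wait_for_ready "http://$SPARK_HOST:8888/v1/models" 600 5
```

A successful probe only shows that the HTTP endpoint responds; it does not confirm that inference works, so a quick request against the model is still worth doing.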