Files
spark-control/runbook.md
T
Keysat e783653ef0 v0.23.0:0 - local / fine-tuned model support
Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes),
not just Hugging Face repos.

- ModelDef gains local_path; a model must set exactly one of repo / local_path.
  The validator also enforces the local-path whitelist and that any
  --chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME
  host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook,
  then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream
  expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'.
- POST /api/models validates the full entry via ModelDef before persisting, so a
  bad entry can't be written and then break catalog load; _merge_overrides skips
  an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF
  link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal,
  exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent
  pass; findings addressed.
2026-06-17 22:27:41 -05:00

114 lines
6.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# spark-control runbook
Operating notes for running and maintaining the cluster via spark-control.
## Prerequisites (per Spark)
spark-control is a **controller**, not a runtime. Each Spark in your cluster must already have the upstream `eugr/spark-vllm-docker` project set up:
1. Clone `https://github.com/eugr/spark-vllm-docker` to `~/spark-vllm-docker` on Spark 1 (the head node).
2. Build the vLLM container: `./build-and-copy.sh -c` (on a cluster) or `./build-and-copy.sh` (solo).
3. Pre-download any models you want in the catalog: `./hf-download.sh <repo> -c --copy-parallel`.
4. Verify: `./launch-cluster.sh status` returns sensibly.
5. Set up passwordless SSH from your Start9 server's spark-control container to each Spark (use the Show Public Key action — see README.md "Post-install setup").
Sharing this package with someone else who has a similar dual-DGX-Spark setup: they do the same per-Spark prerequisites, then sideload the `.s9pk` on their Start9 and run the setup actions.
## Recent successful swaps
- **2026-05-12 — gemma4 → qwen36** via `POST /api/swap` from laptop dev server. ~5:30 to "Application startup complete." Inference works (`/v1/chat/completions` returns reasoning content via `reasoning` field). `--moe_backend=flashinfer_cutlass` confirmed valid by vLLM (logged "Using 'FLASHINFER_CUTLASS' NvFp4 MoE backend").
## Optimization flags (added 2026-05-12)
Aligned `gemma4` and `qwen36` `vllm_args` with the project's sibling recipes (`qwen3.5-35b-a3b-fp8.yaml`, `gemma4-26b-a4b.yaml`):
- `--load-format=fastsafetensors` — faster cold-start weight load.
- `--enable-prefix-caching` — reuse cached prefix tokens (e.g. system prompt) across requests.
- `--kv-cache-dtype=fp8` — store KV cache in 8-bit FP; halves memory used per active context.
These take effect on the **next swap to that model**. If a swap fails after this change with errors mentioning fastsafetensors/prefix-caching/fp8, revert the entry in `models.yaml` and retry.
## Day-to-day
- The UI lives at `http://<your-start9>.local:9999` once the StartOS package is installed and configured.
- Status auto-refreshes every 5 s.
- A swap takes 36 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream.
## matrix-bridge bot tile (optional)
If you run the matrix-bridge bot container on a Spark, set its SSH user in **Configure Sparks** (e.g. the user that owns `~/matrix-bridge`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a `running` badge means the container is up, not necessarily that the bot is connected.
The **Update** button runs `git fetch && git reset --hard origin/<branch> && docker compose up -d --build` as that SSH user. For it to reach your git remote:
1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:
```
Host <your-git-host>
Port <port>
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
```
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
## Adding a new model
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
2. Confirm the weights are on the Spark: `ssh <spark-user>@<spark-1-host> 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
### Local / fine-tuned models (v0.23.0+)
A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a `--chat-template` must live **inside** `local_path`.
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
## Manual swap fallback
If the UI is unavailable and you need to swap by hand:
```bash
ssh <spark-user>@<spark-1-host>
cd ~/spark-vllm-docker
./launch-cluster.sh stop
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
--port 8888 --host 0.0.0.0 --gpu-memory-utilization 0.8 \
--max-model-len 32768 --reasoning-parser gemma4 \
--tool-call-parser gemma4 --enable-auto-tool-choice
docker logs -f vllm_node # wait for "Application startup complete."
```
## Diagnostics
```bash
# Is vLLM serving?
curl -s http://<spark-1-ip>:8888/v1/models | jq .
# Cluster status (containers up?)
ssh <spark-user>@<spark-1-host> 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
# Tail current model's logs
ssh <spark-user>@<spark-1-host> 'docker logs --tail 200 -f vllm_node'
# Parakeet
curl -s http://<spark-2-ip>:8000/health
# Kokoro TTS (v0.14.0+)
curl -s http://<spark-2-ip>:8880/health
```
## Hard reset
If launch-cluster.sh gets stuck:
```bash
ssh <spark-user>@<spark-1-host>
cd ~/spark-vllm-docker
./launch-cluster.sh stop
docker ps -aq | xargs -r docker rm -f
# then relaunch your preferred model
```