spark-control runbook

Operating notes for running and maintaining the cluster via spark-control.

Prerequisites (per Spark)

spark-control is a controller, not a runtime. Each Spark in your cluster must already have the upstream eugr/spark-vllm-docker project set up:

  1. Clone https://github.com/eugr/spark-vllm-docker to ~/spark-vllm-docker on Spark 1 (the head node).
  2. Build the vLLM container: ./build-and-copy.sh -c (on a cluster) or ./build-and-copy.sh (solo).
  3. Pre-download any models you want in the catalog: ./hf-download.sh <repo> -c --copy-parallel.
  4. Verify: ./launch-cluster.sh status runs without errors and reports the state you expect.
  5. Set up passwordless SSH from your Start9 server's spark-control container to each Spark (use the Show Public Key action — see README.md "Post-install setup").
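Step 5 can be checked by hand from wherever the key in question lives. A minimal sketch, assuming OpenSSH; check_spark_ssh is a hypothetical helper name, not part of spark-control. BatchMode=yes makes ssh fail instead of prompting for a password, so a non-zero exit means key auth is not working (or the host is unreachable).

```shell
# Hypothetical helper (not part of spark-control): confirm key-based SSH works.
check_spark_ssh() {
  target="$1"   # e.g. <spark-user>@<spark-1-host>
  # BatchMode=yes: never prompt for a password; fail instead.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
    echo "ok: $target"
  else
    echo "FAIL: $target (key not authorized or host unreachable)"
    return 1
  fi
}

# Usage: check_spark_ssh <spark-user>@<spark-1-host>
```

Run it once per Spark; a FAIL usually means the public key from Show Public Key has not been added to that user's authorized_keys yet.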

Sharing this package with someone else who has a similar dual-DGX-Spark setup: they do the same per-Spark prerequisites, then sideload the .s9pk on their Start9 and run the setup actions.

Recent successful swaps

  • 2026-05-12 — gemma4 → qwen36 via POST /api/swap from laptop dev server. ~5:30 to "Application startup complete." Inference works (/v1/chat/completions returns reasoning content via reasoning field). --moe_backend=flashinfer_cutlass confirmed valid by vLLM (logged "Using 'FLASHINFER_CUTLASS' NvFp4 MoE backend").

Optimization flags (added 2026-05-12)

Aligned gemma4 and qwen36 vllm_args with the project's sibling recipes (qwen3.5-35b-a3b-fp8.yaml, gemma4-26b-a4b.yaml):

  • --load-format=fastsafetensors — faster cold-start weight load.
  • --enable-prefix-caching — reuse cached prefix tokens (e.g. system prompt) across requests.
  • --kv-cache-dtype=fp8 — store KV cache in 8-bit FP; halves memory used per active context.

These take effect on the next swap to that model. If a swap fails after this change with errors mentioning fastsafetensors/prefix-caching/fp8, revert the entry in models.yaml and retry.
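For a sense of what --kv-cache-dtype=fp8 buys: per-token KV size scales linearly with bytes per element, so going from a 16-bit cache to 8-bit halves it. A rough worked example follows. The layer, head and dimension values are illustrative assumptions, not read from the gemma4 or qwen36 configs, and the comparison assumes the cache would otherwise be 16-bit.

```shell
# KV cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_element
kv_bytes_per_token() {
  # $1 layers, $2 kv_heads, $3 head_dim, $4 bytes per element
  echo $(( 2 * $1 * $2 * $3 * $4 ))
}

# Illustrative dimensions only (assumed, not taken from any real model config).
LAYERS=48; KV_HEADS=8; HEAD_DIM=128; CTX=32768

kv16=$(kv_bytes_per_token "$LAYERS" "$KV_HEADS" "$HEAD_DIM" 2)   # 16-bit cache
kv8=$(kv_bytes_per_token "$LAYERS" "$KV_HEADS" "$HEAD_DIM" 1)    # fp8 cache

echo "16-bit: $(( kv16 * CTX / 1024 / 1024 )) MiB per ${CTX}-token context"
echo "fp8:    $(( kv8  * CTX / 1024 / 1024 )) MiB per ${CTX}-token context"
```

With these made-up dimensions the script prints 6144 MiB for the 16-bit cache and 3072 MiB for fp8. Substitute the real values from a model's config.json to estimate how many concurrent contexts fit.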

Day-to-day

  • The UI lives at http://<your-start9>.local:9999 once the StartOS package is installed and configured.
  • Status auto-refreshes every 5 s.
  • A swap takes 3 to 6 minutes depending on the model. Don't close the tab; if you do, the swap continues, and reopening it re-attaches you to the log stream.

matrix-bridge bot tile (optional)

If you run the matrix-bridge bot container on a Spark, set its SSH user in Configure Sparks (e.g. the user that owns ~/matrix-bridge) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a running badge means the container is up, not necessarily that the bot is connected.

The Update button runs git fetch && git reset --hard origin/<branch> && docker compose up -d --build as that SSH user. For it to reach your git remote:

  1. ~/matrix-bridge must be a clone of the repo (not loose files). Gitignored secrets (.env, etc.) survive a git reset --hard.

  2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common Permission denied (publickey) cause). In the user's ~/.ssh/config:

    Host <your-git-host>
        Port <port>
        IdentityFile ~/.ssh/id_ed25519
        IdentitiesOnly yes
    
  3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their authorized_keys) unless it's the same user Spark Control already uses for that Spark.
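To confirm which key ssh will actually offer to the git host after editing ~/.ssh/config, ssh -G prints the resolved client configuration without connecting. A small sketch, assuming OpenSSH 6.8 or newer (where -G is available); show_git_ssh_identity is a hypothetical helper name.

```shell
# Print the resolved port, identity file(s) and IdentitiesOnly setting for a host.
# ssh -G only evaluates the client config; it does not open a connection.
show_git_ssh_identity() {
  ssh -G "$1" | grep -E '^(port|identityfile|identitiesonly) '
}

# Usage (run as the bot's SSH user on the Spark):
#   show_git_ssh_identity <your-git-host>
```

If the output lists more than one identityfile or shows identitiesonly no, the Host block is not matching the remote's hostname as git sees it.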

Configurable topology (v0.24.0+)

For a cluster wired differently from the reference layout, Configure Sparks has three optional knobs (no fork needed):

  • vLLM container name — defaults to vllm_node. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator docker exec into it by name.

  • Services to hide: a comma-separated list of any of parakeet, kokoro, embeddings, qdrant. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers, e.g. a vLLM on port 8000 colliding with Parakeet's default.

  • Monitor a second vLLM — the swap machinery only drives the Spark 1 vLLM, but you can monitor a vLLM on another Spark by adding a custom service of kind: vllm to /data/services-overrides.yaml:

    custom:
      - key: vllm-spark2
        kind: vllm
        host: <spark-2-ip>
        user: <ssh-user>
        container: vllm_node
        port: 8000
    

    It gets a monitoring tile with no swap controls: loaded model (via /v1/models), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user; see Show Public Key.)
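Before adding the override, the second vLLM can be sanity-checked by hand. A sketch, assuming the endpoint serves the OpenAI-compatible /v1/models route mentioned above and that jq is installed; loaded_model is a hypothetical helper, not spark-control's own probe.

```shell
# Print the model id(s) a vLLM endpoint reports, one per line.
# Returns 1 if the endpoint is unreachable or answers with an HTTP error.
loaded_model() {
  # $1 host or IP, $2 port (8000 in the example override above)
  body=$(curl -sf --max-time 5 "http://$1:$2/v1/models") || return 1
  printf '%s\n' "$body" | jq -r '.data[].id'
}

# Usage: loaded_model <spark-2-ip> 8000
```

If this prints a model id from the machine running spark-control, the host and port in the custom entry are right; SSH access for the container state and start/stop/restart is a separate check.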

Adding a new model

  1. Add an entry to image/models.yaml. Required fields: display_name, repo, size_gb, mode (solo or cluster), vllm_args. Optional but recommended: description (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), capabilities (tags like [vision, reasoning, tools]), expected_ready_seconds.
  2. Confirm the weights are on the Spark: ssh <spark-user>@<spark-1-host> 'ls ~/.cache/huggingface/hub/'. If not, download with ./hf-download.sh <repo> on Spark 1.
  3. Rebuild + redeploy the package: cd package && make x86 && make install.
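For step 2, the Hugging Face hub cache stores each repo under a directory named models--<org>--<name>, so you can check for one specific model rather than eyeballing the whole listing. A minimal sketch, assuming the default cache location; hf_cache_dir is a hypothetical helper name.

```shell
# Map an HF repo id (org/name) to its directory name in the hub cache.
hf_cache_dir() {
  printf 'models--%s\n' "$(printf '%s' "$1" | sed 's|/|--|g')"
}

# Usage (placeholders as elsewhere in this runbook):
#   ssh <spark-user>@<spark-1-host> "ls -d ~/.cache/huggingface/hub/$(hf_cache_dir <repo>)"
```

The directory existing does not prove the download finished; an interrupted download can leave a partial snapshot behind.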

If description is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.

Local / fine-tuned models (v0.23.0+)

A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the "+ Add local model" button under LLM swap (or a custom: entry with local_path instead of repo in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a --chat-template must live inside local_path.

Load-bearing contract: on swap, spark-control prefixes the launch with VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>" so launch-cluster.sh bind-mounts the dir into the vLLM container at the same path. This relies on the upstream eugr/spark-vllm-docker launch-cluster.sh expanding $VLLM_SPARK_EXTRA_DOCKER_ARGS unquoted into its docker run (verified against the on-Spark script 2026-06-17: line ~11 appends it to DOCKER_ARGS, used unquoted in docker run). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
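Why quoting matters here: in POSIX sh and bash, an unquoted variable is split on whitespace into separate arguments, while a quoted one is passed as a single word. The demo below shows the difference using a made-up path (/models/my-finetune is illustrative only); it does not touch docker or launch-cluster.sh.

```shell
# Count how many arguments a command would receive.
count_args() { echo "$#"; }

# Same shape as the value spark-control sets; the path is a made-up example.
VLLM_SPARK_EXTRA_DOCKER_ARGS="-v /models/my-finetune:/models/my-finetune"

# Unquoted: word splitting yields two arguments, "-v" and "<path>:<path>".
unquoted=$(count_args $VLLM_SPARK_EXTRA_DOCKER_ARGS)
# Quoted: docker run would see one argument, "-v <path>:<path>", which is not a valid flag.
quoted=$(count_args "$VLLM_SPARK_EXTRA_DOCKER_ARGS")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

This prints "unquoted: 2 argument(s), quoted: 1 argument(s)". The same splitting means a local_path containing spaces would be broken apart too. To re-check upstream after an update, grep -n VLLM_SPARK_EXTRA_DOCKER_ARGS ~/spark-vllm-docker/launch-cluster.sh and follow the variable through to the docker run line.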

Manual swap fallback

If the UI is unavailable and you need to swap by hand:

ssh <spark-user>@<spark-1-host>
cd ~/spark-vllm-docker
./launch-cluster.sh stop
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
  --port 8888 --host 0.0.0.0 --gpu-memory-utilization 0.8 \
  --max-model-len 32768 --reasoning-parser gemma4 \
  --tool-call-parser gemma4 --enable-auto-tool-choice
docker logs -f vllm_node      # wait for "Application startup complete."
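Instead of watching the log, you can poll the API until it answers. A sketch, assuming /v1/models responding is a reasonable proxy for "Application startup complete."; wait_for_vllm and its default timeout and interval are illustrative choices.

```shell
# Poll an HTTP endpoint until it answers 2xx or a deadline passes.
wait_for_vllm() {
  url="$1"; timeout_s="${2:-600}"; interval_s="${3:-5}"
  waited=0
  while [ "$waited" -lt "$timeout_s" ]; do
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "ready after ~${waited}s"
      return 0
    fi
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
  echo "not ready after ${timeout_s}s" >&2
  return 1
}

# Usage: wait_for_vllm http://<spark-1-ip>:8888/v1/models 600
```

The reported time counts only the sleep intervals, so treat it as approximate.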

Sideload (make install) can't reach the server

Symptom: make install fails with package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1). Cause seen 2026-06-17: immense-voyage.local stopped resolving via mDNS from the Mac (curl https://immense-voyage.local/... → exit 6, "couldn't resolve host"), even though the server is up — curl -sk https://<server-ip>/rpc/v1 returns 200.

  • Don't work around it with start-cli -H https://<server-ip> package install: TLS connects but it returns UNAUTHORIZED, because start-cli's stored credential is bound to the registered .local host, not the IP.
  • Fix: make the name resolve again, then re-run make install:
    • sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder (flush mDNS), or
    • echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts (deterministic; remove later).
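If you go the /etc/hosts route more than once, it helps to append only when the name is missing so duplicate lines don't pile up. A small sketch; has_hosts_entry is a hypothetical helper, and it matches the hostname exactly while ignoring commented-out lines.

```shell
# Return 0 if a hosts file already maps the given name (comments ignored).
has_hosts_entry() {
  # $1 hostname, $2 hosts file (defaults to /etc/hosts)
  awk -v name="$1" '
    { sub(/#.*/, "") }   # drop comments, including whole commented-out lines
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "${2:-/etc/hosts}"
}

# Usage: append only when missing.
#   has_hosts_entry immense-voyage.local || \
#     echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts
```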

Note this only blocks installing to your own Start9 — building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).

Diagnostics

# Is vLLM serving?
curl -s http://<spark-1-ip>:8888/v1/models | jq .

# Cluster status (containers up?)
ssh <spark-user>@<spark-1-host> 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'

# Tail current model's logs
ssh <spark-user>@<spark-1-host> 'docker logs --tail 200 -f vllm_node'

# Parakeet
curl -s http://<spark-2-ip>:8000/health

# Kokoro TTS (v0.14.0+)
curl -s http://<spark-2-ip>:8880/health
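The individual checks above can be rolled into a single up/down summary. A sketch using the same URLs; probe and health_summary are hypothetical helper names, and any HTTP error or 5-second timeout counts as down.

```shell
# Print "up" or "down" for one URL; any HTTP error or timeout counts as down.
probe() {
  if curl -sf -o /dev/null --max-time 5 "$1"; then echo up; else echo down; fi
}

# One line per service: "<label> <state>". Arguments are label=url pairs.
health_summary() {
  for pair in "$@"; do
    printf '%-10s %s\n' "${pair%%=*}" "$(probe "${pair#*=}")"
  done
}

# Usage:
#   health_summary vllm=http://<spark-1-ip>:8888/v1/models \
#                  parakeet=http://<spark-2-ip>:8000/health \
#                  kokoro=http://<spark-2-ip>:8880/health
```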

Hard reset

If launch-cluster.sh gets stuck:

ssh <spark-user>@<spark-1-host>
cd ~/spark-vllm-docker
./launch-cluster.sh stop
# WARNING: removes ALL containers on this Spark, not just vLLM
docker ps -aq | xargs -r docker rm -f
# then relaunch your preferred model
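The docker ps -aq | xargs pipeline removes every container on the host. If other containers on that Spark should survive, a narrower variant is to remove only the vLLM container by name. A sketch, assuming the stuck state is confined to that one container (this runbook does not confirm that); remove_vllm_container is a hypothetical helper name.

```shell
# Remove a single container by name, leaving every other container alone.
remove_vllm_container() {
  name="${1:-vllm_node}"   # override if you changed the vLLM container name
  if docker container inspect "$name" >/dev/null 2>&1; then
    docker rm -f "$name"
  else
    echo "no container named $name"
  fi
}

# Usage (on Spark 1, after ./launch-cluster.sh stop):
#   remove_vllm_container
```

If the relaunch still fails after this, fall back to the full reset above.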