From 34bdbb7abaef2635fb4edf966bf193f08bb964d8 Mon Sep 17 00:00:00 2001
From: Grant <grant@ten31.xyz>
Date: Tue, 12 May 2026 10:05:17 -0500
Subject: [PATCH] Add Spark prerequisites section to runbook (spark-vllm-docker
 is upstream + Spark-side)

---
 runbook.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/runbook.md b/runbook.md
index 598e248..1e1055a 100644
--- a/runbook.md
+++ b/runbook.md
@@ -2,6 +2,18 @@
 
 Operating notes for running and maintaining the cluster via spark-control.
 
+## Prerequisites (per Spark)
+
+spark-control is a **controller**, not a runtime. Each Spark in your cluster must already have the upstream `eugr/spark-vllm-docker` project set up:
+
+1. Clone `https://github.com/eugr/spark-vllm-docker` to `~/spark-vllm-docker` on Spark 1 (the head node).
+2. Build the vLLM container: `./build-and-copy.sh -c` (on a cluster) or `./build-and-copy.sh` (solo).
+3. Pre-download any models you want in the catalog: `./hf-download.sh <repo> -c --copy-parallel`.
+4. Verify: `./launch-cluster.sh status` should complete and report a sensible cluster status.
+5. Set up passwordless SSH from the spark-control container on your Start9 server to each Spark (use the Show Public Key action; see README.md "Post-install setup").
+
+To share this package with someone who has a similar dual-DGX-Spark setup, have them complete the same per-Spark prerequisites, then sideload the `.s9pk` onto their Start9 server and run the setup actions.
+
 ## Recent successful swaps
 
 - **2026-05-12 — gemma4 → qwen36** via `POST /api/swap` from laptop dev server. ~5:30 to "Application startup complete." Inference works (`/v1/chat/completions` returns reasoning content via `reasoning` field). `--moe_backend=flashinfer_cutlass` confirmed valid by vLLM (logged "Using 'FLASHINFER_CUTLASS' NvFp4 MoE backend").