Initial scaffold: image/ FastAPI app, models.yaml, docs

- image/ FastAPI app: /api/status, /api/swap, /api/swap/{id}/stream, /api/test-connection
- models.yaml: 5-model catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen25-72b)
- README, runbook, known-issues
- Dry-run swap verified against live Spark 1 (gemma4 currently loaded)
2026-05-12 09:29:13 -05:00
commit ae8efa1754
19 changed files with 1500 additions and 0 deletions
@@ -0,0 +1,61 @@
+# spark-control runbook
+
+Operating notes for running and maintaining the cluster via spark-control.
+
+## Day-to-day
+
+- The UI lives at `http://<your-start9>.local:9999` once the StartOS package is installed and configured.
+- Status auto-refreshes every 5 s.
+- A swap takes 3–6 minutes depending on the model. Closing the tab does not cancel it: the swap continues, and reopening the UI re-attaches you to the log stream.
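+
+The 5 s auto-refresh can be approximated from a terminal when a browser is not handy. This is a minimal sketch, assuming the API is served on the same port as the UI (9999) and that `/api/status` answers a plain GET; the response format is not described in this runbook, so the body is printed as-is. `status_url` and `poll_status` are hypothetical helper names.
+
+```bash
+# Build the status URL for a host (the default port 9999 is an assumption
+# taken from the UI address above).
+status_url() {
+  local host="$1" port="${2:-9999}"
+  printf 'http://%s:%s/api/status\n' "$host" "$port"
+}
+
+# Print the status COUNT times, INTERVAL seconds apart.
+poll_status() {
+  local host="$1" interval="${2:-5}" count="${3:-12}" i=0
+  while [ "$i" -lt "$count" ]; do
+    # -f makes curl fail on HTTP errors; the raw body is printed unchanged.
+    curl -fsS --max-time 5 "$(status_url "$host")" || echo "status request failed" >&2
+    echo
+    sleep "$interval"
+    i=$((i + 1))
+  done
+}
+```
+
+For example, `poll_status <your-start9>.local 5 12` would print the status about once every 5 s for a minute.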
+
+## Adding a new model
+
+1. Add an entry to `models.yaml` in the image source or, after install, use the "Edit Model Catalog" action in StartOS.
+2. Confirm the weights are on the Spark: `ssh <spark-user>@<spark-1-host>.local 'ls ~/.cache/huggingface/hub/'`. If they are missing, download them with `./hf-download.sh <repo>` on Spark 1.
+3. The new model appears in the UI on next refresh.
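+
+The weight check in step 2 can be scripted instead of reading the `ls` output by eye. This is a sketch, assuming the weights were downloaded into the standard Hugging Face hub cache, which stores each repository under a `models--<org>--<name>` directory. `hf_cache_dir` and `weights_present` are hypothetical helper names.
+
+```bash
+# Map a Hugging Face repo id to its hub cache directory name,
+# e.g. org/name -> models--org--name.
+hf_cache_dir() {
+  printf 'models--%s\n' "$1" | sed 's|/|--|g'
+}
+
+# Succeed if the cache directory for REPO exists on the remote host.
+# TARGET is an ssh destination such as <spark-user>@<spark-1-host>.local.
+weights_present() {
+  local target="$1" repo="$2"
+  # The tilde is expanded by the remote shell, not locally.
+  ssh "$target" "test -d ~/.cache/huggingface/hub/$(hf_cache_dir "$repo")"
+}
+```
+
+A possible use is `weights_present <spark-user>@<spark-1-host>.local <repo> || echo "run ./hf-download.sh <repo> on Spark 1"`. The check only confirms that the directory exists, not that the download completed.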
+
+## Manual swap fallback
+
+If the UI is unavailable and you need to swap by hand:
+
+```bash
+ssh <spark-user>@<spark-1-host>.local
+cd ~/spark-vllm-docker
+./launch-cluster.sh stop
+./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
+  --port 8888 --host 0.0.0.0 --gpu-memory-utilization 0.8 \
+  --max-model-len 32768 --reasoning-parser gemma4 \
+  --tool-call-parser gemma4 --enable-auto-tool-choice
+docker logs -f vllm_node      # wait for "Application startup complete."
+```
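+
+Instead of watching the log for "Application startup complete.", readiness can also be polled over HTTP. This is a sketch, assuming `/v1/models` (the endpoint used under Diagnostics) only answers successfully once the model is loaded. `wait_for_vllm` is a hypothetical helper name, and the 600 s default timeout is an assumption chosen to cover the 3–6 minute swap time.
+
+```bash
+# Poll http://HOST:PORT/v1/models until it answers or TIMEOUT seconds pass.
+wait_for_vllm() {
+  local host="$1" port="${2:-8888}" timeout="${3:-600}" interval="${4:-10}"
+  local waited=0
+  while [ "$waited" -lt "$timeout" ]; do
+    if curl -fsS --max-time 5 -o /dev/null "http://${host}:${port}/v1/models"; then
+      echo "vLLM is serving on ${host}:${port}"
+      return 0
+    fi
+    sleep "$interval"
+    waited=$((waited + interval))
+  done
+  echo "timed out after ${timeout}s waiting for vLLM" >&2
+  return 1
+}
+```
+
+For example, `wait_for_vllm <spark-1-ip>` after the launch command returns 0 once the endpoint responds and 1 if it never does.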
+
+## Diagnostics
+
+```bash
+# Is vLLM serving?
+curl -s http://<spark-1-ip>:8888/v1/models | jq .
+
+# Cluster status (containers up?)
+ssh <spark-user>@<spark-1-host>.local 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
+
+# Tail current model's logs
+ssh <spark-user>@<spark-1-host>.local 'docker logs --tail 200 -f vllm_node'
+
+# Parakeet
+curl -s http://<spark-2-ip>:8000/health
+
+# Magpie (see known-issues.md)
+curl -s http://<spark-2-ip>:9000/v1/health/ready
+```
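+
+The three HTTP checks above can be bundled into one pass/fail summary. This is a sketch that reuses only the endpoints and ports listed in this section; `check_endpoint` and `check_all` are hypothetical helper names, and a check counts as OK whenever curl gets a successful HTTP response, without inspecting the body.
+
+```bash
+# Report whether a single URL answers with a successful HTTP status.
+check_endpoint() {
+  local name="$1" url="$2"
+  if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
+    printf '%s: OK (%s)\n' "$name" "$url"
+  else
+    printf '%s: FAIL (%s)\n' "$name" "$url"
+    return 1
+  fi
+}
+
+# Check vLLM on Spark 1 plus Parakeet and Magpie on Spark 2.
+# Returns 1 if any of the three checks failed.
+check_all() {
+  local spark1="$1" spark2="$2" failed=0
+  check_endpoint vllm     "http://${spark1}:8888/v1/models"       || failed=1
+  check_endpoint parakeet "http://${spark2}:8000/health"          || failed=1
+  check_endpoint magpie   "http://${spark2}:9000/v1/health/ready" || failed=1
+  return "$failed"
+}
+```
+
+Running `check_all <spark-1-ip> <spark-2-ip>` prints one line per service. A Magpie failure may be expected; see known-issues.md.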
+
+## Hard reset
+
+If `launch-cluster.sh` gets stuck:
+
+```bash
+ssh <spark-user>@<spark-1-host>.local
+cd ~/spark-vllm-docker
+./launch-cluster.sh stop
+# Caution: this force-removes every container on the host, not only the vLLM ones.
+docker ps -aq | xargs -r docker rm -f
+# then relaunch your preferred model
+```
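+
+If Spark 1 also runs containers that should survive a reset, the removal can be narrowed with a name filter. This is a sketch, assuming the cluster containers all have `vllm` in their names; only `vllm_node` is named in this runbook, so check `docker ps -a` before relying on the pattern. `reset_vllm_containers` is a hypothetical helper name, meant to be run on Spark 1 after `./launch-cluster.sh stop`.
+
+```bash
+# Force-remove only containers whose name matches PATTERN (default: vllm).
+reset_vllm_containers() {
+  local pattern="${1:-vllm}" ids
+  ids="$(docker ps -aq --filter "name=${pattern}")"
+  if [ -z "$ids" ]; then
+    echo "no containers matching '${pattern}'"
+    return 0
+  fi
+  # Show what is about to be removed.
+  docker ps -a --filter "name=${pattern}" --format '{{.ID}}  {{.Names}}  {{.Status}}'
+  # Intentionally unquoted: $ids holds one container ID per word.
+  docker rm -f $ids
+}
+```
+
+The `name` filter matches substrings, so `reset_vllm_containers vllm_node` would limit the removal to that one container.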