Add safe optimization flags to gemma4 + qwen36 (fastsafetensors, prefix-caching, fp8 kv)

Aligned with sibling recipes in eugr/spark-vllm-docker. The flags take effect the next time each model is swapped in. First real swap gemma4 -> qwen36 succeeded in 5:30 with --moe_backend=flashinfer_cutlass.
2026-05-12 09:49:08 -05:00
parent dd9d53060b
commit 342e150266
4 changed files with 38 additions and 0 deletions
@@ -1,5 +1,21 @@
 # Project: spark-control — Model switcher web UI for dual DGX Spark cluster

+> **Update 2026-05-12 — Direction change:** the web UI is being built as a
+> **StartOS 0.4 package** (sideloaded onto Alice's existing Start9 server),
+> **not** as a FastAPI service running directly on Spark 1. The Start9 server
+> shares a LAN with the Sparks and SSHes into Spark 1 to invoke
+> `launch-cluster.sh`. StartOS handles `.local` exposure and HTTPS; SSH
+> credentials live in a per-install config file managed by a "Configure Sparks"
+> action. See <https://docs.start9.com/packaging/0.4.0.x/> for the packaging
+> model. Repo layout:
+>
+> - `image/` — Docker image source (FastAPI app, runs anywhere with `uvicorn`).
+> - `package/` — StartOS 0.4 wrapper (manifest, main, interfaces, actions).
+>
+> The "Phase 4: Deploy" section below (systemd on Spark 1) is **superseded** by
+> the StartOS sideload workflow. Other phases (`models.yaml` schema, swap script,
+> FastAPI endpoints, frontend) still apply but live inside `image/`.
+
 ## Goal

 I want to build a small web service that gives me a browser-based interface to: