Add safe optimization flags to gemma4 + qwen36 (fastsafetensors, prefix-caching, fp8 kv)

Aligned with sibling recipes in eugr/spark-vllm-docker. Applies on next swap to each model.
First real swap gemma4 -> qwen36 succeeded in 5:30 with --moe_backend=flashinfer_cutlass.
Grant
2026-05-12 09:49:08 -05:00
parent dd9d53060b
commit 342e150266
4 changed files with 38 additions and 0 deletions
@@ -1,5 +1,21 @@
# Project: spark-control — Model switcher web UI for dual DGX Spark cluster
> **Update 2026-05-12 — Direction change:** the web UI is being built as a
> **StartOS 0.4 package** (sideloaded onto Alice's existing Start9 server),
> **not** as a FastAPI service running directly on Spark 1. The Start9 server
> shares a LAN with the Sparks and SSHes into Spark 1 to invoke
> `launch-cluster.sh`. StartOS handles `.local` exposure and HTTPS; SSH
> credentials live in a per-install config file managed by a "Configure Sparks"
> action. See <https://docs.start9.com/packaging/0.4.0.x/> for the packaging
> model. Repo layout:
>
> - `image/` — Docker image source (FastAPI app, runs anywhere with `uvicorn`).
> - `package/` — StartOS 0.4 wrapper (manifest, main, interfaces, actions).
>
> The "Phase 4: Deploy" section below (systemd on Spark 1) is **superseded** by
> the StartOS sideload workflow. Other phases (models.yaml schema, swap script,
> FastAPI endpoints, frontend) still apply but live inside `image/`.
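
The SSH hop described in the update note could look roughly like the sketch below. This is an illustration under stated assumptions, not the package's actual code: the config field names, the default script path, and the helper names are all hypothetical, and the arguments `launch-cluster.sh` accepts are not specified here, so they are passed through untouched.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class SparkSSHConfig:
    # Hypothetical fields; the real per-install config schema may differ.
    host: str
    user: str
    key_path: str
    script_path: str = "./launch-cluster.sh"  # assumed location on Spark 1


def build_ssh_command(cfg: SparkSSHConfig, script_args: list[str]) -> list[str]:
    """Return the local argv that runs launch-cluster.sh on Spark 1 over SSH."""
    # Quote the remote command so arguments survive the remote shell intact.
    remote_cmd = shlex.join([cfg.script_path, *script_args])
    return [
        "ssh",
        "-i", cfg.key_path,
        "-o", "BatchMode=yes",  # fail rather than prompt for a password
        f"{cfg.user}@{cfg.host}",
        remote_cmd,
    ]


def run_on_spark(
    cfg: SparkSSHConfig, script_args: list[str], timeout_s: float = 600.0
) -> subprocess.CompletedProcess:
    """Run the script remotely and capture stdout/stderr for the web UI."""
    return subprocess.run(
        build_ssh_command(cfg, script_args),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # let the caller inspect returncode and stderr
    )
```

Keeping command construction separate from execution means the FastAPI layer in `image/` can unit-test the argv without a reachable Spark, and `BatchMode=yes` makes a missing or wrong key fail fast instead of hanging on a password prompt.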
## Goal
I want to build a small web service that gives me a browser-based interface to: