Add safe optimization flags to gemma4 + qwen36 (fastsafetensors, prefix-caching, fp8 kv)
Aligns both recipes with their sibling recipes in eugr/spark-vllm-docker. The flags take effect on the next swap to each model. The first real swap (gemma4 -> qwen36) succeeded in 5:30 with --moe_backend=flashinfer_cutlass.
@@ -1,5 +1,21 @@
# Project: spark-control — Model switcher web UI for dual DGX Spark cluster
> **Update 2026-05-12 — Direction change:** the web UI is being built as a
> **StartOS 0.4 package** (sideloaded onto Alice's existing Start9 server),
> **not** as a FastAPI service running directly on Spark 1. The Start9 server
> shares a LAN with the Sparks and SSHes into Spark 1 to invoke
> `launch-cluster.sh`. StartOS handles `.local` exposure and HTTPS; SSH
> credentials live in a per-install config file managed by a "Configure Sparks"
> action. See <https://docs.start9.com/packaging/0.4.0.x/> for the packaging
> model. Repo layout:
>
> - `image/` — Docker image source (FastAPI app, runs anywhere with `uvicorn`).
> - `package/` — StartOS 0.4 wrapper (manifest, main, interfaces, actions).
>
> The "Phase 4: Deploy" section below (systemd on Spark 1) is **superseded** by
> the StartOS sideload workflow. Other phases (models.yaml schema, swap script,
> FastAPI endpoints, frontend) still apply but live inside `image/`.
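
As a rough illustration of the SSH hop described in the update note, the sketch below shows one way the app in `image/` might build the command that runs `launch-cluster.sh` on Spark 1. Everything here is an assumption for illustration: the `SparkSSHConfig` fields, the default script path, the timeout, and the function names are hypothetical, and the arguments that `launch-cluster.sh` actually accepts are not specified in this plan.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class SparkSSHConfig:
    """Hypothetical shape of the per-install SSH settings."""
    host: str          # e.g. the LAN address of Spark 1
    user: str
    key_path: str      # private key file stored with the package config
    script_path: str = "./launch-cluster.sh"  # assumed location on Spark 1


def build_ssh_command(cfg: SparkSSHConfig, script_args: list[str]) -> list[str]:
    """Return the local argv for running the launch script over SSH."""
    # ssh joins trailing arguments with spaces before handing them to the
    # remote shell, so quote the remote command as one string up front.
    remote = shlex.join([cfg.script_path, *script_args])
    return [
        "ssh",
        "-i", cfg.key_path,
        "-o", "BatchMode=yes",       # fail instead of prompting for a password
        "-o", "ConnectTimeout=10",
        f"{cfg.user}@{cfg.host}",
        remote,
    ]


def run_launch(cfg: SparkSSHConfig, script_args: list[str],
               timeout_s: float = 900.0) -> subprocess.CompletedProcess:
    """Run the script remotely and capture its output (timeout is a guess)."""
    return subprocess.run(
        build_ssh_command(cfg, script_args),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # let the caller inspect returncode and stderr
    )
```

Keeping command construction separate from execution means the quoting can be unit-tested without a reachable Spark, and `BatchMode=yes` makes a missing or rejected key fail fast rather than hang on a prompt.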
## Goal

I want to build a small web service that gives me a browser-based interface to: