docs: v0.27.1:0 live + published to Clankistry; Gemma download fix confirmed end-to-end

v0.27.1:0 - fix model download: prepend ~/.local/bin so SSH finds uvx
hf-download.sh shells out to uvx (the uv installer drops it in ~/.local/bin), but the non-interactive SSH session doesn't source the user's profile, so ~/.local/bin was off PATH and downloads died with "uvx: command not found". build_download_command now prepends $HOME/.local/bin. Adds test_download.py.
2026-06-18 16:46:24 -05:00 · 2026-06-18 16:44:07 -05:00 · 2026-06-18 13:51:11 -05:00 · 2026-06-18 13:41:28 -05:00 · 2026-06-18 12:35:16 -05:00 · 2026-06-18 11:09:56 -05:00
23 changed files with 1661 additions and 508 deletions
@@ -55,12 +55,20 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou

 ## Current state

- **Live service runs v0.22.0:0** (installed and serving). **v0.25.0:0 is the latest in tree — coordination layer (swap lock + webhook + schedule registry); built/typechecked clean, NOT yet committed/tagged/installed (this session's work).** It stacks on three releases also staged-but-not-live: v0.24.0:0 (configurable topology — committed `26070eb`, tagged, pushed to `gitea/master`), v0.23.0:0 (local/fine-tuned models — committed/tagged/Gitea-published). **Close-out backlog for all of these: (a) commit/tag/push v0.25.0:0; (b) `make release` to publish s9pk assets to Gitea Releases (needs `GITEA_URL` + write `GITEA_TOKEN`, neither in env); (c) the live install.** Installs blocked on the same mDNS issue (P3 line below). Working features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (blank ⇒ 8888); **configurable topology** (vLLM container name, hide-services list, second-Spark vLLM monitor — v0.24.0:0); local/fine-tuned models (v0.23.0:0); **coordination layer** (v0.25.0:0 — GPU swap reservation lock with `423`-enforced manual-swap pause + human Release override, swap_complete/swap_failed webhook, read-only schedule registry; API in `docs/COORDINATION.md`). Everything from v0.23 onward lands live once the installs go through. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
- **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (124 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), and the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping). Mock-heavy swap/proxy/endpoint tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
+- **Live: v0.27.1:0 — fix: "Download a new model" button (uvx PATH).** Commit `1e1e1cb`; installed on `immense-voyage` (`start-cli package list` confirms `0.27.1:0`); pushed to gitea master; **published to Clankistry** (`~/.spark-control/publish.sh`). Root cause: `hf-download.sh` shells out to `uvx`, which the uv installer puts in `~/.local/bin`; Spark Control's *non-interactive* SSH session doesn't source the user's profile, so `~/.local/bin` is off PATH and the download died with "uvx: command not found" (same class as the matrix-bridge non-interactive-SSH gotcha). Fix: `download.build_download_command` prepends `export PATH="$HOME/.local/bin:$PATH"` (server-side `$HOME`, generic for any adopter); extracted to a pure helper with regression tests (`test_download.py`: PATH prefix, no-trailing-space, cluster flags, shlex round-trip). 161 pytest green; verified live. Prompted by Grant adding **Gemma-4-26B**: he downloaded `nvidia/Gemma-4-26B-A4B-NVFP4` (recipe `gemma4-26b` already in catalog) via the now-fixed button — **fix confirmed end-to-end** — and is swapping to it. **Pending: business-card OCR / vision test** once it's up.
+- **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:** `GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
+- **Next on this thread (small, externally gated):** (a) **adopter reply is drafted** (in the session; it corrects the vLLM-port misconception → set 8000 in the gear, confirms the port knobs + swap/lock fix, and asks for the disk-scan diagnostic). **Grant still needs to send it** and pick the distribution-channel wording. (b) **Optional Gitea tag + `make release`** so the adopter can pull v0.27 from Gitea Releases (NOT done this session; only registry + sideload shipped). Do it only if that adopter pulls from Gitea Releases rather than subscribing to Clankistry. (c) **Undiagnosed:** the adopter's disk scan shows Gemma "not on disk". They need to run `ls ~/.cache/huggingface/hub` as the SSH user so the result can be compared with `disk.py`'s `$HOME/.cache/huggingface/hub` assumption (likely a custom `HF_HOME`, container volume, or different-user cache path, which would need a configurable cache path).
+- **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
+- **Next (owner-driven, concrete): Gemma-4-26B-A4B vision daily-driver eval.** The `gemma4-26b` recipe is in the catalog (NVFP4 MoE; `--moe_backend=marlin` set because the fast CUTLASS FP4 path errors on GB10; vision+tools). Downloaded via the fixed button (see the v0.27.1:0 entry above); not yet swap-tested. Owner wants vision for business-card OCR and is weighing it against the text-only Qwen3.6 35B daily driver (research: Gemma ~52 tok/s vs Qwen's ~97, slightly weaker reasoning). Next: swap-test, then try a business card.
+- **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
+- **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
+- **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
+- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (157 passing; the in-app settings gear + swap-lock route-order regression + the webhook-repoint live-reload check are in `test_app_settings.py`, incl. `TestClient` end-to-end). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
 - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
 - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
 - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). 
**Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H <ip>` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). (3) **configurable topology** — DONE in tree, staged as **v0.24.0:0** (built clean, not yet committed/installed). Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`, threaded into the swap log-tail + validator exec via `quote_arg`); "services to hide" (`DISABLED_SERVICES` comma list → `Settings.disabled_services` frozenset, skipped by `services_from_settings`, the `check_*` probes, deep-health `run_all`, and connectivity logging — kills the Parakeet-on-8000 collision); second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (`probe_vllm_endpoint` shared with `check_vllm`). `/api/endpoints` gained a `disabled` flag; the health-dot hides when disabled. 102 tests pass (+8 in `test_topology.py`). Swap mechanism deliberately NOT generalized to raw `docker run` (that's coordination, item 4). Install pending — same mDNS situation as v0.23.0. (4) **coordination layer** — DONE in tree, staged as **v0.25.0:0** (brought forward 2026-06-17 on request rather than waiting for our own automation). 
`image/app/coordination.py` + `docs/COORDINATION.md`: swap reservation lock (`GET/POST/DELETE /api/swap/lock`, secret token, `423`-enforced in `post_swap`, TTL-bounded in-memory, `?force=true` human override, dashboard banner + swap-button pause), swap webhook (`swap_complete`/`swap_failed` fired outside the swap lock from `SwapManager._run`, optional HMAC `X-Spark-Signature`, Configure-Sparks URL+secret), schedule registry (`GET/POST/DELETE /api/schedule`, read-only "Scheduled jobs" panel). +20 tests (`test_coordination.py`). Built/typechecked clean; commit + install pending. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+- **Hosting / distribution:** source on self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.) The s9pk ships via Gitea Releases (`make release`) **and** a self-hosted StartOS registry — operator-local publish tooling lives outside the repo; owner-specific addresses + the **authenticated-writes-must-be-direct-not-via-the-tunnel** gotcha are in session memory.
+- **Design stance (decided):** Spark Control = control plane / GPU arbiter, **not** a job runner; recurring business jobs live in separate services that *call* the swap API (`POST /api/swap`). Full epic history (v0.22→v0.25) is in git log + `ROADMAP.md` → "Cluster coordination".
+- **Usage note (2026-06-18):** owner's daily driver is the solo **Qwen3.6 35B**; the 235B `cluster` models are dormant. Keeping `launch-cluster.sh` (the `eugr/spark-vllm-docker` community standard, mirrors NVIDIA's `dgx-spark-playbooks` Ray+RoCE design) is still correct even single-node — it supplies the maintained, hardware-tuned vLLM images; raw docker would mean DIY image upkeep for no gain. Spark 2 stays the speech/embeddings box regardless.
+- **Next steps (all low-priority / externally gated; P2/P3 tech-debt backlog in `ROADMAP.md`):** (1) raw-`docker run` swap generalization — **DEFERRED** (rationale in ROADMAP; revisit only if an adopter wants Spark Control to *drive*, not just monitor, raw-docker swaps — cleanest fix is the adopter adopting `launch-cluster.sh`). (2) audio concurrency knee — only if the Signal Engine dev wants it (needs a quiet window). (3) matrix-bridge Docker `HEALTHCHECK` — only if the bot dev asks. (4) Parakeet long-audio guard — deferred (rationale in ROADMAP).
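The v0.27.1:0 fix in this hunk comes down to prefixing the remote command with a PATH export so a non-interactive SSH session can find `uvx`. A minimal sketch of that idea follows. It assumes a simplified signature: the real `download.build_download_command` also handles cluster flags, and the `./hf-download.sh` invocation and the `&&` join are illustrative guesses, not taken from the source.

```python
import shlex

# Taken from the notes above: the prefix the fix prepends. $HOME stays
# unquoted inside double quotes so it expands on the server, not locally.
PATH_PREFIX = 'export PATH="$HOME/.local/bin:$PATH"'


def build_download_command(repo: str, script: str = "./hf-download.sh") -> str:
    """Build a shell line for a non-interactive SSH session (hypothetical signature)."""
    # Quote the user-supplied repo so the remote shell sees exactly one argument.
    download = f"{script} {shlex.quote(repo)}"
    # Join without a trailing space so the command round-trips cleanly.
    return f"{PATH_PREFIX} && {download}"


cmd = build_download_command("nvidia/Gemma-4-26B-A4B-NVFP4")
print(cmd)
```

Keeping the builder pure (string in, string out, no SSH) is what makes the PATH-prefix, no-trailing-space, and `shlex` round-trip checks cheap to run offline.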
@@ -73,16 +73,15 @@ The first start generates an ed25519 SSH keypair inside the package volume. Wait
 ### 4. Configure Sparks

 - Open Spark Control → **Actions → Configure Sparks**.
- Fill in:
+- Fill in just the four required fields:
  - **Spark 1 hostname or IP** — prefer the **IP** (e.g. `192.168.1.x`) over `.local` hostnames; vLLM only binds IPv4 and mDNS can resolve to IPv6 first.
  - **Spark 1 SSH user** — whatever username you set up on Spark 1.
  - **Spark 2 hostname or IP** + **SSH user** — same idea.
-  - Optional Parakeet/Kokoro overrides — leave blank if those services run on Spark 2 (the normal case).
-  - Optional **Open WebUI URL** — paste your Open WebUI LAN URL to get a deep-link button in the dashboard next to the current model.
-  - Optional **NGC API key** — paste it here if you have one.

 Save.

+Everything else is optional and lives in the dashboard, not this action: open Spark Control and click **⚙ Settings** in the top bar to set vLLM/service **ports** (e.g. if your vLLM runs on 8000 rather than the default 8888, or you moved Parakeet off 8000), container names, support-service hosts, an **Open WebUI URL** (adds a deep-link button), an **NGC API key**, and a swap webhook. Changes there apply immediately and are included in StartOS backups.
+
 ### 5. Re-run Show Public Key (if you skipped earlier)

 Now that hosts are configured, Show Public Key will give you the paste-ready install command. Run it as described in step 3.
@@ -92,7 +91,7 @@ Now that hosts are configured, Show Public Key will give you the paste-ready ins
 From the Spark Control service page, click the Web UI button. You should see:

 - A **top status bar** with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
- An **LLM tab** with cards for each model in the bundled catalog. Models you've downloaded show "on disk" badges; others show "not downloaded".
+- An **LLM tab** whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use **+ Download a new model** to fetch one — it appears here when it finishes.
 - An **Audio / Speech tab** with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.

 If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, **you're in**.
@@ -159,7 +158,7 @@ All of these inherit Spark Control's TLS cert and StartOS access controls. You o
 A few things worth knowing:

 - The codebase is **two halves**: `image/` is a standalone FastAPI app you can run with `uvicorn app.server:app` for local dev. `package/` is the StartOS wrapper. Changes to either should be coordinated.
- **All connection info** comes from environment variables in `image/app/config.py`, populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action. No IPs, usernames, or paths are hardcoded in runtime code.
+- **All connection info** comes from environment variables in `image/app/config.py`. The four required fields are populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action; the optional knobs are overlaid from the in-app `⚙ Settings` store (`/data/app_settings.json`, see `image/app/app_settings.py`). No IPs, usernames, or paths are hardcoded in runtime code.
 - The **path `~/spark-vllm-docker`** *is* hardcoded in `swap.py`, `download.py`, `updates.py`, and `models.py`. If the user has cloned the upstream repo elsewhere, either fix the path or symlink it.
 - **Persistent state** lives at `/data/` inside the container: `config.yaml`, `models-overrides.yaml`, `services-overrides.yaml`, `connectivity.json`, `ssh/`. These survive package updates.
 - The dashboard polls every 5 s; check `image/app/health.py` and `image/app/connectivity.py` for the probing logic. External apps can also POST failures to `/api/health-event` to log between-poll blips.
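The env-plus-overlay arrangement described in this hunk can be pictured as a two-layer merge: start from the process environment, then lay the JSON overlay on top. The sketch below is illustrative only. The function names mirror the ones the notes mention (`load_overlay`, an effective-env helper), but the bodies, the error handling, and the `SPARK1_HOST` key are assumptions.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def load_overlay(path: Path) -> Dict[str, str]:
    """Pure read of the overlay file; missing or corrupt means no overrides."""
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return {}
    if not isinstance(data, dict):
        return {}
    return {str(key): str(value) for key, value in data.items()}


def effective_env(
    overlay: Mapping[str, str], base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Process environment first, overlay on top (overlay wins on conflicts)."""
    merged = dict(os.environ if base is None else base)
    merged.update(overlay)
    return merged


# Example values only: the host key and address are made up for the demo.
env = effective_env(
    {"VLLM_PORT": "8000"},
    base={"VLLM_PORT": "8888", "SPARK1_HOST": "192.168.1.10"},
)
print(env["VLLM_PORT"])
```

Because the overlay is keyed by the same env-var names, the settings loader only ever sees one flat mapping and does not need to know which layer a value came from.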
@@ -112,14 +112,14 @@ Fields: `service` (required), `ok` (required), `source` (optional, free-form), `

 ## Status

-**v0.2.3 / s9pk version 0.13.0:4** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
+**s9pk version 0.26.0:0** — installed and verified on a Start9 server. The LLM menu is whatever's downloaded on the Sparks (scanned live, not hard-coded); bundled *launch recipes* (qwen3-vl, gemma4, gemma4-26b, qwen36) tell it how to launch known models, and anything else gets a "needs setup" card that infers + saves its settings on first use.

 ### What v0.2 added on top of v0.1

 - **Service discovery API** (`/api/endpoints`) for other LAN services
 - **Kokoro-82M TTS** replaces Magpie/Riva NIM as the default TTS backend (v0.14.0). Magpie's decoder had a ~30-50% truncation rate on multi-sentence inputs and ate 49 GB of GPU memory; Kokoro is 24/24 reliable at every input length tested, uses 1.3 GB GPU, and renders in ~1s. See HANDOFF.md and the release notes for the migration story.
- **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host configuration in Configure Sparks (so they can live on Spark 1, Spark 2, or anywhere)
- **Model download** from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
+- **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host/port/container configuration in the in-app **⚙ Settings** gear (so they can live on Spark 1, Spark 2, or anywhere, on any port)
+- **Model download** from the dashboard — paste an HF repo (with autocomplete for known models), pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion the model appears on the menu automatically; if it's unrecognized, a pre-filled "set up this model" dialog offers to configure it.
 - **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
 - **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
 - **Diarization with speaker fingerprints** via Sortformer + TitaNet, exposed at `/api/audio/diarize-chunk` for chunked workflows
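The disk-scanned menu in this hunk depends on mapping Hugging Face cache directory names back to repo ids. The hub cache stores a model repo `org/name` under a directory named `models--org--name`. The helpers below sketch that mapping in both directions. They are hypothetical stand-ins, not the project's `discovery.py` code, and they ignore edge cases such as a repo name that itself contains `--`.

```python
from typing import Optional

_PREFIX = "models--"


def repo_to_dirname(repo: str) -> str:
    """'org/name' -> 'models--org--name' (Hugging Face hub cache layout)."""
    return _PREFIX + repo.replace("/", "--")


def dirname_to_repo(dirname: str) -> Optional[str]:
    """Inverse mapping; returns None for anything that is not a model dir."""
    if not dirname.startswith(_PREFIX):
        return None
    rest = dirname[len(_PREFIX):]
    if not rest:
        return None
    # Split on the first separator only, so hyphens in the name survive.
    org, sep, name = rest.partition("--")
    return f"{org}/{name}" if sep else org


print(dirname_to_repo("models--nvidia--Gemma-4-26B-A4B-NVFP4"))
```

Returning `None` for dataset directories or stray files lets the caller skip them silently when listing the cache.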
@@ -16,7 +16,19 @@ Sequenced:
   - **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). Acquire returns a secret token; the swap endpoint refuses any real swap (`423`) that doesn't present it in `X-Swap-Lock-Token`, so the dashboard's manual swap is paused while a scheduler holds it (with a `?force=true` human override). In-memory + TTL-bounded → resets to unlocked on restart; re-acquire with the token extends. Enforced in `post_swap`, not advisory.
   - **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL (Configure-Sparks field), fired from `SwapManager._run` *outside* the swap lock; optional shared secret ⇒ `X-Spark-Signature` HMAC. Fire-and-forget (5 s, no retries); dry runs don't fire.
   - **Schedule visibility** — `GET/POST/DELETE /api/schedule`; read-only "Scheduled jobs" dashboard panel, registered by external schedulers. Spark Control stores and displays, never executes.
-   - Still NOT generalized: the swap *mechanism* to raw `docker run` (that's the adopter's own crons' job). Tests: `image/tests/test_coordination.py` (22 cases — lock lifecycle/expiry/token, the single-read swap gate, schedule CRUD + id validation, webhook payload+signature). Known limit: lock + schedules are in-memory (a restart frees the lock and empties the registry until schedulers re-register) — persist to `/data` only if that bites.
+   - Tests: `image/tests/test_coordination.py` (22 cases — lock lifecycle/expiry/token, the single-read swap gate, schedule CRUD + id validation, webhook payload+signature). Known limit: lock + schedules are in-memory (a restart frees the lock and empties the registry until schedulers re-register) — persist to `/data` only if that bites.
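The swap-lock behaviour listed above (secret token, TTL, `423` for token-less swaps, force override, expiry testable without sleeping) can be sketched as a small in-memory class with an injected clock. This is an illustration under assumptions: the class name, method names, and status-code helper are hypothetical, and the real `coordination.py` may be structured differently.

```python
import hmac
import secrets
import time
from typing import Callable, Optional


class SwapLock:
    """In-memory, TTL-bounded reservation lock (hypothetical sketch)."""

    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now  # injected so tests can move time without sleeping
        self._holder: Optional[str] = None
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def _drop_if_expired(self) -> None:
        if self._token is not None and self._now() >= self._expires_at:
            self._holder = None
            self._token = None

    def _matches(self, token: Optional[str]) -> bool:
        if token is None or self._token is None:
            return False
        # Constant-time comparison of the presented token.
        return hmac.compare_digest(token.encode(), self._token.encode())

    def acquire(self, holder: str, ttl_s: float, token: Optional[str] = None) -> Optional[str]:
        """Return the token, or None if another holder has the lock."""
        self._drop_if_expired()
        if self._token is not None and not self._matches(token):
            return None
        if self._token is None:
            self._token = secrets.token_urlsafe(16)
        self._holder = holder
        self._expires_at = self._now() + ttl_s  # re-acquire extends the TTL
        return self._token

    def swap_status(self, token: Optional[str] = None, force: bool = False) -> int:
        """Status a swap request would receive: 200 allowed, 423 locked."""
        self._drop_if_expired()
        if self._token is None or force or self._matches(token):
            return 200
        return 423
```

A test can pass `now=lambda: clock[0]` and bump `clock[0]` past the TTL to check expiry, which is the same trick the notes describe for `test_coordination.py`.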
+
+### Generalizing the swap mechanism to raw `docker run` — DEFERRED (decided 2026-06-18, research-backed; was item 4's last open thread)
+
+Our swap drives `~/spark-vllm-docker/launch-cluster.sh` over SSH on Spark 1 (`./launch-cluster.sh stop`, then `[VLLM_SPARK_EXTRA_DOCKER_ARGS=…] ./launch-cluster.sh [--solo ]-d exec vllm serve <model> <args>`, then `docker logs -f` until the ready marker). The OpenClaw adopter launches vLLM with a plain `docker run` instead, so the swap button can't drive his cluster — only monitor it. The portability fix would be a configurable "swap backend": keep `launch-cluster.sh` as the default and add a "bring your own command" mode (operator-authored stop/launch templates in `services-overrides.yaml` with quoted `{model}`/`{container}`/`{port}`/`{extra_args}` substitution; ready-detection unchanged; the vLLM-argparse pre-flight disabled for that backend).
+
+**Why deferred, not built:**
+- **Raw docker is not an upgrade for *us* — for half our catalog it's impossible.** `launch-cluster.sh` is the `eugr/spark-vllm-docker` community project (de-facto DGX Spark standard; mirrors NVIDIA's own `dgx-spark-playbooks` Ray+RDMA architecture). Its headline job is **multi-node** serving: our 235B `cluster` models (Qwen3-VL 235B, Qwen3 235B) exceed one Spark's 128 GB and *must* shard across both Sparks via Ray over the 200 Gbps ConnectX/RoCE link — plumbing (NCCL/MTU/per-node env) that a single-node `docker run` cannot do. So we keep the helper script; switching our own cluster to raw docker is off the table.
+- **The feature is therefore portability-only** (for differently-wired adopters), and the one known adopter doesn't need it — he swaps via his own crons and uses Spark Control to watch.
+- **Untestable on our hardware** — our cluster uses the helper script, so we can't validate a real raw-docker swap without risking the live vLLM.
+- The one real standing risk is eugr's single-maintainer status; fallback is community forks or migrating to NVIDIA's official `dgx-spark-playbooks` launcher (same design). No reason to switch now.
+
+**Revisit only if** an adopter explicitly wants Spark Control to *drive* (not just monitor) swaps on a raw-`docker run` cluster. At that point, get their actual working `docker run` command and build the command-template backend to it.

 ## Near term
 - parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
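The deferred guard above is described as a duration check placed right after the duration is computed, with the resulting error mapped to HTTP 413. A possible shape for it is sketched here. The exception class and helper names are hypothetical, and treating a blank or unparseable `MAX_DIARIZE_SECONDS` as "no cap" is an assumption rather than a decided behaviour.

```python
import os
from typing import Mapping, Optional


class AudioTooLongError(Exception):
    """Raised when one diarize request exceeds the configured cap."""


def max_diarize_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the cap from MAX_DIARIZE_SECONDS; 0.0 means no cap (assumed)."""
    source = os.environ if env is None else env
    raw = source.get("MAX_DIARIZE_SECONDS", "").strip()
    try:
        limit = float(raw)
    except ValueError:
        return 0.0
    return limit if limit > 0 else 0.0


def check_duration(duration_s: float, limit_s: float) -> None:
    """Call after the duration is known and before the model runs."""
    if limit_s > 0 and duration_s > limit_s:
        # The HTTP layer would translate this into a 413 response.
        raise AudioTooLongError(
            f"audio is {duration_s:.0f}s; limit is {limit_s:.0f}s, "
            "use the chunked endpoint instead"
        )
```

Rejecting before the single-pass model call is the point of the design: the request fails fast instead of pushing the shared memory pool into swap.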
@@ -35,10 +35,13 @@ Two kinds, both run with the `image/.venv` interpreter (system python3 has no de
 - New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
 - **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
 - **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).
+- **Settings split (gear vs StartOS action):** only the four *required* fields (both Spark IPs + SSH users) live in the StartOS "Configure Sparks" action → `config.yaml` → env. Every *optional* knob (ports, container names, support-service hosts, integrations, webhook) is edited in the dashboard's ⚙ Settings gear, backed by the `/data/app_settings.json` overlay (`app_settings.py`), keyed by the same env-var names. Precedence (`config._effective_env`): `os.environ` first, overlay on top. `app_settings.seed_from_env` runs **once at startup** to migrate a pre-gear install's env values into the overlay (don't move seeding into `from_env`/`reload` — it writes, and `from_env` runs on every build → it would clobber across calls, which it did once already). **`Settings` is deliberately not frozen:** one shared instance is threaded by reference into every router closure/manager, and `Settings.reload()` (called after a gear save) recomputes its fields **in place** so changes apply live with no restart and no call-site changes. **Gotcha:** this only reaches holders that keep the *object* (`self.settings = settings`); anything that snapshots a *value* at construction is invisible to `reload()` and must be re-synced explicitly. The one such holder is `WebhookNotifier`, which copies `url`/`secret` — `post_settings` calls `swap_webhook.update(...)` right after `reload()`. Any future component that caches a gear-managed value (rather than reading `settings.x` at use time) needs the same treatment. A new gear knob = add one entry to `app_settings.FIELDS` (the front-end renders it generically); the matching `config.Settings` field must already read that env var.

 ## Layout

 - `image/app/server.py` — FastAPI entry; routers live in sibling modules (`audio_proxy.py`, `llm_proxy.py`, `embeddings_proxy.py`, `redaction_gateway.py`, `swap.py`, `health.py`, `deep_health.py`, `connectivity.py`, …).
+- `image/app/discovery.py` — the disk-driven model menu. `/api/models` lists what's actually downloaded on the Sparks (via `disk.list_cached_models`); `models.yaml`/overrides are *launch recipes* matched by repo, not the menu. An on-disk model with no recipe is `needs_setup` → `infer_recipe` reads its `config.json` to prefill a setup form the operator confirms once.
+- `image/app/app_settings.py` — the in-app settings overlay backing the ⚙ gear: `FIELDS` metadata (drives `/api/settings` + the UI form), `load_overlay()` (pure read), `seed_from_env()` (one-time migration), `apply()` (validate + persist). `GET/POST /api/settings` in `server.py` read/write it, then `settings.reload()`.
 - `image/app/static/` — the dashboard UI.
- `image/models.yaml` — vLLM model catalog bundled into the image.
+- `image/models.yaml` — bundled vLLM **launch recipes** (how to launch a known model), NOT the dashboard menu — the menu is the on-disk scan.
 - `image/spark_embed/` — Dockerfile + app for the embeddings container; built ON a Spark (ARM64, NGC PyTorch base — see the audio/cluster rule for NGC torch-pinning caveats).
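The precedence rule from the settings-split note can be sketched in a few lines. This is a minimal illustration, not the project's code: `effective_env` and `resolve_port` are hypothetical stand-ins for `config._effective_env` and `config._env_int`, and the sample values are made up.

```python
from typing import Mapping


def effective_env(process_env: Mapping[str, str], overlay: Mapping[str, str]) -> dict[str, str]:
    # In a dict merge the later mapping wins, so overlay entries shadow the
    # process env while untouched keys fall through unchanged.
    return {**process_env, **overlay}


def resolve_port(src: Mapping[str, str], name: str, default: int) -> int:
    # Unset, blank, or malformed values fall back to the code default.
    try:
        return int(src.get(name, "") or default)
    except (TypeError, ValueError):
        return default


# Hypothetical inputs: the action injected VLLM_PORT=8000, the gear set 9001,
# and KOKORO_PORT was left blank everywhere.
process_env = {"VLLM_PORT": "8000", "KOKORO_PORT": ""}
overlay = {"VLLM_PORT": "9001"}

gear_wins = resolve_port(effective_env(process_env, overlay), "VLLM_PORT", 8888)
env_only = resolve_port(effective_env(process_env, {}), "VLLM_PORT", 8888)
code_default = resolve_port(effective_env(process_env, overlay), "KOKORO_PORT", 8880)
```

Under these assumptions the three lookups exercise each tier in turn: the gear value, then the injected env value, then the code default.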
@@ -0,0 +1,286 @@
+"""App-owned settings overlay: the in-dashboard 'gear' knobs.
+
+Spark Control's *required* wiring — the two Spark IPs and SSH users — is set once
+via the StartOS "Configure Sparks" action and arrives as env vars. Everything
+else (ports, container names, support-service hosts, integrations, webhook) is
+optional and lives here: a small JSON overlay on /data that the dashboard gear
+reads and writes, so an operator never has to open StartOS actions to tune the
+cluster. This follows the StartOS 0.4 convention (minimal setup action; routine
+config in the app's own UI) and stays inside the package's backup volume, so the
+file is backed up and restored for free.
+
+Each overlay entry is keyed by the *same env var name* config.Settings already
+reads, so the overlay is simply an env-var override store. Precedence (see
+config._effective_env): process env first, this overlay on top — so a knob set
+in the gear wins, while an un-touched knob falls through to whatever the StartOS
+action injected, then to the code default.
+
+First-run migration: when the overlay file doesn't exist yet (e.g. an existing
+install upgrading into this version), it's seeded from the current env so any
+value previously set via the StartOS action carries over into the gear with no
+operator action and nothing lost.
+"""
+from __future__ import annotations
+import json
+import logging
+import os
+import re
+import tempfile
+from pathlib import Path
+from typing import Mapping
+
+log = logging.getLogger(__name__)
+
+# Field metadata drives BOTH the /api/settings response (the front-end renders
+# the form generically from this) and light server-side validation. `key` is the
+# env var name; `type` is one of text|int|csv|secret. `secret` values are
+# write-only — never echoed back to the browser.
+FIELDS: list[dict] = [
+    # --- vLLM (Spark 1) ---
+    {"group": "vLLM (Spark 1)", "key": "VLLM_PORT", "label": "vLLM port", "type": "int",
+     "placeholder": "8888",
+     "help": "Port your vLLM listens on. Blank ⇒ 8888 (the bundled launch-cluster.sh). Set 8000 for vanilla vLLM, or wherever yours listens."},
+    {"group": "vLLM (Spark 1)", "key": "VLLM_CONTAINER", "label": "vLLM container name", "type": "text",
+     "placeholder": "vllm_node",
+     "help": "Docker container the swappable vLLM runs in. Blank ⇒ vllm_node. The swap log-tail and pre-flight validator exec into it by name."},
+
+    # --- Monitoring ---
+    {"group": "Monitoring", "key": "DISABLED_SERVICES", "label": "Services to hide", "type": "csv",
+     "placeholder": "e.g. parakeet,kokoro",
+     "help": "Comma-separated built-in services your cluster doesn't run, so their tiles are hidden and never probed. Valid: parakeet, kokoro, embeddings, qdrant. Blank ⇒ monitor all."},
+
+    # --- Parakeet (STT) ---
+    {"group": "Parakeet (STT)", "key": "PARAKEET_HOST", "label": "Host", "type": "text",
+     "placeholder": "leave blank for Spark 2",
+     "help": "Host running the Parakeet STT container. Blank ⇒ Spark 2."},
+    {"group": "Parakeet (STT)", "key": "PARAKEET_PORT", "label": "Port", "type": "int",
+     "placeholder": "8000",
+     "help": "Port Parakeet listens on. Blank ⇒ 8000. Set this if you remapped it (e.g. because your vLLM holds 8000)."},
+    {"group": "Parakeet (STT)", "key": "PARAKEET_CONTAINER", "label": "Container name", "type": "text",
+     "placeholder": "parakeet-asr",
+     "help": "Docker container name for Parakeet. Blank ⇒ parakeet-asr."},
+    {"group": "Parakeet (STT)", "key": "PARAKEET_USER", "label": "SSH user", "type": "text",
+     "placeholder": "leave blank for Spark 2 user",
+     "help": "SSH user that owns the Parakeet container. Blank ⇒ your Spark 2 user."},
+
+    # --- Kokoro (TTS) ---
+    {"group": "Kokoro (TTS)", "key": "KOKORO_HOST", "label": "Host", "type": "text",
+     "placeholder": "leave blank for Spark 2",
+     "help": "Host running the Kokoro TTS container. Blank ⇒ Spark 2."},
+    {"group": "Kokoro (TTS)", "key": "KOKORO_PORT", "label": "Port", "type": "int",
+     "placeholder": "8880",
+     "help": "Port Kokoro listens on. Blank ⇒ 8880."},
+    {"group": "Kokoro (TTS)", "key": "KOKORO_CONTAINER", "label": "Container name", "type": "text",
+     "placeholder": "kokoro-tts",
+     "help": "Docker container name for Kokoro. Blank ⇒ kokoro-tts."},
+    {"group": "Kokoro (TTS)", "key": "KOKORO_USER", "label": "SSH user", "type": "text",
+     "placeholder": "leave blank for Spark 2 user",
+     "help": "SSH user that owns the Kokoro container. Blank ⇒ your Spark 2 user."},
+
+    # --- Embeddings ---
+    {"group": "Embeddings", "key": "EMBED_HOST", "label": "Host", "type": "text",
+     "placeholder": "leave blank for Spark 2",
+     "help": "Host running the spark-embed container (bge-m3 + reranker). Blank ⇒ Spark 2."},
+    {"group": "Embeddings", "key": "EMBED_PORT", "label": "Port", "type": "int",
+     "placeholder": "8088",
+     "help": "Port the embedding server listens on. Blank ⇒ 8088."},
+    {"group": "Embeddings", "key": "EMBED_CONTAINER", "label": "Container name", "type": "text",
+     "placeholder": "spark-embed",
+     "help": "Docker container name for the embedding server. Blank ⇒ spark-embed."},
+    {"group": "Embeddings", "key": "EMBED_USER", "label": "SSH user", "type": "text",
+     "placeholder": "leave blank for Spark 2 user",
+     "help": "SSH user that owns the embedding container. Blank ⇒ your Spark 2 user."},
+
+    # --- Qdrant ---
+    {"group": "Qdrant", "key": "QDRANT_HOST", "label": "Host", "type": "text",
+     "placeholder": "leave blank for Spark 2",
+     "help": "Host running the Qdrant vector database. Blank ⇒ Spark 2."},
+    {"group": "Qdrant", "key": "QDRANT_PORT", "label": "Port", "type": "int",
+     "placeholder": "6333",
+     "help": "Port Qdrant's REST API listens on. Blank ⇒ 6333."},
+    {"group": "Qdrant", "key": "QDRANT_CONTAINER", "label": "Container name", "type": "text",
+     "placeholder": "qdrant",
+     "help": "Docker container name for Qdrant. Blank ⇒ qdrant."},
+    {"group": "Qdrant", "key": "QDRANT_USER", "label": "SSH user", "type": "text",
+     "placeholder": "leave blank for Spark 2 user",
+     "help": "SSH user that owns the Qdrant container. Blank ⇒ your Spark 2 user."},
+    {"group": "Qdrant", "key": "QDRANT_COLLECTION", "label": "Default collection", "type": "text",
+     "placeholder": "e.g. crm_chunks",
+     "help": "Collection used by /api/search when a request doesn't name one. Blank ⇒ callers must pass a collection."},
+
+    # --- Integrations ---
+    {"group": "Integrations", "key": "OPEN_WEBUI_URL", "label": "Open WebUI URL", "type": "text",
+     "placeholder": "e.g. https://open-webui.yourserver.local",
+     "help": "If set, the header shows a one-click 'Open chat' button to your Open WebUI."},
+    {"group": "Integrations", "key": "MATRIX_BRIDGE_USER", "label": "matrix-bridge bot SSH user", "type": "text",
+     "placeholder": "e.g. modelo",
+     "help": "SSH user owning the bot's ~/matrix-bridge clone (Spark 2). Set this to show the bot tile (update/restart/logs). Blank ⇒ tile hidden."},
+    {"group": "Integrations", "key": "NGC_API_KEY", "label": "NGC API key", "type": "secret",
+     "placeholder": "starts with nvapi-…",
+     "help": "NVIDIA NGC personal key, needed only to install NIM containers from nvcr.io. Stored on this server."},
+    {"group": "Integrations", "key": "SWAP_WEBHOOK_URL", "label": "Swap webhook URL", "type": "text",
+     "placeholder": "e.g. https://my-service.local/spark-swap",
+     "help": "POSTed a small JSON event (swap_complete / swap_failed) after every model swap, so automation can re-point to the new model. Blank ⇒ disabled."},
+    {"group": "Integrations", "key": "SWAP_WEBHOOK_SECRET", "label": "Swap webhook secret", "type": "secret",
+     "placeholder": "a random shared string",
+     "help": "If set, each webhook is HMAC-signed (X-Spark-Signature) so the receiver can verify it. Blank ⇒ unsigned."},
+]
+
+_BY_KEY = {f["key"]: f for f in FIELDS}
+_SECRET_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "secret")
+_INT_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "int")
+# Reject control characters (incl. newlines) — these values flow into env vars,
+# URLs, and SSH command lines (quoted at the sink, but defence in depth).
+_BAD_CHARS = re.compile(r"[\x00-\x1f\x7f]")
+# A secret's value is never echoed back, so a blank submit means "keep the stored
+# one" (you can't see it to retype it). To actually *remove* a stored secret the
+# UI sends this sentinel instead of a real value. Surfaced to the front-end via
+# public_view so the two stay in sync.
+CLEAR_SENTINEL = "__clear__"
+
+
+def _path() -> Path:
+    return Path(os.environ.get("APP_SETTINGS_FILE", "/data/app_settings.json"))
+
+
+def field_keys() -> frozenset[str]:
+    return frozenset(_BY_KEY)
+
+
+def load_overlay() -> dict[str, str]:
+    """Return the overlay as {ENV_KEY: value}, filtered to known, non-empty keys.
+
+    Pure read (no side effects) — called on every Settings (re)build, so it must
+    not write. Missing/corrupt file ⇒ {}. The file is tiny."""
+    p = _path()
+    if not p.exists():
+        return {}
+    try:
+        raw = json.loads(p.read_text())
+    except (ValueError, OSError) as e:
+        log.warning("ignoring unreadable %s: %s", p, e)
+        return {}
+    if not isinstance(raw, dict):
+        return {}
+    return {k: str(v) for k, v in raw.items() if k in _BY_KEY and v not in (None, "")}
+
+
+def seed_from_env(env: Mapping[str, str]) -> None:
+    """One-time migration, called once at startup: if no overlay exists yet, seed
+    it from the current env so any optional value previously set via the StartOS
+    action carries into the gear automatically (nothing lost on upgrade). No-op
+    if the file already exists or the env carries no known non-empty knob — a
+    fresh install then starts with no overlay and pure defaults. Values run
+    through the same validation as apply(); a malformed one (e.g. a paste-error
+    port) is skipped rather than written, matching the gear's own guards."""
+    if _path().exists():
+        return
+    seeded: dict[str, str] = {}
+    for k in _BY_KEY:
+        v = env.get(k)
+        if not v:
+            continue
+        try:
+            cleaned = _validate(k, v)
+        except SettingsError as e:
+            log.warning("skipping invalid env value while seeding overlay: %s", e)
+            continue
+        if cleaned and cleaned != CLEAR_SENTINEL:
+            seeded[k] = cleaned
+    if seeded:
+        _write(seeded)
+        log.info("seeded settings overlay from env (%d keys): %s", len(seeded), _path())
+
+
+def _write(overlay: dict[str, str]) -> None:
+    p = _path()
+    p.parent.mkdir(parents=True, exist_ok=True)
+    # Atomic replace so a crash mid-write never leaves a truncated overlay.
+    fd, tmp = tempfile.mkstemp(dir=str(p.parent), prefix=".app_settings.", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "w") as fh:
+            json.dump(overlay, fh, indent=2, sort_keys=True)
+        os.replace(tmp, p)
+    except BaseException:
+        try:
+            os.unlink(tmp)
+        except OSError:
+            pass
+        raise
+
+
+def public_view() -> dict:
+    """Shape the gear form for the browser: ordered groups of fields with their
+    current overlay value. Secret values are never sent — only a `set` flag."""
+    overlay = load_overlay()
+    groups: list[dict] = []
+    index: dict[str, dict] = {}
+    for f in FIELDS:
+        g = index.get(f["group"])
+        if g is None:
+            g = {"name": f["group"], "fields": []}
+            index[f["group"]] = g
+            groups.append(g)
+        entry = {
+            "key": f["key"],
+            "label": f["label"],
+            "type": f["type"],
+            "placeholder": f.get("placeholder", ""),
+            "help": f.get("help", ""),
+        }
+        if f["type"] == "secret":
+            entry["set"] = bool(overlay.get(f["key"]))
+        else:
+            entry["value"] = overlay.get(f["key"], "")
+        g["fields"].append(entry)
+    return {"groups": groups, "clear_sentinel": CLEAR_SENTINEL}
+
+
+class SettingsError(ValueError):
+    """Bad input to apply() — surfaced as 422 by the endpoint."""
+
+
+def _validate(key: str, value) -> str:
+    """Clean + validate one value; raise SettingsError on bad input. Returns the
+    stripped string ('' is valid and means 'unset'). The CLEAR_SENTINEL passes
+    through for the caller to interpret (secret removal)."""
+    if key not in _BY_KEY:
+        raise SettingsError(f"unknown setting: {key}")
+    val = ("" if value is None else str(value)).strip()
+    if val == CLEAR_SENTINEL:
+        return val
+    if _BAD_CHARS.search(val):
+        raise SettingsError(f"{key}: control characters are not allowed")
+    if key in _INT_KEYS and val:
+        if not (val.isascii() and val.isdigit()) or not (1 <= int(val) <= 65535):
+            raise SettingsError(f"{key}: must be a port number between 1 and 65535")
+    return val
+
+
+def apply(updates: Mapping[str, str]) -> dict[str, str]:
+    """Validate `updates` and merge them into the overlay, then persist.
+
+    Rules per key:
+      - unknown key / bad int / control chars → reject (422, via _validate)
+      - secret + CLEAR_SENTINEL → delete the stored secret
+      - secret + blank value    → leave the stored secret unchanged (don't wipe)
+      - non-secret + blank       → delete the key (revert to env/default)
+      - otherwise                → set the key
+
+    Returns the new overlay. The caller reloads Settings so the change goes live.
+    """
+    overlay = load_overlay()
+    for key, value in updates.items():
+        val = _validate(key, value)
+        if key in _SECRET_KEYS:
+            if val == CLEAR_SENTINEL:
+                overlay.pop(key, None)
+            elif val:
+                overlay[key] = val
+            # blank secret ⇒ leave the existing value in place
+        elif val and val != CLEAR_SENTINEL:
+            overlay[key] = val
+        else:
+            overlay.pop(key, None)
+    _write(overlay)
+    return overlay
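The per-key rules in the `apply()` docstring can be tried out in isolation. The sketch below is a simplified stand-in, assuming a single secret key and skipping validation and persistence entirely; `merge` and `SECRET_KEYS` are hypothetical names used only for illustration.

```python
CLEAR = "__clear__"                       # same sentinel value as CLEAR_SENTINEL
SECRET_KEYS = frozenset({"NGC_API_KEY"})  # assumption: one secret key for the demo


def merge(overlay: dict[str, str], updates: dict[str, str]) -> dict[str, str]:
    """Return a new overlay with `updates` merged in (no validation, no disk write)."""
    result = dict(overlay)
    for key, raw in updates.items():
        val = (raw or "").strip()
        if key in SECRET_KEYS:
            if val == CLEAR:
                result.pop(key, None)     # explicit removal of a stored secret
            elif val:
                result[key] = val         # new secret value
            # blank secret: keep whatever is stored
        elif val and val != CLEAR:
            result[key] = val             # ordinary knob set
        else:
            result.pop(key, None)         # blank knob reverts to env/default
    return result


stored = {"NGC_API_KEY": "nvapi-example", "VLLM_PORT": "8000"}
after_blank = merge(stored, {"NGC_API_KEY": "", "VLLM_PORT": ""})
after_clear = merge(stored, {"NGC_API_KEY": CLEAR})
```

The asymmetry is the point: a blank submit wipes a visible knob but preserves a secret, because the browser never receives the stored secret and so cannot resend it.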
@@ -1,26 +1,28 @@
 from __future__ import annotations
 import logging
 import os
-from dataclasses import dataclass
+from dataclasses import dataclass, fields
 from pathlib import Path
+from typing import Mapping

+from . import app_settings
 from .shellsafe import validate_container

 log = logging.getLogger(__name__)


-def _env(name: str, default: str = "") -> str:
-    return os.environ.get(name, default)
+def _env(src: Mapping[str, str], name: str, default: str = "") -> str:
+    return src.get(name, default)


-def _env_container(name: str, default: str) -> str:
+def _env_container(src: Mapping[str, str], name: str, default: str) -> str:
    """Resolve a container-name env var, validating it at the config boundary.

    The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
    the sink — but per the repo's two-layer convention it's also whitelist-checked
    here. A malformed optional value falls back to `default` rather than crashing
-    daemon startup (mirrors `_env_int` for VLLM_PORT)."""
-    val = os.environ.get(name, "") or default
+    daemon startup (mirrors `_env_int`)."""
+    val = src.get(name, "") or default
    try:
        return validate_container(val)
    except ValueError:
@@ -28,23 +30,23 @@ def _env_container(name: str, default: str) -> str:
        return default


-def _env_set(name: str) -> frozenset[str]:
+def _env_set(src: Mapping[str, str], name: str) -> frozenset[str]:
    """Parse a comma-separated env var into a lowercased frozenset of keys.

    Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
    support service can switch its tile + probes off entirely (rather than have
    the probe hit whatever else listens on that port — e.g. a vLLM sharing
    Parakeet's default 8000)."""
-    raw = os.environ.get(name, "")
+    raw = src.get(name, "")
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


-def _env_int(name: str, default: int) -> int:
+def _env_int(src: Mapping[str, str], name: str, default: int) -> int:
    """Parse an int env var, falling back to `default` when unset, blank, or
-    malformed. The StartOS Configure panel passes optional numeric fields as an
-    empty string when left blank, so a bare int("") would crash daemon startup."""
+    malformed. Optional numeric fields arrive as an empty string when left blank,
+    so a bare int("") would crash daemon startup."""
    try:
-        return int(os.environ.get(name, "") or default)
+        return int(src.get(name, "") or default)
    except (TypeError, ValueError):
        return default

@@ -64,8 +66,23 @@ def _resolve_models_yaml() -> str:
    return str(candidates[0])  # let load fail with a clear path


-@dataclass(frozen=True)
+def _effective_env() -> dict[str, str]:
+    """The env Settings is built from: process env first, the in-app settings
+    overlay on top. The overlay (the dashboard 'gear') is keyed by the same env
+    var names, so a knob set in the UI overrides the value the StartOS action
+    injected — while an un-touched knob keeps falling through to the action's
+    value, then to the code default. See app_settings."""
+    return {**os.environ, **app_settings.load_overlay()}
+
+
+@dataclass
 class Settings:
+    # NOTE: intentionally NOT frozen. There is exactly one Settings instance,
+    # shared by reference across every router closure and manager (build_router,
+    # self.settings = settings). `reload()` mutates it in place so a change saved
+    # via the in-app settings gear goes live for all of them without rebuilding
+    # the app — the only window of inconsistency is the microseconds it takes to
+    # reassign the fields, acceptable for a single-operator config save.
    spark1_host: str
    spark1_user: str
    spark2_host: str
@@ -107,73 +124,82 @@ class Settings:
    swap_webhook_secret: str

    @classmethod
-    def from_env(cls) -> "Settings":
-        spark2_host = _env("SPARK2_HOST")
-        spark2_user = _env("SPARK2_USER")
+    def from_env(cls, src: Mapping[str, str] | None = None) -> "Settings":
+        src = _effective_env() if src is None else src
+        spark2_host = _env(src, "SPARK2_HOST")
+        spark2_user = _env(src, "SPARK2_USER")
        # Parakeet (STT) and Kokoro (TTS) default to Spark 2 unless overridden.
        return cls(
-            spark1_host=_env("SPARK1_HOST"),
-            spark1_user=_env("SPARK1_USER"),
+            spark1_host=_env(src, "SPARK1_HOST"),
+            spark1_user=_env(src, "SPARK1_USER"),
            spark2_host=spark2_host,
            spark2_user=spark2_user,
-            parakeet_host=_env("PARAKEET_HOST") or spark2_host,
-            parakeet_user=_env("PARAKEET_USER") or spark2_user,
-            parakeet_container=_env("PARAKEET_CONTAINER") or "parakeet-asr",
-            kokoro_host=_env("KOKORO_HOST") or spark2_host,
-            kokoro_user=_env("KOKORO_USER") or spark2_user,
-            kokoro_container=_env("KOKORO_CONTAINER") or "kokoro-tts",
+            parakeet_host=_env(src, "PARAKEET_HOST") or spark2_host,
+            parakeet_user=_env(src, "PARAKEET_USER") or spark2_user,
+            parakeet_container=_env(src, "PARAKEET_CONTAINER") or "parakeet-asr",
+            kokoro_host=_env(src, "KOKORO_HOST") or spark2_host,
+            kokoro_user=_env(src, "KOKORO_USER") or spark2_user,
+            kokoro_container=_env(src, "KOKORO_CONTAINER") or "kokoro-tts",
            # Embeddings (spark-embed: bge-m3 dense + reranker) and Qdrant
            # (vector storage) default to Spark 2 unless overridden.
-            embed_host=_env("EMBED_HOST") or spark2_host,
-            embed_user=_env("EMBED_USER") or spark2_user,
-            embed_container=_env("EMBED_CONTAINER") or "spark-embed",
-            qdrant_host=_env("QDRANT_HOST") or spark2_host,
-            qdrant_user=_env("QDRANT_USER") or spark2_user,
-            qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
-            qdrant_collection=_env("QDRANT_COLLECTION", ""),
+            embed_host=_env(src, "EMBED_HOST") or spark2_host,
+            embed_user=_env(src, "EMBED_USER") or spark2_user,
+            embed_container=_env(src, "EMBED_CONTAINER") or "spark-embed",
+            qdrant_host=_env(src, "QDRANT_HOST") or spark2_host,
+            qdrant_user=_env(src, "QDRANT_USER") or spark2_user,
+            qdrant_container=_env(src, "QDRANT_CONTAINER") or "qdrant",
+            qdrant_collection=_env(src, "QDRANT_COLLECTION", ""),
            # matrix-bridge bot container, driven as its own SSH user (the owner
            # of the ~/matrix-bridge git clone) so git/docker run unprivileged.
-            # The user is BLANK by default and set via the "Configure Sparks"
-            # action; leaving it blank reports the service as unconfigured, which
-            # hides the tile. That keeps the shared package portable — a
-            # deployment without the bot never shows a stray tile or a hardcoded
-            # username. Host defaults to Spark 2 (same box); container/dir/branch
-            # are sensible defaults. All are env-overridable.
-            matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
-            matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
-            matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
-            matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
-            matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
+            # The user is BLANK by default and set via the settings gear; leaving
+            # it blank reports the service as unconfigured, which hides the tile.
+            # That keeps the shared package portable — a deployment without the
+            # bot never shows a stray tile or a hardcoded username. Host defaults
+            # to Spark 2 (same box); container/dir/branch are sensible defaults.
+            matrix_bridge_host=_env(src, "MATRIX_BRIDGE_HOST") or spark2_host,
+            matrix_bridge_user=_env(src, "MATRIX_BRIDGE_USER"),
+            matrix_bridge_container=_env(src, "MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
+            matrix_bridge_dir=_env(src, "MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
+            matrix_bridge_branch=_env(src, "MATRIX_BRIDGE_BRANCH") or "master",
            # Redaction gateway pseudonym-map store (server-held de-anon key).
-            redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
-            redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
-            ssh_key_path=_env("SSH_KEY_PATH"),
-            ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
+            redaction_map_db=_env(src, "REDACTION_MAP_DB", "/data/redaction_maps.db"),
+            redaction_map_ttl=_env_int(src, "REDACTION_MAP_TTL", 7200),
+            ssh_key_path=_env(src, "SSH_KEY_PATH"),
+            ssh_known_hosts=_env(src, "SSH_KNOWN_HOSTS"),
            models_yaml=_resolve_models_yaml(),
-            vllm_port=_env_int("VLLM_PORT", 8888),
+            vllm_port=_env_int(src, "VLLM_PORT", 8888),
            # Container name for the swappable vLLM on Spark 1. Defaults to the
            # bundled launch-cluster.sh container; override if you named yours
            # something else (the swap log-tail and pre-flight validator exec
            # into it by name).
-            vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
+            vllm_container=_env_container(src, "VLLM_CONTAINER", "vllm_node"),
            # Built-in support-service keys (parakeet, kokoro, embeddings,
            # qdrant) the deployment doesn't run — hidden from the dashboard and
            # never probed.
-            disabled_services=_env_set("DISABLED_SERVICES"),
-            parakeet_port=_env_int("PARAKEET_PORT", 8000),
-            kokoro_port=_env_int("KOKORO_PORT", 8880),
-            embed_port=_env_int("EMBED_PORT", 8088),
-            qdrant_port=_env_int("QDRANT_PORT", 6333),
-            bind_port=_env_int("BIND_PORT", 9999),
-            open_webui_url=_env("OPEN_WEBUI_URL", ""),
-            ngc_api_key=_env("NGC_API_KEY", ""),
+            disabled_services=_env_set(src, "DISABLED_SERVICES"),
+            parakeet_port=_env_int(src, "PARAKEET_PORT", 8000),
+            kokoro_port=_env_int(src, "KOKORO_PORT", 8880),
+            embed_port=_env_int(src, "EMBED_PORT", 8088),
+            qdrant_port=_env_int(src, "QDRANT_PORT", 6333),
+            bind_port=_env_int(src, "BIND_PORT", 9999),
+            open_webui_url=_env(src, "OPEN_WEBUI_URL", ""),
+            ngc_api_key=_env(src, "NGC_API_KEY", ""),
            # Coordination layer: fire a swap-lifecycle webhook to this URL so
            # downstream consumers re-point their model config on a swap. Blank
            # ⇒ disabled. The optional secret HMAC-signs the body (X-Spark-Signature).
-            swap_webhook_url=_env("SWAP_WEBHOOK_URL", ""),
-            swap_webhook_secret=_env("SWAP_WEBHOOK_SECRET", ""),
+            swap_webhook_url=_env(src, "SWAP_WEBHOOK_URL", ""),
+            swap_webhook_secret=_env(src, "SWAP_WEBHOOK_SECRET", ""),
        )

+    def reload(self) -> None:
+        """Recompute every field from the current env + settings overlay and
+        assign it onto this same instance, so all holders of the reference see
+        the change without an app restart. Called after the gear writes the
+        overlay (see server.post_settings)."""
+        fresh = Settings.from_env()
+        for f in fields(self):
+            setattr(self, f.name, getattr(fresh, f.name))
+
    @property
    def configured(self) -> bool:
        return bool(self.spark1_host)
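Why `reload()` mutates in place rather than returning a new object can be shown with a toy dataclass. This is a sketch under assumptions: `DemoSettings`, `DemoRouter`, `reload_from`, and the `spark1` hostname are invented for the example, and `reload_from` takes the fresh values as an argument instead of rebuilding them from env.

```python
from dataclasses import dataclass, fields


@dataclass  # deliberately not frozen, so fields can be reassigned
class DemoSettings:
    vllm_port: int
    open_webui_url: str

    def reload_from(self, fresh: "DemoSettings") -> None:
        # Copy every field onto this same instance; holders of the reference
        # observe the new values without being rebuilt.
        for f in fields(self):
            setattr(self, f.name, getattr(fresh, f.name))


class DemoRouter:
    def __init__(self, settings: DemoSettings) -> None:
        self.settings = settings  # keeps the object, not a copy of its values

    def target(self) -> str:
        # Reads the field at use time, so it always sees the current value.
        return f"http://spark1:{self.settings.vllm_port}"


settings = DemoSettings(vllm_port=8888, open_webui_url="")
router = DemoRouter(settings)
before = router.target()
settings.reload_from(DemoSettings(vllm_port=8000, open_webui_url=""))
after = router.target()
```

With a frozen dataclass the same effect would require replacing the instance and re-wiring every holder, which is the call-site churn the NOTE comment is avoiding.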
@@ -239,6 +239,14 @@ class WebhookNotifier:
        self.secret = secret or ""
        self.timeout = timeout

+    def update(self, url: str, secret: str = "") -> None:
+        """Re-point after a live settings change. The notifier holds snapshot
+        copies of these two fields (not the Settings object), so Settings.reload()
+        can't reach it — server.post_settings calls this explicitly so editing the
+        webhook URL/secret in the dashboard gear takes effect without a restart."""
+        self.url = (url or "").strip()
+        self.secret = secret or ""
+
    @property
    def enabled(self) -> bool:
        return bool(self.url)
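The snapshot problem that `update()` solves looks like this in miniature. The classes below are hypothetical reductions of `Settings` and `WebhookNotifier`, and the URL is a placeholder; only the copy-at-construction behaviour is taken from the code above.

```python
from dataclasses import dataclass


@dataclass
class DemoSettings:
    swap_webhook_url: str = ""


class SnapshotNotifier:
    def __init__(self, url: str) -> None:
        self.url = (url or "").strip()  # copies the value, not the settings object

    def update(self, url: str) -> None:
        self.url = (url or "").strip()  # explicit re-sync after a settings change

    @property
    def enabled(self) -> bool:
        return bool(self.url)


settings = DemoSettings()
notifier = SnapshotNotifier(settings.swap_webhook_url)

# Simulate a gear save that changes the shared settings object in place.
settings.swap_webhook_url = "https://example.invalid/spark-swap"
stale = notifier.enabled          # the notifier still holds its old copy

notifier.update(settings.swap_webhook_url)
fresh = notifier.enabled          # now re-synced
```

An alternative design would hand the notifier the settings object and read the URL at send time; the explicit `update()` keeps the notifier decoupled from `Settings` at the cost of one extra call in the save path.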
@@ -0,0 +1,209 @@
+"""Disk-driven model menu + launch-recipe inference.
+
+The dashboard's model list is whatever is actually downloaded on the Sparks
+(see `disk.list_cached_models`), NOT a hard-coded catalog. The bundled/overridden
+catalog entries are *launch recipes*: matched to an on-disk model by repo, they
+say HOW to launch it. A completed model on disk with no matching recipe shows up
+as `needs_setup` — the first switch reads its `config.json`, proposes a recipe
+(`infer_recipe`) the operator confirms once, and that confirmed recipe is saved
+to /data so it's a normal card from then on.
+
+Why a recipe layer at all, if the menu is the disk? Because a folder on disk
+doesn't say how to launch it: the per-family parsers (`--reasoning-parser`,
+`--tool-call-parser`), the MoE backend (some Gemma MoE checkpoints need
+`marlin` on GB10), and solo-vs-cluster topology can't be read off a directory.
+We infer a best guess from the model's own config + size, but the operator
+confirms it — a wrong guess is cheap, a wrong launch is not.
+"""
+from __future__ import annotations
+import asyncio
+import re
+
+from .config import Settings
+from .disk import list_cached_models, probe_disk
+from .overrides import extract_knobs_from_args
+
+
+# A model whose weights exceed this can't fit one Spark's 128 GB beside a KV
+# cache, so it must shard across both via Ray. A heuristic prefill only — the
+# operator confirms mode in the setup form, so the exact cutoff isn't critical.
+SINGLE_SPARK_BYTES = 115 * 1000 ** 3
+
+# Generic knob defaults applied to every inferred recipe (the operator can tweak
+# these in the setup form). Family-specific flags (parsers, MoE backend) are
+# layered on separately by `_detect_family`.
+_COMMON_KNOBS = {
+    "max_model_len": 32768,
+    "gpu_memory_utilization": 0.85,
+    "fastsafetensors": True,
+    "prefix_caching": True,
+    "kv_cache_dtype": "fp8",
+}
+
+
+def repo_to_key(repo: str) -> str:
+    """Stable, URL-safe menu key for a discovered model with no recipe key yet.
+
+    'RedHatAI/Qwen3.6-35B-A3B-NVFP4' -> 'redhatai-qwen3-6-35b-a3b-nvfp4'. The same
+    slug is used by the menu, the setup form, and `_identify_current_model`, so a
+    loaded-but-unconfigured model still highlights as active."""
+    return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")
+
+
+def _detect_family(config: dict) -> tuple[str, list[str], list[str]]:
+    """Return (family_label, vllm_flags, capabilities) inferred from config.json.
+
+    Only family-specific, non-knob flags (parsers, MoE backend) go in vllm_flags;
+    generic knob defaults are handled by the caller. Best-effort and operator-
+    confirmed, so a wrong guess is cheap."""
+    arch = " ".join(config.get("architectures") or [])
+    mtype = str(config.get("model_type") or "")
+    s = (arch + " " + mtype).lower()
+    is_moe = (
+        "moe" in s
+        or any(config.get(k) for k in ("num_experts", "n_routed_experts", "num_local_experts"))
+    )
+    is_vision = (
+        "conditionalgeneration" in s
+        or "vision" in s
+        or "vlforcausallm" in s
+        or "vision_config" in config
+        or "image_token_index" in config
+    )
+    flags: list[str] = []
+    caps: list[str] = []
+    label = "Generic"
+    if "qwen3" in s:
+        label = "Qwen3 (MoE)" if is_moe else "Qwen3"
+        flags.append("--reasoning-parser=qwen3")
+        caps.append("reasoning")
+        if is_moe:
+            flags.append("--moe_backend=flashinfer_cutlass")
+    elif "gemma" in s:
+        label = "Gemma (MoE)" if is_moe else "Gemma"
+        flags += ["--reasoning-parser=gemma4", "--tool-call-parser=gemma4", "--enable-auto-tool-choice"]
+        caps += ["reasoning", "tools"]
+        if is_moe:
+            # The fast flashinfer/CUTLASS FP4 path errors on GB10 for Gemma MoE;
+            # marlin is the working fallback (see the Gemma 26B trial notes).
+            flags.append("--moe_backend=marlin")
+    if is_vision and "vision" not in caps:
+        caps.append("vision")
+    return label, flags, caps
+
+
+def _infer_mode(total_bytes: int, on_host_count: int) -> str:
+    """Solo unless the weights are present on both Sparks or too big for one."""
+    if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
+        return "cluster"
+    return "solo"
+
+
+def infer_recipe(repo: str, config: dict, total_bytes: int, on_host_count: int) -> dict:
+    """Propose a launch recipe for a discovered model — prefills the setup form."""
+    label, flags, caps = _detect_family(config or {})
+    mode = _infer_mode(total_bytes, on_host_count)
+    vllm_args = list(flags)
+    vllm_args.append("--max-num-batched-tokens=16384")
+    knobs = dict(_COMMON_KNOBS)
+    if mode == "cluster":
+        # Large models shard across both Sparks via Ray; leave more headroom.
+        vllm_args += ["-tp=2", "--distributed-executor-backend=ray"]
+        knobs["gpu_memory_utilization"] = 0.7
+    return {
+        "key": repo_to_key(repo),
+        "repo": repo,
+        "display_name": repo.split("/")[-1],
+        "mode": mode,
+        "capabilities": caps,
+        "vllm_args": vllm_args,
+        "knobs": knobs,
+        "family": label,
+    }
+
+
+def _menu_entry_from_recipe(m, *, on_disk: bool, total_bytes: int, per_host: list[dict]) -> dict:
+    d = m.model_dump()
+    d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
+    d["needs_setup"] = False
+    d["on_disk"] = on_disk
+    d["total_bytes"] = total_bytes
+    d["per_host"] = per_host
+    return d
+
+
+async def build_menu(settings: Settings, catalog) -> dict[str, dict]:
+    """The disk-driven model menu: every completed model on the Sparks, annotated
+    with its launch recipe (matched by repo) or flagged `needs_setup` if none.
+
+    One SSH scan per configured Spark, run in parallel (local recipes add one path
+    probe each); cheaper than the old per-recipe probe. A failed host is skipped."""
+    hosts = [(settings.spark1_host, settings.spark1_user)]
+    if settings.spark2_host:
+        hosts.append((settings.spark2_host, settings.spark2_user))
+    scans = await asyncio.gather(
+        *(list_cached_models(h, u, settings) for h, u in hosts),
+        return_exceptions=True,
+    )
+    by_repo: dict[str, dict] = {}
+    for (h, _u), res in zip(hosts, scans):
+        if isinstance(res, Exception):
+            continue
+        for repo, size, complete in res:
+            e = by_repo.setdefault(repo, {"total_bytes": 0, "per_host": [], "complete": False})
+            e["total_bytes"] += size
+            e["per_host"].append({"host": h, "size_bytes": size})
+            e["complete"] = e["complete"] or complete
+
+    recipe_by_repo = {m.repo: (k, m) for k, m in catalog.models.items() if m.repo}
+
+    menu: dict[str, dict] = {}
+    for repo, info in by_repo.items():
+        # Skip half-fetched / corrupt caches (no finished snapshot) — they'd show
+        # as broken cards. In-flight downloads surface in the download panel.
+        if not info["complete"]:
+            continue
+        if repo in recipe_by_repo:
+            key, m = recipe_by_repo[repo]
+            menu[key] = _menu_entry_from_recipe(
+                m, on_disk=True, total_bytes=info["total_bytes"], per_host=info["per_host"]
+            )
+        else:
+            key = repo_to_key(repo)
+            menu[key] = {
+                "display_name": repo.split("/")[-1],
+                "repo": repo,
+                "local_path": None,
+                "size_gb": round(info["total_bytes"] / 1e9, 1),
+                "mode": _infer_mode(info["total_bytes"], len(info["per_host"])),
+                "capabilities": [],
+                "expected_ready_seconds": 300,
+                "vllm_args": [],
+                "description": None,
+                "knobs": None,
+                "custom": False,
+                "needs_setup": True,
+                "effective_knobs": {},
+                "on_disk": True,
+                "total_bytes": info["total_bytes"],
+                "per_host": info["per_host"],
+            }
+
+    # Local/fine-tuned recipes live as a directory, not an HF cache entry — probe
+    # each by path and include it if present. Their keys are unique catalog keys
+    # (and local models carry repo="" per ModelDef), so they never collide with a
+    # discovered repo's slug or an HF recipe key above.
+    for key, m in catalog.models.items():
+        if not m.local_path:
+            continue
+        st = await probe_disk(m.repo, m.mode, settings, local_path=m.local_path)
+        if not st.on_disk:
+            continue
+        menu[key] = _menu_entry_from_recipe(
+            m,
+            on_disk=True,
+            total_bytes=st.total_bytes,
+            per_host=[{"host": r.host, "size_bytes": r.size_bytes} for r in st.per_host if r.on_disk],
+        )
+
+    return menu
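A minimal sketch of how a discovered repo gets its menu key and launch mode, useful for checking the slug rule by hand. The slug and mode logic mirror `repo_to_key` and `_infer_mode` from the hunk above; the `SINGLE_SPARK_BYTES` value here is a placeholder assumption, since the real constant is defined outside this hunk.

```python
import re

# Assumption: placeholder threshold; the real SINGLE_SPARK_BYTES lives elsewhere.
SINGLE_SPARK_BYTES = 100 * 10**9


def repo_to_key(repo: str) -> str:
    """Slug a repo id: lowercase, each run of other characters becomes one '-'."""
    return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")


def infer_mode(total_bytes: int, on_host_count: int) -> str:
    """Cluster if the weights sit on both hosts or exceed the single-host budget."""
    if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
        return "cluster"
    return "solo"


if __name__ == "__main__":
    print(repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4"))
    print(infer_mode(20 * 10**9, 1))
    print(infer_mode(20 * 10**9, 2))
```

Because the menu, the setup form, and `_identify_current_model` all derive the key from the same function, a model that is loaded but has no saved recipe resolves to the same card key everywhere.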
@@ -10,6 +10,7 @@ model or one tied to an in-flight swap/download.
 """
 from __future__ import annotations
 import asyncio
+import json
 import re
 from dataclasses import dataclass
 from typing import Optional
@@ -36,6 +37,87 @@ def repo_to_cache_dirname(repo: str) -> str:
    return dn


+def cache_dirname_to_repo(dirname: str) -> Optional[str]:
+    """Inverse of `repo_to_cache_dirname`: 'models--org--name' -> 'org/name'.
+
+    A repo has exactly one '/', so the org is the first '--'-segment and the name
+    is everything after, rejoined with '--' (names may contain dashes). Returns
+    None for anything that isn't a model cache dir."""
+    if not dirname.startswith("models--"):
+        return None
+    parts = dirname[len("models--"):].split("--")
+    if len(parts) < 2 or not parts[0] or not parts[1]:
+        return None
+    return f"{parts[0]}/{'--'.join(parts[1:])}"
+
+
+def parse_cache_listing(out: str) -> list[tuple[str, int, bool]]:
+    """Parse the 'size|complete|dirname' lines from `list_cached_models`'s scan.
+
+    Returns [(repo, size_bytes, complete), ...], skipping non-model lines. Pure
+    function so the parsing is unit-testable without SSH."""
+    items: list[tuple[str, int, bool]] = []
+    for line in out.splitlines():
+        line = line.strip()
+        if line.count("|") < 2:
+            continue
+        size_s, complete_s, dirname = line.split("|", 2)
+        repo = cache_dirname_to_repo(dirname.strip())
+        if not repo:
+            continue
+        try:
+            size = int(size_s)
+        except ValueError:
+            size = 0
+        items.append((repo, size, complete_s.strip() == "1"))
+    return items
+
+
+async def list_cached_models(host: str, user: str, settings: Settings) -> list[tuple[str, int, bool]]:
+    """Enumerate every Hugging Face model cached on a host: (repo, size_bytes, complete).
+
+    'complete' = the cache has at least one snapshot carrying a config.json (a
+    finished download, not a half-fetched/corrupt dir). One SSH round-trip; the
+    glob's no-match case is handled by the `[ -d ]` guard."""
+    if not host or not user:
+        return []
+    cmd = (
+        'HUB="$HOME/.cache/huggingface/hub"; '
+        'for d in "$HUB"/models--*; do '
+        '[ -d "$d" ] || continue; '
+        'n=$(basename "$d"); '
+        'sz=$(du -sb "$d" 2>/dev/null | cut -f1); sz=${sz:-0}; '
+        'if ls "$d"/snapshots/*/config.json >/dev/null 2>&1; then c=1; else c=0; fi; '
+        'echo "${sz}|${c}|${n}"; '
+        'done'
+    )
+    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=30.0)
+    if rc != 0:
+        return []
+    return parse_cache_listing(out)
+
+
+async def read_model_config(host: str, user: str, repo: str, settings: Settings) -> Optional[dict]:
+    """Read a cached model's config.json (first snapshot) for launch inference.
+
+    Returns the parsed dict, or None if absent/unreadable. The dirname is
+    whitelisted (repo_to_cache_dirname), so it's safe to embed without escaping."""
+    if not host or not user:
+        return None
+    dn = repo_to_cache_dirname(repo)
+    cmd = (
+        f'D=$(ls -d "$HOME/.cache/huggingface/hub/{dn}/snapshots/"*/ 2>/dev/null | head -1); '
+        f'[ -n "$D" ] && cat "${{D}}config.json" 2>/dev/null'
+    )
+    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
+    if rc != 0 or not out.strip():
+        return None
+    try:
+        return json.loads(out)
+    except (ValueError, TypeError):
+        return None
+
+
@dataclass
 class HostDiskResult:
    host: str
@@ -159,10 +241,14 @@ async def delete_host(host: str, user: str, repo: str, settings: Settings) -> Ho
    return HostDiskResult(host=host, on_disk=False, size_bytes=freed)


-async def delete_from_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
-    """rm -rf the model's cache dir on the relevant Sparks. Idempotent."""
+async def delete_from_disk(repo: str, settings: Settings) -> DiskStatus:
+    """rm -rf the model's cache dir on ALL configured Sparks. Idempotent.
+
+    We sweep both Sparks regardless of the model's declared mode: a 'remove from
+    disk & menu' must leave nothing behind, and rm of an absent dir reports 0
+    bytes freed (FREED 0), so an extra host is harmless."""
    hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
-    if mode == "cluster" and settings.spark2_host:
+    if settings.spark2_host:
        hosts.append((settings.spark2_host, settings.spark2_user))

    results = await asyncio.gather(*(delete_host(h, u, repo, settings) for h, u in hosts))
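Since `parse_cache_listing` is described as a pure function that can be tested without SSH, a small standalone sketch of that check may help. The two helpers are re-stated from the hunk above so the block runs on its own; the sample listing is invented for illustration (only the first repo name appears in this change).

```python
from typing import Optional


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    """'models--org--name' -> 'org/name'; None for non-model cache dirs."""
    if not dirname.startswith("models--"):
        return None
    parts = dirname[len("models--"):].split("--")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    return f"{parts[0]}/{'--'.join(parts[1:])}"


def parse_cache_listing(out: str) -> list:
    """Parse 'size|complete|dirname' lines into (repo, size_bytes, complete)."""
    items = []
    for line in out.splitlines():
        line = line.strip()
        if line.count("|") < 2:
            continue
        size_s, complete_s, dirname = line.split("|", 2)
        repo = cache_dirname_to_repo(dirname.strip())
        if not repo:
            continue
        try:
            size = int(size_s)
        except ValueError:
            size = 0  # unreadable size: keep the entry, report zero bytes
        items.append((repo, size, complete_s.strip() == "1"))
    return items


# Hypothetical scan output: one finished model, one half-fetched model,
# one dataset dir (ignored), and one junk line (ignored).
SAMPLE = "\n".join([
    "1024|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4",
    "0|0|models--org--half--fetched",
    "4096|1|datasets--org--x",
    "garbage",
])

if __name__ == "__main__":
    for row in parse_cache_listing(SAMPLE):
        print(row)
```

Note that a name containing `--` survives the round trip because everything after the first segment is rejoined with `--`.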
@@ -23,6 +23,20 @@ from .ssh import ssh_stream, StreamHandle
 Mode = Literal["spark1", "spark2", "cluster"]


+def build_download_command(repo: str, flags: str = "") -> str:
+    """Remote shell command that drives hf-download.sh on a Spark.
+
+    Prepends ~/.local/bin to PATH. hf-download.sh shells out to `uvx` (Astral's
+    uv), and the official uv installer drops its binaries in ~/.local/bin — but
+    our SSH session is non-interactive, so it never sources the user's profile
+    and ~/.local/bin is off PATH, leaving `uvx` as "command not found". $HOME
+    expands server-side, so this stays correct for any adopter/user. `repo` is
+    shlex-quoted at the sink (validate_repo gates the charset upstream).
+    """
+    download = f"./hf-download.sh {quote_arg(repo)} {flags}".strip()
+    return f'export PATH="$HOME/.local/bin:$PATH" && cd ~/spark-vllm-docker && {download}'
+
+
 _TQDM_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*%\s*\|.*?\|\s*"
    r"([\d.]+[KMG]?B?)\s*/\s*([\d.]+[KMG]?B?)\s*"
@@ -126,7 +140,7 @@ class DownloadManager:
        if not target_host or not target_user:
            raise RuntimeError(f"{job.mode} host not configured")

-        cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {quote_arg(job.repo)} {flags}".strip()
+        cmd = build_download_command(job.repo, flags)
        job.append(f"$ {cmd}")
        job.state = "downloading"
        job.progress.phase = "Connecting to Hugging Face…"
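A hedged sketch of what `build_download_command` produces, for checking the PATH fix in isolation. It assumes the project's `quote_arg` behaves like `shlex.quote` (an assumption; the helper's definition is not part of this diff), and it only exercises the default empty `flags`.

```python
import shlex


def build_download_command(repo: str, flags: str = "") -> str:
    """Remote shell command for hf-download.sh with ~/.local/bin on PATH.

    Assumption: shlex.quote stands in for the project's quote_arg helper.
    """
    download = f"./hf-download.sh {shlex.quote(repo)} {flags}".strip()
    # $HOME is left unexpanded here so the remote shell resolves it.
    return f'export PATH="$HOME/.local/bin:$PATH" && cd ~/spark-vllm-docker && {download}'


if __name__ == "__main__":
    print(build_download_command("org/name"))
    # A hostile repo string stays a single quoted argument.
    print(build_download_command("x; rm -rf ~"))
```

The `export` comes first in the `&&` chain, so `uvx` is resolvable by the time `hf-download.sh` runs, regardless of whether the non-interactive SSH session sourced the user's profile.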
@@ -1,6 +1,7 @@
 from __future__ import annotations
 import asyncio
 import json
+import os
 from pathlib import Path

 from fastapi import FastAPI, HTTPException, Query, Request
@@ -9,13 +10,15 @@ from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel, ValidationError
 from typing import Literal

+from . import app_settings
 from .config import Settings
 from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
 from .coordination import LockHeld, ScheduleRegistry, SwapLockManager, WebhookNotifier, valid_schedule_id
 from .custom_services import add_custom_service, delete_custom_service
 from .audio_proxy import build_router as build_audio_router
 from .deep_health import DeepHealth
-from .disk import delete_from_disk, probe_disk
+from .discovery import build_menu, infer_recipe, repo_to_key
+from .disk import delete_from_disk, probe_host, read_model_config
 from .download import DownloadManager
 from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
@@ -25,7 +28,7 @@ from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings,
 from .matrix_bridge import MatrixBridgeManager
 from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
-from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
+from .overrides import add_custom, delete_custom, load_overrides, set_knobs
 from .services import docker_state, run_action, services_from_settings
 from .shellsafe import validate_container, validate_image, validate_repo
 from .speech_models import SpeechModelsManager
@@ -36,6 +39,10 @@ from .validate import validate_launch
 from .wol import send_local_broadcast, send_via_peer


+# One-time migration: seed the in-app settings overlay from env (values set via
+# the StartOS action on a pre-gear install) before building Settings, so nothing
+# is lost on upgrade. No-op once the overlay exists. See app_settings.
+app_settings.seed_from_env(os.environ)
 settings = Settings.from_env()
 catalog = load_catalog(settings.models_yaml)
 # Coordination layer (GPU arbiter): swap-lifecycle webhook, the swap reservation
@@ -155,26 +162,100 @@ async def get_config() -> dict:
    }


+# ---- In-app settings ('gear') ----
+# The optional cluster knobs (ports, container names, support-service hosts,
+# integrations) live in an app-owned overlay on /data, edited here instead of in
+# the StartOS action — which keeps to just the four required setup fields. See
+# app_settings. Writes apply live: we rewrite the overlay then reload the shared
+# Settings instance in place, so every router/manager holding the reference picks
+# up the change with no container restart.
+@app.get("/api/settings")
+async def get_settings() -> dict:
+    return app_settings.public_view()
+
+
+class SettingsUpdate(BaseModel):
+    values: dict[str, str]
+
+
+@app.post("/api/settings")
+async def post_settings(req: SettingsUpdate) -> dict:
+    try:
+        app_settings.apply(req.values)
+    except app_settings.SettingsError as e:
+        raise HTTPException(422, str(e))
+    settings.reload()
+    # WebhookNotifier snapshots url/secret (not the Settings object), so reload()
+    # can't reach it — re-point it explicitly so a webhook edit applies live too.
+    swap_webhook.update(settings.swap_webhook_url, settings.swap_webhook_secret)
+    return app_settings.public_view()
+
+
 def _reload_catalog() -> None:
    global catalog
    catalog = load_catalog(settings.models_yaml)
    swap_manager.reload_catalog(catalog)


+def _recipe_summaries() -> list[dict]:
+    """Known launch recipes (bundled + saved), for the download panel's autocomplete.
+
+    These are NOT the menu — the menu is what's on disk. This is just the set of
+    repos Spark Control already knows how to launch, so the download box can
+    suggest them by name without putting phantom cards on the dashboard."""
+    out = []
+    for m in catalog.models.values():
+        if m.repo:
+            out.append({"repo": m.repo, "display_name": m.display_name, "mode": m.mode})
+    return out
+
+
@app.get("/api/models")
 async def get_models() -> dict:
-    out_models: dict[str, dict] = {}
-    for key, m in catalog.models.items():
-        d = m.model_dump()
-        # Always include effective knobs for the UI (defaults from base args + any overrides)
-        d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
-        out_models[key] = d
+    """The model menu = what's actually downloaded on the Sparks (one scan per
+    Spark), each annotated with its launch recipe or flagged `needs_setup`.
+
+    Does SSH, so it's the slower of the model endpoints; the front-end calls it on
+    load, after a swap/download/delete, and on a slow timer — not every poll."""
+    if not settings.configured:
+        return {"configured": False, "defaults": catalog.defaults.model_dump(), "models": {}, "recipes": []}
+    menu = await build_menu(settings, catalog)
    return {
+        "configured": True,
        "defaults": catalog.defaults.model_dump(),
-        "models": out_models,
+        "models": menu,
+        "recipes": _recipe_summaries(),
    }


+@app.get("/api/models/suggest")
+async def suggest_model(repo: str = Query(...)) -> dict:
+    """Read a downloaded model's config.json + size and propose a launch recipe.
+
+    Prefills the 'set up this model' form for an on-disk model that has no recipe
+    yet. The operator confirms/edits, then POSTs it to /api/models to save."""
+    if not settings.configured:
+        raise HTTPException(503, "spark1 not configured")
+    try:
+        validate_repo(repo)
+    except ValueError as e:
+        raise HTTPException(400, str(e))
+    hosts = [(settings.spark1_host, settings.spark1_user)]
+    if settings.spark2_host:
+        hosts.append((settings.spark2_host, settings.spark2_user))
+    # Config from whichever Spark has it; size summed across the Sparks that do.
+    sizes = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
+    total = sum(r.size_bytes for r in sizes if r.on_disk)
+    on_hosts = sum(1 for r in sizes if r.on_disk)
+    config = None
+    for (h, u), r in zip(hosts, sizes):
+        if r.on_disk:
+            config = await read_model_config(h, u, repo, settings)
+            if config is not None:
+                break
+    return infer_recipe(repo, config or {}, total, on_hosts)
+
+
 class KnobsBody(BaseModel):
    knobs: dict

@@ -238,71 +319,43 @@ async def del_model(key: str) -> dict:
    return {"ok": True, "key": key}


-@app.get("/api/models/disk-status")
-async def get_models_disk_status() -> dict:
-    """Probe each catalog model's HF cache on the appropriate Spark(s) in parallel.
-
-    Result is keyed by model key: {on_disk, total_bytes, per_host:[{host,on_disk,size_bytes,error?}]}.
-    Designed to be called once on dashboard load; takes ~1–3s depending on Spark count.
-    """
-    if not settings.configured:
-        return {"configured": False, "models": {}}
-    keys = list(catalog.models.keys())
-    statuses = await asyncio.gather(*(
-        probe_disk(
-            catalog.models[k].repo,
-            catalog.models[k].mode,
-            settings,
-            local_path=catalog.models[k].local_path,
-        )
-        for k in keys
-    ), return_exceptions=True)
-    out: dict[str, dict] = {}
-    for k, s in zip(keys, statuses):
-        if isinstance(s, Exception):
-            out[k] = {"on_disk": False, "total_bytes": 0, "per_host": [], "error": str(s)}
-            continue
-        out[k] = {
-            "on_disk": s.on_disk,
-            "total_bytes": s.total_bytes,
-            "per_host": [
-                {"host": r.host, "on_disk": r.on_disk, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
-                for r in s.per_host
-            ],
-        }
-    return {"configured": True, "models": out}
-
-
@app.delete("/api/models/{key}/disk")
 async def del_model_disk(key: str) -> dict:
-    """Delete a model's weights from the Spark filesystem(s). The catalog entry stays.
+    """Remove a model's weights from the Sparks — and thus from the menu, since the
+    menu IS the disk. Resolves the key against the live menu, so a discovered
+    model (no saved recipe) is deletable too.

    Safety rails:
+      - Refuses a local/fine-tuned directory (hand-placed, not re-downloadable).
      - Refuses if the model is currently loaded on vLLM.
-      - Refuses if a swap or download is in flight.
-      - Idempotent: if the cache dir is already gone on a host, that host reports 0 bytes freed.
+      - Refuses if a swap or this model's own download is in flight.
+      - Idempotent across both Sparks: an already-absent cache dir frees 0 bytes.
    """
-    if key not in catalog.models:
+    if not settings.configured:
+        raise HTTPException(503, "spark1 not configured")
+    menu = await build_menu(settings, catalog)
+    entry = menu.get(key)
+    if entry is None:
        raise HTTPException(404, f"unknown model: {key}")
-    m = catalog.models[key]

    # Never rm a local fine-tune directory from the dashboard — it's irreplaceable
    # training output the user placed by hand, not a re-downloadable HF cache.
-    if m.local_path:
+    if entry.get("local_path"):
        raise HTTPException(
            400,
            "this is a local model; its directory must be managed on the Spark, not deleted from here",
        )
+    repo = entry["repo"]

    # Refuse if currently loaded
    try:
        vllm = await check_vllm(settings)
    except Exception:
        vllm = {}
-    if vllm.get("ok") and vllm.get("current_model") == m.repo:
+    if vllm.get("ok") and vllm.get("current_model") == repo:
        raise HTTPException(
            409,
-            f"'{m.display_name}' is the currently loaded model. Switch to a different model first, then try again."
+            f"'{entry['display_name']}' is the currently loaded model. Switch to a different model first, then try again."
        )

    # Refuse if a swap is in flight
@@ -312,10 +365,10 @@ async def del_model_disk(key: str) -> dict:
    # Refuse if a download is in flight for this same repo (a different model's download is fine)
    if download_manager.current_job_id:
        job = download_manager.get(download_manager.current_job_id)
-        if job and job.repo == m.repo:
+        if job and job.repo == repo:
            raise HTTPException(409, "this model is currently downloading; cancel or wait for it to finish")

-    status = await delete_from_disk(m.repo, m.mode, settings)
+    status = await delete_from_disk(repo, settings)
    # Audit log
    record_report(
        f"disk:{key}",
@@ -326,7 +379,7 @@ async def del_model_disk(key: str) -> dict:
    return {
        "ok": True,
        "key": key,
-        "repo": m.repo,
+        "repo": repo,
        "bytes_freed": status.total_bytes,
        "per_host": [
            {"host": r.host, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
@@ -881,10 +934,13 @@ async def get_status() -> dict:
 def _identify_current_model(repo: str | None) -> str | None:
    if not repo:
        return None
+    # A recipe-backed model keys by its recipe key; a discovered model (loaded but
+    # not yet set up) keys by the same slug build_menu uses, so it still
+    # highlights as the active card.
    for key, m in catalog.models.items():
        if m.repo == repo:
            return key
-    return None
+    return repo_to_key(repo)


 class SwapRequest(BaseModel):
@@ -926,6 +982,56 @@ async def post_swap(req: SwapRequest, request: Request) -> dict:
    return {"job_id": job.id, "model_key": job.model_key, "state": job.state}


+# ---- Swap reservation lock (the GPU arbiter) ----
+# ROUTE ORDER IS LOAD-BEARING: these static `/api/swap/lock` routes MUST be
+# registered before the parametric `/api/swap/{job_id}` below. FastAPI matches in
+# registration order, so if `{job_id}` came first, GET /api/swap/lock would bind
+# job_id="lock", look up a (non-existent) swap job, and 404 — which is exactly
+# the bug this ordering fixes. Keep these above the {job_id} routes.
+# CSRF: these are control-surface, not browser-exempt — an external scheduler is
+# a non-browser client (no Origin header) so it passes the guard already, the
+# same way it calls /api/swap; the dashboard is same-origin.
+class LockAcquireRequest(BaseModel):
+    holder: str
+    ttl_seconds: int | None = None
+    note: str = ""
+    token: str | None = None   # present only to extend an existing hold
+
+
+@app.post("/api/swap/lock")
+async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
+    """Reserve the GPU swap path. Returns a secret token used to swap (header
+    X-Swap-Lock-Token) and to release. 409 if held by another holder."""
+    try:
+        lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
+    except ValueError as e:
+        raise HTTPException(422, str(e))
+    except LockHeld as e:
+        raise HTTPException(status_code=409, detail={
+            "error": "swap lock is held by another holder",
+            "lock": e.state,
+        })
+    return {**swap_lock.status(), "token": lock.token}
+
+
+@app.get("/api/swap/lock")
+async def get_swap_lock() -> dict:
+    """Public, token-free view of the reservation: held? who? until when?"""
+    return swap_lock.status()
+
+
+@app.delete("/api/swap/lock")
+async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
+    """Release the reservation. Needs the matching X-Swap-Lock-Token unless
+    ?force=true (the human override from the dashboard)."""
+    token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
+    try:
+        released = swap_lock.release(token, force=force)
+    except PermissionError as e:
+        raise HTTPException(403, str(e))
+    return {"released": released, **swap_lock.status()}
+
+
@app.get("/api/swap/{job_id}")
 async def get_swap(job_id: str) -> dict:
    job = swap_manager.get(job_id)
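The route-order comment above can be illustrated without FastAPI. The sketch below is a toy first-match router (a hypothetical `Router` class, not FastAPI's or Starlette's implementation) that shows why a static path registered after a parametric one gets shadowed.

```python
import re
from typing import Optional


class Router:
    """Toy router: routes are tried in registration order; first match wins."""

    def __init__(self) -> None:
        self._routes = []  # list of (compiled regex, route name)

    def add(self, template: str, name: str) -> None:
        # Turn '/api/swap/{job_id}' into a regex with one named group per parameter.
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
        self._routes.append((re.compile(f"^{pattern}$"), name))

    def resolve(self, path: str) -> Optional[str]:
        for regex, name in self._routes:
            if regex.match(path):
                return name
        return None


def make_router(static_first: bool) -> Router:
    """Register the lock route and the job route in the requested order."""
    router = Router()
    routes = [("/api/swap/lock", "lock"), ("/api/swap/{job_id}", "job")]
    for template, name in routes if static_first else reversed(routes):
        router.add(template, name)
    return router


if __name__ == "__main__":
    print(make_router(static_first=True).resolve("/api/swap/lock"))
    print(make_router(static_first=False).resolve("/api/swap/lock"))
```

With the static route first, `/api/swap/lock` reaches the lock handler; with the order reversed, the parametric route captures `lock` as a job id, which is the failure mode the comment describes.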
@@ -971,52 +1077,10 @@ async def stream_swap(job_id: str):
    return StreamingResponse(gen(), media_type="text/event-stream")


-# ---- Coordination layer: swap lock + schedule registry ----
-# Endpoints are control-surface, not browser-exempt: an external scheduler is a
-# non-browser client (no Origin header) so it passes the CSRF guard already, the
-# same way it calls /api/swap today; the dashboard is same-origin.
-
-class LockAcquireRequest(BaseModel):
-    holder: str
-    ttl_seconds: int | None = None
-    note: str = ""
-    token: str | None = None   # present only to extend an existing hold
-
-
-@app.post("/api/swap/lock")
-async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
-    """Reserve the GPU swap path. Returns a secret token used to swap (header
-    X-Swap-Lock-Token) and to release. 409 if held by another holder."""
-    try:
-        lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
-    except ValueError as e:
-        raise HTTPException(422, str(e))
-    except LockHeld as e:
-        raise HTTPException(status_code=409, detail={
-            "error": "swap lock is held by another holder",
-            "lock": e.state,
-        })
-    return {**swap_lock.status(), "token": lock.token}
-
-
-@app.get("/api/swap/lock")
-async def get_swap_lock() -> dict:
-    """Public, token-free view of the reservation: held? who? until when?"""
-    return swap_lock.status()
-
-
-@app.delete("/api/swap/lock")
-async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
-    """Release the reservation. Needs the matching X-Swap-Lock-Token unless
-    ?force=true (the human override from the dashboard)."""
-    token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
-    try:
-        released = swap_lock.release(token, force=force)
-    except PermissionError as e:
-        raise HTTPException(403, str(e))
-    return {"released": released, **swap_lock.status()}
-
-
+# ---- Coordination layer: read-only schedule registry ----
+# (The swap reservation lock lives above, next to the swap routes.) Same CSRF
+# posture: control-surface, not browser-exempt — external schedulers send no
+# Origin header so they pass the guard; the dashboard is same-origin.
 class ScheduleRequest(BaseModel):
    name: str
    id: str | None = None
@@ -19,8 +19,8 @@ const state = {
  configured: true,
  timer_handle: null,
  deep_health: {},
-  disk_status: {},         // keyed by model key: { on_disk, total_bytes, per_host }
-  disk_status_loaded: false,
+  models_loaded: false,    // true once the first disk scan (/api/models) returns
+  recipes: [],             // known launch recipes (for the download autocomplete)
  lock: { held: false },   // GPU swap reservation (coordination layer)
  schedules: [],           // schedules external automation has registered
 };
@@ -65,67 +65,69 @@ function renderCards() {
  const lockTip = locked
    ? `Reserved by ${state.lock.holder || 'automation'}${state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : ''}`
    : '';
-  for (const key of Object.keys(state.models)) {
+  const keys = Object.keys(state.models);
+  if (keys.length === 0) {
+    // The menu is the disk: nothing downloaded (or the scan hasn't returned yet).
+    root.innerHTML = state.models_loaded
+      ? `<div class="empty-menu muted">No models downloaded on the Sparks yet. Use <strong>+ Download a new model</strong> above to fetch one — it'll appear here when it's done.</div>`
+      : `<div class="empty-menu muted">Scanning the Sparks for downloaded models…</div>`;
+    return;
+  }
+  for (const key of keys) {
    const m = state.models[key];
    const isActive = key === state.current_model_key;
    const card = document.createElement('div');
-    card.className = 'card' + (isActive ? ' active' : '');
+    card.className = 'card' + (isActive ? ' active' : '') + (m.needs_setup ? ' needs-setup' : '');
    const desc = m.description
      ? `<div class="desc">${escapeHtml(m.description)}</div>`
      : '';
    const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
    const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
-    // Disk-presence pill + trash button. Until /api/models/disk-status comes back,
-    // we don't know — render a neutral placeholder.
-    const disk = state.disk_status[key];
-    let diskPill = '';
-    if (state.disk_status_loaded) {
-      if (disk && disk.on_disk) {
-        const gb = (disk.total_bytes / 1e9);
-        diskPill = `<span class="tag on-disk" title="Weights present on disk">on disk · ${gb.toFixed(1)} GB</span>`;
-      } else {
-        diskPill = `<span class="tag not-on-disk" title="Weights not downloaded">not downloaded</span>`;
-      }
-    }
-    // Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
+    // Every card on the menu is on disk by definition — show its real size.
+    const gb = (m.total_bytes || 0) / 1e9;
+    const diskPill = gb > 0
+      ? `<span class="tag on-disk" title="Weights present on the Spark(s)">on disk · ${gb.toFixed(1)} GB</span>`
+      : '';
+    const setupPill = m.needs_setup
+      ? `<span class="tag setup-pill" title="On disk, but Spark Control hasn't been told how to launch it">needs setup</span>`
+      : '';
+    // Trash = remove weights from disk AND from the menu. Disabled if active / mid-swap.
    // Never offered for local models: their directory is hand-placed training output,
    // not a re-downloadable HF cache (the server refuses the delete too).
    let trashBtn = '';
-    if (state.disk_status_loaded && disk && disk.on_disk && !m.local_path) {
+    if (!m.local_path) {
      const disabled = isActive || isSwapping;
      const tip = isActive
        ? 'Currently loaded — switch to another model first'
        : isSwapping
        ? 'A swap is in progress'
-        : 'Delete weights from disk';
-      trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Delete from disk" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
+        : 'Remove weights from disk & menu';
+      trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Remove from disk and menu" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
    }
-    // Primary card action: "Switch to this" (green) when on disk; "Download" (blue) when not.
-    // Before disk-status loads we render the swap button as a sensible default.
-    const isOnDisk = !state.disk_status_loaded || (disk && disk.on_disk);
-    const dlInFlight = !!(typeof dlState !== 'undefined' && dlState && dlState.job_id);
+    // Primary action: "Current" / "Switch to this", or "Set up & switch" for a
+    // model on disk that has no launch recipe yet.
+    const swapBlocked = isSwapping || locked;
+    const lockTipAttr = locked ? ` title="${escapeHtml(lockTip)}"` : '';
    let primaryBtn = '';
    if (isActive) {
      primaryBtn = `<button class="btn" disabled>Current</button>`;
-    } else if (isOnDisk) {
-      const swapBlocked = isSwapping || locked;
-      const tip = locked ? ` title="${escapeHtml(lockTip)}"` : '';
-      primaryBtn = `<button class="btn primary" data-swap-key="${key}"${tip} ${swapBlocked ? 'disabled' : ''}>Switch to this</button>`;
-    } else if (m.local_path) {
-      // A local model can't be "downloaded" — its directory has to exist on the Spark.
-      primaryBtn = `<button class="btn" disabled title="Directory not found on the Spark — create it there, then refresh">Not found on Spark</button>`;
+    } else if (m.needs_setup) {
+      primaryBtn = `<button class="btn primary" data-setup-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Set up &amp; switch</button>`;
    } else {
-      const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
-      primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
+      primaryBtn = `<button class="btn primary" data-swap-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Switch to this</button>`;
    }
+    // The Test/Advanced controls need a saved recipe; hide them until setup is done.
+    const recipeActions = m.needs_setup ? '' : `
+        <button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
+        <button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>`;
    card.innerHTML = `
      <div class="name">${escapeHtml(m.display_name)}</div>
      <div class="meta">
        <span class="tag mode-${m.mode}">${m.mode}</span>
-        <span class="tag">${m.size_gb} GB</span>
+        ${diskPill}
+        ${setupPill}
        ${customPill}
        ${localPill}
-        ${diskPill}
        ${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
      </div>
      ${desc}
@@ -136,9 +138,7 @@ function renderCards() {
      </div>
      <div class="spacer"></div>
      <div class="card-actions">
-        ${primaryBtn}
-        <button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
-        <button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>
+        ${primaryBtn}${recipeActions}
        ${trashBtn}
      </div>
      <div class="test-result hidden" data-test-result-for="${key}"></div>
@@ -148,8 +148,8 @@ function renderCards() {
  for (const btn of root.querySelectorAll('[data-swap-key]')) {
    btn.addEventListener('click', () => triggerSwap(btn.dataset.swapKey));
  }
-  for (const btn of root.querySelectorAll('[data-download-key]')) {
-    btn.addEventListener('click', () => triggerDownloadForKey(btn.dataset.downloadKey));
+  for (const btn of root.querySelectorAll('[data-setup-key]')) {
+    btn.addEventListener('click', () => openSetupForKey(btn.dataset.setupKey));
  }
  for (const btn of root.querySelectorAll('[data-adv-key]')) {
    btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey));
@@ -1170,24 +1170,44 @@ async function pollStatus() {
  }
 }

+let menuLoadInFlight = false;
+
 async function loadModels() {
+  // The menu is whatever is downloaded on the Sparks. /api/models runs that
+  // scan over SSH, so this call can be slow. Best-effort: a transient failure
+  // leaves the previous menu in place rather than blanking the dashboard.
+  // Guard against overlap: init() fires this un-awaited, and pollStatus()'s
+  // empty-menu fallback may call it again before the scan returns.
+  if (menuLoadInFlight) return;
+  menuLoadInFlight = true;
+  try {
    const data = await fetchJSON('/api/models');
    state.defaults = data.defaults || {};
    state.models = data.models || {};
+    state.recipes = data.recipes || [];
+    state.models_loaded = true;
+    populateDownloadSuggestions();
+    renderCards();
+  } catch (e) {
+    console.warn('model menu load failed:', e.message);
+  } finally {
+    menuLoadInFlight = false;
+  }
 }
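
Reviewer sketch of the overlap guard used by `loadModels()` above, pulled out as a reusable wrapper. `singleFlight` and `fakeScan` are hypothetical names, not part of this commit, and the sketch lets errors propagate where the committed code catches and logs them.

```javascript
// Hypothetical helper: wrap an async function so a call made while a previous
// call is still running is dropped instead of starting a second run.
function singleFlight(fn) {
  let inFlight = false;
  return async function guarded(...args) {
    if (inFlight) return undefined; // a run is already in progress; skip
    inFlight = true;
    try {
      return await fn(...args);
    } finally {
      inFlight = false; // always release, even when fn rejects
    }
  };
}

// Stand-in for the slow SSH-backed scan (assumption: any promise-returning call).
let scans = 0;
const fakeScan = () => new Promise((resolve) => {
  scans += 1;
  setTimeout(() => resolve({ models: {} }), 20);
});

const loadMenu = singleFlight(fakeScan);
```

Calling `loadMenu()` twice back to back starts one scan; the second call resolves to `undefined` immediately. One consequence worth keeping in mind: a caller that needs fresh data, such as the post-delete refresh, gets nothing new if a periodic scan happens to be running already.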

-async function loadDiskStatus() {
-  // Probes each catalog model's HF cache over SSH; takes a beat. Best-effort.
-  try {
-    const r = await fetchJSON('/api/models/disk-status');
-    if (r && r.models) {
-      state.disk_status = r.models;
-      state.disk_status_loaded = true;
-      renderCards();
-    }
-  } catch (e) {
-    // Silent — pills just won't render. Don't block dashboard.
-    console.warn('disk-status probe failed:', e.message);
+// Populate the download box's autocomplete with known recipes not currently on
+// disk — so common/bundled models stay discoverable without phantom menu cards.
+function populateDownloadSuggestions() {
+  const dl = el('#dl-suggestions');
+  if (!dl) return;
+  const onDiskRepos = new Set(Object.values(state.models).map(m => m.repo).filter(Boolean));
+  dl.innerHTML = '';
+  for (const r of state.recipes || []) {
+    if (onDiskRepos.has(r.repo)) continue;
+    const opt = document.createElement('option');
+    opt.value = r.repo;
+    opt.label = `${r.display_name} (${r.mode})`;
+    dl.appendChild(opt);
  }
 }
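
Reviewer sketch of the filtering step in `populateDownloadSuggestions()`, written as a pure function so it can be checked without a DOM. `pendingSuggestions` and the sample repos are hypothetical; the field names (`repo`, `display_name`, `mode`) are the ones the diff reads.

```javascript
// Hypothetical pure version of the suggestion filter: return the known recipes
// whose repo is not already present on disk, shaped like <option> value/label.
function pendingSuggestions(models, recipes) {
  const onDiskRepos = new Set(
    Object.values(models || {}).map((m) => m.repo).filter(Boolean)
  );
  return (recipes || [])
    .filter((r) => !onDiskRepos.has(r.repo))
    .map((r) => ({ value: r.repo, label: `${r.display_name} (${r.mode})` }));
}

// Sample data (made up for illustration).
const sampleModels = {
  'model-a': { repo: 'example-org/model-a' },
  'local-ft': { local_path: '/models/ft' }, // local model: no repo, ignored by the filter
};
const sampleRecipes = [
  { repo: 'example-org/model-a', display_name: 'Model A', mode: 'solo' },
  { repo: 'example-org/model-b', display_name: 'Model B', mode: 'cluster' },
];

const suggestions = pendingSuggestions(sampleModels, sampleRecipes);
console.log(suggestions);
```

With the sample data only `example-org/model-b` remains, because `model-a` is already on disk and the local model contributes no repo to the set.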

@@ -1201,14 +1221,12 @@ function fmtBytesShort(n) {

 function openDiskDeleteDialog(key) {
  const m = state.models[key];
-  const disk = state.disk_status[key];
-  if (!m || !disk || !disk.on_disk) return;
+  if (!m || !m.on_disk) return;
  const dlg = el('#disk-delete-dialog');
-  el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(disk.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from disk.`;
+  el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(m.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from the Sparks. This also takes it off the menu.`;
  const hostsEl = el('#dd-hosts');
  hostsEl.innerHTML = '';
-  for (const h of (disk.per_host || [])) {
-    if (!h.on_disk) continue;
+  for (const h of (m.per_host || [])) {
    const li = document.createElement('li');
    li.innerHTML = `<code>${escapeHtml(h.host)}</code> — ${fmtBytesShort(h.size_bytes)}`;
    hostsEl.appendChild(li);
@@ -1227,20 +1245,19 @@ function openDiskDeleteDialog(key) {
    try {
      const r = await fetchJSON(`/api/models/${encodeURIComponent(key)}/disk`, { method: 'DELETE' });
      dlg.close();
-      // Optimistically clear local disk state for this key, then refresh.
-      delete state.disk_status[key];
+      // Optimistically drop the card, then re-scan the menu (it's gone from disk).
+      delete state.models[key];
      renderCards();
-      // Eagerly re-probe so size is accurate (and shows "not downloaded" pill).
-      loadDiskStatus();
+      await loadModels();
      const freed = r && typeof r.bytes_freed === 'number' ? fmtBytesShort(r.bytes_freed) : '';
-      console.log(`Deleted ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
+      console.log(`Removed ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
    } catch (e) {
      errEl.textContent = e.message || 'Delete failed';
      errEl.classList.remove('hidden');
    } finally {
      confirm.disabled = false;
      cancel.disabled = false;
-      confirm.textContent = 'Delete from disk';
+      confirm.textContent = 'Remove from disk & menu';
    }
  };
  cancel.onclick = onCancel;
@@ -1341,38 +1358,6 @@ async function releaseLock() {
  pollCoordination();
 }

-async function triggerDownloadForKey(modelKey) {
-  const m = state.models[modelKey];
-  if (!m) return;
-  if (dlState.job_id) {
-    alert('A download is already in progress; wait for it to finish.');
-    return;
-  }
-  // Pick the download target from the model's mode:
-  //   solo    -> spark1 only
-  //   cluster -> both Sparks (fetch on Spark 1, rsync to Spark 2 in parallel)
-  const dlMode = m.mode === 'cluster' ? 'cluster' : 'spark1';
-  const sizeNote = m.size_gb ? ` (~${m.size_gb} GB)` : '';
-  const target = m.mode === 'cluster' ? 'both Sparks' : 'Spark 1';
-  if (!confirm(`Download "${m.display_name}"${sizeNote} to ${target}? Large models can take a while; you can watch progress in the download panel.`)) {
-    return;
-  }
-  dlState.last_repo = m.repo;
-  dlState.last_mode = dlMode;
-  try {
-    const r = await fetchJSON('/api/download', {
-      method: 'POST',
-      headers: { 'content-type': 'application/json' },
-      body: JSON.stringify({ repo: m.repo, mode: dlMode }),
-    });
-    // Open the download panel + attach to progress stream
-    openDownloadForm();
-    attachToDownload(r.job_id);
-  } catch (e) {
-    alert('Failed to start download: ' + e.message);
-  }
-}
-
 async function attachToSwap(jobId, needsBackfill) {
  if (state.swap_eventsource) {
    state.swap_eventsource.close();
@@ -1603,12 +1588,14 @@ function handleDownloadDone(d) {
    el('#dl-title').textContent = 'Done';
    el('#dl-phase').textContent = 'Done ✓';
    el('#dl-progress-fill').style.width = '100%';
-    // Offer to add to catalog
+    // The new model now appears on the menu (the menu is the disk). If it matched
+    // a known recipe it's ready to switch to; if not, offer to set it up.
    const repo = dlState.last_repo;
-    const mode = dlState.last_mode;
-    if (repo) {
-      setTimeout(() => openCatalogDialog(repo, mode), 600);
-    }
+    loadModels().then(() => {
+      if (!repo) return;
+      const entry = Object.values(state.models).find(m => m.repo === repo);
+      if (entry && entry.needs_setup) setTimeout(() => openSetupDialog(repo, { thenSwap: false }), 600);
+    });
  }
  dlState.job_id = null;
 }
@@ -1721,21 +1708,67 @@ function openAdvanced(key) {
  dlg.showModal();
 }

-function openCatalogDialog(repo, mode) {
+// Context carried from openSetupDialog -> the submit handler: the inferred
+// launch flags (parsers/MoE backend) and whether to swap right after saving.
+let setupCtx = { key: '', repo: '', vllm_args: [], thenSwap: false };
+
+// "Set up & switch" on a needs-setup card.
+async function openSetupForKey(key) {
+  const m = state.models[key];
+  if (!m) return;
+  if (state.lock && state.lock.held) {
+    const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
+    alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
+    return;
+  }
+  await openSetupDialog(m.repo, { thenSwap: true });
+}
+
+// Open the "set up this model" dialog, prefilled from inference (config.json +
+// size). The operator confirms once; on save the recipe persists and (if
+// thenSwap) we switch to it.
+async function openSetupDialog(repo, opts = {}) {
  const dlg = el('#catalog-dialog');
-  const key = repo.split('/').pop().toLowerCase().replace(/[^a-z0-9_-]/g, '-');
-  el('#cd-key').value = key;
-  el('#cd-name').value = repo.split('/').pop();
+  let sug = null;
+  try {
+    sug = await fetchJSON(`/api/models/suggest?repo=${encodeURIComponent(repo)}`);
+  } catch (e) {
+    console.warn('recipe suggestion failed:', e.message);
+  }
+  const fallbackKey = repo.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
+  setupCtx = {
+    key: (sug && sug.key) || fallbackKey,
+    repo,
+    vllm_args: (sug && sug.vllm_args) || [],
+    thenSwap: !!opts.thenSwap,
+  };
+  el('#cd-key').value = setupCtx.key;
+  el('#cd-name').value = (sug && sug.display_name) || repo.split('/').pop();
  el('#cd-repo').value = repo;
  el('#cd-size').value = '';
-  el('#cd-mode').value = mode || 'solo';
+  el('#cd-mode').value = (sug && sug.mode) || 'solo';
  el('#cd-desc').value = '';
-  el('#cd-mml').value = 32768;
-  el('#cd-gmu').value = 0.85;
-  el('#cd-gmu-out').value = '0.85';
-  el('#cd-fst').checked = true;
-  el('#cd-pcache').checked = true;
-  el('#cd-fp8').checked = true;
+  const knobs = (sug && sug.knobs) || {};
+  el('#cd-mml').value = knobs.max_model_len || 32768;
+  el('#cd-gmu').value = knobs.gpu_memory_utilization || 0.85;
+  el('#cd-gmu-out').value = parseFloat(el('#cd-gmu').value).toFixed(2);
+  el('#cd-fst').checked = knobs.fastsafetensors !== false;
+  el('#cd-pcache').checked = knobs.prefix_caching !== false;
+  el('#cd-fp8').checked = (knobs.kv_cache_dtype || 'fp8') === 'fp8';
+
+  const det = el('#cd-detected');
+  if (det) {
+    if (sug) {
+      const caps = (sug.capabilities || []).join(', ');
+      const flags = setupCtx.vllm_args.length ? `: <code>${escapeHtml(setupCtx.vllm_args.join(' '))}</code>` : '';
+      det.innerHTML = `Detected <strong>${escapeHtml(sug.family || 'Generic')}</strong>${caps ? ` · ${escapeHtml(caps)}` : ''}. Launch flags set automatically${flags}.`;
+    } else {
+      det.textContent = "Couldn't auto-detect this model's settings — pick mode and knobs manually.";
+    }
+    det.classList.remove('hidden');
+  }
+  const submit = el('#cd-submit');
+  if (submit) submit.textContent = setupCtx.thenSwap ? 'Save & switch' : 'Save settings';
  dlg.showModal();
 }
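
Reviewer sketch of the fallback key derivation in `openSetupDialog()`, used when `/api/models/suggest` fails or returns no `key`. The expression is copied from the diff; the wrapper name `fallbackKeyFor` is hypothetical.

```javascript
// Same expression as the diff's fallbackKey: lowercase the repo, collapse every
// run of characters outside [a-z0-9_-] into one hyphen, then trim hyphens from
// both ends.
function fallbackKeyFor(repo) {
  return repo
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

console.log(fallbackKeyFor('RedHatAI/Qwen3.6-35B-A3B-NVFP4'));
// redhatai-qwen3-6-35b-a3b-nvfp4
```

Unlike the old `repo.split('/').pop()` key, this keeps the organisation prefix, so two repos with the same model name under different owners no longer collide. A repo made only of disallowed characters yields an empty string, which the `required` attribute on `#cd-key` would then reject.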

@@ -1745,13 +1778,15 @@ function setupCatalogDialog() {
  el('#catalog-form').addEventListener('submit', async (e) => {
    e.preventDefault();
    const body = {
-      key: el('#cd-key').value.trim(),
+      key: el('#cd-key').value.trim() || setupCtx.key,
      display_name: el('#cd-name').value.trim(),
      repo: el('#cd-repo').value.trim(),
      size_gb: parseFloat(el('#cd-size').value) || 0,
      mode: el('#cd-mode').value,
      description: el('#cd-desc').value.trim() || null,
-      vllm_args: [],
+      // The inferred family flags (parsers / MoE backend); knob-controlled flags
+      // are layered on by the server from `knobs`, so no duplication.
+      vllm_args: setupCtx.vllm_args || [],
      knobs: {
        max_model_len: parseInt(el('#cd-mml').value, 10) || 32768,
        gpu_memory_utilization: parseFloat(el('#cd-gmu').value),
@@ -1769,8 +1804,9 @@ function setupCatalogDialog() {
      el('#catalog-dialog').close();
      closeDownloadPanel();
      await loadModels();
+      if (setupCtx.thenSwap) triggerSwap(body.key);
      pollStatus();
-    } catch (e) { alert('Add to catalog failed: ' + e.message); }
+    } catch (e) { alert('Saving the model setup failed: ' + e.message); }
  });
 }

@@ -2156,8 +2192,104 @@ function handleUpdateDone(d) {
  setTimeout(pollUpdates, 2000);
 }

+// ===================== settings ('gear') =====================
+// Renders the optional cluster knobs from /api/settings (server-driven field
+// list, so adding a knob server-side needs no JS change) and POSTs edits back.
+// The server reloads its config in place, so changes take effect immediately.
+
+let settingsClearSentinel = '__clear__';
+
+function renderSettingsForm(data) {
+  settingsClearSentinel = data.clear_sentinel || settingsClearSentinel;
+  const body = el('#settings-body');
+  body.innerHTML = (data.groups || []).map((g) => {
+    const rows = g.fields.map((f) => {
+      const help = f.help ? `<span class="muted small settings-help">${escapeHtml(f.help)}</span>` : '';
+      let input;
+      let clearToggle = '';
+      if (f.type === 'secret') {
+        const ph = f.set ? 'set — leave blank to keep' : (f.placeholder || '');
+        input = `<input type="password" autocomplete="off" data-key="${escapeHtml(f.key)}" data-secret="1" placeholder="${escapeHtml(ph)}">`;
+        // A stored secret is never echoed back, so blank means "keep". Offer an
+        // explicit way to remove it.
+        if (f.set) clearToggle = `<label class="settings-clear muted small"><input type="checkbox" data-clear-for="${escapeHtml(f.key)}"> clear stored value</label>`;
+      } else if (f.type === 'int') {
+        input = `<input type="number" min="1" max="65535" data-key="${escapeHtml(f.key)}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
+      } else {
+        input = `<input type="text" autocomplete="off" data-key="${escapeHtml(f.key)}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
+      }
+      return `<div class="settings-field"><label class="modal-row"><span>${escapeHtml(f.label)}</span>${input}</label>${clearToggle}${help}</div>`;
+    }).join('');
+    return `<fieldset class="modal-fieldset"><legend>${escapeHtml(g.name)}</legend>${rows}</fieldset>`;
+  }).join('');
+}
+
+async function openSettingsDialog() {
+  const dlg = el('#settings-dialog');
+  const err = el('#settings-error');
+  err.classList.add('hidden');
+  el('#settings-body').innerHTML = '<p class="muted small">Loading…</p>';
+  dlg.showModal();
+  try {
+    renderSettingsForm(await fetchJSON('/api/settings'));
+  } catch (e) {
+    el('#settings-body').innerHTML = '';
+    err.textContent = 'Could not load settings: ' + e.message;
+    err.classList.remove('hidden');
+  }
+}
+
+async function saveSettings(e) {
+  e.preventDefault();
+  const err = el('#settings-error');
+  err.classList.add('hidden');
+  const values = {};
+  $$('#settings-body [data-key]').forEach((inp) => {
+    const key = inp.dataset.key;
+    const v = inp.value.trim();
+    if (inp.dataset.secret) {
+      // "clear" checkbox wins; else a typed value sets it; else omit (keep the
+      // stored one — we can't see it to retype it).
+      const clear = el(`[data-clear-for="${key}"]`);
+      if (clear && clear.checked) values[key] = settingsClearSentinel;
+      else if (v) values[key] = v;
+    } else {
+      values[key] = v; // blank non-secret ⇒ server reverts it to the default
+    }
+  });
+  const btn = el('#settings-save');
+  btn.disabled = true;
+  try {
+    await fetchJSON('/api/settings', {
+      method: 'POST',
+      headers: { 'content-type': 'application/json' },
+      body: JSON.stringify({ values }),
+    });
+    el('#settings-dialog').close();
+    // Re-pull everything a knob can move: the Open WebUI link, health probes,
+    // service tiles, and the model menu (host/port changes alter all of them).
+    try {
+      state.config = await fetchJSON('/api/config');
+      const a = el('#open-webui-link');
+      if (state.config.open_webui_url) { a.href = state.config.open_webui_url; a.classList.remove('hidden'); }
+      else { a.classList.add('hidden'); }
+    } catch (e3) { console.warn('post-save /api/config refresh failed:', e3); }
+    pollStatus();
+    renderServices();
+    loadModels();
+  } catch (e2) {
+    err.textContent = 'Save failed: ' + String(e2.message || e2).replace(/^\d+ [^:]*:\s*/, '');
+    err.classList.remove('hidden');
+  } finally {
+    btn.disabled = false;
+  }
+}
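
Reviewer sketch of the payload rules in `saveSettings()`, with the DOM reads replaced by plain objects. `buildSettingsValues` and the `{ key, value, secret, clear }` input shape are hypothetical; `'__clear__'` is the default sentinel from the diff, which the server may override via `clear_sentinel`. Treating `NGC_API_KEY` as a secret field is an assumption made for the example.

```javascript
// Hypothetical pure version of the loop in saveSettings().
// Each input is { key, value, secret, clear } instead of a DOM element.
function buildSettingsValues(inputs, clearSentinel = '__clear__') {
  const values = {};
  for (const inp of inputs) {
    const v = (inp.value || '').trim();
    if (inp.secret) {
      if (inp.clear) values[inp.key] = clearSentinel; // explicit removal wins
      else if (v) values[inp.key] = v;                // typed value replaces it
      // blank and not cleared: omit the key so the server keeps the stored secret
    } else {
      values[inp.key] = v; // blank non-secret is sent as '' (server reverts to default)
    }
  }
  return values;
}

const payload = buildSettingsValues([
  { key: 'VLLM_PORT', value: ' 8000 ' },
  { key: 'PARAKEET_PORT', value: '' },
  { key: 'NGC_API_KEY', value: '', secret: true },
]);
console.log(payload); // { VLLM_PORT: '8000', PARAKEET_PORT: '' }
```

The asymmetry is deliberate: a blank non-secret field is sent so the server can reset it, while a blank secret is left out entirely because the dashboard never receives the stored value and cannot resend it.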
+
 async function init() {
  setupCopyButtons();
+  el('#open-settings').addEventListener('click', openSettingsDialog);
+  el('#settings-cancel').addEventListener('click', () => el('#settings-dialog').close());
+  el('#settings-form').addEventListener('submit', saveSettings);
  el('#open-download').addEventListener('click', openDownloadForm);
  el('#dl-cancel').addEventListener('click', closeDownloadPanel);
  el('#dl-start').addEventListener('click', startDownload);
@@ -2212,21 +2344,22 @@ async function init() {
  } catch {}
  setupDashboardTabs();
  setupEndpointCollapse();
-  await loadModels();
+  // Fire the (SSH-backed) menu scan without awaiting — it self-renders a
+  // "Scanning…" state and fills in when it returns, so a slow/unreachable
+  // cluster never blocks first paint. pollStatus() below paints the rest.
+  loadModels();
  await pollStatus();
  await renderServices();
  pollCoordination();
  pollHardware();
  pollUpdates();
-  // Disk-status probe runs after first paint — slow over SSH and not blocking.
-  loadDiskStatus();
  // Speech-model patches panel — slow over SSH, runs after first paint.
  renderSpeechModels();
  setInterval(pollStatus, 5000);
  setInterval(pollCoordination, 5000); // swap lock + schedule registry
  setInterval(pollHardware, 8000);    // every 8s
  setInterval(pollUpdates, 300000);  // every 5 min
-  setInterval(loadDiskStatus, 60000); // every 60s — disk state changes rarely
+  setInterval(loadModels, 60000); // every 60s — re-scan the Sparks for added/removed models
  setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
 }

@@ -17,14 +17,28 @@
      <span class="muted">connecting…</span>
    </div>
    <a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
+    <button id="open-settings" class="topbar-btn" type="button" title="Settings" aria-label="Open cluster settings">⚙ Settings</button>
  </header>

  <main>
    <section id="setup-banner" class="banner hidden">
      <strong>Configuration needed.</strong>
-      <span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
+      <span>Run the <em>Configure Sparks</em> action in StartOS to set your two Spark IPs and SSH users. Everything else (ports, services, integrations) lives under <em>⚙ Settings</em> above.</span>
    </section>

+    <dialog id="settings-dialog" class="modal">
+      <form method="dialog" class="modal-form" id="settings-form">
+        <h3>Settings</h3>
+        <p class="muted small">Optional cluster knobs — vLLM/service ports, container names, support-service hosts, and integrations. The two Spark IPs and SSH users are set once via the <em>Configure Sparks</em> action in StartOS; everything else is here. Changes apply immediately. Stored on this server and included in StartOS backups.</p>
+        <div id="settings-body" class="settings-body"><p class="muted small">Loading…</p></div>
+        <p id="settings-error" class="muted small dd-error hidden"></p>
+        <div class="modal-actions">
+          <button type="button" id="settings-cancel" class="btn">Cancel</button>
+          <button type="submit" id="settings-save" class="btn primary">Save</button>
+        </div>
+      </form>
+    </dialog>
+
    <section id="hardware-panel" class="hardware-panel hidden">
      <div class="section-header">
        <h2 class="section-title">Spark hardware</h2>
@@ -241,9 +255,10 @@

      <dialog id="catalog-dialog" class="modal">
        <form method="dialog" class="modal-form" id="catalog-form">
-          <h3>Add downloaded model to catalog</h3>
-          <p class="muted small">It will appear as a new card you can swap to. Knob values become its default launch flags — you can tweak later via the model's "Advanced" panel.</p>
-          <label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+"></label>
+          <h3>Set up this model</h3>
+          <p class="muted small">This model is downloaded, but Spark Control needs to know how to launch it. We've guessed from the model's own files — confirm or adjust, and it's saved so you're never asked again.</p>
+          <p id="cd-detected" class="muted small cd-detected hidden"></p>
+          <label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+" readonly></label>
          <label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
          <label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
          <label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
@@ -264,7 +279,7 @@
          </fieldset>
          <div class="modal-actions">
            <button type="button" id="cd-cancel" class="btn">Cancel</button>
-            <button type="submit" class="btn primary">Add to catalog</button>
+            <button type="submit" id="cd-submit" class="btn primary">Save settings</button>
          </div>
        </form>
      </dialog>
@@ -302,14 +317,14 @@

      <dialog id="disk-delete-dialog" class="modal">
        <form method="dialog" class="modal-form">
-          <h3>Delete model weights from disk?</h3>
+          <h3>Remove this model from the Sparks?</h3>
          <p id="dd-summary" class="muted small"></p>
          <ul class="muted small dd-hosts" id="dd-hosts"></ul>
-          <p class="muted small">This is reversible — you can re-download from the catalog at any time. The catalog entry stays intact.</p>
+          <p class="muted small">This deletes the weights and removes the card from the menu. You can always download it again later (re-downloading restores its saved settings).</p>
          <p id="dd-error" class="muted small dd-error hidden"></p>
          <div class="modal-actions">
            <button type="button" id="dd-cancel" class="btn">Cancel</button>
-            <button type="button" id="dd-confirm" class="btn danger">Delete from disk</button>
+            <button type="button" id="dd-confirm" class="btn danger">Remove from disk &amp; menu</button>
          </div>
        </form>
      </dialog>
@@ -354,11 +369,12 @@
        <div class="download-form" id="download-form">
          <label class="dl-row">
            <span class="dl-label">HuggingFace repo</span>
-            <input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
+            <input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off" list="dl-suggestions">
+            <datalist id="dl-suggestions"></datalist>
            <a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
          </label>
          <div class="dl-help muted small">
-            <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
+            Type any repo, or pick a known one from the list. <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
            · NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
          </div>
          <div class="dl-row">
@@ -778,6 +778,12 @@ main {
 .card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
 .tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
 .tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
+.tag.setup-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
+.card.needs-setup { border-style: dashed; }
+.card-actions .btn[data-setup-key] { flex: 1; }
+.empty-menu { grid-column: 1 / -1; padding: 28px 16px; text-align: center; border: 1px dashed var(--border); border-radius: 10px; }
+.cd-detected { padding: 8px 10px; border: 1px solid var(--border); border-radius: 8px; background: rgba(255,255,255,0.02); }
+.cd-detected code { word-break: break-all; }
 .card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
 .card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); }
 .card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; }
@@ -958,3 +964,13 @@ main {
 .tab-content.active { display: block; }

 /* (WhisperX install banner styles removed in v0.13.0:0 — see release notes) */
+
+/* ===== Settings ('gear') dialog ===== */
+.modal#settings-dialog { max-width: 560px; }
+/* Cap the (tall) form so the Save/Cancel actions stay reachable; the grouped
+   fields scroll within. */
+#settings-body { max-height: 60vh; overflow-y: auto; padding-right: 6px; display: flex; flex-direction: column; gap: 12px; }
+.settings-field { display: flex; flex-direction: column; gap: 2px; }
+.settings-help { display: block; line-height: 1.35; }
+.settings-clear { display: inline-flex; align-items: center; gap: 6px; margin-top: 2px; cursor: pointer; }
+.settings-clear input { width: auto; }
@@ -1,9 +1,14 @@
-# spark-control model catalog
+# spark-control launch recipes
 #
-# Edit this file (or override at runtime via the StartOS "Edit Model Catalog"
-# action) to add or change available models.
+# These are NOT the dashboard menu. The menu is whatever is actually downloaded
+# on the Sparks — Spark Control scans the Hugging Face cache on each load and
+# shows what it finds. These entries are launch *recipes*: matched to an on-disk
+# model by `repo`, they say HOW to launch it. A downloaded model with no recipe
+# here shows up as "needs setup", and the dashboard infers + saves one on first
+# use (from the model's own config.json). Add a recipe to make a known model
+# launch correctly the moment it's downloaded, with no setup prompt.
 #
-# Each model entry produces this command on Spark 1:
+# Each recipe produces this command on Spark 1:
 #   cd ~/spark-vllm-docker
 #   ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \
 #     --port=<defaults.port> --host=<defaults.host> <vllm_args...>
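
Reviewer sketch of how a recipe could expand into the command template in the comment above. This is an illustration in JavaScript, not the server's code: `buildLaunchCommand` is a hypothetical name, joining the two lines with `&&` is an assumption, `--solo` is assumed to follow `mode: solo`, and the host value in the example is made up. It does no shell quoting, which the real launcher would need.

```javascript
// Hypothetical expansion of the documented template:
//   cd ~/spark-vllm-docker
//   ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \
//     --port=<defaults.port> --host=<defaults.host> <vllm_args...>
function buildLaunchCommand(recipe, defaults) {
  const parts = ['./launch-cluster.sh'];
  if (recipe.mode === 'solo') parts.push('--solo'); // assumption: flag tracks mode
  parts.push(
    '-d', 'exec', 'vllm', 'serve', recipe.repo,
    `--port=${defaults.port}`,
    `--host=${defaults.host}`,
    ...(recipe.vllm_args || [])
  );
  return `cd ~/spark-vllm-docker && ${parts.join(' ')}`;
}

const launchCmd = buildLaunchCommand(
  {
    repo: 'nvidia/Gemma-4-26B-A4B-NVFP4',
    mode: 'solo',
    vllm_args: ['--max-model-len=32768', '--kv-cache-dtype=fp8'],
  },
  { port: 8888, host: '0.0.0.0' } // illustrative defaults
);
console.log(launchCmd);
```

The recipe's `vllm_args` are appended verbatim after the port and host, so the order in the YAML list is the order vLLM sees.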
@@ -54,6 +59,34 @@ models:
      - --enable-prefix-caching
      - --kv-cache-dtype=fp8

+  gemma4-26b:
+    display_name: "Gemma 4 26B-A4B (vision, light)"
+    description: >-
+      Lighter, faster sibling of the Gemma 4 31B above: a Mixture-of-Experts
+      model with 26B total parameters but only ~4B active per token, so it
+      generates quickly. Takes images as well as text (good for tasks like
+      reading a business card into structured text). Reasoning is a bit
+      shallower than the dense 31B. Runs solo on one Spark.
+    repo: nvidia/Gemma-4-26B-A4B-NVFP4
+    size_gb: 17
+    mode: solo
+    capabilities: [vision, reasoning, tools]
+    expected_ready_seconds: 240
+    vllm_args:
+      - --gpu-memory-utilization=0.8
+      - --max-model-len=32768
+      - --max-num-batched-tokens=16384
+      - --reasoning-parser=gemma4
+      - --tool-call-parser=gemma4
+      - --enable-auto-tool-choice
+      # MoE backend: research found this model's expert layers fall back to
+      # 'marlin' on GB10 (the fast flashinfer_cutlass path errors on sm_121).
+      # If a swap fails to start, this flag is the first thing to flip.
+      - --moe_backend=marlin
+      - --load-format=fastsafetensors
+      - --enable-prefix-caching
+      - --kv-cache-dtype=fp8
+
  qwen36:
    display_name: "Qwen3.6 35B-A3B (daily driver)"
    description: >-
@@ -74,36 +107,3 @@ models:
      - --load-format=fastsafetensors
      - --enable-prefix-caching
      - --kv-cache-dtype=fp8
-
-  qwen3-235b-fp8:
-    display_name: "Qwen3 235B-A22B FP8 (legacy)"
-    description: >-
-      Earlier generation of the Qwen 235B family in native FP8 precision.
-      Runs across both Sparks. Mostly superseded by Qwen3-VL above; keep
-      around for text-only baseline comparisons.
-    repo: Qwen/Qwen3-235B-A22B-FP8
-    size_gb: 220
-    mode: cluster
-    capabilities: []
-    expected_ready_seconds: 360
-    vllm_args:
-      - --gpu-memory-utilization=0.7
-      - -tp=2
-      - --distributed-executor-backend=ray
-      - --max-model-len=32768
-
-  qwen25-72b:
-    display_name: "Qwen2.5 72B (legacy)"
-    description: >-
-      Last-generation 72B dense model. Cluster mode required due to size.
-      Kept for compatibility and baseline comparison against newer Qwens.
-    repo: Qwen/Qwen2.5-72B-Instruct
-    size_gb: 145
-    mode: cluster
-    capabilities: []
-    expected_ready_seconds: 360
-    vllm_args:
-      - --gpu-memory-utilization=0.7
-      - -tp=2
-      - --distributed-executor-backend=ray
-      - --max-model-len=32768
@@ -15,3 +15,6 @@ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
 os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
 os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
 os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
+# Keep the in-app settings overlay off the container-only /data path; tests that
+# care about its contents point it at their own tmp file via monkeypatch.
+os.environ.setdefault("APP_SETTINGS_FILE", "/tmp/spark_control_test_app_settings.json")
@@ -0,0 +1,174 @@
+"""In-app settings overlay (the dashboard 'gear') + swap-lock routing regression.
+
+Covers app_settings (the /data overlay backing the gear): first-run seeding from
+env (the migration path), known-key filtering, apply() validation, secret
+masking — and, end-to-end via TestClient, that POST /api/settings reloads the
+shared Settings instance live, and that GET /api/swap/lock is no longer shadowed
+by /api/swap/{job_id}.
+"""
+import json
+import pytest
+
+from app import app_settings
+
+
+@pytest.fixture
+def overlay_file(tmp_path, monkeypatch):
+    p = tmp_path / "app_settings.json"
+    monkeypatch.setenv("APP_SETTINGS_FILE", str(p))
+    return p
+
+
+# ---- overlay store ----
+
+def test_seed_from_env_filters_unknown_and_blank(overlay_file):
+    # An existing install upgrading in: values previously set via the StartOS
+    # action arrive as env; only known, non-empty keys migrate into the overlay.
+    app_settings.seed_from_env({
+        "VLLM_PORT": "8000",
+        "QDRANT_COLLECTION": "",   # blank → skipped
+        "TOTALLY_UNKNOWN": "x",    # not a gear key → skipped
+        "PARAKEET_PORT": "8010",
+    })
+    expected = {"VLLM_PORT": "8000", "PARAKEET_PORT": "8010"}
+    assert app_settings.load_overlay() == expected
+    assert json.loads(overlay_file.read_text()) == expected
+
+
+def test_seed_is_a_one_time_noop_when_file_present(overlay_file):
+    overlay_file.write_text(json.dumps({"VLLM_PORT": "8000", "BOGUS": "y", "NGC_API_KEY": ""}))
+    app_settings.seed_from_env({"VLLM_PORT": "9999"})  # file exists ⇒ no-op
+    # unknown + blank keys dropped on read; existing value untouched by the seed.
+    assert app_settings.load_overlay() == {"VLLM_PORT": "8000"}
+
+
+def test_no_file_is_empty_and_seed_of_blank_env_writes_nothing(overlay_file):
+    assert app_settings.load_overlay() == {}
+    app_settings.seed_from_env({"VLLM_PORT": "", "QDRANT_COLLECTION": ""})
+    assert not overlay_file.exists()  # nothing worth seeding ⇒ no file
+    assert app_settings.load_overlay() == {}
+
+
+def test_apply_set_then_blank_deletes(overlay_file):
+    app_settings.apply({"VLLM_PORT": "8000"})
+    assert app_settings.load_overlay()["VLLM_PORT"] == "8000"
+    app_settings.apply({"VLLM_PORT": ""})  # blank non-secret ⇒ revert to default
+    assert "VLLM_PORT" not in app_settings.load_overlay()
+
+
+def test_apply_rejects_unknown_key(overlay_file):
+    with pytest.raises(app_settings.SettingsError):
+        app_settings.apply({"NOT_A_KNOB": "x"})
+
+
+def test_apply_rejects_non_numeric_port(overlay_file):
+    with pytest.raises(app_settings.SettingsError):
+        app_settings.apply({"PARAKEET_PORT": "80x0"})
+
+
+def test_apply_rejects_control_chars(overlay_file):
+    with pytest.raises(app_settings.SettingsError):
+        app_settings.apply({"QDRANT_COLLECTION": "a\nb"})
+
+
+def test_secret_blank_keeps_existing(overlay_file):
+    app_settings.apply({"NGC_API_KEY": "nvapi-abc"})
+    app_settings.apply({"NGC_API_KEY": ""})  # blank secret ⇒ leave it in place
+    assert app_settings.load_overlay()["NGC_API_KEY"] == "nvapi-abc"
+
+
+def test_apply_rejects_out_of_range_port(overlay_file):
+    for bad in ("0", "99999", "65536"):
+        with pytest.raises(app_settings.SettingsError):
+            app_settings.apply({"VLLM_PORT": bad})
+
+
+def test_apply_accepts_port_bounds(overlay_file):
+    app_settings.apply({"VLLM_PORT": "1", "PARAKEET_PORT": "65535"})
+    o = app_settings.load_overlay()
+    assert o["VLLM_PORT"] == "1" and o["PARAKEET_PORT"] == "65535"
+
+
+def test_secret_clear_sentinel_removes(overlay_file):
+    app_settings.apply({"NGC_API_KEY": "nvapi-abc"})
+    app_settings.apply({"NGC_API_KEY": app_settings.CLEAR_SENTINEL})
+    assert "NGC_API_KEY" not in app_settings.load_overlay()
+
+
+def test_seed_skips_invalid_and_strips(overlay_file):
+    app_settings.seed_from_env({
+        "VLLM_PORT": "8000\n",        # trailing newline → stripped
+        "PARAKEET_PORT": "99999",      # out of range → skipped, not written
+        "QDRANT_COLLECTION": "crm",
+    })
+    o = app_settings.load_overlay()
+    assert o["VLLM_PORT"] == "8000"
+    assert "PARAKEET_PORT" not in o
+    assert o["QDRANT_COLLECTION"] == "crm"
+
+
+def test_public_view_exposes_clear_sentinel(overlay_file):
+    assert app_settings.public_view()["clear_sentinel"] == app_settings.CLEAR_SENTINEL
+
+
+def test_public_view_masks_secrets_and_groups(overlay_file):
+    app_settings.apply({"NGC_API_KEY": "nvapi-abc", "VLLM_PORT": "8000"})
+    view = app_settings.public_view()
+    fields = {f["key"]: f for g in view["groups"] for f in g["fields"]}
+    # Secret: value never echoed to the browser, only a set flag.
+    assert "value" not in fields["NGC_API_KEY"]
+    assert fields["NGC_API_KEY"]["set"] is True
+    # Non-secret: current value present for prefill.
+    assert fields["VLLM_PORT"]["value"] == "8000"
+    assert {g["name"] for g in view["groups"]} >= {"vLLM (Spark 1)", "Integrations"}
+    # The previously-missing support-service ports are now exposed.
+    assert {"PARAKEET_PORT", "KOKORO_PORT", "EMBED_PORT", "QDRANT_PORT"} <= set(fields)
+
+
+# ---- end-to-end (TestClient): live reload + route order ----
+# TestClient is created without the `with` context manager so app startup events
+# (the deep-health poll loop) don't run — these stay fully offline.
+
+def _client(monkeypatch, tmp_path):
+    monkeypatch.setenv("APP_SETTINGS_FILE", str(tmp_path / "live.json"))
+    from fastapi.testclient import TestClient
+    from app import server
+    return TestClient(server.app)
+
+
+def test_swap_lock_get_is_not_shadowed(monkeypatch, tmp_path):
+    client = _client(monkeypatch, tmp_path)
+    r = client.get("/api/swap/lock")
+    # Regression: must hit get_swap_lock (200, {"held": False}), NOT the
+    # /api/swap/{job_id} catch-all that returns 404 "no such job".
+    assert r.status_code == 200
+    assert r.json() == {"held": False}
+
+
+def test_settings_apply_is_live_without_restart(monkeypatch, tmp_path):
+    client = _client(monkeypatch, tmp_path)
+    r = client.post("/api/settings", json={"values": {"VLLM_PORT": "8123"}})
+    assert r.status_code == 200
+    # Settings reloaded in place ⇒ /api/config reflects it immediately.
+    assert client.get("/api/config").json()["vllm_port"] == 8123
+    # And clearing it reverts to the default, still live.
+    client.post("/api/settings", json={"values": {"VLLM_PORT": ""}})
+    assert client.get("/api/config").json()["vllm_port"] == 8888
+
+
+def test_settings_post_rejects_bad_value(monkeypatch, tmp_path):
+    client = _client(monkeypatch, tmp_path)
+    r = client.post("/api/settings", json={"values": {"PARAKEET_PORT": "nope"}})
+    assert r.status_code == 422
+
+
+def test_webhook_notifier_repoints_live(monkeypatch, tmp_path):
+    # WebhookNotifier snapshots url/secret, so reload() alone can't reach it;
+    # post_settings must re-point it. Regression for that P1.
+    client = _client(monkeypatch, tmp_path)
+    from app import server
+    client.post("/api/settings", json={"values": {"SWAP_WEBHOOK_URL": "https://example.test/hook"}})
+    assert server.swap_webhook.url == "https://example.test/hook"
+    assert server.swap_webhook.enabled
+    client.post("/api/settings", json={"values": {"SWAP_WEBHOOK_URL": ""}})
+    assert server.swap_webhook.url == ""
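The overlay tests above pin down how `apply()` treats each kind of value: unknown keys and malformed ports are rejected, a blank non-secret reverts to the default, a blank secret is left in place, and a sentinel clears a secret. A minimal in-memory sketch of those rules follows. It is not the project's `app_settings` module: the key tables, the `CLEAR_SENTINEL` value, the `apply_values` name, and the validate-before-write ordering are all assumptions made for illustration.

```python
class SettingsError(ValueError):
    """Raised when a submitted value fails validation."""


# Hypothetical key tables; the real module's full key list is not shown in the diff.
PORT_KEYS = {"VLLM_PORT", "PARAKEET_PORT"}
SECRET_KEYS = {"NGC_API_KEY"}
TEXT_KEYS = {"QDRANT_COLLECTION"}
KNOWN_KEYS = PORT_KEYS | SECRET_KEYS | TEXT_KEYS
CLEAR_SENTINEL = "__clear__"  # placeholder value, an assumption


def validate(key: str, value: str) -> str:
    """Reject unknown keys, control characters, and out-of-range ports."""
    if key not in KNOWN_KEYS:
        raise SettingsError(f"unknown setting: {key}")
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
        raise SettingsError(f"{key}: control characters are not allowed")
    if key in PORT_KEYS and value:
        is_number = value.isascii() and value.isdigit()
        if not is_number or not 1 <= int(value) <= 65535:
            raise SettingsError(f"{key}: expected a port in 1..65535")
    return value


def apply_values(overlay: dict, values: dict) -> dict:
    """Return a new overlay with `values` applied; the input is not mutated."""
    # Validate everything first so one bad value leaves the overlay untouched.
    cleaned = {key: validate(key, value) for key, value in values.items()}
    result = dict(overlay)
    for key, value in cleaned.items():
        if key in SECRET_KEYS:
            if value == CLEAR_SENTINEL:
                result.pop(key, None)   # explicit clear
            elif value:
                result[key] = value     # new secret
            # blank secret: keep whatever is already stored
        elif value:
            result[key] = value
        else:
            result.pop(key, None)       # blank non-secret: revert to default
    return result
```

Treating blank secrets as "keep" fits a form that never echoes the stored secret back to the browser: the field arrives empty unless the operator typed a new value, so an explicit sentinel is needed to express "remove it".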
@@ -0,0 +1,190 @@
+"""Disk-driven menu helpers: cache-dir parsing + launch-recipe inference.
+
+All offline: pure functions over a fake cache listing and fake config.json
+dicts, plus the build_menu merge with the cache scan and disk probe
+monkeypatched. The SSH scan itself and the suggest endpoint are exercised by
+hand against the live cluster (mock-heavy unit tests of those would test the
+mocks).
+"""
+import asyncio
+
+from app import discovery
+from app.config import Settings
+from app.disk import DiskStatus, cache_dirname_to_repo, parse_cache_listing
+from app.discovery import repo_to_key, infer_recipe, _detect_family
+from app.models import load_catalog
+
+
+# ---- cache dirname <-> repo ----
+
+def test_cache_dirname_to_repo_roundtrip():
+    assert cache_dirname_to_repo("models--RedHatAI--Qwen3.6-35B-A3B-NVFP4") == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
+
+
+def test_cache_dirname_name_with_double_dash():
+    # The org is the first segment; everything after is the name (single '/').
+    assert cache_dirname_to_repo("models--org--weird--name") == "org/weird--name"
+
+
+def test_cache_dirname_rejects_non_model_dirs():
+    assert cache_dirname_to_repo("datasets--foo--bar") is None
+    assert cache_dirname_to_repo("models--onlyorg") is None
+    assert cache_dirname_to_repo("random") is None
+
+
+# ---- parse_cache_listing ----
+
+def test_parse_cache_listing_complete_and_incomplete():
+    out = (
+        "20000000000|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4\n"
+        "5000000000|0|models--some--half-downloaded\n"
+        "\n"
+        "garbage line with no pipes\n"
+        "123|1|not-a-model-dir\n"
+    )
+    items = parse_cache_listing(out)
+    assert items == [
+        ("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20000000000, True),
+        ("some/half-downloaded", 5000000000, False),
+    ]
+
+
+def test_parse_cache_listing_bad_size_defaults_zero():
+    items = parse_cache_listing("notanumber|1|models--a--b")
+    assert items == [("a/b", 0, True)]
+
+
+# ---- repo_to_key ----
+
+def test_repo_to_key_is_url_safe_and_stable():
+    assert repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4") == "redhatai-qwen3-6-35b-a3b-nvfp4"
+    # Deterministic: the same repo always yields the same key, so it works as a stable id.
+    assert repo_to_key("nvidia/Gemma-4-26B-A4B-NVFP4") == "nvidia-gemma-4-26b-a4b-nvfp4"
+
+
+# ---- family detection ----
+
+def test_detect_qwen3_moe():
+    cfg = {"architectures": ["Qwen3MoeForCausalLM"], "model_type": "qwen3_moe", "num_experts": 128}
+    label, flags, caps = _detect_family(cfg)
+    assert "--reasoning-parser=qwen3" in flags
+    assert "--moe_backend=flashinfer_cutlass" in flags
+    assert "reasoning" in caps
+    assert "MoE" in label
+
+
+def test_detect_gemma_moe_uses_marlin():
+    cfg = {"architectures": ["Gemma4MoeForConditionalGeneration"], "model_type": "gemma4_moe", "num_local_experts": 8}
+    label, flags, caps = _detect_family(cfg)
+    assert "--reasoning-parser=gemma4" in flags
+    assert "--tool-call-parser=gemma4" in flags
+    assert "--moe_backend=marlin" in flags        # NOT flashinfer_cutlass — GB10 footgun
+    assert "vision" in caps                         # ConditionalGeneration => multimodal
+    assert "tools" in caps
+
+
+def test_detect_generic_has_no_family_flags():
+    label, flags, caps = _detect_family({"architectures": ["LlamaForCausalLM"], "model_type": "llama"})
+    assert flags == []
+    assert label == "Generic"
+
+
+def test_detect_vision_from_config_keys():
+    _, _, caps = _detect_family({"model_type": "qwen3", "vision_config": {"x": 1}})
+    assert "vision" in caps
+
+
+# ---- infer_recipe (the prefill the setup form receives) ----
+
+def test_infer_recipe_solo_small_model():
+    cfg = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}
+    rec = infer_recipe("RedHatAI/Qwen3.6-35B-A3B-NVFP4", cfg, total_bytes=20_000_000_000, on_host_count=1)
+    assert rec["mode"] == "solo"
+    assert rec["key"] == "redhatai-qwen3-6-35b-a3b-nvfp4"
+    assert rec["repo"] == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
+    assert "--reasoning-parser=qwen3" in rec["vllm_args"]
+    assert "-tp=2" not in rec["vllm_args"]
+    assert rec["knobs"]["kv_cache_dtype"] == "fp8"
+
+
+def test_infer_recipe_cluster_when_on_both_hosts():
+    rec = infer_recipe("org/big", {}, total_bytes=10_000_000_000, on_host_count=2)
+    assert rec["mode"] == "cluster"
+    assert "-tp=2" in rec["vllm_args"]
+    assert "--distributed-executor-backend=ray" in rec["vllm_args"]
+    assert rec["knobs"]["gpu_memory_utilization"] == 0.7
+
+
+def test_infer_recipe_cluster_when_too_big_for_one_spark():
+    rec = infer_recipe("org/huge", {}, total_bytes=200_000_000_000, on_host_count=1)
+    assert rec["mode"] == "cluster"
+
+
+# ---- build_menu merge (disk scan ∪ recipes) ----
+
+def _both_spark_settings(monkeypatch) -> Settings:
+    for k in ("SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER"):
+        monkeypatch.delenv(k, raising=False)
+    monkeypatch.setenv("SPARK1_HOST", "1.1.1.1")
+    monkeypatch.setenv("SPARK1_USER", "u")
+    monkeypatch.setenv("SPARK2_HOST", "2.2.2.2")
+    monkeypatch.setenv("SPARK2_USER", "u")
+    return Settings.from_env()
+
+
+def test_build_menu_merges_recipe_discovered_and_hides_incomplete(monkeypatch):
+    cat = load_catalog("models.yaml")  # bundled recipes incl. qwen36 + gemma4
+    settings = _both_spark_settings(monkeypatch)
+
+    async def fake_list(host, user, s):
+        if host == "1.1.1.1":
+            return [
+                ("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20_000_000_000, True),  # recipe match
+                ("someorg/mystery-7B", 7_000_000_000, True),               # needs setup
+                ("broken/half", 1_000_000_000, False),                     # incomplete -> hidden
+            ]
+        return []  # spark2 empty
+
+    async def fake_probe(repo, mode, s, *, local_path=None):
+        return DiskStatus(repo=local_path or repo, on_disk=False, total_bytes=0, per_host=[])
+
+    monkeypatch.setattr(discovery, "list_cached_models", fake_list)
+    monkeypatch.setattr(discovery, "probe_disk", fake_probe)
+
+    menu = asyncio.run(discovery.build_menu(settings, cat))
+
+    # Recipe-matched: keyed by recipe key, ready (not needs_setup), real size.
+    assert "qwen36" in menu
+    assert menu["qwen36"]["needs_setup"] is False
+    assert menu["qwen36"]["total_bytes"] == 20_000_000_000
+
+    # Discovered-without-recipe: slug key, needs_setup.
+    slug = repo_to_key("someorg/mystery-7B")
+    assert menu[slug]["needs_setup"] is True
+
+    # Incomplete download is filtered out entirely.
+    assert all("half" not in k for k in menu)
+
+    # A recipe with nothing on disk (e.g. gemma4) must NOT appear — the menu is the disk.
+    assert "gemma4" not in menu
+
+
+def test_build_menu_sums_cluster_model_across_both_sparks(monkeypatch):
+    cat = load_catalog("models.yaml")
+    settings = _both_spark_settings(monkeypatch)
+
+    async def fake_list(host, user, s):
+        # Same repo present on BOTH Sparks — one card, sizes summed (not two cards).
+        return [("org/sharded-235B", 70_000_000_000, True)]
+
+    async def fake_probe(repo, mode, s, *, local_path=None):
+        return DiskStatus(repo=repo, on_disk=False, total_bytes=0, per_host=[])
+
+    monkeypatch.setattr(discovery, "list_cached_models", fake_list)
+    monkeypatch.setattr(discovery, "probe_disk", fake_probe)
+
+    menu = asyncio.run(discovery.build_menu(settings, cat))
+    key = repo_to_key("org/sharded-235B")
+    assert list(menu) == [key]                       # exactly one card
+    assert menu[key]["total_bytes"] == 140_000_000_000  # summed across both hosts
+    assert len(menu[key]["per_host"]) == 2
+    assert menu[key]["mode"] == "cluster"            # present on 2 hosts -> cluster
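The cache-parsing tests above fix the expected behavior closely enough to sketch the three pure helpers they exercise. The version below is one possible implementation that satisfies those assertions, not the code in `app/disk.py` or `app/discovery.py`; the function names are deliberately different from the real ones, and the `size|complete|dirname` line format is taken from the test fixture.

```python
import re
from typing import List, Optional, Tuple

PREFIX = "models--"


def dirname_to_repo(dirname: str) -> Optional[str]:
    """Map a Hugging Face cache dir name to 'org/name', or None if it is not a model dir."""
    if not dirname.startswith(PREFIX):
        return None
    # Only the first '--' separates org from name; later ones belong to the name.
    org, sep, name = dirname[len(PREFIX):].partition("--")
    if not sep or not org or not name:
        return None
    return f"{org}/{name}"


def parse_listing(out: str) -> List[Tuple[str, int, bool]]:
    """Parse 'size|complete|dirname' lines, skipping anything malformed."""
    items = []
    for line in out.splitlines():
        fields = line.strip().split("|", 2)
        if len(fields) != 3:
            continue  # blank or garbage line
        size_text, flag, dirname = fields
        repo = dirname_to_repo(dirname)
        if repo is None:
            continue  # datasets, stray directories
        try:
            size = int(size_text)
        except ValueError:
            size = 0  # an unreadable size should not hide the model
        items.append((repo, size, flag == "1"))
    return items


def repo_slug(repo: str) -> str:
    """Lower-case the repo and collapse every non-alphanumeric run to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
```

Splitting on the first `--` only is what lets a name such as `weird--name` survive the round trip, and defaulting a bad size to zero keeps a card visible even when the size column cannot be parsed.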
@@ -0,0 +1,35 @@
+"""build_download_command: the ~/.local/bin PATH fix + shell-injection quoting.
+
+hf-download.sh on the Spark shells out to `uvx`, which the uv installer puts in
+~/.local/bin — off the PATH of our non-interactive SSH session. The command must
+prepend ~/.local/bin (via $HOME, expanded server-side) or the download dies with
+"uvx: command not found". The repo value must also be shlex-quoted at the sink so
+a crafted value can't break out of the command (validate_repo gates it upstream).
+"""
+import shlex
+
+from app.download import build_download_command
+
+
+def test_prepends_local_bin_to_path():
+    cmd = build_download_command("org/name")
+    assert cmd.startswith('export PATH="$HOME/.local/bin:$PATH" && ')
+    assert "cd ~/spark-vllm-docker" in cmd
+    assert "./hf-download.sh org/name" in cmd
+
+
+def test_no_trailing_space_without_flags():
+    assert build_download_command("org/name", "").endswith("./hf-download.sh org/name")
+
+
+def test_cluster_flags_appended():
+    cmd = build_download_command("org/name", "-c --copy-parallel")
+    assert cmd.endswith("./hf-download.sh org/name -c --copy-parallel")
+
+
+def test_repo_is_shlex_quoted():
+    # Everything after the script name must shlex-split back to the exact repo,
+    # the same round-trip invariant build_launch_command relies on.
+    cmd = build_download_command("org/na;me")
+    after = cmd.split("./hf-download.sh ", 1)[1]
+    assert shlex.split(after) == ["org/na;me"]
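For readers without the source of `app/download.py`, the four tests above can be satisfied by a builder along the following lines. This is a sketch reconstructed from the assertions alone, under a different name, and it assumes the flags string is produced internally (so only the repo is quoted); the real `build_download_command` may differ.

```python
import shlex


def build_download_command_sketch(repo: str, flags: str = "") -> str:
    """Compose the remote shell command that runs hf-download.sh for `repo`."""
    # Quote the repo at the sink so shell metacharacters stay inside one argument.
    parts = ["./hf-download.sh", shlex.quote(repo)]
    if flags:
        # Assumption: flags come from the app itself, never from user input.
        parts.append(flags)
    return (
        # $HOME is left unexpanded here so the remote shell resolves it.
        'export PATH="$HOME/.local/bin:$PATH" && '
        "cd ~/spark-vllm-docker && "
        + " ".join(parts)
    )
```

Appending flags only when they are non-empty avoids the trailing space that `test_no_trailing_space_without_flags` guards against, and `shlex.quote` leaves a plain `org/name` untouched while wrapping a value such as `org/na;me` in single quotes.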
@@ -3,6 +3,15 @@ import { sparkConfigYaml } from '../fileModels/sparkConfig.yaml'

 const { InputSpec, Value } = sdk

+// This action is intentionally minimal: just the required wiring needed before
+// Spark Control can do anything — the two Spark node addresses and SSH users.
+// Every other knob (vLLM/service ports, container names, support-service hosts,
+// integrations, webhooks) now lives behind the ⚙ Settings gear in the dashboard
+// itself, which is where StartOS 0.4 expects routine config to live (and most
+// operators never open StartOS actions). The optional keys still exist in the
+// config.yaml schema (set by older versions); they're read into env at launch
+// and migrated into the in-app settings overlay on first boot, so nothing is
+// lost on upgrade — they're simply edited in the dashboard from now on.
 const inputSpec = InputSpec.of({
  spark1_host: Value.text({
    name: 'Spark 1 hostname or IP',
@@ -40,164 +49,14 @@ const inputSpec = InputSpec.of({
    placeholder: 'your SSH username',
    masked: false,
  }),
-  vllm_port: Value.text({
-    name: 'vLLM port (optional)',
-    description:
-      "The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
-    required: false,
-    default: null,
-    placeholder: 'leave blank for 8888',
-    masked: false,
-  }),
-  vllm_container: Value.text({
-    name: 'vLLM container name (optional)',
-    description:
-      'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
-    required: false,
-    default: null,
-    placeholder: 'leave blank for vllm_node',
-    masked: false,
-  }),
-  disabled_services: Value.text({
-    name: 'Services to hide (optional)',
-    description:
-      "Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM shares port 8000 with Parakeet's default — hide Parakeet so its probe doesn't hit vLLM.)",
-    required: false,
-    default: null,
-    placeholder: 'e.g. parakeet,kokoro',
-    masked: false,
-  }),
-  parakeet_host: Value.text({
-    name: 'Parakeet host (optional)',
-    description:
-      "Override the host running the Parakeet STT container. Leave blank if Parakeet runs on Spark 2 — that's the default. Set this if you run Parakeet on Spark 1 or a different machine.",
-    required: false,
-    default: null,
-    placeholder: 'leave blank to use Spark 2',
-    masked: false,
-  }),
-  parakeet_container: Value.text({
-    name: 'Parakeet container name (optional)',
-    description:
-      'Docker container name for Parakeet. Defaults to "parakeet-asr" — change only if you named yours something else.',
-    required: false,
-    default: null,
-    placeholder: 'parakeet-asr',
-    masked: false,
-  }),
-  kokoro_host: Value.text({
-    name: 'Kokoro host (optional)',
-    description:
-      'Override the host running the Kokoro TTS container. Leave blank if Kokoro runs on Spark 2.',
-    required: false,
-    default: null,
-    placeholder: 'leave blank to use Spark 2',
-    masked: false,
-  }),
-  kokoro_container: Value.text({
-    name: 'Kokoro container name (optional)',
-    description: 'Docker container name for Kokoro. Defaults to "kokoro-tts".',
-    required: false,
-    default: null,
-    placeholder: 'kokoro-tts',
-    masked: false,
-  }),
-  embed_host: Value.text({
-    name: 'Embedding server host (optional)',
-    description:
-      'Override the host running the spark-embed container (bge-m3 dense embeddings + reranker). Leave blank if it runs on Spark 2.',
-    required: false,
-    default: null,
-    placeholder: 'leave blank to use Spark 2',
-    masked: false,
-  }),
-  embed_container: Value.text({
-    name: 'Embedding container name (optional)',
-    description:
-      'Docker container name for the embedding server. Defaults to "spark-embed".',
-    required: false,
-    default: null,
-    placeholder: 'spark-embed',
-    masked: false,
-  }),
-  qdrant_host: Value.text({
-    name: 'Qdrant host (optional)',
-    description:
-      'Override the host running the Qdrant vector database. Leave blank if it runs on Spark 2.',
-    required: false,
-    default: null,
-    placeholder: 'leave blank to use Spark 2',
-    masked: false,
-  }),
-  qdrant_container: Value.text({
-    name: 'Qdrant container name (optional)',
-    description: 'Docker container name for Qdrant. Defaults to "qdrant".',
-    required: false,
-    default: null,
-    placeholder: 'qdrant',
-    masked: false,
-  }),
-  qdrant_collection: Value.text({
-    name: 'Default Qdrant collection (optional)',
-    description:
-      'Default collection name used by /api/search when a request does not specify one. Leave blank to require callers to pass a collection.',
-    required: false,
-    default: null,
-    placeholder: 'e.g. crm_chunks',
-    masked: false,
-  }),
-  matrix_bridge_user: Value.text({
-    name: 'matrix-bridge bot SSH user (optional)',
-    description:
-      "If you run the matrix-bridge Matrix bot on Spark 2, enter the SSH user that owns its ~/matrix-bridge folder (e.g. 'modelo'). Spark Control then shows a tile to update, restart, and view logs for the bot. Leave blank if you don't run the bot — the tile stays hidden. Note: this package's SSH public key must be authorized for that user (Show Public Key action) unless it's the same as your Spark 2 user.",
-    required: false,
-    default: null,
-    placeholder: 'e.g. modelo',
-    masked: false,
-  }),
-  open_webui_url: Value.text({
-    name: 'Open WebUI URL (optional)',
-    description:
-      'If you also run Open WebUI on your LAN, paste its URL here. Spark Control will then show a one-click "Open chat" button next to the current model so you can jump straight to it.',
-    required: false,
-    default: null,
-    placeholder: 'e.g. https://open-webui.yourserver.local',
-    masked: false,
-  }),
-  ngc_api_key: Value.text({
-    name: 'NGC API key (optional)',
-    description:
-      'NVIDIA NGC personal API key — needed to install NIM containers (Parakeet, etc.) from nvcr.io. Get one free at https://ngc.nvidia.com/setup/personal-key. Stored only on this Start9 server; passed to docker as the NGC_API_KEY env var when installing NIM services. (Kokoro TTS is Apache 2.0 and does not need an NGC key.)',
-    required: false,
-    default: null,
-    placeholder: 'starts with "nvapi-..."',
-    masked: true,
-  }),
-  swap_webhook_url: Value.text({
-    name: 'Swap webhook URL (optional)',
-    description:
-      'If you run automation that needs to know when the loaded model changes, paste a URL here. Spark Control POSTs a small JSON event (swap_complete / swap_failed) to it after every model swap, so the consumer can re-point its config to the new model. Leave blank to disable. Only needed if something other than this dashboard cares about swaps.',
-    required: false,
-    default: null,
-    placeholder: 'e.g. https://my-service.local/spark-swap',
-    masked: false,
-  }),
-  swap_webhook_secret: Value.text({
-    name: 'Swap webhook secret (optional)',
-    description:
-      'Optional shared secret. If set, each webhook is signed with an "X-Spark-Signature: sha256=…" header (HMAC of the body) so the receiver can verify it really came from Spark Control. Leave blank to send the webhook unsigned.',
-    required: false,
-    default: null,
-    placeholder: 'a random string the receiver also knows',
-    masked: true,
-  }),
 })

 export const configureSparks = sdk.Action.withInput(
  'configure-sparks',
  async () => ({
    name: 'Configure Sparks',
-    description: 'Set the hostnames and SSH users for your two Spark nodes.',
+    description:
+      'Set your two Spark node addresses and SSH users — the required wiring. Everything else (ports, container names, support services, integrations) is configured under ⚙ Settings in the Spark Control dashboard.',
    warning: null,
    visibility: 'enabled',
    allowedStatuses: 'any',
@@ -205,11 +64,19 @@ export const configureSparks = sdk.Action.withInput(
  }),
  async () => inputSpec,
  async ({ effects }) => {
+    // Prefill from the saved config, but only the keys this (trimmed) form owns.
    const cfg = await sparkConfigYaml.read().once()
-    return cfg ?? null
+    if (!cfg) return null
+    return {
+      spark1_host: cfg.spark1_host,
+      spark1_user: cfg.spark1_user,
+      spark2_host: cfg.spark2_host,
+      spark2_user: cfg.spark2_user,
+    }
  },
  async ({ effects, input }) => {
-    // Optional fields come through as `null`; coerce to empty string for the schema.
+    // merge() only touches the four keys we submit, leaving any legacy optional
+    // values already in config.yaml intact.
    const normalized = Object.fromEntries(
      Object.entries(input).map(([k, v]) => [k, v ?? '']),
    ) as Record<string, string>
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'

 export const v0_1_0 = VersionInfo.of({
-  version: '0.25.0:0',
+  version: '0.27.1:0',
  releaseNotes: {
    en_US:
-      "v0.25.0:0 — cluster coordination layer (GPU arbiter). For clusters where automation, not just this dashboard, swaps models. Three additions: (1) Swap reservation lock — an external scheduler can reserve the GPU swap path (POST /api/swap/lock) and gets a secret token; while held, any swap without the token is refused (423), so the dashboard's manual swap is paused and shows who holds the GPU and until when (with a human Release override). The lock is TTL-bounded and self-frees. (2) Swap webhook — set a URL (and optional signing secret) in Configure Sparks; Spark Control POSTs a swap_complete / swap_failed event after each swap so downstream consumers re-point their model config. (3) Schedule registry — your automation can register its cron jobs (POST /api/schedule) for a read-only \"Scheduled jobs\" panel on the dashboard; Spark Control only displays them, it never runs them. New API: /api/swap/lock (GET/POST/DELETE), /api/schedule (GET/POST/DELETE). See docs/COORDINATION.md. Spark Control remains a control plane, not a job runner — business pipelines stay in their own services and call the swap API.",
+      'v0.27.1:0 — bug fix: "Download a new model" now works on its own. The downloader on the Spark relies on a helper tool (uvx, part of Astral\'s uv) that the standard installer places under your home directory in ~/.local/bin. Spark Control runs downloads over an automated SSH session that wasn\'t looking there, so a download failed immediately with "uvx: command not found" even though the tool was installed. Spark Control now includes ~/.local/bin on the path when it runs a download, so the Download button works with no manual setup. No other changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
  },
  migrations: {
    up: async ({ effects }) => {},
@@ -74,11 +74,15 @@ For a cluster wired differently from the reference layout, three optional knobs

 ## Adding a new model

-1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
-2. Confirm the weights are on the Spark: `ssh <spark-user>@<spark-1-host> 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
-3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
+The menu is whatever's downloaded on the Sparks, so the normal path is just:
+**download it, then set it up once.**

-If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
+1. **Download** from the dashboard (**+ Download a new model**, paste the HF repo) or on Spark 1 with `./hf-download.sh <repo>`. When it finishes it appears on the menu by itself.
+2. **Set it up.** If Spark Control already has a recipe for it (see below), it's ready to switch to. Otherwise it shows a **"needs setup"** card: the first switch reads the model's `config.json`, proposes how to launch it (family/parsers, solo vs cluster, vLLM flags), and you confirm once. The confirmed recipe persists to `/data/models-overrides.yaml` (survives package updates).
+
+### Bundling a launch recipe (optional — skips the setup prompt)
+
+To make a known model launch correctly the instant it's downloaded, add a *recipe* to `image/models.yaml`. These are **not** the menu — they're matched to an on-disk model by `repo`. Required: `display_name`, `repo`, `size_gb`, `mode` (`solo`/`cluster`), `vllm_args`. Optional: `description`, `capabilities` (e.g. `[vision, reasoning, tools]`), `expected_ready_seconds`. Then rebuild + redeploy: `cd package && make x86 && make install`. Keep descriptions generic (not user-specific) so the recipes stay portable.

 ### Local / fine-tuned models (v0.23.0+)
Author	SHA1	Message	Date
Keysat	c846386c1a	docs: v0.27.1:0 live + published to Clankistry; Gemma download fix confirmed end-to-end	2026-06-18 16:46:24 -05:00
Keysat	1e1e1cb568	v0.27.1:0 - fix model download: prepend ~/.local/bin so SSH finds uvx. hf-download.sh shells out to uvx (the uv installer drops it in ~/.local/bin), but the non-interactive SSH session doesn't source the user's profile, so ~/.local/bin was off PATH and downloads died with "uvx: command not found". build_download_command now prepends $HOME/.local/bin. Adds test_download.py.	2026-06-18 16:44:07 -05:00
Keysat	a20c538ebf	docs: v0.27.0:0 live + shipped; record settings-gear architecture + snapshot-holder gotcha	2026-06-18 13:51:11 -05:00
Keysat	7e0759846f	v0.27.0:0 - in-app settings gear + swap-lock route fix. Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks" action (now just the 4 required fields) and into a dashboard ⚙ Settings gear, backed by a /data/app_settings.json overlay keyed by env-var names. One shared mutable Settings instance + Settings.reload() applies edits live without a restart; existing installs' values migrate automatically on first boot. Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the /api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its url/secret reload live too.	2026-06-18 13:41:28 -05:00
Keysat	b67e001642	docs: v0.26.0:0 live + published to registry; surface Gemma-26B eval as next	2026-06-18 12:35:16 -05:00
Keysat	df9f244eae	v0.26.0:0 - disk-driven model menu (scan sparks; recipes; needs-setup). The dashboard menu is now the set of models actually downloaded on the Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as launch recipes matched to an on-disk model by repo; an on-disk model with no recipe is flagged needs_setup and its launch settings are inferred from its config.json for a one-time operator confirmation (discovery.py). Also: (1) delete now removes weights AND the menu card (delete_from_disk sweeps all hosts; the delete endpoint resolves keys via the live menu); (2) new GET /api/models/suggest; /api/models returns the menu + a recipes list (download autocomplete); GET /api/models/disk-status removed; (3) dropped the two legacy Qwen recipes (235B FP8, 2.5 72B); (4) tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge).	2026-06-18 11:09:56 -05:00
Keysat	c0b35184ba	docs: trim Current state to live status — coordination epic shipped	2026-06-18 08:09:59 -05:00
Keysat	7ecd77f1e5	docs: defer raw-docker swap generalization — multi-node rationale recorded	2026-06-18 07:58:25 -05:00
Keysat	6bcda6e348	docs: v0.25.0:0 installed live — update Current state	2026-06-18 07:11:33 -05:00