23 Commits

Author SHA1 Message Date
Keysat 1f359e3c79 v0.27.3:0 - cap image resolution (fix oversized-image 400); remove vision-check button
A 12MP photo expands past vLLM's ~4096-image-token limit -> 400. Cap via
--mm-processor-kwargs max_pixels in the qwen36 recipe so big images auto-
downscale server-side for every /v1 consumer (verified live: 400->200).
Remove the v0.27.2 in-dashboard vision-check button per owner request; the
vision badge already signals capability.
2026-06-18 18:41:28 -05:00
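The token arithmetic behind this cap can be sketched roughly as below. This is an illustration only, not vLLM's processor logic: it assumes about one vision token per 32x32-pixel block, which the commit does not state, and `approx_image_tokens` is a hypothetical helper. The 2,000,000-pixel cap is the `max_pixels` value recorded in the project notes.

```python
from typing import Optional

# Assumption (not from the commit): ~1 vision token per 32x32-pixel block.
PIXELS_PER_TOKEN = 32 * 32


def approx_image_tokens(pixels: int, max_pixels: Optional[int] = None) -> int:
    """Estimate vision tokens for an image, optionally after a pixel cap."""
    if max_pixels is not None:
        # A server-side cap downscales the image before tokenization.
        pixels = min(pixels, max_pixels)
    return pixels // PIXELS_PER_TOKEN


uncapped = approx_image_tokens(12_000_000)            # 12MP photo, no cap
capped = approx_image_tokens(12_000_000, 2_000_000)   # same photo, capped
print(uncapped, capped)
```

Under that assumption the uncapped photo lands well above a ~4096-token limit and the capped one well below it, which is consistent with the 400 -> 200 change described above.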
Keysat 9a3bf9ed86 v0.27.2:0 - vision check tool + mark Qwen3.6 vision-capable
Qwen3.6-35B-A3B is multimodal (vision tower on disk) but was labelled
text-only. Mark it [vision, reasoning] and add a 'Vision check' button on
the running vision-capable card: upload an image + prompt -> existing /v1
passthrough proxy -> show the model's text. Confirmed 7/7 fields on a
business card. Records the Gemma-4-26B deferral + research findings.
2026-06-18 18:14:30 -05:00
Keysat c846386c1a docs: v0.27.1:0 live + published to Clankistry; Gemma download fix confirmed end-to-end 2026-06-18 16:46:24 -05:00
Keysat 1e1e1cb568 v0.27.1:0 - fix model download: prepend ~/.local/bin so SSH finds uvx
hf-download.sh shells out to uvx (the uv installer drops it in ~/.local/bin),
but the non-interactive SSH session doesn't source the user's profile, so
~/.local/bin was off PATH and downloads died with "uvx: command not found".
build_download_command now prepends $HOME/.local/bin. Adds test_download.py.
2026-06-18 16:44:07 -05:00
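A minimal sketch of the PATH-prefix idea, assuming the remote command is assembled as a single shell string. `with_local_bin` and the example argv are hypothetical; the real `build_download_command` may be structured differently.

```python
import shlex

# Expanded by the remote shell, so $HOME is the SSH user's home on the Spark.
PATH_PREFIX = 'export PATH="$HOME/.local/bin:$PATH"'


def with_local_bin(argv: list[str]) -> str:
    """Prefix a remote command so tools in ~/.local/bin (e.g. uvx) resolve
    even when the non-interactive SSH session skips the user's profile."""
    return f"{PATH_PREFIX}; {shlex.join(argv)}"


cmd = with_local_bin(["./hf-download.sh", "org/some-model"])
print(cmd)
```

Quoting each argument with `shlex.join` keeps the command round-trippable through `shlex.split`, which is the kind of property a regression test can check offline.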
Keysat a20c538ebf docs: v0.27.0:0 live + shipped; record settings-gear architecture + snapshot-holder gotcha 2026-06-18 13:51:11 -05:00
Keysat 7e0759846f v0.27.0:0 - in-app settings gear + swap-lock route fix
Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks"
action (now just the 4 required fields) and into a dashboard ⚙ Settings gear,
backed by a /data/app_settings.json overlay keyed by env-var names. One shared
mutable Settings instance + Settings.reload() applies edits live without a
restart; existing installs' values migrate automatically on first boot.

Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now
configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the
/api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its
url/secret reload live too.
2026-06-18 13:41:28 -05:00
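The overlay-plus-reload pattern described here might look something like the sketch below. It is a simplified assumption of the design, not the project's `Settings` class: only `VLLM_PORT` is modelled, and the constructor arguments are hypothetical.

```python
import json
import os
from pathlib import Path


class Settings:
    """One shared, mutable settings object; reload() updates it in place."""

    def __init__(self, overlay_path, environ=None):
        self._overlay_path = Path(overlay_path)
        self._environ = os.environ if environ is None else environ
        self.reload()

    def reload(self) -> None:
        # Start from the environment, then overlay the JSON file, which is
        # keyed by the same env-var names.
        merged = dict(self._environ)
        if self._overlay_path.exists():
            merged.update(json.loads(self._overlay_path.read_text()))
        # Blank or missing falls back to the documented default of 8888.
        self.vllm_port = int(merged.get("VLLM_PORT") or 8888)
```

Because `reload()` mutates the existing instance, any module that already holds a reference sees the new values. A component that copied a value at construction time would still need to be re-pointed after a save, which is the case the commit handles for `WebhookNotifier`.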
Keysat b67e001642 docs: v0.26.0:0 live + published to registry; surface Gemma-26B eval as next 2026-06-18 12:35:16 -05:00
Keysat df9f244eae v0.26.0:0 - disk-driven model menu (scan sparks; recipes; needs-setup)
The dashboard menu is now the set of models actually downloaded on the
Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as
launch recipes matched to an on-disk model by repo; an on-disk model with
no recipe is flagged needs_setup and its launch settings are inferred from
its config.json for a one-time operator confirmation (discovery.py).

- delete now removes weights AND the menu card (delete_from_disk sweeps all
  hosts; the delete endpoint resolves keys via the live menu)
- new GET /api/models/suggest; /api/models returns the menu + a recipes list
  (download autocomplete); GET /api/models/disk-status removed
- dropped the two legacy Qwen recipes (235B FP8, 2.5 72B)
- tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge)
2026-06-18 11:09:56 -05:00
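Matching an on-disk model to a recipe by repo relies on the Hugging Face hub cache naming scheme, where `org/name` is stored as a `models--org--name` directory. A minimal sketch of that mapping follows; the function names are hypothetical and are not taken from `discovery.py`.

```python
from typing import Optional

CACHE_PREFIX = "models--"


def repo_to_cache_dirname(repo: str) -> str:
    """'org/name' -> 'models--org--name'."""
    return CACHE_PREFIX + repo.replace("/", "--")


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    """'models--org--name' -> 'org/name'; None for non-model cache entries."""
    if not dirname.startswith(CACHE_PREFIX):
        return None
    # Only the first '--' separates org from name in this sketch.
    return dirname[len(CACHE_PREFIX):].replace("--", "/", 1)


print(cache_dirname_to_repo("models--RedHatAI--Qwen3.6-35B-A3B-NVFP4"))
```

The reverse mapping is ambiguous if an org or model name itself contains `--`; this sketch simply splits on the first occurrence.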
Keysat c0b35184ba docs: trim Current state to live status — coordination epic shipped 2026-06-18 08:09:59 -05:00
Keysat 7ecd77f1e5 docs: defer raw-docker swap generalization — multi-node rationale recorded 2026-06-18 07:58:25 -05:00
Keysat 6bcda6e348 docs: v0.25.0:0 installed live — update Current state 2026-06-18 07:11:33 -05:00
Keysat 7ae6ab3ba8 v0.25.0:0 - cluster coordination layer (swap lock + webhook + schedule registry)
GPU-arbiter safety layer for when automation, not just the dashboard, swaps
models:
- swap reservation lock (POST/GET/DELETE /api/swap/lock); 423-enforced in
  post_swap via a single-read gate, TTL-bounded, secret-token auth, human
  force-release override + dashboard banner
- swap webhook (swap_complete/swap_failed) fired outside the swap lock, optional
  HMAC signature, configurable URL+secret
- read-only schedule registry (GET/POST/DELETE /api/schedule) + dashboard panel

New module image/app/coordination.py; docs/COORDINATION.md for consumers; 22
offline tests in test_coordination.py.
2026-06-18 07:07:08 -05:00
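A TTL-bounded, token-authenticated reservation lock with a force-release override could be sketched as below. The class and method names are hypothetical and this is not the code in `coordination.py`; the clock is injected so expiry can be tested without sleeping.

```python
import secrets
import time
from typing import Callable, Optional


class SwapLock:
    def __init__(self, now: Callable[[], float] = time.monotonic):
        self._now = now
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def held(self) -> bool:
        # An expired reservation counts as released.
        return self._token is not None and self._now() < self._expires_at

    def acquire(self, ttl_seconds: float) -> Optional[str]:
        """Return a secret token, or None if someone else holds the lock."""
        if self.held():
            return None
        self._token = secrets.token_hex(16)
        self._expires_at = self._now() + ttl_seconds
        return self._token

    def release(self, token: Optional[str] = None, force: bool = False) -> bool:
        """Release with the holder's token, or unconditionally with force."""
        if self.held() and not force:
            if token is None or not secrets.compare_digest(token, self._token):
                return False
        self._token = None
        return True
```

In an HTTP layer, a swap request arriving while `held()` is true would be answered with 423 Locked, and the force path would back a human override in the dashboard.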
Keysat dd3d1412d4 docs: v0.24.0:0 committed/tagged/pushed — Gitea release asset + live install still pending 2026-06-17 23:11:14 -05:00
Keysat 26070eb191 v0.24.0:0 - configurable cluster topology (vllm container name, hide services, second-vllm monitor)
Make the cluster topology configurable so an adopter wired differently
(vLLM on both Sparks, port 8000, different container name, no Parakeet)
can monitor without forking. Covers the OpenClaw report P4/P5/#6.

- VLLM_CONTAINER override (default vllm_node), validated at the boundary
  and quote_arg-quoted into the swap log-tail + pre-flight validator exec.
- DISABLED_SERVICES list: hidden services show no tile and are skipped by
  status/deep-health/connectivity probes (kills the Parakeet-on-8000
  collision).
- kind: vllm custom service monitors a second Spark's vLLM via the shared
  probe_vllm_endpoint; /api/endpoints gains a disabled flag.

Swap mechanism intentionally not generalized to raw docker run (that's
coordination, roadmap item 4).
2026-06-17 23:03:33 -05:00
Keysat 90394f891b docs: v0.23.0 published, live install pending (mDNS); runbook sideload troubleshooting 2026-06-17 22:36:41 -05:00
Keysat e783653ef0 v0.23.0:0 - local / fine-tuned model support
Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes),
not just Hugging Face repos.

- ModelDef gains local_path; a model must set exactly one of repo / local_path.
  The validator also enforces the local-path whitelist and that any
  --chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME
  host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook,
  then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream
  expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'.
- POST /api/models validates the full entry via ModelDef before persisting, so a
  bad entry can't be written and then break catalog load; _merge_overrides skips
  an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF
  link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal,
  exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent
  pass; findings addressed.
2026-06-17 22:27:41 -05:00
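The three path rules listed for `shellsafe.validate_local_path` (absolute path, charset whitelist, no `.`/`..` segments) can be illustrated with the sketch below. The exact character set is an assumption and the function name is hypothetical; the project's validator may differ.

```python
import re

# Assumed whitelist: letters, digits, dot, underscore, hyphen, slash.
_SAFE_PATH = re.compile(r"/[A-Za-z0-9._/-]+")


def validate_local_path_sketch(path: str) -> str:
    """Return the path unchanged if it passes all three checks."""
    # Absolute path and charset whitelist in one full-string match.
    if not _SAFE_PATH.fullmatch(path):
        raise ValueError(f"not an absolute path with safe characters: {path!r}")
    # Reject traversal and no-op segments outright.
    if any(segment in (".", "..") for segment in path.split("/")):
        raise ValueError(f"'.' and '..' segments are not allowed: {path!r}")
    return path


print(validate_local_path_sketch("/models/my-finetune"))
```

Rejecting rather than normalizing keeps the validated string identical to what is later bind-mounted and passed to `vllm serve`.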
Keysat 57a893000e docs: document the Gitea release ritual in startos-package guide 2026-06-17 21:29:27 -05:00
Keysat 56f7ea4444 fix: gitea-release.sh tolerate 404 on tag lookup; report HTTP errors; mark v0.22.0 published 2026-06-17 21:23:21 -05:00
Keysat aaad57d88f docs: mark v0.22.0:0 shipped + record Gitea-release distribution decision 2026-06-17 19:47:49 -05:00
Keysat 136a4713a1 v0.22.0:0 - configurable vllm port; gitea-release tooling; coexistence roadmap
- Configure Sparks gains a vLLM port field (blank => 8888, our launch-cluster.sh
  default); VLLM_PORT plumbed configureSparks -> sparkConfig.yaml -> main.ts env
  -> config.py. So an adopter whose vLLM listens elsewhere (e.g. 8000) can fix
  the "vLLM unreachable" health check without rebuilding the package.
- Harden numeric env parsing (config._env_int): a blank or malformed port now
  falls back to its default instead of crashing daemon startup (closes a P3
  tech-debt item; the Configure panel passes unset optional fields as "").
- Add scripts/gitea-release.sh + `make release` to publish the built s9pk to
  Gitea Releases, so the OpenClaw adopter pulls updates with a read-only token
  instead of being hand-sent the package.
- Capture the OpenClaw/Johnny-5 coexistence epic and the "control plane, not a
  job runner" stance in ROADMAP.md and Current state.
2026-06-17 19:45:09 -05:00
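The hardened parsing described for `config._env_int` amounts to "fall back to the default on blank or malformed input". A minimal sketch, with a hypothetical name and signature:

```python
import os
from typing import Mapping, Optional


def env_int(name: str, default: int,
            environ: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer env var; blank, missing, or malformed -> default."""
    env = os.environ if environ is None else environ
    raw = env.get(name, "")
    try:
        return int(raw.strip())
    except ValueError:
        # Covers "" (unset optional field passed as an empty string)
        # as well as non-numeric input.
        return default


print(env_int("VLLM_PORT", 8888, {"VLLM_PORT": ""}))
```

This matters because the Configure panel passes unset optional fields as `""`, and a bare `int("")` would raise during daemon startup.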
Keysat c179389731 docs: trim Current state post-matrix-bridge ship; add bot-tile ops note to runbook 2026-06-15 23:18:28 -05:00
Keysat 9debeb4bbe v0.21.0:1 - tidy host display for port-less bot tile 2026-06-15 23:09:24 -05:00
Keysat 39f8410623 v0.21.0:0 - matrix-bridge bot tile (status, update, restart, logs) 2026-06-15 22:57:40 -05:00
44 changed files with 3948 additions and 424 deletions
+17 -6
@@ -33,7 +33,7 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
- `image/app/` — FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/` — StartOS manifest, interfaces, actions, version + release notes.
- `docs/` — `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md` (consumer-facing API refs; update with API changes).
- `docs/` — `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`, `COORDINATION.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog — items move into "Current state" below when picked up).
## Conventions
@@ -55,11 +55,22 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
## Current state
- **Working (v0.20.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
- **Live: v0.27.3:0 — Qwen3.6 vision works end-to-end (incl. full-size phone photos).** Installed on `immense-voyage` (`start-cli` confirms `0.27.3:0`). Two-part story: **(A) the daily driver `RedHatAI/Qwen3.6-35B-A3B-NVFP4` is itself a vision model** (`Qwen3_5MoeForConditionalGeneration`, `vision_config` + `model_visual.safetensors` on disk) — recipe was mislabelled `[reasoning]`, now `[vision, reasoning]`. Real business card read **7/7 fields perfect** (~97 tok/s, no patches). **(B) oversized-image fix:** a 12MP phone photo expands to ~11.8k vision tokens → exceeds vLLM's ~4096-image-token cap → **400 "Failed to apply Qwen3VLProcessor … token count mismatch."** Fix = cap resolution server-side via `'--mm-processor-kwargs={"max_pixels": 2000000}'` in the qwen36 recipe (auto-downscales big images for *every* `/v1` consumer; verified live — the 12MP image went 400→200). Quoting survives the stack because `launch-cluster.sh` does `printf "%q"` on the serve args (line 163) and `build_launch_command` shlex-quotes (round-trip test passes). **An in-dashboard "Vision check" button shipped in v0.27.2 then was removed in v0.27.3 at the owner's request** (clutter; the `vision` badge already signals capability — don't re-add it). The `/v1/chat/completions` proxy is a dumb passthrough that already forwards image content, so no backend change was needed. 161 pytest green.
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (65 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, and the `shellsafe` validators. Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
- **Gemma-4-26B-A4B-NVFP4 eval — RESOLVED as "defer; Qwen covers vision better."** Two independent deep-research agents (this session) confirmed: it does NOT run on the stock `eugr/spark-vllm-docker` stack (crashes on `tie_weights` `NotImplementedError` — the checkpoint declares compressed-tensors in config.json but is modelopt NVFP4). The working path needs the **`vllm/vllm-openai:gemma4-0505-arm64-cu130`** image (lacks Ray → can't go through `launch-cluster.sh`, needs **raw `docker run`** = the deferred raw-docker-swap feature) **+ a bind-mounted patched `gemma4.py`** (upstream PR #39084 unmerged) **+ `--moe-backend marlin`**, AND even then **vision is degraded** by open vLLM bug #40106 (wrong attention on image tokens — hurts OCR specifically). ~52 tok/s vs Qwen's 97. Net: more duct tape for worse vision than the Qwen Grant already runs. Revisit when #40106 + #39084 land. Alternatives agent also flagged **`RedHatAI/Qwen3.5-122B-A10B-NVFP4`** as the proven single-Spark *reasoning* step-up (3051 tok/s, fits 128 GB, no patches) — a future daily-driver upgrade, orthogonal to vision.
- **Live: v0.27.1:0 — fix: "Download a new model" button (uvx PATH).** Commit `1e1e1cb`; installed on `immense-voyage` (`start-cli package list` confirms `0.27.1:0`); pushed to gitea master; **published to Clankistry** (`~/.spark-control/publish.sh`). Root cause: `hf-download.sh` shells out to `uvx`, which the uv installer puts in `~/.local/bin`; Spark Control's *non-interactive* SSH session doesn't source the user's profile, so `~/.local/bin` is off PATH and the download died with "uvx: command not found" (same class as the matrix-bridge non-interactive-SSH gotcha). Fix: `download.build_download_command` prepends `export PATH="$HOME/.local/bin:$PATH"` (server-side `$HOME`, generic for any adopter); extracted to a pure helper with regression tests (`test_download.py`: PATH prefix, no-trailing-space, cluster flags, shlex round-trip). 161 pytest green; verified live. Prompted by Grant adding **Gemma-4-26B**: he downloaded `nvidia/Gemma-4-26B-A4B-NVFP4` (recipe `gemma4-26b` already in catalog) via the now-fixed button — **fix confirmed end-to-end** — and is swapping to it. **Pending: business-card OCR / vision test** once it's up.
- **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:** `GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
- **Next on this thread (small, externally gated):** (a) **adopter reply is drafted** (in the session — corrects the vLLM-port misconception → set 8000 in the gear, confirms the port knobs + swap/lock fix, asks the disk-scan diagnostic) — **pending Grant to send** + pick the distribution-channel wording. (b) **Optional Gitea tag + `make release`** so the adopter can pull v0.27 from Gitea Releases (NOT done this session — only registry + sideload shipped); do it only if that adopter pulls from Gitea Releases rather than subscribing to Clankistry. (c) **Un-diagnosed:** adopter's disk-scan shows Gemma "not on disk" — needs them to run `ls ~/.cache/huggingface/hub` as the SSH user vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption (likely a custom `HF_HOME`/container-volume/different-user cache path → would need a configurable cache path).
- **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
- **Gemma-4-26B-A4B vision eval — DONE this session (deferred; see the v0.27.2 + Gemma bullets up top).** The `gemma4-26b` recipe stays in the catalog but is known not to launch on the stock stack; the owner's vision/OCR goal is met by the Qwen3.6 daily driver instead.
- **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
- **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
- **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (157 passing; the in-app settings gear + swap-lock route-order regression + the webhook-repoint live-reload check are in `test_app_settings.py`, incl. `TestClient` end-to-end). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 14s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Hosting / distribution:** source on self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.) The s9pk ships via Gitea Releases (`make release`) **and** a self-hosted StartOS registry — operator-local publish tooling lives outside the repo; owner-specific addresses + the **authenticated-writes-must-be-direct-not-via-the-tunnel** gotcha are in session memory.
- **Next:** (1) audio concurrency sweep — only if the Signal Engine dev wants the measured knee; needs owner OK in a quiet window. (2) Otherwise pull from `ROADMAP.md`: local-path/fine-tuned model support (new) or P2 tech-debt. Parakeet long-audio guard is deferred (rationale in ROADMAP).
- **Design stance (decided):** Spark Control = control plane / GPU arbiter, **not** a job runner; recurring business jobs live in separate services that *call* the swap API (`POST /api/swap`). Full epic history (v0.22→v0.25) is in git log + `ROADMAP.md` → "Cluster coordination".
- **Usage note (2026-06-18):** owner's daily driver is the solo **Qwen3.6 35B**; the 235B `cluster` models are dormant. Keeping `launch-cluster.sh` (the `eugr/spark-vllm-docker` community standard, mirrors NVIDIA's `dgx-spark-playbooks` Ray+RoCE design) is still correct even single-node — it supplies the maintained, hardware-tuned vLLM images; raw docker would mean DIY image upkeep for no gain. Spark 2 stays the speech/embeddings box regardless.
- **Next steps (all low-priority / externally gated; P2/P3 tech-debt backlog in `ROADMAP.md`):** (1) raw-`docker run` swap generalization — **DEFERRED** (rationale in ROADMAP; revisit only if an adopter wants Spark Control to *drive*, not just monitor, raw-docker swaps — cleanest fix is the adopter adopting `launch-cluster.sh`). (2) audio concurrency knee — only if the Signal Engine dev wants it (needs a quiet window). (3) matrix-bridge Docker `HEALTHCHECK` — only if the bot dev asks. (4) Parakeet long-audio guard — deferred (rationale in ROADMAP).
+5 -6
@@ -73,16 +73,15 @@ The first start generates an ed25519 SSH keypair inside the package volume. Wait
### 4. Configure Sparks
- Open Spark Control → **Actions → Configure Sparks**.
- Fill in:
- Fill in just the four required fields:
- **Spark 1 hostname or IP** — prefer the **IP** (e.g. `192.168.1.x`) over `.local` hostnames; vLLM only binds IPv4 and mDNS can resolve to IPv6 first.
- **Spark 1 SSH user** — whatever username you set up on Spark 1.
- **Spark 2 hostname or IP** + **SSH user** — same idea.
- Optional Parakeet/Kokoro overrides — leave blank if those services run on Spark 2 (the normal case).
- Optional **Open WebUI URL** — paste your Open WebUI LAN URL to get a deep-link button in the dashboard next to the current model.
- Optional **NGC API key** — paste it here if you have one.
Save.
Everything else is optional and lives in the dashboard, not this action: open Spark Control and click **⚙ Settings** in the top bar to set vLLM/service **ports** (e.g. if your vLLM runs on 8000 rather than the default 8888, or you moved Parakeet off 8000), container names, support-service hosts, an **Open WebUI URL** (adds a deep-link button), an **NGC API key**, and a swap webhook. Changes there apply immediately and are included in StartOS backups.
### 5. Re-run Show Public Key (if you skipped earlier)
Now that hosts are configured, Show Public Key will give you the paste-ready install command. Run it as described in step 3.
@@ -92,7 +91,7 @@ Now that hosts are configured, Show Public Key will give you the paste-ready ins
From the Spark Control service page, click the Web UI button. You should see:
- A **top status bar** with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
- An **LLM tab** with cards for each model in the bundled catalog. Models you've downloaded show "on disk" badges; others show "not downloaded".
- An **LLM tab** whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use **+ Download a new model** to fetch one — it appears here when it finishes.
- An **Audio / Speech tab** with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.
If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, **you're in**.
@@ -159,7 +158,7 @@ All of these inherit Spark Control's TLS cert and StartOS access controls. You o
A few things worth knowing:
- The codebase is **two halves**: `image/` is a standalone FastAPI app you can run with `uvicorn app.server:app` for local dev. `package/` is the StartOS wrapper. Changes to either should be coordinated.
- **All connection info** comes from environment variables in `image/app/config.py`, populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action. No IPs, usernames, or paths are hardcoded in runtime code.
- **All connection info** comes from environment variables in `image/app/config.py`. The four required fields are populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action; the optional knobs are overlaid from the in-app `⚙ Settings` store (`/data/app_settings.json`, see `image/app/app_settings.py`). No IPs, usernames, or paths are hardcoded in runtime code.
- The **path `~/spark-vllm-docker`** *is* hardcoded in `swap.py`, `download.py`, `updates.py`, and `models.py`. If the user has cloned the upstream repo elsewhere, either fix the path or symlink it.
- **Persistent state** lives at `/data/` inside the container: `config.yaml`, `models-overrides.yaml`, `services-overrides.yaml`, `connectivity.json`, `ssh/`. These survive package updates.
- The dashboard polls every 5 s; check `image/app/health.py` and `image/app/connectivity.py` for the probing logic. External apps can also POST failures to `/api/health-event` to log between-poll blips.
+3 -3
@@ -112,14 +112,14 @@ Fields: `service` (required), `ok` (required), `source` (optional, free-form), `
## Status
**v0.2.3 / s9pk version 0.13.0:4** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
**s9pk version 0.26.0:0** — installed and verified on a Start9 server. The LLM menu is whatever's downloaded on the Sparks (scanned live, not hard-coded); bundled *launch recipes* (qwen3-vl, gemma4, gemma4-26b, qwen36) tell it how to launch known models, and anything else gets a "needs setup" card that infers + saves its settings on first use.
### What v0.2 added on top of v0.1
- **Service discovery API** (`/api/endpoints`) for other LAN services
- **Kokoro-82M TTS** replaces Magpie/Riva NIM as the default TTS backend (v0.14.0). Magpie's decoder had a ~30-50% truncation rate on multi-sentence inputs and ate 49 GB of GPU memory; Kokoro is 24/24 reliable at every input length tested, uses 1.3 GB GPU, and renders in ~1s. See HANDOFF.md and the release notes for the migration story.
- - **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host configuration in Configure Sparks (so they can live on Spark 1, Spark 2, or anywhere)
+ - **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host/port/container configuration in the in-app **⚙ Settings** gear (so they can live on Spark 1, Spark 2, or anywhere, on any port)
- - **Model download** from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
+ - **Model download** from the dashboard — paste an HF repo (with autocomplete for known models), pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion the model appears on the menu automatically; if it's unrecognized, a pre-filled "set up this model" dialog offers to configure it.
- **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
- **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
- **Diarization with speaker fingerprints** via Sortformer + TitaNet, exposed at `/api/audio/diarize-chunk` for chunked workflows
+28 -1
@@ -2,6 +2,34 @@
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)
Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU).
**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap` → `GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability.
Sequenced:
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
4. **Coordination layer** — DONE in tree, staged as **v0.25.0:0** (built/typechecked clean; install pending). All three primitives shipped; `image/app/coordination.py` + `docs/COORDINATION.md`. Brought forward 2026-06-17 on request rather than waiting for our own automation.
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). Acquire returns a secret token; the swap endpoint refuses any real swap (`423`) that doesn't present it in `X-Swap-Lock-Token`, so the dashboard's manual swap is paused while a scheduler holds it (with a `?force=true` human override). In-memory + TTL-bounded → resets to unlocked on restart; re-acquire with the token extends. Enforced in `post_swap`, not advisory.
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL (Configure-Sparks field), fired from `SwapManager._run` *outside* the swap lock; optional shared secret ⇒ `X-Spark-Signature` HMAC. Fire-and-forget (5 s, no retries); dry runs don't fire.
- **Schedule visibility** — `GET/POST/DELETE /api/schedule`; read-only "Scheduled jobs" dashboard panel, registered by external schedulers. Spark Control stores and displays, never executes.
- Tests: `image/tests/test_coordination.py` (22 cases — lock lifecycle/expiry/token, the single-read swap gate, schedule CRUD + id validation, webhook payload+signature). Known limit: lock + schedules are in-memory (a restart frees the lock and empties the registry until schedulers re-register) — persist to `/data` only if that bites.
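The lock semantics described above (holder + TTL, secret token, re-acquire extends, expiry frees) can be sketched in a few lines. This is an illustrative model only, not the code in `image/app/coordination.py`: the class and method names are hypothetical, and treating "same holder but no token" as a conflict is an assumption the notes above do not settle.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Optional


class LockConflict(Exception):
    """Another holder (or a caller without the token) hit an unexpired lock."""


@dataclass
class _Held:
    holder: str
    token: str
    expires_at: float


def _same(a: str, b: str) -> bool:
    # Constant-time comparison; bytes avoid TypeError on non-ASCII input.
    return hmac.compare_digest(a.encode(), b.encode())


class SwapLock:
    """Hypothetical in-memory, TTL-bounded reservation (resets on restart)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._held: Optional[_Held] = None

    def _current(self) -> Optional[_Held]:
        # An expired lock counts as absent, so a crashed holder frees itself.
        if self._held and self._held.expires_at <= self._clock():
            self._held = None
        return self._held

    def acquire(self, holder: str, ttl_seconds: int = 900,
                token: Optional[str] = None) -> str:
        ttl = max(1, min(int(ttl_seconds), 86400))  # clamp to [1, 86400]
        cur = self._current()
        if cur is None:
            self._held = _Held(holder, secrets.token_hex(16), self._clock() + ttl)
            return self._held.token
        if cur.holder != holder or token is None or not _same(cur.token, token):
            raise LockConflict(cur.holder)  # would map to HTTP 409
        cur.expires_at = self._clock() + ttl  # extend: token kept, window slides
        return cur.token

    def allows_swap(self, token: Optional[str]) -> bool:
        # False would map to HTTP 423 on a real swap.
        cur = self._current()
        return cur is None or (token is not None and _same(cur.token, token))

    def release(self, token: Optional[str], force: bool = False) -> bool:
        cur = self._current()
        if cur is None or force or (token is not None and _same(cur.token, token)):
            self._held = None
            return True
        return False  # would map to HTTP 403
```

Injecting the clock keeps expiry testable without sleeping; a persistent variant would only need to serialize `_Held` to `/data`.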
### Generalizing the swap mechanism to raw `docker run` — DEFERRED (decided 2026-06-18, research-backed; was item 4's last open thread)
Our swap drives `~/spark-vllm-docker/launch-cluster.sh` over SSH on Spark 1 (`./launch-cluster.sh stop`, then `[VLLM_SPARK_EXTRA_DOCKER_ARGS=…] ./launch-cluster.sh [--solo ]-d exec vllm serve <model> <args>`, then `docker logs -f` until the ready marker). The OpenClaw adopter launches vLLM with a plain `docker run` instead, so the swap button can't drive his cluster — only monitor it. The portability fix would be a configurable "swap backend": keep `launch-cluster.sh` as the default and add a "bring your own command" mode (operator-authored stop/launch templates in `services-overrides.yaml` with quoted `{model}`/`{container}`/`{port}`/`{extra_args}` substitution; ready-detection unchanged; the vLLM-argparse pre-flight disabled for that backend).
**Why deferred, not built:**
- **Raw docker is not an upgrade for *us* — for half our catalog it's impossible.** `launch-cluster.sh` is the `eugr/spark-vllm-docker` community project (de-facto DGX Spark standard; mirrors NVIDIA's own `dgx-spark-playbooks` Ray+RDMA architecture). Its headline job is **multi-node** serving: our 235B `cluster` models (Qwen3-VL 235B, Qwen3 235B) exceed one Spark's 128 GB and *must* shard across both Sparks via Ray over the 200 Gbps ConnectX/RoCE link — plumbing (NCCL/MTU/per-node env) that a single-node `docker run` cannot do. So we keep the helper script; switching our own cluster to raw docker is off the table.
- **The feature is therefore portability-only** (for differently-wired adopters), and the one known adopter doesn't need it — he swaps via his own crons and uses Spark Control to watch.
- **Untestable on our hardware** — our cluster uses the helper script, so we can't validate a real raw-docker swap without risking the live vLLM.
- The one real standing risk is eugr's single-maintainer status; fallback is community forks or migrating to NVIDIA's official `dgx-spark-playbooks` launcher (same design). No reason to switch now.
**Revisit only if** an adopter explicitly wants Spark Control to *drive* (not just monitor) swaps on a raw-`docker run` cluster. At that point, get their actual working `docker run` command and build the command-template backend to it.
## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
@@ -19,7 +47,6 @@ Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE
- Second audio worker / queueing layer; revisit which services share Spark 2.
## Dashboard
- Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (`models.yaml` + custom overrides) and the "Add custom model" path (`POST /api/models`) only accepts an HF `org/name` repo (`shellsafe._HF_REPO_RE`), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the `org/name` regex (path whitelist + `shlex.quote`, no traversal); (b) `models.build_launch_command` / `launch-cluster.sh` able to `vllm serve <path>`; (c) `disk.py` size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15 — a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.
+157
@@ -0,0 +1,157 @@
# Cluster coordination through Spark Control (v0.25.0)
Spark Control is the **GPU arbiter, not a job runner.** Your recurring pipelines
(model-warming crons, "daily X" generators, batch jobs) live in your own
services and *drive Spark Control's swap API*. This page documents the safety
layer around that: a **swap reservation lock**, a **swap-event webhook**, and a
**read-only schedule registry**.
If only the dashboard ever swaps models, you don't need any of this — it's for
when something automated also swaps.
All endpoints are on the Spark Control host (same LAN/VPN URL as the LLM, audio,
and embeddings proxies). There is no API-token auth by design (LAN + split-tunnel
VPN only); a non-browser client passes the same-origin guard automatically.
---
## 1. Swap reservation lock
A short, TTL-bounded reservation of the swap path. While a lock is held, **any
real swap that doesn't present the holder's token is refused with `423 Locked`**
— including the dashboard's manual swap. The holder *name* is descriptive; the
returned **token** is the secret that authorises swaps and the release.
The lock is in-memory: it resets to *unlocked* if Spark Control restarts (the
safe-for-availability default), and the swap engine's own in-progress guard
still prevents two swaps running at once.
### `POST /api/swap/lock` — acquire (or extend)
```json
// request
{ "holder": "openclaw-daily-vol", "ttl_seconds": 900, "note": "daily vol run" }
// 200 response
{
"held": true,
"holder": "openclaw-daily-vol",
"acquired_at": "2026-06-17T12:00:00+00:00",
"expires_at": "2026-06-17T12:15:00+00:00",
"seconds_remaining": 900,
"note": "daily vol run",
"token": "a1b2c3…" // SECRET — store it; needed to swap and to release
}
```
- `ttl_seconds` is optional (default 900) and clamped to `[1, 86400]`.
- **`409`** if a *different* holder already holds it (body includes the current
`lock` state). To **extend** your own lock, POST again with the same `holder`
**and** your `token` — the token is preserved and the window slides forward.
### `GET /api/swap/lock` — status (no token)
```json
{ "held": true, "holder": "openclaw-daily-vol", "expires_at": "…", "seconds_remaining": 612, "note": "…" }
// or
{ "held": false }
```
### `DELETE /api/swap/lock` — release
Send your token in the `X-Swap-Lock-Token` header (or `?token=`):
```
DELETE /api/swap/lock
X-Swap-Lock-Token: a1b2c3…
```
- **`403`** if the token doesn't match. The dashboard's human override is
`DELETE /api/swap/lock?force=true` (no token).
### Swapping while you hold the lock
Pass the token on the swap call; the dashboard (no token) is then blocked:
```
POST /api/swap
X-Swap-Lock-Token: a1b2c3…
{ "model_key": "gemma-3-27b" }
```
Recommended scheduler flow: **acquire → swap (with token) → poll `/api/swap/{id}`
→ release**. Always release in a `finally`; if you crash, the TTL frees it.
> `POST /api/swap/{key}/validate` (pre-flight) and dry-run swaps are **not**
> blocked by the lock — they don't touch the cluster.
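The recommended flow (acquire, swap with the token, poll, release in a `finally`) might look like the sketch below. It assumes two response shapes this page does not spell out: that `POST /api/swap` returns a JSON body with a `job_id`, and that `GET /api/swap/{id}` reports a `state` of `ready` or `failed` (both borrowed from the webhook payload). `http_transport`, `run_locked_swap`, and the injectable `send` callable are hypothetical names for illustration.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# send(method, path, json_body_or_None, extra_headers) -> decoded JSON dict
Transport = Callable[[str, str, Optional[dict], dict], dict]


def http_transport(base_url: str) -> Transport:
    """Real transport over urllib; non-2xx responses raise urllib.error.HTTPError."""
    def send(method: str, path: str, body: Optional[dict], headers: dict) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json", **headers},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read() or b"{}")
    return send


def run_locked_swap(send: Transport, model_key: str, holder: str,
                    ttl_seconds: int = 900, poll_seconds: float = 5.0,
                    max_polls: int = 360,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    lock = send("POST", "/api/swap/lock",
                {"holder": holder, "ttl_seconds": ttl_seconds}, {})
    auth = {"X-Swap-Lock-Token": lock["token"]}
    try:
        job = send("POST", "/api/swap", {"model_key": model_key}, auth)
        job_id = job["job_id"]  # ASSUMPTION: field name not documented here
        for _ in range(max_polls):
            status = send("GET", f"/api/swap/{job_id}", None, {})
            if status.get("state") in ("ready", "failed"):  # ASSUMPTION
                return status["state"]
            sleep(poll_seconds)
        raise TimeoutError(f"swap {job_id} did not finish")
    finally:
        # Always release; if this process dies first, the TTL frees the lock.
        send("DELETE", "/api/swap/lock", None, auth)
```

A scheduler would call `run_locked_swap(http_transport("http://<spark-control-host>"), "gemma-3-27b", "openclaw-daily-vol")`. Passing the transport in keeps the flow testable without a live cluster.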
---
## 2. Swap-event webhook
Configure a URL in **Configure Sparks → "Swap webhook URL"**. After every real
swap, Spark Control POSTs:
```json
{
"event": "swap_complete", // or "swap_failed"
"job_id": "1a2b3c4d",
"model_key": "gemma-3-27b",
"state": "ready", // or "failed"
"returncode": 0,
"started_at": "2026-06-17T12:00:00+00:00",
"finished_at": "2026-06-17T12:03:11+00:00",
"dry_run": false
}
```
Headers: `X-Spark-Event: swap_complete`. If you set a **webhook secret**, the
body is signed: `X-Spark-Signature: sha256=<hmac>` (HMAC-SHA256 of the raw body
with the shared secret). Verify it like:
```python
import hmac, hashlib
expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
assert hmac.compare_digest(expected, request.headers["X-Spark-Signature"])
```
Delivery is best-effort and fire-and-forget (5 s timeout, no retries) — a
webhook failure never affects the swap itself. Dry runs don't fire.
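On the receiving side, a slightly fuller version of the check above could reject a missing header and avoid `assert` (which is stripped under `python -O`). This is a sketch under the signature format stated in this section; `sign` and `is_authentic` are hypothetical helper names, and `headers` is a plain mapping here, whereas a real web framework usually hands you a case-insensitive one.

```python
import hashlib
import hmac
from typing import Mapping


def sign(secret: str, raw_body: bytes) -> str:
    """Header value a sender would produce: 'sha256=' + hex HMAC-SHA256 of the raw body."""
    return "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()


def is_authentic(secret: str, raw_body: bytes, headers: Mapping[str, str]) -> bool:
    """True only if X-Spark-Signature is present and matches the raw body."""
    received = headers.get("X-Spark-Signature")
    if not received:
        return False
    # Compare as bytes in constant time; str inputs must be ASCII-only otherwise.
    return hmac.compare_digest(sign(secret, raw_body).encode(), received.encode())
```

Verify against the raw bytes as received, before any JSON parsing or re-serialization, since whitespace or key-order changes would alter the digest.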
---
## 3. Schedule registry (read-only display)
So the dashboard can show *what's scheduled to touch the GPU and when*, your
schedulers register their jobs here. **Spark Control only displays these — it
never executes them.**
### `POST /api/schedule` — register / update
```json
// request (pass a stable `id` to update in place on re-register)
{ "id": "daily-vol", "name": "Daily Vol", "owner": "openclaw",
"cron": "0 6 * * *", "next_run": "2026-06-18T06:00:00Z",
"description": "Swaps to the big model, generates the vol report" }
// response: the stored entry (generates an id if you omit one)
```
`name` is required; `id` (if given) must match `[A-Za-z0-9_.-]` (≤64 chars).
### `GET /api/schedule` — list
```json
{ "schedules": [ { "id": "daily-vol", "name": "Daily Vol", "owner": "openclaw",
"cron": "0 6 * * *", "next_run": "…", "description": "…",
"registered_at": "…", "updated_at": "…" } ] }
```
### `DELETE /api/schedule/{id}` — deregister
```json
{ "deleted": true }
```
The registry is in-memory — re-register your schedules on your own startup so
they survive a Spark Control restart.
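Because the registry forgets everything on restart, a scheduler will typically rebuild and re-POST its entries at startup. A minimal payload builder, assuming only the rules stated above (`name` required, optional `id` limited to `[A-Za-z0-9_.-]` and at most 64 characters), might be the following; `schedule_payload` is a hypothetical client-side helper, and the server remains the authority on validation.

```python
import re
from typing import Optional

# Client-side mirror of the documented id rule; the server check is authoritative.
_ID_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def schedule_payload(name: str, *, job_id: Optional[str] = None,
                     owner: Optional[str] = None, cron: Optional[str] = None,
                     next_run: Optional[str] = None,
                     description: Optional[str] = None) -> dict:
    """Build the JSON body for POST /api/schedule, omitting unset fields."""
    if not name:
        raise ValueError("name is required")
    if job_id is not None and not _ID_RE.fullmatch(job_id):
        raise ValueError("id must match [A-Za-z0-9_.-] and be at most 64 chars")
    fields = {"id": job_id, "name": name, "owner": owner, "cron": cron,
              "next_run": next_run, "description": description}
    return {k: v for k, v in fields.items() if v is not None}
```

Passing a stable `job_id` on every startup makes re-registration an in-place update instead of a new entry.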
+4 -1
@@ -35,10 +35,13 @@ Two kinds, both run with the `image/.venv` interpreter (system python3 has no de
- New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
- **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
- **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).
- **Settings split (gear vs StartOS action):** only the four *required* fields (both Spark IPs + SSH users) live in the StartOS "Configure Sparks" action → `config.yaml` → env. Every *optional* knob (ports, container names, support-service hosts, integrations, webhook) is edited in the dashboard's ⚙ Settings gear, backed by the `/data/app_settings.json` overlay (`app_settings.py`), keyed by the same env-var names. Precedence (`config._effective_env`): `os.environ` first, overlay on top. `app_settings.seed_from_env` runs **once at startup** to migrate a pre-gear install's env values into the overlay (don't move seeding into `from_env`/`reload` — it writes, and `from_env` runs on every build → it would clobber across calls, which it did once already). **`Settings` is deliberately not frozen:** one shared instance is threaded by reference into every router closure/manager, and `Settings.reload()` (called after a gear save) recomputes its fields **in place** so changes apply live with no restart and no call-site changes. **Gotcha:** this only reaches holders that keep the *object* (`self.settings = settings`); anything that snapshots a *value* at construction is invisible to `reload()` and must be re-synced explicitly. The one such holder is `WebhookNotifier`, which copies `url`/`secret`; `post_settings` calls `swap_webhook.update(...)` right after `reload()`. Any future component that caches a gear-managed value (rather than reading `settings.x` at use time) needs the same treatment. A new gear knob = add one entry to `app_settings.FIELDS` (the front-end renders it generically); the matching `config.Settings` field must already read that env var.
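The precedence rule and the snapshot-holder gotcha in the last rule above can be shown with a toy model. Everything here is a simplified stand-in, not the real `config.py` or `app_settings.py`: `effective_env`, the two-field `Settings`, and both holder classes are hypothetical, and the overlay is passed as a loader callable purely for illustration.

```python
from typing import Callable, Mapping


def effective_env(environ: Mapping[str, str], overlay: Mapping[str, str]) -> dict:
    """Process env first, gear overlay on top (overlay wins on conflicts)."""
    merged = dict(environ)
    merged.update(overlay)
    return merged


class Settings:
    """Deliberately mutable: reload() rewrites fields on the same object."""

    def __init__(self, environ: Mapping[str, str],
                 load_overlay: Callable[[], Mapping[str, str]]) -> None:
        self._environ = environ
        self._load_overlay = load_overlay
        self.reload()

    def reload(self) -> None:
        env = effective_env(self._environ, self._load_overlay())
        self.vllm_port = int(env.get("VLLM_PORT") or 8888)  # blank falls back
        self.swap_webhook_url = env.get("SWAP_WEBHOOK_URL", "")


class ObjectHolder:
    """Keeps the Settings object, so it sees reload() automatically."""

    def __init__(self, settings: Settings) -> None:
        self.settings = settings

    def port(self) -> int:
        return self.settings.vllm_port


class SnapshotHolder:
    """Copies a value at construction, so reload() cannot reach it."""

    def __init__(self, settings: Settings) -> None:
        self.url = settings.swap_webhook_url

    def update(self, url: str) -> None:
        self.url = url  # must be called explicitly after reload()
```

After a gear save, the save handler would call `settings.reload()` and then `snapshot_holder.update(settings.swap_webhook_url)`; skipping the second call leaves the snapshot holder on the old value.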
## Layout
- `image/app/server.py` — FastAPI entry; routers live in sibling modules (`audio_proxy.py`, `llm_proxy.py`, `embeddings_proxy.py`, `redaction_gateway.py`, `swap.py`, `health.py`, `deep_health.py`, `connectivity.py`, …).
- `image/app/discovery.py` — the disk-driven model menu. `/api/models` lists what's actually downloaded on the Sparks (via `disk.list_cached_models`); `models.yaml`/overrides are *launch recipes* matched by repo, not the menu. An on-disk model with no recipe is `needs_setup`; `infer_recipe` reads its `config.json` to prefill a setup form the operator confirms once.
- `image/app/app_settings.py` — the in-app settings overlay backing the ⚙ gear: `FIELDS` metadata (drives `/api/settings` + the UI form), `load_overlay()` (pure read), `seed_from_env()` (one-time migration), `apply()` (validate + persist). `GET/POST /api/settings` in `server.py` read/write it, then `settings.reload()`.
- `image/app/static/` — the dashboard UI.
- - `image/models.yaml` — vLLM model catalog bundled into the image.
+ - `image/models.yaml` — bundled vLLM **launch recipes** (how to launch a known model), NOT the dashboard menu — the menu is the on-disk scan.
- `image/spark_embed/` — Dockerfile + app for the embeddings container; built ON a Spark (ARM64, NGC PyTorch base — see the audio/cluster rule for NGC torch-pinning caveats).
+16
@@ -25,6 +25,22 @@ npm run prettier # prettier --write startos (no semicolons, single quotes, tra
- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
## Releasing to Gitea
The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:
```bash
# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
cd package && make x86 # build
# 2. commit + push the source change
git tag vX.Y.Z && git push gitea vX.Y.Z # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
make install # optional: sideload to your own server (restarts it — go/no-go)
# 3. publish the s9pk as a release asset (needs a write-scoped token):
GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
```
`make release` runs `scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note Gitea returns `browser_download_url` on its configured ROOT_URL (may be a `.local` name) — an off-LAN adopter pulls via whatever address actually reaches the Gitea.
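On the adopter side, the ROOT_URL caveat means the agent may need to rewrite the asset URL onto an address it can actually reach. A sketch of the two pure steps (pick the `.s9pk` asset, then rebase its URL) follows. It assumes the release JSON carries an `assets` list whose entries have `name` and `browser_download_url`; the helper names are hypothetical, and a Gitea served under a sub-path would need the path adjusted as well.

```python
from urllib.parse import urlsplit, urlunsplit


def pick_s9pk_url(release: dict) -> str:
    """Return the download URL of the first .s9pk asset in a release JSON object."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            return asset["browser_download_url"]
    raise LookupError("no .s9pk asset in release")


def rebase_url(url: str, reachable_base: str) -> str:
    """Swap scheme and host[:port] for ones the adopter can reach; keep the path."""
    base = urlsplit(reachable_base)
    parts = urlsplit(url)
    return urlunsplit((base.scheme, base.netloc, parts.path, parts.query, parts.fragment))
```

The download request itself would carry the read-only token (for Gitea, an `Authorization: token <read-token>` header), which is left out here to keep the sketch network-free.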
## Layout
- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
+286
@@ -0,0 +1,286 @@
"""App-owned settings overlay: the in-dashboard 'gear' knobs.
Spark Control's *required* wiring — the two Spark IPs and SSH users — is set once
via the StartOS "Configure Sparks" action and arrives as env vars. Everything
else (ports, container names, support-service hosts, integrations, webhook) is
optional and lives here: a small JSON overlay on /data that the dashboard gear
reads and writes, so an operator never has to open StartOS actions to tune the
cluster. This follows the StartOS 0.4 convention (minimal setup action; routine
config in the app's own UI) and stays inside the package's backup volume, so the
file is backed up and restored for free.
Each overlay entry is keyed by the *same env var name* config.Settings already
reads, so the overlay is simply an env-var override store. Precedence (see
config._effective_env): process env first, this overlay on top — so a knob set
in the gear wins, while an un-touched knob falls through to whatever the StartOS
action injected, then to the code default.
First-run migration: when the overlay file doesn't exist yet (e.g. an existing
install upgrading into this version), it's seeded from the current env so any
value previously set via the StartOS action carries over into the gear with no
operator action and nothing lost.
"""
from __future__ import annotations
import json
import logging
import os
import re
import tempfile
from pathlib import Path
from typing import Mapping
log = logging.getLogger(__name__)
# Field metadata drives BOTH the /api/settings response (the front-end renders
# the form generically from this) and light server-side validation. `key` is the
# env var name; `type` is one of text|int|csv|secret. `secret` values are
# write-only — never echoed back to the browser.
FIELDS: list[dict] = [
# --- vLLM (Spark 1) ---
{"group": "vLLM (Spark 1)", "key": "VLLM_PORT", "label": "vLLM port", "type": "int",
"placeholder": "8888",
"help": "Port your vLLM listens on. Blank ⇒ 8888 (the bundled launch-cluster.sh). Set 8000 for vanilla vLLM, or wherever yours listens."},
{"group": "vLLM (Spark 1)", "key": "VLLM_CONTAINER", "label": "vLLM container name", "type": "text",
"placeholder": "vllm_node",
"help": "Docker container the swappable vLLM runs in. Blank ⇒ vllm_node. The swap log-tail and pre-flight validator exec into it by name."},
# --- Monitoring ---
{"group": "Monitoring", "key": "DISABLED_SERVICES", "label": "Services to hide", "type": "csv",
"placeholder": "e.g. parakeet,kokoro",
"help": "Comma-separated built-in services your cluster doesn't run, so their tiles are hidden and never probed. Valid: parakeet, kokoro, embeddings, qdrant. Blank ⇒ monitor all."},
# --- Parakeet (STT) ---
{"group": "Parakeet (STT)", "key": "PARAKEET_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the Parakeet STT container. Blank ⇒ Spark 2."},
{"group": "Parakeet (STT)", "key": "PARAKEET_PORT", "label": "Port", "type": "int",
"placeholder": "8000",
"help": "Port Parakeet listens on. Blank ⇒ 8000. Set this if you remapped it (e.g. because your vLLM holds 8000)."},
{"group": "Parakeet (STT)", "key": "PARAKEET_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "parakeet-asr",
"help": "Docker container name for Parakeet. Blank ⇒ parakeet-asr."},
{"group": "Parakeet (STT)", "key": "PARAKEET_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the Parakeet container. Blank ⇒ your Spark 2 user."},
# --- Kokoro (TTS) ---
{"group": "Kokoro (TTS)", "key": "KOKORO_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the Kokoro TTS container. Blank ⇒ Spark 2."},
{"group": "Kokoro (TTS)", "key": "KOKORO_PORT", "label": "Port", "type": "int",
"placeholder": "8880",
"help": "Port Kokoro listens on. Blank ⇒ 8880."},
{"group": "Kokoro (TTS)", "key": "KOKORO_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "kokoro-tts",
"help": "Docker container name for Kokoro. Blank ⇒ kokoro-tts."},
{"group": "Kokoro (TTS)", "key": "KOKORO_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the Kokoro container. Blank ⇒ your Spark 2 user."},
# --- Embeddings ---
{"group": "Embeddings", "key": "EMBED_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the spark-embed container (bge-m3 + reranker). Blank ⇒ Spark 2."},
{"group": "Embeddings", "key": "EMBED_PORT", "label": "Port", "type": "int",
"placeholder": "8088",
"help": "Port the embedding server listens on. Blank ⇒ 8088."},
{"group": "Embeddings", "key": "EMBED_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "spark-embed",
"help": "Docker container name for the embedding server. Blank ⇒ spark-embed."},
{"group": "Embeddings", "key": "EMBED_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the embedding container. Blank ⇒ your Spark 2 user."},
# --- Qdrant ---
{"group": "Qdrant", "key": "QDRANT_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the Qdrant vector database. Blank ⇒ Spark 2."},
{"group": "Qdrant", "key": "QDRANT_PORT", "label": "Port", "type": "int",
"placeholder": "6333",
"help": "Port Qdrant's REST API listens on. Blank ⇒ 6333."},
{"group": "Qdrant", "key": "QDRANT_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "qdrant",
"help": "Docker container name for Qdrant. Blank ⇒ qdrant."},
{"group": "Qdrant", "key": "QDRANT_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the Qdrant container. Blank ⇒ your Spark 2 user."},
{"group": "Qdrant", "key": "QDRANT_COLLECTION", "label": "Default collection", "type": "text",
"placeholder": "e.g. crm_chunks",
"help": "Collection used by /api/search when a request doesn't name one. Blank ⇒ callers must pass a collection."},
# --- Integrations ---
{"group": "Integrations", "key": "OPEN_WEBUI_URL", "label": "Open WebUI URL", "type": "text",
"placeholder": "e.g. https://open-webui.yourserver.local",
"help": "If set, the header shows a one-click 'Open chat' button to your Open WebUI."},
{"group": "Integrations", "key": "MATRIX_BRIDGE_USER", "label": "matrix-bridge bot SSH user", "type": "text",
"placeholder": "e.g. modelo",
"help": "SSH user owning the bot's ~/matrix-bridge clone (Spark 2). Set this to show the bot tile (update/restart/logs). Blank ⇒ tile hidden."},
{"group": "Integrations", "key": "NGC_API_KEY", "label": "NGC API key", "type": "secret",
"placeholder": "starts with nvapi-…",
"help": "NVIDIA NGC personal key, needed only to install NIM containers from nvcr.io. Stored on this server."},
{"group": "Integrations", "key": "SWAP_WEBHOOK_URL", "label": "Swap webhook URL", "type": "text",
"placeholder": "e.g. https://my-service.local/spark-swap",
"help": "POSTed a small JSON event (swap_complete / swap_failed) after every model swap, so automation can re-point to the new model. Blank ⇒ disabled."},
{"group": "Integrations", "key": "SWAP_WEBHOOK_SECRET", "label": "Swap webhook secret", "type": "secret",
"placeholder": "a random shared string",
"help": "If set, each webhook is HMAC-signed (X-Spark-Signature) so the receiver can verify it. Blank ⇒ unsigned."},
]
_BY_KEY = {f["key"]: f for f in FIELDS}
_SECRET_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "secret")
_INT_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "int")
# Reject control characters (incl. newlines) — these values flow into env vars,
# URLs, and SSH command lines (quoted at the sink, but defence in depth).
_BAD_CHARS = re.compile(r"[\x00-\x1f\x7f]")
# A secret's value is never echoed back, so a blank submit means "keep the stored
# one" (you can't see it to retype it). To actually *remove* a stored secret the
# UI sends this sentinel instead of a real value. Surfaced to the front-end via
# public_view so the two stay in sync.
CLEAR_SENTINEL = "__clear__"
def _path() -> Path:
return Path(os.environ.get("APP_SETTINGS_FILE", "/data/app_settings.json"))
def field_keys() -> frozenset[str]:
return frozenset(_BY_KEY)
def load_overlay() -> dict[str, str]:
"""Return the overlay as {ENV_KEY: value}, filtered to known, non-empty keys.
Pure read (no side effects) — called on every Settings (re)build, so it must
not write. Missing/corrupt file ⇒ {}. The file is tiny."""
p = _path()
if not p.exists():
return {}
try:
raw = json.loads(p.read_text())
except (ValueError, OSError) as e:
log.warning("ignoring unreadable %s: %s", p, e)
return {}
if not isinstance(raw, dict):
return {}
return {k: str(v) for k, v in raw.items() if k in _BY_KEY and v not in (None, "")}
def seed_from_env(env: Mapping[str, str]) -> None:
"""One-time migration, called once at startup: if no overlay exists yet, seed
it from the current env so any optional value previously set via the StartOS
action carries into the gear automatically (nothing lost on upgrade). No-op
if the file already exists or the env carries no known non-empty knob — a
fresh install then starts with no overlay and pure defaults. Values run
through the same validation as apply(); a malformed one (e.g. a paste-error
port) is skipped rather than written, matching the gear's own guards."""
if _path().exists():
return
seeded: dict[str, str] = {}
for k in _BY_KEY:
v = env.get(k)
if not v:
continue
try:
cleaned = _validate(k, v)
except SettingsError as e:
log.warning("skipping invalid env value while seeding overlay: %s", e)
continue
if cleaned and cleaned != CLEAR_SENTINEL:
seeded[k] = cleaned
if seeded:
_write(seeded)
log.info("seeded settings overlay from env (%d keys): %s", len(seeded), _path())
def _write(overlay: dict[str, str]) -> None:
p = _path()
p.parent.mkdir(parents=True, exist_ok=True)
# Atomic replace so a crash mid-write never leaves a truncated overlay.
fd, tmp = tempfile.mkstemp(dir=str(p.parent), prefix=".app_settings.", suffix=".tmp")
try:
with os.fdopen(fd, "w") as fh:
json.dump(overlay, fh, indent=2, sort_keys=True)
os.replace(tmp, p)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
def public_view() -> dict:
"""Shape the gear form for the browser: ordered groups of fields with their
current overlay value. Secret values are never sent — only a `set` flag."""
overlay = load_overlay()
groups: list[dict] = []
index: dict[str, dict] = {}
for f in FIELDS:
g = index.get(f["group"])
if g is None:
g = {"name": f["group"], "fields": []}
index[f["group"]] = g
groups.append(g)
entry = {
"key": f["key"],
"label": f["label"],
"type": f["type"],
"placeholder": f.get("placeholder", ""),
"help": f.get("help", ""),
}
if f["type"] == "secret":
entry["set"] = bool(overlay.get(f["key"]))
else:
entry["value"] = overlay.get(f["key"], "")
g["fields"].append(entry)
return {"groups": groups, "clear_sentinel": CLEAR_SENTINEL}
class SettingsError(ValueError):
"""Bad input to apply() — surfaced as 422 by the endpoint."""
def _validate(key: str, value) -> str:
"""Clean + validate one value; raise SettingsError on bad input. Returns the
stripped string ('' is valid and means 'unset'). The CLEAR_SENTINEL passes
through for the caller to interpret (secret removal)."""
if key not in _BY_KEY:
raise SettingsError(f"unknown setting: {key}")
val = ("" if value is None else str(value)).strip()
if val == CLEAR_SENTINEL:
return val
if _BAD_CHARS.search(val):
raise SettingsError(f"{key}: control characters are not allowed")
if key in _INT_KEYS and val:
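# isascii() rejects Unicode digits such as "²", which pass isdigit() but make
# int() raise a bare ValueError instead of a SettingsError.
if not (val.isascii() and val.isdigit()) or not (1 <= int(val) <= 65535):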
raise SettingsError(f"{key}: must be a port number between 1 and 65535")
return val
def apply(updates: Mapping[str, str]) -> dict[str, str]:
"""Validate `updates` and merge them into the overlay, then persist.
Rules per key:
- unknown key / bad int / control chars → reject (422, via _validate)
- secret + CLEAR_SENTINEL → delete the stored secret
- secret + blank value → leave the stored secret unchanged (don't wipe)
- non-secret + blank → delete the key (revert to env/default)
- otherwise → set the key
Returns the new overlay. The caller reloads Settings so the change goes live.
"""
overlay = load_overlay()
for key, value in updates.items():
val = _validate(key, value)
if key in _SECRET_KEYS:
if val == CLEAR_SENTINEL:
overlay.pop(key, None)
elif val:
overlay[key] = val
# blank secret ⇒ leave the existing value in place
elif val and val != CLEAR_SENTINEL:
overlay[key] = val
else:
overlay.pop(key, None)
_write(overlay)
return overlay
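The per-key rules listed in the `apply()` docstring can be exercised in isolation. The following is a minimal sketch, assuming a two-key schema and an in-memory overlay instead of the real `/data/app_settings.json`; `merge` and `SECRET_KEYS` are illustrative names that do not exist in the module, and validation is left out.

```python
from typing import Mapping

CLEAR_SENTINEL = "__clear__"
# Hypothetical schema: one secret knob; every other key is a plain knob.
SECRET_KEYS = frozenset({"NGC_API_KEY"})


def merge(overlay: Mapping[str, str], updates: Mapping[str, str]) -> dict[str, str]:
    """Apply the documented per-key rules to a copy of `overlay`."""
    out = dict(overlay)
    for key, raw in updates.items():
        val = ("" if raw is None else str(raw)).strip()
        if key in SECRET_KEYS:
            if val == CLEAR_SENTINEL:
                out.pop(key, None)   # explicit removal of a stored secret
            elif val:
                out[key] = val       # new secret value
            # blank secret: keep whatever is already stored
        elif val and val != CLEAR_SENTINEL:
            out[key] = val           # plain knob set
        else:
            out.pop(key, None)       # blank plain knob reverts to env/default
    return out


stored = {"NGC_API_KEY": "nvapi-old", "KOKORO_PORT": "8881"}
after_blank = merge(stored, {"NGC_API_KEY": "", "KOKORO_PORT": ""})
after_clear = merge(stored, {"NGC_API_KEY": CLEAR_SENTINEL})
print(after_blank)
print(after_clear)
```

Submitting the whole form with blanks drops the plain knob but keeps the secret, which is why the form needs a separate sentinel to remove a secret it never displays.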
+137 -34
@@ -1,11 +1,54 @@
 from __future__ import annotations
+import logging
 import os
-from dataclasses import dataclass
+from dataclasses import dataclass, fields
 from pathlib import Path
+from typing import Mapping
+from . import app_settings
+from .shellsafe import validate_container
+log = logging.getLogger(__name__)
-def _env(name: str, default: str = "") -> str:
-    return os.environ.get(name, default)
+def _env(src: Mapping[str, str], name: str, default: str = "") -> str:
+    return src.get(name, default)
+def _env_container(src: Mapping[str, str], name: str, default: str) -> str:
+    """Resolve a container-name env var, validating it at the config boundary.
+    The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
+    the sink — but per the repo's two-layer convention it's also whitelist-checked
+    here. A malformed optional value falls back to `default` rather than crashing
+    daemon startup (mirrors `_env_int`)."""
+    val = src.get(name, "") or default
+    try:
+        return validate_container(val)
+    except ValueError:
+        log.warning("ignoring invalid %s=%r; using %r", name, val, default)
+        return default
+def _env_set(src: Mapping[str, str], name: str) -> frozenset[str]:
+    """Parse a comma-separated env var into a lowercased frozenset of keys.
+    Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
+    support service can switch its tile + probes off entirely (rather than have
+    the probe hit whatever else listens on that port — e.g. a vLLM sharing
+    Parakeet's default 8000)."""
+    raw = src.get(name, "")
+    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
+def _env_int(src: Mapping[str, str], name: str, default: int) -> int:
+    """Parse an int env var, falling back to `default` when unset, blank, or
+    malformed. Optional numeric fields arrive as an empty string when left blank,
+    so a bare int("") would crash daemon startup."""
+    try:
+        return int(src.get(name, "") or default)
+    except (TypeError, ValueError):
+        return default
 def _resolve_models_yaml() -> str:
@@ -23,8 +66,23 @@ def _resolve_models_yaml() -> str:
     return str(candidates[0])  # let load fail with a clear path
-@dataclass(frozen=True)
+def _effective_env() -> dict[str, str]:
+    """The env Settings is built from: process env first, the in-app settings
+    overlay on top. The overlay (the dashboard 'gear') is keyed by the same env
+    var names, so a knob set in the UI overrides the value the StartOS action
+    injected — while an un-touched knob keeps falling through to the action's
+    value, then to the code default. See app_settings."""
+    return {**os.environ, **app_settings.load_overlay()}
+@dataclass
 class Settings:
+    # NOTE: intentionally NOT frozen. There is exactly one Settings instance,
+    # shared by reference across every router closure and manager (build_router,
+    # self.settings = settings). `reload()` mutates it in place so a change saved
+    # via the in-app settings gear goes live for all of them without rebuilding
+    # the app — the only window of inconsistency is the microseconds it takes to
+    # reassign the fields, acceptable for a single-operator config save.
     spark1_host: str
     spark1_user: str
     spark2_host: str
@@ -42,12 +100,19 @@ class Settings:
     qdrant_user: str
     qdrant_container: str
     qdrant_collection: str
+    matrix_bridge_host: str
+    matrix_bridge_user: str
+    matrix_bridge_container: str
+    matrix_bridge_dir: str
+    matrix_bridge_branch: str
     redaction_map_db: str
     redaction_map_ttl: int
     ssh_key_path: str
     ssh_known_hosts: str
     models_yaml: str
     vllm_port: int
+    vllm_container: str
+    disabled_services: frozenset[str]
     parakeet_port: int
     kokoro_port: int
     embed_port: int
@@ -55,48 +120,86 @@ class Settings:
     bind_port: int
     open_webui_url: str
     ngc_api_key: str
+    swap_webhook_url: str
+    swap_webhook_secret: str
     @classmethod
-    def from_env(cls) -> "Settings":
-        spark2_host = _env("SPARK2_HOST")
-        spark2_user = _env("SPARK2_USER")
+    def from_env(cls, src: Mapping[str, str] | None = None) -> "Settings":
+        src = _effective_env() if src is None else src
+        spark2_host = _env(src, "SPARK2_HOST")
+        spark2_user = _env(src, "SPARK2_USER")
         # Parakeet (STT) and Kokoro (TTS) default to Spark 2 unless overridden.
         return cls(
-            spark1_host=_env("SPARK1_HOST"),
-            spark1_user=_env("SPARK1_USER"),
+            spark1_host=_env(src, "SPARK1_HOST"),
+            spark1_user=_env(src, "SPARK1_USER"),
             spark2_host=spark2_host,
             spark2_user=spark2_user,
-            parakeet_host=_env("PARAKEET_HOST") or spark2_host,
-            parakeet_user=_env("PARAKEET_USER") or spark2_user,
-            parakeet_container=_env("PARAKEET_CONTAINER") or "parakeet-asr",
-            kokoro_host=_env("KOKORO_HOST") or spark2_host,
-            kokoro_user=_env("KOKORO_USER") or spark2_user,
-            kokoro_container=_env("KOKORO_CONTAINER") or "kokoro-tts",
+            parakeet_host=_env(src, "PARAKEET_HOST") or spark2_host,
+            parakeet_user=_env(src, "PARAKEET_USER") or spark2_user,
+            parakeet_container=_env(src, "PARAKEET_CONTAINER") or "parakeet-asr",
+            kokoro_host=_env(src, "KOKORO_HOST") or spark2_host,
+            kokoro_user=_env(src, "KOKORO_USER") or spark2_user,
+            kokoro_container=_env(src, "KOKORO_CONTAINER") or "kokoro-tts",
             # Embeddings (spark-embed: bge-m3 dense + reranker) and Qdrant
             # (vector storage) default to Spark 2 unless overridden.
-            embed_host=_env("EMBED_HOST") or spark2_host,
-            embed_user=_env("EMBED_USER") or spark2_user,
-            embed_container=_env("EMBED_CONTAINER") or "spark-embed",
-            qdrant_host=_env("QDRANT_HOST") or spark2_host,
-            qdrant_user=_env("QDRANT_USER") or spark2_user,
-            qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
-            qdrant_collection=_env("QDRANT_COLLECTION", ""),
+            embed_host=_env(src, "EMBED_HOST") or spark2_host,
+            embed_user=_env(src, "EMBED_USER") or spark2_user,
+            embed_container=_env(src, "EMBED_CONTAINER") or "spark-embed",
+            qdrant_host=_env(src, "QDRANT_HOST") or spark2_host,
+            qdrant_user=_env(src, "QDRANT_USER") or spark2_user,
+            qdrant_container=_env(src, "QDRANT_CONTAINER") or "qdrant",
+            qdrant_collection=_env(src, "QDRANT_COLLECTION", ""),
+            # matrix-bridge bot container, driven as its own SSH user (the owner
+            # of the ~/matrix-bridge git clone) so git/docker run unprivileged.
+            # The user is BLANK by default and set via the settings gear; leaving
+            # it blank reports the service as unconfigured, which hides the tile.
+            # That keeps the shared package portable — a deployment without the
+            # bot never shows a stray tile or a hardcoded username. Host defaults
+            # to Spark 2 (same box); container/dir/branch are sensible defaults.
+            matrix_bridge_host=_env(src, "MATRIX_BRIDGE_HOST") or spark2_host,
+            matrix_bridge_user=_env(src, "MATRIX_BRIDGE_USER"),
+            matrix_bridge_container=_env(src, "MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
+            matrix_bridge_dir=_env(src, "MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
+            matrix_bridge_branch=_env(src, "MATRIX_BRIDGE_BRANCH") or "master",
             # Redaction gateway pseudonym-map store (server-held de-anon key).
-            redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
-            redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")),
-            ssh_key_path=_env("SSH_KEY_PATH"),
-            ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
+            redaction_map_db=_env(src, "REDACTION_MAP_DB", "/data/redaction_maps.db"),
+            redaction_map_ttl=_env_int(src, "REDACTION_MAP_TTL", 7200),
+            ssh_key_path=_env(src, "SSH_KEY_PATH"),
+            ssh_known_hosts=_env(src, "SSH_KNOWN_HOSTS"),
             models_yaml=_resolve_models_yaml(),
-            vllm_port=int(_env("VLLM_PORT", "8888")),
-            parakeet_port=int(_env("PARAKEET_PORT", "8000")),
-            kokoro_port=int(_env("KOKORO_PORT", "8880")),
-            embed_port=int(_env("EMBED_PORT", "8088")),
-            qdrant_port=int(_env("QDRANT_PORT", "6333")),
-            bind_port=int(_env("BIND_PORT", "9999")),
-            open_webui_url=_env("OPEN_WEBUI_URL", ""),
-            ngc_api_key=_env("NGC_API_KEY", ""),
+            vllm_port=_env_int(src, "VLLM_PORT", 8888),
+            # Container name for the swappable vLLM on Spark 1. Defaults to the
+            # bundled launch-cluster.sh container; override if you named yours
+            # something else (the swap log-tail and pre-flight validator exec
+            # into it by name).
+            vllm_container=_env_container(src, "VLLM_CONTAINER", "vllm_node"),
+            # Built-in support-service keys (parakeet, kokoro, embeddings,
+            # qdrant) the deployment doesn't run — hidden from the dashboard and
+            # never probed.
+            disabled_services=_env_set(src, "DISABLED_SERVICES"),
+            parakeet_port=_env_int(src, "PARAKEET_PORT", 8000),
+            kokoro_port=_env_int(src, "KOKORO_PORT", 8880),
+            embed_port=_env_int(src, "EMBED_PORT", 8088),
+            qdrant_port=_env_int(src, "QDRANT_PORT", 6333),
+            bind_port=_env_int(src, "BIND_PORT", 9999),
+            open_webui_url=_env(src, "OPEN_WEBUI_URL", ""),
+            ngc_api_key=_env(src, "NGC_API_KEY", ""),
+            # Coordination layer: fire a swap-lifecycle webhook to this URL so
+            # downstream consumers re-point their model config on a swap. Blank
+            # ⇒ disabled. The optional secret HMAC-signs the body (X-Spark-Signature).
+            swap_webhook_url=_env(src, "SWAP_WEBHOOK_URL", ""),
+            swap_webhook_secret=_env(src, "SWAP_WEBHOOK_SECRET", ""),
         )
+    def reload(self) -> None:
+        """Recompute every field from the current env + settings overlay and
+        assign it onto this same instance, so all holders of the reference see
+        the change without an app restart. Called after the gear writes the
+        overlay (see server.post_settings)."""
+        fresh = Settings.from_env()
+        for f in fields(self):
+            setattr(self, f.name, getattr(fresh, f.name))
     @property
     def configured(self) -> bool:
         return bool(self.spark1_host)
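The two ideas in this diff, overlay-over-env precedence and reloading a shared instance in place, can be illustrated without the rest of the daemon. This is a sketch under stated assumptions: `MiniSettings`, `parse_port`, and the two-field schema are hypothetical stand-ins, and the overlay is passed in as a dict rather than read from disk.

```python
from dataclasses import dataclass, fields
from typing import Mapping


def parse_port(src: Mapping[str, str], name: str, default: int) -> int:
    """Blank, missing, or malformed values fall back to the default."""
    try:
        return int(src.get(name, "") or default)
    except (TypeError, ValueError):
        return default


@dataclass
class MiniSettings:
    kokoro_port: int
    open_webui_url: str

    @classmethod
    def build(cls, env: Mapping[str, str], overlay: Mapping[str, str]) -> "MiniSettings":
        src = {**env, **overlay}  # later mapping wins, so the overlay overrides env
        return cls(
            kokoro_port=parse_port(src, "KOKORO_PORT", 8880),
            open_webui_url=src.get("OPEN_WEBUI_URL", ""),
        )

    def reload(self, env: Mapping[str, str], overlay: Mapping[str, str]) -> None:
        """Copy freshly computed fields onto this same object."""
        fresh = MiniSettings.build(env, overlay)
        for f in fields(self):
            setattr(self, f.name, getattr(fresh, f.name))


env = {"KOKORO_PORT": "8880", "OPEN_WEBUI_URL": ""}
settings = MiniSettings.build(env, overlay={})
holder = settings  # e.g. a router closure that captured the instance earlier
settings.reload(env, overlay={"KOKORO_PORT": "9001"})
print(holder.kokoro_port)
```

Because `holder` and `settings` are the same object, the holder sees the new port without being rebuilt. A component that copied individual values out of the instance would not, which is the case the `WebhookNotifier.update()` docstring describes.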
+350
@@ -0,0 +1,350 @@
"""Cluster-coordination layer: the GPU swap lock, swap-event webhook, and the
read-only schedule registry.
Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring
business pipelines live in separate services that *call* the swap API. These
three primitives add the *safety* layer around that:
- **Swap lock** — a TTL-bounded reservation of the swap path. An external
scheduler acquires it before swapping; while held by someone else the
dashboard's manual swap is refused (enforced in the swap endpoint, not
advisory). Holder name is descriptive; the returned token is the secret that
authorises a swap or a release.
- **Webhook** — fires `swap_complete` / `swap_failed` to a configurable URL so
downstream consumers re-point their provider config when the running model
changes. Optionally HMAC-signed.
- **Schedule registry** — a read-only view the dashboard surfaces, *registered
by* external schedulers. Spark Control stores what it's told; it does not own
or execute any schedule.
All state is in-memory (mirroring the swap/download/NIM job managers). On a
restart the lock resets to *unlocked* — the available-by-default failure mode;
the swap manager's own in-progress guard still prevents two swaps at once —
and schedulers re-register their schedules.
"""
from __future__ import annotations
import hashlib
import hmac
import json
import logging
import re
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional
import httpx
log = logging.getLogger(__name__)
# A lock reserves the GPU for a window; clamp the TTL so a buggy client can
# neither pin the cluster forever nor take a zero-length (useless) lock.
LOCK_TTL_MIN = 1
LOCK_TTL_MAX = 86_400 # 24h
LOCK_TTL_DEFAULT = 900 # 15 min
# Schedule ids are reflected to the dashboard and used as a URL path segment on
# delete, so a caller-supplied id is whitelist-checked. Generated ids are hex.
_SCHEDULE_ID_RE = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")
def valid_schedule_id(value: str) -> bool:
"""Whitelist check for a caller-supplied schedule id (register and delete)."""
return bool(_SCHEDULE_ID_RE.match(value or ""))
def _now() -> datetime:
return datetime.now(timezone.utc)
def _iso(dt: datetime) -> str:
return dt.isoformat()
# ---------------------------------------------------------------- swap lock ----
class LockHeld(Exception):
"""The lock is held by a different holder. Carries the public lock state so
the endpoint can return holder + expiry in the 409 body."""
def __init__(self, state: dict) -> None:
self.state = state
super().__init__("swap lock is held by another holder")
@dataclass
class LockState:
holder: str
token: str
acquired_at: datetime
expires_at: datetime
note: str = ""
def public(self, now: datetime) -> dict:
"""Token-free view safe to expose on GET / in error bodies."""
return {
"held": True,
"holder": self.holder,
"acquired_at": _iso(self.acquired_at),
"expires_at": _iso(self.expires_at),
"seconds_remaining": max(0, int((self.expires_at - now).total_seconds())),
"note": self.note,
}
class SwapLockManager:
"""In-memory, TTL-bounded reservation of the GPU swap path.
`now` is injectable on every method purely so the expiry logic is testable
without sleeping; production calls omit it and get wall-clock UTC.
"""
def __init__(self) -> None:
self._lock: Optional[LockState] = None
def _active(self, now: Optional[datetime] = None) -> Optional[LockState]:
"""The current lock if one is held and unexpired; lazily clears an
expired lock so it never lingers."""
now = now or _now()
if self._lock is not None and self._lock.expires_at <= now:
self._lock = None
return self._lock
def status(self, now: Optional[datetime] = None) -> dict:
now = now or _now()
active = self._active(now)
return active.public(now) if active else {"held": False}
def acquire(
self,
holder: str,
ttl_seconds: Optional[int] = None,
note: str = "",
token: Optional[str] = None,
*,
now: Optional[datetime] = None,
) -> LockState:
"""Acquire a free lock (new token), or extend one already held by
presenting its token. A request without the token is refused even if the
holder name matches — the name is descriptive, the token is the secret.
"""
now = now or _now()
holder = (holder or "").strip()
if not holder:
raise ValueError("holder is required")
ttl = ttl_seconds if ttl_seconds is not None else LOCK_TTL_DEFAULT
try:
ttl = int(ttl)
except (TypeError, ValueError):
ttl = LOCK_TTL_DEFAULT
ttl = max(LOCK_TTL_MIN, min(LOCK_TTL_MAX, ttl))
active = self._active(now)
if active is not None:
# Held — only the token-holder may extend/re-acquire.
if not (token and hmac.compare_digest(active.token, token)):
raise LockHeld(active.public(now))
self._lock = LockState(
holder=holder or active.holder,
token=active.token,
acquired_at=active.acquired_at,
expires_at=now + timedelta(seconds=ttl),
note=note or active.note,
)
return self._lock
self._lock = LockState(
holder=holder,
token=uuid.uuid4().hex,
acquired_at=now,
expires_at=now + timedelta(seconds=ttl),
note=note,
)
return self._lock
def verify(self, token: Optional[str], now: Optional[datetime] = None) -> bool:
"""True iff `token` matches the currently-active lock."""
active = self._active(now)
return bool(active and token and hmac.compare_digest(active.token, token))
def is_blocked_by(self, token: Optional[str], now: Optional[datetime] = None) -> Optional[dict]:
"""Single-read swap gate. Returns the public lock state if an active
lock blocks a swap carrying this token, else None. Does exactly one
`_active()` read so the decision can't straddle a TTL expiry the way a
separate status()+verify() pair could (which, at the expiry tick, would
spuriously refuse a swap that should now be allowed)."""
now = now or _now()
active = self._active(now)
if active is None:
return None
if token and hmac.compare_digest(active.token, token):
return None
return active.public(now)
def release(
self,
token: Optional[str] = None,
*,
force: bool = False,
now: Optional[datetime] = None,
) -> bool:
"""Release the lock. Returns False if nothing was held. Requires the
matching token unless `force` (the human override from the dashboard)."""
active = self._active(now)
if active is None:
return False
if not force and not self.verify(token, now):
raise PermissionError("token does not hold the lock")
self._lock = None
return True
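The injected `now` parameter is what makes the TTL behaviour checkable without sleeping. The sketch below uses a deliberately simplified stand-in (`MiniLock` is hypothetical and omits holder names, notes, extension, and release) to show the single-read gate at and around the expiry instant.

```python
import hmac
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional


class MiniLock:
    """Simplified stand-in for SwapLockManager: one token, one expiry."""

    def __init__(self) -> None:
        self._token: Optional[str] = None
        self._expires_at: Optional[datetime] = None

    def blocks(self, token: Optional[str], now: datetime) -> bool:
        """True if an unexpired lock exists and `token` does not match it."""
        if self._expires_at is None or self._expires_at <= now:
            self._token = self._expires_at = None  # lazily clear an expired lock
            return False
        return not (token and hmac.compare_digest(self._token, token))

    def acquire(self, ttl_seconds: int, now: datetime) -> str:
        if self.blocks(None, now):
            raise RuntimeError("lock is held")
        self._token = uuid.uuid4().hex
        self._expires_at = now + timedelta(seconds=ttl_seconds)
        return self._token


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
lock = MiniLock()
token = lock.acquire(ttl_seconds=900, now=t0)
blocked_without_token = lock.blocks(None, t0 + timedelta(seconds=899))
blocked_with_token = lock.blocks(token, t0 + timedelta(seconds=899))
blocked_at_expiry = lock.blocks(None, t0 + timedelta(seconds=900))
```

The expiry comparison is `<=`, so a lock taken for 900 seconds stops blocking at exactly `t0 + 900s`; the decision comes from one read of the lock state rather than two reads that could fall on either side of that instant.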
# ----------------------------------------------------------------- webhook ----
def build_webhook_payload(
*,
event: str,
job_id: str,
model_key: str,
state: str,
returncode: Optional[int],
started_at: Optional[str],
finished_at: Optional[str],
dry_run: bool,
) -> dict:
return {
"event": event, # swap_complete | swap_failed
"job_id": job_id,
"model_key": model_key,
"state": state,
"returncode": returncode,
"started_at": started_at,
"finished_at": finished_at,
"dry_run": dry_run,
}
def sign_payload(secret: str, body: bytes) -> str:
"""`X-Spark-Signature` value: sha256 HMAC of the exact JSON body the
consumer receives, so they can recompute and trust it."""
return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
class WebhookNotifier:
"""Fire-and-forget POST of swap-lifecycle events. A webhook failure is
logged and swallowed — it must never affect the swap outcome."""
def __init__(self, url: str, secret: str = "", timeout: float = 5.0) -> None:
self.url = (url or "").strip()
self.secret = secret or ""
self.timeout = timeout
def update(self, url: str, secret: str = "") -> None:
"""Re-point after a live settings change. The notifier holds snapshot
copies of these two fields (not the Settings object), so Settings.reload()
can't reach it — server.post_settings calls this explicitly so editing the
webhook URL/secret in the dashboard gear takes effect without a restart."""
self.url = (url or "").strip()
self.secret = secret or ""
@property
def enabled(self) -> bool:
return bool(self.url)
async def fire(self, event: str, payload: dict) -> None:
if not self.enabled:
return
body = json.dumps(payload).encode()
headers = {
"content-type": "application/json",
"user-agent": "spark-control-webhook",
"x-spark-event": event,
}
if self.secret:
headers["x-spark-signature"] = sign_payload(self.secret, body)
try:
async with httpx.AsyncClient(timeout=self.timeout) as client:
await client.post(self.url, content=body, headers=headers)
except Exception as e: # noqa: BLE001 — best-effort, never propagate
log.warning("swap webhook to %s failed: %s", self.url, e)
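On the receiving side, the signature can be checked by recomputing the same HMAC over the raw request bytes. This is a minimal sketch, assuming the receiver has the shared secret and access to the unparsed body; `verify_signature` is an illustrative name, and the payload shown is made up.

```python
import hashlib
import hmac
import json


def sign(secret: str, body: bytes) -> str:
    """Same construction as sign_payload: 'sha256=' plus the hex HMAC."""
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Recompute the HMAC over the raw bytes and compare in constant time."""
    expected = sign(secret, body).encode()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected, (header_value or "").encode())


secret = "a random shared string"
body = json.dumps({"event": "swap_complete", "model_key": "example-model"}).encode()
header = sign(secret, body)

ok = verify_signature(secret, body, header)
tampered = verify_signature(secret, body + b" ", header)
missing = verify_signature(secret, body, "")
```

The check has to run against the bytes exactly as received. Parsing the JSON and serialising it again can change whitespace or key order, which would change the digest.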
# -------------------------------------------------------- schedule registry ----
@dataclass
class ScheduleEntry:
id: str
name: str
owner: str = ""
cron: str = ""
next_run: str = ""
description: str = ""
registered_at: str = ""
updated_at: str = ""
def public(self) -> dict:
return {
"id": self.id,
"name": self.name,
"owner": self.owner,
"cron": self.cron,
"next_run": self.next_run,
"description": self.description,
"registered_at": self.registered_at,
"updated_at": self.updated_at,
}
class ScheduleRegistry:
"""What external schedulers tell us about their cron jobs. Read-only from the
dashboard's side; Spark Control never executes any of it."""
def __init__(self) -> None:
self._items: dict[str, ScheduleEntry] = {}
def list(self) -> list[dict]:
return [e.public() for e in self._items.values()]
def register(
self,
*,
name: str,
id: Optional[str] = None,
owner: str = "",
cron: str = "",
next_run: str = "",
description: str = "",
) -> ScheduleEntry:
name = (name or "").strip()
if not name:
raise ValueError("name is required")
if id is not None:
id = id.strip()
if id and not valid_schedule_id(id):
raise ValueError("id must match [A-Za-z0-9_.-] (max 64 chars)")
ts = _iso(_now())
existing = self._items.get(id) if id else None
if existing is not None:
existing.name = name
existing.owner = owner.strip()
existing.cron = cron
existing.next_run = next_run
existing.description = description
existing.updated_at = ts
return existing
sid = id or uuid.uuid4().hex[:8]
entry = ScheduleEntry(
id=sid,
name=name,
owner=owner.strip(),
cron=cron,
next_run=next_run,
description=description,
registered_at=ts,
updated_at=ts,
)
self._items[sid] = entry
return entry
def delete(self, schedule_id: str) -> bool:
return self._items.pop(schedule_id, None) is not None
+11
@@ -10,6 +10,17 @@ Format:
       port: 8001
       health_path: /health
       image: nvcr.io/nim/nvidia/riva-multilingual:latest
+A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
+the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
+probed via /v1/models plus container state and start/stop/restart:
+  custom:
+    - key: vllm-spark2
+      kind: vllm
+      host: <spark-2-ip>
+      user: <ssh-user>
+      container: vllm_node
+      port: 8000
 """
 from __future__ import annotations
 import os
+4
@@ -377,6 +377,10 @@ class DeepHealth:
     async def run_all(self) -> dict[str, ProbeResult]:
         results = {}
         for name in self.PROBES:
+            # Don't deep-probe a service the deployment switched off — its port
+            # may be answered by something else (e.g. a vLLM on Parakeet's 8000).
+            if name in self.settings.disabled_services:
+                continue
             results[name] = await self.run_one(name)
         return results
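The skip above depends on how the `DISABLED_SERVICES` value is normalised. A small sketch of that parsing and the resulting probe list follows; `parse_disabled` mirrors `_env_set` from the config diff, while the `PROBES` tuple is an assumption based on the four service keys named in the config comment, not the real `DeepHealth.PROBES`.

```python
def parse_disabled(raw: str) -> frozenset[str]:
    """Comma-separated names, trimmed and lowercased; empty parts are dropped."""
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


# Assumed probe names, taken from the service keys listed in the config comment.
PROBES = ("parakeet", "kokoro", "embeddings", "qdrant")

disabled = parse_disabled(" Parakeet, ,QDRANT ")
to_run = [name for name in PROBES if name not in disabled]
print(sorted(disabled))
print(to_run)
```

Lowercasing on the way in means the operator can write `Parakeet` or `QDRANT` and still match the lowercase probe names.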
+209
@@ -0,0 +1,209 @@
"""Disk-driven model menu + launch-recipe inference.
The dashboard's model list is whatever is actually downloaded on the Sparks
(see `disk.list_cached_models`), NOT a hard-coded catalog. The bundled/overridden
catalog entries are *launch recipes*: matched to an on-disk model by repo, they
say HOW to launch it. A completed model on disk with no matching recipe shows up
as `needs_setup` — the first switch reads its `config.json`, proposes a recipe
(`infer_recipe`) the operator confirms once, and that confirmed recipe is saved
to /data so it's a normal card from then on.
Why a recipe layer at all, if the menu is the disk? Because a folder on disk
doesn't say how to launch it: the per-family parsers (`--reasoning-parser`,
`--tool-call-parser`), the MoE backend (some Gemma MoE checkpoints need
`marlin` on GB10), and solo-vs-cluster topology can't be read off a directory.
We infer a best guess from the model's own config + size, but the operator
confirms it — a wrong guess is cheap, a wrong launch is not.
"""
from __future__ import annotations
import asyncio
import re
from .config import Settings
from .disk import list_cached_models, probe_disk
from .overrides import extract_knobs_from_args
# A model whose weights exceed this can't fit one Spark's 128 GB beside a KV
# cache, so it must shard across both via Ray. A heuristic prefill only — the
# operator confirms mode in the setup form, so the exact cutoff isn't critical.
SINGLE_SPARK_BYTES = 115 * 1000 ** 3
# Generic knob defaults applied to every inferred recipe (the operator can tweak
# these in the setup form). Family-specific flags (parsers, MoE backend) are
# layered on separately by `_detect_family`.
_COMMON_KNOBS = {
"max_model_len": 32768,
"gpu_memory_utilization": 0.85,
"fastsafetensors": True,
"prefix_caching": True,
"kv_cache_dtype": "fp8",
}
def repo_to_key(repo: str) -> str:
"""Stable, URL-safe menu key for a discovered model with no recipe key yet.
'RedHatAI/Qwen3.6-35B-A3B-NVFP4' -> 'redhatai-qwen3-6-35b-a3b-nvfp4'. The same
slug is used by the menu, the setup form, and `_identify_current_model`, so a
loaded-but-unconfigured model still highlights as active."""
return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")
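A quick check of the slug rule, using the same regular expression as `repo_to_key`. The first repo name comes from the docstring; the `org/...` names are made up for illustration.

```python
import re


def repo_to_key(repo: str) -> str:
    """Lowercase, then collapse every run of other characters into one '-'."""
    return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")


docstring_example = repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4")
dotted = repo_to_key("org/model.v1")
dashed = repo_to_key("org/model-v1")
print(docstring_example)
print(dotted, dashed)
```

Since `.` and `/` both map to `-`, two distinct repos such as the hypothetical `org/model.v1` and `org/model-v1` produce the same key. Whether that can happen in practice depends on which repos are actually cached on the Sparks.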
def _detect_family(config: dict) -> tuple[str, list[str], list[str]]:
"""Return (family_label, vllm_flags, capabilities) inferred from config.json.
Only family-specific, non-knob flags (parsers, MoE backend) go in vllm_flags;
generic knob defaults are handled by the caller. Best-effort and operator-
confirmed, so a wrong guess is cheap."""
arch = " ".join(config.get("architectures") or [])
mtype = str(config.get("model_type") or "")
s = (arch + " " + mtype).lower()
is_moe = (
"moe" in s
or any(config.get(k) for k in ("num_experts", "n_routed_experts", "num_local_experts"))
)
is_vision = (
"conditionalgeneration" in s
or "vision" in s
or "vlforcausallm" in s
or "vision_config" in config
or "image_token_index" in config
)
flags: list[str] = []
caps: list[str] = []
label = "Generic"
if mtype.startswith("qwen3") or "qwen3" in s:
label = "Qwen3 (MoE)" if is_moe else "Qwen3"
flags.append("--reasoning-parser=qwen3")
caps.append("reasoning")
if is_moe:
flags.append("--moe_backend=flashinfer_cutlass")
elif "gemma" in s:
label = "Gemma (MoE)" if is_moe else "Gemma"
flags += ["--reasoning-parser=gemma4", "--tool-call-parser=gemma4", "--enable-auto-tool-choice"]
caps += ["reasoning", "tools"]
if is_moe:
# The fast flashinfer/CUTLASS FP4 path errors on GB10 for Gemma MoE;
# marlin is the working fallback (see the Gemma 26B trial notes).
flags.append("--moe_backend=marlin")
if is_vision and "vision" not in caps:
caps.append("vision")
return label, flags, caps
def _infer_mode(total_bytes: int, on_host_count: int) -> str:
"""Solo unless the weights are present on both Sparks or too big for one."""
if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
return "cluster"
return "solo"
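A minimal sketch of the solo/cluster decision. `SINGLE_SPARK_BYTES` is defined elsewhere in the module and is not visible in this diff, so the threshold below is a placeholder chosen for illustration, not the project's real value.

```python
# Placeholder threshold (assumption): the real constant is not shown in the diff.
SINGLE_SPARK_BYTES = 100 * 10**9


def infer_mode(total_bytes: int, on_host_count: int) -> str:
    # Cluster if the weights already sit on two hosts, or if they are too
    # large for the single-host threshold; otherwise solo.
    if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
        return "cluster"
    return "solo"


small_one_host = infer_mode(20 * 10**9, 1)
small_two_hosts = infer_mode(20 * 10**9, 2)
large_one_host = infer_mode(150 * 10**9, 1)
print(small_one_host, small_two_hosts, large_one_host)
```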
def infer_recipe(repo: str, config: dict, total_bytes: int, on_host_count: int) -> dict:
"""Propose a launch recipe for a discovered model — prefills the setup form."""
label, flags, caps = _detect_family(config or {})
mode = _infer_mode(total_bytes, on_host_count)
vllm_args = list(flags)
vllm_args.append("--max-num-batched-tokens=16384")
knobs = dict(_COMMON_KNOBS)
if mode == "cluster":
# Large models shard across both Sparks via Ray; leave more headroom.
vllm_args += ["-tp=2", "--distributed-executor-backend=ray"]
knobs["gpu_memory_utilization"] = 0.7
return {
"key": repo_to_key(repo),
"repo": repo,
"display_name": repo.split("/")[-1],
"mode": mode,
"capabilities": caps,
"vllm_args": vllm_args,
"knobs": knobs,
"family": label,
}
def _menu_entry_from_recipe(m, *, on_disk: bool, total_bytes: int, per_host: list[dict]) -> dict:
d = m.model_dump()
d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
d["needs_setup"] = False
d["on_disk"] = on_disk
d["total_bytes"] = total_bytes
d["per_host"] = per_host
return d
async def build_menu(settings: Settings, catalog) -> dict[str, dict]:
"""The disk-driven model menu: every completed model on the Sparks, annotated
with its launch recipe (matched by repo) or flagged `needs_setup` if none.
Two SSH scans total (one per Spark), run in parallel — much cheaper than the
old per-recipe disk probe. A host that errors is skipped, not fatal."""
hosts = [(settings.spark1_host, settings.spark1_user)]
if settings.spark2_host:
hosts.append((settings.spark2_host, settings.spark2_user))
scans = await asyncio.gather(
*(list_cached_models(h, u, settings) for h, u in hosts),
return_exceptions=True,
)
by_repo: dict[str, dict] = {}
for (h, _u), res in zip(hosts, scans):
if isinstance(res, Exception):
continue
for repo, size, complete in res:
e = by_repo.setdefault(repo, {"total_bytes": 0, "per_host": [], "complete": False})
e["total_bytes"] += size
e["per_host"].append({"host": h, "size_bytes": size})
e["complete"] = e["complete"] or complete
recipe_by_repo = {m.repo: (k, m) for k, m in catalog.models.items() if m.repo}
menu: dict[str, dict] = {}
for repo, info in by_repo.items():
# Skip half-fetched / corrupt caches (no finished snapshot) — they'd show
# as broken cards. In-flight downloads surface in the download panel.
if not info["complete"]:
continue
if repo in recipe_by_repo:
key, m = recipe_by_repo[repo]
menu[key] = _menu_entry_from_recipe(
m, on_disk=True, total_bytes=info["total_bytes"], per_host=info["per_host"]
)
else:
key = repo_to_key(repo)
menu[key] = {
"display_name": repo.split("/")[-1],
"repo": repo,
"local_path": None,
"size_gb": round(info["total_bytes"] / 1e9, 1),
"mode": _infer_mode(info["total_bytes"], len(info["per_host"])),
"capabilities": [],
"expected_ready_seconds": 300,
"vllm_args": [],
"description": None,
"knobs": None,
"custom": False,
"needs_setup": True,
"effective_knobs": {},
"on_disk": True,
"total_bytes": info["total_bytes"],
"per_host": info["per_host"],
}
# Local/fine-tuned recipes live as a directory, not an HF cache entry — probe
# each by path and include it if present. Their keys are unique catalog keys
# (and local models carry repo="" per ModelDef), so they never collide with a
# discovered repo's slug or an HF recipe key above.
for key, m in catalog.models.items():
if not m.local_path:
continue
st = await probe_disk(m.repo, m.mode, settings, local_path=m.local_path)
if not st.on_disk:
continue
menu[key] = _menu_entry_from_recipe(
m,
on_disk=True,
total_bytes=st.total_bytes,
per_host=[{"host": r.host, "size_bytes": r.size_bytes} for r in st.per_host if r.on_disk],
)
return menu
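The per-host merge at the top of `build_menu` can be exercised without SSH. This sketch copies the aggregation loop into a standalone function; the host names and sizes are invented for the example, and a failed scan is represented by an exception object, as `asyncio.gather(..., return_exceptions=True)` would return it.

```python
def merge_scans(hosts, scans):
    """Fold per-host (repo, size, complete) scans into one dict keyed by repo."""
    by_repo = {}
    for host, res in zip(hosts, scans):
        if isinstance(res, Exception):
            continue  # a host that errored is skipped, not fatal
        for repo, size, complete in res:
            e = by_repo.setdefault(
                repo, {"total_bytes": 0, "per_host": [], "complete": False}
            )
            e["total_bytes"] += size
            e["per_host"].append({"host": host, "size_bytes": size})
            e["complete"] = e["complete"] or complete
    return by_repo


# Hypothetical hosts and scan results.
hosts = ["spark1", "spark2"]
scans = [
    [("org/model-a", 100, True), ("org/model-b", 50, False)],
    [("org/model-a", 100, True)],
]
merged = merge_scans(hosts, scans)
failed = merge_scans(hosts, [scans[0], RuntimeError("ssh failed")])
print(merged["org/model-a"]["total_bytes"], len(merged["org/model-a"]["per_host"]))
```

Note that `total_bytes` is a sum over hosts, so a model fully copied to both Sparks counts twice; the two-host case already resolves to cluster mode through the host count.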
+129 -6
@@ -10,11 +10,13 @@ model or one tied to an in-flight swap/download.
"""
from __future__ import annotations
import asyncio
import json
import re
from dataclasses import dataclass
from typing import Optional
from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run
@@ -35,6 +37,87 @@ def repo_to_cache_dirname(repo: str) -> str:
return dn
def cache_dirname_to_repo(dirname: str) -> Optional[str]:
"""Inverse of `repo_to_cache_dirname`: 'models--org--name' -> 'org/name'.
A repo has exactly one '/', so the org is the first '--'-segment and the name
is everything after (names may themselves contain single dashes). Returns
None for anything that isn't a model cache dir."""
if not dirname.startswith("models--"):
return None
parts = dirname[len("models--"):].split("--")
if len(parts) < 2 or not parts[0] or not parts[1]:
return None
return f"{parts[0]}/{'--'.join(parts[1:])}"
def parse_cache_listing(out: str) -> list[tuple[str, int, bool]]:
"""Parse the 'size|complete|dirname' lines from `list_cached_models`'s scan.
Returns [(repo, size_bytes, complete), ...], skipping non-model lines. Pure
function so the parsing is unit-testable without SSH."""
items: list[tuple[str, int, bool]] = []
for line in out.splitlines():
line = line.strip()
if line.count("|") < 2:
continue
size_s, complete_s, dirname = line.split("|", 2)
repo = cache_dirname_to_repo(dirname.strip())
if not repo:
continue
try:
size = int(size_s)
except ValueError:
size = 0
items.append((repo, size, complete_s.strip() == "1"))
return items
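Because both helpers are pure, the parsing can be tried on a canned listing. The sketch below restates `cache_dirname_to_repo` and `parse_cache_listing` as they appear in the diff and feeds them sample `size|complete|dirname` lines; the `example-org` entries and all sizes are invented.

```python
from typing import Optional


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    if not dirname.startswith("models--"):
        return None
    parts = dirname[len("models--"):].split("--")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    # Org is the first segment; any later "--" belongs to the model name.
    return f"{parts[0]}/{'--'.join(parts[1:])}"


def parse_cache_listing(out: str) -> list:
    items = []
    for line in out.splitlines():
        line = line.strip()
        if line.count("|") < 2:
            continue  # not a size|complete|dirname record
        size_s, complete_s, dirname = line.split("|", 2)
        repo = cache_dirname_to_repo(dirname.strip())
        if not repo:
            continue  # e.g. datasets--... or other non-model dirs
        try:
            size = int(size_s)
        except ValueError:
            size = 0
        items.append((repo, size, complete_s.strip() == "1"))
    return items


sample = "\n".join([
    "21474836480|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4",
    "1024|0|models--example-org--half-fetched",
    "4096|1|datasets--example-org--not-a-model",
    "garbage line",
])
parsed = parse_cache_listing(sample)
print(parsed)
```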
async def list_cached_models(host: str, user: str, settings: Settings) -> list[tuple[str, int, bool]]:
"""Enumerate every Hugging Face model cached on a host: (repo, size_bytes, complete).
'complete' = the cache has at least one snapshot carrying a config.json (a
finished download, not a half-fetched/corrupt dir). One SSH round-trip; the
glob's no-match case is handled by the `[ -d ]` guard."""
if not host or not user:
return []
cmd = (
'HUB="$HOME/.cache/huggingface/hub"; '
'for d in "$HUB"/models--*; do '
'[ -d "$d" ] || continue; '
'n=$(basename "$d"); '
'sz=$(du -sb "$d" 2>/dev/null | cut -f1); sz=${sz:-0}; '
'if ls "$d"/snapshots/*/config.json >/dev/null 2>&1; then c=1; else c=0; fi; '
'echo "${sz}|${c}|${n}"; '
'done'
)
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=30.0)
if rc != 0:
return []
return parse_cache_listing(out)
async def read_model_config(host: str, user: str, repo: str, settings: Settings) -> Optional[dict]:
"""Read a cached model's config.json (first snapshot) for launch inference.
Returns the parsed dict, or None if absent/unreadable. The dirname is
whitelisted (repo_to_cache_dirname) so it's safe to embed unquoted."""
if not host or not user:
return None
dn = repo_to_cache_dirname(repo)
cmd = (
f'D=$(ls -d "$HOME/.cache/huggingface/hub/{dn}/snapshots/"*/ 2>/dev/null | head -1); '
f'[ -n "$D" ] && cat "${{D}}config.json" 2>/dev/null'
)
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
if rc != 0 or not out.strip():
return None
try:
return json.loads(out)
except (ValueError, TypeError):
return None
@dataclass
class HostDiskResult:
host: str
@@ -76,16 +159,52 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
-async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
-"""Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
+async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
+"""Return whether a local model directory exists on this host and its size.
For locally fine-tuned models (a Spark directory, not an HF cache entry). The
path is whitelisted at the API boundary (shellsafe.validate_local_path); we
shlex-quote it here as defense in depth.
"""
if not host or not user:
return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
qp = quote_arg(path)
cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
if rc != 0:
return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
raw = out.strip()
if raw == "MISSING" or raw == "":
return HostDiskResult(host=host, on_disk=False)
try:
size = int(raw.splitlines()[-1])
except ValueError:
return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
async def probe_disk(
repo: str, mode: str, settings: Settings, *, local_path: str | None = None
) -> DiskStatus:
"""Probe one model across the relevant Sparks based on its mode (solo|cluster).
A local model (local_path set) is probed by directory; otherwise by HF cache.
"""
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
if mode == "cluster" and settings.spark2_host:
hosts.append((settings.spark2_host, settings.spark2_user))
if local_path:
results = await asyncio.gather(
*(probe_local_host(h, u, local_path, settings) for h, u in hosts)
)
key = local_path
else:
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
key = repo
on_disk = any(r.on_disk for r in results)
total = sum(r.size_bytes for r in results)
-return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))
+return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))
async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
@@ -122,10 +241,14 @@ async def delete_host(host: str, user: str, repo: str, settings: Settings) -> Ho
return HostDiskResult(host=host, on_disk=False, size_bytes=freed)
-async def delete_from_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
-"""rm -rf the model's cache dir on the relevant Sparks. Idempotent."""
+async def delete_from_disk(repo: str, settings: Settings) -> DiskStatus:
+"""rm -rf the model's cache dir on ALL configured Sparks. Idempotent.
We sweep both Sparks regardless of the model's declared mode: a 'remove from
disk & menu' must leave nothing behind, and rm of an absent dir reports 0
bytes freed (FREED 0), so an extra host is harmless."""
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
-if mode == "cluster" and settings.spark2_host:
+if settings.spark2_host:
hosts.append((settings.spark2_host, settings.spark2_user))
results = await asyncio.gather(*(delete_host(h, u, repo, settings) for h, u in hosts))
+15 -1
@@ -23,6 +23,20 @@ from .ssh import ssh_stream, StreamHandle
Mode = Literal["spark1", "spark2", "cluster"]
def build_download_command(repo: str, flags: str = "") -> str:
"""Remote shell command that drives hf-download.sh on a Spark.
Prepends ~/.local/bin to PATH. hf-download.sh shells out to `uvx` (Astral's
uv), and the official uv installer drops its binaries in ~/.local/bin — but
our SSH session is non-interactive, so it never sources the user's profile
and ~/.local/bin is off PATH, leaving `uvx` as "command not found". $HOME
expands server-side, so this stays correct for any adopter/user. `repo` is
shlex-quoted at the sink (validate_repo gates the charset upstream).
"""
serve = f"./hf-download.sh {quote_arg(repo)} {flags}".strip()
return f'export PATH="$HOME/.local/bin:$PATH" && cd ~/spark-vllm-docker && {serve}'
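To see the string this produces, here is a standalone sketch. It assumes `quote_arg` behaves like `shlex.quote` (the docstring says the repo is "shlex-quoted at the sink", but `shellsafe` itself is not shown), so `shlex.quote` stands in for it here. The hostile repo string is invented to show the quoting; in the real flow `validate_repo` would reject it upstream.

```python
import shlex


def build_download_command(repo: str, flags: str = "") -> str:
    # shlex.quote is a stand-in for the project's quote_arg (assumption).
    serve = f"./hf-download.sh {shlex.quote(repo)} {flags}".strip()
    # $HOME and $PATH are left for the remote shell to expand.
    return f'export PATH="$HOME/.local/bin:$PATH" && cd ~/spark-vllm-docker && {serve}'


cmd = build_download_command("RedHatAI/Qwen3.6-35B-A3B-NVFP4")
hostile = build_download_command("x; rm -rf ~")
print(cmd)
print(hostile)
```

A plain `org/name` repo passes through unquoted because it contains only shell-safe characters, while the hostile value is wrapped in single quotes and reaches `hf-download.sh` as one literal argument.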
_TQDM_RE = re.compile(
r"(\d+(?:\.\d+)?)\s*%\s*\|.*?\|\s*"
r"([\d.]+[KMG]?B?)\s*/\s*([\d.]+[KMG]?B?)\s*"
@@ -126,7 +140,7 @@ class DownloadManager:
if not target_host or not target_user:
raise RuntimeError(f"{job.mode} host not configured")
-cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {quote_arg(job.repo)} {flags}".strip()
+cmd = build_download_command(job.repo, flags)
job.append(f"$ {cmd}")
job.state = "downloading"
job.progress.phase = "Connecting to Hugging Face…"
+34 -9
@@ -6,17 +6,28 @@ from .config import Settings
_TIMEOUT = 3.0
-async def check_vllm(settings: Settings) -> dict:
-base_url = (
-f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
-if settings.spark1_host
-else None
-)
-if not settings.spark1_host:
-return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
+def _disabled(settings: Settings, key: str) -> dict | None:
+"""A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
+Lets an adopter who doesn't run a given support service switch its probe off
+entirely, so the probe never hits whatever else listens on that port, and
+the connectivity log doesn't record it as perpetually down."""
+if key in settings.disabled_services:
+return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
return None
async def probe_vllm_endpoint(host: str, port: int) -> dict:
"""Probe any OpenAI-compatible vLLM at host:port via /v1/models.
Shared by the primary (Spark 1) health check and any extra vLLM registered
as a custom service (kind: vllm) to monitor a second Spark."""
base_url = f"http://{host}:{port}/v1" if host else None
if not host:
return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
try:
async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
-r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
+r = await c.get(f"http://{host}:{port}/v1/models")
r.raise_for_status()
ids = [m["id"] for m in r.json().get("data", [])]
return {
@@ -29,7 +40,15 @@ async def check_vllm(settings: Settings) -> dict:
return {"ok": False, "error": str(e), "base_url": base_url}
async def check_vllm(settings: Settings) -> dict:
if not settings.spark1_host:
return {"ok": False, "error": "spark1 not configured", "base_url": None}
return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
async def check_parakeet(settings: Settings) -> dict:
if d := _disabled(settings, "parakeet"):
return d
base_url = (
f"http://{settings.parakeet_host}:{settings.parakeet_port}"
if settings.parakeet_host
@@ -47,6 +66,8 @@ async def check_parakeet(settings: Settings) -> dict:
async def check_kokoro(settings: Settings) -> dict:
if d := _disabled(settings, "kokoro"):
return d
base_url = (
f"http://{settings.kokoro_host}:{settings.kokoro_port}"
if settings.kokoro_host
@@ -68,6 +89,8 @@ async def check_kokoro(settings: Settings) -> dict:
async def check_embeddings(settings: Settings) -> dict:
if d := _disabled(settings, "embeddings"):
return d
base_url = (
f"http://{settings.embed_host}:{settings.embed_port}"
if settings.embed_host
@@ -89,6 +112,8 @@ async def check_embeddings(settings: Settings) -> dict:
async def check_qdrant(settings: Settings) -> dict:
if d := _disabled(settings, "qdrant"):
return d
base_url = (
f"http://{settings.qdrant_host}:{settings.qdrant_port}"
if settings.qdrant_host
+186
@@ -0,0 +1,186 @@
"""Update + logs for the matrix-bridge bot container on the Spark.
matrix-bridge is a single Docker container managed by docker compose out of a
git clone at `~matrix_bridge_user/matrix-bridge`. Status (the badge) and
start/stop/restart ride the generic service machinery in `services.py`
(`docker_state` / `run_action`). The two things that don't fit that mould live
here:
- **Update** — `git fetch && git reset --hard origin/<branch> && docker
compose up -d --build`. Long-running (docker build), so it streams like the
vLLM `UpdateManager`: fire-and-forget job, SSE stream, fail-loud rc.
- **Logs** — a one-shot `docker logs --tail N` for diagnosing a red badge.
We connect **directly as the configured user** (`modelo` — the repo owner), so
git never trips its dubious-ownership guard and docker runs via the user's
docker-group membership. We deliberately do NOT `sudo -iu modelo`: this Spark
has no passwordless sudo, so a sudo wrap would hang in SSH BatchMode.
"""
from __future__ import annotations
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run, ssh_stream, StreamHandle
# Hard ceiling on a single update. A first build after a base-image bump is
# slow (minutes); the cache makes later ones quick. 25 min is generous headroom
# without letting a genuinely wedged build spin forever.
_UPDATE_TIMEOUT_S = 1500
def build_update_command(directory: str, branch: str) -> str:
"""The update one-liner, run from the bot's git clone as its owner.
`directory` and `branch` come from operator config (not request input), so
they're interpolated directly — same trust model as the Spark hostnames in
`health`/`updates`. `directory` may be `~/...`, which must stay unquoted so
the remote login shell expands it; quoting would defeat that.
"""
return (
f"cd {directory} && "
f"git fetch origin && "
f"git reset --hard origin/{branch} && "
f"docker compose up -d --build"
)
def _phase_for(line: str) -> Optional[str]:
"""Map a streamed output line to a human-readable phase, or None to keep
the current phase. Kept loose — compose/buildkit output varies by version."""
low = line.lower()
if "git reset" in low or "head is now at" in low:
return "Resetting to the latest release…"
if "docker compose" in low or "buildkit" in low or low.startswith("step ") or "=> " in line or "building " in low:
return "Building the bot image…"
if "recreate" in low or "starting" in low or "started" in low or "container matrix-bridge" in low:
return "Recreating the container…"
if "already up to date" in low:
return "No new code; rebuilding…"
return None
@dataclass
class UpdateJob:
id: str
started_at: str
state: str = "starting"
lines: list[str] = field(default_factory=list)
returncode: Optional[int] = None
finished_at: Optional[str] = None
phase: str = "Starting…"
def append(self, line: str) -> None:
self.lines.append(line)
if len(self.lines) > 1000:
del self.lines[: len(self.lines) - 1000]
class MatrixBridgeManager:
def __init__(self, settings: Settings) -> None:
self.settings = settings
self.lock = asyncio.Lock()
self.jobs: dict[str, UpdateJob] = {}
self.current_job_id: Optional[str] = None
def _configured(self) -> bool:
s = self.settings
return bool(s.matrix_bridge_host and s.matrix_bridge_user)
def get(self, job_id: str) -> UpdateJob | None:
return self.jobs.get(job_id)
async def fetch_logs(self, tail: int = 100) -> dict:
"""One-shot `docker logs --tail N <container>` (stderr merged in)."""
s = self.settings
if not self._configured():
return {"ok": False, "error": "matrix-bridge host not configured"}
tail = max(1, min(int(tail), 1000))
# tail is already int-clamped, but quote at the sink anyway so the
# shellsafe convention (no raw interpolation into an SSH command) holds
# regardless of caller.
cmd = f"docker logs --tail {quote_arg(str(tail))} {quote_arg(s.matrix_bridge_container)} 2>&1"
rc, out, err = await ssh_run(
s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, timeout=20
)
return {
"ok": rc == 0,
"rc": rc,
"container": s.matrix_bridge_container,
"output": (out or err).strip(),
}
async def trigger_update(self) -> UpdateJob:
if not self._configured():
raise RuntimeError("matrix-bridge host not configured")
if self.lock.locked():
raise RuntimeError("An update is already in progress")
job = UpdateJob(
id=uuid.uuid4().hex[:8],
started_at=datetime.now(timezone.utc).isoformat(),
)
self.jobs[job.id] = job
self.current_job_id = job.id
asyncio.create_task(self._run(job))
return job
async def _run(self, job: UpdateJob) -> None:
async with self.lock:
try:
await self._do(job)
if job.state != "failed":
job.state = "done"
job.returncode = 0
job.phase = "Done"
except asyncio.TimeoutError:
job.append(f"[error] update timed out after {_UPDATE_TIMEOUT_S}s")
job.state = "failed"
job.returncode = 124
job.phase = "Timed out"
except Exception as e:
job.append(f"[error] {type(e).__name__}: {e}")
job.state = "failed"
if job.returncode is None:
job.returncode = 1
finally:
job.finished_at = datetime.now(timezone.utc).isoformat()
if self.current_job_id == job.id:
self.current_job_id = None
async def _do(self, job: UpdateJob) -> None:
s = self.settings
cmd = build_update_command(s.matrix_bridge_dir, s.matrix_bridge_branch)
job.append(f"$ {cmd}")
job.state = "running"
job.phase = "Fetching latest code…"
handle = StreamHandle()
gen = ssh_stream(s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, handle=handle)
deadline = time.monotonic() + _UPDATE_TIMEOUT_S
try:
while True:
remaining = deadline - time.monotonic()
if remaining <= 0:
raise asyncio.TimeoutError
try:
line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
except StopAsyncIteration:
break
job.append(line)
phase = _phase_for(line)
if phase:
job.phase = phase
finally:
# Closing the generator terminates the underlying ssh process and
# populates handle.returncode via ssh_stream's finally block.
await gen.aclose()
rc = handle.returncode or 0
if rc != 0:
job.state = "failed"
job.returncode = rc
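The loop in `_do` applies one overall deadline to an async generator rather than a per-line timeout. A self-contained sketch of that pattern follows; `ticker` is a toy stand-in for `ssh_stream`, and the function name `drain_with_deadline` is hypothetical.

```python
import asyncio
import time


async def ticker(n: int, delay: float):
    # Toy stand-in for a streaming source such as ssh_stream.
    for i in range(n):
        await asyncio.sleep(delay)
        yield f"line {i}"


async def drain_with_deadline(gen, total_timeout_s: float) -> list:
    deadline = time.monotonic() + total_timeout_s
    lines = []
    try:
        while True:
            # Each wait gets only the time left on the shared deadline.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise asyncio.TimeoutError
            try:
                line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
            except StopAsyncIteration:
                break
            lines.append(line)
    finally:
        # Always close the generator so its cleanup code runs.
        await gen.aclose()
    return lines


lines = asyncio.run(drain_with_deadline(ticker(3, 0.01), total_timeout_s=1.0))
print(lines)
```

Recomputing `remaining` on every iteration is what makes the limit cumulative: a stream that keeps emitting lines slowly still hits the ceiling, which a fixed per-line timeout would never catch.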
+77 -7
@@ -1,15 +1,33 @@
from __future__ import annotations
import logging
from typing import Literal, Optional
import yaml
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator
from .overrides import apply_knobs_to_args, load_overrides
-from .shellsafe import quote_arg, quote_args
+from .shellsafe import quote_arg, quote_args, validate_local_path
log = logging.getLogger(__name__)
def _chat_template_path(vllm_args: list[str]) -> str | None:
"""Extract the path from a `--chat-template=<path>` arg, if present."""
for a in vllm_args:
if a.startswith("--chat-template="):
return a.split("=", 1)[1]
return None
def _is_within(path: str, base: str) -> bool:
"""True if `path` is `base` itself or lives inside it (lexical check)."""
base = base.rstrip("/")
return path == base or path.startswith(base + "/")
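The check is lexical, which has a consequence worth seeing concretely. The sketch below repeats `_is_within` (renamed `is_within`) and tries it on a few example paths; the paths are illustrative. Whether `..` segments are rejected elsewhere (for instance by `validate_local_path`) is not visible in this diff, so the `normpath` step is shown only as one possible hardening, not as what the project does.

```python
import posixpath


def is_within(path: str, base: str) -> bool:
    """True if `path` is `base` itself or lives inside it (lexical check)."""
    base = base.rstrip("/")
    return path == base or path.startswith(base + "/")


base = "/home/you/models/my-finetune"

inside = is_within(base + "/chat_template.jinja", base)
# A sibling directory sharing the prefix is rejected thanks to the added "/".
sibling = is_within("/home/you/models/my-finetune-v2/t.jinja", base)
# A ".." segment still passes, because nothing is resolved.
dotdot = is_within(base + "/../other/t.jinja", base)
# Normalizing first (an optional extra step, assumption) closes that gap.
dotdot_normalized = is_within(posixpath.normpath(base + "/../other/t.jinja"), base)

print(inside, sibling, dotdot, dotdot_normalized)
```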
class ModelDef(BaseModel):
display_name: str
-repo: str
+repo: str = "" # HF 'org/name'; empty for a local model
local_path: str | None = None # absolute dir on the Spark; set => local model
size_gb: float
mode: Literal["solo", "cluster"]
capabilities: list[str] = Field(default_factory=list)
@@ -19,6 +37,38 @@ class ModelDef(BaseModel):
knobs: dict | None = None # user-customized; merged at launch time
custom: bool = False # True if this came from /data overrides
@model_validator(mode="after")
def _validate_source(self) -> "ModelDef":
if bool(self.repo) == bool(self.local_path):
raise ValueError(
f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
f"or 'local_path' (Spark directory)"
)
if self.local_path:
# Single place that enforces the path whitelist, so YAML/override
# entries get the same boundary check as the API. The quote_arg sink
# is still defense-in-depth.
validate_local_path(self.local_path)
# Only local_path is bind-mounted into the vLLM container, so any
# --chat-template path must live inside it or vLLM can't find it.
tmpl = _chat_template_path(self.vllm_args)
if tmpl is not None and not _is_within(tmpl, self.local_path):
raise ValueError(
f"--chat-template path {tmpl!r} must be inside the model "
f"directory {self.local_path!r} (only that directory is mounted "
f"into the container)"
)
return self
@property
def is_local(self) -> bool:
return bool(self.local_path)
@property
def source(self) -> str:
"""What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
return self.local_path if self.local_path else self.repo
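The "exactly one of `repo` or `local_path`" rule reduces to comparing two truthiness values. The sketch below shows that check with a plain dataclass instead of pydantic, purely to stay dependency-free; `ModelSource` is a hypothetical name and the path whitelist and chat-template checks are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSource:
    display_name: str
    repo: str = ""
    local_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Equal truthiness means both are set or both are empty: reject either.
        if bool(self.repo) == bool(self.local_path):
            raise ValueError(
                f"model {self.display_name!r} must set exactly one of "
                f"'repo' or 'local_path'"
            )

    @property
    def source(self) -> str:
        # What the server would be pointed at: local dir if set, else the repo.
        return self.local_path if self.local_path else self.repo


def is_rejected(**kwargs) -> bool:
    try:
        ModelSource(display_name="demo", **kwargs)
    except ValueError:
        return True
    return False


hf = ModelSource("HF model", repo="my-org/my-model")
local = ModelSource("Fine-tune", local_path="/home/you/models/my-finetune")
print(hf.source, local.source)
```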
class Defaults(BaseModel):
port: int = 8888
@@ -47,7 +97,8 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
continue
defaults_dump = {
"display_name": entry.get("display_name", key),
-"repo": entry["repo"],
+"repo": entry.get("repo", ""),
"local_path": entry.get("local_path"),
"size_gb": float(entry.get("size_gb", 0)),
"mode": entry.get("mode", "solo"),
"capabilities": entry.get("capabilities") or [],
@@ -57,7 +108,12 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
"knobs": entry.get("knobs"),
"custom": True,
}
# A single malformed override entry (bad path, missing source, etc.) must
# not take down the whole catalog — skip it and keep the rest loadable.
try:
new_models[key] = ModelDef.model_validate(defaults_dump)
except Exception as e:
log.warning("skipping invalid custom model %r: %s", key, e)
return Catalog(defaults=catalog.defaults, models=new_models)
@@ -78,7 +134,21 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
solo = "--solo " if model.mode == "solo" else ""
base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
-# repo + args are user-controlled (custom models, knobs); shlex.quote each so
-# they cannot break out of the SSH shell command. shlex.split (used by the
+# source + args are user-controlled (custom models, knobs); shlex.quote each
+# so they cannot break out of the SSH shell command. shlex.split (used by the
# vLLM pre-flight validator) cleanly reverses this quoting.
-return f"./launch-cluster.sh {solo}-d exec vllm serve {quote_arg(model.repo)} {quote_args(args)}"
+prefix = ""
if model.local_path:
# A local model's directory isn't in the HF cache the launch script
# already mounts, so bind-mount it at the SAME path inside the vllm
# container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
# path inside and out means `vllm serve <dir>` and any
# `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
# change needed. (The env assignment sits before the script, so the
# validator's `serve`-keyed shlex round-trip is unaffected.)
mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
return (
f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
f"{quote_arg(model.source)} {quote_args(args)}"
)
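A sketch of the command this builds for a local model. It assumes `quote_arg` is equivalent to `shlex.quote` and that `quote_args` joins individually quoted arguments with spaces; neither helper is shown in the diff, so both are stand-ins. The path is the example one from the overrides docstring.

```python
import shlex


def launch_command(source: str, local_path, args: list, solo: bool = True) -> str:
    solo_flag = "--solo " if solo else ""
    prefix = ""
    if local_path:
        # Mount the directory at the same path inside the container, and quote
        # the whole "-v src:dst" value so it stays one env assignment.
        mount = shlex.quote(f"-v {local_path}:{local_path}")
        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
    quoted_args = " ".join(shlex.quote(a) for a in args)  # stand-in for quote_args
    return (
        f"{prefix}./launch-cluster.sh {solo_flag}-d exec vllm serve "
        f"{shlex.quote(source)} {quoted_args}"
    )


path = "/home/you/models/my-finetune"
cmd = launch_command(path, path, ["--port=8888"])
tokens = shlex.split(cmd)
print(cmd)
```

Splitting the result with `shlex.split` gives the env assignment back as a single token, and the token after `serve` is still the model directory, which matches the comment's point that the prefix does not disturb a `serve`-keyed round-trip.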
+7 -1
@@ -14,7 +14,7 @@ Shape:
custom:
- key: my-new-model
display_name: My New Model (from download)
-repo: my-org/my-model
+repo: my-org/my-model # an HF repo; OR set local_path instead (exactly one)
size_gb: 20
mode: solo
description: null
@@ -25,6 +25,12 @@ Shape:
fastsafetensors: true
prefix_caching: true
kv_cache_dtype: fp8
- key: my-finetune # a local/fine-tuned model (a directory on the Spark)
display_name: My Fine-tune
local_path: /home/you/models/my-finetune
size_gb: 59
mode: solo
vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
"""
from __future__ import annotations
import os
+359 -67
@@ -1,29 +1,34 @@
from __future__ import annotations
import asyncio
import json
import os
from pathlib import Path
-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, HTTPException, Query, Request
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
-from pydantic import BaseModel
+from pydantic import BaseModel, ValidationError
from typing import Literal
from . import app_settings
from .config import Settings
from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
from .coordination import LockHeld, ScheduleRegistry, SwapLockManager, WebhookNotifier, valid_schedule_id
from .custom_services import add_custom_service, delete_custom_service
from .audio_proxy import build_router as build_audio_router
from .deep_health import DeepHealth
-from .disk import delete_from_disk, probe_disk
+from .discovery import build_menu, infer_recipe, repo_to_key
from .disk import delete_from_disk, probe_host, read_model_config
from .download import DownloadManager
from .llm_proxy import build_router as build_llm_router
from .embeddings_proxy import build_router as build_embeddings_router
from .redaction_gateway import build_router as build_redaction_router, MapStore
from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
-from .models import load_catalog
+from .matrix_bridge import MatrixBridgeManager
from .models import ModelDef, load_catalog
from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
-from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
+from .overrides import add_custom, delete_custom, load_overrides, set_knobs
from .services import docker_state, run_action, services_from_settings
from .shellsafe import validate_container, validate_image, validate_repo
from .speech_models import SpeechModelsManager
@@ -34,15 +39,25 @@ from .validate import validate_launch
from .wol import send_local_broadcast, send_via_peer from .wol import send_local_broadcast, send_via_peer
# One-time migration: seed the in-app settings overlay from env (values set via
# the StartOS action on a pre-gear install) before building Settings, so nothing
# is lost on upgrade. No-op once the overlay exists. See app_settings.
app_settings.seed_from_env(os.environ)
settings = Settings.from_env() settings = Settings.from_env()
catalog = load_catalog(settings.models_yaml) catalog = load_catalog(settings.models_yaml)
swap_manager = SwapManager(settings, catalog) # Coordination layer (GPU arbiter): swap-lifecycle webhook, the swap reservation
# lock, and the read-only schedule registry. See coordination.py.
swap_webhook = WebhookNotifier(settings.swap_webhook_url, settings.swap_webhook_secret)
swap_lock = SwapLockManager()
schedule_registry = ScheduleRegistry()
swap_manager = SwapManager(settings, catalog, notifier=swap_webhook)
download_manager = DownloadManager(settings) download_manager = DownloadManager(settings)
update_manager = UpdateManager(settings) update_manager = UpdateManager(settings)
hardware_probe = HardwareProbe(settings) hardware_probe = HardwareProbe(settings)
nim_manager = NimManager(settings) nim_manager = NimManager(settings)
deep_health = DeepHealth(settings) deep_health = DeepHealth(settings)
speech_models = SpeechModelsManager(settings) speech_models = SpeechModelsManager(settings)
matrix_bridge = MatrixBridgeManager(settings)
app = FastAPI(title="spark-control", version="0.1.0") app = FastAPI(title="spark-control", version="0.1.0")
@@ -65,6 +80,10 @@ _CSRF_EXEMPT_PREFIXES = (
"/api/audio/", # diarize-chunk / label-merge / transcribe-with-speakers "/api/audio/", # diarize-chunk / label-merge / transcribe-with-speakers
"/api/health-event", # health reports posted by consumer apps "/api/health-event", # health reports posted by consumer apps
) )
# Note: the coordination endpoints (/api/swap/lock, /api/schedule) are
# intentionally NOT exempt. External schedulers are non-browser clients (no
# Origin header) so they pass the guard already — same as /api/swap — while a
# malicious page can't drive them from the operator's browser. Don't add them.
@app.middleware("http") @app.middleware("http")
@@ -143,26 +162,100 @@ async def get_config() -> dict:
} }
# ---- In-app settings ('gear') ----
# The optional cluster knobs (ports, container names, support-service hosts,
# integrations) live in an app-owned overlay on /data, edited here instead of in
# the StartOS action — which keeps to just the four required setup fields. See
# app_settings. Writes apply live: we rewrite the overlay then reload the shared
# Settings instance in place, so every router/manager holding the reference picks
# up the change with no container restart.
@app.get("/api/settings")
async def get_settings() -> dict:
return app_settings.public_view()
class SettingsUpdate(BaseModel):
values: dict[str, str]
@app.post("/api/settings")
async def post_settings(req: SettingsUpdate) -> dict:
try:
app_settings.apply(req.values)
except app_settings.SettingsError as e:
raise HTTPException(422, str(e))
settings.reload()
# WebhookNotifier snapshots url/secret (not the Settings object), so reload()
# can't reach it — re-point it explicitly so a webhook edit applies live too.
swap_webhook.update(settings.swap_webhook_url, settings.swap_webhook_secret)
return app_settings.public_view()
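The overlay-then-reload flow described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the project's `app_settings` or `Settings` code: the `Settings` stand-in, `apply_overlay`, and the env-var names `VLLM_PORT` and `SWAP_WEBHOOK_URL` are assumptions made up for the example.

```python
import json
import os
import tempfile
from pathlib import Path


class Settings:
    """Hypothetical stand-in: env values with overlay values layered on top."""

    def __init__(self, overlay_path: Path, env: dict):
        self._overlay_path = overlay_path
        self._env = env
        self.reload()

    def reload(self) -> None:
        # Mutate this instance in place, so anything that captured the
        # reference earlier sees the new values without being rebuilt.
        overlay = {}
        if self._overlay_path.exists():
            overlay = json.loads(self._overlay_path.read_text())
        merged = {**self._env, **overlay}  # overlay wins over env
        self.vllm_port = int(merged.get("VLLM_PORT", "8000"))
        self.swap_webhook_url = merged.get("SWAP_WEBHOOK_URL", "")


def apply_overlay(overlay_path: Path, values: dict) -> None:
    """Merge `values` (keyed by env-var name) into the overlay file."""
    current = json.loads(overlay_path.read_text()) if overlay_path.exists() else {}
    current.update(values)
    tmp = overlay_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(current, indent=2))
    os.replace(tmp, overlay_path)  # atomic rename on the same filesystem


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "app_settings.json"
        settings = Settings(path, env={"VLLM_PORT": "8000"})
        holder = settings  # e.g. a router that captured the shared instance
        before = holder.vllm_port
        apply_overlay(path, {"VLLM_PORT": "8001"})
        settings.reload()
        return before, holder.vllm_port
```

Anything that snapshots individual values instead of holding the instance (as the webhook notifier does with its url/secret) still needs an explicit update after `reload()`.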
def _reload_catalog() -> None: def _reload_catalog() -> None:
global catalog global catalog
catalog = load_catalog(settings.models_yaml) catalog = load_catalog(settings.models_yaml)
swap_manager.reload_catalog(catalog) swap_manager.reload_catalog(catalog)
def _recipe_summaries() -> list[dict]:
"""Known launch recipes (bundled + saved), for the download panel's autocomplete.
These are NOT the menu — the menu is what's on disk. This is just the set of
repos Spark Control already knows how to launch, so the download box can
suggest them by name without putting phantom cards on the dashboard."""
out = []
for m in catalog.models.values():
if m.repo:
out.append({"repo": m.repo, "display_name": m.display_name, "mode": m.mode})
return out
@app.get("/api/models") @app.get("/api/models")
async def get_models() -> dict: async def get_models() -> dict:
-    out_models: dict[str, dict] = {}
-    for key, m in catalog.models.items():
-        d = m.model_dump()
-        # Always include effective knobs for the UI (defaults from base args + any overrides)
-        d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
-        out_models[key] = d
+    """The model menu = what's actually downloaded on the Sparks (one scan per
+    Spark), each annotated with its launch recipe or flagged `needs_setup`.
+
+    Does SSH, so it's the slower of the model endpoints; the front-end calls it on
+    load, after a swap/download/delete, and on a slow timer, not on every poll."""
+    if not settings.configured:
+        return {"configured": False, "defaults": catalog.defaults.model_dump(), "models": {}, "recipes": []}
+    menu = await build_menu(settings, catalog)
     return {
+        "configured": True,
         "defaults": catalog.defaults.model_dump(),
-        "models": out_models,
+        "models": menu,
+        "recipes": _recipe_summaries(),
     }
@app.get("/api/models/suggest")
async def suggest_model(repo: str = Query(...)) -> dict:
"""Read a downloaded model's config.json + size and propose a launch recipe.
Prefills the 'set up this model' form for an on-disk model that has no recipe
yet. The operator confirms/edits, then POSTs it to /api/models to save."""
if not settings.configured:
raise HTTPException(503, "spark1 not configured")
try:
validate_repo(repo)
except ValueError as e:
raise HTTPException(400, str(e))
hosts = [(settings.spark1_host, settings.spark1_user)]
if settings.spark2_host:
hosts.append((settings.spark2_host, settings.spark2_user))
# Config from whichever Spark has it; size summed across the Sparks that do.
sizes = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
total = sum(r.size_bytes for r in sizes if r.on_disk)
on_hosts = sum(1 for r in sizes if r.on_disk)
config = None
for (h, u), r in zip(hosts, sizes):
if r.on_disk:
config = await read_model_config(h, u, repo, settings)
if config is not None:
break
return infer_recipe(repo, config or {}, total, on_hosts)
class KnobsBody(BaseModel): class KnobsBody(BaseModel):
knobs: dict knobs: dict
@@ -181,7 +274,8 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
class CustomModelBody(BaseModel): class CustomModelBody(BaseModel):
key: str key: str
display_name: str display_name: str
repo: str repo: str = ""
local_path: str | None = None
size_gb: float = 0 size_gb: float = 0
mode: Literal["solo", "cluster"] = "solo" mode: Literal["solo", "cluster"] = "solo"
description: str | None = None description: str | None = None
@@ -194,8 +288,17 @@ class CustomModelBody(BaseModel):
async def post_model(body: CustomModelBody) -> dict: async def post_model(body: CustomModelBody) -> dict:
if not body.key or not body.key.replace("-", "").replace("_", "").isalnum(): if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
raise HTTPException(400, "key must be alphanumeric/-/_ only") raise HTTPException(400, "key must be alphanumeric/-/_ only")
# Validate the full entry BEFORE persisting (exactly-one source, local-path
# whitelist, chat-template location). Doing it via ModelDef means the API and
# the YAML-override path share one set of rules, and a bad entry can't be
# written to /data and then break catalog load.
try: try:
validate_repo(body.repo) ModelDef.model_validate(body.model_dump())
if body.repo:
validate_repo(body.repo) # HF charset (the model only validates local paths)
except ValidationError as e:
msg = e.errors()[0]["msg"] if e.errors() else str(e)
raise HTTPException(400, msg.removeprefix("Value error, "))
except ValueError as e: except ValueError as e:
raise HTTPException(400, str(e)) raise HTTPException(400, str(e))
if body.key in catalog.models and not catalog.models[body.key].custom: if body.key in catalog.models and not catalog.models[body.key].custom:
@@ -216,57 +319,43 @@ async def del_model(key: str) -> dict:
return {"ok": True, "key": key} return {"ok": True, "key": key}
@app.get("/api/models/disk-status")
async def get_models_disk_status() -> dict:
"""Probe each catalog model's HF cache on the appropriate Spark(s) in parallel.
Result is keyed by model key: {on_disk, total_bytes, per_host:[{host,on_disk,size_bytes,error?}]}.
Designed to be called once on dashboard load; takes ~13s depending on Spark count.
"""
if not settings.configured:
return {"configured": False, "models": {}}
keys = list(catalog.models.keys())
statuses = await asyncio.gather(*(
probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
), return_exceptions=True)
out: dict[str, dict] = {}
for k, s in zip(keys, statuses):
if isinstance(s, Exception):
out[k] = {"on_disk": False, "total_bytes": 0, "per_host": [], "error": str(s)}
continue
out[k] = {
"on_disk": s.on_disk,
"total_bytes": s.total_bytes,
"per_host": [
{"host": r.host, "on_disk": r.on_disk, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
for r in s.per_host
],
}
return {"configured": True, "models": out}
@app.delete("/api/models/{key}/disk") @app.delete("/api/models/{key}/disk")
 async def del_model_disk(key: str) -> dict:
-    """Delete a model's weights from the Spark filesystem(s). The catalog entry stays.
+    """Remove a model's weights from the Sparks, and thus from the menu, since the
+    menu IS the disk. Resolves the key against the live menu, so a discovered
+    model (no saved recipe) is deletable too.

     Safety rails:
+    - Refuses a local/fine-tuned directory (hand-placed, not re-downloadable).
     - Refuses if the model is currently loaded on vLLM.
-    - Refuses if a swap or download is in flight.
-    - Idempotent: if the cache dir is already gone on a host, that host reports 0 bytes freed.
+    - Refuses if a swap or this model's own download is in flight.
+    - Idempotent across both Sparks: an already-absent cache dir frees 0 bytes.
     """
-    if key not in catalog.models:
+    if not settings.configured:
+        raise HTTPException(503, "spark1 not configured")
+    menu = await build_menu(settings, catalog)
+    entry = menu.get(key)
+    if entry is None:
         raise HTTPException(404, f"unknown model: {key}")
-    m = catalog.models[key]
# Never rm a local fine-tune directory from the dashboard — it's irreplaceable
# training output the user placed by hand, not a re-downloadable HF cache.
if entry.get("local_path"):
raise HTTPException(
400,
"this is a local model; its directory must be managed on the Spark, not deleted from here",
)
repo = entry["repo"]
# Refuse if currently loaded # Refuse if currently loaded
try: try:
vllm = await check_vllm(settings) vllm = await check_vllm(settings)
except Exception: except Exception:
vllm = {} vllm = {}
if vllm.get("ok") and vllm.get("current_model") == m.repo: if vllm.get("ok") and vllm.get("current_model") == repo:
raise HTTPException( raise HTTPException(
409, 409,
f"'{m.display_name}' is the currently loaded model. Switch to a different model first, then try again." f"'{entry['display_name']}' is the currently loaded model. Switch to a different model first, then try again."
) )
# Refuse if a swap is in flight # Refuse if a swap is in flight
@@ -276,10 +365,10 @@ async def del_model_disk(key: str) -> dict:
# Refuse if a download is in flight for this same repo (a different model's download is fine) # Refuse if a download is in flight for this same repo (a different model's download is fine)
if download_manager.current_job_id: if download_manager.current_job_id:
job = download_manager.get(download_manager.current_job_id) job = download_manager.get(download_manager.current_job_id)
if job and job.repo == m.repo: if job and job.repo == repo:
raise HTTPException(409, "this model is currently downloading; cancel or wait for it to finish") raise HTTPException(409, "this model is currently downloading; cancel or wait for it to finish")
status = await delete_from_disk(m.repo, m.mode, settings) status = await delete_from_disk(repo, settings)
# Audit log # Audit log
record_report( record_report(
f"disk:{key}", f"disk:{key}",
@@ -290,7 +379,7 @@ async def del_model_disk(key: str) -> dict:
return { return {
"ok": True, "ok": True,
"key": key, "key": key,
"repo": m.repo, "repo": repo,
"bytes_freed": status.total_bytes, "bytes_freed": status.total_bytes,
"per_host": [ "per_host": [
{"host": r.host, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})} {"host": r.host, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
@@ -474,6 +563,15 @@ async def get_services() -> dict:
http = await check_embeddings(settings) http = await check_embeddings(settings)
elif name == "qdrant": elif name == "qdrant":
http = await check_qdrant(settings) http = await check_qdrant(settings)
elif svc.kind == "vllm":
# An extra vLLM monitored on another Spark (registered as a custom
# service). Probe its own host/port, not the primary Spark 1 one.
http = await probe_vllm_endpoint(svc.host, svc.port)
elif svc.kind == "bot":
# No HTTP health endpoint (host networking, no port) — judged purely
# by docker state. http_ready stays None so the badge isn't pinned
# to a "Starting…" verdict that can never clear.
http = {"ok": None, "base_url": None}
else: else:
# Custom services expose a /health endpoint by convention. # Custom services expose a /health endpoint by convention.
http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"} http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"}
@@ -484,11 +582,13 @@ async def get_services() -> dict:
"container": svc.container, "container": svc.container,
"kind": svc.kind, "kind": svc.kind,
"base_url": http.get("base_url"), "base_url": http.get("base_url"),
"http_ready": bool(http.get("ok")), # None (not False) for services with no HTTP surface (the bot), so
# the UI judges them by docker state alone instead of "Starting…".
"http_ready": None if svc.kind == "bot" else bool(http.get("ok")),
# Prefer the check fn's own top-level model key (embeddings reports # Prefer the check fn's own top-level model key (embeddings reports
# it there); fall back to a model field inside detail for services # it there); fall back to a model field inside detail for services
# whose /health embeds it (parakeet). # whose /health embeds it (parakeet).
"model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None), "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
"docker_state": docker.get("state"), "docker_state": docker.get("state"),
"restart_count": docker.get("restart_count"), "restart_count": docker.get("restart_count"),
"started_at": docker.get("started_at"), "started_at": docker.get("started_at"),
@@ -500,7 +600,10 @@ async def get_services() -> dict:
results = await asyncio.gather(*[one(n) for n in services.keys()]) results = await asyncio.gather(*[one(n) for n in services.keys()])
for name, info in results: for name, info in results:
out[name] = info out[name] = info
# Feed http reachability into the connectivity log (transition-only) # Feed http reachability into the connectivity log (transition-only).
# Skip services with no HTTP surface (http_ready is None) — they'd
# otherwise register as perpetually "down".
if info.get("http_ready") is not None:
record_state(name, bool(info.get("http_ready"))) record_state(name, bool(info.get("http_ready")))
return out return out
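The "transition-only" logging and the `None` skip mentioned in the comments above might look something like this. The real `record_state` lives in the connectivity module and is not shown in this diff, so `TransitionLog` and `feed` are hypothetical names used only to illustrate the idea.

```python
from typing import Optional


class TransitionLog:
    """Hypothetical stand-in: remembers the last state seen per service."""

    def __init__(self) -> None:
        self._last = {}
        self.events = []

    def record_state(self, name: str, up: bool) -> None:
        if self._last.get(name) == up:
            return  # state unchanged since last call: nothing to log
        self._last[name] = up
        self.events.append((name, up))


def feed(log: TransitionLog, name: str, http_ready: Optional[bool]) -> None:
    # None means "no HTTP surface" (e.g. the bot). Skipping it avoids
    # recording such a service as permanently down.
    if http_ready is not None:
        log.record_state(name, bool(http_ready))
```

Polling a service five times as up, up, down, down, up yields three events under this scheme, and a service whose `http_ready` is `None` yields none.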
@@ -606,7 +709,7 @@ async def stream_nim_install(job_id: str):
@app.delete("/api/services/{name}") @app.delete("/api/services/{name}")
async def del_service(name: str) -> dict: async def del_service(name: str) -> dict:
# Only allow deleting custom services (not the bundled built-in keys) # Only allow deleting custom services (not the bundled built-in keys)
if name in ("parakeet", "kokoro", "embeddings", "qdrant"): if name in ("parakeet", "kokoro", "embeddings", "qdrant", "matrix-bridge"):
raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)") raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)")
delete_custom_service(name) delete_custom_service(name)
return {"ok": True, "name": name} return {"ok": True, "name": name}
@@ -625,6 +728,81 @@ async def service_action(name: str, action: str) -> dict:
return {"name": name, "action": action, **result} return {"name": name, "action": action, **result}
# ---- matrix-bridge bot: update (git pull + rebuild) + logs ----
# Status badge + start/stop/restart ride the generic /api/services machinery
# above (the bot is a registered ServiceDef). Only the long-running Update and
# the logs view need bespoke endpoints.
def _serialize_mb_update(job) -> dict:
return {
"id": job.id,
"state": job.state,
"phase": job.phase,
"started_at": job.started_at,
"finished_at": job.finished_at,
"returncode": job.returncode,
"lines": job.lines,
}
@app.post("/api/matrix-bridge/update")
async def post_matrix_bridge_update() -> dict:
"""Pull latest code, rebuild, and recreate the bot container. Long-running
(docker build) — returns a job id to stream."""
try:
job = await matrix_bridge.trigger_update()
except RuntimeError as e:
raise HTTPException(409 if "in progress" in str(e) else 503, str(e))
return {"job_id": job.id, "state": job.state}
@app.get("/api/matrix-bridge/update/{job_id}")
async def get_matrix_bridge_update(job_id: str) -> dict:
job = matrix_bridge.get(job_id)
if job is None:
raise HTTPException(404, "no such job")
return _serialize_mb_update(job)
@app.get("/api/matrix-bridge/update/{job_id}/stream")
async def stream_matrix_bridge_update(job_id: str, request: Request):
job = matrix_bridge.get(job_id)
if job is None:
raise HTTPException(404, "no such job")
async def gen():
sent = 0
last_phase = None
while True:
# An update can run for minutes; bail promptly if the client is gone
# rather than spinning the poll loop until the job's 25-min ceiling.
if await request.is_disconnected():
return
n = len(job.lines)
if n > sent:
for line in job.lines[sent:n]:
yield f"data: {json.dumps({'line': line})}\n\n"
sent = n
if job.phase != last_phase:
yield f"event: phase\ndata: {json.dumps({'state': job.state, 'phase': job.phase})}\n\n"
last_phase = job.phase
if job.returncode is not None and sent >= len(job.lines):
yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
return
await asyncio.sleep(0.5)
return StreamingResponse(gen(), media_type="text/event-stream")
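The stream above emits three kinds of Server-Sent Events frames: unnamed `data:` frames for log lines, and named `phase` and `done` events. As a rough illustration of that wire format, here is a small formatter and parser; `sse_frame` and `parse_sse` are hypothetical helpers, not part of the project, and the parser handles only the single-line `data:` frames used here.

```python
import json
from typing import Optional


def sse_frame(data: dict, event: Optional[str] = None) -> str:
    """Format one SSE frame; the trailing blank line terminates it."""
    head = f"event: {event}\n" if event else ""
    return f"{head}data: {json.dumps(data)}\n\n"


def parse_sse(stream: str) -> list:
    """Parse frames into (event, payload) pairs.

    Frames without an `event:` field get the default type "message".
    """
    out = []
    for block in stream.split("\n\n"):
        if not block.strip():
            continue
        event, payload = "message", None
        for line in block.split("\n"):
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                payload = json.loads(line[len("data: "):])
        out.append((event, payload))
    return out
```

A client would typically append `message` payloads to the log view, update a status badge on `phase`, and close the connection on `done`.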
@app.get("/api/matrix-bridge/logs")
async def get_matrix_bridge_logs(tail: int = Query(100, ge=1, le=1000)) -> dict:
"""Last N lines of `docker logs` for the bot container (stderr merged)."""
result = await matrix_bridge.fetch_logs(tail=tail)
if not result.get("ok"):
raise HTTPException(502, result.get("output") or result.get("error") or "could not read logs")
return result
# ---- Speech model patch management ---- # ---- Speech model patch management ----
@app.get("/api/speech-models") @app.get("/api/speech-models")
@@ -688,17 +866,20 @@ async def get_endpoints() -> dict:
"base_url": vllm.get("base_url"), "base_url": vllm.get("base_url"),
"model": vllm.get("current_model"), "model": vllm.get("current_model"),
"openai_compat": True, "openai_compat": True,
"disabled": bool(vllm.get("disabled")),
}, },
"parakeet": { "parakeet": {
"ready": bool(parakeet.get("ok")), "ready": bool(parakeet.get("ok")),
"base_url": parakeet.get("base_url"), "base_url": parakeet.get("base_url"),
"kind": "stt", "kind": "stt",
"model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None, "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
"disabled": bool(parakeet.get("disabled")),
}, },
"kokoro": { "kokoro": {
"ready": bool(kokoro.get("ok")), "ready": bool(kokoro.get("ok")),
"base_url": kokoro.get("base_url"), "base_url": kokoro.get("base_url"),
"kind": "tts", "kind": "tts",
"disabled": bool(kokoro.get("disabled")),
}, },
"embeddings": { "embeddings": {
"ready": bool(embeddings.get("ok")), "ready": bool(embeddings.get("ok")),
@@ -707,12 +888,14 @@ async def get_endpoints() -> dict:
"model": embeddings.get("model"), "model": embeddings.get("model"),
# The proxied OpenAI-compatible endpoints live on Spark Control itself. # The proxied OpenAI-compatible endpoints live on Spark Control itself.
"openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"], "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
"disabled": bool(embeddings.get("disabled")),
}, },
"qdrant": { "qdrant": {
"ready": bool(qdrant.get("ok")), "ready": bool(qdrant.get("ok")),
"base_url": qdrant.get("base_url"), "base_url": qdrant.get("base_url"),
"kind": "vectordb", "kind": "vectordb",
"collection": settings.qdrant_collection or None, "collection": settings.qdrant_collection or None,
"disabled": bool(qdrant.get("disabled")),
}, },
} }
@@ -726,12 +909,15 @@ async def get_status() -> dict:
check_embeddings(settings), check_embeddings(settings),
check_qdrant(settings), check_qdrant(settings),
) )
-    # Feed health into the connectivity log (deduped: only logs on transition)
-    record_state("vllm", bool(vllm.get("ok")))
-    record_state("parakeet", bool(parakeet.get("ok")))
-    record_state("kokoro", bool(kokoro.get("ok")))
-    record_state("embeddings", bool(embeddings.get("ok")))
-    record_state("qdrant", bool(qdrant.get("ok")))
+    # Feed health into the connectivity log (deduped: only logs on transition).
+    # Skip services switched off via DISABLED_SERVICES; they'd otherwise log as
+    # perpetually down.
+    for _name, _r in (
+        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
+        ("embeddings", embeddings), ("qdrant", qdrant),
+    ):
+        if not _r.get("disabled"):
+            record_state(_name, bool(_r.get("ok")))
current_key = _identify_current_model(vllm.get("current_model")) current_key = _identify_current_model(vllm.get("current_model"))
return { return {
"configured": settings.configured, "configured": settings.configured,
@@ -748,10 +934,13 @@ async def get_status() -> dict:
def _identify_current_model(repo: str | None) -> str | None: def _identify_current_model(repo: str | None) -> str | None:
if not repo: if not repo:
return None return None
# A recipe-backed model keys by its recipe key; a discovered model (loaded but
# not yet set up) keys by the same slug build_menu uses, so it still
# highlights as the active card.
for key, m in catalog.models.items(): for key, m in catalog.models.items():
if m.repo == repo: if m.repo == repo:
return key return key
return None return repo_to_key(repo)
class SwapRequest(BaseModel): class SwapRequest(BaseModel):
@@ -769,9 +958,21 @@ async def validate_swap(key: str) -> dict:
@app.post("/api/swap") @app.post("/api/swap")
async def post_swap(req: SwapRequest) -> dict: async def post_swap(req: SwapRequest, request: Request) -> dict:
if not settings.configured and not req.dry_run: if not settings.configured and not req.dry_run:
raise HTTPException(503, "spark1 not configured") raise HTTPException(503, "spark1 not configured")
# Enforce the swap reservation lock (the GPU arbiter). A held lock blocks any
# real swap that doesn't present the holder's token in X-Swap-Lock-Token — so
# an external scheduler that holds the lock can swap, but the dashboard (no
# token) is refused while someone else holds it. Dry runs don't touch the
# cluster, so they're exempt.
if not req.dry_run:
blocked = swap_lock.is_blocked_by(request.headers.get("x-swap-lock-token"))
if blocked is not None:
raise HTTPException(status_code=423, detail={
"error": "the GPU swap path is reserved by another holder",
"lock": blocked,
})
try: try:
job = await swap_manager.trigger(req.model_key, dry_run=req.dry_run) job = await swap_manager.trigger(req.model_key, dry_run=req.dry_run)
except KeyError: except KeyError:
@@ -781,6 +982,56 @@ async def post_swap(req: SwapRequest) -> dict:
return {"job_id": job.id, "model_key": job.model_key, "state": job.state} return {"job_id": job.id, "model_key": job.model_key, "state": job.state}
# ---- Swap reservation lock (the GPU arbiter) ----
# ROUTE ORDER IS LOAD-BEARING: these static `/api/swap/lock` routes MUST be
# registered before the parametric `/api/swap/{job_id}` below. FastAPI matches in
# registration order, so if `{job_id}` came first, GET /api/swap/lock would bind
# job_id="lock", look up a (non-existent) swap job, and 404 — which is exactly
# the bug this ordering fixes. Keep these above the {job_id} routes.
# CSRF: these are control-surface, not browser-exempt — an external scheduler is
# a non-browser client (no Origin header) so it passes the guard already, the
# same way it calls /api/swap; the dashboard is same-origin.
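The ordering hazard described above can be reproduced with a toy first-match router. This is only an illustration of "first registered match wins", not FastAPI's actual matching code; `make_router` is a made-up helper.

```python
import re


def make_router(routes: list):
    """Compile (pattern, handler_name) pairs; `{name}` matches one path segment."""
    compiled = [
        (re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern) + "$"), name)
        for pattern, name in routes
    ]

    def resolve(path: str):
        for regex, name in compiled:  # first registered match wins
            m = regex.match(path)
            if m:
                return name, m.groupdict()
        return None

    return resolve


# Static route registered first: /api/swap/lock reaches the lock handler.
good = make_router([
    ("/api/swap/lock", "get_swap_lock"),
    ("/api/swap/{job_id}", "get_swap"),
])

# Parametric route registered first: "lock" is captured as a job id.
bad = make_router([
    ("/api/swap/{job_id}", "get_swap"),
    ("/api/swap/lock", "get_swap_lock"),
])
```

With the `bad` ordering, `bad("/api/swap/lock")` resolves to the job handler with `job_id="lock"`, which is the 404 path the comment warns about.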
class LockAcquireRequest(BaseModel):
holder: str
ttl_seconds: int | None = None
note: str = ""
token: str | None = None # present only to extend an existing hold
@app.post("/api/swap/lock")
async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
"""Reserve the GPU swap path. Returns a secret token used to swap (header
X-Swap-Lock-Token) and to release. 409 if held by another holder."""
try:
lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
except ValueError as e:
raise HTTPException(422, str(e))
except LockHeld as e:
raise HTTPException(status_code=409, detail={
"error": "swap lock is held by another holder",
"lock": e.state,
})
return {**swap_lock.status(), "token": lock.token}
@app.get("/api/swap/lock")
async def get_swap_lock() -> dict:
"""Public, token-free view of the reservation: held? who? until when?"""
return swap_lock.status()
@app.delete("/api/swap/lock")
async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
"""Release the reservation. Needs the matching X-Swap-Lock-Token unless
?force=true (the human override from the dashboard)."""
token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
try:
released = swap_lock.release(token, force=force)
except PermissionError as e:
raise HTTPException(403, str(e))
return {"released": released, **swap_lock.status()}
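The reservation semantics used by these routes (acquire returns a secret token, a held lock blocks callers without that token, release needs the token unless forced, and holds expire) can be sketched with a small in-memory class. `SwapLockManager` itself is defined in coordination.py and is not shown in this diff, so `ToySwapLock`, its default TTL, and its return shapes are assumptions for illustration only.

```python
import secrets
import time


class LockHeld(Exception):
    """Raised when another holder currently owns the reservation."""


class ToySwapLock:
    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._holder = None
        self._token = None
        self._expires = 0.0

    def _held(self) -> bool:
        return self._token is not None and self._clock() < self._expires

    def acquire(self, holder: str, ttl_seconds: float = 60.0, token=None) -> str:
        if self._held() and token != self._token:
            raise LockHeld(self._holder)
        if not self._held():
            self._token = secrets.token_urlsafe(16)  # fresh hold, fresh token
        # A matching token falls through here and simply extends the hold.
        self._holder = holder
        self._expires = self._clock() + ttl_seconds
        return self._token

    def is_blocked_by(self, token):
        """None if the caller may swap; otherwise the public lock state."""
        if not self._held() or token == self._token:
            return None
        return {"holder": self._holder}

    def release(self, token=None, force: bool = False) -> bool:
        if not self._held():
            return False
        if not force and token != self._token:
            raise PermissionError("token does not match the current hold")
        self._token = None
        return True
```

An external scheduler would acquire, send the token in `X-Swap-Lock-Token` on its swap call, then release; a caller without the token is refused until the hold is released or its TTL lapses.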
@app.get("/api/swap/{job_id}") @app.get("/api/swap/{job_id}")
async def get_swap(job_id: str) -> dict: async def get_swap(job_id: str) -> dict:
job = swap_manager.get(job_id) job = swap_manager.get(job_id)
@@ -826,6 +1077,47 @@ async def stream_swap(job_id: str):
return StreamingResponse(gen(), media_type="text/event-stream") return StreamingResponse(gen(), media_type="text/event-stream")
# ---- Coordination layer: read-only schedule registry ----
# (The swap reservation lock lives above, next to the swap routes.) Same CSRF
# posture: control-surface, not browser-exempt — external schedulers send no
# Origin header so they pass the guard; the dashboard is same-origin.
class ScheduleRequest(BaseModel):
name: str
id: str | None = None
owner: str = ""
cron: str = ""
next_run: str = ""
description: str = ""
@app.get("/api/schedule")
async def list_schedules() -> dict:
return {"schedules": schedule_registry.list()}
@app.post("/api/schedule")
async def register_schedule(req: ScheduleRequest) -> dict:
"""Register (or update, by id) a schedule an external scheduler owns. Spark
Control only stores it for the dashboard — it never executes it."""
try:
entry = schedule_registry.register(
name=req.name, id=req.id, owner=req.owner,
cron=req.cron, next_run=req.next_run, description=req.description,
)
except ValueError as e:
raise HTTPException(422, str(e))
return entry.public()
@app.delete("/api/schedule/{schedule_id}")
async def delete_schedule(schedule_id: str) -> dict:
# Whitelist the path segment at the boundary (repo convention), even though
# it's only ever a dict key — keeps it from being reflected or logged raw.
if not valid_schedule_id(schedule_id):
raise HTTPException(422, "invalid schedule id")
return {"deleted": schedule_registry.delete(schedule_id)}
class DownloadRequest(BaseModel): class DownloadRequest(BaseModel):
repo: str repo: str
mode: Literal["spark1", "spark2", "cluster"] = "spark1" mode: Literal["spark1", "spark2", "cluster"] = "spark1"
+24 -2
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
appropriate host. appropriate host.
""" """
from __future__ import annotations from __future__ import annotations
import logging
import time import time
from dataclasses import dataclass from dataclasses import dataclass
from typing import Literal, Optional from typing import Literal, Optional
@@ -13,6 +14,8 @@ from .config import Settings
from .shellsafe import quote_arg from .shellsafe import quote_arg
from .ssh import ssh_run from .ssh import ssh_run
log = logging.getLogger(__name__)
# Cache the "unreachable" verdict per (host, user) for a short period so that a # Cache the "unreachable" verdict per (host, user) for a short period so that a
# repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time. # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -89,10 +92,27 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
container=s.qdrant_container, container=s.qdrant_container,
port=s.qdrant_port, port=s.qdrant_port,
), ),
# matrix-bridge Matrix bot. No HTTP port to probe (host networking, no
# health endpoint) — judged purely by docker state. Driven as its own
# SSH user (modelo, the repo owner) so git/docker run unprivileged.
"matrix-bridge": ServiceDef(
name="matrix-bridge",
kind="bot",
host=s.matrix_bridge_host,
user=s.matrix_bridge_user,
container=s.matrix_bridge_container,
port=0,
),
} }
for entry in load_custom_services(): for entry in load_custom_services():
key = entry.get("key") key = entry.get("key")
-        if not key or key in out:
+        if not key:
+            continue
+        if key in out:
+            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
+            # an adopter who picked a colliding key for, say, a second vLLM sees
+            # why no tile appeared instead of a silent no-op.
+            log.warning("custom service %r collides with a built-in name; ignoring", key)
             continue
out[key] = ServiceDef( out[key] = ServiceDef(
name=key, name=key,
@@ -102,7 +122,9 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
container=entry.get("container", key), container=entry.get("container", key),
port=int(entry.get("port", 0)), port=int(entry.get("port", 0)),
) )
-    return out
+    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
+    # show no tile and are never probed/auto-restarted.
+    return {k: v for k, v in out.items() if k not in s.disabled_services}
async def docker_state(settings: Settings, svc: ServiceDef) -> dict: async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
+25
@@ -28,6 +28,12 @@ _IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
# Docker container / volume name (Docker's own rule). # Docker container / volume name (Docker's own rule).
_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$") _CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")
# Absolute filesystem path to a local model directory on a Spark. Conservative
# charset (letters, digits, and safe path punctuation) with a required leading
# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
# and '..' segments) is rejected separately in validate_local_path.
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")
def validate_repo(repo: str) -> str: def validate_repo(repo: str) -> str:
"""Return `repo` if it is a well-formed 'org/name'; else raise ValueError.""" """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
@@ -50,6 +56,25 @@ def validate_container(name: str) -> str:
return name return name
def validate_local_path(path: str) -> str:
"""Return `path` if it is a safe absolute model directory path; else ValueError.
For locally fine-tuned models served by directory (not an HF repo). Requires
an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
caller cannot traverse out of an intended models directory. The `quote_arg`
sink still quotes it in depth — this is the boundary check.
"""
p = path or ""
if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
raise ValueError(
f"invalid local model path (expected an absolute path, no spaces or "
f"shell metacharacters): {path!r}"
)
if any(seg in (".", "..") for seg in p.split("/")):
raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
return p
def quote_arg(value: object) -> str: def quote_arg(value: object) -> str:
"""shlex.quote a single token for safe embedding in a shell command string.""" """shlex.quote a single token for safe embedding in a shell command string."""
return shlex.quote(str(value)) return shlex.quote(str(value))
+553 -113
@@ -13,18 +13,27 @@ const state = {
swap_progress: 0, // 01 swap_progress: 0, // 01
services: {}, services: {},
service_action_in_flight: null, // e.g. "parakeet:restart" service_action_in_flight: null, // e.g. "parakeet:restart"
mb_update_in_flight: false, // matrix-bridge update job running
hardware: {}, hardware: {},
config: {}, config: {},
configured: true, configured: true,
timer_handle: null, timer_handle: null,
deep_health: {}, deep_health: {},
disk_status: {}, // keyed by model key: { on_disk, total_bytes, per_host } models_loaded: false, // true once the first disk scan (/api/models) returns
disk_status_loaded: false, recipes: [], // known launch recipes (for the download autocomplete)
lock: { held: false }, // GPU swap reservation (coordination layer)
schedules: [], // schedules external automation has registered
}; };
const el = (sel) => document.querySelector(sel); const el = (sel) => document.querySelector(sel);
const $$ = (sel) => document.querySelectorAll(sel); const $$ = (sel) => document.querySelectorAll(sel);
// ISO timestamp -> local clock string (e.g. "2:45:10 PM"); '' if unparseable.
function fmtClock(iso) {
const t = Date.parse(iso);
return isNaN(t) ? '' : new Date(t).toLocaleTimeString();
}
function escapeHtml(s) { function escapeHtml(s) {
if (s == null) return ''; if (s == null) return '';
return String(s) return String(s)
@@ -50,69 +59,86 @@ function renderCards() {
const root = el('#cards'); const root = el('#cards');
root.innerHTML = ''; root.innerHTML = '';
const isSwapping = !!state.swap_job_id; const isSwapping = !!state.swap_job_id;
for (const key of Object.keys(state.models)) { // GPU reserved by external automation — manual swaps are refused server-side
// (423); reflect that in the buttons so the click never bounces.
const locked = !!(state.lock && state.lock.held);
const lockTip = locked
? `Reserved by ${state.lock.holder || 'automation'}${state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : ''}`
: '';
const keys = Object.keys(state.models);
if (keys.length === 0) {
// The menu is the disk: nothing downloaded (or the scan hasn't returned yet).
root.innerHTML = state.models_loaded
? `<div class="empty-menu muted">No models downloaded on the Sparks yet. Use <strong>+ Download a new model</strong> above to fetch one — it'll appear here when it's done.</div>`
: `<div class="empty-menu muted">Scanning the Sparks for downloaded models…</div>`;
return;
}
for (const key of keys) {
const m = state.models[key]; const m = state.models[key];
const isActive = key === state.current_model_key; const isActive = key === state.current_model_key;
const card = document.createElement('div'); const card = document.createElement('div');
card.className = 'card' + (isActive ? ' active' : ''); card.className = 'card' + (isActive ? ' active' : '') + (m.needs_setup ? ' needs-setup' : '');
const desc = m.description const desc = m.description
? `<div class="desc">${escapeHtml(m.description)}</div>` ? `<div class="desc">${escapeHtml(m.description)}</div>`
: ''; : '';
const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : ''; const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
// Disk-presence pill + trash button. Until /api/models/disk-status comes back, const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
// we don't know — render a neutral placeholder. // Every card on the menu is on disk by definition — show its real size.
const disk = state.disk_status[key]; const gb = (m.total_bytes || 0) / 1e9;
let diskPill = ''; const diskPill = gb > 0
if (state.disk_status_loaded) { ? `<span class="tag on-disk" title="Weights present on the Spark(s)">on disk · ${gb.toFixed(1)} GB</span>`
if (disk && disk.on_disk) { : '';
const gb = (disk.total_bytes / 1e9); const setupPill = m.needs_setup
diskPill = `<span class="tag on-disk" title="Weights present on disk">on disk · ${gb.toFixed(1)} GB</span>`; ? `<span class="tag setup-pill" title="On disk, but Spark Control hasn't been told how to launch it">needs setup</span>`
} else { : '';
diskPill = `<span class="tag not-on-disk" title="Weights not downloaded">not downloaded</span>`; // Trash = remove weights from disk AND from the menu. Disabled if active / mid-swap.
} // Never offered for local models: their directory is hand-placed training output,
} // not a re-downloadable HF cache (the server refuses the delete too).
// Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
let trashBtn = ''; let trashBtn = '';
if (state.disk_status_loaded && disk && disk.on_disk) { if (!m.local_path) {
const disabled = isActive || isSwapping; const disabled = isActive || isSwapping;
const tip = isActive const tip = isActive
? 'Currently loaded — switch to another model first' ? 'Currently loaded — switch to another model first'
: isSwapping : isSwapping
? 'A swap is in progress' ? 'A swap is in progress'
: 'Delete weights from disk'; : 'Remove weights from disk & menu';
trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Delete from disk" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`; trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Remove from disk and menu" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
} }
// Primary card action: "Switch to this" (green) when on disk; "Download" (blue) when not. // Primary action: "Current" / "Switch to this", or "Set up & switch" for a
// Before disk-status loads we render the swap button as a sensible default. // model on disk that has no launch recipe yet.
const isOnDisk = !state.disk_status_loaded || (disk && disk.on_disk); const swapBlocked = isSwapping || locked;
const dlInFlight = !!(typeof dlState !== 'undefined' && dlState && dlState.job_id); const lockTipAttr = locked ? ` title="${escapeHtml(lockTip)}"` : '';
let primaryBtn = ''; let primaryBtn = '';
if (isActive) { if (isActive) {
primaryBtn = `<button class="btn" disabled>Current</button>`; primaryBtn = `<button class="btn" disabled>Current</button>`;
} else if (isOnDisk) { } else if (m.needs_setup) {
primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`; primaryBtn = `<button class="btn primary" data-setup-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Set up &amp; switch</button>`;
} else { } else {
const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)'; primaryBtn = `<button class="btn primary" data-swap-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Switch to this</button>`;
primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
} }
// The Test/Advanced controls need a saved recipe; hide them until setup is done.
const recipeActions = m.needs_setup ? '' : `
<button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
<button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>`;
card.innerHTML = ` card.innerHTML = `
<div class="name">${escapeHtml(m.display_name)}</div> <div class="name">${escapeHtml(m.display_name)}</div>
<div class="meta"> <div class="meta">
<span class="tag mode-${m.mode}">${m.mode}</span> <span class="tag mode-${m.mode}">${m.mode}</span>
<span class="tag">${m.size_gb} GB</span>
${customPill}
${diskPill} ${diskPill}
${setupPill}
${customPill}
${localPill}
${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')} ${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
</div> </div>
${desc} ${desc}
<div class="muted small repo"> <div class="muted small repo">
<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a> ${m.local_path
? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
: `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
</div> </div>
<div class="spacer"></div> <div class="spacer"></div>
<div class="card-actions"> <div class="card-actions">
${primaryBtn} ${primaryBtn}${recipeActions}
<button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
<button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>
${trashBtn} ${trashBtn}
</div> </div>
<div class="test-result hidden" data-test-result-for="${key}"></div> <div class="test-result hidden" data-test-result-for="${key}"></div>
@@ -122,8 +148,8 @@ function renderCards() {
for (const btn of root.querySelectorAll('[data-swap-key]')) { for (const btn of root.querySelectorAll('[data-swap-key]')) {
btn.addEventListener('click', () => triggerSwap(btn.dataset.swapKey)); btn.addEventListener('click', () => triggerSwap(btn.dataset.swapKey));
} }
for (const btn of root.querySelectorAll('[data-download-key]')) { for (const btn of root.querySelectorAll('[data-setup-key]')) {
btn.addEventListener('click', () => triggerDownloadForKey(btn.dataset.downloadKey)); btn.addEventListener('click', () => openSetupForKey(btn.dataset.setupKey));
} }
for (const btn of root.querySelectorAll('[data-adv-key]')) { for (const btn of root.querySelectorAll('[data-adv-key]')) {
btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey)); btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey));
@@ -438,8 +464,13 @@ function classifyService(s) {
if (s.docker_state === 'missing') return 'missing'; if (s.docker_state === 'missing') return 'missing';
if (s.docker_state === 'restarting') return 'unhealthy'; if (s.docker_state === 'restarting') return 'unhealthy';
if (s.docker_state === 'exited') return 'unhealthy'; if (s.docker_state === 'exited') return 'unhealthy';
if (s.docker_state === 'running' && !s.http_ready) return 'starting'; if (s.docker_state === 'running') {
if (s.docker_state === 'running' && s.http_ready) return 'running'; // http_ready === false means an HTTP probe is expected but failing → still
// warming up. null means the service has no HTTP surface (e.g. the bot), so
// a running container is simply healthy.
if (s.http_ready === false) return 'starting';
return 'running';
}
return s.docker_state || 'unknown'; return s.docker_state || 'unknown';
} }
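The tri-state handling of `http_ready` in this hunk can be exercised in isolation. What follows is a minimal sketch, not the dashboard's actual function: `classifySketch` is a hypothetical name, it covers only the branches visible in this hunk (any earlier checks in `classifyService` are omitted), and it assumes `http_ready` is one of `true`, `false`, or `null`.

```javascript
// Hypothetical stand-alone copy of the branches shown in this hunk.
// Assumption: http_ready is true, false, or null (null = no HTTP surface).
function classifySketch(s) {
  if (s.docker_state === 'missing') return 'missing';
  if (s.docker_state === 'restarting') return 'unhealthy';
  if (s.docker_state === 'exited') return 'unhealthy';
  if (s.docker_state === 'running') {
    // Only an explicit false means "a probe is expected but failing".
    return s.http_ready === false ? 'starting' : 'running';
  }
  return s.docker_state || 'unknown';
}

// A bot-style service with no HTTP probe, and an HTTP service still warming up.
console.log(classifySketch({ docker_state: 'running', http_ready: null }));
console.log(classifySketch({ docker_state: 'running', http_ready: false }));
```

The strict comparison against `false` is what separates "no probe exists" from "probe failing"; a plain `!s.http_ready` check would treat both the same way.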
@@ -471,6 +502,11 @@ async function renderServices() {
grid.innerHTML = ''; grid.innerHTML = '';
for (const [name, s] of entries) { for (const [name, s] of entries) {
const cls = classifyService(s); const cls = classifyService(s);
const isBot = s.kind === 'bot';
// The bot tile is opt-in: it only belongs to deployments that actually run
// matrix-bridge. When the container is absent (missing) or the host isn't
// configured, hide the tile entirely rather than show a stray red card.
if (isBot && (cls === 'missing' || cls === 'unconfigured')) continue;
const card = document.createElement('div'); const card = document.createElement('div');
card.className = `service-card ${cls}`; card.className = `service-card ${cls}`;
const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':'); const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':');
@@ -483,7 +519,7 @@ async function renderServices() {
return false; return false;
}; };
const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`; const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`;
const hostStr = s.host ? `${s.host}:${s.port}` : ''; const hostStr = s.host ? (s.port ? `${s.host}:${s.port}` : s.host) : '';
const hostRow = s.host const hostRow = s.host
? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>` ? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>`
: `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`; : `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`;
@@ -537,9 +573,11 @@ async function renderServices() {
${restartsRow} ${restartsRow}
${deepRow} ${deepRow}
<div class="service-actions"> <div class="service-actions">
${isBot ? `<button class="btn primary" data-mb-update title="Pull latest code, rebuild, and recreate the bot" ${inFlight || state.mb_update_in_flight ? 'disabled' : ''}>Update</button>` : ''}
<button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button> <button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button>
<button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button> <button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button>
<button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button> <button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button>
${isBot ? `<button class="btn" data-mb-logs title="Show the last 100 log lines">View logs</button>` : ''}
</div> </div>
`; `;
grid.appendChild(card); grid.appendChild(card);
@@ -547,6 +585,10 @@ async function renderServices() {
for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) { for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) {
btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction)); btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction));
} }
const mbUpdateBtn = grid.querySelector('[data-mb-update]');
if (mbUpdateBtn) mbUpdateBtn.addEventListener('click', onMatrixBridgeUpdate);
const mbLogsBtn = grid.querySelector('[data-mb-logs]');
if (mbLogsBtn) mbLogsBtn.addEventListener('click', openMatrixBridgeLogs);
for (const btn of grid.querySelectorAll('[data-dh-run]')) { for (const btn of grid.querySelectorAll('[data-dh-run]')) {
btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn)); btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn));
} }
@@ -725,6 +767,118 @@ async function onServiceAction(key) {
} }
} }
// ===================== matrix-bridge bot (update + logs) =====================
const mbState = { job_id: null, eventsource: null, timer: null, started_at: null };
function mbTimerStart(at) {
mbState.started_at = at;
if (mbState.timer) clearInterval(mbState.timer);
const tick = () => {
if (!mbState.started_at) return;
const sec = Math.max(0, Math.floor((Date.now() - mbState.started_at) / 1000));
el('#mb-update-elapsed').textContent = `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
};
tick();
mbState.timer = setInterval(tick, 500);
}
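The elapsed-time string built inside `tick()` can be checked on its own. This is a sketch only: `fmtElapsed` is a hypothetical helper that does not appear in the diff, and it takes both timestamps as arguments so it does not depend on `Date.now()` or the DOM.

```javascript
// Hypothetical helper: the same m:ss arithmetic as tick(), with the clock injected.
function fmtElapsed(startedAtMs, nowMs) {
  // Clamp at zero so a start time slightly in the future never shows a negative value.
  const sec = Math.max(0, Math.floor((nowMs - startedAtMs) / 1000));
  return `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
}

console.log(fmtElapsed(0, 65000));
```

Passing the current time in, rather than reading it inside the function, is a choice made for this sketch so the result is deterministic; the code in the diff reads `Date.now()` directly.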
async function onMatrixBridgeUpdate() {
if (state.mb_update_in_flight) return;
if (!confirm('Update the matrix-bridge bot?\n\nThis pulls the latest code, rebuilds the container image, and recreates the container. The first build after a base-image change can take several minutes. The bot is briefly offline while it restarts.')) return;
state.mb_update_in_flight = true;
renderServices();
try {
const r = await fetchJSON('/api/matrix-bridge/update', { method: 'POST' });
attachMbUpdateProgress(r.job_id);
} catch (e) {
state.mb_update_in_flight = false;
renderServices();
alert('Update failed to start: ' + e.message);
}
}
async function attachMbUpdateProgress(jobId) {
mbState.job_id = jobId;
el('#mb-update-log').textContent = '';
el('#mb-update-title').textContent = 'Updating matrix-bridge…';
el('#mb-update-phase').textContent = 'Starting…';
el('#mb-update-dialog').showModal();
try {
const snap = await fetchJSON(`/api/matrix-bridge/update/${jobId}`);
mbTimerStart(Date.parse(snap.started_at));
el('#mb-update-phase').textContent = snap.phase || 'Working…';
el('#mb-update-log').textContent = (snap.lines || []).join('\n');
if (snap.returncode !== null) { onMbUpdateDone(snap); return; }
} catch { mbTimerStart(Date.now()); }
const es = new EventSource(`/api/matrix-bridge/update/${jobId}/stream`);
mbState.eventsource = es;
es.onmessage = ev => {
try {
const d = JSON.parse(ev.data);
if (d.line !== undefined) {
const log = el('#mb-update-log');
log.textContent += d.line + '\n';
log.scrollTop = log.scrollHeight;
}
} catch {}
};
es.addEventListener('phase', ev => {
try { el('#mb-update-phase').textContent = JSON.parse(ev.data).phase; } catch {}
});
es.addEventListener('done', ev => {
let d = {}; try { d = JSON.parse(ev.data); } catch {}
onMbUpdateDone(d);
});
es.onerror = () => {
// Don't leave the Update button wedged-disabled on a dropped stream. The
// job keeps running server-side; re-clicking Update returns a clean 409.
es.close();
mbState.eventsource = null;
state.mb_update_in_flight = false;
el('#mb-update-phase').textContent = 'Lost connection to the update stream — reopen or check logs.';
renderServices();
};
}
function onMbUpdateDone(d) {
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
state.mb_update_in_flight = false;
if (d.state === 'failed') {
el('#mb-update-title').textContent = `Update failed (rc=${d.returncode})`;
el('#mb-update-phase').textContent = 'Failed — see the log above.';
} else {
el('#mb-update-title').textContent = 'Update complete';
el('#mb-update-phase').textContent = 'Done ✓';
}
// Refresh the tile's badge.
(async () => { try { state.services = await fetchJSON('/api/services'); } catch {} renderServices(); })();
}
async function openMatrixBridgeLogs() {
const pre = el('#mb-logs-pre');
el('#mb-logs-title').textContent = 'matrix-bridge logs';
pre.textContent = 'Loading…';
el('#mb-logs-dialog').showModal();
await loadMatrixBridgeLogs();
}
async function loadMatrixBridgeLogs() {
const pre = el('#mb-logs-pre');
const btn = el('#mb-logs-refresh');
if (btn) btn.disabled = true;
try {
const r = await fetchJSON('/api/matrix-bridge/logs?tail=100');
pre.textContent = r.output || '(no output)';
pre.scrollTop = pre.scrollHeight;
} catch (e) {
pre.textContent = 'Could not read logs: ' + e.message;
} finally {
if (btn) btn.disabled = false;
}
}
function renderEndpoint(status) { function renderEndpoint(status) {
const v = status.vllm || {}; const v = status.vllm || {};
const panel = el('#endpoint-panel'); const panel = el('#endpoint-panel');
@@ -794,6 +948,10 @@ function renderHealth(status) {
function setDot(id, ok, payload) { function setDot(id, ok, payload) {
const item = el(id); const item = el(id);
if (!item) return; if (!item) return;
// A service switched off via DISABLED_SERVICES isn't part of this
// deployment — hide its indicator entirely rather than show it as down.
if (payload && payload.disabled) { item.classList.add('hidden'); return; }
item.classList.remove('hidden');
const dot = item.querySelector('.dot'); const dot = item.querySelector('.dot');
dot.classList.remove('ok', 'bad', 'warn'); dot.classList.remove('ok', 'bad', 'warn');
if (ok === true) dot.classList.add('ok'); if (ok === true) dot.classList.add('ok');
@@ -1012,24 +1170,44 @@ async function pollStatus() {
} }
} }
let menuLoadInFlight = false;
async function loadModels() { async function loadModels() {
// The menu is whatever's downloaded on the Sparks — /api/models does the scan
// (SSH), so this is the slower model call. Best-effort: a transient failure
// leaves the previous menu in place rather than blanking the dashboard.
// Guard against overlap: init() fires this un-awaited and pollStatus()'s
// empty-menu fallback may call it again before the scan returns.
if (menuLoadInFlight) return;
menuLoadInFlight = true;
try {
const data = await fetchJSON('/api/models'); const data = await fetchJSON('/api/models');
state.defaults = data.defaults || {}; state.defaults = data.defaults || {};
state.models = data.models || {}; state.models = data.models || {};
state.recipes = data.recipes || [];
state.models_loaded = true;
populateDownloadSuggestions();
renderCards();
} catch (e) {
console.warn('model menu load failed:', e.message);
} finally {
menuLoadInFlight = false;
}
} }
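The `menuLoadInFlight` guard in `loadModels()` is an instance of a general "skip if already running" pattern. A minimal sketch of that pattern as a reusable wrapper follows; `singleFlight` and `scan` are hypothetical names introduced here for illustration and are not part of the diff.

```javascript
// Hypothetical wrapper: drop calls that arrive while a previous call is still pending.
function singleFlight(fn) {
  let inFlight = false;
  return async function (...args) {
    if (inFlight) return undefined;   // overlapping call: do nothing
    inFlight = true;
    try {
      return await fn(...args);
    } finally {
      inFlight = false;               // always clear the flag, even on error
    }
  };
}

// Usage: two back-to-back calls trigger only one underlying scan.
let scans = 0;
const scan = singleFlight(async () => {
  scans += 1;
  await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for the slow SSH-backed fetch
});
scan();
scan();
console.log(scans);
```

As in the diff, the overlapping caller does not wait for the pending result; it simply returns. Clearing the flag in `finally` is what keeps a failed scan from blocking every later one.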
async function loadDiskStatus() { // Populate the download box's autocomplete with known recipes not currently on
// Probes each catalog model's HF cache over SSH; takes a beat. Best-effort. // disk — so common/bundled models stay discoverable without phantom menu cards.
try { function populateDownloadSuggestions() {
const r = await fetchJSON('/api/models/disk-status'); const dl = el('#dl-suggestions');
if (r && r.models) { if (!dl) return;
state.disk_status = r.models; const onDiskRepos = new Set(Object.values(state.models).map(m => m.repo).filter(Boolean));
state.disk_status_loaded = true; dl.innerHTML = '';
renderCards(); for (const r of state.recipes || []) {
} if (onDiskRepos.has(r.repo)) continue;
} catch (e) { const opt = document.createElement('option');
// Silent — pills just won't render. Don't block dashboard. opt.value = r.repo;
console.warn('disk-status probe failed:', e.message); opt.label = `${r.display_name} (${r.mode})`;
dl.appendChild(opt);
} }
} }
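The filtering step in `populateDownloadSuggestions()` can be separated from the DOM work. The sketch below is a hypothetical pure function (`suggestionsFor` is not in the diff), and the repo names in the example data are placeholders. It assumes `models` is keyed by model key with an optional `repo`, and that each recipe has `repo`, `display_name`, and `mode`, as the hunk suggests.

```javascript
// Hypothetical pure version of the recipe filter: known recipes minus what is on disk.
function suggestionsFor(models, recipes) {
  // Local-path models may have no repo, so drop falsy values before building the set.
  const onDiskRepos = new Set(Object.values(models).map((m) => m.repo).filter(Boolean));
  return (recipes || [])
    .filter((r) => !onDiskRepos.has(r.repo))
    .map((r) => ({ value: r.repo, label: `${r.display_name} (${r.mode})` }));
}

// Placeholder data for illustration only.
const exampleModels = { a: { repo: 'example-org/model-a' }, local: { local_path: '/models/ft' } };
const exampleRecipes = [
  { repo: 'example-org/model-a', display_name: 'Model A', mode: 'solo' },
  { repo: 'example-org/model-b', display_name: 'Model B', mode: 'cluster' },
];
console.log(suggestionsFor(exampleModels, exampleRecipes));
```

Returning plain `{ value, label }` objects keeps the filter testable without a `<datalist>`; the code in the diff builds `<option>` elements directly instead.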
@@ -1043,14 +1221,12 @@ function fmtBytesShort(n) {
function openDiskDeleteDialog(key) { function openDiskDeleteDialog(key) {
const m = state.models[key]; const m = state.models[key];
const disk = state.disk_status[key]; if (!m || !m.on_disk) return;
if (!m || !disk || !disk.on_disk) return;
const dlg = el('#disk-delete-dialog'); const dlg = el('#disk-delete-dialog');
el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(disk.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from disk.`; el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(m.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from the Sparks. This also takes it off the menu.`;
const hostsEl = el('#dd-hosts'); const hostsEl = el('#dd-hosts');
hostsEl.innerHTML = ''; hostsEl.innerHTML = '';
for (const h of (disk.per_host || [])) { for (const h of (m.per_host || [])) {
if (!h.on_disk) continue;
const li = document.createElement('li'); const li = document.createElement('li');
li.innerHTML = `<code>${escapeHtml(h.host)}</code> — ${fmtBytesShort(h.size_bytes)}`; li.innerHTML = `<code>${escapeHtml(h.host)}</code> — ${fmtBytesShort(h.size_bytes)}`;
hostsEl.appendChild(li); hostsEl.appendChild(li);
@@ -1069,20 +1245,19 @@ function openDiskDeleteDialog(key) {
try { try {
const r = await fetchJSON(`/api/models/${encodeURIComponent(key)}/disk`, { method: 'DELETE' }); const r = await fetchJSON(`/api/models/${encodeURIComponent(key)}/disk`, { method: 'DELETE' });
dlg.close(); dlg.close();
// Optimistically clear local disk state for this key, then refresh. // Optimistically drop the card, then re-scan the menu (it's gone from disk).
delete state.disk_status[key]; delete state.models[key];
renderCards(); renderCards();
// Eagerly re-probe so size is accurate (and shows "not downloaded" pill). await loadModels();
loadDiskStatus();
const freed = r && typeof r.bytes_freed === 'number' ? fmtBytesShort(r.bytes_freed) : ''; const freed = r && typeof r.bytes_freed === 'number' ? fmtBytesShort(r.bytes_freed) : '';
console.log(`Deleted ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`); console.log(`Removed ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
} catch (e) { } catch (e) {
errEl.textContent = e.message || 'Delete failed'; errEl.textContent = e.message || 'Delete failed';
errEl.classList.remove('hidden'); errEl.classList.remove('hidden');
} finally { } finally {
confirm.disabled = false; confirm.disabled = false;
cancel.disabled = false; cancel.disabled = false;
confirm.textContent = 'Delete from disk'; confirm.textContent = 'Remove from disk & menu';
} }
}; };
cancel.onclick = onCancel; cancel.onclick = onCancel;
@@ -1092,6 +1267,11 @@ function openDiskDeleteDialog(key) {
async function triggerSwap(modelKey) { async function triggerSwap(modelKey) {
if (state.swap_job_id) return; if (state.swap_job_id) return;
if (state.lock && state.lock.held) {
const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
return;
}
try { try {
const r = await fetchJSON('/api/swap', { const r = await fetchJSON('/api/swap', {
method: 'POST', method: 'POST',
@@ -1100,42 +1280,84 @@ async function triggerSwap(modelKey) {
}); });
attachToSwap(r.job_id, /*needsBackfill=*/false); attachToSwap(r.job_id, /*needsBackfill=*/false);
} catch (e) { } catch (e) {
// 423 Locked: a reservation was acquired between our last poll and this click.
if (e.message && e.message.startsWith('423')) {
alert('The GPU swap path was just reserved by automation. Refreshing…');
pollCoordination();
} else {
alert('Failed to start swap: ' + e.message); alert('Failed to start swap: ' + e.message);
} }
}
} }
async function triggerDownloadForKey(modelKey) { // ---- coordination layer: swap lock + schedule registry ----
const m = state.models[modelKey];
if (!m) return; async function pollCoordination() {
if (dlState.job_id) {
alert('A download is already in progress; wait for it to finish.');
return;
}
// Pick the download target from the model's mode:
// solo -> spark1 only
// cluster -> both Sparks (fetch on Spark 1, rsync to Spark 2 in parallel)
const dlMode = m.mode === 'cluster' ? 'cluster' : 'spark1';
const sizeNote = m.size_gb ? ` (~${m.size_gb} GB)` : '';
const target = m.mode === 'cluster' ? 'both Sparks' : 'Spark 1';
if (!confirm(`Download "${m.display_name}"${sizeNote} to ${target}? Large models can take a while; you can watch progress in the download panel.`)) {
return;
}
dlState.last_repo = m.repo;
dlState.last_mode = dlMode;
try { try {
const r = await fetchJSON('/api/download', { state.lock = await fetchJSON('/api/swap/lock');
method: 'POST', } catch { state.lock = { held: false }; }
headers: { 'content-type': 'application/json' }, try {
body: JSON.stringify({ repo: m.repo, mode: dlMode }), const r = await fetchJSON('/api/schedule');
}); state.schedules = r.schedules || [];
// Open the download panel + attach to progress stream } catch { state.schedules = []; }
openDownloadForm(); renderLockBanner();
attachToDownload(r.job_id); renderSchedules();
} catch (e) { renderCards(); // reflect lock state on the swap buttons
alert('Failed to start download: ' + e.message); }
function renderLockBanner() {
const banner = el('#lock-banner');
if (!banner) return;
const lock = state.lock;
if (lock && lock.held) {
const until = lock.expires_at ? ` until ${fmtClock(lock.expires_at)}` : '';
const note = lock.note ? `${escapeHtml(lock.note)}` : '';
el('#lock-text').innerHTML =
`GPU swap path reserved by <strong>${escapeHtml(lock.holder || 'automation')}</strong>${until}${note}. Manual swaps are paused.`;
banner.classList.remove('hidden');
} else {
banner.classList.add('hidden');
} }
} }
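The reservation wording used for the banner, the button tooltips, and the alerts follows one pattern: the holder (defaulting to "automation") plus an optional expiry time. A minimal sketch of that pattern follows. `describeLock` is a hypothetical helper, not a function in the diff; `fmtClock` is copied from this diff so the block is self-contained.

```javascript
// fmtClock as defined in this diff: ISO timestamp -> local clock string, '' if unparseable.
function fmtClock(iso) {
  const t = Date.parse(iso);
  return isNaN(t) ? '' : new Date(t).toLocaleTimeString();
}

// Hypothetical helper mirroring the lockTip expression in renderCards().
function describeLock(lock) {
  if (!(lock && lock.held)) return '';
  const until = lock.expires_at ? ' until ' + fmtClock(lock.expires_at) : '';
  return `Reserved by ${lock.holder || 'automation'}${until}`;
}

console.log(describeLock({ held: true, holder: 'nightly-eval' }));
console.log(describeLock({ held: false }));
```

The time portion depends on the viewer's locale because it comes from `toLocaleTimeString()`, so it is not shown as expected output here.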
function renderSchedules() {
const panel = el('#schedule-panel');
const list = el('#schedule-list');
if (!panel || !list) return;
const items = state.schedules || [];
if (!items.length) {
panel.classList.add('hidden');
list.innerHTML = '';
return;
}
list.innerHTML = items.map((s) => {
const meta = [
s.cron ? `<code>${escapeHtml(s.cron)}</code>` : '',
s.next_run ? `next: ${escapeHtml(s.next_run)}` : '',
s.owner ? `by ${escapeHtml(s.owner)}` : '',
].filter(Boolean).join(' · ');
const desc = s.description ? `<div class="desc">${escapeHtml(s.description)}</div>` : '';
return `<div class="schedule-item">
<div class="name">${escapeHtml(s.name)}</div>
<div class="muted small">${meta}</div>
${desc}
</div>`;
}).join('');
panel.classList.remove('hidden');
}
async function releaseLock() {
const lock = state.lock || {};
const who = lock.holder || 'automation';
if (!confirm(`Force-release the GPU reservation held by ${who}? Any job relying on it may then collide with a manual swap.`)) return;
try {
await fetchJSON('/api/swap/lock?force=true', { method: 'DELETE' });
} catch (e) {
alert('Failed to release: ' + e.message);
}
pollCoordination();
}
async function attachToSwap(jobId, needsBackfill) { async function attachToSwap(jobId, needsBackfill) {
if (state.swap_eventsource) { if (state.swap_eventsource) {
state.swap_eventsource.close(); state.swap_eventsource.close();
@@ -1366,12 +1588,14 @@ function handleDownloadDone(d) {
el('#dl-title').textContent = 'Done';
el('#dl-phase').textContent = 'Done ✓';
el('#dl-progress-fill').style.width = '100%';
- // Offer to add to catalog
// The new model now appears on the menu (the menu is the disk). If it matched
// a known recipe it's ready to switch to; if not, offer to set it up.
const repo = dlState.last_repo;
- const mode = dlState.last_mode;
- if (repo) {
- setTimeout(() => openCatalogDialog(repo, mode), 600);
- }
loadModels().then(() => {
if (!repo) return;
const entry = Object.values(state.models).find(m => m.repo === repo);
if (entry && entry.needs_setup) setTimeout(() => openSetupDialog(repo, { thenSwap: false }), 600);
});
}
dlState.job_id = null;
}
@@ -1484,21 +1708,67 @@ function openAdvanced(key) {
dlg.showModal();
}
- function openCatalogDialog(repo, mode) {
// Context carried from openSetupDialog -> the submit handler: the inferred
// launch flags (parsers/MoE backend) and whether to swap right after saving.
let setupCtx = { key: '', repo: '', vllm_args: [], thenSwap: false };
// "Set up & switch" on a needs-setup card.
async function openSetupForKey(key) {
const m = state.models[key];
if (!m) return;
if (state.lock && state.lock.held) {
const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
return;
}
await openSetupDialog(m.repo, { thenSwap: true });
}
// Open the "set up this model" dialog, prefilled from inference (config.json +
// size). The operator confirms once; on save the recipe persists and (if
// thenSwap) we switch to it.
async function openSetupDialog(repo, opts = {}) {
const dlg = el('#catalog-dialog');
- const key = repo.split('/').pop().toLowerCase().replace(/[^a-z0-9_-]/g, '-');
- el('#cd-key').value = key;
- el('#cd-name').value = repo.split('/').pop();
let sug = null;
try {
sug = await fetchJSON(`/api/models/suggest?repo=${encodeURIComponent(repo)}`);
} catch (e) {
console.warn('recipe suggestion failed:', e.message);
}
const fallbackKey = repo.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
setupCtx = {
key: (sug && sug.key) || fallbackKey,
repo,
vllm_args: (sug && sug.vllm_args) || [],
thenSwap: !!opts.thenSwap,
};
el('#cd-key').value = setupCtx.key;
el('#cd-name').value = (sug && sug.display_name) || repo.split('/').pop();
el('#cd-repo').value = repo;
el('#cd-size').value = '';
- el('#cd-mode').value = mode || 'solo';
el('#cd-mode').value = (sug && sug.mode) || 'solo';
el('#cd-desc').value = '';
- el('#cd-mml').value = 32768;
- el('#cd-gmu').value = 0.85;
- el('#cd-gmu-out').value = '0.85';
- el('#cd-fst').checked = true;
- el('#cd-pcache').checked = true;
- el('#cd-fp8').checked = true;
const knobs = (sug && sug.knobs) || {};
el('#cd-mml').value = knobs.max_model_len || 32768;
el('#cd-gmu').value = knobs.gpu_memory_utilization || 0.85;
el('#cd-gmu-out').value = parseFloat(el('#cd-gmu').value).toFixed(2);
el('#cd-fst').checked = knobs.fastsafetensors !== false;
el('#cd-pcache').checked = knobs.prefix_caching !== false;
el('#cd-fp8').checked = (knobs.kv_cache_dtype || 'fp8') === 'fp8';
const det = el('#cd-detected');
if (det) {
if (sug) {
const caps = (sug.capabilities || []).join(', ');
const flags = setupCtx.vllm_args.length ? `: <code>${escapeHtml(setupCtx.vllm_args.join(' '))}</code>` : '';
det.innerHTML = `Detected <strong>${escapeHtml(sug.family || 'Generic')}</strong>${caps ? ` · ${escapeHtml(caps)}` : ''}. Launch flags set automatically${flags}.`;
} else {
det.textContent = "Couldn't auto-detect this model's settings — pick mode and knobs manually.";
}
det.classList.remove('hidden');
}
const submit = el('#cd-submit');
if (submit) submit.textContent = setupCtx.thenSwap ? 'Save & switch' : 'Save settings';
dlg.showModal();
}
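The fallback key in `openSetupDialog` above has to satisfy the `[a-zA-Z0-9_-]+` pattern on `#cd-key`. As a minimal sketch, the same expression can be pulled out into a standalone function; the name `fallbackKeyFor` is hypothetical and not part of this commit.

```javascript
// Hypothetical helper (not in the commit): the same expression openSetupDialog
// uses for fallbackKey when /api/models/suggest returns nothing.
function fallbackKeyFor(repo) {
  return repo
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, '-') // collapse each run of disallowed characters into one '-'
    .replace(/^-+|-+$/g, '');      // trim leading and trailing '-'
}

console.log(fallbackKeyFor('RedHatAI/Qwen3.6-35B-A3B-NVFP4'));
// redhatai-qwen3-6-35b-a3b-nvfp4
```

Unlike the removed `openCatalogDialog`, which keyed on the last path segment only, this keeps the org prefix in the key. That presumably avoids collisions when two orgs publish a model under the same name.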
@@ -1508,13 +1778,15 @@ function setupCatalogDialog() {
el('#catalog-form').addEventListener('submit', async (e) => {
e.preventDefault();
const body = {
- key: el('#cd-key').value.trim(),
key: el('#cd-key').value.trim() || setupCtx.key,
display_name: el('#cd-name').value.trim(),
repo: el('#cd-repo').value.trim(),
size_gb: parseFloat(el('#cd-size').value) || 0,
mode: el('#cd-mode').value,
description: el('#cd-desc').value.trim() || null,
- vllm_args: [],
// The inferred family flags (parsers / MoE backend); knob-controlled flags
// are layered on by the server from `knobs`, so no duplication.
vllm_args: setupCtx.vllm_args || [],
knobs: {
max_model_len: parseInt(el('#cd-mml').value, 10) || 32768,
gpu_memory_utilization: parseFloat(el('#cd-gmu').value),
@@ -1532,8 +1804,9 @@ function setupCatalogDialog() {
el('#catalog-dialog').close();
closeDownloadPanel();
await loadModels();
if (setupCtx.thenSwap) triggerSwap(body.key);
pollStatus();
- } catch (e) { alert('Add to catalog failed: ' + e.message); }
} catch (e) { alert('Saving the model setup failed: ' + e.message); }
});
}
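The knob prefill in `openSetupDialog` above mixes two default styles: numeric knobs fall back with `||`, while the checkboxes stay on unless the suggestion explicitly says `false`. A rough sketch of that logic as a pure function follows; `prefillKnobs` and its `fp8` field are hypothetical names used only for illustration.

```javascript
// Hypothetical helper (not in the commit) mirroring the prefill in openSetupDialog.
function prefillKnobs(knobs = {}) {
  return {
    max_model_len: knobs.max_model_len || 32768,
    gpu_memory_utilization: knobs.gpu_memory_utilization || 0.85,
    // Toggles stay on unless the suggestion explicitly sets them to false.
    fastsafetensors: knobs.fastsafetensors !== false,
    prefix_caching: knobs.prefix_caching !== false,
    // The FP8 box is ticked when kv_cache_dtype is missing or 'fp8'.
    fp8: (knobs.kv_cache_dtype || 'fp8') === 'fp8',
  };
}

const none = prefillKnobs();
const tuned = prefillKnobs({ max_model_len: 8192, fastsafetensors: false, kv_cache_dtype: 'auto' });
console.log(none, tuned);
```

One consequence of the `||` form is that a suggested value of `0` would also fall back to the default, which seems harmless for these two knobs.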
@@ -1542,6 +1815,60 @@ function setupAdvancedDialog() {
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
}
function openLocalModelDialog() {
const dlg = el('#local-model-dialog');
el('#lm-key').value = '';
el('#lm-name').value = '';
el('#lm-path').value = '';
el('#lm-chat').value = '';
el('#lm-size').value = '';
el('#lm-mode').value = 'solo';
el('#lm-desc').value = '';
el('#lm-mml').value = 32768;
el('#lm-gmu').value = 0.85;
el('#lm-gmu-out').value = '0.85';
el('#lm-fst').checked = true;
el('#lm-pcache').checked = true;
el('#lm-fp8').checked = true;
dlg.showModal();
}
function setupLocalModelDialog() {
el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
el('#local-model-form').addEventListener('submit', async (e) => {
e.preventDefault();
const chat = el('#lm-chat').value.trim();
const body = {
key: el('#lm-key').value.trim(),
display_name: el('#lm-name').value.trim(),
local_path: el('#lm-path').value.trim(),
size_gb: parseFloat(el('#lm-size').value) || 0,
mode: el('#lm-mode').value,
description: el('#lm-desc').value.trim() || null,
// A fine-tune's chat template (if any) rides along as a launch flag.
vllm_args: chat ? [`--chat-template=${chat}`] : [],
knobs: {
max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
fastsafetensors: el('#lm-fst').checked,
prefix_caching: el('#lm-pcache').checked,
kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
},
};
try {
await fetchJSON('/api/models', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(body),
});
el('#local-model-dialog').close();
await loadModels();
pollStatus();
} catch (e) { alert('Add local model failed: ' + e.message); }
});
}
// ===================== NIM installer =====================
const nimState = {
@@ -1865,8 +2192,104 @@ function handleUpdateDone(d) {
setTimeout(pollUpdates, 2000);
}
// ===================== settings ('gear') =====================
// Renders the optional cluster knobs from /api/settings (server-driven field
// list, so adding a knob server-side needs no JS change) and POSTs edits back.
// The server reloads its config in place, so changes take effect immediately.
let settingsClearSentinel = '__clear__';
function renderSettingsForm(data) {
settingsClearSentinel = data.clear_sentinel || settingsClearSentinel;
const body = el('#settings-body');
body.innerHTML = (data.groups || []).map((g) => {
const rows = g.fields.map((f) => {
const help = f.help ? `<span class="muted small settings-help">${escapeHtml(f.help)}</span>` : '';
let input;
let clearToggle = '';
if (f.type === 'secret') {
const ph = f.set ? 'set — leave blank to keep' : (f.placeholder || '');
input = `<input type="password" autocomplete="off" data-key="${f.key}" data-secret="1" placeholder="${escapeHtml(ph)}">`;
// A stored secret is never echoed back, so blank means "keep". Offer an
// explicit way to remove it.
if (f.set) clearToggle = `<label class="settings-clear muted small"><input type="checkbox" data-clear-for="${f.key}"> clear stored value</label>`;
} else if (f.type === 'int') {
input = `<input type="number" min="1" max="65535" data-key="${f.key}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
} else {
input = `<input type="text" autocomplete="off" data-key="${f.key}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
}
return `<div class="settings-field"><label class="modal-row"><span>${escapeHtml(f.label)}</span>${input}</label>${clearToggle}${help}</div>`;
}).join('');
return `<fieldset class="modal-fieldset"><legend>${escapeHtml(g.name)}</legend>${rows}</fieldset>`;
}).join('');
}
async function openSettingsDialog() {
const dlg = el('#settings-dialog');
const err = el('#settings-error');
err.classList.add('hidden');
el('#settings-body').innerHTML = '<p class="muted small">Loading…</p>';
dlg.showModal();
try {
renderSettingsForm(await fetchJSON('/api/settings'));
} catch (e) {
el('#settings-body').innerHTML = '';
err.textContent = 'Could not load settings: ' + e.message;
err.classList.remove('hidden');
}
}
async function saveSettings(e) {
e.preventDefault();
const err = el('#settings-error');
err.classList.add('hidden');
const values = {};
$$('#settings-body [data-key]').forEach((inp) => {
const key = inp.dataset.key;
const v = inp.value.trim();
if (inp.dataset.secret) {
// "clear" checkbox wins; else a typed value sets it; else omit (keep the
// stored one — we can't see it to retype it).
const clear = el(`[data-clear-for="${key}"]`);
if (clear && clear.checked) values[key] = settingsClearSentinel;
else if (v) values[key] = v;
} else {
values[key] = v; // blank non-secret ⇒ server reverts it to the default
}
});
const btn = el('#settings-save');
btn.disabled = true;
try {
await fetchJSON('/api/settings', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({ values }),
});
el('#settings-dialog').close();
// Re-pull everything a knob can move: the Open WebUI link, health probes,
// service tiles, and the model menu (host/port changes alter all of them).
try {
state.config = await fetchJSON('/api/config');
const a = el('#open-webui-link');
if (state.config.open_webui_url) { a.href = state.config.open_webui_url; a.classList.remove('hidden'); }
else { a.classList.add('hidden'); }
} catch (e3) { console.warn('post-save /api/config refresh failed:', e3); }
pollStatus();
renderServices();
loadModels();
} catch (e2) {
err.textContent = 'Save failed: ' + e2.message.replace(/^\d+ [^:]*:\s*/, '');
err.classList.remove('hidden');
} finally {
btn.disabled = false;
}
}
async function init() {
setupCopyButtons();
el('#open-settings').addEventListener('click', openSettingsDialog);
el('#settings-cancel').addEventListener('click', () => el('#settings-dialog').close());
el('#settings-form').addEventListener('submit', saveSettings);
el('#open-download').addEventListener('click', openDownloadForm);
el('#dl-cancel').addEventListener('click', closeDownloadPanel);
el('#dl-start').addEventListener('click', startDownload);
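The payload rules in `saveSettings` above (the clear checkbox wins, a typed secret replaces the stored one, a blank secret is omitted, a blank non-secret is sent as an empty string) can be sketched without the DOM. This is an illustration under assumptions: `collectSettingsValues`, the plain-object inputs, and the key names `HF_TOKEN`, `API_KEY`, `VLLM_PORT` and `WEBUI_HOST` are all hypothetical and not taken from the commit.

```javascript
// Hypothetical pure version of the loop in saveSettings: `inputs` stands in for
// the [data-key] elements, `clearChecked` for the ticked "clear stored value" boxes.
function collectSettingsValues(inputs, clearChecked, clearSentinel = '__clear__') {
  const values = {};
  for (const inp of inputs) {
    const v = inp.value.trim();
    if (inp.secret) {
      if (clearChecked.has(inp.key)) values[inp.key] = clearSentinel; // explicit removal wins
      else if (v) values[inp.key] = v;                                // typed value replaces it
      // blank secret: omitted, so the stored value is kept
    } else {
      values[inp.key] = v; // blank non-secret is sent as '' so the server can revert it
    }
  }
  return values;
}

const payload = collectSettingsValues(
  [
    { key: 'HF_TOKEN', value: '', secret: true },     // untouched secret
    { key: 'API_KEY', value: 'abc', secret: true },   // cleared even though text was typed
    { key: 'VLLM_PORT', value: ' 8000 ', secret: false },
    { key: 'WEBUI_HOST', value: '', secret: false },  // blank: revert to default
  ],
  new Set(['API_KEY']),
);
console.log(JSON.stringify({ values: payload }));
// {"values":{"API_KEY":"__clear__","VLLM_PORT":"8000","WEBUI_HOST":""}}
```

Omitting an untouched secret, rather than sending an empty string, is what lets the server tell "keep" apart from "revert to default".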
@@ -1883,6 +2306,17 @@ async function init() {
el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
el('#nim-form').addEventListener('submit', submitNim);
el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
el('#mb-update-close').addEventListener('click', () => el('#mb-update-dialog').close());
// Dismissing the modal (Close or Esc) stops streaming; the job keeps running
// server-side, and re-clicking Update returns a 409 while it is still in progress.
el('#mb-update-dialog').addEventListener('close', () => {
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
state.mb_update_in_flight = false;
renderServices();
});
el('#mb-logs-close').addEventListener('click', () => el('#mb-logs-dialog').close());
el('#mb-logs-refresh').addEventListener('click', loadMatrixBridgeLogs);
el('#open-connectivity').addEventListener('click', openConnectivityDialog);
el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
// Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on
@@ -1894,8 +2328,11 @@ async function init() {
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
});
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
el('#open-local').addEventListener('click', openLocalModelDialog);
el('#lock-release').addEventListener('click', releaseLock);
setupCatalogDialog();
setupAdvancedDialog();
setupLocalModelDialog();
// Open WebUI link from /api/config
try {
state.config = await fetchJSON('/api/config');
@@ -1907,19 +2344,22 @@ async function init() {
} catch {}
setupDashboardTabs();
setupEndpointCollapse();
- await loadModels();
// Fire the (SSH-backed) menu scan without awaiting — it self-renders a
// "Scanning…" state and fills in when it returns, so a slow/unreachable
// cluster never blocks first paint. pollStatus() below paints the rest.
loadModels();
await pollStatus();
await renderServices();
pollCoordination();
pollHardware();
pollUpdates();
- // Disk-status probe runs after first paint — slow over SSH and not blocking.
- loadDiskStatus();
// Speech-model patches panel — slow over SSH, runs after first paint.
renderSpeechModels();
setInterval(pollStatus, 5000);
setInterval(pollCoordination, 5000); // swap lock + schedule registry
setInterval(pollHardware, 8000); // every 8s
setInterval(pollUpdates, 300000); // every 5 min
- setInterval(loadDiskStatus, 60000); // every 60s — disk state changes rarely
setInterval(loadModels, 60000); // every 60s — re-scan the Sparks for added/removed models
setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
}
+105 -10
@@ -17,14 +17,28 @@
<span class="muted">connecting…</span>
</div>
<a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
<button id="open-settings" class="topbar-btn" type="button" title="Settings" aria-label="Open cluster settings">⚙ Settings</button>
</header>
<main>
<section id="setup-banner" class="banner hidden">
<strong>Configuration needed.</strong>
- <span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
<span>Run the <em>Configure Sparks</em> action in StartOS to set your two Spark IPs and SSH users. Everything else (ports, services, integrations) lives under <em>⚙ Settings</em> above.</span>
</section>
<dialog id="settings-dialog" class="modal">
<form method="dialog" class="modal-form" id="settings-form">
<h3>Settings</h3>
<p class="muted small">Optional cluster knobs — vLLM/service ports, container names, support-service hosts, and integrations. The two Spark IPs and SSH users are set once via the <em>Configure Sparks</em> action in StartOS; everything else is here. Changes apply immediately. Stored on this server and included in StartOS backups.</p>
<div id="settings-body" class="settings-body"><p class="muted small">Loading…</p></div>
<p id="settings-error" class="muted small dd-error hidden"></p>
<div class="modal-actions">
<button type="button" id="settings-cancel" class="btn">Cancel</button>
<button type="submit" id="settings-save" class="btn primary">Save</button>
</div>
</form>
</dialog>
<section id="hardware-panel" class="hardware-panel hidden">
<div class="section-header">
<h2 class="section-title">Spark hardware</h2>
@@ -96,6 +110,13 @@
</details>
</section>
<section id="lock-banner" class="banner lock-banner hidden">
<span class="lock-icon" aria-hidden="true">🔒</span>
<span id="lock-text">GPU swap path reserved</span>
<span class="spacer"></span>
<button id="lock-release" class="btn small-btn">Release</button>
</section>
<nav id="dashboard-tabs" class="dashboard-tabs hidden" role="tablist">
<button type="button" class="dashboard-tab" data-tab="llm" role="tab" aria-selected="true">LLM</button>
<button type="button" class="dashboard-tab" data-tab="audio" role="tab" aria-selected="false">Audio / Speech</button>
@@ -164,6 +185,37 @@
</div>
</form>
</dialog>
<dialog id="mb-update-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="mb-update-title">Updating matrix-bridge…</h3>
<div class="phase-row">
<div class="phase" id="mb-update-phase">Starting…</div>
<span class="spacer"></span>
<span class="timer" id="mb-update-elapsed">0:00</span>
</div>
<details open>
<summary class="muted small">Log</summary>
<pre id="mb-update-log" class="log"></pre>
</details>
<div class="modal-actions">
<button type="button" id="mb-update-close" class="btn">Close</button>
</div>
</form>
</dialog>
<dialog id="mb-logs-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="mb-logs-title">matrix-bridge logs</h3>
<p class="muted small">Last 100 lines from <code>docker logs</code> on the Spark.</p>
<pre id="mb-logs-pre" class="log"></pre>
<div class="modal-actions">
<button type="button" id="mb-logs-refresh" class="btn">Refresh</button>
<span class="spacer"></span>
<button type="button" id="mb-logs-close" class="btn">Close</button>
</div>
</form>
</dialog>
</section>
<section id="speech-models-panel" class="speech-models hidden">
@@ -198,13 +250,15 @@
<div class="section-header">
<h2 class="section-title">LLM swap</h2>
<button id="open-download" class="btn small-btn">+ Download a new model</button>
<button id="open-local" class="btn small-btn">+ Add local model</button>
</div>
<dialog id="catalog-dialog" class="modal">
<form method="dialog" class="modal-form" id="catalog-form">
- <h3>Add downloaded model to catalog</h3>
- <p class="muted small">It will appear as a new card you can swap to. Knob values become its default launch flags — you can tweak later via the model's "Advanced" panel.</p>
- <label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+"></label>
<h3>Set up this model</h3>
<p class="muted small">This model is downloaded, but Spark Control needs to know how to launch it. We've guessed from the model's own files — confirm or adjust, and it's saved so you're never asked again.</p>
<p id="cd-detected" class="muted small cd-detected hidden"></p>
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+" readonly></label>
<label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
<label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
<label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
@@ -225,21 +279,52 @@
</fieldset>
<div class="modal-actions">
<button type="button" id="cd-cancel" class="btn">Cancel</button>
- <button type="submit" class="btn primary">Add to catalog</button>
<button type="submit" id="cd-submit" class="btn primary">Save settings</button>
</div>
</form>
</dialog>
<dialog id="local-model-dialog" class="modal">
<form method="dialog" class="modal-form" id="local-model-form">
<h3>Add a local / fine-tuned model</h3>
<p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
<label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
<label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
<label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
<label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
<label class="modal-row"><span>Mode</span>
<select id="lm-mode">
<option value="solo">solo (Spark 1 only)</option>
<option value="cluster">cluster (both Sparks via Ray)</option>
</select>
</label>
<label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
<fieldset class="modal-fieldset">
<legend>Default launch knobs</legend>
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
<label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
<label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
<label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
<label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
</fieldset>
<div class="modal-actions">
<button type="button" id="lm-cancel" class="btn">Cancel</button>
<button type="submit" class="btn primary">Add local model</button>
</div>
</form>
</dialog>
<dialog id="disk-delete-dialog" class="modal">
<form method="dialog" class="modal-form">
- <h3>Delete model weights from disk?</h3>
<h3>Remove this model from the Sparks?</h3>
<p id="dd-summary" class="muted small"></p>
<ul class="muted small dd-hosts" id="dd-hosts"></ul>
- <p class="muted small">This is reversible — you can re-download from the catalog at any time. The catalog entry stays intact.</p>
<p class="muted small">This deletes the weights and removes the card from the menu. You can always download it again later (re-downloading restores its saved settings).</p>
<p id="dd-error" class="muted small dd-error hidden"></p>
<div class="modal-actions">
<button type="button" id="dd-cancel" class="btn">Cancel</button>
- <button type="button" id="dd-confirm" class="btn danger">Delete from disk</button>
<button type="button" id="dd-confirm" class="btn danger">Remove from disk &amp; menu</button>
</div>
</form>
</dialog>
@@ -280,15 +365,17 @@
</form>
</dialog>
<section id="download-panel" class="download-panel hidden">
<div class="download-form" id="download-form">
<label class="dl-row">
<span class="dl-label">HuggingFace repo</span>
- <input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off" list="dl-suggestions">
<datalist id="dl-suggestions"></datalist>
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face"></a>
</label>
<div class="dl-help muted small">
- <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
Type any repo, or pick a known one from the list. <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
</div>
<div class="dl-row">
@@ -331,6 +418,14 @@
<section id="cards" class="cards"></section>
</section>
<section id="schedule-panel" class="schedule-panel hidden">
<div class="section-header">
<h2 class="section-title">Scheduled jobs</h2>
</div>
<p class="muted small">Registered by your own automation. Spark Control only displays these — it doesn't run them.</p>
<div id="schedule-list" class="schedule-list"></div>
</section>
<section id="update-banner" class="update-banner hidden">
<div class="ub-context muted small">
Updates to <strong><a href="https://github.com/eugr/spark-vllm-docker" target="_blank" rel="noopener">eugr/spark-vllm-docker</a></strong>
+58 -2
@@ -74,6 +74,42 @@ main {
}
.banner em { font-style: normal; background: rgba(245, 158, 11, 0.15); padding: 2px 6px; border-radius: 4px; }
/* GPU swap reservation (coordination layer) — informational, not a warning. */
.lock-banner {
display: flex;
align-items: center;
gap: 10px;
border-color: var(--info);
color: var(--info);
}
.lock-banner .lock-icon { font-size: 16px; }
.lock-banner strong { color: var(--text); }
.lock-banner .spacer { flex: 1; }
/* Scheduled-jobs panel — read-only view of what external automation registered. */
.schedule-panel { margin-top: 8px; }
.schedule-list {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(240px, 1fr));
gap: 12px;
margin-top: 8px;
}
.schedule-item {
background: var(--surface);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 12px 14px;
}
.schedule-item .name { font-weight: 600; margin-bottom: 4px; }
.schedule-item code {
background: var(--surface-2);
border: 1px solid var(--border);
border-radius: 4px;
padding: 1px 5px;
font-size: 12px;
}
.schedule-item .desc { margin-top: 6px; color: var(--muted); font-size: 13px; }
/* ===== Endpoint panel ===== */
.endpoint-panel {
@@ -526,10 +562,12 @@ main {
#dl-log-details { margin-top: 12px; }
#dl-log-details summary { cursor: pointer; padding: 4px 0; }
- /* ===== NIM install dialog ===== */
/* ===== NIM install + matrix-bridge dialogs ===== */
.modal#nim-dialog,
- .modal#nim-progress-dialog { max-width: 640px; }
.modal#nim-progress-dialog,
.modal#mb-update-dialog,
.modal#mb-logs-dialog { max-width: 640px; }
.nim-grid {
display: grid;
gap: 8px;
@@ -692,6 +730,7 @@ main {
.card .repo a { color: inherit; text-decoration: none; }
.card .repo a:hover { color: var(--info); text-decoration: underline; }
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
.tag {
background: var(--surface-2);
border: 1px solid var(--border);
@@ -736,8 +775,15 @@ main {
.card .adv-btn,
.card .test-btn { padding: 8px 12px; font-size: 12px; }
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); } .tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; } .tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
.tag.setup-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
.card.needs-setup { border-style: dashed; }
.card-actions .btn[data-setup-key] { flex: 1; }
.empty-menu { grid-column: 1 / -1; padding: 28px 16px; text-align: center; border: 1px dashed var(--border); border-radius: 10px; }
.cd-detected { padding: 8px 10px; border: 1px solid var(--border); border-radius: 8px; background: rgba(255,255,255,0.02); }
.cd-detected code { word-break: break-all; }
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; } .card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
.card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); } .card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); }
.card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; } .card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; }
@@ -918,3 +964,13 @@ main {
.tab-content.active { display: block; } .tab-content.active { display: block; }
/* (WhisperX install banner styles removed in v0.13.0:0 — see release notes) */ /* (WhisperX install banner styles removed in v0.13.0:0 — see release notes) */
/* ===== Settings ('gear') dialog ===== */
.modal#settings-dialog { max-width: 560px; }
/* Cap the (tall) form so the Save/Cancel actions stay reachable; the grouped
fields scroll within. */
#settings-body { max-height: 60vh; overflow-y: auto; padding-right: 6px; display: flex; flex-direction: column; gap: 12px; }
.settings-field { display: flex; flex-direction: column; gap: 2px; }
.settings-help { display: block; line-height: 1.35; }
.settings-clear { display: inline-flex; align-items: center; gap: 6px; margin-top: 2px; cursor: pointer; }
.settings-clear input { width: auto; }
+25 -2
@@ -6,7 +6,9 @@ from datetime import datetime, timezone
from typing import Optional from typing import Optional
from .config import Settings from .config import Settings
from .coordination import WebhookNotifier, build_webhook_payload
from .models import Catalog, build_launch_command from .models import Catalog, build_launch_command
from .shellsafe import quote_arg
from .ssh import ssh_run, ssh_stream, StreamHandle from .ssh import ssh_run, ssh_stream, StreamHandle
@@ -32,9 +34,15 @@ class SwapJob:
class SwapManager: class SwapManager:
def __init__(self, settings: Settings, catalog: Catalog) -> None: def __init__(
self,
settings: Settings,
catalog: Catalog,
notifier: Optional[WebhookNotifier] = None,
) -> None:
self.settings = settings self.settings = settings
self.catalog = catalog self.catalog = catalog
self.notifier = notifier
self.lock = asyncio.Lock() self.lock = asyncio.Lock()
self.jobs: dict[str, SwapJob] = {} self.jobs: dict[str, SwapJob] = {}
self.current_job_id: Optional[str] = None self.current_job_id: Optional[str] = None
@@ -77,6 +85,21 @@ class SwapManager:
job.finished_at = datetime.now(timezone.utc).isoformat() job.finished_at = datetime.now(timezone.utc).isoformat()
if self.current_job_id == job.id: if self.current_job_id == job.id:
self.current_job_id = None self.current_job_id = None
# Outside the swap lock (so a webhook POST can't stall a queued swap) and
# only for real swaps — a dry run never changes the running model. A
# webhook failure is logged inside fire(), never raised.
if self.notifier is not None and self.notifier.enabled and not job.dry_run:
event = "swap_complete" if job.state == "ready" else "swap_failed"
await self.notifier.fire(event, build_webhook_payload(
event=event,
job_id=job.id,
model_key=job.model_key,
state=job.state,
returncode=job.returncode,
started_at=job.started_at,
finished_at=job.finished_at,
dry_run=job.dry_run,
))
async def _do(self, job: SwapJob) -> None: async def _do(self, job: SwapJob) -> None:
model = self.catalog.models[job.model_key] model = self.catalog.models[job.model_key]
@@ -112,7 +135,7 @@ class SwapManager:
# Step 3: tail logs until the ready marker (or timeout) # Step 3: tail logs until the ready marker (or timeout)
job.state = "tailing" job.state = "tailing"
tail_cmd = "docker logs -f --tail 50 vllm_node" tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
job.append(f"$ {tail_cmd}") job.append(f"$ {tail_cmd}")
timeout = max(model.expected_ready_seconds * 2, 600) timeout = max(model.expected_ready_seconds * 2, 600)
handle = StreamHandle() handle = StreamHandle()
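The notifier change above fires the webhook only after the swap lock is released, and only for real (non-dry-run) swaps. A minimal sketch of that ordering, assuming a hypothetical `SwapRunner` class and a plain async `notify` callable standing in for `WebhookNotifier.fire` (neither name is from the project):

```python
import asyncio
from typing import Awaitable, Callable, Optional

Notify = Callable[[str], Awaitable[None]]


class SwapRunner:
    """Hypothetical stand-in for SwapManager: run a job under a lock, notify after."""

    def __init__(self, notify: Optional[Notify] = None) -> None:
        self.lock = asyncio.Lock()
        self.notify = notify

    async def run(self, job: Callable[[], Awaitable[str]], dry_run: bool = False) -> str:
        # The swap itself is serialized by the lock.
        async with self.lock:
            state = await job()
        # The lock is released here, so a slow webhook POST cannot stall the
        # next queued swap. Dry runs never change the running model, so they
        # do not notify.
        if self.notify is not None and not dry_run:
            event = "swap_complete" if state == "ready" else "swap_failed"
            await self.notify(event)
        return state
```

The design point is that the notification sits outside the `async with` block: a consumer that is slow to answer delays only its own notification, not the next swap.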
+2 -1
@@ -22,6 +22,7 @@ from typing import Any
from .config import Settings from .config import Settings
from .models import Catalog, build_launch_command from .models import Catalog, build_launch_command
from .shellsafe import quote_arg
from .ssh import ssh_run from .ssh import ssh_run
@@ -114,7 +115,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
# Pipe the JSON args list to a here-doc Python invocation. The validator # Pipe the JSON args list to a here-doc Python invocation. The validator
# reads from stdin to avoid shell-escaping the args themselves. # reads from stdin to avoid shell-escaping the args themselves.
cmd = ( cmd = (
f"echo '{payload}' | docker exec -i vllm_node python3 -c " f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
+ shlex.quote(_VALIDATOR_SCRIPT) + shlex.quote(_VALIDATOR_SCRIPT)
) )
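Both hunks above replace the hard-coded `vllm_node` with a quoted, configurable container name. The body of `quote_arg` is not shown in this diff; the sketch below assumes it behaves like `shlex.quote`, and `tail_command` is a hypothetical helper used only for illustration:

```python
import shlex


def quote_arg(value: str) -> str:
    """Assumed behaviour of app.shellsafe.quote_arg: quote one shell token."""
    return shlex.quote(str(value))


def tail_command(container: str) -> str:
    # A safe name passes through unchanged; anything containing shell
    # metacharacters is wrapped so it stays a single inert argument.
    return f"docker logs -f --tail 50 {quote_arg(container)}"
```

Under that assumption, a container name set through the settings overlay cannot add extra commands to the SSH command line, because it always splits back into exactly one token.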
+46 -38
@@ -1,9 +1,14 @@
# spark-control model catalog # spark-control launch recipes
# #
# Edit this file (or override at runtime via the StartOS "Edit Model Catalog" # These are NOT the dashboard menu. The menu is whatever is actually downloaded
# action) to add or change available models. # on the Sparks — Spark Control scans the Hugging Face cache on each load and
# shows what it finds. These entries are launch *recipes*: matched to an on-disk
# model by `repo`, they say HOW to launch it. A downloaded model with no recipe
# here shows up as "needs setup", and the dashboard infers + saves one on first
# use (from the model's own config.json). Add a recipe to make a known model
# launch correctly the moment it's downloaded, with no setup prompt.
# #
# Each model entry produces this command on Spark 1: # Each recipe produces this command on Spark 1:
# cd ~/spark-vllm-docker # cd ~/spark-vllm-docker
# ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \ # ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \
# --port=<defaults.port> --host=<defaults.host> <vllm_args...> # --port=<defaults.port> --host=<defaults.host> <vllm_args...>
@@ -54,6 +59,34 @@ models:
- --enable-prefix-caching - --enable-prefix-caching
- --kv-cache-dtype=fp8 - --kv-cache-dtype=fp8
gemma4-26b:
display_name: "Gemma 4 26B-A4B (vision, light)"
description: >-
Lighter, faster sibling of the Gemma 4 31B above: a Mixture-of-Experts
model with 26B total parameters but only ~4B active per token, so it
generates quickly. Takes images as well as text (good for tasks like
reading a business card into structured text). Reasoning is a bit
shallower than the dense 31B. Runs solo on one Spark.
repo: nvidia/Gemma-4-26B-A4B-NVFP4
size_gb: 17
mode: solo
capabilities: [vision, reasoning, tools]
expected_ready_seconds: 240
vllm_args:
- --gpu-memory-utilization=0.8
- --max-model-len=32768
- --max-num-batched-tokens=16384
- --reasoning-parser=gemma4
- --tool-call-parser=gemma4
- --enable-auto-tool-choice
# MoE backend: research found this model's expert layers fall back to
# 'marlin' on GB10 (the fast flashinfer_cutlass path errors on sm_121).
# If a swap fails to start, this flag is the first thing to flip.
- --moe_backend=marlin
- --load-format=fastsafetensors
- --enable-prefix-caching
- --kv-cache-dtype=fp8
qwen36: qwen36:
display_name: "Qwen3.6 35B-A3B (daily driver)" display_name: "Qwen3.6 35B-A3B (daily driver)"
description: >- description: >-
@@ -63,7 +96,10 @@ models:
repo: RedHatAI/Qwen3.6-35B-A3B-NVFP4 repo: RedHatAI/Qwen3.6-35B-A3B-NVFP4
size_gb: 20 size_gb: 20
mode: solo mode: solo
capabilities: [reasoning] # Qwen3.6-35B-A3B is natively multimodal (Qwen3_5MoeForConditionalGeneration,
# vision tower ships in the checkpoint). Confirmed reading a business card
# cleanly on this cluster — use the "Vision check" button on the live card.
capabilities: [vision, reasoning]
expected_ready_seconds: 300 expected_ready_seconds: 300
vllm_args: vllm_args:
- --gpu-memory-utilization=0.85 - --gpu-memory-utilization=0.85
@@ -74,36 +110,8 @@ models:
- --load-format=fastsafetensors - --load-format=fastsafetensors
- --enable-prefix-caching - --enable-prefix-caching
- --kv-cache-dtype=fp8 - --kv-cache-dtype=fp8
# Cap image resolution: a large phone photo (e.g. 12MP) otherwise expands
# to ~11.8k vision tokens, blowing past vLLM's ~4096-image-token limit and
# getting rejected with a 400. ~2MP auto-downscales big images server-side
# (so every /v1 consumer is covered) while staying sharp enough for OCR.
- '--mm-processor-kwargs={"max_pixels": 2000000}'
qwen3-235b-fp8:
display_name: "Qwen3 235B-A22B FP8 (legacy)"
description: >-
Earlier generation of the Qwen 235B family in native FP8 precision.
Runs across both Sparks. Mostly superseded by Qwen3-VL above; keep
around for text-only baseline comparisons.
repo: Qwen/Qwen3-235B-A22B-FP8
size_gb: 220
mode: cluster
capabilities: []
expected_ready_seconds: 360
vllm_args:
- --gpu-memory-utilization=0.7
- -tp=2
- --distributed-executor-backend=ray
- --max-model-len=32768
qwen25-72b:
display_name: "Qwen2.5 72B (legacy)"
description: >-
Last-generation 72B dense model. Cluster mode required due to size.
Kept for compatibility and baseline comparison against newer Qwens.
repo: Qwen/Qwen2.5-72B-Instruct
size_gb: 145
mode: cluster
capabilities: []
expected_ready_seconds: 360
vllm_args:
- --gpu-memory-utilization=0.7
- -tp=2
- --distributed-executor-backend=ray
- --max-model-len=32768
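The `max_pixels` cap in the qwen36 recipe can be sanity-checked with rough arithmetic. The sketch below assumes one vision token per 32x32 pixel block; that figure is an assumption chosen because it roughly reproduces the counts quoted in the recipe comment, not something stated in this diff. `approx_image_tokens` is a hypothetical helper.

```python
from typing import Optional

# Assumption: the vision tower emits about one token per 32x32 pixel block.
PIXELS_PER_TOKEN = 32 * 32


def approx_image_tokens(width: int, height: int, max_pixels: Optional[int] = None) -> int:
    """Estimate image tokens, optionally after downscaling to max_pixels."""
    pixels = width * height
    if max_pixels is not None:
        pixels = min(pixels, max_pixels)
    return pixels // PIXELS_PER_TOKEN
```

Under that assumption, a 4032x3024 phone photo lands well above a 4096-token limit, while the same photo capped at 2,000,000 pixels stays under it.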
+3
@@ -15,3 +15,6 @@ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db") os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json") os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml") os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
# Keep the in-app settings overlay off the container-only /data path; tests that
# care about its contents point it at their own tmp file via monkeypatch.
os.environ.setdefault("APP_SETTINGS_FILE", "/tmp/spark_control_test_app_settings.json")
+174
@@ -0,0 +1,174 @@
"""In-app settings overlay (the dashboard 'gear') + swap-lock routing regression.
Covers app_settings (the /data overlay backing the gear): first-run seeding from
env (the migration path), known-key filtering, apply() validation, secret
masking — and, end-to-end via TestClient, that POST /api/settings reloads the
shared Settings instance live, and that GET /api/swap/lock is no longer shadowed
by /api/swap/{job_id}.
"""
import json
import pytest
from app import app_settings
@pytest.fixture
def overlay_file(tmp_path, monkeypatch):
p = tmp_path / "app_settings.json"
monkeypatch.setenv("APP_SETTINGS_FILE", str(p))
return p
# ---- overlay store ----
def test_seed_from_env_filters_unknown_and_blank(overlay_file):
# An existing install upgrading in: values previously set via the StartOS
# action arrive as env; only known, non-empty keys migrate into the overlay.
app_settings.seed_from_env({
"VLLM_PORT": "8000",
"QDRANT_COLLECTION": "", # blank → skipped
"TOTALLY_UNKNOWN": "x", # not a gear key → skipped
"PARAKEET_PORT": "8010",
})
expected = {"VLLM_PORT": "8000", "PARAKEET_PORT": "8010"}
assert app_settings.load_overlay() == expected
assert json.loads(overlay_file.read_text()) == expected
def test_seed_is_a_one_time_noop_when_file_present(overlay_file):
overlay_file.write_text(json.dumps({"VLLM_PORT": "8000", "BOGUS": "y", "NGC_API_KEY": ""}))
app_settings.seed_from_env({"VLLM_PORT": "9999"}) # file exists ⇒ no-op
# unknown + blank keys dropped on read; existing value untouched by the seed.
assert app_settings.load_overlay() == {"VLLM_PORT": "8000"}
def test_no_file_is_empty_and_seed_of_blank_env_writes_nothing(overlay_file):
assert app_settings.load_overlay() == {}
app_settings.seed_from_env({"VLLM_PORT": "", "QDRANT_COLLECTION": ""})
assert not overlay_file.exists() # nothing worth seeding ⇒ no file
assert app_settings.load_overlay() == {}
def test_apply_set_then_blank_deletes(overlay_file):
app_settings.apply({"VLLM_PORT": "8000"})
assert app_settings.load_overlay()["VLLM_PORT"] == "8000"
app_settings.apply({"VLLM_PORT": ""}) # blank non-secret ⇒ revert to default
assert "VLLM_PORT" not in app_settings.load_overlay()
def test_apply_rejects_unknown_key(overlay_file):
with pytest.raises(app_settings.SettingsError):
app_settings.apply({"NOT_A_KNOB": "x"})
def test_apply_rejects_non_numeric_port(overlay_file):
with pytest.raises(app_settings.SettingsError):
app_settings.apply({"PARAKEET_PORT": "80x0"})
def test_apply_rejects_control_chars(overlay_file):
with pytest.raises(app_settings.SettingsError):
app_settings.apply({"QDRANT_COLLECTION": "a\nb"})
def test_secret_blank_keeps_existing(overlay_file):
app_settings.apply({"NGC_API_KEY": "nvapi-abc"})
app_settings.apply({"NGC_API_KEY": ""}) # blank secret ⇒ leave it in place
assert app_settings.load_overlay()["NGC_API_KEY"] == "nvapi-abc"
def test_apply_rejects_out_of_range_port(overlay_file):
for bad in ("0", "99999", "65536"):
with pytest.raises(app_settings.SettingsError):
app_settings.apply({"VLLM_PORT": bad})
def test_apply_accepts_port_bounds(overlay_file):
app_settings.apply({"VLLM_PORT": "1", "PARAKEET_PORT": "65535"})
o = app_settings.load_overlay()
assert o["VLLM_PORT"] == "1" and o["PARAKEET_PORT"] == "65535"
def test_secret_clear_sentinel_removes(overlay_file):
app_settings.apply({"NGC_API_KEY": "nvapi-abc"})
app_settings.apply({"NGC_API_KEY": app_settings.CLEAR_SENTINEL})
assert "NGC_API_KEY" not in app_settings.load_overlay()
def test_seed_skips_invalid_and_strips(overlay_file):
app_settings.seed_from_env({
"VLLM_PORT": "8000\n", # trailing newline → stripped
"PARAKEET_PORT": "99999", # out of range → skipped, not written
"QDRANT_COLLECTION": "crm",
})
o = app_settings.load_overlay()
assert o["VLLM_PORT"] == "8000"
assert "PARAKEET_PORT" not in o
assert o["QDRANT_COLLECTION"] == "crm"
def test_public_view_exposes_clear_sentinel(overlay_file):
assert app_settings.public_view()["clear_sentinel"] == app_settings.CLEAR_SENTINEL
def test_public_view_masks_secrets_and_groups(overlay_file):
app_settings.apply({"NGC_API_KEY": "nvapi-abc", "VLLM_PORT": "8000"})
view = app_settings.public_view()
fields = {f["key"]: f for g in view["groups"] for f in g["fields"]}
# Secret: value never echoed to the browser, only a set flag.
assert "value" not in fields["NGC_API_KEY"]
assert fields["NGC_API_KEY"]["set"] is True
# Non-secret: current value present for prefill.
assert fields["VLLM_PORT"]["value"] == "8000"
assert {g["name"] for g in view["groups"]} >= {"vLLM (Spark 1)", "Integrations"}
# The previously-missing support-service ports are now exposed.
assert {"PARAKEET_PORT", "KOKORO_PORT", "EMBED_PORT", "QDRANT_PORT"} <= set(fields)
# ---- end-to-end (TestClient): live reload + route order ----
# TestClient is created without the `with` context manager so app startup events
# (the deep-health poll loop) don't run — these stay fully offline.
def _client(monkeypatch, tmp_path):
monkeypatch.setenv("APP_SETTINGS_FILE", str(tmp_path / "live.json"))
from fastapi.testclient import TestClient
from app import server
return TestClient(server.app)
def test_swap_lock_get_is_not_shadowed(monkeypatch, tmp_path):
client = _client(monkeypatch, tmp_path)
r = client.get("/api/swap/lock")
# Regression: must hit get_swap_lock (200, {"held": False}), NOT the
# /api/swap/{job_id} catch-all that returns 404 "no such job".
assert r.status_code == 200
assert r.json() == {"held": False}
def test_settings_apply_is_live_without_restart(monkeypatch, tmp_path):
client = _client(monkeypatch, tmp_path)
r = client.post("/api/settings", json={"values": {"VLLM_PORT": "8123"}})
assert r.status_code == 200
# Settings reloaded in place ⇒ /api/config reflects it immediately.
assert client.get("/api/config").json()["vllm_port"] == 8123
# And clearing it reverts to the default, still live.
client.post("/api/settings", json={"values": {"VLLM_PORT": ""}})
assert client.get("/api/config").json()["vllm_port"] == 8888
def test_settings_post_rejects_bad_value(monkeypatch, tmp_path):
client = _client(monkeypatch, tmp_path)
r = client.post("/api/settings", json={"values": {"PARAKEET_PORT": "nope"}})
assert r.status_code == 422
def test_webhook_notifier_repoints_live(monkeypatch, tmp_path):
# WebhookNotifier snapshots url/secret, so reload() alone can't reach it;
# post_settings must re-point it. Regression for that P1.
client = _client(monkeypatch, tmp_path)
from app import server
client.post("/api/settings", json={"values": {"SWAP_WEBHOOK_URL": "https://example.test/hook"}})
assert server.swap_webhook.url == "https://example.test/hook"
assert server.swap_webhook.enabled
client.post("/api/settings", json={"values": {"SWAP_WEBHOOK_URL": ""}})
assert server.swap_webhook.url == ""
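The port tests above pin down the accepted range (1 to 65535) and the rejection of non-numeric input. A minimal validator consistent with those tests might look like the following; `validate_port` is a hypothetical name and the local `SettingsError` only mirrors `app_settings.SettingsError`, since the real implementation is not part of this diff:

```python
class SettingsError(ValueError):
    """Local stand-in for app_settings.SettingsError."""


def validate_port(raw: str) -> str:
    """Return the cleaned port string, or raise SettingsError."""
    text = raw.strip()
    # isascii() guards against Unicode digits that int() would reject.
    if not (text.isascii() and text.isdigit()):
        raise SettingsError(f"port must be a number: {raw!r}")
    if not 1 <= int(text) <= 65535:
        raise SettingsError(f"port out of range 1-65535: {raw!r}")
    return text
```

Returning the stripped string rather than an `int` matches the tests, where the overlay stores values as strings keyed by env-var name.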
+201
@@ -0,0 +1,201 @@
"""Coordination layer: swap lock lifecycle/expiry, schedule registry CRUD, and
the webhook payload+signature. All offline — the lock takes an injectable `now`
so expiry is tested without sleeping, and the webhook is exercised only on the
disabled (no-network) path plus its pure payload/signature helpers.
"""
import asyncio
from datetime import datetime, timedelta, timezone
import pytest
from app.coordination import (
LOCK_TTL_MAX,
LOCK_TTL_MIN,
LockHeld,
ScheduleRegistry,
SwapLockManager,
WebhookNotifier,
build_webhook_payload,
sign_payload,
valid_schedule_id,
)
T0 = datetime(2026, 6, 17, 12, 0, 0, tzinfo=timezone.utc)
# ----------------------------------------------------------------- swap lock ----
def test_acquire_free_lock_returns_token_and_status_held():
mgr = SwapLockManager()
lock = mgr.acquire("openclaw", ttl_seconds=60, note="daily vol", now=T0)
assert lock.token
st = mgr.status(now=T0)
assert st["held"] is True
assert st["holder"] == "openclaw"
assert st["note"] == "daily vol"
assert st["seconds_remaining"] == 60
assert "token" not in st # public view never leaks the token
def test_acquire_requires_holder():
with pytest.raises(ValueError):
SwapLockManager().acquire(" ", now=T0)
def test_acquire_held_by_other_raises_lockheld_with_state():
mgr = SwapLockManager()
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
with pytest.raises(LockHeld) as ei:
mgr.acquire("johnny5", ttl_seconds=60, now=T0)
assert ei.value.state["holder"] == "openclaw"
def test_reacquire_with_token_extends_and_keeps_token():
mgr = SwapLockManager()
first = mgr.acquire("openclaw", ttl_seconds=60, now=T0)
later = T0 + timedelta(seconds=30)
second = mgr.acquire("openclaw", ttl_seconds=60, token=first.token, now=later)
assert second.token == first.token
# window extended from the later moment, not the original
assert mgr.status(now=later)["seconds_remaining"] == 60
assert second.acquired_at == first.acquired_at # acquired_at preserved
def test_reacquire_without_token_is_refused_even_for_same_holder_name():
# Holder name is descriptive, not a secret — matching it must not grant access.
mgr = SwapLockManager()
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
with pytest.raises(LockHeld):
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
def test_ttl_is_clamped():
mgr = SwapLockManager()
mgr.acquire("a", ttl_seconds=0, now=T0)
assert mgr.status(now=T0)["seconds_remaining"] == LOCK_TTL_MIN
mgr2 = SwapLockManager()
mgr2.acquire("b", ttl_seconds=10**9, now=T0)
assert mgr2.status(now=T0)["seconds_remaining"] == LOCK_TTL_MAX
def test_lock_expires_and_clears_lazily():
mgr = SwapLockManager()
tok = mgr.acquire("openclaw", ttl_seconds=10, now=T0).token
after = T0 + timedelta(seconds=11)
assert mgr.status(now=after) == {"held": False}
assert mgr.verify(tok, now=after) is False
# an expired lock is free to re-take by anyone
mgr.acquire("johnny5", ttl_seconds=10, now=after)
assert mgr.status(now=after)["holder"] == "johnny5"
def test_verify_matches_only_active_token():
mgr = SwapLockManager()
tok = mgr.acquire("openclaw", ttl_seconds=60, now=T0).token
assert mgr.verify(tok, now=T0) is True
assert mgr.verify("nope", now=T0) is False
assert mgr.verify(None, now=T0) is False
def test_release_requires_token_then_frees():
mgr = SwapLockManager()
tok = mgr.acquire("openclaw", ttl_seconds=60, now=T0).token
with pytest.raises(PermissionError):
mgr.release("wrong", now=T0)
assert mgr.release(tok, now=T0) is True
assert mgr.status(now=T0) == {"held": False}
def test_force_release_skips_token_and_release_of_free_lock_is_false():
mgr = SwapLockManager()
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
assert mgr.release(force=True, now=T0) is True
assert mgr.release(force=True, now=T0) is False # nothing held now
def test_is_blocked_by_is_the_swap_gate():
# Mirrors the single-read decision the /api/swap endpoint makes.
mgr = SwapLockManager()
assert mgr.is_blocked_by(None, now=T0) is None # free lock blocks nobody
tok = mgr.acquire("openclaw", ttl_seconds=10, now=T0).token
blocked = mgr.is_blocked_by(None, now=T0) # no token -> blocked
assert blocked is not None and blocked["holder"] == "openclaw"
assert mgr.is_blocked_by("wrong", now=T0) is not None # wrong token -> blocked
assert mgr.is_blocked_by(tok, now=T0) is None # holder's token -> allowed
# At/after expiry the gate is open even without a token (the bug a separate
# status()+verify() pair would get wrong).
assert mgr.is_blocked_by(None, now=T0 + timedelta(seconds=11)) is None
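The gate test above relies on a single read that handles expiry and token matching together. A stripped-down sketch of that idea, assuming a hypothetical `TinyLock` class (it skips the held-by-other check, TTL clamping, and most of what `SwapLockManager` does):

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class _Held:
    holder: str
    token: str
    expires_at: datetime


class TinyLock:
    """Hypothetical, simplified lock; not the project's SwapLockManager."""

    def __init__(self) -> None:
        self._held: Optional[_Held] = None

    def acquire(self, holder: str, ttl_seconds: int, now: datetime) -> str:
        # Sketch only: always takes the lock, no contention check.
        token = secrets.token_hex(16)
        self._held = _Held(holder, token, now + timedelta(seconds=ttl_seconds))
        return token

    def is_blocked_by(self, token: Optional[str], now: datetime) -> Optional[dict]:
        held = self._held
        # Expiry is evaluated in the same read as the token check, so an
        # expired lock never blocks anyone; it is cleared lazily here.
        if held is None or now >= held.expires_at:
            self._held = None
            return None
        if token is not None and secrets.compare_digest(token, held.token):
            return None
        return {"holder": held.holder}
```

Passing `now` in explicitly is what lets the tests check expiry without sleeping.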
# ------------------------------------------------------------------- webhook ----
def test_build_webhook_payload_shape():
p = build_webhook_payload(
event="swap_complete", job_id="abc123", model_key="gemma",
state="ready", returncode=0, started_at="t0", finished_at="t1",
dry_run=False,
)
assert p == {
"event": "swap_complete", "job_id": "abc123", "model_key": "gemma",
"state": "ready", "returncode": 0, "started_at": "t0",
"finished_at": "t1", "dry_run": False,
}
def test_sign_payload_is_deterministic_and_prefixed():
body = b'{"event":"swap_complete"}'
sig = sign_payload("s3cr3t", body)
assert sig.startswith("sha256=")
assert sig == sign_payload("s3cr3t", body)
assert sig != sign_payload("other", body)
def test_disabled_webhook_fire_is_noop():
n = WebhookNotifier("", "")
assert n.enabled is False
# Must not attempt any network call or raise when no URL is configured.
assert asyncio.run(n.fire("swap_complete", {"x": 1})) is None
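The signature test only confirms that `sign_payload` is deterministic, secret-dependent, and prefixed with `sha256=`. One common scheme that fits is HMAC-SHA256 over the raw body; the sketch below assumes that scheme, and `verify_signature` is a hypothetical receiver-side helper, not part of the project:

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Assumed scheme: hex HMAC-SHA256 of the body, prefixed with 'sha256='."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, header: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign_payload(secret, body), header)
```

A webhook consumer would recompute the signature over the exact bytes it received and compare it to the header value before trusting the payload.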
# --------------------------------------------------------- schedule registry ----
def test_register_and_list_schedule():
reg = ScheduleRegistry()
e = reg.register(name="Daily Vol", owner="openclaw", cron="0 6 * * *")
assert e.id and e.registered_at and e.updated_at
listed = reg.list()
assert len(listed) == 1 and listed[0]["name"] == "Daily Vol"
def test_register_with_id_updates_in_place():
reg = ScheduleRegistry()
reg.register(name="Daily Vol", id="dv", owner="openclaw", cron="0 6 * * *")
reg.register(name="Daily Vol v2", id="dv", owner="openclaw", cron="0 7 * * *")
listed = reg.list()
assert len(listed) == 1
assert listed[0]["name"] == "Daily Vol v2" and listed[0]["cron"] == "0 7 * * *"
def test_register_requires_name_and_validates_id():
reg = ScheduleRegistry()
with pytest.raises(ValueError):
reg.register(name=" ")
with pytest.raises(ValueError):
reg.register(name="ok", id="bad id; rm -rf")
def test_delete_schedule():
reg = ScheduleRegistry()
reg.register(name="Daily Vol", id="dv")
assert reg.delete("dv") is True
assert reg.delete("dv") is False
assert reg.list() == []
def test_valid_schedule_id():
assert valid_schedule_id("daily-vol")
assert valid_schedule_id("a.b_c-1")
assert not valid_schedule_id("")
assert not valid_schedule_id("../etc")
assert not valid_schedule_id("has space")
assert not valid_schedule_id("x" * 65)
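The id checks above (letters, digits, dot, underscore, and hyphen; no slashes or spaces; 65 characters rejected) are consistent with a short allow-list pattern. This is a sketch under that reading; the exact pattern and the 64-character limit are assumptions, since `valid_schedule_id` itself is not shown:

```python
import re

# Assumption: 1-64 characters, starting with a letter or digit.
_SCHEDULE_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def valid_schedule_id(value: str) -> bool:
    # fullmatch rejects trailing newlines that a '$' anchor would let through.
    return _SCHEDULE_ID.fullmatch(value) is not None
```

An allow-list like this rejects path traversal and shell metacharacters by construction, instead of trying to enumerate bad inputs.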
+190
@@ -0,0 +1,190 @@
"""Disk-driven menu helpers: cache-dir parsing + launch-recipe inference.
All offline — pure functions over a fake cache listing and fake config.json
dicts. The SSH scan, the menu merge, and the suggest endpoint that wire these
together are exercised by hand against the live cluster (mock-heavy unit tests of
those would test the mocks).
"""
import asyncio
from app import discovery
from app.config import Settings
from app.disk import DiskStatus, cache_dirname_to_repo, parse_cache_listing
from app.discovery import repo_to_key, infer_recipe, _detect_family
from app.models import load_catalog
# ---- cache dirname <-> repo ----
def test_cache_dirname_to_repo_roundtrip():
assert cache_dirname_to_repo("models--RedHatAI--Qwen3.6-35B-A3B-NVFP4") == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
def test_cache_dirname_name_with_double_dash():
# The org is the first segment; everything after is the name (single '/').
assert cache_dirname_to_repo("models--org--weird--name") == "org/weird--name"
def test_cache_dirname_rejects_non_model_dirs():
assert cache_dirname_to_repo("datasets--foo--bar") is None
assert cache_dirname_to_repo("models--onlyorg") is None
assert cache_dirname_to_repo("random") is None
# ---- parse_cache_listing ----
def test_parse_cache_listing_complete_and_incomplete():
out = (
"20000000000|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4\n"
"5000000000|0|models--some--half-downloaded\n"
"\n"
"garbage line with no pipes\n"
"123|1|not-a-model-dir\n"
)
items = parse_cache_listing(out)
assert items == [
("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20000000000, True),
("some/half-downloaded", 5000000000, False),
]
def test_parse_cache_listing_bad_size_defaults_zero():
items = parse_cache_listing("notanumber|1|models--a--b")
assert items == [("a/b", 0, True)]
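The cache-listing tests describe a `size|complete|dirname` line format and the `models--<org>--<name>` directory convention. A sketch that satisfies those tests is below; it is written from the test expectations only and may differ from the real `app.disk` code:

```python
from typing import List, Optional, Tuple


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    """Map 'models--org--name' to 'org/name'; None for anything else."""
    prefix = "models--"
    if not dirname.startswith(prefix):
        return None
    # Split on the first '--' only, so a name may itself contain '--'.
    org, sep, name = dirname[len(prefix):].partition("--")
    if not sep or not org or not name:
        return None
    return f"{org}/{name}"


def parse_cache_listing(out: str) -> List[Tuple[str, int, bool]]:
    items = []
    for line in out.splitlines():
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # blank or malformed line
        size_raw, complete_raw, dirname = parts
        repo = cache_dirname_to_repo(dirname)
        if repo is None:
            continue  # not a model directory
        try:
            size = int(size_raw)
        except ValueError:
            size = 0  # unreadable size defaults to zero
        items.append((repo, size, complete_raw == "1"))
    return items
```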
# ---- repo_to_key ----
def test_repo_to_key_is_url_safe_and_stable():
assert repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4") == "redhatai-qwen3-6-35b-a3b-nvfp4"
# Idempotent enough to be a stable id across calls.
assert repo_to_key("nvidia/Gemma-4-26B-A4B-NVFP4") == "nvidia-gemma-4-26b-a4b-nvfp4"
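The two expected keys above are what a plain slugify step would produce: lowercase the repo, then collapse every run of non-alphanumeric characters into a single hyphen. A sketch under that assumption (the project's `repo_to_key` may handle edge cases differently):

```python
import re


def repo_to_key(repo: str) -> str:
    """Turn 'Org/Model-Name.1' into a URL-safe, stable slug."""
    return re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
```

Because the function is pure, the same repo always maps to the same key, which is what lets the key serve as a stable card id.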
# ---- family detection ----
def test_detect_qwen3_moe():
cfg = {"architectures": ["Qwen3MoeForCausalLM"], "model_type": "qwen3_moe", "num_experts": 128}
label, flags, caps = _detect_family(cfg)
assert "--reasoning-parser=qwen3" in flags
assert "--moe_backend=flashinfer_cutlass" in flags
assert "reasoning" in caps
assert "MoE" in label
def test_detect_gemma_moe_uses_marlin():
cfg = {"architectures": ["Gemma4MoeForConditionalGeneration"], "model_type": "gemma4_moe", "num_local_experts": 8}
label, flags, caps = _detect_family(cfg)
assert "--reasoning-parser=gemma4" in flags
assert "--tool-call-parser=gemma4" in flags
assert "--moe_backend=marlin" in flags # NOT flashinfer_cutlass — GB10 footgun
assert "vision" in caps # ConditionalGeneration => multimodal
assert "tools" in caps
def test_detect_generic_has_no_family_flags():
label, flags, caps = _detect_family({"architectures": ["LlamaForCausalLM"], "model_type": "llama"})
assert flags == []
assert label == "Generic"
def test_detect_vision_from_config_keys():
_, _, caps = _detect_family({"model_type": "qwen3", "vision_config": {"x": 1}})
assert "vision" in caps
# ---- infer_recipe (the prefill the setup form receives) ----
def test_infer_recipe_solo_small_model():
cfg = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}
rec = infer_recipe("RedHatAI/Qwen3.6-35B-A3B-NVFP4", cfg, total_bytes=20_000_000_000, on_host_count=1)
assert rec["mode"] == "solo"
assert rec["key"] == "redhatai-qwen3-6-35b-a3b-nvfp4"
assert rec["repo"] == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
assert "--reasoning-parser=qwen3" in rec["vllm_args"]
assert "-tp=2" not in rec["vllm_args"]
assert rec["knobs"]["kv_cache_dtype"] == "fp8"
def test_infer_recipe_cluster_when_on_both_hosts():
rec = infer_recipe("org/big", {}, total_bytes=10_000_000_000, on_host_count=2)
assert rec["mode"] == "cluster"
assert "-tp=2" in rec["vllm_args"]
assert "--distributed-executor-backend=ray" in rec["vllm_args"]
assert rec["knobs"]["gpu_memory_utilization"] == 0.7
def test_infer_recipe_cluster_when_too_big_for_one_spark():
rec = infer_recipe("org/huge", {}, total_bytes=200_000_000_000, on_host_count=1)
assert rec["mode"] == "cluster"
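The three `infer_recipe` tests imply a simple mode rule: cluster when the model is present on both Sparks or is too large for one, solo otherwise. The sketch below illustrates that rule. The `solo_limit_bytes` threshold is a made-up placeholder (the tests only show that 20 GB is solo and 200 GB is cluster), and `choose_mode` and `cluster_args` are hypothetical names:

```python
from typing import List


def choose_mode(
    total_bytes: int,
    on_host_count: int,
    solo_limit_bytes: int = 100_000_000_000,  # placeholder, not from the source
) -> str:
    if on_host_count >= 2 or total_bytes > solo_limit_bytes:
        return "cluster"
    return "solo"


def cluster_args(mode: str) -> List[str]:
    # The two flags the cluster test expects in vllm_args.
    if mode == "cluster":
        return ["-tp=2", "--distributed-executor-backend=ray"]
    return []
```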
# ---- build_menu merge (disk scan recipes) ----
def _both_spark_settings(monkeypatch) -> Settings:
for k in ("SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER"):
monkeypatch.delenv(k, raising=False)
monkeypatch.setenv("SPARK1_HOST", "1.1.1.1")
monkeypatch.setenv("SPARK1_USER", "u")
monkeypatch.setenv("SPARK2_HOST", "2.2.2.2")
monkeypatch.setenv("SPARK2_USER", "u")
return Settings.from_env()
def test_build_menu_merges_recipe_discovered_and_hides_incomplete(monkeypatch):
cat = load_catalog("models.yaml") # bundled recipes incl. qwen36 + gemma4
settings = _both_spark_settings(monkeypatch)
async def fake_list(host, user, s):
if host == "1.1.1.1":
return [
("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20_000_000_000, True), # recipe match
("someorg/mystery-7B", 7_000_000_000, True), # needs setup
("broken/half", 1_000_000_000, False), # incomplete -> hidden
]
return [] # spark2 empty
async def fake_probe(repo, mode, s, *, local_path=None):
return DiskStatus(repo=local_path or repo, on_disk=False, total_bytes=0, per_host=[])
monkeypatch.setattr(discovery, "list_cached_models", fake_list)
monkeypatch.setattr(discovery, "probe_disk", fake_probe)
menu = asyncio.run(discovery.build_menu(settings, cat))
# Recipe-matched: keyed by recipe key, ready (not needs_setup), real size.
assert "qwen36" in menu
assert menu["qwen36"]["needs_setup"] is False
assert menu["qwen36"]["total_bytes"] == 20_000_000_000
# Discovered-without-recipe: slug key, needs_setup.
slug = repo_to_key("someorg/mystery-7B")
assert menu[slug]["needs_setup"] is True
# Incomplete download is filtered out entirely.
assert all("half" not in k for k in menu)
# A recipe with nothing on disk (e.g. gemma4) must NOT appear — the menu is the disk.
assert "gemma4" not in menu
def test_build_menu_sums_cluster_model_across_both_sparks(monkeypatch):
cat = load_catalog("models.yaml")
settings = _both_spark_settings(monkeypatch)
async def fake_list(host, user, s):
# Same repo present on BOTH Sparks — one card, sizes summed (not two cards).
return [("org/sharded-235B", 70_000_000_000, True)]
async def fake_probe(repo, mode, s, *, local_path=None):
return DiskStatus(repo=repo, on_disk=False, total_bytes=0, per_host=[])
monkeypatch.setattr(discovery, "list_cached_models", fake_list)
monkeypatch.setattr(discovery, "probe_disk", fake_probe)
menu = asyncio.run(discovery.build_menu(settings, cat))
key = repo_to_key("org/sharded-235B")
assert list(menu) == [key] # exactly one card
assert menu[key]["total_bytes"] == 140_000_000_000 # summed across both hosts
assert len(menu[key]["per_host"]) == 2
assert menu[key]["mode"] == "cluster" # present on 2 hosts -> cluster
+35
@@ -0,0 +1,35 @@
"""build_download_command: the ~/.local/bin PATH fix + shell-injection quoting.

hf-download.sh on the Spark shells out to `uvx`, which the uv installer puts in
~/.local/bin, off the PATH of our non-interactive SSH session. The command must
prepend ~/.local/bin (via $HOME, expanded server-side) or the download dies with
"uvx: command not found". The repo value must also be shlex-quoted at the sink so
a crafted value can't break out of the command (validate_repo gates it upstream).
"""
import shlex

from app.download import build_download_command


def test_prepends_local_bin_to_path():
    cmd = build_download_command("org/name")
    assert cmd.startswith('export PATH="$HOME/.local/bin:$PATH" && ')
    assert "cd ~/spark-vllm-docker" in cmd
    assert "./hf-download.sh org/name" in cmd


def test_no_trailing_space_without_flags():
    assert build_download_command("org/name", "").endswith("./hf-download.sh org/name")


def test_cluster_flags_appended():
    cmd = build_download_command("org/name", "-c --copy-parallel")
    assert cmd.endswith("./hf-download.sh org/name -c --copy-parallel")


def test_repo_is_shlex_quoted():
    # Everything after the script name must shlex-split back to the exact repo,
    # the same round-trip invariant build_launch_command relies on.
    cmd = build_download_command("org/na;me")
    after = cmd.split("./hf-download.sh ", 1)[1]
    assert shlex.split(after) == ["org/na;me"]
+81
@@ -7,6 +7,9 @@ the command back into the exact token list. The vLLM pre-flight validator
"""
import shlex

import pytest
from pydantic import ValidationError

from app.models import Defaults, ModelDef, build_launch_command

DEFAULTS = Defaults(port=8888, host="0.0.0.0")
@@ -65,3 +68,81 @@ def test_injection_via_vllm_arg_stays_literal():
    payload = "--foo=$(touch /tmp/pwned)"
    cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
    assert payload in shlex.split(cmd)  # preserved as one inert token


# ---- local / fine-tuned models (served by directory, not HF repo) ----

def test_local_model_bind_mounts_dir_and_serves_the_path():
    m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
    cmd = build_launch_command("k", m, DEFAULTS)
    tokens = shlex.split(cmd)
    # The launch script's hook bind-mounts the host dir at the SAME container path.
    assert tokens[0] == (
        "VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
    )
    # vLLM is pointed at the directory, not an HF repo id.
    i = tokens.index("serve")
    assert tokens[i + 1] == "/home/u/models/ft-v2"
    assert "--max-model-len=2048" in tokens


def test_local_model_chat_template_arg_survives_round_trip():
    m = _model(
        repo="",
        local_path="/m/ft",
        vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
    )
    cmd = build_launch_command("k", m, DEFAULTS)
    assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)


def test_local_path_with_metacharacters_is_quoted_not_executed():
    # The validator rejects a hostile path at the boundary; bypass it with
    # model_construct to prove the quote_arg sink is safe in depth even if a bad
    # value somehow reaches build_launch_command.
    evil = "/m/ft; rm -rf ~"
    m = ModelDef.model_construct(
        display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
        vllm_args=[], knobs=None, custom=False, capabilities=[],
        expected_ready_seconds=300, description=None,
    )
    cmd = build_launch_command("k", m, DEFAULTS)
    tokens = shlex.split(cmd)
    i = tokens.index("serve")
    assert tokens[i + 1] == evil  # recovered as one literal token, not executed
    assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"


def test_model_requires_exactly_one_source():
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", size_gb=1, mode="solo")  # neither repo nor local_path
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo")  # both


def test_local_model_rejects_chat_template_outside_dir():
    # Only local_path is mounted into the container, so a chat-template elsewhere
    # would silently 404 inside vLLM; reject it up front.
    with pytest.raises(ValidationError):
        ModelDef(
            display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
            vllm_args=["--chat-template=/other/dir/t.jinja"],
        )


def test_invalid_local_path_rejected_by_model():
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")


def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
    # YAML/override-added local models get the same validation as the API; a single
    # bad entry is skipped (logged) rather than breaking the whole catalog load.
    from app import models as M
    monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
        {"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
        {"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
    ]})
    cat = M._merge_overrides(M.Catalog(models={}))
    assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
    assert "bad" not in cat.models  # traversal path skipped, not catalog-fatal
+47
@@ -0,0 +1,47 @@
"""build_update_command: the matrix-bridge update one-liner.

Pure string assembly, no cluster. Locks in the contract from
docs/spark-control-integration.md (matrix-bridge repo): fetch, hard-reset to the
release branch, then rebuild/recreate via docker compose, chained with `&&` so
any failure (e.g. Gitea unreachable) aborts before the build and surfaces a
non-zero exit. The clone dir must stay unquoted so a `~` expands server-side.
"""
from app.matrix_bridge import build_update_command, _phase_for


def test_command_is_the_contract_chain():
    cmd = build_update_command("~/matrix-bridge", "master")
    assert cmd == (
        "cd ~/matrix-bridge && "
        "git fetch origin && "
        "git reset --hard origin/master && "
        "docker compose up -d --build"
    )


def test_fail_loud_chaining():
    # Every step is &&-chained: a failed fetch never reaches the build.
    cmd = build_update_command("~/matrix-bridge", "master")
    assert "; " not in cmd
    assert cmd.count(" && ") == 3
    assert cmd.index("git fetch") < cmd.index("git reset") < cmd.index("docker compose")


def test_tilde_dir_left_unquoted_for_server_side_expansion():
    cmd = build_update_command("~/matrix-bridge", "master")
    assert "cd ~/matrix-bridge &&" in cmd
    assert "'~" not in cmd  # quoting would defeat the home-dir expansion


def test_absolute_dir_and_custom_branch():
    cmd = build_update_command("/home/modelo/matrix-bridge", "phase-1")
    assert cmd.startswith("cd /home/modelo/matrix-bridge && ")
    assert "git reset --hard origin/phase-1 &&" in cmd


def test_phase_detection_maps_known_lines():
    assert _phase_for("HEAD is now at 1a2b3c4 some commit") == "Resetting to the latest release…"
    assert _phase_for("#5 building image") == "Building the bot image…"
    assert _phase_for("Container matrix-bridge Recreate") == "Recreating the container…"
    assert _phase_for("Already up to date.") == "No new code; rebuilding…"
    assert _phase_for("some unremarkable line") is None
+30 -1
@@ -6,7 +6,12 @@ use `validate_x(v)` inline.
"""
import pytest
-from app.shellsafe import validate_container, validate_image, validate_repo
from app.shellsafe import (
    validate_container,
    validate_image,
    validate_local_path,
    validate_repo,
)
# Shell metacharacters that must never survive any validator; these are the
# actual injection vectors. (Path traversal like "../" is NOT in scope here:
@@ -96,3 +101,27 @@ def test_container_valid_passes_through_unchanged(name):
def test_container_rejects_malformed_and_hostile(name):
    with pytest.raises(ValueError):
        validate_container(name)


# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----

@pytest.mark.parametrize("path", [
    "/home/modelo/models/gemma-4-31B-ten31-v2",
    "/data/models/ft.v2_1",
    "/srv/m/a-b/c",
])
def test_local_path_valid_passes_through_unchanged(path):
    assert validate_local_path(path) == path


@pytest.mark.parametrize("path", [
    "",
    "relative/path",  # must be absolute
    "~/models/x",  # no ~ expansion
    "/models/../etc/shadow",  # '..' traversal
    "/models/./x",  # '.' segment
    "/a" * 300,  # over the 512 cap (600 chars)
] + [f"/models/x{h}" for h in HOSTILE])
def test_local_path_rejects_relative_traversal_and_hostile(path):
    with pytest.raises(ValueError):
        validate_local_path(path)
+120
@@ -0,0 +1,120 @@
"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
extra-vLLM probe. All offline: the disabled checks short-circuit before any
network call, and the probes are exercised only on the not-configured path.
"""
import asyncio

from app.config import Settings
from app.health import (
    check_embeddings,
    check_kokoro,
    check_parakeet,
    check_qdrant,
    check_vllm,
    probe_vllm_endpoint,
)
from app.services import services_from_settings


def _settings(monkeypatch, **env) -> Settings:
    # Pin the topology env vars under test; default the rest to blank so a stray
    # value in the real environment can't leak into the assertion.
    keys = [
        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
        "DISABLED_SERVICES", "VLLM_CONTAINER",
    ]
    for k in keys:
        monkeypatch.delenv(k, raising=False)
    for k, v in env.items():
        monkeypatch.setenv(k, v)
    return Settings.from_env()


# ---- DISABLED_SERVICES parsing ----

def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
    assert s.disabled_services == frozenset({"parakeet", "kokoro"})


def test_disabled_services_blank_is_empty(monkeypatch):
    assert _settings(monkeypatch).disabled_services == frozenset()


# ---- vLLM container override ----

def test_vllm_container_defaults_to_vllm_node(monkeypatch):
    assert _settings(monkeypatch).vllm_container == "vllm_node"


def test_vllm_container_override(monkeypatch):
    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"


def test_vllm_container_invalid_falls_back(monkeypatch):
    # A malformed value (space / shell metachar) is rejected at the boundary and
    # falls back to the default rather than crashing startup or reaching a sink.
    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"


# ---- services map honors the disable list ----

def test_services_from_settings_drops_disabled(monkeypatch):
    s = _settings(
        monkeypatch,
        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
        DISABLED_SERVICES="parakeet,qdrant",
    )
    svcs = services_from_settings(s)
    assert "parakeet" not in svcs and "qdrant" not in svcs
    assert "kokoro" in svcs and "embeddings" in svcs


def test_custom_vllm_service_registered(monkeypatch):
    from app import custom_services
    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
         "user": "u", "container": "vllm_node", "port": 8000},
    ])
    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
    svc = services_from_settings(s)["vllm-spark2"]
    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"


def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
    # A custom entry can't shadow a built-in key; the built-in wins.
    from app import custom_services
    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
    ])
    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
    assert services_from_settings(s)["parakeet"].kind == "stt"


# ---- disabled health checks short-circuit (no network) ----

def test_disabled_check_returns_disabled_verdict(monkeypatch):
    s = _settings(
        monkeypatch,
        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",  # host set, but disable wins
        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
    )
    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
        r = asyncio.run(check(s))
        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}


# ---- vLLM probe: not-configured path is pure ----

def test_probe_vllm_endpoint_unconfigured(monkeypatch):
    r = asyncio.run(probe_vllm_endpoint("", 8000))
    assert r["ok"] is False and "not configured" in r["error"]


def test_check_vllm_unconfigured_without_spark1(monkeypatch):
    s = _settings(monkeypatch)  # no SPARK1_HOST
    r = asyncio.run(check_vllm(s))
    assert r["ok"] is False and "spark1 not configured" in r["error"]
+11
@@ -1,3 +1,14 @@
ARCHES := x86
# overrides to s9pk.mk must precede the include statement
include s9pk.mk

# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only
# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN;
# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh.
RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts)

.PHONY: release
release:
	@test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; }
	GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \
		../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk"
+21 -100
@@ -3,6 +3,15 @@ import { sparkConfigYaml } from '../fileModels/sparkConfig.yaml'
const { InputSpec, Value } = sdk
// This action is intentionally minimal: just the required wiring needed before
// Spark Control can do anything — the two Spark node addresses and SSH users.
// Every other knob (vLLM/service ports, container names, support-service hosts,
// integrations, webhooks) now lives behind the ⚙ Settings gear in the dashboard
// itself, which is where StartOS 0.4 expects routine config to live (and most
// operators never open StartOS actions). The optional keys still exist in the
// config.yaml schema (set by older versions); they're read into env at launch
// and migrated into the in-app settings overlay on first boot, so nothing is
// lost on upgrade — they're simply edited in the dashboard from now on.
const inputSpec = InputSpec.of({
spark1_host: Value.text({
name: 'Spark 1 hostname or IP',
@@ -40,110 +49,14 @@ const inputSpec = InputSpec.of({
placeholder: 'your SSH username',
masked: false,
}),
parakeet_host: Value.text({
name: 'Parakeet host (optional)',
description:
"Override the host running the Parakeet STT container. Leave blank if Parakeet runs on Spark 2 — that's the default. Set this if you run Parakeet on Spark 1 or a different machine.",
required: false,
default: null,
placeholder: 'leave blank to use Spark 2',
masked: false,
}),
parakeet_container: Value.text({
name: 'Parakeet container name (optional)',
description:
'Docker container name for Parakeet. Defaults to "parakeet-asr" — change only if you named yours something else.',
required: false,
default: null,
placeholder: 'parakeet-asr',
masked: false,
}),
kokoro_host: Value.text({
name: 'Kokoro host (optional)',
description:
'Override the host running the Kokoro TTS container. Leave blank if Kokoro runs on Spark 2.',
required: false,
default: null,
placeholder: 'leave blank to use Spark 2',
masked: false,
}),
kokoro_container: Value.text({
name: 'Kokoro container name (optional)',
description: 'Docker container name for Kokoro. Defaults to "kokoro-tts".',
required: false,
default: null,
placeholder: 'kokoro-tts',
masked: false,
}),
embed_host: Value.text({
name: 'Embedding server host (optional)',
description:
'Override the host running the spark-embed container (bge-m3 dense embeddings + reranker). Leave blank if it runs on Spark 2.',
required: false,
default: null,
placeholder: 'leave blank to use Spark 2',
masked: false,
}),
embed_container: Value.text({
name: 'Embedding container name (optional)',
description:
'Docker container name for the embedding server. Defaults to "spark-embed".',
required: false,
default: null,
placeholder: 'spark-embed',
masked: false,
}),
qdrant_host: Value.text({
name: 'Qdrant host (optional)',
description:
'Override the host running the Qdrant vector database. Leave blank if it runs on Spark 2.',
required: false,
default: null,
placeholder: 'leave blank to use Spark 2',
masked: false,
}),
qdrant_container: Value.text({
name: 'Qdrant container name (optional)',
description: 'Docker container name for Qdrant. Defaults to "qdrant".',
required: false,
default: null,
placeholder: 'qdrant',
masked: false,
}),
qdrant_collection: Value.text({
name: 'Default Qdrant collection (optional)',
description:
'Default collection name used by /api/search when a request does not specify one. Leave blank to require callers to pass a collection.',
required: false,
default: null,
placeholder: 'e.g. crm_chunks',
masked: false,
}),
open_webui_url: Value.text({
name: 'Open WebUI URL (optional)',
description:
'If you also run Open WebUI on your LAN, paste its URL here. Spark Control will then show a one-click "Open chat" button next to the current model so you can jump straight to it.',
required: false,
default: null,
placeholder: 'e.g. https://open-webui.yourserver.local',
masked: false,
}),
ngc_api_key: Value.text({
name: 'NGC API key (optional)',
description:
'NVIDIA NGC personal API key — needed to install NIM containers (Parakeet, etc.) from nvcr.io. Get one free at https://ngc.nvidia.com/setup/personal-key. Stored only on this Start9 server; passed to docker as the NGC_API_KEY env var when installing NIM services. (Kokoro TTS is Apache 2.0 and does not need an NGC key.)',
required: false,
default: null,
placeholder: 'starts with "nvapi-..."',
masked: true,
}),
})
export const configureSparks = sdk.Action.withInput(
'configure-sparks',
async () => ({
name: 'Configure Sparks',
description: 'Set the hostnames and SSH users for your two Spark nodes.', description:
'Set your two Spark node addresses and SSH users — the required wiring. Everything else (ports, container names, support services, integrations) is configured under ⚙ Settings in the Spark Control dashboard.',
warning: null,
visibility: 'enabled',
allowedStatuses: 'any',
@@ -151,11 +64,19 @@ export const configureSparks = sdk.Action.withInput(
}),
async () => inputSpec,
async ({ effects }) => {
// Prefill from the saved config, but only the keys this (trimmed) form owns.
const cfg = await sparkConfigYaml.read().once()
return cfg ?? null if (!cfg) return null
return {
spark1_host: cfg.spark1_host,
spark1_user: cfg.spark1_user,
spark2_host: cfg.spark2_host,
spark2_user: cfg.spark2_user,
}
},
async ({ effects, input }) => {
// Optional fields come through as `null`; coerce to empty string for the schema. // merge() only touches the four keys we submit, leaving any legacy optional
// values already in config.yaml intact.
const normalized = Object.fromEntries(
Object.entries(input).map(([k, v]) => [k, v ?? '']),
) as Record<string, string>
@@ -7,6 +7,13 @@ export const sparkConfigSchema = z.object({
spark1_user: z.string().catch(''),
spark2_host: z.string().catch(''),
spark2_user: z.string().catch(''),
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
vllm_port: z.string().catch(''),
// Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
vllm_container: z.string().catch(''),
// Optional comma-separated list of built-in services to switch off
// (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
disabled_services: z.string().catch(''),
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
parakeet_host: z.string().catch(''),
parakeet_user: z.string().catch(''),
@@ -22,10 +29,17 @@ export const sparkConfigSchema = z.object({
qdrant_user: z.string().catch(''),
qdrant_container: z.string().catch(''),
qdrant_collection: z.string().catch(''),
// Optional matrix-bridge bot. Blank => no tile. Host reuses Spark 2.
matrix_bridge_user: z.string().catch(''),
// Optional Open WebUI deep-link
open_webui_url: z.string().catch(''),
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
ngc_api_key: z.string().catch(''),
// Optional coordination webhook: POSTed on swap_complete/swap_failed so
// downstream consumers re-point their model config. Blank => disabled.
swap_webhook_url: z.string().catch(''),
// Optional shared secret; if set, the webhook body is HMAC-signed.
swap_webhook_secret: z.string().catch(''),
})
export type SparkConfig = z.infer<typeof sparkConfigSchema>
+12
@@ -13,6 +13,9 @@ export const main = sdk.setupMain(async ({ effects }) => {
spark1_user: '',
spark2_host: '',
spark2_user: '',
vllm_port: '',
vllm_container: '',
disabled_services: '',
parakeet_host: '',
parakeet_user: '',
parakeet_container: '',
@@ -26,8 +29,11 @@ export const main = sdk.setupMain(async ({ effects }) => {
qdrant_user: '',
qdrant_container: '',
qdrant_collection: '',
matrix_bridge_user: '',
open_webui_url: '',
ngc_api_key: '',
swap_webhook_url: '',
swap_webhook_secret: '',
}
return sdk.Daemons.of(effects).addDaemon('primary', {
@@ -49,6 +55,9 @@ export const main = sdk.setupMain(async ({ effects }) => {
SPARK1_USER: cfg.spark1_user,
SPARK2_HOST: cfg.spark2_host,
SPARK2_USER: cfg.spark2_user,
VLLM_PORT: cfg.vllm_port,
VLLM_CONTAINER: cfg.vllm_container,
DISABLED_SERVICES: cfg.disabled_services,
PARAKEET_HOST: cfg.parakeet_host,
PARAKEET_USER: cfg.parakeet_user,
PARAKEET_CONTAINER: cfg.parakeet_container,
@@ -62,11 +71,14 @@ export const main = sdk.setupMain(async ({ effects }) => {
QDRANT_USER: cfg.qdrant_user,
QDRANT_CONTAINER: cfg.qdrant_container,
QDRANT_COLLECTION: cfg.qdrant_collection,
MATRIX_BRIDGE_USER: cfg.matrix_bridge_user,
MODELS_OVERRIDES: '/data/models-overrides.yaml',
SERVICES_OVERRIDES: '/data/services-overrides.yaml',
CONNECTIVITY_LOG: '/data/connectivity.json',
OPEN_WEBUI_URL: cfg.open_webui_url,
NGC_API_KEY: cfg.ngc_api_key,
SWAP_WEBHOOK_URL: cfg.swap_webhook_url,
SWAP_WEBHOOK_SECRET: cfg.swap_webhook_secret,
BIND_PORT: String(uiPort),
},
},
+2 -2
@@ -1,10 +1,10 @@
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({
version: '0.20.0:0', version: '0.27.3:0',
releaseNotes: {
en_US:
"v0.20.0:0 — Spark connectivity helpers on the hardware cards. (1) A small copy icon in each card's top-right corner grabs that Spark's SSH public key — the key the Spark uses to log in to OTHER machines (e.g. your Mac). If the Spark has no key yet, one is generated on the spot (no passphrase, so apps can use it unattended); an existing key is never overwritten. A dialog shows the key plus a ready-to-paste command for adding it on the target machine. (This is the opposite direction from the existing \"Show Public Key\" action, which grants THIS dashboard access to your Sparks.) (2) If a Spark is on a WireGuard tunnel, its card now shows a read-only \"VPN <ip>\" badge next to the uptime, so you can see at a glance that the box is reachable off-LAN. All read-only — the dashboard does not configure the tunnel.", 'v0.27.3:0 — Qwen3.6 vision now works end-to-end, including full-size phone photos. (1) Qwen3.6-35B-A3B reads images (e.g. business-card OCR) and now shows a "vision" badge on its card. (2) Fix: large/high-resolution images (e.g. a 12-megapixel phone photo) were being rejected by the model with a 400 error — a single big image expands to more vision tokens than vLLM allows. The Qwen launch now caps image resolution (max_pixels) so oversized images are automatically downscaled to a size the model accepts; the dashboard, Open WebUI, and any downstream app can now send full-size photos to the /v1 endpoint without errors, and OCR stays sharp. No consumer-API changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
},
migrations: {
up: async ({ effects }) => {},
+63 -4
@@ -34,13 +34,61 @@ These take effect on the **next swap to that model**. If a swap fails after this
- Status auto-refreshes every 5 s.
- A swap takes 36 minutes depending on the model. Don't close the tab, but if you do, the swap continues; reopen and you'll re-attach to the log stream.
## matrix-bridge bot tile (optional)
If you run the matrix-bridge bot container on a Spark, set its SSH user in **Configure Sparks** (e.g. the user that owns `~/matrix-bridge`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a `running` badge means the container is up, not necessarily that the bot is connected.
The **Update** button runs `git fetch origin && git reset --hard origin/<branch> && docker compose up -d --build` in the clone directory as that SSH user. For it to reach your git remote:
1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:
```
Host <your-git-host>
Port <port>
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
```
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
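A minimal sketch of how the Update one-liner could be assembled. The function name `build_update_command` and its `(clone_dir, branch)` arguments follow the project's tests; the body is an illustration written for this note, not the shipped implementation, and it assumes both values were validated before reaching it.

```python
def build_update_command(clone_dir: str, branch: str) -> str:
    """Assemble the fetch -> hard-reset -> rebuild chain as one shell line.

    Illustrative sketch only. clone_dir is deliberately left unquoted so a
    leading ~ still expands on the remote host; both arguments are assumed
    to be validated upstream.
    """
    steps = [
        f"cd {clone_dir}",
        "git fetch origin",
        f"git reset --hard origin/{branch}",
        "docker compose up -d --build",
    ]
    # Join with && (not ;) so a failed fetch aborts before the rebuild.
    return " && ".join(steps)


print(build_update_command("~/matrix-bridge", "master"))
```

Chaining with `&&` means an unreachable git remote surfaces as a non-zero exit instead of rebuilding stale code.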
## Configurable topology (v0.24.0+)
For a cluster wired differently from the reference layout, three optional knobs in **Configure Sparks** (no fork needed):
- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
- **Services to hide** — comma-separated `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers — e.g. a vLLM on port 8000 colliding with Parakeet's default.
- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
```yaml
custom:
- key: vllm-spark2
kind: vllm
host: <spark-2-ip>
user: <ssh-user>
container: vllm_node
port: 8000
```
It gets a read-only tile: loaded model (via `/v1/models`), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user — Show Public Key.)
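For the **Services to hide** knob, the comma-separated value is tolerant of stray spaces, mixed case, and empty entries. A rough sketch of that parsing follows; `parse_disabled_services` is a hypothetical helper name used only for illustration, and the behaviour shown is the one the project's tests describe.

```python
def parse_disabled_services(raw: str) -> frozenset[str]:
    """Normalize a comma-separated list into a set of lowercase service keys.

    Hypothetical helper: trims whitespace, lowercases, and drops empty items.
    """
    return frozenset(
        part.strip().lower() for part in raw.split(",") if part.strip()
    )


print(sorted(parse_disabled_services("parakeet, Kokoro ,,")))
```

A blank value yields an empty set, which corresponds to every built-in service staying enabled.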
## Adding a new model
-1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph: what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
-2. Confirm the weights are on the Spark: `ssh <spark-user>@<spark-1-host> 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
-3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
-If `description` is omitted, the card simply hides that section; no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
The menu is whatever's downloaded on the Sparks, so the normal path is just: **download it, then set it up once.**
1. **Download** from the dashboard (**+ Download a new model**, paste the HF repo) or on Spark 1 with `./hf-download.sh <repo>`. When it finishes it appears on the menu by itself.
2. **Set it up.** If Spark Control already has a recipe for it (see below), it's ready to switch to. Otherwise it shows a **"needs setup"** card: the first switch reads the model's `config.json`, proposes how to launch it (family/parsers, solo vs cluster, vLLM flags), and you confirm once. The confirmed recipe persists to `/data/models-overrides.yaml` (survives package updates).
### Bundling a launch recipe (optional — skips the setup prompt)
To make a known model launch correctly the instant it's downloaded, add a *recipe* to `image/models.yaml`. These are **not** the menu — they're matched to an on-disk model by `repo`. Required: `display_name`, `repo`, `size_gb`, `mode` (`solo`/`cluster`), `vllm_args`. Optional: `description`, `capabilities` (e.g. `[vision, reasoning, tools]`), `expected_ready_seconds`. Then rebuild + redeploy: `cd package && make x86 && make install`. Keep descriptions generic (not user-specific) so the recipes stay portable.
### Local / fine-tuned models (v0.23.0+)
A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only that directory is mounted into the container, so a `--chat-template` file must live **inside** `local_path`.
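The rules a `local_path` has to satisfy can be sketched as below. The function name and the 512-character cap come from the project's tests; the character allowlist is an assumption made for this sketch, and the real validator may accept a different set.

```python
import re

_MAX_LEN = 512  # cap taken from the project's tests
# Assumed allowlist: absolute path made of letters, digits, '.', '_', '-', '/'.
_SAFE = re.compile(r"/[A-Za-z0-9._/-]*")


def validate_local_path(path: str) -> str:
    """Return path unchanged if it is a plain absolute directory path.

    Illustrative sketch: rejects empty, relative, '~'-prefixed, over-long,
    traversal ('.' / '..' segments), and shell-metacharacter paths.
    """
    if not path or len(path) > _MAX_LEN:
        raise ValueError("local_path must be 1-512 characters")
    if not _SAFE.fullmatch(path):
        raise ValueError("local_path must be absolute with no special characters")
    if any(segment in (".", "..") for segment in path.split("/")):
        raise ValueError("local_path must not contain '.' or '..' segments")
    return path
```

Rejecting at the boundary keeps hostile values away from the shell, while the launch command still quotes the path as a second line of defence.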
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
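A sketch of how such a prefix can be built so the path survives as one inert shell token. `local_model_env_prefix` is a hypothetical name used only here; the one real API involved is Python's standard `shlex` module. Only the value is quoted, since a shell would not treat a fully quoted `NAME=value` word as a variable assignment.

```python
import shlex


def local_model_env_prefix(local_path: str) -> str:
    """Build the VLLM_SPARK_EXTRA_DOCKER_ARGS assignment for a local model dir.

    Hypothetical helper: mounts the host directory at the same container path
    and quotes the value so spaces or metacharacters stay literal.
    """
    mount = f"-v {local_path}:{local_path}"
    return "VLLM_SPARK_EXTRA_DOCKER_ARGS=" + shlex.quote(mount)


prefix = local_model_env_prefix("/home/u/models/ft-v2")
# Splitting the way a POSIX shell would recovers a single assignment token.
print(shlex.split(prefix))
```

Because the launch script expands the variable unquoted, the value is word-split again inside `docker run`, which is why the path rules for `local_path` matter in addition to the quoting.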
## Manual swap fallback
@@ -57,6 +105,17 @@ cd ~/spark-vllm-docker
docker logs -f vllm_node # wait for "Application startup complete."
```
## Sideload (`make install`) can't reach the server
Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`. Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up — `curl -sk https://<server-ip>/rpc/v1` returns 200.
- **Don't** work around it with `start-cli -H https://<server-ip> package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
- **Fix:** make the name resolve again, then re-run `make install`:
  - `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
  - `echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).
Note this only blocks installing to *your own* Start9 — building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).
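If the `/etc/hosts` route is taken, the add-now, remove-later step can be made repeatable. A sketch under assumptions: `has_host`, `add_hosts_entry`, and `remove_hosts_entry` are hypothetical helpers, `192.0.2.10` is a documentation placeholder for `<server-ip>`, and the demo runs against a scratch file rather than the real `/etc/hosts` (which would need `sudo`).

```bash
#!/usr/bin/env bash
# Succeed if NAME appears as a hostname on any non-comment line of a hosts-style file.
has_host() {
  awk -v n="$2" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                 END { exit !found }' "$1"
}

# Append "IP NAME" unless NAME is already mapped, so re-runs do not pile up duplicates.
add_hosts_entry() {
  local file="$1" ip="$2" name="$3"
  has_host "$file" "$name" || printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Drop every non-comment line that maps NAME (the whole line, including other aliases on it).
remove_hosts_entry() {
  local file="$1" name="$2" tmp
  tmp="$(mktemp)"
  awk -v n="$name" '$1 ~ /^#/ { print; next }
                    { keep = 1; for (i = 2; i <= NF; i++) if ($i == n) keep = 0 }
                    keep' "$file" > "$tmp"
  cat "$tmp" > "$file"   # rewrite in place so the file keeps its owner and mode
  rm -f "$tmp"
}

hosts="$(mktemp)"
printf '127.0.0.1 localhost\n' > "$hosts"
add_hosts_entry "$hosts" 192.0.2.10 immense-voyage.local
add_hosts_entry "$hosts" 192.0.2.10 immense-voyage.local   # second call is a no-op
grep -c 'immense-voyage.local' "$hosts"                     # prints: 1
remove_hosts_entry "$hosts" immense-voyage.local
cat "$hosts"                                                # prints: 127.0.0.1 localhost
rm -f "$hosts"
```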
## Diagnostics
```bash
@@ -0,0 +1,65 @@
#!/usr/bin/env bash
# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the
# latest package with a read-only token instead of being hand-sent the file.
#
#   GITEA_URL=https://gitea.example:3000 GITEA_TOKEN=<write-token> \
#     scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk
#
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
# reuses an existing release for the tag and replaces a same-named asset.
# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
set -euo pipefail
VERSION="${1:-}"; S9PK="${2:-}"
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
  echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
ASSET="$(basename "$S9PK")"
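# Strip an optional .git suffix first so remotes configured without it still yield owner/repo.
SLUG="$(git remote get-url gitea | sed -E 's#\.git$##; s#.*[:/]([^/:]+/[^/]+)$#\1#')" # grant/spark-control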
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
CURL=(curl -sS) # no -f: we inspect HTTP codes ourselves
[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
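api() {
  local method="$1" url="$2"; shift 2
  local out
  # -w appends a newline plus the status code after the response body.
  out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
    -w $'\n%{http_code}' "$url")"
  HTTP_CODE="${out##*$'\n'}"   # everything after the last newline
  BODY="${out%$'\n'*}"         # everything before it
}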
# Reuse an existing release for this tag, otherwise create one.
api GET "$API/releases/tags/$TAG"
if [ "$HTTP_CODE" = 200 ]; then
  id="$(printf '%s' "$BODY" | jq -r '.id')"
elif [ "$HTTP_CODE" = 404 ]; then
  api POST "$API/releases" -H 'Content-Type: application/json' \
    --data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
      '{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
  [ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
  id="$(printf '%s' "$BODY" | jq -r '.id')"
else
  echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
  exit 1
fi
[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }
# Replace a same-named asset so re-runs don't 409.
api GET "$API/releases/$id/assets"
old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }
api POST "$API/releases/$id/assets?name=$ASSET" \
  -F "attachment=@${S9PK};type=application/octet-stream"
[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"