Compare commits
19 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 1f359e3c79 | |
| | 9a3bf9ed86 | |
| | c846386c1a | |
| | 1e1e1cb568 | |
| | a20c538ebf | |
| | 7e0759846f | |
| | b67e001642 | |
| | df9f244eae | |
| | c0b35184ba | |
| | 7ecd77f1e5 | |
| | 6bcda6e348 | |
| | 7ae6ab3ba8 | |
| | dd3d1412d4 | |
| | 26070eb191 | |
| | 90394f891b | |
| | e783653ef0 | |
| | 57a893000e | |
| | 56f7ea4444 | |
| | aaad57d88f | |
@@ -33,7 +33,7 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
- `image/app/` — FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/` — StartOS manifest, interfaces, actions, version + release notes.
- `docs/` — `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md` (consumer-facing API refs; update with API changes).
- `docs/` — `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`, `COORDINATION.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog — items move into "Current state" below when picked up).
## Conventions
@@ -55,12 +55,22 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
## Current state
- **Working (v0.21.0:1, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
- **Live: v0.27.3:0 — Qwen3.6 vision works end-to-end (incl. full-size phone photos).** Installed on `immense-voyage` (`start-cli` confirms `0.27.3:0`). Two-part story: **(A) the daily driver `RedHatAI/Qwen3.6-35B-A3B-NVFP4` is itself a vision model** (`Qwen3_5MoeForConditionalGeneration`, `vision_config` + `model_visual.safetensors` on disk) — recipe was mislabelled `[reasoning]`, now `[vision, reasoning]`. Real business card read **7/7 fields perfect** (~97 tok/s, no patches). **(B) oversized-image fix:** a 12MP phone photo expands to ~11.8k vision tokens → exceeds vLLM's ~4096-image-token cap → **400 "Failed to apply Qwen3VLProcessor … token count mismatch."** Fix = cap resolution server-side via `'--mm-processor-kwargs={"max_pixels": 2000000}'` in the qwen36 recipe (auto-downscales big images for *every* `/v1` consumer; verified live — the 12MP image went 400→200). Quoting survives the stack because `launch-cluster.sh` does `printf "%q"` on the serve args (line 163) and `build_launch_command` shlex-quotes (round-trip test passes). **An in-dashboard "Vision check" button shipped in v0.27.2 then was removed in v0.27.3 at the owner's request** (clutter; the `vision` badge already signals capability — don't re-add it). The `/v1/chat/completions` proxy is a dumb passthrough that already forwards image content, so no backend change was needed. 161 pytest green.
- **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
- **Gemma-4-26B-A4B-NVFP4 eval — RESOLVED as "defer; Qwen covers vision better."** Two independent deep-research agents (this session) confirmed: it does NOT run on the stock `eugr/spark-vllm-docker` stack (crashes on `tie_weights` `NotImplementedError` — the checkpoint declares compressed-tensors in config.json but is modelopt NVFP4). The working path needs the **`vllm/vllm-openai:gemma4-0505-arm64-cu130`** image (lacks Ray → can't go through `launch-cluster.sh`, needs **raw `docker run`** = the deferred raw-docker-swap feature) **+ a bind-mounted patched `gemma4.py`** (upstream PR #39084 unmerged) **+ `--moe-backend marlin`**, AND even then **vision is degraded** by open vLLM bug #40106 (wrong attention on image tokens — hurts OCR specifically). ~52 tok/s vs Qwen's 97. Net: more duct tape for worse vision than the Qwen Grant already runs. Revisit when #40106 + #39084 land. Alternatives agent also flagged **`RedHatAI/Qwen3.5-122B-A10B-NVFP4`** as the proven single-Spark *reasoning* step-up (30–51 tok/s, fits 128 GB, no patches) — a future daily-driver upgrade, orthogonal to vision.
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
- **Live: v0.27.1:0 — fix: "Download a new model" button (uvx PATH).** Commit `1e1e1cb`; installed on `immense-voyage` (`start-cli package list` confirms `0.27.1:0`); pushed to gitea master; **published to Clankistry** (`~/.spark-control/publish.sh`). Root cause: `hf-download.sh` shells out to `uvx`, which the uv installer puts in `~/.local/bin`; Spark Control's *non-interactive* SSH session doesn't source the user's profile, so `~/.local/bin` is off PATH and the download died with "uvx: command not found" (same class as the matrix-bridge non-interactive-SSH gotcha). Fix: `download.build_download_command` prepends `export PATH="$HOME/.local/bin:$PATH"` (server-side `$HOME`, generic for any adopter); extracted to a pure helper with regression tests (`test_download.py`: PATH prefix, no-trailing-space, cluster flags, shlex round-trip). 161 pytest green; verified live. Prompted by Grant adding **Gemma-4-26B**: he downloaded `nvidia/Gemma-4-26B-A4B-NVFP4` (recipe `gemma4-26b` already in catalog) via the now-fixed button — **fix confirmed end-to-end** — and is swapping to it. **Pending: business-card OCR / vision test** once it's up.
- **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:** `GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
- **Next on this thread (small, externally gated):** (a) **adopter reply is drafted** (in the session — corrects the vLLM-port misconception → set 8000 in the gear, confirms the port knobs + swap/lock fix, asks the disk-scan diagnostic) — **pending Grant to send** + pick the distribution-channel wording. (b) **Optional Gitea tag + `make release`** so the adopter can pull v0.27 from Gitea Releases (NOT done this session — only registry + sideload shipped); do it only if that adopter pulls from Gitea Releases rather than subscribing to Clankistry. (c) **Un-diagnosed:** adopter's disk-scan shows Gemma "not on disk" — needs them to run `ls ~/.cache/huggingface/hub` as the SSH user vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption (likely a custom `HF_HOME`/container-volume/different-user cache path → would need a configurable cache path).
- **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
- **Gemma-4-26B-A4B vision eval — DONE this session (deferred; see the v0.27.2 + Gemma bullets up top).** The `gemma4-26b` recipe stays in the catalog but is known not to launch on the stock stack; the owner's vision/OCR goal is met by the Qwen3.6 daily driver instead.
- **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
- **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
- **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (157 passing; the in-app settings gear + swap-lock route-order regression + the webhook-repoint live-reload check are in `test_app_settings.py`, incl. `TestClient` end-to-end). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Hosting / distribution:** source on self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.) The s9pk ships via Gitea Releases (`make release`) **and** a self-hosted StartOS registry — operator-local publish tooling lives outside the repo; owner-specific addresses + the **authenticated-writes-must-be-direct-not-via-the-tunnel** gotcha are in session memory.
- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — DONE in tree, staged as **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). **Not yet built/installed/committed — awaiting go/no-go.** (2) local-path/fine-tuned models (in ROADMAP under Dashboard). (3) configurable topology (service→Spark→port map + container names). (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
- **Design stance (decided):** Spark Control = control plane / GPU arbiter, **not** a job runner; recurring business jobs live in separate services that *call* the swap API (`POST /api/swap`). Full epic history (v0.22→v0.25) is in git log + `ROADMAP.md` → "Cluster coordination".
- **Usage note (2026-06-18):** owner's daily driver is the solo **Qwen3.6 35B**; the 235B `cluster` models are dormant. Keeping `launch-cluster.sh` (the `eugr/spark-vllm-docker` community standard, mirrors NVIDIA's `dgx-spark-playbooks` Ray+RoCE design) is still correct even single-node — it supplies the maintained, hardware-tuned vLLM images; raw docker would mean DIY image upkeep for no gain. Spark 2 stays the speech/embeddings box regardless.
- **Next steps (all low-priority / externally gated; P2/P3 tech-debt backlog in `ROADMAP.md`):** (1) raw-`docker run` swap generalization — **DEFERRED** (rationale in ROADMAP; revisit only if an adopter wants Spark Control to *drive*, not just monitor, raw-docker swaps — cleanest fix is the adopter adopting `launch-cluster.sh`). (2) audio concurrency knee — only if the Signal Engine dev wants it (needs a quiet window). (3) matrix-bridge Docker `HEALTHCHECK` — only if the bot dev asks. (4) Parakeet long-audio guard — deferred (rationale in ROADMAP).
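The uvx PATH fix described in the v0.27.1 bullet can be illustrated with a small sketch. This is not the project's `download.build_download_command`: the function name `build_download_command_sketch`, its parameters, and the script location under `$HOME/spark-vllm-docker` are assumptions made for illustration. It only demonstrates two of the properties the regression tests are described as checking, the PATH prefix and a shlex round-trip of caller-supplied values.

```python
import shlex

# Prefix quoted in the v0.27.1 note; $HOME expands on the remote (server) side.
PATH_PREFIX = 'export PATH="$HOME/.local/bin:$PATH"'


def build_download_command_sketch(repo: str, extra_flags: tuple[str, ...] = ()) -> str:
    """Return one shell line: put ~/.local/bin on PATH, then run the download script.

    Hypothetical helper. The script path below is an assumption, and any flags
    are passed through by the caller rather than invented here.
    """
    # Quote every caller-supplied token so the remote shell sees it unchanged.
    args = [shlex.quote(repo), *(shlex.quote(flag) for flag in extra_flags)]
    script = "$HOME/spark-vllm-docker/hf-download.sh"  # assumed location
    # Joining with single spaces leaves no trailing space on the command.
    return f"{PATH_PREFIX} && {script} " + " ".join(args)


if __name__ == "__main__":
    print(build_download_command_sketch("nvidia/Gemma-4-26B-A4B-NVFP4"))
```

Because the prefix is part of the command string itself, it works in a non-interactive SSH session that never sources the user's profile, which is the failure mode the bullet describes.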
+5
-6
@@ -73,16 +73,15 @@ The first start generates an ed25519 SSH keypair inside the package volume. Wait
### 4. Configure Sparks
- Open Spark Control → **Actions → Configure Sparks**.
- Fill in:
- Fill in just the four required fields:
- **Spark 1 hostname or IP** — prefer the **IP** (e.g. `192.168.1.x`) over `.local` hostnames; vLLM only binds IPv4 and mDNS can resolve to IPv6 first.
- **Spark 1 SSH user** — whatever username you set up on Spark 1.
- **Spark 2 hostname or IP** + **SSH user** — same idea.
- Optional Parakeet/Kokoro overrides — leave blank if those services run on Spark 2 (the normal case).
- Optional **Open WebUI URL** — paste your Open WebUI LAN URL to get a deep-link button in the dashboard next to the current model.
- Optional **NGC API key** — paste it here if you have one.
Save.
Everything else is optional and lives in the dashboard, not this action: open Spark Control and click **⚙ Settings** in the top bar to set vLLM/service **ports** (e.g. if your vLLM runs on 8000 rather than the default 8888, or you moved Parakeet off 8000), container names, support-service hosts, an **Open WebUI URL** (adds a deep-link button), an **NGC API key**, and a swap webhook. Changes there apply immediately and are included in StartOS backups.
### 5. Re-run Show Public Key (if you skipped earlier)
Now that hosts are configured, Show Public Key will give you the paste-ready install command. Run it as described in step 3.
@@ -92,7 +91,7 @@ Now that hosts are configured, Show Public Key will give you the paste-ready ins
From the Spark Control service page, click the Web UI button. You should see:
- A **top status bar** with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
- An **LLM tab** with cards for each model in the bundled catalog. Models you've downloaded show "on disk" badges; others show "not downloaded".
- An **LLM tab** whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use **+ Download a new model** to fetch one — it appears here when it finishes.
- An **Audio / Speech tab** with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.
If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, **you're in**.
@@ -159,7 +158,7 @@ All of these inherit Spark Control's TLS cert and StartOS access controls. You o
A few things worth knowing:
- The codebase is **two halves**: `image/` is a standalone FastAPI app you can run with `uvicorn app.server:app` for local dev. `package/` is the StartOS wrapper. Changes to either should be coordinated.
- **All connection info** comes from environment variables in `image/app/config.py`, populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action. No IPs, usernames, or paths are hardcoded in runtime code.
- **All connection info** comes from environment variables in `image/app/config.py`. The four required fields are populated from `package/startos/fileModels/sparkConfig.yaml.ts` via the Configure Sparks action; the optional knobs are overlaid from the in-app `⚙ Settings` store (`/data/app_settings.json`, see `image/app/app_settings.py`). No IPs, usernames, or paths are hardcoded in runtime code.
- The **path `~/spark-vllm-docker`** *is* hardcoded in `swap.py`, `download.py`, `updates.py`, and `models.py`. If the user has cloned the upstream repo elsewhere, either fix the path or symlink it.
- **Persistent state** lives at `/data/` inside the container: `config.yaml`, `models-overrides.yaml`, `services-overrides.yaml`, `connectivity.json`, `ssh/`. These survive package updates.
- The dashboard polls every 5 s; check `image/app/health.py` and `image/app/connectivity.py` for the probing logic. External apps can also POST failures to `/api/health-event` to log between-poll blips.
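The settings overlay mentioned in the list above (a JSON file keyed by env-var names, laid over the process environment) could look roughly like the following. This is a sketch under stated assumptions, not the contents of `image/app/app_settings.py`: the function names `effective_env` and `env_int` are hypothetical, and treating a blank overlay value as "not set" is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def effective_env(overlay_path: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the base environment with overlay values applied on top."""
    env = dict(os.environ if base_env is None else base_env)
    path = Path(overlay_path)
    if path.is_file():
        overlay = json.loads(path.read_text(encoding="utf-8"))
        for name, value in overlay.items():
            # Assumption: a blank or null overlay entry leaves the base value alone.
            if value is not None and str(value).strip() != "":
                env[name] = str(value)
    return env


def env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer setting; a missing, blank, or malformed value yields the default."""
    try:
        return int(env.get(name, "").strip())
    except ValueError:
        return default
```

With an overlay of `{"VLLM_PORT": "8000"}`, `env_int(effective_env(path), "VLLM_PORT", 8888)` resolves to 8000, while a blank or non-numeric value falls back to 8888 instead of raising at startup.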
@@ -112,14 +112,14 @@ Fields: `service` (required), `ok` (required), `source` (optional, free-form), `
## Status
**v0.2.3 / s9pk version 0.13.0:4** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
**s9pk version 0.26.0:0** — installed and verified on a Start9 server. The LLM menu is whatever's downloaded on the Sparks (scanned live, not hard-coded); bundled *launch recipes* (qwen3-vl, gemma4, gemma4-26b, qwen36) tell it how to launch known models, and anything else gets a "needs setup" card that infers + saves its settings on first use.
### What v0.2 added on top of v0.1
- **Service discovery API** (`/api/endpoints`) for other LAN services
- **Kokoro-82M TTS** replaces Magpie/Riva NIM as the default TTS backend (v0.14.0). Magpie's decoder had a ~30-50% truncation rate on multi-sentence inputs and ate 49 GB of GPU memory; Kokoro is 24/24 reliable at every input length tested, uses 1.3 GB GPU, and renders in ~1s. See HANDOFF.md and the release notes for the migration story.
- **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host configuration in Configure Sparks (so they can live on Spark 1, Spark 2, or anywhere)
- **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host/port/container configuration in the in-app **⚙ Settings** gear (so they can live on Spark 1, Spark 2, or anywhere, on any port)
- **Model download** from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
- **Model download** from the dashboard — paste an HF repo (with autocomplete for known models), pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion the model appears on the menu automatically; if it's unrecognized, a pre-filled "set up this model" dialog offers to configure it.
- **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
- **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
- **Diarization with speaker fingerprints** via Sortformer + TitaNet, exposed at `/api/audio/diarize-chunk` for chunked workflows
+19
-7
@@ -10,12 +10,25 @@ Driven by the one other Spark Control adopter (a colleague running OpenClaw + cr
Sequenced:
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
2. **Local-path / fine-tuned model support** — see the dedicated item under "## Dashboard" below. Independently wanted; his merged `ten31-v2` (a directory, not an HF repo) is the motivating case.
2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
4. **Coordination layer** — DONE in tree, staged as **v0.25.0:0** (built/typechecked clean; install pending). All three primitives shipped; `image/app/coordination.py` + `docs/COORDINATION.md`. Brought forward 2026-06-17 on request rather than waiting for our own automation.
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
|
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). Acquire returns a secret token; the swap endpoint refuses any real swap (`423`) that doesn't present it in `X-Swap-Lock-Token`, so the dashboard's manual swap is paused while a scheduler holds it (with a `?force=true` human override). In-memory + TTL-bounded → resets to unlocked on restart; re-acquire with the token extends. Enforced in `post_swap`, not advisory.
|
||||||
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
|
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL (Configure-Sparks field), fired from `SwapManager._run` *outside* the swap lock; optional shared secret ⇒ `X-Spark-Signature` HMAC. Fire-and-forget (5 s, no retries); dry runs don't fire.
|
||||||
- **Schedule visibility** — read-only view the dashboard surfaces, *registered by* external schedulers (Spark Control does not own the schedule).
|
- **Schedule visibility** — `GET/POST/DELETE /api/schedule`; read-only "Scheduled jobs" dashboard panel, registered by external schedulers. Spark Control stores and displays, never executes.
|
||||||
|
- Tests: `image/tests/test_coordination.py` (22 cases — lock lifecycle/expiry/token, the single-read swap gate, schedule CRUD + id validation, webhook payload+signature). Known limit: lock + schedules are in-memory (a restart frees the lock and empties the registry until schedulers re-register) — persist to `/data` only if that bites.
|
||||||
|
|
||||||
|
### Generalizing the swap mechanism to raw `docker run` — DEFERRED (decided 2026-06-18, research-backed; was item 4's last open thread)

Our swap drives `~/spark-vllm-docker/launch-cluster.sh` over SSH on Spark 1 (`./launch-cluster.sh stop`, then `[VLLM_SPARK_EXTRA_DOCKER_ARGS=…] ./launch-cluster.sh [--solo ]-d exec vllm serve <model> <args>`, then `docker logs -f` until the ready marker). The OpenClaw adopter launches vLLM with a plain `docker run` instead, so the swap button can't drive his cluster — only monitor it. The portability fix would be a configurable "swap backend": keep `launch-cluster.sh` as the default and add a "bring your own command" mode (operator-authored stop/launch templates in `services-overrides.yaml` with quoted `{model}`/`{container}`/`{port}`/`{extra_args}` substitution; ready-detection unchanged; the vLLM-argparse pre-flight disabled for that backend).
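
The quoted substitution is the part of this design that is easiest to get wrong, so here is a minimal sketch of it. It assumes templates are plain strings using the four placeholders named above; `render_command` and the sample template in the usage note are hypothetical, and nothing like them exists in the tree yet.

```python
import shlex
import string

ALLOWED_FIELDS = {"model", "container", "port", "extra_args"}


def render_command(template, *, model, container, port, extra_args):
    """Fill an operator-authored template, shell-quoting every substituted value."""
    for _, name, spec, conv in string.Formatter().parse(template):
        if name is None:
            continue  # literal text with no placeholder after it
        # Only the four documented placeholders, with no format spec or conversion.
        if name not in ALLOWED_FIELDS or spec or conv:
            raise ValueError(f"unsupported placeholder: {{{name}}}")
    values = {
        "model": shlex.quote(model),
        "container": shlex.quote(container),
        "port": str(int(port)),  # int() rejects anything that is not a number
        # Each extra argument is quoted on its own, then joined with spaces.
        "extra_args": " ".join(shlex.quote(a) for a in extra_args),
    }
    return template.format(**values)
```

A template such as `docker run --name {container} -p {port}:8000 <image> vllm serve {model} {extra_args}` would then render with every value safe to pass through SSH. Rejecting unknown placeholders up front (including attribute lookups such as `{model.x}`) keeps `str.format` from reaching anything but the four quoted values.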
|
||||||
|
|
||||||
|
**Why deferred, not built:**
- **Raw docker is not an upgrade for *us* — for half our catalog it's impossible.** `launch-cluster.sh` is the `eugr/spark-vllm-docker` community project (de-facto DGX Spark standard; mirrors NVIDIA's own `dgx-spark-playbooks` Ray+RDMA architecture). Its headline job is **multi-node** serving: our 235B `cluster` models (Qwen3-VL 235B, Qwen3 235B) exceed one Spark's 128 GB and *must* shard across both Sparks via Ray over the 200 Gbps ConnectX/RoCE link — plumbing (NCCL/MTU/per-node env) that a single-node `docker run` cannot do. So we keep the helper script; switching our own cluster to raw docker is off the table.
- **The feature is therefore portability-only** (for differently-wired adopters), and the one known adopter doesn't need it — he swaps via his own crons and uses Spark Control to watch.
- **Untestable on our hardware** — our cluster uses the helper script, so we can't validate a real raw-docker swap without risking the live vLLM.
- The one real standing risk is eugr's single-maintainer status; fallback is community forks or migrating to NVIDIA's official `dgx-spark-playbooks` launcher (same design). No reason to switch now.

**Revisit only if** an adopter explicitly wants Spark Control to *drive* (not just monitor) swaps on a raw-`docker run` cluster. At that point, get their actual working `docker run` command and build the command-template backend to it.
|
||||||
|
|
||||||
## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
@@ -34,7 +47,6 @@ Sequenced:
|
|||||||
- Second audio worker / queueing layer; revisit which services share Spark 2.
|
|
||||||
## Dashboard
- Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (`models.yaml` + custom overrides) and the "Add custom model" path (`POST /api/models`) only accepts an HF `org/name` repo (`shellsafe._HF_REPO_RE`), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the `org/name` regex (path whitelist + `shlex.quote`, no traversal); (b) `models.build_launch_command` / `launch-cluster.sh` able to `vllm serve <path>`; (c) `disk.py` size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15 — a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
|
|
||||||
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.
|
|||||||
@@ -0,0 +1,157 @@
|
|||||||
|
# Cluster coordination through Spark Control (v0.25.0)

Spark Control is the **GPU arbiter, not a job runner.** Your recurring pipelines
(model-warming crons, "daily X" generators, batch jobs) live in your own
services and *drive Spark Control's swap API*. This page documents the safety
layer around that: a **swap reservation lock**, a **swap-event webhook**, and a
**read-only schedule registry**.

If only the dashboard ever swaps models, you don't need any of this — it's for
when something automated also swaps.

All endpoints are on the Spark Control host (same LAN/VPN URL as the LLM, audio,
and embeddings proxies). There is no API-token auth by design (LAN + split-tunnel
VPN only); a non-browser client passes the same-origin guard automatically.

---

## 1. Swap reservation lock

A short, TTL-bounded reservation of the swap path. While a lock is held, **any
real swap that doesn't present the holder's token is refused with `423 Locked`**
— including the dashboard's manual swap. The holder *name* is descriptive; the
returned **token** is the secret that authorises swaps and the release.

The lock is in-memory: it resets to *unlocked* if Spark Control restarts (the
safe-for-availability default), and the swap engine's own in-progress guard
still prevents two swaps running at once.

### `POST /api/swap/lock` — acquire (or extend)

```json
// request
{ "holder": "openclaw-daily-vol", "ttl_seconds": 900, "note": "daily vol run" }

// 200 response
{
  "held": true,
  "holder": "openclaw-daily-vol",
  "acquired_at": "2026-06-17T12:00:00+00:00",
  "expires_at": "2026-06-17T12:15:00+00:00",
  "seconds_remaining": 900,
  "note": "daily vol run",
  "token": "a1b2c3…"   // SECRET — store it; needed to swap and to release
}
```

- `ttl_seconds` is optional (default 900) and clamped to `[1, 86400]`.
- **`409`** if a *different* holder already holds it (body includes the current
  `lock` state). To **extend** your own lock, POST again with the same `holder`
  **and** your `token` — the token is preserved and the window slides forward.
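
The acquire, extend and expiry rules above can be sketched as a small in-memory class. This is an illustration only, not the code in `image/app/coordination.py`: the `SwapLock` name, the injectable `clock`, the `(status, token)` return shape, and the choice to answer `409` when the same holder re-posts without its token are all assumptions.

```python
import secrets
import time


class SwapLock:
    """In-memory, TTL-bounded reservation (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._state = None  # {"holder", "token", "expires"} or None

    def _current(self):
        # An expired lock is treated exactly like no lock.
        if self._state and self._state["expires"] <= self._clock():
            self._state = None
        return self._state

    def _token_ok(self, token):
        cur = self._state
        return token is not None and secrets.compare_digest(
            cur["token"].encode(), token.encode()
        )

    def acquire(self, holder, ttl_seconds=900, token=None):
        ttl = max(1, min(int(ttl_seconds), 86400))  # clamp to [1, 86400]
        cur = self._current()
        if cur is None:
            new_token = secrets.token_hex(16)
            self._state = {"holder": holder, "token": new_token,
                           "expires": self._clock() + ttl}
            return 200, new_token
        if cur["holder"] == holder and self._token_ok(token):
            cur["expires"] = self._clock() + ttl  # extend; token is preserved
            return 200, cur["token"]
        return 409, None

    def allows_swap(self, token=None):
        # The swap gate: free when unlocked, otherwise only for the token holder.
        return self._current() is None or self._token_ok(token)
```

Because nothing is persisted, constructing a fresh `SwapLock` on startup gives the "resets to unlocked on restart" behaviour for free.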

### `GET /api/swap/lock` — status (no token)

```json
{ "held": true, "holder": "openclaw-daily-vol", "expires_at": "…", "seconds_remaining": 612, "note": "…" }
// or
{ "held": false }
```

### `DELETE /api/swap/lock` — release

Send your token in the `X-Swap-Lock-Token` header (or `?token=`):

```
DELETE /api/swap/lock
X-Swap-Lock-Token: a1b2c3…
```

- **`403`** if the token doesn't match. The dashboard's human override is
  `DELETE /api/swap/lock?force=true` (no token).

### Swapping while you hold the lock

Pass the token on the swap call; the dashboard (no token) is then blocked:

```
POST /api/swap
X-Swap-Lock-Token: a1b2c3…
{ "model_key": "gemma-3-27b" }
```

Recommended scheduler flow: **acquire → swap (with token) → poll `/api/swap/{id}`
→ release**. Always release in a `finally`; if you crash, the TTL frees it.

> `POST /api/swap/{key}/validate` (pre-flight) and dry-run swaps are **not**
> blocked by the lock — they don't touch the cluster.
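
A scheduler following the recommended flow might look like the sketch below. Several details are assumptions rather than documented behaviour: the `BASE` address is a placeholder, the swap response is assumed to carry a `job_id`, and the poll response is assumed to expose a `state` that ends at `ready` or `failed` (the values shown in the webhook payload). `http_json` and `run_with_lock` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE = "http://spark-control.local"  # placeholder address (assumption)


def http_json(method, url, body=None, headers=None):
    """Tiny JSON-over-HTTP helper; raises urllib.error.HTTPError on 409/423."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json", **(headers or {})},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read() or b"{}")


def run_with_lock(model_key, holder, call=http_json, poll_seconds=5.0,
                  max_polls=120, sleep=time.sleep):
    # Acquire first: if this raises (lock held by someone else), nothing to release.
    lock = call("POST", f"{BASE}/api/swap/lock", {"holder": holder, "ttl_seconds": 900})
    auth = {"X-Swap-Lock-Token": lock["token"]}
    try:
        job = call("POST", f"{BASE}/api/swap", {"model_key": model_key}, auth)
        for _ in range(max_polls):
            status = call("GET", f"{BASE}/api/swap/{job['job_id']}")
            if status.get("state") in ("ready", "failed"):
                return status
            sleep(poll_seconds)
        raise TimeoutError("swap did not finish before the poll budget ran out")
    finally:
        # Always release; if the process dies instead, the TTL frees the lock.
        call("DELETE", f"{BASE}/api/swap/lock", None, auth)
```

The default poll budget (120 polls at 5 s) stays inside the 900 s TTL, so the lock should not expire in the middle of a swap the scheduler is still watching.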

---

## 2. Swap-event webhook

Configure a URL in **Configure Sparks → "Swap webhook URL"**. After every real
swap, Spark Control POSTs:

```json
{
  "event": "swap_complete",        // or "swap_failed"
  "job_id": "1a2b3c4d",
  "model_key": "gemma-3-27b",
  "state": "ready",                // or "failed"
  "returncode": 0,
  "started_at": "2026-06-17T12:00:00+00:00",
  "finished_at": "2026-06-17T12:03:11+00:00",
  "dry_run": false
}
```

Headers: `X-Spark-Event: swap_complete`. If you set a **webhook secret**, the
body is signed: `X-Spark-Signature: sha256=<hmac>` (HMAC-SHA256 of the raw body
with the shared secret). Verify it like:

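```python
import hmac, hashlib
expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
# A missing header becomes "", so it fails the comparison instead of raising KeyError.
received = request.headers.get("X-Spark-Signature", "")
if not hmac.compare_digest(expected, received):  # unlike assert, survives `python -O`
    raise PermissionError("webhook signature mismatch")
```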

Delivery is best-effort and fire-and-forget (5 s timeout, no retries) — a
webhook failure never affects the swap itself. Dry runs don't fire.
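
Putting the payload and the signature together, a receiver could be sketched as follows. The `sign` and `handle_webhook` names and the returned `{"active_model": ...}` shape are hypothetical, and `headers` is a plain dict for simplicity (real frameworks match header names case-insensitively).

```python
import hashlib
import hmac
import json
from typing import Optional


def sign(secret: str, raw_body: bytes) -> str:
    """Signature as described above: sha256=<hex HMAC-SHA256 of the raw body>."""
    return "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()


def handle_webhook(secret: str, raw_body: bytes, headers: dict) -> Optional[dict]:
    """Return None for a bad signature, else the action the consumer should take."""
    received = headers.get("X-Spark-Signature", "")
    # Compare bytes in constant time; verify BEFORE parsing the body.
    if not hmac.compare_digest(sign(secret, raw_body).encode(), received.encode()):
        return None
    event = json.loads(raw_body)
    if event.get("event") == "swap_complete":
        # Re-point the downstream provider config at the model now serving.
        return {"active_model": event["model_key"]}
    return {}  # swap_failed (or unknown event): nothing to re-point
```

Since delivery has no retries, a consumer that must not miss a change would still want to reconcile against the running model occasionally rather than rely on the webhook alone.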

---

## 3. Schedule registry (read-only display)

So the dashboard can show *what's scheduled to touch the GPU and when*, your
schedulers register their jobs here. **Spark Control only displays these — it
never executes them.**

### `POST /api/schedule` — register / update

```json
// request (pass a stable `id` to update in place on re-register)
{ "id": "daily-vol", "name": "Daily Vol", "owner": "openclaw",
  "cron": "0 6 * * *", "next_run": "2026-06-18T06:00:00Z",
  "description": "Swaps to the big model, generates the vol report" }

// response: the stored entry (generates an id if you omit one)
```

`name` is required; `id` (if given) must match `[A-Za-z0-9_.-]` (≤64 chars).

### `GET /api/schedule` — list

```json
{ "schedules": [ { "id": "daily-vol", "name": "Daily Vol", "owner": "openclaw",
                   "cron": "0 6 * * *", "next_run": "…", "description": "…",
                   "registered_at": "…", "updated_at": "…" } ] }
```

### `DELETE /api/schedule/{id}` — deregister

```json
{ "deleted": true }
```

The registry is in-memory — re-register your schedules on your own startup so
they survive a Spark Control restart.
@@ -35,10 +35,13 @@ Two kinds, both run with the `image/.venv` interpreter (system python3 has no de
|
|||||||
- New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
- **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
- **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).
|
- **Settings split (gear vs StartOS action):** only the four *required* fields (both Spark IPs + SSH users) live in the StartOS "Configure Sparks" action → `config.yaml` → env. Every *optional* knob (ports, container names, support-service hosts, integrations, webhook) is edited in the dashboard's ⚙ Settings gear, backed by the `/data/app_settings.json` overlay (`app_settings.py`), keyed by the same env-var names. Precedence (`config._effective_env`): `os.environ` first, overlay on top. `app_settings.seed_from_env` runs **once at startup** to migrate a pre-gear install's env values into the overlay (don't move seeding into `from_env`/`reload` — it writes, and `from_env` runs on every build → it would clobber across calls, which it did once already). **`Settings` is deliberately not frozen:** one shared instance is threaded by reference into every router closure/manager, and `Settings.reload()` (called after a gear save) recomputes its fields **in place** so changes apply live with no restart and no call-site changes. **Gotcha:** this only reaches holders that keep the *object* (`self.settings = settings`); anything that snapshots a *value* at construction is invisible to `reload()` and must be re-synced explicitly. The one such holder is `WebhookNotifier`, which copies `url`/`secret` — `post_settings` calls `swap_webhook.update(...)` right after `reload()`. Any future component that caches a gear-managed value (rather than reading `settings.x` at use time) needs the same treatment. A new gear knob = add one entry to `app_settings.FIELDS` (the front-end renders it generically); the matching `config.Settings` field must already read that env var.
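
The reload gotcha is easier to see in a toy model. The sketch below is not the real `config.Settings` or `WebhookNotifier`; `HoldsObject` and `SnapshotsValue` are hypothetical stand-ins for the two kinds of holder, and `reload` takes the env as an argument purely to keep the example self-contained.

```python
class Settings:
    """Deliberately mutable: reload() rewrites fields on the same instance."""

    def __init__(self, env):
        self._load(env)

    def _load(self, env):
        self.swap_webhook_url = env.get("SWAP_WEBHOOK_URL", "")

    def reload(self, env):
        self._load(env)  # in place, so existing references see the change


class HoldsObject:
    """Keeps the object and reads the field at use time: reload() reaches it."""

    def __init__(self, settings):
        self.settings = settings

    def url(self):
        return self.settings.swap_webhook_url


class SnapshotsValue:
    """Copies the value at construction: invisible to reload()."""

    def __init__(self, settings):
        self.url_value = settings.swap_webhook_url

    def update(self, url):
        self.url_value = url


settings = Settings({"SWAP_WEBHOOK_URL": "http://old"})
live, stale = HoldsObject(settings), SnapshotsValue(settings)

settings.reload({"SWAP_WEBHOOK_URL": "http://new"})
after_reload = (live.url(), stale.url_value)  # ("http://new", "http://old")

stale.update(settings.swap_webhook_url)  # the explicit re-sync after reload()
after_resync = stale.url_value           # "http://new"
```

The cheaper fix for any new component is usually to follow the `HoldsObject` pattern and read `settings.x` at use time, so no re-sync call is needed at all.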
|
||||||
|
|
||||||
## Layout
|
|
||||||
- `image/app/server.py` — FastAPI entry; routers live in sibling modules (`audio_proxy.py`, `llm_proxy.py`, `embeddings_proxy.py`, `redaction_gateway.py`, `swap.py`, `health.py`, `deep_health.py`, `connectivity.py`, …).
|
- `image/app/discovery.py` — the disk-driven model menu. `/api/models` lists what's actually downloaded on the Sparks (via `disk.list_cached_models`); `models.yaml`/overrides are *launch recipes* matched by repo, not the menu. An on-disk model with no recipe is `needs_setup` → `infer_recipe` reads its `config.json` to prefill a setup form the operator confirms once.
|
||||||
|
- `image/app/app_settings.py` — the in-app settings overlay backing the ⚙ gear: `FIELDS` metadata (drives `/api/settings` + the UI form), `load_overlay()` (pure read), `seed_from_env()` (one-time migration), `apply()` (validate + persist). `GET/POST /api/settings` in `server.py` read/write it, then `settings.reload()`.
|
||||||
- `image/app/static/` — the dashboard UI.
- `image/models.yaml` — vLLM model catalog bundled into the image.
|
- `image/models.yaml` — bundled vLLM **launch recipes** (how to launch a known model), NOT the dashboard menu — the menu is the on-disk scan.
|
||||||
- `image/spark_embed/` — Dockerfile + app for the embeddings container; built ON a Spark (ARM64, NGC PyTorch base — see the audio/cluster rule for NGC torch-pinning caveats).
|
|||||||
@@ -25,6 +25,22 @@ npm run prettier # prettier --write startos (no semicolons, single quotes, tra
|
|||||||
- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
|
|
||||||
|
## Releasing to Gitea

The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:

```bash
# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
cd package && make x86 # build
# 2. commit + push the source change
git tag vX.Y.Z && git push gitea vX.Y.Z # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
make install # optional: sideload to your own server (restarts it — go/no-go)
# 3. publish the s9pk as a release asset (needs a write-scoped token):
GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
```

`make release` → `scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note Gitea returns `browser_download_url` on its configured ROOT_URL (may be a `.local` name) — an off-LAN adopter pulls via whatever address actually reaches the Gitea.
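
On the adopter side, the pull and the ROOT_URL caveat could be handled roughly as below. This is a sketch under assumptions: `latest_release` and `pick_s9pk` are hypothetical helper names, the release JSON is assumed to list assets with `name` and `browser_download_url`, and re-hosting simply swaps scheme and host while keeping the path Gitea returned.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def latest_release(base: str, owner: str, token: str) -> dict:
    """Fetch the latest release metadata using a read-only token."""
    url = f"{base}/api/v1/repos/{owner}/spark-control/releases/latest"
    req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def pick_s9pk(release: dict, reachable_base: str) -> str:
    """Return the .s9pk download URL, re-hosted onto an address we can reach."""
    base = urlsplit(reachable_base)
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            src = urlsplit(asset["browser_download_url"])
            # Keep Gitea's path and query; replace only scheme and host.
            return urlunsplit((base.scheme, base.netloc, src.path, src.query, ""))
    raise LookupError("release has no .s9pk asset")
```

An agent would call `pick_s9pk(latest_release(base, owner, token), base)` and download the result with the same `Authorization` header; if Gitea is served under a sub-path, the path handling here would need adjusting.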
|
||||||
|
|
||||||
## Layout
|
|
||||||
- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
|
|||||||
@@ -0,0 +1,286 @@
|
|||||||
"""App-owned settings overlay: the in-dashboard 'gear' knobs.

Spark Control's *required* wiring — the two Spark IPs and SSH users — is set once
via the StartOS "Configure Sparks" action and arrives as env vars. Everything
else (ports, container names, support-service hosts, integrations, webhook) is
optional and lives here: a small JSON overlay on /data that the dashboard gear
reads and writes, so an operator never has to open StartOS actions to tune the
cluster. This follows the StartOS 0.4 convention (minimal setup action; routine
config in the app's own UI) and stays inside the package's backup volume, so the
file is backed up and restored for free.

Each overlay entry is keyed by the *same env var name* config.Settings already
reads, so the overlay is simply an env-var override store. Precedence (see
config._effective_env): process env first, this overlay on top — so a knob set
in the gear wins, while an un-touched knob falls through to whatever the StartOS
action injected, then to the code default.

First-run migration: when the overlay file doesn't exist yet (e.g. an existing
install upgrading into this version), it's seeded from the current env so any
value previously set via the StartOS action carries over into the gear with no
operator action and nothing lost.
"""
from __future__ import annotations
import json
import logging
import os
import re
import tempfile
from pathlib import Path
from typing import Mapping

log = logging.getLogger(__name__)

# Field metadata drives BOTH the /api/settings response (the front-end renders
# the form generically from this) and light server-side validation. `key` is the
# env var name; `type` is one of text|int|csv|secret. `secret` values are
# write-only — never echoed back to the browser.
FIELDS: list[dict] = [
    # --- vLLM (Spark 1) ---
    {"group": "vLLM (Spark 1)", "key": "VLLM_PORT", "label": "vLLM port", "type": "int",
     "placeholder": "8888",
     "help": "Port your vLLM listens on. Blank ⇒ 8888 (the bundled launch-cluster.sh). Set 8000 for vanilla vLLM, or wherever yours listens."},
    {"group": "vLLM (Spark 1)", "key": "VLLM_CONTAINER", "label": "vLLM container name", "type": "text",
     "placeholder": "vllm_node",
     "help": "Docker container the swappable vLLM runs in. Blank ⇒ vllm_node. The swap log-tail and pre-flight validator exec into it by name."},

    # --- Monitoring ---
    {"group": "Monitoring", "key": "DISABLED_SERVICES", "label": "Services to hide", "type": "csv",
     "placeholder": "e.g. parakeet,kokoro",
     "help": "Comma-separated built-in services your cluster doesn't run, so their tiles are hidden and never probed. Valid: parakeet, kokoro, embeddings, qdrant. Blank ⇒ monitor all."},

    # --- Parakeet (STT) ---
    {"group": "Parakeet (STT)", "key": "PARAKEET_HOST", "label": "Host", "type": "text",
     "placeholder": "leave blank for Spark 2",
     "help": "Host running the Parakeet STT container. Blank ⇒ Spark 2."},
    {"group": "Parakeet (STT)", "key": "PARAKEET_PORT", "label": "Port", "type": "int",
     "placeholder": "8000",
     "help": "Port Parakeet listens on. Blank ⇒ 8000. Set this if you remapped it (e.g. because your vLLM holds 8000)."},
    {"group": "Parakeet (STT)", "key": "PARAKEET_CONTAINER", "label": "Container name", "type": "text",
     "placeholder": "parakeet-asr",
     "help": "Docker container name for Parakeet. Blank ⇒ parakeet-asr."},
    {"group": "Parakeet (STT)", "key": "PARAKEET_USER", "label": "SSH user", "type": "text",
     "placeholder": "leave blank for Spark 2 user",
     "help": "SSH user that owns the Parakeet container. Blank ⇒ your Spark 2 user."},

    # --- Kokoro (TTS) ---
    {"group": "Kokoro (TTS)", "key": "KOKORO_HOST", "label": "Host", "type": "text",
     "placeholder": "leave blank for Spark 2",
     "help": "Host running the Kokoro TTS container. Blank ⇒ Spark 2."},
    {"group": "Kokoro (TTS)", "key": "KOKORO_PORT", "label": "Port", "type": "int",
     "placeholder": "8880",
     "help": "Port Kokoro listens on. Blank ⇒ 8880."},
    {"group": "Kokoro (TTS)", "key": "KOKORO_CONTAINER", "label": "Container name", "type": "text",
     "placeholder": "kokoro-tts",
     "help": "Docker container name for Kokoro. Blank ⇒ kokoro-tts."},
    {"group": "Kokoro (TTS)", "key": "KOKORO_USER", "label": "SSH user", "type": "text",
     "placeholder": "leave blank for Spark 2 user",
     "help": "SSH user that owns the Kokoro container. Blank ⇒ your Spark 2 user."},

    # --- Embeddings ---
    {"group": "Embeddings", "key": "EMBED_HOST", "label": "Host", "type": "text",
     "placeholder": "leave blank for Spark 2",
     "help": "Host running the spark-embed container (bge-m3 + reranker). Blank ⇒ Spark 2."},
    {"group": "Embeddings", "key": "EMBED_PORT", "label": "Port", "type": "int",
     "placeholder": "8088",
     "help": "Port the embedding server listens on. Blank ⇒ 8088."},
    {"group": "Embeddings", "key": "EMBED_CONTAINER", "label": "Container name", "type": "text",
     "placeholder": "spark-embed",
     "help": "Docker container name for the embedding server. Blank ⇒ spark-embed."},
    {"group": "Embeddings", "key": "EMBED_USER", "label": "SSH user", "type": "text",
     "placeholder": "leave blank for Spark 2 user",
     "help": "SSH user that owns the embedding container. Blank ⇒ your Spark 2 user."},

    # --- Qdrant ---
    {"group": "Qdrant", "key": "QDRANT_HOST", "label": "Host", "type": "text",
     "placeholder": "leave blank for Spark 2",
     "help": "Host running the Qdrant vector database. Blank ⇒ Spark 2."},
    {"group": "Qdrant", "key": "QDRANT_PORT", "label": "Port", "type": "int",
     "placeholder": "6333",
     "help": "Port Qdrant's REST API listens on. Blank ⇒ 6333."},
    {"group": "Qdrant", "key": "QDRANT_CONTAINER", "label": "Container name", "type": "text",
     "placeholder": "qdrant",
     "help": "Docker container name for Qdrant. Blank ⇒ qdrant."},
    {"group": "Qdrant", "key": "QDRANT_USER", "label": "SSH user", "type": "text",
     "placeholder": "leave blank for Spark 2 user",
     "help": "SSH user that owns the Qdrant container. Blank ⇒ your Spark 2 user."},
    {"group": "Qdrant", "key": "QDRANT_COLLECTION", "label": "Default collection", "type": "text",
     "placeholder": "e.g. crm_chunks",
     "help": "Collection used by /api/search when a request doesn't name one. Blank ⇒ callers must pass a collection."},

    # --- Integrations ---
    {"group": "Integrations", "key": "OPEN_WEBUI_URL", "label": "Open WebUI URL", "type": "text",
     "placeholder": "e.g. https://open-webui.yourserver.local",
     "help": "If set, the header shows a one-click 'Open chat' button to your Open WebUI."},
    {"group": "Integrations", "key": "MATRIX_BRIDGE_USER", "label": "matrix-bridge bot SSH user", "type": "text",
     "placeholder": "e.g. modelo",
     "help": "SSH user owning the bot's ~/matrix-bridge clone (Spark 2). Set this to show the bot tile (update/restart/logs). Blank ⇒ tile hidden."},
    {"group": "Integrations", "key": "NGC_API_KEY", "label": "NGC API key", "type": "secret",
     "placeholder": "starts with nvapi-…",
     "help": "NVIDIA NGC personal key, needed only to install NIM containers from nvcr.io. Stored on this server."},
    {"group": "Integrations", "key": "SWAP_WEBHOOK_URL", "label": "Swap webhook URL", "type": "text",
     "placeholder": "e.g. https://my-service.local/spark-swap",
     "help": "POSTed a small JSON event (swap_complete / swap_failed) after every model swap, so automation can re-point to the new model. Blank ⇒ disabled."},
    {"group": "Integrations", "key": "SWAP_WEBHOOK_SECRET", "label": "Swap webhook secret", "type": "secret",
     "placeholder": "a random shared string",
     "help": "If set, each webhook is HMAC-signed (X-Spark-Signature) so the receiver can verify it. Blank ⇒ unsigned."},
]
|
||||||
|
|
||||||
|
_BY_KEY = {f["key"]: f for f in FIELDS}
_SECRET_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "secret")
_INT_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "int")
# Reject control characters (incl. newlines): these values flow into env vars,
# URLs, and SSH command lines (quoted at the sink, but defence in depth).
_BAD_CHARS = re.compile(r"[\x00-\x1f\x7f]")
# A secret's value is never echoed back, so a blank submit means "keep the stored
# one" (you can't see it to retype it). To actually *remove* a stored secret the
# UI sends this sentinel instead of a real value. Surfaced to the front-end via
# public_view so the two stay in sync.
CLEAR_SENTINEL = "__clear__"
|
||||||
|
|
||||||
|
|
||||||
|
def _path() -> Path:
|
||||||
|
return Path(os.environ.get("APP_SETTINGS_FILE", "/data/app_settings.json"))
|
||||||
|
|
||||||
|
|
||||||
|
def field_keys() -> frozenset[str]:
|
||||||
|
return frozenset(_BY_KEY)
|
||||||
|
|
||||||
|
|
||||||
|
def load_overlay() -> dict[str, str]:
    """Return the overlay as {ENV_KEY: value}, filtered to known, non-empty keys.

    Pure read (no side effects), called on every Settings (re)build, so it must
    not write. Missing/corrupt file ⇒ {}. The file is tiny."""
    p = _path()
    if not p.exists():
        return {}
    try:
        raw = json.loads(p.read_text())
    except (ValueError, OSError) as e:
        log.warning("ignoring unreadable %s: %s", p, e)
        return {}
    if not isinstance(raw, dict):
        return {}
    return {k: str(v) for k, v in raw.items() if k in _BY_KEY and v not in (None, "")}
|
||||||
|
|
||||||
|
|
||||||
|
def seed_from_env(env: Mapping[str, str]) -> None:
    """One-time migration, called once at startup: if no overlay exists yet, seed
    it from the current env so any optional value previously set via the StartOS
    action carries into the gear automatically (nothing lost on upgrade). No-op
    if the file already exists or the env carries no known non-empty knob; a
    fresh install then starts with no overlay and pure defaults. Values run
    through the same validation as apply(); a malformed one (e.g. a paste-error
    port) is skipped rather than written, matching the gear's own guards."""
    if _path().exists():
        return
    seeded: dict[str, str] = {}
    for k in _BY_KEY:
        v = env.get(k)
        if not v:
            continue
        try:
            cleaned = _validate(k, v)
        except SettingsError as e:
            log.warning("skipping invalid env value while seeding overlay: %s", e)
            continue
        if cleaned and cleaned != CLEAR_SENTINEL:
            seeded[k] = cleaned
    if seeded:
        _write(seeded)
        log.info("seeded settings overlay from env (%d keys): %s", len(seeded), _path())
|
||||||
|
|
||||||
|
|
||||||
|
def _write(overlay: dict[str, str]) -> None:
    p = _path()
    p.parent.mkdir(parents=True, exist_ok=True)
    # Atomic replace so a crash mid-write never leaves a truncated overlay.
    fd, tmp = tempfile.mkstemp(dir=str(p.parent), prefix=".app_settings.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(overlay, fh, indent=2, sort_keys=True)
        os.replace(tmp, p)
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
|
||||||
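The write path above is the usual temp-file-then-rename pattern. A minimal standalone sketch of that pattern follows; `atomic_write_json` and `demo_atomic_write` are hypothetical names used only for illustration, and the target directory is a throwaway temp dir rather than `/data`.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Hypothetical helper: write JSON to a sibling temp file, then rename it over `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file is created in the target directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".tmp.", suffix=".json")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(data, fh, indent=2, sort_keys=True)
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        # Remove the temp file on any failure, then re-raise the original error.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


def demo_atomic_write() -> dict:
    """Write twice to the same target and report the final content and any stray files."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "app_settings.json"
        atomic_write_json(target, {"QDRANT_COLLECTION": "crm_chunks"})
        atomic_write_json(target, {"QDRANT_COLLECTION": "notes"})
        leftovers = [p.name for p in Path(d).iterdir() if p.name != target.name]
        return {"value": json.loads(target.read_text()), "leftovers": leftovers}
```

Creating the temp file next to the target is the important design choice here: a rename across filesystems is not guaranteed to be atomic.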
|
|
||||||
|
|
||||||
|
def public_view() -> dict:
|
||||||
|
"""Shape the gear form for the browser: ordered groups of fields with their
|
||||||
|
current overlay value. Secret values are never sent — only a `set` flag."""
|
||||||
|
overlay = load_overlay()
|
||||||
|
groups: list[dict] = []
|
||||||
|
index: dict[str, dict] = {}
|
||||||
|
for f in FIELDS:
|
||||||
|
g = index.get(f["group"])
|
||||||
|
if g is None:
|
||||||
|
g = {"name": f["group"], "fields": []}
|
||||||
|
index[f["group"]] = g
|
||||||
|
groups.append(g)
|
||||||
|
entry = {
|
||||||
|
"key": f["key"],
|
||||||
|
"label": f["label"],
|
||||||
|
"type": f["type"],
|
||||||
|
"placeholder": f.get("placeholder", ""),
|
||||||
|
"help": f.get("help", ""),
|
||||||
|
}
|
||||||
|
if f["type"] == "secret":
|
||||||
|
entry["set"] = bool(overlay.get(f["key"]))
|
||||||
|
else:
|
||||||
|
entry["value"] = overlay.get(f["key"], "")
|
||||||
|
g["fields"].append(entry)
|
||||||
|
return {"groups": groups, "clear_sentinel": CLEAR_SENTINEL}
|
||||||
|
|
||||||
|
|
||||||
|
class SettingsError(ValueError):
|
||||||
|
"""Bad input to apply() — surfaced as 422 by the endpoint."""
|
||||||
|
|
||||||
|
|
||||||
|
def _validate(key: str, value) -> str:
    """Clean + validate one value; raise SettingsError on bad input. Returns the
    stripped string ('' is valid and means 'unset'). The CLEAR_SENTINEL passes
    through for the caller to interpret (secret removal)."""
    if key not in _BY_KEY:
        raise SettingsError(f"unknown setting: {key}")
    val = ("" if value is None else str(value)).strip()
    if val == CLEAR_SENTINEL:
        return val
    if _BAD_CHARS.search(val):
        raise SettingsError(f"{key}: control characters are not allowed")
    if key in _INT_KEYS and val:
        if not val.isdigit() or not (1 <= int(val) <= 65535):
            raise SettingsError(f"{key}: must be a port number between 1 and 65535")
    return val
|
||||||
|
|
||||||
|
|
||||||
|
def apply(updates: Mapping[str, str]) -> dict[str, str]:
    """Validate `updates` and merge them into the overlay, then persist.

    Rules per key:
    - unknown key / bad int / control chars → reject (422, via _validate)
    - secret + CLEAR_SENTINEL → delete the stored secret
    - secret + blank value → leave the stored secret unchanged (don't wipe)
    - non-secret + blank → delete the key (revert to env/default)
    - otherwise → set the key

    Returns the new overlay. The caller reloads Settings so the change goes live.
    """
    overlay = load_overlay()
    for key, value in updates.items():
        val = _validate(key, value)
        if key in _SECRET_KEYS:
            if val == CLEAR_SENTINEL:
                overlay.pop(key, None)
            elif val:
                overlay[key] = val
            # blank secret ⇒ leave the existing value in place
        elif val and val != CLEAR_SENTINEL:
            overlay[key] = val
        else:
            overlay.pop(key, None)
    _write(overlay)
    return overlay
|
||||||
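The per-key merge rules can be exercised without touching disk. The sketch below is an assumption-laden illustration, not the module's code: `merge`, `CLEAR`, and the one-entry `SECRET_KEYS` set are hypothetical stand-ins, and validation is left out so only the merge behaviour is visible.

```python
CLEAR = "__clear__"
SECRET_KEYS = frozenset({"NGC_API_KEY"})  # hypothetical subset of the field table


def merge(overlay: dict[str, str], updates: dict[str, str]) -> dict[str, str]:
    """Sketch of the per-key rules; validation is omitted for brevity."""
    out = dict(overlay)
    for key, raw in updates.items():
        val = (raw or "").strip()
        if key in SECRET_KEYS:
            if val == CLEAR:
                out.pop(key, None)  # explicit removal of a stored secret
            elif val:
                out[key] = val      # a new secret replaces the old one
            # blank secret: keep whatever is stored
        elif val and val != CLEAR:
            out[key] = val          # set a non-secret value
        else:
            out.pop(key, None)      # blank non-secret reverts to env/default
    return out


before = {"NGC_API_KEY": "nvapi-old", "OPEN_WEBUI_URL": "https://chat.example"}
# Blank submit: the secret survives, the non-secret is dropped.
kept = merge(before, {"NGC_API_KEY": "", "OPEN_WEBUI_URL": ""})
# Sentinel submit: only the secret is removed.
cleared = merge(before, {"NGC_API_KEY": CLEAR})
```

The asymmetry between secrets and non-secrets exists because the browser never receives a secret's value, so an empty field cannot be read as "delete".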
+125
-50
@@ -1,19 +1,52 @@
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
import logging
|
||||||
import os
|
import os
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass, fields
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
from typing import Mapping
|
||||||
|
|
||||||
|
from . import app_settings
|
||||||
|
from .shellsafe import validate_container
|
||||||
|
|
||||||
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
def _env(name: str, default: str = "") -> str:
|
def _env(src: Mapping[str, str], name: str, default: str = "") -> str:
|
||||||
return os.environ.get(name, default)
|
return src.get(name, default)
|
||||||
|
|
||||||
|
|
||||||
def _env_int(name: str, default: int) -> int:
|
def _env_container(src: Mapping[str, str], name: str, default: str) -> str:
|
||||||
"""Parse an int env var, falling back to `default` when unset, blank, or
|
"""Resolve a container-name env var, validating it at the config boundary.
|
||||||
malformed. The StartOS Configure panel passes optional numeric fields as an
|
|
||||||
empty string when left blank, so a bare int("") would crash daemon startup."""
|
The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
|
||||||
|
the sink — but per the repo's two-layer convention it's also whitelist-checked
|
||||||
|
here. A malformed optional value falls back to `default` rather than crashing
|
||||||
|
daemon startup (mirrors `_env_int`)."""
|
||||||
|
val = src.get(name, "") or default
|
||||||
try:
|
try:
|
||||||
return int(os.environ.get(name, "") or default)
|
return validate_container(val)
|
||||||
|
except ValueError:
|
||||||
|
log.warning("ignoring invalid %s=%r; using %r", name, val, default)
|
||||||
|
return default
|
||||||
|
|
||||||
|
|
||||||
|
def _env_set(src: Mapping[str, str], name: str) -> frozenset[str]:
|
||||||
|
"""Parse a comma-separated env var into a lowercased frozenset of keys.
|
||||||
|
|
||||||
|
Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
|
||||||
|
support service can switch its tile + probes off entirely (rather than have
|
||||||
|
the probe hit whatever else listens on that port — e.g. a vLLM sharing
|
||||||
|
Parakeet's default 8000)."""
|
||||||
|
raw = src.get(name, "")
|
||||||
|
return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
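The parsing rule above (split on commas, trim, lowercase, drop empties) can be checked in isolation. In this small sketch `env_set` is a local copy of the helper so the snippet runs alone, and the input strings are made-up examples.

```python
from typing import Mapping


def env_set(src: Mapping[str, str], name: str) -> frozenset[str]:
    """Local copy of the comma-separated parser, repeated so the snippet is self-contained."""
    raw = src.get(name, "")
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


# Mixed case, stray spaces, and empty segments are all tolerated.
disabled = env_set({"DISABLED_SERVICES": " Parakeet, ,qdrant,"}, "DISABLED_SERVICES")
# An unset variable yields an empty set, so nothing is disabled by default.
nothing = env_set({}, "DISABLED_SERVICES")
```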
|
||||||
|
|
||||||
|
|
||||||
|
def _env_int(src: Mapping[str, str], name: str, default: int) -> int:
|
||||||
|
"""Parse an int env var, falling back to `default` when unset, blank, or
|
||||||
|
malformed. Optional numeric fields arrive as an empty string when left blank,
|
||||||
|
so a bare int("") would crash daemon startup."""
|
||||||
|
try:
|
||||||
|
return int(src.get(name, "") or default)
|
||||||
except (TypeError, ValueError):
|
except (TypeError, ValueError):
|
||||||
return default
|
return default
|
||||||
|
|
||||||
@@ -33,8 +66,23 @@ def _resolve_models_yaml() -> str:
|
|||||||
return str(candidates[0]) # let load fail with a clear path
|
return str(candidates[0]) # let load fail with a clear path
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
def _effective_env() -> dict[str, str]:
|
||||||
|
"""The env Settings is built from: process env first, the in-app settings
|
||||||
|
overlay on top. The overlay (the dashboard 'gear') is keyed by the same env
|
||||||
|
var names, so a knob set in the UI overrides the value the StartOS action
|
||||||
|
injected — while an un-touched knob keeps falling through to the action's
|
||||||
|
value, then to the code default. See app_settings."""
|
||||||
|
return {**os.environ, **app_settings.load_overlay()}
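The precedence described in the docstring (overlay over process env, then the code default) falls out of dict-merge order. A minimal sketch with plain dicts, where `effective` and the sample values are illustrative assumptions rather than the module's API:

```python
def effective(env: dict[str, str], overlay: dict[str, str]) -> dict[str, str]:
    # Later mappings win in a dict merge, so the overlay overrides the process env.
    return {**env, **overlay}


env = {"VLLM_PORT": "8888", "QDRANT_PORT": "6333"}
overlay = {"VLLM_PORT": "9000"}
merged = effective(env, overlay)

# A key present in neither layer falls through to the code default.
embed_port = int(merged.get("EMBED_PORT", "") or 8088)
```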
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
class Settings:
|
class Settings:
|
||||||
|
# NOTE: intentionally NOT frozen. There is exactly one Settings instance,
|
||||||
|
# shared by reference across every router closure and manager (build_router,
|
||||||
|
# self.settings = settings). `reload()` mutates it in place so a change saved
|
||||||
|
# via the in-app settings gear goes live for all of them without rebuilding
|
||||||
|
# the app — the only window of inconsistency is the microseconds it takes to
|
||||||
|
# reassign the fields, acceptable for a single-operator config save.
|
||||||
spark1_host: str
|
spark1_host: str
|
||||||
spark1_user: str
|
spark1_user: str
|
||||||
spark2_host: str
|
spark2_host: str
|
||||||
@@ -63,6 +111,8 @@ class Settings:
|
|||||||
ssh_known_hosts: str
|
ssh_known_hosts: str
|
||||||
models_yaml: str
|
models_yaml: str
|
||||||
vllm_port: int
|
vllm_port: int
|
||||||
|
vllm_container: str
|
||||||
|
disabled_services: frozenset[str]
|
||||||
parakeet_port: int
|
parakeet_port: int
|
||||||
kokoro_port: int
|
kokoro_port: int
|
||||||
embed_port: int
|
embed_port: int
|
||||||
@@ -70,61 +120,86 @@ class Settings:
|
|||||||
bind_port: int
|
bind_port: int
|
||||||
open_webui_url: str
|
open_webui_url: str
|
||||||
ngc_api_key: str
|
ngc_api_key: str
|
||||||
|
swap_webhook_url: str
|
||||||
|
swap_webhook_secret: str
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def from_env(cls) -> "Settings":
|
def from_env(cls, src: Mapping[str, str] | None = None) -> "Settings":
|
||||||
spark2_host = _env("SPARK2_HOST")
|
src = _effective_env() if src is None else src
|
||||||
spark2_user = _env("SPARK2_USER")
|
spark2_host = _env(src, "SPARK2_HOST")
|
||||||
|
spark2_user = _env(src, "SPARK2_USER")
|
||||||
# Parakeet (STT) and Kokoro (TTS) default to Spark 2 unless overridden.
|
# Parakeet (STT) and Kokoro (TTS) default to Spark 2 unless overridden.
|
||||||
return cls(
|
return cls(
|
||||||
spark1_host=_env("SPARK1_HOST"),
|
spark1_host=_env(src, "SPARK1_HOST"),
|
||||||
spark1_user=_env("SPARK1_USER"),
|
spark1_user=_env(src, "SPARK1_USER"),
|
||||||
spark2_host=spark2_host,
|
spark2_host=spark2_host,
|
||||||
spark2_user=spark2_user,
|
spark2_user=spark2_user,
|
||||||
parakeet_host=_env("PARAKEET_HOST") or spark2_host,
|
parakeet_host=_env(src, "PARAKEET_HOST") or spark2_host,
|
||||||
parakeet_user=_env("PARAKEET_USER") or spark2_user,
|
parakeet_user=_env(src, "PARAKEET_USER") or spark2_user,
|
||||||
parakeet_container=_env("PARAKEET_CONTAINER") or "parakeet-asr",
|
parakeet_container=_env(src, "PARAKEET_CONTAINER") or "parakeet-asr",
|
||||||
kokoro_host=_env("KOKORO_HOST") or spark2_host,
|
kokoro_host=_env(src, "KOKORO_HOST") or spark2_host,
|
||||||
kokoro_user=_env("KOKORO_USER") or spark2_user,
|
kokoro_user=_env(src, "KOKORO_USER") or spark2_user,
|
||||||
kokoro_container=_env("KOKORO_CONTAINER") or "kokoro-tts",
|
kokoro_container=_env(src, "KOKORO_CONTAINER") or "kokoro-tts",
|
||||||
# Embeddings (spark-embed: bge-m3 dense + reranker) and Qdrant
|
# Embeddings (spark-embed: bge-m3 dense + reranker) and Qdrant
|
||||||
# (vector storage) default to Spark 2 unless overridden.
|
# (vector storage) default to Spark 2 unless overridden.
|
||||||
embed_host=_env("EMBED_HOST") or spark2_host,
|
embed_host=_env(src, "EMBED_HOST") or spark2_host,
|
||||||
embed_user=_env("EMBED_USER") or spark2_user,
|
embed_user=_env(src, "EMBED_USER") or spark2_user,
|
||||||
embed_container=_env("EMBED_CONTAINER") or "spark-embed",
|
embed_container=_env(src, "EMBED_CONTAINER") or "spark-embed",
|
||||||
qdrant_host=_env("QDRANT_HOST") or spark2_host,
|
qdrant_host=_env(src, "QDRANT_HOST") or spark2_host,
|
||||||
qdrant_user=_env("QDRANT_USER") or spark2_user,
|
qdrant_user=_env(src, "QDRANT_USER") or spark2_user,
|
||||||
qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
|
qdrant_container=_env(src, "QDRANT_CONTAINER") or "qdrant",
|
||||||
qdrant_collection=_env("QDRANT_COLLECTION", ""),
|
qdrant_collection=_env(src, "QDRANT_COLLECTION", ""),
|
||||||
# matrix-bridge bot container, driven as its own SSH user (the owner
|
# matrix-bridge bot container, driven as its own SSH user (the owner
|
||||||
# of the ~/matrix-bridge git clone) so git/docker run unprivileged.
|
# of the ~/matrix-bridge git clone) so git/docker run unprivileged.
|
||||||
# The user is BLANK by default and set via the "Configure Sparks"
|
# The user is BLANK by default and set via the settings gear; leaving
|
||||||
# action; leaving it blank reports the service as unconfigured, which
|
# it blank reports the service as unconfigured, which hides the tile.
|
||||||
# hides the tile. That keeps the shared package portable — a
|
# That keeps the shared package portable — a deployment without the
|
||||||
# deployment without the bot never shows a stray tile or a hardcoded
|
# bot never shows a stray tile or a hardcoded username. Host defaults
|
||||||
# username. Host defaults to Spark 2 (same box); container/dir/branch
|
# to Spark 2 (same box); container/dir/branch are sensible defaults.
|
||||||
# are sensible defaults. All are env-overridable.
|
matrix_bridge_host=_env(src, "MATRIX_BRIDGE_HOST") or spark2_host,
|
||||||
matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
|
matrix_bridge_user=_env(src, "MATRIX_BRIDGE_USER"),
|
||||||
matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
|
matrix_bridge_container=_env(src, "MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
|
||||||
matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
|
matrix_bridge_dir=_env(src, "MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
|
||||||
matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
|
matrix_bridge_branch=_env(src, "MATRIX_BRIDGE_BRANCH") or "master",
|
||||||
matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
|
|
||||||
# Redaction gateway pseudonym-map store (server-held de-anon key).
|
# Redaction gateway pseudonym-map store (server-held de-anon key).
|
||||||
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
|
redaction_map_db=_env(src, "REDACTION_MAP_DB", "/data/redaction_maps.db"),
|
||||||
redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
|
redaction_map_ttl=_env_int(src, "REDACTION_MAP_TTL", 7200),
|
||||||
ssh_key_path=_env("SSH_KEY_PATH"),
|
ssh_key_path=_env(src, "SSH_KEY_PATH"),
|
||||||
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
|
ssh_known_hosts=_env(src, "SSH_KNOWN_HOSTS"),
|
||||||
models_yaml=_resolve_models_yaml(),
|
models_yaml=_resolve_models_yaml(),
|
||||||
vllm_port=_env_int("VLLM_PORT", 8888),
|
vllm_port=_env_int(src, "VLLM_PORT", 8888),
|
||||||
parakeet_port=_env_int("PARAKEET_PORT", 8000),
|
# Container name for the swappable vLLM on Spark 1. Defaults to the
|
||||||
kokoro_port=_env_int("KOKORO_PORT", 8880),
|
# bundled launch-cluster.sh container; override if you named yours
|
||||||
embed_port=_env_int("EMBED_PORT", 8088),
|
# something else (the swap log-tail and pre-flight validator exec
|
||||||
qdrant_port=_env_int("QDRANT_PORT", 6333),
|
# into it by name).
|
||||||
bind_port=_env_int("BIND_PORT", 9999),
|
vllm_container=_env_container(src, "VLLM_CONTAINER", "vllm_node"),
|
||||||
open_webui_url=_env("OPEN_WEBUI_URL", ""),
|
# Built-in support-service keys (parakeet, kokoro, embeddings,
|
||||||
ngc_api_key=_env("NGC_API_KEY", ""),
|
# qdrant) the deployment doesn't run — hidden from the dashboard and
|
||||||
|
# never probed.
|
||||||
|
disabled_services=_env_set(src, "DISABLED_SERVICES"),
|
||||||
|
parakeet_port=_env_int(src, "PARAKEET_PORT", 8000),
|
||||||
|
kokoro_port=_env_int(src, "KOKORO_PORT", 8880),
|
||||||
|
embed_port=_env_int(src, "EMBED_PORT", 8088),
|
||||||
|
qdrant_port=_env_int(src, "QDRANT_PORT", 6333),
|
||||||
|
bind_port=_env_int(src, "BIND_PORT", 9999),
|
||||||
|
open_webui_url=_env(src, "OPEN_WEBUI_URL", ""),
|
||||||
|
ngc_api_key=_env(src, "NGC_API_KEY", ""),
|
||||||
|
# Coordination layer: fire a swap-lifecycle webhook to this URL so
|
||||||
|
# downstream consumers re-point their model config on a swap. Blank
|
||||||
|
# ⇒ disabled. The optional secret HMAC-signs the body (X-Spark-Signature).
|
||||||
|
swap_webhook_url=_env(src, "SWAP_WEBHOOK_URL", ""),
|
||||||
|
swap_webhook_secret=_env(src, "SWAP_WEBHOOK_SECRET", ""),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
def reload(self) -> None:
|
||||||
|
"""Recompute every field from the current env + settings overlay and
|
||||||
|
assign it onto this same instance, so all holders of the reference see
|
||||||
|
the change without an app restart. Called after the gear writes the
|
||||||
|
overlay (see server.post_settings)."""
|
||||||
|
fresh = Settings.from_env()
|
||||||
|
for f in fields(self):
|
||||||
|
setattr(self, f.name, getattr(fresh, f.name))
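The reason for mutating in place rather than building a new object is that other components already hold a reference to the one instance. A small sketch of that effect, assuming a hypothetical two-field `Config` and a `Router` holder (neither is the real `Settings` or router code):

```python
from dataclasses import dataclass, fields


@dataclass
class Config:  # hypothetical stand-in for Settings
    host: str
    port: int

    def reload(self, fresh: "Config") -> None:
        # Copy every dataclass field from `fresh` onto this same instance.
        for f in fields(self):
            setattr(self, f.name, getattr(fresh, f.name))


class Router:  # hypothetical holder that captured the instance at startup
    def __init__(self, cfg: Config) -> None:
        self.cfg = cfg


shared = Config(host="spark1", port=8888)
router = Router(shared)
shared.reload(Config(host="spark1", port=9000))
```

After the reload, `router.cfg` is still the same object and already reports the new port. A frozen dataclass would reject the `setattr` calls, which is why the class is declared non-frozen.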
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def configured(self) -> bool:
|
def configured(self) -> bool:
|
||||||
return bool(self.spark1_host)
|
return bool(self.spark1_host)
|
||||||
|
|||||||
@@ -0,0 +1,350 @@
|
|||||||
|
"""Cluster-coordination layer: the GPU swap lock, swap-event webhook, and the
|
||||||
|
read-only schedule registry.
|
||||||
|
|
||||||
|
Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring
|
||||||
|
business pipelines live in separate services that *call* the swap API. These
|
||||||
|
three primitives add the *safety* layer around that:
|
||||||
|
|
||||||
|
- **Swap lock** — a TTL-bounded reservation of the swap path. An external
|
||||||
|
scheduler acquires it before swapping; while held by someone else the
|
||||||
|
dashboard's manual swap is refused (enforced in the swap endpoint, not
|
||||||
|
advisory). Holder name is descriptive; the returned token is the secret that
|
||||||
|
authorises a swap or a release.
|
||||||
|
- **Webhook** — fires `swap_complete` / `swap_failed` to a configurable URL so
|
||||||
|
downstream consumers re-point their provider config when the running model
|
||||||
|
changes. Optionally HMAC-signed.
|
||||||
|
- **Schedule registry** — a read-only view the dashboard surfaces, *registered
|
||||||
|
by* external schedulers. Spark Control stores what it's told; it does not own
|
||||||
|
or execute any schedule.
|
||||||
|
|
||||||
|
All state is in-memory (mirroring the swap/download/NIM job managers). On a
|
||||||
|
restart the lock resets to *unlocked* — the available-by-default failure mode;
|
||||||
|
the swap manager's own in-progress guard still prevents two swaps at once —
|
||||||
|
and schedulers re-register their schedules.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import hashlib
|
||||||
|
import hmac
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import uuid
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from datetime import datetime, timedelta, timezone
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# A lock reserves the GPU for a window; clamp the TTL so a buggy client can
|
||||||
|
# neither pin the cluster forever nor take a zero-length (useless) lock.
|
||||||
|
LOCK_TTL_MIN = 1
|
||||||
|
LOCK_TTL_MAX = 86_400 # 24h
|
||||||
|
LOCK_TTL_DEFAULT = 900 # 15 min
|
||||||
|
|
||||||
|
# Schedule ids are reflected to the dashboard and used as a URL path segment on
|
||||||
|
# delete, so a caller-supplied id is whitelist-checked. Generated ids are hex.
|
||||||
|
_SCHEDULE_ID_RE = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")
|
||||||
|
|
||||||
|
|
||||||
|
def valid_schedule_id(value: str) -> bool:
|
||||||
|
"""Whitelist check for a caller-supplied schedule id (register and delete)."""
|
||||||
|
return bool(_SCHEDULE_ID_RE.match(value or ""))
|
||||||
|
|
||||||
|
|
||||||
|
def _now() -> datetime:
|
||||||
|
return datetime.now(timezone.utc)
|
||||||
|
|
||||||
|
|
||||||
|
def _iso(dt: datetime) -> str:
|
||||||
|
return dt.isoformat()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------- swap lock ----
|
||||||
|
|
||||||
|
class LockHeld(Exception):
|
||||||
|
"""The lock is held by a different holder. Carries the public lock state so
|
||||||
|
the endpoint can return holder + expiry in the 409 body."""
|
||||||
|
|
||||||
|
def __init__(self, state: dict) -> None:
|
||||||
|
self.state = state
|
||||||
|
super().__init__("swap lock is held by another holder")
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class LockState:
|
||||||
|
holder: str
|
||||||
|
token: str
|
||||||
|
acquired_at: datetime
|
||||||
|
expires_at: datetime
|
||||||
|
note: str = ""
|
||||||
|
|
||||||
|
def public(self, now: datetime) -> dict:
|
||||||
|
"""Token-free view safe to expose on GET / in error bodies."""
|
||||||
|
return {
|
||||||
|
"held": True,
|
||||||
|
"holder": self.holder,
|
||||||
|
"acquired_at": _iso(self.acquired_at),
|
||||||
|
"expires_at": _iso(self.expires_at),
|
||||||
|
"seconds_remaining": max(0, int((self.expires_at - now).total_seconds())),
|
||||||
|
"note": self.note,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class SwapLockManager:
    """In-memory, TTL-bounded reservation of the GPU swap path.

    `now` is injectable on every method purely so the expiry logic is testable
    without sleeping; production calls omit it and get wall-clock UTC.
    """

    def __init__(self) -> None:
        self._lock: Optional[LockState] = None

    def _active(self, now: Optional[datetime] = None) -> Optional[LockState]:
        """The current lock if one is held and unexpired; lazily clears an
        expired lock so it never lingers."""
        now = now or _now()
        if self._lock is not None and self._lock.expires_at <= now:
            self._lock = None
        return self._lock

    def status(self, now: Optional[datetime] = None) -> dict:
        now = now or _now()
        active = self._active(now)
        return active.public(now) if active else {"held": False}

    def acquire(
        self,
        holder: str,
        ttl_seconds: Optional[int] = None,
        note: str = "",
        token: Optional[str] = None,
        *,
        now: Optional[datetime] = None,
    ) -> LockState:
        """Acquire a free lock (new token), or extend one already held by
        presenting its token. A request without the token is refused even if the
        holder name matches: the name is descriptive, the token is the secret.
        """
        now = now or _now()
        holder = (holder or "").strip()
        if not holder:
            raise ValueError("holder is required")
        ttl = ttl_seconds if ttl_seconds is not None else LOCK_TTL_DEFAULT
        try:
            ttl = int(ttl)
        except (TypeError, ValueError):
            ttl = LOCK_TTL_DEFAULT
        ttl = max(LOCK_TTL_MIN, min(LOCK_TTL_MAX, ttl))

        active = self._active(now)
        if active is not None:
            # Held: only the token-holder may extend/re-acquire.
            if not (token and hmac.compare_digest(active.token, token)):
                raise LockHeld(active.public(now))
            self._lock = LockState(
                holder=holder or active.holder,
                token=active.token,
                acquired_at=active.acquired_at,
                expires_at=now + timedelta(seconds=ttl),
                note=note or active.note,
            )
            return self._lock

        self._lock = LockState(
            holder=holder,
            token=uuid.uuid4().hex,
            acquired_at=now,
            expires_at=now + timedelta(seconds=ttl),
            note=note,
        )
        return self._lock

    def verify(self, token: Optional[str], now: Optional[datetime] = None) -> bool:
        """True iff `token` matches the currently-active lock."""
        active = self._active(now)
        return bool(active and token and hmac.compare_digest(active.token, token))

    def is_blocked_by(self, token: Optional[str], now: Optional[datetime] = None) -> Optional[dict]:
        """Single-read swap gate. Returns the public lock state if an active
        lock blocks a swap carrying this token, else None. Does exactly one
        `_active()` read so the decision can't straddle a TTL expiry the way a
        separate status()+verify() pair could (which, at the expiry tick, would
        spuriously refuse a swap that should now be allowed)."""
        now = now or _now()
        active = self._active(now)
        if active is None:
            return None
        if token and hmac.compare_digest(active.token, token):
            return None
        return active.public(now)

    def release(
        self,
        token: Optional[str] = None,
        *,
        force: bool = False,
        now: Optional[datetime] = None,
    ) -> bool:
        """Release the lock. Returns False if nothing was held. Requires the
        matching token unless `force` (the human override from the dashboard)."""
        active = self._active(now)
        if active is None:
            return False
        if not force and not self.verify(token, now):
            raise PermissionError("token does not hold the lock")
        self._lock = None
        return True
|
||||||
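The injectable `now` makes the expiry behaviour easy to reason about with fixed timestamps. The following is a deliberately stripped-down sketch: `MiniLock` is a hypothetical class written for this example and omits holder names, notes, extension, and release, so it should not be read as the manager's actual interface.

```python
import hmac
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional


class MiniLock:
    """Hypothetical, stripped-down TTL lock; not the module's SwapLockManager."""

    def __init__(self) -> None:
        self.token: Optional[str] = None
        self.expires_at: Optional[datetime] = None

    def _held(self, now: datetime) -> bool:
        if self.token is not None and self.expires_at <= now:
            self.token = None  # lazily clear an expired lock
        return self.token is not None

    def acquire(self, ttl: int, now: datetime) -> Optional[str]:
        """Return a fresh token, or None if someone else holds the lock."""
        if self._held(now):
            return None
        self.token = uuid.uuid4().hex
        self.expires_at = now + timedelta(seconds=ttl)
        return self.token

    def blocks(self, token: Optional[str], now: datetime) -> bool:
        """True if an unexpired lock exists and `token` does not match it."""
        if not self._held(now):
            return False
        return not (token and hmac.compare_digest(self.token, token))


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
lock = MiniLock()
tok = lock.acquire(ttl=900, now=t0)
second = lock.acquire(ttl=900, now=t0 + timedelta(seconds=10))
blocked_other = lock.blocks(None, t0 + timedelta(seconds=10))
blocked_owner = lock.blocks(tok, t0 + timedelta(seconds=10))
blocked_late = lock.blocks(None, t0 + timedelta(seconds=900))
```

The last check lands exactly on the expiry instant. Because the comparison is `expires_at <= now`, the lock counts as expired at that tick and no longer blocks.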
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------- webhook ----
|
||||||
|
|
||||||
|
def build_webhook_payload(
|
||||||
|
*,
|
||||||
|
event: str,
|
||||||
|
job_id: str,
|
||||||
|
model_key: str,
|
||||||
|
state: str,
|
||||||
|
returncode: Optional[int],
|
||||||
|
started_at: Optional[str],
|
||||||
|
finished_at: Optional[str],
|
||||||
|
dry_run: bool,
|
||||||
|
) -> dict:
|
||||||
|
return {
|
||||||
|
"event": event, # swap_complete | swap_failed
|
||||||
|
"job_id": job_id,
|
||||||
|
"model_key": model_key,
|
||||||
|
"state": state,
|
||||||
|
"returncode": returncode,
|
||||||
|
"started_at": started_at,
|
||||||
|
"finished_at": finished_at,
|
||||||
|
"dry_run": dry_run,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def sign_payload(secret: str, body: bytes) -> str:
    """`X-Spark-Signature` value: sha256 HMAC of the exact JSON body the
    consumer receives, so they can recompute and trust it."""
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
|
||||||
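On the receiving side, the signature can be checked by recomputing the HMAC over the raw request bytes. This is a sketch under assumptions: `verify_signature` is a hypothetical receiver-side helper, `sign` mirrors the signing function so the snippet runs alone, and the secret and payload are invented sample values.

```python
import hashlib
import hmac
import json


def sign(secret: str, body: bytes) -> str:
    """Mirror of the sender's signing step: sha256 HMAC with a 'sha256=' prefix."""
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Hypothetical receiver-side check of the X-Spark-Signature header value."""
    expected = sign(secret, body)
    # Compare as bytes in constant time; a missing header simply fails the check.
    return hmac.compare_digest(expected.encode(), (header or "").encode())


body = json.dumps({"event": "swap_complete", "model_key": "example"}).encode()
good = sign("shared-secret", body)

ok = verify_signature("shared-secret", body, good)
tampered = verify_signature("shared-secret", body + b" ", good)
wrong_secret = verify_signature("other-secret", body, good)
missing = verify_signature("shared-secret", body, "")
```

The check must run on the bytes exactly as received. Re-serialising the parsed JSON can change whitespace or key order and would make a valid signature fail.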
|
|
||||||
|
|
||||||
|
class WebhookNotifier:
    """Fire-and-forget POST of swap-lifecycle events. A webhook failure is
    logged and swallowed; it must never affect the swap outcome."""

    def __init__(self, url: str, secret: str = "", timeout: float = 5.0) -> None:
        self.url = (url or "").strip()
        self.secret = secret or ""
        self.timeout = timeout

    def update(self, url: str, secret: str = "") -> None:
        """Re-point after a live settings change. The notifier holds snapshot
        copies of these two fields (not the Settings object), so Settings.reload()
        can't reach it; server.post_settings calls this explicitly so editing the
        webhook URL/secret in the dashboard gear takes effect without a restart."""
        self.url = (url or "").strip()
        self.secret = secret or ""

    @property
    def enabled(self) -> bool:
        return bool(self.url)

    async def fire(self, event: str, payload: dict) -> None:
        if not self.enabled:
            return
        body = json.dumps(payload).encode()
        headers = {
            "content-type": "application/json",
            "user-agent": "spark-control-webhook",
            "x-spark-event": event,
        }
        if self.secret:
            headers["x-spark-signature"] = sign_payload(self.secret, body)
        try:
            async with httpx.AsyncClient(timeout=self.timeout) as client:
                await client.post(self.url, content=body, headers=headers)
        except Exception as e:  # noqa: BLE001  # best-effort, never propagate
            log.warning("swap webhook to %s failed: %s", self.url, e)
|
||||||
|
|
||||||
|
|
||||||
|
# -------------------------------------------------------- schedule registry ----

@dataclass
class ScheduleEntry:
    id: str
    name: str
    owner: str = ""
    cron: str = ""
    next_run: str = ""
    description: str = ""
    registered_at: str = ""
    updated_at: str = ""

    def public(self) -> dict:
        return {
            "id": self.id,
            "name": self.name,
            "owner": self.owner,
            "cron": self.cron,
            "next_run": self.next_run,
            "description": self.description,
            "registered_at": self.registered_at,
            "updated_at": self.updated_at,
        }


class ScheduleRegistry:
    """What external schedulers tell us about their cron jobs. Read-only from the
    dashboard's side; Spark Control never executes any of it."""

    def __init__(self) -> None:
        self._items: dict[str, ScheduleEntry] = {}

    def list(self) -> list[dict]:
        return [e.public() for e in self._items.values()]

    def register(
        self,
        *,
        name: str,
        id: Optional[str] = None,
        owner: str = "",
        cron: str = "",
        next_run: str = "",
        description: str = "",
    ) -> ScheduleEntry:
        name = (name or "").strip()
        if not name:
            raise ValueError("name is required")
        if id is not None:
            id = id.strip()
            if id and not valid_schedule_id(id):
                raise ValueError("id must match [A-Za-z0-9_.-] (max 64 chars)")
        ts = _iso(_now())
        existing = self._items.get(id) if id else None
        if existing is not None:
            existing.name = name
            existing.owner = owner.strip()
            existing.cron = cron
            existing.next_run = next_run
            existing.description = description
            existing.updated_at = ts
            return existing
        sid = id or uuid.uuid4().hex[:8]
        entry = ScheduleEntry(
            id=sid,
            name=name,
            owner=owner.strip(),
            cron=cron,
            next_run=next_run,
            description=description,
            registered_at=ts,
            updated_at=ts,
        )
        self._items[sid] = entry
        return entry

    def delete(self, schedule_id: str) -> bool:
        return self._items.pop(schedule_id, None) is not None
|
||||||
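The upsert behaviour of `ScheduleRegistry.register` above can be sketched in isolation. This is a trimmed, self-contained sketch, not the module itself: `Entry` and `Registry` are cut-down stand-ins, `_now_iso` replaces the module's `_iso(_now())`, and `_ID_RE` replaces `valid_schedule_id`, whose definition is not part of this diff (the pattern is an assumption read off the error message).

```python
import re
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumption: mirrors the "[A-Za-z0-9_.-] (max 64 chars)" error message.
_ID_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def _now_iso() -> str:
    # Hypothetical stand-in for the module's _iso(_now()).
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Entry:
    id: str
    name: str
    cron: str = ""
    registered_at: str = ""
    updated_at: str = ""


class Registry:
    def __init__(self) -> None:
        self._items: dict[str, Entry] = {}

    def register(self, *, name: str, id: Optional[str] = None, cron: str = "") -> Entry:
        name = (name or "").strip()
        if not name:
            raise ValueError("name is required")
        if id is not None:
            id = id.strip()
            if id and not _ID_RE.fullmatch(id):
                raise ValueError("invalid id")
        ts = _now_iso()
        existing = self._items.get(id) if id else None
        if existing is not None:
            # Known id: update in place and leave registered_at untouched.
            existing.name, existing.cron, existing.updated_at = name, cron, ts
            return existing
        # Unknown or missing id: create a new entry (random 8-hex id if none given).
        sid = id or uuid.uuid4().hex[:8]
        entry = Entry(id=sid, name=name, cron=cron, registered_at=ts, updated_at=ts)
        self._items[sid] = entry
        return entry

    def delete(self, schedule_id: str) -> bool:
        return self._items.pop(schedule_id, None) is not None


reg = Registry()
first = reg.register(name="nightly-reindex", id="reindex", cron="0 3 * * *")
second = reg.register(name="nightly-reindex", id="reindex", cron="0 4 * * *")
anonymous = reg.register(name="ad-hoc job")
```

Registering the same `id` twice returns the same object with the newer `cron`, so an external scheduler can re-announce its jobs on every start without creating duplicates. A call without an `id` always creates a fresh entry.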
@@ -10,6 +10,17 @@ Format:
|
|||||||
port: 8001
|
port: 8001
|
||||||
health_path: /health
|
health_path: /health
|
||||||
image: nvcr.io/nim/nvidia/riva-multilingual:latest
|
image: nvcr.io/nim/nvidia/riva-multilingual:latest
|
||||||
|
|
||||||
|
A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
|
||||||
|
the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
|
||||||
|
probed via /v1/models plus container state and start/stop/restart:
|
||||||
|
custom:
|
||||||
|
- key: vllm-spark2
|
||||||
|
kind: vllm
|
||||||
|
host: <spark-2-ip>
|
||||||
|
user: <ssh-user>
|
||||||
|
container: vllm_node
|
||||||
|
port: 8000
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
import os
|
import os
|
||||||
|
|||||||
@@ -377,6 +377,10 @@ class DeepHealth:
|
|||||||
async def run_all(self) -> dict[str, ProbeResult]:
|
async def run_all(self) -> dict[str, ProbeResult]:
|
||||||
results = {}
|
results = {}
|
||||||
for name in self.PROBES:
|
for name in self.PROBES:
|
||||||
|
# Don't deep-probe a service the deployment switched off — its port
|
||||||
|
# may be answered by something else (e.g. a vLLM on Parakeet's 8000).
|
||||||
|
if name in self.settings.disabled_services:
|
||||||
|
continue
|
||||||
results[name] = await self.run_one(name)
|
results[name] = await self.run_one(name)
|
||||||
return results
|
return results
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,209 @@
|
|||||||
|
"""Disk-driven model menu + launch-recipe inference.

The dashboard's model list is whatever is actually downloaded on the Sparks
(see `disk.list_cached_models`), NOT a hard-coded catalog. The bundled/overridden
catalog entries are *launch recipes*: matched to an on-disk model by repo, they
say HOW to launch it. A completed model on disk with no matching recipe shows up
as `needs_setup`: the first switch reads its `config.json`, proposes a recipe
(`infer_recipe`) the operator confirms once, and that confirmed recipe is saved
to /data so it's a normal card from then on.

Why a recipe layer at all, if the menu is the disk? Because a folder on disk
doesn't say how to launch it: the per-family parsers (`--reasoning-parser`,
`--tool-call-parser`), the MoE backend (some Gemma MoE checkpoints need
`marlin` on GB10), and solo-vs-cluster topology can't be read off a directory.
We infer a best guess from the model's own config + size, but the operator
confirms it: a wrong guess is cheap, a wrong launch is not.
"""
from __future__ import annotations
import asyncio
import re

from .config import Settings
from .disk import list_cached_models, probe_disk
from .overrides import extract_knobs_from_args


# A model whose weights exceed this can't fit one Spark's 128 GB beside a KV
# cache, so it must shard across both via Ray. A heuristic prefill only; the
# operator confirms mode in the setup form, so the exact cutoff isn't critical.
SINGLE_SPARK_BYTES = 115 * 1000 ** 3

# Generic knob defaults applied to every inferred recipe (the operator can tweak
# these in the setup form). Family-specific flags (parsers, MoE backend) are
# layered on separately by `_detect_family`.
_COMMON_KNOBS = {
    "max_model_len": 32768,
    "gpu_memory_utilization": 0.85,
    "fastsafetensors": True,
    "prefix_caching": True,
    "kv_cache_dtype": "fp8",
}
|
||||||
|
|
||||||
|
|
||||||
|
def repo_to_key(repo: str) -> str:
    """Stable, URL-safe menu key for a discovered model with no recipe key yet.

    'RedHatAI/Qwen3.6-35B-A3B-NVFP4' -> 'redhatai-qwen3-6-35b-a3b-nvfp4'. The same
    slug is used by the menu, the setup form, and `_identify_current_model`, so a
    loaded-but-unconfigured model still highlights as active."""
    return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")
|
||||||
|
|
||||||
|
|
||||||
|
def _detect_family(config: dict) -> tuple[str, list[str], list[str]]:
    """Return (family_label, vllm_flags, capabilities) inferred from config.json.

    Only family-specific, non-knob flags (parsers, MoE backend) go in vllm_flags;
    generic knob defaults are handled by the caller. Best-effort and
    operator-confirmed, so a wrong guess is cheap."""
    arch = " ".join(config.get("architectures") or [])
    mtype = str(config.get("model_type") or "")
    s = (arch + " " + mtype).lower()
    is_moe = (
        "moe" in s
        or any(config.get(k) for k in ("num_experts", "n_routed_experts", "num_local_experts"))
    )
    is_vision = (
        "conditionalgeneration" in s
        or "vision" in s
        or "vlforcausallm" in s
        or "vision_config" in config
        or "image_token_index" in config
    )
    flags: list[str] = []
    caps: list[str] = []
    label = "Generic"
    if mtype.startswith("qwen3") or "qwen3" in s:
        label = "Qwen3 (MoE)" if is_moe else "Qwen3"
        flags.append("--reasoning-parser=qwen3")
        caps.append("reasoning")
        if is_moe:
            flags.append("--moe_backend=flashinfer_cutlass")
    elif "gemma" in s:
        label = "Gemma (MoE)" if is_moe else "Gemma"
        flags += ["--reasoning-parser=gemma4", "--tool-call-parser=gemma4", "--enable-auto-tool-choice"]
        caps += ["reasoning", "tools"]
        if is_moe:
            # The fast flashinfer/CUTLASS FP4 path errors on GB10 for Gemma MoE;
            # marlin is the working fallback (see the Gemma 26B trial notes).
            flags.append("--moe_backend=marlin")
    if is_vision and "vision" not in caps:
        caps.append("vision")
    return label, flags, caps
|
||||||
|
|
||||||
|
|
||||||
|
def _infer_mode(total_bytes: int, on_host_count: int) -> str:
    """Solo unless the weights are present on both Sparks or too big for one."""
    if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
        return "cluster"
    return "solo"


def infer_recipe(repo: str, config: dict, total_bytes: int, on_host_count: int) -> dict:
    """Propose a launch recipe for a discovered model; prefills the setup form."""
    label, flags, caps = _detect_family(config or {})
    mode = _infer_mode(total_bytes, on_host_count)
    vllm_args = list(flags)
    vllm_args.append("--max-num-batched-tokens=16384")
    knobs = dict(_COMMON_KNOBS)
    if mode == "cluster":
        # Large models shard across both Sparks via Ray; leave more headroom.
        vllm_args += ["-tp=2", "--distributed-executor-backend=ray"]
        knobs["gpu_memory_utilization"] = 0.7
    return {
        "key": repo_to_key(repo),
        "repo": repo,
        "display_name": repo.split("/")[-1],
        "mode": mode,
        "capabilities": caps,
        "vllm_args": vllm_args,
        "knobs": knobs,
        "family": label,
    }
|
||||||
|
|
||||||
|
|
||||||
|
def _menu_entry_from_recipe(m, *, on_disk: bool, total_bytes: int, per_host: list[dict]) -> dict:
    d = m.model_dump()
    d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
    d["needs_setup"] = False
    d["on_disk"] = on_disk
    d["total_bytes"] = total_bytes
    d["per_host"] = per_host
    return d


async def build_menu(settings: Settings, catalog) -> dict[str, dict]:
    """The disk-driven model menu: every completed model on the Sparks, annotated
    with its launch recipe (matched by repo) or flagged `needs_setup` if none.

    Two SSH scans total (one per Spark), run in parallel, which is much cheaper
    than the old per-recipe disk probe. A host that errors is skipped, not fatal."""
    hosts = [(settings.spark1_host, settings.spark1_user)]
    if settings.spark2_host:
        hosts.append((settings.spark2_host, settings.spark2_user))
    scans = await asyncio.gather(
        *(list_cached_models(h, u, settings) for h, u in hosts),
        return_exceptions=True,
    )
    by_repo: dict[str, dict] = {}
    for (h, _u), res in zip(hosts, scans):
        if isinstance(res, Exception):
            continue
        for repo, size, complete in res:
            e = by_repo.setdefault(repo, {"total_bytes": 0, "per_host": [], "complete": False})
            e["total_bytes"] += size
            e["per_host"].append({"host": h, "size_bytes": size})
            e["complete"] = e["complete"] or complete

    recipe_by_repo = {m.repo: (k, m) for k, m in catalog.models.items() if m.repo}

    menu: dict[str, dict] = {}
    for repo, info in by_repo.items():
        # Skip half-fetched / corrupt caches (no finished snapshot); they'd show
        # as broken cards. In-flight downloads surface in the download panel.
        if not info["complete"]:
            continue
        if repo in recipe_by_repo:
            key, m = recipe_by_repo[repo]
            menu[key] = _menu_entry_from_recipe(
                m, on_disk=True, total_bytes=info["total_bytes"], per_host=info["per_host"]
            )
        else:
            key = repo_to_key(repo)
            menu[key] = {
                "display_name": repo.split("/")[-1],
                "repo": repo,
                "local_path": None,
                "size_gb": round(info["total_bytes"] / 1e9, 1),
                "mode": _infer_mode(info["total_bytes"], len(info["per_host"])),
                "capabilities": [],
                "expected_ready_seconds": 300,
                "vllm_args": [],
                "description": None,
                "knobs": None,
                "custom": False,
                "needs_setup": True,
                "effective_knobs": {},
                "on_disk": True,
                "total_bytes": info["total_bytes"],
                "per_host": info["per_host"],
            }

    # Local/fine-tuned recipes live as a directory, not an HF cache entry; probe
    # each by path and include it if present. Their keys are unique catalog keys
    # (and local models carry repo="" per ModelDef), so they never collide with a
    # discovered repo's slug or an HF recipe key above.
    for key, m in catalog.models.items():
        if not m.local_path:
            continue
        st = await probe_disk(m.repo, m.mode, settings, local_path=m.local_path)
        if not st.on_disk:
            continue
        menu[key] = _menu_entry_from_recipe(
            m,
            on_disk=True,
            total_bytes=st.total_bytes,
            per_host=[{"host": r.host, "size_bytes": r.size_bytes} for r in st.per_host if r.on_disk],
        )

    return menu
|
||||||
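The slug and mode heuristics in this new file are pure functions, so they can be exercised without any Spark. The sketch below copies `repo_to_key` and `_infer_mode` (renamed `infer_mode`) and pulls the MoE test out of `_detect_family` into a hypothetical helper `is_moe`. The `config` dict is a made-up `config.json` fragment for illustration, not taken from a real checkpoint.

```python
import re

SINGLE_SPARK_BYTES = 115 * 1000 ** 3


def repo_to_key(repo: str) -> str:
    # Lowercase, then collapse every run of non [a-z0-9_-] characters into one dash.
    return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")


def infer_mode(total_bytes: int, on_host_count: int) -> str:
    # Cluster if the weights sit on both hosts or exceed the single-Spark cutoff.
    if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
        return "cluster"
    return "solo"


def is_moe(config: dict) -> bool:
    # Hypothetical helper: the MoE test from _detect_family, extracted for clarity.
    arch = " ".join(config.get("architectures") or [])
    mtype = str(config.get("model_type") or "")
    s = (arch + " " + mtype).lower()
    return "moe" in s or any(
        config.get(k) for k in ("num_experts", "n_routed_experts", "num_local_experts")
    )


# Made-up config.json fragment, for illustration only.
config = {"architectures": ["Qwen3MoeForCausalLM"], "model_type": "qwen3_moe"}

key = repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4")
small = infer_mode(20 * 1000 ** 3, on_host_count=1)
large = infer_mode(200 * 1000 ** 3, on_host_count=1)
mirrored = infer_mode(20 * 1000 ** 3, on_host_count=2)
```

Note that a small model cached on both Sparks is proposed as `cluster` purely because of where its files are. That is one reason the operator confirms the mode in the setup form.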
+129
-6
@@ -10,11 +10,13 @@ model or one tied to an in-flight swap/download.
|
|||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
import asyncio
|
import asyncio
|
||||||
|
import json
|
||||||
import re
|
import re
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from typing import Optional
|
from typing import Optional
|
||||||
|
|
||||||
from .config import Settings
|
from .config import Settings
|
||||||
|
from .shellsafe import quote_arg
|
||||||
from .ssh import ssh_run
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
@@ -35,6 +37,87 @@ def repo_to_cache_dirname(repo: str) -> str:
|
|||||||
return dn
|
return dn
|
||||||
|
|
||||||
|
|
||||||
|
def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    """Inverse of `repo_to_cache_dirname`: 'models--org--name' -> 'org/name'.

    A repo has exactly one '/', so the org is the first '--'-segment and the name
    is everything after (names may themselves contain single dashes). Returns
    None for anything that isn't a model cache dir."""
    if not dirname.startswith("models--"):
        return None
    parts = dirname[len("models--"):].split("--")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    return f"{parts[0]}/{'--'.join(parts[1:])}"


def parse_cache_listing(out: str) -> list[tuple[str, int, bool]]:
    """Parse the 'size|complete|dirname' lines from `list_cached_models`'s scan.

    Returns [(repo, size_bytes, complete), ...], skipping non-model lines. Pure
    function so the parsing is unit-testable without SSH."""
    items: list[tuple[str, int, bool]] = []
    for line in out.splitlines():
        line = line.strip()
        if line.count("|") < 2:
            continue
        size_s, complete_s, dirname = line.split("|", 2)
        repo = cache_dirname_to_repo(dirname.strip())
        if not repo:
            continue
        try:
            size = int(size_s)
        except ValueError:
            size = 0
        items.append((repo, size, complete_s.strip() == "1"))
    return items
|
||||||
|
|
||||||
|
|
||||||
|
async def list_cached_models(host: str, user: str, settings: Settings) -> list[tuple[str, int, bool]]:
    """Enumerate every Hugging Face model cached on a host: (repo, size_bytes, complete).

    'complete' = the cache has at least one snapshot carrying a config.json (a
    finished download, not a half-fetched/corrupt dir). One SSH round-trip; the
    glob's no-match case is handled by the `[ -d ]` guard."""
    if not host or not user:
        return []
    cmd = (
        'HUB="$HOME/.cache/huggingface/hub"; '
        'for d in "$HUB"/models--*; do '
        '[ -d "$d" ] || continue; '
        'n=$(basename "$d"); '
        'sz=$(du -sb "$d" 2>/dev/null | cut -f1); sz=${sz:-0}; '
        'if ls "$d"/snapshots/*/config.json >/dev/null 2>&1; then c=1; else c=0; fi; '
        'echo "${sz}|${c}|${n}"; '
        'done'
    )
    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=30.0)
    if rc != 0:
        return []
    return parse_cache_listing(out)


async def read_model_config(host: str, user: str, repo: str, settings: Settings) -> Optional[dict]:
    """Read a cached model's config.json (first snapshot) for launch inference.

    Returns the parsed dict, or None if absent/unreadable. The dirname is
    whitelisted (repo_to_cache_dirname) so it's safe to embed unquoted."""
    if not host or not user:
        return None
    dn = repo_to_cache_dirname(repo)
    cmd = (
        f'D=$(ls -d "$HOME/.cache/huggingface/hub/{dn}/snapshots/"*/ 2>/dev/null | head -1); '
        f'[ -n "$D" ] && cat "${{D}}config.json" 2>/dev/null'
    )
    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
    if rc != 0 or not out.strip():
        return None
    try:
        return json.loads(out)
    except (ValueError, TypeError):
        return None
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class HostDiskResult:
|
class HostDiskResult:
|
||||||
host: str
|
host: str
|
||||||
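Because `parse_cache_listing` is pure, the 'size|complete|dirname' format produced by the scan in this hunk can be checked against canned text. The sketch below copies the two helpers from the hunk. The `sample` listing is invented for illustration and is not output from a real Spark.

```python
from typing import Optional


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    if not dirname.startswith("models--"):
        return None
    parts = dirname[len("models--"):].split("--")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    # First segment is the org; any later '--' belongs to the model name.
    return f"{parts[0]}/{'--'.join(parts[1:])}"


def parse_cache_listing(out: str) -> list[tuple[str, int, bool]]:
    items: list[tuple[str, int, bool]] = []
    for line in out.splitlines():
        line = line.strip()
        if line.count("|") < 2:
            continue  # not a 'size|complete|dirname' record
        size_s, complete_s, dirname = line.split("|", 2)
        repo = cache_dirname_to_repo(dirname.strip())
        if not repo:
            continue  # datasets--*, spaces--*, or a model dir with no org
        try:
            size = int(size_s)
        except ValueError:
            size = 0  # du failed or printed something unexpected
        items.append((repo, size, complete_s.strip() == "1"))
    return items


# Invented scan output covering the main cases.
sample = "\n".join([
    "21474836480|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4",
    "4096|0|models--org--half-fetched",
    "oops|1|models--org--bad-size",
    "123|1|datasets--org--not-a-model",
    "garbage line",
])
items = parse_cache_listing(sample)
```

One consequence worth knowing: a cache dir without an org segment (for example `models--gpt2`) yields `None` from `cache_dirname_to_repo`, so such models never reach the menu.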
@@ -76,16 +159,52 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
|
|||||||
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
|
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
|
||||||
|
|
||||||
|
|
||||||
async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
|
async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
|
||||||
"""Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
|
"""Return whether a local model directory exists on this host and its size.
|
||||||
|
|
||||||
|
For locally fine-tuned models (a Spark directory, not an HF cache entry). The
|
||||||
|
path is whitelisted at the API boundary (shellsafe.validate_local_path); we
|
||||||
|
shlex-quote it here as defense in depth.
|
||||||
|
"""
|
||||||
|
if not host or not user:
|
||||||
|
return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
|
||||||
|
qp = quote_arg(path)
|
||||||
|
cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
|
||||||
|
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
|
||||||
|
if rc != 0:
|
||||||
|
return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
|
||||||
|
raw = out.strip()
|
||||||
|
if raw == "MISSING" or raw == "":
|
||||||
|
return HostDiskResult(host=host, on_disk=False)
|
||||||
|
try:
|
||||||
|
size = int(raw.splitlines()[-1])
|
||||||
|
except ValueError:
|
||||||
|
return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
|
||||||
|
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
|
||||||
|
|
||||||
|
|
||||||
|
async def probe_disk(
|
||||||
|
repo: str, mode: str, settings: Settings, *, local_path: str | None = None
|
||||||
|
) -> DiskStatus:
|
||||||
|
"""Probe one model across the relevant Sparks based on its mode (solo|cluster).
|
||||||
|
|
||||||
|
A local model (local_path set) is probed by directory; otherwise by HF cache.
|
||||||
|
"""
|
||||||
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
|
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
|
||||||
if mode == "cluster" and settings.spark2_host:
|
if mode == "cluster" and settings.spark2_host:
|
||||||
hosts.append((settings.spark2_host, settings.spark2_user))
|
hosts.append((settings.spark2_host, settings.spark2_user))
|
||||||
|
|
||||||
|
if local_path:
|
||||||
|
results = await asyncio.gather(
|
||||||
|
*(probe_local_host(h, u, local_path, settings) for h, u in hosts)
|
||||||
|
)
|
||||||
|
key = local_path
|
||||||
|
else:
|
||||||
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
|
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
|
||||||
|
key = repo
|
||||||
on_disk = any(r.on_disk for r in results)
|
on_disk = any(r.on_disk for r in results)
|
||||||
total = sum(r.size_bytes for r in results)
|
total = sum(r.size_bytes for r in results)
|
||||||
return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))
|
return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))
|
||||||
|
|
||||||
|
|
||||||
async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
|
async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
|
||||||
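The result handling in `probe_local_host` reduces to a small decision table over the remote command's exit code and output. A minimal sketch, assuming `shlex.quote` behaves like the project's `shellsafe.quote_arg` (not shown in this diff). `HostResult`, `build_probe_command` and `interpret_probe` are hypothetical names used only here.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostResult:  # hypothetical, trimmed stand-in for HostDiskResult
    on_disk: bool
    size_bytes: int = 0
    error: Optional[str] = None


def build_probe_command(path: str) -> str:
    qp = shlex.quote(path)  # assumption: equivalent to shellsafe.quote_arg
    return f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"


def interpret_probe(rc: int, out: str, err: str = "") -> HostResult:
    if rc != 0:
        # SSH or shell failure: report it, never claim the model is present.
        return HostResult(on_disk=False, error=(err or out).strip() or f"rc={rc}")
    raw = out.strip()
    if raw in ("MISSING", ""):
        return HostResult(on_disk=False)
    try:
        size = int(raw.splitlines()[-1])
    except ValueError:
        return HostResult(on_disk=False, error=f"unparsable du output: {raw!r}")
    return HostResult(on_disk=True, size_bytes=size)


cmd = build_probe_command("/data/my model")
present = interpret_probe(0, "12345\n")
missing = interpret_probe(0, "MISSING\n")
failed = interpret_probe(255, "", "ssh: connection timed out")
odd = interpret_probe(0, "not-a-number\n")
```

Keeping "absent" (`on_disk=False`, no error) apart from "could not tell" (`on_disk=False` with an error) lets the dashboard show a probe failure instead of silently dropping a local model from the menu.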
@@ -122,10 +241,14 @@ async def delete_host(host: str, user: str, repo: str, settings: Settings) -> Ho
|
|||||||
return HostDiskResult(host=host, on_disk=False, size_bytes=freed)
|
return HostDiskResult(host=host, on_disk=False, size_bytes=freed)
|
||||||
|
|
||||||
|
|
||||||
async def delete_from_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
|
async def delete_from_disk(repo: str, settings: Settings) -> DiskStatus:
|
||||||
"""rm -rf the model's cache dir on the relevant Sparks. Idempotent."""
|
"""rm -rf the model's cache dir on ALL configured Sparks. Idempotent.
|
||||||
|
|
||||||
|
We sweep both Sparks regardless of the model's declared mode: a 'remove from
|
||||||
|
disk & menu' must leave nothing behind, and rm of an absent dir reports 0
|
||||||
|
bytes freed (FREED 0), so an extra host is harmless."""
|
||||||
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
|
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
|
||||||
if mode == "cluster" and settings.spark2_host:
|
if settings.spark2_host:
|
||||||
hosts.append((settings.spark2_host, settings.spark2_user))
|
hosts.append((settings.spark2_host, settings.spark2_user))
|
||||||
|
|
||||||
results = await asyncio.gather(*(delete_host(h, u, repo, settings) for h, u in hosts))
|
results = await asyncio.gather(*(delete_host(h, u, repo, settings) for h, u in hosts))
|
||||||
|
|||||||
+15
-1
@@ -23,6 +23,20 @@ from .ssh import ssh_stream, StreamHandle
|
|||||||
Mode = Literal["spark1", "spark2", "cluster"]
|
Mode = Literal["spark1", "spark2", "cluster"]
|
||||||
|
|
||||||
|
|
||||||
|
def build_download_command(repo: str, flags: str = "") -> str:
    """Remote shell command that drives hf-download.sh on a Spark.

    Prepends ~/.local/bin to PATH. hf-download.sh shells out to `uvx` (Astral's
    uv), and the official uv installer drops its binaries in ~/.local/bin, but
    our SSH session is non-interactive, so it never sources the user's profile
    and ~/.local/bin is off PATH, leaving `uvx` as "command not found". $HOME
    expands server-side, so this stays correct for any adopter/user. `repo` is
    shlex-quoted at the sink (validate_repo gates the charset upstream).
    """
    serve = f"./hf-download.sh {quote_arg(repo)} {flags}".strip()
    return f'export PATH="$HOME/.local/bin:$PATH" && cd ~/spark-vllm-docker && {serve}'
|
||||||
|
|
||||||
|
|
||||||
_TQDM_RE = re.compile(
|
_TQDM_RE = re.compile(
|
||||||
r"(\d+(?:\.\d+)?)\s*%\s*\|.*?\|\s*"
|
r"(\d+(?:\.\d+)?)\s*%\s*\|.*?\|\s*"
|
||||||
r"([\d.]+[KMG]?B?)\s*/\s*([\d.]+[KMG]?B?)\s*"
|
r"([\d.]+[KMG]?B?)\s*/\s*([\d.]+[KMG]?B?)\s*"
|
||||||
@@ -126,7 +140,7 @@ class DownloadManager:
|
|||||||
if not target_host or not target_user:
|
if not target_host or not target_user:
|
||||||
raise RuntimeError(f"{job.mode} host not configured")
|
raise RuntimeError(f"{job.mode} host not configured")
|
||||||
|
|
||||||
cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {quote_arg(job.repo)} {flags}".strip()
|
cmd = build_download_command(job.repo, flags)
|
||||||
job.append(f"$ {cmd}")
|
job.append(f"$ {cmd}")
|
||||||
job.state = "downloading"
|
job.state = "downloading"
|
||||||
job.progress.phase = "Connecting to Hugging Face…"
|
job.progress.phase = "Connecting to Hugging Face…"
|
||||||
|
|||||||
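The effect of the `build_download_command` refactor in this file is easiest to see by printing the string it produces. A minimal sketch, assuming `shlex.quote` behaves like the project's `quote_arg` (which is not shown in this diff). No `flags` are passed because the options accepted by `hf-download.sh` are not part of this change.

```python
import shlex


def build_download_command(repo: str, flags: str = "") -> str:
    # shlex.quote stands in for shellsafe.quote_arg (assumption).
    download = f"./hf-download.sh {shlex.quote(repo)} {flags}".strip()
    # $HOME is left unexpanded so the remote shell resolves it for the SSH user.
    return f'export PATH="$HOME/.local/bin:$PATH" && cd ~/spark-vllm-docker && {download}'


cmd = build_download_command("RedHatAI/Qwen3.6-35B-A3B-NVFP4")
hostile = build_download_command("x; rm -rf ~")
```

A well-formed repo passes through unquoted, while shell metacharacters end up inside single quotes and reach `hf-download.sh` as one literal argument. The upstream `validate_repo` check should already reject such input, so the quoting is a second line of defense.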
+34
-9
@@ -6,17 +6,28 @@ from .config import Settings
|
|||||||
_TIMEOUT = 3.0
|
_TIMEOUT = 3.0
|
||||||
|
|
||||||
|
|
||||||
async def check_vllm(settings: Settings) -> dict:
|
def _disabled(settings: Settings, key: str) -> dict | None:
|
||||||
base_url = (
|
"""A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
|
||||||
f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
|
|
||||||
if settings.spark1_host
|
Lets an adopter who doesn't run a given support service switch its probe off
|
||||||
else None
|
entirely — so the probe never hits whatever else listens on that port, and
|
||||||
)
|
the connectivity log doesn't record it as perpetually down."""
|
||||||
if not settings.spark1_host:
|
if key in settings.disabled_services:
|
||||||
return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
|
return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
async def probe_vllm_endpoint(host: str, port: int) -> dict:
|
||||||
|
"""Probe any OpenAI-compatible vLLM at host:port via /v1/models.
|
||||||
|
|
||||||
|
Shared by the primary (Spark 1) health check and any extra vLLM registered
|
||||||
|
as a custom service (kind: vllm) to monitor a second Spark."""
|
||||||
|
base_url = f"http://{host}:{port}/v1" if host else None
|
||||||
|
if not host:
|
||||||
|
return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
|
||||||
try:
|
try:
|
||||||
async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
|
async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
|
||||||
r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
|
r = await c.get(f"http://{host}:{port}/v1/models")
|
||||||
r.raise_for_status()
|
r.raise_for_status()
|
||||||
ids = [m["id"] for m in r.json().get("data", [])]
|
ids = [m["id"] for m in r.json().get("data", [])]
|
||||||
return {
|
return {
|
||||||
@@ -29,7 +40,15 @@ async def check_vllm(settings: Settings) -> dict:
|
|||||||
return {"ok": False, "error": str(e), "base_url": base_url}
|
return {"ok": False, "error": str(e), "base_url": base_url}
|
||||||
|
|
||||||
|
|
||||||
|
async def check_vllm(settings: Settings) -> dict:
|
||||||
|
if not settings.spark1_host:
|
||||||
|
return {"ok": False, "error": "spark1 not configured", "base_url": None}
|
||||||
|
return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
|
||||||
|
|
||||||
|
|
||||||
async def check_parakeet(settings: Settings) -> dict:
|
async def check_parakeet(settings: Settings) -> dict:
|
||||||
|
if d := _disabled(settings, "parakeet"):
|
||||||
|
return d
|
||||||
base_url = (
|
base_url = (
|
||||||
f"http://{settings.parakeet_host}:{settings.parakeet_port}"
|
f"http://{settings.parakeet_host}:{settings.parakeet_port}"
|
||||||
if settings.parakeet_host
|
if settings.parakeet_host
|
||||||
@@ -47,6 +66,8 @@ async def check_parakeet(settings: Settings) -> dict:
|
|||||||
|
|
||||||
|
|
||||||
async def check_kokoro(settings: Settings) -> dict:
|
async def check_kokoro(settings: Settings) -> dict:
|
||||||
|
if d := _disabled(settings, "kokoro"):
|
||||||
|
return d
|
||||||
base_url = (
|
base_url = (
|
||||||
f"http://{settings.kokoro_host}:{settings.kokoro_port}"
|
f"http://{settings.kokoro_host}:{settings.kokoro_port}"
|
||||||
if settings.kokoro_host
|
if settings.kokoro_host
|
||||||
@@ -68,6 +89,8 @@ async def check_kokoro(settings: Settings) -> dict:
|
|||||||
|
|
||||||
|
|
||||||
async def check_embeddings(settings: Settings) -> dict:
|
async def check_embeddings(settings: Settings) -> dict:
|
||||||
|
if d := _disabled(settings, "embeddings"):
|
||||||
|
return d
|
||||||
base_url = (
|
base_url = (
|
||||||
f"http://{settings.embed_host}:{settings.embed_port}"
|
f"http://{settings.embed_host}:{settings.embed_port}"
|
||||||
if settings.embed_host
|
if settings.embed_host
|
||||||
@@ -89,6 +112,8 @@ async def check_embeddings(settings: Settings) -> dict:
|
|||||||
|
|
||||||
|
|
||||||
async def check_qdrant(settings: Settings) -> dict:
|
async def check_qdrant(settings: Settings) -> dict:
|
||||||
|
if d := _disabled(settings, "qdrant"):
|
||||||
|
return d
|
||||||
base_url = (
|
base_url = (
|
||||||
f"http://{settings.qdrant_host}:{settings.qdrant_port}"
|
f"http://{settings.qdrant_host}:{settings.qdrant_port}"
|
||||||
if settings.qdrant_host
|
if settings.qdrant_host
|
||||||
|
|||||||
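The `_disabled` guard added to each check in this file relies on a non-empty dict being truthy, so `if d := _disabled(...)` returns the verdict early and otherwise falls through to the real probe. A minimal sketch with a hypothetical `FakeSettings` and a fake check that skips the HTTP request; the `example.invalid` URL is a placeholder.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeSettings:  # hypothetical stand-in for config.Settings
    disabled_services: set[str] = field(default_factory=set)


def disabled(settings: FakeSettings, key: str) -> Optional[dict]:
    if key in settings.disabled_services:
        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
    return None


async def check_service(settings: FakeSettings, key: str) -> dict:
    if d := disabled(settings, key):
        return d  # short-circuit: no network traffic for a disabled service
    # A real check would issue an HTTP request here; this sketch pretends it passed.
    return {"ok": True, "base_url": f"http://example.invalid/{key}"}


settings = FakeSettings(disabled_services={"kokoro"})
off = asyncio.run(check_service(settings, "kokoro"))
on = asyncio.run(check_service(settings, "qdrant"))
```

The verdict keeps `"ok": False`, so callers that only read `ok` still treat the service as unavailable, while the extra `"disabled": True` flag lets the dashboard and the connectivity log tell "switched off" apart from "down".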
+77
-7
@@ -1,15 +1,33 @@
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
import logging
|
||||||
from typing import Literal, Optional
|
from typing import Literal, Optional
|
||||||
import yaml
|
import yaml
|
||||||
from pydantic import BaseModel, Field
|
from pydantic import BaseModel, Field, model_validator
|
||||||
|
|
||||||
from .overrides import apply_knobs_to_args, load_overrides
|
from .overrides import apply_knobs_to_args, load_overrides
|
||||||
from .shellsafe import quote_arg, quote_args
|
from .shellsafe import quote_arg, quote_args, validate_local_path
|
||||||
|
|
||||||
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _chat_template_path(vllm_args: list[str]) -> str | None:
    """Extract the path from a `--chat-template=<path>` arg, if present."""
    for a in vllm_args:
        if a.startswith("--chat-template="):
            return a.split("=", 1)[1]
    return None


def _is_within(path: str, base: str) -> bool:
    """True if `path` is `base` itself or lives inside it (lexical check)."""
    base = base.rstrip("/")
    return path == base or path.startswith(base + "/")
|
||||||
|
|
||||||
|
|
||||||
class ModelDef(BaseModel):
|
class ModelDef(BaseModel):
|
||||||
display_name: str
|
display_name: str
|
||||||
repo: str
|
repo: str = "" # HF 'org/name'; empty for a local model
|
||||||
|
local_path: str | None = None # absolute dir on the Spark; set => local model
|
||||||
size_gb: float
|
size_gb: float
|
||||||
mode: Literal["solo", "cluster"]
|
mode: Literal["solo", "cluster"]
|
||||||
capabilities: list[str] = Field(default_factory=list)
|
capabilities: list[str] = Field(default_factory=list)
|
||||||
@@ -19,6 +37,38 @@ class ModelDef(BaseModel):
|
|||||||
knobs: dict | None = None # user-customized; merged at launch time
|
knobs: dict | None = None # user-customized; merged at launch time
|
||||||
custom: bool = False # True if this came from /data overrides
|
custom: bool = False # True if this came from /data overrides
|
||||||
|
|
||||||
|
    @model_validator(mode="after")
    def _validate_source(self) -> "ModelDef":
        if bool(self.repo) == bool(self.local_path):
            raise ValueError(
                f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
                f"or 'local_path' (Spark directory)"
            )
        if self.local_path:
            # Single place that enforces the path whitelist, so YAML/override
            # entries get the same boundary check as the API. The quote_arg sink
            # is still defense-in-depth.
            validate_local_path(self.local_path)
            # Only local_path is bind-mounted into the vLLM container, so any
            # --chat-template path must live inside it or vLLM can't find it.
            tmpl = _chat_template_path(self.vllm_args)
            if tmpl is not None and not _is_within(tmpl, self.local_path):
                raise ValueError(
                    f"--chat-template path {tmpl!r} must be inside the model "
                    f"directory {self.local_path!r} (only that directory is mounted "
                    f"into the container)"
                )
        return self

    @property
    def is_local(self) -> bool:
        return bool(self.local_path)

    @property
    def source(self) -> str:
        """What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
        return self.local_path if self.local_path else self.repo
|
||||||
|
|
||||||
|
|
||||||
class Defaults(BaseModel):
|
class Defaults(BaseModel):
|
||||||
port: int = 8888
|
port: int = 8888
|
||||||
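A note on the containment check in the diff above: `_is_within` is documented as lexical, so it compares strings and does not resolve `..` segments. The sketch below copies the helper and adds a hypothetical `is_within_normalized` variant to show the difference. The normalized variant is an assumption for illustration only and is not part of this change; `validate_local_path` may already reject such inputs upstream.

```python
import posixpath


def is_within(path: str, base: str) -> bool:
    """Lexical containment, mirroring the helper shown in the diff."""
    base = base.rstrip("/")
    return path == base or path.startswith(base + "/")


def is_within_normalized(path: str, base: str) -> bool:
    """Hypothetical variant: collapse '.' and '..' segments before comparing."""
    return is_within(posixpath.normpath(path), posixpath.normpath(base))


model_dir = "/home/you/models/my-finetune"
inside = model_dir + "/chat_template.jinja"
sibling = "/home/you/models/my-finetune-v2/chat_template.jinja"
escape = model_dir + "/../other/chat_template.jinja"

results = {
    "inside": is_within(inside, model_dir),
    "sibling": is_within(sibling, model_dir),
    "escape_lexical": is_within(escape, model_dir),
    "escape_normalized": is_within_normalized(escape, model_dir),
}
```

Appending `"/"` to the base before the prefix test is what keeps a sibling directory such as `my-finetune-v2` from matching. The `escape` path passes the lexical check but not the normalized one.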
@@ -47,7 +97,8 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
             continue
         defaults_dump = {
             "display_name": entry.get("display_name", key),
-            "repo": entry["repo"],
+            "repo": entry.get("repo", ""),
+            "local_path": entry.get("local_path"),
             "size_gb": float(entry.get("size_gb", 0)),
             "mode": entry.get("mode", "solo"),
             "capabilities": entry.get("capabilities") or [],
@@ -57,7 +108,12 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
             "knobs": entry.get("knobs"),
             "custom": True,
         }
-        new_models[key] = ModelDef.model_validate(defaults_dump)
+        # A single malformed override entry (bad path, missing source, etc.) must
+        # not take down the whole catalog — skip it and keep the rest loadable.
+        try:
+            new_models[key] = ModelDef.model_validate(defaults_dump)
+        except Exception as e:
+            log.warning("skipping invalid custom model %r: %s", key, e)
 
     return Catalog(defaults=catalog.defaults, models=new_models)
 
@@ -78,7 +134,21 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
     solo = "--solo " if model.mode == "solo" else ""
     base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
     args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
-    # repo + args are user-controlled (custom models, knobs); shlex.quote each so
-    # they cannot break out of the SSH shell command. shlex.split (used by the
+    # source + args are user-controlled (custom models, knobs); shlex.quote each
+    # so they cannot break out of the SSH shell command. shlex.split (used by the
     # vLLM pre-flight validator) cleanly reverses this quoting.
-    return f"./launch-cluster.sh {solo}-d exec vllm serve {quote_arg(model.repo)} {quote_args(args)}"
+    prefix = ""
+    if model.local_path:
+        # A local model's directory isn't in the HF cache the launch script
+        # already mounts, so bind-mount it at the SAME path inside the vllm
+        # container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
+        # path inside and out means `vllm serve <dir>` and any
+        # `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
+        # change needed. (The env assignment sits before the script, so the
+        # validator's `serve`-keyed shlex round-trip is unaffected.)
+        mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
+        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
+    return (
+        f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
+        f"{quote_arg(model.source)} {quote_args(args)}"
+    )
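The quoting round-trip described in the comments above can be sketched with the standard library alone. This is a minimal illustration, assuming `quote_arg` behaves like `shlex.quote` (the comment says so, but the helper itself is not shown here); `build_command` is a hypothetical stand-in for `build_launch_command`, not the real function.

```python
import shlex


def build_command(source: str, args: list[str], local_path: str = "") -> str:
    """Hypothetical stand-in: quote every user-controlled piece with shlex.quote."""
    prefix = ""
    if local_path:
        # The whole "-v src:dst" string is quoted as one shell word.
        mount = shlex.quote(f"-v {local_path}:{local_path}")
        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
    quoted_args = " ".join(shlex.quote(a) for a in args)
    return (
        f"{prefix}./launch-cluster.sh --solo -d exec vllm serve "
        f"{shlex.quote(source)} {quoted_args}"
    )


# A source string that would run a second command if it were left unquoted.
hostile = "/models/x; rm -rf /"
cmd = build_command(hostile, ["--port=8888"], local_path="/models/x")

# shlex.split reverses the quoting, so a validator keyed on "serve" sees the
# original value as a single token.
tokens = shlex.split(cmd)
served = tokens[tokens.index("serve") + 1]
```

Because the env assignment is the first word, it splits off as its own token and does not shift anything that follows `serve`.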
@@ -14,7 +14,7 @@ Shape:
 custom:
   - key: my-new-model
     display_name: My New Model (from download)
-    repo: my-org/my-model
+    repo: my-org/my-model # an HF repo; OR set local_path instead (exactly one)
     size_gb: 20
     mode: solo
     description: null
@@ -25,6 +25,12 @@ Shape:
       fastsafetensors: true
       prefix_caching: true
       kv_cache_dtype: fp8
+  - key: my-finetune # a local/fine-tuned model (a directory on the Spark)
+    display_name: My Fine-tune
+    local_path: /home/you/models/my-finetune
+    size_gb: 59
+    mode: solo
+    vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
 """
 from __future__ import annotations
 import os
+268
-63
@@ -1,30 +1,34 @@
 from __future__ import annotations
 import asyncio
 import json
+import os
 from pathlib import Path
 
 from fastapi import FastAPI, HTTPException, Query, Request
 from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
 from fastapi.staticfiles import StaticFiles
-from pydantic import BaseModel
+from pydantic import BaseModel, ValidationError
 from typing import Literal
 
+from . import app_settings
 from .config import Settings
 from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
+from .coordination import LockHeld, ScheduleRegistry, SwapLockManager, WebhookNotifier, valid_schedule_id
 from .custom_services import add_custom_service, delete_custom_service
 from .audio_proxy import build_router as build_audio_router
 from .deep_health import DeepHealth
-from .disk import delete_from_disk, probe_disk
+from .discovery import build_menu, infer_recipe, repo_to_key
+from .disk import delete_from_disk, probe_host, read_model_config
 from .download import DownloadManager
 from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
 from .matrix_bridge import MatrixBridgeManager
-from .models import load_catalog
+from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
-from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
+from .overrides import add_custom, delete_custom, load_overrides, set_knobs
 from .services import docker_state, run_action, services_from_settings
 from .shellsafe import validate_container, validate_image, validate_repo
 from .speech_models import SpeechModelsManager
@@ -35,9 +39,18 @@ from .validate import validate_launch
 from .wol import send_local_broadcast, send_via_peer
 
 
+# One-time migration: seed the in-app settings overlay from env (values set via
+# the StartOS action on a pre-gear install) before building Settings, so nothing
+# is lost on upgrade. No-op once the overlay exists. See app_settings.
+app_settings.seed_from_env(os.environ)
 settings = Settings.from_env()
 catalog = load_catalog(settings.models_yaml)
-swap_manager = SwapManager(settings, catalog)
+# Coordination layer (GPU arbiter): swap-lifecycle webhook, the swap reservation
+# lock, and the read-only schedule registry. See coordination.py.
+swap_webhook = WebhookNotifier(settings.swap_webhook_url, settings.swap_webhook_secret)
+swap_lock = SwapLockManager()
+schedule_registry = ScheduleRegistry()
+swap_manager = SwapManager(settings, catalog, notifier=swap_webhook)
 download_manager = DownloadManager(settings)
 update_manager = UpdateManager(settings)
 hardware_probe = HardwareProbe(settings)
@@ -67,6 +80,10 @@ _CSRF_EXEMPT_PREFIXES = (
     "/api/audio/", # diarize-chunk / label-merge / transcribe-with-speakers
     "/api/health-event", # health reports posted by consumer apps
 )
+# Note: the coordination endpoints (/api/swap/lock, /api/schedule) are
+# intentionally NOT exempt. External schedulers are non-browser clients (no
+# Origin header) so they pass the guard already — same as /api/swap — while a
+# malicious page can't drive them from the operator's browser. Don't add them.
 
 
 @app.middleware("http")
@@ -145,26 +162,100 @@ async def get_config() -> dict:
     }
 
 
+# ---- In-app settings ('gear') ----
+# The optional cluster knobs (ports, container names, support-service hosts,
+# integrations) live in an app-owned overlay on /data, edited here instead of in
+# the StartOS action — which keeps to just the four required setup fields. See
+# app_settings. Writes apply live: we rewrite the overlay then reload the shared
+# Settings instance in place, so every router/manager holding the reference picks
+# up the change with no container restart.
+@app.get("/api/settings")
+async def get_settings() -> dict:
+    return app_settings.public_view()
+
+
+class SettingsUpdate(BaseModel):
+    values: dict[str, str]
+
+
+@app.post("/api/settings")
+async def post_settings(req: SettingsUpdate) -> dict:
+    try:
+        app_settings.apply(req.values)
+    except app_settings.SettingsError as e:
+        raise HTTPException(422, str(e))
+    settings.reload()
+    # WebhookNotifier snapshots url/secret (not the Settings object), so reload()
+    # can't reach it — re-point it explicitly so a webhook edit applies live too.
+    swap_webhook.update(settings.swap_webhook_url, settings.swap_webhook_secret)
+    return app_settings.public_view()
+
+
 def _reload_catalog() -> None:
     global catalog
     catalog = load_catalog(settings.models_yaml)
     swap_manager.reload_catalog(catalog)
 
 
+def _recipe_summaries() -> list[dict]:
+    """Known launch recipes (bundled + saved), for the download panel's autocomplete.
+
+    These are NOT the menu — the menu is what's on disk. This is just the set of
+    repos Spark Control already knows how to launch, so the download box can
+    suggest them by name without putting phantom cards on the dashboard."""
+    out = []
+    for m in catalog.models.values():
+        if m.repo:
+            out.append({"repo": m.repo, "display_name": m.display_name, "mode": m.mode})
+    return out
+
+
 @app.get("/api/models")
 async def get_models() -> dict:
-    out_models: dict[str, dict] = {}
-    for key, m in catalog.models.items():
-        d = m.model_dump()
-        # Always include effective knobs for the UI (defaults from base args + any overrides)
-        d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
-        out_models[key] = d
+    """The model menu = what's actually downloaded on the Sparks (one scan per
+    Spark), each annotated with its launch recipe or flagged `needs_setup`.
+
+    Does SSH, so it's the slower of the model endpoints; the front-end calls it on
+    load, after a swap/download/delete, and on a slow timer — not every poll."""
+    if not settings.configured:
+        return {"configured": False, "defaults": catalog.defaults.model_dump(), "models": {}, "recipes": []}
+    menu = await build_menu(settings, catalog)
     return {
+        "configured": True,
         "defaults": catalog.defaults.model_dump(),
-        "models": out_models,
+        "models": menu,
+        "recipes": _recipe_summaries(),
     }
 
 
+@app.get("/api/models/suggest")
+async def suggest_model(repo: str = Query(...)) -> dict:
+    """Read a downloaded model's config.json + size and propose a launch recipe.
+
+    Prefills the 'set up this model' form for an on-disk model that has no recipe
+    yet. The operator confirms/edits, then POSTs it to /api/models to save."""
+    if not settings.configured:
+        raise HTTPException(503, "spark1 not configured")
+    try:
+        validate_repo(repo)
+    except ValueError as e:
+        raise HTTPException(400, str(e))
+    hosts = [(settings.spark1_host, settings.spark1_user)]
+    if settings.spark2_host:
+        hosts.append((settings.spark2_host, settings.spark2_user))
+    # Config from whichever Spark has it; size summed across the Sparks that do.
+    sizes = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
+    total = sum(r.size_bytes for r in sizes if r.on_disk)
+    on_hosts = sum(1 for r in sizes if r.on_disk)
+    config = None
+    for (h, u), r in zip(hosts, sizes):
+        if r.on_disk:
+            config = await read_model_config(h, u, repo, settings)
+            if config is not None:
+                break
+    return infer_recipe(repo, config or {}, total, on_hosts)
+
+
 class KnobsBody(BaseModel):
     knobs: dict
 
@@ -183,7 +274,8 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
 class CustomModelBody(BaseModel):
     key: str
     display_name: str
-    repo: str
+    repo: str = ""
+    local_path: str | None = None
     size_gb: float = 0
     mode: Literal["solo", "cluster"] = "solo"
     description: str | None = None
@@ -196,8 +288,17 @@ class CustomModelBody(BaseModel):
 async def post_model(body: CustomModelBody) -> dict:
     if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
         raise HTTPException(400, "key must be alphanumeric/-/_ only")
+    # Validate the full entry BEFORE persisting (exactly-one source, local-path
+    # whitelist, chat-template location). Doing it via ModelDef means the API and
+    # the YAML-override path share one set of rules, and a bad entry can't be
+    # written to /data and then break catalog load.
     try:
-        validate_repo(body.repo)
+        ModelDef.model_validate(body.model_dump())
+        if body.repo:
+            validate_repo(body.repo) # HF charset (the model only validates local paths)
+    except ValidationError as e:
+        msg = e.errors()[0]["msg"] if e.errors() else str(e)
+        raise HTTPException(400, msg.removeprefix("Value error, "))
     except ValueError as e:
         raise HTTPException(400, str(e))
     if body.key in catalog.models and not catalog.models[body.key].custom:
@@ -218,57 +319,43 @@ async def del_model(key: str) -> dict:
     return {"ok": True, "key": key}
 
 
-@app.get("/api/models/disk-status")
-async def get_models_disk_status() -> dict:
-    """Probe each catalog model's HF cache on the appropriate Spark(s) in parallel.
-
-    Result is keyed by model key: {on_disk, total_bytes, per_host:[{host,on_disk,size_bytes,error?}]}.
-    Designed to be called once on dashboard load; takes ~1–3s depending on Spark count.
-    """
-    if not settings.configured:
-        return {"configured": False, "models": {}}
-    keys = list(catalog.models.keys())
-    statuses = await asyncio.gather(*(
-        probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
-    ), return_exceptions=True)
-    out: dict[str, dict] = {}
-    for k, s in zip(keys, statuses):
-        if isinstance(s, Exception):
-            out[k] = {"on_disk": False, "total_bytes": 0, "per_host": [], "error": str(s)}
-            continue
-        out[k] = {
-            "on_disk": s.on_disk,
-            "total_bytes": s.total_bytes,
-            "per_host": [
-                {"host": r.host, "on_disk": r.on_disk, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
-                for r in s.per_host
-            ],
-        }
-    return {"configured": True, "models": out}
-
-
 @app.delete("/api/models/{key}/disk")
 async def del_model_disk(key: str) -> dict:
-    """Delete a model's weights from the Spark filesystem(s). The catalog entry stays.
+    """Remove a model's weights from the Sparks — and thus from the menu, since the
+    menu IS the disk. Resolves the key against the live menu, so a discovered
+    model (no saved recipe) is deletable too.
 
     Safety rails:
+    - Refuses a local/fine-tuned directory (hand-placed, not re-downloadable).
     - Refuses if the model is currently loaded on vLLM.
-    - Refuses if a swap or download is in flight.
-    - Idempotent: if the cache dir is already gone on a host, that host reports 0 bytes freed.
+    - Refuses if a swap or this model's own download is in flight.
+    - Idempotent across both Sparks: an already-absent cache dir frees 0 bytes.
     """
-    if key not in catalog.models:
+    if not settings.configured:
+        raise HTTPException(503, "spark1 not configured")
+    menu = await build_menu(settings, catalog)
+    entry = menu.get(key)
+    if entry is None:
         raise HTTPException(404, f"unknown model: {key}")
-    m = catalog.models[key]
+
+    # Never rm a local fine-tune directory from the dashboard — it's irreplaceable
+    # training output the user placed by hand, not a re-downloadable HF cache.
+    if entry.get("local_path"):
+        raise HTTPException(
+            400,
+            "this is a local model; its directory must be managed on the Spark, not deleted from here",
+        )
+    repo = entry["repo"]
 
     # Refuse if currently loaded
     try:
         vllm = await check_vllm(settings)
     except Exception:
         vllm = {}
-    if vllm.get("ok") and vllm.get("current_model") == m.repo:
+    if vllm.get("ok") and vllm.get("current_model") == repo:
         raise HTTPException(
             409,
-            f"'{m.display_name}' is the currently loaded model. Switch to a different model first, then try again."
+            f"'{entry['display_name']}' is the currently loaded model. Switch to a different model first, then try again."
         )
 
     # Refuse if a swap is in flight
@@ -278,10 +365,10 @@ async def del_model_disk(key: str) -> dict:
     # Refuse if a download is in flight for this same repo (a different model's download is fine)
     if download_manager.current_job_id:
         job = download_manager.get(download_manager.current_job_id)
-        if job and job.repo == m.repo:
+        if job and job.repo == repo:
             raise HTTPException(409, "this model is currently downloading; cancel or wait for it to finish")
 
-    status = await delete_from_disk(m.repo, m.mode, settings)
+    status = await delete_from_disk(repo, settings)
     # Audit log
     record_report(
         f"disk:{key}",
@@ -292,7 +379,7 @@ async def del_model_disk(key: str) -> dict:
     return {
         "ok": True,
         "key": key,
-        "repo": m.repo,
+        "repo": repo,
         "bytes_freed": status.total_bytes,
         "per_host": [
             {"host": r.host, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
@@ -476,6 +563,10 @@ async def get_services() -> dict:
             http = await check_embeddings(settings)
         elif name == "qdrant":
             http = await check_qdrant(settings)
+        elif svc.kind == "vllm":
+            # An extra vLLM monitored on another Spark (registered as a custom
+            # service). Probe its own host/port, not the primary Spark 1 one.
+            http = await probe_vllm_endpoint(svc.host, svc.port)
         elif svc.kind == "bot":
             # No HTTP health endpoint (host networking, no port) — judged purely
             # by docker state. http_ready stays None so the badge isn't pinned
@@ -497,7 +588,7 @@ async def get_services() -> dict:
             # Prefer the check fn's own top-level model key (embeddings reports
             # it there); fall back to a model field inside detail for services
             # whose /health embeds it (parakeet).
-            "model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
+            "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
             "docker_state": docker.get("state"),
             "restart_count": docker.get("restart_count"),
             "started_at": docker.get("started_at"),
@@ -775,17 +866,20 @@ async def get_endpoints() -> dict:
             "base_url": vllm.get("base_url"),
             "model": vllm.get("current_model"),
             "openai_compat": True,
+            "disabled": bool(vllm.get("disabled")),
         },
         "parakeet": {
             "ready": bool(parakeet.get("ok")),
             "base_url": parakeet.get("base_url"),
             "kind": "stt",
             "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
+            "disabled": bool(parakeet.get("disabled")),
         },
         "kokoro": {
             "ready": bool(kokoro.get("ok")),
             "base_url": kokoro.get("base_url"),
             "kind": "tts",
+            "disabled": bool(kokoro.get("disabled")),
         },
         "embeddings": {
             "ready": bool(embeddings.get("ok")),
@@ -794,12 +888,14 @@ async def get_endpoints() -> dict:
             "model": embeddings.get("model"),
             # The proxied OpenAI-compatible endpoints live on Spark Control itself.
             "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
+            "disabled": bool(embeddings.get("disabled")),
         },
         "qdrant": {
             "ready": bool(qdrant.get("ok")),
             "base_url": qdrant.get("base_url"),
             "kind": "vectordb",
             "collection": settings.qdrant_collection or None,
+            "disabled": bool(qdrant.get("disabled")),
         },
     }
 
@@ -813,12 +909,15 @@ async def get_status() -> dict:
         check_embeddings(settings),
         check_qdrant(settings),
     )
-    # Feed health into the connectivity log (deduped — only logs on transition)
-    record_state("vllm", bool(vllm.get("ok")))
-    record_state("parakeet", bool(parakeet.get("ok")))
-    record_state("kokoro", bool(kokoro.get("ok")))
-    record_state("embeddings", bool(embeddings.get("ok")))
-    record_state("qdrant", bool(qdrant.get("ok")))
+    # Feed health into the connectivity log (deduped — only logs on transition).
+    # Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
+    # perpetually down.
+    for _name, _r in (
+        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
+        ("embeddings", embeddings), ("qdrant", qdrant),
+    ):
+        if not _r.get("disabled"):
+            record_state(_name, bool(_r.get("ok")))
     current_key = _identify_current_model(vllm.get("current_model"))
     return {
         "configured": settings.configured,
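The hunk above relies on two behaviours: the log only records transitions, and disabled services are never fed into it. A minimal sketch of that combination follows. `StateLog` and `feed` are hypothetical names, and logging the first observation of a service as a transition is an assumption about `record_state`, whose implementation is not shown in this diff.

```python
class StateLog:
    """Hypothetical stand-in for the connectivity log: records transitions only."""

    def __init__(self) -> None:
        self.last: dict[str, bool] = {}
        self.events: list[tuple[str, bool]] = []

    def record_state(self, name: str, ok: bool) -> None:
        if self.last.get(name) is ok:
            return  # same state as the previous poll, nothing to log
        self.last[name] = ok
        self.events.append((name, ok))


def feed(log: StateLog, results: dict[str, dict]) -> None:
    """Feed one poll's health results into the log, skipping disabled services."""
    for name, result in results.items():
        if not result.get("disabled"):
            log.record_state(name, bool(result.get("ok")))


log = StateLog()
healthy = {"vllm": {"ok": True}, "kokoro": {"ok": False, "disabled": True}}
feed(log, healthy)
feed(log, healthy)  # identical poll: no new events
feed(log, {"vllm": {"ok": False}, "kokoro": {"ok": False, "disabled": True}})
```

Without the `disabled` check, `kokoro` would enter the log as down on the first poll and stay that way, which is the noise the change removes.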
@@ -835,10 +934,13 @@ async def get_status() -> dict:
 def _identify_current_model(repo: str | None) -> str | None:
     if not repo:
         return None
+    # A recipe-backed model keys by its recipe key; a discovered model (loaded but
+    # not yet set up) keys by the same slug build_menu uses, so it still
+    # highlights as the active card.
     for key, m in catalog.models.items():
         if m.repo == repo:
             return key
-    return None
+    return repo_to_key(repo)
 
 
 class SwapRequest(BaseModel):
@@ -856,9 +958,21 @@ async def validate_swap(key: str) -> dict:
 
 
 @app.post("/api/swap")
-async def post_swap(req: SwapRequest) -> dict:
+async def post_swap(req: SwapRequest, request: Request) -> dict:
     if not settings.configured and not req.dry_run:
         raise HTTPException(503, "spark1 not configured")
+    # Enforce the swap reservation lock (the GPU arbiter). A held lock blocks any
+    # real swap that doesn't present the holder's token in X-Swap-Lock-Token — so
+    # an external scheduler that holds the lock can swap, but the dashboard (no
+    # token) is refused while someone else holds it. Dry runs don't touch the
+    # cluster, so they're exempt.
+    if not req.dry_run:
+        blocked = swap_lock.is_blocked_by(request.headers.get("x-swap-lock-token"))
+        if blocked is not None:
+            raise HTTPException(status_code=423, detail={
+                "error": "the GPU swap path is reserved by another holder",
+                "lock": blocked,
+            })
     try:
         job = await swap_manager.trigger(req.model_key, dry_run=req.dry_run)
     except KeyError:
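The reservation check above depends on `SwapLockManager`, which lives in `coordination.py` and is not part of this hunk. The following is a minimal sketch of the semantics the comment describes (one holder, a TTL, a secret token that unblocks the holder's own swaps). `ReservationLock`, its fields, and the injectable `clock` are all assumptions for illustration and may differ from the real class.

```python
import secrets
import time


class ReservationLock:
    """Hypothetical single-holder lock with a TTL; not the real SwapLockManager."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._holder = ""
        self._token = ""
        self._expires = 0.0

    def _held(self) -> bool:
        return bool(self._token) and self._clock() < self._expires

    def acquire(self, holder: str, ttl_seconds: float) -> str:
        if self._held():
            raise RuntimeError(f"lock is held by {self._holder}")
        self._holder = holder
        self._token = secrets.token_hex(16)
        self._expires = self._clock() + ttl_seconds
        return self._token

    def is_blocked_by(self, token):
        """Return None if the caller may swap, else a token-free view of the hold."""
        if not self._held():
            return None
        if token is not None and secrets.compare_digest(
            token.encode(), self._token.encode()
        ):
            return None
        return {"holder": self._holder}


now = [0.0]
lock = ReservationLock(clock=lambda: now[0])
token = lock.acquire("nightly-scheduler", ttl_seconds=30)
dashboard_view = lock.is_blocked_by(None)   # no token: blocked
scheduler_view = lock.is_blocked_by(token)  # holder's token: allowed
now[0] = 31.0                               # TTL elapsed
after_expiry = lock.is_blocked_by(None)
```

Returning the public view instead of a bare boolean lets the endpoint put the holder's identity into the 423 response body without ever exposing the token.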
@@ -868,6 +982,56 @@ async def post_swap(req: SwapRequest) -> dict:
     return {"job_id": job.id, "model_key": job.model_key, "state": job.state}
 
 
+# ---- Swap reservation lock (the GPU arbiter) ----
+# ROUTE ORDER IS LOAD-BEARING: these static `/api/swap/lock` routes MUST be
+# registered before the parametric `/api/swap/{job_id}` below. FastAPI matches in
+# registration order, so if `{job_id}` came first, GET /api/swap/lock would bind
+# job_id="lock", look up a (non-existent) swap job, and 404 — which is exactly
+# the bug this ordering fixes. Keep these above the {job_id} routes.
+# CSRF: these are control-surface, not browser-exempt — an external scheduler is
+# a non-browser client (no Origin header) so it passes the guard already, the
+# same way it calls /api/swap; the dashboard is same-origin.
+class LockAcquireRequest(BaseModel):
+    holder: str
+    ttl_seconds: int | None = None
+    note: str = ""
+    token: str | None = None  # present only to extend an existing hold
+
+
+@app.post("/api/swap/lock")
+async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
+    """Reserve the GPU swap path. Returns a secret token used to swap (header
+    X-Swap-Lock-Token) and to release. 409 if held by another holder."""
+    try:
+        lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
+    except ValueError as e:
+        raise HTTPException(422, str(e))
+    except LockHeld as e:
+        raise HTTPException(status_code=409, detail={
+            "error": "swap lock is held by another holder",
+            "lock": e.state,
+        })
+    return {**swap_lock.status(), "token": lock.token}
+
+
+@app.get("/api/swap/lock")
+async def get_swap_lock() -> dict:
+    """Public, token-free view of the reservation: held? who? until when?"""
+    return swap_lock.status()
+
+
+@app.delete("/api/swap/lock")
+async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
+    """Release the reservation. Needs the matching X-Swap-Lock-Token unless
+    ?force=true (the human override from the dashboard)."""
+    token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
+    try:
+        released = swap_lock.release(token, force=force)
+    except PermissionError as e:
+        raise HTTPException(403, str(e))
+    return {"released": released, **swap_lock.status()}
+
+
 @app.get("/api/swap/{job_id}")
 async def get_swap(job_id: str) -> dict:
     job = swap_manager.get(job_id)
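The route-order comment in the hunk above describes first-match dispatch: whichever registered pattern matches first wins, so a parametric segment registered early will swallow a static path. The toy router below reproduces that behavior in plain Python. It is an illustration of the matching rule only, not FastAPI's or Starlette's implementation, and `compile_route` and `Router` are hypothetical names.

```python
import re


def compile_route(path: str) -> re.Pattern:
    """Turn '/api/swap/{job_id}' into a regex with one named group per parameter."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", path)
    return re.compile(f"^{pattern}$")


class Router:
    """First-match router: routes are tried in registration order."""

    def __init__(self) -> None:
        self._routes: list[tuple[re.Pattern, str]] = []

    def add(self, path: str, name: str) -> None:
        self._routes.append((compile_route(path), name))

    def match(self, url: str):
        for pattern, name in self._routes:
            found = pattern.match(url)
            if found:
                return name, found.groupdict()
        return None


# Static route first: the lock endpoint is reachable.
good = Router()
good.add("/api/swap/lock", "get_swap_lock")
good.add("/api/swap/{job_id}", "get_swap")

# Parametric route first: "lock" is captured as a job id.
bad = Router()
bad.add("/api/swap/{job_id}", "get_swap")
bad.add("/api/swap/lock", "get_swap_lock")

print(good.match("/api/swap/lock"))  # ('get_swap_lock', {})
print(bad.match("/api/swap/lock"))   # ('get_swap', {'job_id': 'lock'})
```

In the second router the handler would then look up a job called `lock` and answer 404, which is the failure the comment warns about.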
@@ -913,6 +1077,47 @@ async def stream_swap(job_id: str):
     return StreamingResponse(gen(), media_type="text/event-stream")
 
 
+# ---- Coordination layer: read-only schedule registry ----
+# (The swap reservation lock lives above, next to the swap routes.) Same CSRF
+# posture: control-surface, not browser-exempt — external schedulers send no
+# Origin header so they pass the guard; the dashboard is same-origin.
+class ScheduleRequest(BaseModel):
+    name: str
+    id: str | None = None
+    owner: str = ""
+    cron: str = ""
+    next_run: str = ""
+    description: str = ""
+
+
+@app.get("/api/schedule")
+async def list_schedules() -> dict:
+    return {"schedules": schedule_registry.list()}
+
+
+@app.post("/api/schedule")
+async def register_schedule(req: ScheduleRequest) -> dict:
+    """Register (or update, by id) a schedule an external scheduler owns. Spark
+    Control only stores it for the dashboard — it never executes it."""
+    try:
+        entry = schedule_registry.register(
+            name=req.name, id=req.id, owner=req.owner,
+            cron=req.cron, next_run=req.next_run, description=req.description,
+        )
+    except ValueError as e:
+        raise HTTPException(422, str(e))
+    return entry.public()
+
+
+@app.delete("/api/schedule/{schedule_id}")
+async def delete_schedule(schedule_id: str) -> dict:
+    # Whitelist the path segment at the boundary (repo convention), even though
+    # it's only ever a dict key — keeps it from being reflected or logged raw.
+    if not valid_schedule_id(schedule_id):
+        raise HTTPException(422, "invalid schedule id")
+    return {"deleted": schedule_registry.delete(schedule_id)}
+
+
 class DownloadRequest(BaseModel):
     repo: str
     mode: Literal["spark1", "spark2", "cluster"] = "spark1"
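The schedule routes above lean on `schedule_registry` and `valid_schedule_id`, neither of which appears in this diff. As a rough sketch of what a store-only registry with a whitelisted id could look like, assuming an id charset similar to the container-name rule elsewhere in the repo and a generated id when none is supplied (both assumptions, as are the class names):

```python
from __future__ import annotations

import re
import uuid
from dataclasses import asdict, dataclass

# Assumed whitelist: the real rule is not shown in the diff.
_SCHEDULE_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]{0,63}$")


def valid_schedule_id(schedule_id: str) -> bool:
    return bool(_SCHEDULE_ID_RE.fullmatch(schedule_id or ""))


@dataclass
class ScheduleEntry:
    id: str
    name: str
    owner: str = ""
    cron: str = ""
    next_run: str = ""
    description: str = ""

    def public(self) -> dict:
        return asdict(self)


class ScheduleRegistry:
    """Stores schedules for display only; nothing here ever runs one."""

    def __init__(self) -> None:
        self._entries: dict[str, ScheduleEntry] = {}

    def register(self, name: str, id: str | None = None, **fields: str) -> ScheduleEntry:
        if not name.strip():
            raise ValueError("name must not be empty")
        entry_id = id or uuid.uuid4().hex  # hypothetical default id
        if not valid_schedule_id(entry_id):
            raise ValueError("invalid schedule id")
        entry = ScheduleEntry(id=entry_id, name=name, **fields)
        self._entries[entry_id] = entry  # same id overwrites, i.e. an update
        return entry

    def list(self) -> list[dict]:
        return [e.public() for e in self._entries.values()]

    def delete(self, schedule_id: str) -> bool:
        return self._entries.pop(schedule_id, None) is not None
```

Validating the id inside `register` as well as at the DELETE boundary means every stored key already satisfies the whitelist.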
+13
-2
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
 appropriate host.
 """
 from __future__ import annotations
+import logging
 import time
 from dataclasses import dataclass
 from typing import Literal, Optional
@@ -13,6 +14,8 @@ from .config import Settings
 from .shellsafe import quote_arg
 from .ssh import ssh_run
 
+log = logging.getLogger(__name__)
+
 
 # Cache the "unreachable" verdict per (host, user) for a short period so that a
 # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -103,7 +106,13 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
     }
     for entry in load_custom_services():
         key = entry.get("key")
-        if not key or key in out:
+        if not key:
+            continue
+        if key in out:
+            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
+            # an adopter who picked a colliding key for, say, a second vLLM sees
+            # why no tile appeared instead of a silent no-op.
+            log.warning("custom service %r collides with a built-in name; ignoring", key)
             continue
         out[key] = ServiceDef(
             name=key,
@@ -113,7 +122,9 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
             container=entry.get("container", key),
             port=int(entry.get("port", 0)),
         )
-    return out
+    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
+    # show no tile and are never probed/auto-restarted.
+    return {k: v for k, v in out.items() if k not in s.disabled_services}
 
 
 async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
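The two hunks above change `services_from_settings` in two ways: a colliding custom key is now logged instead of silently skipped, and disabled services are filtered out at the end. The standalone sketch below mirrors that flow with plain dicts so it can be run in isolation. `merge_services` and its parameters are hypothetical stand-ins for the real function, `Settings`, and `ServiceDef`.

```python
import logging

log = logging.getLogger("services_sketch")


def merge_services(builtin: dict[str, dict], custom: list[dict],
                   disabled: set[str]) -> dict[str, dict]:
    """Merge custom entries into the built-ins, then drop disabled keys."""
    out = dict(builtin)
    for entry in custom:
        key = entry.get("key")
        if not key:
            continue  # malformed entry: nothing to report against
        if key in out:
            # Built-ins win; say so rather than dropping the entry silently.
            log.warning("custom service %r collides with a built-in name; ignoring", key)
            continue
        out[key] = {
            "container": entry.get("container", key),
            "port": int(entry.get("port", 0)),
        }
    return {k: v for k, v in out.items() if k not in disabled}


builtin = {
    "parakeet": {"container": "parakeet", "port": 8000},
    "kokoro": {"container": "kokoro", "port": 8880},
}
custom = [
    {"key": "parakeet", "port": 1},      # collides with a built-in, ignored
    {"key": "vllm2", "port": "8001"},    # accepted, port coerced to int
    {"port": 5},                         # no key, skipped
]
services = merge_services(builtin, custom, disabled={"kokoro"})
print(sorted(services))  # ['parakeet', 'vllm2']
```

One consequence of checking `key in out` after each insertion is that a second custom entry reusing an earlier custom key would also trigger the "collides with a built-in" warning, which is slightly misleading wording for that case.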
@@ -28,6 +28,12 @@ _IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
 # Docker container / volume name (Docker's own rule).
 _CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")
 
+# Absolute filesystem path to a local model directory on a Spark. Conservative
+# charset (letters, digits, and safe path punctuation) with a required leading
+# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
+# and '..' segments) is rejected separately in validate_local_path.
+_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")
+
 
 def validate_repo(repo: str) -> str:
     """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
@@ -50,6 +56,25 @@ def validate_container(name: str) -> str:
     return name
 
 
+def validate_local_path(path: str) -> str:
+    """Return `path` if it is a safe absolute model directory path; else ValueError.
+
+    For locally fine-tuned models served by directory (not an HF repo). Requires
+    an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
+    caller cannot traverse out of an intended models directory. The `quote_arg`
+    sink still quotes it in depth — this is the boundary check.
+    """
+    p = path or ""
+    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
+        raise ValueError(
+            f"invalid local model path (expected an absolute path, no spaces or "
+            f"shell metacharacters): {path!r}"
+        )
+    if any(seg in (".", "..") for seg in p.split("/")):
+        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
+    return p
+
+
 def quote_arg(value: object) -> str:
     """shlex.quote a single token for safe embedding in a shell command string."""
     return shlex.quote(str(value))
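To see what the new boundary check accepts and rejects, the snippet below copies `_LOCAL_PATH_RE` and `validate_local_path` from the hunk above into a standalone file and probes it with a few inputs. The `is_accepted` helper and the sample paths are additions for illustration only.

```python
import re

# Copied from the diff above.
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")


def validate_local_path(path: str) -> str:
    p = path or ""
    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
        raise ValueError(
            f"invalid local model path (expected an absolute path, no spaces or "
            f"shell metacharacters): {path!r}"
        )
    if any(seg in (".", "..") for seg in p.split("/")):
        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
    return p


def is_accepted(path: str) -> bool:
    """Hypothetical helper: True if validate_local_path lets the path through."""
    try:
        validate_local_path(path)
        return True
    except ValueError:
        return False


samples = [
    "/models/llama-3.1-8b+lora",   # plain absolute path
    "models/relative",             # no leading slash
    "/models/../etc/passwd",       # traversal segment
    "/models/./x",                 # '.' segment
    "/models/my model",            # whitespace
    "/models/$(reboot)",           # shell metacharacters
    "/models/..hidden",            # '..' only counts as a whole segment
]
for s in samples:
    print(f"{s!r:32} -> {is_accepted(s)}")
```

Note that the check is purely lexical: it does not resolve symlinks or confirm the path sits under any particular models directory, so it complements rather than replaces the quoting done by `quote_arg`.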
+410
-110
@@ -19,13 +19,21 @@ const state = {
   configured: true,
   timer_handle: null,
   deep_health: {},
-  disk_status: {}, // keyed by model key: { on_disk, total_bytes, per_host }
-  disk_status_loaded: false,
+  models_loaded: false, // true once the first disk scan (/api/models) returns
+  recipes: [], // known launch recipes (for the download autocomplete)
+  lock: { held: false }, // GPU swap reservation (coordination layer)
+  schedules: [], // schedules external automation has registered
 };
 
 const el = (sel) => document.querySelector(sel);
 const $$ = (sel) => document.querySelectorAll(sel);
 
+// ISO timestamp -> local clock string (e.g. "2:45:10 PM"); '' if unparseable.
+function fmtClock(iso) {
+  const t = Date.parse(iso);
+  return isNaN(t) ? '' : new Date(t).toLocaleTimeString();
+}
+
 function escapeHtml(s) {
   if (s == null) return '';
   return String(s)
@@ -51,69 +59,86 @@ function renderCards() {
   const root = el('#cards');
   root.innerHTML = '';
   const isSwapping = !!state.swap_job_id;
-  for (const key of Object.keys(state.models)) {
+  // GPU reserved by external automation — manual swaps are refused server-side
+  // (423); reflect that in the buttons so the click never bounces.
+  const locked = !!(state.lock && state.lock.held);
+  const lockTip = locked
+    ? `Reserved by ${state.lock.holder || 'automation'}${state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : ''}`
+    : '';
+  const keys = Object.keys(state.models);
+  if (keys.length === 0) {
+    // The menu is the disk: nothing downloaded (or the scan hasn't returned yet).
+    root.innerHTML = state.models_loaded
+      ? `<div class="empty-menu muted">No models downloaded on the Sparks yet. Use <strong>+ Download a new model</strong> above to fetch one — it'll appear here when it's done.</div>`
+      : `<div class="empty-menu muted">Scanning the Sparks for downloaded models…</div>`;
+    return;
+  }
+  for (const key of keys) {
     const m = state.models[key];
     const isActive = key === state.current_model_key;
     const card = document.createElement('div');
-    card.className = 'card' + (isActive ? ' active' : '');
+    card.className = 'card' + (isActive ? ' active' : '') + (m.needs_setup ? ' needs-setup' : '');
     const desc = m.description
       ? `<div class="desc">${escapeHtml(m.description)}</div>`
       : '';
     const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
-    // Disk-presence pill + trash button. Until /api/models/disk-status comes back,
-    // we don't know — render a neutral placeholder.
-    const disk = state.disk_status[key];
-    let diskPill = '';
-    if (state.disk_status_loaded) {
-      if (disk && disk.on_disk) {
-        const gb = (disk.total_bytes / 1e9);
-        diskPill = `<span class="tag on-disk" title="Weights present on disk">on disk · ${gb.toFixed(1)} GB</span>`;
-      } else {
-        diskPill = `<span class="tag not-on-disk" title="Weights not downloaded">not downloaded</span>`;
-      }
-    }
-    // Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
+    const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
+    // Every card on the menu is on disk by definition — show its real size.
+    const gb = (m.total_bytes || 0) / 1e9;
+    const diskPill = gb > 0
+      ? `<span class="tag on-disk" title="Weights present on the Spark(s)">on disk · ${gb.toFixed(1)} GB</span>`
+      : '';
+    const setupPill = m.needs_setup
+      ? `<span class="tag setup-pill" title="On disk, but Spark Control hasn't been told how to launch it">needs setup</span>`
+      : '';
+    // Trash = remove weights from disk AND from the menu. Disabled if active / mid-swap.
+    // Never offered for local models: their directory is hand-placed training output,
+    // not a re-downloadable HF cache (the server refuses the delete too).
     let trashBtn = '';
-    if (state.disk_status_loaded && disk && disk.on_disk) {
+    if (!m.local_path) {
       const disabled = isActive || isSwapping;
       const tip = isActive
         ? 'Currently loaded — switch to another model first'
         : isSwapping
           ? 'A swap is in progress'
-          : 'Delete weights from disk';
-      trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Delete from disk" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
+          : 'Remove weights from disk & menu';
+      trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Remove from disk and menu" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
     }
-    // Primary card action: "Switch to this" (green) when on disk; "Download" (blue) when not.
-    // Before disk-status loads we render the swap button as a sensible default.
-    const isOnDisk = !state.disk_status_loaded || (disk && disk.on_disk);
-    const dlInFlight = !!(typeof dlState !== 'undefined' && dlState && dlState.job_id);
+    // Primary action: "Current" / "Switch to this", or "Set up & switch" for a
+    // model on disk that has no launch recipe yet.
+    const swapBlocked = isSwapping || locked;
+    const lockTipAttr = locked ? ` title="${escapeHtml(lockTip)}"` : '';
     let primaryBtn = '';
     if (isActive) {
       primaryBtn = `<button class="btn" disabled>Current</button>`;
-    } else if (isOnDisk) {
-      primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`;
+    } else if (m.needs_setup) {
+      primaryBtn = `<button class="btn primary" data-setup-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Set up & switch</button>`;
     } else {
-      const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
-      primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
+      primaryBtn = `<button class="btn primary" data-swap-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Switch to this</button>`;
     }
+    // The Test/Advanced controls need a saved recipe; hide them until setup is done.
+    const recipeActions = m.needs_setup ? '' : `
+      <button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
+      <button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>`;
     card.innerHTML = `
       <div class="name">${escapeHtml(m.display_name)}</div>
       <div class="meta">
         <span class="tag mode-${m.mode}">${m.mode}</span>
-        <span class="tag">${m.size_gb} GB</span>
-        ${customPill}
         ${diskPill}
+        ${setupPill}
+        ${customPill}
+        ${localPill}
         ${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
       </div>
       ${desc}
       <div class="muted small repo">
-        <a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>
+        ${m.local_path
+          ? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
+          : `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
       </div>
       <div class="spacer"></div>
       <div class="card-actions">
-        ${primaryBtn}
-        <button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
-        <button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>
+        ${primaryBtn}${recipeActions}
         ${trashBtn}
       </div>
       <div class="test-result hidden" data-test-result-for="${key}"></div>
@@ -123,8 +148,8 @@ function renderCards() {
   for (const btn of root.querySelectorAll('[data-swap-key]')) {
     btn.addEventListener('click', () => triggerSwap(btn.dataset.swapKey));
   }
-  for (const btn of root.querySelectorAll('[data-download-key]')) {
-    btn.addEventListener('click', () => triggerDownloadForKey(btn.dataset.downloadKey));
+  for (const btn of root.querySelectorAll('[data-setup-key]')) {
+    btn.addEventListener('click', () => openSetupForKey(btn.dataset.setupKey));
   }
   for (const btn of root.querySelectorAll('[data-adv-key]')) {
     btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey));
@@ -923,6 +948,10 @@ function renderHealth(status) {
   function setDot(id, ok, payload) {
     const item = el(id);
     if (!item) return;
+    // A service switched off via DISABLED_SERVICES isn't part of this
+    // deployment — hide its indicator entirely rather than show it as down.
+    if (payload && payload.disabled) { item.classList.add('hidden'); return; }
+    item.classList.remove('hidden');
     const dot = item.querySelector('.dot');
     dot.classList.remove('ok', 'bad', 'warn');
     if (ok === true) dot.classList.add('ok');
@@ -1141,24 +1170,44 @@ async function pollStatus() {
   }
 }
 
+let menuLoadInFlight = false;
+
 async function loadModels() {
+  // The menu is whatever's downloaded on the Sparks — /api/models does the scan
+  // (SSH), so this is the slower model call. Best-effort: a transient failure
+  // leaves the previous menu in place rather than blanking the dashboard.
+  // Guard against overlap: init() fires this un-awaited and pollStatus()'s
+  // empty-menu fallback may call it again before the scan returns.
+  if (menuLoadInFlight) return;
+  menuLoadInFlight = true;
+  try {
     const data = await fetchJSON('/api/models');
     state.defaults = data.defaults || {};
     state.models = data.models || {};
+    state.recipes = data.recipes || [];
+    state.models_loaded = true;
+    populateDownloadSuggestions();
+    renderCards();
+  } catch (e) {
+    console.warn('model menu load failed:', e.message);
+  } finally {
+    menuLoadInFlight = false;
+  }
 }
 
-async function loadDiskStatus() {
-  // Probes each catalog model's HF cache over SSH; takes a beat. Best-effort.
-  try {
-    const r = await fetchJSON('/api/models/disk-status');
-    if (r && r.models) {
-      state.disk_status = r.models;
-      state.disk_status_loaded = true;
-      renderCards();
-    }
-  } catch (e) {
-    // Silent — pills just won't render. Don't block dashboard.
-    console.warn('disk-status probe failed:', e.message);
+// Populate the download box's autocomplete with known recipes not currently on
+// disk — so common/bundled models stay discoverable without phantom menu cards.
+function populateDownloadSuggestions() {
+  const dl = el('#dl-suggestions');
+  if (!dl) return;
+  const onDiskRepos = new Set(Object.values(state.models).map(m => m.repo).filter(Boolean));
+  dl.innerHTML = '';
+  for (const r of state.recipes || []) {
+    if (onDiskRepos.has(r.repo)) continue;
+    const opt = document.createElement('option');
+    opt.value = r.repo;
+    opt.label = `${r.display_name} (${r.mode})`;
+    dl.appendChild(opt);
   }
 }
 
@@ -1172,14 +1221,12 @@ function fmtBytesShort(n) {
 
 function openDiskDeleteDialog(key) {
   const m = state.models[key];
-  const disk = state.disk_status[key];
-  if (!m || !disk || !disk.on_disk) return;
+  if (!m || !m.on_disk) return;
   const dlg = el('#disk-delete-dialog');
-  el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(disk.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from disk.`;
+  el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(m.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from the Sparks. This also takes it off the menu.`;
   const hostsEl = el('#dd-hosts');
   hostsEl.innerHTML = '';
-  for (const h of (disk.per_host || [])) {
-    if (!h.on_disk) continue;
+  for (const h of (m.per_host || [])) {
     const li = document.createElement('li');
     li.innerHTML = `<code>${escapeHtml(h.host)}</code> — ${fmtBytesShort(h.size_bytes)}`;
     hostsEl.appendChild(li);
@@ -1198,20 +1245,19 @@ function openDiskDeleteDialog(key) {
     try {
       const r = await fetchJSON(`/api/models/${encodeURIComponent(key)}/disk`, { method: 'DELETE' });
       dlg.close();
-      // Optimistically clear local disk state for this key, then refresh.
-      delete state.disk_status[key];
+      // Optimistically drop the card, then re-scan the menu (it's gone from disk).
+      delete state.models[key];
       renderCards();
-      // Eagerly re-probe so size is accurate (and shows "not downloaded" pill).
-      loadDiskStatus();
+      await loadModels();
       const freed = r && typeof r.bytes_freed === 'number' ? fmtBytesShort(r.bytes_freed) : '';
-      console.log(`Deleted ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
+      console.log(`Removed ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
     } catch (e) {
       errEl.textContent = e.message || 'Delete failed';
       errEl.classList.remove('hidden');
     } finally {
       confirm.disabled = false;
       cancel.disabled = false;
-      confirm.textContent = 'Delete from disk';
+      confirm.textContent = 'Remove from disk & menu';
     }
   };
   cancel.onclick = onCancel;
@@ -1221,6 +1267,11 @@ function openDiskDeleteDialog(key) {
 
 async function triggerSwap(modelKey) {
   if (state.swap_job_id) return;
+  if (state.lock && state.lock.held) {
+    const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
+    alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
+    return;
+  }
   try {
     const r = await fetchJSON('/api/swap', {
       method: 'POST',
@@ -1229,40 +1280,82 @@ async function triggerSwap(modelKey) {
|
|||||||
});
|
});
|
||||||
attachToSwap(r.job_id, /*needsBackfill=*/false);
|
attachToSwap(r.job_id, /*needsBackfill=*/false);
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
|
// 423 Locked: a reservation was acquired between our last poll and this click.
|
||||||
|
if (e.message && e.message.startsWith('423')) {
|
||||||
|
alert('The GPU swap path was just reserved by automation. Refreshing…');
|
||||||
|
pollCoordination();
|
||||||
|
} else {
|
||||||
alert('Failed to start swap: ' + e.message);
|
alert('Failed to start swap: ' + e.message);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
async function triggerDownloadForKey(modelKey) {
|
// ---- coordination layer: swap lock + schedule registry ----
|
||||||
const m = state.models[modelKey];
|
|
||||||
if (!m) return;
|
async function pollCoordination() {
|
||||||
if (dlState.job_id) {
|
|
||||||
alert('A download is already in progress; wait for it to finish.');
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
// Pick the download target from the model's mode:
|
|
||||||
// solo -> spark1 only
|
|
||||||
// cluster -> both Sparks (fetch on Spark 1, rsync to Spark 2 in parallel)
|
|
||||||
const dlMode = m.mode === 'cluster' ? 'cluster' : 'spark1';
|
|
||||||
const sizeNote = m.size_gb ? ` (~${m.size_gb} GB)` : '';
|
|
||||||
const target = m.mode === 'cluster' ? 'both Sparks' : 'Spark 1';
|
|
||||||
if (!confirm(`Download "${m.display_name}"${sizeNote} to ${target}? Large models can take a while; you can watch progress in the download panel.`)) {
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
dlState.last_repo = m.repo;
|
|
||||||
dlState.last_mode = dlMode;
|
|
||||||
try {
|
try {
|
||||||
const r = await fetchJSON('/api/download', {
|
state.lock = await fetchJSON('/api/swap/lock');
|
||||||
method: 'POST',
|
} catch { state.lock = { held: false }; }
|
||||||
headers: { 'content-type': 'application/json' },
|
try {
|
||||||
body: JSON.stringify({ repo: m.repo, mode: dlMode }),
|
const r = await fetchJSON('/api/schedule');
|
||||||
});
|
state.schedules = r.schedules || [];
|
||||||
// Open the download panel + attach to progress stream
|
} catch { state.schedules = []; }
|
||||||
openDownloadForm();
|
renderLockBanner();
|
||||||
attachToDownload(r.job_id);
|
renderSchedules();
|
||||||
} catch (e) {
|
renderCards(); // reflect lock state on the swap buttons
|
||||||
alert('Failed to start download: ' + e.message);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function renderLockBanner() {
|
||||||
|
const banner = el('#lock-banner');
|
||||||
|
if (!banner) return;
|
||||||
|
const lock = state.lock;
|
||||||
|
if (lock && lock.held) {
|
||||||
|
const until = lock.expires_at ? ` until ${fmtClock(lock.expires_at)}` : '';
|
||||||
|
const note = lock.note ? ` — ${escapeHtml(lock.note)}` : '';
|
||||||
|
el('#lock-text').innerHTML =
|
||||||
|
`GPU swap path reserved by <strong>${escapeHtml(lock.holder || 'automation')}</strong>${until}${note}. Manual swaps are paused.`;
|
||||||
|
banner.classList.remove('hidden');
|
||||||
|
} else {
|
||||||
|
banner.classList.add('hidden');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
function renderSchedules() {
|
||||||
|
const panel = el('#schedule-panel');
|
||||||
|
const list = el('#schedule-list');
|
||||||
|
if (!panel || !list) return;
|
||||||
|
const items = state.schedules || [];
|
||||||
|
if (!items.length) {
|
||||||
|
panel.classList.add('hidden');
|
||||||
|
list.innerHTML = '';
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
list.innerHTML = items.map((s) => {
|
||||||
|
const meta = [
|
||||||
|
s.cron ? `<code>${escapeHtml(s.cron)}</code>` : '',
|
||||||
|
s.next_run ? `next: ${escapeHtml(s.next_run)}` : '',
|
||||||
|
s.owner ? `by ${escapeHtml(s.owner)}` : '',
|
||||||
|
].filter(Boolean).join(' · ');
|
||||||
|
const desc = s.description ? `<div class="desc">${escapeHtml(s.description)}</div>` : '';
|
||||||
|
return `<div class="schedule-item">
|
||||||
|
<div class="name">${escapeHtml(s.name)}</div>
|
||||||
|
<div class="muted small">${meta}</div>
|
||||||
|
${desc}
|
||||||
|
</div>`;
|
||||||
|
}).join('');
|
||||||
|
panel.classList.remove('hidden');
|
||||||
|
}
|
||||||
|
|
||||||
|
async function releaseLock() {
|
||||||
|
const lock = state.lock || {};
|
||||||
|
const who = lock.holder || 'automation';
|
||||||
|
if (!confirm(`Force-release the GPU reservation held by ${who}? Any job relying on it may then collide with a manual swap.`)) return;
|
||||||
|
try {
|
||||||
|
await fetchJSON('/api/swap/lock?force=true', { method: 'DELETE' });
|
||||||
|
} catch (e) {
|
||||||
|
alert('Failed to release: ' + e.message);
|
||||||
|
}
|
||||||
|
pollCoordination();
|
||||||
}
|
}
|
||||||
|
|
||||||
async function attachToSwap(jobId, needsBackfill) {
|
async function attachToSwap(jobId, needsBackfill) {
|
||||||
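Review note on `renderSchedules` in the hunk above: the meta line is built from optional parts that are dropped when empty. Below is a minimal DOM-free sketch of that pattern. `scheduleMeta` is a hypothetical helper name, and the `escapeHtml` here is a stand-in, since the dashboard's real implementation is not part of this diff.

```javascript
// Stand-in escaper for this sketch only; the project's own escapeHtml is not shown in the diff.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(s).replace(/[&<>"']/g, (c) => map[c]);
}

// Hypothetical helper mirroring the `meta` expression in renderSchedules:
// each part is either a rendered fragment or '', and empty parts are filtered
// out before joining, so no stray separators appear.
function scheduleMeta(s) {
  return [
    s.cron ? `<code>${escapeHtml(s.cron)}</code>` : '',
    s.next_run ? `next: ${escapeHtml(s.next_run)}` : '',
    s.owner ? `by ${escapeHtml(s.owner)}` : '',
  ].filter(Boolean).join(' · ');
}
```

A schedule with only `cron` and `owner` set yields two fragments joined by a single separator, and an empty object yields an empty string.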
@@ -1495,12 +1588,14 @@ function handleDownloadDone(d) {
|
|||||||
el('#dl-title').textContent = 'Done';
|
el('#dl-title').textContent = 'Done';
|
||||||
el('#dl-phase').textContent = 'Done ✓';
|
el('#dl-phase').textContent = 'Done ✓';
|
||||||
el('#dl-progress-fill').style.width = '100%';
|
el('#dl-progress-fill').style.width = '100%';
|
||||||
// Offer to add to catalog
|
// The new model now appears on the menu (the menu is the disk). If it matched
|
||||||
|
// a known recipe it's ready to switch to; if not, offer to set it up.
|
||||||
const repo = dlState.last_repo;
|
const repo = dlState.last_repo;
|
||||||
const mode = dlState.last_mode;
|
loadModels().then(() => {
|
||||||
if (repo) {
|
if (!repo) return;
|
||||||
setTimeout(() => openCatalogDialog(repo, mode), 600);
|
const entry = Object.values(state.models).find(m => m.repo === repo);
|
||||||
}
|
if (entry && entry.needs_setup) setTimeout(() => openSetupDialog(repo, { thenSwap: false }), 600);
|
||||||
|
});
|
||||||
}
|
}
|
||||||
dlState.job_id = null;
|
dlState.job_id = null;
|
||||||
}
|
}
|
||||||
@@ -1613,21 +1708,67 @@ function openAdvanced(key) {
|
|||||||
dlg.showModal();
|
dlg.showModal();
|
||||||
}
|
}
|
||||||
|
|
||||||
function openCatalogDialog(repo, mode) {
|
// Context carried from openSetupDialog -> the submit handler: the inferred
|
||||||
|
// launch flags (parsers/MoE backend) and whether to swap right after saving.
|
||||||
|
let setupCtx = { key: '', repo: '', vllm_args: [], thenSwap: false };
|
||||||
|
|
||||||
|
// "Set up & switch" on a needs-setup card.
|
||||||
|
async function openSetupForKey(key) {
|
||||||
|
const m = state.models[key];
|
||||||
|
if (!m) return;
|
||||||
|
if (state.lock && state.lock.held) {
|
||||||
|
const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
|
||||||
|
alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
await openSetupDialog(m.repo, { thenSwap: true });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Open the "set up this model" dialog, prefilled from inference (config.json +
|
||||||
|
// size). The operator confirms once; on save the recipe persists and (if
|
||||||
|
// thenSwap) we switch to it.
|
||||||
|
async function openSetupDialog(repo, opts = {}) {
|
||||||
const dlg = el('#catalog-dialog');
|
const dlg = el('#catalog-dialog');
|
||||||
const key = repo.split('/').pop().toLowerCase().replace(/[^a-z0-9_-]/g, '-');
|
let sug = null;
|
||||||
el('#cd-key').value = key;
|
try {
|
||||||
el('#cd-name').value = repo.split('/').pop();
|
sug = await fetchJSON(`/api/models/suggest?repo=${encodeURIComponent(repo)}`);
|
||||||
|
} catch (e) {
|
||||||
|
console.warn('recipe suggestion failed:', e.message);
|
||||||
|
}
|
||||||
|
const fallbackKey = repo.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
|
||||||
|
setupCtx = {
|
||||||
|
key: (sug && sug.key) || fallbackKey,
|
||||||
|
repo,
|
||||||
|
vllm_args: (sug && sug.vllm_args) || [],
|
||||||
|
thenSwap: !!opts.thenSwap,
|
||||||
|
};
|
||||||
|
el('#cd-key').value = setupCtx.key;
|
||||||
|
el('#cd-name').value = (sug && sug.display_name) || repo.split('/').pop();
|
||||||
el('#cd-repo').value = repo;
|
el('#cd-repo').value = repo;
|
||||||
el('#cd-size').value = '';
|
el('#cd-size').value = '';
|
||||||
el('#cd-mode').value = mode || 'solo';
|
el('#cd-mode').value = (sug && sug.mode) || 'solo';
|
||||||
el('#cd-desc').value = '';
|
el('#cd-desc').value = '';
|
||||||
el('#cd-mml').value = 32768;
|
const knobs = (sug && sug.knobs) || {};
|
||||||
el('#cd-gmu').value = 0.85;
|
el('#cd-mml').value = knobs.max_model_len || 32768;
|
||||||
el('#cd-gmu-out').value = '0.85';
|
el('#cd-gmu').value = knobs.gpu_memory_utilization || 0.85;
|
||||||
el('#cd-fst').checked = true;
|
el('#cd-gmu-out').value = parseFloat(el('#cd-gmu').value).toFixed(2);
|
||||||
el('#cd-pcache').checked = true;
|
el('#cd-fst').checked = knobs.fastsafetensors !== false;
|
||||||
el('#cd-fp8').checked = true;
|
el('#cd-pcache').checked = knobs.prefix_caching !== false;
|
||||||
|
el('#cd-fp8').checked = (knobs.kv_cache_dtype || 'fp8') === 'fp8';
|
||||||
|
|
||||||
|
const det = el('#cd-detected');
|
||||||
|
if (det) {
|
||||||
|
if (sug) {
|
||||||
|
const caps = (sug.capabilities || []).join(', ');
|
||||||
|
const flags = setupCtx.vllm_args.length ? `: <code>${escapeHtml(setupCtx.vllm_args.join(' '))}</code>` : '';
|
||||||
|
det.innerHTML = `Detected <strong>${escapeHtml(sug.family || 'Generic')}</strong>${caps ? ` · ${escapeHtml(caps)}` : ''}. Launch flags set automatically${flags}.`;
|
||||||
|
} else {
|
||||||
|
det.textContent = "Couldn't auto-detect this model's settings — pick mode and knobs manually.";
|
||||||
|
}
|
||||||
|
det.classList.remove('hidden');
|
||||||
|
}
|
||||||
|
const submit = el('#cd-submit');
|
||||||
|
if (submit) submit.textContent = setupCtx.thenSwap ? 'Save & switch' : 'Save settings';
|
||||||
dlg.showModal();
|
dlg.showModal();
|
||||||
}
|
}
|
||||||
|
|
||||||
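Review note on `openSetupDialog` in the hunk above: every prefilled value falls back to a default when `/api/models/suggest` fails or omits a field. The sketch below restates that logic without the DOM so the defaults are easier to check. `prefillFromSuggestion` and the returned property names are hypothetical; only the expressions are taken from the new side of the diff.

```javascript
// Hypothetical DOM-free mirror of the prefill logic in openSetupDialog.
// `sug` is the (possibly null) suggestion object; `repo` is the HF repo id.
function prefillFromSuggestion(repo, sug) {
  // Same sanitisation as the diff's fallbackKey: lowercase, collapse each run
  // of disallowed characters into one '-', then trim leading/trailing dashes.
  const fallbackKey = repo
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, '-')
    .replace(/^-+|-+$/g, '');
  const knobs = (sug && sug.knobs) || {};
  return {
    key: (sug && sug.key) || fallbackKey,
    display_name: (sug && sug.display_name) || repo.split('/').pop(),
    mode: (sug && sug.mode) || 'solo',
    vllm_args: (sug && sug.vllm_args) || [],
    max_model_len: knobs.max_model_len || 32768,
    gpu_memory_utilization: knobs.gpu_memory_utilization || 0.85,
    // Checkboxes default to on unless the suggestion explicitly says false.
    fastsafetensors: knobs.fastsafetensors !== false,
    prefix_caching: knobs.prefix_caching !== false,
    // The FP8 box is ticked when the dtype is 'fp8' or unspecified.
    fp8_kv_cache: (knobs.kv_cache_dtype || 'fp8') === 'fp8',
  };
}
```

Note that the fallback key now includes the organisation prefix (the old code used only the last path segment), and it always satisfies the `[a-zA-Z0-9_-]+` pattern on `#cd-key` as long as the repo contains at least one allowed character.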
@@ -1637,13 +1778,15 @@ function setupCatalogDialog() {
|
|||||||
el('#catalog-form').addEventListener('submit', async (e) => {
|
el('#catalog-form').addEventListener('submit', async (e) => {
|
||||||
e.preventDefault();
|
e.preventDefault();
|
||||||
const body = {
|
const body = {
|
||||||
key: el('#cd-key').value.trim(),
|
key: el('#cd-key').value.trim() || setupCtx.key,
|
||||||
display_name: el('#cd-name').value.trim(),
|
display_name: el('#cd-name').value.trim(),
|
||||||
repo: el('#cd-repo').value.trim(),
|
repo: el('#cd-repo').value.trim(),
|
||||||
size_gb: parseFloat(el('#cd-size').value) || 0,
|
size_gb: parseFloat(el('#cd-size').value) || 0,
|
||||||
mode: el('#cd-mode').value,
|
mode: el('#cd-mode').value,
|
||||||
description: el('#cd-desc').value.trim() || null,
|
description: el('#cd-desc').value.trim() || null,
|
||||||
vllm_args: [],
|
// The inferred family flags (parsers / MoE backend); knob-controlled flags
|
||||||
|
// are layered on by the server from `knobs`, so no duplication.
|
||||||
|
vllm_args: setupCtx.vllm_args || [],
|
||||||
knobs: {
|
knobs: {
|
||||||
max_model_len: parseInt(el('#cd-mml').value, 10) || 32768,
|
max_model_len: parseInt(el('#cd-mml').value, 10) || 32768,
|
||||||
gpu_memory_utilization: parseFloat(el('#cd-gmu').value),
|
gpu_memory_utilization: parseFloat(el('#cd-gmu').value),
|
||||||
@@ -1661,8 +1804,9 @@ function setupCatalogDialog() {
|
|||||||
el('#catalog-dialog').close();
|
el('#catalog-dialog').close();
|
||||||
closeDownloadPanel();
|
closeDownloadPanel();
|
||||||
await loadModels();
|
await loadModels();
|
||||||
|
if (setupCtx.thenSwap) triggerSwap(body.key);
|
||||||
pollStatus();
|
pollStatus();
|
||||||
} catch (e) { alert('Add to catalog failed: ' + e.message); }
|
} catch (e) { alert('Saving the model setup failed: ' + e.message); }
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -1671,6 +1815,60 @@ function setupAdvancedDialog() {
|
|||||||
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function openLocalModelDialog() {
|
||||||
|
const dlg = el('#local-model-dialog');
|
||||||
|
el('#lm-key').value = '';
|
||||||
|
el('#lm-name').value = '';
|
||||||
|
el('#lm-path').value = '';
|
||||||
|
el('#lm-chat').value = '';
|
||||||
|
el('#lm-size').value = '';
|
||||||
|
el('#lm-mode').value = 'solo';
|
||||||
|
el('#lm-desc').value = '';
|
||||||
|
el('#lm-mml').value = 32768;
|
||||||
|
el('#lm-gmu').value = 0.85;
|
||||||
|
el('#lm-gmu-out').value = '0.85';
|
||||||
|
el('#lm-fst').checked = true;
|
||||||
|
el('#lm-pcache').checked = true;
|
||||||
|
el('#lm-fp8').checked = true;
|
||||||
|
dlg.showModal();
|
||||||
|
}
|
||||||
|
|
||||||
|
function setupLocalModelDialog() {
|
||||||
|
el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
|
||||||
|
el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
||||||
|
el('#local-model-form').addEventListener('submit', async (e) => {
|
||||||
|
e.preventDefault();
|
||||||
|
const chat = el('#lm-chat').value.trim();
|
||||||
|
const body = {
|
||||||
|
key: el('#lm-key').value.trim(),
|
||||||
|
display_name: el('#lm-name').value.trim(),
|
||||||
|
local_path: el('#lm-path').value.trim(),
|
||||||
|
size_gb: parseFloat(el('#lm-size').value) || 0,
|
||||||
|
mode: el('#lm-mode').value,
|
||||||
|
description: el('#lm-desc').value.trim() || null,
|
||||||
|
// A fine-tune's chat template (if any) rides along as a launch flag.
|
||||||
|
vllm_args: chat ? [`--chat-template=${chat}`] : [],
|
||||||
|
knobs: {
|
||||||
|
max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
|
||||||
|
gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
|
||||||
|
fastsafetensors: el('#lm-fst').checked,
|
||||||
|
prefix_caching: el('#lm-pcache').checked,
|
||||||
|
kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
|
||||||
|
},
|
||||||
|
};
|
||||||
|
try {
|
||||||
|
await fetchJSON('/api/models', {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'content-type': 'application/json' },
|
||||||
|
body: JSON.stringify(body),
|
||||||
|
});
|
||||||
|
el('#local-model-dialog').close();
|
||||||
|
await loadModels();
|
||||||
|
pollStatus();
|
||||||
|
} catch (e) { alert('Add local model failed: ' + e.message); }
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
// ===================== NIM installer =====================
|
// ===================== NIM installer =====================
|
||||||
|
|
||||||
const nimState = {
|
const nimState = {
|
||||||
@@ -1994,8 +2192,104 @@ function handleUpdateDone(d) {
|
|||||||
setTimeout(pollUpdates, 2000);
|
setTimeout(pollUpdates, 2000);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ===================== settings ('gear') =====================
|
||||||
|
// Renders the optional cluster knobs from /api/settings (server-driven field
|
||||||
|
// list, so adding a knob server-side needs no JS change) and POSTs edits back.
|
||||||
|
// The server reloads its config in place, so changes take effect immediately.
|
||||||
|
|
||||||
|
let settingsClearSentinel = '__clear__';
|
||||||
|
|
||||||
|
function renderSettingsForm(data) {
|
||||||
|
settingsClearSentinel = data.clear_sentinel || settingsClearSentinel;
|
||||||
|
const body = el('#settings-body');
|
||||||
|
body.innerHTML = (data.groups || []).map((g) => {
|
||||||
|
const rows = g.fields.map((f) => {
|
||||||
|
const help = f.help ? `<span class="muted small settings-help">${escapeHtml(f.help)}</span>` : '';
|
||||||
|
let input;
|
||||||
|
let clearToggle = '';
|
||||||
|
if (f.type === 'secret') {
|
||||||
|
const ph = f.set ? 'set — leave blank to keep' : (f.placeholder || '');
|
||||||
|
input = `<input type="password" autocomplete="off" data-key="${f.key}" data-secret="1" placeholder="${escapeHtml(ph)}">`;
|
||||||
|
// A stored secret is never echoed back, so blank means "keep". Offer an
|
||||||
|
// explicit way to remove it.
|
||||||
|
if (f.set) clearToggle = `<label class="settings-clear muted small"><input type="checkbox" data-clear-for="${f.key}"> clear stored value</label>`;
|
||||||
|
} else if (f.type === 'int') {
|
||||||
|
input = `<input type="number" min="1" max="65535" data-key="${f.key}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
|
||||||
|
} else {
|
||||||
|
input = `<input type="text" autocomplete="off" data-key="${f.key}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
|
||||||
|
}
|
||||||
|
return `<div class="settings-field"><label class="modal-row"><span>${escapeHtml(f.label)}</span>${input}</label>${clearToggle}${help}</div>`;
|
||||||
|
}).join('');
|
||||||
|
return `<fieldset class="modal-fieldset"><legend>${escapeHtml(g.name)}</legend>${rows}</fieldset>`;
|
||||||
|
}).join('');
|
||||||
|
}
|
||||||
|
|
||||||
|
async function openSettingsDialog() {
|
||||||
|
const dlg = el('#settings-dialog');
|
||||||
|
const err = el('#settings-error');
|
||||||
|
err.classList.add('hidden');
|
||||||
|
el('#settings-body').innerHTML = '<p class="muted small">Loading…</p>';
|
||||||
|
dlg.showModal();
|
||||||
|
try {
|
||||||
|
renderSettingsForm(await fetchJSON('/api/settings'));
|
||||||
|
} catch (e) {
|
||||||
|
el('#settings-body').innerHTML = '';
|
||||||
|
err.textContent = 'Could not load settings: ' + e.message;
|
||||||
|
err.classList.remove('hidden');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function saveSettings(e) {
|
||||||
|
e.preventDefault();
|
||||||
|
const err = el('#settings-error');
|
||||||
|
err.classList.add('hidden');
|
||||||
|
const values = {};
|
||||||
|
$$('#settings-body [data-key]').forEach((inp) => {
|
||||||
|
const key = inp.dataset.key;
|
||||||
|
const v = inp.value.trim();
|
||||||
|
if (inp.dataset.secret) {
|
||||||
|
// "clear" checkbox wins; else a typed value sets it; else omit (keep the
|
||||||
|
// stored one — we can't see it to retype it).
|
||||||
|
const clear = el(`[data-clear-for="${key}"]`);
|
||||||
|
if (clear && clear.checked) values[key] = settingsClearSentinel;
|
||||||
|
else if (v) values[key] = v;
|
||||||
|
} else {
|
||||||
|
values[key] = v; // blank non-secret ⇒ server reverts it to the default
|
||||||
|
}
|
||||||
|
});
|
||||||
|
const btn = el('#settings-save');
|
||||||
|
btn.disabled = true;
|
||||||
|
try {
|
||||||
|
await fetchJSON('/api/settings', {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'content-type': 'application/json' },
|
||||||
|
body: JSON.stringify({ values }),
|
||||||
|
});
|
||||||
|
el('#settings-dialog').close();
|
||||||
|
// Re-pull everything a knob can move: the Open WebUI link, health probes,
|
||||||
|
// service tiles, and the model menu (host/port changes alter all of them).
|
||||||
|
try {
|
||||||
|
state.config = await fetchJSON('/api/config');
|
||||||
|
const a = el('#open-webui-link');
|
||||||
|
if (state.config.open_webui_url) { a.href = state.config.open_webui_url; a.classList.remove('hidden'); }
|
||||||
|
else { a.classList.add('hidden'); }
|
||||||
|
} catch (e3) { console.warn('post-save /api/config refresh failed:', e3); }
|
||||||
|
pollStatus();
|
||||||
|
renderServices();
|
||||||
|
loadModels();
|
||||||
|
} catch (e2) {
|
||||||
|
err.textContent = 'Save failed: ' + e2.message.replace(/^\d+ [^:]*:\s*/, '');
|
||||||
|
err.classList.remove('hidden');
|
||||||
|
} finally {
|
||||||
|
btn.disabled = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
async function init() {
|
async function init() {
|
||||||
setupCopyButtons();
|
setupCopyButtons();
|
||||||
|
el('#open-settings').addEventListener('click', openSettingsDialog);
|
||||||
|
el('#settings-cancel').addEventListener('click', () => el('#settings-dialog').close());
|
||||||
|
el('#settings-form').addEventListener('submit', saveSettings);
|
||||||
el('#open-download').addEventListener('click', openDownloadForm);
|
el('#open-download').addEventListener('click', openDownloadForm);
|
||||||
el('#dl-cancel').addEventListener('click', closeDownloadPanel);
|
el('#dl-cancel').addEventListener('click', closeDownloadPanel);
|
||||||
el('#dl-start').addEventListener('click', startDownload);
|
el('#dl-start').addEventListener('click', startDownload);
|
||||||
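Review note on `saveSettings` in the hunk above: secrets and plain fields are treated differently when the POST body is assembled. The sketch below restates that branch logic as a pure function. `buildSettingsValues` and the `{ key, value, secret, clear }` field shape are assumptions made for illustration; in the diff the same information comes from the inputs' `data-key`, `data-secret` and `data-clear-for` attributes.

```javascript
// Hypothetical DOM-free version of the loop in saveSettings.
// fields: array of { key, value, secret?, clear? } (shape assumed for this sketch).
function buildSettingsValues(fields, clearSentinel = '__clear__') {
  const values = {};
  for (const f of fields) {
    const v = (f.value || '').trim();
    if (f.secret) {
      // The "clear" checkbox wins; otherwise a typed value replaces the secret.
      if (f.clear) values[f.key] = clearSentinel;
      else if (v) values[f.key] = v;
      // Blank and not cleared: the key is omitted so the stored secret is kept.
    } else {
      // A blank non-secret is still sent; per the diff's comment the server
      // then reverts it to the default.
      values[f.key] = v;
    }
  }
  return values;
}
```

The three secret outcomes (clear, replace, keep) map to three distinct payload shapes: the sentinel, the new value, and an absent key.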
@@ -2034,8 +2328,11 @@ async function init() {
|
|||||||
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
|
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
|
||||||
});
|
});
|
||||||
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
|
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
|
||||||
|
el('#open-local').addEventListener('click', openLocalModelDialog);
|
||||||
|
el('#lock-release').addEventListener('click', releaseLock);
|
||||||
setupCatalogDialog();
|
setupCatalogDialog();
|
||||||
setupAdvancedDialog();
|
setupAdvancedDialog();
|
||||||
|
setupLocalModelDialog();
|
||||||
// Open WebUI link from /api/config
|
// Open WebUI link from /api/config
|
||||||
try {
|
try {
|
||||||
state.config = await fetchJSON('/api/config');
|
state.config = await fetchJSON('/api/config');
|
||||||
@@ -2047,19 +2344,22 @@ async function init() {
|
|||||||
} catch {}
|
} catch {}
|
||||||
setupDashboardTabs();
|
setupDashboardTabs();
|
||||||
setupEndpointCollapse();
|
setupEndpointCollapse();
|
||||||
await loadModels();
|
// Fire the (SSH-backed) menu scan without awaiting — it self-renders a
|
||||||
|
// "Scanning…" state and fills in when it returns, so a slow/unreachable
|
||||||
|
// cluster never blocks first paint. pollStatus() below paints the rest.
|
||||||
|
loadModels();
|
||||||
await pollStatus();
|
await pollStatus();
|
||||||
await renderServices();
|
await renderServices();
|
||||||
|
pollCoordination();
|
||||||
pollHardware();
|
pollHardware();
|
||||||
pollUpdates();
|
pollUpdates();
|
||||||
// Disk-status probe runs after first paint — slow over SSH and not blocking.
|
|
||||||
loadDiskStatus();
|
|
||||||
// Speech-model patches panel — slow over SSH, runs after first paint.
|
// Speech-model patches panel — slow over SSH, runs after first paint.
|
||||||
renderSpeechModels();
|
renderSpeechModels();
|
||||||
setInterval(pollStatus, 5000);
|
setInterval(pollStatus, 5000);
|
||||||
|
setInterval(pollCoordination, 5000); // swap lock + schedule registry
|
||||||
setInterval(pollHardware, 8000); // every 8s
|
setInterval(pollHardware, 8000); // every 8s
|
||||||
setInterval(pollUpdates, 300000); // every 5 min
|
setInterval(pollUpdates, 300000); // every 5 min
|
||||||
setInterval(loadDiskStatus, 60000); // every 60s — disk state changes rarely
|
setInterval(loadModels, 60000); // every 60s — re-scan the Sparks for added/removed models
|
||||||
setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
|
setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
+74
-10
@@ -17,14 +17,28 @@
|
|||||||
<span class="muted">connecting…</span>
|
<span class="muted">connecting…</span>
|
||||||
</div>
|
</div>
|
||||||
<a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
|
<a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
|
||||||
|
<button id="open-settings" class="topbar-btn" type="button" title="Settings" aria-label="Open cluster settings">⚙ Settings</button>
|
||||||
</header>
|
</header>
|
||||||
|
|
||||||
<main>
|
<main>
|
||||||
<section id="setup-banner" class="banner hidden">
|
<section id="setup-banner" class="banner hidden">
|
||||||
<strong>Configuration needed.</strong>
|
<strong>Configuration needed.</strong>
|
||||||
<span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
|
<span>Run the <em>Configure Sparks</em> action in StartOS to set your two Spark IPs and SSH users. Everything else (ports, services, integrations) lives under <em>⚙ Settings</em> above.</span>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<dialog id="settings-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form" id="settings-form">
|
||||||
|
<h3>Settings</h3>
|
||||||
|
<p class="muted small">Optional cluster knobs — vLLM/service ports, container names, support-service hosts, and integrations. The two Spark IPs and SSH users are set once via the <em>Configure Sparks</em> action in StartOS; everything else is here. Changes apply immediately. Stored on this server and included in StartOS backups.</p>
|
||||||
|
<div id="settings-body" class="settings-body"><p class="muted small">Loading…</p></div>
|
||||||
|
<p id="settings-error" class="muted small dd-error hidden"></p>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="settings-cancel" class="btn">Cancel</button>
|
||||||
|
<button type="submit" id="settings-save" class="btn primary">Save</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
|
||||||
<section id="hardware-panel" class="hardware-panel hidden">
|
<section id="hardware-panel" class="hardware-panel hidden">
|
||||||
<div class="section-header">
|
<div class="section-header">
|
||||||
<h2 class="section-title">Spark hardware</h2>
|
<h2 class="section-title">Spark hardware</h2>
|
||||||
@@ -96,6 +110,13 @@
|
|||||||
</details>
|
</details>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section id="lock-banner" class="banner lock-banner hidden">
|
||||||
|
<span class="lock-icon" aria-hidden="true">🔒</span>
|
||||||
|
<span id="lock-text">GPU swap path reserved</span>
|
||||||
|
<span class="spacer"></span>
|
||||||
|
<button id="lock-release" class="btn small-btn">Release</button>
|
||||||
|
</section>
|
||||||
|
|
||||||
<nav id="dashboard-tabs" class="dashboard-tabs hidden" role="tablist">
|
<nav id="dashboard-tabs" class="dashboard-tabs hidden" role="tablist">
|
||||||
<button type="button" class="dashboard-tab" data-tab="llm" role="tab" aria-selected="true">LLM</button>
|
<button type="button" class="dashboard-tab" data-tab="llm" role="tab" aria-selected="true">LLM</button>
|
||||||
<button type="button" class="dashboard-tab" data-tab="audio" role="tab" aria-selected="false">Audio / Speech</button>
|
<button type="button" class="dashboard-tab" data-tab="audio" role="tab" aria-selected="false">Audio / Speech</button>
|
||||||
@@ -229,13 +250,15 @@
|
|||||||
<div class="section-header">
|
<div class="section-header">
|
||||||
<h2 class="section-title">LLM swap</h2>
|
<h2 class="section-title">LLM swap</h2>
|
||||||
<button id="open-download" class="btn small-btn">+ Download a new model</button>
|
<button id="open-download" class="btn small-btn">+ Download a new model</button>
|
||||||
|
<button id="open-local" class="btn small-btn">+ Add local model</button>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<dialog id="catalog-dialog" class="modal">
|
<dialog id="catalog-dialog" class="modal">
|
||||||
<form method="dialog" class="modal-form" id="catalog-form">
|
<form method="dialog" class="modal-form" id="catalog-form">
|
||||||
<h3>Add downloaded model to catalog</h3>
|
<h3>Set up this model</h3>
|
||||||
<p class="muted small">It will appear as a new card you can swap to. Knob values become its default launch flags — you can tweak later via the model's "Advanced" panel.</p>
|
<p class="muted small">This model is downloaded, but Spark Control needs to know how to launch it. We've guessed from the model's own files — confirm or adjust, and it's saved so you're never asked again.</p>
|
||||||
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+"></label>
|
<p id="cd-detected" class="muted small cd-detected hidden"></p>
|
||||||
|
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+" readonly></label>
|
||||||
<label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
|
<label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
|
||||||
<label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
|
<label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
|
||||||
<label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
|
<label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
|
||||||
@@ -256,21 +279,52 @@
|
|||||||
</fieldset>
|
</fieldset>
|
||||||
<div class="modal-actions">
|
<div class="modal-actions">
|
||||||
<button type="button" id="cd-cancel" class="btn">Cancel</button>
|
<button type="button" id="cd-cancel" class="btn">Cancel</button>
|
||||||
<button type="submit" class="btn primary">Add to catalog</button>
|
<button type="submit" id="cd-submit" class="btn primary">Save settings</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
|
||||||
|
<dialog id="local-model-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form" id="local-model-form">
|
||||||
|
<h3>Add a local / fine-tuned model</h3>
|
||||||
|
<p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
|
||||||
|
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
|
||||||
|
<label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
|
||||||
|
<label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
|
||||||
|
<label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
|
||||||
|
<label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
|
||||||
|
<label class="modal-row"><span>Mode</span>
|
||||||
|
<select id="lm-mode">
|
||||||
|
<option value="solo">solo (Spark 1 only)</option>
|
||||||
|
<option value="cluster">cluster (both Sparks via Ray)</option>
|
||||||
|
</select>
|
||||||
|
</label>
|
||||||
|
<label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
|
||||||
|
<fieldset class="modal-fieldset">
|
||||||
|
<legend>Default launch knobs</legend>
|
||||||
|
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
|
||||||
|
<label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
|
||||||
|
<label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
|
||||||
|
<label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
|
||||||
|
<label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
|
||||||
|
</fieldset>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="lm-cancel" class="btn">Cancel</button>
|
||||||
|
<button type="submit" class="btn primary">Add local model</button>
|
||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
|
|
||||||
<dialog id="disk-delete-dialog" class="modal">
|
<dialog id="disk-delete-dialog" class="modal">
|
||||||
<form method="dialog" class="modal-form">
|
<form method="dialog" class="modal-form">
|
||||||
<h3>Delete model weights from disk?</h3>
|
<h3>Remove this model from the Sparks?</h3>
|
||||||
<p id="dd-summary" class="muted small"></p>
|
<p id="dd-summary" class="muted small"></p>
|
||||||
<ul class="muted small dd-hosts" id="dd-hosts"></ul>
|
<ul class="muted small dd-hosts" id="dd-hosts"></ul>
|
||||||
<p class="muted small">This is reversible — you can re-download from the catalog at any time. The catalog entry stays intact.</p>
|
<p class="muted small">This deletes the weights and removes the card from the menu. You can always download it again later (re-downloading restores its saved settings).</p>
|
||||||
<p id="dd-error" class="muted small dd-error hidden"></p>
|
<p id="dd-error" class="muted small dd-error hidden"></p>
|
||||||
<div class="modal-actions">
|
<div class="modal-actions">
|
||||||
<button type="button" id="dd-cancel" class="btn">Cancel</button>
|
<button type="button" id="dd-cancel" class="btn">Cancel</button>
|
||||||
<button type="button" id="dd-confirm" class="btn danger">Delete from disk</button>
|
<button type="button" id="dd-confirm" class="btn danger">Remove from disk & menu</button>
|
||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
@@ -311,15 +365,17 @@
|
|||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
|
|
||||||
|
|
||||||
<section id="download-panel" class="download-panel hidden">
|
<section id="download-panel" class="download-panel hidden">
|
||||||
<div class="download-form" id="download-form">
|
<div class="download-form" id="download-form">
|
||||||
<label class="dl-row">
|
<label class="dl-row">
|
||||||
<span class="dl-label">HuggingFace repo</span>
|
<span class="dl-label">HuggingFace repo</span>
|
||||||
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
|
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off" list="dl-suggestions">
|
||||||
|
<datalist id="dl-suggestions"></datalist>
|
||||||
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
|
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
|
||||||
</label>
|
</label>
|
||||||
<div class="dl-help muted small">
|
<div class="dl-help muted small">
|
||||||
<a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
|
Type any repo, or pick a known one from the list. <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
|
||||||
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
|
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
|
||||||
</div>
|
</div>
|
||||||
<div class="dl-row">
|
<div class="dl-row">
|
||||||
@@ -362,6 +418,14 @@
|
|||||||
<section id="cards" class="cards"></section>
|
<section id="cards" class="cards"></section>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section id="schedule-panel" class="schedule-panel hidden">
|
||||||
|
<div class="section-header">
|
||||||
|
<h2 class="section-title">Scheduled jobs</h2>
|
||||||
|
</div>
|
||||||
|
<p class="muted small">Registered by your own automation. Spark Control only displays these — it doesn't run them.</p>
|
||||||
|
<div id="schedule-list" class="schedule-list"></div>
|
||||||
|
</section>
|
||||||
|
|
||||||
<section id="update-banner" class="update-banner hidden">
|
<section id="update-banner" class="update-banner hidden">
|
||||||
<div class="ub-context muted small">
|
<div class="ub-context muted small">
|
||||||
Updates to <strong><a href="https://github.com/eugr/spark-vllm-docker" target="_blank" rel="noopener">eugr/spark-vllm-docker</a></strong>
|
Updates to <strong><a href="https://github.com/eugr/spark-vllm-docker" target="_blank" rel="noopener">eugr/spark-vllm-docker</a></strong>
|
||||||
|
|||||||
@@ -74,6 +74,42 @@ main {
|
|||||||
}
|
}
|
||||||
.banner em { font-style: normal; background: rgba(245, 158, 11, 0.15); padding: 2px 6px; border-radius: 4px; }
|
.banner em { font-style: normal; background: rgba(245, 158, 11, 0.15); padding: 2px 6px; border-radius: 4px; }
|
||||||
|
|
||||||
|
/* GPU swap reservation (coordination layer) — informational, not a warning. */
|
||||||
|
.lock-banner {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 10px;
|
||||||
|
border-color: var(--info);
|
||||||
|
color: var(--info);
|
||||||
|
}
|
||||||
|
.lock-banner .lock-icon { font-size: 16px; }
|
||||||
|
.lock-banner strong { color: var(--text); }
|
||||||
|
.lock-banner .spacer { flex: 1; }
|
||||||
|
|
||||||
|
/* Scheduled-jobs panel — read-only view of what external automation registered. */
|
||||||
|
.schedule-panel { margin-top: 8px; }
|
||||||
|
.schedule-list {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: repeat(auto-fill, minmax(240px, 1fr));
|
||||||
|
gap: 12px;
|
||||||
|
margin-top: 8px;
|
||||||
|
}
|
||||||
|
.schedule-item {
|
||||||
|
background: var(--surface);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: var(--radius);
|
||||||
|
padding: 12px 14px;
|
||||||
|
}
|
||||||
|
.schedule-item .name { font-weight: 600; margin-bottom: 4px; }
|
||||||
|
.schedule-item code {
|
||||||
|
background: var(--surface-2);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 4px;
|
||||||
|
padding: 1px 5px;
|
||||||
|
font-size: 12px;
|
||||||
|
}
|
||||||
|
.schedule-item .desc { margin-top: 6px; color: var(--muted); font-size: 13px; }
|
||||||
|
|
||||||
/* ===== Endpoint panel ===== */
|
/* ===== Endpoint panel ===== */
|
||||||
|
|
||||||
.endpoint-panel {
|
.endpoint-panel {
|
||||||
@@ -694,6 +730,7 @@ main {
|
|||||||
.card .repo a { color: inherit; text-decoration: none; }
|
.card .repo a { color: inherit; text-decoration: none; }
|
||||||
.card .repo a:hover { color: var(--info); text-decoration: underline; }
|
.card .repo a:hover { color: var(--info); text-decoration: underline; }
|
||||||
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
|
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
|
||||||
|
.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
|
||||||
.tag {
|
.tag {
|
||||||
background: var(--surface-2);
|
background: var(--surface-2);
|
||||||
border: 1px solid var(--border);
|
border: 1px solid var(--border);
|
||||||
@@ -738,8 +775,15 @@ main {
|
|||||||
.card .adv-btn,
|
.card .adv-btn,
|
||||||
.card .test-btn { padding: 8px 12px; font-size: 12px; }
|
.card .test-btn { padding: 8px 12px; font-size: 12px; }
|
||||||
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
||||||
|
.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
|
||||||
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
||||||
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
|
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
|
||||||
|
.tag.setup-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
|
||||||
|
.card.needs-setup { border-style: dashed; }
|
||||||
|
.card-actions .btn[data-setup-key] { flex: 1; }
|
||||||
|
.empty-menu { grid-column: 1 / -1; padding: 28px 16px; text-align: center; border: 1px dashed var(--border); border-radius: 10px; }
|
||||||
|
.cd-detected { padding: 8px 10px; border: 1px solid var(--border); border-radius: 8px; background: rgba(255,255,255,0.02); }
|
||||||
|
.cd-detected code { word-break: break-all; }
|
||||||
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
|
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
|
||||||
.card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); }
|
.card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); }
|
||||||
.card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; }
|
.card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; }
|
||||||
@@ -920,3 +964,13 @@ main {
|
|||||||
.tab-content.active { display: block; }
|
.tab-content.active { display: block; }
|
||||||
|
|
||||||
/* (WhisperX install banner styles removed in v0.13.0:0 — see release notes) */
|
/* (WhisperX install banner styles removed in v0.13.0:0 — see release notes) */
|
||||||
|
|
||||||
|
/* ===== Settings ('gear') dialog ===== */
|
||||||
|
.modal#settings-dialog { max-width: 560px; }
|
||||||
|
/* Cap the (tall) form so the Save/Cancel actions stay reachable; the grouped
|
||||||
|
fields scroll within. */
|
||||||
|
#settings-body { max-height: 60vh; overflow-y: auto; padding-right: 6px; display: flex; flex-direction: column; gap: 12px; }
|
||||||
|
.settings-field { display: flex; flex-direction: column; gap: 2px; }
|
||||||
|
.settings-help { display: block; line-height: 1.35; }
|
||||||
|
.settings-clear { display: inline-flex; align-items: center; gap: 6px; margin-top: 2px; cursor: pointer; }
|
||||||
|
.settings-clear input { width: auto; }
|
||||||
|
|||||||
+25
-2
@@ -6,7 +6,9 @@ from datetime import datetime, timezone
|
|||||||
from typing import Optional
|
from typing import Optional
|
||||||
|
|
||||||
from .config import Settings
|
from .config import Settings
|
||||||
|
from .coordination import WebhookNotifier, build_webhook_payload
|
||||||
from .models import Catalog, build_launch_command
|
from .models import Catalog, build_launch_command
|
||||||
|
from .shellsafe import quote_arg
|
||||||
from .ssh import ssh_run, ssh_stream, StreamHandle
|
from .ssh import ssh_run, ssh_stream, StreamHandle
|
||||||
|
|
||||||
|
|
||||||
@@ -32,9 +34,15 @@ class SwapJob:
|
|||||||
|
|
||||||
|
|
||||||
class SwapManager:
|
class SwapManager:
|
||||||
def __init__(self, settings: Settings, catalog: Catalog) -> None:
|
def __init__(
|
||||||
|
self,
|
||||||
|
settings: Settings,
|
||||||
|
catalog: Catalog,
|
||||||
|
notifier: Optional[WebhookNotifier] = None,
|
||||||
|
) -> None:
|
||||||
self.settings = settings
|
self.settings = settings
|
||||||
self.catalog = catalog
|
self.catalog = catalog
|
||||||
|
self.notifier = notifier
|
||||||
self.lock = asyncio.Lock()
|
self.lock = asyncio.Lock()
|
||||||
self.jobs: dict[str, SwapJob] = {}
|
self.jobs: dict[str, SwapJob] = {}
|
||||||
self.current_job_id: Optional[str] = None
|
self.current_job_id: Optional[str] = None
|
||||||
@@ -77,6 +85,21 @@ class SwapManager:
|
|||||||
job.finished_at = datetime.now(timezone.utc).isoformat()
|
job.finished_at = datetime.now(timezone.utc).isoformat()
|
||||||
if self.current_job_id == job.id:
|
if self.current_job_id == job.id:
|
||||||
self.current_job_id = None
|
self.current_job_id = None
|
||||||
|
# Outside the swap lock (so a webhook POST can't stall a queued swap) and
|
||||||
|
# only for real swaps — a dry run never changes the running model. A
|
||||||
|
# webhook failure is logged inside fire(), never raised.
|
||||||
|
if self.notifier is not None and self.notifier.enabled and not job.dry_run:
|
||||||
|
event = "swap_complete" if job.state == "ready" else "swap_failed"
|
||||||
|
await self.notifier.fire(event, build_webhook_payload(
|
||||||
|
event=event,
|
||||||
|
job_id=job.id,
|
||||||
|
model_key=job.model_key,
|
||||||
|
state=job.state,
|
||||||
|
returncode=job.returncode,
|
||||||
|
started_at=job.started_at,
|
||||||
|
finished_at=job.finished_at,
|
||||||
|
dry_run=job.dry_run,
|
||||||
|
))
|
||||||
|
|
||||||
async def _do(self, job: SwapJob) -> None:
|
async def _do(self, job: SwapJob) -> None:
|
||||||
model = self.catalog.models[job.model_key]
|
model = self.catalog.models[job.model_key]
|
||||||
@@ -112,7 +135,7 @@ class SwapManager:
|
|||||||
|
|
||||||
# Step 3: tail logs until the ready marker (or timeout)
|
# Step 3: tail logs until the ready marker (or timeout)
|
||||||
job.state = "tailing"
|
job.state = "tailing"
|
||||||
tail_cmd = "docker logs -f --tail 50 vllm_node"
|
tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
|
||||||
job.append(f"$ {tail_cmd}")
|
job.append(f"$ {tail_cmd}")
|
||||||
timeout = max(model.expected_ready_seconds * 2, 600)
|
timeout = max(model.expected_ready_seconds * 2, 600)
|
||||||
handle = StreamHandle()
|
handle = StreamHandle()
|
||||||
|
|||||||
@@ -22,6 +22,7 @@ from typing import Any
|
|||||||
|
|
||||||
from .config import Settings
|
from .config import Settings
|
||||||
from .models import Catalog, build_launch_command
|
from .models import Catalog, build_launch_command
|
||||||
|
from .shellsafe import quote_arg
|
||||||
from .ssh import ssh_run
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
@@ -114,7 +115,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
|
|||||||
# Pipe the JSON args list to a here-doc Python invocation. The validator
|
# Pipe the JSON args list to a here-doc Python invocation. The validator
|
||||||
# reads from stdin to avoid shell-escaping the args themselves.
|
# reads from stdin to avoid shell-escaping the args themselves.
|
||||||
cmd = (
|
cmd = (
|
||||||
f"echo '{payload}' | docker exec -i vllm_node python3 -c "
|
f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
|
||||||
+ shlex.quote(_VALIDATOR_SCRIPT)
|
+ shlex.quote(_VALIDATOR_SCRIPT)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
+46
-38
@@ -1,9 +1,14 @@
|
|||||||
# spark-control model catalog
|
# spark-control launch recipes
|
||||||
#
|
#
|
||||||
# Edit this file (or override at runtime via the StartOS "Edit Model Catalog"
|
# These are NOT the dashboard menu. The menu is whatever is actually downloaded
|
||||||
# action) to add or change available models.
|
# on the Sparks — Spark Control scans the Hugging Face cache on each load and
|
||||||
|
# shows what it finds. These entries are launch *recipes*: matched to an on-disk
|
||||||
|
# model by `repo`, they say HOW to launch it. A downloaded model with no recipe
|
||||||
|
# here shows up as "needs setup", and the dashboard infers + saves one on first
|
||||||
|
# use (from the model's own config.json). Add a recipe to make a known model
|
||||||
|
# launch correctly the moment it's downloaded, with no setup prompt.
|
||||||
#
|
#
|
||||||
# Each model entry produces this command on Spark 1:
|
# Each recipe produces this command on Spark 1:
|
||||||
# cd ~/spark-vllm-docker
|
# cd ~/spark-vllm-docker
|
||||||
# ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \
|
# ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \
|
||||||
# --port=<defaults.port> --host=<defaults.host> <vllm_args...>
|
# --port=<defaults.port> --host=<defaults.host> <vllm_args...>
|
||||||
@@ -54,6 +59,34 @@ models:
|
|||||||
- --enable-prefix-caching
|
- --enable-prefix-caching
|
||||||
- --kv-cache-dtype=fp8
|
- --kv-cache-dtype=fp8
|
||||||
|
|
||||||
|
gemma4-26b:
|
||||||
|
display_name: "Gemma 4 26B-A4B (vision, light)"
|
||||||
|
description: >-
|
||||||
|
Lighter, faster sibling of the Gemma 4 31B above: a Mixture-of-Experts
|
||||||
|
model with 26B total parameters but only ~4B active per token, so it
|
||||||
|
generates quickly. Takes images as well as text (good for tasks like
|
||||||
|
reading a business card into structured text). Reasoning is a bit
|
||||||
|
shallower than the dense 31B. Runs solo on one Spark.
|
||||||
|
repo: nvidia/Gemma-4-26B-A4B-NVFP4
|
||||||
|
size_gb: 17
|
||||||
|
mode: solo
|
||||||
|
capabilities: [vision, reasoning, tools]
|
||||||
|
expected_ready_seconds: 240
|
||||||
|
vllm_args:
|
||||||
|
- --gpu-memory-utilization=0.8
|
||||||
|
- --max-model-len=32768
|
||||||
|
- --max-num-batched-tokens=16384
|
||||||
|
- --reasoning-parser=gemma4
|
||||||
|
- --tool-call-parser=gemma4
|
||||||
|
- --enable-auto-tool-choice
|
||||||
|
# MoE backend: research found this model's expert layers fall back to
|
||||||
|
# 'marlin' on GB10 (the fast flashinfer_cutlass path errors on sm_121).
|
||||||
|
# If a swap fails to start, this flag is the first thing to flip.
|
||||||
|
- --moe_backend=marlin
|
||||||
|
- --load-format=fastsafetensors
|
||||||
|
- --enable-prefix-caching
|
||||||
|
- --kv-cache-dtype=fp8
|
||||||
|
|
||||||
qwen36:
|
qwen36:
|
||||||
display_name: "Qwen3.6 35B-A3B (daily driver)"
|
display_name: "Qwen3.6 35B-A3B (daily driver)"
|
||||||
description: >-
|
description: >-
|
||||||
@@ -63,7 +96,10 @@ models:
|
|||||||
repo: RedHatAI/Qwen3.6-35B-A3B-NVFP4
|
repo: RedHatAI/Qwen3.6-35B-A3B-NVFP4
|
||||||
size_gb: 20
|
size_gb: 20
|
||||||
mode: solo
|
mode: solo
|
||||||
capabilities: [reasoning]
|
# Qwen3.6-35B-A3B is natively multimodal (Qwen3_5MoeForConditionalGeneration,
|
||||||
|
# vision tower ships in the checkpoint). Confirmed reading a business card
|
||||||
|
# cleanly on this cluster — use the "Vision check" button on the live card.
|
||||||
|
capabilities: [vision, reasoning]
|
||||||
expected_ready_seconds: 300
|
expected_ready_seconds: 300
|
||||||
vllm_args:
|
vllm_args:
|
||||||
- --gpu-memory-utilization=0.85
|
- --gpu-memory-utilization=0.85
|
||||||
@@ -74,36 +110,8 @@ models:
|
|||||||
- --load-format=fastsafetensors
|
- --load-format=fastsafetensors
|
||||||
- --enable-prefix-caching
|
- --enable-prefix-caching
|
||||||
- --kv-cache-dtype=fp8
|
- --kv-cache-dtype=fp8
|
||||||
|
# Cap image resolution: a large phone photo (e.g. 12MP) otherwise expands
|
||||||
qwen3-235b-fp8:
|
# to ~11.8k vision tokens, blowing past vLLM's ~4096-image-token limit and
|
||||||
display_name: "Qwen3 235B-A22B FP8 (legacy)"
|
# getting rejected with a 400. ~2MP auto-downscales big images server-side
|
||||||
description: >-
|
# (so every /v1 consumer is covered) while staying sharp enough for OCR.
|
||||||
Earlier generation of the Qwen 235B family in native FP8 precision.
|
- '--mm-processor-kwargs={"max_pixels": 2000000}'
|
||||||
Runs across both Sparks. Mostly superseded by Qwen3-VL above; keep
|
|
||||||
around for text-only baseline comparisons.
|
|
||||||
repo: Qwen/Qwen3-235B-A22B-FP8
|
|
||||||
size_gb: 220
|
|
||||||
mode: cluster
|
|
||||||
capabilities: []
|
|
||||||
expected_ready_seconds: 360
|
|
||||||
vllm_args:
|
|
||||||
- --gpu-memory-utilization=0.7
|
|
||||||
- -tp=2
|
|
||||||
- --distributed-executor-backend=ray
|
|
||||||
- --max-model-len=32768
|
|
||||||
|
|
||||||
qwen25-72b:
|
|
||||||
display_name: "Qwen2.5 72B (legacy)"
|
|
||||||
description: >-
|
|
||||||
Last-generation 72B dense model. Cluster mode required due to size.
|
|
||||||
Kept for compatibility and baseline comparison against newer Qwens.
|
|
||||||
repo: Qwen/Qwen2.5-72B-Instruct
|
|
||||||
size_gb: 145
|
|
||||||
mode: cluster
|
|
||||||
capabilities: []
|
|
||||||
expected_ready_seconds: 360
|
|
||||||
vllm_args:
|
|
||||||
- --gpu-memory-utilization=0.7
|
|
||||||
- -tp=2
|
|
||||||
- --distributed-executor-backend=ray
|
|
||||||
- --max-model-len=32768
|
|
||||||
|
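Note on the `max_pixels` cap in the `qwen36` recipe: the figures in the comment can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes roughly one vision token per 32x32-pixel block. That ratio is a guess, chosen only because it lands close to the ~11.8k figure in the comment; the model's actual patch and merge sizes are not stated in this diff.

```python
from typing import Optional

# Assumption (not from the diff): about one vision token per 32x32 pixel block.
PIXELS_PER_TOKEN = 32 * 32
IMAGE_TOKEN_LIMIT = 4096  # the "~4096-image-token limit" cited in the recipe comment


def approx_vision_tokens(pixels: int) -> int:
    """Rough token estimate for an image with the given pixel count."""
    return pixels // PIXELS_PER_TOKEN


def fits(pixels: int, max_pixels: Optional[int] = None) -> bool:
    """True if the image, downscaled to at most max_pixels, stays under the limit."""
    effective = pixels if max_pixels is None else min(pixels, max_pixels)
    return approx_vision_tokens(effective) <= IMAGE_TOKEN_LIMIT


phone_photo = 12_000_000  # a 12MP phone photo
print(approx_vision_tokens(phone_photo))  # 11718 under the assumed ratio
print(approx_vision_tokens(2_000_000))    # 1953 at the 2MP cap
print(fits(phone_photo), fits(phone_photo, max_pixels=2_000_000))  # False True
```

Under that assumption an uncapped 12MP image is well over the limit, while the 2MP cap leaves roughly half of the budget unused.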
|||||||
@@ -15,3 +15,6 @@ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
|||||||
os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
|
os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
|
||||||
os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
|
os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
|
||||||
os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
|
os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
|
||||||
|
# Keep the in-app settings overlay off the container-only /data path; tests that
|
||||||
|
# care about its contents point it at their own tmp file via monkeypatch.
|
||||||
|
os.environ.setdefault("APP_SETTINGS_FILE", "/tmp/spark_control_test_app_settings.json")
|
||||||
|
|||||||
@@ -0,0 +1,174 @@
|
|||||||
|
"""In-app settings overlay (the dashboard 'gear') + swap-lock routing regression.
|
||||||
|
|
||||||
|
Covers app_settings (the /data overlay backing the gear): first-run seeding from
|
||||||
|
env (the migration path), known-key filtering, apply() validation, secret
|
||||||
|
masking — and, end-to-end via TestClient, that POST /api/settings reloads the
|
||||||
|
shared Settings instance live, and that GET /api/swap/lock is no longer shadowed
|
||||||
|
by /api/swap/{job_id}.
|
||||||
|
"""
|
||||||
|
import json
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from app import app_settings
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def overlay_file(tmp_path, monkeypatch):
|
||||||
|
p = tmp_path / "app_settings.json"
|
||||||
|
monkeypatch.setenv("APP_SETTINGS_FILE", str(p))
|
||||||
|
return p
|
||||||
|
|
||||||
|
|
||||||
|
# ---- overlay store ----
|
||||||
|
|
||||||
|
def test_seed_from_env_filters_unknown_and_blank(overlay_file):
|
||||||
|
# An existing install upgrading in: values previously set via the StartOS
|
||||||
|
# action arrive as env; only known, non-empty keys migrate into the overlay.
|
||||||
|
app_settings.seed_from_env({
|
||||||
|
"VLLM_PORT": "8000",
|
||||||
|
"QDRANT_COLLECTION": "", # blank → skipped
|
||||||
|
"TOTALLY_UNKNOWN": "x", # not a gear key → skipped
|
||||||
|
"PARAKEET_PORT": "8010",
|
||||||
|
})
|
||||||
|
expected = {"VLLM_PORT": "8000", "PARAKEET_PORT": "8010"}
|
||||||
|
assert app_settings.load_overlay() == expected
|
||||||
|
assert json.loads(overlay_file.read_text()) == expected
|
||||||
|
|
||||||
|
|
||||||
|
def test_seed_is_a_one_time_noop_when_file_present(overlay_file):
|
||||||
|
overlay_file.write_text(json.dumps({"VLLM_PORT": "8000", "BOGUS": "y", "NGC_API_KEY": ""}))
|
||||||
|
app_settings.seed_from_env({"VLLM_PORT": "9999"}) # file exists ⇒ no-op
|
||||||
|
# unknown + blank keys dropped on read; existing value untouched by the seed.
|
||||||
|
assert app_settings.load_overlay() == {"VLLM_PORT": "8000"}
|
||||||
|
|
||||||
|
|
||||||
|
def test_no_file_is_empty_and_seed_of_blank_env_writes_nothing(overlay_file):
|
||||||
|
assert app_settings.load_overlay() == {}
|
||||||
|
app_settings.seed_from_env({"VLLM_PORT": "", "QDRANT_COLLECTION": ""})
|
||||||
|
assert not overlay_file.exists() # nothing worth seeding ⇒ no file
|
||||||
|
assert app_settings.load_overlay() == {}
|
||||||
|
|
||||||
|
|
||||||
|
def test_apply_set_then_blank_deletes(overlay_file):
|
||||||
|
app_settings.apply({"VLLM_PORT": "8000"})
|
||||||
|
assert app_settings.load_overlay()["VLLM_PORT"] == "8000"
|
||||||
|
app_settings.apply({"VLLM_PORT": ""}) # blank non-secret ⇒ revert to default
|
||||||
|
assert "VLLM_PORT" not in app_settings.load_overlay()
|
||||||
|
|
||||||
|
|
||||||
|
def test_apply_rejects_unknown_key(overlay_file):
|
||||||
|
with pytest.raises(app_settings.SettingsError):
|
||||||
|
app_settings.apply({"NOT_A_KNOB": "x"})
|
||||||
|
|
||||||
|
|
||||||
|
def test_apply_rejects_non_numeric_port(overlay_file):
|
||||||
|
with pytest.raises(app_settings.SettingsError):
|
||||||
|
app_settings.apply({"PARAKEET_PORT": "80x0"})
|
||||||
|
|
||||||
|
|
||||||
|
def test_apply_rejects_control_chars(overlay_file):
|
||||||
|
with pytest.raises(app_settings.SettingsError):
|
||||||
|
app_settings.apply({"QDRANT_COLLECTION": "a\nb"})
|
||||||
|
|
||||||
|
|
||||||
|
def test_secret_blank_keeps_existing(overlay_file):
|
||||||
|
app_settings.apply({"NGC_API_KEY": "nvapi-abc"})
|
||||||
|
app_settings.apply({"NGC_API_KEY": ""}) # blank secret ⇒ leave it in place
|
||||||
|
assert app_settings.load_overlay()["NGC_API_KEY"] == "nvapi-abc"
|
||||||
|
|
||||||
|
|
||||||
|
def test_apply_rejects_out_of_range_port(overlay_file):
|
||||||
|
for bad in ("0", "99999", "65536"):
|
||||||
|
with pytest.raises(app_settings.SettingsError):
|
||||||
|
app_settings.apply({"VLLM_PORT": bad})
|
||||||
|
|
||||||
|
|
||||||
|
def test_apply_accepts_port_bounds(overlay_file):
|
||||||
|
app_settings.apply({"VLLM_PORT": "1", "PARAKEET_PORT": "65535"})
|
||||||
|
o = app_settings.load_overlay()
|
||||||
|
assert o["VLLM_PORT"] == "1" and o["PARAKEET_PORT"] == "65535"
|
||||||
|
|
||||||
|
|
||||||
|
def test_secret_clear_sentinel_removes(overlay_file):
|
||||||
|
app_settings.apply({"NGC_API_KEY": "nvapi-abc"})
|
||||||
|
app_settings.apply({"NGC_API_KEY": app_settings.CLEAR_SENTINEL})
|
||||||
|
assert "NGC_API_KEY" not in app_settings.load_overlay()
|
||||||
|
|
||||||
|
|
||||||
|
def test_seed_skips_invalid_and_strips(overlay_file):
|
||||||
|
app_settings.seed_from_env({
|
||||||
|
"VLLM_PORT": "8000\n", # trailing newline → stripped
|
||||||
|
"PARAKEET_PORT": "99999", # out of range → skipped, not written
|
||||||
|
"QDRANT_COLLECTION": "crm",
|
||||||
|
})
|
||||||
|
o = app_settings.load_overlay()
|
||||||
|
assert o["VLLM_PORT"] == "8000"
|
||||||
|
assert "PARAKEET_PORT" not in o
|
||||||
|
assert o["QDRANT_COLLECTION"] == "crm"
|
||||||
|
|
||||||
|
|
||||||
|
def test_public_view_exposes_clear_sentinel(overlay_file):
|
||||||
|
assert app_settings.public_view()["clear_sentinel"] == app_settings.CLEAR_SENTINEL
|
||||||
|
|
||||||
|
|
||||||
|
def test_public_view_masks_secrets_and_groups(overlay_file):
|
||||||
|
app_settings.apply({"NGC_API_KEY": "nvapi-abc", "VLLM_PORT": "8000"})
|
||||||
|
view = app_settings.public_view()
|
||||||
|
fields = {f["key"]: f for g in view["groups"] for f in g["fields"]}
|
||||||
|
# Secret: value never echoed to the browser, only a set flag.
|
||||||
|
assert "value" not in fields["NGC_API_KEY"]
|
||||||
|
assert fields["NGC_API_KEY"]["set"] is True
|
||||||
|
# Non-secret: current value present for prefill.
|
||||||
|
assert fields["VLLM_PORT"]["value"] == "8000"
|
||||||
|
assert {g["name"] for g in view["groups"]} >= {"vLLM (Spark 1)", "Integrations"}
|
||||||
|
# The previously-missing support-service ports are now exposed.
|
||||||
|
assert {"PARAKEET_PORT", "KOKORO_PORT", "EMBED_PORT", "QDRANT_PORT"} <= set(fields)
|
||||||
|
|
||||||
|
|
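Note on the overlay-store tests above: they pin down the validation contract (ports must be numeric and within 1..65535, values are stripped, control characters are rejected) without showing the validator itself. One way those rules could be written is sketched below. `SketchSettingsError`, `validate_port` and `validate_text` are hypothetical names, not the functions in `app/app_settings.py`, and blank-value handling (delete for normal keys, keep for secrets) is deliberately left out.

```python
class SketchSettingsError(ValueError):
    """Hypothetical stand-in for the app's SettingsError."""


def validate_text(raw: str) -> str:
    """Strip surrounding whitespace, then reject embedded control characters."""
    value = raw.strip()
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
        raise SketchSettingsError("control characters are not allowed")
    return value


def validate_port(raw: str) -> str:
    """Return the port as a clean string, or raise if it is not in 1..65535."""
    value = validate_text(raw)
    # isascii() guards against digit-like Unicode that int() would reject.
    if not (value.isascii() and value.isdigit()):
        raise SketchSettingsError(f"port must be numeric: {value!r}")
    if not 1 <= int(value) <= 65535:
        raise SketchSettingsError(f"port out of range: {value}")
    return value
```

Returning the cleaned string rather than an `int` matches how the tests compare overlay values (`"8000"`, not `8000`).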
||||||
|
# ---- end-to-end (TestClient): live reload + route order ----
|
||||||
|
# TestClient is created without the `with` context manager so app startup events
|
||||||
|
# (the deep-health poll loop) don't run — these stay fully offline.
|
||||||
|
|
||||||
|
def _client(monkeypatch, tmp_path):
|
||||||
|
monkeypatch.setenv("APP_SETTINGS_FILE", str(tmp_path / "live.json"))
|
||||||
|
from fastapi.testclient import TestClient
|
||||||
|
from app import server
|
||||||
|
return TestClient(server.app)
|
||||||
|
|
||||||
|
|
||||||
|
def test_swap_lock_get_is_not_shadowed(monkeypatch, tmp_path):
|
||||||
|
client = _client(monkeypatch, tmp_path)
|
||||||
|
r = client.get("/api/swap/lock")
|
||||||
|
# Regression: must hit get_swap_lock (200, {"held": False}), NOT the
|
||||||
|
# /api/swap/{job_id} catch-all that returns 404 "no such job".
|
||||||
|
assert r.status_code == 200
|
||||||
|
assert r.json() == {"held": False}
|
||||||
|
|
||||||
|
|
||||||
|
def test_settings_apply_is_live_without_restart(monkeypatch, tmp_path):
|
||||||
|
client = _client(monkeypatch, tmp_path)
|
||||||
|
r = client.post("/api/settings", json={"values": {"VLLM_PORT": "8123"}})
|
||||||
|
assert r.status_code == 200
|
||||||
|
# Settings reloaded in place ⇒ /api/config reflects it immediately.
|
||||||
|
assert client.get("/api/config").json()["vllm_port"] == 8123
|
||||||
|
# And clearing it reverts to the default, still live.
|
||||||
|
client.post("/api/settings", json={"values": {"VLLM_PORT": ""}})
|
||||||
|
assert client.get("/api/config").json()["vllm_port"] == 8888
|
||||||
|
|
||||||
|
|
||||||
|
def test_settings_post_rejects_bad_value(monkeypatch, tmp_path):
|
||||||
|
client = _client(monkeypatch, tmp_path)
|
||||||
|
r = client.post("/api/settings", json={"values": {"PARAKEET_PORT": "nope"}})
|
||||||
|
assert r.status_code == 422
|
||||||
|
|
||||||
|
|
||||||
|
def test_webhook_notifier_repoints_live(monkeypatch, tmp_path):
|
||||||
|
# WebhookNotifier snapshots url/secret, so reload() alone can't reach it;
|
||||||
|
# post_settings must re-point it. Regression for that P1.
|
||||||
|
client = _client(monkeypatch, tmp_path)
|
||||||
|
from app import server
|
||||||
|
client.post("/api/settings", json={"values": {"SWAP_WEBHOOK_URL": "https://example.test/hook"}})
|
||||||
|
assert server.swap_webhook.url == "https://example.test/hook"
|
||||||
|
assert server.swap_webhook.enabled
|
||||||
|
client.post("/api/settings", json={"values": {"SWAP_WEBHOOK_URL": ""}})
|
||||||
|
assert server.swap_webhook.url == ""
|
||||||
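Note on `test_swap_lock_get_is_not_shadowed`: the regression is a route-ordering problem. When routes are tried in registration order and the first match wins, a parameterized path such as `/api/swap/{job_id}` registered before the static `/api/swap/lock` swallows it. The toy router below illustrates the effect without any web framework. `FirstMatchRouter` and both handlers are hypothetical, and this is a model of first-match dispatch, not the app's actual routing code.

```python
import re
from typing import Callable, List, Optional, Tuple


class FirstMatchRouter:
    """Toy router: routes are tried in registration order, first match wins."""

    def __init__(self) -> None:
        self._routes: List[Tuple["re.Pattern[str]", Callable[..., dict]]] = []

    def add(self, template: str, handler: Callable[..., dict]) -> None:
        # Turn "/api/swap/{job_id}" into a regex with one named group per parameter.
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
        self._routes.append((re.compile(pattern), handler))

    def dispatch(self, path: str) -> Optional[dict]:
        for pattern, handler in self._routes:
            match = pattern.fullmatch(path)
            if match:
                return handler(**match.groupdict())
        return None


def get_swap_lock() -> dict:
    return {"held": False}


def get_job(job_id: str) -> dict:
    return {"detail": f"no such job: {job_id}"}


shadowed = FirstMatchRouter()
shadowed.add("/api/swap/{job_id}", get_job)     # catch-all registered first
shadowed.add("/api/swap/lock", get_swap_lock)   # never reached

fixed = FirstMatchRouter()
fixed.add("/api/swap/lock", get_swap_lock)      # static path first
fixed.add("/api/swap/{job_id}", get_job)

print(shadowed.dispatch("/api/swap/lock"))  # {'detail': 'no such job: lock'}
print(fixed.dispatch("/api/swap/lock"))     # {'held': False}
```

Registering the static path first is the smallest fix; constraining the parameter so it cannot match `lock` would be an alternative.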
@@ -0,0 +1,201 @@
|
|||||||
|
"""Coordination layer: swap lock lifecycle/expiry, schedule registry CRUD, and
|
||||||
|
the webhook payload+signature. All offline — the lock takes an injectable `now`
|
||||||
|
so expiry is tested without sleeping, and the webhook is exercised only on the
|
||||||
|
disabled (no-network) path plus its pure payload/signature helpers.
|
||||||
|
"""
|
||||||
|
import asyncio
|
||||||
|
from datetime import datetime, timedelta, timezone
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from app.coordination import (
|
||||||
|
LOCK_TTL_MAX,
|
||||||
|
LOCK_TTL_MIN,
|
||||||
|
LockHeld,
|
||||||
|
ScheduleRegistry,
|
||||||
|
SwapLockManager,
|
||||||
|
WebhookNotifier,
|
||||||
|
build_webhook_payload,
|
||||||
|
sign_payload,
|
||||||
|
valid_schedule_id,
|
||||||
|
)
|
||||||
|
|
||||||
|
T0 = datetime(2026, 6, 17, 12, 0, 0, tzinfo=timezone.utc)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------- swap lock ----
|
||||||
|
|
||||||
|
def test_acquire_free_lock_returns_token_and_status_held():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
lock = mgr.acquire("openclaw", ttl_seconds=60, note="daily vol", now=T0)
|
||||||
|
assert lock.token
|
||||||
|
st = mgr.status(now=T0)
|
||||||
|
assert st["held"] is True
|
||||||
|
assert st["holder"] == "openclaw"
|
||||||
|
assert st["note"] == "daily vol"
|
||||||
|
assert st["seconds_remaining"] == 60
|
||||||
|
assert "token" not in st # public view never leaks the token
|
||||||
|
|
||||||
|
|
||||||
|
def test_acquire_requires_holder():
|
||||||
|
with pytest.raises(ValueError):
|
||||||
|
SwapLockManager().acquire(" ", now=T0)
|
||||||
|
|
||||||
|
|
||||||
|
def test_acquire_held_by_other_raises_lockheld_with_state():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
|
||||||
|
with pytest.raises(LockHeld) as ei:
|
||||||
|
mgr.acquire("johnny5", ttl_seconds=60, now=T0)
|
||||||
|
assert ei.value.state["holder"] == "openclaw"
|
||||||
|
|
||||||
|
|
||||||
|
def test_reacquire_with_token_extends_and_keeps_token():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
first = mgr.acquire("openclaw", ttl_seconds=60, now=T0)
|
||||||
|
later = T0 + timedelta(seconds=30)
|
||||||
|
second = mgr.acquire("openclaw", ttl_seconds=60, token=first.token, now=later)
|
||||||
|
assert second.token == first.token
|
||||||
|
# window extended from the later moment, not the original
|
||||||
|
assert mgr.status(now=later)["seconds_remaining"] == 60
|
||||||
|
assert second.acquired_at == first.acquired_at # acquired_at preserved
|
||||||
|
|
||||||
|
|
||||||
|
def test_reacquire_without_token_is_refused_even_for_same_holder_name():
|
||||||
|
# Holder name is descriptive, not a secret — matching it must not grant access.
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
|
||||||
|
with pytest.raises(LockHeld):
|
||||||
|
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
|
||||||
|
|
||||||
|
|
||||||
|
def test_ttl_is_clamped():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
mgr.acquire("a", ttl_seconds=0, now=T0)
|
||||||
|
assert mgr.status(now=T0)["seconds_remaining"] == LOCK_TTL_MIN
|
||||||
|
mgr2 = SwapLockManager()
|
||||||
|
mgr2.acquire("b", ttl_seconds=10**9, now=T0)
|
||||||
|
assert mgr2.status(now=T0)["seconds_remaining"] == LOCK_TTL_MAX
|
||||||
|
|
||||||
|
|
||||||
|
def test_lock_expires_and_clears_lazily():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
tok = mgr.acquire("openclaw", ttl_seconds=10, now=T0).token
|
||||||
|
after = T0 + timedelta(seconds=11)
|
||||||
|
assert mgr.status(now=after) == {"held": False}
|
||||||
|
assert mgr.verify(tok, now=after) is False
|
||||||
|
# an expired lock is free to re-take by anyone
|
||||||
|
mgr.acquire("johnny5", ttl_seconds=10, now=after)
|
||||||
|
assert mgr.status(now=after)["holder"] == "johnny5"
|
||||||
|
|
||||||
|
|
||||||
|
def test_verify_matches_only_active_token():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
tok = mgr.acquire("openclaw", ttl_seconds=60, now=T0).token
|
||||||
|
assert mgr.verify(tok, now=T0) is True
|
||||||
|
assert mgr.verify("nope", now=T0) is False
|
||||||
|
assert mgr.verify(None, now=T0) is False
|
||||||
|
|
||||||
|
|
||||||
|
def test_release_requires_token_then_frees():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
tok = mgr.acquire("openclaw", ttl_seconds=60, now=T0).token
|
||||||
|
with pytest.raises(PermissionError):
|
||||||
|
mgr.release("wrong", now=T0)
|
||||||
|
assert mgr.release(tok, now=T0) is True
|
||||||
|
assert mgr.status(now=T0) == {"held": False}
|
||||||
|
|
||||||
|
|
||||||
|
def test_force_release_skips_token_and_release_of_free_lock_is_false():
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
mgr.acquire("openclaw", ttl_seconds=60, now=T0)
|
||||||
|
assert mgr.release(force=True, now=T0) is True
|
||||||
|
assert mgr.release(force=True, now=T0) is False # nothing held now
|
||||||
|
|
||||||
|
|
||||||
|
def test_is_blocked_by_is_the_swap_gate():
|
||||||
|
# Mirrors the single-read decision the /api/swap endpoint makes.
|
||||||
|
mgr = SwapLockManager()
|
||||||
|
assert mgr.is_blocked_by(None, now=T0) is None # free lock blocks nobody
|
||||||
|
tok = mgr.acquire("openclaw", ttl_seconds=10, now=T0).token
|
||||||
|
blocked = mgr.is_blocked_by(None, now=T0) # no token -> blocked
|
||||||
|
assert blocked is not None and blocked["holder"] == "openclaw"
|
||||||
|
assert mgr.is_blocked_by("wrong", now=T0) is not None # wrong token -> blocked
|
||||||
|
assert mgr.is_blocked_by(tok, now=T0) is None # holder's token -> allowed
|
||||||
|
# At/after expiry the gate is open even without a token (the bug a separate
|
||||||
|
# status()+verify() pair would get wrong).
|
||||||
|
assert mgr.is_blocked_by(None, now=T0 + timedelta(seconds=11)) is None
|
||||||
|
|
||||||
|
|
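Note on the swap-lock tests above: they describe a token-guarded lease with an injectable `now`, a clamped TTL, lazy expiry, and a single-read gate. A compact sketch that satisfies the same behaviours is given below. `SketchLockManager`, `SketchLockHeld`, and the `TTL_MIN` / `TTL_MAX` values are hypothetical; the real `SwapLockManager` and its `LOCK_TTL_MIN` / `LOCK_TTL_MAX` constants are not shown in this diff.

```python
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

TTL_MIN, TTL_MAX = 1, 3600  # hypothetical bounds, in seconds


class SketchLockHeld(Exception):
    def __init__(self, state: dict) -> None:
        super().__init__("lock held")
        self.state = state


@dataclass
class Lease:
    holder: str
    token: str
    acquired_at: datetime
    expires_at: datetime


class SketchLockManager:
    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def _active(self, now: datetime) -> Optional[Lease]:
        # Lazy expiry: an expired lease is cleared the next time anyone looks.
        if self._lease is not None and now >= self._lease.expires_at:
            self._lease = None
        return self._lease

    @staticmethod
    def _matches(token: Optional[str], lease: Lease) -> bool:
        return token is not None and hmac.compare_digest(
            token.encode(), lease.token.encode())

    def _view(self, lease: Lease, now: datetime) -> dict:
        # Public view: the token is deliberately left out.
        remaining = int((lease.expires_at - now).total_seconds())
        return {"held": True, "holder": lease.holder, "seconds_remaining": remaining}

    def acquire(self, holder: str, ttl_seconds: int = 60,
                token: Optional[str] = None, now: Optional[datetime] = None) -> Lease:
        now = now or datetime.now(timezone.utc)
        if not holder.strip():
            raise ValueError("holder is required")
        ttl = timedelta(seconds=min(max(ttl_seconds, TTL_MIN), TTL_MAX))
        lease = self._active(now)
        if lease is not None:
            # The holder name is descriptive, so only the token can renew.
            if not self._matches(token, lease):
                raise SketchLockHeld(self._view(lease, now))
            lease.expires_at = now + ttl  # extend from now, keep token and acquired_at
            return lease
        self._lease = Lease(holder.strip(), secrets.token_urlsafe(16), now, now + ttl)
        return self._lease

    def status(self, now: Optional[datetime] = None) -> dict:
        now = now or datetime.now(timezone.utc)
        lease = self._active(now)
        return {"held": False} if lease is None else self._view(lease, now)

    def is_blocked_by(self, token: Optional[str],
                      now: Optional[datetime] = None) -> Optional[dict]:
        now = now or datetime.now(timezone.utc)
        lease = self._active(now)  # one read decides both "held?" and "whose?"
        if lease is None or self._matches(token, lease):
            return None
        return self._view(lease, now)
```

Deciding the gate from a single `_active()` read avoids the window in which separate status and verify calls could disagree around the expiry instant.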
||||||
|
# ------------------------------------------------------------------- webhook ----
|
||||||
|
|
||||||
|
def test_build_webhook_payload_shape():
|
||||||
|
p = build_webhook_payload(
|
||||||
|
event="swap_complete", job_id="abc123", model_key="gemma",
|
||||||
|
state="ready", returncode=0, started_at="t0", finished_at="t1",
|
||||||
|
dry_run=False,
|
||||||
|
)
|
||||||
|
assert p == {
|
||||||
|
"event": "swap_complete", "job_id": "abc123", "model_key": "gemma",
|
||||||
|
"state": "ready", "returncode": 0, "started_at": "t0",
|
||||||
|
"finished_at": "t1", "dry_run": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def test_sign_payload_is_deterministic_and_prefixed():
|
||||||
|
body = b'{"event":"swap_complete"}'
|
||||||
|
sig = sign_payload("s3cr3t", body)
|
||||||
|
assert sig.startswith("sha256=")
|
||||||
|
assert sig == sign_payload("s3cr3t", body)
|
||||||
|
assert sig != sign_payload("other", body)
|
||||||
|
|
||||||
|
|
||||||
|
def test_disabled_webhook_fire_is_noop():
|
||||||
|
n = WebhookNotifier("", "")
|
||||||
|
assert n.enabled is False
|
||||||
|
# Must not attempt any network call or raise when no URL is configured.
|
||||||
|
assert asyncio.run(n.fire("swap_complete", {"x": 1})) is None
|
||||||
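The signing test above only confirms that the signature is deterministic, depends on the secret, and carries a `sha256=` prefix. One common scheme with those properties is HMAC-SHA256 over the raw request body. The sketch below assumes that scheme; the function names `sketch_sign_payload` and `sketch_verify_signature` are hypothetical and this is not necessarily how the project's `sign_payload` is implemented.

```python
import hashlib
import hmac


def sketch_sign_payload(secret: str, body: bytes) -> str:
    # Assumption: HMAC-SHA256 keyed with the shared secret, hex-encoded.
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def sketch_verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    # Receiver side: recompute over the exact bytes received and compare in
    # constant time so the check does not leak how many characters matched.
    expected = sketch_sign_payload(secret, body)
    return hmac.compare_digest(expected, header_value)
```

A receiver would call `sketch_verify_signature` with the unparsed body, since re-serializing the JSON can change the bytes and invalidate the signature.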


# --------------------------------------------------------- schedule registry ----

def test_register_and_list_schedule():
    reg = ScheduleRegistry()
    e = reg.register(name="Daily Vol", owner="openclaw", cron="0 6 * * *")
    assert e.id and e.registered_at and e.updated_at
    listed = reg.list()
    assert len(listed) == 1 and listed[0]["name"] == "Daily Vol"


def test_register_with_id_updates_in_place():
    reg = ScheduleRegistry()
    reg.register(name="Daily Vol", id="dv", owner="openclaw", cron="0 6 * * *")
    reg.register(name="Daily Vol v2", id="dv", owner="openclaw", cron="0 7 * * *")
    listed = reg.list()
    assert len(listed) == 1
    assert listed[0]["name"] == "Daily Vol v2" and listed[0]["cron"] == "0 7 * * *"


def test_register_requires_name_and_validates_id():
    reg = ScheduleRegistry()
    with pytest.raises(ValueError):
        reg.register(name=" ")
    with pytest.raises(ValueError):
        reg.register(name="ok", id="bad id; rm -rf")


def test_delete_schedule():
    reg = ScheduleRegistry()
    reg.register(name="Daily Vol", id="dv")
    assert reg.delete("dv") is True
    assert reg.delete("dv") is False
    assert reg.list() == []


def test_valid_schedule_id():
    assert valid_schedule_id("daily-vol")
    assert valid_schedule_id("a.b_c-1")
    assert not valid_schedule_id("")
    assert not valid_schedule_id("../etc")
    assert not valid_schedule_id("has space")
    assert not valid_schedule_id("x" * 65)
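The id checks above accept letters, digits, `.`, `_` and `-`, and reject empty strings, path separators, whitespace, and a 65-character value. An allow-list regular expression is one way to get that behaviour. In this sketch the name `sketch_valid_schedule_id`, the 64-character cap, and the rule that the first character must be alphanumeric are assumptions; the tests only show that 65 characters is too long.

```python
import re

# Assumed policy: 1 to 64 characters, starting with a letter or digit,
# followed by letters, digits, dot, underscore, or hyphen.
_SCHEDULE_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def sketch_valid_schedule_id(value: str) -> bool:
    # fullmatch avoids the trailing-newline loophole that `$` allows.
    return isinstance(value, str) and _SCHEDULE_ID.fullmatch(value) is not None
```

Using an allow-list rather than a deny-list means characters such as `/`, `;` and spaces are rejected without having to enumerate them.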
@@ -0,0 +1,190 @@
"""Disk-driven menu helpers: cache-dir parsing + launch-recipe inference.

All offline — pure functions over a fake cache listing and fake config.json
dicts. The SSH scan, the menu merge, and the suggest endpoint that wire these
together are exercised by hand against the live cluster (mock-heavy unit tests of
those would test the mocks).
"""
import asyncio

from app import discovery
from app.config import Settings
from app.disk import DiskStatus, cache_dirname_to_repo, parse_cache_listing
from app.discovery import repo_to_key, infer_recipe, _detect_family
from app.models import load_catalog


# ---- cache dirname <-> repo ----

def test_cache_dirname_to_repo_roundtrip():
    assert cache_dirname_to_repo("models--RedHatAI--Qwen3.6-35B-A3B-NVFP4") == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"


def test_cache_dirname_name_with_double_dash():
    # The org is the first segment; everything after is the name (single '/').
    assert cache_dirname_to_repo("models--org--weird--name") == "org/weird--name"


def test_cache_dirname_rejects_non_model_dirs():
    assert cache_dirname_to_repo("datasets--foo--bar") is None
    assert cache_dirname_to_repo("models--onlyorg") is None
    assert cache_dirname_to_repo("random") is None
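The three tests above fix the mapping from a cache directory name to a repo id: strip the `models--` prefix, treat the first `--` as the org/name separator, and keep any later `--` inside the name. A minimal sketch of that rule follows; `sketch_cache_dirname_to_repo` is a hypothetical name and the real `cache_dirname_to_repo` may be written differently.

```python
from typing import Optional


def sketch_cache_dirname_to_repo(dirname: str) -> Optional[str]:
    prefix = "models--"
    if not dirname.startswith(prefix):
        return None  # datasets--..., or anything that is not a model dir
    # Split on the FIRST "--" only, so a name containing "--" survives intact.
    org, sep, name = dirname[len(prefix):].partition("--")
    if not sep or not org or not name:
        return None  # e.g. "models--onlyorg" has no name segment
    return f"{org}/{name}"
```

`str.partition` is used instead of `split("--")` because it returns exactly three parts and makes the "first separator wins" rule explicit.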


# ---- parse_cache_listing ----

def test_parse_cache_listing_complete_and_incomplete():
    out = (
        "20000000000|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4\n"
        "5000000000|0|models--some--half-downloaded\n"
        "\n"
        "garbage line with no pipes\n"
        "123|1|not-a-model-dir\n"
    )
    items = parse_cache_listing(out)
    assert items == [
        ("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20000000000, True),
        ("some/half-downloaded", 5000000000, False),
    ]


def test_parse_cache_listing_bad_size_defaults_zero():
    items = parse_cache_listing("notanumber|1|models--a--b")
    assert items == [("a/b", 0, True)]
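The listing format exercised above is one `size|complete|dirname` record per line, with blank lines, malformed lines, and non-model directories skipped, and an unparsable size falling back to zero. A tolerant parser along those lines might look like the sketch below. Both function names are hypothetical, and the small dirname helper is repeated here only to keep the block self-contained.

```python
from typing import List, Optional, Tuple


def _sketch_dirname_to_repo(dirname: str) -> Optional[str]:
    # Same rule as the cache-dirname tests: "models--org--name" -> "org/name".
    if not dirname.startswith("models--"):
        return None
    org, sep, name = dirname[len("models--"):].partition("--")
    return f"{org}/{name}" if sep and org and name else None


def sketch_parse_cache_listing(output: str) -> List[Tuple[str, int, bool]]:
    items = []
    for line in output.splitlines():
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # blank line or garbage without the two separators
        size_text, complete_flag, dirname = parts
        repo = _sketch_dirname_to_repo(dirname)
        if repo is None:
            continue  # not a model cache directory
        try:
            size = int(size_text)
        except ValueError:
            size = 0  # a bad size is tolerated rather than dropping the entry
        items.append((repo, size, complete_flag == "1"))
    return items
```

Skipping bad records instead of raising keeps one corrupt line in the remote listing from hiding every other model.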


# ---- repo_to_key ----

def test_repo_to_key_is_url_safe_and_stable():
    assert repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4") == "redhatai-qwen3-6-35b-a3b-nvfp4"
    # Idempotent enough to be a stable id across calls.
    assert repo_to_key("nvidia/Gemma-4-26B-A4B-NVFP4") == "nvidia-gemma-4-26b-a4b-nvfp4"
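Both expected keys above are consistent with a plain slug rule: lowercase the repo id, collapse every run of characters outside `a-z0-9` into a single hyphen, and trim hyphens at the ends. The sketch below assumes exactly that rule; `sketch_repo_to_key` is a hypothetical name and the project's `repo_to_key` may handle edge cases differently.

```python
import re


def sketch_repo_to_key(repo: str) -> str:
    # Lowercase first, then replace each run of non-alphanumerics with one "-".
    slug = re.sub(r"[^a-z0-9]+", "-", repo.lower())
    return slug.strip("-")
```

Because the function depends only on its input, the same repo id always yields the same key, which is what lets the key act as a stable menu id.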


# ---- family detection ----

def test_detect_qwen3_moe():
    cfg = {"architectures": ["Qwen3MoeForCausalLM"], "model_type": "qwen3_moe", "num_experts": 128}
    label, flags, caps = _detect_family(cfg)
    assert "--reasoning-parser=qwen3" in flags
    assert "--moe_backend=flashinfer_cutlass" in flags
    assert "reasoning" in caps
    assert "MoE" in label


def test_detect_gemma_moe_uses_marlin():
    cfg = {"architectures": ["Gemma4MoeForConditionalGeneration"], "model_type": "gemma4_moe", "num_local_experts": 8}
    label, flags, caps = _detect_family(cfg)
    assert "--reasoning-parser=gemma4" in flags
    assert "--tool-call-parser=gemma4" in flags
    assert "--moe_backend=marlin" in flags  # NOT flashinfer_cutlass — GB10 footgun
    assert "vision" in caps  # ConditionalGeneration => multimodal
    assert "tools" in caps


def test_detect_generic_has_no_family_flags():
    label, flags, caps = _detect_family({"architectures": ["LlamaForCausalLM"], "model_type": "llama"})
    assert flags == []
    assert label == "Generic"


def test_detect_vision_from_config_keys():
    _, _, caps = _detect_family({"model_type": "qwen3", "vision_config": {"x": 1}})
    assert "vision" in caps


# ---- infer_recipe (the prefill the setup form receives) ----

def test_infer_recipe_solo_small_model():
    cfg = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}
    rec = infer_recipe("RedHatAI/Qwen3.6-35B-A3B-NVFP4", cfg, total_bytes=20_000_000_000, on_host_count=1)
    assert rec["mode"] == "solo"
    assert rec["key"] == "redhatai-qwen3-6-35b-a3b-nvfp4"
    assert rec["repo"] == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
    assert "--reasoning-parser=qwen3" in rec["vllm_args"]
    assert "-tp=2" not in rec["vllm_args"]
    assert rec["knobs"]["kv_cache_dtype"] == "fp8"


def test_infer_recipe_cluster_when_on_both_hosts():
    rec = infer_recipe("org/big", {}, total_bytes=10_000_000_000, on_host_count=2)
    assert rec["mode"] == "cluster"
    assert "-tp=2" in rec["vllm_args"]
    assert "--distributed-executor-backend=ray" in rec["vllm_args"]
    assert rec["knobs"]["gpu_memory_utilization"] == 0.7


def test_infer_recipe_cluster_when_too_big_for_one_spark():
    rec = infer_recipe("org/huge", {}, total_bytes=200_000_000_000, on_host_count=1)
    assert rec["mode"] == "cluster"


# ---- build_menu merge (disk scan ∪ recipes) ----

def _both_spark_settings(monkeypatch) -> Settings:
    for k in ("SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER"):
        monkeypatch.delenv(k, raising=False)
    monkeypatch.setenv("SPARK1_HOST", "1.1.1.1")
    monkeypatch.setenv("SPARK1_USER", "u")
    monkeypatch.setenv("SPARK2_HOST", "2.2.2.2")
    monkeypatch.setenv("SPARK2_USER", "u")
    return Settings.from_env()


def test_build_menu_merges_recipe_discovered_and_hides_incomplete(monkeypatch):
    cat = load_catalog("models.yaml")  # bundled recipes incl. qwen36 + gemma4
    settings = _both_spark_settings(monkeypatch)

    async def fake_list(host, user, s):
        if host == "1.1.1.1":
            return [
                ("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20_000_000_000, True),  # recipe match
                ("someorg/mystery-7B", 7_000_000_000, True),  # needs setup
                ("broken/half", 1_000_000_000, False),  # incomplete -> hidden
            ]
        return []  # spark2 empty

    async def fake_probe(repo, mode, s, *, local_path=None):
        return DiskStatus(repo=local_path or repo, on_disk=False, total_bytes=0, per_host=[])

    monkeypatch.setattr(discovery, "list_cached_models", fake_list)
    monkeypatch.setattr(discovery, "probe_disk", fake_probe)

    menu = asyncio.run(discovery.build_menu(settings, cat))

    # Recipe-matched: keyed by recipe key, ready (not needs_setup), real size.
    assert "qwen36" in menu
    assert menu["qwen36"]["needs_setup"] is False
    assert menu["qwen36"]["total_bytes"] == 20_000_000_000

    # Discovered-without-recipe: slug key, needs_setup.
    slug = repo_to_key("someorg/mystery-7B")
    assert menu[slug]["needs_setup"] is True

    # Incomplete download is filtered out entirely.
    assert all("half" not in k for k in menu)

    # A recipe with nothing on disk (e.g. gemma4) must NOT appear — the menu is the disk.
    assert "gemma4" not in menu


def test_build_menu_sums_cluster_model_across_both_sparks(monkeypatch):
    cat = load_catalog("models.yaml")
    settings = _both_spark_settings(monkeypatch)

    async def fake_list(host, user, s):
        # Same repo present on BOTH Sparks — one card, sizes summed (not two cards).
        return [("org/sharded-235B", 70_000_000_000, True)]

    async def fake_probe(repo, mode, s, *, local_path=None):
        return DiskStatus(repo=repo, on_disk=False, total_bytes=0, per_host=[])

    monkeypatch.setattr(discovery, "list_cached_models", fake_list)
    monkeypatch.setattr(discovery, "probe_disk", fake_probe)

    menu = asyncio.run(discovery.build_menu(settings, cat))
    key = repo_to_key("org/sharded-235B")
    assert list(menu) == [key]  # exactly one card
    assert menu[key]["total_bytes"] == 140_000_000_000  # summed across both hosts
    assert len(menu[key]["per_host"]) == 2
    assert menu[key]["mode"] == "cluster"  # present on 2 hosts -> cluster
@@ -0,0 +1,35 @@
"""build_download_command: the ~/.local/bin PATH fix + shell-injection quoting.

hf-download.sh on the Spark shells out to `uvx`, which the uv installer puts in
~/.local/bin — off the PATH of our non-interactive SSH session. The command must
prepend ~/.local/bin (via $HOME, expanded server-side) or the download dies with
"uvx: command not found". The repo value must also be shlex-quoted at the sink so
a crafted value can't break out of the command (validate_repo gates it upstream).
"""
import shlex

from app.download import build_download_command


def test_prepends_local_bin_to_path():
    cmd = build_download_command("org/name")
    assert cmd.startswith('export PATH="$HOME/.local/bin:$PATH" && ')
    assert "cd ~/spark-vllm-docker" in cmd
    assert "./hf-download.sh org/name" in cmd


def test_no_trailing_space_without_flags():
    assert build_download_command("org/name", "").endswith("./hf-download.sh org/name")


def test_cluster_flags_appended():
    cmd = build_download_command("org/name", "-c --copy-parallel")
    assert cmd.endswith("./hf-download.sh org/name -c --copy-parallel")


def test_repo_is_shlex_quoted():
    # Everything after the script name must shlex-split back to the exact repo,
    # the same round-trip invariant build_launch_command relies on.
    cmd = build_download_command("org/na;me")
    after = cmd.split("./hf-download.sh ", 1)[1]
    assert shlex.split(after) == ["org/na;me"]
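The docstring and tests above describe two requirements for the remote download command: prepend `~/.local/bin` to `PATH` so `uvx` is found in a non-interactive SSH session, and quote the repo at the point where the string is assembled. The sketch below shows one way to satisfy both with `shlex.quote`. The name `sketch_build_download_command` is hypothetical, joining the steps with `&&` after the `cd` is an assumption, and the `flags` argument is assumed to be a fixed, trusted string chosen by the application rather than user input.

```python
import shlex


def sketch_build_download_command(repo: str, flags: str = "") -> str:
    # $HOME is left unexpanded on purpose: the remote shell resolves it.
    cmd = (
        'export PATH="$HOME/.local/bin:$PATH" && '
        "cd ~/spark-vllm-docker && "
        f"./hf-download.sh {shlex.quote(repo)}"
    )
    flags = flags.strip()
    if flags:
        # Assumed trusted and pre-formatted; appended without re-quoting.
        cmd += f" {flags}"
    return cmd
```

`shlex.quote` leaves a plain value such as `org/name` untouched and wraps anything containing shell metacharacters in single quotes, which is why the round trip through `shlex.split` recovers the original repo string.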
@@ -7,6 +7,9 @@ the command back into the exact token list. The vLLM pre-flight validator
"""
import shlex

import pytest
from pydantic import ValidationError

from app.models import Defaults, ModelDef, build_launch_command

DEFAULTS = Defaults(port=8888, host="0.0.0.0")
@@ -65,3 +68,81 @@ def test_injection_via_vllm_arg_stays_literal():
    payload = "--foo=$(touch /tmp/pwned)"
    cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
    assert payload in shlex.split(cmd)  # preserved as one inert token


# ---- local / fine-tuned models (served by directory, not HF repo) ----

def test_local_model_bind_mounts_dir_and_serves_the_path():
    m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
    cmd = build_launch_command("k", m, DEFAULTS)
    tokens = shlex.split(cmd)
    # The launch script's hook bind-mounts the host dir at the SAME container path.
    assert tokens[0] == (
        "VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
    )
    # vLLM is pointed at the directory, not an HF repo id.
    i = tokens.index("serve")
    assert tokens[i + 1] == "/home/u/models/ft-v2"
    assert "--max-model-len=2048" in tokens


def test_local_model_chat_template_arg_survives_round_trip():
    m = _model(
        repo="",
        local_path="/m/ft",
        vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
    )
    cmd = build_launch_command("k", m, DEFAULTS)
    assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)


def test_local_path_with_metacharacters_is_quoted_not_executed():
    # The validator rejects a hostile path at the boundary; bypass it with
    # model_construct to prove the quote_arg sink is safe in depth even if a bad
    # value somehow reaches build_launch_command.
    evil = "/m/ft; rm -rf ~"
    m = ModelDef.model_construct(
        display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
        vllm_args=[], knobs=None, custom=False, capabilities=[],
        expected_ready_seconds=300, description=None,
    )
    cmd = build_launch_command("k", m, DEFAULTS)
    tokens = shlex.split(cmd)
    i = tokens.index("serve")
    assert tokens[i + 1] == evil  # recovered as one literal token, not executed
    assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"


def test_model_requires_exactly_one_source():
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", size_gb=1, mode="solo")  # neither repo nor local_path
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo")  # both


def test_local_model_rejects_chat_template_outside_dir():
    # Only local_path is mounted into the container, so a chat-template elsewhere
    # would silently 404 inside vLLM — reject it up front.
    with pytest.raises(ValidationError):
        ModelDef(
            display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
            vllm_args=["--chat-template=/other/dir/t.jinja"],
        )


def test_invalid_local_path_rejected_by_model():
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")


def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
    # YAML/override-added local models get the same validation as the API; a single
    # bad entry is skipped (logged) rather than breaking the whole catalog load.
    from app import models as M
    monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
        {"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
        {"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
    ]})
    cat = M._merge_overrides(M.Catalog(models={}))
    assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
    assert "bad" not in cat.models  # traversal path skipped, not catalog-fatal
@@ -6,7 +6,12 @@ use `validate_x(v)` inline.
|
|||||||
"""
|
"""
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from app.shellsafe import validate_container, validate_image, validate_repo
|
from app.shellsafe import (
|
||||||
|
validate_container,
|
||||||
|
validate_image,
|
||||||
|
validate_local_path,
|
||||||
|
validate_repo,
|
||||||
|
)
|
||||||
|
|
||||||
# Shell metacharacters that must never survive any validator — these are the
|
# Shell metacharacters that must never survive any validator — these are the
|
||||||
# actual injection vectors. (Path traversal like "../" is NOT in scope here:
|
# actual injection vectors. (Path traversal like "../" is NOT in scope here:
|
||||||
@@ -96,3 +101,27 @@ def test_container_valid_passes_through_unchanged(name):
def test_container_rejects_malformed_and_hostile(name):
    with pytest.raises(ValueError):
        validate_container(name)


# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----

@pytest.mark.parametrize("path", [
    "/home/modelo/models/gemma-4-31B-ten31-v2",
    "/data/models/ft.v2_1",
    "/srv/m/a-b/c",
])
def test_local_path_valid_passes_through_unchanged(path):
    assert validate_local_path(path) == path


@pytest.mark.parametrize("path", [
    "",
    "relative/path",  # must be absolute
    "~/models/x",  # no ~ expansion
    "/models/../etc/shadow",  # '..' traversal
    "/models/./x",  # '.' segment
    "/a" * 300,  # over the 512 cap (600 chars)
] + [f"/models/x{h}" for h in HOSTILE])
def test_local_path_rejects_relative_traversal_and_hostile(path):
    with pytest.raises(ValueError):
        validate_local_path(path)
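The parametrized cases above outline the local-path policy: the value must be absolute, stay within a 512-character cap, contain no `.` or `..` segments, and carry no shell metacharacters. A segment-by-segment allow-list is one way to enforce that. In the sketch below the name `sketch_validate_local_path` and the exact set of allowed characters are assumptions; only the accepted and rejected examples come from the tests.

```python
import re

_MAX_LOCAL_PATH = 512  # cap taken from the test comment above
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")  # assumed per-segment allow-list


def sketch_validate_local_path(path: str) -> str:
    if not path or len(path) > _MAX_LOCAL_PATH:
        raise ValueError("local path is empty or too long")
    if not path.startswith("/"):
        raise ValueError("local path must be absolute")  # also rejects "~/..."
    for segment in path[1:].split("/"):
        if segment in (".", ".."):
            raise ValueError("local path must not contain '.' or '..' segments")
        if not _SEGMENT.fullmatch(segment):
            # Empty segments ("//", trailing "/") and metacharacters land here.
            raise ValueError("local path contains a disallowed character")
    return path  # valid input passes through unchanged
```

Returning the input unchanged on success matches the pass-through test, and raising `ValueError` lets a pydantic validator or a settings loader translate the failure into its own error type.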
@@ -0,0 +1,120 @@
"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
extra-vLLM probe. All offline — the disabled checks short-circuit before any
network call, and the probes are exercised only on the not-configured path.
"""
import asyncio

from app.config import Settings
from app.health import (
    check_embeddings,
    check_kokoro,
    check_parakeet,
    check_qdrant,
    check_vllm,
    probe_vllm_endpoint,
)
from app.services import services_from_settings


def _settings(monkeypatch, **env) -> Settings:
    # Pin the topology env vars under test; default the rest to blank so a stray
    # value in the real environment can't leak into the assertion.
    keys = [
        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
        "DISABLED_SERVICES", "VLLM_CONTAINER",
    ]
    for k in keys:
        monkeypatch.delenv(k, raising=False)
    for k, v in env.items():
        monkeypatch.setenv(k, v)
    return Settings.from_env()


# ---- DISABLED_SERVICES parsing ----

def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
    assert s.disabled_services == frozenset({"parakeet", "kokoro"})


def test_disabled_services_blank_is_empty(monkeypatch):
    assert _settings(monkeypatch).disabled_services == frozenset()
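The two parsing tests above expect a comma-separated `DISABLED_SERVICES` value to be trimmed, lowercased, stripped of empty entries, and exposed as a `frozenset`. A small helper in that spirit is sketched below; `sketch_parse_disabled_services` is a hypothetical name, and how the real `Settings.from_env()` reads the variable is not shown in this diff.

```python
from typing import FrozenSet, Optional


def sketch_parse_disabled_services(raw: Optional[str]) -> FrozenSet[str]:
    # Split on commas, trim whitespace, lowercase, and drop empty entries so
    # inputs like "parakeet, Kokoro ,," normalize cleanly.
    parts = (piece.strip().lower() for piece in (raw or "").split(","))
    return frozenset(piece for piece in parts if piece)
```

Returning a `frozenset` makes membership checks cheap and keeps the setting immutable once loaded.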


# ---- vLLM container override ----

def test_vllm_container_defaults_to_vllm_node(monkeypatch):
    assert _settings(monkeypatch).vllm_container == "vllm_node"


def test_vllm_container_override(monkeypatch):
    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"


def test_vllm_container_invalid_falls_back(monkeypatch):
    # A malformed value (space / shell metachar) is rejected at the boundary and
    # falls back to the default rather than crashing startup or reaching a sink.
    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"


# ---- services map honors the disable list ----

def test_services_from_settings_drops_disabled(monkeypatch):
    s = _settings(
        monkeypatch,
        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
        DISABLED_SERVICES="parakeet,qdrant",
    )
    svcs = services_from_settings(s)
    assert "parakeet" not in svcs and "qdrant" not in svcs
    assert "kokoro" in svcs and "embeddings" in svcs


def test_custom_vllm_service_registered(monkeypatch):
    from app import custom_services
    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
         "user": "u", "container": "vllm_node", "port": 8000},
    ])
    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
    svc = services_from_settings(s)["vllm-spark2"]
    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"


def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
    # A custom entry can't shadow a built-in key — the built-in wins.
    from app import custom_services
    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
    ])
    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
    assert services_from_settings(s)["parakeet"].kind == "stt"


# ---- disabled health checks short-circuit (no network) ----

def test_disabled_check_returns_disabled_verdict(monkeypatch):
    s = _settings(
        monkeypatch,
        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",  # host set, but disable wins
        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
    )
    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
        r = asyncio.run(check(s))
        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}


# ---- vLLM probe: not-configured path is pure ----

def test_probe_vllm_endpoint_unconfigured(monkeypatch):
    r = asyncio.run(probe_vllm_endpoint("", 8000))
    assert r["ok"] is False and "not configured" in r["error"]


def test_check_vllm_unconfigured_without_spark1(monkeypatch):
    s = _settings(monkeypatch)  # no SPARK1_HOST
    r = asyncio.run(check_vllm(s))
    assert r["ok"] is False and "spark1 not configured" in r["error"]
@@ -3,6 +3,15 @@ import { sparkConfigYaml } from '../fileModels/sparkConfig.yaml'

const { InputSpec, Value } = sdk

// This action is intentionally minimal: just the required wiring needed before
// Spark Control can do anything — the two Spark node addresses and SSH users.
// Every other knob (vLLM/service ports, container names, support-service hosts,
// integrations, webhooks) now lives behind the ⚙ Settings gear in the dashboard
// itself, which is where StartOS 0.4 expects routine config to live (and most
// operators never open StartOS actions). The optional keys still exist in the
// config.yaml schema (set by older versions); they're read into env at launch
// and migrated into the in-app settings overlay on first boot, so nothing is
// lost on upgrade — they're simply edited in the dashboard from now on.
const inputSpec = InputSpec.of({
  spark1_host: Value.text({
    name: 'Spark 1 hostname or IP',
@@ -40,128 +49,14 @@ const inputSpec = InputSpec.of({
|
|||||||
placeholder: 'your SSH username',
|
placeholder: 'your SSH username',
|
||||||
masked: false,
|
masked: false,
|
||||||
}),
|
}),
|
||||||
vllm_port: Value.text({
|
|
||||||
name: 'vLLM port (optional)',
|
|
||||||
description:
|
|
||||||
"The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank for 8888',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
parakeet_host: Value.text({
|
|
||||||
name: 'Parakeet host (optional)',
|
|
||||||
description:
|
|
||||||
"Override the host running the Parakeet STT container. Leave blank if Parakeet runs on Spark 2 — that's the default. Set this if you run Parakeet on Spark 1 or a different machine.",
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank to use Spark 2',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
parakeet_container: Value.text({
|
|
||||||
name: 'Parakeet container name (optional)',
|
|
||||||
description:
|
|
||||||
'Docker container name for Parakeet. Defaults to "parakeet-asr" — change only if you named yours something else.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'parakeet-asr',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
kokoro_host: Value.text({
|
|
||||||
name: 'Kokoro host (optional)',
|
|
||||||
description:
|
|
||||||
'Override the host running the Kokoro TTS container. Leave blank if Kokoro runs on Spark 2.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank to use Spark 2',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
kokoro_container: Value.text({
|
|
||||||
name: 'Kokoro container name (optional)',
|
|
||||||
description: 'Docker container name for Kokoro. Defaults to "kokoro-tts".',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'kokoro-tts',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
embed_host: Value.text({
|
|
||||||
name: 'Embedding server host (optional)',
|
|
||||||
description:
|
|
||||||
'Override the host running the spark-embed container (bge-m3 dense embeddings + reranker). Leave blank if it runs on Spark 2.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank to use Spark 2',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
embed_container: Value.text({
|
|
||||||
name: 'Embedding container name (optional)',
|
|
||||||
description:
|
|
||||||
'Docker container name for the embedding server. Defaults to "spark-embed".',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'spark-embed',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
qdrant_host: Value.text({
|
|
||||||
name: 'Qdrant host (optional)',
|
|
||||||
description:
|
|
||||||
'Override the host running the Qdrant vector database. Leave blank if it runs on Spark 2.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank to use Spark 2',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
qdrant_container: Value.text({
|
|
||||||
name: 'Qdrant container name (optional)',
|
|
||||||
description: 'Docker container name for Qdrant. Defaults to "qdrant".',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'qdrant',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
qdrant_collection: Value.text({
|
|
||||||
name: 'Default Qdrant collection (optional)',
|
|
||||||
description:
|
|
||||||
'Default collection name used by /api/search when a request does not specify one. Leave blank to require callers to pass a collection.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'e.g. crm_chunks',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
matrix_bridge_user: Value.text({
|
|
||||||
name: 'matrix-bridge bot SSH user (optional)',
|
|
||||||
description:
|
|
||||||
"If you run the matrix-bridge Matrix bot on Spark 2, enter the SSH user that owns its ~/matrix-bridge folder (e.g. 'modelo'). Spark Control then shows a tile to update, restart, and view logs for the bot. Leave blank if you don't run the bot — the tile stays hidden. Note: this package's SSH public key must be authorized for that user (Show Public Key action) unless it's the same as your Spark 2 user.",
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'e.g. modelo',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
open_webui_url: Value.text({
|
|
||||||
name: 'Open WebUI URL (optional)',
|
|
||||||
description:
|
|
||||||
'If you also run Open WebUI on your LAN, paste its URL here. Spark Control will then show a one-click "Open chat" button next to the current model so you can jump straight to it.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'e.g. https://open-webui.yourserver.local',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
ngc_api_key: Value.text({
|
|
||||||
name: 'NGC API key (optional)',
|
|
||||||
description:
|
|
||||||
'NVIDIA NGC personal API key — needed to install NIM containers (Parakeet, etc.) from nvcr.io. Get one free at https://ngc.nvidia.com/setup/personal-key. Stored only on this Start9 server; passed to docker as the NGC_API_KEY env var when installing NIM services. (Kokoro TTS is Apache 2.0 and does not need an NGC key.)',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'starts with "nvapi-..."',
|
|
||||||
masked: true,
|
|
||||||
}),
|
|
||||||
})
|
})
|
||||||
|
|
||||||
export const configureSparks = sdk.Action.withInput(
|
export const configureSparks = sdk.Action.withInput(
|
||||||
'configure-sparks',
|
'configure-sparks',
|
||||||
async () => ({
|
async () => ({
|
||||||
name: 'Configure Sparks',
|
name: 'Configure Sparks',
|
||||||
description: 'Set the hostnames and SSH users for your two Spark nodes.',
|
description:
|
||||||
|
'Set your two Spark node addresses and SSH users — the required wiring. Everything else (ports, container names, support services, integrations) is configured under ⚙ Settings in the Spark Control dashboard.',
|
||||||
warning: null,
|
warning: null,
|
||||||
visibility: 'enabled',
|
visibility: 'enabled',
|
||||||
allowedStatuses: 'any',
|
allowedStatuses: 'any',
|
||||||
@@ -169,11 +64,19 @@ export const configureSparks = sdk.Action.withInput(
|
|||||||
}),
|
}),
|
||||||
async () => inputSpec,
|
async () => inputSpec,
|
||||||
async ({ effects }) => {
|
async ({ effects }) => {
|
||||||
|
// Prefill from the saved config, but only the keys this (trimmed) form owns.
|
||||||
const cfg = await sparkConfigYaml.read().once()
|
const cfg = await sparkConfigYaml.read().once()
|
||||||
return cfg ?? null
|
if (!cfg) return null
|
||||||
|
return {
|
||||||
|
spark1_host: cfg.spark1_host,
|
||||||
|
spark1_user: cfg.spark1_user,
|
||||||
|
spark2_host: cfg.spark2_host,
|
||||||
|
spark2_user: cfg.spark2_user,
|
||||||
|
}
|
||||||
},
|
},
|
||||||
async ({ effects, input }) => {
|
async ({ effects, input }) => {
|
||||||
// Optional fields come through as `null`; coerce to empty string for the schema.
|
// merge() only touches the four keys we submit, leaving any legacy optional
|
||||||
|
// values already in config.yaml intact.
|
||||||
const normalized = Object.fromEntries(
|
const normalized = Object.fromEntries(
|
||||||
Object.entries(input).map(([k, v]) => [k, v ?? '']),
|
Object.entries(input).map(([k, v]) => [k, v ?? '']),
|
||||||
) as Record<string, string>
|
) as Record<string, string>
|
||||||
|
|||||||
@@ -9,6 +9,11 @@ export const sparkConfigSchema = z.object({
|
|||||||
spark2_user: z.string().catch(''),
|
spark2_user: z.string().catch(''),
|
||||||
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
|
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
|
||||||
vllm_port: z.string().catch(''),
|
vllm_port: z.string().catch(''),
|
||||||
|
// Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
|
||||||
|
vllm_container: z.string().catch(''),
|
||||||
|
// Optional comma-separated list of built-in services to switch off
|
||||||
|
// (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
|
||||||
|
disabled_services: z.string().catch(''),
|
||||||
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
|
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
|
||||||
parakeet_host: z.string().catch(''),
|
parakeet_host: z.string().catch(''),
|
||||||
parakeet_user: z.string().catch(''),
|
parakeet_user: z.string().catch(''),
|
||||||
@@ -30,6 +35,11 @@ export const sparkConfigSchema = z.object({
|
|||||||
open_webui_url: z.string().catch(''),
|
open_webui_url: z.string().catch(''),
|
||||||
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
|
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
|
||||||
ngc_api_key: z.string().catch(''),
|
ngc_api_key: z.string().catch(''),
|
||||||
|
// Optional coordination webhook: POSTed on swap_complete/swap_failed so
|
||||||
|
// downstream consumers re-point their model config. Blank => disabled.
|
||||||
|
swap_webhook_url: z.string().catch(''),
|
||||||
|
// Optional shared secret; if set, the webhook body is HMAC-signed.
|
||||||
|
swap_webhook_secret: z.string().catch(''),
|
||||||
})
|
})
|
||||||
|
|
||||||
export type SparkConfig = z.infer<typeof sparkConfigSchema>
|
export type SparkConfig = z.infer<typeof sparkConfigSchema>
|
||||||
|
|||||||
@@ -14,6 +14,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
spark2_host: '',
|
spark2_host: '',
|
||||||
spark2_user: '',
|
spark2_user: '',
|
||||||
vllm_port: '',
|
vllm_port: '',
|
||||||
|
vllm_container: '',
|
||||||
|
disabled_services: '',
|
||||||
parakeet_host: '',
|
parakeet_host: '',
|
||||||
parakeet_user: '',
|
parakeet_user: '',
|
||||||
parakeet_container: '',
|
parakeet_container: '',
|
||||||
@@ -30,6 +32,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
matrix_bridge_user: '',
|
matrix_bridge_user: '',
|
||||||
open_webui_url: '',
|
open_webui_url: '',
|
||||||
ngc_api_key: '',
|
ngc_api_key: '',
|
||||||
|
swap_webhook_url: '',
|
||||||
|
swap_webhook_secret: '',
|
||||||
}
|
}
|
||||||
|
|
||||||
return sdk.Daemons.of(effects).addDaemon('primary', {
|
return sdk.Daemons.of(effects).addDaemon('primary', {
|
||||||
@@ -52,6 +56,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
SPARK2_HOST: cfg.spark2_host,
|
SPARK2_HOST: cfg.spark2_host,
|
||||||
SPARK2_USER: cfg.spark2_user,
|
SPARK2_USER: cfg.spark2_user,
|
||||||
VLLM_PORT: cfg.vllm_port,
|
VLLM_PORT: cfg.vllm_port,
|
||||||
|
VLLM_CONTAINER: cfg.vllm_container,
|
||||||
|
DISABLED_SERVICES: cfg.disabled_services,
|
||||||
PARAKEET_HOST: cfg.parakeet_host,
|
PARAKEET_HOST: cfg.parakeet_host,
|
||||||
PARAKEET_USER: cfg.parakeet_user,
|
PARAKEET_USER: cfg.parakeet_user,
|
||||||
PARAKEET_CONTAINER: cfg.parakeet_container,
|
PARAKEET_CONTAINER: cfg.parakeet_container,
|
||||||
@@ -71,6 +77,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
CONNECTIVITY_LOG: '/data/connectivity.json',
|
CONNECTIVITY_LOG: '/data/connectivity.json',
|
||||||
OPEN_WEBUI_URL: cfg.open_webui_url,
|
OPEN_WEBUI_URL: cfg.open_webui_url,
|
||||||
NGC_API_KEY: cfg.ngc_api_key,
|
NGC_API_KEY: cfg.ngc_api_key,
|
||||||
|
SWAP_WEBHOOK_URL: cfg.swap_webhook_url,
|
||||||
|
SWAP_WEBHOOK_SECRET: cfg.swap_webhook_secret,
|
||||||
BIND_PORT: String(uiPort),
|
BIND_PORT: String(uiPort),
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
||||||
|
|
||||||
export const v0_1_0 = VersionInfo.of({
|
export const v0_1_0 = VersionInfo.of({
|
||||||
version: '0.22.0:0',
|
version: '0.27.3:0',
|
||||||
releaseNotes: {
|
releaseNotes: {
|
||||||
en_US:
|
en_US:
|
||||||
"v0.22.0:0 — configurable vLLM port. The port Spark Control uses to reach vLLM on Spark 1 (the health check and the chat proxy) is now a field in the Configure Sparks action, so you can point it at a vLLM that listens on a non-default port without rebuilding the package. Leave it blank to keep the previous default of 8888 — what the bundled launch-cluster.sh wrapper uses; set it to 8000 (vLLM's own default) or any other port if your vLLM listens elsewhere. Also hardened numeric-setting parsing so a blank or malformed port value falls back to its default instead of crashing daemon startup.",
|
'v0.27.3:0 — Qwen3.6 vision now works end-to-end, including full-size phone photos. (1) Qwen3.6-35B-A3B reads images (e.g. business-card OCR) and now shows a "vision" badge on its card. (2) Fix: large/high-resolution images (e.g. a 12-megapixel phone photo) were being rejected by the model with a 400 error — a single big image expands to more vision tokens than vLLM allows. The Qwen launch now caps image resolution (max_pixels) so oversized images are automatically downscaled to a size the model accepts; the dashboard, Open WebUI, and any downstream app can now send full-size photos to the /v1 endpoint without errors, and OCR stays sharp. No consumer-API changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
|
||||||
},
|
},
|
||||||
migrations: {
|
migrations: {
|
||||||
up: async ({ effects }) => {},
|
up: async ({ effects }) => {},
|
||||||
|
|||||||
+45
-4
@@ -52,13 +52,43 @@ The **Update** button runs `git fetch && git reset --hard origin/<branch> && doc
|
|||||||
|
|
||||||
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
|
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
|
||||||
|
|
||||||
|
## Configurable topology (v0.24.0+)

For a cluster wired differently from the reference layout, there are three optional knobs (no fork needed). The first two are fields in **Configure Sparks**; the third is a YAML override:

- **vLLM container name**: defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
- **Services to hide**: a comma-separated list drawn from `parakeet`, `kokoro`, `embeddings`, `qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers, e.g. a vLLM on port 8000 colliding with Parakeet's default.
- **Monitor a second vLLM**: the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:

  ```yaml
  custom:
    - key: vllm-spark2
      kind: vllm
      host: <spark-2-ip>
      user: <ssh-user>
      container: vllm_node
      port: 8000
  ```

  It gets a read-only tile: loaded model (via `/v1/models`), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user; use the Show Public Key action.)
|
||||||
|
|
||||||
## Adding a new model
|
## Adding a new model
|
||||||
|
|
||||||
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
|
The menu is whatever's downloaded on the Sparks, so the normal path is just:
|
||||||
2. Confirm the weights are on the Spark: `ssh <spark-user>@<spark-1-host> 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
|
**download it, then set it up once.**
|
||||||
3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
|
|
||||||
|
|
||||||
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
|
1. **Download** from the dashboard (**+ Download a new model**, paste the HF repo) or on Spark 1 with `./hf-download.sh <repo>`. When it finishes it appears on the menu by itself.
|
||||||
|
2. **Set it up.** If Spark Control already has a recipe for it (see below), it's ready to switch to. Otherwise it shows a **"needs setup"** card: the first switch reads the model's `config.json`, proposes how to launch it (family/parsers, solo vs cluster, vLLM flags), and you confirm once. The confirmed recipe persists to `/data/models-overrides.yaml` (survives package updates).
|
||||||
|
|
||||||
|
### Bundling a launch recipe (optional — skips the setup prompt)
|
||||||
|
|
||||||
|
To make a known model launch correctly the instant it's downloaded, add a *recipe* to `image/models.yaml`. These are **not** the menu — they're matched to an on-disk model by `repo`. Required: `display_name`, `repo`, `size_gb`, `mode` (`solo`/`cluster`), `vllm_args`. Optional: `description`, `capabilities` (e.g. `[vision, reasoning, tools]`), `expected_ready_seconds`. Then rebuild + redeploy: `cd package && make x86 && make install`. Keep descriptions generic (not user-specific) so the recipes stay portable.
|
||||||
|
|
||||||
|
### Local / fine-tuned models (v0.23.0+)
|
||||||
|
|
||||||
|
A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a `--chat-template` must live **inside** `local_path`.
|
||||||
|
|
||||||
|
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
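
Why the quoting matters can be seen without Docker at all. The sketch below is only an illustration under assumptions: the mount path is hypothetical and `count_args` is a helper invented here, not part of `launch-cluster.sh`. It counts how many arguments a command receives when the variable is expanded unquoted versus quoted.

```bash
#!/usr/bin/env bash
# Hypothetical mount path, for illustration only.
VLLM_SPARK_EXTRA_DOCKER_ARGS="-v /models/my-finetune:/models/my-finetune"

# Helper invented for this sketch: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell word-splits the value into two arguments,
# "-v" and the "<path>:<path>" mapping.
unquoted_count="$(count_args $VLLM_SPARK_EXTRA_DOCKER_ARGS)"

# Quoted: the whole value arrives as a single argument, so a command
# expecting "-v" followed by a separate mapping would not see that pair.
quoted_count="$(count_args "$VLLM_SPARK_EXTRA_DOCKER_ARGS")"

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s)"
```

A quick way to re-check an updated `launch-cluster.sh` is to look at how the variable appears in its `docker run` line: with double quotes around it, the behaviour matches the "quoted" case above.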
|
||||||
|
|
||||||
## Manual swap fallback
|
## Manual swap fallback
|
||||||
|
|
||||||
@@ -75,6 +105,17 @@ cd ~/spark-vllm-docker
|
|||||||
docker logs -f vllm_node # wait for "Application startup complete."
|
docker logs -f vllm_node # wait for "Application startup complete."
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Sideload (`make install`) can't reach the server

Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`. Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up: `curl -sk https://<server-ip>/rpc/v1` returns 200.

- **Don't** work around it with `start-cli -H https://<server-ip> package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
- **Fix:** make the name resolve again, then re-run `make install`:
  - `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
  - `echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).

Note this only blocks installing to *your own* Start9. Building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).
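
To tell a name-resolution failure apart from other connection problems, the curl exit status can be mapped to a likely cause. This is a minimal sketch, not part of the package: `explain_curl_exit` is a helper name made up here, and it covers only a handful of curl's documented exit codes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a curl exit status to a likely cause.
# The numeric codes are curl's documented exit codes.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host (name resolution, e.g. mDNS)" ;;
    7)  echo "failed to connect (host resolved but unreachable or port closed)" ;;
    28) echo "timed out" ;;
    60) echo "TLS certificate could not be verified" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Usage against a real host (the hostname is a placeholder):
#   curl -sS -o /dev/null "https://<your-server>.local/rpc/v1"; explain_curl_exit "$?"
explain_curl_exit 6
```

Exit 6 points at the mDNS or hosts-file fix above; exit 7 or 28 with the name resolving suggests the server itself is the problem.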
|
||||||
|
|
||||||
## Diagnostics
|
## Diagnostics
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|||||||
+32
-12
@@ -8,38 +8,58 @@
|
|||||||
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
|
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
|
||||||
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
|
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
|
||||||
# reuses an existing release for the tag and replaces a same-named asset.
|
# reuses an existing release for the tag and replaces a same-named asset.
|
||||||
|
# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
VERSION="${1:-}"; S9PK="${2:-}"
|
VERSION="${1:-}"; S9PK="${2:-}"
|
||||||
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
|
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
|
||||||
echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
|
echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
|
||||||
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
|
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
|
||||||
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository write access}"
|
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
|
||||||
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
|
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
|
||||||
|
|
||||||
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
|
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
|
||||||
ASSET="$(basename "$S9PK")"
|
ASSET="$(basename "$S9PK")"
|
||||||
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
|
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
|
||||||
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
|
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
|
||||||
AUTH=(-H "Authorization: token ${GITEA_TOKEN}")
|
CURL=(curl -sS) # no -f: we inspect HTTP codes ourselves
|
||||||
|
[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)
|
||||||
|
|
||||||
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
|
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
|
||||||
|
|
||||||
|
# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
|
||||||
|
api() {
|
||||||
|
local method="$1" url="$2"; shift 2
|
||||||
|
local out
|
||||||
|
out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
|
||||||
|
-w $'\n%{http_code}' "$url")"
|
||||||
|
HTTP_CODE="${out##*$'\n'}"
|
||||||
|
BODY="${out%$'\n'*}"
|
||||||
|
}
|
||||||
|
|
||||||
# Reuse an existing release for this tag, otherwise create one.
|
# Reuse an existing release for this tag, otherwise create one.
|
||||||
id="$(curl -fsS "${AUTH[@]}" "$API/releases/tags/$TAG" 2>/dev/null | jq -r '.id // empty')"
|
api GET "$API/releases/tags/$TAG"
|
||||||
if [ -z "$id" ]; then
|
if [ "$HTTP_CODE" = 200 ]; then
|
||||||
id="$(curl -fsS -X POST "${AUTH[@]}" -H 'Content-Type: application/json' \
|
id="$(printf '%s' "$BODY" | jq -r '.id')"
|
||||||
|
elif [ "$HTTP_CODE" = 404 ]; then
|
||||||
|
api POST "$API/releases" -H 'Content-Type: application/json' \
|
||||||
--data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
|
--data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
|
||||||
'{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')" \
|
'{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
|
||||||
"$API/releases" | jq -r '.id')"
|
[ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
|
||||||
|
id="$(printf '%s' "$BODY" | jq -r '.id')"
|
||||||
|
else
|
||||||
|
echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
|
||||||
|
exit 1
|
||||||
fi
|
fi
|
||||||
[ -n "$id" ] && [ "$id" != null ] || { echo "could not obtain release id (check URL/token/tag)" >&2; exit 1; }
|
[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }
|
||||||
|
|
||||||
# Replace a same-named asset so re-runs don't 409.
|
# Replace a same-named asset so re-runs don't 409.
|
||||||
old="$(curl -fsS "${AUTH[@]}" "$API/releases/$id/assets" | jq -r --arg n "$ASSET" '.[] | select(.name==$n) | .id')"
|
api GET "$API/releases/$id/assets"
|
||||||
[ -n "$old" ] && curl -fsS -X DELETE "${AUTH[@]}" "$API/releases/$id/assets/$old" >/dev/null || true
|
old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
|
||||||
|
[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }
|
||||||
|
|
||||||
curl -fsS -X POST "${AUTH[@]}" -F "attachment=@${S9PK};type=application/octet-stream" \
|
api POST "$API/releases/$id/assets?name=$ASSET" \
|
||||||
"$API/releases/$id/assets?name=$ASSET" >/dev/null
|
-F "attachment=@${S9PK};type=application/octet-stream"
|
||||||
|
[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
|
||||||
|
|
||||||
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
|
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
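
The script leans on a few bash parameter expansions: deriving the tag from the version, trimming a trailing slash from the base URL, and splitting curl's output into body and status code. The sketch below exercises them offline, assuming a simulated response; the JSON body and the trailing slash on the URL are made up for illustration, and no request is sent.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Version -> tag: %%:* strips the longest suffix that starts at a ':'.
VERSION="0.22.0:0"
TAG="v${VERSION%%:*}"
echo "tag:  $TAG"

# Base URL: %/ drops a single trailing slash if one is present.
GITEA_URL="https://gitea.lan:3000/"
BASE="${GITEA_URL%/}"
echo "base: $BASE"

# Simulated curl output: the response body, then a newline and the status
# code, which is what -w $'\n%{http_code}' appends. The body is invented.
out=$'{"id": 7,\n "tag_name": "v0.22.0"}\n200'

HTTP_CODE="${out##*$'\n'}"   # longest prefix up to the last newline removed
BODY="${out%$'\n'*}"         # shortest suffix from the last newline removed
echo "code: $HTTP_CODE"
echo "body: $BODY"
```

Because the split happens at the last newline, a multi-line body stays intact and only the final line is treated as the status code.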
|
||||||
|
|||||||