Compare commits
53 Commits
| SHA1 |
|---|
| c18050cb87 |
| 8e978b44cf |
| 367d9869da |
| 81c448f70c |
| 01c5ab784d |
| 9bcb45789d |
| 8f8efbf0fc |
| ef5d6ec334 |
| 99a50a6776 |
| 15210e9590 |
| 0ba2a3a3fc |
| b87cb0f99b |
| 346df907d2 |
| 82224f53e7 |
| a7102105aa |
| ddfd508c2f |
| 6664543dec |
| 33768ae3d7 |
| 8a5862dcb2 |
| 589b3e59ab |
| d67152624e |
| 26382dc932 |
| c07eaeb4ee |
| 36ca99f73b |
| 5662d957af |
| 5a5634a3a9 |
| 9edb70418e |
| b7be1bab24 |
| 6b799113c4 |
| d90e7a230a |
| 6209c40f79 |
| e332363004 |
| ea35ac03ef |
| 1a86fb0bf0 |
| 66be0c1fc1 |
| 91b5d6d6a6 |
| 4c67ccd28d |
| ea328c2e2f |
| 27bfc2d6fd |
| 61e5d5cce8 |
| 0aa4bfb303 |
| ad68e0e16e |
| d6fefec017 |
| 22c817a4ec |
| d718a3b78a |
| f99241ec3e |
| d6f4390372 |
| 8a95609504 |
| f553547c32 |
| 49f776f172 |
| 4a3eeb4f20 |
| f2beb500e7 |
| db2f4269da |

@@ -1 +0,0 @@

../../docs/guides/audio-speech.md

@@ -0,0 +1,35 @@

---
paths:
- "image/app/audio_proxy.py"
- "image/app/speech_models.py"
- "image/app/deep_health.py"
- "image/parakeet_patches/**"
- "scripts/test-audio-with-speakers.sh"
- "docs/AUDIO_API.md"
---

# Audio / speech stack (Parakeet STT + Sortformer diarizer + Kokoro TTS on Spark 2)

## Changing the parakeet-asr container

- `image/parakeet_patches/` (`main.py`, `diarizer.py`) is an overlay copied into the `parakeet-asr` container by the "Reapply speech-model patches" dashboard action (`image/app/speech_models.py`). This is the **only** durable way to change that container — `docker exec` / pip changes inside it die on `docker rm`.
- **Never install `cuda-python` in parakeet-asr** to "fix" the startup warning about CUDA graphs being disabled. The warning is harmless; enabling the graph path crashes real decode with illegal memory access on this GPU/CUDA-13 stack (GB10/sm_121). The slow path served 11k+ requests with zero failures — leave it alone.
- Pin/constrain torch versions when pip-installing anything into NGC-based containers on the Sparks (ABI breaks otherwise); expect ARM64 wheel gaps and source builds (`--no-build-isolation` for torchaudio). Applies to `spark_embed` too.

## Testing audio endpoints

- Test with **real speech** (e.g. `say -o /tmp/t.wav --data-format=LEI16@16000 "<a couple of sentences>"`), not tones/silence — zero-token audio skips the decoder paths where crashes live.
- Send audio requests to Spark 2 **sequentially** in tests/scripts. Parallel audio requests can race (cuFFT → 503), and the single GPU serializes them anyway.
- End-to-end suite (hits the LIVE cluster):

```bash
./scripts/test-audio-with-speakers.sh <audio-file> # from repo root
```

`SPARK_CONTROL` defaults to `http://127.0.0.1:9999` (a running local dev server); point it at the installed package URL otherwise.

## API quirk

Spark Control's `/v1/models` lists *audio* models (STT model + Kokoro voices) by design — **not** the loaded LLM. Discover the LLM via `/api/status` (`vllm.current_model`).
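
Because `/v1/models` answers with the audio catalog, a client that needs the loaded LLM has to read `/api/status` instead. A minimal sketch using only the standard library. It assumes the status payload is JSON with a nested `vllm` object carrying `current_model` (the path named in the note); the helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.request import urlopen


def loaded_llm(status: dict) -> Optional[str]:
    """Pull the loaded LLM id out of an /api/status payload (assumed shape)."""
    vllm = status.get("vllm") or {}
    model = vllm.get("current_model")
    return model if isinstance(model, str) and model else None


def fetch_loaded_llm(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """GET <base_url>/api/status and return vllm.current_model, or None if absent."""
    with urlopen(f"{base_url.rstrip('/')}/api/status", timeout=timeout) as resp:
        return loaded_llm(json.load(resp))
```

Keeping the parsing in `loaded_llm` separate from the HTTP call lets it be tested offline. A `None` result means the status payload did not report a model.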

Diarizer caps at 4 speakers (Sortformer `diar_sortformer_4spk-v1`).

@@ -1 +0,0 @@

../../docs/guides/fastapi-image.md

@@ -0,0 +1,39 @@

---
paths:
- "image/**"
---

# FastAPI image (`image/`)

Standalone FastAPI app (Python ≥3.11; ships on `python:3.12-slim`; UI on port 9999; vanilla HTML/CSS/JS, no framework). Python has no configured linter/formatter — match the style of the file you're editing.

## Local dev (no StartOS)

```bash
cd image
python3 -m venv .venv && source .venv/bin/activate # one-time
pip install -e .
export SPARK1_HOST=<ip> SPARK1_USER=<user> SPARK2_HOST=<ip> SPARK2_USER=<user> SSH_KEY_PATH=<private-key>
# Required outside the container — these default to paths under /data, which only exists in the image
# (missing REDACTION_MAP_DB crashes startup; missing CONNECTIVITY_LOG 500s /api/status):
export REDACTION_MAP_DB=/tmp/redaction_maps.db CONNECTIVITY_LOG=/tmp/connectivity.json
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload
```

Other env vars: `BIND_PORT`, `MODELS_YAML`, `SSH_DIR`, `SSH_KNOWN_HOSTS`, `MODELS_OVERRIDES`, `SERVICES_OVERRIDES`.
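
The `/data`-backed variables are easy to forget outside the container, so a small preflight check can report them before uvicorn starts. This is a sketch, not part of the app. `unusable_data_paths` is a hypothetical helper, and only the `REDACTION_MAP_DB` default (`/data/redaction_maps.db`, from the redaction guide) is documented. Add other entries, such as `CONNECTIVITY_LOG`, with whatever default the app actually uses.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Variables whose defaults point under /data, which only exists inside the image.
# Only REDACTION_MAP_DB's default is documented; extend this map as needed.
DATA_BACKED_DEFAULTS = {
    "REDACTION_MAP_DB": "/data/redaction_maps.db",
}


def unusable_data_paths(
    env: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, str] = DATA_BACKED_DEFAULTS,
) -> dict:
    """Return {var: path} for every variable whose effective path has no parent dir."""
    env = os.environ if env is None else env
    problems = {}
    for var, default in defaults.items():
        # An unset or empty variable falls back to the in-image default.
        path = Path(env.get(var) or default)
        if not path.parent.is_dir():
            problems[var] = str(path)
    return problems
```

Running `unusable_data_paths()` with no arguments checks the real environment. An empty dict means every path has a directory to live in.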

## Tests

No pytest harness — each suite is a standalone script run with the `image/.venv` interpreter (system python3 has no deps). See the redaction and audio rules for the suites themselves.

## Conventions

- Pydantic request models go at **module scope**, never inside a `build_router()` body (FastAPI silently 422s otherwise).
- New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.

## Layout

- `image/app/server.py` — FastAPI entry; routers live in sibling modules (`audio_proxy.py`, `llm_proxy.py`, `embeddings_proxy.py`, `redaction_gateway.py`, `swap.py`, `health.py`, `deep_health.py`, `connectivity.py`, …).
- `image/app/static/` — the dashboard UI.
- `image/models.yaml` — vLLM model catalog bundled into the image.
- `image/spark_embed/` — Dockerfile + app for the embeddings container; built ON a Spark (ARM64, NGC PyTorch base — see the audio/cluster rule for NGC torch-pinning caveats).

@@ -1 +0,0 @@

../../docs/guides/redaction.md

@@ -0,0 +1,23 @@

---
paths:
- "image/app/redaction/**"
- "image/app/redaction_gateway.py"
- "docs/REDACTION_GATEWAY.md"
---

# Redaction (`/scrub` + `/rehydrate`)

- `image/app/redaction/scrub.py` + `test_scrub_leak.py` are vendored **byte-for-byte** from the CRM repo (sha recorded in `redaction/__init__.py`). **Never edit them here** — change them in the CRM repo, re-vendor (`cp`), update the sha, re-run the leak test.
- The gateway around the vendored scrubber is `image/app/redaction_gateway.py`. Its token-map store lives on `/data` (`REDACTION_MAP_DB`, default `/data/redaction_maps.db`) and fails closed if it can't open — set the env var when running outside the container.
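
Failing closed means a store that won't open is a hard error. It is never a silent fallback to an unpersisted map that could issue tokens `/rehydrate` can't reverse later. A minimal sketch with the standard-library `sqlite3` module. It assumes the store is an SQLite file, which the `.db` suffix suggests but the guide doesn't state. The table layout and `MapStoreUnavailable` are hypothetical.

```python
import sqlite3


class MapStoreUnavailable(RuntimeError):
    """Raised instead of degrading to an unpersisted map when the store can't open."""


def open_map_store(path: str) -> sqlite3.Connection:
    """Open (or create) the token-map DB, failing closed on any SQLite error."""
    try:
        conn = sqlite3.connect(path)
        # Hypothetical schema: scrub token -> original text, needed for /rehydrate.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS token_map ("
            "token TEXT PRIMARY KEY, original TEXT NOT NULL)"
        )
        conn.commit()
        return conn
    except sqlite3.Error as exc:
        raise MapStoreUnavailable(f"cannot open redaction map store at {path!r}") from exc
```

Raising at startup is what turns a missing `/data` into the "crashes startup" behaviour described for local dev, rather than a gateway that scrubs without remembering.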

## Test suites — both must pass before shipping ANY redaction change

```bash
cd image
.venv/bin/python -m app.redaction.test_gateway # /scrub + /rehydrate acceptance; offline, no cluster needed
.venv/bin/python app/redaction/test_scrub_leak.py # vendored golden-file leak test; offline
```

Keep the leak test green against the vendored `scrub.py` after any re-vendor.
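
A byte-for-byte comparison catches an accidental local edit to the vendored files before the leak test runs. Sketch only: `vendor_drift` is a hypothetical helper, and `upstream_dir` is a placeholder for wherever the two files live in a local CRM checkout (not stated here). It does not verify the sha recorded in `redaction/__init__.py`.

```python
import filecmp
from pathlib import Path

# The two files the guide says are vendored byte-for-byte from the CRM repo.
VENDORED_FILES = ("scrub.py", "test_scrub_leak.py")


def vendor_drift(upstream_dir, vendored_dir, names=VENDORED_FILES):
    """Return the names whose vendored copy is missing or not byte-identical."""
    drifted = []
    for name in names:
        upstream, vendored = Path(upstream_dir) / name, Path(vendored_dir) / name
        same = (
            upstream.is_file()
            and vendored.is_file()
            and filecmp.cmp(upstream, vendored, shallow=False)  # compare contents, not stat
        )
        if not same:
            drifted.append(name)
    return drifted
```

`shallow=False` matters here: the default shallow mode treats files with matching size and mtime as equal without reading them.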

Policy context: scrubbed text via `/scrub` is the **only** sanctioned path toward frontier/cloud models — see the whole-repo privacy rule in CLAUDE.md.

@@ -1 +0,0 @@

../../docs/guides/startos-package.md

@@ -0,0 +1,31 @@

---
paths:
- "package/**"
---

# StartOS package (`package/`)

TypeScript wrapper that ships the Docker image as an s9pk. `@start9labs/start-sdk` pinned `1.3.3`, Node ≥22, bundled by `@vercel/ncc`.

## Commands

```bash
cd package
npm i # one-time
make x86 # typecheck + ncc bundle + docker build + pack → spark-control_x86_64.s9pk
make install # sideload to the Start9 server; needs "host: http(s)://<server>.local" in ~/.startos/config.yaml
npm run check # tsc --noEmit — run after any startos/ edit; make x86 also runs it
npm run prettier # prettier --write startos (no semicolons, single quotes, trailing commas)
```

`make aarch64` for ARM Start9 servers. `make install` picks the newest `*.s9pk` in `package/` and restarts the live spark-control service — get a go/no-go first.

## Versioning & release notes

- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
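
Because `:N` is a separate revision field, `X.Y.Z:N` strings don't sort correctly as plain text: `"0.9.0:0"` sorts after `"0.26.0:0"`. Here is a sketch of a parser for release scripts that compare or bump versions. It assumes all four fields are non-negative integers. `parse_version` is hypothetical and not part of the StartOS SDK.

```python
import re
from typing import NamedTuple

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+):(\d+)")


class PackageVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    revision: int  # the ":N" suffix

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}:{self.revision}"


def parse_version(text: str) -> PackageVersion:
    """Parse 'X.Y.Z:N'; tuple ordering then compares versions numerically."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected X.Y.Z:N, got {text!r}")
    return PackageVersion(*(int(part) for part in match.groups()))
```

Bumping the revision is then `v._replace(revision=v.revision + 1)`, and `str()` renders it back into the `X.Y.Z:N` form.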

## Layout

- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
- The "Reapply speech-model patches" action is **not** a StartOS action — it's a dashboard action implemented in `image/app/speech_models.py`.

@@ -11,11 +11,5 @@ node_modules/

| Old (`-11,11`) | New (`+11,5`) |
|---|---|
| `dist/` | `dist/` |
| `build/` | `build/` |
| `.DS_Store` | `.DS_Store` |
| *(blank line)* | |
| `# Claude Code — deny by default, allow-list shared wiring (see standards/portability.md)` | |
| `.claude/*` | `.claude/*` |
| `!.claude/rules/` | `!.claude/rules/` |
| `!.claude/agents/` | |
| `!.claude/commands/` | |
| `!.claude/skills/` | |
| `!.claude/settings.json` | |

@@ -1,70 +0,0 @@

# AGENTS.md

This file provides guidance to coding agents (Claude Code and others) when working with code in this repository. (Claude Code reads it via the `CLAUDE.md` symlink.)

Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.

Subsystem guidance lives in `docs/guides/` and loads when matching files are touched (Claude Code lazy-loads via `.claude/rules/` symlinks; other agents read the guides directly): `startos-package.md` (build/versioning, `package/**`), `fastapi-image.md` (dev server/env/layout, `image/**`), `redaction.md` (vendoring + test gates), `audio-speech.md` (parakeet patches, cluster-container footguns, audio testing). **Read `docs/guides/audio-speech.md` before touching the Sparks' containers over SSH** — ops sessions don't trip the path scoping.

> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(spark-control)` and surface them before proposing next steps; triage with `/triage`.

## Stack

- Two halves, always coordinated:
  - `image/` — standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
  - `package/` — StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
- Build host needs `start-cli`, Node ≥22 + npm, and Docker.
- Cluster runtimes live **on the Sparks, not in this repo** (`spark-vllm-docker`, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
- Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM `:8888` (Spark 1); `parakeet-asr` `:8000`, Kokoro TTS `:8880`, bge-m3 embeddings + Qdrant (Spark 2). See `docs/` for API contracts.

## Commands (headlines — details in the scoped rules)

```bash
(cd package && make x86) # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999) # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest) # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway) # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py) # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file> # e2e audio — hits the LIVE cluster
```

## Layout

- `image/app/` — FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/` — StartOS manifest, interfaces, actions, version + release notes.
- `docs/` — `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`, `COORDINATION.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog — items move into "Current state" below when picked up).

## Conventions

- Every shipped change = version bump + release notes + rebuilt s9pk (version format `X.Y.Z:N`; details in the startos-package rule).
- Commit messages: `vX.Y.Z:N - short lowercase summary`. **Never add a Co-Authored-By / Claude attribution trailer.**
- The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
- New external-facing endpoints get documented in `docs/` and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
- Doc layout: `AGENTS.md` is the canonical file; `CLAUDE.md` is a symlink to it (don't overwrite it). Subsystem guides are real files in `docs/guides/<topic>.md` (with `paths:` frontmatter); `.claude/rules/<topic>.md` are relative symlinks into them. A new guide = add `docs/guides/<topic>.md`, symlink it from `.claude/rules/`, and add an index line above.
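
The commit-message convention is mechanical enough to check in a `commit-msg` hook. A sketch, assuming "lowercase" means only that the summary contains no capital letters. It catches only the `Co-Authored-By` form of attribution trailer. `commit_msg_problems` is a hypothetical helper.

```python
import re

# "vX.Y.Z:N - short lowercase summary"; the summary may not contain capitals.
SUBJECT_RE = re.compile(r"v\d+\.\d+\.\d+:\d+ - [^A-Z]+")
CO_AUTHOR_RE = re.compile(r"^co-authored-by:", re.IGNORECASE | re.MULTILINE)


def commit_msg_problems(message: str) -> list:
    """Return human-readable problems with a commit message (empty list = OK)."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT_RE.fullmatch(subject):
        problems.append("subject must be 'vX.Y.Z:N - short lowercase summary'")
    if CO_AUTHOR_RE.search(message):
        problems.append("drop the Co-Authored-By trailer")
    return problems
```

Git runs `.git/hooks/commit-msg` with the path of the proposed message file as its single argument. A hook script can read that file, print any problems to stderr, and exit non-zero to abort the commit.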

## Always / Never (cluster-wide)

- **Always** confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
- **Always** use the Spark's **IP** for HTTP probes — `.local` mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust `.local` hostnames inside HTTP client code.
- **Always** pass `SSH_KEY_PATH` / `-i <key>` explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
- **Never** route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via `/scrub` is the only sanctioned path toward frontier models.)
- **Never** commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders. Canonical set: `<spark-1-ip>` / `<spark-2-ip>`, `<spark-1-host>` / `<spark-2-host>`, `<spark-user>`, and generic example names (`Alice`/`Bob`).
- **Never** install `cuda-python` in `parakeet-asr` — crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.
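
A guard where HTTP client code builds its base URL turns the IP-only rule into an immediate error instead of a hang. Since the services bind IPv4 only, this sketch accepts only IPv4 literals. `require_ipv4_url` is hypothetical. The example address comes from the 192.0.2.0/24 documentation range and stands in for `<spark-1-ip>`.

```python
import ipaddress
from urllib.parse import urlsplit


def require_ipv4_url(url: str) -> str:
    """Return url unchanged if its host is an IPv4 literal; reject names like *.local."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.IPv4Address(host)
    except ValueError:  # AddressValueError is a ValueError subclass
        raise ValueError(f"use the Spark's IPv4 address, not {host!r}: {url!r}") from None
    return url
```

Calling it once at configuration time, for example on `SPARK1_HOST`-derived URLs, keeps the check out of every request path.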

## Current state

- **Built, install pending: v0.26.0:0 — disk-driven model menu.** The dashboard now lists what's *actually downloaded* on the Sparks instead of a hard-coded catalog. `models.yaml` + overrides are reframed as **launch recipes** matched to an on-disk model by `repo` (no longer "the menu"); `image/app/discovery.py` does the merge: `build_menu` scans both Sparks (`disk.list_cached_models`, one `du` per host) ∪ recipes; an on-disk model with no recipe is `needs_setup` and `infer_recipe` reads its `config.json` to prefill a one-time setup form (operator confirms; saved to `/data` overrides). Delete now removes weights **and** the card (`delete_from_disk` sweeps all hosts; the delete endpoint resolves keys via the live menu so discovered models are deletable). New `GET /api/models/suggest`; `/api/models` returns the menu + a `recipes` list (download-box autocomplete); `GET /api/models/disk-status` removed (folded into `/api/models`). Dropped the two legacy Qwen recipes (235B FP8, 2.5 72B). Build/typecheck clean; **install (live-service restart) needs go/no-go.** Why a recipe layer survives a "menu = disk" redesign: a folder can't tell you parsers / solo-vs-cluster / MoE backend (Gemma MoE needs `marlin` on GB10) — disk drives *presence*, recipes drive *launch*.
- **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
- **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
- **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (137 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Design stance (decided):** Spark Control = control plane / GPU arbiter, **not** a job runner; recurring business jobs live in separate services that *call* the swap API (`POST /api/swap`). Full epic history (v0.22→v0.25) is in git log + `ROADMAP.md` → "Cluster coordination".
- **Usage note (2026-06-18):** owner's daily driver is the solo **Qwen3.6 35B**; the 235B `cluster` models are dormant. Keeping `launch-cluster.sh` (the `eugr/spark-vllm-docker` community standard, mirrors NVIDIA's `dgx-spark-playbooks` Ray+RoCE design) is still correct even single-node — it supplies the maintained, hardware-tuned vLLM images; raw docker would mean DIY image upkeep for no gain. Spark 2 stays the speech/embeddings box regardless.
- **Next steps (all low-priority / externally gated; P2/P3 tech-debt backlog in `ROADMAP.md`):** (1) raw-`docker run` swap generalization — **DEFERRED** (rationale in ROADMAP; revisit only if an adopter wants Spark Control to *drive*, not just monitor, raw-docker swaps — cleanest fix is the adopter adopting `launch-cluster.sh`). (2) audio concurrency knee — only if the Signal Engine dev wants it (needs a quiet window). (3) matrix-bridge Docker `HEALTHCHECK` — only if the bot dev asks. (4) Parakeet long-audio guard — deferred (rationale in ROADMAP).
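
The cache-dirname↔repo parsing that `test_discovery.py` covers most likely follows the Hugging Face hub cache convention, `models--<org>--<name>`. That is an assumption: the notes don't name the cache layout. Here is a sketch of both directions. The function names are hypothetical, not `discovery.py`'s API. The sketch also doesn't attempt the incomplete-download filtering.

```python
from typing import Optional

MODEL_PREFIX = "models--"


def repo_to_cache_dirname(repo: str) -> str:
    """'Qwen/Qwen3-30B-A3B' -> 'models--Qwen--Qwen3-30B-A3B'."""
    return MODEL_PREFIX + repo.replace("/", "--")


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    """Inverse mapping; returns None for non-model entries (datasets--*, .locks, ...)."""
    if not dirname.startswith(MODEL_PREFIX):
        return None
    # Hub repo ids may not contain "--", so the first separator splits org from name.
    return dirname[len(MODEL_PREFIX):].replace("--", "/", 1) or None
```

Returning `None` for other cache entries lets a directory listing be filtered with a single comprehension before any `du` sizing.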

@@ -0,0 +1,58 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.

Subsystem guidance lives in `.claude/rules/` and loads when matching files are touched: `startos-package.md` (build/versioning, `package/**`), `fastapi-image.md` (dev server/env/layout, `image/**`), `redaction.md` (vendoring + test gates), `audio-speech.md` (parakeet patches, cluster-container footguns, audio testing). **Read `audio-speech.md` before touching the Sparks' containers over SSH** — ops sessions don't trip the path scoping.

## Stack

- Two halves, always coordinated:
  - `image/` — standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
  - `package/` — StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
- Build host needs `start-cli`, Node ≥22 + npm, and Docker.
- Cluster runtimes live **on the Sparks, not in this repo** (`spark-vllm-docker`, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
- Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM `:8888` (Spark 1); `parakeet-asr` `:8000`, Kokoro TTS `:8880`, bge-m3 embeddings + Qdrant (Spark 2). See `docs/` for API contracts.

## Commands (headlines — details in the scoped rules)

```bash
(cd package && make x86) # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999) # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m app.redaction.test_gateway) # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py) # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file> # e2e audio — hits the LIVE cluster
```

## Layout

- `image/app/` — FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/` — StartOS manifest, interfaces, actions, version + release notes.
- `docs/` — `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog — items move into "Current state" below when picked up).

## Conventions

- Every shipped change = version bump + release notes + rebuilt s9pk (version format `X.Y.Z:N`; details in the startos-package rule).
- Commit messages: `vX.Y.Z:N - short lowercase summary`. **Never add a Co-Authored-By / Claude attribution trailer.**
- The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
- New external-facing endpoints get documented in `docs/` and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).

## Always / Never (cluster-wide)

- **Always** confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
- **Always** use the Spark's **IP** for HTTP probes — `.local` mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust `.local` hostnames inside HTTP client code.
- **Always** pass `SSH_KEY_PATH` / `-i <key>` explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
- **Never** route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via `/scrub` is the only sanctioned path toward frontier models.)
- **Never** commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders (`<spark-1-ip>` style).
- **Never** install `cuda-python` in `parakeet-asr` — crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.

## Current state

- **Working (v0.18.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack is healthy (11k+ requests/12h, all 200).
- **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 1–4s unresponsiveness while the single GPU is continuously busy. Remedy is client-side; a drafted message (in-flight cap 2, hard ceiling 3 global across audio endpoints, retry-with-backoff on timeout/503) is with the owner to forward to that dev.
- **Decided, not implemented:** remote access stays WireGuard/Tailscale split-tunnel — no public interface, so no API auth built; an empirical concurrency sweep is offered but needs the owner's explicit OK in a quiet window.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
- **Repo wart:** commit `367d986` is labeled `v0.13.0:4` but actually contains everything through v0.18.0:0 — per-version commits for v0.14–v0.18 are missing. Keep commit messages accurate going forward.
- **Next:** (1) owner forwards the concurrency note to the Signal Engine dev; (2) run the concurrency sweep if the dev wants the measured knee; (3) add the `--memory` cap to parakeet-asr via the Reapply-patches action; (4) pick the next item from ROADMAP.md.
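
The client-side remedy drafted for the Signal Engine dev (in-flight cap, retry with backoff on timeout/503) can be sketched with `asyncio`. Assumptions: `send` is a hypothetical coroutine you supply that performs one HTTP request and returns its status code, raising `asyncio.TimeoutError` on timeout. The sketch models only the per-client cap of 2, not the global ceiling of 3. The backoff constants are illustrative, not measured.

```python
import asyncio
import random

RETRYABLE_STATUSES = {503}


class CappedAudioClient:
    """Caps in-flight audio requests and retries timeouts/503s with jittered backoff."""

    def __init__(self, send, max_in_flight=2, attempts=3, base_delay=0.5):
        if attempts < 1:
            raise ValueError("attempts must be >= 1")
        self._send = send  # async callable: payload -> HTTP status code
        self._slots = asyncio.Semaphore(max_in_flight)
        self._attempts = attempts
        self._base_delay = base_delay

    async def request(self, payload):
        """Return the last status code seen, or None if the last attempt timed out."""
        status = None
        for attempt in range(self._attempts):
            async with self._slots:
                try:
                    status = await self._send(payload)
                except asyncio.TimeoutError:
                    status = None
            if status is not None and status not in RETRYABLE_STATUSES:
                return status
            if attempt + 1 < self._attempts:
                # Exponential backoff with jitter, taken outside the semaphore.
                delay = self._base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
                await asyncio.sleep(delay)
        return status
```

Sleeping outside the semaphore means a request waiting to retry doesn't hold one of the two slots. The jitter keeps several clients from retrying in lockstep against the busy GPU.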

@@ -1,70 +0,0 @@

# Evaluation — spark-control — 2026-06-12

Intent: A browser-based StartOS 0.4 package controlling a dual-DGX-Spark vLLM cluster — one-click model swaps plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.

Agents run: evaluator, security-auditor, exerciser, start9-spec-checker. Reviewer skipped (working tree clean — no diff to review).

## Verdict

This is a capable, well-documented single-operator control plane: a ~960-line FastAPI app fronting SSH-driven model swaps plus honest HTTP proxies for chat, speech, embeddings, and a genuinely well-engineered fail-closed redaction gateway, wrapped by a thin, spec-conformant StartOS 0.4 package that builds cleanly and passes both offline test suites. The app boots and behaves correctly with the cluster absent, and the packaging is compliant on every structural requirement. The dominant risk, corroborated by two agents at the same code paths, is **unauthenticated remote command execution**: several endpoints interpolate caller-controlled strings (`repo`, `vllm_args`, NIM `image`/`container`, custom-service names) unquoted into shell commands run over SSH on the GPU nodes, and the app has no auth or CSRF protection by design — so the LAN/VPN trust boundary is the only thing between a browser-reachable request and cluster RCE. Owner infra topology (IPs, hostnames, SSH username, key name) was scrubbed from the working tree but still lives in git history, handing an attacker a target list for exactly those endpoints. The package is structurally ready but not safe to share widely until the injection sinks are quoted/validated and the history is dealt with.
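
The CSRF half of this risk (a page in the operator's browser calling mutating routes) is commonly addressed by comparing the `Origin` header with `Host`. Here is one generic version of that idea. It is not the project's later same-origin guard. Assumptions: header names arrive lower-cased, and the app is reached directly; a proxy that rewrites `Host` would need the expected origin configured instead. Requests with no `Origin` header are treated as non-browser clients and allowed, because downstream apps call these APIs server-to-server.

```python
from typing import Mapping
from urllib.parse import urlsplit

SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def passes_same_origin_guard(method: str, headers: Mapping[str, str]) -> bool:
    """True if a request may proceed under a simple Origin-vs-Host CSRF check."""
    if method.upper() in SAFE_METHODS:
        return True
    origin = headers.get("origin")
    if origin is None:
        # Assume a non-browser client; modern browsers send Origin on cross-site POSTs.
        return True
    # Opaque origins arrive as "null"; their netloc is "" and never matches Host.
    return urlsplit(origin).netloc.lower() == headers.get("host", "").lower()
```

This only works if safe methods really are read-only, so no `GET` route may swap, stop or delete anything.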

## Cross-referenced findings

- **Command injection → cluster RCE** is reported by *both* the evaluator (P1) and the security-auditor (P0) at the same sinks (`models.py:80`, `swap.py:101`, `download.py:129`, `nim.py:145-166`, `services.py:144`). The evaluator demonstrated `build_launch_command` producing a live `;`-separated command from a hostile `repo`. Merged as **one P0** — the auditor's adversarial evidence (browser/CSRF reachability over plaintext HTTP, no auth) escalates the evaluator's network-gated P1.
- **No auth on state-mutating endpoints** is the shared root enabler: the evaluator filed it P2 (documented/intentional), the auditor filed the **CSRF** angle P1 (a malicious page in the operator's browser can `fetch()` the mutating routes and chain into the P0 injections). Merged into one P1, noting the auditor's CSRF evidence escalates the evaluator's original P2.
- **Owner data exposure**: the evaluator flagged real IPs/username in the (gitignored, untracked) `.claude/settings.local.json`; the auditor independently found the same class of data — IPs, hostnames, user `<spark-user>`, key name — persisting in **git history** despite the v0.18.0:1 working-tree scrub. These are the same concern at two locations; the git-history copy is the P0.
- **Front-end output hygiene**: the evaluator flagged `current_model` rendered via `innerHTML` without `escapeHtml` (`app.js:177`, P3); the exerciser noted `task_id` echoed verbatim in scrub JSON. The auditor read the UI as broadly `escapeHtml`-clean — see Disagreements.
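
The injection sinks share one fix pattern: never splice caller strings into a shell command. Instead, quote every argument for the remote shell and run `ssh` as an argv list without a local shell. A sketch using `shlex.join` (Python 3.8+), assuming the remote login shell is POSIX-compatible. `ssh_argv` is hypothetical rather than the project's `shellsafe` module, and the `docker pull` command is only an example.

```python
import shlex


def ssh_argv(user: str, host: str, key_path: str, remote_args: list) -> list:
    """argv for subprocess.run(...) (no shell=True) with a safely quoted remote command."""
    for value in (user, host):
        # A leading "-" would be parsed by ssh as an option (e.g. -oProxyCommand=...).
        if not value or value.startswith("-"):
            raise ValueError(f"refusing suspicious ssh user/host: {value!r}")
    remote_cmd = shlex.join(remote_args)  # each element stays one remote shell word
    return [
        "ssh",
        "-i", key_path,         # explicit key: non-interactive shells have no agent
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        f"{user}@{host}",
        remote_cmd,
    ]
```

Pass the result to `subprocess.run(argv, check=True)`. Quoting stops a hostile `repo` from adding commands, but it still lets the caller choose which repo or image is pulled. Allow-list validation of such values remains worthwhile.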
## Priority queue
- [P0] Command injection via unquoted user input (`repo`, `vllm_args`, NIM `image`/`container`/`port`, custom-service `container`) interpolated into SSH shell commands → arbitrary RCE as the SSH user on the Sparks — `models.py:80`, `swap.py:101`, `download.py:129`, `nim.py:145-166`, `services.py:144`; demonstrated via `build_launch_command` — evaluator + security-auditor
- [P0] Owner infra topology (IPs `<spark-1-ip>`/`<spark-2-ip>`, QSFP `<spark-1-qsfp-ip>`/`<spark-2-qsfp-ip>`, hosts `<spark-1-host>`/`<spark-2-host>`, user `<spark-user>`, key `<ssh-key>`) persisted in git history despite the working-tree scrub → target list for the unauthenticated endpoints — security-auditor [RESOLVED 2026-06-12: history rewritten with git filter-repo; 0 hits across all refs]
- [P1] No auth + no CSRF protection on state-changing endpoints (plaintext `http`, `interfaces.ts:8`) → any LAN peer, or a malicious page in the operator's browser, can drive swap/install/stop/delete and chain into the P0 injections — security-auditor (CSRF P1) + evaluator (auth P2, escalated)
- [P1] SSRF / Qdrant path injection: caller `collection` interpolated into the Qdrant URL with no validation and raw `filter` forwarded verbatim — `embeddings_proxy.py:237,175,204` — security-auditor
- [P2] Test coverage is redaction-only; the swap state machine, proxies, SSH wrapper, and the StartOS package have zero automated tests — evaluator
- [P2] Loose dependency floors permit known-vulnerable `python-multipart`/`starlette` (DoS CVE-2024-53981 / CVE-2024-47874) on rebuild; no lockfile; no upload size caps — `pyproject.toml:6-13` — security-auditor
- [P2] Registry-submission blockers: source not public + `packageRepo`/`upstreamRepo` are `https://example.com` placeholders — `manifest/index.ts:12-13` — start9-spec-checker
- [P2] Unhandled `OSError` → opaque HTTP 500 on `POST /api/models` and `PUT /knobs` when `MODELS_OVERRIDES` is unset in dev (write to read-only `/data`) — exerciser
- [P2] NGC API key inlined single-quoted into a remote shell command (`export NGC_API_KEY='...'`) → quote-breakout risk + exposure in target process list — `nim.py:147` — security-auditor
- [P2] Single global mutable `catalog` reassigned via `global`, shared across in-flight async requests with no snapshot → latent race as concurrency grows — `server.py:107` — evaluator
- [P2] Container runs uvicorn as **root** (no `USER` in Dockerfile) bound to `0.0.0.0:9999` → any injection RCE runs the SSH client as root in-container — security-auditor (surprise)
- [P3] README Status block stale ("v0.2.3 / s9pk 0.13.0:4", undercounts features) vs actual v0.18.0:1 — `README.md:115` — evaluator
- [P3] `current_model` rendered via `innerHTML` without `escapeHtml` (`app.js:177`); `task_id` echoed verbatim in scrub JSON — evaluator + exerciser
- [P3] httpx exception class names leak into `/v1/audio/speech` and `/api/speech-models` error responses — exerciser
- [P3] `NimInstallBody.register` shadows `BaseModel` attribute → `UserWarning` on every startup; rename (e.g. `register_service`) — exerciser
- [P3] Deprecated `@app.on_event` startup/shutdown and hardcoded `app.version="0.1.0"` (real version 0.18.0:1) — `server.py:49,55` — evaluator
- [P3] `marketingUrl` is an `example.com` placeholder (set `null` or a real URL) — `manifest/index.ts:14` — start9-spec-checker
- [P3] `instructions.md:35` has a broken/template source link (`github.com/Start9Labs/... (TBD)`) visible to end users — start9-spec-checker
- [P3] Per-service SSH users (`parakeet_user`/`kokoro_user`/`embed_user`/`qdrant_user`) are read by `main.ts` but absent from the Configure-Sparks action inputSpec → silent default-to-empty misconfig — start9-spec-checker
- [P3] `Makefile` builds only `x86` though the manifest declares `aarch64`; release notes describe the portability scrub, not package capabilities — start9-spec-checker
- [P3] Hardening: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text (`r.text[:500]`) echoed to clients — security-auditor
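For the SSRF / Qdrant path-injection item, one low-cost approach is an allowlist check on `collection` before it ever reaches the URL, with percent-encoding as a second layer. This is a sketch under assumptions: the character set and 64-character limit are guesses to be matched to the collection names actually in use, the `http://qdrant:6333` base URL is a placeholder, and `validate_collection` / `qdrant_points_url` are hypothetical helpers, not code from `embeddings_proxy.py`. The path shape follows Qdrant's `POST /collections/{collection_name}/points/search` REST route.

```python
import re
from urllib.parse import quote

# Assumed allowlist: 1-64 characters of letters, digits, underscore and hyphen.
_SAFE_COLLECTION = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_collection(name: str) -> str:
    """Return `name` unchanged if it is an allowed collection name, else raise ValueError."""
    # fullmatch (not match/search) so trailing junk such as "docs/../x" is rejected.
    if not _SAFE_COLLECTION.fullmatch(name):
        raise ValueError(f"invalid collection name: {name!r}")
    return name


def qdrant_points_url(base_url: str, collection: str) -> str:
    """Search URL for one collection; the name can never add path segments or a query."""
    safe = quote(validate_collection(collection), safe="")  # defense in depth
    return f"{base_url.rstrip('/')}/collections/{safe}/points/search"


print(qdrant_points_url("http://qdrant:6333/", "meeting_notes"))
# http://qdrant:6333/collections/meeting_notes/points/search
```

The raw `filter` forwarding is a separate issue that this check does not address.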
## Scorecard
| Lens | Score /5 | Justification (cross-checked) |
|------|----------|-------------------------------|
| Architecture | 4 | Clean router-per-concern split, all SSH funnelled through one wrapper (`ssh.py:29`), proxies stay intentionally dumb; global mutable `catalog` + deprecated `on_event` are minor seams. |
| Security | 2 | Held at 2: auditor's evidence (P0 git-history leak, P1 CSRF, P1 SSRF, root container) corroborates and escalates the evaluator's injection finding rather than contradicting it. The redaction boundary is the bright spot; the transport around it is not. |
| Performance | 4 | Async throughout, parallel health fans, unreachable-host cache avoids repeated 6s SSH stalls; `_win_rms` per-sample Python loop is the one hot spot (`audio_proxy.py:635`). |
| Testing | 3 | Two thorough offline redaction suites pass (69/69 + leak); everything else — swap, proxies, SSH, package — is untested, and live-cluster paths couldn't be exercised at all. |
| Code quality | 4 | Consistent style, useful "why" comments, typed dataclasses; `server.py` (962 lines) and `audio_proxy.py` (829) are getting long. |
| Documentation | 4 | Excellent AGENTS.md, scoped guides, HANDOFF, dated `known-issues.md`; undercut by the stale README status line. |

No lens score was overturned by cross-agent evidence; Security stays at 2 with the auditor's findings reinforcing it.
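On the `_win_rms` hot spot: if it computes a sliding-window RMS by re-summing every window sample by sample, a prefix sum of squares makes each window O(1) and the whole pass O(n) instead of O(n·win). The real function's signature and window semantics are not shown in this report, so `windowed_rms` below is a hypothetical stand-in that returns one value per full window.

```python
import math
from itertools import accumulate


def windowed_rms(samples: list[float], win: int) -> list[float]:
    """RMS of every full window samples[i:i + win], in O(len(samples)) total."""
    if win <= 0:
        raise ValueError("win must be positive")
    # prefix[k] holds the sum of squares of samples[:k]; prefix[0] is 0.
    prefix = [0.0, *accumulate(x * x for x in samples)]
    out = []
    for i in range(len(samples) - win + 1):
        total = prefix[i + win] - prefix[i]
        # Cancellation in long float sums can leave a tiny negative; clamp it.
        out.append(math.sqrt(max(total, 0.0) / win))
    return out
```

It is still a Python loop, just a much cheaper one; with NumPy the same idea vectorizes as a `cumsum` over the squared samples followed by one slice subtraction.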
## Disagreements & gaps
- **Injection severity**: auditor P0 vs evaluator P1. Resolved to P0 — the disagreement is purely about whether the no-auth/LAN posture demotes it; the auditor's CSRF finding shows it's reachable from a browser, so the network gate is weaker than the evaluator assumed.
- **Front-end XSS**: the evaluator flagged one unescaped `innerHTML` sink (`current_model`) and the exerciser flagged `task_id` reflection, while the auditor judged the UI broadly `escapeHtml`-clean (47 escape calls). Low-stakes (JSON API + mostly-escaped render path) but unresolved.
- **Shared blind spot**: no agent could exercise the live-cluster paths — actual swap execution, audio transcription/diarization/label-merge, embeddings/search with real vectors. These are simultaneously the **largest, most security-relevant, and least-tested** modules (`swap.py`, `audio_proxy.py`, `services.py`), so a regression in launch-command construction or speaker-merge logic would ship silently. The evaluator and exerciser both name this gap.
- **Registry context**: the spec-checker notes there is currently no StartOS 0.4 community registry (alpha only), so its blockers are inferred from the 0.3.5.x submission doc — applicable when 0.4 opens, but the process may change.
## Suggested order of work
1. **Close the injection sinks** — `shlex.quote` or strict-regex-validate every user-controlled value crossing into SSH (`repo`, `vllm_args`, NIM `image`/`container`/`port`, custom-service names); the safe pattern already exists in `disk.py:_SAFE_DIRNAME`. Cheap, local, independent of the auth decision. (P0)
2. **Decide the git-history question** before any wider sharing — rewrite history (`git-filter-repo`) and rotate the named `<ssh-key>` key, or commit to keeping the repo private-forever. (P0)
3. **Add a defense-in-depth gate** on mutating endpoints — an `Origin`/referer check or a shared-token header in middleware — so a misconfigured StartOS exposure isn't instant RCE; leave read-only probes open. (P1)
4. **Harden the remaining inputs** — validate the Qdrant `collection`, pin dependency floors + commit a lockfile, add upload size caps, and run the container as a non-root `USER` instead of root. (P1–P2)
5. **Add a minimal pytest harness** for `build_launch_command` (incl. injection cases), the swap state transitions, and `_merge_words_with_speakers` — the untested core. (P2)
6. **Fix the doc/packaging drift** — README status block, the `example.com` manifest URLs, the `instructions.md` link, release-note content, and the hardcoded `app.version`. (P2–P3)
7. **If pursuing the registry later** — publish source publicly, build the declared `aarch64` artifact, and run the manual on-box checklist (`start-cli s9pk inspect`, install/uninstall, backup/restore). (P2)
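Step 3 can be sketched as a single decision function that a FastAPI `@app.middleware("http")` hook would call before routing, answering 403 when it returns `False`. This is an illustration, not the project's guard: the `x-spark-token` header name, the lower-cased header mapping, and the policy of letting requests with no `Origin`/`Referer` through (so curl and external schedulers keep working) are all assumptions.

```python
import hmac
from typing import Mapping, Optional
from urllib.parse import urlsplit

SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def allow_request(method: str, headers: Mapping[str, str], host: str,
                  shared_token: Optional[str] = None) -> bool:
    """Return True if a request may reach a route; False means answer 403.

    `headers` maps lower-cased header names to values; `host` is the host[:port]
    the dashboard is served on. Both shapes are assumptions for this sketch.
    """
    if method.upper() in SAFE_METHODS:
        return True  # read-only probes stay open
    presented = headers.get("x-spark-token")  # hypothetical header name
    if shared_token and presented is not None and hmac.compare_digest(
            presented.encode(), shared_token.encode()):
        return True  # scripts and schedulers holding the token
    origin = headers.get("origin") or headers.get("referer")
    if origin is None:
        # Assumed policy: no Origin/Referer usually means a non-browser client.
        return True
    # Compare the sender's host[:port] with ours; "null" or a foreign site fails.
    return urlsplit(origin).netloc == host
```

The token comparison uses `hmac.compare_digest` to avoid a timing side channel. If the token should be mandatory for non-browser clients too, the no-`Origin` branch would return `False` instead.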
+1
-1
@@ -92,7 +92,7 @@ Now that hosts are configured, Show Public Key will give you the paste-ready ins
From the Spark Control service page, click the Web UI button. You should see:
- A **top status bar** with the currently loaded LLM (or "no model loaded" if Spark 1's vLLM container is fresh).
- An **LLM tab** whose cards are the models actually downloaded on your Sparks (the dashboard scans them on load). A model Spark Control doesn't yet know how to launch shows a "needs setup" card; the first switch reads its files, proposes settings, and asks you to confirm once. Use **+ Download a new model** to fetch one — it appears here when it finishes.
- An **LLM tab** with cards for each model in the bundled catalog. Models you've downloaded show "on disk" badges; others show "not downloaded".
- An **Audio / Speech tab** with health status and Install / Start / Stop / Restart buttons for Parakeet and Kokoro.

If the dashboard loads and both Spark hardware cards show CPU/RAM/GPU stats, **you're in**.
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2026 Alice
Copyright (c) 2026 Grant

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -112,14 +112,14 @@ Fields: `service` (required), `ok` (required), `source` (optional, free-form), `
## Status
**s9pk version 0.26.0:0** — installed and verified on a Start9 server. The LLM menu is whatever's downloaded on the Sparks (scanned live, not hard-coded); bundled *launch recipes* (qwen3-vl, gemma4, gemma4-26b, qwen36) tell it how to launch known models, and anything else gets a "needs setup" card that infers + saves its settings on first use.

**v0.2.3 / s9pk version 0.13.0:4** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
### What v0.2 added on top of v0.1
- **Service discovery API** (`/api/endpoints`) for other LAN services
- **Kokoro-82M TTS** replaces Magpie/Riva NIM as the default TTS backend (v0.14.0). Magpie's decoder had a ~30-50% truncation rate on multi-sentence inputs and ate 49 GB of GPU memory; Kokoro is 24/24 reliable at every input length tested, uses 1.3 GB GPU, and renders in ~1s. See HANDOFF.md and the release notes for the migration story.
- **Always-on services panel** with Start/Stop/Restart for Parakeet + Kokoro, plus per-service host configuration in Configure Sparks (so they can live on Spark 1, Spark 2, or anywhere)
- **Model download** from the dashboard — paste an HF repo (with autocomplete for known models), pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion the model appears on the menu automatically; if it's unrecognized, a pre-filled "set up this model" dialog offers to configure it.
- **Model download** from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
- **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
- **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
- **Diarization with speaker fingerprints** via Sortformer + TitaNet, exposed at `/api/audio/diarize-chunk` for chunked workflows
+1
-47
@@ -2,36 +2,8 @@
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)
Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU).

**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap` → `GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability.

Sequenced:
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
4. **Coordination layer** — DONE in tree, staged as **v0.25.0:0** (built/typechecked clean; install pending). All three primitives shipped; `image/app/coordination.py` + `docs/COORDINATION.md`. Brought forward 2026-06-17 on request rather than waiting for our own automation.
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). Acquire returns a secret token; the swap endpoint refuses any real swap (`423`) that doesn't present it in `X-Swap-Lock-Token`, so the dashboard's manual swap is paused while a scheduler holds it (with a `?force=true` human override). In-memory + TTL-bounded → resets to unlocked on restart; re-acquire with the token extends. Enforced in `post_swap`, not advisory.
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL (Configure-Sparks field), fired from `SwapManager._run` *outside* the swap lock; optional shared secret ⇒ `X-Spark-Signature` HMAC. Fire-and-forget (5 s, no retries); dry runs don't fire.
- **Schedule visibility** — `GET/POST/DELETE /api/schedule`; read-only "Scheduled jobs" dashboard panel, registered by external schedulers. Spark Control stores and displays, never executes.
- Tests: `image/tests/test_coordination.py` (22 cases — lock lifecycle/expiry/token, the single-read swap gate, schedule CRUD + id validation, webhook payload+signature). Known limit: lock + schedules are in-memory (a restart frees the lock and empties the registry until schedulers re-register) — persist to `/data` only if that bites.
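The swap-lock rules above (one holder with a TTL, a secret token, re-acquire with the token extends, a human `?force=true` override, and a reset to unlocked on restart because state lives in memory) fit in a small class. This is a hypothetical sketch, not `image/app/coordination.py`; the class and method names are invented, and the clock is injectable only so expiry can be tested without sleeping.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Optional


def _same(a: str, b: str) -> bool:
    # Constant-time comparison; bytes avoid the ASCII-only restriction on str.
    return hmac.compare_digest(a.encode(), b.encode())


@dataclass
class _Hold:
    holder: str
    token: str
    expires_at: float


class SwapLock:
    """In-memory swap lock: one holder, a secret token, and a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hold: Optional[_Hold] = None

    def _current(self) -> Optional[_Hold]:
        # An expired hold counts as released.
        if self._hold is not None and self._clock() >= self._hold.expires_at:
            self._hold = None
        return self._hold

    def acquire(self, holder: str, ttl_s: float, token: Optional[str] = None) -> Optional[str]:
        """Return the lock token, or None if someone else holds the lock.

        Presenting the current token re-acquires and pushes the expiry out.
        """
        hold = self._current()
        if hold is not None and not (token and _same(token, hold.token)):
            return None
        new_token = hold.token if hold is not None else secrets.token_urlsafe(16)
        self._hold = _Hold(holder, new_token, self._clock() + ttl_s)
        return new_token

    def release(self, token: str) -> bool:
        hold = self._current()
        if hold is not None and _same(token, hold.token):
            self._hold = None
            return True
        return False

    def may_swap(self, token: Optional[str], force: bool = False) -> bool:
        """Gate for a real swap: lock free, matching token, or explicit human force."""
        hold = self._current()
        if hold is None or force:
            return True
        return token is not None and _same(token, hold.token)
```

In the swap endpoint, `may_swap(token_from_header, force)` returning `False` would map to the `423` response. There is no thread lock here, which is fine as long as every call runs on one asyncio event loop.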
### Generalizing the swap mechanism to raw `docker run` — DEFERRED (decided 2026-06-18, research-backed; was item 4's last open thread)
Our swap drives `~/spark-vllm-docker/launch-cluster.sh` over SSH on Spark 1 (`./launch-cluster.sh stop`, then `[VLLM_SPARK_EXTRA_DOCKER_ARGS=…] ./launch-cluster.sh [--solo ]-d exec vllm serve <model> <args>`, then `docker logs -f` until the ready marker). The OpenClaw adopter launches vLLM with a plain `docker run` instead, so the swap button can't drive his cluster — only monitor it. The portability fix would be a configurable "swap backend": keep `launch-cluster.sh` as the default and add a "bring your own command" mode (operator-authored stop/launch templates in `services-overrides.yaml` with quoted `{model}`/`{container}`/`{port}`/`{extra_args}` substitution; ready-detection unchanged; the vLLM-argparse pre-flight disabled for that backend).

**Why deferred, not built:**
- **Raw docker is not an upgrade for *us* — for half our catalog it's impossible.** `launch-cluster.sh` is the `eugr/spark-vllm-docker` community project (de-facto DGX Spark standard; mirrors NVIDIA's own `dgx-spark-playbooks` Ray+RDMA architecture). Its headline job is **multi-node** serving: our 235B `cluster` models (Qwen3-VL 235B, Qwen3 235B) exceed one Spark's 128 GB and *must* shard across both Sparks via Ray over the 200 Gbps ConnectX/RoCE link — plumbing (NCCL/MTU/per-node env) that a single-node `docker run` cannot do. So we keep the helper script; switching our own cluster to raw docker is off the table.
- **The feature is therefore portability-only** (for differently-wired adopters), and the one known adopter doesn't need it — he swaps via his own crons and uses Spark Control to watch.
- **Untestable on our hardware** — our cluster uses the helper script, so we can't validate a real raw-docker swap without risking the live vLLM.
- The one real standing risk is eugr's single-maintainer status; fallback is community forks or migrating to NVIDIA's official `dgx-spark-playbooks` launcher (same design). No reason to switch now.

**Revisit only if** an adopter explicitly wants Spark Control to *drive* (not just monitor) swaps on a raw-`docker run` cluster. At that point, get their actual working `docker run` command and build the command-template backend to it.
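If the command-template backend is ever built, its core is quoted substitution into an operator-authored template. A minimal sketch, assuming templates use Python `str.format` placeholders and that `extra_args` arrives as a list of argv tokens; `render_command` and its allowlist are hypothetical, and any template shown with it is made up, not the adopter's real `docker run`.

```python
import shlex
import string
from typing import Sequence, Union

ALLOWED_FIELDS = {"model", "container", "port", "extra_args"}


def render_command(template: str, *, model: str, container: str,
                   port: Union[int, str], extra_args: Sequence[str]) -> str:
    """Fill an operator-authored command template with shell-quoted values."""
    # Refuse placeholders we don't supply (including "{}", "{0}" or "{model.attr}").
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name is not None}
    unknown = fields - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown placeholders: {sorted(unknown)}")
    port_num = int(port)  # raises ValueError for anything that isn't an integer
    if not 1 <= port_num <= 65535:
        raise ValueError(f"port out of range: {port_num}")
    return template.format(
        model=shlex.quote(model),
        container=shlex.quote(container),
        port=port_num,
        extra_args=shlex.join(extra_args),
    )
```

Unknown placeholders and bad ports fail up front, and every substituted value reaches the remote shell as literal words, so a hostile model name cannot append a second command. The template itself remains trusted operator input.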
## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- parakeet-asr `--memory` cap, shipped via the Reapply-patches action (guards against swap-thrash on very long audio).
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
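The deferred `MAX_DIARIZE_SECONDS` guard described above is small; a sketch of its two pieces follows, assuming the cap comes from an environment variable and that a blank or invalid value means "no cap" (in the spirit of how blank/bad numeric settings fall back elsewhere in this package). `AudioTooLong`, `max_diarize_seconds` and `check_duration` are hypothetical names, not code from `diarizer.py`.

```python
import os
from typing import Mapping, Optional


class AudioTooLong(Exception):
    """Raised when an upload is longer than the configured diarization cap."""

    def __init__(self, duration_s: float, limit_s: float) -> None:
        super().__init__(f"audio is {duration_s:.0f}s; limit is {limit_s:.0f}s")
        self.duration_s = duration_s
        self.limit_s = limit_s


def max_diarize_seconds(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Read MAX_DIARIZE_SECONDS; blank, unparsable, or non-positive means no cap."""
    raw = env.get("MAX_DIARIZE_SECONDS", "").strip()
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def check_duration(duration_s: float, limit_s: Optional[float]) -> None:
    """Call right after the audio duration is known, before Sortformer runs."""
    if limit_s is not None and duration_s > limit_s:
        raise AudioTooLong(duration_s, limit_s)
```

In `main.py` the exception would then map to the 413, for example `except AudioTooLong as e: raise HTTPException(status_code=413, detail=str(e))`, alongside the existing `MAX_UPLOAD_MB` handling.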
## Audio quality
@@ -50,21 +22,3 @@ Our swap drives `~/spark-vllm-docker/launch-cluster.sh` over SSH on Spark 1 (`./
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.
## Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)
P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:

**P2 — track:**
- No automated tests beyond the two redaction suites — swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for `build_launch_command` (incl. injection cases), swap transitions, and `_merge_words_with_speakers` is the highest-value start.
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml`).
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` unset in dev (write to read-only `/data`) — catch the `OSError`.
- NGC API key still appears on the remote process command line (`nim.py`) — the quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py`) — latent race as concurrency grows.
- Container runs uvicorn as **root** bound to `0.0.0.0:9999` (no `USER` in Dockerfile) — amplifies any RCE blast radius.

**P3 — bulk-fix when next touching docs/packaging:**
- README Status block stale (`v0.2.3 / 0.13.0:4` → now v0.19.0:0); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js`) + `task_id` reflected in scrub JSON.
- Packaging: `marketingUrl`/`packageRepo`/`upstreamRepo` are `example.com` placeholders; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though the manifest declares `aarch64`.
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
- StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
@@ -0,0 +1,260 @@
# Project: spark-control — Model switcher web UI for dual DGX Spark cluster
> **Update 2026-05-12 — Direction change:** the web UI is being built as a
> **StartOS 0.4 package** (sideloaded onto Grant's existing Start9 server),
> **not** as a FastAPI service running directly on Spark 1. The Start9 server
> shares a LAN with the Sparks and SSHes into Spark 1 to invoke
> `launch-cluster.sh`. StartOS handles `.local` exposure and HTTPS; SSH
> credentials live in a per-install config file managed by a "Configure Sparks"
> action. See <https://docs.start9.com/packaging/0.4.0.x/> for the packaging
> model. Repo layout:
>
> - `image/` — Docker image source (FastAPI app, runs anywhere with `uvicorn`).
> - `package/` — StartOS 0.4 wrapper (manifest, main, interfaces, actions).
>
> The "Phase 4: Deploy" section below (systemd on Spark 1) is **superseded** by
> the StartOS sideload workflow. Other phases (models.yaml schema, swap script,
> FastAPI endpoints, frontend) still apply but live inside `image/`.
## Goal
I want to build a small web service that gives me a browser-based interface to:
1. See which LLM is currently loaded on my DGX Spark cluster
2. Click a button to swap to a different model
3. See real-time status as the swap progresses (stop → launch → ready)
4. See basic health info about supporting services (Parakeet STT, eventually Magpie TTS)

The UI should live at a stable URL on my LAN so I can bookmark it. I'll likely access it from my laptop and phone.
## Where this project lives
This repo lives on **my laptop** (macOS). The Sparks are servers — we control them remotely over SSH. Claude Code runs on my laptop, makes edits in the local repo, and executes commands on the Sparks via SSH.

The web UI itself, when deployed, will run on **Spark 1** (where it can directly invoke `launch-cluster.sh`), but development happens on my laptop. We'll deploy the code to Spark 1 via `rsync` or `scp` or `git pull` as needed.
## SSH setup
From my laptop I can SSH to either Spark directly:
```bash
ssh modelo@192.168.1.103 # Spark 1
ssh modelo@192.168.1.87 # Spark 2
```
(I can also use SSH key auth — set up earlier.)

When you need to run a command on a Spark, use this pattern:
```bash
ssh modelo@192.168.1.103 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
```
For multi-line commands or scripts, you can pipe a heredoc or just SSH in directly and run them interactively. Either works — but always tell me what you're about to run so I can review.

For file transfers between my laptop and the Sparks, use `rsync`:
```bash
rsync -avz ~/Projects/spark-control/ modelo@192.168.1.103:~/spark-control/
```
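The SSH pattern above (one quoted remote command per `ssh` call, and always show the command before running it) can be wrapped in a small helper. A sketch under assumptions: `remote_argv` and `run_on_spark` are hypothetical names, and the `BatchMode=yes` option (fail instead of prompting for a password) and the dry-run default are choices made for illustration.

```python
import shlex
import subprocess
from typing import Sequence

SPARK1 = "modelo@192.168.1.103"  # from the SSH setup above


def remote_argv(host: str, workdir: str, argv: Sequence[str]) -> list[str]:
    """Local argv for: ssh host 'cd <workdir> && <quoted argv>'."""
    # workdir stays unquoted so a leading ~ still expands on the Spark,
    # which means it must be a fixed, trusted string; argv is always quoted.
    remote = f"cd {workdir} && {shlex.join(argv)}"
    # BatchMode=yes makes ssh fail instead of prompting when key auth is missing.
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def run_on_spark(host: str, workdir: str, argv: Sequence[str], dry_run: bool = True) -> int:
    """Print the exact command for review; execute it only when dry_run is False."""
    cmd = remote_argv(host, workdir, argv)
    print("about to run:", shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode


run_on_spark(SPARK1, "~/spark-vllm-docker", ["./launch-cluster.sh", "status"])
# about to run: ssh -o BatchMode=yes modelo@192.168.1.103 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
```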
## My hardware and what's running
**Two NVIDIA DGX Spark units** networked together:
- **Spark 1** — hostname `spark-27ea`, LAN IP `192.168.1.103`, QSFP IP `192.168.100.10`. Head node for the vLLM cluster.
- **Spark 2** — hostname `spark-32d0`, LAN IP `192.168.1.87`, QSFP IP `192.168.100.11`. Worker node for the vLLM cluster, also hosts standalone services.

Both run Ubuntu 24.04, NVIDIA driver 580.x, CUDA 13.0, Docker, and have 128 GB unified memory each. They share a QSFP cable for high-speed (200 Gb/s) inter-node networking.

Passwordless SSH works in both directions via the `~/.ssh/id_ed25519_shared` key. My Linux username on both machines is `modelo`.

**Currently running:**
- One LLM at a time on the cluster (via the `eugr/spark-vllm-docker` project — see below)
- `parakeet-asr` Docker container on Spark 2 (port 8000) — running 24/7 for speech-to-text, healthy for weeks
- `magpie-tts` Docker container on Spark 2 (port 9000) — was being set up; I'm not 100% sure of its current state; first task is to verify
- Open WebUI runs on a separate Start9 server on the LAN (not on the Sparks), accessing the LLM via HTTP
## The LLM cluster: how it works
I use the **`eugr/spark-vllm-docker`** community project (cloned to `~/spark-vllm-docker` on Spark 1). It manages a Ray-based vLLM cluster across both Sparks, with a wrapper script called `launch-cluster.sh` that handles starting/stopping Docker containers on both nodes.

Key commands (all run from `~/spark-vllm-docker` on Spark 1):
- `./launch-cluster.sh status` — see what's running on both nodes
- `./launch-cluster.sh stop` — stop the cluster
- `./launch-cluster.sh -d exec vllm serve ...` — launch in daemon mode with vLLM args
- `./launch-cluster.sh --solo -d exec vllm serve ...` — same but only on Spark 1 (for smaller models)
- `docker logs -f vllm_node` — tail vLLM logs

Container names: `vllm_node` (the main vLLM container), `ray_head` and `ray_worker` (Ray cluster), plus support containers.

The vLLM server binds to port **8888** and exposes an OpenAI-compatible API at `http://192.168.1.103:8888/v1`.
|
||||||
|
|
||||||
|
## Models I have on disk (both Sparks)

All weights live in `~/.cache/huggingface/hub/` on each Spark:

1. **`RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4`** (~135 GB) — flagship MoE, runs across both Sparks (`-tp 2`), has vision capability. Use for: maximum quality, vision input, multilingual.

2. **`RedHatAI/gemma-4-31B-it-NVFP4`** (~23 GB) — runs solo on Spark 1, has vision, has thinking-mode reasoning. Use for: math/reasoning-heavy tasks. Has a known vLLM Triton-attention slowdown bug (~15-20 tok/s vs theoretical 30-40).

3. **`RedHatAI/Qwen3.6-35B-A3B-NVFP4`** (~20 GB) — newer-generation Qwen MoE (35B total / 3B active), runs solo on Spark 1, expected to be the fastest (~70-100 tok/s) and my new daily driver. **Note: this may still be downloading or may not be downloaded yet — first task is to verify and download if needed.**

## Exact launch commands for each model

These are the commands my system needs to run when I click a swap button.

### Qwen3-VL-235B (uses both Sparks)

```bash
cd ~/spark-vllm-docker
./launch-cluster.sh stop
./launch-cluster.sh -d exec vllm serve \
  RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 \
  --port 8888 --host 0.0.0.0 \
  --gpu-memory-utilization 0.7 \
  -tp 2 \
  --distributed-executor-backend ray \
  --max-model-len 32768
```

Expected ready time: ~3-5 min after stop completes.

### Gemma 4 31B (solo on Spark 1)

```bash
cd ~/spark-vllm-docker
./launch-cluster.sh stop
./launch-cluster.sh --solo -d exec vllm serve \
  RedHatAI/gemma-4-31B-it-NVFP4 \
  --port 8888 --host 0.0.0.0 \
  --gpu-memory-utilization 0.8 \
  --max-model-len 32768 \
  --reasoning-parser gemma4 \
  --tool-call-parser gemma4 \
  --enable-auto-tool-choice
```

Expected ready time: ~3-4 min.

### Qwen3.6-35B-A3B (solo on Spark 1) — new daily driver

```bash
cd ~/spark-vllm-docker
./launch-cluster.sh stop
./launch-cluster.sh --solo -d exec vllm serve \
  RedHatAI/Qwen3.6-35B-A3B-NVFP4 \
  --port 8888 --host 0.0.0.0 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 65536 \
  --reasoning-parser qwen3 \
  --moe_backend flashinfer_cutlass
```

Expected ready time: ~3-5 min.

Note: the `--moe_backend flashinfer_cutlass` flag is Blackwell-specific. If it errors on launch, the fallback is to remove that flag.

### Common operations

- Stop everything: `./launch-cluster.sh stop`
- Status check: `./launch-cluster.sh status`
- See vLLM logs: `docker logs vllm_node` (add `-f` to follow)
- Hard reset if stuck: `./launch-cluster.sh stop && docker ps -aq | xargs -r docker rm -f` (run this on Spark 1 only. It force-removes *every* container on the host, so on Spark 2 it would also remove `parakeet-asr`.)
- Health check (is the API responding?): `curl -s http://192.168.1.103:8888/v1/models`

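The health check above returns vLLM's OpenAI-style model list. The sketch below turns that response into the name of the loaded model. It assumes the usual `{"data": [{"id": ...}]}` shape that OpenAI-compatible servers return, and the helper names are hypothetical:

```python
import json
import urllib.request
from typing import Optional

# Spark 1's vLLM endpoint, as given in this brief.
MODELS_URL = "http://192.168.1.103:8888/v1/models"


def loaded_model(payload: dict) -> Optional[str]:
    """Return the id of the first model in a /v1/models response, or None if empty."""
    data = payload.get("data") or []
    return data[0].get("id") if data else None


def fetch_loaded_model(url: str = MODELS_URL, timeout: float = 5.0) -> Optional[str]:
    """Query the server; a refused connection or bad JSON counts as 'nothing loaded yet'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return loaded_model(json.load(resp))
    except (OSError, ValueError):  # URLError/timeouts are OSError; JSONDecodeError is ValueError
        return None
```

The status endpoint in the web UI could report `fetch_loaded_model()` as `current_model`. In that case, `None` would cover both "stopped" and "still loading".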
### "Ready" signal

The model is ready to serve when `docker logs vllm_node` contains the line `Application startup complete.` Until then, it's still loading weights or compiling CUDA graphs.

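The ready check above can be automated with a polling loop. This is a minimal sketch that assumes it runs on Spark 1, where `docker logs vllm_node` works directly. The function names are hypothetical. The clock and sleep functions are injectable so the loop can be tested without actually waiting ten minutes:

```python
import subprocess
import time
from typing import Callable

READY_MARKER = "Application startup complete."


def docker_logs(container: str = "vllm_node") -> str:
    """Return a container's logs as text; stdout and stderr are combined because
    the server may log to either stream."""
    result = subprocess.run(["docker", "logs", container], capture_output=True, text=True)
    return result.stdout + result.stderr


def wait_until_ready(
    read_logs: Callable[[], str] = docker_logs,
    timeout_s: float = 600.0,   # the ~10 min budget from the swap-script spec
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the logs until READY_MARKER appears (True) or timeout_s elapses (False)."""
    deadline = clock() + timeout_s
    while True:
        if READY_MARKER in read_logs():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Call this only after the stop step has finished. If the previous model's container were still around, its log would already contain the marker.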
## Supporting services on Spark 2 (always-on, separate from cluster)

These don't get touched by model swaps:

- **`parakeet-asr`** — STT on port 8000. Already running 24/7. Verify with `curl http://192.168.1.87:8000/health`, which should return `{"status":"ready",...}`.
- **`magpie-tts`** — TTS on port 9000. May or may not be running; verify with `docker ps` on Spark 2 and `curl http://192.168.1.87:9000/v1/health/ready`.

## What I want you to build

### Phase 1: Set up the project repo (start here)

Create a Git repo at `~/Projects/spark-control/` on **my laptop**. Initial structure:

```
spark-control/
├── README.md
├── models.yaml          # Declarative config for each model
├── scripts/
│   ├── swap-model.sh    # Universal swap script
│   ├── status.sh        # Cluster + service status
│   └── health.sh        # Health checks for everything
├── web-ui/
│   ├── server.py        # FastAPI backend
│   ├── static/
│   │   ├── index.html   # Toggle UI
│   │   ├── style.css
│   │   └── app.js       # State management, polling
│   └── requirements.txt
├── runbook.md           # Operating notes
└── known-issues.md      # Gotchas, troubleshooting
```

### Phase 2: Build the universal swap script

`scripts/swap-model.sh <model-key>` should:

1. Read the launch command from `models.yaml` by key (e.g. `qwen3-vl`, `gemma4`, `qwen36`)
2. Stop the current cluster (via SSH to Spark 1)
3. Run the new launch command (via SSH to Spark 1)
4. Tail the logs until `Application startup complete.` appears or a timeout (~10 min) expires
5. Return exit code 0 on success, non-zero on failure

Two versions might be useful:

- The version that runs on **my laptop** — wraps everything in `ssh modelo@192.168.1.103 ...`
- A simpler version that lives on **Spark 1** — runs commands directly without SSH (used by the deployed web UI)

You can either share one script with a `--remote` flag, or make them two distinct files. Your call: propose the cleaner option.

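Steps 1 to 3 above can be sketched as a single command builder, together with the `--remote` option. The sketch assumes `models.yaml` has already been parsed (for example with PyYAML's `yaml.safe_load`) into a dict shaped like `MODELS` below. That shape, the `mode`/`args` keys, and the function name are hypothetical, and only the Gemma entry is filled in:

```python
import shlex
from typing import Dict, List

# Hypothetical shape of models.yaml after parsing; one entry shown.
MODELS: Dict[str, dict] = {
    "gemma4": {
        "mode": "solo",  # "solo" adds --solo; anything else runs across both Sparks
        "args": [
            "RedHatAI/gemma-4-31B-it-NVFP4",
            "--port", "8888", "--host", "0.0.0.0",
            "--gpu-memory-utilization", "0.8",
            "--max-model-len", "32768",
            "--reasoning-parser", "gemma4",
            "--tool-call-parser", "gemma4",
            "--enable-auto-tool-choice",
        ],
    },
}

SPARK1 = "modelo@192.168.1.103"


def launch_command(key: str, remote: bool = False) -> List[str]:
    """Build an argv that stops the cluster and launches model `key`.

    remote=False: run directly on Spark 1 through bash.
    remote=True: the same shell string, sent to Spark 1 over ssh from the laptop.
    """
    model = MODELS[key]  # unknown key -> KeyError, so the script fails fast
    launch = ["./launch-cluster.sh"]
    if model["mode"] == "solo":
        launch.append("--solo")
    launch += ["-d", "exec", "vllm", "serve", *model["args"]]
    # Same order as the manual commands: cd, stop, then launch (stop's exit code is ignored).
    script = "cd ~/spark-vllm-docker || exit 1; ./launch-cluster.sh stop; " + shlex.join(launch)
    if remote:
        return ["ssh", SPARK1, script]
    return ["bash", "-c", script]
```

The local and remote variants differ only in the outer wrapper, which argues for one script with a flag. The model arguments are kept as a list and joined with `shlex.join`, so the resulting string is safe for both `bash -c` and the remote shell behind `ssh`.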
### Phase 3: Build the web UI

FastAPI backend that:

- `GET /api/status` → JSON with `{current_model, ready, parakeet_health, magpie_health, last_swap_time}`
- `POST /api/swap` with `{model_key}` → starts swap, returns swap job ID
- `GET /api/swap/{job_id}/stream` → Server-Sent Events streaming swap progress
- `GET /` → serves the HTML UI

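The `/api/swap/{job_id}/stream` endpoint listed above uses Server-Sent Events. In this plain-text format, each message is one or more `data:` lines followed by a blank line. Below is a minimal sketch of that framing. It assumes swap progress arrives as individual log lines. The generator name `swap_stream` and the `log`/`done` event names are hypothetical:

```python
import json
from typing import Iterable, Iterator


def sse_event(data: str, event: str = "") -> str:
    """Format one SSE message. Each line of `data` gets its own `data:` field,
    because a raw newline would end the field early; a blank line ends the message."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def swap_stream(log_lines: Iterable[str], final_state: str) -> Iterator[str]:
    """Hypothetical body of GET /api/swap/{job_id}/stream: log lines, then a final state."""
    for line in log_lines:
        yield sse_event(line, event="log")
    yield sse_event(json.dumps({"state": final_state}), event="done")
```

In FastAPI, a generator like this could be wrapped in `fastapi.responses.StreamingResponse(..., media_type="text/event-stream")`. The frontend can subscribe with `new EventSource(url)`, using `addEventListener("log", ...)` for named events.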
Frontend should:

- Show a card per model with a "Switch to this" button
- Highlight which model is currently loaded
- During a swap, show streaming log output and a spinner
- Show a green/red indicator for Parakeet and Magpie health
- Auto-refresh every 5 seconds

Keep the UI simple, clean, dark-themed. No frameworks needed — vanilla HTML/JS is fine.

### Phase 4: Deploy and make it persistent

The web UI runs on **Spark 1** so it can directly invoke `launch-cluster.sh` without SSH overhead. To deploy:

1. `rsync` the project code from my laptop to `~/spark-control/` on Spark 1
2. Set up a Python virtual environment on Spark 1 and install requirements
3. Create a systemd service file that starts the FastAPI server on boot
4. Service should listen on `0.0.0.0:9999` so I can hit it from any device on my LAN
5. Add a simple deploy script (`scripts/deploy.sh`) on my laptop that does the rsync + restart in one command for future iteration

## Working style

- Before making changes that affect the running cluster, please ask me first.
- When you write commands you want me to run, give them in clearly marked code blocks.
- Distinguish clearly when a command is meant to run on my laptop vs. on a Spark (which means via SSH).
- If you need information about the current state of the Sparks, ask me to run a diagnostic SSH command and paste the output — or run it yourself if you have shell access.
- Test things incrementally. Don't build the whole UI before validating the swap script works.
- I'm a layman — explain technical decisions briefly in plain English when they involve trade-offs.
- When making changes that modify files on a Spark, do them by editing in my laptop's repo first and then deploying — not by editing on the Spark directly. That keeps my laptop as the source of truth.

## First task

1. First, **verify SSH access to both Sparks** from my laptop:
   - `ssh modelo@192.168.1.103 hostname` should return `spark-27ea`
   - `ssh modelo@192.168.1.87 hostname` should return `spark-32d0`
2. Then **verify the current state of the cluster** via SSH:
   - Confirm `~/spark-vllm-docker` exists on Spark 1 and `launch-cluster.sh` is there: `ssh modelo@192.168.1.103 'ls ~/spark-vllm-docker/launch-cluster.sh'`
   - Check which LLM (if any) is currently loaded: `ssh modelo@192.168.1.103 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'` and `ssh modelo@192.168.1.103 'curl -s http://localhost:8888/v1/models'`
   - Verify which models are downloaded: `ssh modelo@192.168.1.103 'ls ~/.cache/huggingface/hub/ | grep -iE "qwen|gemma"'`
   - Specifically check if `Qwen3.6-35B-A3B-NVFP4` is downloaded; if not, downloading it is the prerequisite step (run the `hf-download.sh` command on Spark 1)
   - Check what's running on Spark 2: `ssh modelo@192.168.1.87 'docker ps'` (looking for parakeet-asr and possibly magpie-tts)
3. Then create the repo structure on my laptop at `~/Projects/spark-control/`
4. Then propose the design for `models.yaml` and the swap script before implementing

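The two hostname checks in step 1 can be scripted. This sketch assumes the laptop's SSH config already selects the right key for `modelo@<ip>`. `BatchMode=yes` makes a missing key fail fast instead of prompting for a password. The helper names are hypothetical:

```python
import subprocess
from typing import Callable, Dict

USER = "modelo"
EXPECTED_HOSTNAMES: Dict[str, str] = {
    "192.168.1.103": "spark-27ea",  # Spark 1
    "192.168.1.87": "spark-32d0",   # Spark 2
}


def ssh_hostname(ip: str) -> str:
    """Run `hostname` on a Spark over SSH; returns "" if ssh is missing or hangs."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", f"{USER}@{ip}", "hostname"],
            capture_output=True, text=True, timeout=15,
        )
    except (subprocess.TimeoutExpired, OSError):
        return ""
    return result.stdout.strip()


def check_ssh(get_hostname: Callable[[str], str] = ssh_hostname) -> Dict[str, bool]:
    """Map each Spark IP to whether it answered with the expected hostname."""
    return {ip: get_hostname(ip) == name for ip, name in EXPECTED_HOSTNAMES.items()}
```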
Ask me anything that's unclear before starting.

@@ -9,7 +9,7 @@ from the live deployment.
 ## 1. Connection / auth
 
 - **Base URL:** `https://<spark-control-host>` (the operator's Start9 LAN address,
-e.g. `https://<spark-control-host>:62419`). A `.local` form also exists (survives IP
+e.g. `https://192.168.1.72:62419`). A `.local` form also exists (survives IP
 changes); the operator can provide it.
 - **TLS:** Start9's self-signed Root CA. On the LAN, set `verify=False` /
 `rejectUnauthorized:false` (curl `-k`), or install the Start9 Root CA into your
@@ -247,8 +247,8 @@ to you, not a stranger. You also free a Sortformer speaker slot (you no longer c
 ```bash
 curl -k -X POST https://<host>/api/audio/label-merge \
   -F "mic_file=@mic.wav" -F "system_file=@system.wav" \
-  -F "self_name=Alice" -F 'timeline=[...]' -F "transcribe=true" \
-  -F 'known_voiceprints={"Alice":[...],"Bob":[...]}'  # include your own
+  -F "self_name=Grant" -F 'timeline=[...]' -F "transcribe=true" \
+  -F 'known_voiceprints={"Grant":[...],"Caitlyn":[...]}'  # include your own
 ```
 
 Response is the same shape with `"mode":"dual_channel"`; `speakers` includes a

@@ -1,157 +0,0 @@
# Cluster coordination through Spark Control (v0.25.0)

Spark Control is the **GPU arbiter, not a job runner.** Your recurring pipelines
(model-warming crons, "daily X" generators, batch jobs) live in your own
services and *drive Spark Control's swap API*. This page documents the safety
layer around that: a **swap reservation lock**, a **swap-event webhook**, and a
**read-only schedule registry**.

If only the dashboard ever swaps models, you don't need any of this — it's for
when something automated also swaps.

All endpoints are on the Spark Control host (same LAN/VPN URL as the LLM, audio,
and embeddings proxies). There is no API-token auth by design (LAN + split-tunnel
VPN only); a non-browser client passes the same-origin guard automatically.

---

## 1. Swap reservation lock

A short, TTL-bounded reservation of the swap path. While a lock is held, **any
real swap that doesn't present the holder's token is refused with `423 Locked`**
— including the dashboard's manual swap. The holder *name* is descriptive; the
returned **token** is the secret that authorises swaps and the release.

The lock is in-memory: it resets to *unlocked* if Spark Control restarts (the
safe-for-availability default), and the swap engine's own in-progress guard
still prevents two swaps running at once.

### `POST /api/swap/lock` — acquire (or extend)

```json
// request
{ "holder": "openclaw-daily-vol", "ttl_seconds": 900, "note": "daily vol run" }

// 200 response
{
  "held": true,
  "holder": "openclaw-daily-vol",
  "acquired_at": "2026-06-17T12:00:00+00:00",
  "expires_at": "2026-06-17T12:15:00+00:00",
  "seconds_remaining": 900,
  "note": "daily vol run",
  "token": "a1b2c3…"   // SECRET — store it; needed to swap and to release
}
```

- `ttl_seconds` is optional (default 900) and clamped to `[1, 86400]`.
- **`409`** if a *different* holder already holds it (body includes the current
  `lock` state). To **extend** your own lock, POST again with the same `holder`
  **and** your `token` — the token is preserved and the window slides forward.

### `GET /api/swap/lock` — status (no token)

```json
{ "held": true, "holder": "openclaw-daily-vol", "expires_at": "…", "seconds_remaining": 612, "note": "…" }
// or
{ "held": false }
```

### `DELETE /api/swap/lock` — release

Send your token in the `X-Swap-Lock-Token` header (or `?token=`):

```
DELETE /api/swap/lock
X-Swap-Lock-Token: a1b2c3…
```

- **`403`** if the token doesn't match. The dashboard's human override is
  `DELETE /api/swap/lock?force=true` (no token).

### Swapping while you hold the lock

Pass the token on the swap call; the dashboard (no token) is then blocked:

```
POST /api/swap
X-Swap-Lock-Token: a1b2c3…

{ "model_key": "gemma-3-27b" }
```

Recommended scheduler flow: **acquire → swap (with token) → poll `/api/swap/{id}`
→ release**. Always release in a `finally`; if you crash, the TTL frees it.

> `POST /api/swap/{key}/validate` (pre-flight) and dry-run swaps are **not**
> blocked by the lock — they don't touch the cluster.

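The recommended acquire, swap, poll, release flow can be sketched in plain Python, with some assumptions. The transport abstraction and the function names are hypothetical. The lock's `ttl_seconds` is fixed at the documented default of 900. This page doesn't document which field of the swap response carries the job id, so polling `/api/swap/{id}` is left to a caller-supplied `wait_for_job`:

```python
import json
import ssl
import urllib.request
from typing import Any, Callable, Dict, Optional, Tuple

# A transport takes (method, url, headers, json_body) and returns (status, parsed_json_or_None).
Transport = Callable[[str, str, Dict[str, str], Optional[dict]], Tuple[int, Any]]


def make_urllib_transport(insecure: bool = False) -> Transport:
    """Stdlib transport; insecure=True mirrors curl -k for a self-signed LAN CA."""
    ctx = ssl.create_default_context()
    if insecure:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE

    def send(method: str, url: str, headers: Dict[str, str], body: Optional[dict]) -> Tuple[int, Any]:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(url, data=data, method=method,
                                     headers={"Content-Type": "application/json", **headers})
        # urlopen raises HTTPError on 4xx/5xx (e.g. 409 or 423), which aborts the flow below.
        with urllib.request.urlopen(req, timeout=30, context=ctx) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else None)

    return send


def run_locked_swap(base: str, holder: str, model_key: str,
                    transport: Transport, wait_for_job: Callable[[Any], Any]) -> Any:
    """Acquire the lock, swap with its token, wait, and release even if anything fails."""
    status, lock = transport("POST", f"{base}/api/swap/lock", {},
                             {"holder": holder, "ttl_seconds": 900})
    if not 200 <= status < 300:
        raise RuntimeError(f"could not acquire swap lock: {status} {lock}")
    auth = {"X-Swap-Lock-Token": lock["token"]}
    try:
        status, job = transport("POST", f"{base}/api/swap", auth, {"model_key": model_key})
        if not 200 <= status < 300:
            raise RuntimeError(f"swap refused: {status} {job}")
        return wait_for_job(job)  # caller polls /api/swap/{id} until ready or failed
    finally:
        transport("DELETE", f"{base}/api/swap/lock", auth, None)
```

Against the Start9-hosted instance, `make_urllib_transport(insecure=True)` plays the role of curl's `-k`. If the release call itself fails, the TTL still frees the lock.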
---

## 2. Swap-event webhook

Configure a URL in **Configure Sparks → "Swap webhook URL"**. After every real
swap, Spark Control POSTs:

```json
{
  "event": "swap_complete",   // or "swap_failed"
  "job_id": "1a2b3c4d",
  "model_key": "gemma-3-27b",
  "state": "ready",           // or "failed"
  "returncode": 0,
  "started_at": "2026-06-17T12:00:00+00:00",
  "finished_at": "2026-06-17T12:03:11+00:00",
  "dry_run": false
}
```

Headers: `X-Spark-Event: swap_complete`. If you set a **webhook secret**, the
body is signed: `X-Spark-Signature: sha256=<hmac>` (HMAC-SHA256 of the raw body
with the shared secret). Verify it over the raw bytes, before any JSON parsing:

```python
import hashlib, hmac

def verify_signature(secret: str, raw_body: bytes, header: str) -> bool:
    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header)  # constant-time; header = X-Spark-Signature value
```

Delivery is best-effort and fire-and-forget (5 s timeout, no retries) — a
webhook failure never affects the swap itself. Dry runs don't fire.

---

## 3. Schedule registry (read-only display)

So the dashboard can show *what's scheduled to touch the GPU and when*, your
schedulers register their jobs here. **Spark Control only displays these — it
never executes them.**

### `POST /api/schedule` — register / update

```json
// request (pass a stable `id` to update in place on re-register)
{ "id": "daily-vol", "name": "Daily Vol", "owner": "openclaw",
  "cron": "0 6 * * *", "next_run": "2026-06-18T06:00:00Z",
  "description": "Swaps to the big model, generates the vol report" }

// response: the stored entry (generates an id if you omit one)
```

`name` is required; `id` (if given) may contain only characters from `[A-Za-z0-9_.-]` and is at most 64 characters long.

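A scheduler can check the `id` rule locally before registering. This minimal sketch reads the rule as "1 to 64 characters drawn from that set". The function name and the pass-through keyword fields are illustrative:

```python
import re
from typing import Optional

# Assumed reading of the rule: 1 to 64 characters, each from [A-Za-z0-9_.-].
_SCHEDULE_ID = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def schedule_entry(name: str, schedule_id: Optional[str] = None, **fields: str) -> dict:
    """Build a POST /api/schedule body, checking `id` before the server does."""
    if not name:
        raise ValueError("name is required")
    body = {"name": name, **fields}
    if schedule_id is not None:
        # fullmatch rather than a $-anchored match: $ also matches before a trailing newline
        if _SCHEDULE_ID.fullmatch(schedule_id) is None:
            raise ValueError(f"invalid schedule id: {schedule_id!r}")
        body["id"] = schedule_id
    return body
```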
### `GET /api/schedule` — list

```json
{ "schedules": [ { "id": "daily-vol", "name": "Daily Vol", "owner": "openclaw",
    "cron": "0 6 * * *", "next_run": "…", "description": "…",
    "registered_at": "…", "updated_at": "…" } ] }
```

### `DELETE /api/schedule/{id}` — deregister

```json
{ "deleted": true }
```

The registry is in-memory — re-register your schedules on your own startup so
they survive a Spark Control restart.

@@ -1,35 +0,0 @@
---
paths:
- "image/app/audio_proxy.py"
- "image/app/speech_models.py"
- "image/app/deep_health.py"
- "image/parakeet_patches/**"
- "scripts/test-audio-with-speakers.sh"
- "docs/AUDIO_API.md"
---

# Audio / speech stack (Parakeet STT + Sortformer diarizer + Kokoro TTS on Spark 2)

## Changing the parakeet-asr container

- `image/parakeet_patches/` (`main.py`, `diarizer.py`) is an overlay copied into the `parakeet-asr` container by the "Reapply speech-model patches" dashboard action (`image/app/speech_models.py`). This is the **only** durable way to change that container — `docker exec` / pip changes inside it die on `docker rm`.
- **Never install `cuda-python` in parakeet-asr** to "fix" the startup warning about CUDA graphs being disabled. The warning is harmless; enabling the graph path crashes real decode with illegal memory access on this GPU/CUDA-13 stack (GB10/sm_121). The slow path served 11k+ requests with zero failures — leave it alone.
- Pin/constrain torch versions when pip-installing anything into NGC-based containers on the Sparks (ABI breaks otherwise); expect ARM64 wheel gaps and source builds (`--no-build-isolation` for torchaudio). Applies to `spark_embed` too.

## Testing audio endpoints

- Test with **real speech** (e.g. `say -o /tmp/t.wav --data-format=LEI16@16000 "<a couple of sentences>"`), not tones/silence — zero-token audio skips the decoder paths where crashes live.
- Send audio requests to Spark 2 **sequentially** in tests/scripts. Parallel audio requests can race (cuFFT → 503), and the single GPU serializes them anyway.
- End-to-end suite (hits the LIVE cluster):

```bash
./scripts/test-audio-with-speakers.sh <audio-file>   # from repo root
```

`SPARK_CONTROL` defaults to `http://127.0.0.1:9999` (a running local dev server); point it at the installed package URL otherwise.

## API quirk

Spark Control's `/v1/models` lists *audio* models (STT model + Kokoro voices) by design — **not** the loaded LLM. Discover the LLM via `/api/status` (`vllm.current_model`).

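Because `/v1/models` doesn't name the LLM, a client has to read it from `/api/status`. This minimal sketch assumes the response nests the model id as `{"vllm": {"current_model": ...}}`, as the dotted path above suggests. The helper names are hypothetical. `insecure=True` stands in for curl's `-k` against Start9's self-signed CA:

```python
import json
import ssl
import urllib.request
from typing import Optional


def current_llm(status: dict) -> Optional[str]:
    """Pull vllm.current_model out of a parsed /api/status body (None if absent)."""
    vllm = status.get("vllm")
    return vllm.get("current_model") if isinstance(vllm, dict) else None


def fetch_current_llm(base_url: str, insecure: bool = False) -> Optional[str]:
    """GET {base_url}/api/status and return the loaded LLM, if any."""
    ctx = ssl.create_default_context()
    if insecure:  # self-signed Start9 CA on the LAN
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=10, context=ctx) as resp:
        return current_llm(json.load(resp))
```

For a local dev server this would be `fetch_current_llm("http://127.0.0.1:9999")`. The SSL context is simply unused for plain HTTP.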
Diarizer caps at 4 speakers (Sortformer `diar_sortformer_4spk-v1`).

@@ -1,45 +0,0 @@
---
paths:
- "image/**"
---

# FastAPI image (`image/`)

Standalone FastAPI app (Python ≥3.11; ships on `python:3.12-slim`; UI on port 9999; vanilla HTML/CSS/JS, no framework). Python has no configured linter/formatter — match the style of the file you're editing.

## Local dev (no StartOS)

```bash
cd image
python3 -m venv .venv                  # one-time
source .venv/bin/activate              # in every new shell
pip install -e .
export SPARK1_HOST=<ip> SPARK1_USER=<user> SPARK2_HOST=<ip> SPARK2_USER=<user> SSH_KEY_PATH=<private-key>
# Required outside the container — these default to paths under /data, which only exists in the image
# (missing REDACTION_MAP_DB crashes startup; missing CONNECTIVITY_LOG 500s /api/status):
export REDACTION_MAP_DB=/tmp/redaction_maps.db CONNECTIVITY_LOG=/tmp/connectivity.json
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload
```

Other env vars: `BIND_PORT`, `MODELS_YAML`, `SSH_DIR`, `SSH_KNOWN_HOSTS`, `MODELS_OVERRIDES`, `SERVICES_OVERRIDES`.

## Tests

Two kinds, both run with the `image/.venv` interpreter (system python3 has no deps):

- **pytest unit suite** — offline, pure functions, no cluster. `.venv/bin/python -m pytest` from `image/`. Lives in `image/tests/`; currently covers `build_launch_command` (incl. the shell-injection / `shlex` round-trip invariant) and the transcript↔diarizer label-merge (`_merge_words_with_speakers`). Install the test dep once with `pip install -e '.[dev]'`. Add new pure-function coverage here.
- **Standalone scripts** — the redaction suites and the live-cluster audio e2e are run directly (not via pytest). See the redaction and audio rules.

## Conventions

- Pydantic request models go at **module scope**, never inside a `build_router()` body (FastAPI silently 422s otherwise).
- New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
- **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
- **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).

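The two-layer SSH-input rule (whitelist at the boundary, quote at the sink) and the `shlex.split` round-trip invariant fit in a few lines. This is a hypothetical illustration, not the code in `app/shellsafe.py`. The repo-id pattern and both function names are assumptions made for the example:

```python
import re
import shlex
from typing import List

# Hypothetical whitelist for a Hugging Face-style repo id, e.g. "RedHatAI/gemma-4-31B-it-NVFP4".
_REPO_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_repo_id(value: str) -> str:
    """Boundary layer: reject anything outside the whitelist before it nears a shell."""
    if not _REPO_ID.fullmatch(value):
        raise ValueError(f"invalid repo id: {value!r}")
    return value


def build_remote_command(repo_id: str, extra_args: List[str]) -> str:
    """Sink layer: quote every argument so the remote shell sees each as one literal word."""
    argv = ["vllm", "serve", validate_repo_id(repo_id), *extra_args]
    return " ".join(shlex.quote(arg) for arg in argv)
```

The round-trip property is what a pre-flight parser depends on. `shlex.split(build_remote_command(...))` gives back exactly the original argument list, including a hostile value like `"x; rm -rf ~"`, which stays one inert argument.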
## Layout

- `image/app/server.py` — FastAPI entry; routers live in sibling modules (`audio_proxy.py`, `llm_proxy.py`, `embeddings_proxy.py`, `redaction_gateway.py`, `swap.py`, `health.py`, `deep_health.py`, `connectivity.py`, …).
- `image/app/discovery.py` — the disk-driven model menu. `/api/models` lists what's actually downloaded on the Sparks (via `disk.list_cached_models`); `models.yaml`/overrides are *launch recipes* matched by repo, not the menu. An on-disk model with no recipe is `needs_setup` → `infer_recipe` reads its `config.json` to prefill a setup form the operator confirms once.
- `image/app/static/` — the dashboard UI.
- `image/models.yaml` — bundled vLLM **launch recipes** (how to launch a known model), NOT the dashboard menu — the menu is the on-disk scan.
- `image/spark_embed/` — Dockerfile + app for the embeddings container; built ON a Spark (ARM64, NGC PyTorch base — see the audio/cluster rule for NGC torch-pinning caveats).

@@ -1,23 +0,0 @@
---
paths:
- "image/app/redaction/**"
- "image/app/redaction_gateway.py"
- "docs/REDACTION_GATEWAY.md"
---

# Redaction (`/scrub` + `/rehydrate`)

- `image/app/redaction/scrub.py` + `test_scrub_leak.py` are vendored **byte-for-byte** from the CRM repo (sha recorded in `redaction/__init__.py`). **Never edit them here** — change them in the CRM repo, re-vendor (`cp`), update the sha, re-run the leak test.
- The gateway around the vendored scrubber is `image/app/redaction_gateway.py`. Its token-map store lives on `/data` (`REDACTION_MAP_DB`, default `/data/redaction_maps.db`) and fails closed if it can't open — set the env var when running outside the container.

## Test suites — both must pass before shipping ANY redaction change

```bash
cd image
.venv/bin/python -m app.redaction.test_gateway      # /scrub + /rehydrate acceptance; offline, no cluster needed
.venv/bin/python app/redaction/test_scrub_leak.py   # vendored golden-file leak test; offline
```

Keep the leak test green against the vendored `scrub.py` after any re-vendor.

Policy context: scrubbed text via `/scrub` is the **only** sanctioned path toward frontier/cloud models — see the whole-repo privacy rule in AGENTS.md.

@@ -1,47 +0,0 @@
---
paths:
- "package/**"
---

# StartOS package (`package/`)

TypeScript wrapper that ships the Docker image as an s9pk. `@start9labs/start-sdk` pinned `1.3.3`, Node ≥22, bundled by `@vercel/ncc`.

## Commands

```bash
cd package
npm i              # one-time
make x86           # typecheck + ncc bundle + docker build + pack → spark-control_x86_64.s9pk
make install       # sideload to the Start9 server; needs "host: http(s)://<server>.local" in ~/.startos/config.yaml
npm run check      # tsc --noEmit — run after any startos/ edit; make x86 also runs it
npm run prettier   # prettier --write startos (no semicolons, single quotes, trailing commas)
```

`make aarch64` for ARM Start9 servers. `make install` picks the newest `*.s9pk` in `package/` and restarts the live spark-control service — get a go/no-go first.

## Versioning & release notes

- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).

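Git refs can't contain ':', so a release tag is the version with its `:N` revision dropped. Here is a hypothetical helper that derives the tag from the version string, assuming every version follows the `X.Y.Z:N` format above:

```python
import re

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+):(\d+)")


def git_tag_for(version: str) -> str:
    """Map a package version such as "0.25.0:1" to the git tag "v0.25.0"."""
    m = _VERSION.fullmatch(version)
    if m is None:
        raise ValueError(f"expected X.Y.Z:N, got {version!r}")
    major, minor, patch, _revision = m.groups()  # the revision has no place in the tag
    return f"v{major}.{minor}.{patch}"
```

One consequence: two revisions of the same `X.Y.Z` (say `:1` and `:2`) map to the same tag, so this sketch cannot tell them apart.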
## Releasing to Gitea

The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:

```bash
# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
cd package && make x86                     # build
# 2. commit + push the source change
git tag vX.Y.Z && git push gitea vX.Y.Z    # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
make install                               # optional: sideload to your own server (restarts it — go/no-go)
# 3. publish the s9pk as a release asset (needs a write-scoped token):
GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
```

`make release` → `scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note that Gitea builds `browser_download_url` from its configured ROOT_URL (which may be a `.local` name), so an off-LAN adopter has to pull via whatever address actually reaches the Gitea.

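The adopter's side of this can be sketched with the standard library. The sketch assumes Gitea's release JSON lists files under `assets`, each with `name` and `browser_download_url`. It sends the read-only token as `Authorization: token <token>`. The function names are hypothetical, and the `suffix` parameter lets a caller pick one architecture's file if several builds are attached:

```python
import json
import urllib.request
from typing import Optional


def s9pk_asset_url(release: dict, suffix: str = ".s9pk") -> Optional[str]:
    """Return the download URL of the first asset whose name ends with `suffix`."""
    for asset in release.get("assets") or []:
        if asset.get("name", "").endswith(suffix):
            return asset.get("browser_download_url")
    return None


def latest_s9pk_url(gitea_url: str, owner: str, token: str, suffix: str = ".s9pk") -> Optional[str]:
    """Ask Gitea for the latest release of <owner>/spark-control using a read-only token."""
    req = urllib.request.Request(
        f"{gitea_url}/api/v1/repos/{owner}/spark-control/releases/latest",
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return s9pk_asset_url(json.load(resp), suffix)
```

Given the ROOT_URL caveat above, an off-LAN adopter may need to rewrite the host of the returned URL before downloading.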
## Layout

- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
- The "Reapply speech-model patches" action is **not** a StartOS action — it's a dashboard action implemented in `image/app/speech_models.py`.

@@ -41,7 +41,7 @@ from .config import Settings
 logger = logging.getLogger("spark-control.audio")
 
 
-# Kokoro default voice. The four curated voices below were Alice-tested for
+# Kokoro default voice. The four curated voices below were Grant-tested for
 # narration/recap-style content; bm_george is the default. Clients can pass
 # any of Kokoro's 67 voices in the `voice` field — see /v1/models.
 DEFAULT_VOICE = "bm_george"
@@ -100,7 +100,7 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
 "kind": "stt",
 },
 ]
-# Curated first — these are the four Alice chose for narration/recap.
+# Curated first — these are the four Grant chose for narration/recap.
 seen = set()
 for v in CURATED_VOICES:
 data.append({

@@ -1,54 +1,13 @@
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
import logging
|
|
||||||
import os
|
import os
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from .shellsafe import validate_container
|
|
||||||
|
|
||||||
log = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
def _env(name: str, default: str = "") -> str:
|
def _env(name: str, default: str = "") -> str:
|
||||||
return os.environ.get(name, default)
|
return os.environ.get(name, default)
|
||||||
|
|
||||||
|
|
||||||
def _env_container(name: str, default: str) -> str:
|
|
||||||
"""Resolve a container-name env var, validating it at the config boundary.
|
|
||||||
|
|
||||||
The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
|
|
||||||
the sink — but per the repo's two-layer convention it's also whitelist-checked
|
|
||||||
here. A malformed optional value falls back to `default` rather than crashing
|
|
||||||
daemon startup (mirrors `_env_int` for VLLM_PORT)."""
|
|
||||||
val = os.environ.get(name, "") or default
|
|
||||||
try:
|
|
||||||
return validate_container(val)
|
|
||||||
except ValueError:
|
|
||||||
log.warning("ignoring invalid %s=%r; using %r", name, val, default)
|
|
||||||
return default
|
|
||||||
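`_env_container` validates a container name once at the config boundary and falls back to the default instead of crashing startup. The sketch below shows that two-layer idea in isolation. The `validate_container` body is an assumption: it applies Docker's documented name rule (`[a-zA-Z0-9][a-zA-Z0-9_.-]+`), and the real `shellsafe.validate_container` may be stricter.

```python
import logging
import os
import re

log = logging.getLogger(__name__)

# Assumption: Docker's container-name rule; the real shellsafe helper may differ.
_CONTAINER_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def validate_container(name: str) -> str:
    # fullmatch (not match + "$") so a trailing newline is rejected too.
    if not _CONTAINER_RE.fullmatch(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name


def env_container(name: str, default: str) -> str:
    """Read a container-name env var; warn and fall back on a malformed value."""
    val = os.environ.get(name, "") or default
    try:
        return validate_container(val)
    except ValueError:
        log.warning("ignoring invalid %s=%r; using %r", name, val, default)
        return default
```

A value such as `vllm_node; rm -rf /` never reaches the SSH command line; the daemon logs a warning and uses `vllm_node`.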
|
|
||||||
|
|
||||||
def _env_set(name: str) -> frozenset[str]:
|
|
||||||
"""Parse a comma-separated env var into a lowercased frozenset of keys.
|
|
||||||
|
|
||||||
Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
|
|
||||||
support service can switch its tile + probes off entirely (rather than have
|
|
||||||
the probe hit whatever else listens on that port — e.g. a vLLM sharing
|
|
||||||
Parakeet's default 8000)."""
|
|
||||||
raw = os.environ.get(name, "")
|
|
||||||
return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
|
|
||||||
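`_env_set` turns `DISABLED_SERVICES` into a normalized frozenset. Here is a small sketch of how that set can filter the built-in services before any tile or probe is created. The `DEFAULT_SERVICE_PORTS` map and `enabled_services` helper are hypothetical. The ports come from the `Settings` defaults in this diff, and the key names come from the `DISABLED_SERVICES` comment.

```python
import os

# Hypothetical map: keys from the DISABLED_SERVICES comment, ports from the defaults.
DEFAULT_SERVICE_PORTS = {"parakeet": 8000, "kokoro": 8880, "embeddings": 8088, "qdrant": 6333}


def env_set(name: str) -> frozenset:
    """Comma-separated env var -> lowercased frozenset, ignoring blanks."""
    raw = os.environ.get(name, "")
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def enabled_services(ports: dict, disabled: frozenset) -> dict:
    # Disabled keys are dropped entirely, so nothing probes their port.
    return {k: p for k, p in ports.items() if k not in disabled}
```

Because entries are stripped and lowercased, `" Parakeet, ,QDRANT "` disables the same services as `parakeet,qdrant`.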
|
|
||||||
|
|
||||||
def _env_int(name: str, default: int) -> int:
|
|
||||||
"""Parse an int env var, falling back to `default` when unset, blank, or
|
|
||||||
malformed. The StartOS Configure panel passes optional numeric fields as an
|
|
||||||
empty string when left blank, so a bare int("") would crash daemon startup."""
|
|
||||||
try:
|
|
||||||
return int(os.environ.get(name, "") or default)
|
|
||||||
except (TypeError, ValueError):
|
|
||||||
return default
|
|
||||||
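The `_env_int` docstring explains why the right-hand `int(_env("VLLM_PORT", "8888"))` form is fragile. `_env` returns the variable even when it is set to an empty string, and `int("")` raises. The sketch below reproduces both behaviours so the difference is easy to test.

```python
import os


def env_int(name: str, default: int) -> int:
    """Int env var; blank, unset, or malformed values fall back to `default`."""
    try:
        return int(os.environ.get(name, "") or default)
    except (TypeError, ValueError):
        return default


def env_int_naive(name: str, default: str) -> int:
    # The pattern on the right-hand side of the diff: crashes on a blank value.
    return int(os.environ.get(name, default))
```

A blank field from the StartOS Configure panel arrives as `""`. `env_int` returns the default for it, while `env_int_naive` raises `ValueError` at startup.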
|
|
||||||
|
|
||||||
def _resolve_models_yaml() -> str:
|
def _resolve_models_yaml() -> str:
|
||||||
if env := os.environ.get("MODELS_YAML"):
|
if env := os.environ.get("MODELS_YAML"):
|
||||||
return env
|
return env
|
||||||
@@ -83,19 +42,12 @@ class Settings:
|
|||||||
qdrant_user: str
|
qdrant_user: str
|
||||||
qdrant_container: str
|
qdrant_container: str
|
||||||
qdrant_collection: str
|
qdrant_collection: str
|
||||||
matrix_bridge_host: str
|
|
||||||
matrix_bridge_user: str
|
|
||||||
matrix_bridge_container: str
|
|
||||||
matrix_bridge_dir: str
|
|
||||||
matrix_bridge_branch: str
|
|
||||||
redaction_map_db: str
|
redaction_map_db: str
|
||||||
redaction_map_ttl: int
|
redaction_map_ttl: int
|
||||||
ssh_key_path: str
|
ssh_key_path: str
|
||||||
ssh_known_hosts: str
|
ssh_known_hosts: str
|
||||||
models_yaml: str
|
models_yaml: str
|
||||||
vllm_port: int
|
vllm_port: int
|
||||||
vllm_container: str
|
|
||||||
disabled_services: frozenset[str]
|
|
||||||
parakeet_port: int
|
parakeet_port: int
|
||||||
kokoro_port: int
|
kokoro_port: int
|
||||||
embed_port: int
|
embed_port: int
|
||||||
@@ -103,8 +55,6 @@ class Settings:
|
|||||||
bind_port: int
|
bind_port: int
|
||||||
open_webui_url: str
|
open_webui_url: str
|
||||||
ngc_api_key: str
|
ngc_api_key: str
|
||||||
swap_webhook_url: str
|
|
||||||
swap_webhook_secret: str
|
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def from_env(cls) -> "Settings":
|
def from_env(cls) -> "Settings":
|
||||||
@@ -131,47 +81,20 @@ class Settings:
|
|||||||
qdrant_user=_env("QDRANT_USER") or spark2_user,
|
qdrant_user=_env("QDRANT_USER") or spark2_user,
|
||||||
qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
|
qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
|
||||||
qdrant_collection=_env("QDRANT_COLLECTION", ""),
|
qdrant_collection=_env("QDRANT_COLLECTION", ""),
|
||||||
# matrix-bridge bot container, driven as its own SSH user (the owner
|
|
||||||
# of the ~/matrix-bridge git clone) so git/docker run unprivileged.
|
|
||||||
# The user is BLANK by default and set via the "Configure Sparks"
|
|
||||||
# action; leaving it blank reports the service as unconfigured, which
|
|
||||||
# hides the tile. That keeps the shared package portable — a
|
|
||||||
# deployment without the bot never shows a stray tile or a hardcoded
|
|
||||||
# username. Host defaults to Spark 2 (same box); container/dir/branch
|
|
||||||
# are sensible defaults. All are env-overridable.
|
|
||||||
matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
|
|
||||||
matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
|
|
||||||
matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
|
|
||||||
matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
|
|
||||||
matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
|
|
||||||
# Redaction gateway pseudonym-map store (server-held de-anon key).
|
# Redaction gateway pseudonym-map store (server-held de-anon key).
|
||||||
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
|
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
|
||||||
redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
|
redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")),
|
||||||
ssh_key_path=_env("SSH_KEY_PATH"),
|
ssh_key_path=_env("SSH_KEY_PATH"),
|
||||||
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
|
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
|
||||||
models_yaml=_resolve_models_yaml(),
|
models_yaml=_resolve_models_yaml(),
|
||||||
vllm_port=_env_int("VLLM_PORT", 8888),
|
vllm_port=int(_env("VLLM_PORT", "8888")),
|
||||||
# Container name for the swappable vLLM on Spark 1. Defaults to the
|
parakeet_port=int(_env("PARAKEET_PORT", "8000")),
|
||||||
# bundled launch-cluster.sh container; override if you named yours
|
kokoro_port=int(_env("KOKORO_PORT", "8880")),
|
||||||
# something else (the swap log-tail and pre-flight validator exec
|
embed_port=int(_env("EMBED_PORT", "8088")),
|
||||||
# into it by name).
|
qdrant_port=int(_env("QDRANT_PORT", "6333")),
|
||||||
vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
|
bind_port=int(_env("BIND_PORT", "9999")),
|
||||||
# Built-in support-service keys (parakeet, kokoro, embeddings,
|
|
||||||
# qdrant) the deployment doesn't run — hidden from the dashboard and
|
|
||||||
# never probed.
|
|
||||||
disabled_services=_env_set("DISABLED_SERVICES"),
|
|
||||||
parakeet_port=_env_int("PARAKEET_PORT", 8000),
|
|
||||||
kokoro_port=_env_int("KOKORO_PORT", 8880),
|
|
||||||
embed_port=_env_int("EMBED_PORT", 8088),
|
|
||||||
qdrant_port=_env_int("QDRANT_PORT", 6333),
|
|
||||||
bind_port=_env_int("BIND_PORT", 9999),
|
|
||||||
open_webui_url=_env("OPEN_WEBUI_URL", ""),
|
open_webui_url=_env("OPEN_WEBUI_URL", ""),
|
||||||
ngc_api_key=_env("NGC_API_KEY", ""),
|
ngc_api_key=_env("NGC_API_KEY", ""),
|
||||||
# Coordination layer: fire a swap-lifecycle webhook to this URL so
|
|
||||||
# downstream consumers re-point their model config on a swap. Blank
|
|
||||||
# ⇒ disabled. The optional secret HMAC-signs the body (X-Spark-Signature).
|
|
||||||
swap_webhook_url=_env("SWAP_WEBHOOK_URL", ""),
|
|
||||||
swap_webhook_secret=_env("SWAP_WEBHOOK_SECRET", ""),
|
|
||||||
)
|
)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
|
|||||||
@@ -1,342 +0,0 @@
|
|||||||
"""Cluster-coordination layer: the GPU swap lock, swap-event webhook, and the
|
|
||||||
read-only schedule registry.
|
|
||||||
|
|
||||||
Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring
|
|
||||||
business pipelines live in separate services that *call* the swap API. These
|
|
||||||
three primitives add the *safety* layer around that:
|
|
||||||
|
|
||||||
- **Swap lock** — a TTL-bounded reservation of the swap path. An external
|
|
||||||
scheduler acquires it before swapping; while held by someone else the
|
|
||||||
dashboard's manual swap is refused (enforced in the swap endpoint, not
|
|
||||||
advisory). Holder name is descriptive; the returned token is the secret that
|
|
||||||
authorises a swap or a release.
|
|
||||||
- **Webhook** — fires `swap_complete` / `swap_failed` to a configurable URL so
|
|
||||||
downstream consumers re-point their provider config when the running model
|
|
||||||
changes. Optionally HMAC-signed.
|
|
||||||
- **Schedule registry** — a read-only view the dashboard surfaces, *registered
|
|
||||||
by* external schedulers. Spark Control stores what it's told; it does not own
|
|
||||||
or execute any schedule.
|
|
||||||
|
|
||||||
All state is in-memory (mirroring the swap/download/NIM job managers). On a
|
|
||||||
restart the lock resets to *unlocked* — the available-by-default failure mode;
|
|
||||||
the swap manager's own in-progress guard still prevents two swaps at once —
|
|
||||||
and schedulers re-register their schedules.
|
|
||||||
"""
|
|
||||||
from __future__ import annotations
|
|
||||||
import hashlib
|
|
||||||
import hmac
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
import re
|
|
||||||
import uuid
|
|
||||||
from dataclasses import dataclass
|
|
||||||
from datetime import datetime, timedelta, timezone
|
|
||||||
from typing import Optional
|
|
||||||
|
|
||||||
import httpx
|
|
||||||
|
|
||||||
log = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
# A lock reserves the GPU for a window; clamp the TTL so a buggy client can
|
|
||||||
# neither pin the cluster forever nor take a zero-length (useless) lock.
|
|
||||||
LOCK_TTL_MIN = 1
|
|
||||||
LOCK_TTL_MAX = 86_400 # 24h
|
|
||||||
LOCK_TTL_DEFAULT = 900 # 15 min
|
|
||||||
|
|
||||||
# Schedule ids are reflected to the dashboard and used as a URL path segment on
|
|
||||||
# delete, so a caller-supplied id is whitelist-checked. Generated ids are hex.
|
|
||||||
_SCHEDULE_ID_RE = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")
|
|
||||||
|
|
||||||
|
|
||||||
def valid_schedule_id(value: str) -> bool:
|
|
||||||
"""Whitelist check for a caller-supplied schedule id (register and delete)."""
|
|
||||||
return bool(_SCHEDULE_ID_RE.match(value or ""))
|
|
||||||
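The schedule-id whitelist is worth testing directly, because the id ends up as a URL path segment. One hedged observation: in Python regexes, `$` also matches just before a trailing newline, so `re.match` with `^...$` accepts `"abc\n"`. The pattern also accepts `".."`. The `strict_schedule_id` variant below is a suggestion, not code from this diff.

```python
import re

# Pattern as it appears in the diff.
_SCHEDULE_ID_RE = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")


def valid_schedule_id(value: str) -> bool:
    return bool(_SCHEDULE_ID_RE.match(value or ""))


# Suggested tightening: fullmatch rejects a trailing newline, and the
# dot-only ids "." / ".." are refused because they are special path segments.
_STRICT_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def strict_schedule_id(value: str) -> bool:
    v = value or ""
    return bool(_STRICT_RE.fullmatch(v)) and v not in {".", ".."}
```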
|
|
||||||
|
|
||||||
def _now() -> datetime:
|
|
||||||
return datetime.now(timezone.utc)
|
|
||||||
|
|
||||||
|
|
||||||
def _iso(dt: datetime) -> str:
|
|
||||||
return dt.isoformat()
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------- swap lock ----
|
|
||||||
|
|
||||||
class LockHeld(Exception):
|
|
||||||
"""The lock is held by a different holder. Carries the public lock state so
|
|
||||||
the endpoint can return holder + expiry in the 409 body."""
|
|
||||||
|
|
||||||
def __init__(self, state: dict) -> None:
|
|
||||||
self.state = state
|
|
||||||
super().__init__("swap lock is held by another holder")
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class LockState:
|
|
||||||
holder: str
|
|
||||||
token: str
|
|
||||||
acquired_at: datetime
|
|
||||||
expires_at: datetime
|
|
||||||
note: str = ""
|
|
||||||
|
|
||||||
def public(self, now: datetime) -> dict:
|
|
||||||
"""Token-free view safe to expose on GET / in error bodies."""
|
|
||||||
return {
|
|
||||||
"held": True,
|
|
||||||
"holder": self.holder,
|
|
||||||
"acquired_at": _iso(self.acquired_at),
|
|
||||||
"expires_at": _iso(self.expires_at),
|
|
||||||
"seconds_remaining": max(0, int((self.expires_at - now).total_seconds())),
|
|
||||||
"note": self.note,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class SwapLockManager:
|
|
||||||
"""In-memory, TTL-bounded reservation of the GPU swap path.
|
|
||||||
|
|
||||||
`now` is injectable on every method purely so the expiry logic is testable
|
|
||||||
without sleeping; production calls omit it and get wall-clock UTC.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self) -> None:
|
|
||||||
self._lock: Optional[LockState] = None
|
|
||||||
|
|
||||||
def _active(self, now: Optional[datetime] = None) -> Optional[LockState]:
|
|
||||||
"""The current lock if one is held and unexpired; lazily clears an
|
|
||||||
expired lock so it never lingers."""
|
|
||||||
now = now or _now()
|
|
||||||
if self._lock is not None and self._lock.expires_at <= now:
|
|
||||||
self._lock = None
|
|
||||||
return self._lock
|
|
||||||
|
|
||||||
def status(self, now: Optional[datetime] = None) -> dict:
|
|
||||||
now = now or _now()
|
|
||||||
active = self._active(now)
|
|
||||||
return active.public(now) if active else {"held": False}
|
|
||||||
|
|
||||||
def acquire(
|
|
||||||
self,
|
|
||||||
holder: str,
|
|
||||||
ttl_seconds: Optional[int] = None,
|
|
||||||
note: str = "",
|
|
||||||
token: Optional[str] = None,
|
|
||||||
*,
|
|
||||||
now: Optional[datetime] = None,
|
|
||||||
) -> LockState:
|
|
||||||
"""Acquire a free lock (new token), or extend one already held by
|
|
||||||
presenting its token. A request without the token is refused even if the
|
|
||||||
holder name matches — the name is descriptive, the token is the secret.
|
|
||||||
"""
|
|
||||||
now = now or _now()
|
|
||||||
holder = (holder or "").strip()
|
|
||||||
if not holder:
|
|
||||||
raise ValueError("holder is required")
|
|
||||||
ttl = ttl_seconds if ttl_seconds is not None else LOCK_TTL_DEFAULT
|
|
||||||
try:
|
|
||||||
ttl = int(ttl)
|
|
||||||
except (TypeError, ValueError):
|
|
||||||
ttl = LOCK_TTL_DEFAULT
|
|
||||||
ttl = max(LOCK_TTL_MIN, min(LOCK_TTL_MAX, ttl))
|
|
||||||
|
|
||||||
active = self._active(now)
|
|
||||||
if active is not None:
|
|
||||||
# Held — only the token-holder may extend/re-acquire.
|
|
||||||
if not (token and hmac.compare_digest(active.token, token)):
|
|
||||||
raise LockHeld(active.public(now))
|
|
||||||
self._lock = LockState(
|
|
||||||
holder=holder or active.holder,
|
|
||||||
token=active.token,
|
|
||||||
acquired_at=active.acquired_at,
|
|
||||||
expires_at=now + timedelta(seconds=ttl),
|
|
||||||
note=note or active.note,
|
|
||||||
)
|
|
||||||
return self._lock
|
|
||||||
|
|
||||||
self._lock = LockState(
|
|
||||||
holder=holder,
|
|
||||||
token=uuid.uuid4().hex,
|
|
||||||
acquired_at=now,
|
|
||||||
expires_at=now + timedelta(seconds=ttl),
|
|
||||||
note=note,
|
|
||||||
)
|
|
||||||
return self._lock
|
|
||||||
|
|
||||||
def verify(self, token: Optional[str], now: Optional[datetime] = None) -> bool:
|
|
||||||
"""True iff `token` matches the currently-active lock."""
|
|
||||||
active = self._active(now)
|
|
||||||
return bool(active and token and hmac.compare_digest(active.token, token))
|
|
||||||
|
|
||||||
def is_blocked_by(self, token: Optional[str], now: Optional[datetime] = None) -> Optional[dict]:
|
|
||||||
"""Single-read swap gate. Returns the public lock state if an active
|
|
||||||
lock blocks a swap carrying this token, else None. Does exactly one
|
|
||||||
`_active()` read so the decision can't straddle a TTL expiry the way a
|
|
||||||
separate status()+verify() pair could (which, at the expiry tick, would
|
|
||||||
spuriously refuse a swap that should now be allowed)."""
|
|
||||||
now = now or _now()
|
|
||||||
active = self._active(now)
|
|
||||||
if active is None:
|
|
||||||
return None
|
|
||||||
if token and hmac.compare_digest(active.token, token):
|
|
||||||
return None
|
|
||||||
return active.public(now)
|
|
||||||
|
|
||||||
def release(
|
|
||||||
self,
|
|
||||||
token: Optional[str] = None,
|
|
||||||
*,
|
|
||||||
force: bool = False,
|
|
||||||
now: Optional[datetime] = None,
|
|
||||||
) -> bool:
|
|
||||||
"""Release the lock. Returns False if nothing was held. Requires the
|
|
||||||
matching token unless `force` (the human override from the dashboard)."""
|
|
||||||
active = self._active(now)
|
|
||||||
if active is None:
|
|
||||||
return False
|
|
||||||
if not force and not self.verify(token, now):
|
|
||||||
raise PermissionError("token does not hold the lock")
|
|
||||||
self._lock = None
|
|
||||||
return True
|
|
||||||
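`SwapLockManager` has three core rules: expiry is lazy, the token rather than the holder name authorises an extension, and the swap gate reads the lock state exactly once. The condensed standalone sketch below (`MiniSwapLock` is a hypothetical name) shows those rules with an injected clock, so expiry can be tested without sleeping. It drops the note, `public()` view, and release path of the real class.

```python
import hmac
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

TTL_MIN, TTL_MAX = 1, 86_400


@dataclass
class Lock:
    holder: str
    token: str
    expires_at: datetime


class MiniSwapLock:
    """Condensed TTL lock: injectable clock; the token is the secret."""

    def __init__(self) -> None:
        self._lock: Optional[Lock] = None

    def _active(self, now: datetime) -> Optional[Lock]:
        if self._lock is not None and self._lock.expires_at <= now:
            self._lock = None  # lazy expiry: cleared on the next read
        return self._lock

    def acquire(self, holder: str, ttl: int, now: datetime, token: Optional[str] = None) -> Lock:
        ttl = max(TTL_MIN, min(TTL_MAX, int(ttl)))  # clamp like LOCK_TTL_MIN/MAX
        active = self._active(now)
        if active is not None:
            # A matching holder name is not enough; only the token extends.
            if not (token and hmac.compare_digest(active.token, token)):
                raise RuntimeError(f"held by {active.holder}")
            active.expires_at = now + timedelta(seconds=ttl)
            return active
        self._lock = Lock(holder, uuid.uuid4().hex, now + timedelta(seconds=ttl))
        return self._lock

    def is_blocked_by(self, token: Optional[str], now: datetime) -> Optional[dict]:
        active = self._active(now)  # one read, so the check can't straddle expiry
        if active is None or (token and hmac.compare_digest(active.token, token)):
            return None
        return {"held": True, "holder": active.holder}
```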
|
|
||||||
|
|
||||||
# ----------------------------------------------------------------- webhook ----
|
|
||||||
|
|
||||||
def build_webhook_payload(
|
|
||||||
*,
|
|
||||||
event: str,
|
|
||||||
job_id: str,
|
|
||||||
model_key: str,
|
|
||||||
state: str,
|
|
||||||
returncode: Optional[int],
|
|
||||||
started_at: Optional[str],
|
|
||||||
finished_at: Optional[str],
|
|
||||||
dry_run: bool,
|
|
||||||
) -> dict:
|
|
||||||
return {
|
|
||||||
"event": event, # swap_complete | swap_failed
|
|
||||||
"job_id": job_id,
|
|
||||||
"model_key": model_key,
|
|
||||||
"state": state,
|
|
||||||
"returncode": returncode,
|
|
||||||
"started_at": started_at,
|
|
||||||
"finished_at": finished_at,
|
|
||||||
"dry_run": dry_run,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def sign_payload(secret: str, body: bytes) -> str:
|
|
||||||
"""`X-Spark-Signature` value: sha256 HMAC of the exact JSON body the
|
|
||||||
consumer receives, so they can recompute and trust it."""
|
|
||||||
return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
|
|
||||||
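`sign_payload` produces the `X-Spark-Signature` value. A webhook consumer verifies it by recomputing the HMAC over the exact raw bytes it received, not over re-serialized JSON, and comparing in constant time. The sketch below pairs the signer from the diff with a consumer-side `verify_signature`, which is a hypothetical helper.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Consumer side: recompute over the raw request body, constant-time compare."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, header or "")
```

Changing a single byte of the body, or using the wrong secret, makes verification fail. For that reason the consumer should read the raw request body before parsing it.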
|
|
||||||
|
|
||||||
class WebhookNotifier:
|
|
||||||
"""Fire-and-forget POST of swap-lifecycle events. A webhook failure is
|
|
||||||
logged and swallowed — it must never affect the swap outcome."""
|
|
||||||
|
|
||||||
def __init__(self, url: str, secret: str = "", timeout: float = 5.0) -> None:
|
|
||||||
self.url = (url or "").strip()
|
|
||||||
self.secret = secret or ""
|
|
||||||
self.timeout = timeout
|
|
||||||
|
|
||||||
@property
|
|
||||||
def enabled(self) -> bool:
|
|
||||||
return bool(self.url)
|
|
||||||
|
|
||||||
async def fire(self, event: str, payload: dict) -> None:
|
|
||||||
if not self.enabled:
|
|
||||||
return
|
|
||||||
body = json.dumps(payload).encode()
|
|
||||||
headers = {
|
|
||||||
"content-type": "application/json",
|
|
||||||
"user-agent": "spark-control-webhook",
|
|
||||||
"x-spark-event": event,
|
|
||||||
}
|
|
||||||
if self.secret:
|
|
||||||
headers["x-spark-signature"] = sign_payload(self.secret, body)
|
|
||||||
try:
|
|
||||||
async with httpx.AsyncClient(timeout=self.timeout) as client:
|
|
||||||
await client.post(self.url, content=body, headers=headers)
|
|
||||||
except Exception as e: # noqa: BLE001 — best-effort, never propagate
|
|
||||||
log.warning("swap webhook to %s failed: %s", self.url, e)
|
|
||||||
|
|
||||||
|
|
||||||
# -------------------------------------------------------- schedule registry ----
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class ScheduleEntry:
|
|
||||||
id: str
|
|
||||||
name: str
|
|
||||||
owner: str = ""
|
|
||||||
cron: str = ""
|
|
||||||
next_run: str = ""
|
|
||||||
description: str = ""
|
|
||||||
registered_at: str = ""
|
|
||||||
updated_at: str = ""
|
|
||||||
|
|
||||||
def public(self) -> dict:
|
|
||||||
return {
|
|
||||||
"id": self.id,
|
|
||||||
"name": self.name,
|
|
||||||
"owner": self.owner,
|
|
||||||
"cron": self.cron,
|
|
||||||
"next_run": self.next_run,
|
|
||||||
"description": self.description,
|
|
||||||
"registered_at": self.registered_at,
|
|
||||||
"updated_at": self.updated_at,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class ScheduleRegistry:
|
|
||||||
"""What external schedulers tell us about their cron jobs. Read-only from the
|
|
||||||
dashboard's side; Spark Control never executes any of it."""
|
|
||||||
|
|
||||||
def __init__(self) -> None:
|
|
||||||
self._items: dict[str, ScheduleEntry] = {}
|
|
||||||
|
|
||||||
def list(self) -> list[dict]:
|
|
||||||
return [e.public() for e in self._items.values()]
|
|
||||||
|
|
||||||
def register(
|
|
||||||
self,
|
|
||||||
*,
|
|
||||||
name: str,
|
|
||||||
id: Optional[str] = None,
|
|
||||||
owner: str = "",
|
|
||||||
cron: str = "",
|
|
||||||
next_run: str = "",
|
|
||||||
description: str = "",
|
|
||||||
) -> ScheduleEntry:
|
|
||||||
name = (name or "").strip()
|
|
||||||
if not name:
|
|
||||||
raise ValueError("name is required")
|
|
||||||
if id is not None:
|
|
||||||
id = id.strip()
|
|
||||||
if id and not valid_schedule_id(id):
|
|
||||||
raise ValueError("id must match [A-Za-z0-9_.-] (max 64 chars)")
|
|
||||||
ts = _iso(_now())
|
|
||||||
existing = self._items.get(id) if id else None
|
|
||||||
if existing is not None:
|
|
||||||
existing.name = name
|
|
||||||
existing.owner = owner.strip()
|
|
||||||
existing.cron = cron
|
|
||||||
existing.next_run = next_run
|
|
||||||
existing.description = description
|
|
||||||
existing.updated_at = ts
|
|
||||||
return existing
|
|
||||||
sid = id or uuid.uuid4().hex[:8]
|
|
||||||
entry = ScheduleEntry(
|
|
||||||
id=sid,
|
|
||||||
name=name,
|
|
||||||
owner=owner.strip(),
|
|
||||||
cron=cron,
|
|
||||||
next_run=next_run,
|
|
||||||
description=description,
|
|
||||||
registered_at=ts,
|
|
||||||
updated_at=ts,
|
|
||||||
)
|
|
||||||
self._items[sid] = entry
|
|
||||||
return entry
|
|
||||||
|
|
||||||
def delete(self, schedule_id: str) -> bool:
|
|
||||||
return self._items.pop(schedule_id, None) is not None
|
|
||||||
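`ScheduleRegistry.register` is an upsert. If a scheduler re-registers the same id after a restart, the entry is updated in place rather than duplicated, and an id-less call gets an 8-hex-character id. The following is a condensed sketch of that behaviour (`MiniRegistry` is hypothetical and stores plain dicts).

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


class MiniRegistry:
    def __init__(self) -> None:
        self._items = {}

    def register(self, *, name: str, id: Optional[str] = None, cron: str = "") -> dict:
        name = (name or "").strip()
        if not name:
            raise ValueError("name is required")
        ts = datetime.now(timezone.utc).isoformat()
        if id and id in self._items:
            entry = self._items[id]
            entry.update(name=name, cron=cron, updated_at=ts)  # update, keep registered_at
            return entry
        sid = id or uuid.uuid4().hex[:8]
        entry = {"id": sid, "name": name, "cron": cron, "registered_at": ts, "updated_at": ts}
        self._items[sid] = entry
        return entry
```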
@@ -10,17 +10,6 @@ Format:
|
|||||||
port: 8001
|
port: 8001
|
||||||
health_path: /health
|
health_path: /health
|
||||||
image: nvcr.io/nim/nvidia/riva-multilingual:latest
|
image: nvcr.io/nim/nvidia/riva-multilingual:latest
|
||||||
|
|
||||||
A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
|
|
||||||
the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
|
|
||||||
probed via /v1/models plus container state and start/stop/restart:
|
|
||||||
custom:
|
|
||||||
- key: vllm-spark2
|
|
||||||
kind: vllm
|
|
||||||
host: <spark-2-ip>
|
|
||||||
user: <ssh-user>
|
|
||||||
container: vllm_node
|
|
||||||
port: 8000
|
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
import os
|
import os
|
||||||
|
|||||||
@@ -377,10 +377,6 @@ class DeepHealth:
|
|||||||
async def run_all(self) -> dict[str, ProbeResult]:
|
async def run_all(self) -> dict[str, ProbeResult]:
|
||||||
results = {}
|
results = {}
|
||||||
for name in self.PROBES:
|
for name in self.PROBES:
|
||||||
# Don't deep-probe a service the deployment switched off — its port
|
|
||||||
# may be answered by something else (e.g. a vLLM on Parakeet's 8000).
|
|
||||||
if name in self.settings.disabled_services:
|
|
||||||
continue
|
|
||||||
results[name] = await self.run_one(name)
|
results[name] = await self.run_one(name)
|
||||||
return results
|
return results
|
||||||
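The `run_all` change skips disabled services before any probe runs, so a port answered by some other process (such as a vLLM on 8000) is never misread as Parakeet. Here is a minimal async sketch; `run_one` is a hypothetical stand-in for the real probe.

```python
import asyncio


async def run_one(name: str) -> str:
    # Hypothetical probe stand-in; the real one issues HTTP/SSH checks.
    await asyncio.sleep(0)
    return f"{name}:ok"


async def run_all(probes: list, disabled: frozenset) -> dict:
    results = {}
    for name in probes:
        if name in disabled:
            continue  # never touch a switched-off service's port
        results[name] = await run_one(name)
    return results
```

As in the diff, the probes run one after another. Running them concurrently with `asyncio.gather` would be a separate design choice.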
|
|
||||||
|
|||||||
@@ -1,209 +0,0 @@
|
|||||||
"""Disk-driven model menu + launch-recipe inference.
|
|
||||||
|
|
||||||
The dashboard's model list is whatever is actually downloaded on the Sparks
|
|
||||||
(see `disk.list_cached_models`), NOT a hard-coded catalog. The bundled/overridden
|
|
||||||
catalog entries are *launch recipes*: matched to an on-disk model by repo, they
|
|
||||||
say HOW to launch it. A completed model on disk with no matching recipe shows up
|
|
||||||
as `needs_setup` — the first switch reads its `config.json`, proposes a recipe
|
|
||||||
(`infer_recipe`) the operator confirms once, and that confirmed recipe is saved
|
|
||||||
to /data so it's a normal card from then on.
|
|
||||||
|
|
||||||
Why a recipe layer at all, if the menu is the disk? Because a folder on disk
|
|
||||||
doesn't say how to launch it: the per-family parsers (`--reasoning-parser`,
|
|
||||||
`--tool-call-parser`), the MoE backend (some Gemma MoE checkpoints need
|
|
||||||
`marlin` on GB10), and solo-vs-cluster topology can't be read off a directory.
|
|
||||||
We infer a best guess from the model's own config + size, but the operator
|
|
||||||
confirms it — a wrong guess is cheap, a wrong launch is not.
|
|
||||||
"""
|
|
||||||
from __future__ import annotations
|
|
||||||
import asyncio
|
|
||||||
import re
|
|
||||||
|
|
||||||
from .config import Settings
|
|
||||||
from .disk import list_cached_models, probe_disk
|
|
||||||
from .overrides import extract_knobs_from_args
|
|
||||||
|
|
||||||
|
|
||||||
# A model whose weights exceed this can't fit one Spark's 128 GB beside a KV
|
|
||||||
# cache, so it must shard across both via Ray. A heuristic prefill only — the
|
|
||||||
# operator confirms mode in the setup form, so the exact cutoff isn't critical.
|
|
||||||
SINGLE_SPARK_BYTES = 115 * 1000 ** 3
|
|
||||||
|
|
||||||
# Generic knob defaults applied to every inferred recipe (the operator can tweak
|
|
||||||
# these in the setup form). Family-specific flags (parsers, MoE backend) are
|
|
||||||
# layered on separately by `_detect_family`.
|
|
||||||
_COMMON_KNOBS = {
|
|
||||||
"max_model_len": 32768,
|
|
||||||
"gpu_memory_utilization": 0.85,
|
|
||||||
"fastsafetensors": True,
|
|
||||||
"prefix_caching": True,
|
|
||||||
"kv_cache_dtype": "fp8",
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def repo_to_key(repo: str) -> str:
|
|
||||||
"""Stable, URL-safe menu key for a discovered model with no recipe key yet.
|
|
||||||
|
|
||||||
'RedHatAI/Qwen3.6-35B-A3B-NVFP4' -> 'redhatai-qwen3-6-35b-a3b-nvfp4'. The same
|
|
||||||
slug is used by the menu, the setup form, and `_identify_current_model`, so a
|
|
||||||
loaded-but-unconfigured model still highlights as active."""
|
|
||||||
return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")
|
|
||||||
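`repo_to_key` slugs a repo id by lowercasing it and collapsing each run of disallowed characters into one `-`. The sketch below checks the docstring example. It also points out a property to keep in mind: repos that differ only in punctuation map to the same key.

```python
import re


def repo_to_key(repo: str) -> str:
    # Lowercase, collapse any run of chars outside [a-z0-9_-] to "-", trim edges.
    return re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")
```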
|
|
||||||
|
|
||||||
def _detect_family(config: dict) -> tuple[str, list[str], list[str]]:
|
|
||||||
"""Return (family_label, vllm_flags, capabilities) inferred from config.json.
|
|
||||||
|
|
||||||
Only family-specific, non-knob flags (parsers, MoE backend) go in vllm_flags;
|
|
||||||
generic knob defaults are handled by the caller. Best-effort and operator-
|
|
||||||
confirmed, so a wrong guess is cheap."""
|
|
||||||
arch = " ".join(config.get("architectures") or [])
|
|
||||||
mtype = str(config.get("model_type") or "")
|
|
||||||
s = (arch + " " + mtype).lower()
|
|
||||||
is_moe = (
|
|
||||||
"moe" in s
|
|
||||||
or any(config.get(k) for k in ("num_experts", "n_routed_experts", "num_local_experts"))
|
|
||||||
)
|
|
||||||
is_vision = (
|
|
||||||
"conditionalgeneration" in s
|
|
||||||
or "vision" in s
|
|
||||||
or "vlforcausallm" in s
|
|
||||||
or "vision_config" in config
|
|
||||||
or "image_token_index" in config
|
|
||||||
)
|
|
||||||
flags: list[str] = []
|
|
||||||
caps: list[str] = []
|
|
||||||
label = "Generic"
|
|
||||||
if mtype.startswith("qwen3") or "qwen3" in s:
|
|
||||||
label = "Qwen3 (MoE)" if is_moe else "Qwen3"
|
|
||||||
flags.append("--reasoning-parser=qwen3")
|
|
||||||
caps.append("reasoning")
|
|
||||||
if is_moe:
|
|
||||||
flags.append("--moe_backend=flashinfer_cutlass")
|
|
||||||
elif "gemma" in s:
|
|
||||||
label = "Gemma (MoE)" if is_moe else "Gemma"
|
|
||||||
flags += ["--reasoning-parser=gemma4", "--tool-call-parser=gemma4", "--enable-auto-tool-choice"]
|
|
||||||
caps += ["reasoning", "tools"]
|
|
||||||
if is_moe:
|
|
||||||
# The fast flashinfer/CUTLASS FP4 path errors on GB10 for Gemma MoE;
|
|
||||||
# marlin is the working fallback (see the Gemma 26B trial notes).
|
|
||||||
flags.append("--moe_backend=marlin")
|
|
||||||
if is_vision and "vision" not in caps:
|
|
||||||
caps.append("vision")
|
|
||||||
return label, flags, caps
|
|
||||||
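`_detect_family` bases its MoE and vision guesses on `config.json`. The condensed sketch below extracts just those two checks. The sample configs in the test are illustrative and use real Hugging Face architecture names. Because the expert-count keys are tested for truthiness, a dense model that reports `num_experts: 0` is still treated as dense.

```python
def detect_traits(config: dict) -> tuple:
    """Return (is_moe, is_vision) using the same signals as _detect_family."""
    arch = " ".join(config.get("architectures") or [])
    mtype = str(config.get("model_type") or "")
    s = (arch + " " + mtype).lower()
    is_moe = "moe" in s or any(
        config.get(k) for k in ("num_experts", "n_routed_experts", "num_local_experts")
    )
    is_vision = (
        "conditionalgeneration" in s
        or "vision" in s
        or "vlforcausallm" in s
        or "vision_config" in config
        or "image_token_index" in config
    )
    return is_moe, is_vision
```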
|
|
||||||
|
|
||||||
def _infer_mode(total_bytes: int, on_host_count: int) -> str:
|
|
||||||
"""Solo unless the weights are present on both Sparks or too big for one."""
|
|
||||||
if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
|
|
||||||
return "cluster"
|
|
||||||
return "solo"
|
|
||||||
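`_infer_mode` compares summed weight bytes against a decimal 115 GB cutoff, which leaves room for a KV cache on a 128 GB Spark. The comparison is strict, so a model of exactly 115 GB is still treated as solo. In `build_menu`, `total_bytes` is summed across hosts. A model present on both Sparks therefore reports double its size, but `on_host_count >= 2` already selects cluster in that case.

```python
SINGLE_SPARK_BYTES = 115 * 1000 ** 3  # decimal GB, as in the diff


def infer_mode(total_bytes: int, on_host_count: int) -> str:
    if on_host_count >= 2 or total_bytes > SINGLE_SPARK_BYTES:
        return "cluster"
    return "solo"
```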
|
|
||||||
|
|
||||||
def infer_recipe(repo: str, config: dict, total_bytes: int, on_host_count: int) -> dict:
|
|
||||||
"""Propose a launch recipe for a discovered model — prefills the setup form."""
|
|
||||||
label, flags, caps = _detect_family(config or {})
|
|
||||||
mode = _infer_mode(total_bytes, on_host_count)
|
|
||||||
vllm_args = list(flags)
|
|
||||||
vllm_args.append("--max-num-batched-tokens=16384")
|
|
||||||
knobs = dict(_COMMON_KNOBS)
|
|
||||||
if mode == "cluster":
|
|
||||||
# Large models shard across both Sparks via Ray; leave more headroom.
|
|
||||||
vllm_args += ["-tp=2", "--distributed-executor-backend=ray"]
|
|
||||||
knobs["gpu_memory_utilization"] = 0.7
|
|
||||||
return {
|
|
||||||
"key": repo_to_key(repo),
|
|
||||||
"repo": repo,
|
|
||||||
"display_name": repo.split("/")[-1],
|
|
||||||
"mode": mode,
|
|
||||||
"capabilities": caps,
|
|
||||||
"vllm_args": vllm_args,
|
|
||||||
"knobs": knobs,
|
|
||||||
"family": label,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def _menu_entry_from_recipe(m, *, on_disk: bool, total_bytes: int, per_host: list[dict]) -> dict:
|
|
||||||
d = m.model_dump()
|
|
||||||
d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
|
|
||||||
d["needs_setup"] = False
|
|
||||||
d["on_disk"] = on_disk
|
|
||||||
d["total_bytes"] = total_bytes
|
|
||||||
d["per_host"] = per_host
|
|
||||||
return d
|
|
||||||
|
|
||||||
|
|
||||||
async def build_menu(settings: Settings, catalog) -> dict[str, dict]:
|
|
||||||
"""The disk-driven model menu: every completed model on the Sparks, annotated
|
|
||||||
with its launch recipe (matched by repo) or flagged `needs_setup` if none.
|
|
||||||
|
|
||||||
Two SSH scans total (one per Spark), run in parallel — much cheaper than the
|
|
||||||
old per-recipe disk probe. A host that errors is skipped, not fatal."""
|
|
||||||
hosts = [(settings.spark1_host, settings.spark1_user)]
|
|
||||||
if settings.spark2_host:
|
|
||||||
hosts.append((settings.spark2_host, settings.spark2_user))
|
|
||||||
scans = await asyncio.gather(
|
|
||||||
*(list_cached_models(h, u, settings) for h, u in hosts),
|
|
||||||
return_exceptions=True,
|
|
||||||
)
|
|
||||||
by_repo: dict[str, dict] = {}
|
|
||||||
for (h, _u), res in zip(hosts, scans):
|
|
||||||
if isinstance(res, Exception):
|
|
||||||
continue
|
|
||||||
for repo, size, complete in res:
|
|
||||||
e = by_repo.setdefault(repo, {"total_bytes": 0, "per_host": [], "complete": False})
|
|
||||||
e["total_bytes"] += size
|
|
||||||
e["per_host"].append({"host": h, "size_bytes": size})
|
|
||||||
e["complete"] = e["complete"] or complete
|
|
||||||
|
|
||||||
recipe_by_repo = {m.repo: (k, m) for k, m in catalog.models.items() if m.repo}
|
|
||||||
|
|
||||||
menu: dict[str, dict] = {}
|
|
||||||
-    for repo, info in by_repo.items():
-        # Skip half-fetched / corrupt caches (no finished snapshot) — they'd show
-        # as broken cards. In-flight downloads surface in the download panel.
-        if not info["complete"]:
-            continue
-        if repo in recipe_by_repo:
-            key, m = recipe_by_repo[repo]
-            menu[key] = _menu_entry_from_recipe(
-                m, on_disk=True, total_bytes=info["total_bytes"], per_host=info["per_host"]
-            )
-        else:
-            key = repo_to_key(repo)
-            menu[key] = {
-                "display_name": repo.split("/")[-1],
-                "repo": repo,
-                "local_path": None,
-                "size_gb": round(info["total_bytes"] / 1e9, 1),
-                "mode": _infer_mode(info["total_bytes"], len(info["per_host"])),
-                "capabilities": [],
-                "expected_ready_seconds": 300,
-                "vllm_args": [],
-                "description": None,
-                "knobs": None,
-                "custom": False,
-                "needs_setup": True,
-                "effective_knobs": {},
-                "on_disk": True,
-                "total_bytes": info["total_bytes"],
-                "per_host": info["per_host"],
-            }
-
-    # Local/fine-tuned recipes live as a directory, not an HF cache entry — probe
-    # each by path and include it if present. Their keys are unique catalog keys
-    # (and local models carry repo="" per ModelDef), so they never collide with a
-    # discovered repo's slug or an HF recipe key above.
-    for key, m in catalog.models.items():
-        if not m.local_path:
-            continue
-        st = await probe_disk(m.repo, m.mode, settings, local_path=m.local_path)
-        if not st.on_disk:
-            continue
-        menu[key] = _menu_entry_from_recipe(
-            m,
-            on_disk=True,
-            total_bytes=st.total_bytes,
-            per_host=[{"host": r.host, "size_bytes": r.size_bytes} for r in st.per_host if r.on_disk],
-        )
-
-    return menu
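The removed loop turns each complete Hugging Face cache entry into a menu card. It prefers a curated recipe key and otherwise builds a "needs setup" entry. Below is a minimal sketch of that selection logic. It assumes plain dicts for the scan result and for the recipe lookup. `slugify_repo` is a hypothetical stand-in for `repo_to_key`, whose body is not part of this diff, and the card keeps only a few of the real fields.

```python
from typing import Any


def slugify_repo(repo: str) -> str:
    # Hypothetical stand-in for repo_to_key (its real rule is not shown).
    return repo.lower().replace("/", "--")


def build_menu(by_repo: dict[str, dict[str, Any]],
               recipe_keys: dict[str, str]) -> dict[str, dict[str, Any]]:
    menu: dict[str, dict[str, Any]] = {}
    for repo, info in by_repo.items():
        if not info["complete"]:
            # Half-fetched or corrupt cache: no finished snapshot, so no card.
            continue
        # A curated recipe wins; otherwise fall back to a generated slug.
        key = recipe_keys.get(repo) or slugify_repo(repo)
        menu[key] = {
            "repo": repo,
            "display_name": repo.split("/")[-1],
            "size_gb": round(info["total_bytes"] / 1e9, 1),
            "needs_setup": repo not in recipe_keys,
            "on_disk": True,
        }
    return menu


scan = {
    "org/base-model": {"complete": True, "total_bytes": 16_400_000_000},
    "org/partial": {"complete": False, "total_bytes": 1_000},
    "org/tuned-model": {"complete": True, "total_bytes": 2_500_000_000},
}
menu = build_menu(scan, {"org/base-model": "base-model"})
```

Skipping on `complete` rather than on size keeps in-flight downloads out of the menu. Those downloads are surfaced separately in the download panel.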
+7 -130
@@ -10,13 +10,11 @@ model or one tied to an in-flight swap/download.
 """
 from __future__ import annotations
 import asyncio
-import json
 import re
 from dataclasses import dataclass
 from typing import Optional

 from .config import Settings
-from .shellsafe import quote_arg
 from .ssh import ssh_run


@@ -37,87 +35,6 @@ def repo_to_cache_dirname(repo: str) -> str:
     return dn


-def cache_dirname_to_repo(dirname: str) -> Optional[str]:
-    """Inverse of `repo_to_cache_dirname`: 'models--org--name' -> 'org/name'.
-
-    A repo has exactly one '/', so the org is the first '--'-segment and the name
-    is everything after (names may themselves contain single dashes). Returns
-    None for anything that isn't a model cache dir."""
-    if not dirname.startswith("models--"):
-        return None
-    parts = dirname[len("models--"):].split("--")
-    if len(parts) < 2 or not parts[0] or not parts[1]:
-        return None
-    return f"{parts[0]}/{'--'.join(parts[1:])}"
-
-
-def parse_cache_listing(out: str) -> list[tuple[str, int, bool]]:
-    """Parse the 'size|complete|dirname' lines from `list_cached_models`'s scan.
-
-    Returns [(repo, size_bytes, complete), ...], skipping non-model lines. Pure
-    function so the parsing is unit-testable without SSH."""
-    items: list[tuple[str, int, bool]] = []
-    for line in out.splitlines():
-        line = line.strip()
-        if line.count("|") < 2:
-            continue
-        size_s, complete_s, dirname = line.split("|", 2)
-        repo = cache_dirname_to_repo(dirname.strip())
-        if not repo:
-            continue
-        try:
-            size = int(size_s)
-        except ValueError:
-            size = 0
-        items.append((repo, size, complete_s.strip() == "1"))
-    return items
-
-
-async def list_cached_models(host: str, user: str, settings: Settings) -> list[tuple[str, int, bool]]:
-    """Enumerate every Hugging Face model cached on a host: (repo, size_bytes, complete).
-
-    'complete' = the cache has at least one snapshot carrying a config.json (a
-    finished download, not a half-fetched/corrupt dir). One SSH round-trip; the
-    glob's no-match case is handled by the `[ -d ]` guard."""
-    if not host or not user:
-        return []
-    cmd = (
-        'HUB="$HOME/.cache/huggingface/hub"; '
-        'for d in "$HUB"/models--*; do '
-        '[ -d "$d" ] || continue; '
-        'n=$(basename "$d"); '
-        'sz=$(du -sb "$d" 2>/dev/null | cut -f1); sz=${sz:-0}; '
-        'if ls "$d"/snapshots/*/config.json >/dev/null 2>&1; then c=1; else c=0; fi; '
-        'echo "${sz}|${c}|${n}"; '
-        'done'
-    )
-    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=30.0)
-    if rc != 0:
-        return []
-    return parse_cache_listing(out)
-
-
-async def read_model_config(host: str, user: str, repo: str, settings: Settings) -> Optional[dict]:
-    """Read a cached model's config.json (first snapshot) for launch inference.
-
-    Returns the parsed dict, or None if absent/unreadable. The dirname is
-    whitelisted (repo_to_cache_dirname) so it's safe to embed unquoted."""
-    if not host or not user:
-        return None
-    dn = repo_to_cache_dirname(repo)
-    cmd = (
-        f'D=$(ls -d "$HOME/.cache/huggingface/hub/{dn}/snapshots/"*/ 2>/dev/null | head -1); '
-        f'[ -n "$D" ] && cat "${{D}}config.json" 2>/dev/null'
-    )
-    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
-    if rc != 0 or not out.strip():
-        return None
-    try:
-        return json.loads(out)
-    except (ValueError, TypeError):
-        return None
-
-
 @dataclass
 class HostDiskResult:
     host: str
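`cache_dirname_to_repo` has to invert the cache naming so that each discovered directory maps back to a repo id. This includes names that themselves contain `--`. Below is a small round-trip sketch. `to_cache_dirname` is a hypothetical forward mapping that assumes the Hugging Face hub layout, where `org/name` is stored as `models--org--name`. Only the last line of `repo_to_cache_dirname` appears in this hunk.

```python
from typing import Optional


def to_cache_dirname(repo: str) -> str:
    # Assumption: HF hub cache layout, "org/name" -> "models--org--name".
    return "models--" + repo.replace("/", "--")


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    # Same logic as the removed helper: the org is the first "--" segment and
    # everything after it (re-joined with "--") is the model name.
    if not dirname.startswith("models--"):
        return None
    parts = dirname[len("models--"):].split("--")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    return f"{parts[0]}/{'--'.join(parts[1:])}"


# Round-trips hold even when the model name contains dashes or "--".
for repo in ["org/name", "org/name-with-dashes", "org/a--b"]:
    assert cache_dirname_to_repo(to_cache_dirname(repo)) == repo
```

Org names are assumed not to contain `--`. That assumption is what makes "first segment is the org" unambiguous.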
@@ -159,52 +76,16 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
     return HostDiskResult(host=host, on_disk=True, size_bytes=size)


-async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
-    """Return whether a local model directory exists on this host and its size.
-
-    For locally fine-tuned models (a Spark directory, not an HF cache entry). The
-    path is whitelisted at the API boundary (shellsafe.validate_local_path); we
-    shlex-quote it here in depth.
-    """
-    if not host or not user:
-        return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
-    qp = quote_arg(path)
-    cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
-    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
-    if rc != 0:
-        return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
-    raw = out.strip()
-    if raw == "MISSING" or raw == "":
-        return HostDiskResult(host=host, on_disk=False)
-    try:
-        size = int(raw.splitlines()[-1])
-    except ValueError:
-        return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
-    return HostDiskResult(host=host, on_disk=True, size_bytes=size)
-
-
-async def probe_disk(
-    repo: str, mode: str, settings: Settings, *, local_path: str | None = None
-) -> DiskStatus:
-    """Probe one model across the relevant Sparks based on its mode (solo|cluster).
-
-    A local model (local_path set) is probed by directory; otherwise by HF cache.
-    """
+async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
+    """Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
     hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
     if mode == "cluster" and settings.spark2_host:
         hosts.append((settings.spark2_host, settings.spark2_user))

-    if local_path:
-        results = await asyncio.gather(
-            *(probe_local_host(h, u, local_path, settings) for h, u in hosts)
-        )
-        key = local_path
-    else:
-        results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
-        key = repo
+    results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
     on_disk = any(r.on_disk for r in results)
     total = sum(r.size_bytes for r in results)
-    return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))
+    return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))


 async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
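The removed `probe_local_host` folds three outcomes of a single SSH call into a `HostDiskResult`: a transport failure, `MISSING`, or a `du -sb` byte count. Separating that interpretation from the SSH call makes it testable without a host. The sketch below uses a hypothetical `ProbeResult` tuple in place of `HostDiskResult`.

```python
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    # Hypothetical stand-in for HostDiskResult (host field omitted).
    on_disk: bool
    size_bytes: int = 0
    error: Optional[str] = None


def interpret_probe_output(rc: int, out: str, err: str) -> ProbeResult:
    if rc != 0:
        # SSH or shell failure: surface whatever text came back, else the rc.
        return ProbeResult(False, error=(err or out).strip() or f"rc={rc}")
    raw = out.strip()
    if raw in ("MISSING", ""):
        return ProbeResult(False)
    try:
        # `du -sb DIR | cut -f1` prints a byte count; the last line is used,
        # presumably to tolerate extra lines printed before it.
        size = int(raw.splitlines()[-1])
    except ValueError:
        return ProbeResult(False, error=f"unparsable du output: {raw!r}")
    return ProbeResult(True, size)
```

`probe_disk` then aggregates the per-host results. `any()` decides `on_disk` and `sum()` gives `total_bytes`, so a model present on only one Spark still counts as downloaded.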
@@ -241,14 +122,10 @@ async def delete_host(host: str, user: str, repo: str, settings: Settings) -> Ho
     return HostDiskResult(host=host, on_disk=False, size_bytes=freed)


-async def delete_from_disk(repo: str, settings: Settings) -> DiskStatus:
-    """rm -rf the model's cache dir on ALL configured Sparks. Idempotent.
-
-    We sweep both Sparks regardless of the model's declared mode: a 'remove from
-    disk & menu' must leave nothing behind, and rm of an absent dir reports 0
-    bytes freed (FREED 0), so an extra host is harmless."""
+async def delete_from_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
+    """rm -rf the model's cache dir on the relevant Sparks. Idempotent."""
     hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
-    if settings.spark2_host:
+    if mode == "cluster" and settings.spark2_host:
         hosts.append((settings.spark2_host, settings.spark2_user))

     results = await asyncio.gather(*(delete_host(h, u, repo, settings) for h, u in hosts))
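The two versions of `delete_from_disk` differ only in how they pick hosts:

- The removed version always sweeps Spark 2 when it is configured.
- The restored version sweeps Spark 2 only for `cluster` models.

Below is a sketch of both policies as one hypothetical helper. It takes plain host strings instead of `(host, user)` pairs.

```python
def hosts_for(mode: str, spark1: str, spark2: str, *, sweep_all: bool) -> list[str]:
    """Hypothetical helper: which Sparks a delete touches.

    sweep_all=True mirrors the removed behaviour (every configured Spark);
    sweep_all=False mirrors the restored one (Spark 2 only for cluster models).
    """
    hosts = [spark1]
    if spark2 and (sweep_all or mode == "cluster"):
        hosts.append(spark2)
    return hosts
```

Under the mode-scoped policy, a copy of a `solo` model that ends up on Spark 2 is never deleted. That leftover is exactly what the removed docstring guards against. Sweeping everything costs one extra SSH call, which reports `FREED 0`.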
@@ -16,7 +16,6 @@ from datetime import datetime, timezone
 from typing import Literal, Optional

 from .config import Settings
-from .shellsafe import quote_arg, validate_repo
 from .ssh import ssh_stream, StreamHandle


@@ -78,7 +77,8 @@ class DownloadManager:
         return self.jobs.get(job_id)

     async def trigger(self, repo: str, mode: Mode) -> DownloadJob:
-        validate_repo(repo)  # raises ValueError on anything but a clean 'org/name'
+        if not repo or "/" not in repo:
+            raise ValueError("repo must be in 'org/name' form")
         if self.lock.locked():
             raise RuntimeError("A download is already in progress")
         job = DownloadJob(
@@ -126,7 +126,7 @@ class DownloadManager:
         if not target_host or not target_user:
             raise RuntimeError(f"{job.mode} host not configured")

-        cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {quote_arg(job.repo)} {flags}".strip()
+        cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {job.repo} {flags}".strip()
         job.append(f"$ {cmd}")
         job.state = "downloading"
         job.progress.phase = "Connecting to Hugging Face…"
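This hunk and the `trigger` change above drop two safeguards together. `validate_repo` becomes a bare `"/" in repo` check, and the `quote_arg` call becomes raw interpolation into the shell command. The sketch below shows why that combination matters. It assumes `quote_arg` behaves like `shlex.quote`; the disk module's docstring says its path is shlex-quoted, but the helper itself is not in this diff. `_REPO_RE` is a hypothetical whitelist, not the pattern `validate_repo` actually uses.

```python
import re
import shlex

# Hypothetical whitelist for 'org/name' (not the project's real pattern).
_REPO_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")


def naive_ok(repo: str) -> bool:
    # The restored check: any non-empty string containing "/" passes.
    return bool(repo) and "/" in repo


def strict_ok(repo: str) -> bool:
    return bool(_REPO_RE.fullmatch(repo)) and ".." not in repo


evil = "org/name; rm -rf ~"
# Quoting keeps the whole value as one argument to hf-download.sh.
cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {shlex.quote(evil)}"
```

The naive check accepts `evil`. Interpolated unquoted, the `;` would end the download command and start a second one on the Spark.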
@@ -25,10 +25,8 @@ vector is supplied, /api/search degrades cleanly to dense + rerank.
 """
 from __future__ import annotations
 import logging
-import re
 import time
 from typing import Any, Optional, Union
-from urllib.parse import quote as urlquote

 import httpx
 from fastapi import APIRouter, HTTPException
@@ -38,19 +36,6 @@ from .config import Settings

 logger = logging.getLogger("spark-control.embeddings")

-# Qdrant collection name: caller-supplied and interpolated into the Qdrant URL
-# path. Restrict to a metacharacter-free whitelist so it cannot inject path
-# segments ('/', '..'), a query string ('?'), or a fragment ('#') and pivot to
-# other collections/endpoints on the internal Qdrant. (Qdrant's own names are
-# alphanumerics + dot/dash/underscore.)
-_COLLECTION_RE = re.compile(r"^[A-Za-z0-9._-]+$")
-
-
-def _safe_collection(name: str) -> str:
-    if not name or ".." in name or not _COLLECTION_RE.fullmatch(name):
-        raise HTTPException(400, f"invalid collection name: {name!r}")
-    return name
-
 # Embedding/rerank can be slow on a cold model; search is interactive.
 EMBED_TIMEOUT = 120.0
 QDRANT_TIMEOUT = 30.0
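The removed `_safe_collection` guard whitelists collection names before they are spliced into the Qdrant URL path. Below is a sketch of the same predicate without FastAPI. It returns a bool instead of raising `HTTPException`.

```python
import re

_COLLECTION_RE = re.compile(r"^[A-Za-z0-9._-]+$")


def is_safe_collection(name: str) -> bool:
    # Mirrors the removed check: non-empty, no "..", whitelist characters only.
    return bool(name) and ".." not in name and bool(_COLLECTION_RE.fullmatch(name))
```

The explicit `".." in name` test is needed because the character class allows dots. It also rejects otherwise harmless names such as `a..b`.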
@@ -190,7 +175,6 @@ def build_router(settings: Settings) -> APIRouter:
         collection = body.collection or settings.qdrant_collection
         if not collection:
             raise HTTPException(400, "collection is required (no default configured)")
-        collection = _safe_collection(collection)

         top_k = max(1, min(body.top_k, 100))
         retrieve_n = body.retrieve_n or max(50, top_k * 10)
@@ -250,7 +234,7 @@ def build_router(settings: Settings) -> APIRouter:

         t1 = time.time()
         qr = await _post(
-            f"{_qdrant_base()}/collections/{urlquote(collection, safe='')}/points/query",
+            f"{_qdrant_base()}/collections/{collection}/points/query",
             query_body, QDRANT_TIMEOUT, "qdrant",
         )
         if qr.status_code == 404:
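The removed query URL also percent-encodes the collection with `urlquote(collection, safe='')`. By default `urllib.parse.quote` leaves `/` unescaped, so `safe=''` is what turns it into `%2F`. Dots, however, are never escaped, which is why the `..` check in the whitelist is still needed. Below is a sketch with a hypothetical `collection_url` helper.

```python
from urllib.parse import quote


def collection_url(base: str, name: str) -> str:
    # safe="" also encodes "/"; "?" and "#" are always encoded by quote().
    return f"{base}/collections/{quote(name, safe='')}/points/query"
```

Encoding alone would turn `a/b` into a single harmless segment. It would still leave `..` as a literal path segment, which a client or server may normalize away.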
@@ -26,9 +26,6 @@ echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.dra
 echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
 DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
 echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
-WGIF=$(ip -o link show type wireguard 2>/dev/null | awk -F': ' 'NR==1 {print $2}')
-echo WG_IFACE=$WGIF
-echo WG_ADDR=$(ip -o -4 addr show "$WGIF" 2>/dev/null | awk 'NR==1 {print $4}')
 """.strip()


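The removed probe lines find the first WireGuard interface and its IPv4 address in one-line `ip -o` output. They use two different `awk` field splits. Below is a Python sketch of the same extraction. The sample lines are illustrative and abbreviated, not captured from a Spark. The helpers return `None` where the shell would print an empty value.

```python
from typing import Optional


def first_line(out: str) -> Optional[str]:
    lines = out.splitlines()
    return lines[0] if lines else None


def wg_iface(link_out: str) -> Optional[str]:
    # awk -F': ' 'NR==1 {print $2}': second ": "-separated field of line 1.
    line = first_line(link_out)
    if line is None:
        return None
    parts = line.split(": ")
    return parts[1] if len(parts) > 1 else None


def wg_addr(addr_out: str) -> Optional[str]:
    # awk 'NR==1 {print $4}': fourth whitespace-separated field of line 1.
    line = first_line(addr_out)
    if line is None:
        return None
    fields = line.split()
    return fields[3] if len(fields) > 3 else None


# Illustrative `ip -o` lines (real output carries more trailing fields).
link_out = "7: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN"
addr_out = "7: wg0    inet 10.8.0.2/24 scope global wg0"
```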
@@ -87,11 +84,6 @@ def _parse(out: str) -> dict:
     # MAC address on the default-route interface (for Wake-on-LAN)
     if info.get("mac"):
         parsed["mac"] = info["mac"].lower()
-    # WireGuard tunnel membership: name + address of the first wg interface, if
-    # any. Read-only and unprivileged (`ip` needs no root), so it never depends
-    # on sudo and never breaks the probe — absence just yields no badge.
-    parsed["wg_iface"] = info.get("wg_iface") or None
-    parsed["wg_addr"] = info.get("wg_addr") or None
     return parsed


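On the parsing side, each `echo KEY=$VALUE` line still prints the key when no tunnel exists. The value therefore arrives as an empty string, and `or None` normalizes it. The sketch below assumes `_parse` lower-cases keys into an `info` dict; the earlier part of its body is not shown in this hunk.

```python
def parse_probe(out: str) -> dict:
    # Assumption: KEY=VALUE lines are split on the first "=" and the key is
    # lower-cased (the real _parse body is only partly visible here).
    info = {}
    for line in out.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip().lower()] = value.strip()
    parsed = {}
    if info.get("mac"):
        parsed["mac"] = info["mac"].lower()
    # "WG_IFACE=" with no tunnel yields "", which `or None` maps to None.
    parsed["wg_iface"] = info.get("wg_iface") or None
    parsed["wg_addr"] = info.get("wg_addr") or None
    return parsed
```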
+9 -34
@@ -6,28 +6,17 @@ from .config import Settings
 _TIMEOUT = 3.0


-def _disabled(settings: Settings, key: str) -> dict | None:
-    """A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
-
-    Lets an adopter who doesn't run a given support service switch its probe off
-    entirely — so the probe never hits whatever else listens on that port, and
-    the connectivity log doesn't record it as perpetually down."""
-    if key in settings.disabled_services:
-        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
-    return None
-
-
-async def probe_vllm_endpoint(host: str, port: int) -> dict:
-    """Probe any OpenAI-compatible vLLM at host:port via /v1/models.
-
-    Shared by the primary (Spark 1) health check and any extra vLLM registered
-    as a custom service (kind: vllm) to monitor a second Spark."""
-    base_url = f"http://{host}:{port}/v1" if host else None
-    if not host:
-        return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
+async def check_vllm(settings: Settings) -> dict:
+    base_url = (
+        f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
+        if settings.spark1_host
+        else None
+    )
+    if not settings.spark1_host:
+        return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
     try:
         async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
-            r = await c.get(f"http://{host}:{port}/v1/models")
+            r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
             r.raise_for_status()
             ids = [m["id"] for m in r.json().get("data", [])]
             return {
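The removed `_disabled` helper returns either a ready-made verdict dict or `None`. Each `check_*` can therefore short-circuit with `if d := _disabled(settings, key): return d` before touching the network. This works because a non-empty dict is truthy. Below is a self-contained sketch. A plain set stands in for `settings.disabled_services`, and a hypothetical placeholder stands in for the HTTP probe.

```python
import asyncio
from typing import Optional


def disabled_verdict(disabled: set, key: str) -> Optional[dict]:
    # Same shape as the removed _disabled(): a full verdict, or None.
    if key in disabled:
        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
    return None


async def check_service(disabled: set, key: str) -> dict:
    if d := disabled_verdict(disabled, key):
        return d  # short-circuit: no request is made for a disabled service
    await asyncio.sleep(0)  # hypothetical stand-in for the real HTTP probe
    return {"ok": True, "base_url": f"http://localhost/{key}"}  # placeholder verdict
```

Returning a distinct `"disabled": True` verdict, rather than a generic failure, lets callers tell "switched off" apart from "down".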
@@ -40,15 +29,7 @@ async def probe_vllm_endpoint(host: str, port: int) -> dict:
         return {"ok": False, "error": str(e), "base_url": base_url}


-async def check_vllm(settings: Settings) -> dict:
-    if not settings.spark1_host:
-        return {"ok": False, "error": "spark1 not configured", "base_url": None}
-    return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
-
-
 async def check_parakeet(settings: Settings) -> dict:
-    if d := _disabled(settings, "parakeet"):
-        return d
     base_url = (
         f"http://{settings.parakeet_host}:{settings.parakeet_port}"
         if settings.parakeet_host
@@ -66,8 +47,6 @@ async def check_parakeet(settings: Settings) -> dict:


 async def check_kokoro(settings: Settings) -> dict:
-    if d := _disabled(settings, "kokoro"):
-        return d
     base_url = (
         f"http://{settings.kokoro_host}:{settings.kokoro_port}"
         if settings.kokoro_host
@@ -89,8 +68,6 @@ async def check_kokoro(settings: Settings) -> dict:


 async def check_embeddings(settings: Settings) -> dict:
-    if d := _disabled(settings, "embeddings"):
-        return d
     base_url = (
         f"http://{settings.embed_host}:{settings.embed_port}"
         if settings.embed_host
@@ -112,8 +89,6 @@ async def check_embeddings(settings: Settings) -> dict:


 async def check_qdrant(settings: Settings) -> dict:
-    if d := _disabled(settings, "qdrant"):
-        return d
     base_url = (
         f"http://{settings.qdrant_host}:{settings.qdrant_port}"
         if settings.qdrant_host
@@ -1,186 +0,0 @@
-"""Update + logs for the matrix-bridge bot container on the Spark.
-
-matrix-bridge is a single Docker container managed by docker compose out of a
-git clone at `~matrix_bridge_user/matrix-bridge`. Status (the badge) and
-start/stop/restart ride the generic service machinery in `services.py`
-(`docker_state` / `run_action`). The two things that don't fit that mould live
-here:
-
-- **Update** — `git fetch && git reset --hard origin/<branch> && docker
-  compose up -d --build`. Long-running (docker build), so it streams like the
-  vLLM `UpdateManager`: fire-and-forget job, SSE stream, fail-loud rc.
-- **Logs** — a one-shot `docker logs --tail N` for diagnosing a red badge.
-
-We connect **directly as the configured user** (`modelo` — the repo owner), so
-git never trips its dubious-ownership guard and docker runs via the user's
-docker-group membership. We deliberately do NOT `sudo -iu modelo`: this Spark
-has no passwordless sudo, so a sudo wrap would hang in SSH BatchMode.
-"""
-from __future__ import annotations
-import asyncio
-import time
-import uuid
-from dataclasses import dataclass, field
-from datetime import datetime, timezone
-from typing import Optional
-
-from .config import Settings
-from .shellsafe import quote_arg
-from .ssh import ssh_run, ssh_stream, StreamHandle
-
-# Hard ceiling on a single update. A first build after a base-image bump is
-# slow (minutes); the cache makes later ones quick. 25 min is generous headroom
-# without letting a genuinely wedged build spin forever.
-_UPDATE_TIMEOUT_S = 1500
-
-
-def build_update_command(directory: str, branch: str) -> str:
-    """The update one-liner, run from the bot's git clone as its owner.
-
-    `directory` and `branch` come from operator config (not request input), so
-    they're interpolated directly — same trust model as the Spark hostnames in
-    `health`/`updates`. `directory` may be `~/...`, which must stay unquoted so
-    the remote login shell expands it; quoting would defeat that.
-    """
-    return (
-        f"cd {directory} && "
-        f"git fetch origin && "
-        f"git reset --hard origin/{branch} && "
-        f"docker compose up -d --build"
-    )
-
-
-def _phase_for(line: str) -> Optional[str]:
-    """Map a streamed output line to a human-readable phase, or None to keep
-    the current phase. Kept loose — compose/buildkit output varies by version."""
-    low = line.lower()
-    if "git reset" in low or "head is now at" in low:
-        return "Resetting to the latest release…"
-    if "docker compose" in low or "buildkit" in low or low.startswith("step ") or "=> " in line or "building " in low:
-        return "Building the bot image…"
-    if "recreate" in low or "starting" in low or "started" in low or "container matrix-bridge" in low:
-        return "Recreating the container…"
-    if "already up to date" in low:
-        return "No new code; rebuilding…"
-    return None
-
-
-@dataclass
-class UpdateJob:
-    id: str
-    started_at: str
-    state: str = "starting"
-    lines: list[str] = field(default_factory=list)
-    returncode: Optional[int] = None
-    finished_at: Optional[str] = None
-    phase: str = "Starting…"
-
-    def append(self, line: str) -> None:
-        self.lines.append(line)
-        if len(self.lines) > 1000:
-            del self.lines[: len(self.lines) - 1000]
-
-
-class MatrixBridgeManager:
-    def __init__(self, settings: Settings) -> None:
-        self.settings = settings
-        self.lock = asyncio.Lock()
-        self.jobs: dict[str, UpdateJob] = {}
-        self.current_job_id: Optional[str] = None
-
-    def _configured(self) -> bool:
-        s = self.settings
-        return bool(s.matrix_bridge_host and s.matrix_bridge_user)
-
-    def get(self, job_id: str) -> UpdateJob | None:
-        return self.jobs.get(job_id)
-
-    async def fetch_logs(self, tail: int = 100) -> dict:
-        """One-shot `docker logs --tail N <container>` (stderr merged in)."""
-        s = self.settings
-        if not self._configured():
-            return {"ok": False, "error": "matrix-bridge host not configured"}
-        tail = max(1, min(int(tail), 1000))
-        # tail is already int-clamped, but quote at the sink anyway so the
-        # shellsafe convention (no raw interpolation into an SSH command) holds
-        # regardless of caller.
-        cmd = f"docker logs --tail {quote_arg(str(tail))} {quote_arg(s.matrix_bridge_container)} 2>&1"
-        rc, out, err = await ssh_run(
-            s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, timeout=20
-        )
-        return {
-            "ok": rc == 0,
-            "rc": rc,
-            "container": s.matrix_bridge_container,
-            "output": (out or err).strip(),
-        }
-
-    async def trigger_update(self) -> UpdateJob:
-        if not self._configured():
-            raise RuntimeError("matrix-bridge host not configured")
-        if self.lock.locked():
-            raise RuntimeError("An update is already in progress")
-        job = UpdateJob(
-            id=uuid.uuid4().hex[:8],
-            started_at=datetime.now(timezone.utc).isoformat(),
-        )
-        self.jobs[job.id] = job
-        self.current_job_id = job.id
-        asyncio.create_task(self._run(job))
-        return job
-
-    async def _run(self, job: UpdateJob) -> None:
-        async with self.lock:
-            try:
-                await self._do(job)
-                if job.state != "failed":
-                    job.state = "done"
-                    job.returncode = 0
-                    job.phase = "Done"
-            except asyncio.TimeoutError:
-                job.append(f"[error] update timed out after {_UPDATE_TIMEOUT_S}s")
-                job.state = "failed"
-                job.returncode = 124
-                job.phase = "Timed out"
-            except Exception as e:
-                job.append(f"[error] {type(e).__name__}: {e}")
-                job.state = "failed"
-                if job.returncode is None:
-                    job.returncode = 1
-            finally:
-                job.finished_at = datetime.now(timezone.utc).isoformat()
-                if self.current_job_id == job.id:
-                    self.current_job_id = None
-
-    async def _do(self, job: UpdateJob) -> None:
-        s = self.settings
-        cmd = build_update_command(s.matrix_bridge_dir, s.matrix_bridge_branch)
-        job.append(f"$ {cmd}")
-        job.state = "running"
-        job.phase = "Fetching latest code…"
-
-        handle = StreamHandle()
-        gen = ssh_stream(s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, handle=handle)
-        deadline = time.monotonic() + _UPDATE_TIMEOUT_S
-        try:
-            while True:
-                remaining = deadline - time.monotonic()
-                if remaining <= 0:
-                    raise asyncio.TimeoutError
-                try:
-                    line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
-                except StopAsyncIteration:
-                    break
-                job.append(line)
-                phase = _phase_for(line)
-                if phase:
-                    job.phase = phase
-        finally:
-            # Closing the generator terminates the underlying ssh process and
-            # populates handle.returncode via ssh_stream's finally block.
-            await gen.aclose()
-
-        rc = handle.returncode or 0
-        if rc != 0:
-            job.state = "failed"
-            job.returncode = rc
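The removed `_do` loop bounds the whole update, not each line. Before every `asyncio.wait_for` it recomputes `remaining` from a single `time.monotonic()` deadline. Below is a runnable sketch of that pattern. A hypothetical `fake_stream` generator stands in for `ssh_stream`.

```python
import asyncio
import time
from typing import AsyncGenerator


async def fake_stream(lines: list[str], delay: float) -> AsyncGenerator[str, None]:
    # Hypothetical stand-in for ssh_stream: yields one line per `delay` seconds.
    for line in lines:
        await asyncio.sleep(delay)
        yield line


async def drain_with_deadline(gen: AsyncGenerator[str, None], budget_s: float) -> list[str]:
    """Collect lines until the generator ends; raise asyncio.TimeoutError if
    the stream as a whole (not any single line) runs past budget_s."""
    out: list[str] = []
    deadline = time.monotonic() + budget_s
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise asyncio.TimeoutError
            try:
                line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
            except StopAsyncIteration:
                break
            out.append(line)
    finally:
        # Always release the generator, on success, timeout, or error.
        await gen.aclose()
    return out
```

Consider instead a per-line `wait_for(..., timeout=_UPDATE_TIMEOUT_S)`. Its timer resets on every line, so a build that keeps printing output would never be cut off. The shared deadline caps total wall time.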
+5 -79
@@ -1,33 +1,14 @@
 from __future__ import annotations
-import logging
 from typing import Literal, Optional
 import yaml
-from pydantic import BaseModel, Field, model_validator
+from pydantic import BaseModel, Field

 from .overrides import apply_knobs_to_args, load_overrides
-from .shellsafe import quote_arg, quote_args, validate_local_path
-
-log = logging.getLogger(__name__)
-
-
-def _chat_template_path(vllm_args: list[str]) -> str | None:
-    """Extract the path from a `--chat-template=<path>` arg, if present."""
-    for a in vllm_args:
-        if a.startswith("--chat-template="):
-            return a.split("=", 1)[1]
-    return None
-
-
-def _is_within(path: str, base: str) -> bool:
-    """True if `path` is `base` itself or lives inside it (lexical check)."""
-    base = base.rstrip("/")
-    return path == base or path.startswith(base + "/")


 class ModelDef(BaseModel):
     display_name: str
-    repo: str = ""  # HF 'org/name'; empty for a local model
-    local_path: str | None = None  # absolute dir on the Spark; set => local model
+    repo: str
     size_gb: float
     mode: Literal["solo", "cluster"]
     capabilities: list[str] = Field(default_factory=list)
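The removed `_is_within` is a lexical prefix test. Appending `/` to the base stops a sibling directory such as `/models/ft-old` from matching `/models/ft`. No normalization happens, though, so a path that climbs out with `..` still passes. Below is a sketch comparing it with a hypothetical `posixpath.normpath`-based variant.

```python
import posixpath


def is_within(path: str, base: str) -> bool:
    # Same lexical rule as the removed _is_within().
    base = base.rstrip("/")
    return path == base or path.startswith(base + "/")


def is_within_normalized(path: str, base: str) -> bool:
    # Hypothetical stricter variant: collapse "." and ".." segments first.
    path = posixpath.normpath(path)
    base = posixpath.normpath(base).rstrip("/")
    return path == base or path.startswith(base + "/")
```

In the removed validator, the check serves to keep a `--chat-template` path inside the only directory mounted into the vLLM container. The normalized variant closes the `..` gap for that purpose.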
@@ -37,38 +18,6 @@ class ModelDef(BaseModel):
     knobs: dict | None = None # user-customized; merged at launch time
     custom: bool = False # True if this came from /data overrides
 
-    @model_validator(mode="after")
-    def _validate_source(self) -> "ModelDef":
-        if bool(self.repo) == bool(self.local_path):
-            raise ValueError(
-                f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
-                f"or 'local_path' (Spark directory)"
-            )
-        if self.local_path:
-            # Single place that enforces the path whitelist, so YAML/override
-            # entries get the same boundary check as the API. The quote_arg sink
-            # is still defense-in-depth.
-            validate_local_path(self.local_path)
-            # Only local_path is bind-mounted into the vLLM container, so any
-            # --chat-template path must live inside it or vLLM can't find it.
-            tmpl = _chat_template_path(self.vllm_args)
-            if tmpl is not None and not _is_within(tmpl, self.local_path):
-                raise ValueError(
-                    f"--chat-template path {tmpl!r} must be inside the model "
-                    f"directory {self.local_path!r} (only that directory is mounted "
-                    f"into the container)"
-                )
-        return self
-
-    @property
-    def is_local(self) -> bool:
-        return bool(self.local_path)
-
-    @property
-    def source(self) -> str:
-        """What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
-        return self.local_path if self.local_path else self.repo
-
 
 class Defaults(BaseModel):
     port: int = 8888
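The removed `_validate_source` validator requires each entry to name exactly one source. It also keeps any `--chat-template` path inside the bind-mounted model directory. The bodies of `_chat_template_path` and `_is_within` are not in this diff, so what follows is a minimal stdlib sketch of the same checks. It works on plain values rather than the pydantic model. The names `serve_target`, `chat_template_path`, `is_within` and `validate_entry` are hypothetical, and only the `--chat-template=<path>` spelling is assumed.

```python
from __future__ import annotations

import posixpath


def serve_target(repo: str, local_path: str | None) -> str:
    """Exactly one source: an HF repo or a local directory (never both, never neither)."""
    if bool(repo) == bool(local_path):
        raise ValueError("must set exactly one of 'repo' (HF) or 'local_path' (Spark directory)")
    return local_path if local_path else repo


def chat_template_path(vllm_args: list[str]) -> str | None:
    """Assumption: only the --chat-template=<path> spelling is recognised."""
    for arg in vllm_args:
        if arg.startswith("--chat-template="):
            return arg.split("=", 1)[1]
    return None


def is_within(path: str, directory: str) -> bool:
    """Lexical containment: collapse '..' segments, then compare whole path components."""
    p, d = posixpath.normpath(path), posixpath.normpath(directory)
    try:
        return posixpath.commonpath([p, d]) == d
    except ValueError:  # e.g. mixing an absolute and a relative path
        return False


def validate_entry(repo: str, local_path: str | None, vllm_args: list[str]) -> str:
    """Return what `vllm serve` would be pointed at, or raise ValueError."""
    target = serve_target(repo, local_path)
    if local_path:
        tmpl = chat_template_path(vllm_args)
        if tmpl is not None and not is_within(tmpl, local_path):
            raise ValueError(f"--chat-template path {tmpl!r} must be inside {local_path!r}")
    return target
```

`is_within` is purely lexical. It collapses `..` segments but does not follow symlinks, so a link inside the model directory that points elsewhere would still pass.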
@@ -97,8 +46,7 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
             continue
         defaults_dump = {
             "display_name": entry.get("display_name", key),
-            "repo": entry.get("repo", ""),
-            "local_path": entry.get("local_path"),
+            "repo": entry["repo"],
             "size_gb": float(entry.get("size_gb", 0)),
             "mode": entry.get("mode", "solo"),
             "capabilities": entry.get("capabilities") or [],
@@ -108,12 +56,7 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
             "knobs": entry.get("knobs"),
             "custom": True,
         }
-        # A single malformed override entry (bad path, missing source, etc.) must
-        # not take down the whole catalog — skip it and keep the rest loadable.
-        try:
-            new_models[key] = ModelDef.model_validate(defaults_dump)
-        except Exception as e:
-            log.warning("skipping invalid custom model %r: %s", key, e)
+        new_models[key] = ModelDef.model_validate(defaults_dump)
 
     return Catalog(defaults=catalog.defaults, models=new_models)
 
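The `-` side of these two `_merge_overrides` hunks reads `repo` with a default, so a local-only entry still builds. It also wraps validation so that one bad override is skipped and logged instead of failing the whole catalog. Below is a minimal sketch of that skip-and-log loop. It assumes a hypothetical `normalise_entry` helper in place of `ModelDef.model_validate` and a simplified list-of-dicts input. The original catches `Exception`; here the except clause is narrowed to the errors the helper can raise.

```python
from __future__ import annotations

import logging

log = logging.getLogger("catalog-sketch")


def normalise_entry(key: str, entry: dict) -> dict:
    """Build the per-model dict the way defaults_dump does (subset of fields)."""
    dump = {
        "display_name": entry.get("display_name", key),
        "repo": entry.get("repo", ""),  # absent for a local model
        "local_path": entry.get("local_path"),
        "size_gb": float(entry.get("size_gb", 0)),  # ValueError on e.g. "big"
        "mode": entry.get("mode", "solo"),
    }
    if bool(dump["repo"]) == bool(dump["local_path"]):
        raise ValueError("must set exactly one of 'repo' or 'local_path'")
    return dump


def merge_custom(entries: list[dict]) -> dict[str, dict]:
    models: dict[str, dict] = {}
    for entry in entries:
        key = entry.get("key")
        if not key:
            continue
        try:
            models[key] = normalise_entry(key, entry)
        except (ValueError, TypeError) as e:
            # One malformed override must not take down the whole catalog.
            log.warning("skipping invalid custom model %r: %s", key, e)
    return models
```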
@@ -134,21 +77,4 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
     solo = "--solo " if model.mode == "solo" else ""
     base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
     args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
-    # source + args are user-controlled (custom models, knobs); shlex.quote each
-    # so they cannot break out of the SSH shell command. shlex.split (used by the
-    # vLLM pre-flight validator) cleanly reverses this quoting.
-    prefix = ""
-    if model.local_path:
-        # A local model's directory isn't in the HF cache the launch script
-        # already mounts, so bind-mount it at the SAME path inside the vllm
-        # container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
-        # path inside and out means `vllm serve <dir>` and any
-        # `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
-        # change needed. (The env assignment sits before the script, so the
-        # validator's `serve`-keyed shlex round-trip is unaffected.)
-        mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
-        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
-    return (
-        f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
-        f"{quote_arg(model.source)} {quote_args(args)}"
-    )
+    return f"./launch-cluster.sh {solo}-d exec vllm serve {model.repo} {' '.join(args)}"
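The removed `build_launch_command` body shell-quotes the model source and every argument; its comment names `shlex.quote`. For a local model it also prepends a `VLLM_SPARK_EXTRA_DOCKER_ARGS` bind-mount assignment ahead of the script. The project's `quote_arg` / `quote_args` helpers are not shown, so this sketch calls `shlex.quote` directly. `serve_args` is a hypothetical stand-in for the shlex-based pre-flight validator. It shows that the quoting round-trips and that the env prefix leaves the tokens after `serve` unchanged.

```python
from __future__ import annotations

import shlex


def build_launch_command(source: str, args: list[str], solo: bool, local_path: str | None = None) -> str:
    solo_flag = "--solo " if solo else ""
    prefix = ""
    if local_path:
        # Bind-mount the local dir at the same path inside the container.
        mount = shlex.quote(f"-v {local_path}:{local_path}")
        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
    quoted = " ".join(shlex.quote(a) for a in args)
    return f"{prefix}./launch-cluster.sh {solo_flag}-d exec vllm serve {shlex.quote(source)} {quoted}"


def serve_args(cmd: str) -> list[str]:
    """What a shlex-based validator would recover after the `serve` token."""
    tokens = shlex.split(cmd)
    return tokens[tokens.index("serve") + 1:]
```

Plain HF repo names and `--key=value` flags contain only characters that `shlex.quote` leaves alone, so the common case stays readable. Only values with shell metacharacters or spaces get wrapped in single quotes.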
+12
-23
@@ -18,7 +18,6 @@ from datetime import datetime, timezone
 from typing import Optional
 
 from .config import Settings
-from .shellsafe import quote_arg
 from .ssh import ssh_stream, StreamHandle
 
 
@@ -139,40 +138,30 @@ class NimManager:
 
     async def _do(self, job: NimInstallJob, extra_env: dict[str, str]) -> None:
         # Build the bash one-liner. We use docker login non-interactively with the NGC API key.
-        # The real docker commands use shlex.quote'd values (img/ctr/vol) so nothing
-        # user-controlled can break out of the SSH shell. The cosmetic `echo` log lines
-        # embed the *raw* values inside single quotes — safe because image/container are
-        # validated against a metacharacter-free whitelist at the API boundary, and
-        # volume/port derive from them. (Embedding shlex.quote output inside another
-        # quoted echo string would be wrong — it can re-expose $() / $VAR.)
-        img = quote_arg(job.image)
-        ctr = quote_arg(job.container)
-        vol = quote_arg(job.volume)
-        port = int(job.port) # int can't inject; coerce defensively
-        env_parts = ['-e NGC_API_KEY=$NGC_API_KEY']
+        env_parts = [f'-e NGC_API_KEY=$NGC_API_KEY']
         for k, v in extra_env.items():
-            env_parts.append(f"-e {quote_arg(k)}={quote_arg(v)}")
+            env_parts.append(f"-e {k}={v}")
         env_str = " ".join(env_parts)
         cmd = (
             f"set -e; "
-            f"export NGC_API_KEY={quote_arg(self.settings.ngc_api_key or '')}; "
+            f"export NGC_API_KEY='{self.settings.ngc_api_key}'; "
             f"echo '=== docker login nvcr.io ==='; "
             f"echo \"$NGC_API_KEY\" | docker login nvcr.io -u '$oauthtoken' --password-stdin; "
             f"echo '=== docker pull {job.image} (this can be 1-10 GB) ==='; "
-            f"docker pull {img}; "
+            f"docker pull {job.image}; "
             f"echo '=== remove any prior container with the same name ==='; "
-            f"docker rm -f {ctr} 2>/dev/null || true; "
-            f"echo '=== docker run -d --gpus all -p {job.port}:{job.port} -v {job.volume}:/opt/nim/.cache --name {job.container} --restart unless-stopped {job.image} ==='; "
+            f"docker rm -f {job.container} 2>/dev/null || true; "
+            f"echo '=== docker run -d --gpus all -p {job.port}:{job.port} -v {job.volume}:/opt/nim/.cache {env_str} --name {job.container} --restart unless-stopped {job.image} ==='; "
             f"docker run -d --gpus all "
-            f"-p {port}:{port} "
-            f"-v {vol}:/opt/nim/.cache "
+            f"-p {job.port}:{job.port} "
+            f"-v {job.volume}:/opt/nim/.cache "
             f"{env_str} "
-            f"--name {ctr} "
+            f"--name {job.container} "
             f"--restart unless-stopped "
-            f"{img}; "
+            f"{job.image}; "
             f"echo '=== ensuring cache volume is writable by uid 1000 (riva-server) ==='; "
-            f"docker run --rm -v {vol}:/cache alpine chown -R 1000:1000 /cache && "
-            f"docker restart {ctr}; "
+            f"docker run --rm -v {job.volume}:/cache alpine chown -R 1000:1000 /cache && "
+            f"docker restart {job.container}; "
             f"echo '=== install complete; container is starting up and will download its model on first boot ==='"
         )
         job.append(f"$ <install command for {job.image} on {job.host}>")
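The `-` side of this hunk quotes every value that reaches a real `docker` command and coerces the port to `int`. For the raw values echoed in log lines, it relies on an API-boundary whitelist. The `+` side interpolates `job.image`, `job.container`, `job.volume` and `extra_env` unquoted, and wraps the NGC key in plain single quotes. Below is a minimal sketch of the quoted pattern. It assumes a hypothetical `_SAFE_TOKEN` whitelist, because the real `validate_image` / `validate_container` rules are not part of this diff.

```python
import re
import shlex

# Hypothetical whitelist; the real validate_image / validate_container rules
# live in .shellsafe and are not part of this diff.
_SAFE_TOKEN = re.compile(r"[A-Za-z0-9._/:@-]+")


def checked(value: str) -> str:
    if not _SAFE_TOKEN.fullmatch(value):
        raise ValueError(f"unsafe value: {value!r}")
    return value


def install_cmd(image: str, container: str, volume: str, port: int, extra_env: dict[str, str]) -> str:
    image, container = checked(image), checked(container)
    img, ctr, vol = shlex.quote(image), shlex.quote(container), shlex.quote(volume)
    port = int(port)  # an int cannot carry shell syntax
    env = " ".join(f"-e {shlex.quote(k)}={shlex.quote(v)}" for k, v in extra_env.items())
    return (
        "set -e; "
        # Log line embeds the raw, whitelisted value inside single quotes.
        f"echo '=== docker pull {image} ==='; "
        f"docker pull {img}; "
        f"docker rm -f {ctr} 2>/dev/null || true; "
        f"docker run -d --gpus all -p {port}:{port} -v {vol}:/opt/nim/.cache {env} "
        f"--name {ctr} --restart unless-stopped {img}"
    )


# Why the echo lines do not reuse quote() output: nesting it inside another
# single-quoted string closes the outer quote and leaves $(...) unquoted.
nested = f"echo '{shlex.quote('$(id)')}'"  # -> echo ''$(id)''
```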
@@ -14,7 +14,7 @@ Shape:
 custom:
   - key: my-new-model
     display_name: My New Model (from download)
-    repo: my-org/my-model # an HF repo; OR set local_path instead (exactly one)
+    repo: my-org/my-model
     size_gb: 20
     mode: solo
     description: null
@@ -25,12 +25,6 @@ Shape:
       fastsafetensors: true
       prefix_caching: true
       kv_cache_dtype: fp8
-  - key: my-finetune # a local/fine-tuned model (a directory on the Spark)
-    display_name: My Fine-tune
-    local_path: /home/you/models/my-finetune
-    size_gb: 59
-    mode: solo
-    vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
 """
 from __future__ import annotations
 import os
@@ -28,8 +28,8 @@ import scrub as R # noqa: E402 (vendored engine)
 import test_scrub_leak as REF # noqa: E402 (reference fixtures)
 
 # Build the gateway app against a throwaway map store.
-os.environ.setdefault("SPARK1_HOST", "<spark-1-ip>")
-os.environ.setdefault("SPARK2_HOST", "<spark-2-ip>")
+os.environ.setdefault("SPARK1_HOST", "192.168.1.103")
+os.environ.setdefault("SPARK2_HOST", "192.168.1.87")
 from app.config import Settings # noqa: E402
 from app.redaction_gateway import build_router, MapStore # noqa: E402
 
+70
-414
@@ -3,32 +3,28 @@ import asyncio
 import json
 from pathlib import Path
 
-from fastapi import FastAPI, HTTPException, Query, Request
+from fastapi import FastAPI, HTTPException
 from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
 from fastapi.staticfiles import StaticFiles
-from pydantic import BaseModel, ValidationError
+from pydantic import BaseModel
 from typing import Literal
 
 from .config import Settings
 from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
-from .coordination import LockHeld, ScheduleRegistry, SwapLockManager, WebhookNotifier, valid_schedule_id
 from .custom_services import add_custom_service, delete_custom_service
 from .audio_proxy import build_router as build_audio_router
 from .deep_health import DeepHealth
-from .discovery import build_menu, infer_recipe, repo_to_key
-from .disk import delete_from_disk, probe_host, read_model_config
+from .disk import delete_from_disk, probe_disk
 from .download import DownloadManager
 from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
-from .matrix_bridge import MatrixBridgeManager
-from .models import ModelDef, load_catalog
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .models import load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
-from .overrides import add_custom, delete_custom, load_overrides, set_knobs
+from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
 from .services import docker_state, run_action, services_from_settings
-from .shellsafe import validate_container, validate_image, validate_repo
 from .speech_models import SpeechModelsManager
 from .ssh import ssh_run
 from .swap import SwapManager
@@ -39,65 +35,17 @@ from .wol import send_local_broadcast, send_via_peer
 
 settings = Settings.from_env()
 catalog = load_catalog(settings.models_yaml)
-# Coordination layer (GPU arbiter): swap-lifecycle webhook, the swap reservation
-# lock, and the read-only schedule registry. See coordination.py.
-swap_webhook = WebhookNotifier(settings.swap_webhook_url, settings.swap_webhook_secret)
-swap_lock = SwapLockManager()
-schedule_registry = ScheduleRegistry()
-swap_manager = SwapManager(settings, catalog, notifier=swap_webhook)
+swap_manager = SwapManager(settings, catalog)
 download_manager = DownloadManager(settings)
 update_manager = UpdateManager(settings)
 hardware_probe = HardwareProbe(settings)
 nim_manager = NimManager(settings)
 deep_health = DeepHealth(settings)
 speech_models = SpeechModelsManager(settings)
-matrix_bridge = MatrixBridgeManager(settings)
 
 app = FastAPI(title="spark-control", version="0.1.0")
 
 
-# ---- Same-origin (CSRF) guard on state-mutating control endpoints ----
-# The app ships no API auth by design (LAN/VPN-only, no public interface). That
-# makes the realistic remote threat a *browser-driven CSRF*: a malicious page open
-# in the operator's browser silently POSTing to the control endpoints (swap, NIM
-# install, service stop, disk delete, …) while they're on the trusted network.
-# Browsers attach an Origin (and Referer) header to every cross-site state-changing
-# request, so we reject mutating requests whose Origin/Referer hostname doesn't
-# match the host the dashboard was served from. Programmatic consumers (Recap Relay,
-# CRM, Open WebUI, …) hit the proxy/data surface below and send no browser Origin,
-# so they're unaffected; the exempt prefixes are the cross-origin-by-design API.
-_CSRF_SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
-_CSRF_EXEMPT_PREFIXES = (
-    "/v1/", # OpenAI-compatible chat/audio/embeddings/rerank proxies
-    "/scrub", "/rehydrate", # redaction gateway (used by downstream apps)
-    "/api/search", # retrieval proxy
-    "/api/audio/", # diarize-chunk / label-merge / transcribe-with-speakers
-    "/api/health-event", # health reports posted by consumer apps
-)
-# Note: the coordination endpoints (/api/swap/lock, /api/schedule) are
-# intentionally NOT exempt. External schedulers are non-browser clients (no
-# Origin header) so they pass the guard already — same as /api/swap — while a
-# malicious page can't drive them from the operator's browser. Don't add them.
-
-
-@app.middleware("http")
-async def csrf_guard(request, call_next):
-    if request.method not in _CSRF_SAFE_METHODS and not request.url.path.startswith(_CSRF_EXEMPT_PREFIXES):
-        origin = request.headers.get("origin") or request.headers.get("referer")
-        if origin:
-            from urllib.parse import urlparse
-            origin_host = urlparse(origin).hostname
-            req_host = (request.headers.get("host") or "").rsplit(":", 1)[0]
-            # Only block when we can positively identify a mismatch; absence of a
-            # header (non-browser client) or an unparseable Host falls through.
-            if origin_host and req_host and origin_host != req_host:
-                return JSONResponse(
-                    status_code=403,
-                    content={"detail": "cross-origin request to a control endpoint was blocked"},
-                )
-    return await call_next(request)
-
-
 @app.on_event("startup")
 async def _start_deep_health() -> None:
     # Fire-and-forget; the loop catches its own exceptions.
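The removed `csrf_guard` middleware rejects a state-changing request when its `Origin` (or `Referer`) hostname differs from the `Host` the request was served on. It skips safe methods and the cross-origin-by-design prefixes. Below, the decision is pulled out of FastAPI into a pure function so it can be exercised directly. `is_blocked` is a hypothetical name, and it takes a plain header dict with lower-case keys.

```python
from urllib.parse import urlparse

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
EXEMPT_PREFIXES = ("/v1/", "/scrub", "/rehydrate", "/api/search", "/api/audio/", "/api/health-event")


def is_blocked(method: str, path: str, headers: dict[str, str]) -> bool:
    """True if a state-changing browser request comes from a different hostname.

    `headers` is assumed to use lower-case keys (Starlette's Headers are case-insensitive).
    """
    if method in SAFE_METHODS or path.startswith(EXEMPT_PREFIXES):
        return False
    origin = headers.get("origin") or headers.get("referer")
    if not origin:
        return False  # non-browser client: nothing to compare against
    origin_host = urlparse(origin).hostname
    req_host = (headers.get("host") or "").rsplit(":", 1)[0]
    # Block only on a positive mismatch; ports are ignored on both sides.
    return bool(origin_host and req_host and origin_host != req_host)
```

Like the original, this compares hostnames only. A page served from another port on the same host therefore passes. The `Host` value is neither lower-cased nor stripped of IPv6 brackets.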
@@ -162,65 +110,20 @@ def _reload_catalog() -> None:
     swap_manager.reload_catalog(catalog)
 
 
-def _recipe_summaries() -> list[dict]:
-    """Known launch recipes (bundled + saved), for the download panel's autocomplete.
-
-    These are NOT the menu — the menu is what's on disk. This is just the set of
-    repos Spark Control already knows how to launch, so the download box can
-    suggest them by name without putting phantom cards on the dashboard."""
-    out = []
-    for m in catalog.models.values():
-        if m.repo:
-            out.append({"repo": m.repo, "display_name": m.display_name, "mode": m.mode})
-    return out
-
-
 @app.get("/api/models")
 async def get_models() -> dict:
-    """The model menu = what's actually downloaded on the Sparks (one scan per
-    Spark), each annotated with its launch recipe or flagged `needs_setup`.
-
-    Does SSH, so it's the slower of the model endpoints; the front-end calls it on
-    load, after a swap/download/delete, and on a slow timer — not every poll."""
-    if not settings.configured:
-        return {"configured": False, "defaults": catalog.defaults.model_dump(), "models": {}, "recipes": []}
-    menu = await build_menu(settings, catalog)
+    out_models: dict[str, dict] = {}
+    for key, m in catalog.models.items():
+        d = m.model_dump()
+        # Always include effective knobs for the UI (defaults from base args + any overrides)
+        d["effective_knobs"] = {**extract_knobs_from_args(m.vllm_args), **(m.knobs or {})}
+        out_models[key] = d
     return {
-        "configured": True,
         "defaults": catalog.defaults.model_dump(),
-        "models": menu,
-        "recipes": _recipe_summaries(),
+        "models": out_models,
     }
 
 
-@app.get("/api/models/suggest")
-async def suggest_model(repo: str = Query(...)) -> dict:
-    """Read a downloaded model's config.json + size and propose a launch recipe.
-
-    Prefills the 'set up this model' form for an on-disk model that has no recipe
-    yet. The operator confirms/edits, then POSTs it to /api/models to save."""
-    if not settings.configured:
-        raise HTTPException(503, "spark1 not configured")
-    try:
-        validate_repo(repo)
-    except ValueError as e:
-        raise HTTPException(400, str(e))
-    hosts = [(settings.spark1_host, settings.spark1_user)]
-    if settings.spark2_host:
-        hosts.append((settings.spark2_host, settings.spark2_user))
-    # Config from whichever Spark has it; size summed across the Sparks that do.
-    sizes = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
-    total = sum(r.size_bytes for r in sizes if r.on_disk)
-    on_hosts = sum(1 for r in sizes if r.on_disk)
-    config = None
-    for (h, u), r in zip(hosts, sizes):
-        if r.on_disk:
-            config = await read_model_config(h, u, repo, settings)
-            if config is not None:
-                break
-    return infer_recipe(repo, config or {}, total, on_hosts)
-
-
 class KnobsBody(BaseModel):
     knobs: dict
 
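The removed `suggest_model` endpoint probes every configured Spark concurrently. It sums the on-disk size across the hosts that hold the model and reads `config.json` from the first host that returns one. Below is a sketch of just that aggregation. `HostProbe`, `probe` and `read_config` are hypothetical stand-ins for `probe_host` and `read_model_config`, whose real signatures also take a user and the settings.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class HostProbe:  # hypothetical stand-in for probe_host()'s result
    host: str
    on_disk: bool
    size_bytes: int


async def summarise(hosts, probe, read_config):
    """Sizes summed over hosts that have the model; config from the first host that yields one."""
    probes = await asyncio.gather(*(probe(h) for h in hosts))
    total = sum(p.size_bytes for p in probes if p.on_disk)
    on_hosts = sum(1 for p in probes if p.on_disk)
    config = None
    for p in probes:
        if p.on_disk:
            config = await read_config(p.host)
            if config is not None:
                break
    return total, on_hosts, config or {}
```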
@@ -239,8 +142,7 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
 class CustomModelBody(BaseModel):
     key: str
     display_name: str
-    repo: str = ""
-    local_path: str | None = None
+    repo: str
     size_gb: float = 0
     mode: Literal["solo", "cluster"] = "solo"
     description: str | None = None
@@ -253,19 +155,6 @@ class CustomModelBody(BaseModel):
 async def post_model(body: CustomModelBody) -> dict:
     if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
         raise HTTPException(400, "key must be alphanumeric/-/_ only")
-    # Validate the full entry BEFORE persisting (exactly-one source, local-path
-    # whitelist, chat-template location). Doing it via ModelDef means the API and
-    # the YAML-override path share one set of rules, and a bad entry can't be
-    # written to /data and then break catalog load.
-    try:
-        ModelDef.model_validate(body.model_dump())
-        if body.repo:
-            validate_repo(body.repo) # HF charset (the model only validates local paths)
-    except ValidationError as e:
-        msg = e.errors()[0]["msg"] if e.errors() else str(e)
-        raise HTTPException(400, msg.removeprefix("Value error, "))
-    except ValueError as e:
-        raise HTTPException(400, str(e))
     if body.key in catalog.models and not catalog.models[body.key].custom:
         raise HTTPException(409, f"'{body.key}' is a bundled model — pick a different key")
     add_custom(body.model_dump())
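`post_model` accepts a key only if it is non-empty and alphanumeric apart from `-` and `_`. The removed validation block also strips pydantic's `Value error, ` prefix before returning the message as a 400. Below is a small sketch of both rules. `valid_ascii_key` is a hypothetical stricter variant, included because `str.isalnum()` also accepts non-ASCII letters.

```python
def valid_key(key: str) -> bool:
    """The post_model rule: non-empty, letters/digits plus '-' and '_'."""
    return bool(key) and key.replace("-", "").replace("_", "").isalnum()


def valid_ascii_key(key: str) -> bool:
    """Stricter variant: str.isalnum() is Unicode-aware, so also require ASCII."""
    return valid_key(key) and key.isascii()


def clean_validation_message(msg: str) -> str:
    """pydantic v2 prefixes a ValueError raised in a validator with 'Value error, '."""
    return msg.removeprefix("Value error, ")
```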
@@ -284,43 +173,57 @@ async def del_model(key: str) -> dict:
     return {"ok": True, "key": key}
 
 
-@app.delete("/api/models/{key}/disk")
-async def del_model_disk(key: str) -> dict:
-    """Remove a model's weights from the Sparks — and thus from the menu, since the
-    menu IS the disk. Resolves the key against the live menu, so a discovered
-    model (no saved recipe) is deletable too.
-
-    Safety rails:
-    - Refuses a local/fine-tuned directory (hand-placed, not re-downloadable).
-    - Refuses if the model is currently loaded on vLLM.
-    - Refuses if a swap or this model's own download is in flight.
-    - Idempotent across both Sparks: an already-absent cache dir frees 0 bytes.
+@app.get("/api/models/disk-status")
+async def get_models_disk_status() -> dict:
+    """Probe each catalog model's HF cache on the appropriate Spark(s) in parallel.
+
+    Result is keyed by model key: {on_disk, total_bytes, per_host:[{host,on_disk,size_bytes,error?}]}.
+    Designed to be called once on dashboard load; takes ~1–3s depending on Spark count.
     """
     if not settings.configured:
-        raise HTTPException(503, "spark1 not configured")
-    menu = await build_menu(settings, catalog)
-    entry = menu.get(key)
-    if entry is None:
-        raise HTTPException(404, f"unknown model: {key}")
-
-    # Never rm a local fine-tune directory from the dashboard — it's irreplaceable
-    # training output the user placed by hand, not a re-downloadable HF cache.
-    if entry.get("local_path"):
-        raise HTTPException(
-            400,
-            "this is a local model; its directory must be managed on the Spark, not deleted from here",
-        )
-    repo = entry["repo"]
+        return {"configured": False, "models": {}}
+    keys = list(catalog.models.keys())
+    statuses = await asyncio.gather(*(
+        probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
+    ), return_exceptions=True)
+    out: dict[str, dict] = {}
+    for k, s in zip(keys, statuses):
+        if isinstance(s, Exception):
+            out[k] = {"on_disk": False, "total_bytes": 0, "per_host": [], "error": str(s)}
+            continue
+        out[k] = {
+            "on_disk": s.on_disk,
+            "total_bytes": s.total_bytes,
+            "per_host": [
+                {"host": r.host, "on_disk": r.on_disk, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
+                for r in s.per_host
+            ],
+        }
+    return {"configured": True, "models": out}
+
+
+@app.delete("/api/models/{key}/disk")
+async def del_model_disk(key: str) -> dict:
+    """Delete a model's weights from the Spark filesystem(s). The catalog entry stays.
+
+    Safety rails:
+    - Refuses if the model is currently loaded on vLLM.
+    - Refuses if a swap or download is in flight.
+    - Idempotent: if the cache dir is already gone on a host, that host reports 0 bytes freed.
+    """
+    if key not in catalog.models:
+        raise HTTPException(404, f"unknown model: {key}")
+    m = catalog.models[key]
 
     # Refuse if currently loaded
     try:
         vllm = await check_vllm(settings)
     except Exception:
         vllm = {}
-    if vllm.get("ok") and vllm.get("current_model") == repo:
+    if vllm.get("ok") and vllm.get("current_model") == m.repo:
         raise HTTPException(
             409,
-            f"'{entry['display_name']}' is the currently loaded model. Switch to a different model first, then try again."
+            f"'{m.display_name}' is the currently loaded model. Switch to a different model first, then try again."
         )
 
     # Refuse if a swap is in flight
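The `+` side's `get_models_disk_status` probes every catalog model in parallel with `asyncio.gather(..., return_exceptions=True)`. A failed probe becomes an `error` entry instead of failing the whole response. Below is a sketch of that pattern, with a hypothetical `probe` coroutine standing in for `probe_disk`. It checks `BaseException` rather than `Exception`, so a returned `CancelledError` (a `BaseException` subclass since Python 3.8) is handled too.

```python
import asyncio


async def probe_all(keys: list[str], probe) -> dict[str, dict]:
    """Probe every key concurrently; one failing probe becomes an error entry, not a 500."""
    results = await asyncio.gather(*(probe(k) for k in keys), return_exceptions=True)
    out: dict[str, dict] = {}
    for key, res in zip(keys, results):
        # Checking BaseException also covers a returned CancelledError.
        if isinstance(res, BaseException):
            out[key] = {"on_disk": False, "total_bytes": 0, "per_host": [], "error": str(res)}
            continue
        out[key] = res
    return out
```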
@@ -330,10 +233,10 @@ async def del_model_disk(key: str) -> dict:
     # Refuse if a download is in flight for this same repo (a different model's download is fine)
     if download_manager.current_job_id:
         job = download_manager.get(download_manager.current_job_id)
-        if job and job.repo == repo:
+        if job and job.repo == m.repo:
             raise HTTPException(409, "this model is currently downloading; cancel or wait for it to finish")
 
-    status = await delete_from_disk(repo, settings)
+    status = await delete_from_disk(m.repo, m.mode, settings)
     # Audit log
     record_report(
         f"disk:{key}",
@@ -344,7 +247,7 @@ async def del_model_disk(key: str) -> dict:
     return {
         "ok": True,
         "key": key,
-        "repo": repo,
+        "repo": m.repo,
         "bytes_freed": status.total_bytes,
         "per_host": [
             {"host": r.host, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
@@ -455,53 +358,6 @@ async def wake_spark(name: str) -> dict:
|
|||||||
return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}
|
return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}
|
||||||
|
|
||||||
|
|
||||||
@app.post("/api/spark/{name}/ssh-key")
|
|
||||||
async def spark_ssh_key(name: str) -> dict:
|
|
||||||
"""Ensure the named Spark has an ed25519 keypair and return its PUBLIC key.
|
|
||||||
|
|
||||||
This is the Spark's *outbound* identity — the key it uses to log in to other
|
|
||||||
machines (e.g. the operator's Mac). It is the opposite direction from, and
|
|
||||||
distinct from, the package's own key shown by the StartOS "Show Public Key"
|
|
||||||
action (which grants this dashboard SSH access to the Sparks).
|
|
||||||
|
|
||||||
Non-destructive: generates the key only if absent, never overwrites an
|
|
||||||
existing one (which may already be an identity the Spark uses elsewhere).
|
|
||||||
-    Public keys are not secret, so returning it is safe. No request-supplied
-    value reaches the command — `name` is constrained to a fixed set and
-    host/user come from operator config — so there is nothing to shell-quote.
-    """
-    if name not in ("spark1", "spark2"):
-        raise HTTPException(404, f"unknown spark: {name}")
-    host = settings.spark1_host if name == "spark1" else settings.spark2_host
-    user = settings.spark1_user if name == "spark1" else settings.spark2_user
-    if not host or not user:
-        raise HTTPException(400, f"{name} is not configured")
-    # Empty passphrase so the key is usable unattended; comment carries the
-    # remote hostname so it's identifiable in an authorized_keys file later.
-    cmd = (
-        "set -e; "
-        "mkdir -p ~/.ssh && chmod 700 ~/.ssh; "
-        "if [ ! -f ~/.ssh/id_ed25519 ]; then "
-        'ssh-keygen -t ed25519 -N "" -C "spark-control@$(hostname)" -f ~/.ssh/id_ed25519 >/dev/null 2>&1; '
-        "echo CREATED=1; else echo CREATED=0; fi; "
-        "[ -f ~/.ssh/id_ed25519.pub ] || ssh-keygen -y -f ~/.ssh/id_ed25519 > ~/.ssh/id_ed25519.pub; "
-        "echo PUBKEY=$(cat ~/.ssh/id_ed25519.pub)"
-    )
-    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=15)
-    if rc != 0:
-        raise HTTPException(502, f"couldn't read/create the SSH key on {name}: {err.strip() or out.strip() or f'rc={rc}'}")
-    created = False
-    pubkey = ""
-    for line in out.splitlines():
-        if line.startswith("CREATED="):
-            created = line.strip() == "CREATED=1"
-        elif line.startswith("PUBKEY="):
-            pubkey = line[len("PUBKEY="):].strip()
-    if not pubkey:
-        raise HTTPException(502, f"no public key returned from {name}")
-    return {"ok": True, "spark": name, "host": host, "user": user, "pubkey": pubkey, "created": created}
 
 
 @app.get("/api/services")
 async def get_services() -> dict:
     """Lifecycle state of always-on support services (Parakeet, Kokoro, …).
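
The removed SSH-key endpoint reports its result through `CREATED=` and `PUBKEY=` marker lines on stdout instead of a structured format, so any other output is simply ignored. Below is a minimal sketch of that parsing step as a pure function. The name `parse_key_markers` is hypothetical; in the diff the loop is inline. Pulling it out makes the marker protocol testable without SSH.

```python
from __future__ import annotations


def parse_key_markers(out: str) -> tuple[bool, str]:
    """Return (created, pubkey) from the remote script's stdout.

    Only lines starting with CREATED= or PUBKEY= matter; any other output is
    ignored, mirroring the inline loop in the removed endpoint.
    """
    created = False
    pubkey = ""
    for line in out.splitlines():
        if line.startswith("CREATED="):
            created = line.strip() == "CREATED=1"
        elif line.startswith("PUBKEY="):
            pubkey = line[len("PUBKEY="):].strip()
    return created, pubkey


# Placeholder output; a real key body is much longer.
sample = "CREATED=0\nPUBKEY=ssh-ed25519 AAAAexample spark-control@spark1\n"
print(parse_key_markers(sample))
```

An empty `pubkey` is the case the endpoint turns into a 502.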
@@ -528,15 +384,6 @@ async def get_services() -> dict:
             http = await check_embeddings(settings)
         elif name == "qdrant":
             http = await check_qdrant(settings)
-        elif svc.kind == "vllm":
-            # An extra vLLM monitored on another Spark (registered as a custom
-            # service). Probe its own host/port, not the primary Spark 1 one.
-            http = await probe_vllm_endpoint(svc.host, svc.port)
-        elif svc.kind == "bot":
-            # No HTTP health endpoint (host networking, no port) — judged purely
-            # by docker state. http_ready stays None so the badge isn't pinned
-            # to a "Starting…" verdict that can never clear.
-            http = {"ok": None, "base_url": None}
         else:
             # Custom services expose a /health endpoint by convention.
             http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"}
@@ -547,13 +394,11 @@ async def get_services() -> dict:
             "container": svc.container,
             "kind": svc.kind,
             "base_url": http.get("base_url"),
-            # None (not False) for services with no HTTP surface (the bot), so
-            # the UI judges them by docker state alone instead of "Starting…".
-            "http_ready": None if svc.kind == "bot" else bool(http.get("ok")),
+            "http_ready": bool(http.get("ok")),
             # Prefer the check fn's own top-level model key (embeddings reports
             # it there); fall back to a model field inside detail for services
             # whose /health embeds it (parakeet).
-            "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
+            "model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
             "docker_state": docker.get("state"),
             "restart_count": docker.get("restart_count"),
             "started_at": docker.get("started_at"),
@@ -565,11 +410,8 @@ async def get_services() -> dict:
     results = await asyncio.gather(*[one(n) for n in services.keys()])
     for name, info in results:
         out[name] = info
-        # Feed http reachability into the connectivity log (transition-only).
-        # Skip services with no HTTP surface (http_ready is None) — they'd
-        # otherwise register as perpetually "down".
-        if info.get("http_ready") is not None:
-            record_state(name, bool(info.get("http_ready")))
+        # Feed http reachability into the connectivity log (transition-only)
+        record_state(name, bool(info.get("http_ready")))
     return out
 
 
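
Both sides of this hunk feed `record_state`, whose body is not part of this diff. The comments only say that it logs on transitions. The sketch below assumes a per-service last-known state, and the `TransitionLog` class is hypothetical. It also includes the removed rule that a `None` result (no HTTP surface) is skipped rather than recorded as down.

```python
from __future__ import annotations

import time


class TransitionLog:
    """Hypothetical stand-in for record_state(): keep only state changes."""

    def __init__(self) -> None:
        self._last: dict[str, bool] = {}
        self.events: list[tuple[float, str, bool]] = []

    def record(self, name: str, ok: bool | None) -> bool:
        """Record one probe result; return True if it produced a log entry."""
        if ok is None:
            # No HTTP surface to judge (the removed bot case): never log it as down.
            return False
        if self._last.get(name) == ok:
            return False  # same verdict as last time: deduplicated
        self._last[name] = ok
        self.events.append((time.time(), name, ok))
        return True


conn_log = TransitionLog()
for ok in (True, True, False, False, True):
    conn_log.record("parakeet", ok)
conn_log.record("matrix-bridge", None)
print([(name, ok) for _, name, ok in conn_log.events])
# [('parakeet', True), ('parakeet', False), ('parakeet', True)]
```

The first observation of a service always logs, because there is no previous state to compare against.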
@@ -593,11 +435,6 @@ class NimInstallBody(BaseModel):
 
 @app.post("/api/nim/install")
 async def post_nim_install(body: NimInstallBody) -> dict:
-    try:
-        validate_image(body.image)
-        validate_container(body.container)
-    except ValueError as e:
-        raise HTTPException(400, str(e))
     target_host = settings.spark1_host if body.host == "spark1" else settings.spark2_host
     target_user = settings.spark1_user if body.host == "spark1" else settings.spark2_user
     try:
@@ -674,7 +511,7 @@ async def stream_nim_install(job_id: str):
 @app.delete("/api/services/{name}")
 async def del_service(name: str) -> dict:
     # Only allow deleting custom services (not the bundled built-in keys)
-    if name in ("parakeet", "kokoro", "embeddings", "qdrant", "matrix-bridge"):
+    if name in ("parakeet", "kokoro", "embeddings", "qdrant"):
         raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)")
     delete_custom_service(name)
     return {"ok": True, "name": name}
@@ -693,81 +530,6 @@ async def service_action(name: str, action: str) -> dict:
     return {"name": name, "action": action, **result}
 
 
-# ---- matrix-bridge bot: update (git pull + rebuild) + logs ----
-# Status badge + start/stop/restart ride the generic /api/services machinery
-# above (the bot is a registered ServiceDef). Only the long-running Update and
-# the logs view need bespoke endpoints.
-
-def _serialize_mb_update(job) -> dict:
-    return {
-        "id": job.id,
-        "state": job.state,
-        "phase": job.phase,
-        "started_at": job.started_at,
-        "finished_at": job.finished_at,
-        "returncode": job.returncode,
-        "lines": job.lines,
-    }
-
-
-@app.post("/api/matrix-bridge/update")
-async def post_matrix_bridge_update() -> dict:
-    """Pull latest code, rebuild, and recreate the bot container. Long-running
-    (docker build) — returns a job id to stream."""
-    try:
-        job = await matrix_bridge.trigger_update()
-    except RuntimeError as e:
-        raise HTTPException(409 if "in progress" in str(e) else 503, str(e))
-    return {"job_id": job.id, "state": job.state}
-
-
-@app.get("/api/matrix-bridge/update/{job_id}")
-async def get_matrix_bridge_update(job_id: str) -> dict:
-    job = matrix_bridge.get(job_id)
-    if job is None:
-        raise HTTPException(404, "no such job")
-    return _serialize_mb_update(job)
-
-
-@app.get("/api/matrix-bridge/update/{job_id}/stream")
-async def stream_matrix_bridge_update(job_id: str, request: Request):
-    job = matrix_bridge.get(job_id)
-    if job is None:
-        raise HTTPException(404, "no such job")
-
-    async def gen():
-        sent = 0
-        last_phase = None
-        while True:
-            # An update can run for minutes; bail promptly if the client is gone
-            # rather than spinning the poll loop until the job's 25-min ceiling.
-            if await request.is_disconnected():
-                return
-            n = len(job.lines)
-            if n > sent:
-                for line in job.lines[sent:n]:
-                    yield f"data: {json.dumps({'line': line})}\n\n"
-                sent = n
-            if job.phase != last_phase:
-                yield f"event: phase\ndata: {json.dumps({'state': job.state, 'phase': job.phase})}\n\n"
-                last_phase = job.phase
-            if job.returncode is not None and sent >= len(job.lines):
-                yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
-                return
-            await asyncio.sleep(0.5)
-
-    return StreamingResponse(gen(), media_type="text/event-stream")
-
-
-@app.get("/api/matrix-bridge/logs")
-async def get_matrix_bridge_logs(tail: int = Query(100, ge=1, le=1000)) -> dict:
-    """Last N lines of `docker logs` for the bot container (stderr merged)."""
-    result = await matrix_bridge.fetch_logs(tail=tail)
-    if not result.get("ok"):
-        raise HTTPException(502, result.get("output") or result.get("error") or "could not read logs")
-    return result
-
-
 # ---- Speech model patch management ----
 
 @app.get("/api/speech-models")
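
The removed stream endpoint uses Server-Sent Events. Each build line goes out as an unnamed `data:` frame, while phase changes and completion use the named events `phase` and `done`. The sketch below reproduces that framing along with a simplified reader that ignores `id:`, `retry:` and comment lines. The function names and payload values are illustrative only.

```python
from __future__ import annotations

import json


def sse_frame(payload: dict, event: str | None = None) -> str:
    """Build one SSE frame the way the removed stream endpoint does."""
    head = f"event: {event}\n" if event else ""
    return f"{head}data: {json.dumps(payload)}\n\n"


def parse_sse(body: str) -> list[tuple[str, dict]]:
    """Split an SSE body into (event, payload) pairs.

    Simplified reader: frames end at a blank line, unnamed frames are reported
    as "message" (the SSE default), and id:/retry:/comment lines are ignored.
    """
    frames: list[tuple[str, dict]] = []
    for block in body.split("\n\n"):
        if not block.strip():
            continue
        event, data_lines = "message", []
        for line in block.split("\n"):
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data_lines.append(line[len("data: "):])
        frames.append((event, json.loads("\n".join(data_lines))))
    return frames


# Illustrative payload values only.
body = (
    sse_frame({"line": "example output line"})
    + sse_frame({"state": "running", "phase": "example-phase"}, event="phase")
    + sse_frame({"state": "done", "returncode": 0}, event="done")
)
for event, payload in parse_sse(body):
    print(event, payload)
```

`json.dumps` escapes newlines, so every payload fits on a single `data:` line. A blank line can therefore only mean the end of a frame, even when a build line contains unusual characters.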
@@ -831,20 +593,17 @@ async def get_endpoints() -> dict:
             "base_url": vllm.get("base_url"),
             "model": vllm.get("current_model"),
             "openai_compat": True,
-            "disabled": bool(vllm.get("disabled")),
         },
         "parakeet": {
             "ready": bool(parakeet.get("ok")),
             "base_url": parakeet.get("base_url"),
             "kind": "stt",
             "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
-            "disabled": bool(parakeet.get("disabled")),
         },
         "kokoro": {
             "ready": bool(kokoro.get("ok")),
             "base_url": kokoro.get("base_url"),
             "kind": "tts",
-            "disabled": bool(kokoro.get("disabled")),
         },
         "embeddings": {
             "ready": bool(embeddings.get("ok")),
@@ -853,14 +612,12 @@ async def get_endpoints() -> dict:
             "model": embeddings.get("model"),
             # The proxied OpenAI-compatible endpoints live on Spark Control itself.
             "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
-            "disabled": bool(embeddings.get("disabled")),
         },
         "qdrant": {
             "ready": bool(qdrant.get("ok")),
             "base_url": qdrant.get("base_url"),
             "kind": "vectordb",
             "collection": settings.qdrant_collection or None,
-            "disabled": bool(qdrant.get("disabled")),
         },
     }
 
@@ -874,15 +631,12 @@ async def get_status() -> dict:
         check_embeddings(settings),
         check_qdrant(settings),
     )
-    # Feed health into the connectivity log (deduped — only logs on transition).
-    # Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
-    # perpetually down.
-    for _name, _r in (
-        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
-        ("embeddings", embeddings), ("qdrant", qdrant),
-    ):
-        if not _r.get("disabled"):
-            record_state(_name, bool(_r.get("ok")))
+    # Feed health into the connectivity log (deduped — only logs on transition)
+    record_state("vllm", bool(vllm.get("ok")))
+    record_state("parakeet", bool(parakeet.get("ok")))
+    record_state("kokoro", bool(kokoro.get("ok")))
+    record_state("embeddings", bool(embeddings.get("ok")))
+    record_state("qdrant", bool(qdrant.get("ok")))
     current_key = _identify_current_model(vllm.get("current_model"))
     return {
         "configured": settings.configured,
@@ -899,13 +653,10 @@ async def get_status() -> dict:
 def _identify_current_model(repo: str | None) -> str | None:
     if not repo:
         return None
-    # A recipe-backed model keys by its recipe key; a discovered model (loaded but
-    # not yet set up) keys by the same slug build_menu uses, so it still
-    # highlights as the active card.
     for key, m in catalog.models.items():
         if m.repo == repo:
             return key
-    return repo_to_key(repo)
+    return None
 
 
 class SwapRequest(BaseModel):
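
On the left side, `_identify_current_model` falls back to `repo_to_key(repo)`, so a model that is loaded but has no recipe still highlights as the active card. The right side returns `None` instead. `repo_to_key` is not shown in this diff, so the slug rule below is an assumption made only to illustrate the fallback. `Model` is likewise a hypothetical stand-in for a catalog entry.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Model:
    repo: str  # stand-in for a catalog entry; only .repo is consulted


def slug_for_repo(repo: str) -> str:
    # Assumed slug rule (repo_to_key's real rule is not in the diff):
    # lowercase, and collapse every run of non-alphanumerics to one '-'.
    return re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")


def identify_current_model(repo: str | None, models: dict[str, Model]) -> str | None:
    if not repo:
        return None
    for key, m in models.items():
        if m.repo == repo:
            return key  # recipe-backed: keyed by its recipe key
    return slug_for_repo(repo)  # loaded but not set up: keyed by slug


catalog = {"small-chat": Model(repo="example-org/Small-Chat-1")}
print(identify_current_model("example-org/Small-Chat-1", catalog))       # small-chat
print(identify_current_model("example-org/Example-Model-1.5", catalog))  # example-org-example-model-1-5
```

The fallback only highlights a card if the menu builder derives its keys with the same rule. That shared rule is the point of the removed comment about "the same slug build_menu uses".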
@@ -923,21 +674,9 @@ async def validate_swap(key: str) -> dict:
 
 
 @app.post("/api/swap")
-async def post_swap(req: SwapRequest, request: Request) -> dict:
+async def post_swap(req: SwapRequest) -> dict:
     if not settings.configured and not req.dry_run:
         raise HTTPException(503, "spark1 not configured")
-    # Enforce the swap reservation lock (the GPU arbiter). A held lock blocks any
-    # real swap that doesn't present the holder's token in X-Swap-Lock-Token — so
-    # an external scheduler that holds the lock can swap, but the dashboard (no
-    # token) is refused while someone else holds it. Dry runs don't touch the
-    # cluster, so they're exempt.
-    if not req.dry_run:
-        blocked = swap_lock.is_blocked_by(request.headers.get("x-swap-lock-token"))
-        if blocked is not None:
-            raise HTTPException(status_code=423, detail={
-                "error": "the GPU swap path is reserved by another holder",
-                "lock": blocked,
-            })
     try:
         job = await swap_manager.trigger(req.model_key, dry_run=req.dry_run)
     except KeyError:
@@ -992,89 +731,6 @@ async def stream_swap(job_id: str):
     return StreamingResponse(gen(), media_type="text/event-stream")
 
 
-# ---- Coordination layer: swap lock + schedule registry ----
-# Endpoints are control-surface, not browser-exempt: an external scheduler is a
-# non-browser client (no Origin header) so it passes the CSRF guard already, the
-# same way it calls /api/swap today; the dashboard is same-origin.
-
-class LockAcquireRequest(BaseModel):
-    holder: str
-    ttl_seconds: int | None = None
-    note: str = ""
-    token: str | None = None # present only to extend an existing hold
-
-
-@app.post("/api/swap/lock")
-async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
-    """Reserve the GPU swap path. Returns a secret token used to swap (header
-    X-Swap-Lock-Token) and to release. 409 if held by another holder."""
-    try:
-        lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
-    except ValueError as e:
-        raise HTTPException(422, str(e))
-    except LockHeld as e:
-        raise HTTPException(status_code=409, detail={
-            "error": "swap lock is held by another holder",
-            "lock": e.state,
-        })
-    return {**swap_lock.status(), "token": lock.token}
-
-
-@app.get("/api/swap/lock")
-async def get_swap_lock() -> dict:
-    """Public, token-free view of the reservation: held? who? until when?"""
-    return swap_lock.status()
-
-
-@app.delete("/api/swap/lock")
-async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
-    """Release the reservation. Needs the matching X-Swap-Lock-Token unless
-    ?force=true (the human override from the dashboard)."""
-    token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
-    try:
-        released = swap_lock.release(token, force=force)
-    except PermissionError as e:
-        raise HTTPException(403, str(e))
-    return {"released": released, **swap_lock.status()}
-
-
-class ScheduleRequest(BaseModel):
-    name: str
-    id: str | None = None
-    owner: str = ""
-    cron: str = ""
-    next_run: str = ""
-    description: str = ""
-
-
-@app.get("/api/schedule")
-async def list_schedules() -> dict:
-    return {"schedules": schedule_registry.list()}
-
-
-@app.post("/api/schedule")
-async def register_schedule(req: ScheduleRequest) -> dict:
-    """Register (or update, by id) a schedule an external scheduler owns. Spark
-    Control only stores it for the dashboard — it never executes it."""
-    try:
-        entry = schedule_registry.register(
-            name=req.name, id=req.id, owner=req.owner,
-            cron=req.cron, next_run=req.next_run, description=req.description,
-        )
-    except ValueError as e:
-        raise HTTPException(422, str(e))
-    return entry.public()
-
-
-@app.delete("/api/schedule/{schedule_id}")
-async def delete_schedule(schedule_id: str) -> dict:
-    # Whitelist the path segment at the boundary (repo convention), even though
-    # it's only ever a dict key — keeps it from being reflected or logged raw.
-    if not valid_schedule_id(schedule_id):
-        raise HTTPException(422, "invalid schedule id")
-    return {"deleted": schedule_registry.delete(schedule_id)}
-
-
 class DownloadRequest(BaseModel):
     repo: str
     mode: Literal["spark1", "spark2", "cluster"] = "spark1"
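
The removed coordination endpoints delegate to a `swap_lock` object whose implementation is not in this diff. Below is a minimal in-memory sketch of the contract those endpoints imply:

- `acquire` returns a secret token.
- `release` requires that token, or `force`.
- `is_blocked_by` returns the public state, or `None`.
- `LockHeld` carries the public state.

Three details are assumptions: the TTL default, the expiry handling, and the rule that only the token (not the holder name) grants an extension. All class names here are hypothetical.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass
from typing import Callable


class LockHeld(Exception):
    """Raised when someone else holds the lock; .state is the public view."""

    def __init__(self, state: dict) -> None:
        super().__init__("swap lock is held by another holder")
        self.state = state


@dataclass
class Hold:
    holder: str
    token: str
    expires_at: float
    note: str = ""


class SwapLock:
    """Hypothetical in-memory GPU swap reservation: one holder, secret token, TTL."""

    def __init__(self, default_ttl: float = 3600.0, clock: Callable[[], float] = time.time) -> None:
        self._hold: Hold | None = None
        self._default_ttl = default_ttl
        self._clock = clock

    def _current(self) -> Hold | None:
        # An expired hold counts as released.
        if self._hold is not None and self._hold.expires_at <= self._clock():
            self._hold = None
        return self._hold

    def _matches(self, token: str | None) -> bool:
        h = self._current()
        # compare_digest avoids leaking the token through timing differences.
        return h is not None and bool(token) and secrets.compare_digest(token, h.token)

    def status(self) -> dict:
        """Token-free view, safe to return from a public GET."""
        h = self._current()
        if h is None:
            return {"held": False}
        return {"held": True, "holder": h.holder, "expires_at": h.expires_at, "note": h.note}

    def acquire(self, holder: str, ttl_seconds: float | None = None, note: str = "", token: str | None = None) -> Hold:
        if not holder:
            raise ValueError("holder is required")
        ttl = self._default_ttl if ttl_seconds is None else ttl_seconds
        if ttl <= 0:
            raise ValueError("ttl_seconds must be positive")
        h = self._current()
        if h is not None and not self._matches(token):
            raise LockHeld(self.status())
        # Extending keeps the existing token; a fresh hold gets a new one.
        tok = h.token if h is not None else secrets.token_urlsafe(32)
        self._hold = Hold(holder, tok, self._clock() + ttl, note)
        return self._hold

    def is_blocked_by(self, token: str | None) -> dict | None:
        """None if a real swap may proceed with this token, else the public state."""
        if self._current() is None or self._matches(token):
            return None
        return self.status()

    def release(self, token: str | None, force: bool = False) -> bool:
        if self._current() is None:
            return False
        if not force and not self._matches(token):
            raise PermissionError("token does not match the current hold")
        self._hold = None
        return True


now = [1000.0]
lock = SwapLock(clock=lambda: now[0])
hold = lock.acquire("nightly-eval", ttl_seconds=60)
print(lock.is_blocked_by(None))        # dashboard without a token: blocked
print(lock.is_blocked_by(hold.token))  # the holder: None, so the swap may proceed
now[0] += 61
print(lock.status())                   # {'held': False} once the TTL lapses
```

In the removed endpoints, a non-`None` result from `is_blocked_by` becomes the 423 response, and a `LockHeld` raised by `acquire` becomes the 409.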
+4
-27
@@ -5,17 +5,13 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
 appropriate host.
 """
 from __future__ import annotations
-import logging
 import time
 from dataclasses import dataclass
 from typing import Literal, Optional
 
 from .config import Settings
-from .shellsafe import quote_arg
 from .ssh import ssh_run
 
-log = logging.getLogger(__name__)
-
 
 # Cache the "unreachable" verdict per (host, user) for a short period so that a
 # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -92,27 +88,10 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
             container=s.qdrant_container,
             port=s.qdrant_port,
         ),
-        # matrix-bridge Matrix bot. No HTTP port to probe (host networking, no
-        # health endpoint) — judged purely by docker state. Driven as its own
-        # SSH user (modelo, the repo owner) so git/docker run unprivileged.
-        "matrix-bridge": ServiceDef(
-            name="matrix-bridge",
-            kind="bot",
-            host=s.matrix_bridge_host,
-            user=s.matrix_bridge_user,
-            container=s.matrix_bridge_container,
-            port=0,
-        ),
     }
     for entry in load_custom_services():
         key = entry.get("key")
-        if not key:
-            continue
-        if key in out:
-            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
-            # an adopter who picked a colliding key for, say, a second vLLM sees
-            # why no tile appeared instead of a silent no-op.
-            log.warning("custom service %r collides with a built-in name; ignoring", key)
+        if not key or key in out:
             continue
         out[key] = ServiceDef(
             name=key,
@@ -122,9 +101,7 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
             container=entry.get("container", key),
             port=int(entry.get("port", 0)),
         )
-    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
-    # show no tile and are never probed/auto-restarted.
-    return {k: v for k, v in out.items() if k not in s.disabled_services}
+    return out
 
 
 async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
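
Across these two hunks, the left side does two things that the right side does not. It rejects a custom entry whose key collides with a built-in, logging a warning, and it drops anything listed in `disabled_services`. The right side skips collisions silently and filters nothing. Below is a sketch of the left-side behaviour over plain dicts. The function name, the dict shapes, and the ports are hypothetical stand-ins for `ServiceDef` and `Settings`.

```python
from __future__ import annotations

import logging

log = logging.getLogger(__name__)


def merge_services(
    builtins: dict[str, dict],
    custom: list[dict],
    disabled: set[str],
) -> dict[str, dict]:
    """Left-side semantics: built-ins win, collisions warn, disabled keys vanish."""
    out = dict(builtins)
    for entry in custom:
        key = entry.get("key")
        if not key:
            continue  # malformed entry: nothing to key it by
        if key in out:
            # Never let a custom entry shadow a built-in (or an earlier custom one).
            log.warning("custom service %r collides with a built-in name; ignoring", key)
            continue
        out[key] = {"container": entry.get("container", key), "port": int(entry.get("port", 0))}
    # Disabled services get no tile and are never probed.
    return {k: v for k, v in out.items() if k not in disabled}


builtins = {
    "parakeet": {"container": "parakeet-asr", "port": 9000},
    "kokoro": {"container": "kokoro", "port": 9001},
}
custom = [{"key": "kokoro", "port": 1}, {"key": "vllm-2", "port": "8001"}, {"container": "no-key"}]
print(merge_services(builtins, custom, disabled={"parakeet"}))
```

Filtering after the merge means that a disabled name still blocks a custom entry from reusing it, which matches the order of operations on the left side.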
@@ -134,7 +111,7 @@ async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
     if _is_recently_unreachable(svc.host, svc.user):
         return {"state": "unreachable", "host_unreachable": True, "restart_count": None, "uptime": None}
     cmd = (
-        f"docker inspect {quote_arg(svc.container)} "
+        f"docker inspect {svc.container} "
         f"--format '{{{{.State.Status}}}}|{{{{.State.StartedAt}}}}|{{{{.RestartCount}}}}|{{{{.State.ExitCode}}}}|{{{{.State.Error}}}}' "
         f"2>&1 || echo 'NOT_FOUND'"
     )
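
The `docker inspect` call asks for a single pipe-separated `Status|StartedAt|RestartCount|ExitCode|Error` line. Because stderr is merged with `2>&1`, a missing container produces an error message followed by the `NOT_FOUND` sentinel. The code that parses this output lies outside the diff. Below is a sketch of one way to do it; the `parse_inspect` name and the result keys are hypothetical.

```python
from __future__ import annotations


def _to_int(s: str) -> int | None:
    try:
        return int(s)
    except ValueError:
        return None


def parse_inspect(out: str) -> dict:
    """Parse the Status|StartedAt|RestartCount|ExitCode|Error line."""
    lines = [ln.strip() for ln in out.splitlines() if ln.strip()]
    # With 2>&1, a missing container prints an error message, then the sentinel.
    if not lines or lines[-1] == "NOT_FOUND":
        return {"state": "not_found"}
    # maxsplit=4 keeps any '|' inside the error text; pad in case fields are short.
    status, started_at, restarts, exit_code, error = (lines[-1].split("|", 4) + [""] * 5)[:5]
    return {
        "state": status,
        "started_at": started_at or None,
        "restart_count": _to_int(restarts),
        "exit_code": _to_int(exit_code),
        "error": error or None,
    }


print(parse_inspect("running|2024-05-01T10:00:00Z|2|0|\n"))
print(parse_inspect("Error: no such container\nNOT_FOUND\n"))
```

The sentinel is checked on the last line rather than by substring, so a container whose error text happens to contain "NOT_FOUND" is still parsed as a status line.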
@@ -164,7 +141,7 @@ async def run_action(settings: Settings, svc: ServiceDef, action: ServiceAction)
     """Run docker start/stop/restart on the target host."""
     if not svc.host or not svc.user:
         return {"ok": False, "error": "service host not configured"}
-    cmd = f"docker {action} {quote_arg(svc.container)}"
+    cmd = f"docker {action} {svc.container}"
     rc, out, err = await ssh_run(svc.host, svc.user, cmd, settings, timeout=30)
     return {
         "ok": rc == 0,
@@ -1,85 +0,0 @@
-"""Validation + safe-quoting for user-supplied values that cross into SSH shell
-commands on the Sparks.
-
-Two layers of defense (same spirit as disk.py's `_SAFE_DIRNAME`):
-  1. Validate at the API boundary against a strict whitelist — rejects junk
-     early with a clear error, and guarantees the value carries no shell
-     metacharacters (so it is also safe to drop into echo/log lines).
-  2. `quote_arg` / `quote_args` at the actual interpolation site — the real
-     guarantee: even a value that somehow skips validation cannot break out of
-     the command.
-
-Rule: anything user-controlled that ends up in an `ssh_run` / `ssh_stream`
-command string must go through one of these, never be raw f-string'd.
-"""
-from __future__ import annotations
-import re
-import shlex
-
-# Hugging Face repo 'org/name'. HF identifiers allow letters, digits, dot, dash,
-# underscore; exactly one slash separates org from name.
-_HF_REPO_RE = re.compile(r"^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$")
-
-# Docker/OCI image reference: registry/path/name[:tag][@sha256:digest].
-# Conservative charset covering e.g. nvcr.io/nim/nvidia/parakeet-...:latest and
-# @digest pins; excludes every shell metacharacter.
-_IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
-
-# Docker container / volume name (Docker's own rule).
-_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")
-
-# Absolute filesystem path to a local model directory on a Spark. Conservative
-# charset (letters, digits, and safe path punctuation) with a required leading
-# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
-# and '..' segments) is rejected separately in validate_local_path.
-_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")
-
-
-def validate_repo(repo: str) -> str:
-    """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
-    if not _HF_REPO_RE.fullmatch(repo or ""):
-        raise ValueError(f"invalid model repo (expected 'org/name'): {repo!r}")
-    return repo
-
-
-def validate_image(image: str) -> str:
-    """Return `image` if it is a well-formed container image ref; else ValueError."""
-    if not image or len(image) > 512 or not _IMAGE_RE.fullmatch(image):
-        raise ValueError(f"invalid container image reference: {image!r}")
-    return image
-
-
-def validate_container(name: str) -> str:
-    """Return `name` if it is a valid Docker container/volume name; else ValueError."""
-    if not name or len(name) > 128 or not _CONTAINER_RE.fullmatch(name):
-        raise ValueError(f"invalid container name: {name!r}")
-    return name
-
-
-def validate_local_path(path: str) -> str:
-    """Return `path` if it is a safe absolute model directory path; else ValueError.
-
-    For locally fine-tuned models served by directory (not an HF repo). Requires
-    an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
-    caller cannot traverse out of an intended models directory. The `quote_arg`
-    sink still quotes it in depth — this is the boundary check.
-    """
-    p = path or ""
-    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
-        raise ValueError(
-            f"invalid local model path (expected an absolute path, no spaces or "
-            f"shell metacharacters): {path!r}"
-        )
-    if any(seg in (".", "..") for seg in p.split("/")):
-        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
-    return p
-
-
-def quote_arg(value: object) -> str:
-    """shlex.quote a single token for safe embedding in a shell command string."""
-    return shlex.quote(str(value))
-
-
-def quote_args(values: object) -> str:
-    """shlex.quote each token and join with spaces."""
-    return " ".join(shlex.quote(str(v)) for v in values) # type: ignore[union-attr]
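
With this module deleted, the right side of the `docker inspect` and `docker {action}` hunks interpolates `svc.container` unquoted. The sketch below re-creates two of the deleted validators, with one error message shortened. It then shows the two layers the module docstring describes, using illustrative sample strings: quoting turns a hostile value into one inert argument, and validation rejects the value outright.

```python
import re
import shlex

# Re-created from the deleted module (one error message shortened).
_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")


def validate_container(name: str) -> str:
    if not name or len(name) > 128 or not _CONTAINER_RE.fullmatch(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name


def validate_local_path(path: str) -> str:
    p = path or ""
    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
        raise ValueError(f"invalid local model path: {path!r}")
    # The charset allows dots, so traversal segments are rejected separately.
    if any(seg in (".", "..") for seg in p.split("/")):
        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
    return p


hostile = "kokoro; echo pwned"
# Layer 2: quoting makes the whole value a single, inert shell word.
print(f"docker restart {shlex.quote(hostile)}")  # docker restart 'kokoro; echo pwned'
# Layer 1: validation rejects it before it reaches any command string.
try:
    validate_container(hostile)
except ValueError as e:
    print(e)
print(validate_local_path("/models/finetune/run-3"))
```

The regex alone would accept `/models/../etc`, because dots are legal path characters. That is why the segment check exists as a separate step.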
+120
-504
@@ -13,27 +13,18 @@ const state = {
   swap_progress: 0, // 0–1
   services: {},
   service_action_in_flight: null, // e.g. "parakeet:restart"
-  mb_update_in_flight: false, // matrix-bridge update job running
   hardware: {},
   config: {},
   configured: true,
   timer_handle: null,
   deep_health: {},
-  models_loaded: false, // true once the first disk scan (/api/models) returns
-  recipes: [], // known launch recipes (for the download autocomplete)
-  lock: { held: false }, // GPU swap reservation (coordination layer)
-  schedules: [], // schedules external automation has registered
+  disk_status: {}, // keyed by model key: { on_disk, total_bytes, per_host }
+  disk_status_loaded: false,
 };
 
 const el = (sel) => document.querySelector(sel);
 const $$ = (sel) => document.querySelectorAll(sel);
 
-// ISO timestamp -> local clock string (e.g. "2:45:10 PM"); '' if unparseable.
-function fmtClock(iso) {
-  const t = Date.parse(iso);
-  return isNaN(t) ? '' : new Date(t).toLocaleTimeString();
-}
-
 function escapeHtml(s) {
   if (s == null) return '';
   return String(s)
@@ -59,86 +50,69 @@ function renderCards() {
   const root = el('#cards');
   root.innerHTML = '';
   const isSwapping = !!state.swap_job_id;
-  // GPU reserved by external automation — manual swaps are refused server-side
-  // (423); reflect that in the buttons so the click never bounces.
-  const locked = !!(state.lock && state.lock.held);
-  const lockTip = locked
-    ? `Reserved by ${state.lock.holder || 'automation'}${state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : ''}`
-    : '';
-  const keys = Object.keys(state.models);
-  if (keys.length === 0) {
-    // The menu is the disk: nothing downloaded (or the scan hasn't returned yet).
-    root.innerHTML = state.models_loaded
-      ? `<div class="empty-menu muted">No models downloaded on the Sparks yet. Use <strong>+ Download a new model</strong> above to fetch one — it'll appear here when it's done.</div>`
-      : `<div class="empty-menu muted">Scanning the Sparks for downloaded models…</div>`;
-    return;
-  }
-  for (const key of keys) {
+  for (const key of Object.keys(state.models)) {
     const m = state.models[key];
     const isActive = key === state.current_model_key;
     const card = document.createElement('div');
-    card.className = 'card' + (isActive ? ' active' : '') + (m.needs_setup ? ' needs-setup' : '');
+    card.className = 'card' + (isActive ? ' active' : '');
     const desc = m.description
       ? `<div class="desc">${escapeHtml(m.description)}</div>`
       : '';
     const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
-    const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
-    // Every card on the menu is on disk by definition — show its real size.
-    const gb = (m.total_bytes || 0) / 1e9;
-    const diskPill = gb > 0
-      ? `<span class="tag on-disk" title="Weights present on the Spark(s)">on disk · ${gb.toFixed(1)} GB</span>`
-      : '';
-    const setupPill = m.needs_setup
-      ? `<span class="tag setup-pill" title="On disk, but Spark Control hasn't been told how to launch it">needs setup</span>`
-      : '';
-    // Trash = remove weights from disk AND from the menu. Disabled if active / mid-swap.
-    // Never offered for local models: their directory is hand-placed training output,
-    // not a re-downloadable HF cache (the server refuses the delete too).
+    // Disk-presence pill + trash button. Until /api/models/disk-status comes back,
+    // we don't know — render a neutral placeholder.
+    const disk = state.disk_status[key];
+    let diskPill = '';
+    if (state.disk_status_loaded) {
+      if (disk && disk.on_disk) {
+        const gb = (disk.total_bytes / 1e9);
+        diskPill = `<span class="tag on-disk" title="Weights present on disk">on disk · ${gb.toFixed(1)} GB</span>`;
+      } else {
+        diskPill = `<span class="tag not-on-disk" title="Weights not downloaded">not downloaded</span>`;
+      }
+    }
+    // Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
     let trashBtn = '';
-    if (!m.local_path) {
+    if (state.disk_status_loaded && disk && disk.on_disk) {
|
||||||
const disabled = isActive || isSwapping;
|
const disabled = isActive || isSwapping;
|
||||||
const tip = isActive
|
const tip = isActive
|
||||||
? 'Currently loaded — switch to another model first'
|
? 'Currently loaded — switch to another model first'
|
||||||
: isSwapping
|
: isSwapping
|
||||||
? 'A swap is in progress'
|
? 'A swap is in progress'
|
||||||
: 'Remove weights from disk & menu';
|
: 'Delete weights from disk';
|
||||||
trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Remove from disk and menu" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
|
trashBtn = `<button class="icon-btn danger" data-disk-del-key="${key}" title="${escapeHtml(tip)}" aria-label="Delete from disk" ${disabled ? 'disabled' : ''}>${trashIcon}</button>`;
|
||||||
}
|
}
|
||||||
// Primary action: "Current" / "Switch to this", or "Set up & switch" for a
|
// Primary card action: "Switch to this" (green) when on disk; "Download" (blue) when not.
|
||||||
// model on disk that has no launch recipe yet.
|
// Before disk-status loads we render the swap button as a sensible default.
|
||||||
const swapBlocked = isSwapping || locked;
|
const isOnDisk = !state.disk_status_loaded || (disk && disk.on_disk);
|
||||||
const lockTipAttr = locked ? ` title="${escapeHtml(lockTip)}"` : '';
|
const dlInFlight = !!(typeof dlState !== 'undefined' && dlState && dlState.job_id);
|
||||||
let primaryBtn = '';
|
let primaryBtn = '';
|
||||||
if (isActive) {
|
if (isActive) {
|
||||||
primaryBtn = `<button class="btn" disabled>Current</button>`;
|
primaryBtn = `<button class="btn" disabled>Current</button>`;
|
||||||
} else if (m.needs_setup) {
|
} else if (isOnDisk) {
|
||||||
primaryBtn = `<button class="btn primary" data-setup-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Set up & switch</button>`;
|
primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`;
|
||||||
} else {
|
} else {
|
||||||
primaryBtn = `<button class="btn primary" data-swap-key="${key}"${lockTipAttr} ${swapBlocked ? 'disabled' : ''}>Switch to this</button>`;
|
const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
|
||||||
|
primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
|
||||||
}
|
}
|
||||||
// The Test/Advanced controls need a saved recipe; hide them until setup is done.
|
|
||||||
const recipeActions = m.needs_setup ? '' : `
|
|
||||||
<button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
|
|
||||||
<button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>`;
|
|
||||||
card.innerHTML = `
|
card.innerHTML = `
|
||||||
<div class="name">${escapeHtml(m.display_name)}</div>
|
<div class="name">${escapeHtml(m.display_name)}</div>
|
||||||
<div class="meta">
|
<div class="meta">
|
||||||
<span class="tag mode-${m.mode}">${m.mode}</span>
|
<span class="tag mode-${m.mode}">${m.mode}</span>
|
||||||
${diskPill}
|
<span class="tag">${m.size_gb} GB</span>
|
||||||
${setupPill}
|
|
||||||
${customPill}
|
${customPill}
|
||||||
${localPill}
|
${diskPill}
|
||||||
${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
|
${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
|
||||||
</div>
|
</div>
|
||||||
${desc}
|
${desc}
|
||||||
<div class="muted small repo">
|
<div class="muted small repo">
|
||||||
${m.local_path
|
<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>
|
||||||
? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
|
|
||||||
: `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
|
|
||||||
</div>
|
</div>
|
||||||
<div class="spacer"></div>
|
<div class="spacer"></div>
|
||||||
<div class="card-actions">
|
<div class="card-actions">
|
||||||
${primaryBtn}${recipeActions}
|
${primaryBtn}
|
||||||
|
<button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
|
||||||
|
<button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>
|
||||||
${trashBtn}
|
${trashBtn}
|
||||||
</div>
|
</div>
|
||||||
<div class="test-result hidden" data-test-result-for="${key}"></div>
|
<div class="test-result hidden" data-test-result-for="${key}"></div>
|
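The primary-button logic in the version of `renderCards()` that adds `needs_setup` and the swap lock reduces to a small decision table. The sketch below restates those branches as a pure function so they can be checked without a DOM. `primaryAction` and its `{ label, attr, disabled }` return shape are hypothetical names for illustration, not part of the dashboard code.

```javascript
// Hypothetical restatement of the primary-button branches in renderCards().
// Returns the button label, the data-* attribute the click handler binds to,
// and whether the button should render disabled.
function primaryAction(model, { isActive, isSwapping, locked }) {
  // A held reservation blocks swaps just like an in-flight swap does
  // (the server would refuse the request with 423).
  const swapBlocked = isSwapping || locked;
  if (isActive) {
    return { label: 'Current', attr: null, disabled: true };
  }
  if (model.needs_setup) {
    // On disk, but there is no saved launch recipe yet.
    return { label: 'Set up & switch', attr: 'data-setup-key', disabled: swapBlocked };
  }
  return { label: 'Switch to this', attr: 'data-swap-key', disabled: swapBlocked };
}
```

Keeping `swapBlocked` as one flag means both non-active branches stay disabled under the same conditions, which matches how the diffed code reuses it for both buttons.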
||||||
@@ -148,8 +122,8 @@ function renderCards() {
|
|||||||
for (const btn of root.querySelectorAll('[data-swap-key]')) {
|
for (const btn of root.querySelectorAll('[data-swap-key]')) {
|
||||||
btn.addEventListener('click', () => triggerSwap(btn.dataset.swapKey));
|
btn.addEventListener('click', () => triggerSwap(btn.dataset.swapKey));
|
||||||
}
|
}
|
||||||
for (const btn of root.querySelectorAll('[data-setup-key]')) {
|
for (const btn of root.querySelectorAll('[data-download-key]')) {
|
||||||
btn.addEventListener('click', () => openSetupForKey(btn.dataset.setupKey));
|
btn.addEventListener('click', () => triggerDownloadForKey(btn.dataset.downloadKey));
|
||||||
}
|
}
|
||||||
for (const btn of root.querySelectorAll('[data-adv-key]')) {
|
for (const btn of root.querySelectorAll('[data-adv-key]')) {
|
||||||
btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey));
|
btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey));
|
||||||
@@ -331,32 +305,6 @@ async function wakeSpark(name) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Generate-if-missing + copy this Spark's OUTBOUND ssh public key (the key the
|
|
||||||
// Spark uses to log in to other machines, e.g. the Mac). Distinct from the
|
|
||||||
// package's own key in the StartOS "Show Public Key" action.
|
|
||||||
async function copySparkSshKey(name, btn) {
|
|
||||||
if (btn) btn.disabled = true;
|
|
||||||
try {
|
|
||||||
const r = await fetchJSON(`/api/spark/${name}/ssh-key`, { method: 'POST' });
|
|
||||||
// Best-effort clipboard copy; on plain-HTTP this no-ops, but the dialog
|
|
||||||
// below always shows the key for manual selection.
|
|
||||||
await copyText(r.pubkey, btn);
|
|
||||||
const label = r.host ? `${name} (${r.host})` : name;
|
|
||||||
el('#sshkey-title').textContent = `${name} — SSH public key`;
|
|
||||||
el('#sshkey-intro').textContent = r.created
|
|
||||||
? `Generated a new SSH key on ${label} and copied it to your clipboard. This is the key ${name} uses to log in to OTHER machines.`
|
|
||||||
: `${label} already had an SSH key; copied its public key to your clipboard. This is the key ${name} uses to log in to OTHER machines.`;
|
|
||||||
el('#sshkey-value').textContent = r.pubkey;
|
|
||||||
el('#sshkey-install').textContent =
|
|
||||||
`mkdir -p ~/.ssh && echo '${r.pubkey}' >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys`;
|
|
||||||
el('#sshkey-dialog').showModal();
|
|
||||||
} catch (e) {
|
|
||||||
alert(`Couldn't get the SSH key for ${name}: ${e.message}`);
|
|
||||||
} finally {
|
|
||||||
if (btn) btn.disabled = false;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
function renderHardware() {
|
function renderHardware() {
|
||||||
const panel = el('#hardware-panel');
|
const panel = el('#hardware-panel');
|
||||||
const grid = el('#hardware-grid');
|
const grid = el('#hardware-grid');
|
||||||
@@ -410,21 +358,11 @@ function renderHardware() {
|
|||||||
if (s.gpu_temp_c != null) gpuExtras.push(`${s.gpu_temp_c}°C`);
|
if (s.gpu_temp_c != null) gpuExtras.push(`${s.gpu_temp_c}°C`);
|
||||||
if (s.gpu_power_w != null) gpuExtras.push(`${s.gpu_power_w.toFixed(0)}W`);
|
if (s.gpu_power_w != null) gpuExtras.push(`${s.gpu_power_w.toFixed(0)}W`);
|
||||||
const gpuExtrasStr = gpuExtras.length ? ` · ${gpuExtras.join(' · ')}` : '';
|
const gpuExtrasStr = gpuExtras.length ? ` · ${gpuExtras.join(' · ')}` : '';
|
||||||
// Read-only WireGuard badge: shown only when the Spark has a wg interface up.
|
|
||||||
// "VPN <ip>" means it's a peer on that tunnel (reachable off-LAN when the
|
|
||||||
// tunnel is up); it reflects interface presence, not live peer reachability.
|
|
||||||
const wgIp = s.wg_addr ? String(s.wg_addr).split('/')[0] : '';
|
|
||||||
const wgBadge = s.wg_iface
|
|
||||||
? ` · <span class="wg-badge" title="On WireGuard tunnel '${escapeHtml(s.wg_iface)}'${wgIp ? ' as ' + escapeHtml(wgIp) : ''} — reachable off-LAN while the tunnel is up">VPN${wgIp ? ' ' + escapeHtml(wgIp) : ''}</span>`
|
|
||||||
: '';
|
|
||||||
card.className = 'hw-card';
|
card.className = 'hw-card';
|
||||||
card.innerHTML = `
|
card.innerHTML = `
|
||||||
<div class="head">
|
<div class="head">
|
||||||
<span class="name">${escapeHtml(s.hostname || key)}</span>
|
<span class="name">${escapeHtml(s.hostname || key)}</span>
|
||||||
<span class="meta">${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}${wgBadge}</span>
|
<span class="meta">${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}</span>
|
||||||
<button class="icon-btn ssh-key-btn" data-ssh-key="${escapeHtml(key)}" title="Copy this Spark's SSH public key (creates one if it doesn't have one) — e.g. to let it log in to your Mac" aria-label="Copy SSH public key">
|
|
||||||
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
|
||||||
</button>
|
|
||||||
</div>
|
</div>
|
||||||
<div class="hw-metric">
|
<div class="hw-metric">
|
||||||
<span class="label">CPU</span>
|
<span class="label">CPU</span>
|
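The WireGuard badge in `renderHardware()` depends on only two fields: the interface name and its address with a CIDR suffix. This minimal sketch isolates the visible badge text. `wgBadgeText` is a hypothetical helper, and the address `10.8.0.2/24` is a made-up example value.

```javascript
// Hypothetical helper mirroring wgIp/wgBadge in renderHardware():
// strip the "/prefix" from the interface address and build the badge label.
function wgBadgeText(wgIface, wgAddr) {
  const ip = wgAddr ? String(wgAddr).split('/')[0] : '';
  if (!wgIface) return '';          // no WireGuard interface up: no badge at all
  return ip ? `VPN ${ip}` : 'VPN';  // address is optional; the label is not
}
```

As the source comment notes, the badge reflects interface presence only, so the helper never looks at peer reachability.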
||||||
@@ -464,13 +402,8 @@ function classifyService(s) {
|
|||||||
if (s.docker_state === 'missing') return 'missing';
|
if (s.docker_state === 'missing') return 'missing';
|
||||||
if (s.docker_state === 'restarting') return 'unhealthy';
|
if (s.docker_state === 'restarting') return 'unhealthy';
|
||||||
if (s.docker_state === 'exited') return 'unhealthy';
|
if (s.docker_state === 'exited') return 'unhealthy';
|
||||||
if (s.docker_state === 'running') {
|
if (s.docker_state === 'running' && !s.http_ready) return 'starting';
|
||||||
// http_ready === false means an HTTP probe is expected but failing → still
|
if (s.docker_state === 'running' && s.http_ready) return 'running';
|
||||||
// warming up. null means the service has no HTTP surface (e.g. the bot), so
|
|
||||||
// a running container is simply healthy.
|
|
||||||
if (s.http_ready === false) return 'starting';
|
|
||||||
return 'running';
|
|
||||||
}
|
|
||||||
return s.docker_state || 'unknown';
|
return s.docker_state || 'unknown';
|
||||||
}
|
}
|
||||||
|
|
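The two versions of `classifyService()` differ in how they treat a running container whose `http_ready` is `null`. The version with the nested `running` block distinguishes `false` (a probe exists and is failing) from `null` (there is no HTTP surface, e.g. the bot). The one-line version maps both to `'starting'`, so a bot would never show as running. The sketch below restates only the branches visible in this hunk. `classifyVisible` is a hypothetical name, and the real function has earlier lines outside the hunk.

```javascript
// Hypothetical restatement of the classifyService() branches shown in this hunk.
function classifyVisible(s) {
  if (s.docker_state === 'missing') return 'missing';
  if (s.docker_state === 'restarting') return 'unhealthy';
  if (s.docker_state === 'exited') return 'unhealthy';
  if (s.docker_state === 'running') {
    // false: an HTTP probe is expected but failing, so it is still warming up.
    // null/undefined: no HTTP surface, so a running container is healthy.
    if (s.http_ready === false) return 'starting';
    return 'running';
  }
  return s.docker_state || 'unknown';
}
```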
||||||
@@ -502,11 +435,6 @@ async function renderServices() {
|
|||||||
grid.innerHTML = '';
|
grid.innerHTML = '';
|
||||||
for (const [name, s] of entries) {
|
for (const [name, s] of entries) {
|
||||||
const cls = classifyService(s);
|
const cls = classifyService(s);
|
||||||
const isBot = s.kind === 'bot';
|
|
||||||
// The bot tile is opt-in: it only belongs to deployments that actually run
|
|
||||||
// matrix-bridge. When the container is absent (missing) or the host isn't
|
|
||||||
// configured, hide the tile entirely rather than show a stray red card.
|
|
||||||
if (isBot && (cls === 'missing' || cls === 'unconfigured')) continue;
|
|
||||||
const card = document.createElement('div');
|
const card = document.createElement('div');
|
||||||
card.className = `service-card ${cls}`;
|
card.className = `service-card ${cls}`;
|
||||||
const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':');
|
const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':');
|
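The opt-in bot tile rule in `renderServices()` is a filter over the service entries. This sketch shows which tiles would be drawn. `visibleTiles` is hypothetical, `classify` stands in for `classifyService()`, and the service names in the test are illustrative only.

```javascript
// Hypothetical: compute the tiles renderServices() would draw.
// `classify` is injected so the rule can be checked without the real classifier.
function visibleTiles(entries, classify) {
  const out = [];
  for (const [name, s] of entries) {
    const cls = classify(s);
    // The matrix-bridge bot tile is opt-in: hide it when the container is
    // absent or the host isn't configured, instead of drawing a red card.
    if (s.kind === 'bot' && (cls === 'missing' || cls === 'unconfigured')) continue;
    out.push({ name, cls });
  }
  return out;
}
```

Only the bot is exempt. A missing non-bot service still gets a tile, so genuine outages stay visible.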
||||||
@@ -519,7 +447,7 @@ async function renderServices() {
|
|||||||
return false;
|
return false;
|
||||||
};
|
};
|
||||||
const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`;
|
const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`;
|
||||||
const hostStr = s.host ? (s.port ? `${s.host}:${s.port}` : s.host) : '';
|
const hostStr = s.host ? `${s.host}:${s.port}` : '';
|
||||||
const hostRow = s.host
|
const hostRow = s.host
|
||||||
? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>`
|
? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>`
|
||||||
: `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`;
|
: `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`;
|
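The two `hostStr` expressions differ when a service has a host but no port. The unconditional template renders `host:undefined`, while the guarded one falls back to the bare host. A minimal sketch of the guarded form follows. `formatHost` is a hypothetical name.

```javascript
// Hypothetical helper matching the guarded hostStr expression:
// append ":port" only when a port is set.
function formatHost(host, port) {
  if (!host) return '';                   // unconfigured service
  return port ? `${host}:${port}` : host; // note: port 0 is treated as "no port"
}
```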
||||||
@@ -573,11 +501,9 @@ async function renderServices() {
|
|||||||
${restartsRow}
|
${restartsRow}
|
||||||
${deepRow}
|
${deepRow}
|
||||||
<div class="service-actions">
|
<div class="service-actions">
|
||||||
${isBot ? `<button class="btn primary" data-mb-update title="Pull latest code, rebuild, and recreate the bot" ${inFlight || state.mb_update_in_flight ? 'disabled' : ''}>Update</button>` : ''}
|
|
||||||
<button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button>
|
<button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button>
|
||||||
<button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button>
|
<button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button>
|
||||||
<button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button>
|
<button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button>
|
||||||
${isBot ? `<button class="btn" data-mb-logs title="Show the last 100 log lines">View logs</button>` : ''}
|
|
||||||
</div>
|
</div>
|
||||||
`;
|
`;
|
||||||
grid.appendChild(card);
|
grid.appendChild(card);
|
||||||
@@ -585,10 +511,6 @@ async function renderServices() {
|
|||||||
for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) {
|
for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) {
|
||||||
btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction));
|
btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction));
|
||||||
}
|
}
|
||||||
const mbUpdateBtn = grid.querySelector('[data-mb-update]');
|
|
||||||
if (mbUpdateBtn) mbUpdateBtn.addEventListener('click', onMatrixBridgeUpdate);
|
|
||||||
const mbLogsBtn = grid.querySelector('[data-mb-logs]');
|
|
||||||
if (mbLogsBtn) mbLogsBtn.addEventListener('click', openMatrixBridgeLogs);
|
|
||||||
for (const btn of grid.querySelectorAll('[data-dh-run]')) {
|
for (const btn of grid.querySelectorAll('[data-dh-run]')) {
|
||||||
btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn));
|
btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn));
|
||||||
}
|
}
|
||||||
@@ -767,118 +689,6 @@ async function onServiceAction(key) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// ===================== matrix-bridge bot (update + logs) =====================
|
|
||||||
|
|
||||||
const mbState = { job_id: null, eventsource: null, timer: null, started_at: null };
|
|
||||||
|
|
||||||
function mbTimerStart(at) {
|
|
||||||
mbState.started_at = at;
|
|
||||||
if (mbState.timer) clearInterval(mbState.timer);
|
|
||||||
const tick = () => {
|
|
||||||
if (!mbState.started_at) return;
|
|
||||||
const sec = Math.max(0, Math.floor((Date.now() - mbState.started_at) / 1000));
|
|
||||||
el('#mb-update-elapsed').textContent = `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
|
|
||||||
};
|
|
||||||
tick();
|
|
||||||
mbState.timer = setInterval(tick, 500);
|
|
||||||
}
|
|
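The `tick()` closure in `mbTimerStart()` formats elapsed wall-clock time as `m:ss`. Pulled out as a pure function with an injected "now", the formatting is easy to check. `fmtElapsed` is a hypothetical name.

```javascript
// Hypothetical pure version of the elapsed-time formatter in mbTimerStart():
// whole seconds since startedAtMs, clamped at zero (clock skew), as m:ss.
function fmtElapsed(startedAtMs, nowMs) {
  const sec = Math.max(0, Math.floor((nowMs - startedAtMs) / 1000));
  return `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
}
```

Minutes never roll over into hours, so a one-hour build reads `60:00`.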
||||||
|
|
||||||
async function onMatrixBridgeUpdate() {
|
|
||||||
if (state.mb_update_in_flight) return;
|
|
||||||
if (!confirm('Update the matrix-bridge bot?\n\nThis pulls the latest code, rebuilds the container image, and recreates the container. The first build after a base-image change can take several minutes. The bot is briefly offline while it restarts.')) return;
|
|
||||||
state.mb_update_in_flight = true;
|
|
||||||
renderServices();
|
|
||||||
try {
|
|
||||||
const r = await fetchJSON('/api/matrix-bridge/update', { method: 'POST' });
|
|
||||||
attachMbUpdateProgress(r.job_id);
|
|
||||||
} catch (e) {
|
|
||||||
state.mb_update_in_flight = false;
|
|
||||||
renderServices();
|
|
||||||
alert('Update failed to start: ' + e.message);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
async function attachMbUpdateProgress(jobId) {
|
|
||||||
mbState.job_id = jobId;
|
|
||||||
el('#mb-update-log').textContent = '';
|
|
||||||
el('#mb-update-title').textContent = 'Updating matrix-bridge…';
|
|
||||||
el('#mb-update-phase').textContent = 'Starting…';
|
|
||||||
el('#mb-update-dialog').showModal();
|
|
||||||
try {
|
|
||||||
const snap = await fetchJSON(`/api/matrix-bridge/update/${jobId}`);
|
|
||||||
mbTimerStart(Date.parse(snap.started_at));
|
|
||||||
el('#mb-update-phase').textContent = snap.phase || 'Working…';
|
|
||||||
el('#mb-update-log').textContent = (snap.lines || []).join('\n');
|
|
||||||
if (snap.returncode !== null) { onMbUpdateDone(snap); return; }
|
|
||||||
} catch { mbTimerStart(Date.now()); }
|
|
||||||
const es = new EventSource(`/api/matrix-bridge/update/${jobId}/stream`);
|
|
||||||
mbState.eventsource = es;
|
|
||||||
es.onmessage = ev => {
|
|
||||||
try {
|
|
||||||
const d = JSON.parse(ev.data);
|
|
||||||
if (d.line !== undefined) {
|
|
||||||
const log = el('#mb-update-log');
|
|
||||||
log.textContent += d.line + '\n';
|
|
||||||
log.scrollTop = log.scrollHeight;
|
|
||||||
}
|
|
||||||
} catch {}
|
|
||||||
};
|
|
||||||
es.addEventListener('phase', ev => {
|
|
||||||
try { el('#mb-update-phase').textContent = JSON.parse(ev.data).phase; } catch {}
|
|
||||||
});
|
|
||||||
es.addEventListener('done', ev => {
|
|
||||||
let d = {}; try { d = JSON.parse(ev.data); } catch {}
|
|
||||||
onMbUpdateDone(d);
|
|
||||||
});
|
|
||||||
es.onerror = () => {
|
|
||||||
// Don't leave the Update button wedged-disabled on a dropped stream. The
|
|
||||||
// job keeps running server-side; re-clicking Update returns a clean 409.
|
|
||||||
es.close();
|
|
||||||
mbState.eventsource = null;
|
|
||||||
state.mb_update_in_flight = false;
|
|
||||||
el('#mb-update-phase').textContent = 'Lost connection to the update stream — reopen or check logs.';
|
|
||||||
renderServices();
|
|
||||||
};
|
|
||||||
}
|
|
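`attachMbUpdateProgress()` wires three kinds of server-sent events: unnamed messages carrying a `line`, named `phase` events, and a named `done` event. The sketch below folds those handlers into one reducer over a plain view object so the event contract can be exercised without `EventSource` or the DOM. `applyUpdateEvent` and the view shape are hypothetical. The sketch is simplified: it ignores a `phase` event that lacks a `phase` field rather than clearing the text.

```javascript
// Hypothetical reducer mirroring the EventSource handlers in
// attachMbUpdateProgress() and the title logic in onMbUpdateDone().
function applyUpdateEvent(view, type, rawData) {
  let d;
  try { d = JSON.parse(rawData); } catch { d = {}; } // malformed payloads are ignored
  if (type === 'message' && d.line !== undefined) {
    return { ...view, log: view.log + d.line + '\n' }; // append one log line
  }
  if (type === 'phase' && d.phase !== undefined) {
    return { ...view, phase: d.phase };
  }
  if (type === 'done') {
    const failed = d.state === 'failed';
    return {
      ...view,
      done: true,
      title: failed ? `Update failed (rc=${d.returncode})` : 'Update complete',
    };
  }
  return view; // unknown event types leave the view unchanged
}
```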
||||||
|
|
||||||
function onMbUpdateDone(d) {
|
|
||||||
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
|
|
||||||
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
|
|
||||||
state.mb_update_in_flight = false;
|
|
||||||
if (d.state === 'failed') {
|
|
||||||
el('#mb-update-title').textContent = `Update failed (rc=${d.returncode})`;
|
|
||||||
el('#mb-update-phase').textContent = 'Failed — see the log above.';
|
|
||||||
} else {
|
|
||||||
el('#mb-update-title').textContent = 'Update complete';
|
|
||||||
el('#mb-update-phase').textContent = 'Done ✓';
|
|
||||||
}
|
|
||||||
// Refresh the tile's badge.
|
|
||||||
(async () => { try { state.services = await fetchJSON('/api/services'); } catch {} renderServices(); })();
|
|
||||||
}
|
|
||||||
|
|
||||||
async function openMatrixBridgeLogs() {
|
|
||||||
const pre = el('#mb-logs-pre');
|
|
||||||
el('#mb-logs-title').textContent = 'matrix-bridge logs';
|
|
||||||
pre.textContent = 'Loading…';
|
|
||||||
el('#mb-logs-dialog').showModal();
|
|
||||||
await loadMatrixBridgeLogs();
|
|
||||||
}
|
|
||||||
|
|
||||||
async function loadMatrixBridgeLogs() {
|
|
||||||
const pre = el('#mb-logs-pre');
|
|
||||||
const btn = el('#mb-logs-refresh');
|
|
||||||
if (btn) btn.disabled = true;
|
|
||||||
try {
|
|
||||||
const r = await fetchJSON('/api/matrix-bridge/logs?tail=100');
|
|
||||||
pre.textContent = r.output || '(no output)';
|
|
||||||
pre.scrollTop = pre.scrollHeight;
|
|
||||||
} catch (e) {
|
|
||||||
pre.textContent = 'Could not read logs: ' + e.message;
|
|
||||||
} finally {
|
|
||||||
if (btn) btn.disabled = false;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
function renderEndpoint(status) {
|
function renderEndpoint(status) {
|
||||||
const v = status.vllm || {};
|
const v = status.vllm || {};
|
||||||
const panel = el('#endpoint-panel');
|
const panel = el('#endpoint-panel');
|
||||||
@@ -948,10 +758,6 @@ function renderHealth(status) {
|
|||||||
function setDot(id, ok, payload) {
|
function setDot(id, ok, payload) {
|
||||||
const item = el(id);
|
const item = el(id);
|
||||||
if (!item) return;
|
if (!item) return;
|
||||||
// A service switched off via DISABLED_SERVICES isn't part of this
|
|
||||||
// deployment — hide its indicator entirely rather than show it as down.
|
|
||||||
if (payload && payload.disabled) { item.classList.add('hidden'); return; }
|
|
||||||
item.classList.remove('hidden');
|
|
||||||
const dot = item.querySelector('.dot');
|
const dot = item.querySelector('.dot');
|
||||||
dot.classList.remove('ok', 'bad', 'warn');
|
dot.classList.remove('ok', 'bad', 'warn');
|
||||||
if (ok === true) dot.classList.add('ok');
|
if (ok === true) dot.classList.add('ok');
|
||||||
@@ -1170,44 +976,24 @@ async function pollStatus() {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
let menuLoadInFlight = false;
|
|
||||||
|
|
||||||
async function loadModels() {
|
async function loadModels() {
|
||||||
// The menu is whatever's downloaded on the Sparks — /api/models does the scan
|
const data = await fetchJSON('/api/models');
|
||||||
// (SSH), so this is the slower model call. Best-effort: a transient failure
|
state.defaults = data.defaults || {};
|
||||||
// leaves the previous menu in place rather than blanking the dashboard.
|
state.models = data.models || {};
|
||||||
// Guard against overlap: init() fires this un-awaited and pollStatus()'s
|
|
||||||
// empty-menu fallback may call it again before the scan returns.
|
|
||||||
if (menuLoadInFlight) return;
|
|
||||||
menuLoadInFlight = true;
|
|
||||||
try {
|
|
||||||
const data = await fetchJSON('/api/models');
|
|
||||||
state.defaults = data.defaults || {};
|
|
||||||
state.models = data.models || {};
|
|
||||||
state.recipes = data.recipes || [];
|
|
||||||
state.models_loaded = true;
|
|
||||||
populateDownloadSuggestions();
|
|
||||||
renderCards();
|
|
||||||
} catch (e) {
|
|
||||||
console.warn('model menu load failed:', e.message);
|
|
||||||
} finally {
|
|
||||||
menuLoadInFlight = false;
|
|
||||||
}
|
|
||||||
}
|
}
|
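The `menuLoadInFlight` flag in `loadModels()` is a drop-overlapping-calls guard. A second call made while the SSH-backed scan is still running returns immediately instead of queueing. The `finally` clears the flag even after a failure, so one bad scan cannot wedge the menu. The sketch below generalizes that pattern. `singleFlight` is a hypothetical helper, not part of the dashboard.

```javascript
// Hypothetical helper: wrap an async loader so calls made while one is
// already running are dropped (not queued), as menuLoadInFlight does.
function singleFlight(load) {
  let inFlight = false;
  return async function guarded(...args) {
    if (inFlight) return;   // another load is still running: skip this call
    inFlight = true;        // set synchronously, before the first await
    try {
      await load(...args);
    } catch (e) {
      console.warn('load failed:', e.message); // keep previous state on failure
    } finally {
      inFlight = false;     // always clear, even after an error
    }
  };
}

// Usage: two back-to-back calls start only one load.
let calls = 0;
const loadMenu = singleFlight(() => { calls += 1; return new Promise(r => setTimeout(r, 10)); });
loadMenu();
loadMenu();
```

The guard works because an async function runs synchronously up to its first `await`. The flag is therefore already set when the second call checks it.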
||||||
|
|
||||||
// Populate the download box's autocomplete with known recipes not currently on
|
async function loadDiskStatus() {
|
||||||
// disk — so common/bundled models stay discoverable without phantom menu cards.
|
// Probes each catalog model's HF cache over SSH; takes a beat. Best-effort.
|
||||||
function populateDownloadSuggestions() {
|
try {
|
||||||
const dl = el('#dl-suggestions');
|
const r = await fetchJSON('/api/models/disk-status');
|
||||||
if (!dl) return;
|
if (r && r.models) {
|
||||||
const onDiskRepos = new Set(Object.values(state.models).map(m => m.repo).filter(Boolean));
|
state.disk_status = r.models;
|
||||||
dl.innerHTML = '';
|
state.disk_status_loaded = true;
|
||||||
for (const r of state.recipes || []) {
|
renderCards();
|
||||||
if (onDiskRepos.has(r.repo)) continue;
|
}
|
||||||
const opt = document.createElement('option');
|
} catch (e) {
|
||||||
opt.value = r.repo;
|
// Silent — pills just won't render. Don't block dashboard.
|
||||||
opt.label = `${r.display_name} (${r.mode})`;
|
console.warn('disk-status probe failed:', e.message);
|
||||||
dl.appendChild(opt);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
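`populateDownloadSuggestions()` offers every known recipe whose repo is not already on the menu, which is the set of downloaded models. The same filter as a pure function is sketched below. `downloadSuggestions` and the repo names in the test are hypothetical.

```javascript
// Hypothetical pure version of the filter in populateDownloadSuggestions():
// suggest recipes whose repo isn't among the on-disk menu models.
function downloadSuggestions(models, recipes) {
  const onDiskRepos = new Set(Object.values(models).map(m => m.repo).filter(Boolean));
  return (recipes || [])
    .filter(r => !onDiskRepos.has(r.repo))
    .map(r => ({ value: r.repo, label: `${r.display_name} (${r.mode})` }));
}
```

Using a `Set` keeps the membership test constant-time per recipe, and `filter(Boolean)` drops models without a repo. Per the source comments, local models (which have a `local_path`) may be among those.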
||||||
|
|
||||||
@@ -1221,12 +1007,14 @@ function fmtBytesShort(n) {
|
|||||||
|
|
||||||
function openDiskDeleteDialog(key) {
|
function openDiskDeleteDialog(key) {
|
||||||
const m = state.models[key];
|
const m = state.models[key];
|
||||||
if (!m || !m.on_disk) return;
|
const disk = state.disk_status[key];
|
||||||
|
if (!m || !disk || !disk.on_disk) return;
|
||||||
const dlg = el('#disk-delete-dialog');
|
const dlg = el('#disk-delete-dialog');
|
||||||
el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(m.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from the Sparks. This also takes it off the menu.`;
|
el('#dd-summary').innerHTML = `Free <strong>${fmtBytesShort(disk.total_bytes)}</strong> by removing <strong>${escapeHtml(m.display_name)}</strong> (<code>${escapeHtml(m.repo)}</code>) from disk.`;
|
||||||
const hostsEl = el('#dd-hosts');
|
const hostsEl = el('#dd-hosts');
|
||||||
hostsEl.innerHTML = '';
|
hostsEl.innerHTML = '';
|
||||||
for (const h of (m.per_host || [])) {
|
for (const h of (disk.per_host || [])) {
|
||||||
|
if (!h.on_disk) continue;
|
||||||
const li = document.createElement('li');
|
const li = document.createElement('li');
|
||||||
li.innerHTML = `<code>${escapeHtml(h.host)}</code> — ${fmtBytesShort(h.size_bytes)}`;
|
li.innerHTML = `<code>${escapeHtml(h.host)}</code> — ${fmtBytesShort(h.size_bytes)}`;
|
||||||
hostsEl.appendChild(li);
|
hostsEl.appendChild(li);
|
||||||
@@ -1245,19 +1033,20 @@ function openDiskDeleteDialog(key) {
|
|||||||
try {
|
try {
|
||||||
const r = await fetchJSON(`/api/models/${encodeURIComponent(key)}/disk`, { method: 'DELETE' });
|
const r = await fetchJSON(`/api/models/${encodeURIComponent(key)}/disk`, { method: 'DELETE' });
|
||||||
dlg.close();
|
dlg.close();
|
||||||
// Optimistically drop the card, then re-scan the menu (it's gone from disk).
|
// Optimistically clear local disk state for this key, then refresh.
|
||||||
delete state.models[key];
|
delete state.disk_status[key];
|
||||||
renderCards();
|
renderCards();
|
||||||
await loadModels();
|
// Eagerly re-probe so size is accurate (and shows "not downloaded" pill).
|
||||||
|
loadDiskStatus();
|
||||||
const freed = r && typeof r.bytes_freed === 'number' ? fmtBytesShort(r.bytes_freed) : '';
|
const freed = r && typeof r.bytes_freed === 'number' ? fmtBytesShort(r.bytes_freed) : '';
|
||||||
console.log(`Removed ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
|
console.log(`Deleted ${m.display_name} from disk${freed ? ` — freed ${freed}` : ''}.`);
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
errEl.textContent = e.message || 'Delete failed';
|
errEl.textContent = e.message || 'Delete failed';
|
||||||
errEl.classList.remove('hidden');
|
errEl.classList.remove('hidden');
|
||||||
} finally {
|
} finally {
|
||||||
confirm.disabled = false;
|
confirm.disabled = false;
|
||||||
cancel.disabled = false;
|
cancel.disabled = false;
|
||||||
confirm.textContent = 'Remove from disk & menu';
|
confirm.textContent = 'Delete from disk';
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
cancel.onclick = onCancel;
|
cancel.onclick = onCancel;
|
||||||
@@ -1267,11 +1056,6 @@ function openDiskDeleteDialog(key) {
|
|||||||
|
|
||||||
async function triggerSwap(modelKey) {
|
async function triggerSwap(modelKey) {
|
||||||
if (state.swap_job_id) return;
|
if (state.swap_job_id) return;
|
||||||
if (state.lock && state.lock.held) {
|
|
||||||
const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
|
|
||||||
alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
try {
|
try {
|
||||||
const r = await fetchJSON('/api/swap', {
|
const r = await fetchJSON('/api/swap', {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
@@ -1280,82 +1064,40 @@ async function triggerSwap(modelKey) {
|
|||||||
});
|
});
|
||||||
attachToSwap(r.job_id, /*needsBackfill=*/false);
|
attachToSwap(r.job_id, /*needsBackfill=*/false);
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
// 423 Locked: a reservation was acquired between our last poll and this click.
|
alert('Failed to start swap: ' + e.message);
|
||||||
if (e.message && e.message.startsWith('423')) {
|
|
||||||
alert('The GPU swap path was just reserved by automation. Refreshing…');
|
|
||||||
pollCoordination();
|
|
||||||
} else {
|
|
||||||
alert('Failed to start swap: ' + e.message);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
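The lock-aware `triggerSwap()` guards twice. First it checks the last polled reservation before sending anything. It then treats a 423 response as "reserved between the poll and the click" and refreshes the lock instead of showing a generic failure. Both decisions are sketched below as pure functions. The names are hypothetical, and `fmtClock` is injected. As the diffed code does, the error classifier assumes that `fetchJSON()` error messages begin with the HTTP status code.

```javascript
// Hypothetical pre-check mirroring the start of triggerSwap():
// returns the alert text when a reservation is held, or null to proceed.
function swapBlockedMessage(lock, fmtClock) {
  if (!(lock && lock.held)) return null;
  const until = lock.expires_at ? ' until ' + fmtClock(lock.expires_at) : '';
  return `The GPU swap path is reserved by ${lock.holder || 'automation'}${until}. ` +
    'Use "Release" on the reservation banner to override.';
}

// Hypothetical error classifier; assumes messages look like "423 Locked".
function swapErrorAction(message) {
  return message && message.startsWith('423') ? 'refresh-lock' : 'alert';
}
```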
||||||
|
|
||||||
// ---- coordination layer: swap lock + schedule registry ----
|
async function triggerDownloadForKey(modelKey) {
|
||||||
|
const m = state.models[modelKey];
|
||||||
async function pollCoordination() {
|
if (!m) return;
|
||||||
try {
|
if (dlState.job_id) {
|
||||||
state.lock = await fetchJSON('/api/swap/lock');
|
alert('A download is already in progress; wait for it to finish.');
|
||||||
} catch { state.lock = { held: false }; }
|
|
||||||
try {
|
|
||||||
const r = await fetchJSON('/api/schedule');
|
|
||||||
state.schedules = r.schedules || [];
|
|
||||||
} catch { state.schedules = []; }
|
|
||||||
renderLockBanner();
|
|
||||||
renderSchedules();
|
|
||||||
renderCards(); // reflect lock state on the swap buttons
|
|
||||||
}
|
|
||||||
|
|
||||||
function renderLockBanner() {
|
|
||||||
const banner = el('#lock-banner');
|
|
||||||
if (!banner) return;
|
|
||||||
const lock = state.lock;
|
|
||||||
if (lock && lock.held) {
|
|
||||||
const until = lock.expires_at ? ` until ${fmtClock(lock.expires_at)}` : '';
|
|
||||||
const note = lock.note ? ` — ${escapeHtml(lock.note)}` : '';
|
|
||||||
el('#lock-text').innerHTML =
|
|
||||||
`GPU swap path reserved by <strong>${escapeHtml(lock.holder || 'automation')}</strong>${until}${note}. Manual swaps are paused.`;
|
|
||||||
banner.classList.remove('hidden');
|
|
||||||
} else {
|
|
||||||
banner.classList.add('hidden');
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
function renderSchedules() {
|
|
||||||
const panel = el('#schedule-panel');
|
|
||||||
const list = el('#schedule-list');
|
|
||||||
if (!panel || !list) return;
|
|
||||||
const items = state.schedules || [];
|
|
||||||
if (!items.length) {
|
|
||||||
panel.classList.add('hidden');
|
|
||||||
list.innerHTML = '';
|
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
list.innerHTML = items.map((s) => {
|
// Pick the download target from the model's mode:
|
||||||
const meta = [
|
// solo -> spark1 only
|
||||||
s.cron ? `<code>${escapeHtml(s.cron)}</code>` : '',
|
// cluster -> both Sparks (fetch on Spark 1, rsync to Spark 2 in parallel)
|
||||||
s.next_run ? `next: ${escapeHtml(s.next_run)}` : '',
|
const dlMode = m.mode === 'cluster' ? 'cluster' : 'spark1';
|
||||||
s.owner ? `by ${escapeHtml(s.owner)}` : '',
|
const sizeNote = m.size_gb ? ` (~${m.size_gb} GB)` : '';
|
||||||
].filter(Boolean).join(' · ');
|
const target = m.mode === 'cluster' ? 'both Sparks' : 'Spark 1';
|
||||||
const desc = s.description ? `<div class="desc">${escapeHtml(s.description)}</div>` : '';
|
if (!confirm(`Download "${m.display_name}"${sizeNote} to ${target}? Large models can take a while; you can watch progress in the download panel.`)) {
|
||||||
return `<div class="schedule-item">
|
return;
|
||||||
<div class="name">${escapeHtml(s.name)}</div>
|
}
|
||||||
<div class="muted small">${meta}</div>
|
dlState.last_repo = m.repo;
|
||||||
${desc}
|
dlState.last_mode = dlMode;
|
||||||
</div>`;
|
try {
|
||||||
}).join('');
|
const r = await fetchJSON('/api/download', {
|
||||||
panel.classList.remove('hidden');
|
method: 'POST',
|
||||||
}
|
headers: { 'content-type': 'application/json' },
|
||||||
|
body: JSON.stringify({ repo: m.repo, mode: dlMode }),
|
||||||
async function releaseLock() {
|
});
|
||||||
const lock = state.lock || {};
|
// Open the download panel + attach to progress stream
|
||||||
const who = lock.holder || 'automation';
|
openDownloadForm();
|
||||||
if (!confirm(`Force-release the GPU reservation held by ${who}? Any job relying on it may then collide with a manual swap.`)) return;
|
attachToDownload(r.job_id);
|
||||||
try {
|
} catch (e) {
|
||||||
await fetchJSON('/api/swap/lock?force=true', { method: 'DELETE' });
|
alert('Failed to start download: ' + e.message);
|
||||||
} catch (e) {
|
|
||||||
alert('Failed to release: ' + e.message);
|
|
||||||
}
|
}
|
||||||
pollCoordination();
|
|
||||||
}
|
}
|
||||||
|
|
||||||
async function attachToSwap(jobId, needsBackfill) {
|
async function attachToSwap(jobId, needsBackfill) {
|
||||||
@@ -1588,14 +1330,12 @@ function handleDownloadDone(d) {
|
|||||||
el('#dl-title').textContent = 'Done';
|
el('#dl-title').textContent = 'Done';
|
||||||
el('#dl-phase').textContent = 'Done ✓';
|
el('#dl-phase').textContent = 'Done ✓';
|
||||||
el('#dl-progress-fill').style.width = '100%';
|
el('#dl-progress-fill').style.width = '100%';
|
||||||
// The new model now appears on the menu (the menu is the disk). If it matched
|
// Offer to add to catalog
|
||||||
// a known recipe it's ready to switch to; if not, offer to set it up.
|
|
||||||
const repo = dlState.last_repo;
|
const repo = dlState.last_repo;
|
||||||
loadModels().then(() => {
|
const mode = dlState.last_mode;
|
||||||
if (!repo) return;
|
if (repo) {
|
||||||
const entry = Object.values(state.models).find(m => m.repo === repo);
|
setTimeout(() => openCatalogDialog(repo, mode), 600);
|
||||||
if (entry && entry.needs_setup) setTimeout(() => openSetupDialog(repo, { thenSwap: false }), 600);
|
}
|
||||||
});
|
|
||||||
}
|
}
|
||||||
dlState.job_id = null;
|
dlState.job_id = null;
|
||||||
}
|
}
|
||||||
@@ -1708,67 +1448,21 @@ function openAdvanced(key) {
|
|||||||
dlg.showModal();
|
dlg.showModal();
|
||||||
}
|
}
|
||||||
|
|
||||||
// Context carried from openSetupDialog -> the submit handler: the inferred
|
function openCatalogDialog(repo, mode) {
|
||||||
// launch flags (parsers/MoE backend) and whether to swap right after saving.
|
|
||||||
let setupCtx = { key: '', repo: '', vllm_args: [], thenSwap: false };
|
|
||||||
|
|
||||||
// "Set up & switch" on a needs-setup card.
|
|
||||||
async function openSetupForKey(key) {
|
|
||||||
const m = state.models[key];
|
|
||||||
if (!m) return;
|
|
||||||
if (state.lock && state.lock.held) {
|
|
||||||
const until = state.lock.expires_at ? ' until ' + fmtClock(state.lock.expires_at) : '';
|
|
||||||
alert(`The GPU swap path is reserved by ${state.lock.holder || 'automation'}${until}. Use "Release" on the reservation banner to override.`);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
await openSetupDialog(m.repo, { thenSwap: true });
|
|
||||||
}
|
|
||||||
|
|
||||||
// Open the "set up this model" dialog, prefilled from inference (config.json +
|
|
||||||
// size). The operator confirms once; on save the recipe persists and (if
|
|
||||||
// thenSwap) we switch to it.
|
|
||||||
async function openSetupDialog(repo, opts = {}) {
|
|
||||||
const dlg = el('#catalog-dialog');
|
const dlg = el('#catalog-dialog');
|
||||||
let sug = null;
|
const key = repo.split('/').pop().toLowerCase().replace(/[^a-z0-9_-]/g, '-');
|
||||||
try {
|
el('#cd-key').value = key;
|
||||||
sug = await fetchJSON(`/api/models/suggest?repo=${encodeURIComponent(repo)}`);
|
el('#cd-name').value = repo.split('/').pop();
|
||||||
} catch (e) {
|
|
||||||
console.warn('recipe suggestion failed:', e.message);
|
|
||||||
}
|
|
||||||
const fallbackKey = repo.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
|
|
||||||
setupCtx = {
|
|
||||||
key: (sug && sug.key) || fallbackKey,
|
|
||||||
repo,
|
|
||||||
vllm_args: (sug && sug.vllm_args) || [],
|
|
||||||
thenSwap: !!opts.thenSwap,
|
|
||||||
};
|
|
||||||
el('#cd-key').value = setupCtx.key;
|
|
||||||
el('#cd-name').value = (sug && sug.display_name) || repo.split('/').pop();
|
|
||||||
el('#cd-repo').value = repo;
|
el('#cd-repo').value = repo;
|
||||||
el('#cd-size').value = '';
|
el('#cd-size').value = '';
|
||||||
el('#cd-mode').value = (sug && sug.mode) || 'solo';
|
el('#cd-mode').value = mode || 'solo';
|
||||||
el('#cd-desc').value = '';
|
el('#cd-desc').value = '';
|
||||||
const knobs = (sug && sug.knobs) || {};
|
el('#cd-mml').value = 32768;
|
||||||
el('#cd-mml').value = knobs.max_model_len || 32768;
|
el('#cd-gmu').value = 0.85;
|
||||||
el('#cd-gmu').value = knobs.gpu_memory_utilization || 0.85;
|
el('#cd-gmu-out').value = '0.85';
|
||||||
el('#cd-gmu-out').value = parseFloat(el('#cd-gmu').value).toFixed(2);
|
el('#cd-fst').checked = true;
|
||||||
el('#cd-fst').checked = knobs.fastsafetensors !== false;
|
el('#cd-pcache').checked = true;
|
||||||
el('#cd-pcache').checked = knobs.prefix_caching !== false;
|
el('#cd-fp8').checked = true;
|
||||||
el('#cd-fp8').checked = (knobs.kv_cache_dtype || 'fp8') === 'fp8';
|
|
||||||
|
|
||||||
const det = el('#cd-detected');
|
|
||||||
if (det) {
|
|
||||||
if (sug) {
|
|
||||||
const caps = (sug.capabilities || []).join(', ');
|
|
||||||
const flags = setupCtx.vllm_args.length ? `: <code>${escapeHtml(setupCtx.vllm_args.join(' '))}</code>` : '';
|
|
||||||
det.innerHTML = `Detected <strong>${escapeHtml(sug.family || 'Generic')}</strong>${caps ? ` · ${escapeHtml(caps)}` : ''}. Launch flags set automatically${flags}.`;
|
|
||||||
} else {
|
|
||||||
det.textContent = "Couldn't auto-detect this model's settings — pick mode and knobs manually.";
|
|
||||||
}
|
|
||||||
det.classList.remove('hidden');
|
|
||||||
}
|
|
||||||
const submit = el('#cd-submit');
|
|
||||||
if (submit) submit.textContent = setupCtx.thenSwap ? 'Save & switch' : 'Save settings';
|
|
||||||
dlg.showModal();
|
dlg.showModal();
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -1778,15 +1472,13 @@ function setupCatalogDialog() {
|
|||||||
el('#catalog-form').addEventListener('submit', async (e) => {
|
el('#catalog-form').addEventListener('submit', async (e) => {
|
||||||
e.preventDefault();
|
e.preventDefault();
|
||||||
const body = {
|
const body = {
|
||||||
key: el('#cd-key').value.trim() || setupCtx.key,
|
key: el('#cd-key').value.trim(),
|
||||||
display_name: el('#cd-name').value.trim(),
|
display_name: el('#cd-name').value.trim(),
|
||||||
repo: el('#cd-repo').value.trim(),
|
repo: el('#cd-repo').value.trim(),
|
||||||
size_gb: parseFloat(el('#cd-size').value) || 0,
|
size_gb: parseFloat(el('#cd-size').value) || 0,
|
||||||
mode: el('#cd-mode').value,
|
mode: el('#cd-mode').value,
|
||||||
description: el('#cd-desc').value.trim() || null,
|
description: el('#cd-desc').value.trim() || null,
|
||||||
// The inferred family flags (parsers / MoE backend); knob-controlled flags
|
vllm_args: [],
|
||||||
// are layered on by the server from `knobs`, so no duplication.
|
|
||||||
vllm_args: setupCtx.vllm_args || [],
|
|
||||||
knobs: {
|
knobs: {
|
||||||
max_model_len: parseInt(el('#cd-mml').value, 10) || 32768,
|
max_model_len: parseInt(el('#cd-mml').value, 10) || 32768,
|
||||||
gpu_memory_utilization: parseFloat(el('#cd-gmu').value),
|
gpu_memory_utilization: parseFloat(el('#cd-gmu').value),
|
||||||
@@ -1804,9 +1496,8 @@ function setupCatalogDialog() {
|
|||||||
el('#catalog-dialog').close();
|
el('#catalog-dialog').close();
|
||||||
closeDownloadPanel();
|
closeDownloadPanel();
|
||||||
await loadModels();
|
await loadModels();
|
||||||
if (setupCtx.thenSwap) triggerSwap(body.key);
|
|
||||||
pollStatus();
|
pollStatus();
|
||||||
} catch (e) { alert('Saving the model setup failed: ' + e.message); }
|
} catch (e) { alert('Add to catalog failed: ' + e.message); }
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -1815,60 +1506,6 @@ function setupAdvancedDialog() {
|
|||||||
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
||||||
}
|
}
|
||||||
|
|
||||||
function openLocalModelDialog() {
|
|
||||||
const dlg = el('#local-model-dialog');
|
|
||||||
el('#lm-key').value = '';
|
|
||||||
el('#lm-name').value = '';
|
|
||||||
el('#lm-path').value = '';
|
|
||||||
el('#lm-chat').value = '';
|
|
||||||
el('#lm-size').value = '';
|
|
||||||
el('#lm-mode').value = 'solo';
|
|
||||||
el('#lm-desc').value = '';
|
|
||||||
el('#lm-mml').value = 32768;
|
|
||||||
el('#lm-gmu').value = 0.85;
|
|
||||||
el('#lm-gmu-out').value = '0.85';
|
|
||||||
el('#lm-fst').checked = true;
|
|
||||||
el('#lm-pcache').checked = true;
|
|
||||||
el('#lm-fp8').checked = true;
|
|
||||||
dlg.showModal();
|
|
||||||
}
|
|
||||||
|
|
||||||
function setupLocalModelDialog() {
|
|
||||||
el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
|
|
||||||
el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
|
||||||
el('#local-model-form').addEventListener('submit', async (e) => {
|
|
||||||
e.preventDefault();
|
|
||||||
const chat = el('#lm-chat').value.trim();
|
|
||||||
const body = {
|
|
||||||
key: el('#lm-key').value.trim(),
|
|
||||||
display_name: el('#lm-name').value.trim(),
|
|
||||||
local_path: el('#lm-path').value.trim(),
|
|
||||||
size_gb: parseFloat(el('#lm-size').value) || 0,
|
|
||||||
mode: el('#lm-mode').value,
|
|
||||||
description: el('#lm-desc').value.trim() || null,
|
|
||||||
// A fine-tune's chat template (if any) rides along as a launch flag.
|
|
||||||
vllm_args: chat ? [`--chat-template=${chat}`] : [],
|
|
||||||
knobs: {
|
|
||||||
max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
|
|
||||||
gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
|
|
||||||
fastsafetensors: el('#lm-fst').checked,
|
|
||||||
prefix_caching: el('#lm-pcache').checked,
|
|
||||||
kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
|
|
||||||
},
|
|
||||||
};
|
|
||||||
try {
|
|
||||||
await fetchJSON('/api/models', {
|
|
||||||
method: 'POST',
|
|
||||||
headers: { 'content-type': 'application/json' },
|
|
||||||
body: JSON.stringify(body),
|
|
||||||
});
|
|
||||||
el('#local-model-dialog').close();
|
|
||||||
await loadModels();
|
|
||||||
pollStatus();
|
|
||||||
} catch (e) { alert('Add local model failed: ' + e.message); }
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
// ===================== NIM installer =====================
|
// ===================== NIM installer =====================
|
||||||
|
|
||||||
const nimState = {
|
const nimState = {
|
||||||
@@ -2210,33 +1847,15 @@ async function init() {
|
|||||||
el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
|
el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
|
||||||
el('#nim-form').addEventListener('submit', submitNim);
|
el('#nim-form').addEventListener('submit', submitNim);
|
||||||
el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
|
el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
|
||||||
el('#mb-update-close').addEventListener('click', () => el('#mb-update-dialog').close());
|
|
||||||
// Dismissing the modal (Close or Esc) stops streaming; the job runs on
|
|
||||||
// server-side and re-clicking Update returns a 409 if still in progress.
|
|
||||||
el('#mb-update-dialog').addEventListener('close', () => {
|
|
||||||
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
|
|
||||||
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
|
|
||||||
state.mb_update_in_flight = false;
|
|
||||||
renderServices();
|
|
||||||
});
|
|
||||||
el('#mb-logs-close').addEventListener('click', () => el('#mb-logs-dialog').close());
|
|
||||||
el('#mb-logs-refresh').addEventListener('click', loadMatrixBridgeLogs);
|
|
||||||
el('#open-connectivity').addEventListener('click', openConnectivityDialog);
|
el('#open-connectivity').addEventListener('click', openConnectivityDialog);
|
||||||
el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
|
el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
|
||||||
// Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on
|
// Wake-on-LAN buttons live on unreachable hardware cards; delegate.
|
||||||
// reachable ones) are rendered dynamically, so delegate from the grid.
|
|
||||||
el('#hardware-grid').addEventListener('click', (e) => {
|
el('#hardware-grid').addEventListener('click', (e) => {
|
||||||
const wbtn = e.target.closest('[data-wake]');
|
const btn = e.target.closest('[data-wake]');
|
||||||
if (wbtn) { wakeSpark(wbtn.dataset.wake); return; }
|
if (btn) wakeSpark(btn.dataset.wake);
|
||||||
const kbtn = e.target.closest('[data-ssh-key]');
|
|
||||||
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
|
|
||||||
});
|
});
|
||||||
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
|
|
||||||
el('#open-local').addEventListener('click', openLocalModelDialog);
|
|
||||||
el('#lock-release').addEventListener('click', releaseLock);
|
|
||||||
setupCatalogDialog();
|
setupCatalogDialog();
|
||||||
setupAdvancedDialog();
|
setupAdvancedDialog();
|
||||||
setupLocalModelDialog();
|
|
||||||
// Open WebUI link from /api/config
|
// Open WebUI link from /api/config
|
||||||
try {
|
try {
|
||||||
state.config = await fetchJSON('/api/config');
|
state.config = await fetchJSON('/api/config');
|
||||||
@@ -2248,22 +1867,19 @@ async function init() {
|
|||||||
} catch {}
|
} catch {}
|
||||||
setupDashboardTabs();
|
setupDashboardTabs();
|
||||||
setupEndpointCollapse();
|
setupEndpointCollapse();
|
||||||
// Fire the (SSH-backed) menu scan without awaiting — it self-renders a
|
await loadModels();
|
||||||
// "Scanning…" state and fills in when it returns, so a slow/unreachable
|
|
||||||
// cluster never blocks first paint. pollStatus() below paints the rest.
|
|
||||||
loadModels();
|
|
||||||
await pollStatus();
|
await pollStatus();
|
||||||
await renderServices();
|
await renderServices();
|
||||||
pollCoordination();
|
|
||||||
pollHardware();
|
pollHardware();
|
||||||
pollUpdates();
|
pollUpdates();
|
||||||
|
// Disk-status probe runs after first paint — slow over SSH and not blocking.
|
||||||
|
loadDiskStatus();
|
||||||
// Speech-model patches panel — slow over SSH, runs after first paint.
|
// Speech-model patches panel — slow over SSH, runs after first paint.
|
||||||
renderSpeechModels();
|
renderSpeechModels();
|
||||||
setInterval(pollStatus, 5000);
|
setInterval(pollStatus, 5000);
|
||||||
setInterval(pollCoordination, 5000); // swap lock + schedule registry
|
|
||||||
setInterval(pollHardware, 8000); // every 8s
|
setInterval(pollHardware, 8000); // every 8s
|
||||||
setInterval(pollUpdates, 300000); // every 5 min
|
setInterval(pollUpdates, 300000); // every 5 min
|
||||||
setInterval(loadModels, 60000); // every 60s — re-scan the Sparks for added/removed models
|
setInterval(loadDiskStatus, 60000); // every 60s — disk state changes rarely
|
||||||
setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
|
setInterval(renderSpeechModels, 120000); // every 2 min — patches change rarely
|
||||||
}
|
}
|
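The two sides of this diff derive the catalog key from a Hugging Face repo id in different ways. When `/api/models/suggest` returns no key, one side falls back to the `fallbackKey` expression, which slugs the full `org/name` id, collapses runs of disallowed characters and trims stray dashes. The other side prefills `#cd-key` from only the part after the last `/` and replaces each disallowed character one-for-one. The standalone sketch below reproduces both expressions so the difference is concrete. The helper names `keyFromFullRepo` and `keyFromRepoName` are hypothetical and are not functions in the dashboard.

```javascript
// Hypothetical helpers that reproduce, outside the dashboard, the two
// key-derivation expressions that appear in the diff above.

// Full-id variant (the `fallbackKey` expression): lowercase the whole
// "org/name" id, collapse each run of characters outside [a-z0-9_-] into
// a single "-", then trim leading and trailing dashes.
function keyFromFullRepo(repo) {
  return repo
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Name-only variant (the `#cd-key` prefill): keep the text after the last
// "/", lowercase it, and replace every disallowed character individually,
// so runs are not collapsed and edge dashes are not trimmed.
function keyFromRepoName(repo) {
  return repo.split('/').pop().toLowerCase().replace(/[^a-z0-9_-]/g, '-');
}

const repo = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4';
console.log(keyFromFullRepo(repo)); // redhatai-qwen3-6-35b-a3b-nvfp4
console.log(keyFromRepoName(repo)); // qwen3-6-35b-a3b-nvfp4

// Edge case: punctuation runs and trailing symbols.
console.log(keyFromFullRepo('org/My Model!!')); // org-my-model
console.log(keyFromRepoName('org/My Model!!')); // my-model--
```

Keeping the org prefix makes collisions less likely when two publishers upload a model with the same name, at the cost of longer keys; collapsing and trimming also avoids keys with doubled or trailing dashes.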
||||||
|
|
||||||
|
|||||||
+9
-107
@@ -96,13 +96,6 @@
|
|||||||
</details>
|
</details>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="lock-banner" class="banner lock-banner hidden">
|
|
||||||
<span class="lock-icon" aria-hidden="true">🔒</span>
|
|
||||||
<span id="lock-text">GPU swap path reserved</span>
|
|
||||||
<span class="spacer"></span>
|
|
||||||
<button id="lock-release" class="btn small-btn">Release</button>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<nav id="dashboard-tabs" class="dashboard-tabs hidden" role="tablist">
|
<nav id="dashboard-tabs" class="dashboard-tabs hidden" role="tablist">
|
||||||
<button type="button" class="dashboard-tab" data-tab="llm" role="tab" aria-selected="true">LLM</button>
|
<button type="button" class="dashboard-tab" data-tab="llm" role="tab" aria-selected="true">LLM</button>
|
||||||
<button type="button" class="dashboard-tab" data-tab="audio" role="tab" aria-selected="false">Audio / Speech</button>
|
<button type="button" class="dashboard-tab" data-tab="audio" role="tab" aria-selected="false">Audio / Speech</button>
|
||||||
@@ -171,37 +164,6 @@
|
|||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
|
|
||||||
<dialog id="mb-update-dialog" class="modal">
|
|
||||||
<form method="dialog" class="modal-form">
|
|
||||||
<h3 id="mb-update-title">Updating matrix-bridge…</h3>
|
|
||||||
<div class="phase-row">
|
|
||||||
<div class="phase" id="mb-update-phase">Starting…</div>
|
|
||||||
<span class="spacer"></span>
|
|
||||||
<span class="timer" id="mb-update-elapsed">0:00</span>
|
|
||||||
</div>
|
|
||||||
<details open>
|
|
||||||
<summary class="muted small">Log</summary>
|
|
||||||
<pre id="mb-update-log" class="log"></pre>
|
|
||||||
</details>
|
|
||||||
<div class="modal-actions">
|
|
||||||
<button type="button" id="mb-update-close" class="btn">Close</button>
|
|
||||||
</div>
|
|
||||||
</form>
|
|
||||||
</dialog>
|
|
||||||
|
|
||||||
<dialog id="mb-logs-dialog" class="modal">
|
|
||||||
<form method="dialog" class="modal-form">
|
|
||||||
<h3 id="mb-logs-title">matrix-bridge logs</h3>
|
|
||||||
<p class="muted small">Last 100 lines from <code>docker logs</code> on the Spark.</p>
|
|
||||||
<pre id="mb-logs-pre" class="log"></pre>
|
|
||||||
<div class="modal-actions">
|
|
||||||
<button type="button" id="mb-logs-refresh" class="btn">Refresh</button>
|
|
||||||
<span class="spacer"></span>
|
|
||||||
<button type="button" id="mb-logs-close" class="btn">Close</button>
|
|
||||||
</div>
|
|
||||||
</form>
|
|
||||||
</dialog>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="speech-models-panel" class="speech-models hidden">
|
<section id="speech-models-panel" class="speech-models hidden">
|
||||||
@@ -236,15 +198,13 @@
|
|||||||
<div class="section-header">
|
<div class="section-header">
|
||||||
<h2 class="section-title">LLM swap</h2>
|
<h2 class="section-title">LLM swap</h2>
|
||||||
<button id="open-download" class="btn small-btn">+ Download a new model</button>
|
<button id="open-download" class="btn small-btn">+ Download a new model</button>
|
||||||
<button id="open-local" class="btn small-btn">+ Add local model</button>
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<dialog id="catalog-dialog" class="modal">
|
<dialog id="catalog-dialog" class="modal">
|
||||||
<form method="dialog" class="modal-form" id="catalog-form">
|
<form method="dialog" class="modal-form" id="catalog-form">
|
||||||
<h3>Set up this model</h3>
|
<h3>Add downloaded model to catalog</h3>
|
||||||
<p class="muted small">This model is downloaded, but Spark Control needs to know how to launch it. We've guessed from the model's own files — confirm or adjust, and it's saved so you're never asked again.</p>
|
<p class="muted small">It will appear as a new card you can swap to. Knob values become its default launch flags — you can tweak later via the model's "Advanced" panel.</p>
|
||||||
<p id="cd-detected" class="muted small cd-detected hidden"></p>
|
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+"></label>
|
||||||
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+" readonly></label>
|
|
||||||
<label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
|
<label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
|
||||||
<label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
|
<label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
|
||||||
<label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
|
<label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
|
||||||
@@ -265,70 +225,21 @@
|
|||||||
</fieldset>
|
</fieldset>
|
||||||
<div class="modal-actions">
|
<div class="modal-actions">
|
||||||
<button type="button" id="cd-cancel" class="btn">Cancel</button>
|
<button type="button" id="cd-cancel" class="btn">Cancel</button>
|
||||||
<button type="submit" id="cd-submit" class="btn primary">Save settings</button>
|
<button type="submit" class="btn primary">Add to catalog</button>
|
||||||
</div>
|
|
||||||
</form>
|
|
||||||
</dialog>
|
|
||||||
|
|
||||||
<dialog id="local-model-dialog" class="modal">
|
|
||||||
<form method="dialog" class="modal-form" id="local-model-form">
|
|
||||||
<h3>Add a local / fine-tuned model</h3>
|
|
||||||
<p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
|
|
||||||
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
|
|
||||||
<label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
|
|
||||||
<label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
|
|
||||||
<label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
|
|
||||||
<label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
|
|
||||||
<label class="modal-row"><span>Mode</span>
|
|
||||||
<select id="lm-mode">
|
|
||||||
<option value="solo">solo (Spark 1 only)</option>
|
|
||||||
<option value="cluster">cluster (both Sparks via Ray)</option>
|
|
||||||
</select>
|
|
||||||
</label>
|
|
||||||
<label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
|
|
||||||
<fieldset class="modal-fieldset">
|
|
||||||
<legend>Default launch knobs</legend>
|
|
||||||
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
|
|
||||||
<label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
|
|
||||||
<label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
|
|
||||||
<label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
|
|
||||||
<label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
|
|
||||||
</fieldset>
|
|
||||||
<div class="modal-actions">
|
|
||||||
<button type="button" id="lm-cancel" class="btn">Cancel</button>
|
|
||||||
<button type="submit" class="btn primary">Add local model</button>
|
|
||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
|
|
||||||
<dialog id="disk-delete-dialog" class="modal">
|
<dialog id="disk-delete-dialog" class="modal">
|
||||||
<form method="dialog" class="modal-form">
|
<form method="dialog" class="modal-form">
|
||||||
<h3>Remove this model from the Sparks?</h3>
|
<h3>Delete model weights from disk?</h3>
|
||||||
<p id="dd-summary" class="muted small"></p>
|
<p id="dd-summary" class="muted small"></p>
|
||||||
<ul class="muted small dd-hosts" id="dd-hosts"></ul>
|
<ul class="muted small dd-hosts" id="dd-hosts"></ul>
|
||||||
<p class="muted small">This deletes the weights and removes the card from the menu. You can always download it again later (re-downloading restores its saved settings).</p>
|
<p class="muted small">This is reversible — you can re-download from the catalog at any time. The catalog entry stays intact.</p>
|
||||||
<p id="dd-error" class="muted small dd-error hidden"></p>
|
<p id="dd-error" class="muted small dd-error hidden"></p>
|
||||||
<div class="modal-actions">
|
<div class="modal-actions">
|
||||||
<button type="button" id="dd-cancel" class="btn">Cancel</button>
|
<button type="button" id="dd-cancel" class="btn">Cancel</button>
|
||||||
<button type="button" id="dd-confirm" class="btn danger">Remove from disk & menu</button>
|
<button type="button" id="dd-confirm" class="btn danger">Delete from disk</button>
|
||||||
</div>
|
|
||||||
</form>
|
|
||||||
</dialog>
|
|
||||||
|
|
||||||
<dialog id="sshkey-dialog" class="modal">
|
|
||||||
<form method="dialog" class="modal-form">
|
|
||||||
<h3 id="sshkey-title">SSH public key</h3>
|
|
||||||
<p id="sshkey-intro" class="muted small"></p>
|
|
||||||
<div class="sshkey-row">
|
|
||||||
<pre id="sshkey-value" class="snippet copyable" data-copy-self title="Click to copy"></pre>
|
|
||||||
<button type="button" class="icon-btn" data-copy="#sshkey-value" title="Copy public key" aria-label="Copy public key">
|
|
||||||
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
|
||||||
</button>
|
|
||||||
</div>
|
|
||||||
<p class="muted small">To let this Spark log in to another machine (e.g. your Mac), run this in a terminal <em>on that machine</em>:</p>
|
|
||||||
<pre id="sshkey-install" class="snippet copyable" data-copy-self title="Click to copy"></pre>
|
|
||||||
<div class="modal-actions">
|
|
||||||
<button type="button" id="sshkey-close" class="btn">Close</button>
|
|
||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
@@ -355,12 +266,11 @@
|
|||||||
<div class="download-form" id="download-form">
|
<div class="download-form" id="download-form">
|
||||||
<label class="dl-row">
|
<label class="dl-row">
|
||||||
<span class="dl-label">HuggingFace repo</span>
|
<span class="dl-label">HuggingFace repo</span>
|
||||||
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off" list="dl-suggestions">
|
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
|
||||||
<datalist id="dl-suggestions"></datalist>
|
|
||||||
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
|
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
|
||||||
</label>
|
</label>
|
||||||
<div class="dl-help muted small">
|
<div class="dl-help muted small">
|
||||||
Type any repo, or pick a known one from the list. <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
|
<a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
|
||||||
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
|
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
|
||||||
</div>
|
</div>
|
||||||
<div class="dl-row">
|
<div class="dl-row">
|
||||||
@@ -403,14 +313,6 @@
|
|||||||
<section id="cards" class="cards"></section>
|
<section id="cards" class="cards"></section>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="schedule-panel" class="schedule-panel hidden">
|
|
||||||
<div class="section-header">
|
|
||||||
<h2 class="section-title">Scheduled jobs</h2>
|
|
||||||
</div>
|
|
||||||
<p class="muted small">Registered by your own automation. Spark Control only displays these — it doesn't run them.</p>
|
|
||||||
<div id="schedule-list" class="schedule-list"></div>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section id="update-banner" class="update-banner hidden">
|
<section id="update-banner" class="update-banner hidden">
|
||||||
<div class="ub-context muted small">
|
<div class="ub-context muted small">
|
||||||
Updates to <strong><a href="https://github.com/eugr/spark-vllm-docker" target="_blank" rel="noopener">eugr/spark-vllm-docker</a></strong>
|
Updates to <strong><a href="https://github.com/eugr/spark-vllm-docker" target="_blank" rel="noopener">eugr/spark-vllm-docker</a></strong>
|
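Both key inputs in this markup (`#cd-key` and `#lm-key`) use `pattern="[a-zA-Z0-9_-]+"`. Browsers that follow the current HTML specification compile the `pattern` attribute with the regular-expression `v` flag. Under that flag, an unescaped `-` at the end of a character class is a syntax error, and when a pattern fails to compile the browser drops the pattern check instead of rejecting input (it may log a console warning). Writing the class as `[a-zA-Z0-9_\-]+` compiles under both `v` and the older `u` flag. On one side `#cd-key` is also `readonly`, which bars it from constraint validation anyway, so the issue matters most for `#lm-key`. The sketch below emulates the browser's check in plain JavaScript. `patternFlag` and `compilePattern` are hypothetical helpers, and the emulation ignores every constraint other than `pattern`.

```javascript
// Hypothetical helpers emulating how a browser applies <input pattern="...">.

// Browsers following the current HTML spec compile `pattern` with the `v`
// flag; older ones used `u`. Use `v` when this engine supports it.
function patternFlag() {
  try {
    new RegExp('', 'v');
    return 'v';
  } catch {
    return 'u';
  }
}

// Returns a predicate for the attribute value, or null if it does not
// compile. A null result corresponds to the browser silently dropping the
// pattern constraint, so any value would be accepted.
function compilePattern(pattern) {
  try {
    // The pattern must match the entire value, hence the ^(?:...)$ wrapper.
    const re = new RegExp(`^(?:${pattern})$`, patternFlag());
    // An empty value is never a pattern mismatch; `required` covers that case.
    return (value) => value === '' || re.test(value);
  } catch {
    return null;
  }
}

const asWritten = compilePattern('[a-zA-Z0-9_-]+'); // null when `v` is in use
const escaped = compilePattern('[a-zA-Z0-9_\\-]+'); // compiles under `v` and `u`

console.log(patternFlag(), asWritten === null ? 'pattern dropped' : 'pattern active');
console.log(escaped('my-finetune_2'), escaped('my finetune')); // true false
```

In the HTML itself the escaped form is written with a single backslash: `pattern="[a-zA-Z0-9_\-]+"`.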
||||||
|
|||||||
@@ -74,42 +74,6 @@ main {
 }
 .banner em { font-style: normal; background: rgba(245, 158, 11, 0.15); padding: 2px 6px; border-radius: 4px; }
 
-/* GPU swap reservation (coordination layer) — informational, not a warning. */
-.lock-banner {
-display: flex;
-align-items: center;
-gap: 10px;
-border-color: var(--info);
-color: var(--info);
-}
-.lock-banner .lock-icon { font-size: 16px; }
-.lock-banner strong { color: var(--text); }
-.lock-banner .spacer { flex: 1; }
-
-/* Scheduled-jobs panel — read-only view of what external automation registered. */
-.schedule-panel { margin-top: 8px; }
-.schedule-list {
-display: grid;
-grid-template-columns: repeat(auto-fill, minmax(240px, 1fr));
-gap: 12px;
-margin-top: 8px;
-}
-.schedule-item {
-background: var(--surface);
-border: 1px solid var(--border);
-border-radius: var(--radius);
-padding: 12px 14px;
-}
-.schedule-item .name { font-weight: 600; margin-bottom: 4px; }
-.schedule-item code {
-background: var(--surface-2);
-border: 1px solid var(--border);
-border-radius: 4px;
-padding: 1px 5px;
-font-size: 12px;
-}
-.schedule-item .desc { margin-top: 6px; color: var(--muted); font-size: 13px; }
-
 /* ===== Endpoint panel ===== */
 
 .endpoint-panel {
@@ -410,12 +374,6 @@ main {
 }
 .hw-card .head .name { font-weight: 600; font-size: 15px; }
 .hw-card .head .meta { color: var(--muted); font-size: 12px; margin-left: auto; }
-/* WireGuard "VPN <ip>" badge in the meta line — accent (green) = on a tunnel. */
-.hw-card .head .meta .wg-badge { color: var(--accent); font-weight: 600; cursor: help; }
-/* Copy-this-Spark's-ssh-key button pins to the top-right corner; meta keeps
-its margin-left:auto so name/meta/button read left→right→corner. */
-.hw-card .head .ssh-key-btn { align-self: flex-start; padding: 3px 6px; }
-.hw-card .head .ssh-key-btn svg { width: 13px; height: 13px; }
 .hw-card.unreachable { border-color: rgba(239, 68, 68, 0.4); }
 .hw-card.unreachable .name { color: var(--error); }
 .hw-card.unreachable ol { color: var(--muted); }
@@ -429,10 +387,6 @@ main {
 }
 .hw-card .wol-row .btn { padding: 5px 10px; font-size: 12px; }
 .hw-card .mac-display { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; }
-/* SSH-key dialog: key line beside its copy button; long key wraps rather than scrolls. */
-.sshkey-row { display: flex; align-items: flex-start; gap: 8px; }
-.sshkey-row .snippet { flex: 1; margin: 0; white-space: pre-wrap; word-break: break-all; }
-#sshkey-install { white-space: pre-wrap; word-break: break-all; }
 
 .connectivity-content {
 max-height: 360px;
@@ -562,12 +516,10 @@ main {
 #dl-log-details { margin-top: 12px; }
 #dl-log-details summary { cursor: pointer; padding: 4px 0; }
 
-/* ===== NIM install + matrix-bridge dialogs ===== */
+/* ===== NIM install dialog ===== */
 
 .modal#nim-dialog,
-.modal#nim-progress-dialog,
-.modal#mb-update-dialog,
-.modal#mb-logs-dialog { max-width: 640px; }
+.modal#nim-progress-dialog { max-width: 640px; }
 .nim-grid {
 display: grid;
 gap: 8px;
@@ -730,7 +682,6 @@ main {
 .card .repo a { color: inherit; text-decoration: none; }
 .card .repo a:hover { color: var(--info); text-decoration: underline; }
 .card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
-.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
 .tag {
 background: var(--surface-2);
 border: 1px solid var(--border);
@@ -775,15 +726,8 @@ main {
 .card .adv-btn,
 .card .test-btn { padding: 8px 12px; font-size: 12px; }
 .card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
-.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
 .tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
 .tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
-.tag.setup-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
-.card.needs-setup { border-style: dashed; }
-.card-actions .btn[data-setup-key] { flex: 1; }
-.empty-menu { grid-column: 1 / -1; padding: 28px 16px; text-align: center; border: 1px dashed var(--border); border-radius: 10px; }
-.cd-detected { padding: 8px 10px; border: 1px solid var(--border); border-radius: 8px; background: rgba(255,255,255,0.02); }
-.cd-detected code { word-break: break-all; }
 .card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
 .card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); }
 .card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; }
@@ -6,9 +6,7 @@ from datetime import datetime, timezone
 from typing import Optional
 
 from .config import Settings
-from .coordination import WebhookNotifier, build_webhook_payload
 from .models import Catalog, build_launch_command
-from .shellsafe import quote_arg
 from .ssh import ssh_run, ssh_stream, StreamHandle
 
 
@@ -34,15 +32,9 @@ class SwapJob:
 
 
 class SwapManager:
-def __init__(
-self,
-settings: Settings,
-catalog: Catalog,
-notifier: Optional[WebhookNotifier] = None,
-) -> None:
+def __init__(self, settings: Settings, catalog: Catalog) -> None:
 self.settings = settings
 self.catalog = catalog
-self.notifier = notifier
 self.lock = asyncio.Lock()
 self.jobs: dict[str, SwapJob] = {}
 self.current_job_id: Optional[str] = None
@@ -85,21 +77,6 @@ class SwapManager:
 job.finished_at = datetime.now(timezone.utc).isoformat()
 if self.current_job_id == job.id:
 self.current_job_id = None
-# Outside the swap lock (so a webhook POST can't stall a queued swap) and
-# only for real swaps — a dry run never changes the running model. A
-# webhook failure is logged inside fire(), never raised.
-if self.notifier is not None and self.notifier.enabled and not job.dry_run:
-event = "swap_complete" if job.state == "ready" else "swap_failed"
-await self.notifier.fire(event, build_webhook_payload(
-event=event,
-job_id=job.id,
-model_key=job.model_key,
-state=job.state,
-returncode=job.returncode,
-started_at=job.started_at,
-finished_at=job.finished_at,
-dry_run=job.dry_run,
-))
 
 async def _do(self, job: SwapJob) -> None:
 model = self.catalog.models[job.model_key]
@@ -135,7 +112,7 @@ class SwapManager:
 
 # Step 3: tail logs until the ready marker (or timeout)
 job.state = "tailing"
-tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
+tail_cmd = "docker logs -f --tail 50 vllm_node"
 job.append(f"$ {tail_cmd}")
 timeout = max(model.expected_ready_seconds * 2, 600)
 handle = StreamHandle()
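The comment in the `SwapManager` hunk above gives two rules for the webhook: fire it only after the swap lock is released, so a slow POST cannot hold up a queued swap, and skip it for dry runs. Below is a minimal asyncio sketch of that ordering. `SketchNotifier` and `SketchSwapRunner` are hypothetical stand-ins, not the app's `WebhookNotifier` or `SwapManager`. The `asyncio.sleep` calls take the place of the real swap work and the HTTP POST.

```python
import asyncio
import logging

log = logging.getLogger("swap_sketch")


class SketchNotifier:
    """Hypothetical stand-in for a webhook notifier (not the app's WebhookNotifier)."""

    def __init__(self, timeline: list[str], fail: bool = False) -> None:
        self.timeline = timeline
        self.fail = fail

    async def fire(self, event: str, payload: dict) -> None:
        try:
            await asyncio.sleep(0.1)  # stands in for a slow HTTP POST
            if self.fail:
                raise ConnectionError("webhook endpoint unreachable")
            self.timeline.append(f"webhook {event} {payload['job']}")
        except Exception:
            # Log and swallow: a broken webhook must never fail the swap itself.
            log.exception("webhook %s failed", event)


class SketchSwapRunner:
    """Hypothetical runner: swaps are serialized by a lock, notification is not."""

    def __init__(self, notifier: SketchNotifier, timeline: list[str]) -> None:
        self.lock = asyncio.Lock()
        self.notifier = notifier
        self.timeline = timeline

    async def run(self, job: str, ok: bool = True, dry_run: bool = False) -> str:
        async with self.lock:
            self.timeline.append(f"start {job}")
            await asyncio.sleep(0.01)  # stands in for the real swap work
            state = "ready" if ok else "failed"
            self.timeline.append(f"end {job}")
        # The lock is already released here, so a queued swap can start while we POST.
        if not dry_run:
            event = "swap_complete" if state == "ready" else "swap_failed"
            await self.notifier.fire(event, {"job": job, "state": state})
        return state


async def demo() -> list[str]:
    timeline: list[str] = []
    runner = SketchSwapRunner(SketchNotifier(timeline), timeline)
    await asyncio.gather(
        runner.run("a"), runner.run("b", ok=False), runner.run("c", dry_run=True)
    )
    return timeline


async def demo_failing_webhook() -> tuple[str, list[str]]:
    timeline: list[str] = []
    runner = SketchSwapRunner(SketchNotifier(timeline, fail=True), timeline)
    state = await runner.run("x")  # the webhook error is logged, not raised
    return state, timeline
```

In `demo()`, all three swaps should finish before the first webhook completes, because no POST runs while `lock` is held. The dry run `c` sends nothing.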
@@ -22,7 +22,6 @@ from typing import Any
 
 from .config import Settings
 from .models import Catalog, build_launch_command
-from .shellsafe import quote_arg
 from .ssh import ssh_run
 
 
@@ -115,7 +114,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
 # Pipe the JSON args list to a here-doc Python invocation. The validator
 # reads from stdin to avoid shell-escaping the args themselves.
 cmd = (
-f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
+f"echo '{payload}' | docker exec -i vllm_node python3 -c "
 + shlex.quote(_VALIDATOR_SCRIPT)
 )
 
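Both versions of the validator command wrap the JSON payload in literal single quotes. That only stays safe while no arg contains a `'`. Below is a minimal sketch of the same `echo <json> | docker exec -i <container> python3 -c <script>` pipeline in which every interpolated piece goes through the standard library's `shlex.quote`. `build_validate_cmd` and `VALIDATOR_SCRIPT` are hypothetical, since the real `_VALIDATOR_SCRIPT` body is not part of this diff.

```python
import json
import shlex

# Hypothetical validator body; the real _VALIDATOR_SCRIPT is not shown in this diff.
VALIDATOR_SCRIPT = "import json, sys; args = json.load(sys.stdin); print(len(args))"


def build_validate_cmd(container: str, args: list[str]) -> str:
    """Sketch: pipe a JSON args list into `docker exec -i <container> python3 -c <script>`.

    The payload, the container name and the script are each quoted as one shell
    word, so a stray single quote in an arg cannot end the payload early.
    """
    payload = json.dumps(args)
    return (
        f"echo {shlex.quote(payload)} | docker exec -i {shlex.quote(container)} "
        f"python3 -c {shlex.quote(VALIDATOR_SCRIPT)}"
    )
```

Splitting the result with `shlex.split` is a quick check that each piece survived as a single word. The JSON in particular should come back intact even when an arg contains an apostrophe.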
@@ -1,14 +1,9 @@
-# spark-control launch recipes
+# spark-control model catalog
 #
-# These are NOT the dashboard menu. The menu is whatever is actually downloaded
-# on the Sparks — Spark Control scans the Hugging Face cache on each load and
-# shows what it finds. These entries are launch *recipes*: matched to an on-disk
-# model by `repo`, they say HOW to launch it. A downloaded model with no recipe
-# here shows up as "needs setup", and the dashboard infers + saves one on first
-# use (from the model's own config.json). Add a recipe to make a known model
-# launch correctly the moment it's downloaded, with no setup prompt.
+# Edit this file (or override at runtime via the StartOS "Edit Model Catalog"
+# action) to add or change available models.
 #
-# Each recipe produces this command on Spark 1:
+# Each model entry produces this command on Spark 1:
 # cd ~/spark-vllm-docker
 # ./launch-cluster.sh [--solo] -d exec vllm serve <repo> \
 # --port=<defaults.port> --host=<defaults.host> <vllm_args...>
@@ -59,34 +54,6 @@ models:
 - --enable-prefix-caching
 - --kv-cache-dtype=fp8
-
-gemma4-26b:
-display_name: "Gemma 4 26B-A4B (vision, light)"
-description: >-
-Lighter, faster sibling of the Gemma 4 31B above: a Mixture-of-Experts
-model with 26B total parameters but only ~4B active per token, so it
-generates quickly. Takes images as well as text (good for tasks like
-reading a business card into structured text). Reasoning is a bit
-shallower than the dense 31B. Runs solo on one Spark.
-repo: nvidia/Gemma-4-26B-A4B-NVFP4
-size_gb: 17
-mode: solo
-capabilities: [vision, reasoning, tools]
-expected_ready_seconds: 240
-vllm_args:
-- --gpu-memory-utilization=0.8
-- --max-model-len=32768
-- --max-num-batched-tokens=16384
-- --reasoning-parser=gemma4
-- --tool-call-parser=gemma4
-- --enable-auto-tool-choice
-# MoE backend: research found this model's expert layers fall back to
-# 'marlin' on GB10 (the fast flashinfer_cutlass path errors on sm_121).
-# If a swap fails to start, this flag is the first thing to flip.
-- --moe_backend=marlin
-- --load-format=fastsafetensors
-- --enable-prefix-caching
-- --kv-cache-dtype=fp8
 
 qwen36:
 display_name: "Qwen3.6 35B-A3B (daily driver)"
 description: >-
@@ -107,3 +74,36 @@ models:
 - --load-format=fastsafetensors
 - --enable-prefix-caching
 - --kv-cache-dtype=fp8
+
+qwen3-235b-fp8:
+display_name: "Qwen3 235B-A22B FP8 (legacy)"
+description: >-
+Earlier generation of the Qwen 235B family in native FP8 precision.
+Runs across both Sparks. Mostly superseded by Qwen3-VL above; keep
+around for text-only baseline comparisons.
+repo: Qwen/Qwen3-235B-A22B-FP8
+size_gb: 220
+mode: cluster
+capabilities: []
+expected_ready_seconds: 360
+vllm_args:
+- --gpu-memory-utilization=0.7
+- -tp=2
+- --distributed-executor-backend=ray
+- --max-model-len=32768
+
+qwen25-72b:
+display_name: "Qwen2.5 72B (legacy)"
+description: >-
+Last-generation 72B dense model. Cluster mode required due to size.
+Kept for compatibility and baseline comparison against newer Qwens.
+repo: Qwen/Qwen2.5-72B-Instruct
+size_gb: 145
+mode: cluster
+capabilities: []
+expected_ready_seconds: 360
+vllm_args:
+- --gpu-memory-utilization=0.7
+- -tp=2
+- --distributed-executor-backend=ray
+- --max-model-len=32768
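The header comment of `models.yaml` gives the command template that every entry expands to on Spark 1. Below is a minimal sketch of that expansion. `render_launch_command` is hypothetical and is not the app's `build_launch_command`. It only mirrors the documented shape, with `--solo` added for `mode: solo` entries. The `defaults` port and host values in the usage are assumptions, because `defaults` is not shown in this diff.

```python
import shlex
from typing import Any


def render_launch_command(recipe: dict[str, Any], defaults: dict[str, Any]) -> str:
    """Hypothetical renderer for the command template in the models.yaml header.

    Mirrors: cd ~/spark-vllm-docker, then
    ./launch-cluster.sh [--solo] -d exec vllm serve <repo>
        --port=<defaults.port> --host=<defaults.host> <vllm_args...>
    """
    argv = ["./launch-cluster.sh"]
    if recipe.get("mode") == "solo":
        argv.append("--solo")  # cluster entries omit the flag
    argv += ["-d", "exec", "vllm", "serve", recipe["repo"]]
    argv += [f"--port={defaults['port']}", f"--host={defaults['host']}"]
    argv += list(recipe.get("vllm_args", []))
    # shlex.join quotes only the tokens that need it; the ~ stays unquoted on purpose.
    return "cd ~/spark-vllm-docker && " + shlex.join(argv)
```

With the `qwen25-72b` entry, `mode: cluster` means no `--solo` flag. Its `vllm_args` (including `-tp=2` and the Ray executor backend) are appended verbatim after the port and host.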
@@ -12,12 +12,6 @@ dependencies = [
 "python-multipart>=0.0.9",
 ]
 
-[project.optional-dependencies]
-dev = ["pytest>=8"]
-
-[tool.pytest.ini_options]
-testpaths = ["tests"]
-
 [build-system]
 requires = ["setuptools>=68"]
 build-backend = "setuptools.build_meta"
@@ -1,17 +0,0 @@
-"""Shared pytest setup.
-
-These suites are pure/offline — they exercise pure functions and never touch the
-Sparks, /data, or the network. We still pin the env vars the app modules expect
-(documented in docs/guides/fastapi-image.md) to tmp paths so importing them can
-never write to the container-only /data path.
-"""
-import os
-import sys
-from pathlib import Path
-
-# Let `import app...` resolve whether or not the package is pip-installed.
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
-os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
-os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
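The coordination tests that follow describe a token-guarded swap lock with a clamped TTL and lazy expiry. They also describe a swap gate that decides from one read of the lock state rather than a separate `status()` plus `verify()` pair. Below is a minimal sketch of that lease pattern. `LeaseLock`, `TTL_MIN` and `TTL_MAX` are hypothetical (the real `LOCK_TTL_MIN`/`LOCK_TTL_MAX` values are not shown). On conflict the sketch raises a plain `PermissionError` where the tested `SwapLockManager` raises `LockHeld`.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

TTL_MIN, TTL_MAX = 5, 3600  # hypothetical bounds, not the app's LOCK_TTL_MIN/MAX


@dataclass
class Lease:
    holder: str
    token: str
    expires_at: datetime


class LeaseLock:
    """Hypothetical token-guarded TTL lease modelled on the behaviour the tests describe."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def _active(self, now: datetime) -> Optional[Lease]:
        # Lazy expiry: an expired lease is cleared the first time anyone looks at it.
        if self._lease is not None and now >= self._lease.expires_at:
            self._lease = None
        return self._lease

    def acquire(self, holder: str, ttl_seconds: int, now: datetime,
                token: Optional[str] = None) -> str:
        if not holder.strip():
            raise ValueError("holder is required")
        ttl = max(TTL_MIN, min(TTL_MAX, ttl_seconds))  # clamp the requested TTL
        lease = self._active(now)
        if lease is not None and lease.token != token:
            # The holder name is only descriptive; renewal needs the token.
            raise PermissionError(f"held by {lease.holder}")
        new_token = lease.token if lease is not None else secrets.token_hex(16)
        # Renewal restarts the window from `now` and keeps the same token.
        self._lease = Lease(holder, new_token, now + timedelta(seconds=ttl))
        return new_token

    def is_blocked_by(self, token: Optional[str], now: datetime) -> Optional[str]:
        """Single read: return the blocking holder, or None if this caller may swap."""
        lease = self._active(now)
        if lease is None or lease.token == token:
            return None
        return lease.holder
```

Expiry and the token check both come from a single `_active(now)` snapshot. Because of that, an expired lease can never block a caller that has no token.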
@@ -1,201 +0,0 @@
-"""Coordination layer: swap lock lifecycle/expiry, schedule registry CRUD, and
-the webhook payload+signature. All offline — the lock takes an injectable `now`
-so expiry is tested without sleeping, and the webhook is exercised only on the
-disabled (no-network) path plus its pure payload/signature helpers.
-"""
-import asyncio
-from datetime import datetime, timedelta, timezone
-
-import pytest
-
-from app.coordination import (
-LOCK_TTL_MAX,
-LOCK_TTL_MIN,
-LockHeld,
-ScheduleRegistry,
-SwapLockManager,
-WebhookNotifier,
-build_webhook_payload,
-sign_payload,
-valid_schedule_id,
-)
-
-T0 = datetime(2026, 6, 17, 12, 0, 0, tzinfo=timezone.utc)
-
-
-# ----------------------------------------------------------------- swap lock ----
-
-def test_acquire_free_lock_returns_token_and_status_held():
-mgr = SwapLockManager()
-lock = mgr.acquire("openclaw", ttl_seconds=60, note="daily vol", now=T0)
-assert lock.token
-st = mgr.status(now=T0)
-assert st["held"] is True
-assert st["holder"] == "openclaw"
-assert st["note"] == "daily vol"
-assert st["seconds_remaining"] == 60
-assert "token" not in st # public view never leaks the token
-
-
-def test_acquire_requires_holder():
-with pytest.raises(ValueError):
-SwapLockManager().acquire(" ", now=T0)
-
-
-def test_acquire_held_by_other_raises_lockheld_with_state():
-mgr = SwapLockManager()
-mgr.acquire("openclaw", ttl_seconds=60, now=T0)
-with pytest.raises(LockHeld) as ei:
-mgr.acquire("johnny5", ttl_seconds=60, now=T0)
-assert ei.value.state["holder"] == "openclaw"
-
-
-def test_reacquire_with_token_extends_and_keeps_token():
-mgr = SwapLockManager()
-first = mgr.acquire("openclaw", ttl_seconds=60, now=T0)
-later = T0 + timedelta(seconds=30)
-second = mgr.acquire("openclaw", ttl_seconds=60, token=first.token, now=later)
-assert second.token == first.token
-# window extended from the later moment, not the original
-assert mgr.status(now=later)["seconds_remaining"] == 60
-assert second.acquired_at == first.acquired_at # acquired_at preserved
-
-
-def test_reacquire_without_token_is_refused_even_for_same_holder_name():
-# Holder name is descriptive, not a secret — matching it must not grant access.
-mgr = SwapLockManager()
-mgr.acquire("openclaw", ttl_seconds=60, now=T0)
-with pytest.raises(LockHeld):
-mgr.acquire("openclaw", ttl_seconds=60, now=T0)
-
-
-def test_ttl_is_clamped():
-mgr = SwapLockManager()
-mgr.acquire("a", ttl_seconds=0, now=T0)
-assert mgr.status(now=T0)["seconds_remaining"] == LOCK_TTL_MIN
-mgr2 = SwapLockManager()
-mgr2.acquire("b", ttl_seconds=10**9, now=T0)
-assert mgr2.status(now=T0)["seconds_remaining"] == LOCK_TTL_MAX
-
-
-def test_lock_expires_and_clears_lazily():
-mgr = SwapLockManager()
-tok = mgr.acquire("openclaw", ttl_seconds=10, now=T0).token
-after = T0 + timedelta(seconds=11)
-assert mgr.status(now=after) == {"held": False}
-assert mgr.verify(tok, now=after) is False
-# an expired lock is free to re-take by anyone
-mgr.acquire("johnny5", ttl_seconds=10, now=after)
-assert mgr.status(now=after)["holder"] == "johnny5"
-
-
-def test_verify_matches_only_active_token():
-mgr = SwapLockManager()
-tok = mgr.acquire("openclaw", ttl_seconds=60, now=T0).token
-assert mgr.verify(tok, now=T0) is True
-assert mgr.verify("nope", now=T0) is False
-assert mgr.verify(None, now=T0) is False
-
-
-def test_release_requires_token_then_frees():
-mgr = SwapLockManager()
-tok = mgr.acquire("openclaw", ttl_seconds=60, now=T0).token
-with pytest.raises(PermissionError):
-mgr.release("wrong", now=T0)
-assert mgr.release(tok, now=T0) is True
-assert mgr.status(now=T0) == {"held": False}
-
-
-def test_force_release_skips_token_and_release_of_free_lock_is_false():
-mgr = SwapLockManager()
-mgr.acquire("openclaw", ttl_seconds=60, now=T0)
-assert mgr.release(force=True, now=T0) is True
-assert mgr.release(force=True, now=T0) is False # nothing held now
-
-
-def test_is_blocked_by_is_the_swap_gate():
-# Mirrors the single-read decision the /api/swap endpoint makes.
-mgr = SwapLockManager()
-assert mgr.is_blocked_by(None, now=T0) is None # free lock blocks nobody
-tok = mgr.acquire("openclaw", ttl_seconds=10, now=T0).token
-blocked = mgr.is_blocked_by(None, now=T0) # no token -> blocked
-assert blocked is not None and blocked["holder"] == "openclaw"
-assert mgr.is_blocked_by("wrong", now=T0) is not None # wrong token -> blocked
-assert mgr.is_blocked_by(tok, now=T0) is None # holder's token -> allowed
-# At/after expiry the gate is open even without a token (the bug a separate
-# status()+verify() pair would get wrong).
-assert mgr.is_blocked_by(None, now=T0 + timedelta(seconds=11)) is None
-
-
-# ------------------------------------------------------------------- webhook ----
-
-def test_build_webhook_payload_shape():
-p = build_webhook_payload(
-event="swap_complete", job_id="abc123", model_key="gemma",
-state="ready", returncode=0, started_at="t0", finished_at="t1",
-dry_run=False,
-)
-assert p == {
-"event": "swap_complete", "job_id": "abc123", "model_key": "gemma",
-"state": "ready", "returncode": 0, "started_at": "t0",
-"finished_at": "t1", "dry_run": False,
-}
-
-
-def test_sign_payload_is_deterministic_and_prefixed():
-body = b'{"event":"swap_complete"}'
-sig = sign_payload("s3cr3t", body)
-assert sig.startswith("sha256=")
-assert sig == sign_payload("s3cr3t", body)
-assert sig != sign_payload("other", body)
-
-
-def test_disabled_webhook_fire_is_noop():
-n = WebhookNotifier("", "")
-assert n.enabled is False
-# Must not attempt any network call or raise when no URL is configured.
-assert asyncio.run(n.fire("swap_complete", {"x": 1})) is None
-
-
-# --------------------------------------------------------- schedule registry ----
-
-def test_register_and_list_schedule():
-reg = ScheduleRegistry()
-e = reg.register(name="Daily Vol", owner="openclaw", cron="0 6 * * *")
-assert e.id and e.registered_at and e.updated_at
-listed = reg.list()
-assert len(listed) == 1 and listed[0]["name"] == "Daily Vol"
-
-
-def test_register_with_id_updates_in_place():
-reg = ScheduleRegistry()
-reg.register(name="Daily Vol", id="dv", owner="openclaw", cron="0 6 * * *")
-reg.register(name="Daily Vol v2", id="dv", owner="openclaw", cron="0 7 * * *")
-listed = reg.list()
-assert len(listed) == 1
-assert listed[0]["name"] == "Daily Vol v2" and listed[0]["cron"] == "0 7 * * *"
-
-
-def test_register_requires_name_and_validates_id():
-reg = ScheduleRegistry()
-with pytest.raises(ValueError):
-reg.register(name=" ")
-with pytest.raises(ValueError):
-reg.register(name="ok", id="bad id; rm -rf")
-
-
-def test_delete_schedule():
-reg = ScheduleRegistry()
-reg.register(name="Daily Vol", id="dv")
-assert reg.delete("dv") is True
-assert reg.delete("dv") is False
-assert reg.list() == []
-
-
-def test_valid_schedule_id():
-assert valid_schedule_id("daily-vol")
-assert valid_schedule_id("a.b_c-1")
-assert not valid_schedule_id("")
-assert not valid_schedule_id("../etc")
-assert not valid_schedule_id("has space")
-assert not valid_schedule_id("x" * 65)
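The discovery tests that follow pin down three small pure helpers. The first maps Hugging Face hub cache directory names (`models--<org>--<name>`) back to repo ids. The second parses a `<bytes>|<complete>|<dirname>` listing. The third turns a repo id into a URL-safe key. Below is a minimal sketch that satisfies those cases. `dirname_to_repo`, `parse_listing` and `repo_to_slug` are hypothetical names, not the app's `cache_dirname_to_repo`, `parse_cache_listing` or `repo_to_key`, and the shell command that produces the listing is not shown here.

```python
import re
from typing import Optional


def dirname_to_repo(dirname: str) -> Optional[str]:
    """Hypothetical: map a hub cache dir name such as "models--org--name" to "org/name"."""
    prefix = "models--"
    if not dirname.startswith(prefix):
        return None  # datasets--..., spaces--..., or unrelated dirs
    # Split only on the first "--": the org is the first segment, the rest is the name.
    org, sep, name = dirname[len(prefix):].partition("--")
    if not sep or not org or not name:
        return None  # e.g. "models--onlyorg"
    return f"{org}/{name}"


def parse_listing(output: str) -> list[tuple[str, int, bool]]:
    """Hypothetical parser for "<bytes>|<complete 0/1>|<dirname>" lines.

    Blank or malformed lines and non-model dirs are skipped; an unparseable
    size falls back to 0 instead of dropping the entry.
    """
    items: list[tuple[str, int, bool]] = []
    for line in output.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue
        size_s, complete_s, dirname = parts
        repo = dirname_to_repo(dirname)
        if repo is None:
            continue
        try:
            size = int(size_s)
        except ValueError:
            size = 0
        items.append((repo, size, complete_s == "1"))
    return items


def repo_to_slug(repo: str) -> str:
    """Hypothetical URL-safe key: lowercase, each run of non-alphanumerics becomes "-"."""
    return re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
```

Splitting on the first `--` only is what lets `models--org--weird--name` come back as `org/weird--name`, as the double-dash test below expects.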
@@ -1,190 +0,0 @@
-"""Disk-driven menu helpers: cache-dir parsing + launch-recipe inference.
-
-All offline — pure functions over a fake cache listing and fake config.json
-dicts. The SSH scan, the menu merge, and the suggest endpoint that wire these
-together are exercised by hand against the live cluster (mock-heavy unit tests of
-those would test the mocks).
-"""
-import asyncio
-
-from app import discovery
-from app.config import Settings
-from app.disk import DiskStatus, cache_dirname_to_repo, parse_cache_listing
-from app.discovery import repo_to_key, infer_recipe, _detect_family
-from app.models import load_catalog
-
-
-# ---- cache dirname <-> repo ----
-
-def test_cache_dirname_to_repo_roundtrip():
-assert cache_dirname_to_repo("models--RedHatAI--Qwen3.6-35B-A3B-NVFP4") == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
-
-
-def test_cache_dirname_name_with_double_dash():
-# The org is the first segment; everything after is the name (single '/').
-assert cache_dirname_to_repo("models--org--weird--name") == "org/weird--name"
-
-
-def test_cache_dirname_rejects_non_model_dirs():
-assert cache_dirname_to_repo("datasets--foo--bar") is None
-assert cache_dirname_to_repo("models--onlyorg") is None
-assert cache_dirname_to_repo("random") is None
-
-
-# ---- parse_cache_listing ----
-
-def test_parse_cache_listing_complete_and_incomplete():
-out = (
-"20000000000|1|models--RedHatAI--Qwen3.6-35B-A3B-NVFP4\n"
-"5000000000|0|models--some--half-downloaded\n"
-"\n"
-"garbage line with no pipes\n"
-"123|1|not-a-model-dir\n"
-)
-items = parse_cache_listing(out)
-assert items == [
-("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20000000000, True),
-("some/half-downloaded", 5000000000, False),
-]
-
-
-def test_parse_cache_listing_bad_size_defaults_zero():
-items = parse_cache_listing("notanumber|1|models--a--b")
-assert items == [("a/b", 0, True)]
-
-
-# ---- repo_to_key ----
-
-def test_repo_to_key_is_url_safe_and_stable():
-assert repo_to_key("RedHatAI/Qwen3.6-35B-A3B-NVFP4") == "redhatai-qwen3-6-35b-a3b-nvfp4"
-# Idempotent enough to be a stable id across calls.
-assert repo_to_key("nvidia/Gemma-4-26B-A4B-NVFP4") == "nvidia-gemma-4-26b-a4b-nvfp4"
-
-
-# ---- family detection ----
-
-def test_detect_qwen3_moe():
-cfg = {"architectures": ["Qwen3MoeForCausalLM"], "model_type": "qwen3_moe", "num_experts": 128}
-label, flags, caps = _detect_family(cfg)
-assert "--reasoning-parser=qwen3" in flags
-assert "--moe_backend=flashinfer_cutlass" in flags
-assert "reasoning" in caps
-assert "MoE" in label
-
-
-def test_detect_gemma_moe_uses_marlin():
-cfg = {"architectures": ["Gemma4MoeForConditionalGeneration"], "model_type": "gemma4_moe", "num_local_experts": 8}
-label, flags, caps = _detect_family(cfg)
-assert "--reasoning-parser=gemma4" in flags
-assert "--tool-call-parser=gemma4" in flags
-assert "--moe_backend=marlin" in flags # NOT flashinfer_cutlass — GB10 footgun
-assert "vision" in caps # ConditionalGeneration => multimodal
-assert "tools" in caps
-
-
-def test_detect_generic_has_no_family_flags():
-label, flags, caps = _detect_family({"architectures": ["LlamaForCausalLM"], "model_type": "llama"})
-assert flags == []
-assert label == "Generic"
-
-
-def test_detect_vision_from_config_keys():
-_, _, caps = _detect_family({"model_type": "qwen3", "vision_config": {"x": 1}})
-assert "vision" in caps
-
-
-# ---- infer_recipe (the prefill the setup form receives) ----
-
-def test_infer_recipe_solo_small_model():
-cfg = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}
-rec = infer_recipe("RedHatAI/Qwen3.6-35B-A3B-NVFP4", cfg, total_bytes=20_000_000_000, on_host_count=1)
-assert rec["mode"] == "solo"
-assert rec["key"] == "redhatai-qwen3-6-35b-a3b-nvfp4"
-assert rec["repo"] == "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
-assert "--reasoning-parser=qwen3" in rec["vllm_args"]
-assert "-tp=2" not in rec["vllm_args"]
-assert rec["knobs"]["kv_cache_dtype"] == "fp8"
-
-
-def test_infer_recipe_cluster_when_on_both_hosts():
-rec = infer_recipe("org/big", {}, total_bytes=10_000_000_000, on_host_count=2)
-assert rec["mode"] == "cluster"
-assert "-tp=2" in rec["vllm_args"]
-assert "--distributed-executor-backend=ray" in rec["vllm_args"]
-assert rec["knobs"]["gpu_memory_utilization"] == 0.7
-
-
-def test_infer_recipe_cluster_when_too_big_for_one_spark():
-rec = infer_recipe("org/huge", {}, total_bytes=200_000_000_000, on_host_count=1)
-assert rec["mode"] == "cluster"
-
-
-# ---- build_menu merge (disk scan ∪ recipes) ----
-
-def _both_spark_settings(monkeypatch) -> Settings:
-for k in ("SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER"):
-monkeypatch.delenv(k, raising=False)
-monkeypatch.setenv("SPARK1_HOST", "1.1.1.1")
-monkeypatch.setenv("SPARK1_USER", "u")
-monkeypatch.setenv("SPARK2_HOST", "2.2.2.2")
-monkeypatch.setenv("SPARK2_USER", "u")
-return Settings.from_env()
-
-
-def test_build_menu_merges_recipe_discovered_and_hides_incomplete(monkeypatch):
-cat = load_catalog("models.yaml") # bundled recipes incl. qwen36 + gemma4
-settings = _both_spark_settings(monkeypatch)
-
-async def fake_list(host, user, s):
-if host == "1.1.1.1":
-return [
-("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 20_000_000_000, True), # recipe match
-("someorg/mystery-7B", 7_000_000_000, True), # needs setup
-("broken/half", 1_000_000_000, False), # incomplete -> hidden
-]
-return [] # spark2 empty
-
-async def fake_probe(repo, mode, s, *, local_path=None):
-return DiskStatus(repo=local_path or repo, on_disk=False, total_bytes=0, per_host=[])
-
-monkeypatch.setattr(discovery, "list_cached_models", fake_list)
-monkeypatch.setattr(discovery, "probe_disk", fake_probe)
-
-menu = asyncio.run(discovery.build_menu(settings, cat))
-
-# Recipe-matched: keyed by recipe key, ready (not needs_setup), real size.
-assert "qwen36" in menu
-assert menu["qwen36"]["needs_setup"] is False
-assert menu["qwen36"]["total_bytes"] == 20_000_000_000
-
-# Discovered-without-recipe: slug key, needs_setup.
-slug = repo_to_key("someorg/mystery-7B")
-assert menu[slug]["needs_setup"] is True
-
-# Incomplete download is filtered out entirely.
|
|
||||||
assert all("half" not in k for k in menu)
|
|
||||||
|
|
||||||
# A recipe with nothing on disk (e.g. gemma4) must NOT appear — the menu is the disk.
|
|
||||||
assert "gemma4" not in menu
|
|
||||||
|
|
||||||
|
|
||||||
def test_build_menu_sums_cluster_model_across_both_sparks(monkeypatch):
|
|
||||||
cat = load_catalog("models.yaml")
|
|
||||||
settings = _both_spark_settings(monkeypatch)
|
|
||||||
|
|
||||||
async def fake_list(host, user, s):
|
|
||||||
# Same repo present on BOTH Sparks — one card, sizes summed (not two cards).
|
|
||||||
return [("org/sharded-235B", 70_000_000_000, True)]
|
|
||||||
|
|
||||||
async def fake_probe(repo, mode, s, *, local_path=None):
|
|
||||||
return DiskStatus(repo=repo, on_disk=False, total_bytes=0, per_host=[])
|
|
||||||
|
|
||||||
monkeypatch.setattr(discovery, "list_cached_models", fake_list)
|
|
||||||
monkeypatch.setattr(discovery, "probe_disk", fake_probe)
|
|
||||||
|
|
||||||
menu = asyncio.run(discovery.build_menu(settings, cat))
|
|
||||||
key = repo_to_key("org/sharded-235B")
|
|
||||||
assert list(menu) == [key] # exactly one card
|
|
||||||
assert menu[key]["total_bytes"] == 140_000_000_000 # summed across both hosts
|
|
||||||
assert len(menu[key]["per_host"]) == 2
|
|
||||||
assert menu[key]["mode"] == "cluster" # present on 2 hosts -> cluster
|
|
||||||
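The per-host merge rule that these two tests pin down can be sketched as a pure function: one card per repo, sizes summed across hosts, `cluster` once a repo sits on more than one Spark, and incomplete snapshots dropped. This is a minimal sketch under stated assumptions, not the real `discovery.build_menu`. The name `merge_host_listings` and the output shape are hypothetical, the `(repo, total_bytes, complete)` tuple shape is taken from the `fake_list` stubs, and recipe matching, `needs_setup`, and `probe_disk` are deliberately left out.

```python
from collections import defaultdict


def merge_host_listings(listings: dict[str, list[tuple[str, int, bool]]]) -> dict[str, dict]:
    """Hypothetical merge: one entry per repo across all hosts.

    `listings` maps host -> [(repo, total_bytes, complete), ...].
    """
    merged: dict[str, dict] = defaultdict(lambda: {"total_bytes": 0, "per_host": []})
    for host, entries in listings.items():
        for repo, size, complete in entries:
            if not complete:
                continue  # half-downloaded snapshot: hide it entirely
            card = merged[repo]
            card["total_bytes"] += size  # same repo on two hosts -> one card, summed size
            card["per_host"].append({"host": host, "bytes": size})
    for card in merged.values():
        # Present on more than one host means the weights are sharded across the cluster.
        card["mode"] = "cluster" if len(card["per_host"]) > 1 else "solo"
    return dict(merged)
```

Keeping this step free of recipes and SSH means the "one card, summed size" rule can be checked with plain data.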
@@ -1,69 +0,0 @@
"""_merge_words_with_speakers + _assign_speaker_to_word: the transcript/diarizer
merge that turns Parakeet words + Sortformer turns into speaker-labelled blocks.
Pure functions, no cluster — this is the core of transcribe-with-speakers.
"""
from app.audio_proxy import _assign_speaker_to_word, _merge_words_with_speakers


def _w(start, end, text):
    return {"start": start, "end": end, "text": text}


def _t(start, end, speaker):
    return {"start_s": start, "end_s": end, "speaker": speaker}


# ---- _assign_speaker_to_word ----

def test_assign_by_midpoint_containment():
    turns = [_t(0.0, 2.0, "Speaker_0"), _t(2.0, 4.0, "Speaker_1")]
    assert _assign_speaker_to_word(2.4, 2.8, turns) == "Speaker_1"


def test_assign_falls_back_to_max_overlap_when_midpoint_outside():
    # midpoint 5.0 is in no turn; word span overlaps Speaker_0 more than Speaker_1.
    turns = [_t(0.0, 4.9, "Speaker_0"), _t(6.0, 8.0, "Speaker_1")]
    assert _assign_speaker_to_word(4.0, 6.0, turns) == "Speaker_0"


def test_assign_unknown_when_no_overlap():
    turns = [_t(0.0, 1.0, "Speaker_0")]
    assert _assign_speaker_to_word(10.0, 11.0, turns) == "Speaker_unknown"


# ---- _merge_words_with_speakers ----

def test_empty_words_returns_empty():
    assert _merge_words_with_speakers([], [_t(0, 1, "Speaker_0")]) == []


def test_consecutive_same_speaker_words_join_into_one_block():
    words = [_w(0.0, 0.5, "good"), _w(0.5, 1.0, "morning")]
    turns = [_t(0.0, 2.0, "Speaker_0")]
    blocks = _merge_words_with_speakers(words, turns)
    assert blocks == [
        {"start_ms": 0, "end_ms": 1000, "speaker": "Speaker_0", "text": "good morning"}
    ]


def test_speaker_change_splits_blocks():
    words = [_w(0.0, 1.0, "hi"), _w(2.1, 3.0, "hello")]
    turns = [_t(0.0, 2.0, "Speaker_0"), _t(2.0, 4.0, "Speaker_1")]
    blocks = _merge_words_with_speakers(words, turns)
    assert [b["speaker"] for b in blocks] == ["Speaker_0", "Speaker_1"]
    assert [b["text"] for b in blocks] == ["hi", "hello"]


def test_long_silence_breaks_block_for_same_speaker():
    # >1.5s gap between two words of the same speaker forces a new block.
    words = [_w(0.0, 0.5, "one"), _w(3.0, 3.5, "two")]
    turns = [_t(0.0, 4.0, "Speaker_0")]
    blocks = _merge_words_with_speakers(words, turns)
    assert len(blocks) == 2
    assert [b["text"] for b in blocks] == ["one", "two"]


def test_punctuation_token_joins_without_leading_space():
    words = [_w(0.0, 0.5, "hello"), _w(0.5, 0.7, ".")]
    turns = [_t(0.0, 2.0, "Speaker_0")]
    assert _merge_words_with_speakers(words, turns)[0]["text"] == "hello."
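The behaviour locked in above fits in a short sketch: midpoint containment first, then largest overlap, then `Speaker_unknown`; a new block on a speaker change or on a silence longer than 1.5 s; punctuation glued to the previous word. It is an illustrative reimplementation, not the code in `app/audio_proxy.py`. The names `assign_speaker`, `merge_words`, and `MAX_GAP_S` are hypothetical, the 1.5 s threshold is read off the test comment, and the boundary handling (inclusive midpoint check, strict `>` on the gap) is an assumption.

```python
import string

MAX_GAP_S = 1.5  # assumed silence threshold, taken from the ">1.5s gap" test comment


def assign_speaker(start: float, end: float, turns: list[dict]) -> str:
    mid = (start + end) / 2
    # 1) A turn containing the word's midpoint wins outright.
    for t in turns:
        if t["start_s"] <= mid <= t["end_s"]:
            return t["speaker"]
    # 2) Otherwise pick the turn with the largest positive overlap.
    best, best_overlap = None, 0.0
    for t in turns:
        overlap = min(end, t["end_s"]) - max(start, t["start_s"])
        if overlap > best_overlap:
            best, best_overlap = t["speaker"], overlap
    # 3) No overlap at all.
    return best if best is not None else "Speaker_unknown"


def merge_words(words: list[dict], turns: list[dict]) -> list[dict]:
    blocks: list[dict] = []
    last_end_s = 0.0
    for w in words:
        speaker = assign_speaker(w["start"], w["end"], turns)
        prev = blocks[-1] if blocks else None
        gap = w["start"] - last_end_s
        if prev is None or prev["speaker"] != speaker or gap > MAX_GAP_S:
            # Speaker change or long silence: open a new block.
            blocks.append({
                "start_ms": round(w["start"] * 1000),
                "end_ms": round(w["end"] * 1000),
                "speaker": speaker,
                "text": w["text"],
            })
        else:
            # Same speaker, short gap: extend the current block.
            is_punct = all(ch in string.punctuation for ch in w["text"])
            prev["text"] += w["text"] if is_punct else " " + w["text"]
            prev["end_ms"] = round(w["end"] * 1000)
        last_end_s = w["end"]
    return blocks
```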
@@ -1,148 +0,0 @@
"""build_launch_command: argument assembly + the shell-injection invariant.

The security-critical property is that every user-controllable value (repo,
vllm_args, knobs) is shlex-quoted at the sink, so `shlex.split` cleanly reverses
the command back into the exact token list. The vLLM pre-flight validator
(validate.py) depends on this round-trip — these tests lock it in.
"""
import shlex

import pytest
from pydantic import ValidationError

from app.models import Defaults, ModelDef, build_launch_command

DEFAULTS = Defaults(port=8888, host="0.0.0.0")


def _model(**kw) -> ModelDef:
    base = dict(display_name="X", repo="org/name", size_gb=1.0, mode="solo")
    base.update(kw)
    return ModelDef(**base)


def test_solo_model_emits_solo_flag_and_ordered_args():
    cmd = build_launch_command("k", _model(vllm_args=["--max-model-len=1000"]), DEFAULTS)
    assert cmd == (
        "./launch-cluster.sh --solo -d exec vllm serve org/name "
        "--port=8888 --host=0.0.0.0 --max-model-len=1000"
    )


def test_cluster_model_omits_solo_flag():
    cmd = build_launch_command("k", _model(mode="cluster", vllm_args=["-tp=2"]), DEFAULTS)
    assert " --solo " not in cmd
    assert cmd.startswith("./launch-cluster.sh -d exec vllm serve org/name")


def test_knob_overrides_matching_bundled_flag():
    # bundled arg sets max-model-len; the knob must win (single occurrence).
    m = _model(vllm_args=["--max-model-len=1000"], knobs={"max_model_len": 65536})
    cmd = build_launch_command("k", m, DEFAULTS)
    assert "--max-model-len=65536" in cmd
    assert "--max-model-len=1000" not in cmd


def test_repo_with_shell_metacharacters_is_quoted_not_executed():
    # build_launch_command quotes even a hostile repo (validate_repo guards the
    # API boundary; this proves the sink itself is safe in depth).
    evil = "org/name; rm -rf ~ #"
    cmd = build_launch_command("k", _model(repo=evil), DEFAULTS)
    # The raw metacharacters must not appear unquoted...
    assert "; rm -rf" not in cmd.replace(shlex.quote(evil), "")
    # ...and shlex.split must recover the repo as one literal token.
    tokens = shlex.split(cmd)
    assert evil in tokens


def test_command_string_round_trips_through_shlex_split():
    # The invariant validate.py relies on: every arg survives quote -> split intact.
    args = ["--max-model-len=32768", "--load-format=fastsafetensors", "--note=a b c"]
    cmd = build_launch_command("k", _model(vllm_args=args), DEFAULTS)
    tokens = shlex.split(cmd)
    for a in args:
        assert a in tokens


def test_injection_via_vllm_arg_stays_literal():
    payload = "--foo=$(touch /tmp/pwned)"
    cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
    assert payload in shlex.split(cmd) # preserved as one inert token


# ---- local / fine-tuned models (served by directory, not HF repo) ----

def test_local_model_bind_mounts_dir_and_serves_the_path():
    m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
    cmd = build_launch_command("k", m, DEFAULTS)
    tokens = shlex.split(cmd)
    # The launch script's hook bind-mounts the host dir at the SAME container path.
    assert tokens[0] == (
        "VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
    )
    # vLLM is pointed at the directory, not an HF repo id.
    i = tokens.index("serve")
    assert tokens[i + 1] == "/home/u/models/ft-v2"
    assert "--max-model-len=2048" in tokens


def test_local_model_chat_template_arg_survives_round_trip():
    m = _model(
        repo="",
        local_path="/m/ft",
        vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
    )
    cmd = build_launch_command("k", m, DEFAULTS)
    assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)


def test_local_path_with_metacharacters_is_quoted_not_executed():
    # The validator rejects a hostile path at the boundary; bypass it with
    # model_construct to prove the quote_arg sink is safe in depth even if a bad
    # value somehow reaches build_launch_command.
    evil = "/m/ft; rm -rf ~"
    m = ModelDef.model_construct(
        display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
        vllm_args=[], knobs=None, custom=False, capabilities=[],
        expected_ready_seconds=300, description=None,
    )
    cmd = build_launch_command("k", m, DEFAULTS)
    tokens = shlex.split(cmd)
    i = tokens.index("serve")
    assert tokens[i + 1] == evil # recovered as one literal token, not executed
    assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"


def test_model_requires_exactly_one_source():
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", size_gb=1, mode="solo") # neither repo nor local_path
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo") # both


def test_local_model_rejects_chat_template_outside_dir():
    # Only local_path is mounted into the container, so a chat-template elsewhere
    # would silently 404 inside vLLM — reject it up front.
    with pytest.raises(ValidationError):
        ModelDef(
            display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
            vllm_args=["--chat-template=/other/dir/t.jinja"],
        )


def test_invalid_local_path_rejected_by_model():
    with pytest.raises(ValidationError):
        ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")


def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
    # YAML/override-added local models get the same validation as the API; a single
    # bad entry is skipped (logged) rather than breaking the whole catalog load.
    from app import models as M
    monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
        {"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
        {"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
    ]})
    cat = M._merge_overrides(M.Catalog(models={}))
    assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
    assert "bad" not in cat.models # traversal path skipped, not catalog-fatal
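The sink-side invariant described in this file's docstring is: quote every user-controlled token, leave the fixed prefix alone, and `shlex.split` then gives the original tokens back. It can be illustrated in a few lines. This is a minimal sketch that assumes the simplified argument order from the solo-mode test. `build_serve_command` is a hypothetical name, and the sketch ignores knobs, `local_path` bind-mounts, and the `VLLM_SPARK_EXTRA_DOCKER_ARGS` prefix.

```python
import shlex


def build_serve_command(repo: str, vllm_args: list[str], *, port: int, host: str, solo: bool) -> str:
    # Trusted, fixed prefix: never contains user input, so it is not quoted.
    prefix = ["./launch-cluster.sh", *(["--solo"] if solo else []), "-d", "exec", "vllm", "serve"]
    # Everything a user can influence goes through shlex.quote at the sink.
    user_tokens = [repo, f"--port={port}", f"--host={host}", *vllm_args]
    return " ".join(prefix + [shlex.quote(token) for token in user_tokens])
```

`shlex.quote` leaves a string made only of safe characters untouched, which is why the expected solo command contains no quotes at all. Since Python 3.8, `shlex.join` performs the same quote-then-join; the explicit comprehension only makes the boundary between trusted prefix and quoted input visible.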
@@ -1,47 +0,0 @@
"""build_update_command: the matrix-bridge update one-liner.

Pure string assembly, no cluster. Locks in the contract from
docs/spark-control-integration.md (matrix-bridge repo): fetch, hard-reset to the
release branch, then rebuild/recreate via docker compose — chained with `&&` so
any failure (e.g. Gitea unreachable) aborts before the build and surfaces a
non-zero exit. The clone dir must stay unquoted so a `~` expands server-side.
"""
from app.matrix_bridge import build_update_command, _phase_for


def test_command_is_the_contract_chain():
    cmd = build_update_command("~/matrix-bridge", "master")
    assert cmd == (
        "cd ~/matrix-bridge && "
        "git fetch origin && "
        "git reset --hard origin/master && "
        "docker compose up -d --build"
    )


def test_fail_loud_chaining():
    # Every step is &&-chained: a failed fetch never reaches the build.
    cmd = build_update_command("~/matrix-bridge", "master")
    assert "; " not in cmd
    assert cmd.count(" && ") == 3
    assert cmd.index("git fetch") < cmd.index("git reset") < cmd.index("docker compose")


def test_tilde_dir_left_unquoted_for_server_side_expansion():
    cmd = build_update_command("~/matrix-bridge", "master")
    assert "cd ~/matrix-bridge &&" in cmd
    assert "'~" not in cmd # quoting would defeat the home-dir expansion


def test_absolute_dir_and_custom_branch():
    cmd = build_update_command("/home/modelo/matrix-bridge", "phase-1")
    assert cmd.startswith("cd /home/modelo/matrix-bridge && ")
    assert "git reset --hard origin/phase-1 &&" in cmd


def test_phase_detection_maps_known_lines():
    assert _phase_for("HEAD is now at 1a2b3c4 some commit") == "Resetting to the latest release…"
    assert _phase_for("#5 building image") == "Building the bot image…"
    assert _phase_for("Container matrix-bridge Recreate") == "Recreating the container…"
    assert _phase_for("Already up to date.") == "No new code; rebuilding…"
    assert _phase_for("some unremarkable line") is None
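Here is a sketch of a builder that satisfies the contract string above. The directory must stay unquoted so that `~` expands on the remote side, so this version assumes both values are trusted configuration. It adds a conservative whitelist check as a stand-in for real validation. `update_one_liner` and `_UNQUOTED_SAFE` are hypothetical names, and the whitelist pattern is an assumption, not what `app/matrix_bridge.py` does.

```python
import re

# Assumed whitelist for values spliced into the command WITHOUT quoting.
_UNQUOTED_SAFE = re.compile(r"~?[A-Za-z0-9._/-]+")


def update_one_liner(clone_dir: str, branch: str) -> str:
    for value in (clone_dir, branch):
        if not _UNQUOTED_SAFE.fullmatch(value):
            raise ValueError(f"refusing to splice unsafe value unquoted: {value!r}")
    steps = [
        f"cd {clone_dir}",  # unquoted on purpose so the remote shell expands "~"
        "git fetch origin",
        f"git reset --hard origin/{branch}",
        "docker compose up -d --build",
    ]
    # "&&" (never ";") so the first failing step stops the chain with a non-zero exit.
    return " && ".join(steps)
```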
@@ -1,127 +0,0 @@
"""shellsafe validators: the API-boundary whitelist behind the v0.19.0 SSH
command-injection hardening. The quoting *sink* is covered in
test_launch_command.py; this locks in the *boundary* — that hostile input is
rejected early, and that a valid value passes through unchanged so callers can
use `validate_x(v)` inline.
"""
import pytest

from app.shellsafe import (
    validate_container,
    validate_image,
    validate_local_path,
    validate_repo,
)

# Shell metacharacters that must never survive any validator — these are the
# actual injection vectors. (Path traversal like "../" is NOT in scope here:
# validate_image legitimately permits "/" and "." for real image refs such as
# nvcr.io/nim/...; the defense for images is "no shell metacharacters" + the
# quote_arg sink, not path-shape. Slash-rejection is tested directly for repo
# and container, where "/" is disallowed.)
HOSTILE = [
    "; rm -rf /",
    " a b",
    "$(touch pwned)",
    "`id`",
    "x|cat",
    "x&y",
    "x>out",
    "x\nrm",
]


# ---- validate_repo: HF 'org/name', exactly one slash ----

@pytest.mark.parametrize("repo", [
    "RedHatAI/Qwen3.6-35B-A3B-NVFP4", # the live production model
    "org/name",
    "a.b_c-d/x.y_z-1",
])
def test_repo_valid_passes_through_unchanged(repo):
    assert validate_repo(repo) == repo


@pytest.mark.parametrize("repo", [
    "",
    "noslash",
    "a/b/c", # two slashes
    "/name", # empty org
    "org/", # empty name
] + [f"org/name{h}" for h in HOSTILE])
def test_repo_rejects_malformed_and_hostile(repo):
    with pytest.raises(ValueError):
        validate_repo(repo)


# ---- validate_image: registry/path:tag@digest ----

@pytest.mark.parametrize("image", [
    "nvcr.io/nim/nvidia/parakeet-1_1b-ctc-en-us:latest",
    "ubuntu",
    "img@sha256:deadbeefcafe",
    "a.b/c:1.2_3-4",
])
def test_image_valid_passes_through_unchanged(image):
    assert validate_image(image) == image


@pytest.mark.parametrize("image", [
    "",
    "-leading", # must start alphanumeric
    ".leading",
    "/leading",
    ":leading",
    "a" * 513, # over the 512 cap
] + [f"img{h}" for h in HOSTILE])
def test_image_rejects_malformed_and_hostile(image):
    with pytest.raises(ValueError):
        validate_image(image)


# ---- validate_container: Docker name rule, no slash ----

@pytest.mark.parametrize("name", [
    "parakeet-asr",
    "a",
    "vol_1.2-3",
])
def test_container_valid_passes_through_unchanged(name):
    assert validate_container(name) == name


@pytest.mark.parametrize("name", [
    "",
    "_leading", # underscore is not a valid first char
    "-leading",
    ".leading",
    "has/slash", # slash not allowed in a container name
    "a" * 129, # over the 128 cap
] + [f"name{h}" for h in HOSTILE])
def test_container_rejects_malformed_and_hostile(name):
    with pytest.raises(ValueError):
        validate_container(name)


# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----

@pytest.mark.parametrize("path", [
    "/home/modelo/models/gemma-4-31B-ten31-v2",
    "/data/models/ft.v2_1",
    "/srv/m/a-b/c",
])
def test_local_path_valid_passes_through_unchanged(path):
    assert validate_local_path(path) == path


@pytest.mark.parametrize("path", [
    "",
    "relative/path", # must be absolute
    "~/models/x", # no ~ expansion
    "/models/../etc/shadow", # '..' traversal
    "/models/./x", # '.' segment
    "/a" * 300, # over the 512 cap (600 chars)
] + [f"/models/x{h}" for h in HOSTILE])
def test_local_path_rejects_relative_traversal_and_hostile(path):
    with pytest.raises(ValueError):
        validate_local_path(path)
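The whitelist approach these tests exercise can be sketched with anchored regular expressions. The patterns below are assumptions reverse-engineered from the parametrized cases: segment characters `A-Za-z0-9._-`, an alphanumeric first character, and the 128 and 512 length caps. They are not the patterns in `app/shellsafe.py`. `check_repo`, `check_container`, and `check_local_path` are hypothetical names, and image validation is left out.

```python
import re

# Assumed name segment: alphanumeric first char, then letters, digits, '.', '_', '-'.
_NAME = r"[A-Za-z0-9][A-Za-z0-9._-]*"
_REPO_RE = re.compile(rf"{_NAME}/{_NAME}")  # exactly one slash, both sides non-empty
# Assumed Docker-style container name: alphanumeric first char, 128 chars max, no '/'.
_CONTAINER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,127}")
# One path segment; '.' and '..' are rejected explicitly below.
_PATH_SEGMENT_RE = re.compile(r"[A-Za-z0-9._-]+")
_MAX_PATH_LEN = 512


def check_repo(value: str) -> str:
    """Accept only 'org/name'; return the value unchanged so callers can use it inline."""
    if not _REPO_RE.fullmatch(value):
        raise ValueError(f"invalid repo id: {value!r}")
    return value


def check_container(value: str) -> str:
    if not _CONTAINER_RE.fullmatch(value):
        raise ValueError(f"invalid container name: {value!r}")
    return value


def check_local_path(value: str) -> str:
    """Absolute path; no empty, '.' or '..' segments; nothing outside the whitelist."""
    if not value.startswith("/") or len(value) > _MAX_PATH_LEN:
        raise ValueError(f"invalid local path: {value!r}")
    for segment in value[1:].split("/"):
        if segment in ("", ".", "..") or not _PATH_SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"invalid local path: {value!r}")
    return value
```

Using `re.fullmatch` matters here. Unlike a `^...$` pattern, it does not accept a trailing newline, so a value such as `"org/name\n"` is rejected too.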
@@ -1,120 +0,0 @@
"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
extra-vLLM probe. All offline — the disabled checks short-circuit before any
network call, and the probes are exercised only on the not-configured path.
"""
import asyncio

from app.config import Settings
from app.health import (
    check_embeddings,
    check_kokoro,
    check_parakeet,
    check_qdrant,
    check_vllm,
    probe_vllm_endpoint,
)
from app.services import services_from_settings


def _settings(monkeypatch, **env) -> Settings:
    # Pin the topology env vars under test; default the rest to blank so a stray
    # value in the real environment can't leak into the assertion.
    keys = [
        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
        "DISABLED_SERVICES", "VLLM_CONTAINER",
    ]
    for k in keys:
        monkeypatch.delenv(k, raising=False)
    for k, v in env.items():
        monkeypatch.setenv(k, v)
    return Settings.from_env()


# ---- DISABLED_SERVICES parsing ----

def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
    assert s.disabled_services == frozenset({"parakeet", "kokoro"})


def test_disabled_services_blank_is_empty(monkeypatch):
    assert _settings(monkeypatch).disabled_services == frozenset()


# ---- vLLM container override ----

def test_vllm_container_defaults_to_vllm_node(monkeypatch):
    assert _settings(monkeypatch).vllm_container == "vllm_node"


def test_vllm_container_override(monkeypatch):
    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"


def test_vllm_container_invalid_falls_back(monkeypatch):
    # A malformed value (space / shell metachar) is rejected at the boundary and
    # falls back to the default rather than crashing startup or reaching a sink.
    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"


# ---- services map honors the disable list ----

def test_services_from_settings_drops_disabled(monkeypatch):
    s = _settings(
        monkeypatch,
        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
        DISABLED_SERVICES="parakeet,qdrant",
    )
    svcs = services_from_settings(s)
    assert "parakeet" not in svcs and "qdrant" not in svcs
    assert "kokoro" in svcs and "embeddings" in svcs


def test_custom_vllm_service_registered(monkeypatch):
    from app import custom_services
    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
         "user": "u", "container": "vllm_node", "port": 8000},
    ])
    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
    svc = services_from_settings(s)["vllm-spark2"]
    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"


def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
    # A custom entry can't shadow a built-in key — the built-in wins.
    from app import custom_services
    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
    ])
    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
    assert services_from_settings(s)["parakeet"].kind == "stt"


# ---- disabled health checks short-circuit (no network) ----

def test_disabled_check_returns_disabled_verdict(monkeypatch):
    s = _settings(
        monkeypatch,
        SPARK2_HOST="10.0.0.2", SPARK2_USER="u", # host set, but disable wins
        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
    )
    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
        r = asyncio.run(check(s))
        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}


# ---- vLLM probe: not-configured path is pure ----

def test_probe_vllm_endpoint_unconfigured(monkeypatch):
    r = asyncio.run(probe_vllm_endpoint("", 8000))
    assert r["ok"] is False and "not configured" in r["error"]


def test_check_vllm_unconfigured_without_spark1(monkeypatch):
    s = _settings(monkeypatch) # no SPARK1_HOST
    r = asyncio.run(check_vllm(s))
    assert r["ok"] is False and "spark1 not configured" in r["error"]
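The parsing rules asserted for `Settings.from_env()` are simple enough to sketch standalone: split on commas, trim, lowercase, and drop empties for the disable list; fall back to `vllm_node` when the container override is not a safe name. `parse_disabled_services` and `resolve_vllm_container` are hypothetical helpers, not the code in `app/config.py`, and the container pattern is an assumed Docker-name whitelist.

```python
import re
from typing import Optional

# Assumed Docker-style name whitelist: alphanumeric first char, 128 chars max.
_CONTAINER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,127}")
DEFAULT_VLLM_CONTAINER = "vllm_node"


def parse_disabled_services(raw: Optional[str]) -> frozenset:
    """Comma-separated, case-insensitive; blank entries are ignored."""
    if not raw:
        return frozenset()
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def resolve_vllm_container(raw: Optional[str]) -> str:
    """Use the override only if it is a safe container name; otherwise the default."""
    if raw and _CONTAINER_RE.fullmatch(raw):
        return raw
    return DEFAULT_VLLM_CONTAINER
```

Falling back instead of raising mirrors the intent stated in `test_vllm_container_invalid_falls_back`: a bad value should neither crash startup nor reach a shell sink.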
+2
-2
@@ -17,7 +17,7 @@ The old chunking/retry workaround in `audio_proxy.py` and the Magpie sections in
 **Fix:**
 
 ```bash
-ssh <spark-user>@<spark-2-host> 'docker run --rm -v magpie-model-cache:/cache alpine chown -R 1000:1000 /cache && docker restart magpie-tts'
+ssh modelo@<spark-2-host> 'docker run --rm -v magpie-model-cache:/cache alpine chown -R 1000:1000 /cache && docker restart magpie-tts'
 ```
 
 The trick is the `docker run --rm alpine chown` — it runs as root inside the throwaway container, which is enough to chown the bind-mounted volume on the host, without needing `sudo` on the host itself. After the chown + restart, magpie downloaded its ~3 GB model from NGC into the cache and came up healthy on `:9000`.
@@ -38,7 +38,7 @@ After the eugr/spark-vllm-docker update, vLLM became stricter about multimodal t
 
 ## Two SSH paths to Spark 1 from the laptop
 
-`ssh <spark-user>@<spark-1-ip>` does NOT work from the laptop because the NVIDIA Sync ssh_config only has a Host entry for the Spark's `.local` mDNS name, not its bare IP. Always SSH via the `<spark-1-host>.local` hostname (or another entry that the ssh_config actually matches) rather than the raw IP.
+`ssh modelo@192.168.1.103` does NOT work from the laptop because the NVIDIA Sync ssh_config only has a Host entry for `spark-27ea.local`. Always use the `.local` hostname or `192.168.1.87`-style entries that ARE matched.
 
 ## Older models in `models.yaml`
 
+1
-1
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2026 Alice
+Copyright (c) 2026 Grant
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -1,14 +1,3 @@
 ARCHES := x86
 # overrides to s9pk.mk must precede the include statement
 include s9pk.mk
-
-# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only
-# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN;
-# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh.
-RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts)
-
-.PHONY: release
-release:
-	@test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; }
-	GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \
-		../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk"
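The `RELEASE_VERSION` line relies on `sed -n` with the `p` flag, which prints the captured group for every matching line and nothing else. A Python equivalent makes that behaviour easy to check. If `v0_1_0.ts` ever contained more than one `version: '...'` line, `$(shell ...)` would turn the newlines into spaces and the Make variable would hold several words. `release_versions` is a hypothetical helper, and the sample TypeScript text in the test is invented for illustration.

```python
import re

# Same pattern as the sed expression: greedy prefix, then capture everything
# between "version: '" and the next single quote.
_VERSION_RE = re.compile(r".*version: '([^']*)'.*")


def release_versions(ts_source: str) -> list:
    """Return one captured version per matching line, like sed -n with the p flag."""
    versions = []
    for line in ts_source.splitlines():
        match = _VERSION_RE.match(line)
        if match:
            versions.append(match.group(1))
    return versions
```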
@@ -19,7 +19,7 @@ This package SSHes into your Spark server to run cluster commands, so it needs a
 ```bash
 echo "<paste-pubkey-here>" >> ~/.ssh/authorized_keys
 ```
-3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username you log into each Spark with.
+3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username (usually `modelo`).
 4. **Open the Web UI.** It will hit each Spark to confirm. If both indicators are green you're done.
 
 ## Using Spark Control
@@ -19,7 +19,7 @@ This package SSHes into your Spark server to run cluster commands, so it needs a
 ```bash
 echo "<paste-pubkey-here>" >> ~/.ssh/authorized_keys
 ```
-3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username you log into each Spark with.
+3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username (usually `modelo`).
 4. **Open the Web UI.** It will hit each Spark to confirm. If both indicators are green you're done.
 
 ## Using Spark Control
@@ -40,33 +40,6 @@ const inputSpec = InputSpec.of({
|
|||||||
placeholder: 'your SSH username',
|
placeholder: 'your SSH username',
|
||||||
masked: false,
|
masked: false,
|
||||||
}),
|
}),
|
||||||
vllm_port: Value.text({
|
|
||||||
name: 'vLLM port (optional)',
|
|
||||||
description:
|
|
||||||
"The port your vLLM server on Spark 1 listens on, used by the health check and the chat proxy. Leave blank to use 8888, the port the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank for 8888',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
vllm_container: Value.text({
|
|
||||||
name: 'vLLM container name (optional)',
|
|
||||||
description:
|
|
||||||
'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'leave blank for vllm_node',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
disabled_services: Value.text({
|
|
||||||
name: 'Services to hide (optional)',
|
|
||||||
description:
|
|
||||||
"Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM shares port 8000 with Parakeet's default — hide Parakeet so its probe doesn't hit vLLM.)",
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'e.g. parakeet,kokoro',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
parakeet_host: Value.text({
|
parakeet_host: Value.text({
|
||||||
name: 'Parakeet host (optional)',
|
name: 'Parakeet host (optional)',
|
||||||
description:
|
description:
|
||||||
@@ -146,15 +119,6 @@ const inputSpec = InputSpec.of({
|
|||||||
placeholder: 'e.g. crm_chunks',
|
placeholder: 'e.g. crm_chunks',
|
||||||
masked: false,
|
masked: false,
|
||||||
}),
|
}),
|
||||||
matrix_bridge_user: Value.text({
|
|
||||||
name: 'matrix-bridge bot SSH user (optional)',
|
|
||||||
description:
|
|
||||||
"If you run the matrix-bridge Matrix bot on Spark 2, enter the SSH user that owns its ~/matrix-bridge folder (e.g. 'modelo'). Spark Control then shows a tile to update, restart, and view logs for the bot. Leave blank if you don't run the bot — the tile stays hidden. Note: this package's SSH public key must be authorized for that user (Show Public Key action) unless it's the same as your Spark 2 user.",
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'e.g. modelo',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
open_webui_url: Value.text({
|
open_webui_url: Value.text({
|
||||||
name: 'Open WebUI URL (optional)',
|
name: 'Open WebUI URL (optional)',
|
||||||
description:
|
description:
|
||||||
@@ -173,24 +137,6 @@ const inputSpec = InputSpec.of({
|
|||||||
placeholder: 'starts with "nvapi-..."',
|
placeholder: 'starts with "nvapi-..."',
|
||||||
masked: true,
|
masked: true,
|
||||||
}),
|
}),
|
||||||
swap_webhook_url: Value.text({
|
|
||||||
name: 'Swap webhook URL (optional)',
|
|
||||||
description:
|
|
||||||
'If you run automation that needs to know when the loaded model changes, paste a URL here. Spark Control POSTs a small JSON event (swap_complete / swap_failed) to it after every model swap, so the consumer can re-point its config to the new model. Leave blank to disable. Only needed if something other than this dashboard cares about swaps.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'e.g. https://my-service.local/spark-swap',
|
|
||||||
masked: false,
|
|
||||||
}),
|
|
||||||
swap_webhook_secret: Value.text({
|
|
||||||
name: 'Swap webhook secret (optional)',
|
|
||||||
description:
|
|
||||||
'Optional shared secret. If set, each webhook is signed with an "X-Spark-Signature: sha256=…" header (HMAC of the body) so the receiver can verify it really came from Spark Control. Leave blank to send the webhook unsigned.',
|
|
||||||
required: false,
|
|
||||||
default: null,
|
|
||||||
placeholder: 'a random string the receiver also knows',
|
|
||||||
masked: true,
|
|
||||||
}),
|
|
||||||
})
|
})
|
||||||
|
|
||||||
export const configureSparks = sdk.Action.withInput(
|
export const configureSparks = sdk.Action.withInput(
|
||||||
|
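The swap-webhook secret field above says each POST carries an `X-Spark-Signature: sha256=...` header holding an HMAC of the body. A receiver might verify it along these lines. This is a sketch under assumptions the source does not confirm: the digest is HMAC-SHA256, hex-encoded, and computed over the exact raw body bytes. `signBody` and `verifySignature` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: header value = "sha256=" + lowercase hex HMAC-SHA256 of the raw body,
// keyed with the shared secret configured in Spark Control.
function signBody(secret: string, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

// Receiver side: recompute the signature and compare in constant time.
function verifySignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on a length mismatch, so reject that case first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Verify against the raw request body before any JSON parsing; re-serializing parsed JSON can change whitespace or key order and break the match.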
|||||||
@@ -7,13 +7,6 @@ export const sparkConfigSchema = z.object({
|
|||||||
spark1_user: z.string().catch(''),
|
spark1_user: z.string().catch(''),
|
||||||
spark2_host: z.string().catch(''),
|
spark2_host: z.string().catch(''),
|
||||||
spark2_user: z.string().catch(''),
|
spark2_user: z.string().catch(''),
|
||||||
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
|
|
||||||
vllm_port: z.string().catch(''),
|
|
||||||
// Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
|
|
||||||
vllm_container: z.string().catch(''),
|
|
||||||
// Optional comma-separated list of built-in services to switch off
|
|
||||||
// (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
|
|
||||||
disabled_services: z.string().catch(''),
|
|
||||||
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
|
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
|
||||||
parakeet_host: z.string().catch(''),
|
parakeet_host: z.string().catch(''),
|
||||||
parakeet_user: z.string().catch(''),
|
parakeet_user: z.string().catch(''),
|
||||||
@@ -29,17 +22,10 @@ export const sparkConfigSchema = z.object({
|
|||||||
qdrant_user: z.string().catch(''),
|
qdrant_user: z.string().catch(''),
|
||||||
qdrant_container: z.string().catch(''),
|
qdrant_container: z.string().catch(''),
|
||||||
qdrant_collection: z.string().catch(''),
|
qdrant_collection: z.string().catch(''),
|
||||||
// Optional matrix-bridge bot. Blank => no tile. Host reuses Spark 2.
|
|
||||||
matrix_bridge_user: z.string().catch(''),
|
|
||||||
// Optional Open WebUI deep-link
|
// Optional Open WebUI deep-link
|
||||||
open_webui_url: z.string().catch(''),
|
open_webui_url: z.string().catch(''),
|
||||||
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
|
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
|
||||||
ngc_api_key: z.string().catch(''),
|
ngc_api_key: z.string().catch(''),
|
||||||
// Optional coordination webhook: POSTed on swap_complete/swap_failed so
|
|
||||||
// downstream consumers re-point their model config. Blank => disabled.
|
|
||||||
swap_webhook_url: z.string().catch(''),
|
|
||||||
// Optional shared secret; if set, the webhook body is HMAC-signed.
|
|
||||||
swap_webhook_secret: z.string().catch(''),
|
|
||||||
})
|
})
|
||||||
|
|
||||||
export type SparkConfig = z.infer<typeof sparkConfigSchema>
|
export type SparkConfig = z.infer<typeof sparkConfigSchema>
|
||||||
|
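Every optional field in `sparkConfigSchema` is a string that falls back to `''` via `.catch('')`, and the comments define what blank means: 8888 for `vllm_port`, `vllm_node` for `vllm_container`, all services enabled for `disabled_services`, and `spark2_host` / `spark2_user` for the per-service overrides. A minimal sketch of resolving those blanks follows. The helper names, the handling of invalid ports, and case-insensitive service names are all assumptions.

```typescript
// Hypothetical helpers; field names mirror sparkConfigSchema, the logic is a sketch.
const KNOWN_SERVICES = ["parakeet", "kokoro", "embeddings", "qdrant"] as const;
type ServiceName = (typeof KNOWN_SERVICES)[number];

interface SparkConfigSubset {
  spark2_host: string;
  spark2_user: string;
  vllm_port: string;
  vllm_container: string;
  disabled_services: string;
  parakeet_host: string;
  parakeet_user: string;
}

// Blank => 8888 (launch-cluster.sh default). Falling back on a non-numeric or
// out-of-range value is an assumption about how the app treats bad input.
function resolveVllmPort(raw: string): number {
  const trimmed = raw.trim();
  if (trimmed === "") return 8888;
  const port = Number(trimmed);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : 8888;
}

// Blank => "vllm_node", the container the bundled launch-cluster.sh creates.
function resolveVllmContainer(raw: string): string {
  return raw.trim() || "vllm_node";
}

// "parakeet, Kokoro" => Set {"parakeet", "kokoro"}; unknown names are ignored.
function parseDisabledServices(raw: string): Set<ServiceName> {
  const out = new Set<ServiceName>();
  for (const part of raw.split(",")) {
    const name = part.trim().toLowerCase();
    if ((KNOWN_SERVICES as readonly string[]).includes(name)) out.add(name as ServiceName);
  }
  return out;
}

// Per-service override: a blank host or user falls back to the Spark 2 value.
function resolveParakeetTarget(cfg: SparkConfigSubset): { host: string; user: string } {
  return {
    host: cfg.parakeet_host.trim() || cfg.spark2_host,
    user: cfg.parakeet_user.trim() || cfg.spark2_user,
  };
}
```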
|||||||
@@ -13,9 +13,6 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
spark1_user: '',
|
spark1_user: '',
|
||||||
spark2_host: '',
|
spark2_host: '',
|
||||||
spark2_user: '',
|
spark2_user: '',
|
||||||
vllm_port: '',
|
|
||||||
vllm_container: '',
|
|
||||||
disabled_services: '',
|
|
||||||
parakeet_host: '',
|
parakeet_host: '',
|
||||||
parakeet_user: '',
|
parakeet_user: '',
|
||||||
parakeet_container: '',
|
parakeet_container: '',
|
||||||
@@ -29,11 +26,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
qdrant_user: '',
|
qdrant_user: '',
|
||||||
qdrant_container: '',
|
qdrant_container: '',
|
||||||
qdrant_collection: '',
|
qdrant_collection: '',
|
||||||
matrix_bridge_user: '',
|
|
||||||
open_webui_url: '',
|
open_webui_url: '',
|
||||||
ngc_api_key: '',
|
ngc_api_key: '',
|
||||||
swap_webhook_url: '',
|
|
||||||
swap_webhook_secret: '',
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return sdk.Daemons.of(effects).addDaemon('primary', {
|
return sdk.Daemons.of(effects).addDaemon('primary', {
|
||||||
@@ -55,9 +49,6 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
SPARK1_USER: cfg.spark1_user,
|
SPARK1_USER: cfg.spark1_user,
|
||||||
SPARK2_HOST: cfg.spark2_host,
|
SPARK2_HOST: cfg.spark2_host,
|
||||||
SPARK2_USER: cfg.spark2_user,
|
SPARK2_USER: cfg.spark2_user,
|
||||||
VLLM_PORT: cfg.vllm_port,
|
|
||||||
VLLM_CONTAINER: cfg.vllm_container,
|
|
||||||
DISABLED_SERVICES: cfg.disabled_services,
|
|
||||||
PARAKEET_HOST: cfg.parakeet_host,
|
PARAKEET_HOST: cfg.parakeet_host,
|
||||||
PARAKEET_USER: cfg.parakeet_user,
|
PARAKEET_USER: cfg.parakeet_user,
|
||||||
PARAKEET_CONTAINER: cfg.parakeet_container,
|
PARAKEET_CONTAINER: cfg.parakeet_container,
|
||||||
@@ -71,14 +62,11 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
QDRANT_USER: cfg.qdrant_user,
|
QDRANT_USER: cfg.qdrant_user,
|
||||||
QDRANT_CONTAINER: cfg.qdrant_container,
|
QDRANT_CONTAINER: cfg.qdrant_container,
|
||||||
QDRANT_COLLECTION: cfg.qdrant_collection,
|
QDRANT_COLLECTION: cfg.qdrant_collection,
|
||||||
MATRIX_BRIDGE_USER: cfg.matrix_bridge_user,
|
|
||||||
MODELS_OVERRIDES: '/data/models-overrides.yaml',
|
MODELS_OVERRIDES: '/data/models-overrides.yaml',
|
||||||
SERVICES_OVERRIDES: '/data/services-overrides.yaml',
|
SERVICES_OVERRIDES: '/data/services-overrides.yaml',
|
||||||
CONNECTIVITY_LOG: '/data/connectivity.json',
|
CONNECTIVITY_LOG: '/data/connectivity.json',
|
||||||
OPEN_WEBUI_URL: cfg.open_webui_url,
|
OPEN_WEBUI_URL: cfg.open_webui_url,
|
||||||
NGC_API_KEY: cfg.ngc_api_key,
|
NGC_API_KEY: cfg.ngc_api_key,
|
||||||
SWAP_WEBHOOK_URL: cfg.swap_webhook_url,
|
|
||||||
SWAP_WEBHOOK_SECRET: cfg.swap_webhook_secret,
|
|
||||||
BIND_PORT: String(uiPort),
|
BIND_PORT: String(uiPort),
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
||||||
|
|
||||||
export const v0_1_0 = VersionInfo.of({
|
export const v0_1_0 = VersionInfo.of({
|
||||||
version: '0.26.0:0',
|
version: '0.18.0:0',
|
||||||
releaseNotes: {
|
releaseNotes: {
|
||||||
en_US:
|
en_US:
|
||||||
"v0.26.0:0 — the model menu is now what's actually on your Sparks. The dashboard scans both Sparks for downloaded models and shows exactly those — no more hard-coded list. (1) Delete means delete: removing a model frees its weights AND takes the card off the menu (re-download later to bring it back, with its saved settings). (2) Download a new model and it appears on the menu by itself when it finishes. (3) Models Spark Control doesn't recognize show a \"needs setup\" card — the first time you switch to one, it reads the model's own files, guesses how to launch it (which family, solo vs both Sparks, the right vLLM flags), and asks you to confirm once; after that it's a normal card. (4) The download box now autocompletes known-good models. (5) Each install shows its own Sparks' models, so a shared copy no longer displays someone else's list. Removed the two legacy Qwen entries (235B FP8, 2.5 72B) — they'll still appear if you actually have them downloaded. No consumer-API changes; the /v1 proxy and swap API are unchanged.",
|
'v0.18.0 — dual-channel mode for POST /api/audio/label-merge. Instead of one mixed-mono file, a caller (Ten31 Transcripts) can send two sample-aligned tracks: mic_file (the local user) + system_file (everyone else, from screen capture). Rather than force the diarizer to re-disentangle a mono mix (which over-segments — proven: a stereo clip of 2 clean voices returned 3 speakers), we split the problem so each model gets the easiest mono input. The mic track yields the local user\'s words, gated to windows where the mic is genuinely the user speaking (mic louder than system — a self-VAD computed server-side per-window, or supplied via self_vad); this gate is load-bearing because the mic picks up the remote audio as quiet bleed. The system track is diarized (only has to separate the remote people) and named via the visual timeline + voiceprints. The user\'s clean voiceprint is enrolled from the mic track and injected into the voiceprint library, so a system cluster that is the user dialed in from a second device (dual-login) resolves to the user, not a stranger. Validated on a real misattributing call: fixes both mono-mix misattributions, recovers the dropped-to-Unknown local line, and correctly splits overlapping speech (two people saying "Hello" at once) that the coarse ground truth itself conflated. New form fields: mic_file + system_file (dual mode), self_name, self_vad (optional). The mono file path is unchanged and fully backward-compatible. Response gains a "mode" field (mono | dual_channel). Known limit: if loud remote bleed masks a quiet local word, the mic-track ASR may miss it — mitigated by a cleaner mic (headphones) or future echo-cancellation. See docs/AUDIO_API.md.',
|
||||||
},
|
},
|
||||||
migrations: {
|
migrations: {
|
||||||
up: async ({ effects }) => {},
|
up: async ({ effects }) => {},
|
||||||
|
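The v0.18.0 note gates mic-track words to windows where the mic is louder than the system track (a per-window self-VAD). One way to compute such a gate is to compare the per-window RMS energy of the two sample-aligned tracks. The window length, the 3 dB margin, and the silence floor below are illustrative assumptions, not values taken from Spark Control.

```typescript
// Sketch only: windowSize, marginDb and floor are illustrative assumptions.
function rms(samples: Float32Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i < end; i++) sum += samples[i] * samples[i];
  return end > start ? Math.sqrt(sum / (end - start)) : 0;
}

// One boolean per window: true when the mic track is taken to be the user speaking.
function selfVad(
  mic: Float32Array,
  system: Float32Array,
  windowSize = 1600, // e.g. 100 ms at 16 kHz
  marginDb = 3, // mic must beat the system track by this many dB
  floor = 1e-3, // below this mic RMS the window counts as silence
): boolean[] {
  const n = Math.min(mic.length, system.length); // tracks assumed sample-aligned
  const gate: boolean[] = [];
  for (let start = 0; start < n; start += windowSize) {
    const end = Math.min(start + windowSize, n);
    const micRms = rms(mic, start, end);
    const sysRms = rms(system, start, end);
    const ratioDb = 20 * Math.log10((micRms + 1e-12) / (sysRms + 1e-12));
    gate.push(micRms >= floor && ratioDb >= marginDb);
  }
  return gate;
}
```

The margin matters because the mic also picks up remote audio as quiet bleed: requiring the mic to clearly beat the system track keeps that bleed from being attributed to the local user.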
|||||||
+11
-70
@@ -34,68 +34,20 @@ These take effect on the **next swap to that model**. If a swap fails after this
|
|||||||
- Status auto-refreshes every 5 s.
|
- Status auto-refreshes every 5 s.
|
||||||
- A swap takes 3–6 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream.
|
- A swap takes 3–6 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream.
|
||||||
|
|
||||||
## matrix-bridge bot tile (optional)
|
|
||||||
|
|
||||||
If you run the matrix-bridge bot container on a Spark, set its SSH user in **Configure Sparks** (e.g. the user that owns `~/matrix-bridge`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a `running` badge means the container is up, not necessarily that the bot is connected.
|
|
||||||
|
|
||||||
The **Update** button runs `git fetch && git reset --hard origin/<branch> && docker compose up -d --build` as that SSH user. For it to reach your git remote:
|
|
||||||
|
|
||||||
1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
|
|
||||||
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:
|
|
||||||
|
|
||||||
```
|
|
||||||
Host <your-git-host>
|
|
||||||
Port <port>
|
|
||||||
IdentityFile ~/.ssh/id_ed25519
|
|
||||||
IdentitiesOnly yes
|
|
||||||
```
|
|
||||||
|
|
||||||
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
|
|
||||||
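The Update button runs `git fetch && git reset --hard origin/<branch> && docker compose up -d --build` as the bot's SSH user. Here is a sketch of assembling that command for `ssh`. It assumes the command runs inside `~/matrix-bridge` and that branch names are restricted to a conservative character set (both assumptions). `buildUpdateCommand` and `buildSshArgs` are hypothetical.

```typescript
// Hypothetical helpers: repo directory and branch allowlist are assumptions.
const SAFE_BRANCH = /^[A-Za-z0-9][A-Za-z0-9._\/-]*$/;

function buildUpdateCommand(branch: string, repoDir = "~/matrix-bridge"): string {
  if (!SAFE_BRANCH.test(branch) || branch.includes("..")) {
    throw new Error(`refusing suspicious branch name: ${branch}`);
  }
  // && chains the steps, so a failed fetch never reaches the hard reset or rebuild.
  return [
    `cd ${repoDir}`,
    "git fetch",
    `git reset --hard origin/${branch}`,
    "docker compose up -d --build",
  ].join(" && ");
}

// argv for child_process.spawn("ssh", args); the remote shell parses the command string.
function buildSshArgs(user: string, host: string, remoteCommand: string): string[] {
  return ["-o", "BatchMode=yes", `${user}@${host}`, remoteCommand];
}
```

`git reset --hard` only rewrites tracked files, which is why gitignored secrets such as `.env` survive an update. `BatchMode=yes` makes `ssh` fail instead of prompting when key authentication is not set up.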
|
|
||||||
## Configurable topology (v0.24.0+)
|
|
||||||
|
|
||||||
For a cluster wired differently from the reference layout, there are three optional knobs, none of which needs a fork. The first two are fields in **Configure Sparks**; the third is an override file:
|
|
||||||
|
|
||||||
- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
|
|
||||||
- **Services to hide** — comma-separated `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers — e.g. a vLLM on port 8000 colliding with Parakeet's default.
|
|
||||||
- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
custom:
|
|
||||||
- key: vllm-spark2
|
|
||||||
kind: vllm
|
|
||||||
host: <spark-2-ip>
|
|
||||||
user: <ssh-user>
|
|
||||||
container: vllm_node
|
|
||||||
port: 8000
|
|
||||||
```
|
|
||||||
|
|
||||||
It gets a monitor-only tile (Spark Control never swaps models on it): the loaded model (via `/v1/models`), container state, and start/stop/restart controls. Spark Control's SSH key must be authorized for that user (Show Public Key).
|
|
||||||
|
|
||||||
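A `kind: vllm` entry in `/data/services-overrides.yaml` carries `key`, `host`, `user`, `container`, and `port`, and its tile reads the loaded model from `/v1/models`. The following sketch validates an already-parsed entry and extracts model ids from vLLM's OpenAI-compatible `GET /v1/models` response (`{ "object": "list", "data": [{ "id": ... }] }`). The specific checks and the function names are assumptions, not Spark Control's code.

```typescript
// Shape of one `custom:` entry after YAML parsing (fields from the example above).
interface CustomVllmService {
  key: string;
  kind: "vllm";
  host: string;
  user: string;
  container: string;
  port: number;
}

// Hypothetical validator: returns the typed entry, or a list of problems.
function validateCustomVllm(raw: unknown): CustomVllmService | string[] {
  const errors: string[] = [];
  const r = (raw ?? {}) as Record<string, unknown>;
  for (const field of ["key", "host", "user", "container"] as const) {
    if (typeof r[field] !== "string" || (r[field] as string).trim() === "") {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  if (r.kind !== "vllm") errors.push('kind must be "vllm"');
  const port = r.port;
  if (typeof port !== "number" || !Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push("port must be an integer 1-65535");
  }
  if (errors.length > 0) return errors;
  return r as unknown as CustomVllmService;
}

// Pull model ids out of a /v1/models body; anything malformed yields [].
function loadedModelIds(body: unknown): string[] {
  const data = (body as { data?: unknown })?.data;
  if (!Array.isArray(data)) return [];
  return data
    .map((m) => (m as { id?: unknown })?.id)
    .filter((id): id is string => typeof id === "string");
}
```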
## Adding a new model
|
## Adding a new model
|
||||||
|
|
||||||
The menu is whatever's downloaded on the Sparks, so the normal path is just:
|
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
|
||||||
**download it, then set it up once.**
|
2. Confirm the weights are on the Spark: `ssh modelo@spark-27ea.local 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
|
||||||
|
3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
|
||||||
|
|
||||||
1. **Download** from the dashboard (**+ Download a new model**, paste the HF repo) or on Spark 1 with `./hf-download.sh <repo>`. When it finishes it appears on the menu by itself.
|
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
|
||||||
2. **Set it up.** If Spark Control already has a recipe for it (see below), it's ready to switch to. Otherwise it shows a **"needs setup"** card: the first switch reads the model's `config.json`, proposes how to launch it (family/parsers, solo vs cluster, vLLM flags), and you confirm once. The confirmed recipe persists to `/data/models-overrides.yaml` (survives package updates).
|
|
||||||
|
|
||||||
### Bundling a launch recipe (optional — skips the setup prompt)
|
|
||||||
|
|
||||||
To make a known model launch correctly the instant it's downloaded, add a *recipe* to `image/models.yaml`. These are **not** the menu — they're matched to an on-disk model by `repo`. Required: `display_name`, `repo`, `size_gb`, `mode` (`solo`/`cluster`), `vllm_args`. Optional: `description`, `capabilities` (e.g. `[vision, reasoning, tools]`), `expected_ready_seconds`. Then rebuild + redeploy: `cd package && make x86 && make install`. Keep descriptions generic (not user-specific) so the recipes stay portable.
|
|
||||||
|
|
||||||
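Bundled recipes in `image/models.yaml` have five required fields and are matched to on-disk models by `repo`. The sketch below models that shape, with a required-field check and a lookup. It assumes `vllm_args` is a single string and that matching is an exact `repo` comparison (both unconfirmed). The names are hypothetical.

```typescript
// Sketch of one recipe after YAML parsing; vllm_args as a string is an assumption.
interface ModelRecipe {
  display_name: string;
  repo: string;
  size_gb: number;
  mode: "solo" | "cluster";
  vllm_args: string;
  description?: string;
  capabilities?: string[]; // e.g. ["vision", "reasoning", "tools"]
  expected_ready_seconds?: number;
}

// Lists missing or invalid required fields; empty array means the recipe looks usable.
function recipeProblems(r: Partial<ModelRecipe>): string[] {
  const problems: string[] = [];
  if (!r.display_name) problems.push("display_name is required");
  if (!r.repo) problems.push("repo is required");
  if (typeof r.size_gb !== "number" || r.size_gb <= 0) problems.push("size_gb must be a positive number");
  if (r.mode !== "solo" && r.mode !== "cluster") problems.push('mode must be "solo" or "cluster"');
  if (typeof r.vllm_args !== "string") problems.push("vllm_args is required");
  return problems;
}

// Recipes are not the menu: one applies only when its repo matches a model on disk.
function recipeFor(onDiskRepo: string, recipes: ModelRecipe[]): ModelRecipe | undefined {
  return recipes.find((r) => r.repo === onDiskRepo);
}
```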
### Local / fine-tuned models (v0.23.0+)
|
|
||||||
|
|
||||||
A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark. Only that directory is bind-mounted into the vLLM container, so a `--chat-template` must live **inside** `local_path`.
|
|
||||||
|
|
||||||
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
|
|
||||||
|
|
||||||
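The local-model contract relies on `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` unquoted. As a result, the path undergoes word splitting and globbing. Docker's `-v src:dst` syntax also treats `:` as a separator. A `local_path` containing spaces, colons, or glob characters would therefore produce a broken mount. The following sketch builds the prefix defensively and checks the chat-template rule. The helpers are hypothetical, and the rejected-character set is an assumption.

```typescript
import * as path from "node:path";

// Characters that would not survive unquoted shell expansion or docker's -v parsing.
const UNSAFE_FOR_UNQUOTED = /[\s:*?\[\]"'$`\\]/;

// Builds the env prefix the docs describe: VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>".
function localMountPrefix(localPath: string): string {
  if (!path.posix.isAbsolute(localPath)) throw new Error("local_path must be absolute");
  if (UNSAFE_FOR_UNQUOTED.test(localPath)) {
    throw new Error(`local_path would not survive unquoted expansion: ${localPath}`);
  }
  return `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v ${localPath}:${localPath}"`;
}

// Documented rule: a --chat-template file must live inside local_path.
function chatTemplateIsMounted(localPath: string, templatePath: string): boolean {
  if (!path.posix.isAbsolute(localPath) || !path.posix.isAbsolute(templatePath)) return false;
  const root = path.posix.resolve(localPath); // normalizes trailing slashes and ".."
  const tpl = path.posix.resolve(templatePath);
  return tpl.startsWith(root + "/");
}
```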
## Manual swap fallback
|
## Manual swap fallback
|
||||||
|
|
||||||
If the UI is unavailable and you need to swap by hand:
|
If the UI is unavailable and you need to swap by hand:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-1-host>
|
ssh modelo@spark-27ea.local
|
||||||
cd ~/spark-vllm-docker
|
cd ~/spark-vllm-docker
|
||||||
./launch-cluster.sh stop
|
./launch-cluster.sh stop
|
||||||
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
|
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
|
||||||
@@ -105,34 +57,23 @@ cd ~/spark-vllm-docker
|
|||||||
docker logs -f vllm_node # wait for "Application startup complete."
|
docker logs -f vllm_node # wait for "Application startup complete."
|
||||||
```
|
```
|
||||||
|
|
||||||
## Sideload (`make install`) can't reach the server
|
|
||||||
|
|
||||||
Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`. Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up — `curl -sk https://<server-ip>/rpc/v1` returns 200.
|
|
||||||
|
|
||||||
- **Don't** work around it with `start-cli -H https://<server-ip> package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
|
|
||||||
- **Fix:** make the name resolve again, then re-run `make install`:
|
|
||||||
- `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
|
|
||||||
- `echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).
|
|
||||||
|
|
||||||
Note this only blocks installing to *your own* Start9 — building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).
|
|
||||||
|
|
||||||
## Diagnostics
|
## Diagnostics
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Is vLLM serving?
|
# Is vLLM serving?
|
||||||
curl -s http://<spark-1-ip>:8888/v1/models | jq .
|
curl -s http://192.168.1.103:8888/v1/models | jq .
|
||||||
|
|
||||||
# Cluster status (containers up?)
|
# Cluster status (containers up?)
|
||||||
ssh <spark-user>@<spark-1-host> 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
|
ssh modelo@spark-27ea.local 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
|
||||||
|
|
||||||
# Tail current model's logs
|
# Tail current model's logs
|
||||||
ssh <spark-user>@<spark-1-host> 'docker logs --tail 200 -f vllm_node'
|
ssh modelo@spark-27ea.local 'docker logs --tail 200 -f vllm_node'
|
||||||
|
|
||||||
# Parakeet
|
# Parakeet
|
||||||
curl -s http://<spark-2-ip>:8000/health
|
curl -s http://192.168.1.87:8000/health
|
||||||
|
|
||||||
# Kokoro TTS (v0.14.0+)
|
# Kokoro TTS (v0.14.0+)
|
||||||
curl -s http://<spark-2-ip>:8880/health
|
curl -s http://192.168.1.87:8880/health
|
||||||
```
|
```
|
||||||
|
|
||||||
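The diagnostic curls above can also be expressed as a probe list that honours **Services to hide**. This sketch uses only the ports shown in this section: vLLM on Spark 1 at 8888 unless overridden, and Parakeet at 8000 and Kokoro at 8880 on Spark 2. Embeddings and Qdrant are omitted because their ports do not appear here, and per-service host overrides are ignored for brevity. `buildProbeTargets` is hypothetical.

```typescript
// Hypothetical sketch of the curl checks above as data.
type ProbeName = "vllm" | "parakeet" | "kokoro";

interface ProbeTarget {
  name: ProbeName;
  url: string;
}

function buildProbeTargets(opts: {
  spark1Ip: string;
  spark2Ip: string;
  vllmPort?: number; // blank config => 8888
  hidden?: ReadonlySet<string>; // e.g. parsed from disabled_services
}): ProbeTarget[] {
  const hidden = opts.hidden ?? new Set<string>();
  const all: ProbeTarget[] = [
    { name: "vllm", url: `http://${opts.spark1Ip}:${opts.vllmPort ?? 8888}/v1/models` },
    { name: "parakeet", url: `http://${opts.spark2Ip}:8000/health` },
    { name: "kokoro", url: `http://${opts.spark2Ip}:8880/health` },
  ];
  // vLLM is not in the disabled_services list, so it is always probed.
  return all.filter((t) => t.name === "vllm" || !hidden.has(t.name));
}
```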
## Hard reset
|
## Hard reset
|
||||||
@@ -140,7 +81,7 @@ curl -s http://<spark-2-ip>:8880/health
|
|||||||
If launch-cluster.sh gets stuck:
|
If launch-cluster.sh gets stuck:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-1-host>
|
ssh modelo@spark-27ea.local
|
||||||
cd ~/spark-vllm-docker
|
cd ~/spark-vllm-docker
|
||||||
./launch-cluster.sh stop
|
./launch-cluster.sh stop
|
||||||
docker ps -aq | xargs -r docker rm -f
|
docker ps -aq | xargs -r docker rm -f
|
||||||
|
|||||||
@@ -1,65 +0,0 @@
|
|||||||
#!/usr/bin/env bash
|
|
||||||
# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the
|
|
||||||
# latest package with a read-only token instead of being hand-sent the file.
|
|
||||||
#
|
|
||||||
# GITEA_URL=https://gitea.example:3000 GITEA_TOKEN=<write-token> \
|
|
||||||
# scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk
|
|
||||||
#
|
|
||||||
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
|
|
||||||
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
|
|
||||||
# reuses an existing release for the tag and replaces a same-named asset.
|
|
||||||
# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
|
|
||||||
set -euo pipefail
|
|
||||||
|
|
||||||
VERSION="${1:-}"; S9PK="${2:-}"
|
|
||||||
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
|
|
||||||
echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
|
|
||||||
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
|
|
||||||
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
|
|
||||||
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
|
|
||||||
|
|
||||||
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
|
|
||||||
ASSET="$(basename "$S9PK")"
|
|
||||||
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
|
|
||||||
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
|
|
||||||
CURL=(curl -sS) # no -f: we inspect HTTP codes ourselves
|
|
||||||
[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)
|
|
||||||
|
|
||||||
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
|
|
||||||
|
|
||||||
# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
|
|
||||||
api() {
|
|
||||||
local method="$1" url="$2"; shift 2
|
|
||||||
local out
|
|
||||||
out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
|
|
||||||
-w $'\n%{http_code}' "$url")"
|
|
||||||
HTTP_CODE="${out##*$'\n'}"
|
|
||||||
BODY="${out%$'\n'*}"
|
|
||||||
}
|
|
||||||
|
|
||||||
# Reuse an existing release for this tag, otherwise create one.
|
|
||||||
api GET "$API/releases/tags/$TAG"
|
|
||||||
if [ "$HTTP_CODE" = 200 ]; then
|
|
||||||
id="$(printf '%s' "$BODY" | jq -r '.id')"
|
|
||||||
elif [ "$HTTP_CODE" = 404 ]; then
|
|
||||||
api POST "$API/releases" -H 'Content-Type: application/json' \
|
|
||||||
--data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
|
|
||||||
'{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
|
|
||||||
[ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
|
|
||||||
id="$(printf '%s' "$BODY" | jq -r '.id')"
|
|
||||||
else
|
|
||||||
echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }
|
|
||||||
|
|
||||||
# Replace a same-named asset so re-runs don't 409.
|
|
||||||
api GET "$API/releases/$id/assets"
|
|
||||||
old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
|
|
||||||
[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }
|
|
||||||
|
|
||||||
api POST "$API/releases/$id/assets?name=$ASSET" \
|
|
||||||
-F "attachment=@${S9PK};type=application/octet-stream"
|
|
||||||
[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
|
|
||||||
|
|
||||||
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
|
|
||||||
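The removed `gitea-release.sh` does three small string transformations in shell. It derives the tag from the version, extracts the `owner/repo` slug from the `gitea` remote, and splits curl's output into body and status code. The following are TypeScript equivalents, offered as a sketch for anyone porting the script. The function names are hypothetical.

```typescript
// TAG="v${VERSION%%:*}": drop everything from the first ':' (0.22.0:0 -> v0.22.0).
function versionToTag(version: string): string {
  const i = version.indexOf(":");
  return `v${i === -1 ? version : version.slice(0, i)}`;
}

// sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#': owner/repo from an SSH or HTTPS remote.
// Unlike sed, which passes a non-matching URL through unchanged, this returns null.
function repoSlug(remoteUrl: string): string | null {
  const m = /.*[:\/]([^\/:]+\/[^\/]+)\.git$/.exec(remoteUrl);
  return m ? m[1] : null;
}

// curl -w $'\n%{http_code}' appends the status on its own line; the script splits it
// back off with ${out##*$'\n'} (after the last newline) and ${out%$'\n'*} (before it).
function splitCurlOutput(out: string): { body: string; httpCode: string } {
  const i = out.lastIndexOf("\n");
  // With no newline, both bash expansions leave `out` unchanged.
  if (i === -1) return { body: out, httpCode: out };
  return { body: out.slice(0, i), httpCode: out.slice(i + 1) };
}
```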