Compare commits
17 Commits
de902f5e4b
...
v0.23.0
| SHA1 |
|---|
| e783653ef0 |
| 57a893000e |
| 56f7ea4444 |
| aaad57d88f |
| 136a4713a1 |
| c179389731 |
| 9debeb4bbe |
| 39f8410623 |
| e307a08f05 |
| 89338c97f5 |
| d9c098262f |
| 6238ac88f7 |
| 17a9973ba2 |
| e87158c492 |
| 5341fcc506 |
| 05d03beeeb |
| 56a519ff4f |
@@ -11,5 +11,11 @@ node_modules/
dist/
build/
.DS_Store

# Claude Code — deny by default, allow-list shared wiring (see standards/portability.md)
.claude/*
!.claude/rules/
!.claude/agents/
!.claude/commands/
!.claude/skills/
!.claude/settings.json
@@ -6,6 +6,9 @@ Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster
|
||||
|
||||
Subsystem guidance lives in `docs/guides/` and loads when matching files are touched (Claude Code lazy-loads via `.claude/rules/` symlinks; other agents read the guides directly): `startos-package.md` (build/versioning, `package/**`), `fastapi-image.md` (dev server/env/layout, `image/**`), `redaction.md` (vendoring + test gates), `audio-speech.md` (parakeet patches, cluster-container footguns, audio testing). **Read `docs/guides/audio-speech.md` before touching the Sparks' containers over SSH** — ops sessions don't trip the path scoping.
|
||||
|
||||
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
|
||||
> items tagged `(spark-control)` and surface them before proposing next steps; triage with `/triage`.
|
||||
|
||||
## Stack
|
||||
|
||||
- Two halves, always coordinated:
|
||||
@@ -20,6 +23,7 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
|
||||
```bash
(cd package && make x86)        # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999)              # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest)                       # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway)  # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py)  # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file>            # e2e audio — hits the LIVE cluster
@@ -51,37 +55,12 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
|
||||
|
||||
## Current state
|
||||
|
||||
- **Working (v0.18.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack is healthy (11k+ requests/12h, all 200).
|
||||
- **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 1–4s unresponsiveness while the single GPU is continuously busy. Remedy is client-side; a drafted message (in-flight cap 2, hard ceiling 3 global across audio endpoints, retry-with-backoff on timeout/503) is with the owner to forward to that dev.
|
||||
- **Decided, not implemented:** remote access stays WireGuard/Tailscale split-tunnel — no public interface, so no API auth built; an empirical concurrency sweep is offered but needs the owner's explicit OK in a quiet window. **Revisit (full-eval 2026-06-12):** the "LAN-only, so no auth" call is now load-bearing against RCE — unquoted user input reaches the SSH shell on several endpoints, so the network boundary is the *only* thing preventing cluster takeover. Quoting the injection sinks (work queue) is needed regardless of the auth decision; a defense-in-depth auth/CSRF gate is the follow-on.
|
||||
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
|
||||
- **Portability:** working tree scrubbed 2026-06-12 — all owner-specific IPs/hostnames/usernames/names replaced with placeholders in tracked files; `claude-code-starter-prompt.md` deleted (old build-time prompt). Real cluster values live only in StartOS install config, shell env vars, and the gitignored `settings.local.json`. **Caveat (full-eval 2026-06-12): git *history* was not rewritten** — the old IPs/hosts/user `<spark-user>`/key name are still recoverable pre-`50c67cd`. The scrub is working-tree-only; treat the repo as private until history is rewritten (see work queue below).
|
||||
- **Repo wart:** commit `367d986` is labeled `v0.13.0:4` but actually contains everything through v0.18.0:0 — per-version commits for v0.14–v0.18 are missing. Keep commit messages accurate going forward.
|
||||
- **Hosting:** repo pushes to the owner's self-hosted Gitea — remote `gitea`, branch `master`, over SSH (host alias + key live in the local `~/.ssh/config`; no owner-specific details belong in the repo). Push there after committing.
|
||||
- **Next (pre-eval backlog):** (1) owner forwards the concurrency note to the Signal Engine dev; (2) run the concurrency sweep if the dev wants the measured knee; (3) add the `--memory` cap to parakeet-asr via the Reapply-patches action; (4) pick the next item from ROADMAP.md.
|
||||
|
||||
### Full-eval triage (2026-06-12)
|
||||
|
||||
Source: `EVALUATION.md` at repo root (full evidence, file:line pointers, scorecard). Findings triaged below; do these before the pre-eval backlog above where they overlap.
|
||||
|
||||
**Work queue — P0/P1, fix before sharing the package wider:**
|
||||
1. ~~**[P0] Shell-quote/validate every user value crossing into SSH**~~ — **DONE (code, 2026-06-12; not yet shipped).** New `image/app/shellsafe.py` (`validate_repo`/`validate_image`/`validate_container` whitelists + `quote_arg`/`quote_args`). Boundary validation added to `POST /api/models` (repo) and `POST /api/nim/install` (image+container); `shlex.quote` applied at every SSH sink — `models.build_launch_command` (repo+args, covers `vllm_args`+knobs), `download._do` (repo), `nim._do` (image/container/volume/port/env), `services.docker_state`+`run_action` (container). Verified: injection survives only as a single quoted token, vLLM preflight `shlex.split` round-trip intact, both redaction suites still pass. Side-benefit: NGC key now `shlex.quote`'d in `nim._do` (was single-quoted) — closes the quote-breakout half of the P2 NGC-key item; the process-list-exposure half remains. **Ship step pending:** version bump + release notes + rebuilt s9pk.
|
||||
2. **[P0] Decide the git-history question** — owner IPs/hosts/user `<spark-user>`/key name persist pre-`50c67cd` despite the working-tree scrub. Either rewrite history (`git-filter-repo`) + rotate the `<ssh-key>` key, or keep the repo private-forever. Blocks any public/shared publish. **(Open — git-ops decision, not code.)**
|
||||
3. ~~**[P1] Defense-in-depth gate on mutating endpoints**~~ — **DONE (code, 2026-06-12; not yet shipped).** `csrf_guard` HTTP middleware in `server.py` rejects state-changing requests whose `Origin`/`Referer` hostname ≠ the served host. Scoped to control endpoints; the programmatic API surface is exempt (`/v1/*`, `/scrub`, `/rehydrate`, `/api/search`, `/api/audio/`, `/api/health-event`) so downstream consumers are unaffected. No app-layer token auth (deliberate — would break consumers + the non-technical owner). Verified via TestClient: cross-origin control POST→403, same-origin/no-Origin→pass, exempt prefixes always pass, GET never blocked. **Verify on-box:** confirm the StartOS reverse proxy passes `Host`/`Origin` so the dashboard isn't false-positive-blocked.
|
||||
4. ~~**[P1] Validate the Qdrant `collection`**~~ — **DONE (code, 2026-06-12; not yet shipped).** `_safe_collection` whitelist (`[A-Za-z0-9._-]`, rejects `..`) + URL-encoded path segment in `embeddings_proxy.py`. The raw `filter` is left as a passthrough (Qdrant parses it; pydantic enforces `dict`) — locking it to an allowlist would break hybrid-search consumers; the path segment was the real injection vector.
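The collection check in item 4 can be sketched roughly as follows. This is an illustrative stand-in, not the code in `embeddings_proxy.py`: the names `safe_collection` and `points_search_url` are hypothetical, and treating any name that contains `..` as invalid is an assumption about how the `..` rejection is applied.

```python
import re
from urllib.parse import quote

# Assumption: the whitelist described above, one or more of [A-Za-z0-9._-].
_COLLECTION_RE = re.compile(r"[A-Za-z0-9._-]+")


def safe_collection(name: str) -> str:
    """Return `name` as a URL path segment, or raise ValueError."""
    if not _COLLECTION_RE.fullmatch(name) or ".." in name:
        raise ValueError(f"invalid collection name: {name!r}")
    # Whitelisted characters are already URL-safe; quoting with safe="" is a
    # second layer in case the whitelist is ever widened.
    return quote(name, safe="")


def points_search_url(base: str, collection: str) -> str:
    """Hypothetical helper: build a Qdrant search URL for a validated name."""
    segment = safe_collection(collection)
    return f"{base.rstrip('/')}/collections/{segment}/points/search"
```

Validating before interpolation means a caller-supplied `collection` can only ever select a collection, never reshape the request path.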
|
||||
|
||||
**Shipping (all of #1/#3/#4 batched):** version bumped `0.18.0:1`→`0.19.0:0` with release notes (`versions/v0_1_0.ts`). Rebuild `make x86`; `make install` (live-service restart) needs explicit go-ahead. Not committed yet.
|
||||
|
||||
**Known debt — P2, track but not blocking:**
|
||||
- Test coverage is redaction-only; swap state machine, proxies, SSH wrapper, and the package have zero automated tests. Live-cluster paths (swap exec, audio, embeddings/search) couldn't be exercised at all — biggest blind spot.
|
||||
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml:6-13`).
|
||||
- StartOS registry blockers (only if pursuing the registry): source not public + `packageRepo`/`upstreamRepo` are `example.com` placeholders (`manifest/index.ts:12-13`).
|
||||
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` unset in dev (write to read-only `/data`) — catch the `OSError`.
|
||||
- NGC API key inlined single-quoted into a remote shell command (`nim.py:147`) — pass via stdin/env.
|
||||
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py:107`) — latent race as concurrency grows.
|
||||
- Container runs uvicorn as **root** bound to `0.0.0.0:9999` (no `USER` in Dockerfile) — amplifies any RCE blast radius.
|
||||
|
||||
**Parked — P3+, do in bulk when next touching docs/packaging:**
|
||||
- README Status block stale (`v0.2.3 / 0.13.0:4` → v0.18.0:1, undercounts features); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js:177`) + `task_id` reflected in scrub JSON.
|
||||
- Packaging cosmetics: `marketingUrl` placeholder; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though manifest declares `aarch64`; release notes describe the scrub, not capabilities.
|
||||
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
|
||||
- **Working (v0.22.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
|
||||
- **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
|
||||
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
|
||||
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
|
||||
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
|
||||
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
|
||||
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
|
||||
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
|
||||
- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass; verified via TestClient). **Reviewer-agent pass done; findings addressed:** path validation folded into the `ModelDef` validator (so YAML/override-added local models are checked too), a chat-template-must-live-inside-`local_path` guard, `_merge_overrides` skips a bad entry instead of breaking the whole catalog, and the `VLLM_SPARK_EXTRA_DOCKER_ARGS` unquoted-expansion contract is documented in `runbook.md`. 
**Not yet built/installed/published — awaiting go/no-go.** Next: (3) configurable topology (service→Spark→port map + container names); (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
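The exactly-one-source rule and the chat-template guard described for item (2) might look something like the sketch below. It uses a plain dataclass as a stand-in for the real `ModelDef` validator; the class name `ModelSource`, the `chat_template` field name, and the specific path rules (absolute path, no `..` component) are assumptions for illustration only.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class ModelSource:
    """Hypothetical stand-in for the source fields of `ModelDef`."""

    repo: Optional[str] = None
    local_path: Optional[str] = None
    chat_template: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one source: both set or both unset is an error.
        if (self.repo is None) == (self.local_path is None):
            raise ValueError("set exactly one of repo / local_path")
        if self.local_path is None:
            return
        root = PurePosixPath(self.local_path)
        # Assumed path rules: absolute, and no '..' components.
        if not root.is_absolute() or ".." in root.parts:
            raise ValueError("local_path must be absolute with no '..'")
        if self.chat_template is not None:
            tmpl = PurePosixPath(self.chat_template)
            if ".." in tmpl.parts or root not in tmpl.parents:
                raise ValueError("chat_template must live inside local_path")
```

Putting the checks on the model itself, rather than only on the HTTP handler, is what lets YAML-loaded and override-added entries get the same validation.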
|
||||
|
||||
+1
-1
@@ -18,7 +18,7 @@ This is a capable, well-documented single-operator control plane: a ~960-line Fa
|
||||
## Priority queue
|
||||
|
||||
- [P0] Command injection via unquoted user input (`repo`, `vllm_args`, NIM `image`/`container`/`port`, custom-service `container`) interpolated into SSH shell commands → arbitrary RCE as the SSH user on the Sparks — `models.py:80`, `swap.py:101`, `download.py:129`, `nim.py:145-166`, `services.py:144`; demonstrated via `build_launch_command` — evaluator + security-auditor
|
||||
- [P0] Owner infra topology (IPs `<spark-1-ip>/.87`, QSFP `<spark-1-qsfp-ip>/11`, hosts `<spark-1-host>`/`<spark-2-host>`, user `<spark-user>`, key `<ssh-key>`) persists in git history pre-`50c67cd` despite the working-tree scrub → target list for the unauthenticated endpoints — security-auditor
|
||||
- [P0] Owner infra topology (IPs `<spark-1-ip>`/`<spark-2-ip>`, QSFP `<spark-1-qsfp-ip>`/`<spark-2-qsfp-ip>`, hosts `<spark-1-host>`/`<spark-2-host>`, user `<spark-user>`, key `<ssh-key>`) persisted in git history despite the working-tree scrub → target list for the unauthenticated endpoints — security-auditor [RESOLVED 2026-06-12: history rewritten with git filter-repo; 0 hits across all refs]
|
||||
- [P1] No auth + no CSRF protection on state-changing endpoints (plaintext `http`, `interfaces.ts:8`) → any LAN peer, or a malicious page in the operator's browser, can drive swap/install/stop/delete and chain into the P0 injections — security-auditor (CSRF P1) + evaluator (auth P2, escalated)
|
||||
- [P1] SSRF / Qdrant path injection: caller `collection` interpolated into the Qdrant URL with no validation and raw `filter` forwarded verbatim — `embeddings_proxy.py:237,175,204` — security-auditor
|
||||
- [P2] Test coverage is redaction-only; the swap state machine, proxies, SSH wrapper, and the StartOS package have zero automated tests — evaluator
|
||||
|
||||
+34
-1
@@ -2,8 +2,23 @@
|
||||
|
||||
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
|
||||
|
||||
## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)
|
||||
|
||||
Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU).
|
||||
|
||||
**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap` → `GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability.
|
||||
|
||||
Sequenced:
|
||||
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
|
||||
2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
|
||||
3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
|
||||
4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
|
||||
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
|
||||
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
|
||||
- **Schedule visibility** — read-only view the dashboard surfaces, *registered by* external schedulers (Spark Control does not own the schedule).
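As a rough illustration of the swap lock in item 4, here is a minimal in-memory sketch. Nothing like this exists in the package yet; the class and method names are hypothetical, and it assumes a single server process (no cross-process or persistent state).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Lock:
    holder: str
    expires_at: float


class SwapLock:
    """In-memory, single-process lock with a holder and a TTL (sketch only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock: Optional[Lock] = None

    def current(self) -> Optional[Lock]:
        """Return the live lock, dropping it first if its TTL has passed."""
        if self._lock is not None and self._clock() >= self._lock.expires_at:
            self._lock = None
        return self._lock

    def acquire(self, holder: str, ttl_seconds: float) -> bool:
        """Take or renew the lock; False if someone else holds it."""
        live = self.current()
        if live is not None and live.holder != holder:
            return False
        self._lock = Lock(holder, self._clock() + ttl_seconds)
        return True

    def release(self, holder: str) -> bool:
        """Drop the lock; only the current holder may release it."""
        live = self.current()
        if live is None or live.holder != holder:
            return False
        self._lock = None
        return True

    def may_swap(self, requester: str) -> bool:
        """Checked by the swap path itself, so the lock is enforced."""
        live = self.current()
        return live is None or live.holder == requester
```

The TTL keeps a crashed scheduler from holding the GPU indefinitely, and `current()` gives the dashboard the holder and expiry it would display.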
|
||||
|
||||
## Near term
|
||||
- parakeet-asr `--memory` cap, shipped via the Reapply-patches action (guards against swap-thrash on very long audio).
|
||||
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
|
||||
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
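The deferred long-audio guard above could be sketched as below. This is not the patch itself: `AudioTooLong`, `max_diarize_seconds`, and `check_duration` are hypothetical names, and the one-hour default is a placeholder assumption, not a measured limit.

```python
import os


class AudioTooLong(Exception):
    """Hypothetical error that the HTTP layer would map to a 413 response."""


def max_diarize_seconds(default: float = 3600.0) -> float:
    """Read MAX_DIARIZE_SECONDS; fall back when unset, blank, or malformed."""
    try:
        return float(os.environ.get("MAX_DIARIZE_SECONDS", "") or default)
    except ValueError:
        return default


def check_duration(duration: float, limit: float) -> None:
    """Call right after the audio duration is computed, before inference."""
    if duration > limit:
        raise AudioTooLong(
            f"audio is {duration:.0f}s, limit is {limit:.0f}s; "
            "send it in chunks instead"
        )
```

Rejecting before the model runs is the point: the single-pass memory cost is never paid for a file that exceeds the cap.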
|
||||
|
||||
## Audio quality
|
||||
@@ -22,3 +37,21 @@ Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE
|
||||
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
|
||||
- Spark host update actions (OS/driver) from the UI.
|
||||
- Open WebUI link-out integration; richer per-service detail views.
|
||||
|
||||
## Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)
|
||||
|
||||
P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:
|
||||
|
||||
**P2 — track:**
|
||||
- No automated tests beyond the two redaction suites — swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for `build_launch_command` (incl. injection cases), swap transitions, and `_merge_words_with_speakers` is the highest-value start.
|
||||
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml`).
|
||||
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` unset in dev (write to read-only `/data`) — catch the `OSError`.
|
||||
- NGC API key still appears on the remote process command line (`nim.py`) — the quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
|
||||
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py`) — latent race as concurrency grows.
|
||||
- Container runs uvicorn as **root** bound to `0.0.0.0:9999` (no `USER` in Dockerfile) — amplifies any RCE blast radius.
|
||||
|
||||
**P3 — bulk-fix when next touching docs/packaging:**
|
||||
- README Status block stale (`v0.2.3 / 0.13.0:4` → now v0.19.0:0); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js`) + `task_id` reflected in scrub JSON.
|
||||
- Packaging: `marketingUrl`/`packageRepo`/`upstreamRepo` are `example.com` placeholders; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though the manifest declares `aarch64`.
|
||||
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
|
||||
- StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
|
||||
|
||||
@@ -24,12 +24,17 @@ Other env vars: `BIND_PORT`, `MODELS_YAML`, `SSH_DIR`, `SSH_KNOWN_HOSTS`, `MODEL
|
||||
|
||||
## Tests
|
||||
|
||||
No pytest harness — each suite is a standalone script run with the `image/.venv` interpreter (system python3 has no deps). See the redaction and audio rules for the suites themselves.
|
||||
Two kinds, both run with the `image/.venv` interpreter (system python3 has no deps):
|
||||
|
||||
- **pytest unit suite** — offline, pure functions, no cluster. `.venv/bin/python -m pytest` from `image/`. Lives in `image/tests/`; currently covers `build_launch_command` (incl. the shell-injection / `shlex` round-trip invariant) and the transcript↔diarizer label-merge (`_merge_words_with_speakers`). Install the test dep once with `pip install -e '.[dev]'`. Add new pure-function coverage here.
|
||||
- **Standalone scripts** — the redaction suites and the live-cluster audio e2e are run directly (not via pytest). See the redaction and audio rules.
|
||||
|
||||
## Conventions
|
||||
|
||||
- Pydantic request models go at **module scope**, never inside a `build_router()` body (FastAPI silently 422s otherwise).
|
||||
- New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
|
||||
- **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
|
||||
- **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).
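The SSH-input safety convention above (validate at the boundary, quote at the sink) can be illustrated with a short sketch. It does not reproduce `app/shellsafe.py`: the repo pattern is an assumed "org/name" shape, and `build_command` is a hypothetical sink standing in for the real ones.

```python
import re
import shlex

# Assumption: an "org/name" style repo id; the real whitelist may differ.
_REPO_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_repo(repo: str) -> str:
    """Boundary check: reject anything outside the whitelist."""
    if not _REPO_RE.fullmatch(repo):
        raise ValueError(f"invalid repo: {repo!r}")
    return repo


def build_command(repo: str, extra_args: list[str]) -> str:
    """Hypothetical sink: every user value becomes exactly one shell token."""
    parts = ["vllm", "serve", validate_repo(repo), *extra_args]
    return " ".join(shlex.quote(p) for p in parts)


# An injection attempt survives only as a single quoted argument, and
# shlex.split recovers the original token list unchanged.
cmd = build_command("org/model", ["--served-model-name", "x; rm -rf /"])
tokens = shlex.split(cmd)
```

The round trip through `shlex.split` is the invariant the pre-flight relies on, so it is worth asserting in any test that touches a sink.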
|
||||
|
||||
## Layout
|
||||
|
||||
|
||||
@@ -25,6 +25,22 @@ npm run prettier # prettier --write startos (no semicolons, single quotes, tra
|
||||
- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
|
||||
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
|
||||
|
||||
## Releasing to Gitea
|
||||
|
||||
The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:
|
||||
|
||||
```bash
# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
cd package && make x86                       # build
# 2. commit + push the source change
git tag vX.Y.Z && git push gitea vX.Y.Z       # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
make install                                  # optional: sideload to your own server (restarts it — go/no-go)
# 3. publish the s9pk as a release asset (needs a write-scoped token):
GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
```
|
||||
|
||||
`make release` → `scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note Gitea returns `browser_download_url` on its configured ROOT_URL (may be a `.local` name) — an off-LAN adopter pulls via whatever address actually reaches the Gitea.
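On the adopter side, the two steps after fetching the `releases/latest` JSON (pick the `.s9pk` asset, then fix up a download URL that points at an unreachable ROOT_URL) might be handled along these lines. The function names are hypothetical, the HTTP fetch is left out, and the sketch assumes the reachable address serves Gitea at the same path as ROOT_URL.

```python
from urllib.parse import urlsplit, urlunsplit


def pick_s9pk_asset(release: dict) -> dict:
    """Return the first asset whose name ends in .s9pk from a release object."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            return asset
    raise LookupError("release has no .s9pk asset")


def rebase_url(url: str, reachable_base: str) -> str:
    """Swap the scheme and host of `url` for those of `reachable_base`."""
    src, base = urlsplit(url), urlsplit(reachable_base)
    return urlunsplit((base.scheme, base.netloc, src.path, src.query, src.fragment))
```

An off-LAN agent would call `rebase_url(asset["browser_download_url"], <address that reaches the Gitea>)` before downloading, rather than trusting the `.local` name in the response.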
|
||||
|
||||
## Layout
|
||||
|
||||
- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
|
||||
|
||||
+35
-7
@@ -8,6 +8,16 @@ def _env(name: str, default: str = "") -> str:
     return os.environ.get(name, default)
 
 
+def _env_int(name: str, default: int) -> int:
+    """Parse an int env var, falling back to `default` when unset, blank, or
+    malformed. The StartOS Configure panel passes optional numeric fields as an
+    empty string when left blank, so a bare int("") would crash daemon startup."""
+    try:
+        return int(os.environ.get(name, "") or default)
+    except (TypeError, ValueError):
+        return default
+
+
 def _resolve_models_yaml() -> str:
     if env := os.environ.get("MODELS_YAML"):
         return env
@@ -42,6 +52,11 @@ class Settings:
     qdrant_user: str
     qdrant_container: str
     qdrant_collection: str
+    matrix_bridge_host: str
+    matrix_bridge_user: str
+    matrix_bridge_container: str
+    matrix_bridge_dir: str
+    matrix_bridge_branch: str
     redaction_map_db: str
     redaction_map_ttl: int
     ssh_key_path: str
@@ -81,18 +96,31 @@ class Settings:
             qdrant_user=_env("QDRANT_USER") or spark2_user,
             qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
             qdrant_collection=_env("QDRANT_COLLECTION", ""),
+            # matrix-bridge bot container, driven as its own SSH user (the owner
+            # of the ~/matrix-bridge git clone) so git/docker run unprivileged.
+            # The user is BLANK by default and set via the "Configure Sparks"
+            # action; leaving it blank reports the service as unconfigured, which
+            # hides the tile. That keeps the shared package portable — a
+            # deployment without the bot never shows a stray tile or a hardcoded
+            # username. Host defaults to Spark 2 (same box); container/dir/branch
+            # are sensible defaults. All are env-overridable.
+            matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
+            matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
+            matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
+            matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
+            matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
             # Redaction gateway pseudonym-map store (server-held de-anon key).
             redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
-            redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")),
+            redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
             ssh_key_path=_env("SSH_KEY_PATH"),
             ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
             models_yaml=_resolve_models_yaml(),
-            vllm_port=int(_env("VLLM_PORT", "8888")),
-            parakeet_port=int(_env("PARAKEET_PORT", "8000")),
-            kokoro_port=int(_env("KOKORO_PORT", "8880")),
-            embed_port=int(_env("EMBED_PORT", "8088")),
-            qdrant_port=int(_env("QDRANT_PORT", "6333")),
-            bind_port=int(_env("BIND_PORT", "9999")),
+            vllm_port=_env_int("VLLM_PORT", 8888),
+            parakeet_port=_env_int("PARAKEET_PORT", 8000),
+            kokoro_port=_env_int("KOKORO_PORT", 8880),
+            embed_port=_env_int("EMBED_PORT", 8088),
+            qdrant_port=_env_int("QDRANT_PORT", 6333),
+            bind_port=_env_int("BIND_PORT", 9999),
             open_webui_url=_env("OPEN_WEBUI_URL", ""),
             ngc_api_key=_env("NGC_API_KEY", ""),
         )
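
A short sketch of how the `_env_int` fallback behaves for the cases its docstring names (unset, blank, malformed). The helper body is repeated from the diff so the snippet runs on its own; the values assigned to the environment variables are made up for illustration.

```python
import os


def _env_int(name: str, default: int) -> int:
    # Same logic as the helper added in the diff, repeated so this runs alone.
    try:
        return int(os.environ.get(name, "") or default)
    except (TypeError, ValueError):
        return default


os.environ["VLLM_PORT"] = ""         # blank field, as the Configure panel sends it
os.environ["QDRANT_PORT"] = "abc"    # malformed
os.environ["BIND_PORT"] = " 9100 "   # int() tolerates surrounding whitespace
os.environ.pop("EMBED_PORT", None)   # unset

results = {
    "blank": _env_int("VLLM_PORT", 8888),
    "malformed": _env_int("QDRANT_PORT", 6333),
    "padded": _env_int("BIND_PORT", 9999),
    "unset": _env_int("EMBED_PORT", 8088),
}
print(results)
```

The `or default` step is what turns the empty string into the default before `int()` sees it; the `except` clause covers everything else that fails to parse.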
+40
-3
@@ -15,6 +15,7 @@ from dataclasses import dataclass
 from typing import Optional

 from .config import Settings
+from .shellsafe import quote_arg
 from .ssh import ssh_run


@@ -76,16 +77,52 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
     return HostDiskResult(host=host, on_disk=True, size_bytes=size)


-async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
-    """Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
+async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
+    """Return whether a local model directory exists on this host and its size.
+
+    For locally fine-tuned models (a Spark directory, not an HF cache entry). The
+    path is whitelisted at the API boundary (shellsafe.validate_local_path); we
+    shlex-quote it here in depth.
+    """
+    if not host or not user:
+        return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
+    qp = quote_arg(path)
+    cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
+    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
+    if rc != 0:
+        return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
+    raw = out.strip()
+    if raw == "MISSING" or raw == "":
+        return HostDiskResult(host=host, on_disk=False)
+    try:
+        size = int(raw.splitlines()[-1])
+    except ValueError:
+        return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
+    return HostDiskResult(host=host, on_disk=True, size_bytes=size)
+
+
+async def probe_disk(
+    repo: str, mode: str, settings: Settings, *, local_path: str | None = None
+) -> DiskStatus:
+    """Probe one model across the relevant Sparks based on its mode (solo|cluster).
+
+    A local model (local_path set) is probed by directory; otherwise by HF cache.
+    """
     hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
     if mode == "cluster" and settings.spark2_host:
         hosts.append((settings.spark2_host, settings.spark2_user))

+    if local_path:
+        results = await asyncio.gather(
+            *(probe_local_host(h, u, local_path, settings) for h, u in hosts)
+        )
+        key = local_path
+    else:
         results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
+        key = repo
     on_disk = any(r.on_disk for r in results)
     total = sum(r.size_bytes for r in results)
-    return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))
+    return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))


 async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
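
A rough sketch of the two pure pieces of `probe_local_host`, pulled out so they can be exercised without SSH. It assumes `quote_arg` behaves like `shlex.quote` (the docstring says the path is shlex-quoted); `build_probe_command` and `parse_probe_output` are hypothetical names, not functions from the package.

```python
import shlex


def build_probe_command(path: str) -> str:
    """Build the remote one-liner: directory size in bytes, or MISSING."""
    qp = shlex.quote(path)
    return f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"


def parse_probe_output(raw: str) -> tuple[bool, int]:
    """Return (on_disk, size_bytes) from the command's stdout."""
    raw = raw.strip()
    if raw in ("", "MISSING"):
        return (False, 0)
    # Take the last line, as the diff does, in case anything precedes it.
    return (True, int(raw.splitlines()[-1]))


plain = build_probe_command("/home/you/models/my-finetune")
spaced = build_probe_command("/tmp/my model")
print(plain)
print(spaced)
print(parse_probe_output("63350767616\n"), parse_probe_output("MISSING\n"))
```

A whitelisted path passes through `shlex.quote` unchanged, so the quoting only shows up for input that should already have been rejected at the API boundary.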
@@ -26,6 +26,9 @@ echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.dra
 echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
 DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
 echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
+WGIF=$(ip -o link show type wireguard 2>/dev/null | awk -F': ' 'NR==1 {print $2}')
+echo WG_IFACE=$WGIF
+echo WG_ADDR=$(ip -o -4 addr show "$WGIF" 2>/dev/null | awk 'NR==1 {print $4}')
 """.strip()


@@ -84,6 +87,11 @@ def _parse(out: str) -> dict:
     # MAC address on the default-route interface (for Wake-on-LAN)
     if info.get("mac"):
         parsed["mac"] = info["mac"].lower()
+    # WireGuard tunnel membership: name + address of the first wg interface, if
+    # any. Read-only and unprivileged (`ip` needs no root), so it never depends
+    # on sudo and never breaks the probe — absence just yields no badge.
+    parsed["wg_iface"] = info.get("wg_iface") or None
+    parsed["wg_addr"] = info.get("wg_addr") or None
     return parsed


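
To illustrate why absence "just yields no badge", here is a small sketch of turning the probe's `KEY=VALUE` lines into the two WireGuard fields. The real `_parse` body is not shown in this diff, so `parse_kv` and `wireguard_fields` are hypothetical helpers that only assume keys are lower-cased before lookup.

```python
from typing import Optional


def parse_kv(out: str) -> dict[str, str]:
    """Collect KEY=VALUE lines into a dict with lower-cased keys."""
    info: dict[str, str] = {}
    for line in out.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip().lower()] = value.strip()
    return info


def wireguard_fields(info: dict[str, str]) -> dict[str, Optional[str]]:
    """Empty strings collapse to None, matching the `or None` in the diff."""
    return {
        "wg_iface": info.get("wg_iface") or None,
        "wg_addr": info.get("wg_addr") or None,
    }


with_tunnel = wireguard_fields(parse_kv("MAC=AA:BB:CC:DD:EE:FF\nWG_IFACE=wg0\nWG_ADDR=10.8.0.2/24\n"))
without_tunnel = wireguard_fields(parse_kv("MAC=AA:BB:CC:DD:EE:FF\nWG_IFACE=\nWG_ADDR=\n"))
print(with_tunnel)
print(without_tunnel)
```

On a host with no WireGuard interface the shell substitutions expand to nothing, the lines arrive as `WG_IFACE=` and `WG_ADDR=`, and both fields end up `None`.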
@@ -0,0 +1,186 @@
+"""Update + logs for the matrix-bridge bot container on the Spark.
+
+matrix-bridge is a single Docker container managed by docker compose out of a
+git clone at `~matrix_bridge_user/matrix-bridge`. Status (the badge) and
+start/stop/restart ride the generic service machinery in `services.py`
+(`docker_state` / `run_action`). The two things that don't fit that mould live
+here:
+
+- **Update** — `git fetch && git reset --hard origin/<branch> && docker
+  compose up -d --build`. Long-running (docker build), so it streams like the
+  vLLM `UpdateManager`: fire-and-forget job, SSE stream, fail-loud rc.
+- **Logs** — a one-shot `docker logs --tail N` for diagnosing a red badge.
+
+We connect **directly as the configured user** (`modelo` — the repo owner), so
+git never trips its dubious-ownership guard and docker runs via the user's
+docker-group membership. We deliberately do NOT `sudo -iu modelo`: this Spark
+has no passwordless sudo, so a sudo wrap would hang in SSH BatchMode.
+"""
+from __future__ import annotations
+import asyncio
+import time
+import uuid
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from typing import Optional
+
+from .config import Settings
+from .shellsafe import quote_arg
+from .ssh import ssh_run, ssh_stream, StreamHandle
+
+# Hard ceiling on a single update. A first build after a base-image bump is
+# slow (minutes); the cache makes later ones quick. 25 min is generous headroom
+# without letting a genuinely wedged build spin forever.
+_UPDATE_TIMEOUT_S = 1500
+
+
+def build_update_command(directory: str, branch: str) -> str:
+    """The update one-liner, run from the bot's git clone as its owner.
+
+    `directory` and `branch` come from operator config (not request input), so
+    they're interpolated directly — same trust model as the Spark hostnames in
+    `health`/`updates`. `directory` may be `~/...`, which must stay unquoted so
+    the remote login shell expands it; quoting would defeat that.
+    """
+    return (
+        f"cd {directory} && "
+        f"git fetch origin && "
+        f"git reset --hard origin/{branch} && "
+        f"docker compose up -d --build"
+    )
+
+
+def _phase_for(line: str) -> Optional[str]:
+    """Map a streamed output line to a human-readable phase, or None to keep
+    the current phase. Kept loose — compose/buildkit output varies by version."""
+    low = line.lower()
+    if "git reset" in low or "head is now at" in low:
+        return "Resetting to the latest release…"
+    if "docker compose" in low or "buildkit" in low or low.startswith("step ") or "=> " in line or "building " in low:
+        return "Building the bot image…"
+    if "recreate" in low or "starting" in low or "started" in low or "container matrix-bridge" in low:
+        return "Recreating the container…"
+    if "already up to date" in low:
+        return "No new code; rebuilding…"
+    return None
+
+
+@dataclass
+class UpdateJob:
+    id: str
+    started_at: str
+    state: str = "starting"
+    lines: list[str] = field(default_factory=list)
+    returncode: Optional[int] = None
+    finished_at: Optional[str] = None
+    phase: str = "Starting…"
+
+    def append(self, line: str) -> None:
+        self.lines.append(line)
+        if len(self.lines) > 1000:
+            del self.lines[: len(self.lines) - 1000]
+
+
+class MatrixBridgeManager:
+    def __init__(self, settings: Settings) -> None:
+        self.settings = settings
+        self.lock = asyncio.Lock()
+        self.jobs: dict[str, UpdateJob] = {}
+        self.current_job_id: Optional[str] = None
+
+    def _configured(self) -> bool:
+        s = self.settings
+        return bool(s.matrix_bridge_host and s.matrix_bridge_user)
+
+    def get(self, job_id: str) -> UpdateJob | None:
+        return self.jobs.get(job_id)
+
+    async def fetch_logs(self, tail: int = 100) -> dict:
+        """One-shot `docker logs --tail N <container>` (stderr merged in)."""
+        s = self.settings
+        if not self._configured():
+            return {"ok": False, "error": "matrix-bridge host not configured"}
+        tail = max(1, min(int(tail), 1000))
+        # tail is already int-clamped, but quote at the sink anyway so the
+        # shellsafe convention (no raw interpolation into an SSH command) holds
+        # regardless of caller.
+        cmd = f"docker logs --tail {quote_arg(str(tail))} {quote_arg(s.matrix_bridge_container)} 2>&1"
+        rc, out, err = await ssh_run(
+            s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, timeout=20
+        )
+        return {
+            "ok": rc == 0,
+            "rc": rc,
+            "container": s.matrix_bridge_container,
+            "output": (out or err).strip(),
+        }
+
+    async def trigger_update(self) -> UpdateJob:
+        if not self._configured():
+            raise RuntimeError("matrix-bridge host not configured")
+        if self.lock.locked():
+            raise RuntimeError("An update is already in progress")
+        job = UpdateJob(
+            id=uuid.uuid4().hex[:8],
+            started_at=datetime.now(timezone.utc).isoformat(),
+        )
+        self.jobs[job.id] = job
+        self.current_job_id = job.id
+        asyncio.create_task(self._run(job))
+        return job
+
+    async def _run(self, job: UpdateJob) -> None:
+        async with self.lock:
+            try:
+                await self._do(job)
+                if job.state != "failed":
+                    job.state = "done"
+                    job.returncode = 0
+                    job.phase = "Done"
+            except asyncio.TimeoutError:
+                job.append(f"[error] update timed out after {_UPDATE_TIMEOUT_S}s")
+                job.state = "failed"
+                job.returncode = 124
+                job.phase = "Timed out"
+            except Exception as e:
+                job.append(f"[error] {type(e).__name__}: {e}")
+                job.state = "failed"
+                if job.returncode is None:
+                    job.returncode = 1
+            finally:
+                job.finished_at = datetime.now(timezone.utc).isoformat()
+                if self.current_job_id == job.id:
+                    self.current_job_id = None
+
+    async def _do(self, job: UpdateJob) -> None:
+        s = self.settings
+        cmd = build_update_command(s.matrix_bridge_dir, s.matrix_bridge_branch)
+        job.append(f"$ {cmd}")
+        job.state = "running"
+        job.phase = "Fetching latest code…"
+
+        handle = StreamHandle()
+        gen = ssh_stream(s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, handle=handle)
+        deadline = time.monotonic() + _UPDATE_TIMEOUT_S
+        try:
+            while True:
+                remaining = deadline - time.monotonic()
+                if remaining <= 0:
+                    raise asyncio.TimeoutError
+                try:
+                    line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
+                except StopAsyncIteration:
+                    break
+                job.append(line)
+                phase = _phase_for(line)
+                if phase:
+                    job.phase = phase
+        finally:
+            # Closing the generator terminates the underlying ssh process and
+            # populates handle.returncode via ssh_stream's finally block.
+            await gen.aclose()
+
+        rc = handle.returncode or 0
+        if rc != 0:
+            job.state = "failed"
+            job.returncode = rc
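
The deadline loop in `_do` bounds the whole stream with one overall budget rather than a per-line timeout. A self-contained sketch of that pattern follows; `slow_lines` stands in for `ssh_stream` and `drain` is a hypothetical name, so this only shows the control flow, not the package's SSH plumbing.

```python
import asyncio
import time


async def slow_lines():
    """Stand-in for a streamed command: three quick lines, then it wedges."""
    for i in range(3):
        await asyncio.sleep(0.01)
        yield f"line {i}"
    await asyncio.sleep(10)  # simulate a build that never finishes
    yield "never reached"


async def drain(gen, budget_s: float) -> list[str]:
    """Consume an async generator under a single overall deadline."""
    lines: list[str] = []
    deadline = time.monotonic() + budget_s
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise asyncio.TimeoutError
            try:
                # Each wait gets only what is left of the shared budget.
                line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
            except StopAsyncIteration:
                break
            lines.append(line)
    except asyncio.TimeoutError:
        lines.append("[timed out]")
    finally:
        # Closing the generator is what lets the producer clean up.
        await gen.aclose()
    return lines


result = asyncio.run(drain(slow_lines(), budget_s=0.5))
print(result)
```

Because `remaining` is recomputed on every pass, a stream that emits output steadily still cannot outlive the ceiling.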
+77
-7
@@ -1,15 +1,33 @@
 from __future__ import annotations
+import logging
 from typing import Literal, Optional
 import yaml
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator

 from .overrides import apply_knobs_to_args, load_overrides
-from .shellsafe import quote_arg, quote_args
+from .shellsafe import quote_arg, quote_args, validate_local_path

+log = logging.getLogger(__name__)
+
+
+def _chat_template_path(vllm_args: list[str]) -> str | None:
+    """Extract the path from a `--chat-template=<path>` arg, if present."""
+    for a in vllm_args:
+        if a.startswith("--chat-template="):
+            return a.split("=", 1)[1]
+    return None
+
+
+def _is_within(path: str, base: str) -> bool:
+    """True if `path` is `base` itself or lives inside it (lexical check)."""
+    base = base.rstrip("/")
+    return path == base or path.startswith(base + "/")
+

 class ModelDef(BaseModel):
     display_name: str
-    repo: str
+    repo: str = ""  # HF 'org/name'; empty for a local model
+    local_path: str | None = None  # absolute dir on the Spark; set => local model
     size_gb: float
     mode: Literal["solo", "cluster"]
     capabilities: list[str] = Field(default_factory=list)
@@ -19,6 +37,38 @@ class ModelDef(BaseModel):
     knobs: dict | None = None  # user-customized; merged at launch time
     custom: bool = False  # True if this came from /data overrides

+    @model_validator(mode="after")
+    def _validate_source(self) -> "ModelDef":
+        if bool(self.repo) == bool(self.local_path):
+            raise ValueError(
+                f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
+                f"or 'local_path' (Spark directory)"
+            )
+        if self.local_path:
+            # Single place that enforces the path whitelist, so YAML/override
+            # entries get the same boundary check as the API. The quote_arg sink
+            # is still defense-in-depth.
+            validate_local_path(self.local_path)
+            # Only local_path is bind-mounted into the vLLM container, so any
+            # --chat-template path must live inside it or vLLM can't find it.
+            tmpl = _chat_template_path(self.vllm_args)
+            if tmpl is not None and not _is_within(tmpl, self.local_path):
+                raise ValueError(
+                    f"--chat-template path {tmpl!r} must be inside the model "
+                    f"directory {self.local_path!r} (only that directory is mounted "
+                    f"into the container)"
+                )
+        return self
+
+    @property
+    def is_local(self) -> bool:
+        return bool(self.local_path)
+
+    @property
+    def source(self) -> str:
+        """What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
+        return self.local_path if self.local_path else self.repo
+

 class Defaults(BaseModel):
     port: int = 8888
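
The chat-template rule relies on `_is_within`, which its docstring describes as a lexical check. A brief sketch of what that does and does not catch; `is_within` repeats the helper from the diff, while `is_within_normalized` is a hypothetical stricter variant shown only for comparison, not something the package is known to use.

```python
import posixpath


def is_within(path: str, base: str) -> bool:
    """Lexical containment, mirroring the helper in the diff."""
    base = base.rstrip("/")
    return path == base or path.startswith(base + "/")


def is_within_normalized(path: str, base: str) -> bool:
    """Hypothetical variant: collapse '.' and '..' segments before comparing."""
    return is_within(posixpath.normpath(path), posixpath.normpath(base))


base = "/home/you/models/my-finetune"
inside = is_within(f"{base}/chat_template.jinja", base)
sibling = is_within("/home/you/models/my-finetune-v2/chat_template.jinja", base)
dotted = is_within(f"{base}/../other/chat_template.jinja", base)
dotted_strict = is_within_normalized(f"{base}/../other/chat_template.jinja", base)
print(inside, sibling, dotted, dotted_strict)
```

Appending `/` before the prefix test is what keeps a sibling such as `my-finetune-v2` from matching; a path containing `..` still passes the purely lexical form, which only matters if template paths are not otherwise constrained.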
@@ -47,7 +97,8 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
             continue
         defaults_dump = {
             "display_name": entry.get("display_name", key),
-            "repo": entry["repo"],
+            "repo": entry.get("repo", ""),
+            "local_path": entry.get("local_path"),
             "size_gb": float(entry.get("size_gb", 0)),
             "mode": entry.get("mode", "solo"),
             "capabilities": entry.get("capabilities") or [],
@@ -57,7 +108,12 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
             "knobs": entry.get("knobs"),
             "custom": True,
         }
+        # A single malformed override entry (bad path, missing source, etc.) must
+        # not take down the whole catalog — skip it and keep the rest loadable.
+        try:
             new_models[key] = ModelDef.model_validate(defaults_dump)
+        except Exception as e:
+            log.warning("skipping invalid custom model %r: %s", key, e)

     return Catalog(defaults=catalog.defaults, models=new_models)

@@ -78,7 +134,21 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
     solo = "--solo " if model.mode == "solo" else ""
     base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
     args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
-    # repo + args are user-controlled (custom models, knobs); shlex.quote each so
-    # they cannot break out of the SSH shell command. shlex.split (used by the
+    # source + args are user-controlled (custom models, knobs); shlex.quote each
+    # so they cannot break out of the SSH shell command. shlex.split (used by the
     # vLLM pre-flight validator) cleanly reverses this quoting.
-    return f"./launch-cluster.sh {solo}-d exec vllm serve {quote_arg(model.repo)} {quote_args(args)}"
+    prefix = ""
+    if model.local_path:
+        # A local model's directory isn't in the HF cache the launch script
+        # already mounts, so bind-mount it at the SAME path inside the vllm
+        # container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
+        # path inside and out means `vllm serve <dir>` and any
+        # `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
+        # change needed. (The env assignment sits before the script, so the
+        # validator's `serve`-keyed shlex round-trip is unaffected.)
+        mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
+        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
+    return (
+        f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
+        f"{quote_arg(model.source)} {quote_args(args)}"
+    )
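
A sketch of what the local-model launch line looks like and why the env prefix leaves the `serve`-keyed round-trip alone. It assumes `quote_arg` is `shlex.quote` and `quote_args` joins the quoted args with spaces; the host value `0.0.0.0` and the model path are made-up examples.

```python
import shlex

local_path = "/home/you/models/my-finetune"
args = [
    "--port=8888",
    "--host=0.0.0.0",  # assumed default host, for illustration only
    f"--chat-template={local_path}/chat_template.jinja",
]

# The mount spec contains a space, so quoting wraps it in single quotes and it
# stays one shell word attached to the variable name.
mount = shlex.quote(f"-v {local_path}:{local_path}")
cmd = (
    f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} ./launch-cluster.sh --solo -d exec vllm serve "
    f"{shlex.quote(local_path)} {' '.join(shlex.quote(a) for a in args)}"
)
print(cmd)

# Reverse the quoting the way a shlex-based validator would.
tokens = shlex.split(cmd)
served = tokens[tokens.index("serve") + 1]
print(tokens[0])
print(served, tokens[tokens.index("serve") + 2:])
```

After `shlex.split`, the whole assignment is a single leading token and everything after `serve` is exactly the model source followed by the original argument list.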
@@ -14,7 +14,7 @@ Shape:
     custom:
       - key: my-new-model
         display_name: My New Model (from download)
-        repo: my-org/my-model
+        repo: my-org/my-model        # an HF repo; OR set local_path instead (exactly one)
         size_gb: 20
         mode: solo
         description: null
@@ -25,6 +25,12 @@ Shape:
           fastsafetensors: true
           prefix_caching: true
           kv_cache_dtype: fp8
+      - key: my-finetune             # a local/fine-tuned model (a directory on the Spark)
+        display_name: My Fine-tune
+        local_path: /home/you/models/my-finetune
+        size_gb: 59
+        mode: solo
+        vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
 """
 from __future__ import annotations
 import os
+167
-9
@@ -3,10 +3,10 @@ import asyncio
 import json
 from pathlib import Path

-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, HTTPException, Query, Request
 from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
 from fastapi.staticfiles import StaticFiles
-from pydantic import BaseModel
+from pydantic import BaseModel, ValidationError
 from typing import Literal

 from .config import Settings
@@ -21,7 +21,8 @@ from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
 from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
-from .models import load_catalog
+from .matrix_bridge import MatrixBridgeManager
+from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
 from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
 from .services import docker_state, run_action, services_from_settings
@@ -43,6 +44,7 @@ hardware_probe = HardwareProbe(settings)
 nim_manager = NimManager(settings)
 deep_health = DeepHealth(settings)
 speech_models = SpeechModelsManager(settings)
+matrix_bridge = MatrixBridgeManager(settings)

 app = FastAPI(title="spark-control", version="0.1.0")

@@ -181,7 +183,8 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
 class CustomModelBody(BaseModel):
     key: str
     display_name: str
-    repo: str
+    repo: str = ""
+    local_path: str | None = None
     size_gb: float = 0
     mode: Literal["solo", "cluster"] = "solo"
     description: str | None = None
@@ -194,8 +197,17 @@ class CustomModelBody(BaseModel):
 async def post_model(body: CustomModelBody) -> dict:
     if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
         raise HTTPException(400, "key must be alphanumeric/-/_ only")
+    # Validate the full entry BEFORE persisting (exactly-one source, local-path
+    # whitelist, chat-template location). Doing it via ModelDef means the API and
+    # the YAML-override path share one set of rules, and a bad entry can't be
+    # written to /data and then break catalog load.
     try:
-        validate_repo(body.repo)
+        ModelDef.model_validate(body.model_dump())
+        if body.repo:
+            validate_repo(body.repo)  # HF charset (the model only validates local paths)
+    except ValidationError as e:
+        msg = e.errors()[0]["msg"] if e.errors() else str(e)
+        raise HTTPException(400, msg.removeprefix("Value error, "))
     except ValueError as e:
         raise HTTPException(400, str(e))
     if body.key in catalog.models and not catalog.models[body.key].custom:
@@ -227,7 +239,13 @@ async def get_models_disk_status() -> dict:
         return {"configured": False, "models": {}}
     keys = list(catalog.models.keys())
     statuses = await asyncio.gather(*(
-        probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
+        probe_disk(
+            catalog.models[k].repo,
+            catalog.models[k].mode,
+            settings,
+            local_path=catalog.models[k].local_path,
+        )
+        for k in keys
     ), return_exceptions=True)
     out: dict[str, dict] = {}
     for k, s in zip(keys, statuses):
@@ -258,6 +276,14 @@ async def del_model_disk(key: str) -> dict:
         raise HTTPException(404, f"unknown model: {key}")
     m = catalog.models[key]

+    # Never rm a local fine-tune directory from the dashboard — it's irreplaceable
+    # training output the user placed by hand, not a re-downloadable HF cache.
+    if m.local_path:
+        raise HTTPException(
+            400,
+            "this is a local model; its directory must be managed on the Spark, not deleted from here",
+        )
+
     # Refuse if currently loaded
     try:
         vllm = await check_vllm(settings)
@@ -401,6 +427,53 @@ async def wake_spark(name: str) -> dict:
     return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}


+@app.post("/api/spark/{name}/ssh-key")
+async def spark_ssh_key(name: str) -> dict:
+    """Ensure the named Spark has an ed25519 keypair and return its PUBLIC key.
+
+    This is the Spark's *outbound* identity — the key it uses to log in to other
+    machines (e.g. the operator's Mac). It is the opposite direction from, and
+    distinct from, the package's own key shown by the StartOS "Show Public Key"
+    action (which grants this dashboard SSH access to the Sparks).
+
+    Non-destructive: generates the key only if absent, never overwrites an
+    existing one (which may already be an identity the Spark uses elsewhere).
+    Public keys are not secret, so returning it is safe. No request-supplied
+    value reaches the command — `name` is constrained to a fixed set and
+    host/user come from operator config — so there is nothing to shell-quote.
+    """
+    if name not in ("spark1", "spark2"):
+        raise HTTPException(404, f"unknown spark: {name}")
+    host = settings.spark1_host if name == "spark1" else settings.spark2_host
+    user = settings.spark1_user if name == "spark1" else settings.spark2_user
+    if not host or not user:
+        raise HTTPException(400, f"{name} is not configured")
+    # Empty passphrase so the key is usable unattended; comment carries the
+    # remote hostname so it's identifiable in an authorized_keys file later.
+    cmd = (
+        "set -e; "
+        "mkdir -p ~/.ssh && chmod 700 ~/.ssh; "
+        "if [ ! -f ~/.ssh/id_ed25519 ]; then "
+        'ssh-keygen -t ed25519 -N "" -C "spark-control@$(hostname)" -f ~/.ssh/id_ed25519 >/dev/null 2>&1; '
+        "echo CREATED=1; else echo CREATED=0; fi; "
+        "[ -f ~/.ssh/id_ed25519.pub ] || ssh-keygen -y -f ~/.ssh/id_ed25519 > ~/.ssh/id_ed25519.pub; "
+        "echo PUBKEY=$(cat ~/.ssh/id_ed25519.pub)"
+    )
+    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=15)
+    if rc != 0:
+        raise HTTPException(502, f"couldn't read/create the SSH key on {name}: {err.strip() or out.strip() or f'rc={rc}'}")
+    created = False
+    pubkey = ""
+    for line in out.splitlines():
+        if line.startswith("CREATED="):
+            created = line.strip() == "CREATED=1"
+        elif line.startswith("PUBKEY="):
+            pubkey = line[len("PUBKEY="):].strip()
+    if not pubkey:
+        raise HTTPException(502, f"no public key returned from {name}")
+    return {"ok": True, "spark": name, "host": host, "user": user, "pubkey": pubkey, "created": created}
+
+
 @app.get("/api/services")
 async def get_services() -> dict:
     """Lifecycle state of always-on support services (Parakeet, Kokoro, …).
@@ -427,6 +500,11 @@ async def get_services() -> dict:
             http = await check_embeddings(settings)
         elif name == "qdrant":
             http = await check_qdrant(settings)
+        elif svc.kind == "bot":
+            # No HTTP health endpoint (host networking, no port) — judged purely
+            # by docker state. http_ready stays None so the badge isn't pinned
+            # to a "Starting…" verdict that can never clear.
+            http = {"ok": None, "base_url": None}
         else:
             # Custom services expose a /health endpoint by convention.
             http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"}
@@ -437,7 +515,9 @@ async def get_services() -> dict:
             "container": svc.container,
             "kind": svc.kind,
             "base_url": http.get("base_url"),
-            "http_ready": bool(http.get("ok")),
+            # None (not False) for services with no HTTP surface (the bot), so
+            # the UI judges them by docker state alone instead of "Starting…".
+            "http_ready": None if svc.kind == "bot" else bool(http.get("ok")),
             # Prefer the check fn's own top-level model key (embeddings reports
             # it there); fall back to a model field inside detail for services
             # whose /health embeds it (parakeet).
@@ -453,7 +533,10 @@ async def get_services() -> dict:
     results = await asyncio.gather(*[one(n) for n in services.keys()])
     for name, info in results:
         out[name] = info
-        # Feed http reachability into the connectivity log (transition-only)
+        # Feed http reachability into the connectivity log (transition-only).
+        # Skip services with no HTTP surface (http_ready is None) — they'd
+        # otherwise register as perpetually "down".
+        if info.get("http_ready") is not None:
             record_state(name, bool(info.get("http_ready")))
     return out

@@ -559,7 +642,7 @@ async def stream_nim_install(job_id: str):
 @app.delete("/api/services/{name}")
 async def del_service(name: str) -> dict:
     # Only allow deleting custom services (not the bundled built-in keys)
-    if name in ("parakeet", "kokoro", "embeddings", "qdrant"):
+    if name in ("parakeet", "kokoro", "embeddings", "qdrant", "matrix-bridge"):
         raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)")
     delete_custom_service(name)
     return {"ok": True, "name": name}
@@ -578,6 +661,81 @@ async def service_action(name: str, action: str) -> dict:
|
||||
return {"name": name, "action": action, **result}
|
||||
|
||||
|
||||
# ---- matrix-bridge bot: update (git pull + rebuild) + logs ----
|
||||
# Status badge + start/stop/restart ride the generic /api/services machinery
|
||||
# above (the bot is a registered ServiceDef). Only the long-running Update and
|
||||
# the logs view need bespoke endpoints.
|
||||
|
||||
def _serialize_mb_update(job) -> dict:
|
||||
return {
|
||||
"id": job.id,
|
||||
"state": job.state,
|
||||
"phase": job.phase,
|
||||
"started_at": job.started_at,
|
||||
"finished_at": job.finished_at,
|
||||
"returncode": job.returncode,
|
||||
"lines": job.lines,
|
||||
}
|
||||
|
||||
|
||||
@app.post("/api/matrix-bridge/update")
|
||||
async def post_matrix_bridge_update() -> dict:
|
||||
"""Pull latest code, rebuild, and recreate the bot container. Long-running
|
||||
(docker build) — returns a job id to stream."""
|
||||
try:
|
||||
job = await matrix_bridge.trigger_update()
|
||||
except RuntimeError as e:
|
||||
raise HTTPException(409 if "in progress" in str(e) else 503, str(e))
|
||||
return {"job_id": job.id, "state": job.state}
|
||||
|
||||
|
||||
@app.get("/api/matrix-bridge/update/{job_id}")
|
||||
async def get_matrix_bridge_update(job_id: str) -> dict:
|
||||
job = matrix_bridge.get(job_id)
|
||||
if job is None:
|
||||
raise HTTPException(404, "no such job")
|
||||
return _serialize_mb_update(job)
|
||||
|
||||
|
||||
@app.get("/api/matrix-bridge/update/{job_id}/stream")
async def stream_matrix_bridge_update(job_id: str, request: Request):
    job = matrix_bridge.get(job_id)
    if job is None:
        raise HTTPException(404, "no such job")

    async def gen():
        sent = 0
        last_phase = None
        while True:
            # An update can run for minutes; bail promptly if the client is gone
            # rather than spinning the poll loop until the job's 25-min ceiling.
            if await request.is_disconnected():
                return
            n = len(job.lines)
            if n > sent:
                for line in job.lines[sent:n]:
                    yield f"data: {json.dumps({'line': line})}\n\n"
                sent = n
            if job.phase != last_phase:
                yield f"event: phase\ndata: {json.dumps({'state': job.state, 'phase': job.phase})}\n\n"
                last_phase = job.phase
            if job.returncode is not None and sent >= len(job.lines):
                yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
                return
            await asyncio.sleep(0.5)

    return StreamingResponse(gen(), media_type="text/event-stream")
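The framing used by the stream endpoint can be looked at in isolation. The following is a minimal sketch using only the standard library; `frame_new_lines` and `frame_event` are hypothetical helper names chosen for illustration and are not functions in this repository. It shows the cursor pattern the generator relies on: each poll frames only the lines appended since the last cursor position, and named events carry a JSON payload.

```python
import json


def frame_new_lines(lines: list[str], sent: int) -> tuple[list[str], int]:
    """Frame every line appended since cursor `sent` as an SSE `data:` message.

    Returns the frames plus the new cursor, so a caller can poll repeatedly
    without re-sending lines it has already emitted.
    """
    n = len(lines)
    frames = [f"data: {json.dumps({'line': line})}\n\n" for line in lines[sent:n]]
    return frames, n


def frame_event(name: str, payload: dict) -> str:
    """Frame a named SSE event (for example `phase` or `done`) with a JSON payload."""
    return f"event: {name}\ndata: {json.dumps(payload)}\n\n"


# Simulate two polls against a growing job log (illustrative data only).
log = ["pulling", "building"]
frames, cursor = frame_new_lines(log, 0)
log.append("recreating")
more, cursor = frame_new_lines(log, cursor)
done = frame_event("done", {"state": "done", "returncode": 0})
```

Unnamed `data:` frames reach the browser's `onmessage` handler, while `event: phase` and `event: done` frames are delivered only to listeners registered for those names, which matches how the client code in this diff subscribes.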
|
||||
|
||||
@app.get("/api/matrix-bridge/logs")
async def get_matrix_bridge_logs(tail: int = Query(100, ge=1, le=1000)) -> dict:
    """Last N lines of `docker logs` for the bot container (stderr merged)."""
    result = await matrix_bridge.fetch_logs(tail=tail)
    if not result.get("ok"):
        raise HTTPException(502, result.get("output") or result.get("error") or "could not read logs")
    return result
|
||||
|
||||
# ---- Speech model patch management ----
|
||||
|
||||
@app.get("/api/speech-models")
|
||||
|
||||
@@ -89,6 +89,17 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
|
||||
            container=s.qdrant_container,
            port=s.qdrant_port,
        ),
        # matrix-bridge Matrix bot. No HTTP port to probe (host networking, no
        # health endpoint) — judged purely by docker state. Driven as its own
        # SSH user (modelo, the repo owner) so git/docker run unprivileged.
        "matrix-bridge": ServiceDef(
            name="matrix-bridge",
            kind="bot",
            host=s.matrix_bridge_host,
            user=s.matrix_bridge_user,
            container=s.matrix_bridge_container,
            port=0,
        ),
    }
    for entry in load_custom_services():
        key = entry.get("key")
|
||||
@@ -28,6 +28,12 @@ _IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
|
||||
# Docker container / volume name (Docker's own rule).
_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")

# Absolute filesystem path to a local model directory on a Spark. Conservative
# charset (letters, digits, and safe path punctuation) with a required leading
# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
# and '..' segments) is rejected separately in validate_local_path.
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")


def validate_repo(repo: str) -> str:
    """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
@@ -50,6 +56,25 @@ def validate_container(name: str) -> str:
|
||||
    return name


def validate_local_path(path: str) -> str:
    """Return `path` if it is a safe absolute model directory path; else ValueError.

    For locally fine-tuned models served by directory (not an HF repo). Requires
    an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
    caller cannot traverse out of an intended models directory. The `quote_arg`
    sink still quotes it in depth — this is the boundary check.
    """
    p = path or ""
    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
        raise ValueError(
            "invalid local model path (expected an absolute path, no spaces or "
            f"shell metacharacters): {path!r}"
        )
    if any(seg in (".", "..") for seg in p.split("/")):
        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
    return p
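To make the boundary check concrete, here is a small standalone sketch that restates the regex and the two checks from the hunk so it can run on its own, then probes it with a few sample paths. The error message is shortened, and `is_accepted` and the sample paths are illustrative assumptions, not part of the repository.

```python
import re

# Same pattern as in the diff: leading '/', then a conservative charset.
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")


def validate_local_path(path: str) -> str:
    """Standalone restatement of the validator, with a shortened message."""
    p = path or ""
    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
        raise ValueError(f"invalid local model path: {path!r}")
    # The charset allows '.', so traversal segments need their own check.
    if any(seg in (".", "..") for seg in p.split("/")):
        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
    return p


def is_accepted(path: str) -> bool:
    """Hypothetical helper: True when the validator returns without raising."""
    try:
        validate_local_path(path)
        return True
    except ValueError:
        return False


samples = [
    "/home/you/models/my-finetune",  # absolute, safe charset
    "/models/v1.5+lora",             # dots inside a name are fine
    "models/my-finetune",            # relative path
    "/home/you/../etc",              # traversal segment
    "/models/./x",                   # '.' segment
    "/home/you/my model",            # whitespace
    "/models/$(id)",                 # shell metacharacters
    "/",                             # nothing after the leading slash
]
results = {p: is_accepted(p) for p in samples}
```

Only the first two samples pass. The regex alone would let `/home/you/../etc` through, which is why the segment check is a separate step.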
|
||||
|
||||
def quote_arg(value: object) -> str:
    """shlex.quote a single token for safe embedding in a shell command string."""
    return shlex.quote(str(value))
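As a quick illustration of why the quoting sink is kept even after validation, the sketch below restates `quote_arg` and feeds it through a hypothetical command builder. `build_ls_command` and the sample inputs are assumptions made for the example and do not appear in the repository.

```python
import shlex


def quote_arg(value: object) -> str:
    """shlex.quote a single token for safe embedding in a shell command string."""
    return shlex.quote(str(value))


def build_ls_command(path: str) -> str:
    """Hypothetical builder: embed one untrusted token in a shell command."""
    return f"ls -la {quote_arg(path)}"


safe = build_ls_command("/home/you/models/my-finetune")
hostile = build_ls_command("/tmp/x; rm -rf ~")
port = quote_arg(8000)
```

A token made only of safe characters passes through unchanged, while the hostile one is wrapped in single quotes so the shell sees it as one literal argument. Non-string values such as the integer port are converted with `str` first.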
|
||||
+254
-9
@@ -13,6 +13,7 @@ const state = {
|
||||
swap_progress: 0, // 0–1
|
||||
services: {},
|
||||
service_action_in_flight: null, // e.g. "parakeet:restart"
|
||||
mb_update_in_flight: false, // matrix-bridge update job running
|
||||
hardware: {},
|
||||
config: {},
|
||||
configured: true,
|
||||
@@ -59,6 +60,7 @@ function renderCards() {
|
||||
? `<div class="desc">${escapeHtml(m.description)}</div>`
|
||||
: '';
|
||||
const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
|
||||
const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
|
||||
// Disk-presence pill + trash button. Until /api/models/disk-status comes back,
|
||||
// we don't know — render a neutral placeholder.
|
||||
const disk = state.disk_status[key];
|
||||
@@ -72,8 +74,10 @@ function renderCards() {
|
||||
}
|
||||
}
|
||||
// Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
|
||||
// Never offered for local models: their directory is hand-placed training output,
|
||||
// not a re-downloadable HF cache (the server refuses the delete too).
|
||||
let trashBtn = '';
|
||||
if (state.disk_status_loaded && disk && disk.on_disk) {
|
||||
if (state.disk_status_loaded && disk && disk.on_disk && !m.local_path) {
|
||||
const disabled = isActive || isSwapping;
|
||||
const tip = isActive
|
||||
? 'Currently loaded — switch to another model first'
|
||||
@@ -91,6 +95,9 @@ function renderCards() {
|
||||
primaryBtn = `<button class="btn" disabled>Current</button>`;
|
||||
} else if (isOnDisk) {
|
||||
primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`;
|
||||
} else if (m.local_path) {
|
||||
// A local model can't be "downloaded" — its directory has to exist on the Spark.
|
||||
primaryBtn = `<button class="btn" disabled title="Directory not found on the Spark — create it there, then refresh">Not found on Spark</button>`;
|
||||
} else {
|
||||
const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
|
||||
primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
|
||||
@@ -101,12 +108,15 @@ function renderCards() {
|
||||
<span class="tag mode-${m.mode}">${m.mode}</span>
|
||||
<span class="tag">${m.size_gb} GB</span>
|
||||
${customPill}
|
||||
${localPill}
|
||||
${diskPill}
|
||||
${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
|
||||
</div>
|
||||
${desc}
|
||||
<div class="muted small repo">
|
||||
<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>
|
||||
${m.local_path
|
||||
? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
|
||||
: `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
|
||||
</div>
|
||||
<div class="spacer"></div>
|
||||
<div class="card-actions">
|
||||
@@ -305,6 +315,32 @@ async function wakeSpark(name) {
|
||||
}
|
||||
}
|
||||
|
||||
// Generate-if-missing + copy this Spark's OUTBOUND ssh public key (the key the
// Spark uses to log in to other machines, e.g. the Mac). Distinct from the
// package's own key in the StartOS "Show Public Key" action.
async function copySparkSshKey(name, btn) {
  if (btn) btn.disabled = true;
  try {
    const r = await fetchJSON(`/api/spark/${name}/ssh-key`, { method: 'POST' });
    // Best-effort clipboard copy; on plain-HTTP this no-ops, but the dialog
    // below always shows the key for manual selection.
    await copyText(r.pubkey, btn);
    const label = r.host ? `${name} (${r.host})` : name;
    el('#sshkey-title').textContent = `${name} — SSH public key`;
    el('#sshkey-intro').textContent = r.created
      ? `Generated a new SSH key on ${label} and copied it to your clipboard. This is the key ${name} uses to log in to OTHER machines.`
      : `${label} already had an SSH key; copied its public key to your clipboard. This is the key ${name} uses to log in to OTHER machines.`;
    el('#sshkey-value').textContent = r.pubkey;
    el('#sshkey-install').textContent =
      `mkdir -p ~/.ssh && echo '${r.pubkey}' >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys`;
    el('#sshkey-dialog').showModal();
  } catch (e) {
    alert(`Couldn't get the SSH key for ${name}: ${e.message}`);
  } finally {
    if (btn) btn.disabled = false;
  }
}
|
||||
function renderHardware() {
|
||||
const panel = el('#hardware-panel');
|
||||
const grid = el('#hardware-grid');
|
||||
@@ -358,11 +394,21 @@ function renderHardware() {
|
||||
if (s.gpu_temp_c != null) gpuExtras.push(`${s.gpu_temp_c}°C`);
|
||||
if (s.gpu_power_w != null) gpuExtras.push(`${s.gpu_power_w.toFixed(0)}W`);
|
||||
const gpuExtrasStr = gpuExtras.length ? ` · ${gpuExtras.join(' · ')}` : '';
|
||||
// Read-only WireGuard badge: shown only when the Spark has a wg interface up.
|
||||
// "VPN <ip>" means it's a peer on that tunnel (reachable off-LAN when the
|
||||
// tunnel is up); it reflects interface presence, not live peer reachability.
|
||||
const wgIp = s.wg_addr ? String(s.wg_addr).split('/')[0] : '';
|
||||
const wgBadge = s.wg_iface
|
||||
? ` · <span class="wg-badge" title="On WireGuard tunnel '${escapeHtml(s.wg_iface)}'${wgIp ? ' as ' + escapeHtml(wgIp) : ''} — reachable off-LAN while the tunnel is up">VPN${wgIp ? ' ' + escapeHtml(wgIp) : ''}</span>`
|
||||
: '';
|
||||
card.className = 'hw-card';
|
||||
card.innerHTML = `
|
||||
<div class="head">
|
||||
<span class="name">${escapeHtml(s.hostname || key)}</span>
|
||||
<span class="meta">${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}</span>
|
||||
<span class="meta">${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}${wgBadge}</span>
|
||||
<button class="icon-btn ssh-key-btn" data-ssh-key="${escapeHtml(key)}" title="Copy this Spark's SSH public key (creates one if it doesn't have one) — e.g. to let it log in to your Mac" aria-label="Copy SSH public key">
|
||||
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
||||
</button>
|
||||
</div>
|
||||
<div class="hw-metric">
|
||||
<span class="label">CPU</span>
|
||||
@@ -402,8 +448,13 @@ function classifyService(s) {
|
||||
if (s.docker_state === 'missing') return 'missing';
|
||||
if (s.docker_state === 'restarting') return 'unhealthy';
|
||||
if (s.docker_state === 'exited') return 'unhealthy';
|
||||
if (s.docker_state === 'running' && !s.http_ready) return 'starting';
|
||||
if (s.docker_state === 'running' && s.http_ready) return 'running';
|
||||
if (s.docker_state === 'running') {
|
||||
// http_ready === false means an HTTP probe is expected but failing → still
|
||||
// warming up. null means the service has no HTTP surface (e.g. the bot), so
|
||||
// a running container is simply healthy.
|
||||
if (s.http_ready === false) return 'starting';
|
||||
return 'running';
|
||||
}
|
||||
return s.docker_state || 'unknown';
|
||||
}
|
||||
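The revised health logic treats `http_ready` as a tri-state value rather than a boolean. The sketch below restates it in Python (used here for consistency with the server-side examples), as an assumption-laden mirror of only the branches visible in this hunk; any checks that precede them in the real `classifyService` are not shown in the diff and are not reproduced. `classify_service` is a hypothetical name.

```python
from typing import Optional


def classify_service(docker_state: Optional[str], http_ready: Optional[bool]) -> str:
    """Map docker state plus an optional HTTP probe result to a tile status."""
    if docker_state == "missing":
        return "missing"
    if docker_state in ("restarting", "exited"):
        return "unhealthy"
    if docker_state == "running":
        # False: a probe is expected but failing, so the service is warming up.
        # None: the service has no HTTP surface (e.g. the bot), so a running
        # container counts as healthy.
        return "starting" if http_ready is False else "running"
    return docker_state or "unknown"


bot_status = classify_service("running", None)
warming_status = classify_service("running", False)
```

Under the previous logic a service without an HTTP probe would have stayed in `starting` indefinitely, because a missing `http_ready` was indistinguishable from a failing one.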
|
||||
@@ -435,6 +486,11 @@ async function renderServices() {
|
||||
grid.innerHTML = '';
|
||||
for (const [name, s] of entries) {
|
||||
const cls = classifyService(s);
|
||||
const isBot = s.kind === 'bot';
|
||||
// The bot tile is opt-in: it only belongs to deployments that actually run
|
||||
// matrix-bridge. When the container is absent (missing) or the host isn't
|
||||
// configured, hide the tile entirely rather than show a stray red card.
|
||||
if (isBot && (cls === 'missing' || cls === 'unconfigured')) continue;
|
||||
const card = document.createElement('div');
|
||||
card.className = `service-card ${cls}`;
|
||||
const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':');
|
||||
@@ -447,7 +503,7 @@ async function renderServices() {
|
||||
return false;
|
||||
};
|
||||
const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`;
|
||||
const hostStr = s.host ? `${s.host}:${s.port}` : '';
|
||||
const hostStr = s.host ? (s.port ? `${s.host}:${s.port}` : s.host) : '';
|
||||
const hostRow = s.host
|
||||
? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>`
|
||||
: `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`;
|
||||
@@ -501,9 +557,11 @@ async function renderServices() {
|
||||
${restartsRow}
|
||||
${deepRow}
|
||||
<div class="service-actions">
|
||||
${isBot ? `<button class="btn primary" data-mb-update title="Pull latest code, rebuild, and recreate the bot" ${inFlight || state.mb_update_in_flight ? 'disabled' : ''}>Update</button>` : ''}
|
||||
<button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button>
|
||||
<button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button>
|
||||
<button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button>
|
||||
${isBot ? `<button class="btn" data-mb-logs title="Show the last 100 log lines">View logs</button>` : ''}
|
||||
</div>
|
||||
`;
|
||||
grid.appendChild(card);
|
||||
@@ -511,6 +569,10 @@ async function renderServices() {
|
||||
for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) {
|
||||
btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction));
|
||||
}
|
||||
const mbUpdateBtn = grid.querySelector('[data-mb-update]');
|
||||
if (mbUpdateBtn) mbUpdateBtn.addEventListener('click', onMatrixBridgeUpdate);
|
||||
const mbLogsBtn = grid.querySelector('[data-mb-logs]');
|
||||
if (mbLogsBtn) mbLogsBtn.addEventListener('click', openMatrixBridgeLogs);
|
||||
for (const btn of grid.querySelectorAll('[data-dh-run]')) {
|
||||
btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn));
|
||||
}
|
||||
@@ -689,6 +751,118 @@ async function onServiceAction(key) {
|
||||
}
|
||||
}
|
||||
|
||||
// ===================== matrix-bridge bot (update + logs) =====================

const mbState = { job_id: null, eventsource: null, timer: null, started_at: null };

function mbTimerStart(at) {
  mbState.started_at = at;
  if (mbState.timer) clearInterval(mbState.timer);
  const tick = () => {
    if (!mbState.started_at) return;
    const sec = Math.max(0, Math.floor((Date.now() - mbState.started_at) / 1000));
    el('#mb-update-elapsed').textContent = `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
  };
  tick();
  mbState.timer = setInterval(tick, 500);
}

async function onMatrixBridgeUpdate() {
  if (state.mb_update_in_flight) return;
  if (!confirm('Update the matrix-bridge bot?\n\nThis pulls the latest code, rebuilds the container image, and recreates the container. The first build after a base-image change can take several minutes. The bot is briefly offline while it restarts.')) return;
  state.mb_update_in_flight = true;
  renderServices();
  try {
    const r = await fetchJSON('/api/matrix-bridge/update', { method: 'POST' });
    attachMbUpdateProgress(r.job_id);
  } catch (e) {
    state.mb_update_in_flight = false;
    renderServices();
    alert('Update failed to start: ' + e.message);
  }
}

async function attachMbUpdateProgress(jobId) {
  mbState.job_id = jobId;
  el('#mb-update-log').textContent = '';
  el('#mb-update-title').textContent = 'Updating matrix-bridge…';
  el('#mb-update-phase').textContent = 'Starting…';
  el('#mb-update-dialog').showModal();
  try {
    const snap = await fetchJSON(`/api/matrix-bridge/update/${jobId}`);
    mbTimerStart(Date.parse(snap.started_at));
    el('#mb-update-phase').textContent = snap.phase || 'Working…';
    el('#mb-update-log').textContent = (snap.lines || []).join('\n');
    if (snap.returncode !== null) { onMbUpdateDone(snap); return; }
  } catch { mbTimerStart(Date.now()); }
  const es = new EventSource(`/api/matrix-bridge/update/${jobId}/stream`);
  mbState.eventsource = es;
  es.onmessage = ev => {
    try {
      const d = JSON.parse(ev.data);
      if (d.line !== undefined) {
        const log = el('#mb-update-log');
        log.textContent += d.line + '\n';
        log.scrollTop = log.scrollHeight;
      }
    } catch {}
  };
  es.addEventListener('phase', ev => {
    try { el('#mb-update-phase').textContent = JSON.parse(ev.data).phase; } catch {}
  });
  es.addEventListener('done', ev => {
    let d = {}; try { d = JSON.parse(ev.data); } catch {}
    onMbUpdateDone(d);
  });
  es.onerror = () => {
    // Don't leave the Update button wedged-disabled on a dropped stream. The
    // job keeps running server-side; re-clicking Update returns a clean 409.
    es.close();
    mbState.eventsource = null;
    state.mb_update_in_flight = false;
    el('#mb-update-phase').textContent = 'Lost connection to the update stream — reopen or check logs.';
    renderServices();
  };
}

function onMbUpdateDone(d) {
  if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
  if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
  state.mb_update_in_flight = false;
  if (d.state === 'failed') {
    el('#mb-update-title').textContent = `Update failed (rc=${d.returncode})`;
    el('#mb-update-phase').textContent = 'Failed — see the log above.';
  } else {
    el('#mb-update-title').textContent = 'Update complete';
    el('#mb-update-phase').textContent = 'Done ✓';
  }
  // Refresh the tile's badge.
  (async () => { try { state.services = await fetchJSON('/api/services'); } catch {} renderServices(); })();
}

async function openMatrixBridgeLogs() {
  const pre = el('#mb-logs-pre');
  el('#mb-logs-title').textContent = 'matrix-bridge logs';
  pre.textContent = 'Loading…';
  el('#mb-logs-dialog').showModal();
  await loadMatrixBridgeLogs();
}

async function loadMatrixBridgeLogs() {
  const pre = el('#mb-logs-pre');
  const btn = el('#mb-logs-refresh');
  if (btn) btn.disabled = true;
  try {
    const r = await fetchJSON('/api/matrix-bridge/logs?tail=100');
    pre.textContent = r.output || '(no output)';
    pre.scrollTop = pre.scrollHeight;
  } catch (e) {
    pre.textContent = 'Could not read logs: ' + e.message;
  } finally {
    if (btn) btn.disabled = false;
  }
}
|
||||
function renderEndpoint(status) {
|
||||
const v = status.vllm || {};
|
||||
const panel = el('#endpoint-panel');
|
||||
@@ -1506,6 +1680,60 @@ function setupAdvancedDialog() {
|
||||
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
|
||||
}
|
||||
|
||||
function openLocalModelDialog() {
  const dlg = el('#local-model-dialog');
  el('#lm-key').value = '';
  el('#lm-name').value = '';
  el('#lm-path').value = '';
  el('#lm-chat').value = '';
  el('#lm-size').value = '';
  el('#lm-mode').value = 'solo';
  el('#lm-desc').value = '';
  el('#lm-mml').value = 32768;
  el('#lm-gmu').value = 0.85;
  el('#lm-gmu-out').value = '0.85';
  el('#lm-fst').checked = true;
  el('#lm-pcache').checked = true;
  el('#lm-fp8').checked = true;
  dlg.showModal();
}

function setupLocalModelDialog() {
  el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
  el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
  el('#local-model-form').addEventListener('submit', async (e) => {
    e.preventDefault();
    const chat = el('#lm-chat').value.trim();
    const body = {
      key: el('#lm-key').value.trim(),
      display_name: el('#lm-name').value.trim(),
      local_path: el('#lm-path').value.trim(),
      size_gb: parseFloat(el('#lm-size').value) || 0,
      mode: el('#lm-mode').value,
      description: el('#lm-desc').value.trim() || null,
      // A fine-tune's chat template (if any) rides along as a launch flag.
      vllm_args: chat ? [`--chat-template=${chat}`] : [],
      knobs: {
        max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
        gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
        fastsafetensors: el('#lm-fst').checked,
        prefix_caching: el('#lm-pcache').checked,
        kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
      },
    };
    try {
      await fetchJSON('/api/models', {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(body),
      });
      el('#local-model-dialog').close();
      await loadModels();
      pollStatus();
    } catch (e) { alert('Add local model failed: ' + e.message); }
  });
}
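For scripting or testing against the same endpoint, the request body assembled by the form can be restated in Python. This is a sketch under stated assumptions: the field names are copied from the JavaScript above, but `build_local_model_body` is a hypothetical helper, and nothing here confirms which fields the server treats as required.

```python
import json


def build_local_model_body(
    key: str,
    display_name: str,
    local_path: str,
    chat_template: str = "",
    size_gb: float = 0.0,
    mode: str = "solo",
    description: str = "",
    max_model_len: int = 32768,
    gpu_memory_utilization: float = 0.85,
    fastsafetensors: bool = True,
    prefix_caching: bool = True,
    fp8_kv_cache: bool = True,
) -> dict:
    """Build the JSON body the dialog posts to /api/models (field names from the diff)."""
    chat = chat_template.strip()
    return {
        "key": key.strip(),
        "display_name": display_name.strip(),
        "local_path": local_path.strip(),
        "size_gb": size_gb,
        "mode": mode,
        # An empty description is sent as null, as in the form handler.
        "description": description.strip() or None,
        # The chat template, when given, rides along as a launch flag.
        "vllm_args": [f"--chat-template={chat}"] if chat else [],
        "knobs": {
            "max_model_len": max_model_len,
            "gpu_memory_utilization": gpu_memory_utilization,
            "fastsafetensors": fastsafetensors,
            "prefix_caching": prefix_caching,
            "kv_cache_dtype": "fp8" if fp8_kv_cache else "auto",
        },
    }


body = build_local_model_body(
    key="my-finetune",
    display_name="My fine-tune",
    local_path="/home/you/models/my-finetune",
    chat_template="/home/you/models/my-finetune/chat_template.jinja",
)
payload = json.dumps(body)
```

The resulting `payload` string is what would be sent with a `content-type: application/json` header.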
|
||||
// ===================== NIM installer =====================
|
||||
|
||||
const nimState = {
|
||||
@@ -1847,15 +2075,32 @@ async function init() {
|
||||
el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
|
||||
el('#nim-form').addEventListener('submit', submitNim);
|
||||
el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
|
||||
el('#mb-update-close').addEventListener('click', () => el('#mb-update-dialog').close());
|
||||
// Dismissing the modal (Close or Esc) stops streaming; the job runs on
|
||||
// server-side and re-clicking Update returns a 409 if still in progress.
|
||||
el('#mb-update-dialog').addEventListener('close', () => {
|
||||
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
|
||||
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
|
||||
state.mb_update_in_flight = false;
|
||||
renderServices();
|
||||
});
|
||||
el('#mb-logs-close').addEventListener('click', () => el('#mb-logs-dialog').close());
|
||||
el('#mb-logs-refresh').addEventListener('click', loadMatrixBridgeLogs);
|
||||
el('#open-connectivity').addEventListener('click', openConnectivityDialog);
|
||||
el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
|
||||
// Wake-on-LAN buttons live on unreachable hardware cards; delegate.
|
||||
// Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on
|
||||
// reachable ones) are rendered dynamically, so delegate from the grid.
|
||||
el('#hardware-grid').addEventListener('click', (e) => {
|
||||
const btn = e.target.closest('[data-wake]');
|
||||
if (btn) wakeSpark(btn.dataset.wake);
|
||||
const wbtn = e.target.closest('[data-wake]');
|
||||
if (wbtn) { wakeSpark(wbtn.dataset.wake); return; }
|
||||
const kbtn = e.target.closest('[data-ssh-key]');
|
||||
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
|
||||
});
|
||||
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
|
||||
el('#open-local').addEventListener('click', openLocalModelDialog);
|
||||
setupCatalogDialog();
|
||||
setupAdvancedDialog();
|
||||
setupLocalModelDialog();
|
||||
// Open WebUI link from /api/config
|
||||
try {
|
||||
state.config = await fetchJSON('/api/config');
|
||||
|
||||
@@ -164,6 +164,37 @@
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="mb-update-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form">
|
||||
<h3 id="mb-update-title">Updating matrix-bridge…</h3>
|
||||
<div class="phase-row">
|
||||
<div class="phase" id="mb-update-phase">Starting…</div>
|
||||
<span class="spacer"></span>
|
||||
<span class="timer" id="mb-update-elapsed">0:00</span>
|
||||
</div>
|
||||
<details open>
|
||||
<summary class="muted small">Log</summary>
|
||||
<pre id="mb-update-log" class="log"></pre>
|
||||
</details>
|
||||
<div class="modal-actions">
|
||||
<button type="button" id="mb-update-close" class="btn">Close</button>
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="mb-logs-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form">
|
||||
<h3 id="mb-logs-title">matrix-bridge logs</h3>
|
||||
<p class="muted small">Last 100 lines from <code>docker logs</code> on the Spark.</p>
|
||||
<pre id="mb-logs-pre" class="log"></pre>
|
||||
<div class="modal-actions">
|
||||
<button type="button" id="mb-logs-refresh" class="btn">Refresh</button>
|
||||
<span class="spacer"></span>
|
||||
<button type="button" id="mb-logs-close" class="btn">Close</button>
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
</section>
|
||||
|
||||
<section id="speech-models-panel" class="speech-models hidden">
|
||||
@@ -198,6 +229,7 @@
|
||||
<div class="section-header">
|
||||
<h2 class="section-title">LLM swap</h2>
|
||||
<button id="open-download" class="btn small-btn">+ Download a new model</button>
|
||||
<button id="open-local" class="btn small-btn">+ Add local model</button>
|
||||
</div>
|
||||
|
||||
<dialog id="catalog-dialog" class="modal">
|
||||
@@ -230,6 +262,37 @@
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="local-model-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form" id="local-model-form">
|
||||
<h3>Add a local / fine-tuned model</h3>
|
||||
<p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
|
||||
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
|
||||
<label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
|
||||
<label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
|
||||
<label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
|
||||
<label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
|
||||
<label class="modal-row"><span>Mode</span>
|
||||
<select id="lm-mode">
|
||||
<option value="solo">solo (Spark 1 only)</option>
|
||||
<option value="cluster">cluster (both Sparks via Ray)</option>
|
||||
</select>
|
||||
</label>
|
||||
<label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
|
||||
<fieldset class="modal-fieldset">
|
||||
<legend>Default launch knobs</legend>
|
||||
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
|
||||
<label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
|
||||
</fieldset>
|
||||
<div class="modal-actions">
|
||||
<button type="button" id="lm-cancel" class="btn">Cancel</button>
|
||||
<button type="submit" class="btn primary">Add local model</button>
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="disk-delete-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form">
|
||||
<h3>Delete model weights from disk?</h3>
|
||||
@@ -244,6 +307,24 @@
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="sshkey-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form">
|
||||
<h3 id="sshkey-title">SSH public key</h3>
|
||||
<p id="sshkey-intro" class="muted small"></p>
|
||||
<div class="sshkey-row">
|
||||
<pre id="sshkey-value" class="snippet copyable" data-copy-self title="Click to copy"></pre>
|
||||
<button type="button" class="icon-btn" data-copy="#sshkey-value" title="Copy public key" aria-label="Copy public key">
|
||||
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
||||
</button>
|
||||
</div>
|
||||
<p class="muted small">To let this Spark log in to another machine (e.g. your Mac), run this in a terminal <em>on that machine</em>:</p>
|
||||
<pre id="sshkey-install" class="snippet copyable" data-copy-self title="Click to copy"></pre>
|
||||
<div class="modal-actions">
|
||||
<button type="button" id="sshkey-close" class="btn">Close</button>
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="advanced-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form" id="advanced-form">
|
||||
<h3 id="adv-title">Advanced settings</h3>
|
||||
|
||||
@@ -374,6 +374,12 @@ main {
|
||||
}
|
||||
.hw-card .head .name { font-weight: 600; font-size: 15px; }
|
||||
.hw-card .head .meta { color: var(--muted); font-size: 12px; margin-left: auto; }
|
||||
/* WireGuard "VPN <ip>" badge in the meta line — accent (green) = on a tunnel. */
|
||||
.hw-card .head .meta .wg-badge { color: var(--accent); font-weight: 600; cursor: help; }
|
||||
/* Copy-this-Spark's-ssh-key button pins to the top-right corner; meta keeps
|
||||
its margin-left:auto so name/meta/button read left→right→corner. */
|
||||
.hw-card .head .ssh-key-btn { align-self: flex-start; padding: 3px 6px; }
|
||||
.hw-card .head .ssh-key-btn svg { width: 13px; height: 13px; }
|
||||
.hw-card.unreachable { border-color: rgba(239, 68, 68, 0.4); }
|
||||
.hw-card.unreachable .name { color: var(--error); }
|
||||
.hw-card.unreachable ol { color: var(--muted); }
|
||||
@@ -387,6 +393,10 @@ main {
|
||||
}
|
||||
.hw-card .wol-row .btn { padding: 5px 10px; font-size: 12px; }
|
||||
.hw-card .mac-display { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; }
|
||||
/* SSH-key dialog: key line beside its copy button; long key wraps rather than scrolls. */
|
||||
.sshkey-row { display: flex; align-items: flex-start; gap: 8px; }
|
||||
.sshkey-row .snippet { flex: 1; margin: 0; white-space: pre-wrap; word-break: break-all; }
|
||||
#sshkey-install { white-space: pre-wrap; word-break: break-all; }
|
||||
|
||||
.connectivity-content {
|
||||
max-height: 360px;
|
||||
@@ -516,10 +526,12 @@ main {
|
||||
#dl-log-details { margin-top: 12px; }
|
||||
#dl-log-details summary { cursor: pointer; padding: 4px 0; }
|
||||
|
||||
/* ===== NIM install dialog ===== */
|
||||
/* ===== NIM install + matrix-bridge dialogs ===== */
|
||||
|
||||
.modal#nim-dialog,
|
||||
.modal#nim-progress-dialog { max-width: 640px; }
|
||||
.modal#nim-progress-dialog,
|
||||
.modal#mb-update-dialog,
|
||||
.modal#mb-logs-dialog { max-width: 640px; }
|
||||
.nim-grid {
|
||||
display: grid;
|
||||
gap: 8px;
|
||||
@@ -682,6 +694,7 @@ main {
|
||||
.card .repo a { color: inherit; text-decoration: none; }
|
||||
.card .repo a:hover { color: var(--info); text-decoration: underline; }
|
||||
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
|
||||
.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
|
||||
.tag {
|
||||
background: var(--surface-2);
|
||||
border: 1px solid var(--border);
|
||||
@@ -726,6 +739,7 @@ main {
|
||||
.card .adv-btn,
|
||||
.card .test-btn { padding: 8px 12px; font-size: 12px; }
|
||||
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
||||
.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
|
||||
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
||||
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
|
||||
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
|
||||
|
||||
@@ -12,6 +12,12 @@ dependencies = [
|
||||
"python-multipart>=0.0.9",
|
||||
]
|
||||
|
||||
[project.optional-dependencies]
|
||||
dev = ["pytest>=8"]
|
||||
|
||||
[tool.pytest.ini_options]
|
||||
testpaths = ["tests"]
|
||||
|
||||
[build-system]
|
||||
requires = ["setuptools>=68"]
|
||||
build-backend = "setuptools.build_meta"
|
||||
|
||||
@@ -0,0 +1,17 @@
|
||||
"""Shared pytest setup.
|
||||
|
||||
These suites are pure/offline — they exercise pure functions and never touch the
|
||||
Sparks, /data, or the network. We still pin the env vars the app modules expect
|
||||
(documented in docs/guides/fastapi-image.md) to tmp paths so importing them can
|
||||
never write to the container-only /data path.
|
||||
"""
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Let `import app...` resolve whether or not the package is pip-installed.
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
|
||||
os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
|
||||
os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
|
||||
@@ -0,0 +1,69 @@
|
||||
"""_merge_words_with_speakers + _assign_speaker_to_word: the transcript/diarizer
|
||||
merge that turns Parakeet words + Sortformer turns into speaker-labelled blocks.
|
||||
Pure functions, no cluster — this is the core of transcribe-with-speakers.
|
||||
"""
|
||||
from app.audio_proxy import _assign_speaker_to_word, _merge_words_with_speakers
|
||||
|
||||
|
||||
def _w(start, end, text):
|
||||
return {"start": start, "end": end, "text": text}
|
||||
|
||||
|
||||
def _t(start, end, speaker):
|
||||
return {"start_s": start, "end_s": end, "speaker": speaker}
|
||||
|
||||
|
||||
# ---- _assign_speaker_to_word ----
|
||||
|
||||
def test_assign_by_midpoint_containment():
|
||||
turns = [_t(0.0, 2.0, "Speaker_0"), _t(2.0, 4.0, "Speaker_1")]
|
||||
assert _assign_speaker_to_word(2.4, 2.8, turns) == "Speaker_1"
|
||||
|
||||
|
||||
def test_assign_falls_back_to_max_overlap_when_midpoint_outside():
|
||||
# midpoint 5.0 is in no turn; word span overlaps Speaker_0 more than Speaker_1.
|
||||
turns = [_t(0.0, 4.9, "Speaker_0"), _t(6.0, 8.0, "Speaker_1")]
|
||||
assert _assign_speaker_to_word(4.0, 6.0, turns) == "Speaker_0"
|
||||
|
||||
|
||||
def test_assign_unknown_when_no_overlap():
|
||||
turns = [_t(0.0, 1.0, "Speaker_0")]
|
||||
assert _assign_speaker_to_word(10.0, 11.0, turns) == "Speaker_unknown"
|
||||
|
||||
|
||||
# ---- _merge_words_with_speakers ----
|
||||
|
||||
def test_empty_words_returns_empty():
|
||||
assert _merge_words_with_speakers([], [_t(0, 1, "Speaker_0")]) == []
|
||||
|
||||
|
||||
def test_consecutive_same_speaker_words_join_into_one_block():
|
||||
words = [_w(0.0, 0.5, "good"), _w(0.5, 1.0, "morning")]
|
||||
turns = [_t(0.0, 2.0, "Speaker_0")]
|
||||
blocks = _merge_words_with_speakers(words, turns)
|
||||
assert blocks == [
|
||||
{"start_ms": 0, "end_ms": 1000, "speaker": "Speaker_0", "text": "good morning"}
|
||||
]
|
||||
|
||||
|
||||
def test_speaker_change_splits_blocks():
|
||||
words = [_w(0.0, 1.0, "hi"), _w(2.1, 3.0, "hello")]
|
||||
turns = [_t(0.0, 2.0, "Speaker_0"), _t(2.0, 4.0, "Speaker_1")]
|
||||
blocks = _merge_words_with_speakers(words, turns)
|
||||
assert [b["speaker"] for b in blocks] == ["Speaker_0", "Speaker_1"]
|
||||
assert [b["text"] for b in blocks] == ["hi", "hello"]
|
||||
|
||||
|
||||
def test_long_silence_breaks_block_for_same_speaker():
|
||||
# >1.5s gap between two words of the same speaker forces a new block.
|
||||
words = [_w(0.0, 0.5, "one"), _w(3.0, 3.5, "two")]
|
||||
turns = [_t(0.0, 4.0, "Speaker_0")]
|
||||
blocks = _merge_words_with_speakers(words, turns)
|
||||
assert len(blocks) == 2
|
||||
assert [b["text"] for b in blocks] == ["one", "two"]
|
||||
|
||||
|
||||
def test_punctuation_token_joins_without_leading_space():
|
||||
words = [_w(0.0, 0.5, "hello"), _w(0.5, 0.7, ".")]
|
||||
turns = [_t(0.0, 2.0, "Speaker_0")]
|
||||
assert _merge_words_with_speakers(words, turns)[0]["text"] == "hello."
|
||||
@@ -0,0 +1,148 @@
|
||||
"""build_launch_command: argument assembly + the shell-injection invariant.
|
||||
|
||||
The security-critical property is that every user-controllable value (repo,
|
||||
vllm_args, knobs) is shlex-quoted at the sink, so `shlex.split` cleanly reverses
|
||||
the command back into the exact token list. The vLLM pre-flight validator
|
||||
(validate.py) depends on this round-trip — these tests lock it in.
|
||||
"""
|
||||
import shlex
|
||||
|
||||
import pytest
|
||||
from pydantic import ValidationError
|
||||
|
||||
from app.models import Defaults, ModelDef, build_launch_command
|
||||
|
||||
DEFAULTS = Defaults(port=8888, host="0.0.0.0")
|
||||
|
||||
|
||||
def _model(**kw) -> ModelDef:
|
||||
base = dict(display_name="X", repo="org/name", size_gb=1.0, mode="solo")
|
||||
base.update(kw)
|
||||
return ModelDef(**base)
|
||||
|
||||
|
||||
def test_solo_model_emits_solo_flag_and_ordered_args():
|
||||
cmd = build_launch_command("k", _model(vllm_args=["--max-model-len=1000"]), DEFAULTS)
|
||||
assert cmd == (
|
||||
"./launch-cluster.sh --solo -d exec vllm serve org/name "
|
||||
"--port=8888 --host=0.0.0.0 --max-model-len=1000"
|
||||
)
|
||||
|
||||
|
||||
def test_cluster_model_omits_solo_flag():
|
||||
cmd = build_launch_command("k", _model(mode="cluster", vllm_args=["-tp=2"]), DEFAULTS)
|
||||
assert " --solo " not in cmd
|
||||
assert cmd.startswith("./launch-cluster.sh -d exec vllm serve org/name")
|
||||
|
||||
|
||||
def test_knob_overrides_matching_bundled_flag():
|
||||
# bundled arg sets max-model-len; the knob must win (single occurrence).
|
||||
m = _model(vllm_args=["--max-model-len=1000"], knobs={"max_model_len": 65536})
|
||||
cmd = build_launch_command("k", m, DEFAULTS)
|
||||
assert "--max-model-len=65536" in cmd
|
||||
assert "--max-model-len=1000" not in cmd
|
||||
|
||||
|
||||
def test_repo_with_shell_metacharacters_is_quoted_not_executed():
|
||||
# build_launch_command quotes even a hostile repo (validate_repo guards the
|
||||
# API boundary; this proves the sink itself is safe in depth).
|
||||
evil = "org/name; rm -rf ~ #"
|
||||
cmd = build_launch_command("k", _model(repo=evil), DEFAULTS)
|
||||
# The raw metacharacters must not appear unquoted...
|
||||
assert "; rm -rf" not in cmd.replace(shlex.quote(evil), "")
|
||||
# ...and shlex.split must recover the repo as one literal token.
|
||||
tokens = shlex.split(cmd)
|
||||
assert evil in tokens
|
||||
|
||||
|
||||
def test_command_string_round_trips_through_shlex_split():
|
||||
# The invariant validate.py relies on: every arg survives quote -> split intact.
|
||||
args = ["--max-model-len=32768", "--load-format=fastsafetensors", "--note=a b c"]
|
||||
cmd = build_launch_command("k", _model(vllm_args=args), DEFAULTS)
|
||||
tokens = shlex.split(cmd)
|
||||
for a in args:
|
||||
assert a in tokens
|
||||
|
||||
|
||||
def test_injection_via_vllm_arg_stays_literal():
|
||||
payload = "--foo=$(touch /tmp/pwned)"
|
||||
cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
|
||||
assert payload in shlex.split(cmd) # preserved as one inert token
|
||||
|
||||
|
||||
# ---- local / fine-tuned models (served by directory, not HF repo) ----
|
||||
|
||||
def test_local_model_bind_mounts_dir_and_serves_the_path():
|
||||
m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
|
||||
cmd = build_launch_command("k", m, DEFAULTS)
|
||||
tokens = shlex.split(cmd)
|
||||
# The launch script's hook bind-mounts the host dir at the SAME container path.
|
||||
assert tokens[0] == (
|
||||
"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
|
||||
)
|
||||
# vLLM is pointed at the directory, not an HF repo id.
|
||||
i = tokens.index("serve")
|
||||
assert tokens[i + 1] == "/home/u/models/ft-v2"
|
||||
assert "--max-model-len=2048" in tokens
|
||||
|
||||
|
||||
def test_local_model_chat_template_arg_survives_round_trip():
|
||||
m = _model(
|
||||
repo="",
|
||||
local_path="/m/ft",
|
||||
vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
|
||||
)
|
||||
cmd = build_launch_command("k", m, DEFAULTS)
|
||||
assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)
|
||||
|
||||
|
||||
def test_local_path_with_metacharacters_is_quoted_not_executed():
|
||||
# The validator rejects a hostile path at the boundary; bypass it with
|
||||
# model_construct to prove the quote_arg sink is safe in depth even if a bad
|
||||
# value somehow reaches build_launch_command.
|
||||
evil = "/m/ft; rm -rf ~"
|
||||
m = ModelDef.model_construct(
|
||||
display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
|
||||
vllm_args=[], knobs=None, custom=False, capabilities=[],
|
||||
expected_ready_seconds=300, description=None,
|
||||
)
|
||||
cmd = build_launch_command("k", m, DEFAULTS)
|
||||
tokens = shlex.split(cmd)
|
||||
i = tokens.index("serve")
|
||||
assert tokens[i + 1] == evil # recovered as one literal token, not executed
|
||||
assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"
|
||||
|
||||
|
||||
def test_model_requires_exactly_one_source():
|
||||
with pytest.raises(ValidationError):
|
||||
ModelDef(display_name="x", size_gb=1, mode="solo") # neither repo nor local_path
|
||||
with pytest.raises(ValidationError):
|
||||
ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo") # both
|
||||
|
||||
|
||||
def test_local_model_rejects_chat_template_outside_dir():
|
||||
# Only local_path is mounted into the container, so a chat-template elsewhere
|
||||
# would silently 404 inside vLLM — reject it up front.
|
||||
with pytest.raises(ValidationError):
|
||||
ModelDef(
|
||||
display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
|
||||
vllm_args=["--chat-template=/other/dir/t.jinja"],
|
||||
)
|
||||
|
||||
|
||||
def test_invalid_local_path_rejected_by_model():
|
||||
with pytest.raises(ValidationError):
|
||||
ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")
|
||||
|
||||
|
||||
def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
|
||||
# YAML/override-added local models get the same validation as the API; a single
|
||||
# bad entry is skipped (logged) rather than breaking the whole catalog load.
|
||||
from app import models as M
|
||||
monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
|
||||
{"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
|
||||
{"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
|
||||
]})
|
||||
cat = M._merge_overrides(M.Catalog(models={}))
|
||||
assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
|
||||
assert "bad" not in cat.models # traversal path skipped, not catalog-fatal
|
||||
@@ -0,0 +1,47 @@
|
||||
"""build_update_command: the matrix-bridge update one-liner.
|
||||
|
||||
Pure string assembly, no cluster. Locks in the contract from
|
||||
docs/spark-control-integration.md (matrix-bridge repo): fetch, hard-reset to the
|
||||
release branch, then rebuild/recreate via docker compose — chained with `&&` so
|
||||
any failure (e.g. Gitea unreachable) aborts before the build and surfaces a
|
||||
non-zero exit. The clone dir must stay unquoted so a `~` expands server-side.
|
||||
"""
|
||||
from app.matrix_bridge import build_update_command, _phase_for
|
||||
|
||||
|
||||
def test_command_is_the_contract_chain():
|
||||
cmd = build_update_command("~/matrix-bridge", "master")
|
||||
assert cmd == (
|
||||
"cd ~/matrix-bridge && "
|
||||
"git fetch origin && "
|
||||
"git reset --hard origin/master && "
|
||||
"docker compose up -d --build"
|
||||
)
|
||||
|
||||
|
||||
def test_fail_loud_chaining():
|
||||
# Every step is &&-chained: a failed fetch never reaches the build.
|
||||
cmd = build_update_command("~/matrix-bridge", "master")
|
||||
assert "; " not in cmd
|
||||
assert cmd.count(" && ") == 3
|
||||
assert cmd.index("git fetch") < cmd.index("git reset") < cmd.index("docker compose")
|
||||
|
||||
|
||||
def test_tilde_dir_left_unquoted_for_server_side_expansion():
|
||||
cmd = build_update_command("~/matrix-bridge", "master")
|
||||
assert "cd ~/matrix-bridge &&" in cmd
|
||||
assert "'~" not in cmd # quoting would defeat the home-dir expansion
|
||||
|
||||
|
||||
def test_absolute_dir_and_custom_branch():
|
||||
cmd = build_update_command("/home/modelo/matrix-bridge", "phase-1")
|
||||
assert cmd.startswith("cd /home/modelo/matrix-bridge && ")
|
||||
assert "git reset --hard origin/phase-1 &&" in cmd
|
||||
|
||||
|
||||
def test_phase_detection_maps_known_lines():
|
||||
assert _phase_for("HEAD is now at 1a2b3c4 some commit") == "Resetting to the latest release…"
|
||||
assert _phase_for("#5 building image") == "Building the bot image…"
|
||||
assert _phase_for("Container matrix-bridge Recreate") == "Recreating the container…"
|
||||
assert _phase_for("Already up to date.") == "No new code; rebuilding…"
|
||||
assert _phase_for("some unremarkable line") is None
|
||||
@@ -0,0 +1,127 @@
|
||||
"""shellsafe validators: the API-boundary whitelist behind the v0.19.0 SSH
|
||||
command-injection hardening. The quoting *sink* is covered in
|
||||
test_launch_command.py; this locks in the *boundary* — that hostile input is
|
||||
rejected early, and that a valid value passes through unchanged so callers can
|
||||
use `validate_x(v)` inline.
|
||||
"""
|
||||
import pytest
|
||||
|
||||
from app.shellsafe import (
|
||||
validate_container,
|
||||
validate_image,
|
||||
validate_local_path,
|
||||
validate_repo,
|
||||
)
|
||||
|
||||
# Shell metacharacters that must never survive any validator — these are the
|
||||
# actual injection vectors. (Path traversal like "../" is NOT in scope here:
|
||||
# validate_image legitimately permits "/" and "." for real image refs such as
|
||||
# nvcr.io/nim/...; the defense for images is "no shell metacharacters" + the
|
||||
# quote_arg sink, not path-shape. Slash-rejection is tested directly for repo
|
||||
# and container, where "/" is disallowed.)
|
||||
HOSTILE = [
|
||||
"; rm -rf /",
|
||||
" a b",
|
||||
"$(touch pwned)",
|
||||
"`id`",
|
||||
"x|cat",
|
||||
"x&y",
|
||||
"x>out",
|
||||
"x\nrm",
|
||||
]
|
||||
|
||||
|
||||
# ---- validate_repo: HF 'org/name', exactly one slash ----
|
||||
|
||||
@pytest.mark.parametrize("repo", [
|
||||
"RedHatAI/Qwen3.6-35B-A3B-NVFP4", # the live production model
|
||||
"org/name",
|
||||
"a.b_c-d/x.y_z-1",
|
||||
])
|
||||
def test_repo_valid_passes_through_unchanged(repo):
|
||||
assert validate_repo(repo) == repo
|
||||
|
||||
|
||||
@pytest.mark.parametrize("repo", [
|
||||
"",
|
||||
"noslash",
|
||||
"a/b/c", # two slashes
|
||||
"/name", # empty org
|
||||
"org/", # empty name
|
||||
] + [f"org/name{h}" for h in HOSTILE])
|
||||
def test_repo_rejects_malformed_and_hostile(repo):
|
||||
with pytest.raises(ValueError):
|
||||
validate_repo(repo)
|
||||
|
||||
|
||||
# ---- validate_image: registry/path:tag@digest ----
|
||||
|
||||
@pytest.mark.parametrize("image", [
|
||||
"nvcr.io/nim/nvidia/parakeet-1_1b-ctc-en-us:latest",
|
||||
"ubuntu",
|
||||
"img@sha256:deadbeefcafe",
|
||||
"a.b/c:1.2_3-4",
|
||||
])
|
||||
def test_image_valid_passes_through_unchanged(image):
|
||||
assert validate_image(image) == image
|
||||
|
||||
|
||||
@pytest.mark.parametrize("image", [
|
||||
"",
|
||||
"-leading", # must start alphanumeric
|
||||
".leading",
|
||||
"/leading",
|
||||
":leading",
|
||||
"a" * 513, # over the 512 cap
|
||||
] + [f"img{h}" for h in HOSTILE])
|
||||
def test_image_rejects_malformed_and_hostile(image):
|
||||
with pytest.raises(ValueError):
|
||||
validate_image(image)
|
||||
|
||||
|
||||
# ---- validate_container: Docker name rule, no slash ----
|
||||
|
||||
@pytest.mark.parametrize("name", [
|
||||
"parakeet-asr",
|
||||
"a",
|
||||
"vol_1.2-3",
|
||||
])
|
||||
def test_container_valid_passes_through_unchanged(name):
|
||||
assert validate_container(name) == name
|
||||
|
||||
|
||||
@pytest.mark.parametrize("name", [
|
||||
"",
|
||||
"_leading", # underscore is not a valid first char
|
||||
"-leading",
|
||||
".leading",
|
||||
"has/slash", # slash not allowed in a container name
|
||||
"a" * 129, # over the 128 cap
|
||||
] + [f"name{h}" for h in HOSTILE])
|
||||
def test_container_rejects_malformed_and_hostile(name):
|
||||
with pytest.raises(ValueError):
|
||||
validate_container(name)
|
||||
|
||||
|
||||
# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----
|
||||
|
||||
@pytest.mark.parametrize("path", [
|
||||
"/home/modelo/models/gemma-4-31B-ten31-v2",
|
||||
"/data/models/ft.v2_1",
|
||||
"/srv/m/a-b/c",
|
||||
])
|
||||
def test_local_path_valid_passes_through_unchanged(path):
|
||||
assert validate_local_path(path) == path
|
||||
|
||||
|
||||
@pytest.mark.parametrize("path", [
|
||||
"",
|
||||
"relative/path", # must be absolute
|
||||
"~/models/x", # no ~ expansion
|
||||
"/models/../etc/shadow", # '..' traversal
|
||||
"/models/./x", # '.' segment
|
||||
"/a" * 300, # over the 512 cap (600 chars)
|
||||
] + [f"/models/x{h}" for h in HOSTILE])
|
||||
def test_local_path_rejects_relative_traversal_and_hostile(path):
|
||||
with pytest.raises(ValueError):
|
||||
validate_local_path(path)
|
||||
@@ -1,3 +1,14 @@
|
||||
ARCHES := x86
|
||||
# overrides to s9pk.mk must precede the include statement
|
||||
include s9pk.mk
|
||||
|
||||
# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only
|
||||
# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN;
|
||||
# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh.
|
||||
RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts)
|
||||
|
||||
.PHONY: release
|
||||
release:
|
||||
@test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; }
|
||||
GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \
|
||||
../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk"
|
||||
|
||||
@@ -40,6 +40,15 @@ const inputSpec = InputSpec.of({
|
||||
placeholder: 'your SSH username',
|
||||
masked: false,
|
||||
}),
|
||||
vllm_port: Value.text({
|
||||
name: 'vLLM port (optional)',
|
||||
description:
|
||||
"The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
|
||||
required: false,
|
||||
default: null,
|
||||
placeholder: 'leave blank for 8888',
|
||||
masked: false,
|
||||
}),
|
||||
parakeet_host: Value.text({
|
||||
name: 'Parakeet host (optional)',
|
||||
description:
|
||||
@@ -119,6 +128,15 @@ const inputSpec = InputSpec.of({
|
||||
placeholder: 'e.g. crm_chunks',
|
||||
masked: false,
|
||||
}),
|
||||
matrix_bridge_user: Value.text({
|
||||
name: 'matrix-bridge bot SSH user (optional)',
|
||||
description:
|
||||
"If you run the matrix-bridge Matrix bot on Spark 2, enter the SSH user that owns its ~/matrix-bridge folder (e.g. 'modelo'). Spark Control then shows a tile to update, restart, and view logs for the bot. Leave blank if you don't run the bot — the tile stays hidden. Note: this package's SSH public key must be authorized for that user (Show Public Key action) unless it's the same as your Spark 2 user.",
|
||||
required: false,
|
||||
default: null,
|
||||
placeholder: 'e.g. modelo',
|
||||
masked: false,
|
||||
}),
|
||||
open_webui_url: Value.text({
|
||||
name: 'Open WebUI URL (optional)',
|
||||
description:
|
||||
|
||||
@@ -7,6 +7,8 @@ export const sparkConfigSchema = z.object({
|
||||
spark1_user: z.string().catch(''),
|
||||
spark2_host: z.string().catch(''),
|
||||
spark2_user: z.string().catch(''),
|
||||
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
|
||||
vllm_port: z.string().catch(''),
|
||||
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
|
||||
parakeet_host: z.string().catch(''),
|
||||
parakeet_user: z.string().catch(''),
|
||||
@@ -22,6 +24,8 @@ export const sparkConfigSchema = z.object({
|
||||
qdrant_user: z.string().catch(''),
|
||||
qdrant_container: z.string().catch(''),
|
||||
qdrant_collection: z.string().catch(''),
|
||||
// Optional matrix-bridge bot. Blank => no tile. Host reuses Spark 2.
|
||||
matrix_bridge_user: z.string().catch(''),
|
||||
// Optional Open WebUI deep-link
|
||||
open_webui_url: z.string().catch(''),
|
||||
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
|
||||
|
||||
@@ -13,6 +13,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
||||
spark1_user: '',
|
||||
spark2_host: '',
|
||||
spark2_user: '',
|
||||
vllm_port: '',
|
||||
parakeet_host: '',
|
||||
parakeet_user: '',
|
||||
parakeet_container: '',
|
||||
@@ -26,6 +27,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
||||
qdrant_user: '',
|
||||
qdrant_container: '',
|
||||
qdrant_collection: '',
|
||||
matrix_bridge_user: '',
|
||||
open_webui_url: '',
|
||||
ngc_api_key: '',
|
||||
}
|
||||
@@ -49,6 +51,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
||||
SPARK1_USER: cfg.spark1_user,
|
||||
SPARK2_HOST: cfg.spark2_host,
|
||||
SPARK2_USER: cfg.spark2_user,
|
||||
VLLM_PORT: cfg.vllm_port,
|
||||
PARAKEET_HOST: cfg.parakeet_host,
|
||||
PARAKEET_USER: cfg.parakeet_user,
|
||||
PARAKEET_CONTAINER: cfg.parakeet_container,
|
||||
@@ -62,6 +65,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
||||
QDRANT_USER: cfg.qdrant_user,
|
||||
QDRANT_CONTAINER: cfg.qdrant_container,
|
||||
QDRANT_COLLECTION: cfg.qdrant_collection,
|
||||
MATRIX_BRIDGE_USER: cfg.matrix_bridge_user,
|
||||
MODELS_OVERRIDES: '/data/models-overrides.yaml',
|
||||
SERVICES_OVERRIDES: '/data/services-overrides.yaml',
|
||||
CONNECTIVITY_LOG: '/data/connectivity.json',
|
||||
|
||||
@@ -1,10 +1,10 @@
|
||||
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
||||
|
||||
export const v0_1_0 = VersionInfo.of({
|
||||
version: '0.19.0:0',
|
||||
version: '0.23.0:0',
|
||||
releaseNotes: {
|
||||
en_US:
|
||||
'v0.19.0:0 — security hardening of the cluster-control surface (no change to the proxy/data APIs your other apps use). (1) Every user-supplied value that reaches an SSH command on the Sparks — model repo, vLLM args/knobs, NIM image/container, service names — is now strictly validated and shell-quoted, closing a command-injection path. (2) The Qdrant collection name in /api/search is validated so it can no longer be used to reach other collections. (3) State-changing dashboard endpoints (model swap, NIM install, service start/stop, disk delete, etc.) now require a same-origin request, blocking cross-site (CSRF) attacks from a malicious page open in your browser. The OpenAI-compatible proxies (/v1/*), the redaction gateway (/scrub, /rehydrate), /api/search, /api/audio/*, and /api/health-event are exempt, so Recap Relay, the CRM, Open WebUI and other consumers are unaffected.',
|
||||
"v0.23.0:0 — local / fine-tuned model support. You can now add a model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune), not just a Hugging Face repo. Use the new \"+ Add local model\" button under LLM swap: give it the model's absolute path on the Spark, an optional chat-template path, and the usual launch knobs. On swap, Spark Control bind-mounts that directory into the vLLM container at the same path (via the launch script's existing VLLM_SPARK_EXTRA_DOCKER_ARGS hook — nothing to change on the Spark) and runs `vllm serve <dir>`. Local models show a \"local\" badge and their path instead of a Hugging Face link, and their weights are never offered for dashboard deletion (that directory is your own training output, not a re-downloadable cache). API: POST /api/models now accepts `local_path` (set exactly one of `repo` or `local_path`), validated against a strict path whitelist with no traversal.",
|
||||
},
|
||||
migrations: {
|
||||
up: async ({ effects }) => {},
|
||||
|
||||
@@ -34,6 +34,24 @@ These take effect on the **next swap to that model**. If a swap fails after this
|
||||
- Status auto-refreshes every 5 s.
|
||||
- A swap takes 3–6 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream.
|
||||
|
||||
## matrix-bridge bot tile (optional)
|
||||
|
||||
If you run the matrix-bridge bot container on Spark 2, set its SSH user in **Configure Sparks** (the user that owns `~/matrix-bridge`, e.g. `modelo`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status reflects Docker container state only (there is no HTTP health check), so a `running` badge means the container is up, not necessarily that the bot is connected.
|
||||
|
||||
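The **Update** button runs `cd ~/matrix-bridge && git fetch origin && git reset --hard origin/<branch> && docker compose up -d --build` as that SSH user; the steps are `&&`-chained, so a failure at any step (e.g. the git remote being unreachable) aborts before the build. For it to reach your git remote: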
|
||||
|
||||
1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
|
||||
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:
|
||||
|
||||
```
Host <your-git-host>
  Port <port>
  IdentityFile ~/.ssh/id_ed25519
  IdentitiesOnly yes
```
|
||||
|
||||
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
|
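As a rough illustration of the Update chain described above, the command can be assembled as a plain `&&`-joined string. This is a minimal sketch, not the package's actual code: the function name `build_update_command_sketch` is hypothetical, and it assumes the clone directory and branch name come from trusted configuration, since the directory is deliberately left unquoted so that a leading `~` expands on the Spark.

```python
def build_update_command_sketch(clone_dir: str, branch: str) -> str:
    """Hypothetical helper: join the update steps with && so the first failure aborts.

    Assumption: clone_dir and branch are trusted values. clone_dir is left
    unquoted on purpose so that a leading ~ expands in the remote shell.
    """
    steps = [
        f"cd {clone_dir}",
        "git fetch origin",
        f"git reset --hard origin/{branch}",
        "docker compose up -d --build",
    ]
    return " && ".join(steps)


print(build_update_command_sketch("~/matrix-bridge", "master"))
```

Because every step is joined with `&&` rather than `;`, a failed `git fetch` returns a non-zero exit status and the rebuild never starts.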
||||
|
||||
## Adding a new model
|
||||
|
||||
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
|
||||
@@ -42,6 +60,12 @@ These take effect on the **next swap to that model**. If a swap fails after this
|
||||
|
||||
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
|
||||
|
||||
### Local / fine-tuned models (v0.23.0+)
|
||||
|
||||
To serve a model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo, use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark. Only that directory is bind-mounted into the vLLM container, so a `--chat-template` file must live **inside** `local_path`.
|
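The path rules for local models can be sketched as follows. This is an illustrative approximation, not the package's validator: the names `check_local_path` and `template_is_inside` and the per-segment character whitelist are assumptions. The rejected cases (relative paths, `~`, `.` and `..` segments, shell metacharacters) and the 512-character cap mirror what the test suite in this change set exercises.

```python
import re

# Assumed whitelist: each path segment may contain letters, digits, '.', '_' and '-'.
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")
MAX_LEN = 512  # cap taken from the test suite in this change set


def check_local_path(path: str) -> str:
    """Return path unchanged if it looks like a safe absolute model directory."""
    if not path.startswith("/") or len(path) > MAX_LEN:
        raise ValueError(f"not an absolute path within the length cap: {path!r}")
    for segment in path.split("/")[1:]:
        # Reject traversal and '.' segments, empty segments, and any metacharacter.
        if segment in (".", "..") or not _SEGMENT.fullmatch(segment):
            raise ValueError(f"disallowed path segment {segment!r} in {path!r}")
    return path


def template_is_inside(local_path: str, template: str) -> bool:
    """True if the chat template sits under local_path, the only mounted directory."""
    check_local_path(local_path)
    check_local_path(template)
    return template.startswith(local_path + "/")


print(check_local_path("/home/modelo/models/gemma-4-31B-ten31-v2"))
print(template_is_inside("/m/ft", "/m/ft/chat_template.jinja"))
```

Checking the template with the same validator first means the prefix comparison cannot be bypassed with a `..` segment.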
||||
|
||||
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
|
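The mount prefix in the contract above can be illustrated with `shlex`. This is a hedged sketch rather than the real `build_launch_command`: the helper name `local_mount_prefix` and the exact shape of the rest of the command line are assumptions, chosen to match the token layout that the tests in this change set assert.

```python
import shlex


def local_mount_prefix(local_path: str) -> str:
    """Hypothetical helper: env-var prefix that bind-mounts local_path at the same path.

    The value is shell-quoted as a whole, so the remote shell sees one
    VAR=value assignment even though the value contains a space.
    """
    mount = f"-v {local_path}:{local_path}"
    return "VLLM_SPARK_EXTRA_DOCKER_ARGS=" + shlex.quote(mount)


path = "/home/u/models/ft-v2"
# Assumed command shape; only the prefix is the point of this sketch.
cmd = f"{local_mount_prefix(path)} ./launch-cluster.sh --solo -d exec vllm serve {shlex.quote(path)}"
tokens = shlex.split(cmd)

print(cmd)
print(tokens[0])
```

Splitting the command back with `shlex.split` recovers the assignment as a single token, which is the round-trip property the launch-command tests rely on. The launch script then has to expand the variable unquoted for Docker to see `-v` and the path as two separate arguments.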
||||
|
||||
## Manual swap fallback
|
||||
|
||||
If the UI is unavailable and you need to swap by hand:
|
||||
|
||||
Executable
@@ -0,0 +1,65 @@
|
||||
#!/usr/bin/env bash
|
||||
# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the
|
||||
# latest package with a read-only token instead of being hand-sent the file.
|
||||
#
|
||||
# GITEA_URL=https://gitea.example:3000 GITEA_TOKEN=<write-token> \
|
||||
# scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk
|
||||
#
|
||||
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
|
||||
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
|
||||
# reuses an existing release for the tag and replaces a same-named asset.
|
||||
# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
|
||||
set -euo pipefail
|
||||
|
||||
VERSION="${1:-}"; S9PK="${2:-}"
|
||||
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
|
||||
echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
|
||||
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
|
||||
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
|
||||
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
|
||||
|
||||
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
|
||||
ASSET="$(basename "$S9PK")"
|
||||
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
|
||||
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
|
||||
CURL=(curl -sS) # no -f: we inspect HTTP codes ourselves
|
||||
[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)
|
||||
|
||||
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
|
||||
|
||||
# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
|
||||
api() {
|
||||
local method="$1" url="$2"; shift 2
|
||||
local out
|
||||
out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
|
||||
-w $'\n%{http_code}' "$url")"
|
||||
HTTP_CODE="${out##*$'\n'}"
|
||||
BODY="${out%$'\n'*}"
|
||||
}
|
||||
|
||||
# Reuse an existing release for this tag, otherwise create one.
|
||||
api GET "$API/releases/tags/$TAG"
|
||||
if [ "$HTTP_CODE" = 200 ]; then
|
||||
id="$(printf '%s' "$BODY" | jq -r '.id')"
|
||||
elif [ "$HTTP_CODE" = 404 ]; then
|
||||
api POST "$API/releases" -H 'Content-Type: application/json' \
|
||||
--data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
|
||||
'{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
|
||||
[ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
|
||||
id="$(printf '%s' "$BODY" | jq -r '.id')"
|
||||
else
|
||||
echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
|
||||
exit 1
|
||||
fi
|
||||
[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }
|
||||
|
||||
# Replace a same-named asset so re-runs don't 409.
|
||||
api GET "$API/releases/$id/assets"
|
||||
old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
|
||||
[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }
|
||||
|
||||
api POST "$API/releases/$id/assets?name=$ASSET" \
|
||||
-F "attachment=@${S9PK};type=application/octet-stream"
|
||||
[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
|
||||
|
||||
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
|
||||