8 Commits

Author SHA1 Message Date
Keysat e783653ef0 v0.23.0:0 - local / fine-tuned model support
Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes),
not just Hugging Face repos.

- ModelDef gains local_path; a model must set exactly one of repo / local_path.
  The validator also enforces the local-path whitelist and that any
  --chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME
  host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook,
  then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream
  expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'.
- POST /api/models validates the full entry via ModelDef before persisting, so a
  bad entry can't be written and then break catalog load; _merge_overrides skips
  an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF
  link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal,
  exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent
  pass; findings addressed.
2026-06-17 22:27:41 -05:00
Keysat 57a893000e docs: document the Gitea release ritual in startos-package guide 2026-06-17 21:29:27 -05:00
Keysat 56f7ea4444 fix: gitea-release.sh tolerate 404 on tag lookup; report HTTP errors; mark v0.22.0 published 2026-06-17 21:23:21 -05:00
Keysat aaad57d88f docs: mark v0.22.0:0 shipped + record Gitea-release distribution decision 2026-06-17 19:47:49 -05:00
Keysat 136a4713a1 v0.22.0:0 - configurable vllm port; gitea-release tooling; coexistence roadmap
- Configure Sparks gains a vLLM port field (blank => 8888, our launch-cluster.sh
  default); VLLM_PORT plumbed configureSparks -> sparkConfig.yaml -> main.ts env
  -> config.py. So an adopter whose vLLM listens elsewhere (e.g. 8000) can fix
  the "vLLM unreachable" health check without rebuilding the package.
- Harden numeric env parsing (config._env_int): a blank or malformed port now
  falls back to its default instead of crashing daemon startup (closes a P3
  tech-debt item; the Configure panel passes unset optional fields as "").
- Add scripts/gitea-release.sh + `make release` to publish the built s9pk to
  Gitea Releases, so the OpenClaw adopter pulls updates with a read-only token
  instead of being hand-sent the package.
- Capture the OpenClaw/Johnny-5 coexistence epic and the "control plane, not a
  job runner" stance in ROADMAP.md and Current state.
2026-06-17 19:45:09 -05:00
Keysat c179389731 docs: trim Current state post-matrix-bridge ship; add bot-tile ops note to runbook 2026-06-15 23:18:28 -05:00
Keysat 9debeb4bbe v0.21.0:1 - tidy host display for port-less bot tile 2026-06-15 23:09:24 -05:00
Keysat 39f8410623 v0.21.0:0 - matrix-bridge bot tile (status, update, restart, logs) 2026-06-15 22:57:40 -05:00
24 changed files with 1105 additions and 45 deletions
+5 -4
View File
@@ -55,11 +55,12 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
## Current state ## Current state
- **Working (v0.20.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`. - **Working (v0.22.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
- **Tests:** offline pytest harness in `image/tests/``cd image && .venv/bin/python -m pytest` (65 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, and the `shellsafe` validators. Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts. - **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
- **Tests:** offline pytest harness in `image/tests/``cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 14s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee. - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 14s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector. - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers. - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag. - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.) - **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Next:** (1) audio concurrency sweep — only if the Signal Engine dev wants the measured knee; needs owner OK in a quiet window. (2) Otherwise pull from `ROADMAP.md`: local-path/fine-tuned model support (new) or P2 tech-debt. Parakeet long-audio guard is deferred (rationale in ROADMAP). - **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release``scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass; verified via TestClient). **Reviewer-agent pass done; findings addressed:** path validation folded into the `ModelDef` validator (so YAML/override-added local models are checked too), a chat-template-must-live-inside-`local_path` guard, `_merge_overrides` skips a bad entry instead of breaking the whole catalog, and the `VLLM_SPARK_EXTRA_DOCKER_ARGS` unquoted-expansion contract is documented in `runbook.md`. **Not yet built/installed/published — awaiting go/no-go.** Next: (3) configurable topology (service→Spark→port map + container names); (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+15 -1
View File
@@ -2,6 +2,21 @@
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up. Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)
Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU).
**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap``GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability.
Sequenced:
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
- **Schedule visibility** — read-only view the dashboard surfaces, *registered by* external schedulers (Spark Control does not own the schedule).
## Near term ## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files. - parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee. - Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
@@ -19,7 +34,6 @@ Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE
- Second audio worker / queueing layer; revisit which services share Spark 2. - Second audio worker / queueing layer; revisit which services share Spark 2.
## Dashboard ## Dashboard
- Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (`models.yaml` + custom overrides) and the "Add custom model" path (`POST /api/models`) only accepts an HF `org/name` repo (`shellsafe._HF_REPO_RE`), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the `org/name` regex (path whitelist + `shlex.quote`, no traversal); (b) `models.build_launch_command` / `launch-cluster.sh` able to `vllm serve <path>`; (c) `disk.py` size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15 — a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild). - Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI. - Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views. - Open WebUI link-out integration; richer per-service detail views.
+16
View File
@@ -25,6 +25,22 @@ npm run prettier # prettier --write startos (no semicolons, single quotes, tra
- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`). - Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs). - New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
## Releasing to Gitea
The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:
```bash
# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
cd package && make x86 # build
# 2. commit + push the source change
git tag vX.Y.Z && git push gitea vX.Y.Z # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
make install # optional: sideload to your own server (restarts it — go/no-go)
# 3. publish the s9pk as a release asset (needs a write-scoped token):
GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
```
`make release``scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note Gitea returns `browser_download_url` on its configured ROOT_URL (may be a `.local` name) — an off-LAN adopter pulls via whatever address actually reaches the Gitea.
## Layout ## Layout
- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes). - `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
+35 -7
View File
@@ -8,6 +8,16 @@ def _env(name: str, default: str = "") -> str:
return os.environ.get(name, default) return os.environ.get(name, default)
def _env_int(name: str, default: int) -> int:
"""Parse an int env var, falling back to `default` when unset, blank, or
malformed. The StartOS Configure panel passes optional numeric fields as an
empty string when left blank, so a bare int("") would crash daemon startup."""
try:
return int(os.environ.get(name, "") or default)
except (TypeError, ValueError):
return default
def _resolve_models_yaml() -> str: def _resolve_models_yaml() -> str:
if env := os.environ.get("MODELS_YAML"): if env := os.environ.get("MODELS_YAML"):
return env return env
@@ -42,6 +52,11 @@ class Settings:
qdrant_user: str qdrant_user: str
qdrant_container: str qdrant_container: str
qdrant_collection: str qdrant_collection: str
matrix_bridge_host: str
matrix_bridge_user: str
matrix_bridge_container: str
matrix_bridge_dir: str
matrix_bridge_branch: str
redaction_map_db: str redaction_map_db: str
redaction_map_ttl: int redaction_map_ttl: int
ssh_key_path: str ssh_key_path: str
@@ -81,18 +96,31 @@ class Settings:
qdrant_user=_env("QDRANT_USER") or spark2_user, qdrant_user=_env("QDRANT_USER") or spark2_user,
qdrant_container=_env("QDRANT_CONTAINER") or "qdrant", qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
qdrant_collection=_env("QDRANT_COLLECTION", ""), qdrant_collection=_env("QDRANT_COLLECTION", ""),
# matrix-bridge bot container, driven as its own SSH user (the owner
# of the ~/matrix-bridge git clone) so git/docker run unprivileged.
# The user is BLANK by default and set via the "Configure Sparks"
# action; leaving it blank reports the service as unconfigured, which
# hides the tile. That keeps the shared package portable — a
# deployment without the bot never shows a stray tile or a hardcoded
# username. Host defaults to Spark 2 (same box); container/dir/branch
# are sensible defaults. All are env-overridable.
matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
# Redaction gateway pseudonym-map store (server-held de-anon key). # Redaction gateway pseudonym-map store (server-held de-anon key).
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"), redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")), redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
ssh_key_path=_env("SSH_KEY_PATH"), ssh_key_path=_env("SSH_KEY_PATH"),
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"), ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
models_yaml=_resolve_models_yaml(), models_yaml=_resolve_models_yaml(),
vllm_port=int(_env("VLLM_PORT", "8888")), vllm_port=_env_int("VLLM_PORT", 8888),
parakeet_port=int(_env("PARAKEET_PORT", "8000")), parakeet_port=_env_int("PARAKEET_PORT", 8000),
kokoro_port=int(_env("KOKORO_PORT", "8880")), kokoro_port=_env_int("KOKORO_PORT", 8880),
embed_port=int(_env("EMBED_PORT", "8088")), embed_port=_env_int("EMBED_PORT", 8088),
qdrant_port=int(_env("QDRANT_PORT", "6333")), qdrant_port=_env_int("QDRANT_PORT", 6333),
bind_port=int(_env("BIND_PORT", "9999")), bind_port=_env_int("BIND_PORT", 9999),
open_webui_url=_env("OPEN_WEBUI_URL", ""), open_webui_url=_env("OPEN_WEBUI_URL", ""),
ngc_api_key=_env("NGC_API_KEY", ""), ngc_api_key=_env("NGC_API_KEY", ""),
) )
+40 -3
View File
@@ -15,6 +15,7 @@ from dataclasses import dataclass
from typing import Optional from typing import Optional
from .config import Settings from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run from .ssh import ssh_run
@@ -76,16 +77,52 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
return HostDiskResult(host=host, on_disk=True, size_bytes=size) return HostDiskResult(host=host, on_disk=True, size_bytes=size)
async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus: async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
"""Probe one model across the relevant Sparks based on its mode (solo|cluster).""" """Return whether a local model directory exists on this host and its size.
For locally fine-tuned models (a Spark directory, not an HF cache entry). The
path is whitelisted at the API boundary (shellsafe.validate_local_path); we
shlex-quote it here in depth.
"""
if not host or not user:
return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
qp = quote_arg(path)
cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
if rc != 0:
return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
raw = out.strip()
if raw == "MISSING" or raw == "":
return HostDiskResult(host=host, on_disk=False)
try:
size = int(raw.splitlines()[-1])
except ValueError:
return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
async def probe_disk(
repo: str, mode: str, settings: Settings, *, local_path: str | None = None
) -> DiskStatus:
"""Probe one model across the relevant Sparks based on its mode (solo|cluster).
A local model (local_path set) is probed by directory; otherwise by HF cache.
"""
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)] hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
if mode == "cluster" and settings.spark2_host: if mode == "cluster" and settings.spark2_host:
hosts.append((settings.spark2_host, settings.spark2_user)) hosts.append((settings.spark2_host, settings.spark2_user))
if local_path:
results = await asyncio.gather(
*(probe_local_host(h, u, local_path, settings) for h, u in hosts)
)
key = local_path
else:
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts)) results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
key = repo
on_disk = any(r.on_disk for r in results) on_disk = any(r.on_disk for r in results)
total = sum(r.size_bytes for r in results) total = sum(r.size_bytes for r in results)
return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results)) return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))
async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult: async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
+186
View File
@@ -0,0 +1,186 @@
"""Update + logs for the matrix-bridge bot container on the Spark.
matrix-bridge is a single Docker container managed by docker compose out of a
git clone at `~matrix_bridge_user/matrix-bridge`. Status (the badge) and
start/stop/restart ride the generic service machinery in `services.py`
(`docker_state` / `run_action`). The two things that don't fit that mould live
here:
- **Update** — `git fetch && git reset --hard origin/<branch> && docker
compose up -d --build`. Long-running (docker build), so it streams like the
vLLM `UpdateManager`: fire-and-forget job, SSE stream, fail-loud rc.
- **Logs** — a one-shot `docker logs --tail N` for diagnosing a red badge.
We connect **directly as the configured user** (`modelo` — the repo owner), so
git never trips its dubious-ownership guard and docker runs via the user's
docker-group membership. We deliberately do NOT `sudo -iu modelo`: this Spark
has no passwordless sudo, so a sudo wrap would hang in SSH BatchMode.
"""
from __future__ import annotations
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run, ssh_stream, StreamHandle
# Hard ceiling on a single update. A first build after a base-image bump is
# slow (minutes); the cache makes later ones quick. 25 min is generous headroom
# without letting a genuinely wedged build spin forever.
_UPDATE_TIMEOUT_S = 1500
def build_update_command(directory: str, branch: str) -> str:
"""The update one-liner, run from the bot's git clone as its owner.
`directory` and `branch` come from operator config (not request input), so
they're interpolated directly — same trust model as the Spark hostnames in
`health`/`updates`. `directory` may be `~/...`, which must stay unquoted so
the remote login shell expands it; quoting would defeat that.
"""
return (
f"cd {directory} && "
f"git fetch origin && "
f"git reset --hard origin/{branch} && "
f"docker compose up -d --build"
)
def _phase_for(line: str) -> Optional[str]:
"""Map a streamed output line to a human-readable phase, or None to keep
the current phase. Kept loose — compose/buildkit output varies by version."""
low = line.lower()
if "git reset" in low or "head is now at" in low:
return "Resetting to the latest release…"
if "docker compose" in low or "buildkit" in low or low.startswith("step ") or "=> " in line or "building " in low:
return "Building the bot image…"
if "recreate" in low or "starting" in low or "started" in low or "container matrix-bridge" in low:
return "Recreating the container…"
if "already up to date" in low:
return "No new code; rebuilding…"
return None
@dataclass
class UpdateJob:
id: str
started_at: str
state: str = "starting"
lines: list[str] = field(default_factory=list)
returncode: Optional[int] = None
finished_at: Optional[str] = None
phase: str = "Starting…"
def append(self, line: str) -> None:
self.lines.append(line)
if len(self.lines) > 1000:
del self.lines[: len(self.lines) - 1000]
class MatrixBridgeManager:
def __init__(self, settings: Settings) -> None:
self.settings = settings
self.lock = asyncio.Lock()
self.jobs: dict[str, UpdateJob] = {}
self.current_job_id: Optional[str] = None
def _configured(self) -> bool:
s = self.settings
return bool(s.matrix_bridge_host and s.matrix_bridge_user)
def get(self, job_id: str) -> UpdateJob | None:
return self.jobs.get(job_id)
async def fetch_logs(self, tail: int = 100) -> dict:
"""One-shot `docker logs --tail N <container>` (stderr merged in)."""
s = self.settings
if not self._configured():
return {"ok": False, "error": "matrix-bridge host not configured"}
tail = max(1, min(int(tail), 1000))
# tail is already int-clamped, but quote at the sink anyway so the
# shellsafe convention (no raw interpolation into an SSH command) holds
# regardless of caller.
cmd = f"docker logs --tail {quote_arg(str(tail))} {quote_arg(s.matrix_bridge_container)} 2>&1"
rc, out, err = await ssh_run(
s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, timeout=20
)
return {
"ok": rc == 0,
"rc": rc,
"container": s.matrix_bridge_container,
"output": (out or err).strip(),
}
async def trigger_update(self) -> UpdateJob:
if not self._configured():
raise RuntimeError("matrix-bridge host not configured")
if self.lock.locked():
raise RuntimeError("An update is already in progress")
job = UpdateJob(
id=uuid.uuid4().hex[:8],
started_at=datetime.now(timezone.utc).isoformat(),
)
self.jobs[job.id] = job
self.current_job_id = job.id
asyncio.create_task(self._run(job))
return job
async def _run(self, job: UpdateJob) -> None:
async with self.lock:
try:
await self._do(job)
if job.state != "failed":
job.state = "done"
job.returncode = 0
job.phase = "Done"
except asyncio.TimeoutError:
job.append(f"[error] update timed out after {_UPDATE_TIMEOUT_S}s")
job.state = "failed"
job.returncode = 124
job.phase = "Timed out"
except Exception as e:
job.append(f"[error] {type(e).__name__}: {e}")
job.state = "failed"
if job.returncode is None:
job.returncode = 1
finally:
job.finished_at = datetime.now(timezone.utc).isoformat()
if self.current_job_id == job.id:
self.current_job_id = None
async def _do(self, job: UpdateJob) -> None:
s = self.settings
cmd = build_update_command(s.matrix_bridge_dir, s.matrix_bridge_branch)
job.append(f"$ {cmd}")
job.state = "running"
job.phase = "Fetching latest code…"
handle = StreamHandle()
gen = ssh_stream(s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, handle=handle)
deadline = time.monotonic() + _UPDATE_TIMEOUT_S
try:
while True:
remaining = deadline - time.monotonic()
if remaining <= 0:
raise asyncio.TimeoutError
try:
line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
except StopAsyncIteration:
break
job.append(line)
phase = _phase_for(line)
if phase:
job.phase = phase
finally:
# Closing the generator terminates the underlying ssh process and
# populates handle.returncode via ssh_stream's finally block.
await gen.aclose()
rc = handle.returncode or 0
if rc != 0:
job.state = "failed"
job.returncode = rc
+77 -7
View File
@@ -1,15 +1,33 @@
from __future__ import annotations from __future__ import annotations
import logging
from typing import Literal, Optional from typing import Literal, Optional
import yaml import yaml
from pydantic import BaseModel, Field from pydantic import BaseModel, Field, model_validator
from .overrides import apply_knobs_to_args, load_overrides from .overrides import apply_knobs_to_args, load_overrides
from .shellsafe import quote_arg, quote_args from .shellsafe import quote_arg, quote_args, validate_local_path
log = logging.getLogger(__name__)
def _chat_template_path(vllm_args: list[str]) -> str | None:
"""Extract the path from a `--chat-template=<path>` arg, if present."""
for a in vllm_args:
if a.startswith("--chat-template="):
return a.split("=", 1)[1]
return None
def _is_within(path: str, base: str) -> bool:
"""True if `path` is `base` itself or lives inside it (lexical check)."""
base = base.rstrip("/")
return path == base or path.startswith(base + "/")
class ModelDef(BaseModel): class ModelDef(BaseModel):
display_name: str display_name: str
repo: str repo: str = "" # HF 'org/name'; empty for a local model
local_path: str | None = None # absolute dir on the Spark; set => local model
size_gb: float size_gb: float
mode: Literal["solo", "cluster"] mode: Literal["solo", "cluster"]
capabilities: list[str] = Field(default_factory=list) capabilities: list[str] = Field(default_factory=list)
@@ -19,6 +37,38 @@ class ModelDef(BaseModel):
knobs: dict | None = None # user-customized; merged at launch time knobs: dict | None = None # user-customized; merged at launch time
custom: bool = False # True if this came from /data overrides custom: bool = False # True if this came from /data overrides
@model_validator(mode="after")
def _validate_source(self) -> "ModelDef":
if bool(self.repo) == bool(self.local_path):
raise ValueError(
f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
f"or 'local_path' (Spark directory)"
)
if self.local_path:
# Single place that enforces the path whitelist, so YAML/override
# entries get the same boundary check as the API. The quote_arg sink
# is still defense-in-depth.
validate_local_path(self.local_path)
# Only local_path is bind-mounted into the vLLM container, so any
# --chat-template path must live inside it or vLLM can't find it.
tmpl = _chat_template_path(self.vllm_args)
if tmpl is not None and not _is_within(tmpl, self.local_path):
raise ValueError(
f"--chat-template path {tmpl!r} must be inside the model "
f"directory {self.local_path!r} (only that directory is mounted "
f"into the container)"
)
return self
@property
def is_local(self) -> bool:
return bool(self.local_path)
@property
def source(self) -> str:
"""What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
return self.local_path if self.local_path else self.repo
class Defaults(BaseModel): class Defaults(BaseModel):
port: int = 8888 port: int = 8888
@@ -47,7 +97,8 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
continue continue
defaults_dump = { defaults_dump = {
"display_name": entry.get("display_name", key), "display_name": entry.get("display_name", key),
"repo": entry["repo"], "repo": entry.get("repo", ""),
"local_path": entry.get("local_path"),
"size_gb": float(entry.get("size_gb", 0)), "size_gb": float(entry.get("size_gb", 0)),
"mode": entry.get("mode", "solo"), "mode": entry.get("mode", "solo"),
"capabilities": entry.get("capabilities") or [], "capabilities": entry.get("capabilities") or [],
@@ -57,7 +108,12 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
"knobs": entry.get("knobs"), "knobs": entry.get("knobs"),
"custom": True, "custom": True,
} }
# A single malformed override entry (bad path, missing source, etc.) must
# not take down the whole catalog — skip it and keep the rest loadable.
try:
new_models[key] = ModelDef.model_validate(defaults_dump) new_models[key] = ModelDef.model_validate(defaults_dump)
except Exception as e:
log.warning("skipping invalid custom model %r: %s", key, e)
return Catalog(defaults=catalog.defaults, models=new_models) return Catalog(defaults=catalog.defaults, models=new_models)
@@ -78,7 +134,21 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
solo = "--solo " if model.mode == "solo" else "" solo = "--solo " if model.mode == "solo" else ""
base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs) base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args] args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
# repo + args are user-controlled (custom models, knobs); shlex.quote each so # source + args are user-controlled (custom models, knobs); shlex.quote each
# they cannot break out of the SSH shell command. shlex.split (used by the # so they cannot break out of the SSH shell command. shlex.split (used by the
# vLLM pre-flight validator) cleanly reverses this quoting. # vLLM pre-flight validator) cleanly reverses this quoting.
return f"./launch-cluster.sh {solo}-d exec vllm serve {quote_arg(model.repo)} {quote_args(args)}" prefix = ""
if model.local_path:
# A local model's directory isn't in the HF cache the launch script
# already mounts, so bind-mount it at the SAME path inside the vllm
# container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
# path inside and out means `vllm serve <dir>` and any
# `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
# change needed. (The env assignment sits before the script, so the
# validator's `serve`-keyed shlex round-trip is unaffected.)
mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
return (
f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
f"{quote_arg(model.source)} {quote_args(args)}"
)
+7 -1
View File
@@ -14,7 +14,7 @@ Shape:
custom: custom:
- key: my-new-model - key: my-new-model
display_name: My New Model (from download) display_name: My New Model (from download)
repo: my-org/my-model repo: my-org/my-model # an HF repo; OR set local_path instead (exactly one)
size_gb: 20 size_gb: 20
mode: solo mode: solo
description: null description: null
@@ -25,6 +25,12 @@ Shape:
fastsafetensors: true fastsafetensors: true
prefix_caching: true prefix_caching: true
kv_cache_dtype: fp8 kv_cache_dtype: fp8
- key: my-finetune # a local/fine-tuned model (a directory on the Spark)
display_name: My Fine-tune
local_path: /home/you/models/my-finetune
size_gb: 59
mode: solo
vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
""" """
from __future__ import annotations from __future__ import annotations
import os import os
+120 -9
View File
@@ -3,10 +3,10 @@ import asyncio
import json import json
from pathlib import Path from pathlib import Path
from fastapi import FastAPI, HTTPException from fastapi import FastAPI, HTTPException, Query, Request
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel from pydantic import BaseModel, ValidationError
from typing import Literal from typing import Literal
from .config import Settings from .config import Settings
@@ -21,7 +21,8 @@ from .embeddings_proxy import build_router as build_embeddings_router
from .redaction_gateway import build_router as build_redaction_router, MapStore from .redaction_gateway import build_router as build_redaction_router, MapStore
from .hardware import HardwareProbe from .hardware import HardwareProbe
from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
from .models import load_catalog from .matrix_bridge import MatrixBridgeManager
from .models import ModelDef, load_catalog
from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
from .services import docker_state, run_action, services_from_settings from .services import docker_state, run_action, services_from_settings
@@ -43,6 +44,7 @@ hardware_probe = HardwareProbe(settings)
nim_manager = NimManager(settings) nim_manager = NimManager(settings)
deep_health = DeepHealth(settings) deep_health = DeepHealth(settings)
speech_models = SpeechModelsManager(settings) speech_models = SpeechModelsManager(settings)
matrix_bridge = MatrixBridgeManager(settings)
app = FastAPI(title="spark-control", version="0.1.0") app = FastAPI(title="spark-control", version="0.1.0")
@@ -181,7 +183,8 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
class CustomModelBody(BaseModel): class CustomModelBody(BaseModel):
key: str key: str
display_name: str display_name: str
repo: str repo: str = ""
local_path: str | None = None
size_gb: float = 0 size_gb: float = 0
mode: Literal["solo", "cluster"] = "solo" mode: Literal["solo", "cluster"] = "solo"
description: str | None = None description: str | None = None
@@ -194,8 +197,17 @@ class CustomModelBody(BaseModel):
async def post_model(body: CustomModelBody) -> dict: async def post_model(body: CustomModelBody) -> dict:
if not body.key or not body.key.replace("-", "").replace("_", "").isalnum(): if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
raise HTTPException(400, "key must be alphanumeric/-/_ only") raise HTTPException(400, "key must be alphanumeric/-/_ only")
# Validate the full entry BEFORE persisting (exactly-one source, local-path
# whitelist, chat-template location). Doing it via ModelDef means the API and
# the YAML-override path share one set of rules, and a bad entry can't be
# written to /data and then break catalog load.
try: try:
validate_repo(body.repo) ModelDef.model_validate(body.model_dump())
if body.repo:
validate_repo(body.repo) # HF charset (the model only validates local paths)
except ValidationError as e:
msg = e.errors()[0]["msg"] if e.errors() else str(e)
raise HTTPException(400, msg.removeprefix("Value error, "))
except ValueError as e: except ValueError as e:
raise HTTPException(400, str(e)) raise HTTPException(400, str(e))
if body.key in catalog.models and not catalog.models[body.key].custom: if body.key in catalog.models and not catalog.models[body.key].custom:
@@ -227,7 +239,13 @@ async def get_models_disk_status() -> dict:
return {"configured": False, "models": {}} return {"configured": False, "models": {}}
keys = list(catalog.models.keys()) keys = list(catalog.models.keys())
statuses = await asyncio.gather(*( statuses = await asyncio.gather(*(
probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys probe_disk(
catalog.models[k].repo,
catalog.models[k].mode,
settings,
local_path=catalog.models[k].local_path,
)
for k in keys
), return_exceptions=True) ), return_exceptions=True)
out: dict[str, dict] = {} out: dict[str, dict] = {}
for k, s in zip(keys, statuses): for k, s in zip(keys, statuses):
@@ -258,6 +276,14 @@ async def del_model_disk(key: str) -> dict:
raise HTTPException(404, f"unknown model: {key}") raise HTTPException(404, f"unknown model: {key}")
m = catalog.models[key] m = catalog.models[key]
# Never rm a local fine-tune directory from the dashboard — it's irreplaceable
# training output the user placed by hand, not a re-downloadable HF cache.
if m.local_path:
raise HTTPException(
400,
"this is a local model; its directory must be managed on the Spark, not deleted from here",
)
# Refuse if currently loaded # Refuse if currently loaded
try: try:
vllm = await check_vllm(settings) vllm = await check_vllm(settings)
@@ -474,6 +500,11 @@ async def get_services() -> dict:
http = await check_embeddings(settings) http = await check_embeddings(settings)
elif name == "qdrant": elif name == "qdrant":
http = await check_qdrant(settings) http = await check_qdrant(settings)
elif svc.kind == "bot":
# No HTTP health endpoint (host networking, no port) — judged purely
# by docker state. http_ready stays None so the badge isn't pinned
# to a "Starting…" verdict that can never clear.
http = {"ok": None, "base_url": None}
else: else:
# Custom services expose a /health endpoint by convention. # Custom services expose a /health endpoint by convention.
http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"} http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"}
@@ -484,7 +515,9 @@ async def get_services() -> dict:
"container": svc.container, "container": svc.container,
"kind": svc.kind, "kind": svc.kind,
"base_url": http.get("base_url"), "base_url": http.get("base_url"),
"http_ready": bool(http.get("ok")), # None (not False) for services with no HTTP surface (the bot), so
# the UI judges them by docker state alone instead of "Starting…".
"http_ready": None if svc.kind == "bot" else bool(http.get("ok")),
# Prefer the check fn's own top-level model key (embeddings reports # Prefer the check fn's own top-level model key (embeddings reports
# it there); fall back to a model field inside detail for services # it there); fall back to a model field inside detail for services
# whose /health embeds it (parakeet). # whose /health embeds it (parakeet).
@@ -500,7 +533,10 @@ async def get_services() -> dict:
results = await asyncio.gather(*[one(n) for n in services.keys()]) results = await asyncio.gather(*[one(n) for n in services.keys()])
for name, info in results: for name, info in results:
out[name] = info out[name] = info
# Feed http reachability into the connectivity log (transition-only) # Feed http reachability into the connectivity log (transition-only).
# Skip services with no HTTP surface (http_ready is None) — they'd
# otherwise register as perpetually "down".
if info.get("http_ready") is not None:
record_state(name, bool(info.get("http_ready"))) record_state(name, bool(info.get("http_ready")))
return out return out
@@ -606,7 +642,7 @@ async def stream_nim_install(job_id: str):
@app.delete("/api/services/{name}") @app.delete("/api/services/{name}")
async def del_service(name: str) -> dict: async def del_service(name: str) -> dict:
# Only allow deleting custom services (not the bundled built-in keys) # Only allow deleting custom services (not the bundled built-in keys)
if name in ("parakeet", "kokoro", "embeddings", "qdrant"): if name in ("parakeet", "kokoro", "embeddings", "qdrant", "matrix-bridge"):
raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)") raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)")
delete_custom_service(name) delete_custom_service(name)
return {"ok": True, "name": name} return {"ok": True, "name": name}
@@ -625,6 +661,81 @@ async def service_action(name: str, action: str) -> dict:
return {"name": name, "action": action, **result} return {"name": name, "action": action, **result}
# ---- matrix-bridge bot: update (git pull + rebuild) + logs ----
# Status badge + start/stop/restart ride the generic /api/services machinery
# above (the bot is a registered ServiceDef). Only the long-running Update and
# the logs view need bespoke endpoints.
def _serialize_mb_update(job) -> dict:
return {
"id": job.id,
"state": job.state,
"phase": job.phase,
"started_at": job.started_at,
"finished_at": job.finished_at,
"returncode": job.returncode,
"lines": job.lines,
}
@app.post("/api/matrix-bridge/update")
async def post_matrix_bridge_update() -> dict:
"""Pull latest code, rebuild, and recreate the bot container. Long-running
(docker build) — returns a job id to stream."""
try:
job = await matrix_bridge.trigger_update()
except RuntimeError as e:
raise HTTPException(409 if "in progress" in str(e) else 503, str(e))
return {"job_id": job.id, "state": job.state}
@app.get("/api/matrix-bridge/update/{job_id}")
async def get_matrix_bridge_update(job_id: str) -> dict:
job = matrix_bridge.get(job_id)
if job is None:
raise HTTPException(404, "no such job")
return _serialize_mb_update(job)
@app.get("/api/matrix-bridge/update/{job_id}/stream")
async def stream_matrix_bridge_update(job_id: str, request: Request):
job = matrix_bridge.get(job_id)
if job is None:
raise HTTPException(404, "no such job")
async def gen():
sent = 0
last_phase = None
while True:
# An update can run for minutes; bail promptly if the client is gone
# rather than spinning the poll loop until the job's 25-min ceiling.
if await request.is_disconnected():
return
n = len(job.lines)
if n > sent:
for line in job.lines[sent:n]:
yield f"data: {json.dumps({'line': line})}\n\n"
sent = n
if job.phase != last_phase:
yield f"event: phase\ndata: {json.dumps({'state': job.state, 'phase': job.phase})}\n\n"
last_phase = job.phase
if job.returncode is not None and sent >= len(job.lines):
yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
return
await asyncio.sleep(0.5)
return StreamingResponse(gen(), media_type="text/event-stream")
@app.get("/api/matrix-bridge/logs")
async def get_matrix_bridge_logs(tail: int = Query(100, ge=1, le=1000)) -> dict:
"""Last N lines of `docker logs` for the bot container (stderr merged)."""
result = await matrix_bridge.fetch_logs(tail=tail)
if not result.get("ok"):
raise HTTPException(502, result.get("output") or result.get("error") or "could not read logs")
return result
# ---- Speech model patch management ---- # ---- Speech model patch management ----
@app.get("/api/speech-models") @app.get("/api/speech-models")
+11
View File
@@ -89,6 +89,17 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
container=s.qdrant_container, container=s.qdrant_container,
port=s.qdrant_port, port=s.qdrant_port,
), ),
# matrix-bridge Matrix bot. No HTTP port to probe (host networking, no
# health endpoint) — judged purely by docker state. Driven as its own
# SSH user (modelo, the repo owner) so git/docker run unprivileged.
"matrix-bridge": ServiceDef(
name="matrix-bridge",
kind="bot",
host=s.matrix_bridge_host,
user=s.matrix_bridge_user,
container=s.matrix_bridge_container,
port=0,
),
} }
for entry in load_custom_services(): for entry in load_custom_services():
key = entry.get("key") key = entry.get("key")
+25
View File
@@ -28,6 +28,12 @@ _IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
# Docker container / volume name (Docker's own rule). # Docker container / volume name (Docker's own rule).
_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$") _CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")
# Absolute filesystem path to a local model directory on a Spark. Conservative
# charset (letters, digits, and safe path punctuation) with a required leading
# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
# and '..' segments) is rejected separately in validate_local_path.
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")
def validate_repo(repo: str) -> str: def validate_repo(repo: str) -> str:
"""Return `repo` if it is a well-formed 'org/name'; else raise ValueError.""" """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
@@ -50,6 +56,25 @@ def validate_container(name: str) -> str:
return name return name
def validate_local_path(path: str) -> str:
"""Return `path` if it is a safe absolute model directory path; else ValueError.
For locally fine-tuned models served by directory (not an HF repo). Requires
an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
caller cannot traverse out of an intended models directory. The `quote_arg`
sink still quotes it in depth — this is the boundary check.
"""
p = path or ""
if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
raise ValueError(
f"invalid local model path (expected an absolute path, no spaces or "
f"shell metacharacters): {path!r}"
)
if any(seg in (".", "..") for seg in p.split("/")):
raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
return p
def quote_arg(value: object) -> str: def quote_arg(value: object) -> str:
"""shlex.quote a single token for safe embedding in a shell command string.""" """shlex.quote a single token for safe embedding in a shell command string."""
return shlex.quote(str(value)) return shlex.quote(str(value))
+210 -5
View File
@@ -13,6 +13,7 @@ const state = {
swap_progress: 0, // 01 swap_progress: 0, // 01
services: {}, services: {},
service_action_in_flight: null, // e.g. "parakeet:restart" service_action_in_flight: null, // e.g. "parakeet:restart"
mb_update_in_flight: false, // matrix-bridge update job running
hardware: {}, hardware: {},
config: {}, config: {},
configured: true, configured: true,
@@ -59,6 +60,7 @@ function renderCards() {
? `<div class="desc">${escapeHtml(m.description)}</div>` ? `<div class="desc">${escapeHtml(m.description)}</div>`
: ''; : '';
const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : ''; const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
// Disk-presence pill + trash button. Until /api/models/disk-status comes back, // Disk-presence pill + trash button. Until /api/models/disk-status comes back,
// we don't know — render a neutral placeholder. // we don't know — render a neutral placeholder.
const disk = state.disk_status[key]; const disk = state.disk_status[key];
@@ -72,8 +74,10 @@ function renderCards() {
} }
} }
// Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded. // Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
// Never offered for local models: their directory is hand-placed training output,
// not a re-downloadable HF cache (the server refuses the delete too).
let trashBtn = ''; let trashBtn = '';
if (state.disk_status_loaded && disk && disk.on_disk) { if (state.disk_status_loaded && disk && disk.on_disk && !m.local_path) {
const disabled = isActive || isSwapping; const disabled = isActive || isSwapping;
const tip = isActive const tip = isActive
? 'Currently loaded — switch to another model first' ? 'Currently loaded — switch to another model first'
@@ -91,6 +95,9 @@ function renderCards() {
primaryBtn = `<button class="btn" disabled>Current</button>`; primaryBtn = `<button class="btn" disabled>Current</button>`;
} else if (isOnDisk) { } else if (isOnDisk) {
primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`; primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`;
} else if (m.local_path) {
// A local model can't be "downloaded" — its directory has to exist on the Spark.
primaryBtn = `<button class="btn" disabled title="Directory not found on the Spark — create it there, then refresh">Not found on Spark</button>`;
} else { } else {
const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)'; const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`; primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
@@ -101,12 +108,15 @@ function renderCards() {
<span class="tag mode-${m.mode}">${m.mode}</span> <span class="tag mode-${m.mode}">${m.mode}</span>
<span class="tag">${m.size_gb} GB</span> <span class="tag">${m.size_gb} GB</span>
${customPill} ${customPill}
${localPill}
${diskPill} ${diskPill}
${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')} ${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
</div> </div>
${desc} ${desc}
<div class="muted small repo"> <div class="muted small repo">
<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a> ${m.local_path
? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
: `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
</div> </div>
<div class="spacer"></div> <div class="spacer"></div>
<div class="card-actions"> <div class="card-actions">
@@ -438,8 +448,13 @@ function classifyService(s) {
if (s.docker_state === 'missing') return 'missing'; if (s.docker_state === 'missing') return 'missing';
if (s.docker_state === 'restarting') return 'unhealthy'; if (s.docker_state === 'restarting') return 'unhealthy';
if (s.docker_state === 'exited') return 'unhealthy'; if (s.docker_state === 'exited') return 'unhealthy';
if (s.docker_state === 'running' && !s.http_ready) return 'starting'; if (s.docker_state === 'running') {
if (s.docker_state === 'running' && s.http_ready) return 'running'; // http_ready === false means an HTTP probe is expected but failing → still
// warming up. null means the service has no HTTP surface (e.g. the bot), so
// a running container is simply healthy.
if (s.http_ready === false) return 'starting';
return 'running';
}
return s.docker_state || 'unknown'; return s.docker_state || 'unknown';
} }
@@ -471,6 +486,11 @@ async function renderServices() {
grid.innerHTML = ''; grid.innerHTML = '';
for (const [name, s] of entries) { for (const [name, s] of entries) {
const cls = classifyService(s); const cls = classifyService(s);
const isBot = s.kind === 'bot';
// The bot tile is opt-in: it only belongs to deployments that actually run
// matrix-bridge. When the container is absent (missing) or the host isn't
// configured, hide the tile entirely rather than show a stray red card.
if (isBot && (cls === 'missing' || cls === 'unconfigured')) continue;
const card = document.createElement('div'); const card = document.createElement('div');
card.className = `service-card ${cls}`; card.className = `service-card ${cls}`;
const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':'); const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':');
@@ -483,7 +503,7 @@ async function renderServices() {
return false; return false;
}; };
const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`; const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`;
const hostStr = s.host ? `${s.host}:${s.port}` : ''; const hostStr = s.host ? (s.port ? `${s.host}:${s.port}` : s.host) : '';
const hostRow = s.host const hostRow = s.host
? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>` ? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>`
: `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`; : `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`;
@@ -537,9 +557,11 @@ async function renderServices() {
${restartsRow} ${restartsRow}
${deepRow} ${deepRow}
<div class="service-actions"> <div class="service-actions">
${isBot ? `<button class="btn primary" data-mb-update title="Pull latest code, rebuild, and recreate the bot" ${inFlight || state.mb_update_in_flight ? 'disabled' : ''}>Update</button>` : ''}
<button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button> <button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button>
<button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button> <button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button>
<button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button> <button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button>
${isBot ? `<button class="btn" data-mb-logs title="Show the last 100 log lines">View logs</button>` : ''}
</div> </div>
`; `;
grid.appendChild(card); grid.appendChild(card);
@@ -547,6 +569,10 @@ async function renderServices() {
for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) { for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) {
btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction)); btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction));
} }
const mbUpdateBtn = grid.querySelector('[data-mb-update]');
if (mbUpdateBtn) mbUpdateBtn.addEventListener('click', onMatrixBridgeUpdate);
const mbLogsBtn = grid.querySelector('[data-mb-logs]');
if (mbLogsBtn) mbLogsBtn.addEventListener('click', openMatrixBridgeLogs);
for (const btn of grid.querySelectorAll('[data-dh-run]')) { for (const btn of grid.querySelectorAll('[data-dh-run]')) {
btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn)); btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn));
} }
@@ -725,6 +751,118 @@ async function onServiceAction(key) {
} }
} }
// ===================== matrix-bridge bot (update + logs) =====================
const mbState = { job_id: null, eventsource: null, timer: null, started_at: null };
function mbTimerStart(at) {
mbState.started_at = at;
if (mbState.timer) clearInterval(mbState.timer);
const tick = () => {
if (!mbState.started_at) return;
const sec = Math.max(0, Math.floor((Date.now() - mbState.started_at) / 1000));
el('#mb-update-elapsed').textContent = `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
};
tick();
mbState.timer = setInterval(tick, 500);
}
async function onMatrixBridgeUpdate() {
if (state.mb_update_in_flight) return;
if (!confirm('Update the matrix-bridge bot?\n\nThis pulls the latest code, rebuilds the container image, and recreates the container. The first build after a base-image change can take several minutes. The bot is briefly offline while it restarts.')) return;
state.mb_update_in_flight = true;
renderServices();
try {
const r = await fetchJSON('/api/matrix-bridge/update', { method: 'POST' });
attachMbUpdateProgress(r.job_id);
} catch (e) {
state.mb_update_in_flight = false;
renderServices();
alert('Update failed to start: ' + e.message);
}
}
async function attachMbUpdateProgress(jobId) {
mbState.job_id = jobId;
el('#mb-update-log').textContent = '';
el('#mb-update-title').textContent = 'Updating matrix-bridge…';
el('#mb-update-phase').textContent = 'Starting…';
el('#mb-update-dialog').showModal();
try {
const snap = await fetchJSON(`/api/matrix-bridge/update/${jobId}`);
mbTimerStart(Date.parse(snap.started_at));
el('#mb-update-phase').textContent = snap.phase || 'Working…';
el('#mb-update-log').textContent = (snap.lines || []).join('\n');
if (snap.returncode !== null) { onMbUpdateDone(snap); return; }
} catch { mbTimerStart(Date.now()); }
const es = new EventSource(`/api/matrix-bridge/update/${jobId}/stream`);
mbState.eventsource = es;
es.onmessage = ev => {
try {
const d = JSON.parse(ev.data);
if (d.line !== undefined) {
const log = el('#mb-update-log');
log.textContent += d.line + '\n';
log.scrollTop = log.scrollHeight;
}
} catch {}
};
es.addEventListener('phase', ev => {
try { el('#mb-update-phase').textContent = JSON.parse(ev.data).phase; } catch {}
});
es.addEventListener('done', ev => {
let d = {}; try { d = JSON.parse(ev.data); } catch {}
onMbUpdateDone(d);
});
es.onerror = () => {
// Don't leave the Update button wedged-disabled on a dropped stream. The
// job keeps running server-side; re-clicking Update returns a clean 409.
es.close();
mbState.eventsource = null;
state.mb_update_in_flight = false;
el('#mb-update-phase').textContent = 'Lost connection to the update stream — reopen or check logs.';
renderServices();
};
}
function onMbUpdateDone(d) {
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
state.mb_update_in_flight = false;
if (d.state === 'failed') {
el('#mb-update-title').textContent = `Update failed (rc=${d.returncode})`;
el('#mb-update-phase').textContent = 'Failed — see the log above.';
} else {
el('#mb-update-title').textContent = 'Update complete';
el('#mb-update-phase').textContent = 'Done ✓';
}
// Refresh the tile's badge.
(async () => { try { state.services = await fetchJSON('/api/services'); } catch {} renderServices(); })();
}
async function openMatrixBridgeLogs() {
const pre = el('#mb-logs-pre');
el('#mb-logs-title').textContent = 'matrix-bridge logs';
pre.textContent = 'Loading…';
el('#mb-logs-dialog').showModal();
await loadMatrixBridgeLogs();
}
async function loadMatrixBridgeLogs() {
const pre = el('#mb-logs-pre');
const btn = el('#mb-logs-refresh');
if (btn) btn.disabled = true;
try {
const r = await fetchJSON('/api/matrix-bridge/logs?tail=100');
pre.textContent = r.output || '(no output)';
pre.scrollTop = pre.scrollHeight;
} catch (e) {
pre.textContent = 'Could not read logs: ' + e.message;
} finally {
if (btn) btn.disabled = false;
}
}
function renderEndpoint(status) { function renderEndpoint(status) {
const v = status.vllm || {}; const v = status.vllm || {};
const panel = el('#endpoint-panel'); const panel = el('#endpoint-panel');
@@ -1542,6 +1680,60 @@ function setupAdvancedDialog() {
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); }); el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
} }
function openLocalModelDialog() {
const dlg = el('#local-model-dialog');
el('#lm-key').value = '';
el('#lm-name').value = '';
el('#lm-path').value = '';
el('#lm-chat').value = '';
el('#lm-size').value = '';
el('#lm-mode').value = 'solo';
el('#lm-desc').value = '';
el('#lm-mml').value = 32768;
el('#lm-gmu').value = 0.85;
el('#lm-gmu-out').value = '0.85';
el('#lm-fst').checked = true;
el('#lm-pcache').checked = true;
el('#lm-fp8').checked = true;
dlg.showModal();
}
function setupLocalModelDialog() {
el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
el('#local-model-form').addEventListener('submit', async (e) => {
e.preventDefault();
const chat = el('#lm-chat').value.trim();
const body = {
key: el('#lm-key').value.trim(),
display_name: el('#lm-name').value.trim(),
local_path: el('#lm-path').value.trim(),
size_gb: parseFloat(el('#lm-size').value) || 0,
mode: el('#lm-mode').value,
description: el('#lm-desc').value.trim() || null,
// A fine-tune's chat template (if any) rides along as a launch flag.
vllm_args: chat ? [`--chat-template=${chat}`] : [],
knobs: {
max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
fastsafetensors: el('#lm-fst').checked,
prefix_caching: el('#lm-pcache').checked,
kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
},
};
try {
await fetchJSON('/api/models', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(body),
});
el('#local-model-dialog').close();
await loadModels();
pollStatus();
} catch (e) { alert('Add local model failed: ' + e.message); }
});
}
// ===================== NIM installer ===================== // ===================== NIM installer =====================
const nimState = { const nimState = {
@@ -1883,6 +2075,17 @@ async function init() {
el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close()); el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
el('#nim-form').addEventListener('submit', submitNim); el('#nim-form').addEventListener('submit', submitNim);
el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close()); el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
el('#mb-update-close').addEventListener('click', () => el('#mb-update-dialog').close());
// Dismissing the modal (Close or Esc) stops streaming; the job runs on
// server-side and re-clicking Update returns a 409 if still in progress.
el('#mb-update-dialog').addEventListener('close', () => {
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
state.mb_update_in_flight = false;
renderServices();
});
el('#mb-logs-close').addEventListener('click', () => el('#mb-logs-dialog').close());
el('#mb-logs-refresh').addEventListener('click', loadMatrixBridgeLogs);
el('#open-connectivity').addEventListener('click', openConnectivityDialog); el('#open-connectivity').addEventListener('click', openConnectivityDialog);
el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close()); el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
// Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on // Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on
@@ -1894,8 +2097,10 @@ async function init() {
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; } if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
}); });
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close()); el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
el('#open-local').addEventListener('click', openLocalModelDialog);
setupCatalogDialog(); setupCatalogDialog();
setupAdvancedDialog(); setupAdvancedDialog();
setupLocalModelDialog();
// Open WebUI link from /api/config // Open WebUI link from /api/config
try { try {
state.config = await fetchJSON('/api/config'); state.config = await fetchJSON('/api/config');
+63
View File
@@ -164,6 +164,37 @@
</div> </div>
</form> </form>
</dialog> </dialog>
<dialog id="mb-update-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="mb-update-title">Updating matrix-bridge…</h3>
<div class="phase-row">
<div class="phase" id="mb-update-phase">Starting…</div>
<span class="spacer"></span>
<span class="timer" id="mb-update-elapsed">0:00</span>
</div>
<details open>
<summary class="muted small">Log</summary>
<pre id="mb-update-log" class="log"></pre>
</details>
<div class="modal-actions">
<button type="button" id="mb-update-close" class="btn">Close</button>
</div>
</form>
</dialog>
<dialog id="mb-logs-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="mb-logs-title">matrix-bridge logs</h3>
<p class="muted small">Last 100 lines from <code>docker logs</code> on the Spark.</p>
<pre id="mb-logs-pre" class="log"></pre>
<div class="modal-actions">
<button type="button" id="mb-logs-refresh" class="btn">Refresh</button>
<span class="spacer"></span>
<button type="button" id="mb-logs-close" class="btn">Close</button>
</div>
</form>
</dialog>
</section> </section>
<section id="speech-models-panel" class="speech-models hidden"> <section id="speech-models-panel" class="speech-models hidden">
@@ -198,6 +229,7 @@
<div class="section-header"> <div class="section-header">
<h2 class="section-title">LLM swap</h2> <h2 class="section-title">LLM swap</h2>
<button id="open-download" class="btn small-btn">+ Download a new model</button> <button id="open-download" class="btn small-btn">+ Download a new model</button>
<button id="open-local" class="btn small-btn">+ Add local model</button>
</div> </div>
<dialog id="catalog-dialog" class="modal"> <dialog id="catalog-dialog" class="modal">
@@ -230,6 +262,37 @@
</form> </form>
</dialog> </dialog>
<dialog id="local-model-dialog" class="modal">
<form method="dialog" class="modal-form" id="local-model-form">
<h3>Add a local / fine-tuned model</h3>
<p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
<label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
<label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
<label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
<label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
<label class="modal-row"><span>Mode</span>
<select id="lm-mode">
<option value="solo">solo (Spark 1 only)</option>
<option value="cluster">cluster (both Sparks via Ray)</option>
</select>
</label>
<label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
<fieldset class="modal-fieldset">
<legend>Default launch knobs</legend>
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
<label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
<label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
<label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
<label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
</fieldset>
<div class="modal-actions">
<button type="button" id="lm-cancel" class="btn">Cancel</button>
<button type="submit" class="btn primary">Add local model</button>
</div>
</form>
</dialog>
<dialog id="disk-delete-dialog" class="modal"> <dialog id="disk-delete-dialog" class="modal">
<form method="dialog" class="modal-form"> <form method="dialog" class="modal-form">
<h3>Delete model weights from disk?</h3> <h3>Delete model weights from disk?</h3>
+6 -2
View File
@@ -526,10 +526,12 @@ main {
#dl-log-details { margin-top: 12px; } #dl-log-details { margin-top: 12px; }
#dl-log-details summary { cursor: pointer; padding: 4px 0; } #dl-log-details summary { cursor: pointer; padding: 4px 0; }
/* ===== NIM install dialog ===== */ /* ===== NIM install + matrix-bridge dialogs ===== */
.modal#nim-dialog, .modal#nim-dialog,
.modal#nim-progress-dialog { max-width: 640px; } .modal#nim-progress-dialog,
.modal#mb-update-dialog,
.modal#mb-logs-dialog { max-width: 640px; }
.nim-grid { .nim-grid {
display: grid; display: grid;
gap: 8px; gap: 8px;
@@ -692,6 +694,7 @@ main {
.card .repo a { color: inherit; text-decoration: none; } .card .repo a { color: inherit; text-decoration: none; }
.card .repo a:hover { color: var(--info); text-decoration: underline; } .card .repo a:hover { color: var(--info); text-decoration: underline; }
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; } .card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
.tag { .tag {
background: var(--surface-2); background: var(--surface-2);
border: 1px solid var(--border); border: 1px solid var(--border);
@@ -736,6 +739,7 @@ main {
.card .adv-btn, .card .adv-btn,
.card .test-btn { padding: 8px 12px; font-size: 12px; } .card .test-btn { padding: 8px 12px; font-size: 12px; }
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); } .card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); } .tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; } .tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; } .card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
+81
View File
@@ -7,6 +7,9 @@ the command back into the exact token list. The vLLM pre-flight validator
""" """
import shlex import shlex
import pytest
from pydantic import ValidationError
from app.models import Defaults, ModelDef, build_launch_command from app.models import Defaults, ModelDef, build_launch_command
DEFAULTS = Defaults(port=8888, host="0.0.0.0") DEFAULTS = Defaults(port=8888, host="0.0.0.0")
@@ -65,3 +68,81 @@ def test_injection_via_vllm_arg_stays_literal():
payload = "--foo=$(touch /tmp/pwned)" payload = "--foo=$(touch /tmp/pwned)"
cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS) cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
assert payload in shlex.split(cmd) # preserved as one inert token assert payload in shlex.split(cmd) # preserved as one inert token
# ---- local / fine-tuned models (served by directory, not HF repo) ----
def test_local_model_bind_mounts_dir_and_serves_the_path():
m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
cmd = build_launch_command("k", m, DEFAULTS)
tokens = shlex.split(cmd)
# The launch script's hook bind-mounts the host dir at the SAME container path.
assert tokens[0] == (
"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
)
# vLLM is pointed at the directory, not an HF repo id.
i = tokens.index("serve")
assert tokens[i + 1] == "/home/u/models/ft-v2"
assert "--max-model-len=2048" in tokens
def test_local_model_chat_template_arg_survives_round_trip():
m = _model(
repo="",
local_path="/m/ft",
vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
)
cmd = build_launch_command("k", m, DEFAULTS)
assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)
def test_local_path_with_metacharacters_is_quoted_not_executed():
# The validator rejects a hostile path at the boundary; bypass it with
# model_construct to prove the quote_arg sink is safe in depth even if a bad
# value somehow reaches build_launch_command.
evil = "/m/ft; rm -rf ~"
m = ModelDef.model_construct(
display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
vllm_args=[], knobs=None, custom=False, capabilities=[],
expected_ready_seconds=300, description=None,
)
cmd = build_launch_command("k", m, DEFAULTS)
tokens = shlex.split(cmd)
i = tokens.index("serve")
assert tokens[i + 1] == evil # recovered as one literal token, not executed
assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"
def test_model_requires_exactly_one_source():
with pytest.raises(ValidationError):
ModelDef(display_name="x", size_gb=1, mode="solo") # neither repo nor local_path
with pytest.raises(ValidationError):
ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo") # both
def test_local_model_rejects_chat_template_outside_dir():
# Only local_path is mounted into the container, so a chat-template elsewhere
# would silently 404 inside vLLM — reject it up front.
with pytest.raises(ValidationError):
ModelDef(
display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
vllm_args=["--chat-template=/other/dir/t.jinja"],
)
def test_invalid_local_path_rejected_by_model():
with pytest.raises(ValidationError):
ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")
def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
# YAML/override-added local models get the same validation as the API; a single
# bad entry is skipped (logged) rather than breaking the whole catalog load.
from app import models as M
monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
{"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
{"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
]})
cat = M._merge_overrides(M.Catalog(models={}))
assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
assert "bad" not in cat.models # traversal path skipped, not catalog-fatal
+47
View File
@@ -0,0 +1,47 @@
"""build_update_command: the matrix-bridge update one-liner.
Pure string assembly, no cluster. Locks in the contract from
docs/spark-control-integration.md (matrix-bridge repo): fetch, hard-reset to the
release branch, then rebuild/recreate via docker compose — chained with `&&` so
any failure (e.g. Gitea unreachable) aborts before the build and surfaces a
non-zero exit. The clone dir must stay unquoted so a `~` expands server-side.
"""
from app.matrix_bridge import build_update_command, _phase_for
def test_command_is_the_contract_chain():
cmd = build_update_command("~/matrix-bridge", "master")
assert cmd == (
"cd ~/matrix-bridge && "
"git fetch origin && "
"git reset --hard origin/master && "
"docker compose up -d --build"
)
def test_fail_loud_chaining():
# Every step is &&-chained: a failed fetch never reaches the build.
cmd = build_update_command("~/matrix-bridge", "master")
assert "; " not in cmd
assert cmd.count(" && ") == 3
assert cmd.index("git fetch") < cmd.index("git reset") < cmd.index("docker compose")
def test_tilde_dir_left_unquoted_for_server_side_expansion():
cmd = build_update_command("~/matrix-bridge", "master")
assert "cd ~/matrix-bridge &&" in cmd
assert "'~" not in cmd # quoting would defeat the home-dir expansion
def test_absolute_dir_and_custom_branch():
cmd = build_update_command("/home/modelo/matrix-bridge", "phase-1")
assert cmd.startswith("cd /home/modelo/matrix-bridge && ")
assert "git reset --hard origin/phase-1 &&" in cmd
def test_phase_detection_maps_known_lines():
assert _phase_for("HEAD is now at 1a2b3c4 some commit") == "Resetting to the latest release…"
assert _phase_for("#5 building image") == "Building the bot image…"
assert _phase_for("Container matrix-bridge Recreate") == "Recreating the container…"
assert _phase_for("Already up to date.") == "No new code; rebuilding…"
assert _phase_for("some unremarkable line") is None
+30 -1
View File
@@ -6,7 +6,12 @@ use `validate_x(v)` inline.
""" """
import pytest import pytest
from app.shellsafe import validate_container, validate_image, validate_repo from app.shellsafe import (
validate_container,
validate_image,
validate_local_path,
validate_repo,
)
# Shell metacharacters that must never survive any validator — these are the # Shell metacharacters that must never survive any validator — these are the
# actual injection vectors. (Path traversal like "../" is NOT in scope here: # actual injection vectors. (Path traversal like "../" is NOT in scope here:
@@ -96,3 +101,27 @@ def test_container_valid_passes_through_unchanged(name):
def test_container_rejects_malformed_and_hostile(name): def test_container_rejects_malformed_and_hostile(name):
with pytest.raises(ValueError): with pytest.raises(ValueError):
validate_container(name) validate_container(name)
# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----
@pytest.mark.parametrize("path", [
"/home/modelo/models/gemma-4-31B-ten31-v2",
"/data/models/ft.v2_1",
"/srv/m/a-b/c",
])
def test_local_path_valid_passes_through_unchanged(path):
assert validate_local_path(path) == path
@pytest.mark.parametrize("path", [
"",
"relative/path", # must be absolute
"~/models/x", # no ~ expansion
"/models/../etc/shadow", # '..' traversal
"/models/./x", # '.' segment
"/a" * 300, # over the 512 cap (600 chars)
] + [f"/models/x{h}" for h in HOSTILE])
def test_local_path_rejects_relative_traversal_and_hostile(path):
with pytest.raises(ValueError):
validate_local_path(path)
+11
View File
@@ -1,3 +1,14 @@
ARCHES := x86 ARCHES := x86
# overrides to s9pk.mk must precede the include statement # overrides to s9pk.mk must precede the include statement
include s9pk.mk include s9pk.mk
# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only
# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN;
# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh.
RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts)
.PHONY: release
release:
@test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; }
GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \
../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk"
@@ -40,6 +40,15 @@ const inputSpec = InputSpec.of({
placeholder: 'your SSH username', placeholder: 'your SSH username',
masked: false, masked: false,
}), }),
vllm_port: Value.text({
name: 'vLLM port (optional)',
description:
"The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
required: false,
default: null,
placeholder: 'leave blank for 8888',
masked: false,
}),
parakeet_host: Value.text({ parakeet_host: Value.text({
name: 'Parakeet host (optional)', name: 'Parakeet host (optional)',
description: description:
@@ -119,6 +128,15 @@ const inputSpec = InputSpec.of({
placeholder: 'e.g. crm_chunks', placeholder: 'e.g. crm_chunks',
masked: false, masked: false,
}), }),
matrix_bridge_user: Value.text({
name: 'matrix-bridge bot SSH user (optional)',
description:
"If you run the matrix-bridge Matrix bot on Spark 2, enter the SSH user that owns its ~/matrix-bridge folder (e.g. 'modelo'). Spark Control then shows a tile to update, restart, and view logs for the bot. Leave blank if you don't run the bot — the tile stays hidden. Note: this package's SSH public key must be authorized for that user (Show Public Key action) unless it's the same as your Spark 2 user.",
required: false,
default: null,
placeholder: 'e.g. modelo',
masked: false,
}),
open_webui_url: Value.text({ open_webui_url: Value.text({
name: 'Open WebUI URL (optional)', name: 'Open WebUI URL (optional)',
description: description:
@@ -7,6 +7,8 @@ export const sparkConfigSchema = z.object({
spark1_user: z.string().catch(''), spark1_user: z.string().catch(''),
spark2_host: z.string().catch(''), spark2_host: z.string().catch(''),
spark2_user: z.string().catch(''), spark2_user: z.string().catch(''),
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
vllm_port: z.string().catch(''),
// Optional per-service overrides. Blank => use spark2_host / spark2_user. // Optional per-service overrides. Blank => use spark2_host / spark2_user.
parakeet_host: z.string().catch(''), parakeet_host: z.string().catch(''),
parakeet_user: z.string().catch(''), parakeet_user: z.string().catch(''),
@@ -22,6 +24,8 @@ export const sparkConfigSchema = z.object({
qdrant_user: z.string().catch(''), qdrant_user: z.string().catch(''),
qdrant_container: z.string().catch(''), qdrant_container: z.string().catch(''),
qdrant_collection: z.string().catch(''), qdrant_collection: z.string().catch(''),
// Optional matrix-bridge bot. Blank => no tile. Host reuses Spark 2.
matrix_bridge_user: z.string().catch(''),
// Optional Open WebUI deep-link // Optional Open WebUI deep-link
open_webui_url: z.string().catch(''), open_webui_url: z.string().catch(''),
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/... // Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
+4
View File
@@ -13,6 +13,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
spark1_user: '', spark1_user: '',
spark2_host: '', spark2_host: '',
spark2_user: '', spark2_user: '',
vllm_port: '',
parakeet_host: '', parakeet_host: '',
parakeet_user: '', parakeet_user: '',
parakeet_container: '', parakeet_container: '',
@@ -26,6 +27,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
qdrant_user: '', qdrant_user: '',
qdrant_container: '', qdrant_container: '',
qdrant_collection: '', qdrant_collection: '',
matrix_bridge_user: '',
open_webui_url: '', open_webui_url: '',
ngc_api_key: '', ngc_api_key: '',
} }
@@ -49,6 +51,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
SPARK1_USER: cfg.spark1_user, SPARK1_USER: cfg.spark1_user,
SPARK2_HOST: cfg.spark2_host, SPARK2_HOST: cfg.spark2_host,
SPARK2_USER: cfg.spark2_user, SPARK2_USER: cfg.spark2_user,
VLLM_PORT: cfg.vllm_port,
PARAKEET_HOST: cfg.parakeet_host, PARAKEET_HOST: cfg.parakeet_host,
PARAKEET_USER: cfg.parakeet_user, PARAKEET_USER: cfg.parakeet_user,
PARAKEET_CONTAINER: cfg.parakeet_container, PARAKEET_CONTAINER: cfg.parakeet_container,
@@ -62,6 +65,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
QDRANT_USER: cfg.qdrant_user, QDRANT_USER: cfg.qdrant_user,
QDRANT_CONTAINER: cfg.qdrant_container, QDRANT_CONTAINER: cfg.qdrant_container,
QDRANT_COLLECTION: cfg.qdrant_collection, QDRANT_COLLECTION: cfg.qdrant_collection,
MATRIX_BRIDGE_USER: cfg.matrix_bridge_user,
MODELS_OVERRIDES: '/data/models-overrides.yaml', MODELS_OVERRIDES: '/data/models-overrides.yaml',
SERVICES_OVERRIDES: '/data/services-overrides.yaml', SERVICES_OVERRIDES: '/data/services-overrides.yaml',
CONNECTIVITY_LOG: '/data/connectivity.json', CONNECTIVITY_LOG: '/data/connectivity.json',
+2 -2
View File
@@ -1,10 +1,10 @@
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk' import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({ export const v0_1_0 = VersionInfo.of({
version: '0.20.0:0', version: '0.23.0:0',
releaseNotes: { releaseNotes: {
en_US: en_US:
"v0.20.0:0 — Spark connectivity helpers on the hardware cards. (1) A small copy icon in each card's top-right corner grabs that Spark's SSH public key — the key the Spark uses to log in to OTHER machines (e.g. your Mac). If the Spark has no key yet, one is generated on the spot (no passphrase, so apps can use it unattended); an existing key is never overwritten. A dialog shows the key plus a ready-to-paste command for adding it on the target machine. (This is the opposite direction from the existing \"Show Public Key\" action, which grants THIS dashboard access to your Sparks.) (2) If a Spark is on a WireGuard tunnel, its card now shows a read-only \"VPN <ip>\" badge next to the uptime, so you can see at a glance that the box is reachable off-LAN. All read-only — the dashboard does not configure the tunnel.", "v0.23.0:0 — local / fine-tuned model support. You can now add a model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune), not just a Hugging Face repo. Use the new \"+ Add local model\" button under LLM swap: give it the model's absolute path on the Spark, an optional chat-template path, and the usual launch knobs. On swap, Spark Control bind-mounts that directory into the vLLM container at the same path (via the launch script's existing VLLM_SPARK_EXTRA_DOCKER_ARGS hook — nothing to change on the Spark) and runs `vllm serve <dir>`. Local models show a \"local\" badge and their path instead of a Hugging Face link, and their weights are never offered for dashboard deletion (that directory is your own training output, not a re-downloadable cache). API: POST /api/models now accepts `local_path` (set exactly one of `repo` or `local_path`), validated against a strict path whitelist with no traversal.",
}, },
migrations: { migrations: {
up: async ({ effects }) => {}, up: async ({ effects }) => {},
+24
View File
@@ -34,6 +34,24 @@ These take effect on the **next swap to that model**. If a swap fails after this
- Status auto-refreshes every 5 s. - Status auto-refreshes every 5 s.
- A swap takes 36 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream. - A swap takes 36 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream.
## matrix-bridge bot tile (optional)
If you run the matrix-bridge bot container on a Spark, set its SSH user in **Configure Sparks** (e.g. the user that owns `~/matrix-bridge`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a `running` badge means the container is up, not necessarily that the bot is connected.
The **Update** button runs `git fetch && git reset --hard origin/<branch> && docker compose up -d --build` as that SSH user. For it to reach your git remote:
1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:
```
Host <your-git-host>
Port <port>
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
```
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
## Adding a new model ## Adding a new model
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`. 1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
@@ -42,6 +60,12 @@ These take effect on the **next swap to that model**. If a swap fails after this
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable. If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
### Local / fine-tuned models (v0.23.0+)
A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a `--chat-template` must live **inside** `local_path`.
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
## Manual swap fallback ## Manual swap fallback
If the UI is unavailable and you need to swap by hand: If the UI is unavailable and you need to swap by hand:
+65
View File
@@ -0,0 +1,65 @@
#!/usr/bin/env bash
# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the
# latest package with a read-only token instead of being hand-sent the file.
#
# GITEA_URL=https://gitea.example:3000 GITEA_TOKEN=<write-token> \
# scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk
#
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
# reuses an existing release for the tag and replaces a same-named asset.
# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
set -euo pipefail
VERSION="${1:-}"; S9PK="${2:-}"
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
ASSET="$(basename "$S9PK")"
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
CURL=(curl -sS) # no -f: we inspect HTTP codes ourselves
[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
api() {
local method="$1" url="$2"; shift 2
local out
out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
-w $'\n%{http_code}' "$url")"
HTTP_CODE="${out##*$'\n'}"
BODY="${out%$'\n'*}"
}
# Reuse an existing release for this tag, otherwise create one.
api GET "$API/releases/tags/$TAG"
if [ "$HTTP_CODE" = 200 ]; then
id="$(printf '%s' "$BODY" | jq -r '.id')"
elif [ "$HTTP_CODE" = 404 ]; then
api POST "$API/releases" -H 'Content-Type: application/json' \
--data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
'{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
[ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
id="$(printf '%s' "$BODY" | jq -r '.id')"
else
echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
exit 1
fi
[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }
# Replace a same-named asset so re-runs don't 409.
api GET "$API/releases/$id/assets"
old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }
api POST "$API/releases/$id/assets?name=$ASSET" \
-F "attachment=@${S9PK};type=application/octet-stream"
[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"