diff --git a/AGENTS.md b/AGENTS.md
index c3287d8..bde7d87 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -57,10 +57,10 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 
 - **Live service runs v0.22.0:0** (installed and serving); **v0.23.0:0 is built, committed (`e783653`), tagged, and published to Gitea Releases but its live install is PENDING** — see the P3 line below. Working features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN ` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Local/fine-tuned model support lands live once v0.23.0:0 installs. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
 - **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
-- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
+- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (102 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), and the configurable-topology layer (`test_topology.py`: `DISABLED_SERVICES` parsing, `vllm_container` override, disabled-service skip in `services_from_settings` + `check_*`, `probe_vllm_endpoint`). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
 - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
 - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
 - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
 - **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
-- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H ` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). Next: (3) configurable topology (service→Spark→port map + container names); (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H ` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). (3) **configurable topology** — DONE in tree, staged as **v0.24.0:0** (built clean, not yet committed/installed). Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`, threaded into the swap log-tail + validator exec via `quote_arg`); "services to hide" (`DISABLED_SERVICES` comma list → `Settings.disabled_services` frozenset, skipped by `services_from_settings`, the `check_*` probes, deep-health `run_all`, and connectivity logging — kills the Parakeet-on-8000 collision); second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (`probe_vllm_endpoint` shared with `check_vllm`). `/api/endpoints` gained a `disabled` flag; the health-dot hides when disabled. 102 tests pass (+8 in `test_topology.py`). Swap mechanism deliberately NOT generalized to raw `docker run` (that's coordination, item 4). Install pending — same mDNS situation as v0.23.0. Next: (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
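Reviewer sketch (not part of the patch): the distribution note above says Gitea advertises `browser_download_url` on its `.local` ROOT_URL, so an off-LAN adopter has to fetch the asset through an address that actually reaches the server. A minimal way to do that re-rooting is shown below. It assumes the release JSON carries an `assets` list whose entries have `name` and `browser_download_url`; the helper name `pick_s9pk_url` and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def pick_s9pk_url(release: dict, reachable_base: str) -> str:
    """Return the first .s9pk asset URL, re-rooted onto `reachable_base`.

    `release` is the decoded JSON of a releases/latest response (assumed shape).
    `reachable_base` is whatever scheme://host:port reaches the Gitea from the
    adopter's side, e.g. the WireGuard address.
    """
    base = urlsplit(reachable_base)
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            advertised = urlsplit(asset["browser_download_url"])
            # Keep the advertised path and query; swap only scheme and host.
            return urlunsplit(
                (base.scheme, base.netloc, advertised.path, advertised.query, "")
            )
    raise LookupError("release has no .s9pk asset")
```

The adopter's agent would then download the returned URL with its read-only token and sideload the file; the network call itself is left out so the helper stays testable offline.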
diff --git a/ROADMAP.md b/ROADMAP.md
index 476c517..6a234f3 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -11,7 +11,7 @@ Driven by the one other Spark Control adopter (a colleague running OpenClaw + cr
 Sequenced:
 1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
 2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
-3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
+3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
 4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
    - **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
    - **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
diff --git a/image/app/config.py b/image/app/config.py
index e0d50aa..75107c4 100644
--- a/image/app/config.py
+++ b/image/app/config.py
@@ -1,13 +1,44 @@
 from __future__ import annotations
+import logging
 import os
 from dataclasses import dataclass
 from pathlib import Path
 
+from .shellsafe import validate_container
+
+log = logging.getLogger(__name__)
+
 
 def _env(name: str, default: str = "") -> str:
     return os.environ.get(name, default)
 
 
+def _env_container(name: str, default: str) -> str:
+    """Resolve a container-name env var, validating it at the config boundary.
+
+    The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
+    the sink — but per the repo's two-layer convention it's also whitelist-checked
+    here. A malformed optional value falls back to `default` rather than crashing
+    daemon startup (mirrors `_env_int` for VLLM_PORT)."""
+    val = os.environ.get(name, "") or default
+    try:
+        return validate_container(val)
+    except ValueError:
+        log.warning("ignoring invalid %s=%r; using %r", name, val, default)
+        return default
+
+
+def _env_set(name: str) -> frozenset[str]:
+    """Parse a comma-separated env var into a lowercased frozenset of keys.
+
+    Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
+    support service can switch its tile + probes off entirely (rather than have
+    the probe hit whatever else listens on that port — e.g. a vLLM sharing
+    Parakeet's default 8000)."""
+    raw = os.environ.get(name, "")
+    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
+
+
 def _env_int(name: str, default: int) -> int:
     """Parse an int env var, falling back to `default` when unset, blank, or
     malformed. The StartOS Configure panel passes optional numeric fields as an
@@ -63,6 +94,8 @@ class Settings:
     ssh_known_hosts: str
     models_yaml: str
     vllm_port: int
+    vllm_container: str
+    disabled_services: frozenset[str]
     parakeet_port: int
     kokoro_port: int
     embed_port: int
@@ -116,6 +149,15 @@ class Settings:
             ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
             models_yaml=_resolve_models_yaml(),
             vllm_port=_env_int("VLLM_PORT", 8888),
+            # Container name for the swappable vLLM on Spark 1. Defaults to the
+            # bundled launch-cluster.sh container; override if you named yours
+            # something else (the swap log-tail and pre-flight validator exec
+            # into it by name).
+            vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
+            # Built-in support-service keys (parakeet, kokoro, embeddings,
+            # qdrant) the deployment doesn't run — hidden from the dashboard and
+            # never probed.
+            disabled_services=_env_set("DISABLED_SERVICES"),
             parakeet_port=_env_int("PARAKEET_PORT", 8000),
             kokoro_port=_env_int("KOKORO_PORT", 8880),
             embed_port=_env_int("EMBED_PORT", 8088),
diff --git a/image/app/custom_services.py b/image/app/custom_services.py
index 3537ef8..18f88a9 100644
--- a/image/app/custom_services.py
+++ b/image/app/custom_services.py
@@ -10,6 +10,17 @@ Format:
       port: 8001
       health_path: /health
       image: nvcr.io/nim/nvidia/riva-multilingual:latest
+
+A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
+the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
+probed via /v1/models plus container state and start/stop/restart:
+  custom:
+    - key: vllm-spark2
+      kind: vllm
+      host:
+      user:
+      container: vllm_node
+      port: 8000
 """
 from __future__ import annotations
 import os
diff --git a/image/app/deep_health.py b/image/app/deep_health.py
index bc15ef8..769d1ea 100644
--- a/image/app/deep_health.py
+++ b/image/app/deep_health.py
@@ -377,6 +377,10 @@ class DeepHealth:
     async def run_all(self) -> dict[str, ProbeResult]:
         results = {}
         for name in self.PROBES:
+            # Don't deep-probe a service the deployment switched off — its port
+            # may be answered by something else (e.g. a vLLM on Parakeet's 8000).
+            if name in self.settings.disabled_services:
+                continue
             results[name] = await self.run_one(name)
         return results
 
diff --git a/image/app/health.py b/image/app/health.py
index 1ddeb12..2ee6d89 100644
--- a/image/app/health.py
+++ b/image/app/health.py
@@ -6,17 +6,28 @@ from .config import Settings
 _TIMEOUT = 3.0
 
 
-async def check_vllm(settings: Settings) -> dict:
-    base_url = (
-        f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
-        if settings.spark1_host
-        else None
-    )
-    if not settings.spark1_host:
-        return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
+def _disabled(settings: Settings, key: str) -> dict | None:
+    """A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
+
+    Lets an adopter who doesn't run a given support service switch its probe off
+    entirely — so the probe never hits whatever else listens on that port, and
+    the connectivity log doesn't record it as perpetually down."""
+    if key in settings.disabled_services:
+        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+    return None
+
+
+async def probe_vllm_endpoint(host: str, port: int) -> dict:
+    """Probe any OpenAI-compatible vLLM at host:port via /v1/models.
+
+    Shared by the primary (Spark 1) health check and any extra vLLM registered
+    as a custom service (kind: vllm) to monitor a second Spark."""
+    base_url = f"http://{host}:{port}/v1" if host else None
+    if not host:
+        return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
     try:
         async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
-            r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
+            r = await c.get(f"http://{host}:{port}/v1/models")
             r.raise_for_status()
             ids = [m["id"] for m in r.json().get("data", [])]
             return {
@@ -29,7 +40,15 @@ async def check_vllm(settings: Settings) -> dict:
         return {"ok": False, "error": str(e), "base_url": base_url}
 
 
+async def check_vllm(settings: Settings) -> dict:
+    if not settings.spark1_host:
+        return {"ok": False, "error": "spark1 not configured", "base_url": None}
+    return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
+
+
 async def check_parakeet(settings: Settings) -> dict:
+    if d := _disabled(settings, "parakeet"):
+        return d
     base_url = (
         f"http://{settings.parakeet_host}:{settings.parakeet_port}"
         if settings.parakeet_host
@@ -47,6 +66,8 @@ async def check_parakeet(settings: Settings) -> dict:
 
 
 async def check_kokoro(settings: Settings) -> dict:
+    if d := _disabled(settings, "kokoro"):
+        return d
     base_url = (
         f"http://{settings.kokoro_host}:{settings.kokoro_port}"
         if settings.kokoro_host
@@ -68,6 +89,8 @@ async def check_kokoro(settings: Settings) -> dict:
 
 
 async def check_embeddings(settings: Settings) -> dict:
+    if d := _disabled(settings, "embeddings"):
+        return d
     base_url = (
         f"http://{settings.embed_host}:{settings.embed_port}"
         if settings.embed_host
@@ -89,6 +112,8 @@ async def check_embeddings(settings: Settings) -> dict:
 
 
 async def check_qdrant(settings: Settings) -> dict:
+    if d := _disabled(settings, "qdrant"):
+        return d
     base_url = (
         f"http://{settings.qdrant_host}:{settings.qdrant_port}"
         if settings.qdrant_host
diff --git a/image/app/server.py b/image/app/server.py
index 93ff0a5..e8249ea 100644
--- a/image/app/server.py
+++ b/image/app/server.py
@@ -20,7 +20,7 @@ from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
 from .matrix_bridge import MatrixBridgeManager
 from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
@@ -500,6 +500,10 @@ async def get_services() -> dict:
             http = await check_embeddings(settings)
         elif name == "qdrant":
             http = await check_qdrant(settings)
+        elif svc.kind == "vllm":
+            # An extra vLLM monitored on another Spark (registered as a custom
+            # service). Probe its own host/port, not the primary Spark 1 one.
+            http = await probe_vllm_endpoint(svc.host, svc.port)
         elif svc.kind == "bot":
             # No HTTP health endpoint (host networking, no port) — judged purely
             # by docker state. http_ready stays None so the badge isn't pinned
@@ -521,7 +525,7 @@ async def get_services() -> dict:
             # Prefer the check fn's own top-level model key (embeddings reports
             # it there); fall back to a model field inside detail for services
             # whose /health embeds it (parakeet).
-            "model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
+            "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
             "docker_state": docker.get("state"),
             "restart_count": docker.get("restart_count"),
             "started_at": docker.get("started_at"),
@@ -799,17 +803,20 @@ async def get_endpoints() -> dict:
             "base_url": vllm.get("base_url"),
             "model": vllm.get("current_model"),
             "openai_compat": True,
+            "disabled": bool(vllm.get("disabled")),
         },
         "parakeet": {
             "ready": bool(parakeet.get("ok")),
             "base_url": parakeet.get("base_url"),
             "kind": "stt",
             "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
+            "disabled": bool(parakeet.get("disabled")),
         },
         "kokoro": {
             "ready": bool(kokoro.get("ok")),
             "base_url": kokoro.get("base_url"),
             "kind": "tts",
+            "disabled": bool(kokoro.get("disabled")),
         },
         "embeddings": {
             "ready": bool(embeddings.get("ok")),
@@ -818,12 +825,14 @@ async def get_endpoints() -> dict:
             "model": embeddings.get("model"),
             # The proxied OpenAI-compatible endpoints live on Spark Control itself.
             "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
+            "disabled": bool(embeddings.get("disabled")),
         },
         "qdrant": {
             "ready": bool(qdrant.get("ok")),
             "base_url": qdrant.get("base_url"),
             "kind": "vectordb",
             "collection": settings.qdrant_collection or None,
+            "disabled": bool(qdrant.get("disabled")),
         },
     }
 
@@ -837,12 +846,15 @@ async def get_status() -> dict:
         check_embeddings(settings),
         check_qdrant(settings),
     )
-    # Feed health into the connectivity log (deduped — only logs on transition)
-    record_state("vllm", bool(vllm.get("ok")))
-    record_state("parakeet", bool(parakeet.get("ok")))
-    record_state("kokoro", bool(kokoro.get("ok")))
-    record_state("embeddings", bool(embeddings.get("ok")))
-    record_state("qdrant", bool(qdrant.get("ok")))
+    # Feed health into the connectivity log (deduped — only logs on transition).
+    # Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
+    # perpetually down.
+    for _name, _r in (
+        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
+        ("embeddings", embeddings), ("qdrant", qdrant),
+    ):
+        if not _r.get("disabled"):
+            record_state(_name, bool(_r.get("ok")))
     current_key = _identify_current_model(vllm.get("current_model"))
     return {
         "configured": settings.configured,
diff --git a/image/app/services.py b/image/app/services.py
index 2c9b71b..01795bb 100644
--- a/image/app/services.py
+++ b/image/app/services.py
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart ` via SSH on the
 appropriate host.
 """
 from __future__ import annotations
+import logging
 import time
 from dataclasses import dataclass
 from typing import Literal, Optional
@@ -13,6 +14,8 @@ from .config import Settings
 from .shellsafe import quote_arg
 from .ssh import ssh_run
 
+log = logging.getLogger(__name__)
+
 
 # Cache the "unreachable" verdict per (host, user) for a short period so that a
 # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -103,7 +106,13 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
     }
     for entry in load_custom_services():
         key = entry.get("key")
-        if not key or key in out:
+        if not key:
+            continue
+        if key in out:
+            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
+            # an adopter who picked a colliding key for, say, a second vLLM sees
+            # why no tile appeared instead of a silent no-op.
+            log.warning("custom service %r collides with a built-in name; ignoring", key)
             continue
         out[key] = ServiceDef(
             name=key,
@@ -113,7 +122,9 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
             container=entry.get("container", key),
             port=int(entry.get("port", 0)),
         )
-    return out
+    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
+    # show no tile and are never probed/auto-restarted.
+    return {k: v for k, v in out.items() if k not in s.disabled_services}
 
 
 async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
diff --git a/image/app/static/app.js b/image/app/static/app.js
index ff96c14..7ea1778 100644
--- a/image/app/static/app.js
+++ b/image/app/static/app.js
@@ -932,6 +932,10 @@ function renderHealth(status) {
   function setDot(id, ok, payload) {
     const item = el(id);
     if (!item) return;
+    // A service switched off via DISABLED_SERVICES isn't part of this
+    // deployment — hide its indicator entirely rather than show it as down.
+    if (payload && payload.disabled) { item.classList.add('hidden'); return; }
+    item.classList.remove('hidden');
     const dot = item.querySelector('.dot');
     dot.classList.remove('ok', 'bad', 'warn');
     if (ok === true) dot.classList.add('ok');
diff --git a/image/app/swap.py b/image/app/swap.py
index 07d400a..49a1bc1 100644
--- a/image/app/swap.py
+++ b/image/app/swap.py
@@ -7,6 +7,7 @@ from typing import Optional
 
 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run, ssh_stream, StreamHandle
 
 
@@ -112,7 +113,7 @@ class SwapManager:
 
         # Step 3: tail logs until the ready marker (or timeout)
         job.state = "tailing"
-        tail_cmd = "docker logs -f --tail 50 vllm_node"
+        tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
         job.append(f"$ {tail_cmd}")
         timeout = max(model.expected_ready_seconds * 2, 600)
         handle = StreamHandle()
diff --git a/image/app/validate.py b/image/app/validate.py
index 983e267..548c81f 100644
--- a/image/app/validate.py
+++ b/image/app/validate.py
@@ -22,6 +22,7 @@ from typing import Any
 
 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run
 
 
@@ -114,7 +115,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
     # Pipe the JSON args list to a here-doc Python invocation. The validator
     # reads from stdin to avoid shell-escaping the args themselves.
     cmd = (
-        f"echo '{payload}' | docker exec -i vllm_node python3 -c "
+        f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
         + shlex.quote(_VALIDATOR_SCRIPT)
     )
 
diff --git a/image/tests/test_topology.py b/image/tests/test_topology.py
new file mode 100644
index 0000000..3e978ba
--- /dev/null
+++ b/image/tests/test_topology.py
@@ -0,0 +1,120 @@
+"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
+extra-vLLM probe. All offline — the disabled checks short-circuit before any
+network call, and the probes are exercised only on the not-configured path.
+"""
+import asyncio
+
+from app.config import Settings
+from app.health import (
+    check_embeddings,
+    check_kokoro,
+    check_parakeet,
+    check_qdrant,
+    check_vllm,
+    probe_vllm_endpoint,
+)
+from app.services import services_from_settings
+
+
+def _settings(monkeypatch, **env) -> Settings:
+    # Pin the topology env vars under test; default the rest to blank so a stray
+    # value in the real environment can't leak into the assertion.
+    keys = [
+        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
+        "DISABLED_SERVICES", "VLLM_CONTAINER",
+    ]
+    for k in keys:
+        monkeypatch.delenv(k, raising=False)
+    for k, v in env.items():
+        monkeypatch.setenv(k, v)
+    return Settings.from_env()
+
+
+# ---- DISABLED_SERVICES parsing ----
+
+def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
+    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
+    assert s.disabled_services == frozenset({"parakeet", "kokoro"})
+
+
+def test_disabled_services_blank_is_empty(monkeypatch):
+    assert _settings(monkeypatch).disabled_services == frozenset()
+
+
+# ---- vLLM container override ----
+
+def test_vllm_container_defaults_to_vllm_node(monkeypatch):
+    assert _settings(monkeypatch).vllm_container == "vllm_node"
+
+
+def test_vllm_container_override(monkeypatch):
+    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"
+
+
+def test_vllm_container_invalid_falls_back(monkeypatch):
+    # A malformed value (space / shell metachar) is rejected at the boundary and
+    # falls back to the default rather than crashing startup or reaching a sink.
+    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"
+
+
+# ---- services map honors the disable list ----
+
+def test_services_from_settings_drops_disabled(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
+        DISABLED_SERVICES="parakeet,qdrant",
+    )
+    svcs = services_from_settings(s)
+    assert "parakeet" not in svcs and "qdrant" not in svcs
+    assert "kokoro" in svcs and "embeddings" in svcs
+
+
+def test_custom_vllm_service_registered(monkeypatch):
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
+         "user": "u", "container": "vllm_node", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    svc = services_from_settings(s)["vllm-spark2"]
+    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"
+
+
+def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
+    # A custom entry can't shadow a built-in key — the built-in wins.
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    assert services_from_settings(s)["parakeet"].kind == "stt"
+
+
+# ---- disabled health checks short-circuit (no network) ----
+
+def test_disabled_check_returns_disabled_verdict(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",  # host set, but disable wins
+        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
+    )
+    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
+        r = asyncio.run(check(s))
+        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+
+
+# ---- vLLM probe: not-configured path is pure ----
+
+def test_probe_vllm_endpoint_unconfigured(monkeypatch):
+    r = asyncio.run(probe_vllm_endpoint("", 8000))
+    assert r["ok"] is False and "not configured" in r["error"]
+
+
+def test_check_vllm_unconfigured_without_spark1(monkeypatch):
+    s = _settings(monkeypatch)  # no SPARK1_HOST
+    r = asyncio.run(check_vllm(s))
+    assert r["ok"] is False and "spark1 not configured" in r["error"]
diff --git a/package/startos/actions/configureSparks.ts b/package/startos/actions/configureSparks.ts
index abd8168..64d6610 100644
--- a/package/startos/actions/configureSparks.ts
+++ b/package/startos/actions/configureSparks.ts
@@ -49,6 +49,24 @@ const inputSpec = InputSpec.of({
     placeholder: 'leave blank for 8888',
     masked: false,
   }),
+  vllm_container: Value.text({
+    name: 'vLLM container name (optional)',
+    description:
+      'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
+    required: false,
+    default: null,
+    placeholder: 'leave blank for vllm_node',
+    masked: false,
+  }),
+  disabled_services: Value.text({
+    name: 'Services to hide (optional)',
+    description:
+      "Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM shares port 8000 with Parakeet's default — hide Parakeet so its probe doesn't hit vLLM.)",
+    required: false,
+    default: null,
+    placeholder: 'e.g. parakeet,kokoro',
+    masked: false,
+  }),
   parakeet_host: Value.text({
     name: 'Parakeet host (optional)',
     description:
diff --git a/package/startos/fileModels/sparkConfig.yaml.ts b/package/startos/fileModels/sparkConfig.yaml.ts
index 85a63b6..a1d1545 100644
--- a/package/startos/fileModels/sparkConfig.yaml.ts
+++ b/package/startos/fileModels/sparkConfig.yaml.ts
@@ -9,6 +9,11 @@ export const sparkConfigSchema = z.object({
   spark2_user: z.string().catch(''),
   // Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
   vllm_port: z.string().catch(''),
+  // Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
+  vllm_container: z.string().catch(''),
+  // Optional comma-separated list of built-in services to switch off
+  // (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
+  disabled_services: z.string().catch(''),
   // Optional per-service overrides. Blank => use spark2_host / spark2_user.
   parakeet_host: z.string().catch(''),
   parakeet_user: z.string().catch(''),
diff --git a/package/startos/main.ts b/package/startos/main.ts
index 9595fa6..96df6c6 100644
--- a/package/startos/main.ts
+++ b/package/startos/main.ts
@@ -14,6 +14,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
     spark2_host: '',
     spark2_user: '',
     vllm_port: '',
+    vllm_container: '',
+    disabled_services: '',
     parakeet_host: '',
     parakeet_user: '',
     parakeet_container: '',
@@ -52,6 +54,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
         SPARK2_HOST: cfg.spark2_host,
         SPARK2_USER: cfg.spark2_user,
         VLLM_PORT: cfg.vllm_port,
+        VLLM_CONTAINER: cfg.vllm_container,
+        DISABLED_SERVICES: cfg.disabled_services,
         PARAKEET_HOST: cfg.parakeet_host,
         PARAKEET_USER: cfg.parakeet_user,
         PARAKEET_CONTAINER: cfg.parakeet_container,
diff --git a/package/startos/versions/v0_1_0.ts b/package/startos/versions/v0_1_0.ts
index 7607d14..9415853 100644
--- a/package/startos/versions/v0_1_0.ts
+++ b/package/startos/versions/v0_1_0.ts
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
 
 export const v0_1_0 = VersionInfo.of({
-  version: '0.23.0:0',
+  version: '0.24.0:0',
   releaseNotes: {
     en_US:
-      "v0.23.0:0 — local / fine-tuned model support. You can now add a model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune), not just a Hugging Face repo. Use the new \"+ Add local model\" button under LLM swap: give it the model's absolute path on the Spark, an optional chat-template path, and the usual launch knobs. On swap, Spark Control bind-mounts that directory into the vLLM container at the same path (via the launch script's existing VLLM_SPARK_EXTRA_DOCKER_ARGS hook — nothing to change on the Spark) and runs `vllm serve `. Local models show a \"local\" badge and their path instead of a Hugging Face link, and their weights are never offered for dashboard deletion (that directory is your own training output, not a re-downloadable cache). API: POST /api/models now accepts `local_path` (set exactly one of `repo` or `local_path`), validated against a strict path whitelist with no traversal.",
+      "v0.24.0:0 — configurable cluster topology. Spark Control no longer assumes our exact layout, so a cluster that's wired differently can be monitored without forking. Three new optional settings in Configure Sparks: (1) vLLM container name — defaults to \"vllm_node\"; set it if your swappable vLLM runs under a different container name (the swap log view and pre-flight validator exec into it by name). (2) Services to hide — a comma-separated list of built-in services your cluster doesn't run (parakeet, kokoro, embeddings, qdrant); hidden ones show no tile and are never probed, so e.g. a vLLM sharing Parakeet's default port 8000 no longer gets a confusing Parakeet probe. (3) Monitor a second vLLM — register a vLLM on another Spark as a custom service with kind \"vllm\" (in /data/services-overrides.yaml); it gets a read-only health tile (loaded model + container state + start/stop/restart) alongside the swappable one. API: /api/endpoints now reports a `disabled` flag per service.",
   },
   migrations: {
     up: async ({ effects }) => {},
diff --git a/runbook.md b/runbook.md
index 1f3f438..2eeb841 100644
--- a/runbook.md
+++ b/runbook.md
@@ -52,6 +52,26 @@ The **Update** button runs `git fetch && git reset --hard origin/ && doc
 3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
 
+## Configurable topology (v0.24.0+)
+
+For a cluster wired differently from the reference layout, three optional knobs in **Configure Sparks** (no fork needed):
+
+- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
+- **Services to hide** — comma-separated `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers — e.g. a vLLM on port 8000 colliding with Parakeet's default.
+- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
+
+  ```yaml
+  custom:
+    - key: vllm-spark2
+      kind: vllm
+      host:
+      user:
+      container: vllm_node
+      port: 8000
+  ```
+
+  It gets a read-only tile: loaded model (via `/v1/models`), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user — Show Public Key.)
+
 ## Adding a new model
 
 1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.