Compare commits
2 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 26070eb191 | |
| | 90394f891b | |
@@ -55,12 +55,12 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 
 ## Current state
 
-- **Working (v0.22.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
+- **Live service runs v0.22.0:0** (installed and serving); **v0.23.0:0 is built, committed (`e783653`), tagged, and published to Gitea Releases but its live install is PENDING** — see the P3 line below. Working features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Local/fine-tuned model support lands live once v0.23.0:0 installs. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
 - **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
-- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
+- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (102 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), and the configurable-topology layer (`test_topology.py`: `DISABLED_SERVICES` parsing, `vllm_container` override, disabled-service skip in `services_from_settings` + `check_*`, `probe_vllm_endpoint`). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
 - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
 - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
 - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
 - **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
-- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass; verified via TestClient). **Reviewer-agent pass done; findings addressed:** path validation folded into the `ModelDef` validator (so YAML/override-added local models are checked too), a chat-template-must-live-inside-`local_path` guard, `_merge_overrides` skips a bad entry instead of breaking the whole catalog, and the `VLLM_SPARK_EXTRA_DOCKER_ARGS` unquoted-expansion contract is documented in `runbook.md`. **Not yet built/installed/published — awaiting go/no-go.** Next: (3) configurable topology (service→Spark→port map + container names); (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H <ip>` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). (3) **configurable topology** — DONE in tree, staged as **v0.24.0:0** (built clean, not yet committed/installed). Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`, threaded into the swap log-tail + validator exec via `quote_arg`); "services to hide" (`DISABLED_SERVICES` comma list → `Settings.disabled_services` frozenset, skipped by `services_from_settings`, the `check_*` probes, deep-health `run_all`, and connectivity logging — kills the Parakeet-on-8000 collision); second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (`probe_vllm_endpoint` shared with `check_vllm`). `/api/endpoints` gained a `disabled` flag; the health-dot hides when disabled. 102 tests pass (+8 in `test_topology.py`). Swap mechanism deliberately NOT generalized to raw `docker run` (that's coordination, item 4). Install pending — same mDNS situation as v0.23.0. Next: (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
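The distribution note in the bullet above (latest release, then the `.s9pk` asset, then sideload, with the caveat that `browser_download_url` carries the `.local` ROOT_URL) could be sketched roughly as below. This is only an illustration, not the project's code: `pick_s9pk_url`, the sample payload, the `gitea.local` host, the asset names, and the `10.0.0.1:3000` address are all hypothetical, and the sketch assumes the release JSON exposes `assets[].name` and `assets[].browser_download_url`.

```python
from urllib.parse import urlsplit, urlunsplit


def pick_s9pk_url(release: dict, reachable_base: str) -> str:
    """Return the URL of the first .s9pk asset, re-pointed at reachable_base.

    `release` is the decoded JSON of a releases/latest response; only the
    `assets[].name` and `assets[].browser_download_url` fields are read.
    `reachable_base` is whatever scheme://host[:port] reaches Gitea from the
    adopter's side (a hypothetical value is used in the example below).
    """
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            original = urlsplit(asset["browser_download_url"])
            base = urlsplit(reachable_base)
            # Keep the original path and query; swap only scheme and host.
            return urlunsplit(
                (base.scheme, base.netloc, original.path, original.query, "")
            )
    raise LookupError("release has no .s9pk asset")


# Hypothetical payload shaped like a release response (values are made up).
sample = {
    "tag_name": "v0.22.0",
    "assets": [
        {"name": "notes.txt",
         "browser_download_url": "http://gitea.local/attachments/notes"},
        {"name": "spark-control.s9pk",
         "browser_download_url": "http://gitea.local/attachments/abc123"},
    ],
}

print(pick_s9pk_url(sample, "http://10.0.0.1:3000"))
```

Keeping the rewrite in a pure function means the off-LAN caveat can be checked without a network call; the actual download and sideload steps are left out on purpose.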
+1
-1
@@ -11,7 +11,7 @@ Driven by the one other Spark Control adopter (a colleague running OpenClaw + cr
 Sequenced:
 1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
 2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
-3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
+3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
 4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
    - **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
    - **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
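The swap lock in item 4 is planned, not built, so the following is only a sketch of the holder + TTL semantics it describes: an in-memory, single-process lock that the swap path would consult before acting. Every name here (`SwapLock`, `LockState`, `may_swap`, the holder strings) is hypothetical, and the HTTP endpoints are left out. The clock is injectable so that expiry can be exercised without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LockState:
    holder: str
    expires_at: float


class SwapLock:
    """In-memory swap lock with a holder and a TTL (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state: Optional[LockState] = None

    def current(self) -> Optional[LockState]:
        """Return the live lock, dropping it first if its TTL has passed."""
        if self._state is not None and self._clock() >= self._state.expires_at:
            self._state = None
        return self._state

    def acquire(self, holder: str, ttl_seconds: float) -> bool:
        """Take (or renew) the lock; refuse if another holder has a live one."""
        live = self.current()
        if live is not None and live.holder != holder:
            return False
        self._state = LockState(holder, self._clock() + ttl_seconds)
        return True

    def release(self, holder: str) -> bool:
        """Release only if `holder` owns the live lock."""
        live = self.current()
        if live is None or live.holder != holder:
            return False
        self._state = None
        return True

    def may_swap(self, requester: str) -> bool:
        """Enforcement check for the swap path: lock free, or requester holds it."""
        live = self.current()
        return live is None or live.holder == requester


# Usage with a fake clock: a scheduler holds the GPU for 600 seconds.
now = [0.0]
lock = SwapLock(clock=lambda: now[0])
lock.acquire("nightly-cron", ttl_seconds=600)
print(lock.may_swap("dashboard"))  # False while the lock is held
now[0] = 601.0
print(lock.may_swap("dashboard"))  # True once the TTL has expired
```

Calling `may_swap` from the swap path itself, rather than only from the UI, is what would make the lock enforced instead of advisory.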
@@ -1,13 +1,44 @@
 from __future__ import annotations
+import logging
 import os
 from dataclasses import dataclass
 from pathlib import Path
 
+from .shellsafe import validate_container
+
+log = logging.getLogger(__name__)
+
 
 def _env(name: str, default: str = "") -> str:
     return os.environ.get(name, default)
 
 
+def _env_container(name: str, default: str) -> str:
+    """Resolve a container-name env var, validating it at the config boundary.
+
+    The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
+    the sink — but per the repo's two-layer convention it's also whitelist-checked
+    here. A malformed optional value falls back to `default` rather than crashing
+    daemon startup (mirrors `_env_int` for VLLM_PORT)."""
+    val = os.environ.get(name, "") or default
+    try:
+        return validate_container(val)
+    except ValueError:
+        log.warning("ignoring invalid %s=%r; using %r", name, val, default)
+        return default
+
+
+def _env_set(name: str) -> frozenset[str]:
+    """Parse a comma-separated env var into a lowercased frozenset of keys.
+
+    Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
+    support service can switch its tile + probes off entirely (rather than have
+    the probe hit whatever else listens on that port — e.g. a vLLM sharing
+    Parakeet's default 8000)."""
+    raw = os.environ.get(name, "")
+    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
+
+
 def _env_int(name: str, default: int) -> int:
     """Parse an int env var, falling back to `default` when unset, blank, or
     malformed. The StartOS Configure panel passes optional numeric fields as an
@@ -63,6 +94,8 @@ class Settings:
     ssh_known_hosts: str
     models_yaml: str
     vllm_port: int
+    vllm_container: str
+    disabled_services: frozenset[str]
     parakeet_port: int
     kokoro_port: int
     embed_port: int
@@ -116,6 +149,15 @@ class Settings:
             ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
             models_yaml=_resolve_models_yaml(),
             vllm_port=_env_int("VLLM_PORT", 8888),
+            # Container name for the swappable vLLM on Spark 1. Defaults to the
+            # bundled launch-cluster.sh container; override if you named yours
+            # something else (the swap log-tail and pre-flight validator exec
+            # into it by name).
+            vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
+            # Built-in support-service keys (parakeet, kokoro, embeddings,
+            # qdrant) the deployment doesn't run — hidden from the dashboard and
+            # never probed.
+            disabled_services=_env_set("DISABLED_SERVICES"),
             parakeet_port=_env_int("PARAKEET_PORT", 8000),
             kokoro_port=_env_int("KOKORO_PORT", 8880),
             embed_port=_env_int("EMBED_PORT", 8088),
@@ -10,6 +10,17 @@ Format:
         port: 8001
         health_path: /health
         image: nvcr.io/nim/nvidia/riva-multilingual:latest
+
+A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
+the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
+probed via /v1/models plus container state and start/stop/restart:
+    custom:
+      - key: vllm-spark2
+        kind: vllm
+        host: <spark-2-ip>
+        user: <ssh-user>
+        container: vllm_node
+        port: 8000
 """
 from __future__ import annotations
 import os
@@ -377,6 +377,10 @@ class DeepHealth:
     async def run_all(self) -> dict[str, ProbeResult]:
         results = {}
         for name in self.PROBES:
+            # Don't deep-probe a service the deployment switched off — its port
+            # may be answered by something else (e.g. a vLLM on Parakeet's 8000).
+            if name in self.settings.disabled_services:
+                continue
             results[name] = await self.run_one(name)
         return results
 
+34
-9
@@ -6,17 +6,28 @@ from .config import Settings
 _TIMEOUT = 3.0
 
 
-async def check_vllm(settings: Settings) -> dict:
-    base_url = (
-        f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
-        if settings.spark1_host
-        else None
-    )
-    if not settings.spark1_host:
-        return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
+def _disabled(settings: Settings, key: str) -> dict | None:
+    """A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
+
+    Lets an adopter who doesn't run a given support service switch its probe off
+    entirely — so the probe never hits whatever else listens on that port, and
+    the connectivity log doesn't record it as perpetually down."""
+    if key in settings.disabled_services:
+        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+    return None
+
+
+async def probe_vllm_endpoint(host: str, port: int) -> dict:
+    """Probe any OpenAI-compatible vLLM at host:port via /v1/models.
+
+    Shared by the primary (Spark 1) health check and any extra vLLM registered
+    as a custom service (kind: vllm) to monitor a second Spark."""
+    base_url = f"http://{host}:{port}/v1" if host else None
+    if not host:
+        return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
     try:
         async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
-            r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
+            r = await c.get(f"http://{host}:{port}/v1/models")
             r.raise_for_status()
             ids = [m["id"] for m in r.json().get("data", [])]
             return {
@@ -29,7 +40,15 @@ async def check_vllm(settings: Settings) -> dict:
         return {"ok": False, "error": str(e), "base_url": base_url}
 
 
+async def check_vllm(settings: Settings) -> dict:
+    if not settings.spark1_host:
+        return {"ok": False, "error": "spark1 not configured", "base_url": None}
+    return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
+
+
 async def check_parakeet(settings: Settings) -> dict:
+    if d := _disabled(settings, "parakeet"):
+        return d
     base_url = (
         f"http://{settings.parakeet_host}:{settings.parakeet_port}"
         if settings.parakeet_host
@@ -47,6 +66,8 @@ async def check_parakeet(settings: Settings) -> dict:
 
 
 async def check_kokoro(settings: Settings) -> dict:
+    if d := _disabled(settings, "kokoro"):
+        return d
     base_url = (
         f"http://{settings.kokoro_host}:{settings.kokoro_port}"
         if settings.kokoro_host
@@ -68,6 +89,8 @@ async def check_kokoro(settings: Settings) -> dict:
 
 
 async def check_embeddings(settings: Settings) -> dict:
+    if d := _disabled(settings, "embeddings"):
+        return d
     base_url = (
         f"http://{settings.embed_host}:{settings.embed_port}"
         if settings.embed_host
@@ -89,6 +112,8 @@ async def check_embeddings(settings: Settings) -> dict:
 
 
 async def check_qdrant(settings: Settings) -> dict:
+    if d := _disabled(settings, "qdrant"):
+        return d
     base_url = (
         f"http://{settings.qdrant_host}:{settings.qdrant_port}"
         if settings.qdrant_host
+20
-8
@@ -20,7 +20,7 @@ from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
 from .matrix_bridge import MatrixBridgeManager
 from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
@@ -500,6 +500,10 @@ async def get_services() -> dict:
             http = await check_embeddings(settings)
         elif name == "qdrant":
             http = await check_qdrant(settings)
+        elif svc.kind == "vllm":
+            # An extra vLLM monitored on another Spark (registered as a custom
+            # service). Probe its own host/port, not the primary Spark 1 one.
+            http = await probe_vllm_endpoint(svc.host, svc.port)
         elif svc.kind == "bot":
             # No HTTP health endpoint (host networking, no port) — judged purely
             # by docker state. http_ready stays None so the badge isn't pinned
@@ -521,7 +525,7 @@ async def get_services() -> dict:
             # Prefer the check fn's own top-level model key (embeddings reports
             # it there); fall back to a model field inside detail for services
             # whose /health embeds it (parakeet).
-            "model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
+            "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
             "docker_state": docker.get("state"),
             "restart_count": docker.get("restart_count"),
             "started_at": docker.get("started_at"),
@@ -799,17 +803,20 @@ async def get_endpoints() -> dict:
             "base_url": vllm.get("base_url"),
             "model": vllm.get("current_model"),
             "openai_compat": True,
+            "disabled": bool(vllm.get("disabled")),
         },
         "parakeet": {
             "ready": bool(parakeet.get("ok")),
             "base_url": parakeet.get("base_url"),
             "kind": "stt",
             "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
+            "disabled": bool(parakeet.get("disabled")),
         },
         "kokoro": {
             "ready": bool(kokoro.get("ok")),
             "base_url": kokoro.get("base_url"),
             "kind": "tts",
+            "disabled": bool(kokoro.get("disabled")),
         },
         "embeddings": {
             "ready": bool(embeddings.get("ok")),
@@ -818,12 +825,14 @@ async def get_endpoints() -> dict:
             "model": embeddings.get("model"),
             # The proxied OpenAI-compatible endpoints live on Spark Control itself.
             "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
+            "disabled": bool(embeddings.get("disabled")),
         },
         "qdrant": {
             "ready": bool(qdrant.get("ok")),
             "base_url": qdrant.get("base_url"),
             "kind": "vectordb",
             "collection": settings.qdrant_collection or None,
+            "disabled": bool(qdrant.get("disabled")),
         },
     }

@@ -837,12 +846,15 @@ async def get_status() -> dict:
         check_embeddings(settings),
         check_qdrant(settings),
     )
-    # Feed health into the connectivity log (deduped — only logs on transition)
-    record_state("vllm", bool(vllm.get("ok")))
-    record_state("parakeet", bool(parakeet.get("ok")))
-    record_state("kokoro", bool(kokoro.get("ok")))
-    record_state("embeddings", bool(embeddings.get("ok")))
-    record_state("qdrant", bool(qdrant.get("ok")))
+    # Feed health into the connectivity log (deduped — only logs on transition).
+    # Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
+    # perpetually down.
+    for _name, _r in (
+        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
+        ("embeddings", embeddings), ("qdrant", qdrant),
+    ):
+        if not _r.get("disabled"):
+            record_state(_name, bool(_r.get("ok")))
     current_key = _identify_current_model(vllm.get("current_model"))
     return {
         "configured": settings.configured,
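The loop in this hunk relies on `record_state` logging only when a service changes state. Its implementation is not part of this diff, so the following is only a minimal sketch of that behaviour, assuming an in-memory record of the last state seen per service. `ConnectivityLog`, its `record_state` method, and `feed` are hypothetical names for illustration, not the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectivityLog:
    """Hypothetical stand-in for the connectivity log; not the project's code."""

    last: dict[str, bool] = field(default_factory=dict)
    events: list[tuple[str, bool]] = field(default_factory=list)

    def record_state(self, name: str, ok: bool) -> None:
        # Append an event only when the state differs from the last one seen.
        if self.last.get(name) != ok:
            self.last[name] = ok
            self.events.append((name, ok))


def feed(log: ConnectivityLog, results: dict[str, dict]) -> None:
    # Mirrors the loop in the hunk: disabled services are never recorded.
    for name, result in results.items():
        if not result.get("disabled"):
            log.record_state(name, bool(result.get("ok")))


log = ConnectivityLog()
disabled = {"ok": False, "disabled": True}
feed(log, {"vllm": {"ok": True}, "parakeet": disabled})
feed(log, {"vllm": {"ok": True}, "parakeet": disabled})   # no transition, no event
feed(log, {"vllm": {"ok": False}, "parakeet": disabled})  # transition, one event
print(log.events)
```

Under these assumptions a disabled service never appears in the log at all, which is the point of the change: without the `disabled` check it would be recorded as down on the first poll.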
+13
-2
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
 appropriate host.
 """
 from __future__ import annotations
+import logging
 import time
 from dataclasses import dataclass
 from typing import Literal, Optional
@@ -13,6 +14,8 @@ from .config import Settings
 from .shellsafe import quote_arg
 from .ssh import ssh_run

+log = logging.getLogger(__name__)
+

 # Cache the "unreachable" verdict per (host, user) for a short period so that a
 # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -103,7 +106,13 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
     }
     for entry in load_custom_services():
         key = entry.get("key")
-        if not key or key in out:
+        if not key:
+            continue
+        if key in out:
+            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
+            # an adopter who picked a colliding key for, say, a second vLLM sees
+            # why no tile appeared instead of a silent no-op.
+            log.warning("custom service %r collides with a built-in name; ignoring", key)
             continue
         out[key] = ServiceDef(
             name=key,
@@ -113,7 +122,9 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
             container=entry.get("container", key),
             port=int(entry.get("port", 0)),
         )
-    return out
+    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
+    # show no tile and are never probed/auto-restarted.
+    return {k: v for k, v in out.items() if k not in s.disabled_services}


 async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
@@ -932,6 +932,10 @@ function renderHealth(status) {
   function setDot(id, ok, payload) {
     const item = el(id);
     if (!item) return;
+    // A service switched off via DISABLED_SERVICES isn't part of this
+    // deployment — hide its indicator entirely rather than show it as down.
+    if (payload && payload.disabled) { item.classList.add('hidden'); return; }
+    item.classList.remove('hidden');
     const dot = item.querySelector('.dot');
     dot.classList.remove('ok', 'bad', 'warn');
     if (ok === true) dot.classList.add('ok');
+2
-1
@@ -7,6 +7,7 @@ from typing import Optional

 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run, ssh_stream, StreamHandle


@@ -112,7 +113,7 @@ class SwapManager:

         # Step 3: tail logs until the ready marker (or timeout)
         job.state = "tailing"
-        tail_cmd = "docker logs -f --tail 50 vllm_node"
+        tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
         job.append(f"$ {tail_cmd}")
         timeout = max(model.expected_ready_seconds * 2, 600)
         handle = StreamHandle()
@@ -22,6 +22,7 @@ from typing import Any

 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run


@@ -114,7 +115,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
     # Pipe the JSON args list to a here-doc Python invocation. The validator
     # reads from stdin to avoid shell-escaping the args themselves.
     cmd = (
-        f"echo '{payload}' | docker exec -i vllm_node python3 -c "
+        f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
         + shlex.quote(_VALIDATOR_SCRIPT)
     )

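Both changed lines interpolate a configurable container name into a command string that is run over SSH, so the name has to be quoted before it reaches the shell. `quote_arg` comes from the project's `shellsafe` module, which this diff does not show. The sketch below uses the standard library's `shlex.quote` as a stand-in to illustrate the idea; `tail_command` is a hypothetical helper, and the project's `quote_arg` may behave differently (for instance by rejecting bad input outright).

```python
import shlex


def tail_command(container: str) -> str:
    # shlex.quote is a stand-in for the project's quote_arg (an assumption).
    # It returns safe names unchanged and wraps anything else in single quotes.
    return f"docker logs -f --tail 50 {shlex.quote(container)}"


print(tail_command("vllm_node"))
print(tail_command("bad name; rm -rf"))
```

With quoting in place, a value containing a space or `;` reaches `docker logs` as one (invalid) container name rather than being split into extra shell commands.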
@@ -0,0 +1,120 @@
+"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
+extra-vLLM probe. All offline — the disabled checks short-circuit before any
+network call, and the probes are exercised only on the not-configured path.
+"""
+import asyncio
+
+from app.config import Settings
+from app.health import (
+    check_embeddings,
+    check_kokoro,
+    check_parakeet,
+    check_qdrant,
+    check_vllm,
+    probe_vllm_endpoint,
+)
+from app.services import services_from_settings
+
+
+def _settings(monkeypatch, **env) -> Settings:
+    # Pin the topology env vars under test; default the rest to blank so a stray
+    # value in the real environment can't leak into the assertion.
+    keys = [
+        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
+        "DISABLED_SERVICES", "VLLM_CONTAINER",
+    ]
+    for k in keys:
+        monkeypatch.delenv(k, raising=False)
+    for k, v in env.items():
+        monkeypatch.setenv(k, v)
+    return Settings.from_env()
+
+
+# ---- DISABLED_SERVICES parsing ----
+
+def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
+    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
+    assert s.disabled_services == frozenset({"parakeet", "kokoro"})
+
+
+def test_disabled_services_blank_is_empty(monkeypatch):
+    assert _settings(monkeypatch).disabled_services == frozenset()
+
+
+# ---- vLLM container override ----
+
+def test_vllm_container_defaults_to_vllm_node(monkeypatch):
+    assert _settings(monkeypatch).vllm_container == "vllm_node"
+
+
+def test_vllm_container_override(monkeypatch):
+    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"
+
+
+def test_vllm_container_invalid_falls_back(monkeypatch):
+    # A malformed value (space / shell metachar) is rejected at the boundary and
+    # falls back to the default rather than crashing startup or reaching a sink.
+    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"
+
+
+# ---- services map honors the disable list ----
+
+def test_services_from_settings_drops_disabled(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
+        DISABLED_SERVICES="parakeet,qdrant",
+    )
+    svcs = services_from_settings(s)
+    assert "parakeet" not in svcs and "qdrant" not in svcs
+    assert "kokoro" in svcs and "embeddings" in svcs
+
+
+def test_custom_vllm_service_registered(monkeypatch):
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
+         "user": "u", "container": "vllm_node", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    svc = services_from_settings(s)["vllm-spark2"]
+    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"
+
+
+def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
+    # A custom entry can't shadow a built-in key — the built-in wins.
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    assert services_from_settings(s)["parakeet"].kind == "stt"
+
+
+# ---- disabled health checks short-circuit (no network) ----
+
+def test_disabled_check_returns_disabled_verdict(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",  # host set, but disable wins
+        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
+    )
+    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
+        r = asyncio.run(check(s))
+        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+
+
+# ---- vLLM probe: not-configured path is pure ----
+
+def test_probe_vllm_endpoint_unconfigured(monkeypatch):
+    r = asyncio.run(probe_vllm_endpoint("", 8000))
+    assert r["ok"] is False and "not configured" in r["error"]
+
+
+def test_check_vllm_unconfigured_without_spark1(monkeypatch):
+    s = _settings(monkeypatch)  # no SPARK1_HOST
+    r = asyncio.run(check_vllm(s))
+    assert r["ok"] is False and "spark1 not configured" in r["error"]
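The tests in this file pin down the observable behaviour of `Settings.from_env()` for the two new variables, but `app/config.py` is not part of this diff. Below is a minimal sketch of parsing that would satisfy those assertions, assuming a conservative container-name pattern. `parse_disabled_services`, `parse_vllm_container`, `DEFAULT_VLLM_CONTAINER`, and the regex are all hypothetical, not the project's implementation.

```python
import re

DEFAULT_VLLM_CONTAINER = "vllm_node"
# Assumed whitelist: starts alphanumeric, then letters, digits, "_", "." or "-".
_CONTAINER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


def parse_disabled_services(raw: str) -> frozenset[str]:
    # Split on commas, trim and lowercase each name, drop empty fragments.
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def parse_vllm_container(raw: str) -> str:
    # Blank or malformed values fall back to the default instead of raising.
    name = raw.strip()
    return name if _CONTAINER_RE.fullmatch(name) else DEFAULT_VLLM_CONTAINER
```

Falling back instead of raising matches the intent stated in `test_vllm_container_invalid_falls_back`: a bad value should neither crash startup nor reach a shell command.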
@@ -49,6 +49,24 @@ const inputSpec = InputSpec.of({
     placeholder: 'leave blank for 8888',
     masked: false,
   }),
+  vllm_container: Value.text({
+    name: 'vLLM container name (optional)',
+    description:
+      'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
+    required: false,
+    default: null,
+    placeholder: 'leave blank for vllm_node',
+    masked: false,
+  }),
+  disabled_services: Value.text({
+    name: 'Services to hide (optional)',
+    description:
+      "Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM shares port 8000 with Parakeet's default — hide Parakeet so its probe doesn't hit vLLM.)",
+    required: false,
+    default: null,
+    placeholder: 'e.g. parakeet,kokoro',
+    masked: false,
+  }),
   parakeet_host: Value.text({
     name: 'Parakeet host (optional)',
     description:
@@ -9,6 +9,11 @@ export const sparkConfigSchema = z.object({
   spark2_user: z.string().catch(''),
   // Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
   vllm_port: z.string().catch(''),
+  // Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
+  vllm_container: z.string().catch(''),
+  // Optional comma-separated list of built-in services to switch off
+  // (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
+  disabled_services: z.string().catch(''),
   // Optional per-service overrides. Blank => use spark2_host / spark2_user.
   parakeet_host: z.string().catch(''),
   parakeet_user: z.string().catch(''),
@@ -14,6 +14,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
     spark2_host: '',
     spark2_user: '',
     vllm_port: '',
+    vllm_container: '',
+    disabled_services: '',
     parakeet_host: '',
     parakeet_user: '',
     parakeet_container: '',
@@ -52,6 +54,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
       SPARK2_HOST: cfg.spark2_host,
       SPARK2_USER: cfg.spark2_user,
       VLLM_PORT: cfg.vllm_port,
+      VLLM_CONTAINER: cfg.vllm_container,
+      DISABLED_SERVICES: cfg.disabled_services,
       PARAKEET_HOST: cfg.parakeet_host,
       PARAKEET_USER: cfg.parakeet_user,
       PARAKEET_CONTAINER: cfg.parakeet_container,
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'

 export const v0_1_0 = VersionInfo.of({
-  version: '0.23.0:0',
+  version: '0.24.0:0',
   releaseNotes: {
     en_US:
-      "v0.23.0:0 — local / fine-tuned model support. You can now add a model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune), not just a Hugging Face repo. Use the new \"+ Add local model\" button under LLM swap: give it the model's absolute path on the Spark, an optional chat-template path, and the usual launch knobs. On swap, Spark Control bind-mounts that directory into the vLLM container at the same path (via the launch script's existing VLLM_SPARK_EXTRA_DOCKER_ARGS hook — nothing to change on the Spark) and runs `vllm serve <dir>`. Local models show a \"local\" badge and their path instead of a Hugging Face link, and their weights are never offered for dashboard deletion (that directory is your own training output, not a re-downloadable cache). API: POST /api/models now accepts `local_path` (set exactly one of `repo` or `local_path`), validated against a strict path whitelist with no traversal.",
+      "v0.24.0:0 — configurable cluster topology. Spark Control no longer assumes our exact layout, so a cluster that's wired differently can be monitored without forking. Three new optional settings in Configure Sparks: (1) vLLM container name — defaults to \"vllm_node\"; set it if your swappable vLLM runs under a different container name (the swap log view and pre-flight validator exec into it by name). (2) Services to hide — a comma-separated list of built-in services your cluster doesn't run (parakeet, kokoro, embeddings, qdrant); hidden ones show no tile and are never probed, so e.g. a vLLM sharing Parakeet's default port 8000 no longer gets a confusing Parakeet probe. (3) Monitor a second vLLM — register a vLLM on another Spark as a custom service with kind \"vllm\" (in /data/services-overrides.yaml); it gets a read-only health tile (loaded model + container state + start/stop/restart) alongside the swappable one. API: /api/endpoints now reports a `disabled` flag per service.",
   },
   migrations: {
     up: async ({ effects }) => {},
+31
@@ -52,6 +52,26 @@ The **Update** button runs `git fetch && git reset --hard origin/<branch> && doc

 3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.

+## Configurable topology (v0.24.0+)
+
+For a cluster wired differently from the reference layout, three optional knobs in **Configure Sparks** (no fork needed):
+
+- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
+- **Services to hide** — comma-separated `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers — e.g. a vLLM on port 8000 colliding with Parakeet's default.
+- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
+
+  ```yaml
+  custom:
+    - key: vllm-spark2
+      kind: vllm
+      host: <spark-2-ip>
+      user: <ssh-user>
+      container: vllm_node
+      port: 8000
+  ```
+
+  It gets a read-only tile: loaded model (via `/v1/models`), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user — Show Public Key.)
+
 ## Adding a new model

 1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
@@ -81,6 +101,17 @@ cd ~/spark-vllm-docker
 docker logs -f vllm_node     # wait for "Application startup complete."
 ```

+## Sideload (`make install`) can't reach the server
+
+Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`. Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up — `curl -sk https://<server-ip>/rpc/v1` returns 200.
+
+- **Don't** work around it with `start-cli -H https://<server-ip> package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
+- **Fix:** make the name resolve again, then re-run `make install`:
+  - `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
+  - `echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).
+
+Note this only blocks installing to *your own* Start9 — building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).
+
 ## Diagnostics

 ```bash