docs: v0.24.0:0 committed/tagged/pushed — Gitea release asset + live install still pending

v0.24.0:0 - configurable cluster topology (vllm container name, hide services, second-vllm monitor)
Make the cluster topology configurable so an adopter wired differently (vLLM on both Sparks, port 8000, different container name, no Parakeet) can monitor without forking. Covers the OpenClaw report P4/P5/#6. - VLLM_CONTAINER override (default vllm_node), validated at the boundary and quote_arg-quoted into the swap log-tail + pre-flight validator exec. - DISABLED_SERVICES list: hidden services show no tile and are skipped by status/deep-health/connectivity probes (kills the Parakeet-on-8000 collision). - kind: vllm custom service monitors a second Spark's vLLM via the shared probe_vllm_endpoint; /api/endpoints gains a disabled flag. Swap mechanism intentionally not generalized to raw docker run (that's coordination, roadmap item 4).
2026-06-17 23:11:14 -05:00 · 2026-06-17 23:03:33 -05:00 · 2026-06-17 22:36:41 -05:00 · 2026-06-17 22:27:41 -05:00 · 2026-06-17 21:29:27 -05:00 · 2026-06-17 21:23:21 -05:00
27 changed files with 763 additions and 62 deletions
@@ -55,12 +55,12 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou

 ## Current state

- **Working (v0.21.0:1, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
+- **Live service runs v0.22.0:0** (installed and serving). **v0.24.0:0 is the latest in tree — built clean, committed (`26070eb`), tagged `v0.24.0`, and pushed to `gitea/master`. Two close-out steps remain: (a) `make release` to publish the s9pk asset to Gitea Releases (NOT run this session — needs `GITEA_URL` + write `GITEA_TOKEN`, neither in env; the s9pk is built and waiting in `package/`), and (b) the live install. v0.23.0:0 is in the same boat — also committed/tagged/Gitea-published but install PENDING.** Both installs blocked on the same mDNS issue (P3 line below). Working features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (blank ⇒ 8888) + **configurable topology** (vLLM container name, hide-services list, second-Spark vLLM monitor — v0.24.0:0). Local/fine-tuned models (v0.23.0:0) + the topology knobs land live once these installs go through. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
 - **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
+- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (102 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), and the configurable-topology layer (`test_topology.py`: `DISABLED_SERVICES` parsing, `vllm_container` override, disabled-service skip in `services_from_settings` + `check_*`, `probe_vllm_endpoint`). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
 - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
 - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
 - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
 - **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — DONE in tree, staged as **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). **Not yet built/installed/committed — awaiting go/no-go.** (2) local-path/fine-tuned models (in ROADMAP under Dashboard). (3) configurable topology (service→Spark→port map + container names). (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H <ip>` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). (3) **configurable topology** — DONE in tree, staged as **v0.24.0:0** (built clean, not yet committed/installed). Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`, threaded into the swap log-tail + validator exec via `quote_arg`); "services to hide" (`DISABLED_SERVICES` comma list → `Settings.disabled_services` frozenset, skipped by `services_from_settings`, the `check_*` probes, deep-health `run_all`, and connectivity logging — kills the Parakeet-on-8000 collision); second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (`probe_vllm_endpoint` shared with `check_vllm`). `/api/endpoints` gained a `disabled` flag; the health-dot hides when disabled. 102 tests pass (+8 in `test_topology.py`). Swap mechanism deliberately NOT generalized to raw `docker run` (that's coordination, item 4). Install pending — same mDNS situation as v0.23.0. Next: (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
@@ -10,8 +10,8 @@ Driven by the one other Spark Control adopter (a colleague running OpenClaw + cr

 Sequenced:
 1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
-2. **Local-path / fine-tuned model support** — see the dedicated item under "## Dashboard" below. Independently wanted; his merged `ten31-v2` (a directory, not an HF repo) is the motivating case.
-3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
+2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
+3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
 4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
   - **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
   - **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
@@ -34,7 +34,6 @@ Sequenced:
 - Second audio worker / queueing layer; revisit which services share Spark 2.

 ## Dashboard
- Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (`models.yaml` + custom overrides) and the "Add custom model" path (`POST /api/models`) only accepts an HF `org/name` repo (`shellsafe._HF_REPO_RE`), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the `org/name` regex (path whitelist + `shlex.quote`, no traversal); (b) `models.build_launch_command` / `launch-cluster.sh` able to `vllm serve <path>`; (c) `disk.py` size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15 — a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
 - Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
 - Spark host update actions (OS/driver) from the UI.
 - Open WebUI link-out integration; richer per-service detail views.
@@ -25,6 +25,22 @@ npm run prettier   # prettier --write startos (no semicolons, single quotes, tra
 - Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
 - New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).

+## Releasing to Gitea
+
+The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:
+
+```bash
+# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
+cd package && make x86                       # build
+# 2. commit + push the source change
+git tag vX.Y.Z && git push gitea vX.Y.Z      # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
+make install                                 # optional: sideload to your own server (restarts it — go/no-go)
+# 3. publish the s9pk as a release asset (needs a write-scoped token):
+GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
+```
+
+`make release` → `scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note Gitea returns `browser_download_url` on its configured ROOT_URL (may be a `.local` name) — an off-LAN adopter pulls via whatever address actually reaches the Gitea.
+
 ## Layout

 - `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
@@ -1,13 +1,44 @@
 from __future__ import annotations
+import logging
 import os
 from dataclasses import dataclass
 from pathlib import Path

+from .shellsafe import validate_container
+
+log = logging.getLogger(__name__)
+

 def _env(name: str, default: str = "") -> str:
    return os.environ.get(name, default)


+def _env_container(name: str, default: str) -> str:
+    """Resolve a container-name env var, validating it at the config boundary.
+
+    The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
+    the sink — but per the repo's two-layer convention it's also whitelist-checked
+    here. A malformed optional value falls back to `default` rather than crashing
+    daemon startup (mirrors `_env_int` for VLLM_PORT)."""
+    val = os.environ.get(name, "") or default
+    try:
+        return validate_container(val)
+    except ValueError:
+        log.warning("ignoring invalid %s=%r; using %r", name, val, default)
+        return default
+
+
+def _env_set(name: str) -> frozenset[str]:
+    """Parse a comma-separated env var into a lowercased frozenset of keys.
+
+    Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
+    support service can switch its tile + probes off entirely (rather than have
+    the probe hit whatever else listens on that port — e.g. a vLLM sharing
+    Parakeet's default 8000)."""
+    raw = os.environ.get(name, "")
+    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
+
+
 def _env_int(name: str, default: int) -> int:
    """Parse an int env var, falling back to `default` when unset, blank, or
    malformed. The StartOS Configure panel passes optional numeric fields as an
@@ -63,6 +94,8 @@ class Settings:
    ssh_known_hosts: str
    models_yaml: str
    vllm_port: int
+    vllm_container: str
+    disabled_services: frozenset[str]
    parakeet_port: int
    kokoro_port: int
    embed_port: int
@@ -116,6 +149,15 @@ class Settings:
            ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
            models_yaml=_resolve_models_yaml(),
            vllm_port=_env_int("VLLM_PORT", 8888),
+            # Container name for the swappable vLLM on Spark 1. Defaults to the
+            # bundled launch-cluster.sh container; override if you named yours
+            # something else (the swap log-tail and pre-flight validator exec
+            # into it by name).
+            vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
+            # Built-in support-service keys (parakeet, kokoro, embeddings,
+            # qdrant) the deployment doesn't run — hidden from the dashboard and
+            # never probed.
+            disabled_services=_env_set("DISABLED_SERVICES"),
            parakeet_port=_env_int("PARAKEET_PORT", 8000),
            kokoro_port=_env_int("KOKORO_PORT", 8880),
            embed_port=_env_int("EMBED_PORT", 8088),
@@ -10,6 +10,17 @@ Format:
        port: 8001
        health_path: /health
        image: nvcr.io/nim/nvidia/riva-multilingual:latest
+
+A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
+the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
+probed via /v1/models plus container state and start/stop/restart:
+    custom:
+      - key: vllm-spark2
+        kind: vllm
+        host: <spark-2-ip>
+        user: <ssh-user>
+        container: vllm_node
+        port: 8000
 """
 from __future__ import annotations
 import os
@@ -377,6 +377,10 @@ class DeepHealth:
    async def run_all(self) -> dict[str, ProbeResult]:
        results = {}
        for name in self.PROBES:
+            # Don't deep-probe a service the deployment switched off — its port
+            # may be answered by something else (e.g. a vLLM on Parakeet's 8000).
+            if name in self.settings.disabled_services:
+                continue
            results[name] = await self.run_one(name)
        return results

@@ -15,6 +15,7 @@ from dataclasses import dataclass
 from typing import Optional

 from .config import Settings
+from .shellsafe import quote_arg
 from .ssh import ssh_run


@@ -76,16 +77,52 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
    return HostDiskResult(host=host, on_disk=True, size_bytes=size)


-async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
-    """Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
+async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
+    """Return whether a local model directory exists on this host and its size.
+
+    For locally fine-tuned models (a Spark directory, not an HF cache entry). The
+    path is whitelisted at the API boundary (shellsafe.validate_local_path); we
+    shlex-quote it here in depth.
+    """
+    if not host or not user:
+        return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
+    qp = quote_arg(path)
+    cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
+    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
+    if rc != 0:
+        return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
+    raw = out.strip()
+    if raw == "MISSING" or raw == "":
+        return HostDiskResult(host=host, on_disk=False)
+    try:
+        size = int(raw.splitlines()[-1])
+    except ValueError:
+        return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
+    return HostDiskResult(host=host, on_disk=True, size_bytes=size)
+
+
+async def probe_disk(
+    repo: str, mode: str, settings: Settings, *, local_path: str | None = None
+) -> DiskStatus:
+    """Probe one model across the relevant Sparks based on its mode (solo|cluster).
+
+    A local model (local_path set) is probed by directory; otherwise by HF cache.
+    """
    hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
    if mode == "cluster" and settings.spark2_host:
        hosts.append((settings.spark2_host, settings.spark2_user))

-    results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
+    if local_path:
+        results = await asyncio.gather(
+            *(probe_local_host(h, u, local_path, settings) for h, u in hosts)
+        )
+        key = local_path
+    else:
+        results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
+        key = repo
    on_disk = any(r.on_disk for r in results)
    total = sum(r.size_bytes for r in results)
-    return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))
+    return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))


 async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
@@ -6,17 +6,28 @@ from .config import Settings
 _TIMEOUT = 3.0


-async def check_vllm(settings: Settings) -> dict:
-    base_url = (
-        f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
-        if settings.spark1_host
-        else None
-    )
-    if not settings.spark1_host:
-        return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
+def _disabled(settings: Settings, key: str) -> dict | None:
+    """A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
+
+    Lets an adopter who doesn't run a given support service switch its probe off
+    entirely — so the probe never hits whatever else listens on that port, and
+    the connectivity log doesn't record it as perpetually down."""
+    if key in settings.disabled_services:
+        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+    return None
+
+
+async def probe_vllm_endpoint(host: str, port: int) -> dict:
+    """Probe any OpenAI-compatible vLLM at host:port via /v1/models.
+
+    Shared by the primary (Spark 1) health check and any extra vLLM registered
+    as a custom service (kind: vllm) to monitor a second Spark."""
+    base_url = f"http://{host}:{port}/v1" if host else None
+    if not host:
+        return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
    try:
        async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
-            r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
+            r = await c.get(f"http://{host}:{port}/v1/models")
            r.raise_for_status()
            ids = [m["id"] for m in r.json().get("data", [])]
            return {
@@ -29,7 +40,15 @@ async def check_vllm(settings: Settings) -> dict:
        return {"ok": False, "error": str(e), "base_url": base_url}


+async def check_vllm(settings: Settings) -> dict:
+    if not settings.spark1_host:
+        return {"ok": False, "error": "spark1 not configured", "base_url": None}
+    return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
+
+
 async def check_parakeet(settings: Settings) -> dict:
+    if d := _disabled(settings, "parakeet"):
+        return d
    base_url = (
        f"http://{settings.parakeet_host}:{settings.parakeet_port}"
        if settings.parakeet_host
@@ -47,6 +66,8 @@ async def check_parakeet(settings: Settings) -> dict:


 async def check_kokoro(settings: Settings) -> dict:
+    if d := _disabled(settings, "kokoro"):
+        return d
    base_url = (
        f"http://{settings.kokoro_host}:{settings.kokoro_port}"
        if settings.kokoro_host
@@ -68,6 +89,8 @@ async def check_kokoro(settings: Settings) -> dict:


 async def check_embeddings(settings: Settings) -> dict:
+    if d := _disabled(settings, "embeddings"):
+        return d
    base_url = (
        f"http://{settings.embed_host}:{settings.embed_port}"
        if settings.embed_host
@@ -89,6 +112,8 @@ async def check_embeddings(settings: Settings) -> dict:


 async def check_qdrant(settings: Settings) -> dict:
+    if d := _disabled(settings, "qdrant"):
+        return d
    base_url = (
        f"http://{settings.qdrant_host}:{settings.qdrant_port}"
        if settings.qdrant_host
@@ -1,15 +1,33 @@
 from __future__ import annotations
+import logging
 from typing import Literal, Optional
 import yaml
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator

 from .overrides import apply_knobs_to_args, load_overrides
-from .shellsafe import quote_arg, quote_args
+from .shellsafe import quote_arg, quote_args, validate_local_path
+
+log = logging.getLogger(__name__)
+
+
+def _chat_template_path(vllm_args: list[str]) -> str | None:
+    """Extract the path from a `--chat-template=<path>` arg, if present."""
+    for a in vllm_args:
+        if a.startswith("--chat-template="):
+            return a.split("=", 1)[1]
+    return None
+
+
+def _is_within(path: str, base: str) -> bool:
+    """True if `path` is `base` itself or lives inside it (lexical check)."""
+    base = base.rstrip("/")
+    return path == base or path.startswith(base + "/")


 class ModelDef(BaseModel):
    display_name: str
-    repo: str
+    repo: str = ""                   # HF 'org/name'; empty for a local model
+    local_path: str | None = None    # absolute dir on the Spark; set => local model
    size_gb: float
    mode: Literal["solo", "cluster"]
    capabilities: list[str] = Field(default_factory=list)
@@ -19,6 +37,38 @@ class ModelDef(BaseModel):
    knobs: dict | None = None       # user-customized; merged at launch time
    custom: bool = False             # True if this came from /data overrides

+    @model_validator(mode="after")
+    def _validate_source(self) -> "ModelDef":
+        if bool(self.repo) == bool(self.local_path):
+            raise ValueError(
+                f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
+                f"or 'local_path' (Spark directory)"
+            )
+        if self.local_path:
+            # Single place that enforces the path whitelist, so YAML/override
+            # entries get the same boundary check as the API. The quote_arg sink
+            # is still defense-in-depth.
+            validate_local_path(self.local_path)
+            # Only local_path is bind-mounted into the vLLM container, so any
+            # --chat-template path must live inside it or vLLM can't find it.
+            tmpl = _chat_template_path(self.vllm_args)
+            if tmpl is not None and not _is_within(tmpl, self.local_path):
+                raise ValueError(
+                    f"--chat-template path {tmpl!r} must be inside the model "
+                    f"directory {self.local_path!r} (only that directory is mounted "
+                    f"into the container)"
+                )
+        return self
+
+    @property
+    def is_local(self) -> bool:
+        return bool(self.local_path)
+
+    @property
+    def source(self) -> str:
+        """What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
+        return self.local_path if self.local_path else self.repo
+

 class Defaults(BaseModel):
    port: int = 8888
@@ -47,7 +97,8 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
            continue
        defaults_dump = {
            "display_name": entry.get("display_name", key),
-            "repo": entry["repo"],
+            "repo": entry.get("repo", ""),
+            "local_path": entry.get("local_path"),
            "size_gb": float(entry.get("size_gb", 0)),
            "mode": entry.get("mode", "solo"),
            "capabilities": entry.get("capabilities") or [],
@@ -57,7 +108,12 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
            "knobs": entry.get("knobs"),
            "custom": True,
        }
-        new_models[key] = ModelDef.model_validate(defaults_dump)
+        # A single malformed override entry (bad path, missing source, etc.) must
+        # not take down the whole catalog — skip it and keep the rest loadable.
+        try:
+            new_models[key] = ModelDef.model_validate(defaults_dump)
+        except Exception as e:
+            log.warning("skipping invalid custom model %r: %s", key, e)

    return Catalog(defaults=catalog.defaults, models=new_models)

@@ -78,7 +134,21 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
    solo = "--solo " if model.mode == "solo" else ""
    base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
    args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
-    # repo + args are user-controlled (custom models, knobs); shlex.quote each so
-    # they cannot break out of the SSH shell command. shlex.split (used by the
+    # source + args are user-controlled (custom models, knobs); shlex.quote each
+    # so they cannot break out of the SSH shell command. shlex.split (used by the
    # vLLM pre-flight validator) cleanly reverses this quoting.
-    return f"./launch-cluster.sh {solo}-d exec vllm serve {quote_arg(model.repo)} {quote_args(args)}"
+    prefix = ""
+    if model.local_path:
+        # A local model's directory isn't in the HF cache the launch script
+        # already mounts, so bind-mount it at the SAME path inside the vllm
+        # container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
+        # path inside and out means `vllm serve <dir>` and any
+        # `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
+        # change needed. (The env assignment sits before the script, so the
+        # validator's `serve`-keyed shlex round-trip is unaffected.)
+        mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
+        prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
+    return (
+        f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
+        f"{quote_arg(model.source)} {quote_args(args)}"
+    )
@@ -14,7 +14,7 @@ Shape:
    custom:
      - key: my-new-model
        display_name: My New Model (from download)
-        repo: my-org/my-model
+        repo: my-org/my-model        # an HF repo; OR set local_path instead (exactly one)
        size_gb: 20
        mode: solo
        description: null
@@ -25,6 +25,12 @@ Shape:
          fastsafetensors: true
          prefix_caching: true
          kv_cache_dtype: fp8
+      - key: my-finetune                       # a local/fine-tuned model (a directory on the Spark)
+        display_name: My Fine-tune
+        local_path: /home/you/models/my-finetune
+        size_gb: 59
+        mode: solo
+        vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
 """
 from __future__ import annotations
 import os
@@ -6,7 +6,7 @@ from pathlib import Path
 from fastapi import FastAPI, HTTPException, Query, Request
 from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
 from fastapi.staticfiles import StaticFiles
-from pydantic import BaseModel
+from pydantic import BaseModel, ValidationError
 from typing import Literal

 from .config import Settings
@@ -20,9 +20,9 @@ from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
 from .matrix_bridge import MatrixBridgeManager
-from .models import load_catalog
+from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
 from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
 from .services import docker_state, run_action, services_from_settings
@@ -183,7 +183,8 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
 class CustomModelBody(BaseModel):
    key: str
    display_name: str
-    repo: str
+    repo: str = ""
+    local_path: str | None = None
    size_gb: float = 0
    mode: Literal["solo", "cluster"] = "solo"
    description: str | None = None
@@ -196,8 +197,17 @@ class CustomModelBody(BaseModel):
 async def post_model(body: CustomModelBody) -> dict:
    if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
        raise HTTPException(400, "key must be alphanumeric/-/_ only")
+    # Validate the full entry BEFORE persisting (exactly-one source, local-path
+    # whitelist, chat-template location). Doing it via ModelDef means the API and
+    # the YAML-override path share one set of rules, and a bad entry can't be
+    # written to /data and then break catalog load.
    try:
-        validate_repo(body.repo)
+        ModelDef.model_validate(body.model_dump())
+        if body.repo:
+            validate_repo(body.repo)  # HF charset (the model only validates local paths)
+    except ValidationError as e:
+        msg = e.errors()[0]["msg"] if e.errors() else str(e)
+        raise HTTPException(400, msg.removeprefix("Value error, "))
    except ValueError as e:
        raise HTTPException(400, str(e))
    if body.key in catalog.models and not catalog.models[body.key].custom:
@@ -229,7 +239,13 @@ async def get_models_disk_status() -> dict:
        return {"configured": False, "models": {}}
    keys = list(catalog.models.keys())
    statuses = await asyncio.gather(*(
-        probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
+        probe_disk(
+            catalog.models[k].repo,
+            catalog.models[k].mode,
+            settings,
+            local_path=catalog.models[k].local_path,
+        )
+        for k in keys
    ), return_exceptions=True)
    out: dict[str, dict] = {}
    for k, s in zip(keys, statuses):
@@ -260,6 +276,14 @@ async def del_model_disk(key: str) -> dict:
        raise HTTPException(404, f"unknown model: {key}")
    m = catalog.models[key]

+    # Never rm a local fine-tune directory from the dashboard — it's irreplaceable
+    # training output the user placed by hand, not a re-downloadable HF cache.
+    if m.local_path:
+        raise HTTPException(
+            400,
+            "this is a local model; its directory must be managed on the Spark, not deleted from here",
+        )
+
    # Refuse if currently loaded
    try:
        vllm = await check_vllm(settings)
@@ -476,6 +500,10 @@ async def get_services() -> dict:
            http = await check_embeddings(settings)
        elif name == "qdrant":
            http = await check_qdrant(settings)
+        elif svc.kind == "vllm":
+            # An extra vLLM monitored on another Spark (registered as a custom
+            # service). Probe its own host/port, not the primary Spark 1 one.
+            http = await probe_vllm_endpoint(svc.host, svc.port)
        elif svc.kind == "bot":
            # No HTTP health endpoint (host networking, no port) — judged purely
            # by docker state. http_ready stays None so the badge isn't pinned
@@ -497,7 +525,7 @@ async def get_services() -> dict:
            # Prefer the check fn's own top-level model key (embeddings reports
            # it there); fall back to a model field inside detail for services
            # whose /health embeds it (parakeet).
-            "model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
+            "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
            "docker_state": docker.get("state"),
            "restart_count": docker.get("restart_count"),
            "started_at": docker.get("started_at"),
@@ -775,17 +803,20 @@ async def get_endpoints() -> dict:
            "base_url": vllm.get("base_url"),
            "model": vllm.get("current_model"),
            "openai_compat": True,
+            "disabled": bool(vllm.get("disabled")),
        },
        "parakeet": {
            "ready": bool(parakeet.get("ok")),
            "base_url": parakeet.get("base_url"),
            "kind": "stt",
            "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
+            "disabled": bool(parakeet.get("disabled")),
        },
        "kokoro": {
            "ready": bool(kokoro.get("ok")),
            "base_url": kokoro.get("base_url"),
            "kind": "tts",
+            "disabled": bool(kokoro.get("disabled")),
        },
        "embeddings": {
            "ready": bool(embeddings.get("ok")),
@@ -794,12 +825,14 @@ async def get_endpoints() -> dict:
            "model": embeddings.get("model"),
            # The proxied OpenAI-compatible endpoints live on Spark Control itself.
            "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
+            "disabled": bool(embeddings.get("disabled")),
        },
        "qdrant": {
            "ready": bool(qdrant.get("ok")),
            "base_url": qdrant.get("base_url"),
            "kind": "vectordb",
            "collection": settings.qdrant_collection or None,
+            "disabled": bool(qdrant.get("disabled")),
        },
    }

@@ -813,12 +846,15 @@ async def get_status() -> dict:
        check_embeddings(settings),
        check_qdrant(settings),
    )
-    # Feed health into the connectivity log (deduped — only logs on transition)
-    record_state("vllm", bool(vllm.get("ok")))
-    record_state("parakeet", bool(parakeet.get("ok")))
-    record_state("kokoro", bool(kokoro.get("ok")))
-    record_state("embeddings", bool(embeddings.get("ok")))
-    record_state("qdrant", bool(qdrant.get("ok")))
+    # Feed health into the connectivity log (deduped — only logs on transition).
+    # Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
+    # perpetually down.
+    for _name, _r in (
+        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
+        ("embeddings", embeddings), ("qdrant", qdrant),
+    ):
+        if not _r.get("disabled"):
+            record_state(_name, bool(_r.get("ok")))
    current_key = _identify_current_model(vllm.get("current_model"))
    return {
        "configured": settings.configured,
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
 appropriate host.
 """
 from __future__ import annotations
+import logging
 import time
 from dataclasses import dataclass
 from typing import Literal, Optional
@@ -13,6 +14,8 @@ from .config import Settings
 from .shellsafe import quote_arg
 from .ssh import ssh_run

+log = logging.getLogger(__name__)
+

 # Cache the "unreachable" verdict per (host, user) for a short period so that a
 # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -103,7 +106,13 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
    }
    for entry in load_custom_services():
        key = entry.get("key")
-        if not key or key in out:
+        if not key:
+            continue
+        if key in out:
+            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
+            # an adopter who picked a colliding key for, say, a second vLLM sees
+            # why no tile appeared instead of a silent no-op.
+            log.warning("custom service %r collides with a built-in name; ignoring", key)
            continue
        out[key] = ServiceDef(
            name=key,
@@ -113,7 +122,9 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
            container=entry.get("container", key),
            port=int(entry.get("port", 0)),
        )
-    return out
+    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
+    # show no tile and are never probed/auto-restarted.
+    return {k: v for k, v in out.items() if k not in s.disabled_services}


 async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
@@ -28,6 +28,12 @@ _IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
 # Docker container / volume name (Docker's own rule).
 _CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")

+# Absolute filesystem path to a local model directory on a Spark. Conservative
+# charset (letters, digits, and safe path punctuation) with a required leading
+# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
+# and '..' segments) is rejected separately in validate_local_path.
+_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")
+

 def validate_repo(repo: str) -> str:
    """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
@@ -50,6 +56,25 @@ def validate_container(name: str) -> str:
    return name


+def validate_local_path(path: str) -> str:
+    """Return `path` if it is a safe absolute model directory path; else ValueError.
+
+    For locally fine-tuned models served by directory (not an HF repo). Requires
+    an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
+    caller cannot traverse out of an intended models directory. The `quote_arg`
+    sink still quotes it in depth — this is the boundary check.
+    """
+    p = path or ""
+    if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
+        raise ValueError(
+            f"invalid local model path (expected an absolute path, no spaces or "
+            f"shell metacharacters): {path!r}"
+        )
+    if any(seg in (".", "..") for seg in p.split("/")):
+        raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
+    return p
+
+
 def quote_arg(value: object) -> str:
    """shlex.quote a single token for safe embedding in a shell command string."""
    return shlex.quote(str(value))
@@ -60,6 +60,7 @@ function renderCards() {
      ? `<div class="desc">${escapeHtml(m.description)}</div>`
      : '';
    const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
+    const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
    // Disk-presence pill + trash button. Until /api/models/disk-status comes back,
    // we don't know — render a neutral placeholder.
    const disk = state.disk_status[key];
@@ -73,8 +74,10 @@ function renderCards() {
      }
    }
    // Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
+    // Never offered for local models: their directory is hand-placed training output,
+    // not a re-downloadable HF cache (the server refuses the delete too).
    let trashBtn = '';
-    if (state.disk_status_loaded && disk && disk.on_disk) {
+    if (state.disk_status_loaded && disk && disk.on_disk && !m.local_path) {
      const disabled = isActive || isSwapping;
      const tip = isActive
        ? 'Currently loaded — switch to another model first'
@@ -92,6 +95,9 @@ function renderCards() {
      primaryBtn = `<button class="btn" disabled>Current</button>`;
    } else if (isOnDisk) {
      primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`;
+    } else if (m.local_path) {
+      // A local model can't be "downloaded" — its directory has to exist on the Spark.
+      primaryBtn = `<button class="btn" disabled title="Directory not found on the Spark — create it there, then refresh">Not found on Spark</button>`;
    } else {
      const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
      primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
@@ -102,12 +108,15 @@ function renderCards() {
        <span class="tag mode-${m.mode}">${m.mode}</span>
        <span class="tag">${m.size_gb} GB</span>
        ${customPill}
+        ${localPill}
        ${diskPill}
        ${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
      </div>
      ${desc}
      <div class="muted small repo">
-        <a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>
+        ${m.local_path
+          ? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
+          : `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
      </div>
      <div class="spacer"></div>
      <div class="card-actions">
@@ -923,6 +932,10 @@ function renderHealth(status) {
  function setDot(id, ok, payload) {
    const item = el(id);
    if (!item) return;
+    // A service switched off via DISABLED_SERVICES isn't part of this
+    // deployment — hide its indicator entirely rather than show it as down.
+    if (payload && payload.disabled) { item.classList.add('hidden'); return; }
+    item.classList.remove('hidden');
    const dot = item.querySelector('.dot');
    dot.classList.remove('ok', 'bad', 'warn');
    if (ok === true) dot.classList.add('ok');
@@ -1671,6 +1684,60 @@ function setupAdvancedDialog() {
  el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
 }

+function openLocalModelDialog() {
+  const dlg = el('#local-model-dialog');
+  el('#lm-key').value = '';
+  el('#lm-name').value = '';
+  el('#lm-path').value = '';
+  el('#lm-chat').value = '';
+  el('#lm-size').value = '';
+  el('#lm-mode').value = 'solo';
+  el('#lm-desc').value = '';
+  el('#lm-mml').value = 32768;
+  el('#lm-gmu').value = 0.85;
+  el('#lm-gmu-out').value = '0.85';
+  el('#lm-fst').checked = true;
+  el('#lm-pcache').checked = true;
+  el('#lm-fp8').checked = true;
+  dlg.showModal();
+}
+
+function setupLocalModelDialog() {
+  el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
+  el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
+  el('#local-model-form').addEventListener('submit', async (e) => {
+    e.preventDefault();
+    const chat = el('#lm-chat').value.trim();
+    const body = {
+      key: el('#lm-key').value.trim(),
+      display_name: el('#lm-name').value.trim(),
+      local_path: el('#lm-path').value.trim(),
+      size_gb: parseFloat(el('#lm-size').value) || 0,
+      mode: el('#lm-mode').value,
+      description: el('#lm-desc').value.trim() || null,
+      // A fine-tune's chat template (if any) rides along as a launch flag.
+      vllm_args: chat ? [`--chat-template=${chat}`] : [],
+      knobs: {
+        max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
+        gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
+        fastsafetensors: el('#lm-fst').checked,
+        prefix_caching: el('#lm-pcache').checked,
+        kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
+      },
+    };
+    try {
+      await fetchJSON('/api/models', {
+        method: 'POST',
+        headers: { 'content-type': 'application/json' },
+        body: JSON.stringify(body),
+      });
+      el('#local-model-dialog').close();
+      await loadModels();
+      pollStatus();
+    } catch (e) { alert('Add local model failed: ' + e.message); }
+  });
+}
+
 // ===================== NIM installer =====================

 const nimState = {
@@ -2034,8 +2101,10 @@ async function init() {
    if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
  });
  el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
+  el('#open-local').addEventListener('click', openLocalModelDialog);
  setupCatalogDialog();
  setupAdvancedDialog();
+  setupLocalModelDialog();
  // Open WebUI link from /api/config
  try {
    state.config = await fetchJSON('/api/config');
@@ -229,6 +229,7 @@
      <div class="section-header">
        <h2 class="section-title">LLM swap</h2>
        <button id="open-download" class="btn small-btn">+ Download a new model</button>
+        <button id="open-local" class="btn small-btn">+ Add local model</button>
      </div>

      <dialog id="catalog-dialog" class="modal">
@@ -261,6 +262,37 @@
        </form>
      </dialog>

+      <dialog id="local-model-dialog" class="modal">
+        <form method="dialog" class="modal-form" id="local-model-form">
+          <h3>Add a local / fine-tuned model</h3>
+          <p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
+          <label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
+          <label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
+          <label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
+          <label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
+          <label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
+          <label class="modal-row"><span>Mode</span>
+            <select id="lm-mode">
+              <option value="solo">solo (Spark 1 only)</option>
+              <option value="cluster">cluster (both Sparks via Ray)</option>
+            </select>
+          </label>
+          <label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
+          <fieldset class="modal-fieldset">
+            <legend>Default launch knobs</legend>
+            <label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
+            <label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
+            <label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
+            <label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
+            <label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
+          </fieldset>
+          <div class="modal-actions">
+            <button type="button" id="lm-cancel" class="btn">Cancel</button>
+            <button type="submit" class="btn primary">Add local model</button>
+          </div>
+        </form>
+      </dialog>
+
      <dialog id="disk-delete-dialog" class="modal">
        <form method="dialog" class="modal-form">
          <h3>Delete model weights from disk?</h3>
@@ -694,6 +694,7 @@ main {
 .card .repo a { color: inherit; text-decoration: none; }
 .card .repo a:hover { color: var(--info); text-decoration: underline; }
 .card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
+.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
 .tag {
  background: var(--surface-2);
  border: 1px solid var(--border);
@@ -738,6 +739,7 @@ main {
 .card .adv-btn,
 .card .test-btn { padding: 8px 12px; font-size: 12px; }
 .card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
+.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
 .tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
 .tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
 .card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
@@ -7,6 +7,7 @@ from typing import Optional

 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run, ssh_stream, StreamHandle


@@ -112,7 +113,7 @@ class SwapManager:

        # Step 3: tail logs until the ready marker (or timeout)
        job.state = "tailing"
-        tail_cmd = "docker logs -f --tail 50 vllm_node"
+        tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
        job.append(f"$ {tail_cmd}")
        timeout = max(model.expected_ready_seconds * 2, 600)
        handle = StreamHandle()
@@ -22,6 +22,7 @@ from typing import Any

 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run


@@ -114,7 +115,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
    # Pipe the JSON args list to a here-doc Python invocation. The validator
    # reads from stdin to avoid shell-escaping the args themselves.
    cmd = (
-        f"echo '{payload}' | docker exec -i vllm_node python3 -c "
+        f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
        + shlex.quote(_VALIDATOR_SCRIPT)
    )

@@ -7,6 +7,9 @@ the command back into the exact token list. The vLLM pre-flight validator
 """
 import shlex

+import pytest
+from pydantic import ValidationError
+
 from app.models import Defaults, ModelDef, build_launch_command

 DEFAULTS = Defaults(port=8888, host="0.0.0.0")
@@ -65,3 +68,81 @@ def test_injection_via_vllm_arg_stays_literal():
    payload = "--foo=$(touch /tmp/pwned)"
    cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
    assert payload in shlex.split(cmd)  # preserved as one inert token
+
+
+# ---- local / fine-tuned models (served by directory, not HF repo) ----
+
+def test_local_model_bind_mounts_dir_and_serves_the_path():
+    m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
+    cmd = build_launch_command("k", m, DEFAULTS)
+    tokens = shlex.split(cmd)
+    # The launch script's hook bind-mounts the host dir at the SAME container path.
+    assert tokens[0] == (
+        "VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
+    )
+    # vLLM is pointed at the directory, not an HF repo id.
+    i = tokens.index("serve")
+    assert tokens[i + 1] == "/home/u/models/ft-v2"
+    assert "--max-model-len=2048" in tokens
+
+
+def test_local_model_chat_template_arg_survives_round_trip():
+    m = _model(
+        repo="",
+        local_path="/m/ft",
+        vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
+    )
+    cmd = build_launch_command("k", m, DEFAULTS)
+    assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)
+
+
+def test_local_path_with_metacharacters_is_quoted_not_executed():
+    # The validator rejects a hostile path at the boundary; bypass it with
+    # model_construct to prove the quote_arg sink is safe in depth even if a bad
+    # value somehow reaches build_launch_command.
+    evil = "/m/ft; rm -rf ~"
+    m = ModelDef.model_construct(
+        display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
+        vllm_args=[], knobs=None, custom=False, capabilities=[],
+        expected_ready_seconds=300, description=None,
+    )
+    cmd = build_launch_command("k", m, DEFAULTS)
+    tokens = shlex.split(cmd)
+    i = tokens.index("serve")
+    assert tokens[i + 1] == evil  # recovered as one literal token, not executed
+    assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"
+
+
+def test_model_requires_exactly_one_source():
+    with pytest.raises(ValidationError):
+        ModelDef(display_name="x", size_gb=1, mode="solo")  # neither repo nor local_path
+    with pytest.raises(ValidationError):
+        ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo")  # both
+
+
+def test_local_model_rejects_chat_template_outside_dir():
+    # Only local_path is mounted into the container, so a chat-template elsewhere
+    # would silently 404 inside vLLM — reject it up front.
+    with pytest.raises(ValidationError):
+        ModelDef(
+            display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
+            vllm_args=["--chat-template=/other/dir/t.jinja"],
+        )
+
+
+def test_invalid_local_path_rejected_by_model():
+    with pytest.raises(ValidationError):
+        ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")
+
+
+def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
+    # YAML/override-added local models get the same validation as the API; a single
+    # bad entry is skipped (logged) rather than breaking the whole catalog load.
+    from app import models as M
+    monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
+        {"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
+        {"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
+    ]})
+    cat = M._merge_overrides(M.Catalog(models={}))
+    assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
+    assert "bad" not in cat.models  # traversal path skipped, not catalog-fatal
@@ -6,7 +6,12 @@ use `validate_x(v)` inline.
 """
 import pytest

-from app.shellsafe import validate_container, validate_image, validate_repo
+from app.shellsafe import (
+    validate_container,
+    validate_image,
+    validate_local_path,
+    validate_repo,
+)

 # Shell metacharacters that must never survive any validator — these are the
 # actual injection vectors. (Path traversal like "../" is NOT in scope here:
@@ -96,3 +101,27 @@ def test_container_valid_passes_through_unchanged(name):
 def test_container_rejects_malformed_and_hostile(name):
    with pytest.raises(ValueError):
        validate_container(name)
+
+
+# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----
+
+@pytest.mark.parametrize("path", [
+    "/home/modelo/models/gemma-4-31B-ten31-v2",
+    "/data/models/ft.v2_1",
+    "/srv/m/a-b/c",
+])
+def test_local_path_valid_passes_through_unchanged(path):
+    assert validate_local_path(path) == path
+
+
+@pytest.mark.parametrize("path", [
+    "",
+    "relative/path",            # must be absolute
+    "~/models/x",               # no ~ expansion
+    "/models/../etc/shadow",    # '..' traversal
+    "/models/./x",              # '.' segment
+    "/a" * 300,                 # over the 512 cap (600 chars)
+] + [f"/models/x{h}" for h in HOSTILE])
+def test_local_path_rejects_relative_traversal_and_hostile(path):
+    with pytest.raises(ValueError):
+        validate_local_path(path)
@@ -0,0 +1,120 @@
+"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
+extra-vLLM probe. All offline — the disabled checks short-circuit before any
+network call, and the probes are exercised only on the not-configured path.
+"""
+import asyncio
+
+from app.config import Settings
+from app.health import (
+    check_embeddings,
+    check_kokoro,
+    check_parakeet,
+    check_qdrant,
+    check_vllm,
+    probe_vllm_endpoint,
+)
+from app.services import services_from_settings
+
+
+def _settings(monkeypatch, **env) -> Settings:
+    # Pin the topology env vars under test; default the rest to blank so a stray
+    # value in the real environment can't leak into the assertion.
+    keys = [
+        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
+        "DISABLED_SERVICES", "VLLM_CONTAINER",
+    ]
+    for k in keys:
+        monkeypatch.delenv(k, raising=False)
+    for k, v in env.items():
+        monkeypatch.setenv(k, v)
+    return Settings.from_env()
+
+
+# ---- DISABLED_SERVICES parsing ----
+
+def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
+    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
+    assert s.disabled_services == frozenset({"parakeet", "kokoro"})
+
+
+def test_disabled_services_blank_is_empty(monkeypatch):
+    assert _settings(monkeypatch).disabled_services == frozenset()
+
+
+# ---- vLLM container override ----
+
+def test_vllm_container_defaults_to_vllm_node(monkeypatch):
+    assert _settings(monkeypatch).vllm_container == "vllm_node"
+
+
+def test_vllm_container_override(monkeypatch):
+    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"
+
+
+def test_vllm_container_invalid_falls_back(monkeypatch):
+    # A malformed value (space / shell metachar) is rejected at the boundary and
+    # falls back to the default rather than crashing startup or reaching a sink.
+    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"
+
+
+# ---- services map honors the disable list ----
+
+def test_services_from_settings_drops_disabled(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
+        DISABLED_SERVICES="parakeet,qdrant",
+    )
+    svcs = services_from_settings(s)
+    assert "parakeet" not in svcs and "qdrant" not in svcs
+    assert "kokoro" in svcs and "embeddings" in svcs
+
+
+def test_custom_vllm_service_registered(monkeypatch):
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
+         "user": "u", "container": "vllm_node", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    svc = services_from_settings(s)["vllm-spark2"]
+    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"
+
+
+def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
+    # A custom entry can't shadow a built-in key — the built-in wins.
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    assert services_from_settings(s)["parakeet"].kind == "stt"
+
+
+# ---- disabled health checks short-circuit (no network) ----
+
+def test_disabled_check_returns_disabled_verdict(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",  # host set, but disable wins
+        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
+    )
+    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
+        r = asyncio.run(check(s))
+        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+
+
+# ---- vLLM probe: not-configured path is pure ----
+
+def test_probe_vllm_endpoint_unconfigured(monkeypatch):
+    r = asyncio.run(probe_vllm_endpoint("", 8000))
+    assert r["ok"] is False and "not configured" in r["error"]
+
+
+def test_check_vllm_unconfigured_without_spark1(monkeypatch):
+    s = _settings(monkeypatch)  # no SPARK1_HOST
+    r = asyncio.run(check_vllm(s))
+    assert r["ok"] is False and "spark1 not configured" in r["error"]
@@ -49,6 +49,24 @@ const inputSpec = InputSpec.of({
    placeholder: 'leave blank for 8888',
    masked: false,
  }),
+  vllm_container: Value.text({
+    name: 'vLLM container name (optional)',
+    description:
+      'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
+    required: false,
+    default: null,
+    placeholder: 'leave blank for vllm_node',
+    masked: false,
+  }),
+  disabled_services: Value.text({
+    name: 'Services to hide (optional)',
+    description:
+      "Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM shares port 8000 with Parakeet's default — hide Parakeet so its probe doesn't hit vLLM.)",
+    required: false,
+    default: null,
+    placeholder: 'e.g. parakeet,kokoro',
+    masked: false,
+  }),
  parakeet_host: Value.text({
    name: 'Parakeet host (optional)',
    description:
@@ -9,6 +9,11 @@ export const sparkConfigSchema = z.object({
  spark2_user: z.string().catch(''),
  // Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
  vllm_port: z.string().catch(''),
+  // Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
+  vllm_container: z.string().catch(''),
+  // Optional comma-separated list of built-in services to switch off
+  // (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
+  disabled_services: z.string().catch(''),
  // Optional per-service overrides. Blank => use spark2_host / spark2_user.
  parakeet_host: z.string().catch(''),
  parakeet_user: z.string().catch(''),
@@ -14,6 +14,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
    spark2_host: '',
    spark2_user: '',
    vllm_port: '',
+    vllm_container: '',
+    disabled_services: '',
    parakeet_host: '',
    parakeet_user: '',
    parakeet_container: '',
@@ -52,6 +54,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
        SPARK2_HOST: cfg.spark2_host,
        SPARK2_USER: cfg.spark2_user,
        VLLM_PORT: cfg.vllm_port,
+        VLLM_CONTAINER: cfg.vllm_container,
+        DISABLED_SERVICES: cfg.disabled_services,
        PARAKEET_HOST: cfg.parakeet_host,
        PARAKEET_USER: cfg.parakeet_user,
        PARAKEET_CONTAINER: cfg.parakeet_container,
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'

 export const v0_1_0 = VersionInfo.of({
-  version: '0.22.0:0',
+  version: '0.24.0:0',
  releaseNotes: {
    en_US:
-      "v0.22.0:0 — configurable vLLM port. The port Spark Control uses to reach vLLM on Spark 1 (the health check and the chat proxy) is now a field in the Configure Sparks action, so you can point it at a vLLM that listens on a non-default port without rebuilding the package. Leave it blank to keep the previous default of 8888 — what the bundled launch-cluster.sh wrapper uses; set it to 8000 (vLLM's own default) or any other port if your vLLM listens elsewhere. Also hardened numeric-setting parsing so a blank or malformed port value falls back to its default instead of crashing daemon startup.",
+      "v0.24.0:0 — configurable cluster topology. Spark Control no longer assumes our exact layout, so a cluster that's wired differently can be monitored without forking. Three new optional settings in Configure Sparks: (1) vLLM container name — defaults to \"vllm_node\"; set it if your swappable vLLM runs under a different container name (the swap log view and pre-flight validator exec into it by name). (2) Services to hide — a comma-separated list of built-in services your cluster doesn't run (parakeet, kokoro, embeddings, qdrant); hidden ones show no tile and are never probed, so e.g. a vLLM sharing Parakeet's default port 8000 no longer gets a confusing Parakeet probe. (3) Monitor a second vLLM — register a vLLM on another Spark as a custom service with kind \"vllm\" (in /data/services-overrides.yaml); it gets a read-only health tile (loaded model + container state + start/stop/restart) alongside the swappable one. API: /api/endpoints now reports a `disabled` flag per service.",
  },
  migrations: {
    up: async ({ effects }) => {},
@@ -52,6 +52,26 @@ The **Update** button runs `git fetch && git reset --hard origin/<branch> && doc

 3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.

+## Configurable topology (v0.24.0+)
+
+For a cluster wired differently from the reference layout, three optional knobs in **Configure Sparks** (no fork needed):
+
+- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
+- **Services to hide** — comma-separated `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers — e.g. a vLLM on port 8000 colliding with Parakeet's default.
+- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
+
+  ```yaml
+  custom:
+    - key: vllm-spark2
+      kind: vllm
+      host: <spark-2-ip>
+      user: <ssh-user>
+      container: vllm_node
+      port: 8000
+  ```
+
+  It gets a read-only tile: loaded model (via `/v1/models`), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user — Show Public Key.)
+
 ## Adding a new model

 1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
@@ -60,6 +80,12 @@ The **Update** button runs `git fetch && git reset --hard origin/<branch> && doc

 If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.

+### Local / fine-tuned models (v0.23.0+)
+
+A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a `--chat-template` must live **inside** `local_path`.
+
+**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
+
 ## Manual swap fallback

 If the UI is unavailable and you need to swap by hand:
@@ -75,6 +101,17 @@ cd ~/spark-vllm-docker
 docker logs -f vllm_node      # wait for "Application startup complete."
 ```

+## Sideload (`make install`) can't reach the server
+
+Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`. Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up — `curl -sk https://<server-ip>/rpc/v1` returns 200.
+
+- **Don't** work around it with `start-cli -H https://<server-ip> package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
+- **Fix:** make the name resolve again, then re-run `make install`:
+  - `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
+  - `echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).
+
+Note this only blocks installing to *your own* Start9 — building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).
+
 ## Diagnostics

 ```bash
@@ -8,38 +8,58 @@
 # The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
 # (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
 # reuses an existing release for the tag and replaces a same-named asset.
+# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
 set -euo pipefail

 VERSION="${1:-}"; S9PK="${2:-}"
 [ -n "$VERSION" ] && [ -n "$S9PK" ] || {
  echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
 : "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
-: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository write access}"
+: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
 [ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }

 TAG="v${VERSION%%:*}"                      # 0.22.0:0 -> v0.22.0
 ASSET="$(basename "$S9PK")"
 SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')"  # grant/spark-control
 API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
-AUTH=(-H "Authorization: token ${GITEA_TOKEN}")
+CURL=(curl -sS)                            # no -f: we inspect HTTP codes ourselves
+[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)

 echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"

+# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
+api() {
+  local method="$1" url="$2"; shift 2
+  local out
+  out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
+        -w $'\n%{http_code}' "$url")"
+  HTTP_CODE="${out##*$'\n'}"
+  BODY="${out%$'\n'*}"
+}
+
 # Reuse an existing release for this tag, otherwise create one.
-id="$(curl -fsS "${AUTH[@]}" "$API/releases/tags/$TAG" 2>/dev/null | jq -r '.id // empty')"
-if [ -z "$id" ]; then
-  id="$(curl -fsS -X POST "${AUTH[@]}" -H 'Content-Type: application/json' \
+api GET "$API/releases/tags/$TAG"
+if [ "$HTTP_CODE" = 200 ]; then
+  id="$(printf '%s' "$BODY" | jq -r '.id')"
+elif [ "$HTTP_CODE" = 404 ]; then
+  api POST "$API/releases" -H 'Content-Type: application/json' \
    --data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
-      '{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')" \
-    "$API/releases" | jq -r '.id')"
+      '{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
+  [ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
+  id="$(printf '%s' "$BODY" | jq -r '.id')"
+else
+  echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
+  exit 1
 fi
-[ -n "$id" ] && [ "$id" != null ] || { echo "could not obtain release id (check URL/token/tag)" >&2; exit 1; }
+[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }

 # Replace a same-named asset so re-runs don't 409.
-old="$(curl -fsS "${AUTH[@]}" "$API/releases/$id/assets" | jq -r --arg n "$ASSET" '.[] | select(.name==$n) | .id')"
-[ -n "$old" ] && curl -fsS -X DELETE "${AUTH[@]}" "$API/releases/$id/assets/$old" >/dev/null || true
+api GET "$API/releases/$id/assets"
+old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
+[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }

-curl -fsS -X POST "${AUTH[@]}" -F "attachment=@${S9PK};type=application/octet-stream" \
-  "$API/releases/$id/assets?name=$ASSET" >/dev/null
+api POST "$API/releases/$id/assets?name=$ASSET" \
+  -F "attachment=@${S9PK};type=application/octet-stream"
+[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }

 echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
Author	SHA1	Message	Date
Keysat	dd3d1412d4	docs: v0.24.0:0 committed/tagged/pushed — Gitea release asset + live install still pending	2026-06-17 23:11:14 -05:00
Keysat	26070eb191	v0.24.0:0 - configurable cluster topology (vllm container name, hide services, second-vllm monitor) Make the cluster topology configurable so an adopter wired differently (vLLM on both Sparks, port 8000, different container name, no Parakeet) can monitor without forking. Covers the OpenClaw report P4/P5/#6. - VLLM_CONTAINER override (default vllm_node), validated at the boundary and quote_arg-quoted into the swap log-tail + pre-flight validator exec. - DISABLED_SERVICES list: hidden services show no tile and are skipped by status/deep-health/connectivity probes (kills the Parakeet-on-8000 collision). - kind: vllm custom service monitors a second Spark's vLLM via the shared probe_vllm_endpoint; /api/endpoints gains a disabled flag. Swap mechanism intentionally not generalized to raw docker run (that's coordination, roadmap item 4).	2026-06-17 23:03:33 -05:00
Keysat	90394f891b	docs: v0.23.0 published, live install pending (mDNS); runbook sideload troubleshooting	2026-06-17 22:36:41 -05:00
Keysat	e783653ef0	v0.23.0:0 - local / fine-tuned model support Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes), not just Hugging Face repos. - ModelDef gains local_path; a model must set exactly one of repo / local_path. The validator also enforces the local-path whitelist and that any --chat-template lives inside local_path (only that dir is mounted). - build_launch_command bind-mounts the dir into the vLLM container at the SAME host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook, then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream expands that var unquoted; contract noted in runbook.md). - shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'. - POST /api/models validates the full entry via ModelDef before persisting, so a bad entry can't be written and then break catalog load; _merge_overrides skips an invalid override entry instead of failing the whole catalog. - disk.py size-probes a local path with du; disk-delete refused for local models. - UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF link, delete button hidden for local models. - Tests: local launch + injection round-trip, chat-template location, traversal, exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent pass; findings addressed.	2026-06-17 22:27:41 -05:00
Keysat	57a893000e	docs: document the Gitea release ritual in startos-package guide	2026-06-17 21:29:27 -05:00
Keysat	56f7ea4444	fix: gitea-release.sh tolerate 404 on tag lookup; report HTTP errors; mark v0.22.0 published	2026-06-17 21:23:21 -05:00
Keysat	aaad57d88f	docs: mark v0.22.0:0 shipped + record Gitea-release distribution decision	2026-06-17 19:47:49 -05:00