diff --git a/AGENTS.md b/AGENTS.md
index c3287d8..bde7d87 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -57,10 +57,10 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 
 - **Live service runs v0.22.0:0** (installed and serving); **v0.23.0:0 is built, committed (`e783653`), tagged, and published to Gitea Releases but its live install is PENDING** — see the P3 line below. Working features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Local/fine-tuned model support lands live once v0.23.0:0 installs. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
 - **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
-- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
+- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (102 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), and the configurable-topology layer (`test_topology.py`: `DISABLED_SERVICES` parsing, `vllm_container` override, disabled-service skip in `services_from_settings` + `check_*`, `probe_vllm_endpoint`). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
 - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
 - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
 - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
 - **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
-- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H <ip>` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). Next: (3) configurable topology (service→Spark→port map + container names); (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass). Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H <ip>` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). (3) **configurable topology** — DONE in tree, staged as **v0.24.0:0** (built clean, not yet committed/installed). Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`, threaded into the swap log-tail + validator exec via `quote_arg`); "services to hide" (`DISABLED_SERVICES` comma list → `Settings.disabled_services` frozenset, skipped by `services_from_settings`, the `check_*` probes, deep-health `run_all`, and connectivity logging — kills the Parakeet-on-8000 collision); second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (`probe_vllm_endpoint` shared with `check_vllm`). `/api/endpoints` gained a `disabled` flag; the health-dot hides when disabled. 102 tests pass (+8 in `test_topology.py`). Swap mechanism deliberately NOT generalized to raw `docker run` (that's coordination, item 4). Install pending — same mDNS situation as v0.23.0. Next: (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
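The distribution flow in the notes above (latest release, pick the `.s9pk` asset, swap Gitea's unresolvable `.local` host for an address that reaches it) can be sketched as a small adopter-side helper. This is a hypothetical, stdlib-only sketch: `pick_s9pk_url`, `fetch_latest_release`, the `reachable_host` rewrite, and the `Authorization: token …` header scheme are assumptions for illustration, not code from this repo.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def pick_s9pk_url(release: dict, reachable_host: str | None = None) -> str:
    """Return the download URL of the first `.s9pk` asset in a Gitea release.

    If `reachable_host` is given (e.g. the Gitea box's WireGuard IP), it replaces
    the hostname Gitea baked into `browser_download_url` (its `.local` ROOT_URL),
    keeping the scheme, port, and path.
    """
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            url = asset["browser_download_url"]
            if reachable_host is None:
                return url
            parts = urlsplit(url)
            netloc = reachable_host if parts.port is None else f"{reachable_host}:{parts.port}"
            return urlunsplit(parts._replace(netloc=netloc))
    raise LookupError("release has no .s9pk asset")


def fetch_latest_release(gitea_base: str, token: str) -> dict:
    """GET the latest release with a read-only token (performs a network call)."""
    req = urllib.request.Request(
        f"{gitea_base}/api/v1/repos/grant/spark-control/releases/latest",
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Keeping the URL rewrite as a pure function means the off-LAN host fix can be tested without a live Gitea; the fetched asset is then sideloaded as usual.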
diff --git a/ROADMAP.md b/ROADMAP.md
index 476c517..6a234f3 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -11,7 +11,7 @@ Driven by the one other Spark Control adopter (a colleague running OpenClaw + cr
 Sequenced:
 1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
 2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
-3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
+3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (monitor-only tile, never swapped, probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
 4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
    - **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
    - **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
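The swap lock in item 4 is not built yet; the semantics it describes (holder + TTL, enforced by the swap path rather than advisory) can be sketched in memory. `SwapLock`, `acquire`, `release`, and `may_swap` are hypothetical names; a `POST` / `GET` / `DELETE /api/swap/lock` handler would presumably wrap `acquire` / `current` / `release`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Lock:
    holder: str
    expires_at: float


class SwapLock:
    """In-memory swap lock with holder + TTL (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock: Lock | None = None

    def current(self) -> Lock | None:
        # An expired lease counts as free, so a crashed scheduler can't wedge the GPU.
        if self._lock and self._lock.expires_at <= self._clock():
            self._lock = None
        return self._lock

    def acquire(self, holder: str, ttl_s: float) -> bool:
        cur = self.current()
        if cur and cur.holder != holder:
            return False
        # The same holder re-acquiring simply extends its lease.
        self._lock = Lock(holder, self._clock() + ttl_s)
        return True

    def release(self, holder: str) -> bool:
        cur = self.current()
        if cur is None or cur.holder != holder:
            return False
        self._lock = None
        return True

    def may_swap(self, requester: str) -> bool:
        # The check the swap path would enforce: free, or held by the requester.
        cur = self.current()
        return cur is None or cur.holder == requester
```

The injectable clock keeps the TTL logic testable; the dashboard's "who holds the GPU and until when" view would read `current()`.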
diff --git a/image/app/config.py b/image/app/config.py
index e0d50aa..75107c4 100644
--- a/image/app/config.py
+++ b/image/app/config.py
@@ -1,13 +1,44 @@
 from __future__ import annotations
+import logging
 import os
 from dataclasses import dataclass
 from pathlib import Path
 
+from .shellsafe import validate_container
+
+log = logging.getLogger(__name__)
+
 
 def _env(name: str, default: str = "") -> str:
     return os.environ.get(name, default)
 
 
+def _env_container(name: str, default: str) -> str:
+    """Resolve a container-name env var, validating it at the config boundary.
+
+    The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
+    the sink — but per the repo's two-layer convention it's also whitelist-checked
+    here. A malformed optional value falls back to `default` rather than crashing
+    daemon startup (mirrors `_env_int` for VLLM_PORT)."""
+    val = os.environ.get(name, "") or default
+    try:
+        return validate_container(val)
+    except ValueError:
+        log.warning("ignoring invalid %s=%r; using %r", name, val, default)
+        return default
+
+
+def _env_set(name: str) -> frozenset[str]:
+    """Parse a comma-separated env var into a lowercased frozenset of keys.
+
+    Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
+    support service can switch its tile + probes off entirely (rather than have
+    the probe hit whatever else listens on that port — e.g. a vLLM sharing
+    Parakeet's default 8000)."""
+    raw = os.environ.get(name, "")
+    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
+
+
 def _env_int(name: str, default: int) -> int:
     """Parse an int env var, falling back to `default` when unset, blank, or
     malformed. The StartOS Configure panel passes optional numeric fields as an
@@ -63,6 +94,8 @@ class Settings:
     ssh_known_hosts: str
     models_yaml: str
     vllm_port: int
+    vllm_container: str
+    disabled_services: frozenset[str]
     parakeet_port: int
     kokoro_port: int
     embed_port: int
@@ -116,6 +149,15 @@ class Settings:
             ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
             models_yaml=_resolve_models_yaml(),
             vllm_port=_env_int("VLLM_PORT", 8888),
+            # Container name for the swappable vLLM on Spark 1. Defaults to the
+            # bundled launch-cluster.sh container; override if you named yours
+            # something else (the swap log-tail and pre-flight validator
+            # address it by name).
+            vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
+            # Built-in support-service keys (parakeet, kokoro, embeddings,
+            # qdrant) the deployment doesn't run — hidden from the dashboard and
+            # never probed.
+            disabled_services=_env_set("DISABLED_SERVICES"),
             parakeet_port=_env_int("PARAKEET_PORT", 8000),
             kokoro_port=_env_int("KOKORO_PORT", 8880),
             embed_port=_env_int("EMBED_PORT", 8088),
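The two new config helpers follow the same "validate at the boundary, fall back instead of crashing" shape as `_env_int`. The sketch below reproduces that behavior on raw strings so it can be exercised without touching `os.environ`. `parse_set` and `resolve_container` are hypothetical names, and `validate_container` here is a stand-in modeled on Docker's container-name character set; the real `shellsafe.validate_container` whitelist may differ.

```python
from __future__ import annotations

import logging
import re

log = logging.getLogger(__name__)

# Hypothetical stand-in for shellsafe.validate_container (Docker-style names).
_CONTAINER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]+")


def validate_container(name: str) -> str:
    if not _CONTAINER_RE.fullmatch(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name


def parse_set(raw: str) -> frozenset[str]:
    """Same rules as `_env_set`: split on commas, trim, lowercase, drop blanks."""
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def resolve_container(raw: str, default: str) -> str:
    """Same shape as `_env_container`: blank -> default; invalid -> warn + default."""
    val = raw or default
    try:
        return validate_container(val)
    except ValueError:
        log.warning("ignoring invalid container name %r; using %r", val, default)
        return default
```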
diff --git a/image/app/custom_services.py b/image/app/custom_services.py
index 3537ef8..18f88a9 100644
--- a/image/app/custom_services.py
+++ b/image/app/custom_services.py
@@ -10,6 +10,17 @@ Format:
         port: 8001
         health_path: /health
         image: nvcr.io/nim/nvidia/riva-multilingual:latest
+
+A `kind: vllm` entry monitors an additional vLLM on another Spark. It is never
+swapped (the swap machinery only drives the primary Spark 1 vLLM), but it gets
+a health tile probed via /v1/models plus container state and start/stop/restart:
+    custom:
+      - key: vllm-spark2
+        kind: vllm
+        host: <spark-2-ip>
+        user: <ssh-user>
+        container: vllm_node
+        port: 8000
 """
 from __future__ import annotations
 import os
diff --git a/image/app/deep_health.py b/image/app/deep_health.py
index bc15ef8..769d1ea 100644
--- a/image/app/deep_health.py
+++ b/image/app/deep_health.py
@@ -377,6 +377,10 @@ class DeepHealth:
     async def run_all(self) -> dict[str, ProbeResult]:
         results = {}
         for name in self.PROBES:
+            # Don't deep-probe a service the deployment switched off — its port
+            # may be answered by something else (e.g. a vLLM on Parakeet's 8000).
+            if name in self.settings.disabled_services:
+                continue
             results[name] = await self.run_one(name)
         return results
 
diff --git a/image/app/health.py b/image/app/health.py
index 1ddeb12..2ee6d89 100644
--- a/image/app/health.py
+++ b/image/app/health.py
@@ -6,17 +6,28 @@ from .config import Settings
 _TIMEOUT = 3.0
 
 
-async def check_vllm(settings: Settings) -> dict:
-    base_url = (
-        f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
-        if settings.spark1_host
-        else None
-    )
-    if not settings.spark1_host:
-        return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
+def _disabled(settings: Settings, key: str) -> dict | None:
+    """A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
+
+    Lets an adopter who doesn't run a given support service switch its probe off
+    entirely — so the probe never hits whatever else listens on that port, and
+    the connectivity log doesn't record it as perpetually down."""
+    if key in settings.disabled_services:
+        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+    return None
+
+
+async def probe_vllm_endpoint(host: str, port: int) -> dict:
+    """Probe any OpenAI-compatible vLLM at host:port via /v1/models.
+
+    Shared by the primary (Spark 1) health check and any extra vLLM registered
+    as a custom service (kind: vllm) to monitor a second Spark."""
+    base_url = f"http://{host}:{port}/v1" if host else None
+    if not host:
+        return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
     try:
         async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
-            r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
+            r = await c.get(f"http://{host}:{port}/v1/models")
             r.raise_for_status()
             ids = [m["id"] for m in r.json().get("data", [])]
             return {
@@ -29,7 +40,15 @@ async def check_vllm(settings: Settings) -> dict:
         return {"ok": False, "error": str(e), "base_url": base_url}
 
 
+async def check_vllm(settings: Settings) -> dict:
+    if not settings.spark1_host:
+        return {"ok": False, "error": "spark1 not configured", "base_url": None}
+    return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
+
+
 async def check_parakeet(settings: Settings) -> dict:
+    if d := _disabled(settings, "parakeet"):
+        return d
     base_url = (
         f"http://{settings.parakeet_host}:{settings.parakeet_port}"
         if settings.parakeet_host
@@ -47,6 +66,8 @@ async def check_parakeet(settings: Settings) -> dict:
 
 
 async def check_kokoro(settings: Settings) -> dict:
+    if d := _disabled(settings, "kokoro"):
+        return d
     base_url = (
         f"http://{settings.kokoro_host}:{settings.kokoro_port}"
         if settings.kokoro_host
@@ -68,6 +89,8 @@ async def check_kokoro(settings: Settings) -> dict:
 
 
 async def check_embeddings(settings: Settings) -> dict:
+    if d := _disabled(settings, "embeddings"):
+        return d
     base_url = (
         f"http://{settings.embed_host}:{settings.embed_port}"
         if settings.embed_host
@@ -89,6 +112,8 @@ async def check_embeddings(settings: Settings) -> dict:
 
 
 async def check_qdrant(settings: Settings) -> dict:
+    if d := _disabled(settings, "qdrant"):
+        return d
     base_url = (
         f"http://{settings.qdrant_host}:{settings.qdrant_port}"
         if settings.qdrant_host
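The health changes combine two ideas: a `_disabled` short-circuit that returns a clean verdict before any network call, and a shared parse of the OpenAI-compatible `/v1/models` body (`data[*].id`). The sketch below isolates both as pure functions. `disabled_verdict`, `summarize_models`, and `check_sketch` are hypothetical names, and the `models` / `current_model` keys are an assumption (the full return dict of `probe_vllm_endpoint` is cut off in this diff). Reporting the first id as the current model assumes the base model is listed first.

```python
from __future__ import annotations


def disabled_verdict(disabled: frozenset[str], key: str) -> dict | None:
    # Same shape as health._disabled: no network call when the key is switched off.
    if key in disabled:
        return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
    return None


def summarize_models(host: str, port: int, payload: dict) -> dict:
    """Turn a /v1/models response body into a status dict (hypothetical helper)."""
    ids = [m["id"] for m in payload.get("data", [])]
    return {
        "ok": True,
        "base_url": f"http://{host}:{port}/v1",
        "models": ids,
        "current_model": ids[0] if ids else None,
    }


def check_sketch(disabled: frozenset[str], key: str) -> dict:
    # The walrus short-circuit used by check_parakeet/kokoro/embeddings/qdrant.
    if d := disabled_verdict(disabled, key):
        return d
    return {"ok": True, "disabled": False}  # placeholder for the real HTTP probe
```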
diff --git a/image/app/server.py b/image/app/server.py
index 93ff0a5..e8249ea 100644
--- a/image/app/server.py
+++ b/image/app/server.py
@@ -20,7 +20,7 @@ from .llm_proxy import build_router as build_llm_router
 from .embeddings_proxy import build_router as build_embeddings_router
 from .redaction_gateway import build_router as build_redaction_router, MapStore
 from .hardware import HardwareProbe
-from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
+from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
 from .matrix_bridge import MatrixBridgeManager
 from .models import ModelDef, load_catalog
 from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
@@ -500,6 +500,10 @@ async def get_services() -> dict:
             http = await check_embeddings(settings)
         elif name == "qdrant":
             http = await check_qdrant(settings)
+        elif svc.kind == "vllm":
+            # An extra vLLM monitored on another Spark (registered as a custom
+            # service). Probe its own host/port, not the primary Spark 1 one.
+            http = await probe_vllm_endpoint(svc.host, svc.port)
         elif svc.kind == "bot":
             # No HTTP health endpoint (host networking, no port) — judged purely
             # by docker state. http_ready stays None so the badge isn't pinned
@@ -521,7 +525,7 @@ async def get_services() -> dict:
             # Prefer the check fn's own top-level model key (embeddings reports
             # it there); fall back to a model field inside detail for services
             # whose /health embeds it (parakeet).
-            "model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
+            "model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
             "docker_state": docker.get("state"),
             "restart_count": docker.get("restart_count"),
             "started_at": docker.get("started_at"),
@@ -799,17 +803,20 @@ async def get_endpoints() -> dict:
             "base_url": vllm.get("base_url"),
             "model": vllm.get("current_model"),
             "openai_compat": True,
+            "disabled": bool(vllm.get("disabled")),
         },
         "parakeet": {
             "ready": bool(parakeet.get("ok")),
             "base_url": parakeet.get("base_url"),
             "kind": "stt",
             "model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
+            "disabled": bool(parakeet.get("disabled")),
         },
         "kokoro": {
             "ready": bool(kokoro.get("ok")),
             "base_url": kokoro.get("base_url"),
             "kind": "tts",
+            "disabled": bool(kokoro.get("disabled")),
         },
         "embeddings": {
             "ready": bool(embeddings.get("ok")),
@@ -818,12 +825,14 @@ async def get_endpoints() -> dict:
             "model": embeddings.get("model"),
             # The proxied OpenAI-compatible endpoints live on Spark Control itself.
             "openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
+            "disabled": bool(embeddings.get("disabled")),
         },
         "qdrant": {
             "ready": bool(qdrant.get("ok")),
             "base_url": qdrant.get("base_url"),
             "kind": "vectordb",
             "collection": settings.qdrant_collection or None,
+            "disabled": bool(qdrant.get("disabled")),
         },
     }
 
@@ -837,12 +846,15 @@ async def get_status() -> dict:
         check_embeddings(settings),
         check_qdrant(settings),
     )
-    # Feed health into the connectivity log (deduped — only logs on transition)
-    record_state("vllm", bool(vllm.get("ok")))
-    record_state("parakeet", bool(parakeet.get("ok")))
-    record_state("kokoro", bool(kokoro.get("ok")))
-    record_state("embeddings", bool(embeddings.get("ok")))
-    record_state("qdrant", bool(qdrant.get("ok")))
+    # Feed health into the connectivity log (deduped — only logs on transition).
+    # Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
+    # perpetually down.
+    for _name, _r in (
+        ("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
+        ("embeddings", embeddings), ("qdrant", qdrant),
+    ):
+        if not _r.get("disabled"):
+            record_state(_name, bool(_r.get("ok")))
     current_key = _identify_current_model(vllm.get("current_model"))
     return {
         "configured": settings.configured,
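The `get_status` loop feeds a transition-only connectivity log and now skips disabled services. A minimal sketch of that interaction, assuming `record_state` logs only when a service's up/down value flips (this stand-in also logs the first observation; the real log additionally timestamps and persists entries):

```python
from __future__ import annotations


class ConnectivityLog:
    """Transition-only log: records a service only when its up/down value flips."""

    def __init__(self) -> None:
        self._last: dict[str, bool] = {}
        self.events: list[tuple[str, bool]] = []

    def record_state(self, name: str, ok: bool) -> None:
        if self._last.get(name) == ok:
            return  # no transition, nothing to log
        self._last[name] = ok
        self.events.append((name, ok))


def feed(clog: ConnectivityLog, results: dict[str, dict]) -> None:
    # Mirrors the get_status loop: disabled services never reach the log, so they
    # can't show up as perpetually down.
    for name, r in results.items():
        if not r.get("disabled"):
            clog.record_state(name, bool(r.get("ok")))
```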
diff --git a/image/app/services.py b/image/app/services.py
index 2c9b71b..01795bb 100644
--- a/image/app/services.py
+++ b/image/app/services.py
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
 appropriate host.
 """
 from __future__ import annotations
+import logging
 import time
 from dataclasses import dataclass
 from typing import Literal, Optional
@@ -13,6 +14,8 @@ from .config import Settings
 from .shellsafe import quote_arg
 from .ssh import ssh_run
 
+log = logging.getLogger(__name__)
+
 
 # Cache the "unreachable" verdict per (host, user) for a short period so that a
 # repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -103,7 +106,13 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
     }
     for entry in load_custom_services():
         key = entry.get("key")
-        if not key or key in out:
+        if not key:
+            continue
+        if key in out:
+            # A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
+            # an adopter who picked a colliding key for, say, a second vLLM sees
+            # why no tile appeared instead of a silent no-op.
+            log.warning("custom service %r collides with a built-in name; ignoring", key)
             continue
         out[key] = ServiceDef(
             name=key,
@@ -113,7 +122,9 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
             container=entry.get("container", key),
             port=int(entry.get("port", 0)),
         )
-    return out
+    # Drop services the deployment has switched off (DISABLED_SERVICES) so they
+    # show no tile and are never probed/auto-restarted.
+    return {k: v for k, v in out.items() if k not in s.disabled_services}
 
 
 async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
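The ordering in `services_from_settings` matters: custom entries are merged against the full built-in map first, and `DISABLED_SERVICES` is applied last. A consequence, as the diff reads, is that disabling a built-in does not free its key for reuse by a custom entry. The sketch below reproduces that ordering with plain data; `Svc`, `merge_services`, and the `"http"` default kind are hypothetical simplifications of `ServiceDef`.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass(frozen=True)
class Svc:
    name: str
    kind: str
    host: str
    port: int


def merge_services(builtins: dict[str, Svc], custom: list[dict],
                   disabled: frozenset[str]) -> dict[str, Svc]:
    out = dict(builtins)
    for entry in custom:
        key = entry.get("key")
        if not key:
            continue
        if key in out:
            # Built-ins win; warn instead of silently dropping the entry.
            log.warning("custom service %r collides with a built-in name; ignoring", key)
            continue
        out[key] = Svc(key, entry.get("kind", "http"), entry.get("host", ""),
                       int(entry.get("port", 0)))
    # Disabled keys are filtered last, so they apply to custom entries too.
    return {k: v for k, v in out.items() if k not in disabled}
```

To monitor a second vLLM, pick a key that no built-in uses (for example `vllm-spark2`, as in the `custom_services.py` docstring).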
diff --git a/image/app/static/app.js b/image/app/static/app.js
index ff96c14..7ea1778 100644
--- a/image/app/static/app.js
+++ b/image/app/static/app.js
@@ -932,6 +932,10 @@ function renderHealth(status) {
   function setDot(id, ok, payload) {
     const item = el(id);
     if (!item) return;
+    // A service switched off via DISABLED_SERVICES isn't part of this
+    // deployment — hide its indicator entirely rather than show it as down.
+    if (payload && payload.disabled) { item.classList.add('hidden'); return; }
+    item.classList.remove('hidden');
     const dot = item.querySelector('.dot');
     dot.classList.remove('ok', 'bad', 'warn');
     if (ok === true) dot.classList.add('ok');
diff --git a/image/app/swap.py b/image/app/swap.py
index 07d400a..49a1bc1 100644
--- a/image/app/swap.py
+++ b/image/app/swap.py
@@ -7,6 +7,7 @@ from typing import Optional
 
 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run, ssh_stream, StreamHandle
 
 
@@ -112,7 +113,7 @@ class SwapManager:
 
         # Step 3: tail logs until the ready marker (or timeout)
         job.state = "tailing"
-        tail_cmd = "docker logs -f --tail 50 vllm_node"
+        tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
         job.append(f"$ {tail_cmd}")
         timeout = max(model.expected_ready_seconds * 2, 600)
         handle = StreamHandle()
diff --git a/image/app/validate.py b/image/app/validate.py
index 983e267..548c81f 100644
--- a/image/app/validate.py
+++ b/image/app/validate.py
@@ -22,6 +22,7 @@ from typing import Any
 
 from .config import Settings
 from .models import Catalog, build_launch_command
+from .shellsafe import quote_arg
 from .ssh import ssh_run
 
 
@@ -114,7 +115,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
     # Pipe the JSON args list to a here-doc Python invocation. The validator
     # reads from stdin to avoid shell-escaping the args themselves.
     cmd = (
-        f"echo '{payload}' | docker exec -i vllm_node python3 -c "
+        f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
         + shlex.quote(_VALIDATOR_SCRIPT)
     )
 
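Both sinks (`swap.py`'s log tail and `validate.py`'s `docker exec`) now quote the container name even though `config.py` already whitelists it: the two-layer convention. The sketch assumes `quote_arg` behaves like `shlex.quote` (an assumption; its implementation isn't shown here) and checks that a hostile value stays a single argv word when the remote shell parses the command.

```python
import shlex


def quote_arg(value: str) -> str:
    # Hypothetical stand-in for shellsafe.quote_arg, assumed equivalent to shlex.quote.
    return shlex.quote(value)


def tail_cmd(container: str) -> str:
    # Same command shape as the swap log-tail in swap.py.
    return f"docker logs -f --tail 50 {quote_arg(container)}"
```

If a malformed name ever got past the config boundary, quoting turns it into one literal argument to `docker logs` instead of extra shell commands.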
diff --git a/image/tests/test_topology.py b/image/tests/test_topology.py
new file mode 100644
index 0000000..3e978ba
--- /dev/null
+++ b/image/tests/test_topology.py
@@ -0,0 +1,120 @@
+"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
+extra-vLLM probe. All offline — the disabled checks short-circuit before any
+network call, and the probes are exercised only on the not-configured path.
+"""
+import asyncio
+
+from app.config import Settings
+from app.health import (
+    check_embeddings,
+    check_kokoro,
+    check_parakeet,
+    check_qdrant,
+    check_vllm,
+    probe_vllm_endpoint,
+)
+from app.services import services_from_settings
+
+
+def _settings(monkeypatch, **env) -> Settings:
+    # Clear the topology env vars under test, then apply the overrides, so a
+    # stray value in the real environment can't leak into the assertion.
+    keys = [
+        "SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
+        "DISABLED_SERVICES", "VLLM_CONTAINER",
+    ]
+    for k in keys:
+        monkeypatch.delenv(k, raising=False)
+    for k, v in env.items():
+        monkeypatch.setenv(k, v)
+    return Settings.from_env()
+
+
+# ---- DISABLED_SERVICES parsing ----
+
+def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
+    s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
+    assert s.disabled_services == frozenset({"parakeet", "kokoro"})
+
+
+def test_disabled_services_blank_is_empty(monkeypatch):
+    assert _settings(monkeypatch).disabled_services == frozenset()
+
+
+# ---- vLLM container override ----
+
+def test_vllm_container_defaults_to_vllm_node(monkeypatch):
+    assert _settings(monkeypatch).vllm_container == "vllm_node"
+
+
+def test_vllm_container_override(monkeypatch):
+    assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"
+
+
+def test_vllm_container_invalid_falls_back(monkeypatch):
+    # A malformed value (space / shell metachar) is rejected at the boundary and
+    # falls back to the default rather than crashing startup or reaching a sink.
+    assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"
+
+
+# ---- services map honors the disable list ----
+
+def test_services_from_settings_drops_disabled(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
+        DISABLED_SERVICES="parakeet,qdrant",
+    )
+    svcs = services_from_settings(s)
+    assert "parakeet" not in svcs and "qdrant" not in svcs
+    assert "kokoro" in svcs and "embeddings" in svcs
+
+
+def test_custom_vllm_service_registered(monkeypatch):
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
+         "user": "u", "container": "vllm_node", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    svc = services_from_settings(s)["vllm-spark2"]
+    assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"
+
+
+def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
+    # A custom entry can't shadow a built-in key — the built-in wins.
+    from app import custom_services
+    monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
+        {"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
+    ])
+    s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
+                  SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
+    assert services_from_settings(s)["parakeet"].kind == "stt"
+
+
+# ---- disabled health checks short-circuit (no network) ----
+
+def test_disabled_check_returns_disabled_verdict(monkeypatch):
+    s = _settings(
+        monkeypatch,
+        SPARK2_HOST="10.0.0.2", SPARK2_USER="u",  # host set, but disable wins
+        DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
+    )
+    for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
+        r = asyncio.run(check(s))
+        assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
+
+
+# ---- vLLM probe: not-configured path is pure ----
+
+def test_probe_vllm_endpoint_unconfigured():
+    r = asyncio.run(probe_vllm_endpoint("", 8000))
+    assert r["ok"] is False and "not configured" in r["error"]
+
+
+def test_check_vllm_unconfigured_without_spark1(monkeypatch):
+    s = _settings(monkeypatch)  # no SPARK1_HOST
+    r = asyncio.run(check_vllm(s))
+    assert r["ok"] is False and "spark1 not configured" in r["error"]
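One way `Settings.from_env` could produce the values these tests pin down is sketched below. The helper names are hypothetical, and the real parsing in `app/config.py` isn't part of this diff. The name regex mirrors the character set Docker cites when it rejects a container name (`[a-zA-Z0-9][a-zA-Z0-9_.-]`). That check also keeps spaces and shell metacharacters out before the value ever reaches `quote_arg`.

```python
from __future__ import annotations

import re
from typing import Optional

DEFAULT_VLLM_CONTAINER = "vllm_node"
# Character set Docker itself enforces for container names (two or more chars).
_CONTAINER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def parse_disabled_services(raw: Optional[str]) -> frozenset[str]:
    """Split on commas, trim and lowercase each name, and drop empty items."""
    if not raw:
        return frozenset()
    return frozenset(p.strip().lower() for p in raw.split(",") if p.strip())


def resolve_vllm_container(raw: Optional[str]) -> str:
    """Use the override only if it is a valid Docker name; otherwise fall back."""
    value = (raw or "").strip()
    if _CONTAINER_NAME.fullmatch(value):
        return value
    return DEFAULT_VLLM_CONTAINER
```

Falling back instead of raising matches `test_vllm_container_invalid_falls_back`: a bad value never crashes startup.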
diff --git a/package/startos/actions/configureSparks.ts b/package/startos/actions/configureSparks.ts
index abd8168..64d6610 100644
--- a/package/startos/actions/configureSparks.ts
+++ b/package/startos/actions/configureSparks.ts
@@ -49,6 +49,24 @@ const inputSpec = InputSpec.of({
     placeholder: 'leave blank for 8888',
     masked: false,
   }),
+  vllm_container: Value.text({
+    name: 'vLLM container name (optional)',
+    description:
+      'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
+    required: false,
+    default: null,
+    placeholder: 'leave blank for vllm_node',
+    masked: false,
+  }),
+  disabled_services: Value.text({
+    name: 'Services to hide (optional)',
+    description:
+      "Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM listens on port 8000, Parakeet's default: hide Parakeet so its probe doesn't hit vLLM.)",
+    required: false,
+    default: null,
+    placeholder: 'e.g. parakeet,kokoro',
+    masked: false,
+  }),
   parakeet_host: Value.text({
     name: 'Parakeet host (optional)',
     description:
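The topology tests and the "Services to hide" description together imply two rules:

- A disabled built-in vanishes from the services map, and its health check returns a fixed verdict without touching the network.
- A custom entry can't shadow a built-in key.

Below is a sketch of those rules under stated assumptions. `Service`, `build_services` and `check_if_enabled` are hypothetical stand-ins for `services_from_settings` and the `check_*` functions, whose bodies aren't in this diff.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Same shape the topology tests assert for a disabled built-in.
DISABLED_VERDICT = {"ok": False, "disabled": True, "error": "disabled", "base_url": None}


@dataclass(frozen=True)
class Service:
    key: str
    kind: str
    host: str
    port: int


def build_services(
    builtins: dict[str, Service],
    custom: list[dict],
    disabled: frozenset[str],
) -> dict[str, Service]:
    """Hypothetical services map: built-ins minus the disable list, plus custom entries."""
    services = {k: svc for k, svc in builtins.items() if k not in disabled}
    for entry in custom:
        key = entry["key"]
        # Built-in keys win: a custom entry can never replace one. (Assumption:
        # this also holds for a disabled built-in; the tests don't cover that case.)
        if key in builtins:
            continue
        services[key] = Service(key, entry["kind"], entry["host"], int(entry["port"]))
    return services


async def check_if_enabled(
    key: str,
    disabled: frozenset[str],
    probe: Callable[[], Awaitable[dict]],
) -> dict:
    """Return the disabled verdict without calling the probe, so no network I/O happens."""
    if key in disabled:
        return dict(DISABLED_VERDICT)
    return await probe()


if __name__ == "__main__":
    async def unreachable() -> dict:
        raise RuntimeError("probe should not run for a disabled service")

    print(asyncio.run(check_if_enabled("parakeet", frozenset({"parakeet"}), unreachable)))
```

Checking the disable list before the probe is what lets the disabled-verdict test run fully offline, even when a host is configured.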
diff --git a/package/startos/fileModels/sparkConfig.yaml.ts b/package/startos/fileModels/sparkConfig.yaml.ts
index 85a63b6..a1d1545 100644
--- a/package/startos/fileModels/sparkConfig.yaml.ts
+++ b/package/startos/fileModels/sparkConfig.yaml.ts
@@ -9,6 +9,11 @@ export const sparkConfigSchema = z.object({
   spark2_user: z.string().catch(''),
   // Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
   vllm_port: z.string().catch(''),
+  // Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
+  vllm_container: z.string().catch(''),
+  // Optional comma-separated list of built-in services to switch off
+  // (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
+  disabled_services: z.string().catch(''),
   // Optional per-service overrides. Blank => use spark2_host / spark2_user.
   parakeet_host: z.string().catch(''),
   parakeet_user: z.string().catch(''),
diff --git a/package/startos/main.ts b/package/startos/main.ts
index 9595fa6..96df6c6 100644
--- a/package/startos/main.ts
+++ b/package/startos/main.ts
@@ -14,6 +14,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
     spark2_host: '',
     spark2_user: '',
     vllm_port: '',
+    vllm_container: '',
+    disabled_services: '',
     parakeet_host: '',
     parakeet_user: '',
     parakeet_container: '',
@@ -52,6 +54,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
         SPARK2_HOST: cfg.spark2_host,
         SPARK2_USER: cfg.spark2_user,
         VLLM_PORT: cfg.vllm_port,
+        VLLM_CONTAINER: cfg.vllm_container,
+        DISABLED_SERVICES: cfg.disabled_services,
         PARAKEET_HOST: cfg.parakeet_host,
         PARAKEET_USER: cfg.parakeet_user,
         PARAKEET_CONTAINER: cfg.parakeet_container,
diff --git a/package/startos/versions/v0_1_0.ts b/package/startos/versions/v0_1_0.ts
index 7607d14..9415853 100644
--- a/package/startos/versions/v0_1_0.ts
+++ b/package/startos/versions/v0_1_0.ts
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
 
 export const v0_1_0 = VersionInfo.of({
-  version: '0.23.0:0',
+  version: '0.24.0:0',
   releaseNotes: {
     en_US:
-      "v0.23.0:0 — local / fine-tuned model support. You can now add a model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune), not just a Hugging Face repo. Use the new \"+ Add local model\" button under LLM swap: give it the model's absolute path on the Spark, an optional chat-template path, and the usual launch knobs. On swap, Spark Control bind-mounts that directory into the vLLM container at the same path (via the launch script's existing VLLM_SPARK_EXTRA_DOCKER_ARGS hook — nothing to change on the Spark) and runs `vllm serve <dir>`. Local models show a \"local\" badge and their path instead of a Hugging Face link, and their weights are never offered for dashboard deletion (that directory is your own training output, not a re-downloadable cache). API: POST /api/models now accepts `local_path` (set exactly one of `repo` or `local_path`), validated against a strict path whitelist with no traversal.",
+      "v0.24.0:0 — configurable cluster topology. Spark Control no longer assumes our exact layout, so a cluster that's wired differently can be monitored without forking. Two new optional settings in Configure Sparks, plus one file-based option: (1) vLLM container name: defaults to \"vllm_node\"; set it if your swappable vLLM runs under a different container name (the swap log view and pre-flight validator exec into it by name). (2) Services to hide: a comma-separated list of built-in services your cluster doesn't run (parakeet, kokoro, embeddings, qdrant); hidden ones show no tile and are never probed, so e.g. a vLLM listening on Parakeet's default port 8000 is no longer hit by a Parakeet probe. (3) Monitor a second vLLM: register a vLLM on another Spark as a custom service with kind \"vllm\" in /data/services-overrides.yaml; it gets a monitor-only health tile (loaded model, container state, and start/stop/restart, but no model swap) alongside the swappable one. API: /api/endpoints now reports a `disabled` flag per service.",
   },
   migrations: {
     up: async ({ effects }) => {},
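The second-vLLM tile shows the loaded model by calling `/v1/models`. vLLM's OpenAI-compatible server answers that endpoint with a list object whose `data` entries carry each served model's `id`.

Below is a stdlib-only sketch of that probe. `probe_vllm` and `loaded_models` are hypothetical names, not the `probe_vllm_endpoint` in `app/health.py`, whose body isn't in this diff. The sketch covers only the HTTP side, not the SSH container-state check.

```python
from __future__ import annotations

import json
import urllib.request


def loaded_models(body: bytes) -> list[str]:
    """Pull model ids out of an OpenAI-compatible GET /v1/models response body."""
    doc = json.loads(body)
    data = doc.get("data", []) if isinstance(doc, dict) else []
    return [m["id"] for m in data if isinstance(m, dict) and "id" in m]


def probe_vllm(host: str, port: int, timeout: float = 3.0) -> dict:
    """Hypothetical health probe for a monitor-only vLLM tile."""
    if not host:
        # Pure path: no network call when nothing is configured.
        return {"ok": False, "error": "host not configured", "base_url": None}
    base_url = f"http://{host}:{port}"
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            models = loaded_models(resp.read())
    except (OSError, ValueError) as exc:
        # URLError, HTTPError and timeouts are OSError subclasses; bad JSON is a ValueError.
        return {"ok": False, "error": str(exc), "base_url": base_url}
    # A reachable server that lists no model is reported as not OK.
    return {"ok": bool(models), "models": models, "base_url": base_url}
```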
diff --git a/runbook.md b/runbook.md
index 1f3f438..2eeb841 100644
--- a/runbook.md
+++ b/runbook.md
@@ -52,6 +52,26 @@ The **Update** button runs `git fetch && git reset --hard origin/<branch> && doc
 
 3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
 
+## Configurable topology (v0.24.0+)
+
+For a cluster wired differently from the reference layout, there are three optional knobs and no fork is needed: two fields in **Configure Sparks** and one entry in `/data/services-overrides.yaml`:
+
+- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
+- **Services to hide** — a comma-separated subset of `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (no status check, deep-health check, or connectivity-log entry). Use this when a service you don't run would otherwise be probed on a port that something else answers, e.g. a vLLM on port 8000 colliding with Parakeet's default.
+- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
+
+  ```yaml
+  custom:
+    - key: vllm-spark2
+      kind: vllm
+      host: <spark-2-ip>
+      user: <ssh-user>
+      container: vllm_node
+      port: 8000
+  ```
+
+  It gets a monitor-only tile (no model swap): loaded model (via `/v1/models`), container state, and start/stop/restart controls. Spark Control's SSH key must be authorized for that user (Show Public Key → add it to their `authorized_keys`).
+
 ## Adding a new model
 
 1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.