Make the cluster topology configurable so an adopter wired differently
(vLLM on both Sparks, port 8000, different container name, no Parakeet)
can monitor without forking. Covers the OpenClaw report P4/P5/#6.
- VLLM_CONTAINER override (default vllm_node), validated at the boundary
and quote_arg-quoted into the swap log-tail + pre-flight validator exec.
- DISABLED_SERVICES list: hidden services show no tile and are skipped by
status/deep-health/connectivity probes (kills the Parakeet-on-8000
collision).
- kind: vllm custom service monitors a second Spark's vLLM via the shared
probe_vllm_endpoint; /api/endpoints gains a disabled flag.
Swap mechanism intentionally not generalized to raw docker run (that's
coordination, roadmap item 4).
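A minimal sketch of the boundary validation and quoting described above, not the shipped code. Assumptions: `shlex.quote` stands in for the project's `quote_arg`, the container-name pattern mirrors Docker's usual name charset, DISABLED_SERVICES is a comma-separated string, and the helper names `validate_container_name`, `parse_disabled_services` and `log_tail_command` are hypothetical.

```python
from __future__ import annotations

import re
import shlex

# Assumption: same charset Docker accepts for container names.
_CONTAINER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]+")


def validate_container_name(raw: str, default: str = "vllm_node") -> str:
    """Return a safe container name; a blank value falls back to the default."""
    name = raw.strip() or default
    if not _CONTAINER_RE.fullmatch(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name


def parse_disabled_services(raw: str) -> frozenset[str]:
    """Split a comma-separated list (assumed format), dropping empty entries."""
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def log_tail_command(container: str, lines: int = 50) -> str:
    """Build the remote log-tail command with the container name quoted."""
    # shlex.quote is a stand-in for the project's quote_arg helper.
    return f"docker logs --tail {int(lines)} {shlex.quote(container)}"
```

Validating once at the boundary and quoting again at the point of use means a malformed override is rejected early, and even an accepted value cannot change the shape of the remote command.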
Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes),
not just Hugging Face repos.
- ModelDef gains local_path; a model must set exactly one of repo / local_path.
The validator also enforces the local-path whitelist and that any
--chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME
host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook,
then `vllm serve <dir>`. No launch-cluster.sh change (verified that upstream
expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: requires an absolute path, a whitelisted
charset, and no '.' or '..' segments.
- POST /api/models validates the full entry via ModelDef before persisting, so a
bad entry can't be written and then break catalog load; _merge_overrides skips
an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF
link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal,
exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent
pass; findings addressed.
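The local-model rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the real ModelDef is not shown in this log, so a dataclass stands in for it; the `name` and `chat_template` fields, the exact charset, and the helper `extra_docker_args` are hypothetical; the `-v host:container` form is the standard Docker bind-mount flag.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Assumed charset: absolute path, no whitespace or shell metacharacters.
_SAFE_PATH_RE = re.compile(r"/[A-Za-z0-9._/-]+")


def validate_local_path(path: str) -> str:
    """Accept only absolute, whitelisted paths with no '.'/'..'/empty segments."""
    cleaned = path.rstrip("/")
    if not _SAFE_PATH_RE.fullmatch(cleaned):
        raise ValueError(f"unsafe local path: {path!r}")
    if any(seg in ("", ".", "..") for seg in cleaned.split("/")[1:]):
        raise ValueError(f"path traversal or empty segment in: {path!r}")
    return cleaned


@dataclass(frozen=True)
class ModelDef:
    """Stand-in for the catalog entry; only the source rules are modelled."""
    name: str
    repo: Optional[str] = None
    local_path: Optional[str] = None
    chat_template: Optional[str] = None  # hypothetical field name

    def __post_init__(self) -> None:
        # Exactly one of repo / local_path must be set.
        if (self.repo is None) == (self.local_path is None):
            raise ValueError("set exactly one of repo / local_path")
        if self.local_path is not None:
            root = validate_local_path(self.local_path)
            if self.chat_template is not None:
                tmpl = validate_local_path(self.chat_template)
                # Only the model dir is mounted, so the template must be inside.
                if not tmpl.startswith(root + "/"):
                    raise ValueError("chat template must live inside local_path")


def extra_docker_args(model: ModelDef) -> str:
    """Value for VLLM_SPARK_EXTRA_DOCKER_ARGS: same path on host and container."""
    if model.local_path is None:
        return ""
    path = validate_local_path(model.local_path)
    return f"-v {path}:{path}"
```

Because the launch script expands the variable unquoted, the charset check is what keeps the value to a single flag and a single whitespace-free argument.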
- Configure Sparks gains a vLLM port field (blank => 8888, our launch-cluster.sh
default); VLLM_PORT plumbed configureSparks -> sparkConfig.yaml -> main.ts env
-> config.py. So an adopter whose vLLM listens elsewhere (e.g. 8000) can fix
the "vLLM unreachable" health check without rebuilding the package.
- Harden numeric env parsing (config._env_int): a blank or malformed port now
falls back to its default instead of crashing daemon startup (closes a P3
tech-debt item; the Configure panel passes unset optional fields as "").
- Add scripts/gitea-release.sh + `make release` to publish the built s9pk to
Gitea Releases, so the OpenClaw adopter pulls updates with a read-only token
instead of being hand-sent the package.
- Capture the OpenClaw/Johnny-5 coexistence epic and the "control plane, not a
job runner" stance in ROADMAP.md and Current state.
- AGENTS.md: rewrite Current state in lean form for v0.19.0:0; drop the
now-completed full-eval triage block (history lives in git log + EVALUATION.md).
- docs/guides/fastapi-image.md: add two durable conventions: user values
crossing into SSH must go through shellsafe, and new endpoints must account
for the csrf_guard exempt-prefix rule.
- ROADMAP.md: park the remaining non-blocking P2/P3 tech debt from the eval.