docs: v0.27.0:0 live + shipped; record settings-gear architecture + snapshot-holder gotcha

2026-06-18 13:51:11 -05:00
parent 7e0759846f
commit a20c538ebf
2 changed files with 4 additions and 3 deletions
@@ -55,13 +55,14 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 ## Current state
- **Built, pending review+install: v0.27.0:0 — in-app Settings gear + two bug fixes** (prompted by the second adopter's v0.25 feedback). (1) The StartOS "Configure Sparks" action is trimmed to the **four required fields** (both Spark IPs + SSH users); every optional knob now lives behind a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names and overlaid on `os.environ`. Edits apply **live** — one shared mutable `Settings` instance, `Settings.reload()` mutates it in place (architecture in the fastapi-image guide). Existing installs' optional values **migrate automatically** on first boot (`seed_from_env`, nothing lost). (2) **NEW: support-service ports are configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` was already a knob, now surfaced in the gear) — fixes the adopter's false "vLLM down" (their vLLM is on 8000, not the launch-cluster.sh default 8888) and Parakeet 404 (they remapped it off 8000). (3) **Bug fix:** `GET /api/swap/lock` returned 404 — it was shadowed by `GET /api/swap/{job_id}`; the static lock routes now register first (load-bearing comment added). Tested locally (151 pytest + live smoke: live-apply, secret masking, route fix, 422 on bad input). Still **un-diagnosed** from the adopter report: their disk-scan shows Gemma "not on disk" — needs them to confirm where it's cached (`ls ~/.cache/huggingface/hub` as the SSH user) vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption. **Next: act on the code review, then build/install (go/no-go) + draft the adopter reply describing the fixes.**
+- **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:** `GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
 - **Next on this thread (small, externally gated):** (a) **adopter reply is drafted** (in the session — corrects the vLLM-port misconception → set 8000 in the gear, confirms the port knobs + swap/lock fix, asks the disk-scan diagnostic) — **pending Grant to send** + pick the distribution-channel wording. (b) **Optional Gitea tag + `make release`** so the adopter can pull v0.27 from Gitea Releases (NOT done this session — only registry + sideload shipped); do it only if that adopter pulls from Gitea Releases rather than subscribing to Clankistry. (c) **Un-diagnosed:** adopter's disk-scan shows Gemma "not on disk" — needs them to run `ls ~/.cache/huggingface/hub` as the SSH user vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption (likely a custom `HF_HOME`/container-volume/different-user cache path → would need a configurable cache path).
 - **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
 - **Next (owner-driven, concrete): Gemma-4-26B-A4B vision daily-driver eval.** The `gemma4-26b` recipe is in the catalog (NVFP4 MoE; `--moe_backend=marlin` set — the fast CUTLASS FP4 path errors on GB10; vision+tools). Not yet downloaded or swap-tested. Owner wants vision for business-card OCR and is weighing it against the text-only Qwen3.6 35B daily driver (research: Gemma ~52 tok/s vs Qwen's ~97, slightly weaker reasoning). Next: download it, swap-test, try a business card.
 - **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
 - **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
 - **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (151 passing; the in-app settings gear + swap-lock route-order regression are in `test_app_settings.py`, incl. a `TestClient` live-apply check). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
+- **Tests:** offline pytest harness in `image/tests/` — `cd image && .venv/bin/python -m pytest` (157 passing; the in-app settings gear + swap-lock route-order regression + the webhook-repoint live-reload check are in `test_app_settings.py`, incl. `TestClient` end-to-end). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
 - **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
 - **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
@@ -35,7 +35,7 @@ Two kinds, both run with the `image/.venv` interpreter (system python3 has no de
 - New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
 - **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
 - **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).
- **Settings split (gear vs StartOS action):** only the four *required* fields (both Spark IPs + SSH users) live in the StartOS "Configure Sparks" action → `config.yaml` → env. Every *optional* knob (ports, container names, support-service hosts, integrations, webhook) is edited in the dashboard's ⚙ Settings gear, backed by the `/data/app_settings.json` overlay (`app_settings.py`), keyed by the same env-var names. Precedence (`config._effective_env`): `os.environ` first, overlay on top. `app_settings.seed_from_env` runs **once at startup** to migrate a pre-gear install's env values into the overlay (don't move seeding into `from_env`/`reload` — it writes, and `from_env` runs on every build → it would clobber across calls, which it did once already). **`Settings` is deliberately not frozen:** one shared instance is threaded by reference into every router closure/manager, and `Settings.reload()` (called after a gear save) recomputes its fields **in place** so changes apply live with no restart and no call-site changes. A new gear knob = add one entry to `app_settings.FIELDS` (the front-end renders it generically); the matching `config.Settings` field must already read that env var.
+- **Settings split (gear vs StartOS action):** only the four *required* fields (both Spark IPs + SSH users) live in the StartOS "Configure Sparks" action → `config.yaml` → env. Every *optional* knob (ports, container names, support-service hosts, integrations, webhook) is edited in the dashboard's ⚙ Settings gear, backed by the `/data/app_settings.json` overlay (`app_settings.py`), keyed by the same env-var names. Precedence (`config._effective_env`): `os.environ` first, overlay on top. `app_settings.seed_from_env` runs **once at startup** to migrate a pre-gear install's env values into the overlay (don't move seeding into `from_env`/`reload` — it writes, and `from_env` runs on every build → it would clobber across calls, which it did once already). **`Settings` is deliberately not frozen:** one shared instance is threaded by reference into every router closure/manager, and `Settings.reload()` (called after a gear save) recomputes its fields **in place** so changes apply live with no restart and no call-site changes. **Gotcha:** this only reaches holders that keep the *object* (`self.settings = settings`); anything that snapshots a *value* at construction is invisible to `reload()` and must be re-synced explicitly. The one such holder is `WebhookNotifier`, which copies `url`/`secret` — `post_settings` calls `swap_webhook.update(...)` right after `reload()`. Any future component that caches a gear-managed value (rather than reading `settings.x` at use time) needs the same treatment. A new gear knob = add one entry to `app_settings.FIELDS` (the front-end renders it generically); the matching `config.Settings` field must already read that env var.
 ## Layout