Files
spark-control/AGENTS.md
T
Keysat 7e0759846f v0.27.0:0 - in-app settings gear + swap-lock route fix
Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks"
action (now just the 4 required fields) and into a dashboard ⚙ Settings gear,
backed by a /data/app_settings.json overlay keyed by env-var names. One shared
mutable Settings instance + Settings.reload() applies edits live without a
restart; existing installs' values migrate automatically on first boot.

Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now
configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the
/api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its
url/secret reload live too.
2026-06-18 13:41:28 -05:00

73 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# AGENTS.md
This file provides guidance to coding agents (Claude Code and others) when working with code in this repository. (Claude Code reads it via the `CLAUDE.md` symlink.)
Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.
Subsystem guidance lives in `docs/guides/` and loads when matching files are touched (Claude Code lazy-loads via `.claude/rules/` symlinks; other agents read the guides directly): `startos-package.md` (build/versioning, `package/**`), `fastapi-image.md` (dev server/env/layout, `image/**`), `redaction.md` (vendoring + test gates), `audio-speech.md` (parakeet patches, cluster-container footguns, audio testing). **Read `docs/guides/audio-speech.md` before touching the Sparks' containers over SSH** — ops sessions don't trip the path scoping.
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(spark-control)` and surface them before proposing next steps; triage with `/triage`.
## Stack
- Two halves, always coordinated:
- `image/` — standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
- `package/` — StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
- Build host needs `start-cli`, Node ≥22 + npm, and Docker.
- Cluster runtimes live **on the Sparks, not in this repo** (`spark-vllm-docker`, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
- Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM `:8888` (Spark 1); `parakeet-asr` `:8000`, Kokoro TTS `:8880`, bge-m3 embeddings + Qdrant (Spark 2). See `docs/` for API contracts.
## Commands (headlines — details in the scoped rules)
```bash
(cd package && make x86) # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999) # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest) # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway) # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py) # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file> # e2e audio — hits the LIVE cluster
```
## Layout
- `image/app/` — FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/` — StartOS manifest, interfaces, actions, version + release notes.
- `docs/``AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`, `COORDINATION.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog — items move into "Current state" below when picked up).
## Conventions
- Every shipped change = version bump + release notes + rebuilt s9pk (version format `X.Y.Z:N`; details in the startos-package rule).
- Commit messages: `vX.Y.Z:N - short lowercase summary`. **Never add a Co-Authored-By / Claude attribution trailer.**
- The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
- New external-facing endpoints get documented in `docs/` and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
- Doc layout: `AGENTS.md` is the canonical file; `CLAUDE.md` is a symlink to it (don't overwrite it). Subsystem guides are real files in `docs/guides/<topic>.md` (with `paths:` frontmatter); `.claude/rules/<topic>.md` are relative symlinks into them. A new guide = add `docs/guides/<topic>.md`, symlink it from `.claude/rules/`, and add an index line above.
## Always / Never (cluster-wide)
- **Always** confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
- **Always** use the Spark's **IP** for HTTP probes — `.local` mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust `.local` hostnames inside HTTP client code.
- **Always** pass `SSH_KEY_PATH` / `-i <key>` explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
- **Never** route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via `/scrub` is the only sanctioned path toward frontier models.)
- **Never** commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders. Canonical set: `<spark-1-ip>` / `<spark-2-ip>`, `<spark-1-host>` / `<spark-2-host>`, `<spark-user>`, and generic example names (`Alice`/`Bob`).
- **Never** install `cuda-python` in `parakeet-asr` — crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.
## Current state
- **Built, pending review+install: v0.27.0:0 — in-app Settings gear + two bug fixes** (prompted by the second adopter's v0.25 feedback). (1) The StartOS "Configure Sparks" action is trimmed to the **four required fields** (both Spark IPs + SSH users); every optional knob now lives behind a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names and overlaid on `os.environ`. Edits apply **live** — one shared mutable `Settings` instance, `Settings.reload()` mutates it in place (architecture in the fastapi-image guide). Existing installs' optional values **migrate automatically** on first boot (`seed_from_env`, nothing lost). (2) **NEW: support-service ports are configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` was already a knob, now surfaced in the gear) — fixes the adopter's false "vLLM down" (their vLLM is on 8000, not the launch-cluster.sh default 8888) and Parakeet 404 (they remapped it off 8000). (3) **Bug fix:** `GET /api/swap/lock` returned 404 — it was shadowed by `GET /api/swap/{job_id}`; the static lock routes now register first (load-bearing comment added). Tested locally (151 pytest + live smoke: live-apply, secret masking, route fix, 422 on bad input). Still **un-diagnosed** from the adopter report: their disk-scan shows Gemma "not on disk" — needs them to confirm where it's cached (`ls ~/.cache/huggingface/hub` as the SSH user) vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption. **Next: act on the code review, then build/install (go/no-go) + draft the adopter reply describing the fixes.**
- **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
- **Next (owner-driven, concrete): Gemma-4-26B-A4B vision daily-driver eval.** The `gemma4-26b` recipe is in the catalog (NVFP4 MoE; `--moe_backend=marlin` set — the fast CUTLASS FP4 path errors on GB10; vision+tools). Not yet downloaded or swap-tested. Owner wants vision for business-card OCR and is weighing it against the text-only Qwen3.6 35B daily driver (research: Gemma ~52 tok/s vs Qwen's ~97, slightly weaker reasoning). Next: download it, swap-test, try a business card.
- **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
- **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
- **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
- **Tests:** offline pytest harness in `image/tests/``cd image && .venv/bin/python -m pytest` (151 passing; the in-app settings gear + swap-lock route-order regression are in `test_app_settings.py`, incl. a `TestClient` live-apply check). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), the configurable-topology layer (`test_topology.py`), the coordination layer (`test_coordination.py`: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — `now` is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (`test_discovery.py`: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and `infer_recipe` family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The `build_menu` merge + `/api/models/suggest` are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 14s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting / distribution:** source on self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.) The s9pk ships via Gitea Releases (`make release`) **and** a self-hosted StartOS registry — operator-local publish tooling lives outside the repo; owner-specific addresses + the **authenticated-writes-must-be-direct-not-via-the-tunnel** gotcha are in session memory.
- **Design stance (decided):** Spark Control = control plane / GPU arbiter, **not** a job runner; recurring business jobs live in separate services that *call* the swap API (`POST /api/swap`). Full epic history (v0.22→v0.25) is in git log + `ROADMAP.md` → "Cluster coordination".
- **Usage note (2026-06-18):** owner's daily driver is the solo **Qwen3.6 35B**; the 235B `cluster` models are dormant. Keeping `launch-cluster.sh` (the `eugr/spark-vllm-docker` community standard, mirrors NVIDIA's `dgx-spark-playbooks` Ray+RoCE design) is still correct even single-node — it supplies the maintained, hardware-tuned vLLM images; raw docker would mean DIY image upkeep for no gain. Spark 2 stays the speech/embeddings box regardless.
- **Next steps (all low-priority / externally gated; P2/P3 tech-debt backlog in `ROADMAP.md`):** (1) raw-`docker run` swap generalization — **DEFERRED** (rationale in ROADMAP; revisit only if an adopter wants Spark Control to *drive*, not just monitor, raw-docker swaps — cleanest fix is the adopter adopting `launch-cluster.sh`). (2) audio concurrency knee — only if the Signal Engine dev wants it (needs a quiet window). (3) matrix-bridge Docker `HEALTHCHECK` — only if the bot dev asks. (4) Parakeet long-audio guard — deferred (rationale in ROADMAP).