Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes), not just Hugging Face repos.

- ModelDef gains local_path; a model must set exactly one of repo / local_path. The validator also enforces the local-path whitelist and that any --chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook, then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'.
- POST /api/models validates the full entry via ModelDef before persisting, so a bad entry can't be written and then break catalog load; _merge_overrides skips an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal, exactly-one-source, _merge_overrides skip-invalid (94 pass).

Reviewer-agent pass; findings addressed.
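The validation rules listed above can be sketched roughly as follows. This is an illustration only, not the repository's code: the regex, the function bodies, and the `ModelSource` dataclass are assumptions. Only the rules themselves come from the message (absolute path, charset whitelist, no '.'/'..' segments, exactly one of repo / local_path, chat template inside local_path).

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Assumed whitelist: letters, digits, and / . _ - only. The real charset may differ.
_SAFE_PATH = re.compile(r"/[A-Za-z0-9._/-]+")


def validate_local_path(path: str) -> str:
    """Return path unchanged if it passes the three checks, else raise ValueError."""
    if not path.startswith("/"):
        raise ValueError("local_path must be absolute")
    if not _SAFE_PATH.fullmatch(path):
        raise ValueError("local_path contains characters outside the whitelist")
    # Reject '.' and '..' segments so the path cannot traverse elsewhere.
    if any(segment in (".", "..") for segment in path.split("/")):
        raise ValueError("local_path must not contain '.' or '..' segments")
    return path


@dataclass(frozen=True)
class ModelSource:
    """Hypothetical stand-in for the source-related fields of ModelDef."""

    repo: Optional[str] = None
    local_path: Optional[str] = None
    chat_template: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one of repo / local_path must be set.
        if (self.repo is None) == (self.local_path is None):
            raise ValueError("set exactly one of repo / local_path")
        if self.local_path is not None:
            validate_local_path(self.local_path)
            if self.chat_template is not None:
                validate_local_path(self.chat_template)
                root = PurePosixPath(self.local_path)
                # Only local_path is mounted, so the template must sit below it.
                if root not in PurePosixPath(self.chat_template).parents:
                    raise ValueError("chat template must live inside local_path")
```

Because the whitelist excludes spaces and shell metacharacters, a path that passes is also safe to place in a variable that a script expands unquoted, which is the property the bind-mount hook relies on.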
AGENTS.md
This file provides guidance to coding agents (Claude Code and others) when working with code in this repository. (Claude Code reads it via the CLAUDE.md symlink.)
Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.
Subsystem guidance lives in docs/guides/ and loads when matching files are touched (Claude Code lazy-loads via .claude/rules/ symlinks; other agents read the guides directly): startos-package.md (build/versioning, package/**), fastapi-image.md (dev server/env/layout, image/**), redaction.md (vendoring + test gates), audio-speech.md (parakeet patches, cluster-container footguns, audio testing). Read docs/guides/audio-speech.md before touching the Sparks' containers over SSH — ops sessions don't trip the path scoping.
Inbox check: At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items tagged `(spark-control)` and surface them before proposing next steps; triage with `/triage`.
Stack
- Two halves, always coordinated:
  - `image/`: standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
  - `package/`: StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
- Build host needs `start-cli`, Node ≥22 + npm, and Docker.
- Cluster runtimes live on the Sparks, not in this repo (`spark-vllm-docker`, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
- Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM `:8888` (Spark 1); `parakeet-asr` `:8000`, Kokoro TTS `:8880`, bge-m3 embeddings + Qdrant (Spark 2). See `docs/` for API contracts.
Commands (headlines — details in the scoped rules)
(cd package && make x86) # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999) # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest) # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway) # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py) # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file> # e2e audio — hits the LIVE cluster
Layout
- `image/app/`: FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/`: StartOS manifest, interfaces, actions, version + release notes.
- `docs/`: `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog; items move into "Current state" below when picked up).
Conventions
- Every shipped change = version bump + release notes + rebuilt s9pk (version format `X.Y.Z:N`; details in the startos-package rule).
- Commit messages: `vX.Y.Z:N - short lowercase summary`. Never add a Co-Authored-By / Claude attribution trailer.
- The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
- New external-facing endpoints get documented in `docs/` and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
- Doc layout: `AGENTS.md` is the canonical file; `CLAUDE.md` is a symlink to it (don't overwrite it). Subsystem guides are real files in `docs/guides/<topic>.md` (with `paths:` frontmatter); `.claude/rules/<topic>.md` are relative symlinks into them. A new guide = add `docs/guides/<topic>.md`, symlink it from `.claude/rules/`, and add an index line above.
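The commit-message convention above could be checked mechanically. The sketch below is an assumption about how the rule might be read (in particular, "lowercase" is taken to mean the summary contains no uppercase letters); `check_commit_message` is a hypothetical helper, not something in this repo.

```python
import re

# Assumed reading of the convention: "vX.Y.Z:N - summary" on the first line.
_SUBJECT = re.compile(r"v\d+\.\d+\.\d+:\d+ - \S.*")


def check_commit_message(message: str) -> list[str]:
    """Return a list of convention violations; an empty list means the message passes."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not _SUBJECT.fullmatch(subject):
        problems.append("subject must look like 'vX.Y.Z:N - short lowercase summary'")
    else:
        summary = subject.split(" - ", 1)[1]
        if summary != summary.lower():
            problems.append("summary must be lowercase")
    # The attribution trailer is banned anywhere in the message.
    if any(line.lower().startswith("co-authored-by:") for line in lines):
        problems.append("remove the Co-Authored-By trailer")
    return problems
```

A check like this could run from a `commit-msg` hook, though whether that is worth wiring up is a judgment call for the repo owner.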
Always / Never (cluster-wide)
- Always confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
- Always use the Spark's IP for HTTP probes: `.local` mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust `.local` hostnames inside HTTP client code.
- Always pass `SSH_KEY_PATH` / `-i <key>` explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
- Never route audio or transcripts to cloud services; speech stays on the LAN. (Scrubbed text via `/scrub` is the only sanctioned path toward frontier models.)
- Never commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs; this package gets shared, so use placeholders. Canonical set: `<spark-1-ip>` / `<spark-2-ip>`, `<spark-1-host>` / `<spark-2-host>`, `<spark-user>`, and generic example names (Alice/Bob).
- Never install `cuda-python` in `parakeet-asr`: it crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.
Current state
- Working (v0.22.0:0, installed and serving): swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Spark 2 audio stack healthy. Security hardening (v0.19.0:0: shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
- matrix-bridge bot tile (done, v0.21.0:1, verified live): `bot`-kind service tile; status badge from docker-state only (no HTTP port), plus Update / Restart / Stop/Start / View logs. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (no `sudo -iu`; spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. Load-bearing ops dep: Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes`; otherwise the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
- Tests: offline pytest harness in `image/tests/`; run `cd image && .venv/bin/python -m pytest` (70 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, the `shellsafe` validators, and `matrix_bridge.build_update_command` (+ phase detection). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
- Signal Engine "flakiness": diagnosed as not a server bug; it is transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and forwarded to that dev (owner confirmed 2026-06-15). Awaiting whether they want the measured concurrency knee.
- Stance (decided, not built): no public interface / no API-token auth; LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- Known limits: `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- Infra gotcha (safety): passwordless sudo is NOT configured on spark2, so design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (10.59.211.6/24, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key name leaked); don't re-flag.
- Hosting: self-hosted Gitea; remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- Next (committed 2026-06-17): OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination"). Stance: Spark Control = control plane / GPU arbiter, not a job runner; business cron jobs live in separate services that call its swap API (swaps are already API-driven via `POST /api/swap`). Sequence:
  - (1) configurable `VLLM_PORT`: SHIPPED v0.22.0:0 (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and published to the self-hosted Gitea Releases 2026-06-17 (`make release` → `scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). Distribution model (decided 2026-06-17): Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` `ROOT_URL`, which won't resolve off-LAN; a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP).
  - (2) local-path/fine-tuned models: DONE in tree, staged as v0.23.0:0 (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, no `launch-cluster.sh` change; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass; verified via TestClient). Reviewer-agent pass done; findings addressed: path validation folded into the `ModelDef` validator (so YAML/override-added local models are checked too), a chat-template-must-live-inside-`local_path` guard, `_merge_overrides` skips a bad entry instead of breaking the whole catalog, and the `VLLM_SPARK_EXTRA_DOCKER_ARGS` unquoted-expansion contract is documented in `runbook.md`. Not yet built/installed/published; awaiting go/no-go.
  - Next: (3) configurable topology (service→Spark→port map + container names); (4) coordination layer (swap lock + swap webhook + schedule visibility), only when our own automation lands.
  - Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
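The off-LAN caveat in the distribution note (Gitea hands back `browser_download_url` on its `.local` `ROOT_URL`) can be handled on the adopter's side by keeping the asset path and swapping in a reachable base address. The sketch below assumes the release JSON has already been fetched and parsed; `s9pk_download_url` and `reachable_base` are hypothetical names, and the token header and the download itself are deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit


def s9pk_download_url(release: dict, reachable_base: str) -> str:
    """Pick the .s9pk asset from a Gitea release dict and re-point it at reachable_base."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".s9pk"):
            original = urlsplit(asset["browser_download_url"])
            base = urlsplit(reachable_base)
            # Keep the path and query Gitea returned; swap only scheme and host:port.
            return urlunsplit((base.scheme, base.netloc, original.path, original.query, ""))
    raise LookupError("release has no .s9pk asset")
```

Here `reachable_base` would be whatever address actually reaches the Gitea from the adopter's network (for a remote adopter, the WireGuard IP), written as scheme plus host and port.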