Keysat df9f244eae v0.26.0:0 - disk-driven model menu (scan sparks; recipes; needs-setup)

The dashboard menu is now the set of models actually downloaded on the
Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as
launch recipes matched to an on-disk model by repo; an on-disk model with
no recipe is flagged needs_setup and its launch settings are inferred from
its config.json for a one-time operator confirmation (discovery.py).

- delete now removes weights AND the menu card (delete_from_disk sweeps all
  hosts; the delete endpoint resolves keys via the live menu)
- new GET /api/models/suggest; /api/models returns the menu + a recipes list
  (download autocomplete); GET /api/models/disk-status removed
- dropped the two legacy Qwen recipes (235B FP8, 2.5 72B)
- tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge)

2026-06-18 11:09:56 -05:00


AGENTS.md

This file provides guidance to coding agents (Claude Code and others) when working with code in this repository. (Claude Code reads it via the CLAUDE.md symlink.)

Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.

Subsystem guidance lives in docs/guides/ and loads when matching files are touched (Claude Code lazy-loads via .claude/rules/ symlinks; other agents read the guides directly): startos-package.md (build/versioning, package/**), fastapi-image.md (dev server/env/layout, image/**), redaction.md (vendoring + test gates), audio-speech.md (parakeet patches, cluster-container footguns, audio testing). Read docs/guides/audio-speech.md before touching the Sparks' containers over SSH — ops sessions don't trip the path scoping.

Inbox check: At session start, if ~/Projects/standards/INBOX.md exists, scan it for items tagged (spark-control) and surface them before proposing next steps; triage with /triage.

Stack

Two halves, always coordinated:
- image/ — standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
- package/ — StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
Build host needs start-cli, Node ≥22 + npm, and Docker.
Cluster runtimes live on the Sparks, not in this repo (spark-vllm-docker, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM :8888 (Spark 1); parakeet-asr :8000, Kokoro TTS :8880, bge-m3 embeddings + Qdrant (Spark 2). See docs/ for API contracts.

Commands (headlines — details in the scoped rules)

(cd package && make x86)                                  # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999)          # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest)                          # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway)      # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py)   # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file>        # e2e audio — hits the LIVE cluster

Layout

image/app/ — FastAPI app (server.py entry, routers in sibling modules, static/ dashboard UI).
package/startos/ — StartOS manifest, interfaces, actions, version + release notes.
docs/ — AUDIO_API.md, EMBEDDINGS.md, REDACTION_GATEWAY.md, COORDINATION.md (consumer-facing API refs; update with API changes).
README.md (overview), HANDOFF.md (fresh-user install guide), runbook.md (ops notes), known-issues.md, ROADMAP.md (longer-term backlog — items move into "Current state" below when picked up).

Conventions

Every shipped change = version bump + release notes + rebuilt s9pk (version format X.Y.Z:N; details in the startos-package rule).
Commit messages: vX.Y.Z:N - short lowercase summary. Never add a Co-Authored-By / Claude attribution trailer.
The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
New external-facing endpoints get documented in docs/ and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
Doc layout: AGENTS.md is the canonical file; CLAUDE.md is a symlink to it (don't overwrite it). Subsystem guides are real files in docs/guides/<topic>.md (with paths: frontmatter); .claude/rules/<topic>.md are relative symlinks into them. A new guide = add docs/guides/<topic>.md, symlink it from .claude/rules/, and add an index line above.
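The commit-message convention above can be checked mechanically. The following is a minimal sketch, not part of the repository: the function name `check_commit_message` and the exact regular expressions are illustrative assumptions, and "lowercase" is interpreted loosely as "the summary does not start with a capital letter".

```python
import re

# Assumed reading of "vX.Y.Z:N - short lowercase summary".
SUBJECT_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+):(\d+) - (\S.*)$")
# Matches a Co-Authored-By trailer at the start of any line, case-insensitively.
FORBIDDEN_TRAILER_RE = re.compile(r"^co-authored-by:", re.IGNORECASE | re.MULTILINE)


def check_commit_message(message: str) -> list[str]:
    """Return a list of convention problems; an empty list means the message passes."""
    problems: list[str] = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    match = SUBJECT_RE.match(subject)
    if match is None:
        problems.append("subject must look like 'vX.Y.Z:N - short lowercase summary'")
    elif match.group(5)[0].isupper():
        problems.append("summary should start with a lowercase letter")
    if FORBIDDEN_TRAILER_RE.search(message):
        problems.append("remove the Co-Authored-By trailer")
    return problems
```

A check like this could run from a local commit-msg hook; whether the repository wants such a hook is a separate decision.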

Always / Never (cluster-wide)

Always confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
Always use the Spark's IP for HTTP probes — .local mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust .local hostnames inside HTTP client code.
Always pass SSH_KEY_PATH / -i <key> explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
Never route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via /scrub is the only sanctioned path toward frontier models.)
Never commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders. Canonical set: <spark-1-ip> / <spark-2-ip>, <spark-1-host> / <spark-2-host>, <spark-user>, and generic example names (Alice/Bob).
Never install cuda-python in parakeet-asr — crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.
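Two of the rules above (IPv4 addresses for HTTP probes, an explicit key for scripted SSH) lend themselves to small guard helpers. This is a sketch under stated assumptions: the helper names `require_ipv4_literal`, `probe_url`, and `ssh_argv` are hypothetical and do not come from the codebase, `/health` is used only because this file mentions that path, and `BatchMode=yes` is an extra OpenSSH option added here for illustration.

```python
import ipaddress


def require_ipv4_literal(host: str) -> str:
    """Return host unchanged if it is an IPv4 literal; raise ValueError otherwise."""
    try:
        ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError as exc:
        # Hostnames such as *.local are rejected: they may resolve IPv6-first.
        raise ValueError(f"use the Spark's IPv4 address, not a hostname: {host!r}") from exc
    return host


def probe_url(host: str, port: int, path: str = "/health") -> str:
    """Build a probe URL that is guaranteed to target an IPv4 literal."""
    return f"http://{require_ipv4_literal(host)}:{port}{path}"


def ssh_argv(user: str, host: str, key_path: str, remote_cmd: str) -> list[str]:
    """Build an ssh argv that always names the identity file explicitly."""
    if not key_path:
        raise ValueError("an explicit key path is required; no ssh-agent is available")
    # BatchMode=yes (an assumption, not from this file) makes ssh fail instead of prompting.
    return ["ssh", "-i", key_path, "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]
```

For example, `probe_url("192.0.2.10", 8888)` yields a URL for a documentation-range address, while `probe_url("<spark-1-host>.local", 8888)` raises before any request is made.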

Current state

Built, install pending: v0.26.0:0, the disk-driven model menu. The dashboard now lists what's actually downloaded on the Sparks instead of a hard-coded catalog. models.yaml + overrides are reframed as launch recipes matched to an on-disk model by repo (no longer "the menu"). image/app/discovery.py does the merge: build_menu takes the union of the models found on both Sparks (disk.list_cached_models, one du per host) and the recipes; an on-disk model with no recipe is flagged needs_setup, and infer_recipe reads its config.json to prefill a one-time setup form (operator confirms; saved to /data overrides). Delete now removes weights and the card (delete_from_disk sweeps all hosts; the delete endpoint resolves keys via the live menu so discovered models are deletable). New GET /api/models/suggest; /api/models returns the menu + a recipes list (download-box autocomplete); GET /api/models/disk-status removed (folded into /api/models). Dropped the two legacy Qwen recipes (235B FP8, 2.5 72B). Build/typecheck clean; install (live-service restart) needs go/no-go. Why a recipe layer survives a "menu = disk" redesign: a folder can't tell you parsers / solo-vs-cluster / MoE backend (Gemma MoE needs marlin on GB10). Disk drives presence; recipes drive launch.
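The disk-plus-recipes merge described above can be pictured with a small sketch. It is not the code in discovery.py: `MenuEntry`, `merge_menu`, and the field names are hypothetical, and it assumes recipes are keyed by repo and that a recipe with no on-disk model still appears in the menu.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuEntry:
    repo: str
    on_disk: bool
    has_recipe: bool
    needs_setup: bool  # on disk, but no launch recipe yet


def merge_menu(disk_repos: set[str], recipes: dict[str, dict]) -> list[MenuEntry]:
    """Union of on-disk repos and recipe repos, one entry per repo, sorted by name."""
    entries: list[MenuEntry] = []
    for repo in sorted(disk_repos | set(recipes)):
        on_disk = repo in disk_repos
        has_recipe = repo in recipes
        entries.append(
            MenuEntry(
                repo=repo,
                on_disk=on_disk,
                has_recipe=has_recipe,
                needs_setup=on_disk and not has_recipe,
            )
        )
    return entries
```

In this sketch, disk decides presence and the recipe dict decides whether launch settings exist, which mirrors the split the paragraph above argues for.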
Live: v0.25.0:0 (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable VLLM_PORT (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — VLLM_CONTAINER, DISABLED_SERVICES hide-list, second-Spark kind: vllm monitor), coordination layer (v0.25 — swap reservation lock with 423-enforced manual-swap pause + ?force=true Release override, swap_complete/swap_failed webhook, read-only schedule registry; consumer API in docs/COORDINATION.md).
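To illustrate the reservation-lock idea (a held lock pauses manual swaps with HTTP 423, and a forced release overrides it), here is a minimal in-memory sketch. It is an assumption-laden illustration rather than the shipped coordination layer: the `SwapLock` class, its method names, and the token scheme are invented here. The clock is passed in so that expiry can be exercised without sleeping.

```python
import secrets
import time
from typing import Callable, Optional


class SwapLock:
    """Single-holder reservation lock with a time-to-live and an injectable clock."""

    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def held(self) -> bool:
        return self._token is not None and self._now() < self._expires_at

    def reserve(self, ttl_seconds: float) -> Optional[str]:
        """Return a fresh token, or None if an unexpired reservation exists."""
        if self.held():
            return None
        self._token = secrets.token_urlsafe(16)
        self._expires_at = self._now() + ttl_seconds
        return self._token

    def release(self, token: Optional[str] = None, force: bool = False) -> bool:
        """Release with the matching token, or unconditionally when force is set."""
        if not self.held():
            self._token = None
            return True
        assert self._token is not None
        if force or (
            token is not None
            and secrets.compare_digest(token.encode(), self._token.encode())
        ):
            self._token = None
            return True
        return False

    def manual_swap_status(self) -> int:
        """HTTP status a manual swap request would get: 423 while reserved, else 200."""
        return 423 if self.held() else 200
```

A test can pass `now=lambda: clock[0]` and advance `clock[0]` past the TTL to check expiry deterministically.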
Other live features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); /scrub + /rehydrate; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard VPN <ip> hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (EVALUATION.md). Spark 2 audio/embeddings stack healthy.
matrix-bridge bot tile (v0.21.0:1, live): bot-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as modelo (no sudo -iu; blank matrix_bridge_user ⇒ tile hidden; host reuses spark2_host). Code: app/matrix_bridge.py + /api/matrix-bridge/{update,logs}. Load-bearing: Update's git fetch runs as modelo and needs modelo's ~/.ssh/config pinning the Gitea deploy key with IdentitiesOnly yes (else publickey denial). Optional next only if the bot dev asks: Docker HEALTHCHECK.
Tests: offline pytest harness in image/tests/ — cd image && .venv/bin/python -m pytest (137 passing). Covers build_launch_command (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the shellsafe validators, matrix_bridge.build_update_command (+ phase detection), the configurable-topology layer (test_topology.py), the coordination layer (test_coordination.py: swap-lock lifecycle/expiry/token-auth, schedule-registry CRUD, webhook payload + HMAC signature — now is injected into the lock so expiry is tested without sleeping), and the disk-driven menu (test_discovery.py: cache-dirname↔repo parsing, the cache-listing parser incl. incomplete-download filtering, and infer_recipe family/mode mapping — Qwen3-MoE→flashinfer_cutlass, Gemma-MoE→marlin, vision caps, solo-vs-cluster by size/host-count). The build_menu merge + /api/models/suggest are exercised by hand against the live cluster (mock-heavy unit tests there would test the mocks). Redaction + live-audio suites remain standalone scripts.
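The cache-dirname↔repo mapping mentioned above might look like the following, assuming the models sit in the standard Hugging Face hub cache layout (`models--<org>--<name>`). That layout is an assumption about this deployment, and the function names are illustrative rather than taken from discovery.py. Splitting on the first `--` also assumes organisation names never contain a double hyphen.

```python
from typing import Optional

PREFIX = "models--"


def repo_to_cache_dirname(repo: str) -> str:
    """'org/name' -> 'models--org--name' (assumed Hugging Face hub cache naming)."""
    return PREFIX + repo.replace("/", "--")


def cache_dirname_to_repo(dirname: str) -> Optional[str]:
    """Inverse mapping; returns None for directories that are not model caches."""
    if not dirname.startswith(PREFIX):
        return None
    body = dirname[len(PREFIX):]
    if not body:
        return None
    org, sep, name = body.partition("--")
    if not sep:
        return org  # repo without an organisation prefix
    if not org or not name:
        return None  # malformed, e.g. 'models--org--'
    return f"{org}/{name}"
```

Round-tripping a repo through both functions is a cheap property to assert in a unit test.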
Signal Engine "flakiness": diagnosed as not a server bug — transient 1–4s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and forwarded to that dev (owner confirmed 2026-06-15). Awaiting whether they want the measured concurrency knee.
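A client-side remedy of this shape (cap the requests in flight, retry on timeout or 503) could be sketched as below. This is not the draft that was forwarded to the Signal Engine dev: `call_with_retry`, the `send` callable, the attempt count, and the linear backoff are all assumptions, and the "ceiling 3" figure is not modelled because this file does not define it.

```python
import asyncio
from typing import Awaitable, Callable


async def call_with_retry(
    send: Callable[[object], Awaitable[int]],
    payload: object,
    *,
    gate: asyncio.Semaphore,
    max_attempts: int = 3,        # assumed value, not from this file
    backoff_seconds: float = 1.0,  # assumed value, not from this file
) -> int:
    """Call send(payload) under the in-flight gate; retry on timeout or HTTP 503."""
    for attempt in range(1, max_attempts + 1):
        async with gate:  # at most N callers are inside send() at once
            try:
                status = await send(payload)
            except (asyncio.TimeoutError, TimeoutError):
                status = None
        if status is not None and status != 503:
            return status
        if attempt < max_attempts:
            # Wait outside the gate so a backing-off caller does not hold a slot.
            await asyncio.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Callers would share one `asyncio.Semaphore(2)` to get the in-flight cap of 2, and `send` would wrap whatever HTTP client the consumer already uses.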
Stance (decided, not built): no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
Known limits: /health blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast docker restart (status re-checked only after the command returns).
Infra gotcha (safety): passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses ip, not sudo wg show). spark2 sits on the starttunnel WireGuard subnet (10.59.211.6/24, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key name leaked) — don't re-flag.
Hosting: self-hosted Gitea — remote gitea, branch master, over SSH; push after committing. (Wart: commit 8d839e3 is mislabeled v0.13.0:4 but contains through v0.18.0:0.)
Design stance (decided): Spark Control = control plane / GPU arbiter, not a job runner; recurring business jobs live in separate services that call the swap API (POST /api/swap). Full epic history (v0.22→v0.25) is in git log + ROADMAP.md → "Cluster coordination".
Usage note (2026-06-18): owner's daily driver is the solo Qwen3.6 35B; the 235B cluster models are dormant. Keeping launch-cluster.sh (the eugr/spark-vllm-docker community standard, which mirrors NVIDIA's dgx-spark-playbooks Ray+RoCE design) is still correct even for single-node use: it supplies the maintained, hardware-tuned vLLM images, whereas raw docker would mean DIY image upkeep for no gain. Spark 2 stays the speech/embeddings box regardless.
Next steps (all low-priority / externally gated; P2/P3 tech-debt backlog in ROADMAP.md): (1) raw-docker run swap generalization — DEFERRED (rationale in ROADMAP; revisit only if an adopter wants Spark Control to drive, not just monitor, raw-docker swaps — cleanest fix is the adopter adopting launch-cluster.sh). (2) audio concurrency knee — only if the Signal Engine dev wants it (needs a quiet window). (3) matrix-bridge Docker HEALTHCHECK — only if the bot dev asks. (4) Parakeet long-audio guard — deferred (rationale in ROADMAP).
