# AGENTS.md
This file provides guidance to coding agents (Claude Code and others) when working with code in this repository. (Claude Code reads it via the `CLAUDE.md` symlink.)
Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.
Subsystem guidance lives in `docs/guides/` and loads when matching files are touched (Claude Code lazy-loads via `.claude/rules/` symlinks; other agents read the guides directly): `startos-package.md` (build/versioning, `package/**`), `fastapi-image.md` (dev server/env/layout, `image/**`), `redaction.md` (vendoring + test gates), `audio-speech.md` (parakeet patches, cluster-container footguns, audio testing). **Read `docs/guides/audio-speech.md` before touching the Sparks' containers over SSH** — ops sessions don't trip the path scoping.
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(spark-control)` and surface them before proposing next steps; triage with `/triage`.
## Stack
- Two halves, always coordinated:
  - `image/`: standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
  - `package/`: StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
- Build host needs `start-cli`, Node ≥22 + npm, and Docker.
- Cluster runtimes live **on the Sparks, not in this repo** (`spark-vllm-docker`, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
- Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM `:8888` (Spark 1); `parakeet-asr` `:8000`, Kokoro TTS `:8880`, bge-m3 embeddings + Qdrant (Spark 2). See `docs/` for API contracts.
## Commands (headlines — details in the scoped rules)
```bash
(cd package && make x86) # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999) # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest) # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway) # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py) # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file> # e2e audio — hits the LIVE cluster
```
## Layout
- `image/app/` — FastAPI app (`server.py` entry, routers in sibling modules, `static/` dashboard UI).
- `package/startos/` — StartOS manifest, interfaces, actions, version + release notes.
- `docs/`: `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md` (consumer-facing API refs; update with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`, `ROADMAP.md` (longer-term backlog — items move into "Current state" below when picked up).
## Conventions
- Every shipped change = version bump + release notes + rebuilt s9pk (version format `X.Y.Z:N`; details in the startos-package rule).
- Commit messages: `vX.Y.Z:N - short lowercase summary`. **Never add a Co-Authored-By / Claude attribution trailer.**
- The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
- New external-facing endpoints get documented in `docs/` and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
- Doc layout: `AGENTS.md` is the canonical file; `CLAUDE.md` is a symlink to it (don't overwrite it). Subsystem guides are real files in `docs/guides/<topic>.md` (with `paths:` frontmatter); `.claude/rules/<topic>.md` are relative symlinks into them. A new guide = add `docs/guides/<topic>.md`, symlink it from `.claude/rules/`, and add an index line above.
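
The version and commit-message conventions above can be checked mechanically. The following is a minimal sketch, assuming `X`, `Y`, `Z`, and `N` are plain non-negative integers and that "short lowercase summary" means the summary contains no uppercase letters. `is_valid_commit_message` is a hypothetical helper for illustration, not something that exists in this repo; the authoritative version rules live in the startos-package guide.

```python
import re

# Assumed subject shape: "vX.Y.Z:N - summary" with numeric X, Y, Z, N.
COMMIT_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+):(\d+) - (\S.*)$")
FORBIDDEN_TRAILER = "co-authored-by:"


def is_valid_commit_message(message: str) -> bool:
    """Return True if the subject matches the convention and no attribution trailer is present."""
    lines = message.splitlines()
    if not lines:
        return False
    match = COMMIT_RE.match(lines[0])
    if match is None:
        return False
    summary = match.group(5)
    # "lowercase summary" is interpreted here as: no uppercase characters at all.
    if summary != summary.lower():
        return False
    # Reject any Co-Authored-By trailer in the body, case-insensitively.
    return not any(
        line.strip().lower().startswith(FORBIDDEN_TRAILER) for line in lines[1:]
    )
```

A check like this could run as a local `commit-msg` hook, but wiring it in is a separate decision for the owner.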
## Always / Never (cluster-wide)
- **Always** confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
- **Always** use the Spark's **IP** for HTTP probes — `.local` mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust `.local` hostnames inside HTTP client code.
- **Always** pass `SSH_KEY_PATH` / `-i <key>` explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
- **Never** route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via `/scrub` is the only sanctioned path toward frontier models.)
- **Never** commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders. Canonical set: `<spark-1-ip>` / `<spark-2-ip>`, `<spark-1-host>` / `<spark-2-host>`, `<spark-user>`, and generic example names (`Alice`/`Bob`).
- **Never** install `cuda-python` in `parakeet-asr` — crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.
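
Two of the rules above (IP-only HTTP probes and an explicit SSH key) lend themselves to small guard helpers. This is a sketch under stated assumptions: `probe_url` and `ssh_argv` are hypothetical names, the `/health` default path is only an example, and `BatchMode=yes` is an added choice so a scripted call fails instead of waiting on a password prompt. The repo's actual probe and SSH code may look different.

```python
import ipaddress


def probe_url(host: str, port: int, path: str = "/health") -> str:
    """Build an HTTP probe URL, refusing anything that is not an IPv4 literal."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames such as *.local land here: they may resolve IPv6-first and hang.
        raise ValueError(f"use the Spark's IP, not a hostname: {host!r}") from None
    if addr.version != 4:
        raise ValueError(f"cluster services bind IPv4 only: {host!r}")
    return f"http://{addr}:{port}{path}"


def ssh_argv(user: str, host: str, key_path: str, remote_cmd: str) -> list[str]:
    """Return an ssh argv that always names the identity file explicitly.

    remote_cmd must already be safely quoted by the caller.
    """
    return ["ssh", "-i", key_path, "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]
```

For example, `probe_url("192.0.2.10", 8888)` yields a usable URL, while `probe_url("spark-1.local", 8888)` raises before any request is made.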
## Current state
- **Working (v0.20.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack healthy (11k+ requests/12h, all 200).
- **Security hardening shipped (v0.19.0:0, 2026-06-12):** closed an SSH command-injection path (`shellsafe.py` validates + `shlex.quote`s every user value crossing into a Spark command), a Qdrant collection path-injection, and added a same-origin (CSRF) guard on control endpoints (proxy/data API exempt, consumers unaffected). Full evidence in `EVALUATION.md`; remaining non-blocking P2/P3 debt now lives in `ROADMAP.md`.
- **Git history scrubbed (2026-06-12):** owner-specific IPs/hosts/user/key-name/personal-names purged from all commits/tags/messages via `git filter-repo`, force-pushed to `gitea` (every SHA changed); 0 hits across all refs. Pre-rewrite backup bundle: `../spark-control-prehistory-rewrite.bundle`. Owner declined SSH-key rotation (only the key *name* leaked, never the material); don't re-flag.
- **Shipped: Spark connectivity helpers (v0.20.0:0, built + installed 2026-06-15):** two read-mostly hardware-card additions. (a) **SSH-key copy:** small copy icon top-right of each reachable card → `POST /api/spark/{name}/ssh-key` (generate-if-missing + return the Spark's *outbound* pubkey; non-destructive; CSRF-guarded; no request input reaches the command so no shellsafe). UI pops `#sshkey-dialog` (key + paste-on-Mac one-liner) since plain-HTTP blocks `navigator.clipboard`. Opposite direction from the StartOS `showPublicKey` action (that grants the *dashboard* access to the Sparks). (b) **WireGuard status badge:** the `hardware.py` probe now also reports `wg_iface`/`wg_addr` via unprivileged `ip -o link show type wireguard` (no root/sudo, ends in a pipe to awk so it can't trip the probe's `set -e`); `renderHardware` shows a `VPN <ip>` badge in the meta line when a tunnel is up. Reflects interface presence, not live peer reachability (true handshake age would need `sudo wg show`). Verified: clean `make x86` + `start-cli package install` exit 0, the real `ip ... type wireguard` output on spark2 matches the parser, and (**confirmed in-browser**) the SSH-key icon works. That also closes the long-open v0.19.0 question: the same-origin CSRF guard does NOT false-block control endpoints behind the StartOS proxy (the SSH-key POST goes through it). The `VPN 10.59.211.6` badge render is confirmed in-browser too, so the feature is fully verified.
- **spark2 joined the `starttunnel` WireGuard subnet (2026-06-15):** config installed at `/etc/wireguard/starttunnel.conf`, interface `starttunnel` up at `10.59.211.6/24`, `wg-quick@starttunnel` enabled (survives reboot). Split tunnel (`AllowedIPs = 10.59.211.0/24`) so the Spark keeps its LAN route — the dashboard's SSH is unaffected. Purpose: let a bot on spark2 reach the owner's Mac off-LAN. **Finding:** passwordless sudo is NOT configured on spark2 (`sudo wg show` → "a password is required") — the earlier assumption was wrong; harmless here since the badge is sudo-free, but note it before designing any dashboard feature that needs root on a Spark.
- **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 14s unresponsiveness while the single GPU is continuously busy. Client-side remedy drafted (in-flight cap 2, hard ceiling 3 across audio endpoints, retry-with-backoff on timeout/503), with the owner to forward to that dev.
- **Decided, not implemented:** no public interface / no API token auth — LAN + WireGuard/Tailscale split-tunnel only (the CSRF guard now covers the browser-driven vector). An empirical audio concurrency sweep is offered but needs the owner's OK in a quiet window.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
- **Repo wart:** commit `8d839e3` (was `367d986` pre-rewrite) is labeled `v0.13.0:4` but contains everything through v0.18.0:0; per-version commits for v0.14 through v0.18 don't exist. Keep commit messages accurate.
- **Hosting:** pushes to the owner's self-hosted Gitea — remote `gitea`, branch `master`, over SSH. Push after committing.
- **Next:** (1) owner forwards the concurrency note to the Signal Engine dev; (2) concurrency sweep if the dev wants the measured knee; (3) parakeet-asr `--memory` cap via Reapply-patches; (4) start the `ROADMAP.md` tech-debt list (a pytest harness first).
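
For reference when the concurrency note is forwarded, the client-side remedy described under "In progress" (an in-flight cap plus retry-with-backoff on timeout/503) could look roughly like the sketch below. It is an illustration only, not the drafted note itself. `call_with_backoff` and the `send` callable are hypothetical, `send` is assumed to return an HTTP status code and raise `TimeoutError` on timeout, and the attempt count and delays are placeholder values. The separate hard ceiling of 3 across audio endpoints is not modeled here.

```python
import asyncio
import random
from typing import Awaitable, Callable

RETRY_STATUSES = {503}  # retry only on "busy"; other statuses are returned as-is


async def call_with_backoff(
    send: Callable[[], Awaitable[int]],
    *,
    slots: asyncio.Semaphore,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Run send() under a shared in-flight cap, retrying on timeout or 503."""
    for attempt in range(attempts):
        # Hold a slot only while the request is actually in flight.
        async with slots:
            try:
                status = await send()
            except (TimeoutError, asyncio.TimeoutError):
                status = None
        if status is not None and status not in RETRY_STATUSES:
            return status
        if attempt < attempts - 1:
            # Exponential backoff with jitter; the slot is released while waiting.
            await sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
    raise RuntimeError("endpoint still busy after all retry attempts")
```

A caller would create one `asyncio.Semaphore(2)` and share it across all audio requests so the cap applies to the client as a whole. Releasing the slot before sleeping keeps a backing-off request from blocking others.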