spark-control/AGENTS.md
Keysat 6a6112a15f restructure: AGENTS.md canonical + docs/guides with .claude/rules symlinks
Rename CLAUDE.md -> AGENTS.md (cross-vendor standard) with a relative
CLAUDE.md symlink so Claude Code still loads it. Move each .claude/rules
file into docs/guides/ (paths: frontmatter preserved) and replace the
rules file with a relative symlink into the guide. Repoint the AGENTS.md
index paragraph at docs/guides/ so non-Claude agents find the guides.
2026-06-12 14:27:17 -05:00


AGENTS.md

This file provides guidance to coding agents (Claude Code and others) when working with code in this repository. (Claude Code reads it via the CLAUDE.md symlink.)

Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.

Subsystem guidance lives in docs/guides/ and loads when matching files are touched (Claude Code lazy-loads via .claude/rules/ symlinks; other agents read the guides directly): startos-package.md (build/versioning, package/**), fastapi-image.md (dev server/env/layout, image/**), redaction.md (vendoring + test gates), audio-speech.md (parakeet patches, cluster-container footguns, audio testing). Read docs/guides/audio-speech.md before touching the Sparks' containers over SSH — ops sessions don't trip the path scoping.

Stack

  • Two halves, always coordinated:
    • image/ — standalone FastAPI app (Python ≥3.11; UI on port 9999; vanilla HTML/CSS/JS).
    • package/ — StartOS 0.4 wrapper (TypeScript) that ships the Docker image as an s9pk.
  • Build host needs start-cli, Node ≥22 + npm, and Docker.
  • Cluster runtimes live on the Sparks, not in this repo (spark-vllm-docker, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
  • Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM :8888 (Spark 1); parakeet-asr :8000, Kokoro TTS :8880, bge-m3 embeddings + Qdrant (Spark 2). See docs/ for API contracts.

Commands (headlines — details in the scoped rules)

(cd package && make x86)                                  # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999)          # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m app.redaction.test_gateway)      # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py)   # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file>        # e2e audio — hits the LIVE cluster

Layout

  • image/app/ — FastAPI app (server.py entry, routers in sibling modules, static/ dashboard UI).
  • package/startos/ — StartOS manifest, interfaces, actions, version + release notes.
  • docs/AUDIO_API.md, EMBEDDINGS.md, REDACTION_GATEWAY.md (consumer-facing API refs; update with API changes).
  • README.md (overview), HANDOFF.md (fresh-user install guide), runbook.md (ops notes), known-issues.md, ROADMAP.md (longer-term backlog — items move into "Current state" below when picked up).

Conventions

  • Every shipped change = version bump + release notes + rebuilt s9pk (version format X.Y.Z:N; details in the startos-package rule).
  • Commit messages: vX.Y.Z:N - short lowercase summary. Never add a Co-Authored-By / Claude attribution trailer.
  • The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
  • New external-facing endpoints get documented in docs/ and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
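
The version and commit-message conventions above can be checked mechanically. A minimal sketch, assuming the format reads literally as "vX.Y.Z:N - short lowercase summary"; the regexes and the helper name check_commit_message are illustrative and are not part of this repo:

```python
import re

# Assumed literal reading of the convention: "vX.Y.Z:N - short lowercase summary".
# Both patterns are illustrations, not the repository's own validator.
SUBJECT_RE = re.compile(r"^v\d+\.\d+\.\d+:\d+ - [^A-Z\n]+$")
TRAILER_RE = re.compile(r"^co-authored-by:", re.IGNORECASE | re.MULTILINE)


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not SUBJECT_RE.match(subject):
        problems.append("subject is not 'vX.Y.Z:N - short lowercase summary'")
    if TRAILER_RE.search(message):
        problems.append("contains a Co-Authored-By trailer")
    return problems
```

A check like this could run from a commit-msg hook; whether to wire one up is a separate decision for the owner.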

Always / Never (cluster-wide)

  • Always confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
  • Always use the Spark's IP for HTTP probes — .local mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only). Never trust .local hostnames inside HTTP client code.
  • Always pass SSH_KEY_PATH / -i <key> explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
  • Never route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via /scrub is the only sanctioned path toward frontier models.)
  • Never commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders (<spark-1-ip> style).
  • Never install cuda-python in parakeet-asr — crashes real decode on this GPU/CUDA-13 stack; full story in the audio-speech rule.
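
Two of the rules above (IPv4 literals for HTTP probes, an explicit key for scripted SSH) are easy to enforce at the point where URLs and commands are built. A minimal sketch, assuming hypothetical helpers probe_url and ssh_argv that do not exist in this repo; the address 192.0.2.10 in the usage note is a documentation placeholder, and BatchMode / IdentitiesOnly are standard OpenSSH options added here as a suggestion:

```python
import ipaddress


def probe_url(host: str, port: int, path: str = "/") -> str:
    """Build an HTTP probe URL, accepting only an IPv4 literal as host."""
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        # Covers .local names, other hostnames, and IPv6 literals alike.
        raise ValueError(f"probe host must be an IPv4 literal, got {host!r}") from None
    return f"http://{host}:{port}{path}"


def ssh_argv(user: str, host: str, key_path: str, remote_cmd: str) -> list[str]:
    """Build an ssh command line that names the identity file explicitly."""
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",       # fail instead of prompting for input
        "-o", "IdentitiesOnly=yes",  # offer only the key named with -i
        f"{user}@{host}",
        remote_cmd,
    ]
```

For example, probe_url("192.0.2.10", 8888, "/health") builds a URL for the vLLM port, while probe_url("spark-1.local", 8888) raises ValueError before any request is made. The key_path argument would typically come from os.environ["SSH_KEY_PATH"].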

Current state

  • Working (v0.18.0:0, installed and serving): swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); /scrub + /rehydrate; label-merge incl. dual-channel mode. Spark 2 audio stack is healthy (11k+ requests/12h, all 200).
  • In progress — Signal Engine "flakiness": diagnosed, not a server bug — transient 14s unresponsiveness while the single GPU is continuously busy. Remedy is client-side; a drafted message (in-flight cap 2, hard ceiling 3 global across audio endpoints, retry-with-backoff on timeout/503) is with the owner to forward to that dev.
  • Decided, not implemented: remote access stays WireGuard/Tailscale split-tunnel — no public interface, so no API auth built; an empirical concurrency sweep is offered but needs the owner's explicit OK in a quiet window.
  • Known limits: /health blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
  • Repo wart: commit 367d986 is labeled v0.13.0:4 but actually contains everything through v0.18.0:0; per-version commits for v0.14 through v0.18 are missing. Keep commit messages accurate going forward.
  • Hosting: repo pushes to the owner's self-hosted Gitea — remote gitea, branch master, over SSH (host alias + key live in the local ~/.ssh/config; no owner-specific details belong in the repo). Push there after committing.
  • Next: (1) owner forwards the concurrency note to the Signal Engine dev; (2) run the concurrency sweep if the dev wants the measured knee; (3) add the --memory cap to parakeet-asr via the Reapply-patches action; (4) pick the next item from ROADMAP.md.
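
The client-side remedy described under "In progress" (in-flight cap of 2, retry with backoff on timeout/503) could look roughly like the sketch below. It is an assumption-laden illustration, not the drafted message or any client's real code: AudioClientGate, the attempt count, and the delays are hypothetical, and send stands in for whatever HTTP call the client actually makes.

```python
import asyncio
import random

MAX_IN_FLIGHT = 2        # cap from the drafted note; 3 is the hard ceiling
RETRY_STATUSES = {503}


class AudioClientGate:
    """One gate shared by every audio call the client makes."""

    def __init__(self, max_in_flight: int = MAX_IN_FLIGHT,
                 attempts: int = 4, base_delay: float = 0.5):
        self._slots = asyncio.Semaphore(max_in_flight)
        self._attempts = attempts
        self._base_delay = base_delay

    async def call(self, send):
        """Run send() (an async callable returning (status, body)) under the cap."""
        async with self._slots:
            for attempt in range(self._attempts):
                try:
                    status, body = await send()
                except (asyncio.TimeoutError, TimeoutError):
                    status, body = None, None  # treat a timeout as retryable
                if status is not None and status not in RETRY_STATUSES:
                    return status, body
                if attempt < self._attempts - 1:
                    # Exponential backoff with jitter before the next try.
                    delay = self._base_delay * (2 ** attempt)
                    await asyncio.sleep(delay + random.uniform(0, delay / 2))
            raise RuntimeError("audio request still failing after retries")
```

The slot is held while backing off, so a retrying request still counts against the cap; that keeps total pressure on the single GPU bounded at the cost of some client throughput. The gate only works as a global limit if one instance is shared across all audio endpoints.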