# CLAUDE.md
Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster: one-click vLLM model swaps, plus health, proxying, and APIs for speech (STT/diarization/TTS), embeddings, and redaction.
## Stack
- Two halves, always coordinated:
  - `image/`: standalone FastAPI app (Python ≥3.11; ships on `python:3.12-slim`; UI on port 9999; vanilla HTML/CSS/JS, no framework).
  - `package/`: StartOS 0.4 wrapper (TypeScript, `@start9labs/start-sdk` pinned `1.3.3`, Node ≥22, bundled by `@vercel/ncc`).
- Build host needs `start-cli`, Node ≥22 + npm, and Docker (the s9pk embeds the Docker image).
- Cluster runtimes live **on the Sparks, not in this repo** (`spark-vllm-docker`, the parakeet/kokoro/embeddings containers). This repo is the controller; it reaches them over SSH + HTTP.
- Sparks are ARM64 (GB10 Grace-Blackwell, sm_121, CUDA 13). Services: vLLM `:8888` (Spark 1); `parakeet-asr` `:8000` (Parakeet TDT 0.6B v3 + Sortformer diarizer + TitaNet voiceprints), Kokoro TTS `:8880`, bge-m3 embeddings + Qdrant (Spark 2). See `docs/` for API contracts.
## Commands
### Build & deploy the s9pk
```bash
cd package
npm i # one-time
make x86 # typecheck + ncc bundle + docker build + pack → spark-control_x86_64.s9pk
make install # sideload to the Start9 server; needs "host: http://<server>.local" in ~/.startos/config.yaml
```
`make aarch64` for ARM Start9 servers. `make install` picks the newest `*.s9pk` in `package/`.
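
To make the "newest `*.s9pk`" rule concrete, here is a minimal Python sketch of one way to select that file. It assumes "newest" means the most recent modification time; the actual Makefile logic is not shown in this file and may differ. `newest_s9pk` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import Optional


def newest_s9pk(package_dir: Path) -> Optional[Path]:
    """Return the most recently modified *.s9pk in package_dir, or None."""
    candidates = list(package_dir.glob("*.s9pk"))
    if not candidates:
        return None
    # Assumption: "newest" is judged by file mtime, not by version string.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

If both an x86 and an aarch64 build sit in `package/`, a rule like this picks whichever was built last, so it is worth checking the filename before sideloading.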
### Local dev (FastAPI only, no StartOS)
```bash
cd image
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
export SPARK1_HOST=<ip> SPARK1_USER=<user> SPARK2_HOST=<ip> SPARK2_USER=<user> SSH_KEY_PATH=<private-key>
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload
```
Other env vars: `BIND_PORT`, `MODELS_YAML`, `SSH_DIR`, `SSH_KNOWN_HOSTS`.
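
A small preflight check can catch a missing variable before `uvicorn` starts. The sketch below treats the five exported variables as required and the other four as optional; that split is an assumption drawn from the commands above, not from the app's code, and `missing_required` is a hypothetical helper.

```python
import os
from typing import Mapping

# Names taken from the local-dev instructions above.
REQUIRED = ("SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER", "SSH_KEY_PATH")
# Assumption: these have usable defaults inside the app.
OPTIONAL = ("BIND_PORT", "MODELS_YAML", "SSH_DIR", "SSH_KNOWN_HOSTS")


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

Calling `missing_required()` with no arguments inspects the current process environment; passing a plain dict makes it easy to exercise without touching the shell.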
### Tests
No pytest harness — each suite is a standalone script (that *is* how you run a single test):
```bash
cd image
python3 -m app.redaction.test_gateway # /scrub + /rehydrate acceptance; offline, no cluster needed
python3 app/redaction/test_scrub_leak.py # vendored golden-file leak test; offline
./scripts/test-audio-with-speakers.sh # (from repo root) end-to-end audio pipeline — hits the LIVE cluster
```
Both Python suites must pass before shipping anything touching redaction.
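
Since there is no harness, a pre-ship gate for the two offline suites could look like the sketch below. It assumes each suite signals failure through a non-zero exit status (not confirmed here) and that it runs from `image/`. `run_suites` is a hypothetical helper.

```python
import subprocess
import sys
from typing import Optional, Sequence

# Commands mirror the two offline suites listed above.
REDACTION_SUITES = (
    (sys.executable, "-m", "app.redaction.test_gateway"),
    (sys.executable, "app/redaction/test_scrub_leak.py"),
)


def run_suites(
    suites: Sequence[Sequence[str]] = REDACTION_SUITES,
    cwd: Optional[str] = None,
) -> list:
    """Run each suite in order and return the commands that exited non-zero."""
    failed = []
    for cmd in suites:
        # Assumption: a failing suite exits with a non-zero status.
        result = subprocess.run(list(cmd), cwd=cwd)
        if result.returncode != 0:
            failed.append(tuple(cmd))
    return failed
```

An empty return value means both suites passed; anything else should block the release.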
### Typecheck / format (TypeScript)
```bash
cd package
npm run check # tsc --noEmit — run after any startos/ edit; make x86 also runs it
npm run prettier # prettier --write startos (no semicolons, single quotes, trailing commas)
```
Python has no configured linter/formatter — match the style of the file you're editing.
## Layout
- `image/app/server.py` — FastAPI entry; routers live in sibling modules (`audio_proxy.py`, `llm_proxy.py`, `embeddings_proxy.py`, `redaction_gateway.py`, `swap.py`, `health.py`, `deep_health.py`, `connectivity.py`, …).
- `image/app/static/` — the dashboard UI.
- `image/models.yaml` — vLLM model catalog bundled into the image.
- `image/parakeet_patches/` — overlay (`main.py`, `diarizer.py`) copied into the `parakeet-asr` container on Spark 2 by the "Reapply speech-model patches" action. The **only** durable way to change that container.
- `image/app/redaction/` holds `scrub.py` + `test_scrub_leak.py`, vendored byte-for-byte from the CRM repo (sha in `__init__.py`). The gateway around it is `redaction_gateway.py`.
- `image/spark_embed/` — Dockerfile + app for the embeddings container; built ON a Spark (ARM64, NGC PyTorch base).
- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
- `docs/` holds `AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md` (consumer-facing API refs; update them with API changes).
- `README.md` (overview), `HANDOFF.md` (fresh-user install guide), `runbook.md` (ops notes), `known-issues.md`.
## Conventions
- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- Commit messages: `vX.Y.Z:N - short lowercase summary`. **Never add a Co-Authored-By / Claude attribution trailer.**
- Every shipped change = version bump + release notes + rebuilt s9pk (`make x86 && make install`).
- The package owner is non-technical: explain infra effects in plain English and get an explicit go/no-go before mutating the cluster.
- Pydantic request models go at **module scope**, never inside a `build_router()` body (FastAPI silently 422s otherwise).
- New external-facing endpoints get documented in `docs/` and noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
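
The version and commit-message conventions above are regular enough to check mechanically. This is a minimal sketch, assuming "lowercase summary" means the summary contains no uppercase letters at all, which may be stricter than intended. Both function names are hypothetical.

```python
import re

# X.Y.Z:N, where :N is the revision.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+:\d+")
# vX.Y.Z:N - short lowercase summary
COMMIT_RE = re.compile(r"v(\d+\.\d+\.\d+:\d+) - (\S.*)")


def is_valid_version(version: str) -> bool:
    """True if version has the X.Y.Z:N shape."""
    return VERSION_RE.fullmatch(version) is not None


def is_valid_commit_subject(subject: str) -> bool:
    """True if subject looks like 'vX.Y.Z:N - short lowercase summary'."""
    match = COMMIT_RE.fullmatch(subject)
    if match is None:
        return False
    summary = match.group(2)
    # Assumption: any uppercase character makes the summary invalid.
    return summary == summary.lower()
```

For example, `is_valid_commit_subject("v0.18.0:0 - add dual-channel label merge")` is accepted, while a subject without the `:N` revision or with a capitalized summary is rejected.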
## Always / Never
**Always**
- Confirm with the user before swap/stop/restart of anything on the live cluster. Read-only probes and dry-runs are fine without asking.
- Use the Spark's **IP** for HTTP probes — `.local` mDNS names can resolve IPv6-first and hang httpx (vLLM and friends bind IPv4 only).
- Pass `SSH_KEY_PATH` / `-i <key>` explicitly in scripted SSH; non-interactive shells have no ssh-agent identities.
- Make parakeet-container changes via `image/parakeet_patches/` + the Reapply action. `docker exec` / pip changes inside the container die on `docker rm`.
- Test audio endpoints with **real speech** (e.g. macOS `say`), not tones/silence — zero-token audio skips the decoder paths where crashes live.
- Send audio requests to Spark 2 **sequentially** in tests/scripts. Parallel audio requests can race (cuFFT → 503), and the single GPU serializes them anyway.
- Pin/constrain torch versions when pip-installing anything into NGC-based containers on the Sparks (ABI breaks otherwise); expect ARM64 wheel gaps and source builds (`--no-build-isolation` for torchaudio).
- Keep the redaction leak tests green against the vendored `scrub.py` after any re-vendor.
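
The IP-only rule for HTTP probes can be enforced where the URL is built. The sketch below uses only the standard library; `probe_url` and its default path are hypothetical, and `192.0.2.10` is a documentation-range placeholder address.

```python
import ipaddress


def is_ipv4_literal(host: str) -> bool:
    """True only for a dotted-quad IPv4 address, not a hostname or IPv6."""
    try:
        return isinstance(ipaddress.ip_address(host), ipaddress.IPv4Address)
    except ValueError:
        return False


def probe_url(host: str, port: int, path: str = "/") -> str:
    """Build a probe URL, refusing names such as *.local."""
    if not is_ipv4_literal(host):
        # mDNS names can resolve IPv6-first, while the services bind IPv4 only.
        raise ValueError(f"use the Spark's IPv4 address, not {host!r}")
    return f"http://{host}:{port}{path}"
```

Failing fast here gives a clear error instead of a client that hangs until its timeout expires.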
**Never**
- Never install `cuda-python` in `parakeet-asr` to "fix" the startup warning about CUDA graphs being disabled. The warning is harmless; enabling the graph path crashes real decode with illegal memory access on this GPU/CUDA-13 stack. Leave it alone — the slow path served 11k+ requests with zero failures.
- Never edit `image/app/redaction/scrub.py` or `test_scrub_leak.py` here — change them in the CRM repo, re-vendor (`cp`), update the sha in `redaction/__init__.py`, re-run the leak test.
- Never commit owner-specific hostnames, IPs, usernames, or names into package strings, UI text, or docs — this package gets shared; use placeholders (`<spark-1-ip>` style).
- Never route audio or transcripts to cloud services — speech stays on the LAN. (Scrubbed text via `/scrub` is the only sanctioned path toward frontier models.)
- Never trust `.local` hostnames inside HTTP client code (see IPv6 note above) and never assume the ssh-agent is loaded.
- Never ship a redaction change without both redaction suites passing.
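
A rough scan for owner-specific IPs and `.local` hostnames can back up the placeholder rule before a commit. This is a heuristic sketch, not an existing repo tool. The allow-list and `find_owner_specifics` are assumptions, and it will not catch usernames or personal names.

```python
import ipaddress
import re

# Four dot-separated numeric groups, not embedded in a longer dotted number.
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")
# A concrete mDNS name; "<server>.local" placeholders do not match.
LOCAL_RE = re.compile(r"\b[\w-]+\.local\b")
# Assumption: generic bind/loopback addresses are fine to commit.
ALLOWED = {"0.0.0.0", "127.0.0.1"}


def find_owner_specifics(text: str) -> list[str]:
    """Return IPv4 literals and .local hostnames found in text."""
    hits = []
    for match in IPV4_RE.finditer(text):
        candidate = match.group(0)
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a real address
        if candidate not in ALLOWED:
            hits.append(candidate)
    hits.extend(match.group(0) for match in LOCAL_RE.finditer(text))
    return hits
```

Version strings such as `0.18.0:0` have only three numeric groups, so they are not flagged.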
## Current state
- **Working (v0.18.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack is healthy (11k+ requests/12h, all 200).
- **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 14s unresponsiveness while the single GPU is continuously busy. Remedy is client-side; a drafted message (in-flight cap 2, hard ceiling 3 global across audio endpoints, retry-with-backoff on timeout/503) is with the owner to forward to that dev.
- **Decided, not implemented:** remote access stays WireGuard/Tailscale split-tunnel — no public interface, so no API auth built; an empirical concurrency sweep is offered but needs the owner's explicit OK in a quiet window.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
- **Repo wart:** HEAD's message says `v0.13.0:4` but the commit contains everything through v0.18.0:0; per-version commits for v0.14 through v0.18 are missing. Keep commit messages accurate going forward.
- **Next:** (1) owner forwards the concurrency note to the Signal Engine dev; (2) commit CLAUDE.md + ROADMAP.md; (3) run the concurrency sweep if the dev wants the measured knee; (4) add the `--memory` cap to parakeet-asr via the Reapply-patches action; (5) pick the next item from ROADMAP.md.
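
The client-side remedy described for Signal Engine (an in-flight cap plus retry-with-backoff on timeout/503) can be sketched with `asyncio` alone. Only the cap of 2 and the 503/timeout triggers come from the note above. The retry count, delays, and `send_with_backoff` itself are hypothetical, and `send` is assumed to return an HTTP status code and raise `asyncio.TimeoutError` on timeout.

```python
import asyncio
import random
from typing import Awaitable, Callable

IN_FLIGHT_CAP = 2          # from the drafted note; the hard ceiling is 3
RETRYABLE_STATUS = {503}


async def send_with_backoff(
    send: Callable[[], Awaitable[int]],
    sem: asyncio.Semaphore,
    retries: int = 4,          # hypothetical value
    base_delay: float = 0.5,   # hypothetical value, in seconds
) -> int:
    """Call send() under a shared cap, retrying on timeout or 503."""
    for attempt in range(retries + 1):
        async with sem:  # one semaphore shared across all audio endpoints
            try:
                status = await send()
            except asyncio.TimeoutError:
                status = None
        if status is not None and status not in RETRYABLE_STATUS:
            return status
        if attempt < retries:
            # Exponential backoff with jitter; the slot is released while waiting.
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
    raise RuntimeError("request kept failing after retries")
```

Create the semaphore once inside the running event loop, for example `asyncio.Semaphore(IN_FLIGHT_CAP)`, and pass the same instance to every caller so the cap stays global. A client using a library such as httpx would need to translate that library's timeout exception inside `send`.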