AGENTS.md — Recap Relay
Operator-side, credit-metered service that sits in front of Gemini and the operator's local AI hardware ("Spark Control": Parakeet ASR, Sortformer diarization, TitaNet voice embeddings, a vLLM/Gemma analyze endpoint). The Recaps app (../recap) is the client; this repo owns transcription/diarization/analysis routing, the cloud Pro/Max tier + expiry, self-serve billing settlement, and the internal-meetings feature (upload audio → transcribe → diarize → cluster → analyze → polish → operator dashboard). Private. Ships to the operator's own Start9 box via make install only — NEVER to the public registry.
Stack
- Server: Node.js (`type: module`, ES modules). Same dev box as the app (v25.6.1); container runtime is whatever the `Dockerfile` pins.
- HTTP: `express` + `multer` (audio upload). Admin routes under `/admin/*` behind an admin-session-cookie gate; relay-to-relay routes under `/relay/*` behind the operator key.
- Dashboard: `public/dashboard.html`, a single-file vanilla JS page (render-string-into-innerHTML), same shape as the app's `index.html`.
- Packaging: `@start9labs/start-sdk` under `startos/`; version graph at `startos/versions/index.ts`.
- Storage: filesystem under the StartOS data dir (`/data`). Internal meetings persist as `/data/internal-meetings/<id>.json`. No SQLite here.
- Upstreams: Gemini (`@google/genai`); operator hardware via "Spark Control" HTTP (Parakeet transcribe, `/api/audio/diarize-chunk` for Sortformer+TitaNet, a vLLM/Gemma OpenAI-shape analyze endpoint).
Commands
Run from repo root unless noted.
| Action | Command |
|---|---|
| Run all tests | cd server && npm test (built-in node --test) |
| Run one test file | cd server && node --test test/<file>.test.js |
| Build .s9pk (x86) | make x86 |
| Bump version (interactive) | make bump |
| Install to operator's Start9 box | make install (bump FIRST — see Always) |
| Deploy to registry | make deploy / make redeploy — NEVER run these here (private package) |
- `make install` picks the newest `*.s9pk` by mtime in the cwd (`ls -t *.s9pk | head -1`); it does NOT build. Always `make x86` after a change, and run from this repo's root (the shell cwd can drift to `../recap`, where install would grab the app's `.s9pk` instead).
- Host comes from the `host:` field in `~/.startos/config.yaml` (a `<relay-host>.local` mDNS name). Never edit that file without authorization.
Directory layout (what this session touched / verified)
server/
routes/internal-meetings.js upload → pipeline → save; the /admin/internal-meetings/* API,
including the post-hoc speaker-edit + download endpoints
speaker-clustering.js cross-chunk voice clustering (agglomerative, cosine sim) +
assignSpeakersToSegments + small-cluster suppression
post-cluster-polish.js Stage 1 runNameInference + Stage 2 runSummaryPolish (per-window)
meeting-extras.js decisions / action items / open questions / key quotes extraction
meeting-speaker-edits.js post-hoc record edits: mergeSpeakersInRecord,
reclusterMeetingRecord, applyPolishedSummaries, backfillEntrySpeakers
backends/hardware.js Parakeet transcribe + /api/audio/diarize-chunk + chunking + vLLM analyze
chunked-analyze.js windowed analyze (planWindowsByDuration, runPipelinedAnalysis, …)
config.js getConfigSnapshot() + relay_* config defaults
hardware-config.js resolveHardwareConfig() → Spark Control endpoint discovery
test/ node --test files (speaker-clustering, meeting-speaker-edits, credits)
public/dashboard.html operator dashboard (meetings detail view + speaker tools)
startos/versions/<vN>.ts one file per version + index.ts graph
docs/issues-backlog.md detailed issue log
Internal-meetings pipeline (how speakers are produced)
- Chunk audio into ~5-min pieces (`relay_hardware_tx_chunk_minutes`) with a few seconds overlap.
- Per-chunk diarize at Spark Control `/api/audio/diarize-chunk`: Sortformer emits chunk-local labels (`Speaker_0/1`), TitaNet emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
- Cross-chunk cluster (`speaker-clustering.js`, `clusterSpeakers`): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global `Speaker_A/B/…`. Then a small-cluster suppression pass folds brief clusters into anchors or `Speaker_Unknown`.
- Analyze (windowed) → section `{title, summary, startIndex, endIndex}`.
- Polish (`post-cluster-polish.js`): `runNameInference` infers real names from the transcript, then `runSummaryPolish` rewrites each section summary to attribute statements to those names.
- Extras (`meeting-extras.js`).
- Audio is deleted after processing (success or failure); the relay never retains uploaded audio.
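The cross-chunk clustering step can be sketched as follows. This is a minimal illustration, not the code in `speaker-clustering.js`: the function name `clusterFingerprints`, the `{ chunk, localLabel, embedding }` item shape, and the 0-100 threshold scale (inferred from the "70→80" tuning example) are all assumptions, and the demo uses toy 3-dim vectors instead of 192-dim TitaNet fingerprints. Small-cluster suppression is left out.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Average linkage: mean pairwise cosine similarity between two clusters.
function averageLinkage(c1, c2) {
  let sum = 0;
  for (const x of c1) for (const y of c2) sum += cosine(x.embedding, y.embedding);
  return sum / (c1.length * c2.length);
}

// Hypothetical helper. items: [{ chunk, localLabel, embedding }].
// thresholdPct is ASSUMED to be on a 0-100 scale.
// Returns Map("chunk:localLabel" -> global label).
function clusterFingerprints(items, thresholdPct) {
  const threshold = thresholdPct / 100;
  const clusters = items.map((it) => [it]); // start with one cluster per fingerprint
  while (clusters.length > 1) {
    // Find the most similar pair of clusters.
    let best = { sim: -Infinity, i: -1, j: -1 };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const sim = averageLinkage(clusters[i], clusters[j]);
        if (sim > best.sim) best = { sim, i, j };
      }
    }
    if (best.sim < threshold) break; // nothing left that is similar enough
    clusters[best.i] = clusters[best.i].concat(clusters[best.j]);
    clusters.splice(best.j, 1);
  }
  // Name clusters Speaker_A, Speaker_B, ... (this sketch only handles up to 26).
  const labels = new Map();
  clusters.forEach((cluster, idx) => {
    const globalLabel = `Speaker_${String.fromCharCode(65 + idx)}`;
    for (const it of cluster) labels.set(`${it.chunk}:${it.localLabel}`, globalLabel);
  });
  return labels;
}

// Toy data: chunk-local labels are swapped between chunks, fingerprints are not.
const fingerprints = [
  { chunk: 0, localLabel: 'Speaker_0', embedding: [1, 0, 0] },
  { chunk: 0, localLabel: 'Speaker_1', embedding: [0, 1, 0] },
  { chunk: 1, localLabel: 'Speaker_0', embedding: [0.05, 0.99, 0] },
  { chunk: 1, localLabel: 'Speaker_1', embedding: [0.98, 0.1, 0] },
];
const labels = clusterFingerprints(fingerprints, 70);
console.log(Object.fromEntries(labels));
```

In the toy data, `Speaker_0` in chunk 0 and `Speaker_1` in chunk 1 are the same voice, so they should end up under one global label. Raising the threshold makes merges harder, which is why a higher value splits similar voices apart.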
Conventions for this codebase specifically
- A saved meeting record stores the per-chunk TitaNet fingerprints in `rec.diarization`. Because the audio is gone, this is what makes re-clustering possible offline: no re-upload, no Spark Control round-trip.
- Speaker labels live in FOUR places that every edit must keep in sync: `rec.transcript_segments[].speaker`, `rec.chunks[].entries[].speaker` (+ `.speaker_override`), `rec.speakers` (per-cluster stats), and `rec.extras` (`tldr.primary_speakers`, `decisions[].agreed_by`, `action_items[].owner`, `key_quotes[].speaker`). Display names are a separate map: `rec.speaker_names`.
- Over-merging (two people clustered as one) is tuned by `relay_hardware_voice_clustering_threshold` (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs `relay_hardware_anchor_min_speaking_sec` / `relay_hardware_small_cluster_max_speaking_sec` / `relay_hardware_uncertain_margin_pct`. All operator-config-driven; never hardcode.
- Post-hoc speaker-edit endpoints (operator dashboard, added this session; `server/meeting-speaker-edits.js`):
  - `PATCH /admin/internal-meetings/:id/speakers`: rename a cluster (display name only; pre-existing).
  - `PATCH /admin/internal-meetings/:id/entries`: per-line `speaker_override` (pre-existing).
  - `PATCH /admin/internal-meetings/:id/merge-speakers`: fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
  - `POST /admin/internal-meetings/:id/recluster`: re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses `rec.diarization` fingerprints); resets `speaker_names`, per-line overrides, and extras attributions, so the operator re-labels afterward. 400 if no fingerprints saved.
  - `POST /admin/internal-meetings/:id/repolish`: re-run `runSummaryPolish` with the current names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
- `make install` correctness: see [Always]. Honest reports; failing test/build is a failure. Comments explain WHY. Write tests alongside (`server/test/*.test.js`, `node --test`).
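To make the four-places rule concrete, here is a rough sketch of a pure merge that rewrites every location in one pass. It is not the repo's `mergeSpeakersInRecord`: the name `mergeSpeakersSketch` is hypothetical, and the exact field shapes (`rec.speakers` as an array of `{ label, speaking_sec }`, `agreed_by` and `primary_speakers` as label arrays, `owner` as a single label) are assumptions made for illustration. The sample record and names are made up.

```javascript
// Hypothetical sketch: fold `fromLabels` into `intoLabel` everywhere a label can live.
// Returns a new record; the input is never mutated.
function mergeSpeakersSketch(rec, fromLabels, intoLabel) {
  const from = new Set(fromLabels);
  const remap = (label) => (from.has(label) ? intoLabel : label);
  const remapList = (list) => [...new Set((list ?? []).map(remap))]; // remap + dedupe
  // Records are JSON on disk, so a JSON round-trip is a sufficient deep copy here.
  const out = JSON.parse(JSON.stringify(rec));

  // 1. rec.transcript_segments[].speaker
  for (const seg of out.transcript_segments ?? []) seg.speaker = remap(seg.speaker);

  // 2. rec.chunks[].entries[].speaker (+ .speaker_override)
  for (const chunk of out.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      entry.speaker = remap(entry.speaker);
      if (entry.speaker_override != null) entry.speaker_override = remap(entry.speaker_override);
    }
  }

  // 3. rec.speakers: ASSUMED shape [{ label, speaking_sec }]; sum stats of folded clusters.
  const stats = new Map();
  for (const s of out.speakers ?? []) {
    const label = remap(s.label);
    const prev = stats.get(label)?.speaking_sec ?? 0;
    stats.set(label, { label, speaking_sec: prev + (s.speaking_sec ?? 0) });
  }
  out.speakers = [...stats.values()];

  // 4. rec.extras attributions (field shapes assumed).
  const extras = out.extras ?? {};
  if (extras.tldr) extras.tldr.primary_speakers = remapList(extras.tldr.primary_speakers);
  for (const d of extras.decisions ?? []) d.agreed_by = remapList(d.agreed_by);
  for (const a of extras.action_items ?? []) a.owner = remap(a.owner);
  for (const q of extras.key_quotes ?? []) q.speaker = remap(q.speaker);

  // Display names are a separate map: drop entries for labels that no longer exist.
  if (out.speaker_names) {
    for (const label of from) if (label !== intoLabel) delete out.speaker_names[label];
  }
  return out;
}

// Made-up record: Speaker_C is really the same person as Speaker_A.
const rec = {
  transcript_segments: [{ speaker: 'Speaker_A' }, { speaker: 'Speaker_C' }],
  chunks: [{ entries: [{ speaker: 'Speaker_C', speaker_override: null }, { speaker: 'Speaker_B' }] }],
  speakers: [
    { label: 'Speaker_A', speaking_sec: 120 },
    { label: 'Speaker_B', speaking_sec: 60 },
    { label: 'Speaker_C', speaking_sec: 30 },
  ],
  extras: {
    tldr: { primary_speakers: ['Speaker_A', 'Speaker_C'] },
    decisions: [{ agreed_by: ['Speaker_B', 'Speaker_C'] }],
    action_items: [{ owner: 'Speaker_C' }],
    key_quotes: [{ speaker: 'Speaker_A' }],
  },
  speaker_names: { Speaker_A: 'Alex', Speaker_C: 'Al' },
};
const mergedRec = mergeSpeakersSketch(rec, ['Speaker_C'], 'Speaker_A');
console.log(JSON.stringify(mergedRec, null, 2));
```

Keeping the function pure (copy in, new record out) matches the "Pure, offline, no LLM" description of the merge endpoint and makes it straightforward to cover with `node --test`.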
Always
- Bump the version before EVERY `make install`. StartOS dedupes sideloads by version string, so an unbumped reinstall (even one line changed) silently no-ops. `make bump` → `make x86` → `make install`. See memory `bump-before-install` (applies to this repo AND `../recap`).
- Add new version files to BOTH the import block AND the `other:` list in `startos/versions/index.ts`, and point `current:` at the new constant. `make bump` does this for you.
- Build freely; ask before anything that leaves this machine. `make x86` / `make install` (to the operator's own box) are fine. `make deploy` / `make redeploy` are NOT.
- Reference env-var / config names, never values. Relay secrets (operator key, Gemini key, SMTP, Zaprite, BTCPay) live in gitignored env; docs name them only.
Never
- Never `make deploy` / `make redeploy` / upload to the registry. This package is private to the operator's box. (Memory: `feedback_relay_never_to_registry`.)
- No "Co-Authored-By" / no "Claude" mentions in commits or source.
- Never edit a `startos/versions/<v>.ts` that's already been built/installed; add a new version file.
- Don't push to GitHub by default; remote is self-hosted Gitea.
Current state (2026-06-13) — at 0.2.124; only git commits lag
- Box AND local working tree are both at relay `0.2.124` (app `0.2.155`). Confirmed on the StartOS UI (version + the Merge/Re-polish controls visible on the dashboard).
- The version files `v0.2.117–v0.2.124` are all in this working tree (untracked). v0.2.124's note is a billing change ("tier Bitcoin invoices return the Lightning BOLT11 + per-period credit allotment"). A concurrent chat session during 2026-06-13 continued from this session's 0.2.117, bumped through 0.2.124, and built+installed it to the box, so the working tree matches the box. (Heads-up: more than one session may be editing this tree; re-read before assuming.)
- The post-hoc speaker tools are present and live: `meeting-speaker-edits.js` (merge/recluster/repolish + backfill) and the matching `/admin/internal-meetings/:id/{merge-speakers,recluster,repolish}` routes; the dashboard shows the controls. Tests pass (32, `npm test`).
- The real gap is git, not versions. Committed HEAD is `v0.2.11`; everything since (v0.2.12→v0.2.124, the entire internal-meetings feature, diarization, speaker-edit tools, billing) is uncommitted (≈28 modified + 153 untracked). "Catching up local git" = committing this large working tree (see ROADMAP). The 0.2.117 this session installed was superseded by the concurrent 0.2.124; no box downgrade occurred.