- AGENTS.md: add Endpoints section: auth model (cloud operator-key path, license/install-id path, admin session cookie, BTCPay HMAC) plus full /relay/* surface (public + operator-key-only control plane), the /admin/* dashboard, and the /admin/internal-meetings/* API.
- AGENTS.md: rewrite Current state with verified git facts: HEAD is the prior docs commit, HEAD~1 is v0.2.11, working tree at v_0_2_124, file counts pulled live from git status.
- ROADMAP.md: log two doc-precision follow-ups caught in review (the working-tree counts drift fast; the admin-route shortlist silently omits three real routes).
AGENTS.md — Recap Relay
Operator-side, credit-metered service that sits in front of Gemini and the operator's local AI hardware ("Spark Control": Parakeet ASR, Sortformer diarization, TitaNet voice embeddings, a vLLM/Gemma analyze endpoint). The Recaps app (../recap) is the client; this repo owns transcription/diarization/analysis routing, the cloud Pro/Max tier + expiry, self-serve billing settlement, and the internal-meetings feature (upload audio → transcribe → diarize → cluster → analyze → polish → operator dashboard). Private. Ships to the operator's own Start9 box via make install only — NEVER to the public registry.
Stack
- Server: Node.js (`type: module`, ES modules). Same dev box as the app (v25.6.1); container runtime is whatever the `Dockerfile` pins.
- HTTP: `express` + `multer` (audio upload). Admin routes under `/admin/*` behind an admin-session-cookie gate; relay-to-relay routes under `/relay/*` behind the operator key.
- Dashboard: `public/dashboard.html`, a single-file vanilla JS page (render-string-into-innerHTML), same shape as the app's `index.html`.
- Packaging: `@start9labs/start-sdk` under `startos/`; version graph at `startos/versions/index.ts`.
- Storage: filesystem under the StartOS data dir (`/data`). Internal meetings persist as `/data/internal-meetings/<id>.json`. No SQLite here.
- Upstreams: Gemini (`@google/genai`); operator hardware via "Spark Control" HTTP (Parakeet transcribe, `/api/audio/diarize-chunk` for Sortformer+TitaNet, a vLLM/Gemma OpenAI-shape analyze endpoint).
Commands
Run from repo root unless noted.
| Action | Command |
|---|---|
| Run all tests | `cd server && npm test` (built-in `node --test`) |
| Run one test file | `cd server && node --test test/<file>.test.js` |
| Build `.s9pk` (x86) | `make x86` |
| Bump version (interactive) | `make bump` |
| Install to operator's Start9 box | `make install` (bump FIRST; see Always) |
| Deploy to registry | `make deploy` / `make redeploy`: NEVER run these here (private package) |
- `make install` picks the newest `*.s9pk` by mtime in the cwd (`ls -t *.s9pk | head -1`); it does NOT build. Always `make x86` after a change, and run from this repo's root (the shell cwd can drift to `../recap`, where install would grab the app's `.s9pk` instead).
- Host comes from the `host:` field in `~/.startos/config.yaml` (a `<relay-host>.local` mDNS name). Never edit that file without authorization.
Directory layout (what this session touched / verified)
server/
routes/internal-meetings.js upload → pipeline → save; the /admin/internal-meetings/* API,
including the post-hoc speaker-edit + download endpoints
speaker-clustering.js cross-chunk voice clustering (agglomerative, cosine sim) +
assignSpeakersToSegments + small-cluster suppression
post-cluster-polish.js Stage 1 runNameInference + Stage 2 runSummaryPolish (per-window)
meeting-extras.js decisions / action items / open questions / key quotes extraction
meeting-speaker-edits.js post-hoc record edits: mergeSpeakersInRecord,
reclusterMeetingRecord, applyPolishedSummaries, backfillEntrySpeakers
backends/hardware.js Parakeet transcribe + /api/audio/diarize-chunk + chunking + vLLM analyze
chunked-analyze.js windowed analyze (planWindowsByDuration, runPipelinedAnalysis, …)
config.js getConfigSnapshot() + relay_* config defaults
hardware-config.js resolveHardwareConfig() → Spark Control endpoint discovery
test/ node --test files (speaker-clustering, meeting-speaker-edits, credits)
public/dashboard.html operator dashboard (meetings detail view + speaker tools)
startos/versions/<vN>.ts one file per version + index.ts graph
docs/issues-backlog.md detailed issue log
Internal-meetings pipeline (how speakers are produced)
- Chunk audio into ~5-min pieces (`relay_hardware_tx_chunk_minutes`) with a few seconds of overlap.
- Per-chunk diarize at Spark Control `/api/audio/diarize-chunk`: Sortformer emits chunk-local labels (`Speaker_0/1`), TitaNet emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
- Cross-chunk cluster (`speaker-clustering.js`, `clusterSpeakers`): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global `Speaker_A/B/…`. Then a small-cluster suppression pass folds brief clusters into anchors or `Speaker_Unknown`.
- Analyze (windowed) → section `{title, summary, startIndex, endIndex}`.
- Polish (`post-cluster-polish.js`): `runNameInference` infers real names from the transcript, then `runSummaryPolish` rewrites each section summary to attribute statements to those names.
- Extras (`meeting-extras.js`).
- Audio is deleted after processing (success or failure); the relay never retains uploaded audio.
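The cross-chunk clustering step can be illustrated with a minimal sketch. This is not the repo's `clusterSpeakers`; the function name `clusterFingerprintsSketch`, the input shape `{ chunk, local, vec }`, and the reading of the threshold as a 0-100 percentage are all assumptions made for illustration, and the small-cluster suppression pass is left out.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSim(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Average linkage: mean pairwise cosine similarity between two clusters.
function averageLinkage(clusterA, clusterB) {
  let total = 0;
  for (const a of clusterA) {
    for (const b of clusterB) total += cosineSim(a.vec, b.vec);
  }
  return total / (clusterA.length * clusterB.length);
}

// Hypothetical sketch: items are [{ chunk, local, vec }], thresholdPct is
// assumed to be on a 0-100 scale (e.g. 70 means cosine similarity 0.70).
// Returns a Map from "chunk:local" to a global label (Speaker_A, Speaker_B, ...).
function clusterFingerprintsSketch(items, thresholdPct) {
  const threshold = thresholdPct / 100;
  const clusters = items.map((item) => [item]);
  for (;;) {
    // Find the most similar pair of clusters.
    let best = { sim: -Infinity, i: -1, j: -1 };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const sim = averageLinkage(clusters[i], clusters[j]);
        if (sim > best.sim) best = { sim, i, j };
      }
    }
    // Stop when nothing is left to merge or the best pair is too dissimilar.
    if (best.i < 0 || best.sim < threshold) break;
    clusters[best.i] = clusters[best.i].concat(clusters[best.j]);
    clusters.splice(best.j, 1);
  }
  const labels = new Map();
  clusters.forEach((cluster, idx) => {
    // Letters only cover 26 clusters; enough for a sketch.
    const label = `Speaker_${String.fromCharCode(65 + idx)}`;
    for (const item of cluster) labels.set(`${item.chunk}:${item.local}`, label);
  });
  return labels;
}
```

Raising the threshold makes merges harder, so similar voices stay in separate clusters; lowering it folds more chunk-local speakers together.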
Endpoints (server-side contract)
All routes mount in server/index.js. Public paths sit under /relay/*; operator paths under /admin/*.
Auth model
- `X-Recap-Operator-Key` + `X-Recap-User-Id` → "cloud" path. The Recaps cloud server (recaps.cc) authenticates once with a shared operator key (`relay_cloud_operator_key`) and names the acting user. Credit pool keyed `user:<id>`; tier comes from the relay's stored row, NOT a per-user license. See `server/identity.js`.
- `X-Recap-Install-Id` (+ optional `Authorization: <license>`) → "license" path. Self-hosted installs and the operator's single-mode app. Credits/tier come from the resolved Keysat license + install id.
- Admin session cookie → `/admin/*`. Cookie issued by `POST /admin/login`; `/admin/login` and `/admin/status` are exempt inside `setupAdminAuthMiddleware`.
- Webhook signature → `POST /relay/btcpay/webhook` validates `BTCPay-Sig` against `relay_btcpay_webhook_secret`. Zaprite's webhook re-fetches the order through the Zaprite API to verify, so no shared-secret signing.
- `X-Recap-Job-Id` is a billing key, not auth: the first call with a given id charges one credit; later calls with the same id are free (so transcribe + analyze for one summary = one credit total).
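A rough sketch of how the two header paths and the job-id billing rule could fit together. The real logic lives in `server/identity.js` and is not reproduced here: the function names `resolveIdentitySketch` and `chargeForJobSketch`, the `install:<id>` pool key, the in-memory ledger, and the choice to check the operator key before the install id are all assumptions for illustration.

```javascript
// Hypothetical sketch: decide which auth path a request takes.
// `headers` uses lowercase keys, as Express exposes them on req.headers.
function resolveIdentitySketch(headers, expectedOperatorKey) {
  const operatorKey = headers["x-recap-operator-key"];
  const userId = headers["x-recap-user-id"];
  const installId = headers["x-recap-install-id"];

  if (operatorKey !== undefined) {
    // Cloud path: shared operator key plus the acting user's id.
    // (A production check would use a constant-time comparison.)
    if (operatorKey !== expectedOperatorKey || !userId) {
      return { ok: false, reason: "bad operator credentials" };
    }
    return { ok: true, path: "cloud", poolKey: `user:${userId}` };
  }
  if (installId) {
    // License path: install id plus an optional license in Authorization.
    // The "install:" prefix is an assumed pool-key format.
    return {
      ok: true,
      path: "license",
      poolKey: `install:${installId}`,
      license: headers["authorization"] ?? null,
    };
  }
  return { ok: false, reason: "no identity headers" };
}

// Hypothetical sketch: charge one credit the first time a job id is seen.
// ledger = { balances: Map<poolKey, number>, chargedJobs: Set<string> }
function chargeForJobSketch(ledger, poolKey, jobId) {
  const seenKey = `${poolKey}|${jobId}`;
  const balance = ledger.balances.get(poolKey) ?? 0;
  if (ledger.chargedJobs.has(seenKey)) return { charged: 0, balance };
  if (balance < 1) return { charged: 0, balance, error: "insufficient credits" };
  ledger.balances.set(poolKey, balance - 1);
  ledger.chargedJobs.add(seenKey);
  return { charged: 1, balance: balance - 1 };
}
```

With this shape, a transcribe call and a later analyze call that share one `X-Recap-Job-Id` reduce the balance once, which matches the "one summary = one credit" rule.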
/relay/* (public; per-call header auth)
- `GET /relay/health`: liveness; tolerates partial config. (`routes/health.js`)
- `GET /relay/policy`: `{ tiers, core_total_credits, core_gemini_credits }`; no auth. (`routes/policy.js`)
- `GET /relay/capabilities`: operator-wide feature flags (hardware ready, TTS backend choice, etc.). `X-Recap-Install-Id` optional. (`routes/capabilities.js`)
- `GET /relay/balance`: caller's credit balance. (`routes/balance.js`)
- `POST /relay/transcribe`: multipart audio → `{ text, segments, duration_seconds, model, ... }`. Body fields: `mime_type`, `title`, `channel`, `description`. (`routes/transcribe.js`)
- `POST /relay/transcribe-url`: async; `{ media_url, type, mime_type, title, channel, description, chapters }` → `{ job_id }`, then poll `GET /relay/jobs/:id`. (`routes/transcribe-url.js`)
- `POST /relay/summarize-url`: async; same body shape, full transcribe+analyze pipeline → `{ job_id }`, then stream `GET /relay/summarize-url/:jobId/events` (SSE). (`routes/summarize-url.js`)
- `POST /relay/analyze`: `{ transcript, … }` → topic sections JSON. (`routes/analyze.js`)
- `POST /relay/tts`: text → audio; gated by `capabilities.has_tts`. (`routes/tts.js`)
- `GET /relay/credits/packages`, `POST /relay/credits/buy`, `GET /relay/credits/invoice/:id`: à-la-carte credit purchase (BTCPay). (`routes/credits.js`)
- `POST /relay/btcpay/webhook`: BTCPay settle → either `extendUserTier` (subscription) or credit grant (à-la-carte). HMAC validated. (`routes/credits.js`)
- `POST /relay/zaprite/webhook`: Zaprite settle → `extendUserTier` only. Re-fetches order to verify. (`routes/zaprite-webhook.js`)
/relay/* (operator-key only — cloud → relay control plane)
All require a valid X-Recap-Operator-Key. Defined in routes/user-tier.js.
- `POST /relay/user-tier`: `{ user_id, tier: "core"|"pro"|"max", expires_at? }` → sets the cloud user's stored tier (operator comp grants live here).
- `POST /relay/tier-invoice`: `{ user_id, tier: "pro"|"max", return_url }` → mints a BTCPay tier-purchase invoice (Lightning QR).
- `POST /relay/tier-zaprite-order`: same idea on the card rail.
- `GET /relay/tier-plans`: `{ ok, period_days, plans: [{tier, sats, fiat_amount, fiat_currency, credits_per_period}], card_available }`. `credits_per_period: null` → "Unlimited"; never hardcode this label.
- `GET /relay/expiring-subscriptions?within_days=7&lapsed_days=3`: `{ ok, now, subscriptions: [{user_id, tier, expires_at, expired, days_left}] }`. The Recaps server maps user_id → email and sends the reminder; the relay never sees email.
- `GET /relay/user-tier/:userId`: read the stored row.
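As an illustration of the `expiring-subscriptions` and `tier-plans` response shapes, here is a small sketch. The exact window semantics are not spelled out in this document, so the sketch assumes `within_days` means "expires within the next N days" and `lapsed_days` means "expired no more than N days ago", and it rounds `days_left` up; `expiringSubscriptionsSketch` and `creditsLabelSketch` are hypothetical names.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical sketch: pick stored tier rows that fall inside the reminder
// window. rows = [{ user_id, tier, expires_at }], expires_at as ISO string.
function expiringSubscriptionsSketch(rows, now, withinDays = 7, lapsedDays = 3) {
  const subscriptions = [];
  for (const row of rows) {
    if (!row.expires_at) continue; // no expiry stored: nothing to remind about
    const msLeft = Date.parse(row.expires_at) - now.getTime();
    const expired = msLeft <= 0;
    // Assumed window: recently lapsed OR about to expire.
    const inWindow = expired
      ? -msLeft <= lapsedDays * DAY_MS
      : msLeft <= withinDays * DAY_MS;
    if (!inWindow) continue;
    subscriptions.push({
      user_id: row.user_id,
      tier: row.tier,
      expires_at: row.expires_at,
      expired,
      days_left: Math.ceil(msLeft / DAY_MS), // negative once lapsed
    });
  }
  return { ok: true, now: now.toISOString(), subscriptions };
}

// Derive the display label from the plan row instead of hardcoding it per tier.
function creditsLabelSketch(plan) {
  return plan.credits_per_period === null
    ? "Unlimited"
    : String(plan.credits_per_period);
}
```

Note that the output carries only `user_id`; mapping it to an email address stays on the Recaps server side, as described above.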
/admin/* (operator dashboard; cookie-gated)
`routes/admin.js`: `GET /admin/{usage,config,license-cache,hardware-queue,jobs,jobs-history,job/:id/details,dashboard,dashboard.csv,settings,output-store-stats}`, `POST /admin/{quotas,wipe-all}`, `PUT /admin/settings`, `DELETE /admin/job-outputs`. `routes/admin-test-run.js`: `POST /admin/test-run`. BTCPay setup wizard under `/admin/btcpay/*` (`routes/btcpay-setup.js`).
/admin/internal-meetings/* (cookie-gated; routes/internal-meetings.js)
- `POST /upload`: multipart audio; runs the full pipeline (chunk → diarize → cluster → analyze → polish → extras → save). Audio is deleted after.
- `GET /` → `{ meetings: [...] }`; `GET /:id` → full saved record (`rec`).
- `GET /:id/markdown`, `GET /:id/html`, `GET /:id/download`: exports.
- `GET /jobs/:id`, `GET /jobs/:id/stream` (SSE): progress for a running upload.
- `PATCH /:id/speakers`: rename a cluster (display-name only).
- `PATCH /:id/entries`: per-line `speaker_override`.
- `PATCH /:id/merge-speakers`: fold cluster(s) into one (fixes one person split as two). Offline, no LLM.
- `POST /:id/recluster`: re-run clustering at a new threshold (fixes two people merged as one). Offline, uses `rec.diarization` fingerprints. Resets `speaker_names`, per-line overrides, and extras attributions. 400 if no fingerprints.
- `POST /:id/repolish`: re-runs `runSummaryPolish` with the CURRENT names (no re-inference). Synchronous; needs hardware analyze online; 400 if no named speakers.
- `DELETE /:id`.
Conventions for this codebase specifically
- A saved meeting record stores the per-chunk TitaNet fingerprints in `rec.diarization`. Because the audio is gone, this is what makes re-clustering possible offline: no re-upload, no Spark Control round-trip.
- Speaker labels live in FOUR places that every edit must keep in sync: `rec.transcript_segments[].speaker`, `rec.chunks[].entries[].speaker` (+ `.speaker_override`), `rec.speakers` (per-cluster stats), and `rec.extras` (`tldr.primary_speakers`, `decisions[].agreed_by`, `action_items[].owner`, `key_quotes[].speaker`). Display names are a separate map: `rec.speaker_names`.
- Over-merging (two people clustered as one) is tuned by `relay_hardware_voice_clustering_threshold` (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs `relay_hardware_anchor_min_speaking_sec` / `relay_hardware_small_cluster_max_speaking_sec` / `relay_hardware_uncertain_margin_pct`. All operator-config-driven; never hardcode.
- Post-hoc speaker-edit endpoints (operator dashboard, added this session; `server/meeting-speaker-edits.js`):
  - `PATCH /admin/internal-meetings/:id/speakers`: rename a cluster (display name only; pre-existing).
  - `PATCH /admin/internal-meetings/:id/entries`: per-line `speaker_override` (pre-existing).
  - `PATCH /admin/internal-meetings/:id/merge-speakers`: fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
  - `POST /admin/internal-meetings/:id/recluster`: re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses `rec.diarization` fingerprints); resets `speaker_names`, per-line overrides, and extras attributions, so the operator re-labels afterward. 400 if no fingerprints saved.
  - `POST /admin/internal-meetings/:id/repolish`: re-run `runSummaryPolish` with the current names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
- `make install` correctness: see [Always]. Honest reports: a failing test or build is a failure. Comments explain WHY. Write tests alongside (`server/test/*.test.js`, `node --test`).
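To make the "four places" rule concrete, here is a sketch of a merge that rewrites every location in one pass. It is not the repo's `mergeSpeakersInRecord`: the name `mergeSpeakersSketch` is hypothetical, and the field shapes (`rec.speakers` as an array of `{ label, speaking_sec }`, `agreed_by` and `primary_speakers` as arrays of labels, `owner` and `speaker` as single labels) are assumptions, since this document names the fields but not their structure.

```javascript
// Hypothetical sketch: fold `fromLabels` into `intoLabel` everywhere a
// speaker label is stored. Mutates and returns `rec`.
function mergeSpeakersSketch(rec, fromLabels, intoLabel) {
  const from = new Set(fromLabels);
  const remap = (label) => (from.has(label) ? intoLabel : label);
  const dedupe = (labels) => [...new Set(labels)];

  // 1. Flat transcript segments.
  for (const seg of rec.transcript_segments ?? []) seg.speaker = remap(seg.speaker);

  // 2. Per-chunk entries, including any per-line override.
  for (const chunk of rec.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      entry.speaker = remap(entry.speaker);
      if (entry.speaker_override) entry.speaker_override = remap(entry.speaker_override);
    }
  }

  // 3. Per-cluster stats: sum speaking time into the surviving cluster
  //    (assumed shape: [{ label, speaking_sec }]).
  const merged = new Map();
  for (const stat of rec.speakers ?? []) {
    const label = remap(stat.label);
    const prev = merged.get(label) ?? { label, speaking_sec: 0 };
    prev.speaking_sec += stat.speaking_sec;
    merged.set(label, prev);
  }
  rec.speakers = [...merged.values()];

  // 4. Extras attributions.
  const extras = rec.extras ?? {};
  if (extras.tldr?.primary_speakers) {
    extras.tldr.primary_speakers = dedupe(extras.tldr.primary_speakers.map(remap));
  }
  for (const d of extras.decisions ?? []) d.agreed_by = dedupe((d.agreed_by ?? []).map(remap));
  for (const a of extras.action_items ?? []) a.owner = remap(a.owner);
  for (const q of extras.key_quotes ?? []) q.speaker = remap(q.speaker);

  return rec;
}
```

The point of the single `remap` helper is that no location can be forgotten without it being obvious in review; display names in `rec.speaker_names` are a separate map and are deliberately not touched in this sketch.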
Always
- Bump the version before EVERY `make install`. StartOS dedupes sideloads by version string, so an unbumped reinstall (even one line changed) silently no-ops. `make bump` → `make x86` → `make install`. See memory `bump-before-install` (applies to this repo AND `../recap`).
- Add new version files to BOTH the import block AND the `other:` list in `startos/versions/index.ts`, and point `current:` at the new constant. `make bump` does this for you.
- Build freely; ask before anything that leaves this machine. `make x86` / `make install` (to the operator's own box) are fine. `make deploy` / `make redeploy` are NOT.
- Reference env-var / config names, never values. Relay secrets (operator key, Gemini key, SMTP, Zaprite, BTCPay) live in gitignored env; docs name them only.
Never
- Never `make deploy` / `make redeploy` / upload to the registry. This package is private to the operator's box. (Memory: `feedback_relay_never_to_registry`.)
- No "Co-Authored-By" / no "Claude" mentions in commits or source.
- Never edit a `startos/versions/<v>.ts` that's already been built/installed; add a new version file.
- Don't push to GitHub by default; the remote is self-hosted Gitea.
Current state — box AND working tree at 0.2.124; git is the gap
- Box AND local working tree are both at relay `0.2.124` (app at `0.2.155`). `startos/versions/index.ts` has `current: v_0_2_124`; the StartOS dashboard reflects the same.
- Version files `v0.2.117` through `v0.2.124` are present in the working tree (untracked). A concurrent 2026-06-13 session continued from this session's 0.2.117, bumped through 0.2.124, and shipped to the box, so re-read the tree before assuming what's there.
- Post-hoc speaker tools are live: `meeting-speaker-edits.js` (merge / recluster / repolish + backfill) and the matching `PATCH`/`POST /admin/internal-meetings/:id/{merge-speakers,recluster,repolish}` routes are present; the dashboard exposes the controls. Tests pass via `cd server && npm test`.
- The real gap is git, not versions. `HEAD` is `6fa175a Add agent docs`; `HEAD~1` is `b7f7590 v0.2.11 /relay/capabilities + /relay/transcribe-url`. So the last code commit is at `v0.2.11`; everything from `v0.2.12` → `v0.2.124` (the entire internal-meetings feature, diarization, speaker-edit tools, billing, the user-tier control plane) is uncommitted. Working-tree counts: 28 modified, 150 untracked, 5 deleted (183 total) as of this read. "Catching up git" = committing this tree (see ROADMAP).