Keysat 7e5a7e3b7e Document server-side endpoint contract; correct Current state precision

- AGENTS.md: add Endpoints section — auth model (cloud operator-key path,
  license/install-id path, admin session cookie, BTCPay HMAC) plus full
  /relay/* surface (public + operator-key-only control plane), the
  /admin/* dashboard, and the /admin/internal-meetings/* API.
- AGENTS.md: rewrite Current state with verified git facts — HEAD is the
  prior docs commit, HEAD~1 is v0.2.11, working tree at v_0_2_124, file
  counts pulled live from git status.
- ROADMAP.md: log two doc-precision follow-ups caught in review (the
  working-tree counts drift fast; the admin-route shortlist silently
  omits three real routes).

2026-06-13 11:13:12 -05:00

15 KiB


AGENTS.md — Recap Relay

Operator-side, credit-metered service that sits in front of Gemini and the operator's local AI hardware ("Spark Control": Parakeet ASR, Sortformer diarization, TitaNet voice embeddings, a vLLM/Gemma analyze endpoint). The Recaps app (../recap) is the client; this repo owns transcription/diarization/analysis routing, the cloud Pro/Max tier + expiry, self-serve billing settlement, and the internal-meetings feature (upload audio → transcribe → diarize → cluster → analyze → polish → operator dashboard). Private. Ships to the operator's own Start9 box via make install only — NEVER to the public registry.

Stack

Server: Node.js (type: module, ES modules). The dev box is the same one the app uses (Node v25.6.1); the container runtime is whatever the Dockerfile pins.
HTTP: express + multer (audio upload). Admin routes under /admin/* sit behind an admin-session-cookie gate; routes under /relay/* use per-call header auth, with an operator-key-only subset for the cloud → relay control plane.
Dashboard: public/dashboard.html — single-file vanilla JS, render-string-into-innerHTML, same shape as the app's index.html.
Packaging: @start9labs/start-sdk under startos/ — version graph at startos/versions/index.ts.
Storage: filesystem under the StartOS data dir (/data). Internal meetings persist as /data/internal-meetings/<id>.json. No SQLite here.
Upstreams: Gemini (@google/genai); operator hardware via "Spark Control" HTTP (Parakeet transcribe, /api/audio/diarize-chunk for Sortformer+TitaNet, a vLLM/Gemma OpenAI-shape analyze endpoint).

Commands

Run from repo root unless noted.

| Action | Command |
| --- | --- |
| Run all tests | `cd server && npm test` (built-in `node --test`) |
| Run one test file | `cd server && node --test test/<file>.test.js` |
| Build `.s9pk` (x86) | `make x86` |
| Bump version (interactive) | `make bump` |
| Install to operator's Start9 box | `make install` (bump FIRST; see Always) |
| Deploy to registry | `make deploy` / `make redeploy`: NEVER run these here (private package) |

make install picks the newest *.s9pk by mtime in the cwd (ls -t *.s9pk | head -1) — it does NOT build. Always make x86 after a change, and run from this repo's root (the shell cwd can drift to ../recap, where install would grab the app's .s9pk instead).
Host comes from the host: field in ~/.startos/config.yaml (a <relay-host>.local mDNS name). Never edit that file without authorization.

Directory layout (what this session touched / verified)

server/
  routes/internal-meetings.js   upload → pipeline → save; the /admin/internal-meetings/* API,
                                including the post-hoc speaker-edit + download endpoints
  speaker-clustering.js         cross-chunk voice clustering (agglomerative, cosine sim) +
                                assignSpeakersToSegments + small-cluster suppression
  post-cluster-polish.js        Stage 1 runNameInference + Stage 2 runSummaryPolish (per-window)
  meeting-extras.js             decisions / action items / open questions / key quotes extraction
  meeting-speaker-edits.js      post-hoc record edits: mergeSpeakersInRecord,
                                reclusterMeetingRecord, applyPolishedSummaries, backfillEntrySpeakers
  backends/hardware.js          Parakeet transcribe + /api/audio/diarize-chunk + chunking + vLLM analyze
  chunked-analyze.js            windowed analyze (planWindowsByDuration, runPipelinedAnalysis, …)
  config.js                     getConfigSnapshot() + relay_* config defaults
  hardware-config.js            resolveHardwareConfig() → Spark Control endpoint discovery
  test/                         node --test files (speaker-clustering, meeting-speaker-edits, credits)
public/dashboard.html           operator dashboard (meetings detail view + speaker tools)
startos/versions/<vN>.ts        one file per version + index.ts graph
docs/issues-backlog.md          detailed issue log

Internal-meetings pipeline (how speakers are produced)

Chunk audio into ~5-min pieces (relay_hardware_tx_chunk_minutes) with a few seconds overlap.
Per-chunk diarize at Spark Control /api/audio/diarize-chunk: Sortformer emits chunk-local labels (Speaker_0/1), TitaNet emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
Cross-chunk cluster (speaker-clustering.js, clusterSpeakers): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global Speaker_A/B/…. Then a small-cluster suppression pass folds brief clusters into anchors or Speaker_Unknown.
Analyze (windowed) → section {title, summary, startIndex, endIndex}.
Polish (post-cluster-polish.js): runNameInference infers real names from the transcript, then runSummaryPolish rewrites each section summary to attribute statements to those names.
Extras (meeting-extras.js).
Audio is deleted after processing (success or failure) — the relay never retains uploaded audio.
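
The cross-chunk clustering step can be sketched as follows. This is a minimal illustration of average-linkage agglomerative clustering over cosine similarity, not the code in speaker-clustering.js: the function names, the `{ embedding }` item shape, and the convention that the threshold is a cosine value in [0, 1] are all assumptions, and the small-cluster suppression pass is left out.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Average linkage: mean pairwise similarity between the members of two clusters.
function averageLinkage(clusterA, clusterB, sim) {
  let total = 0;
  for (const i of clusterA) for (const j of clusterB) total += sim[i][j];
  return total / (clusterA.length * clusterB.length);
}

// items: [{ embedding: number[] }], one per chunk-local speaker (hypothetical shape).
// threshold: minimum average cosine similarity required to merge two clusters.
// Returns one global label per item, in input order.
function clusterFingerprints(items, threshold) {
  const sim = items.map((x) =>
    items.map((y) => cosineSimilarity(x.embedding, y.embedding))
  );
  const clusters = items.map((_, i) => [i]);

  while (clusters.length > 1) {
    // Find the most similar pair of clusters.
    let best = { score: -Infinity, a: -1, b: -1 };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const score = averageLinkage(clusters[a], clusters[b], sim);
        if (score > best.score) best = { score, a, b };
      }
    }
    // Stop once even the closest pair is below the threshold.
    if (best.score < threshold) break;
    clusters[best.a] = clusters[best.a].concat(clusters[best.b]);
    clusters.splice(best.b, 1);
  }

  // Simplification: letters only, so this sketch assumes at most 26 clusters.
  const labels = new Array(items.length);
  clusters.forEach((members, k) => {
    const label = `Speaker_${String.fromCharCode(65 + k)}`;
    for (const i of members) labels[i] = label;
  });
  return labels;
}
```

Raising the threshold makes merges harder, so similar voices stay in separate clusters. That is the behaviour the clustering-threshold config knob relies on.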

Endpoints (server-side contract)

All routes mount in server/index.js. Public paths sit under /relay/*; operator paths under /admin/*.

Auth model

X-Recap-Operator-Key + X-Recap-User-Id → "cloud" path. The Recaps cloud server (recaps.cc) authenticates once with a shared operator key (relay_cloud_operator_key) and names the acting user. The credit pool is keyed user:<id>, and the tier comes from the relay's stored row, NOT from a per-user license. See server/identity.js.
X-Recap-Install-Id (+ optional Authorization: <license>) → "license" path. Self-hosted installs and the operator's single-mode app. Credits/tier come from the resolved Keysat license + install id.
Admin session cookie → /admin/*. Cookie issued by POST /admin/login; /admin/login and /admin/status are exempt inside setupAdminAuthMiddleware.
Webhook signature → POST /relay/btcpay/webhook validates BTCPay-Sig against relay_btcpay_webhook_secret. Zaprite's webhook re-fetches the order through the Zaprite API to verify, so no shared-secret signing.
X-Recap-Job-Id is a billing key, not auth: the first call with a given id charges one credit; later calls with the same id are free (so transcribe + analyze for one summary = one credit total).
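
The X-Recap-Job-Id rule (first call charges, repeats are free) is easy to get subtly wrong, so here is a minimal sketch of it. It is an in-memory stand-in, not the relay's credit store: `createCreditLedger`, `chargeOnce`, and the choice to scope job ids per credit pool are assumptions made for illustration.

```javascript
// Hypothetical in-memory ledger. initialBalances maps a pool key
// (for example "user:<id>") to its credit count.
function createCreditLedger(initialBalances) {
  const balances = new Map(Object.entries(initialBalances));
  const chargedJobs = new Set(); // "<pool>|<jobId>" pairs already billed

  return {
    balance: (pool) => balances.get(pool) ?? 0,

    // Charges one credit the first time a (pool, jobId) pair is seen.
    // Repeat calls with the same pair are free.
    // Throws if a first-time charge finds the pool empty.
    chargeOnce(pool, jobId) {
      const key = `${pool}|${jobId}`;
      if (chargedJobs.has(key)) {
        return { charged: false, balance: this.balance(pool) };
      }
      const current = this.balance(pool);
      if (current < 1) throw new Error("insufficient credits");
      balances.set(pool, current - 1);
      chargedJobs.add(key);
      return { charged: true, balance: current - 1 };
    },
  };
}
```

With this shape, a transcribe call and a later analyze call that share one job id cost a single credit in total, which matches the "one summary = one credit" rule above.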

`/relay/*` (public; per-call header auth)

GET /relay/health — liveness; tolerates partial config. (routes/health.js)
GET /relay/policy — { tiers, core_total_credits, core_gemini_credits }; no auth. (routes/policy.js)
GET /relay/capabilities — operator-wide feature flags (hardware ready, TTS backend choice, etc). X-Recap-Install-Id optional. (routes/capabilities.js)
GET /relay/balance — caller's credit balance (routes/balance.js).
POST /relay/transcribe — multipart audio → { text, segments, duration_seconds, model, ... }. Body fields: mime_type, title, channel, description. (routes/transcribe.js)
POST /relay/transcribe-url — async; { media_url, type, mime_type, title, channel, description, chapters } → { job_id } then poll GET /relay/jobs/:id. (routes/transcribe-url.js)
POST /relay/summarize-url — async; same body shape, full transcribe+analyze pipeline → { job_id } then stream GET /relay/summarize-url/:jobId/events (SSE). (routes/summarize-url.js)
POST /relay/analyze — { transcript, … } → topic sections JSON. (routes/analyze.js)
POST /relay/tts — text → audio; gated by capabilities.has_tts. (routes/tts.js)
GET /relay/credits/packages, POST /relay/credits/buy, GET /relay/credits/invoice/:id — à-la-carte credit purchase (BTCPay). (routes/credits.js)
POST /relay/btcpay/webhook — BTCPay settle → either extendUserTier (subscription) or credit grant (à-la-carte). HMAC validated. (routes/credits.js)
POST /relay/zaprite/webhook — Zaprite settle → extendUserTier only. Re-fetches order to verify. (routes/zaprite-webhook.js)
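
For the BTCPay webhook, the HMAC check can be sketched as below. BTCPay Server sends a `BTCPay-Sig` header of the form `sha256=<hex>`, computed as HMAC-SHA256 over the raw request body with the webhook secret. The function name `verifyBtcpaySignature` is hypothetical, and this is not the code in routes/credits.js.

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// rawBody must be the exact bytes received (Buffer or string), not re-serialized JSON.
// secret is the value configured as relay_btcpay_webhook_secret.
function verifyBtcpaySignature(rawBody, sigHeader, secret) {
  if (typeof sigHeader !== "string" || !sigHeader.startsWith("sha256=")) {
    return false;
  }
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(sigHeader.slice("sha256=".length), "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check only works against the unparsed body. In an express app that usually means capturing the raw bytes for this one route (for example with `express.raw`) before any JSON parsing happens.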

`/relay/*` (operator-key only — cloud → relay control plane)

All require a valid X-Recap-Operator-Key. Defined in routes/user-tier.js.

POST /relay/user-tier — { user_id, tier: "core"|"pro"|"max", expires_at? } → sets the cloud user's stored tier (operator comp grants live here).
POST /relay/tier-invoice — { user_id, tier: "pro"|"max", return_url } → mints a BTCPay tier-purchase invoice (Lightning QR).
POST /relay/tier-zaprite-order — same idea on the card rail.
GET /relay/tier-plans — { ok, period_days, plans: [{tier, sats, fiat_amount, fiat_currency, credits_per_period}], card_available }. credits_per_period: null → "Unlimited"; never hardcode this label.
GET /relay/expiring-subscriptions?within_days=7&lapsed_days=3 — { ok, now, subscriptions: [{user_id, tier, expires_at, expired, days_left}] }. The Recaps server maps user_id → email and sends the reminder; the relay never sees email.
GET /relay/user-tier/:userId — read the stored row.
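
A caller of GET /relay/tier-plans might derive the "Unlimited" label from the response rather than hardcoding it per tier, along these lines. The helper names are hypothetical, and the wording of the non-null label ("N credits") is an assumption.

```javascript
// plan: one entry of the `plans` array returned by GET /relay/tier-plans.
function creditsLabel(plan) {
  // null means no per-period cap; any number is shown as-is.
  return plan.credits_per_period === null
    ? "Unlimited"
    : `${plan.credits_per_period} credits`;
}

// response: the parsed JSON body of GET /relay/tier-plans.
function describePlans(response) {
  return response.plans.map(
    (plan) =>
      `${plan.tier}: ${creditsLabel(plan)} / ${response.period_days} days, ${plan.sats} sats`
  );
}
```

Because the label is computed from `credits_per_period`, a tier that later gains or loses a cap needs no client change.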

`/admin/*` (operator dashboard; cookie-gated)

routes/admin.js: GET /admin/{usage,config,license-cache,hardware-queue,jobs,jobs-history,job/:id/details,dashboard,dashboard.csv,settings,output-store-stats}, POST /admin/{quotas,wipe-all}, PUT /admin/settings, DELETE /admin/job-outputs. routes/admin-test-run.js: POST /admin/test-run. BTCPay setup wizard under /admin/btcpay/* (routes/btcpay-setup.js).

`/admin/internal-meetings/*` (cookie-gated; `routes/internal-meetings.js`)

POST /upload — multipart audio; runs the full pipeline (chunk → diarize → cluster → analyze → polish → extras → save). Audio is deleted after.
GET / → { meetings: [...] }; GET /:id → full saved record (rec).
GET /:id/markdown, GET /:id/html, GET /:id/download — exports.
GET /jobs/:id, GET /jobs/:id/stream (SSE) — progress for a running upload.
PATCH /:id/speakers — rename a cluster (display-name only).
PATCH /:id/entries — per-line speaker_override.
PATCH /:id/merge-speakers — fold cluster(s) into one (fixes ONE person split as two). Offline, no LLM.
POST /:id/recluster — re-run clustering at a new threshold (fixes TWO people merged as one). Offline, uses rec.diarization fingerprints. Resets speaker_names, per-line overrides, and extras attributions. 400 if no fingerprints.
POST /:id/repolish — re-runs runSummaryPolish with the CURRENT names (no re-inference). Synchronous; needs hardware analyze online; 400 if no named speakers.
DELETE /:id.

Conventions for this codebase specifically

A saved meeting record stores the per-chunk TitaNet fingerprints in rec.diarization. Because the audio is gone, this is what makes re-clustering possible offline — no re-upload, no Spark Control round-trip.
Speaker labels live in FOUR places that every edit must keep in sync: rec.transcript_segments[].speaker, rec.chunks[].entries[].speaker (+ .speaker_override), rec.speakers (per-cluster stats), and rec.extras (tldr.primary_speakers, decisions[].agreed_by, action_items[].owner, key_quotes[].speaker). Display names are a separate map: rec.speaker_names.
Over-merging (two people clustered as one) is tuned by relay_hardware_voice_clustering_threshold (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs relay_hardware_anchor_min_speaking_sec / relay_hardware_small_cluster_max_speaking_sec / relay_hardware_uncertain_margin_pct. All operator-config-driven; never hardcode.
Post-hoc speaker-edit endpoints (operator dashboard, added this session — server/meeting-speaker-edits.js):
- PATCH /admin/internal-meetings/:id/speakers — rename a cluster (display name only; pre-existing).
- PATCH /admin/internal-meetings/:id/entries — per-line speaker_override (pre-existing).
- PATCH /admin/internal-meetings/:id/merge-speakers — fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
- POST /admin/internal-meetings/:id/recluster — re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses rec.diarization fingerprints); resets speaker_names, per-line overrides, and extras attributions — operator re-labels afterward. 400 if no fingerprints saved.
- POST /admin/internal-meetings/:id/repolish — re-run runSummaryPolish with the current names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
make install correctness: see Always. Report results honestly; a failing test or build counts as a failure. Comments explain WHY. Write tests alongside the code (server/test/*.test.js, run with node --test).
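
To make the "four places" rule concrete, here is a rough sketch of a merge that touches every one of them. It is not the real mergeSpeakersInRecord: the name `mergeSpeakerLabels` is hypothetical, and several shapes are assumed. Those are rec.speakers as an object keyed by label with `speaking_sec` and `segments` counters, `agreed_by` and `primary_speakers` as arrays of labels, and `speaker_override` holding a cluster label.

```javascript
// Folds every label in fromLabels into intoLabel across all four places a
// speaker label is stored, then prunes the display-name map.
// Mutates and returns rec.
function mergeSpeakerLabels(rec, fromLabels, intoLabel) {
  const from = new Set(fromLabels);
  const remap = (label) => (from.has(label) ? intoLabel : label);
  const remapList = (list) => [...new Set((list ?? []).map(remap))];

  // 1. Flat transcript segments.
  for (const seg of rec.transcript_segments ?? []) seg.speaker = remap(seg.speaker);

  // 2. Per-chunk entries, including any per-line override (assumed to be a label).
  for (const chunk of rec.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      entry.speaker = remap(entry.speaker);
      if (entry.speaker_override != null) {
        entry.speaker_override = remap(entry.speaker_override);
      }
    }
  }

  // 3. Per-cluster stats (assumed shape: { [label]: { speaking_sec, segments } }).
  const speakers = rec.speakers ?? {};
  const target = (speakers[intoLabel] ??= { speaking_sec: 0, segments: 0 });
  for (const label of from) {
    const stats = speakers[label];
    if (!stats || label === intoLabel) continue;
    target.speaking_sec += stats.speaking_sec;
    target.segments += stats.segments;
    delete speakers[label];
  }
  rec.speakers = speakers;

  // 4. Extras attributions.
  const extras = rec.extras ?? {};
  if (extras.tldr) extras.tldr.primary_speakers = remapList(extras.tldr.primary_speakers);
  for (const d of extras.decisions ?? []) d.agreed_by = remapList(d.agreed_by);
  for (const a of extras.action_items ?? []) a.owner = remap(a.owner);
  for (const q of extras.key_quotes ?? []) q.speaker = remap(q.speaker);

  // Display names live in a separate map: drop names for labels that no longer exist.
  for (const label of from) {
    if (label !== intoLabel) delete rec.speaker_names?.[label];
  }
  return rec;
}
```

Missing any one of the four updates leaves the record inconsistent, for example a transcript line attributed to a cluster that no longer has stats. A test that asserts on all four places after an edit is a cheap guard.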

Always

Bump the version before EVERY make install — StartOS dedupes sideloads by version string, so an unbumped reinstall (even one line changed) silently no-ops. make bump → make x86 → make install. See memory bump-before-install (applies to this repo AND ../recap).
Add new version files to BOTH the import block AND the other: list in startos/versions/index.ts, and point current: at the new constant. make bump does this for you.
Build freely; ask before anything that leaves this machine. make x86 / make install (to the operator's own box) are fine. make deploy / make redeploy are NOT.
Reference env-var / config names, never values. Relay secrets (operator key, Gemini key, SMTP, Zaprite, BTCPay) live in gitignored env; docs name them only.

Never

Never make deploy / make redeploy / upload to the registry. This package is private to the operator's box. (Memory: feedback_relay_never_to_registry.)
No "Co-Authored-By" / no "Claude" mentions in commits or source.
Never edit a startos/versions/<v>.ts that's already been built/installed — add a new version file.
Don't push to GitHub by default — remote is self-hosted Gitea.

Current state — box AND working tree at `0.2.124`; git is the gap

Box AND local working tree are both at relay 0.2.124 (app at 0.2.155). startos/versions/index.ts current: v_0_2_124; the StartOS dashboard reflects the same.
Version files v0.2.117–v0.2.124 are present in the working tree (untracked). A concurrent 2026-06-13 session continued from this session's 0.2.117, bumped through 0.2.124, and shipped to the box — re-read the tree before assuming what's there.
Post-hoc speaker tools are live: meeting-speaker-edits.js (merge / recluster / repolish + backfill) and the matching PATCH/POST /admin/internal-meetings/:id/{merge-speakers,recluster,repolish} routes are present; the dashboard exposes the controls. Tests pass via cd server && npm test.
The real gap is git, not versions. HEAD is 6fa175a Add agent docs; HEAD~1 is b7f7590 v0.2.11 /relay/capabilities + /relay/transcribe-url. So the last code commit is at v0.2.11; everything from v0.2.12 → v0.2.124 — the entire internal-meetings feature, diarization, speaker-edit tools, billing, the user-tier control plane — is uncommitted. Working-tree counts: 28 modified, 150 untracked, 5 deleted (183 total) as of this read. "Catching up git" = committing this tree (see ROADMAP).
