# AGENTS.md — Recap Relay

Operator-side, credit-metered service that sits in front of Gemini and the operator's local AI hardware ("Spark Control": Parakeet ASR, Sortformer diarization, TitaNet voice embeddings, a vLLM/Gemma analyze endpoint). The Recaps app (`../recap`) is the client; this repo owns transcription/diarization/analysis routing, the cloud Pro/Max tier + expiry, self-serve billing settlement, and the **internal-meetings** feature (upload audio → transcribe → diarize → cluster → analyze → polish → operator dashboard). **Private. Ships to the operator's own Start9 box via `make install` only — NEVER to the public registry.**

## Stack

- **Server**: Node.js (`type: module`, ES modules). The dev box runs the same Node version as the app (`v25.6.1`); the container runtime is whatever the `Dockerfile` pins.
- **HTTP**: `express` + `multer` (audio upload). Admin routes under `/admin/*` behind an admin-session-cookie gate; relay-to-relay routes under `/relay/*` behind the operator key.
- **Dashboard**: `public/dashboard.html` — single-file vanilla JS, render-string-into-innerHTML, same shape as the app's `index.html`.
- **Packaging**: `@start9labs/start-sdk` under `startos/` — version graph at `startos/versions/index.ts`.
- **Storage**: filesystem under the StartOS data dir (`/data`). Internal meetings persist as `/data/internal-meetings/<id>.json`. No SQLite here.
- **Upstreams**: Gemini (`@google/genai`); operator hardware via "Spark Control" HTTP (Parakeet transcribe, `/api/audio/diarize-chunk` for Sortformer+TitaNet, a vLLM/Gemma OpenAI-shape analyze endpoint).

## Commands

Run from repo root unless noted.

| Action | Command |
|---|---|
| Run all tests | `cd server && npm test` (built-in `node --test`) |
| Run one test file | `cd server && node --test test/<file>.test.js` |
| Build `.s9pk` (x86) | `make x86` |
| Bump version (interactive) | `make bump` |
| Install to operator's Start9 box | `make install` *(bump FIRST — see Always)* |
| Deploy to registry | `make deploy` / `make redeploy` — **NEVER run these here** (private package) |

- `make install` picks the **newest `*.s9pk` by mtime in the cwd** (`ls -t *.s9pk | head -1`) — it does NOT build. Always `make x86` after a change, and run from this repo's root (the shell cwd can drift to `../recap`, where install would grab the *app's* `.s9pk` instead).
- Host comes from the `host:` field in `~/.startos/config.yaml` (a `<relay-host>.local` mDNS name). Never edit that file without authorization.

## Directory layout (what this session touched / verified)

```
server/
  routes/internal-meetings.js   upload → pipeline → save; the /admin/internal-meetings/* API,
                                including the post-hoc speaker-edit + download endpoints
  speaker-clustering.js         cross-chunk voice clustering (agglomerative, cosine sim) +
                                assignSpeakersToSegments + small-cluster suppression
  post-cluster-polish.js        Stage 1 runNameInference + Stage 2 runSummaryPolish (per-window)
  meeting-extras.js             decisions / action items / open questions / key quotes extraction
  meeting-speaker-edits.js      post-hoc record edits: mergeSpeakersInRecord,
                                reclusterMeetingRecord, applyPolishedSummaries, backfillEntrySpeakers
  backends/hardware.js          Parakeet transcribe + /api/audio/diarize-chunk + chunking + vLLM analyze
  chunked-analyze.js            windowed analyze (planWindowsByDuration, runPipelinedAnalysis, …)
  config.js                     getConfigSnapshot() + relay_* config defaults
  hardware-config.js            resolveHardwareConfig() → Spark Control endpoint discovery
  test/                         node --test files (speaker-clustering, meeting-speaker-edits, credits)
public/dashboard.html           operator dashboard (meetings detail view + speaker tools)
startos/versions/<vN>.ts        one file per version + index.ts graph
docs/issues-backlog.md          detailed issue log
```

## Internal-meetings pipeline (how speakers are produced)

1. **Chunk** audio into ~5-min pieces (`relay_hardware_tx_chunk_minutes`) with a few seconds of overlap.
2. **Per-chunk diarize** at Spark Control `/api/audio/diarize-chunk`: **Sortformer** emits chunk-local labels (`Speaker_0/1`), **TitaNet** emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
3. **Cross-chunk cluster** (`speaker-clustering.js`, `clusterSpeakers`): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global `Speaker_A/B/…`. Then a **small-cluster suppression** pass folds brief clusters into anchors or `Speaker_Unknown`.
4. **Analyze** (windowed) → sections `{title, summary, startIndex, endIndex}`.
5. **Polish** (`post-cluster-polish.js`): `runNameInference` infers real names from the transcript, then `runSummaryPolish` rewrites each section summary to attribute statements to those names.
6. **Extras** (`meeting-extras.js`).
7. **Audio is deleted after processing** (success or failure) — the relay never retains uploaded audio.

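Step 3 of the pipeline can be illustrated with a minimal sketch of average-linkage agglomerative clustering over cosine similarity. This is not the real `clusterSpeakers`: `clusterFingerprints` and its helpers are hypothetical names, the threshold here is a plain cosine value in the 0..1 range (how `relay_hardware_voice_clustering_threshold` maps onto it is an assumption), small-cluster suppression is left out, and labels only cover up to 26 clusters.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Average linkage: mean pairwise similarity between the members of two clusters.
function averageLinkage(clusterA, clusterB, sim) {
  let total = 0;
  for (const i of clusterA) for (const j of clusterB) total += sim[i][j];
  return total / (clusterA.length * clusterB.length);
}

// Hypothetical helper: repeatedly merge the most similar pair of clusters
// until no pair reaches `threshold`, then label clusters Speaker_A, Speaker_B, ...
function clusterFingerprints(fingerprints, threshold) {
  const sim = fingerprints.map((a) => fingerprints.map((b) => cosineSimilarity(a, b)));
  let clusters = fingerprints.map((_, i) => [i]);
  while (clusters.length > 1) {
    let best = { score: -Infinity, a: -1, b: -1 };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const score = averageLinkage(clusters[a], clusters[b], sim);
        if (score > best.score) best = { score, a, b };
      }
    }
    if (best.score < threshold) break; // nothing left that is similar enough
    const merged = clusters[best.a].concat(clusters[best.b]);
    clusters = clusters.filter((_, i) => i !== best.a && i !== best.b);
    clusters.push(merged);
  }
  // Order clusters by first appearance so labels are stable across runs.
  const labels = new Array(fingerprints.length);
  clusters
    .sort((x, y) => Math.min(...x) - Math.min(...y))
    .forEach((members, c) => {
      for (const i of members) labels[i] = `Speaker_${String.fromCharCode(65 + c)}`;
    });
  return labels;
}
```

With four toy 2-dim fingerprints `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` and a threshold of `0.8`, the first two and last two vectors end up in separate clusters. Raising the threshold close to `1` keeps all four apart, which mirrors how a higher threshold splits similar voices.
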
## Endpoints (server-side contract)

All routes mount in `server/index.js`. Public paths sit under `/relay/*`; operator paths under `/admin/*`.

### Auth model

- **`X-Recap-Operator-Key`** + **`X-Recap-User-Id`** → "cloud" path. The Recaps cloud server (`recaps.cc`) authenticates once with a shared operator key (`relay_cloud_operator_key`) and names the acting user. Credit pool keyed `user:<id>`, tier comes from the relay's stored row, NOT a per-user license. See `server/identity.js`.
- **`X-Recap-Install-Id`** (+ optional `Authorization: <license>`) → "license" path. Self-hosted installs and the operator's single-mode app. Credits/tier come from the resolved Keysat license + install id.
- **Admin session cookie** → `/admin/*`. Cookie issued by `POST /admin/login`; `/admin/login` and `/admin/status` are exempt inside `setupAdminAuthMiddleware`.
- **Webhook signature** → `POST /relay/btcpay/webhook` validates `BTCPay-Sig` against `relay_btcpay_webhook_secret`. Zaprite's webhook re-fetches the order through the Zaprite API to verify, so no shared-secret signing.
- **`X-Recap-Job-Id`** is a billing key, not auth: the first call with a given id charges one credit; later calls with the same id are free (so transcribe + analyze for one summary = one credit total).

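The header rules above can be sketched as two small pure functions. This is an illustration only, not the code in `server/identity.js`: `resolveIdentity` and `createJobLedger` are hypothetical names, the in-memory `Set` stands in for whatever store the relay really uses, and charging calls that carry no job id individually is an assumption.

```javascript
// Hypothetical sketch: classify a request by its headers (lower-cased keys,
// as Node exposes them). A real check should compare secrets in constant time.
function resolveIdentity(headers, operatorKey) {
  const key = headers['x-recap-operator-key'];
  const userId = headers['x-recap-user-id'];
  if (key !== undefined) {
    if (key !== operatorKey || !userId) {
      return { ok: false, reason: 'bad operator key or missing user id' };
    }
    // Cloud path: credit pool is keyed by the acting user.
    return { ok: true, path: 'cloud', pool: `user:${userId}` };
  }
  const installId = headers['x-recap-install-id'];
  if (installId) {
    // License path: the license header is optional.
    return { ok: true, path: 'license', installId, license: headers['authorization'] ?? null };
  }
  return { ok: false, reason: 'no identity headers' };
}

// Hypothetical charge-once ledger: the first call with a job id costs one
// credit, later calls with the same id in the same pool cost nothing.
function createJobLedger() {
  const seen = new Set();
  return {
    creditsFor(pool, jobId) {
      if (!jobId) return 1; // assumption: no job id means each call is charged
      const ledgerKey = `${pool}\u0000${jobId}`;
      if (seen.has(ledgerKey)) return 0;
      seen.add(ledgerKey);
      return 1;
    },
  };
}
```

A transcribe call followed by an analyze call with the same `X-Recap-Job-Id` would yield `1` and then `0` from `creditsFor`, matching the "one credit total" rule.
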
### `/relay/*` (public; per-call header auth)

- `GET /relay/health` — liveness; tolerates partial config. (`routes/health.js`)
- `GET /relay/policy` — `{ tiers, core_total_credits, core_gemini_credits }`; no auth. (`routes/policy.js`)
- `GET /relay/capabilities` — operator-wide feature flags (hardware ready, TTS backend choice, etc). `X-Recap-Install-Id` optional. (`routes/capabilities.js`)
- `GET /relay/balance` — caller's credit balance (`routes/balance.js`).
- `POST /relay/transcribe` — multipart audio → `{ text, segments, duration_seconds, model, ... }`. Body fields: `mime_type`, `title`, `channel`, `description`. (`routes/transcribe.js`)
- `POST /relay/transcribe-url` — async; `{ media_url, type, mime_type, title, channel, description, chapters }` → `{ job_id }` then poll `GET /relay/jobs/:id`. (`routes/transcribe-url.js`)
- `POST /relay/summarize-url` — async; same body shape, full transcribe+analyze pipeline → `{ job_id }` then stream `GET /relay/summarize-url/:jobId/events` (SSE). (`routes/summarize-url.js`)
- `POST /relay/analyze` — `{ transcript, … }` → topic sections JSON. (`routes/analyze.js`)
- `POST /relay/tts` — text → audio; gated by `capabilities.has_tts`. (`routes/tts.js`)
- `GET /relay/credits/packages`, `POST /relay/credits/buy`, `GET /relay/credits/invoice/:id` — à-la-carte credit purchase (BTCPay). (`routes/credits.js`)
- `POST /relay/btcpay/webhook` — BTCPay settle → either `extendUserTier` (subscription) or credit grant (à-la-carte). HMAC validated. (`routes/credits.js`)
- `POST /relay/zaprite/webhook` — Zaprite settle → `extendUserTier` only. Re-fetches order to verify. (`routes/zaprite-webhook.js`)

### `/relay/*` (operator-key only — cloud → relay control plane)

All require a valid `X-Recap-Operator-Key`. Defined in `routes/user-tier.js`.

- `POST /relay/user-tier` — `{ user_id, tier: "core"|"pro"|"max", expires_at? }` → sets the cloud user's stored tier (operator comp grants live here).
- `POST /relay/tier-invoice` — `{ user_id, tier: "pro"|"max", return_url }` → mints a BTCPay tier-purchase invoice (Lightning QR).
- `POST /relay/tier-zaprite-order` — same idea on the card rail.
- `GET /relay/tier-plans` — `{ ok, period_days, plans: [{tier, sats, fiat_amount, fiat_currency, credits_per_period}], card_available }`. `credits_per_period: null` → "Unlimited"; never hardcode this label.
- `GET /relay/expiring-subscriptions?within_days=7&lapsed_days=3` — `{ ok, now, subscriptions: [{user_id, tier, expires_at, expired, days_left}] }`. The Recaps server maps user_id → email and sends the reminder; the relay never sees email.
- `GET /relay/user-tier/:userId` — read the stored row.

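On the consumer side, the `credits_per_period: null` rule for `GET /relay/tier-plans` might be handled as in the sketch below. `creditsLabel` and `describePlans` are hypothetical helper names, and the sample payload used afterwards is made up for illustration; only the field names come from the response shape listed above.

```javascript
// Hypothetical helper: derive the label from the payload instead of
// tying "Unlimited" to a particular tier name.
function creditsLabel(plan) {
  return plan.credits_per_period === null
    ? 'Unlimited'
    : `${plan.credits_per_period} credits`;
}

// Hypothetical helper: one display line per plan in a /relay/tier-plans response.
function describePlans(response) {
  return response.plans.map(
    (plan) => `${plan.tier}: ${creditsLabel(plan)} per ${response.period_days} days`
  );
}
```

For an illustrative response with `period_days: 30` and two plans, one with `credits_per_period: 100` and one with `null`, `describePlans` returns one line ending in "100 credits per 30 days" and one ending in "Unlimited per 30 days". The design point is that the tier-to-label mapping stays on the relay, so changing a plan's allowance needs no client release.
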
### `/admin/*` (operator dashboard; cookie-gated)

`routes/admin.js`: `GET /admin/{usage,config,license-cache,hardware-queue,jobs,jobs-history,job-output/:id,job/:id/details,output-store-stats,output-store-ids,dashboard,dashboard.csv,settings}`, `POST /admin/{quotas,wipe-all,settings/promote-prompt}`, `PUT /admin/settings`, `DELETE /admin/job-outputs`. `routes/admin-test-run.js`: `POST /admin/{test-run,test-run-suite}`. BTCPay setup wizard under `/admin/btcpay/*` (`routes/btcpay-setup.js`).

### `/admin/internal-meetings/*` (cookie-gated; `routes/internal-meetings.js`)

- `POST /upload` — multipart audio; runs the full pipeline (chunk → diarize → cluster → analyze → polish → extras → save). Audio is deleted after.
- `GET /` → `{ meetings: [...] }`; `GET /:id` → full saved record (`rec`).
- `GET /:id/markdown`, `GET /:id/html`, `GET /:id/download` — exports.
- `GET /jobs/:id`, `GET /jobs/:id/stream` (SSE) — progress for a running upload.
- `PATCH /:id/speakers` — rename a cluster (display-name only).
- `PATCH /:id/entries` — per-line `speaker_override`.
- `PATCH /:id/merge-speakers` — fold cluster(s) into one (fixes one person split as two). Offline, no LLM.
- `POST /:id/recluster` — re-run clustering at a new threshold (fixes two people merged as one). Offline, uses `rec.diarization` fingerprints. Resets `speaker_names`, per-line overrides, and extras attributions. 400 if no fingerprints.
- `POST /:id/repolish` — re-runs `runSummaryPolish` with the CURRENT names (no re-inference). Synchronous; needs hardware analyze online; 400 if no named speakers.
- `DELETE /:id`.

### Cross-repo changes (sibling: `../recap`)

This repo and the Recaps app (`../recap`) share a live client/server contract — the
`/relay/*` endpoints, the `X-Recap-*` headers, request/response shapes, and tier/credit
semantics. **Before finishing any change that touches that boundary, check whether
`../recap` needs a matching change.** If you add/rename/remove an endpoint, alter a payload
shape or header, or shift tier/credit/billing behavior, update the consumer side too — and
reflect it in BOTH repos' `AGENTS.md` (the contract docs) and `ROADMAP.md` (if it's staged
work). Purely internal changes (diarization tuning, dashboard layout, packaging) don't need
this. When unsure whether a change is contract-affecting, assume it is and check.

## Conventions for this codebase specifically

- **A saved meeting record stores the per-chunk TitaNet fingerprints in `rec.diarization`.** Because the audio is gone, this is what makes re-clustering possible *offline* — no re-upload, no Spark Control round-trip.
- **Speaker labels live in FOUR places that every edit must keep in sync:** `rec.transcript_segments[].speaker`, `rec.chunks[].entries[].speaker` (+ `.speaker_override`), `rec.speakers` (per-cluster stats), and `rec.extras` (`tldr.primary_speakers`, `decisions[].agreed_by`, `action_items[].owner`, `key_quotes[].speaker`). Display names are a separate map: `rec.speaker_names`.
- **Over-merging (two people clustered as one) is tuned by `relay_hardware_voice_clustering_threshold`** (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs `relay_hardware_anchor_min_speaking_sec` / `relay_hardware_small_cluster_max_speaking_sec` / `relay_hardware_uncertain_margin_pct`. All operator-config-driven; never hardcode.
- **Post-hoc speaker-edit endpoints** (operator dashboard, added this session — `server/meeting-speaker-edits.js`):
  - `PATCH /admin/internal-meetings/:id/speakers` — rename a cluster (display name only; pre-existing).
  - `PATCH /admin/internal-meetings/:id/entries` — per-line `speaker_override` (pre-existing).
  - `PATCH /admin/internal-meetings/:id/merge-speakers` — fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
  - `POST /admin/internal-meetings/:id/recluster` — re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses `rec.diarization` fingerprints); **resets** `speaker_names`, per-line overrides, and extras attributions — operator re-labels afterward. 400 if no fingerprints saved.
  - `POST /admin/internal-meetings/:id/repolish` — re-run `runSummaryPolish` with the **current** names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
- **`make install` correctness**: see [Always](#always). Report honestly: a failing test or build is a failure. Comments explain WHY. Write tests alongside the code (`server/test/*.test.js`, `node --test`).

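The four-places rule lends itself to a consistency check in tests. The sketch below is hypothetical and makes several assumptions about shapes the conventions do not spell out: `rec.speakers` is treated as an object keyed by cluster label, extras fields hold cluster labels (as a string or an array of strings), and `findOrphanSpeakerLabels` is an invented name, not an existing helper in `meeting-speaker-edits.js`.

```javascript
// Hypothetical test helper: list labels referenced by segments, entries, or
// extras that have no matching key in rec.speakers (assumed keyed by label).
function findOrphanSpeakerLabels(rec) {
  const known = new Set(Object.keys(rec.speakers ?? {}));
  const used = new Set();
  // Accept a single label, an array of labels, or nothing at all.
  const note = (value) => {
    for (const label of [].concat(value ?? [])) used.add(label);
  };
  for (const seg of rec.transcript_segments ?? []) note(seg.speaker);
  for (const chunk of rec.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      note(entry.speaker);
      note(entry.speaker_override);
    }
  }
  const extras = rec.extras ?? {};
  note(extras.tldr?.primary_speakers);
  for (const decision of extras.decisions ?? []) note(decision.agreed_by);
  for (const item of extras.action_items ?? []) note(item.owner);
  for (const quote of extras.key_quotes ?? []) note(quote.speaker);
  return [...used].filter((label) => !known.has(label)).sort();
}
```

After a merge or recluster, a test could assert that this returns an empty array, or only labels that are deliberately absent from the stats. If the real record stores `rec.speakers` as an array or keeps display names in extras, the lookups would need adjusting.
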
## Always

- **Bump the version before EVERY `make install`** — StartOS dedupes sideloads by version string, so an unbumped reinstall (even one line changed) silently no-ops. `make bump` → `make x86` → `make install`. See memory `bump-before-install` (applies to this repo AND `../recap`).
- **Add new version files to BOTH the import block AND the `other:` list** in `startos/versions/index.ts`, and point `current:` at the new constant. `make bump` does this for you.
- **Build freely; ask before anything that leaves this machine.** `make x86` / `make install` (to the operator's own box) are fine. `make deploy` / `make redeploy` are NOT.
- **Reference env-var / config names, never values.** Relay secrets (operator key, Gemini key, SMTP, Zaprite, BTCPay) live in gitignored env; docs name them only.

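The bump-before-install rule could be guarded by a small preflight comparison. This is a minimal sketch, assuming version strings are plain dotted integers such as `0.2.124`; `needsBump` is a hypothetical helper, not part of this repo's Makefile or of the StartOS SDK, and it is stricter than the dedupe described above because it also rejects a lower version.

```javascript
// Hypothetical preflight: true when the package version is not strictly
// newer than what the box already runs, so an install would be pointless.
function needsBump(installedVersion, packageVersion) {
  const installed = installedVersion.split('.').map(Number);
  const candidate = packageVersion.split('.').map(Number);
  const length = Math.max(installed.length, candidate.length);
  for (let i = 0; i < length; i++) {
    const a = installed[i] ?? 0;
    const b = candidate[i] ?? 0;
    if (a !== b) return b < a; // first differing part decides
  }
  return true; // identical version string: StartOS would treat it as a no-op
}
```

Comparing parts numerically matters here: a plain string comparison would rank `0.2.9` above `0.2.10`.
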
## Never

- **Never `make deploy` / `make redeploy` / upload to the registry.** This package is private to the operator's box. (Memory: `feedback_relay_never_to_registry`.)
- **No "Co-Authored-By" / no "Claude" mentions** in commits or source.
- **Never edit a `startos/versions/<v>.ts` that's already been built/installed** — add a new version file.
- **Don't push to GitHub by default** — remote is self-hosted Gitea.

## Current state — box AND working tree at `0.2.124`; git is the gap

- **Box AND local working tree are both at relay `0.2.124`** (app at `0.2.155`). `startos/versions/index.ts` `current: v_0_2_124`; the StartOS dashboard reflects the same.
- **Version files `v0.2.117`–`v0.2.124` are present in the working tree** (untracked). A concurrent 2026-06-13 session continued from this session's 0.2.117, bumped through 0.2.124, and shipped to the box — re-read the tree before assuming what's there.
- **Post-hoc speaker tools are live**: `meeting-speaker-edits.js` (merge / recluster / repolish + backfill) and the matching `PATCH/POST /admin/internal-meetings/:id/{merge-speakers,recluster,repolish}` routes are present; the dashboard exposes the controls. Tests pass via `cd server && npm test`.
- **The real gap is git, not versions.** The last committed **code** is `b7f7590 v0.2.11 /relay/capabilities + /relay/transcribe-url`; the commit(s) stacked on top of it are docs-only (the AGENTS/ROADMAP consolidation). So everything from `v0.2.12` → `v0.2.124` — the entire internal-meetings feature, diarization, speaker-edit tools, billing, the user-tier control plane — is uncommitted. Working-tree counts: **28 modified, 150 untracked, 5 deleted (183 total)** as of this read. "Catching up git" = committing this tree (see ROADMAP).