# AGENTS.md — Recap Relay
Operator-side, credit-metered service that sits in front of Gemini and the operator's local AI hardware ("Spark Control": Parakeet ASR, Sortformer diarization, TitaNet voice embeddings, a vLLM/Gemma analyze endpoint). The Recaps app (`../recap`) is the client; this repo owns transcription/diarization/analysis routing, the cloud Pro/Max tier + expiry, self-serve billing settlement, and the **internal-meetings** feature (upload audio → transcribe → diarize → cluster → analyze → polish → operator dashboard). **Private. Ships to the operator's own Start9 box via `make install` only — NEVER to the public registry.**
## Stack
- **Server**: Node.js (`type: module`, ES modules). The dev box runs Node `v25.6.1` (the same box as the app); the container runtime uses whatever Node version the `Dockerfile` pins.
- **HTTP**: `express` + `multer` (audio upload). Admin routes under `/admin/*` behind an admin-session-cookie gate; relay-to-relay routes under `/relay/*` behind the operator key.
- **Dashboard**: `public/dashboard.html` — single-file vanilla JS, render-string-into-innerHTML, same shape as the app's `index.html`.
- **Packaging**: `@start9labs/start-sdk` under `startos/` — version graph at `startos/versions/index.ts`.
- **Storage**: filesystem under the StartOS data dir (`/data`). Internal meetings persist as `/data/internal-meetings/<id>.json`. No SQLite here.
- **Upstreams**: Gemini (`@google/genai`); operator hardware via "Spark Control" HTTP (Parakeet transcribe, `/api/audio/diarize-chunk` for Sortformer+TitaNet, a vLLM/Gemma OpenAI-shape analyze endpoint).
## Commands
Run from repo root unless noted.
| Action | Command |
|---|---|
| Run all tests | `cd server && npm test` (built-in `node --test`) |
| Run one test file | `cd server && node --test test/<file>.test.js` |
| Build `.s9pk` (x86) | `make x86` |
| Bump version (interactive) | `make bump` |
| Install to operator's Start9 box | `make install` *(bump FIRST — see Always)* |
| Deploy to registry | `make deploy` / `make redeploy`: **NEVER run these here** (private package) |
- `make install` picks the **newest `*.s9pk` by mtime in the cwd** (`ls -t *.s9pk | head -1`) — it does NOT build. Always `make x86` after a change, and run from this repo's root (the shell cwd can drift to `../recap`, where install would grab the *app's* `.s9pk` instead).
- Host comes from the `host:` field in `~/.startos/config.yaml` (a `<relay-host>.local` mDNS name). Never edit that file without authorization.
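Because `make install` ships whatever `*.s9pk` is newest in the cwd, a quick preflight can show which package it would pick before you run it. The sketch below is hypothetical (`pickNewestS9pk` and `newestS9pk` are not in the repo); it approximates `ls -t *.s9pk | head -1` by comparing file mtimes and keeps the selection logic pure so it is easy to test.

```javascript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Pure part: from { name, mtimeMs } entries, return the newest `.s9pk` name,
// approximating `ls -t *.s9pk | head -1`; null when there is no package.
function pickNewestS9pk(entries) {
  let best = null;
  for (const entry of entries) {
    if (!entry.name.endsWith(".s9pk")) continue; // ignore everything but packages
    if (best === null || entry.mtimeMs > best.mtimeMs) best = entry;
  }
  return best === null ? null : best.name;
}

// I/O part: list the directory `make install` would run in (default: cwd).
function newestS9pk(dir = process.cwd()) {
  const entries = readdirSync(dir)
    .filter((name) => name.endsWith(".s9pk"))
    .map((name) => ({ name, mtimeMs: statSync(join(dir, name)).mtimeMs }));
  return pickNewestS9pk(entries);
}
```

Run from `../recap`, `newestS9pk()` would return the app's package, which is exactly the cwd drift the bullets above warn about.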
## Directory layout (what this session touched / verified)
```
server/
routes/internal-meetings.js upload → pipeline → save; the /admin/internal-meetings/* API,
including the post-hoc speaker-edit + download endpoints
speaker-clustering.js cross-chunk voice clustering (agglomerative, cosine sim) +
assignSpeakersToSegments + small-cluster suppression
post-cluster-polish.js Stage 1 runNameInference + Stage 2 runSummaryPolish (per-window)
meeting-extras.js decisions / action items / open questions / key quotes extraction
meeting-speaker-edits.js post-hoc record edits: mergeSpeakersInRecord,
reclusterMeetingRecord, applyPolishedSummaries, backfillEntrySpeakers
backends/hardware.js Parakeet transcribe + /api/audio/diarize-chunk + chunking + vLLM analyze
chunked-analyze.js windowed analyze (planWindowsByDuration, runPipelinedAnalysis, …)
config.js getConfigSnapshot() + relay_* config defaults
hardware-config.js resolveHardwareConfig() → Spark Control endpoint discovery
test/ node --test files (speaker-clustering, meeting-speaker-edits, credits)
public/dashboard.html operator dashboard (meetings detail view + speaker tools)
startos/versions/<vN>.ts one file per version + index.ts graph
docs/issues-backlog.md detailed issue log
```
## Internal-meetings pipeline (how speakers are produced)
1. **Chunk** audio into ~5-min pieces (`relay_hardware_tx_chunk_minutes`) with a few seconds of overlap.
2. **Per-chunk diarize** at Spark Control `/api/audio/diarize-chunk`: **Sortformer** emits chunk-local labels (`Speaker_0/1`), **TitaNet** emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
3. **Cross-chunk cluster** (`speaker-clustering.js`, `clusterSpeakers`): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global `Speaker_A/B/…`. Then a **small-cluster suppression** pass folds brief clusters into anchors or `Speaker_Unknown`.
4. **Analyze** (windowed) → sections shaped `{title, summary, startIndex, endIndex}`.
5. **Polish** (`post-cluster-polish.js`): `runNameInference` infers real names from the transcript, then `runSummaryPolish` rewrites each section summary to attribute statements to those names.
6. **Extras** (`meeting-extras.js`): decisions, action items, open questions, and key quotes.
7. **Audio is deleted after processing** (success or failure) — the relay never retains uploaded audio.
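Step 1's overlap arithmetic, as a minimal sketch. `planChunks` is a hypothetical helper (the real chunking lives in `backends/hardware.js` and may differ), `chunkMinutes` stands in for `relay_hardware_tx_chunk_minutes`, and the 5-second overlap used below is an assumed value. Each chunk after the first starts `overlapSec` before the previous one ends.

```javascript
// Hypothetical sketch of step 1: fixed-length chunks that overlap by a few
// seconds (assumed purpose: speech cut at a boundary appears whole in one chunk).
function planChunks(durationSec, chunkMinutes, overlapSec) {
  const chunkSec = chunkMinutes * 60;
  if (!(chunkSec > overlapSec) || overlapSec < 0) {
    throw new RangeError("chunk length must exceed a non-negative overlap");
  }
  if (durationSec <= 0) return []; // nothing to transcribe
  const chunks = [];
  for (let start = 0; ; start += chunkSec - overlapSec) {
    const end = Math.min(start + chunkSec, durationSec);
    chunks.push({ index: chunks.length, startSec: start, endSec: end });
    if (end >= durationSec) break; // this chunk reaches the end of the audio
  }
  return chunks;
}
```

With a 12-minute recording, 5-minute chunks, and 5 s of overlap, `planChunks(720, 5, 5)` yields chunks spanning `[0, 300]`, `[295, 595]`, and `[590, 720]` seconds.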
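Step 3 can be sketched as plain average-linkage agglomerative clustering over cosine similarity. This is an illustration, not the code in `speaker-clustering.js`: the function names are hypothetical, small-cluster suppression is omitted, and it assumes the percentage in `relay_hardware_voice_clustering_threshold` maps to a cosine cutoff (70 → 0.70). The naive loop recomputes every pair average each round, which keeps it short rather than fast.

```javascript
// Hypothetical sketch of step 3; NOT the code in speaker-clustering.js.
// Cosine similarity of two equal-length vectors (e.g. 192-dim TitaNet fingerprints).
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Average linkage: start with one cluster per fingerprint, repeatedly merge the
// pair with the highest mean pairwise similarity, stop when no pair reaches
// `threshold` (ASSUMED: config percent / 100, e.g. 70 -> 0.70).
function clusterFingerprints(vectors, threshold) {
  const clusters = vectors.map((_, i) => [i]);
  for (;;) {
    let best = null;
    for (let x = 0; x < clusters.length; x++) {
      for (let y = x + 1; y < clusters.length; y++) {
        let sum = 0;
        for (const i of clusters[x]) {
          for (const j of clusters[y]) sum += cosine(vectors[i], vectors[j]);
        }
        const avg = sum / (clusters[x].length * clusters[y].length);
        if (best === null || avg > best.avg) best = { x, y, avg };
      }
    }
    if (best === null || best.avg < threshold) break; // nothing similar enough left
    clusters[best.x].push(...clusters[best.y]);
    clusters.splice(best.y, 1); // y > x, so index x stays valid
  }
  // Global labels ordered by each cluster's lowest fingerprint index;
  // this sketch assumes at most 26 clusters.
  const labels = new Array(vectors.length);
  clusters
    .sort((p, q) => Math.min(...p) - Math.min(...q))
    .forEach((members, k) => {
      for (const i of members) labels[i] = `Speaker_${String.fromCharCode(65 + k)}`;
    });
  return labels;
}
```

For instance, with fingerprints `[1, 0]`, `[1, 0]`, and `[3, Math.sqrt(7)]` (cosine 0.75 to the first two), a 0.70 cutoff labels all three `Speaker_A`, while 0.80 gives `Speaker_A, Speaker_A, Speaker_B`: the "raise it to split similar voices" effect.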
## Conventions for this codebase specifically
- **A saved meeting record stores the per-chunk TitaNet fingerprints in `rec.diarization`.** Because the audio is gone, this is what makes re-clustering possible *offline* — no re-upload, no Spark Control round-trip.
- **Speaker labels live in FOUR places that every edit must keep in sync:** `rec.transcript_segments[].speaker`, `rec.chunks[].entries[].speaker` (+ `.speaker_override`), `rec.speakers` (per-cluster stats), and `rec.extras` (`tldr.primary_speakers`, `decisions[].agreed_by`, `action_items[].owner`, `key_quotes[].speaker`). Display names are a separate map: `rec.speaker_names`.
- **Over-merging (two people clustered as one) is tuned by `relay_hardware_voice_clustering_threshold`** (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs `relay_hardware_anchor_min_speaking_sec` / `relay_hardware_small_cluster_max_speaking_sec` / `relay_hardware_uncertain_margin_pct`. All operator-config-driven; never hardcode.
- **Post-hoc speaker-edit endpoints** (operator dashboard; added this session, logic in `server/meeting-speaker-edits.js`):
  - `PATCH /admin/internal-meetings/:id/speakers`: rename a cluster (display name only; pre-existing).
  - `PATCH /admin/internal-meetings/:id/entries`: per-line `speaker_override` (pre-existing).
  - `PATCH /admin/internal-meetings/:id/merge-speakers`: fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
  - `POST /admin/internal-meetings/:id/recluster`: re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses the `rec.diarization` fingerprints); **resets** `speaker_names`, per-line overrides, and extras attributions, so the operator re-labels afterward. Returns 400 if no fingerprints were saved.
  - `POST /admin/internal-meetings/:id/repolish`: re-run `runSummaryPolish` with the **current** names (no re-inference) so topic summaries re-attribute after a rename or merge. The ONLY LLM-backed edit; needs the analyze hardware online; returns 400 if no speakers are named.
- **`make install` correctness**: see the **Always** section. Report honestly: a failing test or build is a failure. Comments explain WHY. Write tests alongside the code (`server/test/*.test.js`, run with `node --test`).
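To make the four-places rule concrete, here is a hypothetical merge helper (`mergeLabels` is not the repo's `mergeSpeakersInRecord`). The shapes of `rec.speakers` (assumed `{ [label]: { speaking_sec } }`) and `decisions[].agreed_by` (assumed an array of labels) are guesses; check a real saved record before borrowing any of it. It works on a `structuredClone`, so the input record is left untouched.

```javascript
// Hypothetical sketch, NOT the repo's mergeSpeakersInRecord. Folds the `from`
// cluster labels into `into` in all four places a label is stored, plus the
// separate display-name map. Returns a new record; the input is not mutated.
function mergeLabels(rec, from, into) {
  const out = structuredClone(rec);
  const fromSet = new Set(from.filter((label) => label !== into));
  const map = (label) => (fromSet.has(label) ? into : label);
  const dedupe = (labels) => [...new Set(labels.map(map))];

  // 1. Transcript segments.
  for (const seg of out.transcript_segments ?? []) seg.speaker = map(seg.speaker);

  // 2. Per-chunk entries, including any per-line override.
  for (const chunk of out.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      entry.speaker = map(entry.speaker);
      if (entry.speaker_override) entry.speaker_override = map(entry.speaker_override);
    }
  }

  // 3. Per-cluster stats. ASSUMED shape: { [label]: { speaking_sec } }.
  const stats = out.speakers ?? {};
  for (const label of fromSet) {
    if (!stats[label]) continue;
    stats[into] ??= { speaking_sec: 0 };
    stats[into].speaking_sec = (stats[into].speaking_sec ?? 0) + (stats[label].speaking_sec ?? 0);
    delete stats[label];
  }

  // 4. Extras attributions. ASSUMED: agreed_by is an array of labels.
  const ex = out.extras ?? {};
  if (Array.isArray(ex.tldr?.primary_speakers)) {
    ex.tldr.primary_speakers = dedupe(ex.tldr.primary_speakers);
  }
  for (const d of ex.decisions ?? []) {
    if (Array.isArray(d.agreed_by)) d.agreed_by = dedupe(d.agreed_by);
  }
  for (const a of ex.action_items ?? []) if (a.owner) a.owner = map(a.owner);
  for (const q of ex.key_quotes ?? []) if (q.speaker) q.speaker = map(q.speaker);

  // Display names live apart: keep the target's name; adopt a merged-away
  // name only if the target has none (a policy choice for this sketch).
  const names = out.speaker_names ?? {};
  for (const label of fromSet) {
    if (names[into] === undefined && names[label] !== undefined) names[into] = names[label];
    delete names[label];
  }
  return out;
}
```

Every location is a plain label rewrite here, which is why a merge can stay offline and LLM-free; only the topic summaries need `repolish` afterward.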
## Always
- **Bump the version before EVERY `make install`.** StartOS dedupes sideloads by version string, so an unbumped reinstall (even with only one line changed) silently no-ops. Sequence: `make bump` → `make x86` → `make install`. See memory `bump-before-install` (applies to this repo AND `../recap`).
- **Add new version files to BOTH the import block AND the `other:` list** in `startos/versions/index.ts`, and point `current:` at the new constant. `make bump` does this for you.
- **Build freely; ask before anything that leaves this machine.** `make x86` / `make install` (to the operator's own box) are fine. `make deploy` / `make redeploy` are NOT.
- **Reference env-var / config names, never values.** Relay secrets (operator key, Gemini key, SMTP, Zaprite, BTCPay) live in gitignored env; docs name them only.
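A hypothetical preflight for the bump rule (`compareVersions` and `assertBumped` are illustrative names, not repo code). It refuses an install whose version is not strictly newer than what is on the box: rejecting an equal version covers the documented silent no-op, and rejecting a lower one is an extra guard added in this sketch. It assumes plain dotted numeric versions like `0.2.124` and compares them numerically, because a string comparison would rank `0.2.99` above `0.2.124`.

```javascript
// Numeric comparison of dotted versions: returns -1, 0, or 1.
// Missing parts count as 0, so "0.2" equals "0.2.0".
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Throw unless `candidate` is strictly newer than `installed`.
function assertBumped(installed, candidate) {
  if (compareVersions(candidate, installed) <= 0) {
    throw new Error(`version ${candidate} is not newer than installed ${installed}; run make bump first`);
  }
}
```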
## Never
- **Never `make deploy` / `make redeploy` / upload to the registry.** This package is private to the operator's box. (Memory: `feedback_relay_never_to_registry`.)
- **No "Co-Authored-By" / no "Claude" mentions** in commits or source.
- **Never edit a `startos/versions/<v>.ts` that's already been built/installed** — add a new version file.
- **Don't push to GitHub by default** — remote is self-hosted Gitea.
## Current state (2026-06-13) — at `0.2.124`; only git commits lag
- **Box AND local working tree are both at relay `0.2.124`** (app `0.2.155`). Confirmed on the StartOS UI (version + the Merge/Re-polish controls visible on the dashboard).
- **The version files `v0.2.117` through `v0.2.124` are all in this working tree** (untracked). v0.2.124's note is a billing change ("tier Bitcoin invoices return the Lightning BOLT11 + per-period credit allotment"). A **concurrent chat session** on 2026-06-13 continued from this session's 0.2.117, bumped through 0.2.124, and built and installed it to the box, so the working tree matches the box. (Heads-up: more than one session may be editing this tree; re-read before assuming.)
- **The post-hoc speaker tools are present and live**: `meeting-speaker-edits.js` (merge/recluster/repolish + backfill) and the matching `/admin/internal-meetings/:id/{merge-speakers,recluster,repolish}` routes; the dashboard shows the controls. `npm test` passes (32 tests).
- **The real gap is git, not versions.** Committed HEAD is `v0.2.11`; everything since — v0.2.12→v0.2.124, the entire internal-meetings feature, diarization, speaker-edit tools, billing — is **uncommitted** (≈28 modified + 153 untracked). "Catching up local git" = committing this large working tree (see ROADMAP). The 0.2.117 this session installed was superseded by the concurrent 0.2.124 — **no box downgrade occurred.**