Trim AGENTS.md; extract internal-meetings guide + lazy-load wiring

Keysat
2026-06-13 13:36:46 -05:00
parent 1243f4414c
commit fb11dd6a04
5 changed files with 89 additions and 22 deletions
+4 -20
@@ -27,7 +27,7 @@ Run from repo root unless noted.
- `make install` picks the **newest `*.s9pk` by mtime in the cwd** (`ls -t *.s9pk | head -1`) — it does NOT build. Always `make x86` after a change, and run from this repo's root (the shell cwd can drift to `../recap`, where install would grab the *app's* `.s9pk` instead).
- Host comes from the `host:` field in `~/.startos/config.yaml` (a `<relay-host>.local` mDNS name). Never edit that file without authorization.
## Directory layout (what this session touched / verified)
## Directory layout (key files)
```
server/
@@ -47,18 +47,9 @@ server/
public/dashboard.html operator dashboard (meetings detail view + speaker tools)
startos/versions/<vN>.ts one file per version + index.ts graph
docs/issues-backlog.md detailed issue log
docs/guides/internal-meetings.md diarization / speaker subsystem guide (path-scoped; lazy-loads via .claude/rules/)
```
## Internal-meetings pipeline (how speakers are produced)
1. **Chunk** audio into ~5-min pieces (`relay_hardware_tx_chunk_minutes`) with a few seconds overlap.
2. **Per-chunk diarize** at Spark Control `/api/audio/diarize-chunk`: **Sortformer** emits chunk-local labels (`Speaker_0/1`), **TitaNet** emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
3. **Cross-chunk cluster** (`speaker-clustering.js`, `clusterSpeakers`): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global `Speaker_A/B/…`. Then a **small-cluster suppression** pass folds brief clusters into anchors or `Speaker_Unknown`.
4. **Analyze** (windowed) → section `{title, summary, startIndex, endIndex}`.
5. **Polish** (`post-cluster-polish.js`): `runNameInference` infers real names from the transcript, then `runSummaryPolish` rewrites each section summary to attribute statements to those names.
6. **Extras** (`meeting-extras.js`).
7. **Audio is deleted after processing** (success or failure) — the relay never retains uploaded audio.
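
Step 3 can be sketched as follows. This is a minimal illustration of average-linkage agglomerative clustering over cosine similarity, not the code in `speaker-clustering.js`: the function names, the percent-style threshold (mirroring the 70→80 example for `relay_hardware_voice_clustering_threshold`), and the plain-array fingerprint format are all assumptions, and the small-cluster suppression pass is left out.

```javascript
// Hypothetical sketch: names and input shapes are assumptions, not the repo's API.

// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Average linkage: mean pairwise similarity between the members of two clusters.
function averageLinkage(c1, c2, sim) {
  let total = 0;
  for (const i of c1) for (const j of c2) total += sim[i][j];
  return total / (c1.length * c2.length);
}

// fingerprints: array of vectors, one per chunk-local speaker.
// thresholdPct: assumed to be a 0-100 similarity cutoff; raising it splits more.
// Returns one global label per fingerprint, in input order.
function clusterFingerprints(fingerprints, thresholdPct) {
  const threshold = thresholdPct / 100;
  const sim = fingerprints.map((a) => fingerprints.map((b) => cosine(a, b)));
  const clusters = fingerprints.map((_, i) => [i]);

  while (clusters.length > 1) {
    // Find the most similar pair of clusters.
    let best = { score: -Infinity, a: -1, b: -1 };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const score = averageLinkage(clusters[a], clusters[b], sim);
        if (score > best.score) best = { score, a, b };
      }
    }
    // Stop once even the closest pair falls below the threshold.
    if (best.score < threshold) break;
    clusters[best.a] = clusters[best.a].concat(clusters[best.b]);
    clusters.splice(best.b, 1);
  }

  // Letter labels only cover the first 26 clusters; enough for a sketch.
  const labels = new Array(fingerprints.length);
  clusters.forEach((members, k) => {
    const label = "Speaker_" + String.fromCharCode(65 + k);
    for (const i of members) labels[i] = label;
  });
  return labels;
}
```

With two near-identical pairs of vectors and a threshold of 70, the sketch returns two global labels; at a threshold of 100 nothing short of an exact match merges, which is the direction the "raise it to split similar voices" advice relies on.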
## Endpoints (server-side contract)
All routes mount in `server/index.js`. Public paths sit under `/relay/*`; operator paths under `/admin/*`.
@@ -127,15 +118,8 @@ this. When unsure whether a change is contract-affecting, assume it is and check
## Conventions for this codebase specifically
- **A saved meeting record stores the per-chunk TitaNet fingerprints in `rec.diarization`.** Because the audio is gone, this is what makes re-clustering possible *offline* — no re-upload, no Spark Control round-trip.
- **Speaker labels live in FOUR places that every edit must keep in sync:** `rec.transcript_segments[].speaker`, `rec.chunks[].entries[].speaker` (+ `.speaker_override`), `rec.speakers` (per-cluster stats), and `rec.extras` (`tldr.primary_speakers`, `decisions[].agreed_by`, `action_items[].owner`, `key_quotes[].speaker`). Display names are a separate map: `rec.speaker_names`.
- **Over-merging (two people clustered as one) is tuned by `relay_hardware_voice_clustering_threshold`** (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs `relay_hardware_anchor_min_speaking_sec` / `relay_hardware_small_cluster_max_speaking_sec` / `relay_hardware_uncertain_margin_pct`. All operator-config-driven; never hardcode.
- **Post-hoc speaker-edit endpoints** (operator dashboard, added this session — `server/meeting-speaker-edits.js`):
- `PATCH /admin/internal-meetings/:id/speakers` — rename a cluster (display name only; pre-existing).
- `PATCH /admin/internal-meetings/:id/entries` — per-line `speaker_override` (pre-existing).
- `PATCH /admin/internal-meetings/:id/merge-speakers` — fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
- `POST /admin/internal-meetings/:id/recluster` — re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses `rec.diarization` fingerprints); **resets** `speaker_names`, per-line overrides, and extras attributions — operator re-labels afterward. 400 if no fingerprints saved.
- `POST /admin/internal-meetings/:id/repolish` — re-run `runSummaryPolish` with the **current** names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
- **Before editing the internal-meetings / diarization / speaker subsystem, read `docs/guides/internal-meetings.md`** — the diarize→cluster→polish pipeline, the four-places speaker-label sync rule, the clustering-threshold knobs, and the post-hoc speaker-edit (merge / recluster / repolish) semantics live there. Scoped to `server/{speaker-clustering,post-cluster-polish,meeting-extras,meeting-speaker-edits,chunked-analyze}.js`, `server/routes/internal-meetings.js`, `server/backends/hardware.js`.
- **Doc layout**: `AGENTS.md` is canonical; `CLAUDE.md` is a symlink to it (don't overwrite it). Subsystem guides are real files in `docs/guides/<topic>.md` (with `paths:` frontmatter); `.claude/rules/<topic>.md` are relative symlinks into them (`.gitignore` carves out `!.claude/rules/` so the symlinks commit). New guide = add `docs/guides/<topic>.md`, symlink it from `.claude/rules/`, add an index line above.
- **`make install` correctness**: see [Always]. Honest reports; failing test/build is a failure. Comments explain WHY. Write tests alongside (`server/test/*.test.js`, `node --test`).
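
The four-places sync rule above can be sketched as a single relabel pass. This is a hypothetical helper, not the code in `server/meeting-speaker-edits.js`. The field names come from the rule itself, but the shapes are assumptions: `rec.speakers` is treated as an object keyed by label whose numeric stats are summed, and `primary_speakers` / `agreed_by` are treated as arrays of labels. `rec.speaker_names` is a separate display-name map and is left untouched here.

```javascript
// Hypothetical sketch: relabel every occurrence of `from` as `to` in a meeting record.
function relabelSpeaker(rec, from, to) {
  const swap = (v) => (v === from ? to : v);
  const dedupe = (arr) => [...new Set(arr)];

  // 1. Flat transcript segments.
  for (const seg of rec.transcript_segments ?? []) {
    seg.speaker = swap(seg.speaker);
  }

  // 2. Per-chunk entries, including any per-line override.
  for (const chunk of rec.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      entry.speaker = swap(entry.speaker);
      if (entry.speaker_override != null) {
        entry.speaker_override = swap(entry.speaker_override);
      }
    }
  }

  // 3. Per-cluster stats. ASSUMPTION: object keyed by label; numeric fields are summed.
  const stats = rec.speakers ?? {};
  if (stats[from]) {
    const merged = { ...(stats[to] ?? {}) };
    for (const [key, value] of Object.entries(stats[from])) {
      if (typeof value === "number") merged[key] = (merged[key] ?? 0) + value;
    }
    stats[to] = merged;
    delete stats[from];
  }

  // 4. Extras attributions. ASSUMPTION: the list-valued fields are arrays of labels.
  const extras = rec.extras ?? {};
  if (Array.isArray(extras.tldr?.primary_speakers)) {
    extras.tldr.primary_speakers = dedupe(extras.tldr.primary_speakers.map(swap));
  }
  for (const decision of extras.decisions ?? []) {
    if (Array.isArray(decision.agreed_by)) {
      decision.agreed_by = dedupe(decision.agreed_by.map(swap));
    }
  }
  for (const item of extras.action_items ?? []) {
    if ("owner" in item) item.owner = swap(item.owner);
  }
  for (const quote of extras.key_quotes ?? []) {
    quote.speaker = swap(quote.speaker);
  }

  return rec;
}
```

Keeping all four updates in one function makes it harder for an edit path to touch the transcript while leaving `rec.extras` pointing at a label that no longer exists.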
## Always