Internal-meetings / diarization / speaker subsystem

Subsystem guide for the internal-meetings feature: the upload → transcribe → diarize → cluster → analyze → polish pipeline, and the post-hoc speaker-edit tools the operator dashboard exposes. Whole-repo facts (stack, commands, endpoint contract, tier/billing) live in ../../AGENTS.md; this file lazy-loads when you edit the files it's scoped to.

Pipeline (how speakers are produced)

Chunk the audio into ~5-minute pieces (relay_hardware_tx_chunk_minutes), with a few seconds of overlap between consecutive chunks.
Per-chunk diarize at Spark Control /api/audio/diarize-chunk: Sortformer emits chunk-local labels (Speaker_0/1), TitaNet emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
Cross-chunk cluster (speaker-clustering.js, clusterSpeakers): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global Speaker_A/B/…. Then a small-cluster suppression pass folds brief clusters into anchors or Speaker_Unknown.
Analyze (windowed, chunked-analyze.js) → sections of the form {title, summary, startIndex, endIndex}.
Polish (post-cluster-polish.js): runNameInference infers real names from the transcript, then runSummaryPolish rewrites each section summary to attribute statements to those names.
Extras (meeting-extras.js): decisions / action items / open questions / key quotes.
Audio is deleted after processing (success or failure) — the relay never retains uploaded audio.
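
The cross-chunk clustering step can be illustrated with a minimal sketch. This is not the code in speaker-clustering.js: clusterFingerprintsSketch and its helpers are hypothetical names, the threshold is assumed to be a percentage (70 meaning a cosine similarity of 0.70), and the small-cluster suppression pass is left out entirely.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Average linkage: mean pairwise similarity between the members of two clusters.
function averageLinkage(clusterA, clusterB, fingerprints) {
  let total = 0;
  for (const i of clusterA) {
    for (const j of clusterB) {
      total += cosineSimilarity(fingerprints[i], fingerprints[j]);
    }
  }
  return total / (clusterA.length * clusterB.length);
}

// HYPOTHETICAL helper, not the real clusterSpeakers.
// ASSUMPTION: thresholdPct is a percentage (70 => 0.70 cosine similarity).
// Returns one global label per input fingerprint.
function clusterFingerprintsSketch(fingerprints, thresholdPct) {
  const threshold = thresholdPct / 100;
  // Start with every fingerprint in its own cluster (stored as index lists).
  const clusters = fingerprints.map((_, i) => [i]);
  while (clusters.length > 1) {
    // Find the most similar pair of clusters.
    let best = { sim: -Infinity, a: -1, b: -1 };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const sim = averageLinkage(clusters[a], clusters[b], fingerprints);
        if (sim > best.sim) best = { sim, a, b };
      }
    }
    // Stop once even the closest pair is below the threshold.
    if (best.sim < threshold) break;
    clusters[best.a] = clusters[best.a].concat(clusters[best.b]);
    clusters.splice(best.b, 1);
  }
  // Label clusters Speaker_A, Speaker_B, ... (sketch: at most 26 clusters).
  const labels = new Array(fingerprints.length);
  clusters.forEach((members, k) => {
    const label = "Speaker_" + String.fromCharCode(65 + k);
    for (const i of members) labels[i] = label;
  });
  return labels;
}
```

In this sketch, two fingerprints with cosine similarity 0.8, such as [1, 0] and [0.8, 0.6], share one label at a threshold of 70 and get separate labels at 90. Raising the threshold therefore produces more, smaller clusters.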

Conventions

A saved meeting record stores the per-chunk TitaNet fingerprints in rec.diarization. Because the audio is gone, this is what makes re-clustering possible offline — no re-upload, no Spark Control round-trip.
Speaker labels live in FOUR places that every edit must keep in sync: rec.transcript_segments[].speaker, rec.chunks[].entries[].speaker (+ .speaker_override), rec.speakers (per-cluster stats), and rec.extras (tldr.primary_speakers, decisions[].agreed_by, action_items[].owner, key_quotes[].speaker). Display names are a separate map: rec.speaker_names.
Over-merging (two people clustered as one) is tuned by relay_hardware_voice_clustering_threshold (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs relay_hardware_anchor_min_speaking_sec / relay_hardware_small_cluster_max_speaking_sec / relay_hardware_uncertain_margin_pct. All of these are operator-config-driven; never hardcode them.
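
To make the four-way sync rule concrete, here is a sketch of a consistency check that flags labels referenced in segments, chunk entries, or extras but missing from rec.speakers. The helper names are hypothetical, and several shapes are assumptions: rec.speakers is taken to be an object keyed by cluster label, and the extras fields may hold either a single label or a list. speaker_override is skipped because its value format is not specified here, and a label such as Speaker_Unknown may need to be allowed explicitly if it carries no stats.

```javascript
// Normalise a field that may hold one label or a list of labels.
function asLabelList(value) {
  if (value == null) return [];
  return Array.isArray(value) ? value : [value];
}

// HYPOTHETICAL helper: gathers every speaker label a record references,
// grouped by the location it was found in.
function collectSpeakerLabels(rec) {
  const extras = rec.extras || {};
  const found = {
    transcript_segments: new Set(),
    chunk_entries: new Set(),
    // ASSUMPTION: rec.speakers is an object keyed by cluster label.
    speakers: new Set(Object.keys(rec.speakers || {})),
    extras: new Set(),
  };
  for (const seg of rec.transcript_segments || []) {
    found.transcript_segments.add(seg.speaker);
  }
  for (const chunk of rec.chunks || []) {
    for (const entry of chunk.entries || []) {
      found.chunk_entries.add(entry.speaker);
    }
  }
  const tldr = extras.tldr || {};
  const extraValues = [
    tldr.primary_speakers,
    ...(extras.decisions || []).map((d) => d.agreed_by),
    ...(extras.action_items || []).map((a) => a.owner),
    ...(extras.key_quotes || []).map((q) => q.speaker),
  ];
  for (const value of extraValues) {
    for (const label of asLabelList(value)) found.extras.add(label);
  }
  return found;
}

// Returns [{ where, label }] for every label used outside rec.speakers
// that has no matching entry in rec.speakers.
function findUnsyncedLabels(rec) {
  const found = collectSpeakerLabels(rec);
  const stale = [];
  for (const where of ["transcript_segments", "chunk_entries", "extras"]) {
    for (const label of found[where]) {
      if (!found.speakers.has(label)) stale.push({ where, label });
    }
  }
  return stale;
}
```

A check like this could run in tests after each edit operation: an empty result means no location references a cluster that rec.speakers no longer knows about.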

Post-hoc speaker-edit endpoints (`server/meeting-speaker-edits.js`)

Operator-dashboard edits to a saved record, mounted under /admin/internal-meetings/:id/* (routing in server/routes/internal-meetings.js). Every edit must keep the four label locations above in sync.

PATCH /admin/internal-meetings/:id/speakers — rename a cluster (display name only; pre-existing).
PATCH /admin/internal-meetings/:id/entries — per-line speaker_override (pre-existing).
PATCH /admin/internal-meetings/:id/merge-speakers — fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
POST /admin/internal-meetings/:id/recluster — re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses rec.diarization fingerprints); resets speaker_names, per-line overrides, and extras attributions — operator re-labels afterward. 400 if no fingerprints saved.
POST /admin/internal-meetings/:id/repolish — re-run runSummaryPolish with the current names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
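
The two 400 conditions above (recluster without saved fingerprints, repolish without named speakers) can be sketched as a pure precondition check. This is an illustration only: precheckSpeakerEdit is a hypothetical name, rec.diarization is assumed to be an array with one item per chunk, rec.speaker_names is assumed to map labels to name strings, and the error strings are placeholders rather than the real response bodies.

```javascript
// HYPOTHETICAL precondition check. Returns null when the edit may proceed,
// otherwise an object describing the 400 response.
function precheckSpeakerEdit(rec, action) {
  if (action === "recluster") {
    // ASSUMPTION: rec.diarization is an array with one item per chunk.
    const hasFingerprints =
      Array.isArray(rec.diarization) && rec.diarization.length > 0;
    if (!hasFingerprints) {
      return { status: 400, error: "no fingerprints saved" }; // placeholder text
    }
  }
  if (action === "repolish") {
    // ASSUMPTION: rec.speaker_names maps cluster label -> display name string.
    const names = Object.values(rec.speaker_names || {});
    const hasNamedSpeaker = names.some(
      (name) => typeof name === "string" && name.trim() !== ""
    );
    if (!hasNamedSpeaker) {
      return { status: 400, error: "no named speakers" }; // placeholder text
    }
  }
  return null;
}
```

Keeping the check free of I/O means it can be unit-tested with node --test without the analyze hardware online.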

Test coverage: server/test/speaker-clustering.test.js, server/test/meeting-speaker-edits.test.js, server/test/polish-speaker-labels.test.js (node --test).
