Internal-meetings / diarization / speaker subsystem
Subsystem guide for the internal-meetings feature: the upload → transcribe → diarize →
cluster → analyze → polish pipeline, and the post-hoc speaker-edit tools the operator
dashboard exposes. Whole-repo facts (stack, commands, endpoint contract, tier/billing)
live in ../../AGENTS.md; this file lazy-loads when you edit the files it's scoped to.
Pipeline (how speakers are produced)
- Chunk audio into ~5-min pieces (relay_hardware_tx_chunk_minutes) with a few seconds overlap.
- Per-chunk diarize at Spark Control /api/audio/diarize-chunk: Sortformer emits chunk-local labels (Speaker_0/1), TitaNet emits a 192-dim voice fingerprint per local speaker. Labels are meaningless across chunks; fingerprints are not.
- Cross-chunk cluster (speaker-clustering.js, clusterSpeakers): average-linkage agglomerative clustering over all fingerprints by cosine similarity → global Speaker_A/B/…. Then a small-cluster suppression pass folds brief clusters into anchors or Speaker_Unknown.
- Analyze (windowed, chunked-analyze.js) → section {title, summary, startIndex, endIndex}.
- Polish (post-cluster-polish.js): runNameInference infers real names from the transcript, then runSummaryPolish rewrites each section summary to attribute statements to those names.
- Extras (meeting-extras.js): decisions / action items / open questions / key quotes.
- Audio is deleted after processing (success or failure); the relay never retains uploaded audio.
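The cross-chunk clustering step can be sketched as below. This is a minimal illustration of average-linkage agglomerative clustering by cosine similarity, not the code in speaker-clustering.js: the function name clusterFingerprints, the item shape { chunk, local, fingerprint }, and the threshold expressed as a cosine value in [0, 1] are all assumptions made for the example. The small-cluster suppression pass is not modeled.

```javascript
// Sketch only: NOT the repo's clusterSpeakers. All names and shapes here
// are hypothetical and chosen for illustration.

// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Average linkage: mean pairwise similarity between members of two clusters.
function averageLinkage(c1, c2, sim) {
  let total = 0;
  for (const i of c1) for (const j of c2) total += sim[i][j];
  return total / (c1.length * c2.length);
}

// items: [{ chunk, local, fingerprint }] (assumed shape).
// threshold: minimum average cosine similarity required to merge (assumed scale 0..1).
// Returns a map from "chunk:local" to a global label. The sketch only
// produces sensible labels for up to 26 clusters.
function clusterFingerprints(items, threshold) {
  const sim = items.map((x) => items.map((y) => cosine(x.fingerprint, y.fingerprint)));
  const clusters = items.map((_, i) => [i]);

  // Repeatedly merge the most similar pair until no pair clears the threshold.
  while (clusters.length > 1) {
    let best = { score: -Infinity, a: -1, b: -1 };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const score = averageLinkage(clusters[a], clusters[b], sim);
        if (score > best.score) best = { score, a, b };
      }
    }
    if (best.score < threshold) break;
    clusters[best.a] = clusters[best.a].concat(clusters[best.b]);
    clusters.splice(best.b, 1);
  }

  // Chunk-local labels become global ones: Speaker_A, Speaker_B, ...
  const labels = {};
  clusters.forEach((members, k) => {
    const label = "Speaker_" + String.fromCharCode(65 + k);
    for (const i of members) labels[`${items[i].chunk}:${items[i].local}`] = label;
  });
  return labels;
}
```

In this sketch, raising the threshold makes merges harder and so splits similar voices apart, which matches the direction of the tuning advice for over-merged speakers. How the operator-config value maps onto a cosine threshold is not shown here.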
Conventions
- A saved meeting record stores the per-chunk TitaNet fingerprints in rec.diarization. Because the audio is gone, this is what makes re-clustering possible offline: no re-upload, no Spark Control round-trip.
- Speaker labels live in FOUR places that every edit must keep in sync: rec.transcript_segments[].speaker, rec.chunks[].entries[].speaker (+ .speaker_override), rec.speakers (per-cluster stats), and rec.extras (tldr.primary_speakers, decisions[].agreed_by, action_items[].owner, key_quotes[].speaker). Display names are a separate map: rec.speaker_names.
- Over-merging (two people clustered as one) is tuned by relay_hardware_voice_clustering_threshold (raise it, e.g. 70→80, to split similar voices) plus the suppression knobs relay_hardware_anchor_min_speaking_sec / relay_hardware_small_cluster_max_speaking_sec / relay_hardware_uncertain_margin_pct. All operator-config-driven; never hardcode.
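To make the four-places rule concrete, here is a hedged sketch of a helper that applies one label mapping to every location in a single pass. The field paths are the ones listed above; everything else is an assumption for illustration: the helper name relabelEverywhere, rec.speakers being an array of { label, speaking_sec } rows, and agreed_by / owner holding either a string or an array of strings. It is not the code in server/meeting-speaker-edits.js.

```javascript
// Sketch only: field paths come from this guide, but the record shapes
// and the helper name are hypothetical.
// mapping: { oldLabel: newLabel }, e.g. { Speaker_B: "Speaker_A" }.
function relabelEverywhere(rec, mapping) {
  const has = (k) => Object.prototype.hasOwnProperty.call(mapping, k);
  const one = (label) => (typeof label === "string" && has(label) ? mapping[label] : label);
  // Map a string or an array of strings; arrays are de-duplicated after mapping.
  const any = (v) => (Array.isArray(v) ? [...new Set(v.map(one))] : one(v));

  // 1. Flat transcript segments.
  for (const seg of rec.transcript_segments ?? []) seg.speaker = one(seg.speaker);

  // 2. Per-chunk entries, including any per-line override.
  for (const chunk of rec.chunks ?? []) {
    for (const entry of chunk.entries ?? []) {
      entry.speaker = one(entry.speaker);
      if (entry.speaker_override != null) {
        entry.speaker_override = one(entry.speaker_override);
      }
    }
  }

  // 3. Per-cluster stats: fold rows that now share a label (assumed shape).
  const merged = new Map();
  for (const row of rec.speakers ?? []) {
    const label = one(row.label);
    const prev = merged.get(label);
    if (prev) prev.speaking_sec += row.speaking_sec;
    else merged.set(label, { ...row, label });
  }
  rec.speakers = [...merged.values()];

  // 4. Extras attributions.
  const ex = rec.extras ?? {};
  if (ex.tldr) ex.tldr.primary_speakers = any(ex.tldr.primary_speakers);
  for (const d of ex.decisions ?? []) d.agreed_by = any(d.agreed_by);
  for (const a of ex.action_items ?? []) a.owner = any(a.owner);
  for (const q of ex.key_quotes ?? []) q.speaker = any(q.speaker);

  return rec;
}
```

Calling relabelEverywhere(rec, { Speaker_B: "Speaker_A" }) folds Speaker_B into Speaker_A in all four locations at once. The sketch deliberately leaves rec.speaker_names untouched, since display names are a separate map.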
Post-hoc speaker-edit endpoints (server/meeting-speaker-edits.js)
Operator-dashboard edits to a saved record, mounted under /admin/internal-meetings/:id/*
(routing in server/routes/internal-meetings.js). Every edit must keep the four label
locations above in sync.
- PATCH /admin/internal-meetings/:id/speakers: rename a cluster (display name only; pre-existing).
- PATCH /admin/internal-meetings/:id/entries: per-line speaker_override (pre-existing).
- PATCH /admin/internal-meetings/:id/merge-speakers: fold cluster(s) into one (ONE person split as two). Pure, offline, no LLM.
- POST /admin/internal-meetings/:id/recluster: re-run clustering at a new threshold (TWO people merged as one). Pure, offline (uses rec.diarization fingerprints); resets speaker_names, per-line overrides, and extras attributions; operator re-labels afterward. 400 if no fingerprints saved.
- POST /admin/internal-meetings/:id/repolish: re-run runSummaryPolish with the current names (no re-inference) so topic summaries re-attribute after a rename/merge. The ONLY LLM-backed edit; needs the analyze hardware online; 400 if no named speakers.
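As a quick reference, the five edits can be summarized as a small lookup table. The methods, paths, and the LLM flag are taken from the list above; the names SPEAKER_EDITS and buildEditRequest are hypothetical, and request bodies are deliberately not modeled because this guide does not specify their shape.

```javascript
// Sketch only: a client-side summary of the endpoints listed above.
// The table and helper names are hypothetical; request bodies are omitted
// because their shape is not documented here.
const SPEAKER_EDITS = {
  rename: { method: "PATCH", suffix: "speakers", llm: false },
  override: { method: "PATCH", suffix: "entries", llm: false },
  merge: { method: "PATCH", suffix: "merge-speakers", llm: false },
  recluster: { method: "POST", suffix: "recluster", llm: false },
  repolish: { method: "POST", suffix: "repolish", llm: true },
};

// Build the method and path for one edit against a saved meeting record.
function buildEditRequest(kind, meetingId) {
  if (!Object.prototype.hasOwnProperty.call(SPEAKER_EDITS, kind)) {
    throw new Error(`unknown speaker edit: ${kind}`);
  }
  const edit = SPEAKER_EDITS[kind];
  return {
    method: edit.method,
    path: `/admin/internal-meetings/${encodeURIComponent(meetingId)}/${edit.suffix}`,
    needsLlm: edit.llm,
  };
}
```

A rule of thumb that follows from the list: use merge when one person was split into two clusters, recluster when two people were merged into one, and repolish afterward if the section summaries should pick up the new names.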
Test coverage: server/test/speaker-clustering.test.js, server/test/meeting-speaker-edits.test.js, server/test/polish-speaker-labels.test.js (node --test).