Files
spark-control/package/startos/versions/v0_1_0.ts
T
Keysat c7f94381e7 v0.13.0:2 - per-segment confidence in diarize-chunk response
Recap Relay dev asked: can the diarization output include a confidence
level per segment so the UI can render "Speaker_0?" for uncertain
assignments rather than confidently mislabeling?

Answer: yes. Sortformer's diarize() with include_tensor_outputs=True
returns the per-frame per-speaker sigmoid scores (shape [B, T, 4spk],
~12.6 fps frame rate). The current code argmaxes those into segment
strings and throws the raw scores away. Now: for each output segment,
compute mean probability of the assigned speaker across the segment's
frames → confidence in [0, 1].

Implementation:
  - diarizer.py: diarize_chunk() now calls diarize() with
    include_tensor_outputs=True, and a new _attach_confidence() helper
    derives the per-segment mean probability after parsing the segment
    strings. The frame-rate is computed from tensor shape vs audio
    duration (no need to hard-code the model's stride).
  - All failure paths return confidence=None gracefully — Recap Relay
    can treat None as "no info" or fall back to a default threshold.

Endpoint shape change: segments[] now have an optional `confidence`
field in [0, 1] (or None). All other fields unchanged. Existing callers
that ignore the field aren't affected.

Verified with a 5s test signal that the tensor has shape [1, 63, 4]
(63 frames / 5s = 12.6 fps) and values in [0, 1] (sigmoid outputs,
independent per speaker so overlap detection works). Real speech values
will be much higher than the near-zero values of the pure-tone test
signal.

Reapply patches on the Speech Models card after installing v0.13.0:2
to pick up the updated diarizer.py + main.py in the parakeet container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:36:25 -05:00

14 lines
925 B
TypeScript

import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({
version: '0.13.0:2',
releaseNotes: {
en_US:
'v0.13.0:2 — per-segment confidence in diarize-chunk. Sortformer outputs per-frame per-speaker sigmoid probabilities (~12.6 fps) that we previously discarded. Now: for each diarization segment, compute mean probability of the assigned speaker across the segment\'s frames → confidence in [0, 1]. Recap Relay (and other consumers) can threshold this to render uncertain segments as "Speaker_0?" with a question mark, or to skip them entirely. Endpoint shape is otherwise unchanged — segments[].confidence is a new field, value may be None on derivation failure. Click Reapply patches on the Speech Models card after install to pick up the updated diarizer.py + main.py.',
},
migrations: {
up: async ({ effects }) => {},
down: IMPOSSIBLE,
},
})