Make diarization chunk length configurable (Auto + presets)
Chunk size was hardcoded at 2.5-min bodies. Add a Settings control: Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large calls, at some cost to speed and cross-chunk voice matching. - ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4 participants were detected, else 150s; overlap + single-chunk threshold scale with the body length. - AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation. - TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it. - SessionController resolves the body from the setting + the session's detected participant count (visual_timeline participants) for both send + re-process. - Participant roster now counts EVERY tile OCR'd, not just who spoke (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto call-size signal is meaningful even though speaking-detection is sparse. Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.
This commit is contained in:
@@ -15,9 +15,15 @@ final class TimelineBuilder {
|
||||
private let closeFrames: Int
|
||||
private var aliases: [String: String] = [:] // normalized variant -> canonical
|
||||
private var states: [String: NameState] = [:]
|
||||
private var observed: Set<String> = [] // every tile name seen (speaking or not)
|
||||
private var lastFrameT: Double = 0
|
||||
private(set) var segments: [VisualTimeline.Segment] = []
|
||||
|
||||
/// Every distinct participant name the adapter has OCR'd, whether or not they were
|
||||
/// ever detected speaking — the call-size signal (drives "Auto" chunk sizing and a
|
||||
/// complete participant roster, since speaking-detection is intentionally sparse).
|
||||
var observedNames: [String] { observed.sorted() }
|
||||
|
||||
init(openFrames: Int = 2, closeFrames: Int = 2) {
|
||||
self.openFrames = max(1, openFrames)
|
||||
self.closeFrames = max(1, closeFrames)
|
||||
@@ -34,6 +40,9 @@ final class TimelineBuilder {
|
||||
func ingest(_ observations: [SpeakerObservation], at t: TimeInterval) {
|
||||
lastFrameT = t
|
||||
|
||||
// Record every tile seen (speaking or not) for the participant roster / call size.
|
||||
for obs in observations where !obs.name.isEmpty { observed.insert(canonical(obs.name)) }
|
||||
|
||||
// Best confidence per canonical name that is speaking this frame.
|
||||
var speaking: [String: Double] = [:]
|
||||
for obs in observations where obs.speaking && !obs.name.isEmpty {
|
||||
|
||||
Reference in New Issue
Block a user