Make diarization chunk length configurable (Auto + presets)

Chunk size was hardcoded at 2.5-min bodies. Add a Settings control: Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large calls, at some cost to speed and cross-chunk voice matching.

- ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4 participants were detected, else 150s; overlap + single-chunk threshold scale with the body length.
- AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation.
- TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it.
- SessionController resolves the body from the setting + the session's detected participant count (visual_timeline participants) for both send + re-process.
- Participant roster now counts EVERY tile OCR'd, not just who spoke (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto call-size signal is meaningful even though speaking-detection is sparse.

Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.
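
For readers without the ChunkMode source at hand, the mode → body-seconds mapping described above can be sketched roughly as follows. This is an illustration only, not the committed code: the case names, the `Double` return type, and the `largeCallThreshold` constant are assumptions; only the type name `ChunkMode`, the `bodySeconds(participantCount:)` call shape, and the 150s / 90s / 60s values and the ">4 participants" rule come from this commit. The overlap and single-chunk threshold scaling are left out because the message does not state their factors.

```swift
/// Hypothetical sketch of the chunk-length presets (case names are assumed).
enum ChunkMode: String, CaseIterable {
    case auto, standard, largeGroup, fine

    /// Assumed constant: Auto switches to short chunks above this many participants.
    static let largeCallThreshold = 4

    /// Body length in seconds for this mode. `participantCount` is nil for
    /// audio-only sessions, in which case Auto keeps the 150s default.
    func bodySeconds(participantCount: Int?) -> Double {
        switch self {
        case .standard:   return 150   // 2.5 min
        case .largeGroup: return 60
        case .fine:       return 90
        case .auto:
            if let count = participantCount, count > Self.largeCallThreshold {
                return 60
            }
            return 150
        }
    }
}

// Usage: Auto resolves differently depending on the detected call size.
let smallCall = ChunkMode.auto.bodySeconds(participantCount: 3)
let largeCall = ChunkMode.auto.bodySeconds(participantCount: 7)
let audioOnly = ChunkMode.auto.bodySeconds(participantCount: nil)
print(smallCall, largeCall, audioOnly)
```

Keeping the resolution a pure function of the mode and an optional count is what makes it straightforward to unit-test without a session folder on disk.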
2026-06-09 10:15:16 -05:00
parent 3bb7f1ab32
commit a3e3406b28
9 changed files with 133 additions and 3 deletions
@@ -378,12 +378,15 @@ final class SessionController: ObservableObject {
        let settings = self.settings
        let pipeline = TranscriptPipeline(baseURL: settings.backendBaseURL,
                                          skipTLS: settings.skipTLSVerification, voiceprints: voiceprints)
+        // Resolve the diarization chunk length from the setting; "Auto" uses the
+        // participant count the visual capture saw for this session.
+        let chunkSeconds = settings.chunk.bodySeconds(participantCount: Self.participantCount(in: inputs.folder))
        do {
            let speakers = try await pipeline.process(
                sessionFolder: inputs.folder, sessionId: inputs.sessionId, app: inputs.app,
                micURL: inputs.micURL, systemURL: inputs.systemURL, mixedURL: inputs.mixedURL,
                timeline: inputs.timeline, selfSpans: inputs.selfSpans, selfName: inputs.selfName,
-                systemHealthy: inputs.systemHealthy,
+                systemHealthy: inputs.systemHealthy, chunkSeconds: chunkSeconds,
                progress: { done, total in await MainActor.run { self.transcriptStatus = .processing(done, total) } })
            self.transcriptStatus = .done(speakers: speakers.speakers.count, segments: speakers.segments.count)
            try Task.checkCancellation()
@@ -531,6 +534,16 @@ final class SessionController: ObservableObject {
        }
    }

+    /// Detected participant count from a session's visual timeline, for "Auto" chunk
+    /// sizing. Nil when there's no visual timeline (audio-only) so callers keep the
+    /// default body length. Counts everyone OCR'd on the call, not just who spoke.
+    private static func participantCount(in folder: URL) -> Int? {
+        guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
+              let vt = try? JSONDecoder().decode(VisualTimeline.self, from: data),
+              !vt.participants.isEmpty else { return nil }
+        return vt.participants.count
+    }
+
    /// The remote (vision) visual-timeline segments saved for a session, if any.
    private static func remoteTimeline(in folder: URL) -> [VisualTimeline.Segment] {
        guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),