Chunk overlap + overlap-aware stitching

Chunks were contiguous (start = prev end) with a naïve offset-concat stitch and no overlap. That cut sentences at boundaries, denied the diarizer context at edges, and let one voice split across chunks (the MH/Unknown_0 problem).

Now each ~150s body is sliced with a 15s margin on both sides ([bodyStart-15, bodyEnd+15]); the stitcher keeps a segment only in the chunk that owns its MIDPOINT (body region) and drops it from the neighbour's margin, so boundary-spanning speech is seen whole by the backend and kept exactly once.

- SessionPackager.PlannedChunk gains bodyStart/bodyEnd; planChunks adds overlapSeconds.
- TranscriptAssembler.ChunkResult carries body bounds (defaults keep-all → no-overlap behaviour preserved for existing callers); assemble dedups by midpoint-in-body.
- TranscriptPipeline passes body bounds through.

Complements (doesn't replace) the fragment-smoothing + reconciliation safety nets; this is the upstream fix. ~+20% backend audio per interior chunk. 63/63 XCTest (new: overlap window layout + boundary-segment dedup).
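
The window layout described in the message can be sketched as below. This is an illustration only, not the actual SessionPackager.planChunks: `PlannedWindow`, `planWindows`, `sliceStart` and `sliceEnd` are hypothetical names, and clamping the margins to the session bounds is an assumption.

```swift
import Foundation

// Hypothetical stand-in for SessionPackager.PlannedChunk. Only bodyStart and
// bodyEnd are named in the commit message; the slice fields are assumptions.
struct PlannedWindow: Equatable {
    let sliceStart: Double  // start of the audio actually sent to the backend
    let sliceEnd: Double    // end of the audio actually sent to the backend
    let bodyStart: Double   // start of the region this chunk owns
    let bodyEnd: Double     // end of the region this chunk owns
}

// Tile [0, total) with contiguous bodies, then widen each slice by
// overlapSeconds on both sides, clamped to the session bounds (assumed).
func planWindows(total: Double,
                 bodySeconds: Double = 150,
                 overlapSeconds: Double = 15) -> [PlannedWindow] {
    precondition(bodySeconds > 0 && overlapSeconds >= 0)
    var windows: [PlannedWindow] = []
    var bodyStart = 0.0
    while bodyStart < total {
        let bodyEnd = min(bodyStart + bodySeconds, total)
        windows.append(PlannedWindow(
            sliceStart: max(0, bodyStart - overlapSeconds),
            sliceEnd: min(total, bodyEnd + overlapSeconds),
            bodyStart: bodyStart,
            bodyEnd: bodyEnd))
        bodyStart = bodyEnd  // bodies stay contiguous; only the slices overlap
    }
    return windows
}

let plan = planWindows(total: 400)
```

Under these assumptions an interior chunk's slice is 180s for a 150s body, which matches the ~+20% backend audio figure above.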
@@ -5,8 +5,13 @@ import Foundation
 /// name, and fingerprints collected for the voiceprint store.
 enum TranscriptAssembler {
     struct ChunkResult {
-        let chunkStart: Double // global seconds
+        let chunkStart: Double // global seconds (the sliced window start)
         let response: LabelMergeResponse
+        // The region this chunk OWNS; segments whose midpoint falls outside it are the
+        // neighbour's (overlap margin) and are dropped here. Defaults keep everything
+        // (no-overlap behaviour).
+        var bodyStart: Double = -.greatestFiniteMagnitude
+        var bodyEnd: Double = .greatestFiniteMagnitude
     }
 
     struct Assembled {
@@ -40,13 +45,16 @@ enum TranscriptAssembler {
 
         for chunk in chunks {
             let offset = chunk.chunkStart
-            // Audio length from the chunk window, so silent/all-unknown calls still
-            // report a real duration (not just the last segment's end).
-            duration = max(duration, offset + chunk.response.duration)
+            // Body end bounds the real session length even on silent/all-unknown calls.
+            duration = max(duration, min(chunk.bodyEnd, offset + chunk.response.duration))
 
             for seg in chunk.response.segments {
                 let start = seg.startSeconds + offset
                 let end = seg.endSeconds + offset
+                // Overlap dedup: keep a segment only in the chunk that OWNS its midpoint;
+                // the other chunk saw it only in its margin (for context) and drops it.
+                let mid = (start + end) / 2
+                guard mid >= chunk.bodyStart, mid < chunk.bodyEnd else { continue }
                 segments.append(.init(start: start, end: end, speaker: seg.speaker, text: seg.text))
                 duration = max(duration, end)
             }
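
A minimal standalone sketch of the midpoint-ownership rule in the hunk above, showing a boundary-spanning segment seen by two overlapping chunks and kept once. The types here (`LocalSegment`, `Chunk`, `GlobalSegment`) and the `stitch` function are hypothetical simplifications, not the real TranscriptAssembler API.

```swift
import Foundation

// Hypothetical, simplified types for illustration only.
struct LocalSegment {
    let start: Double  // seconds relative to the chunk's slice start
    let end: Double
    let text: String
}

struct Chunk {
    let chunkStart: Double  // global seconds of the sliced window start
    let bodyStart: Double   // owned region, global seconds
    let bodyEnd: Double
    let segments: [LocalSegment]
}

struct GlobalSegment: Equatable {
    let start: Double
    let end: Double
    let text: String
}

// Shift each segment to global time and keep it only if its midpoint lies in
// the half-open body [bodyStart, bodyEnd) of the chunk that produced it.
func stitch(_ chunks: [Chunk]) -> [GlobalSegment] {
    var out: [GlobalSegment] = []
    for chunk in chunks {
        for seg in chunk.segments {
            let start = seg.start + chunk.chunkStart
            let end = seg.end + chunk.chunkStart
            let mid = (start + end) / 2
            guard mid >= chunk.bodyStart, mid < chunk.bodyEnd else { continue }
            out.append(GlobalSegment(start: start, end: end, text: seg.text))
        }
    }
    return out
}

// Chunk A owns [0, 150) and was sliced as [0, 165]; chunk B owns [150, 300)
// and was sliced from 135. Both saw the same utterance at global 148 to 154.
let a = Chunk(chunkStart: 0, bodyStart: 0, bodyEnd: 150, segments: [
    LocalSegment(start: 10, end: 12, text: "hello"),
    LocalSegment(start: 148, end: 154, text: "boundary"),
])
let b = Chunk(chunkStart: 135, bodyStart: 150, bodyEnd: 300, segments: [
    LocalSegment(start: 13, end: 19, text: "boundary"),
])
let stitched = stitch([a, b])
```

The boundary utterance has midpoint 151, so chunk A drops it and chunk B keeps it. Because the body interval is half-open, a midpoint exactly on a boundary belongs to the later chunk.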