Channel-verified self identity: the mic track is you
Grant's insight, confirmed on real session audio: we capture self (mic) and others (system) as separate tracks, then throw the separation away by mixing to mono, so the backend has to re-guess who's who. Analysis of a real call showed the channels are cleanly separated (envelope correlation 0.015, no echo). Caitlyn's 'Go Bitcoin' was 11.8x louder in the system track than in the mic track, yet the mono mix plus the noisy visual signal named it 'Grant'.

ChannelSelfVAD marks self-speech as the windows where the mic is active AND louder than system (mic > 1.5 x system).

Benefits:
(1) self is identified by CHANNEL, not by the on-screen name: set one name in Settings, no per-platform matching;
(2) a remote speaker (or room echo) can never be mislabeled as self.

Self spans are computed at finalize from the two finished WAVs; the live capture path is untouched. If the tracks can't be read, it falls back to the mic-VAD spans. SessionController feeds these spans to the backend timeline.

Validated on the real session: 16 self spans; 'Go Bitcoin' (72-74s) correctly EXCLUDED, Grant's 49.9-53.3s and 62.6-64s correctly INCLUDED. 33/33 XCTests pass (5 new).
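The commit message does not say how the envelope correlation was measured. As a rough sketch only, one plausible way is to take a per-window RMS envelope of each track and compute the Pearson correlation between the two envelopes; a value near 0 suggests the tracks do not bleed into each other. The function names and the window size here are assumptions for illustration, not code from this repository.

```swift
import Foundation

/// RMS envelope: one value per non-overlapping window of `window` samples.
/// (Hypothetical helper; the window size is an assumption.)
func rmsEnvelope(_ samples: [Float], window: Int) -> [Double] {
    precondition(window > 0, "window must be positive")
    var envelope: [Double] = []
    var start = 0
    while start + window <= samples.count {
        var sumSquares = 0.0
        for i in start..<(start + window) {
            let s = Double(samples[i])
            sumSquares += s * s
        }
        envelope.append((sumSquares / Double(window)).squareRoot())
        start += window
    }
    return envelope
}

/// Pearson correlation of two sequences, truncated to the shorter length.
/// Returns nil with fewer than two values or when either side has zero variance.
func pearson(_ a: [Double], _ b: [Double]) -> Double? {
    let n = min(a.count, b.count)
    guard n >= 2 else { return nil }
    let x = a.prefix(n), y = b.prefix(n)
    let meanX = x.reduce(0, +) / Double(n)
    let meanY = y.reduce(0, +) / Double(n)
    var cov = 0.0, varX = 0.0, varY = 0.0
    for (u, v) in zip(x, y) {
        cov += (u - meanX) * (v - meanY)
        varX += (u - meanX) * (u - meanX)
        varY += (v - meanY) * (v - meanY)
    }
    guard varX > 0, varY > 0 else { return nil }
    return cov / (varX * varY).squareRoot()
}
```

With decoded mic and system samples, `pearson(rmsEnvelope(mic, window: w), rmsEnvelope(system, window: w))` would give a single separation figure comparable in spirit to the 0.015 quoted above, though the exact method used for that number is not stated.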
@@ -277,16 +277,30 @@ final class SessionController: ObservableObject {
     private func stopVisualAndTimeline(_ result: RecordingResult, folder: URL?)
         async -> (timeline: [VisualTimeline.Segment], visualRan: Bool) {
         let selfName = settings.selfName
+        let selfSpans = await channelSelfSpans(result: result, folder: folder)
         if let vc = visualCapture, let folder {
             visualCapture = nil
             let timeline = await vc.finish(
-                selfSpans: result.selfSpans, selfName: selfName,
+                selfSpans: selfSpans, selfName: selfName,
                 sessionId: folder.lastPathComponent, t0Unix: result.t0Unix,
                 durationSec: result.duration, folder: folder)
             return (timeline, true)
         }
         if let vc = visualCapture { await vc.cancel(); visualCapture = nil }
-        return (TranscriptPipeline.timeline(fromSelfSpans: result.selfSpans, selfName: selfName), false)
+        return (TranscriptPipeline.timeline(fromSelfSpans: selfSpans, selfName: selfName), false)
     }

+    /// Self spans for the backend timeline, identified by CHANNEL: the mic track is
+    /// the local user, so self = mic active AND louder than system. This makes self
+    /// platform-independent (one name, no display-name matching) and stops a remote
+    /// speaker from being mislabeled as self. Falls back to the mic-VAD spans if the
+    /// tracks can't be read. Runs off the main actor (file I/O).
+    private func channelSelfSpans(result: RecordingResult, folder: URL?) async -> [VADSpan] {
+        guard let folder else { return result.selfSpans }
+        let mic = folder.appendingPathComponent("mic.wav")
+        let sys = folder.appendingPathComponent("system.wav")
+        let spans = await Task.detached { ChannelSelfVAD.selfSpans(micURL: mic, systemURL: sys) }.value
+        return spans ?? result.selfSpans
+    }
+
     private func stop() {
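The body of `ChannelSelfVAD.selfSpans` is not part of this diff. The rule from the commit message (mic active AND mic > 1.5 x system) could be sketched roughly as below, working on already-decoded sample arrays instead of WAV URLs. The `Span` type, the window length, and the activity floor are hypothetical stand-ins chosen for illustration; only the 1.5 ratio comes from the commit message.

```swift
import Foundation

/// Hypothetical stand-in for the project's VADSpan (start/end in seconds).
struct Span: Equatable {
    var start: Double
    var end: Double
}

/// RMS level of `samples` over `range`.
func rms(_ samples: [Float], _ range: Range<Int>) -> Double {
    guard !range.isEmpty else { return 0 }
    var sum = 0.0
    for i in range {
        let s = Double(samples[i])
        sum += s * s
    }
    return (sum / Double(range.count)).squareRoot()
}

/// Marks windows where the mic is active AND louder than system * ratio,
/// then merges consecutive self windows into spans.
func channelSelfSpans(mic: [Float], system: [Float], sampleRate: Double,
                      windowSec: Double = 0.1,     // assumed window length
                      activeFloor: Double = 0.01,  // assumed "mic active" RMS floor
                      ratio: Double = 1.5) -> [Span] {
    let window = max(1, Int(sampleRate * windowSec))
    let count = min(mic.count, system.count) / window
    var spans: [Span] = []
    var runStart: Int? = nil
    for w in 0..<count {
        let r = (w * window)..<((w + 1) * window)
        let m = rms(mic, r)
        let s = rms(system, r)
        let isSelf = m > activeFloor && m > s * ratio
        if isSelf {
            if runStart == nil { runStart = w }
        } else if let start = runStart {
            // Close the current run at the start of this non-self window.
            spans.append(Span(start: Double(start * window) / sampleRate,
                              end: Double(w * window) / sampleRate))
            runStart = nil
        }
    }
    if let start = runStart {
        // A run that reaches the end of the audio closes at the last full window.
        spans.append(Span(start: Double(start * window) / sampleRate,
                          end: Double(count * window) / sampleRate))
    }
    return spans
}
```

Under this rule, a window where the remote speaker dominates (as in the 'Go Bitcoin' case) fails the ratio test even if the mic picks up some sound, so it never becomes a self span.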