2191486506
Grant's insight, proven on real session audio: we capture self (mic) and others (system) as separate tracks, then throw that separation away by mixing to mono, so the backend has to re-guess who is speaking. Analysis of a real call showed the channels are cleanly separated (envelope correlation 0.015, no echo). Caitlyn's 'Go Bitcoin' was 11.8x louder in the system track than in the mic track, yet the mono mix plus the noisy visual signal labeled it 'Grant'.

ChannelSelfVAD marks self-speech as the windows where the mic is active AND louder than system by a 1.5x margin (mic > 1.5 x system). Benefits: (1) self is identified by CHANNEL, not by the on-screen name, so you set one name in Settings and need no per-platform matching; (2) a remote speaker (or room echo) can never be mislabeled as self.

Spans are computed at finalize from the two finished WAVs; the live capture path is untouched. If the tracks can't be read, it falls back to mic-VAD. SessionController feeds these spans to the backend timeline.

Validated on the real session: 16 self spans; 'Go Bitcoin' (72-74s) correctly EXCLUDED; Grant's 49.9-53.3s and 62.6-64s correctly INCLUDED. 33/33 XCTest (5 new).
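
For reviewers, a minimal sketch of the channel rule described above (mic active AND mic > 1.5 x system), not the actual ChannelSelfVAD code. The names `SelfSpan`, `rms`, and `selfSpans`, the 100 ms window, and the 0.01 activity floor are all assumptions made for illustration; only the 1.5x ratio comes from this change. It assumes both tracks are already decoded to mono Float samples at the same sample rate.

```swift
import Foundation

/// A time span in seconds during which the local user is judged to be speaking.
/// (Hypothetical type for this sketch.)
struct SelfSpan: Equatable {
    let start: Double
    let end: Double
}

/// Root-mean-square level of one window of samples.
func rms(_ samples: ArraySlice<Float>) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumSquares / Float(samples.count)).squareRoot()
}

/// Marks windows where the mic is active AND louder than system by `ratio`,
/// then merges consecutive marked windows into spans.
func selfSpans(mic: [Float],
               system: [Float],
               sampleRate: Double,
               windowSeconds: Double = 0.1,  // assumed window length
               activityFloor: Float = 0.01,  // assumed "mic is active" level
               ratio: Float = 1.5) -> [SelfSpan] {
    let window = max(1, Int(sampleRate * windowSeconds))
    let count = min(mic.count, system.count)
    var spans: [SelfSpan] = []
    var openStart: Double? = nil

    var offset = 0
    while offset < count {
        let end = min(offset + window, count)
        let micLevel = rms(mic[offset..<end])
        let systemLevel = rms(system[offset..<end])
        let isSelf = micLevel > activityFloor && micLevel > systemLevel * ratio
        let time = Double(offset) / sampleRate

        if isSelf {
            if openStart == nil { openStart = time }
        } else if let start = openStart {
            // The run of self windows ended at the start of this window.
            spans.append(SelfSpan(start: start, end: time))
            openStart = nil
        }
        offset = end
    }
    // Close a span that runs to the end of the audio.
    if let start = openStart {
        spans.append(SelfSpan(start: start, end: Double(count) / sampleRate))
    }
    return spans
}
```

With this rule, a remote utterance that leaks faintly into the mic fails the ratio test because system is louder, so it is never marked as self, which is the 'Go Bitcoin' case. A window where the mic is louder than system by less than 1.5x is also left out.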