Signal 1:1 (and group) calls didn't auto-record. Root cause confirmed on-device:
Signal is Electron and holds the mic in a HELPER process
(org.whispersystems.signal-desktop.helper.Renderer, a child of the main app).
detectViaMicAttribution only matched PIDs listed in NSWorkspace.runningApplications
against the main bundle ID, so the helper's mic use was never attributed to Signal.
(Zoom worked because it is a single native process; Meet worked because the browser
PID resolved.)
Fix: iterate the mic-using PIDs and resolve each to its owning app by walking the
parent-process chain (sysctl KERN_PROC_PID → ppid) until an NSRunningApplication is
found. Helper PIDs whose direct NSRunningApplication lookup returns nil now resolve
to the main app. Validated against the live Signal helpers: PIDs 2383 and 2372 both
resolve to org.whispersystems.signal-desktop.
Superset of the old behavior, so Zoom/Meet detection is preserved (browser case now
also more robust); our own recording is still skipped (selfPID).
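
The parent-chain walk described above can be sketched roughly as follows. This is
an illustrative sketch, not the shipped code: resolveOwningApp, its parameters, the
maxDepth bound, and the stop-at-launchd rule are assumptions. The two closures stand
in for the NSRunningApplication lookup and the sysctl KERN_PROC_PID ppid query so the
logic can be exercised without a live process table.

```swift
// Hypothetical sketch: resolve a mic-using PID to the bundle ID of its owning app.
// `appBundleID` stands in for an NSRunningApplication lookup (nil for helpers);
// `parentPID` stands in for a sysctl KERN_PROC_PID query that returns the ppid.
func resolveOwningApp(
    pid: Int32,
    selfPID: Int32,
    appBundleID: (Int32) -> String?,
    parentPID: (Int32) -> Int32?,
    maxDepth: Int = 16            // assumed bound so a malformed chain cannot loop forever
) -> String? {
    // Our own recording must never be attributed to a call app.
    guard pid != selfPID else { return nil }

    var current = pid
    for _ in 0..<maxDepth {
        // The first ancestor that is a registered application owns the mic use.
        if let bundleID = appBundleID(current) {
            return bundleID
        }
        // Otherwise step to the parent; stop at launchd (pid 1) or a broken link.
        guard let parent = parentPID(current), parent > 1, parent != current else {
            return nil
        }
        current = parent
    }
    return nil
}
```

Because a PID that is itself a registered application returns on the first iteration,
the walk behaves like the old direct match for single-process apps and only adds the
upward steps for helpers.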

Visual capture now runs alongside audio: on call start the session picks the
app's adapter, captures the call window on the SAME monotonic clock as the audio
(AudioRecorder.sharedT0Host), and on stop writes visual_timeline.json and hands
the backend the visual segments with mic-VAD self-spans merged. Any visual
failure (no adapter, no window, Screen Recording denied) leaves the session
recording audio-only, so the proven path is never blocked or broken.
- CallDetector now emits DetectedCall{app, bundleID, windowID}, where windowID is the
exact CGWindowID of the matched Meet browser window (native apps pass nil, so
capture falls back to the largest window).
- VisualCapture wraps VisualObserver + AdapterRegistry, writes visual_timeline.json.
- AudioRecorder.sharedT0Host() exposes the shared t0 for frame alignment.
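
A minimal sketch of the "exact window ID, else largest" selection that the windowID
field enables. The WindowInfo type and this signature for pickWindowIndex are
assumptions made for illustration; the real function may take different inputs.

```swift
// Hypothetical stand-in for whatever window description the observer enumerates.
struct WindowInfo {
    let windowID: UInt32   // CGWindowID is a UInt32 typealias on macOS
    let width: Double
    let height: Double
    var area: Double { width * height }
}

// Returns the index of the window to capture, or nil if there are no windows.
// An exact match on `preferred` wins; otherwise the largest-area window is chosen.
func pickWindowIndex(_ windows: [WindowInfo], preferred: UInt32?) -> Int? {
    if let preferred = preferred,
       let exact = windows.firstIndex(where: { $0.windowID == preferred }) {
        return exact
    }
    // Fallback: native apps pass nil, and a stale ID may no longer be on screen.
    return windows.indices.max { a, b in windows[a].area < windows[b].area }
}
```

Returning an index rather than the window itself keeps the function pure and easy to
unit-test against a fixed array.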
Hardened after a 3-lens adversarial review (concurrency / failure-isolation /
data-flow); all 6 confirmed findings are fixed:
- P0 (critical): startVisual could adopt a stale capture into a DIFFERENT session
(cross-session SCStream leak + visual_timeline.json written to the wrong
folder). Adoption is now gated on session identity (same generation, same recorder
instance via ===, state still .recording) and fails closed: if any check fails, the
stream is cancelled.
- P1: observer captured the browser's largest window, not the detected Meet
window. Now targets the exact CGWindowID (pickWindowIndex, unit-tested),
largest-area only as fallback.
- P2: a startVisual orphaned by a concurrent stop could leak a stream on quit.
inFlightVisual is registered before the await and drained in prepareForTermination.
- P3: trailing visual gap/segment ends could exceed duration_sec. Clamped in
VisualCapture (clampSegments/clampGaps, unit-tested).
- P4: capture pixel size used NSScreen.main scale; now uses the scale of the
display actually hosting the window (OCR clarity on secondary displays).
- VisualObserver.stop() bounds stopCapture() with a 3s timeout (mirrors audio) so
a wedged stream can't hang finalization.
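
For the duration clamp in the P3 item, a sketch under stated assumptions: the Segment
type, its field names, and the choice to drop spans that become empty after clamping
are illustrative and not taken from VisualCapture itself.

```swift
// Hypothetical time span in seconds relative to the shared t0.
struct Segment: Equatable {
    var start: Double
    var end: Double
}

// Clamp every span to [0, durationSec]. Spans that lie entirely past the end of
// the recording collapse to zero length and are dropped (an assumed policy).
func clampSegments(_ segments: [Segment], durationSec: Double) -> [Segment] {
    return segments.compactMap { seg -> Segment? in
        let start = max(0, seg.start)
        let end = min(seg.end, durationSec)
        return start < end ? Segment(start: start, end: end) : nil
    }
}
```

The same shape would apply to gaps, since a gap is also a start/end pair that must not
run past duration_sec.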
25/25 XCTest cases pass. Live validation on real calls is still pending.