Phase 1: dual-track audio capture → mixed-mono 16 kHz WAV + mic VAD

AudioRecorder captures system audio (ScreenCaptureKit) + mic (AVAudioEngine) on a
single serial ioQueue, one shared monotonic t0, time-driven writers (pad gaps /
trim overlaps) so tracks stay aligned, and an energy mic-VAD for 'self' spans.
AudioMixer sums the aligned tracks into mixed_mono_16k.wav. SessionController
drives a serialized start/stop state machine, writes the session folder +
self_vad.json, exposes live level meters, and finalizes on quit.

Hardening from review: ioQueue single-domain (no races), stop() never hangs
(mic-first teardown + bounded stopCapture), layout-agnostic mic deep-copy,
discard-only video output to keep SCStream alive, VAD lockstep on committed
frames, stable signing team in project.yml, single-instance enforcement.
This commit is contained in:
Grant Gilliam
2026-06-05 21:30:11 -05:00
parent b2ae3a62b9
commit fd7e1a5907
12 changed files with 1018 additions and 10 deletions
+38
View File
@@ -0,0 +1,38 @@
import SwiftUI
/// A small horizontal audio level meter. `level` is a peak amplitude (01);
/// it's mapped to a dBFS scale (60 dB 0 dB) so normal speech is clearly visible.
struct LevelBar: View {
let label: String
let level: Float
var body: some View {
HStack(spacing: 8) {
Text(label)
.font(.caption2)
.foregroundStyle(.secondary)
.frame(width: 48, alignment: .leading)
GeometryReader { geo in
ZStack(alignment: .leading) {
RoundedRectangle(cornerRadius: 2).fill(Color.secondary.opacity(0.2))
RoundedRectangle(cornerRadius: 2)
.fill(color)
.frame(width: geo.size.width * fraction)
}
}
.frame(height: 6)
}
}
private var fraction: CGFloat {
guard level > 0 else { return 0 }
let db = 20 * log10(Double(level)) // 0
return CGFloat(min(1, max(0, (db + 60) / 60)))
}
private var color: Color {
if fraction < 0.02 { return .gray }
if fraction > 0.9 { return .red }
return .green
}
}