Phases 2-6: detection, visual timeline, backend hand-off, voiceprints

Phase 2 (call detection): CallDetector using CoreAudio per-process mic
attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet,
ignoring our own recording; auto-record toggle. Built; pending live multi-app
confirmation by the user.

Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation,
TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema
1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR +
saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver
(window capture; frames released, never saved; minimized->visual_gap, idle != gap).
Synthetic-frame tested; adapter geometry pending real Signal fixtures + live
VisualObserver validation.
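
The hysteresis step mentioned for TimelineBuilder can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Sample`, `Segment`, `buildSegments`, and the `minFrames` threshold are all hypothetical names, and it assumes the analyzer yields at most one highlighted speaker per sampled frame. A new speaker is only accepted after being seen in `minFrames` consecutive samples, so a single flickering frame does not split a segment.

```swift
// Illustrative sketch only: Sample, Segment, and buildSegments are hypothetical
// names, not the TimelineBuilder API from this commit.
struct Sample {
    let time: Double
    let speaker: String?   // nil when no tile is highlighted in this frame
}

struct Segment: Equatable {
    let speaker: String
    let start: Double
    var end: Double
}

/// Accept a speaker change only after `minFrames` consecutive samples agree.
func buildSegments(_ samples: [Sample], minFrames: Int = 2) -> [Segment] {
    var segments: [Segment] = []
    var candidate: String? = nil
    var candidateCount = 0
    var candidateStart = 0.0

    for sample in samples {
        guard let name = sample.speaker else {
            // No highlighted speaker: drop any pending candidate.
            candidate = nil
            candidateCount = 0
            continue
        }
        if let last = segments.last, last.speaker == name {
            // Same speaker as the confirmed segment: extend it and
            // discard any short-lived competing candidate.
            segments[segments.count - 1].end = sample.time
            candidate = nil
            candidateCount = 0
            continue
        }
        if candidate == name {
            candidateCount += 1
        } else {
            candidate = name
            candidateCount = 1
            candidateStart = sample.time
        }
        if candidateCount >= minFrames {
            // Candidate persisted long enough: open a new segment.
            segments.append(Segment(speaker: name, start: candidateStart, end: sample.time))
            candidate = nil
            candidateCount = 0
        }
    }
    return segments
}
```

In this sketch a one-frame appearance of another speaker is absorbed into the surrounding segment; whether the real builder merges or drops such frames is not stated in the commit message.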

Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential,
TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline
slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated
END-TO-END against the live backend (chunk -> label-merge -> speakers.json).
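
A chunk plan with timeline slice/rebase, as described for SessionPackager, might look something like the sketch below. It assumes fixed-length, non-overlapping chunks; `Chunk`, `Span`, `planChunks`, and `slice` are hypothetical names chosen for illustration, and the real packager may use different sizes or overlap.

```swift
// Illustrative sketch only: these types and functions are hypothetical,
// not the SessionPackager API from this commit.
struct Chunk: Equatable {
    let index: Int
    let start: Double   // seconds from session start
    let end: Double
}

struct Span: Equatable {
    let speaker: String
    let start: Double
    let end: Double
}

/// Split a recording of `duration` seconds into consecutive chunks.
func planChunks(duration: Double, chunkLength: Double) -> [Chunk] {
    precondition(chunkLength > 0, "chunkLength must be positive")
    var chunks: [Chunk] = []
    var start = 0.0
    while start < duration {
        let end = min(start + chunkLength, duration)
        chunks.append(Chunk(index: chunks.count, start: start, end: end))
        start = end
    }
    return chunks
}

/// Clip timeline spans to one chunk and rebase them to chunk-local time.
func slice(_ spans: [Span], to chunk: Chunk) -> [Span] {
    return spans.compactMap { span -> Span? in
        let s = max(span.start, chunk.start)
        let e = min(span.end, chunk.end)
        guard s < e else { return nil }   // span lies outside this chunk
        return Span(speaker: span.speaker, start: s - chunk.start, end: e - chunk.start)
    }
}
```

Rebasing to chunk-local time means each uploaded chunk carries a timeline that starts at zero, which is what a backend processing chunks independently would most likely expect; that expectation is an assumption here.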

Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named
fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status,
auto-send toggle (default off) + self-name setting.
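
The "persist named fingerprints, skip Unknown" rule can be illustrated with a small in-memory sketch. `VoiceprintBook` and `merge` are hypothetical names, and the fingerprint representation (`[Float]`) and the "Unknown" label prefix are assumptions; the actual VoiceprintStore format is not shown in this commit view.

```swift
// Illustrative sketch only: VoiceprintBook is hypothetical and keeps
// fingerprints in memory; the real VoiceprintStore persists them.
struct VoiceprintBook {
    private(set) var known: [String: [Float]] = [:]

    /// Keep fingerprints for named speakers; ignore placeholder labels
    /// (assumed here to start with "Unknown") and empty vectors.
    mutating func merge(_ speakers: [String: [Float]]) {
        for (name, vector) in speakers {
            if name.hasPrefix("Unknown") || vector.isEmpty { continue }
            known[name] = vector
        }
    }
}
```

Skipping placeholder labels keeps the set of known voiceprints limited to speakers that have actually been identified by name.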

All adversarial-review findings fixed. App + XCTest suite build; tests pass.
Grant Gilliam
2026-06-06 00:15:49 -05:00
parent fd7e1a5907
commit 863136aeec
27 changed files with 2108 additions and 22 deletions
@@ -46,6 +46,9 @@ struct MenuBarView: View {
.foregroundStyle(.secondary)
}
}
Text(detectionText)
.font(.caption)
.foregroundStyle(.secondary)
Button {
session.toggle()
@@ -84,6 +87,15 @@ struct MenuBarView: View {
.font(.caption)
}
.buttonStyle(.link)
HStack {
Button("Send to backend") { session.processLastSession() }
.disabled(transcriptProcessing)
Spacer()
}
if !transcriptText.isEmpty {
Text(transcriptText).font(.caption).foregroundStyle(transcriptColor)
}
}
}
}
@@ -114,6 +126,36 @@ struct MenuBarView: View {
        return String(format: "%02d:%02d", total / 60, total % 60)
    }

    private var detectionText: String {
        switch session.detectionStatus {
        case .disabled: return "Auto-detect off"
        case .listening: return "Listening for calls…"
        case .inCall(let app): return "In call: \(app.display)"
        }
    }

    private var transcriptProcessing: Bool {
        if case .processing = session.transcriptStatus { return true }
        return false
    }

    private var transcriptText: String {
        switch session.transcriptStatus {
        case .idle: return ""
        case .processing(let d, let t): return "Transcribing… chunk \(d)/\(t)"
        case .done(let s, let seg): return "Transcript ready · \(s) speakers · \(seg) segments"
        case .failed(let m): return "Transcript failed: \(m)"
        }
    }

    private var transcriptColor: Color {
        switch session.transcriptStatus {
        case .failed: return .red
        case .done: return .green
        default: return .secondary
        }
    }

    private var header: some View {
        VStack(alignment: .leading, spacing: 2) {
            Text("Ten31 Transcripts").font(.headline)