Phases 2-6: detection, visual timeline, backend hand-off, voiceprints

Phase 2 (call detection): CallDetector using CoreAudio per-process mic
attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet,
ignoring our own recording; auto-record toggle. Built; pending live multi-app
confirmation by the user.

Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation,
TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema
1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR +
saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver
(window capture; frames released, never saved; minimized->visual_gap, idle != gap).
Synthetic-frame tested; adapter geometry pending real Signal fixtures + live
VisualObserver validation.

Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential,
TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline
slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated
END-TO-END against the live backend (chunk -> label-merge -> speakers.json).

Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named
fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status,
auto-send toggle (default off) + self-name setting.

All adversarial-review findings fixed. App + XCTest suite build; tests pass.
This commit is contained in:
Grant Gilliam
2026-06-06 00:15:49 -05:00
parent fd7e1a5907
commit 863136aeec
27 changed files with 2108 additions and 22 deletions
@@ -0,0 +1,82 @@
import Foundation
import CoreGraphics
/// Renders a CGImage to an RGBA8 buffer once, then answers cheap colour queries
/// over pixel regions. Used to score the active-speaker highlight (a saturated
/// coloured border/ring) around participant tiles.
struct FrameSampler {
let width: Int
let height: Int
private let pixels: [UInt8] // RGBA8, row-major, top-left origin
init?(cgImage: CGImage) {
let w = cgImage.width, h = cgImage.height
guard w > 0, h > 0 else { return nil }
var buffer = [UInt8](repeating: 0, count: w * h * 4)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let info = CGImageAlphaInfo.premultipliedLast.rawValue
guard let ctx = buffer.withUnsafeMutableBytes({ raw -> CGContext? in
CGContext(data: raw.baseAddress, width: w, height: h, bitsPerComponent: 8,
bytesPerRow: w * 4, space: colorSpace, bitmapInfo: info)
}) else { return nil }
ctx.draw(cgImage, in: CGRect(x: 0, y: 0, width: w, height: h))
self.width = w
self.height = h
self.pixels = buffer
}
/// Mean HSV saturation (01) over a pixel rect (top-left origin), sampled on a grid.
func meanSaturation(inPixelRect rect: CGRect, samples: Int = 24) -> Double {
let x0 = max(0, Int(rect.minX)), x1 = min(width, Int(rect.maxX))
let y0 = max(0, Int(rect.minY)), y1 = min(height, Int(rect.maxY))
guard x1 > x0, y1 > y0 else { return 0 }
let stepX = max(1, (x1 - x0) / samples)
let stepY = max(1, (y1 - y0) / samples)
var sum = 0.0, count = 0
var y = y0
while y < y1 {
var x = x0
while x < x1 {
let i = (y * width + x) * 4
let r = Double(pixels[i]), g = Double(pixels[i + 1]), b = Double(pixels[i + 2])
let mx = max(r, g, b), mn = min(r, g, b)
sum += mx > 0 ? (mx - mn) / mx : 0
count += 1
x += stepX
}
y += stepY
}
return count > 0 ? sum / Double(count) : 0
}
/// Mean saturation of a ring just inside `rect`'s edges (the tile border),
/// excluding the interior that's where the speaking highlight lives.
func borderSaturation(inPixelRect rect: CGRect, thicknessFraction: Double = 0.12) -> Double {
let t = max(2.0, min(rect.width, rect.height) * thicknessFraction)
let top = CGRect(x: rect.minX, y: rect.minY, width: rect.width, height: t)
let bottom = CGRect(x: rect.minX, y: rect.maxY - t, width: rect.width, height: t)
let left = CGRect(x: rect.minX, y: rect.minY, width: t, height: rect.height)
let right = CGRect(x: rect.maxX - t, y: rect.minY, width: t, height: rect.height)
return [top, bottom, left, right].map { meanSaturation(inPixelRect: $0) }.max() ?? 0
}
/// Grid-sampled pixel positions (top-left origin) that are strongly saturated
/// AND bright enough to be a UI highlight i.e. the speaking ring/border.
func saturatedPoints(threshold: Double = 0.5, minBrightness: Double = 60, gridStep: Int = 6) -> [CGPoint] {
var points: [CGPoint] = []
var y = 0
while y < height {
var x = 0
while x < width {
let i = (y * width + x) * 4
let r = Double(pixels[i]), g = Double(pixels[i + 1]), b = Double(pixels[i + 2])
let mx = max(r, g, b), mn = min(r, g, b)
let sat = mx > 0 ? (mx - mn) / mx : 0
if sat > threshold && mx > minBrightness { points.append(CGPoint(x: x, y: y)) }
x += gridStep
}
y += gridStep
}
return points
}
}