Files
ten31-transcripts/Ten31Transcripts/Session/SpeakersFile.swift
T
Grant Gilliam 863136aeec Phases 2-6: detection, visual timeline, backend hand-off, voiceprints
Phase 2 (call detection): CallDetector using CoreAudio per-process mic
attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet,
ignoring our own recording; auto-record toggle. Built; pending live multi-app
confirmation by the user.

Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation,
TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema
1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR +
saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver
(window capture; frames released, never saved; minimized->visual_gap, idle != gap).
Synthetic-frame tested; adapter geometry pending real Signal fixtures + live
VisualObserver validation.

Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential,
TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline
slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated
END-TO-END against the live backend (chunk -> label-merge -> speakers.json).

Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named
fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status,
auto-send toggle (default off) + self-name setting.

All adversarial-review findings fixed. App + XCTest suite build; tests pass.
2026-06-06 00:15:49 -05:00

46 lines
1.3 KiB
Swift

import Foundation
/// `speakers.json` the final stored output (docs §6): per-chunk `label-merge`
/// results concatenated, timestamps offset back to global seconds, names unified.
/// This is the hand-off to the downstream summarizer; the app stops here.
struct SpeakersFile: Codable {
let sessionId: String
let app: String
let durationSec: Double
let speakers: [Speaker]
let segments: [Segment]
let models: [String: String]
struct Speaker: Codable, Equatable {
let name: String
let source: String
let overlapConfidence: Double?
let matchSimilarity: Double?
enum CodingKeys: String, CodingKey {
case name, source
case overlapConfidence = "overlap_confidence"
case matchSimilarity = "match_similarity"
}
}
struct Segment: Codable, Equatable {
let start: Double
let end: Double
let speaker: String
let text: String?
}
enum CodingKeys: String, CodingKey {
case sessionId = "session_id"
case app
case durationSec = "duration_sec"
case speakers, segments, models
}
func write(to url: URL) throws {
let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
try encoder.encode(self).write(to: url)
}
}