AGENTS.md — Ten31 Transcripts

Native macOS menu-bar app that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted SparkControl backend that does transcription/diarization/naming — producing named transcripts and recaps.

Stack (versions that matter)

  • Swift 5.0, SwiftUI + AppKit, macOS 13.0 deployment target. LSUIElement (menu-bar only, no Dock icon).
  • Project is generated by XcodeGen from project.yml (brew install xcodegen). *.xcodeproj is gitignored — regenerate, don't edit.
  • Full Xcode lives at /Applications/Xcode.app, but xcode-select points at CommandLineTools → set DEVELOPER_DIR for every xcodebuild.
  • Bundle id xyz.ten31.transcripts; DEVELOPMENT_TEAM (Apple Team ID) is set in a gitignored Config/Signing.xcconfig (copy Config/Signing.xcconfig.example and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
  • Backend: SparkControl gateway at $SPARK_BACKEND_URL, a private LAN backend (IP or .local host) behind a Start9 self-signed cert.
    • TLS: install the StartOS Root CA in the System keychain so normal TLS validation succeeds. Skip-TLS is an opt-in, host-scoped escape hatch, off by default (see InsecureTrustDelegate).
    • Resolution order: a value saved in Settings → SparkControl backend (UserDefaults) wins, else the SPARK_BACKEND_URL env var, else the placeholder default in AppSettings.swift.
    • Services: diarization = Sortformer/TitaNet (mono-only, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible /v1/chat/completions; audio via /api/audio/label-merge.
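
The backend URL resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code in AppSettings.swift: the UserDefaults key, the placeholder's scheme, and the choice to treat a blank saved value as unset are all assumptions.

```swift
import Foundation

// Hypothetical names: the real key and default live in AppSettings.swift.
let savedURLKey = "backendBaseURL"                       // assumed UserDefaults key
let placeholderURL = "https://your-spark-backend.local"  // assumed placeholder form

/// Resolves the backend URL: saved setting, then env var, then placeholder.
func resolveBackendURL(saved: String?, environment: [String: String]) -> String {
    // 1. A non-blank value saved in Settings wins (blank = unset is an assumption).
    if let saved = saved?.trimmingCharacters(in: .whitespaces), !saved.isEmpty {
        return saved
    }
    // 2. Otherwise fall back to the SPARK_BACKEND_URL environment variable.
    if let env = environment["SPARK_BACKEND_URL"], !env.isEmpty {
        return env
    }
    // 3. Otherwise use the placeholder default.
    return placeholderURL
}
```

A caller would likely pass `UserDefaults.standard.string(forKey: savedURLKey)` and `ProcessInfo.processInfo.environment`. Keeping the inputs as parameters makes the order testable offline, and it makes visible that a saved value always shadows the env var.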

Commands

First time on a machine, create the local signing config; otherwise xcodegen generate and code signing won't find a team:

cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM

Regenerate the Xcode project (after adding/removing/renaming any source file):

xcodegen generate

Build + run all tests:

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd

Run a single test (target/class/method):

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine

Build only: replace test with build.

Lint/format: none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in ROADMAP.md.

Build a standalone app and install/run it (Xcode does not need to stay open):

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app

Fast validation harness (preferred for visual/backend logic): compile the specific Ten31Transcripts/**.swift files plus a main.swift with xcrun --sdk macosx swiftc -O ... main.swift -o x and run against real fixtures (example-screenshots/) or saved sessions. Top-level code must live in the file literally named main.swift.
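
As a rough illustration of the harness shape, a main.swift might look like the sketch below. The `cosine` function here is a stand-in written for this example, not the project's implementation; a real harness would compile the relevant project files next to main.swift and call their functions directly.

```swift
// main.swift: top-level code is only allowed in the file with this exact name.
import Foundation

/// Stand-in for project logic (hypothetical); a real harness would compile the
/// project's own Swift files alongside this one instead of redefining anything.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have equal length")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = normA.squareRoot() * normB.squareRoot()
    return denom == 0 ? 0 : dot / denom   // treat zero vectors as dissimilar
}

// Top-level statements run in order when the compiled binary starts.
let parallel = cosine([1, 2, 3], [2, 4, 6])
let orthogonal = cosine([1, 0], [0, 1])
print("parallel:", parallel, "orthogonal:", orthogonal)
```

Printing intermediate values and comparing them by eye against fixtures is usually enough at this stage; anything worth keeping belongs in an XCTest case afterwards.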

Layout (day one)

  • Ten31Transcripts/App/: @main entry + AppDelegate.
  • Ten31Transcripts/Session/: SessionController (state machine), TranscriptPipeline, SessionPackager (chunking), TranscriptAssembler, SpeakerReconciler, ChunkPlan (ChunkMode), SpeakersFile.
  • Ten31Transcripts/Visual/: VisualCapture/VisualObserver (ScreenCaptureKit, ~3fps), GridCallAnalyzer (+ FrameSampler, TextRecognizer, TimelineBuilder, VisualTimeline, SpeakerObservation).
  • Ten31Transcripts/Adapters/: per-app screen-readers (MeetAdapter, ZoomAdapter, TeamsAdapter, SignalAdapter) + AdapterRegistry.
  • Ten31Transcripts/Audio/: AudioRecorder, MicVAD, ChannelSelfVAD.
  • Ten31Transcripts/Backend/: SparkControlClient, GatewayLLMClient, VoiceprintStore, SparkControlHealth, InsecureTrustDelegate (TLS skip).
  • Ten31Transcripts/Recap/: RecapAnalyzer, RecapRenderer (writes transcript.md + recap.html), RecapModels, RecapTemplate, SpeakerEditing, RecapEditModel.
  • Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/: respectively CallDetector; PermissionsManager; AppSettings (UserDefaults); SwiftUI views + AppKit window hosts; Info.plist + entitlements.
  • Ten31TranscriptsTests/: XCTest. example-screenshots/: real fixtures (gitignored). docs/, README.md.
  • Runtime output (default ~/Ten31Transcripts/sessions/<ts>_<app>/, configurable in Settings): mic.wav, system.wav, mixed_mono_16k.wav, self_vad.json, visual_timeline.json, speakers.json (output), cluster_fingerprints.json, recap.{html,json}, transcript.md.

Conventions

  • Match the surrounding file's style; small reviewable diffs; comments explain why, not what.
  • Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
  • Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote origin (branch main, over SSH) after committing; the remote URL lives in .git/config, kept out of source. Branch before committing; never commit to main without asking.
  • Never commit recordings, transcripts, screenshots, or the generated *.xcodeproj.
  • No API keys/tokens/passwords in the repo. The backend host ($SPARK_BACKEND_URL) and the Apple Team ID (Config/Signing.xcconfig, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: DEVELOPER_DIR (required) and optional SPARK_BACKEND_URL.
  • Git history scrubbed (2026-06-13): the private backend host + LAN IP were purged from all commits via git filter-repo (replaced with the your-spark-backend.local placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: ../ten31-transcripts-prehistory-rewrite.bundle. A second rewrite the same day purged two backend LAN IPs that had slipped into a docs/test commit, replacing them with RFC 5737 documentation IPs (192.0.2.1/192.0.2.2) and force-pushing; 0 hits across refs; backup bundle ../ten31-transcripts-pre-ip-scrub.bundle. The Apple Team ID was intentionally not scrubbed (it's public in every signed binary) — don't re-flag it.

Always

  • Set DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer on every xcodebuild.
  • Run xcodegen generate after adding/removing/renaming source files.
  • Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
  • Identify self by the mic channel + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
  • Treat visual active-speaker cues as naming hints over audio diarization (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
  • Send the backend dual-channel (mic_file + system_file) when the system track is healthy, else the mono mixed_mono_16k.wav; keep backend calls sequential (one in flight).
  • After any code change, rebuild Release + ditto to /Applications — the installed copy does not auto-update.
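
The dual-channel versus mono rule and the one-in-flight rule above could be expressed along these lines. This is a sketch with illustrative type and function names, not the app's own types. How the app decides that the system track is healthy is not stated here, so it appears as a plain Bool input.

```swift
import Foundation

/// What to upload for one session. Names are illustrative, not the app's types.
enum UploadPayload: Equatable {
    case dualChannel(mic: String, system: String)  // mic_file + system_file
    case mono(mixed: String)                       // mixed_mono_16k.wav
}

/// Picks dual-channel when the system track is healthy, else the mono mix.
func choosePayload(systemTrackHealthy: Bool) -> UploadPayload {
    systemTrackHealthy
        ? .dualChannel(mic: "mic.wav", system: "system.wav")
        : .mono(mixed: "mixed_mono_16k.wav")
}

/// Runs uploads strictly one at a time: each call finishes before the next starts.
func uploadSequentially(_ payloads: [UploadPayload],
                        send: (UploadPayload) async throws -> Void) async rethrows {
    for payload in payloads {
        try await send(payload)   // awaiting here keeps a single request in flight
    }
}
```

Awaiting inside a plain `for` loop is the simplest way to guarantee sequencing; a task group would introduce exactly the concurrency this rule forbids.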

Never

  • Never write video frames to disk — analyze in-memory and release immediately (privacy non-negotiable).
  • Never add Co-Authored-By / "Generated with" / any AI or tool attribution to commits or PRs.
  • Never commit secrets, recordings, transcripts, or example-screenshots/ (faces + contact names).
  • Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
  • Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin hollow coloured ring; require thin-edge + hue gate (see GridCallAnalyzer.isHollow, FrameSampler.thinColoredPoints).
  • Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
  • Never send call audio to a raw IP the user didn't configure. The backend host ($SPARK_BACKEND_URL) is a private .local mDNS name that a plain swiftc binary can't resolve via URLSession (error -1009), so use the real app for backend runs (or curl for health checks).
  • Never commit to main or force-push a shared branch; branch first and ask.
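
To make the hollow-ring rule above concrete, here is one possible shape for a thin-edge gate. It is a sketch only and is not taken from GridCallAnalyzer.isHollow or FrameSampler.thinColoredPoints: the function name, band width, and thresholds are guesses, and the input is assumed to be a per-tile mask of pixels that already passed a hue gate.

```swift
/// `mask[y][x]` is true where a pixel already passed a hue/saturation gate.
/// Band width and thresholds are illustrative guesses, not the app's values.
func looksLikeHollowRing(_ mask: [[Bool]], band: Int = 2,
                         minEdgeFill: Double = 0.6,
                         maxInteriorFill: Double = 0.1) -> Bool {
    let h = mask.count
    guard band > 0, h > 2 * band, let w = mask.first?.count, w > 2 * band,
          mask.allSatisfy({ $0.count == w }) else { return false }
    var edgeHits = 0, edgeTotal = 0, innerHits = 0, innerTotal = 0
    for y in 0..<h {
        for x in 0..<w {
            // A pixel is on the edge if it lies within `band` pixels of any side.
            let onEdge = y < band || y >= h - band || x < band || x >= w - band
            if onEdge {
                edgeTotal += 1
                if mask[y][x] { edgeHits += 1 }
            } else {
                innerTotal += 1
                if mask[y][x] { innerHits += 1 }
            }
        }
    }
    let edgeFill = Double(edgeHits) / Double(edgeTotal)
    let innerFill = Double(innerHits) / Double(innerTotal)
    // A ring is coloured along the border and mostly empty inside;
    // a solid camera-off tile fails the interior check.
    return edgeFill >= minEdgeFill && innerFill <= maxInteriorFill
}
```

Requiring both conditions reflects the preference for sparse-but-correct detection: a tile that is coloured everywhere is rejected even though its border is fully lit.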

Current state

Present tense; overwritten each session. 73 tests pass; /Applications/Ten31Transcripts.app matches HEAD and runs; working tree clean and pushed to origin/main. A full independent evaluation ran 2026-06-13 → EVALUATION.md (committed at repo root; overwritten + re-committed each run for a reviewable diff); its findings are triaged into the lists below. The eval's P1 (TLS) is now fixed and verified against the live backend.

  • Working: call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (transcript.md + recap-relay-styled recap.html), speaker editor, configurable chunk length, standalone Settings window.
  • In progress: the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's visual_timeline.json predates the fix.
  • Done this session (was eval P1): TLS validation is now on by default and the skip-TLS escape hatch is scoped to the configured host (InsecureTrustDelegate.allowsTrustOverride, covered by InsecureTrustDelegateTests). Supported path = the StartOS Root CA trusted in the System keychain; verified URLSession default validation returns 200 against both the primary backend IP and its fallback.
  • Work queue (next up): wire the backend URL + primary→fallback into config. Today it's a single backendBaseURL with no fallback logic, and on this Mac no value is saved (so it resolves to the your-spark-backend.local placeholder); the real setup is a primary LAN IP with a fallback IP (both port 62419) — the actual addresses live in Settings/UserDefaults, never source.
  • Known debt (P2 — fix before wider use):
    • RecapAnalyzer.mmss() fatally crashes on NaN/∞ (reproduced 2×); a malformed/MITM'd backend duration (e.g. 1e400 → Double.infinity) aborts the app at recap-render time. Fix: add a finite-guard fallback (RecapAnalyzer.swift:137).
    • README is stale by six phases — still says "Phase 0 (scaffold) / no audio capture, detection, or backend hand-off yet" for a shipped Phase-6 app; same lie in source comment AppSettings.swift:7; and README.md:49 still calls skip-TLS "on by default" (now off). Rewrite to match reality.
    • SessionController (670 lines, the most concurrency-dense file) has zero unit tests — cover pendingAutoStop (auto-start-then-immediate-call-end) and the visual-adoption generation guard before any refactor.
  • Deferred (P3, later decision or bulk cleanup; full evidence in EVALUATION.md):
    • docs/ specs drifted from the dual-channel API + recap phase.
    • docs/01 §7 lists already-resolved open items.
    • docs/02 §2.10 claims MenuBarUI features that don't exist.
    • AGENTS.md Layout listings under Audio/ and Detection/ are incomplete.
    • The manifest.json sha256 contract is specced but never written.
    • Env-var precedence footgun (saved URL shadows SPARK_BACKEND_URL).
    • SessionController owns three jobs (extract the open-panel UI).
    • Unused NSAppleEventsUsageDescription.
    • Unauthenticated LAN backend (consider a shared bearer token).
  • Known bugs: Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
  • Next (product validation — no agent could reach the live backend, so this stays manual): (1) re-process the saved Meet session in the app, then read its speakers.json + cluster_fingerprints.json to confirm ~4 speakers recover; (2) record a fresh Meet call to validate the visual fix on a clean capture. (The old "confirm Your name = Grant" item is moot — the committed default is the generic "Me"; "Grant" only ever lives in local UserDefaults.)
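
For the mmss() crash listed under known debt, a finite-guard fallback might look like the sketch below. The signature and the "--:--" placeholder are assumptions; the real function at RecapAnalyzer.swift:137 may take different inputs. The underlying issue is that converting a NaN or infinite Double to Int traps at runtime.

```swift
import Foundation

/// Formats seconds as "mm:ss". Non-finite, negative, or out-of-range input
/// returns a placeholder instead of trapping in the Double-to-Int conversion.
/// Signature and fallback string are hypothetical.
func mmss(_ seconds: Double, fallback: String = "--:--") -> String {
    guard seconds.isFinite, seconds >= 0, seconds < Double(Int.max) else {
        return fallback
    }
    let total = Int(seconds)   // safe: finite and within Int range, truncates
    return String(format: "%02ld:%02ld", total / 60, total % 60)
}
```

For example, `mmss(75)` yields "01:15", while `mmss(.infinity)` and `mmss(.nan)` yield the fallback. A unit test covering those three inputs would pin the behaviour before any refactor.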