The app shipped with certificate validation bypassed globally and on by default: InsecureTrustDelegate trusted any cert from any host. That was the evaluation's P1: anyone on the LAN could MITM call audio, transcripts, and voiceprints.

The backend's Start9 cert already validates under normal system trust when the StartOS Root CA is installed in the keychain (confirmed: URLSession default validation returns 200 against the backend and its fallback), so the bypass is unnecessary:

- skip-TLS now defaults to off
- when explicitly enabled, the bypass is scoped to the configured host via InsecureTrustDelegate.allowsTrustOverride, never "trust any server"
- the host gate is pure and unit-tested (InsecureTrustDelegateTests)

Docs reconciled: AGENTS.md backend/TLS line and Current state.
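The host gate described in this commit message can be illustrated with a minimal sketch. The function name mirrors InsecureTrustDelegate.allowsTrustOverride, but the parameter list, the URL-based configuration, and the case-insensitive comparison are assumptions for illustration; the real signature is not shown in this document.

```swift
import Foundation

/// Hypothetical pure host gate: returns true only when skip-TLS is explicitly
/// enabled AND the challenged host equals the host of the configured backend.
/// The real InsecureTrustDelegate.allowsTrustOverride may take different parameters.
func allowsTrustOverride(challengeHost: String,
                         configuredBackend: URL?,
                         skipTLSEnabled: Bool) -> Bool {
    // Off by default: without the explicit opt-in nothing is overridden.
    guard skipTLSEnabled else { return false }
    // No configured host means there is nothing to scope the bypass to.
    guard let configuredHost = configuredBackend?.host, !configuredHost.isEmpty else {
        return false
    }
    // Hostnames are case-insensitive, so compare normalized forms (assumption).
    return challengeHost.lowercased() == configuredHost.lowercased()
}
```

A URLSession delegate would presumably call a gate like this with `challenge.protectionSpace.host` and fall back to `.performDefaultHandling` whenever it returns false, so every other host still goes through normal system trust.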
AGENTS.md — Ten31 Transcripts
Native macOS menu-bar app that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted SparkControl backend that does transcription/diarization/naming — producing named transcripts and recaps.
Stack (versions that matter)
- Swift 5.0, SwiftUI + AppKit, macOS 13.0 deployment target. LSUIElement (menu-bar only, no Dock icon).
- Project is generated by XcodeGen from project.yml (brew install xcodegen). *.xcodeproj is gitignored: regenerate, don't edit.
- Full Xcode lives at /Applications/Xcode.app, but xcode-select points at CommandLineTools → set DEVELOPER_DIR for every xcodebuild.
- Bundle id xyz.ten31.transcripts; DEVELOPMENT_TEAM (Apple Team ID) is set in a gitignored Config/Signing.xcconfig (copy Config/Signing.xcconfig.example and set your team). Keep it stable: a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at $SPARK_BACKEND_URL (a private LAN backend, IP or .local host; Start9 self-signed cert. Install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, host-scoped escape hatch, off by default; see InsecureTrustDelegate). Resolution order: a value saved in Settings → SparkControl backend (UserDefaults) wins, else the SPARK_BACKEND_URL env var, else the placeholder default in AppSettings.swift. Diarization = Sortformer/TitaNet (mono-only, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible /v1/chat/completions; audio via /api/audio/label-merge.
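The backend resolution order above (saved Settings value, then the SPARK_BACKEND_URL env var, then the placeholder) can be sketched as a pure function. The function name, the `https://` scheme on the placeholder, and the choice to treat blank strings as unset are assumptions for illustration; the actual logic lives in AppSettings.swift and may differ.

```swift
import Foundation

/// Placeholder default. The host comes from this document; the scheme is an assumption.
let placeholderBackend = "https://your-spark-backend.local"

/// Hypothetical resolver: saved Settings value wins, else the env var, else the placeholder.
/// Both inputs are injected so the function stays pure and unit-testable.
func resolveBackendURL(savedValue: String?,
                       environment: [String: String]) -> String {
    // Treat nil, empty, and whitespace-only strings as "not set" (assumption).
    func nonEmpty(_ s: String?) -> String? {
        guard let t = s?.trimmingCharacters(in: .whitespacesAndNewlines),
              !t.isEmpty else { return nil }
        return t
    }
    if let saved = nonEmpty(savedValue) { return saved }
    if let env = nonEmpty(environment["SPARK_BACKEND_URL"]) { return env }
    return placeholderBackend
}
```

In the app this would presumably be fed from `UserDefaults.standard` and `ProcessInfo.processInfo.environment`. Note that a saved value always shadows the env var, so clearing the Settings field is the only way to make the env var take effect.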
Commands
First time on a machine — create the local signing config (else xcodegen generate/signing won't find a team):
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
Regenerate the Xcode project (after adding/removing/renaming any source file):
xcodegen generate
Build + run all tests:
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
Run a single test (target/class/method):
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
-only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
Build only: replace test with build. Lint/format: none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in ROADMAP.md.
Build a standalone app and install/run it (Xcode does not need to stay open):
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
Fast validation harness (preferred for visual/backend logic): compile the specific Ten31Transcripts/**.swift files plus a main.swift with xcrun --sdk macosx swiftc -O ... main.swift -o x and run against real fixtures (example-screenshots/) or saved sessions. Top-level code must live in the file literally named main.swift.
Layout (day one)
- Ten31Transcripts/App/: @main entry + AppDelegate.
- Ten31Transcripts/Session/: SessionController (state machine), TranscriptPipeline, SessionPackager (chunking), TranscriptAssembler, SpeakerReconciler, ChunkPlan (ChunkMode), SpeakersFile.
- Ten31Transcripts/Visual/: VisualCapture / VisualObserver (ScreenCaptureKit, ~3fps), GridCallAnalyzer (+ FrameSampler, TextRecognizer, TimelineBuilder, VisualTimeline, SpeakerObservation).
- Ten31Transcripts/Adapters/: per-app screen-readers (MeetAdapter, ZoomAdapter, TeamsAdapter, SignalAdapter) + AdapterRegistry.
- Ten31Transcripts/Audio/: AudioRecorder, MicVAD, ChannelSelfVAD.
- Ten31Transcripts/Backend/: SparkControlClient, GatewayLLMClient, VoiceprintStore, SparkControlHealth, InsecureTrustDelegate (TLS skip).
- Ten31Transcripts/Recap/: RecapAnalyzer, RecapRenderer (writes transcript.md + recap.html), RecapModels, RecapTemplate, SpeakerEditing, RecapEditModel.
- Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/: CallDetector; PermissionsManager; AppSettings (UserDefaults); SwiftUI views + AppKit window hosts; Info.plist + entitlements.
- Ten31TranscriptsTests/: XCTest.
- example-screenshots/: real fixtures (gitignored).
- docs/, README.md.
- Runtime output (default ~/Ten31Transcripts/sessions/<ts>_<app>/, configurable in Settings): mic.wav, system.wav, mixed_mono_16k.wav, self_vad.json, visual_timeline.json, speakers.json (output), cluster_fingerprints.json, recap.{html,json}, transcript.md.
Conventions
- Match the surrounding file's style; small reviewable diffs; comments explain why, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote origin (branch main, over SSH) after committing; the remote URL lives in .git/config, kept out of source. Branch before committing; never commit to main without asking.
- Never commit recordings, transcripts, screenshots, or the generated *.xcodeproj.
- No API keys/tokens/passwords in the repo. The backend host ($SPARK_BACKEND_URL) and the Apple Team ID (Config/Signing.xcconfig, gitignored) are kept out of source; real values live in Settings/UserDefaults and the local xcconfig. Build env vars: DEVELOPER_DIR (required) and optional SPARK_BACKEND_URL.
- Git history scrubbed (2026-06-13): the private backend host + LAN IP were purged from all commits via git filter-repo (replaced with the your-spark-backend.local placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: ../ten31-transcripts-prehistory-rewrite.bundle. The Apple Team ID was intentionally not scrubbed (it's public in every signed binary), so don't re-flag it.
Always
- Set DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer on every xcodebuild.
- Run xcodegen generate after adding/removing/renaming source files.
- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
- Identify self by the mic channel + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
- Treat visual active-speaker cues as naming hints over audio diarization (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
- Send the backend dual-channel (mic_file + system_file) when the system track is healthy, else the mono mixed_mono_16k.wav; keep backend calls sequential (one in flight).
- After any code change, rebuild Release + ditto to /Applications: the installed copy does not auto-update.
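The dual-channel versus mono rule in the list above can be sketched as a small decision function. `UploadPlan`, `makeUploadPlan`, and the `systemTrackHealthy` flag are hypothetical names; how the app actually judges track health is not described here.

```swift
import Foundation

/// Which files to hand to the backend for one session (hypothetical type).
enum UploadPlan: Equatable {
    case dualChannel(micFile: String, systemFile: String) // sent as mic_file + system_file
    case mono(file: String)                               // the pre-mixed 16 kHz mono track
}

/// Sketch: prefer dual-channel when the system track is healthy, else fall back to mono.
/// `systemTrackHealthy` is an assumed input; the health check itself is out of scope.
func makeUploadPlan(systemTrackHealthy: Bool) -> UploadPlan {
    systemTrackHealthy
        ? .dualChannel(micFile: "mic.wav", systemFile: "system.wav")
        : .mono(file: "mixed_mono_16k.wav")
}
```

Keeping the choice in one pure function makes the fallback easy to unit-test offline, in line with the convention that pure logic is tested without the backend.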
Never
- Never write video frames to disk — analyze in-memory and release immediately (privacy non-negotiable).
- Never add Co-Authored-By / "Generated with" / any AI or tool attribution to commits or PRs.
- Never commit secrets, recordings, transcripts, or example-screenshots/ (faces + contact names).
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ); channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker: the real cue is a thin hollow coloured ring; require thin-edge + hue gate (see GridCallAnalyzer.isHollow, FrameSampler.thinColoredPoints).
- Never collapse adjacent same-speaker transcript segments (reverted by request); keep one line per diarized utterance.
- Never send call audio to a raw IP the user didn't configure. The backend host ($SPARK_BACKEND_URL) is a private .local mDNS name a plain swiftc binary can't resolve via URLSession (-1009), so use the real app for backend runs (or curl for health checks).
- Never commit to main or force-push a shared branch; branch first and ask.
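The "thin hollow ring, not solid fill" rule above can be illustrated with a rough sketch over a boolean colour mask. This is not the logic in GridCallAnalyzer.isHollow or FrameSampler.thinColoredPoints; the function name, the mask representation, the band width, and both thresholds are assumptions chosen only to show the idea.

```swift
/// Sketch of a hollow-ring test over one tile's colour mask.
/// `mask[row][col]` is true where a pixel passed some hue gate (assumed computed elsewhere).
/// `band` is the assumed ring thickness in pixels; thresholds are illustrative, not tuned values.
func looksLikeHollowRing(mask: [[Bool]],
                         band: Int = 2,
                         minEdgeFill: Double = 0.6,
                         maxInteriorFill: Double = 0.1) -> Bool {
    let rows = mask.count
    // Require a positive band, a rectangular mask, and a non-empty interior.
    guard band > 0, rows > 2 * band,
          let cols = mask.first?.count, cols > 2 * band,
          mask.allSatisfy({ $0.count == cols }) else { return false }

    var edgeHits = 0, edgeTotal = 0, innerHits = 0, innerTotal = 0
    for r in 0..<rows {
        for c in 0..<cols {
            let onEdge = r < band || r >= rows - band || c < band || c >= cols - band
            if onEdge {
                edgeTotal += 1
                if mask[r][c] { edgeHits += 1 }
            } else {
                innerTotal += 1
                if mask[r][c] { innerHits += 1 }
            }
        }
    }
    let edgeFill = Double(edgeHits) / Double(edgeTotal)
    let innerFill = Double(innerHits) / Double(innerTotal)
    // A ring is coloured along the thin edge but empty inside;
    // a solid camera-off fill fails the interior check.
    return edgeFill >= minEdgeFill && innerFill <= maxInteriorFill
}
```

Requiring both conditions is what favours sparse-but-correct detection: a saturated avatar tile passes the edge test but is rejected by the interior test.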
Current state
Present tense; overwritten each session. 73 tests pass; /Applications/Ten31Transcripts.app matches HEAD and runs; working tree clean and pushed to origin/main. A full independent evaluation ran 2026-06-13 → EVALUATION.md (committed at repo root; overwritten + re-committed each run for a reviewable diff); its findings are triaged into the lists below. The eval's P1 (TLS) is now fixed and verified against the live backend.
- Working: call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (transcript.md + recap-relay-styled recap.html), speaker editor, configurable chunk length, standalone Settings window.
- In progress: the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end: no clean run exists yet; the saved Meet session's visual_timeline.json predates the fix.
- Done this session (was eval P1): TLS validation is now on by default and the skip-TLS escape hatch is scoped to the configured host (InsecureTrustDelegate.allowsTrustOverride, covered by InsecureTrustDelegateTests). Supported path = the StartOS Root CA trusted in the System keychain; verified URLSession default validation returns 200 against both 192.0.2.1 and the 192.0.2.2 fallback.
- Work queue (next up): wire the backend URL + primary→fallback into config. Today it's a single backendBaseURL with no fallback logic, and on this Mac no value is saved (so it resolves to the your-spark-backend.local placeholder); the real setup is primary https://192.0.2.1:62419 → fallback https://192.0.2.2:62419.
- Known debt (P2, fix before wider use):
  - RecapAnalyzer.mmss() fatally crashes on NaN/∞ (reproduced 2×); a malformed/MITM'd backend duration (e.g. 1e400 → Double.infinity) aborts the app at recap-render time. Add a finite-guard fallback (RecapAnalyzer.swift:137).
  - README is stale by six phases: it still says "Phase 0 (scaffold) / no audio capture, detection, or backend hand-off yet" for a shipped Phase-6 app; same lie in source comment AppSettings.swift:7; and README.md:49 still calls skip-TLS "on by default" (now off). Rewrite to match reality.
  - SessionController (670 lines, the most concurrency-dense file) has zero unit tests. Cover pendingAutoStop (auto-start-then-immediate-call-end) and the visual-adoption generation guard before any refactor.
- Deferred (P3, later decision or bulk cleanup; full evidence in EVALUATION.md): docs/ specs drifted from the dual-channel API + recap phase; docs/01 §7 lists already-resolved open items; docs/02 §2.10 claims MenuBarUI features that don't exist; AGENTS.md Layout listings under Audio/ / Detection/ are incomplete; the manifest.json sha256 contract is specced but never written; env-var precedence footgun (saved URL shadows SPARK_BACKEND_URL); SessionController owns three jobs (extract the open-panel UI); unused NSAppleEventsUsageDescription; unauthenticated LAN backend (consider a shared bearer token).
- Known bugs: Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
- Next (product validation; no agent could reach the live backend, so this stays manual): (1) re-process the saved Meet session in the app, then read its speakers.json + cluster_fingerprints.json to confirm ~4 speakers recover; (2) record a fresh Meet call to validate the visual fix on a clean capture. (The old "confirm Your name = Grant" item is moot: the committed default is the generic "Me"; "Grant" only ever lives in local UserDefaults.)
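The finite-guard fallback called for in the P2 debt item on RecapAnalyzer.mmss() could look roughly like the sketch below. The free-function signature, the "--:--" fallback string, and the clamp value are assumptions; the real method at RecapAnalyzer.swift:137 may be shaped differently.

```swift
import Foundation

/// Sketch of a finite-guarded mm:ss formatter.
/// The "--:--" fallback and the clamp are illustrative assumptions,
/// not the project's actual RecapAnalyzer.mmss() behaviour.
func mmss(_ seconds: Double) -> String {
    // Int(_:) traps on NaN and ±infinity, so reject those (and negatives) first.
    guard seconds.isFinite, seconds >= 0 else { return "--:--" }
    // Clamp so very large but finite values cannot overflow Int either.
    let total = Int(min(seconds, 359_999))
    return String(format: "%02d:%02d", total / 60, total % 60)
}
```

With a guard like this, a backend `duration` that decodes to `Double.infinity` renders as a placeholder instead of aborting the app at recap-render time.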