Grant Gilliam 3629dbdaaa Default TLS validation on; scope skip-TLS bypass to the configured host

The app shipped with certificate validation bypassed globally and on by
default — InsecureTrustDelegate trusted any cert from any host. That was
the evaluation's P1: anyone on the LAN could MITM call audio, transcripts,
and voiceprints.

The backend's Start9 cert already validates under normal system trust when
the StartOS Root CA is installed in the keychain (confirmed: URLSession
default validation returns 200 against the backend and its fallback), so the
bypass is unnecessary:
- skip-TLS now defaults to off
- when explicitly enabled, the bypass is scoped to the configured host via
  InsecureTrustDelegate.allowsTrustOverride, never "trust any server"
- the host gate is pure and unit-tested (InsecureTrustDelegateTests)
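
A minimal sketch of what a pure host gate of this kind could look like. The function signature and parameter names here are assumptions for illustration; the real InsecureTrustDelegate.allowsTrustOverride may take different inputs.

```swift
/// Hypothetical pure host gate: decides whether a TLS trust override may apply.
/// No networking or global state, so it can be unit-tested offline.
func allowsTrustOverride(challengeHost: String,
                         configuredHost: String?,
                         skipTLSEnabled: Bool) -> Bool {
    // Off by default: with the toggle disabled, nothing is ever overridden.
    guard skipTLSEnabled,
          let configured = configuredHost,
          !configured.isEmpty else {
        return false
    }
    // Hostnames are case-insensitive. Compare exactly: no suffix or
    // wildcard matching, so "trust any server" is impossible.
    return challengeHost.lowercased() == configured.lowercased()
}
```

With a gate shaped like this, a challenge from any host other than the configured one falls through to normal system validation even when skip-TLS is enabled.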

Docs reconciled: AGENTS.md backend/TLS line and Current state.

2026-06-13 16:02:57 -05:00

AGENTS.md — Ten31 Transcripts

Native macOS menu-bar app that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted SparkControl backend that does transcription/diarization/naming — producing named transcripts and recaps.

Stack (versions that matter)

Swift 5.0, SwiftUI + AppKit, macOS 13.0 deployment target. LSUIElement (menu-bar only, no Dock icon).
Project is generated by XcodeGen from project.yml (brew install xcodegen). *.xcodeproj is gitignored — regenerate, don't edit.
Full Xcode lives at /Applications/Xcode.app, but xcode-select points at CommandLineTools → set DEVELOPER_DIR for every xcodebuild.
Bundle id xyz.ten31.transcripts; DEVELOPMENT_TEAM (Apple Team ID) is set in a gitignored Config/Signing.xcconfig (copy Config/Signing.xcconfig.example and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
Backend: SparkControl gateway at $SPARK_BACKEND_URL (a private LAN backend — IP or .local host; Start9 self-signed cert. Install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, host-scoped escape hatch, off by default — see InsecureTrustDelegate). Resolution order: a value saved in Settings → SparkControl backend (UserDefaults) wins, else the SPARK_BACKEND_URL env var, else the placeholder default in AppSettings.swift. Diarization = Sortformer/TitaNet (mono-only, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible /v1/chat/completions; audio via /api/audio/label-merge.
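
The resolution order above can be sketched as a small pure function. This is an illustration only: the function name, the https scheme on the placeholder, and the trimming of whitespace are assumptions, not the code in AppSettings.swift.

```swift
import Foundation

// Assumed form of the placeholder default; the real constant lives in AppSettings.swift.
let placeholderBackendURL = "https://your-spark-backend.local"

/// Hypothetical resolver: saved Settings value, then SPARK_BACKEND_URL, then placeholder.
func resolveBackendURL(saved: String?, environment: [String: String]) -> String {
    // 1. A non-empty value saved in Settings (UserDefaults) always wins.
    if let saved = saved?.trimmingCharacters(in: .whitespacesAndNewlines),
       !saved.isEmpty {
        return saved
    }
    // 2. Otherwise fall back to the environment variable, if set.
    if let env = environment["SPARK_BACKEND_URL"], !env.isEmpty {
        return env
    }
    // 3. Otherwise use the placeholder default.
    return placeholderBackendURL
}
```

In the app this would be called with the saved setting and ProcessInfo.processInfo.environment. Note that under this order a saved value shadows SPARK_BACKEND_URL, so changing the env var has no effect until the saved value is cleared.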

Commands

First time on a machine: create the local signing config (otherwise xcodegen generate and code signing won't find a team):

cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM

Regenerate the Xcode project (after adding/removing/renaming any source file):

xcodegen generate

Build + run all tests:

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd

Run a single test (target/class/method):

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine

Build only: replace test with build.

Lint/format: none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in ROADMAP.md.

Build a standalone app and install/run it (Xcode does not need to stay open):

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app

Fast validation harness (preferred for visual/backend logic): compile the specific Ten31Transcripts/**.swift files plus a main.swift with xcrun --sdk macosx swiftc -O ... main.swift -o x and run against real fixtures (example-screenshots/) or saved sessions. Top-level code must live in the file literally named main.swift.

Layout (day one)

Ten31Transcripts/App/ — @main entry + AppDelegate.
Ten31Transcripts/Session/ — SessionController (state machine), TranscriptPipeline, SessionPackager (chunking), TranscriptAssembler, SpeakerReconciler, ChunkPlan (ChunkMode), SpeakersFile.
Ten31Transcripts/Visual/ — VisualCapture/VisualObserver (ScreenCaptureKit, ~3fps), GridCallAnalyzer (+ FrameSampler, TextRecognizer, TimelineBuilder, VisualTimeline, SpeakerObservation).
Ten31Transcripts/Adapters/ — per-app screen-readers (MeetAdapter, ZoomAdapter, TeamsAdapter, SignalAdapter) + AdapterRegistry.
Ten31Transcripts/Audio/ — AudioRecorder, MicVAD, ChannelSelfVAD.
Ten31Transcripts/Backend/ — SparkControlClient, GatewayLLMClient, VoiceprintStore, SparkControlHealth, InsecureTrustDelegate (TLS skip).
Ten31Transcripts/Recap/ — RecapAnalyzer, RecapRenderer (writes transcript.md + recap.html), RecapModels, RecapTemplate, SpeakerEditing, RecapEditModel.
Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/ — CallDetector; PermissionsManager; AppSettings (UserDefaults); SwiftUI views + AppKit window hosts; Info.plist + entitlements.
Ten31TranscriptsTests/ — XCTest. example-screenshots/ — real fixtures (gitignored). docs/, README.md.
Runtime output (default ~/Ten31Transcripts/sessions/<ts>_<app>/, configurable in Settings): mic.wav, system.wav, mixed_mono_16k.wav, self_vad.json, visual_timeline.json, speakers.json (output), cluster_fingerprints.json, recap.{html,json}, transcript.md.

Conventions

Match the surrounding file's style; small reviewable diffs; comments explain why, not what.
Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote origin (branch main, over SSH) after committing; the remote URL lives in .git/config, kept out of source. Branch before committing; never commit to main without asking.
Never commit recordings, transcripts, screenshots, or the generated *.xcodeproj.
No API keys/tokens/passwords in the repo. The backend host ($SPARK_BACKEND_URL) and the Apple Team ID (Config/Signing.xcconfig, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: DEVELOPER_DIR (required) and optional SPARK_BACKEND_URL.
Git history scrubbed (2026-06-13): the private backend host + LAN IP were purged from all commits via git filter-repo (replaced with the your-spark-backend.local placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: ../ten31-transcripts-prehistory-rewrite.bundle. The Apple Team ID was intentionally not scrubbed (it's public in every signed binary) — don't re-flag it.

Always

Set DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer on every xcodebuild.
Run xcodegen generate after adding/removing/renaming source files.
Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
Identify self by the mic channel + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
Treat visual active-speaker cues as naming hints over audio diarization (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
Send the backend dual-channel (mic_file + system_file) when the system track is healthy, else the mono mixed_mono_16k.wav; keep backend calls sequential (one in flight).
After any code change, rebuild Release + ditto to /Applications — the installed copy does not auto-update.
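
The dual-channel rule in the list above can be sketched as a single decision point. The enum and function names are hypothetical; only the file names (mic.wav, system.wav, mixed_mono_16k.wav) come from the runtime output listed in Layout, and how the app judges the system track "healthy" is not shown here.

```swift
/// Hypothetical description of what gets uploaded for one backend call.
enum BackendUpload: Equatable {
    case dualChannel(micFile: String, systemFile: String)
    case mono(mixedFile: String)
}

/// Picks dual-channel when the system track is healthy, else the mono mix.
func chooseUpload(systemTrackHealthy: Bool) -> BackendUpload {
    if systemTrackHealthy {
        // Separate mic and system tracks let the backend treat the mic channel as self.
        return .dualChannel(micFile: "mic.wav", systemFile: "system.wav")
    } else {
        // A silent or broken system track would mislead the backend; send the mono mix.
        return .mono(mixedFile: "mixed_mono_16k.wav")
    }
}
```

Keeping the choice in one pure function makes it easy to cover in XCTest without touching the network.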

Never

Never write video frames to disk — analyze in-memory and release immediately (privacy non-negotiable).
Never add Co-Authored-By / "Generated with" / any AI or tool attribution to commits or PRs.
Never commit secrets, recordings, transcripts, or example-screenshots/ (faces + contact names).
Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin hollow coloured ring; require thin-edge + hue gate (see GridCallAnalyzer.isHollow, FrameSampler.thinColoredPoints).
Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
Never send call audio to a raw IP the user didn't configure. The backend host ($SPARK_BACKEND_URL) is a private .local mDNS name a plain swiftc binary can't resolve via URLSession (-1009) — use the real app for backend runs (or curl for health checks).
Never commit to main or force-push a shared branch; branch first and ask.
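
The camera-off avatar rule in the list above comes down to requiring colour on a thin edge but not in the interior. The following is a rough sketch of that idea only: the struct, the function name, and both thresholds are placeholders, and the real logic in GridCallAnalyzer.isHollow and FrameSampler.thinColoredPoints may work differently.

```swift
/// Hypothetical per-tile measurements, computed in memory from a sampled frame.
struct TileSample {
    /// Share of thin border-band pixels that pass the hue gate (0...1).
    let borderRingFraction: Double
    /// Share of interior pixels that pass the same hue gate (0...1).
    let interiorRingFraction: Double
}

/// Thresholds are illustrative placeholders, not tuned values from the project.
func looksLikeActiveSpeakerRing(_ tile: TileSample,
                                minBorder: Double = 0.6,
                                maxInterior: Double = 0.1) -> Bool {
    // A hollow ring colours the edge and leaves the inside alone. A solid
    // camera-off avatar colours both, so it fails the interior check.
    return tile.borderRingFraction >= minBorder
        && tile.interiorRingFraction <= maxInterior
}
```

Erring towards false here matches the stated preference for sparse-but-correct detection over dense-but-wrong.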

Current state

Present tense; overwritten each session. 73 tests pass; /Applications/Ten31Transcripts.app matches HEAD and runs; working tree clean and pushed to origin/main. A full independent evaluation ran 2026-06-13 → EVALUATION.md (committed at repo root; overwritten + re-committed each run for a reviewable diff); its findings are triaged into the lists below. The eval's P1 (TLS) is now fixed and verified against the live backend.

Working: call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (transcript.md + recap-relay-styled recap.html), speaker editor, configurable chunk length, standalone Settings window.
In progress: the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's visual_timeline.json predates the fix.
Done this session (was eval P1): TLS validation is now on by default and the skip-TLS escape hatch is scoped to the configured host (InsecureTrustDelegate.allowsTrustOverride, covered by InsecureTrustDelegateTests). Supported path = the StartOS Root CA trusted in the System keychain; verified URLSession default validation returns 200 against both 192.0.2.1 and the 192.0.2.2 fallback.
Work queue (next up): wire the backend URL + primary→fallback into config. Today it's a single backendBaseURL with no fallback logic, and on this Mac no value is saved (so it resolves to the your-spark-backend.local placeholder); the real setup is primary https://192.0.2.1:62419 → fallback https://192.0.2.2:62419.
Known debt (P2 — fix before wider use):
- RecapAnalyzer.mmss() fatally crashes on NaN/∞ (reproduced 2×); a malformed/MITM'd backend duration (e.g. 1e400 → Double.infinity) aborts the app at recap-render time — add a finite-guard fallback (RecapAnalyzer.swift:137).
- README is stale by six phases — still says "Phase 0 (scaffold) / no audio capture, detection, or backend hand-off yet" for a shipped Phase-6 app; same lie in source comment AppSettings.swift:7; and README.md:49 still calls skip-TLS "on by default" (now off). Rewrite to match reality.
- SessionController (670 lines, the most concurrency-dense file) has zero unit tests — cover pendingAutoStop (auto-start-then-immediate-call-end) and the visual-adoption generation guard before any refactor.
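
For the mmss() item above, a finite-guard fallback could look roughly like this. It is a standalone sketch, not the code at RecapAnalyzer.swift:137: the "--:--" fallback string and the clamp value are assumptions.

```swift
/// Formats seconds as MM:SS, returning a fallback instead of trapping on bad input.
func mmss(_ seconds: Double) -> String {
    // NaN and ±infinity are not finite; converting them with Int(_:) would trap.
    guard seconds.isFinite, seconds >= 0 else { return "--:--" }
    // Clamp before converting: huge finite values also trap in Int(_:).
    let total = Int(min(seconds, 359_999).rounded(.down))
    let minutes = total / 60
    let secs = total % 60
    // Zero-pad to at least two digits without relying on format strings.
    func pad2(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
    return "\(pad2(minutes)):\(pad2(secs))"
}
```

With this guard, mmss(75) yields "01:15", while mmss(.infinity) and mmss(.nan) yield "--:--" instead of aborting the app.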
Deferred (P3, later decision or bulk cleanup; full evidence in EVALUATION.md): docs/ specs drifted from the dual-channel API + recap phase; docs/01 §7 lists already-resolved open items; docs/02 §2.10 claims MenuBarUI features that don't exist; AGENTS.md Layout listings under Audio/ and Detection/ are incomplete; the manifest.json sha256 contract is specced but never written; env-var precedence footgun (saved URL shadows SPARK_BACKEND_URL); SessionController owns three jobs (extract the open-panel UI); unused NSAppleEventsUsageDescription; unauthenticated LAN backend (consider a shared bearer token).
Known bugs: Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
Next (product validation — no agent could reach the live backend, so this stays manual): (1) re-process the saved Meet session in the app, then read its speakers.json + cluster_fingerprints.json to confirm ~4 speakers recover; (2) record a fresh Meet call to validate the visual fix on a clean capture. (The old "confirm Your name = Grant" item is moot — the committed default is the generic "Me"; "Grant" only ever lives in local UserDefaults.)
