AGENTS.md — Ten31 Transcripts
Native macOS menu-bar app that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted SparkControl backend that does transcription/diarization/naming — producing named transcripts and recaps.
Inbox check: at session start, if `~/Projects/standards/INBOX.md` exists, scan it for items tagged `(ten31-transcripts)` and surface them before proposing next steps; triage with `/triage`.
Stack (versions that matter)
- Swift 5.0, SwiftUI + AppKit, macOS 13.0 deployment target. `LSUIElement` (menu-bar only, no Dock icon).
- Project is generated by XcodeGen from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is gitignored: regenerate, don't edit.
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools, so set `DEVELOPER_DIR` for every `xcodebuild`.
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a gitignored `Config/Signing.xcconfig` (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable: a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN backend, either an IP or a `.local` host, with a Start9 self-signed cert). Install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, host-scoped escape hatch, off by default (see `InsecureTrustDelegate`). Resolution order: a value saved in Settings → SparkControl backend (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (mono-only, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
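The backend URL resolution order above can be sketched as a small pure function. This is a minimal illustration under assumptions, not the code in `AppSettings.swift`: the names `resolveBackendURL`, `nonBlank`, and `placeholderBackendURL` are hypothetical, the `https://` scheme on the placeholder is assumed, and a blank saved value is assumed to fall through to the next source.

```swift
import Foundation

// Hypothetical placeholder; the real default lives in AppSettings.swift.
let placeholderBackendURL = "https://your-spark-backend.local"

/// Returns `s` trimmed, or nil when it is missing or blank
/// (assumption: an empty Settings field falls through to the next source).
func nonBlank(_ s: String?) -> String? {
    guard let trimmed = s?.trimmingCharacters(in: .whitespacesAndNewlines),
          !trimmed.isEmpty else { return nil }
    return trimmed
}

/// Hypothetical resolver: saved Settings value, else SPARK_BACKEND_URL, else placeholder.
/// Inputs are passed in so the order can be unit-tested without touching UserDefaults.
func resolveBackendURL(saved: String?, environment: [String: String]) -> URL? {
    let raw = nonBlank(saved)
        ?? nonBlank(environment["SPARK_BACKEND_URL"])
        ?? placeholderBackendURL
    return URL(string: raw)
}
```

In the app it would be fed `UserDefaults.standard.string(forKey: "backendBaseURL")` (the key that `defaults read` shows) and `ProcessInfo.processInfo.environment`. Keeping the lookup pure makes the precedence testable offline with `192.0.2.x` documentation IPs instead of real hosts.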
Commands
First time on a machine, create the local signing config (otherwise `xcodegen generate` and signing won't find a team):
```sh
cp Config/Signing.xcconfig.example Config/Signing.xcconfig  # then set DEVELOPMENT_TEAM
```
Regenerate the Xcode project (after adding/removing/renaming any source file):
```sh
xcodegen generate
```
Build + run all tests:
```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
```
Run a single test (target/class/method):
```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
```
Build only: replace `test` with `build`. Lint/format: none configured (no SwiftLint, SwiftFormat, or Makefile); adding one is tracked in `ROADMAP.md`.
Build a standalone app and install/run it (Xcode does not need to stay open):
```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
```
Fast validation harness (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x`, then run it against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.
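A `main.swift` for the harness might look like the following. It is a sketch under assumptions: `cosine` is a hypothetical stand-in for whatever pure logic is under test (the `testCosine` test name suggests `SpeakerReconciler` has something similar, but its real name and signature may differ). In real use, delete the stand-in and pass the relevant app source files to `swiftc` instead.

```swift
// main.swift: hypothetical fast harness for pure logic.
import Foundation

// Stand-in for the function under test (hypothetical; the real one lives in the app sources).
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "embedding lengths must match")
    var dot = 0.0, na = 0.0, nb = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    guard na > 0, nb > 0 else { return 0 } // avoid NaN on zero vectors
    return dot / (na.squareRoot() * nb.squareRoot())
}

// Top-level statements like these are only legal in the file named main.swift.
let same = cosine([1, 2, 3], [1, 2, 3])
let orthogonal = cosine([1, 0], [0, 1])
print(String(format: "same=%.3f orthogonal=%.3f", same, orthogonal))
```

Leave `Ten31Transcripts/App/` (the `@main` entry) out of the file list: a module whose `main.swift` has top-level code cannot also contain an `@main` type.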
Layout (day one)
- `Ten31Transcripts/App/`: `@main` entry + `AppDelegate`.
- `Ten31Transcripts/Session/`: `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`, `SessionNaming` (pure folder-name + recap-title logic).
- `Ten31Transcripts/Visual/`: `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3 fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
- `Ten31Transcripts/Adapters/`: per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
- `Ten31Transcripts/Audio/`: `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`, `AudioMixer`, `MonoTrackWriter`, `Resampler`.
- `Ten31Transcripts/Backend/`: `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
- `Ten31Transcripts/Recap/`: `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/`: `CallDetector`/`AudioInputProcesses`/`MicActivityMonitor`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31TranscriptsTests/`: XCTest.
- `example-screenshots/`: real fixtures (gitignored).
- `docs/`, `README.md`.
- Runtime output (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`. The folder is created at session start as `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; on stop the user can name the meeting and it's renamed to `<date>_<name>_<app>` (skipping keeps the auto stamp).
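The folder-naming rules above can be sketched as pure functions. Everything here is an assumption for illustration: `renamedLeaf`, `sanitizeMeetingName`, the set of stripped characters, and the `" 2"`, `" 3"` collision suffix are hypothetical and not necessarily what `SessionNaming` does. Only the `<yyyy-MM-dd'T'HH-mm-ss>_<app>` to `<date>_<name>_<app>` shape and the last-`_`-segment app label come from this file.

```swift
/// Mirrors the documented behaviour of SessionController.appLabel(from:):
/// the app is whatever follows the last "_".
func appLabel(from leaf: String) -> String {
    String(leaf.split(separator: "_").last ?? "")
}

/// Hypothetical sanitizer: "/" and ":" are unsafe in a folder name, and "_" is
/// replaced so the name can't add segments. Whitespace runs collapse to one space.
func sanitizeMeetingName(_ raw: String) -> String {
    let spaced = String(raw.map { (c: Character) -> Character in "/:_".contains(c) ? " " : c })
    return spaced.split(whereSeparator: { $0.isWhitespace }).joined(separator: " ")
}

/// Hypothetical rename: `<yyyy-MM-dd'T'HH-mm-ss>_<app>` -> `<date>_<name>_<app>`.
/// A blank name keeps the start-time leaf; collisions get a numeric suffix on the name segment.
func renamedLeaf(startLeaf: String, meetingName: String, existing: Set<String>) -> String {
    let name = sanitizeMeetingName(meetingName)
    guard !name.isEmpty, let t = startLeaf.firstIndex(of: "T") else { return startLeaf }
    let date = startLeaf[..<t]            // yyyy-MM-dd; the HH-mm-ss part is dropped
    let app = appLabel(from: startLeaf)
    var candidate = "\(date)_\(name)_\(app)"
    var n = 2
    while existing.contains(candidate) {
        candidate = "\(date)_\(name) \(n)_\(app)"
        n += 1
    }
    return candidate
}
```

Disambiguating on the name segment (never after the app) is what keeps `appLabel(from:)` correct for every renamed folder.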
Conventions
- Match the surrounding file's style; small reviewable diffs; comments explain why, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing, with my approval; the remote URL lives in `.git/config`, kept out of source. Work on `main`; don't create feature branches unless I ask.
- Gitea push gotcha: `origin`'s URL uses a raw `.local` mDNS host that intermittently fails to resolve (`Could not resolve hostname`, or a push that connects then stalls). The `gitea-home` SSH alias (in `~/.ssh/config`) points at the same Gitea server (port 59916, user `git`) via a reliable HostName; the sibling `standards` repo uses it. Reliable fallback: `git push gitea-home:grant/ten31-transcripts.git main`, then `git update-ref refs/remotes/origin/main main`. Repointing `origin` to the alias would make this permanent (not yet done).
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source; real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
- Git history scrubbed (2026-06-13): the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. A second rewrite the same day purged two backend LAN IPs that had slipped into a docs/test commit, replacing them with RFC 5737 documentation IPs (`192.0.2.1`/`192.0.2.2`) and force-pushing; 0 hits across refs; backup bundle `../ten31-transcripts-pre-ip-scrub.bundle`. The Apple Team ID was intentionally not scrubbed (it's public in every signed binary); don't re-flag it.
Always
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
- Run `xcodegen generate` after adding/removing/renaming source files.
- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
- Identify self by the mic channel + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
- Treat visual active-speaker cues as naming hints over audio diarization (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls sequential (one in flight).
- After any code change, rebuild Release + `ditto` to `/Applications`; the installed copy does not auto-update.
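Treating visual cues as sparse-but-correct naming hints, with the self name reserved, can be pictured as a thresholded vote per diarized speaker. This is a hypothetical sketch, not `SpeakerReconciler`: the `VisualHint`/`DiarizedSegment` shapes, the `nameClusters` function, and the `minVotes`/`minShare` thresholds are all assumptions chosen for illustration.

```swift
/// Hypothetical visual observation: a name read from the call window at time `t` (seconds).
struct VisualHint { let t: Double; let name: String }

/// Hypothetical diarized segment returned by the backend.
struct DiarizedSegment { let speaker: String; let start: Double; let end: Double }

/// Name a diarized speaker only with enough, and clear enough, visual evidence.
/// Hints carrying the reserved self name are ignored so it is never given to another speaker.
func nameClusters(segments: [DiarizedSegment], hints: [VisualHint],
                  reservedSelfName: String,
                  minVotes: Int = 3, minShare: Double = 0.6) -> [String: String] {
    var votes: [String: [String: Int]] = [:]
    for hint in hints where hint.name != reservedSelfName {
        // Credit the hint to whichever diarized speaker was talking at that instant.
        guard let seg = segments.first(where: { $0.start <= hint.t && hint.t < $0.end }) else { continue }
        votes[seg.speaker, default: [:]][hint.name, default: 0] += 1
    }
    var names: [String: String] = [:]
    for (speaker, tally) in votes {
        let total = tally.values.reduce(0, +)
        guard let top = tally.max(by: { $0.value < $1.value }) else { continue }
        // Too few or too mixed votes: leave the speaker unnamed rather than guess.
        if top.value >= minVotes && Double(top.value) / Double(total) >= minShare {
            names[speaker] = top.key
        }
    }
    return names
}
```

Leaving a speaker unnamed is the sparse-but-correct choice: diarization still separates the voices, and a missing name is easier to fix in the speaker editor than a confidently wrong one.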
Never
- Never write video frames to disk — analyze in-memory and release immediately (privacy non-negotiable).
- Never add Co-Authored-By / "Generated with" / any AI or tool attribution to commits or PRs.
- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ); channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker. The real cue is a thin hollow coloured ring; require the thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
- Never collapse adjacent same-speaker transcript segments (reverted by request); keep one line per diarized utterance.
- Never let a session-folder name put the meeting name where the app label is parsed from: the app must stay the last `_`-segment (`SessionController.appLabel(from:)` reads `.split("_").last`; `SessionNaming` enforces this and disambiguates collisions on the name segment). Renames happen at `finish()` time after files are closed, so re-derive track URLs from the (possibly moved) folder, never from `RecordingResult`'s start-time paths.
- Never send call audio to a raw IP the user didn't configure. Offline backend checks: a `.local` mDNS host can't be resolved by a plain `swiftc`/`URLSession` binary (`-1009`), so use the real app or `curl`; a configured raw IP, however, is reachable from a plain `swiftc` `URLSession` binary (that's how the TLS fix was verified offline).
- Never force-push a shared branch, and never push without my approval. (Work on `main`; don't create feature branches unless I ask.)
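The thin-edge + hue gate can be illustrated with a simplified tile test: a hollow ring is in-hue on a thin border band but not in the interior, while a solid avatar fill looks the same everywhere. This is a hypothetical sketch, not `GridCallAnalyzer.isHollow` or `FrameSampler.thinColoredPoints`; the `RGB` type, band width, saturation floor, and fraction thresholds are assumptions, and it works on an in-memory pixel grid only (nothing touches disk).

```swift
/// Hypothetical pixel in 0...1; real frames come from ScreenCaptureKit buffers.
struct RGB { let r, g, b: Double }

/// Standard RGB -> HSV hue (degrees, 0..<360) and saturation.
func hueSat(_ p: RGB) -> (hue: Double, sat: Double) {
    let mx = max(p.r, p.g, p.b), mn = min(p.r, p.g, p.b)
    let d = mx - mn
    guard mx > 0, d > 0 else { return (0, 0) } // grey/black: no hue
    var h: Double
    if mx == p.r { h = 60 * ((p.g - p.b) / d).truncatingRemainder(dividingBy: 6) }
    else if mx == p.g { h = 60 * ((p.b - p.r) / d + 2) }
    else { h = 60 * ((p.r - p.g) / d + 4) }
    if h < 0 { h += 360 }
    return (h, d / mx)
}

/// Hollow-ring test: most of a thin border band passes the hue gate, the interior mostly doesn't.
func looksLikeHollowRing(_ tile: [[RGB]], band: Int = 2, hueRange: ClosedRange<Double>,
                         minSat: Double = 0.4, minEdge: Double = 0.6,
                         maxInterior: Double = 0.2) -> Bool {
    let h = tile.count
    guard h > 2 * band, let w = tile.first?.count, w > 2 * band else { return false }
    var edgeHit = 0, edgeTotal = 0, innerHit = 0, innerTotal = 0
    for y in 0..<h {
        for x in 0..<w {
            let hs = hueSat(tile[y][x])
            let hit = hs.sat >= minSat && hueRange.contains(hs.hue)
            let onEdge = y < band || y >= h - band || x < band || x >= w - band
            if onEdge { edgeTotal += 1; if hit { edgeHit += 1 } }
            else { innerTotal += 1; if hit { innerHit += 1 } }
        }
    }
    return Double(edgeHit) / Double(edgeTotal) >= minEdge
        && Double(innerHit) / Double(innerTotal) <= maxInterior
}
```

With the hue range tuned to the ring colour, a solid orange/magenta tile fails the hue gate on its edge, and a solid fill in the ring's own hue fails the interior check, so neither counts as a speaker.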
Current state
Present tense; overwritten each session. `main` clean and pushed (HEAD `a5c227e`, pushed via the `gitea-home` alias because `origin`'s `.local` host wouldn't resolve); `/Applications/Ten31Transcripts.app` rebuilt + installed from HEAD. Full suite re-run: 91 pass (was 73; +18 `SessionNamingTests`).
- This session (2026-06-17), meeting-name prompt + folder rename: on stop, an `NSAlert` asks for a meeting name (Save/Skip) and the session folder is renamed `<ts>_<app>` → `<date>_<name>_<app>` (HH-MM-SS dropped; Skip/blank keeps the stamp). Pure logic lives in `SessionNaming` (sanitize, leaf compose, `recapTitle` for both forms); the app label stays the last `_`-segment; collisions disambiguate on the name segment; `finish()` re-derives track URLs post-rename; quit never prompts and aborts an open prompt. Reviewer-reviewed; its P1 (quit-during-modal) + two P2s fixed.
- Backend connected end-to-end: real LAN URL saved in Settings → SparkControl backend (off-repo: `defaults read xyz.ten31.transcripts backendBaseURL`); the committed default stays the placeholder.
- Working: backend hand-off (live), call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked send, speaker reconciliation, recap, speaker editor, configurable chunk length, standalone Settings, meeting-name prompt + readable folders.
- Verify next (real app): the naming prompt + rename is unit-tested + builds but not yet exercised on a live stop — run a real recording, stop, name it, confirm the folder renames and backend output lands in the renamed folder.
- Next up: (a) repoint `origin` to `gitea-home` so pushes stop hitting the flaky `.local` host (see Conventions); (b) backend URL primary→fallback + the `mmss()` NaN/∞ guard freebie (sketch first; keep real IPs out of source and use `192.0.2.x`).
- In progress / unverified: the Meet visual fix (reject solid camera-off tiles) still has no clean end-to-end run; re-process the saved Meet session + a fresh Meet call (needs real app + backend).
- Known bugs / loose ends: sparse Meet speaking-detection (faint blue border); sub-second junk "self" mic fragments; desktop-mic vs phone doesn't unify by voiceprint. Doc loose end: `docs/01 §5` and `docs/02 §2.4` still list "AppleScript" as a Meet name source, though the code uses window titles.
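The `mmss()` NaN/∞ guard in Next up could be as small as rejecting non-finite and out-of-range values before the `Int` conversion, since `Int(_:)` traps on NaN, infinity, or values outside `Int`'s range. A sketch, assuming the helper takes seconds as a `Double` and that `--:--` is an acceptable placeholder; the real `mmss()` signature is not documented here.

```swift
/// Hypothetical guarded formatter: seconds -> "mm:ss".
func mmss(_ seconds: Double) -> String {
    // Int(_:) traps on NaN, infinity, or values outside Int's range, so reject those first.
    guard seconds.isFinite, seconds >= 0, seconds < Double(Int.max) else { return "--:--" }
    let total = Int(seconds.rounded(.down))
    let pad = { (v: Int) -> String in v < 10 ? "0\(v)" : "\(v)" }
    return "\(pad(total / 60)):\(pad(total % 60))"
}
```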