The README still described "Phase 0 (scaffold)" — no audio capture, call detection, screen reading, or backend hand-off — for an app that ships all of it. Rewrite it to document the real detect/record/send/transcribe/recap pipeline, the standalone build+install commands, backend and Start9 Root CA setup (skip-TLS is off by default and host-scoped, not on by default), output files, and the real project layout. Also fix the matching "Phase 0" comment in AppSettings.
Ten31 Transcripts
Native macOS menu-bar app that auto-detects conference calls, records dual-track audio while watching the call window for active-speaker cues, and hands the audio plus a visual speaker timeline to a self-hosted SparkControl backend that does the transcription, diarization, and speaker naming — producing named transcripts and meeting recaps.
It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on the backend; the app only records, watches, packages, and reconciles hints.
How it works
- Detect: a call in Google Meet, Zoom, Teams, or Signal starts; CallDetector notices and (optionally) auto-starts a session.
- Record + watch: dual-track audio (your mic + system output) is captured while ScreenCaptureKit samples the call window (~3 fps) to read names and spot the active speaker. Video frames are analyzed in memory and released immediately, never written to disk.
- Package + send: audio is chunked and sent to the backend, dual-channel (mic_file + system_file) when the system track is healthy, else a mono mix. The visual timeline rides along as naming hints. Backend calls are sequential (one in flight) to respect the single-GPU backend.
- Transcribe + name: the backend diarizes (Sortformer/TitaNet) and an LLM (Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual hints and your stored voiceprints.
- Reconcile + recap: the app reconciles speaker hints, then writes a readable transcript.md and an HTML recap.html. A built-in speaker editor lets you fix names after the fact.
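The dual-channel versus mono decision in the "Package + send" step can be illustrated with a small shell sketch. The app makes this choice internally in Swift on audio chunks; the version below works on a whole session directory purely for illustration. The function name `upload_fields`, the "healthy" test (file exists and is non-empty), and the mono field name `audio_file` are all assumptions, not the app's real logic or the backend's real API. Only `mic_file` and `system_file` come from the description above.

```shell
# Illustrative sketch only. Assumptions: "healthy" means the track file exists
# and is non-empty, and "audio_file" is a made-up field name for the mono mix.
upload_fields() {
  session_dir="$1"
  if [ -s "$session_dir/mic.wav" ] && [ -s "$session_dir/system.wav" ]; then
    # Dual-channel: one multipart form field per track.
    printf '%s\n' \
      "mic_file=@$session_dir/mic.wav" \
      "system_file=@$session_dir/system.wav"
  else
    # Fallback: a single mono mix.
    printf '%s\n' "audio_file=@$session_dir/mixed_mono_16k.wav"
  fi
}
```

Each printed line has the `name=@path` shape that curl's `-F` option accepts, so the output could feed a multipart upload. Sending one request at a time in a plain loop would match the "one in flight" rule.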
You are identified by the mic channel plus the single name in Settings → Your name — that name is reserved so the LLM never assigns it to anyone else. (There's no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
One-time setup
- Install Xcode from the Mac App Store (free; large download). Open it once and accept the license prompt.
- Install XcodeGen (generates the Xcode project from project.yml):
    brew install xcodegen
- Set your signing team. The Apple Team ID is kept out of source in a gitignored Config/Signing.xcconfig. Copy the template and set your team:
    cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
  xcodegen wires it in via configFiles, so Signing & Capabilities shows the team automatically. Keep the value stable so macOS preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig, not Xcode: xcodegen generate overwrites Xcode-side changes.
- Generate the project (re-run any time you add/remove/rename a source file):
    xcodegen generate
  This creates Ten31Transcripts.xcodeproj (gitignored; regenerate, don't edit).
Build & run
The simplest path is to open Ten31Transcripts.xcodeproj and press Run (⌘R).
To build a standalone app and install it (so Xcode doesn't need to stay open), run the commands below. Note the DEVELOPER_DIR prefix: full Xcode lives at /Applications/Xcode.app, but xcode-select may point at the Command Line Tools, so set it on every xcodebuild invocation:
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
The installed copy does not auto-update — rebuild and ditto again after changes.
Run the test suite:
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
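To see whether the DEVELOPER_DIR prefix is doing any work on a given machine, it can help to check what xcode-select currently points at. The sketch below is a hypothetical helper (the function name `points_at_command_line_tools` is not part of this repo). It assumes the Command Line Tools live at their standard location, /Library/Developer/CommandLineTools.

```shell
# Succeeds (exit 0) when the given developer directory belongs to the
# Command Line Tools rather than a full Xcode install.
# Hypothetical helper, not part of this repository.
points_at_command_line_tools() {
  case "$1" in
    /Library/Developer/CommandLineTools*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On a Mac, `points_at_command_line_tools "$(xcode-select -p)" && echo "prefix xcodebuild with DEVELOPER_DIR"` reports whether the prefix is needed. Setting it unconditionally, as the commands above do, is harmless either way.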
Permissions
The menu panel shows live status for the three permissions the app needs, each with Grant / Open Settings buttons:
- Microphone — to record your side of the call.
- Screen Recording — to capture system audio and watch the call window.
- Accessibility — to read window/participant information.
Backend setup
Point the app at your SparkControl backend in Settings → SparkControl backend.
The resolution order is: the value saved in Settings (UserDefaults) wins, else the
SPARK_BACKEND_URL env var, else a neutral placeholder default. The committed
default is only a placeholder (https://your-spark-backend.local) — your real LAN
URL lives in Settings and never touches source.
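The precedence can be written out as a short shell sketch. The app itself resolves this in Swift from UserDefaults, so the function `resolve_backend_url` and its argument (standing in for the value saved in Settings) are illustrative assumptions. Only the SPARK_BACKEND_URL variable and the placeholder default come from the text above.

```shell
# Sketch of the resolution order only; not the app's actual code.
# $1 stands in for the URL saved in Settings (empty string = nothing saved).
resolve_backend_url() {
  saved_url="$1"
  if [ -n "$saved_url" ]; then
    echo "$saved_url"                          # 1. Settings wins
  elif [ -n "${SPARK_BACKEND_URL:-}" ]; then
    echo "$SPARK_BACKEND_URL"                  # 2. else the env var
  else
    echo "https://your-spark-backend.local"    # 3. else the placeholder
  fi
}
```

In this sketch an empty saved value is treated as "not set", so clearing the field in Settings would fall through to the environment variable. Whether the app treats an empty string the same way is an assumption.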
The backend sits behind a Start9 self-signed Root CA. The supported path is to install the StartOS Root CA in your System keychain, after which normal TLS validation succeeds. Skip TLS verification is an opt-in escape hatch, off by default and scoped to the configured backend host — it never becomes "trust any server."
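Both paths can be sketched in shell, under stated assumptions. `install_root_ca` wraps the macOS `security add-trusted-cert` command to trust a downloaded Root CA in the System keychain; the certificate path is whatever file you exported from StartOS, and the function name is hypothetical. `may_skip_tls` illustrates what "opt-in and host-scoped" means as a policy. The app's real TLS handling lives in Backend/ and is written in Swift, so this is a model of the rule, not its implementation.

```shell
# Supported path: trust the StartOS Root CA system-wide (macOS only, needs sudo).
# $1 is the path to the exported Root CA certificate (a file you supply).
install_root_ca() {
  ca_cert_path="$1"
  if [ ! -f "$ca_cert_path" ]; then
    echo "certificate not found: $ca_cert_path" >&2
    return 1
  fi
  sudo security add-trusted-cert -d -r trustRoot \
    -k /Library/Keychains/System.keychain "$ca_cert_path"
}

# Escape hatch, modeled as a policy check (hypothetical, for illustration):
# skipping verification is allowed only when the option is explicitly enabled
# AND the request targets the configured backend host.
may_skip_tls() {
  skip_enabled="$1"; request_host="$2"; backend_host="$3"
  [ "$skip_enabled" = "true" ] || return 1     # off unless opted in
  req=$(printf '%s' "$request_host" | tr '[:upper:]' '[:lower:]')
  cfg=$(printf '%s' "$backend_host" | tr '[:upper:]' '[:lower:]')
  [ -n "$cfg" ] && [ "$req" = "$cfg" ]         # never "trust any server"
}
```

The host comparison is case-insensitive because hostnames are. A request to any other host fails the check even with the option enabled, which is the difference between a scoped escape hatch and disabling validation globally.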
Output
Each session writes to ~/Ten31Transcripts/sessions/<timestamp>_<app>/ (configurable
in Settings):
mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
self_vad.json visual_timeline.json # self voice-activity + visual hints
speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
transcript.md recap.html recap.json # final outputs
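A quick way to check that a session finished cleanly is to compare its directory against the file list above. The helper below is a hypothetical sketch (`missing_outputs` is not part of the app). It assumes a completed session contains every listed file, which may not hold for sessions that fell back to the mono mix or were interrupted.

```shell
# Prints the name of each expected output file that is absent from the
# given session directory. Prints nothing when all files are present.
missing_outputs() {
  session_dir="$1"
  for name in mic.wav system.wav mixed_mono_16k.wav \
      self_vad.json visual_timeline.json \
      speakers.json cluster_fingerprints.json \
      transcript.md recap.html recap.json; do
    [ -e "$session_dir/$name" ] || echo "$name"
  done
}
```

Pass it the path of one session directory under ~/Ten31Transcripts/sessions/ (or your configured output location); empty output means nothing is missing.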
Project layout
project.yml                # XcodeGen recipe → generates the .xcodeproj
Ten31Transcripts/
  App/                     @main entry + AppDelegate
  Detection/               CallDetector: which app is in a call
  Audio/                   dual-track capture, mixing, resampling, self-VAD
  Visual/                  ScreenCaptureKit capture + grid analysis → speaker timeline
  Adapters/                per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
  Session/                 SessionController state machine, packaging, reconciliation
  Backend/                 SparkControl + LLM clients, voiceprint store, TLS handling
  Recap/                   transcript.md + recap.html rendering, speaker editor
  Permissions/ Settings/ UI/ Support/   (permissions, AppSettings, views, Info.plist)
Ten31TranscriptsTests/     # XCTest: pure logic (chunking, reconciliation, analyzer math)
docs/                      # architecture & data-contract design notes
Notes
- App Sandbox is off and Hardened Runtime is off — this is a personal, LAN-only tool that must observe other apps. Revisit only if distributing.
- Privacy: video frames are never written to disk; recordings, transcripts, and screenshots are gitignored and never committed.
- AGENTS.md is the canonical reference for build commands, conventions, and current state; ROADMAP.md holds the backlog; docs/ holds the architecture and data-contract design notes.