When a recording finishes, ask for a meeting name and rename the session folder from the automatic timestamp form `<yyyy-MM-dd'T'HH-mm-ss>_<app>` to the readable `<date>_<name>_<app>` (dropping the time of day), so `sessions/` is easy to scan. Skipping the prompt or leaving it blank keeps the timestamped name. The rename runs after the recorder and visual capture finish (files closed) and before `finish()` captures the folder for backend processing, so the renamed folder is what flows downstream; `finish()` re-derives the track URLs from the possibly moved folder. The quit path never prompts, and quitting while the prompt is open ends its modal so termination isn't blocked. The naming and parsing logic lives in a pure, unit-tested `SessionNaming`; `recapTitle` moves there and now understands both folder forms.
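The rename rule described above can be sketched in a few lines of shell. This only illustrates the name transformation; it is not the app's `SessionNaming` code, which is Swift and not shown here. The function name `readable_session_name` and the sample folder names are hypothetical, and the sketch assumes the app part contains no underscore.

```sh
#!/bin/sh
# Hypothetical helper: derive the readable folder name from the timestamped one.
#   $1 = timestamped folder name, e.g. 2025-01-15T14-30-05_Zoom
#   $2 = meeting name typed at the prompt (empty when skipped or left blank)
readable_session_name() {
    stamped=$1
    meeting=$2
    # Blank or skipped name: keep the timestamped folder name unchanged.
    if [ -z "$meeting" ]; then
        printf '%s\n' "$stamped"
        return 0
    fi
    date_part=${stamped%%T*}   # everything before the first "T" (the date)
    app_part=${stamped##*_}    # everything after the last "_" (the app)
    printf '%s_%s_%s\n' "$date_part" "$meeting" "$app_part"
}
```

Calling `readable_session_name "2025-01-15T14-30-05_Zoom" "Weekly Sync"` prints `2025-01-15_Weekly Sync_Zoom`. The sketch does not sanitize characters such as `/` in the meeting name, which a real implementation would need to handle.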
Ten31 Transcripts
Native macOS menu-bar app that auto-detects conference calls, records dual-track audio while watching the call window for active-speaker cues, and hands the audio plus a visual speaker timeline to a self-hosted SparkControl backend that does the transcription, diarization, and speaker naming — producing named transcripts and meeting recaps.
It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on the backend; the app only records, watches, packages, and reconciles hints.
How it works
- Detect: a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector` notices and (optionally) auto-starts a session.
- Record + watch: dual-track audio (your mic + system output) is captured while `ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the active speaker. Video frames are analyzed in memory and released immediately; they are never written to disk.
- Package + send: audio is chunked and sent to the backend, dual-channel (`mic_file` + `system_file`) when the system track is healthy, else a mono mix. The visual timeline rides along as naming hints. Backend calls are sequential (one in flight) to respect the single-GPU backend.
- Transcribe + name: the backend diarizes (Sortformer/TitaNet) and an LLM (Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual hints and your stored voiceprints.
- Reconcile + recap: the app reconciles speaker hints, then writes a readable `transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix names after the fact.
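The dual-channel versus mono choice in the packaging step can be sketched as follows. This is an illustration under stated assumptions, not the app's upload code: `mic_file` and `system_file` are the field names given above, while the mono field name `mono_file`, the function name, and every file path are placeholders.

```sh
#!/bin/sh
# Sketch only: choose the multipart form fields for one audio chunk.
#   $1 = "yes" when the system track is healthy
#   $2 = mic chunk path, $3 = system chunk path, $4 = mono mix path
chunk_form_fields() {
    system_healthy=$1
    mic_path=$2
    system_path=$3
    mono_path=$4
    if [ "$system_healthy" = "yes" ]; then
        # Dual-channel upload: one field per track.
        printf '%s\n' "mic_file=@$mic_path" "system_file=@$system_path"
    else
        # Mono fallback; "mono_file" is a placeholder field name.
        printf '%s\n' "mono_file=@$mono_path"
    fi
}
```

Each printed line is in the form that `curl -F` accepts. Sending chunks from a plain `for` loop, waiting for each request to return before starting the next, would keep a single call in flight as the list describes.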
You are identified by the mic channel plus the single name in Settings → Your name — that name is reserved so the LLM never assigns it to anyone else. (There's no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
One-time setup
- Install Xcode from the Mac App Store (free; large download). Open it once and accept the license prompt.
- Install XcodeGen (generates the Xcode project from `project.yml`):
    brew install xcodegen
- Set your signing team. The Apple Team ID is kept out of source in a gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
    cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
  `xcodegen` wires it in via `configFiles`, so Signing & Capabilities shows the team automatically. Keep the value stable so macOS preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig, not Xcode: `xcodegen generate` overwrites Xcode-side changes.
- Generate the project (re-run any time you add/remove/rename a source file):
    xcodegen generate
  This creates `Ten31Transcripts.xcodeproj` (gitignored; regenerate, don't edit).
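Before generating the project, it can help to confirm that the signing step actually took effect. The check below is a minimal sketch: the function name is hypothetical, and it assumes the xcconfig uses the usual `DEVELOPMENT_TEAM = <id>` assignment form with an alphanumeric Team ID (the template's exact contents are not shown here).

```sh
#!/bin/sh
# Succeeds when the given xcconfig assigns a non-empty DEVELOPMENT_TEAM.
signing_team_is_set() {
    config_path=$1
    # A missing file means the template has not been copied yet.
    [ -f "$config_path" ] || return 1
    grep -Eq '^[[:space:]]*DEVELOPMENT_TEAM[[:space:]]*=[[:space:]]*[A-Za-z0-9]+' "$config_path"
}
```

For example, `signing_team_is_set Config/Signing.xcconfig && xcodegen generate` would only regenerate the project once a team has been filled in.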
Build & run
The simplest path is to open Ten31Transcripts.xcodeproj and press Run (⌘R).
To build a standalone app and install it (Xcode doesn't need to stay open), note the `DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` may point at the Command Line Tools, so set it on every `xcodebuild`:
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
The installed copy does not auto-update — rebuild and ditto again after changes.
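To see why the `DEVELOPER_DIR` prefix is needed on a given machine, the active developer directory can be inspected. The helper below is a hypothetical sketch: it takes the path that `xcode-select -p` prints and reports the prefix only when that path is not inside an Xcode app bundle.

```sh
#!/bin/sh
# Prints the DEVELOPER_DIR prefix to use, or nothing when the active
# developer directory already points inside an Xcode.app bundle.
#   $1 = active developer directory (normally the output of: xcode-select -p)
developer_dir_prefix() {
    active_dir=$1
    case $active_dir in
        */Xcode*.app/Contents/Developer)
            # Full Xcode is already selected; no prefix required.
            ;;
        *)
            printf '%s\n' "DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer"
            ;;
    esac
}
```

Running `developer_dir_prefix "$(xcode-select -p)"` on a machine that points at the Command Line Tools prints the prefix used in the commands above. Setting the prefix unconditionally, as the README recommends, remains the simpler habit.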
Run the test suite:
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
Permissions
The menu panel shows live status for the three permissions the app needs, each with Grant / Open Settings buttons:
- Microphone — to record your side of the call.
- Screen Recording — to capture system audio and watch the call window.
- Accessibility — to read window/participant information.
Backend setup
Point the app at your SparkControl backend in Settings → SparkControl backend.
The resolution order is: the value saved in Settings (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed default is only a placeholder (`https://your-spark-backend.local`); your real LAN URL lives in Settings and never touches source.
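The resolution order can be expressed as a short fallback chain. This is a sketch of the precedence only, assuming the saved Settings value is passed in as an argument; the function name is hypothetical and the app's real lookup (Swift, via UserDefaults) is not shown here.

```sh
#!/bin/sh
# Resolve the backend URL: saved setting, then environment, then placeholder.
#   $1 = URL saved in Settings (empty when nothing is saved)
resolve_backend_url() {
    saved_url=$1
    if [ -n "$saved_url" ]; then
        printf '%s\n' "$saved_url"                 # Settings value wins
    elif [ -n "${SPARK_BACKEND_URL:-}" ]; then
        printf '%s\n' "$SPARK_BACKEND_URL"         # else the env var
    else
        printf '%s\n' "https://your-spark-backend.local"   # else the placeholder
    fi
}
```

With nothing saved and `SPARK_BACKEND_URL` unset, `resolve_backend_url ""` prints the placeholder, which signals that the backend has not been configured yet.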
The backend sits behind a Start9 self-signed Root CA. The supported path is to install the StartOS Root CA in your System keychain, after which normal TLS validation succeeds. Skip TLS verification is an opt-in escape hatch, off by default and scoped to the configured backend host — it never becomes "trust any server."
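The host scoping of the escape hatch can be illustrated with a small decision function. This is a sketch of the rule as described, not the app's TLS handling: the function name and argument order are hypothetical, and it assumes an exact string match on the host name.

```sh
#!/bin/sh
# Prints "skip" only when the opt-in is enabled AND the request targets the
# configured backend host; every other case gets normal validation.
#   $1 = "1" when Skip TLS verification is enabled
#   $2 = configured backend host, $3 = host of the current request
tls_policy() {
    skip_enabled=$1
    backend_host=$2
    request_host=$3
    if [ "$skip_enabled" = "1" ] && [ "$request_host" = "$backend_host" ]; then
        echo skip
    else
        echo validate
    fi
}
```

Because the host comparison sits inside the same condition as the opt-in flag, enabling the setting cannot widen trust to any other server.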
Output
Each session writes to ~/Ten31Transcripts/sessions/<timestamp>_<app>/ (configurable
in Settings):
mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
self_vad.json visual_timeline.json # self voice-activity + visual hints
speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
transcript.md recap.html recap.json # final outputs
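A quick way to check whether a session finished processing is to look for the files listed above. The helper below is a hypothetical sketch; it assumes every listed file is expected in a completed session, which may not hold for sessions that fell back to the mono mix or were interrupted.

```sh
#!/bin/sh
# Lists the expected output files that are absent from a session folder.
#   $1 = path to one session folder
missing_session_outputs() {
    session_dir=$1
    for name in mic.wav system.wav mixed_mono_16k.wav \
                self_vad.json visual_timeline.json \
                speakers.json cluster_fingerprints.json \
                transcript.md recap.html recap.json; do
        # Print the name only when the file does not exist.
        [ -e "$session_dir/$name" ] || printf '%s\n' "$name"
    done
}
```

An empty result means all ten files are present; any printed name points at the stage that did not complete.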
Project layout
project.yml                  # XcodeGen recipe → generates the .xcodeproj
Ten31Transcripts/
  App/           @main entry + AppDelegate
  Detection/     CallDetector: which app is in a call
  Audio/         dual-track capture, mixing, resampling, self-VAD
  Visual/        ScreenCaptureKit capture + grid analysis → speaker timeline
  Adapters/      per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
  Session/       SessionController state machine, packaging, reconciliation
  Backend/       SparkControl + LLM clients, voiceprint store, TLS handling
  Recap/         transcript.md + recap.html rendering, speaker editor
  Permissions/ Settings/ UI/ Support/   (permissions, AppSettings, views, Info.plist)
Ten31TranscriptsTests/       # XCTest: pure logic (chunking, reconciliation, analyzer math)
docs/                        # architecture & data-contract design notes
Notes
- App Sandbox is off and Hardened Runtime is off — this is a personal, LAN-only tool that must observe other apps. Revisit only if distributing.
- Privacy: video frames are never written to disk; recordings, transcripts, and screenshots are gitignored and never committed.
- `AGENTS.md` is the canonical reference for build commands, conventions, and current state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and data-contract design notes.