Grant Gilliam a5c227ef1c Prompt for a meeting name on stop; rename the session folder
When a recording finishes, ask for a meeting name and rename the session
folder from the auto stamp `<yyyy-MM-dd'T'HH-mm-ss>_<app>` to the readable
`<date>_<name>_<app>` (dropping HH-MM-SS), so sessions/ is easy to scan.
Skipping or leaving it blank keeps the timestamped name.

The rename runs after the recorder and visual capture finish (files closed)
and before finish() captures the folder for backend processing, so the
renamed folder is what flows downstream; finish() re-derives the track URLs
from the possibly-moved folder. The quit path never prompts, and a quit with
the prompt open ends its modal so termination isn't blocked.

Naming/parsing logic lives in a pure, unit-tested SessionNaming; recapTitle
moves there and now understands both folder forms.
2026-06-17 21:51:05 -05:00

Ten31 Transcripts

Native macOS menu-bar app that auto-detects conference calls, records dual-track audio while watching the call window for active-speaker cues, and hands the audio plus a visual speaker timeline to a self-hosted SparkControl backend that does the transcription, diarization, and speaker naming — producing named transcripts and meeting recaps.

It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on the backend; the app only records, watches, packages, and reconciles hints.

How it works

  1. Detect — a call in Google Meet, Zoom, Teams, or Signal starts; CallDetector notices and (optionally) auto-starts a session.
  2. Record + watch — dual-track audio (your mic + system output) is captured while ScreenCaptureKit samples the call window (~3 fps) to read names and spot the active speaker. Video frames are analyzed in memory and released immediately — never written to disk.
  3. Package + send — audio is chunked and sent to the backend, dual-channel (mic_file + system_file) when the system track is healthy, else a mono mix. The visual timeline rides along as naming hints. Backend calls are sequential (one in flight) to respect the single-GPU backend.
  4. Transcribe + name — the backend diarizes (Sortformer/TitaNet) and an LLM (Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual hints and your stored voiceprints.
  5. Reconcile + recap — the app reconciles speaker hints, then writes a readable transcript.md and an HTML recap.html. A built-in speaker editor lets you fix names after the fact.
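
The channel choice in step 3 can be sketched as a small pure function. This is only an illustration under stated assumptions: UploadPayload and choosePayload are hypothetical names, not the app's actual types, and the whole-session file names stand in for whatever per-chunk files the app really sends. The mic_file and system_file field names come from this README.

```swift
import Foundation

// Hypothetical model of what one backend request carries.
enum UploadPayload: Equatable {
    // Both tracks, sent as the mic_file and system_file fields.
    case dualChannel(micFile: String, systemFile: String)
    // Fallback: a single mono mix of both tracks.
    case monoMix(file: String)
}

// Dual-channel when the system track is healthy, else the mono mix.
func choosePayload(systemTrackHealthy: Bool) -> UploadPayload {
    if systemTrackHealthy {
        return .dualChannel(micFile: "mic.wav", systemFile: "system.wav")
    }
    return .monoMix(file: "mixed_mono_16k.wav")
}
```

Keeping the decision in a pure function like this makes the fallback easy to unit-test without touching audio hardware or the network.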

You are identified by the mic channel plus the single name in Settings → Your name — that name is reserved so the LLM never assigns it to anyone else. (There's no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
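
This README does not say where the name reservation is enforced (it could live entirely in the LLM prompt). As one possibility, a client-side guard might look like the sketch below. The function name, the speaker labels, and the dictionary shape are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical guard: `proposed` maps speaker labels to LLM-proposed names.
// The mic-channel speaker always gets the Settings name; any other speaker
// that was given the same name (case-insensitively) loses that assignment.
func applyReservedName(proposed: [String: String],
                       micSpeaker: String,
                       yourName: String) -> [String: String] {
    var result: [String: String] = [:]
    for (speaker, name) in proposed {
        let clashes = name.caseInsensitiveCompare(yourName) == .orderedSame
        if speaker != micSpeaker && clashes { continue } // drop the clash
        result[speaker] = name
    }
    result[micSpeaker] = yourName // mic channel is always you
    return result
}
```

A speaker whose assignment is dropped simply stays unnamed in this sketch, which leaves it available for correction in the speaker editor.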

One-time setup

  1. Install Xcode from the Mac App Store (free; large download). Open it once and accept the license prompt.
  2. Install XcodeGen (generates the Xcode project from project.yml):
    brew install xcodegen
    
  3. Set your signing team. The Apple Team ID is kept out of source in a gitignored Config/Signing.xcconfig. Copy the template and set your team:
    cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
    
    xcodegen wires it in via configFiles, so Signing & Capabilities shows the team automatically. Keep the value stable so macOS preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig, not Xcode — xcodegen generate overwrites Xcode-side changes.
  4. Generate the project (re-run any time you add/remove/rename a source file):
    xcodegen generate
    
    This creates Ten31Transcripts.xcodeproj (gitignored — regenerate, don't edit).

Build & run

The simplest path is to open Ten31Transcripts.xcodeproj and press Run (⌘R).

To build a standalone app and install it (Xcode doesn't need to stay open), run the commands below. Note the DEVELOPER_DIR prefix: full Xcode lives at /Applications/Xcode.app, but xcode-select may point at the Command Line Tools, so set DEVELOPER_DIR on every xcodebuild invocation:

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app

The installed copy does not auto-update — rebuild and ditto again after changes.

Run the test suite:

DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd

Permissions

The menu panel shows live status for the three permissions the app needs, each with Grant / Open Settings buttons:

  • Microphone — to record your side of the call.
  • Screen Recording — to capture system audio and watch the call window.
  • Accessibility — to read window/participant information.
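
For reference, each of the three permissions has a standard macOS status check. The sketch below uses real system calls (AVCaptureDevice.authorizationStatus, CGPreflightScreenCaptureAccess, AXIsProcessTrusted), but the Permission enum and isGranted function are hypothetical names, and the app's own status code may be organized differently.

```swift
import Foundation
#if os(macOS)
import AVFoundation
import CoreGraphics
import ApplicationServices
#endif

// Hypothetical enum mirroring the three permissions listed above.
enum Permission: String, CaseIterable {
    case microphone, screenRecording, accessibility
}

#if os(macOS)
// Reads the current grant state without prompting the user.
func isGranted(_ permission: Permission) -> Bool {
    switch permission {
    case .microphone:
        return AVCaptureDevice.authorizationStatus(for: .audio) == .authorized
    case .screenRecording:
        return CGPreflightScreenCaptureAccess()
    case .accessibility:
        return AXIsProcessTrusted()
    }
}
#endif
```

All three calls only report status; requesting access is a separate step behind the Grant / Open Settings buttons.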

Backend setup

Point the app at your SparkControl backend in Settings → SparkControl backend. The resolution order is: the value saved in Settings (UserDefaults) wins, else the SPARK_BACKEND_URL env var, else a neutral placeholder default. The committed default is only a placeholder (https://your-spark-backend.local) — your real LAN URL lives in Settings and never touches source.
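
The resolution order above can be sketched as a pure function. This is a minimal illustration, not the app's actual code: resolveBackendURL is a hypothetical name, and treating an empty string as "not set" is an assumption.

```swift
import Foundation

// Placeholder default, as committed in source per this README.
let placeholderBackendURL = "https://your-spark-backend.local"

// Settings value wins, then SPARK_BACKEND_URL, then the placeholder.
// Assumption: empty strings count as unset at every level.
func resolveBackendURL(savedValue: String?,
                       environment: [String: String]) -> String {
    if let saved = savedValue, !saved.isEmpty { return saved }
    if let env = environment["SPARK_BACKEND_URL"], !env.isEmpty { return env }
    return placeholderBackendURL
}
```

In the app this would presumably be called with a value read from UserDefaults and with ProcessInfo.processInfo.environment; passing both in as parameters keeps the function testable.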

The backend's TLS certificate is issued by a Start9 (StartOS) self-signed Root CA. The supported path is to install the StartOS Root CA in your System keychain, after which normal TLS validation succeeds. The Skip TLS verification setting is an opt-in escape hatch: it is off by default and scoped to the configured backend host, so it never becomes "trust any server."
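
A host-scoped bypass of this kind is commonly built on URLSessionDelegate's authentication-challenge callback. The sketch below shows the general shape under stated assumptions: TLSPolicy and BackendSessionDelegate are hypothetical names, and the app's real TLS handling may differ.

```swift
import Foundation
#if canImport(Security)
import Security
#endif

// Hypothetical policy: bypass only when opted in AND the host matches.
struct TLSPolicy {
    let configuredHost: String
    let skipVerification: Bool

    func allowsBypass(forHost host: String) -> Bool {
        skipVerification && host.lowercased() == configuredHost.lowercased()
    }
}

#if canImport(Security)
final class BackendSessionDelegate: NSObject, URLSessionDelegate {
    let policy: TLSPolicy
    init(policy: TLSPolicy) { self.policy = policy }

    func urlSession(_ session: URLSession,
                    didReceive challenge: URLAuthenticationChallenge,
                    completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void) {
        let space = challenge.protectionSpace
        // Anything other than an opted-in server-trust challenge for the
        // configured host falls through to normal system validation.
        guard space.authenticationMethod == NSURLAuthenticationMethodServerTrust,
              let trust = space.serverTrust,
              policy.allowsBypass(forHost: space.host) else {
            completionHandler(.performDefaultHandling, nil)
            return
        }
        completionHandler(.useCredential, URLCredential(trust: trust))
    }
}
#endif
```

Because every non-matching challenge gets .performDefaultHandling, a redirect or a request to any other host still goes through ordinary certificate validation.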

Output

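Each session writes to ~/Ten31Transcripts/sessions/<timestamp>_<app>/ (configurable in Settings); if you enter a meeting name when the recording stops, the folder is renamed to <date>_<name>_<app>/: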

mic.wav  system.wav  mixed_mono_16k.wav    # audio (dual-track + mono mix)
self_vad.json  visual_timeline.json        # self voice-activity + visual hints
speakers.json  cluster_fingerprints.json   # reconciled speakers + voiceprints
transcript.md  recap.html  recap.json      # final outputs
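
The commit note at the top of this page describes renaming a session folder from the timestamped form to <date>_<name>_<app>. A minimal sketch of that transformation follows. renamedSessionFolder is a hypothetical function, not the actual SessionNaming API, and replacing "/", ":", and "_" in the meeting name with "-" is an assumption made so the result stays a valid, parseable folder name.

```swift
import Foundation

// Turns "2026-06-17T21-51-05_Zoom" + "Board sync" into
// "2026-06-17_Board sync_Zoom". A blank name, or a folder that is not in
// the expected stamped form, is returned unchanged.
func renamedSessionFolder(stamped: String, meetingName: String) -> String {
    let trimmed = meetingName.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else { return stamped }
    guard let sep = stamped.firstIndex(of: "_") else { return stamped }

    let stamp = stamped[..<sep]                      // yyyy-MM-dd'T'HH-mm-ss
    let app = stamped[stamped.index(after: sep)...]  // everything after "_"
    guard stamp.count == 19, let tIndex = stamp.firstIndex(of: "T") else {
        return stamped
    }
    let date = stamp[..<tIndex]                      // drops the time part

    // Assumption: keep the name filesystem-safe and free of the separator.
    let safe = trimmed.map { ch -> Character in
        (ch == "/" || ch == ":" || ch == "_") ? "-" : ch
    }
    return "\(date)_\(String(safe))_\(app)"
}
```

Returning the input unchanged on any unexpected shape matches the documented behavior that skipping the prompt keeps the timestamped name.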

Project layout

project.yml                # XcodeGen recipe → generates the .xcodeproj
Ten31Transcripts/
  App/         @main entry + AppDelegate
  Detection/   CallDetector — which app is in a call
  Audio/       dual-track capture, mixing, resampling, self-VAD
  Visual/      ScreenCaptureKit capture + grid analysis → speaker timeline
  Adapters/    per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
  Session/     SessionController state machine, packaging, reconciliation
  Backend/     SparkControl + LLM clients, voiceprint store, TLS handling
  Recap/       transcript.md + recap.html rendering, speaker editor
  Permissions/ Settings/ UI/ Support/   (permissions, AppSettings, views, Info.plist)
Ten31TranscriptsTests/     # XCTest — pure logic (chunking, reconciliation, analyzer math)
docs/                      # architecture & data-contract design notes

Notes

  • App Sandbox is off and Hardened Runtime is off — this is a personal, LAN-only tool that must observe other apps. Revisit only if distributing.
  • Privacy: video frames are never written to disk; recordings, transcripts, and screenshots are gitignored and never committed.
  • AGENTS.md is the canonical reference for build commands, conventions, and current state; ROADMAP.md holds the backlog; docs/ holds the architecture and data-contract design notes.