# Ten31 Transcripts

Native macOS menu-bar app that auto-detects conference calls, records dual-track
audio while watching the call window for active-speaker cues, and hands the audio
plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
the transcription, diarization, and speaker naming — producing named transcripts
and meeting recaps.

It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
the backend; the app only records, watches, packages, and reconciles hints.

## How it works

1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
   notices and (optionally) auto-starts a session.
2. **Record + watch** — dual-track audio (your mic + system output) is captured while
   `ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
   active speaker. Video frames are analyzed in memory and released immediately —
   **never written to disk**.
3. **Package + send** — audio is chunked and sent to the backend, dual-channel
   (`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
   The visual timeline rides along as naming hints. Backend calls are sequential
   (one in flight) to respect the single-GPU backend.
4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
   (Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
   hints and your stored voiceprints.
5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
   `transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
   names after the fact.
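
The dual-channel versus mono choice in step 3 can be sketched as a small shell helper. This is only an illustration under stated assumptions: `mic_file` and `system_file` are the field names mentioned above, while the mono field name `audio_file`, the per-chunk file names, and the function name are hypothetical and may not match the real backend contract.

```sh
# Hypothetical sketch: print the multipart form fields for one audio chunk.
# `mic_file` / `system_file` come from this README; `audio_file` and the
# chunk file names (mic.wav, system.wav, mixed_mono_16k.wav) are assumptions.
chunk_form_fields() {
  chunk_dir="$1"
  system_healthy="$2"   # "yes" when the system track is usable
  if [ "$system_healthy" = "yes" ]; then
    # Dual-channel upload: one field per track.
    printf '%s\n' "mic_file=@$chunk_dir/mic.wav" \
                  "system_file=@$chunk_dir/system.wav"
  else
    # Fallback: a single mono mix.
    printf '%s\n' "audio_file=@$chunk_dir/mixed_mono_16k.wav"
  fi
}
```

Each printed line has the shape `curl -F` expects (`name=@path`). Sending chunks from a plain `for` loop, one `curl` call at a time, would keep a single request in flight, in line with the sequential behavior described above.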

**You** are identified by the mic channel plus the single name in *Settings → Your
name* — that name is reserved so the LLM never assigns it to anyone else. (There's
no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)

## One-time setup

1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
   accept the license prompt.
2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
   ```sh
   brew install xcodegen
   ```
3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
   `Config/Signing.xcconfig`. Copy the template and set your team:
   ```sh
   cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
   ```
   `xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
   team automatically. Keep the value stable so macOS preserves the app's permission
   (TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
   overwrites Xcode-side changes.
4. **Generate the project** (re-run any time you add/remove/rename a source file):
   ```sh
   xcodegen generate
   ```
   This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).
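
Before generating the project, it can help to confirm that the signing config from step 3 is actually filled in. The helper below is a minimal sketch, not part of the repository: the function name is made up, and it assumes the file uses the usual `KEY = value` xcconfig form and that a Team ID is 10 uppercase alphanumeric characters.

```sh
# Sketch: check that the signing xcconfig sets a plausible DEVELOPMENT_TEAM.
# Assumptions: `KEY = value` syntax, Team ID is 10 chars of A-Z / 0-9.
check_signing_config() {
  file="${1:-Config/Signing.xcconfig}"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  if ! grep -Eq '^[[:space:]]*DEVELOPMENT_TEAM[[:space:]]*=[[:space:]]*[A-Z0-9]{10}[[:space:]]*$' "$file"; then
    echo "DEVELOPMENT_TEAM not set in $file" >&2
    return 1
  fi
}
```

For example, `check_signing_config && xcodegen generate` would only regenerate the project once the team is set.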

## Build & run

The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).

To build a standalone app and install it (so Xcode doesn't need to stay open), use
`xcodebuild`. Note the `DEVELOPER_DIR` prefix: full Xcode lives at
`/Applications/Xcode.app`, but `xcode-select` may point at the Command Line Tools,
so set `DEVELOPER_DIR` on **every** `xcodebuild` invocation:

```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
```

The installed copy does **not** auto-update — rebuild and `ditto` again after changes.

Run the test suite:

```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
```
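
Because both commands above repeat the same `DEVELOPER_DIR` prefix, a small wrapper function is one way to avoid forgetting it. This is a convenience sketch only; the name `xcb` is made up and nothing in the repository defines it.

```sh
# Sketch: run xcodebuild with DEVELOPER_DIR pinned to the full Xcode install,
# regardless of what `xcode-select -p` currently points at.
xcb() {
  DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild "$@"
}
```

With that defined in your shell, `xcb test -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts -destination 'platform=macOS'` behaves like the prefixed form.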

## Permissions

The menu panel shows live status for the three permissions the app needs, each with
Grant / Open Settings buttons:

- **Microphone** — to record your side of the call.
- **Screen Recording** — to capture system audio and watch the call window.
- **Accessibility** — to read window/participant information.

## Backend setup

Point the app at your SparkControl backend in **Settings → SparkControl backend**.
The URL is resolved in this order: the value saved in Settings (UserDefaults), then
the `SPARK_BACKEND_URL` environment variable, then the committed default. That
default is only a placeholder (`https://your-spark-backend.local`); your real LAN
URL lives in Settings and never touches source.
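
The fallback chain can be pictured with a short shell function. This is an illustration of the order only, not the app's code: the function name is hypothetical, and the saved Settings value is passed in as an argument rather than read from UserDefaults.

```sh
# Sketch of the resolution order: saved setting, then env var, then placeholder.
# $1 stands in for the value saved in Settings (empty when nothing is saved).
resolve_backend_url() {
  saved="$1"
  if [ -n "$saved" ]; then
    printf '%s\n' "$saved"
  elif [ -n "${SPARK_BACKEND_URL:-}" ]; then
    printf '%s\n' "$SPARK_BACKEND_URL"
  else
    printf '%s\n' "https://your-spark-backend.local"
  fi
}
```

A saved value always wins, so setting `SPARK_BACKEND_URL` has no effect once a URL has been entered in Settings.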

The backend serves TLS with a certificate issued under a Start9 (StartOS) self-signed
Root CA. The supported path is to **install the StartOS Root CA in your System
keychain**, after which normal TLS validation succeeds. *Skip TLS verification* is an
opt-in escape hatch that is **off by default** and **scoped to the configured backend
host**, so it never becomes "trust any server."
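
One way to install the Root CA from the command line is the macOS `security` tool. The sketch below assumes you have already downloaded the CA certificate from your StartOS server. The function name and any certificate path you pass are placeholders, and the command needs admin rights.

```sh
# Sketch: trust a downloaded StartOS Root CA system-wide on macOS.
# The certificate path is supplied by the caller; nothing here is app-specific.
install_root_ca() {
  cert="$1"
  if [ ! -f "$cert" ]; then
    echo "certificate not found: $cert" >&2
    return 1
  fi
  # -d: add to admin (system) trust settings; -r trustRoot: trust as a root CA;
  # -k: keychain that receives the certificate.
  sudo security add-trusted-cert -d -r trustRoot \
    -k /Library/Keychains/System.keychain "$cert"
}
```

After the certificate is trusted, *Skip TLS verification* can stay off.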

## Output

Each session writes to `~/Ten31Transcripts/sessions/<timestamp>_<app>/` (configurable
in Settings):

```
mic.wav  system.wav  mixed_mono_16k.wav    # audio (dual-track + mono mix)
self_vad.json  visual_timeline.json        # self voice-activity + visual hints
speakers.json  cluster_fingerprints.json   # reconciled speakers + voiceprints
transcript.md  recap.html  recap.json      # final outputs
```
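
To sanity-check a finished session, a helper like the following could report which of the files listed above are missing. It is a sketch under one assumption, namely that every session is expected to produce all ten files; the function name is illustrative.

```sh
# Sketch: list expected artifacts that are absent from a session directory.
# File names are taken from the listing above. Returns 1 if any are missing.
check_session_dir() {
  dir="$1"
  missing=0
  for f in mic.wav system.wav mixed_mono_16k.wav \
           self_vad.json visual_timeline.json \
           speakers.json cluster_fingerprints.json \
           transcript.md recap.html recap.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Pass it the path of one session directory, for example a folder under `~/Ten31Transcripts/sessions/`.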

## Project layout

```
project.yml                # XcodeGen recipe → generates the .xcodeproj
Ten31Transcripts/
  App/         @main entry + AppDelegate
  Detection/   CallDetector — which app is in a call
  Audio/       dual-track capture, mixing, resampling, self-VAD
  Visual/      ScreenCaptureKit capture + grid analysis → speaker timeline
  Adapters/    per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
  Session/     SessionController state machine, packaging, reconciliation
  Backend/     SparkControl + LLM clients, voiceprint store, TLS handling
  Recap/       transcript.md + recap.html rendering, speaker editor
  Permissions/ Settings/ UI/ Support/   (permissions, AppSettings, views, Info.plist)
Ten31TranscriptsTests/     # XCTest — pure logic (chunking, reconciliation, analyzer math)
docs/                      # architecture & data-contract design notes
```

## Notes

- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
  LAN-only tool that must observe other apps. Revisit only if distributing.
- **Privacy:** video frames are never written to disk; recordings, transcripts, and
  screenshots are gitignored and never committed.
- `AGENTS.md` is the canonical reference for build commands, conventions, and current
  state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
  data-contract design notes.