Rewrite README for the shipped app; fix stale AppSettings comment

The README still described "Phase 0 (scaffold)", with no audio capture, call detection, screen reading, or backend hand-off, for an app that ships all of it. Rewrite it to document the real detect/record/send/transcribe/recap pipeline, the standalone build and install commands, backend and Start9 Root CA setup (skip-TLS is off by default and host-scoped; the old README said it was on by default), output files, and the real project layout. Also fix the matching "Phase 0" comment in AppSettings.
Grant Gilliam
2026-06-16 21:54:54 -05:00
parent b42b591690
commit 85ea8fde45
2 changed files with 115 additions and 43 deletions
+110 -38
@@ -1,74 +1,146 @@
 # Ten31 Transcripts
-Native macOS menu-bar app that auto-detects conference calls, records local audio,
-builds a visual-derived speaker timeline, and hands audio + timeline to the
-SparkControl backend for naming/transcription. See `docs/` for the full spec.
+Native macOS menu-bar app that auto-detects conference calls, records dual-track
+audio while watching the call window for active-speaker cues, and hands the audio
+plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
+the transcription, diarization, and speaker naming — producing named transcripts
+and meeting recaps.
-This repo is at **Phase 0** (scaffold, permissions, backend health check).
+It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
+the backend; the app only records, watches, packages, and reconciles hints.
+## How it works
+1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
+notices and (optionally) auto-starts a session.
+2. **Record + watch** — dual-track audio (your mic + system output) is captured while
+`ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
+active speaker. Video frames are analyzed in memory and released immediately —
+**never written to disk**.
+3. **Package + send** — audio is chunked and sent to the backend, dual-channel
+(`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
+The visual timeline rides along as naming hints. Backend calls are sequential
+(one in flight) to respect the single-GPU backend.
+4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
+(Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
+hints and your stored voiceprints.
+5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
+`transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
+names after the fact.
+**You** are identified by the mic channel plus the single name in *Settings → Your
+name* — that name is reserved so the LLM never assigns it to anyone else. (There's
+no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
 ## One-time setup
-1. **Install Xcode** from the Mac App Store (free; ~40 GB). Open it once and
+1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
 accept the license prompt.
 2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
 ```sh
 brew install xcodegen
 ```
-3. **Set your signing team.** The Apple Team ID is kept out of source in a
-gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
+3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
+`Config/Signing.xcconfig`. Copy the template and set your team:
 ```sh
 cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
 ```
 `xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
-team automatically — no manual selection. Keep the value stable so macOS
-preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
-not Xcode — `xcodegen generate` overwrites Xcode-side changes.
+team automatically. Keep the value stable so macOS preserves the app's permission
+(TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
+overwrites Xcode-side changes.
-4. **Generate the project:**
+4. **Generate the project** (re-run any time you add/remove/rename a source file):
 ```sh
 xcodegen generate
 ```
-This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
-5. **Open it:**
+This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).
+## Build & run
+The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).
+To build a standalone app and install it (Xcode doesn't need to stay open) — note the
+`DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app` but
+`xcode-select` may point at the Command Line Tools, so set it on **every**
+`xcodebuild`:
 ```sh
-open Ten31Transcripts.xcodeproj
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
+-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+-configuration Release -derivedDataPath /tmp/ten31-release build
+ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
+open /Applications/Ten31Transcripts.app
 ```
-6. Press **Run** (⌘R).
-> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
-> Xcode reload the project. The signing team persists because it lives in
-> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
-> rebuilds.
-## What Phase 0 does
-- Launches as a menu-bar-only app (no Dock icon).
-- Menu panel shows live status for the three permissions it needs — **Microphone**,
-**Screen Recording**, **Accessibility** — with Grant / Open Settings buttons.
-- Shows a **backend health check** (`GET /api/status`) against the configured host.
-- **Settings:** backend base URL, skip-TLS toggle (on by default for the
-self-signed cert), output folder, and adapter toggles (inert this phase).
-No audio capture, call detection, screen reading, or backend hand-off yet — those
-arrive in Phases 1–6 (`docs/04_BUILD_PLAN.md`).
+The installed copy does **not** auto-update — rebuild and `ditto` again after changes.
+Run the test suite:
+```sh
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
+-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
+```
+## Permissions
+The menu panel shows live status for the three permissions the app needs, each with
+Grant / Open Settings buttons:
+- **Microphone** — to record your side of the call.
+- **Screen Recording** — to capture system audio and watch the call window.
+- **Accessibility** — to read window/participant information.
+## Backend setup
+Point the app at your SparkControl backend in **Settings → SparkControl backend**.
+The resolution order is: the value saved in Settings (UserDefaults) wins, else the
+`SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed
+default is only a placeholder (`https://your-spark-backend.local`) — your real LAN
+URL lives in Settings and never touches source.
+The backend sits behind a Start9 self-signed Root CA. The supported path is to
+**install the StartOS Root CA in your System keychain**, after which normal TLS
+validation succeeds. *Skip TLS verification* is an opt-in escape hatch, **off by
+default** and **scoped to the configured backend host** — it never becomes
+"trust any server."
+## Output
+Each session writes to `~/Ten31Transcripts/sessions/<timestamp>_<app>/` (configurable
+in Settings):
+```
+mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
+self_vad.json visual_timeline.json # self voice-activity + visual hints
+speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
+transcript.md recap.html recap.json # final outputs
+```
 ## Project layout
 ```
 project.yml # XcodeGen recipe → generates the .xcodeproj
 Ten31Transcripts/
-App/ Ten31TranscriptsApp.swift, AppDelegate.swift
-UI/ MenuBarView, SettingsView, PermissionRow
-Permissions/PermissionsManager.swift
-Backend/ SparkControlHealth.swift, InsecureTrustDelegate.swift
-Settings/ AppSettings.swift
-Support/ Info.plist, Ten31Transcripts.entitlements
-Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
+App/ @main entry + AppDelegate
+Detection/ CallDetector — which app is in a call
+Audio/ dual-track capture, mixing, resampling, self-VAD
+Visual/ ScreenCaptureKit capture + grid analysis → speaker timeline
+Adapters/ per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
+Session/ SessionController state machine, packaging, reconciliation
+Backend/ SparkControl + LLM clients, voiceprint store, TLS handling
+Recap/ transcript.md + recap.html rendering, speaker editor
+Permissions/ Settings/ UI/ Support/ (permissions, AppSettings, views, Info.plist)
+Ten31TranscriptsTests/ # XCTest — pure logic (chunking, reconciliation, analyzer math)
+docs/ # architecture & data-contract design notes
 ```
 ## Notes
 - **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
 LAN-only tool that must observe other apps. Revisit only if distributing.
-- The backend host is a private LAN address — set it in **Settings**, or seed it
-from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
-placeholder (`https://your-spark-backend.local`).
+- **Privacy:** video frames are never written to disk; recordings, transcripts, and
+screenshots are gitignored and never committed.
+- `AGENTS.md` is the canonical reference for build commands, conventions, and current
+state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
+data-contract design notes.
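The backend URL resolution order that the new README describes (saved setting first, then the `SPARK_BACKEND_URL` environment variable, then the placeholder) can be sketched as a small pure function. This is only an illustration under stated assumptions: the function name `resolveBackendURL` and the rule that blank strings count as "not set" are hypothetical and are not taken from the app's source.

```swift
import Foundation

/// Placeholder default quoted in the README; it is not a real host.
let placeholderBackendURL = "https://your-spark-backend.local"

/// Hypothetical helper (not from the app's source): returns the first usable
/// value in the documented order: saved setting, then env var, then placeholder.
func resolveBackendURL(saved: String?, environment: [String: String]) -> String {
    // Assumption: nil and whitespace-only strings are treated as "not set".
    func usable(_ value: String?) -> String? {
        guard let trimmed = value?.trimmingCharacters(in: .whitespacesAndNewlines),
              !trimmed.isEmpty else { return nil }
        return trimmed
    }
    return usable(saved)
        ?? usable(environment["SPARK_BACKEND_URL"])
        ?? placeholderBackendURL
}

// Usage with literal inputs; in an app these would presumably come from
// UserDefaults and ProcessInfo.processInfo.environment.
let fromSettings = resolveBackendURL(
    saved: "https://a.example",
    environment: ["SPARK_BACKEND_URL": "https://b.example"])
let fromEnvironment = resolveBackendURL(
    saved: nil,
    environment: ["SPARK_BACKEND_URL": "https://b.example"])
let fallback = resolveBackendURL(saved: nil, environment: [:])
```

Keeping the lookup free of direct `UserDefaults` access makes the precedence easy to cover with the kind of pure-logic tests the README mentions.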
+2 -2
@@ -3,8 +3,8 @@ import Combine
 /// User-facing settings, persisted to `UserDefaults`.
 ///
-/// Phase 0 scope: backend host + TLS-skip, output folder, and adapter toggles.
-/// The adapter toggles persist but do nothing yet (adapters arrive in Phase 3–4).
+/// Covers the backend host + TLS handling, output folder, your name, chunk
+/// length, per-app adapter toggles, and the auto-record/auto-send/recap flags.
 @MainActor
 final class AppSettings: ObservableObject {
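The commit describes TLS skipping as off by default and scoped to the configured backend host. A minimal sketch of such a policy check follows, assuming a hypothetical function `shouldSkipTLSValidation` and a case-insensitive host comparison; neither is confirmed by the source. In a real client a check like this would typically be consulted from `URLSessionDelegate`'s `urlSession(_:didReceive:completionHandler:)` using `challenge.protectionSpace.host`, with default handling for every other host.

```swift
import Foundation

/// Hypothetical policy check (not the app's actual code): skip certificate
/// validation only when the user opted in AND the challenged host is the
/// configured backend host.
func shouldSkipTLSValidation(challengeHost: String,
                             backendURL: String,
                             skipEnabled: Bool) -> Bool {
    // Off by default: without the opt-in, or without a parseable backend
    // host, normal TLS validation always applies.
    guard skipEnabled,
          let backendHost = URLComponents(string: backendURL)?.host else {
        return false
    }
    // Assumption: hosts are compared case-insensitively.
    return challengeHost.lowercased() == backendHost.lowercased()
}

// Usage with the README's placeholder host.
let backend = "https://your-spark-backend.local"
let skipsForBackend = shouldSkipTLSValidation(
    challengeHost: "your-spark-backend.local", backendURL: backend, skipEnabled: true)
let skipsForOtherHost = shouldSkipTLSValidation(
    challengeHost: "example.com", backendURL: backend, skipEnabled: true)
let skipsWhenDisabled = shouldSkipTLSValidation(
    challengeHost: "your-spark-backend.local", backendURL: backend, skipEnabled: false)
```

Because the function returns `false` whenever the host does not match, enabling the toggle cannot turn into blanket trust for other servers.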