diff --git a/README.md b/README.md
index 33545df..33c74e5 100644
--- a/README.md
+++ b/README.md
@@ -1,74 +1,146 @@
 # Ten31 Transcripts
 
-Native macOS menu-bar app that auto-detects conference calls, records local audio,
-builds a visual-derived speaker timeline, and hands audio + timeline to the
-SparkControl backend for naming/transcription. See `docs/` for the full spec.
+Native macOS menu-bar app that auto-detects conference calls, records dual-track
+audio while watching the call window for active-speaker cues, and hands the audio
+plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
+the transcription, diarization, and speaker naming — producing named transcripts
+and meeting recaps.
 
-This repo is at **Phase 0** (scaffold, permissions, backend health check).
+It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
+the backend; the app only records, watches, packages, and reconciles hints.
+
+## How it works
+
+1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
+   notices and (optionally) auto-starts a session.
+2. **Record + watch** — dual-track audio (your mic + system output) is captured while
+   `ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
+   active speaker. Video frames are analyzed in memory and released immediately —
+   **never written to disk**.
+3. **Package + send** — audio is chunked and sent to the backend, dual-channel
+   (`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
+   The visual timeline rides along as naming hints. Backend calls are sequential
+   (one in flight) to respect the single-GPU backend.
+4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
+   (Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
+   hints and your stored voiceprints.
+5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
+   `transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
+   names after the fact.
+
+**You** are identified by the mic channel plus the single name in *Settings → Your
+name* — that name is reserved so the LLM never assigns it to anyone else. (There's
+no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
 
 ## One-time setup
 
-1. **Install Xcode** from the Mac App Store (free; ~40 GB). Open it once and
+1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
    accept the license prompt.
 2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
    ```sh
    brew install xcodegen
    ```
-3. **Set your signing team.** The Apple Team ID is kept out of source in a
-   gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
+3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
+   `Config/Signing.xcconfig`. Copy the template and set your team:
    ```sh
    cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
    ```
    `xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
-   team automatically — no manual selection. Keep the value stable so macOS
-   preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
-   not Xcode — `xcodegen generate` overwrites Xcode-side changes.
-4. **Generate the project:**
+   team automatically. Keep the value stable so macOS preserves the app's permission
+   (TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
+   overwrites Xcode-side changes.
+4. **Generate the project** (re-run any time you add/remove/rename a source file):
    ```sh
    xcodegen generate
    ```
-   This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
-5. **Open it:**
-   ```sh
-   open Ten31Transcripts.xcodeproj
-   ```
-6. Press **Run** (⌘R).
+   This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).
 
-> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
-> Xcode reload the project. The signing team persists because it lives in
-> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
-> rebuilds.
+## Build & run
 
-## What Phase 0 does
+The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).
 
-- Launches as a menu-bar-only app (no Dock icon).
-- Menu panel shows live status for the three permissions it needs — **Microphone**,
-  **Screen Recording**, **Accessibility** — with Grant / Open Settings buttons.
-- Shows a **backend health check** (`GET /api/status`) against the configured host.
-- **Settings:** backend base URL, skip-TLS toggle (on by default for the
-  self-signed cert), output folder, and adapter toggles (inert this phase).
+To build a standalone app and install it (Xcode doesn't need to stay open) — note the
+`DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app` but
+`xcode-select` may point at the Command Line Tools, so set it on **every**
+`xcodebuild`:
 
-No audio capture, call detection, screen reading, or backend hand-off yet — those
-arrive in Phases 1–6 (`docs/04_BUILD_PLAN.md`).
+```sh
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -configuration Release -derivedDataPath /tmp/ten31-release build
+ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
+open /Applications/Ten31Transcripts.app
+```
+
+The installed copy does **not** auto-update — rebuild and `ditto` again after changes.
+
+Run the test suite:
+
+```sh
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
+```
+
+## Permissions
+
+The menu panel shows live status for the three permissions the app needs, each with
+Grant / Open Settings buttons:
+
+- **Microphone** — to record your side of the call.
+- **Screen Recording** — to capture system audio and watch the call window.
+- **Accessibility** — to read window/participant information.
+
+## Backend setup
+
+Point the app at your SparkControl backend in **Settings → SparkControl backend**.
+The resolution order is: the value saved in Settings (UserDefaults) wins, else the
+`SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed
+default is only a placeholder (`https://your-spark-backend.local`) — your real LAN
+URL lives in Settings and never touches source.
+
+The backend sits behind a Start9 self-signed Root CA. The supported path is to
+**install the StartOS Root CA in your System keychain**, after which normal TLS
+validation succeeds. *Skip TLS verification* is an opt-in escape hatch, **off by
+default** and **scoped to the configured backend host** — it never becomes
+"trust any server."
+
+## Output
+
+Each session writes to `~/Ten31Transcripts/sessions/_/` (configurable
+in Settings):
+
+```
+mic.wav  system.wav  mixed_mono_16k.wav    # audio (dual-track + mono mix)
+self_vad.json  visual_timeline.json        # self voice-activity + visual hints
+speakers.json  cluster_fingerprints.json   # reconciled speakers + voiceprints
+transcript.md  recap.html  recap.json      # final outputs
+```
 
 ## Project layout
 
 ```
-project.yml                 # XcodeGen recipe → generates the .xcodeproj
+project.yml                   # XcodeGen recipe → generates the .xcodeproj
 Ten31Transcripts/
-  App/        Ten31TranscriptsApp.swift, AppDelegate.swift
-  UI/         MenuBarView, SettingsView, PermissionRow
-  Permissions/PermissionsManager.swift
-  Backend/    SparkControlHealth.swift, InsecureTrustDelegate.swift
-  Settings/   AppSettings.swift
-  Support/    Info.plist, Ten31Transcripts.entitlements
-Ten31TranscriptsTests/      # placeholder; real tests land in Phase 3
+  App/          @main entry + AppDelegate
+  Detection/    CallDetector — which app is in a call
+  Audio/        dual-track capture, mixing, resampling, self-VAD
+  Visual/       ScreenCaptureKit capture + grid analysis → speaker timeline
+  Adapters/     per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
+  Session/      SessionController state machine, packaging, reconciliation
+  Backend/      SparkControl + LLM clients, voiceprint store, TLS handling
+  Recap/        transcript.md + recap.html rendering, speaker editor
+  Permissions/  Settings/  UI/  Support/   (permissions, AppSettings, views, Info.plist)
+Ten31TranscriptsTests/        # XCTest — pure logic (chunking, reconciliation, analyzer math)
+docs/                         # architecture & data-contract design notes
 ```
 
 ## Notes
 
 - **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
   LAN-only tool that must observe other apps. Revisit only if distributing.
-- The backend host is a private LAN address — set it in **Settings**, or seed it
-  from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
-  placeholder (`https://your-spark-backend.local`).
+- **Privacy:** video frames are never written to disk; recordings, transcripts, and
+  screenshots are gitignored and never committed.
+- `AGENTS.md` is the canonical reference for build commands, conventions, and current
+  state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
+  data-contract design notes.
diff --git a/Ten31Transcripts/Settings/AppSettings.swift b/Ten31Transcripts/Settings/AppSettings.swift
index 037513d..7e7b0b2 100644
--- a/Ten31Transcripts/Settings/AppSettings.swift
+++ b/Ten31Transcripts/Settings/AppSettings.swift
@@ -3,8 +3,8 @@ import Combine
 
 /// User-facing settings, persisted to `UserDefaults`.
 ///
-/// Phase 0 scope: backend host + TLS-skip, output folder, and adapter toggles.
-/// The adapter toggles persist but do nothing yet (adapters arrive in Phase 3–4).
+/// Covers the backend host + TLS handling, output folder, your name, chunk
+/// length, per-app adapter toggles, and the auto-record/auto-send/recap flags.
 @MainActor
 final class AppSettings: ObservableObject {