Rewrite README for the shipped app; fix stale AppSettings comment
The README still described "Phase 0 (scaffold)" — no audio capture, call detection, screen reading, or backend hand-off — for an app that ships all of it. Rewrite it to document the real detect/record/send/transcribe/recap pipeline, the standalone build+install commands, backend and Start9 Root CA setup (skip-TLS is off by default and host-scoped, not on by default), output files, and the real project layout. Also fix the matching "Phase 0" comment in AppSettings.
@@ -1,74 +1,146 @@
|
|||||||
# Ten31 Transcripts
|
# Ten31 Transcripts
|
||||||
|
|
||||||
Native macOS menu-bar app that auto-detects conference calls, records local audio,
|
Native macOS menu-bar app that auto-detects conference calls, records dual-track
|
||||||
builds a visual-derived speaker timeline, and hands audio + timeline to the
|
audio while watching the call window for active-speaker cues, and hands the audio
|
||||||
SparkControl backend for naming/transcription. See `docs/` for the full spec.
|
plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
|
||||||
|
the transcription, diarization, and speaker naming — producing named transcripts
|
||||||
|
and meeting recaps.
|
||||||
|
|
||||||
This repo is at **Phase 0** (scaffold, permissions, backend health check).
|
It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
|
||||||
|
the backend; the app only records, watches, packages, and reconciles hints.
|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
|
||||||
|
notices and (optionally) auto-starts a session.
|
||||||
|
2. **Record + watch** — dual-track audio (your mic + system output) is captured while
|
||||||
|
`ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
|
||||||
|
active speaker. Video frames are analyzed in memory and released immediately —
|
||||||
|
**never written to disk**.
|
||||||
|
3. **Package + send** — audio is chunked and sent to the backend, dual-channel
|
||||||
|
(`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
|
||||||
|
The visual timeline rides along as naming hints. Backend calls are sequential
|
||||||
|
(one in flight) to respect the single-GPU backend.
|
||||||
|
4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
|
||||||
|
(Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
|
||||||
|
hints and your stored voiceprints.
|
||||||
|
5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
|
||||||
|
`transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
|
||||||
|
names after the fact.
|
||||||
|
|
||||||
|
**You** are identified by the mic channel plus the single name in *Settings → Your
|
||||||
|
name* — that name is reserved so the LLM never assigns it to anyone else. (There's
|
||||||
|
no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
|
||||||
|
|
||||||
## One-time setup
|
## One-time setup
|
||||||
|
|
||||||
1. **Install Xcode** from the Mac App Store (free; ~40 GB). Open it once and
|
1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
|
||||||
accept the license prompt.
|
accept the license prompt.
|
||||||
2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
|
2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
|
||||||
```sh
|
```sh
|
||||||
brew install xcodegen
|
brew install xcodegen
|
||||||
```
|
```
|
||||||
3. **Set your signing team.** The Apple Team ID is kept out of source in a
|
3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
|
||||||
gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
|
`Config/Signing.xcconfig`. Copy the template and set your team:
|
||||||
```sh
|
```sh
|
||||||
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
|
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
|
||||||
```
|
```
|
||||||
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
|
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
|
||||||
team automatically — no manual selection. Keep the value stable so macOS
|
team automatically. Keep the value stable so macOS preserves the app's permission
|
||||||
preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
|
(TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
|
||||||
not Xcode — `xcodegen generate` overwrites Xcode-side changes.
|
overwrites Xcode-side changes.
|
||||||
4. **Generate the project:**
|
4. **Generate the project** (re-run any time you add/remove/rename a source file):
|
||||||
```sh
|
```sh
|
||||||
xcodegen generate
|
xcodegen generate
|
||||||
```
|
```
|
||||||
This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
|
This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).
|
||||||
5. **Open it:**
|
|
||||||
```sh
|
|
||||||
open Ten31Transcripts.xcodeproj
|
|
||||||
```
|
|
||||||
6. Press **Run** (⌘R).
|
|
||||||
|
|
||||||
> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
|
## Build & run
|
||||||
> Xcode reload the project. The signing team persists because it lives in
|
|
||||||
> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
|
|
||||||
> rebuilds.
|
|
||||||
|
|
||||||
## What Phase 0 does
|
The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).
|
||||||
|
|
||||||
- Launches as a menu-bar-only app (no Dock icon).
|
To build a standalone app and install it (Xcode doesn't need to stay open) — note the
|
||||||
- Menu panel shows live status for the three permissions it needs — **Microphone**,
|
`DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app` but
|
||||||
**Screen Recording**, **Accessibility** — with Grant / Open Settings buttons.
|
`xcode-select` may point at the Command Line Tools, so set it on **every**
|
||||||
- Shows a **backend health check** (`GET /api/status`) against the configured host.
|
`xcodebuild`:
|
||||||
- **Settings:** backend base URL, skip-TLS toggle (on by default for the
|
|
||||||
self-signed cert), output folder, and adapter toggles (inert this phase).
|
|
||||||
|
|
||||||
No audio capture, call detection, screen reading, or backend hand-off yet — those
|
```sh
|
||||||
arrive in Phases 1–6 (`docs/04_BUILD_PLAN.md`).
|
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
|
||||||
|
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||||
|
-configuration Release -derivedDataPath /tmp/ten31-release build
|
||||||
|
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
|
||||||
|
open /Applications/Ten31Transcripts.app
|
||||||
|
```
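Whether the `DEVELOPER_DIR` prefix is actually needed depends on where `xcode-select` currently points. A minimal sketch for checking that, assuming the active developer directory is obtained from `xcode-select -p`; the helper name `needs_developer_dir` is hypothetical and not part of this repo:

```sh
# Hypothetical helper: succeeds (exit 0) when the given developer directory is
# NOT inside a full Xcode.app bundle, i.e. the DEVELOPER_DIR prefix is needed.
needs_developer_dir() {
  case "$1" in
    */Xcode*.app/Contents/Developer) return 1 ;;  # full Xcode already selected
    *) return 0 ;;                                # Command Line Tools or anything else
  esac
}

# Usage on a Mac (path taken from the build command above):
#   if needs_developer_dir "$(xcode-select -p)"; then
#     export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
#   fi
```

Setting the variable per command, as the build command does, avoids changing the machine-wide `xcode-select` setting.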
|
||||||
|
|
||||||
|
The installed copy does **not** auto-update — rebuild and `ditto` again after changes.
|
||||||
|
|
||||||
|
Run the test suite:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
|
||||||
|
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||||
|
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
|
||||||
|
```
|
||||||
|
|
||||||
|
## Permissions
|
||||||
|
|
||||||
|
The menu panel shows live status for the three permissions the app needs, each with
|
||||||
|
Grant / Open Settings buttons:
|
||||||
|
|
||||||
|
- **Microphone** — to record your side of the call.
|
||||||
|
- **Screen Recording** — to capture system audio and watch the call window.
|
||||||
|
- **Accessibility** — to read window/participant information.
|
||||||
|
|
||||||
|
## Backend setup
|
||||||
|
|
||||||
|
Point the app at your SparkControl backend in **Settings → SparkControl backend**.
|
||||||
|
The resolution order is: the value saved in Settings (UserDefaults) wins, else the
|
||||||
|
`SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed
|
||||||
|
default is only a placeholder (`https://your-spark-backend.local`) — your real LAN
|
||||||
|
URL lives in Settings and never touches source.
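
The resolution order described here can be sketched in a few lines of shell. This is only an illustration of the precedence, not the app's Swift implementation: `resolve_backend_url` is a hypothetical name, the saved Settings value is passed in as an argument rather than read from UserDefaults, and treating an empty value as "not set" is an assumption.

```sh
# Hypothetical helper mirroring the documented precedence:
#   1. value saved in Settings (passed as $1 here)
#   2. SPARK_BACKEND_URL environment variable
#   3. the committed placeholder default
# Assumption: an empty string counts as "not set" at every level.
resolve_backend_url() {
  saved="$1"
  if [ -n "$saved" ]; then
    printf '%s\n' "$saved"
  elif [ -n "${SPARK_BACKEND_URL:-}" ]; then
    printf '%s\n' "$SPARK_BACKEND_URL"
  else
    printf '%s\n' "https://your-spark-backend.local"
  fi
}
```

With nothing saved and no environment variable set, the function falls through to the placeholder, which is why the real LAN URL never needs to appear in source.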
|
||||||
|
|
||||||
|
The backend sits behind a Start9 self-signed Root CA. The supported path is to
|
||||||
|
**install the StartOS Root CA in your System keychain**, after which normal TLS
|
||||||
|
validation succeeds. *Skip TLS verification* is an opt-in escape hatch, **off by
|
||||||
|
default** and **scoped to the configured backend host** — it never becomes
|
||||||
|
"trust any server."
|
||||||
|
|
||||||
|
## Output
|
||||||
|
|
||||||
|
Each session writes to `~/Ten31Transcripts/sessions/<timestamp>_<app>/` (configurable
|
||||||
|
in Settings):
|
||||||
|
|
||||||
|
```
|
||||||
|
mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
|
||||||
|
self_vad.json visual_timeline.json # self voice-activity + visual hints
|
||||||
|
speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
|
||||||
|
transcript.md recap.html recap.json # final outputs
|
||||||
|
```
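
A quick way to sanity-check a finished session is to compare its folder against the file list above. The sketch below assumes every session produces all ten files, which the README does not guarantee (for example, when the system track is unhealthy); `check_session_dir` is a hypothetical helper, not something shipped in the repo.

```sh
# Hypothetical helper: prints one "missing: <file>" line for each expected file
# that is absent from the session folder, and returns non-zero if any are missing.
check_session_dir() {
  dir="$1"
  missing=0
  for f in mic.wav system.wav mixed_mono_16k.wav \
           self_vad.json visual_timeline.json \
           speakers.json cluster_fingerprints.json \
           transcript.md recap.html recap.json; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the folder name is whatever <timestamp>_<app> the session created):
#   check_session_dir ~/Ten31Transcripts/sessions/<timestamp>_<app>
```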
|
||||||
|
|
||||||
## Project layout
|
## Project layout
|
||||||
|
|
||||||
```
|
```
|
||||||
project.yml # XcodeGen recipe → generates the .xcodeproj
|
project.yml # XcodeGen recipe → generates the .xcodeproj
|
||||||
Ten31Transcripts/
|
Ten31Transcripts/
|
||||||
App/ Ten31TranscriptsApp.swift, AppDelegate.swift
|
App/ @main entry + AppDelegate
|
||||||
UI/ MenuBarView, SettingsView, PermissionRow
|
Detection/ CallDetector — which app is in a call
|
||||||
Permissions/PermissionsManager.swift
|
Audio/ dual-track capture, mixing, resampling, self-VAD
|
||||||
Backend/ SparkControlHealth.swift, InsecureTrustDelegate.swift
|
Visual/ ScreenCaptureKit capture + grid analysis → speaker timeline
|
||||||
Settings/ AppSettings.swift
|
Adapters/ per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
|
||||||
Support/ Info.plist, Ten31Transcripts.entitlements
|
Session/ SessionController state machine, packaging, reconciliation
|
||||||
Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
|
Backend/ SparkControl + LLM clients, voiceprint store, TLS handling
|
||||||
|
Recap/ transcript.md + recap.html rendering, speaker editor
|
||||||
|
Permissions/ Settings/ UI/ Support/ (permissions, AppSettings, views, Info.plist)
|
||||||
|
Ten31TranscriptsTests/ # XCTest — pure logic (chunking, reconciliation, analyzer math)
|
||||||
|
docs/ # architecture & data-contract design notes
|
||||||
```
|
```
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
|
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
|
||||||
LAN-only tool that must observe other apps. Revisit only if distributing.
|
LAN-only tool that must observe other apps. Revisit only if distributing.
|
||||||
- The backend host is a private LAN address — set it in **Settings**, or seed it
|
- **Privacy:** video frames are never written to disk; recordings, transcripts, and
|
||||||
from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
|
screenshots are gitignored and never committed.
|
||||||
placeholder (`https://your-spark-backend.local`).
|
- `AGENTS.md` is the canonical reference for build commands, conventions, and current
|
||||||
|
state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
|
||||||
|
data-contract design notes.
|
||||||
|
|||||||
@@ -3,8 +3,8 @@ import Combine
|
|||||||
|
|
||||||
/// User-facing settings, persisted to `UserDefaults`.
|
/// User-facing settings, persisted to `UserDefaults`.
|
||||||
///
|
///
|
||||||
/// Phase 0 scope: backend host + TLS-skip, output folder, and adapter toggles.
|
/// Covers the backend host + TLS handling, output folder, your name, chunk
|
||||||
/// The adapter toggles persist but do nothing yet (adapters arrive in Phase 3–4).
|
/// length, per-app adapter toggles, and the auto-record/auto-send/recap flags.
|
||||||
@MainActor
|
@MainActor
|
||||||
final class AppSettings: ObservableObject {
|
final class AppSettings: ObservableObject {
|
||||||
|
|
||||||
|
|||||||