diff --git a/.gitignore b/.gitignore
index 072c680..b6e380b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,3 +17,10 @@ build/
 
 # Personal call screenshots / fixtures (faces, contact names) — never commit
 example-screenshots/
+
+# Local signing identity (Apple Team ID) — keep out of source; template is committed
+Config/Signing.xcconfig
+
+# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
+.env
+.env.local
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..f11bec1
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,89 @@
+# AGENTS.md — Ten31 Transcripts
+
+Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
+
+## Stack (versions that matter)
+- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
+- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
+- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
+- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
+- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
+
+## Commands
+First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
+```
+cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
+```
+Regenerate the Xcode project (after adding/removing/renaming any source file):
+```
+xcodegen generate
+```
+Build + run all tests:
+```
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
+```
+Run a **single** test (target/class/method):
+```
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
+  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
+```
+Build only: replace `test` with `build`. **Lint/format:** none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in `ROADMAP.md`.
+Build a standalone app and install/run it (Xcode does **not** need to stay open):
+```
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -configuration Release -derivedDataPath /tmp/ten31-release build
+ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
+open /Applications/Ten31Transcripts.app
+```
+**Fast validation harness** (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x` and run against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.
+
+## Layout (day one)
+- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
+- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
+- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
+- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
+- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
+- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
+- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
+- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
+- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
+- **Runtime output** (default `~/Ten31Transcripts/sessions/_/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
+
+## Conventions
+- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
+- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
+- Commits: imperative mood, concise; authored by Grant. **No remote is configured** — confirm where to push (choosing one is tracked in `ROADMAP.md`). Branch before committing; never commit to `main` without asking.
+- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
+- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
+
+## Always
+- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
+- Run `xcodegen generate` after adding/removing/renaming source files.
+- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
+- Identify **self by the mic channel** + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
+- Treat visual active-speaker cues as **naming hints over audio diarization** (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
+- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls **sequential** (one in flight).
+- After any code change, rebuild Release + `ditto` to `/Applications` — the installed copy does **not** auto-update.
+
+## Never
+- **Never write video frames to disk** — analyze in-memory and release immediately (privacy non-negotiable).
+- **Never add Co-Authored-By / "Generated with" / any AI or tool attribution** to commits or PRs.
+- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
+- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
+- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
+- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
+- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
+- Never commit to `main` or force-push a shared branch; branch first and ask.
+
+## Current state
+Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs.
+- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
+- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
+- **Decided but not implemented:** none open (deferred items live in `ROADMAP.md`).
+- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
+- **Next:** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) confirm Settings → Your name = "Grant"; (3) record a fresh Meet call to validate the fix on a clean capture; (4) decide a git remote and push.
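Review note on the "Always" rule in `AGENTS.md` about backend hand-off (dual-channel when the system track is healthy, otherwise the mono mix). A minimal Swift sketch of that choice, under stated assumptions: `UploadPayload` and `choosePayload` are hypothetical names that do not appear in this diff, the pairing of `mic_file` with `mic.wav` and `system_file` with `system.wav` is inferred from the file and field names in the document, and how "healthy" is decided is left to the caller.

```swift
import Foundation

/// Hypothetical model of what is sent to the backend for one session.
enum UploadPayload: Equatable {
    /// Both tracks, keyed by multipart field name (`mic_file`, `system_file`).
    case dualChannel([String: URL])
    /// Fallback: the single mixed-down mono file.
    case mono(URL)
}

/// Picks the payload for a session folder.
/// `systemTrackHealthy` is an input here; the real health check is not shown in the diff.
func choosePayload(sessionDir: URL, systemTrackHealthy: Bool) -> UploadPayload {
    if systemTrackHealthy {
        return .dualChannel([
            "mic_file": sessionDir.appendingPathComponent("mic.wav"),
            "system_file": sessionDir.appendingPathComponent("system.wav"),
        ])
    }
    return .mono(sessionDir.appendingPathComponent("mixed_mono_16k.wav"))
}
```

Keeping the decision in a pure function like this would let it be unit-tested offline, in line with the Conventions section; the sequential (one in flight) constraint would still have to be enforced by whatever performs the upload.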
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/Config/Signing.xcconfig.example b/Config/Signing.xcconfig.example
new file mode 100644
index 0000000..6beae7b
--- /dev/null
+++ b/Config/Signing.xcconfig.example
@@ -0,0 +1,4 @@
+// Template for Config/Signing.xcconfig (which is gitignored).
+// Copy to Config/Signing.xcconfig and set your Apple Developer Team ID
+// (Xcode ▸ Settings ▸ Accounts, or `security find-identity -p codesigning -v`).
+DEVELOPMENT_TEAM = YOUR_APPLE_TEAM_ID
diff --git a/README.md b/README.md
index 172efc7..33545df 100644
--- a/README.md
+++ b/README.md
@@ -14,25 +14,30 @@ This repo is at **Phase 0** (scaffold, permissions, backend health check).
    ```sh
    brew install xcodegen
    ```
-3. **Generate the project:**
+3. **Set your signing team.** The Apple Team ID is kept out of source in a
+   gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
+   ```sh
+   cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
+   ```
+   `xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
+   team automatically — no manual selection. Keep the value stable so macOS
+   preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
+   not Xcode — `xcodegen generate` overwrites Xcode-side changes.
+4. **Generate the project:**
    ```sh
    xcodegen generate
    ```
    This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
-4. **Open it:**
+5. **Open it:**
    ```sh
    open Ten31Transcripts.xcodeproj
    ```
-5. Signing is preconfigured: `project.yml` sets `DEVELOPMENT_TEAM` to the free
-   personal team `BK4Y6CXN35` with automatic signing, so **Signing & Capabilities
-   should already show the team** — no manual selection needed. (If you ever sign
-   with a different Apple ID, update `DEVELOPMENT_TEAM` in `project.yml`, not in
-   Xcode — `xcodegen generate` overwrites Xcode-side changes.)
 6. Press **Run** (⌘R).
 
 > **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
 > Xcode reload the project. The signing team persists because it lives in
-> `project.yml`, so macOS permissions stay granted across rebuilds.
+> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
+> rebuilds.
 
 ## What Phase 0 does
 
@@ -64,5 +69,6 @@ Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
 
 - **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
   LAN-only tool that must observe other apps. Revisit only if distributing.
-- The default backend host is `https://your-spark-backend.local:62419` (editable in
-  Settings).
+- The backend host is a private LAN address — set it in **Settings**, or seed it
+  from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
+  placeholder (`https://your-spark-backend.local`).
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..7bc52ed
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,27 @@
+# ROADMAP — Ten31 Transcripts
+
+Longer-term backlog and deferred decisions. Near-term status + the next few steps live in `AGENTS.md` → Current state.
+
+## Visual detection
+- Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
+- Geometric screen-share exclusion: ignore OCR text in the shared-screen region (needs layout detection). Today only the domain filter + stuck-span guard catch share-text-as-speaker.
+- Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
+- Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
+- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
+- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
+
+## Audio / speakers
+- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
+- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
+
+## App / UX
+- Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
+- Constrain recap reading width on very wide windows (long line length in the summary band).
+
+## Tooling / repo
+- Decide and configure a git remote (none set); then push.
+- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
+- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
+
+## Deferred decisions
+- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
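Review note on the `SPARK_BACKEND_URL` item under "Tooling / repo" in `ROADMAP.md`: the suggested mitigation (treat an empty or placeholder stored URL as absent so the env var can still win) could look roughly like the sketch below. This is not the code in this PR. `resolveBackendURL` and its parameter names are hypothetical, and the host names in the usage lines are made-up examples; only the precedence order and the `SPARK_BACKEND_URL` key come from the diff.

```swift
import Foundation

/// Hypothetical pure resolver: stored value, then env var, then placeholder.
/// Empty, whitespace-only, or placeholder-equal values count as "not set".
func resolveBackendURL(
    stored: String?,
    environment: [String: String],
    placeholder: String
) -> String {
    // Normalises a candidate and rejects values that carry no real host.
    func usable(_ value: String?) -> String? {
        guard let trimmed = value?.trimmingCharacters(in: .whitespacesAndNewlines),
              !trimmed.isEmpty,
              trimmed != placeholder
        else { return nil }
        return trimmed
    }
    return usable(stored)
        ?? usable(environment["SPARK_BACKEND_URL"])
        ?? placeholder
}

// Usage with made-up hosts: a stale placeholder in storage no longer shadows the env var.
let placeholder = "https://your-spark-backend.local"
let resolved = resolveBackendURL(
    stored: placeholder,
    environment: ["SPARK_BACKEND_URL": "https://dev-backend.example.local"],
    placeholder: placeholder
)
print(resolved)
```

Taking the environment as a dictionary parameter, instead of reading `ProcessInfo.processInfo.environment` inside the function, keeps the precedence logic testable offline; a real stored URL still wins, which matches the resolution order stated in `AGENTS.md`.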
diff --git a/Ten31Transcripts/Settings/AppSettings.swift b/Ten31Transcripts/Settings/AppSettings.swift
index d43919d..5b2f452 100644
--- a/Ten31Transcripts/Settings/AppSettings.swift
+++ b/Ten31Transcripts/Settings/AppSettings.swift
@@ -92,11 +92,19 @@ final class AppSettings: ObservableObject {
 
     private let defaults: UserDefaults
 
+    /// Neutral placeholder. The real (private LAN) backend host is never committed —
+    /// it's entered in Settings (persisted to UserDefaults) or seeded from the
+    /// `SPARK_BACKEND_URL` env var for dev/CI/harness runs.
+    static let defaultBackendURL = "https://your-spark-backend.local"
+
     init(defaults: UserDefaults = .standard) {
         self.defaults = defaults
+        // Precedence: a value the user saved in Settings wins; else the env var
+        // (handy when launching from Xcode/terminal); else the placeholder.
         self.backendBaseURL = defaults.string(forKey: Keys.backendBaseURL)
-            ?? "https://your-spark-backend.local:62419"
+            ?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
+            ?? Self.defaultBackendURL
         self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true
diff --git a/docs/01_PROJECT_BRIEF.md b/docs/01_PROJECT_BRIEF.md
index e9b5baf..63554dc 100644
--- a/docs/01_PROJECT_BRIEF.md
+++ b/docs/01_PROJECT_BRIEF.md
@@ -135,10 +135,11 @@ Full request/response shapes, curl examples, limits, and error formats are in
 
 ## 7. Remaining open items (small)
 
-1. **Base URL — RESOLVED.** `https://your-spark-backend.local:62419`, also
-   `https://your-spark-backend.local:62419` (prefer the `.local` form; it survives IP
-   changes). Ship the `.local` host as the default; keep it editable in settings.
-   Service-discovery at `GET /api/endpoints`.
+1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
+   over a raw IP, since it survives IP changes) — configured in Settings or via the
+   `SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
+   the default; keep it editable in settings. Service-discovery at
+   `GET /api/endpoints`.
 2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
    toggle if the user wants to eyeball the timeline first.
 3. **Retention** — keep the session folder after a successful hand-off, or prune
diff --git a/docs/03_DATA_CONTRACTS.md b/docs/03_DATA_CONTRACTS.md
index 0a91528..a2271d0 100644
--- a/docs/03_DATA_CONTRACTS.md
+++ b/docs/03_DATA_CONTRACTS.md
@@ -76,12 +76,13 @@ locally — the mic track is the user's known identity / VAD source.)
 
 ## 3. SparkControl — connection (real)
 
-- **Base URL (confirmed):** `https://your-spark-backend.local:62419` — also reachable at
-  `https://your-spark-backend.local:62419` (the `.local` form survives IP changes;
-  **prefer it as the default**). Service-discovery JSON is at
+- **Base URL (confirmed):** a private LAN host — a `.local` mDNS name (preferred
+  over a raw IP; it survives IP changes) — configured in Settings or via the
+  `SPARK_BACKEND_URL` env var, and **never committed**. Service-discovery JSON is at
   `GET /api/endpoints` (returns current vLLM / Parakeet / Kokoro URLs). All audio
-  endpoints in §4–§5 hang off this base. Still **make it a setting** so the host
-  can change, but ship `https://your-spark-backend.local:62419` as the default.
+  endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
+  change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
+  the default.
 - **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
   delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or**
   install the Start9 Root CA into the trust store.
diff --git a/project.yml b/project.yml
index 06de0f4..24807ae 100644
--- a/project.yml
+++ b/project.yml
@@ -7,17 +7,20 @@ options:
   createIntermediateGroups: true
   groupSortPosition: top
 
+# Signing identity (DEVELOPMENT_TEAM) is kept out of source in a gitignored xcconfig
+# so the Team ID isn't committed. Copy Config/Signing.xcconfig.example to
+# Config/Signing.xcconfig and set your team. Keeping the value stable is what makes
+# macOS TCC grants (Mic / Screen Recording / Accessibility) persist across rebuilds.
+configFiles:
+  Debug: Config/Signing.xcconfig
+  Release: Config/Signing.xcconfig
+
 settings:
   base:
     MARKETING_VERSION: "0.1.0"
     CURRENT_PROJECT_VERSION: "1"
     SWIFT_VERSION: "5.0"
     CODE_SIGN_STYLE: Automatic
-    # Grant's free personal team (cert OU). Baked in so `xcodegen generate` keeps
-    # a STABLE signing identity across regenerations — macOS ties TCC permission
-    # grants (Mic / Screen Recording / Accessibility) to this identity, so a
-    # stable team is what makes those permissions persist across rebuilds.
-    DEVELOPMENT_TEAM: "BK4Y6CXN35"
 
 targets:
   Ten31Transcripts: