# AGENTS.md — Ten31 Transcripts
Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items tagged `(ten31-transcripts)` and surface them before proposing next steps; triage with `/triage`.
## Stack (versions that matter)
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL`, a private LAN backend (IP or `.local` host) with a Start9 self-signed cert.
  - TLS: install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, **host-scoped** escape hatch, **off by default** (see `InsecureTrustDelegate`).
  - Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`.
  - Models and endpoints: diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
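
The backend URL resolution order above can be sketched as a small pure function. This is a minimal illustration, not the code in `AppSettings.swift`: the function name, its parameters, and the choice to treat blank strings as unset are all assumptions.

```swift
import Foundation

/// Resolves the backend base URL: saved Settings value, then env var, then placeholder.
/// Hypothetical helper; the real lookup in `AppSettings.swift` may differ.
func resolveBackendURL(saved: String?, env: String?, placeholder: String) -> URL? {
    // Assumption: empty or whitespace-only strings count as "not set", so a
    // cleared Settings field falls through to the next source.
    func nonEmpty(_ s: String?) -> String? {
        guard let t = s?.trimmingCharacters(in: .whitespacesAndNewlines), !t.isEmpty else {
            return nil
        }
        return t
    }
    let chosen = nonEmpty(saved) ?? nonEmpty(env) ?? placeholder
    return URL(string: chosen)
}
```

A caller would pass `UserDefaults.standard.string(forKey: "backendBaseURL")` as `saved` and `ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]` as `env`. Keeping the function free of global state makes the precedence easy to unit-test offline.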
## Commands
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
```
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
```
Regenerate the Xcode project (after adding/removing/renaming any source file):
```
xcodegen generate
```
Build + run all tests:
```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
```
Run a **single** test (target/class/method):
```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
```
Build only: replace `test` with `build`. **Lint/format:** none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in `ROADMAP.md`.
Build a standalone app and install/run it (Xcode does **not** need to stay open):
```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
```
**Fast validation harness** (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x` and run against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.
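
As a rough illustration of the harness shape described above, a `main.swift` might look like the sketch below. The `Harness` type and the `clamp01` stand-in are hypothetical; in practice the functions under test would come from the app sources compiled alongside this file.

```swift
// main.swift: swiftc only allows top-level statements in the file with this name.
import Foundation

/// Minimal pass/fail recorder for an ad-hoc harness (hypothetical, not part of the app).
struct Harness {
    var failures: [String] = []

    /// Prints one line per check and remembers the labels of failed checks.
    mutating func expect(_ condition: Bool, _ label: String) {
        if !condition { failures.append(label) }
        print(condition ? "ok   \(label)" : "FAIL \(label)")
    }
}

/// Stand-in for a pure function that would normally be compiled in from the app.
func clamp01(_ x: Double) -> Double {
    min(1, max(0, x))
}

// Top-level code: run the checks in order.
var harness = Harness()
harness.expect(clamp01(1.5) == 1.0, "clamps above 1")
harness.expect(clamp01(-0.2) == 0.0, "clamps below 0")
// A real harness could end with `exit(harness.failures.isEmpty ? 0 : 1)`
// so that shell scripts can react to failures.
```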
## Layout (day one)
- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`, `AudioMixer`, `MonoTrackWriter`, `Resampler`.
- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`/`AudioInputProcesses`/`MicActivityMonitor`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
## Conventions
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing, with my approval; the remote URL lives in `.git/config`, kept out of source. Work on `main` — don't create feature branches unless I ask.
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
- **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. A **second rewrite the same day** purged two backend LAN IPs that had slipped into a docs/test commit, replacing them with RFC 5737 documentation IPs (`192.0.2.1`/`192.0.2.2`) and force-pushing; 0 hits across refs; backup bundle `../ten31-transcripts-pre-ip-scrub.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it.
## Always
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
- Run `xcodegen generate` after adding/removing/renaming source files.
- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
- Identify **self by the mic channel** + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
- Treat visual active-speaker cues as **naming hints over audio diarization** (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls **sequential** (one in flight).
- After any code change, rebuild Release + `ditto` to `/Applications` — the installed copy does **not** auto-update.
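
Two of the rules above (payload choice and the reserved self name) lend themselves to small pure functions. The sketch below is illustrative only: the `UploadPayload` enum, both function names, and the label-to-name dictionary shape are assumptions, not the app's actual types.

```swift
import Foundation

/// What to upload for one chunk. The case fields mirror the `mic_file` and
/// `system_file` parts named above; the enum itself is hypothetical.
enum UploadPayload: Equatable {
    case dualChannel(micFile: String, systemFile: String)
    case mono(file: String)
}

/// Prefers dual-channel when the system track is healthy, else the mono mix.
func choosePayload(systemTrackHealthy: Bool) -> UploadPayload {
    systemTrackHealthy
        ? .dualChannel(micFile: "mic.wav", systemFile: "system.wav")
        : .mono(file: "mixed_mono_16k.wav")
}

/// Keeps the self name reserved: any other speaker proposed with that name is
/// left unnamed, and the mic-channel speaker always gets the canonical name.
/// `proposed` maps a diarization label to a suggested name (assumed shape).
func applyReservedSelfName(proposed: [String: String],
                           selfLabel: String,
                           selfName: String) -> [String: String] {
    var result: [String: String] = [:]
    for (label, name) in proposed {
        let collides = name.caseInsensitiveCompare(selfName) == .orderedSame
        if label != selfLabel && collides { continue } // drop the colliding name
        result[label] = name
    }
    result[selfLabel] = selfName
    return result
}
```

Comparing names case-insensitively is a design choice made for this sketch; the real reservation check may be stricter or looser.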
## Never
- **Never write video frames to disk** — analyze in-memory and release immediately (privacy non-negotiable).
- **Never add Co-Authored-By / "Generated with" / any AI or tool attribution** to commits or PRs.
- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
- Never send call audio to a raw IP the user didn't configure. Offline backend checks: a `.local` mDNS host can't be resolved by a plain `swiftc`/URLSession binary (`-1009`) — use the **real app** or `curl`; but a **configured raw IP _is_ reachable from a plain swiftc URLSession binary** (that's how the TLS fix was verified offline).
- Never force-push a shared branch, and never push without my approval. (Work on `main` — don't create feature branches unless I ask.)
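
The hollow-ring rule above can be illustrated with a toy gate over a boolean mask. This is not `GridCallAnalyzer.isHollow`: the function name, the mask representation, and the default thresholds are assumptions chosen for the example, and the hue gate is assumed to have run upstream.

```swift
/// Returns true when hue-matching pixels form a thin hollow border rather than a filled tile.
/// `mask[y][x]` is true where a pixel passed the hue gate (assumed computed upstream).
/// `band`, `minEdge`, and `maxInterior` are illustrative, not the app's tuned values.
func looksHollow(mask: [[Bool]],
                 band: Int = 2,
                 minEdge: Double = 0.6,
                 maxInterior: Double = 0.1) -> Bool {
    let h = mask.count
    // Require a rectangular mask large enough to have both a border band and an interior.
    guard band > 0, h > 2 * band,
          let w = mask.first?.count, w > 2 * band,
          mask.allSatisfy({ $0.count == w }) else { return false }

    var edgeHits = 0, edgeTotal = 0, innerHits = 0, innerTotal = 0
    for y in 0..<h {
        for x in 0..<w {
            let onEdge = x < band || y < band || x >= w - band || y >= h - band
            if onEdge {
                edgeTotal += 1
                if mask[y][x] { edgeHits += 1 }
            } else {
                innerTotal += 1
                if mask[y][x] { innerHits += 1 }
            }
        }
    }
    let edgeFraction = Double(edgeHits) / Double(edgeTotal)
    let innerFraction = Double(innerHits) / Double(innerTotal)
    // A ring lights up the border but not the interior; a solid avatar fill lights up both.
    return edgeFraction >= minEdge && innerFraction <= maxInterior
}
```

A solid camera-off tile fails the interior check even though its border matches, which is the distinction the rule asks for.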
## Current state
Present tense; overwritten each session. `main` clean and pushed (HEAD `35ba6ec`); `/Applications/Ten31Transcripts.app` rebuilt + installed from HEAD. Release build green; the full test suite was **not re-run this session** (changes were docs/comments + one plist key + one UI string — no logic touched; 73 pass as of the last run). A 2026-06-13 eval → `EVALUATION.md`; its P1 (TLS) is fixed and this session drained most of the P2/P3 **doc-debt**.
- **This session (2026-06-16) — docs/repo hygiene:** added `.claude/settings.json` + the portable inbox-check line + canonical `.gitignore`; rewrote the stale README and reconciled `docs/01-03` with the shipped app (dual-channel API + recap phase); marked `docs/04` COMPLETE/historical; removed the dead `NSAppleEventsUsageDescription` and the last stale "Phase 0" strings/comments; completed the AGENTS Layout listings. Jitsi support routed to `ROADMAP.md`.
- **Backend connected end-to-end (2026-06-16):** real LAN URL saved in Settings → SparkControl backend; transcription/analysis reachable. The saved value lives off-repo (`defaults read xyz.ten31.transcripts backendBaseURL`); the committed default stays the placeholder.
- **Working:** backend hand-off (live), call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked send, speaker reconciliation, recap (`transcript.md` + `recap.html`), speaker editor, configurable chunk length, standalone Settings.
- **Next up (start here): backend URL primary→fallback.** Single `backendBaseURL`, no fallback. (1) primary→fallback on connection failure + show which endpoint is live; (2) freebie: the `mmss()` NaN/∞ guard. Sketch before coding. Keep real IPs out of source — use `192.0.2.x`.
- **In progress / unverified:** the Meet visual fix (reject solid camera-off tiles) has no clean end-to-end run yet — validate by re-processing the saved Meet session + a fresh Meet call (needs the real app + backend; not offline).
- **Known bugs / loose end:** sparse Meet speaking-detection (faint blue border); sub-second junk "self" mic fragments; desktop-mic vs phone doesn't unify by voiceprint. Doc loose end: `docs/01 §5`/`docs/02 §2.4` still list "AppleScript" as a Meet name source though the code uses window titles — reconcile when tracing the real Meet name path.
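
For the "Next up" items above, a starting sketch might look like the following. Both signatures are assumptions (the real `mmss()` is not shown in this file), the `"--:--"` placeholder is an arbitrary choice, and the endpoint probe is injected so the selection logic can be exercised offline with RFC 5737 addresses.

```swift
import Foundation

/// Formats seconds as "mm:ss", returning a placeholder for NaN, infinite,
/// negative, or absurdly large input instead of trapping on the Int conversion.
/// Assumed signature; the app's `mmss()` may take different input.
func mmss(_ seconds: Double) -> String {
    guard seconds.isFinite, seconds >= 0, seconds < 1e9 else { return "--:--" }
    let total = Int(seconds.rounded(.down))
    return String(format: "%02d:%02d", total / 60, total % 60)
}

/// Picks the first endpoint whose probe succeeds, in priority order.
/// `probe` is a hypothetical injection point; wiring it to a real health
/// check (and surfacing which endpoint is live) is left open.
func firstLiveEndpoint(_ candidates: [URL], probe: (URL) -> Bool) -> URL? {
    candidates.first(where: probe)
}
```

Passing `[primary, fallback]` keeps the primary preferred whenever it answers, and a `nil` result signals that neither endpoint is reachable.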