Compare commits

...

13 Commits

Author SHA1 Message Date
Grant Gilliam 050ae32e1d Document meeting-name session rename; note Gitea .local push gotcha 2026-06-17 22:18:16 -05:00
Grant Gilliam a5c227ef1c Prompt for a meeting name on stop; rename the session folder
When a recording finishes, ask for a meeting name and rename the session
folder from the auto stamp `<yyyy-MM-dd'T'HH-mm-ss>_<app>` to the readable
`<date>_<name>_<app>` (dropping HH-MM-SS), so sessions/ is easy to scan.
Skipping or leaving it blank keeps the timestamped name.

The rename runs after the recorder and visual capture finish (files closed)
and before finish() captures the folder for backend processing, so the
renamed folder is what flows downstream; finish() re-derives the track URLs
from the possibly-moved folder. The quit path never prompts, and a quit with
the prompt open ends its modal so termination isn't blocked.

Naming/parsing logic lives in a pure, unit-tested SessionNaming; recapTitle
moves there and now understands both folder forms.
2026-06-17 21:51:05 -05:00
Grant Gilliam d4228b566a Refresh Current state: docs/repo-hygiene session, doc-debt drained 2026-06-16 22:34:37 -05:00
Grant Gilliam 35ba6ecf05 Drop unused AppleEvents usage string; de-stale Phase-N comments
The NSAppleEventsUsageDescription usage string was dead — the app has no AppleEvents/AppleScript code path (Meet detection reads window titles), so the permission prompt never fired; remove it. Rephrase the leftover "Phase N" build-plan references in source comments (one of which falsely claimed "no audio, capture, or call detection yet"), and complete the AGENTS.md Audio/Detection layout listings.
2026-06-16 22:15:44 -05:00
Grant Gilliam dda4322de7 Reconcile docs/ specs with the shipped app
Document the dual-channel label-merge path (mic_file/system_file/self_name/self_vad) and the recap phase (transcript.md + recap.html via the backend LLM) across docs/01-03; correct docs/02 $2.10 to the UI actually shipped; mark docs/01 $7 open items as settled; remove the dead AUDIO_API.md references; note the manifest sha256 fields are not emitted; mark docs/04 as a complete/historical build log. Also drop the last stale "Phase 0" UI string in MenuBarView and retire the now-done doc-debt items in ROADMAP.
2026-06-16 22:09:04 -05:00
Grant Gilliam 85ea8fde45 Rewrite README for the shipped app; fix stale AppSettings comment
The README still described "Phase 0 (scaffold)" — no audio capture, call detection, screen reading, or backend hand-off — for an app that ships all of it. Rewrite it to document the real detect/record/send/transcribe/recap pipeline, the standalone build+install commands, backend and Start9 Root CA setup (skip-TLS is off by default and host-scoped, not on by default), output files, and the real project layout. Also fix the matching "Phase 0" comment in AppSettings.
2026-06-16 21:54:54 -05:00
Grant Gilliam b42b591690 Add standard .claude scaffolding and inbox-check line
Create .claude/settings.json so shared project config is committable, add the deny-by-default .claude/* and .env.* allow-list block to .gitignore, and add the portable inbox-check line to AGENTS.md. Track Jitsi support in ROADMAP.
2026-06-16 21:40:44 -05:00
Grant Gilliam 82de00ce37 Align git workflow: work on main, gate on push (no branch-first)
Match the updated how-i-work default; drop "branch before committing".
2026-06-15 21:28:02 -05:00
Grant Gilliam d770e52d8f Refresh Current state: backend connected end-to-end; Settings save confirmed 2026-06-15 20:45:20 -05:00
Grant Gilliam fc80f6707a Hand off: stage next work, move eval debt to ROADMAP, trim Current state 2026-06-13 18:16:04 -05:00
Grant Gilliam 0af86411c2 Document the backend-IP history scrub in AGENTS.md 2026-06-13 16:08:46 -05:00
Grant Gilliam 5bed24a454 Replace real backend IPs with placeholders in docs and tests
The backend host and LAN IPs are kept out of source by convention; the prior
commit committed the real primary/fallback IPs into AGENTS.md and the new test.
Swap them for neutral wording and the RFC 5737 documentation IP (192.0.2.1).

These IPs remain in commit 3629dbd (already pushed); purging them from history
is a separate filter-repo + force-push decision.
2026-06-13 16:04:44 -05:00
Grant Gilliam 3629dbdaaa Default TLS validation on; scope skip-TLS bypass to the configured host
The app shipped with certificate validation bypassed globally and on by
default — InsecureTrustDelegate trusted any cert from any host. That was
the evaluation's P1: anyone on the LAN could MITM call audio, transcripts,
and voiceprints.

The backend's Start9 cert already validates under normal system trust when
the StartOS Root CA is installed in the keychain (confirmed: URLSession
default validation returns 200 against the backend and its fallback), so the
bypass is unnecessary:
- skip-TLS now defaults to off
- when explicitly enabled, the bypass is scoped to the configured host via
  InsecureTrustDelegate.allowsTrustOverride, never "trust any server"
- the host gate is pure and unit-tested (InsecureTrustDelegateTests)

Docs reconciled: AGENTS.md backend/TLS line and Current state.
2026-06-13 16:02:57 -05:00
25 changed files with 604 additions and 155 deletions
+1
View File
@@ -0,0 +1 @@
{}
+12 -1
View File
@@ -23,4 +23,15 @@ Config/Signing.xcconfig
# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit # Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
.env .env
.env.local .env.*
!.env.example
# Claude Code — deny by default, allow-list shared wiring.
# .claude/ also accumulates worktrees, editor configs, and OS cruft; commit
# only the shared parts so new local scratch (or a stray secret) stays out.
.claude/*
!.claude/rules/
!.claude/agents/
!.claude/commands/
!.claude/skills/
!.claude/settings.json
+21 -20
View File
@@ -2,12 +2,14 @@
Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps. Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items tagged `(ten31-transcripts)` and surface them before proposing next steps; triage with `/triage`.
## Stack (versions that matter) ## Stack (versions that matter)
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon). - **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit. - Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**. - Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds. - Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`. - Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN backend — IP or `.local` host; Start9 self-signed cert. Install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, **host-scoped** escape hatch, **off by default** — see `InsecureTrustDelegate`). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
## Commands ## Commands
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team): First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
@@ -44,23 +46,24 @@ open /Applications/Ten31Transcripts.app
## Layout (day one) ## Layout (day one)
- `Ten31Transcripts/App/``@main` entry + `AppDelegate`. - `Ten31Transcripts/App/``@main` entry + `AppDelegate`.
- `Ten31Transcripts/Session/``SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`. - `Ten31Transcripts/Session/``SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`, `SessionNaming` (pure folder-name + recap-title logic).
- `Ten31Transcripts/Visual/``VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`). - `Ten31Transcripts/Visual/``VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`. - `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
- `Ten31Transcripts/Audio/``AudioRecorder`, `MicVAD`, `ChannelSelfVAD`. - `Ten31Transcripts/Audio/``AudioRecorder`, `MicVAD`, `ChannelSelfVAD`, `AudioMixer`, `MonoTrackWriter`, `Resampler`.
- `Ten31Transcripts/Backend/``SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip). - `Ten31Transcripts/Backend/``SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
- `Ten31Transcripts/Recap/``RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`. - `Ten31Transcripts/Recap/``RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/``CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements. - `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/``CallDetector`/`AudioInputProcesses`/`MicActivityMonitor`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`. - `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`. - **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`. The folder is created at session start as `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; on stop the user can name the meeting and it's renamed to `<date>_<name>_<app>` (skipping keeps the auto stamp).
## Conventions ## Conventions
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what. - Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline. - Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing; the remote URL lives in `.git/config`, kept out of source. Branch before committing; never commit to `main` without asking. - Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing, with my approval; the remote URL lives in `.git/config`, kept out of source. Work on `main` — don't create feature branches unless I ask.
- **Gitea push gotcha:** `origin`'s URL uses a raw `.local` mDNS host that intermittently fails to resolve (`Could not resolve hostname`, or a push that connects then stalls). The `gitea-home` SSH alias (in `~/.ssh/config`) points at the **same** Gitea server (port 59916, user `git`) via a reliable HostName — the sibling `standards` repo uses it. Reliable fallback: `git push gitea-home:grant/ten31-transcripts.git main` then `git update-ref refs/remotes/origin/main main`. Repointing `origin` to the alias would make this permanent (not yet done).
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`. - Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`. - No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
- **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it. - **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. A **second rewrite the same day** purged two backend LAN IPs that had slipped into a docs/test commit, replacing them with RFC 5737 documentation IPs (`192.0.2.1`/`192.0.2.2`) and force-pushing; 0 hits across refs; backup bundle `../ten31-transcripts-pre-ip-scrub.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it.
## Always ## Always
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`. - Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
@@ -78,18 +81,16 @@ open /Applications/Ten31Transcripts.app
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only. - Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`). - Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance. - Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks). - Never let a session-folder name put the meeting name where the app label is parsed from: the app must stay the **last** `_`-segment (`SessionController.appLabel(from:)` reads `.split("_").last`; `SessionNaming` enforces this and disambiguates collisions on the name segment). Renames happen at `finish()`-time after files are closed — re-derive track URLs from the (possibly moved) folder, never from `RecordingResult`'s start-time paths.
- Never commit to `main` or force-push a shared branch; branch first and ask. - Never send call audio to a raw IP the user didn't configure. Offline backend checks: a `.local` mDNS host can't be resolved by a plain `swiftc`/URLSession binary (`-1009`) — use the **real app** or `curl`; but a **configured raw IP _is_ reachable from a plain swiftc URLSession binary** (that's how the TLS fix was verified offline).
- Never force-push a shared branch, and never push without my approval. (Work on `main` — don't create feature branches unless I ask.)
## Current state ## Current state
Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs; working tree clean and pushed to `origin`/`main`. A full independent evaluation ran 2026-06-13 → `EVALUATION.md` (committed at repo root; overwritten + re-committed each run for a reviewable diff); its findings are triaged into the lists below. Present tense; overwritten each session. `main` clean and pushed (HEAD `a5c227e`, pushed via the `gitea-home` alias — origin's `.local` host wouldn't resolve); `/Applications/Ten31Transcripts.app` rebuilt + installed from HEAD. **Full suite re-run: 91 pass** (was 73; +18 `SessionNamingTests`).
- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window. - **This session (2026-06-17) — meeting-name prompt + folder rename:** on stop, an NSAlert asks for a meeting name (Save/Skip) and the session folder is renamed `<ts>_<app>``<date>_<name>_<app>` (HH-MM-SS dropped; Skip/blank keeps the stamp). Pure logic in `SessionNaming` (sanitize, leaf compose, `recapTitle` for both forms); app label stays the last `_`-segment; collisions disambiguate on the name segment; `finish()` re-derives track URLs post-rename; quit never prompts and aborts an open prompt. Reviewer-reviewed; its P1 (quit-during-modal) + two P2s fixed.
- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix. - **Backend connected end-to-end:** real LAN URL saved in Settings → SparkControl backend (off-repo: `defaults read xyz.ten31.transcripts backendBaseURL`); committed default stays the placeholder.
- **Work queue (P1 — do first):** the TLS-trust override is global and on by default — it returns `URLCredential(trust:)` for *any* host (`InsecureTrustDelegate.swift:22`; default-on at `AppSettings.swift:109`), so the full mic+system audio, visual timeline, and voiceprint upload is MITM-able by anyone on the LAN. Scope the override to the configured backend host and pin the Start9 root CA (or the leaf SPKI hash); default skip-TLS to off. This gates trusting any later backend-integration test. - **Working:** backend hand-off (live), call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked send, speaker reconciliation, recap, speaker editor, configurable chunk length, standalone Settings, meeting-name prompt + readable folders.
- **Known debt (P2 — fix before wider use):** - **Verify next (real app):** the naming prompt + rename is unit-tested + builds but **not yet exercised on a live stop** — run a real recording, stop, name it, confirm the folder renames and backend output lands in the renamed folder.
- `RecapAnalyzer.mmss()` fatally crashes on NaN/∞ (reproduced 2×); a malformed/MITM'd backend `duration` (e.g. `1e400``Double.infinity`) aborts the app at recap-render time — add a finite-guard fallback (`RecapAnalyzer.swift:137`). - **Next up:** (a) repoint `origin` to `gitea-home` so pushes stop hitting the flaky `.local` host (see Conventions); (b) **backend URL primary→fallback** + the `mmss()` NaN/∞ guard freebie (sketch first; keep real IPs out of source — use `192.0.2.x`).
- README is stale by six phases — still says "Phase 0 (scaffold) / no audio capture, detection, or backend hand-off yet" for a shipped Phase-6 app; same lie in source comment `AppSettings.swift:7`. Rewrite both to match reality. - **In progress / unverified:** the Meet visual fix (reject solid camera-off tiles) still has no clean end-to-end run — re-process the saved Meet session + a fresh Meet call (needs real app + backend).
- `SessionController` (670 lines, the most concurrency-dense file) has zero unit tests — cover `pendingAutoStop` (auto-start-then-immediate-call-end) and the visual-adoption generation guard before any refactor. - **Known bugs / loose end:** sparse Meet speaking-detection (faint blue border); sub-second junk "self" mic fragments; desktop-mic vs phone doesn't unify by voiceprint. Doc loose end: `docs/01 §5`/`docs/02 §2.4` still list "AppleScript" as a Meet name source though the code uses window titles.
- **Deferred (P3 — later decision or bulk cleanup; full evidence in `EVALUATION.md`):** `docs/` specs drifted from the dual-channel API + recap phase; `docs/01` §7 lists already-resolved open items; `docs/02` §2.10 claims MenuBarUI features that don't exist; AGENTS.md Layout listings under `Audio/`/`Detection/` are incomplete; the `manifest.json` sha256 contract is specced but never written; env-var precedence footgun (saved URL shadows `SPARK_BACKEND_URL`); `SessionController` owns three jobs (extract the open-panel UI); unused `NSAppleEventsUsageDescription`; unauthenticated LAN backend (consider a shared bearer token).
- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
- **Next (product validation — no agent could reach the live backend, so this stays manual):** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) record a fresh Meet call to validate the visual fix on a clean capture. (The old "confirm Your name = Grant" item is moot — the committed default is the generic `"Me"`; "Grant" only ever lives in local UserDefaults.)
+110 -38
View File
@@ -1,74 +1,146 @@
# Ten31 Transcripts # Ten31 Transcripts
Native macOS menu-bar app that auto-detects conference calls, records local audio, Native macOS menu-bar app that auto-detects conference calls, records dual-track
builds a visual-derived speaker timeline, and hands audio + timeline to the audio while watching the call window for active-speaker cues, and hands the audio
SparkControl backend for naming/transcription. See `docs/` for the full spec. plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
the transcription, diarization, and speaker naming — producing named transcripts
and meeting recaps.
This repo is at **Phase 0** (scaffold, permissions, backend health check). It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
the backend; the app only records, watches, packages, and reconciles hints.
## How it works
1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
notices and (optionally) auto-starts a session.
2. **Record + watch** — dual-track audio (your mic + system output) is captured while
`ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
active speaker. Video frames are analyzed in memory and released immediately —
**never written to disk**.
3. **Package + send** — audio is chunked and sent to the backend, dual-channel
(`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
The visual timeline rides along as naming hints. Backend calls are sequential
(one in flight) to respect the single-GPU backend.
4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
(Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
hints and your stored voiceprints.
5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
`transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
names after the fact.
**You** are identified by the mic channel plus the single name in *Settings → Your
name* — that name is reserved so the LLM never assigns it to anyone else. (There's
no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
## One-time setup ## One-time setup
1. **Install Xcode** from the Mac App Store (free; ~40 GB). Open it once and 1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
accept the license prompt. accept the license prompt.
2. **Install XcodeGen** (generates the Xcode project from `project.yml`): 2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
```sh ```sh
brew install xcodegen brew install xcodegen
``` ```
3. **Set your signing team.** The Apple Team ID is kept out of source in a 3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
gitignored `Config/Signing.xcconfig`. Copy the template and set your team: `Config/Signing.xcconfig`. Copy the template and set your team:
```sh ```sh
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
``` ```
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the `xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
team automatically — no manual selection. Keep the value stable so macOS team automatically. Keep the value stable so macOS preserves the app's permission
preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig, (TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
not Xcode — `xcodegen generate` overwrites Xcode-side changes. overwrites Xcode-side changes.
4. **Generate the project:** 4. **Generate the project** (re-run any time you add/remove/rename a source file):
```sh ```sh
xcodegen generate xcodegen generate
``` ```
This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time). This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).
5. **Open it:**
## Build & run
The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).
To build a standalone app and install it (Xcode doesn't need to stay open) — note the
`DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app` but
`xcode-select` may point at the Command Line Tools, so set it on **every**
`xcodebuild`:
```sh ```sh
open Ten31Transcripts.xcodeproj DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
``` ```
6. Press **Run** (⌘R).
> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let The installed copy does **not** auto-update — rebuild and `ditto` again after changes.
> Xcode reload the project. The signing team persists because it lives in
> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
> rebuilds.
## What Phase 0 does Run the test suite:
- Launches as a menu-bar-only app (no Dock icon). ```sh
- Menu panel shows live status for the three permissions it needs — **Microphone**, DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
**Screen Recording**, **Accessibility** — with Grant / Open Settings buttons. -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
- Shows a **backend health check** (`GET /api/status`) against the configured host. -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
- **Settings:** backend base URL, skip-TLS toggle (on by default for the ```
self-signed cert), output folder, and adapter toggles (inert this phase).
No audio capture, call detection, screen reading, or backend hand-off yet — those ## Permissions
arrive in Phases 16 (`docs/04_BUILD_PLAN.md`).
The menu panel shows live status for the three permissions the app needs, each with
Grant / Open Settings buttons:
- **Microphone** — to record your side of the call.
- **Screen Recording** — to capture system audio and watch the call window.
- **Accessibility** — to read window/participant information.
## Backend setup
Point the app at your SparkControl backend in **Settings → SparkControl backend**.
The resolution order is: the value saved in Settings (UserDefaults) wins, else the
`SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed
default is only a placeholder (`https://your-spark-backend.local`) — your real LAN
URL lives in Settings and never touches source.
The backend sits behind a Start9 self-signed Root CA. The supported path is to
**install the StartOS Root CA in your System keychain**, after which normal TLS
validation succeeds. *Skip TLS verification* is an opt-in escape hatch, **off by
default** and **scoped to the configured backend host** — it never becomes
"trust any server."
## Output
Each session writes to `~/Ten31Transcripts/sessions/<timestamp>_<app>/` (configurable
in Settings):
```
mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
self_vad.json visual_timeline.json # self voice-activity + visual hints
speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
transcript.md recap.html recap.json # final outputs
```
## Project layout ## Project layout
``` ```
project.yml # XcodeGen recipe → generates the .xcodeproj project.yml # XcodeGen recipe → generates the .xcodeproj
Ten31Transcripts/ Ten31Transcripts/
App/ Ten31TranscriptsApp.swift, AppDelegate.swift App/ @main entry + AppDelegate
UI/ MenuBarView, SettingsView, PermissionRow Detection/ CallDetector — which app is in a call
Permissions/PermissionsManager.swift Audio/ dual-track capture, mixing, resampling, self-VAD
Backend/ SparkControlHealth.swift, InsecureTrustDelegate.swift Visual/ ScreenCaptureKit capture + grid analysis → speaker timeline
Settings/ AppSettings.swift Adapters/ per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
Support/ Info.plist, Ten31Transcripts.entitlements Session/ SessionController state machine, packaging, reconciliation
Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3 Backend/ SparkControl + LLM clients, voiceprint store, TLS handling
Recap/ transcript.md + recap.html rendering, speaker editor
Permissions/ Settings/ UI/ Support/ (permissions, AppSettings, views, Info.plist)
Ten31TranscriptsTests/ # XCTest — pure logic (chunking, reconciliation, analyzer math)
docs/ # architecture & data-contract design notes
``` ```
## Notes ## Notes
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal, - **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
LAN-only tool that must observe other apps. Revisit only if distributing. LAN-only tool that must observe other apps. Revisit only if distributing.
- The backend host is a private LAN address — set it in **Settings**, or seed it - **Privacy:** video frames are never written to disk; recordings, transcripts, and
from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral screenshots are gitignored and never committed.
placeholder (`https://your-spark-backend.local`). - `AGENTS.md` is the canonical reference for build commands, conventions, and current
state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
data-contract design notes.
+8
View File
@@ -10,6 +10,9 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1). - 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil. - Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
## Platform support
- Jitsi: add call detection + a `JitsiAdapter` (Jitsi Meet is browser-based like Google Meet — needs `CallDetector` title recognition, an adapter for participant-name reading, and active-speaker visual cues). New platform alongside Meet/Zoom/Teams/Signal.
## Audio / speakers ## Audio / speakers
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing). - Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate. - Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
@@ -22,5 +25,10 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today. - Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win. - `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
## Quality / debt (from the 2026-06-13 independent eval — full queue + evidence in `EVALUATION.md`)
- Guard `RecapAnalyzer.mmss()` (`:137`) against NaN/∞ — a malformed backend `duration` aborts the app at recap render (eval P2). Cheap; fold into the next backend change.
- Add `SessionController` state-machine tests (`pendingAutoStop`, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3).
- Smaller P3s in `EVALUATION.md`: whether to actually emit the `manifest.json` per-file `sha256` (now documented as not-emitted in `docs/03` §2); unauthenticated LAN backend (consider a bearer token).
## Deferred decisions ## Deferred decisions
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path). - Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
@@ -3,9 +3,8 @@ import SwiftUI
/// Menu-bar-only app entry point. /// Menu-bar-only app entry point.
/// ///
/// `LSUIElement` (set in Info.plist) keeps the app out of the Dock; the /// `LSUIElement` (set in Info.plist) keeps the app out of the Dock; the
/// `MenuBarExtra` scene provides the status-bar item and its panel. Phase 0 only /// `MenuBarExtra` scene provides the status-bar item and its panel, which wires
/// wires up permissions, settings, and a backend health check no audio, /// up permissions, settings, recording control, and the backend health check.
/// capture, or call detection yet.
@main @main
struct Ten31TranscriptsApp: App { struct Ten31TranscriptsApp: App {
@NSApplicationDelegateAdaptor(AppDelegate.self) private var appDelegate @NSApplicationDelegateAdaptor(AppDelegate.self) private var appDelegate
+1 -1
View File
@@ -14,7 +14,7 @@ struct RecordingResult {
let systemNote: String? let systemNote: String?
} }
/// Dual-track local audio capture for Phase 1. /// Dual-track local audio capture.
/// ///
/// - System audio via `SCStream` (`capturesAudio`); its audio handler runs on /// - System audio via `SCStream` (`capturesAudio`); its audio handler runs on
/// `ioQueue`. A discard-only video output runs on `screenQueue` purely to keep /// `ioQueue`. A discard-only video output runs on `screenQueue` purely to keep
+2 -2
View File
@@ -13,8 +13,8 @@ struct VADSpan: Equatable {
/// internal sample cursor always equals the mic file position, and span times /// internal sample cursor always equals the mic file position, and span times
/// land on the same instants as `mixed_mono_16k.wav`. /// land on the same instants as `mixed_mono_16k.wav`.
/// ///
/// Phase 3's `TimelineBuilder` will fold these in as high-confidence pre-seeded /// `TimelineBuilder` folds these in as high-confidence pre-seeded "self"
/// "self" segments. Thresholds are intentionally simple and will be tuned later. /// segments. Thresholds are intentionally simple.
/// ///
/// Single-threaded: all calls happen on `AudioRecorder.ioQueue`. /// Single-threaded: all calls happen on `AudioRecorder.ioQueue`.
final class MicVAD { final class MicVAD {
@@ -33,7 +33,9 @@ final class GatewayLLMClient {
config.timeoutIntervalForRequest = 600 config.timeoutIntervalForRequest = 600
config.timeoutIntervalForResource = 900 config.timeoutIntervalForResource = 900
config.waitsForConnectivity = false config.waitsForConnectivity = false
let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil let delegate: URLSessionDelegate? = skipTLS
? InsecureTrustDelegate(allowedHost: URL(string: self.baseURL)?.host)
: nil
self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil) self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
} }
@@ -1,19 +1,42 @@
import Foundation import Foundation
/// URLSession delegate that trusts the server certificate without validation. /// URLSession delegate that bypasses certificate validation for **one host only**
/// the configured SparkControl backend.
/// ///
/// SparkControl sits behind a Start9 self-signed Root CA on the LAN, so default /// SparkControl sits behind a Start9 self-signed Root CA on the LAN. The supported
/// trust evaluation rejects it. This delegate is used **only** when the /// path is to install that CA in the System keychain; default trust evaluation then
/// "Skip TLS verification" setting is on. It trusts any server certificate /// succeeds and this delegate is never used. It exists only as an opt-in escape
/// acceptable for a personal tool on a trusted local network and nothing else. /// hatch (the "Skip TLS verification" setting, off by default) for a machine where
/// the CA isn't installed. Even then it trusts a certificate only when the challenge
/// host equals `allowedHost` a server-trust challenge from any other host falls
/// back to default validation, so the bypass can never become "trust any server".
final class InsecureTrustDelegate: NSObject, URLSessionDelegate { final class InsecureTrustDelegate: NSObject, URLSessionDelegate {
/// The single host the bypass is scoped to (the configured backend host). When
/// nil only reachable via a malformed base URL the gate never fires and every
/// challenge falls back to default validation: the safe degenerate case.
private let allowedHost: String?
init(allowedHost: String?) {
self.allowedHost = allowedHost
}
/// The security gate: the trust override may fire only for a server-trust
/// challenge whose host matches `allowedHost`. Pure and synchronous so the
/// host-scoping can be unit-tested without fabricating a `SecTrust`; the
/// credential itself is built only when this is true *and* a serverTrust exists.
func allowsTrustOverride(for space: URLProtectionSpace) -> Bool {
guard let allowedHost else { return false }
return space.authenticationMethod == NSURLAuthenticationMethodServerTrust
&& space.host == allowedHost
}
func urlSession( func urlSession(
_ session: URLSession, _ session: URLSession,
didReceive challenge: URLAuthenticationChallenge, didReceive challenge: URLAuthenticationChallenge,
completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void
) { ) {
guard guard
challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust, allowsTrustOverride(for: challenge.protectionSpace),
let serverTrust = challenge.protectionSpace.serverTrust let serverTrust = challenge.protectionSpace.serverTrust
else { else {
completionHandler(.performDefaultHandling, nil) completionHandler(.performDefaultHandling, nil)
@@ -82,7 +82,9 @@ final class SparkControlClient {
config.timeoutIntervalForRequest = 600 // diarization can take up to ~600s config.timeoutIntervalForRequest = 600 // diarization can take up to ~600s
config.timeoutIntervalForResource = 900 config.timeoutIntervalForResource = 900
config.waitsForConnectivity = false config.waitsForConnectivity = false
let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil let delegate: URLSessionDelegate? = skipTLS
? InsecureTrustDelegate(allowedHost: URL(string: self.baseURL)?.host)
: nil
self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil) self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
} }
@@ -1,10 +1,10 @@
import Foundation import Foundation
import Combine import Combine
/// Performs the Phase 0 backend reachability check: `GET {baseURL}/api/status`. /// Performs the backend reachability check: `GET {baseURL}/api/status`.
/// ///
/// This is a thin slice the full `SparkControlClient` (label-merge, multipart, /// This is a thin slice; the full upload path (label-merge, multipart, sequential
/// sequential queueing, retries) arrives in Phase 5. /// queueing, retries) lives in `SparkControlClient`.
@MainActor @MainActor
final class SparkControlHealth: ObservableObject { final class SparkControlHealth: ObservableObject {
@@ -32,7 +32,9 @@ final class SparkControlHealth: ObservableObject {
config.timeoutIntervalForRequest = 8 config.timeoutIntervalForRequest = 8
config.waitsForConnectivity = false config.waitsForConnectivity = false
let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil let delegate: URLSessionDelegate? = skipTLS
? InsecureTrustDelegate(allowedHost: url.host)
: nil
let session = URLSession(configuration: config, delegate: delegate, delegateQueue: nil) let session = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
defer { session.finishTasksAndInvalidate() } defer { session.finishTasksAndInvalidate() }
@@ -99,6 +99,11 @@ final class SessionController: ObservableObject {
/// Bumped each time a start/stop Task is spawned (Task is a value type, so this /// Bumped each time a start/stop Task is spawned (Task is a value type, so this
/// is how `prepareForTermination` detects a newly-spawned transition). /// is how `prepareForTermination` detects a newly-spawned transition).
private var lifecycleGeneration = 0 private var lifecycleGeneration = 0
/// The meeting-name prompt currently on screen, if any, so a quit can end it
/// instead of blocking termination on user input (set in `askMeetingName`).
private weak var activeNamingAlert: NSAlert?
/// Set once `prepareForTermination` begins, so we skip the post-stop naming prompt.
private var isTerminating = false
init(settings: AppSettings) { init(settings: AppSettings) {
self.settings = settings self.settings = settings
@@ -324,6 +329,9 @@ final class SessionController: ObservableObject {
lifecycleTask = Task { lifecycleTask = Task {
let result = await recorder.stop() let result = await recorder.stop()
let visual = await self.stopVisualAndTimeline(result, folder: folder) let visual = await self.stopVisualAndTimeline(result, folder: folder)
// Interactive stop only: ask for a meeting name and give the folder a
// readable name before `finish()` captures it for backend processing.
self.promptMeetingNameAndRename()
self.finish(result, timeline: visual.timeline, selfSpans: visual.selfSpans, visualRan: visual.visualRan) self.finish(result, timeline: visual.timeline, selfSpans: visual.selfSpans, visualRan: visual.visualRan)
} }
} }
@@ -338,13 +346,18 @@ final class SessionController: ObservableObject {
if let folder = currentFolder { if let folder = currentFolder {
writeSelfSpans(spans: selfSpans, result: result, to: folder) writeSelfSpans(spans: selfSpans, result: result, to: folder)
let visualCount = visualRan ? timeline.count : nil // `timeline` is the remote vision segments let visualCount = visualRan ? timeline.count : nil // `timeline` is the remote vision segments
// Re-derive the track URLs from `folder`: a meeting-name rename may have
// moved the session after `result` captured its original paths.
let micURL = folder.appendingPathComponent("mic.wav")
let systemURL = folder.appendingPathComponent("system.wav")
let mixedURL = folder.appendingPathComponent("mixed_mono_16k.wav")
lastSession = SessionInfo( lastSession = SessionInfo(
folder: folder, mixedURL: result.mixedURL, folder: folder, mixedURL: mixedURL,
duration: result.duration, selfSpanCount: selfSpans.count, duration: result.duration, selfSpanCount: selfSpans.count,
visualSegmentCount: visualCount) visualSegmentCount: visualCount)
lastProcess = ProcessInputs( lastProcess = ProcessInputs(
folder: folder, sessionId: folder.lastPathComponent, app: currentLabel, folder: folder, sessionId: folder.lastPathComponent, app: currentLabel,
micURL: result.micURL, systemURL: result.systemURL, mixedURL: result.mixedURL, micURL: micURL, systemURL: systemURL, mixedURL: mixedURL,
timeline: timeline, selfSpans: selfSpans, selfName: settings.selfName, timeline: timeline, selfSpans: selfSpans, selfName: settings.selfName,
systemHealthy: result.systemNote == nil) systemHealthy: result.systemNote == nil)
} }
@@ -419,24 +432,13 @@ final class SessionController: ObservableObject {
guard settings.recapEnabled, !resolved.segments.isEmpty else { return } guard settings.recapEnabled, !resolved.segments.isEmpty else { return }
let analyzer = RecapAnalyzer(llm: llm, model: model) let analyzer = RecapAnalyzer(llm: llm, model: model)
guard let result = try? await analyzer.recap(file: resolved, template: settings.defaultTemplate) else { return } guard let result = try? await analyzer.recap(file: resolved, template: settings.defaultTemplate) else { return }
let title = Self.recapTitle(app: inputs.app, sessionId: inputs.sessionId) let title = SessionNaming.recapTitle(app: inputs.app, sessionId: inputs.sessionId)
try? RecapRenderer.write(file: resolved, result: result, title: title, to: inputs.folder) try? RecapRenderer.write(file: resolved, result: result, title: title, to: inputs.folder)
try? RecapFile(title: title, result: result).write(to: inputs.folder.appendingPathComponent("recap.json")) try? RecapFile(title: title, result: result).write(to: inputs.folder.appendingPathComponent("recap.json"))
let url = inputs.folder.appendingPathComponent("recap.html") let url = inputs.folder.appendingPathComponent("recap.html")
if FileManager.default.fileExists(atPath: url.path) { self.recapURL = url } if FileManager.default.fileExists(atPath: url.path) { self.recapURL = url }
} }
/// Friendly recap title, e.g. "Google Meet call 2026-06-06 11:43".
private static func recapTitle(app: String, sessionId: String) -> String {
let appName = CallDetector.DetectedApp(rawValue: app)?.display ?? app.capitalized
let stamp = sessionId.split(separator: "_").first.map(String.init) ?? sessionId
let parts = stamp.split(separator: "T")
let date = parts.first.map(String.init) ?? ""
let timeBits = parts.count > 1 ? parts[1].split(separator: "-") : []
let time = timeBits.count >= 2 ? "\(timeBits[0]):\(timeBits[1])" : ""
return "\(appName) call — \(date) \(time)".trimmingCharacters(in: .whitespaces)
}
// MARK: - Speaker corrections // MARK: - Speaker corrections
/// True once the last session has a transcribed `speakers.json` to correct. /// True once the last session has a transcribed `speakers.json` to correct.
@@ -584,6 +586,11 @@ final class SessionController: ObservableObject {
/// its WAV headers are finalized before the process exits. Handles quit while /// its WAV headers are finalized before the process exits. Handles quit while
/// `.starting` and `.finishing`, not just `.recording`. /// `.starting` and `.finishing`, not just `.recording`.
func prepareForTermination() async { func prepareForTermination() async {
isTerminating = true
// If the meeting-name prompt is open, end its modal loop so quit isn't blocked
// waiting on the user the session keeps its auto timestamped name. (Falls
// back to the user answering the on-screen dialog if the abort isn't serviced.)
if activeNamingAlert != nil { NSApp.abortModal() }
// Cancel any in-flight backend transcription (audio is already saved; the // Cancel any in-flight backend transcription (audio is already saved; the
// user can resend). The pipeline's checkCancellation + defer clean up chunks. // user can resend). The pipeline's checkCancellation + defer clean up chunks.
processTask?.cancel() processTask?.cancel()
@@ -649,6 +656,59 @@ final class SessionController: ObservableObject {
return f.string(from: Date()) return f.string(from: Date())
} }
/// Ask the user to name the just-finished recording, then rename its folder to
/// a readable `<date>_<name>_<app>` (dropping the HH-MM-SS auto stamp). Skipping
/// or leaving it blank keeps the timestamped name. Must run BEFORE `finish()` so
/// the renamed folder is what flows to backend processing. The recorder and
/// visual capture have both finished by now, so every session file is closed and
/// the move is safe. Never called from the quit path we don't block a quit on
/// a prompt.
private func promptMeetingNameAndRename() {
// A quit can begin while we're finishing don't put a blocking prompt in its
// way; keep the auto timestamped name and let termination drain.
guard !isTerminating, let folder = currentFolder,
let name = askMeetingName() else { return } // nil = skipped / blank
let base = folder.deletingLastPathComponent()
let date = SessionNaming.datePrefix(ofSessionNamed: folder.lastPathComponent)
let fm = FileManager.default
var counter = 0
while counter < 100 {
guard let leaf = SessionNaming.renamedLeaf(
date: date, app: currentLabel, meetingName: name, counter: counter) else { return }
let target = base.appendingPathComponent(leaf, isDirectory: true)
if fm.fileExists(atPath: target.path) { counter += 1; continue } // disambiguate
do {
try fm.moveItem(at: folder, to: target)
currentFolder = target
} catch {
NSLog("Session rename to “\(leaf)” failed: \(error.localizedDescription)") // keep the original folder
}
return
}
NSLog("Session rename: kept “\(folder.lastPathComponent)” — 100 name collisions")
}
/// Modal prompt for a meeting name. Registers the alert so `prepareForTermination`
/// can end it on quit. Returns the trimmed name, or nil if the user skipped, left
/// it empty, or a quit aborted the prompt (caller keeps the auto folder name).
private func askMeetingName() -> String? {
let alert = NSAlert()
alert.messageText = "Name this recording"
alert.informativeText = "Give the meeting a name so its folder is easy to find in your sessions. Leave blank to keep the timestamped name."
alert.addButton(withTitle: "Save") // .alertFirstButtonReturn
alert.addButton(withTitle: "Skip") // .alertSecondButtonReturn
let field = NSTextField(frame: NSRect(x: 0, y: 0, width: 240, height: 24))
field.placeholderString = "Meeting name"
alert.accessoryView = field
alert.window.initialFirstResponder = field
NSApp.activate(ignoringOtherApps: true)
activeNamingAlert = alert
defer { activeNamingAlert = nil }
guard alert.runModal() == .alertFirstButtonReturn else { return nil }
let text = field.stringValue.trimmingCharacters(in: .whitespacesAndNewlines)
return text.isEmpty ? nil : text
}
/// Debug artifact: the channel-verified "self" spans actually sent to the backend /// Debug artifact: the channel-verified "self" spans actually sent to the backend
/// as `self_vad` (mic active AND louder than system). Lets us eyeball self detection. /// as `self_vad` (mic active AND louder than system). Lets us eyeball self detection.
private func writeSelfSpans(spans: [VADSpan], result: RecordingResult, to folder: URL) { private func writeSelfSpans(spans: [VADSpan], result: RecordingResult, to folder: URL) {
@@ -0,0 +1,71 @@
import Foundation
/// Pure helpers for session-folder names. A session folder is created at start
/// with an auto name `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; when the user names the
/// recording on stop it's renamed to `<yyyy-MM-dd>_<name>_<app>` (no HH-MM-SS),
/// which is far easier to scan in `sessions/`. The app label always stays the
/// LAST `_`-separated segment so `SessionController.appLabel(from:)` keeps working
/// even when the meeting name itself contains spaces or underscores.
enum SessionNaming {
/// Filesystem- and parse-safe meeting name: trims, turns path separators into
/// dashes, drops control characters, collapses whitespace runs, removes leading
/// dots (no hidden/`.`/`..` folders), and caps the length. Returns "" if nothing
/// usable is left, which callers treat as "skip the rename".
static func sanitize(_ raw: String) -> String {
var s = raw.trimmingCharacters(in: .whitespacesAndNewlines)
// Path-hostile separators (`/` and the classic Mac `:`, plus `\`) dash.
s = s.components(separatedBy: CharacterSet(charactersIn: "/:\\")).joined(separator: "-")
// Strip control characters outright.
s = s.components(separatedBy: .controlCharacters).joined()
// Collapse internal whitespace runs to single spaces.
s = s.split(whereSeparator: { $0 == " " || $0 == "\t" }).joined(separator: " ")
while s.hasPrefix(".") { s.removeFirst() }
s = s.trimmingCharacters(in: .whitespaces)
if s.count > 60 { s = String(s.prefix(60)).trimmingCharacters(in: .whitespaces) }
return s
}
/// The date prefix of a session leaf name, e.g. `2026-06-17T09-59-48_signal`
/// `2026-06-17`. Already-renamed leaves (`2026-06-17_name_signal`) return the
/// same date, so this is safe to call on either form.
static func datePrefix(ofSessionNamed leaf: String) -> String {
let head = leaf.split(separator: "_").first.map(String.init) ?? leaf
return head.split(separator: "T").first.map(String.init) ?? head
}
/// Compose the renamed leaf `<date>_<name>_<app>`. A positive `counter`
/// disambiguates a collision by suffixing the NAME segment (`<name>-2`) so the
/// trailing `_<app>` stays parseable. Returns nil when the name sanitizes to
/// empty (the caller keeps the auto timestamped name).
static func renamedLeaf(date: String, app: String, meetingName: String, counter: Int = 0) -> String? {
let clean = sanitize(meetingName)
guard !clean.isEmpty else { return nil }
let suffix = counter > 0 ? "-\(counter + 1)" : ""
return "\(date)_\(clean)\(suffix)_\(app)"
}
/// Friendly recap title from a session id, understanding both folder forms:
/// `2026-06-06T11-43-02_meet` "Google Meet call 2026-06-06 11:43"
/// `2026-06-06_Weekly sync_meet` "Weekly sync Google Meet (2026-06-06)"
static func recapTitle(app: String, sessionId: String) -> String {
let appName = CallDetector.DetectedApp(rawValue: app)?.display ?? app.capitalized
var parts = sessionId.split(separator: "_").map(String.init)
if parts.count > 1 { parts.removeLast() } // drop the trailing "_<app>"
let head = parts.first ?? sessionId
let tBits = head.split(separator: "T").map(String.init)
let date = tBits.first ?? head
let time: String = {
guard tBits.count > 1 else { return "" }
let b = tBits[1].split(separator: "-")
return b.count >= 2 ? "\(b[0]):\(b[1])" : ""
}()
let when = [date, time].filter { !$0.isEmpty }.joined(separator: " ")
// Rejoin with "_" the faithful inverse of split("_") so a name that
// itself contained underscores survives the round-trip through the folder name.
let name = parts.count > 1 ? parts[1...].joined(separator: "_") : ""
if name.isEmpty {
return "\(appName) call — \(when)".trimmingCharacters(in: .whitespaces)
}
return "\(name)\(appName) (\(when))".trimmingCharacters(in: .whitespaces)
}
}
@@ -121,8 +121,8 @@ final class TranscriptPipeline {
return assembled.speakersFile return assembled.speakersFile
} }
/// Build the `label-merge` timeline from mic-VAD self spans (Phase 1/2). Once /// Build the `label-merge` timeline from mic-VAD self spans; the visual
/// the visual adapters land (Phase 34), their segments are merged in too. /// adapters' segments are merged in alongside these.
static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] { static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") } spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
} }
+6 -3
View File
@@ -3,8 +3,8 @@ import Combine
/// User-facing settings, persisted to `UserDefaults`. /// User-facing settings, persisted to `UserDefaults`.
/// ///
/// Phase 0 scope: backend host + TLS-skip, output folder, and adapter toggles. /// Covers the backend host + TLS handling, output folder, your name, chunk
/// The adapter toggles persist but do nothing yet (adapters arrive in Phase 34). /// length, per-app adapter toggles, and the auto-record/auto-send/recap flags.
@MainActor @MainActor
final class AppSettings: ObservableObject { final class AppSettings: ObservableObject {
@@ -106,7 +106,10 @@ final class AppSettings: ObservableObject {
?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"] ?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
?? Self.defaultBackendURL ?? Self.defaultBackendURL
self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true // Off by default: install the Start9 Root CA in the System keychain and the
// backend's cert validates normally. The bypass is an opt-in escape hatch and,
// when on, is scoped to the configured host (see `InsecureTrustDelegate`).
self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? false
self.outputFolderPath = defaults.string(forKey: Keys.outputFolder) self.outputFolderPath = defaults.string(forKey: Keys.outputFolder)
?? "~/Ten31Transcripts" ?? "~/Ten31Transcripts"
-2
View File
@@ -30,8 +30,6 @@
<string>Ten31</string> <string>Ten31</string>
<key>NSMicrophoneUsageDescription</key> <key>NSMicrophoneUsageDescription</key>
<string>Ten31 Transcripts records your microphone during calls to build the local audio track.</string> <string>Ten31 Transcripts records your microphone during calls to build the local audio track.</string>
<key>NSAppleEventsUsageDescription</key>
<string>Ten31 Transcripts reads the active browser tab's URL to detect Google Meet calls.</string>
<key>NSLocalNetworkUsageDescription</key> <key>NSLocalNetworkUsageDescription</key>
<string>Ten31 Transcripts connects to your SparkControl server on the local network.</string> <string>Ten31 Transcripts connects to your SparkControl server on the local network.</string>
<key>NSAppTransportSecurity</key> <key>NSAppTransportSecurity</key>
+1 -1
View File
@@ -173,7 +173,7 @@ struct MenuBarView: View {
private var header: some View { private var header: some View {
VStack(alignment: .leading, spacing: 2) { VStack(alignment: .leading, spacing: 2) {
Text("Ten31 Transcripts").font(.headline) Text("Ten31 Transcripts").font(.headline)
Text("Phase 0 · setup & status") Text("Setup & status")
.font(.caption) .font(.caption)
.foregroundStyle(.secondary) .foregroundStyle(.secondary)
} }
+1 -1
View File
@@ -62,7 +62,7 @@ struct VisualTimeline: Codable {
} }
/// The flat array `label-merge` wants: `[{start,end,name,confidence}]`, /// The flat array `label-merge` wants: `[{start,end,name,confidence}]`,
/// dropping `source`. Slice/rebase to chunk-local seconds happens in Phase 5. /// dropping `source`. Slice/rebase to chunk-local seconds happens at chunking time.
func flatTimelineData() throws -> Data { func flatTimelineData() throws -> Data {
let flat = segments.map { seg -> [String: Any] in let flat = segments.map { seg -> [String: Any] in
["start": seg.start, "end": seg.end, "name": seg.name, "confidence": seg.confidence] ["start": seg.start, "end": seg.end, "name": seg.name, "confidence": seg.confidence]
@@ -0,0 +1,35 @@
import XCTest
@testable import Ten31Transcripts
/// The TLS bypass is an opt-in escape hatch scoped to the configured backend host.
/// These cover the security gate (`allowsTrustOverride`) so a regression can't widen
/// it back to "trust any server". The gate is pure, so no network or SecTrust needed.
final class InsecureTrustDelegateTests: XCTestCase {
private func space(host: String,
method: String = NSURLAuthenticationMethodServerTrust) -> URLProtectionSpace {
URLProtectionSpace(host: host, port: 62419, protocol: "https",
realm: nil, authenticationMethod: method)
}
func testFiresForMatchingHost() {
let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
XCTAssertTrue(d.allowsTrustOverride(for: space(host: "192.0.2.1")))
}
func testRejectsMismatchedHost() {
let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
XCTAssertFalse(d.allowsTrustOverride(for: space(host: "evil.example.com")))
}
func testNilAllowedHostNeverFires() {
let d = InsecureTrustDelegate(allowedHost: nil)
XCTAssertFalse(d.allowsTrustOverride(for: space(host: "192.0.2.1")))
}
func testOnlyServerTrustMethodFires() {
// Matching host but a non-server-trust challenge (e.g. HTTP Basic) must not override.
let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
XCTAssertFalse(d.allowsTrustOverride(
for: space(host: "192.0.2.1", method: NSURLAuthenticationMethodHTTPBasic)))
}
}
@@ -0,0 +1,110 @@
import XCTest
@testable import Ten31Transcripts
final class SessionNamingTests: XCTestCase {
// MARK: sanitize
func testSanitizeTrimsAndKeepsSpaces() {
XCTAssertEqual(SessionNaming.sanitize(" Weekly Sync "), "Weekly Sync")
}
func testSanitizeReplacesPathSeparators() {
XCTAssertEqual(SessionNaming.sanitize("9/10 standup"), "9-10 standup")
XCTAssertEqual(SessionNaming.sanitize("a:b\\c"), "a-b-c")
}
func testSanitizeCollapsesWhitespaceRuns() {
XCTAssertEqual(SessionNaming.sanitize("board 1:1"), "board 1-1")
}
func testSanitizeStripsLeadingDots() {
XCTAssertEqual(SessionNaming.sanitize("...hidden"), "hidden")
XCTAssertEqual(SessionNaming.sanitize(".."), "")
}
func testSanitizeEmptyForBlankOrWhitespace() {
XCTAssertEqual(SessionNaming.sanitize(""), "")
XCTAssertEqual(SessionNaming.sanitize(" \n\t "), "")
}
func testSanitizeCapsLength() {
let long = String(repeating: "x", count: 200)
XCTAssertEqual(SessionNaming.sanitize(long).count, 60)
}
func testSanitizeStripsControlCharacters() {
XCTAssertEqual(SessionNaming.sanitize("a\u{0000}b\u{001F}c"), "abc")
}
// MARK: datePrefix
func testDatePrefixFromAutoName() {
XCTAssertEqual(SessionNaming.datePrefix(ofSessionNamed: "2026-06-17T09-59-48_signal"), "2026-06-17")
}
func testDatePrefixFromRenamedName() {
XCTAssertEqual(SessionNaming.datePrefix(ofSessionNamed: "2026-06-17_Weekly sync_signal"), "2026-06-17")
}
// MARK: renamedLeaf
func testRenamedLeafBasic() {
XCTAssertEqual(
SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "Weekly sync"),
"2026-06-17_Weekly sync_signal")
}
func testRenamedLeafAppStaysLastSegment() {
// The meeting name may contain underscores; the app must remain parseable as
// the final "_"-segment (what SessionController.appLabel reads).
let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "meet", meetingName: "q3_planning")
XCTAssertEqual(leaf, "2026-06-17_q3_planning_meet")
XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "meet")
}
func testRenamedLeafNilForBlankName() {
XCTAssertNil(SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: " "))
}
func testRenamedLeafCounterDisambiguatesNameSegment() {
// A collision suffixes the NAME, not the whole leaf, so "_app" stays last.
let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "sync", counter: 1)
XCTAssertEqual(leaf, "2026-06-17_sync-2_signal")
XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "signal")
}
func testRenamedLeafAppStaysLastAtMaxCollisionDepth() {
// The 100-collision cap is counter 099; the app must still parse out last.
let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "q3_sync", counter: 99)
XCTAssertEqual(leaf, "2026-06-17_q3_sync-100_signal")
XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "signal")
}
// MARK: recapTitle
func testRecapTitleAutoNamePreservesLegacyFormat() {
XCTAssertEqual(
SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06T11-43-02_meet"),
"Google Meet call — 2026-06-06 11:43")
}
func testRecapTitleNamedSession() {
XCTAssertEqual(
SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06_Weekly sync_meet"),
"Weekly sync — Google Meet (2026-06-06)")
}
func testRecapTitleNamePreservesUnderscores() {
// A meeting name with underscores must survive the split/join round-trip.
XCTAssertEqual(
SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06_q3_planning_meet"),
"q3_planning — Google Meet (2026-06-06)")
}
func testRecapTitleUnknownAppCapitalizes() {
XCTAssertEqual(
SessionNaming.recapTitle(app: "manual", sessionId: "2026-06-06T11-43-02_manual"),
"Manual call — 2026-06-06 11:43")
}
}
+46 -35
View File
@@ -7,9 +7,9 @@
> returns named transcript segments. A growing **voiceprint library** recovers > returns named transcript segments. A growing **voiceprint library** recovers
> speakers even when the visual cue is missing. > speakers even when the visual cue is missing.
Master context document. Read this first, then `02_ARCHITECTURE.md`, Master context document. Read this first, then `02_ARCHITECTURE.md` and
`03_DATA_CONTRACTS.md`, `04_BUILD_PLAN.md`. The SparkControl API is now fully `03_DATA_CONTRACTS.md`. The SparkControl API is fully specified in
specified — see `03_DATA_CONTRACTS.md` (and the source `AUDIO_API.md`). `03_DATA_CONTRACTS.md`.
--- ---
@@ -20,25 +20,30 @@ A lightweight, always-running **menu-bar app on macOS** that:
1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams, 1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams,
or Signal. or Signal.
2. **Records two local audio tracks** — system audio (everyone else) and the 2. **Records two local audio tracks** — system audio (everyone else) and the
user's microphone (the user) — and **mixes them to one 16 kHz mono WAV** for user's microphone (the user). It sends the backend **dual-channel**
the backend. (`mic_file` + `system_file`) when the system track is healthy, falling back to
a **mixed-mono 16 kHz WAV** otherwise.
3. **Watches the call window** at ~24 fps and, per app, reads participant 3. **Watches the call window** at ~24 fps and, per app, reads participant
**names** and the **active-speaker cue**, producing a **names** and the **active-speaker cue**, producing a
`(start, end, name, confidence)` **visual timeline** — its best guess at who `(start, end, name, confidence)` **visual timeline** — its best guess at who
was talking when. was talking when.
4. **Discards every video frame after extraction.** No video is ever written to 4. **Discards every video frame after extraction.** No video is ever written to
disk. Only audio + the derived timeline persist locally. disk. Only audio + the derived timeline persist locally.
5. On call end, **POSTs the mixed audio + the visual timeline (+ the known 5. On call end, **POSTs the audio + the visual timeline (+ the known voiceprint
voiceprint library) to `POST /api/audio/label-merge`** on SparkControl, which library) to `POST /api/audio/label-merge`** on SparkControl, which returns
returns **named, speaker-attributed transcript segments** and a **voiceprint **named, speaker-attributed transcript segments** and a **voiceprint per
per speaker**. speaker**.
6. **Persists the returned voiceprints** keyed by name, so the next call can pass 6. **Persists the returned voiceprints** keyed by name, so the next call can pass
them as `known_voiceprints` and recover a speaker by voice when the visual cue them as `known_voiceprints` and recover a speaker by voice when the visual cue
is absent (camera off, a bad OCR frame). is absent (camera off, a bad OCR frame).
7. **Renders the result locally** — a readable `transcript.md` plus an HTML
`recap.html` (topics + meeting extras, generated via the backend's LLM
endpoint), with an in-app editor for fixing speaker names after the fact.
The app's job ends at receiving and storing the named segments from SparkControl. The app's job ends at producing the named transcript and recap from SparkControl's
**All transcription, diarization, and the name-merge happen on the backend.** Do segments. **All transcription, diarization, name-merge, and LLM analysis happen on
not build transcription, diarization, or the merge vote in this app. the backend.** Do not build transcription, diarization, or the merge vote in this
app.
## 2. Why the visual timeline still matters (the core idea) ## 2. Why the visual timeline still matters (the core idea)
@@ -68,19 +73,25 @@ few calls the system can name regulars even with cameras off.
**In scope (this app):** **In scope (this app):**
- Call detection for Meet / Zoom / Teams / Signal. - Call detection for Meet / Zoom / Teams / Signal.
- Dual-track local audio capture + mix-to-mono for the backend. - Dual-track local audio capture; **dual-channel send** (mic + system) with a
mix-to-mono fallback for the backend.
- Low-fps window capture → OCR (names) + active-speaker cue detection. - Low-fps window capture → OCR (names) + active-speaker cue detection.
- Per-app "adapter" modules encapsulating each app's UI quirks. - Per-app "adapter" modules encapsulating each app's UI quirks.
- Building the visual timeline; **mic-VAD self-labeling** (the mic track is the - Building the visual timeline; **mic-VAD self-labeling** (the mic track is the
user, so hot-mic spans pre-seed the user's name into the timeline). user, so hot-mic spans pre-seed the user's name into the timeline).
- Chunking long calls (~23 min) and calling `label-merge` **sequentially**. - Chunking long calls (~23 min) and calling `label-merge` **sequentially**.
- A local **voiceprint store** (persist + replay named voiceprints). - A local **voiceprint store** (persist + replay named voiceprints).
- Storing the backend's named transcript segments locally. - Storing the backend's named segments and **rendering** them — `transcript.md`
- A minimal menu-bar UI: status, manual start/stop, recent sessions, adapter plus an HTML `recap.html` (recap analysis via the backend LLM) — with an in-app
toggles, backend host/health, output folder. speaker-name editor.
- A minimal menu-bar UI: status, manual start/stop, the last session (reveal,
resend, open recap, edit speakers), adapter toggles, backend host/health,
output folder.
**Out of scope (owned by the backend):** **Out of scope (owned by the backend):**
- Transcription, diarization, the name-merge vote, summarization/analysis. - Transcription, diarization, the name-merge vote, and LLM summarization — these
run on the backend; the app only orchestrates the recap call and renders the
result.
**Explicitly not doing:** saving video; cloud anything. Everything stays on the **Explicitly not doing:** saving video; cloud anything. Everything stays on the
operator's LAN. operator's LAN.
@@ -91,14 +102,14 @@ operator's LAN.
|---|---|---| |---|---|---|
| Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. | | Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. |
| Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. | | Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. |
| Backend audio format | **Mixed-mono 16 kHz WAV** | Diarizer separates speakers from one mixed stream; 16 kHz is ideal. | | Backend audio format | **Dual-channel (mic + system)** when the system track is healthy, else **mixed-mono 16 kHz WAV** | Separate tracks let the backend attribute the user's mic channel directly; the diarizer can still split the mono fallback. |
| Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. | | Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. |
| Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. | | Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. |
| Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. | | Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. |
| Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. | | Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. |
| Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). | | Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). |
| Long calls | Chunk ~23 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. | | Long calls | Chunk ~23 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. |
| Transport / TLS | `multipart/form-data`, file field `file`; self-signed Start9 cert (skip verify or trust the Root CA); **no auth on LAN** | Matches every other SparkControl endpoint. | | Transport / TLS | `multipart/form-data`, file field `file` (mono) or `mic_file` + `system_file` (dual-channel); self-signed Start9 cert (trust the Root CA — supported default; host-scoped skip-verify is an off-by-default escape hatch); **no auth on LAN** | Matches every other SparkControl endpoint. |
| Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. | | Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. |
### On forking Hyprnote ### On forking Hyprnote
@@ -128,25 +139,25 @@ SparkControl, on the operator's Start9 LAN, fronting two DGX Sparks:
- **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize + - **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize +
name from the visual timeline (+ voiceprint fallback), optionally transcribe, name from the visual timeline (+ voiceprint fallback), optionally transcribe,
in one synchronous call. in one synchronous call.
- **LLM (recap):** Qwen3 via OpenAI-compatible `POST /v1/chat/completions`
generates the readable recap (topics + meeting extras) from the transcript.
- Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`. - Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`.
Full request/response shapes, curl examples, limits, and error formats are in Full request/response shapes, curl examples, limits, and error formats are in
`03_DATA_CONTRACTS.md`. `03_DATA_CONTRACTS.md`.
## 7. Remaining open items (small) ## 7. Settled decisions (were open at brief time)
1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred 1. **Base URL.** A private LAN host — a `.local` mDNS name (preferred over a raw
over a raw IP, since it survives IP changes) — configured in Settings or via the IP, since it survives IP changes) — configured in Settings or via the
`SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as `SPARK_BACKEND_URL` env var, never committed. A neutral placeholder ships as the
the default; keep it editable in settings. Service-discovery at default and stays editable in Settings. Service-discovery at `GET /api/endpoints`.
`GET /api/endpoints`. 2. **Send trigger.** Auto-send on call end is a setting (`autoSendOnStop`), **off
2. **Send trigger**assume auto-POST on call end; expose a "hold for review" by default** — the user reviews the session and sends manually unless they opt in.
toggle if the user wants to eyeball the timeline first. 3. **Retention.** The session folder is kept after a successful hand-off (output
3. **Retention** — keep the session folder after a successful hand-off, or prune location is configurable); nothing is pruned automatically.
audio and keep only `speakers.json` + voiceprints? Default: keep everything, 4. **Voiceprint update policy.** Store/refresh the latest high-confidence vector
user-configurable. per name (`02_ARCHITECTURE.md §2.9`); a per-name running average is a possible
4. **Voiceprint update policy** — overwrite vs running-average a person's stored later refinement.
voiceprint across calls (see `02_ARCHITECTURE.md §2.9`). Start simple 5. **Signing.** A stable identity via `Config/Signing.xcconfig` (gitignored) keeps
(store/refresh latest high-confidence), refine later. macOS from re-prompting for permissions on each rebuild.
5. **Signing** — stable identity so macOS doesn't re-prompt for permissions on
each rebuild.
+23 -6
View File
@@ -64,6 +64,9 @@ pattern, the macOS APIs, and the SparkControl integration (now fully specified).
└────────────────┘ └────────────────────┘ └────────────────┘ └────────────────────┘
``` ```
(After `speakers.json`, a recap phase renders `transcript.md` + `recap.html` via
the backend LLM — see §2.11.)
## 2. Modules ## 2. Modules
### 2.1 `CallDetector` ### 2.1 `CallDetector`
@@ -176,8 +179,10 @@ Write the session folder and, if the call is longer than ~3 min, produce a
``` ```
### 2.7 `SparkControlClient` ### 2.7 `SparkControlClient`
Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`** with Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`**. Sends
`file`, `timeline`, `known_voiceprints`, `transcribe=true`. **dual-channel** (`mic_file` + `system_file` + `self_name` + `self_vad`) when the
system track is healthy, else the **mono** `file`; always with `timeline`,
`known_voiceprints`, `transcribe=true`.
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`). - **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the - **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
Start9 cert) or trust the Root CA. **No auth on the LAN.** Start9 cert) or trust the Root CA. **No auth on the LAN.**
@@ -210,10 +215,22 @@ Local persistence of named voiceprints — the compounding-identity layer.
- Editable/clearable from the menu-bar UI (rename, delete a person, reset). - Editable/clearable from the menu-bar UI (rename, delete a person, reset).
### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`) ### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
Status (idle / detected / recording / uploading), manual start/stop, recent Status (idle / detected / recording / finishing), manual start/stop with live
sessions (open folder, resend, delete), adapter toggles, **backend host + a mic/system level meters, and the **last session** — reveal in Finder, resend
health check** (`GET /api/status`), output folder, voiceprint manager, and a ("Send to backend"), open recap, and edit speakers — plus "Open saved session…"
permissions checklist (Screen Recording, Microphone, Accessibility). to reprocess an existing folder. Also a **backend host + health check**
(`GET /api/status`), adapter toggles, output folder, and a permissions checklist
(Microphone, Screen Recording, Accessibility). (No multi-session list or
voiceprint-manager UI yet — those are in `ROADMAP.md`.)
### 2.11 Recap (`RecapAnalyzer`, `RecapRenderer`)
After `speakers.json`, the recap phase turns the named transcript into the
human-readable deliverables. `RecapAnalyzer` calls the backend LLM
(`POST /v1/chat/completions`, Qwen3) for topics + meeting extras; `RecapRenderer`
writes `transcript.md` (one line per diarized utterance) and `recap.html` (+ a
`recap.json` sidecar). The in-app speaker editor (`SpeakerEditing` /
`RecapEditModel`) rewrites names across all outputs after the fact. All
language-model work stays on the backend; the app orchestrates and renders.
## 3. macOS frameworks & permissions ## 3. macOS frameworks & permissions
+28 -11
View File
@@ -1,7 +1,7 @@
# Data Contracts — Ten31 Transcripts # Data Contracts — Ten31 Transcripts
Companion to docs 01/02. Defines the files the app produces/stores and the **real Companion to docs 01/02. Defines the files the app produces/stores and the **real
SparkControl contract** (source of truth: `AUDIO_API.md`). The `label-merge` SparkControl contract** (verified against the live backend). The `label-merge`
endpoint is the app's primary integration point. endpoint is the app's primary integration point.
--- ---
@@ -69,8 +69,10 @@ When chunking, **slice to the chunk window and rebase to chunk-local seconds**
"app_version": "0.1.0" "app_version": "0.1.0"
} }
``` ```
(`mixed_mono_16k.wav` is the one the backend gets; the separate tracks are kept (On the dual-channel path the backend gets `mic.wav` + `system.wav` directly; on
locally — the mic track is the user's known identity / VAD source.) the mono fallback it gets `mixed_mono_16k.wav`. The mic track is the user's known
identity / VAD source. **Note:** the per-file `sha256` fields above are part of the
intended contract but are **not currently emitted** by the pipeline.)
--- ---
@@ -83,15 +85,17 @@ locally — the mic track is the user's known identity / VAD source.)
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
the default. the default.
- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession` - **TLS:** Start9 self-signed Root CA. Supported path: install the Start9 Root CA
delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install into the System keychain (default trust then succeeds). Skip-verification is an
the Start9 Root CA into the trust store. **off-by-default, host-scoped** escape hatch (`InsecureTrustDelegate`, scoped to
the configured backend host), not the default.
- **Auth:** **none on the LAN.** No token/key today. - **Auth:** **none on the LAN.** No token/key today.
- **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription), - **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription),
~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio ~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio
trips a GPU FFT race → `503 + Retry-After`. trips a GPU FFT race → `503 + Retry-After`.
- **Transport:** `multipart/form-data`, audio file field name **`file`** (bytes, - **Transport:** `multipart/form-data`. Audio file field is **`file`** on the mono
not base64/path). path, or **`mic_file`** + **`system_file`** on the dual-channel path (bytes, not
base64/path).
- **All endpoints are synchronous** (no job IDs / polling). - **All endpoints are synchronous** (no job IDs / polling).
- **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 + - **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 +
Retry-After` transient (retry after the interval). Retry-After` transient (retry after the interval).
@@ -105,11 +109,16 @@ Diarize + name clusters from the visual timeline (majority temporal overlap),
with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** — with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** —
the app owns the timeline and the voiceprint library. the app owns the timeline and the voiceprint library.
**Multipart fields:** **Multipart fields** — two audio shapes: **mono** (`file`) or **dual-channel**
(`mic_file` + `system_file`, preferred when the system track is healthy):
| field | required | notes | | field | required | notes |
|---|---|---| |---|---|---|
| `file` | **yes** | mixed-mono WAV (the chunk, when chunking) | | `file` | mono path | mixed-mono WAV (the chunk, when chunking) |
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1) | | `mic_file` | dual path | the user's mic track (chunk) — attributed to `self_name` |
| `system_file` | dual path | the remote/system track (chunk) |
| `self_name` | dual path | the user's name; the mic channel is attributed to them |
| `self_vad` | no | chunk-local windows where the mic is genuinely the user (active + louder than system) |
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1); on the dual path it names only the remote speakers |
| `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` | | `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` |
| `transcribe` | no | `"true"` to also return per-segment text (default false) | | `transcribe` | no | `"true"` to also return per-segment text (default false) |
| `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) | | `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) |
@@ -213,3 +222,11 @@ Loaded → `known_voiceprints` on every `label-merge` call. Updated from respons
`fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never `fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never
stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with
`overlap_confidence ≥ ~0.8`; consider per-name running mean later. `overlap_confidence ≥ ~0.8`; consider per-name running mean later.
## 8. Recap outputs (`transcript.md`, `recap.{html,json}`)
After `speakers.json` is assembled, the recap phase renders the human-readable
deliverables: a `transcript.md` (one line per diarized utterance) and an HTML
`recap.html`, backed by a structured `recap.json`. The recap's topic/summary
content is generated by the **backend LLM** (`POST /v1/chat/completions`, Qwen3);
the app owns the rendering and the in-app **speaker-name editor**, which can rewrite
names across `speakers.json`, the transcript, and the recap after the fact.
+6
View File
@@ -1,5 +1,11 @@
# Build Plan — Ten31 Transcripts # Build Plan — Ten31 Transcripts
> **Status: COMPLETE (historical).** Phases 06 shipped and the app is in daily
> use; a recap phase (transcript + HTML recap via the backend LLM) was added after
> this plan was written. Kept as the original build log and as the map for the
> "Phase N" references in the code comments. Forward-looking work lives in
> `ROADMAP.md`; current status in `AGENTS.md`.
Companion to docs 0103. Phased plan for the Claude Code session, each phase with Companion to docs 0103. Phased plan for the Claude Code session, each phase with
a demoable milestone. Build in order; the risky/novel work (visual adapters) is a demoable milestone. Build in order; the risky/novel work (visual adapters) is
isolated for independent tuning. The SparkControl contract is now known isolated for independent tuning. The SparkControl contract is now known