Compare commits: `ddee2c4871...main` (13 commits)

| SHA1 |
|---|
| 050ae32e1d |
| a5c227ef1c |
| d4228b566a |
| 35ba6ecf05 |
| dda4322de7 |
| 85ea8fde45 |
| b42b591690 |
| 82de00ce37 |
| d770e52d8f |
| fc80f6707a |
| 0af86411c2 |
| 5bed24a454 |
| 3629dbdaaa |

@@ -0,0 +1 @@
{}

+12 -1
@@ -23,4 +23,15 @@ Config/Signing.xcconfig

# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
.env
.env.local
.env.*
!.env.example

# Claude Code — deny by default, allow-list shared wiring.
# .claude/ also accumulates worktrees, editor configs, and OS cruft; commit
# only the shared parts so new local scratch (or a stray secret) stays out.
.claude/*
!.claude/rules/
!.claude/agents/
!.claude/commands/
!.claude/skills/
!.claude/settings.json

@@ -2,12 +2,14 @@

Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.

> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items tagged `(ten31-transcripts)` and surface them before proposing next steps; triage with `/triage`.

## Stack (versions that matter)
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN backend — IP or `.local` host; Start9 self-signed cert. Install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, **host-scoped** escape hatch, **off by default** — see `InsecureTrustDelegate`). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.

## Commands
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):

@@ -44,23 +46,24 @@ open /Applications/Ten31Transcripts.app

## Layout (day one)
- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`, `SessionNaming` (pure folder-name + recap-title logic).
- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`, `AudioMixer`, `MonoTrackWriter`, `Resampler`.
- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`/`AudioInputProcesses`/`MicActivityMonitor`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`. The folder is created at session start as `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; on stop the user can name the meeting and it's renamed to `<date>_<name>_<app>` (skipping keeps the auto stamp).
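
The session-folder rule above (auto stamp at start, `<date>_<name>_<app>` after naming, app label always the last `_`-segment) can be illustrated with a few pure functions. This is a minimal sketch under assumptions, not the real `SessionNaming` code. `sanitizeMeetingName` and `renamedLeaf` are hypothetical names, and the letters-and-digits sanitizer is only a guess at what "sanitize" does. `appLabel(from:)` is a free-function stand-in for `SessionController.appLabel(from:)`, based on its documented `.split("_").last`. Collision disambiguation is left out.

```swift
import Foundation

// Hypothetical sanitizer (not the real SessionNaming API): keeps letters and
// digits and collapses every other run of characters into a single "-", so a
// meeting name can never introduce the "_" separator the app label relies on.
func sanitizeMeetingName(_ raw: String) -> String {
    var out = ""
    for ch in raw {
        if ch.isLetter || ch.isNumber {
            out.append(ch)
        } else if !out.isEmpty && out.last != "-" {
            out.append("-")
        }
    }
    // Drop a trailing separator left by trailing punctuation or whitespace.
    if out.last == "-" { out.removeLast() }
    return out
}

// Hypothetical leaf composer. `stamp` is the start-time stamp in the
// documented yyyy-MM-dd'T'HH-mm-ss form; a named folder keeps only the date.
func renamedLeaf(stamp: String, meetingName: String, app: String) -> String {
    let name = sanitizeMeetingName(meetingName)
    // Skip or a blank name keeps the original auto-stamped folder name.
    guard !name.isEmpty else { return "\(stamp)_\(app)" }
    let date = stamp.split(separator: "T").first.map(String.init) ?? stamp
    return "\(date)_\(name)_\(app)"
}

// Stand-in for the documented parse: the app label is the last "_"-segment.
func appLabel(from leaf: String) -> String {
    leaf.split(separator: "_").last.map(String.init) ?? leaf
}
```

The sketch's sanitizer maps `_` and any other non-alphanumeric run to `-`, so a meeting name cannot push the app label out of the last segment. For example, naming a session started at `2026-06-17T14-05-09` "Q3 planning" gives `2026-06-17_Q3-planning_meet`, where `meet` is a made-up app label.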

## Conventions
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing; the remote URL lives in `.git/config`, kept out of source. Branch before committing; never commit to `main` without asking.
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing, with my approval; the remote URL lives in `.git/config`, kept out of source. Work on `main` — don't create feature branches unless I ask.
- **Gitea push gotcha:** `origin`'s URL uses a raw `.local` mDNS host that intermittently fails to resolve (`Could not resolve hostname`, or a push that connects then stalls). The `gitea-home` SSH alias (in `~/.ssh/config`) points at the **same** Gitea server (port 59916, user `git`) via a reliable HostName — the sibling `standards` repo uses it. Reliable fallback: `git push gitea-home:grant/ten31-transcripts.git main` then `git update-ref refs/remotes/origin/main main`. Repointing `origin` to the alias would make this permanent (not yet done).
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
- **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it.
- **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. A **second rewrite the same day** purged two backend LAN IPs that had slipped into a docs/test commit, replacing them with RFC 5737 documentation IPs (`192.0.2.1`/`192.0.2.2`) and force-pushing; 0 hits across refs; backup bundle `../ten31-transcripts-pre-ip-scrub.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it.

## Always
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.

@@ -78,18 +81,16 @@ open /Applications/Ten31Transcripts.app
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
- Never commit to `main` or force-push a shared branch; branch first and ask.
- Never let a session-folder name put the meeting name where the app label is parsed from: the app must stay the **last** `_`-segment (`SessionController.appLabel(from:)` reads `.split("_").last`; `SessionNaming` enforces this and disambiguates collisions on the name segment). Renames happen at `finish()`-time after files are closed — re-derive track URLs from the (possibly moved) folder, never from `RecordingResult`'s start-time paths.
- Never send call audio to a raw IP the user didn't configure. Offline backend checks: a `.local` mDNS host can't be resolved by a plain `swiftc`/URLSession binary (`-1009`) — use the **real app** or `curl`; but a **configured raw IP _is_ reachable from a plain swiftc URLSession binary** (that's how the TLS fix was verified offline).
- Never force-push a shared branch, and never push without my approval. (Work on `main` — don't create feature branches unless I ask.)

## Current state
Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs; working tree clean and pushed to `origin`/`main`. A full independent evaluation ran 2026-06-13 → `EVALUATION.md` (committed at repo root; overwritten + re-committed each run for a reviewable diff); its findings are triaged into the lists below.
- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
- **Work queue (P1 — do first):** the TLS-trust override is global and on by default — it returns `URLCredential(trust:)` for *any* host (`InsecureTrustDelegate.swift:22`; default-on at `AppSettings.swift:109`), so the full mic+system audio, visual timeline, and voiceprint upload is MITM-able by anyone on the LAN. Scope the override to the configured backend host and pin the Start9 root CA (or the leaf SPKI hash); default skip-TLS to off. This gates trusting any later backend-integration test.
- **Known debt (P2 — fix before wider use):**
  - `RecapAnalyzer.mmss()` fatally crashes on NaN/∞ (reproduced 2×); a malformed/MITM'd backend `duration` (e.g. `1e400` → `Double.infinity`) aborts the app at recap-render time — add a finite-guard fallback (`RecapAnalyzer.swift:137`).
  - README is stale by six phases — still says "Phase 0 (scaffold) / no audio capture, detection, or backend hand-off yet" for a shipped Phase-6 app; same lie in source comment `AppSettings.swift:7`. Rewrite both to match reality.
  - `SessionController` (670 lines, the most concurrency-dense file) has zero unit tests — cover `pendingAutoStop` (auto-start-then-immediate-call-end) and the visual-adoption generation guard before any refactor.
- **Deferred (P3 — later decision or bulk cleanup; full evidence in `EVALUATION.md`):** `docs/` specs drifted from the dual-channel API + recap phase; `docs/01` §7 lists already-resolved open items; `docs/02` §2.10 claims MenuBarUI features that don't exist; AGENTS.md Layout listings under `Audio/`/`Detection/` are incomplete; the `manifest.json` sha256 contract is specced but never written; env-var precedence footgun (saved URL shadows `SPARK_BACKEND_URL`); `SessionController` owns three jobs (extract the open-panel UI); unused `NSAppleEventsUsageDescription`; unauthenticated LAN backend (consider a shared bearer token).
- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
- **Next (product validation — no agent could reach the live backend, so this stays manual):** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) record a fresh Meet call to validate the visual fix on a clean capture. (The old "confirm Your name = Grant" item is moot — the committed default is the generic `"Me"`; "Grant" only ever lives in local UserDefaults.)

Present tense; overwritten each session. `main` clean and pushed (HEAD `a5c227e`, pushed via the `gitea-home` alias — origin's `.local` host wouldn't resolve); `/Applications/Ten31Transcripts.app` rebuilt + installed from HEAD. **Full suite re-run: 91 pass** (was 73; +18 `SessionNamingTests`).
- **This session (2026-06-17) — meeting-name prompt + folder rename:** on stop, an NSAlert asks for a meeting name (Save/Skip) and the session folder is renamed `<ts>_<app>` → `<date>_<name>_<app>` (HH-MM-SS dropped; Skip/blank keeps the stamp). Pure logic in `SessionNaming` (sanitize, leaf compose, `recapTitle` for both forms); app label stays the last `_`-segment; collisions disambiguate on the name segment; `finish()` re-derives track URLs post-rename; quit never prompts and aborts an open prompt. Reviewer-reviewed; its P1 (quit-during-modal) + two P2s fixed.
- **Backend connected end-to-end:** real LAN URL saved in Settings → SparkControl backend (off-repo: `defaults read xyz.ten31.transcripts backendBaseURL`); committed default stays the placeholder.
- **Working:** backend hand-off (live), call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked send, speaker reconciliation, recap, speaker editor, configurable chunk length, standalone Settings, meeting-name prompt + readable folders.
- **Verify next (real app):** the naming prompt + rename is unit-tested + builds but **not yet exercised on a live stop** — run a real recording, stop, name it, confirm the folder renames and backend output lands in the renamed folder.
- **Next up:** (a) repoint `origin` to `gitea-home` so pushes stop hitting the flaky `.local` host (see Conventions); (b) **backend URL primary→fallback** + the `mmss()` NaN/∞ guard freebie (sketch first; keep real IPs out of source — use `192.0.2.x`).
- **In progress / unverified:** the Meet visual fix (reject solid camera-off tiles) still has no clean end-to-end run — re-process the saved Meet session + a fresh Meet call (needs real app + backend).
- **Known bugs / loose end:** sparse Meet speaking-detection (faint blue border); sub-second junk "self" mic fragments; desktop-mic vs phone doesn't unify by voiceprint. Doc loose end: `docs/01 §5`/`docs/02 §2.4` still list "AppleScript" as a Meet name source though the code uses window titles.

@@ -1,74 +1,146 @@
# Ten31 Transcripts

Native macOS menu-bar app that auto-detects conference calls, records local audio,
builds a visual-derived speaker timeline, and hands audio + timeline to the
SparkControl backend for naming/transcription. See `docs/` for the full spec.
Native macOS menu-bar app that auto-detects conference calls, records dual-track
audio while watching the call window for active-speaker cues, and hands the audio
plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
the transcription, diarization, and speaker naming — producing named transcripts
and meeting recaps.

This repo is at **Phase 0** (scaffold, permissions, backend health check).
It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
the backend; the app only records, watches, packages, and reconciles hints.

## How it works

1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
   notices and (optionally) auto-starts a session.
2. **Record + watch** — dual-track audio (your mic + system output) is captured while
   `ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
   active speaker. Video frames are analyzed in memory and released immediately —
   **never written to disk**.
3. **Package + send** — audio is chunked and sent to the backend, dual-channel
   (`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
   The visual timeline rides along as naming hints. Backend calls are sequential
   (one in flight) to respect the single-GPU backend.
4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
   (Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
   hints and your stored voiceprints.
5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
   `transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
   names after the fact.
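
Step 3's chunking can be pictured as a small planner that cuts a recording into back-to-back windows of the configured chunk length. This is a sketch under assumptions, not the real `SessionPackager`/`ChunkPlan` code. `chunkRanges(duration:chunkLength:)` is a hypothetical name, and the sketch assumes fixed-length, non-overlapping chunks where only the final one may be shorter.

```swift
import Foundation

/// Hypothetical fixed-length chunk planner (not the real `ChunkPlan` API):
/// splits `duration` seconds into back-to-back, non-overlapping windows of at
/// most `chunkLength` seconds; the last window may be shorter.
func chunkRanges(duration: Double, chunkLength: Double) -> [Range<Double>] {
    precondition(chunkLength > 0, "chunk length must be positive")
    // An empty, negative, NaN, or infinite duration yields no chunks.
    guard duration.isFinite, duration > 0 else { return [] }
    var ranges: [Range<Double>] = []
    var start = 0.0
    while start < duration {
        let end = min(start + chunkLength, duration)
        ranges.append(start..<end)
        start = end
    }
    return ranges
}
```

To keep a single request in flight, the chunks would be sent in order, and each response awaited before the next chunk starts.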

**You** are identified by the mic channel plus the single name in *Settings → Your
name* — that name is reserved so the LLM never assigns it to anyone else. (There's
no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)

## One-time setup

1. **Install Xcode** from the Mac App Store (free; ~40 GB). Open it once and
1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
accept the license prompt.
2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
```sh
brew install xcodegen
```
3. **Set your signing team.** The Apple Team ID is kept out of source in a
gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
`Config/Signing.xcconfig`. Copy the template and set your team:
```sh
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
```
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
team automatically — no manual selection. Keep the value stable so macOS
preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
not Xcode — `xcodegen generate` overwrites Xcode-side changes.
4. **Generate the project:**
team automatically. Keep the value stable so macOS preserves the app's permission
(TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
overwrites Xcode-side changes.
4. **Generate the project** (re-run any time you add/remove/rename a source file):
```sh
xcodegen generate
```
This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
5. **Open it:**
```sh
open Ten31Transcripts.xcodeproj
```
6. Press **Run** (⌘R).
This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).

> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
> Xcode reload the project. The signing team persists because it lives in
> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
> rebuilds.

## Build & run

## What Phase 0 does
The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).

- Launches as a menu-bar-only app (no Dock icon).
- Menu panel shows live status for the three permissions it needs — **Microphone**,
**Screen Recording**, **Accessibility** — with Grant / Open Settings buttons.
- Shows a **backend health check** (`GET /api/status`) against the configured host.
- **Settings:** backend base URL, skip-TLS toggle (on by default for the
self-signed cert), output folder, and adapter toggles (inert this phase).
To build a standalone app and install it (Xcode doesn't need to stay open) — note the
`DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app` but
`xcode-select` may point at the Command Line Tools, so set it on **every**
`xcodebuild`:

No audio capture, call detection, screen reading, or backend hand-off yet — those
arrive in Phases 1–6 (`docs/04_BUILD_PLAN.md`).
```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
```

The installed copy does **not** auto-update — rebuild and `ditto` again after changes.

Run the test suite:

```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
```

## Permissions

The menu panel shows live status for the three permissions the app needs, each with
Grant / Open Settings buttons:

- **Microphone** — to record your side of the call.
- **Screen Recording** — to capture system audio and watch the call window.
- **Accessibility** — to read window/participant information.

## Backend setup

Point the app at your SparkControl backend in **Settings → SparkControl backend**.
The resolution order is: the value saved in Settings (UserDefaults) wins, else the
`SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed
default is only a placeholder (`https://your-spark-backend.local`) — your real LAN
URL lives in Settings and never touches source.
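
The resolution order above fits in a pure function, which also makes it testable offline without touching real UserDefaults. This is a minimal sketch under assumptions: the saved value arrives as an optional string, and an empty string counts as unset. The source does not say how empty values are handled. `resolveBackendURL` is hypothetical, not the real `AppSettings` code.

```swift
import Foundation

/// Hypothetical resolver mirroring the documented precedence:
/// saved Settings value > SPARK_BACKEND_URL env var > placeholder default.
/// Empty strings are treated as unset (an assumption, not documented behavior).
func resolveBackendURL(saved: String?,
                       environment: [String: String],
                       placeholder: String = "https://your-spark-backend.local") -> String {
    if let saved = saved, !saved.isEmpty { return saved }
    if let fromEnv = environment["SPARK_BACKEND_URL"], !fromEnv.isEmpty { return fromEnv }
    return placeholder
}
```

In the app, the inputs would come from something like `UserDefaults.standard.string(forKey: "backendBaseURL")` and `ProcessInfo.processInfo.environment`. A saved value always shadows the env var, as the order above states.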

The backend sits behind a Start9 self-signed Root CA. The supported path is to
**install the StartOS Root CA in your System keychain**, after which normal TLS
validation succeeds. *Skip TLS verification* is an opt-in escape hatch, **off by
default** and **scoped to the configured backend host** — it never becomes
"trust any server."
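
Scoping the bypass to one host comes down to a single comparison in the session's authentication-challenge callback. The sketch below illustrates this under assumptions; it is not the app's `InsecureTrustDelegate`. `allowsTrustOverride` and `HostScopedTrustDelegate` are hypothetical names. The case-insensitive host comparison is a choice made here, because hostnames are case-insensitive, and the source does not specify it. The delegate part compiles only where the `Security` framework exists, i.e. on Apple platforms.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession types live here on non-Apple platforms
#endif
#if canImport(Security)
import Security
#endif

/// Hypothetical pure gate: the bypass may fire only for a server-trust
/// challenge whose host matches the configured backend host.
func allowsTrustOverride(authenticationMethod: String,
                         challengeHost: String,
                         allowedHost: String?) -> Bool {
    // No configured host (e.g. a malformed base URL): never bypass.
    guard let allowedHost = allowedHost else { return false }
    return authenticationMethod == NSURLAuthenticationMethodServerTrust
        && challengeHost.caseInsensitiveCompare(allowedHost) == .orderedSame
}

#if canImport(Security)
/// Sketch of a host-scoped delegate built on the gate above.
final class HostScopedTrustDelegate: NSObject, URLSessionDelegate {
    private let allowedHost: String?

    init(allowedHost: String?) {
        self.allowedHost = allowedHost
    }

    func urlSession(_ session: URLSession,
                    didReceive challenge: URLAuthenticationChallenge,
                    completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void) {
        let space = challenge.protectionSpace
        if allowsTrustOverride(authenticationMethod: space.authenticationMethod,
                               challengeHost: space.host,
                               allowedHost: allowedHost),
           let trust = space.serverTrust {
            completionHandler(.useCredential, URLCredential(trust: trust))
        } else {
            // Any other host or challenge type gets normal system validation.
            completionHandler(.performDefaultHandling, nil)
        }
    }
}
#endif
```

If the allowed host comes from `URL(string: baseURL)?.host`, as in `GatewayLLMClient`, a malformed base URL yields `nil`. The gate then never fires, and every challenge falls back to default validation.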

## Output

Each session writes to `~/Ten31Transcripts/sessions/<timestamp>_<app>/` (configurable
in Settings):

```
mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
self_vad.json visual_timeline.json # self voice-activity + visual hints
speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
transcript.md recap.html recap.json # final outputs
```

## Project layout

```
project.yml # XcodeGen recipe → generates the .xcodeproj
Ten31Transcripts/
App/ Ten31TranscriptsApp.swift, AppDelegate.swift
UI/ MenuBarView, SettingsView, PermissionRow
Permissions/PermissionsManager.swift
Backend/ SparkControlHealth.swift, InsecureTrustDelegate.swift
Settings/ AppSettings.swift
Support/ Info.plist, Ten31Transcripts.entitlements
Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
App/ @main entry + AppDelegate
Detection/ CallDetector — which app is in a call
Audio/ dual-track capture, mixing, resampling, self-VAD
Visual/ ScreenCaptureKit capture + grid analysis → speaker timeline
Adapters/ per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
Session/ SessionController state machine, packaging, reconciliation
Backend/ SparkControl + LLM clients, voiceprint store, TLS handling
Recap/ transcript.md + recap.html rendering, speaker editor
Permissions/ Settings/ UI/ Support/ (permissions, AppSettings, views, Info.plist)
Ten31TranscriptsTests/ # XCTest — pure logic (chunking, reconciliation, analyzer math)
docs/ # architecture & data-contract design notes
```

## Notes

- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
LAN-only tool that must observe other apps. Revisit only if distributing.
- The backend host is a private LAN address — set it in **Settings**, or seed it
from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
placeholder (`https://your-spark-backend.local`).
- **Privacy:** video frames are never written to disk; recordings, transcripts, and
screenshots are gitignored and never committed.
- `AGENTS.md` is the canonical reference for build commands, conventions, and current
state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
data-contract design notes.

@@ -10,6 +10,9 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.

## Platform support
- Jitsi: add call detection + a `JitsiAdapter` (Jitsi Meet is browser-based like Google Meet — needs `CallDetector` title recognition, an adapter for participant-name reading, and active-speaker visual cues). New platform alongside Meet/Zoom/Teams/Signal.

## Audio / speakers
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.

@@ -22,5 +25,10 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.

## Quality / debt (from the 2026-06-13 independent eval — full queue + evidence in `EVALUATION.md`)
- Guard `RecapAnalyzer.mmss()` (`:137`) against NaN/∞ — a malformed backend `duration` aborts the app at recap render (eval P2). Cheap; fold into the next backend change.
- Add `SessionController` state-machine tests (`pendingAutoStop`, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3).
- Smaller P3s in `EVALUATION.md`: whether to actually emit the `manifest.json` per-file `sha256` (now documented as not-emitted in `docs/03` §2); unauthenticated LAN backend (consider a bearer token).
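
The `mmss()` guard is small enough to sketch. This is a hypothetical stand-in, not the code at `RecapAnalyzer.swift:137`. It assumes the function turns seconds into a zero-padded `MM:SS` string. Both the `--:--` fallback and the `Int32.max` cap are arbitrary choices. Swift's `Int(_:)` initializer traps on NaN, ±∞, and finite values outside `Int`'s range, so the check has to run before the conversion.

```swift
import Foundation

/// Hypothetical finite-guarded seconds formatter (not the real RecapAnalyzer code).
func mmss(_ seconds: Double) -> String {
    // Int(Double) traps on NaN, infinities, and out-of-range values, so reject
    // anything that isn't a sane, non-negative, finite duration first.
    guard seconds.isFinite, seconds >= 0, seconds < Double(Int32.max) else {
        return "--:--"
    }
    let total = Int(seconds)  // truncates toward zero
    return pad2(total / 60) + ":" + pad2(total % 60)
}

/// Zero-pads to at least two digits; minutes may exceed 99 on long calls.
func pad2(_ n: Int) -> String {
    n < 10 ? "0\(n)" : String(n)
}
```

With this guard, a malformed `duration` degrades to a placeholder in the recap instead of aborting the app.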

## Deferred decisions
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).

@@ -3,9 +3,8 @@ import SwiftUI
/// Menu-bar-only app entry point.
///
/// `LSUIElement` (set in Info.plist) keeps the app out of the Dock; the
/// `MenuBarExtra` scene provides the status-bar item and its panel. Phase 0 only
/// wires up permissions, settings, and a backend health check — no audio,
/// capture, or call detection yet.
/// `MenuBarExtra` scene provides the status-bar item and its panel, which wires
/// up permissions, settings, recording control, and the backend health check.
@main
struct Ten31TranscriptsApp: App {
    @NSApplicationDelegateAdaptor(AppDelegate.self) private var appDelegate

@@ -14,7 +14,7 @@ struct RecordingResult {
    let systemNote: String?
}

/// Dual-track local audio capture for Phase 1.
/// Dual-track local audio capture.
///
/// - System audio via `SCStream` (`capturesAudio`); its audio handler runs on
/// `ioQueue`. A discard-only video output runs on `screenQueue` purely to keep

@@ -13,8 +13,8 @@ struct VADSpan: Equatable {
/// internal sample cursor always equals the mic file position, and span times
/// land on the same instants as `mixed_mono_16k.wav`.
///
/// Phase 3's `TimelineBuilder` will fold these in as high-confidence pre-seeded
/// "self" segments. Thresholds are intentionally simple and will be tuned later.
/// `TimelineBuilder` folds these in as high-confidence pre-seeded "self"
/// segments. Thresholds are intentionally simple.
///
/// Single-threaded: all calls happen on `AudioRecorder.ioQueue`.
final class MicVAD {

@@ -33,7 +33,9 @@ final class GatewayLLMClient {
        config.timeoutIntervalForRequest = 600
        config.timeoutIntervalForResource = 900
        config.waitsForConnectivity = false
        let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil
        let delegate: URLSessionDelegate? = skipTLS
            ? InsecureTrustDelegate(allowedHost: URL(string: self.baseURL)?.host)
            : nil
        self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
    }

@@ -1,19 +1,42 @@
 import Foundation
 
-/// URLSession delegate that trusts the server certificate without validation.
+/// URLSession delegate that bypasses certificate validation for **one host only**
+/// — the configured SparkControl backend.
 ///
-/// SparkControl sits behind a Start9 self-signed Root CA on the LAN, so default
-/// trust evaluation rejects it. This delegate is used **only** when the
-/// "Skip TLS verification" setting is on. It trusts any server certificate —
-/// acceptable for a personal tool on a trusted local network and nothing else.
+/// SparkControl sits behind a Start9 self-signed Root CA on the LAN. The supported
+/// path is to install that CA in the System keychain; default trust evaluation then
+/// succeeds and this delegate is never used. It exists only as an opt-in escape
+/// hatch (the "Skip TLS verification" setting, off by default) for a machine where
+/// the CA isn't installed. Even then it trusts a certificate only when the challenge
+/// host equals `allowedHost` — a server-trust challenge from any other host falls
+/// back to default validation, so the bypass can never become "trust any server".
 final class InsecureTrustDelegate: NSObject, URLSessionDelegate {
+    /// The single host the bypass is scoped to (the configured backend host). When
+    /// nil — only reachable via a malformed base URL — the gate never fires and every
+    /// challenge falls back to default validation: the safe degenerate case.
+    private let allowedHost: String?
+
+    init(allowedHost: String?) {
+        self.allowedHost = allowedHost
+    }
+
+    /// The security gate: the trust override may fire only for a server-trust
+    /// challenge whose host matches `allowedHost`. Pure and synchronous so the
+    /// host-scoping can be unit-tested without fabricating a `SecTrust`; the
+    /// credential itself is built only when this is true *and* a serverTrust exists.
+    func allowsTrustOverride(for space: URLProtectionSpace) -> Bool {
+        guard let allowedHost else { return false }
+        return space.authenticationMethod == NSURLAuthenticationMethodServerTrust
+            && space.host == allowedHost
+    }
+
     func urlSession(
         _ session: URLSession,
         didReceive challenge: URLAuthenticationChallenge,
         completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void
     ) {
         guard
-            challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
+            allowsTrustOverride(for: challenge.protectionSpace),
             let serverTrust = challenge.protectionSpace.serverTrust
         else {
             completionHandler(.performDefaultHandling, nil)
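Both clients derive `allowedHost` as `URL(string: self.baseURL)?.host`. The delegate documents a nil host as "only reachable via a malformed base URL". Below is a small sketch of that derivation, together with a plain-string mirror of the gate. `bypassHost`, `gateAllows` and the `serverTrustMethod` stand-in are hypothetical. The real gate compares against Foundation's `NSURLAuthenticationMethodServerTrust`.

```swift
import Foundation

/// Hypothetical helper mirroring how the clients derive the bypass host.
func bypassHost(forBaseURL baseURL: String) -> String? {
    URL(string: baseURL)?.host
}

/// Hypothetical stand-in for Foundation's `NSURLAuthenticationMethodServerTrust`.
let serverTrustMethod = "server-trust"

/// Hypothetical plain-string mirror of `allowsTrustOverride(for:)`: a nil
/// allowed host never fires; otherwise both the method and the host must match.
func gateAllows(method: String, host: String, allowedHost: String?) -> Bool {
    guard let allowed = allowedHost else { return false }
    return method == serverTrustMethod && host == allowed
}
```

One consequence: a base URL entered without a scheme (for example `backend.local:62419`) parses with no host. With skip-verify on, the bypass then silently never fires, and connections fall back to default validation.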
@@ -82,7 +82,9 @@ final class SparkControlClient {
         config.timeoutIntervalForRequest = 600 // diarization can take up to ~600s
         config.timeoutIntervalForResource = 900
         config.waitsForConnectivity = false
-        let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil
+        let delegate: URLSessionDelegate? = skipTLS
+            ? InsecureTrustDelegate(allowedHost: URL(string: self.baseURL)?.host)
+            : nil
         self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
     }
 
@@ -1,10 +1,10 @@
 import Foundation
 import Combine
 
-/// Performs the Phase 0 backend reachability check: `GET {baseURL}/api/status`.
+/// Performs the backend reachability check: `GET {baseURL}/api/status`.
 ///
-/// This is a thin slice — the full `SparkControlClient` (label-merge, multipart,
-/// sequential queueing, retries) arrives in Phase 5.
+/// This is a thin slice; the full upload path (label-merge, multipart, sequential
+/// queueing, retries) lives in `SparkControlClient`.
 @MainActor
 final class SparkControlHealth: ObservableObject {
 
@@ -32,7 +32,9 @@ final class SparkControlHealth: ObservableObject {
         config.timeoutIntervalForRequest = 8
         config.waitsForConnectivity = false
 
-        let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil
+        let delegate: URLSessionDelegate? = skipTLS
+            ? InsecureTrustDelegate(allowedHost: url.host)
+            : nil
         let session = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
         defer { session.finishTasksAndInvalidate() }
 
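The health check calls itself a thin slice; sequential queueing and retries live in `SparkControlClient`, which this diff does not show. The architecture table in the project notes says parallel audio requests can trip a backend GPU race answered with `503 + Retry-After`. Below is a minimal retry-delay sketch under those assumptions. `retryDelay` is hypothetical, and so are its policy choices: 3 attempts, a 60 s cap, and treating only 503 as retryable.

```swift
import Foundation

/// Hypothetical policy: seconds to wait before re-sending, or nil to give up.
/// Only 503 is retryable, at most `maxAttempts` tries, waits capped at `cap`.
func retryDelay(status: Int, retryAfter: String?, attempt: Int,
                maxAttempts: Int = 3, cap: Double = 60) -> Double? {
    guard status == 503, attempt < maxAttempts else { return nil }
    // Retry-After may be delta-seconds or an HTTP-date; this sketch honors only
    // delta-seconds and otherwise falls back to exponential backoff (1, 2, 4, ...).
    if let raw = retryAfter?.trimmingCharacters(in: .whitespaces),
       let seconds = Int(raw), seconds >= 0 {
        return min(Double(seconds), cap)
    }
    return min(pow(2.0, Double(attempt)), cap)
}
```

With only one audio request in flight, a caller would sleep for the returned delay and re-send the same chunk before starting the next one. That keeps the sequential guarantee intact.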
@@ -99,6 +99,11 @@ final class SessionController: ObservableObject {
     /// Bumped each time a start/stop Task is spawned (Task is a value type, so this
     /// is how `prepareForTermination` detects a newly-spawned transition).
     private var lifecycleGeneration = 0
+    /// The meeting-name prompt currently on screen, if any, so a quit can end it
+    /// instead of blocking termination on user input (set in `askMeetingName`).
+    private weak var activeNamingAlert: NSAlert?
+    /// Set once `prepareForTermination` begins, so we skip the post-stop naming prompt.
+    private var isTerminating = false
 
     init(settings: AppSettings) {
         self.settings = settings
@@ -324,6 +329,9 @@ final class SessionController: ObservableObject {
         lifecycleTask = Task {
             let result = await recorder.stop()
             let visual = await self.stopVisualAndTimeline(result, folder: folder)
+            // Interactive stop only: ask for a meeting name and give the folder a
+            // readable name before `finish()` captures it for backend processing.
+            self.promptMeetingNameAndRename()
             self.finish(result, timeline: visual.timeline, selfSpans: visual.selfSpans, visualRan: visual.visualRan)
         }
     }
@@ -338,13 +346,18 @@ final class SessionController: ObservableObject {
         if let folder = currentFolder {
             writeSelfSpans(spans: selfSpans, result: result, to: folder)
             let visualCount = visualRan ? timeline.count : nil // `timeline` is the remote vision segments
+            // Re-derive the track URLs from `folder`: a meeting-name rename may have
+            // moved the session after `result` captured its original paths.
+            let micURL = folder.appendingPathComponent("mic.wav")
+            let systemURL = folder.appendingPathComponent("system.wav")
+            let mixedURL = folder.appendingPathComponent("mixed_mono_16k.wav")
             lastSession = SessionInfo(
-                folder: folder, mixedURL: result.mixedURL,
+                folder: folder, mixedURL: mixedURL,
                 duration: result.duration, selfSpanCount: selfSpans.count,
                 visualSegmentCount: visualCount)
             lastProcess = ProcessInputs(
                 folder: folder, sessionId: folder.lastPathComponent, app: currentLabel,
-                micURL: result.micURL, systemURL: result.systemURL, mixedURL: result.mixedURL,
+                micURL: micURL, systemURL: systemURL, mixedURL: mixedURL,
                 timeline: timeline, selfSpans: selfSpans, selfName: settings.selfName,
                 systemHealthy: result.systemNote == nil)
         }
@@ -419,24 +432,13 @@ final class SessionController: ObservableObject {
         guard settings.recapEnabled, !resolved.segments.isEmpty else { return }
         let analyzer = RecapAnalyzer(llm: llm, model: model)
         guard let result = try? await analyzer.recap(file: resolved, template: settings.defaultTemplate) else { return }
-        let title = Self.recapTitle(app: inputs.app, sessionId: inputs.sessionId)
+        let title = SessionNaming.recapTitle(app: inputs.app, sessionId: inputs.sessionId)
         try? RecapRenderer.write(file: resolved, result: result, title: title, to: inputs.folder)
         try? RecapFile(title: title, result: result).write(to: inputs.folder.appendingPathComponent("recap.json"))
         let url = inputs.folder.appendingPathComponent("recap.html")
         if FileManager.default.fileExists(atPath: url.path) { self.recapURL = url }
     }
 
-    /// Friendly recap title, e.g. "Google Meet call — 2026-06-06 11:43".
-    private static func recapTitle(app: String, sessionId: String) -> String {
-        let appName = CallDetector.DetectedApp(rawValue: app)?.display ?? app.capitalized
-        let stamp = sessionId.split(separator: "_").first.map(String.init) ?? sessionId
-        let parts = stamp.split(separator: "T")
-        let date = parts.first.map(String.init) ?? ""
-        let timeBits = parts.count > 1 ? parts[1].split(separator: "-") : []
-        let time = timeBits.count >= 2 ? "\(timeBits[0]):\(timeBits[1])" : ""
-        return "\(appName) call — \(date) \(time)".trimmingCharacters(in: .whitespaces)
-    }
-
     // MARK: - Speaker corrections
 
     /// True once the last session has a transcribed `speakers.json` to correct.
@@ -584,6 +586,11 @@ final class SessionController: ObservableObject {
     /// its WAV headers are finalized before the process exits. Handles quit while
     /// `.starting` and `.finishing`, not just `.recording`.
     func prepareForTermination() async {
+        isTerminating = true
+        // If the meeting-name prompt is open, end its modal loop so quit isn't blocked
+        // waiting on the user — the session keeps its auto timestamped name. (Falls
+        // back to the user answering the on-screen dialog if the abort isn't serviced.)
+        if activeNamingAlert != nil { NSApp.abortModal() }
         // Cancel any in-flight backend transcription (audio is already saved; the
         // user can resend). The pipeline's checkCancellation + defer clean up chunks.
         processTask?.cancel()
@@ -649,6 +656,59 @@ final class SessionController: ObservableObject {
         return f.string(from: Date())
     }
 
+    /// Ask the user to name the just-finished recording, then rename its folder to
+    /// a readable `<date>_<name>_<app>` (dropping the HH-MM-SS auto stamp). Skipping
+    /// or leaving it blank keeps the timestamped name. Must run BEFORE `finish()` so
+    /// the renamed folder is what flows to backend processing. The recorder and
+    /// visual capture have both finished by now, so every session file is closed and
+    /// the move is safe. Never called from the quit path — we don't block a quit on
+    /// a prompt.
+    private func promptMeetingNameAndRename() {
+        // A quit can begin while we're finishing — don't put a blocking prompt in its
+        // way; keep the auto timestamped name and let termination drain.
+        guard !isTerminating, let folder = currentFolder,
+              let name = askMeetingName() else { return } // nil = skipped / blank
+        let base = folder.deletingLastPathComponent()
+        let date = SessionNaming.datePrefix(ofSessionNamed: folder.lastPathComponent)
+        let fm = FileManager.default
+        var counter = 0
+        while counter < 100 {
+            guard let leaf = SessionNaming.renamedLeaf(
+                date: date, app: currentLabel, meetingName: name, counter: counter) else { return }
+            let target = base.appendingPathComponent(leaf, isDirectory: true)
+            if fm.fileExists(atPath: target.path) { counter += 1; continue } // disambiguate
+            do {
+                try fm.moveItem(at: folder, to: target)
+                currentFolder = target
+            } catch {
+                NSLog("Session rename to “\(leaf)” failed: \(error.localizedDescription)") // keep the original folder
+            }
+            return
+        }
+        NSLog("Session rename: kept “\(folder.lastPathComponent)” — 100 name collisions")
+    }
+
+    /// Modal prompt for a meeting name. Registers the alert so `prepareForTermination`
+    /// can end it on quit. Returns the trimmed name, or nil if the user skipped, left
+    /// it empty, or a quit aborted the prompt (caller keeps the auto folder name).
+    private func askMeetingName() -> String? {
+        let alert = NSAlert()
+        alert.messageText = "Name this recording"
+        alert.informativeText = "Give the meeting a name so its folder is easy to find in your sessions. Leave blank to keep the timestamped name."
+        alert.addButton(withTitle: "Save") // .alertFirstButtonReturn
+        alert.addButton(withTitle: "Skip") // .alertSecondButtonReturn
+        let field = NSTextField(frame: NSRect(x: 0, y: 0, width: 240, height: 24))
+        field.placeholderString = "Meeting name"
+        alert.accessoryView = field
+        alert.window.initialFirstResponder = field
+        NSApp.activate(ignoringOtherApps: true)
+        activeNamingAlert = alert
+        defer { activeNamingAlert = nil }
+        guard alert.runModal() == .alertFirstButtonReturn else { return nil }
+        let text = field.stringValue.trimmingCharacters(in: .whitespacesAndNewlines)
+        return text.isEmpty ? nil : text
+    }
+
     /// Debug artifact: the channel-verified "self" spans actually sent to the backend
     /// as `self_vad` (mic active AND louder than system). Lets us eyeball self detection.
     private func writeSelfSpans(spans: [VADSpan], result: RecordingResult, to folder: URL) {
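`promptMeetingNameAndRename` probes the filesystem for the first free leaf, trying up to 100 candidates. Below is the same search written as a pure function over a snapshot of existing names, which makes the suffix sequence easy to check. `leaf` and `firstFreeLeaf` are hypothetical, and this copy skips sanitizing.

```swift
/// Hypothetical copy of the `SessionNaming.renamedLeaf` shape; assumes `name`
/// is already sanitized. Counter 0 has no suffix, counter 1 gives "-2", and so on.
func leaf(date: String, name: String, app: String, counter: Int) -> String {
    let suffix = counter > 0 ? "-\(counter + 1)" : ""
    return "\(date)_\(name)\(suffix)_\(app)"
}

/// Hypothetical pure version of the rename loop: the first leaf not already taken,
/// trying counters 0..<limit, or nil so the caller keeps the timestamped name.
func firstFreeLeaf(date: String, name: String, app: String,
                   existing: Set<String>, limit: Int = 100) -> String? {
    for counter in 0..<limit {
        let candidate = leaf(date: date, name: name, app: app, counter: counter)
        if !existing.contains(candidate) { return candidate }
    }
    return nil
}
```

The real loop checks the disk on each pass rather than a snapshot, so another process could create the target between `fileExists` and `moveItem`. `FileManager.moveItem(at:to:)` does not overwrite an existing destination. That race therefore ends in the logged error path, and the original folder is kept.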
@@ -0,0 +1,71 @@
+import Foundation
+
+/// Pure helpers for session-folder names. A session folder is created at start
+/// with an auto name `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; when the user names the
+/// recording on stop it's renamed to `<yyyy-MM-dd>_<name>_<app>` (no HH-MM-SS),
+/// which is far easier to scan in `sessions/`. The app label always stays the
+/// LAST `_`-separated segment so `SessionController.appLabel(from:)` keeps working
+/// even when the meeting name itself contains spaces or underscores.
+enum SessionNaming {
+    /// Filesystem- and parse-safe meeting name: trims, turns path separators into
+    /// dashes, drops control characters, collapses whitespace runs, removes leading
+    /// dots (no hidden/`.`/`..` folders), and caps the length. Returns "" if nothing
+    /// usable is left, which callers treat as "skip the rename".
+    static func sanitize(_ raw: String) -> String {
+        var s = raw.trimmingCharacters(in: .whitespacesAndNewlines)
+        // Path-hostile separators (`/` and the classic Mac `:`, plus `\`) → dash.
+        s = s.components(separatedBy: CharacterSet(charactersIn: "/:\\")).joined(separator: "-")
+        // Strip control characters outright.
+        s = s.components(separatedBy: .controlCharacters).joined()
+        // Collapse internal whitespace runs to single spaces.
+        s = s.split(whereSeparator: { $0 == " " || $0 == "\t" }).joined(separator: " ")
+        while s.hasPrefix(".") { s.removeFirst() }
+        s = s.trimmingCharacters(in: .whitespaces)
+        if s.count > 60 { s = String(s.prefix(60)).trimmingCharacters(in: .whitespaces) }
+        return s
+    }
+
+    /// The date prefix of a session leaf name, e.g. `2026-06-17T09-59-48_signal`
+    /// → `2026-06-17`. Already-renamed leaves (`2026-06-17_name_signal`) return the
+    /// same date, so this is safe to call on either form.
+    static func datePrefix(ofSessionNamed leaf: String) -> String {
+        let head = leaf.split(separator: "_").first.map(String.init) ?? leaf
+        return head.split(separator: "T").first.map(String.init) ?? head
+    }
+
+    /// Compose the renamed leaf `<date>_<name>_<app>`. A positive `counter`
+    /// disambiguates a collision by suffixing the NAME segment (`<name>-2`) so the
+    /// trailing `_<app>` stays parseable. Returns nil when the name sanitizes to
+    /// empty (the caller keeps the auto timestamped name).
+    static func renamedLeaf(date: String, app: String, meetingName: String, counter: Int = 0) -> String? {
+        let clean = sanitize(meetingName)
+        guard !clean.isEmpty else { return nil }
+        let suffix = counter > 0 ? "-\(counter + 1)" : ""
+        return "\(date)_\(clean)\(suffix)_\(app)"
+    }
+
+    /// Friendly recap title from a session id, understanding both folder forms:
+    /// `2026-06-06T11-43-02_meet` → "Google Meet call — 2026-06-06 11:43"
+    /// `2026-06-06_Weekly sync_meet` → "Weekly sync — Google Meet (2026-06-06)"
+    static func recapTitle(app: String, sessionId: String) -> String {
+        let appName = CallDetector.DetectedApp(rawValue: app)?.display ?? app.capitalized
+        var parts = sessionId.split(separator: "_").map(String.init)
+        if parts.count > 1 { parts.removeLast() } // drop the trailing "_<app>"
+        let head = parts.first ?? sessionId
+        let tBits = head.split(separator: "T").map(String.init)
+        let date = tBits.first ?? head
+        let time: String = {
+            guard tBits.count > 1 else { return "" }
+            let b = tBits[1].split(separator: "-")
+            return b.count >= 2 ? "\(b[0]):\(b[1])" : ""
+        }()
+        let when = [date, time].filter { !$0.isEmpty }.joined(separator: " ")
+        // Rejoin with "_" — the faithful inverse of split("_") — so a name that
+        // itself contained underscores survives the round-trip through the folder name.
+        let name = parts.count > 1 ? parts[1...].joined(separator: "_") : ""
+        if name.isEmpty {
+            return "\(appName) call — \(when)".trimmingCharacters(in: .whitespaces)
+        }
+        return "\(name) — \(appName) (\(when))".trimmingCharacters(in: .whitespaces)
+    }
+}
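`SessionNaming` keeps the app label as the last `_` segment so that both folder forms stay parseable. Below is a sketch of the inverse split. It assumes the date is the part of the first segment before any `T`, and that the name is everything between the first and last segments. `ParsedLeaf` and `parseLeaf` are hypothetical; the real `SessionController.appLabel(from:)` is not shown in this diff.

```swift
/// Hypothetical parsed form of a session folder name.
struct ParsedLeaf: Equatable {
    let date: String
    let name: String?  // nil for the auto `<timestamp>_<app>` form
    let app: String
}

/// Hypothetical inverse of `renamedLeaf`: the app is always the last segment.
func parseLeaf(_ leaf: String) -> ParsedLeaf? {
    let parts = leaf.split(separator: "_").map(String.init)
    guard parts.count >= 2, let head = parts.first, let app = parts.last else { return nil }
    // `2026-06-17T09-59-48` -> `2026-06-17`; an already-renamed head has no `T`.
    let date = head.split(separator: "T").first.map(String.init) ?? head
    // Everything between the date and the app is the name, underscores and all.
    let middle = parts.dropFirst().dropLast()
    return ParsedLeaf(date: date,
                      name: middle.isEmpty ? nil : middle.joined(separator: "_"),
                      app: app)
}
```

A collision suffix such as `-2` stays part of the parsed name. This matches `renamedLeaf`, which appends the suffix to the name segment rather than to the app.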
@@ -121,8 +121,8 @@ final class TranscriptPipeline {
         return assembled.speakersFile
     }
 
-    /// Build the `label-merge` timeline from mic-VAD self spans (Phase 1/2). Once
-    /// the visual adapters land (Phase 3–4), their segments are merged in too.
+    /// Build the `label-merge` timeline from mic-VAD self spans; the visual
+    /// adapters' segments are merged in alongside these.
     static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
         spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
     }
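`ProcessInputs` records `systemHealthy: result.systemNote == nil`. The project notes say the backend receives `mic_file` + `system_file` when the system track is healthy, and otherwise a single mixed-mono `file`. Below is a sketch of that choice, re-deriving the paths from the folder as `finish()` now does. `UploadPart` and `uploadParts` are hypothetical; the real multipart code in `SparkControlClient` is not part of this diff.

```swift
import Foundation

/// Hypothetical multipart part: the form field name plus the file to attach.
struct UploadPart: Equatable {
    let field: String
    let file: URL
}

/// Hypothetical: dual-channel when the system track is healthy, else mono fallback.
/// Paths come from `folder`, so a meeting-name rename cannot leave them stale.
func uploadParts(folder: URL, systemHealthy: Bool) -> [UploadPart] {
    let mic = folder.appendingPathComponent("mic.wav")
    let system = folder.appendingPathComponent("system.wav")
    let mixed = folder.appendingPathComponent("mixed_mono_16k.wav")
    if systemHealthy {
        return [UploadPart(field: "mic_file", file: mic),
                UploadPart(field: "system_file", file: system)]
    }
    return [UploadPart(field: "file", file: mixed)]
}
```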
@@ -3,8 +3,8 @@ import Combine
 
 /// User-facing settings, persisted to `UserDefaults`.
 ///
-/// Phase 0 scope: backend host + TLS-skip, output folder, and adapter toggles.
-/// The adapter toggles persist but do nothing yet (adapters arrive in Phase 3–4).
+/// Covers the backend host + TLS handling, output folder, your name, chunk
+/// length, per-app adapter toggles, and the auto-record/auto-send/recap flags.
 @MainActor
 final class AppSettings: ObservableObject {
 
@@ -106,7 +106,10 @@ final class AppSettings: ObservableObject {
             ?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
             ?? Self.defaultBackendURL
 
-        self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true
+        // Off by default: install the Start9 Root CA in the System keychain and the
+        // backend's cert validates normally. The bypass is an opt-in escape hatch and,
+        // when on, is scoped to the configured host (see `InsecureTrustDelegate`).
+        self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? false
 
         self.outputFolderPath = defaults.string(forKey: Keys.outputFolder)
             ?? "~/Ten31Transcripts"
@@ -30,8 +30,6 @@
 	<string>Ten31</string>
 	<key>NSMicrophoneUsageDescription</key>
 	<string>Ten31 Transcripts records your microphone during calls to build the local audio track.</string>
-	<key>NSAppleEventsUsageDescription</key>
-	<string>Ten31 Transcripts reads the active browser tab's URL to detect Google Meet calls.</string>
 	<key>NSLocalNetworkUsageDescription</key>
 	<string>Ten31 Transcripts connects to your SparkControl server on the local network.</string>
 	<key>NSAppTransportSecurity</key>
@@ -173,7 +173,7 @@ struct MenuBarView: View {
     private var header: some View {
         VStack(alignment: .leading, spacing: 2) {
             Text("Ten31 Transcripts").font(.headline)
-            Text("Phase 0 · setup & status")
+            Text("Setup & status")
                 .font(.caption)
                 .foregroundStyle(.secondary)
         }
@@ -62,7 +62,7 @@ struct VisualTimeline: Codable {
     }
 
     /// The flat array `label-merge` wants: `[{start,end,name,confidence}]`,
-    /// dropping `source`. Slice/rebase to chunk-local seconds happens in Phase 5.
+    /// dropping `source`. Slice/rebase to chunk-local seconds happens at chunking time.
     func flatTimelineData() throws -> Data {
         let flat = segments.map { seg -> [String: Any] in
             ["start": seg.start, "end": seg.end, "name": seg.name, "confidence": seg.confidence]
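`flatTimelineData()` leaves slicing and rebasing to chunk-local seconds to chunking time, and the project notes put chunks at roughly 2 to 3 minutes. Below is a minimal sketch of that step. It assumes segments that straddle a chunk boundary are clipped rather than dropped. `Segment` and `slice` are hypothetical stand-ins for `VisualTimeline.Segment` and the real chunker.

```swift
/// Hypothetical stand-in for `VisualTimeline.Segment` (without `source`).
struct Segment: Equatable {
    var start: Double  // seconds
    var end: Double
    var name: String
    var confidence: Double
}

/// Hypothetical chunk slicer: clips whole-call segments to [chunkStart, chunkEnd)
/// and rebases them to chunk-local seconds, dropping segments with no overlap.
func slice(_ segments: [Segment], chunkStart: Double, chunkEnd: Double) -> [Segment] {
    segments.compactMap { seg -> Segment? in
        let s = max(seg.start, chunkStart)
        let e = min(seg.end, chunkEnd)
        guard e > s else { return nil }
        var local = seg
        local.start = s - chunkStart
        local.end = e - chunkStart
        return local
    }
}
```

Clipping keeps a speaker's attribution on both sides of a boundary. Dropping straddling segments would lose that attribution, which matters most for long turns.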
@@ -0,0 +1,35 @@
+import XCTest
+@testable import Ten31Transcripts
+
+/// The TLS bypass is an opt-in escape hatch scoped to the configured backend host.
+/// These cover the security gate (`allowsTrustOverride`) so a regression can't widen
+/// it back to "trust any server". The gate is pure, so no network or SecTrust needed.
+final class InsecureTrustDelegateTests: XCTestCase {
+    private func space(host: String,
+                       method: String = NSURLAuthenticationMethodServerTrust) -> URLProtectionSpace {
+        URLProtectionSpace(host: host, port: 62419, protocol: "https",
+                           realm: nil, authenticationMethod: method)
+    }
+
+    func testFiresForMatchingHost() {
+        let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
+        XCTAssertTrue(d.allowsTrustOverride(for: space(host: "192.0.2.1")))
+    }
+
+    func testRejectsMismatchedHost() {
+        let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
+        XCTAssertFalse(d.allowsTrustOverride(for: space(host: "evil.example.com")))
+    }
+
+    func testNilAllowedHostNeverFires() {
+        let d = InsecureTrustDelegate(allowedHost: nil)
+        XCTAssertFalse(d.allowsTrustOverride(for: space(host: "192.0.2.1")))
+    }
+
+    func testOnlyServerTrustMethodFires() {
+        // Matching host but a non-server-trust challenge (e.g. HTTP Basic) must not override.
+        let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
+        XCTAssertFalse(d.allowsTrustOverride(
+            for: space(host: "192.0.2.1", method: NSURLAuthenticationMethodHTTPBasic)))
+    }
+}
@@ -0,0 +1,110 @@
+import XCTest
+@testable import Ten31Transcripts
+
+final class SessionNamingTests: XCTestCase {
+
+    // MARK: sanitize
+
+    func testSanitizeTrimsAndKeepsSpaces() {
+        XCTAssertEqual(SessionNaming.sanitize(" Weekly Sync "), "Weekly Sync")
+    }
+
+    func testSanitizeReplacesPathSeparators() {
+        XCTAssertEqual(SessionNaming.sanitize("9/10 standup"), "9-10 standup")
+        XCTAssertEqual(SessionNaming.sanitize("a:b\\c"), "a-b-c")
+    }
+
+    func testSanitizeCollapsesWhitespaceRuns() {
+        XCTAssertEqual(SessionNaming.sanitize("board 1:1"), "board 1-1")
+    }
+
+    func testSanitizeStripsLeadingDots() {
+        XCTAssertEqual(SessionNaming.sanitize("...hidden"), "hidden")
+        XCTAssertEqual(SessionNaming.sanitize(".."), "")
+    }
+
+    func testSanitizeEmptyForBlankOrWhitespace() {
+        XCTAssertEqual(SessionNaming.sanitize(""), "")
+        XCTAssertEqual(SessionNaming.sanitize(" \n\t "), "")
+    }
+
+    func testSanitizeCapsLength() {
+        let long = String(repeating: "x", count: 200)
+        XCTAssertEqual(SessionNaming.sanitize(long).count, 60)
+    }
+
+    func testSanitizeStripsControlCharacters() {
+        XCTAssertEqual(SessionNaming.sanitize("a\u{0000}b\u{001F}c"), "abc")
+    }
+
+    // MARK: datePrefix
+
+    func testDatePrefixFromAutoName() {
+        XCTAssertEqual(SessionNaming.datePrefix(ofSessionNamed: "2026-06-17T09-59-48_signal"), "2026-06-17")
+    }
+
+    func testDatePrefixFromRenamedName() {
+        XCTAssertEqual(SessionNaming.datePrefix(ofSessionNamed: "2026-06-17_Weekly sync_signal"), "2026-06-17")
+    }
+
+    // MARK: renamedLeaf
+
+    func testRenamedLeafBasic() {
+        XCTAssertEqual(
+            SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "Weekly sync"),
+            "2026-06-17_Weekly sync_signal")
+    }
+
+    func testRenamedLeafAppStaysLastSegment() {
+        // The meeting name may contain underscores; the app must remain parseable as
+        // the final "_"-segment (what SessionController.appLabel reads).
+        let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "meet", meetingName: "q3_planning")
+        XCTAssertEqual(leaf, "2026-06-17_q3_planning_meet")
+        XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "meet")
+    }
+
+    func testRenamedLeafNilForBlankName() {
+        XCTAssertNil(SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: " "))
+    }
+
+    func testRenamedLeafCounterDisambiguatesNameSegment() {
+        // A collision suffixes the NAME, not the whole leaf, so "_app" stays last.
+        let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "sync", counter: 1)
+        XCTAssertEqual(leaf, "2026-06-17_sync-2_signal")
+        XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "signal")
+    }
+
+    func testRenamedLeafAppStaysLastAtMaxCollisionDepth() {
+        // The 100-collision cap is counter 0…99; the app must still parse out last.
+        let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "q3_sync", counter: 99)
+        XCTAssertEqual(leaf, "2026-06-17_q3_sync-100_signal")
+        XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "signal")
+    }
+
+    // MARK: recapTitle
+
+    func testRecapTitleAutoNamePreservesLegacyFormat() {
+        XCTAssertEqual(
+            SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06T11-43-02_meet"),
+            "Google Meet call — 2026-06-06 11:43")
+    }
+
+    func testRecapTitleNamedSession() {
+        XCTAssertEqual(
+            SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06_Weekly sync_meet"),
+            "Weekly sync — Google Meet (2026-06-06)")
+    }
+
+    func testRecapTitleNamePreservesUnderscores() {
+        // A meeting name with underscores must survive the split/join round-trip.
+        XCTAssertEqual(
+            SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06_q3_planning_meet"),
+            "q3_planning — Google Meet (2026-06-06)")
+    }
+
+    func testRecapTitleUnknownAppCapitalizes() {
+        XCTAssertEqual(
+            SessionNaming.recapTitle(app: "manual", sessionId: "2026-06-06T11-43-02_manual"),
+            "Manual call — 2026-06-06 11:43")
+    }
+}
+46
-35
@@ -7,9 +7,9 @@
 > returns named transcript segments. A growing **voiceprint library** recovers
 > speakers even when the visual cue is missing.
 
-Master context document. Read this first, then `02_ARCHITECTURE.md`,
-`03_DATA_CONTRACTS.md`, `04_BUILD_PLAN.md`. The SparkControl API is now fully
-specified — see `03_DATA_CONTRACTS.md` (and the source `AUDIO_API.md`).
+Master context document. Read this first, then `02_ARCHITECTURE.md` and
+`03_DATA_CONTRACTS.md`. The SparkControl API is fully specified in
+`03_DATA_CONTRACTS.md`.
 
 ---
 
@@ -20,25 +20,30 @@ A lightweight, always-running **menu-bar app on macOS** that:
 1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams,
    or Signal.
 2. **Records two local audio tracks** — system audio (everyone else) and the
-   user's microphone (the user) — and **mixes them to one 16 kHz mono WAV** for
-   the backend.
+   user's microphone (the user). It sends the backend **dual-channel**
+   (`mic_file` + `system_file`) when the system track is healthy, falling back to
+   a **mixed-mono 16 kHz WAV** otherwise.
 3. **Watches the call window** at ~2–4 fps and, per app, reads participant
    **names** and the **active-speaker cue**, producing a
    `(start, end, name, confidence)` **visual timeline** — its best guess at who
    was talking when.
 4. **Discards every video frame after extraction.** No video is ever written to
    disk. Only audio + the derived timeline persist locally.
-5. On call end, **POSTs the mixed audio + the visual timeline (+ the known
-   voiceprint library) to `POST /api/audio/label-merge`** on SparkControl, which
-   returns **named, speaker-attributed transcript segments** and a **voiceprint
-   per speaker**.
+5. On call end, **POSTs the audio + the visual timeline (+ the known voiceprint
+   library) to `POST /api/audio/label-merge`** on SparkControl, which returns
+   **named, speaker-attributed transcript segments** and a **voiceprint per
+   speaker**.
 6. **Persists the returned voiceprints** keyed by name, so the next call can pass
    them as `known_voiceprints` and recover a speaker by voice when the visual cue
    is absent (camera off, a bad OCR frame).
+7. **Renders the result locally** — a readable `transcript.md` plus an HTML
+   `recap.html` (topics + meeting extras, generated via the backend's LLM
+   endpoint), with an in-app editor for fixing speaker names after the fact.
 
-The app's job ends at receiving and storing the named segments from SparkControl.
-**All transcription, diarization, and the name-merge happen on the backend.** Do
-not build transcription, diarization, or the merge vote in this app.
+The app's job ends at producing the named transcript and recap from SparkControl's
+segments. **All transcription, diarization, name-merge, and LLM analysis happen on
+the backend.** Do not build transcription, diarization, or the merge vote in this
+app.
 
 ## 2. Why the visual timeline still matters (the core idea)
 
@@ -68,19 +73,25 @@ few calls the system can name regulars even with cameras off.
 
 **In scope (this app):**
 - Call detection for Meet / Zoom / Teams / Signal.
-- Dual-track local audio capture + mix-to-mono for the backend.
+- Dual-track local audio capture; **dual-channel send** (mic + system) with a
+  mix-to-mono fallback for the backend.
 - Low-fps window capture → OCR (names) + active-speaker cue detection.
 - Per-app "adapter" modules encapsulating each app's UI quirks.
 - Building the visual timeline; **mic-VAD self-labeling** (the mic track is the
   user, so hot-mic spans pre-seed the user's name into the timeline).
 - Chunking long calls (~2–3 min) and calling `label-merge` **sequentially**.
 - A local **voiceprint store** (persist + replay named voiceprints).
-- Storing the backend's named transcript segments locally.
-- A minimal menu-bar UI: status, manual start/stop, recent sessions, adapter
-  toggles, backend host/health, output folder.
+- Storing the backend's named segments and **rendering** them — `transcript.md`
+  plus an HTML `recap.html` (recap analysis via the backend LLM) — with an in-app
+  speaker-name editor.
+- A minimal menu-bar UI: status, manual start/stop, the last session (reveal,
+  resend, open recap, edit speakers), adapter toggles, backend host/health,
+  output folder.
 
 **Out of scope (owned by the backend):**
-- Transcription, diarization, the name-merge vote, summarization/analysis.
+- Transcription, diarization, the name-merge vote, and LLM summarization — these
+  run on the backend; the app only orchestrates the recap call and renders the
+  result.
 
 **Explicitly not doing:** saving video; cloud anything. Everything stays on the
 operator's LAN.
@@ -91,14 +102,14 @@ operator's LAN.
|
||||
|---|---|---|
|
||||
| Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. |
|
||||
| Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. |
|
||||
| Backend audio format | **Mixed-mono 16 kHz WAV** | Diarizer separates speakers from one mixed stream; 16 kHz is ideal. |
|
||||
| Backend audio format | **Dual-channel (mic + system)** when the system track is healthy, else **mixed-mono 16 kHz WAV** | Separate tracks let the backend attribute the user's mic channel directly; the diarizer can still split the mono fallback. |
|
||||
| Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. |
|
||||
| Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. |
|
||||
| Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. |
|
||||
| Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. |
|
||||
| Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). |
|
||||
| Long calls | Chunk ~2–3 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. |
|
||||
| Transport / TLS | `multipart/form-data`, file field `file`; self-signed Start9 cert (skip verify or trust the Root CA); **no auth on LAN** | Matches every other SparkControl endpoint. |
|
||||
| Transport / TLS | `multipart/form-data`, file field `file` (mono) or `mic_file` + `system_file` (dual-channel); self-signed Start9 cert (trust the Root CA — supported default; host-scoped skip-verify is an off-by-default escape hatch); **no auth on LAN** | Matches every other SparkControl endpoint. |
|
||||
| Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. |
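
The **Long calls** and **Requests** rows can be illustrated with a hypothetical chunk planner. This sketch assumes a 150 s target, the middle of the ~2–3 min range, and folds a short tail into the previous chunk. Both the `chunkWindows` helper and the 30 s tail threshold are illustrative assumptions, not the app's actual values.

```swift
import Foundation

/// One chunk of a long call, in seconds from the start of the recording.
struct ChunkWindow: Equatable {
    let index: Int
    let start: Double
    let end: Double
}

/// Splits a call of `duration` seconds into consecutive windows of about
/// `target` seconds (assumed 150 s). A trailing remainder shorter than
/// `minTail` (assumed 30 s) is folded into the previous chunk, so no request
/// carries a near-empty slice of audio.
func chunkWindows(duration: Double, target: Double = 150, minTail: Double = 30) -> [ChunkWindow] {
    guard duration > 0, target > 0 else { return [] }
    var bounds: [Double] = []
    var t = 0.0
    while t < duration {
        bounds.append(t)
        t += target
    }
    bounds.append(duration)
    // Fold a short tail into the previous chunk by dropping the last interior bound.
    if bounds.count > 2, bounds[bounds.count - 1] - bounds[bounds.count - 2] < minTail {
        bounds.remove(at: bounds.count - 2)
    }
    return (0..<(bounds.count - 1)).map {
        ChunkWindow(index: $0, start: bounds[$0], end: bounds[$0 + 1])
    }
}
```

Each window would then be sliced from the audio and sent on its own. The app waits for each `label-merge` response before sending the next, so only one audio request is ever in flight.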

### On forking Hyprnote
@@ -128,25 +139,25 @@ SparkControl, on the operator's Start9 LAN, fronting two DGX Sparks:
- **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize +
name from the visual timeline (+ voiceprint fallback), optionally transcribe,
in one synchronous call.
- **LLM (recap):** Qwen3 via OpenAI-compatible `POST /v1/chat/completions` —
generates the readable recap (topics + meeting extras) from the transcript.
- Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`.

Full request/response shapes, curl examples, limits, and error formats are in
`03_DATA_CONTRACTS.md`.
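
One way to keep the endpoint list above in a single place is a small enum that pairs each method with its path and resolves it against the configured base URL. This is a sketch. The `SparkEndpoint` type is hypothetical; only the methods and paths come from this section.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on non-Apple platforms
#endif

/// SparkControl endpoints named in this section (the type itself is hypothetical).
enum SparkEndpoint: CaseIterable {
    case labelMerge, chatCompletions, status, endpoints, models

    var method: String {
        switch self {
        case .labelMerge, .chatCompletions: return "POST"
        case .status, .endpoints, .models: return "GET"
        }
    }

    var path: String {
        switch self {
        case .labelMerge: return "/api/audio/label-merge"
        case .chatCompletions: return "/v1/chat/completions"
        case .status: return "/api/status"
        case .endpoints: return "/api/endpoints"
        case .models: return "/v1/models"
        }
    }

    /// Builds a bare request (no body) for this endpoint against `base`.
    func request(base: URL) -> URLRequest {
        // Drop the leading "/" so appendingPathComponent does not double it.
        var req = URLRequest(url: base.appendingPathComponent(String(path.dropFirst())))
        req.httpMethod = method
        return req
    }
}
```

A health check would then be `SparkEndpoint.status.request(base:)` sent through `URLSession`. The request bodies for the POST endpoints are specified in `03_DATA_CONTRACTS.md`.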

## 7. Remaining open items (small)
## 7. Settled decisions (were open at brief time)

1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
over a raw IP, since it survives IP changes) — configured in Settings or via the
`SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
the default; keep it editable in settings. Service-discovery at
`GET /api/endpoints`.
2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
toggle if the user wants to eyeball the timeline first.
3. **Retention** — keep the session folder after a successful hand-off, or prune
audio and keep only `speakers.json` + voiceprints? Default: keep everything,
user-configurable.
4. **Voiceprint update policy** — overwrite vs running-average a person's stored
voiceprint across calls (see `02_ARCHITECTURE.md §2.9`). Start simple
(store/refresh latest high-confidence), refine later.
5. **Signing** — stable identity so macOS doesn't re-prompt for permissions on
each rebuild.
1. **Base URL.** A private LAN host — a `.local` mDNS name (preferred over a raw
IP, since it survives IP changes) — configured in Settings or via the
`SPARK_BACKEND_URL` env var, never committed. A neutral placeholder ships as the
default and stays editable in Settings. Service-discovery at `GET /api/endpoints`.
2. **Send trigger.** Auto-send on call end is a setting (`autoSendOnStop`), **off
by default** — the user reviews the session and sends manually unless they opt in.
3. **Retention.** The session folder is kept after a successful hand-off (output
location is configurable); nothing is pruned automatically.
4. **Voiceprint update policy.** Store/refresh the latest high-confidence vector
per name (`02_ARCHITECTURE.md §2.9`); a per-name running average is a possible
later refinement.
5. **Signing.** A stable identity via `Config/Signing.xcconfig` (gitignored) keeps
macOS from re-prompting for permissions on each rebuild.
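
Decision 1 can be sketched as a resolver. The precedence here is an assumption, because the text does not say which source wins: `SPARK_BACKEND_URL` first, then the Settings value, then the shipped placeholder. The `resolveBackendURL` helper and its requirement of an `https` URL with a host are also illustrative.

```swift
import Foundation

/// Neutral placeholder shipped as the default (from 03_DATA_CONTRACTS.md).
let placeholderBackend = "https://your-spark-backend.local"

/// Resolves the SparkControl base URL. Assumed precedence: the SPARK_BACKEND_URL
/// environment variable, then the Settings value, then the placeholder.
/// Empty or malformed values fall through to the next candidate.
func resolveBackendURL(settingsValue: String?,
                       environment: [String: String] = ProcessInfo.processInfo.environment) -> URL {
    let candidates: [String?] = [environment["SPARK_BACKEND_URL"], settingsValue, placeholderBackend]
    for case let raw? in candidates {
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        // Accept only an absolute https URL with a host, e.g. a `.local` mDNS name.
        if let url = URL(string: trimmed), url.scheme == "https", url.host != nil {
            return url
        }
    }
    return URL(string: placeholderBackend)!
}
```

The `https` check fits the Start9 TLS setup in `03_DATA_CONTRACTS.md`. Drop it if the backend is ever reached over plain HTTP.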

+23
-6
@@ -64,6 +64,9 @@ pattern, the macOS APIs, and the SparkControl integration (now fully specified).
└────────────────┘ └────────────────────┘
```

(After `speakers.json`, a recap phase renders `transcript.md` + `recap.html` via
the backend LLM — see §2.11.)

## 2. Modules

### 2.1 `CallDetector`
@@ -176,8 +179,10 @@ Write the session folder and, if the call is longer than ~3 min, produce a
```

### 2.7 `SparkControlClient`
Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`** with
`file`, `timeline`, `known_voiceprints`, `transcribe=true`.
Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`**. Sends
**dual-channel** (`mic_file` + `system_file` + `self_name` + `self_vad`) when the
system track is healthy, else the **mono** `file`; always with `timeline`,
`known_voiceprints`, `transcribe=true`.
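
A sketch of how the two audio shapes could map onto multipart fields, plus a minimal `multipart/form-data` encoder. The field names come from the `label-merge` contract. The following are assumptions: the `FormPart` type, both helpers, the filenames, and the `audio/wav` part type. "System track is healthy" is passed in as a flag because this section does not define the check. `self_vad` and `known_voiceprints` are taken as pre-encoded JSON strings.

```swift
import Foundation

/// One multipart part: a text field or a file (hypothetical helper type).
enum FormPart {
    case text(name: String, value: String)
    case file(name: String, filename: String, data: Data)
}

/// Chooses the label-merge audio shape: dual-channel when the system track is
/// healthy, else the mixed-mono fallback. How "healthy" is judged is not
/// defined here, so it is an input.
func labelMergeParts(mic: Data, system: Data, mixedMono: Data, systemTrackHealthy: Bool,
                     selfName: String, selfVADJSON: String?, timelineJSON: String,
                     voiceprintsJSON: String?) -> [FormPart] {
    var parts: [FormPart] = []
    if systemTrackHealthy {
        parts.append(.file(name: "mic_file", filename: "mic.wav", data: mic))
        parts.append(.file(name: "system_file", filename: "system.wav", data: system))
        parts.append(.text(name: "self_name", value: selfName))
        if let vad = selfVADJSON { parts.append(.text(name: "self_vad", value: vad)) }
    } else {
        parts.append(.file(name: "file", filename: "mixed_mono_16k.wav", data: mixedMono))
    }
    parts.append(.text(name: "timeline", value: timelineJSON))
    if let vp = voiceprintsJSON { parts.append(.text(name: "known_voiceprints", value: vp)) }
    parts.append(.text(name: "transcribe", value: "true"))
    return parts
}

/// Encodes parts as a multipart/form-data body (CRLF line endings, closing delimiter).
func multipartBody(_ parts: [FormPart], boundary: String) -> Data {
    var body = Data()
    func add(_ s: String) { body.append(Data(s.utf8)) }
    for part in parts {
        add("--\(boundary)\r\n")
        switch part {
        case let .text(name, value):
            add("Content-Disposition: form-data; name=\"\(name)\"\r\n\r\n\(value)\r\n")
        case let .file(name, filename, data):
            add("Content-Disposition: form-data; name=\"\(name)\"; filename=\"\(filename)\"\r\n")
            add("Content-Type: audio/wav\r\n\r\n")
            body.append(data)          // raw bytes, not base64 or a path
            add("\r\n")
        }
    }
    add("--\(boundary)--\r\n")
    return body
}
```

The request itself would carry `Content-Type: multipart/form-data; boundary=<boundary>`, using a boundary string that does not occur in any part.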
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
Start9 cert) or trust the Root CA. **No auth on the LAN.**
@@ -210,10 +215,22 @@ Local persistence of named voiceprints — the compounding-identity layer.
- Editable/clearable from the menu-bar UI (rename, delete a person, reset).

### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
Status (idle / detected / recording / uploading), manual start/stop, recent
sessions (open folder, resend, delete), adapter toggles, **backend host + a
health check** (`GET /api/status`), output folder, voiceprint manager, and a
permissions checklist (Screen Recording, Microphone, Accessibility).
Status (idle / detected / recording / finishing), manual start/stop with live
mic/system level meters, and the **last session** — reveal in Finder, resend
("Send to backend"), open recap, and edit speakers — plus "Open saved session…"
to reprocess an existing folder. Also a **backend host + health check**
(`GET /api/status`), adapter toggles, output folder, and a permissions checklist
(Microphone, Screen Recording, Accessibility). (No multi-session list or
voiceprint-manager UI yet — those are in `ROADMAP.md`.)

### 2.11 Recap (`RecapAnalyzer`, `RecapRenderer`)
After `speakers.json`, the recap phase turns the named transcript into the
human-readable deliverables. `RecapAnalyzer` calls the backend LLM
(`POST /v1/chat/completions`, Qwen3) for topics + meeting extras; `RecapRenderer`
writes `transcript.md` (one line per diarized utterance) and `recap.html` (+ a
`recap.json` sidecar). The in-app speaker editor (`SpeakerEditing` /
`RecapEditModel`) rewrites names across all outputs after the fact. All
language-model work stays on the backend; the app orchestrates and renders.
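
A sketch of the `transcript.md` rule (one line per diarized utterance) combined with the speaker editor's renames. The `NamedSegment` fields, the bullet-per-line Markdown format, and the `renames` map are assumptions for illustration. The real `RecapRenderer` output may differ.

```swift
import Foundation

/// A named, transcribed utterance after label-merge (field names assumed).
struct NamedSegment {
    let start: Double   // seconds from call start
    let speaker: String
    let text: String
}

/// Formats seconds as mm:ss, or h:mm:ss past the first hour.
func timestamp(_ seconds: Double) -> String {
    func pad(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
    let total = max(0, Int(seconds))                  // truncate toward zero
    let h = total / 3600, m = (total % 3600) / 60, s = total % 60
    return h > 0 ? "\(h):\(pad(m)):\(pad(s))" : "\(pad(m)):\(pad(s))"
}

/// Renders transcript.md: one line per non-empty utterance, in time order.
/// `renames` applies the speaker editor's edits (old name -> new name).
func renderTranscript(_ segments: [NamedSegment], renames: [String: String] = [:]) -> String {
    let lines = segments
        .sorted { $0.start < $1.start }
        .filter { !$0.text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty }
        .map { seg -> String in
            let name = renames[seg.speaker] ?? seg.speaker
            return "- [\(timestamp(seg.start))] **\(name):** \(seg.text)"
        }
    return lines.joined(separator: "\n") + "\n"
}
```

Applying renames at render time, and not only when editing stored segments, lets one name map regenerate the transcript after an edit. Relabeling an `Unknown_N` as a real name is the typical case.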

## 3. macOS frameworks & permissions

+28
-11
@@ -1,7 +1,7 @@
# Data Contracts — Ten31 Transcripts

Companion to docs 01/02. Defines the files the app produces/stores and the **real
SparkControl contract** (source of truth: `AUDIO_API.md`). The `label-merge`
SparkControl contract** (verified against the live backend). The `label-merge`
endpoint is the app's primary integration point.

---
@@ -69,8 +69,10 @@ When chunking, **slice to the chunk window and rebase to chunk-local seconds**
"app_version": "0.1.0"
}
```
(`mixed_mono_16k.wav` is the one the backend gets; the separate tracks are kept
locally — the mic track is the user's known identity / VAD source.)
(On the dual-channel path the backend gets `mic.wav` + `system.wav` directly; on
the mono fallback it gets `mixed_mono_16k.wav`. The mic track is the user's known
identity / VAD source. **Note:** the per-file `sha256` fields above are part of the
intended contract but are **not currently emitted** by the pipeline.)

---

@@ -83,15 +85,17 @@ locally — the mic track is the user's known identity / VAD source.)
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
the default.
- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
the Start9 Root CA into the trust store.
- **TLS:** Start9 self-signed Root CA. Supported path: install the Start9 Root CA
into the System keychain (default trust then succeeds). Skip-verification is an
**off-by-default, host-scoped** escape hatch (`InsecureTrustDelegate`, scoped to
the configured backend host), not the default.
- **Auth:** **none on the LAN.** No token/key today.
- **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription),
~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio
trips a GPU FFT race → `503 + Retry-After`.
- **Transport:** `multipart/form-data`, audio file field name **`file`** (bytes,
not base64/path).
- **Transport:** `multipart/form-data`. Audio file field is **`file`** on the mono
path, or **`mic_file`** + **`system_file`** on the dual-channel path (bytes, not
base64/path).
- **All endpoints are synchronous** (no job IDs / polling).
- **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 +
Retry-After` transient (retry after the interval).
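
The `503 + Retry-After` rule can be sketched as a sequential retry loop. The transport sits behind injected `send` and `sleep` closures, so the loop stays synchronous and handles one request at a time. The `AttemptResult` type, the 5 s fallback, and the attempt cap are assumptions. Only the delay-seconds form of `Retry-After` is parsed; the HTTP-date form falls back to the default delay.

```swift
import Foundation

/// Outcome of one label-merge attempt (hypothetical, transport-agnostic).
struct AttemptResult {
    let status: Int
    let retryAfter: String?   // raw Retry-After header value, if any
}

/// Seconds to wait before retrying, or nil if the response is not retryable.
/// Only 503 is treated as transient, per the error contract above. Parses the
/// delay-seconds form of Retry-After; anything else uses `fallback` (assumed 5 s).
func retryDelay(for result: AttemptResult, fallback: Double = 5) -> Double? {
    guard result.status == 503 else { return nil }
    if let raw = result.retryAfter?.trimmingCharacters(in: .whitespaces),
       let seconds = Double(raw), seconds >= 0 {
        return seconds
    }
    return fallback
}

/// Sends one request at a time, retrying on 503 up to `maxAttempts` total tries.
/// `send` and `sleep` are injected so the loop stays sequential and testable.
func sendWithRetry(maxAttempts: Int = 4,
                   send: () -> AttemptResult,
                   sleep: (Double) -> Void) -> AttemptResult {
    var last = send()
    var attempt = 1
    while attempt < maxAttempts, let delay = retryDelay(for: last) {
        sleep(delay)          // honor the server's interval before the next try
        last = send()
        attempt += 1
    }
    return last
}
```

Treating only `503` as retryable matches the error list above. A `400` or `413` will not succeed on a resend unless the request changes.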
@@ -105,11 +109,16 @@ Diarize + name clusters from the visual timeline (majority temporal overlap),
with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** —
the app owns the timeline and the voiceprint library.
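
Because the endpoint is stateless and the app owns the timeline, each chunk's `timeline` field must be cut to that chunk's window and rebased to chunk-local seconds (§1.1). Below is a minimal sketch. It assumes entries that straddle a chunk boundary are clipped rather than dropped. `TimelineEntry` mirrors the documented `{"start","end","name","confidence"}` shape, and both helpers are hypothetical.

```swift
import Foundation

/// One visual-timeline entry, matching the `timeline` field's JSON shape.
struct TimelineEntry: Codable, Equatable {
    var start: Double
    var end: Double
    var name: String
    var confidence: Double
}

/// Slices a whole-call timeline to [chunkStart, chunkEnd) and rebases the
/// surviving entries to chunk-local seconds (0 = start of the chunk's audio).
/// Entries straddling a boundary are clipped (an assumption), not dropped.
func chunkLocalTimeline(_ timeline: [TimelineEntry],
                        chunkStart: Double, chunkEnd: Double) -> [TimelineEntry] {
    return timeline.compactMap { e -> TimelineEntry? in
        let s = max(e.start, chunkStart)
        let t = min(e.end, chunkEnd)
        guard t > s else { return nil }            // no overlap with this chunk
        return TimelineEntry(start: s - chunkStart, end: t - chunkStart,
                             name: e.name, confidence: e.confidence)
    }
}

/// Encodes the flat JSON array sent as the `timeline` multipart field.
func timelineJSON(_ entries: [TimelineEntry]) throws -> String {
    let data = try JSONEncoder().encode(entries)
    return String(decoding: data, as: UTF8.self)
}
```

A name that spans a chunk boundary then appears in both chunks' timelines. That is what lets the backend attach the same name to that speaker's cluster in each chunk.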

**Multipart fields:**
**Multipart fields** — two audio shapes: **mono** (`file`) or **dual-channel**
(`mic_file` + `system_file`, preferred when the system track is healthy):
| field | required | notes |
|---|---|---|
| `file` | **yes** | mixed-mono WAV (the chunk, when chunking) |
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1) |
| `file` | mono path | mixed-mono WAV (the chunk, when chunking) |
| `mic_file` | dual path | the user's mic track (chunk) — attributed to `self_name` |
| `system_file` | dual path | the remote/system track (chunk) |
| `self_name` | dual path | the user's name; the mic channel is attributed to them |
| `self_vad` | no | chunk-local windows where the mic is genuinely the user (active + louder than system) |
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1); on the dual path it names only the remote speakers |
| `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` |
| `transcribe` | no | `"true"` to also return per-segment text (default false) |
| `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) |
@@ -213,3 +222,11 @@ Loaded → `known_voiceprints` on every `label-merge` call. Updated from respons
`fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never
stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with
`overlap_confidence ≥ ~0.8`; consider per-name running mean later.
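
The starting update policy can be sketched as a small store. The following are assumptions: the `NamedSpeaker` fields, the `NamingSource` enum, and reading "high-confidence" as one `0.8` cut applied to both `visual` and `voiceprint` speakers. The 192-float length and the `known_voiceprints` JSON shape come from the `label-merge` field table.

```swift
import Foundation

/// How the backend named a speaker (values mirror `visual` / `voiceprint`;
/// the enum itself is hypothetical).
enum NamingSource: String { case visual, voiceprint }

/// A named speaker from a label-merge response (field names assumed).
struct NamedSpeaker {
    let name: String
    let source: NamingSource?     // nil when the backend could not name it
    let confidence: Double        // e.g. overlap_confidence
    let fingerprint: [Float]      // 192-float voiceprint
}

/// Store/refresh-latest policy: keep the newest high-confidence vector per
/// name, and never store Unknown_N placeholders.
struct VoiceprintLibrary {
    private(set) var prints: [String: [Float]] = [:]
    let minConfidence = 0.8

    mutating func update(from speakers: [NamedSpeaker]) {
        for s in speakers {
            guard !s.name.hasPrefix("Unknown_"),
                  s.source != nil,
                  s.confidence >= minConfidence,
                  s.fingerprint.count == 192 else { continue }
            prints[s.name] = s.fingerprint   // overwrite: latest wins
        }
    }

    /// JSON for the `known_voiceprints` field: {"<name>": [192 floats], ...}.
    func knownVoiceprintsJSON() throws -> String {
        let data = try JSONEncoder().encode(prints)
        return String(decoding: data, as: UTF8.self)
    }
}
```

Moving to the per-name running mean later would only change the assignment in `update`, for example by averaging the stored and incoming vectors element-wise.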

## 8. Recap outputs (`transcript.md`, `recap.{html,json}`)
After `speakers.json` is assembled, the recap phase renders the human-readable
deliverables: a `transcript.md` (one line per diarized utterance) and an HTML
`recap.html`, backed by a structured `recap.json`. The recap's topic/summary
content is generated by the **backend LLM** (`POST /v1/chat/completions`, Qwen3);
the app owns the rendering and the in-app **speaker-name editor**, which can rewrite
names across `speakers.json`, the transcript, and the recap after the fact.

@@ -1,5 +1,11 @@
# Build Plan — Ten31 Transcripts

> **Status: COMPLETE (historical).** Phases 0–6 shipped and the app is in daily
> use; a recap phase (transcript + HTML recap via the backend LLM) was added after
> this plan was written. Kept as the original build log and as the map for the
> "Phase N" references in the code comments. Forward-looking work lives in
> `ROADMAP.md`; current status in `AGENTS.md`.

Companion to docs 01–03. Phased plan for the Claude Code session, each phase with
a demoable milestone. Build in order; the risky/novel work (visual adapters) is
isolated for independent tuning. The SparkControl contract is now known