Compare commits: ddee2c4871...main (13 commits)
| SHA1 |
|---|
| 050ae32e1d |
| a5c227ef1c |
| d4228b566a |
| 35ba6ecf05 |
| dda4322de7 |
| 85ea8fde45 |
| b42b591690 |
| 82de00ce37 |
| d770e52d8f |
| fc80f6707a |
| 0af86411c2 |
| 5bed24a454 |
| 3629dbdaaa |
@@ -0,0 +1 @@
|
|||||||
|
{}
|
||||||
+12
-1
@@ -23,4 +23,15 @@ Config/Signing.xcconfig
|
|||||||
|
|
||||||
# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
|
# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
|
||||||
.env
|
.env
|
||||||
.env.local
|
.env.*
|
||||||
|
!.env.example
|
||||||
|
|
||||||
|
# Claude Code — deny by default, allow-list shared wiring.
|
||||||
|
# .claude/ also accumulates worktrees, editor configs, and OS cruft; commit
|
||||||
|
# only the shared parts so new local scratch (or a stray secret) stays out.
|
||||||
|
.claude/*
|
||||||
|
!.claude/rules/
|
||||||
|
!.claude/agents/
|
||||||
|
!.claude/commands/
|
||||||
|
!.claude/skills/
|
||||||
|
!.claude/settings.json
|
||||||
|
|||||||
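The ignore rules added in this hunk (deny `.env.*` and `.claude/*`, then re-allow specific entries) can be checked in a scratch repository without touching the real one. This is a minimal sketch, assuming `git` is on `PATH`; the temporary repo, the `ignore_state` helper, and the sample paths `rules/style.md` and `worktrees/scratch.txt` are made up for illustration.

```sh
#!/bin/sh
# Build a throwaway repo that contains only the patterns from this hunk.
scratch=$(mktemp -d)
git init -q "$scratch"
cat > "$scratch/.gitignore" <<'EOF'
.env
.env.*
!.env.example
.claude/*
!.claude/rules/
!.claude/agents/
!.claude/commands/
!.claude/skills/
!.claude/settings.json
EOF

# Hypothetical sample paths, created so git can tell files from directories.
mkdir -p "$scratch/.claude/rules" "$scratch/.claude/worktrees"
touch "$scratch/.env.local" "$scratch/.env.example" \
      "$scratch/.claude/settings.json" \
      "$scratch/.claude/rules/style.md" \
      "$scratch/.claude/worktrees/scratch.txt"

# Print "ignored" or "allowed" for a path relative to the scratch repo.
# `git check-ignore -q` exits 0 when the path is ignored, 1 when it is not.
ignore_state() {
  if git -C "$scratch" check-ignore -q "$1"; then
    echo ignored
  else
    echo allowed
  fi
}

for path in .env.local .env.example .claude/settings.json \
            .claude/rules/style.md .claude/worktrees/scratch.txt; do
  printf '%-32s %s\n' "$path" "$(ignore_state "$path")"
done
```

The pattern is `.claude/*` rather than `.claude/`, and that matters here: git does not descend into a directory that is itself ignored, so the `!` re-includes would have no effect if the parent directory were excluded outright.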
@@ -2,12 +2,14 @@
|
|||||||
|
|
||||||
Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
|
Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
|
||||||
|
|
||||||
|
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items tagged `(ten31-transcripts)` and surface them before proposing next steps; triage with `/triage`.
|
||||||
|
|
||||||
## Stack (versions that matter)
|
## Stack (versions that matter)
|
||||||
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
|
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
|
||||||
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
|
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
|
||||||
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
|
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
|
||||||
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
|
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
|
||||||
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
|
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN backend — IP or `.local` host; Start9 self-signed cert. Install the StartOS Root CA in the System keychain so normal TLS validation succeeds; skip-TLS is an opt-in, **host-scoped** escape hatch, **off by default** — see `InsecureTrustDelegate`). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
|
||||||
|
|
||||||
## Commands
|
## Commands
|
||||||
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
|
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
|
||||||
@@ -44,23 +46,24 @@ open /Applications/Ten31Transcripts.app
|
|||||||
|
|
||||||
## Layout (day one)
|
## Layout (day one)
|
||||||
- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
|
- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
|
||||||
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
|
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`, `SessionNaming` (pure folder-name + recap-title logic).
|
||||||
- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
|
- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
|
||||||
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
|
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
|
||||||
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
|
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`, `AudioMixer`, `MonoTrackWriter`, `Resampler`.
|
||||||
- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
|
- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
|
||||||
- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
|
- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
|
||||||
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
|
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`/`AudioInputProcesses`/`MicActivityMonitor`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
|
||||||
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
|
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
|
||||||
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
|
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`. The folder is created at session start as `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; on stop the user can name the meeting and it's renamed to `<date>_<name>_<app>` (skipping keeps the auto stamp).
|
||||||
|
|
||||||
## Conventions
|
## Conventions
|
||||||
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
|
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
|
||||||
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
|
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
|
||||||
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing; the remote URL lives in `.git/config`, kept out of source. Branch before committing; never commit to `main` without asking.
|
- Commits: imperative mood, concise; authored by Grant. Push to the self-hosted Gitea remote `origin` (branch `main`, over SSH) after committing, with my approval; the remote URL lives in `.git/config`, kept out of source. Work on `main` — don't create feature branches unless I ask.
|
||||||
|
- **Gitea push gotcha:** `origin`'s URL uses a raw `.local` mDNS host that intermittently fails to resolve (`Could not resolve hostname`, or a push that connects then stalls). The `gitea-home` SSH alias (in `~/.ssh/config`) points at the **same** Gitea server (port 59916, user `git`) via a reliable HostName — the sibling `standards` repo uses it. Reliable fallback: `git push gitea-home:grant/ten31-transcripts.git main` then `git update-ref refs/remotes/origin/main main`. Repointing `origin` to the alias would make this permanent (not yet done).
|
||||||
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
|
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
|
||||||
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
|
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
|
||||||
- **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it.
|
- **Git history scrubbed (2026-06-13):** the private backend host + LAN IP were purged from all commits via `git filter-repo` (replaced with the `your-spark-backend.local` placeholder) and force-pushed; 0 hits across refs. Pre-rewrite backup bundle: `../ten31-transcripts-prehistory-rewrite.bundle`. A **second rewrite the same day** purged two backend LAN IPs that had slipped into a docs/test commit, replacing them with RFC 5737 documentation IPs (`192.0.2.1`/`192.0.2.2`) and force-pushing; 0 hits across refs; backup bundle `../ten31-transcripts-pre-ip-scrub.bundle`. The Apple Team ID was intentionally **not** scrubbed (it's public in every signed binary) — don't re-flag it.
|
||||||
|
|
||||||
## Always
|
## Always
|
||||||
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
|
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
|
||||||
@@ -78,18 +81,16 @@ open /Applications/Ten31Transcripts.app
|
|||||||
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
|
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
|
||||||
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
|
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
|
||||||
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
|
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
|
||||||
- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
|
- Never let a session-folder name put the meeting name where the app label is parsed from: the app must stay the **last** `_`-segment (`SessionController.appLabel(from:)` reads `.split("_").last`; `SessionNaming` enforces this and disambiguates collisions on the name segment). Renames happen at `finish()`-time after files are closed — re-derive track URLs from the (possibly moved) folder, never from `RecordingResult`'s start-time paths.
|
||||||
- Never commit to `main` or force-push a shared branch; branch first and ask.
|
- Never send call audio to a raw IP the user didn't configure. Offline backend checks: a `.local` mDNS host can't be resolved by a plain `swiftc`/URLSession binary (`-1009`) — use the **real app** or `curl`; but a **configured raw IP _is_ reachable from a plain swiftc URLSession binary** (that's how the TLS fix was verified offline).
|
||||||
|
- Never force-push a shared branch, and never push without my approval. (Work on `main` — don't create feature branches unless I ask.)
|
||||||
|
|
||||||
## Current state
|
## Current state
|
||||||
Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs; working tree clean and pushed to `origin`/`main`. A full independent evaluation ran 2026-06-13 → `EVALUATION.md` (committed at repo root; overwritten + re-committed each run for a reviewable diff); its findings are triaged into the lists below.
|
Present tense; overwritten each session. `main` clean and pushed (HEAD `a5c227e`, pushed via the `gitea-home` alias — origin's `.local` host wouldn't resolve); `/Applications/Ten31Transcripts.app` rebuilt + installed from HEAD. **Full suite re-run: 91 pass** (was 73; +18 `SessionNamingTests`).
|
||||||
- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
|
- **This session (2026-06-17) — meeting-name prompt + folder rename:** on stop, an NSAlert asks for a meeting name (Save/Skip) and the session folder is renamed `<ts>_<app>` → `<date>_<name>_<app>` (HH-MM-SS dropped; Skip/blank keeps the stamp). Pure logic in `SessionNaming` (sanitize, leaf compose, `recapTitle` for both forms); app label stays the last `_`-segment; collisions disambiguate on the name segment; `finish()` re-derives track URLs post-rename; quit never prompts and aborts an open prompt. Reviewer-reviewed; its P1 (quit-during-modal) + two P2s fixed.
|
||||||
- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
|
- **Backend connected end-to-end:** real LAN URL saved in Settings → SparkControl backend (off-repo: `defaults read xyz.ten31.transcripts backendBaseURL`); committed default stays the placeholder.
|
||||||
- **Work queue (P1 — do first):** the TLS-trust override is global and on by default — it returns `URLCredential(trust:)` for *any* host (`InsecureTrustDelegate.swift:22`; default-on at `AppSettings.swift:109`), so the full mic+system audio, visual timeline, and voiceprint upload is MITM-able by anyone on the LAN. Scope the override to the configured backend host and pin the Start9 root CA (or the leaf SPKI hash); default skip-TLS to off. This gates trusting any later backend-integration test.
|
- **Working:** backend hand-off (live), call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked send, speaker reconciliation, recap, speaker editor, configurable chunk length, standalone Settings, meeting-name prompt + readable folders.
|
||||||
- **Known debt (P2 — fix before wider use):**
|
- **Verify next (real app):** the naming prompt + rename is unit-tested + builds but **not yet exercised on a live stop** — run a real recording, stop, name it, confirm the folder renames and backend output lands in the renamed folder.
|
||||||
- `RecapAnalyzer.mmss()` fatally crashes on NaN/∞ (reproduced 2×); a malformed/MITM'd backend `duration` (e.g. `1e400` → `Double.infinity`) aborts the app at recap-render time — add a finite-guard fallback (`RecapAnalyzer.swift:137`).
|
- **Next up:** (a) repoint `origin` to `gitea-home` so pushes stop hitting the flaky `.local` host (see Conventions); (b) **backend URL primary→fallback** + the `mmss()` NaN/∞ guard freebie (sketch first; keep real IPs out of source — use `192.0.2.x`).
|
||||||
- README is stale by six phases — still says "Phase 0 (scaffold) / no audio capture, detection, or backend hand-off yet" for a shipped Phase-6 app; same lie in source comment `AppSettings.swift:7`. Rewrite both to match reality.
|
- **In progress / unverified:** the Meet visual fix (reject solid camera-off tiles) still has no clean end-to-end run — re-process the saved Meet session + a fresh Meet call (needs real app + backend).
|
||||||
- `SessionController` (670 lines, the most concurrency-dense file) has zero unit tests — cover `pendingAutoStop` (auto-start-then-immediate-call-end) and the visual-adoption generation guard before any refactor.
|
- **Known bugs / loose end:** sparse Meet speaking-detection (faint blue border); sub-second junk "self" mic fragments; desktop-mic vs phone doesn't unify by voiceprint. Doc loose end: `docs/01 §5`/`docs/02 §2.4` still list "AppleScript" as a Meet name source though the code uses window titles.
|
||||||
- **Deferred (P3 — later decision or bulk cleanup; full evidence in `EVALUATION.md`):** `docs/` specs drifted from the dual-channel API + recap phase; `docs/01` §7 lists already-resolved open items; `docs/02` §2.10 claims MenuBarUI features that don't exist; AGENTS.md Layout listings under `Audio/`/`Detection/` are incomplete; the `manifest.json` sha256 contract is specced but never written; env-var precedence footgun (saved URL shadows `SPARK_BACKEND_URL`); `SessionController` owns three jobs (extract the open-panel UI); unused `NSAppleEventsUsageDescription`; unauthenticated LAN backend (consider a shared bearer token).
|
|
||||||
- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
|
|
||||||
- **Next (product validation — no agent could reach the live backend, so this stays manual):** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) record a fresh Meet call to validate the visual fix on a clean capture. (The old "confirm Your name = Grant" item is moot — the committed default is the generic `"Me"`; "Grant" only ever lives in local UserDefaults.)
|
|
||||||
|
|||||||
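The folder-naming rule described in this file (auto stamp `<yyyy-MM-dd'T'HH-mm-ss>_<app>`, renamed to `<date>_<name>_<app>`, app label always the last `_`-segment) can be illustrated in a few lines of shell. This is only a sketch of the format, not the app's Swift `SessionNaming` code; `app_label`, `renamed_leaf`, and the sanitizer that turns spaces, underscores, and slashes into `-` are assumptions made for the example.

```sh
#!/bin/sh
# App label: everything after the last "_" in the folder leaf.
app_label() {
  printf '%s\n' "${1##*_}"
}

# Compose the renamed leaf from the auto-stamped leaf and a user-supplied name.
# A blank name keeps the original stamp (the "Skip" path).
renamed_leaf() {
  leaf=$1
  name=$2
  # Hypothetical sanitizer: spaces, underscores, and slashes become "-".
  clean=$(printf '%s' "$name" | sed 's|[ _/]|-|g')
  if [ -z "$clean" ]; then
    printf '%s\n' "$leaf"
    return
  fi
  date_part=${leaf%%T*}   # keep the date, drop "T" and the HH-mm-ss part
  printf '%s_%s_%s\n' "$date_part" "$clean" "$(app_label "$leaf")"
}

renamed_leaf '2026-06-17T14-05-09_zoom' 'Weekly sync'
renamed_leaf '2026-06-17T14-05-09_zoom' ''
```

Because the app label is appended last and read back with a last-segment split, a meeting name can never be mistaken for the app, whatever characters it contains.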
@@ -1,74 +1,146 @@
|
|||||||
# Ten31 Transcripts
|
# Ten31 Transcripts
|
||||||
|
|
||||||
Native macOS menu-bar app that auto-detects conference calls, records local audio,
|
Native macOS menu-bar app that auto-detects conference calls, records dual-track
|
||||||
builds a visual-derived speaker timeline, and hands audio + timeline to the
|
audio while watching the call window for active-speaker cues, and hands the audio
|
||||||
SparkControl backend for naming/transcription. See `docs/` for the full spec.
|
plus a visual speaker timeline to a self-hosted **SparkControl** backend that does
|
||||||
|
the transcription, diarization, and speaker naming — producing named transcripts
|
||||||
|
and meeting recaps.
|
||||||
|
|
||||||
This repo is at **Phase 0** (scaffold, permissions, backend health check).
|
It runs as a menu-bar-only app (no Dock icon). All machine-learning work lives on
|
||||||
|
the backend; the app only records, watches, packages, and reconciles hints.
|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
1. **Detect** — a call in Google Meet, Zoom, Teams, or Signal starts; `CallDetector`
|
||||||
|
notices and (optionally) auto-starts a session.
|
||||||
|
2. **Record + watch** — dual-track audio (your mic + system output) is captured while
|
||||||
|
`ScreenCaptureKit` samples the call window (~3 fps) to read names and spot the
|
||||||
|
active speaker. Video frames are analyzed in memory and released immediately —
|
||||||
|
**never written to disk**.
|
||||||
|
3. **Package + send** — audio is chunked and sent to the backend, dual-channel
|
||||||
|
(`mic_file` + `system_file`) when the system track is healthy, else a mono mix.
|
||||||
|
The visual timeline rides along as naming hints. Backend calls are sequential
|
||||||
|
(one in flight) to respect the single-GPU backend.
|
||||||
|
4. **Transcribe + name** — the backend diarizes (Sortformer/TitaNet) and an LLM
|
||||||
|
(Qwen3, via an OpenAI-compatible endpoint) assigns names, helped by the visual
|
||||||
|
hints and your stored voiceprints.
|
||||||
|
5. **Reconcile + recap** — the app reconciles speaker hints, then writes a readable
|
||||||
|
`transcript.md` and an HTML `recap.html`. A built-in speaker editor lets you fix
|
||||||
|
names after the fact.
|
||||||
|
|
||||||
|
**You** are identified by the mic channel plus the single name in *Settings → Your
|
||||||
|
name* — that name is reserved so the LLM never assigns it to anyone else. (There's
|
||||||
|
no per-platform display-name matching; your Zoom/Meet/Signal names can all differ.)
|
||||||
|
|
||||||
## One-time setup
|
## One-time setup
|
||||||
|
|
||||||
1. **Install Xcode** from the Mac App Store (free; ~40 GB). Open it once and
|
1. **Install Xcode** from the Mac App Store (free; large download). Open it once and
|
||||||
accept the license prompt.
|
accept the license prompt.
|
||||||
2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
|
2. **Install XcodeGen** (generates the Xcode project from `project.yml`):
|
||||||
```sh
|
```sh
|
||||||
brew install xcodegen
|
brew install xcodegen
|
||||||
```
|
```
|
||||||
3. **Set your signing team.** The Apple Team ID is kept out of source in a
|
3. **Set your signing team.** The Apple Team ID is kept out of source in a gitignored
|
||||||
gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
|
`Config/Signing.xcconfig`. Copy the template and set your team:
|
||||||
```sh
|
```sh
|
||||||
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
|
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
|
||||||
```
|
```
|
||||||
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
|
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
|
||||||
team automatically — no manual selection. Keep the value stable so macOS
|
team automatically. Keep the value stable so macOS preserves the app's permission
|
||||||
preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
|
(TCC) grants across rebuilds. Edit the xcconfig, not Xcode — `xcodegen generate`
|
||||||
not Xcode — `xcodegen generate` overwrites Xcode-side changes.
|
overwrites Xcode-side changes.
|
||||||
4. **Generate the project:**
|
4. **Generate the project** (re-run any time you add/remove/rename a source file):
|
||||||
```sh
|
```sh
|
||||||
xcodegen generate
|
xcodegen generate
|
||||||
```
|
```
|
||||||
This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
|
This creates `Ten31Transcripts.xcodeproj` (gitignored — regenerate, don't edit).
|
||||||
5. **Open it:**
|
|
||||||
```sh
|
|
||||||
open Ten31Transcripts.xcodeproj
|
|
||||||
```
|
|
||||||
6. Press **Run** (⌘R).
|
|
||||||
|
|
||||||
> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
|
## Build & run
|
||||||
> Xcode reload the project. The signing team persists because it lives in
|
|
||||||
> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
|
|
||||||
> rebuilds.
|
|
||||||
|
|
||||||
## What Phase 0 does
|
The simplest path is to open `Ten31Transcripts.xcodeproj` and press **Run** (⌘R).
|
||||||
|
|
||||||
- Launches as a menu-bar-only app (no Dock icon).
|
To build a standalone app and install it (Xcode doesn't need to stay open) — note the
|
||||||
- Menu panel shows live status for the three permissions it needs — **Microphone**,
|
`DEVELOPER_DIR` prefix: full Xcode lives at `/Applications/Xcode.app` but
|
||||||
**Screen Recording**, **Accessibility** — with Grant / Open Settings buttons.
|
`xcode-select` may point at the Command Line Tools, so set it on **every**
|
||||||
- Shows a **backend health check** (`GET /api/status`) against the configured host.
|
`xcodebuild`:
|
||||||
- **Settings:** backend base URL, skip-TLS toggle (on by default for the
|
|
||||||
self-signed cert), output folder, and adapter toggles (inert this phase).
|
|
||||||
|
|
||||||
No audio capture, call detection, screen reading, or backend hand-off yet — those
|
```sh
|
||||||
arrive in Phases 1–6 (`docs/04_BUILD_PLAN.md`).
|
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
|
||||||
|
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||||
|
-configuration Release -derivedDataPath /tmp/ten31-release build
|
||||||
|
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
|
||||||
|
open /Applications/Ten31Transcripts.app
|
||||||
|
```
|
||||||
|
|
||||||
|
The installed copy does **not** auto-update — rebuild and `ditto` again after changes.
|
||||||
|
|
||||||
|
Run the test suite:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
|
||||||
|
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||||
|
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
|
||||||
|
```
|
||||||
|
|
||||||
|
## Permissions
|
||||||
|
|
||||||
|
The menu panel shows live status for the three permissions the app needs, each with
|
||||||
|
Grant / Open Settings buttons:
|
||||||
|
|
||||||
|
- **Microphone** — to record your side of the call.
|
||||||
|
- **Screen Recording** — to capture system audio and watch the call window.
|
||||||
|
- **Accessibility** — to read window/participant information.
|
||||||
|
|
||||||
|
## Backend setup
|
||||||
|
|
||||||
|
Point the app at your SparkControl backend in **Settings → SparkControl backend**.
|
||||||
|
The resolution order is: the value saved in Settings (UserDefaults) wins, else the
|
||||||
|
`SPARK_BACKEND_URL` env var, else a neutral placeholder default. The committed
|
||||||
|
default is only a placeholder (`https://your-spark-backend.local`) — your real LAN
|
||||||
|
URL lives in Settings and never touches source.
|
||||||
|
|
||||||
|
The backend sits behind a Start9 self-signed Root CA. The supported path is to
|
||||||
|
**install the StartOS Root CA in your System keychain**, after which normal TLS
|
||||||
|
validation succeeds. *Skip TLS verification* is an opt-in escape hatch, **off by
|
||||||
|
default** and **scoped to the configured backend host** — it never becomes
|
||||||
|
"trust any server."
|
||||||
|
|
||||||
|
## Output
|
||||||
|
|
||||||
|
Each session writes to `~/Ten31Transcripts/sessions/<timestamp>_<app>/` (configurable
|
||||||
|
in Settings):
|
||||||
|
|
||||||
|
```
|
||||||
|
mic.wav system.wav mixed_mono_16k.wav # audio (dual-track + mono mix)
|
||||||
|
self_vad.json visual_timeline.json # self voice-activity + visual hints
|
||||||
|
speakers.json cluster_fingerprints.json # reconciled speakers + voiceprints
|
||||||
|
transcript.md recap.html recap.json # final outputs
|
||||||
|
```
|
||||||
|
|
||||||
## Project layout
|
## Project layout
|
||||||
|
|
||||||
```
|
```
|
||||||
project.yml # XcodeGen recipe → generates the .xcodeproj
|
project.yml # XcodeGen recipe → generates the .xcodeproj
|
||||||
Ten31Transcripts/
|
Ten31Transcripts/
|
||||||
App/ Ten31TranscriptsApp.swift, AppDelegate.swift
|
App/ @main entry + AppDelegate
|
||||||
UI/ MenuBarView, SettingsView, PermissionRow
|
Detection/ CallDetector — which app is in a call
|
||||||
Permissions/PermissionsManager.swift
|
Audio/ dual-track capture, mixing, resampling, self-VAD
|
||||||
Backend/ SparkControlHealth.swift, InsecureTrustDelegate.swift
|
Visual/ ScreenCaptureKit capture + grid analysis → speaker timeline
|
||||||
Settings/ AppSettings.swift
|
Adapters/ per-app screen-readers (Meet, Zoom, Teams, Signal) + registry
|
||||||
Support/ Info.plist, Ten31Transcripts.entitlements
|
Session/ SessionController state machine, packaging, reconciliation
|
||||||
Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
|
Backend/ SparkControl + LLM clients, voiceprint store, TLS handling
|
||||||
|
Recap/ transcript.md + recap.html rendering, speaker editor
|
||||||
|
Permissions/ Settings/ UI/ Support/ (permissions, AppSettings, views, Info.plist)
|
||||||
|
Ten31TranscriptsTests/ # XCTest — pure logic (chunking, reconciliation, analyzer math)
|
||||||
|
docs/ # architecture & data-contract design notes
|
||||||
```
|
```
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
|
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
|
||||||
LAN-only tool that must observe other apps. Revisit only if distributing.
|
LAN-only tool that must observe other apps. Revisit only if distributing.
|
||||||
- The backend host is a private LAN address — set it in **Settings**, or seed it
|
- **Privacy:** video frames are never written to disk; recordings, transcripts, and
|
||||||
from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
|
screenshots are gitignored and never committed.
|
||||||
placeholder (`https://your-spark-backend.local`).
|
- `AGENTS.md` is the canonical reference for build commands, conventions, and current
|
||||||
|
state; `ROADMAP.md` holds the backlog; `docs/` holds the architecture and
|
||||||
|
data-contract design notes.
|
||||||
|
|||||||
@@ -10,6 +10,9 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
|
|||||||
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
|
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
|
||||||
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
|
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
|
||||||
|
|
||||||
|
## Platform support
|
||||||
|
- Jitsi: add call detection + a `JitsiAdapter` (Jitsi Meet is browser-based like Google Meet — needs `CallDetector` title recognition, an adapter for participant-name reading, and active-speaker visual cues). New platform alongside Meet/Zoom/Teams/Signal.
|
||||||
|
|
||||||
## Audio / speakers
|
## Audio / speakers
|
||||||
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
|
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
|
||||||
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
|
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
|
||||||
@@ -22,5 +25,10 @@ Longer-term backlog and deferred decisions. Near-term status + the next few step
|
|||||||
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
|
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
|
||||||
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
|
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
|
||||||
|
|
||||||
|
## Quality / debt (from the 2026-06-13 independent eval — full queue + evidence in `EVALUATION.md`)
|
||||||
|
- Guard `RecapAnalyzer.mmss()` (`:137`) against NaN/∞ — a malformed backend `duration` aborts the app at recap render (eval P2). Cheap; fold into the next backend change.
|
||||||
|
- Add `SessionController` state-machine tests (`pendingAutoStop`, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3).
|
||||||
|
- Smaller P3s in `EVALUATION.md`: whether to actually emit the `manifest.json` per-file `sha256` (now documented as not-emitted in `docs/03` §2); unauthenticated LAN backend (consider a bearer token).
|
||||||
|
|
||||||
## Deferred decisions
|
## Deferred decisions
|
||||||
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
|
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
|
||||||
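Review note on the `RecapAnalyzer.mmss()` backlog item in the hunk above: a minimal sketch of the kind of guard it asks for. The real signature and body of `mmss()` are not shown in this diff, so the function below is a hypothetical stand-in. The `"--:--"` fallback and the clamp ceiling are assumptions chosen for illustration. The sketch relies on the fact that converting a NaN or infinite `Double` to `Int` traps at runtime, which would be consistent with the abort the item describes.

```swift
import Foundation

/// Hypothetical stand-in for `RecapAnalyzer.mmss()`; the real signature is not shown in this diff.
func mmss(_ seconds: Double) -> String {
    // NaN and ±infinity cannot be converted to Int (the conversion traps), so reject them first.
    // Negative durations are treated as malformed too (an assumption).
    guard seconds.isFinite, seconds >= 0 else { return "--:--" }
    // Clamp to a generous ceiling so a huge but finite value cannot overflow Int either.
    let total = Int(min(seconds, 359_999).rounded(.down))
    return String(format: "%02d:%02d", total / 60, total % 60)
}
```

With this shape a malformed backend `duration` renders as a placeholder instead of taking the recap render down, and well-formed values format as before.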
|
|||||||
@@ -3,9 +3,8 @@ import SwiftUI
|
|||||||
/// Menu-bar-only app entry point.
|
/// Menu-bar-only app entry point.
|
||||||
///
|
///
|
||||||
/// `LSUIElement` (set in Info.plist) keeps the app out of the Dock; the
|
/// `LSUIElement` (set in Info.plist) keeps the app out of the Dock; the
|
||||||
/// `MenuBarExtra` scene provides the status-bar item and its panel. Phase 0 only
|
/// `MenuBarExtra` scene provides the status-bar item and its panel, which wires
|
||||||
/// wires up permissions, settings, and a backend health check — no audio,
|
/// up permissions, settings, recording control, and the backend health check.
|
||||||
/// capture, or call detection yet.
|
|
||||||
@main
|
@main
|
||||||
struct Ten31TranscriptsApp: App {
|
struct Ten31TranscriptsApp: App {
|
||||||
@NSApplicationDelegateAdaptor(AppDelegate.self) private var appDelegate
|
@NSApplicationDelegateAdaptor(AppDelegate.self) private var appDelegate
|
||||||
|
|||||||
@@ -14,7 +14,7 @@ struct RecordingResult {
|
|||||||
let systemNote: String?
|
let systemNote: String?
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Dual-track local audio capture for Phase 1.
|
/// Dual-track local audio capture.
|
||||||
///
|
///
|
||||||
/// - System audio via `SCStream` (`capturesAudio`); its audio handler runs on
|
/// - System audio via `SCStream` (`capturesAudio`); its audio handler runs on
|
||||||
/// `ioQueue`. A discard-only video output runs on `screenQueue` purely to keep
|
/// `ioQueue`. A discard-only video output runs on `screenQueue` purely to keep
|
||||||
|
|||||||
@@ -13,8 +13,8 @@ struct VADSpan: Equatable {
|
|||||||
/// internal sample cursor always equals the mic file position, and span times
|
/// internal sample cursor always equals the mic file position, and span times
|
||||||
/// land on the same instants as `mixed_mono_16k.wav`.
|
/// land on the same instants as `mixed_mono_16k.wav`.
|
||||||
///
|
///
|
||||||
/// Phase 3's `TimelineBuilder` will fold these in as high-confidence pre-seeded
|
/// `TimelineBuilder` folds these in as high-confidence pre-seeded "self"
|
||||||
/// "self" segments. Thresholds are intentionally simple and will be tuned later.
|
/// segments. Thresholds are intentionally simple.
|
||||||
///
|
///
|
||||||
/// Single-threaded: all calls happen on `AudioRecorder.ioQueue`.
|
/// Single-threaded: all calls happen on `AudioRecorder.ioQueue`.
|
||||||
final class MicVAD {
|
final class MicVAD {
|
||||||
|
|||||||
@@ -33,7 +33,9 @@ final class GatewayLLMClient {
|
|||||||
config.timeoutIntervalForRequest = 600
|
config.timeoutIntervalForRequest = 600
|
||||||
config.timeoutIntervalForResource = 900
|
config.timeoutIntervalForResource = 900
|
||||||
config.waitsForConnectivity = false
|
config.waitsForConnectivity = false
|
||||||
let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil
|
let delegate: URLSessionDelegate? = skipTLS
|
||||||
|
? InsecureTrustDelegate(allowedHost: URL(string: self.baseURL)?.host)
|
||||||
|
: nil
|
||||||
self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
|
self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -1,19 +1,42 @@
|
|||||||
import Foundation
|
import Foundation
|
||||||
|
|
||||||
/// URLSession delegate that trusts the server certificate without validation.
|
/// URLSession delegate that bypasses certificate validation for **one host only**
|
||||||
|
/// — the configured SparkControl backend.
|
||||||
///
|
///
|
||||||
/// SparkControl sits behind a Start9 self-signed Root CA on the LAN, so default
|
/// SparkControl sits behind a Start9 self-signed Root CA on the LAN. The supported
|
||||||
/// trust evaluation rejects it. This delegate is used **only** when the
|
/// path is to install that CA in the System keychain; default trust evaluation then
|
||||||
/// "Skip TLS verification" setting is on. It trusts any server certificate —
|
/// succeeds and this delegate is never used. It exists only as an opt-in escape
|
||||||
/// acceptable for a personal tool on a trusted local network and nothing else.
|
/// hatch (the "Skip TLS verification" setting, off by default) for a machine where
|
||||||
|
/// the CA isn't installed. Even then it trusts a certificate only when the challenge
|
||||||
|
/// host equals `allowedHost` — a server-trust challenge from any other host falls
|
||||||
|
/// back to default validation, so the bypass can never become "trust any server".
|
||||||
final class InsecureTrustDelegate: NSObject, URLSessionDelegate {
|
final class InsecureTrustDelegate: NSObject, URLSessionDelegate {
|
||||||
|
/// The single host the bypass is scoped to (the configured backend host). When
|
||||||
|
/// nil — only reachable via a malformed base URL — the gate never fires and every
|
||||||
|
/// challenge falls back to default validation: the safe degenerate case.
|
||||||
|
private let allowedHost: String?
|
||||||
|
|
||||||
|
init(allowedHost: String?) {
|
||||||
|
self.allowedHost = allowedHost
|
||||||
|
}
|
||||||
|
|
||||||
|
/// The security gate: the trust override may fire only for a server-trust
|
||||||
|
/// challenge whose host matches `allowedHost`. Pure and synchronous so the
|
||||||
|
/// host-scoping can be unit-tested without fabricating a `SecTrust`; the
|
||||||
|
/// credential itself is built only when this is true *and* a serverTrust exists.
|
||||||
|
func allowsTrustOverride(for space: URLProtectionSpace) -> Bool {
|
||||||
|
guard let allowedHost else { return false }
|
||||||
|
return space.authenticationMethod == NSURLAuthenticationMethodServerTrust
|
||||||
|
&& space.host == allowedHost
|
||||||
|
}
|
||||||
|
|
||||||
func urlSession(
|
func urlSession(
|
||||||
_ session: URLSession,
|
_ session: URLSession,
|
||||||
didReceive challenge: URLAuthenticationChallenge,
|
didReceive challenge: URLAuthenticationChallenge,
|
||||||
completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void
|
completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void
|
||||||
) {
|
) {
|
||||||
guard
|
guard
|
||||||
challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
|
allowsTrustOverride(for: challenge.protectionSpace),
|
||||||
let serverTrust = challenge.protectionSpace.serverTrust
|
let serverTrust = challenge.protectionSpace.serverTrust
|
||||||
else {
|
else {
|
||||||
completionHandler(.performDefaultHandling, nil)
|
completionHandler(.performDefaultHandling, nil)
|
||||||
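Review note on the host scoping in the hunk above: a small standalone sketch of how the `allowedHost` derived at the call sites (`URL(string: baseURL)?.host`) interacts with the gate. The free function below mirrors `allowsTrustOverride(for:)` so it can run outside the class. The host names, port, and the `serverTrustSpace` helper are made up for illustration.

```swift
import Foundation

/// Mirrors the gate from `InsecureTrustDelegate` as a free function so it runs standalone.
func allowsTrustOverride(for space: URLProtectionSpace, allowedHost: String?) -> Bool {
    // A nil host (e.g. from a base URL that failed to parse) never opens the gate.
    guard let allowedHost else { return false }
    return space.authenticationMethod == NSURLAuthenticationMethodServerTrust
        && space.host == allowedHost
}

/// Hypothetical helper: builds a server-trust protection space for a given host.
func serverTrustSpace(host: String) -> URLProtectionSpace {
    URLProtectionSpace(host: host, port: 443, protocol: "https",
                       realm: nil, authenticationMethod: NSURLAuthenticationMethodServerTrust)
}

// The call sites derive the allowed host from the configured base URL.
let configuredHost = URL(string: "https://spark.example.lan:62419/api")?.host
// An empty base URL does not parse, so the derived host is nil.
let brokenHost = URL(string: "")?.host
```

The comparison is an exact string match, so only a challenge whose host equals the configured one reaches the override. A nil `allowedHost` falls through to default validation, matching the "safe degenerate case" described in the doc comment.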
|
|||||||
@@ -82,7 +82,9 @@ final class SparkControlClient {
|
|||||||
config.timeoutIntervalForRequest = 600 // diarization can take up to ~600s
|
config.timeoutIntervalForRequest = 600 // diarization can take up to ~600s
|
||||||
config.timeoutIntervalForResource = 900
|
config.timeoutIntervalForResource = 900
|
||||||
config.waitsForConnectivity = false
|
config.waitsForConnectivity = false
|
||||||
let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil
|
let delegate: URLSessionDelegate? = skipTLS
|
||||||
|
? InsecureTrustDelegate(allowedHost: URL(string: self.baseURL)?.host)
|
||||||
|
: nil
|
||||||
self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
|
self.urlSession = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
import Foundation
|
import Foundation
|
||||||
import Combine
|
import Combine
|
||||||
|
|
||||||
/// Performs the Phase 0 backend reachability check: `GET {baseURL}/api/status`.
|
/// Performs the backend reachability check: `GET {baseURL}/api/status`.
|
||||||
///
|
///
|
||||||
/// This is a thin slice — the full `SparkControlClient` (label-merge, multipart,
|
/// This is a thin slice; the full upload path (label-merge, multipart, sequential
|
||||||
/// sequential queueing, retries) arrives in Phase 5.
|
/// queueing, retries) lives in `SparkControlClient`.
|
||||||
@MainActor
|
@MainActor
|
||||||
final class SparkControlHealth: ObservableObject {
|
final class SparkControlHealth: ObservableObject {
|
||||||
|
|
||||||
@@ -32,7 +32,9 @@ final class SparkControlHealth: ObservableObject {
|
|||||||
config.timeoutIntervalForRequest = 8
|
config.timeoutIntervalForRequest = 8
|
||||||
config.waitsForConnectivity = false
|
config.waitsForConnectivity = false
|
||||||
|
|
||||||
let delegate: URLSessionDelegate? = skipTLS ? InsecureTrustDelegate() : nil
|
let delegate: URLSessionDelegate? = skipTLS
|
||||||
|
? InsecureTrustDelegate(allowedHost: url.host)
|
||||||
|
: nil
|
||||||
let session = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
|
let session = URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
|
||||||
defer { session.finishTasksAndInvalidate() }
|
defer { session.finishTasksAndInvalidate() }
|
||||||
|
|
||||||
|
|||||||
@@ -99,6 +99,11 @@ final class SessionController: ObservableObject {
|
|||||||
/// Bumped each time a start/stop Task is spawned (Task is a value type, so this
|
/// Bumped each time a start/stop Task is spawned (Task is a value type, so this
|
||||||
/// is how `prepareForTermination` detects a newly-spawned transition).
|
/// is how `prepareForTermination` detects a newly-spawned transition).
|
||||||
private var lifecycleGeneration = 0
|
private var lifecycleGeneration = 0
|
||||||
|
/// The meeting-name prompt currently on screen, if any, so a quit can end it
|
||||||
|
/// instead of blocking termination on user input (set in `askMeetingName`).
|
||||||
|
private weak var activeNamingAlert: NSAlert?
|
||||||
|
/// Set once `prepareForTermination` begins, so we skip the post-stop naming prompt.
|
||||||
|
private var isTerminating = false
|
||||||
|
|
||||||
init(settings: AppSettings) {
|
init(settings: AppSettings) {
|
||||||
self.settings = settings
|
self.settings = settings
|
||||||
@@ -324,6 +329,9 @@ final class SessionController: ObservableObject {
|
|||||||
lifecycleTask = Task {
|
lifecycleTask = Task {
|
||||||
let result = await recorder.stop()
|
let result = await recorder.stop()
|
||||||
let visual = await self.stopVisualAndTimeline(result, folder: folder)
|
let visual = await self.stopVisualAndTimeline(result, folder: folder)
|
||||||
|
// Interactive stop only: ask for a meeting name and give the folder a
|
||||||
|
// readable name before `finish()` captures it for backend processing.
|
||||||
|
self.promptMeetingNameAndRename()
|
||||||
self.finish(result, timeline: visual.timeline, selfSpans: visual.selfSpans, visualRan: visual.visualRan)
|
self.finish(result, timeline: visual.timeline, selfSpans: visual.selfSpans, visualRan: visual.visualRan)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -338,13 +346,18 @@ final class SessionController: ObservableObject {
|
|||||||
if let folder = currentFolder {
|
if let folder = currentFolder {
|
||||||
writeSelfSpans(spans: selfSpans, result: result, to: folder)
|
writeSelfSpans(spans: selfSpans, result: result, to: folder)
|
||||||
let visualCount = visualRan ? timeline.count : nil // `timeline` is the remote vision segments
|
let visualCount = visualRan ? timeline.count : nil // `timeline` is the remote vision segments
|
||||||
|
// Re-derive the track URLs from `folder`: a meeting-name rename may have
|
||||||
|
// moved the session after `result` captured its original paths.
|
||||||
|
let micURL = folder.appendingPathComponent("mic.wav")
|
||||||
|
let systemURL = folder.appendingPathComponent("system.wav")
|
||||||
|
let mixedURL = folder.appendingPathComponent("mixed_mono_16k.wav")
|
||||||
lastSession = SessionInfo(
|
lastSession = SessionInfo(
|
||||||
folder: folder, mixedURL: result.mixedURL,
|
folder: folder, mixedURL: mixedURL,
|
||||||
duration: result.duration, selfSpanCount: selfSpans.count,
|
duration: result.duration, selfSpanCount: selfSpans.count,
|
||||||
visualSegmentCount: visualCount)
|
visualSegmentCount: visualCount)
|
||||||
lastProcess = ProcessInputs(
|
lastProcess = ProcessInputs(
|
||||||
folder: folder, sessionId: folder.lastPathComponent, app: currentLabel,
|
folder: folder, sessionId: folder.lastPathComponent, app: currentLabel,
|
||||||
micURL: result.micURL, systemURL: result.systemURL, mixedURL: result.mixedURL,
|
micURL: micURL, systemURL: systemURL, mixedURL: mixedURL,
|
||||||
timeline: timeline, selfSpans: selfSpans, selfName: settings.selfName,
|
timeline: timeline, selfSpans: selfSpans, selfName: settings.selfName,
|
||||||
systemHealthy: result.systemNote == nil)
|
systemHealthy: result.systemNote == nil)
|
||||||
}
|
}
|
||||||
@@ -419,24 +432,13 @@ final class SessionController: ObservableObject {
|
|||||||
guard settings.recapEnabled, !resolved.segments.isEmpty else { return }
|
guard settings.recapEnabled, !resolved.segments.isEmpty else { return }
|
||||||
let analyzer = RecapAnalyzer(llm: llm, model: model)
|
let analyzer = RecapAnalyzer(llm: llm, model: model)
|
||||||
guard let result = try? await analyzer.recap(file: resolved, template: settings.defaultTemplate) else { return }
|
guard let result = try? await analyzer.recap(file: resolved, template: settings.defaultTemplate) else { return }
|
||||||
let title = Self.recapTitle(app: inputs.app, sessionId: inputs.sessionId)
|
let title = SessionNaming.recapTitle(app: inputs.app, sessionId: inputs.sessionId)
|
||||||
try? RecapRenderer.write(file: resolved, result: result, title: title, to: inputs.folder)
|
try? RecapRenderer.write(file: resolved, result: result, title: title, to: inputs.folder)
|
||||||
try? RecapFile(title: title, result: result).write(to: inputs.folder.appendingPathComponent("recap.json"))
|
try? RecapFile(title: title, result: result).write(to: inputs.folder.appendingPathComponent("recap.json"))
|
||||||
let url = inputs.folder.appendingPathComponent("recap.html")
|
let url = inputs.folder.appendingPathComponent("recap.html")
|
||||||
if FileManager.default.fileExists(atPath: url.path) { self.recapURL = url }
|
if FileManager.default.fileExists(atPath: url.path) { self.recapURL = url }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Friendly recap title, e.g. "Google Meet call — 2026-06-06 11:43".
|
|
||||||
private static func recapTitle(app: String, sessionId: String) -> String {
|
|
||||||
let appName = CallDetector.DetectedApp(rawValue: app)?.display ?? app.capitalized
|
|
||||||
let stamp = sessionId.split(separator: "_").first.map(String.init) ?? sessionId
|
|
||||||
let parts = stamp.split(separator: "T")
|
|
||||||
let date = parts.first.map(String.init) ?? ""
|
|
||||||
let timeBits = parts.count > 1 ? parts[1].split(separator: "-") : []
|
|
||||||
let time = timeBits.count >= 2 ? "\(timeBits[0]):\(timeBits[1])" : ""
|
|
||||||
return "\(appName) call — \(date) \(time)".trimmingCharacters(in: .whitespaces)
|
|
||||||
}
|
|
||||||
|
|
||||||
// MARK: - Speaker corrections
|
// MARK: - Speaker corrections
|
||||||
|
|
||||||
/// True once the last session has a transcribed `speakers.json` to correct.
|
/// True once the last session has a transcribed `speakers.json` to correct.
|
||||||
@@ -584,6 +586,11 @@ final class SessionController: ObservableObject {
|
|||||||
/// its WAV headers are finalized before the process exits. Handles quit while
|
/// its WAV headers are finalized before the process exits. Handles quit while
|
||||||
/// `.starting` and `.finishing`, not just `.recording`.
|
/// `.starting` and `.finishing`, not just `.recording`.
|
||||||
func prepareForTermination() async {
|
func prepareForTermination() async {
|
||||||
|
isTerminating = true
|
||||||
|
// If the meeting-name prompt is open, end its modal loop so quit isn't blocked
|
||||||
|
// waiting on the user — the session keeps its auto timestamped name. (Falls
|
||||||
|
// back to the user answering the on-screen dialog if the abort isn't serviced.)
|
||||||
|
if activeNamingAlert != nil { NSApp.abortModal() }
|
||||||
// Cancel any in-flight backend transcription (audio is already saved; the
|
// Cancel any in-flight backend transcription (audio is already saved; the
|
||||||
// user can resend). The pipeline's checkCancellation + defer clean up chunks.
|
// user can resend). The pipeline's checkCancellation + defer clean up chunks.
|
||||||
processTask?.cancel()
|
processTask?.cancel()
|
||||||
@@ -649,6 +656,59 @@ final class SessionController: ObservableObject {
|
|||||||
return f.string(from: Date())
|
return f.string(from: Date())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Ask the user to name the just-finished recording, then rename its folder to
|
||||||
|
/// a readable `<date>_<name>_<app>` (dropping the HH-MM-SS auto stamp). Skipping
|
||||||
|
/// or leaving it blank keeps the timestamped name. Must run BEFORE `finish()` so
|
||||||
|
/// the renamed folder is what flows to backend processing. The recorder and
|
||||||
|
/// visual capture have both finished by now, so every session file is closed and
|
||||||
|
/// the move is safe. Never called from the quit path — we don't block a quit on
|
||||||
|
/// a prompt.
|
||||||
|
private func promptMeetingNameAndRename() {
|
||||||
|
// A quit can begin while we're finishing — don't put a blocking prompt in its
|
||||||
|
// way; keep the auto timestamped name and let termination drain.
|
||||||
|
guard !isTerminating, let folder = currentFolder,
|
||||||
|
let name = askMeetingName() else { return } // nil = skipped / blank
|
||||||
|
let base = folder.deletingLastPathComponent()
|
||||||
|
let date = SessionNaming.datePrefix(ofSessionNamed: folder.lastPathComponent)
|
||||||
|
let fm = FileManager.default
|
||||||
|
var counter = 0
|
||||||
|
while counter < 100 {
|
||||||
|
guard let leaf = SessionNaming.renamedLeaf(
|
||||||
|
date: date, app: currentLabel, meetingName: name, counter: counter) else { return }
|
||||||
|
let target = base.appendingPathComponent(leaf, isDirectory: true)
|
||||||
|
if fm.fileExists(atPath: target.path) { counter += 1; continue } // disambiguate
|
||||||
|
do {
|
||||||
|
try fm.moveItem(at: folder, to: target)
|
||||||
|
currentFolder = target
|
||||||
|
} catch {
|
||||||
|
NSLog("Session rename to “\(leaf)” failed: \(error.localizedDescription)") // keep the original folder
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
NSLog("Session rename: kept “\(folder.lastPathComponent)” — 100 name collisions")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Modal prompt for a meeting name. Registers the alert so `prepareForTermination`
|
||||||
|
/// can end it on quit. Returns the trimmed name, or nil if the user skipped, left
|
||||||
|
/// it empty, or a quit aborted the prompt (caller keeps the auto folder name).
|
||||||
|
private func askMeetingName() -> String? {
|
||||||
|
let alert = NSAlert()
|
||||||
|
alert.messageText = "Name this recording"
|
||||||
|
alert.informativeText = "Give the meeting a name so its folder is easy to find in your sessions. Leave blank to keep the timestamped name."
|
||||||
|
alert.addButton(withTitle: "Save") // .alertFirstButtonReturn
|
||||||
|
alert.addButton(withTitle: "Skip") // .alertSecondButtonReturn
|
||||||
|
let field = NSTextField(frame: NSRect(x: 0, y: 0, width: 240, height: 24))
|
||||||
|
field.placeholderString = "Meeting name"
|
||||||
|
alert.accessoryView = field
|
||||||
|
alert.window.initialFirstResponder = field
|
||||||
|
NSApp.activate(ignoringOtherApps: true)
|
||||||
|
activeNamingAlert = alert
|
||||||
|
defer { activeNamingAlert = nil }
|
||||||
|
guard alert.runModal() == .alertFirstButtonReturn else { return nil }
|
||||||
|
let text = field.stringValue.trimmingCharacters(in: .whitespacesAndNewlines)
|
||||||
|
return text.isEmpty ? nil : text
|
||||||
|
}
|
||||||
|
|
||||||
/// Debug artifact: the channel-verified "self" spans actually sent to the backend
|
/// Debug artifact: the channel-verified "self" spans actually sent to the backend
|
||||||
/// as `self_vad` (mic active AND louder than system). Lets us eyeball self detection.
|
/// as `self_vad` (mic active AND louder than system). Lets us eyeball self detection.
|
||||||
private func writeSelfSpans(spans: [VADSpan], result: RecordingResult, to folder: URL) {
|
private func writeSelfSpans(spans: [VADSpan], result: RecordingResult, to folder: URL) {
|
||||||
|
|||||||
@@ -0,0 +1,71 @@
|
|||||||
|
import Foundation
|
||||||
|
|
||||||
|
/// Pure helpers for session-folder names. A session folder is created at start
|
||||||
|
/// with an auto name `<yyyy-MM-dd'T'HH-mm-ss>_<app>`; when the user names the
|
||||||
|
/// recording on stop it's renamed to `<yyyy-MM-dd>_<name>_<app>` (no HH-MM-SS),
|
||||||
|
/// which is far easier to scan in `sessions/`. The app label always stays the
|
||||||
|
/// LAST `_`-separated segment so `SessionController.appLabel(from:)` keeps working
|
||||||
|
/// even when the meeting name itself contains spaces or underscores.
|
||||||
|
enum SessionNaming {
|
||||||
|
/// Filesystem- and parse-safe meeting name: trims, turns path separators into
|
||||||
|
/// dashes, drops control characters, collapses whitespace runs, removes leading
|
||||||
|
/// dots (no hidden/`.`/`..` folders), and caps the length. Returns "" if nothing
|
||||||
|
/// usable is left, which callers treat as "skip the rename".
|
||||||
|
static func sanitize(_ raw: String) -> String {
|
||||||
|
var s = raw.trimmingCharacters(in: .whitespacesAndNewlines)
|
||||||
|
// Path-hostile separators (`/` and the classic Mac `:`, plus `\`) → dash.
|
||||||
|
s = s.components(separatedBy: CharacterSet(charactersIn: "/:\\")).joined(separator: "-")
|
||||||
|
// Strip control characters outright.
|
||||||
|
s = s.components(separatedBy: .controlCharacters).joined()
|
||||||
|
// Collapse internal whitespace runs to single spaces.
|
||||||
|
s = s.split(whereSeparator: { $0 == " " || $0 == "\t" }).joined(separator: " ")
|
||||||
|
while s.hasPrefix(".") { s.removeFirst() }
|
||||||
|
s = s.trimmingCharacters(in: .whitespaces)
|
||||||
|
if s.count > 60 { s = String(s.prefix(60)).trimmingCharacters(in: .whitespaces) }
|
||||||
|
return s
|
||||||
|
}
|
||||||
|
|
||||||
|
/// The date prefix of a session leaf name, e.g. `2026-06-17T09-59-48_signal`
|
||||||
|
/// → `2026-06-17`. Already-renamed leaves (`2026-06-17_name_signal`) return the
|
||||||
|
/// same date, so this is safe to call on either form.
|
||||||
|
static func datePrefix(ofSessionNamed leaf: String) -> String {
|
||||||
|
let head = leaf.split(separator: "_").first.map(String.init) ?? leaf
|
||||||
|
return head.split(separator: "T").first.map(String.init) ?? head
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Compose the renamed leaf `<date>_<name>_<app>`. A positive `counter`
|
||||||
|
/// disambiguates a collision by suffixing the NAME segment (`<name>-2`) so the
|
||||||
|
/// trailing `_<app>` stays parseable. Returns nil when the name sanitizes to
|
||||||
|
/// empty (the caller keeps the auto timestamped name).
|
||||||
|
static func renamedLeaf(date: String, app: String, meetingName: String, counter: Int = 0) -> String? {
|
||||||
|
let clean = sanitize(meetingName)
|
||||||
|
guard !clean.isEmpty else { return nil }
|
||||||
|
let suffix = counter > 0 ? "-\(counter + 1)" : ""
|
||||||
|
return "\(date)_\(clean)\(suffix)_\(app)"
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Friendly recap title from a session id, understanding both folder forms:
|
||||||
|
/// `2026-06-06T11-43-02_meet` → "Google Meet call — 2026-06-06 11:43"
|
||||||
|
/// `2026-06-06_Weekly sync_meet` → "Weekly sync — Google Meet (2026-06-06)"
|
||||||
|
static func recapTitle(app: String, sessionId: String) -> String {
|
||||||
|
let appName = CallDetector.DetectedApp(rawValue: app)?.display ?? app.capitalized
|
||||||
|
var parts = sessionId.split(separator: "_").map(String.init)
|
||||||
|
if parts.count > 1 { parts.removeLast() } // drop the trailing "_<app>"
|
||||||
|
let head = parts.first ?? sessionId
|
||||||
|
let tBits = head.split(separator: "T").map(String.init)
|
||||||
|
let date = tBits.first ?? head
|
||||||
|
let time: String = {
|
||||||
|
guard tBits.count > 1 else { return "" }
|
||||||
|
let b = tBits[1].split(separator: "-")
|
||||||
|
return b.count >= 2 ? "\(b[0]):\(b[1])" : ""
|
||||||
|
}()
|
||||||
|
let when = [date, time].filter { !$0.isEmpty }.joined(separator: " ")
|
||||||
|
// Rejoin with "_" — the faithful inverse of split("_") — so a name that
|
||||||
|
// itself contained underscores survives the round-trip through the folder name.
|
||||||
|
let name = parts.count > 1 ? parts[1...].joined(separator: "_") : ""
|
||||||
|
if name.isEmpty {
|
||||||
|
return "\(appName) call — \(when)".trimmingCharacters(in: .whitespaces)
|
||||||
|
}
|
||||||
|
return "\(name) — \(appName) (\(when))".trimmingCharacters(in: .whitespaces)
|
||||||
|
}
|
||||||
|
}
|
||||||
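Review note on the naming scheme in `SessionNaming` above: a sketch of why the collision counter is attached to the name segment rather than the end of the leaf. The body of `SessionController.appLabel(from:)` is not part of this diff, so `appLabel(fromLeaf:)` below is a hypothetical stand-in that assumes the label is simply the last `_`-separated segment. `composeLeaf` repeats the composition from `renamedLeaf` without the sanitizing step, for brevity.

```swift
import Foundation

/// Hypothetical stand-in for `SessionController.appLabel(from:)` (body not shown in this diff):
/// assumes the app label is the last `_`-separated segment of the folder name.
func appLabel(fromLeaf leaf: String) -> String {
    leaf.split(separator: "_").last.map(String.init) ?? leaf
}

/// Same composition as `SessionNaming.renamedLeaf`, minus sanitizing.
func composeLeaf(date: String, name: String, app: String, counter: Int = 0) -> String {
    // The counter suffixes the NAME segment, so the trailing `_<app>` is untouched.
    let suffix = counter > 0 ? "-\(counter + 1)" : ""
    return "\(date)_\(name)\(suffix)_\(app)"
}

let plain = composeLeaf(date: "2026-06-17", name: "Q3_planning", app: "signal")
let collided = composeLeaf(date: "2026-06-17", name: "Q3_planning", app: "signal", counter: 1)
```

Under that assumption both the auto-stamped form and the renamed form parse to the same app label, even when the meeting name contains underscores. Had the counter been appended after the app (`..._signal-2`), the last segment would no longer be a known label.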
@@ -121,8 +121,8 @@ final class TranscriptPipeline {
|
|||||||
return assembled.speakersFile
|
return assembled.speakersFile
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Build the `label-merge` timeline from mic-VAD self spans (Phase 1/2). Once
|
/// Build the `label-merge` timeline from mic-VAD self spans; the visual
|
||||||
/// the visual adapters land (Phase 3–4), their segments are merged in too.
|
/// adapters' segments are merged in alongside these.
|
||||||
static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
|
static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
|
||||||
spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
|
spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,8 +3,8 @@ import Combine
|
|||||||
|
|
||||||
/// User-facing settings, persisted to `UserDefaults`.
|
/// User-facing settings, persisted to `UserDefaults`.
|
||||||
///
|
///
|
||||||
/// Phase 0 scope: backend host + TLS-skip, output folder, and adapter toggles.
|
/// Covers the backend host + TLS handling, output folder, your name, chunk
|
||||||
/// The adapter toggles persist but do nothing yet (adapters arrive in Phase 3–4).
|
/// length, per-app adapter toggles, and the auto-record/auto-send/recap flags.
|
||||||
@MainActor
|
@MainActor
|
||||||
final class AppSettings: ObservableObject {
|
final class AppSettings: ObservableObject {
|
||||||
|
|
||||||
@@ -106,7 +106,10 @@ final class AppSettings: ObservableObject {
|
|||||||
?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
|
?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
|
||||||
?? Self.defaultBackendURL
|
?? Self.defaultBackendURL
|
||||||
|
|
||||||
self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true
|
// Off by default: install the Start9 Root CA in the System keychain and the
|
||||||
|
// backend's cert validates normally. The bypass is an opt-in escape hatch and,
|
||||||
|
// when on, is scoped to the configured host (see `InsecureTrustDelegate`).
|
||||||
|
self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? false
|
||||||
|
|
||||||
self.outputFolderPath = defaults.string(forKey: Keys.outputFolder)
|
self.outputFolderPath = defaults.string(forKey: Keys.outputFolder)
|
||||||
?? "~/Ten31Transcripts"
|
?? "~/Ten31Transcripts"
|
||||||
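Review note on the backend URL fallback chain in the hunk above (stored value, then `SPARK_BACKEND_URL`, then the default): a sketch of the ROADMAP suggestion to treat an empty or placeholder stored URL as absent so the env var can still win. `resolveBackendURL` is a hypothetical helper, not code from this change. The placeholder string is taken from the README text and is assumed to equal `defaultBackendURL`.

```swift
import Foundation

/// Hypothetical resolver; `placeholder` is assumed to match the committed default URL.
func resolveBackendURL(stored: String?,
                       env: [String: String],
                       placeholder: String = "https://your-spark-backend.local") -> String {
    // Treat an empty or placeholder stored value as "nothing saved".
    let saved = stored?.trimmingCharacters(in: .whitespacesAndNewlines)
    let usableSaved = (saved?.isEmpty == false && saved != placeholder) ? saved : nil
    // A real saved value still wins; otherwise the env var, then the placeholder.
    return usableSaved ?? env["SPARK_BACKEND_URL"] ?? placeholder
}
```

Passing the environment in as a dictionary (for example `ProcessInfo.processInfo.environment`) keeps the precedence rule pure and easy to unit-test, in the same spirit as the pure gate in `InsecureTrustDelegate`.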
|
|||||||
@@ -30,8 +30,6 @@
|
|||||||
<string>Ten31</string>
|
<string>Ten31</string>
|
||||||
<key>NSMicrophoneUsageDescription</key>
|
<key>NSMicrophoneUsageDescription</key>
|
||||||
<string>Ten31 Transcripts records your microphone during calls to build the local audio track.</string>
|
<string>Ten31 Transcripts records your microphone during calls to build the local audio track.</string>
|
||||||
<key>NSAppleEventsUsageDescription</key>
|
|
||||||
<string>Ten31 Transcripts reads the active browser tab's URL to detect Google Meet calls.</string>
|
|
||||||
<key>NSLocalNetworkUsageDescription</key>
|
<key>NSLocalNetworkUsageDescription</key>
|
||||||
<string>Ten31 Transcripts connects to your SparkControl server on the local network.</string>
|
<string>Ten31 Transcripts connects to your SparkControl server on the local network.</string>
|
||||||
<key>NSAppTransportSecurity</key>
|
<key>NSAppTransportSecurity</key>
|
||||||
|
|||||||
@@ -173,7 +173,7 @@ struct MenuBarView: View {
|
|||||||
private var header: some View {
|
private var header: some View {
|
||||||
VStack(alignment: .leading, spacing: 2) {
|
VStack(alignment: .leading, spacing: 2) {
|
||||||
Text("Ten31 Transcripts").font(.headline)
|
Text("Ten31 Transcripts").font(.headline)
|
||||||
Text("Phase 0 · setup & status")
|
Text("Setup & status")
|
||||||
.font(.caption)
|
.font(.caption)
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -62,7 +62,7 @@ struct VisualTimeline: Codable {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// The flat array `label-merge` wants: `[{start,end,name,confidence}]`,
|
/// The flat array `label-merge` wants: `[{start,end,name,confidence}]`,
|
||||||
/// dropping `source`. Slice/rebase to chunk-local seconds happens in Phase 5.
|
/// dropping `source`. Slice/rebase to chunk-local seconds happens at chunking time.
|
||||||
func flatTimelineData() throws -> Data {
|
func flatTimelineData() throws -> Data {
|
||||||
let flat = segments.map { seg -> [String: Any] in
|
let flat = segments.map { seg -> [String: Any] in
|
||||||
["start": seg.start, "end": seg.end, "name": seg.name, "confidence": seg.confidence]
|
["start": seg.start, "end": seg.end, "name": seg.name, "confidence": seg.confidence]
|
||||||
|
|||||||
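The doc comment above says the timeline is sliced and rebased to chunk-local seconds at chunking time. A minimal sketch of that step follows, assuming segments are clamped to the chunk window and segments with no overlap are dropped. `TimelineSegment` and `sliceAndRebase` are hypothetical names; the real chunking code is not part of this diff.

```swift
/// Hypothetical mirror of one flat timeline entry: {start, end, name, confidence}.
struct TimelineSegment: Equatable {
    var start: Double
    var end: Double
    var name: String
    var confidence: Double
}

/// Keeps only the parts of each segment that fall inside the chunk window
/// and shifts them so the chunk starts at 0 seconds.
func sliceAndRebase(_ segments: [TimelineSegment],
                    chunkStart: Double,
                    chunkEnd: Double) -> [TimelineSegment] {
    segments.compactMap { seg -> TimelineSegment? in
        // Clamp the segment to the chunk window.
        let start = max(seg.start, chunkStart)
        let end = min(seg.end, chunkEnd)
        // Drop segments that do not overlap the window at all.
        guard end > start else { return nil }
        // Rebase to chunk-local seconds.
        return TimelineSegment(start: start - chunkStart,
                               end: end - chunkStart,
                               name: seg.name,
                               confidence: seg.confidence)
    }
}
```

A segment that straddles a chunk boundary appears in both chunks under this scheme, each time clamped to the respective window.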
@@ -0,0 +1,35 @@
|
|||||||
|
import XCTest
|
||||||
|
@testable import Ten31Transcripts
|
||||||
|
|
||||||
|
/// The TLS bypass is an opt-in escape hatch scoped to the configured backend host.
|
||||||
|
/// These cover the security gate (`allowsTrustOverride`) so a regression can't widen
|
||||||
|
/// it back to "trust any server". The gate is pure, so no network or SecTrust needed.
|
||||||
|
final class InsecureTrustDelegateTests: XCTestCase {
|
||||||
|
private func space(host: String,
|
||||||
|
method: String = NSURLAuthenticationMethodServerTrust) -> URLProtectionSpace {
|
||||||
|
URLProtectionSpace(host: host, port: 62419, protocol: "https",
|
||||||
|
realm: nil, authenticationMethod: method)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testFiresForMatchingHost() {
|
||||||
|
let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
|
||||||
|
XCTAssertTrue(d.allowsTrustOverride(for: space(host: "192.0.2.1")))
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRejectsMismatchedHost() {
|
||||||
|
let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
|
||||||
|
XCTAssertFalse(d.allowsTrustOverride(for: space(host: "evil.example.com")))
|
||||||
|
}
|
||||||
|
|
||||||
|
func testNilAllowedHostNeverFires() {
|
||||||
|
let d = InsecureTrustDelegate(allowedHost: nil)
|
||||||
|
XCTAssertFalse(d.allowsTrustOverride(for: space(host: "192.0.2.1")))
|
||||||
|
}
|
||||||
|
|
||||||
|
func testOnlyServerTrustMethodFires() {
|
||||||
|
// Matching host but a non-server-trust challenge (e.g. HTTP Basic) must not override.
|
||||||
|
let d = InsecureTrustDelegate(allowedHost: "192.0.2.1")
|
||||||
|
XCTAssertFalse(d.allowsTrustOverride(
|
||||||
|
for: space(host: "192.0.2.1", method: NSURLAuthenticationMethodHTTPBasic)))
|
||||||
|
}
|
||||||
|
}
|
||||||
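For orientation, here is one way a gate with the behavior these tests pin down could look. It is a sketch under assumptions, not the actual `InsecureTrustDelegate`: the `HostScopedTrustGate` type is hypothetical, and the case-insensitive host comparison is a guess, since the tests only cover exact matches and clear mismatches.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLProtectionSpace lives here on non-Apple platforms
#endif

/// Hypothetical stand-in for the pure gate that the tests above exercise.
struct HostScopedTrustGate {
    /// Host the bypass is scoped to; `nil` means the bypass never fires.
    let allowedHost: String?

    func allowsTrustOverride(for space: URLProtectionSpace) -> Bool {
        // Only server-trust challenges qualify; HTTP Basic and others fall through.
        guard space.authenticationMethod == NSURLAuthenticationMethodServerTrust else {
            return false
        }
        // Without a configured host there is nothing to scope the bypass to.
        guard let allowed = allowedHost, !allowed.isEmpty else { return false }
        // Assumption: hostnames compare case-insensitively.
        return space.host.lowercased() == allowed.lowercased()
    }
}
```

Because the gate only inspects the protection space, it can be tested without a network connection or a `SecTrust` object, which is the property the test file's comment relies on.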
@@ -0,0 +1,110 @@
|
|||||||
|
import XCTest
|
||||||
|
@testable import Ten31Transcripts
|
||||||
|
|
||||||
|
final class SessionNamingTests: XCTestCase {
|
||||||
|
|
||||||
|
// MARK: sanitize
|
||||||
|
|
||||||
|
func testSanitizeTrimsAndKeepsSpaces() {
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize(" Weekly Sync "), "Weekly Sync")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSanitizeReplacesPathSeparators() {
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize("9/10 standup"), "9-10 standup")
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize("a:b\\c"), "a-b-c")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSanitizeCollapsesWhitespaceRuns() {
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize("board 1:1"), "board 1-1")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSanitizeStripsLeadingDots() {
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize("...hidden"), "hidden")
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize(".."), "")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSanitizeEmptyForBlankOrWhitespace() {
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize(""), "")
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize(" \n\t "), "")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSanitizeCapsLength() {
|
||||||
|
let long = String(repeating: "x", count: 200)
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize(long).count, 60)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSanitizeStripsControlCharacters() {
|
||||||
|
XCTAssertEqual(SessionNaming.sanitize("a\u{0000}b\u{001F}c"), "abc")
|
||||||
|
}
|
||||||
|
|
||||||
|
// MARK: datePrefix
|
||||||
|
|
||||||
|
func testDatePrefixFromAutoName() {
|
||||||
|
XCTAssertEqual(SessionNaming.datePrefix(ofSessionNamed: "2026-06-17T09-59-48_signal"), "2026-06-17")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testDatePrefixFromRenamedName() {
|
||||||
|
XCTAssertEqual(SessionNaming.datePrefix(ofSessionNamed: "2026-06-17_Weekly sync_signal"), "2026-06-17")
|
||||||
|
}
|
||||||
|
|
||||||
|
// MARK: renamedLeaf
|
||||||
|
|
||||||
|
func testRenamedLeafBasic() {
|
||||||
|
XCTAssertEqual(
|
||||||
|
SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "Weekly sync"),
|
||||||
|
"2026-06-17_Weekly sync_signal")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRenamedLeafAppStaysLastSegment() {
|
||||||
|
// The meeting name may contain underscores; the app must remain parseable as
|
||||||
|
// the final "_"-segment (what SessionController.appLabel reads).
|
||||||
|
let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "meet", meetingName: "q3_planning")
|
||||||
|
XCTAssertEqual(leaf, "2026-06-17_q3_planning_meet")
|
||||||
|
XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "meet")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRenamedLeafNilForBlankName() {
|
||||||
|
XCTAssertNil(SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: " "))
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRenamedLeafCounterDisambiguatesNameSegment() {
|
||||||
|
// A collision suffixes the NAME, not the whole leaf, so "_app" stays last.
|
||||||
|
let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "sync", counter: 1)
|
||||||
|
XCTAssertEqual(leaf, "2026-06-17_sync-2_signal")
|
||||||
|
XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "signal")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRenamedLeafAppStaysLastAtMaxCollisionDepth() {
|
||||||
|
// The 100-collision cap is counter 0…99; the app must still parse out last.
|
||||||
|
let leaf = SessionNaming.renamedLeaf(date: "2026-06-17", app: "signal", meetingName: "q3_sync", counter: 99)
|
||||||
|
XCTAssertEqual(leaf, "2026-06-17_q3_sync-100_signal")
|
||||||
|
XCTAssertEqual(leaf?.split(separator: "_").last.map(String.init), "signal")
|
||||||
|
}
|
||||||
|
|
||||||
|
// MARK: recapTitle
|
||||||
|
|
||||||
|
func testRecapTitleAutoNamePreservesLegacyFormat() {
|
||||||
|
XCTAssertEqual(
|
||||||
|
SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06T11-43-02_meet"),
|
||||||
|
"Google Meet call — 2026-06-06 11:43")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRecapTitleNamedSession() {
|
||||||
|
XCTAssertEqual(
|
||||||
|
SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06_Weekly sync_meet"),
|
||||||
|
"Weekly sync — Google Meet (2026-06-06)")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRecapTitleNamePreservesUnderscores() {
|
||||||
|
// A meeting name with underscores must survive the split/join round-trip.
|
||||||
|
XCTAssertEqual(
|
||||||
|
SessionNaming.recapTitle(app: "meet", sessionId: "2026-06-06_q3_planning_meet"),
|
||||||
|
"q3_planning — Google Meet (2026-06-06)")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testRecapTitleUnknownAppCapitalizes() {
|
||||||
|
XCTAssertEqual(
|
||||||
|
SessionNaming.recapTitle(app: "manual", sessionId: "2026-06-06T11-43-02_manual"),
|
||||||
|
"Manual call — 2026-06-06 11:43")
|
||||||
|
}
|
||||||
|
}
|
||||||
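The `sanitize` and `renamedLeaf` tests above imply a fairly specific contract. The sketch below is one implementation that would satisfy those assertions. It is an illustration only: `SessionNamingSketch` is a hypothetical name, the real `SessionNaming` source is not shown in this diff, and the order of the cleanup steps is an assumption.

```swift
/// Hypothetical sketch of the naming rules the tests above describe.
enum SessionNamingSketch {
    static let maxLength = 60

    static func sanitize(_ raw: String) -> String {
        var out = ""
        var pendingSpace = false
        for ch in raw {
            if ch.isWhitespace {
                // Collapse whitespace runs; leading whitespace is never emitted.
                pendingSpace = !out.isEmpty
                continue
            }
            // Drop remaining control characters (NUL, unit separator, ...).
            if ch.unicodeScalars.allSatisfy({ $0.properties.generalCategory == .control }) {
                continue
            }
            if pendingSpace {
                out.append(" ")
                pendingSpace = false
            }
            // Path separators and colons become hyphens.
            out.append("/:\\".contains(ch) ? "-" : ch)
        }
        // Strip leading dots so the name cannot become a hidden or relative path.
        let noDots = out.drop(while: { $0 == "." || $0 == " " })
        // Cap the length, then remove any space left dangling by the cut.
        var capped = String(noDots.prefix(maxLength))
        while capped.last == " " { capped.removeLast() }
        return capped
    }

    /// Builds "<date>_<name>[-N]_<app>" so the app stays the last "_"-segment.
    static func renamedLeaf(date: String, app: String,
                            meetingName: String, counter: Int = 0) -> String? {
        let name = sanitize(meetingName)
        guard !name.isEmpty else { return nil }
        // A collision suffixes the name segment: counter 1 yields "-2".
        let suffix = counter > 0 ? "-\(counter + 1)" : ""
        return "\(date)_\(name)\(suffix)_\(app)"
    }
}
```

Putting the collision counter on the name segment rather than at the end of the leaf is what keeps `split(separator: "_").last` returning the app label, as the tests check.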
+46
-35
@@ -7,9 +7,9 @@
|
|||||||
> returns named transcript segments. A growing **voiceprint library** recovers
|
> returns named transcript segments. A growing **voiceprint library** recovers
|
||||||
> speakers even when the visual cue is missing.
|
> speakers even when the visual cue is missing.
|
||||||
|
|
||||||
Master context document. Read this first, then `02_ARCHITECTURE.md`,
|
Master context document. Read this first, then `02_ARCHITECTURE.md` and
|
||||||
`03_DATA_CONTRACTS.md`, `04_BUILD_PLAN.md`. The SparkControl API is now fully
|
`03_DATA_CONTRACTS.md`. The SparkControl API is fully specified in
|
||||||
specified — see `03_DATA_CONTRACTS.md` (and the source `AUDIO_API.md`).
|
`03_DATA_CONTRACTS.md`.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -20,25 +20,30 @@ A lightweight, always-running **menu-bar app on macOS** that:
|
|||||||
1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams,
|
1. **Detects** when the user joins a call in Google Meet, Zoom, Microsoft Teams,
|
||||||
or Signal.
|
or Signal.
|
||||||
2. **Records two local audio tracks** — system audio (everyone else) and the
|
2. **Records two local audio tracks** — system audio (everyone else) and the
|
||||||
user's microphone (the user) — and **mixes them to one 16 kHz mono WAV** for
|
user's microphone (the user). It sends the backend **dual-channel**
|
||||||
the backend.
|
(`mic_file` + `system_file`) when the system track is healthy, falling back to
|
||||||
|
a **mixed-mono 16 kHz WAV** otherwise.
|
||||||
3. **Watches the call window** at ~2–4 fps and, per app, reads participant
|
3. **Watches the call window** at ~2–4 fps and, per app, reads participant
|
||||||
**names** and the **active-speaker cue**, producing a
|
**names** and the **active-speaker cue**, producing a
|
||||||
`(start, end, name, confidence)` **visual timeline** — its best guess at who
|
`(start, end, name, confidence)` **visual timeline** — its best guess at who
|
||||||
was talking when.
|
was talking when.
|
||||||
4. **Discards every video frame after extraction.** No video is ever written to
|
4. **Discards every video frame after extraction.** No video is ever written to
|
||||||
disk. Only audio + the derived timeline persist locally.
|
disk. Only audio + the derived timeline persist locally.
|
||||||
5. On call end, **POSTs the mixed audio + the visual timeline (+ the known
|
5. On call end, **POSTs the audio + the visual timeline (+ the known voiceprint
|
||||||
voiceprint library) to `POST /api/audio/label-merge`** on SparkControl, which
|
library) to `POST /api/audio/label-merge`** on SparkControl, which returns
|
||||||
returns **named, speaker-attributed transcript segments** and a **voiceprint
|
**named, speaker-attributed transcript segments** and a **voiceprint per
|
||||||
per speaker**.
|
speaker**.
|
||||||
6. **Persists the returned voiceprints** keyed by name, so the next call can pass
|
6. **Persists the returned voiceprints** keyed by name, so the next call can pass
|
||||||
them as `known_voiceprints` and recover a speaker by voice when the visual cue
|
them as `known_voiceprints` and recover a speaker by voice when the visual cue
|
||||||
is absent (camera off, a bad OCR frame).
|
is absent (camera off, a bad OCR frame).
|
||||||
|
7. **Renders the result locally** — a readable `transcript.md` plus an HTML
|
||||||
|
`recap.html` (topics + meeting extras, generated via the backend's LLM
|
||||||
|
endpoint), with an in-app editor for fixing speaker names after the fact.
|
||||||
|
|
||||||
The app's job ends at receiving and storing the named segments from SparkControl.
|
The app's job ends at producing the named transcript and recap from SparkControl's
|
||||||
**All transcription, diarization, and the name-merge happen on the backend.** Do
|
segments. **All transcription, diarization, name-merge, and LLM analysis happen on
|
||||||
not build transcription, diarization, or the merge vote in this app.
|
the backend.** Do not build transcription, diarization, or the merge vote in this
|
||||||
|
app.
|
||||||
|
|
||||||
## 2. Why the visual timeline still matters (the core idea)
|
## 2. Why the visual timeline still matters (the core idea)
|
||||||
|
|
||||||
@@ -68,19 +73,25 @@ few calls the system can name regulars even with cameras off.
|
|||||||
|
|
||||||
**In scope (this app):**
|
**In scope (this app):**
|
||||||
- Call detection for Meet / Zoom / Teams / Signal.
|
- Call detection for Meet / Zoom / Teams / Signal.
|
||||||
- Dual-track local audio capture + mix-to-mono for the backend.
|
- Dual-track local audio capture; **dual-channel send** (mic + system) with a
|
||||||
|
mix-to-mono fallback for the backend.
|
||||||
- Low-fps window capture → OCR (names) + active-speaker cue detection.
|
- Low-fps window capture → OCR (names) + active-speaker cue detection.
|
||||||
- Per-app "adapter" modules encapsulating each app's UI quirks.
|
- Per-app "adapter" modules encapsulating each app's UI quirks.
|
||||||
- Building the visual timeline; **mic-VAD self-labeling** (the mic track is the
|
- Building the visual timeline; **mic-VAD self-labeling** (the mic track is the
|
||||||
user, so hot-mic spans pre-seed the user's name into the timeline).
|
user, so hot-mic spans pre-seed the user's name into the timeline).
|
||||||
- Chunking long calls (~2–3 min) and calling `label-merge` **sequentially**.
|
- Chunking long calls (~2–3 min) and calling `label-merge` **sequentially**.
|
||||||
- A local **voiceprint store** (persist + replay named voiceprints).
|
- A local **voiceprint store** (persist + replay named voiceprints).
|
||||||
- Storing the backend's named transcript segments locally.
|
- Storing the backend's named segments and **rendering** them — `transcript.md`
|
||||||
- A minimal menu-bar UI: status, manual start/stop, recent sessions, adapter
|
plus an HTML `recap.html` (recap analysis via the backend LLM) — with an in-app
|
||||||
toggles, backend host/health, output folder.
|
speaker-name editor.
|
||||||
|
- A minimal menu-bar UI: status, manual start/stop, the last session (reveal,
|
||||||
|
resend, open recap, edit speakers), adapter toggles, backend host/health,
|
||||||
|
output folder.
|
||||||
|
|
||||||
**Out of scope (owned by the backend):**
|
**Out of scope (owned by the backend):**
|
||||||
- Transcription, diarization, the name-merge vote, summarization/analysis.
|
- Transcription, diarization, the name-merge vote, and LLM summarization — these
|
||||||
|
run on the backend; the app only orchestrates the recap call and renders the
|
||||||
|
result.
|
||||||
|
|
||||||
**Explicitly not doing:** saving video; cloud anything. Everything stays on the
|
**Explicitly not doing:** saving video; cloud anything. Everything stays on the
|
||||||
operator's LAN.
|
operator's LAN.
|
||||||
@@ -91,14 +102,14 @@ operator's LAN.
|
|||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. |
|
| Language / framework | Native Swift + SwiftUI menu-bar app (`LSUIElement`) | System audio, window capture, Vision all native; one codebase. |
|
||||||
| Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. |
|
| Audio capture | ScreenCaptureKit (system audio) + AVFoundation (mic) | No virtual audio device; works with headphones; macOS 13+. |
|
||||||
| Backend audio format | **Mixed-mono 16 kHz WAV** | Diarizer separates speakers from one mixed stream; 16 kHz is ideal. |
|
| Backend audio format | **Dual-channel (mic + system)** when the system track is healthy, else **mixed-mono 16 kHz WAV** | Separate tracks let the backend attribute the user's mic channel directly; the diarizer can still split the mono fallback. |
|
||||||
| Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. |
|
| Call detection | CoreAudio "mic running somewhere" + known-app / Meet-tab heuristic | Clean live-mic signal + app disambiguation. |
|
||||||
| Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. |
|
| Speaker naming | **Backend, via `POST /api/audio/label-merge`** | One call does diarize + overlap-vote naming + transcription. No client merge. |
|
||||||
| Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. |
|
| Identity recovery | **Local voiceprint library** replayed as `known_voiceprints` | Recovers camera-off / OCR-missed speakers by voice; compounds over calls. |
|
||||||
| Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. |
|
| Self-identity | mic-VAD → pre-seed user's name in timeline | The mic track is the user; gives the backend a strong prior + enrolls the user's voiceprint immediately. |
|
||||||
| Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). |
|
| Requests | **Sequential, one audio request in flight** | Parallel audio requests trip a backend GPU race (`503 + Retry-After`). |
|
||||||
| Long calls | Chunk ~2–3 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. |
|
| Long calls | Chunk ~2–3 min, sequential, stitch via names+voiceprints | Diarizer caps at **4 speakers/chunk**; voiceprints + names unify across chunks. |
|
||||||
| Transport / TLS | `multipart/form-data`, file field `file`; self-signed Start9 cert (skip verify or trust the Root CA); **no auth on LAN** | Matches every other SparkControl endpoint. |
|
| Transport / TLS | `multipart/form-data`, file field `file` (mono) or `mic_file` + `system_file` (dual-channel); self-signed Start9 cert (trust the Root CA — supported default; host-scoped skip-verify is an off-by-default escape hatch); **no auth on LAN** | Matches every other SparkControl endpoint. |
|
||||||
| Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. |
|
| Timing | Batch after call (sync endpoints, no polling) | Endpoints are synchronous; no job/poll machinery needed. |
|
||||||
|
|
||||||
### On forking Hyprnote
|
### On forking Hyprnote
|
||||||
@@ -128,25 +139,25 @@ SparkControl, on the operator's Start9 LAN, fronting two DGX Sparks:
|
|||||||
- **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize +
|
- **★ Primary endpoint for this app:** `POST /api/audio/label-merge` — diarize +
|
||||||
name from the visual timeline (+ voiceprint fallback), optionally transcribe,
|
name from the visual timeline (+ voiceprint fallback), optionally transcribe,
|
||||||
in one synchronous call.
|
in one synchronous call.
|
||||||
|
- **LLM (recap):** Qwen3 via OpenAI-compatible `POST /v1/chat/completions` —
|
||||||
|
generates the readable recap (topics + meeting extras) from the transcript.
|
||||||
- Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`.
|
- Health/discovery: `GET /api/status`, `GET /api/endpoints`, `GET /v1/models`.
|
||||||
|
|
||||||
Full request/response shapes, curl examples, limits, and error formats are in
|
Full request/response shapes, curl examples, limits, and error formats are in
|
||||||
`03_DATA_CONTRACTS.md`.
|
`03_DATA_CONTRACTS.md`.
|
||||||
|
|
||||||
## 7. Remaining open items (small)
|
## 7. Settled decisions (were open at brief time)
|
||||||
|
|
||||||
1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
|
1. **Base URL.** A private LAN host — a `.local` mDNS name (preferred over a raw
|
||||||
over a raw IP, since it survives IP changes) — configured in Settings or via the
|
IP, since it survives IP changes) — configured in Settings or via the
|
||||||
`SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
|
`SPARK_BACKEND_URL` env var, never committed. A neutral placeholder ships as the
|
||||||
the default; keep it editable in settings. Service-discovery at
|
default and stays editable in Settings. Service-discovery at `GET /api/endpoints`.
|
||||||
`GET /api/endpoints`.
|
2. **Send trigger.** Auto-send on call end is a setting (`autoSendOnStop`), **off
|
||||||
2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
|
by default** — the user reviews the session and sends manually unless they opt in.
|
||||||
toggle if the user wants to eyeball the timeline first.
|
3. **Retention.** The session folder is kept after a successful hand-off (output
|
||||||
3. **Retention** — keep the session folder after a successful hand-off, or prune
|
location is configurable); nothing is pruned automatically.
|
||||||
audio and keep only `speakers.json` + voiceprints? Default: keep everything,
|
4. **Voiceprint update policy.** Store/refresh the latest high-confidence vector
|
||||||
user-configurable.
|
per name (`02_ARCHITECTURE.md §2.9`); a per-name running average is a possible
|
||||||
4. **Voiceprint update policy** — overwrite vs running-average a person's stored
|
later refinement.
|
||||||
voiceprint across calls (see `02_ARCHITECTURE.md §2.9`). Start simple
|
5. **Signing.** A stable identity via `Config/Signing.xcconfig` (gitignored) keeps
|
||||||
(store/refresh latest high-confidence), refine later.
|
macOS from re-prompting for permissions on each rebuild.
|
||||||
5. **Signing** — stable identity so macOS doesn't re-prompt for permissions on
|
|
||||||
each rebuild.
|
|
||||||
|
|||||||
+23
-6
@@ -64,6 +64,9 @@ pattern, the macOS APIs, and the SparkControl integration (now fully specified).
|
|||||||
└────────────────┘ └────────────────────┘
|
└────────────────┘ └────────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
|
(After `speakers.json`, a recap phase renders `transcript.md` + `recap.html` via
|
||||||
|
the backend LLM — see §2.11.)
|
||||||
|
|
||||||
## 2. Modules
|
## 2. Modules
|
||||||
|
|
||||||
### 2.1 `CallDetector`
|
### 2.1 `CallDetector`
|
||||||
@@ -176,8 +179,10 @@ Write the session folder and, if the call is longer than ~3 min, produce a
|
|||||||
```
|
```
|
||||||
|
|
||||||
### 2.7 `SparkControlClient`
|
### 2.7 `SparkControlClient`
|
||||||
Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`** with
|
Deliver to SparkControl. **Primary path = `POST /api/audio/label-merge`**. Sends
|
||||||
`file`, `timeline`, `known_voiceprints`, `transcribe=true`.
|
**dual-channel** (`mic_file` + `system_file` + `self_name` + `self_vad`) when the
|
||||||
|
system track is healthy, else the **mono** `file`; always with `timeline`,
|
||||||
|
`known_voiceprints`, `transcribe=true`.
|
||||||
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
|
- **Sequential only** — one audio request in flight (parallel ⇒ `503 + Retry-After`).
|
||||||
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
|
- **Self-signed TLS** — skip verification (`URLSession` delegate trusting the
|
||||||
Start9 cert) or trust the Root CA. **No auth on the LAN.**
|
Start9 cert) or trust the Root CA. **No auth on the LAN.**
|
||||||
@@ -210,10 +215,22 @@ Local persistence of named voiceprints — the compounding-identity layer.
|
|||||||
- Editable/clearable from the menu-bar UI (rename, delete a person, reset).
|
- Editable/clearable from the menu-bar UI (rename, delete a person, reset).
|
||||||
|
|
||||||
### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
|
### 2.10 `MenuBarUI` (SwiftUI, `LSUIElement`)
|
||||||
Status (idle / detected / recording / uploading), manual start/stop, recent
|
Status (idle / detected / recording / finishing), manual start/stop with live
|
||||||
sessions (open folder, resend, delete), adapter toggles, **backend host + a
|
mic/system level meters, and the **last session** — reveal in Finder, resend
|
||||||
health check** (`GET /api/status`), output folder, voiceprint manager, and a
|
("Send to backend"), open recap, and edit speakers — plus "Open saved session…"
|
||||||
permissions checklist (Screen Recording, Microphone, Accessibility).
|
to reprocess an existing folder. Also a **backend host + health check**
|
||||||
|
(`GET /api/status`), adapter toggles, output folder, and a permissions checklist
|
||||||
|
(Microphone, Screen Recording, Accessibility). (No multi-session list or
|
||||||
|
voiceprint-manager UI yet — those are in `ROADMAP.md`.)
|
||||||
|
|
||||||
|
### 2.11 Recap (`RecapAnalyzer`, `RecapRenderer`)
|
||||||
|
After `speakers.json`, the recap phase turns the named transcript into the
|
||||||
|
human-readable deliverables. `RecapAnalyzer` calls the backend LLM
|
||||||
|
(`POST /v1/chat/completions`, Qwen3) for topics + meeting extras; `RecapRenderer`
|
||||||
|
writes `transcript.md` (one line per diarized utterance) and `recap.html` (+ a
|
||||||
|
`recap.json` sidecar). The in-app speaker editor (`SpeakerEditing` /
|
||||||
|
`RecapEditModel`) rewrites names across all outputs after the fact. All
|
||||||
|
language-model work stays on the backend; the app orchestrates and renders.
|
||||||
|
|
||||||
## 3. macOS frameworks & permissions
|
## 3. macOS frameworks & permissions
|
||||||
|
|
||||||
|
|||||||
+28
-11
@@ -1,7 +1,7 @@
|
|||||||
# Data Contracts — Ten31 Transcripts
|
# Data Contracts — Ten31 Transcripts
|
||||||
|
|
||||||
Companion to docs 01/02. Defines the files the app produces/stores and the **real
|
Companion to docs 01/02. Defines the files the app produces/stores and the **real
|
||||||
SparkControl contract** (source of truth: `AUDIO_API.md`). The `label-merge`
|
SparkControl contract** (verified against the live backend). The `label-merge`
|
||||||
endpoint is the app's primary integration point.
|
endpoint is the app's primary integration point.
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -69,8 +69,10 @@ When chunking, **slice to the chunk window and rebase to chunk-local seconds**
|
|||||||
"app_version": "0.1.0"
|
"app_version": "0.1.0"
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
(`mixed_mono_16k.wav` is the one the backend gets; the separate tracks are kept
|
(On the dual-channel path the backend gets `mic.wav` + `system.wav` directly; on
|
||||||
locally — the mic track is the user's known identity / VAD source.)
|
the mono fallback it gets `mixed_mono_16k.wav`. The mic track is the user's known
|
||||||
|
identity / VAD source. **Note:** the per-file `sha256` fields above are part of the
|
||||||
|
intended contract but are **not currently emitted** by the pipeline.)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -83,15 +85,17 @@ locally — the mic track is the user's known identity / VAD source.)
|
|||||||
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
|
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
|
||||||
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
|
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
|
||||||
the default.
|
the default.
|
||||||
- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
|
- **TLS:** Start9 self-signed Root CA. Supported path: install the Start9 Root CA
|
||||||
delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
|
into the System keychain (default trust then succeeds). Skip-verification is an
|
||||||
the Start9 Root CA into the trust store.
|
**off-by-default, host-scoped** escape hatch (`InsecureTrustDelegate`, scoped to
|
||||||
|
the configured backend host), not the default.
|
||||||
- **Auth:** **none on the LAN.** No token/key today.
|
- **Auth:** **none on the LAN.** No token/key today.
|
||||||
- **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription),
|
- **Limits:** **200 MB/request** (`413` over); timeouts ~300 s (transcription),
|
||||||
~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio
|
~600 s (diarization). **Send audio requests SEQUENTIALLY** — concurrent audio
|
||||||
trips a GPU FFT race → `503 + Retry-After`.
|
trips a GPU FFT race → `503 + Retry-After`.
|
||||||
- **Transport:** `multipart/form-data`, audio file field name **`file`** (bytes,
|
- **Transport:** `multipart/form-data`. Audio file field is **`file`** on the mono
|
||||||
not base64/path).
|
path, or **`mic_file`** + **`system_file`** on the dual-channel path (bytes, not
|
||||||
|
base64/path).
|
||||||
- **All endpoints are synchronous** (no job IDs / polling).
|
- **All endpoints are synchronous** (no job IDs / polling).
|
||||||
- **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 +
|
- **Errors:** JSON `{"detail": "..."}`; `400` malformed, `413` too large, `503 +
|
||||||
Retry-After` transient (retry after the interval).
|
Retry-After` transient (retry after the interval).
|
||||||
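The status codes listed above map naturally onto a small decision function that a sequential sender could consult after each request. The sketch below is hypothetical: `SendOutcome`, `classify`, and the 5-second default delay are illustrative choices, and only the numeric-seconds form of `Retry-After` is parsed (an HTTP-date value falls back to the default).

```swift
/// Hypothetical outcome of one audio request.
enum SendOutcome: Equatable {
    case success
    case retry(afterSeconds: Double)
    case fail(String)
}

/// Classifies a response by status code, following the error list above.
/// `defaultDelay` is an assumed fallback, not a documented backend value.
func classify(statusCode: Int,
              retryAfter: String?,
              defaultDelay: Double = 5) -> SendOutcome {
    switch statusCode {
    case 200..<300:
        return .success
    case 503:
        // Transient (e.g. a concurrent audio request): wait, then resend.
        let seconds = retryAfter.flatMap { Double($0) } ?? defaultDelay
        return .retry(afterSeconds: max(0, seconds))
    case 400:
        return .fail("malformed request")
    case 413:
        return .fail("payload over the 200 MB limit")
    default:
        return .fail("unexpected status \(statusCode)")
    }
}
```

With one request in flight at a time, a `.retry` result simply delays the same chunk; the next chunk is not sent until the current one has succeeded or failed for good.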
@@ -105,11 +109,16 @@ Diarize + name clusters from the visual timeline (majority temporal overlap),
|
|||||||
with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** —
|
with voiceprint fallback, optionally transcribed. Synchronous. **Stateless** —
|
||||||
the app owns the timeline and the voiceprint library.
|
the app owns the timeline and the voiceprint library.
|
||||||
|
|
||||||
**Multipart fields:**
|
**Multipart fields** — two audio shapes: **mono** (`file`) or **dual-channel**
|
||||||
|
(`mic_file` + `system_file`, preferred when the system track is healthy):
|
||||||
| field | required | notes |
|
| field | required | notes |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `file` | **yes** | mixed-mono WAV (the chunk, when chunking) |
|
| `file` | mono path | mixed-mono WAV (the chunk, when chunking) |
|
||||||
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1) |
|
| `mic_file` | dual path | the user's mic track (chunk) — attributed to `self_name` |
|
||||||
|
| `system_file` | dual path | the remote/system track (chunk) |
|
||||||
|
| `self_name` | dual path | the user's name; the mic channel is attributed to them |
|
||||||
|
| `self_vad` | no | chunk-local windows where the mic is genuinely the user (active + louder than system) |
|
||||||
|
| `timeline` | **yes** | flat JSON array `[{"start","end","name","confidence"}]`, chunk-local seconds (§1.1); on the dual path it names only the remote speakers |
|
||||||
| `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` |
|
| `known_voiceprints` | no | JSON `{"<name>":[192 floats], ...}` from `VoiceprintStore` |
|
||||||
| `transcribe` | no | `"true"` to also return per-segment text (default false) |
|
| `transcribe` | no | `"true"` to also return per-segment text (default false) |
|
||||||
| `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) |
|
| `min_overlap` | no | min fraction of a cluster's time overlapping the winning name (default `0.0`) |
|
||||||
@@ -213,3 +222,11 @@ Loaded → `known_voiceprints` on every `label-merge` call. Updated from respons
|
|||||||
`fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never
|
`fingerprints` for `visual`/high-confidence `voiceprint` speakers only. Never
|
||||||
stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with
|
stores `Unknown_N`. Update policy (`02 §2.9`): start = store latest with
|
||||||
`overlap_confidence ≥ ~0.8`; consider per-name running mean later.
|
`overlap_confidence ≥ ~0.8`; consider per-name running mean later.
|
||||||
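The starting update policy described above (store the latest vector, skip `Unknown_N`, require a confidence of roughly 0.8) could look like the following. It is a sketch, not the real `VoiceprintStore`: `ReturnedSpeaker`, its field names, and the `"visual"` / `"voiceprint"` source labels are assumptions about the response shape, and applying the threshold to both sources is one reading of the policy.

```swift
/// Hypothetical view of one speaker in a `label-merge` response.
struct ReturnedSpeaker {
    var name: String
    var source: String            // assumed labels: "visual" or "voiceprint"
    var overlapConfidence: Double
    var fingerprint: [Float]      // 192 floats in the real contract
}

/// Returns the library with the latest qualifying voiceprint stored per name.
func updatedLibrary(_ library: [String: [Float]],
                    with speakers: [ReturnedSpeaker],
                    threshold: Double = 0.8) -> [String: [Float]] {
    var next = library
    for speaker in speakers {
        // Never persist anonymous clusters.
        if speaker.name.hasPrefix("Unknown_") { continue }
        // Only visually named or voiceprint-matched speakers qualify.
        guard speaker.source == "visual" || speaker.source == "voiceprint" else { continue }
        // Assumption: the confidence floor applies to both sources.
        guard speaker.overlapConfidence >= threshold else { continue }
        // "Store latest": overwrite whatever was there before.
        next[speaker.name] = speaker.fingerprint
    }
    return next
}
```

A per-name running mean, mentioned above as a later refinement, would replace only the final assignment.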
|
|
||||||
|
## 8. Recap outputs (`transcript.md`, `recap.{html,json}`)
|
||||||
|
After `speakers.json` is assembled, the recap phase renders the human-readable
|
||||||
|
deliverables: a `transcript.md` (one line per diarized utterance) and an HTML
|
||||||
|
`recap.html`, backed by a structured `recap.json`. The recap's topic/summary
|
||||||
|
content is generated by the **backend LLM** (`POST /v1/chat/completions`, Qwen3);
|
||||||
|
the app owns the rendering and the in-app **speaker-name editor**, which can rewrite
|
||||||
|
names across `speakers.json`, the transcript, and the recap after the fact.
|
||||||
|
|||||||
@@ -1,5 +1,11 @@
|
|||||||
# Build Plan — Ten31 Transcripts
|
# Build Plan — Ten31 Transcripts
|
||||||
|
|
||||||
|
> **Status: COMPLETE (historical).** Phases 0–6 shipped and the app is in daily
|
||||||
|
> use; a recap phase (transcript + HTML recap via the backend LLM) was added after
|
||||||
|
> this plan was written. Kept as the original build log and as the map for the
|
||||||
|
> "Phase N" references in the code comments. Forward-looking work lives in
|
||||||
|
> `ROADMAP.md`; current status in `AGENTS.md`.
|
||||||
|
|
||||||
Companion to docs 01–03. Phased plan for the Claude Code session, each phase with
|
Companion to docs 01–03. Phased plan for the Claude Code session, each phase with
|
||||||
a demoable milestone. Build in order; the risky/novel work (visual adapters) is
|
a demoable milestone. Build in order; the risky/novel work (visual adapters) is
|
||||||
isolated for independent tuning. The SparkControl contract is now known
|
isolated for independent tuning. The SparkControl contract is now known
|
||||||
|
|||||||