Compare commits

...

10 Commits

Author SHA1 Message Date
Grant Gilliam 3dd02f8ce6 Add agent instructions; extract signing/backend secrets from source
- Add AGENTS.md (canonical) + CLAUDE.md symlink + ROADMAP.md
- Move Apple Team ID from project.yml into a gitignored
  Config/Signing.xcconfig via configFiles; commit the .example template
- Replace hardcoded backend host in AppSettings with a neutral
  placeholder + SPARK_BACKEND_URL env-var fallback
- Scrub the Team ID, .local host, and raw LAN IP from README/docs
- Ignore Config/Signing.xcconfig and .env
2026-06-13 12:23:54 -05:00
Grant Gilliam b0a4b50dac Make diarization chunk length configurable (Auto + presets)
Chunk size was hardcoded at 2.5-min bodies. Add a Settings control:
Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer
simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large
calls, at some cost to speed and cross-chunk voice matching.

- ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4
  participants were detected, else 150s; overlap + single-chunk threshold scale
  with the body length.
- AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation.
- TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it.
- SessionController resolves the body from the setting + the session's detected
  participant count (visual_timeline participants) for both send + re-process.
- Participant roster now counts EVERY tile OCR'd, not just who spoke
  (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto
  call-size signal is meaningful even though speaking-detection is sparse.

Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.
2026-06-09 10:15:16 -05:00
Grant Gilliam 9a80c7c96e Restyle recap.html to match recap-relay
The recap output looked notably different from recap-relay (bigger 15px font,
different palette, single-column cards). Match recap-relay's job-output view:
slate/indigo palette (--bg #0a0e1a, --accent #818cf8), 13px base type with the
Helvetica/Arial stack, monospace accent-soft timestamps, and the two-pane layout
— topic list on the left, full diarized transcript on the right, click a topic to
scroll + highlight its range (inline JS, data baked in; no backend fetch). The
summary/takeaways render as recap-relay-style cards in a band above the split.
markdown() output unchanged. 66 tests pass.
2026-06-08 21:00:56 -05:00
Grant Gilliam 18af17f26c Drop stuck whole-call visual spans at processing time
Defense-in-depth + salvage for sessions captured before the adapter fix: drop any
vision-source span whose single unbroken duration covers ≥60% of the call. No one
speaks that long without a break, so it's a stuck/false active-speaker cue that
would dominate backend name attribution. Self (mic_vad) spans are never dropped.
Applied to both the live and re-process paths. Test added; 66 pass.
2026-06-08 16:21:45 -05:00
Grant Gilliam 19ca85abd5 Fix Meet visual: reject solid avatar tiles + screen-share OCR
Root cause of the "4 people → 2 speakers" Meet call: the colored-border detector
read solid camera-off avatar tiles (orange "J", magenta "G") as active speakers
for the ENTIRE call. Those whole-call phantom spans dominated backend name
attribution, collapsing every remote voice onto one name — and the giant filled
bbox also swallowed screen-share text (WERUNBTC.COM ×49) as a speaker.

Validated against 9 real fixtures (harness over the real MeetAdapter):

Detection:
- FrameSampler.thinColoredPoints: coloured counterpart of thinWhitePoints — keeps
  thin border/ring/pill edges, drops solid colour fills.
- GridCallAnalyzer.isHollow: reject a highlight component whose interior is filled
  (a solid tile) vs a hollow ring (a real border). Config.maxInteriorFill (0.2 default).
- MeetAdapter: detect thin BLUE edges only (hue 180–240°, measured from the
  fixtures), maxInteriorFill 0.3 (real Meet rings ≈0.2–0.3, solid tiles ≈0.36).
- Result on fixtures: John Arnold/Grant Gilliam (solid tiles) now NEVER detected;
  Matt Odell/Mark detected when their blue cue is present. Sparse but never wrong —
  correct for a naming hint over audio diarization.

OCR name hygiene:
- isLikelyName rejects domain-like screen-share text ("WERUNBTC.COM", OCR'd ".GOM").
- cleaned() strips trailing punctuation ("Mark." → "Mark").
- TimelineBuilder.canonicalizeByFrequency folds rare OCR misspellings into a
  dominant near-twin name ("Matt Odel"/"MattOdell" → "Matt Odell", "Mare" → "Mark").

Tests: hollow-ring, extended OCR filter, fuzzy-merge. 65 pass.
2026-06-08 16:18:52 -05:00
Grant Gilliam 98a198471c Revert adjacent same-speaker segment collapse
User found the merged transcript lines harder to read — too many sentences
joined into one statement. Remove SpeakerReconciler.mergeAdjacent, its wiring in
finishBackend (restore the no-LLM early return), and its tests. Back to one
segment per diarized utterance.
2026-06-08 15:52:27 -05:00
Grant Gilliam a273e768dc Open Settings in its own roomy window, not the cramped popover
Settings was a NavigationLink pushed inside the 320pt menu-bar popover, so the
grouped form was cramped and most controls sat below a non-obvious scroll (and
showed a confusing "< Settings" back arrow). Add SettingsWindow (same standalone
NSWindow pattern as the Editor/Templates windows) and open it from the menu-bar
"Settings…" button. Drop the now-unused NavigationStack and the 320pt cap so the
form uses real window width with normal macOS spacing; window is resizable.
2026-06-08 13:57:24 -05:00
Grant Gilliam c81bdc4cba Make adapter toggles actually gate screen-reading
The Settings "Adapters" toggles wrote adapterEnabled but nothing in the capture
path ever read it, so flipping one off did nothing — and the caption still said
"Inert in Phase 0". The adapters (Zoom/Teams/Signal/Meet) are all live now.

SessionController.startVisual now skips visual capture when the detected app's
adapter is toggled off (records audio-only; transcription still runs). Update the
section caption to describe the real behavior.
2026-06-08 13:30:31 -05:00
Grant Gilliam 836b930083 Surface "Your name" as its own top Settings section
The name field was the first row of the third "Transcription" section, below
the fold — users couldn't find where to set their name (it's the setting that
labels the mic channel and reserves the name so the LLM never assigns it to
another speaker). Move it into a dedicated "Your name" section at the top of
Settings, and show an orange nudge while it's still the placeholder "Me"/empty.
2026-06-08 13:25:33 -05:00
Grant Gilliam 217639f12e Collapse adjacent same-speaker segments after reconciliation
Fragments reabsorbed by smoothFragments (e.g. "I" then "need to switch it
back") were left as separate transcript lines. Add SpeakerReconciler.mergeAdjacent
to join consecutive same-speaker segments within 2s, concatenating their text.

Wire it into SessionController.finishBackend AFTER reconcile/LLM naming. The
collapse needs no LLM, so finishBackend no longer early-returns when the gateway
has no chat model — it runs the collapse and re-persists speakers.json
unconditionally, gating only the reconcile and recap passes on the model.
2026-06-08 13:19:05 -05:00
26 changed files with 691 additions and 116 deletions
+7
View File
@@ -17,3 +17,10 @@ build/
# Personal call screenshots / fixtures (faces, contact names) — never commit
example-screenshots/
# Local signing identity (Apple Team ID) — keep out of source; template is committed
Config/Signing.xcconfig
# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
.env
.env.local
+89
View File
@@ -0,0 +1,89 @@
# AGENTS.md — Ten31 Transcripts
Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
## Stack (versions that matter)
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
## Commands
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
```
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
```
Regenerate the Xcode project (after adding/removing/renaming any source file):
```
xcodegen generate
```
Build + run all tests:
```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
```
Run a **single** test (target/class/method):
```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
-only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
```
Build only: replace `test` with `build`. **Lint/format:** none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in `ROADMAP.md`.
Build a standalone app and install/run it (Xcode does **not** need to stay open):
```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
-configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
```
**Fast validation harness** (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x` and run against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.
## Layout (day one)
- `Ten31Transcripts/App/``@main` entry + `AppDelegate`.
- `Ten31Transcripts/Session/``SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
- `Ten31Transcripts/Visual/``VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
- `Ten31Transcripts/Audio/``AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
- `Ten31Transcripts/Backend/``SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
- `Ten31Transcripts/Recap/``RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/``CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
## Conventions
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. **No remote is configured** — confirm where to push (choosing one is tracked in `ROADMAP.md`). Branch before committing; never commit to `main` without asking.
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
## Always
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
- Run `xcodegen generate` after adding/removing/renaming source files.
- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
- Identify **self by the mic channel** + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
- Treat visual active-speaker cues as **naming hints over audio diarization** (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls **sequential** (one in flight).
- After any code change, rebuild Release + `ditto` to `/Applications` — the installed copy does **not** auto-update.
## Never
- **Never write video frames to disk** — analyze in-memory and release immediately (privacy non-negotiable).
- **Never add Co-Authored-By / "Generated with" / any AI or tool attribution** to commits or PRs.
- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
- Never commit to `main` or force-push a shared branch; branch first and ask.
## Current state
Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs.
- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
- **Decided but not implemented:** none open (deferred items live in `ROADMAP.md`).
- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
- **Next:** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) confirm Settings → Your name = "Grant"; (3) record a fresh Meet call to validate the fix on a clean capture; (4) decide a git remote and push.
Symlink
+1
View File
@@ -0,0 +1 @@
AGENTS.md
+4
View File
@@ -0,0 +1,4 @@
// Template for Config/Signing.xcconfig (which is gitignored).
// Copy to Config/Signing.xcconfig and set your Apple Developer Team ID
// (Xcode ▸ Settings ▸ Accounts, or `security find-identity -p codesigning -v`).
DEVELOPMENT_TEAM = YOUR_APPLE_TEAM_ID
+16 -10
View File
@@ -14,25 +14,30 @@ This repo is at **Phase 0** (scaffold, permissions, backend health check).
```sh
brew install xcodegen
```
3. **Generate the project:**
3. **Set your signing team.** The Apple Team ID is kept out of source in a
gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
```sh
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
```
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
team automatically — no manual selection. Keep the value stable so macOS
preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
not Xcode — `xcodegen generate` overwrites Xcode-side changes.
4. **Generate the project:**
```sh
xcodegen generate
```
This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
4. **Open it:**
5. **Open it:**
```sh
open Ten31Transcripts.xcodeproj
```
5. Signing is preconfigured: `project.yml` sets `DEVELOPMENT_TEAM` to the free
personal team `BK4Y6CXN35` with automatic signing, so **Signing & Capabilities
should already show the team** — no manual selection needed. (If you ever sign
with a different Apple ID, update `DEVELOPMENT_TEAM` in `project.yml`, not in
Xcode — `xcodegen generate` overwrites Xcode-side changes.)
6. Press **Run** (⌘R).
> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
> Xcode reload the project. The signing team persists because it lives in
> `project.yml`, so macOS permissions stay granted across rebuilds.
> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
> rebuilds.
## What Phase 0 does
@@ -64,5 +69,6 @@ Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
LAN-only tool that must observe other apps. Revisit only if distributing.
- The default backend host is `https://immense-voyage.local:62419` (editable in
Settings).
- The backend host is a private LAN address — set it in **Settings**, or seed it
from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
placeholder (`https://your-spark-backend.local`).
+27
View File
@@ -0,0 +1,27 @@
# ROADMAP — Ten31 Transcripts
Longer-term backlog and deferred decisions. Near-term status + the next few steps live in `AGENTS.md` → Current state.
## Visual detection
- Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
- Geometric screen-share exclusion: ignore OCR text in the shared-screen region (needs layout detection). Today only the domain filter + stuck-span guard catch share-text-as-speaker.
- Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
- Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
## Audio / speakers
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
## App / UX
- Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
- Constrain recap reading width on very wide windows (long line length in the summary band).
## Tooling / repo
- Decide and configure a git remote (none set); then push.
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
## Deferred decisions
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
@@ -32,6 +32,16 @@ struct MeetAdapter: AppAdapter {
// The bright ring (#1a73e8) is ~0.89 sat but the lighter glow (#8ab4f8) is
// ~0.44, below the 0.5 default lower the threshold so the glow registers.
config.colorSaturation = 0.35
// Meet's active cue is a thin BLUE (210°) ring + audio pill. Detect thin blue
// EDGES only, gated to blue: this rejects solid camera-off avatar tiles (orange
// 30°, magenta 340°), which otherwise read as "speaking" for the whole call
// and collapse every remote voice onto one name. Validated on real fixtures.
config.coloredBorderThinOnly = true
config.colorHueRange = 180...240
// Meet's blue border is faint; real rings measure 0.200.30 interior fill while
// solid tiles measure 0.36, so allow a higher fill here than the 0.2 default to
// recover real borders without readmitting the solid-tile false positives.
config.maxInteriorFill = 0.3
config.tileExpandX = 3.0
config.tileExpandY = 5.0
self.analyzer = GridCallAnalyzer(config: config)
+102 -57
View File
@@ -82,6 +82,10 @@ enum RecapRenderer {
// MARK: - HTML
/// Mirror of recap-relay's job-output view: a header, an optional band of recap
/// cards (summary + takeaways), then a two-pane split topic list on the left,
/// full diarized transcript on the right, click a topic to jump + highlight its
/// range. Self-contained (data baked in; the click handler is inline JS).
static func html(file: SpeakersFile, result: RecapResult, title: String,
entries: [RecapAnalyzer.Entry]) -> String {
let speakers = RecapAnalyzer.orderedSpeakerNames(entries)
@@ -91,66 +95,78 @@ enum RecapRenderer {
return "<span class=\"chip\" style=\"background:\(c)\">\(esc(name))</span>"
}
var body = ""
// Header: title, meta line, speaker legend.
let sub = "\(esc(file.app)) · \(RecapAnalyzer.mmss(file.durationSec))"
+ (speakers.isEmpty ? "" : " · \(speakers.count) speaker\(speakers.count == 1 ? "" : "s")")
body += "<header><h1>\(esc(title))</h1><div class=\"sub\">\(sub)</div>"
var header = "<div class=\"header\"><div class=\"htext\"><h1>\(esc(title))</h1><div class=\"meta\">\(sub)</div></div>"
if !speakers.isEmpty {
body += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
header += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
}
body += "</header>"
header += "</div>"
// Recap cards band (summary + template takeaways).
var cards = ""
if let x = result.extras {
if !x.tldr.isEmpty {
body += card("Summary", "<p>\(esc(x.tldr))</p>"
cards += card("Summary", "<p>\(esc(x.tldr))</p>"
+ (x.primarySpeakers.isEmpty ? "" : "<p class=\"muted\">Primary: \(x.primarySpeakers.map(esc).joined(separator: ", "))</p>"))
}
for section in x.sections where !section.isEmpty {
switch section.kind {
case .paragraph:
body += card(section.title, "<p>\(esc(section.paragraph))</p>")
cards += card(section.title, "<p>\(esc(section.paragraph))</p>")
case .bullets:
body += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
cards += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
case .items:
let lis = section.items.map { item -> String in
var s = "<li>\(esc(item.text))"
if let who = item.who { s += " <strong>\(esc(who))</strong>" }
if let note = item.note { s += " <span class=\"muted\">(\(esc(note)))</span>" }
if let when = item.when { s += " <span class=\"ts\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
if let who = item.who { s += " <span class=\"who\">\(esc(who))</span>" }
if let note = item.note { s += " <span class=\"note\">(\(esc(note)))</span>" }
if let when = item.when { s += " <span class=\"ts-badge\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
return s + "</li>"
}.joined()
body += card(section.title, "<ul>\(lis)</ul>")
cards += card(section.title, "<ul>\(lis)</ul>")
}
}
}
let band = cards.isEmpty ? "" : "<div class=\"band\">\(cards)</div>"
if !result.sections.isEmpty {
var topics = ""
for (i, sec) in result.sections.enumerated() {
let range = entries.indices.contains(sec.startIndex) && entries.indices.contains(sec.endIndex)
? "<span class=\"ts\">\(RecapAnalyzer.mmss(entries[sec.startIndex].offset))\(RecapAnalyzer.mmss(entries[sec.endIndex].end))</span>" : ""
topics += "<details class=\"topic\"><summary><span class=\"tnum\">\(i + 1)</span> \(esc(sec.title)) \(range)</summary>"
if !sec.summary.isEmpty { topics += "<p>\(esc(sec.summary))</p>" }
topics += "<div class=\"turns\">" + turnsHtml(sec, entries: entries, chip: chip) + "</div></details>"
// Left pane: topic cards (click to jump). data-start/data-end index entries.
var left = "<div class=\"left\">"
if result.sections.isEmpty {
left += "<div class=\"empty\">No topic sections.</div>"
} else {
for sec in result.sections {
let s = max(0, min(sec.startIndex, entries.count - 1))
let e = max(s, min(sec.endIndex, entries.count - 1))
let time = entries.indices.contains(s) && entries.indices.contains(e)
? "<span class=\"chunk-time\">\(RecapAnalyzer.mmss(entries[s].offset))\(RecapAnalyzer.mmss(entries[e].end))</span>" : ""
left += "<div class=\"chunk\" data-start=\"\(s)\" data-end=\"\(e)\" onclick=\"jump(this)\">"
+ "<div class=\"chunk-title\">\(esc(sec.title))\(time)</div>"
+ (sec.summary.isEmpty ? "" : "<div class=\"chunk-summary\">\(esc(sec.summary))</div>")
+ "</div>"
}
body += card("Topics", topics)
}
left += "</div>"
let full = entries.map { "<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>" }.joined()
body += "<details class=\"topic\" open><summary>Full Transcript</summary><div class=\"turns\">\(full)</div></details>"
// Right pane: full diarized transcript, one line per turn (id=entry-i).
var right = "<div class=\"right\">"
if entries.isEmpty {
right += "<div class=\"empty\">No transcript.</div>"
} else {
for (i, en) in entries.enumerated() {
right += "<div class=\"transcript-line\" id=\"entry-\(i)\">"
+ "<span class=\"ts-badge\">\(RecapAnalyzer.mmss(en.offset))</span>"
+ chip(en.speaker)
+ "<span class=\"ts-text\">\(esc(en.text))</span></div>"
}
}
right += "</div>"
let body = header + band + "<div class=\"split\">\(left)\(right)</div>"
return htmlShell(title: esc(title), body: body)
}
private static func turnsHtml(_ sec: TopicSection, entries: [RecapAnalyzer.Entry],
chip: (String) -> String) -> String {
guard sec.startIndex <= sec.endIndex, entries.indices.contains(sec.startIndex), entries.indices.contains(sec.endIndex)
else { return "" }
return entries[sec.startIndex...sec.endIndex].map {
"<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>"
}.joined()
}
private static func card(_ title: String, _ inner: String) -> String {
"<section class=\"card\"><h2>\(esc(title))</h2>\(inner)</section>"
}
@@ -176,34 +192,63 @@ enum RecapRenderer {
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>\(title)</title>
<style>
:root{--bg:#15171c;--card:#1d2026;--fg:#e6e8ec;--muted:#9aa0aa;--line:#2a2e36;--accent:#5b8def;}
:root{--bg:#0a0e1a;--panel:#111827;--panel-2:#1e293b;--line:#1e293b;--line-2:#334155;
--fg:#e2e8f0;--fg-dim:#94a3b8;--fg-faint:#64748b;--accent:#818cf8;--accent-soft:#a5b4fc;}
*{box-sizing:border-box}
body{margin:0;background:var(--bg);color:var(--fg);font:15px/1.55 -apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;}
main{max-width:820px;margin:0 auto;padding:32px 20px 80px;}
header h1{margin:0 0 4px;font-size:24px}
.sub{color:var(--muted);font-size:13px}
.legend{margin-top:12px;display:flex;flex-wrap:wrap;gap:6px}
.chip{display:inline-block;padding:1px 8px;border-radius:10px;color:#fff;font-size:12px;font-weight:600}
.card{background:var(--card);border:1px solid var(--line);border-radius:12px;padding:16px 18px;margin-top:18px}
.card h2{margin:0 0 10px;font-size:16px;color:var(--accent)}
.muted{color:var(--muted)}
ul{margin:0;padding-left:18px} li{margin:4px 0}
ul.actions{list-style:none;padding-left:0}
.ts{color:var(--muted);font-variant-numeric:tabular-nums;font-size:12px;margin-right:4px}
blockquote{margin:0 0 12px;padding:8px 12px;border-left:3px solid var(--accent);background:#0e0f13;border-radius:0 8px 8px 0}
blockquote cite{display:block;color:var(--muted);font-size:12px;margin-top:4px;font-style:normal}
details.topic{border-top:1px solid var(--line);padding:10px 0}
details.topic > summary{cursor:pointer;font-weight:600;list-style:none}
details.topic > summary::-webkit-details-marker{display:none}
.tnum{display:inline-block;min-width:20px;color:var(--accent);font-weight:700}
.turns{margin-top:10px}
.turn{margin:6px 0;display:flex;gap:8px;align-items:baseline;flex-wrap:wrap}
.turn .txt{flex:1;min-width:60%}
@media print{body{background:#fff;color:#000}.card,blockquote{background:#fff;border-color:#ccc}details.topic{}.chip{border:1px solid #999}}
body{margin:0;background:var(--bg);color:var(--fg);min-height:100vh;
font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif;font-size:13px;line-height:1.55}
.header{padding:14px 24px;background:var(--panel);border-bottom:1px solid var(--line);
display:flex;align-items:center;gap:16px;flex-wrap:wrap}
.header .htext{min-width:0}
.header h1{margin:0;font-size:16px;font-weight:700;color:var(--fg)}
.header .meta{font-size:11px;color:var(--fg-faint);margin-top:2px;font-variant-numeric:tabular-nums}
.legend{margin-left:auto;display:flex;flex-wrap:wrap;gap:6px;justify-content:flex-end}
.chip{display:inline-block;padding:1px 8px;border-radius:999px;color:#fff;font-size:10px;font-weight:700;white-space:nowrap}
.band{padding:16px 24px;display:grid;gap:12px}
.card{background:var(--panel);border:1px solid var(--line);border-radius:10px;padding:14px 16px}
.card h2{margin:0 0 8px;font-size:11px;font-weight:700;text-transform:uppercase;letter-spacing:.04em;color:var(--accent-soft)}
.card p{margin:0 0 8px}
.card p:last-child{margin-bottom:0}
.card .muted{color:var(--fg-dim);font-size:12px}
.card ul{margin:0;padding-left:18px}
.card li{margin:5px 0;color:var(--fg)}
.card .who{color:var(--accent-soft);font-weight:600}
.card .note{color:var(--fg-faint)}
.split{display:flex;min-height:calc(100vh - 56px)}
.left{flex:0 0 42%;max-width:42%;border-right:1px solid var(--line);overflow-y:auto;padding:16px;background:var(--bg)}
.right{flex:1;min-width:0;overflow-y:auto;padding:16px;background:var(--panel)}
@media(max-width:900px){.split{flex-direction:column}.left,.right{flex:none;max-width:100%;border-right:none}
.left{border-bottom:1px solid var(--line)}}
.chunk{padding:12px 14px;margin-bottom:8px;background:var(--panel);border:1px solid var(--line);
border-radius:10px;cursor:pointer;transition:border-color .15s,background .15s}
.chunk:hover{border-color:var(--accent)}
.chunk.active{border-color:var(--accent);background:rgba(129,140,248,.06);box-shadow:0 2px 16px rgba(129,140,248,.10)}
.chunk-title{font-size:13px;font-weight:700;color:var(--fg);margin-bottom:4px}
.chunk-time{font-size:10px;color:var(--fg-faint);margin-left:6px;font-weight:500;font-family:"SF Mono",Menlo,monospace}
.chunk-summary{font-size:12px;color:var(--fg-dim);line-height:1.5}
.transcript-line{display:flex;gap:10px;padding:4px 8px;border-radius:6px;line-height:1.6;align-items:baseline;scroll-margin-top:16px}
.transcript-line.hl{background:rgba(129,140,248,.10)}
.ts-badge{flex:0 0 auto;font-family:"SF Mono",Menlo,monospace;font-size:11px;color:var(--accent-soft);min-width:52px}
.ts-text{flex:1;font-size:13px;color:var(--fg)}
.empty{padding:32px 16px;text-align:center;color:var(--fg-faint)}
.foot{padding:14px 24px;color:var(--fg-faint);font-size:11px;border-top:1px solid var(--line)}
@media print{body{background:#fff;color:#000}.header,.right,.left,.card,.chunk{background:#fff;border-color:#ccc}
.split{display:block}.left,.right{max-width:100%}.chip{border:1px solid #999}}
</style></head>
<body><main>\(body)
<footer class="sub" style="margin-top:40px">Ten31 Transcripts · generated on-device</footer>
</main></body></html>
<body>\(body)
<div class="foot">Ten31 Transcripts · generated on-device</div>
<script>
function jump(el){
document.querySelectorAll('.chunk.active').forEach(function(x){x.classList.remove('active')});
el.classList.add('active');
var s=+el.dataset.start, e=+el.dataset.end;
var t=document.getElementById('entry-'+s);
if(t) t.scrollIntoView({behavior:'smooth',block:'start'});
document.querySelectorAll('.transcript-line.hl').forEach(function(x){x.classList.remove('hl')});
for(var i=s;i<=e;i++){var x=document.getElementById('entry-'+i); if(x) x.classList.add('hl');}
}
</script>
</body></html>
"""
}
}
+51
View File
@@ -0,0 +1,51 @@
import Foundation
/// How long each diarization *body* chunk should be. Smaller chunks keep fewer
/// simultaneous speakers inside one window Sortformer resolves at most ~4 speakers
/// per chunk, and the dual-channel split already spends the local user on the mic
/// track, so the system (remote) channel is what can saturate on a big call. The
/// cost of going smaller: weaker cross-chunk voiceprints, more cross-chunk speaker
/// splitting (the reconciler re-merges some), and more backend round-trips.
enum ChunkMode: String, CaseIterable, Identifiable, Codable {
case auto, standard, largeGroup, fine
var id: String { rawValue }
var label: String {
switch self {
case .auto: return "Auto (by call size)"
case .standard: return "Standard · 2.5 min"
case .largeGroup: return "Large group · 60 sec"
case .fine: return "Fine · 90 sec"
}
}
/// Fixed body length, or nil for `.auto` (resolved from the participant count).
var fixedBodySeconds: Double? {
switch self {
case .auto: return nil
case .standard: return 150
case .largeGroup: return 60
case .fine: return 90
}
}
/// More than this many detected participants makes `.auto` pick the short body,
/// so one chunk is less likely to exceed Sortformer's ~4-speaker resolution.
static let autoLargeThreshold = 4
/// Resolve the body length in seconds. `.auto` drops to 60s when more than
/// `autoLargeThreshold` participants were detected, else uses the 2.5-min default;
/// with no count available (audio-only) it stays at the 2.5-min default.
func bodySeconds(participantCount: Int?) -> Double {
if let fixed = fixedBodySeconds { return fixed }
if let n = participantCount, n > Self.autoLargeThreshold { return 60 }
return 150
}
/// Overlap margin scaled to the body length (~12%, clamped 815s) so a 60s chunk
/// isn't dominated by a fixed 15s margin while a 2.5-min chunk keeps the full 15s.
static func overlapSeconds(forBody body: Double) -> Double {
max(8, min(15, (body * 0.12).rounded()))
}
}
@@ -256,6 +256,9 @@ final class SessionController: ObservableObject {
private func startVisual(t0Host: Double, generation: Int, recorder: AudioRecorder) async {
guard let capture = pendingCapture else { return } // manual recording audio-only
pendingCapture = nil
// Honor the per-app adapter switch: if the user turned this app's adapter off,
// skip screen-reading entirely and record audio-only (transcription still runs).
guard settings.adapterEnabled[capture.app.rawValue] ?? true else { return }
guard let vc = VisualCapture(app: capture.app, bundleID: capture.bundleID,
windowID: capture.windowID, t0Host: t0Host) else { return }
// Register the live capture before the await so a quit (prepareForTermination)
@@ -375,12 +378,15 @@ final class SessionController: ObservableObject {
let settings = self.settings
let pipeline = TranscriptPipeline(baseURL: settings.backendBaseURL,
skipTLS: settings.skipTLSVerification, voiceprints: voiceprints)
// Resolve the diarization chunk length from the setting; "Auto" uses the
// participant count the visual capture saw for this session.
let chunkSeconds = settings.chunk.bodySeconds(participantCount: Self.participantCount(in: inputs.folder))
do {
let speakers = try await pipeline.process(
sessionFolder: inputs.folder, sessionId: inputs.sessionId, app: inputs.app,
micURL: inputs.micURL, systemURL: inputs.systemURL, mixedURL: inputs.mixedURL,
timeline: inputs.timeline, selfSpans: inputs.selfSpans, selfName: inputs.selfName,
systemHealthy: inputs.systemHealthy,
systemHealthy: inputs.systemHealthy, chunkSeconds: chunkSeconds,
progress: { done, total in await MainActor.run { self.transcriptStatus = .processing(done, total) } })
self.transcriptStatus = .done(speakers: speakers.speakers.count, segments: speakers.segments.count)
try Task.checkCancellation()
@@ -528,6 +534,16 @@ final class SessionController: ObservableObject {
}
}
/// Detected participant count from a session's visual timeline, for "Auto" chunk
/// sizing. Nil when there's no visual timeline (audio-only) so callers keep the
/// default body length. Counts everyone OCR'd on the call, not just who spoke.
private static func participantCount(in folder: URL) -> Int? {
guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
let vt = try? JSONDecoder().decode(VisualTimeline.self, from: data),
!vt.participants.isEmpty else { return nil }
return vt.participants.count
}
/// The remote (vision) visual-timeline segments saved for a session, if any.
private static func remoteTimeline(in folder: URL) -> [VisualTimeline.Segment] {
guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
@@ -28,6 +28,7 @@ final class TranscriptPipeline {
selfSpans: [VADSpan],
selfName: String,
systemHealthy: Bool,
chunkSeconds: Double = 150,
progress: ((Int, Int) async -> Void)? = nil) async throws -> SpeakersFile {
let fm = FileManager.default
let dual = systemHealthy
@@ -36,7 +37,12 @@ final class TranscriptPipeline {
let duration = dual
? max(SessionPackager.duration(of: micURL), SessionPackager.duration(of: systemURL))
: SessionPackager.duration(of: mixedURL)
let plan = SessionPackager.planChunks(durationSec: duration)
// Chunk to the requested body length; overlap and the single-chunk threshold
// scale with it (a 60s body shouldn't be cut by a fixed 15s margin or stay
// unchunked below the 2.5-min default threshold).
let overlap = ChunkMode.overlapSeconds(forBody: chunkSeconds)
let plan = SessionPackager.planChunks(durationSec: duration, chunkSeconds: chunkSeconds,
overlapSeconds: overlap, thresholdSec: chunkSeconds * 1.2)
// Zero-duration / empty session a valid empty speakers.json, no backend call.
if plan.isEmpty || duration <= 0 {
@@ -50,13 +56,20 @@ final class TranscriptPipeline {
try? fm.createDirectory(at: chunksDir, withIntermediateDirectories: true)
defer { try? fm.removeItem(at: chunksDir) } // cleanup on success OR throw
// Defensive: drop any visual span covering most of the call in one unbroken
// segment the signature of a stuck/false active-speaker cue (e.g. a solid
// camera-off tile read as "speaking" the whole call). Such a span would
// dominate the backend's name attribution and collapse every voice onto one
// name. Also salvages sessions captured before the adapter fix landed.
let vis = Self.dropStuckSpans(timeline, duration: duration)
// Start from stored voiceprints; accumulate this call's prints across chunks
// for within-call unification (the store only persists high-confidence ones).
var known = voiceprints.knownVoiceprints()
var results: [TranscriptAssembler.ChunkResult] = []
// Mono fallback needs self folded into the timeline; dual sends it separately.
let monoTimeline = dual ? timeline
: timeline + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
let monoTimeline = dual ? vis
: vis + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
for chunk in plan {
try Task.checkCancellation()
@@ -70,7 +83,7 @@ final class TranscriptPipeline {
try SessionPackager.sliceAudio(from: micURL, startSec: chunk.start, endSec: chunk.end, to: micChunk)
try SessionPackager.sliceAudio(from: systemURL, startSec: chunk.start, endSec: chunk.end, to: sysChunk)
guard fm.fileExists(atPath: micChunk.path), fm.fileExists(atPath: sysChunk.path) else { continue }
let timelineData = try SessionPackager.rebasedTimelineData(timeline, start: chunk.start, end: chunk.end)
let timelineData = try SessionPackager.rebasedTimelineData(vis, start: chunk.start, end: chunk.end)
let selfVadData = try SessionPackager.rebasedSelfVadData(selfSpans, start: chunk.start, end: chunk.end)
response = try await client.labelMergeDual(
micURL: micChunk, systemURL: sysChunk, selfName: selfName, selfVad: selfVadData,
@@ -113,4 +126,14 @@ final class TranscriptPipeline {
static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
}
/// Drop visual (vision-source) spans whose single unbroken duration covers at
/// least `maxFraction` of the whole call no one legitimately speaks that long
/// without a break, so it's a stuck/false cue. Self spans (mic_vad) are kept.
static func dropStuckSpans(_ timeline: [VisualTimeline.Segment], duration: Double,
maxFraction: Double = 0.6) -> [VisualTimeline.Segment] {
guard duration > 0 else { return timeline }
let limit = maxFraction * duration
return timeline.filter { $0.source != "vision" || ($0.end - $0.start) < limit }
}
}
+20 -1
View File
@@ -60,6 +60,15 @@ final class AppSettings: ObservableObject {
didSet { defaults.set(reconcileSpeakers, forKey: Keys.reconcileSpeakers) }
}
/// Diarization chunk length (raw value of `ChunkMode`). `.auto` shrinks chunks on
/// large calls so a window is less likely to exceed Sortformer's ~4-speaker cap.
@Published var chunkMode: String {
didSet { defaults.set(chunkMode, forKey: Keys.chunkMode) }
}
/// Typed accessor for `chunkMode`.
var chunk: ChunkMode { ChunkMode(rawValue: chunkMode) ?? .auto }
/// User-editable recap templates (takeaways categories per meeting type).
@Published var recapTemplates: [RecapTemplate] {
didSet { persist(recapTemplates, forKey: Keys.recapTemplates) }
@@ -83,11 +92,19 @@ final class AppSettings: ObservableObject {
private let defaults: UserDefaults
/// Neutral placeholder. The real (private LAN) backend host is never committed
/// it's entered in Settings (persisted to UserDefaults) or seeded from the
/// `SPARK_BACKEND_URL` env var for dev/CI/harness runs.
static let defaultBackendURL = "https://your-spark-backend.local"
init(defaults: UserDefaults = .standard) {
self.defaults = defaults
// Precedence: a value the user saved in Settings wins; else the env var
// (handy when launching from Xcode/terminal); else the placeholder.
self.backendBaseURL = defaults.string(forKey: Keys.backendBaseURL)
?? "https://immense-voyage.local:62419"
?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
?? Self.defaultBackendURL
self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true
@@ -104,6 +121,7 @@ final class AppSettings: ObservableObject {
self.autoSendOnStop = defaults.object(forKey: Keys.autoSend) as? Bool ?? false
self.recapEnabled = defaults.object(forKey: Keys.recapEnabled) as? Bool ?? true
self.reconcileSpeakers = defaults.object(forKey: Keys.reconcileSpeakers) as? Bool ?? true
self.chunkMode = defaults.string(forKey: Keys.chunkMode) ?? ChunkMode.auto.rawValue
let loaded = (defaults.data(forKey: Keys.recapTemplates))
.flatMap { try? JSONDecoder().decode([RecapTemplate].self, from: $0) }
@@ -126,6 +144,7 @@ final class AppSettings: ObservableObject {
static let autoSend = "autoSendOnStop"
static let recapEnabled = "recapEnabled"
static let reconcileSpeakers = "reconcileSpeakers"
static let chunkMode = "chunkMode"
static let recapTemplates = "recapTemplates"
static let defaultTemplate = "defaultTemplateId"
}
+29
View File
@@ -31,6 +31,35 @@ final class EditorWindow {
}
}
/// Hosts the app Settings in a standalone resizable window. Far roomier than the
/// old in-popover NavigationLink, which cramped the form into the 320pt menu-bar
/// panel and hid most controls below a non-obvious scroll.
@MainActor
final class SettingsWindow {
static let shared = SettingsWindow()
private var window: NSWindow?
func show(settings: AppSettings) {
if let window {
NSApp.activate(ignoringOtherApps: true)
window.makeKeyAndOrderFront(nil)
return
}
let w = NSWindow(
contentRect: NSRect(x: 0, y: 0, width: 520, height: 660),
styleMask: [.titled, .closable, .resizable, .miniaturizable],
backing: .buffered, defer: false)
w.title = "Settings"
w.isReleasedWhenClosed = false
w.center()
w.contentViewController = NSHostingController(
rootView: SettingsView().environmentObject(settings))
window = w
NSApp.activate(ignoringOtherApps: true)
w.makeKeyAndOrderFront(nil)
}
}
/// Hosts the recap-templates manager in its own resizable window.
@MainActor
final class TemplatesWindow {
+13 -17
View File
@@ -10,21 +10,19 @@ struct MenuBarView: View {
@EnvironmentObject private var session: SessionController
var body: some View {
NavigationStack {
VStack(alignment: .leading, spacing: 12) {
header
Divider()
recordingSection
Divider()
permissionsSection
Divider()
backendSection
Divider()
footer
}
.padding(14)
.frame(width: 320)
VStack(alignment: .leading, spacing: 12) {
header
Divider()
recordingSection
Divider()
permissionsSection
Divider()
backendSection
Divider()
footer
}
.padding(14)
.frame(width: 320)
.onAppear { permissions.refresh() }
.task { await refreshHealth() }
}
@@ -227,9 +225,7 @@ struct MenuBarView: View {
private var footer: some View {
HStack {
NavigationLink("Settings…") {
SettingsView()
}
Button("Settings…") { SettingsWindow.shared.show(settings: settings) }
Spacer()
Button("Quit") { NSApplication.shared.terminate(nil) }
}
+31 -5
View File
@@ -7,6 +7,21 @@ struct SettingsView: View {
var body: some View {
Form {
Section("Your name") {
TextField("Your name", text: $settings.selfName)
.textFieldStyle(.roundedBorder)
if isDefaultName {
Label("Still set to the default. Enter your real name so your own voice is labeled correctly — and so the AI never gives your name to someone else.",
systemImage: "exclamationmark.triangle.fill")
.font(.caption)
.foregroundStyle(.orange)
} else {
Text("Labels your microphone channel as you in every transcript, and reserves this name so its never assigned to another speaker.")
.font(.caption)
.foregroundStyle(.secondary)
}
}
Section("SparkControl backend") {
TextField("Base URL", text: $settings.backendBaseURL)
.textFieldStyle(.roundedBorder)
@@ -22,10 +37,14 @@ struct SettingsView: View {
}
Section("Transcription") {
TextField("Your name", text: $settings.selfName)
.textFieldStyle(.roundedBorder)
Toggle("Auto-send recordings to backend", isOn: $settings.autoSendOnStop)
Toggle("Reconcile speakers (merge splits + name from content)", isOn: $settings.reconcileSpeakers)
Picker("Chunk length", selection: $settings.chunkMode) {
ForEach(ChunkMode.allCases) { Text($0.label).tag($0.rawValue) }
}
Text("How finely audio is split for diarization. Shorter chunks keep fewer simultaneous speakers per window (the diarizer resolves ~4 at a time), at some cost to speed and voice matching. Auto uses 60-sec chunks when more than \(ChunkMode.autoLargeThreshold) people are detected on the call, else 2.5 min.")
.font(.caption)
.foregroundStyle(.secondary)
Toggle("Build readable recap (topics + highlights)", isOn: $settings.recapEnabled)
HStack {
Picker("Default recap template", selection: $settings.defaultTemplateId) {
@@ -33,7 +52,7 @@ struct SettingsView: View {
}
Button("Manage…") { TemplatesWindow.shared.show(settings: settings) }
}
Text("Your name labels your mic channel. Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
Text("Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
.font(.caption)
.foregroundStyle(.secondary)
}
@@ -50,7 +69,7 @@ struct SettingsView: View {
}
Section("Adapters") {
Text("Inert in Phase 0 — these toggles only persist for now.")
Text("Screen-reading for active-speaker cues. Turn one off to record that app audio-only — transcription still runs, but speakers arent identified from the screen.")
.font(.caption)
.foregroundStyle(.secondary)
ForEach(AppSettings.adapterKeys, id: \.key) { adapter in
@@ -59,10 +78,17 @@ struct SettingsView: View {
}
}
.formStyle(.grouped)
.frame(width: 320)
.frame(minWidth: 460, idealWidth: 520, maxWidth: .infinity,
minHeight: 520, idealHeight: 660, maxHeight: .infinity)
.navigationTitle("Settings")
}
/// True while the user still has the placeholder name drives the inline nudge.
private var isDefaultName: Bool {
let n = settings.selfName.trimmingCharacters(in: .whitespacesAndNewlines)
return n.isEmpty || n.caseInsensitiveCompare("Me") == .orderedSame
}
private func binding(for key: String) -> Binding<Bool> {
Binding(
get: { settings.adapterEnabled[key] ?? true },
@@ -120,6 +120,43 @@ struct FrameSampler {
return points
}
/// Grid-sampled saturated pixels that lie on a THIN structure (a non-saturated
/// pixel within `edgeGap` on some axis) the coloured counterpart of
/// `thinWhitePoints`. This keeps a thin speaking BORDER/ring/pill but drops the
/// solid interior of a colour FILL (e.g. Meet's orange/magenta camera-off avatar
/// tiles), whose pixels are surrounded by the same colour. Pair with `hueRange`
/// to keep only the cue's colour (Meet's blue ring) and reject the thin edges a
/// solid tile still has against the background (orange/magenta boundaries).
func thinColoredPoints(threshold: Double = 0.35, minBrightness: Double = 60,
hueRange: ClosedRange<Double>? = nil,
edgeGap: Int = 6, gridStep: Int = 4) -> [CGPoint] {
func isCue(_ x: Int, _ y: Int) -> Bool {
guard x >= 0, x < width, y >= 0, y < height else { return false }
let i = (y * width + x) * 4
let r = Double(pixels[i]), g = Double(pixels[i + 1]), b = Double(pixels[i + 2])
let mx = max(r, g, b), mn = min(r, g, b)
let sat = mx > 0 ? (mx - mn) / mx : 0
guard sat > threshold, mx > minBrightness else { return false }
if let hr = hueRange { return hr.contains(Self.hueDegrees(r, g, b, mx, mn)) }
return true
}
var points: [CGPoint] = []
var y = edgeGap
while y < height - edgeGap {
var x = edgeGap
while x < width - edgeGap {
if isCue(x, y) {
let thin = !isCue(x - edgeGap, y) || !isCue(x + edgeGap, y)
|| !isCue(x, y - edgeGap) || !isCue(x, y + edgeGap)
if thin { points.append(CGPoint(x: x, y: y)) }
}
x += gridStep
}
y += gridStep
}
return points
}
/// HSV hue in degrees (0360) from RGB and its precomputed max/min channels.
private static func hueDegrees(_ r: Double, _ g: Double, _ b: Double, _ mx: Double, _ mn: Double) -> Double {
let d = mx - mn
+43 -4
View File
@@ -35,11 +35,21 @@ struct GridCallAnalyzer {
var colorSaturation: Double = 0.5
var colorMinBrightness: Double = 60
var colorHueRange: ClosedRange<Double>? = nil
// When true, the coloured highlight is detected from THIN edges only (drops
// solid colour fills like Meet's camera-off avatar tiles). Pair with a tight
// `colorHueRange` so a solid tile's thin background boundary is rejected too.
var coloredBorderThinOnly = false
var minTextConfidence: Float = 0.3
var maxNameLength = 40
var minHighlightPoints = 6
var highlightShareOfMax = 0.35
var minRingSpan: Double = 60 // a speaking border spans a sizable box, not a speck
// A real active-speaker cue is a thin RING (border) with an EMPTY interior.
// A solid camera-off avatar tile (Meet's orange/magenta fill) or a screen-share
// fill is a filled BLOB its highlight points spread through the interior. Reject
// a component when more than this fraction of its points fall in the central
// 60%×60% of its bbox (a hollow ring 0; a solid fill 0.36). Set 1 to disable.
var maxInteriorFill: Double = 0.2
}
var config = Config()
@@ -68,9 +78,13 @@ struct GridCallAnalyzer {
// Highlight pixels: coloured (saturated) and/or white (thin near-white).
var highlight: [CGPoint] = []
if config.detectColoredBorder {
highlight += sampler.saturatedPoints(threshold: config.colorSaturation,
minBrightness: config.colorMinBrightness,
hueRange: config.colorHueRange)
highlight += config.coloredBorderThinOnly
? sampler.thinColoredPoints(threshold: config.colorSaturation,
minBrightness: config.colorMinBrightness,
hueRange: config.colorHueRange)
: sampler.saturatedPoints(threshold: config.colorSaturation,
minBrightness: config.colorMinBrightness,
hueRange: config.colorHueRange)
}
if config.detectWhiteBorder { highlight += sampler.thinWhitePoints() }
@@ -89,7 +103,8 @@ struct GridCallAnalyzer {
var speakingBBox: [Int: CGRect] = [:] // tile index -> the ring bbox marking it speaking
for ring in rings where ring.count >= config.minHighlightPoints {
let bb = Self.boundingBox(ring)
guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue } // a ring, not a blob
guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue } // a ring, not a speck
guard Self.isHollow(ring, bbox: bb, maxInteriorFill: config.maxInteriorFill) else { continue } // a ring, not a filled tile
for (i, tile) in tiles.enumerated() where bb.contains(CGPoint(x: tile.textRect.midX, y: tile.textRect.midY)) {
speakingBBox[i] = bb
}
@@ -128,6 +143,18 @@ struct GridCallAnalyzer {
return Array(groups.values)
}
/// True if `pts` form a hollow ring (border) rather than a filled blob: at most
/// `maxInteriorFill` of the points fall in the central 60%×60% of `bbox`. A thin
/// border has an empty interior ( 0); a solid camera-off avatar tile or a
/// screen-share fill spreads points through the interior ( 0.36). Disabled when
/// `maxInteriorFill >= 1`.
static func isHollow(_ pts: [CGPoint], bbox: CGRect, maxInteriorFill: Double) -> Bool {
guard maxInteriorFill < 1, !pts.isEmpty else { return true }
let inner = bbox.insetBy(dx: bbox.width * 0.2, dy: bbox.height * 0.2)
let innerCount = pts.reduce(into: 0) { if inner.contains($1) { $0 += 1 } }
return Double(innerCount) / Double(pts.count) <= maxInteriorFill
}
static func boundingBox(_ pts: [CGPoint]) -> CGRect {
var minX = Double.greatestFiniteMagnitude, minY = minX, maxX = -minX, maxY = -minX
for p in pts { minX = min(minX, p.x); minY = min(minY, p.y); maxX = max(maxX, p.x); maxY = max(maxY, p.y) }
@@ -166,7 +193,11 @@ struct GridCallAnalyzer {
}
private func cleaned(_ s: String) -> String {
// Trim whitespace and any trailing punctuation OCR tacks on, so "Mark." folds
// into "Mark" rather than becoming a separate phantom speaker.
s.trimmingCharacters(in: .whitespacesAndNewlines)
.trimmingCharacters(in: CharacterSet(charactersIn: ".,;:·•-"))
.trimmingCharacters(in: .whitespacesAndNewlines)
}
/// True if `s` looks like a participant name label rather than UI chrome. Call
@@ -181,6 +212,14 @@ struct GridCallAnalyzer {
if s.rangeOfCharacter(from: CharacterSet(charactersIn: "@:/\\|+*=<>#0123456789")) != nil {
return false
}
// Reject domain-like screen-share text (e.g. "WERUNBTC.COM", OCR'd "WERUNBTC.GOM"):
// a token whose final dotted segment is a 24 letter suffix. Real names don't end
// in a TLD; this keeps "Cait's Phone" and initials like "MO".
let lower = s.lowercased()
if let dot = lower.lastIndex(of: "."), lower.index(after: dot) < lower.endIndex {
let suffix = lower[lower.index(after: dot)...]
if (2...4).contains(suffix.count) && suffix.allSatisfy({ $0.isLetter }) { return false }
}
let words = s.split(separator: " ")
guard (1...3).contains(words.count) else { return false }
let allowed = CharacterSet.letters.union(CharacterSet(charactersIn: "'.-"))
@@ -15,9 +15,15 @@ final class TimelineBuilder {
private let closeFrames: Int
private var aliases: [String: String] = [:] // normalized variant -> canonical
private var states: [String: NameState] = [:]
private var observed: Set<String> = [] // every tile name seen (speaking or not)
private var lastFrameT: Double = 0
private(set) var segments: [VisualTimeline.Segment] = []
/// Every distinct participant name the adapter has OCR'd, whether or not they were
/// ever detected speaking the call-size signal (drives "Auto" chunk sizing and a
/// complete participant roster, since speaking-detection is intentionally sparse).
var observedNames: [String] { observed.sorted() }
init(openFrames: Int = 2, closeFrames: Int = 2) {
self.openFrames = max(1, openFrames)
self.closeFrames = max(1, closeFrames)
@@ -34,6 +40,9 @@ final class TimelineBuilder {
func ingest(_ observations: [SpeakerObservation], at t: TimeInterval) {
lastFrameT = t
// Record every tile seen (speaking or not) for the participant roster / call size.
for obs in observations where !obs.name.isEmpty { observed.insert(canonical(obs.name)) }
// Best confidence per canonical name that is speaking this frame.
var speaking: [String: Double] = [:]
for obs in observations where obs.speaking && !obs.name.isEmpty {
@@ -93,9 +102,57 @@ final class TimelineBuilder {
closeSegment(name: name, state: st)
states[name]?.open = false
}
segments = Self.canonicalizeByFrequency(segments)
segments.sort { $0.start < $1.start }
}
/// Fold rare OCR misspellings into the dominant name they're a typo of: a name with
/// little total time is remapped to a much longer-running name with the same initial
/// within a small edit distance (e.g. "Matt Odel"/"MattOdell"/"Mare" "Matt Odell"/
/// "Mark"). Conservative by design it won't merge two well-attested speakers, only
/// a transient variant into its clearly-dominant canonical. Pure/testable.
static func canonicalizeByFrequency(_ segs: [VisualTimeline.Segment],
minorMaxSec: Double = 5, dominanceRatio: Double = 8,
maxEdits: Int = 2) -> [VisualTimeline.Segment] {
var dur: [String: Double] = [:]
for s in segs { dur[s.name, default: 0] += s.end - s.start }
let names = Array(dur.keys)
var remap: [String: String] = [:]
for minor in names {
let md = dur[minor]!
guard md <= minorMaxSec, let mInit = minor.first else { continue }
var best: String?, bestDur = 0.0
for major in names where major != minor {
let Md = dur[major]!
guard Md >= md * dominanceRatio, Md > bestDur, major.first == mInit else { continue }
if levenshtein(minor.lowercased(), major.lowercased()) <= maxEdits { best = major; bestDur = Md }
}
if let b = best { remap[minor] = b }
}
guard !remap.isEmpty else { return segs }
return segs.map { s in
remap[s.name].map { VisualTimeline.Segment(start: s.start, end: s.end, name: $0,
confidence: s.confidence, source: s.source) } ?? s
}
}
/// Levenshtein edit distance (small strings names).
static func levenshtein(_ a: String, _ b: String) -> Int {
let x = Array(a), y = Array(b)
if x.isEmpty { return y.count }; if y.isEmpty { return x.count }
var prev = Array(0...y.count)
var cur = [Int](repeating: 0, count: y.count + 1)
for i in 1...x.count {
cur[0] = i
for j in 1...y.count {
cur[j] = x[i-1] == y[j-1] ? prev[j-1]
: Swift.min(prev[j-1], prev[j], cur[j-1]) + 1
}
swap(&prev, &cur)
}
return prev[y.count]
}
// MARK: - Internal
private struct NameState {
+4 -1
View File
@@ -75,7 +75,10 @@ final class VisualCapture {
}, to: durationSec)
let artifact = (vision + selfSegs).sorted { $0.start < $1.start }
let names = Set(artifact.map { $0.name })
// Roster = everyone OCR'd (speaking or not) the names that produced segments,
// so the participant count reflects true call size even when few people were
// detected speaking. Drives "Auto" chunk sizing downstream.
let names = Set(artifact.map { $0.name }).union(observer.participantNames())
let participants = names.sorted().map {
VisualTimeline.Participant(name: $0, isSelf: $0 == selfName ? true : nil, aliases: nil)
}
@@ -114,6 +114,10 @@ final class VisualObserver: NSObject, SCStreamDelegate, SCStreamOutput {
queue.sync { builder.mergeSelfSpans(spans, selfName: selfName) }
}
/// Every distinct participant name OCR'd over the session (read on the builder's
/// queue; safe to call after `stop`).
func participantNames() -> [String] { queue.sync { builder.observedNames } }
// MARK: - SCStreamOutput (on `queue`)
func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
@@ -138,16 +138,37 @@ final class GridCallAnalyzerTests: XCTestCase {
func testNameFilterAgainstRealMeetOCR() {
// The exact strings OCR pulled from a real Meet session only the first
// group are participants; the rest are UI chrome that must NOT become speakers.
let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me"]
let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me", "Matt Odell"]
let junk = ["11:43 AM | rvo-rmjg-rdq", "@ Embassy Er", "Admit 1 guest",
"Joined as grant.gilliam@gmail.com", "Others may see your video differently",
"Others might still see your full video.", "Your meeting's ready", "efforot",
"g* Add others", "g+ Add others", "meet.google.com/rvo-rmjg-rdq",
"permission before they can join.", "the meeting", "G"]
"permission before they can join.", "the meeting", "G",
// Screen-share domain text OCR'd as a name (incl. OCR'd TLDs).
"WERUNBTC.COM", "WERUNBTG.COM", "WERUNBTC.GOM"]
for n in names { XCTAssertTrue(GridCallAnalyzer.isLikelyName(n), "should keep name: \(n)") }
for j in junk { XCTAssertFalse(GridCallAnalyzer.isLikelyName(j), "should drop junk: \(j)") }
}
func testHollowRingKeptFilledTileRejected() {
// A thin ring (border): points only on the perimeter of a 120×120 box.
var ring: [CGPoint] = []
for t in stride(from: 0.0, through: 120, by: 4) {
ring.append(.init(x: t, y: 0)); ring.append(.init(x: t, y: 120))
ring.append(.init(x: 0, y: t)); ring.append(.init(x: 120, y: t))
}
let rbb = GridCallAnalyzer.boundingBox(ring)
XCTAssertTrue(GridCallAnalyzer.isHollow(ring, bbox: rbb, maxInteriorFill: 0.2))
// A solid fill (camera-off avatar tile): points across the whole box.
var blob: [CGPoint] = []
for x in stride(from: 0.0, through: 120, by: 4) {
for y in stride(from: 0.0, through: 120, by: 4) { blob.append(.init(x: x, y: y)) }
}
let bbb = GridCallAnalyzer.boundingBox(blob)
XCTAssertFalse(GridCallAnalyzer.isHollow(blob, bbox: bbb, maxInteriorFill: 0.2))
}
func testWhiteBorderDetectorIgnoresColouredBorder() {
// Signal looks only for the white border, so a coloured (Meet) border must
// not register as a Signal speaker.
+39
View File
@@ -37,6 +37,45 @@ final class Phase5Tests: XCTestCase {
XCTAssertEqual(asm.speakersFile.segments[0].start, 152, accuracy: 0.01)
}
func testChunkModeResolvesBodyLength() {
// Fixed presets ignore participant count.
XCTAssertEqual(ChunkMode.standard.bodySeconds(participantCount: 99), 150)
XCTAssertEqual(ChunkMode.largeGroup.bodySeconds(participantCount: 2), 60)
XCTAssertEqual(ChunkMode.fine.bodySeconds(participantCount: nil), 90)
// Auto: >4 detected 60s, 4 150s, unknown 150s.
XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 6), 60)
XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 4), 150)
XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: nil), 150)
}
func testChunkOverlapScalesWithBody() {
XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 150), 15) // capped
XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 60), 8) // floored (60*0.12=7.28)
XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 90), 11) // 90*0.12=10.811
}
func testPlanChunksShortBodyChunksAShortCall() {
// A 100s call would be ONE chunk at the 2.5-min default, but at a 60s body it
// splits so "Large group" actually re-chunks medium calls.
let c = SessionPackager.planChunks(durationSec: 100, chunkSeconds: 60,
overlapSeconds: 8, thresholdSec: 72)
XCTAssertEqual(c.count, 2)
XCTAssertEqual(c[0].bodyStart, 0); XCTAssertEqual(c[0].bodyEnd, 60)
XCTAssertEqual(c[1].bodyStart, 60); XCTAssertEqual(c[1].bodyEnd, 100)
}
func testDropStuckSpansRemovesWholeCallCue() {
let segs = [
VisualTimeline.Segment(start: 0, end: 1900, name: "Grant Gilliam", confidence: 1, source: "vision"), // stuck whole-call tile
VisualTimeline.Segment(start: 100, end: 130, name: "Matt Odell", confidence: 0.9, source: "vision"), // real
VisualTimeline.Segment(start: 0, end: 1900, name: "Grant", confidence: 1, source: "mic_vad"), // self span: keep
]
let out = TranscriptPipeline.dropStuckSpans(segs, duration: 1976)
XCTAssertFalse(out.contains { $0.name == "Grant Gilliam" }) // 96% of call in one span dropped
XCTAssertTrue(out.contains { $0.name == "Matt Odell" }) // short real span kept
XCTAssertTrue(out.contains { $0.source == "mic_vad" }) // self never dropped
}
func testRebaseClipsAndRebases() throws {
let segs = [
VisualTimeline.Segment(start: 140, end: 160, name: "A", confidence: 0.9, source: "vision"),
@@ -11,6 +11,27 @@ final class VisualObserverTests: XCTestCase {
(id, CGRect(x: 0, y: 0, width: w, height: h))
}
func testCanonicalizeFoldsOcrMisspellingsIntoDominantName() {
func seg(_ s: Double, _ e: Double, _ n: String) -> VisualTimeline.Segment {
.init(start: s, end: e, name: n, confidence: 0.9, source: "vision")
}
let segs = [
seg(0, 1689, "Matt Odell"), // dominant
seg(1700, 1702, "Matt Odel"), // OCR typo fold
seg(1702, 1702.3, "MattOdell"), // dropped-space typo fold
seg(0, 1155, "Mark"), // dominant
seg(1200, 1201, "Mare"), // OCR typo fold into Mark
seg(0, 4, "Sidisel"), // screen junk, no near-twin kept (dropped later, no voice match)
]
let names = Set(TimelineBuilder.canonicalizeByFrequency(segs).map { $0.name })
XCTAssertTrue(names.contains("Matt Odell"))
XCTAssertTrue(names.contains("Mark"))
XCTAssertFalse(names.contains("Matt Odel"))
XCTAssertFalse(names.contains("MattOdell"))
XCTAssertFalse(names.contains("Mare"))
XCTAssertTrue(names.contains("Sidisel"))
}
func testPrefersMatchingWindowIDOverLargest() {
// The Meet window (id 42) is NOT the largest must still be chosen by ID.
let candidates = [c(7, 1600, 1000), c(42, 800, 600), c(9, 1200, 900)]
+5 -4
View File
@@ -135,10 +135,11 @@ Full request/response shapes, curl examples, limits, and error formats are in
## 7. Remaining open items (small)
1. **Base URL — RESOLVED.** `https://192.168.1.72:62419`, also
`https://immense-voyage.local:62419` (prefer the `.local` form; it survives IP
changes). Ship the `.local` host as the default; keep it editable in settings.
Service-discovery at `GET /api/endpoints`.
1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
over a raw IP, since it survives IP changes) — configured in Settings or via the
`SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
the default; keep it editable in settings. Service-discovery at
`GET /api/endpoints`.
2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
toggle if the user wants to eyeball the timeline first.
3. **Retention** — keep the session folder after a successful hand-off, or prune
+6 -5
View File
@@ -76,12 +76,13 @@ locally — the mic track is the user's known identity / VAD source.)
## 3. SparkControl — connection (real)
- **Base URL (confirmed):** `https://192.168.1.72:62419` — also reachable at
`https://immense-voyage.local:62419` (the `.local` form survives IP changes;
**prefer it as the default**). Service-discovery JSON is at
- **Base URL (confirmed):** a private LAN host — a `.local` mDNS name (preferred
over a raw IP; it survives IP changes) — configured in Settings or via the
`SPARK_BACKEND_URL` env var, and **never committed**. Service-discovery JSON is at
`GET /api/endpoints` (returns current vLLM / Parakeet / Kokoro URLs). All audio
endpoints in §4–§5 hang off this base. Still **make it a setting** so the host
can change, but ship `https://immense-voyage.local:62419` as the default.
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
the default.
- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
the Start9 Root CA into the trust store.
+8 -5
View File
@@ -7,17 +7,20 @@ options:
createIntermediateGroups: true
groupSortPosition: top
# Signing identity (DEVELOPMENT_TEAM) is kept out of source in a gitignored xcconfig
# so the Team ID isn't committed. Copy Config/Signing.xcconfig.example to
# Config/Signing.xcconfig and set your team. Keeping the value stable is what makes
# macOS TCC grants (Mic / Screen Recording / Accessibility) persist across rebuilds.
configFiles:
Debug: Config/Signing.xcconfig
Release: Config/Signing.xcconfig
settings:
base:
MARKETING_VERSION: "0.1.0"
CURRENT_PROJECT_VERSION: "1"
SWIFT_VERSION: "5.0"
CODE_SIGN_STYLE: Automatic
# Grant's free personal team (cert OU). Baked in so `xcodegen generate` keeps
# a STABLE signing identity across regenerations — macOS ties TCC permission
# grants (Mic / Screen Recording / Accessibility) to this identity, so a
# stable team is what makes those permissions persist across rebuilds.
DEVELOPMENT_TEAM: "BK4Y6CXN35"
targets:
Ten31Transcripts: