Add agent instructions; extract signing/backend secrets from source

- Add AGENTS.md (canonical) + CLAUDE.md symlink + ROADMAP.md - Move Apple Team ID from project.yml into a gitignored Config/Signing.xcconfig via configFiles; commit the .example template - Replace hardcoded backend host in AppSettings with a neutral placeholder + SPARK_BACKEND_URL env-var fallback - Scrub the Team ID, .local host, and raw LAN IP from README/docs - Ignore Config/Signing.xcconfig and .env
Make diarization chunk length configurable (Auto + presets)
2026-06-13 12:23:54 -05:00 · 2026-06-09 10:15:16 -05:00 · 2026-06-08 21:00:56 -05:00 · 2026-06-08 16:21:45 -05:00 · 2026-06-08 16:18:52 -05:00 · 2026-06-08 15:52:27 -05:00
26 changed files with 691 additions and 116 deletions
@@ -17,3 +17,10 @@ build/

 # Personal call screenshots / fixtures (faces, contact names) — never commit
 example-screenshots/
+
+# Local signing identity (Apple Team ID) — keep out of source; template is committed
+Config/Signing.xcconfig
+
+# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
+.env
+.env.local
@@ -0,0 +1,89 @@
+# AGENTS.md — Ten31 Transcripts
+
+Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
+
+## Stack (versions that matter)
+- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
+- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
+- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
+- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
+- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
+
+## Commands
+First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
+```
+cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
+```
+Regenerate the Xcode project (after adding/removing/renaming any source file):
+```
+xcodegen generate
+```
+Build + run all tests:
+```
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
+```
+Run a **single** test (target/class/method):
+```
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
+  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
+```
+Build only: replace `test` with `build`. **Lint/format:** none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in `ROADMAP.md`.
+Build a standalone app and install/run it (Xcode does **not** need to stay open):
+```
+DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
+  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
+  -configuration Release -derivedDataPath /tmp/ten31-release build
+ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
+open /Applications/Ten31Transcripts.app
+```
+**Fast validation harness** (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x` and run against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.
+
+## Layout (day one)
+- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
+- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
+- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
+- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
+- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
+- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
+- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
+- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
+- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
+- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
+
+## Conventions
+- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
+- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
+- Commits: imperative mood, concise; authored by Grant. **No remote is configured** — confirm where to push (choosing one is tracked in `ROADMAP.md`). Branch before committing; never commit to `main` without asking.
+- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
+- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
+
+## Always
+- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
+- Run `xcodegen generate` after adding/removing/renaming source files.
+- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
+- Identify **self by the mic channel** + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
+- Treat visual active-speaker cues as **naming hints over audio diarization** (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
+- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls **sequential** (one in flight).
+- After any code change, rebuild Release + `ditto` to `/Applications` — the installed copy does **not** auto-update.
+
+## Never
+- **Never write video frames to disk** — analyze in-memory and release immediately (privacy non-negotiable).
+- **Never add Co-Authored-By / "Generated with" / any AI or tool attribution** to commits or PRs.
+- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
+- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
+- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
+- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
+- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
+- Never commit to `main` or force-push a shared branch; branch first and ask.
+
+## Current state
+Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs.
+- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
+- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
+- **Decided but not implemented:** none open (deferred items live in `ROADMAP.md`).
+- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
+- **Next:** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) confirm Settings → Your name = "Grant"; (3) record a fresh Meet call to validate the fix on a clean capture; (4) decide a git remote and push.
@@ -0,0 +1 @@
+AGENTS.md
@@ -0,0 +1,4 @@
+// Template for Config/Signing.xcconfig (which is gitignored).
+// Copy to Config/Signing.xcconfig and set your Apple Developer Team ID
+// (Xcode ▸ Settings ▸ Accounts, or `security find-identity -p codesigning -v`).
+DEVELOPMENT_TEAM = YOUR_APPLE_TEAM_ID
@@ -14,25 +14,30 @@ This repo is at **Phase 0** (scaffold, permissions, backend health check).
   ```sh
   brew install xcodegen
   ```
-3. **Generate the project:**
+3. **Set your signing team.** The Apple Team ID is kept out of source in a
+   gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
+   ```sh
+   cp Config/Signing.xcconfig.example Config/Signing.xcconfig   # then set DEVELOPMENT_TEAM
+   ```
+   `xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
+   team automatically — no manual selection. Keep the value stable so macOS
+   preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
+   not Xcode — `xcodegen generate` overwrites Xcode-side changes.
+4. **Generate the project:**
   ```sh
   xcodegen generate
   ```
   This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
-4. **Open it:**
+5. **Open it:**
   ```sh
   open Ten31Transcripts.xcodeproj
   ```
-5. Signing is preconfigured: `project.yml` sets `DEVELOPMENT_TEAM` to the free
-   personal team `BK4Y6CXN35` with automatic signing, so **Signing & Capabilities
-   should already show the team** — no manual selection needed. (If you ever sign
-   with a different Apple ID, update `DEVELOPMENT_TEAM` in `project.yml`, not in
-   Xcode — `xcodegen generate` overwrites Xcode-side changes.)
 6. Press **Run** (⌘R).

 > **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
 > Xcode reload the project. The signing team persists because it lives in
-> `project.yml`, so macOS permissions stay granted across rebuilds.
+> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
+> rebuilds.

 ## What Phase 0 does

@@ -64,5 +69,6 @@ Ten31TranscriptsTests/          # placeholder; real tests land in Phase 3

 - **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
  LAN-only tool that must observe other apps. Revisit only if distributing.
- The default backend host is `https://immense-voyage.local:62419` (editable in
-  Settings).
+- The backend host is a private LAN address — set it in **Settings**, or seed it
+  from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
+  placeholder (`https://your-spark-backend.local`).
@@ -0,0 +1,27 @@
+# ROADMAP — Ten31 Transcripts
+
+Longer-term backlog and deferred decisions. Near-term status + the next few steps live in `AGENTS.md` → Current state.
+
+## Visual detection
+- Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
+- Geometric screen-share exclusion: ignore OCR text in the shared-screen region (needs layout detection). Today only the domain filter + stuck-span guard catch share-text-as-speaker.
+- Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
+- Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
+- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
+- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
+
+## Audio / speakers
+- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
+- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
+
+## App / UX
+- Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
+- Constrain recap reading width on very wide windows (long line length in the summary band).
+
+## Tooling / repo
+- Decide and configure a git remote (none set); then push.
+- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
+- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
+
+## Deferred decisions
+- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
@@ -32,6 +32,16 @@ struct MeetAdapter: AppAdapter {
        // The bright ring (#1a73e8) is ~0.89 sat but the lighter glow (#8ab4f8) is
        // ~0.44, below the 0.5 default — lower the threshold so the glow registers.
        config.colorSaturation = 0.35
+        // Meet's active cue is a thin BLUE (≈210°) ring + audio pill. Detect thin blue
+        // EDGES only, gated to blue: this rejects solid camera-off avatar tiles (orange
+        // ≈30°, magenta ≈340°), which otherwise read as "speaking" for the whole call
+        // and collapse every remote voice onto one name. Validated on real fixtures.
+        config.coloredBorderThinOnly = true
+        config.colorHueRange = 180...240
+        // Meet's blue border is faint; real rings measure ≈0.20–0.30 interior fill while
+        // solid tiles measure ≈0.36, so allow a higher fill here than the 0.2 default to
+        // recover real borders without readmitting the solid-tile false positives.
+        config.maxInteriorFill = 0.3
        config.tileExpandX = 3.0
        config.tileExpandY = 5.0
        self.analyzer = GridCallAnalyzer(config: config)
@@ -82,6 +82,10 @@ enum RecapRenderer {

    // MARK: - HTML

+    /// Mirror of recap-relay's job-output view: a header, an optional band of recap
+    /// cards (summary + takeaways), then a two-pane split — topic list on the left,
+    /// full diarized transcript on the right, click a topic to jump + highlight its
+    /// range. Self-contained (data baked in; the click handler is inline JS).
    static func html(file: SpeakersFile, result: RecapResult, title: String,
                     entries: [RecapAnalyzer.Entry]) -> String {
        let speakers = RecapAnalyzer.orderedSpeakerNames(entries)
@@ -91,66 +95,78 @@ enum RecapRenderer {
            return "<span class=\"chip\" style=\"background:\(c)\">\(esc(name))</span>"
        }

-        var body = ""
+        // Header: title, meta line, speaker legend.
        let sub = "\(esc(file.app)) · \(RecapAnalyzer.mmss(file.durationSec))"
            + (speakers.isEmpty ? "" : " · \(speakers.count) speaker\(speakers.count == 1 ? "" : "s")")
-        body += "<header><h1>\(esc(title))</h1><div class=\"sub\">\(sub)</div>"
+        var header = "<div class=\"header\"><div class=\"htext\"><h1>\(esc(title))</h1><div class=\"meta\">\(sub)</div></div>"
        if !speakers.isEmpty {
-            body += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
+            header += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
        }
-        body += "</header>"
+        header += "</div>"

+        // Recap cards band (summary + template takeaways).
+        var cards = ""
        if let x = result.extras {
            if !x.tldr.isEmpty {
-                body += card("Summary", "<p>\(esc(x.tldr))</p>"
+                cards += card("Summary", "<p>\(esc(x.tldr))</p>"
                    + (x.primarySpeakers.isEmpty ? "" : "<p class=\"muted\">Primary: \(x.primarySpeakers.map(esc).joined(separator: ", "))</p>"))
            }
            for section in x.sections where !section.isEmpty {
                switch section.kind {
                case .paragraph:
-                    body += card(section.title, "<p>\(esc(section.paragraph))</p>")
+                    cards += card(section.title, "<p>\(esc(section.paragraph))</p>")
                case .bullets:
-                    body += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
+                    cards += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
                case .items:
                    let lis = section.items.map { item -> String in
                        var s = "<li>\(esc(item.text))"
-                        if let who = item.who { s += " <strong>\(esc(who))</strong>" }
-                        if let note = item.note { s += " <span class=\"muted\">(\(esc(note)))</span>" }
-                        if let when = item.when { s += " <span class=\"ts\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
+                        if let who = item.who { s += " <span class=\"who\">\(esc(who))</span>" }
+                        if let note = item.note { s += " <span class=\"note\">(\(esc(note)))</span>" }
+                        if let when = item.when { s += " <span class=\"ts-badge\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
                        return s + "</li>"
                    }.joined()
-                    body += card(section.title, "<ul>\(lis)</ul>")
+                    cards += card(section.title, "<ul>\(lis)</ul>")
                }
            }
        }
+        let band = cards.isEmpty ? "" : "<div class=\"band\">\(cards)</div>"

-        if !result.sections.isEmpty {
-            var topics = ""
-            for (i, sec) in result.sections.enumerated() {
-                let range = entries.indices.contains(sec.startIndex) && entries.indices.contains(sec.endIndex)
-                    ? "<span class=\"ts\">\(RecapAnalyzer.mmss(entries[sec.startIndex].offset))–\(RecapAnalyzer.mmss(entries[sec.endIndex].end))</span>" : ""
-                topics += "<details class=\"topic\"><summary><span class=\"tnum\">\(i + 1)</span> \(esc(sec.title)) \(range)</summary>"
-                if !sec.summary.isEmpty { topics += "<p>\(esc(sec.summary))</p>" }
-                topics += "<div class=\"turns\">" + turnsHtml(sec, entries: entries, chip: chip) + "</div></details>"
+        // Left pane: topic cards (click to jump). data-start/data-end index entries.
+        var left = "<div class=\"left\">"
+        if result.sections.isEmpty {
+            left += "<div class=\"empty\">No topic sections.</div>"
+        } else {
+            for sec in result.sections {
+                let s = max(0, min(sec.startIndex, entries.count - 1))
+                let e = max(s, min(sec.endIndex, entries.count - 1))
+                let time = entries.indices.contains(s) && entries.indices.contains(e)
+                    ? "<span class=\"chunk-time\">\(RecapAnalyzer.mmss(entries[s].offset)) — \(RecapAnalyzer.mmss(entries[e].end))</span>" : ""
+                left += "<div class=\"chunk\" data-start=\"\(s)\" data-end=\"\(e)\" onclick=\"jump(this)\">"
+                    + "<div class=\"chunk-title\">\(esc(sec.title))\(time)</div>"
+                    + (sec.summary.isEmpty ? "" : "<div class=\"chunk-summary\">\(esc(sec.summary))</div>")
+                    + "</div>"
            }
-            body += card("Topics", topics)
        }
+        left += "</div>"

-        let full = entries.map { "<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>" }.joined()
-        body += "<details class=\"topic\" open><summary>Full Transcript</summary><div class=\"turns\">\(full)</div></details>"
+        // Right pane: full diarized transcript, one line per turn (id=entry-i).
+        var right = "<div class=\"right\">"
+        if entries.isEmpty {
+            right += "<div class=\"empty\">No transcript.</div>"
+        } else {
+            for (i, en) in entries.enumerated() {
+                right += "<div class=\"transcript-line\" id=\"entry-\(i)\">"
+                    + "<span class=\"ts-badge\">\(RecapAnalyzer.mmss(en.offset))</span>"
+                    + chip(en.speaker)
+                    + "<span class=\"ts-text\">\(esc(en.text))</span></div>"
+            }
+        }
+        right += "</div>"

+        let body = header + band + "<div class=\"split\">\(left)\(right)</div>"
        return htmlShell(title: esc(title), body: body)
    }

-    private static func turnsHtml(_ sec: TopicSection, entries: [RecapAnalyzer.Entry],
-                                  chip: (String) -> String) -> String {
-        guard sec.startIndex <= sec.endIndex, entries.indices.contains(sec.startIndex), entries.indices.contains(sec.endIndex)
-        else { return "" }
-        return entries[sec.startIndex...sec.endIndex].map {
-            "<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>"
-        }.joined()
-    }
-
    private static func card(_ title: String, _ inner: String) -> String {
        "<section class=\"card\"><h2>\(esc(title))</h2>\(inner)</section>"
    }
@@ -176,34 +192,63 @@ enum RecapRenderer {
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <title>\(title)</title>
        <style>
-        :root{--bg:#15171c;--card:#1d2026;--fg:#e6e8ec;--muted:#9aa0aa;--line:#2a2e36;--accent:#5b8def;}
+        :root{--bg:#0a0e1a;--panel:#111827;--panel-2:#1e293b;--line:#1e293b;--line-2:#334155;
+        --fg:#e2e8f0;--fg-dim:#94a3b8;--fg-faint:#64748b;--accent:#818cf8;--accent-soft:#a5b4fc;}
        *{box-sizing:border-box}
-        body{margin:0;background:var(--bg);color:var(--fg);font:15px/1.55 -apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;}
-        main{max-width:820px;margin:0 auto;padding:32px 20px 80px;}
-        header h1{margin:0 0 4px;font-size:24px}
-        .sub{color:var(--muted);font-size:13px}
-        .legend{margin-top:12px;display:flex;flex-wrap:wrap;gap:6px}
-        .chip{display:inline-block;padding:1px 8px;border-radius:10px;color:#fff;font-size:12px;font-weight:600}
-        .card{background:var(--card);border:1px solid var(--line);border-radius:12px;padding:16px 18px;margin-top:18px}
-        .card h2{margin:0 0 10px;font-size:16px;color:var(--accent)}
-        .muted{color:var(--muted)}
-        ul{margin:0;padding-left:18px} li{margin:4px 0}
-        ul.actions{list-style:none;padding-left:0}
-        .ts{color:var(--muted);font-variant-numeric:tabular-nums;font-size:12px;margin-right:4px}
-        blockquote{margin:0 0 12px;padding:8px 12px;border-left:3px solid var(--accent);background:#0e0f13;border-radius:0 8px 8px 0}
-        blockquote cite{display:block;color:var(--muted);font-size:12px;margin-top:4px;font-style:normal}
-        details.topic{border-top:1px solid var(--line);padding:10px 0}
-        details.topic > summary{cursor:pointer;font-weight:600;list-style:none}
-        details.topic > summary::-webkit-details-marker{display:none}
-        .tnum{display:inline-block;min-width:20px;color:var(--accent);font-weight:700}
-        .turns{margin-top:10px}
-        .turn{margin:6px 0;display:flex;gap:8px;align-items:baseline;flex-wrap:wrap}
-        .turn .txt{flex:1;min-width:60%}
-        @media print{body{background:#fff;color:#000}.card,blockquote{background:#fff;border-color:#ccc}details.topic{}.chip{border:1px solid #999}}
+        body{margin:0;background:var(--bg);color:var(--fg);min-height:100vh;
+        font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif;font-size:13px;line-height:1.55}
+        .header{padding:14px 24px;background:var(--panel);border-bottom:1px solid var(--line);
+        display:flex;align-items:center;gap:16px;flex-wrap:wrap}
+        .header .htext{min-width:0}
+        .header h1{margin:0;font-size:16px;font-weight:700;color:var(--fg)}
+        .header .meta{font-size:11px;color:var(--fg-faint);margin-top:2px;font-variant-numeric:tabular-nums}
+        .legend{margin-left:auto;display:flex;flex-wrap:wrap;gap:6px;justify-content:flex-end}
+        .chip{display:inline-block;padding:1px 8px;border-radius:999px;color:#fff;font-size:10px;font-weight:700;white-space:nowrap}
+        .band{padding:16px 24px;display:grid;gap:12px}
+        .card{background:var(--panel);border:1px solid var(--line);border-radius:10px;padding:14px 16px}
+        .card h2{margin:0 0 8px;font-size:11px;font-weight:700;text-transform:uppercase;letter-spacing:.04em;color:var(--accent-soft)}
+        .card p{margin:0 0 8px}
+        .card p:last-child{margin-bottom:0}
+        .card .muted{color:var(--fg-dim);font-size:12px}
+        .card ul{margin:0;padding-left:18px}
+        .card li{margin:5px 0;color:var(--fg)}
+        .card .who{color:var(--accent-soft);font-weight:600}
+        .card .note{color:var(--fg-faint)}
+        .split{display:flex;min-height:calc(100vh - 56px)}
+        .left{flex:0 0 42%;max-width:42%;border-right:1px solid var(--line);overflow-y:auto;padding:16px;background:var(--bg)}
+        .right{flex:1;min-width:0;overflow-y:auto;padding:16px;background:var(--panel)}
+        @media(max-width:900px){.split{flex-direction:column}.left,.right{flex:none;max-width:100%;border-right:none}
+        .left{border-bottom:1px solid var(--line)}}
+        .chunk{padding:12px 14px;margin-bottom:8px;background:var(--panel);border:1px solid var(--line);
+        border-radius:10px;cursor:pointer;transition:border-color .15s,background .15s}
+        .chunk:hover{border-color:var(--accent)}
+        .chunk.active{border-color:var(--accent);background:rgba(129,140,248,.06);box-shadow:0 2px 16px rgba(129,140,248,.10)}
+        .chunk-title{font-size:13px;font-weight:700;color:var(--fg);margin-bottom:4px}
+        .chunk-time{font-size:10px;color:var(--fg-faint);margin-left:6px;font-weight:500;font-family:"SF Mono",Menlo,monospace}
+        .chunk-summary{font-size:12px;color:var(--fg-dim);line-height:1.5}
+        .transcript-line{display:flex;gap:10px;padding:4px 8px;border-radius:6px;line-height:1.6;align-items:baseline;scroll-margin-top:16px}
+        .transcript-line.hl{background:rgba(129,140,248,.10)}
+        .ts-badge{flex:0 0 auto;font-family:"SF Mono",Menlo,monospace;font-size:11px;color:var(--accent-soft);min-width:52px}
+        .ts-text{flex:1;font-size:13px;color:var(--fg)}
+        .empty{padding:32px 16px;text-align:center;color:var(--fg-faint)}
+        .foot{padding:14px 24px;color:var(--fg-faint);font-size:11px;border-top:1px solid var(--line)}
+        @media print{body{background:#fff;color:#000}.header,.right,.left,.card,.chunk{background:#fff;border-color:#ccc}
+        .split{display:block}.left,.right{max-width:100%}.chip{border:1px solid #999}}
        </style></head>
-        <body><main>\(body)
-        <footer class="sub" style="margin-top:40px">Ten31 Transcripts · generated on-device</footer>
-        </main></body></html>
+        <body>\(body)
+        <div class="foot">Ten31 Transcripts · generated on-device</div>
+        <script>
+        function jump(el){
+          document.querySelectorAll('.chunk.active').forEach(function(x){x.classList.remove('active')});
+          el.classList.add('active');
+          var s=+el.dataset.start, e=+el.dataset.end;
+          var t=document.getElementById('entry-'+s);
+          if(t) t.scrollIntoView({behavior:'smooth',block:'start'});
+          document.querySelectorAll('.transcript-line.hl').forEach(function(x){x.classList.remove('hl')});
+          for(var i=s;i<=e;i++){var x=document.getElementById('entry-'+i); if(x) x.classList.add('hl');}
+        }
+        </script>
+        </body></html>
        """
    }
 }
@@ -0,0 +1,51 @@
+import Foundation
+
+/// How long each diarization *body* chunk should be. Smaller chunks keep fewer
+/// simultaneous speakers inside one window — Sortformer resolves at most ~4 speakers
+/// per chunk, and the dual-channel split already spends the local user on the mic
+/// track, so the system (remote) channel is what can saturate on a big call. The
+/// cost of going smaller: weaker cross-chunk voiceprints, more cross-chunk speaker
+/// splitting (the reconciler re-merges some), and more backend round-trips.
+enum ChunkMode: String, CaseIterable, Identifiable, Codable {
+    case auto, standard, largeGroup, fine
+
+    var id: String { rawValue }
+
+    var label: String {
+        switch self {
+        case .auto:       return "Auto (by call size)"
+        case .standard:   return "Standard · 2.5 min"
+        case .largeGroup: return "Large group · 60 sec"
+        case .fine:       return "Fine · 90 sec"
+        }
+    }
+
+    /// Fixed body length, or nil for `.auto` (resolved from the participant count).
+    var fixedBodySeconds: Double? {
+        switch self {
+        case .auto:       return nil
+        case .standard:   return 150
+        case .largeGroup: return 60
+        case .fine:       return 90
+        }
+    }
+
+    /// More than this many detected participants makes `.auto` pick the short body,
+    /// so one chunk is less likely to exceed Sortformer's ~4-speaker resolution.
+    static let autoLargeThreshold = 4
+
+    /// Resolve the body length in seconds. `.auto` drops to 60s when more than
+    /// `autoLargeThreshold` participants were detected, else uses the 2.5-min default;
+    /// with no count available (audio-only) it stays at the 2.5-min default.
+    func bodySeconds(participantCount: Int?) -> Double {
+        if let fixed = fixedBodySeconds { return fixed }
+        if let n = participantCount, n > Self.autoLargeThreshold { return 60 }
+        return 150
+    }
+
+    /// Overlap margin scaled to the body length (~12%, clamped 8…15s) so a 60s chunk
+    /// isn't dominated by a fixed 15s margin while a 2.5-min chunk keeps the full 15s.
+    static func overlapSeconds(forBody body: Double) -> Double {
+        max(8, min(15, (body * 0.12).rounded()))
+    }
+}
@@ -256,6 +256,9 @@ final class SessionController: ObservableObject {
    private func startVisual(t0Host: Double, generation: Int, recorder: AudioRecorder) async {
        guard let capture = pendingCapture else { return }   // manual recording → audio-only
        pendingCapture = nil
+        // Honor the per-app adapter switch: if the user turned this app's adapter off,
+        // skip screen-reading entirely and record audio-only (transcription still runs).
+        guard settings.adapterEnabled[capture.app.rawValue] ?? true else { return }
        guard let vc = VisualCapture(app: capture.app, bundleID: capture.bundleID,
                                     windowID: capture.windowID, t0Host: t0Host) else { return }
        // Register the live capture before the await so a quit (prepareForTermination)
@@ -375,12 +378,15 @@ final class SessionController: ObservableObject {
        let settings = self.settings
        let pipeline = TranscriptPipeline(baseURL: settings.backendBaseURL,
                                          skipTLS: settings.skipTLSVerification, voiceprints: voiceprints)
+        // Resolve the diarization chunk length from the setting; "Auto" uses the
+        // participant count the visual capture saw for this session.
+        let chunkSeconds = settings.chunk.bodySeconds(participantCount: Self.participantCount(in: inputs.folder))
        do {
            let speakers = try await pipeline.process(
                sessionFolder: inputs.folder, sessionId: inputs.sessionId, app: inputs.app,
                micURL: inputs.micURL, systemURL: inputs.systemURL, mixedURL: inputs.mixedURL,
                timeline: inputs.timeline, selfSpans: inputs.selfSpans, selfName: inputs.selfName,
-                systemHealthy: inputs.systemHealthy,
+                systemHealthy: inputs.systemHealthy, chunkSeconds: chunkSeconds,
                progress: { done, total in await MainActor.run { self.transcriptStatus = .processing(done, total) } })
            self.transcriptStatus = .done(speakers: speakers.speakers.count, segments: speakers.segments.count)
            try Task.checkCancellation()
@@ -528,6 +534,16 @@ final class SessionController: ObservableObject {
        }
    }

+    /// Detected participant count from a session's visual timeline, for "Auto" chunk
+    /// sizing. Nil when there's no visual timeline (audio-only) so callers keep the
+    /// default body length. Counts everyone OCR'd on the call, not just who spoke.
+    private static func participantCount(in folder: URL) -> Int? {
+        guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
+              let vt = try? JSONDecoder().decode(VisualTimeline.self, from: data),
+              !vt.participants.isEmpty else { return nil }
+        return vt.participants.count
+    }
+
    /// The remote (vision) visual-timeline segments saved for a session, if any.
    private static func remoteTimeline(in folder: URL) -> [VisualTimeline.Segment] {
        guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
@@ -28,6 +28,7 @@ final class TranscriptPipeline {
                 selfSpans: [VADSpan],
                 selfName: String,
                 systemHealthy: Bool,
+                 chunkSeconds: Double = 150,
                 progress: ((Int, Int) async -> Void)? = nil) async throws -> SpeakersFile {
        let fm = FileManager.default
        let dual = systemHealthy
@@ -36,7 +37,12 @@ final class TranscriptPipeline {
        let duration = dual
            ? max(SessionPackager.duration(of: micURL), SessionPackager.duration(of: systemURL))
            : SessionPackager.duration(of: mixedURL)
-        let plan = SessionPackager.planChunks(durationSec: duration)
+        // Chunk to the requested body length; overlap and the single-chunk threshold
+        // scale with it (a 60s body shouldn't be cut by a fixed 15s margin or stay
+        // unchunked below the 2.5-min default threshold).
+        let overlap = ChunkMode.overlapSeconds(forBody: chunkSeconds)
+        let plan = SessionPackager.planChunks(durationSec: duration, chunkSeconds: chunkSeconds,
+                                              overlapSeconds: overlap, thresholdSec: chunkSeconds * 1.2)

        // Zero-duration / empty session → a valid empty speakers.json, no backend call.
        if plan.isEmpty || duration <= 0 {
@@ -50,13 +56,20 @@ final class TranscriptPipeline {
        try? fm.createDirectory(at: chunksDir, withIntermediateDirectories: true)
        defer { try? fm.removeItem(at: chunksDir) }   // cleanup on success OR throw

+        // Defensive: drop any visual span covering most of the call in one unbroken
+        // segment — the signature of a stuck/false active-speaker cue (e.g. a solid
+        // camera-off tile read as "speaking" the whole call). Such a span would
+        // dominate the backend's name attribution and collapse every voice onto one
+        // name. Also salvages sessions captured before the adapter fix landed.
+        let vis = Self.dropStuckSpans(timeline, duration: duration)
+
        // Start from stored voiceprints; accumulate this call's prints across chunks
        // for within-call unification (the store only persists high-confidence ones).
        var known = voiceprints.knownVoiceprints()
        var results: [TranscriptAssembler.ChunkResult] = []
        // Mono fallback needs self folded into the timeline; dual sends it separately.
-        let monoTimeline = dual ? timeline
-            : timeline + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
+        let monoTimeline = dual ? vis
+            : vis + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)

        for chunk in plan {
            try Task.checkCancellation()
@@ -70,7 +83,7 @@ final class TranscriptPipeline {
                try SessionPackager.sliceAudio(from: micURL, startSec: chunk.start, endSec: chunk.end, to: micChunk)
                try SessionPackager.sliceAudio(from: systemURL, startSec: chunk.start, endSec: chunk.end, to: sysChunk)
                guard fm.fileExists(atPath: micChunk.path), fm.fileExists(atPath: sysChunk.path) else { continue }
-                let timelineData = try SessionPackager.rebasedTimelineData(timeline, start: chunk.start, end: chunk.end)
+                let timelineData = try SessionPackager.rebasedTimelineData(vis, start: chunk.start, end: chunk.end)
                let selfVadData = try SessionPackager.rebasedSelfVadData(selfSpans, start: chunk.start, end: chunk.end)
                response = try await client.labelMergeDual(
                    micURL: micChunk, systemURL: sysChunk, selfName: selfName, selfVad: selfVadData,
@@ -113,4 +126,14 @@ final class TranscriptPipeline {
    static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
        spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
    }
+
+    /// Drop visual (vision-source) spans whose single unbroken duration covers at
+    /// least `maxFraction` of the whole call — no one legitimately speaks that long
+    /// without a break, so it's a stuck/false cue. Self spans (mic_vad) are kept.
+    static func dropStuckSpans(_ timeline: [VisualTimeline.Segment], duration: Double,
+                               maxFraction: Double = 0.6) -> [VisualTimeline.Segment] {
+        guard duration > 0 else { return timeline }
+        let limit = maxFraction * duration
+        return timeline.filter { $0.source != "vision" || ($0.end - $0.start) < limit }
+    }
 }
@@ -60,6 +60,15 @@ final class AppSettings: ObservableObject {
        didSet { defaults.set(reconcileSpeakers, forKey: Keys.reconcileSpeakers) }
    }

+    /// Diarization chunk length (raw value of `ChunkMode`). `.auto` shrinks chunks on
+    /// large calls so a window is less likely to exceed Sortformer's ~4-speaker cap.
+    @Published var chunkMode: String {
+        didSet { defaults.set(chunkMode, forKey: Keys.chunkMode) }
+    }
+
+    /// Typed accessor for `chunkMode`.
+    var chunk: ChunkMode { ChunkMode(rawValue: chunkMode) ?? .auto }
+
    /// User-editable recap templates (takeaways categories per meeting type).
    @Published var recapTemplates: [RecapTemplate] {
        didSet { persist(recapTemplates, forKey: Keys.recapTemplates) }
@@ -83,11 +92,19 @@ final class AppSettings: ObservableObject {

    private let defaults: UserDefaults

+    /// Neutral placeholder. The real (private LAN) backend host is never committed —
+    /// it's entered in Settings (persisted to UserDefaults) or seeded from the
+    /// `SPARK_BACKEND_URL` env var for dev/CI/harness runs.
+    static let defaultBackendURL = "https://your-spark-backend.local"
+
    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults

+        // Precedence: a value the user saved in Settings wins; else the env var
+        // (handy when launching from Xcode/terminal); else the placeholder.
        self.backendBaseURL = defaults.string(forKey: Keys.backendBaseURL)
-            ?? "https://immense-voyage.local:62419"
+            ?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
+            ?? Self.defaultBackendURL

        self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true

@@ -104,6 +121,7 @@ final class AppSettings: ObservableObject {
        self.autoSendOnStop = defaults.object(forKey: Keys.autoSend) as? Bool ?? false
        self.recapEnabled = defaults.object(forKey: Keys.recapEnabled) as? Bool ?? true
        self.reconcileSpeakers = defaults.object(forKey: Keys.reconcileSpeakers) as? Bool ?? true
+        self.chunkMode = defaults.string(forKey: Keys.chunkMode) ?? ChunkMode.auto.rawValue

        let loaded = (defaults.data(forKey: Keys.recapTemplates))
            .flatMap { try? JSONDecoder().decode([RecapTemplate].self, from: $0) }
@@ -126,6 +144,7 @@ final class AppSettings: ObservableObject {
        static let autoSend = "autoSendOnStop"
        static let recapEnabled = "recapEnabled"
        static let reconcileSpeakers = "reconcileSpeakers"
+        static let chunkMode = "chunkMode"
        static let recapTemplates = "recapTemplates"
        static let defaultTemplate = "defaultTemplateId"
    }
@@ -31,6 +31,35 @@ final class EditorWindow {
    }
 }

+/// Hosts the app Settings in a standalone resizable window. Far roomier than the
+/// old in-popover NavigationLink, which cramped the form into the 320pt menu-bar
+/// panel and hid most controls below a non-obvious scroll.
+@MainActor
+final class SettingsWindow {
+    static let shared = SettingsWindow()
+    private var window: NSWindow?
+
+    func show(settings: AppSettings) {
+        if let window {
+            NSApp.activate(ignoringOtherApps: true)
+            window.makeKeyAndOrderFront(nil)
+            return
+        }
+        let w = NSWindow(
+            contentRect: NSRect(x: 0, y: 0, width: 520, height: 660),
+            styleMask: [.titled, .closable, .resizable, .miniaturizable],
+            backing: .buffered, defer: false)
+        w.title = "Settings"
+        w.isReleasedWhenClosed = false
+        w.center()
+        w.contentViewController = NSHostingController(
+            rootView: SettingsView().environmentObject(settings))
+        window = w
+        NSApp.activate(ignoringOtherApps: true)
+        w.makeKeyAndOrderFront(nil)
+    }
+}
+
 /// Hosts the recap-templates manager in its own resizable window.
@MainActor
 final class TemplatesWindow {
@@ -10,21 +10,19 @@ struct MenuBarView: View {
    @EnvironmentObject private var session: SessionController

    var body: some View {
-        NavigationStack {
-            VStack(alignment: .leading, spacing: 12) {
-                header
-                Divider()
-                recordingSection
-                Divider()
-                permissionsSection
-                Divider()
-                backendSection
-                Divider()
-                footer
-            }
-            .padding(14)
-            .frame(width: 320)
+        VStack(alignment: .leading, spacing: 12) {
+            header
+            Divider()
+            recordingSection
+            Divider()
+            permissionsSection
+            Divider()
+            backendSection
+            Divider()
+            footer
        }
+        .padding(14)
+        .frame(width: 320)
        .onAppear { permissions.refresh() }
        .task { await refreshHealth() }
    }
@@ -227,9 +225,7 @@ struct MenuBarView: View {

    private var footer: some View {
        HStack {
-            NavigationLink("Settings…") {
-                SettingsView()
-            }
+            Button("Settings…") { SettingsWindow.shared.show(settings: settings) }
            Spacer()
            Button("Quit") { NSApplication.shared.terminate(nil) }
        }
@@ -7,6 +7,21 @@ struct SettingsView: View {

    var body: some View {
        Form {
+            Section("Your name") {
+                TextField("Your name", text: $settings.selfName)
+                    .textFieldStyle(.roundedBorder)
+                if isDefaultName {
+                    Label("Still set to the default. Enter your real name so your own voice is labeled correctly — and so the AI never gives your name to someone else.",
+                          systemImage: "exclamationmark.triangle.fill")
+                        .font(.caption)
+                        .foregroundStyle(.orange)
+                } else {
+                    Text("Labels your microphone channel as you in every transcript, and reserves this name so it’s never assigned to another speaker.")
+                        .font(.caption)
+                        .foregroundStyle(.secondary)
+                }
+            }
+
            Section("SparkControl backend") {
                TextField("Base URL", text: $settings.backendBaseURL)
                    .textFieldStyle(.roundedBorder)
@@ -22,10 +37,14 @@ struct SettingsView: View {
            }

            Section("Transcription") {
-                TextField("Your name", text: $settings.selfName)
-                    .textFieldStyle(.roundedBorder)
                Toggle("Auto-send recordings to backend", isOn: $settings.autoSendOnStop)
                Toggle("Reconcile speakers (merge splits + name from content)", isOn: $settings.reconcileSpeakers)
+                Picker("Chunk length", selection: $settings.chunkMode) {
+                    ForEach(ChunkMode.allCases) { Text($0.label).tag($0.rawValue) }
+                }
+                Text("How finely audio is split for diarization. Shorter chunks keep fewer simultaneous speakers per window (the diarizer resolves ~4 at a time), at some cost to speed and voice matching. Auto uses 60-sec chunks when more than \(ChunkMode.autoLargeThreshold) people are detected on the call, else 2.5 min.")
+                    .font(.caption)
+                    .foregroundStyle(.secondary)
                Toggle("Build readable recap (topics + highlights)", isOn: $settings.recapEnabled)
                HStack {
                    Picker("Default recap template", selection: $settings.defaultTemplateId) {
@@ -33,7 +52,7 @@ struct SettingsView: View {
                    }
                    Button("Manage…") { TemplatesWindow.shared.show(settings: settings) }
                }
-                Text("Your name labels your mic channel. Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
+                Text("Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
                    .font(.caption)
                    .foregroundStyle(.secondary)
            }
@@ -50,7 +69,7 @@ struct SettingsView: View {
            }

            Section("Adapters") {
-                Text("Inert in Phase 0 — these toggles only persist for now.")
+                Text("Screen-reading for active-speaker cues. Turn one off to record that app audio-only — transcription still runs, but speakers aren’t identified from the screen.")
                    .font(.caption)
                    .foregroundStyle(.secondary)
                ForEach(AppSettings.adapterKeys, id: \.key) { adapter in
@@ -59,10 +78,17 @@ struct SettingsView: View {
            }
        }
        .formStyle(.grouped)
-        .frame(width: 320)
+        .frame(minWidth: 460, idealWidth: 520, maxWidth: .infinity,
+               minHeight: 520, idealHeight: 660, maxHeight: .infinity)
        .navigationTitle("Settings")
    }

+    /// True while the user still has the placeholder name — drives the inline nudge.
+    private var isDefaultName: Bool {
+        let n = settings.selfName.trimmingCharacters(in: .whitespacesAndNewlines)
+        return n.isEmpty || n.caseInsensitiveCompare("Me") == .orderedSame
+    }
+
    private func binding(for key: String) -> Binding<Bool> {
        Binding(
            get: { settings.adapterEnabled[key] ?? true },
@@ -120,6 +120,43 @@ struct FrameSampler {
        return points
    }

+    /// Grid-sampled saturated pixels that lie on a THIN structure (a non-saturated
+    /// pixel within `edgeGap` on some axis) — the coloured counterpart of
+    /// `thinWhitePoints`. This keeps a thin speaking BORDER/ring/pill but drops the
+    /// solid interior of a colour FILL (e.g. Meet's orange/magenta camera-off avatar
+    /// tiles), whose pixels are surrounded by the same colour. Pair with `hueRange`
+    /// to keep only the cue's colour (Meet's blue ring) and reject the thin edges a
+    /// solid tile still has against the background (orange/magenta boundaries).
+    func thinColoredPoints(threshold: Double = 0.35, minBrightness: Double = 60,
+                           hueRange: ClosedRange<Double>? = nil,
+                           edgeGap: Int = 6, gridStep: Int = 4) -> [CGPoint] {
+        func isCue(_ x: Int, _ y: Int) -> Bool {
+            guard x >= 0, x < width, y >= 0, y < height else { return false }
+            let i = (y * width + x) * 4
+            let r = Double(pixels[i]), g = Double(pixels[i + 1]), b = Double(pixels[i + 2])
+            let mx = max(r, g, b), mn = min(r, g, b)
+            let sat = mx > 0 ? (mx - mn) / mx : 0
+            guard sat > threshold, mx > minBrightness else { return false }
+            if let hr = hueRange { return hr.contains(Self.hueDegrees(r, g, b, mx, mn)) }
+            return true
+        }
+        var points: [CGPoint] = []
+        var y = edgeGap
+        while y < height - edgeGap {
+            var x = edgeGap
+            while x < width - edgeGap {
+                if isCue(x, y) {
+                    let thin = !isCue(x - edgeGap, y) || !isCue(x + edgeGap, y)
+                        || !isCue(x, y - edgeGap) || !isCue(x, y + edgeGap)
+                    if thin { points.append(CGPoint(x: x, y: y)) }
+                }
+                x += gridStep
+            }
+            y += gridStep
+        }
+        return points
+    }
+
    /// HSV hue in degrees (0…360) from RGB and its precomputed max/min channels.
    private static func hueDegrees(_ r: Double, _ g: Double, _ b: Double, _ mx: Double, _ mn: Double) -> Double {
        let d = mx - mn
@@ -35,11 +35,21 @@ struct GridCallAnalyzer {
        var colorSaturation: Double = 0.5
        var colorMinBrightness: Double = 60
        var colorHueRange: ClosedRange<Double>? = nil
+        // When true, the coloured highlight is detected from THIN edges only (drops
+        // solid colour fills like Meet's camera-off avatar tiles). Pair with a tight
+        // `colorHueRange` so a solid tile's thin background boundary is rejected too.
+        var coloredBorderThinOnly = false
        var minTextConfidence: Float = 0.3
        var maxNameLength = 40
        var minHighlightPoints = 6
        var highlightShareOfMax = 0.35
        var minRingSpan: Double = 60   // a speaking border spans a sizable box, not a speck
+        // A real active-speaker cue is a thin RING (border) with an EMPTY interior.
+        // A solid camera-off avatar tile (Meet's orange/magenta fill) or a screen-share
+        // fill is a filled BLOB — its highlight points spread through the interior. Reject
+        // a component when more than this fraction of its points fall in the central
+        // 60%×60% of its bbox (a hollow ring ≈ 0; a solid fill ≈ 0.36). Set ≥ 1 to disable.
+        var maxInteriorFill: Double = 0.2
    }

    var config = Config()
@@ -68,9 +78,13 @@ struct GridCallAnalyzer {
        // Highlight pixels: coloured (saturated) and/or white (thin near-white).
        var highlight: [CGPoint] = []
        if config.detectColoredBorder {
-            highlight += sampler.saturatedPoints(threshold: config.colorSaturation,
-                                                 minBrightness: config.colorMinBrightness,
-                                                 hueRange: config.colorHueRange)
+            highlight += config.coloredBorderThinOnly
+                ? sampler.thinColoredPoints(threshold: config.colorSaturation,
+                                            minBrightness: config.colorMinBrightness,
+                                            hueRange: config.colorHueRange)
+                : sampler.saturatedPoints(threshold: config.colorSaturation,
+                                          minBrightness: config.colorMinBrightness,
+                                          hueRange: config.colorHueRange)
        }
        if config.detectWhiteBorder { highlight += sampler.thinWhitePoints() }

@@ -89,7 +103,8 @@ struct GridCallAnalyzer {
        var speakingBBox: [Int: CGRect] = [:]      // tile index -> the ring bbox marking it speaking
        for ring in rings where ring.count >= config.minHighlightPoints {
            let bb = Self.boundingBox(ring)
-            guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue }   // a ring, not a blob
+            guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue }   // a ring, not a speck
+            guard Self.isHollow(ring, bbox: bb, maxInteriorFill: config.maxInteriorFill) else { continue }   // a ring, not a filled tile
            for (i, tile) in tiles.enumerated() where bb.contains(CGPoint(x: tile.textRect.midX, y: tile.textRect.midY)) {
                speakingBBox[i] = bb
            }
@@ -128,6 +143,18 @@ struct GridCallAnalyzer {
        return Array(groups.values)
    }

+    /// True if `pts` form a hollow ring (border) rather than a filled blob: at most
+    /// `maxInteriorFill` of the points fall in the central 60%×60% of `bbox`. A thin
+    /// border has an empty interior (≈ 0); a solid camera-off avatar tile or a
+    /// screen-share fill spreads points through the interior (≈ 0.36). Disabled when
+    /// `maxInteriorFill >= 1`.
+    static func isHollow(_ pts: [CGPoint], bbox: CGRect, maxInteriorFill: Double) -> Bool {
+        guard maxInteriorFill < 1, !pts.isEmpty else { return true }
+        let inner = bbox.insetBy(dx: bbox.width * 0.2, dy: bbox.height * 0.2)
+        let innerCount = pts.reduce(into: 0) { if inner.contains($1) { $0 += 1 } }
+        return Double(innerCount) / Double(pts.count) <= maxInteriorFill
+    }
+
    static func boundingBox(_ pts: [CGPoint]) -> CGRect {
        var minX = Double.greatestFiniteMagnitude, minY = minX, maxX = -minX, maxY = -minX
        for p in pts { minX = min(minX, p.x); minY = min(minY, p.y); maxX = max(maxX, p.x); maxY = max(maxY, p.y) }
@@ -166,7 +193,11 @@ struct GridCallAnalyzer {
    }

    private func cleaned(_ s: String) -> String {
+        // Trim whitespace and any trailing punctuation OCR tacks on, so "Mark." folds
+        // into "Mark" rather than becoming a separate phantom speaker.
        s.trimmingCharacters(in: .whitespacesAndNewlines)
+         .trimmingCharacters(in: CharacterSet(charactersIn: ".,;:·•-"))
+         .trimmingCharacters(in: .whitespacesAndNewlines)
    }

    /// True if `s` looks like a participant name label rather than UI chrome. Call
@@ -181,6 +212,14 @@ struct GridCallAnalyzer {
        if s.rangeOfCharacter(from: CharacterSet(charactersIn: "@:/\\|+*=<>#0123456789")) != nil {
            return false
        }
+        // Reject domain-like screen-share text (e.g. "WERUNBTC.COM", OCR'd "WERUNBTC.GOM"):
+        // a token whose final dotted segment is a 2–4 letter suffix. Real names don't end
+        // in a TLD; this keeps "Cait's Phone" and initials like "MO".
+        let lower = s.lowercased()
+        if let dot = lower.lastIndex(of: "."), lower.index(after: dot) < lower.endIndex {
+            let suffix = lower[lower.index(after: dot)...]
+            if (2...4).contains(suffix.count) && suffix.allSatisfy({ $0.isLetter }) { return false }
+        }
        let words = s.split(separator: " ")
        guard (1...3).contains(words.count) else { return false }
        let allowed = CharacterSet.letters.union(CharacterSet(charactersIn: "'.-"))
@@ -15,9 +15,15 @@ final class TimelineBuilder {
    private let closeFrames: Int
    private var aliases: [String: String] = [:]      // normalized variant -> canonical
    private var states: [String: NameState] = [:]
+    private var observed: Set<String> = []           // every tile name seen (speaking or not)
    private var lastFrameT: Double = 0
    private(set) var segments: [VisualTimeline.Segment] = []

+    /// Every distinct participant name the adapter has OCR'd, whether or not they were
+    /// ever detected speaking — the call-size signal (drives "Auto" chunk sizing and a
+    /// complete participant roster, since speaking-detection is intentionally sparse).
+    var observedNames: [String] { observed.sorted() }
+
    init(openFrames: Int = 2, closeFrames: Int = 2) {
        self.openFrames = max(1, openFrames)
        self.closeFrames = max(1, closeFrames)
@@ -34,6 +40,9 @@ final class TimelineBuilder {
    func ingest(_ observations: [SpeakerObservation], at t: TimeInterval) {
        lastFrameT = t

+        // Record every tile seen (speaking or not) for the participant roster / call size.
+        for obs in observations where !obs.name.isEmpty { observed.insert(canonical(obs.name)) }
+
        // Best confidence per canonical name that is speaking this frame.
        var speaking: [String: Double] = [:]
        for obs in observations where obs.speaking && !obs.name.isEmpty {
@@ -93,9 +102,57 @@ final class TimelineBuilder {
            closeSegment(name: name, state: st)
            states[name]?.open = false
        }
+        segments = Self.canonicalizeByFrequency(segments)
        segments.sort { $0.start < $1.start }
    }

+    /// Fold rare OCR misspellings into the dominant name they're a typo of: a name with
+    /// little total time is remapped to a much longer-running name with the same initial
+    /// within a small edit distance (e.g. "Matt Odel"/"MattOdell"/"Mare" → "Matt Odell"/
+    /// "Mark"). Conservative by design — it won't merge two well-attested speakers, only
+    /// a transient variant into its clearly-dominant canonical. Pure/testable.
+    static func canonicalizeByFrequency(_ segs: [VisualTimeline.Segment],
+                                        minorMaxSec: Double = 5, dominanceRatio: Double = 8,
+                                        maxEdits: Int = 2) -> [VisualTimeline.Segment] {
+        var dur: [String: Double] = [:]
+        for s in segs { dur[s.name, default: 0] += s.end - s.start }
+        let names = Array(dur.keys)
+        var remap: [String: String] = [:]
+        for minor in names {
+            let md = dur[minor]!
+            guard md <= minorMaxSec, let mInit = minor.first else { continue }
+            var best: String?, bestDur = 0.0
+            for major in names where major != minor {
+                let Md = dur[major]!
+                guard Md >= md * dominanceRatio, Md > bestDur, major.first == mInit else { continue }
+                if levenshtein(minor.lowercased(), major.lowercased()) <= maxEdits { best = major; bestDur = Md }
+            }
+            if let b = best { remap[minor] = b }
+        }
+        guard !remap.isEmpty else { return segs }
+        return segs.map { s in
+            remap[s.name].map { VisualTimeline.Segment(start: s.start, end: s.end, name: $0,
+                                                       confidence: s.confidence, source: s.source) } ?? s
+        }
+    }
+
+    /// Levenshtein edit distance (small strings — names).
+    static func levenshtein(_ a: String, _ b: String) -> Int {
+        let x = Array(a), y = Array(b)
+        if x.isEmpty { return y.count }; if y.isEmpty { return x.count }
+        var prev = Array(0...y.count)
+        var cur = [Int](repeating: 0, count: y.count + 1)
+        for i in 1...x.count {
+            cur[0] = i
+            for j in 1...y.count {
+                cur[j] = x[i-1] == y[j-1] ? prev[j-1]
+                    : Swift.min(prev[j-1], prev[j], cur[j-1]) + 1
+            }
+            swap(&prev, &cur)
+        }
+        return prev[y.count]
+    }
+
    // MARK: - Internal

    private struct NameState {
@@ -75,7 +75,10 @@ final class VisualCapture {
        }, to: durationSec)

        let artifact = (vision + selfSegs).sorted { $0.start < $1.start }
-        let names = Set(artifact.map { $0.name })
+        // Roster = everyone OCR'd (speaking or not) ∪ the names that produced segments,
+        // so the participant count reflects true call size even when few people were
+        // detected speaking. Drives "Auto" chunk sizing downstream.
+        let names = Set(artifact.map { $0.name }).union(observer.participantNames())
        let participants = names.sorted().map {
            VisualTimeline.Participant(name: $0, isSelf: $0 == selfName ? true : nil, aliases: nil)
        }
@@ -114,6 +114,10 @@ final class VisualObserver: NSObject, SCStreamDelegate, SCStreamOutput {
        queue.sync { builder.mergeSelfSpans(spans, selfName: selfName) }
    }

+    /// Every distinct participant name OCR'd over the session (read on the builder's
+    /// queue; safe to call after `stop`).
+    func participantNames() -> [String] { queue.sync { builder.observedNames } }
+
    // MARK: - SCStreamOutput (on `queue`)

    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
@@ -138,16 +138,37 @@ final class GridCallAnalyzerTests: XCTestCase {
    func testNameFilterAgainstRealMeetOCR() {
        // The exact strings OCR pulled from a real Meet session — only the first
        // group are participants; the rest are UI chrome that must NOT become speakers.
-        let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me"]
+        let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me", "Matt Odell"]
        let junk = ["11:43 AM | rvo-rmjg-rdq", "@ Embassy Er", "Admit 1 guest",
                    "Joined as grant.gilliam@gmail.com", "Others may see your video differently",
                    "Others might still see your full video.", "Your meeting's ready", "efforot",
                    "g* Add others", "g+ Add others", "meet.google.com/rvo-rmjg-rdq",
-                    "permission before they can join.", "the meeting", "G"]
+                    "permission before they can join.", "the meeting", "G",
+                    // Screen-share domain text OCR'd as a name (incl. OCR'd TLDs).
+                    "WERUNBTC.COM", "WERUNBTG.COM", "WERUNBTC.GOM"]
        for n in names { XCTAssertTrue(GridCallAnalyzer.isLikelyName(n), "should keep name: \(n)") }
        for j in junk { XCTAssertFalse(GridCallAnalyzer.isLikelyName(j), "should drop junk: \(j)") }
    }

+    func testHollowRingKeptFilledTileRejected() {
+        // A thin ring (border): points only on the perimeter of a 120×120 box.
+        var ring: [CGPoint] = []
+        for t in stride(from: 0.0, through: 120, by: 4) {
+            ring.append(.init(x: t, y: 0));   ring.append(.init(x: t, y: 120))
+            ring.append(.init(x: 0, y: t));   ring.append(.init(x: 120, y: t))
+        }
+        let rbb = GridCallAnalyzer.boundingBox(ring)
+        XCTAssertTrue(GridCallAnalyzer.isHollow(ring, bbox: rbb, maxInteriorFill: 0.2))
+
+        // A solid fill (camera-off avatar tile): points across the whole box.
+        var blob: [CGPoint] = []
+        for x in stride(from: 0.0, through: 120, by: 4) {
+            for y in stride(from: 0.0, through: 120, by: 4) { blob.append(.init(x: x, y: y)) }
+        }
+        let bbb = GridCallAnalyzer.boundingBox(blob)
+        XCTAssertFalse(GridCallAnalyzer.isHollow(blob, bbox: bbb, maxInteriorFill: 0.2))
+    }
+
    func testWhiteBorderDetectorIgnoresColouredBorder() {
        // Signal looks only for the white border, so a coloured (Meet) border must
        // not register as a Signal speaker.
@@ -37,6 +37,45 @@ final class Phase5Tests: XCTestCase {
        XCTAssertEqual(asm.speakersFile.segments[0].start, 152, accuracy: 0.01)
    }

+    func testChunkModeResolvesBodyLength() {
+        // Fixed presets ignore participant count.
+        XCTAssertEqual(ChunkMode.standard.bodySeconds(participantCount: 99), 150)
+        XCTAssertEqual(ChunkMode.largeGroup.bodySeconds(participantCount: 2), 60)
+        XCTAssertEqual(ChunkMode.fine.bodySeconds(participantCount: nil), 90)
+        // Auto: >4 detected → 60s, ≤4 → 150s, unknown → 150s.
+        XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 6), 60)
+        XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 4), 150)
+        XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: nil), 150)
+    }
+
+    func testChunkOverlapScalesWithBody() {
+        XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 150), 15)   // capped
+        XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 60), 8)     // floored (60*0.12=7.2→8)
+        XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 90), 11)    // 90*0.12=10.8→11
+    }
+
+    func testPlanChunksShortBodyChunksAShortCall() {
+        // A 100s call would be ONE chunk at the 2.5-min default, but at a 60s body it
+        // splits — so "Large group" actually re-chunks medium calls.
+        let c = SessionPackager.planChunks(durationSec: 100, chunkSeconds: 60,
+                                           overlapSeconds: 8, thresholdSec: 72)
+        XCTAssertEqual(c.count, 2)
+        XCTAssertEqual(c[0].bodyStart, 0); XCTAssertEqual(c[0].bodyEnd, 60)
+        XCTAssertEqual(c[1].bodyStart, 60); XCTAssertEqual(c[1].bodyEnd, 100)
+    }
+
+    func testDropStuckSpansRemovesWholeCallCue() {
+        let segs = [
+            VisualTimeline.Segment(start: 0, end: 1900, name: "Grant Gilliam", confidence: 1, source: "vision"), // stuck whole-call tile
+            VisualTimeline.Segment(start: 100, end: 130, name: "Matt Odell", confidence: 0.9, source: "vision"),  // real
+            VisualTimeline.Segment(start: 0, end: 1900, name: "Grant", confidence: 1, source: "mic_vad"),         // self span: keep
+        ]
+        let out = TranscriptPipeline.dropStuckSpans(segs, duration: 1976)
+        XCTAssertFalse(out.contains { $0.name == "Grant Gilliam" })   // 96% of call in one span → dropped
+        XCTAssertTrue(out.contains { $0.name == "Matt Odell" })       // short real span kept
+        XCTAssertTrue(out.contains { $0.source == "mic_vad" })        // self never dropped
+    }
+
    func testRebaseClipsAndRebases() throws {
        let segs = [
            VisualTimeline.Segment(start: 140, end: 160, name: "A", confidence: 0.9, source: "vision"),
@@ -11,6 +11,27 @@ final class VisualObserverTests: XCTestCase {
        (id, CGRect(x: 0, y: 0, width: w, height: h))
    }

+    func testCanonicalizeFoldsOcrMisspellingsIntoDominantName() {
+        func seg(_ s: Double, _ e: Double, _ n: String) -> VisualTimeline.Segment {
+            .init(start: s, end: e, name: n, confidence: 0.9, source: "vision")
+        }
+        let segs = [
+            seg(0, 1689, "Matt Odell"),   // dominant
+            seg(1700, 1702, "Matt Odel"), // OCR typo → fold
+            seg(1702, 1702.3, "MattOdell"), // dropped-space typo → fold
+            seg(0, 1155, "Mark"),         // dominant
+            seg(1200, 1201, "Mare"),      // OCR typo → fold into Mark
+            seg(0, 4, "Sidisel"),         // screen junk, no near-twin → kept (dropped later, no voice match)
+        ]
+        let names = Set(TimelineBuilder.canonicalizeByFrequency(segs).map { $0.name })
+        XCTAssertTrue(names.contains("Matt Odell"))
+        XCTAssertTrue(names.contains("Mark"))
+        XCTAssertFalse(names.contains("Matt Odel"))
+        XCTAssertFalse(names.contains("MattOdell"))
+        XCTAssertFalse(names.contains("Mare"))
+        XCTAssertTrue(names.contains("Sidisel"))
+    }
+
    func testPrefersMatchingWindowIDOverLargest() {
        // The Meet window (id 42) is NOT the largest — must still be chosen by ID.
        let candidates = [c(7, 1600, 1000), c(42, 800, 600), c(9, 1200, 900)]
@@ -135,10 +135,11 @@ Full request/response shapes, curl examples, limits, and error formats are in

 ## 7. Remaining open items (small)

-1. **Base URL — RESOLVED.** `https://192.168.1.72:62419`, also
-   `https://immense-voyage.local:62419` (prefer the `.local` form; it survives IP
-   changes). Ship the `.local` host as the default; keep it editable in settings.
-   Service-discovery at `GET /api/endpoints`.
+1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
+   over a raw IP, since it survives IP changes) — configured in Settings or via the
+   `SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
+   the default; keep it editable in settings. Service-discovery at
+   `GET /api/endpoints`.
 2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
   toggle if the user wants to eyeball the timeline first.
 3. **Retention** — keep the session folder after a successful hand-off, or prune
@@ -76,12 +76,13 @@ locally — the mic track is the user's known identity / VAD source.)

 ## 3. SparkControl — connection (real)

- **Base URL (confirmed):** `https://192.168.1.72:62419` — also reachable at
-  `https://immense-voyage.local:62419` (the `.local` form survives IP changes;
-  **prefer it as the default**). Service-discovery JSON is at
+- **Base URL (confirmed):** a private LAN host — a `.local` mDNS name (preferred
+  over a raw IP; it survives IP changes) — configured in Settings or via the
+  `SPARK_BACKEND_URL` env var, and **never committed**. Service-discovery JSON is at
  `GET /api/endpoints` (returns current vLLM / Parakeet / Kokoro URLs). All audio
-  endpoints in §4–§5 hang off this base. Still **make it a setting** so the host
-  can change, but ship `https://immense-voyage.local:62419` as the default.
+  endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
+  change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
+  the default.
 - **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
  delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
  the Start9 Root CA into the trust store.
@@ -7,17 +7,20 @@ options:
  createIntermediateGroups: true
  groupSortPosition: top

+# Signing identity (DEVELOPMENT_TEAM) is kept out of source in a gitignored xcconfig
+# so the Team ID isn't committed. Copy Config/Signing.xcconfig.example to
+# Config/Signing.xcconfig and set your team. Keeping the value stable is what makes
+# macOS TCC grants (Mic / Screen Recording / Accessibility) persist across rebuilds.
+configFiles:
+  Debug: Config/Signing.xcconfig
+  Release: Config/Signing.xcconfig
+
 settings:
  base:
    MARKETING_VERSION: "0.1.0"
    CURRENT_PROJECT_VERSION: "1"
    SWIFT_VERSION: "5.0"
    CODE_SIGN_STYLE: Automatic
-    # Grant's free personal team (cert OU). Baked in so `xcodegen generate` keeps
-    # a STABLE signing identity across regenerations — macOS ties TCC permission
-    # grants (Mic / Screen Recording / Accessibility) to this identity, so a
-    # stable team is what makes those permissions persist across rebuilds.
-    DEVELOPMENT_TEAM: "BK4Y6CXN35"

 targets:
  Ten31Transcripts:
Author	SHA1	Message	Date
Grant Gilliam	3dd02f8ce6	Add agent instructions; extract signing/backend secrets from source - Add AGENTS.md (canonical) + CLAUDE.md symlink + ROADMAP.md - Move Apple Team ID from project.yml into a gitignored Config/Signing.xcconfig via configFiles; commit the .example template - Replace hardcoded backend host in AppSettings with a neutral placeholder + SPARK_BACKEND_URL env-var fallback - Scrub the Team ID, .local host, and raw LAN IP from README/docs - Ignore Config/Signing.xcconfig and .env	2026-06-13 12:23:54 -05:00
Grant Gilliam	b0a4b50dac	Make diarization chunk length configurable (Auto + presets) Chunk size was hardcoded at 2.5-min bodies. Add a Settings control: Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large calls, at some cost to speed and cross-chunk voice matching. - ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4 participants were detected, else 150s; overlap + single-chunk threshold scale with the body length. - AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation. - TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it. - SessionController resolves the body from the setting + the session's detected participant count (visual_timeline participants) for both send + re-process. - Participant roster now counts EVERY tile OCR'd, not just who spoke (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto call-size signal is meaningful even though speaking-detection is sparse. Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.	2026-06-09 10:15:16 -05:00
Grant Gilliam	9a80c7c96e	Restyle recap.html to match recap-relay The recap output looked notably different from recap-relay (bigger 15px font, different palette, single-column cards). Match recap-relay's job-output view: slate/indigo palette (--bg #0a0e1a, --accent #818cf8), 13px base type with the Helvetica/Arial stack, monospace accent-soft timestamps, and the two-pane layout — topic list on the left, full diarized transcript on the right, click a topic to scroll + highlight its range (inline JS, data baked in; no backend fetch). The summary/takeaways render as recap-relay-style cards in a band above the split. markdown() output unchanged. 66 tests pass.	2026-06-08 21:00:56 -05:00
Grant Gilliam	18af17f26c	Drop stuck whole-call visual spans at processing time Defense-in-depth + salvage for sessions captured before the adapter fix: drop any vision-source span whose single unbroken duration covers ≥60% of the call. No one speaks that long without a break, so it's a stuck/false active-speaker cue that would dominate backend name attribution. Self (mic_vad) spans are never dropped. Applied to both the live and re-process paths. Test added; 66 pass.	2026-06-08 16:21:45 -05:00
Grant Gilliam	19ca85abd5	Fix Meet visual: reject solid avatar tiles + screen-share OCR Root cause of the "4 people → 2 speakers" Meet call: the colored-border detector read solid camera-off avatar tiles (orange "J", magenta "G") as active speakers for the ENTIRE call. Those whole-call phantom spans dominated backend name attribution, collapsing every remote voice onto one name — and the giant filled bbox also swallowed screen-share text (WERUNBTC.COM ×49) as a speaker. Validated against 9 real fixtures (harness over the real MeetAdapter): Detection: - FrameSampler.thinColoredPoints: coloured counterpart of thinWhitePoints — keeps thin border/ring/pill edges, drops solid colour fills. - GridCallAnalyzer.isHollow: reject a highlight component whose interior is filled (a solid tile) vs a hollow ring (a real border). Config.maxInteriorFill (0.2 default). - MeetAdapter: detect thin BLUE edges only (hue 180–240°, measured from the fixtures), maxInteriorFill 0.3 (real Meet rings ≈0.2–0.3, solid tiles ≈0.36). - Result on fixtures: John Arnold/Grant Gilliam (solid tiles) now NEVER detected; Matt Odell/Mark detected when their blue cue is present. Sparse but never wrong — correct for a naming hint over audio diarization. OCR name hygiene: - isLikelyName rejects domain-like screen-share text ("WERUNBTC.COM", OCR'd ".GOM"). - cleaned() strips trailing punctuation ("Mark." → "Mark"). - TimelineBuilder.canonicalizeByFrequency folds rare OCR misspellings into a dominant near-twin name ("Matt Odel"/"MattOdell" → "Matt Odell", "Mare" → "Mark"). Tests: hollow-ring, extended OCR filter, fuzzy-merge. 65 pass.	2026-06-08 16:18:52 -05:00
Grant Gilliam	98a198471c	Revert adjacent same-speaker segment collapse User found the merged transcript lines harder to read — too many sentences joined into one statement. Remove SpeakerReconciler.mergeAdjacent, its wiring in finishBackend (restore the no-LLM early return), and its tests. Back to one segment per diarized utterance.	2026-06-08 15:52:27 -05:00
Grant Gilliam	a273e768dc	Open Settings in its own roomy window, not the cramped popover Settings was a NavigationLink pushed inside the 320pt menu-bar popover, so the grouped form was cramped and most controls sat below a non-obvious scroll (and showed a confusing "< Settings" back arrow). Add SettingsWindow (same standalone NSWindow pattern as the Editor/Templates windows) and open it from the menu-bar "Settings…" button. Drop the now-unused NavigationStack and the 320pt cap so the form uses real window width with normal macOS spacing; window is resizable.	2026-06-08 13:57:24 -05:00
Grant Gilliam	c81bdc4cba	Make adapter toggles actually gate screen-reading The Settings "Adapters" toggles wrote adapterEnabled but nothing in the capture path ever read it, so flipping one off did nothing — and the caption still said "Inert in Phase 0". The adapters (Zoom/Teams/Signal/Meet) are all live now. SessionController.startVisual now skips visual capture when the detected app's adapter is toggled off (records audio-only; transcription still runs). Update the section caption to describe the real behavior.	2026-06-08 13:30:31 -05:00
Grant Gilliam	836b930083	Surface "Your name" as its own top Settings section The name field was the first row of the third "Transcription" section, below the fold — users couldn't find where to set their name (it's the setting that labels the mic channel and reserves the name so the LLM never assigns it to another speaker). Move it into a dedicated "Your name" section at the top of Settings, and show an orange nudge while it's still the placeholder "Me"/empty.	2026-06-08 13:25:33 -05:00
Grant Gilliam	217639f12e	Collapse adjacent same-speaker segments after reconciliation Fragments reabsorbed by smoothFragments (e.g. "I" then "need to switch it back") were left as separate transcript lines. Add SpeakerReconciler.mergeAdjacent to join consecutive same-speaker segments within 2s, concatenating their text. Wire it into SessionController.finishBackend AFTER reconcile/LLM naming. The collapse needs no LLM, so finishBackend no longer early-returns when the gateway has no chat model — it runs the collapse and re-persists speakers.json unconditionally, gating only the reconcile and recap passes on the model.	2026-06-08 13:19:05 -05:00