Compare commits
10 Commits
b4bf88ed2c
...
3dd02f8ce6
| Author | SHA1 | Date | |
|---|---|---|---|
| 3dd02f8ce6 | |||
| b0a4b50dac | |||
| 9a80c7c96e | |||
| 18af17f26c | |||
| 19ca85abd5 | |||
| 98a198471c | |||
| a273e768dc | |||
| c81bdc4cba | |||
| 836b930083 | |||
| 217639f12e |
@@ -17,3 +17,10 @@ build/
|
||||
|
||||
# Personal call screenshots / fixtures (faces, contact names) — never commit
|
||||
example-screenshots/
|
||||
|
||||
# Local signing identity (Apple Team ID) — keep out of source; template is committed
|
||||
Config/Signing.xcconfig
|
||||
|
||||
# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
|
||||
.env
|
||||
.env.local
|
||||
|
||||
@@ -0,0 +1,89 @@
|
||||
# AGENTS.md — Ten31 Transcripts
|
||||
|
||||
Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.
|
||||
|
||||
## Stack (versions that matter)
|
||||
- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
|
||||
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
|
||||
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
|
||||
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
|
||||
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
|
||||
|
||||
## Commands
|
||||
First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):
|
||||
```
|
||||
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
|
||||
```
|
||||
Regenerate the Xcode project (after adding/removing/renaming any source file):
|
||||
```
|
||||
xcodegen generate
|
||||
```
|
||||
Build + run all tests:
|
||||
```
|
||||
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
|
||||
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
|
||||
```
|
||||
Run a **single** test (target/class/method):
|
||||
```
|
||||
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
|
||||
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||
-destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
|
||||
-only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
|
||||
```
|
||||
Build only: replace `test` with `build`. **Lint/format:** none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in `ROADMAP.md`.
|
||||
Build a standalone app and install/run it (Xcode does **not** need to stay open):
|
||||
```
|
||||
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
|
||||
-project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
|
||||
-configuration Release -derivedDataPath /tmp/ten31-release build
|
||||
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
|
||||
open /Applications/Ten31Transcripts.app
|
||||
```
|
||||
**Fast validation harness** (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x` and run against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.
|
||||
|
||||
## Layout (day one)
|
||||
- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
|
||||
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
|
||||
- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
|
||||
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
|
||||
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
|
||||
- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
|
||||
- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
|
||||
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
|
||||
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
|
||||
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.
|
||||
|
||||
## Conventions
|
||||
- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
|
||||
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
|
||||
- Commits: imperative mood, concise; authored by Grant. **No remote is configured** — confirm where to push (choosing one is tracked in `ROADMAP.md`). Branch before committing; never commit to `main` without asking.
|
||||
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
|
||||
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.
|
||||
|
||||
## Always
|
||||
- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
|
||||
- Run `xcodegen generate` after adding/removing/renaming source files.
|
||||
- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
|
||||
- Identify **self by the mic channel** + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
|
||||
- Treat visual active-speaker cues as **naming hints over audio diarization** (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
|
||||
- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls **sequential** (one in flight).
|
||||
- After any code change, rebuild Release + `ditto` to `/Applications` — the installed copy does **not** auto-update.
|
||||
|
||||
## Never
|
||||
- **Never write video frames to disk** — analyze in-memory and release immediately (privacy non-negotiable).
|
||||
- **Never add Co-Authored-By / "Generated with" / any AI or tool attribution** to commits or PRs.
|
||||
- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
|
||||
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
|
||||
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
|
||||
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
|
||||
- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
|
||||
- Never commit to `main` or force-push a shared branch; branch first and ask.
|
||||
|
||||
## Current state
|
||||
Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs.
|
||||
- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
|
||||
- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
|
||||
- **Decided but not implemented:** none open (deferred items live in `ROADMAP.md`).
|
||||
- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
|
||||
- **Next:** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) confirm Settings → Your name = "Grant"; (3) record a fresh Meet call to validate the fix on a clean capture; (4) decide a git remote and push.
|
||||
@@ -0,0 +1,4 @@
|
||||
// Template for Config/Signing.xcconfig (which is gitignored).
|
||||
// Copy to Config/Signing.xcconfig and set your Apple Developer Team ID
|
||||
// (Xcode ▸ Settings ▸ Accounts, or `security find-identity -p codesigning -v`).
|
||||
DEVELOPMENT_TEAM = YOUR_APPLE_TEAM_ID
|
||||
@@ -14,25 +14,30 @@ This repo is at **Phase 0** (scaffold, permissions, backend health check).
|
||||
```sh
|
||||
brew install xcodegen
|
||||
```
|
||||
3. **Generate the project:**
|
||||
3. **Set your signing team.** The Apple Team ID is kept out of source in a
|
||||
gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
|
||||
```sh
|
||||
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
|
||||
```
|
||||
`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
|
||||
team automatically — no manual selection. Keep the value stable so macOS
|
||||
preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
|
||||
not Xcode — `xcodegen generate` overwrites Xcode-side changes.
|
||||
4. **Generate the project:**
|
||||
```sh
|
||||
xcodegen generate
|
||||
```
|
||||
This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
|
||||
4. **Open it:**
|
||||
5. **Open it:**
|
||||
```sh
|
||||
open Ten31Transcripts.xcodeproj
|
||||
```
|
||||
5. Signing is preconfigured: `project.yml` sets `DEVELOPMENT_TEAM` to the free
|
||||
personal team `BK4Y6CXN35` with automatic signing, so **Signing & Capabilities
|
||||
should already show the team** — no manual selection needed. (If you ever sign
|
||||
with a different Apple ID, update `DEVELOPMENT_TEAM` in `project.yml`, not in
|
||||
Xcode — `xcodegen generate` overwrites Xcode-side changes.)
|
||||
6. Press **Run** (⌘R).
|
||||
|
||||
> **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
|
||||
> Xcode reload the project. The signing team persists because it lives in
|
||||
> `project.yml`, so macOS permissions stay granted across rebuilds.
|
||||
> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
|
||||
> rebuilds.
|
||||
|
||||
## What Phase 0 does
|
||||
|
||||
@@ -64,5 +69,6 @@ Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
|
||||
|
||||
- **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
|
||||
LAN-only tool that must observe other apps. Revisit only if distributing.
|
||||
- The default backend host is `https://immense-voyage.local:62419` (editable in
|
||||
Settings).
|
||||
- The backend host is a private LAN address — set it in **Settings**, or seed it
|
||||
from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
|
||||
placeholder (`https://your-spark-backend.local`).
|
||||
|
||||
+27
@@ -0,0 +1,27 @@
|
||||
# ROADMAP — Ten31 Transcripts
|
||||
|
||||
Longer-term backlog and deferred decisions. Near-term status + the next few steps live in `AGENTS.md` → Current state.
|
||||
|
||||
## Visual detection
|
||||
- Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
|
||||
- Geometric screen-share exclusion: ignore OCR text in the shared-screen region (needs layout detection). Today only the domain filter + stuck-span guard catch share-text-as-speaker.
|
||||
- Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
|
||||
- Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
|
||||
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
|
||||
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
|
||||
|
||||
## Audio / speakers
|
||||
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
|
||||
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
|
||||
|
||||
## App / UX
|
||||
- Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
|
||||
- Constrain recap reading width on very wide windows (long line length in the summary band).
|
||||
|
||||
## Tooling / repo
|
||||
- Decide and configure a git remote (none set); then push.
|
||||
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
|
||||
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
|
||||
|
||||
## Deferred decisions
|
||||
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
|
||||
@@ -32,6 +32,16 @@ struct MeetAdapter: AppAdapter {
|
||||
// The bright ring (#1a73e8) is ~0.89 sat but the lighter glow (#8ab4f8) is
|
||||
// ~0.44, below the 0.5 default — lower the threshold so the glow registers.
|
||||
config.colorSaturation = 0.35
|
||||
// Meet's active cue is a thin BLUE (≈210°) ring + audio pill. Detect thin blue
|
||||
// EDGES only, gated to blue: this rejects solid camera-off avatar tiles (orange
|
||||
// ≈30°, magenta ≈340°), which otherwise read as "speaking" for the whole call
|
||||
// and collapse every remote voice onto one name. Validated on real fixtures.
|
||||
config.coloredBorderThinOnly = true
|
||||
config.colorHueRange = 180...240
|
||||
// Meet's blue border is faint; real rings measure ≈0.20–0.30 interior fill while
|
||||
// solid tiles measure ≈0.36, so allow a higher fill here than the 0.2 default to
|
||||
// recover real borders without readmitting the solid-tile false positives.
|
||||
config.maxInteriorFill = 0.3
|
||||
config.tileExpandX = 3.0
|
||||
config.tileExpandY = 5.0
|
||||
self.analyzer = GridCallAnalyzer(config: config)
|
||||
|
||||
@@ -82,6 +82,10 @@ enum RecapRenderer {
|
||||
|
||||
// MARK: - HTML
|
||||
|
||||
/// Mirror of recap-relay's job-output view: a header, an optional band of recap
|
||||
/// cards (summary + takeaways), then a two-pane split — topic list on the left,
|
||||
/// full diarized transcript on the right, click a topic to jump + highlight its
|
||||
/// range. Self-contained (data baked in; the click handler is inline JS).
|
||||
static func html(file: SpeakersFile, result: RecapResult, title: String,
|
||||
entries: [RecapAnalyzer.Entry]) -> String {
|
||||
let speakers = RecapAnalyzer.orderedSpeakerNames(entries)
|
||||
@@ -91,66 +95,78 @@ enum RecapRenderer {
|
||||
return "<span class=\"chip\" style=\"background:\(c)\">\(esc(name))</span>"
|
||||
}
|
||||
|
||||
var body = ""
|
||||
// Header: title, meta line, speaker legend.
|
||||
let sub = "\(esc(file.app)) · \(RecapAnalyzer.mmss(file.durationSec))"
|
||||
+ (speakers.isEmpty ? "" : " · \(speakers.count) speaker\(speakers.count == 1 ? "" : "s")")
|
||||
body += "<header><h1>\(esc(title))</h1><div class=\"sub\">\(sub)</div>"
|
||||
var header = "<div class=\"header\"><div class=\"htext\"><h1>\(esc(title))</h1><div class=\"meta\">\(sub)</div></div>"
|
||||
if !speakers.isEmpty {
|
||||
body += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
|
||||
header += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
|
||||
}
|
||||
body += "</header>"
|
||||
header += "</div>"
|
||||
|
||||
// Recap cards band (summary + template takeaways).
|
||||
var cards = ""
|
||||
if let x = result.extras {
|
||||
if !x.tldr.isEmpty {
|
||||
body += card("Summary", "<p>\(esc(x.tldr))</p>"
|
||||
cards += card("Summary", "<p>\(esc(x.tldr))</p>"
|
||||
+ (x.primarySpeakers.isEmpty ? "" : "<p class=\"muted\">Primary: \(x.primarySpeakers.map(esc).joined(separator: ", "))</p>"))
|
||||
}
|
||||
for section in x.sections where !section.isEmpty {
|
||||
switch section.kind {
|
||||
case .paragraph:
|
||||
body += card(section.title, "<p>\(esc(section.paragraph))</p>")
|
||||
cards += card(section.title, "<p>\(esc(section.paragraph))</p>")
|
||||
case .bullets:
|
||||
body += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
|
||||
cards += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
|
||||
case .items:
|
||||
let lis = section.items.map { item -> String in
|
||||
var s = "<li>\(esc(item.text))"
|
||||
if let who = item.who { s += " <strong>\(esc(who))</strong>" }
|
||||
if let note = item.note { s += " <span class=\"muted\">(\(esc(note)))</span>" }
|
||||
if let when = item.when { s += " <span class=\"ts\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
|
||||
if let who = item.who { s += " <span class=\"who\">\(esc(who))</span>" }
|
||||
if let note = item.note { s += " <span class=\"note\">(\(esc(note)))</span>" }
|
||||
if let when = item.when { s += " <span class=\"ts-badge\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
|
||||
return s + "</li>"
|
||||
}.joined()
|
||||
body += card(section.title, "<ul>\(lis)</ul>")
|
||||
cards += card(section.title, "<ul>\(lis)</ul>")
|
||||
}
|
||||
}
|
||||
}
|
||||
let band = cards.isEmpty ? "" : "<div class=\"band\">\(cards)</div>"
|
||||
|
||||
if !result.sections.isEmpty {
|
||||
var topics = ""
|
||||
for (i, sec) in result.sections.enumerated() {
|
||||
let range = entries.indices.contains(sec.startIndex) && entries.indices.contains(sec.endIndex)
|
||||
? "<span class=\"ts\">\(RecapAnalyzer.mmss(entries[sec.startIndex].offset))–\(RecapAnalyzer.mmss(entries[sec.endIndex].end))</span>" : ""
|
||||
topics += "<details class=\"topic\"><summary><span class=\"tnum\">\(i + 1)</span> \(esc(sec.title)) \(range)</summary>"
|
||||
if !sec.summary.isEmpty { topics += "<p>\(esc(sec.summary))</p>" }
|
||||
topics += "<div class=\"turns\">" + turnsHtml(sec, entries: entries, chip: chip) + "</div></details>"
|
||||
// Left pane: topic cards (click to jump). data-start/data-end index entries.
|
||||
var left = "<div class=\"left\">"
|
||||
if result.sections.isEmpty {
|
||||
left += "<div class=\"empty\">No topic sections.</div>"
|
||||
} else {
|
||||
for sec in result.sections {
|
||||
let s = max(0, min(sec.startIndex, entries.count - 1))
|
||||
let e = max(s, min(sec.endIndex, entries.count - 1))
|
||||
let time = entries.indices.contains(s) && entries.indices.contains(e)
|
||||
? "<span class=\"chunk-time\">\(RecapAnalyzer.mmss(entries[s].offset)) — \(RecapAnalyzer.mmss(entries[e].end))</span>" : ""
|
||||
left += "<div class=\"chunk\" data-start=\"\(s)\" data-end=\"\(e)\" onclick=\"jump(this)\">"
|
||||
+ "<div class=\"chunk-title\">\(esc(sec.title))\(time)</div>"
|
||||
+ (sec.summary.isEmpty ? "" : "<div class=\"chunk-summary\">\(esc(sec.summary))</div>")
|
||||
+ "</div>"
|
||||
}
|
||||
body += card("Topics", topics)
|
||||
}
|
||||
left += "</div>"
|
||||
|
||||
let full = entries.map { "<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>" }.joined()
|
||||
body += "<details class=\"topic\" open><summary>Full Transcript</summary><div class=\"turns\">\(full)</div></details>"
|
||||
// Right pane: full diarized transcript, one line per turn (id=entry-i).
|
||||
var right = "<div class=\"right\">"
|
||||
if entries.isEmpty {
|
||||
right += "<div class=\"empty\">No transcript.</div>"
|
||||
} else {
|
||||
for (i, en) in entries.enumerated() {
|
||||
right += "<div class=\"transcript-line\" id=\"entry-\(i)\">"
|
||||
+ "<span class=\"ts-badge\">\(RecapAnalyzer.mmss(en.offset))</span>"
|
||||
+ chip(en.speaker)
|
||||
+ "<span class=\"ts-text\">\(esc(en.text))</span></div>"
|
||||
}
|
||||
}
|
||||
right += "</div>"
|
||||
|
||||
let body = header + band + "<div class=\"split\">\(left)\(right)</div>"
|
||||
return htmlShell(title: esc(title), body: body)
|
||||
}
|
||||
|
||||
private static func turnsHtml(_ sec: TopicSection, entries: [RecapAnalyzer.Entry],
|
||||
chip: (String) -> String) -> String {
|
||||
guard sec.startIndex <= sec.endIndex, entries.indices.contains(sec.startIndex), entries.indices.contains(sec.endIndex)
|
||||
else { return "" }
|
||||
return entries[sec.startIndex...sec.endIndex].map {
|
||||
"<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>"
|
||||
}.joined()
|
||||
}
|
||||
|
||||
private static func card(_ title: String, _ inner: String) -> String {
|
||||
"<section class=\"card\"><h2>\(esc(title))</h2>\(inner)</section>"
|
||||
}
|
||||
@@ -176,34 +192,63 @@ enum RecapRenderer {
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>\(title)</title>
|
||||
<style>
|
||||
:root{--bg:#15171c;--card:#1d2026;--fg:#e6e8ec;--muted:#9aa0aa;--line:#2a2e36;--accent:#5b8def;}
|
||||
:root{--bg:#0a0e1a;--panel:#111827;--panel-2:#1e293b;--line:#1e293b;--line-2:#334155;
|
||||
--fg:#e2e8f0;--fg-dim:#94a3b8;--fg-faint:#64748b;--accent:#818cf8;--accent-soft:#a5b4fc;}
|
||||
*{box-sizing:border-box}
|
||||
body{margin:0;background:var(--bg);color:var(--fg);font:15px/1.55 -apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;}
|
||||
main{max-width:820px;margin:0 auto;padding:32px 20px 80px;}
|
||||
header h1{margin:0 0 4px;font-size:24px}
|
||||
.sub{color:var(--muted);font-size:13px}
|
||||
.legend{margin-top:12px;display:flex;flex-wrap:wrap;gap:6px}
|
||||
.chip{display:inline-block;padding:1px 8px;border-radius:10px;color:#fff;font-size:12px;font-weight:600}
|
||||
.card{background:var(--card);border:1px solid var(--line);border-radius:12px;padding:16px 18px;margin-top:18px}
|
||||
.card h2{margin:0 0 10px;font-size:16px;color:var(--accent)}
|
||||
.muted{color:var(--muted)}
|
||||
ul{margin:0;padding-left:18px} li{margin:4px 0}
|
||||
ul.actions{list-style:none;padding-left:0}
|
||||
.ts{color:var(--muted);font-variant-numeric:tabular-nums;font-size:12px;margin-right:4px}
|
||||
blockquote{margin:0 0 12px;padding:8px 12px;border-left:3px solid var(--accent);background:#0e0f13;border-radius:0 8px 8px 0}
|
||||
blockquote cite{display:block;color:var(--muted);font-size:12px;margin-top:4px;font-style:normal}
|
||||
details.topic{border-top:1px solid var(--line);padding:10px 0}
|
||||
details.topic > summary{cursor:pointer;font-weight:600;list-style:none}
|
||||
details.topic > summary::-webkit-details-marker{display:none}
|
||||
.tnum{display:inline-block;min-width:20px;color:var(--accent);font-weight:700}
|
||||
.turns{margin-top:10px}
|
||||
.turn{margin:6px 0;display:flex;gap:8px;align-items:baseline;flex-wrap:wrap}
|
||||
.turn .txt{flex:1;min-width:60%}
|
||||
@media print{body{background:#fff;color:#000}.card,blockquote{background:#fff;border-color:#ccc}details.topic{}.chip{border:1px solid #999}}
|
||||
body{margin:0;background:var(--bg);color:var(--fg);min-height:100vh;
|
||||
font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif;font-size:13px;line-height:1.55}
|
||||
.header{padding:14px 24px;background:var(--panel);border-bottom:1px solid var(--line);
|
||||
display:flex;align-items:center;gap:16px;flex-wrap:wrap}
|
||||
.header .htext{min-width:0}
|
||||
.header h1{margin:0;font-size:16px;font-weight:700;color:var(--fg)}
|
||||
.header .meta{font-size:11px;color:var(--fg-faint);margin-top:2px;font-variant-numeric:tabular-nums}
|
||||
.legend{margin-left:auto;display:flex;flex-wrap:wrap;gap:6px;justify-content:flex-end}
|
||||
.chip{display:inline-block;padding:1px 8px;border-radius:999px;color:#fff;font-size:10px;font-weight:700;white-space:nowrap}
|
||||
.band{padding:16px 24px;display:grid;gap:12px}
|
||||
.card{background:var(--panel);border:1px solid var(--line);border-radius:10px;padding:14px 16px}
|
||||
.card h2{margin:0 0 8px;font-size:11px;font-weight:700;text-transform:uppercase;letter-spacing:.04em;color:var(--accent-soft)}
|
||||
.card p{margin:0 0 8px}
|
||||
.card p:last-child{margin-bottom:0}
|
||||
.card .muted{color:var(--fg-dim);font-size:12px}
|
||||
.card ul{margin:0;padding-left:18px}
|
||||
.card li{margin:5px 0;color:var(--fg)}
|
||||
.card .who{color:var(--accent-soft);font-weight:600}
|
||||
.card .note{color:var(--fg-faint)}
|
||||
.split{display:flex;min-height:calc(100vh - 56px)}
|
||||
.left{flex:0 0 42%;max-width:42%;border-right:1px solid var(--line);overflow-y:auto;padding:16px;background:var(--bg)}
|
||||
.right{flex:1;min-width:0;overflow-y:auto;padding:16px;background:var(--panel)}
|
||||
@media(max-width:900px){.split{flex-direction:column}.left,.right{flex:none;max-width:100%;border-right:none}
|
||||
.left{border-bottom:1px solid var(--line)}}
|
||||
.chunk{padding:12px 14px;margin-bottom:8px;background:var(--panel);border:1px solid var(--line);
|
||||
border-radius:10px;cursor:pointer;transition:border-color .15s,background .15s}
|
||||
.chunk:hover{border-color:var(--accent)}
|
||||
.chunk.active{border-color:var(--accent);background:rgba(129,140,248,.06);box-shadow:0 2px 16px rgba(129,140,248,.10)}
|
||||
.chunk-title{font-size:13px;font-weight:700;color:var(--fg);margin-bottom:4px}
|
||||
.chunk-time{font-size:10px;color:var(--fg-faint);margin-left:6px;font-weight:500;font-family:"SF Mono",Menlo,monospace}
|
||||
.chunk-summary{font-size:12px;color:var(--fg-dim);line-height:1.5}
|
||||
.transcript-line{display:flex;gap:10px;padding:4px 8px;border-radius:6px;line-height:1.6;align-items:baseline;scroll-margin-top:16px}
|
||||
.transcript-line.hl{background:rgba(129,140,248,.10)}
|
||||
.ts-badge{flex:0 0 auto;font-family:"SF Mono",Menlo,monospace;font-size:11px;color:var(--accent-soft);min-width:52px}
|
||||
.ts-text{flex:1;font-size:13px;color:var(--fg)}
|
||||
.empty{padding:32px 16px;text-align:center;color:var(--fg-faint)}
|
||||
.foot{padding:14px 24px;color:var(--fg-faint);font-size:11px;border-top:1px solid var(--line)}
|
||||
@media print{body{background:#fff;color:#000}.header,.right,.left,.card,.chunk{background:#fff;border-color:#ccc}
|
||||
.split{display:block}.left,.right{max-width:100%}.chip{border:1px solid #999}}
|
||||
</style></head>
|
||||
<body><main>\(body)
|
||||
<footer class="sub" style="margin-top:40px">Ten31 Transcripts · generated on-device</footer>
|
||||
</main></body></html>
|
||||
<body>\(body)
|
||||
<div class="foot">Ten31 Transcripts · generated on-device</div>
|
||||
<script>
|
||||
function jump(el){
|
||||
document.querySelectorAll('.chunk.active').forEach(function(x){x.classList.remove('active')});
|
||||
el.classList.add('active');
|
||||
var s=+el.dataset.start, e=+el.dataset.end;
|
||||
var t=document.getElementById('entry-'+s);
|
||||
if(t) t.scrollIntoView({behavior:'smooth',block:'start'});
|
||||
document.querySelectorAll('.transcript-line.hl').forEach(function(x){x.classList.remove('hl')});
|
||||
for(var i=s;i<=e;i++){var x=document.getElementById('entry-'+i); if(x) x.classList.add('hl');}
|
||||
}
|
||||
</script>
|
||||
</body></html>
|
||||
"""
|
||||
}
|
||||
}
|
||||
|
||||
@@ -0,0 +1,51 @@
|
||||
import Foundation
|
||||
|
||||
/// How long each diarization *body* chunk should be. Smaller chunks keep fewer
|
||||
/// simultaneous speakers inside one window — Sortformer resolves at most ~4 speakers
|
||||
/// per chunk, and the dual-channel split already spends the local user on the mic
|
||||
/// track, so the system (remote) channel is what can saturate on a big call. The
|
||||
/// cost of going smaller: weaker cross-chunk voiceprints, more cross-chunk speaker
|
||||
/// splitting (the reconciler re-merges some), and more backend round-trips.
|
||||
enum ChunkMode: String, CaseIterable, Identifiable, Codable {
|
||||
case auto, standard, largeGroup, fine
|
||||
|
||||
var id: String { rawValue }
|
||||
|
||||
var label: String {
|
||||
switch self {
|
||||
case .auto: return "Auto (by call size)"
|
||||
case .standard: return "Standard · 2.5 min"
|
||||
case .largeGroup: return "Large group · 60 sec"
|
||||
case .fine: return "Fine · 90 sec"
|
||||
}
|
||||
}
|
||||
|
||||
/// Fixed body length, or nil for `.auto` (resolved from the participant count).
|
||||
var fixedBodySeconds: Double? {
|
||||
switch self {
|
||||
case .auto: return nil
|
||||
case .standard: return 150
|
||||
case .largeGroup: return 60
|
||||
case .fine: return 90
|
||||
}
|
||||
}
|
||||
|
||||
/// More than this many detected participants makes `.auto` pick the short body,
|
||||
/// so one chunk is less likely to exceed Sortformer's ~4-speaker resolution.
|
||||
static let autoLargeThreshold = 4
|
||||
|
||||
/// Resolve the body length in seconds. `.auto` drops to 60s when more than
|
||||
/// `autoLargeThreshold` participants were detected, else uses the 2.5-min default;
|
||||
/// with no count available (audio-only) it stays at the 2.5-min default.
|
||||
func bodySeconds(participantCount: Int?) -> Double {
|
||||
if let fixed = fixedBodySeconds { return fixed }
|
||||
if let n = participantCount, n > Self.autoLargeThreshold { return 60 }
|
||||
return 150
|
||||
}
|
||||
|
||||
/// Overlap margin scaled to the body length (~12%, clamped 8…15s) so a 60s chunk
|
||||
/// isn't dominated by a fixed 15s margin while a 2.5-min chunk keeps the full 15s.
|
||||
static func overlapSeconds(forBody body: Double) -> Double {
|
||||
max(8, min(15, (body * 0.12).rounded()))
|
||||
}
|
||||
}
|
||||
@@ -256,6 +256,9 @@ final class SessionController: ObservableObject {
|
||||
private func startVisual(t0Host: Double, generation: Int, recorder: AudioRecorder) async {
|
||||
guard let capture = pendingCapture else { return } // manual recording → audio-only
|
||||
pendingCapture = nil
|
||||
// Honor the per-app adapter switch: if the user turned this app's adapter off,
|
||||
// skip screen-reading entirely and record audio-only (transcription still runs).
|
||||
guard settings.adapterEnabled[capture.app.rawValue] ?? true else { return }
|
||||
guard let vc = VisualCapture(app: capture.app, bundleID: capture.bundleID,
|
||||
windowID: capture.windowID, t0Host: t0Host) else { return }
|
||||
// Register the live capture before the await so a quit (prepareForTermination)
|
||||
@@ -375,12 +378,15 @@ final class SessionController: ObservableObject {
|
||||
let settings = self.settings
|
||||
let pipeline = TranscriptPipeline(baseURL: settings.backendBaseURL,
|
||||
skipTLS: settings.skipTLSVerification, voiceprints: voiceprints)
|
||||
// Resolve the diarization chunk length from the setting; "Auto" uses the
|
||||
// participant count the visual capture saw for this session.
|
||||
let chunkSeconds = settings.chunk.bodySeconds(participantCount: Self.participantCount(in: inputs.folder))
|
||||
do {
|
||||
let speakers = try await pipeline.process(
|
||||
sessionFolder: inputs.folder, sessionId: inputs.sessionId, app: inputs.app,
|
||||
micURL: inputs.micURL, systemURL: inputs.systemURL, mixedURL: inputs.mixedURL,
|
||||
timeline: inputs.timeline, selfSpans: inputs.selfSpans, selfName: inputs.selfName,
|
||||
systemHealthy: inputs.systemHealthy,
|
||||
systemHealthy: inputs.systemHealthy, chunkSeconds: chunkSeconds,
|
||||
progress: { done, total in await MainActor.run { self.transcriptStatus = .processing(done, total) } })
|
||||
self.transcriptStatus = .done(speakers: speakers.speakers.count, segments: speakers.segments.count)
|
||||
try Task.checkCancellation()
|
||||
@@ -528,6 +534,16 @@ final class SessionController: ObservableObject {
|
||||
}
|
||||
}
|
||||
|
||||
/// Detected participant count from a session's visual timeline, for "Auto" chunk
|
||||
/// sizing. Nil when there's no visual timeline (audio-only) so callers keep the
|
||||
/// default body length. Counts everyone OCR'd on the call, not just who spoke.
|
||||
private static func participantCount(in folder: URL) -> Int? {
|
||||
guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
|
||||
let vt = try? JSONDecoder().decode(VisualTimeline.self, from: data),
|
||||
!vt.participants.isEmpty else { return nil }
|
||||
return vt.participants.count
|
||||
}
|
||||
|
||||
/// The remote (vision) visual-timeline segments saved for a session, if any.
|
||||
private static func remoteTimeline(in folder: URL) -> [VisualTimeline.Segment] {
|
||||
guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
|
||||
|
||||
@@ -28,6 +28,7 @@ final class TranscriptPipeline {
|
||||
selfSpans: [VADSpan],
|
||||
selfName: String,
|
||||
systemHealthy: Bool,
|
||||
chunkSeconds: Double = 150,
|
||||
progress: ((Int, Int) async -> Void)? = nil) async throws -> SpeakersFile {
|
||||
let fm = FileManager.default
|
||||
let dual = systemHealthy
|
||||
@@ -36,7 +37,12 @@ final class TranscriptPipeline {
|
||||
let duration = dual
|
||||
? max(SessionPackager.duration(of: micURL), SessionPackager.duration(of: systemURL))
|
||||
: SessionPackager.duration(of: mixedURL)
|
||||
let plan = SessionPackager.planChunks(durationSec: duration)
|
||||
// Chunk to the requested body length; overlap and the single-chunk threshold
|
||||
// scale with it (a 60s body shouldn't be cut by a fixed 15s margin or stay
|
||||
// unchunked below the 2.5-min default threshold).
|
||||
let overlap = ChunkMode.overlapSeconds(forBody: chunkSeconds)
|
||||
let plan = SessionPackager.planChunks(durationSec: duration, chunkSeconds: chunkSeconds,
|
||||
overlapSeconds: overlap, thresholdSec: chunkSeconds * 1.2)
|
||||
|
||||
// Zero-duration / empty session → a valid empty speakers.json, no backend call.
|
||||
if plan.isEmpty || duration <= 0 {
|
||||
@@ -50,13 +56,20 @@ final class TranscriptPipeline {
|
||||
try? fm.createDirectory(at: chunksDir, withIntermediateDirectories: true)
|
||||
defer { try? fm.removeItem(at: chunksDir) } // cleanup on success OR throw
|
||||
|
||||
// Defensive: drop any visual span covering most of the call in one unbroken
|
||||
// segment — the signature of a stuck/false active-speaker cue (e.g. a solid
|
||||
// camera-off tile read as "speaking" the whole call). Such a span would
|
||||
// dominate the backend's name attribution and collapse every voice onto one
|
||||
// name. Also salvages sessions captured before the adapter fix landed.
|
||||
let vis = Self.dropStuckSpans(timeline, duration: duration)
|
||||
|
||||
// Start from stored voiceprints; accumulate this call's prints across chunks
|
||||
// for within-call unification (the store only persists high-confidence ones).
|
||||
var known = voiceprints.knownVoiceprints()
|
||||
var results: [TranscriptAssembler.ChunkResult] = []
|
||||
// Mono fallback needs self folded into the timeline; dual sends it separately.
|
||||
let monoTimeline = dual ? timeline
|
||||
: timeline + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
|
||||
let monoTimeline = dual ? vis
|
||||
: vis + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
|
||||
|
||||
for chunk in plan {
|
||||
try Task.checkCancellation()
|
||||
@@ -70,7 +83,7 @@ final class TranscriptPipeline {
|
||||
try SessionPackager.sliceAudio(from: micURL, startSec: chunk.start, endSec: chunk.end, to: micChunk)
|
||||
try SessionPackager.sliceAudio(from: systemURL, startSec: chunk.start, endSec: chunk.end, to: sysChunk)
|
||||
guard fm.fileExists(atPath: micChunk.path), fm.fileExists(atPath: sysChunk.path) else { continue }
|
||||
let timelineData = try SessionPackager.rebasedTimelineData(timeline, start: chunk.start, end: chunk.end)
|
||||
let timelineData = try SessionPackager.rebasedTimelineData(vis, start: chunk.start, end: chunk.end)
|
||||
let selfVadData = try SessionPackager.rebasedSelfVadData(selfSpans, start: chunk.start, end: chunk.end)
|
||||
response = try await client.labelMergeDual(
|
||||
micURL: micChunk, systemURL: sysChunk, selfName: selfName, selfVad: selfVadData,
|
||||
@@ -113,4 +126,14 @@ final class TranscriptPipeline {
|
||||
static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
|
||||
spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
|
||||
}
|
||||
|
||||
/// Drop visual (vision-source) spans whose single unbroken duration covers at
|
||||
/// least `maxFraction` of the whole call — no one legitimately speaks that long
|
||||
/// without a break, so it's a stuck/false cue. Self spans (mic_vad) are kept.
|
||||
static func dropStuckSpans(_ timeline: [VisualTimeline.Segment], duration: Double,
|
||||
maxFraction: Double = 0.6) -> [VisualTimeline.Segment] {
|
||||
guard duration > 0 else { return timeline }
|
||||
let limit = maxFraction * duration
|
||||
return timeline.filter { $0.source != "vision" || ($0.end - $0.start) < limit }
|
||||
}
|
||||
}
|
||||
|
||||
@@ -60,6 +60,15 @@ final class AppSettings: ObservableObject {
|
||||
didSet { defaults.set(reconcileSpeakers, forKey: Keys.reconcileSpeakers) }
|
||||
}
|
||||
|
||||
/// Diarization chunk length (raw value of `ChunkMode`). `.auto` shrinks chunks on
|
||||
/// large calls so a window is less likely to exceed Sortformer's ~4-speaker cap.
|
||||
@Published var chunkMode: String {
|
||||
didSet { defaults.set(chunkMode, forKey: Keys.chunkMode) }
|
||||
}
|
||||
|
||||
/// Typed accessor for `chunkMode`.
|
||||
var chunk: ChunkMode { ChunkMode(rawValue: chunkMode) ?? .auto }
|
||||
|
||||
/// User-editable recap templates (takeaways categories per meeting type).
|
||||
@Published var recapTemplates: [RecapTemplate] {
|
||||
didSet { persist(recapTemplates, forKey: Keys.recapTemplates) }
|
||||
@@ -83,11 +92,19 @@ final class AppSettings: ObservableObject {
|
||||
|
||||
private let defaults: UserDefaults
|
||||
|
||||
/// Neutral placeholder. The real (private LAN) backend host is never committed —
|
||||
/// it's entered in Settings (persisted to UserDefaults) or seeded from the
|
||||
/// `SPARK_BACKEND_URL` env var for dev/CI/harness runs.
|
||||
static let defaultBackendURL = "https://your-spark-backend.local"
|
||||
|
||||
init(defaults: UserDefaults = .standard) {
|
||||
self.defaults = defaults
|
||||
|
||||
// Precedence: a value the user saved in Settings wins; else the env var
|
||||
// (handy when launching from Xcode/terminal); else the placeholder.
|
||||
self.backendBaseURL = defaults.string(forKey: Keys.backendBaseURL)
|
||||
?? "https://immense-voyage.local:62419"
|
||||
?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
|
||||
?? Self.defaultBackendURL
|
||||
|
||||
self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true
|
||||
|
||||
@@ -104,6 +121,7 @@ final class AppSettings: ObservableObject {
|
||||
self.autoSendOnStop = defaults.object(forKey: Keys.autoSend) as? Bool ?? false
|
||||
self.recapEnabled = defaults.object(forKey: Keys.recapEnabled) as? Bool ?? true
|
||||
self.reconcileSpeakers = defaults.object(forKey: Keys.reconcileSpeakers) as? Bool ?? true
|
||||
self.chunkMode = defaults.string(forKey: Keys.chunkMode) ?? ChunkMode.auto.rawValue
|
||||
|
||||
let loaded = (defaults.data(forKey: Keys.recapTemplates))
|
||||
.flatMap { try? JSONDecoder().decode([RecapTemplate].self, from: $0) }
|
||||
@@ -126,6 +144,7 @@ final class AppSettings: ObservableObject {
|
||||
static let autoSend = "autoSendOnStop"
|
||||
static let recapEnabled = "recapEnabled"
|
||||
static let reconcileSpeakers = "reconcileSpeakers"
|
||||
static let chunkMode = "chunkMode"
|
||||
static let recapTemplates = "recapTemplates"
|
||||
static let defaultTemplate = "defaultTemplateId"
|
||||
}
|
||||
|
||||
@@ -31,6 +31,35 @@ final class EditorWindow {
|
||||
}
|
||||
}
|
||||
|
||||
/// Hosts the app Settings in a standalone resizable window. Far roomier than the
|
||||
/// old in-popover NavigationLink, which cramped the form into the 320pt menu-bar
|
||||
/// panel and hid most controls below a non-obvious scroll.
|
||||
@MainActor
|
||||
final class SettingsWindow {
|
||||
static let shared = SettingsWindow()
|
||||
private var window: NSWindow?
|
||||
|
||||
func show(settings: AppSettings) {
|
||||
if let window {
|
||||
NSApp.activate(ignoringOtherApps: true)
|
||||
window.makeKeyAndOrderFront(nil)
|
||||
return
|
||||
}
|
||||
let w = NSWindow(
|
||||
contentRect: NSRect(x: 0, y: 0, width: 520, height: 660),
|
||||
styleMask: [.titled, .closable, .resizable, .miniaturizable],
|
||||
backing: .buffered, defer: false)
|
||||
w.title = "Settings"
|
||||
w.isReleasedWhenClosed = false
|
||||
w.center()
|
||||
w.contentViewController = NSHostingController(
|
||||
rootView: SettingsView().environmentObject(settings))
|
||||
window = w
|
||||
NSApp.activate(ignoringOtherApps: true)
|
||||
w.makeKeyAndOrderFront(nil)
|
||||
}
|
||||
}
|
||||
|
||||
/// Hosts the recap-templates manager in its own resizable window.
|
||||
@MainActor
|
||||
final class TemplatesWindow {
|
||||
|
||||
@@ -10,21 +10,19 @@ struct MenuBarView: View {
|
||||
@EnvironmentObject private var session: SessionController
|
||||
|
||||
var body: some View {
|
||||
NavigationStack {
|
||||
VStack(alignment: .leading, spacing: 12) {
|
||||
header
|
||||
Divider()
|
||||
recordingSection
|
||||
Divider()
|
||||
permissionsSection
|
||||
Divider()
|
||||
backendSection
|
||||
Divider()
|
||||
footer
|
||||
}
|
||||
.padding(14)
|
||||
.frame(width: 320)
|
||||
VStack(alignment: .leading, spacing: 12) {
|
||||
header
|
||||
Divider()
|
||||
recordingSection
|
||||
Divider()
|
||||
permissionsSection
|
||||
Divider()
|
||||
backendSection
|
||||
Divider()
|
||||
footer
|
||||
}
|
||||
.padding(14)
|
||||
.frame(width: 320)
|
||||
.onAppear { permissions.refresh() }
|
||||
.task { await refreshHealth() }
|
||||
}
|
||||
@@ -227,9 +225,7 @@ struct MenuBarView: View {
|
||||
|
||||
private var footer: some View {
|
||||
HStack {
|
||||
NavigationLink("Settings…") {
|
||||
SettingsView()
|
||||
}
|
||||
Button("Settings…") { SettingsWindow.shared.show(settings: settings) }
|
||||
Spacer()
|
||||
Button("Quit") { NSApplication.shared.terminate(nil) }
|
||||
}
|
||||
|
||||
@@ -7,6 +7,21 @@ struct SettingsView: View {
|
||||
|
||||
var body: some View {
|
||||
Form {
|
||||
Section("Your name") {
|
||||
TextField("Your name", text: $settings.selfName)
|
||||
.textFieldStyle(.roundedBorder)
|
||||
if isDefaultName {
|
||||
Label("Still set to the default. Enter your real name so your own voice is labeled correctly — and so the AI never gives your name to someone else.",
|
||||
systemImage: "exclamationmark.triangle.fill")
|
||||
.font(.caption)
|
||||
.foregroundStyle(.orange)
|
||||
} else {
|
||||
Text("Labels your microphone channel as you in every transcript, and reserves this name so it’s never assigned to another speaker.")
|
||||
.font(.caption)
|
||||
.foregroundStyle(.secondary)
|
||||
}
|
||||
}
|
||||
|
||||
Section("SparkControl backend") {
|
||||
TextField("Base URL", text: $settings.backendBaseURL)
|
||||
.textFieldStyle(.roundedBorder)
|
||||
@@ -22,10 +37,14 @@ struct SettingsView: View {
|
||||
}
|
||||
|
||||
Section("Transcription") {
|
||||
TextField("Your name", text: $settings.selfName)
|
||||
.textFieldStyle(.roundedBorder)
|
||||
Toggle("Auto-send recordings to backend", isOn: $settings.autoSendOnStop)
|
||||
Toggle("Reconcile speakers (merge splits + name from content)", isOn: $settings.reconcileSpeakers)
|
||||
Picker("Chunk length", selection: $settings.chunkMode) {
|
||||
ForEach(ChunkMode.allCases) { Text($0.label).tag($0.rawValue) }
|
||||
}
|
||||
Text("How finely audio is split for diarization. Shorter chunks keep fewer simultaneous speakers per window (the diarizer resolves ~4 at a time), at some cost to speed and voice matching. Auto uses 60-sec chunks when more than \(ChunkMode.autoLargeThreshold) people are detected on the call, else 2.5 min.")
|
||||
.font(.caption)
|
||||
.foregroundStyle(.secondary)
|
||||
Toggle("Build readable recap (topics + highlights)", isOn: $settings.recapEnabled)
|
||||
HStack {
|
||||
Picker("Default recap template", selection: $settings.defaultTemplateId) {
|
||||
@@ -33,7 +52,7 @@ struct SettingsView: View {
|
||||
}
|
||||
Button("Manage…") { TemplatesWindow.shared.show(settings: settings) }
|
||||
}
|
||||
Text("Your name labels your mic channel. Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
|
||||
Text("Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
|
||||
.font(.caption)
|
||||
.foregroundStyle(.secondary)
|
||||
}
|
||||
@@ -50,7 +69,7 @@ struct SettingsView: View {
|
||||
}
|
||||
|
||||
Section("Adapters") {
|
||||
Text("Inert in Phase 0 — these toggles only persist for now.")
|
||||
Text("Screen-reading for active-speaker cues. Turn one off to record that app audio-only — transcription still runs, but speakers aren’t identified from the screen.")
|
||||
.font(.caption)
|
||||
.foregroundStyle(.secondary)
|
||||
ForEach(AppSettings.adapterKeys, id: \.key) { adapter in
|
||||
@@ -59,10 +78,17 @@ struct SettingsView: View {
|
||||
}
|
||||
}
|
||||
.formStyle(.grouped)
|
||||
.frame(width: 320)
|
||||
.frame(minWidth: 460, idealWidth: 520, maxWidth: .infinity,
|
||||
minHeight: 520, idealHeight: 660, maxHeight: .infinity)
|
||||
.navigationTitle("Settings")
|
||||
}
|
||||
|
||||
/// True while the user still has the placeholder name — drives the inline nudge.
|
||||
private var isDefaultName: Bool {
|
||||
let n = settings.selfName.trimmingCharacters(in: .whitespacesAndNewlines)
|
||||
return n.isEmpty || n.caseInsensitiveCompare("Me") == .orderedSame
|
||||
}
|
||||
|
||||
private func binding(for key: String) -> Binding<Bool> {
|
||||
Binding(
|
||||
get: { settings.adapterEnabled[key] ?? true },
|
||||
|
||||
@@ -120,6 +120,43 @@ struct FrameSampler {
|
||||
return points
|
||||
}
|
||||
|
||||
/// Grid-sampled saturated pixels that lie on a THIN structure (a non-saturated
|
||||
/// pixel within `edgeGap` on some axis) — the coloured counterpart of
|
||||
/// `thinWhitePoints`. This keeps a thin speaking BORDER/ring/pill but drops the
|
||||
/// solid interior of a colour FILL (e.g. Meet's orange/magenta camera-off avatar
|
||||
/// tiles), whose pixels are surrounded by the same colour. Pair with `hueRange`
|
||||
/// to keep only the cue's colour (Meet's blue ring) and reject the thin edges a
|
||||
/// solid tile still has against the background (orange/magenta boundaries).
|
||||
func thinColoredPoints(threshold: Double = 0.35, minBrightness: Double = 60,
|
||||
hueRange: ClosedRange<Double>? = nil,
|
||||
edgeGap: Int = 6, gridStep: Int = 4) -> [CGPoint] {
|
||||
func isCue(_ x: Int, _ y: Int) -> Bool {
|
||||
guard x >= 0, x < width, y >= 0, y < height else { return false }
|
||||
let i = (y * width + x) * 4
|
||||
let r = Double(pixels[i]), g = Double(pixels[i + 1]), b = Double(pixels[i + 2])
|
||||
let mx = max(r, g, b), mn = min(r, g, b)
|
||||
let sat = mx > 0 ? (mx - mn) / mx : 0
|
||||
guard sat > threshold, mx > minBrightness else { return false }
|
||||
if let hr = hueRange { return hr.contains(Self.hueDegrees(r, g, b, mx, mn)) }
|
||||
return true
|
||||
}
|
||||
var points: [CGPoint] = []
|
||||
var y = edgeGap
|
||||
while y < height - edgeGap {
|
||||
var x = edgeGap
|
||||
while x < width - edgeGap {
|
||||
if isCue(x, y) {
|
||||
let thin = !isCue(x - edgeGap, y) || !isCue(x + edgeGap, y)
|
||||
|| !isCue(x, y - edgeGap) || !isCue(x, y + edgeGap)
|
||||
if thin { points.append(CGPoint(x: x, y: y)) }
|
||||
}
|
||||
x += gridStep
|
||||
}
|
||||
y += gridStep
|
||||
}
|
||||
return points
|
||||
}
|
||||
|
||||
/// HSV hue in degrees (0…360) from RGB and its precomputed max/min channels.
|
||||
private static func hueDegrees(_ r: Double, _ g: Double, _ b: Double, _ mx: Double, _ mn: Double) -> Double {
|
||||
let d = mx - mn
|
||||
|
||||
@@ -35,11 +35,21 @@ struct GridCallAnalyzer {
|
||||
var colorSaturation: Double = 0.5
|
||||
var colorMinBrightness: Double = 60
|
||||
var colorHueRange: ClosedRange<Double>? = nil
|
||||
// When true, the coloured highlight is detected from THIN edges only (drops
|
||||
// solid colour fills like Meet's camera-off avatar tiles). Pair with a tight
|
||||
// `colorHueRange` so a solid tile's thin background boundary is rejected too.
|
||||
var coloredBorderThinOnly = false
|
||||
var minTextConfidence: Float = 0.3
|
||||
var maxNameLength = 40
|
||||
var minHighlightPoints = 6
|
||||
var highlightShareOfMax = 0.35
|
||||
var minRingSpan: Double = 60 // a speaking border spans a sizable box, not a speck
|
||||
// A real active-speaker cue is a thin RING (border) with an EMPTY interior.
|
||||
// A solid camera-off avatar tile (Meet's orange/magenta fill) or a screen-share
|
||||
// fill is a filled BLOB — its highlight points spread through the interior. Reject
|
||||
// a component when more than this fraction of its points fall in the central
|
||||
// 60%×60% of its bbox (a hollow ring ≈ 0; a solid fill ≈ 0.36). Set ≥ 1 to disable.
|
||||
var maxInteriorFill: Double = 0.2
|
||||
}
|
||||
|
||||
var config = Config()
|
||||
@@ -68,9 +78,13 @@ struct GridCallAnalyzer {
|
||||
// Highlight pixels: coloured (saturated) and/or white (thin near-white).
|
||||
var highlight: [CGPoint] = []
|
||||
if config.detectColoredBorder {
|
||||
highlight += sampler.saturatedPoints(threshold: config.colorSaturation,
|
||||
minBrightness: config.colorMinBrightness,
|
||||
hueRange: config.colorHueRange)
|
||||
highlight += config.coloredBorderThinOnly
|
||||
? sampler.thinColoredPoints(threshold: config.colorSaturation,
|
||||
minBrightness: config.colorMinBrightness,
|
||||
hueRange: config.colorHueRange)
|
||||
: sampler.saturatedPoints(threshold: config.colorSaturation,
|
||||
minBrightness: config.colorMinBrightness,
|
||||
hueRange: config.colorHueRange)
|
||||
}
|
||||
if config.detectWhiteBorder { highlight += sampler.thinWhitePoints() }
|
||||
|
||||
@@ -89,7 +103,8 @@ struct GridCallAnalyzer {
|
||||
var speakingBBox: [Int: CGRect] = [:] // tile index -> the ring bbox marking it speaking
|
||||
for ring in rings where ring.count >= config.minHighlightPoints {
|
||||
let bb = Self.boundingBox(ring)
|
||||
guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue } // a ring, not a blob
|
||||
guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue } // a ring, not a speck
|
||||
guard Self.isHollow(ring, bbox: bb, maxInteriorFill: config.maxInteriorFill) else { continue } // a ring, not a filled tile
|
||||
for (i, tile) in tiles.enumerated() where bb.contains(CGPoint(x: tile.textRect.midX, y: tile.textRect.midY)) {
|
||||
speakingBBox[i] = bb
|
||||
}
|
||||
@@ -128,6 +143,18 @@ struct GridCallAnalyzer {
|
||||
return Array(groups.values)
|
||||
}
|
||||
|
||||
/// True if `pts` form a hollow ring (border) rather than a filled blob: at most
|
||||
/// `maxInteriorFill` of the points fall in the central 60%×60% of `bbox`. A thin
|
||||
/// border has an empty interior (≈ 0); a solid camera-off avatar tile or a
|
||||
/// screen-share fill spreads points through the interior (≈ 0.36). Disabled when
|
||||
/// `maxInteriorFill >= 1`.
|
||||
static func isHollow(_ pts: [CGPoint], bbox: CGRect, maxInteriorFill: Double) -> Bool {
|
||||
guard maxInteriorFill < 1, !pts.isEmpty else { return true }
|
||||
let inner = bbox.insetBy(dx: bbox.width * 0.2, dy: bbox.height * 0.2)
|
||||
let innerCount = pts.reduce(into: 0) { if inner.contains($1) { $0 += 1 } }
|
||||
return Double(innerCount) / Double(pts.count) <= maxInteriorFill
|
||||
}
|
||||
|
||||
static func boundingBox(_ pts: [CGPoint]) -> CGRect {
|
||||
var minX = Double.greatestFiniteMagnitude, minY = minX, maxX = -minX, maxY = -minX
|
||||
for p in pts { minX = min(minX, p.x); minY = min(minY, p.y); maxX = max(maxX, p.x); maxY = max(maxY, p.y) }
|
||||
@@ -166,7 +193,11 @@ struct GridCallAnalyzer {
|
||||
}
|
||||
|
||||
private func cleaned(_ s: String) -> String {
|
||||
// Trim whitespace and any trailing punctuation OCR tacks on, so "Mark." folds
|
||||
// into "Mark" rather than becoming a separate phantom speaker.
|
||||
s.trimmingCharacters(in: .whitespacesAndNewlines)
|
||||
.trimmingCharacters(in: CharacterSet(charactersIn: ".,;:·•-"))
|
||||
.trimmingCharacters(in: .whitespacesAndNewlines)
|
||||
}
|
||||
|
||||
/// True if `s` looks like a participant name label rather than UI chrome. Call
|
||||
@@ -181,6 +212,14 @@ struct GridCallAnalyzer {
|
||||
if s.rangeOfCharacter(from: CharacterSet(charactersIn: "@:/\\|+*=<>#0123456789")) != nil {
|
||||
return false
|
||||
}
|
||||
// Reject domain-like screen-share text (e.g. "WERUNBTC.COM", OCR'd "WERUNBTC.GOM"):
|
||||
// a token whose final dotted segment is a 2–4 letter suffix. Real names don't end
|
||||
// in a TLD; this keeps "Cait's Phone" and initials like "MO".
|
||||
let lower = s.lowercased()
|
||||
if let dot = lower.lastIndex(of: "."), lower.index(after: dot) < lower.endIndex {
|
||||
let suffix = lower[lower.index(after: dot)...]
|
||||
if (2...4).contains(suffix.count) && suffix.allSatisfy({ $0.isLetter }) { return false }
|
||||
}
|
||||
let words = s.split(separator: " ")
|
||||
guard (1...3).contains(words.count) else { return false }
|
||||
let allowed = CharacterSet.letters.union(CharacterSet(charactersIn: "'.-"))
|
||||
|
||||
@@ -15,9 +15,15 @@ final class TimelineBuilder {
|
||||
private let closeFrames: Int
|
||||
private var aliases: [String: String] = [:] // normalized variant -> canonical
|
||||
private var states: [String: NameState] = [:]
|
||||
private var observed: Set<String> = [] // every tile name seen (speaking or not)
|
||||
private var lastFrameT: Double = 0
|
||||
private(set) var segments: [VisualTimeline.Segment] = []
|
||||
|
||||
/// Every distinct participant name the adapter has OCR'd, whether or not they were
|
||||
/// ever detected speaking — the call-size signal (drives "Auto" chunk sizing and a
|
||||
/// complete participant roster, since speaking-detection is intentionally sparse).
|
||||
var observedNames: [String] { observed.sorted() }
|
||||
|
||||
init(openFrames: Int = 2, closeFrames: Int = 2) {
|
||||
self.openFrames = max(1, openFrames)
|
||||
self.closeFrames = max(1, closeFrames)
|
||||
@@ -34,6 +40,9 @@ final class TimelineBuilder {
|
||||
func ingest(_ observations: [SpeakerObservation], at t: TimeInterval) {
|
||||
lastFrameT = t
|
||||
|
||||
// Record every tile seen (speaking or not) for the participant roster / call size.
|
||||
for obs in observations where !obs.name.isEmpty { observed.insert(canonical(obs.name)) }
|
||||
|
||||
// Best confidence per canonical name that is speaking this frame.
|
||||
var speaking: [String: Double] = [:]
|
||||
for obs in observations where obs.speaking && !obs.name.isEmpty {
|
||||
@@ -93,9 +102,57 @@ final class TimelineBuilder {
|
||||
closeSegment(name: name, state: st)
|
||||
states[name]?.open = false
|
||||
}
|
||||
segments = Self.canonicalizeByFrequency(segments)
|
||||
segments.sort { $0.start < $1.start }
|
||||
}
|
||||
|
||||
/// Fold rare OCR misspellings into the dominant name they're a typo of: a name with
|
||||
/// little total time is remapped to a much longer-running name with the same initial
|
||||
/// within a small edit distance (e.g. "Matt Odel"/"MattOdell"/"Mare" → "Matt Odell"/
|
||||
/// "Mark"). Conservative by design — it won't merge two well-attested speakers, only
|
||||
/// a transient variant into its clearly-dominant canonical. Pure/testable.
|
||||
static func canonicalizeByFrequency(_ segs: [VisualTimeline.Segment],
|
||||
minorMaxSec: Double = 5, dominanceRatio: Double = 8,
|
||||
maxEdits: Int = 2) -> [VisualTimeline.Segment] {
|
||||
var dur: [String: Double] = [:]
|
||||
for s in segs { dur[s.name, default: 0] += s.end - s.start }
|
||||
let names = Array(dur.keys)
|
||||
var remap: [String: String] = [:]
|
||||
for minor in names {
|
||||
let md = dur[minor]!
|
||||
guard md <= minorMaxSec, let mInit = minor.first else { continue }
|
||||
var best: String?, bestDur = 0.0
|
||||
for major in names where major != minor {
|
||||
let Md = dur[major]!
|
||||
guard Md >= md * dominanceRatio, Md > bestDur, major.first == mInit else { continue }
|
||||
if levenshtein(minor.lowercased(), major.lowercased()) <= maxEdits { best = major; bestDur = Md }
|
||||
}
|
||||
if let b = best { remap[minor] = b }
|
||||
}
|
||||
guard !remap.isEmpty else { return segs }
|
||||
return segs.map { s in
|
||||
remap[s.name].map { VisualTimeline.Segment(start: s.start, end: s.end, name: $0,
|
||||
confidence: s.confidence, source: s.source) } ?? s
|
||||
}
|
||||
}
|
||||
|
||||
/// Levenshtein edit distance (small strings — names).
|
||||
static func levenshtein(_ a: String, _ b: String) -> Int {
|
||||
let x = Array(a), y = Array(b)
|
||||
if x.isEmpty { return y.count }; if y.isEmpty { return x.count }
|
||||
var prev = Array(0...y.count)
|
||||
var cur = [Int](repeating: 0, count: y.count + 1)
|
||||
for i in 1...x.count {
|
||||
cur[0] = i
|
||||
for j in 1...y.count {
|
||||
cur[j] = x[i-1] == y[j-1] ? prev[j-1]
|
||||
: Swift.min(prev[j-1], prev[j], cur[j-1]) + 1
|
||||
}
|
||||
swap(&prev, &cur)
|
||||
}
|
||||
return prev[y.count]
|
||||
}
|
||||
|
||||
// MARK: - Internal
|
||||
|
||||
private struct NameState {
|
||||
|
||||
@@ -75,7 +75,10 @@ final class VisualCapture {
|
||||
}, to: durationSec)
|
||||
|
||||
let artifact = (vision + selfSegs).sorted { $0.start < $1.start }
|
||||
let names = Set(artifact.map { $0.name })
|
||||
// Roster = everyone OCR'd (speaking or not) ∪ the names that produced segments,
|
||||
// so the participant count reflects true call size even when few people were
|
||||
// detected speaking. Drives "Auto" chunk sizing downstream.
|
||||
let names = Set(artifact.map { $0.name }).union(observer.participantNames())
|
||||
let participants = names.sorted().map {
|
||||
VisualTimeline.Participant(name: $0, isSelf: $0 == selfName ? true : nil, aliases: nil)
|
||||
}
|
||||
|
||||
@@ -114,6 +114,10 @@ final class VisualObserver: NSObject, SCStreamDelegate, SCStreamOutput {
|
||||
queue.sync { builder.mergeSelfSpans(spans, selfName: selfName) }
|
||||
}
|
||||
|
||||
/// Every distinct participant name OCR'd over the session (read on the builder's
|
||||
/// queue; safe to call after `stop`).
|
||||
func participantNames() -> [String] { queue.sync { builder.observedNames } }
|
||||
|
||||
// MARK: - SCStreamOutput (on `queue`)
|
||||
|
||||
func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
|
||||
|
||||
@@ -138,16 +138,37 @@ final class GridCallAnalyzerTests: XCTestCase {
|
||||
func testNameFilterAgainstRealMeetOCR() {
|
||||
// The exact strings OCR pulled from a real Meet session — only the first
|
||||
// group are participants; the rest are UI chrome that must NOT become speakers.
|
||||
let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me"]
|
||||
let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me", "Matt Odell"]
|
||||
let junk = ["11:43 AM | rvo-rmjg-rdq", "@ Embassy Er", "Admit 1 guest",
|
||||
"Joined as grant.gilliam@gmail.com", "Others may see your video differently",
|
||||
"Others might still see your full video.", "Your meeting's ready", "efforot",
|
||||
"g* Add others", "g+ Add others", "meet.google.com/rvo-rmjg-rdq",
|
||||
"permission before they can join.", "the meeting", "G"]
|
||||
"permission before they can join.", "the meeting", "G",
|
||||
// Screen-share domain text OCR'd as a name (incl. OCR'd TLDs).
|
||||
"WERUNBTC.COM", "WERUNBTG.COM", "WERUNBTC.GOM"]
|
||||
for n in names { XCTAssertTrue(GridCallAnalyzer.isLikelyName(n), "should keep name: \(n)") }
|
||||
for j in junk { XCTAssertFalse(GridCallAnalyzer.isLikelyName(j), "should drop junk: \(j)") }
|
||||
}
|
||||
|
||||
func testHollowRingKeptFilledTileRejected() {
|
||||
// A thin ring (border): points only on the perimeter of a 120×120 box.
|
||||
var ring: [CGPoint] = []
|
||||
for t in stride(from: 0.0, through: 120, by: 4) {
|
||||
ring.append(.init(x: t, y: 0)); ring.append(.init(x: t, y: 120))
|
||||
ring.append(.init(x: 0, y: t)); ring.append(.init(x: 120, y: t))
|
||||
}
|
||||
let rbb = GridCallAnalyzer.boundingBox(ring)
|
||||
XCTAssertTrue(GridCallAnalyzer.isHollow(ring, bbox: rbb, maxInteriorFill: 0.2))
|
||||
|
||||
// A solid fill (camera-off avatar tile): points across the whole box.
|
||||
var blob: [CGPoint] = []
|
||||
for x in stride(from: 0.0, through: 120, by: 4) {
|
||||
for y in stride(from: 0.0, through: 120, by: 4) { blob.append(.init(x: x, y: y)) }
|
||||
}
|
||||
let bbb = GridCallAnalyzer.boundingBox(blob)
|
||||
XCTAssertFalse(GridCallAnalyzer.isHollow(blob, bbox: bbb, maxInteriorFill: 0.2))
|
||||
}
|
||||
|
||||
func testWhiteBorderDetectorIgnoresColouredBorder() {
|
||||
// Signal looks only for the white border, so a coloured (Meet) border must
|
||||
// not register as a Signal speaker.
|
||||
|
||||
@@ -37,6 +37,45 @@ final class Phase5Tests: XCTestCase {
|
||||
XCTAssertEqual(asm.speakersFile.segments[0].start, 152, accuracy: 0.01)
|
||||
}
|
||||
|
||||
func testChunkModeResolvesBodyLength() {
|
||||
// Fixed presets ignore participant count.
|
||||
XCTAssertEqual(ChunkMode.standard.bodySeconds(participantCount: 99), 150)
|
||||
XCTAssertEqual(ChunkMode.largeGroup.bodySeconds(participantCount: 2), 60)
|
||||
XCTAssertEqual(ChunkMode.fine.bodySeconds(participantCount: nil), 90)
|
||||
// Auto: >4 detected → 60s, ≤4 → 150s, unknown → 150s.
|
||||
XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 6), 60)
|
||||
XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 4), 150)
|
||||
XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: nil), 150)
|
||||
}
|
||||
|
||||
func testChunkOverlapScalesWithBody() {
|
||||
XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 150), 15) // capped
|
||||
XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 60), 8) // floored (60*0.12=7.2→8)
|
||||
XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 90), 11) // 90*0.12=10.8→11
|
||||
}
|
||||
|
||||
func testPlanChunksShortBodyChunksAShortCall() {
|
||||
// A 100s call would be ONE chunk at the 2.5-min default, but at a 60s body it
|
||||
// splits — so "Large group" actually re-chunks medium calls.
|
||||
let c = SessionPackager.planChunks(durationSec: 100, chunkSeconds: 60,
|
||||
overlapSeconds: 8, thresholdSec: 72)
|
||||
XCTAssertEqual(c.count, 2)
|
||||
XCTAssertEqual(c[0].bodyStart, 0); XCTAssertEqual(c[0].bodyEnd, 60)
|
||||
XCTAssertEqual(c[1].bodyStart, 60); XCTAssertEqual(c[1].bodyEnd, 100)
|
||||
}
|
||||
|
||||
func testDropStuckSpansRemovesWholeCallCue() {
|
||||
let segs = [
|
||||
VisualTimeline.Segment(start: 0, end: 1900, name: "Grant Gilliam", confidence: 1, source: "vision"), // stuck whole-call tile
|
||||
VisualTimeline.Segment(start: 100, end: 130, name: "Matt Odell", confidence: 0.9, source: "vision"), // real
|
||||
VisualTimeline.Segment(start: 0, end: 1900, name: "Grant", confidence: 1, source: "mic_vad"), // self span: keep
|
||||
]
|
||||
let out = TranscriptPipeline.dropStuckSpans(segs, duration: 1976)
|
||||
XCTAssertFalse(out.contains { $0.name == "Grant Gilliam" }) // 96% of call in one span → dropped
|
||||
XCTAssertTrue(out.contains { $0.name == "Matt Odell" }) // short real span kept
|
||||
XCTAssertTrue(out.contains { $0.source == "mic_vad" }) // self never dropped
|
||||
}
|
||||
|
||||
func testRebaseClipsAndRebases() throws {
|
||||
let segs = [
|
||||
VisualTimeline.Segment(start: 140, end: 160, name: "A", confidence: 0.9, source: "vision"),
|
||||
|
||||
@@ -11,6 +11,27 @@ final class VisualObserverTests: XCTestCase {
|
||||
(id, CGRect(x: 0, y: 0, width: w, height: h))
|
||||
}
|
||||
|
||||
func testCanonicalizeFoldsOcrMisspellingsIntoDominantName() {
|
||||
func seg(_ s: Double, _ e: Double, _ n: String) -> VisualTimeline.Segment {
|
||||
.init(start: s, end: e, name: n, confidence: 0.9, source: "vision")
|
||||
}
|
||||
let segs = [
|
||||
seg(0, 1689, "Matt Odell"), // dominant
|
||||
seg(1700, 1702, "Matt Odel"), // OCR typo → fold
|
||||
seg(1702, 1702.3, "MattOdell"), // dropped-space typo → fold
|
||||
seg(0, 1155, "Mark"), // dominant
|
||||
seg(1200, 1201, "Mare"), // OCR typo → fold into Mark
|
||||
seg(0, 4, "Sidisel"), // screen junk, no near-twin → kept (dropped later, no voice match)
|
||||
]
|
||||
let names = Set(TimelineBuilder.canonicalizeByFrequency(segs).map { $0.name })
|
||||
XCTAssertTrue(names.contains("Matt Odell"))
|
||||
XCTAssertTrue(names.contains("Mark"))
|
||||
XCTAssertFalse(names.contains("Matt Odel"))
|
||||
XCTAssertFalse(names.contains("MattOdell"))
|
||||
XCTAssertFalse(names.contains("Mare"))
|
||||
XCTAssertTrue(names.contains("Sidisel"))
|
||||
}
|
||||
|
||||
func testPrefersMatchingWindowIDOverLargest() {
|
||||
// The Meet window (id 42) is NOT the largest — must still be chosen by ID.
|
||||
let candidates = [c(7, 1600, 1000), c(42, 800, 600), c(9, 1200, 900)]
|
||||
|
||||
@@ -135,10 +135,11 @@ Full request/response shapes, curl examples, limits, and error formats are in
|
||||
|
||||
## 7. Remaining open items (small)
|
||||
|
||||
1. **Base URL — RESOLVED.** `https://192.168.1.72:62419`, also
|
||||
`https://immense-voyage.local:62419` (prefer the `.local` form; it survives IP
|
||||
changes). Ship the `.local` host as the default; keep it editable in settings.
|
||||
Service-discovery at `GET /api/endpoints`.
|
||||
1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
|
||||
over a raw IP, since it survives IP changes) — configured in Settings or via the
|
||||
`SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
|
||||
the default; keep it editable in settings. Service-discovery at
|
||||
`GET /api/endpoints`.
|
||||
2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
|
||||
toggle if the user wants to eyeball the timeline first.
|
||||
3. **Retention** — keep the session folder after a successful hand-off, or prune
|
||||
|
||||
@@ -76,12 +76,13 @@ locally — the mic track is the user's known identity / VAD source.)
|
||||
|
||||
## 3. SparkControl — connection (real)
|
||||
|
||||
- **Base URL (confirmed):** `https://192.168.1.72:62419` — also reachable at
|
||||
`https://immense-voyage.local:62419` (the `.local` form survives IP changes;
|
||||
**prefer it as the default**). Service-discovery JSON is at
|
||||
- **Base URL (confirmed):** a private LAN host — a `.local` mDNS name (preferred
|
||||
over a raw IP; it survives IP changes) — configured in Settings or via the
|
||||
`SPARK_BACKEND_URL` env var, and **never committed**. Service-discovery JSON is at
|
||||
`GET /api/endpoints` (returns current vLLM / Parakeet / Kokoro URLs). All audio
|
||||
endpoints in §4–§5 hang off this base. Still **make it a setting** so the host
|
||||
can change, but ship `https://immense-voyage.local:62419` as the default.
|
||||
endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
|
||||
change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
|
||||
the default.
|
||||
- **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
|
||||
delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
|
||||
the Start9 Root CA into the trust store.
|
||||
|
||||
+8
-5
@@ -7,17 +7,20 @@ options:
|
||||
createIntermediateGroups: true
|
||||
groupSortPosition: top
|
||||
|
||||
# Signing identity (DEVELOPMENT_TEAM) is kept out of source in a gitignored xcconfig
|
||||
# so the Team ID isn't committed. Copy Config/Signing.xcconfig.example to
|
||||
# Config/Signing.xcconfig and set your team. Keeping the value stable is what makes
|
||||
# macOS TCC grants (Mic / Screen Recording / Accessibility) persist across rebuilds.
|
||||
configFiles:
|
||||
Debug: Config/Signing.xcconfig
|
||||
Release: Config/Signing.xcconfig
|
||||
|
||||
settings:
|
||||
base:
|
||||
MARKETING_VERSION: "0.1.0"
|
||||
CURRENT_PROJECT_VERSION: "1"
|
||||
SWIFT_VERSION: "5.0"
|
||||
CODE_SIGN_STYLE: Automatic
|
||||
# Grant's free personal team (cert OU). Baked in so `xcodegen generate` keeps
|
||||
# a STABLE signing identity across regenerations — macOS ties TCC permission
|
||||
# grants (Mic / Screen Recording / Accessibility) to this identity, so a
|
||||
# stable team is what makes those permissions persist across rebuilds.
|
||||
DEVELOPMENT_TEAM: "BK4Y6CXN35"
|
||||
|
||||
targets:
|
||||
Ten31Transcripts:
|
||||
|
||||
Reference in New Issue
Block a user