Compare commits

10 commits:

| SHA1 |
|---|
| 3dd02f8ce6 |
| b0a4b50dac |
| 9a80c7c96e |
| 18af17f26c |
| 19ca85abd5 |
| 98a198471c |
| a273e768dc |
| c81bdc4cba |
| 836b930083 |
| 217639f12e |

@@ -17,3 +17,10 @@ build/
 
 # Personal call screenshots / fixtures (faces, contact names) — never commit
 example-screenshots/
+
+# Local signing identity (Apple Team ID) — keep out of source; template is committed
+Config/Signing.xcconfig
+
+# Local env files (e.g. SPARK_BACKEND_URL for dev/harness runs) — never commit
+.env
+.env.local

@@ -0,0 +1,89 @@
# AGENTS.md — Ten31 Transcripts

Native macOS **menu-bar app** that detects video calls, records dual-track audio + watches the call window for active-speaker cues, and sends audio + a visual timeline to a self-hosted **SparkControl** backend that does transcription/diarization/naming — producing named transcripts and recaps.

## Stack (versions that matter)

- **Swift 5.0**, **SwiftUI** + AppKit, macOS **13.0** deployment target. `LSUIElement` (menu-bar only, no Dock icon).
- Project is generated by **XcodeGen** from `project.yml` (`brew install xcodegen`). `*.xcodeproj` is **gitignored** — regenerate, don't edit.
- Full Xcode lives at `/Applications/Xcode.app`, but `xcode-select` points at CommandLineTools → **set `DEVELOPER_DIR` for every `xcodebuild`**.
- Bundle id `xyz.ten31.transcripts`; `DEVELOPMENT_TEAM` (Apple Team ID) is set in a **gitignored `Config/Signing.xcconfig`** (copy `Config/Signing.xcconfig.example` and set your team). Keep it stable — a constant signing identity is what preserves TCC grants across rebuilds.
- Backend: SparkControl gateway at `$SPARK_BACKEND_URL` (a private LAN `.local` host; self-signed cert, so TLS-skip is intentional). Resolution order: a value saved in **Settings → SparkControl backend** (UserDefaults) wins, else the `SPARK_BACKEND_URL` env var, else the placeholder default in `AppSettings.swift`. Diarization = Sortformer/TitaNet (**mono-only**, ~4 speakers/chunk); LLM = Qwen3 via OpenAI-compatible `/v1/chat/completions`; audio via `/api/audio/label-merge`.
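
The LLM leg above is a plain OpenAI-compatible chat call. As a minimal sketch of that request shape, assuming only the `model` and `messages` fields are needed (the model id `qwen3` is a placeholder, and `ChatMessage`, `ChatCompletionRequest`, and `makeChatRequest` are hypothetical names, not the `GatewayLLMClient` API), the POST could be built like this:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// One chat message in the OpenAI-compatible wire format.
struct ChatMessage: Codable, Equatable {
    let role: String     // "system", "user", or "assistant"
    let content: String
}

/// Minimal request body; only `model` and `messages` are modeled here.
struct ChatCompletionRequest: Codable, Equatable {
    let model: String
    let messages: [ChatMessage]
}

/// Hypothetical helper: builds a JSON POST to `<base>/v1/chat/completions`.
func makeChatRequest(base: URL, model: String, messages: [ChatMessage]) throws -> URLRequest {
    // Append one component at a time so no slash is ever percent-encoded.
    let url = base.appendingPathComponent("v1")
        .appendingPathComponent("chat")
        .appendingPathComponent("completions")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatCompletionRequest(model: model, messages: messages))
    return request
}
```

Skipping TLS verification for the self-signed gateway certificate is a URLSession delegate concern (`InsecureTrustDelegate` in this app) and is deliberately left out of the sketch.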

## Commands

First time on a machine — create the local signing config (else `xcodegen generate`/signing won't find a team):

```
cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
```

Regenerate the Xcode project (after adding/removing/renaming any source file):

```
xcodegen generate
```

Build + run all tests:

```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd
```

Run a **single** test (target/class/method):

```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild test \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -destination 'platform=macOS' -derivedDataPath /tmp/ten31-dd \
  -only-testing:Ten31TranscriptsTests/SpeakerReconcilerTests/testCosine
```

Build only: replace `test` with `build`. **Lint/format:** none configured (no SwiftLint/SwiftFormat/Makefile); adding one is tracked in `ROADMAP.md`.

Build a standalone app and install/run it (Xcode does **not** need to stay open):

```
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer xcodebuild \
  -project Ten31Transcripts.xcodeproj -scheme Ten31Transcripts \
  -configuration Release -derivedDataPath /tmp/ten31-release build
ditto /tmp/ten31-release/Build/Products/Release/Ten31Transcripts.app /Applications/Ten31Transcripts.app
open /Applications/Ten31Transcripts.app
```

**Fast validation harness** (preferred for visual/backend logic): compile the specific `Ten31Transcripts/**.swift` files plus a `main.swift` with `xcrun --sdk macosx swiftc -O ... main.swift -o x` and run against real fixtures (`example-screenshots/`) or saved sessions. Top-level code must live in the file literally named `main.swift`.

## Layout (day one)

- `Ten31Transcripts/App/` — `@main` entry + `AppDelegate`.
- `Ten31Transcripts/Session/` — `SessionController` (state machine), `TranscriptPipeline`, `SessionPackager` (chunking), `TranscriptAssembler`, `SpeakerReconciler`, `ChunkPlan` (`ChunkMode`), `SpeakersFile`.
- `Ten31Transcripts/Visual/` — `VisualCapture`/`VisualObserver` (ScreenCaptureKit, ~3fps), `GridCallAnalyzer` (+ `FrameSampler`, `TextRecognizer`, `TimelineBuilder`, `VisualTimeline`, `SpeakerObservation`).
- `Ten31Transcripts/Adapters/` — per-app screen-readers (`MeetAdapter`, `ZoomAdapter`, `TeamsAdapter`, `SignalAdapter`) + `AdapterRegistry`.
- `Ten31Transcripts/Audio/` — `AudioRecorder`, `MicVAD`, `ChannelSelfVAD`.
- `Ten31Transcripts/Backend/` — `SparkControlClient`, `GatewayLLMClient`, `VoiceprintStore`, `SparkControlHealth`, `InsecureTrustDelegate` (TLS skip).
- `Ten31Transcripts/Recap/` — `RecapAnalyzer`, `RecapRenderer` (writes `transcript.md` + `recap.html`), `RecapModels`, `RecapTemplate`, `SpeakerEditing`, `RecapEditModel`.
- `Ten31Transcripts/{Detection,Permissions,Settings,UI,Support}/` — `CallDetector`; `PermissionsManager`; `AppSettings` (UserDefaults); SwiftUI views + AppKit window hosts; `Info.plist` + entitlements.
- `Ten31TranscriptsTests/` — XCTest. `example-screenshots/` — real fixtures (gitignored). `docs/`, `README.md`.
- **Runtime output** (default `~/Ten31Transcripts/sessions/<ts>_<app>/`, configurable in Settings): `mic.wav`, `system.wav`, `mixed_mono_16k.wav`, `self_vad.json`, `visual_timeline.json`, `speakers.json` (output), `cluster_fingerprints.json`, `recap.{html,json}`, `transcript.md`.

## Conventions

- Match the surrounding file's style; small reviewable diffs; comments explain **why**, not what.
- Write/extend XCTest alongside non-trivial changes; pure logic (chunking, reconciliation, analyzer math) is unit-tested offline.
- Commits: imperative mood, concise; authored by Grant. **No remote is configured** — confirm where to push (choosing one is tracked in `ROADMAP.md`). Branch before committing; never commit to `main` without asking.
- Never commit recordings, transcripts, screenshots, or the generated `*.xcodeproj`.
- No API keys/tokens/passwords in the repo. The backend host (`$SPARK_BACKEND_URL`) and the Apple Team ID (`Config/Signing.xcconfig`, gitignored) are kept out of source — real values live in Settings/UserDefaults and the local xcconfig. Build env vars: `DEVELOPER_DIR` (required) and optional `SPARK_BACKEND_URL`.

## Always

- Set `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` on every `xcodebuild`.
- Run `xcodegen generate` after adding/removing/renaming source files.
- Treat the backend as the owner of transcription, diarization, and speaker naming; the app only records, watches, packages, and reconciles hints.
- Identify **self by the mic channel** + the single name in Settings → Your name, and keep that name reserved so the LLM never assigns it to another speaker.
- Treat visual active-speaker cues as **naming hints over audio diarization** (the backbone): prefer sparse-but-correct detection over dense-but-wrong.
- Send the backend dual-channel (`mic_file` + `system_file`) when the system track is healthy, else the mono `mixed_mono_16k.wav`; keep backend calls **sequential** (one in flight).
- After any code change, rebuild Release + `ditto` to `/Applications` — the installed copy does **not** auto-update.
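
The upload rule in the list above (dual-channel when the system track is healthy, otherwise the mono mix) reduces to a small branch. A sketch, assuming the session directory holds the files named under Runtime output and that "healthy" is decided elsewhere and passed in as a flag; `UploadPayload` and `choosePayload` are hypothetical names:

```swift
import Foundation

/// What to send for one session. Hypothetical type for illustration.
enum UploadPayload: Equatable {
    case dualChannel(mic: URL, system: URL)  // uploaded as mic_file + system_file
    case mono(URL)                           // mixed_mono_16k.wav fallback
}

/// Picks dual-channel only when the system track is usable; the health check
/// itself is assumed to happen elsewhere and arrives as `systemTrackHealthy`.
func choosePayload(sessionDir: URL, systemTrackHealthy: Bool) -> UploadPayload {
    let mic = sessionDir.appendingPathComponent("mic.wav")
    let system = sessionDir.appendingPathComponent("system.wav")
    let mono = sessionDir.appendingPathComponent("mixed_mono_16k.wav")
    return systemTrackHealthy ? .dualChannel(mic: mic, system: system) : .mono(mono)
}
```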

## Never

- **Never write video frames to disk** — analyze in-memory and release immediately (privacy non-negotiable).
- **Never add Co-Authored-By / "Generated with" / any AI or tool attribution** to commits or PRs.
- Never commit secrets, recordings, transcripts, or `example-screenshots/` (faces + contact names).
- Never do per-platform display-name matching for self (Zoom/Meet/Signal names differ) — channel + one canonical name only.
- Never treat a solid camera-off avatar tile (Meet's orange/magenta fill) as an active speaker — the real cue is a thin **hollow** coloured ring; require thin-edge + hue gate (see `GridCallAnalyzer.isHollow`, `FrameSampler.thinColoredPoints`).
- Never collapse adjacent same-speaker transcript segments (reverted by request) — one line per diarized utterance.
- Never send call audio to a raw IP the user didn't configure. The backend host (`$SPARK_BACKEND_URL`) is a private `.local` mDNS name a plain `swiftc` binary can't resolve via URLSession (`-1009`) — use the **real app** for backend runs (or `curl` for health checks).
- Never commit to `main` or force-push a shared branch; branch first and ask.
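
Reserving the self name can be enforced after the LLM proposes names: whatever it suggests, only the mic-channel cluster may carry the name from Settings → Your name. A sketch under stated assumptions (clusters keyed by a string id, case-insensitive name comparison); `ClusterID` and `applyNames` are hypothetical, not the app's reconciler:

```swift
import Foundation

typealias ClusterID = String

/// Applies LLM-proposed names while keeping `selfName` reserved for the
/// mic-channel cluster. Any other cluster proposed under that name is left unnamed.
func applyNames(proposed: [ClusterID: String],
                selfCluster: ClusterID?,
                selfName: String) -> [ClusterID: String] {
    let reserved = selfName.trimmingCharacters(in: .whitespaces).lowercased()
    var result: [ClusterID: String] = [:]
    for (cluster, name) in proposed {
        let normalized = name.trimmingCharacters(in: .whitespaces).lowercased()
        // Reject the reserved name on anything that is not the self cluster.
        if cluster != selfCluster && normalized == reserved {
            continue
        }
        result[cluster] = name
    }
    // Self is identified by channel, not by whatever the LLM proposed for it.
    if let s = selfCluster {
        result[s] = selfName
    }
    return result
}
```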

## Current state

Present tense; overwritten each session. 69 tests pass; `/Applications/Ten31Transcripts.app` matches HEAD and runs.

- **Working:** call detection (Meet/Zoom/Teams/Signal), dual-track capture, dual-channel + chunked backend hand-off, speaker reconciliation, recap (`transcript.md` + recap-relay-styled `recap.html`), speaker editor, configurable chunk length, standalone Settings window.
- **In progress:** the Meet visual fix (reject solid camera-off tiles) is unverified end-to-end — no clean run exists yet; the saved Meet session's `visual_timeline.json` predates the fix.
- **Decided but not implemented:** none open (deferred items live in `ROADMAP.md`).
- **Known bugs:** Meet speaking-detection is sparse (faint blue border); the mic channel emits some sub-second junk "self" fragments; the same person on desktop-mic vs phone-speakerphone does not unify by voiceprint.
- **Next:** (1) re-process the saved Meet session in the app, then read its `speakers.json` + `cluster_fingerprints.json` to confirm ~4 speakers recover; (2) confirm Settings → Your name = "Grant"; (3) record a fresh Meet call to validate the fix on a clean capture; (4) decide a git remote and push.

@@ -0,0 +1,4 @@
// Template for Config/Signing.xcconfig (which is gitignored).
// Copy to Config/Signing.xcconfig and set your Apple Developer Team ID
// (Xcode ▸ Settings ▸ Accounts, or `security find-identity -p codesigning -v`).
DEVELOPMENT_TEAM = YOUR_APPLE_TEAM_ID

@@ -14,25 +14,30 @@ This repo is at **Phase 0** (scaffold, permissions, backend health check).
 ```sh
 brew install xcodegen
 ```
-3. **Generate the project:**
+3. **Set your signing team.** The Apple Team ID is kept out of source in a
+gitignored `Config/Signing.xcconfig`. Copy the template and set your team:
+```sh
+cp Config/Signing.xcconfig.example Config/Signing.xcconfig # then set DEVELOPMENT_TEAM
+```
+`xcodegen` wires it in via `configFiles`, so **Signing & Capabilities** shows the
+team automatically — no manual selection. Keep the value stable so macOS
+preserves the app's permission (TCC) grants across rebuilds. Edit the xcconfig,
+not Xcode — `xcodegen generate` overwrites Xcode-side changes.
+4. **Generate the project:**
 ```sh
 xcodegen generate
 ```
 This creates `Ten31Transcripts.xcodeproj` (git-ignored — regenerate any time).
-4. **Open it:**
+5. **Open it:**
 ```sh
 open Ten31Transcripts.xcodeproj
 ```
-5. Signing is preconfigured: `project.yml` sets `DEVELOPMENT_TEAM` to the free
-personal team `BK4Y6CXN35` with automatic signing, so **Signing & Capabilities
-should already show the team** — no manual selection needed. (If you ever sign
-with a different Apple ID, update `DEVELOPMENT_TEAM` in `project.yml`, not in
-Xcode — `xcodegen generate` overwrites Xcode-side changes.)
 6. Press **Run** (⌘R).
 
 > **Note:** after adding files in a new phase, re-run `xcodegen generate` and let
 > Xcode reload the project. The signing team persists because it lives in
-> `project.yml`, so macOS permissions stay granted across rebuilds.
+> `Config/Signing.xcconfig` (gitignored), so macOS permissions stay granted across
+> rebuilds.
 
 ## What Phase 0 does
 
@@ -64,5 +69,6 @@ Ten31TranscriptsTests/ # placeholder; real tests land in Phase 3
 
 - **App Sandbox is off** and **Hardened Runtime is off** — this is a personal,
 LAN-only tool that must observe other apps. Revisit only if distributing.
-- The default backend host is `https://immense-voyage.local:62419` (editable in
-Settings).
+- The backend host is a private LAN address — set it in **Settings**, or seed it
+from the `SPARK_BACKEND_URL` env var; the committed default is only a neutral
+placeholder (`https://your-spark-backend.local`).

@@ -0,0 +1,27 @@
# ROADMAP — Ten31 Transcripts

Longer-term backlog and deferred decisions. Near-term status + the next few steps live in `AGENTS.md` → Current state.

## Visual detection

- Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
- Geometric screen-share exclusion: ignore OCR text in the shared-screen region (needs layout detection). Today only the domain filter + stuck-span guard catch share-text-as-speaker.
- Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
- Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.

## Audio / speakers

- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
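
One possible shape for the self-cleanup item is a plain minimum-duration filter applied to self segments only. The sketch assumes a simple `Segment` type and a 1.0 s floor, both hypothetical and untuned. Dropped fragments simply go unattributed rather than being merged into a neighbour, which keeps the one-line-per-utterance rule intact:

```swift
/// A diarized segment in seconds. Hypothetical shape for illustration.
struct Segment: Equatable {
    let speaker: String
    let start: Double
    let end: Double
    var duration: Double { return end - start }
}

/// Removes "self" segments shorter than `minSelfDuration`; every other
/// speaker passes through untouched. The 1.0 s default is an assumption.
func dropShortSelfFragments(_ segments: [Segment], selfSpeaker: String,
                            minSelfDuration: Double = 1.0) -> [Segment] {
    return segments.filter { $0.speaker != selfSpeaker || $0.duration >= minSelfDuration }
}
```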

## App / UX

- Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
- Constrain recap reading width on very wide windows (long line length in the summary band).

## Tooling / repo

- Decide and configure a git remote (none set); then push.
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
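
The proposed fix for the shadowing issue could look like the following sketch: an empty or placeholder stored value counts as absent, so `SPARK_BACKEND_URL` still wins. The placeholder string is the neutral default quoted in the README; `resolveBackendURL` is a hypothetical helper, and in the app `env` would come from `ProcessInfo.processInfo.environment`:

```swift
import Foundation

/// Neutral placeholder committed as the default (per the README).
let placeholderBackendURL = "https://your-spark-backend.local"

/// Stored Settings value, else env var, else placeholder. Unlike the current
/// behavior, an empty or placeholder stored value no longer shadows the env var.
func resolveBackendURL(stored: String?, env: [String: String]) -> String {
    // Treat nil, blank, and placeholder values as "not configured".
    func usable(_ value: String?) -> String? {
        guard let trimmed = value?.trimmingCharacters(in: .whitespacesAndNewlines),
              !trimmed.isEmpty, trimmed != placeholderBackendURL else { return nil }
        return trimmed
    }
    return usable(stored) ?? usable(env["SPARK_BACKEND_URL"]) ?? placeholderBackendURL
}
```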

## Deferred decisions

- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).

@@ -32,6 +32,16 @@ struct MeetAdapter: AppAdapter {
         // The bright ring (#1a73e8) is ~0.89 sat but the lighter glow (#8ab4f8) is
         // ~0.44, below the 0.5 default — lower the threshold so the glow registers.
         config.colorSaturation = 0.35
+        // Meet's active cue is a thin BLUE (≈210°) ring + audio pill. Detect thin blue
+        // EDGES only, gated to blue: this rejects solid camera-off avatar tiles (orange
+        // ≈30°, magenta ≈340°), which otherwise read as "speaking" for the whole call
+        // and collapse every remote voice onto one name. Validated on real fixtures.
+        config.coloredBorderThinOnly = true
+        config.colorHueRange = 180...240
+        // Meet's blue border is faint; real rings measure ≈0.20–0.30 interior fill while
+        // solid tiles measure ≈0.36, so allow a higher fill here than the 0.2 default to
+        // recover real borders without readmitting the solid-tile false positives.
+        config.maxInteriorFill = 0.3
         config.tileExpandX = 3.0
         config.tileExpandY = 5.0
         self.analyzer = GridCallAnalyzer(config: config)

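The saturation figures in the comments (≈0.89 for `#1a73e8`, ≈0.44 for `#8ab4f8`) match HSV saturation, (max − min) / max. The sketch below reproduces them and applies the same saturation floor and hue window as a per-pixel gate, plus the interior-fill cap as a tile-level check. It illustrates the numbers under those assumptions only: the real `GridCallAnalyzer`/`FrameSampler` logic (thin-edge detection in particular) is not in this diff, and `hueSaturation`, `passesMeetColorGate`, and `isHollowRing` are hypothetical names.

```swift
/// HSV hue in degrees (0..<360) and saturation (0...1) from 8-bit RGB.
func hueSaturation(r: Int, g: Int, b: Int) -> (hue: Double, saturation: Double) {
    let rf = Double(r) / 255, gf = Double(g) / 255, bf = Double(b) / 255
    let maxC = max(rf, gf, bf), minC = min(rf, gf, bf)
    let delta = maxC - minC
    let saturation = maxC == 0 ? 0 : delta / maxC
    guard delta > 0 else { return (0, saturation) }  // gray: hue undefined
    var hue: Double
    if maxC == rf {
        hue = 60 * ((gf - bf) / delta).truncatingRemainder(dividingBy: 6)
    } else if maxC == gf {
        hue = 60 * ((bf - rf) / delta + 2)
    } else {
        hue = 60 * ((rf - gf) / delta + 4)
    }
    if hue < 0 { hue += 360 }  // wrap negative red-side hues (e.g. magenta)
    return (hue, saturation)
}

/// Per-pixel gate mirroring the Meet settings: saturation >= 0.35, hue in 180...240.
func passesMeetColorGate(r: Int, g: Int, b: Int,
                         minSaturation: Double = 0.35,
                         hueRange: ClosedRange<Double> = 180...240) -> Bool {
    let (h, s) = hueSaturation(r: r, g: g, b: b)
    return s >= minSaturation && hueRange.contains(h)
}

/// Tile-level check: a hollow ring keeps interior fill at or below the cap.
func isHollowRing(interiorFill: Double, maxInteriorFill: Double = 0.3) -> Bool {
    return interiorFill <= maxInteriorFill
}
```

With these defaults both Meet blues pass, a ≈32° orange and a ≈340° magenta fail the hue window, and a 0.36 solid-tile fill fails the hollow check.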
@@ -82,6 +82,10 @@ enum RecapRenderer {
 
     // MARK: - HTML
 
+    /// Mirror of recap-relay's job-output view: a header, an optional band of recap
+    /// cards (summary + takeaways), then a two-pane split — topic list on the left,
+    /// full diarized transcript on the right, click a topic to jump + highlight its
+    /// range. Self-contained (data baked in; the click handler is inline JS).
     static func html(file: SpeakersFile, result: RecapResult, title: String,
                      entries: [RecapAnalyzer.Entry]) -> String {
         let speakers = RecapAnalyzer.orderedSpeakerNames(entries)
@@ -91,66 +95,78 @@ enum RecapRenderer {
             return "<span class=\"chip\" style=\"background:\(c)\">\(esc(name))</span>"
         }
 
-        var body = ""
+        // Header: title, meta line, speaker legend.
         let sub = "\(esc(file.app)) · \(RecapAnalyzer.mmss(file.durationSec))"
             + (speakers.isEmpty ? "" : " · \(speakers.count) speaker\(speakers.count == 1 ? "" : "s")")
-        body += "<header><h1>\(esc(title))</h1><div class=\"sub\">\(sub)</div>"
+        var header = "<div class=\"header\"><div class=\"htext\"><h1>\(esc(title))</h1><div class=\"meta\">\(sub)</div></div>"
         if !speakers.isEmpty {
-            body += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
+            header += "<div class=\"legend\">" + speakers.map { chip($0) }.joined() + "</div>"
         }
-        body += "</header>"
+        header += "</div>"
 
+        // Recap cards band (summary + template takeaways).
+        var cards = ""
         if let x = result.extras {
             if !x.tldr.isEmpty {
-                body += card("Summary", "<p>\(esc(x.tldr))</p>"
+                cards += card("Summary", "<p>\(esc(x.tldr))</p>"
                     + (x.primarySpeakers.isEmpty ? "" : "<p class=\"muted\">Primary: \(x.primarySpeakers.map(esc).joined(separator: ", "))</p>"))
             }
             for section in x.sections where !section.isEmpty {
                 switch section.kind {
                 case .paragraph:
-                    body += card(section.title, "<p>\(esc(section.paragraph))</p>")
+                    cards += card(section.title, "<p>\(esc(section.paragraph))</p>")
                 case .bullets:
-                    body += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
+                    cards += card(section.title, "<ul>" + section.bullets.map { "<li>\(esc($0))</li>" }.joined() + "</ul>")
                 case .items:
                     let lis = section.items.map { item -> String in
                         var s = "<li>\(esc(item.text))"
-                        if let who = item.who { s += " <strong>\(esc(who))</strong>" }
-                        if let note = item.note { s += " <span class=\"muted\">(\(esc(note)))</span>" }
-                        if let when = item.when { s += " <span class=\"ts\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
+                        if let who = item.who { s += " <span class=\"who\">\(esc(who))</span>" }
+                        if let note = item.note { s += " <span class=\"note\">(\(esc(note)))</span>" }
+                        if let when = item.when { s += " <span class=\"ts-badge\">\(RecapAnalyzer.mmss(Double(when)))</span>" }
                         return s + "</li>"
                     }.joined()
-                    body += card(section.title, "<ul>\(lis)</ul>")
+                    cards += card(section.title, "<ul>\(lis)</ul>")
                 }
             }
         }
+        let band = cards.isEmpty ? "" : "<div class=\"band\">\(cards)</div>"
 
-        if !result.sections.isEmpty {
-            var topics = ""
-            for (i, sec) in result.sections.enumerated() {
-                let range = entries.indices.contains(sec.startIndex) && entries.indices.contains(sec.endIndex)
-                    ? "<span class=\"ts\">\(RecapAnalyzer.mmss(entries[sec.startIndex].offset))–\(RecapAnalyzer.mmss(entries[sec.endIndex].end))</span>" : ""
-                topics += "<details class=\"topic\"><summary><span class=\"tnum\">\(i + 1)</span> \(esc(sec.title)) \(range)</summary>"
-                if !sec.summary.isEmpty { topics += "<p>\(esc(sec.summary))</p>" }
-                topics += "<div class=\"turns\">" + turnsHtml(sec, entries: entries, chip: chip) + "</div></details>"
+        // Left pane: topic cards (click to jump). data-start/data-end index entries.
+        var left = "<div class=\"left\">"
+        if result.sections.isEmpty {
+            left += "<div class=\"empty\">No topic sections.</div>"
+        } else {
+            for sec in result.sections {
+                let s = max(0, min(sec.startIndex, entries.count - 1))
+                let e = max(s, min(sec.endIndex, entries.count - 1))
+                let time = entries.indices.contains(s) && entries.indices.contains(e)
+                    ? "<span class=\"chunk-time\">\(RecapAnalyzer.mmss(entries[s].offset)) — \(RecapAnalyzer.mmss(entries[e].end))</span>" : ""
+                left += "<div class=\"chunk\" data-start=\"\(s)\" data-end=\"\(e)\" onclick=\"jump(this)\">"
+                    + "<div class=\"chunk-title\">\(esc(sec.title))\(time)</div>"
+                    + (sec.summary.isEmpty ? "" : "<div class=\"chunk-summary\">\(esc(sec.summary))</div>")
+                    + "</div>"
             }
-            body += card("Topics", topics)
         }
+        left += "</div>"
 
-        let full = entries.map { "<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>" }.joined()
-        body += "<details class=\"topic\" open><summary>Full Transcript</summary><div class=\"turns\">\(full)</div></details>"
+        // Right pane: full diarized transcript, one line per turn (id=entry-i).
+        var right = "<div class=\"right\">"
+        if entries.isEmpty {
+            right += "<div class=\"empty\">No transcript.</div>"
+        } else {
+            for (i, en) in entries.enumerated() {
+                right += "<div class=\"transcript-line\" id=\"entry-\(i)\">"
+                    + "<span class=\"ts-badge\">\(RecapAnalyzer.mmss(en.offset))</span>"
+                    + chip(en.speaker)
+                    + "<span class=\"ts-text\">\(esc(en.text))</span></div>"
+            }
+        }
+        right += "</div>"
+
+        let body = header + band + "<div class=\"split\">\(left)\(right)</div>"
         return htmlShell(title: esc(title), body: body)
     }
 
-    private static func turnsHtml(_ sec: TopicSection, entries: [RecapAnalyzer.Entry],
-                                  chip: (String) -> String) -> String {
-        guard sec.startIndex <= sec.endIndex, entries.indices.contains(sec.startIndex), entries.indices.contains(sec.endIndex)
-        else { return "" }
-        return entries[sec.startIndex...sec.endIndex].map {
-            "<div class=\"turn\"><span class=\"ts\">\(RecapAnalyzer.mmss($0.offset))</span> \(chip($0.speaker)) <span class=\"txt\">\(esc($0.text))</span></div>"
-        }.joined()
-    }
-
     private static func card(_ title: String, _ inner: String) -> String {
         "<section class=\"card\"><h2>\(esc(title))</h2>\(inner)</section>"
     }

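The new left pane clamps each topic's `startIndex`/`endIndex` into the valid entry range instead of dropping the topic (the old `turnsHtml` guard returned nothing when either index fell outside `entries`), so an out-of-range index from the LLM still yields a clickable card. Pulled out as a free function for illustration (`clampedRange` is a hypothetical name), the arithmetic behaves like this:

```swift
/// Same clamping as the new `html(...)`: `s` is pinned into 0...count-1 (or 0 when
/// empty), `e` is pinned into s...count-1, and a time label is only possible
/// when both land on real entries.
func clampedRange(start: Int, end: Int, count: Int) -> (s: Int, e: Int, hasTime: Bool) {
    let s = max(0, min(start, count - 1))
    let e = max(s, min(end, count - 1))
    let hasTime = (0..<count).contains(s) && (0..<count).contains(e)
    return (s, e, hasTime)
}
```

Because `e` is floored at `s`, a reversed range collapses to a single highlighted line, and with no entries the time label is simply omitted.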
@@ -176,34 +192,63 @@ enum RecapRenderer {
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>\(title)</title>
 <style>
-:root{--bg:#15171c;--card:#1d2026;--fg:#e6e8ec;--muted:#9aa0aa;--line:#2a2e36;--accent:#5b8def;}
+:root{--bg:#0a0e1a;--panel:#111827;--panel-2:#1e293b;--line:#1e293b;--line-2:#334155;
+  --fg:#e2e8f0;--fg-dim:#94a3b8;--fg-faint:#64748b;--accent:#818cf8;--accent-soft:#a5b4fc;}
 *{box-sizing:border-box}
-body{margin:0;background:var(--bg);color:var(--fg);font:15px/1.55 -apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;}
-main{max-width:820px;margin:0 auto;padding:32px 20px 80px;}
-header h1{margin:0 0 4px;font-size:24px}
-.sub{color:var(--muted);font-size:13px}
-.legend{margin-top:12px;display:flex;flex-wrap:wrap;gap:6px}
-.chip{display:inline-block;padding:1px 8px;border-radius:10px;color:#fff;font-size:12px;font-weight:600}
-.card{background:var(--card);border:1px solid var(--line);border-radius:12px;padding:16px 18px;margin-top:18px}
-.card h2{margin:0 0 10px;font-size:16px;color:var(--accent)}
-.muted{color:var(--muted)}
-ul{margin:0;padding-left:18px} li{margin:4px 0}
-ul.actions{list-style:none;padding-left:0}
-.ts{color:var(--muted);font-variant-numeric:tabular-nums;font-size:12px;margin-right:4px}
-blockquote{margin:0 0 12px;padding:8px 12px;border-left:3px solid var(--accent);background:#0e0f13;border-radius:0 8px 8px 0}
-blockquote cite{display:block;color:var(--muted);font-size:12px;margin-top:4px;font-style:normal}
-details.topic{border-top:1px solid var(--line);padding:10px 0}
-details.topic > summary{cursor:pointer;font-weight:600;list-style:none}
-details.topic > summary::-webkit-details-marker{display:none}
-.tnum{display:inline-block;min-width:20px;color:var(--accent);font-weight:700}
-.turns{margin-top:10px}
-.turn{margin:6px 0;display:flex;gap:8px;align-items:baseline;flex-wrap:wrap}
-.turn .txt{flex:1;min-width:60%}
-@media print{body{background:#fff;color:#000}.card,blockquote{background:#fff;border-color:#ccc}details.topic{}.chip{border:1px solid #999}}
+body{margin:0;background:var(--bg);color:var(--fg);min-height:100vh;
+  font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif;font-size:13px;line-height:1.55}
+.header{padding:14px 24px;background:var(--panel);border-bottom:1px solid var(--line);
+  display:flex;align-items:center;gap:16px;flex-wrap:wrap}
+.header .htext{min-width:0}
+.header h1{margin:0;font-size:16px;font-weight:700;color:var(--fg)}
+.header .meta{font-size:11px;color:var(--fg-faint);margin-top:2px;font-variant-numeric:tabular-nums}
+.legend{margin-left:auto;display:flex;flex-wrap:wrap;gap:6px;justify-content:flex-end}
+.chip{display:inline-block;padding:1px 8px;border-radius:999px;color:#fff;font-size:10px;font-weight:700;white-space:nowrap}
+.band{padding:16px 24px;display:grid;gap:12px}
+.card{background:var(--panel);border:1px solid var(--line);border-radius:10px;padding:14px 16px}
+.card h2{margin:0 0 8px;font-size:11px;font-weight:700;text-transform:uppercase;letter-spacing:.04em;color:var(--accent-soft)}
+.card p{margin:0 0 8px}
+.card p:last-child{margin-bottom:0}
+.card .muted{color:var(--fg-dim);font-size:12px}
+.card ul{margin:0;padding-left:18px}
+.card li{margin:5px 0;color:var(--fg)}
+.card .who{color:var(--accent-soft);font-weight:600}
+.card .note{color:var(--fg-faint)}
+.split{display:flex;min-height:calc(100vh - 56px)}
+.left{flex:0 0 42%;max-width:42%;border-right:1px solid var(--line);overflow-y:auto;padding:16px;background:var(--bg)}
+.right{flex:1;min-width:0;overflow-y:auto;padding:16px;background:var(--panel)}
+@media(max-width:900px){.split{flex-direction:column}.left,.right{flex:none;max-width:100%;border-right:none}
+  .left{border-bottom:1px solid var(--line)}}
+.chunk{padding:12px 14px;margin-bottom:8px;background:var(--panel);border:1px solid var(--line);
+  border-radius:10px;cursor:pointer;transition:border-color .15s,background .15s}
+.chunk:hover{border-color:var(--accent)}
+.chunk.active{border-color:var(--accent);background:rgba(129,140,248,.06);box-shadow:0 2px 16px rgba(129,140,248,.10)}
+.chunk-title{font-size:13px;font-weight:700;color:var(--fg);margin-bottom:4px}
+.chunk-time{font-size:10px;color:var(--fg-faint);margin-left:6px;font-weight:500;font-family:"SF Mono",Menlo,monospace}
+.chunk-summary{font-size:12px;color:var(--fg-dim);line-height:1.5}
+.transcript-line{display:flex;gap:10px;padding:4px 8px;border-radius:6px;line-height:1.6;align-items:baseline;scroll-margin-top:16px}
+.transcript-line.hl{background:rgba(129,140,248,.10)}
+.ts-badge{flex:0 0 auto;font-family:"SF Mono",Menlo,monospace;font-size:11px;color:var(--accent-soft);min-width:52px}
+.ts-text{flex:1;font-size:13px;color:var(--fg)}
+.empty{padding:32px 16px;text-align:center;color:var(--fg-faint)}
+.foot{padding:14px 24px;color:var(--fg-faint);font-size:11px;border-top:1px solid var(--line)}
+@media print{body{background:#fff;color:#000}.header,.right,.left,.card,.chunk{background:#fff;border-color:#ccc}
+  .split{display:block}.left,.right{max-width:100%}.chip{border:1px solid #999}}
 </style></head>
-<body><main>\(body)
-<footer class="sub" style="margin-top:40px">Ten31 Transcripts · generated on-device</footer>
-</main></body></html>
+<body>\(body)
+<div class="foot">Ten31 Transcripts · generated on-device</div>
+<script>
+function jump(el){
+  document.querySelectorAll('.chunk.active').forEach(function(x){x.classList.remove('active')});
+  el.classList.add('active');
+  var s=+el.dataset.start, e=+el.dataset.end;
+  var t=document.getElementById('entry-'+s);
+  if(t) t.scrollIntoView({behavior:'smooth',block:'start'});
+  document.querySelectorAll('.transcript-line.hl').forEach(function(x){x.classList.remove('hl')});
+  for(var i=s;i<=e;i++){var x=document.getElementById('entry-'+i); if(x) x.classList.add('hl');}
+}
+</script>
+</body></html>
 """
     }
 }

```diff
@@ -0,0 +1,51 @@
+import Foundation
+
+/// How long each diarization *body* chunk should be. Smaller chunks keep fewer
+/// simultaneous speakers inside one window — Sortformer resolves at most ~4 speakers
+/// per chunk, and the dual-channel split already spends the local user on the mic
+/// track, so the system (remote) channel is what can saturate on a big call. The
+/// cost of going smaller: weaker cross-chunk voiceprints, more cross-chunk speaker
+/// splitting (the reconciler re-merges some), and more backend round-trips.
+enum ChunkMode: String, CaseIterable, Identifiable, Codable {
+case auto, standard, largeGroup, fine
+
+var id: String { rawValue }
+
+var label: String {
+switch self {
+case .auto: return "Auto (by call size)"
+case .standard: return "Standard · 2.5 min"
+case .largeGroup: return "Large group · 60 sec"
+case .fine: return "Fine · 90 sec"
+}
+}
+
+/// Fixed body length, or nil for `.auto` (resolved from the participant count).
+var fixedBodySeconds: Double? {
+switch self {
+case .auto: return nil
+case .standard: return 150
+case .largeGroup: return 60
+case .fine: return 90
+}
+}
+
+/// More than this many detected participants makes `.auto` pick the short body,
+/// so one chunk is less likely to exceed Sortformer's ~4-speaker resolution.
+static let autoLargeThreshold = 4
+
+/// Resolve the body length in seconds. `.auto` drops to 60s when more than
+/// `autoLargeThreshold` participants were detected, else uses the 2.5-min default;
+/// with no count available (audio-only) it stays at the 2.5-min default.
+func bodySeconds(participantCount: Int?) -> Double {
+if let fixed = fixedBodySeconds { return fixed }
+if let n = participantCount, n > Self.autoLargeThreshold { return 60 }
+return 150
+}
+
+/// Overlap margin scaled to the body length (~12%, clamped 8…15s) so a 60s chunk
+/// isn't dominated by a fixed 15s margin while a 2.5-min chunk keeps the full 15s.
+static func overlapSeconds(forBody body: Double) -> Double {
+max(8, min(15, (body * 0.12).rounded()))
+}
+}
```
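To make the sizing arithmetic concrete, here is a small Swift sketch. `overlapSeconds(forBody:)` is copied from `ChunkMode` above; `ChunkWindow` and `planWindows` are hypothetical and are not `SessionPackager.planChunks`, whose internals this diff does not show. The planner assumes each body is padded by the overlap on both sides and that a call no longer than 1.2× the body stays a single window, matching the `thresholdSec: chunkSeconds * 1.2` argument the pipeline passes.

```swift
import Foundation

/// Overlap formula copied from `ChunkMode.overlapSeconds(forBody:)`:
/// ~12% of the body, rounded, clamped to 8...15 seconds.
func overlapSeconds(forBody body: Double) -> Double {
    max(8, min(15, (body * 0.12).rounded()))
}

/// Hypothetical padded slice sent to the diarizer (seconds from call start).
struct ChunkWindow: Equatable {
    let start: Double
    let end: Double
}

/// Hypothetical planner, NOT `SessionPackager.planChunks`: keep short calls as
/// one window, otherwise cut consecutive bodies and pad each by the overlap.
func planWindows(duration: Double, body: Double) -> [ChunkWindow] {
    guard duration > 0, body > 0 else { return [] }
    let overlap = overlapSeconds(forBody: body)
    // Single-chunk threshold scales with the body (1.2x), as in the pipeline call.
    if duration <= body * 1.2 { return [ChunkWindow(start: 0, end: duration)] }
    var windows: [ChunkWindow] = []
    var bodyStart = 0.0
    while bodyStart < duration {
        let bodyEnd = min(duration, bodyStart + body)
        // Pad the body on both sides, clipped to the call's bounds.
        windows.append(ChunkWindow(start: max(0, bodyStart - overlap),
                                   end: min(duration, bodyEnd + overlap)))
        bodyStart = bodyEnd
    }
    return windows
}

// A 5-minute call on the 60-second "Large group" body.
let windows = planWindows(duration: 300, body: 60)
for w in windows {
    print("slice \(w.start)s to \(w.end)s")
}
```

With the 60-second body the overlap is 8 seconds (7.2 rounds to 7, then clamps up to 8); the 90-second body gets 11 seconds (10.8 rounds to 11); and the 150-second default keeps the full 15 seconds (18 clamps down), with a 180-second single-chunk threshold.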
```diff
@@ -256,6 +256,9 @@ final class SessionController: ObservableObject {
 private func startVisual(t0Host: Double, generation: Int, recorder: AudioRecorder) async {
 guard let capture = pendingCapture else { return } // manual recording → audio-only
 pendingCapture = nil
+// Honor the per-app adapter switch: if the user turned this app's adapter off,
+// skip screen-reading entirely and record audio-only (transcription still runs).
+guard settings.adapterEnabled[capture.app.rawValue] ?? true else { return }
 guard let vc = VisualCapture(app: capture.app, bundleID: capture.bundleID,
 windowID: capture.windowID, t0Host: t0Host) else { return }
 // Register the live capture before the await so a quit (prepareForTermination)
@@ -375,12 +378,15 @@ final class SessionController: ObservableObject {
 let settings = self.settings
 let pipeline = TranscriptPipeline(baseURL: settings.backendBaseURL,
 skipTLS: settings.skipTLSVerification, voiceprints: voiceprints)
+// Resolve the diarization chunk length from the setting; "Auto" uses the
+// participant count the visual capture saw for this session.
+let chunkSeconds = settings.chunk.bodySeconds(participantCount: Self.participantCount(in: inputs.folder))
 do {
 let speakers = try await pipeline.process(
 sessionFolder: inputs.folder, sessionId: inputs.sessionId, app: inputs.app,
 micURL: inputs.micURL, systemURL: inputs.systemURL, mixedURL: inputs.mixedURL,
 timeline: inputs.timeline, selfSpans: inputs.selfSpans, selfName: inputs.selfName,
-systemHealthy: inputs.systemHealthy,
+systemHealthy: inputs.systemHealthy, chunkSeconds: chunkSeconds,
 progress: { done, total in await MainActor.run { self.transcriptStatus = .processing(done, total) } })
 self.transcriptStatus = .done(speakers: speakers.speakers.count, segments: speakers.segments.count)
 try Task.checkCancellation()
@@ -528,6 +534,16 @@ final class SessionController: ObservableObject {
 }
 }
 
+/// Detected participant count from a session's visual timeline, for "Auto" chunk
+/// sizing. Nil when there's no visual timeline (audio-only) so callers keep the
+/// default body length. Counts everyone OCR'd on the call, not just who spoke.
+private static func participantCount(in folder: URL) -> Int? {
+guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
+let vt = try? JSONDecoder().decode(VisualTimeline.self, from: data),
+!vt.participants.isEmpty else { return nil }
+return vt.participants.count
+}
+
 /// The remote (vision) visual-timeline segments saved for a session, if any.
 private static func remoteTimeline(in folder: URL) -> [VisualTimeline.Segment] {
 guard let data = try? Data(contentsOf: folder.appendingPathComponent("visual_timeline.json")),
```
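The "Auto" setting only shortens chunks when a visual timeline exists and lists more than four people. A minimal sketch of that decision follows; `TimelineStub`, its `participants: [String]` shape, and the byte-based `participantCount(fromJSON:)` are assumptions standing in for `VisualTimeline` and the folder-based helper, which reads `visual_timeline.json` from disk.

```swift
import Foundation

/// Hypothetical stand-in for `VisualTimeline`: only `participants` is modelled,
/// and its element type (String) is an assumption. Unknown JSON keys are ignored.
struct TimelineStub: Decodable {
    let participants: [String]
}

/// Mirrors `participantCount(in:)`, but takes the file's bytes instead of a folder.
/// A missing file, undecodable JSON, or an empty list all yield nil.
func participantCount(fromJSON data: Data?) -> Int? {
    guard let data = data,
          let vt = try? JSONDecoder().decode(TimelineStub.self, from: data),
          !vt.participants.isEmpty else { return nil }
    return vt.participants.count
}

/// The `.auto` branch of `ChunkMode.bodySeconds(participantCount:)`.
func autoBodySeconds(participantCount: Int?) -> Double {
    let autoLargeThreshold = 4
    if let n = participantCount, n > autoLargeThreshold { return 60 }
    return 150
}

// Illustrative timeline for a five-person call.
let bigCall = #"{"participants": ["A", "B", "C", "D", "E"], "segments": []}"#.data(using: .utf8)
print(autoBodySeconds(participantCount: participantCount(fromJSON: bigCall)))
```

Because the comparison is strict (`n > 4`), a four-person call stays on the 150-second body, and so does an audio-only session, where the count is `nil`.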
```diff
@@ -28,6 +28,7 @@ final class TranscriptPipeline {
 selfSpans: [VADSpan],
 selfName: String,
 systemHealthy: Bool,
+chunkSeconds: Double = 150,
 progress: ((Int, Int) async -> Void)? = nil) async throws -> SpeakersFile {
 let fm = FileManager.default
 let dual = systemHealthy
@@ -36,7 +37,12 @@ final class TranscriptPipeline {
 let duration = dual
 ? max(SessionPackager.duration(of: micURL), SessionPackager.duration(of: systemURL))
 : SessionPackager.duration(of: mixedURL)
-let plan = SessionPackager.planChunks(durationSec: duration)
+// Chunk to the requested body length; overlap and the single-chunk threshold
+// scale with it (a 60s body shouldn't be cut by a fixed 15s margin or stay
+// unchunked below the 2.5-min default threshold).
+let overlap = ChunkMode.overlapSeconds(forBody: chunkSeconds)
+let plan = SessionPackager.planChunks(durationSec: duration, chunkSeconds: chunkSeconds,
+overlapSeconds: overlap, thresholdSec: chunkSeconds * 1.2)
 
 // Zero-duration / empty session → a valid empty speakers.json, no backend call.
 if plan.isEmpty || duration <= 0 {
@@ -50,13 +56,20 @@ final class TranscriptPipeline {
 try? fm.createDirectory(at: chunksDir, withIntermediateDirectories: true)
 defer { try? fm.removeItem(at: chunksDir) } // cleanup on success OR throw
 
+// Defensive: drop any visual span covering most of the call in one unbroken
+// segment — the signature of a stuck/false active-speaker cue (e.g. a solid
+// camera-off tile read as "speaking" the whole call). Such a span would
+// dominate the backend's name attribution and collapse every voice onto one
+// name. Also salvages sessions captured before the adapter fix landed.
+let vis = Self.dropStuckSpans(timeline, duration: duration)
+
 // Start from stored voiceprints; accumulate this call's prints across chunks
 // for within-call unification (the store only persists high-confidence ones).
 var known = voiceprints.knownVoiceprints()
 var results: [TranscriptAssembler.ChunkResult] = []
 // Mono fallback needs self folded into the timeline; dual sends it separately.
-let monoTimeline = dual ? timeline
-: timeline + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
+let monoTimeline = dual ? vis
+: vis + Self.timeline(fromSelfSpans: selfSpans, selfName: selfName)
 
 for chunk in plan {
 try Task.checkCancellation()
@@ -70,7 +83,7 @@ final class TranscriptPipeline {
 try SessionPackager.sliceAudio(from: micURL, startSec: chunk.start, endSec: chunk.end, to: micChunk)
 try SessionPackager.sliceAudio(from: systemURL, startSec: chunk.start, endSec: chunk.end, to: sysChunk)
 guard fm.fileExists(atPath: micChunk.path), fm.fileExists(atPath: sysChunk.path) else { continue }
-let timelineData = try SessionPackager.rebasedTimelineData(timeline, start: chunk.start, end: chunk.end)
+let timelineData = try SessionPackager.rebasedTimelineData(vis, start: chunk.start, end: chunk.end)
 let selfVadData = try SessionPackager.rebasedSelfVadData(selfSpans, start: chunk.start, end: chunk.end)
 response = try await client.labelMergeDual(
 micURL: micChunk, systemURL: sysChunk, selfName: selfName, selfVad: selfVadData,
@@ -113,4 +126,14 @@ final class TranscriptPipeline {
 static func timeline(fromSelfSpans spans: [VADSpan], selfName: String) -> [VisualTimeline.Segment] {
 spans.map { .init(start: $0.start, end: $0.end, name: selfName, confidence: $0.confidence, source: "mic_vad") }
 }
+
+/// Drop visual (vision-source) spans whose single unbroken duration covers at
+/// least `maxFraction` of the whole call — no one legitimately speaks that long
+/// without a break, so it's a stuck/false cue. Self spans (mic_vad) are kept.
+static func dropStuckSpans(_ timeline: [VisualTimeline.Segment], duration: Double,
+maxFraction: Double = 0.6) -> [VisualTimeline.Segment] {
+guard duration > 0 else { return timeline }
+let limit = maxFraction * duration
+return timeline.filter { $0.source != "vision" || ($0.end - $0.start) < limit }
+}
 }
```
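The stuck-span filter is a single length test per segment. The sketch below replays it on a made-up 10-minute call; `Segment` is a hypothetical reduced version of `VisualTimeline.Segment`, and the names and times are illustrative data.

```swift
import Foundation

/// Hypothetical stand-in for `VisualTimeline.Segment`, keeping only the fields
/// the filter reads plus a display name.
struct Segment {
    let start: Double
    let end: Double
    let name: String
    let source: String  // "vision" (screen cue) or "mic_vad" (local mic), as in the diff
}

/// Same rule as `TranscriptPipeline.dropStuckSpans`: a vision span whose single
/// unbroken length reaches `maxFraction` of the call is treated as a stuck cue.
func dropStuckSpans(_ timeline: [Segment], duration: Double,
                    maxFraction: Double = 0.6) -> [Segment] {
    guard duration > 0 else { return timeline }
    let limit = maxFraction * duration
    return timeline.filter { $0.source != "vision" || ($0.end - $0.start) < limit }
}

// A 10-minute call: a camera-off tile "speaking" for 8 minutes straight, one
// ordinary remote turn, and a long self span from the mic VAD.
let call = [
    Segment(start: 30, end: 510, name: "Tile (stuck)", source: "vision"),
    Segment(start: 40, end: 70, name: "Speaker B", source: "vision"),
    Segment(start: 0, end: 400, name: "Me", source: "mic_vad"),
]
let kept = dropStuckSpans(call, duration: 600)
print(kept.map { $0.name })
```

With `maxFraction` at 0.6 the limit is 360 seconds, so the 480-second tile span is dropped. The 400-second self span is also over that limit but survives, because only `vision` sources are filtered.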
```diff
@@ -60,6 +60,15 @@ final class AppSettings: ObservableObject {
 didSet { defaults.set(reconcileSpeakers, forKey: Keys.reconcileSpeakers) }
 }
 
+/// Diarization chunk length (raw value of `ChunkMode`). `.auto` shrinks chunks on
+/// large calls so a window is less likely to exceed Sortformer's ~4-speaker cap.
+@Published var chunkMode: String {
+didSet { defaults.set(chunkMode, forKey: Keys.chunkMode) }
+}
+
+/// Typed accessor for `chunkMode`.
+var chunk: ChunkMode { ChunkMode(rawValue: chunkMode) ?? .auto }
+
 /// User-editable recap templates (takeaways categories per meeting type).
 @Published var recapTemplates: [RecapTemplate] {
 didSet { persist(recapTemplates, forKey: Keys.recapTemplates) }
@@ -83,11 +92,19 @@ final class AppSettings: ObservableObject {
 
 private let defaults: UserDefaults
 
+/// Neutral placeholder. The real (private LAN) backend host is never committed —
+/// it's entered in Settings (persisted to UserDefaults) or seeded from the
+/// `SPARK_BACKEND_URL` env var for dev/CI/harness runs.
+static let defaultBackendURL = "https://your-spark-backend.local"
+
 init(defaults: UserDefaults = .standard) {
 self.defaults = defaults
 
+// Precedence: a value the user saved in Settings wins; else the env var
+// (handy when launching from Xcode/terminal); else the placeholder.
 self.backendBaseURL = defaults.string(forKey: Keys.backendBaseURL)
-?? "https://immense-voyage.local:62419"
+?? ProcessInfo.processInfo.environment["SPARK_BACKEND_URL"]
+?? Self.defaultBackendURL
 
 self.skipTLSVerification = defaults.object(forKey: Keys.skipTLS) as? Bool ?? true
 
@@ -104,6 +121,7 @@ final class AppSettings: ObservableObject {
 self.autoSendOnStop = defaults.object(forKey: Keys.autoSend) as? Bool ?? false
 self.recapEnabled = defaults.object(forKey: Keys.recapEnabled) as? Bool ?? true
 self.reconcileSpeakers = defaults.object(forKey: Keys.reconcileSpeakers) as? Bool ?? true
+self.chunkMode = defaults.string(forKey: Keys.chunkMode) ?? ChunkMode.auto.rawValue
 
 let loaded = (defaults.data(forKey: Keys.recapTemplates))
 .flatMap { try? JSONDecoder().decode([RecapTemplate].self, from: $0) }
@@ -126,6 +144,7 @@ final class AppSettings: ObservableObject {
 static let autoSend = "autoSendOnStop"
 static let recapEnabled = "recapEnabled"
 static let reconcileSpeakers = "reconcileSpeakers"
+static let chunkMode = "chunkMode"
 static let recapTemplates = "recapTemplates"
 static let defaultTemplate = "defaultTemplateId"
 }
```
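The new backend-URL precedence is a chain of nil-coalescing fallbacks. This sketch pulls it into a pure function so each branch can be checked; the injected `saved` and `environment` parameters and the example host are assumptions, while the `SPARK_BACKEND_URL` variable name and the placeholder URL come from the diff.

```swift
import Foundation

/// Precedence from `AppSettings.init`: saved Settings value, then the
/// `SPARK_BACKEND_URL` environment variable, then the placeholder. The inputs are
/// injected (hypothetical parameters) so the order can be checked in isolation.
func resolveBackendURL(saved: String?,
                       environment: [String: String] = ProcessInfo.processInfo.environment,
                       placeholder: String = "https://your-spark-backend.local") -> String {
    // `??` is right-associative: saved ?? (env ?? placeholder).
    saved ?? environment["SPARK_BACKEND_URL"] ?? placeholder
}

let env = ["SPARK_BACKEND_URL": "https://gateway.example:8443"]  // hypothetical host
print(resolveBackendURL(saved: nil, environment: env))
```

One consequence of using `??`: it only skips `nil`, so an empty string saved under the Settings key would still win over the environment variable. Whether that can happen depends on how the Settings field persists edits, which this diff does not show.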
```diff
@@ -31,6 +31,35 @@ final class EditorWindow {
 }
 }
 
+/// Hosts the app Settings in a standalone resizable window. Far roomier than the
+/// old in-popover NavigationLink, which cramped the form into the 320pt menu-bar
+/// panel and hid most controls below a non-obvious scroll.
+@MainActor
+final class SettingsWindow {
+static let shared = SettingsWindow()
+private var window: NSWindow?
+
+func show(settings: AppSettings) {
+if let window {
+NSApp.activate(ignoringOtherApps: true)
+window.makeKeyAndOrderFront(nil)
+return
+}
+let w = NSWindow(
+contentRect: NSRect(x: 0, y: 0, width: 520, height: 660),
+styleMask: [.titled, .closable, .resizable, .miniaturizable],
+backing: .buffered, defer: false)
+w.title = "Settings"
+w.isReleasedWhenClosed = false
+w.center()
+w.contentViewController = NSHostingController(
+rootView: SettingsView().environmentObject(settings))
+window = w
+NSApp.activate(ignoringOtherApps: true)
+w.makeKeyAndOrderFront(nil)
+}
+}
+
 /// Hosts the recap-templates manager in its own resizable window.
 @MainActor
 final class TemplatesWindow {
```
```diff
@@ -10,21 +10,19 @@ struct MenuBarView: View {
 @EnvironmentObject private var session: SessionController
 
 var body: some View {
-NavigationStack {
-VStack(alignment: .leading, spacing: 12) {
-header
-Divider()
-recordingSection
-Divider()
-permissionsSection
-Divider()
-backendSection
-Divider()
-footer
-}
-.padding(14)
-.frame(width: 320)
-}
+VStack(alignment: .leading, spacing: 12) {
+header
+Divider()
+recordingSection
+Divider()
+permissionsSection
+Divider()
+backendSection
+Divider()
+footer
+}
+.padding(14)
+.frame(width: 320)
 .onAppear { permissions.refresh() }
 .task { await refreshHealth() }
 }
@@ -227,9 +225,7 @@ struct MenuBarView: View {
 
 private var footer: some View {
 HStack {
-NavigationLink("Settings…") {
-SettingsView()
-}
+Button("Settings…") { SettingsWindow.shared.show(settings: settings) }
 Spacer()
 Button("Quit") { NSApplication.shared.terminate(nil) }
 }
```
```diff
@@ -7,6 +7,21 @@ struct SettingsView: View {
 
 var body: some View {
 Form {
+Section("Your name") {
+TextField("Your name", text: $settings.selfName)
+.textFieldStyle(.roundedBorder)
+if isDefaultName {
+Label("Still set to the default. Enter your real name so your own voice is labeled correctly — and so the AI never gives your name to someone else.",
+systemImage: "exclamationmark.triangle.fill")
+.font(.caption)
+.foregroundStyle(.orange)
+} else {
+Text("Labels your microphone channel as you in every transcript, and reserves this name so it’s never assigned to another speaker.")
+.font(.caption)
+.foregroundStyle(.secondary)
+}
+}
+
 Section("SparkControl backend") {
 TextField("Base URL", text: $settings.backendBaseURL)
 .textFieldStyle(.roundedBorder)
@@ -22,10 +37,14 @@ struct SettingsView: View {
 }
 
 Section("Transcription") {
-TextField("Your name", text: $settings.selfName)
-.textFieldStyle(.roundedBorder)
 Toggle("Auto-send recordings to backend", isOn: $settings.autoSendOnStop)
 Toggle("Reconcile speakers (merge splits + name from content)", isOn: $settings.reconcileSpeakers)
+Picker("Chunk length", selection: $settings.chunkMode) {
+ForEach(ChunkMode.allCases) { Text($0.label).tag($0.rawValue) }
+}
+Text("How finely audio is split for diarization. Shorter chunks keep fewer simultaneous speakers per window (the diarizer resolves ~4 at a time), at some cost to speed and voice matching. Auto uses 60-sec chunks when more than \(ChunkMode.autoLargeThreshold) people are detected on the call, else 2.5 min.")
+.font(.caption)
+.foregroundStyle(.secondary)
 Toggle("Build readable recap (topics + highlights)", isOn: $settings.recapEnabled)
 HStack {
 Picker("Default recap template", selection: $settings.defaultTemplateId) {
@@ -33,7 +52,7 @@ struct SettingsView: View {
 }
 Button("Manage…") { TemplatesWindow.shared.show(settings: settings) }
 }
-Text("Your name labels your mic channel. Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
+Text("Auto-send transcribes on stop; the recap writes transcript.md + recap.html. Templates define the takeaways categories per meeting type.")
 .font(.caption)
 .foregroundStyle(.secondary)
 }
@@ -50,7 +69,7 @@ struct SettingsView: View {
 }
 
 Section("Adapters") {
-Text("Inert in Phase 0 — these toggles only persist for now.")
+Text("Screen-reading for active-speaker cues. Turn one off to record that app audio-only — transcription still runs, but speakers aren’t identified from the screen.")
 .font(.caption)
 .foregroundStyle(.secondary)
 ForEach(AppSettings.adapterKeys, id: \.key) { adapter in
@@ -59,10 +78,17 @@ struct SettingsView: View {
 }
 }
 .formStyle(.grouped)
-.frame(width: 320)
+.frame(minWidth: 460, idealWidth: 520, maxWidth: .infinity,
+minHeight: 520, idealHeight: 660, maxHeight: .infinity)
 .navigationTitle("Settings")
 }
 
+/// True while the user still has the placeholder name — drives the inline nudge.
+private var isDefaultName: Bool {
+let n = settings.selfName.trimmingCharacters(in: .whitespacesAndNewlines)
+return n.isEmpty || n.caseInsensitiveCompare("Me") == .orderedSame
+}
+
 private func binding(for key: String) -> Binding<Bool> {
 Binding(
 get: { settings.adapterEnabled[key] ?? true },
```
```diff
@@ -120,6 +120,43 @@ struct FrameSampler {
 return points
 }
 
+/// Grid-sampled saturated pixels that lie on a THIN structure (a non-saturated
+/// pixel within `edgeGap` on some axis) — the coloured counterpart of
+/// `thinWhitePoints`. This keeps a thin speaking BORDER/ring/pill but drops the
+/// solid interior of a colour FILL (e.g. Meet's orange/magenta camera-off avatar
+/// tiles), whose pixels are surrounded by the same colour. Pair with `hueRange`
+/// to keep only the cue's colour (Meet's blue ring) and reject the thin edges a
+/// solid tile still has against the background (orange/magenta boundaries).
+func thinColoredPoints(threshold: Double = 0.35, minBrightness: Double = 60,
+hueRange: ClosedRange<Double>? = nil,
+edgeGap: Int = 6, gridStep: Int = 4) -> [CGPoint] {
+func isCue(_ x: Int, _ y: Int) -> Bool {
+guard x >= 0, x < width, y >= 0, y < height else { return false }
+let i = (y * width + x) * 4
+let r = Double(pixels[i]), g = Double(pixels[i + 1]), b = Double(pixels[i + 2])
+let mx = max(r, g, b), mn = min(r, g, b)
+let sat = mx > 0 ? (mx - mn) / mx : 0
+guard sat > threshold, mx > minBrightness else { return false }
+if let hr = hueRange { return hr.contains(Self.hueDegrees(r, g, b, mx, mn)) }
+return true
+}
+var points: [CGPoint] = []
+var y = edgeGap
+while y < height - edgeGap {
+var x = edgeGap
+while x < width - edgeGap {
+if isCue(x, y) {
+let thin = !isCue(x - edgeGap, y) || !isCue(x + edgeGap, y)
+|| !isCue(x, y - edgeGap) || !isCue(x, y + edgeGap)
+if thin { points.append(CGPoint(x: x, y: y)) }
+}
+x += gridStep
+}
+y += gridStep
+}
+return points
+}
+
 /// HSV hue in degrees (0…360) from RGB and its precomputed max/min channels.
 private static func hueDegrees(_ r: Double, _ g: Double, _ b: Double, _ mx: Double, _ mn: Double) -> Double {
 let d = mx - mn
```
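The thin-edge test in `thinColoredPoints` is what separates a speaking border from a solid colour tile. The sketch below reimplements it without the hue filter on two synthetic frames; `RGBAFrame`, `thinColoredCount`, and the colours are hypothetical, and only the saturation/brightness test, the `edgeGap` neighbour check, and the grid walk follow the diff.

```swift
import Foundation

/// Hypothetical RGBA frame: row-major, 4 bytes per pixel, matching the
/// `(y * width + x) * 4` indexing used by `FrameSampler`.
struct RGBAFrame {
    let width: Int
    let height: Int
    var pixels: [UInt8]

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
        pixels = [UInt8](repeating: 0, count: width * height * 4)  // all-zero (black) pixels
    }

    mutating func paint(_ x: Int, _ y: Int, _ r: UInt8, _ g: UInt8, _ b: UInt8) {
        let i = (y * width + x) * 4
        pixels[i] = r; pixels[i + 1] = g; pixels[i + 2] = b; pixels[i + 3] = 255
    }
}

/// Same thin-edge rule as `thinColoredPoints`, minus the hue filter: a grid sample
/// counts only if it is saturated and bright AND some pixel `edgeGap` away on an
/// axis is not. Returns the number of qualifying samples.
func thinColoredCount(_ f: RGBAFrame, threshold: Double = 0.35, minBrightness: Double = 60,
                      edgeGap: Int = 6, gridStep: Int = 4) -> Int {
    func isCue(_ x: Int, _ y: Int) -> Bool {
        guard x >= 0, x < f.width, y >= 0, y < f.height else { return false }
        let i = (y * f.width + x) * 4
        let r = Double(f.pixels[i]), g = Double(f.pixels[i + 1]), b = Double(f.pixels[i + 2])
        let mx = max(r, g, b), mn = min(r, g, b)
        let sat = mx > 0 ? (mx - mn) / mx : 0
        return sat > threshold && mx > minBrightness
    }
    var count = 0
    var y = edgeGap
    while y < f.height - edgeGap {
        var x = edgeGap
        while x < f.width - edgeGap {
            let thin = !isCue(x - edgeGap, y) || !isCue(x + edgeGap, y)
                || !isCue(x, y - edgeGap) || !isCue(x, y + edgeGap)
            if isCue(x, y) && thin { count += 1 }
            x += gridStep
        }
        y += gridStep
    }
    return count
}

// Frame 1: a solid orange fill covering the whole 64x64 frame (camera-off tile).
var fill = RGBAFrame(width: 64, height: 64)
for y in 0..<64 { for x in 0..<64 { fill.paint(x, y, 255, 140, 0) } }

// Frame 2: a 2-pixel blue ring spanning 14...47 on both axes (speaking border).
var ring = RGBAFrame(width: 64, height: 64)
for y in 14...47 {
    for x in 14...47 where x <= 15 || x >= 46 || y <= 15 || y >= 46 {
        ring.paint(x, y, 26, 115, 232)
    }
}

let fillCount = thinColoredCount(fill)
let ringCount = thinColoredCount(ring)
print("fill: \(fillCount), ring: \(ringCount)")
```

Every sample inside the full-frame fill has cue-coloured pixels 6 px away on all four sides, so none survive; on the 2-px ring, each sampled border pixel has background 6 px away and is kept.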
```diff
@@ -35,11 +35,21 @@ struct GridCallAnalyzer {
 var colorSaturation: Double = 0.5
 var colorMinBrightness: Double = 60
 var colorHueRange: ClosedRange<Double>? = nil
+// When true, the coloured highlight is detected from THIN edges only (drops
+// solid colour fills like Meet's camera-off avatar tiles). Pair with a tight
+// `colorHueRange` so a solid tile's thin background boundary is rejected too.
+var coloredBorderThinOnly = false
 var minTextConfidence: Float = 0.3
 var maxNameLength = 40
 var minHighlightPoints = 6
 var highlightShareOfMax = 0.35
 var minRingSpan: Double = 60 // a speaking border spans a sizable box, not a speck
+// A real active-speaker cue is a thin RING (border) with an EMPTY interior.
+// A solid camera-off avatar tile (Meet's orange/magenta fill) or a screen-share
+// fill is a filled BLOB — its highlight points spread through the interior. Reject
+// a component when more than this fraction of its points fall in the central
+// 60%×60% of its bbox (a hollow ring ≈ 0; a solid fill ≈ 0.36). Set ≥ 1 to disable.
+var maxInteriorFill: Double = 0.2
 }
 
 var config = Config()
```
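The 0.36 in the `maxInteriorFill` comment is the area of the central box: a 20% inset on each side leaves 0.6 × 0.6 = 0.36 of the bounding box, so points spread evenly through a solid tile land there about 36% of the time, while a thin border puts none there. The sketch checks that on a 10×10 grid; `Pt`, `Box`, and the tight min/max `boundingBox` are assumptions standing in for `CGPoint`, `CGRect`, and the analyzer's own helper.

```swift
import Foundation

/// Hypothetical point and rect stand-ins so the check runs without CoreGraphics.
struct Pt { let x: Double; let y: Double }

struct Box {
    let minX: Double, minY: Double, maxX: Double, maxY: Double
    var width: Double { maxX - minX }
    var height: Double { maxY - minY }
    /// Half-open containment, like `CGRect.contains(_:)`.
    func contains(_ p: Pt) -> Bool { p.x >= minX && p.x < maxX && p.y >= minY && p.y < maxY }
    func insetBy(dx: Double, dy: Double) -> Box {
        Box(minX: minX + dx, minY: minY + dy, maxX: maxX - dx, maxY: maxY - dy)
    }
}

/// Assumed bounding box: tight min/max of the coordinates (the analyzer's real
/// `boundingBox` body is not shown in the diff).
func boundingBox(_ pts: [Pt]) -> Box {
    Box(minX: pts.map { $0.x }.min() ?? 0, minY: pts.map { $0.y }.min() ?? 0,
        maxX: pts.map { $0.x }.max() ?? 0, maxY: pts.map { $0.y }.max() ?? 0)
}

/// Share of points inside the central 60% x 60% of the box (20% inset per side),
/// the quantity `isHollow` compares against `maxInteriorFill`.
func interiorFraction(_ pts: [Pt], bbox: Box) -> Double {
    guard !pts.isEmpty else { return 0 }
    let inner = bbox.insetBy(dx: bbox.width * 0.2, dy: bbox.height * 0.2)
    let innerCount = pts.reduce(into: 0) { if inner.contains($1) { $0 += 1 } }
    return Double(innerCount) / Double(pts.count)
}

// A uniformly filled 10x10 grid of samples versus just its outer border.
let filled = (0..<10).flatMap { y in (0..<10).map { x in Pt(x: Double(x), y: Double(y)) } }
let border = filled.filter { $0.x == 0 || $0.x == 9 || $0.y == 0 || $0.y == 9 }

let fillShare = interiorFraction(filled, bbox: boundingBox(filled))
let ringShare = interiorFraction(border, bbox: boundingBox(border))
print(fillShare, ringShare)
```

Against the default `maxInteriorFill` of 0.2, the filled grid (0.36) is rejected and the border (0) passes as hollow.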
@@ -68,9 +78,13 @@ struct GridCallAnalyzer {
|
|||||||
// Highlight pixels: coloured (saturated) and/or white (thin near-white).
|
// Highlight pixels: coloured (saturated) and/or white (thin near-white).
|
||||||
var highlight: [CGPoint] = []
|
var highlight: [CGPoint] = []
|
||||||
if config.detectColoredBorder {
|
if config.detectColoredBorder {
|
||||||
highlight += sampler.saturatedPoints(threshold: config.colorSaturation,
|
highlight += config.coloredBorderThinOnly
|
||||||
minBrightness: config.colorMinBrightness,
|
? sampler.thinColoredPoints(threshold: config.colorSaturation,
|
||||||
hueRange: config.colorHueRange)
|
minBrightness: config.colorMinBrightness,
|
||||||
|
hueRange: config.colorHueRange)
|
||||||
|
: sampler.saturatedPoints(threshold: config.colorSaturation,
|
||||||
|
minBrightness: config.colorMinBrightness,
|
||||||
|
hueRange: config.colorHueRange)
|
||||||
}
|
}
|
||||||
if config.detectWhiteBorder { highlight += sampler.thinWhitePoints() }
|
if config.detectWhiteBorder { highlight += sampler.thinWhitePoints() }
|
||||||
|
|
||||||
@@ -89,7 +103,8 @@ struct GridCallAnalyzer {
         var speakingBBox: [Int: CGRect] = [:] // tile index -> the ring bbox marking it speaking
         for ring in rings where ring.count >= config.minHighlightPoints {
             let bb = Self.boundingBox(ring)
-            guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue } // a ring, not a blob
+            guard bb.width >= config.minRingSpan, bb.height >= config.minRingSpan else { continue } // a ring, not a speck
+            guard Self.isHollow(ring, bbox: bb, maxInteriorFill: config.maxInteriorFill) else { continue } // a ring, not a filled tile
             for (i, tile) in tiles.enumerated() where bb.contains(CGPoint(x: tile.textRect.midX, y: tile.textRect.midY)) {
                 speakingBBox[i] = bb
             }
@@ -128,6 +143,18 @@ struct GridCallAnalyzer {
         return Array(groups.values)
     }
 
+    /// True if `pts` form a hollow ring (border) rather than a filled blob: at most
+    /// `maxInteriorFill` of the points fall in the central 60%×60% of `bbox`. A thin
+    /// border has an empty interior (≈ 0); a solid camera-off avatar tile or a
+    /// screen-share fill spreads points through the interior (≈ 0.36). Disabled when
+    /// `maxInteriorFill >= 1`.
+    static func isHollow(_ pts: [CGPoint], bbox: CGRect, maxInteriorFill: Double) -> Bool {
+        guard maxInteriorFill < 1, !pts.isEmpty else { return true }
+        let inner = bbox.insetBy(dx: bbox.width * 0.2, dy: bbox.height * 0.2)
+        let innerCount = pts.reduce(into: 0) { if inner.contains($1) { $0 += 1 } }
+        return Double(innerCount) / Double(pts.count) <= maxInteriorFill
+    }
+
     static func boundingBox(_ pts: [CGPoint]) -> CGRect {
         var minX = Double.greatestFiniteMagnitude, minY = minX, maxX = -minX, maxY = -minX
         for p in pts { minX = min(minX, p.x); minY = min(minY, p.y); maxX = max(maxX, p.x); maxY = max(maxY, p.y) }
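The 0.2 default and the "≈ 0.36" figure both follow from the geometry. `isHollow` insets the bounding box by 20% on each side, which leaves a central box covering 0.6 × 0.6 = 0.36 of the area. A uniformly filled tile therefore puts roughly 36% of its points there, while a perimeter-only ring puts none. The minimal sketch below reproduces this without CoreGraphics. It assumes a hypothetical `Pt` struct and a hypothetical `interiorFillFraction` helper, neither of which exists in the app. Like `CGRect.contains`, it counts the min edges as inside and the max edges as outside. The two shapes mirror the fixture in `testHollowRingKeptFilledTileRejected`.

```swift
/// Hypothetical stand-in for CGPoint so the sketch runs without CoreGraphics.
struct Pt { var x: Double; var y: Double }

/// Fraction of `pts` inside the central 60%×60% of their bounding box
/// (a 20% inset per side, as in `isHollow`). Edges are half-open, like CGRect.contains.
func interiorFillFraction(_ pts: [Pt]) -> Double {
    guard let first = pts.first else { return 0 }
    var minX = first.x, maxX = first.x, minY = first.y, maxY = first.y
    for p in pts {
        minX = min(minX, p.x); maxX = max(maxX, p.x)
        minY = min(minY, p.y); maxY = max(maxY, p.y)
    }
    let dx = (maxX - minX) * 0.2, dy = (maxY - minY) * 0.2
    let inside = pts.filter {
        $0.x >= minX + dx && $0.x < maxX - dx && $0.y >= minY + dy && $0.y < maxY - dy
    }.count
    return Double(inside) / Double(pts.count)
}

// A 120×120 ring sampled every 4 px along its perimeter: nothing in the middle.
var ring: [Pt] = []
for t in stride(from: 0.0, through: 120, by: 4) {
    ring += [Pt(x: t, y: 0), Pt(x: t, y: 120), Pt(x: 0, y: t), Pt(x: 120, y: t)]
}

// The same box filled solid, as a camera-off avatar tile would be.
var blob: [Pt] = []
for x in stride(from: 0.0, through: 120, by: 4) {
    for y in stride(from: 0.0, through: 120, by: 4) { blob.append(Pt(x: x, y: y)) }
}

print(interiorFillFraction(ring))  // 0.0
print(interiorFillFraction(blob))  // 324/961 ≈ 0.337, above the 0.2 cut-off
```

Because the 4 px grid samples the edges, the solid box comes out slightly below the ideal 0.36. It still sits well above the 0.2 threshold.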
@@ -166,7 +193,11 @@ struct GridCallAnalyzer {
     }
 
     private func cleaned(_ s: String) -> String {
+        // Trim whitespace and any trailing punctuation OCR tacks on, so "Mark." folds
+        // into "Mark" rather than becoming a separate phantom speaker.
         s.trimmingCharacters(in: .whitespacesAndNewlines)
+            .trimmingCharacters(in: CharacterSet(charactersIn: ".,;:·•-"))
+            .trimmingCharacters(in: .whitespacesAndNewlines)
     }
 
     /// True if `s` looks like a participant name label rather than UI chrome. Call
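The trimming chain in `cleaned` can be exercised on its own. The small sketch below uses a hypothetical free function, `cleanedLabel`, that applies the same three Foundation `trimmingCharacters(in:)` calls. Note that `trimmingCharacters(in:)` strips characters from both ends of the string. A leading bullet such as "• Mark" is therefore removed as well, not only trailing punctuation. Internal dots, as in "J.R.", survive.

```swift
import Foundation

/// Hypothetical copy of the `cleaned` chain: trim whitespace, then the punctuation
/// set at either end, then any whitespace exposed by that second pass.
func cleanedLabel(_ s: String) -> String {
    s.trimmingCharacters(in: .whitespacesAndNewlines)
        .trimmingCharacters(in: CharacterSet(charactersIn: ".,;:·•-"))
        .trimmingCharacters(in: .whitespacesAndNewlines)
}

print(cleanedLabel("  Mark. "))         // Mark
print(cleanedLabel("Grant Gilliam •"))  // Grant Gilliam
print(cleanedLabel("J.R."))             // J.R
```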
@@ -181,6 +212,14 @@ struct GridCallAnalyzer {
         if s.rangeOfCharacter(from: CharacterSet(charactersIn: "@:/\\|+*=<>#0123456789")) != nil {
             return false
         }
+        // Reject domain-like screen-share text (e.g. "WERUNBTC.COM", OCR'd "WERUNBTC.GOM"):
+        // a token whose final dotted segment is a 2–4 letter suffix. Real names don't end
+        // in a TLD; this keeps "Cait's Phone" and initials like "MO".
+        let lower = s.lowercased()
+        if let dot = lower.lastIndex(of: "."), lower.index(after: dot) < lower.endIndex {
+            let suffix = lower[lower.index(after: dot)...]
+            if (2...4).contains(suffix.count) && suffix.allSatisfy({ $0.isLetter }) { return false }
+        }
         let words = s.split(separator: " ")
         guard (1...3).contains(words.count) else { return false }
         let allowed = CharacterSet.letters.union(CharacterSet(charactersIn: "'.-"))
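The TLD rule only looks at the text after the last ".". The sketch below isolates it as a hypothetical helper, `looksLikeDomain`. That name is not in the codebase. The helper returns `true` where `isLikelyName` would return `false`. It uses only the standard library.

```swift
/// Hypothetical helper mirroring the suffix check added to `isLikelyName`:
/// true when the text after the last "." is 2–4 letters (".com", OCR'd ".gom").
func looksLikeDomain(_ s: String) -> Bool {
    let lower = s.lowercased()
    // No dot, or a dot as the very last character ("Mark."): not a domain.
    guard let dot = lower.lastIndex(of: "."), lower.index(after: dot) < lower.endIndex else {
        return false
    }
    let suffix = lower[lower.index(after: dot)...]
    return (2...4).contains(suffix.count) && suffix.allSatisfy { $0.isLetter }
}

let samples = ["WERUNBTC.COM", "WERUNBTC.GOM", "Cait's Phone", "MO", "Mark.", "J. Smith"]
for s in samples { print(s, looksLikeDomain(s)) }
// WERUNBTC.COM and WERUNBTC.GOM print true; the four name-like labels print false.
```

The rule has one trade-off. A name written with an internal dot and a short final segment, such as "St.John", also matches the pattern and would be dropped.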
@@ -15,9 +15,15 @@ final class TimelineBuilder {
     private let closeFrames: Int
     private var aliases: [String: String] = [:] // normalized variant -> canonical
     private var states: [String: NameState] = [:]
+    private var observed: Set<String> = [] // every tile name seen (speaking or not)
     private var lastFrameT: Double = 0
     private(set) var segments: [VisualTimeline.Segment] = []
 
+    /// Every distinct participant name the adapter has OCR'd, whether or not they were
+    /// ever detected speaking — the call-size signal (drives "Auto" chunk sizing and a
+    /// complete participant roster, since speaking-detection is intentionally sparse).
+    var observedNames: [String] { observed.sorted() }
+
     init(openFrames: Int = 2, closeFrames: Int = 2) {
         self.openFrames = max(1, openFrames)
         self.closeFrames = max(1, closeFrames)
@@ -34,6 +40,9 @@ final class TimelineBuilder {
     func ingest(_ observations: [SpeakerObservation], at t: TimeInterval) {
         lastFrameT = t
 
+        // Record every tile seen (speaking or not) for the participant roster / call size.
+        for obs in observations where !obs.name.isEmpty { observed.insert(canonical(obs.name)) }
+
         // Best confidence per canonical name that is speaking this frame.
         var speaking: [String: Double] = [:]
         for obs in observations where obs.speaking && !obs.name.isEmpty {
@@ -93,9 +102,57 @@ final class TimelineBuilder {
             closeSegment(name: name, state: st)
             states[name]?.open = false
         }
+        segments = Self.canonicalizeByFrequency(segments)
         segments.sort { $0.start < $1.start }
     }
 
+    /// Fold rare OCR misspellings into the dominant name they're a typo of: a name with
+    /// little total time is remapped to a much longer-running name with the same initial
+    /// within a small edit distance (e.g. "Matt Odel"/"MattOdell"/"Mare" → "Matt Odell"/
+    /// "Mark"). Conservative by design — it won't merge two well-attested speakers, only
+    /// a transient variant into its clearly-dominant canonical. Pure/testable.
+    static func canonicalizeByFrequency(_ segs: [VisualTimeline.Segment],
+                                        minorMaxSec: Double = 5, dominanceRatio: Double = 8,
+                                        maxEdits: Int = 2) -> [VisualTimeline.Segment] {
+        var dur: [String: Double] = [:]
+        for s in segs { dur[s.name, default: 0] += s.end - s.start }
+        let names = Array(dur.keys)
+        var remap: [String: String] = [:]
+        for minor in names {
+            let md = dur[minor]!
+            guard md <= minorMaxSec, let mInit = minor.first else { continue }
+            var best: String?, bestDur = 0.0
+            for major in names where major != minor {
+                let Md = dur[major]!
+                guard Md >= md * dominanceRatio, Md > bestDur, major.first == mInit else { continue }
+                if levenshtein(minor.lowercased(), major.lowercased()) <= maxEdits { best = major; bestDur = Md }
+            }
+            if let b = best { remap[minor] = b }
+        }
+        guard !remap.isEmpty else { return segs }
+        return segs.map { s in
+            remap[s.name].map { VisualTimeline.Segment(start: s.start, end: s.end, name: $0,
+                                                       confidence: s.confidence, source: s.source) } ?? s
+        }
+    }
+
+    /// Levenshtein edit distance (small strings — names).
+    static func levenshtein(_ a: String, _ b: String) -> Int {
+        let x = Array(a), y = Array(b)
+        if x.isEmpty { return y.count }; if y.isEmpty { return x.count }
+        var prev = Array(0...y.count)
+        var cur = [Int](repeating: 0, count: y.count + 1)
+        for i in 1...x.count {
+            cur[0] = i
+            for j in 1...y.count {
+                cur[j] = x[i-1] == y[j-1] ? prev[j-1]
+                                          : Swift.min(prev[j-1], prev[j], cur[j-1]) + 1
+            }
+            swap(&prev, &cur)
+        }
+        return prev[y.count]
+    }
+
     // MARK: - Internal
 
     private struct NameState {
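The folding rule is easiest to probe with plain types. The sketch below makes these assumptions:

- A hypothetical `Seg` struct stands in for `VisualTimeline.Segment`.
- A hypothetical free function, `foldMisspellings`, stands in for the static method.
- The thresholds are the defaults from the diff: at most 5 s of total time for a minor name, a dominant name with at least 8× that time, and at most 2 edits.

It replays the fixture from `testCanonicalizeFoldsOcrMisspellingsIntoDominantName`.

```swift
/// Hypothetical minimal segment type standing in for VisualTimeline.Segment.
struct Seg { var start: Double; var end: Double; var name: String }

/// Two-row Levenshtein distance, as in `TimelineBuilder.levenshtein`.
func levenshtein(_ a: String, _ b: String) -> Int {
    let x = Array(a), y = Array(b)
    if x.isEmpty { return y.count }
    if y.isEmpty { return x.count }
    var prev = Array(0...y.count)
    var cur = [Int](repeating: 0, count: y.count + 1)
    for i in 1...x.count {
        cur[0] = i
        for j in 1...y.count {
            cur[j] = x[i - 1] == y[j - 1] ? prev[j - 1] : min(prev[j - 1], prev[j], cur[j - 1]) + 1
        }
        swap(&prev, &cur)
    }
    return prev[y.count]
}

/// Remap each short-lived name to the longest-running name that shares its first
/// character, dominates it by `dominanceRatio`, and is within `maxEdits` edits.
func foldMisspellings(_ segs: [Seg], minorMaxSec: Double = 5,
                      dominanceRatio: Double = 8, maxEdits: Int = 2) -> [Seg] {
    // Total speaking time per name.
    var dur: [String: Double] = [:]
    for s in segs { dur[s.name, default: 0] += s.end - s.start }
    var remap: [String: String] = [:]
    for (minor, minorDur) in dur where minorDur <= minorMaxSec {
        var best: String?
        var bestDur = 0.0
        for (major, majorDur) in dur where major != minor && major.first == minor.first
            && majorDur >= minorDur * dominanceRatio && majorDur > bestDur {
            if levenshtein(minor.lowercased(), major.lowercased()) <= maxEdits {
                best = major
                bestDur = majorDur
            }
        }
        if let b = best { remap[minor] = b }
    }
    return segs.map { Seg(start: $0.start, end: $0.end, name: remap[$0.name] ?? $0.name) }
}

let folded = foldMisspellings([
    Seg(start: 0, end: 1689, name: "Matt Odell"),
    Seg(start: 1700, end: 1702, name: "Matt Odel"),
    Seg(start: 1702, end: 1702.3, name: "MattOdell"),
    Seg(start: 0, end: 1155, name: "Mark"),
    Seg(start: 1200, end: 1201, name: "Mare"),
    Seg(start: 0, end: 4, name: "Sidisel"),
])
print(Set(folded.map(\.name)).sorted())  // ["Mark", "Matt Odell", "Sidisel"]
```

Both this sketch and the method in the diff apply the remap in a single pass. Suppose a short name's best match is itself short enough to be remapped. That second hop is not followed.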
@@ -75,7 +75,10 @@ final class VisualCapture {
         }, to: durationSec)
 
         let artifact = (vision + selfSegs).sorted { $0.start < $1.start }
-        let names = Set(artifact.map { $0.name })
+        // Roster = everyone OCR'd (speaking or not) ∪ the names that produced segments,
+        // so the participant count reflects true call size even when few people were
+        // detected speaking. Drives "Auto" chunk sizing downstream.
+        let names = Set(artifact.map { $0.name }).union(observer.participantNames())
         let participants = names.sorted().map {
             VisualTimeline.Participant(name: $0, isSelf: $0 == selfName ? true : nil, aliases: nil)
         }
@@ -114,6 +114,10 @@ final class VisualObserver: NSObject, SCStreamDelegate, SCStreamOutput {
         queue.sync { builder.mergeSelfSpans(spans, selfName: selfName) }
     }
 
+    /// Every distinct participant name OCR'd over the session (read on the builder's
+    /// queue; safe to call after `stop`).
+    func participantNames() -> [String] { queue.sync { builder.observedNames } }
+
     // MARK: - SCStreamOutput (on `queue`)
 
     func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
@@ -138,16 +138,37 @@ final class GridCallAnalyzerTests: XCTestCase {
     func testNameFilterAgainstRealMeetOCR() {
         // The exact strings OCR pulled from a real Meet session — only the first
         // group are participants; the rest are UI chrome that must NOT become speakers.
-        let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me"]
+        let names = ["Grant Gilliam", "Caitlyn Viggiano", "Cait's Phone", "Grant", "Me", "Matt Odell"]
         let junk = ["11:43 AM | rvo-rmjg-rdq", "@ Embassy Er", "Admit 1 guest",
                     "Joined as grant.gilliam@gmail.com", "Others may see your video differently",
                     "Others might still see your full video.", "Your meeting's ready", "efforot",
                     "g* Add others", "g+ Add others", "meet.google.com/rvo-rmjg-rdq",
-                    "permission before they can join.", "the meeting", "G"]
+                    "permission before they can join.", "the meeting", "G",
+                    // Screen-share domain text OCR'd as a name (incl. OCR'd TLDs).
+                    "WERUNBTC.COM", "WERUNBTG.COM", "WERUNBTC.GOM"]
         for n in names { XCTAssertTrue(GridCallAnalyzer.isLikelyName(n), "should keep name: \(n)") }
         for j in junk { XCTAssertFalse(GridCallAnalyzer.isLikelyName(j), "should drop junk: \(j)") }
     }
 
+    func testHollowRingKeptFilledTileRejected() {
+        // A thin ring (border): points only on the perimeter of a 120×120 box.
+        var ring: [CGPoint] = []
+        for t in stride(from: 0.0, through: 120, by: 4) {
+            ring.append(.init(x: t, y: 0)); ring.append(.init(x: t, y: 120))
+            ring.append(.init(x: 0, y: t)); ring.append(.init(x: 120, y: t))
+        }
+        let rbb = GridCallAnalyzer.boundingBox(ring)
+        XCTAssertTrue(GridCallAnalyzer.isHollow(ring, bbox: rbb, maxInteriorFill: 0.2))
+
+        // A solid fill (camera-off avatar tile): points across the whole box.
+        var blob: [CGPoint] = []
+        for x in stride(from: 0.0, through: 120, by: 4) {
+            for y in stride(from: 0.0, through: 120, by: 4) { blob.append(.init(x: x, y: y)) }
+        }
+        let bbb = GridCallAnalyzer.boundingBox(blob)
+        XCTAssertFalse(GridCallAnalyzer.isHollow(blob, bbox: bbb, maxInteriorFill: 0.2))
+    }
+
     func testWhiteBorderDetectorIgnoresColouredBorder() {
         // Signal looks only for the white border, so a coloured (Meet) border must
         // not register as a Signal speaker.
@@ -37,6 +37,45 @@ final class Phase5Tests: XCTestCase {
         XCTAssertEqual(asm.speakersFile.segments[0].start, 152, accuracy: 0.01)
     }
 
+    func testChunkModeResolvesBodyLength() {
+        // Fixed presets ignore participant count.
+        XCTAssertEqual(ChunkMode.standard.bodySeconds(participantCount: 99), 150)
+        XCTAssertEqual(ChunkMode.largeGroup.bodySeconds(participantCount: 2), 60)
+        XCTAssertEqual(ChunkMode.fine.bodySeconds(participantCount: nil), 90)
+        // Auto: >4 detected → 60s, ≤4 → 150s, unknown → 150s.
+        XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 6), 60)
+        XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: 4), 150)
+        XCTAssertEqual(ChunkMode.auto.bodySeconds(participantCount: nil), 150)
+    }
+
+    func testChunkOverlapScalesWithBody() {
+        XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 150), 15) // capped
+        XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 60), 8)   // floored (60*0.12=7.2→8)
+        XCTAssertEqual(ChunkMode.overlapSeconds(forBody: 90), 11)  // 90*0.12=10.8→11
+    }
+
+    func testPlanChunksShortBodyChunksAShortCall() {
+        // A 100s call would be ONE chunk at the 2.5-min default, but at a 60s body it
+        // splits — so "Large group" actually re-chunks medium calls.
+        let c = SessionPackager.planChunks(durationSec: 100, chunkSeconds: 60,
+                                           overlapSeconds: 8, thresholdSec: 72)
+        XCTAssertEqual(c.count, 2)
+        XCTAssertEqual(c[0].bodyStart, 0); XCTAssertEqual(c[0].bodyEnd, 60)
+        XCTAssertEqual(c[1].bodyStart, 60); XCTAssertEqual(c[1].bodyEnd, 100)
+    }
+
+    func testDropStuckSpansRemovesWholeCallCue() {
+        let segs = [
+            VisualTimeline.Segment(start: 0, end: 1900, name: "Grant Gilliam", confidence: 1, source: "vision"), // stuck whole-call tile
+            VisualTimeline.Segment(start: 100, end: 130, name: "Matt Odell", confidence: 0.9, source: "vision"), // real
+            VisualTimeline.Segment(start: 0, end: 1900, name: "Grant", confidence: 1, source: "mic_vad"), // self span: keep
+        ]
+        let out = TranscriptPipeline.dropStuckSpans(segs, duration: 1976)
+        XCTAssertFalse(out.contains { $0.name == "Grant Gilliam" }) // 96% of call in one span → dropped
+        XCTAssertTrue(out.contains { $0.name == "Matt Odell" }) // short real span kept
+        XCTAssertTrue(out.contains { $0.source == "mic_vad" }) // self never dropped
+    }
+
     func testRebaseClipsAndRebases() throws {
         let segs = [
             VisualTimeline.Segment(start: 140, end: 160, name: "A", confidence: 0.9, source: "vision"),
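The chunk-sizing tests pin down a handful of points:

- Fixed presets of 150 s, 60 s and 90 s.
- An `auto` mode that picks 60 s when more than four participants are detected, and 150 s otherwise or when the count is unknown.
- An overlap of about 12% of the body, clamped to 8...15 s.

The sketch below is a reconstruction consistent with those assertions and nothing more. The case names and call signatures come from the tests. The enum layout, the round-to-nearest rule and the behaviour between the tested points are assumptions.

```swift
/// Hypothetical reconstruction of ChunkMode from the Phase5Tests expectations;
/// the real type may differ in layout, raw values and rounding.
enum ChunkMode {
    case auto, standard, largeGroup, fine

    /// Chunk body length in seconds; `auto` switches on the detected roster size.
    func bodySeconds(participantCount: Int?) -> Int {
        switch self {
        case .standard: return 150
        case .largeGroup: return 60
        case .fine: return 90
        case .auto:
            guard let n = participantCount else { return 150 }  // unknown: standard
            return n > 4 ? 60 : 150
        }
    }

    /// Overlap ≈ 12% of the body, clamped to 8...15 s (assumed: round to nearest).
    static func overlapSeconds(forBody body: Int) -> Int {
        let raw = Int((Double(body) * 0.12).rounded())
        return min(15, max(8, raw))
    }
}

for count in [nil, 2, 4, 6] as [Int?] {
    let body = ChunkMode.auto.bodySeconds(participantCount: count)
    print(count.map { "\($0)" } ?? "unknown", body, ChunkMode.overlapSeconds(forBody: body))
}
```

The sketch rounds to nearest rather than up because products like `body * 0.12` are not always exact in binary floating point. A ceiling could turn a value a hair above an integer into an extra second. The roster from `participantNames()` is what would feed `participantCount` in the `auto` case.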
@@ -11,6 +11,27 @@ final class VisualObserverTests: XCTestCase {
         (id, CGRect(x: 0, y: 0, width: w, height: h))
     }
 
+    func testCanonicalizeFoldsOcrMisspellingsIntoDominantName() {
+        func seg(_ s: Double, _ e: Double, _ n: String) -> VisualTimeline.Segment {
+            .init(start: s, end: e, name: n, confidence: 0.9, source: "vision")
+        }
+        let segs = [
+            seg(0, 1689, "Matt Odell"),     // dominant
+            seg(1700, 1702, "Matt Odel"),   // OCR typo → fold
+            seg(1702, 1702.3, "MattOdell"), // dropped-space typo → fold
+            seg(0, 1155, "Mark"),           // dominant
+            seg(1200, 1201, "Mare"),        // OCR typo → fold into Mark
+            seg(0, 4, "Sidisel"),           // screen junk, no near-twin → kept (dropped later, no voice match)
+        ]
+        let names = Set(TimelineBuilder.canonicalizeByFrequency(segs).map { $0.name })
+        XCTAssertTrue(names.contains("Matt Odell"))
+        XCTAssertTrue(names.contains("Mark"))
+        XCTAssertFalse(names.contains("Matt Odel"))
+        XCTAssertFalse(names.contains("MattOdell"))
+        XCTAssertFalse(names.contains("Mare"))
+        XCTAssertTrue(names.contains("Sidisel"))
+    }
+
     func testPrefersMatchingWindowIDOverLargest() {
         // The Meet window (id 42) is NOT the largest — must still be chosen by ID.
         let candidates = [c(7, 1600, 1000), c(42, 800, 600), c(9, 1200, 900)]
@@ -135,10 +135,11 @@ Full request/response shapes, curl examples, limits, and error formats are in
 
 ## 7. Remaining open items (small)
 
-1. **Base URL — RESOLVED.** `https://192.168.1.72:62419`, also
-   `https://immense-voyage.local:62419` (prefer the `.local` form; it survives IP
-   changes). Ship the `.local` host as the default; keep it editable in settings.
-   Service-discovery at `GET /api/endpoints`.
+1. **Base URL — RESOLVED.** A private LAN host — a `.local` mDNS name (preferred
+   over a raw IP, since it survives IP changes) — configured in Settings or via the
+   `SPARK_BACKEND_URL` env var, and never committed. Ship a neutral placeholder as
+   the default; keep it editable in settings. Service-discovery at
+   `GET /api/endpoints`.
 2. **Send trigger** — assume auto-POST on call end; expose a "hold for review"
    toggle if the user wants to eyeball the timeline first.
 3. **Retention** — keep the session folder after a successful hand-off, or prune
@@ -76,12 +76,13 @@ locally — the mic track is the user's known identity / VAD source.)
 
 ## 3. SparkControl — connection (real)
 
-- **Base URL (confirmed):** `https://192.168.1.72:62419` — also reachable at
-  `https://immense-voyage.local:62419` (the `.local` form survives IP changes;
-  **prefer it as the default**). Service-discovery JSON is at
+- **Base URL (confirmed):** a private LAN host — a `.local` mDNS name (preferred
+  over a raw IP; it survives IP changes) — configured in Settings or via the
+  `SPARK_BACKEND_URL` env var, and **never committed**. Service-discovery JSON is at
   `GET /api/endpoints` (returns current vLLM / Parakeet / Kokoro URLs). All audio
-  endpoints in §4–§5 hang off this base. Still **make it a setting** so the host
-  can change, but ship `https://immense-voyage.local:62419` as the default.
+  endpoints in §4–§5 hang off this base. **Make it a setting** so the host can
+  change, and ship a neutral placeholder (`https://your-spark-backend.local`) as
+  the default.
 - **TLS:** Start9 self-signed Root CA. Either skip verification (`URLSession`
   delegate trusting the cert; curl `-k`; `rejectUnauthorized:false`) **or** install
   the Start9 Root CA into the trust store.
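AGENTS.md describes the resolution order for the backend URL:

1. A value saved in Settings wins.
2. Otherwise, the `SPARK_BACKEND_URL` environment variable is used.
3. Otherwise, the placeholder default applies.

Writing this as a pure function keeps it testable without touching `UserDefaults`. The sketch below is minimal. The function name and parameters are hypothetical. The saved value is passed in rather than read from a particular `UserDefaults` key. Treating a blank field as "not set" is also an assumption.

```swift
import Foundation

/// Hypothetical resolver: saved setting, then SPARK_BACKEND_URL, then placeholder.
func resolveBackendURL(saved: String?,
                       environment: [String: String] = ProcessInfo.processInfo.environment,
                       placeholder: String = "https://your-spark-backend.local") -> String {
    // Assumption: a blank or whitespace-only value counts as unset.
    func nonBlank(_ s: String?) -> String? {
        guard let t = s?.trimmingCharacters(in: .whitespacesAndNewlines), !t.isEmpty else { return nil }
        return t
    }
    return nonBlank(saved) ?? nonBlank(environment["SPARK_BACKEND_URL"]) ?? placeholder
}

// The app would pass whatever Settings stored; here nothing is saved, so the env var wins.
print(resolveBackendURL(saved: nil, environment: ["SPARK_BACKEND_URL": "https://example-host.local"]))
// https://example-host.local
```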
+8
-5
@@ -7,17 +7,20 @@ options:
   createIntermediateGroups: true
   groupSortPosition: top
 
+# Signing identity (DEVELOPMENT_TEAM) is kept out of source in a gitignored xcconfig
+# so the Team ID isn't committed. Copy Config/Signing.xcconfig.example to
+# Config/Signing.xcconfig and set your team. Keeping the value stable is what makes
+# macOS TCC grants (Mic / Screen Recording / Accessibility) persist across rebuilds.
+configFiles:
+  Debug: Config/Signing.xcconfig
+  Release: Config/Signing.xcconfig
+
 settings:
   base:
     MARKETING_VERSION: "0.1.0"
     CURRENT_PROJECT_VERSION: "1"
     SWIFT_VERSION: "5.0"
     CODE_SIGN_STYLE: Automatic
-    # Grant's free personal team (cert OU). Baked in so `xcodegen generate` keeps
-    # a STABLE signing identity across regenerations — macOS ties TCC permission
-    # grants (Mic / Screen Recording / Accessibility) to this identity, so a
-    # stable team is what makes those permissions persist across rebuilds.
-    DEVELOPMENT_TEAM: "BK4Y6CXN35"
 
 targets:
   Ten31Transcripts: