35ba6ecf05
The NSAppleEventsUsageDescription usage string was dead — the app has no AppleEvents/AppleScript code path (Meet detection reads window titles), so the permission prompt never fired; remove it. Rephrase the leftover "Phase N" build-plan references in source comments (one of which falsely claimed "no audio, capture, or call detection yet"), and complete the AGENTS.md Audio/Detection layout listings.
35 lines
3.1 KiB
Markdown
35 lines
3.1 KiB
Markdown
# ROADMAP — Ten31 Transcripts
|
|
|
|
Longer-term backlog and deferred decisions. Near-term status + the next few steps live in `AGENTS.md` → Current state.
|
|
|
|
## Visual detection
|
|
- Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
|
|
- Geometric screen-share exclusion: ignore OCR text in the shared-screen region (needs layout detection). Today only the domain filter + stuck-span guard catch share-text-as-speaker.
|
|
- Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
|
|
- Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
|
|
- 1:1 Signal: audio-pill fallback (no active border ever appears in 1:1).
|
|
- Accessibility-tree name source for Electron/Meet (cleaner than OCR); `AppAdapter.namesFromAccessibility` hook exists but returns nil.
|
|
|
|
## Platform support
|
|
- Jitsi: add call detection + a `JitsiAdapter` (Jitsi Meet is browser-based like Google Meet — needs `CallDetector` title recognition, an adapter for participant-name reading, and active-speaker visual cues). New platform alongside Meet/Zoom/Teams/Signal.
|
|
|
|
## Audio / speakers
|
|
- Self mic-channel cleanup: tighten self-VAD / smooth self so sub-second junk "self" fragments stop surviving (self is currently protected from fragment-smoothing).
|
|
- Adaptive chunk sizing from the backend's first-chunk speaker count, instead of the visual participant estimate.
|
|
|
|
## App / UX
|
|
- Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
|
|
- Constrain recap reading width on very wide windows (long line length in the summary band).
|
|
|
|
## Tooling / repo
|
|
- Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
|
|
- `SPARK_BACKEND_URL` is read only at `AppSettings.init` and is shadowed by any value already saved in Settings (UserDefaults wins). So once a backend URL has been saved, the env var has no effect — a stale stored value can override it in dev/CI/harness runs. If that bites, treat an empty/placeholder stored URL as absent so the env var can still win.
|
|
|
|
## Quality / debt (from the 2026-06-13 independent eval — full queue + evidence in `EVALUATION.md`)
|
|
- Guard `RecapAnalyzer.mmss()` (`:137`) against NaN/∞ — a malformed backend `duration` aborts the app at recap render (eval P2). Cheap; fold into the next backend change.
|
|
- Add `SessionController` state-machine tests (`pendingAutoStop`, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3).
|
|
- Smaller P3s in `EVALUATION.md`: whether to actually emit the `manifest.json` per-file `sha256` (now documented as not-emitted in `docs/03` §2); unauthenticated LAN backend (consider a bearer token).
|
|
|
|
## Deferred decisions
|
|
- Cross-device self unification (same person, desktop mic vs phone speakerphone) does not work by voiceprint and is treated as a separate identity; revisit only if a reliable signal emerges (mic-channel-as-self remains the robust path).
|