Grant Gilliam dda4322de7 Reconcile docs/ specs with the shipped app
Document the dual-channel label-merge path (mic_file/system_file/self_name/self_vad) and the recap phase (transcript.md + recap.html via the backend LLM) across docs/01-03; correct docs/02 §2.10 to the UI actually shipped; mark docs/01 §7 open items as settled; remove the dead AUDIO_API.md references; note the manifest sha256 fields are not emitted; mark docs/04 as a complete/historical build log. Also drop the last stale "Phase 0" UI string in MenuBarView and retire the now-done doc-debt items in ROADMAP.
2026-06-16 22:09:04 -05:00

ROADMAP — Ten31 Transcripts

Longer-term backlog and deferred decisions. Near-term status and the next few steps live in AGENTS.md → Current state.

Visual detection

  • Improve Meet faint-blue-border detection (currently sparse): infer tile columns from name-label spacing for reliable per-tile geometry, and/or key on the audio-wave pill.
  • Geometric screen-share exclusion: ignore OCR text inside the shared-screen region (needs layout detection). Today only the domain filter and the stuck-span guard catch shared-screen text being misread as a speaker name.
  • Speaker-view / spotlight layout: detect the one-dominant-tile case (active speaker is the large tile with no border) instead of assuming a grid.
  • Apply Meet's thin-edge + hollow-ring + hue gating to Zoom/Teams if real fixtures show solid-tile false positives there.
  • 1:1 Signal calls: add an audio-pill fallback, since no active-speaker border ever appears in a 1:1 call.
  • Accessibility-tree name source for Electron/Meet (cleaner than OCR); the AppAdapter.namesFromAccessibility hook exists but currently returns nil.
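The column-inference idea in the first item above can be sketched as 1-D gap clustering. This assumes OCR yields one box per name label and that each label sits at the same offset inside its tile. `NameLabel`, `ColumnEstimate`, `inferColumns`, and the 40-point `minGap` threshold are hypothetical names and values for illustration, not existing project code. The function sorts the labels' left edges, starts a new column wherever the jump exceeds `minGap`, and takes the median spacing between column centers as the tile pitch.

```swift
// Hypothetical OCR result type (illustration only, not the app's real model).
struct NameLabel {
    let text: String
    let minX: Double   // left edge of the OCR box, in screen points
}

struct ColumnEstimate {
    let centers: [Double]   // mean left edge of each inferred column
    let pitch: Double?      // median spacing between adjacent columns; nil if < 2 columns
}

/// Clusters label left edges into grid columns. Labels from every row land in the
/// same cluster as long as OCR jitter within a column stays under `minGap` (assumed).
func inferColumns(_ labels: [NameLabel], minGap: Double = 40) -> ColumnEstimate {
    let xs = labels.map { $0.minX }.sorted()
    guard let first = xs.first else { return ColumnEstimate(centers: [], pitch: nil) }

    var clusters: [[Double]] = [[first]]
    var previous = first
    for x in xs.dropFirst() {
        if x - previous > minGap {
            clusters.append([x])                    // large jump: a new column starts
        } else {
            clusters[clusters.count - 1].append(x)  // same column, small jitter
        }
        previous = x
    }

    let centers = clusters.map { $0.reduce(0, +) / Double($0.count) }
    // Spacing between adjacent column centers; the median resists one odd gap.
    let gaps = zip(centers.dropFirst(), centers).map { pair in pair.0 - pair.1 }.sorted()
    // Upper middle value when the count is even.
    let pitch: Double? = gaps.isEmpty ? nil : gaps[gaps.count / 2]
    return ColumnEstimate(centers: centers, pitch: pitch)
}
```

Given the column centers and pitch, each tile's horizontal extent follows from its label's column, which is the per-tile geometry the border detector needs.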

Platform support

  • Jitsi: add call detection and a JitsiAdapter, making Jitsi a new platform alongside Meet/Zoom/Teams/Signal. Jitsi Meet is browser-based like Google Meet, so it needs CallDetector title recognition, an adapter that reads participant names, and active-speaker visual cues.

Audio / speakers

  • Self mic-channel cleanup: tighten self-VAD and/or smooth the self track so sub-second junk "self" fragments stop surviving (self is currently exempt from fragment smoothing).
  • Adaptive chunk sizing: derive it from the speaker count the backend reports for the first chunk instead of from the visual participant estimate.
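The self mic-channel cleanup above could take the form of a merge-then-filter pass. This is a minimal sketch that assumes self speech arrives as time-stamped segments. It first bridges short gaps between consecutive self segments, so a word split by a brief VAD dropout is not discarded, and then drops anything still shorter than a minimum duration. `SelfSegment`, `smoothSelf`, and both thresholds (0.3 s gap, 1.0 s minimum) are hypothetical, not tuned values from the app.

```swift
// Hypothetical segment type; times are seconds from the start of the recording.
struct SelfSegment: Equatable {
    var start: Double
    var end: Double
    var duration: Double { end - start }
}

/// Bridges gaps of at most `maxGap` between consecutive self segments, then drops
/// merged segments shorter than `minDuration` (the sub-second "self" junk).
func smoothSelf(_ segments: [SelfSegment],
                maxGap: Double = 0.3,
                minDuration: Double = 1.0) -> [SelfSegment] {
    var merged: [SelfSegment] = []
    for seg in segments.sorted(by: { $0.start < $1.start }) {
        if let last = merged.last, seg.start - last.end <= maxGap {
            // Close enough to the previous segment: extend it rather than start a new one.
            merged[merged.count - 1].end = max(last.end, seg.end)
        } else {
            merged.append(seg)
        }
    }
    return merged.filter { $0.duration >= minDuration }
}
```

Merging before filtering matters. Filtering first would throw away two 0.6 s halves of one real utterance that together span well over a second.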

App / UX

  • Per-app recording control: call detection is all-or-nothing; the adapter toggle only gates visual capture, not whether the app records.
  • Constrain recap reading width on very wide windows (long line length in the summary band).

Tooling / repo

  • Decide whether to add a linter/formatter (SwiftLint/SwiftFormat) — none configured today.
  • SPARK_BACKEND_URL is read only at AppSettings.init and is shadowed by any value already saved in Settings (UserDefaults wins). Once a backend URL has been saved, the env var has no effect, so a stale stored value can silently override it in dev/CI/harness runs. If that bites, treat an empty or placeholder stored URL as absent so the env var can still win.
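The precedence fix proposed above can be sketched as a pure function, which keeps it easy to unit-test. This is a minimal sketch under assumptions. `resolveBackendURL` is a hypothetical helper. The placeholder set and the UserDefaults key in the commented call are illustrative, not what AppSettings actually stores. A saved value still wins, as it does today, unless it is blank or a known placeholder, in which case `SPARK_BACKEND_URL` applies.

```swift
import Foundation

/// Hypothetical helper: a stored URL wins only if it is non-blank and not a known
/// placeholder; otherwise fall back to SPARK_BACKEND_URL from the environment.
func resolveBackendURL(stored: String?,
                       environment: [String: String],
                       placeholders: Set<String> = ["", "http://", "https://"]) -> URL? {
    let trimmed = stored?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
    if !placeholders.contains(trimmed), let url = URL(string: trimmed) {
        return url  // a real saved value: Settings keep precedence
    }
    if let fromEnv = environment["SPARK_BACKEND_URL"], !fromEnv.isEmpty {
        return URL(string: fromEnv)
    }
    return nil
}

// In the app it would be fed from the real sources (key name assumed):
// resolveBackendURL(stored: UserDefaults.standard.string(forKey: "backendURL"),
//                   environment: ProcessInfo.processInfo.environment)
```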

Quality / debt (from the 2026-06-13 independent eval; full queue and evidence in EVALUATION.md)

  • Guard RecapAnalyzer.mmss() (:137) against NaN/∞: a malformed backend duration aborts the app when the recap renders (eval P2). Cheap; fold it into the next backend change.
  • Add SessionController state-machine tests (pendingAutoStop, visual-adoption generation guard) before refactoring; then extract its saved-session / open-panel UI (eval P2/P3).
  • Optional: sweep the stale "Phase N" references in source comments (e.g. SparkControlHealth.swift:7 "arrives in Phase 5", Ten31TranscriptsApp.swift:6 "Phase 0 only") — historical, not false, but dated. docs/04_BUILD_PLAN.md is now marked COMPLETE/historical and is the map for these.
  • Smaller P3s in EVALUATION.md: incomplete AGENTS Layout listings, unwritten manifest.json sha256 contract (now documented as not-emitted in docs/03 §2), unused NSAppleEventsUsageDescription, unauthenticated LAN backend (consider a bearer token).
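For the RecapAnalyzer.mmss() item above: a common cause of this kind of abort in Swift is `Int(_:)` applied to a `Double`, which traps on NaN, ±infinity, or values outside `Int`'s range. Below is a minimal sketch of the guard. It assumes mmss() takes a duration in seconds as a `Double` and renders `m:ss`. The real signature may differ, and the `"--:--"` fallback is an assumed choice.

```swift
/// Hypothetical stand-in for RecapAnalyzer.mmss(): formats seconds as m:ss.
/// Rejects NaN, infinities, negatives, and values too large for Int *before*
/// converting, so a bad backend duration degrades to a placeholder instead of trapping.
func mmss(_ seconds: Double) -> String {
    guard seconds.isFinite, seconds >= 0, seconds < Double(Int.max) else {
        return "--:--"  // assumed placeholder for an unusable duration
    }
    let total = Int(seconds.rounded(.down))  // safe: range checked above
    let minutes = total / 60
    let secs = total % 60
    return "\(minutes):" + (secs < 10 ? "0\(secs)" : "\(secs)")
}
```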

Deferred decisions

  • Cross-device self unification (the same person on a desktop mic vs. a phone speakerphone) does not work by voiceprint, so each device is treated as a separate identity. Revisit only if a reliable signal emerges; mic-channel-as-self remains the robust path.