2026-06-13 14:39:40 -05:00


Evaluation — ten31-transcripts — 2026-06-13

Intent: A native macOS menu-bar app (Swift/SwiftUI/AppKit, macOS 13+, generated by XcodeGen) that auto-detects conference calls (Meet/Zoom/Teams/Signal), records dual-track audio while watching the call window via ScreenCaptureKit for active-speaker cues, and hands audio + a visual speaker timeline to a self-hosted SparkControl backend that performs transcription/diarization/speaker-naming — producing named transcripts and recaps.

Agents run: evaluator, security-auditor, exerciser, doc-auditor. Skipped: start9-spec-checker (no StartOS-wrapper markers found), reviewer (working tree clean — no diff to review).

Verdict

This is a genuinely well-engineered personal tool: it builds cleanly with the documented DEVELOPER_DIR toolchain, all 69 tests pass in ~1s (claim verified empirically), the architecture is disciplined (the app records/watches/packages/reconciles and correctly delegates all ML to the backend), and secrets hygiene is verifiably clean — the documented 2026-06-13 history scrub survives a full-ref grep with zero leaked hosts or IPs. The headline risk is the TLS trust model: certificate validation is bypassed globally for any host, on by default, so anyone on the LAN can MITM the full upload of call audio, the visual timeline, and stored voiceprints — and the same bypass makes a reproducible recap-time crash (mmss() on a malformed Double) attacker-reachable. The second-largest issue is documentation: the README still describes "Phase 0 (scaffold)" for an app that has shipped through Phase 6, and the docs/ specs have diverged from the dual-channel API and the recap phase. Code-wise this is close to ready for its single-user, LAN-only purpose; the fixes are well-scoped and mostly small. Fix the TLS model first — it gates the safety of every backend-integration test that follows.

Cross-referenced findings

  • TLS bypass scope — contradiction resolved against the evaluator. The evaluator rated Security 4 partly on the basis that the TLS-skip is "intentional/scoped" (InsecureTrustDelegate.swift:9). The security-auditor read the implementation and found it is not scoped: URLCredential(trust:) is returned for any host without a host/fingerprint/CA check (InsecureTrustDelegate.swift:22), and it is default-on (AppSettings.swift:109). The auditor's direct evidence wins; the Security lens is adjusted down accordingly (see Scorecard).
  • One attack chain, two agents. The exerciser independently reproduced (twice) a fatal crash in RecapAnalyzer.mmss() on Double.nan/Double.infinity (RecapAnalyzer.swift:137), reachable when the backend returns e.g. "duration": 1e400. The security-auditor's P1 global TLS bypass is exactly what lets an on-LAN attacker be that backend. These are not two unrelated findings — the P1 bypass converts the P2 crash from "trust the backend" to "any LAN attacker can crash the app at recap time." Listed once each below, but they share an exploit path.
  • README staleness — corroborated by two agents. Both the evaluator (P2) and the doc-auditor (multiple lines) independently flagged that README.md describes Phase 0 while the code is at Phase 6+, and both flagged the matching stale source comment at AppSettings.swift:7. Merged into one finding; the doc-auditor adds that the drift extends into the docs/ design specs.
  • Test count — claim verified, not just asserted. The evaluator and exerciser both built and ran the suite; "69 tests pass" (AGENTS.md) is confirmed by execution, not by counting func test declarations.
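
The crash half of that chain comes from converting a non-finite Double to Int, which traps in Swift. A minimal sketch of a guarded formatter follows, assuming a free function that takes seconds and returns an m:ss string; the real signature of RecapAnalyzer.mmss() is not shown in this report, and the "--:--" placeholder is an illustrative choice, not the project's.

```swift
/// Hypothetical stand-in for RecapAnalyzer.mmss(); name reused for illustration only.
func mmss(_ sec: Double) -> String {
    // Int(_:) traps on NaN, ±infinity, and finite values outside Int's range.
    // Int(exactly:) returns nil in all of those cases instead of trapping.
    guard sec.isFinite, sec >= 0, let total = Int(exactly: sec.rounded()) else {
        return "--:--" // assumed placeholder for an unusable duration
    }
    let minutes = total / 60
    let seconds = total % 60
    // Zero-pad seconds by hand so the sketch needs no Foundation import.
    return "\(minutes):" + (seconds < 10 ? "0\(seconds)" : "\(seconds)")
}
```

With this shape, a backend value that decodes to infinity renders as a placeholder rather than aborting the app; whether a placeholder or a thrown error is the better behavior is a design call for the maintainer.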

Priority queue

  • [P1] Global, unscoped TLS bypass trusts any certificate from any host (default-on) — anyone on the LAN can ARP/DNS-spoof the unauthenticated .local mDNS name and receive the full mic+system audio, visual timeline, and voiceprints, then return attacker-chosen transcripts — InsecureTrustDelegate.swift:22, wired at SparkControlClient.swift:85/GatewayLLMClient.swift:36/SparkControlHealth.swift:35 — security-auditor
  • [P2] Skip-TLS defaults to ON, so the P1 MITM window is open from first launch before any user choice — AppSettings.swift:109 (... as? Bool ?? true) — security-auditor
  • [P2] RecapAnalyzer.mmss() fatally crashes on NaN/±Infinity (reproduced twice); a malformed/MITM'd backend duration decodes to Double.infinity and aborts the app at recap-render time — RecapAnalyzer.swift:137 (Int(sec.rounded())) — exerciser (exploit path opened by the P1 finding)
  • [P2] README is stale by six phases — claims "Phase 0 (scaffold)… no audio capture, call detection, screen reading, or backend hand-off yet" for an app that has all of it; the same stale claim is in source comment AppSettings.swift:7 — README.md:7,49,51,56-66 vs. Ten31Transcripts/{Audio,Detection,Visual,Session,Recap}/ — evaluator + doc-auditor
  • [P2] SessionController (670 lines, the most concurrency-dense file: generations, in-flight task adoption, pending-auto-stop) has zero unit tests, while comparable pure logic is well covered — SessionController.swift:256-282 — evaluator
  • [P3] docs/ design specs drifted from the implemented backend path: the dual-channel fields (mic_file/system_file/self_name/self_vad) are undocumented and the recap/LLM phase is absent — docs/03_DATA_CONTRACTS.md:109-116, docs/02_ARCHITECTURE.md:51,197, docs/01_PROJECT_BRIEF.md:31,83,94, docs/04_BUILD_PLAN.md (no recap phase) vs. SparkControlClient.swift:106-130 / RecapAnalyzer.swift:8-12 — doc-auditor
  • [P3] docs/01_PROJECT_BRIEF.md:142-153 §7 lists open items 2-5 (send trigger, retention, voiceprint-update policy, signing) that are already resolved in code — AppSettings.swift:46, VoiceprintStore.swift:25, Config/Signing.xcconfig — doc-auditor
  • [P3] docs/02_ARCHITECTURE.md:214-216 §2.10 claims MenuBarUI features (recent-sessions list with resend/delete, voiceprint manager) that are absent from the actual UI (MenuBarView surfaces only the single last session) — doc-auditor
  • [P3] AGENTS.md Layout listings are incomplete: Audio/ omits AudioMixer/MonoTrackWriter/Resampler, Detection/ omits AudioInputProcesses/MicActivityMonitor — AGENTS.md:50,53 — doc-auditor
  • [P3] The manifest.json per-file sha256 integrity contract is specified but never written by the pipeline — spec-vs-reality gap — docs/03_DATA_CONTRACTS.md:61-63 — evaluator
  • [P3] Env-var precedence footgun: a saved UserDefaults backend URL permanently shadows SPARK_BACKEND_URL, so the env var silently has no effect once Settings is touched (already noted in ROADMAP) — AppSettings.swift:105-107, ROADMAP.md:23 — evaluator
  • [P3] SessionController owns three jobs — recording state machine, backend-processing orchestration, and the saved-session/NSOpenPanel UI flow; extract the open/reprocess UI before the file grows — SessionController.swift:467-535 — evaluator
  • [P3] Unused, scary-looking NSAppleEventsUsageDescription entitlement string ("reads the active browser tab's URL") with no AppleEvents code path (Meet detection uses CGWindowListCopyWindowInfo titles only) — drop it — Info.plist:33 — security-auditor
  • [P3] Backend is unauthenticated by design — any LAN device that reaches it can drive transcription; consider a shared bearer token even on LAN — docs/03_DATA_CONTRACTS.md:89 — security-auditor
  • [P3] App Sandbox OFF + Hardened Runtime OFF (intentional, required for cross-app observation) leaves the app unconfined; keep the zero-dependency posture as a deliberate compensating control and document it as such — project.yml:38 + entitlements — security-auditor
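
The P1 item turns on the difference between "trust any certificate from any host" and "override validation only for the configured backend, and only for a known certificate." A minimal sketch of that decision logic follows. Every name here (ScopedTrustPolicy, TrustDecision, the fingerprint inputs) is hypothetical and not taken from the codebase, and the fingerprint is passed in as data so the sketch stays independent of any particular hashing or Security-framework call.

```swift
import Foundation

/// Outcome of evaluating one server-trust challenge.
enum TrustDecision: Equatable {
    case acceptPinned     // host and pin both match: override default validation
    case defaultHandling  // not our backend: let the system validate normally
    case reject           // our backend host, but an unexpected certificate
}

/// Hypothetical policy type, for illustration only.
struct ScopedTrustPolicy {
    let backendHost: String            // host taken from the configured backend URL
    let pinnedFingerprints: Set<Data>  // assumed: SHA-256 digests of accepted certs or SPKIs

    func decide(challengeHost: String, presentedFingerprint: Data?) -> TrustDecision {
        // Only the configured backend host is ever eligible for an override.
        guard challengeHost.lowercased() == backendHost.lowercased() else {
            return .defaultHandling
        }
        // For that host, accept only a certificate whose fingerprint is pinned.
        guard let fingerprint = presentedFingerprint,
              pinnedFingerprints.contains(fingerprint) else {
            return .reject
        }
        return .acceptPinned
    }
}
```

In a URLSession delegate, these outcomes would plausibly map to the existing dispositions: .acceptPinned to .useCredential with URLCredential(trust:), .defaultHandling to .performDefaultHandling, and .reject to .cancelAuthenticationChallenge, with the challenge host read from challenge.protectionSpace.host. Pinning the Start9 root CA instead of a leaf fingerprint would change how the match is computed, not the shape of the decision.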

Scorecard

The evaluator's six-lens table, with two lenses adjusted where another agent's evidence contradicts the evaluator's stated basis (adjustments noted):

| Lens | Score /5 | Notes |
| --- | --- | --- |
| Architecture | 5 | Clean layering; ML delegated to backend per intent; pure/testable seams split from I/O. The single 670-line SessionController is the only concentration (P3 to extract). |
| Security | 3 (was 4) | Adjusted down. The evaluator's "TLS-skip is intentional/scoped" basis is contradicted by the security-auditor's read: the bypass is global/any-host (InsecureTrustDelegate.swift:22) and default-on. Otherwise strong: zero deps, no shell-out, verified-clean secrets, and the "never write frames" privacy claim holds in code. |
| Performance | 5 | Idles near-zero; frames released immediately; grid-sampled vision with reused CIContext; sequential backend calls honor the single-GPU constraint. |
| Testing | 4 | 69 tests pass (verified by execution); they target the real load-bearing logic. Gap: the SessionController concurrency state machine is untested. |
| Code quality | 5 | Consistent style, comments explain why, zero warnings, no try!. One latent robustness ding: the mmss() NaN/∞ fatal (P2). |
| Documentation | 3 (was 4) | Adjusted down. The evaluator scored 4 calling docs/ "excellent and true," but the doc-auditor's claim-by-claim pass found drift well beyond the README: the dual-channel API and the entire recap phase are undocumented across docs/01-04, and the build plan never mentions recap. |

Disagreements & gaps

  • TLS scope (resolved). Evaluator said "scoped" and scored Security 4; security-auditor read InsecureTrustDelegate.swift:22 and found it global + default-on (P1). Resolved in favor of the auditor's direct evidence; Security adjusted to 3.
  • Documentation breadth (resolved). Evaluator sampled docs/ and judged them accurate (lens 4); doc-auditor did a claim-by-claim pass and found material drift in the specs, not just the README. Resolved in favor of the doc-auditor for the lens; adjusted to 3.
  • Shared blind spot (all runtime-capable agents). None could exercise live end-to-end behavior — the SparkControl .local backend is unreachable from any of these environments by design, and the real on-call visual-cue accuracy needs the gitignored example-screenshots/. The Meet visual fix (reject solid camera-off tiles) therefore remains unverified end-to-end, which AGENTS.md "Current state" itself acknowledges. No agent could close this; it requires a real call on the user's machine.

Suggested order of work

  1. Fix the TLS trust model first — scope the override to the configured backend host and pin the Start9 root CA (or the leaf SPKI hash); default skip-TLS to false. This is the P1, and it is the precondition that makes any later backend-integration test trustworthy (it currently gates the P2 crash's reachability).
  2. Harden Double→Int conversions on backend-decoded values: give mmss() a finite-guard fallback and audit sibling call sites. This closes the recap-time crash chain that step 1 also narrows.
  3. Rewrite README.md to match the shipped app and fix the AppSettings.swift:7 "Phase 0" comment — the single highest-leverage doc change (first thing any newcomer reads).
  4. Reconcile the docs/ specs — document the dual-channel fields in docs/03 §4 and docs/02, add the recap phase to docs/01/02/04, and close the already-resolved §7 open items.
  5. Add SessionController state-machine tests (auto-start-then-immediate-call-end via pendingAutoStop; the visual-adoption generation guard) — do this before the next refactor so it has a safety net.
  6. Then extract the saved-session/open-panel UI out of SessionController into a small coordinator.
  7. Run one real call end-to-end on the user's machine to validate the unverified Meet visual fix and confirm speakers.json + transcript.md + recap.html are written correctly — only meaningful after step 1 makes that path safe.
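
For the "default skip-TLS to false" part of step 1, and the env-var shadowing noted in the priority queue, a small sketch of settings resolution may help. It assumes hypothetical key names ("skipTLSVerification", "backendURL") and assumes the environment variable should win over a saved value; the report does not say which precedence the maintainer intends, so treat that ordering as one possible choice.

```swift
/// Hypothetical resolved settings; field and key names are illustrative only.
struct BackendSettings: Equatable {
    let backendURL: String?
    let skipTLSVerification: Bool
}

/// Treats nil and empty strings the same way: as "not set".
func nonEmpty(_ value: String?) -> String? {
    guard let value = value, !value.isEmpty else { return nil }
    return value
}

func resolveSettings(stored: [String: Any], environment: [String: String]) -> BackendSettings {
    // Secure default: a missing or malformed stored value keeps verification on.
    let skipTLS = stored["skipTLSVerification"] as? Bool ?? false
    // Assumed precedence: an explicit env var beats the saved value,
    // so it is never silently shadowed once Settings has been touched.
    let url = nonEmpty(environment["SPARK_BACKEND_URL"])
        ?? nonEmpty(stored["backendURL"] as? String)
    return BackendSettings(backendURL: url, skipTLSVerification: skipTLS)
}
```

In the app, the two inputs would presumably come from UserDefaults and ProcessInfo.processInfo.environment; taking them as plain dictionaries keeps the precedence rule unit-testable without touching real user defaults.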