ten31-transcripts

Author	SHA1	Message	Date
Grant Gilliam	3629dbdaaa	Default TLS validation on; scope skip-TLS bypass to the configured host The app shipped with certificate validation bypassed globally and on by default — InsecureTrustDelegate trusted any cert from any host. That was the evaluation's P1: anyone on the LAN could MITM call audio, transcripts, and voiceprints. The backend's Start9 cert already validates under normal system trust when the StartOS Root CA is installed in the keychain (confirmed: URLSession default validation returns 200 against the backend and its fallback), so the bypass is unnecessary: - skip-TLS now defaults to off - when explicitly enabled, the bypass is scoped to the configured host via InsecureTrustDelegate.allowsTrustOverride, never "trust any server" - the host gate is pure and unit-tested (InsecureTrustDelegateTests) Docs reconciled: AGENTS.md backend/TLS line and Current state.	2026-06-13 16:02:57 -05:00
Grant Gilliam	11eb82178f	Add agent instructions; extract signing/backend secrets from source - Add AGENTS.md (canonical) + CLAUDE.md symlink + ROADMAP.md - Move Apple Team ID from project.yml into a gitignored Config/Signing.xcconfig via configFiles; commit the .example template - Replace hardcoded backend host in AppSettings with a neutral placeholder + SPARK_BACKEND_URL env-var fallback - Scrub the Team ID, .local host, and raw LAN IP from README/docs - Ignore Config/Signing.xcconfig and .env	2026-06-13 12:23:54 -05:00
Grant Gilliam	a3e3406b28	Make diarization chunk length configurable (Auto + presets) Chunk size was hardcoded at 2.5-min bodies. Add a Settings control: Auto / Standard 2.5min / Large group 60s / Fine 90s. Shorter chunks keep fewer simultaneous speakers per window (Sortformer resolves ~4/chunk), useful for large calls, at some cost to speed and cross-chunk voice matching. - ChunkMode (new, pure/testable): mode → body seconds; Auto picks 60s when >4 participants were detected, else 150s; overlap + single-chunk threshold scale with the body length. - AppSettings.chunkMode (+ typed `chunk`); SettingsView picker with explanation. - TranscriptPipeline.process gains chunkSeconds; derives overlap/threshold from it. - SessionController resolves the body from the setting + the session's detected participant count (visual_timeline participants) for both send + re-process. - Participant roster now counts EVERY tile OCR'd, not just who spoke (TimelineBuilder.observedNames → VisualObserver → VisualCapture), so the Auto call-size signal is meaningful even though speaking-detection is sparse. Tests: ChunkMode resolution, overlap scaling, short-body re-chunking. 69 pass.	2026-06-09 10:15:16 -05:00
Grant Gilliam	6d0c8be8c9	Speaker reconciliation + open/re-process any saved session Reconciliation (the marry-the-signals layer): after transcription, before the recap, SpeakerReconciler (1) MERGES non-self clusters whose voiceprints are highly similar (cosine >= 0.82) — fixes a person split across chunks (the real 1-on-1 failure: one remote came back as 'MH' + 'Unknown_0'); and (2) NAMES remaining non-self clusters from transcript CONTENT via the gateway LLM (people addressed by name / self-intros), conservative + confidence-gated, keeping the placeholder when unrevealed. The mic-channel self is protected and never reassigned. Voice does the segmentation; the fingerprint-merge fixes splits; the LLM adds the content signal visual/voiceprint lack. - SpeakerReconciler: pure cosine merge (tested) + LLM content-naming pass; rewrites speakers.json before recap. SessionController.finishBackend shares one model lookup for reconcile + recap. Gated by settings.reconcileSpeakers (default on). - Open saved session: menu 'Open saved session…' → folder picker. Edits it if already transcribed, else reconstructs inputs from disk (visual_timeline vision segs + channel self-spans) and runs transcribe → reconcile → recap, then opens the editor. Lets you evaluate/correct ANY past call, not just the in-memory last one. Note (from real Signal data): visual naming is unreliable on Signal (sparse, misread initials, lowercase/center names) — so reconciliation + the editor (which teaches voiceprints on confirm) carry it; the editor remains the human arbiter. 59/59 XCTest.	2026-06-08 11:54:41 -05:00
Grant Gilliam	c539b78a58	Configurable recap templates (categories per meeting type, in Settings) Takeaways categories are no longer hardcoded — they're editable templates. A template = the always-on TLDR + an ordered list of sections, each with a title, a type (attributed items / bulleted list / paragraph), and an instruction (the prompt text for that category). The analyzer assembles the LLM prompt FROM the template and parses generically, so adding/removing/renaming a category needs zero code and the output always renders. - RecapTemplate / TemplateSection / SectionKind + TopicGranularity; built-in defaults (Internal Meeting, 1:1, Company/Sales Call), all editable. - Generic extras: RecapExtras{tldr, primarySpeakers, sections:[RenderedSection]} + RecapItem{text,who,when,note} replaces the fixed MeetingExtras. Analyzer builds per-section sec_N fields + parses by kind; renderer + remap are generic. - Topic granularity (coarse/auto/fine) answers 'should chunking be configurable' — it scales the target topic count; raw window sizes stay as tuned defaults. - AppSettings persists templates + defaultTemplateId (seeded once). Settings gets a default-template picker + 'Manage…' → TemplatesView (CRUD, edit sections/ instructions, set default, Preview prompt for full transparency). - Recap editor gains a template picker; Regenerate uses the chosen template. Auto recap uses the default template. 54/54 XCTest (template prompt build, generic parse/remap/render updated).	2026-06-06 19:26:03 -05:00
Grant Gilliam	85bfdf2b56	Recap: readable transcript + topic sections + meeting extras (gateway LLM) New 'Recap' phase — turns speakers.json into a human-readable recap, leveraging recap-relay's proven logic/prompts but calling the Spark gateway's OpenAI-compatible /v1/chat/completions directly (same host/TLS as label-merge; Qwen3-35B). We start from already-named speakers (label-merge), so recap-relay's speaker clustering + name-inference are skipped entirely. - GatewayLLMClient: /v1/chat/completions (JSON mode), model discovery via /api/endpoints, TLS-skip reuse, 503 retry, sequential. - RecapAnalyzer: speakers.json → numbered [N] (MM:SS) Name: text transcript → time-windowed analyze (single window for short calls, 18min/2min overlap for long) → stitch/dedup topic sections → meeting extras (TLDR/decisions/action_items/ open_questions/key_quotes). Defensive JSON parsing of LLM output. - RecapRenderer: writes transcript.md + a self-contained dark-theme recap.html (topic sections w/ collapsible transcripts, extras panels, speaker color chips, full timestamped speaker-attributed transcript, print styles). - SessionController.buildRecap: best-effort after speakers.json (gated by settings.recapEnabled); surfaces recapURL → menu 'Open recap'. Skips silently if the gateway has no LLM. Settings toggle added. Validated END-TO-END on the real Meet session against the live gateway: dual-channel transcription → 3 topic sections + accurate TLDR + key quotes; 'Go Bitcoin' correctly attributed to the remote speaker. 46/46 XCTest (10 new).	2026-06-06 14:36:18 -05:00
Grant Gilliam	863136aeec	Phases 2-6: detection, visual timeline, backend hand-off, voiceprints Phase 2 (call detection): CallDetector using CoreAudio per-process mic attribution (anarlog technique) — robust start+stop for Zoom/Teams/Signal/Meet, ignoring our own recording; auto-record toggle. Built; pending live multi-app confirmation by the user. Phase 3 (visual timeline foundation): AppAdapter protocol + SpeakerObservation, TimelineBuilder (hysteresis/overlap/self-merge/aliases), VisualTimeline (schema 1.1), TextRecognizer (Vision OCR), FrameSampler + GridCallAnalyzer (name OCR + saturated-highlight active-speaker attribution), SignalAdapter, VisualObserver (window capture; frames released, never saved; minimized->visual_gap, idle != gap). Synthetic-frame tested; adapter geometry pending real Signal fixtures + live VisualObserver validation. Phase 5 (backend hand-off): SparkControlClient (multipart label-merge, sequential, TLS-skip, 503 Retry-After/413), SessionPackager (chunk plan + WAV slice + timeline slice/rebase), TranscriptAssembler + SpeakersFile, TranscriptPipeline. Validated END-TO-END against the live backend (chunk -> label-merge -> speakers.json). Phase 6 (voiceprints): VoiceprintStore (known_voiceprints, persist named fingerprints, skip Unknown). Wired: 'Send to backend' button + transcript status, auto-send toggle (default off) + self-name setting. All adversarial-review findings fixed. App + XCTest suite build; tests pass.	2026-06-06 00:15:49 -05:00
Grant Gilliam	b2ae3a62b9	Phase 0: menu-bar scaffold, permissions, backend health check Native SwiftUI menu-bar app (LSUIElement, macOS 13+), generated from project.yml via XcodeGen. Includes: - PermissionsManager (Microphone / Screen Recording / Accessibility) + UI - SparkControlHealth: GET /api/status over self-signed TLS (InsecureTrustDelegate) - AppSettings persistence (host, TLS-skip, output folder, adapter toggles) - Menu-bar panel + Settings, app sandbox & hardened runtime off (LAN tool)	2026-06-05 19:33:53 -05:00
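
Note on a3e3406b28: the mode-to-body-seconds mapping described in that commit can be sketched as a small pure function. This is a minimal sketch, not the repository's code. The case names and the `bodySeconds(detectedParticipants:)` signature are assumptions. Only the durations (150 s, 60 s, 90 s) and the "more than 4 participants" rule for Auto come from the commit message. Overlap and the single-chunk threshold are left out because the message does not give their formulas.

```swift
import Foundation

// Hypothetical sketch of the ChunkMode resolution described in a3e3406b28.
// Case names and the method signature are assumptions for illustration.
enum ChunkMode: String, CaseIterable {
    case auto, standard, largeGroup, fine

    /// Body length in seconds for one diarization chunk.
    func bodySeconds(detectedParticipants: Int) -> Double {
        switch self {
        case .standard:   return 150  // 2.5 min preset
        case .largeGroup: return 60   // fewer simultaneous speakers per window
        case .fine:       return 90
        case .auto:
            // Auto picks 60 s when more than 4 participants were detected, else 150 s.
            return detectedParticipants > 4 ? 60 : 150
        }
    }
}
```

Keeping the mapping free of I/O is what makes it unit-testable, which matches the "pure/testable" wording in the commit.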

8 Commits
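
Note on 6d0c8be8c9: the voiceprint merge described in that commit (merge non-self clusters with cosine similarity >= 0.82, never reassign the mic-channel self) might look roughly like the sketch below. The function names, the dictionary-of-vectors input, and the greedy "merge into the first label in sorted order" tie-break are all assumptions for illustration. The commit message specifies only the threshold and the self-protection rule.

```swift
import Foundation

/// Cosine similarity of two equal-length vectors; 0 for mismatched, empty, or zero vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot = 0.0, na = 0.0, nb = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    guard na > 0, nb > 0 else { return 0 }
    return dot / (na.squareRoot() * nb.squareRoot())
}

/// Hypothetical helper: maps each non-self label to the label it merges into.
/// The self label is excluded up front, so it is never merged or reassigned.
func mergeMap(voiceprints: [String: [Double]],
              selfLabel: String,
              threshold: Double = 0.82) -> [String: String] {
    let labels = voiceprints.keys.filter { $0 != selfLabel }.sorted()
    var target: [String: String] = [:]
    for label in labels { target[label] = label }

    for (i, a) in labels.enumerated() {
        let root = target[a] ?? a
        // Only consider later labels that have not been merged yet.
        for b in labels[(i + 1)...] where target[b] == b {
            guard let va = voiceprints[a], let vb = voiceprints[b] else { continue }
            if cosineSimilarity(va, vb) >= threshold {
                target[b] = root
            }
        }
    }
    return target
}
```

With the 1-on-1 failure case from the commit (one remote returned as both 'MH' and 'Unknown_0'), two near-identical voiceprints under those labels would resolve to a single label, while a self cluster with a similar vector stays out of the map entirely.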