5 Commits

Author SHA1 Message Date
Keysat 5deffddb17 Fix transcript chunker context overflow; full-coverage extraction defaults 2026-06-15 22:28:12 -05:00
    chunk_text split only on "\n\n", but ASR transcripts contain none (speaker turns are joined by a single "\n"), so entire 2-3 h episodes (~250K chars) were sent to the extractor in a single call and failed with a 400 error on context overflow. Fall through paragraph -> line -> sentence -> word -> hard character slice, so no chunk exceeds the cap regardless of punctuation; guard against max_chars < 1.

    Default extraction to recall-first full coverage (chunk_chars 12K, max_chunks 999) and expose both as run-extract options --chunk-chars and --max-chunks.
Keysat cabb8a3d6c Handoff: mark Strike test stalled, document resume steps 2026-06-15 12:11:49 -05:00
Keysat 19375dcdfb Update Current state: Strike in extraction phase; audio fix landed 2026-06-15 11:13:09 -05:00
Keysat 5bd8758ab8 Add portability retrofit: AGENTS.md + CLAUDE.md symlink, scoring-brain guide, ROADMAP, .env.example 2026-06-15 11:10:59 -05:00
Keysat a6aec77506 Initial commit: Ten31 Signal Engine (ingest, scoring brain, corpus seeds) 2026-06-15 09:24:29 -05:00
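The chunker fix in 5deffddb17 describes a fall-through splitter: try the coarsest separator first and only drop to a finer one when a piece is still over the cap. Below is a minimal Python sketch of that idea, not the repository's code. The signature, the regex used for sentence boundaries, greedy packing of adjacent pieces, and raising ValueError for max_chars < 1 are all assumptions (the commit only says the value is guarded). The 12_000 default mirrors the chunk_chars 12K default from the same commit.

```python
import re

# Hypothetical levels, coarsest first: (splitter, joiner used when re-packing).
_LEVELS = [
    (lambda t: t.split("\n\n"), "\n\n"),                  # paragraph
    (lambda t: t.split("\n"), "\n"),                      # line / speaker turn
    (lambda t: re.split(r"(?<=[.!?])\s+", t), " "),       # sentence (assumed regex)
    (lambda t: t.split(), " "),                           # word
]


def chunk_text(text: str, max_chars: int = 12_000) -> list[str]:
    """Split text into chunks of at most max_chars characters (sketch)."""
    if max_chars < 1:
        # Assumption: reject the value rather than clamp it.
        raise ValueError("max_chars must be >= 1")
    return _split(text, max_chars, 0)


def _split(text: str, max_chars: int, level: int) -> list[str]:
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if level == len(_LEVELS):
        # Last resort: hard character slice, ignoring all structure.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    splitter, joiner = _LEVELS[level]
    chunks: list[str] = []
    current = ""
    for piece in (p for p in splitter(text) if p.strip()):
        if len(piece) > max_chars:
            # Piece alone is over the cap: flush what we have, then
            # recurse with the next finer separator.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(_split(piece, max_chars, level + 1))
            continue
        candidate = piece if not current else current + joiner + piece
        if len(candidate) <= max_chars:
            current = candidate  # greedily pack adjacent pieces
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Because a transcript with no "\n\n" comes back from the paragraph split as a single oversized piece, it falls through to the line level instead of being passed through whole, which is the failure the commit describes. Packing adjacent pieces greedily keeps chunk counts low, which matters when max_chunks bounds the number of extractor calls.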