ten31-signal-engine

grant/ten31-signal-engine

Fork 0

Commit Graph

Author	SHA1	Message	Date
Keysat	e8d50efdf4	Disable Gemini thinking budget in extraction backend gemini-2.5-flash thinks by default and spent ~3.8k of the 4k output budget on reasoning, hitting MAX_TOKENS with a truncated JSON body -> 0 claims parsed. Set thinking_budget=0 so the full budget goes to the answer (mirrors the local path's enable_thinking=False). On the validation chunk this went from 0 -> 11 claims.	2026-06-15 22:28:12 -05:00
Keysat	5deffddb17	Fix transcript chunker context overflow; full-coverage extraction defaults chunk_text split only on "\n\n", but ASR transcripts have none (speaker turns are joined by a single "\n"), so whole 2-3h episodes (~250K chars) went to the extractor in one call and 400'd on context overflow. Fall through paragraph -> line -> sentence -> word -> hard char-slice so no chunk exceeds the cap regardless of punctuation; guard max_chars < 1. Default extraction to recall-first full coverage (chunk_chars 12K, max_chunks 999) and expose both as run-extract --chunk-chars / --max-chunks.	2026-06-15 22:28:12 -05:00
Keysat	a6aec77506	Initial commit: Ten31 Signal Engine (ingest, scoring brain, corpus seeds)	2026-06-15 09:24:29 -05:00

Author

SHA1

Message

Date

Keysat

e8d50efdf4

Disable Gemini thinking budget in extraction backend

gemini-2.5-flash thinks by default and spent ~3.8k of the 4k output budget on reasoning, hitting MAX_TOKENS with a truncated JSON body -> 0 claims parsed. Set thinking_budget=0 so the full budget goes to the answer (mirrors the local path's enable_thinking=False). On the validation chunk this went from 0 -> 11 claims.

2026-06-15 22:28:12 -05:00

Keysat

5deffddb17

Fix transcript chunker context overflow; full-coverage extraction defaults

chunk_text split only on "\n\n", but ASR transcripts have none (speaker turns are joined by a single "\n"), so whole 2-3h episodes (~250K chars) went to the extractor in one call and 400'd on context overflow. Fall through paragraph -> line -> sentence -> word -> hard char-slice so no chunk exceeds the cap regardless of punctuation; guard max_chars < 1.

Default extraction to recall-first full coverage (chunk_chars 12K, max_chunks 999) and expose both as run-extract --chunk-chars / --max-chunks.

2026-06-15 22:28:12 -05:00

Keysat

a6aec77506

Initial commit: Ten31 Signal Engine (ingest, scoring brain, corpus seeds)

2026-06-15 09:24:29 -05:00

3 Commits