Handoff: record Gemini timeout lesson; Strike extraction near complete

Fold the hang lesson into the Gemini operational rule (disable thinking AND set a timeout) and refresh Current state for the in-progress Gemini extraction (~68 docs left, 52.7k claims) and the gating Strike test.
This commit is contained in:
Keysat
2026-06-16 08:45:12 -05:00
parent 87b6b05d67
commit 0b001b49d5
+20 -20
View File
@@ -94,10 +94,12 @@ ASR + diarizer, **bge-m3** embeddings, **Qdrant** on `:87`. The gateway is the o
`\n\n``\n`→sentence→word→hard-slice; splitting only on `\n\n` (the old bug) sent whole 23 h episodes
in ONE call → context-overflow 400s. Extraction defaults to full coverage at 12K chars/chunk
(`run-extract --chunk-chars/--max-chunks`); bigger chunks risk lost-in-the-middle recall loss.
- **Gemini extraction backend disables thinking.** `gemini-2.5-flash` thinks by default and burns the
output-token budget on reasoning → MAX_TOKENS → truncated JSON → 0 claims; the backend sets
`thinking_budget=0` (mirrors the local path's `enable_thinking=False`). Gemini = overflow for PUBLIC
data only; keep `EXTRACTION_BACKEND=local` in `.env`, flip it inline per-run when overflowing.
- **Gemini extraction backend: disable thinking AND set a timeout.** `gemini-2.5-flash` thinks by
default and burns the output-token budget on reasoning → MAX_TOKENS → truncated JSON → 0 claims, so
the backend sets `thinking_budget=0` (mirrors local `enable_thinking=False`). It also sets an HTTP
**timeout (120 s) + 4 retries** — a timeout-less call once hung the single-threaded worker ~50 min
(transient 504/read-timeouts then self-heal). Gemini = overflow for PUBLIC data only; keep
`EXTRACTION_BACKEND=local` in `.env`, flip it inline per-run when overflowing.
- **Scoring-brain internals are scoped to a guide.** Before editing `signal_engine/signals/`, read
**`docs/guides/scoring-brain.md`** — the classifier invariants (REALIZED-ONLY, ROLE-MATCH, claim_type
hard-evidence guard, max_tokens budget, claim_id bracket-strip), the EISC cluster-cap, and the
@@ -114,19 +116,17 @@ IP appears only as an env-var default.
## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)
- **Strike adversarial test: extraction running — this is the gating step.** Root-caused & FIXED the
long-form 400s (the old `\n\n`-only chunker sent whole 23 h episodes in one call → context overflow).
Now recall-first full coverage (12K chars). Draining the ~700-doc / ~5.7k-chunk extract backlog through
the **Gemini backend** (one-time PUBLIC overflow, `EXTRACTION_BACKEND=gemini` inline) to finish faster
(~67 h serial); validated live — dense (~7.5 claims/chunk), zero failures; 27 prior 400-failures
requeued. **NEXT when it finishes:** `embed-claims``two-sided --conviction STRIKE2022 --modes
live,test` (PASS = quiet in live, fires in test).
- **Battery adversarial test: PASSES** (unchanged — demand-net rises, supply stays flat at 0.0).
**§7.1 power-infra backtest:** qualified YES (corpus-gated; caveats in `DESIGN_v2.md`).
- **2 commits made, UNPUSHED** — push to `main` was blocked by the permission classifier (enforcing the
old no-push-to-main rule); awaiting approval (`git push origin main`). Commits: chunker fix +
recall-first defaults; Gemini thinking-budget fix.
- **Open decisions for Grant:** (a) push the 2 commits; (b) speed-up approach — recommended real-time
concurrency over the async Batch API (serial Gemini runs as the fallback meanwhile).
- Corpus spans bitcoin podcasts, SEC/FMP filings (+`banks` cluster), the Battery corpus, River research;
EISC edges seeded for the bitcoin cluster.
- **Strike adversarial test: extraction ~complete — this is the gating step.** Chunker 400 bug FIXED
(old `\n\n`-only split sent whole 23 h episodes in one call → context overflow); extraction is now
recall-first full coverage (12K chars). Drained the ~700-doc backlog via the **Gemini backend**
(one-time PUBLIC overflow), hardened with thinking-off + 120 s timeout/4 retries after a timeout-less
call hung the worker ~50 min. At session end **~68 docs left** (the SEC/FMP filings tail); **52.7k
claims** in the DB, 0 failures, 3 transient timeouts. **NEXT once extraction finishes:** `embed-claims`
`two-sided --conviction STRIKE2022 --modes live,test` (PASS = quiet in live, fires in test).
- **If extraction stopped early** (session ended): resume with `EXTRACTION_BACKEND=gemini run-extract
--limit 800` (drop the env var for the local Qwen path). Pending claims still need `embed-claims` after.
- **Battery test PASSES; §7.1 power-infra qualified YES** (both unchanged).
- **3 commits local + `backends.py` uncommitted (Gemini timeout/retry) — ALL UNPUSHED.** Push to `main`
is blocked by the permission classifier; run `git push origin main` (your updated rule allows main).
- Corpus: bitcoin podcasts, SEC/FMP filings (+`banks` cluster), Battery corpus, River research; EISC
edges seeded for the bitcoin cluster.