From 0b001b49d5f8dfd84f921d11ba742cc70970c195 Mon Sep 17 00:00:00 2001
From: Keysat
Date: Tue, 16 Jun 2026 08:45:12 -0500
Subject: [PATCH] Handoff: record Gemini timeout lesson; Strike extraction near complete

Fold the hang lesson into the Gemini operational rule (disable thinking AND set a
timeout) and refresh Current state for the in-progress Gemini extraction (~68 docs
left, 52.7k claims) and the gating Strike test.
---
 AGENTS.md | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index dbdd37b..b423064 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -94,10 +94,12 @@ ASR + diarizer, **bge-m3** embeddings, **Qdrant** on `:87`. The gateway is the o
   `\n\n`→`\n`→sentence→word→hard-slice; splitting only on `\n\n` (the old bug) sent whole 2–3 h
   episodes in ONE call → context-overflow 400s. Extraction defaults to full coverage at 12K chars/chunk
   (`run-extract --chunk-chars/--max-chunks`); bigger chunks risk lost-in-the-middle recall loss.
-- **Gemini extraction backend disables thinking.** `gemini-2.5-flash` thinks by default and burns the
-  output-token budget on reasoning → MAX_TOKENS → truncated JSON → 0 claims; the backend sets
-  `thinking_budget=0` (mirrors the local path's `enable_thinking=False`). Gemini = overflow for PUBLIC
-  data only; keep `EXTRACTION_BACKEND=local` in `.env`, flip it inline per-run when overflowing.
+- **Gemini extraction backend: disable thinking AND set a timeout.** `gemini-2.5-flash` thinks by
+  default and burns the output-token budget on reasoning → MAX_TOKENS → truncated JSON → 0 claims, so
+  the backend sets `thinking_budget=0` (mirrors local `enable_thinking=False`). It also sets an HTTP
+  **timeout (120 s) + 4 retries** — a timeout-less call once hung the single-threaded worker ~50 min
+  (transient 504/read-timeouts then self-heal). Gemini = overflow for PUBLIC data only; keep
+  `EXTRACTION_BACKEND=local` in `.env`, flip it inline per-run when overflowing.
 - **Scoring-brain internals are scoped to a guide.** Before editing `signal_engine/signals/`, read
   **`docs/guides/scoring-brain.md`** — the classifier invariants (REALIZED-ONLY, ROLE-MATCH, claim_type
   hard-evidence guard, max_tokens budget, claim_id bracket-strip), the EISC cluster-cap, and the
@@ -114,19 +116,17 @@ IP appears only as an env-var default.
 
 ## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)
 
-- **Strike adversarial test: extraction running — this is the gating step.** Root-caused & FIXED the
-  long-form 400s (the old `\n\n`-only chunker sent whole 2–3 h episodes in one call → context overflow).
-  Now recall-first full coverage (12K chars). Draining the ~700-doc / ~5.7k-chunk extract backlog through
-  the **Gemini backend** (one-time PUBLIC overflow, `EXTRACTION_BACKEND=gemini` inline) to finish faster
-  (~6–7 h serial); validated live — dense (~7.5 claims/chunk), zero failures; 27 prior 400-failures
-  requeued. **NEXT when it finishes:** `embed-claims` → `two-sided --conviction STRIKE2022 --modes
-  live,test` (PASS = quiet in live, fires in test).
-- **Battery adversarial test: PASSES** (unchanged — demand-net rises, supply stays flat at 0.0).
-  **§7.1 power-infra backtest:** qualified YES (corpus-gated; caveats in `DESIGN_v2.md`).
-- **2 commits made, UNPUSHED** — push to `main` was blocked by the permission classifier (enforcing the
-  old no-push-to-main rule); awaiting approval (`git push origin main`). Commits: chunker fix +
-  recall-first defaults; Gemini thinking-budget fix.
-- **Open decisions for Grant:** (a) push the 2 commits; (b) speed-up approach — recommended real-time
-  concurrency over the async Batch API (serial Gemini runs as the fallback meanwhile).
-- Corpus spans bitcoin podcasts, SEC/FMP filings (+`banks` cluster), the Battery corpus, River research;
-  EISC edges seeded for the bitcoin cluster.
+- **Strike adversarial test: extraction ~complete — this is the gating step.** Chunker 400 bug FIXED
+  (old `\n\n`-only split sent whole 2–3 h episodes in one call → context overflow); extraction is now
+  recall-first full coverage (12K chars). Drained the ~700-doc backlog via the **Gemini backend**
+  (one-time PUBLIC overflow), hardened with thinking-off + 120 s timeout/4 retries after a timeout-less
+  call hung the worker ~50 min. At session end **~68 docs left** (the SEC/FMP filings tail); **52.7k
+  claims** in the DB, 0 failures, 3 transient timeouts. **NEXT once extraction finishes:** `embed-claims`
+  → `two-sided --conviction STRIKE2022 --modes live,test` (PASS = quiet in live, fires in test).
+- **If extraction stopped early** (session ended): resume with `EXTRACTION_BACKEND=gemini run-extract
+  --limit 800` (drop the env var for the local Qwen path). Pending claims still need `embed-claims` after.
+- **Battery test PASSES; §7.1 power-infra qualified YES** (both unchanged).
+- **3 commits local + `backends.py` uncommitted (Gemini timeout/retry) — ALL UNPUSHED.** Push to `main`
+  is blocked by the permission classifier; run `git push origin main` (your updated rule allows main).
+- Corpus: bitcoin podcasts, SEC/FMP filings (+`banks` cluster), Battery corpus, River research; EISC
+  edges seeded for the bitcoin cluster.
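
Reviewer note: the `backends.py` change described above (120 s timeout + 4 retries) is not part of this patch, so the following is only a minimal sketch of the pattern under stated assumptions, not the actual implementation. The names `call_with_retries`, `send`, and `TransientError` are hypothetical, "4 retries" is read as one initial attempt plus up to four more, and the exponential backoff is an illustrative choice the handoff text does not specify.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. an HTTP 504)."""


def call_with_retries(
    send: Callable[[float], T],
    timeout_s: float = 120.0,   # per-call timeout quoted in the handoff
    retries: int = 4,           # assumed: retries AFTER the first attempt
    backoff_s: float = 1.0,     # assumed: base delay, doubled each retry
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(timeout_s), retrying transient failures with backoff.

    `send` is expected to pass the timeout to its HTTP client so a stalled
    request raises instead of blocking a single-threaded worker forever.
    """
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            return send(timeout_s)
        except (TransientError, TimeoutError):
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the failure to the caller
            sleep(backoff_s * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Simulated backend: two read timeouts, then a successful response.
    outcomes = iter([TimeoutError("read timed out"),
                     TimeoutError("read timed out"),
                     {"claims": 7}])

    def fake_send(timeout_s: float):
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    # The sleep is stubbed out so the demo finishes immediately.
    print(call_with_retries(fake_send, sleep=lambda s: None))
```

Injecting `send` and `sleep` keeps the retry policy testable without network access; in a real backend `send` would wrap the HTTP request and forward `timeout_s` to the client.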