Roadmap: capture daily activity digest email (SMTP, Job A-gated findings)

Tighten Current state; document two-sided net-value interpretation
Record Strike test conditional pass; track reflexivity follow-up
2026-06-16 15:52:36 -05:00 · 2026-06-16 12:53:14 -05:00 · 2026-06-16 12:39:13 -05:00 · 2026-06-16 08:45:12 -05:00 · 2026-06-16 08:45:12 -05:00 · 2026-06-15 22:47:20 -05:00
7 changed files with 130 additions and 43 deletions
@@ -89,6 +89,17 @@ ASR + diarizer, **bge-m3** embeddings, **Qdrant** on `:87`. The gateway is the o
  independent). Unconfirmed: Unchained, Debifi, Coinkite (held quarantined pending Grant's materiality call).
 - **Gemini quota is a rolling ~24h window** (~291 hour-long episodes / ~51M tokens), not a calendar-day
  reset. Bulk transcription overflows there; expect 429 RESOURCE_EXHAUSTED past the window.
+- **Transcript chunking is recall-first and MUST cap every chunk.** ASR transcripts have NO blank-line
+  paragraphs (speaker turns joined by a single `\n`), so `extract.claims.chunk_text` falls through
+  `\n\n`→`\n`→sentence→word→hard-slice; splitting only on `\n\n` (the old bug) sent whole 2–3 h episodes
+  in ONE call → context-overflow 400s. Extraction defaults to full coverage at 12K chars/chunk
+  (`run-extract --chunk-chars/--max-chunks`); bigger chunks risk lost-in-the-middle recall loss.
+- **Gemini extraction backend: disable thinking AND set a timeout.** `gemini-2.5-flash` thinks by
+  default and burns the output-token budget on reasoning → MAX_TOKENS → truncated JSON → 0 claims, so
+  the backend sets `thinking_budget=0` (mirrors local `enable_thinking=False`). It also sets an HTTP
+  **timeout (120 s) + 4 retries** — a timeout-less call once hung the single-threaded worker ~50 min
+  (transient 504/read-timeouts then self-heal). Gemini = overflow for PUBLIC data only; keep
+  `EXTRACTION_BACKEND=local` in `.env`, flip it inline per-run when overflowing.
 - **Scoring-brain internals are scoped to a guide.** Before editing `signal_engine/signals/`, read
  **`docs/guides/scoring-brain.md`** — the classifier invariants (REALIZED-ONLY, ROLE-MATCH, claim_type
  hard-evidence guard, max_tokens budget, claim_id bracket-strip), the EISC cluster-cap, and the
@@ -105,17 +116,19 @@ IP appears only as an env-var default.

 ## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)

- **Battery adversarial test: PASSES.** Corpus built (23 docs via the `docs` fetcher); after the three
-  scoring fixes the engine reads demand-net rising (+3.9) while **supply stays flat at 0.0** — correctly
-  rejecting Cantor's *announced* $2B and borrower-side collateral claims as not-realized-supply.
- **Strike adversarial test: STALLED — needs a manual resume; no result yet.** The independent leg (What
-  Bitcoin Did, Stephan Livera, Kevin Rooke, Anita Posch, Cafe Bitcoin, + River research — all
-  independent) is ~586/671 transcribed. The `run_strike_pipeline.sh` watcher proceeded on that partial
-  corpus, but its extraction worker **died (2026-06-11)** after only 17/635 podcast docs; the stale lease
-  is cleared, so **608 bitcoin-podcast extract jobs are pending**. Spark vLLM is healthy → **RESUME:**
-  `run-extract --limit 700 --max-chunks 4` → `embed-claims` → `two-sided --conviction STRIKE2022
-  --modes live,test` (PASS = quiet in live, fires in test). The Spark **audio fix** (semaphore-of-2 +
-  retry-backoff) is committed and validated (~2.5× faster, zero episode aborts).
- **§7.1 power-infra backtest:** qualified YES (corpus-gated; runway/precision caveats in `DESIGN_v2.md`).
- Corpus now spans bitcoin podcasts, SEC/FMP company filings (incl. 6 major banks + Robinhood, a new
-  `banks` cluster), the Battery text corpus, and River research. EISC edges seeded for the bitcoin cluster.
+- **Strike adversarial test: CONDITIONAL PASS (2026-06-16).** Pipeline complete end-to-end — extraction
+  drained the last 63 filing jobs (Gemini, 3,330 claims, 0 failures) → **56,008 claims** all embedded in
+  Qdrant → `two-sided --conviction STRIKE2022 --modes live,test`. The engine **refuses the false positive**:
+  the Lightning-retail nodes net `+0.25` (capped single bitcoin cluster, ≪ `EISC_FLOOR=2.0`), §3 guardrail
+  holding. The own_network-drop *reflexivity demo* is unexercised (`own_net=0`, live==test) because
+  RHR/CD/Bitcoin.Review (169 eps) were deferred at transcription 2026-06-08; operator accepted the
+  conditional pass, no audio-GPU spend now. How to read the net value + the full follow-up:
+  `docs/guides/scoring-brain.md` (STRIKE2022) and `ROADMAP.md`.
+- **Battery test PASSES; §7.1 power-infra qualified YES** (both unchanged).
+- **Corpus:** bitcoin podcasts (own_network: TFTC partial 19/80; RHR/CD/Bitcoin.Review deferred), SEC/FMP
+  filings (`banks` cluster now extracted + power-infra names Oklo/NuScale/Cipher/TeraWulf), Battery corpus,
+  River research; EISC edges seeded for the bitcoin cluster.
+- **Repo:** clean, in sync with `origin/main` (Gitea). No automated test suite (on ROADMAP).
+- **NEXT (priority order, all fresh scopes — confirm direction first):** (1) frontier-fan-out test H6, the
+  untested half of the §1.1 validation; (2) complete the Strike reflexivity demo when audio-GPU budget
+  allows (un-defer RHR/CD 2022–23 → re-extract → re-run); (3) Job A discovery scorers for the forward pilot.
@@ -18,6 +18,13 @@ falsification hypotheses (H1–H6) are in `DESIGN_v2.md`.
 - **MD&A targeting** for filings — extract Item 7, not front-matter boilerplate.

 ## Corpus & independence
+- **Complete the Strike reflexivity demonstration (deferred 2026-06-16).** The STRIKE2022 adversarial test
+  is a *conditional* pass: the engine refuses the false positive via the capped single-bitcoin-cluster
+  guard (`net=+0.25` ≪ `EISC_FLOOR=2.0`), but the own_network-drop mechanism (live quiet < test fires) is
+  unexercised because RHR (80) / CD (77) / Bitcoin.Review (12) were deferred at transcription on 2026-06-08
+  (`own_net=0`, live==test). To finish: un-defer + transcribe the **RHR/CD 2022–23 Lightning-retail window**
+  → `run-extract` → re-run `two-sided --conviction STRIKE2022 --modes live,test`; PASS = test fires while
+  live stays quiet. Costs constrained audio-GPU time, hence deferred.
 - **Confirm materiality** of the remaining `own_network`-flagged sources with Grant: Unchained, Debifi,
  Coinkite (Bitcoin.Review). Immaterial → flip to independent (the River/Swan precedent).
 - **BTC Sessions (Ben Perrin)** — strongest still-missing independent high-Strike merchant/wallet-adoption
@@ -33,6 +40,21 @@ falsification hypotheses (H1–H6) are in `DESIGN_v2.md`.
 - **Episode-pipelining** in `transcribe_worker` — download/chunk the next episode while transcribing the
  current one, to close the inter-episode GPU idle gap (the per-chunk 2-in-flight path is already done).
 - **Corpus-management UI** — add to the corpus over time and see the full corpus selection.
+- **Expose pipeline tunables in the UI (with the UI topic).** Extraction chunk size + per-doc chunk cap,
+  audio chunk length, audio concurrency, etc. are currently hardcoded defaults (now also CLI flags on
+  `run-extract`: `--chunk-chars`, `--max-chunks`). Surface them in the UI so they're visible/adjustable,
+  not black-box assumptions we forget about. Tie to the corpus-management UI work.
+- **Daily activity digest email.** A `daily-digest` CLI command rendering a "last 24h" report —
+  corpus throughput (`documents.ingested_at`/`processed_at`, `claims.extracted_at` by kind/cluster),
+  queue health (`backfill.queue.stats()` — surface **failed/stuck** jobs), Qdrant index lag, infra
+  (`spark-status`), and a **key-findings** section (new `ledger` rows by `date_logged`; `candidate_scores`
+  that `cleared_evidence_bar`). All timestamps already default to `datetime('now')`, so the window is a
+  one-liner; the activity half is buildable today. Deliver via **SMTP** (stdlib `smtplib`+`email`, no new
+  dep, configurable per service — `SMTP_HOST/PORT/USER/PASS`, `DIGEST_TO`); ship a `--stdout` dry-run
+  mode; schedule via launchd on the Mac. Two dependencies: (1) the findings section is only real once the
+  **Job A discovery scorers** run on a schedule — until then it's stubbed/echoes manual adversarial runs;
+  (2) sovereignty (guardrail #7) — SMTP through your own/ten31 server keeps it inside the boundary; do NOT
+  route through a third-party email API if findings ever carry Battery/Strike or positioning substance.
 - **Forward live operation** — the only real test: scoring un-pre-selected signals as they arrive, with
  the dual-evaluation ledger as arbiter.

@@ -70,6 +70,11 @@ Pre-registered failed convictions used to test the engine against its target fai
  stays quiet in `live` (own_network dropped) while it would fire in `test`** — the engine refusing the
  intra-cluster echo. Run `two-sided --conviction STRIKE2022 --modes live,test`. The REALIZED-ONLY rule
  is load-bearing here (speculative "Lightning will revolutionize payments" is `predictive`, not signal).
+  **Reading the output:** a single capped bitcoin cluster nets `eisc≈0.25` — already sub-bar vs
+  `EISC_FLOOR=2.0`, so a `+0.25` "quiet in live" can be the *cluster cap* refusing the false positive,
+  NOT the own_network drop. Check `own_net`: if it's 0, live==test and the reflexivity mechanism is
+  unexercised (the affirmers are independent), so a quiet `live` does not by itself prove the echo-drop —
+  you need own_network affirms present (`own_net>0`) for `test` to fire above `live`.

 **Standing rule S1:** derivatives resolve on OUTCOME (scaled substance), never milestones or enablers.
 An announced program / a regulatory unblock / a single bank's toe-in is CONTEXT, not corroboration.
@@ -254,7 +254,8 @@ def cmd_run_extract(args: argparse.Namespace) -> int:
    conn = db.connect(cfg.db_path)
    db.init_db(conn)
    sc = from_config(cfg)
-    result = run_extract(conn, sc, cfg, limit=args.limit, max_chunks_per_doc=args.max_chunks)
+    result = run_extract(conn, sc, cfg, limit=args.limit, max_chunks_per_doc=args.max_chunks,
+                         chunk_chars=args.chunk_chars)
    print(f"extraction: {result['jobs_processed']} jobs, {result['claims_written']} claims written")
    return 0

@@ -581,7 +582,10 @@ def build_parser() -> argparse.ArgumentParser:

    re = sub.add_parser("run-extract", help="Drain 'extract' jobs → claims via the local LLM (§4.2)")
    re.add_argument("--limit", type=int, default=5, help="max jobs to process this run")
-    re.add_argument("--max-chunks", type=int, default=4, help="max chunks per document")
+    re.add_argument("--max-chunks", type=int, default=999,
+                    help="max chunks per document (default: full coverage (999))")
+    re.add_argument("--chunk-chars", type=int, default=12_000,
+                    help="chars per extraction chunk; smaller = better recall, more LLM calls")
    re.set_defaults(func=cmd_run_extract)

    sub.add_parser("queue-status", help="Backfill queue counts by type/state").set_defaults(func=cmd_queue_status)
@@ -10,6 +10,7 @@ A backend exposes: complete_json(messages, max_tokens) -> str  (a JSON object st
 from __future__ import annotations

 import logging
+import time

 log = logging.getLogger(__name__)

@@ -32,27 +33,43 @@ class GeminiBackend:
    API is the eventual scale path; this synchronous form is the drop-in fallback."""
    name = "gemini"

-    def __init__(self, api_key: str, model: str = "gemini-2.5-flash") -> None:
+    def __init__(self, api_key: str, model: str = "gemini-2.5-flash", *,
+                 timeout_s: float = 120.0, retries: int = 4) -> None:
        from google import genai  # guarded import; pip install google-genai
+        from google.genai import types
        self._genai = genai
-        self.client = genai.Client(api_key=api_key)
+        self._types = types
+        # http_options.timeout is in MILLISECONDS — without it a stalled request hangs the (single-
+        # threaded) worker forever; one such hang froze a whole batch for ~50 min before this fix.
+        self.client = genai.Client(api_key=api_key,
+                                   http_options=types.HttpOptions(timeout=int(timeout_s * 1000)))
        self.model = model
+        self.retries = retries

    def complete_json(self, messages: list[dict], *, max_tokens: int = 4000) -> str:
-        from google.genai import types
+        types = self._types
        system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
        user = "\n\n".join(m["content"] for m in messages if m["role"] != "system")
-        resp = self.client.models.generate_content(
-            model=self.model,
-            contents=user,
-            config=types.GenerateContentConfig(
-                system_instruction=system or None,
-                temperature=0,
-                max_output_tokens=max_tokens,
-                response_mime_type="application/json",
-            ),
+        cfg = types.GenerateContentConfig(
+            system_instruction=system or None,
+            temperature=0,
+            max_output_tokens=max_tokens,
+            response_mime_type="application/json",
+            # Gemini 2.5 thinks by default and spends the output budget on reasoning tokens —
+            # it hit MAX_TOKENS with ~3.8k thoughts and a truncated JSON body (0 claims parsed).
+            # Extraction is deterministic, no-CoT (mirrors the local path's enable_thinking=False).
+            thinking_config=types.ThinkingConfig(thinking_budget=0),
        )
-        return resp.text or "{}"
+        for attempt in range(self.retries + 1):
+            try:
+                resp = self.client.models.generate_content(model=self.model, contents=user, config=cfg)
+                return resp.text or "{}"
+            except Exception as e:  # noqa: BLE001 — timeout/5xx/429/network: back off and retry
+                if attempt >= self.retries:
+                    raise
+                sleep = 2.0 * (2 ** attempt)
+                log.warning("Gemini call failed (%s); retry %d/%d in %.0fs", e, attempt + 1, self.retries, sleep)
+                time.sleep(sleep)


 def from_config(cfg, sc) -> "LocalQwenBackend | GeminiBackend":
@@ -30,25 +30,51 @@ def register_seed_topics(conn: sqlite3.Connection) -> None:
    conn.commit()


+# Coarse→fine split boundaries. Transcripts arrive as `Speaker: turn` lines joined by a SINGLE
+# newline (ASR output has no blank-line paragraphs), filings as paragraph text — so splitting on
+# "\n\n" alone never fires on a transcript and the whole episode would go in one call. "" is the
+# per-character hard cap that guarantees termination regardless of punctuation.
+_SEPARATORS = ["\n\n", "\n", ". ", " ", ""]
+
+
 def chunk_text(text: str, max_chars: int) -> list[str]:
-    """Split on paragraph boundaries into windows that fit the model context alongside the prompt."""
+    """Pack text into windows that each fit the model context alongside the prompt.
+
+    Falls through paragraph → line → sentence → word → hard char-slice, so NO chunk ever exceeds
+    max_chars however the source is punctuated, while keeping speaker turns intact when they fit.
+    """
+    if max_chars < 1:  # else _pack recurses past the last separator → IndexError
+        raise ValueError(f"max_chars must be >= 1, got {max_chars}")
    text = text.strip()
    if not text:
        return []
+    return _pack(text, max_chars, _SEPARATORS)
+
+
+def _pack(text: str, max_chars: int, seps: list[str]) -> list[str]:
+    """Recursively pack `text` on the coarsest separator in `seps` that keeps chunks within
+    max_chars, descending to a finer one only for a part that is itself still too big."""
    if len(text) <= max_chars:
        return [text]
-    chunks: list[str] = []
-    cur: list[str] = []
-    size = 0
-    for para in text.split("\n\n"):
-        if size + len(para) > max_chars and cur:
-            chunks.append("\n\n".join(cur))
-            cur, size = [], 0
-        cur.append(para)
-        size += len(para) + 2
+    sep, rest = seps[0], seps[1:]
+    parts = list(text) if sep == "" else text.split(sep)
+    out: list[str] = []
+    cur = ""
+    for p in parts:
+        candidate = p if not cur else cur + sep + p
+        if len(candidate) <= max_chars:
+            cur = candidate
+            continue
+        if cur:
+            out.append(cur)
+        if len(p) <= max_chars:
+            cur = p
+        else:  # a single part still too big → split it on the next-finer boundary
+            out.extend(_pack(p, max_chars, rest))
+            cur = ""
    if cur:
-        chunks.append("\n\n".join(cur))
-    return chunks
+        out.append(cur)
+    return out


 def _parse_claims(content: str) -> list[dict]:
@@ -28,8 +28,8 @@ def _document_text(doc, *, user_agent: str) -> str:
    raise ValueError(f"no text source for {doc['doc_id']} (kind={doc['kind']}, url={doc['url']})")


-def run_extract(conn, sc, cfg, *, limit: int = 10, max_chunks_per_doc: int = 4,
-                chunk_chars: int = 18_000, lease_seconds: int = 900,
+def run_extract(conn, sc, cfg, *, limit: int = 10, max_chunks_per_doc: int = 999,
+                chunk_chars: int = 12_000, lease_seconds: int = 900,
                worker_id: str = "extract-1") -> dict:
    from .backends import from_config as backend_from_config
    backend = backend_from_config(cfg, sc)
Author	SHA1	Message	Date
Keysat	6d2c3fbf08	Roadmap: capture daily activity digest email (SMTP, Job A-gated findings)	2026-06-16 15:52:36 -05:00
Keysat	29bca8b387	Tighten Current state; document two-sided net-value interpretation	2026-06-16 12:53:14 -05:00
Keysat	0cba2626d3	Record Strike test conditional pass; track reflexivity follow-up Full STRIKE2022 pipeline ran clean: 63 filing extracts (3,330 claims, 56,008 total), embed-claims indexed all into Qdrant, two-sided live/test. Engine refuses the false positive (net=+0.25, capped single bitcoin cluster, far below the 2.0 firing bar) but the own_network-drop reflexivity demo is unexercised (own_net=0, live==test) because RHR/CD/Bitcoin.Review were deferred at transcription 2026-06-08. Accepted as a conditional pass; un-defer follow-up tracked in ROADMAP.	2026-06-16 12:39:13 -05:00
Keysat	0b001b49d5	Handoff: record Gemini timeout lesson; Strike extraction near complete Fold the hang lesson into the Gemini operational rule (disable thinking AND set a timeout) and refresh Current state for the in-progress Gemini extraction (~68 docs left, 52.7k claims) and the gating Strike test.	2026-06-16 08:45:12 -05:00
Keysat	87b6b05d67	Add request timeout and retry to Gemini extraction backend A timeout-less generate_content call hung the single-threaded extract worker for ~50 min mid-batch. Set an HTTP timeout (120s) plus 4 retries with backoff, mirroring SparkControl._post; transient 504/read-timeouts now self-heal instead of freezing the run.	2026-06-16 08:45:12 -05:00
Keysat	6f4698a98c	Handoff: durable chunker/Gemini rules; Strike extraction in progress Record two recurrence-prone gotchas in Key operational rules (transcript chunking must cap every chunk; Gemini backend must disable thinking) and rewrite Current state for the in-progress Gemini extraction batch and the gating Strike test.	2026-06-15 22:47:20 -05:00
Keysat	e8d50efdf4	Disable Gemini thinking budget in extraction backend gemini-2.5-flash thinks by default and spent ~3.8k of the 4k output budget on reasoning, hitting MAX_TOKENS with a truncated JSON body -> 0 claims parsed. Set thinking_budget=0 so the full budget goes to the answer (mirrors the local path's enable_thinking=False). On the validation chunk this went from 0 -> 11 claims.	2026-06-15 22:28:12 -05:00
Keysat	5deffddb17	Fix transcript chunker context overflow; full-coverage extraction defaults chunk_text split only on "\n\n", but ASR transcripts have none (speaker turns are joined by a single "\n"), so whole 2-3h episodes (~250K chars) went to the extractor in one call and 400'd on context overflow. Fall through paragraph -> line -> sentence -> word -> hard char-slice so no chunk exceeds the cap regardless of punctuation; guard max_chars < 1. Default extraction to recall-first full coverage (chunk_chars 12K, max_chunks 999) and expose both as run-extract --chunk-chars / --max-chunks.	2026-06-15 22:28:12 -05:00