Record Strike test conditional pass; track reflexivity follow-up

Full STRIKE2022 pipeline ran clean: 63 filing extracts (3,330 claims,
56,008 total), embed-claims indexed all into Qdrant, two-sided live/test.
Engine refuses the false positive (net=+0.25, capped single bitcoin
cluster, far below the 2.0 firing bar) but the own_network-drop
reflexivity demo is unexercised (own_net=0, live==test) because
RHR/CD/Bitcoin.Review were deferred at transcription 2026-06-08.
Accepted as a conditional pass; un-defer follow-up tracked in ROADMAP.
Keysat
2026-06-16 12:39:13 -05:00
parent 0b001b49d5
commit 0cba2626d3
2 changed files with 28 additions and 13 deletions
+21 -13
@@ -116,17 +116,25 @@ IP appears only as an env-var default.
## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)
- **Strike adversarial test: extraction ~complete — this is the gating step.** Chunker 400 bug FIXED
(old `\n\n`-only split sent whole 2–3 h episodes in one call → context overflow); extraction is now
recall-first full coverage (12K chars). Drained the ~700-doc backlog via the **Gemini backend**
(one-time PUBLIC overflow), hardened with thinking-off + 120 s timeout/4 retries after a timeout-less
call hung the worker ~50 min. At session end **~68 docs left** (the SEC/FMP filings tail); **52.7k
claims** in the DB, 0 failures, 3 transient timeouts. **NEXT once extraction finishes:** `embed-claims`
`two-sided --conviction STRIKE2022 --modes live,test` (PASS = quiet in live, fires in test).
- **If extraction stopped early** (session ended): resume with `EXTRACTION_BACKEND=gemini run-extract
--limit 800` (drop the env var for the local Qwen path). Pending claims still need `embed-claims` after.
- **Strike adversarial test: CONDITIONAL PASS (2026-06-16).** Full pipeline ran: extraction drained the
final 63 filing jobs via the Gemini backend (3,330 claims, **0 failures/timeouts/429s**) → **56,008
claims** total → `embed-claims` indexed all 56,008 into Qdrant (points == claims, hybrid dense+BM25) →
`two-sided --conviction STRIKE2022 --modes live,test`. **Result:** the engine correctly **refuses the
false positive** — the two scoring nodes (`lightning-retail-acceptance`, `merchant-lightning-integration`)
sit at `net=+0.25`, the *capped single-bitcoin-cluster* value, far below the `EISC_FLOOR=2.0` firing bar
(`signals/bar.py`); the §3 "Bitcoin is one capped cluster" guardrail holds. **But the reflexivity
*demonstration* (live < test via own_network drop) is NOT exercised:** `own_net=0` and live==test because
the own_network shows that carry the reflexive Lightning chatter — **RHR (80), CD (77), Bitcoin.Review (12)
= 169 eps — were deferred at transcription 2026-06-08** ("focus on WBD/Livera/Rooke/Anita"; no audio
downloaded), so they have 0 claims. (TFTC partially transcribed, 19/80 → 329 claims; the current +0.25
comes from *independent* bitcoin-cluster shows.) **Operator call 2026-06-16: accept the conditional pass,
no audio-GPU spend now.** To fully demonstrate reflexivity later: un-defer + transcribe the RHR/CD
2022–23 Lightning-retail window → re-extract → re-run `two-sided` (then test mode should fire while live
stays quiet). Tracked in `ROADMAP.md`.
- **Battery test PASSES; §7.1 power-infra qualified YES** (both unchanged).
- **3 commits local + `backends.py` uncommitted (Gemini timeout/retry) — ALL UNPUSHED.** Push to `main`
is blocked by the permission classifier; run `git push origin main` (your updated rule allows main).
- Corpus: bitcoin podcasts, SEC/FMP filings (+`banks` cluster), Battery corpus, River research; EISC
edges seeded for the bitcoin cluster.
- **5 commits ahead of `origin/main`, UNPUSHED** (Gitea `immense-voyage.local`). The `backends.py`
timeout/retry change the prior handoff flagged as uncommitted was in fact committed (`87b6b05`); tree is
otherwise clean apart from this Current-state edit. Run `git push origin main`.
- Corpus: bitcoin podcasts (own_network: TFTC partial; RHR/CD/Bitcoin.Review deferred), SEC/FMP filings
(+`banks` cluster, now extracted: Robinhood 2216, Morgan Stanley 644, JPMorgan 382… + power-infra names
Oklo/NuScale/Cipher/TeraWulf), Battery corpus, River research; EISC edges seeded for the bitcoin cluster.
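
The pass criteria recorded above (PASS = quiet in live, fires in test; conditional when the own_network drop is never exercised) can be sketched roughly as follows. This is only an illustration, not the code in `signals/bar.py`: the `fires` and `classify_two_sided` helpers are hypothetical, the `>=` comparison against the bar is an assumption, and only `EISC_FLOOR=2.0` and the observed values (`net=+0.25`, `own_net=0`) come from the notes.

```python
EISC_FLOOR = 2.0  # firing bar quoted in the notes


def fires(net: float, floor: float = EISC_FLOOR) -> bool:
    """Assumed rule: a node fires once its net score reaches the bar."""
    return net >= floor


def classify_two_sided(live_net: float, test_net: float, own_net: float,
                       floor: float = EISC_FLOOR) -> str:
    """Hypothetical summary of a two-sided (live vs. test) run."""
    if fires(live_net, floor):
        # Live mode must stay quiet; firing here is the false positive.
        return "FAIL"
    if fires(test_net, floor):
        # Quiet in live, fires in test: the reflexivity drop is demonstrated.
        return "PASS"
    if own_net == 0 and live_net == test_net:
        # False positive refused, but no own_network evidence was present,
        # so the live < test mechanism was never exercised.
        return "CONDITIONAL PASS"
    return "INCONCLUSIVE"


# Values recorded for the STRIKE2022 run on 2026-06-16.
print(classify_two_sided(live_net=0.25, test_net=0.25, own_net=0.0))
```

With the recorded values this yields the conditional pass; once the deferred shows carry claims, the same check would be expected to return PASS only if test mode crosses the bar while live mode stays below it.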
+7
@@ -18,6 +18,13 @@ falsification hypotheses (H1–H6) are in `DESIGN_v2.md`.
- **MD&A targeting** for filings — extract Item 7, not front-matter boilerplate.
## Corpus & independence
- **Complete the Strike reflexivity demonstration (deferred 2026-06-16).** The STRIKE2022 adversarial test
is a *conditional* pass: the engine refuses the false positive via the capped single-bitcoin-cluster
guard (`net=+0.25` ≪ `EISC_FLOOR=2.0`), but the own_network-drop mechanism (live quiet < test fires) is
unexercised because RHR (80) / CD (77) / Bitcoin.Review (12) were deferred at transcription on 2026-06-08
(`own_net=0`, live==test). To finish: un-defer + transcribe the **RHR/CD 2022–23 Lightning-retail window**
→ `run-extract` → re-run `two-sided --conviction STRIKE2022 --modes live,test`; PASS = test fires while
live stays quiet. Costs constrained audio-GPU time, hence deferred.
- **Confirm materiality** of the remaining `own_network`-flagged sources with Grant: Unchained, Debifi,
Coinkite (Bitcoin.Review). Immaterial → flip to independent (the River/Swan precedent).
- **BTC Sessions (Ben Perrin)** — strongest still-missing independent high-Strike merchant/wallet-adoption