Keysat 0cba2626d3 Record Strike test conditional pass; track reflexivity follow-up
Full STRIKE2022 pipeline ran clean: 63 filing extracts (3,330 claims,
56,008 total), embed-claims indexed all into Qdrant, two-sided live/test.
Engine refuses the false positive (net=+0.25, capped single bitcoin
cluster, far below the 2.0 firing bar) but the own_network-drop
reflexivity demo is unexercised (own_net=0, live==test) because
RHR/CD/Bitcoin.Review were deferred at transcription 2026-06-08.
Accepted as a conditional pass; un-defer follow-up tracked in ROADMAP.
2026-06-16 12:39:13 -05:00

# Ten31 Signal Engine — ROADMAP
Longer-term backlog (the near-term snapshot lives in `AGENTS.md` → Current state). Rationale and the
falsification hypotheses (H1-H6) are in `DESIGN_v2.md`.
## Scoring brain — the real validation
- **Frontier-fan-out test (H6) — the untested half, = the actual §1.1 miss.** Seed a 2023 conviction,
give the model 2023-ONLY context, let it PROPOSE the derivatives, then score that tree's precision/
recall against what actually repriced. The §7.1 backtest hand-wrote the tree (hindsight leakage); this
is the part that matters and isn't tested yet.
- **Estimator rework (H4).** Replace the fragile 2nd-derivative *acceleration* with a persistence /
level-crossing test on the corroboration arrival rate, with per-source-type window cadence.
- **Build the real resolver.** `signals/resolver.py` is a stub. Settle the lead-time-vs-actual-repricing
debate empirically against structured outcomes (price, FERC interconnection queue, PPAs, capex, policy).
- **Extend claim-type weighting to the §7.1 power-infra tree** (it currently only gates the bitcoin
adversarial cases; descriptive-deployed > predictive-intent should apply everywhere).
- **Job A scorers** (emergence / stance / intersection) for the forward Discovery pilot.
- **MD&A targeting** for filings — extract Item 7, not front-matter boilerplate.
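
The H4 estimator rework above could be sketched roughly as follows. This is a minimal illustration, not the engine's code: `window_counts`, `persistent_crossing`, and the `level` / `persistence` / `window_days` parameters are hypothetical names. It counts corroboration arrivals per fixed window and fires only when the count holds at or above a level for several consecutive windows, instead of estimating a second-difference acceleration.

```python
from datetime import date


def window_counts(arrivals, start, window_days, n_windows):
    """Count arrivals in n_windows consecutive windows of window_days each."""
    counts = [0] * n_windows
    for d in arrivals:
        idx = (d - start).days // window_days
        if 0 <= idx < n_windows:  # ignore arrivals outside the observed span
            counts[idx] += 1
    return counts


def persistent_crossing(counts, level, persistence):
    """Return the index of the window that completes the first run of
    `persistence` consecutive windows with count >= level, else None."""
    run = 0
    for i, c in enumerate(counts):
        run = run + 1 if c >= level else 0
        if run >= persistence:
            return i
    return None


arrivals = [
    date(2023, 1, 3), date(2023, 1, 20),
    date(2023, 2, 2), date(2023, 2, 10), date(2023, 2, 25),
    date(2023, 3, 5), date(2023, 3, 9),
]
counts = window_counts(arrivals, date(2023, 1, 1), window_days=30, n_windows=3)
print(counts, persistent_crossing(counts, level=2, persistence=2))
```

Per-source-type cadence would then amount to choosing `window_days` (and possibly `level`) per source type, for example a longer window for quarterly filings than for weekly podcasts; those values are left open here.
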
## Corpus & independence
- **Complete the Strike reflexivity demonstration (deferred 2026-06-16).** The STRIKE2022 adversarial test
is a *conditional* pass: the engine refuses the false positive via the capped single-bitcoin-cluster
guard (`net=+0.25` ≪ `EISC_FLOOR=2.0`), but the own_network-drop mechanism (live quiet < test fires) is
unexercised because RHR (80) / CD (77) / Bitcoin.Review (12) were deferred at transcription on 2026-06-08
(`own_net=0`, live==test). To finish: un-defer + transcribe the **RHR/CD 2022-23 Lightning-retail window** →
`run-extract` → re-run `two-sided --conviction STRIKE2022 --modes live,test`; PASS = test fires while
live stays quiet. Costs constrained audio-GPU time, hence deferred.
- **Confirm materiality** of the remaining `own_network`-flagged sources with Grant: Unchained, Debifi,
Coinkite (Bitcoin.Review). Immaterial → flip to independent (the River/Swan precedent).
- **BTC Sessions (Ben Perrin)** — strongest still-missing independent high-Strike merchant/wallet-adoption
show; resolve feed + ingest (a task chip exists for it).
- **River image-PDF reports** — the 2022 Lightning report + 2025/2026 adoption reports have no text layer;
add an OCR / page-rasterization+vision path to ingest them.
- Broad, **lineage-aware** corpus expansion toward independent vantage points (not more correlated
sell-side / trade-press voices).
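
The Strike pass criterion described above can be written down as a small check. This is a sketch under assumptions: only `EISC_FLOOR=2.0`, `net`, and `own_net` come from this roadmap; the function names, the verdict labels, and the reading of "fires" as `net >= EISC_FLOOR` are hypothetical.

```python
EISC_FLOOR = 2.0  # firing bar quoted in this roadmap


def fires(net, floor=EISC_FLOOR):
    # Assumption: a mode "fires" when its net score reaches the floor.
    return net >= floor


def strike_verdict(live_net, test_net, own_net):
    """Classify one two-sided STRIKE2022 run (labels are illustrative)."""
    if fires(live_net):
        return "fail"          # live mode accepted the false positive
    if fires(test_net):
        return "pass"          # test fires while live stays quiet
    if own_net == 0:
        return "conditional"   # refused, but there was nothing for the drop to remove
    return "inconclusive"      # own_network claims present, yet test did not fire


# The recorded run: net=+0.25 in both modes, own_net=0.
print(strike_verdict(live_net=0.25, test_net=0.25, own_net=0))
```

With the recorded values the check lands on the conditional branch, which matches the status above: the guard held, but the reflexivity path was never exercised.
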
## Infrastructure & ops
- **Add an automated test suite** (none today) — start with the scoring primitives (EISC, two_sided,
as-of harness) and the queue.
- **Episode-pipelining** in `transcribe_worker` — download/chunk the next episode while transcribing the
current one, to close the inter-episode GPU idle gap (the per-chunk 2-in-flight path is already done).
- **Corpus-management UI** — add to the corpus over time and see the full corpus selection.
- **Expose pipeline tunables in the UI (with the UI topic).** Extraction chunk size + per-doc chunk cap,
audio chunk length, audio concurrency, etc. are currently hardcoded defaults (now also CLI flags on
`run-extract`: `--chunk-chars`, `--max-chunks`). Surface them in the UI so they're visible/adjustable,
not black-box assumptions we forget about. Tie to the corpus-management UI work.
- **Forward live operation** — the only real test: scoring un-pre-selected signals as they arrive, with
the dual-evaluation ledger as arbiter.
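
The episode-pipelining item above could take roughly this shape. It is a sketch only: `run_pipelined`, `prepare` (download + chunk), and `transcribe` are hypothetical stand-ins, not the actual `transcribe_worker` interface. A single background thread prepares episode i+1 while episode i is being transcribed, so the GPU step does not wait on the download.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelined(episodes, prepare, transcribe):
    """Transcribe episodes in order, preparing the next episode in a
    background thread while the current one is being transcribed."""
    results = []
    if not episodes:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(prepare, episodes[0])
        for nxt in list(episodes[1:]) + [None]:
            chunks = pending.result()  # block until the current episode is ready
            if nxt is not None:
                # Start preparing the next episode before the slow step below.
                pending = pool.submit(prepare, nxt)
            results.append(transcribe(chunks))
    return results


# Toy stand-ins so the sketch runs without audio or a GPU.
out = run_pipelined(
    ["ep1", "ep2"],
    prepare=lambda ep: [ep + "-a", ep + "-b"],
    transcribe=lambda chunks: " ".join(chunks),
)
print(out)
```

Limiting the prefetch to one episode keeps disk and memory use bounded; a deeper queue would only help if preparation were slower than transcription.
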
## Packaging / deploy
- **Start9 `s9pk` packaging.** Build with `make x86` then `make install` → `immense-voyage.local`.
Bump the package version in the manifest BEFORE building (Start9 0.4.x won't recognize an un-bumped
rebuild). See the placement standard for infra conventions.