---
paths:
- signal_engine/signals/**
- signal_engine/extract/prompt.py
---

# Scoring brain — operating guide

The "don't get it wrong" manual for `signal_engine/signals/`. The full decision/falsification log and
hypotheses H1–H6 live in `DESIGN_v2.md`; the spine guardrails are in `AGENTS.md`. This file is the
subsystem detail an agent editing the scorers needs.

## The instruments

- `independence.py` — **EISC** (Effective Independent Source Count): a noisy-OR connectedness matrix
(source edges + voiceprints + cluster coupling) with inverse-row-sum. Returns `eisc_adj` (= `eisc_raw`
× `xcluster_mult`), `eisc_raw`, `k_eff`, `xcluster_mult`, `per_source_contrib`. `mode='live'` DROPS
`own_network` sources; `mode='test'` keeps them.
- `two_sided.py` — **net corroboration** = independence-weighted **affirms − denies** over time
(`classify_corpus` → `net_at` → `trajectory`). The instrument for the adversarial cases (NOT runway).
- `asof.py` — look-ahead guard (only claims dated ≤ as-of are visible). `windows.py` — windowed
acceleration / window bounds (match window to cadence: ~90d quarterly filings, wider for podcasts).
- `bar.py` / `under_acted.py` — the two-tier gate (evidence→ledger vs promotion→judge) and Job B scorer.
- `llm_helpers.derivative_relevance` — the bounded LLM **classifier** over PRE-FILTERED candidates
(search hits), never a nominator. `_REL_SYS` is its system prompt.
- `run.py` — orchestration / `run_backtest`. `resolver.py` — outcome resolver (currently a stub).

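To make the EISC idea concrete, here is a minimal sketch of a noisy-OR connectedness matrix with inverse-row-sum weighting. It is an illustration under assumptions, not the code in `independence.py`: the function names, the `couplings` mapping, and the choice of self-connectedness = 1 are hypothetical, and cluster caps, voiceprints, and `xcluster_mult` are left out.

```python
def noisy_or(kappas):
    """Combine coupling strengths in [0, 1]: connected unless every link 'fails'."""
    p_unconnected = 1.0
    for k in kappas:
        p_unconnected *= (1.0 - k)
    return 1.0 - p_unconnected


def eisc_raw_sketch(sources, couplings):
    """Return (total, per_source_contrib) for a list of source ids.

    couplings (hypothetical shape) maps frozenset({a, b}) to the list of
    kappa values linking a and b; missing pairs are treated as independent.
    """
    contrib = {}
    for s in sources:
        row_sum = 1.0  # assumed: each source is fully connected to itself
        for t in sources:
            if t != s:
                row_sum += noisy_or(couplings.get(frozenset((s, t)), []))
        contrib[s] = 1.0 / row_sum  # inverse row sum: shared voices split credit
    return sum(contrib.values()), contrib


# Three unconnected sources count as three voices.
independent_total, _ = eisc_raw_sketch(["a", "b", "c"], {})

# Two sources joined by one strongly coupled edge (0.85) count as barely more than one.
coupled_total, _ = eisc_raw_sketch(["a", "b"], {frozenset(("a", "b")): [0.85]})
```

Under these assumptions the coupled pair totals 2 / 1.85, which matches the intuition that one guest doing the rounds collapses toward a single voice.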
## Classifier invariants that MUST NOT regress

These five are what make the Battery adversarial case pass; each was a real bug found by running it.

1. **`max_tokens` sized to the batch.** A fixed 3000 truncated the JSON mid-array on ~60-claim batches →
empty parse → a whole node silently scored 0. `derivative_relevance` now sets
`budget = max(3000, 120*len(claims)+500)`.
2. **Strip `[]` from echoed claim_ids.** The listing presents ids as `- [{id}] ...`; the model
*inconsistently* echoes them back as `[id]`, which misses the bracket-less lookup → all `(missing)`.
Normalize with `str(id).strip().strip("[]").strip()`.
3. **REALIZED-ONLY** (`_REL_SYS`): announcements / plans / intent / "may·will·expects·poised·up-to" are
NOT corroboration — only deployed/closed FACTS affirm. ("$2B announced" ≠ capital deployed.)
4. **ROLE-MATCH** (`_REL_SYS`): the actor must occupy the role the hypothesis is about. For a
capital-*provider* hypothesis, a *borrower* posting collateral is the wrong side → tangential, not affirms.
5. **Hard-evidence guard** (`net_at`, `require_hard_evidence=True`): a source only counts on a side if it
carries a **descriptive/reactive** (realized-fact) claim there; `predictive`/`interpretive`
(forecast/opinion/intent) alone don't qualify it. Reports `hard_affirm_src` / `soft_affirm_src_dropped`.

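Invariants 1, 2, and 5 are mechanical enough to sketch. The two expressions for the budget and the id normalization are taken from the text above; everything else (function names, the `(source, claim_type)` tuple shape, the `HARD_TYPES` constant) is a hypothetical illustration and not the actual `derivative_relevance` or `net_at` code.

```python
def token_budget(n_claims: int) -> int:
    """Invariant 1: scale max_tokens with the batch so the JSON array is not truncated."""
    return max(3000, 120 * n_claims + 500)


def normalize_claim_id(raw) -> str:
    """Invariant 2: accept both 'c42' and the echoed '[c42]' form."""
    return str(raw).strip().strip("[]").strip()


# Assumed labels for realized-fact claim types (invariant 5).
HARD_TYPES = {"descriptive", "reactive"}


def split_hard_soft(claims):
    """Invariant 5: a source counts on a side only if it has a hard claim there.

    claims is an iterable of (source, claim_type) pairs for ONE side.
    Returns (hard_sources, soft_sources_dropped), both sorted.
    """
    types_by_source = {}
    for source, claim_type in claims:
        types_by_source.setdefault(source, set()).add(claim_type)
    hard = sorted(s for s, t in types_by_source.items() if t & HARD_TYPES)
    soft = sorted(s for s, t in types_by_source.items() if not (t & HARD_TYPES))
    return hard, soft


affirms = [
    ("bank_filing", "descriptive"),   # realized fact: qualifies the source
    ("podcast_x", "predictive"),      # forecast only: source is dropped
    ("podcast_y", "interpretive"),
    ("podcast_y", "reactive"),        # one hard claim is enough to qualify
]
hard_src, soft_dropped = split_hard_soft(affirms)
```

A 60-claim batch gets `token_budget(60) == 7700`, comfortably above the fixed 3000 that caused the truncation.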
## EISC / independence rules

- **Bitcoin is one CAPPED cluster** (`cluster_capped_low`, `CAP_VALUE`): within-cluster agreement can
contribute at most ~0.25 of a voice — it can NOT masquerade as independent corroboration. Real
corroboration of a bitcoin thesis must come from OUTSIDE the cluster (e.g. the `banks` cluster).
- **Cross-cluster earns the multiplier** (`xcluster_mult`, gated by `k_eff` = clusters contributing ≥0.5
of a voice). One guest doing the rounds collapses to ~1; it does not earn the gold multiplier.
- **`own_network` quarantine is MATERIALITY-driven** (see AGENTS.md): live mode drops materially-tied
Ten31 sources; test mode keeps them. Validated: own_network-only affirms → live `eisc≈0`, test > 0.
- **Seed edges in `sorted([a,b])` order** (matches `transcribe_worker`'s `sorted()`+`weight+=1` upsert) so
auto-detected and seeded edges share a PK — a reversed-order row DOUBLE-COUNTS (math is
frozenset-undirected but the table PK is ordered). κ: shared_guest 0.85, citation 0.45, community 0.60.

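The edge-ordering rule is easy to get wrong, so here is a small sketch of why canonical ordering matters. It uses a plain dict as a stand-in for the edge table; the `(lo, hi)` key shape and both function names are assumptions for illustration and do not reflect the real schema or `transcribe_worker` code.

```python
def upsert_edge(table, a, b):
    """Upsert with the pair in sorted order, so (a, b) and (b, a) hit one row."""
    lo, hi = sorted([a, b])
    key = (lo, hi)
    table[key] = table.get(key, 0) + 1  # weight += 1 on an existing row
    return key


def upsert_edge_unsorted(table, a, b):
    """Anti-pattern: keying on caller order lets a reversed pair create a second row."""
    key = (a, b)
    table[key] = table.get(key, 0) + 1
    return key


good = {}
upsert_edge(good, "pod_b", "pod_a")   # seeded in reversed order
upsert_edge(good, "pod_a", "pod_b")   # auto-detected later

bad = {}
upsert_edge_unsorted(bad, "pod_b", "pod_a")
upsert_edge_unsorted(bad, "pod_a", "pod_b")
```

With sorted keys both writes land on the single row `("pod_a", "pod_b")`; with unsorted keys the table holds two rows for one undirected relationship, which is the double-count the rule guards against.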
## The adversarial cases (the validation harness)

Pre-registered failed convictions used to test the engine against its target failure mode. Seeds:
`seeds/conviction_log.adversarial.seed.yaml`, `seeds/fanout.{STRIKE,BATTERY}2022.seed.yaml`,
`seeds/resolution.{STRIKE,BATTERY}2022.yaml`, `seeds/resolution_outcomes.adversarial.yaml`.

- **BATTERY2022 (timing/disconfirmation).** BTC-collateralized lending: demand rose, institutional
*supply* failed. **PASS = demand-net rises while supply-net stays flat (≈0).** Run:
`two-sided --conviction BATTERY2022 --nodes demand,supply --modes live`. Supply resolves ONLY on
committed/DEPLOYED capital; policy/regulation is CONTEXT (the custody-policy node), never supply (S1).
- **STRIKE2022 (reflexivity/false-positive).** Lightning-retail-payments thesis FAILED. **PASS = net
stays quiet in `live` (own_network dropped) while it would fire in `test`** — the engine refusing the
intra-cluster echo. Run `two-sided --conviction STRIKE2022 --modes live,test`. The REALIZED-ONLY rule
is load-bearing here (speculative "Lightning will revolutionize payments" is `predictive`, not signal).
**Reading the output:** a single capped bitcoin cluster nets `eisc≈0.25` — already sub-bar vs
`EISC_FLOOR=2.0`, so a `+0.25` "quiet in live" can be the *cluster cap* refusing the false positive,
NOT the own_network drop. Check `own_net`: if it's 0, live==test and the reflexivity mechanism is
unexercised (the affirmers are independent), so a quiet `live` does not by itself prove the echo-drop —
you need own_network affirms present (`own_net>0`) for `test` to fire above `live`.

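The STRIKE2022 reading rules above can be written down as a small decision helper. This is a sketch under assumptions: `read_strike_output` and its return labels are hypothetical, `own_net` is taken to be a count of own_network affirmers, and "fires" is assumed to mean reaching `EISC_FLOOR`. The real harness output may be shaped differently.

```python
EISC_FLOOR = 2.0  # value quoted in this guide


def read_strike_output(live_net: float, test_net: float, own_net: int) -> str:
    """Classify a live/test pair for the reflexivity case."""
    if own_net == 0:
        # No own_network affirms: live == test, so the drop was never exercised.
        return "unexercised"
    if live_net >= EISC_FLOOR:
        # Live fires on a failed thesis: the false positive was not refused.
        return "live-fires"
    if test_net >= EISC_FLOOR:
        # Quiet in live, fires in test: the own_network drop did the work.
        return "echo-drop-demonstrated"
    # Quiet in both modes: something else (e.g. the cluster cap) kept it sub-bar.
    return "quiet-not-via-own-network"


# A capped cluster at +0.25 with no own_network affirms proves nothing about the echo-drop.
capped_only = read_strike_output(live_net=0.25, test_net=0.25, own_net=0)
# Own_network affirms present, quiet live, firing test: the intended PASS shape.
intended_pass = read_strike_output(live_net=0.25, test_net=2.4, own_net=3)
```

The point of the helper is the ordering of the checks: `own_net` is examined first, because a quiet `live` with `own_net == 0` is uninformative regardless of the net values.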
**Standing rule S1:** derivatives resolve on OUTCOME (scaled substance), never milestones or enablers.
An announced program / a regulatory unblock / a single bank's toe-in is CONTEXT, not corroboration.