---
paths:
- signal_engine/signals/**
- signal_engine/extract/prompt.py
---
# Scoring brain — operating guide
The "don't get it wrong" manual for `signal_engine/signals/`. The full decision/falsification log and
hypotheses H1–H6 live in `DESIGN_v2.md`; the spine guardrails are in `AGENTS.md`. This file is the
subsystem detail an agent editing the scorers needs.
## The instruments
- `independence.py`: **EISC** (Effective Independent Source Count): a noisy-OR connectedness matrix
(source edges + voiceprints + cluster coupling) with inverse-row-sum. Returns `eisc_adj` (= `eisc_raw`
× `xcluster_mult`), `eisc_raw`, `k_eff`, `xcluster_mult`, `per_source_contrib`. `mode='live'` DROPS
`own_network` sources; `mode='test'` keeps them.
- `two_sided.py`: **net corroboration** = independence-weighted **affirms minus denies** over time
(`classify_corpus` → `net_at` → `trajectory`). The instrument for the adversarial cases (NOT runway).
- `asof.py` — look-ahead guard (only claims dated ≤ as-of are visible). `windows.py` — windowed
acceleration / window bounds (match window to cadence: ~90d quarterly filings, wider for podcasts).
- `bar.py` / `under_acted.py` — the two-tier gate (evidence→ledger vs promotion→judge) and Job B scorer.
- `llm_helpers.derivative_relevance` — the bounded LLM **classifier** over PRE-FILTERED candidates
(search hits), never a nominator. `_REL_SYS` is its system prompt.
- `run.py` — orchestration / `run_backtest`. `resolver.py` — outcome resolver (currently a stub).
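
To make the `two_sided.py` / `asof.py` interaction concrete, here is a minimal sketch of an as-of-guarded net corroboration. It is an illustration, not the module's code: the `Claim` fields, the `sketch_*` function names, and the flat `weights` dict (standing in for per-source independence weights) are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim record; the real schema may differ.
    source: str
    dated: date
    stance: str  # "affirms", "denies", or "tangential"


def sketch_net_at(claims, weights, as_of):
    """Independence-weighted affirms minus denies, seeing only claims dated <= as_of."""
    visible = [c for c in claims if c.dated <= as_of]  # look-ahead guard
    # Each source counts once per side, weighted by its independence weight.
    affirm_src = {c.source for c in visible if c.stance == "affirms"}
    deny_src = {c.source for c in visible if c.stance == "denies"}
    affirms = sum(weights.get(s, 0.0) for s in affirm_src)
    denies = sum(weights.get(s, 0.0) for s in deny_src)
    return affirms - denies


def sketch_trajectory(claims, weights, as_of_dates):
    """Net corroboration evaluated at each as-of date, in order."""
    return [(d, sketch_net_at(claims, weights, d)) for d in as_of_dates]


claims = [
    Claim("src_a", date(2022, 1, 10), "affirms"),
    Claim("src_b", date(2022, 3, 1), "affirms"),
    Claim("src_c", date(2022, 5, 1), "denies"),
]
weights = {"src_a": 1.0, "src_b": 0.5, "src_c": 1.0}  # illustrative values only
points = sketch_trajectory(
    claims, weights, [date(2022, 2, 1), date(2022, 4, 1), date(2022, 6, 1)]
)
nets = [net for _, net in points]  # [1.0, 1.5, 0.5]
```

The later denial is invisible at the first two as-of dates, which is the whole point of the guard: a backtest evaluated at February cannot be influenced by a May claim.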
## Classifier invariants that MUST NOT regress
These five are what make the Battery adversarial case pass; each was a real bug found by running it.
1. **`max_tokens` sized to the batch.** A fixed 3000 truncated the JSON mid-array on ~60-claim batches →
empty parse → a whole node silently scored 0. `derivative_relevance` now sets `budget = max(3000,
120*len(claims)+500)`.
2. **Strip `[]` from echoed claim_ids.** The listing presents ids as `- [{id}] ...`; the model
*inconsistently* echoes them back as `[id]`, which misses the bracket-less lookup → all `(missing)`.
Normalize with `str(id).strip().strip("[]").strip()`.
3. **REALIZED-ONLY** (`_REL_SYS`): announcements / plans / intent / "may·will·expects·poised·up-to" are
NOT corroboration — only deployed/closed FACTS affirm. ("$2B announced" ≠ capital deployed.)
4. **ROLE-MATCH** (`_REL_SYS`): the actor must occupy the role the hypothesis is about. For a capital-
*provider* hypothesis, a *borrower* posting collateral is the wrong side → tangential, not affirms.
5. **Hard-evidence guard** (`net_at`, `require_hard_evidence=True`): a source only counts on a side if it
carries a **descriptive/reactive** (realized-fact) claim there; `predictive`/`interpretive` (forecast/
opinion/intent) alone don't qualify it. Reports `hard_affirm_src` / `soft_affirm_src_dropped`.
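
Invariants 1, 2, and 5 are mechanical enough to sketch. The budget formula and the id normalization are taken from the text above; the function names, the `(source, kind)` tuple shape, and the return shape of `split_hard_soft` are assumptions made for illustration, not the signatures used in `derivative_relevance` or `net_at`.

```python
HARD_KINDS = {"descriptive", "reactive"}  # realized-fact claim kinds


def token_budget(claims):
    """Invariant 1: size max_tokens to the batch instead of a fixed 3000."""
    return max(3000, 120 * len(claims) + 500)


def normalize_claim_id(raw):
    """Invariant 2: strip whitespace and echoed brackets before the lookup."""
    return str(raw).strip().strip("[]").strip()


def split_hard_soft(side_claims):
    """Invariant 5 (sketch): a source qualifies on a side only if it carries
    at least one descriptive/reactive claim there.

    side_claims: iterable of (source, kind) pairs for ONE side.
    Returns (hard_sources, soft_only_sources_dropped), both sorted.
    """
    kinds_by_source = {}
    for source, kind in side_claims:
        kinds_by_source.setdefault(source, set()).add(kind)
    hard = sorted(s for s, kinds in kinds_by_source.items() if kinds & HARD_KINDS)
    dropped = sorted(s for s in kinds_by_source if s not in hard)
    return hard, dropped


budget_small = token_budget(range(10))   # floor applies: 3000
budget_large = token_budget(range(60))   # 120*60 + 500 = 7700
hard, dropped = split_hard_soft([
    ("bank_a", "descriptive"),
    ("bank_a", "predictive"),
    ("pod_x", "predictive"),
    ("pod_x", "interpretive"),
])
```

A source with mixed claims (`bank_a`) still qualifies, because one realized fact is enough; a source with only forecasts or opinion (`pod_x`) is dropped from that side.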
## EISC / independence rules
- **Bitcoin is one CAPPED cluster** (`cluster_capped_low`, `CAP_VALUE`): within-cluster agreement can
contribute at most ~0.25 of a voice — it can NOT masquerade as independent corroboration. Real
corroboration of a bitcoin thesis must come from OUTSIDE the cluster (e.g. the `banks` cluster).
- **Cross-cluster earns the multiplier** (`xcluster_mult`, gated by `k_eff` = clusters contributing ≥0.5
of a voice). One guest doing the rounds collapses to ~1; it does not earn the gold multiplier.
- **`own_network` quarantine is MATERIALITY-driven** (see AGENTS.md): live mode drops materially-tied
Ten31 sources; test mode keeps them. Validated: own_network-only affirms → live `eisc≈0`, test > 0.
- **Seed edges in `sorted([a,b])` order** (matches `transcribe_worker`'s `sorted()`+`weight+=1` upsert) so
auto-detected and seeded edges share a PK — a reversed-order row DOUBLE-COUNTS (math is frozenset-
undirected but the table PK is ordered). κ: shared_guest 0.85, citation 0.45, community 0.60.
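
The following is a rough sketch of the noisy-OR plus inverse-row-sum idea and of why edge order matters. It is one plausible reading of the description, not the code in `independence.py`: it uses only source edges (no voiceprints, cluster coupling, cap, or `xcluster_mult`), and every function name is hypothetical. Only the κ values and the `sorted([a, b])` convention come from this guide.

```python
KAPPA = {"shared_guest": 0.85, "citation": 0.45, "community": 0.60}


def edge_key(a, b):
    """Canonical undirected key: the same pair always maps to one key."""
    return tuple(sorted([a, b]))


def connectedness(edges):
    """Noisy-OR coupling per source pair: 1 - prod(1 - kappa) over distinct edges.

    edges: iterable of (a, b, kind). A reversed duplicate such as (b, a, kind)
    collapses onto the same (key, kind) and is counted once.
    """
    unique = {(edge_key(a, b), kind) for a, b, kind in edges}
    miss = {}  # chance that none of the pair's edges couples the two sources
    for key, kind in unique:
        miss[key] = miss.get(key, 1.0) * (1.0 - KAPPA[kind])
    return {key: 1.0 - m for key, m in miss.items()}


def sketch_eisc_raw(sources, edges):
    """Each source contributes 1 / (row sum of its connectedness row)."""
    coupling = connectedness(edges)
    contrib = {}
    for s in sources:
        others = sum(coupling.get(edge_key(s, t), 0.0) for t in sources if t != s)
        contrib[s] = 1.0 / (1.0 + others)  # the diagonal entry is 1
    return sum(contrib.values()), contrib


sources = ["a", "b", "c"]
edges = [
    ("a", "b", "shared_guest"),
    ("b", "a", "shared_guest"),  # reversed duplicate, must not double-count
    ("a", "b", "citation"),
]
total, contrib = sketch_eisc_raw(sources, edges)
```

Here `a` and `b` are coupled at 1 - 0.15 × 0.55 = 0.9175, so together they count as barely more than one voice, while the unconnected `c` counts as a full voice. If the reversed row were keyed separately, the pair's coupling would be inflated, which is the double-count the ordering rule prevents.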
## The adversarial cases (the validation harness)
Pre-registered failed convictions used to test the engine against its target failure mode. Seeds:
`seeds/conviction_log.adversarial.seed.yaml`, `seeds/fanout.{STRIKE,BATTERY}2022.seed.yaml`,
`seeds/resolution.{STRIKE,BATTERY}2022.yaml`, `seeds/resolution_outcomes.adversarial.yaml`.
- **BATTERY2022 (timing/disconfirmation).** BTC-collateralized lending: demand rose, institutional
*supply* failed. **PASS = demand-net rises while supply-net stays flat (≈0).** Run:
`two-sided --conviction BATTERY2022 --nodes demand,supply --modes live`. Supply resolves ONLY on
committed/DEPLOYED capital; policy/regulation is CONTEXT (the custody-policy node), never supply (S1).
- **STRIKE2022 (reflexivity/false-positive).** Lightning-retail-payments thesis FAILED. **PASS = net
stays quiet in `live` (own_network dropped) while it would fire in `test`** — the engine refusing the
intra-cluster echo. Run `two-sided --conviction STRIKE2022 --modes live,test`. The REALIZED-ONLY rule
is load-bearing here (speculative "Lightning will revolutionize payments" is `predictive`, not signal).
**Standing rule S1:** derivatives resolve on OUTCOME (scaled substance), never milestones or enablers.
An announced program / a regulatory unblock / a single bank's toe-in is CONTEXT, not corroboration.
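
The two PASS criteria can be written down as predicates over net trajectories. This is a sketch only: the function names and every threshold (`flat_tol`, `quiet_tol`, `fire_level`) are placeholders invented for illustration, since this guide does not state numeric tolerances.

```python
def battery_passes(demand_net, supply_net, flat_tol=0.25):
    """BATTERY2022: demand-net rises while supply-net stays flat (about 0).

    flat_tol is a hypothetical tolerance, not a value from the engine.
    """
    if not demand_net or not supply_net:
        return False
    rises = demand_net[-1] > demand_net[0]
    flat = all(abs(v) <= flat_tol for v in supply_net)
    return rises and flat


def strike_passes(live_net, test_net, quiet_tol=0.25, fire_level=1.0):
    """STRIKE2022: net stays quiet in live mode but would fire in test mode.

    quiet_tol and fire_level are hypothetical thresholds.
    """
    if not live_net or not test_net:
        return False
    quiet_live = all(abs(v) <= quiet_tol for v in live_net)
    fires_test = max(test_net) >= fire_level
    return quiet_live and fires_test


battery_ok = battery_passes([0.2, 0.9, 1.8], [0.0, 0.1, 0.0])
battery_bad = battery_passes([0.2, 0.9, 1.8], [0.0, 0.6, 1.2])  # supply "rose"
strike_ok = strike_passes([0.0, 0.1, 0.0], [0.4, 1.3, 2.1])
```

Under S1, a supply trajectory that moves on announcements or regulatory news, as in `battery_bad`, should be read as a classifier regression rather than a real signal.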