diff --git a/.claude/rules/scoring-brain.md b/.claude/rules/scoring-brain.md
new file mode 120000
index 0000000..a910dd2
--- /dev/null
+++ b/.claude/rules/scoring-brain.md
@@ -0,0 +1 @@
+../../docs/guides/scoring-brain.md
\ No newline at end of file
diff --git a/.env.example b/.env.example
new file mode 100644
index 0000000..8e90351
--- /dev/null
+++ b/.env.example
@@ -0,0 +1,31 @@
+# Ten31 Signal Engine — environment template. Copy to .env and fill in.
+# Real values live in .env (gitignored). NEVER commit real keys.
+
+# --- Spark Control gateway (the single chokepoint; operator's LAN) ---
+SPARK_CONTROL_URL=https://YOUR-SPARK-GATEWAY:62419
+SPARK_VERIFY_TLS=false
+SPARK_TIMEOUT_S=180
+LOCAL_LLM_MODEL=RedHatAI/Qwen3.6-35B-A3B-NVFP4
+EMBED_MODEL=BAAI/bge-m3
+TRANSCRIBE_MODEL=nvidia/parakeet-tdt-0.6b-v3
+AUDIO_CONCURRENCY=2 # global in-flight cap across both audio endpoints (ceiling 3)
+
+# --- extraction backend: 'local' (Qwen via Spark) | 'gemini' (PUBLIC-data overflow only) ---
+EXTRACTION_BACKEND=local
+GEMINI_API_KEY=
+GEMINI_MODEL=gemini-2.5-flash
+
+# --- frontier (bounded final step; sovereignty boundary applies) ---
+ANTHROPIC_API_KEY=
+FRONTIER_MODEL=claude-opus-4-8
+
+# --- data sources ---
+FMP_API_KEY=
+EDGAR_USER_AGENT=Ten31 Research you@example.com
+
+# --- local ---
+DATA_DIR=./data
+AUDIO_CACHE_DIR=./data/audio-cache
+DATABASE_URL=
+UI_PORT=8000
+LOG_LEVEL=INFO
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..3f104e6
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,117 @@
+# Ten31 Signal Engine — AGENTS.md
+
+> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items
+> tagged `(ten31-signal-engine)` and surface them before proposing next steps; triage with `/triage`.
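The `.env.example` template above is a flat file of `KEY=value` lines with `#` comments, and the Architecture section describes `config.py` as an env-driven `Config` with a `.env` loader. The following is a minimal standard-library sketch of that pattern, not the project's `config.py`: the helper names `parse_env` and `load_env_file`, the subset of fields, the `true` default for `SPARK_VERIFY_TLS`, and clamping `AUDIO_CONCURRENCY` to the documented ceiling of 3 are all assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping, MutableMapping
from dataclasses import dataclass

_TRUTHY = {"1", "true", "yes", "on"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and full-line comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop inline comments such as "2 # global cap"; requiring the space keeps
        # a '#' that sits inside a value (e.g. a URL fragment).
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def load_env_file(path: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy .env values into the environment without overriding real env vars (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            environ.setdefault(key, value)


@dataclass(frozen=True)
class Config:
    spark_control_url: str
    spark_verify_tls: bool
    spark_timeout_s: float
    audio_concurrency: int
    extraction_backend: str

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> Config:
        backend = env.get("EXTRACTION_BACKEND", "local")
        if backend not in ("local", "gemini"):
            raise ValueError(f"EXTRACTION_BACKEND must be 'local' or 'gemini', got {backend!r}")
        audio = int(env.get("AUDIO_CONCURRENCY", "2"))
        return cls(
            spark_control_url=env["SPARK_CONTROL_URL"],
            spark_verify_tls=env.get("SPARK_VERIFY_TLS", "true").lower() in _TRUTHY,
            spark_timeout_s=float(env.get("SPARK_TIMEOUT_S", "180")),
            audio_concurrency=max(1, min(audio, 3)),  # assumed clamp to the ceiling of 3
            extraction_backend=backend,
        )
```

In use, calling `load_env_file(".env")` and then `Config.from_env(os.environ)` lets a variable already set in the real environment win over the file, which is one common precedence choice; the project may resolve conflicts differently.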
+
+A recurring pipeline that ingests a growing corpus of **audio** (podcasts, YouTube) and **text** (SEC
+filings, earnings calls, policy/lender/research docs), extracts structured **propositions ("claims")**,
+and surfaces **signal over time** through Ten31's investment thesis as a *relevance lens* — logging every
+surfaced signal as a **falsifiable prediction** scored against reality.
+
+**Source of truth (in order):** `ten31-signal-engine-handoff.md` (the spec — wins on any conflict; §refs
+point into it) › `DESIGN_v2.md` (the living decision/falsification log — read before changing scoring) ›
+this file's **Current state**. `README.md` is the user-facing intro.
+
+## The spine — NON-NEGOTIABLE guardrails (never violate)
+
+1. **Nominate-then-judge.** Statistics & graph structure NOMINATE candidates; the frontier model only
+   JUDGES / FANS OUT a pre-filtered shortlist. The frontier never nominates from the raw corpus.
+2. **Propositions, not vibes.** Extract atomic claims; separate **topic** from **stance**.
+3. **Discount convergence by connectedness.** Independence is earned, not counted — the EISC graph
+   (source edges + voiceprints) downweights echo. Bitcoin is one *capped* cluster: within-cluster
+   agreement can NOT masquerade as independent corroboration; cross-cluster earns the multiplier.
+4. **Thesis is a LENS, not a gate on truth.** The engine must surface signals *against* Ten31's thesis,
+   not just for it.
+5. **Dual-evaluation ledger from day one** — precision AND recall; every signal is a logged prediction.
+6. **~95% local compute via Spark Control.** Call the gateway's HTTP endpoints; do NOT stand up your own
+   vLLM / Whisper / Qdrant. Gemini is an explicit *overflow* lever for PUBLIC data only.
+7. **Sovereignty boundary (hard).** Exposure/positioning/conviction data and the Strike/Battery
+   investment memos NEVER go to the frontier. Route sensitive frontier calls through
+   `/scrub → frontier → /rehydrate` (scrub identities, not substance). Read the memos LOCALLY only.
+
+**Two jobs:** **A — Discovery** (emergent themes via independent cross-cluster *convergence* scored on
+acceleration; contrarian stances; their intersection). **B — Conviction-action gap** (fan held
+convictions to 2nd/3rd-order derivatives, catch early corroboration — the countermeasure to the 2023
+"power is the binding constraint on AI/compute" miss: right on the root, late to the derivatives).
+
+## Architecture
+
+`signal_engine/` (Python package, run as `python -m signal_engine `):
+- `config.py` — env-driven `Config` (+ `.env` loader). `spark/client.py` — the SINGLE gateway chokepoint
+  (no other module knows the gateway URL); scrub/rehydrate live here.
+- `ingest/` — `edgar` (SEC), `earnings` (FMP REST), `feeds`+`podcasts` (RSS), `download`, `chunker`,
+  `transcribe_worker` (local Parakeet), `gemini_transcribe` (bulk overflow), `docs` (HTML/PDF/RSS text
+  fetcher for policy/lender/research), `identify`, `speaker_stitch`.
+- `extract/` — `claims`+`worker` (proposition extraction), `backends` (LocalQwen | Gemini), `prompt`,
+  `html_text`. `embedstore/` — `embedder` + `qdrant_store` (hybrid dense+BM25).
+- `signals/` (the scoring brain) — `independence` (EISC), `asof` (look-ahead guard), `windows`,
+  `under_acted` (Job B), `bar` (two-tier gate), `two_sided` (affirms−denies net-corroboration),
+  `llm_helpers` (`derivative_relevance`), `confusion` (precision/recall), `external` (price/outcome
+  fetcher), `ledger_writer` (§6.6 prediction ledger), `resolver` (stub), `run`.
+- `store/` — `db` (SQLite + idempotent migrations), `schema.sql`, `seed`, `sources`. `backfill/queue.py`
+  (the job queue). `ui/app.py` (FastAPI corpus/eval UI). `util.py`.
+- Data lands in `data/` (gitignored): `signal.db`, `transcripts/`, `docs/`, `audio-cache/`.
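Spine rule 7 routes sensitive frontier calls through `/scrub → frontier → /rehydrate`, and the Architecture list places scrub/rehydrate in `spark/client.py`. The gateway endpoints themselves are not described here, so the following is only a local illustration of the round-trip contract (mask identities, keep substance, restore identities in the reply). The regex approach, the caller-supplied identity list, the `[[ENTITY_n]]` token format, and the entity names in the example are hypothetical. Rule 7 still keeps exposure/positioning/conviction data and the memos away from the frontier entirely; scrubbing only masks identities in text that is allowed to leave.

```python
import re


def scrub(text: str, identities: list[str]) -> tuple[str, dict[str, str]]:
    """Swap each identity for a placeholder token; return the text and a token -> name map."""
    mapping: dict[str, str] = {}
    # Longest names first (ties broken alphabetically) so numbering is deterministic
    # and a short name cannot match inside a longer one that was already masked.
    ordered = sorted(set(identities), key=lambda name: (-len(name), name))
    for i, name in enumerate(ordered, start=1):
        token = f"[[ENTITY_{i}]]"
        # Lookarounds instead of \b so names that start or end with punctuation still match.
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)")
        text, count = pattern.subn(token, text)
        if count:
            mapping[token] = name
    return text, mapping


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Restore the original identities in the frontier model's reply."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text
```

The frontier call sits between the two functions: send the scrubbed text, then pass the model's reply and `mapping` to `rehydrate`. Because tokens end in `]]`, restoring `[[ENTITY_1]]` never touches `[[ENTITY_12]]`.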
+
+**Flow:** seed sources/convictions/fanout → ingest (→ `documents` + `transcribe`/`extract` jobs) →
+`run-transcribe` / `run-extract` drain the queue → `claims` → `embed-claims` (Qdrant) → scorers
+(`backtest`, `two-sided`) read the proposition store as-of a date.
+
+## Build / run
+
+- **Setup:** virtualenv at `.venv` (Python 3.14). `.venv/bin/pip install -r requirements.txt`.
+- **Invoke:** `.venv/bin/python -m signal_engine `. **`--help` is authoritative**; the rest is a
+  map: `init-db`; seeding `seed-sources`/`seed-convictions`/`seed-fanout`/`seed-edges`/`load-feeds`;
+  ingest `ingest-edgar`/`ingest-earnings`/`ingest-podcast`/`ingest-doc`/`ingest-doc-manifest`/
+  `ingest-feed-text`; queue drain `run-transcribe`/`run-transcribe-gemini`/`run-extract`; index
+  `embed-claims`/`search`; score `backtest`/`two-sided`/`confusion-matrix`; inspect `queue-status`/
+  `spark-status`/`feed-peek`/`provenance`/`db-tables`; `serve` (UI).
+- **DB:** `python -m signal_engine init-db` (idempotent — re-creates schema + runs additive migrations).
+- **Tests:** ⚠️ **no automated test suite yet** (no `tests/`, no pytest). Verification is by running
+  commands against the live gateway. Adding a test harness is on the ROADMAP.
+- **Lint/format:** none configured. Match the surrounding style (dense, §-referenced docstrings).
+
+## Spark Control infra (`SPARK_CONTROL_URL`, self-signed TLS → `SPARK_VERIFY_TLS=false`)
+
+One gateway fronts two DGX Sparks: **vLLM** `RedHatAI/Qwen3.6-35B-A3B-NVFP4` on `:103`; **Parakeet**
+ASR + diarizer, **bge-m3** embeddings, **Qdrant** on `:87`. The gateway is the only URL anything calls.
+- **AUDIO concurrency (learned 2026-06-09):** single serial GPU shared with the operator's production
+  meeting app. Cap **2 in-flight (ceiling 3), GLOBAL across both audio endpoints** — a process-wide
+  `BoundedSemaphore` (`AUDIO_CONCURRENCY` env, default 2). Going wider buys zero throughput. Transient
+  1–4s "busy blips" (broken-pipe/503/timeout) are NOT failures → short retry-backoff. The
+  `transcribe_worker` runs a 2-wide chunk pool; the old size-1 lock was ~2.5× slower.
+
+## Key operational rules (learned this build — easy to get wrong)
+
+- **`own_network` quarantine is MATERIALITY-driven, not "any investment."** Quarantine (drop in live
+  scoring, keep in test) only for MATERIAL ties where the source is part of Ten31's voice: the partners'
+  own shows (TFTC, Citadel Dispatch, Rabbit Hole Recap), the Battery *partnership*, material portfolio
+  leads. **Immaterial passive stakes → INDEPENDENT** (River and Swan/Cafe Bitcoin were corrected to
+  independent). Unconfirmed: Unchained, Debifi, Coinkite (held quarantined pending Grant's materiality call).
+- **Gemini quota is a rolling ~24h window** (~291 hour-long episodes / ~51M tokens), not a calendar-day
+  reset. Bulk transcription overflows there; expect 429 RESOURCE_EXHAUSTED past the window.
+- **Scoring-brain internals are scoped to a guide.** Before editing `signal_engine/signals/`, read
+  **`docs/guides/scoring-brain.md`** — the classifier invariants (REALIZED-ONLY, ROLE-MATCH, claim_type
+  hard-evidence guard, max_tokens budget, claim_id bracket-strip), the EISC cluster-cap, and the
+  Battery/Strike adversarial-test PASS criteria. Don't regress those invariants (they're what make
+  Battery pass). Full decision log: `DESIGN_v2.md`.
+
+## Secrets / env
+
+Real values live in **`.env`** (gitignored). `.env.example` lists the names. Keys used: `SPARK_CONTROL_URL`,
+`SPARK_VERIFY_TLS`, `LOCAL_LLM_MODEL`, `EMBED_MODEL`, `TRANSCRIBE_MODEL`, `AUDIO_CONCURRENCY`,
+`EXTRACTION_BACKEND`, `GEMINI_API_KEY`, `GEMINI_MODEL`, `ANTHROPIC_API_KEY`, `FMP_API_KEY`,
+`EDGAR_USER_AGENT`, `DATA_DIR`, `UI_PORT`, `LOG_LEVEL`. Never commit key values; the private LAN gateway
+IP appears only as an env-var default.
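The AUDIO concurrency rule under Spark Control infra above (one process-wide `BoundedSemaphore`, default 2, shared by both audio endpoints, with short retry-backoff on 1–4s busy blips) can be sketched as follows. This assumes a thread-based client; `audio_call`, `GatewayBusy` (a stand-in for an HTTP 503), the retry count, and the backoff schedule are hypothetical, and the real code in `spark/client.py` and `transcribe_worker` is not shown in this document.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")

AUDIO_CONCURRENCY = 2  # assumption: read from the AUDIO_CONCURRENCY env var (default 2, ceiling 3)
# One semaphore per process, shared by calls to either audio endpoint.
_AUDIO_SLOTS = threading.BoundedSemaphore(AUDIO_CONCURRENCY)


class GatewayBusy(Exception):
    """Hypothetical stand-in for an HTTP 503 'busy blip' from the gateway."""


TRANSIENT = (BrokenPipeError, TimeoutError, GatewayBusy)


def audio_call(fn: Callable[[], T], retries: int = 4, backoff_s: float = 0.5) -> T:
    """Run one audio request under the global cap, retrying transient busy blips."""
    for attempt in range(retries + 1):
        with _AUDIO_SLOTS:  # blocks while AUDIO_CONCURRENCY requests are already in flight
            try:
                return fn()
            except TRANSIENT:
                if attempt == retries:
                    raise  # out of retries: surface the failure
        # Sleep outside the semaphore so a request that is backing off holds no GPU slot.
        time.sleep(backoff_s * 2 ** attempt)
    raise AssertionError("unreachable: the last attempt returns or raises")
```

With the defaults the backoff sleeps 0.5, 1, 2, and 4 seconds, which spans the 1–4s blips. A 2-wide chunk pool such as the one `transcribe_worker` runs would route every chunk request through `audio_call`, so requests to either audio endpoint compete for the same slots.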
+
+## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)
+
+- **Battery adversarial test: PASSES.** Corpus built (23 docs via the `docs` fetcher); after the three
+  scoring fixes the engine reads demand-net rising (+3.9) while **supply stays flat at 0.0** — correctly
+  rejecting Cantor's *announced* $2B and borrower-side collateral claims as not-realized-supply.
+- **Strike adversarial test: QUEUED & auto-firing.** The independent leg (What Bitcoin Did, Stephan
+  Livera, Kevin Rooke, Anita Posch, Cafe Bitcoin, + River research — all independent) is nearly done
+  transcribing on the fixed local Spark path. `run_strike_pipeline.sh` (a background watcher) auto-runs
+  extract → embed → `two-sided STRIKE2022` (live vs test reflexivity) when transcription hits zero.
+- **§7.1 power-infra backtest:** qualified YES (corpus-gated; runway/precision caveats in `DESIGN_v2.md`).
+- Corpus now spans bitcoin podcasts, SEC/FMP company filings (incl. 6 major banks + Robinhood, a new
+  `banks` cluster), the Battery text corpus, and River research. EISC edges seeded for the bitcoin cluster.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..6c2f051
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,42 @@
+# Ten31 Signal Engine — ROADMAP
+
+Longer-term backlog (the near-term snapshot lives in `AGENTS.md` → Current state). Rationale and the
+falsification hypotheses (H1–H6) are in `DESIGN_v2.md`.
+
+## Scoring brain — the real validation
+- **Frontier-fan-out test (H6) — the untested half, = the actual §1.1 miss.** Seed a 2023 conviction,
+  give the model 2023-ONLY context, let it PROPOSE the derivatives, then score that tree's precision/
+  recall against what actually repriced. The §7.1 backtest hand-wrote the tree (hindsight leakage); this
+  is the part that matters and isn't tested yet.
+- **Estimator rework (H4).** Replace the fragile 2nd-derivative *acceleration* with a persistence /
+  level-crossing test on the corroboration arrival rate, with per-source-type window cadence.
+- **Build the real resolver.** `signals/resolver.py` is a stub. Settle the lead-time-vs-actual-repricing
+  debate empirically against structured outcomes (price, FERC interconnection queue, PPAs, capex, policy).
+- **Extend claim-type weighting to the §7.1 power-infra tree** (it currently only gates the bitcoin
+  adversarial cases; descriptive-deployed > predictive-intent should apply everywhere).
+- **Job A scorers** (emergence / stance / intersection) for the forward Discovery pilot.
+- **MD&A targeting** for filings — extract Item 7, not front-matter boilerplate.
+
+## Corpus & independence
+- **Confirm materiality** of the remaining `own_network`-flagged sources with Grant: Unchained, Debifi,
+  Coinkite (Bitcoin.Review). Immaterial → flip to independent (the River/Swan precedent).
+- **BTC Sessions (Ben Perrin)** — strongest still-missing independent high-Strike merchant/wallet-adoption
+  show; resolve feed + ingest (a task chip exists for it).
+- **River image-PDF reports** — the 2022 Lightning report + 2025/2026 adoption reports have no text layer;
+  add an OCR / page-rasterization+vision path to ingest them.
+- Broad, **lineage-aware** corpus expansion toward independent vantage points (not more correlated
+  sell-side / trade-press voices).
+
+## Infrastructure & ops
+- **Add an automated test suite** (none today) — start with the scoring primitives (EISC, two_sided,
+  as-of harness) and the queue.
+- **Episode-pipelining** in `transcribe_worker` — download/chunk the next episode while transcribing the
+  current one, to close the inter-episode GPU idle gap (the per-chunk 2-in-flight path is already done).
+- **Corpus-management UI** — add to the corpus over time and see the full corpus selection.
+- **Forward live operation** — the only real test: scoring un-pre-selected signals as they arrive, with
+  the dual-evaluation ledger as arbiter.
+
+## Packaging / deploy
+- **Start9 `s9pk` packaging.** Build with `make x86` then `make install` → `immense-voyage.local`.
+  Bump the package version in the manifest BEFORE building (Start9 0.4.x won't recognize an un-bumped
+  rebuild). See the placement standard for infra conventions.
diff --git a/docs/guides/scoring-brain.md b/docs/guides/scoring-brain.md
new file mode 100644
index 0000000..9fe8540
--- /dev/null
+++ b/docs/guides/scoring-brain.md
@@ -0,0 +1,75 @@
+---
+paths:
+  - signal_engine/signals/**
+  - signal_engine/extract/prompt.py
+---
+
+# Scoring brain — operating guide
+
+The "don't get it wrong" manual for `signal_engine/signals/`. The full decision/falsification log and
+hypotheses H1–H6 live in `DESIGN_v2.md`; the spine guardrails are in `AGENTS.md`. This file is the
+subsystem detail an agent editing the scorers needs.
+
+## The instruments
+
+- `independence.py` — **EISC** (Effective Independent Source Count): a noisy-OR connectedness matrix
+  (source edges + voiceprints + cluster coupling) with inverse-row-sum. Returns `eisc_adj` (= `eisc_raw`
+  × `xcluster_mult`), `eisc_raw`, `k_eff`, `xcluster_mult`, `per_source_contrib`. `mode='live'` DROPS
+  `own_network` sources; `mode='test'` keeps them.
+- `two_sided.py` — **net corroboration** = independence-weighted **affirms − denies** over time
+  (`classify_corpus` → `net_at` → `trajectory`). The instrument for the adversarial cases (NOT runway).
+- `asof.py` — look-ahead guard (only claims dated ≤ as-of are visible). `windows.py` — windowed
+  acceleration / window bounds (match window to cadence: ~90d quarterly filings, wider for podcasts).
+- `bar.py` / `under_acted.py` — the two-tier gate (evidence→ledger vs promotion→judge) and Job B scorer.
+- `llm_helpers.derivative_relevance` — the bounded LLM **classifier** over PRE-FILTERED candidates
+  (search hits), never a nominator. `_REL_SYS` is its system prompt.
+- `run.py` — orchestration / `run_backtest`. `resolver.py` — outcome resolver (currently a stub).
+
+## Classifier invariants that MUST NOT regress
+
+These five are what make the Battery adversarial case pass; each was a real bug found by running it.
+
+1. **`max_tokens` sized to the batch.** A fixed 3000 truncated the JSON mid-array on ~60-claim batches →
+   empty parse → a whole node silently scored 0. `derivative_relevance` now sets `budget = max(3000,
+   120*len(claims)+500)`.
+2. **Strip `[]` from echoed claim_ids.** The listing presents ids as `- [{id}] ...`; the model
+   *inconsistently* echoes them back as `[id]`, which misses the bracket-less lookup → all `(missing)`.
+   Normalize with `str(id).strip().strip("[]").strip()`.
+3. **REALIZED-ONLY** (`_REL_SYS`): announcements / plans / intent / "may·will·expects·poised·up-to" are
+   NOT corroboration — only deployed/closed FACTS affirm. ("$2B announced" ≠ capital deployed.)
+4. **ROLE-MATCH** (`_REL_SYS`): the actor must occupy the role the hypothesis is about. For a capital-
+   *provider* hypothesis, a *borrower* posting collateral is the wrong side → tangential, not affirms.
+5. **Hard-evidence guard** (`net_at`, `require_hard_evidence=True`): a source only counts on a side if it
+   carries a **descriptive/reactive** (realized-fact) claim there; `predictive`/`interpretive` (forecast/
+   opinion/intent) alone don't qualify it. Reports `hard_affirm_src` / `soft_affirm_src_dropped`.
+
+## EISC / independence rules
+
+- **Bitcoin is one CAPPED cluster** (`cluster_capped_low`, `CAP_VALUE`): within-cluster agreement can
+  contribute at most ~0.25 of a voice — it can NOT masquerade as independent corroboration. Real
+  corroboration of a bitcoin thesis must come from OUTSIDE the cluster (e.g. the `banks` cluster).
+- **Cross-cluster earns the multiplier** (`xcluster_mult`, gated by `k_eff` = clusters contributing ≥0.5
+  of a voice). One guest doing the rounds collapses to ~1; it does not earn the gold multiplier.
+- **`own_network` quarantine is MATERIALITY-driven** (see AGENTS.md): live mode drops materially-tied
+  Ten31 sources; test mode keeps them. Validated: own_network-only affirms → live `eisc≈0`, test > 0.
+- **Seed edges in `sorted([a,b])` order** (matches `transcribe_worker`'s `sorted()`+`weight+=1` upsert) so
+  auto-detected and seeded edges share a PK — a reversed-order row DOUBLE-COUNTS (math is frozenset-
+  undirected but the table PK is ordered). κ: shared_guest 0.85, citation 0.45, community 0.60.
+
+## The adversarial cases (the validation harness)
+
+Pre-registered failed convictions used to test the engine against its target failure mode. Seeds:
+`seeds/conviction_log.adversarial.seed.yaml`, `seeds/fanout.{STRIKE,BATTERY}2022.seed.yaml`,
+`seeds/resolution.{STRIKE,BATTERY}2022.yaml`, `seeds/resolution_outcomes.adversarial.yaml`.
+
+- **BATTERY2022 (timing/disconfirmation).** BTC-collateralized lending: demand rose, institutional
+  *supply* failed. **PASS = demand-net rises while supply-net stays flat (≈0).** Run:
+  `two-sided --conviction BATTERY2022 --nodes demand,supply --modes live`. Supply resolves ONLY on
+  committed/DEPLOYED capital; policy/regulation is CONTEXT (the custody-policy node), never supply (S1).
+- **STRIKE2022 (reflexivity/false-positive).** Lightning-retail-payments thesis FAILED. **PASS = net
+  stays quiet in `live` (own_network dropped) while it would fire in `test`** — the engine refusing the
+  intra-cluster echo. Run `two-sided --conviction STRIKE2022 --modes live,test`. The REALIZED-ONLY rule
+  is load-bearing here (speculative "Lightning will revolutionize payments" is `predictive`, not signal).
+
+**Standing rule S1:** derivatives resolve on OUTCOME (scaled substance), never milestones or enablers.
+An announced program / a regulatory unblock / a single bank's toe-in is CONTEXT, not corroboration.
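The EISC entry in the instruments list (a noisy-OR connectedness matrix with inverse-row-sum), the `sorted([a,b])` edge rule, and the hard-evidence guard can be illustrated together in a small sketch. It is one reading of those rules, not `independence.py` or `two_sided.py`: treating each (pair, edge kind) as one link with probability κ, combining kinds by noisy-OR, ignoring the upsert `weight` in the math, and scoring net corroboration as EISC of the affirming sources minus EISC of the denying sources are assumptions. Voiceprints, the cluster cap (`CAP_VALUE`), `k_eff`, `xcluster_mult`, and live/test filtering are omitted, and the function names are hypothetical.

```python
from __future__ import annotations

# Edge strengths listed in the guide: shared_guest 0.85, citation 0.45, community 0.60.
KAPPA = {"shared_guest": 0.85, "citation": 0.45, "community": 0.60}
# Claim types that count as realized-fact (hard) evidence.
HARD = {"descriptive", "reactive"}

Edge = tuple[str, str, str]  # (lo, hi, kind) with lo <= hi


def upsert_edge(edges: dict[Edge, int], a: str, b: str, kind: str) -> None:
    """Key every pair in sorted order so (a, b) and (b, a) land on one row."""
    lo, hi = sorted((a, b))
    key = (lo, hi, kind)
    edges[key] = edges.get(key, 0) + 1  # weight counts observations; unused by the math below


def effective_sources(sources: list[str], edges: dict[Edge, int]) -> tuple[float, dict[str, float]]:
    """Noisy-OR link per pair, then each source counts 1 / (its row sum)."""
    idx = {s: i for i, s in enumerate(sources)}
    n = len(sources)
    # miss[i][j] = probability that no edge links i and j (product of 1 - kappa over kinds).
    miss = [[1.0] * n for _ in range(n)]
    for a, b, kind in edges:
        if a in idx and b in idx:
            i, j = idx[a], idx[b]
            miss[i][j] *= 1.0 - KAPPA[kind]
            miss[j][i] = miss[i][j]
    contrib: dict[str, float] = {}
    for s, i in idx.items():
        # Row sum: 1 for the source itself plus its link strength to every other source.
        row = sum(1.0 if j == i else 1.0 - miss[i][j] for j in range(n))
        contrib[s] = 1.0 / row
    return sum(contrib.values(), 0.0), contrib


def net_corroboration(
    labels: list[tuple[str, str, str]],
    edges: dict[Edge, int],
    require_hard_evidence: bool = True,
) -> float:
    """EISC of affirming sources minus EISC of denying sources.

    labels: (source, stance, claim_type); stance is 'affirms', 'denies', or 'tangential'.
    """
    sides: dict[str, set[str]] = {"affirms": set(), "denies": set()}
    for source, stance, claim_type in labels:
        # With the guard on, only realized-fact claims qualify a source for a side.
        if stance in sides and (claim_type in HARD or not require_hard_evidence):
            sides[stance].add(source)
    affirm, _ = effective_sources(sorted(sides["affirms"]), edges)
    deny, _ = effective_sources(sorted(sides["denies"]), edges)
    return affirm - deny
```

Two sources linked only by `shared_guest` each count 1/1.85 ≈ 0.54, so the echoing pair is worth about 1.08 voices rather than 2; adding a `citation` edge between them raises the link to 1 − 0.15 × 0.55 = 0.9175. With the guard on, sources whose only claims are `predictive` drop out of the net entirely.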