# Ten31 Signal Engine — AGENTS.md
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items
> tagged `(ten31-signal-engine)` and surface them before proposing next steps; triage with `/triage`.

A recurring pipeline that ingests a growing corpus of **audio** (podcasts, YouTube) and **text** (SEC
filings, earnings calls, policy/lender/research docs), extracts structured **propositions ("claims")**,
and surfaces **signal over time** through Ten31's investment thesis as a *relevance lens* — logging every
surfaced signal as a **falsifiable prediction** scored against reality.

**Source of truth (in order):** `ten31-signal-engine-handoff.md` (the spec; it wins on any conflict, and §refs
point into it) → `DESIGN_v2.md` (the living decision/falsification log; read before changing scoring) →
this file's **Current state**. `README.md` is the user-facing intro.
## The spine — NON-NEGOTIABLE guardrails (never violate)
1. **Nominate-then-judge.** Statistics & graph structure NOMINATE candidates; the frontier model only
JUDGES / FANS OUT a pre-filtered shortlist. The frontier never nominates from the raw corpus.
2. **Propositions, not vibes.** Extract atomic claims; separate **topic** from **stance**.
3. **Discount convergence by connectedness.** Independence is earned, not counted — the EISC graph
(source edges + voiceprints) downweights echo. Bitcoin is one *capped* cluster: within-cluster
agreement can NOT masquerade as independent corroboration; cross-cluster earns the multiplier.
4. **Thesis is a LENS, not a gate on truth.** The engine must surface signals *against* Ten31's thesis,
not just for it.
5. **Dual-evaluation ledger from day one** — precision AND recall; every signal is a logged prediction.
6. **~95% local compute via Spark Control.** Call the gateway's HTTP endpoints; do NOT stand up your own
vLLM / Whisper / Qdrant. Gemini is an explicit *overflow* lever for PUBLIC data only.
7. **Sovereignty boundary (hard).** Exposure/positioning/conviction data and the Strike/Battery
investment memos NEVER go to the frontier. Route sensitive frontier calls through
`/scrub → frontier → /rehydrate` (scrub identities, not substance). Read the memos LOCALLY only.
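
The `/scrub → frontier → /rehydrate` route in guardrail 7 can be sketched as a round trip. The real `/scrub` and `/rehydrate` are gateway endpoints reached through `spark/client.py`, and their request shapes are not documented here, so the following is a local, in-process stand-in only. Every function name, the `[ENTITY_n]` token format, and the stub frontier are assumptions made for illustration.

```python
import re


def scrub(text, identities):
    """Swap each identity for a placeholder token; return scrubbed text and the reverse map."""
    mapping = {}
    # Longest names first, so a full name is replaced before any shorter name it contains.
    for index, name in enumerate(sorted(identities, key=len, reverse=True)):
        token = f"[ENTITY_{index}]"
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub(lambda _match: token, text)
            mapping[token] = name
    return text, mapping


def rehydrate(text, mapping):
    """Put the original identities back into the frontier's answer."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text


def judge_sensitive(text, identities, frontier_call):
    """Scrub identities, call the frontier on the scrubbed text, then rehydrate the reply."""
    scrubbed, mapping = scrub(text, identities)
    return rehydrate(frontier_call(scrubbed), mapping)


# Hypothetical stub standing in for the frontier model; it records what it was sent.
prompts_seen = []


def stub_frontier(prompt):
    prompts_seen.append(prompt)
    return "Assessment: " + prompt


answer = judge_sensitive(
    "Alice Example says rates will fall.", ["Alice Example"], stub_frontier
)
print(prompts_seen[0])  # the frontier only ever sees the placeholder
print(answer)           # the caller gets the identity back
```

This only covers identity scrubbing. Exposure, positioning, conviction data and the Strike/Battery memos stay local regardless; scrubbing is not a way to send them out.
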

**Two jobs:** **A: Discovery** (emergent themes via independent cross-cluster *convergence* scored on
acceleration; contrarian stances; their intersection). **B: Conviction-action gap** (fan held
convictions out to 2nd/3rd-order derivatives and catch early corroboration; this is the countermeasure to the 2023
"power is the binding constraint on AI/compute" miss: right on the root, late to the derivatives).
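
Job A's cross-cluster convergence depends on the cluster cap from guardrail 3. Below is a minimal sketch of that idea, assuming each supporting claim carries a source, a cluster label, and a weight. The function name, the tuple layout, and the `cluster_cap` value are hypothetical; the real EISC graph also uses source edges and voiceprints, which are not modeled here.

```python
from collections import defaultdict


def independent_corroboration(claims, cluster_cap=1.0):
    """Sum support for one proposition, capping what any single cluster can contribute.

    claims: iterable of (source, cluster, weight) tuples.
    """
    per_cluster = defaultdict(float)
    seen_sources = set()
    for source, cluster, weight in claims:
        if source in seen_sources:  # a source repeating itself adds nothing
            continue
        seen_sources.add(source)
        per_cluster[cluster] += weight
    # Within-cluster agreement saturates at the cap; new clusters add fresh support.
    return sum(min(total, cluster_cap) for total in per_cluster.values())


# Illustrative inputs only (source names are made up).
bitcoin_only = [
    ("show_a", "bitcoin", 1.0),
    ("show_b", "bitcoin", 1.0),
    ("show_c", "bitcoin", 1.0),
]
cross_cluster = [("show_a", "bitcoin", 1.0), ("bank_filing", "banks", 1.0)]

print(independent_corroboration(bitcoin_only))   # 1.0
print(independent_corroboration(cross_cluster))  # 2.0
```

Three bitcoin shows agreeing score no higher than one, while a single corroborating source from a second cluster doubles the score.
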
## Architecture
`signal_engine/` (Python package, run as `python -m signal_engine <cmd>`):
- `config.py` — env-driven `Config` (+ `.env` loader). `spark/client.py` — the SINGLE gateway chokepoint
(no other module knows the gateway URL); scrub/rehydrate live here.
- `ingest/`: `edgar` (SEC), `earnings` (FMP REST), `feeds`+`podcasts` (RSS), `download`, `chunker`,
`transcribe_worker` (local Parakeet), `gemini_transcribe` (bulk overflow), `docs` (HTML/PDF/RSS text
fetcher for policy/lender/research), `identify`, `speaker_stitch`.
- `extract/`: `claims`+`worker` (proposition extraction), `backends` (LocalQwen | Gemini), `prompt`,
`html_text`. `embedstore/`: `embedder` + `qdrant_store` (hybrid dense+BM25).
- `signals/` (the scoring brain) — `independence` (EISC), `asof` (look-ahead guard), `windows`,
`under_acted` (Job B), `bar` (two-tier gate), `two_sided` (affirms-minus-denies net-corroboration),
`llm_helpers` (`derivative_relevance`), `confusion` (precision/recall), `external` (price/outcome
fetcher), `ledger_writer` (§6.6 prediction ledger), `resolver` (stub), `run`.
- `store/`: `db` (SQLite + idempotent migrations), `schema.sql`, `seed`, `sources`. `backfill/queue.py`
(the job queue). `ui/app.py` (FastAPI corpus/eval UI). `util.py`.
- Data lands in `data/` (gitignored): `signal.db`, `transcripts/`, `docs/`, `audio-cache/`.

**Flow:** seed sources/convictions/fanout → ingest (→ `documents` + `transcribe`/`extract` jobs) →
`run-transcribe` / `run-extract` drain the queue → `claims` → `embed-claims` (Qdrant) → scorers
(`backtest`, `two-sided`) read the proposition store as-of a date.
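
The last step of the flow, reading the proposition store as-of a date, is what keeps a backtest from seeing the future. A minimal sketch of such a look-ahead guard follows, assuming a `claims` table with an ISO-8601 `published_at` column. The table layout, column names, and sample rows are hypothetical; the real schema lives in `schema.sql` and the real guard in `signals/asof`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (id INTEGER PRIMARY KEY, text TEXT, published_at TEXT)"
)
conn.executemany(
    "INSERT INTO claims (text, published_at) VALUES (?, ?)",
    [
        ("claim A", "2023-03-01"),
        ("claim B", "2024-09-20"),
    ],
)


def claims_as_of(conn, as_of):
    """Return only claims published on or before `as_of` (an ISO date string).

    ISO-8601 dates sort lexicographically, so a plain string comparison is enough.
    """
    rows = conn.execute(
        "SELECT text FROM claims WHERE published_at <= ? ORDER BY published_at",
        (as_of,),
    )
    return [row[0] for row in rows]


print(claims_as_of(conn, "2023-12-31"))  # ['claim A']
print(claims_as_of(conn, "2024-12-31"))  # ['claim A', 'claim B']
```
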
## Build / run
- **Setup:** virtualenv at `.venv` (Python 3.14). `.venv/bin/pip install -r requirements.txt`.
- **Invoke:** `.venv/bin/python -m signal_engine <cmd>`. **`--help` is authoritative**; the rest is a
map: `init-db`; seeding `seed-sources`/`seed-convictions`/`seed-fanout`/`seed-edges`/`load-feeds`;
ingest `ingest-edgar`/`ingest-earnings`/`ingest-podcast`/`ingest-doc`/`ingest-doc-manifest`/
`ingest-feed-text`; queue drain `run-transcribe`/`run-transcribe-gemini`/`run-extract`; index
`embed-claims`/`search`; score `backtest`/`two-sided`/`confusion-matrix`; inspect `queue-status`/
`spark-status`/`feed-peek`/`provenance`/`db-tables`; `serve` (UI).
- **DB:** `python -m signal_engine init-db` (idempotent — re-creates schema + runs additive migrations).
- **Tests:** ⚠️ **no automated test suite yet** (no `tests/`, no pytest). Verification is by running
commands against the live gateway. Adding a test harness is on the ROADMAP.
- **Lint/format:** none configured. Match the surrounding style (dense, §-referenced docstrings).
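
As an illustration of what "idempotent" means for `init-db`, here is a minimal sketch of an additive SQLite migration that is safe to run repeatedly. The `documents` columns and the `cluster` column are hypothetical; the actual schema and migrations are in `store/db` and `schema.sql`.

```python
import sqlite3


def column_exists(conn, table, column):
    """True if `table` already has `column` (row[1] of PRAGMA table_info is the name)."""
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def init_db(conn):
    """Create the schema if missing, then apply additive migrations. Safe to re-run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (id INTEGER PRIMARY KEY, url TEXT)"
    )
    # Additive migration: only add the column when it is not there yet.
    if not column_exists(conn, "documents", "cluster"):
        conn.execute("ALTER TABLE documents ADD COLUMN cluster TEXT")
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # second run is a no-op rather than an error
columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
print(columns)  # ['id', 'url', 'cluster']
```
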
## Spark Control infra (`SPARK_CONTROL_URL`, self-signed TLS → `SPARK_VERIFY_TLS=false`)
One gateway fronts two DGX Sparks: **vLLM** `RedHatAI/Qwen3.6-35B-A3B-NVFP4` on `:103`; **Parakeet**
ASR + diarizer, **bge-m3** embeddings, **Qdrant** on `:87`. The gateway is the only URL anything calls.
- **AUDIO concurrency (learned 2026-06-09):** single serial GPU shared with the operator's production
meeting app. Cap **2 in-flight (ceiling 3), GLOBAL across both audio endpoints** — a process-wide
`BoundedSemaphore` (`AUDIO_CONCURRENCY` env, default 2). Going wider buys zero throughput. Transient
14s "busy blips" (broken-pipe/503/timeout) are NOT failures → short retry-backoff. The
`transcribe_worker` runs a 2-wide chunk pool; the old size-1 lock was ~2.5× slower.
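
The audio rule above (a process-wide cap plus short retry-backoff on busy blips) can be sketched as follows. This is an illustration under stated assumptions, not the code in `spark/client.py`: `BusyBlip`, `call_audio`, the retry count, and the delays are hypothetical, and the caller is assumed to map broken-pipe/503/timeout errors onto `BusyBlip`.

```python
import os
import threading
import time

# One semaphore for the whole process, shared by every audio endpoint.
AUDIO_SLOTS = threading.BoundedSemaphore(int(os.environ.get("AUDIO_CONCURRENCY", "2")))


class BusyBlip(Exception):
    """Transient gateway hiccup (broken pipe, 503, timeout): retry, do not fail."""


def call_audio(fn, *args, retries=4, base_delay=0.5):
    """Run `fn` under the global audio cap, backing off briefly on busy blips."""
    for attempt in range(retries + 1):
        with AUDIO_SLOTS:
            try:
                return fn(*args)
            except BusyBlip:
                if attempt == retries:
                    raise
        # Sleep outside the `with` so the slot is free while this call waits.
        time.sleep(base_delay * 2 ** attempt)


# Hypothetical transcription call that blips twice before succeeding.
attempts = []


def flaky_transcribe(chunk):
    attempts.append(chunk)
    if len(attempts) < 3:
        raise BusyBlip("503 from gateway")
    return f"transcript of {chunk}"


result = call_audio(flaky_transcribe, "chunk-0", base_delay=0.01)
print(result, len(attempts))
```

Releasing the slot before sleeping matters on a shared serial GPU: a call that is backing off should not hold one of the two in-flight positions.
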
## Key operational rules (learned this build — easy to get wrong)
- **`own_network` quarantine is MATERIALITY-driven, not "any investment."** Quarantine (drop in live
scoring, keep in test) only for MATERIAL ties where the source is part of Ten31's voice: the partners'
own shows (TFTC, Citadel Dispatch, Rabbit Hole Recap), the Battery *partnership*, material portfolio
leads. **Immaterial passive stakes → INDEPENDENT** (River and Swan/Cafe Bitcoin were corrected to
independent). Unconfirmed: Unchained, Debifi, Coinkite (held quarantined pending Grant's materiality call).
- **Gemini quota is a rolling ~24h window** (~291 hour-long episodes / ~51M tokens), not a calendar-day
reset. Bulk transcription overflows there; expect 429 RESOURCE_EXHAUSTED past the window.
- **Scoring-brain internals are scoped to a guide.** Before editing `signal_engine/signals/`, read
**`docs/guides/scoring-brain.md`** — the classifier invariants (REALIZED-ONLY, ROLE-MATCH, claim_type
hard-evidence guard, max_tokens budget, claim_id bracket-strip), the EISC cluster-cap, and the
Battery/Strike adversarial-test PASS criteria. Don't regress those invariants (they're what make
Battery pass). Full decision log: `DESIGN_v2.md`.
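
The `own_network` rule at the top of this list (drop quarantined sources in live scoring, keep them in test) can be sketched as a mode-dependent filter. The `Source` dataclass and `sources_for_mode` are hypothetical names; only the live/test behaviour and the example classifications (TFTC quarantined, River independent) come from this file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    own_network: bool  # True only for MATERIAL ties, not for any passive stake


def sources_for_mode(sources, mode):
    """Live scoring drops quarantined sources; test mode keeps every source."""
    if mode == "live":
        return [s for s in sources if not s.own_network]
    if mode == "test":
        return list(sources)
    raise ValueError(f"unknown mode: {mode!r}")


sources = [
    Source("TFTC", own_network=True),    # a partners' own show: quarantined
    Source("River", own_network=False),  # immaterial passive stake: independent
]

print([s.name for s in sources_for_mode(sources, "live")])  # ['River']
print([s.name for s in sources_for_mode(sources, "test")])  # ['TFTC', 'River']
```
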
## Secrets / env
Real values live in **`.env`** (gitignored). `.env.example` lists the names. Keys used: `SPARK_CONTROL_URL`,
`SPARK_VERIFY_TLS`, `LOCAL_LLM_MODEL`, `EMBED_MODEL`, `TRANSCRIBE_MODEL`, `AUDIO_CONCURRENCY`,
`EXTRACTION_BACKEND`, `GEMINI_API_KEY`, `GEMINI_MODEL`, `ANTHROPIC_API_KEY`, `FMP_API_KEY`,
`EDGAR_USER_AGENT`, `DATA_DIR`, `UI_PORT`, `LOG_LEVEL`. Never commit key values; the private LAN gateway
IP appears only as an env-var default.
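
For orientation, an env-driven config with a `.env` loader might look like the sketch below. It is not the contents of `config.py`: the parser, the field selection, and the `https://gateway.example` URL are assumptions, and only the variable names and the `AUDIO_CONCURRENCY` default of 2 come from this file.

```python
from dataclasses import dataclass


def load_dotenv_text(text, environ):
    """Parse KEY=VALUE lines into `environ`; values already set there win."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


@dataclass(frozen=True)
class Config:
    spark_control_url: str
    verify_tls: bool
    audio_concurrency: int

    @classmethod
    def from_env(cls, environ):
        return cls(
            spark_control_url=environ["SPARK_CONTROL_URL"],
            verify_tls=environ.get("SPARK_VERIFY_TLS", "true").lower()
            not in ("false", "0", "no"),
            audio_concurrency=int(environ.get("AUDIO_CONCURRENCY", "2")),
        )


# A plain dict stands in for os.environ so the example has no side effects.
env = {"AUDIO_CONCURRENCY": "3"}
dotenv_text = """
# placeholder values, not real ones
SPARK_CONTROL_URL=https://gateway.example
SPARK_VERIFY_TLS=false
AUDIO_CONCURRENCY=2
"""
load_dotenv_text(dotenv_text, env)
cfg = Config.from_env(env)
print(cfg)
```

In this sketch a value already present in the environment overrides the `.env` file, so `audio_concurrency` ends up as 3.
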
## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)
- **Battery adversarial test: PASSES.** Corpus built (23 docs via the `docs` fetcher); after the three
scoring fixes the engine reads demand-net rising (+3.9) while **supply stays flat at 0.0** — correctly
rejecting Cantor's *announced* $2B and borrower-side collateral claims as not-realized-supply.
- **Strike adversarial test: RUNNING (extraction phase) — no result yet.** The independent leg (What
Bitcoin Did, Stephan Livera, Kevin Rooke, Anita Posch, Cafe Bitcoin, + River research — all
independent) is ~586/671 transcribed (60 stragglers). `run_strike_pipeline.sh` proceeded on that
partial corpus and is in the SLOW extraction phase (~600 podcast extract jobs on local Qwen);
`two-sided STRIKE2022` (live vs test reflexivity) has NOT produced a result yet — watch
`data/strike_pipeline.log`. If it stalls, resume manually: `run-extract` → `embed-claims` →
`two-sided --conviction STRIKE2022 --modes live,test`. The Spark **audio fix** (semaphore-of-2 +
retry-backoff) is committed and validated (~2.5× faster, zero episode aborts).
- **§7.1 power-infra backtest:** qualified YES (corpus-gated; runway/precision caveats in `DESIGN_v2.md`).
- Corpus now spans bitcoin podcasts, SEC/FMP company filings (incl. 6 major banks + Robinhood, a new
`banks` cluster), the Battery text corpus, and River research. EISC edges seeded for the bitcoin cluster.