Files

135 lines
10 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Ten31 Signal Engine — AGENTS.md
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for items
> tagged `(ten31-signal-engine)` and surface them before proposing next steps; triage with `/triage`.
A recurring pipeline that ingests a growing corpus of **audio** (podcasts, YouTube) and **text** (SEC
filings, earnings calls, policy/lender/research docs), extracts structured **propositions ("claims")**,
and surfaces **signal over time** through Ten31's investment thesis as a *relevance lens* — logging every
surfaced signal as a **falsifiable prediction** scored against reality.
**Source of truth (in order):** `ten31-signal-engine-handoff.md` (the spec — wins on any conflict; §refs
point into it) `DESIGN_v2.md` (the living decision/falsification log — read before changing scoring)
this file's **Current state**. `README.md` is the user-facing intro.
## The spine — NON-NEGOTIABLE guardrails (never violate)
1. **Nominate-then-judge.** Statistics & graph structure NOMINATE candidates; the frontier model only
JUDGES / FANS OUT a pre-filtered shortlist. The frontier never nominates from the raw corpus.
2. **Propositions, not vibes.** Extract atomic claims; separate **topic** from **stance**.
3. **Discount convergence by connectedness.** Independence is earned, not counted — the EISC graph
(source edges + voiceprints) downweights echo. Bitcoin is one *capped* cluster: within-cluster
agreement can NOT masquerade as independent corroboration; cross-cluster earns the multiplier.
4. **Thesis is a LENS, not a gate on truth.** The engine must surface signals *against* Ten31's thesis,
not just for it.
5. **Dual-evaluation ledger from day one** — precision AND recall; every signal is a logged prediction.
6. **~95% local compute via Spark Control.** Call the gateway's HTTP endpoints; do NOT stand up your own
vLLM / Whisper / Qdrant. Gemini is an explicit *overflow* lever for PUBLIC data only.
7. **Sovereignty boundary (hard).** Exposure/positioning/conviction data and the Strike/Battery
investment memos NEVER go to the frontier. Route sensitive frontier calls through
`/scrub → frontier → /rehydrate` (scrub identities, not substance). Read the memos LOCALLY only.
**Two jobs:** **A — Discovery** (emergent themes via independent cross-cluster *convergence* scored on
acceleration; contrarian stances; their intersection). **B — Conviction-action gap** (fan held
convictions to 2nd/3rd-order derivatives, catch early corroboration — the countermeasure to the 2023
"power is the binding constraint on AI/compute" miss: right on the root, late to the derivatives).
## Architecture
`signal_engine/` (Python package, run as `python -m signal_engine <cmd>`):
- `config.py` — env-driven `Config` (+ `.env` loader). `spark/client.py` — the SINGLE gateway chokepoint
(no other module knows the gateway URL); scrub/rehydrate live here.
- `ingest/``edgar` (SEC), `earnings` (FMP REST), `feeds`+`podcasts` (RSS), `download`, `chunker`,
`transcribe_worker` (local Parakeet), `gemini_transcribe` (bulk overflow), `docs` (HTML/PDF/RSS text
fetcher for policy/lender/research), `identify`, `speaker_stitch`.
- `extract/``claims`+`worker` (proposition extraction), `backends` (LocalQwen | Gemini), `prompt`,
`html_text`. `embedstore/``embedder` + `qdrant_store` (hybrid dense+BM25).
- `signals/` (the scoring brain) — `independence` (EISC), `asof` (look-ahead guard), `windows`,
`under_acted` (Job B), `bar` (two-tier gate), `two_sided` (affirmsdenies net-corroboration),
`llm_helpers` (`derivative_relevance`), `confusion` (precision/recall), `external` (price/outcome
fetcher), `ledger_writer` (§6.6 prediction ledger), `resolver` (stub), `run`.
- `store/``db` (SQLite + idempotent migrations), `schema.sql`, `seed`, `sources`. `backfill/queue.py`
(the job queue). `ui/app.py` (FastAPI corpus/eval UI). `util.py`.
- Data lands in `data/` (gitignored): `signal.db`, `transcripts/`, `docs/`, `audio-cache/`.
**Flow:** seed sources/convictions/fanout → ingest (→ `documents` + `transcribe`/`extract` jobs) →
`run-transcribe` / `run-extract` drain the queue → `claims``embed-claims` (Qdrant) → scorers
(`backtest`, `two-sided`) read the proposition store as-of a date.
## Build / run
- **Setup:** virtualenv at `.venv` (Python 3.14). `.venv/bin/pip install -r requirements.txt`.
- **Invoke:** `.venv/bin/python -m signal_engine <cmd>`. **`--help` is authoritative**; the rest is a
map: `init-db`; seeding `seed-sources`/`seed-convictions`/`seed-fanout`/`seed-edges`/`load-feeds`;
ingest `ingest-edgar`/`ingest-earnings`/`ingest-podcast`/`ingest-doc`/`ingest-doc-manifest`/
`ingest-feed-text`; queue drain `run-transcribe`/`run-transcribe-gemini`/`run-extract`; index
`embed-claims`/`search`; score `backtest`/`two-sided`/`confusion-matrix`; inspect `queue-status`/
`spark-status`/`feed-peek`/`provenance`/`db-tables`; `serve` (UI).
- **DB:** `python -m signal_engine init-db` (idempotent — re-creates schema + runs additive migrations).
- **Tests:** ⚠️ **no automated test suite yet** (no `tests/`, no pytest). Verification is by running
commands against the live gateway. Adding a test harness is on the ROADMAP.
- **Lint/format:** none configured. Match the surrounding style (dense, §-referenced docstrings).
## Spark Control infra (`SPARK_CONTROL_URL`, self-signed TLS → `SPARK_VERIFY_TLS=false`)
One gateway fronts two DGX Sparks: **vLLM** `RedHatAI/Qwen3.6-35B-A3B-NVFP4` on `:103`; **Parakeet**
ASR + diarizer, **bge-m3** embeddings, **Qdrant** on `:87`. The gateway is the only URL anything calls.
- **AUDIO concurrency (learned 2026-06-09):** single serial GPU shared with the operator's production
meeting app. Cap **2 in-flight (ceiling 3), GLOBAL across both audio endpoints** — a process-wide
`BoundedSemaphore` (`AUDIO_CONCURRENCY` env, default 2). Going wider buys zero throughput. Transient
14s "busy blips" (broken-pipe/503/timeout) are NOT failures → short retry-backoff. The
`transcribe_worker` runs a 2-wide chunk pool; the old size-1 lock was ~2.5× slower.
## Key operational rules (learned this build — easy to get wrong)
- **`own_network` quarantine is MATERIALITY-driven, not "any investment."** Quarantine (drop in live
scoring, keep in test) only for MATERIAL ties where the source is part of Ten31's voice: the partners'
own shows (TFTC, Citadel Dispatch, Rabbit Hole Recap), the Battery *partnership*, material portfolio
leads. **Immaterial passive stakes → INDEPENDENT** (River and Swan/Cafe Bitcoin were corrected to
independent). Unconfirmed: Unchained, Debifi, Coinkite (held quarantined pending Grant's materiality call).
- **Gemini quota is a rolling ~24h window** (~291 hour-long episodes / ~51M tokens), not a calendar-day
reset. Bulk transcription overflows there; expect 429 RESOURCE_EXHAUSTED past the window.
- **Transcript chunking is recall-first and MUST cap every chunk.** ASR transcripts have NO blank-line
paragraphs (speaker turns joined by a single `\n`), so `extract.claims.chunk_text` falls through
`\n\n``\n`→sentence→word→hard-slice; splitting only on `\n\n` (the old bug) sent whole 23 h episodes
in ONE call → context-overflow 400s. Extraction defaults to full coverage at 12K chars/chunk
(`run-extract --chunk-chars/--max-chunks`); bigger chunks risk lost-in-the-middle recall loss.
- **Gemini extraction backend: disable thinking AND set a timeout.** `gemini-2.5-flash` thinks by
default and burns the output-token budget on reasoning → MAX_TOKENS → truncated JSON → 0 claims, so
the backend sets `thinking_budget=0` (mirrors local `enable_thinking=False`). It also sets an HTTP
**timeout (120 s) + 4 retries** — a timeout-less call once hung the single-threaded worker ~50 min
(transient 504/read-timeouts then self-heal). Gemini = overflow for PUBLIC data only; keep
`EXTRACTION_BACKEND=local` in `.env`, flip it inline per-run when overflowing.
- **Scoring-brain internals are scoped to a guide.** Before editing `signal_engine/signals/`, read
**`docs/guides/scoring-brain.md`** — the classifier invariants (REALIZED-ONLY, ROLE-MATCH, claim_type
hard-evidence guard, max_tokens budget, claim_id bracket-strip), the EISC cluster-cap, and the
Battery/Strike adversarial-test PASS criteria. Don't regress those invariants (they're what make
Battery pass). Full decision log: `DESIGN_v2.md`.
## Secrets / env
Real values live in **`.env`** (gitignored). `.env.example` lists the names. Keys used: `SPARK_CONTROL_URL`,
`SPARK_VERIFY_TLS`, `LOCAL_LLM_MODEL`, `EMBED_MODEL`, `TRANSCRIBE_MODEL`, `AUDIO_CONCURRENCY`,
`EXTRACTION_BACKEND`, `GEMINI_API_KEY`, `GEMINI_MODEL`, `ANTHROPIC_API_KEY`, `FMP_API_KEY`,
`EDGAR_USER_AGENT`, `DATA_DIR`, `UI_PORT`, `LOG_LEVEL`. Never commit key values; the private LAN gateway
IP appears only as an env-var default.
## Current state (snapshot — overwrite each session; longer-term backlog → `ROADMAP.md`)
- **Strike adversarial test: CONDITIONAL PASS (2026-06-16).** Pipeline complete end-to-end — extraction
drained the last 63 filing jobs (Gemini, 3,330 claims, 0 failures) → **56,008 claims** all embedded in
Qdrant → `two-sided --conviction STRIKE2022 --modes live,test`. The engine **refuses the false positive**:
the Lightning-retail nodes net `+0.25` (capped single bitcoin cluster, ≪ `EISC_FLOOR=2.0`), §3 guardrail
holding. The own_network-drop *reflexivity demo* is unexercised (`own_net=0`, live==test) because
RHR/CD/Bitcoin.Review (169 eps) were deferred at transcription 2026-06-08; operator accepted the
conditional pass, no audio-GPU spend now. How to read the net value + the full follow-up:
`docs/guides/scoring-brain.md` (STRIKE2022) and `ROADMAP.md`.
- **Battery test PASSES; §7.1 power-infra qualified YES** (both unchanged).
- **Corpus:** bitcoin podcasts (own_network: TFTC partial 19/80; RHR/CD/Bitcoin.Review deferred), SEC/FMP
filings (`banks` cluster now extracted + power-infra names Oklo/NuScale/Cipher/TeraWulf), Battery corpus,
River research; EISC edges seeded for the bitcoin cluster.
- **Repo:** clean, in sync with `origin/main` (Gitea). No automated test suite (on ROADMAP).
- **NEXT (priority order, all fresh scopes — confirm direction first):** (1) frontier-fan-out test H6, the
untested half of the §1.1 validation; (2) complete the Strike reflexivity demo when audio-GPU budget
allows (un-defer RHR/CD 202223 → re-extract → re-run); (3) Job A discovery scorers for the forward pilot.