ten31-signal-engine/ten31-signal-engine-handoff.md

# Ten31 Signal Engine — Build Handoff (Project 2 of 2)

**Audience:** Claude Code
**Owner:** Grant
**Status:** v2 spec. Pilot-first. Several items flagged DESIGN-FIRST must be resolved before scaling.
**Companion doc:** Project 1 (voice/writing assistant) is a separate, independent build with a different scope and hardware footprint. This doc covers Project 2 only.

---

## 1. What we're building (in one paragraph)

A recurring pipeline that ingests a large, growing corpus of audio (podcasts, YouTube) and text (SEC filings, earnings-call transcripts), extracts structured *claims* from it, and surfaces signal over time, all filtered through Ten31's investment thesis as a relevance lens (Section 3). Every surfaced signal is logged as a falsifiable prediction so the system can be scored against reality over time.

The system serves **two distinct jobs**, and the design must do both:

- **Job A — Discovery.** Surface emerging themes (including non-obvious 2nd/3rd-order themes) and contrarian signals that Grant does not yet see. Themes are detected as *convergence* of independent sources; contrarian as a credible minority position actively arguing against an established consensus; and the prize is their intersection — a credible minority view that is also *accelerating* (a consensus about to flip).
- **Job B — Closing the conviction-action gap.** Take convictions Ten31 *already holds*, fan them out to their 2nd/3rd-order consequences, and flag the derivative branches where the world is beginning to corroborate something Ten31 has little or no exposure to. This is designed against a specific, expensive, real failure (see Section 1.1): being *right about a root theme and late to its derivatives.*

This is **not** a summarizer, and it is **not** a free-associating "find me something interesting" engine. The discipline that separates signal from plausible-sounding noise is the hard constraint in Section 5: **statistics and graph structure nominate candidates; the frontier model only judges (and fans out from) a pre-filtered shortlist.**

### 1.1 The failure mode Job B exists to prevent (the AI/compute miss)

Three years ago Ten31 co-founder Jonathan Kirkwood publicly articulated, with conviction, that bitcoin mining and AI were both "distributed compute," that interruptible/flexible load was the differentiator, and that the world would need to ~1000x rack space over the decade. **The root call was correct and early.** What was missed was not the prediction — it was the *derivative tree*: if compute explodes and power becomes the binding constraint, then the investable consequences are grid interconnect, transformers, substations, cooling, gas turbines, nuclear, uranium, and the public picks-and-shovels of the buildout — much of which was extraordinarily profitable 2023–2025. Ten31 held the seed conviction but did not systematically fan it out and size up behind the branches; in at least one case (Giga) the AI tailwind arrived by accident rather than as the thesis. **Job B is the institutional countermeasure: never again be right about a root theme and under-act on its derivatives.** (Note: this fan-out logic is Ten31's own — the Platform essay frames the strategy as Sequoia's "aircraft carrier" approach of backing a thesis *and its second-order beneficiaries*: microprocessor → Atari → Apple → Oracle/Cisco. Job B operationalizes exactly that.)

---

## 2. The core conceptual model (read this before coding)

Three things people get wrong about this kind of system. The architecture exists to avoid all three:

1. **An isolated point in embedding space is almost never "early insight" — it's almost always noise** (ad reads, tangents, mistranscriptions, one-off anecdotes). Do not hunt for lonely outliers. Real signal is *independent convergence*: the same non-obvious idea appearing across sources that share no guests, community, or topic.

2. **Embeddings are bad at stance and negation.** "Rate cuts cause a recession" and "rate cuts do NOT cause a recession" embed almost on top of each other. Therefore we never rely on raw-chunk vector distance to separate positions. We extract `stance` as a structured field *before* embedding.

3. **A frontier model asked "what's the non-obvious connection here?" will ALWAYS produce one**, whether or not it's real. It never returns "nothing." So the model is never allowed to nominate candidates from the raw corpus, nor to have its fan-out branches trusted on their own. It only ever assesses candidates that already passed a quantitative bar, and a fanned-out derivative only becomes a signal once independent corpus corroboration confirms it.

Geometry and stats nominate. The model judges and expands. The world (the prediction ledger) is the final arbiter.

---

## 3. The relevance lens AND the seed convictions — Ten31's thesis

This is the lens that decides what's *relevant* (Job A) and the set of seed convictions the system fans out (Job B). It must NOT decide what's *true*: the system must remain able to surface signals that cut against this thesis (see Section 5). Use the operator's exact framing below.

### Root (the forcing function)
Debasement is the macro forcing function. The cost of everything *reproducible* — cognition, content, code, media — collapses toward zero, so value migrates to the few things that are scarce, verifiable, or owned. **AI is the abundance engine. Bitcoin is the scarcity anchor.** These are not two trends Ten31 straddles; they are two faces of one **megatrend**. Ten31 does not invest in AI because it is growing — it invests where AI's abundance *increases the value of the scarce and the verifiable*, which is the same thing Ten31 has invested in for a decade.

Framing discipline: **Ten31 is not "adding AI."** Ten31 has been making AI-adjacent investments for years (see below); they were simply filed under bitcoin. The thesis already spanned AI by construction.

### Bitcoin
The apex non-debasable reserve asset that capital progressively converges on to preserve wealth. (NOT framed as a settlement/medium-of-exchange layer — see weak forms, Section 9.)

### The three investable seams (picks-and-shovels at the convergence)
Ten31 backs the **indispensable enabling infrastructure** at the point where two megatrends meet — companies whose demand is *structurally redundant across both trends*, so the investment does not depend on which trend wins. This is NOT rent-extraction / "toll roads"; it is providing the critical infrastructure the buildout cannot happen without, positioned at the seam.

1. **Energy <-> Compute.** Power infrastructure, equipment, contracts, and software serving both bitcoin mining and AI data centers (portfolio: Giga Energy, Satoshi Energy, Upstream Data). Differentiator: **bitcoin-mining fluency is the underwriting lens for the AI energy buildout.** Miners were the first large-scale flexible, interruptible, behind-the-meter, location-agnostic electrical load in history; that exact playbook is what AI data centers now need. Mining isn't the bet — it's the training ground. Almost no generalist energy or AI fund has this fluency.
2. **Debasement <-> Bitcoin.** Bitcoin as **pristine collateral** (the best collateral ever created: liquid, 24/7, divisible, verifiable, un-debasable) plus the **picks-and-shovels infrastructure that makes it easier to access, hold, leverage, and utilize** (portfolio: River, Unchained, Strike, Battery, debifi, AnchorWatch).
3. **AI <-> Data-Ownership.** Keep proprietary data and inference under your own roof instead of feeding it to a third party that trains on it, monetizes it, or can cut you off (portfolio: Start9, OpenSecret/Maple). The indispensable option for those who cannot cede control — regulated industries, trade secrets, adversarial jurisdictions, the targeted.

### Connective logic (NOT a sub-sector to fund)
AI belongs in the mandate because the scarcity thesis is *incomplete without the abundance force creating it* — not because AI is adjacent or hot. The narrow playground: **only where AI collides with scarcity, energy, and data-ownership.** Not foundation models, not generic AI apps. Right to win: **when AI hits bitcoin, energy, and freedom tech, Ten31 sees it first and underwrites it best** — a decade of relationships and operating fluency a generalist AI fund lacks and a bitcoin fund without AI understanding lacks.

### Already-AI track record (proof, not forecast)
The mandate is demonstrated by existing investments, several years old, each landing on a named seam:
- Energy<->compute: Giga Energy, Satoshi Energy
- AI<->data-ownership: Start9, OpenSecret/Maple
- AI<->scarcity/verification: StatMuse, Stakwork, Vida
(Confirm exact dates and seam assignment per company when finalizing the pitch.)

### Free optionality (NOT load-bearing)
Censorship-resistant settlement for machine-to-machine payments — bitcoin/lightning wins the permissionless, un-freezable, cross-border margin that stablecoins structurally can't serve. Carried as upside, not underwritten.

### 3.1 The conviction log (seed nodes for Job B)
A maintained, human-owned list of beliefs Ten31 holds, each with **conviction** (low/med/high), **current exposure** (none/low/med/high), and a **disconfirming signal**. These are the seed nodes the system fans out (4.6) and the basis for the under-acted-conviction signal (4.4). Highest-leverage input to Job B; keep editable.

**Critical structural rule — conviction = team x thesis, but the engine can only track the thesis half.** Each entry separates the *trackable thematic proposition* (what the corpus/world can corroborate, what gets fanned out and scored) from *team conviction* (logged as context only — no podcast can resolve whether a given founder out-executes). The engine must never present corroboration of a theme as validation of the team bet beneath it.

Initial draft (v1 — operator to finalize levels):

**Root**
- **R1 Debasement / neutral reserve** — sovereign debt keeps being monetized not repaid; fiat debasement persists; bitcoin adopted as the neutral non-debasable reserve capital migrates to. Conviction HIGH / Exposure pervasive. Disconfirm: durable fiscal surpluses + falling debt/GDP + no reserve diversification.
- **R2 Abundance/scarcity** — AI drives marginal cost of the reproducible toward zero; value accrues to the scarce/verifiable; bitcoin is the "strongest horse," gains relative share, pricing-in-bitcoin grows. Conviction HIGH / Exposure thesis-wide. Disconfirm: scarce/verifiable assets earn no premium as AI content saturates.
- **R3 Sovereign + institutional adoption catalyst** — strategic bitcoin reserves (US / nation-states), SAB-121 repeal enabling bank custody, and ETF/treasury inflows provide a price-inelastic bid and invert allocator career risk (ignoring bitcoin becomes the risk). Conviction MED-HIGH / Exposure pervasive (esp. custody/credit names). Disconfirm: reserve plans stall or reverse; banks stay out; policy turns adversarial.

**Energy <-> Compute**
- **E1 Power not chips is the binding constraint** on AI buildout through ~2027-28; seam picks-and-shovels under-priced. Conviction HIGH / Exposure MED-HIGH (Giga, Satoshi Energy). Disconfirm: chips/capital remain bottleneck; interconnect clears fast.
- **E2 Miner flexible-load playbook goes mainstream** (demand response, behind-the-meter) for AI data centers + grids; mining fluency = transferable underwriting edge. Conviction HIGH / Exposure MED (Giga power-market optimization, Satoshi). Disconfirm: data centers reject flexible-load; fluency non-transferable.
- **E3 Straddlers beat pure-plays** — mining-native operators that pivot into/straddle AI/HPC capture the convergence; mining-only underperforms. Conviction MED / Exposure Giga (straddle) vs Upstream (mining-only, "whiffing on AI"). Disconfirm: pure-play mining outperforms straddlers.

**Debasement <-> Bitcoin (pristine collateral + picks-and-shovels)**
- **D1 Bitcoin-as-collateral goes mainstream** — new BTC-collateralized credit products proliferate; spreads compress; >=1 major traditional institution enters within 24-36 mo. Conviction HIGH / Exposure HIGH (Strike; Battery, Unchained, debifi, AnchorWatch). Disconfirm: stays crypto-native niche; no incumbent entry; spreads hold. *Scarcity-amplification mechanism:* as credit / dual-collateral / insurance products mature, holders borrow rather than sell, shrinking marginal supply (the "scarcer than you think" dynamic), pairing BTC with the ~$46T US credit / ~$4T real estate / ~$1T insurance markets. Battery = dual-collateral real-asset loans (effectively a cheap long-dated BTC call option for borrowers; thesis strong but execution lagging per operator). AnchorWatch = full BTC insurance in multisig (Miniscript) = the fiduciary unlock.
- **D2 Incumbents buy not build** — legacy finance/tech acquires bitcoin-natives rather than building (the published exit thesis). Conviction HIGH / Exposure portfolio-wide. Disconfirm: incumbents build in-house or via crypto-generalists; no strategic M&A.
- **D3 Bitcoin commercialization of legacy operating businesses** — compressed-multiple firms become structurally advantaged rearchitected around bitcoin (treasury, settlement, self-hosted infra, stranded energy). Conviction MED-HIGH / Exposure enablers (Fold, AnchorWatch, Giga/Upstream). Disconfirm: legacy adoption stalls; no margin advantage.
- **D4 Strike re-rates as a bitcoin bank, not payments** — market values it as exchange + major retail BTC-collateralized lender + global access (70+ jurisdictions), not legacy payments. Conviction HIGH (largest position, ~40%) / Exposure HIGH / team conviction high (separate). Disconfirm: stays valued/stuck as payments; lending/exchange don't scale.

**AI <-> Data-Ownership (PRIME under-acted-conviction target — mirrors the 2023 AI/compute miss)**
- **A1 Coherence: owned judgment is the last margin** — AI commoditizes competence, profit on undifferentiated output erodes toward zero; durable margin needs owned/protected proprietary data + judgment; demand grows for sovereign-root + confidential-inference infra. Conviction HIGH (thematic) / Exposure LOW (Start9, OpenSecret/Maple, maybe Primal; small checks). Disconfirm: enterprises cede data/inference with no margin penalty.
- **A2 Sovereign option for the segment that can't cede** (regulated, IP-sensitive, adversarial jurisdictions) adopts owned infra + confidential inference even as the majority cedes to convenience. Conviction MED / Exposure LOW. Disconfirm: even IP-sensitive segment fully cedes.
- **A3 Start9 broadens beyond the niche** (SaaS->on-prem reversion). Conviction LOW / explicitly uncertain (team high, theme unproven — "maybe drinking our own koolaid, tbd") / Exposure LOW. Disconfirm: stays bitcoiner-niche.

**Monitored thesis-breakers (engine must surface these against the thesis)**
- **B1 Quantum acceleration** compresses CRQC timelines inside NIST 2035 before mitigations deploy (bitcoin-leg breaker).
- **B2 AI permanently outbids mining for power**, pushing mining to only truly-stranded margin (energy-leg breaker).
- **B3 Stablecoins/CBDCs capture the neutral-reserve role** or bitcoin fails as the exit (tests the complementary-stablecoin view).

Note: the highest-value early use of the engine is pressuring **A1/A2** — high published conviction, low exposure, world beginning to corroborate — exactly the shape of the prior miss. A3 and E3 are deliberately low-conviction seeds the engine should help resolve.

---

## 4. Architecture — pipeline layers

Design target: **~95% of compute runs locally** (Grant operates dual DGX Spark running Qwen3 via vLLM). Frontier API is used ONLY at the final synthesis/judgment/fan-out step on a small shortlist. Both a cost decision (bulk extraction at frontier prices would be an order of magnitude more expensive) and a data-sovereignty decision.

```
[Ingestion] -> [Extraction (LOCAL)] -> [Embedding+Store] -> [Cluster + Temporal + Graph]
   -> [Candidate scoring: emergence / contrarian / intersection / under-acted-conviction]
   -> [Frontier judge + conviction fan-out (SHORTLIST / SEEDS ONLY)]
   -> [Dual-evaluation ledger: human ratings + falsifiable predictions w/ lead time]
```

### 4.1 Ingestion
- Podcasts/YouTube: pull via RSS feeds and YouTube; download audio; **transcribe locally through the operator's existing Spark Control gateway, NOT a Whisper deployment you stand up.** Transcription is a live OpenAI-compatible endpoint: `POST /v1/audio/transcriptions` backed by **NVIDIA Parakeet TDT 0.6B** (`response_format=verbose_json` gives word- and segment-level timestamps). It runs ~60x real-time on the operator's GPU, so transcription is not the bottleneck. See §10 and §13 for the full endpoint list and exactly what you must build vs. call.
- **Speaker labels are available — and they're more useful than the doc's "where possible" implies.** Spark Control also exposes **diarization + 192-dim voice fingerprints** (NVIDIA Sortformer + TitaNet): `POST /api/audio/diarize-chunk` returns per-speaker segments **plus a voiceprint per speaker**, and `POST /api/audio/transcribe-with-speakers` returns speaker-attributed transcript blocks in one call. The voiceprints are the important part for this project: they let you **identify the same guest by voice across different shows even when unlabeled** — a direct, automated input to the source-independence graph (§4.5, "shared guests") that you'd otherwise have to infer from show notes.
- **Long-audio handling (important operational note).** Podcasts run 1–3 hours; the diarizer (Sortformer) caps at 4 speakers per chunk and the operator's Spark 2 is a single GPU. So you **chunk long audio into ~2–3 minute pieces and send them sequentially** (parallel audio requests can trip a GPU FFT race → 503/retry). `diarize-chunk` is purpose-built for this: it returns a voiceprint per chunk so you can re-cluster the same speaker across chunks (cosine similarity, ~0.7 distance threshold). This chunking + cross-chunk speaker stitching is **your code**, on top of the per-chunk endpoint.
- Companies: pull SEC filings (EDGAR) and earnings-call transcripts on a schedule. (These are text — no transcription needed; the earnings-call transcript *source* is still TBD, §12.)
- Store raw + transcript with metadata: source, source_cluster, date, speakers, speaker_voiceprints, url.
- NOTE on source-quality asymmetry: filings/earnings calls are high-information-density and a likelier source of differentiated signal; podcasts are low-density and more prone to echo. Weight accordingly downstream; do not treat all sources as equal-value just because they're in one corpus.

### 4.2 Extraction (LOCAL model — this is the cost & quality center) — schema FINALIZED for pilot
Run each transcript through a local model to extract structured **claim units**. Extract at the level of the **proposition**; let one passage emit *multiple* claims or *zero*. Most of a podcast hour is zero — the extractor must be willing to find nothing. An extractor that dutifully emits a claim per chunk reintroduces exactly the noise everything else is designed to remove.

```json
{
  "claim_id": "...",
  "proposition": "normalized one-sentence proposition: subject-assertion-object",
  "topic_canonical": "normalized topic for clustering / stance distributions",
  "topic_raw": "what was actually said (preserved)",
  "claimant": "who said it",
  "source": "...",
  "source_cluster": "macro | ai_tech | energy | bitcoin | vc_consensus | generalist",
  "date": "ISO date",
  "claim_type": "interpretive | predictive | descriptive | reactive",
  "time_horizon": "near | medium | long | unspecified",
  "confidence": "low | med | high",
  "relation": { "target_proposition_id": "...|null", "polarity": "affirms | denies | qualifies | none" },
  "engages_consensus": true,
  "counters_position": "the mainstream position it argues against, if any",
  "thesis_seam": "energy_compute | debasement_bitcoin | ai_data_ownership | none",
  "salience": "central | secondary | aside"
}
```
Design rationale:
- **`proposition` is the atomic unit** of the whole system: a normalized claim with an owner and a date. It is what makes "two sources, same stance" detectable, and it is what later becomes a falsifiable prediction. Do NOT collapse stance into a bull/bear label — too lossy.
- **`topic_canonical` vs `topic_raw`** — without normalization, "Fed policy"/"interest rates"/"the FOMC" scatter and clustering fails.
- **`relation` (affirms/denies/qualifies a prior proposition)** is how a real stance distribution gets built ("11 affirm, 3 deny, 2 qualify") and is the negation fix.
- **`claim_type`** separates insight (interpretive/predictive) from news echo (descriptive/reactive); these look identical on a raw frequency chart and completely different once separated.
- **`time_horizon`** — a predictive claim is useless to the ledger without one; `unspecified` predictions are lower value.
- **`confidence` is low/med/high only** — a 0-1 number from a local model is false precision.
- **No `falsifiable` flag** — falsifiability is implied by structure (predictive claim_type + a resolvable proposition), not a model judgment.
- **`engages_consensus`/`counters_position`** — distinguishes a real counter-argument (signal) from ignorance that happens to disagree (noise).
- **`thesis_seam`** is a tag, NOT a hard filter (off-thesis-but-important signals must survive).
- **`salience`** cheaply downweights throwaway lines.

**Serving (no new infra to build):** run extraction against the operator's local LLM through Spark Control — `POST /v1/chat/completions` (OpenAI-compatible), currently serving **Qwen3.6-35B-A3B-NVFP4 (64K context)** on Spark 1 via vLLM. Two things that make the structured schema above reliable: (1) pass `response_format={"type":"json_object"}` for guaranteed-valid JSON (the operator already uses this exact pattern in production for a strict-JSON extraction task — it works), and (2) `chat_template_kwargs={"enable_thinking": false}`, `temperature: 0` for deterministic, no-chain-of-thought extraction. The 64K context comfortably holds a full transcript chunk plus the schema instructions. The model is hot-swappable from the Spark Control dashboard if you want a different local model for extraction, but one model serves at a time on Spark 1 (see §13 capacity notes).

### 4.3 Embedding + storage
- Embed the **distilled propositions**, NOT raw chunks. (Grant's stack: Qdrant + bge-m3 + SQLite — all already running.)
- **Embeddings endpoint:** `POST /v1/embeddings` (OpenAI-compatible) → **bge-m3, 1024-dim dense**, on the operator's GPU. Also live: `POST /v1/rerank` (bge-reranker-v2-m3 cross-encoder) and `POST /api/search` (hybrid dense+sparse retrieval with RRF fusion + optional rerank) against Qdrant.
- **Use hybrid, not dense-only.** Propositions are entity-heavy (tickers, company names, fund names, people). bge-m3 dense captures meaning; pair it with Qdrant's BM25 sparse leg (the operator's CRM does exactly this) so "MSTR" / "Strategy" / "Microstrategy" match on the lexical leg too, not just the fuzzy semantic one. `/api/search` already orchestrates dense+sparse+rerank; the BM25 sparse vectors are generated client-side at ingest (FastEmbed `Qdrant/bm25`) and the collection uses `modifier: idf`.
- **Reranker is a free quality lever for the judge.** Before the frontier judge (§4.6) sees a shortlist, `/v1/rerank` can re-order the quantitatively-nominated candidates by relevance to the theme query — cheap, local, +precision.
- Clustering now means something: propositions, with topic and stance already separated.

### 4.4 Clustering + temporal + graph + candidate scoring
- **Emergence (theme):** track cluster size over time; the signal is the *second derivative* (acceleration), not size. A big static cluster is just a popular topic.
- **2nd/3rd-order themes** won't appear as one growing cluster (no one states them outright). Detect two ways: (a) *bridge formation* — new edges in the co-mention graph between previously-unconnected clusters; (b) *top-down synthesis pass* (Section 4.6) that NAMES the higher-order theme. Themes are generated by synthesis, not discovered by geometry.
- **Contrarian:** build, per topic, the actual **stance distribution** (possible only because stance/relation were extracted). Needs ALL of: minority position + genuine majority consensus + credible source + `engages_consensus = true`.
- **Intersection (consensus-flip):** a minority stance that is ALSO accelerating. Self-correcting against the lonely-crank problem (a crank stays lonely; a correct-early contrarian pulls independent sources in, which velocity catches).
- **Under-acted conviction (Job B signal type):** for each seed conviction and its fanned-out derivatives (Section 4.6), score = **conviction (high) x current exposure (low) x rising independent corroboration in the corpus.** Fires when Ten31 believes something, has little/no position, and the world is starting to corroborate it or a derivative of it. This is the signal that should have flagged "size up power-infrastructure picks-and-shovels" in 2023.

### 4.5 Source-independence graph (build this even in the pilot)
Source independence is mostly an illusion: podcasters share guests, quote each other, move in cliques. Five shows "independently converging" may be one guest doing the rounds. Build a graph of sources (shared guests, citations, community overlap) and **discount convergence by how connected the sources are.** Cross-cluster convergence (a macro show + an energy show + an AI show, no shared guests, landing on the same on-thesis idea) is the gold; within-cluster convergence is near-noise. Deliberately under-weight the bitcoin cluster: it's the most correlated with Ten31's own priors.

**Capability you already have for this: voiceprint-based guest identity.** The "shared guests" edge is the hardest part of this graph to build from metadata (show notes are inconsistent, a guest's name is spelled three ways, many appearances aren't announced). The operator's transcription stack returns a **192-dim TitaNet voiceprint per speaker** (§4.1). Persist a voiceprint library and you can detect **the same person speaking across two shows by voice**, automatically, even when neither show labels them — turning "did these five shows actually share a guest?" from a manual annotation task into a cosine-similarity lookup. This is the single highest-leverage use of the diarization capability for *this* project, and it directly powers the convergence-discounting that separates real cross-cluster signal from one guest doing the rounds.

### 4.6 Frontier synthesis / judge / fan-out (SHORTLIST + SEEDS ONLY)
Frontier API used in two bounded roles, never on the raw corpus:
- **Judge (Job A):** receives ONLY candidates that passed the quantitative bar — "this minority stance, these N independent sources, this acceleration, this consensus baseline" — assesses genuine vs artifact, and emits the resolution spec (Section 6). Must NOT generate candidates from scratch.
- **Synthesis (Job A):** receives cluster centroids + newly-strengthened edges and NAMES higher-order themes.
- **Conviction fan-out (Job B):** receives the seed convictions (Section 3.1) and generates their 2nd/3rd-order derivative nodes. These derivatives are then matched against the corpus; a derivative becomes an under-acted-conviction signal ONLY when independent corroboration AND the exposure gap both clear the bar. The fan-out proposes the tree; the world and the book decide which branch matters. (Confabulation guard: fan-out branches are hypotheses, never signals on their own.)

**Sovereignty at the frontier boundary — use the operator's redaction gateway.** The public corpus (podcasts, filings) needs no protection. But the inputs to this frontier step are the *most* sensitive thing in the system: Ten31's **conviction log** (§3.1) — actual positions, conviction levels, and explicit **exposure gaps** (where Ten31 believes something and is under-invested). Sending that raw to an external frontier API leaks Ten31's playbook and its blind spots. The operator already runs a **scrub/rehydrate gateway** on Spark Control for exactly this: `POST /scrub` de-identifies the proprietary entities/positions into stable placeholders (`[FUND_1]`, `[POSITION_2]`, `[AMOUNT_3]`) before the call, the frontier model reasons over placeholders, and `POST /rehydrate` restores the real values locally — the de-anonymization map never leaves the box. **Route the conviction-fan-out and judge prompts through scrub → frontier → rehydrate** so the engine can use a frontier model without exposing Ten31's exposure map. (Caller supplies the entity dictionary per request; the gateway also runs a local-Qwen NER backstop for anything the dictionary misses. See §13.)

**Refinement — keep exposure off the frontier entirely; scrub identities, not substance.** The cheapest and least-blunting split is architectural, not redactional. The frontier's two jobs here — fan out derivatives from a seed *thematic* conviction, and judge a candidate shortlist — do **not** require Ten31's exposure data. So do not send it: compute the conviction x exposure gap (the under-acted-conviction score, §4.4) **locally, after** the frontier returns its thematic output. The crown-jewel data (position sizes, exposure levels, the prioritized strategic map) then never leaves the box, at **zero capability cost** — it isn't redacted, the model simply never needed it. For what *does* go (the relevant thematic slice + public corpus candidates), scrub *entities/identifiers* into placeholders but never redact the *substance* the model must reason over — de-identification is not content redaction (tokenizing `Strike`→`[FUND_1]` preserves reasoning; deleting the claim the model must weigh blunts it). And send only the thematic slice relevant to the current judgment, not the whole conviction map: the individual theses are mostly published in Ten31's essays (low sensitivity); the *combination and prioritization* is the proprietary part. Validate in the pilot: run a sample of judge/fan-out prompts scrubbed vs. unscrubbed and compare output quality, so any reasoning cost of scrubbing is **measured, not assumed**. (Threat-model note: the realistic risk on a bounded commercial frontier call is data-at-rest / breach / subpoena, not the model training on inputs — commercial frontier APIs do not train on inputs by default — so the conservatism is reasonable, and this split buys sovereignty on the part that matters without taxing capability.)

### 4.7 Dual-evaluation ledger (start day one — see Section 6)

---

## 5. Hard constraints / anti-patterns (do not violate)

1. **Stats/geometry nominate; the model judges/expands a pre-filtered shortlist or seed set.** The model never nominates from the raw corpus, and fanned-out derivatives are never trusted without corpus corroboration.
2. **Extract structured propositions first; embed distilled propositions, not raw chunks.**
3. **Separate topic from stance** (the `relation` field); never infer stance from vector distance.
4. **Discount convergence by source connectedness** (independence graph).
5. **Consensus is a moving baseline** — recompute per time-window. Yesterday's contrarian is today's mainstream.
6. **Filter reactive/descriptive claims against a news baseline; weight interpretive/predictive.**
7. **The lens tags relevance and seeds fan-out; it must not gate truth.** The system must be able to surface a credible, accelerating signal that argues *against* Ten31's own thesis. Concrete bear case it must be able to voice: if AI compute permanently outbids mining for power, mining gets pushed to only truly-stranded margin and the "mining underwrites the grid" leg weakens.

---

## 6. Evaluation & resolution — themes as the unit, events as the evidence

### 6.1 The corrected model (themes are primary; clean events are nested evidence)
Earlier framing treated "Tier 1 (clean/falsifiable)" and "Tier 2 (thematic/directional)" as two parallel tracks. That was wrong. They are **two altitudes of one signal**, and they nest:

- A **theme** is the unit Ten31 actually cares about and acts on ("power, not chips, becomes the gating constraint on AI and gets repriced"). Ten31 is an allocator betting on secular direction, so **themes are primary.**
- A theme is *made of* **clean events** — specific, observable, dated confirmations (a multi-GW nuclear PPA signed; transformer lead times blow out; a BTC-collateralized lending product ships). **These clean events are the external-confirmation evidence that grades the theme**, not a competing category.
- The higher-order / more thesis-core a theme is, the sparser and fuzzier its clean-event rungs (the abundance/scarcity theme barely has clean confirmations). That is not a defect — those are exactly the themes where Ten31 has the most edge if right and the most risk of self-deception. The ledger's job is to make that tradeoff visible, not hide it.

### 6.2 Theme resolution = two legs (both required)
1. **Discourse leg (leading, partly causal):** did independent, cross-cluster discourse on the theme keep accelerating from the log date forward? Treat discourse acceleration as a *causal leading indicator*, not merely an echo to discount — in Ten31's domains, narrative partly drives the outcome (capital follows story, price follows capital).
2. **External-confirmation leg (resolving):** did the bundle of nested clean events actually occur — real-world validation (capital flows, policy, adoption, price, signed deals), not just sustained chatter? Discourse alone resolving a theme would be circular (predicting that people keep talking), so this leg is mandatory for themes Ten31 acts on.

### 6.3 Lead time is a first-class logged field
For every theme signal, record the gap between when the system flagged it (discourse leg) and when external confirmation arrived. **This lead time IS the alpha measurement** — it separates "early to something real" (edge) from "articulate trend-follower" (worthless even if accurate). For Job B derivatives, measure earliness to the *derivative* node, not the root theme (the AI/compute miss was late derivatives, not a wrong root).

### 6.4 Reflexivity outcome taxonomy
- **Discourse up (cross-cluster, independent) + external confirmation follows + positive lead time** -> real, early, edge. The prize.
- **Discourse up + no external confirmation by horizon** -> narrative bubble that didn't cash out; record as a faked-out signal (this is how the system learns reflexive-and-real vs reflexive-and-hollow).
- **External moves with little prior discourse** -> blind spot; the system missed it; log it.
- **Discourse up but only within one cluster** -> echo; discount.

### 6.5 Clean-event scoring (the nested evidence)
Clean events resolve `correct / partial / wrong / unresolved-expired / too-early`, automatically where data exists (price, filings) or via a quick human check. They keep the system honest: if it is systematically wrong on the falsifiable rungs, its themes shouldn't be trusted either. Keep a deliberate minority of pure clean-event predictions as a calibration backbone.

### 6.6 Two failure modes to design against now
- **Survivorship/cherry-picking:** log EVERY candidate that passes the quantitative bar, including boring ones, or there's no denominator and no hit rate.
- **Look-ahead / "already priced in":** theme resolution must care about acceleration and confirmation *from the log date forward*, never whether the theme was real in absolute terms — otherwise the ledger rewards noticing things that already happened (the expensive-summarizer failure wearing a success badge).

### 6.7 Mechanics
Dual track, started day one (predictions need time to resolve; the clock can't be backfilled). Human eval (Grant) answers "non-obvious and relevant to me?"; the ledger answers "was it correct, and how early?" Keep them in separate columns and **do not let the model see Grant's rating before it logs its prediction.** The valuable cell is disagreement (boring-but-right -> lens too narrow; brilliant-but-wrong -> seduced by plausibility). The model may *propose* resolution criteria; resolution comes from the world, never from model self-confidence (log `model_confidence` only to measure its uselessness, never to score).

Minimal ledger (SQLite):
`signal_id | type(theme|event|under_acted_conviction) | proposition | date_logged | discourse_metric | external_check | resolution_date | discourse_outcome | external_outcome | lead_time | grant_rating | model_confidence`

---

## 7. Pilot scope (do this before the 500-source build)

Run the full pipeline end-to-end on a bounded, diverse, deliberately *non-correlated* source set. ~20 podcasts already spans hundreds of episodes quickly; widen companies to ~25 across categories.

### 7.1 Headline pilot test — backtest against Ten31's own history (Job B validation)
Seed the conviction log with the ~2023 Kirkwood conviction ("compute will ~1000x; energy becomes the binding constraint; interruptible load is the edge") and run the pipeline over a corpus from that period. **Does the under-acted-conviction signal surface the derivative "size up the power-infrastructure picks-and-shovels of the buildout"?** If yes, that is the strongest possible validation that the system does the job Ten31 actually needs. If no, we learn exactly what's missing before building the big version. This backtest is more convincing than any forward-looking hit and should be run first.

### 7.2 Forward success test (Job A)
Does the system surface anything Grant, a domain expert, finds genuinely non-obvious and didn't already know? Even one or two true hits validates scaling. If it only reproduces what normal reading yields, that's a cheap, early "no."

### 7.3 Source list — companies (v1 draft; VERIFY tickers/status at ingestion — this space moves fast)

| Category (seam) | Companies |
|---|---|
| AI compute & hyperscalers | NVIDIA, Alphabet, Microsoft, Amazon, Meta, Broadcom, TSMC, CoreWeave, Oracle |
| Energy & power (binding constraint) | Constellation, Vistra, Talen, GE Vernova, NextEra, Cameco, Vertiv; (watch: Quanta, Oklo, NuScale) |
| Mining <-> AI/HPC (energy-compute seam) | Core Scientific, IREN, TeraWulf, Cipher; (watch: Riot, MARA, Bitdeer) |
| Debasement <-> bitcoin (treasury/custody) | Strategy (MSTR), Coinbase, Block, Twenty One (XXI) |

### 7.4 Source list — podcasts / YouTube (v1 draft; VERIFY feeds/hosts/status — some may have changed)

Roles: CB = consensus barometer, IND = independent/contrarian, DX = domain expert.

| Cluster | Source | Role |
|---|---|---|
| Macro/monetary | Odd Lots | IND / cross-domain |
| Macro/monetary | Forward Guidance | DX |
| Macro/monetary | Macro Voices (energy-heavy) | DX |
| Macro/monetary | The Grant Williams Podcast | IND |
| Macro/monetary | Monetary Matters | DX |
| Macro/monetary | Hidden Forces | IND / cross-domain |
| AI/tech | Dwarkesh Podcast | DX / IND |
| AI/tech | No Priors | DX |
| AI/tech | Latent Space | DX (technical) |
| AI/tech | Cognitive Revolution | DX |
| AI/tech | BG2 | DX (mild Ten31 correlation) |
| AI/tech | a16z Podcast | DX (crypto correlation) |
| Energy | Catalyst w/ Shayle Kann | DX |
| Energy | Columbia Energy Exchange | DX |
| Energy | Doomberg | IND |
| Bitcoin (limited) | The Bitcoin Layer | DX (macro-literate) |
| Bitcoin (limited) | What Bitcoin Did | - |
| Generalist | All-In | CB |
| Generalist | Invest Like the Best | DX / cross-domain |
| Generalist | Lex Fridman | - (wide reach, variable) |

Independence notes for the graph:
- **Deliberately limited bitcoin cluster**, and TFTC / Bitcoin Alpha / the Odell-Bent orbit are excluded: most correlated with Ten31's own priors; convergence there ~ confirming the prior. (Confirmed in the 2021 vision essay: Matt Odell and Marty Bent are Ten31 partners — this cluster is literally Ten31's own network, so convergence there is the prior, not signal.)
- **VC-consensus cluster** (All-In, a16z, BG2, No Priors): shared guests/worldview -> discount internal convergence; All-In retained primarily as a *consensus barometer*.
- **Highest-independence cross-domain:** Odd Lots, Dwarkesh, Hidden Forces, Invest Like the Best.
- Target signal: cross-cluster convergence among sources with no shared guests.

---

## 8. Source credibility over time (DESIGN-FIRST; reuses the ledger)
Do not hand-assign static credibility. Start every source at a neutral prior, then **earn credibility from the prediction ledger**: a source's weight rises when claims it made early later resolve correct — and rises most when a contrarian call it made *against consensus* resolves right. Credibility becomes a learned track record, not an opinion, running on the same ledger from Section 6. Cold-start caveat: early weights are weak until enough predictions resolve, so treat credibility as provisional during/after the pilot. A light bootstrap prior (domain relevance + reach) is acceptable as a placeholder but should decay in favor of earned track record.

---

## 9. Conceded weak forms (keep on hand; not headline claims)
Rebuttals to assumptions, not main points:
- NOT a bet on bitcoin as a medium of exchange / "buy coffee with bitcoin." It's bitcoin as the neutral, non-debasable reserve asset.
- NOT a bet that everyone self-hosts. Freedom tech / AI-data-ownership needs to be the durable, high-value *indispensable option* for those who can't cede control — even as the majority cedes to convenience for frontier LLMs. (Cost of the sovereign option keeps falling; centralization mints its own dissenters with every breach/deplatforming, so the margin is durable and arguably growing.)
- Every thesis leg has a strong form and a conceded weak form; always claim the strong one.

---

## 10. Tech / infra notes — corrected to the operator's ACTUAL running stack
All local model serving is already live behind one host, **Spark Control** (a StartOS gateway on the operator's Start9 server that fronts the two DGX Sparks). You do not stand up vLLM/Whisper/Qdrant yourself — you call HTTP endpoints. Full inventory + what's yours-to-build is in **§13**.
- **Local LLM (extraction + clustering + most scoring):** `POST /v1/chat/completions` → **Qwen3.6-35B-A3B-NVFP4, 64K ctx**, vLLM on Spark 1. Hot-swappable to other models from the dashboard (one at a time).
- **Transcription:** `POST /v1/audio/transcriptions` → **NVIDIA Parakeet TDT 0.6B** (~60x real-time). NOT Whisper.
- **Diarization + voiceprints:** `POST /api/audio/diarize-chunk` (Sortformer 4-spk + 192-dim TitaNet voiceprints), `POST /api/audio/transcribe-with-speakers` (merged). Voiceprints → cross-show guest identity (§4.5).
- **Embeddings / rerank / search:** `POST /v1/embeddings` (**bge-m3**, 1024-d), `POST /v1/rerank` (bge-reranker-v2-m3), `POST /api/search` (Qdrant hybrid dense+sparse + RRF).
- **Vector/store:** Qdrant (hybrid-configured) + SQLite (your ledger/metadata).
- **Frontier-boundary sovereignty:** `POST /scrub` + `POST /rehydrate` (redaction gateway) — wrap the proprietary conviction/judge prompts (§4.6).
- **Health/discovery:** `GET /api/endpoints`, `GET /api/status`.
- **Auth:** none on the LAN today (behind the Start9's TLS + access control). Same-LAN clients use a self-signed-cert skip; the operator can add auth if you run off-LAN.
- **Ingestion (yours to build):** EDGAR (filings), earnings-call transcript source (TBD), RSS + YouTube pulling/scheduling/download. Spark Control transcribes audio you fetch; it does not fetch.
- **Self-hosted ethos (Start9 ecosystem):** private/proprietary data stays local end-to-end; the only external call is the bounded frontier step, and even that is scrubbed.

## 11. Build order (suggested)
1. Ingestion + local transcription for the pilot source set.
2. Extraction (schema in 4.2 is finalized for pilot) — local.
3. Embedding + storage; basic clustering.
4. Prediction ledger scaffold (turn on immediately, even before scoring is good).
5. Conviction log (3.1) + seed the ~2023 Kirkwood conviction for the backtest.
6. Temporal scoring (emergence acceleration) + stance distributions (contrarian).
7. Source-independence graph + convergence discounting.
8. Intersection scoring (consensus-flip) + under-acted-conviction scoring.
9. Frontier judge + synthesis + conviction fan-out (shortlist/seeds only).
10. Human-eval interface (Grant's ratings, kept independent of model).
11. Run the backtest (7.1) FIRST, then the forward pilot (7.2); do the disagreement analysis; decide on scaling.

## 12. Open DESIGN-FIRST items (resolve before scaling past pilot)
- Conviction log format/governance (3.1) — who maintains, how exposure is scored.
- Earnings-call transcript data source (4.1).
- Credibility cold-start bootstrap weighting (8).
- Canonical-topic vocabulary management (4.2) — controlled vs emergent.

---

## 13. The existing platform (Spark Control) — what's built, what's yours, where the gaps are

*Added for the implementing dev, who may not have context on the operator's existing infrastructure. The local-compute backbone this project needs already exists and is in production serving two other apps. Treat this section as ground truth; it supersedes any "stand up vLLM/Whisper/Qdrant" assumption elsewhere.*

### 13.1 What Spark Control is
Spark Control is a StartOS package running on the operator's Start9 server. It is a **single trusted HTTP gateway** in front of two NVIDIA DGX Sparks (GB10 Grace-Blackwell, 128 GB unified memory each, ARM64): **Spark 1** runs the LLM (vLLM); **Spark 2** runs the audio models + the embedding server + Qdrant. Everything below is one base URL (the operator provides the LAN address), one TLS cert, OpenAI-compatible where it can be. It already powers a fundraising-CRM agent system and a meeting-transcription app, so it's a stable platform, not a prototype.

### 13.2 Endpoint reference (all live)
| Method + path | Backed by | Use in this project |
|---|---|---|
| `POST /v1/chat/completions` | Qwen3.6-35B-A3B-NVFP4, 64K ctx (vLLM) | **Extraction (§4.2)**, clustering helpers, most local scoring. JSON-mode supported. |
| `POST /v1/embeddings` | bge-m3, 1024-d | **Embed propositions (§4.3).** |
| `POST /v1/rerank` | bge-reranker-v2-m3 | Rerank candidate shortlists before the judge (§4.3/4.6). |
| `POST /api/search` | Qdrant hybrid (dense+sparse, RRF) + rerank | Retrieval over stored propositions; corpus-corroboration lookups for Job B. |
| `POST /v1/audio/transcriptions` | Parakeet TDT 0.6B | **Transcribe podcast/YouTube audio (§4.1).** |
| `POST /api/audio/diarize-chunk` | Sortformer + TitaNet | Speaker turns **+ 192-d voiceprints** → guest identity for the independence graph (§4.5). |
| `POST /api/audio/transcribe-with-speakers` | Parakeet + Sortformer | Speaker-attributed transcript in one call. |
| `POST /scrub` + `POST /rehydrate` | redaction gateway + local-Qwen NER | Wrap the proprietary conviction/judge prompts to the frontier model (§4.6). |
| `GET /api/endpoints`, `GET /api/status` | — | Service discovery + health. |
| (Qdrant direct) `:6333` | Qdrant v1.16 | Collection mgmt + upserts (ingest side); hybrid named dense+sparse vectors. |

### 13.3 Build vs. provided
**Provided (call it, don't build it):** local LLM serving, transcription, diarization, voiceprints, embeddings, reranking, hybrid vector search + storage, and the scrub/rehydrate sovereignty boundary for the frontier step.

**Yours to build (the actual project):** all ingestion + scheduling (RSS/YouTube pulling, audio download, EDGAR/earnings fetch); the long-audio chunking + cross-chunk speaker stitching; the voiceprint library + guest-matching logic; the claim-extraction prompts (the schema is finalized, the prompt engineering is yours); all clustering/temporal/graph/scoring logic; the source-independence graph; the prediction ledger + conviction log; the frontier orchestration; and the human-eval interface. **Spark Control gives you the model primitives; the Signal Engine's intelligence is entirely your layer on top.**

### 13.4 Gaps — what the operator's stack does NOT yet serve (flag these to plan around)
1. **No ingestion/scheduler.** Spark Control transcribes audio you hand it; it does not fetch RSS, download YouTube, or pull EDGAR. The entire ingestion layer (feeds, downloaders, cron) is greenfield. *(This is the biggest "build," and the doc's §4.1 already owns it — just don't expect any of it from the gateway.)*
2. **No batch/queue orchestration → throughput is the real constraint, not capability.** Transcription (Spark 2) and extraction (Spark 1) each run on a **single GPU**, and audio requests must go **sequentially** (a parallel-request GPU race returns 503). Transcribing is fast per item (~60x real-time) but backfilling "hundreds of episodes" + a 500-source corpus is a **serial job measured in GPU-hours**, and extraction (one LLM forward pass per chunk over the whole corpus) is the heavier of the two. Plan the backfill as a managed queue with patience, not a real-time fan-out. If backfill latency becomes painful, the levers are: a dedicated transcription window, a second model instance, or accepting wall-clock. *(There is no server-side job queue today — you build the queue client-side.)*
3. **No earnings-call transcript source wired up** (already flagged §12). EDGAR (filings) is a clean public API; earnings-call *transcripts* need a chosen provider.
4. **Embeddings are dense bge-m3 (1024-d), not Matryoshka-truncatable.** Fine at this corpus scale (low hundreds of thousands of propositions is trivial for Qdrant); just don't design around dimension truncation. If proposition-retrieval recall ever becomes the bottleneck, Qwen3-Embedding is the documented A/B upgrade — same `/v1/embeddings` contract.
5. **No auth on the LAN endpoints, and Qdrant has no auth/backups yet.** Acceptable for a LAN pilot; if this corpus + ledger becomes long-lived and valuable, the operator would add a Qdrant API key + snapshots (a known, small hardening task) before it's the system of record.
6. **One local LLM loads at a time on Spark 1.** If you ever want a *different* local model for extraction vs. a local synthesis step, that's a (slow) hot-swap, not concurrent. For the pilot, Qwen3.6 for all local steps is the assumption.

### 13.5 Net for the dev
The "~95% local compute" design goal in §4 is not aspirational — it's already the operating reality, because every local model this pipeline needs is a live endpoint on hardware the operator runs. Your job is the ingestion, the extraction prompts, and the entire stats/graph/ledger intelligence layer. Wire the model calls to Spark Control; build the brain on top.