# Ten31 Signal Engine — Pilot Backtest Write-up

**Author:** Claude (Claude Code), implementing dev
**For:** Grant + the dev who authored the handoff/scoping document
**Date:** 2026-06-08
**Status:** Pilot build complete; §7.1 backtest executed end-to-end with a *qualified* result. This document is the honest assessment, the judgment calls I made, and the open questions for a second opinion.

> **Read this as a peer review request, not a victory lap.** The engine works end-to-end and surfaced the right thesis, but the *signal quality* on the current corpus is coarse, and several design tensions in the handoff doc only became visible once there was real data flowing through. Those tensions — especially the cross-cluster gating question Grant raised — are the point of this write-up.

---
## 1. Executive summary

I built the full pilot per the handoff: ingestion (audio + text) → local claim extraction → hybrid vector store → the "scoring brain" (independence-discounted, as-of-disciplined nomination) → the §7.1 backtest → a dual-evaluation ledger. It runs against the operator's real local-compute stack (Spark Control) and a real ~6,600-claim corpus drawn from ~25 companies and a handful of podcasts.

**The §7.1 backtest verdict is a qualified YES.** Seeded with the 2023 Kirkwood "power is the binding constraint" conviction and marched as-of across 2023–2024, the under-acted-conviction scorer:

- **surfaced the root thesis cross-cluster in May 2023** (energy *and* AI sources, independent), and
- **surfaced the headline derivative ("size up the power-infra picks-and-shovels") in May 2024**, along with transformers and utilities-repriced.

So the mechanism the project exists to build — *fan a held conviction to its derivatives and catch the world starting to corroborate them* — demonstrably works on real history.

**But three honest caveats keep it from being a clean win**, and they drive the open questions:

1. The signal is **noisy** (the acceleration metric swings between earnings seasons; there's visible run-to-run variance).
2. The cross-cluster breadth shows up at the **root** level, not the **derivative** level — the specific power-infra derivatives stay energy-cluster-corroborated.
3. The derivatives only clear because I **relaxed a cross-cluster gate for Job B** — a judgment call (§7 below) that is exactly what Grant wants to debate.

The most important open question, in Grant's words: *is strict cross-cluster gating limiting our ability to pick up signal early — and is the real fix to dramatically broaden the cluster taxonomy and the corpus?* I think the answer is largely yes, and I lay out why in §8.

---
## 2. What was built (architecture as implemented)

3,347 lines of Python, 44 modules. Everything local-compute runs through the operator's existing **Spark Control** gateway (we call HTTP endpoints; we did not stand up vLLM/Whisper/Qdrant). The one external call is the bounded frontier step (not exercised in the backtest — see §7, deferred).

| Layer (handoff §) | What's built | Notes |
|---|---|---|
| **Ingestion — text (§4.1)** | SEC EDGAR (10-K/10-Q/20-F/40-F), FMP earnings-call transcripts | Earnings-call *audio* proved unfetchable (no uniform feed, ~30–90d replay expiry) → FMP transcript API, per §12. Filings dedup on accession; earnings on symbol+quarter. |
| **Ingestion — audio (§4.1, §4.5)** | RSS + YouTube fetch, long-audio chunking (~2.5 min), **Parakeet transcribe + Sortformer diarize + 192-d TitaNet voiceprints**, cross-chunk speaker stitching, a persisted voiceprint library | Verified live: a real podcast → speaker-attributed transcript → claims. |
| **Speaker identity (§4.5)** | Voiceprint cosine matching across episodes/shows **+ LLM speaker-naming** (host/guest from the intro) → name-based independence edges | Grant's idea: name-based overlap is robust to voiceprint drift across shows. Both edge types feed the independence graph. |
| **Extraction (§4.2)** | Local Qwen, the finalized claim schema, JSON-mode, temp 0, "willing to emit zero" | Pluggable backend: **local Qwen (default) or Gemini batch** (validated, for overflow/scale; public corpus only). |
| **Embedding + store (§4.3)** | bge-m3 dense + BM25 sparse → Qdrant hybrid collection; retrieval + rerank via the gateway | Embeds distilled propositions, not raw chunks. |
| **Scoring brain (§4.4, §4.5)** | EISC independence primitive; as-of harness; windowed acceleration; **under-acted-conviction (Job B) scorer**; the quantitative bar; ledger writer; resolver (stub) | See §3. Job A scorers (emergence/stance/intersection) and the frontier judge/fan-out are **deferred** per the blueprint build-order — the backtest is Job B only. |
| **Backfill queue (§13.4)** | Client-side GPU-hours queue: idempotent, leased/crash-safe, prioritized | Extraction ran ~900 docs on one GPU as a serial job. Transcription ran on the other GPU in parallel. |
| **Provenance / dedup** | Layered: stable item-id (robust pre-GPU guard) + normalized title/date (cross-mirror) + content-hash (audit only) | Corrected after Grant flagged that a transcript hash is a brittle dedup key. |
| **Ledger (§4.7, §6)** | SQLite dual-evaluation ledger; logs every bar-clearer; resolution columns separated from scoring (look-ahead guard) | Live with its first entries. |
| **UI** | FastAPI corpus-management app (dashboard, add/view sources, inspect per-source claims) | The "menu" to grow and audit the corpus over time. |

**Corpus the backtest ran on (snapshot):** 6,569 claims (5,129 embedded at backtest time), from 411 filings + 410 earnings transcripts + 82 podcast episodes (4 RSS-full shows for 2022–2023: Dwarkesh, Hidden Forces, All-In, Invest Like the Best; plus a partial Catalyst slice). Claim types: 2,780 predictive / 1,447 interpretive / 2,267 descriptive / 75 reactive. Clusters: **energy 3,135 · ai_tech 2,329 · bitcoin 765 · vc_consensus 139 · macro 103 · generalist 98.** 90 voiceprints (35 named), 10 shared-guest edges.

**Note the cluster imbalance** — it's central to §8. The corpus is overwhelmingly company filings/earnings (two clusters, energy + ai_tech) with a thin podcast layer. That is not a balanced cross-cluster corpus.

---
## 3. The scoring brain (how nomination works)

This is the part where the handoff's hard constraints (§5) had to become concrete code. Design was done via a 3-way design panel (statistical / graph / pragmatic lenses) synthesized into one blueprint; I then built it.

- **EISC — Effective Independent Source Count (the §4.5 differentiator).** Given the sources converging on a topic, discount by connectedness using a noisy-OR connectedness matrix + inverse-row-sum. Verified on synthetic cases: 5 identical clones → ~1.0 voice; 5 cross-cluster independents → ~5.0; all-bitcoin → floored ~0.4; "one guest doing the rounds" across many shows → ~1.0. (I improved the cross-cluster multiplier over the blueprint so a single guest spanning many clusters can't fake the gold-tier bonus.) **Every count that feeds a score routes through EISC — never a raw source count.**
- **As-of harness (§6.6).** Every scorer reads an `as_of`-filtered view; nothing reads the raw claims table. At nomination time only claims dated ≤ as_of are visible. This is what makes the backtest honest (no look-ahead).
- **Windowed acceleration (§4.4).** The signal is the discrete 2nd derivative of the EISC-weighted claim flow per topic — *not* raw size. Window length must match corpus cadence (90 days for quarterly filings; 28 for weekly podcasts).
- **Under-acted-conviction / Job B (§4.4).** `conviction_weight × exposure_gap × rising_independent_corroboration`. Corroboration = retrieve (hybrid search) → LLM filter to affirms-only → independence-weighted acceleration over the confirmed set. **Exposure is joined locally and never crosses the frontier boundary** (§4.6).
- **The quantitative bar (§5.1).** Two tiers: an *evidence bar* (clears hard gates → log a ledger row, the denominator) and a *promotion bar* (also clears a score threshold → would go to the frontier judge). Stats nominate; the model would only judge a pre-filtered shortlist.
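
The EISC bullet above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the engine's code: the function names `connectedness` and `eisc`, the edge-list input format, and the choice to set each source's self-connectedness to 1 are assumptions. Only the noisy-OR combination and the inverse-row-sum come from the description above, and the κ values in the last example are the `shared_guest` and `citation` couplings listed in the appendix. The cross-cluster multiplier and the bitcoin floor are left out.

```python
def connectedness(n, edges):
    """Noisy-OR connectedness matrix for n sources.

    edges: iterable of (i, j, kappa) with kappa in [0, 1]; several edges
    between the same pair combine as 1 - prod(1 - kappa).
    """
    # Probability that a pair is NOT connected, starting from "no link".
    miss = [[1.0] * n for _ in range(n)]
    for i, j, kappa in edges:
        miss[i][j] *= 1.0 - kappa
        miss[j][i] *= 1.0 - kappa
    conn = [[1.0 - miss[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        conn[i][i] = 1.0  # assumption: a source is fully connected to itself
    return conn


def eisc(conn):
    """Effective independent source count: sum of inverse row sums."""
    return sum(1.0 / sum(row) for row in conn)


# Five identical clones: every pair fully connected, so they count as one voice.
clones = connectedness(5, [(i, j, 1.0) for i in range(5) for j in range(i + 1, 5)])
# Five sources with no edges at all: five independent voices.
independents = connectedness(5, [])
# Two sources linked by a shared guest (0.85) and a citation (0.45).
pair = connectedness(2, [(0, 1, 0.85), (0, 1, 0.45)])

print(round(eisc(clones), 3))        # 1.0
print(round(eisc(independents), 3))  # 5.0
print(round(eisc(pair), 3))          # about 1.04: barely more than one voice
```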
---

## 4. The §7.1 backtest — methodology

Per the handoff (§7.1 is the headline pilot test), I ran it **before** any forward pilot.

- **Seed:** the 2023 Kirkwood conviction `K2023` ("compute will ~1000x; energy becomes the binding constraint; interruptible load is the edge"), logged in the human-owned conviction log with high conviction / low exposure (`lt2`).
- **Fan-out (v1, hand-written):** Per the blueprint's build order, I **hand-wrote** the 2nd/3rd-order derivative tree (grid interconnect, transformers, substations, cooling, gas turbines, nuclear, uranium, utilities repriced, and the headline "size up power-infra picks-and-shovels"). *Why hand-written:* it removes the frontier from the first backtest and isolates the real question — **does the scoring surface the derivative once it exists?** — from the separate question of whether the frontier can *propose* the right derivatives. (That second question is untested; see §6.)
- **Run:** marched a quarterly `as_of` from 2023-03 to 2024-09 (7 grid points; two ad-hoc smoke dates bring the stored total to 9), 90-day windows. At each as_of, for each derivative: retrieve corroboration from the corpus, LLM-filter to genuine affirmations, compute independence-weighted acceleration, apply the bar, log every clearer to the ledger.
- **Look-ahead control:** all retrieval/scoring at as_of only sees claims dated ≤ as_of. The resolver (forward leg) is a separate, isolated pass (a stub for now — see §6).
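
The march and the look-ahead control above can be sketched in a few lines. This is illustrative only: `as_of_grid`, `visible_claims`, and the dictionary claim format are hypothetical names, not the engine's API, and the three sample claims are made up. The start date, end date, and 90-day step are taken from the reproduce command in the appendix. With those inputs the grid has seven dates, and they match the quarterly `as_of` values in the Appendix A detail tables.

```python
from datetime import date, timedelta


def as_of_grid(start, end, step_days):
    """Dates from start up to and including end, in fixed steps."""
    grid, current = [], start
    while current <= end:
        grid.append(current)
        current += timedelta(days=step_days)
    return grid


def visible_claims(claims, as_of):
    """Look-ahead guard: a scorer only sees claims dated on or before as_of."""
    return [claim for claim in claims if claim["date"] <= as_of]


grid = as_of_grid(date(2023, 3, 1), date(2024, 9, 1), 90)

# Hypothetical claims, used only to show the filter.
claims = [
    {"id": "c1", "date": date(2023, 4, 15)},
    {"id": "c2", "date": date(2023, 5, 30)},
    {"id": "c3", "date": date(2023, 6, 2)},
]
early = visible_claims(claims, grid[1])  # as_of = 2023-05-30

print([d.isoformat() for d in grid])
# ['2023-03-01', '2023-05-30', '2023-08-28', '2023-11-26',
#  '2024-02-24', '2024-05-24', '2024-08-22']
print([claim["id"] for claim in early])  # ['c1', 'c2']
```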
---

## 5. The §7.1 backtest — results

I ran it twice: once on the company-only corpus (~4,500 claims), then a "sharpened" re-run after the cross-cluster podcast claims landed (~5,100 embedded). **Presenting both is deliberate — the differences between them are themselves a finding (run-to-run variance / noise).**
### Run 1 — company corpus (~4,500 claims)
| Derivative | First cleared evidence bar | Evidence at clear |
|---|---|---|
| **Root: "power is the binding constraint"** | **2023-05-30** | EISC 3.0, 4 sources, **k_eff=2 (cross-cluster: energy+AI)**, accel +1.0 |
| **Headline: "picks-and-shovels"** | 2024-05-24 | EISC 2.0, 5 sources, k_eff=1, score 2.56 |
| Utilities repriced | 2024-05-24 | EISC 2.5, **8 sources**, k_eff=1, built steadily from 2023 (src 1→2→4→8) |
| nuclear / transformers / gas / uranium / cooling | peaked but did **not** clear | EISC or acceleration fell short in the cleared window |
### Run 2 — + cross-cluster podcast claims (~5,100 embedded)
| Derivative | First cleared | Note |
|---|---|---|
| **Root** | **2023-05-30** | unchanged (cross-cluster) |
| **Headline: "picks-and-shovels"** | **2024-05-24** | peak 3.33; notably it *scored* 3.33 back at 2023-11 but EISC was 1.6, just under the 2.0 floor, so it logged-but-didn't-clear then |
| **Transformers** | **2024-05-24** | newly cleared (peak 4.80) |
| Uranium | did not clear | peak 7.04 (!) but never simultaneously cleared all gates |
| **Utilities repriced** | did **not** clear | cleared in Run 1, *not* in Run 2 — **this is the run-to-run variance / noise, exhibited directly** |

**What the numbers say, honestly:**

- The **root thesis is a genuinely clean result** — it cleared cross-cluster (k_eff=2) in May 2023 in both runs, *independent of the contested design call*. The system would have flagged "the world is starting to corroborate that power is the binding constraint, and Ten31 is under-exposed" in mid-2023.
- The **derivatives surface, but messily.** They clear mid-2024, mostly single-cluster, and *which* ones clear shifts between runs. The acceleration (2nd derivative) flips sign between earnings seasons (`+2.6 → −2.2 → +1.6 → −1.0`), so a derivative clears in whatever window the curvature happens to be positive. That is fragile.
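
To make the sign-flip fragility concrete, here is a small sketch. `second_difference` is the standard discrete second derivative over three consecutive windows, applied to made-up flow values. `inferred_score` is an inference rather than the engine's code: every row in the Appendix A detail tables is consistent with `score = 0.8 × eisc × max(a, 0)`, where 0.8 would be the product of `conviction_weight` and `exposure_gap` for K2023. Both function names are hypothetical. The four rows fed to the score are the picks-and-shovels `eisc` and `a` values from Appendix A, and the score is positive only in the two windows where the curvature is positive.

```python
def second_difference(flows):
    """Discrete 2nd derivative over the last three windows:
    f[t] - 2*f[t-1] + f[t-2]."""
    older, previous, latest = flows[-3:]
    return latest - 2 * previous + older


def inferred_score(eisc, accel, weight=0.8):
    """Score implied by the appendix rows (an inference, not the engine's code).
    A negative acceleration zeroes the score regardless of EISC."""
    return weight * eisc * max(accel, 0.0)


# Hypothetical EISC-weighted claim flow per 90-day window.
steady_growth = second_difference([1.0, 2.0, 4.0])      # 1.0, curvature positive
post_earnings_dip = second_difference([2.0, 4.0, 3.0])  # -3.0, curvature negative

# picks-and-shovels rows from Appendix A: (as_of, eisc, a)
rows = [
    ("2023-11-26", 1.6, 2.6),
    ("2024-02-24", 1.0, -2.2),
    ("2024-05-24", 2.0, 1.6),
    ("2024-08-22", 0.0, -1.0),
]
scores = {as_of: round(inferred_score(e, a), 2) for as_of, e, a in rows}
print(scores)
# {'2023-11-26': 3.33, '2024-02-24': 0.0, '2024-05-24': 2.56, '2024-08-22': 0.0}
```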
---

## 6. Honest assessment

### What worked well
1. **The end-to-end machine is real and disciplined.** Ingest (text *and* audio) → local extraction → hybrid store → independence-discounted nomination → as-of-honest backtest → ledger. It runs on the operator's actual stack, on a real multi-thousand-claim corpus.
2. **The EISC independence primitive does its job.** "Five shows, one guest" collapses to ~1 voice; the bitcoin cluster is structurally floored; cross-cluster gets the bonus. This is the heart of §4.5 and it behaves correctly and auditably (every score is reconstructable from its inputs).
3. **Extraction discipline holds.** The extractor emits *zero* on boilerplate (8-Ks, 10-K front-matter) and rich, well-typed claims on earnings Q&A (~82% interpretive/predictive vs. descriptive). Earnings calls massively out-yield filings for signal — a concrete finding that confirms a §4.1 hypothesis.
4. **The root-thesis result is the real validation.** The single most important thing §7.1 asked — would the engine have surfaced this in time — is *yes* for the root conviction, cross-cluster, in 2023.
5. **The as-of discipline + the ledger are correct by construction.** Resolution is structurally separated from scoring; the denominator started day one; the model never sees a human rating before logging. The anti-self-deception machinery is in place.
### Limitations & open questions (the important half)
1. **Noise on sparse, quarterly, single-domain data.** The 2nd-derivative acceleration is fragile when claims cluster in earnings seasons. The blueprint *deliberately deferred* the statistical smoothing (weighted-quadratic fits, significance gates, shrinkage) as premature at small n. **Open question:** with a bigger corpus, is raw 2nd-difference enough, or do we need that smoothing now? The run-to-run variance suggests we need *something*.
2. **Cross-cluster breadth is at the root, not the derivatives.** The diagnosis was concrete: in 2022–2023, AI-company *earnings* barely mentioned electricity as a constraint (that narrative hit 2024–25). So the niche power-infra derivatives are corroborated almost entirely by the *energy* cluster. The cross-domain early discussion lived in *specialist* discourse (energy/macro podcasts), which we under-sampled. **This is the crux — see §8.**
3. **The frontier fan-out is untested.** The backtest used a *hand-written* derivative tree. We have **not** validated whether the frontier model, given the seed conviction, would *propose* the right derivatives (grid/transformers/nuclear/…). That's a separate and important test (it's the other half of Job B). It's deferred, not done.
4. **No lead-time measured yet.** The resolver (external-confirmation leg) is a stub. We can say the engine *surfaced* the derivatives at specific dates, but we have not yet measured earliness against the *actual* repricing of power infrastructure (the alpha measurement, §6.3). That needs price/event data and forward time.
5. **Filing extraction targets the wrong thing.** It reads filings front-to-back; 10-K front-matter and risk-factors are low-yield. It should target Item 7 (MD&A). This skews filing claims toward boilerplate and likely costs us signal.
6. **Stance/relation extraction is thin.** The local extractor sees one chunk at a time, so it rarely wires the cross-document `relation` links the §4.2 schema assumes. The Job A contrarian scorer therefore needs a separate LLM stance-folding pass (designed, not built). **Worth flagging to the handoff author:** the schema implies relation-linking that is hard to populate at extraction time.
---

## 7. Judgment calls I made (please scrutinize all of these)

Every place I made a decision the handoff didn't fully specify, or where I diverged:

1. **[BIGGEST] Relaxed the cross-cluster gate for Job B.** The design blueprint applied the §4.5 cross-cluster rule (`k_eff ≥ 2`) as a *universal* hard gate. I removed it as a *hard gate for the under-acted-conviction (Job B) scorer* — keeping EISC ≥ 2.0 (genuine independence) and a ≥2-source requirement, and letting cross-cluster *boost the score* instead of gating it. **Rationale:** the handoff §4.4 defines Job B as *"rising independent corroboration,"* whereas §4.5's cross-cluster-is-gold framing is about Job A *discovery* (avoiding echo chambers). N independent energy companies confirming a power thesis is corroboration, not an echo. **This is the difference between the derivatives clearing or not** — with the strict gate, *only the root clears* (cross-cluster, 2023). This is the #1 thing to debate (§8).
2. **Window length = 90 days for the backtest** (blueprint default was 28). 28-day windows are degenerate on quarterly filings/earnings (most windows empty). Made it configurable; 90d for filing-cadence corpora, 28d for weekly podcasts. *Open question: mixed-cadence corpora (filings + podcasts) want different windows simultaneously — currently one global value.*
3. **Improved the EISC cross-cluster multiplier.** Blueprint counted "distinct non-capped clusters present." I changed it to count only clusters that contribute ≥ 0.5 of an independent voice — so "one guest spanning 4 clusters" can't earn the gold multiplier. (A correctness fix, not a divergence in intent.)
4. **Hand-wrote the fan-out for v1** (per blueprint build-order). The derivative *phrasings* are mine, and the LLM relevance filter judges corroboration against those phrasings — so wording matters. A frontier-generated tree might phrase them to match the corpus better (or worse). Untested.
5. **Deferred the statistical-significance machinery** (Design 1's fitted curves / bootstraps / z-gates) as premature at pilot n — kept the hard minimum-evidence gates, not the smoothing. This is *why* the signal is noisy. Reconsider as the corpus grows (§6.1).
6. **Build order: Job B first; Job A (emergence/stance/intersection) and the frontier judge/fan-out deferred.** So the backtest tested Job B only, with no frontier in the loop. Faithful to the blueprint, but it means large parts of the §4 design are designed-not-built.
7. **Filings = 10-K/10-Q/20-F/40-F only** (skipped 8-K/6-K as low-yield current-reports). Earnings via FMP. Podcasts = the 4 RSS-full shows + a partial Catalyst slice. **I did not get the specialist energy/macro podcasts** (Catalyst/Columbia Energy/Macro Voices/Odd Lots) for 2022–2023 — they're YouTube-only with slow date-windowed enumeration. This under-samples exactly the cluster breadth the derivatives needed.
8. **Local Qwen for all extraction + scoring LLM helpers.** Gemini validated as an overflow backend but not used in the backtest.
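
Judgment call 3 (counting only clusters that contribute at least 0.5 of an independent voice) can be illustrated with a short sketch. It rests on an assumption the text does not spell out: that a cluster's contribution is the sum of its sources' inverse row sums in the connectedness matrix. The function name `effective_clusters` and the example cluster assignments are hypothetical, the 0.85 coupling is the `shared_guest` value from the appendix, and the handling of capped clusters such as bitcoin is not modeled.

```python
from collections import defaultdict


def effective_clusters(conn, clusters, min_voice=0.5):
    """Count clusters whose sources add up to at least min_voice.

    conn: connectedness matrix (1.0 on the diagonal).
    clusters: cluster label for each source, in matrix order.
    Assumption: a source's voice is the inverse of its row sum.
    """
    voice = defaultdict(float)
    for row, cluster in zip(conn, clusters):
        voice[cluster] += 1.0 / sum(row)
    return sum(1 for v in voice.values() if v >= min_voice)


# One guest appearing on four shows, one show per cluster, all pairs
# coupled at the shared_guest value 0.85.
guest = [[1.0 if i == j else 0.85 for j in range(4)] for i in range(4)]
guest_clusters = ["energy", "ai_tech", "macro", "generalist"]

# Two unconnected sources in two different clusters.
independent = [[1.0, 0.0], [0.0, 1.0]]

print(len(set(guest_clusters)))                            # 4 (naive distinct-cluster count)
print(effective_clusters(guest, guest_clusters))           # 0 (each cluster holds ~0.28 of a voice)
print(effective_clusters(independent, ["energy", "ai_tech"]))  # 2
```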
---

## 8. The central debate: cross-cluster gating vs. corpus breadth

This is the section to take into the brainstorm. Grant's framing (paraphrased): *strict cross-cluster gating may limit our ability to pick up signal early; perhaps the real fix is that the cluster list is too small and there isn't enough breadth within each cluster, so the corpus needs to be dramatically increased.* I think this is the right instinct, and here's the structured case.
### The tension, precisely
- §4.5 is unambiguous and correct *for Job A discovery*: cross-cluster convergence is gold, within-cluster is near-noise (five bitcoin shows agreeing = the prior, not signal).
- But **Job B (derivatives / fan-out) has the opposite early-signal dynamic.** A niche derivative's *earliest* corroboration almost always comes from the single most-relevant cluster — the people closest to it. Power-infra repricing showed up *first* in energy-company earnings and energy-specialist discourse, and only *later* spread to AI companies and generalist macro. **Requiring cross-cluster corroboration means you only fire once the signal has already spread — which is precisely when you've lost the lead time.** The backtest demonstrates this exactly: the cross-cluster version of the signal (the root) is real but broad; the *actionable derivative* corroboration is single-cluster and earlier.

This is, I think, a genuine gap in the handoff: §4.5's "within-cluster is near-noise" was written with discovery in mind and is in tension with §4.4's "rising independent corroboration" for Job B. The implementation had to pick; I picked "relax for Job B." **The dev who wrote the spec should weigh in on whether that's the intended reading.**
### Why this points at corpus breadth (Grant's hypothesis), and I agree
The reason single-cluster corroboration feels uncomfortable is the fear of an echo chamber (energy companies talking their book). **The principled fix isn't to demand cross-cluster — it's to make "independent within a domain" *mean something*, which requires breadth.** Right now:
- We have **6 coarse clusters**, and the corpus is dominated by **two** of them (energy, ai_tech), almost entirely **company filings/earnings**. Within "energy," CEG/VST/TLN/NEE are independent issuers but they're all *sell-side-of-their-own-demand* — partly correlated by construction.
- A handful of podcasts (4 shows) provide the only non-company voices, and the *specialist* energy/macro podcasts that would carry the early cross-domain signal weren't ingested for the backtest window.

So the corpus is both **too narrow** (few clusters, two dominant) and **too shallow within clusters** (few genuinely independent voice-types per cluster). Two complementary directions:

1. **Finer cluster taxonomy.** "Energy" → {power utilities, grid/equipment, nuclear/uranium, gas, energy-specialist media}. "AI/tech" → {chips, hyperscalers, data-center REITs, AI-specialist media}. Add clusters the pilot omitted entirely: **sell-side research, trade press / industry newsletters, expert-network transcripts, specialist substacks, conference/earnings-adjacent commentary, policy/regulatory.** With a finer taxonomy, *cross-sub-cluster* convergence (e.g., a nuclear operator **and** a grid-equipment maker **and** an energy-trade newsletter) becomes a meaningful *early* signal — and the strict cross-cluster gate becomes defensible again because the clusters are now granular enough to converge early.
2. **Dramatically more breadth within each cluster.** More issuers, far more podcasts/media, and crucially the *specialist* sources where derivatives are discussed first. This is the difference between "4 energy companies" (correlated) and "20 independent energy-ecosystem voices of different types" (genuinely independent).
### My recommendation for the debate (not a decision — a starting position)
- **Short term:** keep Job B's gate at *independence* (EISC ≥ 2, ≥2 sources) for the **evidence/logging tier** — so we *catch and log* early single-cluster corroboration and start the lead-time clock — and use **cross-cluster as the promotion/confidence tier** (the thing we'd actually act on). This preserves earliness *and* honesty: we log the early single-cluster whisper, but we don't treat it as high-confidence until it's broadened.
- **Medium term (the real fix, Grant's point):** broaden the cluster taxonomy and dramatically expand the corpus — especially the specialist/media sources and finer sub-clusters. This likely does more for signal quality than any scoring tweak, and it would let us *re-tighten* the cross-cluster requirement without losing earliness, because convergence would happen earlier across a richer cluster space.
- **Either way:** build the **resolver / lead-time** measurement next, because *"did it clear the bar"* is far less interesting than *"how early did it clear vs. the actual repricing"* — and that number is what tells us whether the relaxed gate is finding alpha or just noise.
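
The short-term position can be sketched as a two-tier check. This is a sketch of the proposal, not the current `bar.py`: the `Evidence` dataclass and the `tier` function are hypothetical, and the thresholds (EISC ≥ 2.0, at least 2 sources, score floor 0.3, `k_eff` ≥ 2 for promotion) are the values stated in this write-up. Where exactly the score floor applies is an assumption, and the engine's real bar may apply further gates that are not modeled here. The three examples use rows from the Appendix A detail tables.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    score: float
    sources: int
    eisc: float
    k_eff: int


def tier(e, eisc_floor=2.0, min_sources=2, score_floor=0.3):
    """Return 'none', 'log' (evidence tier) or 'promote' (confidence tier)."""
    independent = (
        e.eisc >= eisc_floor
        and e.sources >= min_sources
        and e.score >= score_floor  # assumption: the floor applies at the evidence tier
    )
    if not independent:
        return "none"
    # Cross-cluster corroboration upgrades a logged row to something to act on.
    return "promote" if e.k_eff >= 2 else "log"


root = Evidence(score=2.40, sources=4, eisc=3.0, k_eff=2)            # K2023, 2023-05-30
headline = Evidence(score=2.56, sources=5, eisc=2.0, k_eff=1)        # picks-and-shovels, 2024-05-24
early_headline = Evidence(score=3.33, sources=3, eisc=1.6, k_eff=1)  # picks-and-shovels, 2023-11-26

print(tier(root))            # promote
print(tier(headline))        # log
print(tier(early_headline))  # none (EISC 1.6 is under the 2.0 floor)
```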
---

## 9. Suggested agenda for the brainstorm with the handoff author

1. **The §4.4-vs-§4.5 tension for Job B.** Is "rising independent corroboration" meant to allow single-cluster (independent-within-domain) corroboration, with cross-cluster as a confidence multiplier? Or is cross-cluster a hard requirement even for derivatives (accepting later signal)? *This is the load-bearing question.*
2. **Cluster taxonomy + corpus breadth.** How far to broaden clusters and sources? Which new source *types* matter most (sell-side, trade press, expert networks, specialist media)? What's the target corpus size for the cross-cluster signal to be early *and* honest?
3. **The temporal statistic.** Is raw 2nd-difference acceleration the right signal, or do we adopt the deferred smoothing now? The run-to-run variance argues for the latter.
4. **Frontier fan-out validation.** Design a test for whether the frontier *proposes* the right derivatives from a seed conviction (the untested half of Job B).
5. **Lead-time / resolution.** What external-confirmation data (price, signed deals, policy) feeds the resolver, and how do we grade earliness?
6. **Filing extraction → MD&A targeting**, and the relation/stance extraction gap (does the §4.2 schema's relation-linking need a dedicated pass?).
---

## 10. Appendix

**Corpus at backtest time:** 6,569 claims (5,129 embedded) · 411 filings + 410 earnings + 82 podcasts + 3 youtube · 47 sources · 90 voiceprints (35 named) · 10 shared-guest edges · 4 ledger rows · 81 candidate-score rows.

**Key parameters:** windows 90d × 3 (270-day lookback; 84 days at the 28-day default); EISC floor 2.0; under-acted score floor 0.3; coupling κ {shared_guest 0.85, citation 0.45, community 0.60}; cluster coupling {bitcoin 0.55, vc_consensus 0.35, other-same 0.25}; bitcoin/capped contribution ≤ 0.25.

**The contested gate, in code:** `signal_engine/signals/bar.py::_under_acted` — the `k_eff ≥ 2` requirement is commented out with the rationale; re-adding it reverts to "only the root clears."

**Reproduce:** `python -m signal_engine backtest --conviction K2023 --start 2023-03-01 --end 2024-09-01 --step-days 90 --window-days 90`. Trajectories print per-derivative with the evidence at each as_of.

**Module map:** `ingest/` (fetch + transcribe + diarize + identify), `extract/` (claims + backends), `embedstore/` (Qdrant hybrid), `signals/` (the scoring brain: independence, asof, windows, under_acted, bar, ledger_writer, resolver, run), `frontier/` (designed, deferred), `spark/` (the single gateway client), `store/` (schema + seeds), `ui/` (corpus app).

---

*Bottom line for the brainstorm: the engine is built, disciplined, and it surfaced the right thesis on real history. The honest gap is signal quality, and the highest-leverage fix is almost certainly corpus breadth + a finer cluster taxonomy (Grant's instinct), which would also let us resolve the cross-cluster gating debate from a position of strength rather than scarcity.*

---

> **Note on dates:** the quarterly as-of march is 2023-03, -05, -08, -11, 2024-02, -05, -08. The **2023-12 and 2024-03** columns are two ad-hoc single-date smoke runs (off the quarterly grid) that happen to be stored in the same table — included for completeness. The score for the SAME node at adjacent dates (e.g. 2023-11 vs 2023-12) swinging from 3.3 to 0 is itself a vivid illustration of the cadence-sensitivity problem.
## Appendix A — Full score trajectories (the noise, concretely)

Every under-acted-conviction node × every as-of date that was scored. `★` = cleared the evidence bar. The point of showing this: watch the score and the acceleration `a` swing between adjacent quarters — that is the noise the write-up (§6.1) describes.

| derivative | 2023-03 | 2023-05 | 2023-08 | 2023-11 | 2023-12 | 2024-02 | 2024-03 | 2024-05 | 2024-08 |
|---|---|---|---|---|---|---|---|---|---|
| K2023 | 0.0 | 2.4★ | 0.0 | 0.0 | 1.6 | 0.8 | 0.8 | 0.0 | 0.0 |
| K2023-cooling | 0.8 | 0.0 | 0.0 | 0.0 | 0.0 | 1.6 | 1.6 | 0.0 | 0.0 |
| K2023-gas-turbines | 0.0 | 0.0 | 0.0 | 0.8 | 0.8 | 0.0 | 0.0 | 0.0 | 0.0 |
| K2023-grid-interconnect | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| K2023-nuclear | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.8 | 0.0 | 0.0 | 0.0 |
| K2023-picks-and-shovels | 0.0 | 0.0 | 0.0 | 3.3 | 0.0 | 0.0 | 0.0 | 2.6★ | 0.0 |
| K2023-transformers | 0.0 | 0.0 | 0.8 | 0.5 | 0.0 | 0.0 | 0.0 | 4.8★ | 0.0 |
| K2023-uranium | 0.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| K2023-utilities-repriced | 0.8 | 0.0 | 0.8 | 0.0 | 0.8 | 0.0 | 1.6 | 0.0 | 0.0 |
### Detail — the acceleration sign-flips (why it's noisy)

For the headline derivative and the root, the raw inputs at each as-of (conf=confirmed corroborating claims, src=distinct sources, eisc=independence-weighted count, a=acceleration/2nd-derivative, k_eff=distinct independent clusters):

**K2023**

| as_of | score | cleared | conf | src | eisc | a | k_eff |
|---|---|---|---|---|---|---|---|
| 2023-03-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2023-05-30 | 2.40 | YES | 6 | 4 | 3.0 | 1.0 | 2 |
| 2023-08-28 | 0.00 | — | 6 | 4 | 0.0 | -5.0 | 0 |
| 2023-11-26 | 0.00 | — | 6 | 4 | 0.0 | 3.0 | 0 |
| 2023-12-01 | 1.60 | — | 6 | 1 | 1.0 | 2.0 | 1 |
| 2024-02-24 | 0.80 | — | 7 | 4 | 1.0 | 1.0 | 1 |
| 2024-03-01 | 0.80 | — | 6 | 4 | 1.0 | 1.0 | 1 |
| 2024-05-24 | 0.00 | — | 9 | 6 | 1.6 | -0.4 | 1 |
| 2024-08-22 | 0.00 | — | 10 | 7 | 1.0 | -1.2 | 1 |

**K2023-picks-and-shovels**

| as_of | score | cleared | conf | src | eisc | a | k_eff |
|---|---|---|---|---|---|---|---|
| 2023-03-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2023-05-30 | 0.00 | — | 2 | 2 | 1.0 | -1.0 | 1 |
| 2023-08-28 | 0.00 | — | 2 | 2 | 0.0 | -1.0 | 0 |
| 2023-11-26 | 3.33 | — | 4 | 3 | 1.6 | 2.6 | 1 |
| 2023-12-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2024-02-24 | 0.00 | — | 5 | 3 | 1.0 | -2.2 | 1 |
| 2024-03-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2024-05-24 | 2.56 | YES | 10 | 5 | 2.0 | 1.6 | 1 |
| 2024-08-22 | 0.00 | — | 5 | 3 | 0.0 | -1.0 | 0 |

**K2023-utilities-repriced**

| as_of | score | cleared | conf | src | eisc | a | k_eff |
|---|---|---|---|---|---|---|---|
| 2023-03-01 | 0.80 | — | 1 | 1 | 1.0 | 1.0 | 1 |
| 2023-05-30 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2023-08-28 | 0.80 | — | 1 | 1 | 1.0 | 1.0 | 1 |
| 2023-11-26 | 0.00 | — | 3 | 2 | 1.0 | -1.0 | 1 |
| 2023-12-01 | 0.77 | — | 4 | 2 | 1.6 | 0.6 | 1 |
| 2024-02-24 | 0.00 | — | 4 | 3 | 1.0 | 0.0 | 1 |
| 2024-03-01 | 1.60 | — | 7 | 4 | 2.0 | 1.0 | 1 |
| 2024-05-24 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2024-08-22 | 0.00 | — | 16 | 7 | 2.286 | -1.714 | 1 |

**K2023-nuclear**

| as_of | score | cleared | conf | src | eisc | a | k_eff |
|---|---|---|---|---|---|---|---|
| 2023-03-01 | 0.00 | — | 6 | 4 | 1.0 | 0.0 | 1 |
| 2023-05-30 | 2.05 | — | 5 | 3 | 1.6 | 1.6 | 1 |
| 2023-08-28 | 0.00 | — | 10 | 7 | 1.0 | -7.0 | 1 |
| 2023-11-26 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2023-12-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2024-02-24 | 0.80 | — | 6 | 4 | 1.0 | 1.0 | 1 |
| 2024-03-01 | 0.00 | — | 2 | 2 | 0.0 | 0.0 | 0 |
| 2024-05-24 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2024-08-22 | 0.00 | — | 12 | 4 | 1.0 | -2.0 | 1 |

**K2023-transformers**

| as_of | score | cleared | conf | src | eisc | a | k_eff |
|---|---|---|---|---|---|---|---|
| 2023-03-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2023-05-30 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2023-08-28 | 0.80 | — | 1 | 1 | 1.0 | 1.0 | 1 |
| 2023-11-26 | 0.48 | — | 4 | 2 | 1.0 | 0.6 | 1 |
| 2023-12-01 | 0.00 | — | 0 | 0 | 0.0 | 0.0 | 0 |
| 2024-02-24 | 0.00 | — | 4 | 2 | 0.0 | -1.0 | 0 |
| 2024-03-01 | 0.00 | — | 6 | 4 | 0.0 | -1.0 | 0 |
| 2024-05-24 | 4.80 | YES | 8 | 5 | 2.0 | 3.0 | 1 |
| 2024-08-22 | 0.00 | — | 8 | 5 | 1.6 | -1.6 | 1 |