Swap the dead "scarcity as the connecting idea" / bitcoin-as-settlement spine for the v2.0 reserve-asset spine (bitcoin = apex non-debasable reserve asset; debasement = forcing function; AI = abundance engine; throughline is an asset-value/capital-flow claim, not settlement; three seams Energy<->Compute, Debasement<->Bitcoin, AI<->Data-Ownership) everywhere it was still encoded in live code, the seed, and the docs.

- architect_agent.py / outreach_agent.py: both system prompts carried "scarcity as the connecting idea" and shipped settlement framing into every generated draft; rewritten to the reserve-asset spine.
- thesis_seed.py: THROUGHLINE, PILLAR_1, the AI/energy-operator segment angle, and THESIS_V2 corrected and voice-cleaned (no em dash / "X, not Y" / "bet"). PILLAR_2/3 (real revenue, founder access) kept.
- ensure_thesis_v2_promoted / revert_thesis_v2_promotion: make the v2.0 spine the working APPROVED spine and re-ground/clean the core nodes, deployment-state-invariant (structural targeting, not body text) and fully reversible (captures prior body/title/status/deleted_at). NODE level only: never sets a thesis_version canonical (guardrail #4); no hard deletes (guardrail #3). Wired into init_db after the v2 candidate stage.
- docs/thesis-handoff.md replaced wholesale with the complete v2.0 doc; Ten31_Agentic_Build_Plan.md + PHASE_1.md throughline glosses updated.

The v2.0 spine remains an unratified draft from the signal-engine workstream: canonical freeze stays the partners' dual sign-off, and Appendix-A conviction/exposure figures stay Grant's working read.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ten31 — Agentic Capability Build Plan
Working document. Purpose: a concrete, sequenced plan for building an in-house system of AI agents to widen the top of the fundraising funnel, refine and propagate Ten31's thesis, and automate marketing/branding workflows — built with internal resources using Claude and Claude Code as the engineering partner.
1. Approach in one paragraph
Build six agents — five workers plus a lightweight orchestrator — on the Claude Agent SDK, connected to your systems through MCP. Run the reasoning on Claude (frontier-quality judgment for research, messaging, drafting). Self-host the data layer and the privacy-sensitive model work on your existing Start9 server and your dual DGX Sparks. Buy nothing for the core: your self-built CRM becomes the system of record, and your existing Gmail/Superhuman + calendar connectors supply the relationship data. The real unit of reuse is not the agent count — it is one shared LP graph (your CRM) plus a library of skills every agent draws from.
2. Guiding principles
- Sovereignty first. Sensitive LP and relationship data stays on infrastructure you control (Start9 + DGX Sparks). Only the minimum necessary context per call ever reaches a third-party model API.
- Frontier reasoning where it is best-in-class; local where privacy or cost dominate. Claude for hard agentic reasoning and LP-facing output; local open models for embeddings, redaction, triage, transcription, and reasoning over data that must not leave your walls.
- Human-in-the-loop on anything outbound or thesis-defining. Agents draft and prepare; partners approve and send.
- Compliant by design. Log every agent action; gate all outbound; bring counsel in before any cold outreach goes live.
- One source of truth. Every agent reads from and writes to the same LP graph, so research → outreach → nurture → meeting prep compound instead of fragmenting.
3. The agent roster (6)
| Agent | Job | Cadence | Brain | Human gate |
|---|---|---|---|---|
| Scout | Watches sources (X/nostr, filings, treasury announcements, conference rosters, podcast networks); flags trigger events; populates the pipeline. | Continuous / scheduled | Local (triage) + Claude (judgment calls) | None (internal only) |
| Analyst | Builds LP dossiers, enriches records, maps shortest warm-intro path through the team's network. | On-demand + triggered | Claude (synthesis); local for RAG/embeddings | None (internal only) |
| Architect | Thesis articulation. Owns and refines the canonical messaging — the reserve-asset throughline: as fiat debases and AI commoditizes the reproducible, value accrues to the scarce side of one supply chain (energy, compute, and bitcoin as the non-debasable reserve asset), structured on three seams (Energy↔Compute, Debasement↔Bitcoin, AI↔Data-Ownership). The copilot partners sit with to sharpen the narrative. Output = a living "messaging source of truth." | On-demand, collaborative | Claude | Partner sign-off on canonical thesis |
| Scribe | Distribution / amplification. Takes the Architect's canonical thesis + your content (Bitcoin Alpha, partner shows, memos) and propagates segment-specific cuts across X, nostr, LinkedIn, email. | Scheduled + on-demand | Claude | Review before publish |
| Closer | Drafts personalized outreach and nurture sequences, preps partners before LP calls, writes follow-ups, keeps the CRM clean. | Triggered + on-demand | Claude | Hard gate — human sends all outbound |
| Orchestrator ("Chief of Staff") | Schedules runs, routes work between agents, escalates to a human. | Always on | Claude (light) | n/a |
Why Architect and Scribe are separate. Distribution is high-frequency and semi-mechanical; thesis articulation is low-frequency, high-judgment, and collaborative. Keeping them apart lets the Architect own a stable, partner-approved narrative that the Scribe then propagates consistently everywhere.
4. Architecture and hosting map
4.1 Model layer
- Claude (API) — the brains for Analyst synthesis, Architect thesis work, Scribe drafting, Closer judgment, and Orchestrator routing. Use a stronger model for Architect/Analyst, a faster one for high-volume Scout/Closer tasks.
- Local model on the DGX Sparks — current local model is Qwen3.6 35B-A3B running on a single Spark. Used for PII redaction before any data leaves your walls, inbound triage/classification, transcription orchestration, structuring/extraction, and local reasoning over data you choose never to send out.
- The A3B (~3B active params) design means only a small slice of the model runs per token, so it largely sidesteps the Spark's memory-bandwidth limit and keeps decode fast despite being a 35B-total model. No need to link both Sparks for a larger model — that earlier ceiling is moot for this workload.
- Embeddings + reranking (shipped, Spark Control v0.15.0). Retrieval runs on BAAI/bge-m3 (dense, 1024-dim, L2-normalized) plus BAAI/bge-reranker-v2-m3 (cross-encoder), served by spark-embed, a small FastAPI server on Spark 2 built from the NGC PyTorch image (HF TEI was ruled out: no arm64 CUDA image). Exposed through Spark Control as /v1/embeddings, /v1/rerank, and /api/search (orchestrated hybrid retrieval). Combined GPU footprint on Spark 2 is trivial (~3 GB).
- Spark allocation. Spark 1 = LLM serving (hot KV cache). Spark 2 = embeddings + reranker + audio + the Qdrant vector index. Both Sparks are treated as always-on production infrastructure.
- All local model services are fronted by Spark Control (the self-hosted gateway on Start9): agents hit one trusted URL for chat, embeddings, rerank, transcription, and TTS, with shared TLS, access control, and observability.
- Auth note: Agent SDK agents must authenticate with an API key, not a claude.ai login.
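The embeddings bullet above notes that the dense vectors are L2-normalized. One practical consequence is that cosine similarity reduces to a plain dot product. The sketch below illustrates this with toy 3-dimensional vectors (not real 1024-dim BGE-M3 output); the function names are illustrative and not part of spark-embed.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed the long way, for comparison."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for a query embedding and a chunk embedding.
query = [3.0, 4.0, 0.0]
chunk = [4.0, 3.0, 0.0]

q_unit = l2_normalize(query)
c_unit = l2_normalize(chunk)

# On unit vectors, the dot product equals the cosine similarity.
score_fast = dot(q_unit, c_unit)
score_slow = cosine(query, chunk)
```

Because normalization happens once at embedding time, a vector index can rank by dot product alone without recomputing norms per query.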
4.2 Data layer — the LP graph (self-hosted)
- The CRM (self-hosted on Start9) is the canonical system of record. Extend it to be the LP graph. Add: prospect/LP schema fields (thesis fit, segment, accreditation/QP status, warmth score, source, owner, last-touch), an interaction log (every agent action + every human touch), a derived relationship graph table, and canonical entity IDs for entity resolution (see ingest pipeline).
- Vector store: Qdrant on Spark 2 (settled). Holds the embedded chunks. It is a rebuildable, derived index, not a second source of truth — if lost, it re-embeds from the CRM in minutes. Qdrant provides dense search + native BM25 + payload filtering + Reciprocal Rank Fusion in one service.
- Retrieval pipeline. One orchestrated call to Spark Control /api/search: embed query (BGE-M3) → Qdrant dense + BM25 RRF with payload pre-filter → cross-encoder rerank → top_k. BM25 is generated client-side via FastEmbed (Qdrant/bm25) at both ingest and query time, with Qdrant applying IDF over your corpus, so domain entities (LP names, tickers, portfolio companies) are weighted by your own term statistics rather than BGE-M3's general-web sparse weights.
- Ingest pipeline (the real Phase 0 work). CRM record/change → chunk (one chunk per email/note/transcript-turn; one per memo section; time-aware; entities + date_ts kept as filterable payload, not embedded text) → resolve entities to a canonical lp_id (lightweight local-Qwen step) → produce both a dense vector (/v1/embeddings) and a sparse BM25 vector (FastEmbed) → upsert both + payload to Qdrant directly (not via the gateway). One-time backfill + idempotent incremental sync. Full recipe: docs/EMBEDDINGS.md.
- Per-agent retrieval modes. Don't force one pipeline on all agents. Build a small library the orchestrator picks from: high-recall dense at large K (Scout), high-precision keyword/BM25 (Closer: "did we ever discuss X with this LP?"), long-context + rerank (Architect). The CRM MCP server exposes these as tools.
- Wrap the CRM in an MCP server so all agents read/write through one uniform interface, including the retrieval modes above. Because the CRM is self-built, any endpoint the agents need can be added.
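The retrieval pipeline above fuses the dense and BM25 result lists with Reciprocal Rank Fusion. Qdrant performs this fusion server-side; the sketch below only illustrates the scoring formula in plain Python. The constant k=60 is a commonly used default and an assumption here, and the chunk ids are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ids missing from a list contribute nothing.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical chunk ids returned by the two retrievers, best first.
dense_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Chunks that appear in both lists ("c1", "c3") rise above chunks that only one retriever found. This is why hybrid retrieval tends to surface exact entity matches without losing semantic hits.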
4.3 Integration layer (MCP fabric)
- MCP servers to stand up / connect:
- CRM / LP graph (custom, self-hosted) — primary.
- Email + calendar — Gmail/Superhuman connectors are already live; these feed Closer (drafting, follow-ups) and the Analyst's warm-path derivation.
- Drive / notes — internal documents and memos.
- Publishing channels — X, nostr, LinkedIn, email/newsletter (for Scribe).
- Public data sources — filings, web search, and the X API (official key in hand) for Scout/Analyst enrichment. X is a primary source here: per-prospect public profile/bio/activity and follower-following overlap for thesis-fit scoring and mutual-connection discovery (Analyst), plus account/list/keyword monitoring and follower-graph signals (Scout). Confirm what your X access tier permits (full-archive search, follower-graph pulls, streaming) — that sets the ceiling on heavier monitoring. nostr APIs as a complementary source.
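The follower-following overlap mentioned in the public-data bullet can be sketched with basic set operations. The plan does not specify an overlap measure, so Jaccard similarity is an assumption here. The handles are invented, and pulling the follow lists from the X API is out of scope for this example.

```python
def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def mutual_connections(prospect_following, team_following):
    """Accounts followed by both the prospect and the team, sorted."""
    return sorted(set(prospect_following) & set(team_following))

# Invented handles standing in for follow lists fetched elsewhere.
prospect = {"@alice", "@bob", "@carol", "@dave"}
team = {"@bob", "@carol", "@erin"}

overlap = jaccard(prospect, team)             # 2 shared out of 5 distinct
mutuals = mutual_connections(prospect, team)  # candidates for a warm intro
```

A raw overlap score is only one input to thesis-fit scoring. The list of mutual accounts is the more directly useful output for warm-path mapping.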
4.4 Orchestration / runtime
- Inner loop: Claude Agent SDK handles each agent's tool-use loop and context management.
- Outer loop: a thin workflow engine decides when and which agent runs (Temporal for durable retries, or simpler cron/queue + n8n glue to start).
- Observability: structured logging of every agent action, with a simple dashboard. Required for both debugging and compliance.
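One way to approach the structured logging described above is to emit one JSON object per agent action, using only the Python standard library. The field names (agent, action, target_id, outbound) are illustrative assumptions, not a schema defined in this plan.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent_actions")

def log_agent_action(agent, action, target_id, outbound=False, **details):
    """Serialize one agent action as a JSON line and hand it to logging.

    Returns the serialized line so callers can also persist it elsewhere,
    for example in the CRM interaction log.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "target_id": target_id,
        "outbound": outbound,
        "details": details,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line

# Hypothetical example: Scout flags a trigger event on a prospect record.
line = log_agent_action("scout", "flag_trigger_event", "lp_0042", source="filing")
parsed = json.loads(line)
```

One JSON object per line keeps the log easy to filter by agent or by the outbound flag, which serves both debugging and compliance review.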
4.5 Enrichment (privacy-preserving)
- Default: one-way, per-prospect public lookups that write results into the CRM. Never upload the LP list to a third party. The X API is the workhorse here — public, per-prospect, ToS-compliant via the official key — and its follower-graph data complements the email/calendar relationship graph for warm-path mapping.
- Optional: a self-hosted scraper/enrichment pipeline on the Sparks if you want zero third-party API exposure.
4.6 Redaction / re-hydration boundary (Claude-facing reasoning)
- For the steps where an agent must have Claude reason over LP-specific content (Analyst dossiers, Closer drafting), a local scrub → reason → re-hydrate round-trip keeps identifiers off the third-party API: the Sparks pseudonymize names/orgs/amounts to stable placeholders, Claude reasons over the de-identified prompt, and real values are swapped back locally before a human reviews. The ingest/retrieval path is already fully local and needs none of this.
- This is designed now, built in Phase 2/3 (it is not needed in Phase 0). Full design: docs/redaction-rehydration.md.
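The scrub → reason → re-hydrate round-trip can be illustrated with a minimal sketch. It assumes entity detection has already happened locally (in the plan, that is the Sparks' job), so the entities are passed in as a plain dict. The names, placeholder format, and function names are hypothetical and are not taken from docs/redaction-rehydration.md.

```python
def pseudonymize(text, entities):
    """Replace each known entity with a placeholder such as [PERSON_1].

    `entities` maps a real value to its kind, e.g. {"Jane Doe": "PERSON"}.
    Returns the scrubbed text and a placeholder -> real value mapping;
    the mapping never leaves local infrastructure.
    """
    mapping = {}
    counters = {}
    # Longest values first, so a short name cannot split a longer one.
    for real in sorted(entities, key=len, reverse=True):
        kind = entities[real]
        counters[kind] = counters.get(kind, 0) + 1
        placeholder = f"[{kind}_{counters[kind]}]"
        mapping[placeholder] = real
        text = text.replace(real, placeholder)
    return text, mapping

def rehydrate(text, mapping):
    """Swap placeholders back to real values before human review."""
    for placeholder, real in mapping.items():
        text = text.replace(placeholder, real)
    return text

# Invented note and entities, for illustration only.
note = "Jane Doe at Example Family Office discussed a $5M commitment."
entities = {"Jane Doe": "PERSON", "Example Family Office": "ORG", "$5M": "AMOUNT"}

scrubbed, mapping = pseudonymize(note, entities)
# Stand-in for the remote model's reply over the de-identified prompt.
model_reply = "Thank [PERSON_1] for the [AMOUNT_1] discussion."
restored = rehydrate(model_reply, mapping)
```

In this sketch the placeholders are stable only within one call. Keeping them stable across calls would require persisting the mapping locally, which is a design decision left to the full design document.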
5. Build sequence
Phase 0 — Foundation
The substrate: data layer + retrieval, no live-in-the-wild agents yet. Division of labor:
- Spark developer (their side): spark-embed serving BGE-M3 + BGE-Reranker-v2-m3 and Qdrant on Spark 2, exposed via Spark Control /v1/embeddings + /v1/rerank.
- Claude Code + you (this project):
- Read the CRM code; document the storage engine, schema, and API surface.
- Extend the CRM schema (LP/prospect fields, interaction log, relationship graph, canonical entity IDs).
- Build the ingest/sync pipeline (chunking + entity resolution + metadata payloads; backfill + incremental).
- Build the CRM MCP server wrapping CRM reads/writes and the per-agent retrieval modes.
- Bring counsel in to define outbound and recordkeeping rules so the system is compliant from day one.
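For the ingest/sync item above, one way to make incremental sync idempotent is to derive each chunk's id deterministically from its source record, so a re-run overwrites points instead of duplicating them. The sketch below uses uuid5 for that and keeps lp_id and date_ts in the payload rather than in the embedded text. The namespace string, the Chunk class, and the blank-line section split are assumptions, and the actual Qdrant upsert is omitted.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace; any fixed UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "crm-chunks.example")

@dataclass
class Chunk:
    record_id: str
    chunk_index: int
    text: str                                    # this is what gets embedded
    payload: dict = field(default_factory=dict)  # filterable metadata only

    @property
    def point_id(self) -> str:
        """Same record + index always yields the same id."""
        key = f"{self.record_id}:{self.chunk_index}"
        return str(uuid.uuid5(CHUNK_NAMESPACE, key))

def chunk_memo(record_id, body, lp_id, date_ts):
    """Split a memo into one chunk per section (blank-line separated here)."""
    sections = [s.strip() for s in body.split("\n\n") if s.strip()]
    return [
        Chunk(record_id, i, section, {"lp_id": lp_id, "date_ts": date_ts})
        for i, section in enumerate(sections)
    ]

# Invented memo; running the same input twice must give identical ids.
memo = "Section one text.\n\nSection two text."
first_run = chunk_memo("memo-17", memo, lp_id="lp_0042", date_ts=1700000000)
second_run = chunk_memo("memo-17", memo, lp_id="lp_0042", date_ts=1700000000)
```

With ids derived this way, the backfill and the incremental sync can share one upsert path. A changed record re-embeds into the same ids instead of leaving stale copies behind.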
Phase 1 — Architect + Scribe
- Stand up the Architect first: encode the current thesis, voice, and segment definitions as skills; use it collaboratively to produce the canonical messaging source of truth.
- Then Scribe: propagate that thesis into segment-specific content with human review before publish.
- Lowest risk, highest immediate awareness ROI, never touches cold outreach — and it proves the full pattern (SDK + skills + MCP + human review).
Phase 2 — Scout + Analyst
- Scout populates the pipeline from public signals (X monitoring via the API key); Analyst builds dossiers and derives warm paths from your own email/calendar graph plus X follower-graph overlap.
- Internal-facing, still no outbound. This is where the Sparks earn their keep (bulk classification, embeddings, RAG).
Phase 3 — Closer + Orchestrator
- Closer drafts outbound, nurture, and meeting prep — with hard human-in-the-loop gates and full logging. Highest-risk and regulated, so it comes last.
- Orchestrator added once there are multiple agents to coordinate and schedule.
6. Team and ownership model
- Engineering partner: Claude + Claude Code, supplying Agent SDK and MCP fluency, scaffolding the agents, writing the MCP servers and orchestration, and customizing the Start9 CRM package.
- Operator: you (and your partner). You own deployment, secrets/key management, uptime, and the human-review gates. Your prior Start9 CRM build demonstrates this is well within reach.
- The one real risk is time, not capability. Removing the part-time data/ops hire means operational ownership lands on the partners. If partner time is scarce, that — not tooling or skill — is the constraint to manage. Mitigations: keep the early phases internal-only (no on-call urgency), automate logging/monitoring, and stage the highest-maintenance agent (Closer) last.
7. Compliance by design
- Log every agent action and every outbound draft.
- Gate all outbound through human send.
- Resolve solicitation posture (e.g. 506(b) vs 506(c)), accreditation/QP verification, and recordkeeping with counsel before the Closer touches cold outreach.
- Start with distribution and inbound nurture, where constraints are lightest.
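The "log everything, gate all outbound" rules above can be expressed as a small guard in code. This sketch is under stated assumptions: the class and function names are hypothetical, the audit log is a plain list, and nothing here sends a message. Release only marks a draft as eligible for a human to send.

```python
from dataclasses import dataclass
from typing import Optional

class NotApprovedError(Exception):
    """Raised when an outbound draft has no recorded human approval."""

@dataclass
class OutboundDraft:
    draft_id: str
    body: str
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        self.approved_by = reviewer

def release_for_send(draft: OutboundDraft, audit_log: list) -> OutboundDraft:
    """Allow a draft through only if a human approved it; log either way."""
    if draft.approved_by is None:
        audit_log.append({"draft_id": draft.draft_id, "event": "blocked_unapproved"})
        raise NotApprovedError(f"draft {draft.draft_id} has no human approval")
    audit_log.append(
        {"draft_id": draft.draft_id, "event": "released", "approved_by": draft.approved_by}
    )
    return draft

# Hypothetical flow: the first attempt is blocked, the second passes after review.
audit = []
draft = OutboundDraft("d-001", "Hello from the Closer.")
try:
    release_for_send(draft, audit)
    blocked = False
except NotApprovedError:
    blocked = True
draft.approve("partner_a")
released = release_for_send(draft, audit)
```

Logging the blocked attempt as well as the release gives counsel a record of the gate working, not just of what went out.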
8. Open decisions
Resolved: local chat/triage model = Qwen3.6 35B-A3B (Spark 1); embedding = BAAI/bge-m3 dense 1024-dim; reranker = BAAI/bge-reranker-v2-m3; vector DB = Qdrant v1.16.0 on Spark 2; serving = spark-embed (custom FastAPI on NGC PyTorch image, not TEI); BM25 sparse generated client-side via FastEmbed (Qdrant/bm25); all fronted by Spark Control (/v1/embeddings, /v1/rerank, /api/search), shipped v0.15.0. Embedding-model A/B upgrade candidate if dense recall lags: Qwen3-Embedding-4B (same /v1/embeddings contract).
Still open:
- Workflow engine for the outer loop (Phase 3): Temporal vs. cron/queue + n8n to start.
- Whether any third-party enrichment API is acceptable, or X + fully self-hosted enrichment only.
- Confirm X API usage limits (full-archive search, follower-graph pulls, streaming) to size Scout's monitoring scope. (Current access is pay-as-you-go credits.)
- Segment definitions for the Architect/Scribe (who are the distinct LP audiences, and what does each one need to hear?).
- Embedding dimension/quantization left at BGE-M3 native 1024-dim fp16 — no Matryoshka truncation or int8 needed at this corpus scale.