Phase 0 foundation: canonical schema, ingest pipeline, CRM MCP server

Workstream A–C substrate for the Ten31 agentic system: - A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9 - A2: additive/reversible core migration (canonical_entities, entity_links, interaction_log, relationship_edges, soft-delete) + ledgered runner - B1/B3: chunking + deterministic entity resolution (backend/ingest) - B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks - C: CRM MCP server (reads, retrieval modes, logged writes) — no outbound tools - docs: redaction/re-hydration, Gmail enablement runbook - synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db, drop legacy files + start9/0.3.5) Verified end-to-end on synthetic data + live Sparks (hybrid > dense on entity queries). Real backfill runs on Ten31 infra; index holds synthetic data only. Branch snapshot also captures pre-existing working-tree changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 08:11:28 -05:00
parent 7027efd777
commit c7ce44d963
99 changed files with 10676 additions and 7817 deletions
@@ -0,0 +1,133 @@
+# Ten31 — Agentic Capability Build Plan
+
+*Working document. Purpose: a concrete, sequenced plan for building an in-house system of AI agents to widen the top of the fundraising funnel, refine and propagate Ten31's thesis, and automate marketing/branding workflows — built with internal resources using Claude and Claude Code as the engineering partner.*
+
+---
+
+## 1. Approach in one paragraph
+
+Build **six agents** — five workers plus a lightweight orchestrator — on the **Claude Agent SDK**, connected to your systems through **MCP**. Run the *reasoning* on **Claude** (frontier-quality judgment for research, messaging, drafting). **Self-host the data layer and the privacy-sensitive model work** on your existing Start9 server and your **dual DGX Sparks**. **Buy nothing for the core**: your self-built CRM becomes the system of record, and your existing Gmail/Superhuman + calendar connectors supply the relationship data. The real unit of reuse is not the agent count — it is one shared **LP graph** (your CRM) plus a library of **skills** every agent draws from.
+
+---
+
+## 2. Guiding principles
+
+1. **Sovereignty first.** Sensitive LP and relationship data stays on infrastructure you control (Start9 + DGX Sparks). Only the minimum necessary context per call ever reaches a third-party model API.
+2. **Frontier reasoning where it is best-in-class; local where privacy or cost dominate.** Claude for hard agentic reasoning and LP-facing output; local open models for embeddings, redaction, triage, transcription, and reasoning over data that must not leave your walls.
+3. **Human-in-the-loop on anything outbound or thesis-defining.** Agents draft and prepare; partners approve and send.
+4. **Compliant by design.** Log every agent action; gate all outbound; bring counsel in before any cold outreach goes live.
+5. **One source of truth.** Every agent reads from and writes to the same LP graph, so research → outreach → nurture → meeting prep compound instead of fragmenting.
+
+---
+
+## 3. The agent roster (6)
+
+| Agent | Job | Cadence | Brain | Human gate |
+|---|---|---|---|---|
+| **Scout** | Watches sources (X/nostr, filings, treasury announcements, conference rosters, podcast networks); flags trigger events; populates the pipeline. | Continuous / scheduled | Local (triage) + Claude (judgment calls) | None (internal only) |
+| **Analyst** | Builds LP dossiers, enriches records, maps shortest warm-intro path through the team's network. | On-demand + triggered | Claude (synthesis); local for RAG/embeddings | None (internal only) |
+| **Architect** | **Thesis articulation.** Owns and refines the canonical messaging — the scarcity / critical-infrastructure throughline tying bitcoin to AI infrastructure. The copilot partners sit with to sharpen the narrative. Output = a living "messaging source of truth." | On-demand, collaborative | Claude | Partner sign-off on canonical thesis |
+| **Scribe** | **Distribution / amplification.** Takes the Architect's canonical thesis + your content (Bitcoin Alpha, partner shows, memos) and propagates segment-specific cuts across X, nostr, LinkedIn, email. | Scheduled + on-demand | Claude | Review before publish |
+| **Closer** | Drafts personalized outreach and nurture sequences, preps partners before LP calls, writes follow-ups, keeps the CRM clean. | Triggered + on-demand | Claude | **Hard gate** — human sends all outbound |
+| **Orchestrator** ("Chief of Staff") | Schedules runs, routes work between agents, escalates to a human. | Always on | Claude (light) | n/a |
+
+**Why Architect and Scribe are separate.** Distribution is high-frequency and semi-mechanical; thesis articulation is low-frequency, high-judgment, and collaborative. Keeping them apart lets the Architect own a stable, partner-approved narrative that the Scribe then propagates consistently everywhere.
+
+---
+
+## 4. Architecture and hosting map
+
+### 4.1 Model layer
+- **Claude (API)** — the brains for Analyst synthesis, Architect thesis work, Scribe drafting, Closer judgment, and Orchestrator routing. Use a stronger model for Architect/Analyst, a faster one for high-volume Scout/Closer tasks.
+- **Local model on the DGX Sparks** — current local model is **Qwen3.6 35B-A3B running on a single Spark**. Used for PII redaction before any data leaves your walls, inbound triage/classification, transcription orchestration, structuring/extraction, and local reasoning over data you choose never to send out.
+  - The **A3B (~3B active params)** design means only a small slice of the model runs per token, so it largely sidesteps the Spark's memory-bandwidth limit and keeps decode fast despite being a 35B-total model. No need to link both Sparks for a larger model — that earlier ceiling is moot for this workload.
+  - **Embeddings + reranking (shipped, Spark Control v0.15.0).** Retrieval runs on `BAAI/bge-m3` (dense, 1024-dim, L2-normalized) plus `BAAI/bge-reranker-v2-m3` (cross-encoder), served by **spark-embed** — a small FastAPI server on **Spark 2** built from the NGC PyTorch image (HF TEI was ruled out: no arm64 CUDA image). Exposed through Spark Control as `/v1/embeddings`, `/v1/rerank`, and `/api/search` (orchestrated hybrid retrieval). Combined GPU footprint on Spark 2 is trivial (~3 GB).
+  - **Spark allocation.** Spark 1 = LLM serving (hot KV cache). Spark 2 = embeddings + reranker + audio + the Qdrant vector index. Both Sparks are treated as always-on production infrastructure.
+- **All local model services are fronted by Spark Control** (the self-hosted gateway on Start9): agents hit one trusted URL for chat, embeddings, rerank, transcription, and TTS, with shared TLS, access control, and observability.
+- **Auth note:** Agent SDK agents must authenticate with an **API key**, not a claude.ai login.
+
+### 4.2 Data layer — the LP graph (self-hosted)
+- **The CRM (self-hosted on Start9) is the canonical system of record.** Extend it to be the LP graph. Add: prospect/LP schema fields (thesis fit, segment, accreditation/QP status, warmth score, source, owner, last-touch), an interaction log (every agent action + every human touch), a derived **relationship graph** table, and **canonical entity IDs** for entity resolution (see ingest pipeline).
+- **Vector store: Qdrant on Spark 2 (settled).** Holds the embedded chunks. It is a **rebuildable, derived index**, not a second source of truth — if lost, it re-embeds from the CRM in minutes. Qdrant provides dense search + native BM25 + payload filtering + Reciprocal Rank Fusion in one service.
+- **Retrieval pipeline.** One orchestrated call to Spark Control `/api/search`: embed query (BGE-M3) → Qdrant dense + BM25 RRF with payload pre-filter → cross-encoder rerank → top_k. BM25 is generated **client-side** via FastEmbed (`Qdrant/bm25`) at both ingest and query time, with Qdrant applying IDF over *your* corpus — so domain entities (LP names, tickers, portfolio companies) are weighted by your own term statistics rather than BGE-M3's general-web sparse weights.
+- **Ingest pipeline (the real Phase 0 work).** CRM record/change → chunk (one chunk per email/note/transcript-turn; one per memo *section*; time-aware; entities + `date_ts` kept as filterable payload, not embedded text) → resolve entities to a canonical `lp_id` (lightweight local-Qwen step) → produce **both** a dense vector (`/v1/embeddings`) and a sparse BM25 vector (FastEmbed) → upsert both + payload to Qdrant **directly** (not via the gateway). One-time backfill + idempotent incremental sync. Full recipe: `docs/EMBEDDINGS.md`.
+- **Per-agent retrieval modes.** Don't force one pipeline on all agents. Build a small library the orchestrator picks from: high-recall dense at large K (Scout), high-precision keyword/BM25 (Closer — "did we ever discuss X with this LP?"), long-context + rerank (Architect). The CRM MCP server exposes these as tools.
+- **Wrap the CRM in an MCP server** so all agents read/write through one uniform interface, including the retrieval modes above. Because the CRM is self-built, any endpoint the agents need can be added.
+
+### 4.3 Integration layer (MCP fabric)
+- MCP servers to stand up / connect:
+  - **CRM / LP graph** (custom, self-hosted) — primary.
+  - **Email + calendar** — Gmail/Superhuman connectors are already live; these feed Closer (drafting, follow-ups) and the Analyst's warm-path derivation.
+  - **Drive / notes** — internal documents and memos.
+  - **Publishing channels** — X, nostr, LinkedIn, email/newsletter (for Scribe).
+  - **Public data sources** — filings, web search, and the **X API (official key in hand)** for Scout/Analyst enrichment. X is a primary source here: per-prospect public profile/bio/activity and follower-following overlap for thesis-fit scoring and mutual-connection discovery (Analyst), plus account/list/keyword monitoring and follower-graph signals (Scout). Confirm what your X access *tier* permits (full-archive search, follower-graph pulls, streaming) — that sets the ceiling on heavier monitoring. nostr APIs as a complementary source.
+
+### 4.4 Orchestration / runtime
+- Inner loop: **Claude Agent SDK** handles each agent's tool-use loop and context management.
+- Outer loop: a thin workflow engine decides *when* and *which* agent runs (Temporal for durable retries, or simpler cron/queue + n8n glue to start).
+- **Observability:** structured logging of every agent action, with a simple dashboard. Required for both debugging and compliance.
+
+### 4.5 Enrichment (privacy-preserving)
+- Default: **one-way, per-prospect public lookups** that write results *into* the CRM. Never upload the LP list to a third party. The **X API** is the workhorse here — public, per-prospect, ToS-compliant via the official key — and its follower-graph data complements the email/calendar relationship graph for warm-path mapping.
+- Optional: a **self-hosted scraper/enrichment pipeline on the Sparks** if you want zero third-party API exposure.
+
+### 4.6 Redaction / re-hydration boundary (Claude-facing reasoning)
+- For the steps where an agent must have **Claude reason over LP-specific content** (Analyst dossiers, Closer drafting), a local **scrub → reason → re-hydrate** round-trip keeps identifiers off the third-party API: the Sparks pseudonymize names/orgs/amounts to stable placeholders, Claude reasons over the de-identified prompt, and real values are swapped back locally before a human reviews. The ingest/retrieval path is already fully local and needs none of this.
+- This is **designed now, built in Phase 2/3** (it is not needed in Phase 0). Full design: `docs/redaction-rehydration.md`.
+
+---
+
+## 5. Build sequence
+
+### Phase 0 — Foundation
+The substrate: data layer + retrieval, no live-in-the-wild agents yet. Division of labor:
+- **Spark developer (their side):** TEI serving BGE-M3 + BGE-Reranker-v2-m3 and Qdrant on Spark 2, exposed via Spark Control `/v1/embeddings` + `/v1/rerank`.
+- **Claude Code + you (this project):**
+  1. Read the CRM code; document the storage engine, schema, and API surface.
+  2. Extend the CRM schema (LP/prospect fields, interaction log, relationship graph, canonical entity IDs).
+  3. Build the ingest/sync pipeline (chunking + entity resolution + metadata payloads; backfill + incremental).
+  4. Build the CRM MCP server wrapping CRM reads/writes and the per-agent retrieval modes.
+  5. Bring counsel in to define outbound and recordkeeping rules so the system is compliant from day one.
+
+### Phase 1 — Architect + Scribe
+- Stand up the **Architect** first: encode the current thesis, voice, and segment definitions as skills; use it collaboratively to produce the canonical messaging source of truth.
+- Then **Scribe**: propagate that thesis into segment-specific content with human review before publish.
+- Lowest risk, highest immediate awareness ROI, never touches cold outreach — and it proves the full pattern (SDK + skills + MCP + human review).
+
+### Phase 2 — Scout + Analyst
+- **Scout** populates the pipeline from public signals (X monitoring via the API key); **Analyst** builds dossiers and derives warm paths from your own email/calendar graph plus X follower-graph overlap.
+- Internal-facing, still no outbound. This is where the Sparks earn their keep (bulk classification, embeddings, RAG).
+
+### Phase 3 — Closer + Orchestrator
+- **Closer** drafts outbound, nurture, and meeting prep — with hard human-in-the-loop gates and full logging. Highest-risk and regulated, so it comes last.
+- **Orchestrator** added once there are multiple agents to coordinate and schedule.
+
+---
+
+## 6. Team and ownership model
+
+- **Engineering partner:** Claude + Claude Code, supplying Agent SDK and MCP fluency, scaffolding the agents, writing the MCP servers and orchestration, and customizing the Start9 CRM package.
+- **Operator:** you (and your partner). You own deployment, secrets/key management, uptime, and the human-review gates. Your prior Start9 CRM build demonstrates this is well within reach.
+- **The one real risk is time, not capability.** Removing the part-time data/ops hire means operational ownership lands on the partners. If partner time is scarce, that — not tooling or skill — is the constraint to manage. Mitigations: keep the early phases internal-only (no on-call urgency), automate logging/monitoring, and stage the highest-maintenance agent (Closer) last.
+
+---
+
+## 7. Compliance by design
+
+- Log every agent action and every outbound draft.
+- Gate all outbound through human send.
+- Resolve solicitation posture (e.g. 506(b) vs 506(c)), accreditation/QP verification, and recordkeeping with counsel **before** the Closer touches cold outreach.
+- Start with distribution and inbound nurture, where constraints are lightest.
+
+---
+
+## 8. Open decisions
+
+**Resolved:** local chat/triage model = Qwen3.6 35B-A3B (Spark 1); embedding = `BAAI/bge-m3` dense 1024-dim; reranker = `BAAI/bge-reranker-v2-m3`; vector DB = Qdrant v1.16.0 on Spark 2; serving = **spark-embed** (custom FastAPI on NGC PyTorch image, *not* TEI); BM25 sparse generated client-side via FastEmbed (`Qdrant/bm25`); all fronted by Spark Control (`/v1/embeddings`, `/v1/rerank`, `/api/search`), shipped v0.15.0. Embedding-model A/B upgrade candidate if dense recall lags: `Qwen3-Embedding-4B` (same `/v1/embeddings` contract).
+
+**Still open:**
+1. Workflow engine for the outer loop (Phase 3): Temporal vs. cron/queue + n8n to start.
+2. Whether any third-party enrichment API is acceptable, or X + fully self-hosted enrichment only.
+3. Confirm **X API usage limits** (full-archive search, follower-graph pulls, streaming) to size Scout's monitoring scope. (Current access is pay-as-you-go credits.)
+4. Segment definitions for the Architect/Scribe (who are the distinct LP audiences, and what does each one need to hear?).
+5. Embedding dimension/quantization left at BGE-M3 native 1024-dim fp16 — no Matryoshka truncation or int8 needed at this corpus scale.