Phase 1 Workstream A+E: thesis substrate + dual-approval gate

- migration 0002_phase1_architect: thesis_lines (core spine + per-segment lines), thesis_nodes (+ append-only revisions), thesis_versions (one-canonical-per-line DB invariant), thesis_reviews (dual approval + feedback), segments. Reversible.
- backend/mcp/architect_tools.py: agent draft tools (node tree, versions, segments, get_canonical fails-closed) — NO self-approval path. MCP-exposed.
- backend/thesis_review.py + server.py routes: human-gated approval. Dual sign-off via thesis_required_approvals; atomic supersede; every action logged.
- docs/PHASE_1.md (kickoff brief); docs/OPERATIONS.md (partner guide); start9/0.4 "Resolve duplicate names" fuzzy action.

Verified on synthetic data: dual approval promotes correctly, exactly one canonical survives supersede, get_canonical fails closed, full interaction_log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 10:20:00 -05:00
parent 6be2e40f54
commit 3e199fd8d5
10 changed files with 993 additions and 0 deletions
@@ -0,0 +1,106 @@
+# Operating the Ten31 CRM — A Partner's Guide
+
+> **Status: DRAFT / living document.** This is the operator-facing guide to our new agent-enhanced CRM. It is written for the firm's non-engineer members — especially the partners who will be thought-partners and dual-approvers of the thesis. It will grow as the system is built; the open questions and "what's coming" notes below are real, not placeholders. Last updated 2026-06-05.
+
+---
+
+## 1. What this is
+
+Our CRM has quietly become two things at once.
+
+**First, it's the canonical "LP graph."** It is the single source of truth for who our LPs and prospects are, what we've committed and discussed, and how everyone connects. Historically we tracked investors in two different places — a classic contacts/opportunities system and the live fundraising grid (the collaborative spreadsheet the partners actually edit). Those two never agreed on a single record per person. The new layer fixes that: it resolves all the variants of one investor — their grid row, their contact card, their org, their closed-LP profile — into **one real canonical record**. That canonical record is what everything else is built on.
+
+**Second, it now has an AI-agent layer on top.** The vision is six specialized agents that widen the fundraising funnel and sharpen how we tell our story, all running on Claude for reasoning and on our own local models for anything sensitive. In one paragraph: **Scout** watches public sources for trigger events; **Analyst** builds LP dossiers and maps warm-intro paths; **Architect** helps us converge on and evolve our investment thesis; **Scribe** turns that thesis into content; **Closer** drafts outreach and meeting prep; and an **Orchestrator** schedules and routes work between them. Every one of them reads from the canonical LP graph, and none of them sends anything to a human without a partner approving it first.
+
+**Where we are today.** We are deliberately phased.
+
+- **Phase 0 (live): the data + retrieval substrate.** The canonical entities, an append-only interaction log, and search over our own corpus. No outward-facing agents exist yet — this is the foundation everything else stands on.
+- **Phase 1 (starting): the Architect.** A collaborative copilot for the thesis. It drafts and pressure-tests; a partner signs off; the approved thesis becomes the source of truth every later agent reads.
+- **Later (Phase 2–3): Scout, Analyst, Closer, Orchestrator** — and only after counsel has defined what we're allowed to do on outbound.
+
+---
+
+## 2. The big ideas, in plain terms
+
+**Canonical entities — one real record per LP.** No more "is this the John Smith from the grid or the J. Smith from contacts?" The system collapses name variants and cross-system duplicates into a single canonical identity. When you look something up, you get the whole person — not a fragment. This is also why our search works: if the same LP were scattered under three spellings, retrieval would fragment and the agents would get only a partial picture.
+
+**The interaction log — everything is recorded.** There is now an append-only log of every meaningful action: every human touch (a logged call, a note, a meeting) and, going forward, every agent action (a draft generated, a record enriched, a thesis version approved). It is never edited or deleted, only appended. This is both our compliance trail and the agents' memory. The richer it is, the smarter every agent gets.
+
+**Retrieval / search over our own corpus.** We can now ask questions across everything we've ever recorded — notes, logged communications, fundraising-grid notes, and (once enabled) Gmail correspondence — and get back the most relevant pieces. It's a hybrid of meaning-based search and exact keyword/name matching, tuned on our own vocabulary so exact fund and LP names rank correctly. This is what lets an agent answer "did we ever discuss X with this LP?" instead of guessing.
+
+**Sovereignty — our sensitive data stays ours.** This is the non-negotiable. The LP-specific data, the embeddings, the search index, and duplicate resolution all live and run on **our own infrastructure**: the Start9 server and our local Spark machines. Claude (a third party) is only ever sent the *minimum necessary, non-sensitive* context for a given task, and never a bulk export of the LP list. When an agent genuinely needs Claude to reason over real record content (later phases), that content first passes through a redaction step that swaps real names, amounts, and emails for placeholders, then swaps them back locally. The de-anonymization key never leaves our box.
+
+---
+
+## 3. How to operate it day-to-day
+
+You don't need to touch any code to operate this. Two habits and three buttons.
+
+**Keep the CRM clean.** The canonical graph is only as good as what goes in. When you add an investor, use the real legal name where you can, attach them to the right org, and avoid creating a second record for someone who's already there. The duplicate resolver is good, but it works best as a backstop, not a crutch.
+
+**Log interactions, and log them well — this is the highest-leverage habit.** When you have a call, a meeting, or a meaningful email exchange, log it with substance: what was discussed, the LP's reaction, objections raised, next steps. Two reasons this matters more than it looks. (1) It's our compliance record. (2) It is literally the training material the agents reason over. A thin "had a call" note teaches the agents nothing; "pushed back on energy thesis, worried about regulatory risk in Texas, wants to see Fund II returns" becomes evidence the Architect can use to anticipate objections and the Analyst can use to build a real dossier. Good logging compounds.
+
+**The three one-click actions (on the StartOS server page).** These run on our infrastructure and are safe to re-run any time. None of them modifies your CRM source data — they build or refresh the derived search index and the canonical IDs.
+
+- **Build search index** — the one-time (or full-rebuild) setup. It resolves the canonical entity IDs from your live data, then reads every record and builds the entire search index from scratch. Takes roughly 8–15 minutes. Use this for the initial go-live or if you ever want a clean rebuild.
+- **Refresh search index** — the fast, routine one. It updates the search index with just what's changed since the last run. Seconds to minutes. Use this to keep search current after a batch of edits. (Eventually this will run automatically on a schedule; for now it's a button.)
+- **Resolve duplicate names** — the smart de-duplication. The build step merges the obvious exact matches automatically and *flags* the harder, judgment-call pairs (e.g. "Kate" vs "Katherine"). This action asks our local Qwen model to decide which flagged pairs are truly the same person and merges them. It runs entirely on our infrastructure and is idempotent (safe to re-run). It needs our Spark Control gateway to be reachable, because that's where the local model lives.
+
+A sensible rhythm: **Build** once at go-live, **Resolve duplicate names** after the build flags candidates, and **Refresh** routinely as the grid and correspondence change.
+
+**Where to look when something seems off.** If search results feel stale, run **Refresh search index**. If the same LP shows up as two people, run **Resolve duplicate names** (and check you didn't create a true second record by accident). If an action fails mentioning Spark Control or Qdrant, the local-model gateway or the search database isn't reachable from the box — that's an infrastructure check, not a data problem. The interaction log is the place to see what happened and when.
+
+---
+
+## 4. The agent workflows — what's live, what's coming, and the approval gates
+
+The cardinal rule across all of them: **agents draft, partners approve, and nothing goes outbound without a human.** No agent emails an LP, posts publicly, or contacts a prospect on its own. Ever.
+
+| Agent | What it does | Status |
+|---|---|---|
+| **Architect** | Collaborative copilot for the thesis: generates competing framings of a claim, turns your critique into a clean edit, red-teams LP objections, and grounds every claim in real evidence from our corpus. | **Starting now (Phase 1)** |
+| **Scout** | Monitors public sources (X, filings) for trigger events worth acting on. | Coming (Phase 2) |
+| **Analyst** | Builds LP dossiers, enriches records with public info, maps warm-intro paths. | Coming (Phase 2) |
+| **Scribe** | Distributes the approved thesis as content across channels — read-only consumer of what the Architect produces. | Coming (after Architect) |
+| **Closer** | Drafts outreach, nurture sequences, and meeting prep. | Coming (Phase 3, gated) |
+| **Orchestrator** | Schedules and routes work between the agents. | Coming (Phase 3) |
+
+**The Architect, concretely (because it's the one you'll use first).** It is *not* a one-shot thesis generator. It's a workbench for **exploration → convergence → continual evolution.** You bring the seed of a thesis; it helps you sharpen it claim by claim. Each claim is a small, separately-editable node, so you can rework one argument without re-litigating the whole narrative, and hold competing phrasings side by side. Crucially, the Architect can *draft and stage* a candidate thesis version, but **it cannot make a version canonical.** Promoting a version to "this is our official thesis" is a deliberate human action through a partner-authenticated route — the plan supports single- or dual-partner sign-off (an open decision, see below). Once approved, that version becomes the single source every downstream agent reads, and it's logged in the interaction log as a human decision. Scribe and Closer can never generate against an unapproved draft.
+
+**The approval gates, summarized.** (1) Canonicalizing the thesis is a human-only action. (2) Any outbound message (Closer/Scribe) is drafted by an agent and *sent by a human* after review. (3) When agents reason over sensitive record content, it passes through the redaction boundary first. (4) The entire outbound capability is *blocked* until counsel has defined our solicitation posture — we don't ship cold outreach before that gate clears.
+
+---
+
+## 5. Best practices — getting the most out of it
+
+**Habits that compound:**
+
+- **Log richly and consistently.** This is the single biggest lever. Substance over checkbox. (See §3.)
+- **Tag and segment deliberately.** As segments firm up (e.g. family office, institution, bitcoin-native HNWI, energy player), assigning each LP to the right segment is what lets the Architect tailor "what this audience needs to hear" and lets us say the right thing to the right person.
+- **Use one real record per person.** Resolve duplicates when flagged; don't paper over them.
+- **Keep the index fresh.** Refresh after meaningful batches of edits so search and the agents reflect reality.
+- **Treat the thesis as versioned.** When the message evolves, evolve a claim node and re-approve — don't overwrite history. The whole point is recoverable iteration.
+
+**What NOT to do:**
+
+- **Don't bulk-export the LP list** to any third-party tool. Sovereignty is the line we don't cross.
+- **Don't paste real LP data or query results into a public Claude/ChatGPT session.** The local pipeline exists precisely so we don't have to.
+- **Don't treat the search index as the source of truth.** It's derived from the CRM and rebuildable in minutes; the CRM is canonical. If they ever disagree, the CRM wins and you rebuild the index.
+- **Don't let an agent's draft go out unreviewed.** A draft is a draft until a partner approves and sends it.
+- **Don't route bulk email ingest through Superhuman** (or any external mail tool) — use the built-in sovereign Gmail capture, which keeps mail on our box. Superhuman is great for *your* inbox triage and drafting; it's not our system of record.
+
+---
+
+## 6. This is a living document
+
+**Last updated:** 2026-06-05 · **Maintained by:** the build team, alongside the partners.
+
+This guide will expand as each agent comes online. Things deliberately left open for later phases:
+
+- **Thesis approval policy** — single-partner vs. dual-partner sign-off (the dual-approver workflow this guide is partly written for is still being decided).
+- **LP segments** — the firm-defined audience set and the per-segment "what to say / what to avoid" guidance are content the partners supply, not something the system invents.
+- **The agents themselves** — Scout, Analyst, Scribe, Closer, Orchestrator are described here as intent; their operating instructions get written when they're built.
+- **The compliance gate** — outbound capability stays off until counsel defines solicitation posture, accreditation/QP verification, and recordkeeping rules.
+- **Automatic index refresh** — today's manual "Refresh" button becomes a scheduled background sync.
+
+When in doubt about an operating question this guide doesn't answer, ask — and we'll fold the answer back in here.
@@ -0,0 +1,83 @@
+# Phase 1 — The Architect: Kickoff Brief
+
+**Goal:** stand up the **Architect** — a collaborative copilot that helps the partners *converge on* and then *continually evolve* a **versioned, evidence-grounded, partner-approved canonical thesis** (the "messaging source of truth"). The Architect drafts and pressure-tests; a partner signs off; the approved thesis becomes the single source every later agent reads. **Internal-only, collaborative, no outbound** (that's Scribe/Closer, later).
+
+See `CLAUDE.md` for settled architecture + guardrails; `docs/Ten31_Agentic_Build_Plan.md` §3–5. This brief assumes Phase 0 is built and deployed (canonical entities, the CRM MCP server with retrieval modes + `interaction_log`, the ingest pipeline).
+
+## Design stance (the load-bearing constraint)
+
+Grant: *"we are gravitating towards what we think the key message is, but we have NOT landed on it, and we may iterate over time."* So the Architect is **not** a one-shot generator of a finished thesis — it is a substrate for **exploration → convergence → continual evolution.** The unit of iteration is a small typed node (one claim, one proof-point, one throughline), not a monolithic doc, so partners can rework one claim without re-litigating the whole narrative, hold competing phrasings side by side, and promote a winner while keeping the rest recoverable.
+
+## What the partners must supply (the content the substrate can't invent)
+
+The Architect *sharpens an existing thesis; it does not author one from nothing.* These are inputs, co-authored in the first Architect sessions:
+- [ ] **Thesis seed (v1):** the current-best throughline (scarcity / critical-infrastructure tying bitcoin ↔ AI infrastructure / energy / freedom tech) broken into **3–5 pillars** and a first set of testable **claims**.
+- [ ] **LP segments** (build-plan open decision #4): confirm/define the distinct audiences (proposed starter set: family office, institution, bitcoin-native HNWI, energy player) and, per segment, *what they need to hear* and *what to avoid saying*.
+- [ ] **Voice:** tone/diction, "this is us / this is not us" before-after examples, sacred phrases, words we never use.
+- [ ] **Approval policy:** who may promote a thesis version to canonical (any admin? Grant specifically? dual partner sign-off?).
+
+## Workstream A — Thesis artifact + versioning *(substrate; buildable now, no content needed)*
+
+Additive, reversible migration `0002_phase1_architect.sql` (+ `.down.sql`) via the existing `core_migrations.py` runner, reusing every Phase-0 convention (full-length ids, soft-delete only, `interaction_log` on every write).
+- **`thesis_nodes`** — typed node tree (`thesis_root → throughline/section → claim → proof_point → objection/rebuttal → segment_cut`), `ord` as REAL (stable insert-between), `variant_group` for competing A/B phrasings, `status` (draft|candidate|approved|retired).
+- **`thesis_node_revisions`** — append-only per-node history (prior content + `change_summary`/`change_reason`/actor/`claude_session_id`): fine-grained undo + provenance.
+- **`thesis_versions`** — immutable named snapshots; **a DB-level partial-unique index guarantees at most one `canonical` version per thesis.** Each approved version also freezes a `body_json` (throughline, pillars, claims, proof-points, segment angles, voice, guardrails) — the stable, machine-readable **Architect→Scribe contract**.
+- **Publish-on-approval:** approving a version publishes *only its* nodes into the existing Qdrant `crm_chunks` collection under new `thesis_*` `doc_type`s (idempotent), and prunes the prior version's thesis chunks — so a downstream search for "the message" returns the approved version, never a draft.
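
The one-canonical invariant described above can be sketched with a partial unique index. This is a minimal illustration using Python's built-in `sqlite3`, not the contents of migration 0002: the column set, the `status` values, the index name, and the `promote` helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thesis_versions (
    id        TEXT PRIMARY KEY,
    thesis_id TEXT NOT NULL,
    status    TEXT NOT NULL  -- assumed values: candidate | canonical | superseded
);
-- Partial unique index: uniqueness applies only to rows where status = 'canonical',
-- so each thesis can hold many candidates but at most one canonical version.
CREATE UNIQUE INDEX one_canonical_per_thesis
    ON thesis_versions (thesis_id) WHERE status = 'canonical';
""")


def promote(conn, thesis_id, version_id):
    """Supersede the current canonical version and promote a candidate, atomically."""
    with conn:  # one transaction: commits on success, rolls back on any exception
        conn.execute(
            "UPDATE thesis_versions SET status = 'superseded' "
            "WHERE thesis_id = ? AND status = 'canonical'",
            (thesis_id,),
        )
        cur = conn.execute(
            "UPDATE thesis_versions SET status = 'canonical' "
            "WHERE id = ? AND thesis_id = ? AND status = 'candidate'",
            (version_id, thesis_id),
        )
        if cur.rowcount != 1:
            raise ValueError(f"{version_id} is not a candidate of {thesis_id}")


with conn:
    conn.executemany(
        "INSERT INTO thesis_versions VALUES (?, ?, ?)",
        [("v1", "core", "canonical"), ("v2", "core", "candidate")],
    )

# Inserting a second canonical row is rejected by the index, not by application code.
try:
    with conn:
        conn.execute("INSERT INTO thesis_versions VALUES ('v3', 'core', 'canonical')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

promote(conn, "core", "v2")
statuses = dict(conn.execute("SELECT id, status FROM thesis_versions"))
```

Superseding before promoting, inside one transaction, means the index is never violated mid-flight, and a failed promotion rolls back to the previous canonical version.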
+
+*(Design decision to confirm: keep both the fine-grained node tree (powerful for iteration) and a frozen `body_json` snapshot per approved version (a simple, stable contract for Scribe). The tree is the editing surface; the snapshot is the published artifact.)*
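
On the editing surface, a REAL-valued `ord` lets a node be dropped between two siblings without renumbering the others. The sketch below shows one plausible way to do that with a midpoint; the `ord_between` helper and the sample node ids are hypothetical and are not taken from `architect_tools.py`.

```python
from typing import Optional


def ord_between(before: Optional[float], after: Optional[float]) -> float:
    """Return an ord that sorts strictly between two siblings (None = no sibling there)."""
    if before is None and after is None:
        return 1.0                      # first node under this parent
    if before is None:
        return after - 1.0              # insert at the head
    if after is None:
        return before + 1.0             # append at the tail
    if not before < after:
        raise ValueError("siblings must already be in ascending ord")
    mid = (before + after) / 2.0
    if not before < mid < after:
        # Floats ran out of room between the neighbours: renumber the siblings instead.
        raise ValueError("no representable gap left; renumber siblings")
    return mid


# Hypothetical sibling claims under one throughline: (node id, ord).
siblings = [("claim-a", 1.0), ("claim-b", 2.0)]
siblings.append(("claim-new", ord_between(1.0, 2.0)))   # no existing row is rewritten
ordered = [node_id for node_id, _ in sorted(siblings, key=lambda s: s[1])]
```

Each insert into the same gap halves it, so a unit gap supports on the order of fifty consecutive inserts before the guard fires and the siblings need a one-off renumbering.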
+
+## Workstream B — The collaborative loop (Architect skills)
+
+The copilot session is turn-based propose → react → revise, delivered as Agent SDK **skills** (one per move, independently testable) over a new `backend/mcp/architect_tools.py` surface (drafts only, every move logged; the agent can stage candidates but **cannot cross the canonical gate**):
+1. **Vary** — generate ≥3 genuinely distinct framings of a target node, scored (sharpness, differentiation, evidence-backing, segment-portability, credibility).
+2. **Revise** — turn a free-text partner critique into a faithful before/after edit (never silently drop a framing the partner liked).
+3. **Red-team** — anticipate LP objections per segment, each with our drafted answer + an honest *substantiated / hand-wavy* flag.
+4. **Consistency-check** — when a throughline/pillar changes, surface every downstream node that now conflicts + a proposed reconciliation (apply none without partner acceptance).
+5. **Substantiate (ground)** — see Workstream D.
+Plus a session-orchestration skill that loads state, sequences moves, and resumes across sessions (replays deferred proposals, the open-objection ledger, still-weak claims) — proving iteration spans sessions, not just turns.
+
+## Workstream C — Segments & voice
+
+`segments` table (versioned; one `active` row per `segment_key`); reuse the Phase-0 `canonical_entities.segment` field as the pointer tagging each LP to a segment (closes the loop between *who an LP is* and *what we say to them*). Voice + each segment cut become skills (`ten31-voice`, `ten31-thesis-spine`, `ten31-segment-cut`); a segment cut must trace every claim to a spine pillar (orphans/contradictions surfaced), and a **drift flag** fires when a cut's spine version falls behind the active spine.
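
The traceability and drift checks described above might look like the following sketch. It assumes spine versions are comparable integers and that each cut stores a claim-to-pillar mapping; the `SegmentCut` shape, the function names, and the sample keys are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class SegmentCut:
    segment_key: str
    spine_version: int                       # spine version this cut was written against
    claim_pillars: Dict[str, Optional[str]]  # claim id -> spine pillar id it traces to


def orphan_claims(cut: SegmentCut, spine_pillars: Set[str]) -> List[str]:
    """Claims that trace to no pillar, or to a pillar the spine no longer has."""
    return sorted(
        claim_id
        for claim_id, pillar in cut.claim_pillars.items()
        if pillar is None or pillar not in spine_pillars
    )


def has_drifted(cut: SegmentCut, active_spine_version: int) -> bool:
    """Drift flag: the cut was written against an older spine than the active one."""
    return cut.spine_version < active_spine_version


cut = SegmentCut(
    segment_key="family_office",
    spine_version=3,
    claim_pillars={"c1": "scarcity", "c2": None, "c3": "retired_pillar"},
)
orphans = orphan_claims(cut, spine_pillars={"scarcity", "energy"})
drifted = has_drifted(cut, active_spine_version=4)
```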
+
+## Workstream D — Grounding & defensibility
+
+The corpus is a **defensibility oracle, not a generator.** Each claim is a structured object (`draft|grounded|contested|retired`) that cannot leave `draft` without ≥1 **citation bundle pinned to a stable `source_model:source_id`** *and* a completed **counter-evidence sweep** (a search on the negation framing, not just the claim). An **objection register** per claim is assembled from `get_interaction_history` + `keyword_search` (recurring LP pushback). Stale evidence (older than roughly 12 months by `date_ts`) is flagged for revalidation. All of this uses the Phase-0 retrieval modes unchanged.
+- **Sovereignty:** retrieval + embeddings stay local. The thesis *content* is non-LP-specific messaging substance → generally fine to send to Claude as-is. But the *evidence* (real LP conversations) used to ground it is sensitive → the Claude-facing synthesis step routes through the **redaction/re-hydration boundary** (`docs/redaction-rehydration.md`). **The Architect is the first agent to send retrieved record substance to Claude, so this boundary must be built here** (scrub/rehydrate at Spark Control).
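
The claim gate and the stale-evidence flag from this workstream can be sketched as plain data plus two checks. Everything named here (`Citation`, `Claim`, `mark_grounded`, `stale_citations`, the 365-day cutoff, and the sample `note:`/`email:` references) is an assumption for illustration; only the `draft` and `grounded` states, the `source_model:source_id` shape, and `date_ts` come from the brief.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List

STALE_AFTER = timedelta(days=365)  # stand-in for "roughly 12 months"


@dataclass
class Citation:
    source_ref: str      # expected shape: "source_model:source_id"
    date_ts: datetime    # when the cited evidence was recorded


@dataclass
class Claim:
    text: str
    status: str = "draft"
    citations: List[Citation] = field(default_factory=list)
    counter_sweep_done: bool = False


def is_pinned(citation: Citation) -> bool:
    model, sep, source_id = citation.source_ref.partition(":")
    return bool(model and sep and source_id)


def mark_grounded(claim: Claim) -> None:
    """Move a claim out of draft only if both gate conditions hold."""
    if claim.status != "draft":
        raise ValueError(f"claim is already {claim.status}")
    if not any(is_pinned(c) for c in claim.citations):
        raise ValueError("needs at least one pinned citation")
    if not claim.counter_sweep_done:
        raise ValueError("counter-evidence sweep not completed")
    claim.status = "grounded"


def stale_citations(claim: Claim, now: datetime) -> List[Citation]:
    """Citations old enough to be flagged for revalidation."""
    return [c for c in claim.citations if now - c.date_ts > STALE_AFTER]


now = datetime(2026, 6, 5, tzinfo=timezone.utc)
claim = Claim(
    text="Example claim",
    citations=[
        Citation("note:123", datetime(2026, 3, 1, tzinfo=timezone.utc)),
        Citation("email:456", datetime(2024, 11, 1, tzinfo=timezone.utc)),
    ],
)
try:
    mark_grounded(claim)         # blocked: sweep not done yet
    blocked = False
except ValueError:
    blocked = True
claim.counter_sweep_done = True
mark_grounded(claim)             # now allowed
stale = [c.source_ref for c in stale_citations(claim, now)]
```

Note that staleness is reported separately from the gate: in this sketch a stale citation still counts toward grounding and is only flagged, which is one possible reading of "flagged for revalidation".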
+
+## Workstream E — Approval gate & Scribe contract
+
+Canonicalization is a **logged human action**, enforced by capability not convention:
+- The promote-to-canonical edge is a **human-authenticated CRM route** (`POST /api/thesis/{id}/approve`, Bearer + admin) on `server.py` — **not exposed as an agent tool.** It atomically supersedes the prior canonical and writes a `thesis.approved` `interaction_log` row (`actor_type='human'`, real `users.id`).
+- A thin **"Thesis review" view** in the existing SPA (`frontend/index.html`): diff candidate vs canonical, Approve / Request-changes.
+- `get_canonical_thesis()` returns the one canonical version's `body_json` and **fails closed** if none exists, so Scribe can never generate against an unapproved thesis.
+- **Architect↔Scribe boundary:** Architect owns/articulates and writes only to `thesis_versions` (never outbound); **Scribe is a read-only consumer** of the canonical version, stamps each draft with the source `thesis_version_id`, and routes through its *own* separate review-before-publish gate.
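
A small in-memory model shows how the gate behaves end to end: the agent can stage, only distinct human sign-offs promote, and reads fail closed until something is canonical. The `ThesisGate` class is a stand-in written for this example, not the code in `thesis_review.py`; the `thesis.staged` and `thesis.review` event names are invented, while `thesis.approved` and the required-approvals setting follow the brief and the commit message.

```python
from typing import Dict, List, Optional, Set, Tuple


class ThesisGate:
    """In-memory stand-in for the approval route; not the real server code."""

    def __init__(self, required_approvals: int = 2):
        self.required_approvals = required_approvals   # cf. thesis_required_approvals
        self.bodies: Dict[str, dict] = {}              # version id -> body_json
        self.approvers: Dict[str, Set[str]] = {}       # version id -> distinct user ids
        self.canonical_id: Optional[str] = None
        self.log: List[Tuple[str, str, str]] = []      # (event, actor, version id)

    def stage(self, version_id: str, body_json: dict) -> None:
        """What an agent may do: stage a candidate. Nothing here promotes it."""
        self.bodies[version_id] = body_json
        self.approvers[version_id] = set()
        self.log.append(("thesis.staged", "agent", version_id))

    def approve(self, version_id: str, user_id: str) -> bool:
        """Human-only route: record a sign-off; promote once enough distinct users agree."""
        self.approvers[version_id].add(user_id)        # a repeat sign-off counts once
        self.log.append(("thesis.review", user_id, version_id))
        if len(self.approvers[version_id]) < self.required_approvals:
            return False
        self.canonical_id = version_id                 # supersedes any prior canonical
        self.log.append(("thesis.approved", user_id, version_id))
        return True

    def get_canonical_thesis(self) -> dict:
        if self.canonical_id is None:
            raise LookupError("no canonical thesis; refusing to return a draft")
        return self.bodies[self.canonical_id]


gate = ThesisGate(required_approvals=2)
gate.stage("v1", {"throughline": "example"})
try:
    gate.get_canonical_thesis()
    failed_closed = False
except LookupError:
    failed_closed = True
first = gate.approve("v1", "partner-a")
repeat = gate.approve("v1", "partner-a")    # same partner again: still one approval
second = gate.approve("v1", "partner-b")
```

Counting approvers as a set is the design choice worth noticing: dual sign-off only means something if one partner clicking twice cannot satisfy it.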
+
+## Acceptance criteria
+
+- [ ] Migration 0002 (additive, reversible) creates the thesis tables with the one-canonical invariant; applies + reverses cleanly via the existing runner.
+- [ ] A thesis exists as a typed node tree with a seeded, partner-signed **canonical v1**; renders back to a coherent document; supports competing variants with logged, reversible promotion.
+- [ ] A partner can run a full session (load → intent → any of the 5 moves in any order → accept/reject/defer → converge) in one Agent SDK conversation; a later session resumes from prior state.
+- [ ] No claim promotes past `draft` without a pinned citation + counter-sweep; every citation is auditable back to its memo/call/email/note.
+- [ ] **No version becomes canonical except through the human route; the agent has no self-promotion path** (tested). Every transition is in `interaction_log`.
+- [ ] The redaction boundary is built and asserts no Tier-1 content / no real Tier-2 identifier reaches Claude in the grounding step (golden-file test).
+- [ ] The Architect→Scribe `body_json` contract is documented; a (future) Scribe draft is traceable to the exact approved `thesis_version_id`.
+- [ ] No outbound/publish/contact capability anywhere in the Architect surface (guardrails #4, #6).
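
The redaction criterion above could be exercised with a round-trip check along these lines. This is a sketch under strong assumptions: it scrubs a known list of identifiers by plain string replacement, whereas the real boundary is specified in `docs/redaction-rehydration.md`. The function names, the placeholder format, and the sample note (all synthetic) are invented for the example.

```python
from typing import Dict, List, Tuple


def scrub(text: str, identifiers: List[str]) -> Tuple[str, Dict[str, str]]:
    """Swap known identifiers for placeholders; return the scrubbed text and the local key."""
    key: Dict[str, str] = {}
    # Longest first, so a short identifier never splits a longer one that contains it.
    for i, ident in enumerate(sorted(identifiers, key=len, reverse=True), start=1):
        placeholder = f"[[ENTITY_{i}]]"
        if ident in text:
            text = text.replace(ident, placeholder)
            key[placeholder] = ident
    return text, key


def rehydrate(text: str, key: Dict[str, str]) -> str:
    """Swap placeholders back to the real values; runs only on the local side."""
    for placeholder, ident in key.items():
        text = text.replace(placeholder, ident)
    return text


def assert_no_leak(outbound: str, identifiers: List[str]) -> None:
    """Fail loudly if any real identifier survives in text bound for the remote model."""
    leaked = [i for i in identifiers if i.lower() in outbound.lower()]
    if leaked:
        raise AssertionError(f"identifiers leaked: {leaked}")


# Synthetic data only.
identifiers = ["John Smith", "jsmith@example.com", "$5M"]
note = "John Smith (jsmith@example.com) may commit $5M."

scrubbed, key = scrub(note, identifiers)
assert_no_leak(scrubbed, identifiers)   # the outbound text carries placeholders only
restored = rehydrate(scrubbed, key)     # the key itself is never sent anywhere
```

A golden-file test would pin `scrubbed` for a fixed set of inputs; the leak assertion is the part that should run on every outbound payload.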
+
+## Out of scope for Phase 1 (Architect sub-phase)
+
+- The Scribe *build* (distribution/publishing) — defined here only as the downstream contract; built as the next sub-phase with review-before-publish.
+- Any outbound send, public post, or LP contact. Scout/Analyst (Phase 2) and Closer/Orchestrator (Phase 3) are likewise out of scope.
+
+## Suggested order
+
+A (substrate) → E (gate + contract) → B (loop skills) → C (segments/voice) → D (grounding + redaction boundary). **A and E are buildable now without the thesis content;** B–D become useful once the partners seed v1. Start the content prep (the four inputs above) in parallel.
+
+## Open decisions for the owner
+
+1. **One thesis line, or several** (one throughline vs. per-vertical theses)?
+2. **The four content inputs** above (seed, segments, voice, approval policy) — the critical path.
+3. **Approval:** single-partner vs. dual sign-off; a dedicated `thesis_approver` capability vs. reusing `admin`.
+4. **Grounding dials** (partner-set, not Claude): rerank-score floor for "real" support; doc_type source-weighting (memo/transcript > one-line note?); counter-evidence threshold to mark a claim `contested`.
+5. **Phase-1 evidence scope:** internal corpus only, or admit external sources (web/filings) before Scout/Analyst? And is the Gmail corpus live/backfilled (thin email evidence if not)?
+6. **May a claim be promoted while still `contested`/unsubstantiated** (a deliberate bet), or must the gate block on unresolved weaknesses?