Keysat 41def0f014 Handoff: Adopt the Pipeline done — v88 verified live, full round-trip smoked
Box and repo on v0.1.0:88; the +Pipeline -> board -> advance-stage -> remove
round-trip is verified on the box. Pipeline adoption is closed out; ROADMAP
item marked done and the Next list advances to the spark-control intake card.
2026-06-18 08:34:32 -05:00
# Ten31 Venture CRM + Agentic System — AGENTS.md
**The foundation is a self-hosted venture-fund CRM** — a purpose-built fundraising tool that replaced Airtable to (1) keep sensitive LP/prospect data off third-party servers, (2) drop subscription cost, and (3) fit the fund's workflow: managing ~150 existing LPs, tracking 250+ prospects, and running the capital-raise pipeline. Core CRM domain: contacts (investor/prospect/advisor), organizations, opportunities (the deal pipeline), and communications; investor commitments live in the canonical `fundraising_*` grid (the legacy single-fund `lp_profiles` table was retired in v0.1.0:78). The fund (Ten31, ~$200M AUM, bitcoin/energy/AI thesis) runs it on a Start9 box, accessed over ClearNet (StartOS StartTunnel) with app-level user auth by a team of ~5 (Tailscale is not in use). Schema/API tour: `docs/crm-overview.md`.
**The agentic system is new functionality built on top of that CRM** — an in-house AI layer to widen the fundraising funnel, sharpen the thesis, and automate outreach drafting. Frontier reasoning runs on Claude (Agent SDK/API); privacy-sensitive and bulk work runs on local DGX Spark models via the **Spark Control** gateway. **Phase 0/1 — no live outward-facing agents; agents draft, humans send.**
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(CRM)` and surface them before proposing next steps; triage with `/triage`.
## Stack (versions that matter)
- **Python 3.11, standard library only at runtime.** The CRM is one monolith, `backend/server.py` (~5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch (`do_GET`/`do_POST`). **Not FastAPI.** `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic/pytest-style deps but **none are imported at runtime** (vestigial).
- **SQLite** at `data/crm.db` (WAL, `foreign_keys=ON`), opened per-request via `get_db()`. Schema via ordered migrations.
- **Frontend:** single `frontend/index.html`, inline-Babel React. **No build step.**
- Optional runtime deps, used only if present: `bcrypt`, `PyJWT` (`jwt`), `cryptography` (Gmail module).
- **MCP + ingest** (in the Docker image, not the bare CRM): `mcp==1.2.0` (FastMCP, `backend/mcp/server.py`), `fastembed==0.4.2`, `anthropic`, `cryptography==42.0.5`.
- **Packaging:** StartOS 0.4, TypeScript SDK (`@start9labs/start-sdk`) under `start9/0.4/startos/`. Live target is `start9/0.4/`.
- **Local models** (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`, Qdrant): always via Spark Control. Contract: `docs/EMBEDDINGS.md`.
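The stdlib server style described above (a `ThreadingHTTPServer` plus a handler that dispatches on the path by hand) can be sketched as follows. This is a minimal illustration, not the code in `backend/server.py`: `SketchHandler`, `make_server`, and the `/api/health` route are hypothetical names chosen for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SketchHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for CRMHandler: routes are matched by hand."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Manual dispatch: strip the query string, then compare paths.
        path = self.path.split("?", 1)[0]
        if path == "/api/health":  # hypothetical route, for illustration only
            self._send_json(200, {"ok": True})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real server would log requests


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for any free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SketchHandler)
```

Calling `make_server(8080).serve_forever()` would serve the sketch locally; there is no framework, router, or ORM involved, which is the point of the "Not FastAPI" note above.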
## Commands
```bash
# Run locally (dev, port 8080; or ./start.sh <port>) — runs python3 backend/server.py
./start.sh
# Run prod-mode (beta) — requires CRM_SECRET_KEY
./start_beta.sh
# Sanity-check edits (there is no compiler/build for the CRM)
python3 -m py_compile backend/server.py
# Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py
# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
python3 backend/run_tests.py # add substrings to filter, e.g. `... soft_delete redaction`
# Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
cd start9/0.4 && make
```
- **Migrations** apply automatically at startup (`backend/core_migrations.py`, `schema_migrations` ledger). See `docs/guides/migrations.md` before adding one.
- **Lint:** none configured.
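As a rough picture of what "ordered migrations with a `schema_migrations` ledger" means, the sketch below applies `NNNN_*.sql` files in filename order and records each one so a restart skips it. It is an assumption-laden illustration, not the logic in `backend/core_migrations.py`: the function name `apply_migrations` and the ledger columns (`name`, `applied_at`) are hypothetical. Read `docs/guides/migrations.md` for the real rules.

```python
import sqlite3
from pathlib import Path


def apply_migrations(conn, migrations_dir):
    """Apply NNNN_*.sql files in filename order, skipping ones already recorded."""
    # Hypothetical ledger shape; the real table's columns may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for path in sorted(Path(migrations_dir).glob("[0-9]*_*.sql")):
        # Paired .down.sql files are rollbacks and are never applied at startup.
        if path.name.endswith(".down.sql") or path.name in applied:
            continue
        conn.executescript(path.read_text(encoding="utf-8"))
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (path.name,))
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied
```

Running it twice against the same database is a no-op the second time, which is the property that lets migrations run unconditionally at every startup.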
## Directory layout (day-one)
- `backend/server.py` — the CRM monolith: HTTP handler, route dispatch, `init_db()`, auth (username/password → HS256 JWT, roles admin/member).
- `backend/core_migrations.py` + `backend/migrations/NNNN_*.sql` (+ paired `.down.sql`) — additive schema migrations, applied at startup.
- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders, wired in `server.init_db()`.
- `backend/thesis_review.py` — thesis version review/approval (human dual sign-off → canonical).
- `backend/mcp/` — `architect_agent.py` (Claude thesis copilot), `architect_tools.py`, `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP).
- `backend/email_integration/` — Gmail capture via domain-wide delegation + Tier-B draft creation (`compose.py`).
- `backend/redaction/` — `scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary.
- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes.
- `backend/entity_*.py` — entity resolution/merge (the two-investor-model reconciliation).
- `backend/matrix_intake/` — Matrix intake bot (separate process; `matrix-nio`, isolated to this component): typed message → local-Qwen parse → in-thread approve → write via the CRM's own `log-communication`. See the matrix-intake guide.
- `frontend/index.html` — the entire UI.
- `docs/` — architecture, phase plans, contracts, runbooks (see Deeper docs). `docs/guides/` — scoped subsystem rules (see below).
- `start9/0.4/` — StartOS package (`startos/utils.ts` holds `PACKAGE_VERSION`).
- `data/crm.db` — the live DB (gitignored). `.env` / `.env.example` — config (`.env` gitignored).
## Scoped guides
Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude/rules/` symlinks (scoped by `paths:` frontmatter). **Read the guide before editing that area:**
- **Migrations or seeders** (`backend/migrations/`, `core_migrations.py`, `thesis_seed.py`) → `docs/guides/migrations.md`
- **Thesis logic** (`backend/thesis_*.py`, `backend/mcp/architect_*.py`) → `docs/guides/thesis.md`
- **Redaction or any MCP/Claude path** (`backend/redaction/`, `backend/mcp/`) → `docs/guides/redaction.md`
- **Ingest / retrieval** (`backend/ingest/`) → `docs/guides/spark-ingest.md`
- **Email capture / drafts + digest send** (`backend/email_integration/`, `backend/digest_mailer.py`, `backend/smtp_send.py`) → `docs/guides/email.md`
- **Building or deploying the s9pk** (`start9/`) → `docs/guides/packaging.md`
- **Matrix intake bot** (`backend/matrix_intake/`) → `docs/guides/matrix-intake.md`
## Conventions
- **Investor model — the grid is canonical (since v0.1.0:78).** The `fundraising_*` grid is the **system of record**: an investor entity (row) → many contact "pills" → per-fund commitments. The classic `contacts` table is a **read-only per-person directory**, auto-populated from the grid — create/edit people in the grid, not the Contacts page. Email capture rolls multiple people up to one investor. The legacy single-fund `lp_profiles` model is **retired** (empty table kept, per never-hard-delete). Reconciling grid ↔ classic `contacts` to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **opportunities/pipeline** aggregates were fixed in v0.1.0:87 (`handle_pipeline_report` + dashboard pipeline metrics now filter `deleted_at`), but the **reports** subsystem's **communications-side** aggregates (dashboard `recent_comms`/`comms_this_month`/`meetings_this_month`, activity report) still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`; digest mailer: `CRM_DIGEST_SENDER` (DWD impersonation sender) + `SMTP_HOST`/`SMTP_PORT`/`SMTP_SECURITY`/`SMTP_FROM`/`SMTP_USERNAME`/`SMTP_PASSWORD` (SMTP fallback); daily digest (Phase B): `CRM_DIGEST_ENABLED` + `CRM_DIGEST_SEND_HOUR` **only seed the first-boot default** — the live control is the DB policy (`app_settings.digest_policy`, set in Settings → Admin).
- **Config placement:** operational/feature toggles live in the **admin panel**, DB-backed via `app_settings` (read-merge through a `load_*_policy(conn)` helper shared by the API + any scheduler; precedence DB-row → env-seed → default), so they're discoverable and take effect live. Reserve StartOS actions / env for **secrets and deploy-time config** (SMTP creds, API keys, DWD sender). Precedent: `digest_policy` (`GET/PATCH /api/admin/digest/policy`), `fundraising_backup_policy`.
- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
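The config-placement precedence above (DB row, then env seed, then default) can be sketched as a single read-merge helper. This is an illustration under stated assumptions: the `app_settings` column names (`key`, `value`), the JSON encoding of the row, and the default values are hypothetical; only the env names `CRM_DIGEST_ENABLED` / `CRM_DIGEST_SEND_HOUR` and the `digest_policy` key come from this document.

```python
import json
import os
import sqlite3

# Illustrative defaults; the real defaults are not stated in this document.
DEFAULT_DIGEST_POLICY = {"enabled": False, "send_hour": 8}


def load_digest_policy(conn, env=None):
    """Merge default <- env seed <- DB row, so a DB row wins when present."""
    env = os.environ if env is None else env
    policy = dict(DEFAULT_DIGEST_POLICY)

    # Env vars only seed values; they never override a saved DB policy.
    if "CRM_DIGEST_ENABLED" in env:
        policy["enabled"] = env["CRM_DIGEST_ENABLED"].strip().lower() in ("1", "true", "yes")
    if "CRM_DIGEST_SEND_HOUR" in env:
        try:
            policy["send_hour"] = int(env["CRM_DIGEST_SEND_HOUR"])
        except ValueError:
            pass  # a malformed seed falls back to the default

    # Assumed schema: app_settings(key TEXT, value TEXT holding JSON).
    row = conn.execute(
        "SELECT value FROM app_settings WHERE key = ?", ("digest_policy",)
    ).fetchone()
    if row:
        policy.update(json.loads(row[0]))
    return policy
```

Because the API and any scheduler would call the same helper, a change saved in the admin panel takes effect on the next read without a restart or a redeploy.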
## Always
- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`, never production.
- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` first.
- **Get explicit user authorization before any production deploy/install** to `$START9_BOX_HOST`.
## Never
- **Never treat Qdrant (or any derived index) as source of truth** — the CRM/SQLite is canonical and rebuildable-from.
- **Never hard-delete** CRM records or thesis history — soft-delete/archive only.
- **Never let an agent send email, post, or contact an LP autonomously** — agents draft; a human approves and sends.
- **Never set a `thesis_version` canonical from code/seeds** — that is human dual sign-off.
- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).
- **Never commit secrets, `data/crm.db`, `.env`, or `data/backups/`** (all gitignored). Scan staged files before committing. (`.claude/` *is* tracked — `launch.json` and `rules/` symlinks ship with the repo; keep local-only settings in `.claude/settings.local.json`.)
- **Never bulk-export the LP list** to any third party; send only minimal non-sensitive context to Claude.
- **Never assume FastAPI / SQLAlchemy / pytest** are in play — they sit in `requirements.txt` unused; runtime is stdlib + SQLite.
- **Never add a `Co-Authored-By` / "Generated with" trailer** to commits or PRs — commits are the user's.
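To make the soft-delete rule concrete (never hard-delete, and filter `deleted_at IS NULL` on every read, including aggregate sub-selects), here is a small self-contained sketch. The two-table schema and the function names are hypothetical and much simpler than the real CRM; the pattern to note is that the filter appears twice, once on the outer list and once inside the `COUNT` sub-select.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, synthetic rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
        CREATE TABLE contacts (
            id INTEGER PRIMARY KEY, organization_id INTEGER, name TEXT, deleted_at TEXT
        );
        INSERT INTO organizations VALUES (1, 'Acme', NULL), (2, 'Retired Org', '2026-01-01');
        INSERT INTO contacts VALUES (1, 1, 'A', NULL), (2, 1, 'B', NULL), (3, 2, 'C', NULL);
    """)
    return conn


def soft_delete_contact(conn, contact_id):
    """Archive a contact by stamping deleted_at; the row itself is kept."""
    conn.execute(
        "UPDATE contacts SET deleted_at = CURRENT_TIMESTAMP "
        "WHERE id = ? AND deleted_at IS NULL",
        (contact_id,),
    )


def list_organizations(conn):
    """List live organizations with a count of live contacts only."""
    return conn.execute("""
        SELECT o.id, o.name,
               (SELECT COUNT(*) FROM contacts c
                 WHERE c.organization_id = o.id
                   AND c.deleted_at IS NULL) AS contact_count
          FROM organizations o
         WHERE o.deleted_at IS NULL
         ORDER BY o.id
    """).fetchall()
```

Dropping the inner `c.deleted_at IS NULL` would reproduce the kind of aggregate leak described in Conventions: the archived contact would still be counted even though the list view itself looks correct.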
## Deeper docs
- Full constitution + guardrails: `docs/ten31-constitution.md`
- Architecture & rationale: `docs/Ten31_Agentic_Build_Plan.md`
- Retrieval/embeddings contract: `docs/EMBEDDINGS.md`
- CRM schema/API tour: `docs/crm-overview.md`
- Current thesis handoff: `docs/thesis-handoff.md`
- Operations & runbooks: `docs/OPERATIONS.md`, `docs/go-live-runbook.md`, `docs/gmail-enablement-runbook.md`
## Current state
_Phase 0 + Phase 1 built; **box and repo at v0.1.0:88** (deployed & verified live 2026-06-18 — chain …86→88 clean, `0005_grid_pipeline_link.sql` applied on the box, server up; the full Pipeline **+Pipeline → board → advance-stage → remove** round-trip is live-smoked on the box). **The fundraising grid + email capture is the canonical system of record** (2026-06-16) — vestigial classic-CRM surfaces get pruned/repurposed. Deploy/feature history lives in git log + `start9/0.4/startos/versions/`; longer-term backlog + debt in `ROADMAP.md` / `EVALUATION.md`._
- **Adopt the Pipeline — grid drives the deal board — DEPLOYED & live-smoked 2026-06-18 (v0.1.0:88; the full +Pipeline → board → advance-stage → remove round-trip is verified on the box). v88 (frontend-only): retired the Pipeline page's "+ New Opportunity" button + its create-by-contact modal** — opportunities are now born **only** from a grid investor row (matches how the team works; the board is view + stage-management; button replaced with a muted "Add deals from the Fundraising Grid" hint). An **"Add to Pipeline"** row action on the fundraising grid opens a seed modal (primary contact / target fund / expected amount / stage / probability) and creates a durably-linked `opportunities` row via the new **`opportunities.fundraising_investor_id`** (migration 0005, additive + reversible). **Grid owns the link + seed; the board owns stage/probability/owner** — a grid save never reseeds a live opp (`POST /api/fundraising/pipeline/link` is idempotent, one live opp/investor). Contact is **reused from the grid's synced `fundraising_contacts.contact_id`** (the `POST /api/contacts` side-door is gone); grid `lead`→owner. Two **read-only** grid columns (Pipeline action + Pipeline Stage) are **injected on read** from the live opp and **stripped on write** (never persisted, never dirty the autosave). **Remove from pipeline** (`POST .../unlink`) **soft-deletes the opp; the grid row stays fully intact**; deleting an investor from the grid archives its orphaned opp (`reconcile_grid_pipeline_links`, after `sync_fundraising_relational`). **Folded in:** the standing P2 soft-delete leak in `handle_pipeline_report` + dashboard pipeline aggregates (archived opps no longer counted). Tests: `backend/test_grid_pipeline_link.py`; 28/28 suite green, render-smoke green; migration verified on a copy of `data/crm.db` and **applied clean on the box**. 
**Live-smoke on the box is done (2026-06-18): an investor was added to the pipeline, landed on the board, advanced a stage, and was removed (opp archived, grid row intact).** Detail + locked decisions in `ROADMAP.md` "Adopt the Pipeline".
- **Matrix intake bot — DEPLOYED & LIVE (2026-06-17), `backend/matrix_intake/`:** a separate-process bot (its `matrix-nio` dep isolated from the stdlib CRM) turning a typed Matrix-room message into a proposed fundraising-grid add/edit, written only after **in-thread human approval** (`yes`/`edit field=value`/`no`). Parse = local Qwen via Spark Control (no Claude/scrub, like the digest); writes reuse the CRM's own `POST /api/fundraising/log-communication` tagged `source="matrix_intake"`; new-vs-existing via read-only `GET /api/intake/match` (returns the grid row id → no duplicate). **Runs on the Spark as a docker-compose service** (`modelo32`, container `matrix-intake`, `restart: unless-stopped` → survives a reboot; `docker-compose.yml` at the repo root + `backend/matrix_intake/Dockerfile` bundling `backend/matrix_intake` + the stdlib `backend/ingest` Spark client; retired the old nohup launch, 2026-06-17). A spark-control dashboard card is still pending (handoff: `docs/handoffs/add-intake-bot-to-spark-control.md`). **Live-smoked end-to-end** (new-investor create + existing-investor note matched & appended, no dup). Server side shipped to the box as **v0.1.0:84** (`/api/intake/match` + `source` provenance — these were missing on v83, so the bot 404'd until v84); then UX adds: main-timeline nudge pointer, top-level-`yes`→thread redirect, clearer commit wording, note text in the grid line (v85 dropped the `[note]` tag). M3 (business-card photo) deferred (no Spark vision model). Guide: `docs/guides/matrix-intake.md`.
- **Matrix intake — fuzzy-match + conversational-edit pass — DEPLOYED & LIVE 2026-06-17 (box on v0.1.0:86, bot restarted on the Spark; `candidates` endpoint verified live); `revise` leg live-smoked 2026-06-17, fuzzy disambiguation grammar still un-smoked.** Closes the two locked post-deploy enhancements (ROADMAP). **(a) Fuzzy matching (server-side, ships in the s9pk):** `find_intake_candidates` in `server.py` (deterministic — stdlib `difflib` name similarity + token-set Jaccard, legal-suffix-aware via `_strip_legal_suffix`, + email Levenshtein ≤ 2; ranked, ≥0.62, top 5); `GET /api/intake/match` now returns `{match, candidates}`. The bot surfaces a numbered shortlist (`_stage="disambiguate"`) so a near-duplicate ("Charlie"/"Charles", "Acme Capital"/"Acme Capital LLC", a one-char email typo) is **confirmed by a human** instead of silently creating a second investor — never auto-attached. **The optional LLM-judge re-rank was deferred** (deterministic filter already surfaces the cases; LLM is the right shortlist *pruner* if noise proves real). **(b) Conversational edits (bot-side, ships on the Spark):** any in-thread reply that isn't `yes`/`no`/`edit field=value` → `parse.revise` re-runs `{proposal + instruction}` through local Qwen and re-renders the card; **email integrity preserved** (a changed address must literally appear in the instruction; the model's email field is never trusted); no-op revisions re-prompt (`same_fields`). **Deploy is split:** the `candidates` need an **s9pk build+install** (v86); the bot's disambiguation+revise need a **Spark `git pull` + restart** — a bot restart alone won't deliver `candidates` (box returns `[]`, bot safely proposes new). Tests green; the Qwen `revise` leg is now live-smoked (2026-06-17, with the roster fix below); the fuzzy **disambiguation** numbered-pick grammar is the one in-room path still un-smoked. Guide updated.
- **Matrix intake — team-roster parse frame — DEPLOYED & LIVE 2026-06-17 (bot at `c1ea176` on the Spark; live-smoked).** Fixes the live-smoke gripe where *"jonathan is chatting with wyoming"* extracted the teammate, not the prospect. `parse.build_system(roster)` appends an **outreach frame** when `INTAKE_TEAM_ROSTER` (`.env`, comma-separated names/initials, case-insensitive — 11 entries live) is set: roster names are the people *doing* outreach and are **never extracted** as investor/contact — the *other* party is the prospect. Same framing on `revise`; roster read once at startup (`settings.team_roster()`, logs `team roster loaded (N names)`), so a roster change needs a bot restart. **Bot-side only — no s9pk bump** (box stays v0.1.0:86); shipped via Spark `git pull` + `docker compose up -d --build` + the new `.env` var. Roster unset → prior behavior. Tests: +3 in `test_parse.py`. Guide updated.
- **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Deploy history (done — git log + `start9/0.4/startos/versions/` + guides):** v74 security/path-traversal hardening; v78 retired `lp_profiles`/LP Tracker (grid is canonical); v80–83 email-activity Communications tab (typed investor facet, date filter, full-body view, semantic content search) + daily-digest windowed preview→send (`docs/guides/email.md`); v82 vendored + SRI-pinned front-end libs + jsdom render-smoke build gate (`docs/guides/packaging.md`).
- **Tests:** **28/28 backend green** (`python3 backend/run_tests.py`), `py_compile` clean; frontend render-smoke gates the default `make` build.
- **Debt (P2, not deploy-blocking; full list `EVALUATION.md`):** reports-subsystem soft-delete sweep — **pipeline/opportunities aggregates fixed v87**; remaining: the dashboard **communications** aggregates (`recent_comms`/`comms_this_month`/`meetings_this_month`) + activity report + report-endpoint tests; `?limit=abc` crashes the request thread; auth regression test for the 3 v79-gated GETs (`/api/users`, `/api/email/status`, `/api/email/accounts`); scrub-gateway TLS verify off; hardcoded Spark/Qdrant IPs + **oversized StartOS package icon** (fix before the next s9pk upload); the 5.4k-line `server.py` monolith.
- **Open / risks:** the v2.0 reserve-asset spine is the *working* approved spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off; Appendix-A conviction incl. ~40% Strike stays Grant's working read, not fed to the engine); **Claude/Architect path still unverified live on the box**; the intake matcher reads only the grid blob (not classic `contacts`); doc drift — `crm-overview.md` + `EVALUATION.md` still call `lp_profiles` live (doc-auditor pass).
- **Next:** 1) **spark-control intake dashboard card** (separate session in the spark-control repo — handoff at `docs/handoffs/add-intake-bot-to-spark-control.md`), and longer-term **extract the bot to its own repo** (ROADMAP); 2) in-room smoke of the intake **disambiguation** numbered-pick grammar (the one unexercised path) — and a roster-tuning pass if any teammate name/initial still slips through; 3) **NL→safe-query** (search item 3 — separate, larger build); 4) Grant + Jonathan freeze v2.0 canonical; 5) reply-all for Tier-B drafts; then clear the P2 debt (reports comms-aggregate soft-delete sweep, `?limit=abc` crash, auth regression test, oversized StartOS icon, etc.). **Possible Pipeline follow-ups if wanted:** drag-and-drop stage moves on the board; surface `expected_amount × probability` weighting.
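The deterministic fuzzy matcher described under Matrix intake (stdlib `difflib` similarity, token-set Jaccard, legal-suffix stripping, email Levenshtein ≤ 2, threshold 0.62, top 5) might look roughly like the sketch below. This is not `find_intake_candidates`. How the real function combines the signals is not stated here, so taking the max of the two name scores and treating a near-identical email as a strong match are assumptions, as are the suffix list and every function name.

```python
import difflib
import re

LEGAL_SUFFIXES = {"llc", "lp", "inc", "ltd", "corp", "co"}  # illustrative list only
MIN_SCORE = 0.62
MAX_CANDIDATES = 5


def strip_legal_suffix(name):
    """Lowercase, tokenize, and drop trailing legal-entity suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def name_score(a, b):
    """Max of difflib ratio and token-set Jaccard (the combination is an assumption)."""
    a, b = strip_legal_suffix(a), strip_legal_suffix(b)
    if not a or not b:
        return 0.0
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    ta, tb = set(a.split()), set(b.split())
    jaccard = len(ta & tb) / len(ta | tb)
    return max(ratio, jaccard)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_candidates(name, email, investors):
    """Return up to MAX_CANDIDATES investor ids, best score first.

    investors: iterable of dicts with 'id', 'name', and an optional 'email'.
    """
    scored = []
    for inv in investors:
        score = name_score(name, inv["name"])
        inv_email = (inv.get("email") or "").lower()
        if email and inv_email and levenshtein(email.lower(), inv_email) <= 2:
            score = max(score, 0.99)  # assumption: a near-identical email is a strong signal
        if score >= MIN_SCORE:
            scored.append((score, inv["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [inv_id for _, inv_id in scored[:MAX_CANDIDATES]]
```

The sketch only ranks a shortlist. Consistent with the rule above, nothing here attaches a record automatically; a human still picks from the numbered candidates.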