# Ten31 Venture CRM + Agentic System — AGENTS.md
**The foundation is a self-hosted venture-fund CRM** — a purpose-built fundraising tool that replaced Airtable to (1) keep sensitive LP/prospect data off third-party servers, (2) drop subscription cost, and (3) fit the fund's workflow: managing ~150 existing LPs, tracking 250+ prospects, and running the capital-raise pipeline. Core CRM domain: contacts (investor/prospect/advisor), organizations, opportunities (the deal pipeline), and communications; investor commitments live in the canonical `fundraising_*` grid (the legacy single-fund `lp_profiles` table was retired in v0.1.0:78). The fund (Ten31, ~$200M AUM, bitcoin/energy/AI thesis) runs it on a Start9 box, accessed on the LAN or over Tailscale by a team of ~5. Schema/API tour: `docs/crm-overview.md`.

**The agentic system is new functionality built on top of that CRM** — an in-house AI layer to widen the fundraising funnel, sharpen the thesis, and automate outreach drafting. Frontier reasoning runs on Claude (Agent SDK/API); privacy-sensitive and bulk work runs on local DGX Spark models via the **Spark Control** gateway. **Phase 0/1 — no live outward-facing agents; agents draft, humans send.**
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(CRM)` and surface them before proposing next steps; triage with `/triage`.
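A minimal sketch of that inbox check, assuming each tagged item carries the literal `(CRM)` on a single line of `INBOX.md`. The helper name `crm_inbox_items` is hypothetical and not part of the repo.

```python
from pathlib import Path
from typing import Optional


def crm_inbox_items(inbox: Optional[Path] = None) -> list[str]:
    """Return the INBOX.md lines tagged (CRM); an absent inbox yields an empty list."""
    inbox = inbox or Path.home() / "Projects" / "standards" / "INBOX.md"
    if not inbox.exists():
        return []
    return [
        line.strip()
        for line in inbox.read_text(encoding="utf-8").splitlines()
        if "(CRM)" in line  # assumption: the tag is a literal substring on the item's line
    ]
```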
## Stack (versions that matter)
- **Python 3.11, standard library only at runtime.** The CRM is one monolith, `backend/server.py` (~5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch (`do_GET`/`do_POST`). **Not FastAPI.** `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic/pytest-style deps but **none are imported at runtime** (vestigial).
- **SQLite** at `data/crm.db` (WAL, `foreign_keys=ON`), opened per-request via `get_db()`. Schema via ordered migrations.
- **Frontend:** single `frontend/index.html`, inline-Babel React. **No build step.**
- Optional runtime deps, used only if present: `bcrypt`, `PyJWT` (`jwt`), `cryptography` (Gmail module).
- **MCP + ingest** (in the Docker image, not the bare CRM): `mcp==1.2.0` (FastMCP, `backend/mcp/server.py`), `fastembed==0.4.2`, `anthropic`, `cryptography==42.0.5`.
- **Packaging:** StartOS 0.4, TypeScript SDK (`@start9labs/start-sdk`) under `start9/0.4/startos/`. Live target is `start9/0.4/`.
- **Local models** (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`, Qdrant): always via Spark Control. Contract: `docs/EMBEDDINGS.md`.
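The runtime shape described above (stdlib `ThreadingHTTPServer`, a handler with hand-written path dispatch, and a per-request SQLite connection with WAL and foreign keys) can be sketched as follows. This is an illustration, not an excerpt of `backend/server.py`: `SketchHandler`, the `/api/health` and `/api/contacts/<id>` routes, and the explicit `path` argument to `get_db` are all assumptions.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def get_db(path: str) -> sqlite3.Connection:
    """Open a fresh connection for one request: WAL journal + FK enforcement."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


class SketchHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for CRMHandler: routes are matched by hand, no framework."""

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        path = self.path.split("?", 1)[0]  # drop the query string before matching
        if path == "/api/health":
            self._send_json(200, {"ok": True})
        elif path.startswith("/api/contacts/"):
            # get-by-id style route: the id is the last path segment
            self._send_json(200, {"id": path.rsplit("/", 1)[-1]})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default stderr access log in this sketch
```

Serving it would be roughly `ThreadingHTTPServer(("0.0.0.0", 8080), SketchHandler).serve_forever()`. Opening the connection inside each request (and closing it afterwards) means no SQLite connection is shared across handler threads.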
## Commands
```bash
# Run locally (dev, port 8080; or ./start.sh <port>); runs python3 backend/server.py
./start.sh

# Run prod-mode (Tailscale/beta); requires CRM_SECRET_KEY
./start_beta.sh

# Sanity-check edits (there is no compiler/build for the CRM)
python3 -m py_compile backend/server.py

# Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
python3 backend/redaction/test_scrub_leak.py  # substitute any backend/**/test_*.py

# Run all tests (aggregate runner; runs each backend/**/test_*.py in its own subprocess)
python3 backend/run_tests.py  # add substrings to filter, e.g. `... soft_delete redaction`

# Build + install the s9pk. BUMP THE VERSION FIRST. See docs/guides/packaging.md.
cd start9/0.4 && make
```
- **Migrations** apply automatically at startup (`backend/core_migrations.py`, `schema_migrations` ledger). See `docs/guides/migrations.md` before adding one.
- **Lint:** none configured.
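For orientation, a minimal sketch of what an aggregate runner like `backend/run_tests.py` does: discover `test_*.py`, optionally filter by substring, and run each file in its own interpreter. It is a hypothetical reconstruction from the description above, not the repo's script; matching filters against the path relative to the root is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def discover_tests(root: Path, filters: list[str]) -> list[Path]:
    """Find test_*.py under root; if filters are given, keep paths containing any of them."""
    tests = sorted(root.rglob("test_*.py"))
    if filters:
        tests = [t for t in tests if any(f in t.relative_to(root).as_posix() for f in filters)]
    return tests


def run_all(root: Path, filters: list[str]) -> int:
    """Run each test script in a fresh interpreter; return how many failed."""
    failures = 0
    for test in discover_tests(root, filters):
        result = subprocess.run([sys.executable, str(test)], capture_output=True, text=True)
        ok = result.returncode == 0
        print(f"{'PASS' if ok else 'FAIL'}  {test.relative_to(root)}")
        if not ok:
            failures += 1
            print(result.stdout + result.stderr)  # surface the failing script's output
    return failures
```

One subprocess per file keeps module-level state and monkeypatching in one script from leaking into the next, which matters when there is no pytest fixture isolation.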
## Directory layout (day-one)
- `backend/server.py` — the CRM monolith: HTTP handler, route dispatch, `init_db()`, auth (username/password → HS256 JWT, roles admin/member).
- `backend/core_migrations.py` + `backend/migrations/NNNN_*.sql` (+ paired `.down.sql`) — additive schema migrations, applied at startup.
- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders, wired in `server.init_db()`.
- `backend/thesis_review.py` — thesis version review/approval (human dual sign-off → canonical).
- `backend/mcp/` — `architect_agent.py` (Claude thesis copilot), `architect_tools.py`, `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP).
- `backend/email_integration/` — Gmail capture via domain-wide delegation + Tier-B draft creation (`compose.py`).
- `backend/redaction/` — `scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary.
- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes.
- `backend/entity_*.py` — entity resolution/merge (the two-investor-model reconciliation).
- `frontend/index.html` — the entire UI.
- `docs/` — architecture, phase plans, contracts, runbooks (see Deeper docs). `docs/guides/` — scoped subsystem rules (see below).
- `start9/0.4/` — StartOS package (`startos/utils.ts` holds `PACKAGE_VERSION`).
- `data/crm.db` — the live DB (gitignored). `.env` / `.env.example` — config (`.env` gitignored).
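The migration-ledger pattern (ordered `NNNN_*.sql`, paired `.down.sql` skipped at startup, one `schema_migrations` row per applied file) might look roughly like this. It is a hedged sketch, not `backend/core_migrations.py`: the ledger columns, using the file stem as the version key, and wrapping each file plus its ledger row in one transaction are assumptions; `docs/guides/migrations.md` has the real rules.

```python
import sqlite3
from pathlib import Path


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply pending NNNN_*.sql files in order; record each in schema_migrations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "version TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for sql_file in sorted(migrations_dir.glob("[0-9][0-9][0-9][0-9]_*.sql")):
        if sql_file.name.endswith(".down.sql"):
            continue  # paired rollback scripts are never applied at startup
        version = sql_file.stem  # hypothetical key, e.g. "0081_example"
        if version in applied:
            continue
        body = sql_file.read_text(encoding="utf-8")
        ledger = "INSERT INTO schema_migrations (version) VALUES ('{}');".format(
            version.replace("'", "''")
        )
        try:
            # Assumes every statement in the file is ';'-terminated. The explicit
            # BEGIN/COMMIT makes migration + ledger row atomic (executescript commits first).
            conn.executescript(f"BEGIN;\n{body}\n{ledger}\nCOMMIT;")
        except sqlite3.Error:
            conn.rollback()  # undo a half-applied migration, then surface the error
            raise
        newly_applied.append(version)
    return newly_applied
```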
## Scoped guides
Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude/rules/` symlinks (scoped by `paths:` frontmatter). **Read the guide before editing that area:**
- **Migrations or seeders** (`backend/migrations/`, `core_migrations.py`, `thesis_seed.py`) → `docs/guides/migrations.md`
- **Thesis logic** (`backend/thesis_*.py`, `backend/mcp/architect_*.py`) → `docs/guides/thesis.md`
- **Redaction or any MCP/Claude path** (`backend/redaction/`, `backend/mcp/`) → `docs/guides/redaction.md`
- **Ingest / retrieval** (`backend/ingest/`) → `docs/guides/spark-ingest.md`
- **Email capture / drafts + digest send** (`backend/email_integration/`, `backend/digest_mailer.py`, `backend/smtp_send.py`) → `docs/guides/email.md`
- **Building or deploying the s9pk** (`start9/`) → `docs/guides/packaging.md`
## Conventions
- **Investor model — the grid is canonical (since v0.1.0:78).** The `fundraising_*` grid is the **system of record**: an investor entity (row) → many contact "pills" → per-fund commitments. The classic `contacts` table is a **read-only per-person directory**, auto-populated from the grid — create/edit people in the grid, not the Contacts page. Email capture rolls multiple people up to one investor. The legacy single-fund `lp_profiles` model is **retired** (empty table kept, per never-hard-delete). Reconciling grid ↔ classic `contacts` to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`; digest mailer: `CRM_DIGEST_SENDER` (DWD impersonation sender) + `SMTP_HOST`/`SMTP_PORT`/`SMTP_SECURITY`/`SMTP_FROM`/`SMTP_USERNAME`/`SMTP_PASSWORD` (SMTP fallback); daily digest (Phase B): `CRM_DIGEST_ENABLED` + `CRM_DIGEST_SEND_HOUR` **only seed the first-boot default** — the live control is the DB policy (`app_settings.digest_policy`, set in Settings → Admin).
- **Config placement:** operational/feature toggles live in the **admin panel**, DB-backed via `app_settings` (read-merge through a `load_*_policy(conn)` helper shared by the API + any scheduler; precedence DB-row → env-seed → default), so they're discoverable and take effect live. Reserve StartOS actions / env for **secrets and deploy-time config** (SMTP creds, API keys, DWD sender). Precedent: `digest_policy` (`GET/PATCH /api/admin/digest/policy`), `fundraising_backup_policy`.
- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
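As a sketch of the `load_*_policy(conn)` read-merge with precedence DB row → env seed → default. The `app_settings(key, value)` schema with JSON values, the `load_digest_policy` helper name, and the 08:00 default send hour are hypothetical; only the env var names, the `digest_policy` key, and auto-send defaulting OFF come from this document.

```python
import json
import os
import sqlite3

# Hypothetical defaults: auto-send OFF per this doc; the send hour is invented.
DIGEST_DEFAULTS = {"enabled": False, "send_hour": 8}


def _env_seed() -> dict:
    """Read the first-boot seed values from the environment, if set."""
    seed = {}
    raw_enabled = os.environ.get("CRM_DIGEST_ENABLED")
    if raw_enabled is not None:
        seed["enabled"] = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    raw_hour = os.environ.get("CRM_DIGEST_SEND_HOUR", "").strip()
    if raw_hour.isdigit():
        seed["send_hour"] = int(raw_hour)
    return seed


def load_digest_policy(conn: sqlite3.Connection) -> dict:
    """Merge default <- env seed <- DB row, so the DB row (set in the admin panel) wins."""
    policy = dict(DIGEST_DEFAULTS)
    policy.update(_env_seed())
    row = conn.execute(
        "SELECT value FROM app_settings WHERE key = ?", ("digest_policy",)
    ).fetchone()
    if row is not None:
        policy.update(json.loads(row[0]))
    return policy
```

Because the DB row is applied last, once an admin saves the policy in Settings → Admin the env vars stop mattering for every key the row defines, consistent with their role as first-boot seeds only. Sharing one loader between the API handler and the scheduler keeps both reading the same effective policy.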
## Always
- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`, never production.
- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` first.
- **Get explicit user authorization before any production deploy/install** to `$START9_BOX_HOST`.
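To make the scrub→Claude→re-hydrate boundary concrete, here is a toy placeholder-substitution sketch. It is not `backend/redaction/scrub.py`, which this document does not describe in detail: `scrub`, `rehydrate`, the `[[ENTITY_n]]` placeholder format, and the caller-supplied list of sensitive terms are all assumptions, and no model call is made.

```python
import re


def scrub(text: str, sensitive_terms: list[str]) -> tuple[str, dict[str, str]]:
    """Swap each sensitive term for an opaque placeholder; return the text and the mapping."""
    mapping: dict[str, str] = {}
    # Longest terms first, so "Acme Capital LP" is replaced before "Acme" can match inside it.
    ordered = sorted(set(sensitive_terms), key=lambda t: (-len(t), t))
    for i, term in enumerate(ordered):
        placeholder = f"[[ENTITY_{i}]]"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Put the real terms back into text that came back from the model."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text
```

Only the scrubbed prompt and a placeholder-only reply would cross the boundary; the mapping stays local. Plain substring matching also hits inside longer words, so a real scrubber needs word-boundary handling and an NER backstop for names missing from the list.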
## Never
- **Never treat Qdrant (or any derived index) as source of truth** — the CRM/SQLite is canonical and rebuildable-from.
- **Never hard-delete** CRM records or thesis history — soft-delete/archive only.
- **Never let an agent send email, post, or contact an LP autonomously** — agents draft; a human approves and sends.
- **Never set a `thesis_version` canonical from code/seeds** — that is human dual sign-off.
- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).
- **Never commit secrets, `data/crm.db`, `.env`, or `data/backups/`** (all gitignored). Scan staged files before committing. (`.claude/` *is* tracked — `launch.json` and `rules/` symlinks ship with the repo; keep local-only settings in `.claude/settings.local.json`.)
- **Never bulk-export the LP list** to any third party; send only minimal non-sensitive context to Claude.
- **Never assume FastAPI / SQLAlchemy / pytest** are in play — they sit in `requirements.txt` unused; runtime is stdlib + SQLite.
- **Never add a `Co-Authored-By` / "Generated with" trailer** to commits or PRs — commits are the user's.
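The soft-delete rules (archive instead of `DELETE`; filter `deleted_at IS NULL` in the outer query *and* in aggregate sub-selects) can be illustrated with an in-memory SQLite sketch. The two-table schema, the `org_id` column, and both helper names are hypothetical; they only mirror the `contact_count`-style leak the audits found.

```python
import sqlite3

SCHEMA = """
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    org_id INTEGER REFERENCES organizations(id),
    name TEXT,
    deleted_at TEXT
);
"""


def soft_delete_contact(conn: sqlite3.Connection, contact_id: int) -> None:
    """Archive instead of DELETE: stamp deleted_at and keep the row."""
    with conn:
        conn.execute(
            "UPDATE contacts SET deleted_at = CURRENT_TIMESTAMP "
            "WHERE id = ? AND deleted_at IS NULL",
            (contact_id,),
        )


def list_organizations(conn: sqlite3.Connection) -> list[tuple]:
    """List view: filter the outer rows AND the aggregate sub-select."""
    return conn.execute(
        """
        SELECT o.id, o.name,
               (SELECT COUNT(*) FROM contacts c
                 WHERE c.org_id = o.id AND c.deleted_at IS NULL) AS contact_count
          FROM organizations o
         WHERE o.deleted_at IS NULL
         ORDER BY o.id
        """
    ).fetchall()
```

Dropping the `c.deleted_at IS NULL` clause from the sub-select is exactly the leak class described under Conventions: the outer list looks clean while the count still includes archived rows.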
## Deeper docs
- Full constitution + guardrails: `docs/ten31-constitution.md`
- Architecture & rationale: `docs/Ten31_Agentic_Build_Plan.md`
- Retrieval/embeddings contract: `docs/EMBEDDINGS.md`
- CRM schema/API tour: `docs/crm-overview.md`
- Current thesis handoff: `docs/thesis-handoff.md`
- Operations & runbooks: `docs/OPERATIONS.md`, `docs/go-live-runbook.md`, `docs/gmail-enablement-runbook.md`
## Current state
_Phase 0 substrate + Phase 1 thesis/outreach are built; **box and repo at v0.1.0:81** (latest: **Communications tab is matched-only** — the email-activity panel now surfaces only email linked to a known investor/contact; unmatched cold/unknown-sender email is captured but never shown; prior v80: repurposed the tab into the admin-only captured-Gmail search over the `email_*` tables). **Decision (2026-06-16): the fundraising grid + email capture is the canonical system of record** — vestigial classic-CRM surfaces get pruned or repurposed (see `ROADMAP.md` → "Consolidate on the fundraising grid as canonical"). Longer-term backlog: `ROADMAP.md`._
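A hedged sketch of the matched-only gate: an `EXISTS` sub-select against `email_investor_links` decides visibility, so unmatched mail never surfaces (not even through free-text search), yet appears as soon as a link row exists. The `email_messages` table, the `email_id` column, and `query_matched_email` are assumptions; the real query is `email_integration/db.py:query_email_activity`.

```python
import sqlite3
from typing import Optional


def query_matched_email(conn: sqlite3.Connection, search: Optional[str] = None) -> list[tuple]:
    """Return only email linked to a known investor; unmatched mail stays captured but hidden."""
    sql = """
        SELECT m.id, m.subject
          FROM email_messages m
         WHERE EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = m.id)
    """
    params: list[str] = []
    if search:
        # Free-text search narrows the matched set; it can never widen it.
        sql += " AND m.subject LIKE ?"
        params.append(f"%{search}%")
    sql += " ORDER BY m.id"
    return conn.execute(sql, params).fetchall()
```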
- **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Deployed & verified live: v0.1.0:81** (box `$START9_BOX_HOST`/immense-voyage.local; `installed-version`→`0.1.0:81`, migration chain `…80→81` clean, server up on `:8080`, schedulers + Gmail integration up).
  - **v0.1.0:81 makes the Communications tab matched-only:** `query_email_activity` now gates on `EXISTS(email_investor_links)`, so the panel surfaces only email linked to a known investor/contact. Unmatched cold/unknown-sender email is still captured (metadata-only) and will appear automatically if its sender is later added as an investor; this is a read-side filter with no schema/capture change. Graveyard investors are unaffected (their email has a link): still hidden from the picker but visible/searchable as an audit surface. Backend-only (frontend `index.html` is byte-identical to v80, which was render-verified).
  - **Prior: v0.1.0:80 repurposed the Communications tab into the admin-only email-activity panel:** new `GET /api/email/activity` (admin-enforced server-side) over the `email_*` tables, filterable by investor / mailbox / direction + free-text search; soft-delete honored on the per-mailbox sighting; direction decided at the email level (mirrors `digest_builder`); graveyard investors hidden from the picker but their email stays visible + searchable (audit surface). The classic manual "Log Communication" form was retired (the grid context menu remains the manual-log path); nav item + page are admin-only. Query lives in `email_integration/db.py:query_email_activity`; tests in `email_integration/test_email_activity_panel.py`.
  - **Prior: v0.1.0:79 was a P0 hotfix.** The page loaded `@babel/standalone` from unpkg **unpinned**, so the CDN served **Babel 8.0.0**, whose `@babel/preset-react` automatic JSX runtime prepends an ESM `import {jsx} from "react/jsx-runtime"`. That import is illegal in this classic (non-module) inline `<script>`, so the browser rejected the whole bundle and React never mounted → **blank screen for every user**. Fix: pin `@babel/standalone@7.29.7` (classic runtime; verified via headless render locally + on the box). The same release closed **3 server-side admin gaps** from a permissions audit: `GET /api/users`, `/api/email/status`, `/api/email/accounts` were UI-hidden from members but not API-enforced; all now `require_admin` (write endpoints were already gated).
  - **Prior: v0.1.0:78 retired `lp_profiles` + the orphaned LP Tracker** (endpoints/handlers/lp-breakdown report/contact-dossier LP section/frontend component+redirect removed; empty table left in place per never-hard-delete) and **repointed the Dashboard "Total Committed"** onto `fundraising_investors.total_invested` (graveyard-excluded; "Total Funded" dropped because the grid has no funded concept).
  - **Digest is fully live:** capture (DWD) → propose→approve; transport routes Gmail-DWD→SMTP (no app password); and **daily activity digest (Phase B)**: `digest_builder.py` (by-team-member Spark narrative + by-investor section, soft-delete filtered) + always-on `digest_scheduler.py` reading a DB policy + `send-now`. **Auto-send defaults OFF** (env seed unset → `app_settings.digest_policy` off) until Grant enables it in Settings → Admin. Detail: `docs/guides/email.md`.
- **Live since v74 (2026-06-13):** login works; `/assets/` traversal 404s (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). Security/privacy hardening (path-traversal close, outreach NER backstop, get-by-id soft-delete) shipped in v74 — detail in `EVALUATION.md`.
- **Tests (2026-06-16):** **22/22 backend tests green** via `python3 backend/run_tests.py` (`email_integration/test_email_activity_panel.py` updated for v81: matched-only scope — unmatched email never surfaces, not even by free-text search — plus investor/mailbox/search/direction filters, per-sighting soft-delete, email-level direction, mailbox + investor roll-ups, graveyard hidden-from-picker-but-visible, facets, route 401/403 admin enforcement; prior: `test_dashboard_report.py`, `test_digest_builder.py`). `py_compile` clean. Frontend render checked locally (jsdom mount + pinned-Babel transform). The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
- **Decided, not yet built (detail in `ROADMAP.md`):** Pipeline adoption + a grid flag that auto-loads flagged investors as opportunities; NL→safe-query feature; CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (currently reply to the LP only). *(Done v80: the admin-only per-investor/per-mailbox email-activity panel; v81: made that panel matched-only.)*
- **Known debt (P2, not deploy-blocking):**
  - **Reports-subsystem soft-delete sweep:** `handle_pipeline_report` + remaining report/aggregate queries over opportunities/communications still count soft-deleted rows (v78 shrank this surface: the `lp_profiles`/lp-breakdown aggregates are gone and the dashboard "Total Committed" is now grid-sourced); needs a pass + report-endpoint tests.
  - `?limit=abc` crashes the request thread (authenticated list path).
  - Scrub-gateway TLS verify is off.
  - The `cryptography==42.0.5` pin.
  - **Front-end CDN libs still loaded from unpkg without SRI:** Babel is now version-pinned (v79, after an unpinned auto-upgrade to Babel 8 blanked the whole UI), but React/Babel should be **vendored into the package + SRI-pinned** so a CDN can never swap prod deps again.
  - **Deploy verification must include a browser-render smoke check:** v78's blank UI shipped as "verified live" because the checks were server-up/curl only, which can't catch a client render failure.
  - Stale user-visible `start9/0.4/assets/ABOUT.md`.
  - Hardcoded Spark/Qdrant IPs in the s9pk.
  - The 5.4k-line `server.py` monolith.
  - P3 batch + full list in `EVALUATION.md`.
- **Doc drift to reconcile:** `crm-overview.md` + `EVALUATION.md` still describe `lp_profiles` as a live model in places — a doc-auditor pass should align them to "grid canonical, `lp_profiles` retired."
- **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
- **Next:**
  1. **Vendor + SRI-pin the front-end libs** (serve React/Babel from the package, integrity-checked) so a CDN can never swap prod deps again, **and script the render smoke check into deploy-verify**. A working jsdom-mount + pinned-Babel-transform check was run manually for v80 (it catches the v78/v79 blank-screen class); wire it into the build/install flow.
  2. Add an **auth regression test** asserting the 3 v79-gated GET endpoints (`/api/users`, `/api/email/status`, `/api/email/accounts`) reject members (v80 added the analogous test for `/api/email/activity`).
  3. Grant validates digest Phase B on the box: Settings→Admin **Send Digest Now**, then tick **Send automatically every day**.
  4. **Reports-subsystem soft-delete sweep** + report-endpoint tests.
  5. **Pipeline adoption**: grid flag → auto-load opportunities.
  6. Fix the `?limit=abc` crash.
  7. **NL→safe-query** (separate, larger).
  8. Grant + Jonathan freeze v2.0 canonical.
  9. Build reply-all.
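For the `?limit=abc` item, one defensive approach is to parse the parameter so `int()` can never raise into the handler. This is a minimal sketch: `parse_limit` and the default/maximum values are hypothetical, and falling back to the default (rather than replying 400) is a design choice this document does not specify.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_LIMIT = 100  # hypothetical
MAX_LIMIT = 1000  # hypothetical


def parse_limit(path: str, default: int = DEFAULT_LIMIT, maximum: int = MAX_LIMIT) -> int:
    """Parse ?limit=N defensively: non-integers fall back to default; values are clamped."""
    values = parse_qs(urlsplit(path).query).get("limit")
    if not values:
        return default
    try:
        limit = int(values[0])
    except ValueError:
        return default  # e.g. ?limit=abc no longer crashes the request thread
    return max(1, min(limit, maximum))
```

Returning a 400 with a JSON error instead would surface client bugs sooner; either way the value is validated before it reaches SQL.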