Files
ten31-database/AGENTS.md
T
Keysat 2758ac81d3 Add daily-digest Phase A: per-package SMTP send + admin test endpoint (v0.1.0:75)
Groundwork for the daily activity digest: give the CRM an outbound mail path.
Today nothing leaves the box (Gmail capture + drafts only), so this adds a
dedicated, per-package SMTP account independent of any StartOS system-wide SMTP.

- configureDigestSmtp Start9 action: writes host/port/from/username/password/
  security to /data/secrets/smtp/* (password piped over stdin, never argv/env;
  per-field files, owner-only) — mirrors the setAnthropicApiKey pattern.
- docker_entrypoint.sh reads those at boot and exports SMTP_* (operator env wins).
- backend/smtp_send.py: stdlib smtplib wrapper reading SMTP_* (one code path for
  dev .env and the box); starttls/tls/none modes.
- POST /api/admin/digest/test-email (admin-only): proves the pipe. Recipients are
  restricted to the active-admin set — an arbitrary `to` is rejected, so the
  endpoint is not an open relay; send failures are logged, not echoed (an SMTP
  auth error can carry the credential).
- Tests: test_smtp_send.py (sender), test_smtp_endpoint.py (gating + relay
  restriction + no-leak). 18/18 backend green; s9pk typechecks.

Analysis/summarization for the digest body (Phase B) will run on Spark, never
Claude — the digest is deliberately un-anonymized. Decisions + Phase B plan in
ROADMAP.md.
2026-06-15 18:33:06 -05:00

114 lines
13 KiB
Markdown

# Ten31 Venture CRM + Agentic System — AGENTS.md
**The foundation is a self-hosted venture-fund CRM** — a purpose-built fundraising tool that replaced Airtable to (1) keep sensitive LP/prospect data off third-party servers, (2) drop subscription cost, and (3) fit the fund's workflow: managing ~150 existing LPs, tracking 250+ prospects, and running the capital-raise pipeline. Core CRM domain: contacts (investor/prospect/advisor), organizations, opportunities (the deal pipeline), communications, and LP profiles. The fund (Ten31, ~$200M AUM, bitcoin/energy/AI thesis) runs it on a Start9 box, accessed on the LAN or over Tailscale by a team of ~5. Schema/API tour: `docs/crm-overview.md`.
**The agentic system is new functionality built on top of that CRM** — an in-house AI layer to widen the fundraising funnel, sharpen the thesis, and automate outreach drafting. Frontier reasoning runs on Claude (Agent SDK/API); privacy-sensitive and bulk work runs on local DGX Spark models via the **Spark Control** gateway. **Phase 0/1 — no live outward-facing agents; agents draft, humans send.**
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(CRM)` and surface them before proposing next steps; triage with `/triage`.
## Stack (versions that matter)
- **Python 3.11, standard library only at runtime.** The CRM is one monolith, `backend/server.py` (~5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch (`do_GET`/`do_POST`). **Not FastAPI.** `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic/pytest-style deps but **none are imported at runtime** (vestigial).
- **SQLite** at `data/crm.db` (WAL, `foreign_keys=ON`), opened per-request via `get_db()`. Schema via ordered migrations.
- **Frontend:** single `frontend/index.html`, inline-Babel React. **No build step.**
- Optional runtime deps, used only if present: `bcrypt`, `PyJWT` (`jwt`), `cryptography` (Gmail module).
- **MCP + ingest** (in the Docker image, not the bare CRM): `mcp==1.2.0` (FastMCP, `backend/mcp/server.py`), `fastembed==0.4.2`, `anthropic`, `cryptography==42.0.5`.
- **Packaging:** StartOS 0.4, TypeScript SDK (`@start9labs/start-sdk`) under `start9/0.4/startos/`. Live target is `start9/0.4/`.
- **Local models** (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`, Qdrant): always via Spark Control. Contract: `docs/EMBEDDINGS.md`.
## Commands
```bash
# Run locally (dev, port 8080; or ./start.sh <port>) — runs python3 backend/server.py
./start.sh
# Run prod-mode (Tailscale/beta) — requires CRM_SECRET_KEY
./start_beta.sh
# Sanity-check edits (there is no compiler/build for the CRM)
python3 -m py_compile backend/server.py
# Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py
# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
python3 backend/run_tests.py # add substrings to filter, e.g. `... soft_delete redaction`
# Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
cd start9/0.4 && make
```
- **Migrations** apply automatically at startup (`backend/core_migrations.py`, `schema_migrations` ledger). See `docs/guides/migrations.md` before adding one.
- **Lint:** none configured.
## Directory layout (day-one)
- `backend/server.py` — the CRM monolith: HTTP handler, route dispatch, `init_db()`, auth (username/password → HS256 JWT, roles admin/member).
- `backend/core_migrations.py` + `backend/migrations/NNNN_*.sql` (+ paired `.down.sql`) — additive schema migrations, applied at startup.
- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders, wired in `server.init_db()`.
- `backend/thesis_review.py` — thesis version review/approval (human dual sign-off → canonical).
- `backend/mcp/``architect_agent.py` (Claude thesis copilot), `architect_tools.py`, `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP).
- `backend/email_integration/` — Gmail capture via domain-wide delegation + Tier-B draft creation (`compose.py`).
- `backend/redaction/``scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary.
- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes.
- `backend/entity_*.py` — entity resolution/merge (the two-investor-model reconciliation).
- `frontend/index.html` — the entire UI.
- `docs/` — architecture, phase plans, contracts, runbooks (see Deeper docs). `docs/guides/` — scoped subsystem rules (see below).
- `start9/0.4/` — StartOS package (`startos/utils.ts` holds `PACKAGE_VERSION`).
- `data/crm.db` — the live DB (gitignored). `.env` / `.env.example` — config (`.env` gitignored).
## Scoped guides
Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude/rules/` symlinks (scoped by `paths:` frontmatter). **Read the guide before editing that area:**
- **Migrations or seeders** (`backend/migrations/`, `core_migrations.py`, `thesis_seed.py`) → `docs/guides/migrations.md`
- **Thesis logic** (`backend/thesis_*.py`, `backend/mcp/architect_*.py`) → `docs/guides/thesis.md`
- **Redaction or any MCP/Claude path** (`backend/redaction/`, `backend/mcp/`) → `docs/guides/redaction.md`
- **Ingest / retrieval** (`backend/ingest/`) → `docs/guides/spark-ingest.md`
- **Email capture / drafts** (`backend/email_integration/`) → `docs/guides/email.md`
- **Building or deploying the s9pk** (`start9/`) → `docs/guides/packaging.md`
## Conventions
- **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`.
- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
## Always
- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`, never production.
- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` first.
- **Get explicit user authorization before any production deploy/install** to `$START9_BOX_HOST`.
## Never
- **Never treat Qdrant (or any derived index) as source of truth** — the CRM/SQLite is canonical and rebuildable-from.
- **Never hard-delete** CRM records or thesis history — soft-delete/archive only.
- **Never let an agent send email, post, or contact an LP autonomously** — agents draft; a human approves and sends.
- **Never set a `thesis_version` canonical from code/seeds** — that is human dual sign-off.
- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).
- **Never commit secrets, `data/crm.db`, `.env`, or `data/backups/`** (all gitignored). Scan staged files before committing. (`.claude/` *is* tracked — `launch.json` and `rules/` symlinks ship with the repo; keep local-only settings in `.claude/settings.local.json`.)
- **Never bulk-export the LP list** to any third party; send only minimal non-sensitive context to Claude.
- **Never assume FastAPI / SQLAlchemy / pytest** are in play — they sit in `requirements.txt` unused; runtime is stdlib + SQLite.
- **Never add a `Co-Authored-By` / "Generated with" trailer** to commits or PRs — commits are the user's.
## Deeper docs
- Full constitution + guardrails: `docs/ten31-constitution.md`
- Architecture & rationale: `docs/Ten31_Agentic_Build_Plan.md`
- Retrieval/embeddings contract: `docs/EMBEDDINGS.md`
- CRM schema/API tour: `docs/crm-overview.md`
- Current thesis handoff: `docs/thesis-handoff.md`
- Operations & runbooks: `docs/OPERATIONS.md`, `docs/go-live-runbook.md`, `docs/gmail-enablement-runbook.md`
## Current state
_Phase 0 substrate + Phase 1 thesis/outreach are built; **deployed box is v0.1.0:74**, **repo is at v0.1.0:75** (committed, not yet built/deployed). Longer-term backlog: `ROADMAP.md`._
- **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Deployed & verified live (2026-06-13):** v0.1.0:74 is **installed and healthy on the box** (`$START9_BOX_HOST` / immense-voyage.local). Grant confirms login works; `/assets/` traversal 404s live (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
- **Repo ahead of the box (committed, NOT yet built/deployed):** the box is pristine v74; `main` is at **v0.1.0:75** and carries two unshipped batches. (a) Post-v74: the **list-view soft-delete aggregate fix** (`server.py`: org `contact_count`/`total_funded`, contacts `comm_count`/`last_contact_date` now filter `deleted_at`), three **regression tests**, and an **aggregate test runner**. (b) **v0.1.0:75 — daily-digest Phase A** (outbound SMTP send): the **`configureDigestSmtp`** Start9 action writes a per-package SMTP account to `/data/secrets/smtp/*` (password over stdin; independent of any StartOS system-wide SMTP), `docker_entrypoint.sh` exports `SMTP_*`, `backend/smtp_send.py` (stdlib smtplib) sends, and admin **`POST /api/admin/digest/test-email`** proves the pipe (recipients restricted to the active-admin set — not an open relay). One `make` ships both batches.
- **Shipped in v0.1.0:74** (security/privacy hardening from the 2026-06-12 full-eval; report in `EVALUATION.md`): closed a pre-auth `/assets/` path traversal (could read crm.db / JWT secret / Gmail key); wired the local-Qwen NER backstop into the outreach redaction boundary (free-prose email bodies were reaching Claude with unknown names in the clear); added `deleted_at IS NULL` to every get-by-id + nested sub-select read path. Verified locally (py_compile, query exec, redaction/outreach tests, containment logic) + two reviewer passes.
- **Tests (2026-06-15):** **18/18 backend tests green** via `python3 backend/run_tests.py` (+`test_smtp_send.py`/`test_smtp_endpoint.py` this session). `py_compile` clean; the s9pk TypeScript typechecks (`cd start9/0.4 && npm run check`, deps installed); `docker_entrypoint.sh` passes `sh -n`. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
- **Decided, not yet built:** CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (drafts currently reply to the LP only).
- **Known debt (P2, not deploy-blocking):** the **reports subsystem** (`handle_dashboard_report`/`handle_pipeline_report`/`handle_lp_breakdown_report`, ~16 aggregate queries over contacts/opportunities/communications/lp_profiles) still counts soft-deleted rows — the list/detail aggregates were fixed (v74 + the org/contacts list-view follow-up) but the reports were not; needs its own pass + report-endpoint tests; `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
- **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
- **Next:** 1) **build/deploy v0.1.0:75** (one `make` ships the list-view fix + digest Phase-A SMTP); then on the box: run **Configure Digest SMTP**, restart, and hit **Send Test Digest Email** to verify the pipe; 2) **digest Phase B** — daily scheduler + per-user→per-investor activity query (`deleted_at IS NULL`) + **Spark-narrative** summary (never Claude) → email all admins (decisions locked in `ROADMAP.md`); 3) **reports-subsystem soft-delete sweep** (~16 aggregates still leak; fix + tests); 4) `?limit=abc` crash (P2); 5) Grant + Jonathan freeze v2.0 canonical; 6) build reply-all; 7) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.