Files
ten31-database/AGENTS.md
T
Keysat c7b74a2704 Email search/query + windowed digest preview (v0.1.0:83)
Communications tab (search/query roadmap items 1 & 2):
- Fix the investor dropdown: the facet only listed grid investors, so it
  came back empty whenever email matched a classic contact or org domain
  (no grid id — the common case). It now mirrors the email list, resolving
  each link to a typed identity (fund:/org:/contact:/addr:) with precedence
  grid -> org -> contact -> address; investor_id accepts the typed key
  (bare id = fund: for back-compat) and an unknown prefix matches nothing.
- Add a date-range filter and a click-to-expand full-body view
  (GET /api/email/detail, admin, soft-delete-gated; body_text only, never
  raw remote HTML).
- Add a "Search content" mode: GET /api/email/search wraps the ingest
  hybrid_search over the Qdrant email index (doc_type=email), hydrated and
  soft-delete-filtered against SQLite (canonical), 503 if Spark/Qdrant down.

Daily digest:
- Settings -> Admin builds a digest over a chosen window (last 24h or since
  a date) as an in-app preview before sending (POST /api/admin/digest/preview),
  so the local-Spark summarizer can be verified on demand even on a quiet day.
  Manual send uses the same window; neither advances the daily cursor, so a
  preview never suppresses the scheduled digest.

Code-only, migrations no-op. 22/22 backend tests, render-smoke pass.
2026-06-16 20:46:15 -05:00

116 lines
18 KiB
Markdown

# Ten31 Venture CRM + Agentic System — AGENTS.md
**The foundation is a self-hosted venture-fund CRM** — a purpose-built fundraising tool that replaced Airtable to (1) keep sensitive LP/prospect data off third-party servers, (2) drop subscription cost, and (3) fit the fund's workflow: managing ~150 existing LPs, tracking 250+ prospects, and running the capital-raise pipeline. Core CRM domain: contacts (investor/prospect/advisor), organizations, opportunities (the deal pipeline), and communications; investor commitments live in the canonical `fundraising_*` grid (the legacy single-fund `lp_profiles` table was retired in v0.1.0:78). The fund (Ten31, ~$200M AUM, bitcoin/energy/AI thesis) runs it on a Start9 box, accessed on the LAN or over Tailscale by a team of ~5. Schema/API tour: `docs/crm-overview.md`.
**The agentic system is new functionality built on top of that CRM** — an in-house AI layer to widen the fundraising funnel, sharpen the thesis, and automate outreach drafting. Frontier reasoning runs on Claude (Agent SDK/API); privacy-sensitive and bulk work runs on local DGX Spark models via the **Spark Control** gateway. **Phase 0/1 — no live outward-facing agents; agents draft, humans send.**
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(CRM)` and surface them before proposing next steps; triage with `/triage`.
## Stack (versions that matter)
- **Python 3.11, standard library only at runtime.** The CRM is one monolith, `backend/server.py` (~5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch (`do_GET`/`do_POST`). **Not FastAPI.** `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic/pytest-style deps but **none are imported at runtime** (vestigial).
- **SQLite** at `data/crm.db` (WAL, `foreign_keys=ON`), opened per-request via `get_db()`. Schema via ordered migrations.
- **Frontend:** single `frontend/index.html`, inline-Babel React. **No build step.**
- Optional runtime deps, used only if present: `bcrypt`, `PyJWT` (`jwt`), `cryptography` (Gmail module).
- **MCP + ingest** (in the Docker image, not the bare CRM): `mcp==1.2.0` (FastMCP, `backend/mcp/server.py`), `fastembed==0.4.2`, `anthropic`, `cryptography==42.0.5`.
- **Packaging:** StartOS 0.4, TypeScript SDK (`@start9labs/start-sdk`) under `start9/0.4/startos/`. Live target is `start9/0.4/`.
- **Local models** (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`, Qdrant): always via Spark Control. Contract: `docs/EMBEDDINGS.md`.
## Commands
```bash
# Run locally (dev, port 8080; or ./start.sh <port>) — runs python3 backend/server.py
./start.sh
# Run prod-mode (Tailscale/beta) — requires CRM_SECRET_KEY
./start_beta.sh
# Sanity-check edits (there is no compiler/build for the CRM)
python3 -m py_compile backend/server.py
# Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py
# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
python3 backend/run_tests.py # add substrings to filter, e.g. `... soft_delete redaction`
# Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
cd start9/0.4 && make
```
- **Migrations** apply automatically at startup (`backend/core_migrations.py`, `schema_migrations` ledger). See `docs/guides/migrations.md` before adding one.
- **Lint:** none configured.
## Directory layout (day-one)
- `backend/server.py` — the CRM monolith: HTTP handler, route dispatch, `init_db()`, auth (username/password → HS256 JWT, roles admin/member).
- `backend/core_migrations.py` + `backend/migrations/NNNN_*.sql` (+ paired `.down.sql`) — additive schema migrations, applied at startup.
- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders, wired in `server.init_db()`.
- `backend/thesis_review.py` — thesis version review/approval (human dual sign-off → canonical).
- `backend/mcp/``architect_agent.py` (Claude thesis copilot), `architect_tools.py`, `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP).
- `backend/email_integration/` — Gmail capture via domain-wide delegation + Tier-B draft creation (`compose.py`).
- `backend/redaction/``scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary.
- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes.
- `backend/entity_*.py` — entity resolution/merge (the two-investor-model reconciliation).
- `frontend/index.html` — the entire UI.
- `docs/` — architecture, phase plans, contracts, runbooks (see Deeper docs). `docs/guides/` — scoped subsystem rules (see below).
- `start9/0.4/` — StartOS package (`startos/utils.ts` holds `PACKAGE_VERSION`).
- `data/crm.db` — the live DB (gitignored). `.env` / `.env.example` — config (`.env` gitignored).
## Scoped guides
Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude/rules/` symlinks (scoped by `paths:` frontmatter). **Read the guide before editing that area:**
- **Migrations or seeders** (`backend/migrations/`, `core_migrations.py`, `thesis_seed.py`) → `docs/guides/migrations.md`
- **Thesis logic** (`backend/thesis_*.py`, `backend/mcp/architect_*.py`) → `docs/guides/thesis.md`
- **Redaction or any MCP/Claude path** (`backend/redaction/`, `backend/mcp/`) → `docs/guides/redaction.md`
- **Ingest / retrieval** (`backend/ingest/`) → `docs/guides/spark-ingest.md`
- **Email capture / drafts + digest send** (`backend/email_integration/`, `backend/digest_mailer.py`, `backend/smtp_send.py`) → `docs/guides/email.md`
- **Building or deploying the s9pk** (`start9/`) → `docs/guides/packaging.md`
## Conventions
- **Investor model — the grid is canonical (since v0.1.0:78).** The `fundraising_*` grid is the **system of record**: an investor entity (row) → many contact "pills" → per-fund commitments. The classic `contacts` table is a **read-only per-person directory**, auto-populated from the grid — create/edit people in the grid, not the Contacts page. Email capture rolls multiple people up to one investor. The legacy single-fund `lp_profiles` model is **retired** (empty table kept, per never-hard-delete). Reconciling grid ↔ classic `contacts` to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`; digest mailer: `CRM_DIGEST_SENDER` (DWD impersonation sender) + `SMTP_HOST`/`SMTP_PORT`/`SMTP_SECURITY`/`SMTP_FROM`/`SMTP_USERNAME`/`SMTP_PASSWORD` (SMTP fallback); daily digest (Phase B): `CRM_DIGEST_ENABLED` + `CRM_DIGEST_SEND_HOUR` **only seed the first-boot default** — the live control is the DB policy (`app_settings.digest_policy`, set in Settings → Admin).
- **Config placement:** operational/feature toggles live in the **admin panel**, DB-backed via `app_settings` (read-merge through a `load_*_policy(conn)` helper shared by the API + any scheduler; precedence DB-row → env-seed → default), so they're discoverable and take effect live. Reserve StartOS actions / env for **secrets and deploy-time config** (SMTP creds, API keys, DWD sender). Precedent: `digest_policy` (`GET/PATCH /api/admin/digest/policy`), `fundraising_backup_policy`.
- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
## Always
- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`, never production.
- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` first.
- **Get explicit user authorization before any production deploy/install** to `$START9_BOX_HOST`.
## Never
- **Never treat Qdrant (or any derived index) as source of truth** — the CRM/SQLite is canonical and rebuildable-from.
- **Never hard-delete** CRM records or thesis history — soft-delete/archive only.
- **Never let an agent send email, post, or contact an LP autonomously** — agents draft; a human approves and sends.
- **Never set a `thesis_version` canonical from code/seeds** — that is human dual sign-off.
- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).
- **Never commit secrets, `data/crm.db`, `.env`, or `data/backups/`** (all gitignored). Scan staged files before committing. (`.claude/` *is* tracked — `launch.json` and `rules/` symlinks ship with the repo; keep local-only settings in `.claude/settings.local.json`.)
- **Never bulk-export the LP list** to any third party; send only minimal non-sensitive context to Claude.
- **Never assume FastAPI / SQLAlchemy / pytest** are in play — they sit in `requirements.txt` unused; runtime is stdlib + SQLite.
- **Never add a `Co-Authored-By` / "Generated with" trailer** to commits or PRs — commits are the user's.
## Deeper docs
- Full constitution + guardrails: `docs/ten31-constitution.md`
- Architecture & rationale: `docs/Ten31_Agentic_Build_Plan.md`
- Retrieval/embeddings contract: `docs/EMBEDDINGS.md`
- CRM schema/API tour: `docs/crm-overview.md`
- Current thesis handoff: `docs/thesis-handoff.md`
- Operations & runbooks: `docs/OPERATIONS.md`, `docs/go-live-runbook.md`, `docs/gmail-enablement-runbook.md`
## Current state
_Phase 0 substrate + Phase 1 thesis/outreach are built; **repo at v0.1.0:83, box at v0.1.0:82** (v83 built + verified locally, **not yet deployed** — needs authorization). v83 (latest): **email search/query + windowed digest preview** — Communications tab gains a fixed/typed investor dropdown, a date-range filter, a full-body view, and a semantic "Search content" mode; the Daily Digest gains an in-app windowed preview before send. Prior v82: front-end libs vendored + SRI-pinned + jsdom render-smoke build gate. **Decision (2026-06-16): the fundraising grid + email capture is the canonical system of record** — vestigial classic-CRM surfaces get pruned or repurposed (see `ROADMAP.md` → "Consolidate on the fundraising grid as canonical"). Longer-term backlog: `ROADMAP.md`._
- **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Built + verified locally, NOT yet deployed: v0.1.0:83** — **email search/query + windowed digest preview** (code-only, migrations no-op). Communications tab (`CommunicationsPage` + `email_integration/db.query_email_activity`): **fixed the investor dropdown** — the facet now mirrors the list with the digest's precedence (grid → org → contact → address) and **typed keys** (`fund:`/`org:`/`contact:`), so email matched only to a classic contact or org domain (no grid id — the common case, since `fundraising_contacts.email` is sparsely populated) now resolves to a real name and is selectable, instead of the dropdown being empty; added a **date-range filter** (`since`/`until`), and a **click-to-expand full-body view** (`GET /api/email/detail?id=``query_email_detail`, admin, soft-delete-gated, renders `body_text` escaped — never raw HTML). New **semantic content search**: a "Search content" toggle → `GET /api/email/search?q=` (`routes._h_search`) wrapping `ingest/search.py:hybrid_search` filtered to `doc_type='email'` (lazy import; **503** if Spark/Qdrant unreachable), **hydrated + soft-delete-filtered against SQLite** (`db.search_hit_emails` — never trust the derived index). **Daily Digest:** Settings → Admin now builds a digest over a chosen window (last 24h or since a date) as an **in-app preview** before sending (`POST /api/admin/digest/preview`); manual send uses the same window (`send-now` + `digest_scheduler.send_digest_window`); window resolved by `digest_builder.resolve_digest_window` (cap 92d). Both run the **real local-Spark summarizer** and **never touch the daily cursor**. Verified: 22/22 backend tests, `py_compile` clean, render-smoke pass. Spark summarization path confirmed reachable (synthetic probe). Detail: `docs/guides/email.md`.
- **Deployed & verified live: v0.1.0:82** (box `$START9_BOX_HOST`/immense-voyage.local; `installed-version``0.1.0:82`, migration chain `…81→82` clean, server up on `:8080`, schedulers + Gmail integration up). **v82 vendored React 18.3.1 / ReactDOM 18.3.1 / @babel/standalone 7.29.7 into `frontend/assets/vendor/`**, served same-origin with `sha384` SRI (no CDN, no outbound-internet dependency to render the UI), and added **`start9/0.4/render-smoke.mjs`** — a jsdom check (shipped-Babel transform asserts classic/non-module + parseable; real mount asserts the login UI renders) wired into the default `make` goal (`verified-build`), so every build is gated on the frontend actually rendering. Closes the v78 (blank screen) + v79 (Babel-8 ESM-import) class structurally. Detail: `docs/guides/packaging.md`. **Prior shipped & live:** v81 Communications-tab matched-only (`query_email_activity` gates on `EXISTS(email_investor_links)`; unmatched email captured but never shown; `docs/guides/email.md`); v80 admin-only email-activity panel (`GET /api/email/activity`); v78 retired `lp_profiles`/LP Tracker + repointed Dashboard "Total Committed" onto the grid (graveyard-excluded). **Digest fully live:** capture (DWD) → propose→approve; Gmail-DWD→SMTP transport; daily Phase-B digest (`digest_builder.py` + always-on `digest_scheduler.py` reading a DB policy + `send-now`); **auto-send defaults OFF** until Grant enables it in Settings → Admin. Detail: `docs/guides/email.md`.
- **Live since v74 (2026-06-13):** login works; `/assets/` traversal 404s (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). Security/privacy hardening (path-traversal close, outreach NER backstop, get-by-id soft-delete) shipped in v74 — detail in `EVALUATION.md`.
- **Tests (2026-06-16):** **22/22 backend tests green** via `python3 backend/run_tests.py`, `py_compile` clean. `test_email_activity_panel.py` now covers the **typed facet + org/contact resolution** (the dropdown fix), the **date-range filter**, the **detail view** (full body / recipients / attachments / soft-delete), and the **content-search route** (hydrate / drop-tombstoned / 503 / admin) with retrieval stubbed; `test_digest_builder.py` adds the **window resolver** + **`send_digest_window`** (no-cursor-touch) cases. Frontend **render smoke check** (`cd start9/0.4 && make render-smoke`) still gates the default `make` build. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
- **Decided, not yet built (detail in `ROADMAP.md`):** Pipeline adoption + a grid flag that auto-loads flagged investors as opportunities; **NL→safe-query** feature (search item 3 — the larger, separate build); CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (currently reply to the LP only). *(Done this session, v83: email search item 1 [activity query/panel gaps — typed facet fix + date range + full-body view] and item 2 [semantic content search] both shipped; daily-digest windowed preview→send.)*
- **Known debt (P2, not deploy-blocking):** **reports-subsystem soft-delete sweep**`handle_pipeline_report` + remaining report/aggregate queries over opportunities/communications still count soft-deleted rows (v78 shrank this surface: the `lp_profiles`/lp-breakdown aggregates are gone and the dashboard "Total Committed" is now grid-sourced); needs a pass + report-endpoint tests. Also `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`. *(Resolved v82: front-end CDN/SRI risk — libs vendored + SRI-pinned — and the render smoke check is now scripted into the build.)*
- **Doc drift to reconcile:** `crm-overview.md` + `EVALUATION.md` still describe `lp_profiles` as a live model in places — a doc-auditor pass should align them to "grid canonical, `lp_profiles` retired."
- **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
- **Next:** 1) **deploy v0.1.0:83** to the box (needs authorization) — then Grant validates the **digest windowed preview** (Settings→Admin → "Since" a ~30-day-ago date → **Preview** to see the real Spark narratives, then **Send**) and the **Communications** dropdown/date/search; 2) add an **auth regression test** asserting the 3 v79-gated GET endpoints (`/api/users`, `/api/email/status`, `/api/email/accounts`) reject members; 3) **reports-subsystem soft-delete sweep** + report-endpoint tests; 4) **Pipeline adoption** — grid flag → auto-load opportunities; 5) `?limit=abc` crash; 6) **NL→safe-query** (search item 3 — separate, larger); 7) Grant + Jonathan freeze v2.0 canonical; 8) build reply-all. *(Logged to ROADMAP: a build step that pre-compiles JSX to drop runtime Babel entirely — bigger, contradicts the "no build step" convention.)*