Keysat 7f9a15ebf3 Adopt the Pipeline: grid-driven opportunities link (v0.1.0:87)
The fundraising grid (canonical) now drives the classic opportunities
Pipeline board, instead of the board being a disconnected second data-entry
surface. An "Add to Pipeline" row action creates a durably-linked opportunity
via the new opportunities.fundraising_investor_id (migration 0005, additive +
reversible), reusing the grid's already-synced contact — retiring the
POST /api/contacts side-door — and mapping the grid lead to the opp owner.

Ownership is split so the two stay reconciled: the grid owns whether the link
exists and the seed; the board owns stage/probability/owner. The link endpoint
is idempotent (one live opp per investor; a re-link never reseeds funnel
fields). "Is in pipeline?"/"what stage?" are derived from a live opp join and
injected as read-only grid columns on read, stripped on write, so they never
persist or dirty the autosave. Remove-from-pipeline soft-deletes the opp and
leaves the grid row fully intact; deleting an investor from the grid archives
its orphaned opp.

Also fixes the standing soft-delete leak in handle_pipeline_report and the
dashboard pipeline aggregates, which counted tombstoned opportunities.
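The leak is easiest to see side by side. This sketch assumes a minimal opportunities table with hypothetical stage and amount columns. It is not the actual handle_pipeline_report query.

```python
import sqlite3

# Leaky: tombstoned (soft-deleted) opportunities still feed COUNT and SUM.
LEAKY_SQL = (
    "SELECT stage, COUNT(*), COALESCE(SUM(amount), 0) "
    "FROM opportunities GROUP BY stage ORDER BY stage"
)

# Fixed: filter before aggregating so archived rows never contribute.
FIXED_SQL = (
    "SELECT stage, COUNT(*), COALESCE(SUM(amount), 0) "
    "FROM opportunities WHERE deleted_at IS NULL "
    "GROUP BY stage ORDER BY stage"
)


def stage_totals(conn, sql=FIXED_SQL):
    """Return [(stage, count, total_amount), ...] for a pipeline report."""
    return conn.execute(sql).fetchall()
```

The same WHERE deleted_at IS NULL also has to appear inside each correlated sub-select. A filter on the outer parent row does not reach them.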

Tests: backend/test_grid_pipeline_link.py (link/idempotent/round-trip/guards/
unlink-intact/re-link/orphan/aggregates); 28/28 suite green, render-smoke green.
2026-06-17 23:08:36 -05:00


Ten31 Venture CRM + Agentic System — AGENTS.md

The foundation is a self-hosted venture-fund CRM, a purpose-built fundraising tool that replaced Airtable to (1) keep sensitive LP/prospect data off third-party servers, (2) drop subscription cost, and (3) fit the fund's workflow: managing ~150 existing LPs, tracking 250+ prospects, and running the capital-raise pipeline. Core CRM domain: contacts (investor/prospect/advisor), organizations, opportunities (the deal pipeline), and communications; investor commitments live in the canonical fundraising_* grid (the legacy single-fund lp_profiles table was retired in v0.1.0:78). The fund (Ten31, ~$200M AUM, bitcoin/energy/AI thesis) runs it on a Start9 box; its team of ~5 reaches it over ClearNet (StartOS StartTunnel) behind app-level user auth (Tailscale is not in use). Schema/API tour: docs/crm-overview.md.

The agentic system is new functionality built on top of that CRM — an in-house AI layer to widen the fundraising funnel, sharpen the thesis, and automate outreach drafting. Frontier reasoning runs on Claude (Agent SDK/API); privacy-sensitive and bulk work runs on local DGX Spark models via the Spark Control gateway. Phase 0/1 — no live outward-facing agents; agents draft, humans send.

Inbox check: At session start, if ~/Projects/standards/INBOX.md exists, scan it for items tagged (CRM) and surface them before proposing next steps; triage with /triage.
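A sketch of the scan step, assuming INBOX.md keeps one item per line and marks CRM items with the literal text (CRM). The helper name crm_inbox_items is hypothetical.

```python
from pathlib import Path


def crm_inbox_items(path="~/Projects/standards/INBOX.md", tag="(CRM)"):
    """Return INBOX.md lines tagged for this repo; [] if the inbox is absent."""
    inbox = Path(path).expanduser()
    if not inbox.exists():
        return []
    lines = inbox.read_text(encoding="utf-8").splitlines()
    # Assumes one item per line with the tag written inline.
    return [line.strip() for line in lines if tag in line]
```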

Stack (versions that matter)

  • Python 3.11, standard library only at runtime. The CRM is one monolith, backend/server.py (~5k lines): a stdlib http.server.ThreadingHTTPServer + hand-written CRMHandler with manual path dispatch (do_GET/do_POST). Not FastAPI. backend/requirements.txt lists FastAPI/SQLAlchemy/Alembic/Pydantic/pytest-style deps but none are imported at runtime (vestigial).
  • SQLite at data/crm.db (WAL, foreign_keys=ON), opened per-request via get_db(). Schema via ordered migrations.
  • Frontend: single frontend/index.html, inline-Babel React. No build step.
  • Optional runtime deps, used only if present: bcrypt, PyJWT (jwt), cryptography (Gmail module).
  • MCP + ingest (in the Docker image, not the bare CRM): mcp==1.2.0 (FastMCP, backend/mcp/server.py), fastembed==0.4.2, anthropic, cryptography==42.0.5.
  • Packaging: StartOS 0.4, TypeScript SDK (@start9labs/start-sdk) under start9/0.4/startos/. Live target is start9/0.4/.
  • Local models (bge-m3 embeddings, bge-reranker-v2-m3, /api/search, Qdrant): always via Spark Control. Contract: docs/EMBEDDINGS.md.
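A minimal sketch of the per-request connection described above, using only the stdlib sqlite3 module. The name mirrors get_db(), but the body is an assumption, as is reading the path from CRM_DB_PATH with a data/crm.db fallback. SQLite's foreign_keys setting is per connection, so it must be switched on for every new connection. journal_mode=WAL, by contrast, persists in the database file once set.

```python
import os
import sqlite3


def get_db(path=None):
    """Open a fresh connection for one request (hypothetical body)."""
    # Assumption: CRM_DB_PATH points at the DB; fall back to the repo default.
    path = path or os.environ.get("CRM_DB_PATH", "data/crm.db")
    conn = sqlite3.connect(path, timeout=30)
    conn.row_factory = sqlite3.Row           # rows addressable by column name
    conn.execute("PRAGMA journal_mode=WAL")  # persistent; readers don't block the writer
    conn.execute("PRAGMA foreign_keys=ON")   # per-connection; must be set every time
    return conn
```

Note that sqlite3.Connection used as a context manager only commits or rolls back; it does not close the connection, so a request handler still needs an explicit close().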

Commands

# Run locally (dev, port 8080; or ./start.sh <port>) — runs python3 backend/server.py
./start.sh
# Run prod-mode (beta) — requires CRM_SECRET_KEY
./start_beta.sh
# Sanity-check edits (there is no compiler/build for the CRM)
python3 -m py_compile backend/server.py
# Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
python3 backend/redaction/test_scrub_leak.py        # substitute any backend/**/test_*.py
# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
python3 backend/run_tests.py                         # add substrings to filter, e.g. `... soft_delete redaction`
# Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
cd start9/0.4 && make
  • Migrations apply automatically at startup (backend/core_migrations.py, schema_migrations ledger). See docs/guides/migrations.md before adding one.
  • Lint: none configured.
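For orientation, here is a sketch of an aggregate runner that behaves as described: each backend/**/test_*.py runs in its own subprocess, and optional substrings narrow the set. This is a hypothetical reimplementation, not the contents of backend/run_tests.py. Matching filters against the path relative to the root, and treating several filters as OR, are both assumptions.

```python
import subprocess
import sys
from pathlib import Path


def discover(root, filters=()):
    """Find test_*.py under root, keeping only paths that contain any filter."""
    root = Path(root)
    tests = sorted(root.rglob("test_*.py"))
    if filters:
        tests = [t for t in tests if any(f in str(t.relative_to(root)) for f in filters)]
    return tests


def run_all(root="backend", filters=()):
    """Run each standalone test script in its own interpreter; return {path: passed}."""
    results = {}
    for test in discover(root, filters):
        proc = subprocess.run([sys.executable, str(test)], capture_output=True, text=True)
        results[str(test)] = proc.returncode == 0  # non-zero exit means failure
    print(f"{sum(results.values())}/{len(results)} green")
    return results
```

Called as run_all("backend", ["soft_delete", "redaction"]), it would run only the scripts whose relative paths contain either substring. Running each script in a separate process keeps one test's module-level state from leaking into the next.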

Directory layout (day-one)

  • backend/server.py — the CRM monolith: HTTP handler, route dispatch, init_db(), auth (username/password → HS256 JWT, roles admin/member).
  • backend/core_migrations.py + backend/migrations/NNNN_*.sql (+ paired .down.sql) — additive schema migrations, applied at startup.
  • backend/thesis_seed.py — Thesis Workshop seed + idempotent ensure_* one-time seeders, wired in server.init_db().
  • backend/thesis_review.py — thesis version review/approval (human dual sign-off → canonical).
  • backend/mcp/architect_agent.py (Claude thesis copilot), architect_tools.py, outreach_agent.py (LP draft assistant), architect_grounding.py, crm_tools.py, server.py (FastMCP).
  • backend/email_integration/ — Gmail capture via domain-wide delegation + Tier-B draft creation (compose.py).
  • backend/redaction/scrub.py + client.py: the scrub→Claude→re-hydrate privacy boundary.
  • backend/ingest/ — chunk→embed→Qdrant + retrieval modes.
  • backend/entity_*.py — entity resolution/merge (the two-investor-model reconciliation).
  • backend/matrix_intake/ — Matrix intake bot (separate process; matrix-nio, isolated to this component): typed message → local-Qwen parse → in-thread approve → write via the CRM's own log-communication. See the matrix-intake guide.
  • frontend/index.html — the entire UI.
  • docs/ — architecture, phase plans, contracts, runbooks (see Deeper docs). docs/guides/ — scoped subsystem rules (see below).
  • start9/0.4/ — StartOS package (startos/utils.ts holds PACKAGE_VERSION).
  • data/crm.db — the live DB (gitignored). .env / .env.example — config (.env gitignored).
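A sketch of a startup pass over backend/migrations/NNNN_*.sql with a schema_migrations ledger. The ledger columns, the four-digit version prefix and the apply_migrations name are assumptions; the real logic lives in backend/core_migrations.py and is governed by docs/guides/migrations.md. The sketch also assumes each .sql file does not open or commit its own transaction, so the schema change and its ledger row can commit together.

```python
import sqlite3
from pathlib import Path


def apply_migrations(conn, migrations_dir):
    """Apply pending NNNN_*.sql files in order; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "version TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for path in sorted(Path(migrations_dir).glob("[0-9][0-9][0-9][0-9]_*.sql")):
        if path.name.endswith(".down.sql"):
            continue  # paired rollback scripts are never run at startup
        version = path.name[:4]  # four digits, guaranteed by the glob pattern
        if version in applied:
            continue
        # One transaction: a failure leaves neither the change nor the ledger row.
        script = (
            "BEGIN;\n"
            + path.read_text(encoding="utf-8")
            + f"\nINSERT INTO schema_migrations (version) VALUES ('{version}');\nCOMMIT;"
        )
        try:
            conn.executescript(script)
        except sqlite3.Error:
            conn.rollback()
            raise
        newly_applied.append(version)
    return newly_applied
```

Because the ledger is consulted on every boot, a second call is a no-op. That is what lets migrations run unconditionally at startup.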

Scoped guides

Subsystem rules live in docs/guides/ and lazy-load in Claude Code via .claude/rules/ symlinks (scoped by paths: frontmatter). Read the guide before editing that area:

  • Migrations or seeders (backend/migrations/, core_migrations.py, thesis_seed.py) → docs/guides/migrations.md
  • Thesis logic (backend/thesis_*.py, backend/mcp/architect_*.py) → docs/guides/thesis.md
  • Redaction or any MCP/Claude path (backend/redaction/, backend/mcp/) → docs/guides/redaction.md
  • Ingest / retrieval (backend/ingest/) → docs/guides/spark-ingest.md
  • Email capture / drafts + digest send (backend/email_integration/, backend/digest_mailer.py, backend/smtp_send.py) → docs/guides/email.md
  • Building or deploying the s9pk (start9/) → docs/guides/packaging.md
  • Matrix intake bot (backend/matrix_intake/) → docs/guides/matrix-intake.md
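The path-to-guide mapping above can be written as a small lookup, which helps when checking which guides a diff touches. This is only an illustration using stdlib fnmatch. Claude Code resolves the real scoping from each rule file's paths: frontmatter, and the GUIDE_RULES table and guides_for helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical table transcribed from the list above: (path globs, guide).
GUIDE_RULES = [
    (("backend/migrations/*", "backend/core_migrations.py", "backend/thesis_seed.py"),
     "docs/guides/migrations.md"),
    (("backend/thesis_*.py", "backend/mcp/architect_*.py"), "docs/guides/thesis.md"),
    (("backend/redaction/*", "backend/mcp/*"), "docs/guides/redaction.md"),
    (("backend/ingest/*",), "docs/guides/spark-ingest.md"),
    (("backend/email_integration/*", "backend/digest_mailer.py", "backend/smtp_send.py"),
     "docs/guides/email.md"),
    (("start9/*",), "docs/guides/packaging.md"),
    (("backend/matrix_intake/*",), "docs/guides/matrix-intake.md"),
]


def guides_for(path):
    """Return every guide whose globs match path (fnmatch '*' also crosses '/')."""
    return sorted({guide for globs, guide in GUIDE_RULES
                   for pattern in globs if fnmatchcase(path, pattern)})
```

The overlap is intentional: a file such as backend/mcp/architect_agent.py falls under both the thesis and the redaction rules, so both guides apply.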

Conventions

  • Investor model — the grid is canonical (since v0.1.0:78). The fundraising_* grid is the system of record: an investor entity (row) → many contact "pills" → per-fund commitments. The classic contacts table is a read-only per-person directory, auto-populated from the grid — create/edit people in the grid, not the Contacts page. Email capture rolls multiple people up to one investor. The legacy single-fund lp_profiles model is retired (empty table kept, per never-hard-delete). Reconciling grid ↔ classic contacts to canonical IDs is the core entity-resolution task — see docs/crm-overview.md.
  • Soft-delete only: deleted_at and/or status='retired'; never hard-delete. Every READ path must filter deleted_at IS NULL — list handlers, get-by-id, nested related-data sub-selects, and aggregate sub-selects (COUNT/SUM/MAX). Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view contact_count/total_funded/comm_count); the opportunities/pipeline aggregates were fixed in v0.1.0:87 (handle_pipeline_report + dashboard pipeline metrics now filter deleted_at), but the reports subsystem's communications-side aggregates (dashboard recent_comms/comms_this_month/meetings_this_month, activity report) still leak (see Current state). Regression-guarded by backend/test_soft_delete_reads.py. (Thesis has a subtlety here — see the thesis guide.)
  • Env: secrets in .env (gitignored); names in .env.example. Verified names: ANTHROPIC_API_KEY, SPARK_CONTROL_URL, SPARK_CONTROL_VERIFY_TLS, QDRANT_URL, X_API_KEY, CRM_DB_PATH, CRM_DEV_DB_PATH. Also used: CRM_SECRET_KEY (beta/prod), CRM_HOST/CRM_PORT, CRM_DATA_DIR; digest mailer: CRM_DIGEST_SENDER (DWD impersonation sender) + SMTP_HOST/SMTP_PORT/SMTP_SECURITY/SMTP_FROM/SMTP_USERNAME/SMTP_PASSWORD (SMTP fallback); daily digest (Phase B): CRM_DIGEST_ENABLED + CRM_DIGEST_SEND_HOUR only seed the first-boot default — the live control is the DB policy (app_settings.digest_policy, set in Settings → Admin).
  • Config placement: operational/feature toggles live in the admin panel, DB-backed via app_settings (read-merge through a load_*_policy(conn) helper shared by the API + any scheduler; precedence DB-row → env-seed → default), so they're discoverable and take effect live. Reserve StartOS actions / env for secrets and deploy-time config (SMTP creds, API keys, DWD sender). Precedent: digest_policy (GET/PATCH /api/admin/digest/policy), fundraising_backup_policy.
  • Commit style: imperative subject, concise body explaining the why; put the package version in the subject (… (v0.1.0:NN)) for shippable changes. No AI co-author / attribution trailers — commits are authored by the user.
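A sketch of the read-merge precedence for the digest policy (DB row, then env seed, then default). The app_settings (key, value) layout with a JSON value, the default values, and the enabled / send_hour keys are assumptions for illustration. The real load_*_policy helpers and their fields may differ.

```python
import json
import os
import sqlite3

# Hypothetical built-in defaults (lowest precedence).
DEFAULT_DIGEST_POLICY = {"enabled": False, "send_hour": 7}


def load_digest_policy(conn):
    """Resolve the policy with precedence DB row -> env seed -> default."""
    policy = dict(DEFAULT_DIGEST_POLICY)
    # Env vars only seed values; any DB-stored key overrides them below.
    if "CRM_DIGEST_ENABLED" in os.environ:
        policy["enabled"] = os.environ["CRM_DIGEST_ENABLED"].strip().lower() in ("1", "true", "yes")
    if "CRM_DIGEST_SEND_HOUR" in os.environ:
        policy["send_hour"] = int(os.environ["CRM_DIGEST_SEND_HOUR"])
    row = conn.execute(
        "SELECT value FROM app_settings WHERE key = ?", ("digest_policy",)
    ).fetchone()
    if row is not None:
        policy.update(json.loads(row[0]))  # DB row wins, key by key
    return policy
```

Because both the admin API and a scheduler would call the same helper, a PATCH that rewrites the DB row takes effect on the next read without a restart.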

Always

  • Verify before shipping: python3 -m py_compile the edited files; for DB logic, run the change against a copy of data/crm.db, never production.
  • Keep real LP data out of Claude: develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through backend/redaction first.
  • Get explicit user authorization before any production deploy/install to $START9_BOX_HOST.
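One way to follow the copy-of-production rule with the stdlib is to take a consistent snapshot through sqlite3's online backup API (Connection.backup) and run the change only against that snapshot. Unlike copying crm.db alone, the backup includes committed pages that still sit in the WAL file. The helper names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_db(src_path, dest_dir):
    """Copy a live SQLite DB into dest_dir via the online backup API."""
    if not Path(src_path).exists():
        raise FileNotFoundError(src_path)  # sqlite3.connect would silently create an empty DB
    dest = Path(dest_dir) / "crm-copy.db"
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return dest


def try_change_on_copy(src_path, change):
    """Run change(conn) against a throwaway copy; the change never touches the source."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(snapshot_db(src_path, tmp))
        try:
            return change(conn)
        finally:
            conn.close()
```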

Never

  • Never treat Qdrant (or any derived index) as source of truth — the CRM/SQLite is canonical and rebuildable-from.
  • Never hard-delete CRM records or thesis history — soft-delete/archive only.
  • Never let an agent send email, post, or contact an LP autonomously — agents draft; a human approves and sends.
  • Never set a thesis_version canonical from code/seeds — that is human dual sign-off.
  • Never call a Spark directly — go through Spark Control (SPARK_CONTROL_URL).
  • Never commit secrets, data/crm.db, .env, or data/backups/ (all gitignored). Scan staged files before committing. (.claude/ is tracked — launch.json and rules/ symlinks ship with the repo; keep local-only settings in .claude/settings.local.json.)
  • Never bulk-export the LP list to any third party; send only minimal non-sensitive context to Claude.
  • Never assume FastAPI / SQLAlchemy / pytest are in play — they sit in requirements.txt unused; runtime is stdlib + SQLite.
  • Never add a Co-Authored-By / "Generated with" trailer to commits or PRs — commits are the user's.
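A sketch of the path half of the pre-commit scan, assuming git is on PATH. It lists staged files with git diff --cached --name-only and flags the forbidden paths named above. The pattern list and helper names are assumptions, and the sketch does not inspect file contents for secrets.

```python
import subprocess
from fnmatch import fnmatchcase

# Paths that must never be committed; "crm.db*" also covers the -wal/-shm
# sidecars. .env.example is deliberately not matched.
FORBIDDEN_PATTERNS = (".env", "data/crm.db*", "data/backups/*")


def staged_paths():
    """Return the paths currently staged in the git index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def forbidden_staged(paths):
    """Return the subset of paths that match a forbidden pattern."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in FORBIDDEN_PATTERNS)]
```

A pre-commit hook could call forbidden_staged(staged_paths()) and abort when the result is non-empty.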

Deeper docs

  • Full constitution + guardrails: docs/ten31-constitution.md
  • Architecture & rationale: docs/Ten31_Agentic_Build_Plan.md
  • Retrieval/embeddings contract: docs/EMBEDDINGS.md
  • CRM schema/API tour: docs/crm-overview.md
  • Current thesis handoff: docs/thesis-handoff.md
  • Operations & runbooks: docs/OPERATIONS.md, docs/go-live-runbook.md, docs/gmail-enablement-runbook.md

Current state

Phase 0 + Phase 1 built; box live on v0.1.0:86; repo at v0.1.0:87 (built + locally verified, s9pk build + install pending). The fundraising grid + email capture is the canonical system of record (2026-06-16) — vestigial classic-CRM surfaces get pruned/repurposed. Deploy/feature history lives in git log + start9/0.4/startos/versions/; longer-term backlog + debt in ROADMAP.md / EVALUATION.md.

  • Adopt the Pipeline — grid drives the deal board — BUILT + locally verified 2026-06-17 (v0.1.0:87); s9pk build + box install pending. An "Add to Pipeline" row action on the fundraising grid opens a seed modal (primary contact / target fund / expected amount / stage / probability) and creates a durably-linked opportunities row via the new opportunities.fundraising_investor_id (migration 0005, additive + reversible). Grid owns the link + seed; the board owns stage/probability/owner — a grid save never reseeds a live opp (POST /api/fundraising/pipeline/link is idempotent, one live opp/investor). Contact is reused from the grid's synced fundraising_contacts.contact_id (the POST /api/contacts side-door is gone); grid lead→owner. Two read-only grid columns (Pipeline action + Pipeline Stage) are injected on read from the live opp and stripped on write (never persisted, never dirty the autosave). Remove from pipeline (POST .../unlink) soft-deletes the opp; the grid row stays fully intact; deleting an investor from the grid archives its orphaned opp (reconcile_grid_pipeline_links, after sync_fundraising_relational). Folded in: the standing P2 soft-delete leak in handle_pipeline_report + dashboard pipeline aggregates (archived opps no longer counted). Tests: backend/test_grid_pipeline_link.py; 28/28 suite green, render-smoke green; migration verified on a copy of data/crm.db. Next: build the s9pk (v87) and install to the box (needs authorization). Detail + locked decisions in ROADMAP.md "Adopt the Pipeline".

  • Matrix intake bot — DEPLOYED & LIVE (2026-06-17), backend/matrix_intake/: a separate-process bot (its matrix-nio dep isolated from the stdlib CRM) turning a typed Matrix-room message into a proposed fundraising-grid add/edit, written only after in-thread human approval (yes/edit field=value/no). Parse = local Qwen via Spark Control (no Claude/scrub, like the digest); writes reuse the CRM's own POST /api/fundraising/log-communication tagged source="matrix_intake"; new-vs-existing via read-only GET /api/intake/match (returns the grid row id → no duplicate). Runs on the Spark as a docker-compose service (modelo32, container matrix-intake, restart: unless-stopped → survives a reboot; docker-compose.yml at the repo root + backend/matrix_intake/Dockerfile bundling backend/matrix_intake + the stdlib backend/ingest Spark client; retired the old nohup launch, 2026-06-17). A spark-control dashboard card is still pending (handoff: docs/handoffs/add-intake-bot-to-spark-control.md). Live-smoked end-to-end (new-investor create + existing-investor note matched & appended, no dup). Server side shipped to the box as v0.1.0:84 (/api/intake/match + source provenance — these were missing on v83, so the bot 404'd until v84); then UX adds: main-timeline nudge pointer, top-level-yes→thread redirect, clearer commit wording, note text in the grid line (v85 dropped the [note] tag). M3 (business-card photo) deferred (no Spark vision model). Guide: docs/guides/matrix-intake.md.

  • Matrix intake — fuzzy-match + conversational-edit pass — DEPLOYED & LIVE 2026-06-17 (box on v0.1.0:86, bot restarted on the Spark; candidates endpoint verified live); revise leg live-smoked 2026-06-17, fuzzy disambiguation grammar still un-smoked. Closes the two locked post-deploy enhancements (ROADMAP). (a) Fuzzy matching (server-side, ships in the s9pk): find_intake_candidates in server.py (deterministic — stdlib difflib name similarity + token-set Jaccard, legal-suffix-aware via _strip_legal_suffix, + email Levenshtein ≤ 2; ranked, ≥0.62, top 5); GET /api/intake/match now returns {match, candidates}. The bot surfaces a numbered shortlist (_stage="disambiguate") so a near-duplicate ("Charlie"/"Charles", "Acme Capital"/"Acme Capital LLC", a one-char email typo) is confirmed by a human instead of silently creating a second investor — never auto-attached. The optional LLM-judge re-rank was deferred (deterministic filter already surfaces the cases; LLM is the right shortlist pruner if noise proves real). (b) Conversational edits (bot-side, ships on the Spark): any in-thread reply that isn't yes/no/edit field=value goes to parse.revise, which re-runs {proposal + instruction} through local Qwen and re-renders the card; email integrity preserved (a changed address must literally appear in the instruction; the model's email field is never trusted); no-op revisions re-prompt (same_fields). Deploy is split: the candidates need an s9pk build+install (v86); the bot's disambiguation+revise need a Spark git pull + restart — a bot restart alone won't deliver candidates (box returns [], bot safely proposes new). Tests green; the Qwen revise leg is now live-smoked (2026-06-17, with the roster fix below); the fuzzy disambiguation numbered-pick grammar is the one in-room path still un-smoked. Guide updated.

  • Matrix intake — team-roster parse frame — DEPLOYED & LIVE 2026-06-17 (bot at c1ea176 on the Spark; live-smoked). Fixes the live-smoke gripe where "jonathan is chatting with wyoming" extracted the teammate, not the prospect. parse.build_system(roster) appends an outreach frame when INTAKE_TEAM_ROSTER (.env, comma-separated names/initials, case-insensitive — 11 entries live) is set: roster names are the people doing outreach and are never extracted as investor/contact — the other party is the prospect. Same framing on revise; roster read once at startup (settings.team_roster(), logs team roster loaded (N names)), so a roster change needs a bot restart. Bot-side only — no s9pk bump (box stays v0.1.0:86); shipped via Spark git pull + docker compose up -d --build + the new .env var. Roster unset → prior behavior. Tests: +3 in test_parse.py. Guide updated.

  • Working (all draft-only): CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.

  • Deploy history (done — git log + start9/0.4/startos/versions/ + guides): v74 security/path-traversal hardening; v78 retired lp_profiles/LP Tracker (grid is canonical); v80-83 email-activity Communications tab (typed investor facet, date filter, full-body view, semantic content search) + daily-digest windowed preview→send (docs/guides/email.md); v82 vendored + SRI-pinned front-end libs + jsdom render-smoke build gate (docs/guides/packaging.md).

  • Tests: 28/28 backend green (python3 backend/run_tests.py), py_compile clean; frontend render-smoke gates the default make build.

  • Debt (P2, not deploy-blocking; full list EVALUATION.md): reports-subsystem soft-delete sweep — pipeline/opportunities aggregates fixed v87; remaining: the dashboard communications aggregates (recent_comms/comms_this_month/meetings_this_month) + activity report + report-endpoint tests; ?limit=abc crashes the request thread; auth regression test for the 3 v79-gated GETs (/api/users, /api/email/status, /api/email/accounts); scrub-gateway TLS verify off; hardcoded Spark/Qdrant IPs + oversized StartOS package icon (fix before the next s9pk upload); the 5.4k-line server.py monolith.

  • Open / risks: the v2.0 reserve-asset spine is the working approved spine but not a canonical thesis_version (needs Grant + Jonathan dual sign-off; Appendix-A conviction incl. ~40% Strike stays Grant's working read, not fed to the engine); Claude/Architect path still unverified live on the box; the intake matcher reads only the grid blob (not classic contacts); doc drift — crm-overview.md + EVALUATION.md still call lp_profiles live (doc-auditor pass).

  • Next: 1) Pipeline adoption — BUILT (v0.1.0:87, above); ship it — build the s9pk + install to the box (needs authorization), then live-smoke the "Add to Pipeline"/board round-trip; 2) spark-control intake dashboard card (separate session in the spark-control repo — handoff at docs/handoffs/add-intake-bot-to-spark-control.md), and longer-term extract the bot to its own repo (ROADMAP); 3) in-room smoke of the intake disambiguation numbered-pick grammar (the one unexercised path) — and a roster-tuning pass if any teammate name/initial still slips through; 4) NL→safe-query (search item 3 — separate, larger build); 5) Grant + Jonathan freeze v2.0 canonical; 6) reply-all for Tier-B drafts; then clear the P2 debt above.
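A sketch of the deterministic matcher described in the fuzzy-match bullet: stdlib difflib similarity plus token-set Jaccard on suffix-stripped names, an email Levenshtein check (distance at most 2), a 0.62 cutoff and a top-5 limit. The score combination (the maximum), the legal-suffix list and the function names are assumptions. This is not the body of find_intake_candidates or _strip_legal_suffix.

```python
from difflib import SequenceMatcher

# Assumed suffix list; the real _strip_legal_suffix may cover more forms.
LEGAL_SUFFIXES = {"llc", "lp", "llp", "inc", "ltd", "corp", "co"}


def strip_legal_suffix(name):
    """Lowercase, tokenize, and drop trailing legal-entity suffixes."""
    tokens = [t.strip(".,") for t in name.lower().split()]
    tokens = [t for t in tokens if t]
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return tokens


def levenshtein(a, b):
    """Classic edit distance, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def name_score(a, b):
    """Best of character similarity and token-set Jaccard (assumed combination)."""
    ta, tb = strip_legal_suffix(a), strip_legal_suffix(b)
    if not ta or not tb:
        return 0.0
    ratio = SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    jaccard = len(set(ta) & set(tb)) / len(set(ta) | set(tb))
    return max(ratio, jaccard)


def find_candidates(name, email, rows, threshold=0.62, limit=5):
    """Rank grid rows that look like the same investor; a human picks, never auto-attach."""
    scored = []
    for row in rows:
        score = name_score(name, row["name"])
        row_email = row.get("email") or ""
        if email and row_email and levenshtein(email.lower(), row_email.lower()) <= 2:
            score = 1.0  # near-identical email treated as a strong signal (assumption)
        if score >= threshold:
            scored.append((score, row["id"]))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]
```

With rows for "Charles", "Acme Capital LLC" and "Zed Holdings", a query for "Charlie" shortlists only Charles, and "Acme Capital" hits the LLC row exactly once the suffix is stripped. These are the same near-duplicates the bullet says should reach a human rather than create a second investor.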