From ef869be082f2fd9741c4cd1063c673547e173102 Mon Sep 17 00:00:00 2001
From: Keysat
Date: Fri, 12 Jun 2026 16:23:10 -0500
Subject: [PATCH] docs: add AGENTS.md as canonical agent guide; symlink CLAUDE.md

Add a concise day-one AGENTS.md (stack, exact build/run/test/deploy commands, directory layout, conventions, Always/Never). Preserve the existing CLAUDE.md project constitution as docs/ten31-constitution.md (referenced from AGENTS.md) and point CLAUDE.md -> AGENTS.md so Claude Code loads the canonical guide.
---
 AGENTS.md                  | 90 ++++++++++++++++++++++++++++++++++++++
 CLAUDE.md                  | 90 +-------------------------------------
 docs/ten31-constitution.md | 89 +++++++++++++++++++++++++++++++++++++
 3 files changed, 180 insertions(+), 89 deletions(-)
 create mode 100644 AGENTS.md
 mode change 100644 => 120000 CLAUDE.md
 create mode 100644 docs/ten31-constitution.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..c33b6d4
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,90 @@
+# Ten31 Agentic System — AGENTS.md
+
+In-house AI-agent system over a self-hosted Start9 CRM (SQLite) for a bitcoin/energy/AI investment fund: widen the fundraising funnel, sharpen the thesis, automate outreach. Frontier reasoning runs on Claude (Agent SDK/API); privacy-sensitive and bulk work runs on local DGX Spark models via the **Spark Control** gateway. **Phase 0/1 — no live outward-facing agents; agents draft, humans send.**
+
+## Stack (versions that matter)
+
+- **Python 3.11, standard library only at runtime.** The CRM is one monolith, `backend/server.py` (~5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch (`do_GET`/`do_POST`). **Not FastAPI.** `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic/pytest-style deps but **none are imported at runtime** (vestigial).
+- **SQLite** at `data/crm.db` (WAL, `foreign_keys=ON`), opened per-request via `get_db()`. Schema via ordered migrations.
+- **Frontend:** single `frontend/index.html`, inline-Babel React. **No build step.**
+- Optional runtime deps, used only if present: `bcrypt`, `PyJWT` (`jwt`), `cryptography` (Gmail module).
+- **MCP + ingest** (in the Docker image, not the bare CRM): `mcp==1.2.0` (FastMCP, `backend/mcp/server.py`), `fastembed==0.4.2`, `anthropic`, `cryptography==42.0.5`.
+- **Packaging:** StartOS 0.4, TypeScript SDK (`@start9labs/start-sdk`) under `start9/0.4/startos/`. Live target is `start9/0.4/`.
+- **Local models** (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`, Qdrant): always via Spark Control. Contract: `docs/EMBEDDINGS.md`.
+
+## Commands
+
+```bash
+# Run locally (dev, port 8080; or ./start.sh ) — runs python3 backend/server.py
+./start.sh
+# Run prod-mode (Tailscale/beta) — requires CRM_SECRET_KEY
+./start_beta.sh
+# Sanity-check edits (there is no compiler/build for the CRM)
+python3 -m py_compile backend/server.py
+# Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
+python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py (13 exist)
+# Run all tests (no aggregate runner exists)
+for t in $(find backend -name 'test_*.py'); do echo "== $t"; python3 "$t" || break; done
+# Build the s9pk (x86_64 only) -> ten-database_x86_64.s9pk — BUMP THE VERSION FIRST (see Always)
+cd start9/0.4 && make
+# Install to the box — PRODUCTION; get explicit user OK first. TODO: confirm exact host/context.
+start-cli package install -s ten-database_x86_64.s9pk # target: immense-voyage.local
+```
+
+- **Migrations** apply automatically at startup via `backend/core_migrations.py` from `backend/migrations/NNNN_*.sql`, tracked in a `schema_migrations` ledger. Verify a new one against a **copy** of `data/crm.db`, never production.
+- **Lint:** none configured.
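Reviewer sketch (not part of the patch): the Migrations bullet describes ordered `NNNN_*.sql` files applied at startup and tracked in a `schema_migrations` ledger. A minimal way to picture that behavior, and to rehearse a new migration against a throwaway database, might look like the following. This is not the code in `backend/core_migrations.py`; the function name `apply_migrations`, the ledger columns, and the sample file names are assumptions made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply NNNN_*.sql files in name order, skipping any already in the ledger."""
    # Hypothetical ledger shape; the real schema_migrations columns are not shown in the patch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    applied = []
    for path in sorted(migrations_dir.glob("[0-9][0-9][0-9][0-9]_*.sql")):
        # Paired .down.sql files are rollbacks and are never applied on the way up.
        if path.name.endswith(".down.sql") or path.name in done:
            continue
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (path.name,))
        conn.commit()
        applied.append(path.name)
    return applied


def demo() -> tuple[list[str], list[str]]:
    """Run the sketch twice against an in-memory DB to show the second pass is a no-op."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "0001_create_notes.sql").write_text(
            "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);"
        )
        (d / "0001_create_notes.down.sql").write_text("DROP TABLE notes;")
        (d / "0002_add_index.sql").write_text(
            "CREATE INDEX idx_notes_body ON notes(body);"
        )
        (d / "0002_add_index.down.sql").write_text("DROP INDEX idx_notes_body;")
        conn = sqlite3.connect(":memory:")
        first = apply_migrations(conn, d)
        second = apply_migrations(conn, d)
        conn.close()
        return first, second
```

Calling `demo()` applies both sample migrations on the first pass and nothing on the second, which is the property that makes a restart safe. When rehearsing a real migration, pointing `sqlite3.connect` at a copy of `data/crm.db` rather than `:memory:` would match the "verify against a copy" rule above.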
+
+## Directory layout (day-one)
+
+- `backend/server.py` — the CRM monolith: HTTP handler, route dispatch, `init_db()`, auth (username/password → HS256 JWT, roles admin/member).
+- `backend/core_migrations.py` + `backend/migrations/NNNN_*.sql` (+ paired `.down.sql`) — additive schema migrations, applied at startup.
+- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders (interaction_log sentinels), wired in `server.init_db()`.
+- `backend/thesis_review.py` — thesis version review/approval (human dual sign-off → canonical).
+- `backend/mcp/` — `architect_agent.py` (Claude thesis copilot), `architect_tools.py` (thesis CRUD/versions), `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP).
+- `backend/email_integration/` — Gmail capture via domain-wide delegation: `credentials.py`, `matcher.py`, `parser.py`, `db.py`, `sync.py`, `scheduler.py`, `routes.py`, `compose.py` (Tier-B draft creation), `migrations/`.
+- `backend/redaction/` — `scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary (`Boundary`, `SCRUB_BACKEND=local|gateway`, fail-closed).
+- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes (`search.py`, `embed.py`, `qdrant_io.py`, `sparse.py`, `entity_resolution.py`).
+- `backend/entity_*.py` — entity resolution/merge (the two-investor-model reconciliation).
+- `frontend/index.html` — the entire UI.
+- `docs/` — `Ten31_Agentic_Build_Plan.md` (architecture), `PHASE_0.md`/`PHASE_1.md`, `EMBEDDINGS.md` (retrieval contract), `crm-overview.md` (schema/API tour), `thesis-handoff.md`, `ten31-constitution.md` (full constitution + guardrails).
+- `start9/0.4/` — StartOS package: `startos/utils.ts` (`PACKAGE_VERSION`), `startos/versions/`, `Dockerfile`, `docker_entrypoint.sh`, `Makefile`, `s9pk.mk`.
+- `data/crm.db` — the live DB (gitignored). `.env` / `.env.example` — config (`.env` gitignored).
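Reviewer sketch (not part of the patch): the `backend/thesis_seed.py` entry mentions idempotent `ensure_*` seeders guarded by `interaction_log` sentinels. One way such a guard could work is sketched below. The `interaction_log` columns, the sentinel string, and the `thesis_nodes` table are all hypothetical, since the real schema does not appear in this patch.

```python
import sqlite3

SENTINEL = "seed:example_nodes:v1"  # hypothetical sentinel value


def ensure_example_nodes(conn: sqlite3.Connection) -> bool:
    """Insert seed rows once; return True only when this call did the work."""
    already = conn.execute(
        "SELECT 1 FROM interaction_log WHERE action = ? LIMIT 1", (SENTINEL,)
    ).fetchone()
    if already:
        return False
    # One transaction, so the seed rows and the sentinel commit (or roll back) together.
    with conn:
        conn.executemany(
            "INSERT INTO thesis_nodes (title, status) VALUES (?, 'draft')",
            [("Energy",), ("Bitcoin",)],
        )
        conn.execute("INSERT INTO interaction_log (action) VALUES (?)", (SENTINEL,))
    return True


def demo() -> tuple[bool, bool, int]:
    """Run the seeder twice on an in-memory DB, as two consecutive boots would."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interaction_log (id INTEGER PRIMARY KEY, action TEXT)")
    conn.execute(
        "CREATE TABLE thesis_nodes ("
        "id INTEGER PRIMARY KEY, title TEXT, status TEXT, deleted_at TEXT)"
    )
    first = ensure_example_nodes(conn)
    second = ensure_example_nodes(conn)
    count = conn.execute("SELECT COUNT(*) FROM thesis_nodes").fetchone()[0]
    conn.close()
    return first, second, count
```

The guard checks a structural marker (the sentinel row) rather than the seeded text itself, so editing the seeded rows later does not cause the seeder to fire again.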
+
+## Conventions
+
+- **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
+- **Migrations are additive + reversible only:** numbered `NNNN_*.sql` with a paired `NNNN_*.down.sql`. SQLite ALTER = add-column/rename only.
+- **One-time seeds/backfills are idempotent** via `interaction_log` sentinels (the `ensure_*` pattern), wired into `init_db` — safe to re-run on every boot.
+- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. `_node_tree` and `create_thesis_version` filter on `deleted_at IS NULL` and **ignore status** — so to drop a node from the live agent prompt AND version snapshots you must set `deleted_at`, not just status.
+- **Thesis canonical gate:** node status is `draft|candidate|approved|retired` (the working tree); a canonical `thesis_version` is frozen ONLY by human **dual** sign-off (`thesis_review`). Code/seeds never set a version canonical.
+- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT` (`start.sh`), `CRM_DATA_DIR`.
+- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user. (Older history carries a `Co-Authored-By: Claude` trailer; dropped going forward.)
+
+## Always
+
+- **Bump the version before building an s9pk:** edit `PACKAGE_VERSION` in `start9/0.4/startos/utils.ts`, add `start9/0.4/startos/versions/v0.1.0.NN.ts`, and register it in `versions/index.ts` (import, set `current`, move prior `current` into `other[]`). Start9 0.4.x ignores a same-version rebuild.
+- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`.
+- **Make migrations/seeders deployment-state-invariant and idempotent:** target rows **structurally**, not by transient text the same change mutates; capture prior state so a revert is exact. (Learned the hard way: matching old nodes by a body string the same changeset deleted broke fresh DBs.)
+- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` before it reaches a Claude model.
+- **Get explicit user authorization before any production deploy/install** to `immense-voyage.local`.
+- **Ship a paired `.down.sql`** with every new migration.
+
+## Never
+
+- **Never treat Qdrant (or any derived index) as source of truth** — the CRM/SQLite is canonical and rebuildable-from.
+- **Never hard-delete** CRM records or thesis history — soft-delete/archive only.
+- **Never let an agent send email, post, or contact an LP autonomously** — agents draft; a human approves and sends.
+- **Never set a `thesis_version` canonical from code/seeds** — that is human dual sign-off.
+- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).
+- **Never commit secrets, `data/crm.db`, `.env`, backups, or `.claude/`** (all gitignored). Scan staged files before committing.
+- **Never bulk-export the LP list** to any third party; send only minimal non-sensitive context to Claude.
+- **Never assume FastAPI / SQLAlchemy / pytest** are in play — they sit in `requirements.txt` unused; runtime is stdlib + SQLite.
+- **Never add a `Co-Authored-By` / "Generated with" trailer** to commits or PRs — commits are the user's.
+
+## Deeper docs
+
+- Full constitution + guardrails: `docs/ten31-constitution.md` — TODO: consider folding its still-current content into this file and retiring the separate doc.
+- Architecture & rationale: `docs/Ten31_Agentic_Build_Plan.md`
+- Retrieval/embeddings contract: `docs/EMBEDDINGS.md`
+- CRM schema/API tour: `docs/crm-overview.md`
+- Current thesis handoff: `docs/thesis-handoff.md`
diff --git a/CLAUDE.md b/CLAUDE.md
deleted file mode 100644
index c9e4a42..0000000
--- a/CLAUDE.md
+++ /dev/null
@@ -1,89 +0,0 @@
-# Ten31 Agentic System — Project Memory
-
-This file is the project constitution. Read it first; it states settled decisions and non-negotiable guardrails. Where anything here conflicts with a one-off prompt, this file wins.
-
-## What we're building
-
-Ten31 is an investment platform (bitcoin ecosystem, energy, AI infrastructure, freedom tech) that raises from LPs and deploys into private companies. We are building an in-house system of AI agents to widen the fundraising funnel, sharpen and propagate our investment thesis, and automate marketing/branding. Build agents on the **Claude Agent SDK**, connected to our systems via **MCP**. Frontier reasoning runs on **Claude**; privacy-sensitive and high-volume work runs on **local models on our DGX Sparks**, fronted by **Spark Control**.
-
-Full architecture and rationale: see `@./docs/Ten31_Agentic_Build_Plan.md`.
-Current phase tasks and acceptance criteria: see `@./docs/PHASE_0.md`.
-Embedding/retrieval API contract + ingest recipe (authoritative): see `@./docs/EMBEDDINGS.md`.
-
-**We are in Phase 0.** Phase 0 builds the data + retrieval substrate. There are NO live, outward-facing agents in Phase 0.
-
-## Settled architecture
-
-- **Reasoning model:** Claude via the Agent SDK / API (API-key auth, not claude.ai login).
-- **Local models (Sparks, via Spark Control gateway):**
-  - Chat/triage: Qwen3.6 35B-A3B on Spark 1.
-  - Embeddings: `BAAI/bge-m3` (dense, 1024-dim, L2-normalized) → `/v1/embeddings` (OpenAI shape).
-  - Reranker: `BAAI/bge-reranker-v2-m3` (cross-encoder) → `/v1/rerank` (Cohere shape).
-  - Served by **spark-embed**, a small FastAPI server on Spark 2 (NGC PyTorch image — *not* HF TEI, which ships no arm64 CUDA image). Shipped in Spark Control v0.15.0.
-  - Audio: transcription + diarization + TTS on Spark 2.
-- **Canonical data store:** the self-built CRM on the Start9 server. This is the single source of truth for LP/prospect data.
-- **Vector index:** Qdrant v1.16.0 on Spark 2 (ports 6333/6334). Derived and rebuildable from the CRM (~8–15 min full re-embed) — NOT a second source of truth. But it holds the only *live* copy of the index, so it is never auto-restarted; the ingest pipeline must be idempotent so a rebuild is always safe.
-- **Retrieval:** one orchestrated call, `POST /api/search` (embed query → Qdrant dense+sparse RRF with payload pre-filter → cross-encoder rerank → top_k). The sparse/BM25 leg is generated **client-side** with FastEmbed (`Qdrant/bm25`) at both ingest and query time, with Qdrant applying IDF over our own corpus — so exact entity/name matching is weighted by our term statistics, not bge-m3's pretrained sparse. Authoritative contract + ingest recipe: `@./docs/EMBEDDINGS.md`.
-- **Gateway:** Spark Control (on Start9) fronts all local model services behind one trusted URL with shared TLS, access control, and observability.
-
-## Environment & services
-
-- All local model calls go through **Spark Control**, never directly to a Spark.
-- Endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/rerank`, `/api/search` (orchestrated hybrid retrieval), `/v1/audio/transcriptions`, `/v1/audio/speech`.
-- **Secrets live in `.env` (gitignored). Never commit secret values.** Required variables (names only):
-  - `ANTHROPIC_API_KEY`
-  - `SPARK_CONTROL_URL` — gateway for `/v1/embeddings`, `/v1/rerank`, `/api/search` (reads + dense embeds)
-  - `QDRANT_URL` — direct Qdrant on Spark 2 (`http://:6333`) for collection admin + ingest upserts
-  - `X_API_KEY` — the X (Twitter) API key for Scout/Analyst enrichment. **Note:** this is *not* a CRM auth key; the CRM has no service-key/API-key path today (see below).
-  - CRM connection vars:
-    - `CRM_DB_PATH` — absolute path to the SQLite file (default `/crm.db`). The CRM has **no network DB protocol** — ingest "connects" by opening this file directly (read-only, `mode=ro`), co-located with the Start9 `/data` volume.
-    - `CRM_DATA_DIR` — the `/data` volume root (holds `crm.db`, `backups/`, `secrets/`, `email_attachments/`).
-    - `CRM_BASE_URL` — `http://:8080` (env `CRM_HOST`/`CRM_PORT`), for any HTTP access to the running CRM.
-    - `CRM_SECRET_KEY` — the CRM's own JWT signing secret (set on the Start9 deployment, persisted at `/data/.crm-secret`); only needed if the MCP server authenticates over HTTP rather than reading SQLite directly.
-- A `.env.example` lists the variable names with empty values.
-
-## The agents (target roster — built in later phases)
-
-- **Scout** — monitors public sources (X via API, filings, etc.); flags trigger events. (Phase 2)
-- **Analyst** — builds LP dossiers, enriches records, maps warm-intro paths. (Phase 2)
-- **Architect** — owns/refines the canonical thesis; collaborative copilot. (Phase 1)
-- **Scribe** — distributes the thesis as content across channels. (Phase 1)
-- **Closer** — drafts outreach, nurture, meeting prep. Humans approve/send everything. (Phase 3)
-- **Orchestrator** — schedules and routes work; picks per-agent retrieval modes. (Phase 3)
-
-## Division of labor
-
-- **Spark developer (separate):** TEI serving (BGE-M3 + reranker) and Qdrant on Spark 2, exposed via Spark Control `/v1/embeddings` + `/v1/rerank`.
-- **This repo (Claude Code + the partners):** CRM schema extensions, ingest/sync pipeline, CRM MCP server, retrieval-mode library, and (later phases) the agents.
-
-## Guardrails — NON-NEGOTIABLE
-
-1. **Sovereignty.** Sensitive LP and relationship data stays on our infrastructure (Start9 + Sparks). Send only the minimum necessary, non-sensitive context to the Claude API. Never bulk-export the LP list to any third party.
-2. **CRM is canonical.** Qdrant and any other store are derived. Never treat a derived index as the source of truth; never let them silently diverge.
-3. **No destructive data ops.** Never hard-delete CRM records or history. Soft-delete/archive only. Migrations must be reversible and reviewed before running.
-4. **Human-in-the-loop on anything outbound.** No agent sends email, posts publicly, or contacts an LP/prospect autonomously. Agents draft; a partner approves and sends. (Especially Closer and Scribe.)
-5. **Log every agent action** to the interaction log, for compliance and debugging.
-6. **Compliance gate before Phase 3.** No cold/outbound capability ships until counsel has defined solicitation posture (e.g. 506(b) vs 506(c)), accreditation/QP verification, and recordkeeping rules.
-7. **Secrets never committed.** Use `.env` / a secrets store. No keys, tokens, or credentials in code, configs, or docs.
-8. **Enrichment is one-way and public.** Per-prospect public lookups that write INTO the CRM; never push our data outward.
-9. **Development data handling — keep real LP data out of Claude during the build.** Claude Code (the engineering partner) runs on the Anthropic API, so anything it reads is sent to a third party. Therefore Claude Code works only on **code, the schema, and synthetic or properly-redacted data** — never the real LP list, live records, or raw note/email prose. The real backfill and ingest **run on Ten31 infrastructure** (Start9 + Sparks) via **local models**; sensitive rows are never pasted into a Claude Code session or sent to the Claude API during development. To produce a realistic test corpus, redact/pseudonymize a copy **on the Sparks** (local) — do not hand-feed real records to Claude to "clean up." This is the same sovereignty boundary as guardrail #1, applied to the engineering workflow itself.
-
-## Conventions
-
-*Filled in from the CRM code (2026-06). Full detail: `@./docs/crm-overview.md`.*
-
-- **Language / runtime:** Python 3.11, standard library only at runtime. The CRM is one file, `backend/server.py` (~4.5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch. **Not** FastAPI — `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic but **none are imported** (vestigial). The only non-stdlib runtime deps are optional `bcrypt`/`jwt` and (for the Gmail module) `cryptography`.
-- **Storage:** a single SQLite DB (`data/crm.db`), WAL mode, `foreign_keys=ON`, opened per-request via `get_db()`. Two parallel investor models coexist (classic `contacts`/`lp_profiles` + the `fundraising_*` grid) — see `docs/crm-overview.md` §2.3; reconciling them to canonical IDs is the core Phase-0 entity-resolution task.
-- **Migrations:** **additive and reversible only.** Core schema uses ordered `backend/migrations/NNNN_*.sql` files applied once at startup by `backend/core_migrations.py`, tracked in a `schema_migrations` ledger; ship a paired `NNNN_*.down.sql` for rollback. (The Gmail module has its own runner under `backend/email_integration/migrations/`.) SQLite ALTER is add-column/rename only — which enforces the additive guardrail.
-- **Run locally:** `./start.sh` (dev defaults, port 8080). `./start_beta.sh` for a Tailscale/production-mode launch (requires `CRM_SECRET_KEY`). No build step.
-- **Tests / lint:** none in-repo. Sanity-check edits with `python3 -m py_compile backend/server.py`. Verify migrations against a *copy* of `crm.db`, never production.
-- **Production:** Start9 package `ten-database`. **`start9/0.4/` is the live target** (TypeScript SDK manifest under `start9/0.4/startos/`); `start9/0.3.5/` (YAML manifest) is the superseded prior generation. All state on the persistent `/data` volume.
-- **Auth:** username/password → HS256 JWT (Bearer header), two roles (`admin`/`member`), no row-level authorization. `X_API_KEY` (in this file's env list) is the *X/Twitter* key — there is **no CRM service-key path in code**; an MCP/ingest client must read SQLite directly or authenticate as a real CRM user.
-- Prefer clear, reviewable changes over cleverness. Keep the ingest pipeline and MCP server modular so retrieval modes and sources can be added without rewrites.
-
-## First actions for a new session
-
-1. Read `@./docs/PHASE_0.md` and `@./docs/EMBEDDINGS.md` (the latter is the authoritative embedding/retrieval contract and ingest recipe).
-2. Read the CRM source in the repo; produce a short written summary of the storage engine, schema, and API surface, and fill in the Conventions section above and the CRM env vars.
-3. Confirm Spark Control is reachable and `/v1/embeddings`, `/v1/rerank`, and `/api/search` respond (these shipped in v0.15.0; check `GET /api/endpoints`).
-4. Proceed through the Phase 0 workstreams in order. Do not build any outward-facing agent behavior in Phase 0.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/docs/ten31-constitution.md b/docs/ten31-constitution.md
new file mode 100644
index 0000000..c9e4a42
--- /dev/null
+++ b/docs/ten31-constitution.md
@@ -0,0 +1,89 @@
+# Ten31 Agentic System — Project Memory
+
+This file is the project constitution. Read it first; it states settled decisions and non-negotiable guardrails. Where anything here conflicts with a one-off prompt, this file wins.
+
+## What we're building
+
+Ten31 is an investment platform (bitcoin ecosystem, energy, AI infrastructure, freedom tech) that raises from LPs and deploys into private companies. We are building an in-house system of AI agents to widen the fundraising funnel, sharpen and propagate our investment thesis, and automate marketing/branding. Build agents on the **Claude Agent SDK**, connected to our systems via **MCP**. Frontier reasoning runs on **Claude**; privacy-sensitive and high-volume work runs on **local models on our DGX Sparks**, fronted by **Spark Control**.
+
+Full architecture and rationale: see `@./docs/Ten31_Agentic_Build_Plan.md`.
+Current phase tasks and acceptance criteria: see `@./docs/PHASE_0.md`.
+Embedding/retrieval API contract + ingest recipe (authoritative): see `@./docs/EMBEDDINGS.md`.
+
+**We are in Phase 0.** Phase 0 builds the data + retrieval substrate. There are NO live, outward-facing agents in Phase 0.
+
+## Settled architecture
+
+- **Reasoning model:** Claude via the Agent SDK / API (API-key auth, not claude.ai login).
+- **Local models (Sparks, via Spark Control gateway):**
+  - Chat/triage: Qwen3.6 35B-A3B on Spark 1.
+  - Embeddings: `BAAI/bge-m3` (dense, 1024-dim, L2-normalized) → `/v1/embeddings` (OpenAI shape).
+  - Reranker: `BAAI/bge-reranker-v2-m3` (cross-encoder) → `/v1/rerank` (Cohere shape).
+  - Served by **spark-embed**, a small FastAPI server on Spark 2 (NGC PyTorch image — *not* HF TEI, which ships no arm64 CUDA image). Shipped in Spark Control v0.15.0.
+  - Audio: transcription + diarization + TTS on Spark 2.
+- **Canonical data store:** the self-built CRM on the Start9 server. This is the single source of truth for LP/prospect data.
+- **Vector index:** Qdrant v1.16.0 on Spark 2 (ports 6333/6334). Derived and rebuildable from the CRM (~8–15 min full re-embed) — NOT a second source of truth. But it holds the only *live* copy of the index, so it is never auto-restarted; the ingest pipeline must be idempotent so a rebuild is always safe.
+- **Retrieval:** one orchestrated call, `POST /api/search` (embed query → Qdrant dense+sparse RRF with payload pre-filter → cross-encoder rerank → top_k). The sparse/BM25 leg is generated **client-side** with FastEmbed (`Qdrant/bm25`) at both ingest and query time, with Qdrant applying IDF over our own corpus — so exact entity/name matching is weighted by our term statistics, not bge-m3's pretrained sparse. Authoritative contract + ingest recipe: `@./docs/EMBEDDINGS.md`.
+- **Gateway:** Spark Control (on Start9) fronts all local model services behind one trusted URL with shared TLS, access control, and observability.
+
+## Environment & services
+
+- All local model calls go through **Spark Control**, never directly to a Spark.
+- Endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/rerank`, `/api/search` (orchestrated hybrid retrieval), `/v1/audio/transcriptions`, `/v1/audio/speech`.
+- **Secrets live in `.env` (gitignored). Never commit secret values.** Required variables (names only):
+  - `ANTHROPIC_API_KEY`
+  - `SPARK_CONTROL_URL` — gateway for `/v1/embeddings`, `/v1/rerank`, `/api/search` (reads + dense embeds)
+  - `QDRANT_URL` — direct Qdrant on Spark 2 (`http://:6333`) for collection admin + ingest upserts
+  - `X_API_KEY` — the X (Twitter) API key for Scout/Analyst enrichment. **Note:** this is *not* a CRM auth key; the CRM has no service-key/API-key path today (see below).
+  - CRM connection vars:
+    - `CRM_DB_PATH` — absolute path to the SQLite file (default `/crm.db`). The CRM has **no network DB protocol** — ingest "connects" by opening this file directly (read-only, `mode=ro`), co-located with the Start9 `/data` volume.
+    - `CRM_DATA_DIR` — the `/data` volume root (holds `crm.db`, `backups/`, `secrets/`, `email_attachments/`).
+    - `CRM_BASE_URL` — `http://:8080` (env `CRM_HOST`/`CRM_PORT`), for any HTTP access to the running CRM.
+    - `CRM_SECRET_KEY` — the CRM's own JWT signing secret (set on the Start9 deployment, persisted at `/data/.crm-secret`); only needed if the MCP server authenticates over HTTP rather than reading SQLite directly.
+- A `.env.example` lists the variable names with empty values.
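Reviewer sketch (not part of the patch): the `CRM_DB_PATH` entry says ingest opens the SQLite file directly and read-only (`mode=ro`). With the standard-library `sqlite3` module that can be done through a `file:` URI, roughly as below. The helper name `open_crm_readonly` and the demo `contacts` table are assumptions for illustration, not code from this repo.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_crm_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only via a file: URI (mode=ro)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple[int, bool]:
    """Create a throwaway DB, reopen it read-only, and confirm writes are refused."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "crm.db")
        rw = sqlite3.connect(path)
        rw.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
        rw.execute("INSERT INTO contacts (name) VALUES ('Synthetic LP')")
        rw.commit()
        rw.close()

        ro = open_crm_readonly(path)
        count = ro.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
        try:
            ro.execute("INSERT INTO contacts (name) VALUES ('should fail')")
            blocked = False
        except sqlite3.OperationalError:
            # SQLite rejects writes on a mode=ro connection.
            blocked = True
        ro.close()
        return count, blocked
```

In an ingest script the path would presumably come from `os.environ["CRM_DB_PATH"]`. Since the CRM runs in WAL mode, a read-only reader may also need the accompanying `-wal` and `-shm` files to be accessible next to the database; that is worth checking on the actual deployment rather than assuming.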
+
+## The agents (target roster — built in later phases)
+
+- **Scout** — monitors public sources (X via API, filings, etc.); flags trigger events. (Phase 2)
+- **Analyst** — builds LP dossiers, enriches records, maps warm-intro paths. (Phase 2)
+- **Architect** — owns/refines the canonical thesis; collaborative copilot. (Phase 1)
+- **Scribe** — distributes the thesis as content across channels. (Phase 1)
+- **Closer** — drafts outreach, nurture, meeting prep. Humans approve/send everything. (Phase 3)
+- **Orchestrator** — schedules and routes work; picks per-agent retrieval modes. (Phase 3)
+
+## Division of labor
+
+- **Spark developer (separate):** TEI serving (BGE-M3 + reranker) and Qdrant on Spark 2, exposed via Spark Control `/v1/embeddings` + `/v1/rerank`.
+- **This repo (Claude Code + the partners):** CRM schema extensions, ingest/sync pipeline, CRM MCP server, retrieval-mode library, and (later phases) the agents.
+
+## Guardrails — NON-NEGOTIABLE
+
+1. **Sovereignty.** Sensitive LP and relationship data stays on our infrastructure (Start9 + Sparks). Send only the minimum necessary, non-sensitive context to the Claude API. Never bulk-export the LP list to any third party.
+2. **CRM is canonical.** Qdrant and any other store are derived. Never treat a derived index as the source of truth; never let them silently diverge.
+3. **No destructive data ops.** Never hard-delete CRM records or history. Soft-delete/archive only. Migrations must be reversible and reviewed before running.
+4. **Human-in-the-loop on anything outbound.** No agent sends email, posts publicly, or contacts an LP/prospect autonomously. Agents draft; a partner approves and sends. (Especially Closer and Scribe.)
+5. **Log every agent action** to the interaction log, for compliance and debugging.
+6. **Compliance gate before Phase 3.** No cold/outbound capability ships until counsel has defined solicitation posture (e.g. 506(b) vs 506(c)), accreditation/QP verification, and recordkeeping rules.
+7. **Secrets never committed.** Use `.env` / a secrets store. No keys, tokens, or credentials in code, configs, or docs.
+8. **Enrichment is one-way and public.** Per-prospect public lookups that write INTO the CRM; never push our data outward.
+9. **Development data handling — keep real LP data out of Claude during the build.** Claude Code (the engineering partner) runs on the Anthropic API, so anything it reads is sent to a third party. Therefore Claude Code works only on **code, the schema, and synthetic or properly-redacted data** — never the real LP list, live records, or raw note/email prose. The real backfill and ingest **run on Ten31 infrastructure** (Start9 + Sparks) via **local models**; sensitive rows are never pasted into a Claude Code session or sent to the Claude API during development. To produce a realistic test corpus, redact/pseudonymize a copy **on the Sparks** (local) — do not hand-feed real records to Claude to "clean up." This is the same sovereignty boundary as guardrail #1, applied to the engineering workflow itself.
+
+## Conventions
+
+*Filled in from the CRM code (2026-06). Full detail: `@./docs/crm-overview.md`.*
+
+- **Language / runtime:** Python 3.11, standard library only at runtime. The CRM is one file, `backend/server.py` (~4.5k lines): a stdlib `http.server.ThreadingHTTPServer` + hand-written `CRMHandler` with manual path dispatch. **Not** FastAPI — `backend/requirements.txt` lists FastAPI/SQLAlchemy/Alembic/Pydantic but **none are imported** (vestigial). The only non-stdlib runtime deps are optional `bcrypt`/`jwt` and (for the Gmail module) `cryptography`.
+- **Storage:** a single SQLite DB (`data/crm.db`), WAL mode, `foreign_keys=ON`, opened per-request via `get_db()`. Two parallel investor models coexist (classic `contacts`/`lp_profiles` + the `fundraising_*` grid) — see `docs/crm-overview.md` §2.3; reconciling them to canonical IDs is the core Phase-0 entity-resolution task.
+- **Migrations:** **additive and reversible only.** Core schema uses ordered `backend/migrations/NNNN_*.sql` files applied once at startup by `backend/core_migrations.py`, tracked in a `schema_migrations` ledger; ship a paired `NNNN_*.down.sql` for rollback. (The Gmail module has its own runner under `backend/email_integration/migrations/`.) SQLite ALTER is add-column/rename only — which enforces the additive guardrail.
+- **Run locally:** `./start.sh` (dev defaults, port 8080). `./start_beta.sh` for a Tailscale/production-mode launch (requires `CRM_SECRET_KEY`). No build step.
+- **Tests / lint:** none in-repo. Sanity-check edits with `python3 -m py_compile backend/server.py`. Verify migrations against a *copy* of `crm.db`, never production.
+- **Production:** Start9 package `ten-database`. **`start9/0.4/` is the live target** (TypeScript SDK manifest under `start9/0.4/startos/`); `start9/0.3.5/` (YAML manifest) is the superseded prior generation. All state on the persistent `/data` volume.
+- **Auth:** username/password → HS256 JWT (Bearer header), two roles (`admin`/`member`), no row-level authorization. `X_API_KEY` (in this file's env list) is the *X/Twitter* key — there is **no CRM service-key path in code**; an MCP/ingest client must read SQLite directly or authenticate as a real CRM user.
+- Prefer clear, reviewable changes over cleverness. Keep the ingest pipeline and MCP server modular so retrieval modes and sources can be added without rewrites.
+
+## First actions for a new session
+
+1. Read `@./docs/PHASE_0.md` and `@./docs/EMBEDDINGS.md` (the latter is the authoritative embedding/retrieval contract and ingest recipe).
+2. Read the CRM source in the repo; produce a short written summary of the storage engine, schema, and API surface, and fill in the Conventions section above and the CRM env vars.
+3. Confirm Spark Control is reachable and `/v1/embeddings`, `/v1/rerank`, and `/api/search` respond (these shipped in v0.15.0; check `GET /api/endpoints`).
+4. Proceed through the Phase 0 workstreams in order. Do not build any outward-facing agent behavior in Phase 0.