From 090416f05e93367595ee74b819683b40cd2d264f Mon Sep 17 00:00:00 2001 From: Keysat Date: Fri, 12 Jun 2026 16:46:49 -0500 Subject: [PATCH] docs: extract subsystem guides; keep AGENTS.md to whole-repo facts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move subsystem mechanics (migrations, thesis gate, redaction, ingest, email, packaging) out of AGENTS.md into docs/guides/.md, each scoped by paths: frontmatter and symlinked from .claude/rules/ so Claude Code lazy-loads them. AGENTS.md keeps whole-repo facts and universal guardrails plus a one-line index per guide. Fix the inaccurate ".claude/ is gitignored" note — it is tracked. --- .claude/rules/email.md | 1 + .claude/rules/migrations.md | 1 + .claude/rules/packaging.md | 1 + .claude/rules/redaction.md | 1 + .claude/rules/spark-ingest.md | 1 + .claude/rules/thesis.md | 1 + AGENTS.md | 54 +++++++++++++++++++---------------- docs/guides/email.md | 23 +++++++++++++++ docs/guides/migrations.md | 28 ++++++++++++++++++ docs/guides/packaging.md | 31 ++++++++++++++++++++ docs/guides/redaction.md | 26 +++++++++++++++++ docs/guides/spark-ingest.md | 21 ++++++++++++++ docs/guides/thesis.md | 28 ++++++++++++++++++ 13 files changed, 192 insertions(+), 25 deletions(-) create mode 120000 .claude/rules/email.md create mode 120000 .claude/rules/migrations.md create mode 120000 .claude/rules/packaging.md create mode 120000 .claude/rules/redaction.md create mode 120000 .claude/rules/spark-ingest.md create mode 120000 .claude/rules/thesis.md create mode 100644 docs/guides/email.md create mode 100644 docs/guides/migrations.md create mode 100644 docs/guides/packaging.md create mode 100644 docs/guides/redaction.md create mode 100644 docs/guides/spark-ingest.md create mode 100644 docs/guides/thesis.md diff --git a/.claude/rules/email.md b/.claude/rules/email.md new file mode 120000 index 0000000..fc540e6 --- /dev/null +++ b/.claude/rules/email.md @@ -0,0 +1 @@ +../../docs/guides/email.md \ No 
newline at end of file diff --git a/.claude/rules/migrations.md b/.claude/rules/migrations.md new file mode 120000 index 0000000..b0e967d --- /dev/null +++ b/.claude/rules/migrations.md @@ -0,0 +1 @@ +../../docs/guides/migrations.md \ No newline at end of file diff --git a/.claude/rules/packaging.md b/.claude/rules/packaging.md new file mode 120000 index 0000000..e2e13a0 --- /dev/null +++ b/.claude/rules/packaging.md @@ -0,0 +1 @@ +../../docs/guides/packaging.md \ No newline at end of file diff --git a/.claude/rules/redaction.md b/.claude/rules/redaction.md new file mode 120000 index 0000000..2a560b8 --- /dev/null +++ b/.claude/rules/redaction.md @@ -0,0 +1 @@ +../../docs/guides/redaction.md \ No newline at end of file diff --git a/.claude/rules/spark-ingest.md b/.claude/rules/spark-ingest.md new file mode 120000 index 0000000..c8d1977 --- /dev/null +++ b/.claude/rules/spark-ingest.md @@ -0,0 +1 @@ +../../docs/guides/spark-ingest.md \ No newline at end of file diff --git a/.claude/rules/thesis.md b/.claude/rules/thesis.md new file mode 120000 index 0000000..c61d1c4 --- /dev/null +++ b/.claude/rules/thesis.md @@ -0,0 +1 @@ +../../docs/guides/thesis.md \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 05d63fc..73543f2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -22,52 +22,55 @@ In-house AI-agent system over a self-hosted Start9 CRM (SQLite) for a bitcoin/en # Sanity-check edits (there is no compiler/build for the CRM) python3 -m py_compile backend/server.py # Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed) -python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py (13 exist) +python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py # Run all tests (no aggregate runner exists) for t in $(find backend -name 'test_*.py'); do echo "== $t"; python3 "$t" || break; done -# Build the s9pk (x86_64 only) -> ten-database_x86_64.s9pk — BUMP THE VERSION FIRST (see 
Always) +# Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md. cd start9/0.4 && make -# Install to the box — PRODUCTION; get explicit user OK first. TODO: confirm exact host/context. -start-cli package install -s ten-database_x86_64.s9pk # target host = $START9_BOX_HOST (real value lives in your local start-cli context config, NOT this repo) ``` -- **Migrations** apply automatically at startup via `backend/core_migrations.py` from `backend/migrations/NNNN_*.sql`, tracked in a `schema_migrations` ledger. Verify a new one against a **copy** of `data/crm.db`, never production. +- **Migrations** apply automatically at startup (`backend/core_migrations.py`, `schema_migrations` ledger). See `docs/guides/migrations.md` before adding one. - **Lint:** none configured. ## Directory layout (day-one) - `backend/server.py` — the CRM monolith: HTTP handler, route dispatch, `init_db()`, auth (username/password → HS256 JWT, roles admin/member). - `backend/core_migrations.py` + `backend/migrations/NNNN_*.sql` (+ paired `.down.sql`) — additive schema migrations, applied at startup. -- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders (interaction_log sentinels), wired in `server.init_db()`. +- `backend/thesis_seed.py` — Thesis Workshop seed + idempotent `ensure_*` one-time seeders, wired in `server.init_db()`. - `backend/thesis_review.py` — thesis version review/approval (human dual sign-off → canonical). -- `backend/mcp/` — `architect_agent.py` (Claude thesis copilot), `architect_tools.py` (thesis CRUD/versions), `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP). -- `backend/email_integration/` — Gmail capture via domain-wide delegation: `credentials.py`, `matcher.py`, `parser.py`, `db.py`, `sync.py`, `scheduler.py`, `routes.py`, `compose.py` (Tier-B draft creation), `migrations/`. 
-- `backend/redaction/` — `scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary (`Boundary`, `SCRUB_BACKEND=local|gateway`, fail-closed). -- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes (`search.py`, `embed.py`, `qdrant_io.py`, `sparse.py`, `entity_resolution.py`). +- `backend/mcp/` — `architect_agent.py` (Claude thesis copilot), `architect_tools.py`, `outreach_agent.py` (LP draft assistant), `architect_grounding.py`, `crm_tools.py`, `server.py` (FastMCP). +- `backend/email_integration/` — Gmail capture via domain-wide delegation + Tier-B draft creation (`compose.py`). +- `backend/redaction/` — `scrub.py` + `client.py`: the scrub→Claude→re-hydrate privacy boundary. +- `backend/ingest/` — chunk→embed→Qdrant + retrieval modes. - `backend/entity_*.py` — entity resolution/merge (the two-investor-model reconciliation). - `frontend/index.html` — the entire UI. -- `docs/` — `Ten31_Agentic_Build_Plan.md` (architecture), `PHASE_0.md`/`PHASE_1.md`, `EMBEDDINGS.md` (retrieval contract), `crm-overview.md` (schema/API tour), `thesis-handoff.md`, `ten31-constitution.md` (full constitution + guardrails). -- `start9/0.4/` — StartOS package: `startos/utils.ts` (`PACKAGE_VERSION`), `startos/versions/`, `Dockerfile`, `docker_entrypoint.sh`, `Makefile`, `s9pk.mk`. +- `docs/` — architecture, phase plans, contracts, runbooks (see Deeper docs). `docs/guides/` — scoped subsystem rules (see below). +- `start9/0.4/` — StartOS package (`startos/utils.ts` holds `PACKAGE_VERSION`). - `data/crm.db` — the live DB (gitignored). `.env` / `.env.example` — config (`.env` gitignored). +## Scoped guides + +Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude/rules/` symlinks (scoped by `paths:` frontmatter). 
**Read the guide before editing that area:** + +- **Migrations or seeders** (`backend/migrations/`, `core_migrations.py`, `thesis_seed.py`) → `docs/guides/migrations.md` +- **Thesis logic** (`backend/thesis_*.py`, `backend/mcp/architect_*.py`) → `docs/guides/thesis.md` +- **Redaction or any MCP/Claude path** (`backend/redaction/`, `backend/mcp/`) → `docs/guides/redaction.md` +- **Ingest / retrieval** (`backend/ingest/`) → `docs/guides/spark-ingest.md` +- **Email capture / drafts** (`backend/email_integration/`) → `docs/guides/email.md` +- **Building or deploying the s9pk** (`start9/`) → `docs/guides/packaging.md` + ## Conventions - **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`. -- **Migrations are additive + reversible only:** numbered `NNNN_*.sql` with a paired `NNNN_*.down.sql`. SQLite ALTER = add-column/rename only. -- **One-time seeds/backfills are idempotent** via `interaction_log` sentinels (the `ensure_*` pattern), wired into `init_db` — safe to re-run on every boot. -- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. `_node_tree` and `create_thesis_version` filter on `deleted_at IS NULL` and **ignore status** — so to drop a node from the live agent prompt AND version snapshots you must set `deleted_at`, not just status. -- **Thesis canonical gate:** node status is `draft|candidate|approved|retired` (the working tree); a canonical `thesis_version` is frozen ONLY by human **dual** sign-off (`thesis_review`). Code/seeds never set a version canonical. -- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT` (`start.sh`), `CRM_DATA_DIR`. 
-- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user. (Older history carries a `Co-Authored-By: Claude` trailer; dropped going forward.) +- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. (Thesis has a subtlety here — see the thesis guide.) +- **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`. +- **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user. ## Always -- **Bump the version before building an s9pk:** edit `PACKAGE_VERSION` in `start9/0.4/startos/utils.ts`, add `start9/0.4/startos/versions/v0.1.0.NN.ts`, and register it in `versions/index.ts` (import, set `current`, move prior `current` into `other[]`). Start9 0.4.x ignores a same-version rebuild. -- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`. -- **Make migrations/seeders deployment-state-invariant and idempotent:** target rows **structurally**, not by transient text the same change mutates; capture prior state so a revert is exact. (Learned the hard way: matching old nodes by a body string the same changeset deleted broke fresh DBs.) -- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` before it reaches a Claude model. 
+- **Verify before shipping:** `python3 -m py_compile` the edited files; for DB logic, run the change against a **copy** of `data/crm.db`, never production. +- **Keep real LP data out of Claude:** develop only on code/schema/synthetic-or-locally-redacted data; route any real record substance through `backend/redaction` first. - **Get explicit user authorization before any production deploy/install** to `$START9_BOX_HOST`. -- **Ship a paired `.down.sql`** with every new migration. ## Never @@ -76,18 +79,19 @@ start-cli package install -s ten-database_x86_64.s9pk # target host = $START9_ - **Never let an agent send email, post, or contact an LP autonomously** — agents draft; a human approves and sends. - **Never set a `thesis_version` canonical from code/seeds** — that is human dual sign-off. - **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`). -- **Never commit secrets, `data/crm.db`, `.env`, backups, or `.claude/`** (all gitignored). Scan staged files before committing. +- **Never commit secrets, `data/crm.db`, `.env`, or `data/backups/`** (all gitignored). Scan staged files before committing. (`.claude/` *is* tracked — `launch.json` and `rules/` symlinks ship with the repo; keep local-only settings in `.claude/settings.local.json`.) - **Never bulk-export the LP list** to any third party; send only minimal non-sensitive context to Claude. - **Never assume FastAPI / SQLAlchemy / pytest** are in play — they sit in `requirements.txt` unused; runtime is stdlib + SQLite. - **Never add a `Co-Authored-By` / "Generated with" trailer** to commits or PRs — commits are the user's. ## Deeper docs -- Full constitution + guardrails: `docs/ten31-constitution.md` — TODO: consider folding its still-current content into this file and retiring the separate doc. 
+- Full constitution + guardrails: `docs/ten31-constitution.md` - Architecture & rationale: `docs/Ten31_Agentic_Build_Plan.md` - Retrieval/embeddings contract: `docs/EMBEDDINGS.md` - CRM schema/API tour: `docs/crm-overview.md` - Current thesis handoff: `docs/thesis-handoff.md` +- Operations & runbooks: `docs/OPERATIONS.md`, `docs/go-live-runbook.md`, `docs/gmail-enablement-runbook.md` ## Current state diff --git a/docs/guides/email.md b/docs/guides/email.md new file mode 100644 index 0000000..64323e2 --- /dev/null +++ b/docs/guides/email.md @@ -0,0 +1,23 @@ +--- +paths: + - backend/email_integration/** +--- + +# Email capture & drafts (Gmail) + +Read this before editing Gmail capture or draft creation. + +## What it does + +- `backend/email_integration/` captures Gmail via **domain-wide delegation** (`credentials.py`, `matcher.py`, `parser.py`, `db.py`, `sync.py`, `scheduler.py`, `routes.py`) and creates Tier-B in-thread drafts (`compose.py`). It has its own `migrations/`. +- Captured email becomes CRM activity through a **propose → approve** flow — nothing lands on a contact record until a human approves the proposal. + +## Hard rule + +- **Agents draft; humans send.** Never let an agent send email, post, or contact an LP autonomously. Tier-B `compose.py` only *creates* a Gmail draft for human review. + +## Known gap + +- Tier-B drafts currently reply to the **LP only**; reply-all is the next change (see AGENTS.md → Current state). + +See also `docs/gmail-enablement-runbook.md`. diff --git a/docs/guides/migrations.md b/docs/guides/migrations.md new file mode 100644 index 0000000..d0b1ce9 --- /dev/null +++ b/docs/guides/migrations.md @@ -0,0 +1,28 @@ +--- +paths: + - backend/migrations/** + - backend/core_migrations.py + - backend/thesis_seed.py +--- + +# Migrations & seeders + +Read this before adding or editing a schema migration or a one-time seed/backfill. 
+ +## How they run + +- Migrations apply automatically at startup via `backend/core_migrations.py`, reading `backend/migrations/NNNN_*.sql` in order and tracking applied files in a `schema_migrations` ledger. +- One-time seeds/backfills live in `backend/thesis_seed.py` (the `ensure_*` functions), wired into `server.init_db()` and run on every boot. + +## Rules + +- **Additive + reversible only.** Numbered `NNNN_*.sql` with a paired `NNNN_*.down.sql` — ship the `.down.sql` with every new migration. SQLite `ALTER` is add-column / rename only; no drop-column, no type change. +- **Seeds/backfills must be idempotent** via `interaction_log` sentinels (the `ensure_*` pattern) — safe to re-run on every boot. +- **Make migrations/seeders deployment-state-invariant.** Target rows **structurally**, not by transient text the same change mutates; capture prior state so a revert is exact. + - *Learned the hard way:* matching old nodes by a body string the same changeset deleted broke fresh DBs. A migration must produce the same end state whether the box is empty, mid-version, or fully seeded. +- **Soft-delete only** — `deleted_at` and/or `status='retired'`; never hard-delete CRM records or thesis history. + +## Verify before shipping + +- `python3 -m py_compile` the edited Python. +- For any DB logic, run the change against a **copy** of `data/crm.db`, never production. Confirm the paired `.down.sql` cleanly reverts. diff --git a/docs/guides/packaging.md b/docs/guides/packaging.md new file mode 100644 index 0000000..d97e8b0 --- /dev/null +++ b/docs/guides/packaging.md @@ -0,0 +1,31 @@ +--- +paths: + - start9/** +--- + +# StartOS packaging & deploy + +Read this before building or installing the s9pk. Live target is `start9/0.4/`. + +## Bump the version FIRST — every build + +Start9 0.4.x ignores a same-version rebuild (the install silently does nothing). Before `make`: + +1. Edit `PACKAGE_VERSION` in `start9/0.4/startos/utils.ts`. +2. 
Add `start9/0.4/startos/versions/v0.1.0.NN.ts`. +3. Register it in `start9/0.4/startos/versions/index.ts`: import it, set it as `current`, and move the prior `current` into `other[]`. + +## Build (x86_64 only) + +```bash +cd start9/0.4 && make # -> ten-database_x86_64.s9pk +``` + +## Install — PRODUCTION + +```bash +start-cli package install -s ten-database_x86_64.s9pk # target host = $START9_BOX_HOST +``` + +- `$START9_BOX_HOST` resolves from your local `start-cli` context config — the real hostname is **not** in this repo. +- **Get explicit user authorization before any production deploy/install.** Verify a new migration against a **copy** of `data/crm.db` first, never the box's DB. diff --git a/docs/guides/redaction.md b/docs/guides/redaction.md new file mode 100644 index 0000000..8aafd6b --- /dev/null +++ b/docs/guides/redaction.md @@ -0,0 +1,26 @@ +--- +paths: + - backend/redaction/** + - backend/mcp/** +--- + +# Redaction & the Claude privacy boundary + +Read this before editing anything that sends data to a Claude model — the redaction layer or any MCP agent/tool path. + +## The boundary + +- `backend/redaction/` (`scrub.py` + `client.py`) is the **scrub → Claude → re-hydrate** boundary: `Boundary`, `SCRUB_BACKEND=local|gateway`, **fail-closed**. +- `SCRUB_BACKEND=gateway` routes scrubbing through Spark Control (caller-supplied dict). Local backend scrubs in-process. If scrubbing can't run, the call fails closed — it does not pass raw text through. + +## Hard rules + +- **Keep real LP data out of Claude.** Develop only on code/schema/synthetic-or-locally-redacted data. Route any real record substance through `backend/redaction` before it reaches a Claude model. +- **Never bulk-export the LP list** to any third party. Send only minimal, non-sensitive context to Claude. +- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`). 
+ +## When adding a new Claude/MCP call + +Trace the data path: any field carrying LP substance must cross `Boundary` first. A new MCP tool that reads CRM rows and hands them to a model without scrubbing is a leak — add it to the redaction path and extend the leak tests in `backend/redaction/test_*.py`. + +See also `docs/redaction-rehydration.md` and `docs/spark-control-scrub-endpoints.md`. diff --git a/docs/guides/spark-ingest.md b/docs/guides/spark-ingest.md new file mode 100644 index 0000000..c383e3b --- /dev/null +++ b/docs/guides/spark-ingest.md @@ -0,0 +1,21 @@ +--- +paths: + - backend/ingest/** +--- + +# Ingest, retrieval & Spark/Qdrant + +Read this before editing the ingest pipeline or retrieval modes. + +## Pipeline + +- `backend/ingest/` is chunk → embed → Qdrant plus retrieval modes (`search.py`, `embed.py`, `qdrant_io.py`, `sparse.py`, `entity_resolution.py`). +- Local models — bge-m3 embeddings, bge-reranker-v2-m3, `/api/search` — run **always via Spark Control**, never against a Spark directly (`SPARK_CONTROL_URL`). The retrieval/embeddings contract is `docs/EMBEDDINGS.md`; honor it. + +## Hard rule + +- **Never treat Qdrant (or any derived index) as source of truth.** The CRM / SQLite is canonical and the index is rebuildable from it. Code may drop and rebuild the Qdrant collection; it must never read a fact from Qdrant that isn't recoverable from SQLite. + +## Entity resolution + +The two-investor-model reconciliation (classic `contacts`/`lp_profiles` vs the `fundraising_*` grid → canonical IDs) is the core entity-resolution task. See `backend/entity_*.py` and `docs/crm-overview.md`. 
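Reviewer note on the spark-ingest hard rule (Qdrant is never the source of truth): a minimal sketch of what "rebuildable from SQLite" means in practice. Everything here is an assumption made for illustration: the `notes` table, its columns, and the plain dict standing in for a Qdrant collection are hypothetical, and no real Qdrant or Spark Control call is shown.

```python
import sqlite3


def rebuild_index(conn):
    """Rebuild a derived index from scratch, reading only canonical SQLite rows."""
    # Hypothetical stand-in for a Qdrant collection: created empty on every
    # call, so nothing from a previous index can leak into the new one.
    index = {}
    rows = conn.execute(
        "SELECT id, body FROM notes WHERE deleted_at IS NULL ORDER BY id"
    )
    for row_id, body in rows:
        # Toy "embedding": lowercase tokens. A real pipeline would embed via
        # Spark Control; that call is deliberately not sketched here.
        index[row_id] = body.lower().split()
    return index


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, not the CRM's real schema.
    conn.execute(
        "CREATE TABLE notes ("
        " id INTEGER PRIMARY KEY, body TEXT NOT NULL, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO notes (body) VALUES (?)",
        [("Met at conference",), ("Follow up in Q3",)],
    )
    first = rebuild_index(conn)
    second = rebuild_index(conn)  # drop-and-rebuild yields the same index
    conn.close()
    return first, second
```

The design point is that `rebuild_index` takes only the SQLite connection as input, so any fact in the index is recoverable from the canonical store by construction.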
diff --git a/docs/guides/thesis.md b/docs/guides/thesis.md new file mode 100644 index 0000000..379d17f --- /dev/null +++ b/docs/guides/thesis.md @@ -0,0 +1,28 @@ +--- +paths: + - backend/thesis_seed.py + - backend/thesis_review.py + - backend/mcp/architect_agent.py + - backend/mcp/architect_tools.py + - backend/mcp/architect_grounding.py +--- + +# Thesis Workshop & canonical gate + +Read this before editing thesis nodes, versions, the review flow, or the Architect copilot. + +## The two layers + +- **Working tree** — thesis nodes with status `draft | candidate | approved | retired`. Code and seeds may move nodes around this ladder freely. +- **Canonical** — a frozen `thesis_version`, the read source for the live agent prompt. A version becomes canonical **only** by human **dual** sign-off through `backend/thesis_review.py` (currently Grant + Jonathan). + +## Hard rules + +- **Never set a `thesis_version` canonical from code or seeds.** That is human dual sign-off, full stop. `ensure_*` seeders may promote a *working* spine to `approved` (node-level, reversible) but must not freeze a canonical version. +- **Soft-delete subtlety — this trips people up:** `_node_tree` and `create_thesis_version` filter on `deleted_at IS NULL` and **ignore status**. So to drop a node from *both* the live agent prompt and version snapshots you must set `deleted_at` — setting `status='retired'` alone leaves it in the tree. + +## Boot behavior + +- On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible) — it does **not** freeze a canonical version. Promotion to canonical still waits on dual sign-off in the Workshop. + +See also `docs/thesis-handoff.md` for the current thesis content state.
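Reviewer note on the soft-delete subtlety in the thesis guide: a minimal sketch of why `status='retired'` alone leaves a node in the tree. The `thesis_nodes` table, its columns, and the `live_nodes` helper are hypothetical stand-ins; the only thing taken from the guide is the documented filter (`deleted_at IS NULL`, status ignored), not the real `_node_tree` implementation.

```python
import sqlite3


def live_nodes(conn):
    """Mimic the documented filter: deleted_at IS NULL, status is ignored."""
    rows = conn.execute(
        "SELECT title FROM thesis_nodes WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema for illustration only.
    conn.execute(
        "CREATE TABLE thesis_nodes ("
        " id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
        " status TEXT NOT NULL, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO thesis_nodes (title, status) VALUES (?, ?)",
        [("spine", "approved"), ("old-claim", "approved")],
    )

    # Retiring by status alone: the node still passes the filter.
    conn.execute(
        "UPDATE thesis_nodes SET status = 'retired' WHERE title = 'old-claim'"
    )
    after_retire = live_nodes(conn)

    # Soft-deleting via deleted_at: the node now drops out of the tree.
    conn.execute(
        "UPDATE thesis_nodes SET deleted_at = datetime('now')"
        " WHERE title = 'old-claim'"
    )
    after_soft_delete = live_nodes(conn)

    conn.close()
    return after_retire, after_soft_delete
```

Calling `demo()` returns the node titles visible after each step; under these assumptions the retired node is still listed until `deleted_at` is set, which is the behavior the guide warns about.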