docs: extract subsystem guides; keep AGENTS.md to whole-repo facts

Move subsystem mechanics (migrations, thesis gate, redaction, ingest, email, packaging) out of AGENTS.md into docs/guides/<topic>.md, each scoped by paths: frontmatter and symlinked from .claude/rules/ so Claude Code lazy-loads them. AGENTS.md keeps whole-repo facts and universal guardrails plus a one-line index per guide. Fix the inaccurate ".claude/ is gitignored" note — it is tracked.
2026-06-12 16:46:49 -05:00
parent cabbcae5d5
commit 090416f05e
13 changed files with 192 additions and 25 deletions
@@ -0,0 +1,23 @@
+---
+paths:
+  - backend/email_integration/**
+---
+
+# Email capture & drafts (Gmail)
+
+Read this before editing Gmail capture or draft creation.
+
+## What it does
+
+- `backend/email_integration/` captures Gmail via **domain-wide delegation** (`credentials.py`, `matcher.py`, `parser.py`, `db.py`, `sync.py`, `scheduler.py`, `routes.py`) and creates Tier-B in-thread drafts (`compose.py`). It has its own `migrations/`.
+- Captured email becomes CRM activity through a **propose → approve** flow — nothing lands on a contact record until a human approves the proposal.
+
+## Hard rule
+
+- **Agents draft; humans send.** Never let an agent send email, post, or contact an LP autonomously. Tier-B `compose.py` only *creates* a Gmail draft for human review.
+
+## Known gap
+
+- Tier-B drafts currently reply to the **LP only**; reply-all is the next change (see AGENTS.md → Current state).
+
+See also `docs/gmail-enablement-runbook.md`.
@@ -0,0 +1,28 @@
+---
+paths:
+  - backend/migrations/**
+  - backend/core_migrations.py
+  - backend/thesis_seed.py
+---
+
+# Migrations & seeders
+
+Read this before adding or editing a schema migration or a one-time seed/backfill.
+
+## How they run
+
+- Migrations apply automatically at startup via `backend/core_migrations.py`, reading `backend/migrations/NNNN_*.sql` in order and tracking applied files in a `schema_migrations` ledger.
+- One-time seeds/backfills live in `backend/thesis_seed.py` (the `ensure_*` functions), wired into `server.init_db()` and run on every boot.
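A minimal sketch of the ledger behaviour described above, not the contents of `backend/core_migrations.py`. The `apply_pending` name and the `schema_migrations` column layout are assumptions made for illustration; only the `NNNN_*.sql` naming and the ledger table name come from this guide.

```python
import re
import sqlite3
from pathlib import Path

MIGRATION_RE = re.compile(r"^\d{4}_.+\.sql$")  # matches NNNN_*.sql


def apply_pending(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply not-yet-applied NNNN_*.sql files in filename order; return what ran."""
    # Assumed ledger shape: one row per applied migration file.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "filename TEXT PRIMARY KEY, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT filename FROM schema_migrations")}
    ran = []
    for path in sorted(migrations_dir.iterdir()):
        name = path.name
        # Skip non-migrations, the paired .down.sql files, and anything in the ledger.
        if not MIGRATION_RE.match(name) or name.endswith(".down.sql") or name in applied:
            continue
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO schema_migrations (filename) VALUES (?)", (name,))
        conn.commit()
        ran.append(name)
    return ran
```

Calling this on every startup is safe under these assumptions: a second call finds every file in the ledger and applies nothing.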
+
+## Rules
+
+- **Additive + reversible only.** Number each migration `NNNN_*.sql` and ship a paired `NNNN_*.down.sql` with it. Treat SQLite `ALTER TABLE` as add-column / rename only: no drop-column (newer SQLite versions support `DROP COLUMN`, but it is not additive), no type change.
+- **Seeds/backfills must be idempotent** via `interaction_log` sentinels (the `ensure_*` pattern) — safe to re-run on every boot.
+- **Make migrations/seeders deployment-state-invariant.** Target rows **structurally**, not by transient text the same change mutates; capture prior state so a revert is exact.
+  - *Learned the hard way:* matching old nodes by a body string the same changeset deleted broke fresh DBs. A migration must produce the same end state whether the box is empty, mid-version, or fully seeded.
+- **Soft-delete only** — `deleted_at` and/or `status='retired'`; never hard-delete CRM records or thesis history.
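The idempotency rule above can be sketched as follows. This is an illustration of the sentinel idea, not code from `backend/thesis_seed.py`: the `ensure_example_nodes` function, the `SENTINEL` key, and the column layouts of `interaction_log` and `thesis_nodes` are all hypothetical.

```python
import sqlite3

SENTINEL = "seed:example_nodes_v1"  # hypothetical sentinel key


def ensure_example_nodes(conn: sqlite3.Connection) -> bool:
    """Insert the example row once; return True only on the run that did the work."""
    done = conn.execute(
        "SELECT 1 FROM interaction_log WHERE event = ?", (SENTINEL,)
    ).fetchone()
    if done:
        return False  # sentinel present: an earlier boot already seeded
    with conn:  # data change and sentinel commit (or roll back) together
        conn.execute(
            "INSERT INTO thesis_nodes (slug, status) VALUES (?, 'draft')",
            ("example-node",),
        )
        conn.execute("INSERT INTO interaction_log (event) VALUES (?)", (SENTINEL,))
    return True


# Demo on a throwaway in-memory database with an assumed minimal schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interaction_log (event TEXT)")
conn.execute("CREATE TABLE thesis_nodes (slug TEXT UNIQUE, status TEXT)")
print(ensure_example_nodes(conn))  # True  (first boot seeds)
print(ensure_example_nodes(conn))  # False (later boots are no-ops)
```

Writing the sentinel in the same transaction as the data change means a crash mid-seed leaves neither behind, so the next boot retries from a clean state.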
+
+## Verify before shipping
+
+- `python3 -m py_compile` the edited Python.
+- For any DB logic, run the change against a **copy** of `data/crm.db`, never production. Confirm the paired `.down.sql` cleanly reverts.
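One way to exercise the copy-and-revert check above, as a sketch only. `check_round_trip` and `schema_of` are hypothetical helpers, and the check compares schema text, not row data. It also assumes nothing is writing to the source database while the file is copied.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def schema_of(conn: sqlite3.Connection) -> list[str]:
    """Return the sorted CREATE statements SQLite has recorded for this database."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")
    return sorted(row[0] for row in rows)


def check_round_trip(db_path: Path, up_sql: str, down_sql: str) -> bool:
    """Apply up then down against a scratch copy; True if the schema is restored."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "crm-copy.db"
        shutil.copy2(db_path, scratch)  # the original file is never opened for writing
        conn = sqlite3.connect(scratch)
        try:
            before = schema_of(conn)
            conn.executescript(up_sql)
            conn.executescript(down_sql)
            return schema_of(conn) == before
        finally:
            conn.close()
```

A `False` result means the `.down.sql` left schema residue behind. A `True` result only shows the schema matches, so data-changing migrations still need a row-level comparison.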
@@ -0,0 +1,31 @@
+---
+paths:
+  - start9/**
+---
+
+# StartOS packaging & deploy
+
+Read this before building or installing the s9pk. Live target is `start9/0.4/`.
+
+## Bump the version FIRST — every build
+
+Start9 0.4.x ignores a same-version rebuild (the install silently does nothing). Before `make`:
+
+1. Edit `PACKAGE_VERSION` in `start9/0.4/startos/utils.ts`.
+2. Add `start9/0.4/startos/versions/v0.1.0.NN.ts`.
+3. Register it in `start9/0.4/startos/versions/index.ts`: import it, set it as `current`, and move the prior `current` into `other[]`.
+
+## Build (x86_64 only)
+
+```bash
+cd start9/0.4 && make        # -> ten-database_x86_64.s9pk
+```
+
+## Install — PRODUCTION
+
+```bash
+start-cli package install -s ten-database_x86_64.s9pk   # target host = $START9_BOX_HOST
+```
+
+- `$START9_BOX_HOST` resolves from your local `start-cli` context config — the real hostname is **not** in this repo.
+- **Get explicit user authorization before any production deploy/install.** Verify a new migration against a **copy** of `data/crm.db` first, never the box's DB.
@@ -0,0 +1,26 @@
+---
+paths:
+  - backend/redaction/**
+  - backend/mcp/**
+---
+
+# Redaction & the Claude privacy boundary
+
+Read this before editing anything that sends data to a Claude model — the redaction layer or any MCP agent/tool path.
+
+## The boundary
+
+- `backend/redaction/` (`scrub.py` + `client.py`) is the **scrub → Claude → re-hydrate** boundary: `Boundary`, `SCRUB_BACKEND=local|gateway`, **fail-closed**.
+- `SCRUB_BACKEND=gateway` routes scrubbing through Spark Control (caller-supplied dict); `SCRUB_BACKEND=local` scrubs in-process. If scrubbing can't run, the call fails closed: raw text is never passed through to the model.
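To make "fail-closed" concrete, here is a minimal sketch of the scrub, call, re-hydrate shape. It is not the `Boundary` API from `backend/redaction/`: `guarded_call`, `ScrubError`, and the `(scrubbed_text, mapping)` return shape of the scrubber are assumptions chosen for illustration.

```python
from typing import Callable

# Assumed scrubber contract: returns scrubbed text plus placeholder -> original mapping.
Scrubber = Callable[[str], tuple[str, dict[str, str]]]


class ScrubError(RuntimeError):
    """Raised when scrubbing cannot run; the model call is abandoned."""


def guarded_call(text: str, scrub: Scrubber, model: Callable[[str], str]) -> str:
    """Scrub, call the model on scrubbed text only, then re-hydrate the reply."""
    try:
        scrubbed, mapping = scrub(text)
    except Exception as exc:
        # Fail closed: never fall back to sending the raw text.
        raise ScrubError("scrub failed; refusing to send raw text") from exc
    reply = model(scrubbed)
    for placeholder, original in mapping.items():
        reply = reply.replace(placeholder, original)
    return reply
```

The design point is that the model callable is reachable only after a successful scrub, so no error path exists in which it sees raw input.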
+
+## Hard rules
+
+- **Keep real LP data out of Claude.** Develop only on code/schema/synthetic-or-locally-redacted data. Route any real record substance through `backend/redaction` before it reaches a Claude model.
+- **Never bulk-export the LP list** to any third party. Send only minimal, non-sensitive context to Claude.
+- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).
+
+## When adding a new Claude/MCP call
+
+Trace the data path: any field carrying LP substance must cross `Boundary` first. A new MCP tool that reads CRM rows and hands them to a model without scrubbing is a leak — add it to the redaction path and extend the leak tests in `backend/redaction/test_*.py`.
+
+See also `docs/redaction-rehydration.md` and `docs/spark-control-scrub-endpoints.md`.
@@ -0,0 +1,21 @@
+---
+paths:
+  - backend/ingest/**
+---
+
+# Ingest, retrieval & Spark/Qdrant
+
+Read this before editing the ingest pipeline or retrieval modes.
+
+## Pipeline
+
+- `backend/ingest/` is chunk → embed → Qdrant plus retrieval modes (`search.py`, `embed.py`, `qdrant_io.py`, `sparse.py`, `entity_resolution.py`).
+- Local model services (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`) are **always reached via Spark Control** (`SPARK_CONTROL_URL`), never by calling a Spark directly. The retrieval/embeddings contract is `docs/EMBEDDINGS.md`; honor it.
+
+## Hard rule
+
+- **Never treat Qdrant (or any derived index) as source of truth.** The CRM / SQLite is canonical and the index is rebuildable from it. Code may drop and rebuild the Qdrant collection; it must never read a fact from Qdrant that isn't recoverable from SQLite.
+
+## Entity resolution
+
+The two-investor-model reconciliation (classic `contacts`/`lp_profiles` vs the `fundraising_*` grid → canonical IDs) is the core entity-resolution task. See `backend/entity_*.py` and `docs/crm-overview.md`.
@@ -0,0 +1,28 @@
+---
+paths:
+  - backend/thesis_seed.py
+  - backend/thesis_review.py
+  - backend/mcp/architect_agent.py
+  - backend/mcp/architect_tools.py
+  - backend/mcp/architect_grounding.py
+---
+
+# Thesis Workshop & canonical gate
+
+Read this before editing thesis nodes, versions, the review flow, or the Architect copilot.
+
+## The two layers
+
+- **Working tree** — thesis nodes with status `draft | candidate | approved | retired`. Code and seeds may move nodes around this ladder freely.
+- **Canonical** — a frozen `thesis_version`, the read source for the live agent prompt. A version becomes canonical **only** by human **dual** sign-off through `backend/thesis_review.py` (currently Grant + Jonathan).
+
+## Hard rules
+
+- **Never set a `thesis_version` canonical from code or seeds.** That is human dual sign-off, full stop. `ensure_*` seeders may promote a *working* spine to `approved` (node-level, reversible) but must not freeze a canonical version.
+- **Soft-delete subtlety — this trips people up:** `_node_tree` and `create_thesis_version` filter on `deleted_at IS NULL` and **ignore status**. So to drop a node from *both* the live agent prompt and version snapshots you must set `deleted_at` — setting `status='retired'` alone leaves it in the tree.
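The soft-delete subtlety is easy to reproduce in isolation. The sketch below uses a hypothetical `thesis_nodes` table and a stand-in query; it mirrors only the `deleted_at IS NULL` filter described above, not the real `_node_tree` implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thesis_nodes ("
    "id INTEGER PRIMARY KEY, title TEXT, status TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO thesis_nodes (title, status, deleted_at) VALUES (?, ?, ?)",
    [
        ("kept", "approved", None),
        ("retired-only", "retired", None),  # status changed, deleted_at unset
        ("soft-deleted", "retired", "2026-06-12T00:00:00Z"),
    ],
)


def visible_titles(conn: sqlite3.Connection) -> list[str]:
    """Stand-in for a tree query that filters on deleted_at and ignores status."""
    rows = conn.execute(
        "SELECT title FROM thesis_nodes WHERE deleted_at IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]


print(visible_titles(conn))  # ['kept', 'retired-only']
```

The `retired-only` row still appears because nothing in the filter looks at `status`; only the row with `deleted_at` set drops out.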
+
+## Boot behavior
+
+- On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible) — it does **not** freeze a canonical version. Promotion to canonical still waits on dual sign-off in the Workshop.
+
+See also `docs/thesis-handoff.md` for the current thesis content state.