Add Matrix intake bot (M1+M2): typed message → approved fundraising-grid write

New backend/matrix_intake/ runs as its own process (matrix-nio isolated from the stdlib CRM): local-Qwen parse via Spark Control → in-thread human approval (yes/edit/no) → write through the CRM's own log-communication endpoint, tagged source=matrix_intake. Adds read-only GET /api/intake/match (returns grid row id, no-duplicate contract); threads provenance through handle_log_fundraising_communication. Reviewer-passed: pop-before-commit closes a double-approve race; edit-grammar fix. Text-only v1; business-card photo (M3) deferred (no Spark vision model). 26/26 tests green; live Matrix smoke pending deploy.
2026-06-17 07:51:27 -05:00
parent 172c76553b
commit 7ad0ee7624
20 changed files with 1169 additions and 7 deletions
@@ -0,0 +1,68 @@
+---
+paths:
+  - backend/matrix_intake/**
+---
+
+# Matrix intake bot
+
+Read this before editing `backend/matrix_intake/`. The bot turns a typed message in a
+dedicated Matrix room into a proposed fundraising-grid add/edit, gated on **in-thread human
+approval** before any write. Phase status: **M1 + M2 built** (text intake + approval + write);
+**M3 (business-card photo) deferred** — Spark Control has no vision model yet.
+
+## What it is (and isn't)
+
+- A **separate process**, not part of the CRM. Its only third-party dep, `matrix-nio`, lives
+  in `backend/matrix_intake/requirements.txt` and **must never** be added to the stdlib CRM
+  (`backend/server.py`). Runs on the Spark (placement per `standards/guides/placement.md`).
+- It **drafts; a human approves.** Nothing is written autonomously — every CRM write follows a
+  `yes` reply in the proposal thread. This is exempt from "agents draft, humans send" the same
+  way the digest is: it's internal data entry to our own CRM, not outward LP contact.
+- It is **not** a parallel write path. It reuses the CRM's own canonical endpoint
+  `POST /api/fundraising/log-communication` (create-if-missing + contact upsert + note +
+  relational sync + audit) for both new-investor and existing-note cases. Don't reimplement
+  grid mutation in the bot.
+
+## Flow
+
+1. Top-level message in the intake room → `parse.parse_message` → local **Qwen via Spark
+   Control** (`spark.py` reuses `backend/ingest/llm.py`; temp 0, JSON only) extracts
+   `{intent, investor_name, contact_name, contact_email, contact_title, note}`.
+2. `crm_client.match` (`GET /api/intake/match`) checks new-vs-existing and returns the **grid
+   row id** so an approved note lands on exactly that investor (no duplicate).
+3. The proposal is posted **in a thread** rooted at the user's message; the pending proposal is
+   held in memory keyed by that thread root (`proposals.ProposalStore`).
+4. User replies in-thread: `yes` / `edit field=value` / `no`. On `yes`, `crm_client.commit`
+   POSTs to `log-communication` tagged `source="matrix_intake"` (provenance in the audit log).
+
+## Rules / gotchas
+
+- **Module-name collision:** the intake config module is `settings.py`, **not** `config.py`,
+  because `backend/ingest/config.py` is imported (as bare `config`) through `spark → llm`. A
+  second `config` module would shadow it in `sys.modules` and break `llm` (`CHAT_MODEL`).
+  Keep intake module names from colliding with ingest's (`config`, `http_util`, `llm`).
+- **Email integrity:** `parse.normalize` keeps an address only if it literally appears in the
+  source message — the model must never mint one (a wrong email is worse than none). It takes
+  the **first** address in the text, so a two-person message ("Alice a@x.com and Bob b@y.com")
+  could attach the wrong one; the human sees it in the proposal and can `edit email=…` before
+  approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal
+  for v1.
+- **Double-approve guard:** `handle_reply` pops the pending proposal from the store *before*
+  awaiting the commit, so a second `yes` arriving mid-write finds nothing and is a no-op (the
+  pop has no `await`, so nothing can interleave). On commit failure it is restored for retry.
+- **Local-only parse:** intake text is real LP substance but goes ONLY to local Qwen via Spark
+  Control, never Claude — so no scrub boundary applies (same basis as the digest). Never call a
+  Spark directly; always go through `SPARK_CONTROL_URL`.
+- **Auth:** the CRM has no service-key path; the bot logs in as a dedicated CRM user
+  (`CRM_BOT_USERNAME`/`CRM_BOT_PASSWORD`) → Bearer JWT, re-login once on 401.
+- **Tests** are offline: `test_parse.py` / `test_proposals.py` / `test_crm_client.py` stub the
+  network; `backend/test_intake_endpoints.py` boots the real server against a temp DB and
+  covers `/api/intake/match` + the create→match (no-duplicate) contract + provenance. A **live
+  Matrix smoke** needs creds + `matrix-nio` installed on the Spark — it can't run in CI.
+
+## Config
+
+All in `.env` (names in `.env.example`): `MATRIX_HOMESERVER`, `MATRIX_USER`,
+`MATRIX_ACCESS_TOKEN`, `MATRIX_DEVICE_ID`, `MATRIX_INTAKE_ROOM`; `CRM_API_BASE`,
+`CRM_BOT_USERNAME`, `CRM_BOT_PASSWORD`, `CRM_API_VERIFY_TLS`. Spark settings are inherited from
+the ingest client (`SPARK_CONTROL_URL`, `CRM_CHAT_MODEL`).
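
Reviewer note on the double-approve guard in the rules above: a minimal sketch of the pop-before-commit pattern, not the code in this commit. The store internals, `handle_yes`, `slow_commit`, and the `"$root"` key are hypothetical names chosen for illustration; only the idea (claim the proposal synchronously, then await the write, restore on failure) comes from the doc.

```python
import asyncio


class ProposalStore:
    """In-memory pending proposals keyed by thread root (assumed shape)."""

    def __init__(self):
        self._pending = {}

    def put(self, thread_root, proposal):
        self._pending[thread_root] = proposal

    def pop(self, thread_root):
        # Synchronous: there is no await here, so no other coroutine
        # can run between the lookup and the removal.
        return self._pending.pop(thread_root, None)


async def handle_yes(store, thread_root, commit):
    """Commit the proposal for thread_root at most once per approval."""
    proposal = store.pop(thread_root)  # claim it BEFORE the first await
    if proposal is None:
        return "ignored"  # already claimed by an earlier `yes`, or unknown
    try:
        await commit(proposal)
    except Exception:
        store.put(thread_root, proposal)  # restore so the user can retry
        return "failed"
    return "committed"


async def demo():
    store = ProposalStore()
    store.put("$root", {"investor_name": "Example Capital"})
    writes = []

    async def slow_commit(proposal):
        await asyncio.sleep(0.01)  # stand-in for the CRM POST
        writes.append(proposal)

    # Two approvals race; the second one arrives while the first is mid-write.
    results = await asyncio.gather(
        handle_yes(store, "$root", slow_commit),
        handle_yes(store, "$root", slow_commit),
    )
    return results, writes
```

Running `asyncio.run(demo())` in this sketch performs a single write even though two approvals were handled. If the pop happened after the `await` instead, both handlers would still see the proposal and both would commit.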