Add Matrix intake bot (M1+M2): typed message → approved fundraising-grid write

New backend/matrix_intake/ runs as its own process (matrix-nio isolated from the
stdlib CRM): local-Qwen parse via Spark Control → in-thread human approval
(yes/edit/no) → write through the CRM's own log-communication endpoint, tagged
source=matrix_intake. Adds read-only GET /api/intake/match (returns grid row id,
no-duplicate contract); threads provenance through handle_log_fundraising_communication.
Reviewer-passed: pop-before-commit closes a double-approve race; edit-grammar fix.
Text-only v1; business-card photo (M3) deferred (no Spark vision model).
26/26 tests green; live Matrix smoke pending deploy.
Keysat
2026-06-17 07:51:27 -05:00
parent 172c76553b
commit 7ad0ee7624
20 changed files with 1169 additions and 7 deletions
@@ -0,0 +1,68 @@
---
paths:
- backend/matrix_intake/**
---
# Matrix intake bot
Read this before editing `backend/matrix_intake/`. The bot turns a typed message in a
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on **in-thread human
approval** before any write. Phase status: **M1 + M2 built** (text intake + approval + write);
**M3 (business-card photo) deferred** — Spark Control has no vision model yet.
## What it is (and isn't)
- A **separate process**, not part of the CRM. Its only third-party dep, `matrix-nio`, lives
in `backend/matrix_intake/requirements.txt` and **must never** be added to the stdlib CRM
(`backend/server.py`). Runs on the Spark (placement per `standards/guides/placement.md`).
- It **drafts; a human approves.** Nothing is written autonomously — every CRM write follows a
`yes` reply in the proposal thread. This is exempt from "agents draft, humans send" the same
way the digest is: it's internal data entry to our own CRM, not outward LP contact.
- It is **not** a parallel write path. It reuses the CRM's own canonical endpoint
`POST /api/fundraising/log-communication` (create-if-missing + contact upsert + note +
relational sync + audit) for both new-investor and existing-note cases. Don't reimplement
grid mutation in the bot.
## Flow
1. Top-level message in the intake room → `parse.parse_message` → local **Qwen via Spark
Control** (`spark.py` reuses `backend/ingest/llm.py`; temp 0, JSON only) extracts
`{intent, investor_name, contact_name, contact_email, contact_title, note}`.
2. `crm_client.match` (`GET /api/intake/match`) checks new-vs-existing and returns the **grid
row id** so an approved note lands on exactly that investor (no duplicate).
3. The proposal is posted **in a thread** rooted at the user's message; the pending proposal is
held in memory keyed by that thread root (`proposals.ProposalStore`).
4. User replies in-thread: `yes` / `edit field=value` / `no`. On `yes`, `crm_client.commit`
POSTs to `log-communication` tagged `source="matrix_intake"` (provenance in the audit log).
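The in-thread reply grammar from step 4 can be sketched as a small pure function. This is a minimal illustration, not the bot's actual parser: `Reply`, `parse_reply`, `EDITABLE_FIELDS`, and the `email` alias are hypothetical names, and the set of editable fields is assumed to mirror the parsed fields from step 1.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: editable fields mirror the parsed proposal fields from step 1.
EDITABLE_FIELDS = {"investor_name", "contact_name", "contact_email", "contact_title", "note"}
# Assumption: short aliases map onto the full field names.
FIELD_ALIASES = {"email": "contact_email"}


@dataclass(frozen=True)
class Reply:
    action: str                   # "yes", "no", "edit", or "unknown"
    field: Optional[str] = None   # set only for "edit"
    value: Optional[str] = None   # set only for "edit"


def parse_reply(text: str) -> Reply:
    """Classify an in-thread reply as yes / no / edit field=value."""
    stripped = text.strip()
    lowered = stripped.lower()
    if lowered in ("yes", "no"):
        return Reply(lowered)
    if lowered.startswith("edit "):
        # Split on the first "=" only, so values may themselves contain "=".
        field, sep, value = stripped[5:].partition("=")
        field = field.strip().lower()
        field = FIELD_ALIASES.get(field, field)
        value = value.strip()
        if sep and value and field in EDITABLE_FIELDS:
            return Reply("edit", field, value)
    # Anything else is left for the caller to ignore or answer with a usage hint.
    return Reply("unknown")
```

Keeping the parser free of I/O means the grammar can be tested offline, in the same spirit as `test_proposals.py`.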
## Rules / gotchas
- **Module-name collision:** the intake config module is `settings.py`, **not** `config.py`,
because `backend/ingest/config.py` is imported (as bare `config`) through `spark → llm`. A
second `config` module would shadow it in `sys.modules` and break `llm` (`CHAT_MODEL`).
Keep intake module names from colliding with ingest's (`config`, `http_util`, `llm`).
- **Email integrity:** `parse.normalize` only keeps an address that literally appears in the
source message — the model must never mint one (a wrong email is worse than none). It takes
the **first** address in the text, so a two-person message ("Alice a@x.com and Bob b@y.com")
could attach the wrong one; the human sees it in the proposal and can `edit email=…` before
approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal
for v1.
- **Double-approve guard:** `handle_reply` pops the pending proposal from the store *before*
awaiting the commit, so a second `yes` arriving mid-write finds nothing to approve and is a
no-op. This works because asyncio scheduling is cooperative: the pop is synchronous, so no
other event handler can run between the lookup and the removal. On commit failure the
proposal is restored for retry.
- **Local-only parse:** intake text is real LP substance but goes ONLY to local Qwen via Spark
Control, never Claude — so no scrub boundary applies (same basis as the digest). Never call a
Spark directly; always go through `SPARK_CONTROL_URL`.
- **Auth:** the CRM has no service-key path; the bot logs in as a dedicated CRM user
(`CRM_BOT_USERNAME`/`CRM_BOT_PASSWORD`) → Bearer JWT, re-login once on 401.
- **Tests** are offline: `test_parse.py` / `test_proposals.py` / `test_crm_client.py` stub the
network; `backend/test_intake_endpoints.py` boots the real server against a temp DB and
covers `/api/intake/match` + the create→match (no-duplicate) contract + provenance. A **live
Matrix smoke** needs creds + `matrix-nio` installed on the Spark — it can't run in CI.
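The double-approve guard above can be illustrated with a self-contained sketch. `PendingStore` and `approve` are hypothetical stand-ins for `proposals.ProposalStore` and `handle_reply`, whose real interfaces are not shown here; the point is only the ordering of the pop relative to the first `await`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple


class PendingStore:
    """Hypothetical in-memory store keyed by thread root."""

    def __init__(self) -> None:
        self._pending: Dict[str, Any] = {}

    def put(self, thread_root: str, proposal: Any) -> None:
        self._pending[thread_root] = proposal

    def pop(self, thread_root: str) -> Optional[Any]:
        return self._pending.pop(thread_root, None)


async def approve(
    store: PendingStore,
    thread_root: str,
    commit: Callable[[Any], Awaitable[None]],
) -> str:
    # Pop first: there is no await before this line, so no other handler can
    # observe the proposal between the lookup and the removal.
    proposal = store.pop(thread_root)
    if proposal is None:
        return "ignored"          # a second "yes" lands here
    try:
        await commit(proposal)    # other events may run while this is pending
    except Exception:
        store.put(thread_root, proposal)  # restore so the user can retry
        return "failed"
    return "committed"


async def demo() -> Tuple[List[str], int]:
    store = PendingStore()
    store.put("$root", {"investor_name": "Example Capital"})
    calls = 0

    async def slow_commit(proposal: Any) -> None:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)    # yield to the event loop, like a network call

    # Two approvals race on the same thread root.
    results = await asyncio.gather(
        approve(store, "$root", slow_commit),
        approve(store, "$root", slow_commit),
    )
    return list(results), calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Had the pop come after the `await`, both approvals would have read the same proposal and committed it twice.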
## Config
All in `.env` (names in `.env.example`): `MATRIX_HOMESERVER`, `MATRIX_USER`,
`MATRIX_ACCESS_TOKEN`, `MATRIX_DEVICE_ID`, `MATRIX_INTAKE_ROOM`; `CRM_API_BASE`,
`CRM_BOT_USERNAME`, `CRM_BOT_PASSWORD`, `CRM_API_VERIFY_TLS`. Spark settings are inherited from
the ingest client (`SPARK_CONTROL_URL`, `CRM_CHAT_MODEL`).
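As a rough sketch of how these variables might be read at startup, the snippet below fails fast when a name is missing. It is not the contents of `settings.py`: `IntakeSettings` and `load_settings` are hypothetical, treating every listed name as required is an assumption, and so are the accepted spellings and the default for `CRM_API_VERIFY_TLS`. It also assumes `.env` has already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: all of these are mandatory; the real module may treat some as optional.
REQUIRED = (
    "MATRIX_HOMESERVER",
    "MATRIX_USER",
    "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID",
    "MATRIX_INTAKE_ROOM",
    "CRM_API_BASE",
    "CRM_BOT_USERNAME",
    "CRM_BOT_PASSWORD",
)


@dataclass(frozen=True)
class IntakeSettings:
    matrix_homeserver: str
    matrix_user: str
    matrix_access_token: str
    matrix_device_id: str
    matrix_intake_room: str
    crm_api_base: str
    crm_bot_username: str
    crm_bot_password: str
    crm_api_verify_tls: bool


def load_settings(env: Mapping[str, str] = os.environ) -> IntakeSettings:
    """Build settings from an environment mapping, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing intake settings: " + ", ".join(missing))
    # Assumption: TLS verification defaults to on; only explicit falsy strings disable it.
    raw = env.get("CRM_API_VERIFY_TLS", "true").strip().lower()
    verify = raw not in ("0", "false", "no")
    values = {name.lower(): env[name] for name in REQUIRED}
    return IntakeSettings(crm_api_verify_tls=verify, **values)
```

Passing the mapping in as a parameter keeps the loader testable without touching the real environment.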