7ad0ee7624
New backend/matrix_intake/ runs as its own process (matrix-nio isolated from the stdlib CRM): local-Qwen parse via Spark Control → in-thread human approval (yes/edit/no) → write through the CRM's own log-communication endpoint, tagged source=matrix_intake. Adds read-only GET /api/intake/match (returns grid row id, no-duplicate contract); threads provenance through handle_log_fundraising_communication. Reviewer-passed: pop-before-commit closes a double-approve race; edit-grammar fix. Text-only v1; business-card photo (M3) deferred (no Spark vision model). 26/26 tests green; live Matrix smoke pending deploy.
69 lines
4.3 KiB
Markdown
---
paths:
  - backend/matrix_intake/**
---

# Matrix intake bot

Read this before editing `backend/matrix_intake/`. The bot turns a message typed into a
dedicated Matrix room into a proposed fundraising-grid add or edit. Nothing is written until a
human approves it **in the proposal's thread**. Phase status: **M1 + M2 built** (text intake,
approval, write); **M3 (business-card photo) deferred** because Spark Control has no vision
model yet.

## What it is (and isn't)

- A **separate process**, not part of the CRM. Its only third-party dependency, `matrix-nio`,
  lives in `backend/matrix_intake/requirements.txt` and **must never** be added to the
  stdlib-only CRM (`backend/server.py`). It runs on the Spark (placement per
  `standards/guides/placement.md`).
- It **drafts; a human approves.** Nothing is written autonomously: every CRM write follows a
  `yes` reply in the proposal thread. It is exempt from "agents draft, humans send" for the
  same reason the digest is. This is internal data entry into our own CRM, not outward LP
  contact.
- It is **not** a parallel write path. It reuses the CRM's own canonical endpoint,
  `POST /api/fundraising/log-communication` (create-if-missing + contact upsert + note +
  relational sync + audit). That endpoint handles both new investors and notes on existing
  ones. Don't reimplement grid mutation in the bot.

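The sketch below shows what "reuse the canonical endpoint" means on the bot's side, assuming a JSON request body. The bot builds one `POST` to `log-communication` and leaves create-if-missing, upsert, sync, and audit to the CRM. The function name `build_commit_request`, the `row_id` key, and the proposal's key names are hypothetical. Only the endpoint path, Bearer auth, and `source="matrix_intake"` come from this guide.

```python
import json
import urllib.request
from typing import Optional

# Path from this guide; payload keys other than `source` are hypothetical
# and may not match the CRM's actual request schema.
LOG_COMMUNICATION_PATH = "/api/fundraising/log-communication"


def build_commit_request(
    api_base: str, token: str, proposal: dict, row_id: Optional[int] = None
) -> urllib.request.Request:
    """Build (but do not send) the single POST that records an approved proposal."""
    payload = dict(proposal)  # copy so the stored proposal is not mutated
    if row_id is not None:
        # Hypothetical key: pins the note to the grid row returned by /api/intake/match.
        payload["row_id"] = row_id
    payload["source"] = "matrix_intake"  # provenance tag recorded in the audit log
    return urllib.request.Request(
        url=api_base.rstrip("/") + LOG_COMMUNICATION_PATH,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # JWT from the bot user's login
        },
    )
```

Sending the request is left to the caller (for example `urllib.request.urlopen`). In this sketch all grid mutation stays server-side.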
## Flow

1. A top-level message in the intake room goes to `parse.parse_message`, which calls local
   **Qwen via Spark Control** (`spark.py` reuses `backend/ingest/llm.py`; temperature 0,
   JSON-only output) to extract
   `{intent, investor_name, contact_name, contact_email, contact_title, note}`.
2. `crm_client.match` (read-only `GET /api/intake/match`) checks whether the investor is new
   or already on the grid. For an existing investor it returns the **grid row id**, so an
   approved note lands on exactly that investor instead of creating a duplicate.
3. The proposal is posted **in a thread** rooted at the user's message. The pending proposal is
   held in memory, keyed by that thread root (`proposals.ProposalStore`).
4. The user replies in the thread with `yes`, `edit field=value`, or `no`. On `yes`,
   `crm_client.commit` POSTs to `log-communication` tagged `source="matrix_intake"`, which
   records provenance in the audit log.

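The step-4 reply grammar can be sketched as a small classifier. This is an illustration, not the bot's `handle_reply`. The function names and the alias table are assumptions: `email` maps to `contact_email` because this guide uses the form `edit email=…`. Rejecting unknown fields is also an assumption.

```python
from typing import Optional, Tuple

# Fields from the step-1 parse schema a reviewer may overwrite (`intent` excluded).
EDITABLE_FIELDS = {"investor_name", "contact_name", "contact_email", "contact_title", "note"}

# Hypothetical short aliases, e.g. `edit email=...` -> contact_email.
ALIASES = {
    "investor": "investor_name",
    "contact": "contact_name",
    "email": "contact_email",
    "title": "contact_title",
}


def parse_reply(text: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Classify a thread reply as ("yes" | "no" | "edit" | "unknown", field, value)."""
    stripped = text.strip()
    lowered = stripped.lower()
    if lowered in ("yes", "no"):
        return lowered, None, None
    if lowered.startswith("edit "):
        # Split on the first "=" only, so a value may itself contain "=".
        key, sep, value = stripped[len("edit "):].partition("=")
        key = key.strip().lower()
        field = ALIASES.get(key, key)
        if sep and field in EDITABLE_FIELDS and value.strip():
            return "edit", field, value.strip()
    return "unknown", None, None


def apply_edit(proposal: dict, field: str, value: str) -> dict:
    """Return a copy of the proposal with one field replaced."""
    updated = dict(proposal)
    updated[field] = value
    return updated
```

Returning `unknown` rather than guessing keeps a typo from silently editing the wrong field. What the real bot does with unrecognized replies isn't specified here.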
## Rules / gotchas

- **Module-name collision:** the intake config module is `settings.py`, **not** `config.py`,
  because `spark → llm` imports `backend/ingest/config.py` as a bare `config`. A second
  `config` module would shadow it in `sys.modules` and break `llm`, which reads `CHAT_MODEL`
  from it. Keep intake module names from colliding with ingest's (`config`, `http_util`,
  `llm`).
- **Email integrity:** `parse.normalize` keeps an address only if it literally appears in the
  source message. The model must never invent one (a wrong email is worse than none).
  `normalize` takes the **first** address in the text, so a two-person message
  ("Alice a@x.com and Bob b@y.com") could attach the wrong one. The human sees it in the
  proposal and can `edit email=…` before approving. Matching multiple addresses to the named
  contact is a deliberate non-goal for v1.
- **Double-approve guard:** `handle_reply` pops the pending proposal from the store *before*
  awaiting the commit, so a second `yes` that arrives mid-write finds nothing and is a no-op.
  asyncio is cooperative, and nothing is awaited between the lookup and the removal, so no
  other event can interleave. If the commit fails, the proposal is restored for a retry.
- **Local-only parse:** intake text contains real LP substance, but it goes ONLY to local Qwen
  via Spark Control, never to Claude. No scrub boundary applies (same basis as the digest).
  Never call a Spark directly; always go through `SPARK_CONTROL_URL`.
- **Auth:** the CRM has no service-key path, so the bot logs in as a dedicated CRM user
  (`CRM_BOT_USERNAME`/`CRM_BOT_PASSWORD`) to get a Bearer JWT. If a request returns 401, it
  logs in again once.
- **Tests** are offline. `test_parse.py`, `test_proposals.py`, and `test_crm_client.py` stub the
  network. `backend/test_intake_endpoints.py` boots the real server against a temp DB and
  covers `/api/intake/match`, the create→match (no-duplicate) contract, and provenance. A
  **live Matrix smoke test** needs credentials and `matrix-nio` installed on the Spark, so it
  can't run in CI.

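The double-approve guard can be sketched with plain `asyncio`. `ProposalStore` here is a simplified stand-in for `proposals.ProposalStore`, and its method names are assumptions. `handle_yes` is a hypothetical reduction of `handle_reply` to the `yes` branch. The property that matters is that `pop` runs before the first `await`, so a second `yes` finds nothing.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class ProposalStore:
    """Simplified stand-in: pending proposals keyed by thread-root event id."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}

    def put(self, root_id: str, proposal: dict) -> None:
        self._pending[root_id] = proposal

    def get(self, root_id: str) -> Optional[dict]:
        return self._pending.get(root_id)

    def pop(self, root_id: str) -> Optional[dict]:
        return self._pending.pop(root_id, None)


async def handle_yes(
    store: ProposalStore, root_id: str, commit: Callable[[dict], Awaitable[None]]
) -> str:
    # Pop BEFORE the first await: no other handler can run between the lookup
    # and the removal, so only one `yes` ever sees the proposal.
    proposal = store.pop(root_id)
    if proposal is None:
        return "no-op"  # already committed, or a commit is in flight
    try:
        await commit(proposal)  # other events (e.g. a second `yes`) may run here
    except Exception:
        store.put(root_id, proposal)  # restore so the user can retry
        return "failed"
    return "committed"
```

Because the removal is synchronous, the sketch needs no lock. A lock would only matter if an `await` separated the check from the removal.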
## Config

All in `.env` (names in `.env.example`): `MATRIX_HOMESERVER`, `MATRIX_USER`,
`MATRIX_ACCESS_TOKEN`, `MATRIX_DEVICE_ID`, `MATRIX_INTAKE_ROOM`; `CRM_API_BASE`,
`CRM_BOT_USERNAME`, `CRM_BOT_PASSWORD`, `CRM_API_VERIFY_TLS`. Spark settings are inherited from
the ingest client (`SPARK_CONTROL_URL`, `CRM_CHAT_MODEL`).
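Here is a minimal, hypothetical sketch of an intake `settings.py` loader for the variables above. Three parts are assumptions: which variables are mandatory, the `IntakeSettings` dataclass, and the rule that TLS verification stays on unless `CRM_API_VERIFY_TLS` is explicitly false-like. It leaves the Spark variables to the ingest client. It lives in `settings.py`, not `config.py`, for the `sys.modules` reason given under Rules.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: every Matrix/CRM variable except CRM_API_VERIFY_TLS is mandatory.
REQUIRED = (
    "MATRIX_HOMESERVER",
    "MATRIX_USER",
    "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID",
    "MATRIX_INTAKE_ROOM",
    "CRM_API_BASE",
    "CRM_BOT_USERNAME",
    "CRM_BOT_PASSWORD",
)


@dataclass(frozen=True)
class IntakeSettings:
    matrix_homeserver: str
    matrix_user: str
    matrix_access_token: str
    matrix_device_id: str
    matrix_intake_room: str
    crm_api_base: str
    crm_bot_username: str
    crm_bot_password: str
    crm_api_verify_tls: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> IntakeSettings:
    """Read intake settings; Spark variables are left to the ingest client."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("missing intake settings: " + ", ".join(missing))
    # Assumption: TLS verification stays on unless explicitly disabled.
    verify_raw = env.get("CRM_API_VERIFY_TLS", "true").strip().lower()
    return IntakeSettings(
        # Field names are the lowercase variable names, e.g. MATRIX_USER -> matrix_user.
        **{name.lower(): env[name].strip() for name in REQUIRED},
        crm_api_verify_tls=verify_raw not in ("0", "false", "no", "off"),
    )
```

Failing fast with every missing name listed makes a half-filled `.env` on the Spark obvious at startup, not at the first Matrix event.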