7ad0ee7624
New backend/matrix_intake/ runs as its own process (matrix-nio isolated from the stdlib CRM): local-Qwen parse via Spark Control → in-thread human approval (yes/edit/no) → write through the CRM's own log-communication endpoint, tagged source=matrix_intake. Adds read-only GET /api/intake/match (returns grid row id, no-duplicate contract); threads provenance through handle_log_fundraising_communication. Reviewer-passed: pop-before-commit closes a double-approve race; edit-grammar fix. Text-only v1; business-card photo (M3) deferred (no Spark vision model). 26/26 tests green; live Matrix smoke pending deploy.
Matrix intake bot
Read this before editing backend/matrix_intake/. The bot turns a typed message in a
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on in-thread human
approval before any write. Phase status: M1 + M2 built (text intake + approval + write);
M3 (business-card photo) deferred — Spark Control has no vision model yet.
What it is (and isn't)
- A separate process, not part of the CRM. Its only third-party dep, `matrix-nio`, lives in `backend/matrix_intake/requirements.txt` and must never be added to the stdlib CRM (`backend/server.py`). Runs on the Spark (placement per `standards/guides/placement.md`).
- It drafts; a human approves. Nothing is written autonomously; every CRM write follows a `yes` reply in the proposal thread. This is exempt from "agents draft, humans send" the same way the digest is: it's internal data entry to our own CRM, not outward LP contact.
- It is not a parallel write path. It reuses the CRM's own canonical endpoint `POST /api/fundraising/log-communication` (create-if-missing + contact upsert + note + relational sync + audit) for both new-investor and existing-note cases. Don't reimplement grid mutation in the bot.
Flow
- Top-level message in the intake room → `parse.parse_message` → local Qwen via Spark Control (`spark.py` reuses `backend/ingest/llm.py`; temp 0, JSON only) extracts `{intent, investor_name, contact_name, contact_email, contact_title, note}`.
- `crm_client.match` (`GET /api/intake/match`) checks new-vs-existing and returns the grid row id so an approved note lands on exactly that investor (no duplicate).
- The proposal is posted in a thread rooted at the user's message; the pending proposal is held in memory keyed by that thread root (`proposals.ProposalStore`).
- User replies in-thread: `yes` / `edit field=value` / `no`. On `yes`, `crm_client.commit` POSTs to `log-communication` tagged `source="matrix_intake"` (provenance in the audit log).
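The in-thread reply grammar in the last step can be sketched as a small parser. This is a minimal illustration, not the bot's actual code: `Reply`, `parse_reply`, `EDITABLE_FIELDS`, and the `email` → `contact_email` alias are all hypothetical names, and the field set is assumed to mirror the parse output listed above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: editable fields mirror the keys the parser extracts.
EDITABLE_FIELDS = {
    "intent", "investor_name", "contact_name",
    "contact_email", "contact_title", "note",
}
# Assumption: short aliases such as "email" map onto the full field name.
ALIASES = {"email": "contact_email"}

@dataclass
class Reply:
    action: str                  # "yes", "no", "edit", or "unknown"
    field: Optional[str] = None
    value: Optional[str] = None

# "edit <field>=<value>"; everything after the first "=" is the value.
_EDIT_RE = re.compile(r"^edit\s+(\w+)\s*=\s*(.*)$", re.IGNORECASE | re.DOTALL)

def parse_reply(text: str) -> Reply:
    """Classify one in-thread reply; anything unrecognised is 'unknown'."""
    body = text.strip()
    lowered = body.lower()
    if lowered in ("yes", "no"):
        return Reply(lowered)
    match = _EDIT_RE.match(body)
    if match:
        field = match.group(1).lower()
        field = ALIASES.get(field, field)
        value = match.group(2).strip()
        if field in EDITABLE_FIELDS and value:
            return Reply("edit", field, value)
    return Reply("unknown")
```

Treating every unrecognised reply as `unknown` (rather than guessing) keeps the approval gate strict: only an exact `yes` can lead to a write.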
Rules / gotchas
- Module-name collision: the intake config module is `settings.py`, not `config.py`, because `backend/ingest/config.py` is imported (as bare `config`) through `spark → llm`. A second `config` module would shadow it in `sys.modules` and break `llm` (`CHAT_MODEL`). Keep intake module names from colliding with ingest's (`config`, `http_util`, `llm`).
- Email integrity: `parse.normalize` only keeps an address that literally appears in the source message; the model must never mint one (a wrong email is worse than none). It takes the first address in the text, so a two-person message ("Alice a@x.com and Bob b@y.com") could attach the wrong one; the human sees it in the proposal and can `edit email=…` before approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal for v1.
- Double-approve guard: `handle_reply` pops the pending proposal from the store before awaiting the commit, so a second `yes` arriving mid-write is a no-op (asyncio is cooperative; the pop is atomic w.r.t. other events). On commit failure the proposal is restored for retry.
- Local-only parse: intake text is real LP substance but goes ONLY to local Qwen via Spark Control, never Claude, so no scrub boundary applies (same basis as the digest). Never call a Spark directly; always go through `SPARK_CONTROL_URL`.
- Auth: the CRM has no service-key path; the bot logs in as a dedicated CRM user (`CRM_BOT_USERNAME` / `CRM_BOT_PASSWORD`) → Bearer JWT, re-login once on 401.
- Tests are offline: `test_parse.py` / `test_proposals.py` / `test_crm_client.py` stub the network; `backend/test_intake_endpoints.py` boots the real server against a temp DB and covers `/api/intake/match` + the create→match (no-duplicate) contract + provenance. A live Matrix smoke needs creds + `matrix-nio` installed on the Spark; it can't run in CI.
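The double-approve guard above relies on removing the proposal before the first `await`. A minimal sketch of that ordering follows; the `ProposalStore` here is a stand-in written for illustration, and `approve` and its return strings are hypothetical, not the real `handle_reply`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

class ProposalStore:
    """Illustrative in-memory store keyed by thread root (not the real class)."""
    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}

    def put(self, thread_root: str, proposal: dict) -> None:
        self._pending[thread_root] = proposal

    def pop(self, thread_root: str) -> Optional[dict]:
        return self._pending.pop(thread_root, None)

async def approve(
    store: ProposalStore,
    thread_root: str,
    commit: Callable[[dict], Awaitable[None]],
) -> str:
    # Pop BEFORE awaiting: no other coroutine can run between the lookup and
    # the removal, so a second "yes" finds nothing and becomes a no-op.
    proposal = store.pop(thread_root)
    if proposal is None:
        return "ignored"
    try:
        await commit(proposal)
    except Exception:
        # Restore so the human can retry after a failed write.
        store.put(thread_root, proposal)
        return "failed"
    return "committed"

async def demo() -> tuple:
    """Two approvals race on one proposal; only one write should happen."""
    store = ProposalStore()
    store.put("$root", {"investor_name": "Example LP"})
    writes = []

    async def slow_commit(proposal: dict) -> None:
        await asyncio.sleep(0.01)   # simulate the HTTP round trip
        writes.append(proposal)

    results = await asyncio.gather(
        approve(store, "$root", slow_commit),
        approve(store, "$root", slow_commit),
    )
    return list(results), writes
```

If the lookup and the removal were separated by an `await` (for example, get, commit, then delete), both approvals would see the proposal and the CRM would receive two writes.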
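The email-integrity rule above can be illustrated with a short sketch. It reflects one reading of the rule (use the first address found in the source text, and drop whatever the model returned if the text has none); `first_literal_email`, `normalize_email`, and the regex are hypothetical and are not taken from `parse.normalize`.

```python
import re
from typing import Optional

# Deliberately simple address pattern; an assumption, not the project's regex.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def first_literal_email(source_text: str) -> Optional[str]:
    """Return the first address that literally appears in the message, if any."""
    match = _EMAIL_RE.search(source_text)
    return match.group(0) if match else None

def normalize_email(parsed: dict, source_text: str) -> dict:
    """Replace the model's contact_email with one grounded in the source text."""
    cleaned = dict(parsed)
    # The model's value is never trusted on its own: either the text supplies
    # an address or the field is cleared.
    cleaned["contact_email"] = first_literal_email(source_text)
    return cleaned
```

With the two-person message from the rule, `first_literal_email("Alice a@x.com and Bob b@y.com")` picks `a@x.com` even if Bob is the named contact, which is exactly the case the human is expected to catch in the proposal.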
Config
All in .env (names in .env.example): MATRIX_HOMESERVER, MATRIX_USER,
MATRIX_ACCESS_TOKEN, MATRIX_DEVICE_ID, MATRIX_INTAKE_ROOM; CRM_API_BASE,
CRM_BOT_USERNAME, CRM_BOT_PASSWORD, CRM_API_VERIFY_TLS. Spark settings are inherited from
the ingest client (SPARK_CONTROL_URL, CRM_CHAT_MODEL).
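A fail-fast loader for these variables might look like the sketch below. It is an assumption-laden illustration rather than the contents of `settings.py`: `load_settings`, the choice of which names are required, and the default and parsing of `CRM_API_VERIFY_TLS` are all hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: every name except CRM_API_VERIFY_TLS must be set.
REQUIRED = (
    "MATRIX_HOMESERVER", "MATRIX_USER", "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID", "MATRIX_INTAKE_ROOM",
    "CRM_API_BASE", "CRM_BOT_USERNAME", "CRM_BOT_PASSWORD",
)

def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Read intake settings from the environment, failing on missing names."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing intake settings: " + ", ".join(missing))
    settings: Dict[str, object] = {name: env[name] for name in REQUIRED}
    # Assumption: TLS verification stays on unless explicitly disabled.
    raw = env.get("CRM_API_VERIFY_TLS", "true").strip().lower()
    settings["CRM_API_VERIFY_TLS"] = raw not in ("0", "false", "no")
    return settings
```

Reporting all missing names at once, rather than the first one found, saves a restart cycle per variable when setting the bot up on a new host.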