Keysat 7ad0ee7624 Add Matrix intake bot (M1+M2): typed message → approved fundraising-grid write

New backend/matrix_intake/ runs as its own process (matrix-nio isolated from the
stdlib CRM): local-Qwen parse via Spark Control → in-thread human approval
(yes/edit/no) → write through the CRM's own log-communication endpoint, tagged
source=matrix_intake. Adds read-only GET /api/intake/match (returns grid row id,
no-duplicate contract); threads provenance through handle_log_fundraising_communication.
Reviewer-passed: pop-before-commit closes a double-approve race; edit-grammar fix.
Text-only v1; business-card photo (M3) deferred (no Spark vision model).
26/26 tests green; live Matrix smoke pending deploy.

2026-06-17 07:51:27 -05:00

paths

backend/matrix_intake/**

Matrix intake bot

Read this before editing backend/matrix_intake/. The bot turns a typed message in a dedicated Matrix room into a proposed fundraising-grid add/edit, gated on in-thread human approval before any write. Phase status: M1 + M2 built (text intake + approval + write); M3 (business-card photo) deferred — Spark Control has no vision model yet.

What it is (and isn't)

A separate process, not part of the CRM. Its only third-party dep, matrix-nio, lives in backend/matrix_intake/requirements.txt and must never be added to the stdlib CRM (backend/server.py). Runs on the Spark (placement per standards/guides/placement.md).
It drafts; a human approves. Nothing is written autonomously — every CRM write follows a yes reply in the proposal thread. This is exempt from "agents draft, humans send" the same way the digest is: it's internal data entry to our own CRM, not outward LP contact.
It is not a parallel write path. It reuses the CRM's own canonical endpoint POST /api/fundraising/log-communication (create-if-missing + contact upsert + note + relational sync + audit) for both new-investor and existing-note cases. Don't reimplement grid mutation in the bot.

Flow

A top-level message in the intake room goes to parse.parse_message, which calls local Qwen via Spark Control (spark.py reuses backend/ingest/llm.py; temp 0, JSON only) to extract {intent, investor_name, contact_name, contact_email, contact_title, note}.
crm_client.match (GET /api/intake/match) checks new-vs-existing and returns the grid row id so an approved note lands on exactly that investor (no duplicate).
The proposal is posted in a thread rooted at the user's message; the pending proposal is held in memory keyed by that thread root (proposals.ProposalStore).
User replies in-thread: yes / edit field=value / no. On yes, crm_client.commit POSTs to log-communication tagged source="matrix_intake" (provenance in the audit log).
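
The in-thread reply grammar from the last step (yes / edit field=value / no) can be sketched as a small pure function. This is an illustration only, not the bot's actual parser: `Reply`, `parse_reply`, `CANONICAL_FIELDS`, and `ALIASES` are hypothetical names, and the alias table is an assumption based on the `edit email=…` example in the rules below.

```python
from dataclasses import dataclass
from typing import Optional

# Field names taken from the parse output listed in the flow above.
CANONICAL_FIELDS = {
    "intent", "investor_name", "contact_name",
    "contact_email", "contact_title", "note",
}
# Assumption: short aliases map onto canonical names.
ALIASES = {"email": "contact_email"}


@dataclass(frozen=True)
class Reply:
    action: str                 # "yes", "no", "edit", or "unknown"
    key: Optional[str] = None   # canonical field name for "edit"
    value: Optional[str] = None


def parse_reply(text: str) -> Reply:
    """Classify one in-thread reply; anything unrecognised is 'unknown'."""
    body = text.strip()
    if body.lower() in ("yes", "no"):
        return Reply(body.lower())
    if body[:5].lower() == "edit ":
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = body[5:].partition("=")
        name = name.strip().lower()
        name = ALIASES.get(name, name)
        value = value.strip()
        if sep and name in CANONICAL_FIELDS and value:
            return Reply("edit", name, value)
    return Reply("unknown")
```

Returning an explicit "unknown" action rather than raising lets the caller leave the proposal pending and ask the user to rephrase.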

Rules / gotchas

Module-name collision: the intake config module is settings.py, not config.py, because backend/ingest/config.py is imported (as bare config) through spark → llm. A second config module would shadow it in sys.modules and break llm (CHAT_MODEL). Keep intake module names from colliding with ingest's (config, http_util, llm).
Email integrity: parse.normalize only keeps an address that literally appears in the source message — the model must never mint one (a wrong email is worse than none). It takes the first address in the text, so a two-person message ("Alice a@x.com and Bob b@y.com") could attach the wrong one; the human sees it in the proposal and can edit email=… before approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal for v1.
Double-approve guard: handle_reply pops the pending proposal from the store before awaiting the commit, so a second yes arriving mid-write finds nothing pending and is a no-op. This works because asyncio scheduling is cooperative: the pop completes before the handler yields at the await, so no other event can interleave with it. On commit failure the proposal is restored to the store for retry.
Local-only parse: intake text is real LP substance but goes ONLY to local Qwen via Spark Control, never Claude — so no scrub boundary applies (same basis as the digest). Never call a Spark directly; always go through SPARK_CONTROL_URL.
Auth: the CRM has no service-key path, so the bot logs in as a dedicated CRM user (CRM_BOT_USERNAME/CRM_BOT_PASSWORD) to obtain a Bearer JWT, and re-logs in once on a 401.
Tests are offline: test_parse.py / test_proposals.py / test_crm_client.py stub the network; backend/test_intake_endpoints.py boots the real server against a temp DB and covers /api/intake/match + the create→match (no-duplicate) contract + provenance. A live Matrix smoke needs creds + matrix-nio installed on the Spark — it can't run in CI.
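
Two of the rules above are easy to get subtly wrong, so here is a minimal sketch of each: the email-integrity rule (keep only the first address that literally appears in the message) and the pop-before-commit guard. Every name here (`first_literal_email`, `PendingStore`, `approve`, `demo`) is hypothetical and the email regex is a deliberately simple assumption; the real logic lives in parse.normalize, proposals.ProposalStore, and handle_reply and may differ.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, Optional

# Assumption: a simple pattern is enough for illustration; not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_literal_email(source_text: str) -> Optional[str]:
    """Return the first address found in the message text, else None.

    The model's output is never consulted, so it cannot mint an address.
    """
    match = EMAIL_RE.search(source_text)
    return match.group(0) if match else None


class PendingStore:
    """In-memory pending proposals keyed by thread root (hypothetical shape)."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}

    def put(self, thread_root: str, proposal: dict) -> None:
        self._pending[thread_root] = proposal

    def pop(self, thread_root: str) -> Optional[dict]:
        return self._pending.pop(thread_root, None)


async def approve(
    store: PendingStore,
    thread_root: str,
    commit: Callable[[dict], Awaitable[None]],
) -> str:
    # Claim the proposal BEFORE the first await: no other handler can run
    # between this pop and the await below.
    proposal = store.pop(thread_root)
    if proposal is None:
        return "ignored"  # already claimed by an earlier yes
    try:
        await commit(proposal)
    except Exception:
        store.put(thread_root, proposal)  # restore so the user can retry
        return "failed"
    return "committed"


async def demo():
    """Two yes replies race; only one write should happen."""
    store = PendingStore()
    store.put("$root", {"investor_name": "Example Capital"})
    writes = []

    async def slow_commit(proposal: dict) -> None:
        await asyncio.sleep(0)  # yield, as a real HTTP call would
        writes.append(proposal)

    results = await asyncio.gather(
        approve(store, "$root", slow_commit),
        approve(store, "$root", slow_commit),
    )
    return results, writes
```

Running `asyncio.run(demo())` yields one committed and one ignored approval with a single write. If the pop were moved after the await, both handlers would see the proposal and write twice.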

Config

All in .env (names in .env.example): MATRIX_HOMESERVER, MATRIX_USER, MATRIX_ACCESS_TOKEN, MATRIX_DEVICE_ID, MATRIX_INTAKE_ROOM; CRM_API_BASE, CRM_BOT_USERNAME, CRM_BOT_PASSWORD, CRM_API_VERIFY_TLS. Spark settings are inherited from the ingest client (SPARK_CONTROL_URL, CRM_CHAT_MODEL).
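
A fail-fast loader for these variables might look like the sketch below. It is not the contents of settings.py: `load_settings` and `REQUIRED` are hypothetical names, treating every listed variable except CRM_API_VERIFY_TLS as required is an assumption, and so is the boolean parsing of CRM_API_VERIFY_TLS. It reads from a mapping that is assumed to be already populated from .env.

```python
import os
from typing import Mapping

# Assumption: all of these must be set; .env.example is the source of truth.
REQUIRED = (
    "MATRIX_HOMESERVER", "MATRIX_USER", "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID", "MATRIX_INTAKE_ROOM",
    "CRM_API_BASE", "CRM_BOT_USERNAME", "CRM_BOT_PASSWORD",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect intake settings, raising once with every missing name."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing intake settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # Assumption: TLS verification defaults to on; only explicit
    # false-like values turn it off.
    raw = env.get("CRM_API_VERIFY_TLS", "true").strip().lower()
    settings["CRM_API_VERIFY_TLS"] = raw not in ("0", "false", "no", "off")
    return settings
```

Reporting all missing names in one error avoids a fix-one-rerun loop when deploying to a fresh host.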