The email-capture "proposed grid notes" gain two review surfaces:
1. Inline source email — each proposed-note card on the Email Capture page
gets a "View email" toggle that lazily fetches the existing
GET /api/email/detail and shows from/to/cc/date/subject + scrollable body,
so a reviewer can judge the note against the email it was drafted from.
2. CRM->Matrix review bridge — the CRM (box, stdlib, no matrix-nio) can't post
to Matrix, so the intake bot (Spark) PULLS: GET /api/intake/email-proposals
returns to_post/open/to_close work-lists; the bot posts a review card
(metadata + snippet + draft note) to a dedicated review room
(MATRIX_EMAIL_REVIEW_ROOM) and relays in-thread yes / no / NL-edit
(POST .../{id}/decide, note revised via local Qwen). Decisions sync both
ways: web decide -> bot announces + closes the thread; Matrix decide -> the
web panel's ~25s poll clears the card. State lives CRM-side in the new
email_proposal_matrix side row (email-integration migration 0003, additive
+ idempotent CREATE TABLE IF NOT EXISTS), so it survives a bot restart.
Adds a 'bot' role (authenticated, never admin; require_bot_or_admin) to gate
the email-proposal endpoints rather than handing the bot full admin — the
principled base for the coming agentic capabilities. Role controls reach;
the draft->approve gate still controls autonomy (a human approves every write).
Deploy split: endpoints + migration + role + frontend ship in the s9pk; the
bot poll loop + review-room handling ship on the Spark. The bot's CRM user
must be flipped member->bot and joined to the review room (one-time).
Tests: backend/test_email_proposal_matrix.py + matrix_intake/test_email_proposals.py
(30/30 suite green, render-smoke green, migration verified twice on a DB copy).
Matrix intake bot
Read this before editing backend/matrix_intake/. The bot turns a typed message in a
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on in-thread human
approval before any write. Phase status: M1 + M2 deployed & live (text intake + approval + write; bot on the Spark,
CRM endpoints on the box at v0.1.0:86; live-smoked 2026-06-17). M3 (business-card photo)
deferred — Spark Control has no vision model yet.
Post-deploy UX pass — DEPLOYED & LIVE 2026-06-17: fuzzy investor matching (server-side,
v0.1.0:86, installed to the box — candidates endpoint verified live) + in-thread
disambiguation and conversational natural-language edits (bot-side, pulled + restarted on the
Spark). See Fuzzy matching below. Tests green (27/27 backend + the offline bot suite); the
Matrix live-smoke of the disambiguation grammar and the Qwen revise leg is still pending.
What it is (and isn't)
- A separate process, not part of the CRM. Its only third-party dep, matrix-nio, lives in backend/matrix_intake/requirements.txt and must never be added to the stdlib CRM (backend/server.py). Runs on the Spark (placement per standards/guides/placement.md).
- It drafts; a human approves. Nothing is written autonomously: every CRM write follows a yes reply in the proposal thread. This is exempt from "agents draft, humans send" the same way the digest is: it's internal data entry to our own CRM, not outward LP contact.
- It is not a parallel write path. It reuses the CRM's own canonical endpoint POST /api/fundraising/log-communication (create-if-missing + contact upsert + note + relational sync + audit) for both new-investor and existing-note cases. Don't reimplement grid mutation in the bot.
Flow
1. Top-level message in the intake room → parse.parse_message → local Qwen via Spark Control (spark.py reuses backend/ingest/llm.py; temp 0, JSON only) extracts {intent, investor_name, contact_name, contact_email, contact_title, note}. The original message text is stashed on the proposal as _source_text (needed later for revise's email-integrity check). The system prompt is built by parse.build_system(roster), which, when a team roster is configured (INTAKE_TEAM_ROSTER, see Config), appends an outreach frame: those names are our own team members doing the outreach, so a teammate's name is never extracted as the investor/contact and the other party is the prospect. Fixes the live-smoke gripe where "jonathan is chatting with wyoming" picked the teammate, not the prospect. revise gets the same framing. Roster unset → prior behavior (no frame).
2. crm_client.match (GET /api/intake/match) resolves new-vs-existing. It returns both an exact match (returns the grid row id so an approved note lands on exactly that investor, no duplicate) and, when there's no exact match, a ranked list of fuzzy candidates (see Fuzzy matching below).
3. Three outcomes drive what gets posted, all in a thread rooted at the user's message, plus a brief main-timeline nudge (a plain reply, matrix_io.make_reply) so it isn't missed:
   - Exact match → auto-attach: proposal flips to meeting_note with _match_id set, rendered as the normal approval card.
   - Fuzzy candidates, no exact → a disambiguation card (proposals.render_disambiguation): the proposal is held at _stage="disambiguate" with _candidates, and the human must pick a number / new / no before it becomes an approval-stage proposal.
   - Neither → the new-investor approval card.

   The nudge is a pointer only, not a reply target: you need the thread to act. The pending proposal is held in memory keyed by the thread root (proposals.ProposalStore).
4. User replies in the thread. handle_reply branches on _stage:
   - disambiguate (handle_disambiguation): a number attaches to that candidate (→ meeting_note with _match_id set, re-rendered for approval); new proceeds as a new investor; no discards.
   - approval: yes commits; no discards; edit field=value is the deterministic fast-path edit; anything else is treated as a natural-language revision: parse.revise sends {current proposal + instruction} back through local Qwen and re-renders the revised card (a no-op revision is detected via proposals.same_fields and re-prompts instead of saying "Updated"). On yes, crm_client.commit POSTs to log-communication tagged source="matrix_intake" (provenance in the audit log). A bare yes/no typed top-level (not in the thread) while a proposal is pending gets a "reply in the thread" redirect (store.any_pending() guard in handle_intake), not a misparsed new intake.
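The two reply grammars described in the Flow steps (disambiguation picks and approval-stage replies) could be sketched roughly as below. This is a minimal illustration, not the bot's actual code: classify_pick, classify_reply, the EDITABLE field set, and the "reprompt" fallback for an unrecognized pick are all hypothetical names and choices; only the yes / no / new / number / edit field=value vocabulary comes from the text above.

```python
import re

# Hypothetical whitelist: the document only shows "edit email=..." as an example field.
EDITABLE = {"investor", "contact", "email", "title", "note"}
_EDIT_RE = re.compile(r"^edit\s+(\w+)\s*=\s*(.*)$", re.IGNORECASE | re.DOTALL)


def classify_pick(text, n_candidates):
    """Disambiguation stage: a number attaches, 'new' proceeds, 'no' discards."""
    body = text.strip().lower()
    if body == "new":
        return ("new", None)
    if body == "no":
        return ("discard", None)
    if re.fullmatch(r"[0-9]+", body) and 1 <= int(body) <= n_candidates:
        return ("attach", int(body) - 1)  # zero-based index into the candidate list
    return ("reprompt", None)  # assumption: anything else just re-asks


def classify_reply(text):
    """Approval stage: yes / no / deterministic edit / natural-language revision."""
    body = text.strip()
    lowered = body.lower()
    if lowered == "yes":
        return ("commit", None)
    if lowered == "no":
        return ("discard", None)
    m = _EDIT_RE.match(body)
    if m and m.group(1).lower() in EDITABLE:
        return ("edit", (m.group(1).lower(), m.group(2).strip()))
    # Everything else is handed to the model as a free-form revision instruction.
    return ("revise", body)
```

Keeping the classification pure (text in, tuple out) is what lets this kind of grammar be unit-tested offline, with the async Matrix wiring kept separate.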
Fuzzy matching (server-side, ships in the s9pk)
GET /api/intake/match returns {match, candidates}. find_intake_match is unchanged —
exact-after-normalization, and an exact match still auto-attaches without disambiguation.
find_intake_candidates (new) is the fuzzy layer, deterministic, no LLM: it scans the same
canonical grid blob and scores each row by max(name similarity, email near-match), keeping
rows ≥ min_score (0.62), ranked, capped at 5:
- Name (_name_similarity): max of stdlib difflib sequence ratio (near-spellings, "Charlie"/"Charles") and token-set Jaccard (word-order). Legal-entity suffixes (LLC/LP/Inc/… via _strip_legal_suffix) are stripped first, so "Acme Capital" ~ "Acme Capital LLC" scores 1.0 (a near-certain duplicate find_intake_match misses because it compares the full string) and is surfaced as a candidate, never auto-attached (the human still confirms).
- Email (_email_edit_distance): Levenshtein ≤ 2 against each contact email (dist 1→0.9, 2→0.8). Distance 0 is an exact email: that's find_intake_match's job, skipped here.
- Recall-favoring by design: a shared common name-word ("… Capital") can lift an unrelated firm into the 0.6–0.8 band. Acceptable: it's a ranked, human-confirmed shortlist, and the cost of an occasional stray suggestion is far lower than missing a real near-duplicate. Semantic pruning of the shortlist (the "Charlie really is Charles" judgment) is a deferred LLM-judge re-rank, fed only the shortlist, never the whole LP list. It is intentionally NOT built in this pass, because the deterministic filter already surfaces every duplicate the human then resolves.
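A rough sketch of the scoring described above, assuming a row shape of {"name": ..., "emails": [...]} and a three-entry suffix list; both are illustrative guesses, as are the function names without a leading underscore. The thresholds (min_score 0.62, cap 5, distance 1→0.9, 2→0.8) are the ones stated in the text; the real server code may differ in detail.

```python
import difflib
import re

# Assumed suffix list; the document only says "LLC/LP/Inc/...".
_SUFFIX_RE = re.compile(r"[\s,]+(llc|lp|inc)\.?$", re.IGNORECASE)


def strip_legal_suffix(name):
    return _SUFFIX_RE.sub("", name.strip())


def name_similarity(a, b):
    """max(difflib sequence ratio, token-set Jaccard) after suffix stripping."""
    a = strip_legal_suffix(a).lower()
    b = strip_legal_suffix(b).lower()
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    ta, tb = set(a.split()), set(b.split())
    jaccard = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
    return max(ratio, jaccard)


def edit_distance(a, b):
    """Plain Levenshtein distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def email_score(query, emails):
    """Near-miss emails only: distance 1 -> 0.9, 2 -> 0.8, exact (0) is skipped."""
    best = 0.0
    for e in emails:
        d = edit_distance(query.lower(), e.lower())
        if d == 1:
            best = max(best, 0.9)
        elif d == 2:
            best = max(best, 0.8)
    return best


def find_candidates(name, email, rows, min_score=0.62, limit=5):
    """Score every row, keep those at or above min_score, best first, capped."""
    scored = []
    for row in rows:
        score = name_similarity(name, row["name"])
        if email:
            score = max(score, email_score(email, row.get("emails", [])))
        if score >= min_score:
            scored.append((score, row["name"]))
    scored.sort(key=lambda t: -t[0])
    return scored[:limit]
```

Because both the name and email scores are deterministic, the same input always yields the same shortlist, which keeps the human-confirmed step reproducible.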
Email-activity proposal review (the CRM→Matrix bridge, v0.1.0:89)
A second, separate flow runs alongside intake: reviewing the proposed grid notes the CRM
drafts from newly-matched email (server.propose_email_activity_notes, surfaced on the web Email
Capture panel). The bot lets the team approve/dismiss/edit those on mobile, kept in sync with
the web panel. The CRM (box, stdlib, no matrix-nio) can't post to Matrix, so the bot pulls.
- Dedicated room (MATRIX_EMAIL_REVIEW_ROOM, see Config): separate from the intake room so high-volume email proposals don't drown the conversational intake. Unset → the whole leg is off (the bot just does intake). The bot must be a member of this room.
- Poll loop (bot.poll_email_proposals, every EMAIL_POLL_SEC=20s) calls crm_client.list_email_proposals → GET /api/intake/email-proposals, which returns three work-lists:
  - to_post: pending, not yet posted → the bot posts a review card (metadata + a short email snippet + the drafted note; the full body is the web popup's job, kept compact for mobile), then records the thread-root event id via POST .../{id}/matrix {event_id}.
  - open: pending, posted, not closed → the bot rebuilds its event_id → proposal routing map from these on every poll, so replies still route after a bot restart (unlike intake's in-memory-only store; the state lives CRM-side in email_proposal_matrix).
  - to_close: decided on the web while a thread was open → the bot posts a "decided on the web — thread closed" line and POST .../{id}/matrix {closed:true}.
- In-thread replies (bot.handle_email_reply, email_proposals.interpret): yes → POST .../{id}/decide {decision:"approve", note} (appends the note to the grid, source='matrix', closes the thread atomically); no → dismiss; anything else → NL revision of the note via local Qwen (email_proposals.revise_note, no Claude/scrub), re-rendered for re-approval, so the draft→approve gate holds. A no-op/empty revision re-prompts instead of saying "Updated".
- Two surfaces, one source of truth. Decide on the web → the bot announces + closes the thread; decide on Matrix → the web panel polls /api/activity/proposals (~25s) and the card clears. email_proposal_matrix (1:1 side row, migration 0003) carries event_id/posted_at/closed_at; a matrix decision sets closed_at in the same txn so it's never re-announced via to_close.
- Pure logic is email_proposals.py (card render, reply grammar, note revision), unit-tested offline in test_email_proposals.py; the async poll/post wiring is in bot.py (live-smoke only).
- Known minors (low-likelihood, ~5-person team): if the CRM is unreachable between posting a card and recording its event id, the next poll re-posts a duplicate card (the orphan's replies won't route; re-send/decide the recorded one). A mid-revise bot restart loses the in-memory revised note (rebuilt from open = the original proposed_note; still a valid proposal).
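One poll cycle over the three work-lists might look roughly like the sketch below. It is synchronous and dependency-injected purely for illustration: process_poll, interpret_reply, the post_card / record / announce_close callables, and the "id" / "event_id" field names inside each proposal are assumptions, not the bot's real signatures or the endpoint's confirmed response shape.

```python
def process_poll(work, post_card, record, announce_close):
    """Handle one email-proposals response; return the rebuilt routing map.

    work           -- dict with optional "to_post", "open", "to_close" lists
    post_card      -- callable(proposal) -> Matrix thread-root event id
    record         -- callable(proposal_id, body) standing in for POST .../{id}/matrix
    announce_close -- callable(proposal) posting the closing line in the thread
    """
    # 1. Post a review card for each unposted proposal, then record its event id.
    for proposal in work.get("to_post", []):
        event_id = post_card(proposal)
        record(proposal["id"], {"event_id": event_id})

    # 2. Rebuild event_id -> proposal routing from CRM-side state on every poll,
    #    so nothing depends on memory that a restart would lose. Cards posted in
    #    step 1 are expected to show up under "open" on the next poll.
    routes = {p["event_id"]: p for p in work.get("open", [])}

    # 3. Close threads for proposals that were decided on the web.
    for proposal in work.get("to_close", []):
        announce_close(proposal)
        record(proposal["id"], {"closed": True})

    return routes


def interpret_reply(text):
    """In-thread grammar: yes approves, no dismisses, anything else revises."""
    body = text.strip()
    if body.lower() == "yes":
        return ("approve", None)
    if body.lower() == "no":
        return ("dismiss", None)
    return ("revise", body)
```

Note the ordering in step 1: if record fails after post_card succeeds, the proposal stays in to_post and is posted again on the next poll, which is the duplicate-card minor described in the list above.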
Rules / gotchas
- Module-name collision: the intake config module is settings.py, not config.py, because backend/ingest/config.py is imported (as bare config) through spark → llm. A second config module would shadow it in sys.modules and break llm (CHAT_MODEL). Keep intake module names from colliding with ingest's (config, http_util, llm).
- Email integrity: parse.normalize only keeps an address that literally appears in the source message; the model must never mint one (a wrong email is worse than none). It takes the first address in the text, so a two-person message ("Alice a@x.com and Bob b@y.com") could attach the wrong one; the human sees it in the proposal and can edit email=… before approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal for v1.
- Conversational revise keeps the email rule: parse.revise re-runs a free-form correction through Qwen but never trusts the model's email field. A changed address is accepted only if it literally appears in the instruction text (searched first), else the existing integrity-checked address is kept (_apply_revision). The model can edit name/contact/title/note freely but cannot mint an email. A revision that nulls both investor and contact is rejected (the proposal can't be emptied to something unactionable). Revise edits fields on the current proposal; it does not re-run the matcher if you rename the firm mid-thread (a known v1 limit; the human still approves).
- Deploy is split across two surfaces (mind which one carries a change): the fuzzy candidates come from server.py → ship in the s9pk (build + install, version-bumped). The bot's disambiguation flow + revise live in backend/matrix_intake/ → ship on the Spark via git pull + restart. A bot restart alone won't deliver candidates (the box would return an empty list and the bot just proposes new; safe, but no fuzzy surfacing until the s9pk is installed). Same lesson as the v83→v84 /api/intake/match 404.
- Double-approve guard: handle_reply pops the pending proposal from the store before awaiting the commit, so a second yes arriving mid-write is a no-op (asyncio is cooperative; the pop is atomic w.r.t. other events). On commit failure the proposal is restored for retry. Known minor: in the disambiguate stage the pick re-stores an approval-stage proposal before its await say, so a rapidly-repeated 1 can have the second one fall through to the NL-revise path (a wasted Spark round-trip that re-prompts). This is harmless: nothing commits, and it is not guarded (low likelihood on a ~5-person team).
- Local-only parse: intake text is real LP substance but goes ONLY to local Qwen via Spark Control, never Claude, so no scrub boundary applies (same basis as the digest). Never call a Spark directly; always go through SPARK_CONTROL_URL.
- Auth: the CRM has no service-key path; the bot logs in as a dedicated CRM user (CRM_BOT_USERNAME / CRM_BOT_PASSWORD) → Bearer JWT, re-login once on 401.
- Tests are offline: test_parse.py / test_proposals.py / test_crm_client.py stub the network; backend/test_intake_endpoints.py boots the real server against a temp DB and covers /api/intake/match + the create→match (no-duplicate) contract + provenance. A live Matrix smoke needs creds + matrix-nio installed on the Spark; it can't run in CI.
- Grid note line: the bot sends a blank subject when there's a note so the CRM's one-line note summary shows the note text (the CRM renders subject-or-body); a provenance label is sent only when there's no note. v0.1.0:85 also dropped the redundant [note] type tag from that server-side line (informative types like [call] keep theirs).
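The email-integrity rule in the list above can be illustrated with a small sketch. It assumes a simple regex for what counts as an address and reads "searched first" as "look in the instruction text, then fall back to the current address"; normalize_email and revise_email are hypothetical names, and the real parse.normalize / _apply_revision may differ.

```python
import re

# Deliberately simple address pattern, chosen for illustration only.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_email_in(text):
    """Return the first address that literally appears in text, or None."""
    match = _EMAIL_RE.search(text or "")
    return match.group(0) if match else None


def normalize_email(model_email, source_text):
    """Initial parse: the model's value is never trusted.

    Only an address that literally appears in the user's message survives;
    when several appear, the first one wins (the documented v1 behavior).
    """
    return first_email_in(source_text)


def revise_email(current_email, instruction):
    """Revision: accept a new address only if the human typed it."""
    typed = first_email_in(instruction)
    return typed if typed else current_email
```

For example, normalize_email("invented@nowhere.com", "met with Acme today") yields None, so a hallucinated address never reaches the proposal card.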
Deployment & ops
- Runs on the Spark as a docker container (matrix-intake), since 2026-06-17: SSH alias modelo32, host spark-32d0, repo clone at /home/modelo/ten31-database. Defined by docker-compose.yml at the repo root + backend/matrix_intake/Dockerfile. The image bundles backend/matrix_intake/ and backend/ingest/ (spark.py reaches into the latter's stdlib Spark client via sys.path); .env is mounted read-only at /app/.env. network_mode: host so it reaches Matrix, the CRM, and Spark Control. Startup logs listening as … in room ….
- Survives a Spark reboot via restart: unless-stopped, the durability fix that retired the old bare nohup launch. (The previous nohup method + /tmp/intake-bot.pid are gone.)
- Deploy / update after a git pull: cd /home/modelo/ten31-database && git pull && docker compose up -d --build. Logs: docker logs -f matrix-intake. Restart: docker restart matrix-intake. Stop: docker compose down. A restart still drops in-memory pending proposals (re-send to recover).
- Not yet a spark-control dashboard card. The container is managed via docker/SSH today; a managed card (Update/Restart/Stop/Logs tile, like matrix-bridge) is a separate spark-control task (see docs/handoffs/add-intake-bot-to-spark-control.md).
- Gotcha: the repo-root .dockerignore is SHARED with the s9pk build (start9/0.4/Dockerfile, same repo-root context). Don't add bot-only exclusions (e.g. frontend/, docs/) to it; you'd break the CRM image build, which needs them. It already excludes the security-critical bits (data/, .env), which is all the bot build needs.
- Server-side endpoints ship in the s9pk, not the bot. GET /api/intake/match and the source provenance on log-communication live in backend/server.py, so they reach the box only via an s9pk build + install; a bot restart won't deliver them. (Missed in v83: the box 404'd /api/intake/match until v0.1.0:84.) Same split for the email-review bridge (v0.1.0:89): the /api/intake/email-proposals* endpoints + the email_proposal_matrix migration (0003) + the bot role ship in the s9pk; the poll loop + review-room handling ship on the Spark (git pull + restart). A bot restart against a pre-v89 box returns nothing useful (404/empty), so install the s9pk first, then set the bot user's role + the review room.
- CRM_API_BASE is the box over the LAN, not localhost (bot on the Spark, CRM on the box). https://immense-voyage.local (443) is the StartOS dashboard, not the CRM; the CRM has its own interface address (the URL you open in a browser); container port 8080 isn't LAN-reachable.
Config
All in .env (names in .env.example): MATRIX_HOMESERVER, MATRIX_USER,
MATRIX_ACCESS_TOKEN, MATRIX_DEVICE_ID, MATRIX_INTAKE_ROOM; CRM_API_BASE,
CRM_BOT_USERNAME, CRM_BOT_PASSWORD, CRM_API_VERIFY_TLS. Spark settings are inherited from
the ingest client (SPARK_CONTROL_URL, CRM_CHAT_MODEL).
- MATRIX_EMAIL_REVIEW_ROOM (optional): the dedicated room for the email-activity proposal review leg (above). Unset/empty disables that leg entirely (the bot does intake only). The bot must be invited to + joined in this room. Read once at startup, like the room/roster.
- Bot CRM user needs role bot. The email-proposal endpoints (/api/intake/email-proposals*) are gated to require_bot_or_admin because they expose LP email content (the proposals are admin-only on the web). The bot role is authenticated-but-not-admin: it passes these endpoints + the auth-only ones the bot already uses (login, /api/intake/match, log-communication), but never require_admin (no user-management/settings/security reach). One-time flip of the existing service account (kept out of the invite UI's member/admin dropdown; provision deliberately): an admin PATCH /api/users/<id> {"role":"bot"}, or on the box UPDATE users SET role='bot' WHERE username='<CRM_BOT_USERNAME>';. Role controls reach; the draft→approve gate (a human still approves every write) controls autonomy. These are two separate axes.
- INTAKE_TEAM_ROSTER (optional, comma-separated): Ten31 team-member names that frame the parse (see Flow step 1). Use the first names as actually typed in the room ("Grant, Jonathan, …"). Read once at startup by settings.team_roster(), so a roster change needs a bot restart. It lives only in the Spark's .env (bot-side), so no s9pk change is needed. Empty/unset disables the framing.
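The reach rule for the bot role (passes require_bot_or_admin, never require_admin) can be sketched as two small checks. This is only an illustration under assumptions: the user is modeled as a plain dict with a "role" key, Forbidden is a made-up exception, and allowed is a helper added here for demonstration; the real guards in server.py presumably work on the authenticated request instead.

```python
class Forbidden(Exception):
    """Raised when the caller's role does not grant access (hypothetical)."""


def require_admin(user):
    # Admin-only surface: user management, settings, security.
    if not user or user.get("role") != "admin":
        raise Forbidden("admin only")
    return user


def require_bot_or_admin(user):
    # Email-proposal endpoints: the bot may pass, ordinary members may not.
    if not user or user.get("role") not in ("bot", "admin"):
        raise Forbidden("bot or admin only")
    return user


def allowed(check, user):
    """Small helper for demonstration: True if the check passes."""
    try:
        check(user)
        return True
    except Forbidden:
        return False
```

With these checks, allowed(require_bot_or_admin, {"role": "bot"}) is True while allowed(require_admin, {"role": "bot"}) is False, which is the "reach" axis; the draft→approve gate stays a separate concern handled in the reply flow.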