Matrix intake bot
Read this before editing backend/matrix_intake/. The bot turns a typed message in a
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on in-thread human
approval before any write.
Phase status: M1 + M2 deployed & live (text intake + approval + write; bot on the Spark,
CRM endpoints on the box at v0.1.0:86; live-smoked 2026-06-17). M3 (business-card photo)
deferred: Spark Control has no vision model yet.
Post-deploy UX pass — DEPLOYED & LIVE 2026-06-17: fuzzy investor matching (server-side,
v0.1.0:86, installed to the box — candidates endpoint verified live) + in-thread
disambiguation and conversational natural-language edits (bot-side, pulled + restarted on the
Spark). See Fuzzy matching below. Tests green (27/27 backend + the offline bot suite); the
Matrix live-smoke of the disambiguation grammar and the Qwen revise leg is still pending.
What it is (and isn't)
- A separate process, not part of the CRM. Its only third-party dep, `matrix-nio`, lives in `backend/matrix_intake/requirements.txt` and must never be added to the stdlib CRM (`backend/server.py`). Runs on the Spark (placement per `standards/guides/placement.md`).
- It drafts; a human approves. Nothing is written autonomously: every CRM write follows a `yes` reply in the proposal thread. This is exempt from "agents draft, humans send" the same way the digest is: it's internal data entry to our own CRM, not outward LP contact.
- It is not a parallel write path. It reuses the CRM's own canonical endpoint `POST /api/fundraising/log-communication` (create-if-missing + contact upsert + note + relational sync + audit) for both new-investor and existing-note cases. Don't reimplement grid mutation in the bot.
Flow
- Top-level message in the intake room → `parse.parse_message` → local Qwen via Spark Control (`spark.py` reuses `backend/ingest/llm.py`; temp 0, JSON only) extracts `{intent, investor_name, contact_name, contact_email, contact_title, note}`. The original message text is stashed on the proposal as `_source_text` (needed later for `revise`'s email-integrity check). The system prompt is built by `parse.build_system(roster)`, which, when a team roster is configured (`INTAKE_TEAM_ROSTER`, see Config), appends an outreach frame: those names are our own team members doing the outreach, so a teammate's name is never extracted as the investor/contact and the other party is the prospect. This fixes the live-smoke gripe where "jonathan is chatting with wyoming" picked the teammate, not the prospect. `revise` gets the same framing. Roster unset → prior behavior (no frame).
- `crm_client.match` (`GET /api/intake/match`) resolves new-vs-existing. It returns both an exact `match` (carrying the grid row id so an approved note lands on exactly that investor, no duplicate) and, when there's no exact match, a ranked list of fuzzy `candidates` (see Fuzzy matching below).
- Three outcomes drive what gets posted, all in a thread rooted at the user's message, plus a brief main-timeline nudge (a plain reply, `matrix_io.make_reply`) so it isn't missed:
  - Exact match → auto-attach: proposal flips to `meeting_note` with `_match_id` set, rendered as the normal approval card.
  - Fuzzy candidates, no exact → a disambiguation card (`proposals.render_disambiguation`): the proposal is held at `_stage="disambiguate"` with `_candidates`, and the human must pick a number / `new` / `no` before it becomes an approval-stage proposal.
  - Neither → the new-investor approval card.

  The nudge is a pointer only, not a reply target: you need the thread to act. The pending proposal is held in memory keyed by the thread root (`proposals.ProposalStore`).
- User replies in the thread. `handle_reply` branches on `_stage`:
  - disambiguate (`handle_disambiguation`): a number attaches to that candidate (→ `meeting_note` with `_match_id` set, re-rendered for approval); `new` proceeds as a new investor; `no` discards.
  - approval: `yes` commits; `no` discards; `edit field=value` is the deterministic fast-path edit; anything else is treated as a natural-language revision: `parse.revise` sends `{current proposal + instruction}` back through local Qwen and re-renders the revised card (a no-op revision is detected via `proposals.same_fields` and re-prompts instead of saying "Updated"). On `yes`, `crm_client.commit` POSTs to `log-communication` tagged `source="matrix_intake"` (provenance in the audit log). A bare `yes`/`no` typed top-level (not in the thread) while a proposal is pending gets a "reply in the thread" redirect (`store.any_pending()` guard in `handle_intake`), not a misparsed new intake.
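The three posting outcomes and the reply grammar above can be sketched as two small pure functions. This is an illustrative sketch, not the bot's actual code: `choose_card`, `classify_reply`, the action names, and the "re-prompt on anything unrecognized in the disambiguate stage" behavior are all assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; the real handlers are
# handle_intake / handle_reply / handle_disambiguation.
EDIT_RE = re.compile(r"^edit\s+(\w+)\s*=\s*(.*)$", re.IGNORECASE | re.DOTALL)


def choose_card(match: Optional[dict], candidates: list) -> str:
    """Pick which card to post for a freshly parsed intake message."""
    if match is not None:
        return "approval:meeting_note"   # exact match: auto-attach
    if candidates:
        return "disambiguate"            # fuzzy shortlist: human picks
    return "approval:new_investor"       # nothing close: propose a new row


def classify_reply(stage: str, text: str, n_candidates: int = 0) -> Tuple[str, object]:
    """Map a thread reply to (action, argument) for the given proposal stage."""
    body = text.strip()
    word = body.lower()

    if stage == "disambiguate":
        if re.fullmatch(r"[0-9]+", word) and 1 <= int(word) <= n_candidates:
            return ("attach", int(word) - 1)   # zero-based candidate index
        if word == "new":
            return ("new_investor", None)
        if word == "no":
            return ("discard", None)
        return ("reprompt", None)              # assumption: re-ask on anything else

    # Approval stage.
    if word == "yes":
        return ("commit", None)
    if word == "no":
        return ("discard", None)
    m = EDIT_RE.match(body)
    if m:                                      # deterministic fast-path edit
        return ("edit", (m.group(1).lower(), m.group(2).strip()))
    return ("revise", body)                    # free-form text goes to the LLM revise leg
```

Keeping the classification free of I/O makes the grammar testable offline; note that a bare `1` in the approval stage lands on the revise path, which matches the repeated-pick behavior described under Rules / gotchas.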
Fuzzy matching (server-side, ships in the s9pk)
GET /api/intake/match returns {match, candidates}. find_intake_match is unchanged —
exact-after-normalization, and an exact match still auto-attaches without disambiguation.
find_intake_candidates (new) is the fuzzy layer, deterministic, no LLM: it scans the same
canonical grid blob and scores each row by max(name similarity, email near-match), keeping
rows ≥ min_score (0.62), ranked, capped at 5:
- Name (`_name_similarity`): max of stdlib `difflib` sequence ratio (near-spellings, e.g. "Charlie"/"Charles") and token-set Jaccard (word-order). Legal-entity suffixes (LLC/LP/Inc/… via `_strip_legal_suffix`) are stripped first, so "Acme Capital" ~ "Acme Capital LLC" scores 1.0 (a near-certain duplicate `find_intake_match` misses because it compares the full string) and is surfaced as a candidate, never auto-attached (the human still confirms).
- Email (`_email_edit_distance`): Levenshtein ≤ 2 against each contact email (dist 1→0.9, 2→0.8). Distance 0 is an exact email; that's `find_intake_match`'s job, skipped here.
- Recall-favoring by design: a shared common name-word ("… Capital") can lift an unrelated firm into the 0.6–0.8 band. Acceptable: it's a ranked, human-confirmed shortlist, and the cost of an occasional stray suggestion is far lower than missing a real near-duplicate. Semantic pruning of the shortlist (the "Charlie really is Charles" judgment) is a deferred LLM-judge re-rank, fed only the shortlist, never the whole LP list. It is intentionally NOT built in this pass, because the deterministic filter already surfaces every duplicate the human then resolves.
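A minimal sketch of the scoring described above, assuming a row shape of `{"name": ..., "emails": [...]}` and a three-entry suffix list (the source elides the full list). The function names here are illustrative stand-ins, not the signatures in `server.py`.

```python
import re
from difflib import SequenceMatcher

# Assumed suffix list; the guide only names LLC/LP/Inc and elides the rest.
_LEGAL_SUFFIX = re.compile(r"[\s,]+(llc|lp|inc)\.?$", re.IGNORECASE)


def strip_legal_suffix(name: str) -> str:
    return _LEGAL_SUFFIX.sub("", name.strip())


def name_similarity(a: str, b: str) -> float:
    """max(difflib sequence ratio, token-set Jaccard) after suffix stripping."""
    a, b = strip_legal_suffix(a).lower(), strip_legal_suffix(b).lower()
    ratio = SequenceMatcher(None, a, b).ratio()
    ta, tb = set(a.split()), set(b.split())
    jaccard = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
    return max(ratio, jaccard)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def email_score(a: str, b: str) -> float:
    # Distance 0 is an exact email, which is the exact matcher's job: score 0 here.
    return {1: 0.9, 2: 0.8}.get(edit_distance(a.lower(), b.lower()), 0.0)


def find_candidates(name, email, rows, min_score=0.62, limit=5):
    """Score every row by max(name similarity, email near-match); keep the top few."""
    scored = []
    for row in rows:
        score = name_similarity(name, row["name"])
        if email:
            score = max([score] + [email_score(email, e) for e in row.get("emails", [])])
        if score >= min_score:
            scored.append((score, row["name"]))
    scored.sort(key=lambda pair: -pair[0])
    return scored[:limit]
```

With these definitions "Acme Capital" against "Acme Capital LLC" scores 1.0 through the suffix strip, and a one-character email typo scores 0.9, so both would surface on the shortlist for the human to confirm.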
Rules / gotchas
- Module-name collision: the intake config module is `settings.py`, not `config.py`, because `backend/ingest/config.py` is imported (as bare `config`) through `spark → llm`. A second `config` module would shadow it in `sys.modules` and break `llm` (`CHAT_MODEL`). Keep intake module names from colliding with ingest's (`config`, `http_util`, `llm`).
- Email integrity: `parse.normalize` only keeps an address that literally appears in the source message; the model must never mint one (a wrong email is worse than none). It takes the first address in the text, so a two-person message ("Alice a@x.com and Bob b@y.com") could attach the wrong one; the human sees it in the proposal and can `edit email=…` before approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal for v1.
- Conversational revise keeps the email rule: `parse.revise` re-runs a free-form correction through Qwen but never trusts the model's email field. A changed address is accepted only if it literally appears in the instruction text (searched first), else the existing integrity-checked address is kept (`_apply_revision`). The model can edit name/contact/title/note freely but cannot mint an email. A revision that nulls both investor and contact is rejected (the proposal can't be emptied to something unactionable). Revise edits fields on the current proposal; it does not re-run the matcher if you rename the firm mid-thread (a known v1 limit; the human still approves).
- Deploy is split across two surfaces (mind which one carries a change): the fuzzy `candidates` come from `server.py` → ship in the s9pk (build + install, version-bumped). The bot's disambiguation flow + `revise` live in `backend/matrix_intake/` → ship on the Spark via `git pull` + restart. A bot restart alone won't deliver `candidates` (the box would return an empty list and the bot just proposes new: safe, but no fuzzy surfacing until the s9pk is installed). Same lesson as the v83→v84 `/api/intake/match` 404.
- Double-approve guard: `handle_reply` pops the pending proposal from the store before awaiting the commit, so a second `yes` arriving mid-write is a no-op (asyncio is cooperative; the pop is atomic w.r.t. other events). On commit failure the proposal is restored for retry. Known minor: in the disambiguate stage the pick re-stores an approval-stage proposal before its `await say`, so a rapidly-repeated `1` can have the second one fall through to the NL-revise path (a wasted Spark round-trip that re-prompts). This is harmless (nothing commits) and is not guarded (low likelihood on a ~5-person team).
- Local-only parse: intake text is real LP substance but goes ONLY to local Qwen via Spark Control, never Claude, so no scrub boundary applies (same basis as the digest). Never call a Spark directly; always go through `SPARK_CONTROL_URL`.
- Auth: the CRM has no service-key path; the bot logs in as a dedicated CRM user (`CRM_BOT_USERNAME`/`CRM_BOT_PASSWORD`) → Bearer JWT, re-login once on 401.
- Tests are offline: `test_parse.py`/`test_proposals.py`/`test_crm_client.py` stub the network; `backend/test_intake_endpoints.py` boots the real server against a temp DB and covers `/api/intake/match` + the create→match (no-duplicate) contract + provenance. A live Matrix smoke needs creds + `matrix-nio` installed on the Spark; it can't run in CI.
- Grid note line: the bot sends a blank `subject` when there's a note so the CRM's one-line note summary shows the note text (the CRM renders subject-or-body); a provenance label is sent only when there's no note. v0.1.0:85 also dropped the redundant `[note]` type tag from that server-side line (informative types like `[call]` keep theirs).
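The two email rules above (initial parse and conversational revise) reduce to "only ever return an address a human typed". A minimal sketch follows, assuming a deliberately simple address regex; `first_email`, `normalize_email`, and `revise_email` are hypothetical names, and the real logic lives in `parse.normalize` and `_apply_revision`.

```python
import re
from typing import Optional

# Simplified address pattern for illustration only; the parser's real regex
# is not shown in this guide.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_email(text: Optional[str]) -> Optional[str]:
    """Return the first address literally present in the text, if any."""
    m = EMAIL_RE.search(text or "")
    return m.group(0) if m else None


def normalize_email(model_email: Optional[str], source_text: str) -> Optional[str]:
    """Initial parse: ignore whatever the model produced and keep only an
    address that appears verbatim in the source message (first one wins)."""
    return first_email(source_text)


def revise_email(current: Optional[str], instruction: str) -> Optional[str]:
    """Revise leg: an address typed in the instruction replaces the current
    one; otherwise the already integrity-checked address is kept."""
    return first_email(instruction) or current
```

For example, `normalize_email("bob@invented.io", "met Bob from Acme")` yields `None` because no address was typed, and the two-person message from the bullet above yields `a@x.com`, which is the first-address behavior the human is expected to catch on the card.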
Deployment & ops
- Runs on the Spark as a docker container (`matrix-intake`), since 2026-06-17: SSH alias `modelo32`, host `spark-32d0`, repo clone at `/home/modelo/ten31-database`. Defined by `docker-compose.yml` at the repo root + `backend/matrix_intake/Dockerfile`. The image bundles `backend/matrix_intake/` and `backend/ingest/` (spark.py reaches into the latter's stdlib Spark client via sys.path); `.env` is mounted read-only at `/app/.env`. `network_mode: host` so it reaches Matrix, the CRM, and Spark Control. Startup logs `listening as … in room …`.
- Survives a Spark reboot via `restart: unless-stopped`, the durability fix that retired the old bare `nohup` launch. (The previous nohup method + `/tmp/intake-bot.pid` are gone.)
- Deploy / update after a `git pull`: `cd /home/modelo/ten31-database && git pull && docker compose up -d --build`. Logs: `docker logs -f matrix-intake`. Restart: `docker restart matrix-intake`. Stop: `docker compose down`. A restart still drops in-memory pending proposals (re-send to recover).
- Not yet a spark-control dashboard card. The container is managed via `docker`/SSH today; a managed card (Update/Restart/Stop/Logs tile, like `matrix-bridge`) is a separate spark-control task; see `docs/handoffs/add-intake-bot-to-spark-control.md`.
- Gotcha: the repo-root `.dockerignore` is SHARED with the s9pk build (`start9/0.4/Dockerfile`, same repo-root context). Don't add bot-only exclusions (e.g. `frontend/`, `docs/`) to it; you'd break the CRM image build, which needs them. It already excludes the security-critical bits (`data/`, `.env`), which is all the bot build needs.
- Server-side endpoints ship in the s9pk, not the bot. `GET /api/intake/match` and the `source` provenance on `log-communication` live in `backend/server.py`, so they reach the box only via an s9pk build + install; a bot restart won't deliver them. (Missed in v83: the box 404'd `/api/intake/match` until v0.1.0:84.)
- `CRM_API_BASE` is the box over the LAN, not localhost (bot on the Spark, CRM on the box). `https://immense-voyage.local` (443) is the StartOS dashboard, not the CRM; the CRM has its own interface address (the URL you open in a browser); container port 8080 isn't LAN-reachable.
Config
All in .env (names in .env.example): MATRIX_HOMESERVER, MATRIX_USER,
MATRIX_ACCESS_TOKEN, MATRIX_DEVICE_ID, MATRIX_INTAKE_ROOM; CRM_API_BASE,
CRM_BOT_USERNAME, CRM_BOT_PASSWORD, CRM_API_VERIFY_TLS. Spark settings are inherited from
the ingest client (SPARK_CONTROL_URL, CRM_CHAT_MODEL).
- `INTAKE_TEAM_ROSTER` (optional, comma-separated): Ten31 team-member names that frame the parse (see Flow step 1). Use the first names as actually typed in the room ("Grant, Jonathan, …"). Read once at startup by `settings.team_roster()`, so a roster change needs a bot restart. It lives only in the Spark's `.env` (bot-side), so no s9pk change is needed. Empty/unset disables the framing.
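How the roster setting could feed the system prompt, as a hedged sketch: the function names mirror `settings.team_roster()` and `parse.build_system(roster)`, but the bodies, the `env` parameter, `BASE_SYSTEM`, and the frame wording are placeholders invented for illustration, not the prompts the bot actually uses.

```python
import os
from typing import List, Mapping, Optional

# Placeholder base prompt; the bot's real extraction prompt is not reproduced here.
BASE_SYSTEM = "Extract intent, investor, contact and note from the message. Reply with JSON only."


def team_roster(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Parse the comma-separated INTAKE_TEAM_ROSTER value; empty/unset gives []."""
    source = os.environ if env is None else env
    raw = source.get("INTAKE_TEAM_ROSTER", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_system(roster: Optional[List[str]] = None) -> str:
    """Append an outreach frame only when a roster is configured."""
    if not roster:
        return BASE_SYSTEM                      # prior behavior: no frame
    frame = (
        "These people are our own team members doing the outreach: "
        + ", ".join(roster)
        + ". Never extract them as the investor or contact; "
        "the other party in the message is the prospect."
    )
    return BASE_SYSTEM + "\n\n" + frame
```

Because the roster is read once at startup in the real bot, a caller would typically compute `build_system(team_roster())` a single time and reuse the result for both the parse and revise legs.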