Add business-card photo intake to the Matrix bot (M3)
The intake bot now accepts a photo of a business card in the intake room and turns it into the same new-investor proposal a typed note would. The only new step is image -> text; everything downstream (parse, fuzzy match, in-thread approval, log-communication write) is reused unchanged. M3 was deferred only because Spark Control had no vision model. That blocker is gone: the daily-driver Qwen is vision-capable under the same model id, and the gateway forwards OpenAI multimodal content untouched, so no gateway/server/s9pk change is needed -- this ships bot-only (git pull + rebuild on the Spark). Transcribe-then-reuse (not vision-straight-to-JSON) is deliberate: the transcription becomes the source text the email-integrity rule checks against, so a mis-read address can't reach the CRM unapproved -- same guarantee as the text path. Card commits tag source="matrix_card" for the audit log. - llm.chat_vision: multimodal /v1/chat/completions, same model, same gateway - spark.transcribe_card: faithful card->text, "" on a non-card (NONE sentinel) - bot.on_image/handle_card: download image, transcribe, hand to handle_intake - crm_client: source provenance overridable via the proposal's _source key - tests: test_spark.py + a provenance case; 41/41 suite green
This commit is contained in:
@@ -8,8 +8,9 @@ paths:
|
||||
Read this before editing `backend/matrix_intake/`. The bot turns a typed message in a
|
||||
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on **in-thread human
|
||||
approval** before any write. Phase status: **M1 + M2 deployed & live** (text intake + approval + write; bot on the Spark,
|
||||
CRM endpoints on the box at **v0.1.0:86**; live-smoked 2026-06-17). **M3 (business-card photo)
|
||||
deferred** — Spark Control has no vision model yet.
|
||||
CRM endpoints on the box at **v0.1.0:86**; live-smoked 2026-06-17). **M3 (business-card photo) BUILT —
|
||||
bot-only, awaiting live-smoke** (the prior blocker — "Spark Control has no vision model" — is gone:
|
||||
the daily-driver model is now vision-capable; see *Business-card capture* below).
|
||||
|
||||
**Post-deploy UX pass — DEPLOYED & LIVE 2026-06-17:** fuzzy investor matching (server-side,
|
||||
**v0.1.0:86**, installed to the box — `candidates` endpoint verified live) + in-thread
|
||||
@@ -69,6 +70,47 @@ Spark). See *Fuzzy matching* below. Tests green (27/27 backend + the offline bot
|
||||
"reply in the thread" redirect (`store.any_pending()` guard in `handle_intake`), not a
|
||||
misparsed new intake.
|
||||
|
||||
## Business-card capture (M3 — image intake)
|
||||
|
||||
Send a **photo of a business card** into the intake room and the bot turns it into the same
|
||||
new-investor proposal a typed note would. The **only added step is image → text**; from there the
|
||||
existing flow (parse → match → disambiguate → approve → `log-communication`) runs **unchanged** —
|
||||
`handle_card` just calls `handle_intake` with the transcription.
|
||||
|
||||
- **Trigger:** a top-level `m.image` event in the intake room (`on_image` → `handle_card` in
|
||||
`bot.py`; registered via a second `add_event_callback(on_image, RoomMessageImage)`). Images in
|
||||
the Q&A / email-review rooms, the bot's own uploads, and an image dropped **inside an existing
|
||||
thread** are ignored. The card's own event becomes the proposal thread root, like a text message.
|
||||
- **The one new call** (`spark.transcribe_card` → `llm.chat_vision`): download the image
|
||||
(`client.download(mxc=event.url)` — **unencrypted only**; an E2EE room delivers a different event
|
||||
class we don't register for, so encryption is naturally excluded), base64-encode, and POST an
|
||||
**OpenAI multimodal** `/v1/chat/completions` to **Spark Control** — *same endpoint, same model id*
|
||||
(`CRM_CHAT_MODEL`, the daily-driver Qwen, `capabilities: [vision, reasoning]`), with the user
|
||||
message's `content` an array of a text part + an `image_url` data-URI. Spark Control is a **dumb
|
||||
passthrough** (`image/app/llm_proxy.py`), so **no gateway change** was needed. The model
|
||||
**transcribes** the card; it does not emit JSON.
|
||||
- **Why transcribe-then-reuse (not vision-straight-to-JSON):** the transcription becomes the
|
||||
**source text** the email-integrity rule checks against — `parse.normalize` only keeps an address
|
||||
that *literally appears in the source*, never one the model mints. So a mis-read address can't
|
||||
reach the CRM unapproved, exactly as on the text path, and 100% of parse/match/disambiguation/
|
||||
approval is reused. The transcription is framed (`"New investor — from a business card:\n…"`) so
|
||||
the extractor reads it as a new investor.
|
||||
- **Provenance:** a card commit tags `source="matrix_card"` (vs `"matrix_intake"` for a typed note)
|
||||
in the audit log, threaded via the proposal's `_source` control key (`handle_intake(…, source=…)`
|
||||
→ `crm_client.build_commit_payload`, which defaults to `"matrix_intake"` when absent).
|
||||
- **UX:** the bot acks `📇 Reading the card…` before the (slower) vision call; an unreadable image
|
||||
(model replies `NONE`, or transcription < 5 chars) gets a "try a clearer, well-lit photo" reply
|
||||
instead of a garbage proposal.
|
||||
- **Deploy is bot-only** — the change lives in `backend/matrix_intake/` (`bot.py`, `spark.py`) +
|
||||
`backend/ingest/llm.py` (bundled into the bot image), shipped on the **Spark** via `git pull` +
|
||||
`docker compose up -d --build`. **No s9pk, no version bump, no new env** (same model; no auth on
|
||||
the LAN). Contrast with M2 / email-review, whose server endpoints had to ship in the s9pk.
|
||||
- **Known limits (live-smoke checklist):** ① a StartOS reverse-proxy body cap could `413` a large
|
||||
photo — the model already downscales server-side (`max_pixels` ≈ 2 MP), so if it trips, add a
|
||||
client-side resize (would pull Pillow into the bot image); ② iPhone **HEIC** may not decode in
|
||||
vLLM's PIL — most clients (Element iOS) transcode to JPEG on upload, but confirm on-device; ③ the
|
||||
offline tests stub the vision call (`test_spark.py`); the download + real OCR is **live-smoke only**.
|
||||
|
||||
## Fuzzy matching (server-side, ships in the s9pk)
|
||||
|
||||
`GET /api/intake/match` returns `{match, candidates}`. `find_intake_match` is unchanged —
|
||||
|
||||
Reference in New Issue
Block a user