Add business-card photo intake to the Matrix bot (M3)

The intake bot now accepts a photo of a business card in the intake room and turns it into the same new-investor proposal a typed note would. The only new step is image -> text; everything downstream (parse, fuzzy match, in-thread approval, log-communication write) is reused unchanged. M3 was deferred only because Spark Control had no vision model. That blocker is gone: the daily-driver Qwen is vision-capable under the same model id, and the gateway forwards OpenAI multimodal content untouched, so no gateway/server/s9pk change is needed -- this ships bot-only (git pull + rebuild on the Spark). Transcribe-then-reuse (not vision-straight-to-JSON) is deliberate: the transcription becomes the source text the email-integrity rule checks against, so a mis-read address can't reach the CRM unapproved -- same guarantee as the text path. Card commits tag source="matrix_card" for the audit log. - llm.chat_vision: multimodal /v1/chat/completions, same model, same gateway - spark.transcribe_card: faithful card->text, "" on a non-card (NONE sentinel) - bot.on_image/handle_card: download image, transcribe, hand to handle_intake - crm_client: source provenance overridable via the proposal's _source key - tests: test_spark.py + a provenance case; 41/41 suite green
2026-06-20 10:26:27 -05:00
parent be40520c3d
commit 536358093f
7 changed files with 209 additions and 6 deletions
@@ -19,3 +19,29 @@ import llm  # noqa: E402  (backend/ingest/llm.py — chat / chat_json over Spark
 def parse_json(prompt, system=None, max_tokens=400):
    """Send to local Qwen (temp 0, thinking off) and parse the first JSON object, or None."""
    return llm.chat_json(prompt, system=system, max_tokens=max_tokens)
+
+
+# The vision model only TRANSCRIBES the card; the existing text-parse flow then extracts the
+# structured proposal from that transcription. Keeping the two steps separate (vs. asking the
+# vision model for JSON directly) is deliberate: the transcription becomes the source text the
+# email-integrity check runs against, so the "only keep an address that literally appears in the
+# source, never let the model mint one" rule (parse.normalize) protects card intake too.
+CARD_SYSTEM = (
+    "You are transcribing a photo of a business card for a venture-fund team. Read every line of "
+    "text on the card and write it out exactly as printed — the person's name, job title, company "
+    "or firm name, email address, phone number(s), website, and mailing address. Copy the email "
+    "address and phone numbers character-for-character; never guess, complete, or correct them. Do "
+    "not summarize, translate, or add anything that is not printed on the card. If the image is not "
+    "a readable business card, reply with the single word NONE. Output only the transcription, one "
+    "item per line."
+)
+
+
+def transcribe_card(image_b64, mime="image/jpeg", chat_fn=None):
+    """Vision-transcribe a business card to faithful text via the local VL model (same model and
+    Spark Control endpoint as the text parse). Returns the transcription string, or '' if the model
+    saw no readable card. `chat_fn` is injectable for offline tests (defaults to Spark/VL)."""
+    chat_fn = chat_fn or llm.chat_vision
+    out = (chat_fn("Transcribe this business card.", image_b64, mime=mime,
+                   system=CARD_SYSTEM, max_tokens=600) or "").strip()
+    return "" if out.upper() == "NONE" else out