Add business-card photo intake to the Matrix bot (M3)

The intake bot now accepts a photo of a business card in the intake room and
turns it into the same new-investor proposal a typed note would. The only new
step is image -> text; everything downstream (parse, fuzzy match, in-thread
approval, log-communication write) is reused unchanged.

M3 was deferred only because Spark Control had no vision model. That blocker is
gone: the daily-driver Qwen is vision-capable under the same model id, and the
gateway forwards OpenAI multimodal content untouched, so no gateway/server/s9pk
change is needed -- this ships bot-only (git pull + rebuild on the Spark).

Transcribe-then-reuse (not vision-straight-to-JSON) is deliberate: the
transcription becomes the source text the email-integrity rule checks against,
so a mis-read address can't reach the CRM unapproved -- same guarantee as the
text path. Card commits tag source="matrix_card" for the audit log.

- llm.chat_vision: multimodal /v1/chat/completions, same model, same gateway
- spark.transcribe_card: faithful card->text, "" on a non-card (NONE sentinel)
- bot.on_image/handle_card: download image, transcribe, hand to handle_intake
- crm_client: source provenance overridable via the proposal's _source key
- tests: test_spark.py + a provenance case; 41/41 suite green
This commit is contained in:
Keysat
2026-06-20 10:26:27 -05:00
parent be40520c3d
commit 536358093f
7 changed files with 209 additions and 6 deletions
+26
View File
@@ -19,3 +19,29 @@ import llm # noqa: E402 (backend/ingest/llm.py — chat / chat_json over Spark
def parse_json(prompt, system=None, max_tokens=400):
"""Send to local Qwen (temp 0, thinking off) and parse the first JSON object, or None."""
return llm.chat_json(prompt, system=system, max_tokens=max_tokens)
# The vision model only TRANSCRIBES the card; the existing text-parse flow then extracts the
# structured proposal from that transcription. Keeping the two steps separate (vs. asking the
# vision model for JSON directly) is deliberate: the transcription becomes the source text the
# email-integrity check runs against, so the "only keep an address that literally appears in the
# source, never let the model mint one" rule (parse.normalize) protects card intake too.
CARD_SYSTEM = (
"You are transcribing a photo of a business card for a venture-fund team. Read every line of "
"text on the card and write it out exactly as printed — the person's name, job title, company "
"or firm name, email address, phone number(s), website, and mailing address. Copy the email "
"address and phone numbers character-for-character; never guess, complete, or correct them. Do "
"not summarize, translate, or add anything that is not printed on the card. If the image is not "
"a readable business card, reply with the single word NONE. Output only the transcription, one "
"item per line."
)
def transcribe_card(image_b64, mime="image/jpeg", chat_fn=None):
"""Vision-transcribe a business card to faithful text via the local VL model (same model and
Spark Control endpoint as the text parse). Returns the transcription string, or '' if the model
saw no readable card. `chat_fn` is injectable for offline tests (defaults to Spark/VL)."""
chat_fn = chat_fn or llm.chat_vision
out = (chat_fn("Transcribe this business card.", image_b64, mime=mime,
system=CARD_SYSTEM, max_tokens=600) or "").strip()
return "" if out.upper() == "NONE" else out