Add business-card photo intake to the Matrix bot (M3)
The intake bot now accepts a photo of a business card in the intake room and turns it into the same new-investor proposal a typed note would. The only new step is image -> text; everything downstream (parse, fuzzy match, in-thread approval, log-communication write) is reused unchanged. M3 was deferred only because Spark Control had no vision model. That blocker is gone: the daily-driver Qwen is vision-capable under the same model id, and the gateway forwards OpenAI multimodal content untouched, so no gateway/server/s9pk change is needed -- this ships bot-only (git pull + rebuild on the Spark). Transcribe-then-reuse (not vision-straight-to-JSON) is deliberate: the transcription becomes the source text the email-integrity rule checks against, so a mis-read address can't reach the CRM unapproved -- same guarantee as the text path. Card commits tag source="matrix_card" for the audit log. - llm.chat_vision: multimodal /v1/chat/completions, same model, same gateway - spark.transcribe_card: faithful card->text, "" on a non-card (NONE sentinel) - bot.on_image/handle_card: download image, transcribe, hand to handle_intake - crm_client: source provenance overridable via the proposal's _source key - tests: test_spark.py + a provenance case; 41/41 suite green
This commit is contained in:
@@ -19,3 +19,29 @@ import llm # noqa: E402 (backend/ingest/llm.py — chat / chat_json over Spark
|
||||
def parse_json(prompt, system=None, max_tokens=400):
|
||||
"""Send to local Qwen (temp 0, thinking off) and parse the first JSON object, or None."""
|
||||
return llm.chat_json(prompt, system=system, max_tokens=max_tokens)
|
||||
|
||||
|
||||
# The vision model only TRANSCRIBES the card; the existing text-parse flow then extracts the
|
||||
# structured proposal from that transcription. Keeping the two steps separate (vs. asking the
|
||||
# vision model for JSON directly) is deliberate: the transcription becomes the source text the
|
||||
# email-integrity check runs against, so the "only keep an address that literally appears in the
|
||||
# source, never let the model mint one" rule (parse.normalize) protects card intake too.
|
||||
CARD_SYSTEM = (
|
||||
"You are transcribing a photo of a business card for a venture-fund team. Read every line of "
|
||||
"text on the card and write it out exactly as printed — the person's name, job title, company "
|
||||
"or firm name, email address, phone number(s), website, and mailing address. Copy the email "
|
||||
"address and phone numbers character-for-character; never guess, complete, or correct them. Do "
|
||||
"not summarize, translate, or add anything that is not printed on the card. If the image is not "
|
||||
"a readable business card, reply with the single word NONE. Output only the transcription, one "
|
||||
"item per line."
|
||||
)
|
||||
|
||||
|
||||
def transcribe_card(image_b64, mime="image/jpeg", chat_fn=None):
|
||||
"""Vision-transcribe a business card to faithful text via the local VL model (same model and
|
||||
Spark Control endpoint as the text parse). Returns the transcription string, or '' if the model
|
||||
saw no readable card. `chat_fn` is injectable for offline tests (defaults to Spark/VL)."""
|
||||
chat_fn = chat_fn or llm.chat_vision
|
||||
out = (chat_fn("Transcribe this business card.", image_b64, mime=mime,
|
||||
system=CARD_SYSTEM, max_tokens=600) or "").strip()
|
||||
return "" if out.upper() == "NONE" else out
|
||||
|
||||
Reference in New Issue
Block a user