8b2eb01a65
The card transcription prompt now has the model read emails/URLs/phones character-by-character, explicitly forbids autocompleting toward a plausible domain (the mara.com -> marac.com failure), and emits labeled lines (which also gives the field extractor cleaner input). The extractor gains city and linkedin_url. city is a plain field (low harm if wrong; the human sees it on the card). linkedin_url follows the email-integrity rule: it is kept only if it literally appears in the source or a revise instruction, never minted, because a wrong profile URL points at the wrong person. Both flow to the contact via the existing log-communication upsert (city also syncs to the grid contact pill). Phone is intentionally NOT included yet: the bot's write path can't store it until a small server-side change lands (next s9pk). See the matrix-intake guide.
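Here is a minimal sketch of the literal-appearance gate described above. It makes several assumptions:

- Matching ignores case.
- A value must appear as a whole token, so `o@mara.com` is not accepted just because `jo@mara.com` is on the card.
- A trailing slash on a URL is tolerated.
- Otherwise the check is deliberately strict: a scheme or `www.` prefix that differs between the card and the value counts as a mismatch.

`appears_literally` and `keep_if_literal` are hypothetical helpers for illustration, not the repo's `parse.normalize`.

```python
import re

# Characters that can continue an email/URL token. A match flanked by one of these is
# only a fragment of a longer token (e.g. "o@mara.com" inside "jo@mara.com").
_TOKEN = r"[\w%+\-@/]"


def appears_literally(value, text):
    """True if `value` occurs in `text` as a whole token, ignoring case."""
    pattern = (
        rf"(?<!{_TOKEN})(?<!\.)"  # not glued to a preceding token character or dot
        + re.escape(value)
        + rf"(?!{_TOKEN})(?!\.\w)"  # a following '.' is allowed only as end punctuation
    )
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def keep_if_literal(value, *sources):
    """Return `value` (stripped) if it literally appears in any source, else None.

    Hypothetical helper: `sources` would be the card transcription and any revise
    instruction. A value the model produced that appears in none of them is dropped.
    """
    if not value or not value.strip():
        return None
    needle = value.strip().rstrip("/")  # tolerate a trailing slash on URLs
    if not needle:
        return None
    for src in sources:
        if src and appears_literally(needle, src):
            return value.strip()
    return None


card = "Email: jo@mara.com\nLinkedIn: linkedin.com/in/jo-mara"
print(keep_if_literal("jo@mara.com", card))   # jo@mara.com
print(keep_if_literal("jo@marac.com", card))  # None: minted domain
print(keep_if_literal("o@mara.com", card))    # None: fragment of jo@mara.com
```

Matching whole tokens rather than raw substrings is the choice that matters here. A plain `in` test would accept a prefix-dropped address that happens to sit inside the real one.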
55 lines
3.0 KiB
Python
"""Thin reuse of the in-repo local-Qwen client (backend/ingest/llm.py) via Spark Control.

We import the ingest client rather than re-implementing the HTTP call so the intake bot
speaks the exact same Spark contract (model, /v1/chat/completions, TLS verify, .env load).
The intake message is real LP substance, but it goes ONLY to the local Qwen on Ten31 infra
— never Claude — so no scrub boundary applies (same basis as the daily digest). Never call a
Spark directly; everything goes through SPARK_CONTROL_URL.
"""

import os
import sys

# Put the sibling ingest directory (backend/ingest) on sys.path so `import llm` resolves there.
_INGEST = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "ingest")
if _INGEST not in sys.path:
    sys.path.insert(0, _INGEST)

import llm  # noqa: E402 (backend/ingest/llm.py — chat / chat_json over Spark Control)


def parse_json(prompt, system=None, max_tokens=400):
    """Send to local Qwen (temp 0, thinking off) and parse the first JSON object, or None."""
    return llm.chat_json(prompt, system=system, max_tokens=max_tokens)


# The vision model only TRANSCRIBES the card; the existing text-parse flow then extracts the
# structured proposal from that transcription. Keeping the two steps separate (vs. asking the
# vision model for JSON directly) is deliberate: the transcription becomes the source text the
# email-integrity check runs against, so the "only keep an address that literally appears in the
# source, never let the model mint one" rule (parse.normalize) protects card intake too.
CARD_SYSTEM = (
    "You are transcribing a photo of a business card. Copy the text EXACTLY as printed — never "
    "paraphrase, translate, complete, normalize, or correct anything.\n"
    "Read each of these character-by-character and reproduce every glyph precisely. Do NOT 'fix' "
    "them toward a more common spelling or a well-known company's domain, and never add or drop a "
    "character:\n"
    " - Email: check the local part, the @, and the domain separately (transcribe 'mara.com' as "
    "'mara.com', never 'marac.com').\n"
    " - Phone number(s).\n"
    " - Website / LinkedIn URL.\n"
    "Then list, each on its own labeled line and ONLY if present on the card:\n"
    " Name: Title: Company: Email: Phone: LinkedIn: City:\n"
    "If a character is genuinely ambiguous, give your single best reading — never invent extra "
    "characters to fill a gap. If the image is not a readable business card, reply with the single "
    "word NONE. Output only the labeled lines, nothing else."
)


def transcribe_card(image_b64, mime="image/jpeg", chat_fn=None):
    """Vision-transcribe a business card to faithful text via the local VL model (same model and
    Spark Control endpoint as the text parse). Returns the transcription string, or '' if the model
    saw no readable card. `chat_fn` is injectable for offline tests (defaults to Spark/VL)."""
    chat_fn = chat_fn or llm.chat_vision
    out = (chat_fn("Transcribe this business card.", image_b64, mime=mime,
                   system=CARD_SYSTEM, max_tokens=600) or "").strip()
    # CARD_SYSTEM asks for the single word NONE when the image is not a readable card.
    return "" if out.upper() == "NONE" else out
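For offline tests, `transcribe_card` accepts any `chat_fn` with the call shape used above: a prompt, the image, then `mime`, `system`, and `max_tokens` keywords. Below is a minimal sketch of such a stub, plus a hypothetical `parse_labeled_lines` helper that turns the labeled lines CARD_SYSTEM requests into a dict. The field names (`linkedin_url`, `city`, and so on) and the keep-first rule for repeated labels are assumptions, not the extractor's actual behavior.

```python
# Labels CARD_SYSTEM asks for, mapped to assumed (hypothetical) field names.
LABELS = {
    "name": "name",
    "title": "title",
    "company": "company",
    "email": "email",
    "phone": "phone",
    "linkedin": "linkedin_url",
    "city": "city",
}


def parse_labeled_lines(text):
    """Parse 'Label: value' lines into a dict; unknown labels and blank values are skipped."""
    fields = {}
    for line in (text or "").splitlines():
        # partition on the FIRST colon only, so "LinkedIn: https://..." keeps its URL intact
        label, sep, value = line.partition(":")
        key = LABELS.get(label.strip().lower())
        value = value.strip()
        if sep and key and value:
            fields.setdefault(key, value)  # keep the first occurrence of a repeated label
    return fields


def fake_chat(prompt, image_b64, mime="image/jpeg", system=None, max_tokens=600):
    """Offline stand-in for llm.chat_vision, accepting the arguments transcribe_card passes."""
    return (
        "Name: Jo Doe\n"
        "Email: jo@mara.com\n"
        "LinkedIn: https://linkedin.com/in/jo-doe\n"
        "City: Austin\n"
    )


print(parse_labeled_lines(fake_chat("Transcribe this business card.", "<base64>")))
```

In a test, this would be wired as `transcribe_card(image_b64, chat_fn=fake_chat)`, so no request reaches Spark Control.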