Capture city + LinkedIn on card intake; sharpen the transcription prompt

The card transcription prompt now reads emails/URLs/phones character-by-character,
explicitly forbids autocompleting toward a plausible domain (the mara.com ->
marac.com failure), and emits labeled lines (which also feeds the field extractor
cleaner input).

The extractor gains city + linkedin_url. city is a plain field (low-harm if wrong;
the human sees it on the card). linkedin_url follows the email-integrity rule: kept
only if it literally appears in the source / a revise instruction, never minted -- a
wrong profile URL points at the wrong person. Both flow to the contact via the
existing log-communication upsert (city also syncs to the grid contact pill).

Phone is intentionally NOT included yet: the bot's write path can't store it until a
small server-side change lands (next s9pk). See the matrix-intake guide.
This commit is contained in:
Keysat
2026-06-20 11:07:17 -05:00
parent 5e115a3409
commit 8b2eb01a65
8 changed files with 120 additions and 13 deletions
+14 -7
View File
@@ -27,13 +27,20 @@ def parse_json(prompt, system=None, max_tokens=400):
# email-integrity check runs against, so the "only keep an address that literally appears in the
# source, never let the model mint one" rule (parse.normalize) protects card intake too.
CARD_SYSTEM = (
"You are transcribing a photo of a business card for a venture-fund team. Read every line of "
"text on the card and write it out exactly as printed — the person's name, job title, company "
"or firm name, email address, phone number(s), website, and mailing address. Copy the email "
"address and phone numbers character-for-character; never guess, complete, or correct them. Do "
"not summarize, translate, or add anything that is not printed on the card. If the image is not "
"a readable business card, reply with the single word NONE. Output only the transcription, one "
"item per line."
"You are transcribing a photo of a business card. Copy the text EXACTLY as printed — never "
"paraphrase, translate, complete, normalize, or correct anything.\n"
"Read each of these character-by-character and reproduce every glyph precisely. Do NOT 'fix' "
"them toward a more common spelling or a well-known company's domain, and never add or drop a "
"character:\n"
" - Email: check the local part, the @, and the domain separately (transcribe 'mara.com' as "
"'mara.com', never 'marac.com').\n"
" - Phone number(s).\n"
" - Website / LinkedIn URL.\n"
"Then list, each on its own labeled line and ONLY if present on the card:\n"
" Name: Title: Company: Email: Phone: LinkedIn: City:\n"
"If a character is genuinely ambiguous, give your single best reading — never invent extra "
"characters to fill a gap. If the image is not a readable business card, reply with the single "
"word NONE. Output only the labeled lines, nothing else."
)