Matrix intake: strip surrounding punctuation from extracted emails

normalize()'s email regex matched non-@/non-space runs, so "Name <addr>"
(the most common contact format) yielded "<addr"; only trailing punctuation
was stripped, never leading. Tighten the regex to standard local@domain.tld
so the bare address is extracted from <…>, (…), and trailing-period forms.
Found via the live-deploy pre-flight. Add a regression test.
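
The behavior change described above can be checked with a small sketch. It
assumes only the two patterns from this commit; extract_email, OLD_EMAIL_RE,
NEW_EMAIL_RE, and the sample addresses are hypothetical names used for
illustration and are not the project's normalize() or its regression test.

```python
import re

# The two patterns from this commit (constant names are illustrative).
OLD_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
NEW_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")


def extract_email(text, pattern=NEW_EMAIL_RE):
    """Return the first address found in text, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Contact formats named in the commit message, with a made-up address.
SAMPLES = [
    "Jane Doe <jane@example.com>",
    "Jane Doe (jane@example.com)",
    "Reach me at jane@example.com.",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        old = extract_email(sample, OLD_EMAIL_RE)
        new = extract_email(sample, NEW_EMAIL_RE)
        print(f"{sample!r}: old={old!r} new={new!r}")
```

The old pattern keeps the surrounding "<", "(", ">", ")" and "." characters
because its character classes exclude only "@" and whitespace. The new pattern
is narrower than the full address grammar, so unusual but valid addresses
(quoted local parts, for instance) would no longer match.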

Also log two intake backlog items in ROADMAP: the scoped service-credential
auth path (deferred; bot uses a member login for now) and fuzzy match +
in-thread confirm (post-deploy).
Keysat
2026-06-17 14:06:32 -05:00
parent 7ad0ee7624
commit fd2e3ed78e
3 changed files with 29 additions and 1 deletions
+1 -1
@@ -20,7 +20,7 @@ SYSTEM = (
"Use null (not empty string) for anything not present. Output JSON only."
)
-_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
+_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")
_VALID_INTENTS = {"new_investor", "meeting_note", "unclear"}
_FIELDS = ("intent", "investor_name", "contact_name", "contact_email", "contact_title", "note")