Matrix intake: strip surrounding punctuation from extracted emails
normalize()'s email regex matched non-@/non-space runs, so "Name <addr>" (the most common contact format) yielded "<addr"; only trailing punctuation was stripped, never leading.

Tighten the regex to standard local@domain.tld so the bare address is extracted from <…>, (…), and trailing-period forms. Found via the live-deploy pre-flight. Add a regression test.

Also log two intake backlog items in ROADMAP: the scoped service-credential auth path (deferred; bot uses a member login for now) and fuzzy match + in-thread confirm (post-deploy).
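The behavior change described above can be sketched as follows. This is a minimal illustration, not the project's code: the two patterns are copied from the diff, but `extract_email`, `SAMPLES`, and the sample addresses are hypothetical stand-ins for the real normalize() and its regression test, neither of which is shown in this commit view.

```python
import re

# Patterns copied from the diff: the old one accepts any non-@, non-space run,
# the new one restricts the local part and domain to common address characters.
OLD_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
NEW_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")


def extract_email(text, pattern=NEW_EMAIL_RE):
    """Hypothetical helper: return the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Hypothetical sample inputs covering the forms named in the commit message.
SAMPLES = [
    "Jane Doe <jane@example.com>",
    "Jane Doe (jane@example.com)",
    "Reach me at jane@example.com.",
]

if __name__ == "__main__":
    # Show what each pattern extracts from each sample, old then new.
    for sample in SAMPLES:
        old = extract_email(sample, OLD_EMAIL_RE)
        new = extract_email(sample)
        print(repr(old), "->", repr(new))
```

With the old pattern the surrounding brackets, parentheses, or trailing period are part of the match; with the new one only the bare address is. The tighter character classes are ASCII-only, so addresses with quoted or non-ASCII local parts would no longer match.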
@@ -20,7 +20,7 @@ SYSTEM = (
"Use null (not empty string) for anything not present. Output JSON only."
)
-_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
+_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")
_VALID_INTENTS = {"new_investor", "meeting_note", "unclear"}
_FIELDS = ("intent", "investor_name", "contact_name", "contact_email", "contact_title", "note")