diff --git a/ROADMAP.md b/ROADMAP.md
index 2ca0ce1..6204346 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -100,6 +100,17 @@ Use the **matrix-bridge** repo's pattern to listen on a dedicated ten31-database
 - **CRM-side:** `POST /api/intake/investor` (service-auth) creates a new investor+contact **through the existing grid-save path** (so relational sync + audit + backup-on-write happen as with a UI edit; bot never does whole-blob RMW) or appends a meeting note to the interaction log for an existing investor; `GET /api/intake/match?q=` fuzzy-matches via the existing entity-resolution/email-matcher. New investor needs no fund at intake.
 - **Phases:** M1 = scaffold + parse + in-thread propose, **no writes** (proves Matrix↔Spark). M2 = intake endpoint + match + write-on-approve + tests. M3 (deferred) = business-card photo.
 
+**Post-deploy enhancement — fuzzy match + in-thread confirm (Grant, 2026-06-17).** Today `find_intake_match` is **exact-after-normalization** (`_normalize_text` = lowercase+strip), so near-misses — "Charlie" vs "Charles" (same last name), "Acme Capital" vs "Acme Capital LLC", a one-character email typo — return no match and the bot proposes a **new** investor, risking a duplicate the human approves without realizing a near-match exists. The existing in-thread approval gate is useless against this because the human is never *shown* the near-match. Fix: matcher returns **ranked fuzzy candidates** (deterministic pre-filter: normalized name similarity / token overlap + email edit-distance ≤ ~2), surfaced in-thread for the human to confirm or pick, with the **local Spark LLM optionally re-ranking/judging the shortlist** (good at Charlie/Charles + legal-suffix equivalence; fed only the shortlist, never the whole LP list). Keeps the approval gate but makes it effective against duplicates. Land **after** the live smoke — net-new logic + reply grammar + tests; the current exact match is safe and its failure mode (a duplicate) is recoverable via the existing entity-merge subsystem (`backend/entity_*.py`).
+
+### Scoped service-credential auth path for automated CRM writers
+*Surfaced 2026-06-17 while deploying the Matrix intake bot. **Decision: defer — the bot uses a dedicated member username/password for now.** The CRM has no API-key/service-token path; its only auth is username+password → JWT. A dedicated **member** login is appropriately scoped against what matters operationally (no admin: can't manage users, reset data, or change settings) and unblocks the live smoke today.*
+
+**Accepted residual risk (why this is worth revisiting):** a member credential is far broader than the bot's actual need (two endpoints: `GET /api/intake/match`, `POST /api/fundraising/log-communication`). A member can **read the entire LP/prospect database** — the exact data this system exists to keep off third-party servers — plus broad member-level *write* within the fundraising domain (could create/append on any investor). The credential lives in a `.env` on the Spark, so a Spark compromise leaks read-access to all LP data. Mitigating context: own-infra, LAN-local; the Matrix bot is the **first out-of-process API writer** (the digest runs in-process with direct DB access), so there is exactly **one** consumer today → building a token-scope framework now is premature (YAGNI).
+
+**Right long-term design:** a hashed, revocable **service token** with a per-route **scope allowlist** (intake-match + log-communication only), minted/revoked from the admin panel, replacing the bot's member login. Revocation then kills the token without rotating a reused human password.
+
+**Build trigger:** when a **second** out-of-process automated writer appears, OR before **any** automated writer is reachable beyond the LAN — whichever comes first. Build it once, properly, at that point.
+
 ### Admin-only vs. all-users web-UI surface — audit
 *Requested 2026-06-16 (idea, P2).* Have the **explorer agent** report which web-UI functionality is visible only to admins vs. to all users (member role) — a map of the role-gated surface across `frontend/index.html` and the backend route auth checks. Useful input for the consolidation/permissions work.
 
diff --git a/backend/matrix_intake/parse.py b/backend/matrix_intake/parse.py
index b9d05dc..9fd75d9 100644
--- a/backend/matrix_intake/parse.py
+++ b/backend/matrix_intake/parse.py
@@ -20,7 +20,7 @@ SYSTEM = (
     "Use null (not empty string) for anything not present. Output JSON only."
 )
 
-_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
+_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")
 _VALID_INTENTS = {"new_investor", "meeting_note", "unclear"}
 _FIELDS = ("intent", "investor_name", "contact_name", "contact_email", "contact_title", "note")
 
diff --git a/backend/matrix_intake/test_parse.py b/backend/matrix_intake/test_parse.py
index d2c35e0..41789d8 100644
--- a/backend/matrix_intake/test_parse.py
+++ b/backend/matrix_intake/test_parse.py
@@ -43,6 +43,23 @@ def test_fabricated_email_dropped_when_not_in_source():
     assert p["contact_email"] is None
 
 
+def test_email_extracted_without_surrounding_punctuation():
+    # "Name <email>" is the most common contact format; parens / trailing period also occur.
+    # The salvage-from-source path must extract the bare address, never the brackets.
+    cases = [
+        ("New investor: Larch Capital — Dana Reed <dana@larchcap.com>, met at conf", "dana@larchcap.com"),
+        ("ping (sam@beta.io) re the deck", "sam@beta.io"),
+        ("reach kim@acme.co.", "kim@acme.co"),
+    ]
+    for src, expected in cases:
+        p = parse.parse_message(
+            src,
+            parse_fn=_stub({"intent": "new_investor", "investor_name": "X",
+                            "contact_name": "Y", "contact_email": None}),
+        )
+        assert p["contact_email"] == expected, (src, p["contact_email"])
+
+
 def test_meeting_note_intent_preserved():
     p = parse.parse_message(
         "Note for Acme Capital: wants the Q3 deck",
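
Reviewer note: the effect of the `_EMAIL_RE` change can be checked in isolation with a small standalone sketch. It copies the two patterns from the `parse.py` hunk and runs each over inputs shaped like the new test cases. The names `OLD_EMAIL_RE`, `NEW_EMAIL_RE`, and `first_email` are hypothetical, and the sketch assumes the salvage path takes the first `search` hit; how `parse.py` actually applies the pattern is not visible in this diff.

```python
import re

# Patterns copied from the parse.py hunk (old = removed line, new = added line).
OLD_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
NEW_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")


def first_email(pattern, text):
    """Return the first match of `pattern` in `text`, or None.

    Hypothetical helper: assumes a first-hit `search`, which may differ
    from what the real salvage-from-source code does.
    """
    match = pattern.search(text)
    return match.group(0) if match else None


# Inputs modeled on the new test cases: angle brackets, parentheses,
# and a sentence-ending period directly after the address.
SAMPLES = [
    "Dana Reed <dana@larchcap.com>, met at conf",
    "ping (sam@beta.io) re the deck",
    "reach kim@acme.co.",
]

if __name__ == "__main__":
    for text in SAMPLES:
        old = first_email(OLD_EMAIL_RE, text)
        new = first_email(NEW_EMAIL_RE, text)
        print(f"{text!r}: old={old!r} new={new!r}")
```

Under that assumption, the old pattern keeps the surrounding punctuation because `[^@\s]` accepts any character other than `@` and whitespace, while the new character classes stop at brackets and the `{2,}` letters-only TLD drops a trailing period.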