Matrix intake: strip surrounding punctuation from extracted emails
normalize()'s email regex matched non-@/non-space runs, so "Name <addr>" (the most common contact format) yielded "<addr"; only trailing punctuation was stripped, never leading. Tighten the regex to standard local@domain.tld so the bare address is extracted from <…>, (…), and trailing-period forms. Found via the live-deploy pre-flight. Add a regression test. Also log two intake backlog items in ROADMAP: the scoped service-credential auth path (deferred; bot uses a member login for now) and fuzzy match + in-thread confirm (post-deploy).
This commit is contained in:
+11
@@ -100,6 +100,17 @@ Use the **matrix-bridge** repo's pattern to listen on a dedicated ten31-database
|
||||
- **CRM-side:** `POST /api/intake/investor` (service-auth) creates a new investor+contact **through the existing grid-save path** (so relational sync + audit + backup-on-write happen as with a UI edit; bot never does whole-blob RMW) or appends a meeting note to the interaction log for an existing investor; `GET /api/intake/match?q=` fuzzy-matches via the existing entity-resolution/email-matcher. New investor needs no fund at intake.
|
||||
- **Phases:** M1 = scaffold + parse + in-thread propose, **no writes** (proves Matrix↔Spark). M2 = intake endpoint + match + write-on-approve + tests. M3 (deferred) = business-card photo.
|
||||
|
||||
**Post-deploy enhancement — fuzzy match + in-thread confirm (Grant, 2026-06-17).** Today `find_intake_match` is **exact-after-normalization** (`_normalize_text` = lowercase+strip), so near-misses — "Charlie" vs "Charles" (same last name), "Acme Capital" vs "Acme Capital LLC", a one-character email typo — return no match and the bot proposes a **new** investor, risking a duplicate the human approves without realizing a near-match exists. The existing in-thread approval gate is useless against this because the human is never *shown* the near-match. Fix: matcher returns **ranked fuzzy candidates** (deterministic pre-filter: normalized name similarity / token overlap + email edit-distance ≤ ~2), surfaced in-thread for the human to confirm or pick, with the **local Spark LLM optionally re-ranking/judging the shortlist** (good at Charlie/Charles + legal-suffix equivalence; fed only the shortlist, never the whole LP list). Keeps the approval gate but makes it effective against duplicates. Land **after** the live smoke — net-new logic + reply grammar + tests; the current exact match is safe and its failure mode (a duplicate) is recoverable via the existing entity-merge subsystem (`backend/entity_*.py`).
|
||||
|
||||
### Scoped service-credential auth path for automated CRM writers
|
||||
*Surfaced 2026-06-17 while deploying the Matrix intake bot. **Decision: defer — the bot uses a dedicated member username/password for now.** The CRM has no API-key/service-token path; its only auth is username+password → JWT. A dedicated **member** login is appropriately scoped against what matters operationally (no admin: can't manage users, reset data, or change settings) and unblocks the live smoke today.*
|
||||
|
||||
**Accepted residual risk (why this is worth revisiting):** a member credential is far broader than the bot's actual need (two endpoints: `GET /api/intake/match`, `POST /api/fundraising/log-communication`). A member can **read the entire LP/prospect database** — the exact data this system exists to keep off third-party servers — plus broad member-level *write* within the fundraising domain (could create/append on any investor). The credential lives in a `.env` on the Spark, so a Spark compromise leaks read-access to all LP data. Mitigating context: own-infra, LAN-local; the Matrix bot is the **first out-of-process API writer** (the digest runs in-process with direct DB access), so there is exactly **one** consumer today → building a token-scope framework now is premature (YAGNI).
|
||||
|
||||
**Right long-term design:** a hashed, revocable **service token** with a per-route **scope allowlist** (intake-match + log-communication only), minted/revoked from the admin panel, replacing the bot's member login. Revocation then kills the token without rotating a reused human password.
|
||||
|
||||
**Build trigger:** when a **second** out-of-process automated writer appears, OR before **any** automated writer is reachable beyond the LAN — whichever comes first. Build it once, properly, at that point.
|
||||
|
||||
### Admin-only vs. all-users web-UI surface — audit
|
||||
*Requested 2026-06-16 (idea, P2).* Have the **explorer agent** report which web-UI functionality is visible only to admins vs. to all users (member role) — a map of the role-gated surface across `frontend/index.html` and the backend route auth checks. Useful input for the consolidation/permissions work.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user