Matrix intake: fuzzy investor matching + conversational in-thread edits (v0.1.0:86)

Close the two locked post-deploy enhancements for the Matrix intake bot.

Fuzzy matching (server-side, ships in the s9pk): new find_intake_candidates in server.py returns ranked deterministic near-matches (difflib name similarity + token-set Jaccard, legal-suffix-aware, + email Levenshtein <= 2); GET /api/intake/match now returns {match, candidates}. The bot surfaces a numbered shortlist so a near-duplicate (Charlie/Charles, Acme Capital vs Acme Capital LLC, a one-char email typo) is confirmed by a human instead of silently creating a second investor. Exact match still auto-attaches; fuzzy candidates are never auto-attached. The optional LLM-judge re-rank is deferred.

Conversational edits (bot-side, ships on the Spark): any in-thread reply that isn't yes/no/edit field=value is treated as a natural-language revision and re-run through local Qwen (parse.revise). Email integrity is preserved -- a changed address must literally appear in the instruction; the model's email field is structurally unreachable. No-op revisions re-prompt.

Docs/current-state brought current; 27/27 backend tests green.
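
The server-side find_intake_candidates is not part of the hunks shown here, so the following is only a minimal sketch of how the three signals named above (difflib similarity, token-set Jaccard with legal suffixes dropped, email Levenshtein <= 2) might combine into a deterministic ranking. The helper names, the suffix list, the 0.8 threshold, and the 0.9 weight for an email near-match are all assumptions made for illustration, not values taken from server.py.

```python
import difflib
import re

# Assumed suffix list; the set actually used by server.py is not shown here.
_LEGAL_SUFFIXES = {"llc", "inc", "ltd", "lp", "llp", "corp", "co"}


def normalize_name(name):
    """Lowercase, split on non-alphanumerics, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", (name or "").lower())
    while tokens and tokens[-1] in _LEGAL_SUFFIXES:
        tokens.pop()
    return tokens


def name_score(a, b):
    """Best of difflib ratio and token-set Jaccard over the normalized names."""
    ta, tb = normalize_name(a), normalize_name(b)
    if not ta or not tb:
        return 0.0
    ratio = difflib.SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    sa, sb = set(ta), set(tb)
    jaccard = len(sa & sb) / len(sa | sb)
    return max(ratio, jaccard)


def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def rank_candidates(name, email, investors, threshold=0.8):
    """Return near-matches best-first; ties are broken by id so output is stable."""
    scored = []
    for inv in investors:
        score = name_score(name, inv.get("name"))
        inv_email = (inv.get("email") or "").lower()
        if email and inv_email and levenshtein(email.lower(), inv_email) <= 2:
            score = max(score, 0.9)  # assumed weight for an email near-match
        if score >= threshold:
            scored.append((score, inv))
    scored.sort(key=lambda pair: (-pair[0], str(pair[1].get("id"))))
    return [inv for _, inv in scored]
```

Under these assumptions, "Acme Capital" and "Acme Capital LLC" normalize to the same tokens, and a one-character email typo lands within the distance bound, so both would appear on the shortlist for a human to confirm rather than being attached automatically.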
2026-06-17 18:50:58 -05:00
parent fa6c9da0e6
commit 0b893295e1
15 changed files with 734 additions and 41 deletions
@@ -5,7 +5,12 @@ Matrix thread root (the bot's proposal lives in a thread rooted at the user's me
 the user replies inside that thread). In-memory and ephemeral by design — a restart drops
 pending proposals (the user just re-sends), matching matrix-bridge's stateless-by-default
 ethos. Nothing here writes to the CRM; the bot calls the CRM client only after `approve`.
+
+A proposal carries a `_stage`: "approval" (the normal yes/edit/no card) or "disambiguate"
+(a fuzzy-match shortlist the human must resolve — pick a number / "new" / "no" — before it
+becomes an approval-stage proposal). The shortlist itself rides on `_candidates`.
 """
+import re

 # field aliases accepted in `edit <field>=<value>`
 _EDIT_ALIASES = {
@@ -18,6 +23,10 @@ _EDIT_ALIASES = {

 _YES = {"yes", "y", "approve", "approved", "ok", "confirm", "go", "👍", "✅"}
 _NO = {"no", "n", "cancel", "discard", "reject", "stop", "👎", "❌"}
+# "create a new investor anyway" replies to a disambiguation shortlist
+_NEW = {"new", "none", "new investor", "none of these", "create", "create new", "add new", "neither"}
+
+_CONTENT_FIELDS = ("intent", "investor_name", "contact_name", "contact_email", "contact_title", "note")


 class ProposalStore:
@@ -84,6 +93,75 @@ def apply_edit(proposal, field, value):
    return updated


+def same_fields(a, b):
+    """True if two proposals carry identical content (used to detect a no-op NL revision so we
+    don't tell the human 'Updated' when nothing changed)."""
+    return all((a or {}).get(k) == (b or {}).get(k) for k in _CONTENT_FIELDS)
+
+
+def interpret_disambiguation(text, n_candidates):
+    """Classify a reply to a fuzzy-match shortlist.
+
+    Returns ("pick", index) | ("new", None) | ("reject", None) | ("unknown", None). A bare
+    number selects that candidate; "new"/"none" creates a new investor; "no"/"cancel" discards."""
+    t = (text or "").strip().lower()
+    if not t:
+        return ("unknown", None)
+    if t in _NO:
+        return ("reject", None)
+    if t in _NEW:
+        return ("new", None)
+    m = re.fullmatch(r"#?\s*(\d{1,2})", t)
+    if m:
+        idx = int(m.group(1)) - 1
+        if 0 <= idx < n_candidates:
+            return ("pick", idx)
+    return ("unknown", None)
+
+
+def attach_to_candidate(proposal, candidate):
+    """Promote a disambiguation pick into an approval-stage meeting note on the chosen investor.
+    The note will target that existing grid row (via _match_id); the firm name is shown for
+    accuracy. Drops the shortlist."""
+    updated = dict(proposal)
+    updated.pop("_candidates", None)
+    updated["_stage"] = "approval"
+    updated["_match_id"] = candidate["id"]
+    updated["intent"] = "meeting_note"
+    if candidate.get("name"):
+        updated["investor_name"] = candidate["name"]
+    return updated
+
+
+def promote_to_new(proposal):
+    """Disambiguation 'new' — discard the shortlist and proceed as a new-investor proposal."""
+    updated = dict(proposal)
+    updated.pop("_candidates", None)
+    updated.pop("_match_id", None)
+    updated["_stage"] = "approval"
+    return updated
+
+
+def render_disambiguation(proposal):
+    """Render the fuzzy-match shortlist a human resolves before we create a new investor."""
+    name = proposal.get("investor_name") or proposal.get("contact_name") or "?"
+    cands = proposal.get("_candidates") or []
+    lines = [f"🔎 Before adding **{name}** as new — these existing investors look similar:"]
+    for i, c in enumerate(cands, 1):
+        lines.append(f"  **{i}.** {c.get('name') or '?'}")
+    lines.append("")
+    lines.append("Reply a **number** to log this against that investor, **new** to add it as a "
+                 "new investor, or **no** to discard.")
+    return "\n".join(lines)
+
+
+def disambiguation_nudge(proposal):
+    """Brief main-timeline pointer for a disambiguation proposal (the shortlist is in the thread)."""
+    name = proposal.get("investor_name") or proposal.get("contact_name") or "?"
+    return (f"🔎 **{name}** may match an existing investor — open the **thread** to pick one "
+            "or confirm it's new.")
+
+
 def render(proposal):
    """Render a proposal as the in-thread message a human approves."""
    if proposal.get("intent") == "meeting_note":