Device-test round 2: 4 in-app fixes + Matrix intake cleanup (v0.1.0:99)

Grant's real-phone testing surfaced seven items; this lands six (the seventh, in-app camera card intake, is planned in docs/handoffs/in-app-card-intake-plan.md).

CRM half (ships in the s9pk, v0.1.0:99):
- Intake fuzzy match no longer over-indexes on generic firm words. _name_similarity now compares DISTINCTIVE tokens only (generic descriptors such as "Investment Group", "Capital", "Family Office" are stripped via _GENERIC_ORG_WORDS) for both the difflib ratio and the Jaccard, so "Fortitude Investment Group" stops surfacing Aether/Russell while "Aether Capital" still surfaces "Aether Investment Group". +2 regression cases.
- Mobile grid "Last contact"/staleness sort is reversible. SortSheet gains opt-in dir/onToggleDir; other surfaces (Contacts/Pipeline) are untouched.
- Mobile "Edit investor" prefills a contact's saved email. GET /api/fundraising/state heals a blank grid pill email from the relational mirror (fundraising_contacts.email, else the linked classic contacts.email via fundraising_contacts.contact_id), fill-only, matched by pill order then name; the next one-row save persists it. +test_grid_email_heal.py.
- Mobile quick-log pencil icon renders. iOS collapses a lone, centered flex-child <svg> that is sized only by its width/height attributes; .quicklog-btn svg now gets explicit CSS width/height + flex:none (the pattern the working bottom-tab/sort-pill icons use). The v97 fix only changed the color.

Matrix intake bot (ships on the Spark; bot-only, NOT the s9pk):
- Approve/reject now redacts the whole intake thread (card + ack + main-timeline nudge + the user's own photo/note), mirroring the email-review room. redact_thread takes the room as an argument and matches thread children OR m.in_reply_to replies (so the nudge clears). There is no in-Matrix confirmation after a commit any more: the thread vanishing is the ack. Needs the bot to hold a redact/moderator power level in the intake room.
- New one-time backend/matrix_intake/redact_intake.py clears the room's pre-existing backlog (dry-run by default; --apply to redact).

Tests 42/42 green; frontend render-smoke green. The frontend fixes are verified by inspection + render-smoke (on-device confirmation pending); the bot redaction is verified by live smoke only.
2026-06-20 12:32:56 -05:00
parent 7fe5f57c6e
commit a917280bbb
13 changed files with 606 additions and 58 deletions
@@ -171,9 +171,13 @@ async def main():
                store.put(root, proposal)  # commit failed — restore so the user can retry
                await say(room_id, f"⚠️ write failed, nothing committed: {exc}", root)
                return
-            await say(room_id, f"✅ {summary}", root)
+            # Committed → clear the whole thread (card + ack + nudge + the user's note/photo),
+            # like the email-review room. The thread vanishing is the acknowledgment; a confirmation
+            # reply would just keep it alive (and need redacting too). Needs the bot's redact/mod
+            # power in the intake room to clear the user's own messages — else those linger.
+            await redact_thread(room_id, root)
        elif action == "reject":
-            await say(room_id, "🗑️ Discarded — nothing written.", root)
+            await redact_thread(room_id, root)
        elif action == "edit":
            field, value = payload
            proposal = proposals.apply_edit(proposal, field, value)
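The approve/reject paths above only clear the user's own note/photo if the bot holds redact power in the intake room. A minimal sketch of checking that ahead of time, assuming you already have the `content` of the room's `m.room.power_levels` state event as a dict; `can_redact_others` is a hypothetical helper, not part of the bot. It applies the Matrix spec defaults (`redact` is 50 and `users_default` is 0 when absent) and ignores the separate per-event-type level for sending `m.room.redaction` itself.

```python
def can_redact_others(power_levels, user_id):
    """Hypothetical helper: can `user_id` redact events sent by OTHER users in a room whose
    m.room.power_levels content is `power_levels`? Spec defaults apply when keys are absent:
    users_default = 0, redact = 50."""
    users = power_levels.get("users") or {}
    level = users.get(user_id, power_levels.get("users_default", 0))
    return level >= power_levels.get("redact", 50)
```

Element's "Moderator" role is power level 50, which meets the default `redact` threshold; a room that raised `redact` above 50 would need the bot promoted further.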
@@ -212,42 +216,49 @@ async def main():
            await say(room_id, "➕ OK — adding as a new investor:\n\n"
                               + proposals.render(updated), root)
        elif action == "reject":
-            await say(room_id, "🗑️ Discarded — nothing written.", root)
+            await redact_thread(room_id, root)  # discard → clear the thread, like an approve
        else:  # unrecognized — re-show the shortlist
            store.put(root, proposal)
            await say(room_id, "I didn't catch that.\n\n" + proposals.render_disambiguation(proposal), root)

-    async def redact_card(event_id):
-        """Redact one event (best-effort). Redacting our OWN message needs no special power;
-        redacting someone else's reply needs the bot to hold a redact/mod power level."""
+    async def redact_card(room_id, event_id):
+        """Redact one event in `room_id` (best-effort). Redacting our OWN message needs no special
+        power; redacting someone else's message (a human reply, or the user's original card photo /
+        intake note) needs the bot to hold a redact/mod power level in that room."""
        try:
-            await client.room_redact(review_room, event_id, reason="proposal resolved")
+            await client.room_redact(room_id, event_id, reason="proposal resolved")
        except Exception as exc:
            print(f"matrix-intake: could not redact {event_id}: {exc}", flush=True)

-    async def redact_thread(root):
-        """Clear a resolved thread: redact the card AND every reply under it, so the thread drops
-        out of the threads view (not just the main timeline). The card is ours (always redactable);
-        the human's yes/no reply needs the bot's redact/mod power — if it lacks power that redact
-        just no-ops and the reply lingers. Finds replies by scanning recent room history for
-        m.thread events pointing at this root (the triggering reply is already synced, so a
-        backward scan from the current token includes it)."""
-        await redact_card(root)
+    async def redact_thread(room_id, root):
+        """Clear a resolved thread in `room_id`: redact the root AND every message that hangs off it
+        — the m.thread children (cards/acks/human replies) AND the main-timeline **nudge** (a plain
+        m.in_reply_to reply, not a thread child), so the thread drops out of both the threads view
+        and the timeline. For email-review the root is the bot's card; for intake it's the USER'S
+        own note/photo, so clearing it (and the human reply) needs the bot's redact/mod power in that
+        room — without it those just no-op and linger. Replies are found by scanning recent history
+        from the current sync token (the triggering reply is already synced, so a backward scan
+        includes it)."""
+        await redact_card(room_id, root)
        token = getattr(client, "next_batch", None)
        if not token:
            return
        try:
            scanned = 0
            for _ in range(MAX_THREAD_SCAN_PAGES):
-                resp = await client.room_messages(review_room, start=token,
+                resp = await client.room_messages(room_id, start=token,
                                                  direction=MessageDirection.back, limit=100)
                chunk = getattr(resp, "chunk", None)
                if not chunk:
                    break
                for ev in chunk:
                    rel = ((getattr(ev, "source", None) or {}).get("content", {}) or {}).get("m.relates_to") or {}
-                    if rel.get("rel_type") == "m.thread" and rel.get("event_id") == root:
-                        await redact_card(ev.event_id)
+                    in_reply = (rel.get("m.in_reply_to") or {}).get("event_id")
+                    # A thread child carries event_id==root; the un-threaded nudge carries only
+                    # m.in_reply_to.event_id==root. Catch both so the thread AND its main-timeline
+                    # pointer clear together.
+                    if rel.get("event_id") == root or in_reply == root:
+                        await redact_card(room_id, ev.event_id)
                token = getattr(resp, "end", None)
                scanned += len(chunk)
                if not token or scanned > 1000:
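The new matching condition in `redact_thread` can be isolated as a pure predicate over a raw event's JSON to see which relation shapes it catches. This is a sketch (`hangs_off` is a hypothetical name): a thread child carries `m.relates_to.event_id == root` with `rel_type: m.thread`, while the un-threaded nudge carries only `m.relates_to.m.in_reply_to.event_id == root`.

```python
def hangs_off(event_source, root):
    """True if a raw Matrix event relates to `root` the way redact_thread's new check tests:
    m.relates_to.event_id == root (thread children, but ANY rel_type), or
    m.relates_to.m.in_reply_to.event_id == root (a plain reply, e.g. the nudge)."""
    content = event_source.get("content") or {}
    rel = content.get("m.relates_to") or {}
    in_reply = (rel.get("m.in_reply_to") or {}).get("event_id")
    return rel.get("event_id") == root or in_reply == root
```

Because the condition no longer checks `rel_type`, it also catches other relations that point at the root, such as reactions (`m.annotation`) and edits (`m.replace`). For clearing a resolved thread that is plausibly what you want, but it is broader than the old `m.thread`-only test.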
@@ -275,7 +286,7 @@ async def main():
                return
            # Success → clear the whole thread (card + replies). No confirmation: the thread
            # vanishing is the acknowledgment, and a confirmation reply would keep it alive.
-            await redact_thread(root)
+            await redact_thread(review_room, root)
        elif decision == "reject":
            email_threads.pop(root, None)
            try:
@@ -284,7 +295,7 @@ async def main():
                email_threads[root] = item
                await say(room_id, email_proposals.frame(f"⚠️ couldn't dismiss it ({str(exc)[:200]}). Try again."), root)
                return
-            await redact_thread(root)
+            await redact_thread(review_room, root)
        else:
            try:
                new_note = await asyncio.to_thread(email_proposals.revise_note, item.get("note") or "", text)
@@ -332,7 +343,7 @@ async def main():
                    if not ev:
                        continue
                    try:
-                        await redact_thread(ev)
+                        await redact_thread(review_room, ev)
                        await asyncio.to_thread(crm_client.mark_email_proposal_closed, it["id"])
                        email_threads.pop(ev, None)
                    except Exception as exc:
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""One-time maintenance: clear the intake room's backlog of resolved/stale messages.
+
+Going forward the bot redacts each intake thread when it's approved/rejected (bot card + ack +
+nudge + the user's own note/photo). This clears the messages that piled up BEFORE that shipped.
+
+The intake room is single-purpose and the bot keeps **no durable pending state** (its proposal
+store is in-memory and is lost on every restart), so nothing in the room is "still live" after a
+restart — every message in it is safe to redact. This walks the room history and redacts every
+m.room.message event (text + business-card images), bot's and humans' alike.
+
+Redacting another user's message (the humans' notes/photos) needs the bot to hold a **redact /
+moderator power level** in the intake room — without it those just no-op and linger (the bot's own
+messages still clear). Make the bot a moderator of the intake room in Element first.
+
+Safe by default: prints what it WOULD redact and does nothing. Pass --apply to actually redact.
+Run on the Spark via the bot's own creds/image:
+    docker compose run --rm matrix-intake python -u backend/matrix_intake/redact_intake.py
+    docker compose run --rm matrix-intake python -u backend/matrix_intake/redact_intake.py --apply
+"""
+import asyncio
+import sys
+
+from nio import AsyncClient, MessageDirection
+
+import settings
+
+MAX_PAGES = 50  # 50 * 100 events is far more history than this room holds
+
+
+async def main(apply):
+    mx = settings.matrix_settings()
+    intake_room = mx.get("intake_room")
+    if not intake_room:
+        print("MATRIX_INTAKE_ROOM is not set — nothing to do.")
+        return
+    client = AsyncClient(mx["homeserver"], mx["user_id"])
+    client.restore_login(user_id=mx["user_id"], device_id=mx["device_id"], access_token=mx["token"])
+    try:
+        sync = await client.sync(timeout=10000, full_state=False)
+        token = sync.next_batch
+        targets = []  # (event_id, label)
+        seen = set()
+        for _ in range(MAX_PAGES):
+            resp = await client.room_messages(intake_room, start=token,
+                                              direction=MessageDirection.back, limit=100)
+            chunk = getattr(resp, "chunk", None)
+            if not chunk:
+                break
+            for ev in chunk:
+                src = getattr(ev, "source", None) or {}
+                if src.get("type") != "m.room.message":
+                    continue  # only chat messages + images; leave membership/state events alone
+                eid = getattr(ev, "event_id", None)
+                if not eid or eid in seen:
+                    continue
+                seen.add(eid)
+                content = src.get("content") or {}
+                if not content:
+                    continue  # already redacted (content stripped) — skip
+                msgtype = content.get("msgtype") or "?"
+                body = (content.get("body", "") or "").replace("\n", " ")
+                who = "bot " if getattr(ev, "sender", None) == mx["user_id"] else "user"
+                targets.append((eid, f"{who} [{msgtype}] {body[:60]}"))
+            token = getattr(resp, "end", None)
+            if not token:
+                break
+
+        print(f"messages to clear in the intake room: {len(targets)}")
+        fails = 0
+        for eid, label in targets:
+            print(("APPLY redact " if apply else "WOULD redact ") + eid + "  ::  " + label)
+            if apply:
+                r = await client.room_redact(intake_room, eid, reason="retroactive intake-room cleanup")
+                if not hasattr(r, "event_id"):
+                    fails += 1
+                    print(f"   ! redact failed (need mod power for others' messages?): {r}")
+        print(("done — redacted " if apply else "dry run — would redact ")
+              + f"{len(targets) - (fails if apply else 0)}/{len(targets)} event(s)"
+              + (f"; {fails} failed" if apply and fails else "") + ".")
+    finally:
+        await client.close()
+
+
+if __name__ == "__main__":
+    asyncio.run(main(apply="--apply" in sys.argv[1:]))
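The selection rules in `redact_intake.py` (only `m.room.message` events, skip repeats and already-redacted events whose content is empty, label bot vs. user) can be exercised without a homeserver. This sketch restates them as a pure function over raw event JSON dicts rather than nio event objects; `select_targets` is a hypothetical name.

```python
def select_targets(events, bot_user_id):
    """Return [(event_id, label)] for the messages a backlog sweep would redact.

    `events` are raw Matrix event dicts (an event's JSON source), e.g. newest first.
    """
    targets, seen = [], set()
    for src in events:
        if src.get("type") != "m.room.message":
            continue  # leave membership/state events alone
        eid = src.get("event_id")
        if not eid or eid in seen:
            continue  # guard against handling the same event twice
        seen.add(eid)
        content = src.get("content") or {}
        if not content:
            continue  # already redacted: the server strips a message's content
        msgtype = content.get("msgtype") or "?"
        body = (content.get("body") or "").replace("\n", " ")
        who = "bot " if src.get("sender") == bot_user_id else "user"  # padded to align
        targets.append((eid, f"{who} [{msgtype}] {body[:60]}"))
    return targets
```

Keeping the selection separate from the redaction loop is what makes the dry-run default cheap: the same target list is printed in both modes, and only `--apply` turns it into `room_redact` calls.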
@@ -1305,14 +1305,40 @@ def _strip_legal_suffix(normalized_name):
    return " ".join(toks)


+# Generic firm-descriptor words that carry almost no identifying signal: nearly every firm name
+# contains one ("… Investment Group", "… Capital", "… Family Office"). Two names that overlap ONLY
+# on these are NOT duplicates — 'Fortitude Investment Group' is not 'Aether Investment Group'. We
+# compare on the DISTINCTIVE remainder so a shared descriptor can't inflate the score (the earlier
+# "Capital/Ventures/Partners are distinctive enough to keep" assumption produced false shortlists —
+# Grant, 2026-06-20). If a name is ALL descriptor ('Family Office'), we fall back to its full tokens
+# so there's still something to compare.
+_GENERIC_ORG_WORDS = frozenset({
+    "investment", "investments", "investing", "investor", "investors",
+    "capital", "ventures", "venture", "partners", "partner", "group",
+    "fund", "funds", "management", "advisors", "advisers", "advisory",
+    "asset", "assets", "holdings", "holding", "family", "office",
+    "trust", "associates", "equity", "financial", "finance", "global",
+    "international", "company", "enterprises", "wealth", "the", "and", "of",
+})
+
+
+def _distinctive_tokens(normalized_name):
+    """Tokens of a (legal-suffix-stripped) name with generic firm descriptors removed. Falls back to
+    the full token list when the name is nothing but descriptors, so an all-generic name still compares."""
+    toks = re.findall(r"[a-z0-9]+", normalized_name)
+    keep = [t for t in toks if t not in _GENERIC_ORG_WORDS]
+    return keep or toks
+
+
 def _name_similarity(a, b):
    """0..1 fuzzy similarity between two investor names: the max of difflib's sequence ratio
    (catches near-spellings — 'Charlie'/'Charles') and token-set Jaccard overlap (catches
    word-order differences). Legal-entity suffixes are stripped first, so two names differing
    only by 'LLC'/'LP'/'Inc' score 1.0 (a near-certain duplicate to surface — find_intake_match
-    won't have caught it, since it compares the full string). Favors recall: a shared common
-    name-word ('… Capital') can lift unrelated firms into the 0.6–0.8 band — acceptable noise in
-    a ranked, human-confirmed shortlist; semantic pruning is the deferred LLM-judge's job."""
+    won't have caught it, since it compares the full string). Both the ratio and the Jaccard run on
+    the DISTINCTIVE tokens (generic descriptors like 'Investment Group'/'Capital' removed), so firms
+    that share only a descriptor don't surface as look-alikes; 'Aether Capital' ~ 'Aether Capital
+    Partners' still scores 1.0 on the distinctive 'aether'. Still recall-favoring on real overlap."""
    a = _normalize_text(a)
    b = _normalize_text(b)
    if not a or not b:
@@ -1323,9 +1349,10 @@ def _name_similarity(a, b):
    sb = _strip_legal_suffix(b) or b
    if sa == sb:
        return 1.0
-    ratio = difflib.SequenceMatcher(None, sa, sb).ratio()
-    ta = set(re.findall(r"[a-z0-9]+", sa))
-    tb = set(re.findall(r"[a-z0-9]+", sb))
+    da = _distinctive_tokens(sa)  # order-preserving for the sequence ratio
+    db = _distinctive_tokens(sb)
+    ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
+    ta, tb = set(da), set(db)
    jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
    return max(ratio, jaccard)
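A self-contained sketch of the distinctive-token comparison, for checking the two new regression cases by hand. It uses only a small subset of `_GENERIC_ORG_WORDS`, plain lowercasing in place of `_normalize_text`, and omits the legal-suffix stripping and the exact-match short-circuit; the `distinctive=False` path approximates the pre-change behavior for contrast.

```python
import difflib
import re

# Illustrative SUBSET of _GENERIC_ORG_WORDS (the real set is larger).
GENERIC = frozenset({"investment", "investments", "capital", "group", "partners",
                     "family", "office", "the", "and", "of"})


def all_tokens(name):
    # Plain lowercasing stands in for _normalize_text + _strip_legal_suffix.
    return re.findall(r"[a-z0-9]+", name.lower())


def distinctive_tokens(name):
    toks = all_tokens(name)
    keep = [t for t in toks if t not in GENERIC]
    return keep or toks  # an all-generic name still has something to compare


def similarity(a, b, distinctive=True):
    """max(difflib sequence ratio, token-set Jaccard) over the chosen tokens."""
    tokenize = distinctive_tokens if distinctive else all_tokens
    da, db = tokenize(a), tokenize(b)
    ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
    ta, tb = set(da), set(db)
    jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
    return max(ratio, jaccard)
```

On full tokens, the shared " investment group" run alone lifts the Fortitude/Aether pair to a sequence ratio of roughly 0.73, inside the old 0.6 to 0.8 noise band. On distinctive tokens the comparison is "fortitude" vs. "aether", which share almost nothing, while "Aether Capital" vs. "Aether Investment Group" reduces to "aether" vs. "aether" and scores 1.0.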

@@ -1881,6 +1908,45 @@ def existing_investor_by_source_row(conn):
    return out


+def fundraising_contact_emails_by_row(conn):
+    """{ source_row_id: {'order': {sort_order: email}, 'name': {normalized_name: email}} } of the
+    authoritative email per grid contact, for HEALING blank pill emails on read.
+
+    The grid blob is canonical for the edit sheet, but an email can reach the linked classic
+    contact (via email capture / a contact edit) without ever being written back into the blob
+    pill — so the mobile "Edit investor" sheet shows an empty email for a contact the directory
+    clearly has (Grant, 2026-06-20). We recover it from the relational mirror: prefer the synced
+    fundraising_contacts.email, else the linked classic contacts.email (the source that actually
+    holds the captured address). Keyed by sort_order (pills and fundraising_contacts share the
+    blob order — the robust key) with a normalized-name fallback. Only non-blank emails are
+    returned; filling is fill-only-when-blank in the handler, so it heals and converges (the next
+    one-row save persists the recovered email into the blob)."""
+    out = {}
+    rows = conn.execute(
+        """
+        SELECT fi.source_row_id AS srid, fc.sort_order AS so, fc.full_name AS name,
+               COALESCE(NULLIF(TRIM(fc.email), ''), c.email) AS email
+        FROM fundraising_investors fi
+        JOIN fundraising_contacts fc ON fc.investor_id = fi.id
+        LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
+        """
+    ).fetchall()
+    for r in rows:
+        email = str(r['email'] or '').strip()
+        if not email:
+            continue
+        srid = str(r['srid'] or '')
+        if not srid:
+            continue
+        bucket = out.setdefault(srid, {'order': {}, 'name': {}})
+        if r['so'] is not None:
+            bucket['order'][int(r['so'])] = email
+        nm = _normalize_text(r['name'])
+        if nm:
+            bucket['name'][nm] = email
+    return out
+
+
 def contact_grid_signals(conn, contact_id=None):
    """Return {contacts.id: {'committed': float, 'pipeline_stage': str|None, 'priority': bool}} for
    every classic contact linked to a fundraising-grid investor (via fundraising_contacts.contact_id,
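To see the `COALESCE(NULLIF(TRIM(fc.email), ''), c.email)` precedence and the soft-delete filter in `fundraising_contact_emails_by_row` in action, here is a sketch against an in-memory SQLite database with a pared-down schema (only the columns the query touches; the real tables have more). The `normalize` function is a hypothetical stand-in for `_normalize_text`, whose exact rules are not shown in this diff.

```python
import sqlite3

# Pared-down schema: only the columns the query touches (the real tables have more).
SCHEMA = """
CREATE TABLE fundraising_investors (id TEXT PRIMARY KEY, source_row_id TEXT);
CREATE TABLE fundraising_contacts (id TEXT PRIMARY KEY, investor_id TEXT, full_name TEXT,
                                   email TEXT, sort_order INTEGER, contact_id TEXT);
CREATE TABLE contacts (id TEXT PRIMARY KEY, email TEXT, deleted_at TEXT);
"""

# Same join as the commit: prefer the mirror's own email, else the linked,
# non-deleted classic contact's email.
QUERY = """
SELECT fi.source_row_id, fc.sort_order, fc.full_name,
       COALESCE(NULLIF(TRIM(fc.email), ''), c.email)
FROM fundraising_investors fi
JOIN fundraising_contacts fc ON fc.investor_id = fi.id
LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
"""


def normalize(name):
    """Hypothetical stand-in for _normalize_text: lowercase, collapse whitespace."""
    return " ".join((name or "").lower().split())


def emails_by_row(conn):
    out = {}
    for srid, sort_order, name, email in conn.execute(QUERY):
        email = (email or "").strip()
        if not email or not srid:
            continue  # only non-blank emails on rows that have a source id
        bucket = out.setdefault(str(srid), {"order": {}, "name": {}})
        if sort_order is not None:
            bucket["order"][int(sort_order)] = email
        if normalize(name):
            bucket["name"][normalize(name)] = email
    return out


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
        ("c-phil", "", None),                         # classic contact with no email
        ("c-jose", "jbriones@uwyo.edu", None),        # captured email the blob never got
        ("c-gone", "old@example.com", "2026-01-01"),  # soft-deleted
    ])
    conn.execute("INSERT INTO fundraising_investors VALUES ('inv-w', 'rowW')")
    conn.executemany("INSERT INTO fundraising_contacts VALUES (?, ?, ?, ?, ?, ?)", [
        ("fc-phil", "inv-w", "Philip Treick", "", 0, "c-phil"),
        ("fc-jose", "inv-w", "Jose Briones", "  ", 1, "c-jose"),  # whitespace-only mirror email
        ("fc-gone", "inv-w", "Gone Person", "", 2, "c-gone"),
    ])
    return conn


if __name__ == "__main__":
    print(emails_by_row(build_demo_db()))
    # {'rowW': {'order': {1: 'jbriones@uwyo.edu'}, 'name': {'jose briones': 'jbriones@uwyo.edu'}}}
```

A whitespace-only `fc.email` falls through to the classic contact, a blank classic email yields no entry, and a soft-deleted contact contributes nothing because the `LEFT JOIN` condition excludes it.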
@@ -5830,6 +5896,7 @@ class CRMHandler(BaseHTTPRequestHandler):
        reminder_by_row = reminder_status_by_source_row(conn)
        existing_by_row = existing_investor_by_source_row(conn)
        recency_by_row = staleness_by_source_row(conn)
+        emails_by_row = fundraising_contact_emails_by_row(conn)
        conn.close()

        try:
@@ -5873,6 +5940,19 @@ class CRMHandler(BaseHTTPRequestHandler):
            last_activity, staleness = recency_by_row.get(srid, (None, ''))
            r['last_activity_at'] = last_activity
            r['staleness'] = staleness
+            # Heal blank pill emails from the relational mirror (fill-only — never overwrite a value
+            # already in the blob). Unlike the read-only columns above, email is a REAL blob field,
+            # so this is a backfill, not a derived signal: it needs NO strip point, and the next
+            # one-row save legitimately persists it. Match by pill order, then by name.
+            heal = emails_by_row.get(srid)
+            pills = r.get('contacts')
+            if heal and isinstance(pills, list):
+                for i, c in enumerate(pills):
+                    if not isinstance(c, dict) or str(c.get('email') or '').strip():
+                        continue
+                    found = heal['order'].get(i) or heal['name'].get(_normalize_text(c.get('name')))
+                    if found:
+                        c['email'] = found

        return self.send_json({
            "data": {
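The fill-only loop in the state handler, pulled out as a function for illustration (`heal_pill_emails` is a hypothetical name, and the default normalizer is again a stand-in for `_normalize_text`). It consumes the `{'order': ..., 'name': ...}` shape that `fundraising_contact_emails_by_row` returns for one row.

```python
def _normalize(name):
    # Hypothetical stand-in for _normalize_text: lowercase, collapse whitespace.
    return " ".join((name or "").lower().split())


def heal_pill_emails(pills, heal, normalize=_normalize):
    """Fill blank pill emails in place from heal = {'order': {i: email}, 'name': {norm: email}}.
    Fill-only: a pill that already carries a non-blank email is never touched."""
    if not heal or not isinstance(pills, list):
        return pills
    for i, pill in enumerate(pills):
        if not isinstance(pill, dict) or str(pill.get("email") or "").strip():
            continue
        # Order key first (pill index == fundraising_contacts.sort_order), then name.
        found = heal["order"].get(i) or heal["name"].get(normalize(pill.get("name")))
        if found:
            pill["email"] = found
    return pills
```

As in the handler, the order key wins over the name key. That is robust while pill order and `sort_order` stay in lockstep; if they could ever drift apart, trying the name key first would be the more conservative choice.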
@@ -0,0 +1,135 @@
+#!/usr/bin/env python3
+"""Regression: GET /api/fundraising/state heals blank grid-pill emails from the relational mirror.
+
+The grid blob is canonical for the mobile "Edit investor" sheet, but an email can reach a linked
+classic contact (email capture / a contact edit) without ever being written back into the blob pill
+— so the edit form showed an empty email for a contact the directory clearly had (Grant, 2026-06-20).
+The state handler now fills a blank pill email from fundraising_contacts.email, else the linked
+contacts.email, matched by pill order then name. This asserts:
+  - a blank pill whose linked contact has an email is HEALED on read;
+  - a blank pill whose linked contact is also blank stays blank;
+  - a pill that already carries an email in the blob is NEVER overwritten (fill-only).
+Synthetic data only.
+
+Run: cd backend && python3 test_grid_email_heal.py
+"""
+import http.client
+import json
+import os
+import sqlite3
+import sys
+import tempfile
+import threading
+from http.server import ThreadingHTTPServer
+
+_DATA = tempfile.mkdtemp()
+os.environ["CRM_DATA_DIR"] = _DATA
+os.environ["CRM_DB_PATH"] = os.path.join(_DATA, "crm.db")
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+import server  # noqa: E402
+
+FAILS = []
+
+
+def check(cond, msg):
+    print(("  PASS " if cond else "  FAIL ") + msg)
+    if not cond:
+        FAILS.append(msg)
+
+
+class _Quiet(server.CRMHandler):
+    def log_message(self, *a):
+        pass
+
+
+def _get_state(port, token):
+    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
+    conn.request("GET", "/api/fundraising/state", headers={"Authorization": "Bearer " + token})
+    resp = conn.getresponse()
+    raw = resp.read().decode("utf-8", "replace")
+    conn.close()
+    return resp.status, (json.loads(raw) if raw else None)
+
+
+GRID = {
+    "columns": [{"id": "investor_name", "label": "Investor", "type": "text"},
+                {"id": "contacts", "label": "Contacts", "type": "contacts"}],
+    "rows": [
+        {"id": "rowW", "investor_name": "Wyoming", "notes": "",
+         "contacts": [{"name": "Philip Treick", "email": "", "title": ""},
+                      {"name": "Jose Briones", "email": "", "title": ""}]},
+        {"id": "rowA", "investor_name": "Acme Capital", "notes": "",
+         "contacts": [{"name": "Jane Doe", "email": "keep@acme.com", "title": ""}]},
+    ],
+}
+
+
+def seed():
+    c = sqlite3.connect(os.environ["CRM_DB_PATH"])
+    c.execute("INSERT INTO users (id,username,email,password_hash,full_name,role,is_active) "
+              "VALUES ('u1','grant','grant@ten31.example','x','Grant','admin',1)")
+    c.execute("INSERT INTO fundraising_state (id, grid_json, views_json, version) "
+              "VALUES ('main', ?, '[]', 1) "
+              "ON CONFLICT(id) DO UPDATE SET grid_json = excluded.grid_json", (json.dumps(GRID),))
+    # Classic contacts directory: Jose has the captured email the blob never got; Philip is blank.
+    c.execute("INSERT INTO contacts (id,first_name,last_name,email) VALUES "
+              "('c-phil','Philip','Treick',''),"
+              "('c-jose','Jose','Briones','jbriones@uwyo.edu'),"
+              "('c-jane','Jane','Doe','other@acme.com')")  # differs from the blob's keep@acme.com
+    # Relational mirror (what sync_fundraising_relational would build): blank fc.email, linked contact_id.
+    c.execute("INSERT INTO fundraising_investors (id,investor_name,source_row_id,total_invested) VALUES "
+              "('inv-w','Wyoming','rowW',0),('inv-a','Acme Capital','rowA',0)")
+    c.execute("INSERT INTO fundraising_contacts (id,investor_id,full_name,email,sort_order,contact_id) VALUES "
+              "('fc-phil','inv-w','Philip Treick','',0,'c-phil'),"
+              "('fc-jose','inv-w','Jose Briones','',1,'c-jose'),"
+              "('fc-jane','inv-a','Jane Doe','',0,'c-jane')")
+    c.commit()
+    c.close()
+
+
+def main():
+    server.init_db()
+    seed()
+    token = server.create_token("u1", "grant", "admin")
+
+    httpd = ThreadingHTTPServer(("127.0.0.1", 0), _Quiet)
+    port = httpd.server_address[1]
+    threading.Thread(target=httpd.serve_forever, daemon=True).start()
+    try:
+        st, d = _get_state(port, token)
+        rows = ((d or {}).get("data", {}).get("grid", {}) or {}).get("rows", [])
+        by_id = {r.get("id"): r for r in rows}
+        w = by_id.get("rowW", {})
+        a = by_id.get("rowA", {})
+        wc = w.get("contacts", [])
+        ac = a.get("contacts", [])
+
+        print("\n[heal: blank pill email filled from the linked contact (Jose)]")
+        jose = next((c for c in wc if c.get("name") == "Jose Briones"), {})
+        check(st == 200 and jose.get("email") == "jbriones@uwyo.edu",
+              f"Jose pill healed to jbriones@uwyo.edu (got {jose.get('email')!r})")
+
+        print("\n[heal: blank pill whose contact is also blank stays blank (Philip)]")
+        phil = next((c for c in wc if c.get("name") == "Philip Treick"), {})
+        check(phil.get("email", "") == "",
+              f"Philip pill stays blank (got {phil.get('email')!r})")
+
+        print("\n[heal: a pill that already has an email is never overwritten (Jane)]")
+        jane = next((c for c in ac if c.get("name") == "Jane Doe"), {})
+        check(jane.get("email") == "keep@acme.com",
+              f"Jane pill keeps its blob email, not the contact's (got {jane.get('email')!r})")
+    finally:
+        httpd.shutdown()
+
+    print()
+    if FAILS:
+        print(f"FAILED ({len(FAILS)}):")
+        for f in FAILS:
+            print(f"  - {f}")
+        sys.exit(1)
+    print("ALL PASS (grid email heal)")
+
+
+if __name__ == "__main__":
+    main()
@@ -75,6 +75,12 @@ GRID = {
         "contacts": [{"name": "Charlie Brown", "email": "cb@brown.fund", "title": ""}]},
        {"id": "rowBeta", "investor_name": "Beta Capital LLC", "notes": "",
         "contacts": [{"name": "Pat Roe", "email": "pat@beta.com", "title": ""}]},
+        # Generic-descriptor decoys: share only "investment group" / "investments" with the
+        # Fortitude card below — must NOT surface as look-alikes (the 2026-06-20 false-positive fix).
+        {"id": "rowAether", "investor_name": "Aether Investment Group", "notes": "",
+         "contacts": [{"name": "Ada Ng", "email": "ada@aether.com", "title": ""}]},
+        {"id": "rowRussell", "investor_name": "Russell Investments", "notes": "",
+         "contacts": [{"name": "Russ Lee", "email": "russ@russell.com", "title": ""}]},
    ],
 }

@@ -178,6 +184,20 @@ def main():
        check(st == 200 and data.get("match") is None and data.get("candidates") == [],
              f"unrelated query -> no match, no candidates (got {data})")

+        print("\n[fuzzy: shared generic words alone do NOT surface look-alikes (Fortitude vs Aether/Russell)]")
+        st, d = _req(port, "GET", "/api/intake/match?q=Fortitude%20Investment%20Group", token)
+        data = (d or {}).get("data", {})
+        cids = [c["id"] for c in data.get("candidates", [])]
+        check(data.get("match") is None and "rowAether" not in cids and "rowRussell" not in cids,
+              f"generic-only overlap -> no decoy candidates (got {data})")
+
+        print("\n[fuzzy: a shared DISTINCTIVE word still surfaces (Aether Capital ~ Aether Investment Group)]")
+        st, d = _req(port, "GET", "/api/intake/match?q=Aether%20Capital", token)
+        data = (d or {}).get("data", {})
+        cids = [c["id"] for c in data.get("candidates", [])]
+        check(data.get("match") is None and "rowAether" in cids,
+              f"distinctive overlap -> rowAether candidate (got {data})")
+
        print("\n[match: missing q and email -> 400]")
        st, _ = _req(port, "GET", "/api/intake/match", token)
        check(st == 400, f"no params -> 400 (got {st})")