Device-test round 2: 4 in-app fixes + Matrix intake cleanup (v0.1.0:99)

Grant's real-phone testing surfaced seven items; this lands six (the seventh, in-app camera card intake, is planned in docs/handoffs/in-app-card-intake-plan.md).

CRM half, ships in the s9pk (v0.1.0:99):
- Intake fuzzy match no longer over-indexes on generic firm words. _name_similarity now compares DISTINCTIVE tokens only (generic descriptors such as "Investment Group", "Capital", "Family Office" are stripped via _GENERIC_ORG_WORDS) for both the difflib ratio and the Jaccard, so "Fortitude Investment Group" stops surfacing Aether/Russell while "Aether Capital" still surfaces "Aether Investment Group". +2 regression cases.
- Mobile grid "Last contact"/staleness sort is reversible. SortSheet gains opt-in dir/onToggleDir; other surfaces (Contacts/Pipeline) are untouched.
- Mobile "Edit investor" prefills a contact's saved email. GET /api/fundraising/state heals a blank grid pill email from the linked classic contact (fundraising_contacts.contact_id -> contacts.email), fill-only, by pill order then name; the next one-row save persists it. +test_grid_email_heal.py.
- Mobile quick-log pencil icon renders. iOS collapses a sole, centered, attribute-only-sized flex-child <svg>; .quicklog-btn svg now gets explicit CSS width/height + flex:none (the pattern the working bottom-tab/sort-pill icons use). The v97 fix only changed color.

Matrix intake bot, ships on the Spark (bot-only, NOT the s9pk):
- Approve/reject now redacts the whole intake thread (card + ack + main-timeline nudge + the user's own photo/note), mirroring the email-review room; redact_thread takes the room as an arg and matches replies by m.thread OR m.in_reply_to (so the nudge clears). No more in-Matrix confirmation after a commit (the thread vanishing is the ack). Needs the bot to hold a redact/moderator power level in the intake room.
- New one-time backend/matrix_intake/redact_intake.py clears the room's pre-existing backlog (dry-run default; --apply).

Tests 42/42 green; frontend render-smoke green.
Frontend fixes are inspection + render-smoke verified (on-device confirm pending); the bot redaction is live-smoke only.
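
The distinctive-token comparison described in the message can be sketched in isolation as follows. This is a minimal illustration, not the shipped function: the word list is an abbreviated subset of _GENERIC_ORG_WORDS, plain lowercasing stands in for _normalize_text and _strip_legal_suffix (neither is shown in this diff), and returning 0.0 for an empty name is an assumption.

```python
import difflib
import re

# Abbreviated subset of the commit's _GENERIC_ORG_WORDS, for illustration only.
GENERIC_ORG_WORDS = frozenset({
    "investment", "capital", "partners", "group", "family", "office",
})


def distinctive_tokens(name):
    """Lowercased alphanumeric tokens with generic descriptors removed.

    Falls back to the full token list when every token is generic."""
    toks = re.findall(r"[a-z0-9]+", name.lower())
    keep = [t for t in toks if t not in GENERIC_ORG_WORDS]
    return keep or toks


def name_similarity(a, b):
    """Max of the difflib ratio and the Jaccard overlap, both on distinctive tokens."""
    da, db = distinctive_tokens(a), distinctive_tokens(b)
    if not da or not db:
        return 0.0  # assumption: the real function's empty-name return is not shown
    ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
    ta, tb = set(da), set(db)
    jaccard = len(ta & tb) / len(ta | tb)
    return max(ratio, jaccard)


# Shared distinctive token: still surfaces as a look-alike.
print(name_similarity("Aether Capital", "Aether Investment Group"))
# Shared descriptor only: the score stays low.
print(name_similarity("Fortitude Investment Group", "Aether Investment Group"))
```

Joining the tokens in their original order keeps the sequence ratio sensitive to near-spellings, while the set-based Jaccard covers word-order differences.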
2026-06-20 12:32:56 -05:00
parent 7fe5f57c6e
commit a917280bbb
13 changed files with 606 additions and 58 deletions
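
The fill-only email heal from the commit message can be illustrated on plain in-memory data. This is a sketch under assumptions: heal_pill_emails is a hypothetical helper name (the commit inlines this loop in the GET handler), whitespace-collapsing lowercase stands in for _normalize_text, and the names and addresses are made up.

```python
def heal_pill_emails(pills, heal):
    """Fill blank pill emails in place from heal = {'order': {...}, 'name': {...}}.

    Never overwrites a non-blank email. Returns the number of pills filled."""
    filled = 0
    for i, pill in enumerate(pills):
        # Skip malformed pills and pills that already carry an email.
        if not isinstance(pill, dict) or str(pill.get("email") or "").strip():
            continue
        # Stand-in for _normalize_text: lowercase, collapse whitespace.
        name_key = " ".join(str(pill.get("name") or "").lower().split())
        # Match by pill order first, then by normalized name.
        found = heal["order"].get(i) or heal["name"].get(name_key)
        if found:
            pill["email"] = found
            filled += 1
    return filled


pills = [
    {"name": "Ada Lovelace", "email": ""},
    {"name": "Grace Hopper", "email": "grace@example.com"},
    {"name": "Alan  Turing"},
]
heal = {
    "order": {0: "ada@example.com", 1: "other@example.com"},
    "name": {"alan turing": "alan@example.com"},
}
print(heal_pill_emails(pills, heal), pills)
```

Because only blank emails are filled, a second pass over the same pills changes nothing, which is the convergence property the message relies on once a save persists the recovered value.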
@@ -1305,14 +1305,40 @@ def _strip_legal_suffix(normalized_name):
    return " ".join(toks)


+# Generic firm-descriptor words that carry almost no identifying signal: nearly every firm name
+# contains one ("… Investment Group", "… Capital", "… Family Office"). Two names that overlap ONLY
+# on these are NOT duplicates — 'Fortitude Investment Group' is not 'Aether Investment Group'. We
+# compare on the DISTINCTIVE remainder so a shared descriptor can't inflate the score (the earlier
+# "Capital/Ventures/Partners are distinctive enough to keep" assumption produced false shortlists —
+# Grant, 2026-06-20). If a name is ALL descriptor ('Family Office'), we fall back to its full tokens
+# so there's still something to compare.
+_GENERIC_ORG_WORDS = frozenset({
+    "investment", "investments", "investing", "investor", "investors",
+    "capital", "ventures", "venture", "partners", "partner", "group",
+    "fund", "funds", "management", "advisors", "advisers", "advisory",
+    "asset", "assets", "holdings", "holding", "family", "office",
+    "trust", "associates", "equity", "financial", "finance", "global",
+    "international", "company", "enterprises", "wealth", "the", "and", "of",
+})
+
+
+def _distinctive_tokens(normalized_name):
+    """Tokens of a (legal-suffix-stripped) name with generic firm descriptors removed. Falls back to
+    the full token list when the name is nothing but descriptors, so an all-generic name still compares."""
+    toks = re.findall(r"[a-z0-9]+", normalized_name)
+    keep = [t for t in toks if t not in _GENERIC_ORG_WORDS]
+    return keep or toks
+
+
 def _name_similarity(a, b):
    """0..1 fuzzy similarity between two investor names: the max of difflib's sequence ratio
    (catches near-spellings — 'Charlie'/'Charles') and token-set Jaccard overlap (catches
    word-order differences). Legal-entity suffixes are stripped first, so two names differing
    only by 'LLC'/'LP'/'Inc' score 1.0 (a near-certain duplicate to surface — find_intake_match
-    won't have caught it, since it compares the full string). Favors recall: a shared common
-    name-word ('… Capital') can lift unrelated firms into the 0.6–0.8 band — acceptable noise in
-    a ranked, human-confirmed shortlist; semantic pruning is the deferred LLM-judge's job."""
+    won't have caught it, since it compares the full string). Both the ratio and the Jaccard run on
+    the DISTINCTIVE tokens (generic descriptors like 'Investment Group'/'Capital' removed), so firms
+    that share only a descriptor don't surface as look-alikes; 'Aether Capital' ~ 'Aether Capital
+    Partners' still scores 1.0 on the distinctive 'aether'. Still recall-favoring on real overlap."""
    a = _normalize_text(a)
    b = _normalize_text(b)
    if not a or not b:
@@ -1323,9 +1349,10 @@ def _name_similarity(a, b):
    sb = _strip_legal_suffix(b) or b
    if sa == sb:
        return 1.0
-    ratio = difflib.SequenceMatcher(None, sa, sb).ratio()
-    ta = set(re.findall(r"[a-z0-9]+", sa))
-    tb = set(re.findall(r"[a-z0-9]+", sb))
+    da = _distinctive_tokens(sa)  # order-preserving for the sequence ratio
+    db = _distinctive_tokens(sb)
+    ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
+    ta, tb = set(da), set(db)
    jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
    return max(ratio, jaccard)

@@ -1881,6 +1908,45 @@ def existing_investor_by_source_row(conn):
    return out


+def fundraising_contact_emails_by_row(conn):
+    """{ source_row_id: {'order': {sort_order: email}, 'name': {normalized_name: email}} } of the
+    authoritative email per grid contact, for HEALING blank pill emails on read.
+
+    The grid blob is canonical for the edit sheet, but an email can reach the linked classic
+    contact (via email capture / a contact edit) without ever being written back into the blob
+    pill — so the mobile "Edit investor" sheet shows an empty email for a contact the directory
+    clearly has (Grant, 2026-06-20). We recover it from the relational mirror: prefer the synced
+    fundraising_contacts.email, else the linked classic contacts.email (the source that actually
+    holds the captured address). Keyed by sort_order (pills and fundraising_contacts share the
+    blob order — the robust key) with a normalized-name fallback. Only non-blank emails are
+    returned; filling is fill-only-when-blank in the handler, so it heals and converges (the next
+    one-row save persists the recovered email into the blob)."""
+    out = {}
+    rows = conn.execute(
+        """
+        SELECT fi.source_row_id AS srid, fc.sort_order AS so, fc.full_name AS name,
+               COALESCE(NULLIF(TRIM(fc.email), ''), c.email) AS email
+        FROM fundraising_investors fi
+        JOIN fundraising_contacts fc ON fc.investor_id = fi.id
+        LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
+        """
+    ).fetchall()
+    for r in rows:
+        email = str(r['email'] or '').strip()
+        if not email:
+            continue
+        srid = str(r['srid'] or '')
+        if not srid:
+            continue
+        bucket = out.setdefault(srid, {'order': {}, 'name': {}})
+        if r['so'] is not None:
+            bucket['order'][int(r['so'])] = email
+        nm = _normalize_text(r['name'])
+        if nm:
+            bucket['name'][nm] = email
+    return out
+
+
 def contact_grid_signals(conn, contact_id=None):
    """Return {contacts.id: {'committed': float, 'pipeline_stage': str|None, 'priority': bool}} for
    every classic contact linked to a fundraising-grid investor (via fundraising_contacts.contact_id,
@@ -5830,6 +5896,7 @@ class CRMHandler(BaseHTTPRequestHandler):
        reminder_by_row = reminder_status_by_source_row(conn)
        existing_by_row = existing_investor_by_source_row(conn)
        recency_by_row = staleness_by_source_row(conn)
+        emails_by_row = fundraising_contact_emails_by_row(conn)
        conn.close()

        try:
@@ -5873,6 +5940,19 @@ class CRMHandler(BaseHTTPRequestHandler):
            last_activity, staleness = recency_by_row.get(srid, (None, ''))
            r['last_activity_at'] = last_activity
            r['staleness'] = staleness
+            # Heal blank pill emails from the relational mirror (fill-only — never overwrite a value
+            # already in the blob). Unlike the read-only columns above, email is a REAL blob field,
+            # so this is a backfill, not a derived signal: it needs NO strip point, and the next
+            # one-row save legitimately persists it. Match by pill order, then by name.
+            heal = emails_by_row.get(srid)
+            pills = r.get('contacts')
+            if heal and isinstance(pills, list):
+                for i, c in enumerate(pills):
+                    if not isinstance(c, dict) or str(c.get('email') or '').strip():
+                        continue
+                    found = heal['order'].get(i) or heal['name'].get(_normalize_text(c.get('name')))
+                    if found:
+                        c['email'] = found

        return self.send_json({
            "data": {