Device-test round 2: 4 in-app fixes + Matrix intake cleanup (v0.1.0:99)

Grant's real-phone testing surfaced seven items; this lands six (the seventh,
in-app camera card intake, is planned in docs/handoffs/in-app-card-intake-plan.md).

CRM half — ships in the s9pk (v0.1.0:99):
- Intake fuzzy match no longer over-indexes on generic firm words. _name_similarity
  now compares DISTINCTIVE tokens only (generic descriptors — "Investment Group",
  "Capital", "Family Office" — stripped via _GENERIC_ORG_WORDS) for both the difflib
  ratio and the Jaccard, so "Fortitude Investment Group" stops surfacing Aether/Russell
  while "Aether Capital" still surfaces "Aether Investment Group". +2 regression cases.
- Mobile grid "Last contact"/staleness sort is reversible. SortSheet gains opt-in
  dir/onToggleDir; other surfaces (Contacts/Pipeline) are untouched.
- Mobile "Edit investor" prefills a contact's saved email. GET /api/fundraising/state
  heals a blank grid pill email from the linked classic contact
  (fundraising_contacts.contact_id -> contacts.email), fill-only, by pill order then
  name; the next one-row save persists it. +test_grid_email_heal.py.
- Mobile quick-log pencil icon renders. iOS collapses a sole, centered flex-child <svg> that is
  sized only by attributes; .quicklog-btn svg now gets explicit CSS width/height + flex:none
  (the pattern the working bottom-tab/sort-pill icons use). The v97 fix only changed color.

Matrix intake bot — ships on the Spark (bot-only, NOT the s9pk):
- Approve/reject now redacts the whole intake thread (card + ack + main-timeline nudge +
  the user's own photo/note), mirroring the email-review room; redact_thread takes the
  room as an arg and matches replies by m.thread OR m.in_reply_to (so the nudge clears).
  No more in-Matrix confirmation after a commit (the thread vanishing is the ack).
  Needs the bot to hold a redact/moderator power level in the intake room.
- New one-time backend/matrix_intake/redact_intake.py clears the room's pre-existing
  backlog (dry-run default; --apply).
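
A minimal sketch of the "m.thread OR m.in_reply_to" matching described above. The real redact_thread is not part of this diff, so the function names and the plain-dict event shape here are assumptions; only the `m.relates_to` layout follows the Matrix event format.

```python
def relates_to_root(event, root_id):
    """True if the event is a thread reply or a plain reply to root_id.

    Hypothetical helper: events are plain dicts shaped like Matrix room events.
    """
    rel = (event.get("content") or {}).get("m.relates_to") or {}
    # Threaded replies carry rel_type "m.thread" and point at the thread root.
    if rel.get("rel_type") == "m.thread" and rel.get("event_id") == root_id:
        return True
    # Main-timeline replies (such as the nudge) only carry m.in_reply_to.
    reply = rel.get("m.in_reply_to") or {}
    return reply.get("event_id") == root_id


def events_to_redact(events, root_id):
    """Event IDs to redact: the root card itself plus everything related to it."""
    return [
        e["event_id"]
        for e in events
        if e.get("event_id") == root_id or relates_to_root(e, root_id)
    ]
```

Matching on either relation is what lets a reply posted outside the thread be cleared together with the card.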

Tests 42/42 green; frontend render-smoke green. Frontend fixes are verified by inspection and
render-smoke (on-device confirmation pending); the bot redaction is live-smoke only.
Keysat
2026-06-20 12:32:56 -05:00
parent 7fe5f57c6e
commit a917280bbb
13 changed files with 606 additions and 58 deletions
+86 -6
@@ -1305,14 +1305,40 @@ def _strip_legal_suffix(normalized_name):
return " ".join(toks)
# Generic firm-descriptor words that carry almost no identifying signal: nearly every firm name
# contains one ("… Investment Group", "… Capital", "… Family Office"). Two names that overlap ONLY
# on these are NOT duplicates — 'Fortitude Investment Group' is not 'Aether Investment Group'. We
# compare on the DISTINCTIVE remainder so a shared descriptor can't inflate the score (the earlier
# "Capital/Ventures/Partners are distinctive enough to keep" assumption produced false shortlists —
# Grant, 2026-06-20). If a name is ALL descriptor ('Family Office'), we fall back to its full tokens
# so there's still something to compare.
_GENERIC_ORG_WORDS = frozenset({
"investment", "investments", "investing", "investor", "investors",
"capital", "ventures", "venture", "partners", "partner", "group",
"fund", "funds", "management", "advisors", "advisers", "advisory",
"asset", "assets", "holdings", "holding", "family", "office",
"trust", "associates", "equity", "financial", "finance", "global",
"international", "company", "enterprises", "wealth", "the", "and", "of",
})
def _distinctive_tokens(normalized_name):
"""Tokens of a (legal-suffix-stripped) name with generic firm descriptors removed. Falls back to
the full token list when the name is nothing but descriptors, so an all-generic name still compares."""
toks = re.findall(r"[a-z0-9]+", normalized_name)
keep = [t for t in toks if t not in _GENERIC_ORG_WORDS]
return keep or toks
def _name_similarity(a, b):
"""0..1 fuzzy similarity between two investor names: the max of difflib's sequence ratio
(catches near-spellings 'Charlie'/'Charles') and token-set Jaccard overlap (catches
word-order differences). Legal-entity suffixes are stripped first, so two names differing
only by 'LLC'/'LP'/'Inc' score 1.0 (a near-certain duplicate to surface; find_intake_match
- won't have caught it, since it compares the full string). Favors recall: a shared common
- name-word ('… Capital') can lift unrelated firms into the 0.6-0.8 band, acceptable noise in
- a ranked, human-confirmed shortlist; semantic pruning is the deferred LLM-judge's job."""
+ won't have caught it, since it compares the full string). Both the ratio and the Jaccard run on
+ the DISTINCTIVE tokens (generic descriptors like 'Investment Group'/'Capital' removed), so firms
+ that share only a descriptor don't surface as look-alikes; 'Aether Capital' ~ 'Aether Capital
+ Partners' still scores 1.0 on the distinctive 'aether'. Still recall-favoring on real overlap."""
a = _normalize_text(a)
b = _normalize_text(b)
if not a or not b:
@@ -1323,9 +1349,10 @@ def _name_similarity(a, b):
sb = _strip_legal_suffix(b) or b
if sa == sb:
return 1.0
- ratio = difflib.SequenceMatcher(None, sa, sb).ratio()
- ta = set(re.findall(r"[a-z0-9]+", sa))
- tb = set(re.findall(r"[a-z0-9]+", sb))
+ da = _distinctive_tokens(sa)  # order-preserving for the sequence ratio
+ db = _distinctive_tokens(sb)
+ ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
+ ta, tb = set(da), set(db)
jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
return max(ratio, jaccard)
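
The effect of comparing distinctive tokens only can be reproduced in isolation. This is a sketch, not the shipped module: `normalize`, `strip_legal_suffix`, and `LEGAL_SUFFIXES` are simplified stand-ins for `_normalize_text` and `_strip_legal_suffix` (whose bodies are not shown in this diff), the `0.0` return for empty input is assumed, and `full_string_ratio` only approximates the pre-change sequence ratio.

```python
import difflib
import re

# Copied from the hunk above.
GENERIC_ORG_WORDS = frozenset({
    "investment", "investments", "investing", "investor", "investors",
    "capital", "ventures", "venture", "partners", "partner", "group",
    "fund", "funds", "management", "advisors", "advisers", "advisory",
    "asset", "assets", "holdings", "holding", "family", "office",
    "trust", "associates", "equity", "financial", "finance", "global",
    "international", "company", "enterprises", "wealth", "the", "and", "of",
})

# Assumption: the real legal-suffix list is not visible in this diff.
LEGAL_SUFFIXES = {"llc", "lp", "inc", "ltd"}


def normalize(name):
    """Stand-in for _normalize_text: lowercase and keep [a-z0-9] runs."""
    return " ".join(re.findall(r"[a-z0-9]+", str(name or "").lower()))


def strip_legal_suffix(normalized_name):
    """Stand-in for _strip_legal_suffix: drop trailing legal-entity tokens."""
    toks = normalized_name.split()
    while toks and toks[-1] in LEGAL_SUFFIXES:
        toks.pop()
    return " ".join(toks)


def distinctive_tokens(normalized_name):
    """Tokens minus generic descriptors; all tokens if nothing else is left."""
    toks = re.findall(r"[a-z0-9]+", normalized_name)
    keep = [t for t in toks if t not in GENERIC_ORG_WORDS]
    return keep or toks


def name_similarity(a, b):
    """max(sequence ratio, Jaccard), both computed on distinctive tokens."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0  # assumed; the original early return is cut off by the hunk
    sa = strip_legal_suffix(a) or a
    sb = strip_legal_suffix(b) or b
    if sa == sb:
        return 1.0
    da, db = distinctive_tokens(sa), distinctive_tokens(sb)
    ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
    ta, tb = set(da), set(db)
    jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
    return max(ratio, jaccard)


def full_string_ratio(a, b):
    """Approximation of the old behaviour: sequence ratio over whole names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    for a, b in [
        ("Fortitude Investment Group", "Aether Investment Group"),
        ("Aether Capital", "Aether Investment Group"),
    ]:
        print(a, "|", b, "| old:", round(full_string_ratio(a, b), 2),
              "| new:", round(name_similarity(a, b), 2))
```

On the first pair the shared " investment group" tail dominates the full-string ratio, while the distinctive-token comparison sees only "fortitude" against "aether"; on the second pair both names reduce to "aether".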
@@ -1881,6 +1908,45 @@ def existing_investor_by_source_row(conn):
return out
def fundraising_contact_emails_by_row(conn):
"""{ source_row_id: {'order': {sort_order: email}, 'name': {normalized_name: email}} } of the
authoritative email per grid contact, for HEALING blank pill emails on read.
The grid blob is canonical for the edit sheet, but an email can reach the linked classic
contact (via email capture / a contact edit) without ever being written back into the blob
pill, so the mobile "Edit investor" sheet shows an empty email for a contact the directory
clearly has (Grant, 2026-06-20). We recover it from the relational mirror: prefer the synced
fundraising_contacts.email, else the linked classic contacts.email (the source that actually
holds the captured address). Keyed by sort_order (pills and fundraising_contacts share the
blob order, which makes it the robust key) with a normalized-name fallback. Only non-blank emails are
returned; filling is fill-only-when-blank in the handler, so it heals and converges (the next
one-row save persists the recovered email into the blob)."""
out = {}
rows = conn.execute(
"""
SELECT fi.source_row_id AS srid, fc.sort_order AS so, fc.full_name AS name,
COALESCE(NULLIF(TRIM(fc.email), ''), c.email) AS email
FROM fundraising_investors fi
JOIN fundraising_contacts fc ON fc.investor_id = fi.id
LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
"""
).fetchall()
for r in rows:
email = str(r['email'] or '').strip()
if not email:
continue
srid = str(r['srid'] or '')
if not srid:
continue
bucket = out.setdefault(srid, {'order': {}, 'name': {}})
if r['so'] is not None:
bucket['order'][int(r['so'])] = email
nm = _normalize_text(r['name'])
if nm:
bucket['name'][nm] = email
return out
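
To see the COALESCE/NULLIF fallback and the soft-delete filter at work, the lookup can be run against an in-memory SQLite database. This is a sketch under assumptions: the table definitions and sample rows are hypothetical (only the columns the query touches are modeled), and `normalize_text` is a simplified stand-in for `_normalize_text`.

```python
import sqlite3


def normalize_text(value):
    """Simplified stand-in for _normalize_text: lowercase, collapse whitespace."""
    return " ".join(str(value or "").lower().split())


def contact_emails_by_row(conn):
    """Same query and bucketing as the function above, on a demo schema."""
    out = {}
    rows = conn.execute(
        """
        SELECT fi.source_row_id AS srid, fc.sort_order AS so, fc.full_name AS name,
               COALESCE(NULLIF(TRIM(fc.email), ''), c.email) AS email
        FROM fundraising_investors fi
        JOIN fundraising_contacts fc ON fc.investor_id = fi.id
        LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
        """
    ).fetchall()
    for r in rows:
        email = str(r["email"] or "").strip()
        srid = str(r["srid"] or "")
        if not email or not srid:
            continue
        bucket = out.setdefault(srid, {"order": {}, "name": {}})
        if r["so"] is not None:
            bucket["order"][int(r["so"])] = email
        nm = normalize_text(r["name"])
        if nm:
            bucket["name"][nm] = email
    return out


def build_demo_db():
    """Hypothetical minimal schema with three grid contacts on one investor."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows r["email"]-style access
    conn.executescript(
        """
        CREATE TABLE fundraising_investors (id INTEGER PRIMARY KEY, source_row_id TEXT);
        CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT);
        CREATE TABLE fundraising_contacts (
            id INTEGER PRIMARY KEY, investor_id INTEGER, contact_id INTEGER,
            sort_order INTEGER, full_name TEXT, email TEXT);
        INSERT INTO fundraising_investors VALUES (1, 'row-1');
        INSERT INTO contacts VALUES (10, 'ada@example.com', NULL);
        INSERT INTO contacts VALUES (11, 'gone@example.com', '2026-01-01');
        -- blank synced email: falls back to the linked classic contact
        INSERT INTO fundraising_contacts VALUES (1, 1, 10, 0, 'Ada Lovelace', '  ');
        -- synced email present: used as-is
        INSERT INTO fundraising_contacts VALUES (2, 1, NULL, 1, 'Bob Stone', 'bob@example.com');
        -- linked contact is soft-deleted: no email recovered, row skipped
        INSERT INTO fundraising_contacts VALUES (3, 1, 11, 2, 'Cy Gone', '');
        """
    )
    return conn
```

Only the first two contacts end up in the result: the whitespace-only email is treated as blank and replaced by the linked contact's address, and the contact whose linked record is soft-deleted yields nothing.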
def contact_grid_signals(conn, contact_id=None):
"""Return {contacts.id: {'committed': float, 'pipeline_stage': str|None, 'priority': bool}} for
every classic contact linked to a fundraising-grid investor (via fundraising_contacts.contact_id,
@@ -5830,6 +5896,7 @@ class CRMHandler(BaseHTTPRequestHandler):
reminder_by_row = reminder_status_by_source_row(conn)
existing_by_row = existing_investor_by_source_row(conn)
recency_by_row = staleness_by_source_row(conn)
emails_by_row = fundraising_contact_emails_by_row(conn)
conn.close()
try:
@@ -5873,6 +5940,19 @@ class CRMHandler(BaseHTTPRequestHandler):
last_activity, staleness = recency_by_row.get(srid, (None, ''))
r['last_activity_at'] = last_activity
r['staleness'] = staleness
# Heal blank pill emails from the relational mirror (fill-only — never overwrite a value
# already in the blob). Unlike the read-only columns above, email is a REAL blob field,
# so this is a backfill, not a derived signal: it needs NO strip point, and the next
# one-row save legitimately persists it. Match by pill order, then by name.
heal = emails_by_row.get(srid)
pills = r.get('contacts')
if heal and isinstance(pills, list):
for i, c in enumerate(pills):
if not isinstance(c, dict) or str(c.get('email') or '').strip():
continue
found = heal['order'].get(i) or heal['name'].get(_normalize_text(c.get('name')))
if found:
c['email'] = found
return self.send_json({
"data": {
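
The fill-only heal in the handler hunk above can be exercised as a standalone function. This is an illustrative extraction, not code from the commit: `heal_pill_emails` and its return value are hypothetical, and `normalize_text` is a simplified stand-in for `_normalize_text`.

```python
def normalize_text(value):
    """Simplified stand-in for _normalize_text: lowercase, collapse whitespace."""
    return " ".join(str(value or "").lower().split())


def heal_pill_emails(pills, heal):
    """Fill blank pill emails in place; return how many were filled.

    heal is one per-row bucket: {'order': {index: email}, 'name': {name: email}}.
    Existing non-blank emails are never overwritten.
    """
    if not heal or not isinstance(pills, list):
        return 0
    filled = 0
    for i, pill in enumerate(pills):
        # Skip malformed entries and pills that already have an email.
        if not isinstance(pill, dict) or str(pill.get("email") or "").strip():
            continue
        # Prefer the positional match, then fall back to the normalized name.
        found = heal["order"].get(i) or heal["name"].get(normalize_text(pill.get("name")))
        if found:
            pill["email"] = found
            filled += 1
    return filled
```

Because a pill that already holds an email is skipped, running the heal repeatedly converges: once a save persists the recovered address, later reads leave it alone.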