Device-test round 2: 4 in-app fixes + Matrix intake cleanup (v0.1.0:99)

Grant's real-phone testing surfaced seven items; this lands six (the seventh,
in-app camera card intake, is planned in docs/handoffs/in-app-card-intake-plan.md).

CRM half — ships in the s9pk (v0.1.0:99):
- Intake fuzzy match no longer over-indexes on generic firm words. _name_similarity
  now compares DISTINCTIVE tokens only (generic descriptors — "Investment Group",
  "Capital", "Family Office" — stripped via _GENERIC_ORG_WORDS) for both the difflib
  ratio and the Jaccard, so "Fortitude Investment Group" stops surfacing Aether/Russell
  while "Aether Capital" still surfaces "Aether Investment Group". +2 regression cases.
- Mobile grid "Last contact"/staleness sort is reversible. SortSheet gains opt-in
  dir/onToggleDir; other surfaces (Contacts/Pipeline) are untouched.
- Mobile "Edit investor" prefills a contact's saved email. GET /api/fundraising/state
  heals a blank grid-pill email from the relational mirror (fundraising_contacts.email, else the
  linked classic contact via fundraising_contacts.contact_id -> contacts.email), fill-only,
  matched by pill order then name; the next one-row save persists it. +test_grid_email_heal.py.
- Mobile quick-log pencil icon renders. iOS collapses an <svg> that is the sole, centered flex
  child and is sized only by attributes; .quicklog-btn svg now gets explicit CSS width/height +
  flex:none (the pattern the working bottom-tab/sort-pill icons use). The v97 fix only changed
  its color.
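A minimal Python sketch of the distinctive-token comparison from the first bullet. It is not the shipped `_name_similarity`: it skips the CRM's `_normalize_text` and `_strip_legal_suffix` steps and uses a hypothetical, much smaller `GENERIC_ORG_WORDS` subset. It keeps the same shape, though: drop generic descriptors (falling back to all tokens for an all-descriptor name), then take the max of the difflib sequence ratio and the token-set Jaccard.

```python
import difflib
import re

# Hypothetical subset of the commit's _GENERIC_ORG_WORDS (the real set is larger).
GENERIC_ORG_WORDS = frozenset({
    "investment", "investments", "capital", "group", "partners",
    "family", "office", "the", "and", "of",
})


def distinctive_tokens(name):
    """Lowercase tokens minus generic descriptors; all tokens if nothing distinctive is left."""
    toks = re.findall(r"[a-z0-9]+", name.lower())
    keep = [t for t in toks if t not in GENERIC_ORG_WORDS]
    return keep or toks


def name_similarity(a, b):
    """max(sequence ratio, token Jaccard), both computed on distinctive tokens only."""
    da, db = distinctive_tokens(a), distinctive_tokens(b)
    if not da or not db:
        return 0.0
    # Order-preserving join so difflib still catches near-spellings.
    ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
    ta, tb = set(da), set(db)
    jaccard = len(ta & tb) / len(ta | tb)
    return max(ratio, jaccard)


print(name_similarity("Aether Capital", "Aether Investment Group"))  # 1.0
print(name_similarity("Fortitude Investment Group", "Aether Investment Group"))
```

Both names in the first call reduce to `aether`, so they score 1.0. The second call compares only `fortitude` against `aether` and scores low. For contrast, on the full token sets those two names share `investment` and `group`, a Jaccard of 2/4 = 0.5 from descriptors alone.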

Matrix intake bot — ships on the Spark (bot-only, NOT the s9pk):
- Approve/reject now redacts the whole intake thread (card + ack + main-timeline nudge +
  the user's own photo/note), mirroring the email-review room; redact_thread takes the
  room as an arg and matches replies by m.thread OR m.in_reply_to (so the nudge clears).
  No more in-Matrix confirmation after a commit (the thread vanishing is the ack).
  Needs the bot to hold a redact/moderator power level in the intake room.
- New one-time backend/matrix_intake/redact_intake.py clears the room's pre-existing
  backlog (dry-run default; --apply).
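The sketch below shows the reply-matching rule that `redact_thread` now applies. It works over plain event-source dicts, which assumes the bot reads `ev.source` from matrix-nio events. An event hangs off the root if its `m.relates_to.event_id` is the root (a thread child) or if its `m.relates_to.m.in_reply_to.event_id` is the root (the un-threaded main-timeline nudge). Like the shipped loop, it does not inspect `rel_type`. `hangs_off` and `thread_event_ids` are hypothetical names, and the events are synthetic.

```python
def hangs_off(source, root):
    """True if a Matrix event source is a thread child of `root` or a plain reply to it."""
    rel = (source.get("content") or {}).get("m.relates_to") or {}
    in_reply = (rel.get("m.in_reply_to") or {}).get("event_id")
    return rel.get("event_id") == root or in_reply == root


def thread_event_ids(history, root):
    """The root first, then each distinct event in `history` that hangs off it."""
    ids = [root]
    for source in history:
        eid = source.get("event_id")
        if eid and eid not in ids and hangs_off(source, root):
            ids.append(eid)
    return ids


# Synthetic history, newest first (as a backward /messages scan returns it).
history = [
    {"event_id": "$nudge", "content": {  # main-timeline pointer: a reply, not a thread child
        "body": "New card, reply in the thread",
        "m.relates_to": {"m.in_reply_to": {"event_id": "$root"}}}},
    {"event_id": "$yes", "content": {  # human reply inside the thread
        "body": "yes",
        "m.relates_to": {"rel_type": "m.thread", "event_id": "$root",
                         "is_falling_back": True,
                         "m.in_reply_to": {"event_id": "$card"}}}},
    {"event_id": "$chat", "content": {"body": "unrelated"}},
    {"event_id": "$gone", "content": {}},  # already redacted
]
print(thread_event_ids(history, "$root"))  # ['$root', '$nudge', '$yes']
```

The thread child's own fallback `m.in_reply_to` points at `$card`, not the root. Only the `event_id` check catches it, and only the `m.in_reply_to` check catches the nudge, which is why the loop needs both conditions.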

Tests 42/42 green; frontend render-smoke green. The frontend fixes are verified by inspection
and render-smoke only (on-device confirmation pending); the bot redaction is verified by a live
smoke test only.
Keysat
2026-06-20 12:32:56 -05:00
parent 7fe5f57c6e
commit a917280bbb
13 changed files with 606 additions and 58 deletions
+32 -21
@@ -171,9 +171,13 @@ async def main():
store.put(root, proposal) # commit failed — restore so the user can retry
await say(room_id, f"⚠️ write failed, nothing committed: {exc}", root)
return
await say(room_id, f"{summary}", root)
# Committed → clear the whole thread (card + ack + nudge + the user's note/photo),
# like the email-review room. The thread vanishing is the acknowledgment; a confirmation
# reply would just keep it alive (and need redacting too). Needs the bot's redact/mod
# power in the intake room to clear the user's own messages — else those linger.
await redact_thread(room_id, root)
elif action == "reject":
await say(room_id, "🗑️ Discarded — nothing written.", root)
await redact_thread(room_id, root)
elif action == "edit":
field, value = payload
proposal = proposals.apply_edit(proposal, field, value)
@@ -212,42 +216,49 @@ async def main():
await say(room_id, " OK — adding as a new investor:\n\n"
+ proposals.render(updated), root)
elif action == "reject":
await say(room_id, "🗑️ Discarded — nothing written.", root)
await redact_thread(room_id, root) # discard → clear the thread, like an approve
else: # unrecognized — re-show the shortlist
store.put(root, proposal)
await say(room_id, "I didn't catch that.\n\n" + proposals.render_disambiguation(proposal), root)
async def redact_card(event_id):
"""Redact one event (best-effort). Redacting our OWN message needs no special power;
redacting someone else's reply needs the bot to hold a redact/mod power level."""
async def redact_card(room_id, event_id):
"""Redact one event in `room_id` (best-effort). Redacting our OWN message needs no special
power; redacting someone else's message (a human reply, or the user's original card photo /
intake note) needs the bot to hold a redact/mod power level in that room."""
try:
await client.room_redact(review_room, event_id, reason="proposal resolved")
await client.room_redact(room_id, event_id, reason="proposal resolved")
except Exception as exc:
print(f"matrix-intake: could not redact {event_id}: {exc}", flush=True)
async def redact_thread(root):
"""Clear a resolved thread: redact the card AND every reply under it, so the thread drops
out of the threads view (not just the main timeline). The card is ours (always redactable);
the human's yes/no reply needs the bot's redact/mod power — if it lacks power that redact
just no-ops and the reply lingers. Finds replies by scanning recent room history for
m.thread events pointing at this root (the triggering reply is already synced, so a
backward scan from the current token includes it)."""
await redact_card(root)
async def redact_thread(room_id, root):
"""Clear a resolved thread in `room_id`: redact the root AND every message that hangs off it
(the m.thread children such as cards/acks/human replies, AND the main-timeline **nudge**, a plain
m.in_reply_to reply, not a thread child), so the thread drops out of both the threads view
and the timeline. For email-review the root is the bot's card; for intake it's the USER'S
own note/photo, so clearing it (and the human reply) needs the bot's redact/mod power in that
room — without it those just no-op and linger. Replies are found by scanning recent history
from the current sync token (the triggering reply is already synced, so a backward scan
includes it)."""
await redact_card(room_id, root)
token = getattr(client, "next_batch", None)
if not token:
return
try:
scanned = 0
for _ in range(MAX_THREAD_SCAN_PAGES):
resp = await client.room_messages(review_room, start=token,
resp = await client.room_messages(room_id, start=token,
direction=MessageDirection.back, limit=100)
chunk = getattr(resp, "chunk", None)
if not chunk:
break
for ev in chunk:
rel = ((getattr(ev, "source", None) or {}).get("content", {}) or {}).get("m.relates_to") or {}
if rel.get("rel_type") == "m.thread" and rel.get("event_id") == root:
await redact_card(ev.event_id)
in_reply = (rel.get("m.in_reply_to") or {}).get("event_id")
# A thread child carries event_id==root; the un-threaded nudge carries only
# m.in_reply_to.event_id==root. Catch both so the thread AND its main-timeline
# pointer clear together.
if rel.get("event_id") == root or in_reply == root:
await redact_card(room_id, ev.event_id)
token = getattr(resp, "end", None)
scanned += len(chunk)
if not token or scanned > 1000:
@@ -275,7 +286,7 @@ async def main():
return
# Success → clear the whole thread (card + replies). No confirmation: the thread
# vanishing is the acknowledgment, and a confirmation reply would keep it alive.
await redact_thread(root)
await redact_thread(review_room, root)
elif decision == "reject":
email_threads.pop(root, None)
try:
@@ -284,7 +295,7 @@ async def main():
email_threads[root] = item
await say(room_id, email_proposals.frame(f"⚠️ couldn't dismiss it ({str(exc)[:200]}). Try again."), root)
return
await redact_thread(root)
await redact_thread(review_room, root)
else:
try:
new_note = await asyncio.to_thread(email_proposals.revise_note, item.get("note") or "", text)
@@ -332,7 +343,7 @@ async def main():
if not ev:
continue
try:
await redact_thread(ev)
await redact_thread(review_room, ev)
await asyncio.to_thread(crm_client.mark_email_proposal_closed, it["id"])
email_threads.pop(ev, None)
except Exception as exc:
+86
@@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""One-time maintenance: clear the intake room's backlog of resolved/stale messages.
Going forward the bot redacts each intake thread when it's approved/rejected (bot card + ack +
nudge + the user's own note/photo). This clears the messages that piled up BEFORE that shipped.
The intake room is single-purpose and the bot keeps **no durable pending state** (its proposal
store is in-memory and is lost on every restart), so nothing in the room is "still live" after a
restart — every message in it is safe to redact. This walks the room history and redacts every
m.room.message event (text + business-card images), bot's and humans' alike.
Redacting another user's message (the humans' notes/photos) needs the bot to hold a **redact /
moderator power level** in the intake room — without it those just no-op and linger (the bot's own
messages still clear). Make the bot a moderator of the intake room in Element first.
Safe by default: prints what it WOULD redact and does nothing. Pass --apply to actually redact.
Run on the Spark via the bot's own creds/image:
docker compose run --rm matrix-intake python -u backend/matrix_intake/redact_intake.py
docker compose run --rm matrix-intake python -u backend/matrix_intake/redact_intake.py --apply
"""
import asyncio
import sys
from nio import AsyncClient, MessageDirection
import settings
MAX_PAGES = 50 # 50 * 100 events is far more history than this room holds
async def main(apply):
mx = settings.matrix_settings()
intake_room = mx.get("intake_room")
if not intake_room:
print("MATRIX_INTAKE_ROOM is not set — nothing to do.")
return
client = AsyncClient(mx["homeserver"], mx["user_id"])
client.restore_login(user_id=mx["user_id"], device_id=mx["device_id"], access_token=mx["token"])
try:
sync = await client.sync(timeout=10000, full_state=False)
token = sync.next_batch
targets = [] # (event_id, label)
seen = set()
for _ in range(MAX_PAGES):
resp = await client.room_messages(intake_room, start=token,
direction=MessageDirection.back, limit=100)
chunk = getattr(resp, "chunk", None)
if not chunk:
break
for ev in chunk:
src = getattr(ev, "source", None) or {}
if src.get("type") != "m.room.message":
continue # only chat messages + images; leave membership/state events alone
eid = getattr(ev, "event_id", None)
if not eid or eid in seen:
continue
seen.add(eid)
content = src.get("content") or {}
if not content:
continue # already redacted (content stripped) — skip
msgtype = content.get("msgtype") or "?"
body = (content.get("body", "") or "").replace("\n", " ")
who = "bot " if getattr(ev, "sender", None) == mx["user_id"] else "user"
targets.append((eid, f"{who} [{msgtype}] {body[:60]}"))
token = getattr(resp, "end", None)
if not token:
break
print(f"messages to clear in the intake room: {len(targets)}")
fails = 0
for eid, label in targets:
print(("APPLY redact " if apply else "WOULD redact ") + eid + " :: " + label)
if apply:
r = await client.room_redact(intake_room, eid, reason="retroactive intake-room cleanup")
if not hasattr(r, "event_id"):
fails += 1
print(f" ! redact failed (need mod power for others' messages?): {r}")
print(("done — redacted " if apply else "dry run — would redact ")
+ f"{len(targets) - (fails if apply else 0)}/{len(targets)} event(s)"
+ (f"; {fails} failed" if apply and fails else "") + ".")
finally:
await client.close()
if __name__ == "__main__":
asyncio.run(main(apply="--apply" in sys.argv[1:]))
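The target-selection pass in `redact_intake.py` can be read as a pure function over one page of history. This sketch assumes raw event-source dicts instead of matrix-nio event objects and uses a hypothetical `select_targets` name. The filtering follows the script. It keeps only `m.room.message` events, dedupes by event ID across pages through a shared `seen` set, skips already-redacted events (empty content), and labels each target for the dry-run listing.

```python
def select_targets(chunk, bot_user_id, seen):
    """(event_id, label) for each redactable chat message in one page of history."""
    targets = []
    for src in chunk:
        if src.get("type") != "m.room.message":
            continue  # leave membership/state events alone
        eid = src.get("event_id")
        if not eid or eid in seen:
            continue  # paging can repeat events; handle each once
        seen.add(eid)
        content = src.get("content") or {}
        if not content:
            continue  # already redacted: the server stripped its content
        msgtype = content.get("msgtype") or "?"
        body = (content.get("body", "") or "").replace("\n", " ")
        who = "bot " if src.get("sender") == bot_user_id else "user"
        targets.append((eid, f"{who} [{msgtype}] {body[:60]}"))
    return targets


chunk = [
    {"type": "m.room.member", "event_id": "$m", "sender": "@grant:example.org",
     "content": {"membership": "join"}},
    {"type": "m.room.message", "event_id": "$1", "sender": "@bot:example.org",
     "content": {"msgtype": "m.text", "body": "Card\nparsed"}},
    {"type": "m.room.message", "event_id": "$2", "sender": "@grant:example.org",
     "content": {"msgtype": "m.image", "body": "card.jpg"}},
    {"type": "m.room.message", "event_id": "$3", "sender": "@grant:example.org",
     "content": {}},
]
seen = set()
for eid, label in select_targets(chunk, "@bot:example.org", seen):
    print("WOULD redact " + eid + " :: " + label)
```

Passing the same `seen` set to every page is what makes a repeated event (for example, one that straddles a page boundary) show up only once in the dry-run list.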
+86 -6
@@ -1305,14 +1305,40 @@ def _strip_legal_suffix(normalized_name):
return " ".join(toks)
# Generic firm-descriptor words that carry almost no identifying signal: nearly every firm name
# contains one ("… Investment Group", "… Capital", "… Family Office"). Two names that overlap ONLY
# on these are NOT duplicates — 'Fortitude Investment Group' is not 'Aether Investment Group'. We
# compare on the DISTINCTIVE remainder so a shared descriptor can't inflate the score (the earlier
# "Capital/Ventures/Partners are distinctive enough to keep" assumption produced false shortlists —
# Grant, 2026-06-20). If a name is ALL descriptor ('Family Office'), we fall back to its full tokens
# so there's still something to compare.
_GENERIC_ORG_WORDS = frozenset({
"investment", "investments", "investing", "investor", "investors",
"capital", "ventures", "venture", "partners", "partner", "group",
"fund", "funds", "management", "advisors", "advisers", "advisory",
"asset", "assets", "holdings", "holding", "family", "office",
"trust", "associates", "equity", "financial", "finance", "global",
"international", "company", "enterprises", "wealth", "the", "and", "of",
})
def _distinctive_tokens(normalized_name):
"""Tokens of a (legal-suffix-stripped) name with generic firm descriptors removed. Falls back to
the full token list when the name is nothing but descriptors, so an all-generic name still compares."""
toks = re.findall(r"[a-z0-9]+", normalized_name)
keep = [t for t in toks if t not in _GENERIC_ORG_WORDS]
return keep or toks
def _name_similarity(a, b):
"""0..1 fuzzy similarity between two investor names: the max of difflib's sequence ratio
(catches near-spellings 'Charlie'/'Charles') and token-set Jaccard overlap (catches
word-order differences). Legal-entity suffixes are stripped first, so two names differing
only by 'LLC'/'LP'/'Inc' score 1.0 (a near-certain duplicate to surface; find_intake_match
won't have caught it, since it compares the full string). Favors recall: a shared common
name-word ('… Capital') can lift unrelated firms into the 0.6–0.8 band, acceptable noise in
a ranked, human-confirmed shortlist; semantic pruning is the deferred LLM-judge's job."""
won't have caught it, since it compares the full string). Both the ratio and the Jaccard run on
the DISTINCTIVE tokens (generic descriptors like 'Investment Group'/'Capital' removed), so firms
that share only a descriptor don't surface as look-alikes; 'Aether Capital' ~ 'Aether Capital
Partners' still scores 1.0 on the distinctive 'aether'. Still recall-favoring on real overlap."""
a = _normalize_text(a)
b = _normalize_text(b)
if not a or not b:
@@ -1323,9 +1349,10 @@ def _name_similarity(a, b):
sb = _strip_legal_suffix(b) or b
if sa == sb:
return 1.0
ratio = difflib.SequenceMatcher(None, sa, sb).ratio()
ta = set(re.findall(r"[a-z0-9]+", sa))
tb = set(re.findall(r"[a-z0-9]+", sb))
da = _distinctive_tokens(sa) # order-preserving for the sequence ratio
db = _distinctive_tokens(sb)
ratio = difflib.SequenceMatcher(None, " ".join(da), " ".join(db)).ratio()
ta, tb = set(da), set(db)
jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
return max(ratio, jaccard)
@@ -1881,6 +1908,45 @@ def existing_investor_by_source_row(conn):
return out
def fundraising_contact_emails_by_row(conn):
"""{ source_row_id: {'order': {sort_order: email}, 'name': {normalized_name: email}} } of the
authoritative email per grid contact, for HEALING blank pill emails on read.
The grid blob is canonical for the edit sheet, but an email can reach the linked classic
contact (via email capture / a contact edit) without ever being written back into the blob
pill, so the mobile "Edit investor" sheet shows an empty email for a contact the directory
clearly has (Grant, 2026-06-20). We recover it from the relational mirror: prefer the synced
fundraising_contacts.email, else the linked classic contacts.email (the source that actually
holds the captured address). Keyed by sort_order (pills and fundraising_contacts share the
blob order, the robust key) with a normalized-name fallback. Only non-blank emails are
returned; filling is fill-only-when-blank in the handler, so it heals and converges (the next
one-row save persists the recovered email into the blob)."""
out = {}
rows = conn.execute(
"""
SELECT fi.source_row_id AS srid, fc.sort_order AS so, fc.full_name AS name,
COALESCE(NULLIF(TRIM(fc.email), ''), c.email) AS email
FROM fundraising_investors fi
JOIN fundraising_contacts fc ON fc.investor_id = fi.id
LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
"""
).fetchall()
for r in rows:
email = str(r['email'] or '').strip()
if not email:
continue
srid = str(r['srid'] or '')
if not srid:
continue
bucket = out.setdefault(srid, {'order': {}, 'name': {}})
if r['so'] is not None:
bucket['order'][int(r['so'])] = email
nm = _normalize_text(r['name'])
if nm:
bucket['name'][nm] = email
return out
def contact_grid_signals(conn, contact_id=None):
"""Return {contacts.id: {'committed': float, 'pipeline_stage': str|None, 'priority': bool}} for
every classic contact linked to a fundraising-grid investor (via fundraising_contacts.contact_id,
@@ -5830,6 +5896,7 @@ class CRMHandler(BaseHTTPRequestHandler):
reminder_by_row = reminder_status_by_source_row(conn)
existing_by_row = existing_investor_by_source_row(conn)
recency_by_row = staleness_by_source_row(conn)
emails_by_row = fundraising_contact_emails_by_row(conn)
conn.close()
try:
@@ -5873,6 +5940,19 @@ class CRMHandler(BaseHTTPRequestHandler):
last_activity, staleness = recency_by_row.get(srid, (None, ''))
r['last_activity_at'] = last_activity
r['staleness'] = staleness
# Heal blank pill emails from the relational mirror (fill-only — never overwrite a value
# already in the blob). Unlike the read-only columns above, email is a REAL blob field,
# so this is a backfill, not a derived signal: it needs NO strip point, and the next
# one-row save legitimately persists it. Match by pill order, then by name.
heal = emails_by_row.get(srid)
pills = r.get('contacts')
if heal and isinstance(pills, list):
for i, c in enumerate(pills):
if not isinstance(c, dict) or str(c.get('email') or '').strip():
continue
found = heal['order'].get(i) or heal['name'].get(_normalize_text(c.get('name')))
if found:
c['email'] = found
return self.send_json({
"data": {
+135
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
"""Regression: GET /api/fundraising/state heals blank grid-pill emails from the relational mirror.
The grid blob is canonical for the mobile "Edit investor" sheet, but an email can reach a linked
classic contact (email capture / a contact edit) without ever being written back into the blob pill
— so the edit form showed an empty email for a contact the directory clearly had (Grant, 2026-06-20).
The state handler now fills a blank pill email from fundraising_contacts.email, else the linked
contacts.email, matched by pill order then name. This asserts:
- a blank pill whose linked contact has an email is HEALED on read;
- a blank pill whose linked contact is also blank stays blank;
- a pill that already carries an email in the blob is NEVER overwritten (fill-only).
Synthetic data only.
Run: cd backend && python3 test_grid_email_heal.py
"""
import http.client
import json
import os
import sqlite3
import sys
import tempfile
import threading
from http.server import ThreadingHTTPServer
_DATA = tempfile.mkdtemp()
os.environ["CRM_DATA_DIR"] = _DATA
os.environ["CRM_DB_PATH"] = os.path.join(_DATA, "crm.db")
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import server # noqa: E402
FAILS = []
def check(cond, msg):
print((" PASS " if cond else " FAIL ") + msg)
if not cond:
FAILS.append(msg)
class _Quiet(server.CRMHandler):
def log_message(self, *a):
pass
def _get_state(port, token):
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
conn.request("GET", "/api/fundraising/state", headers={"Authorization": "Bearer " + token})
resp = conn.getresponse()
raw = resp.read().decode("utf-8", "replace")
conn.close()
return resp.status, (json.loads(raw) if raw else None)
GRID = {
"columns": [{"id": "investor_name", "label": "Investor", "type": "text"},
{"id": "contacts", "label": "Contacts", "type": "contacts"}],
"rows": [
{"id": "rowW", "investor_name": "Wyoming", "notes": "",
"contacts": [{"name": "Philip Treick", "email": "", "title": ""},
{"name": "Jose Briones", "email": "", "title": ""}]},
{"id": "rowA", "investor_name": "Acme Capital", "notes": "",
"contacts": [{"name": "Jane Doe", "email": "keep@acme.com", "title": ""}]},
],
}
def seed():
c = sqlite3.connect(os.environ["CRM_DB_PATH"])
c.execute("INSERT INTO users (id,username,email,password_hash,full_name,role,is_active) "
"VALUES ('u1','grant','grant@ten31.example','x','Grant','admin',1)")
c.execute("INSERT INTO fundraising_state (id, grid_json, views_json, version) "
"VALUES ('main', ?, '[]', 1) "
"ON CONFLICT(id) DO UPDATE SET grid_json = excluded.grid_json", (json.dumps(GRID),))
# Classic contacts directory: Jose has the captured email the blob never got; Philip is blank.
c.execute("INSERT INTO contacts (id,first_name,last_name,email) VALUES "
"('c-phil','Philip','Treick',''),"
"('c-jose','Jose','Briones','jbriones@uwyo.edu'),"
"('c-jane','Jane','Doe','other@acme.com')") # differs from the blob's keep@acme.com
# Relational mirror (what sync_fundraising_relational would build): blank fc.email, linked contact_id.
c.execute("INSERT INTO fundraising_investors (id,investor_name,source_row_id,total_invested) VALUES "
"('inv-w','Wyoming','rowW',0),('inv-a','Acme Capital','rowA',0)")
c.execute("INSERT INTO fundraising_contacts (id,investor_id,full_name,email,sort_order,contact_id) VALUES "
"('fc-phil','inv-w','Philip Treick','',0,'c-phil'),"
"('fc-jose','inv-w','Jose Briones','',1,'c-jose'),"
"('fc-jane','inv-a','Jane Doe','',0,'c-jane')")
c.commit()
c.close()
def main():
server.init_db()
seed()
token = server.create_token("u1", "grant", "admin")
httpd = ThreadingHTTPServer(("127.0.0.1", 0), _Quiet)
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()
try:
st, d = _get_state(port, token)
rows = ((d or {}).get("data", {}).get("grid", {}) or {}).get("rows", [])
by_id = {r.get("id"): r for r in rows}
w = by_id.get("rowW", {})
a = by_id.get("rowA", {})
wc = w.get("contacts", [])
ac = a.get("contacts", [])
print("\n[heal: blank pill email filled from the linked contact (Jose)]")
jose = next((c for c in wc if c.get("name") == "Jose Briones"), {})
check(st == 200 and jose.get("email") == "jbriones@uwyo.edu",
f"Jose pill healed to jbriones@uwyo.edu (got {jose.get('email')!r})")
print("\n[heal: blank pill whose contact is also blank stays blank (Philip)]")
phil = next((c for c in wc if c.get("name") == "Philip Treick"), {})
check(phil.get("email", "") == "",
f"Philip pill stays blank (got {phil.get('email')!r})")
print("\n[heal: a pill that already has an email is never overwritten (Jane)]")
jane = next((c for c in ac if c.get("name") == "Jane Doe"), {})
check(jane.get("email") == "keep@acme.com",
f"Jane pill keeps its blob email, not the contact's (got {jane.get('email')!r})")
finally:
httpd.shutdown()
print()
if FAILS:
print(f"FAILED ({len(FAILS)}):")
for f in FAILS:
print(f" - {f}")
sys.exit(1)
print("ALL PASS (grid email heal)")
if __name__ == "__main__":
main()
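The test above exercises the heal end to end over HTTP. The dependency-free sketch below shows its two halves in isolation. The schema is a hypothetical two-table reduction: the real query also joins `fundraising_investors` so it can key results by `source_row_id`. `normalize_name` is a hypothetical stand-in for the CRM's `_normalize_text`. The first half shows the precedence encoded by `COALESCE(NULLIF(TRIM(fc.email), ''), c.email)`. The second half is the fill-only loop from the state handler, matching by pill index and then by name.

```python
import re
import sqlite3

# Hypothetical minimal schema: only the columns the email precedence touches.
SCHEMA = """
CREATE TABLE contacts (id TEXT PRIMARY KEY, email TEXT, deleted_at TEXT);
CREATE TABLE fundraising_contacts (id TEXT PRIMARY KEY, email TEXT, contact_id TEXT);
"""

# Prefer a non-blank mirror email; else the linked, non-deleted classic contact's email.
QUERY = """
SELECT fc.id, COALESCE(NULLIF(TRIM(fc.email), ''), c.email)
FROM fundraising_contacts fc
LEFT JOIN contacts c ON c.id = fc.contact_id AND c.deleted_at IS NULL
ORDER BY fc.id
"""


def preferred_emails(conn):
    return {fcid: email for fcid, email in conn.execute(QUERY)}


def normalize_name(name):
    """Hypothetical stand-in for _normalize_text: lowercase alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", str(name or "").lower()))


def heal_pill_emails(pills, heal):
    """Fill blank pill emails from heal['order'] (pill index), then heal['name']."""
    for i, pill in enumerate(pills):
        if not isinstance(pill, dict) or str(pill.get("email") or "").strip():
            continue  # fill-only: never overwrite an email already in the blob
        found = heal["order"].get(i) or heal["name"].get(normalize_name(pill.get("name")))
        if found:
            pill["email"] = found
    return pills


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    ("c1", "linked@example.com", None),
    ("c2", "gone@example.com", "2026-01-01"),  # soft-deleted contact
])
conn.executemany("INSERT INTO fundraising_contacts VALUES (?, ?, ?)", [
    ("fc1", "   ", "c1"),              # whitespace-only mirror email: fall back
    ("fc2", "own@example.com", "c1"),  # non-blank mirror email wins
    ("fc3", "", "c2"),                 # linked contact is deleted: nothing
    ("fc4", None, None),               # no mirror email, no link: nothing
])
emails = preferred_emails(conn)

pills = [{"name": "Philip Treick", "email": ""},
         {"name": "Jose Briones", "email": ""},
         {"name": "Jane Doe", "email": "keep@acme.com"}]
heal = {"order": {1: "jbriones@uwyo.edu", 2: "other@acme.com"},
        "name": {"jose briones": "jbriones@uwyo.edu"}}
heal_pill_emails(pills, heal)
print(emails)
print([p["email"] for p in pills])
```

A NULL result from the query is dropped on the Python side (`str(r['email'] or '').strip()` is empty), so only real addresses ever reach the heal map.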
+20
@@ -75,6 +75,12 @@ GRID = {
"contacts": [{"name": "Charlie Brown", "email": "cb@brown.fund", "title": ""}]},
{"id": "rowBeta", "investor_name": "Beta Capital LLC", "notes": "",
"contacts": [{"name": "Pat Roe", "email": "pat@beta.com", "title": ""}]},
# Generic-descriptor decoys: share only "investment group" / "investments" with the
# Fortitude card below — must NOT surface as look-alikes (the 2026-06-20 false-positive fix).
{"id": "rowAether", "investor_name": "Aether Investment Group", "notes": "",
"contacts": [{"name": "Ada Ng", "email": "ada@aether.com", "title": ""}]},
{"id": "rowRussell", "investor_name": "Russell Investments", "notes": "",
"contacts": [{"name": "Russ Lee", "email": "russ@russell.com", "title": ""}]},
],
}
@@ -178,6 +184,20 @@ def main():
check(st == 200 and data.get("match") is None and data.get("candidates") == [],
f"unrelated query -> no match, no candidates (got {data})")
print("\n[fuzzy: shared generic words alone do NOT surface look-alikes (Fortitude vs Aether/Russell)]")
st, d = _req(port, "GET", "/api/intake/match?q=Fortitude%20Investment%20Group", token)
data = (d or {}).get("data", {})
cids = [c["id"] for c in data.get("candidates", [])]
check(data.get("match") is None and "rowAether" not in cids and "rowRussell" not in cids,
f"generic-only overlap -> no decoy candidates (got {data})")
print("\n[fuzzy: a shared DISTINCTIVE word still surfaces (Aether Capital ~ Aether Investment Group)]")
st, d = _req(port, "GET", "/api/intake/match?q=Aether%20Capital", token)
data = (d or {}).get("data", {})
cids = [c["id"] for c in data.get("candidates", [])]
check(data.get("match") is None and "rowAether" in cids,
f"distinctive overlap -> rowAether candidate (got {data})")
print("\n[match: missing q and email -> 400]")
st, _ = _req(port, "GET", "/api/intake/match", token)
check(st == 400, f"no params -> 400 (got {st})")