Grid/contacts unification step 1: real contact_id link + grid as front door (v0.1.0:52)

Structural fix for the duplicate-people class of bug: instead of matching a grid contact "pill" to a contacts row heuristically by name/email (which drifted and caused the 1406 double-count), link them by id.

Backend:
- Migration 0004: fundraising_contacts.contact_id (additive, nullable, logical FK to contacts(id)) + index. Paired down migration.
- sync_fundraising_relational now stores the id that _upsert_contact_from_fundraising already returns, so every grid contact carries its contacts-table id.
- _backfill_grid_contact_ids: one-time, idempotent backfill on startup (re-runs the grid sync once if any row lacks contact_id), so existing data links immediately.
- entity_resolution: grid pass prefers the explicit contact_id link (match_kind 'grid_link') over heuristic email / name+investor, guarded by a PRAGMA check so older DBs without the column still work.

Frontend:
- Fundraising grid "+ Row" -> "+ Investor" (clear, single investor entry point).
- Contacts page: the "+ Add Contact" trigger is replaced by a pointer to the grid; the page is now a read/search/edit view (ContactDetailPanel still edits all fields). New people are added from the grid. No contact data is removed.

Tests: backend/ingest/test_entity_resolution.py extended (explicit-link case, 11/11) and a new backend/test_grid_contact_link.py integration test (init_db applies 0004, sync populates contact_id to the right contact, re-sync is idempotent). py_compile + frontend html.parser clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
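
The migration and backfill described in the commit message can be illustrated with a minimal sketch. It is not the repository's migration 0004 or its _backfill_grid_contact_ids: the function names apply_0004 and backfill_contact_ids, the index name, the simplified table layouts, and the match-by-exact-email backfill rule are all assumptions made for illustration (the real backfill re-runs the grid sync instead). The point shown is only that an additive, nullable column plus a "fill only where NULL" pass can be re-run safely.

```python
import sqlite3


def apply_0004(conn):
    """Hypothetical stand-in for migration 0004: additive, nullable column + index."""
    cols = {row[1] for row in conn.execute("PRAGMA table_info(fundraising_contacts)")}
    if "contact_id" not in cols:
        # Nullable, no constraint: existing rows are untouched (logical FK only).
        conn.execute("ALTER TABLE fundraising_contacts ADD COLUMN contact_id INTEGER")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_fundraising_contacts_contact_id "
        "ON fundraising_contacts(contact_id)"
    )


def backfill_contact_ids(conn):
    """Assumed backfill rule: link unlinked rows by exact email. Returns rows linked."""
    rows = conn.execute(
        "SELECT id, email FROM fundraising_contacts WHERE contact_id IS NULL"
    ).fetchall()
    linked = 0
    for fc_id, email in rows:
        if not email:
            continue  # nothing provable to match on; leave the row unlinked
        hit = conn.execute(
            "SELECT id FROM contacts WHERE lower(email) = lower(?)", (email,)
        ).fetchone()
        if hit:
            conn.execute(
                "UPDATE fundraising_contacts SET contact_id = ? WHERE id = ?",
                (hit[0], fc_id),
            )
            linked += 1
    return linked


# Demo on an in-memory database with simplified, assumed schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT);
    CREATE TABLE fundraising_contacts (
        id INTEGER PRIMARY KEY, full_name TEXT, email TEXT, investor_id INTEGER);
    INSERT INTO contacts VALUES (1, 'Ada Example', 'ada@example.com');
    INSERT INTO fundraising_contacts VALUES (10, 'Ada Example', 'ADA@example.com', 7);
    INSERT INTO fundraising_contacts VALUES (11, 'Bob Example', NULL, 7);
""")
apply_0004(conn)
apply_0004(conn)  # applying twice is harmless
first = backfill_contact_ids(conn)   # links row 10
second = backfill_contact_ids(conn)  # nothing left to link
```

Because the second pass only looks at rows whose contact_id is still NULL, running it on every startup changes nothing once the data is linked.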
@@ -220,27 +220,29 @@ def resolve_people(conn, org_canon_by_orgid, org_canon_by_fundinv, merge_map=Non
         if cid:
             contact_to_person[r["id"]] = cid
 
-    # 2. Grid contacts are associations, not new people: match to a contact-person
-    # (by email, else name within the same investor) and just add membership.
-    # Only create a person when there is genuinely no matching contact.
-    for r in conn.execute("SELECT id, full_name, email, investor_id FROM fundraising_contacts"):
+    # 2. Grid contacts are associations, not new people: link each to its
+    # contacts-table person and record membership. We prefer the EXPLICIT
+    # contact_id link (migration 0004 — the grid pill stores the id of the
+    # contact it was created from), and fall back to provable email / exact name
+    # within the same investor for rows not yet backfilled. On a miss we
+    # deliberately do NOT mint a person: the old else-branch mint is exactly what
+    # produced the people double-count, and guessing by name across firms risks
+    # binding two different same-named people — honest separation, never on a guess.
+    fc_cols = {row[1] for row in conn.execute("PRAGMA table_info(fundraising_contacts)")}
+    has_contact_id = "contact_id" in fc_cols
+    sel = ("SELECT id, full_name, email, investor_id" +
+           (", contact_id" if has_contact_id else "") + " FROM fundraising_contacts")
+    for r in conn.execute(sel):
         email = norm_email(r["email"])
         name_norm = norm_text(r["full_name"] or "")
         inv_canon = org_canon_by_fundinv.get(r["investor_id"])
-        # Match the grid contact to its contacts-table person by PROVABLE keys only:
-        # exact email, else exact name within the SAME canonical investor. The app
-        # keeps the grid and the contacts table in sync (_upsert_contact_from_
-        # fundraising), so a grid contact IS an existing contact-person, never a new
-        # one. On a confident match, record the membership. On a miss we deliberately
-        # do NOT mint a person: the old else-branch mint is exactly what produced the
-        # people double-count (a grid row whose (name, investor) key didn't line up
-        # with its contact minted a duplicate), and guessing by name across firms
-        # risks binding two different same-named people. Unresolved grid rows are
-        # left for the explicit contact_id link planned in the grid/contacts
-        # unification — honest separation: never merge or mint on a guess.
-        cid = (by_email.get(email) if email else None) or by_name_inv.get((name_norm, inv_canon or ""))
+        link_cid = r["contact_id"] if has_contact_id else None
+        cid = (contact_to_person.get(link_cid) if link_cid else None) \
+            or (by_email.get(email) if email else None) \
+            or by_name_inv.get((name_norm, inv_canon or ""))
         if cid:
-            _link(conn, cid, "fundraising_contacts", r["id"], email or name_norm, "grid_assoc", 0.9)
+            mk = "grid_link" if (link_cid and contact_to_person.get(link_cid)) else "grid_assoc"
+            _link(conn, cid, "fundraising_contacts", r["id"], email or name_norm, mk, 0.95 if mk == "grid_link" else 0.9)
             _member_of(conn, cid, inv_canon)
 
     # lp_profiles -> the person entity of its contact
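
The match precedence in this hunk (explicit link first, then email, then name within the same investor, and no person minted on a miss) can be isolated as a small pure function. This is a sketch, not code from the repository: resolve_grid_contact is a hypothetical name, the row is a plain dict with assumed keys, and the inputs are taken to be already normalized (the real code calls norm_email and norm_text first). The match kinds and the 0.95 / 0.9 confidences are the ones used in the diff.

```python
def resolve_grid_contact(row, contact_to_person, by_email, by_name_inv):
    """Return (person_id, match_kind, confidence), or None when nothing matches.

    row is an assumed dict with keys: contact_id, email, name, investor.
    """
    # 1. Explicit link: only trusted if the linked contact resolved to a person.
    link_cid = row.get("contact_id")
    person = contact_to_person.get(link_cid) if link_cid else None
    if person:
        return person, "grid_link", 0.95

    # 2. Provable fallbacks: exact email, else exact name within the same investor.
    email = row.get("email")
    person = (by_email.get(email) if email else None) or by_name_inv.get(
        (row.get("name"), row.get("investor") or "")
    )
    if person:
        return person, "grid_assoc", 0.9

    # 3. Miss: deliberately no new person is created.
    return None


# Demo lookups (all values are made up for illustration).
contact_to_person = {1: "person-A"}
by_email = {"bob@example.com": "person-B"}
by_name_inv = {("cy example", "acme"): "person-C"}

linked = resolve_grid_contact(
    {"contact_id": 1, "email": "bob@example.com", "name": "x", "investor": "acme"},
    contact_to_person, by_email, by_name_inv)
stale_link = resolve_grid_contact(
    {"contact_id": 99, "email": "bob@example.com", "name": "x", "investor": "acme"},
    contact_to_person, by_email, by_name_inv)
by_name = resolve_grid_contact(
    {"contact_id": None, "email": None, "name": "cy example", "investor": "acme"},
    contact_to_person, by_email, by_name_inv)
miss = resolve_grid_contact(
    {"contact_id": None, "email": None, "name": "cy example", "investor": "other"},
    contact_to_person, by_email, by_name_inv)
```

Note the stale_link case: a contact_id that does not resolve to a person falls through to the heuristic keys and is recorded as grid_assoc, which mirrors how mk is computed in the diff.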