Fix people double-count + duplicate-queue explosion (v0.1.0:51)

Root cause: grid contacts (fundraising_contacts) are the SAME people as those in
the contacts table (the app syncs them by name/email), but resolution matched
grid rows by (name + investor-canon), and the two sides derive that investor key
from different tables that rarely line up. As a result, nearly every grid contact
minted a duplicate person (715 + ~692 ≈ 1406), and the duplicate finder then
flagged each twin against its real self (~676 candidates).
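The sketch below shows this failure mode on a minimal example. It rests on stated assumptions:

- `SourceRow`, `canonInvestor`, `compositeKey`, and `resolveOld` are hypothetical names.
- The two sides' different source tables are modeled as two different investor label strings for one person.

Because the two (name + investor-canon) keys differ, the old mint-on-miss branch creates a second person.

```typescript
// Hypothetical row shape; the real fundraising_contacts/contacts tables differ.
interface SourceRow {
  name: string;
  email: string | null;
  investorLabel: string; // investor name as stored in this side's source table
}

// Hypothetical canonicalizer: lowercase, collapse non-alphanumeric runs to spaces.
function canonInvestor(label: string): string {
  return label.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Composite identity key used by the old resolution: name + investor-canon.
function compositeKey(row: SourceRow): string {
  return `${row.name}|${canonInvestor(row.investorLabel)}`;
}

// Old behavior (sketch): contacts rows seed the person set, then every grid row
// whose composite key is not already present mints a new person.
// Returns the number of person rows minted.
function resolveOld(contactRows: SourceRow[], gridRows: SourceRow[]): number {
  const personIdByKey = new Map<string, number>();
  for (const row of [...contactRows, ...gridRows]) {
    const key = compositeKey(row);
    if (!personIdByKey.has(key)) {
      personIdByKey.set(key, personIdByKey.size); // mint on miss: the double-count source
    }
  }
  return personIdByKey.size;
}
```

Take one person whose investor is labeled "Acme Capital" on the contacts side and "Acme Capital Partners, LP" on the grid side. `resolveOld` returns 2 persons instead of 1. Repeated across the whole grid, this gives the 715 + ~692 shape described above.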

Fix (entity_resolution.py):
- Grid pass matches a grid contact to its existing contacts-table person by
  PROVABLE keys only (exact email, else exact name within the same investor) and
  records membership; on a miss it MINTS NOTHING (the old else-branch mint was the
  double-count source, and guessing by name across firms risks binding two
  different same-named people).
- Targeted, audited cleanup soft-deletes leftover grid-only "twins" (person rows
  with no 'contacts' link) and superseded pre-:48 'lp'/'organization' rows, guarded
  so any row carrying enrichment/human data is never dropped (guardrail #3); the
  tombstoned ids are logged to interaction_log (guardrail #5).
- _upsert_entity clears deleted_at on conflict so a re-emitted id is un-tombstoned
  (no permanent burial); fuzzy-merge losers stay buried via _redirect.
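The new grid pass and the twin cleanup can be sketched as a minimal in-memory model. This is not the entity_resolution.py code. The following are hypothetical:

- the types `Person`, `GridContact`, and `AuditEntry`;
- the `links` set;
- the `enriched` flag, which stands in for "carries enrichment/human data";
- the audit action name.

The superseded 'lp'/'organization' rows are not modeled.

```typescript
// Hypothetical in-memory stand-ins for the person table and its source links.
interface Person {
  id: string;
  name: string;
  email: string | null;
  investorId: string;
  links: Set<string>; // e.g. "contacts", "grid"
  enriched: boolean; // stands in for "carries enrichment/human data"
  deletedAt: string | null;
}

interface GridContact {
  name: string;
  email: string | null;
  investorId: string;
}

// Hypothetical shape of an interaction_log audit entry.
interface AuditEntry {
  action: string;
  ids: string[];
}

// Provable keys only: exact email first, else exact name within the same investor.
// Only live people that came from the contacts table are eligible targets.
function matchGridContact(people: Person[], g: GridContact): Person | null {
  const eligible = people.filter((p) => p.deletedAt === null && p.links.has("contacts"));
  const email = g.email;
  if (email !== null) {
    const byEmail = eligible.find((p) => p.email === email);
    if (byEmail) return byEmail;
  }
  // Known weakness (unchanged by the fix): same name + same firm + no email still collide.
  const byName = eligible.find((p) => p.name === g.name && p.investorId === g.investorId);
  return byName ?? null;
}

// Records membership on a hit; on a miss it mints nothing.
function gridPass(people: Person[], grid: GridContact[]): number {
  let linked = 0;
  for (const g of grid) {
    const match = matchGridContact(people, g);
    if (match !== null) {
      match.links.add("grid");
      linked++;
    }
  }
  return linked;
}

// Soft-deletes live grid-only twins (no "contacts" link) unless they carry
// enrichment/human data, and logs the tombstoned ids for audit.
function cleanupTwins(people: Person[], now: string, auditLog: AuditEntry[]): string[] {
  const tombstoned: string[] = [];
  for (const p of people) {
    if (p.deletedAt === null && !p.links.has("contacts") && !p.enriched) {
      p.deletedAt = now;
      tombstoned.push(p.id);
    }
  }
  if (tombstoned.length > 0) {
    auditLog.push({ action: "soft_delete_grid_twins", ids: tombstoned });
  }
  return tombstoned;
}
```

Leaving misses unminted trades recall for safety. An unmatched grid contact simply gets no person row of its own. The alternative, guessing by name across firms, risks binding two different same-named people.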

entity_merge.py / server.py: the duplicate queue and the pending count now
include only candidates whose two sides are both still live, so self-healed
twins drop out.
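The liveness rules can be sketched the same way. Two pieces are shown:

- an upsert that clears `deletedAt` on an id conflict;
- a duplicate queue that keeps only candidate pairs whose two sides are both live.

`EntityRow`, `DuplicateCandidate`, `upsertEntity`, `softDelete`, `isLive`, `liveQueue`, and `pendingCount` are hypothetical names for illustration. In SQL, the upsert change would typically be a `deleted_at = NULL` assignment in an `ON CONFLICT (...) DO UPDATE SET` clause, which both SQLite and PostgreSQL support. The actual backend and schema are not shown in this commit. The `_redirect` handling that keeps fuzzy-merge losers buried is not modeled.

```typescript
// Hypothetical entity row; deletedAt !== null means soft-deleted (tombstoned).
interface EntityRow {
  id: string;
  name: string;
  deletedAt: string | null;
}

// Hypothetical pending duplicate-review candidate pair.
interface DuplicateCandidate {
  leftId: string;
  rightId: string;
}

// Sketch of the _upsert_entity change: on an id conflict, update the fields AND
// clear deletedAt, so an id the resolver emits again is un-tombstoned.
function upsertEntity(store: Map<string, EntityRow>, id: string, name: string): void {
  const existing = store.get(id);
  if (existing !== undefined) {
    existing.name = name;
    existing.deletedAt = null;
  } else {
    store.set(id, { id, name, deletedAt: null });
  }
}

function softDelete(store: Map<string, EntityRow>, id: string, now: string): void {
  const row = store.get(id);
  if (row !== undefined) row.deletedAt = now;
}

function isLive(store: Map<string, EntityRow>, id: string): boolean {
  const row = store.get(id);
  return row !== undefined && row.deletedAt === null;
}

// Duplicate queue: keep only candidates whose two sides are both still live.
function liveQueue(store: Map<string, EntityRow>, candidates: DuplicateCandidate[]): DuplicateCandidate[] {
  return candidates.filter((c) => isLive(store, c.leftId) && isLive(store, c.rightId));
}

// The pending count uses the same filter, so the count and the queue agree.
function pendingCount(store: Map<string, EntityRow>, candidates: DuplicateCandidate[]): number {
  return liveQueue(store, candidates).length;
}
```

Deriving the count from the queue filter, rather than counting raw candidate rows, means a twin tombstoned by the cleanup disappears from both at once.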

Verified: the offline reproduction test (backend/ingest/test_entity_resolution.py,
10/10 passing) reproduces the 1406-style doubling and shows that it collapses; no
regression on the synthetic dev set; two adversarial review passes. Known
pre-existing identity-key weaknesses (two people with the same name and firm and
no email collide into one; a shared role inbox over-links distinct people) are
unchanged by this fix and will be resolved structurally by the contact_id link in
the grid/contacts unification.

Run "Build search index" after upgrading to recompute the canonical layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keysat
2026-06-05 14:49:39 -05:00
parent 3d9caac178
commit d16264f401
7 changed files with 293 additions and 11 deletions
@@ -11,8 +11,9 @@ import { v_0_1_0_47 } from './v0.1.0.47'
 import { v_0_1_0_48 } from './v0.1.0.48'
 import { v_0_1_0_49 } from './v0.1.0.49'
 import { v_0_1_0_50 } from './v0.1.0.50'
+import { v_0_1_0_51 } from './v0.1.0.51'

 export const versionGraph = VersionGraph.of({
-  current: v_0_1_0_50,
-  other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43, v_0_1_0_44, v_0_1_0_45, v_0_1_0_46, v_0_1_0_47, v_0_1_0_48, v_0_1_0_49],
+  current: v_0_1_0_51,
+  other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43, v_0_1_0_44, v_0_1_0_45, v_0_1_0_46, v_0_1_0_47, v_0_1_0_48, v_0_1_0_49, v_0_1_0_50],
 })
@@ -0,0 +1,22 @@
import { VersionInfo } from '@start9labs/start-sdk'
// Entity-resolution fix for the people double-count (1406) and the runaway
// duplicate-review queue (676). Grid contacts now link back to their existing
// contacts-table person by provable keys only (exact email, or exact name within
// the same investor) and never mint a duplicate person on a miss; leftover grid
// "twins" and superseded pre-:48 rows are soft-deleted (enrichment-protected,
// audited); re-emitted ids are un-tombstoned. Run "Build search index" after
// upgrading to recompute the canonical layer. No data migration.
export const v_0_1_0_51 = VersionInfo.of({
  version: '0.1.0:51',
  releaseNotes: {
    en_US: [
      'Fixes the inflated People count and the oversized duplicate-review queue:',
      'grid contacts now resolve to their existing contact record instead of being',
      'duplicated, and leftover duplicate "twins" are cleaned up safely (enriched',
      'records are never dropped). Run "Build search index" after upgrading to',
      'refresh the numbers.',
    ].join(' '),
  },
  migrations: { up: async () => {}, down: async () => {} },
})