Keysat d16264f401 Fix people double-count + duplicate-queue explosion (v0.1.0:51)
Root cause: grid contacts (fundraising_contacts) are the SAME people as the
contacts table (the app syncs them by name/email), but resolution matched grid
rows by (name + investor-canon) where the two sides derive the investor key from
different tables that rarely line up — so nearly every grid contact minted a
duplicate person (715 + ~692 ≈ 1406), and the duplicate finder then flagged each
twin against its real self (~676 candidates).

Fix (entity_resolution.py):
- Grid pass matches a grid contact to its existing contacts-table person by
  PROVABLE keys only (exact email, else exact name within the same investor) and
  records membership; on a miss it MINTS NOTHING (the old else-branch mint was the
  double-count source, and guessing by name across firms risks binding two
  different same-named people).
- Targeted, audited cleanup soft-deletes leftover grid-only "twins" (person rows
  with no 'contacts' link) and superseded pre-:48 'lp'/'organization' rows, guarded
  so any row carrying enrichment/human data is never dropped (guardrail #3); the
  tombstoned ids are logged to interaction_log (guardrail #5).
- _upsert_entity clears deleted_at on conflict so a re-emitted id is un-tombstoned
  (no permanent burial); fuzzy-merge losers stay buried via _redirect.
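
As an illustration of the "provable keys only" rule above, here is a minimal
sketch, not the code in entity_resolution.py. The Person shape, the field
names, and match_grid_contact are hypothetical, and treating an ambiguous
(multi-hit) match as a miss is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical shape of a contacts-table person; field names are assumed.
    id: int
    name: str
    email: Optional[str]
    investor_id: Optional[int]


def _norm(value: Optional[str]) -> str:
    """Trim and lowercase so comparison is exact apart from case and padding."""
    return (value or "").strip().lower()


def match_grid_contact(
    grid_name: Optional[str],
    grid_email: Optional[str],
    grid_investor_id: Optional[int],
    people: Iterable[Person],
) -> Optional[int]:
    """Return the id of the existing person, or None. Never creates a person."""
    people = list(people)

    # Key 1: exact email.
    email = _norm(grid_email)
    if email:
        hits = [p.id for p in people if _norm(p.email) == email]
        if len(hits) == 1:  # assumption: ambiguous hits count as a miss
            return hits[0]

    # Key 2: exact name, but only within the same investor.
    name = _norm(grid_name)
    if name and grid_investor_id is not None:
        hits = [
            p.id
            for p in people
            if _norm(p.name) == name and p.investor_id == grid_investor_id
        ]
        if len(hits) == 1:
            return hits[0]

    # Miss: the caller records nothing and mints nothing.
    return None
```

The key point is the final return: a miss yields None rather than a new
person row, which is what removes the double count.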

entity_merge.py / server.py: the duplicate queue and pending count now include
only candidates whose two sides are both still live, so self-healed twins drop out.
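
The live-only filter can be sketched against an in-memory SQLite database as
follows. The table and column names (entities, merge_candidates, left_id,
right_id, status) are assumptions for illustration and may not match the real
schema; only deleted_at is taken from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: deleted_at IS NULL means the row is live.
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    CREATE TABLE merge_candidates (
        id INTEGER PRIMARY KEY, left_id INTEGER, right_id INTEGER, status TEXT
    );
    -- Entity 2 is a tombstoned twin of entity 1; entities 3 and 4 are both live.
    INSERT INTO entities VALUES
        (1, 'Ann Lee', NULL), (2, 'Ann Lee', '2026-06-05'),
        (3, 'Bob Ray', NULL), (4, 'Robert Ray', NULL);
    INSERT INTO merge_candidates VALUES
        (1, 1, 2, 'pending'), (2, 3, 4, 'pending');
    """
)

# Keep a pending candidate only when BOTH of its sides are still live.
LIVE_PENDING_SQL = """
    SELECT c.id
    FROM merge_candidates AS c
    JOIN entities AS l ON l.id = c.left_id  AND l.deleted_at IS NULL
    JOIN entities AS r ON r.id = c.right_id AND r.deleted_at IS NULL
    WHERE c.status = 'pending'
    ORDER BY c.id
"""


def live_pending_ids(db: sqlite3.Connection) -> list:
    """Ids of pending duplicate candidates whose two sides are both live."""
    return [row[0] for row in db.execute(LIVE_PENDING_SQL)]
```

Using the same query for the queue and for the pending count keeps the two
numbers consistent with each other.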

Verified: offline reproduction test (backend/ingest/test_entity_resolution.py,
10/10) reproduces the 1406-style doubling and proves it collapses; no regression
on the synthetic dev set; two adversarial review passes. Known pre-existing
identity-key weaknesses (two people with the same name and firm and no email
collide; a shared role inbox over-links the people behind it) are unchanged by
this fix and will be resolved structurally by the contact_id link in the
grid/contacts unification.

Run "Build search index" after upgrading to recompute the canonical layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 14:49:39 -05:00

Ten31 Database — StartOS 0.4 wrapper (x86_64)

This directory is the self-contained StartOS 0.4 service package for Ten31 Database. It is the x86_64 successor to the 0.3.5 (aarch64) wrapper in ../0.3.5/. Both packages share the same package id (ten-database) and the same /data volume layout so data can be preserved across the migration.

Start here

Read DEPLOY_040.md first. It covers:

  1. How the image-seed data-preservation mechanism works.
  2. How to refresh the seed with live production data from the 0.3.5 host (via ./refresh_seed.sh or manual scp).
  3. How to install the build prerequisites (Node, Docker, start-cli).
  4. How to build the x86_64 .s9pk.
  5. How to sideload onto the StartOS 0.4 beta node.
  6. A rollback plan and a post-install verification checklist.

Quick cheat sheet

# From this directory:
./refresh_seed.sh embassy@embassy.local   # pull live prod data into seed/
make clean
make x86
make install                              # uses ~/.startos/config.yaml

Data layout (unchanged from 0.3.5)

Inside the container:

  • /data/crm.db — SQLite database
  • /data/backups/ — app-level JSON exports
  • /data/.crm-secret — JWT signing key (created on first boot if absent)

The entrypoint seeds an empty volume from the image's baked-in snapshot on first boot and is a no-op on every later boot. Existing volumes are never overwritten.
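
The seed-once behaviour can be illustrated with a short Python sketch. This is not the package's actual entrypoint: the function name seed_if_empty and the rule that "empty" means "the directory has no entries" are assumptions, and the paths are passed in as parameters rather than taken from the image.

```python
import shutil
from pathlib import Path


def seed_if_empty(seed_dir: Path, data_dir: Path) -> bool:
    """Copy the baked-in snapshot into data_dir only when data_dir is empty.

    Returns True if seeding happened, False if the volume already had content.
    """
    data_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: any existing entry means the volume is in use, so do nothing.
    if any(data_dir.iterdir()):
        return False

    # First boot: populate the empty volume from the snapshot.
    shutil.copytree(seed_dir, data_dir, dirs_exist_ok=True)
    return True
```

Because the check runs before any copy, a second call leaves live data untouched, which is the property the migration depends on.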

Status

  • Source scaffold: complete; tsc --noEmit passes cleanly against @start9labs/start-sdk 0.4.0.
  • Dockerfile: self-contained under start9/0.4/ with no cross-folder references to start9/0.3.5/.
  • Seed snapshot: present at seed/data/ (repo dev DB — replace with live prod data before building).
  • Not yet built into a .s9pk here; build on a machine with Docker + start-cli per DEPLOY_040.md.