Root cause: grid contacts (fundraising_contacts) are the SAME people as the
contacts table (the app syncs them by name/email), but resolution matched grid
rows by (name + investor-canon) where the two sides derive the investor key from
different tables that rarely line up — so nearly every grid contact minted a
duplicate person (715 + ~692 ≈ 1406), and the duplicate finder then flagged each
twin against its real self (~676 candidates).
Fix (entity_resolution.py):
- Grid pass matches a grid contact to its existing contacts-table person by
PROVABLE keys only (exact email, else exact name within the same investor) and
records membership; on a miss it MINTS NOTHING (the old else-branch mint was the
double-count source, and guessing by name across firms risks binding two
different same-named people).
- Targeted, audited cleanup soft-deletes leftover grid-only "twins" (person rows
with no 'contacts' link) and superseded pre-:48 'lp'/'organization' rows, guarded
so any row carrying enrichment/human data is never dropped (guardrail #3); the
tombstoned ids are logged to interaction_log (guardrail #5).
- _upsert_entity clears deleted_at on conflict so a re-emitted id is un-tombstoned
(no permanent burial); fuzzy-merge losers stay buried via _redirect.
entity_merge.py / server.py: the duplicate queue + pending count now filter to
candidates whose both sides are still live, so self-healed twins drop out.
Verified: offline reproduction test (backend/ingest/test_entity_resolution.py,
10/10) reproduces the 1406-style doubling and proves it collapses; no regression
on the synthetic dev set; two adversarial review passes. Known pre-existing
identity-key weaknesses (same name+firm+no email collision; shared role inbox
over-link) are unchanged by this fix and will be resolved structurally by the
contact_id link in the grid/contacts unification.
Run "Build search index" after upgrading to recompute the canonical layer.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Lets a non-technical operator install the Architect's Claude key from the
StartOS UI instead of the terminal: a masked text field whose value is written
to /data/secrets/anthropic-api-key (0600) on the box — the same file the
entrypoint already loads at boot. Secret is piped over stdin (never argv/env),
CR/LF stripped to match the entrypoint's read. allowedStatuses 'any'; a restart
is required (and stated in the action's warning + success message) since the
entrypoint reads the key only at startup.
Verified the Architect's data boundary first: the deployed Thesis Workshop
routes send only Ten31's own thesis text (thesis_lines/thesis_nodes) + the
partner-typed guidance to Claude — no contacts/lp_profiles/communications/grid.
(The MCP CRM-retrieval tools that DO return record substance are not wired into
the deployed Architect; the redaction boundary must land before any grounding
path uses them — Phase 1 Workstream D.)
tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Frontend: ThesisWorkshopPage / ThesisWorkshopNode / ThesisWorkshopOptions —
the collaborative iteration screen where partners generate a variable number
of competing thesis options (1, 2, 3, A1/A2/A3 ...) for any node, give
feedback, and regenerate. Reuses the shared api() helper; flexible option
count is the core UX constraint.
Backend Architect agent (architect_agent.py) + routes shipped in dd25bbc;
this completes the user-facing surface and bumps the StartOS package to
0.1.0:49 (anthropic dep already in the image, key loaded from
/data/secrets/anthropic-api-key — self-disabling until present).
Also lands thesis seed iterations v3 and v5 (voice/messaging corrections).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per Grant's clarification of the real data model:
- Investor entities come from the fundraising grid, one per row, all labeled
"investor" (drops the confusing lp/organization split). Grid is source of truth.
- People come ONLY from the contacts table. The grid's contacts (fundraising_
contacts) are matched to a contact-person and recorded as member_of links to
their investor, instead of creating duplicate person entities. This fixes the
~doubled people count (people now ≈ contacts, not contacts + grid contacts).
- System Status cards: Investors / People (resolved) / Contacts in CRM / Grid
contacts, so resolved-vs-source is visible at a glance.
Verified on synthetic: people == contacts count (no double-count); multi-contact
investors preserved via member_of.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The image COPY'd backend/server.py + a few subdirs but missed core_migrations.py,
backend/migrations/, and the Phase-1 modules (thesis_review/entity_merge/
entity_jobs). On the box the migrations never ran (tables absent) and those
endpoints 503'd ("Jobs unavailable"). Now COPY backend wholesale (.dockerignore
keeps __pycache__/data out). Bump to 0.1.0:46.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- frontend: System Status page extended with one-click index actions
(update/rebuild/find-duplicates, with live job status) and a human-in-the-loop
duplicate-review queue (approve=merge / reject=keep-separate per candidate).
- StartOS version 0.1.0:45 (image-only; schema via the in-app migration runner).
Backend + new routes verified end-to-end via the running HTTP server.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Fuzzy tier (backend/ingest/fuzzy_resolve.py + llm.py): local Qwen adjudicates
the deterministic resolver's flagged name-variant candidates; merges are
durable via entity_merges (deterministic re-runs respect them), losers
soft-deleted, logged. Idempotent.
- Incremental sync (backend/ingest/sync.py): re-embeds only rows changed since a
watermark (ingest_sync_state); first run / --recreate = full. Tested full→0→1.
- Start9 packaging (start9/0.4): Dockerfile bundles ingest+mcp + fastembed/mcp;
"Build search index" action runs the init in a subcontainer; MCP shipped as a
manual stdio server (not a daemon); version 0.1.0:44. INGEST_PACKAGING.md.
- backfill.py: factored embed_and_upsert() shared with sync.
Verified end-to-end on synthetic data + live Sparks/Qwen/Qdrant.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>