Per Grant's clarification of the real data model:
- Investor entities come from the fundraising grid, one per row, all labeled
"investor" (drops the confusing lp/organization split). Grid is source of truth.
- People come ONLY from the contacts table. The grid's contacts (fundraising_
contacts) are matched to a contact-person and recorded as member_of links to
their investor, instead of creating duplicate person entities. This fixes the
~doubled people count (people now ≈ contacts, not contacts + grid contacts).
- System Status cards: Investors / People (resolved) / Contacts in CRM / Grid
contacts, so resolved-vs-source is visible at a glance.
Verified on synthetic: people == contacts count (no double-count); multi-contact
investors preserved via member_of.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The image COPY'd backend/server.py + a few subdirs but missed core_migrations.py,
backend/migrations/, and the Phase-1 modules (thesis_review/entity_merge/
entity_jobs). On the box the migrations never ran (tables absent) and those
endpoints 503'd ("Jobs unavailable"). Now COPY backend wholesale (.dockerignore
keeps __pycache__/data out). Bump to 0.1.0:46.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These fundraising-state backup JSONs contain real investor data and were tracked
from the initial commit. Untracked (local files preserved) and gitignored, same
as data/crm.db. Keeps real data out of future commits and the package gitHash.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- frontend: System Status page extended with one-click index actions
(update/rebuild/find-duplicates, with live job status) and a human-in-the-loop
duplicate-review queue (approve=merge / reject=keep-separate per candidate).
- StartOS version 0.1.0:45 (image-only; schema via the in-app migration runner).
Backend + new routes verified end-to-end via the running HTTP server.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Dual sign-off is now the default (thesis_required_approvals defaults to 2).
- Entity-merge review queue (migration 0003): the fuzzy/Qwen tier no longer
auto-merges — it writes CANDIDATES (entity_merge_candidates) with a same/different
suggestion + confidence + reason for a human to approve (merge) or reject (keep
separate). entity_merge.py applies/rejects (durable via entity_merges, soft-delete,
repoint links+edges); decided pairs aren't re-surfaced.
- entity_jobs.py: UI-triggered background index jobs (rebuild/update/find-duplicates)
as subprocesses with a one-at-a-time lock; status in /api/system/status.
- server.py: /api/index/{rebuild,update}, /api/entities/find-duplicates,
/api/entities/merge-candidates [+ /{id} decide] — admin-gated.
- docs/thesis-seed-v2.md: concrete, plain-English rewrite per Grant's feedback.
Backend verified end-to-end on synthetic data (candidate gen -> approve/reject).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two React-via-Babel views in the CRM SPA, reusing the existing api() helper and
conventions:
- Thesis: lists thesis lines + the in-review queue with approvals/required pills;
version detail renders throughline/pillars/claims/objections + the reviews
timeline; admin review form (approve/request-changes/comment + feedback) ->
POST /api/thesis/versions/{id}/review (the dual-approval feedback loop).
- System Status: entity counts, last index sync, thesis counts, recent activity
from the interaction log — index health visible in-app, no shell.
Backend + full approve flow verified end-to-end via the running HTTP server.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- entity_resolution: emit member_of relationship edges (contact -> investor),
so one investor entity owns many contacts (institution) and a HNWI is the N=1
case; crm_tools.get_investor_contacts + get_entity contacts/member_of; MCP tool.
- seed_synthetic: multi-contact institutions to exercise it (Harbor & Vine = 5).
- server.py: GET /api/system/status (index/entity/thesis/activity health) for an
in-app status view (no shell needed to verify the index).
- docs/thesis-seed-v1.md: grounded v1 thesis (throughline, 6 pillars, objections,
per-segment angles, voice) drawn from Ten31's newsletter/site/essays.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- backend/ingest/sync_scheduler.py: periodic incremental-sync loop (every
CRM_INGEST_SYNC_INTERVAL_MIN min); resilient, --once for testing.
- start9/0.4: "Refresh search index" action (incremental sync.py); entrypoint
launches the scheduler as a background process when Spark/Qdrant are set;
CRM_INGEST_SYNC_INTERVAL_MIN env; pre-release note on fastembed/mcp pins.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Fuzzy tier (backend/ingest/fuzzy_resolve.py + llm.py): local Qwen adjudicates
the deterministic resolver's flagged name-variant candidates; merges are
durable via entity_merges (deterministic re-runs respect them), losers
soft-deleted, logged. Idempotent.
- Incremental sync (backend/ingest/sync.py): re-embeds only rows changed since a
watermark (ingest_sync_state); first run / --recreate = full. Tested full→0→1.
- Start9 packaging (start9/0.4): Dockerfile bundles ingest+mcp + fastembed/mcp;
"Build search index" action runs the init in a subcontainer; MCP shipped as a
manual stdio server (not a daemon); version 0.1.0:44. INGEST_PACKAGING.md.
- backfill.py: factored embed_and_upsert() shared with sync.
Verified end-to-end on synthetic data + live Sparks/Qwen/Qdrant.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>