Keysat f357c23c75 Phase 0 complete: fuzzy entity tier, incremental sync, Start9 packaging
- Fuzzy tier (backend/ingest/fuzzy_resolve.py + llm.py): local Qwen adjudicates
  the deterministic resolver's flagged name-variant candidates; merges are
  durable via entity_merges (deterministic re-runs respect them), losers
  soft-deleted, logged. Idempotent.
- Incremental sync (backend/ingest/sync.py): re-embeds only rows changed since a
  watermark (ingest_sync_state); first run / --recreate = full. Tested full→0→1.
- Start9 packaging (start9/0.4): Dockerfile bundles ingest+mcp + fastembed/mcp;
  "Build search index" action runs the init in a subcontainer; MCP shipped as a
  manual stdio server (not a daemon); version 0.1.0:44. INGEST_PACKAGING.md.
- backfill.py: factored embed_and_upsert() shared with sync.

Verified end-to-end on synthetic data + live Sparks/Qwen/Qdrant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 08:55:12 -05:00
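
The incremental sync described in the commit message (re-embed only rows changed since a watermark, with a full pass on first run or `--recreate`) could look roughly like the sketch below. This is an illustration, not the code in backend/ingest/sync.py: the `Row` shape, the `updatedAt` field, and the `planSync` name are all hypothetical, and the real watermark lives in `ingest_sync_state`, whose layout is not shown here.

```typescript
// Hypothetical row shape; the real CRM schema is not shown in the commit.
interface Row {
  id: number
  updatedAt: number // assumed change timestamp (e.g. epoch milliseconds)
}

interface SyncPlan {
  rows: Row[] // rows that would be re-embedded on this run
  nextWatermark: number | null // value to persist for the next run
}

// Pick the rows to re-embed. A null watermark (first run) or recreate=true
// means a full pass; otherwise only rows changed after the watermark.
function planSync(rows: Row[], watermark: number | null, recreate = false): SyncPlan {
  const since = recreate ? null : watermark
  const selected = since === null ? rows : rows.filter((r) => r.updatedAt > since)
  // Advance the watermark to the newest change seen on this run.
  const newest = selected.reduce((max, r) => Math.max(max, r.updatedAt), watermark ?? 0)
  // If nothing was selected, keep the previous watermark unchanged.
  return { rows: selected, nextWatermark: selected.length > 0 ? newest : watermark }
}
```

Run three times in a row (first run, no changes, one row touched), this selects all rows, then none, then one, which is the same full→0→1 sequence the commit message reports testing.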


import { VersionInfo } from '@start9labs/start-sdk'
// Phase-0 substrate packaging release.
//
// Context:
//   * Ships the Phase-0 ingest pipeline (backend/ingest/) and the CRM MCP
//     server (backend/mcp/) inside the existing CRM container image, alongside
//     the web server. Two runtime deps are added to the image: `fastembed`
//     (client-side BM25 for the sparse retrieval leg) and `mcp` (the MCP
//     Python SDK, used only to run backend/mcp/server.py). The CRM web server
//     itself gains no new dependencies and is unchanged.
//   * Adds a one-shot "Build search index" StartOS action that runs the
//     one-time init on the box where /data/crm.db lives:
//       entity_resolution.py --db /data/crm.db       (canonical ids)
//       backfill.py --db /data/crm.db --recreate     (Qdrant search index)
//     Both steps are idempotent and read-only on the CRM source tables.
//   * docker_entrypoint.sh now exports the Spark Control / Qdrant env
//     (SPARK_CONTROL_URL, SPARK_CONTROL_VERIFY_TLS, QDRANT_URL) with LAN
//     defaults so manual ingest / MCP runs on the box inherit them.
//
// The MCP server is intentionally NOT a daemon in this release: it is a
// stdio server with no port to bind and (in Phase 0) no live agent on the box
// to talk to it, so it is run manually for testing. See
// start9/0.4/INGEST_PACKAGING.md.
//
// No schema changes and no data migration: the SQLite schema is unchanged and
// the live /data volume is left exactly as-is. The new tables the ingest
// pipeline reads/writes are created by the CRM's own migration runner
// (migrations/0001_phase0_foundation.sql), independent of this package change.
export const v_0_1_0_44 = VersionInfo.of({
  version: '0.1.0:44',
  releaseNotes: {
    en_US: [
      'Ships the Phase-0 data substrate inside the CRM image: the ingest',
      'pipeline (entity resolution + Qdrant backfill) and the CRM MCP server,',
      'plus the fastembed and mcp runtime dependencies. Adds a one-time',
      '"Build search index" action that resolves canonical entity ids from',
      'your live CRM and rebuilds the Qdrant search index — both steps are',
      'idempotent and read-only on your CRM source data. The CRM web server',
      'is unchanged and gains no new dependencies. No data migration.',
    ].join(' '),
  },
  migrations: {
    up: async () => {},
    down: async () => {},
  },
})
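
The header comment lists the two commands the "Build search index" action runs. A minimal sketch of that ordering is shown below, assuming the steps run sequentially and the second is skipped if the first fails. The names `InitStep`, `buildInitSteps`, and `runInit`, the `python3` interpreter, and the `scriptDir` parameter are hypothetical, and the command runner is injected because the real action's subcontainer wiring is not shown in this file.

```typescript
interface InitStep {
  label: string
  argv: string[]
}

// Build the two init commands in the order the release comment lists them.
// `python` and `scriptDir` are assumptions; only the script names and flags
// come from the comment above.
function buildInitSteps(dbPath: string, scriptDir: string, python = 'python3'): InitStep[] {
  return [
    { label: 'canonical ids', argv: [python, `${scriptDir}/entity_resolution.py`, '--db', dbPath] },
    {
      label: 'Qdrant search index',
      argv: [python, `${scriptDir}/backfill.py`, '--db', dbPath, '--recreate'],
    },
  ]
}

// Run the steps in order and stop at the first non-zero exit code.
// `run` is a caller-supplied function that executes argv and returns the exit code.
async function runInit(
  steps: InitStep[],
  run: (argv: string[]) => Promise<number>,
): Promise<string[]> {
  const done: string[] = []
  for (const step of steps) {
    const code = await run(step.argv)
    if (code !== 0) throw new Error(`${step.label} failed with exit code ${code}`)
    done.push(step.label)
  }
  return done
}
```

Injecting `run` keeps the sketch independent of any particular process or container API, so the ordering can be exercised with a stub.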