f357c23c75
- Fuzzy tier (backend/ingest/fuzzy_resolve.py + llm.py): local Qwen adjudicates the deterministic resolver's flagged name-variant candidates; merges are durable via entity_merges (deterministic re-runs respect them), losers soft-deleted, logged. Idempotent.
- Incremental sync (backend/ingest/sync.py): re-embeds only rows changed since a watermark (ingest_sync_state); first run / --recreate = full. Tested full→0→1.
- Start9 packaging (start9/0.4): Dockerfile bundles ingest+mcp + fastembed/mcp; "Build search index" action runs the init in a subcontainer; MCP shipped as a manual stdio server (not a daemon); version 0.1.0:44. INGEST_PACKAGING.md.
- backfill.py: factored embed_and_upsert() shared with sync.

Verified end-to-end on synthetic data + live Sparks/Qwen/Qdrant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
48 lines
2.3 KiB
TypeScript
import { VersionInfo } from '@start9labs/start-sdk'

// Phase-0 substrate packaging release.
//
// Context:
//   * Ships the Phase-0 ingest pipeline (backend/ingest/) and the CRM MCP
//     server (backend/mcp/) inside the existing CRM container image, alongside
//     the web server. Two runtime deps are added to the image: `fastembed`
//     (client-side BM25 for the sparse retrieval leg) and `mcp` (the MCP
//     Python SDK, used only to run backend/mcp/server.py). The CRM web server
//     itself gains no new dependencies and is unchanged.
//   * Adds a one-shot "Build search index" StartOS action that runs the
//     one-time init on the box where /data/crm.db lives:
//       entity_resolution.py --db /data/crm.db      (canonical ids)
//       backfill.py --db /data/crm.db --recreate    (Qdrant search index)
//     Both steps are idempotent and read-only on the CRM source tables.
//   * docker_entrypoint.sh now exports the Spark Control / Qdrant env
//     (SPARK_CONTROL_URL, SPARK_CONTROL_VERIFY_TLS, QDRANT_URL) with LAN
//     defaults so manual ingest / MCP runs on the box inherit them.
//
// The MCP server is intentionally NOT a daemon in this release: it is a
// stdio server with no port to bind and (in Phase 0) no live agent on the box
// to talk to it, so it is run manually for testing. See
// start9/0.4/INGEST_PACKAGING.md.
//
// No schema changes and no data migration: the SQLite schema is unchanged and
// the live /data volume is left exactly as-is. The new tables the ingest
// pipeline reads/writes are created by the CRM's own migration runner
// (migrations/0001_phase0_foundation.sql), independent of this package change.
export const v_0_1_0_44 = VersionInfo.of({
  version: '0.1.0:44',
  releaseNotes: {
    en_US: [
      'Ships the Phase-0 data substrate inside the CRM image: the ingest',
      'pipeline (entity resolution + Qdrant backfill) and the CRM MCP server,',
      'plus the fastembed and mcp runtime dependencies. Adds a one-time',
      '"Build search index" action that resolves canonical entity ids from',
      'your live CRM and rebuilds the Qdrant search index; both steps are',
      'idempotent and read-only on your CRM source data. The CRM web server',
      'is unchanged and gains no new dependencies. No data migration.',
    ].join(' '),
  },
  migrations: {
    up: async () => {},
    down: async () => {},
  },
})
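
The two-step init that the "Build search index" action runs can be sketched as an ordered command plan. This is only an illustration, not the package's actual action code: `buildIndexSteps`, `formatStep`, and the `Step` type are hypothetical names, and the `python` interpreter and bare script paths are assumptions. Only the script names and the `--db` / `--recreate` flags come from the comments above.

```typescript
// Hypothetical sketch: the init steps as data, in the order they must run.
// Entity resolution comes first so canonical ids exist before the backfill.
type Step = { label: string; argv: string[] }

function buildIndexSteps(dbPath: string): Step[] {
  return [
    {
      label: 'canonical ids',
      // Assumed invocation; the real interpreter and script path may differ.
      argv: ['python', 'entity_resolution.py', '--db', dbPath],
    },
    {
      label: 'Qdrant search index',
      // --recreate forces a full rebuild of the index.
      argv: ['python', 'backfill.py', '--db', dbPath, '--recreate'],
    },
  ]
}

// Render one step as a single line for logging.
function formatStep(step: Step): string {
  return `${step.argv.join(' ')}  (${step.label})`
}

for (const step of buildIndexSteps('/data/crm.db')) {
  console.log(formatStep(step))
}
```

Keeping the plan as plain data makes the ordering explicit and lets whatever runner is used (a subcontainer exec, a shell) stay separate from the list of steps.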