Phase 0 complete: fuzzy entity tier, incremental sync, Start9 packaging

- Fuzzy tier (backend/ingest/fuzzy_resolve.py + llm.py): local Qwen adjudicates
  the deterministic resolver's flagged name-variant candidates; merges are
  durable via entity_merges (deterministic re-runs respect them), losers
  soft-deleted, logged. Idempotent.
- Incremental sync (backend/ingest/sync.py): re-embeds only rows changed since a
  watermark (ingest_sync_state); first run / --recreate = full. Tested full→0→1.
- Start9 packaging (start9/0.4): Dockerfile bundles the ingest and mcp code plus
  the fastembed/mcp packages; a "Build search index" action runs the init in a
  subcontainer; MCP ships as a manual stdio server (not a daemon); version
  0.1.0:44. See INGEST_PACKAGING.md.
- backfill.py: factored embed_and_upsert() shared with sync.
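
A minimal sketch of the durable-merge bookkeeping in the fuzzy tier. It assumes an SQLite store and a hypothetical `entities` table with a `deleted_at` soft-delete column. Only the `entity_merges` name comes from this commit; its real columns and the Qwen adjudication call are not shown. The sketch starts after the LLM has judged two candidates to be the same entity:

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables; only the name entity_merges comes from the commit.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS entities (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            deleted_at TEXT             -- NULL = live, set once on soft delete
        );
        CREATE TABLE IF NOT EXISTS entity_merges (
            loser_id  INTEGER PRIMARY KEY,  -- a loser is merged at most once
            winner_id INTEGER NOT NULL,
            reason    TEXT
        );
        """
    )


def record_merge(conn: sqlite3.Connection, winner_id: int, loser_id: int, reason: str) -> None:
    """Persist an adjudicated merge and soft-delete the loser; safe to re-run."""
    with conn:  # both writes commit together or not at all
        # The first recorded verdict for a loser wins; repeats are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO entity_merges (loser_id, winner_id, reason) VALUES (?, ?, ?)",
            (loser_id, winner_id, reason),
        )
        # Stamp deleted_at only once, so a re-run leaves the timestamp unchanged.
        conn.execute(
            "UPDATE entities SET deleted_at = datetime('now') WHERE id = ? AND deleted_at IS NULL",
            (loser_id,),
        )


def canonical_id(conn: sqlite3.Connection, entity_id: int) -> int:
    """Follow recorded merges so a deterministic re-run maps a loser to its winner."""
    seen = set()
    while entity_id not in seen:
        seen.add(entity_id)
        row = conn.execute(
            "SELECT winner_id FROM entity_merges WHERE loser_id = ?", (entity_id,)
        ).fetchone()
        if row is None:
            return entity_id
        entity_id = row[0]
    return entity_id  # cycle in the merge table: stop instead of looping forever
```

Keying `entity_merges` on the loser, using `INSERT OR IGNORE`, and guarding the soft delete with `deleted_at IS NULL` together make re-running the step a no-op in this sketch. A deterministic re-run could call `canonical_id` to map a loser back to its winner instead of resurrecting it.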

Verified end-to-end on synthetic data + live Sparks/Qwen/Qdrant.
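
The watermark scheme in the sync.py bullet can be sketched as follows. This assumes each source row carries a sortable `updated_at` value and that `ingest_sync_state` stores one watermark per source; its real schema is not shown in this commit. The `upsert` parameter stands in for the shared `embed_and_upsert()`, whose actual signature is not given here:

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[int, str, str]  # (id, text, updated_at): assumed row shape


def ensure_state_table(conn: sqlite3.Connection) -> None:
    # Assumed layout: one watermark per source table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingest_sync_state ("
        " source TEXT PRIMARY KEY, watermark TEXT NOT NULL)"
    )


def sync(
    conn: sqlite3.Connection,
    source: str,
    upsert: Callable[[List[Row]], None],
    recreate: bool = False,
) -> int:
    """Re-embed rows changed since the stored watermark; return how many were sent."""
    ensure_state_table(conn)
    row = conn.execute(
        "SELECT watermark FROM ingest_sync_state WHERE source = ?", (source,)
    ).fetchone()
    # No stored watermark (first run) or an explicit recreate means a full pass.
    watermark = None if (recreate or row is None) else row[0]
    # `source` is interpolated as a table name, so it must be a trusted identifier.
    query = f"SELECT id, text, updated_at FROM {source}"
    params: tuple = ()
    if watermark is not None:
        query += " WHERE updated_at > ?"
        params = (watermark,)
    rows = conn.execute(query + " ORDER BY updated_at", params).fetchall()
    if rows:
        upsert(rows)  # embed and write vectors first...
        with conn:    # ...then advance the watermark, so a crash re-processes rather than skips
            conn.execute(
                "INSERT OR REPLACE INTO ingest_sync_state (source, watermark) VALUES (?, ?)",
                (source, rows[-1][2]),  # rows are sorted, so the last one is the newest
            )
    return len(rows)
```

A strict `>` against the last seen `updated_at` can skip a row written later with an identical timestamp. A real implementation may prefer a monotonic sequence column or a small overlap window. Re-embedding an unchanged row is harmless as long as the vector upsert is keyed on the row id.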

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Author: Keysat
Date: 2026-06-05 08:55:12 -05:00
Parent: c7ce44d963
Commit: f357c23c75
16 changed files with 808 additions and 48 deletions
@@ -4,8 +4,9 @@ import { v_0_1_0_40 } from './v0.1.0.40'
 import { v_0_1_0_41 } from './v0.1.0.41'
 import { v_0_1_0_42 } from './v0.1.0.42'
 import { v_0_1_0_43 } from './v0.1.0.43'
+import { v_0_1_0_44 } from './v0.1.0.44'
 
 export const versionGraph = VersionGraph.of({
-  current: v_0_1_0_43,
-  other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42],
+  current: v_0_1_0_44,
+  other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43],
 })