Phase 0 foundation: canonical schema, ingest pipeline, CRM MCP server

Workstream A–C substrate for the Ten31 agentic system:
- A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9
- A2: additive/reversible core migration (canonical_entities, entity_links,
  interaction_log, relationship_edges, soft-delete) + ledgered runner
- B1/B3: chunking + deterministic entity resolution (backend/ingest)
- B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks
- C: CRM MCP server (reads, retrieval modes, logged writes) — no outbound tools
- docs: redaction/re-hydration, Gmail enablement runbook
- synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db,
  drop legacy files + start9/0.3.5)

Verified end-to-end on synthetic data + live Sparks (hybrid retrieval beat
dense-only on entity queries). Real backfill runs on Ten31 infra; the index
holds synthetic data only.
Branch snapshot also captures pre-existing working-tree changes.
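
The message does not say how the dense (bge-m3) and BM25 sparse results are combined in hybrid mode. As a purely illustrative sketch, reciprocal rank fusion (RRF) is one common way to merge two ranked lists; the function name `rrf_fuse`, the constant `k=60`, and the chunk IDs are assumptions for this example, not taken from the repository.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk IDs returned by each retrieval mode.
dense_hits = ["c1", "c2", "c3"]
sparse_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

Chunks that rank well in both lists rise to the top, which may be why an exact entity name matched by BM25 can lift a chunk that dense retrieval alone ranks lower.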

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keysat
2026-06-05 08:11:28 -05:00
parent 7027efd777
commit c7ce44d963
99 changed files with 10676 additions and 7817 deletions
@@ -0,0 +1,21 @@
# Ten31 agentic system — environment template.
# Copy to .env (gitignored) and fill in. Secret values NEVER go in .env.example.
# ── Claude (frontier reasoning; Agent SDK uses an API key, not claude.ai login) ──
ANTHROPIC_API_KEY=
# ── Spark Control gateway (local model services; reads + dense embeds) ──
# HTTPS with the Start9 self-signed cert -> clients must skip TLS verification.
SPARK_CONTROL_URL=https://<spark-control-host>:<port>
SPARK_CONTROL_VERIFY_TLS=false
# ── Qdrant (direct, for ingest: create collection + upsert points) ──
# Plain HTTP on the trusted LAN, no auth currently.
QDRANT_URL=http://<spark2-host>:6333
# ── X (Twitter) API for Scout/Analyst enrichment (NOT a CRM key) ──
X_API_KEY=
# ── CRM (ingest opens the SQLite file directly, read-only) ──
CRM_DB_PATH=./data/crm.db
CRM_DEV_DB_PATH=./data/crm_dev.db
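
A minimal sketch of how a client might consume this template, assuming the variables have already been exported into the process environment. The names `Settings`, `load_settings`, `parse_bool`, and `open_crm_read_only` are hypothetical and not from the repository. The read-only open uses the standard `sqlite3` URI form with `mode=ro`, which is one way to honor the "read-only" note on `CRM_DB_PATH`.

```python
import os
import sqlite3
from dataclasses import dataclass
from pathlib import Path


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:  # hypothetical container, not from the repo
    spark_control_url: str
    spark_control_verify_tls: bool
    qdrant_url: str
    crm_db_path: Path


def load_settings(env=os.environ) -> Settings:
    """Build Settings from the variables named in the template."""
    return Settings(
        spark_control_url=env["SPARK_CONTROL_URL"],
        # Verify TLS unless the variable explicitly disables it.
        spark_control_verify_tls=parse_bool(
            env.get("SPARK_CONTROL_VERIFY_TLS", "true")
        ),
        qdrant_url=env["QDRANT_URL"],
        crm_db_path=Path(env.get("CRM_DB_PATH", "./data/crm.db")),
    )


def open_crm_read_only(path: Path) -> sqlite3.Connection:
    """Open the SQLite file via a URI with mode=ro so writes are rejected."""
    uri = f"{path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With `SPARK_CONTROL_VERIFY_TLS=false`, the resulting flag could be passed to whatever HTTP client talks to the gateway (for example as a `verify=` argument). The sketch defaults to verification when the variable is unset, so skipping TLS checks remains an explicit opt-in.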