c7ce44d963
Workstream A–C substrate for the Ten31 agentic system:

- A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9
- A2: additive/reversible core migration (canonical_entities, entity_links, interaction_log, relationship_edges, soft-delete) + ledgered runner
- B1/B3: chunking + deterministic entity resolution (backend/ingest)
- B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks
- C: CRM MCP server (reads, retrieval modes, logged writes); no outbound tools
- docs: redaction/re-hydration, Gmail enablement runbook
- synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db, drop legacy files + start9/0.3.5)

Verified end-to-end on synthetic data + live Sparks (hybrid > dense on entity queries). Real backfill runs on Ten31 infra; index holds synthetic data only. Branch snapshot also captures pre-existing working-tree changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
18 lines
656 B
Python
"""Dense embeddings via Spark Control /v1/embeddings (BAAI/bge-m3, 1024-d)."""
import config
import http_util


def dense_embed(texts, batch=32):
    """Embed texts in batches; return one vector per input, in input order."""
    out = []
    for i in range(0, len(texts), batch):
        group = texts[i:i + batch]
        status, data = http_util.request(
            "POST", f"{config.SPARK_CONTROL_URL}/v1/embeddings",
            {"input": group, "model": config.EMBED_MODEL}, verify=config.SPARK_VERIFY_TLS)
        if status != 200:
            raise RuntimeError(f"/v1/embeddings -> {status}: {data}")
        # Sort by the response's "index" field so vectors line up with `group`.
        rows = sorted(data["data"], key=lambda d: d["index"])
        out.extend(r["embedding"] for r in rows)
    return out
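The batching and re-ordering in `dense_embed` can be exercised without a live Spark Control endpoint. The following is a minimal sketch, not part of the repository: `fake_request` and `dense_embed_sketch` are hypothetical names, the URL and model string are placeholders, and the response shape (a `data` list of items carrying `index` and `embedding`) is assumed from what `dense_embed` reads. The fake transport returns items in reverse order on purpose, to show why the sort on `index` matters.

```python
def fake_request(method, url, body, verify=True):
    """Hypothetical stand-in for http_util.request; returns (status, payload)."""
    # One toy 1-d "embedding" per input: the text length as a float.
    items = [{"index": i, "embedding": [float(len(t))]}
             for i, t in enumerate(body["input"])]
    # Reverse the items to simulate a server that does not preserve order.
    return 200, {"data": list(reversed(items))}


def dense_embed_sketch(texts, request, batch=32):
    """Same batching and index-sort logic as dense_embed, with the transport injected."""
    out = []
    for i in range(0, len(texts), batch):
        group = texts[i:i + batch]
        status, data = request(
            "POST", "http://example.invalid/v1/embeddings",  # placeholder URL
            {"input": group, "model": "BAAI/bge-m3"})
        if status != 200:
            raise RuntimeError(f"/v1/embeddings -> {status}: {data}")
        # Restore input order within the batch before appending.
        rows = sorted(data["data"], key=lambda d: d["index"])
        out.extend(r["embedding"] for r in rows)
    return out


vectors = dense_embed_sketch(["a", "bb", "ccc", "dddd", "eeeee"], fake_request, batch=2)
```

Injecting the transport as a parameter is only a convenience for this sketch; the real function calls `http_util.request` directly.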