docs: extract subsystem guides; keep AGENTS.md to whole-repo facts
Move subsystem mechanics (migrations, thesis gate, redaction, ingest, email, packaging) out of AGENTS.md into docs/guides/<topic>.md. Each guide is scoped by its `paths:` frontmatter and symlinked from .claude/rules/ so Claude Code lazy-loads it. AGENTS.md keeps whole-repo facts and universal guardrails plus a one-line index per guide. Also fix the inaccurate ".claude/ is gitignored" note: the directory is tracked.
@@ -0,0 +1,21 @@
---
paths:
- backend/ingest/**
---

# Ingest, retrieval & Spark/Qdrant

Read this before editing the ingest pipeline or retrieval modes.

## Pipeline

- `backend/ingest/` holds the chunk → embed → Qdrant pipeline and the retrieval modes. Modules: `search.py`, `embed.py`, `qdrant_io.py`, `sparse.py`, `entity_resolution.py`.
- Local model calls (bge-m3 embeddings, bge-reranker-v2-m3, `/api/search`) **always go through Spark Control** (`SPARK_CONTROL_URL`), never to a Spark directly. The retrieval/embeddings contract is `docs/EMBEDDINGS.md`; honor it.
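As a rough illustration of the Spark Control rule, the sketch below builds every model endpoint from `SPARK_CONTROL_URL` and fails loudly when the variable is missing instead of falling back to a direct Spark address. The helper name `spark_control_endpoint` and the example host are hypothetical; only `SPARK_CONTROL_URL` and `/api/search` come from this guide.

```python
import os
from typing import Mapping


def spark_control_endpoint(path: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the Spark Control URL for `path` (hypothetical helper).

    Raises instead of guessing a direct Spark address when the
    SPARK_CONTROL_URL variable is unset or empty.
    """
    base = env.get("SPARK_CONTROL_URL", "").strip()
    if not base:
        raise RuntimeError(
            "SPARK_CONTROL_URL is not set; refusing to call a Spark directly"
        )
    # Join without doubling or dropping the slash between base and path.
    return base.rstrip("/") + "/" + path.lstrip("/")


# Example with an assumed host; real deployments supply their own value.
url = spark_control_endpoint(
    "/api/search", {"SPARK_CONTROL_URL": "http://spark-control.local:8080/"}
)
```

Centralizing the lookup in one helper keeps a hard-coded Spark address from slipping into individual call sites.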

## Hard rule

- **Never treat Qdrant (or any derived index) as source of truth.** The CRM / SQLite is canonical and the index is rebuildable from it. Code may drop and rebuild the Qdrant collection; it must never read a fact from Qdrant that isn't recoverable from SQLite.
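The rebuildability requirement can be sketched with a toy setup: an in-memory SQLite table plays the canonical store and a plain dict stands in for the Qdrant collection. The `notes` table, its columns, and `rebuild_index` are illustrative assumptions, not the repo's schema or API; the lowercase step is a placeholder for chunking and embedding.

```python
import sqlite3


def rebuild_index(conn: sqlite3.Connection) -> dict[int, str]:
    """Rebuild the derived index from scratch using only canonical rows."""
    index: dict[int, str] = {}
    for row_id, body in conn.execute("SELECT id, body FROM notes ORDER BY id"):
        index[row_id] = body.lower()  # placeholder for chunk + embed
    return index


# Canonical store (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO notes (id, body) VALUES (?, ?)",
    [(1, "Intro call"), (2, "Follow-up")],
)

index = rebuild_index(conn)
index.clear()                # simulate dropping the collection
index = rebuild_index(conn)  # nothing is lost: every entry derives from SQLite
```

If a rebuild like this would ever lose information, some fact lives only in the index, which is the situation the rule forbids.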

## Entity resolution

The core entity-resolution task is reconciling the two investor models (classic `contacts`/`lp_profiles` vs the `fundraising_*` grid) into canonical IDs. See `backend/entity_*.py` and `docs/crm-overview.md`.
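To make the shape of that reconciliation concrete, here is a deliberately simplified sketch that assigns one canonical ID to records from both models when their normalized names match. Matching on normalized name is an assumption made for illustration; the actual matching logic lives in the files referenced above and may differ. The function names and sample records are hypothetical.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def assign_canonical_ids(classic, grid):
    """Map (source, record_id) pairs to shared canonical IDs.

    `classic` and `grid` are iterables of (record_id, name) tuples,
    standing in for the two investor models.
    """
    canonical = {}  # normalized name -> canonical ID
    mapping = {}    # (source, record_id) -> canonical ID
    for source, records in (("classic", classic), ("grid", grid)):
        for rec_id, name in records:
            key = normalize(name)
            # First sighting of a name allocates the next canonical ID.
            canonical.setdefault(key, len(canonical) + 1)
            mapping[(source, rec_id)] = canonical[key]
    return mapping


mapping = assign_canonical_ids(
    classic=[(10, "Acme Capital")],
    grid=[(7, " acme  capital"), (8, "Birch LP")],
)
```

Here the classic record 10 and the grid record 7 resolve to the same canonical ID, while grid record 8 gets its own.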