Phase 0 foundation: canonical schema, ingest pipeline, CRM MCP server

Workstream A–C substrate for the Ten31 agentic system:
- A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9
- A2: additive/reversible core migration (canonical_entities, entity_links,
  interaction_log, relationship_edges, soft-delete) + ledgered runner
- B1/B3: chunking + deterministic entity resolution (backend/ingest)
- B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks
- C: CRM MCP server (reads, retrieval modes, logged writes) — no outbound tools
- docs: redaction/re-hydration, Gmail enablement runbook
- synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db,
  drop legacy files + start9/0.3.5)

Verified end-to-end on synthetic data + live Sparks (hybrid > dense on entity
queries). Real backfill runs on Ten31 infra; index holds synthetic data only.
Branch snapshot also captures pre-existing working-tree changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keysat
2026-06-05 08:11:28 -05:00
parent 7027efd777
commit c7ce44d963
99 changed files with 10676 additions and 7817 deletions
@@ -0,0 +1,79 @@
"""
Exception taxonomy for Gmail integration.
gmail_client._call() maps HTTP status codes to these exception types. The retry
loop in gmail_client._with_retry() inspects the class hierarchy to decide
whether to back off + retry or fail fast.
"""
class GmailError(Exception):
"""Base class for all Gmail-integration errors."""
def __init__(self, message: str = "", *, status: int = 0, payload: object = None):
super().__init__(message)
self.status = status
self.payload = payload
class AuthError(GmailError):
"""401 / 403 that is not a rate-limit. Requires operator intervention
(bad service account key, revoked OAuth, missing DWD scope). Not retried."""
class RateLimitError(GmailError):
"""429 or 403 with reason in {rateLimitExceeded, userRateLimitExceeded}.
Retried with exponential backoff."""
class TransientError(GmailError):
"""5xx or network error. Retried with exponential backoff."""
class NotFoundError(GmailError):
"""404. For messages this usually means 'deleted in Gmail after we saw it';
for history this is HistoryExpiredError."""
class HistoryExpiredError(NotFoundError):
"""404 on history.list with startHistoryId — Gmail only retains history
for a limited window (~7 days). Triggers date-based backfill fallback."""
class PermanentError(GmailError):
"""400 or other permanent failure. Skip and log; do not retry."""
def classify_http(status: int, payload: object) -> GmailError:
"""Map a Gmail API response to the appropriate exception type.
`payload` is the decoded JSON body if any; used to distinguish rate-limit
403s from pure auth 403s via the `reason` field Google returns.
"""
reason = ""
if isinstance(payload, dict):
try:
errs = payload.get("error", {}).get("errors") or []
if errs:
reason = str(errs[0].get("reason", ""))
except Exception: # pragma: no cover — defensive
pass
if status == 429:
return RateLimitError(f"rate limited: {reason}", status=status, payload=payload)
if status == 403:
if reason in ("rateLimitExceeded", "userRateLimitExceeded", "quotaExceeded"):
return RateLimitError(f"quota: {reason}", status=status, payload=payload)
return AuthError(f"forbidden: {reason}", status=status, payload=payload)
if status == 401:
return AuthError("unauthorized", status=status, payload=payload)
if status == 404:
return NotFoundError("not found", status=status, payload=payload)
if 500 <= status < 600:
return TransientError(f"server error {status}", status=status, payload=payload)
if 400 <= status < 500:
return PermanentError(f"client error {status}: {reason}", status=status, payload=payload)
return GmailError(f"unexpected status {status}", status=status, payload=payload)
RETRYABLE = (RateLimitError, TransientError)
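
The module docstring says gmail_client._with_retry() uses the class hierarchy to choose between backing off and failing fast. That function is not part of this diff, so the following is only a minimal sketch of how RETRYABLE could drive such a loop. The helper name with_retry, its parameters, the full-jitter backoff, and the stand-in exception classes are assumptions for illustration, not the project's implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins so the sketch runs on its own; in the real module these are the
# classes defined in the exception taxonomy.
class GmailError(Exception):
    pass


class RateLimitError(GmailError):
    pass


class TransientError(GmailError):
    pass


class AuthError(GmailError):
    pass


RETRYABLE = (RateLimitError, TransientError)


def with_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn(), retrying only RETRYABLE errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RETRYABLE:
            # Out of attempts: surface the last retryable error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Because the except clause names only RETRYABLE, AuthError, NotFoundError, and PermanentError propagate on the first failure without sleeping. Passing sleep as a parameter is a testing convenience in this sketch: a test can record the delays instead of waiting.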