ten31-database/backend/email_integration
Keysat c7b74a2704 Email search/query + windowed digest preview (v0.1.0:83)
Communications tab (search/query roadmap items 1 & 2):
- Fix the investor dropdown: the facet only listed grid investors, so it
  came back empty whenever email matched a classic contact or org domain
  (no grid id — the common case). It now mirrors the email list, resolving
  each link to a typed identity (fund:/org:/contact:/addr:) with precedence
  grid -> org -> contact -> address; investor_id accepts the typed key
  (bare id = fund: for back-compat) and an unknown prefix matches nothing.
- Add a date-range filter and a click-to-expand full-body view
  (GET /api/email/detail, admin, soft-delete-gated; body_text only, never
  raw remote HTML).
- Add a "Search content" mode: GET /api/email/search wraps the ingest
  hybrid_search over the Qdrant email index (doc_type=email), hydrated and
  soft-delete-filtered against SQLite (canonical), 503 if Spark/Qdrant down.

Daily digest:
- Settings -> Admin builds a digest over a chosen window (last 24h or since
  a date) as an in-app preview before sending (POST /api/admin/digest/preview),
  so the local-Spark summarizer can be verified on demand even on a quiet day.
  Manual send uses the same window; neither advances the daily cursor, so a
  preview never suppresses the scheduled digest.

Code-only, migrations no-op. 22/22 backend tests, render-smoke pass.
2026-06-16 20:46:15 -05:00

email_integration — Gmail capture for the Venture CRM

Scaffolded Phase 1 of the Gmail integration described in GMAIL_INTEGRATION_ARCHITECTURE.md (repo root). Everything in this module is isolated from server.py until you wire it in explicitly.

Contents

File                              Purpose
config.py                         Env-var loader; exposes CONFIG singleton.
errors.py                         Exception taxonomy used by the retry loop.
crypto.py                         AES-GCM wrapper for OAuth refresh-token encryption (only used in OAuth mode).
credentials.py                    CredentialProvider protocol + DWDCredentialProvider / OAuthCredentialProvider.
gmail_client.py                   Gmail API HTTP wrapper (rate limit, retry, pagination).
db.py                             All SQL touching emails_* tables. Migrations live under migrations/.
parser.py                         Gmail payload → canonical dict (headers, body, attachments).
matcher.py                        Investor address index + match logic.
threads.py                        Thread resolution using Gmail threadId + RFC References.
attachments.py                    Stub rows + on-disk storage + download worker.
sync.py                           Orchestrator for backfill + incremental sync of one account.
scheduler.py                      Background thread that runs sync.sync_all on an interval.
routes.py                         HTTP handlers under /api/email/* compatible with CRMHandler.
migrations/0001_email_tables.sql  Table DDL.
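
To make the relationship between errors.py and the retry loop concrete, here is a minimal sketch of a two-class taxonomy driving exponential backoff. The class names, the call_with_retry helper, and the jitter range are illustrative assumptions, not the module's actual API; only the default of 5 attempts mirrors the CRM_GMAIL_RETRY_MAX sample value.

```python
import random
import time


class EmailIntegrationError(Exception):
    """Base class for this sketch (hypothetical name)."""


class TransientError(EmailIntegrationError):
    """Worth retrying: rate limits, 5xx responses, timeouts."""


class PermanentError(EmailIntegrationError):
    """Not worth retrying: bad credentials, 404, malformed request."""


def call_with_retry(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); retry only TransientError, with exponential backoff + jitter.

    Any other exception (including PermanentError) propagates immediately.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.25))  # up to 25% jitter
```

Keeping the retry decision in the exception type means the HTTP wrapper only has to map a status code to a class once; the loop itself stays unaware of Gmail specifics.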

Wiring it in

All changes are in backend/server.py, all guarded by an env flag. Each is independently revertible. None run unless CRM_GMAIL_INTEGRATION_ENABLED=true.

Patch 1 — migrations (append to init_db() after all existing cursor.executescript(...) calls, before conn.commit()):

try:
    from email_integration.db import apply_migrations
    apply_migrations(cursor)
except ImportError:
    pass

Patch 2 — scheduler (in main(), after start_backup_scheduler()):

if os.environ.get("CRM_GMAIL_INTEGRATION_ENABLED", "").lower() in ("1", "true", "yes", "on"):
    from email_integration.scheduler import start_sync_scheduler
    start_sync_scheduler()

Patch 3 — routes (add near the top of CRMHandler.do_GET and CRMHandler.do_POST, after auth/rate-limit pre-checks, before API routing):

try:
    from email_integration.routes import try_handle
    if try_handle(self):
        return
except ImportError:
    pass
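
Patch 3 depends on a simple contract: try_handle returns a truthy value when it has already written a response and a falsy one when server.py should continue with its own routing. The sketch below illustrates that contract under stated assumptions. The extra environ parameter, the _send_json helper, and the JSON error body are hypothetical; the /api/email/ prefix, the accepted flag values, and the 503 when disabled come from this README.

```python
import json
import os

ENABLED_VALUES = ("1", "true", "yes", "on")


def integration_enabled(environ=os.environ):
    """Same truthiness test as the scheduler guard in Patch 2."""
    return environ.get("CRM_GMAIL_INTEGRATION_ENABLED", "").lower() in ENABLED_VALUES


def _send_json(handler, status, payload):
    """Write a small JSON response through a BaseHTTPRequestHandler-like object."""
    body = json.dumps(payload).encode("utf-8")
    handler.send_response(status)
    handler.send_header("Content-Type", "application/json")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)


def try_handle(handler, environ=os.environ):
    """Return True if the request was answered here, False to fall through."""
    if not handler.path.startswith("/api/email/"):
        return False  # not ours: let server.py route it
    if not integration_enabled(environ):
        _send_json(handler, 503, {"error": "gmail integration disabled"})
        return True
    # Real dispatch to status/accounts/threads handlers would go here.
    # This sketch declines anything it does not recognise.
    return False
```

Because non-email paths return False before the flag is even read, the hook adds almost no overhead to unrelated requests.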

Environment variables

# Master on/off. Default off; scheduler won't start, routes return 503.
CRM_GMAIL_INTEGRATION_ENABLED=true

# Auth method: "dwd" (default, recommended) or "oauth"
CRM_GMAIL_AUTH_METHOD=dwd

# DWD mode
CRM_GMAIL_SA_KEY_PATH=/path/to/CRM/data/secrets/gmail-service-account.json
CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz

# OAuth mode (fallback; not required for DWD)
CRM_GMAIL_OAUTH_CLIENT_ID=...
CRM_GMAIL_OAUTH_CLIENT_SECRET=...
CRM_GMAIL_OAUTH_REDIRECT_URI=https://crm.ten31.xyz/api/email/oauth/callback
CRM_GMAIL_SECRET_KEY=<base64-32-random-bytes>   # for encrypting refresh tokens

# Sync
CRM_GMAIL_SYNC_INTERVAL_MIN=180          # default 3h
CRM_GMAIL_BACKFILL_PAGE_SIZE=500
CRM_GMAIL_MAX_ATTACHMENT_MB=50
CRM_GMAIL_ATTACH_CONCURRENCY=4
CRM_GMAIL_RATE_UNITS_SEC=150             # per account, leaves 40% headroom
CRM_GMAIL_RETRY_MAX=5
CRM_GMAIL_HISTORY_STALE_DAYS=5
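
As an illustration of how config.py might turn these variables into a typed object, here is a minimal loader sketch. The GmailConfig and load_config names and the field names are hypothetical, and the numeric fallbacks simply reuse the sample values above; only the dwd default and the 3h interval default are stated in this README.

```python
import os
from dataclasses import dataclass

_TRUTHY = ("1", "true", "yes", "on")


@dataclass(frozen=True)
class GmailConfig:
    enabled: bool
    auth_method: str
    sync_interval_min: int
    backfill_page_size: int
    max_attachment_mb: int
    attach_concurrency: int
    rate_units_sec: int
    retry_max: int
    history_stale_days: int


def load_config(environ=None):
    """Build a GmailConfig from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ

    def _int(name, default):
        raw = env.get(name, "").strip()
        return int(raw) if raw else default

    method = env.get("CRM_GMAIL_AUTH_METHOD", "dwd").strip().lower()
    if method not in ("dwd", "oauth"):
        raise ValueError(f"CRM_GMAIL_AUTH_METHOD must be 'dwd' or 'oauth', got {method!r}")

    return GmailConfig(
        enabled=env.get("CRM_GMAIL_INTEGRATION_ENABLED", "").lower() in _TRUTHY,
        auth_method=method,
        sync_interval_min=_int("CRM_GMAIL_SYNC_INTERVAL_MIN", 180),
        backfill_page_size=_int("CRM_GMAIL_BACKFILL_PAGE_SIZE", 500),
        max_attachment_mb=_int("CRM_GMAIL_MAX_ATTACHMENT_MB", 50),
        attach_concurrency=_int("CRM_GMAIL_ATTACH_CONCURRENCY", 4),
        rate_units_sec=_int("CRM_GMAIL_RATE_UNITS_SEC", 150),
        retry_max=_int("CRM_GMAIL_RETRY_MAX", 5),
        history_stale_days=_int("CRM_GMAIL_HISTORY_STALE_DAYS", 5),
    )
```

Passing a plain dict instead of os.environ keeps the loader testable, and failing fast on an unknown auth method surfaces typos at startup rather than at first sync.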

Google Cloud / Workspace setup (DWD)

See GMAIL_INTEGRATION_ARCHITECTURE.md §3 for the full runbook. Short form:

  1. Create GCP project, enable Gmail API.
  2. Create service account, download JSON key, enable domain-wide delegation.
  3. In Google Admin console → Security → API controls → Manage domain-wide delegation, authorize the service account's client ID with scope https://www.googleapis.com/auth/gmail.readonly.
  4. Copy the JSON key to data/secrets/gmail-service-account.json, chmod 600.
  5. Set env vars in .env.beta, restart CRM.
  6. As admin, POST /api/email/accounts/enroll-all to create email_accounts rows for every active user whose email ends in the Workspace domain.
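
For orientation, the steps above exist so that the service account can mint a JWT that impersonates a Workspace user. The sketch below builds only the claim set of that JWT, following Google's documented service-account flow (iss, sub, scope, aud, iat, exp, with a lifetime of at most one hour). The function name is hypothetical, and RS256 signing with the key's private_key plus the token exchange are deliberately left out; this is not the module's actual DWDCredentialProvider.

```python
import time

GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly"
DEFAULT_TOKEN_URI = "https://oauth2.googleapis.com/token"


def build_dwd_claims(sa_key, user_email, now=None, lifetime_sec=3600):
    """Return the JWT claim set for impersonating `user_email`.

    `sa_key` is the parsed service-account JSON key (a dict). The claims still
    need to be signed with RS256 and exchanged at the token endpoint using the
    jwt-bearer grant type before they yield an access token.
    """
    if not 0 < lifetime_sec <= 3600:
        raise ValueError("lifetime_sec must be between 1 and 3600 seconds")
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": sa_key["client_email"],                   # the service account
        "sub": user_email,                               # the mailbox to impersonate
        "scope": GMAIL_READONLY_SCOPE,                   # must match the scope authorized in step 3
        "aud": sa_key.get("token_uri", DEFAULT_TOKEN_URI),
        "iat": issued_at,
        "exp": issued_at + lifetime_sec,
    }
```

If the scope here differs from the one authorized in the Admin console, the token exchange is rejected, which is a common cause of a failed first sync.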

Adding the crypto dependency

Both auth modes need the cryptography package: OAuth mode uses it for AES-GCM encryption of refresh tokens, and DWD mode uses it for the RSA signing of the JWT bearer token. If you enable the integration in either mode, append this line to backend/requirements.txt:

cryptography==42.0.5

Rollback

To disable instantly: set CRM_GMAIL_INTEGRATION_ENABLED=false and restart. The scheduler won't start, routes return 503, DB tables remain (unused).

To remove completely: drop the env var, delete data/email_attachments/, and drop all emails_* and email_* tables. The migration is idempotent and create-only, so removal needs a separate drop script, which Phase 1 does not provide.
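
Since Phase 1 ships no drop script, the following is a rough sketch of what one could look like against a SQLite database. The function names are hypothetical, and the name filter (emails, emails_*, email_*) is an assumption based on the table names mentioned in this README; run it in dry-run mode and review the list before dropping anything, and only against a backed-up database.

```python
import sqlite3


def email_table_names(conn):
    """List tables that look like they belong to the email integration."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return sorted(
        name
        for (name,) in rows
        if name == "emails" or name.startswith(("emails_", "email_"))
    )


def drop_email_tables(conn, dry_run=True):
    """Return the matching table names; drop them only when dry_run is False.

    Note: if PRAGMA foreign_keys is ON for this connection, drop order can
    matter; this sketch assumes SQLite's default (enforcement off).
    """
    names = email_table_names(conn)
    if not dry_run:
        for name in names:
            quoted = name.replace('"', '""')  # escape for a quoted identifier
            conn.execute(f'DROP TABLE IF EXISTS "{quoted}"')
        conn.commit()
    return names
```

Defaulting to dry_run=True makes the destructive path an explicit opt-in, which suits a step that cannot be undone without a backup.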

Local development

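Importing the module performs no network I/O as long as the scheduler is not started, so pure functions such as the parser can be exercised offline against a saved Gmail payload (fixture.json below is a placeholder file name):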

python3 -c "from email_integration.parser import parse; \
    import json; \
    print(parse(json.load(open('fixture.json'))))"

Testing checklist (before enabling in production)

  • Enable CRM_GMAIL_INTEGRATION_ENABLED=true on a staging copy of the DB only.
  • Verify migrations applied: emails, email_accounts, etc. present.
  • Enroll one account (yours) via /api/email/accounts/enroll.
  • Trigger POST /api/email/sync/run-now.
  • Check email_sync_runs for status='ok'.
  • Spot-check emails rows against Gmail.
  • Verify an attachment downloaded correctly (hash and size).
  • Let the scheduler run for 24 hours; monitor /api/email/status.
  • Enroll remaining 4 teammates.
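
The attachment check in the list above (hash and size) can be scripted. This is a minimal sketch assuming the expected SHA-256 hex digest and byte size are read from the attachment's DB row; the verify_attachment name and its parameters are illustrative and not part of the module.

```python
import hashlib
import os


def verify_attachment(path, expected_sha256, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` has the expected size and SHA-256."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first: avoids hashing a truncated file
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large attachments are not loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Comparing the size before hashing catches truncated downloads quickly, while the digest comparison catches corruption that leaves the length unchanged.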

What's scaffolded vs. TODO

Scaffolded and complete:

  • Schema (migration 0001)
  • Config and env parsing
  • Error taxonomy + retry classifier
  • AES-GCM crypto helpers
  • DWD JWT minting + access token caching
  • OAuth refresh + consent flow endpoints
  • Gmail client (list/get/history/attachments/profile) with rate limit + retry
  • Full DB data-access layer
  • MIME parser including RFC 2047 subjects and HTML→text fallback
  • Investor matcher with exact + domain strategies
  • Thread resolution (Gmail threadId + RFC References cross-account)
  • Attachment storage with SHA-256 dedup
  • Sync orchestrator (backfill + incremental with history-expired fallback)
  • Scheduler with manual-trigger hook
  • HTTP routes (status, accounts, threads, enroll, run-now, rematch, oauth)
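
To illustrate the "exact + domain" strategies listed for the investor matcher, here is a small sketch of an index that tries a full-address match first and falls back to the address's domain. The InvestorIndex class, its method names, and the returned (investor_id, strategy) tuple are assumptions made for this example, not the interface of matcher.py.

```python
class InvestorIndex:
    """Maps email addresses to investor ids: exact match first, then domain."""

    def __init__(self):
        self._by_address = {}
        self._by_domain = {}

    def add_address(self, address, investor_id):
        self._by_address[address.strip().lower()] = investor_id

    def add_domain(self, domain, investor_id):
        self._by_domain[domain.strip().lower().lstrip("@")] = investor_id

    def match(self, address):
        """Return (investor_id, strategy), or (None, None) when nothing matches."""
        addr = address.strip().lower()
        if addr in self._by_address:
            return self._by_address[addr], "exact"
        _, sep, domain = addr.rpartition("@")
        if sep and domain in self._by_domain:
            return self._by_domain[domain], "domain"
        return None, None


index = InvestorIndex()
index.add_address("pat@example-fund.com", 7)
index.add_domain("example-capital.com", 12)
print(index.match("Pat@Example-Fund.com"))
print(index.match("anyone@example-capital.com"))
print(index.match("stranger@example.org"))
```

Returning the strategy alongside the id lets a caller treat domain matches as lower confidence than exact ones. A domain index like this should leave out shared free-mail domains, otherwise unrelated senders would match the same investor.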

TODO before production (see architecture doc §15):

  • Multipart batch metadata fetch in gmail_client.batch_get_metadata (currently serial fallback).
  • Unit tests (fixtures for parser, matcher, threads; integration tests with responses-style HTTP mock).
  • Frontend UI: a thread list + detail pane in frontend/index.html.
  • Sandboxed HTML rendering for email bodies (out of scope here).