Phase 0 foundation: canonical schema, ingest pipeline, CRM MCP server

Workstream A–C substrate for the Ten31 agentic system: - A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9 - A2: additive/reversible core migration (canonical_entities, entity_links, interaction_log, relationship_edges, soft-delete) + ledgered runner - B1/B3: chunking + deterministic entity resolution (backend/ingest) - B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks - C: CRM MCP server (reads, retrieval modes, logged writes) — no outbound tools - docs: redaction/re-hydration, Gmail enablement runbook - synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db, drop legacy files + start9/0.3.5) Verified end-to-end on synthetic data + live Sparks (hybrid > dense on entity queries). Real backfill runs on Ten31 infra; index holds synthetic data only. Branch snapshot also captures pre-existing working-tree changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 08:11:28 -05:00
parent 7027efd777
commit c7ce44d963
99 changed files with 10676 additions and 7817 deletions
@@ -0,0 +1,175 @@
+# `email_integration` — Gmail capture for the Venture CRM
+
+Scaffolded Phase 1 of the Gmail integration described in
+`GMAIL_INTEGRATION_ARCHITECTURE.md` (repo root). Everything in this module is
+isolated from `server.py` until you wire it in explicitly.
+
+## Contents
+
+| File | Purpose |
+|------|---------|
+| `config.py` | Env-var loader; exposes `CONFIG` singleton. |
+| `errors.py` | Exception taxonomy used by the retry loop. |
+| `crypto.py` | AES-GCM wrapper for OAuth refresh-token encryption (only used in OAuth mode). |
+| `credentials.py` | `CredentialProvider` protocol + `DWDCredentialProvider` / `OAuthCredentialProvider`. |
+| `gmail_client.py` | Gmail API HTTP wrapper (rate limit, retry, pagination). |
+| `db.py` | All SQL touching `emails_*` tables. Migrations live under `migrations/`. |
+| `parser.py` | Gmail payload → canonical dict (headers, body, attachments). |
+| `matcher.py` | Investor address index + match logic. |
+| `threads.py` | Thread resolution using Gmail threadId + RFC References. |
+| `attachments.py` | Stub rows + on-disk storage + download worker. |
+| `sync.py` | Orchestrator for backfill + incremental sync of one account. |
+| `scheduler.py` | Background thread that runs `sync.sync_all` on an interval. |
+| `routes.py` | HTTP handlers under `/api/email/*` compatible with `CRMHandler`. |
+| `migrations/0001_email_tables.sql` | Table DDL. |
+
+## Wiring it in
+
+All changes are in `backend/server.py`, all guarded by an env flag. Each is
+independently revertible. None run unless `CRM_GMAIL_INTEGRATION_ENABLED=true`.
+
+**Patch 1 — migrations** (append to `init_db()` after all existing
+`cursor.executescript(...)` calls, before `conn.commit()`):
+
+```python
+try:
+    from email_integration.db import apply_migrations
+    apply_migrations(cursor)
+except ImportError:
+    pass
+```
+
+**Patch 2 — scheduler** (in `main()`, after `start_backup_scheduler()`):
+
+```python
+if os.environ.get("CRM_GMAIL_INTEGRATION_ENABLED", "").lower() in ("1", "true", "yes", "on"):
+    from email_integration.scheduler import start_sync_scheduler
+    start_sync_scheduler()
+```
+
+**Patch 3 — routes** (add near the top of `CRMHandler.do_GET` and
+`CRMHandler.do_POST`, after auth/rate-limit pre-checks, before API routing):
+
+```python
+try:
+    from email_integration.routes import try_handle
+    if try_handle(self):
+        return
+except ImportError:
+    pass
+```
+
+## Environment variables
+
+```bash
+# Master on/off. Default off; scheduler won't start, routes return 503.
+CRM_GMAIL_INTEGRATION_ENABLED=true
+
+# Auth method: "dwd" (default, recommended) or "oauth"
+CRM_GMAIL_AUTH_METHOD=dwd
+
+# DWD mode
+CRM_GMAIL_SA_KEY_PATH=/path/to/CRM/data/secrets/gmail-service-account.json
+CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz
+
+# OAuth mode (fallback; not required for DWD)
+CRM_GMAIL_OAUTH_CLIENT_ID=...
+CRM_GMAIL_OAUTH_CLIENT_SECRET=...
+CRM_GMAIL_OAUTH_REDIRECT_URI=https://crm.ten31.xyz/api/email/oauth/callback
+CRM_GMAIL_SECRET_KEY=<base64-32-random-bytes>   # for encrypting refresh tokens
+
+# Sync
+CRM_GMAIL_SYNC_INTERVAL_MIN=180          # default 3h
+CRM_GMAIL_BACKFILL_PAGE_SIZE=500
+CRM_GMAIL_MAX_ATTACHMENT_MB=50
+CRM_GMAIL_ATTACH_CONCURRENCY=4
+CRM_GMAIL_RATE_UNITS_SEC=150             # per account, leaves 40% headroom
+CRM_GMAIL_RETRY_MAX=5
+CRM_GMAIL_HISTORY_STALE_DAYS=5
+```
+
+## Google Cloud / Workspace setup (DWD)
+
+See `GMAIL_INTEGRATION_ARCHITECTURE.md` §3 for the full runbook. Short form:
+
+1. Create GCP project, enable Gmail API.
+2. Create service account, download JSON key, enable domain-wide delegation.
+3. In Google Admin console → Security → API controls → Manage domain-wide
+   delegation, authorize the service account's client ID with scope
+   `https://www.googleapis.com/auth/gmail.readonly`.
+4. Copy the JSON key to `data/secrets/gmail-service-account.json`, `chmod 600`.
+5. Set env vars in `.env.beta`, restart CRM.
+6. As admin, POST `/api/email/accounts/enroll-all` to create `email_accounts`
+   rows for every active user whose email ends in the Workspace domain.
+
+## Adding the crypto dependency (only for OAuth mode)
+
+If you use OAuth fallback you need `cryptography`:
+
+```
+cryptography==42.0.5
+```
+
+Append to `backend/requirements.txt`. DWD mode also uses `cryptography` for
+the RSA signing of the JWT bearer token — so if you enable the integration in
+either mode, add the dep.
+
+## Rollback
+
+To disable instantly: set `CRM_GMAIL_INTEGRATION_ENABLED=false` and restart.
+The scheduler won't start, routes return 503, DB tables remain (unused).
+
+To remove completely: drop the env var, delete `data/email_attachments/`,
+drop all `emails_*` tables and `email_*` tables (migration is idempotent
+create-only; a separate drop script would be required — not provided in
+Phase 1 since you said you're not rushing).
+
+## Local development
+
+The module has zero network dependencies when imported without the scheduler
+starting. You can:
+
+```python
+python3 -c "from email_integration.parser import parse; \
+    import json; \
+    print(parse(json.load(open('fixture.json'))))"
+```
+
+## Testing checklist (before enabling in production)
+
+- [ ] Enable `CRM_GMAIL_INTEGRATION_ENABLED=true` on a staging copy of the DB only.
+- [ ] Verify migrations applied: `emails`, `email_accounts`, etc. present.
+- [ ] Enroll one account (yours) via `/api/email/accounts/enroll`.
+- [ ] Trigger `POST /api/email/sync/run-now`.
+- [ ] Check `email_sync_runs` for `status='ok'`.
+- [ ] Spot-check `emails` rows against Gmail.
+- [ ] Verify an attachment downloaded correctly (hash and size).
+- [ ] Let the scheduler run for 24 hours; monitor `/api/email/status`.
+- [ ] Enroll remaining 4 teammates.
+
+## What's scaffolded vs. TODO
+
+**Scaffolded and complete:**
+- Schema (migration 0001)
+- Config and env parsing
+- Error taxonomy + retry classifier
+- AES-GCM crypto helpers
+- DWD JWT minting + access token caching
+- OAuth refresh + consent flow endpoints
+- Gmail client (list/get/history/attachments/profile) with rate limit + retry
+- Full DB data-access layer
+- MIME parser including RFC 2047 subjects and HTML→text fallback
+- Investor matcher with exact + domain strategies
+- Thread resolution (Gmail threadId + RFC References cross-account)
+- Attachment storage with SHA-256 dedup
+- Sync orchestrator (backfill + incremental with history-expired fallback)
+- Scheduler with manual-trigger hook
+- HTTP routes (status, accounts, threads, enroll, run-now, rematch, oauth)
+
+**TODO before production (see architecture doc §15):**
+- Multipart batch metadata fetch in `gmail_client.batch_get_metadata`
+  (currently serial fallback).
+- Unit tests (fixtures for parser, matcher, threads; integration tests with
+  responses-style HTTP mock).
+- Frontend UI: a thread list + detail pane in `frontend/index.html`.
+- Sandboxed HTML rendering for email bodies (out of scope here).
@@ -0,0 +1,15 @@
+"""
+Gmail Integration for Venture CRM.
+
+Phase 1 scope: OAuth2/DWD authentication, incremental Gmail sync, MIME parsing,
+investor matching, threading, attachment storage. All logic isolated to this
+module; server.py integration is a 3-line patch guarded by
+CRM_GMAIL_INTEGRATION_ENABLED.
+
+See GMAIL_INTEGRATION_ARCHITECTURE.md at the repo root for full design.
+"""
+
+from . import config  # noqa: F401
+from . import errors  # noqa: F401
+
+__all__ = ["config", "errors"]
@@ -0,0 +1,234 @@
+"""
+Attachment download + on-disk storage.
+
+Two usage patterns:
+
+  1. During message parsing we call `register_stubs(conn, email_id, parsed)`
+     which writes pending rows to email_attachments.
+
+  2. A separate worker (kicked off by sync after each account completes)
+     calls `drain_pending()` which fetches attachment bytes from Gmail and
+     writes them to disk under CONFIG.attachments_dir.
+
+Files are named: <CRM_DATA_DIR>/email_attachments/<email_id[:2]>/<email_id>/<attachment_id>-<sanitized_filename>
+
+Sanitization prevents path traversal and keeps cross-platform-safe names.
+"""
+
+import base64
+import hashlib
+import os
+import re
+import sqlite3
+from typing import Iterable, Optional
+
+from . import config as _cfg
+from . import db as _db
+from . import errors as _errors
+from . import gmail_client as _gmail
+
+
+_MAX_FILENAME_LEN = 200
+_BAD_FILENAME_CHARS = re.compile(r'[/\\\x00-\x1f\x7f:*?"<>|]+')
+
+
+def _sanitize_filename(name: str) -> str:
+    if not name:
+        return "unnamed.bin"
+    # strip path components first
+    name = os.path.basename(name.replace("\\", "/"))
+    name = _BAD_FILENAME_CHARS.sub("_", name).strip(" .")
+    if not name:
+        name = "unnamed.bin"
+    if len(name) > _MAX_FILENAME_LEN:
+        stem, dot, ext = name.rpartition(".")
+        if dot:
+            name = stem[: _MAX_FILENAME_LEN - len(ext) - 1] + "." + ext
+        else:
+            name = name[:_MAX_FILENAME_LEN]
+    return name
+
+
+def _storage_path_for(email_id: str, attachment_id: str, sanitized_filename: str) -> str:
+    root = _cfg.CONFIG.attachments_dir
+    bucket = email_id[:2] or "_0"
+    dir_ = os.path.join(root, bucket, email_id)
+    os.makedirs(dir_, exist_ok=True)
+    return os.path.join(dir_, f"{attachment_id}-{sanitized_filename}")
+
+
+# ---------------------------------------------------------------------------- phase 1: register stubs
+
+def register_stubs(conn: sqlite3.Connection, *, email_id: str,
+                   parsed_attachments: Iterable[dict]) -> list[str]:
+    """Write pending attachment rows from parsed message data.
+
+    Also handles tiny inline attachments whose bytes arrived with the message
+    body (body.data present, no separate attachmentId) by writing them
+    directly and marking as downloaded.
+
+    Returns list of attachment ids created.
+    """
+    max_bytes = _cfg.CONFIG.max_attachment_mb * 1024 * 1024
+    ids = []
+
+    for att in parsed_attachments:
+        filename = att.get("filename") or "unnamed.bin"
+        sanitized = _sanitize_filename(filename)
+        gmail_att_id = att.get("gmail_attachment_id") or ""
+        mime = att.get("mime_type")
+        size = att.get("size")
+
+        # Determine storage path (we write the path whether or not the download
+        # succeeded; missing files surface via download_status).
+        att_row_id = _db.insert_attachment_stub(
+            conn,
+            email_id=email_id,
+            gmail_attachment_id=gmail_att_id,
+            filename=filename,
+            sanitized_filename=sanitized,
+            mime_type=mime,
+            size_bytes=size,
+            storage_path=_storage_path_for(email_id, gmail_att_id or att_row_id_fallback(), sanitized),
+        )
+        ids.append(att_row_id)
+
+        # Oversize guard.
+        if isinstance(size, int) and size > max_bytes:
+            conn.execute(
+                "UPDATE email_attachments SET download_status = 'skipped', "
+                "download_error = ? WHERE id = ?",
+                (f"exceeds max size {_cfg.CONFIG.max_attachment_mb}MB", att_row_id),
+            )
+            continue
+
+        # Inline data fast-path.
+        inline_b64 = att.get("inline_data_b64")
+        if inline_b64:
+            try:
+                raw = base64.urlsafe_b64decode(_pad(inline_b64).encode("ascii"))
+                path = _storage_path_for(email_id, att_row_id, sanitized)
+                _write_bytes(path, raw)
+                sha = hashlib.sha256(raw).hexdigest()
+                conn.execute(
+                    "UPDATE email_attachments SET storage_path = ? WHERE id = ?",
+                    (path, att_row_id),
+                )
+                _db.mark_attachment_downloaded(
+                    conn, att_row_id, sha256_hex=sha, size_bytes=len(raw)
+                )
+            except Exception as e:
+                _db.mark_attachment_failed(conn, att_row_id, error=f"inline decode: {e}")
+
+    return ids
+
+
+def att_row_id_fallback() -> str:
+    # Placeholder so the path template always produces something if gmail_att_id
+    # was missing at stub time; the real path is rewritten when the worker
+    # picks it up.
+    import uuid
+    return uuid.uuid4().hex
+
+
+# ---------------------------------------------------------------------------- phase 2: worker
+
+def drain_pending(conn_factory, client: _gmail.GmailClient, account_id: str,
+                  *, limit: int = 50) -> int:
+    """Download up to `limit` pending attachments for `account_id`.
+
+    Returns count of successfully downloaded attachments. Called after each
+    account's sync completes so large files don't block the sync loop.
+    """
+    conn = conn_factory()
+    try:
+        pending = _db.pending_attachments(conn, limit=limit)
+    finally:
+        conn.close()
+
+    downloaded = 0
+    for row in pending:
+        if row["account_id"] != account_id:
+            continue
+        conn = conn_factory()
+        try:
+            ok = _download_one(conn, client, row)
+            if ok:
+                downloaded += 1
+            conn.commit()
+        finally:
+            conn.close()
+    return downloaded
+
+
+def _download_one(conn: sqlite3.Connection, client: _gmail.GmailClient, row) -> bool:
+    try:
+        resp = client.get_attachment(row["gmail_message_id"], row["gmail_attachment_id"])
+    except _errors.RETRYABLE as e:
+        _db.mark_attachment_failed(conn, row["id"], error=f"transient: {type(e).__name__}")
+        return False
+    except _errors.GmailError as e:
+        _db.mark_attachment_failed(conn, row["id"], error=f"{type(e).__name__}: {e}")
+        return False
+
+    data_b64 = resp.get("data")
+    if not data_b64:
+        _db.mark_attachment_failed(conn, row["id"], error="empty data in response")
+        return False
+
+    try:
+        raw = base64.urlsafe_b64decode(_pad(data_b64).encode("ascii"))
+    except Exception as e:
+        _db.mark_attachment_failed(conn, row["id"], error=f"decode: {e}")
+        return False
+
+    sha = hashlib.sha256(raw).hexdigest()
+    # If an existing attachment has the same SHA, re-point storage_path and skip write.
+    existing = _find_existing_by_sha(conn, sha, exclude_id=row["id"])
+    if existing:
+        conn.execute(
+            "UPDATE email_attachments SET storage_path = ? WHERE id = ?",
+            (existing["storage_path"], row["id"]),
+        )
+        _db.mark_attachment_downloaded(conn, row["id"], sha256_hex=sha, size_bytes=len(raw))
+        return True
+
+    path = _storage_path_for(row["email_id"], row["id"], row["sanitized_filename"])
+    try:
+        _write_bytes(path, raw)
+    except OSError as e:
+        _db.mark_attachment_failed(conn, row["id"], error=f"disk: {e}")
+        return False
+
+    conn.execute(
+        "UPDATE email_attachments SET storage_path = ? WHERE id = ?",
+        (path, row["id"]),
+    )
+    _db.mark_attachment_downloaded(conn, row["id"], sha256_hex=sha, size_bytes=len(raw))
+    return True
+
+
+def _find_existing_by_sha(conn: sqlite3.Connection, sha: str, *, exclude_id: str) -> Optional[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute(
+        "SELECT * FROM email_attachments WHERE sha256_hex = ? AND id != ? "
+        "AND download_status = 'downloaded' LIMIT 1",
+        (sha, exclude_id),
+    )
+    return cur.fetchone()
+
+
+# ---------------------------------------------------------------------------- utils
+
+def _pad(b64: str) -> str:
+    pad = 4 - (len(b64) % 4)
+    return b64 + ("=" * pad if pad != 4 else "")
+
+
+def _write_bytes(path: str, data: bytes) -> None:
+    os.makedirs(os.path.dirname(path), exist_ok=True)
+    tmp = path + ".tmp"
+    with open(tmp, "wb") as f:
+        f.write(data)
+    os.chmod(tmp, 0o600)
+    os.replace(tmp, path)
@@ -0,0 +1,112 @@
+"""
+Email integration configuration.
+
+Reads from the same env-var surface as the rest of the CRM (server.py style),
+no pydantic/dotenv magic — stdlib only.
+"""
+
+import os
+from dataclasses import dataclass
+from typing import Optional
+
+# Reuse the CRM's data dir so backups and email storage live together.
+_PROJECT_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+_DEFAULT_DATA_DIR = os.path.join(_PROJECT_DIR, "data")
+
+
+def _bool_env(name: str, default: bool = False) -> bool:
+    v = os.environ.get(name, "").strip().lower()
+    if v in ("1", "true", "yes", "on"):
+        return True
+    if v in ("0", "false", "no", "off"):
+        return False
+    return default
+
+
+def _int_env(name: str, default: int) -> int:
+    try:
+        return int(os.environ.get(name, str(default)))
+    except (TypeError, ValueError):
+        return default
+
+
+@dataclass(frozen=True)
+class EmailConfig:
+    # Master kill switch. When False, scheduler doesn't start and routes
+    # return 503. Migrations are still applied (so schema is ready).
+    enabled: bool
+
+    # Primary auth path. "dwd" means service account / domain-wide delegation.
+    # "oauth" means per-user refresh tokens. DWD is default; OAuth is the
+    # pluggable fallback.
+    primary_auth: str
+
+    # DWD specifics
+    dwd_key_path: Optional[str]
+    workspace_domain: Optional[str]
+
+    # OAuth specifics (used for fallback + admin UI)
+    oauth_client_id: Optional[str]
+    oauth_client_secret: Optional[str]
+    oauth_redirect_uri: Optional[str]
+
+    # Encryption key (base64) for OAuth refresh-token-at-rest encryption.
+    # Required whenever oauth path is in use. DWD path never persists tokens.
+    secret_key_b64: Optional[str]
+
+    # Sync scheduling
+    sync_interval_sec: int
+    backfill_page_size: int
+    max_attachment_mb: int
+    max_parallel_attachment_downloads: int
+
+    # Storage
+    data_dir: str
+    attachments_dir: str
+    secrets_dir: str
+
+    # Rate limit / retry
+    rate_limit_units_per_sec_per_account: int
+    retry_max_attempts: int
+    retry_initial_delay_sec: float
+    retry_max_delay_sec: float
+
+    # Gmail history retention — if we fall this far behind, switch to date
+    # backfill since Gmail may have pruned history records.
+    history_stale_days: int
+
+
+def load() -> EmailConfig:
+    data_dir = os.environ.get("CRM_DATA_DIR", _DEFAULT_DATA_DIR)
+    return EmailConfig(
+        enabled=_bool_env("CRM_GMAIL_INTEGRATION_ENABLED", False),
+        primary_auth=os.environ.get("CRM_GMAIL_AUTH_METHOD", "dwd").lower(),
+        dwd_key_path=os.environ.get("CRM_GMAIL_SA_KEY_PATH") or None,
+        workspace_domain=os.environ.get("CRM_GMAIL_WORKSPACE_DOMAIN") or None,
+        oauth_client_id=os.environ.get("CRM_GMAIL_OAUTH_CLIENT_ID") or None,
+        oauth_client_secret=os.environ.get("CRM_GMAIL_OAUTH_CLIENT_SECRET") or None,
+        oauth_redirect_uri=os.environ.get("CRM_GMAIL_OAUTH_REDIRECT_URI") or None,
+        secret_key_b64=os.environ.get("CRM_GMAIL_SECRET_KEY") or None,
+        sync_interval_sec=_int_env("CRM_GMAIL_SYNC_INTERVAL_MIN", 180) * 60,
+        backfill_page_size=_int_env("CRM_GMAIL_BACKFILL_PAGE_SIZE", 500),
+        max_attachment_mb=_int_env("CRM_GMAIL_MAX_ATTACHMENT_MB", 50),
+        max_parallel_attachment_downloads=_int_env("CRM_GMAIL_ATTACH_CONCURRENCY", 4),
+        data_dir=data_dir,
+        attachments_dir=os.path.join(data_dir, "email_attachments"),
+        secrets_dir=os.path.join(data_dir, "secrets"),
+        rate_limit_units_per_sec_per_account=_int_env("CRM_GMAIL_RATE_UNITS_SEC", 150),
+        retry_max_attempts=_int_env("CRM_GMAIL_RETRY_MAX", 5),
+        retry_initial_delay_sec=float(os.environ.get("CRM_GMAIL_RETRY_INITIAL_SEC", "1.0")),
+        retry_max_delay_sec=float(os.environ.get("CRM_GMAIL_RETRY_MAX_SEC", "60.0")),
+        history_stale_days=_int_env("CRM_GMAIL_HISTORY_STALE_DAYS", 5),
+    )
+
+
+# Singleton. Reload with `reload_config()` if env changes (mostly for tests).
+CONFIG = load()
+
+
+def reload_config() -> EmailConfig:
+    global CONFIG
+    CONFIG = load()
+    return CONFIG
@@ -0,0 +1,297 @@
+"""
+Credential providers for Gmail API access.
+
+Two implementations behind a common protocol:
+
+- DWDCredentialProvider: signs a JWT with the Workspace-authorized service
+  account, exchanges for a short-lived access token that impersonates a
+  specific user. No per-user persistent state.
+
+- OAuthCredentialProvider: uses a per-user refresh token (stored encrypted
+  in email_accounts.oauth_refresh_enc) to mint access tokens. Supports the
+  'connect Gmail' UI flow.
+
+Both provide the same interface:
+
+    provider.access_token_for(email_address: str) -> AccessToken
+"""
+
+import base64
+import json
+import os
+import threading
+import time
+from dataclasses import dataclass
+from typing import Optional, Protocol
+import urllib.parse
+import urllib.request
+
+from . import config as _cfg
+from . import crypto
+from . import errors
+
+
+GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly"
+GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"
+
+
+@dataclass
+class AccessToken:
+    token: str
+    expires_at: float   # epoch seconds
+
+
+class CredentialProvider(Protocol):
+    def access_token_for(self, email_address: str) -> AccessToken: ...
+    def revoke(self, email_address: str) -> None: ...
+
+
+# ============================================================================
+# Domain-wide delegation
+# ============================================================================
+
+class DWDCredentialProvider:
+    """Impersonation via service-account JWT bearer grant."""
+
+    def __init__(self, key_path: str):
+        with open(key_path, "r") as f:
+            self._key = json.load(f)
+        self._client_email = self._key["client_email"]
+        self._private_key_pem = self._key["private_key"].encode("utf-8")
+        self._cache: dict[str, AccessToken] = {}
+        self._lock = threading.Lock()
+
+    def access_token_for(self, email_address: str) -> AccessToken:
+        with self._lock:
+            cached = self._cache.get(email_address)
+            if cached and cached.expires_at - time.time() > 60:
+                return cached
+            token = self._mint(email_address)
+            self._cache[email_address] = token
+            return token
+
+    def revoke(self, email_address: str) -> None:
+        # DWD tokens expire naturally in <1h. Revocation is via Admin console.
+        # We just drop the cache so next call mints fresh.
+        with self._lock:
+            self._cache.pop(email_address, None)
+
+    # ------------------------------------------------------------------ helpers
+
+    def _mint(self, subject_email: str) -> AccessToken:
+        try:
+            from cryptography.hazmat.primitives import hashes, serialization  # type: ignore
+            from cryptography.hazmat.primitives.asymmetric import padding  # type: ignore
+        except ImportError as e:  # pragma: no cover
+            raise errors.AuthError(
+                "DWD requires the `cryptography` package. Add to requirements.txt."
+            ) from e
+
+        now = int(time.time())
+        header = {"alg": "RS256", "typ": "JWT"}
+        claim = {
+            "iss": self._client_email,
+            "sub": subject_email,
+            "scope": GMAIL_READONLY_SCOPE,
+            "aud": GOOGLE_TOKEN_URL,
+            "iat": now,
+            "exp": now + 3600,
+        }
+        signing_input = _b64url(_json(header)) + b"." + _b64url(_json(claim))
+
+        private_key = serialization.load_pem_private_key(self._private_key_pem, password=None)
+        signature = private_key.sign(signing_input, padding.PKCS1v15(), hashes.SHA256())
+        jwt = signing_input + b"." + _b64url(signature)
+
+        body = urllib.parse.urlencode({
+            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
+            "assertion": jwt.decode("ascii"),
+        }).encode("ascii")
+
+        req = urllib.request.Request(
+            GOOGLE_TOKEN_URL,
+            data=body,
+            headers={"Content-Type": "application/x-www-form-urlencoded"},
+        )
+        try:
+            with urllib.request.urlopen(req, timeout=15) as resp:
+                payload = json.loads(resp.read())
+        except urllib.error.HTTPError as e:
+            body = e.read().decode("utf-8", errors="replace")
+            try:
+                payload = json.loads(body)
+            except Exception:
+                payload = {"raw": body}
+            raise errors.classify_http(e.code, payload)
+
+        if "access_token" not in payload:
+            raise errors.AuthError("DWD token exchange returned no access_token", payload=payload)
+        return AccessToken(
+            token=payload["access_token"],
+            expires_at=time.time() + float(payload.get("expires_in", 3600)) - 30,
+        )
+
+
+# ============================================================================
+# Per-user OAuth (fallback)
+# ============================================================================
+
+class OAuthCredentialProvider:
+    """Refreshes access tokens using a stored encrypted refresh token.
+
+    Refresh tokens are obtained via the consent-flow routes in routes.py and
+    stored in email_accounts.oauth_refresh_enc (AES-GCM ciphertext).
+    """
+
+    def __init__(self, db_conn_factory, client_id: str, client_secret: str, secret_key_b64: str):
+        self._db = db_conn_factory
+        self._client_id = client_id
+        self._client_secret = client_secret
+        self._secret_key_b64 = secret_key_b64
+        self._lock = threading.Lock()
+
+    def access_token_for(self, email_address: str) -> AccessToken:
+        with self._lock:
+            row = self._load_account(email_address)
+            if row is None:
+                raise errors.AuthError(f"no email_accounts row for {email_address}")
+            # Cached access token still valid?
+            if row["oauth_token_enc"] and row["oauth_token_exp"]:
+                try:
+                    exp = float(row["oauth_token_exp"])
+                except ValueError:
+                    exp = 0.0
+                if exp - time.time() > 60:
+                    token = crypto.decrypt(row["oauth_token_enc"], secret_key_b64=self._secret_key_b64).decode("ascii")
+                    return AccessToken(token=token, expires_at=exp)
+            # Refresh.
+            return self._refresh(email_address, row)
+
+    def revoke(self, email_address: str) -> None:
+        row = self._load_account(email_address)
+        if not row or not row["oauth_refresh_enc"]:
+            return
+        refresh = crypto.decrypt(row["oauth_refresh_enc"], secret_key_b64=self._secret_key_b64).decode("ascii")
+        body = urllib.parse.urlencode({"token": refresh}).encode("ascii")
+        req = urllib.request.Request(
+            "https://oauth2.googleapis.com/revoke",
+            data=body,
+            headers={"Content-Type": "application/x-www-form-urlencoded"},
+        )
+        try:
+            urllib.request.urlopen(req, timeout=10).read()
+        except Exception:
+            pass  # best effort; we zero locally regardless
+        self._zero_account(email_address)
+
+    # ------------------------------------------------------------------ helpers
+
+    def _refresh(self, email_address: str, row) -> AccessToken:
+        if not row["oauth_refresh_enc"]:
+            raise errors.AuthError(f"no refresh token stored for {email_address}")
+        refresh = crypto.decrypt(row["oauth_refresh_enc"], secret_key_b64=self._secret_key_b64).decode("ascii")
+        body = urllib.parse.urlencode({
+            "grant_type": "refresh_token",
+            "refresh_token": refresh,
+            "client_id": self._client_id,
+            "client_secret": self._client_secret,
+        }).encode("ascii")
+        req = urllib.request.Request(
+            GOOGLE_TOKEN_URL,
+            data=body,
+            headers={"Content-Type": "application/x-www-form-urlencoded"},
+        )
+        try:
+            with urllib.request.urlopen(req, timeout=15) as resp:
+                payload = json.loads(resp.read())
+        except urllib.error.HTTPError as e:
+            body_text = e.read().decode("utf-8", errors="replace")
+            try:
+                payload = json.loads(body_text)
+            except Exception:
+                payload = {"raw": body_text}
+            raise errors.classify_http(e.code, payload)
+
+        if "access_token" not in payload:
+            raise errors.AuthError("OAuth refresh returned no access_token", payload=payload)
+
+        token_str = payload["access_token"]
+        exp = time.time() + float(payload.get("expires_in", 3600)) - 30
+        enc_token = crypto.encrypt(token_str.encode("ascii"), secret_key_b64=self._secret_key_b64)
+        self._save_token(email_address, enc_token, exp)
+        return AccessToken(token=token_str, expires_at=exp)
+
+    def _load_account(self, email_address: str):
+        conn = self._db()
+        try:
+            cur = conn.cursor()
+            cur.execute(
+                "SELECT id, oauth_refresh_enc, oauth_token_enc, oauth_token_exp "
+                "FROM email_accounts WHERE email_address = ?",
+                (email_address,),
+            )
+            return cur.fetchone()
+        finally:
+            conn.close()
+
+    def _save_token(self, email_address: str, enc_token: bytes, exp: float):
+        conn = self._db()
+        try:
+            conn.execute(
+                "UPDATE email_accounts SET oauth_token_enc = ?, oauth_token_exp = ?, "
+                "updated_at = datetime('now') WHERE email_address = ?",
+                (enc_token, str(exp), email_address),
+            )
+            conn.commit()
+        finally:
+            conn.close()
+
+    def _zero_account(self, email_address: str):
+        conn = self._db()
+        try:
+            conn.execute(
+                "UPDATE email_accounts SET oauth_refresh_enc = NULL, oauth_token_enc = NULL, "
+                "oauth_token_exp = NULL, sync_enabled = 0, sync_status = 'paused', "
+                "updated_at = datetime('now') WHERE email_address = ?",
+                (email_address,),
+            )
+            conn.commit()
+        finally:
+            conn.close()
+
+
+# ============================================================================
+# Factory — resolves CONFIG.primary_auth to a concrete provider
+# ============================================================================
+
+def build_provider(db_conn_factory) -> CredentialProvider:
+    cfg = _cfg.CONFIG
+    if cfg.primary_auth == "dwd":
+        if not cfg.dwd_key_path or not os.path.exists(cfg.dwd_key_path):
+            raise errors.AuthError(
+                f"CRM_GMAIL_SA_KEY_PATH not found: {cfg.dwd_key_path!r}"
+            )
+        return DWDCredentialProvider(cfg.dwd_key_path)
+    if cfg.primary_auth == "oauth":
+        if not (cfg.oauth_client_id and cfg.oauth_client_secret and cfg.secret_key_b64):
+            raise errors.AuthError(
+                "OAuth mode requires CRM_GMAIL_OAUTH_CLIENT_ID, "
+                "CRM_GMAIL_OAUTH_CLIENT_SECRET, and CRM_GMAIL_SECRET_KEY."
+            )
+        return OAuthCredentialProvider(
+            db_conn_factory,
+            cfg.oauth_client_id,
+            cfg.oauth_client_secret,
+            cfg.secret_key_b64,
+        )
+    raise errors.AuthError(f"unknown primary_auth: {cfg.primary_auth!r}")
+
+
+# ---------------------------------------------------------------------------- utils
+
+def _b64url(data: bytes) -> bytes:
+    return base64.urlsafe_b64encode(data).rstrip(b"=")
+
+
+def _json(obj) -> bytes:
+    return json.dumps(obj, separators=(",", ":")).encode("utf-8")
@@ -0,0 +1,79 @@
+"""
+AES-256-GCM encryption for OAuth refresh tokens at rest.
+
+Key material comes from CONFIG.secret_key_b64 (env: CRM_GMAIL_SECRET_KEY).
+Must be at least 32 bytes of entropy, base64-encoded.
+
+Storage format (as stored in BLOB columns):
+    version(1 byte) || nonce(12 bytes) || ciphertext+tag(N bytes)
+
+version = 1 for AES-GCM-256.
+
+Uses the `cryptography` library. If not available (optional at scaffold time),
+the OAuth fallback path is disabled with a clear error — DWD path is unaffected.
+"""
+
+import base64
+import os
+import secrets
+from typing import Optional
+
+try:
+    from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # type: ignore
+    _AVAILABLE = True
+except ImportError:  # pragma: no cover
+    AESGCM = None  # type: ignore
+    _AVAILABLE = False
+
+
+VERSION = 1
+NONCE_LEN = 12
+
+
+class CryptoUnavailable(RuntimeError):
+    pass
+
+
+def _load_key(secret_key_b64: Optional[str]) -> bytes:
+    if not secret_key_b64:
+        raise CryptoUnavailable(
+            "CRM_GMAIL_SECRET_KEY not set; cannot encrypt/decrypt OAuth tokens. "
+            "DWD auth does not require this."
+        )
+    try:
+        key = base64.b64decode(secret_key_b64)
+    except Exception as e:
+        raise CryptoUnavailable(f"CRM_GMAIL_SECRET_KEY not valid base64: {e}") from e
+    if len(key) < 32:
+        raise CryptoUnavailable(
+            f"CRM_GMAIL_SECRET_KEY decodes to {len(key)} bytes; need >= 32."
+        )
+    return key[:32]  # AES-256
+
+
+def encrypt(plaintext: bytes, *, secret_key_b64: Optional[str]) -> bytes:
+    if not _AVAILABLE:
+        raise CryptoUnavailable("cryptography library not installed")
+    key = _load_key(secret_key_b64)
+    nonce = secrets.token_bytes(NONCE_LEN)
+    ct = AESGCM(key).encrypt(nonce, plaintext, None)
+    return bytes([VERSION]) + nonce + ct
+
+
+def decrypt(blob: bytes, *, secret_key_b64: Optional[str]) -> bytes:
+    if not _AVAILABLE:
+        raise CryptoUnavailable("cryptography library not installed")
+    if not blob or len(blob) < 1 + NONCE_LEN + 16:
+        raise ValueError("ciphertext too short")
+    version = blob[0]
+    if version != VERSION:
+        raise ValueError(f"unsupported crypto version: {version}")
+    nonce = blob[1:1 + NONCE_LEN]
+    ct = blob[1 + NONCE_LEN:]
+    key = _load_key(secret_key_b64)
+    return AESGCM(key).decrypt(nonce, ct, None)
+
+
+def generate_secret_key_b64() -> str:
+    """Helper for initial setup: prints a fresh key you can drop into env."""
+    return base64.b64encode(os.urandom(32)).decode("ascii")
@@ -0,0 +1,416 @@
+"""
+Data-access layer for the email_integration module.
+
+All SQL touching emails_* tables lives here. Other modules call named
+helpers — they never write SQL inline. This keeps schema changes contained.
+
+Connection pattern matches server.py get_db():
+    - WAL mode, foreign keys on, busy_timeout
+    - sqlite3.Row row_factory
+The caller is responsible for committing / closing.
+"""
+
+import json
+import os
+import sqlite3
+import uuid
+from datetime import datetime, timezone
+from typing import Iterable, Optional
+
+
+# ------------------------------------------------------------------ migrations
+
+def apply_migrations(cursor: sqlite3.Cursor) -> None:
+    """Apply all .sql migration files in migrations/ in lexicographic order.
+
+    Called from server.init_db(). Idempotent. Does not log past migrations in
+    a table yet — each file is guarded by CREATE ... IF NOT EXISTS etc. If
+    we ever need more complex migrations, add a schema_migrations table.
+    """
+    here = os.path.dirname(os.path.abspath(__file__))
+    mdir = os.path.join(here, "migrations")
+    if not os.path.isdir(mdir):
+        return
+    for name in sorted(os.listdir(mdir)):
+        if not name.endswith(".sql"):
+            continue
+        path = os.path.join(mdir, name)
+        with open(path, "r") as f:
+            sql = f.read()
+        cursor.executescript(sql)
+
+
+# ------------------------------------------------------------------ utils
+
+def _uuid() -> str:
+    return str(uuid.uuid4())
+
+
+def _now_iso() -> str:
+    return datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def _json(v) -> str:
+    return json.dumps(v, separators=(",", ":"))
+
+
+# ------------------------------------------------------------------ email_accounts
+
+def list_sync_ready_accounts(conn: sqlite3.Connection) -> list[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute(
+        "SELECT * FROM email_accounts "
+        "WHERE sync_enabled = 1 AND sync_status IN ('pending','active') "
+        "ORDER BY last_synced_at IS NOT NULL, last_synced_at"
+    )
+    return cur.fetchall()
+
+
+def get_account_by_email(conn: sqlite3.Connection, email_address: str) -> Optional[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute("SELECT * FROM email_accounts WHERE email_address = ?", (email_address,))
+    return cur.fetchone()
+
+
+def upsert_account(conn: sqlite3.Connection, *, user_id: str, email_address: str,
+                   auth_method: str) -> str:
+    existing = get_account_by_email(conn, email_address)
+    if existing:
+        return existing["id"]
+    account_id = _uuid()
+    conn.execute(
+        "INSERT INTO email_accounts (id, user_id, email_address, auth_method) "
+        "VALUES (?, ?, ?, ?)",
+        (account_id, user_id, email_address, auth_method),
+    )
+    return account_id
+
+
+def set_account_status(conn: sqlite3.Connection, account_id: str, *,
+                       status: str, error: Optional[str] = None) -> None:
+    conn.execute(
+        "UPDATE email_accounts SET sync_status = ?, sync_error = ?, "
+        "updated_at = datetime('now') WHERE id = ?",
+        (status, error, account_id),
+    )
+
+
+def set_account_checkpoint(conn: sqlite3.Connection, account_id: str, *,
+                           history_id: Optional[str] = None,
+                           backfill_cursor: Optional[str] = None,
+                           backfill_complete: Optional[bool] = None,
+                           last_synced_at: Optional[str] = None) -> None:
+    sets, params = [], []
+    if history_id is not None:
+        sets.append("last_history_id = ?"); params.append(history_id)
+    if backfill_cursor is not None:
+        sets.append("backfill_cursor = ?"); params.append(backfill_cursor)
+    if backfill_complete is not None:
+        sets.append("backfill_complete = ?"); params.append(1 if backfill_complete else 0)
+    if last_synced_at is not None:
+        sets.append("last_synced_at = ?"); params.append(last_synced_at)
+    if not sets:
+        return
+    sets.append("updated_at = datetime('now')")
+    params.append(account_id)
+    conn.execute(f"UPDATE email_accounts SET {', '.join(sets)} WHERE id = ?", params)
+
+
+# ------------------------------------------------------------------ emails
+
+def find_email_by_rfc_id(conn: sqlite3.Connection, rfc_message_id: str) -> Optional[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute("SELECT * FROM emails WHERE rfc_message_id = ?", (rfc_message_id,))
+    return cur.fetchone()
+
+
+def find_email_id_by_any_rfc_id(conn: sqlite3.Connection,
+                                rfc_ids: Iterable[str]) -> Optional[str]:
+    ids = [r for r in rfc_ids if r]
+    if not ids:
+        return None
+    placeholders = ",".join("?" for _ in ids)
+    cur = conn.cursor()
+    cur.execute(
+        f"SELECT id FROM emails WHERE rfc_message_id IN ({placeholders}) "
+        "ORDER BY sent_at ASC LIMIT 1",
+        ids,
+    )
+    row = cur.fetchone()
+    return row["id"] if row else None
+
+
+def insert_email(conn: sqlite3.Connection, *, parsed: dict, match_status: str) -> str:
+    """Insert a fresh emails row. Returns email_id.
+
+    Caller must ensure no row exists for parsed['rfc_message_id']; use
+    find_email_by_rfc_id first.
+    """
+    email_id = _uuid()
+    conn.execute(
+        """INSERT INTO emails
+        (id, rfc_message_id, gmail_thread_id, rfc_thread_root_id, subject,
+         from_email, from_name, to_emails_json, cc_emails_json, bcc_emails_json,
+         reply_to, sent_at, body_text, body_html, snippet, in_reply_to,
+         references_json, has_attachments, size_estimate, is_matched,
+         match_status, raw_headers_json)
+        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (
+            email_id,
+            parsed["rfc_message_id"],
+            parsed.get("gmail_thread_id"),
+            parsed.get("rfc_thread_root_id"),
+            parsed.get("subject"),
+            parsed["from_email"],
+            parsed.get("from_name"),
+            _json(parsed.get("to", [])),
+            _json(parsed.get("cc", [])),
+            _json(parsed.get("bcc", [])),
+            parsed.get("reply_to"),
+            parsed["sent_at"],
+            parsed.get("body_text"),
+            parsed.get("body_html"),
+            parsed.get("snippet"),
+            parsed.get("in_reply_to"),
+            _json(parsed.get("references", [])),
+            1 if parsed.get("attachments") else 0,
+            parsed.get("size_estimate"),
+            1 if match_status == "matched" else 0,
+            match_status,
+            _json(parsed.get("raw_headers", {})) if parsed.get("raw_headers") else None,
+        ),
+    )
+    # recipients
+    for kind in ("from", "to", "cc", "bcc", "reply_to"):
+        addrs = []
+        if kind == "from" and parsed.get("from_email"):
+            addrs = [(parsed["from_email"], parsed.get("from_name"))]
+        elif kind == "reply_to" and parsed.get("reply_to"):
+            addrs = [(parsed["reply_to"], None)]
+        else:
+            for a in parsed.get(kind, []):
+                if isinstance(a, dict):
+                    addrs.append((a.get("email"), a.get("name")))
+                else:
+                    addrs.append((a, None))
+        for address, name in addrs:
+            if not address:
+                continue
+            conn.execute(
+                "INSERT INTO email_recipients (id, email_id, address, display_name, kind) "
+                "VALUES (?, ?, ?, ?, ?)",
+                (_uuid(), email_id, address.lower().strip(), name, kind),
+            )
+    return email_id
+
+
+def set_email_thread(conn: sqlite3.Connection, email_id: str, thread_id: str) -> None:
+    conn.execute(
+        "UPDATE emails SET thread_id = ?, updated_at = datetime('now') WHERE id = ?",
+        (thread_id, email_id),
+    )
+
+
+# ------------------------------------------------------------------ sightings
+
+def upsert_sighting(conn: sqlite3.Connection, *, email_id: str, account_id: str,
+                    gmail_message_id: str, gmail_thread_id: str,
+                    labels: list[str], is_sent: bool) -> None:
+    conn.execute(
+        """INSERT OR IGNORE INTO email_account_messages
+        (id, email_id, account_id, gmail_message_id, gmail_thread_id,
+         labels_json, is_sent)
+        VALUES (?, ?, ?, ?, ?, ?, ?)""",
+        (_uuid(), email_id, account_id, gmail_message_id, gmail_thread_id,
+         _json(labels), 1 if is_sent else 0),
+    )
+
+
+def update_sighting_labels(conn: sqlite3.Connection, *, account_id: str,
+                           gmail_message_id: str, labels: list[str]) -> None:
+    conn.execute(
+        "UPDATE email_account_messages SET labels_json = ? "
+        "WHERE account_id = ? AND gmail_message_id = ?",
+        (_json(labels), account_id, gmail_message_id),
+    )
+
+
+def tombstone_sighting(conn: sqlite3.Connection, *, account_id: str,
+                       gmail_message_id: str) -> None:
+    conn.execute(
+        "UPDATE email_account_messages SET deleted_at = datetime('now') "
+        "WHERE account_id = ? AND gmail_message_id = ?",
+        (account_id, gmail_message_id),
+    )
+
+
+# ------------------------------------------------------------------ attachments
+
+def insert_attachment_stub(conn: sqlite3.Connection, *, email_id: str,
+                           gmail_attachment_id: str, filename: str,
+                           sanitized_filename: str, mime_type: Optional[str],
+                           size_bytes: Optional[int], storage_path: str) -> str:
+    att_id = _uuid()
+    conn.execute(
+        """INSERT INTO email_attachments
+        (id, email_id, gmail_attachment_id, filename, sanitized_filename,
+         mime_type, size_bytes, storage_path)
+        VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+        (att_id, email_id, gmail_attachment_id, filename, sanitized_filename,
+         mime_type, size_bytes, storage_path),
+    )
+    return att_id
+
+
+def mark_attachment_downloaded(conn: sqlite3.Connection, attachment_id: str, *,
+                               sha256_hex: str, size_bytes: int) -> None:
+    conn.execute(
+        "UPDATE email_attachments SET download_status = 'downloaded', "
+        "sha256_hex = ?, size_bytes = ?, downloaded_at = datetime('now') "
+        "WHERE id = ?",
+        (sha256_hex, size_bytes, attachment_id),
+    )
+
+
+def mark_attachment_failed(conn: sqlite3.Connection, attachment_id: str, *,
+                           error: str) -> None:
+    conn.execute(
+        "UPDATE email_attachments SET download_status = 'failed', "
+        "download_attempts = download_attempts + 1, download_error = ? "
+        "WHERE id = ?",
+        (error, attachment_id),
+    )
+
+
+def pending_attachments(conn: sqlite3.Connection, limit: int = 50) -> list[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute(
+        "SELECT a.*, eam.gmail_message_id, eam.account_id "
+        "FROM email_attachments a "
+        "JOIN email_account_messages eam ON eam.email_id = a.email_id "
+        "WHERE a.download_status = 'pending' AND a.download_attempts < 5 "
+        "LIMIT ?",
+        (limit,),
+    )
+    return cur.fetchall()
+
+
+# ------------------------------------------------------------------ threads
+
+def find_thread_by_gmail_id(conn: sqlite3.Connection, gmail_thread_id: str) -> Optional[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute(
+        "SELECT * FROM email_threads WHERE gmail_thread_id = ?",
+        (gmail_thread_id,),
+    )
+    return cur.fetchone()
+
+
+def find_thread_by_rfc_root(conn: sqlite3.Connection, rfc_root: str) -> Optional[sqlite3.Row]:
+    cur = conn.cursor()
+    cur.execute(
+        "SELECT * FROM email_threads WHERE rfc_thread_root_id = ?",
+        (rfc_root,),
+    )
+    return cur.fetchone()
+
+
+def create_thread(conn: sqlite3.Connection, *, gmail_thread_id: Optional[str],
+                  rfc_thread_root_id: Optional[str], subject_normalized: Optional[str],
+                  first_message_at: Optional[str]) -> str:
+    thread_id = _uuid()
+    conn.execute(
+        """INSERT INTO email_threads
+        (id, gmail_thread_id, rfc_thread_root_id, subject_normalized,
+         first_message_at, last_message_at, message_count)
+        VALUES (?, ?, ?, ?, ?, ?, 0)""",
+        (thread_id, gmail_thread_id, rfc_thread_root_id, subject_normalized,
+         first_message_at, first_message_at),
+    )
+    return thread_id
+
+
+def rollup_thread(conn: sqlite3.Connection, thread_id: str) -> None:
+    """Recompute count / last_message_at / participants from member emails.
+
+    Cheap at 5-person team volumes. For larger deployments swap to triggers.
+    """
+    cur = conn.cursor()
+    cur.execute(
+        "SELECT COUNT(*) AS n, MIN(sent_at) AS first, MAX(sent_at) AS last, "
+        "MAX(is_matched) AS matched FROM emails WHERE thread_id = ?",
+        (thread_id,),
+    )
+    row = cur.fetchone()
+    if not row or row["n"] == 0:
+        return
+    cur.execute(
+        "SELECT DISTINCT address FROM email_recipients er "
+        "JOIN emails e ON e.id = er.email_id WHERE e.thread_id = ?",
+        (thread_id,),
+    )
+    participants = [r["address"] for r in cur.fetchall()]
+    conn.execute(
+        "UPDATE email_threads SET message_count = ?, first_message_at = ?, "
+        "last_message_at = ?, participant_count = ?, participants_json = ?, "
+        "is_matched = ?, updated_at = datetime('now') WHERE id = ?",
+        (row["n"], row["first"], row["last"], len(participants),
+         _json(participants), int(row["matched"] or 0), thread_id),
+    )
+
+
+# ------------------------------------------------------------------ investor links
+
+def insert_investor_link(conn: sqlite3.Connection, *, email_id: str,
+                         link: dict) -> None:
+    conn.execute(
+        """INSERT INTO email_investor_links
+        (id, email_id, fundraising_investor_id, fundraising_contact_id,
+         contact_id, organization_id, matched_address, match_kind,
+         match_confidence)
+        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (
+            _uuid(),
+            email_id,
+            link.get("fundraising_investor_id"),
+            link.get("fundraising_contact_id"),
+            link.get("contact_id"),
+            link.get("organization_id"),
+            link["matched_address"],
+            link["match_kind"],
+            float(link.get("match_confidence", 1.0)),
+        ),
+    )
+
+
+# ------------------------------------------------------------------ sync runs
+
+def start_sync_run(conn: sqlite3.Connection, *, account_id: str, kind: str) -> str:
+    run_id = _uuid()
+    conn.execute(
+        "INSERT INTO email_sync_runs (id, account_id, kind, started_at, status) "
+        "VALUES (?, ?, ?, ?, 'running')",
+        (run_id, account_id, kind, _now_iso()),
+    )
+    return run_id
+
+
+def finish_sync_run(conn: sqlite3.Connection, run_id: str, *, status: str,
+                    stats: Optional[dict] = None, error: Optional[str] = None) -> None:
+    stats = stats or {}
+    conn.execute(
+        """UPDATE email_sync_runs
+           SET finished_at = ?, status = ?, messages_seen = ?, messages_stored = ?,
+               attachments_saved = ?, api_calls = ?, retries = ?, error = ?
+           WHERE id = ?""",
+        (
+            _now_iso(), status,
+            int(stats.get("messages_seen", 0)),
+            int(stats.get("messages_stored", 0)),
+            int(stats.get("attachments_saved", 0)),
+            int(stats.get("api_calls", 0)),
+            int(stats.get("retries", 0)),
+            error,
+            run_id,
+        ),
+    )
@@ -0,0 +1,79 @@
+"""
+Exception taxonomy for Gmail integration.
+
+gmail_client._call() maps HTTP status codes to these exception types. The retry
+loop in gmail_client._with_retry() inspects the class hierarchy to decide
+whether to back off + retry or fail fast.
+"""
+
+
+class GmailError(Exception):
+    """Base class for all Gmail-integration errors."""
+
+    def __init__(self, message: str = "", *, status: int = 0, payload: object = None):
+        super().__init__(message)
+        self.status = status
+        self.payload = payload
+
+
+class AuthError(GmailError):
+    """401 / 403 that is not a rate-limit. Requires operator intervention
+    (bad service account key, revoked OAuth, missing DWD scope). Not retried."""
+
+
+class RateLimitError(GmailError):
+    """429 or 403 with reason in {rateLimitExceeded, userRateLimitExceeded}.
+    Retried with exponential backoff."""
+
+
+class TransientError(GmailError):
+    """5xx or network error. Retried with exponential backoff."""
+
+
+class NotFoundError(GmailError):
+    """404. For messages this usually means 'deleted in Gmail after we saw it';
+    for history this is HistoryExpiredError."""
+
+
+class HistoryExpiredError(NotFoundError):
+    """404 on history.list with startHistoryId — Gmail only retains history
+    for a limited window (~7 days). Triggers date-based backfill fallback."""
+
+
+class PermanentError(GmailError):
+    """400 or other permanent failure. Skip and log; do not retry."""
+
+
+def classify_http(status: int, payload: object) -> GmailError:
+    """Map a Gmail API response to the appropriate exception type.
+
+    `payload` is the decoded JSON body if any; used to distinguish rate-limit
+    403s from pure auth 403s via the `reason` field Google returns.
+    """
+    reason = ""
+    if isinstance(payload, dict):
+        try:
+            errs = payload.get("error", {}).get("errors") or []
+            if errs:
+                reason = str(errs[0].get("reason", ""))
+        except Exception:  # pragma: no cover — defensive
+            pass
+
+    if status == 429:
+        return RateLimitError(f"rate limited: {reason}", status=status, payload=payload)
+    if status == 403:
+        if reason in ("rateLimitExceeded", "userRateLimitExceeded", "quotaExceeded"):
+            return RateLimitError(f"quota: {reason}", status=status, payload=payload)
+        return AuthError(f"forbidden: {reason}", status=status, payload=payload)
+    if status == 401:
+        return AuthError("unauthorized", status=status, payload=payload)
+    if status == 404:
+        return NotFoundError("not found", status=status, payload=payload)
+    if 500 <= status < 600:
+        return TransientError(f"server error {status}", status=status, payload=payload)
+    if 400 <= status < 500:
+        return PermanentError(f"client error {status}: {reason}", status=status, payload=payload)
+    return GmailError(f"unexpected status {status}", status=status, payload=payload)
+
+
+RETRYABLE = (RateLimitError, TransientError)
@@ -0,0 +1,249 @@
+"""
+Thin Gmail API wrapper.
+
+Responsibilities:
+- HTTPS calls to https://gmail.googleapis.com/gmail/v1/users/me/*
+- Per-account access-token injection via CredentialProvider
+- Rate limiting via token bucket
+- Retry loop with exponential backoff + jitter for RETRYABLE errors
+- Batch requests for metadata fetches (multipart/mixed) — sketch provided
+- Call-count accounting for observability (plumbed to email_sync_runs)
+
+We call Gmail over raw urllib instead of the google-api-python-client to keep
+the dependency surface small. If you prefer the Google SDK, replace _call()
+with client calls; everything else is independent.
+"""
+
+import json
+import random
+import threading
+import time
+import urllib.error
+import urllib.parse
+import urllib.request
+from dataclasses import dataclass, field
+from typing import Any, Iterator, Optional
+
+from . import config as _cfg
+from . import errors
+
+
+BASE = "https://gmail.googleapis.com/gmail/v1/users"
+
+
+# ---------------------------------------------------------------------------- token bucket
+
+class _TokenBucket:
+    """Simple per-account rate limiter. Call wait(cost) before each API call."""
+
+    def __init__(self, units_per_sec: int, burst: Optional[int] = None):
+        self._rate = float(units_per_sec)
+        self._burst = float(burst if burst is not None else units_per_sec)
+        self._tokens = self._burst
+        self._last = time.monotonic()
+        self._lock = threading.Lock()
+
+    def wait(self, cost: float) -> None:
+        while True:
+            with self._lock:
+                now = time.monotonic()
+                self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
+                self._last = now
+                if self._tokens >= cost:
+                    self._tokens -= cost
+                    return
+                needed = cost - self._tokens
+                sleep_for = needed / self._rate
+            time.sleep(sleep_for)
+
+
+# ---------------------------------------------------------------------------- call stats
+
+@dataclass
+class CallStats:
+    api_calls: int = 0
+    retries: int = 0
+    bytes_in: int = 0
+    last_errors: list[str] = field(default_factory=list)
+
+
+# ---------------------------------------------------------------------------- client
+
+class GmailClient:
+    """Per-account Gmail client. Bind one instance per sync run."""
+
+    def __init__(self, credential_provider, email_address: str, stats: Optional[CallStats] = None):
+        self._creds = credential_provider
+        self._email = email_address
+        self._bucket = _TokenBucket(units_per_sec=_cfg.CONFIG.rate_limit_units_per_sec_per_account)
+        self.stats = stats or CallStats()
+
+    # -------------------------------------------------------------- messages.*
+
+    def list_messages(self, *, q: str = "", page_token: Optional[str] = None,
+                      max_results: int = 500, label_ids: Optional[list[str]] = None) -> dict:
+        """https://developers.google.com/gmail/api/reference/rest/v1/users.messages/list"""
+        params = {"maxResults": str(max_results)}
+        if q:
+            params["q"] = q
+        if page_token:
+            params["pageToken"] = page_token
+        if label_ids:
+            for lid in label_ids:
+                params.setdefault("labelIds", [])
+                params["labelIds"].append(lid) if False else None
+        return self._get("/messages", params=params, cost=5)
+
+    def get_message(self, message_id: str, *, format: str = "metadata",
+                    metadata_headers: Optional[list[str]] = None) -> dict:
+        params = {"format": format}
+        if format == "metadata" and metadata_headers:
+            params["metadataHeaders"] = metadata_headers
+        return self._get(f"/messages/{message_id}", params=params, cost=5)
+
+    def get_attachment(self, message_id: str, attachment_id: str) -> dict:
+        return self._get(
+            f"/messages/{message_id}/attachments/{attachment_id}",
+            params=None,
+            cost=5,
+        )
+
+    # -------------------------------------------------------------- history.*
+
+    def list_history(self, *, start_history_id: str, page_token: Optional[str] = None,
+                     history_types: Optional[list[str]] = None) -> dict:
+        params = {"startHistoryId": start_history_id, "maxResults": "500"}
+        if page_token:
+            params["pageToken"] = page_token
+        if history_types:
+            params["historyTypes"] = history_types
+        try:
+            return self._get("/history", params=params, cost=2)
+        except errors.NotFoundError as e:
+            # Gmail returns 404 when startHistoryId is too old. Wrap for callers.
+            raise errors.HistoryExpiredError(
+                "startHistoryId no longer available", status=404, payload=getattr(e, "payload", None)
+            ) from e
+
+    # -------------------------------------------------------------- profile
+
+    def get_profile(self) -> dict:
+        return self._get("/profile", params=None, cost=1)
+
+    # -------------------------------------------------------------- iteration helpers
+
+    def iter_messages(self, *, q: str = "") -> Iterator[dict]:
+        page_token: Optional[str] = None
+        while True:
+            resp = self.list_messages(q=q, page_token=page_token,
+                                      max_results=_cfg.CONFIG.backfill_page_size)
+            for m in resp.get("messages") or []:
+                yield m
+            page_token = resp.get("nextPageToken")
+            if not page_token:
+                return
+
+    def iter_history(self, *, start_history_id: str,
+                     history_types: Optional[list[str]] = None) -> Iterator[dict]:
+        page_token: Optional[str] = None
+        while True:
+            resp = self.list_history(
+                start_history_id=start_history_id,
+                page_token=page_token,
+                history_types=history_types,
+            )
+            for h in resp.get("history") or []:
+                yield h
+            page_token = resp.get("nextPageToken")
+            if not page_token:
+                # Cache final historyId for caller to checkpoint.
+                self._last_history_id = resp.get("historyId")
+                return
+
+    @property
+    def last_history_id(self) -> Optional[str]:
+        return getattr(self, "_last_history_id", None)
+
+    # -------------------------------------------------------------- internals
+
+    def _get(self, path: str, *, params: Optional[dict], cost: float) -> dict:
+        return self._with_retry(lambda: self._call("GET", path, params=params, cost=cost))
+
+    def _call(self, method: str, path: str, *, params: Optional[dict] = None,
+              body: Optional[bytes] = None, cost: float = 1.0) -> dict:
+        self._bucket.wait(cost)
+        self.stats.api_calls += 1
+
+        qs = ""
+        if params:
+            # urllib.parse.urlencode with doseq=True handles repeated params
+            # like metadataHeaders=Foo&metadataHeaders=Bar correctly.
+            qs = "?" + urllib.parse.urlencode(params, doseq=True)
+        url = f"{BASE}/me{path}{qs}"
+
+        token = self._creds.access_token_for(self._email)
+        req = urllib.request.Request(url, method=method, data=body)
+        req.add_header("Authorization", f"Bearer {token.token}")
+        req.add_header("Accept", "application/json")
+        if body:
+            req.add_header("Content-Type", "application/json")
+
+        try:
+            with urllib.request.urlopen(req, timeout=30) as resp:
+                raw = resp.read()
+                self.stats.bytes_in += len(raw)
+                if not raw:
+                    return {}
+                return json.loads(raw)
+        except urllib.error.HTTPError as e:
+            raw = e.read() or b""
+            self.stats.bytes_in += len(raw)
+            try:
+                payload = json.loads(raw) if raw else {}
+            except Exception:
+                payload = {"raw": raw.decode("utf-8", errors="replace")}
+            err = errors.classify_http(e.code, payload)
+            # short-message logging hook (redacted of tokens by design)
+            self.stats.last_errors.append(f"{e.code} {type(err).__name__}")
+            self.stats.last_errors = self.stats.last_errors[-10:]
+            raise err
+        except (urllib.error.URLError, TimeoutError) as e:
+            raise errors.TransientError(f"network error: {e}") from e
+
+    def _with_retry(self, fn):
+        cfg = _cfg.CONFIG
+        attempts = 0
+        delay = cfg.retry_initial_delay_sec
+        while True:
+            try:
+                return fn()
+            except errors.RETRYABLE as e:
+                attempts += 1
+                if attempts >= cfg.retry_max_attempts:
+                    raise
+                self.stats.retries += 1
+                # Full jitter
+                sleep_for = random.uniform(0, min(delay, cfg.retry_max_delay_sec))
+                time.sleep(sleep_for)
+                delay = min(delay * 2, cfg.retry_max_delay_sec)
+            # Non-retryable errors propagate immediately.
+
+
+# ---------------------------------------------------------------------------- batch fetch sketch
+
+def batch_get_metadata(client: GmailClient, message_ids: list[str],
+                       headers: list[str]) -> dict[str, dict]:
+    """Fetch metadata for up to ~100 messages.
+
+    TODO: implement using Gmail's multipart/mixed batch endpoint at
+    https://www.googleapis.com/batch/gmail/v1 for efficiency. In the scaffold
+    we fall back to serial gets so the logic is correct from day 1.
+    """
+    out: dict[str, dict] = {}
+    for mid in message_ids:
+        try:
+            out[mid] = client.get_message(mid, format="metadata", metadata_headers=headers)
+        except errors.NotFoundError:
+            # Message deleted between list and get — skip.
+            continue
+    return out
@@ -0,0 +1,215 @@
+"""
+Investor matching.
+
+Builds an in-memory index of investor email addresses from:
+  - fundraising_contacts.email
+  - contacts.email
+  - organizations.email + organizations.website (domain only)
+
+For each synced email, returns a list of investor links. Exact-email matches
+beat domain matches; if any exact match exists, domain matches are suppressed.
+
+The index is rebuilt every `REFRESH_INTERVAL_SEC` or on demand via rebuild().
+"""
+
+import re
+import threading
+import time
+from dataclasses import dataclass
+from typing import Optional
+
+
+REFRESH_INTERVAL_SEC = 900  # 15 minutes
+
+# Domains we never domain-match against (personal mailboxes).
+COMMON_PERSONAL_DOMAINS = {
+    "gmail.com", "googlemail.com",
+    "outlook.com", "hotmail.com", "live.com", "msn.com",
+    "yahoo.com", "yahoo.co.uk", "ymail.com",
+    "icloud.com", "me.com", "mac.com",
+    "aol.com", "proton.me", "protonmail.com",
+    "pm.me", "fastmail.com", "tuta.io", "hey.com",
+    "duck.com", "zoho.com",
+}
+
+
+# Also skip matching on the team's own domain (they email each other).
+# Populated from CONFIG.workspace_domain at rebuild time.
+
+
+@dataclass
+class MatchTarget:
+    fundraising_investor_id: Optional[str] = None
+    fundraising_contact_id: Optional[str] = None
+    contact_id: Optional[str] = None
+    organization_id: Optional[str] = None
+    investor_name: Optional[str] = None
+
+
+@dataclass
+class InvestorLink:
+    matched_address: str
+    match_kind: str          # exact_email | domain_match | manual
+    match_confidence: float
+    target: MatchTarget
+
+
+class InvestorIndex:
+
+    def __init__(self, own_domain: Optional[str] = None):
+        self._email_index: dict[str, MatchTarget] = {}
+        self._domain_index: dict[str, list[MatchTarget]] = {}
+        self._own_domain = (own_domain or "").lower() or None
+        self._last_built = 0.0
+        self._lock = threading.Lock()
+
+    # ------------------------------------------------------------------ build
+
+    def rebuild(self, db_conn_factory) -> None:
+        with self._lock:
+            email_idx: dict[str, MatchTarget] = {}
+            domain_idx: dict[str, list[MatchTarget]] = {}
+
+            conn = db_conn_factory()
+            try:
+                cur = conn.cursor()
+
+                # fundraising_contacts
+                cur.execute(
+                    "SELECT fc.id, fc.email, fc.investor_id, fi.investor_name "
+                    "FROM fundraising_contacts fc "
+                    "LEFT JOIN fundraising_investors fi ON fi.id = fc.investor_id "
+                    "WHERE fc.email IS NOT NULL AND fc.email != ''"
+                )
+                for r in cur.fetchall():
+                    addr = (r["email"] or "").lower().strip()
+                    if not _valid_email(addr):
+                        continue
+                    email_idx[addr] = MatchTarget(
+                        fundraising_contact_id=r["id"],
+                        fundraising_investor_id=r["investor_id"],
+                        investor_name=r["investor_name"],
+                    )
+
+                # contacts
+                cur.execute(
+                    "SELECT id, email, organization_id FROM contacts "
+                    "WHERE email IS NOT NULL AND email != ''"
+                )
+                for r in cur.fetchall():
+                    addr = (r["email"] or "").lower().strip()
+                    if not _valid_email(addr):
+                        continue
+                    # Don't overwrite a fundraising_contact match; they're higher signal.
+                    email_idx.setdefault(addr, MatchTarget(
+                        contact_id=r["id"],
+                        organization_id=r["organization_id"],
+                    ))
+
+                # organizations — domain-only match source
+                cur.execute(
+                    "SELECT id, name, email, website FROM organizations "
+                    "WHERE (email IS NOT NULL AND email != '') OR (website IS NOT NULL AND website != '')"
+                )
+                for r in cur.fetchall():
+                    for d in _domains_for_org(r):
+                        if d in COMMON_PERSONAL_DOMAINS:
+                            continue
+                        if self._own_domain and d == self._own_domain:
+                            continue
+                        domain_idx.setdefault(d, []).append(MatchTarget(
+                            organization_id=r["id"],
+                            investor_name=r["name"],
+                        ))
+            finally:
+                conn.close()
+
+            self._email_index = email_idx
+            self._domain_index = domain_idx
+            self._last_built = time.time()
+
+    def rebuild_if_stale(self, db_conn_factory) -> None:
+        if time.time() - self._last_built > REFRESH_INTERVAL_SEC:
+            self.rebuild(db_conn_factory)
+
+    # ------------------------------------------------------------------ query
+
+    def match(self, addresses: set[str], *,
+              exclude_addresses: Optional[set[str]] = None) -> list[InvestorLink]:
+        excl = {a.lower() for a in (exclude_addresses or set())}
+        candidates = {a.lower().strip() for a in addresses if a} - excl
+
+        # Exclude own domain addresses (teammates emailing each other).
+        if self._own_domain:
+            candidates = {a for a in candidates
+                          if not a.endswith("@" + self._own_domain)}
+
+        links: list[InvestorLink] = []
+        seen_targets: set[tuple] = set()
+
+        # Exact email matches first.
+        for addr in candidates:
+            t = self._email_index.get(addr)
+            if t:
+                key = (t.fundraising_contact_id, t.contact_id)
+                if key in seen_targets:
+                    continue
+                seen_targets.add(key)
+                links.append(InvestorLink(
+                    matched_address=addr,
+                    match_kind="exact_email",
+                    match_confidence=1.0,
+                    target=t,
+                ))
+
+        if links:  # exact hits short-circuit domain matching
+            return links
+
+        # Domain fallback.
+        for addr in candidates:
+            _, _, domain = addr.partition("@")
+            if not domain or domain in COMMON_PERSONAL_DOMAINS:
+                continue
+            for t in self._domain_index.get(domain, []):
+                key = ("org", t.organization_id)
+                if key in seen_targets:
+                    continue
+                seen_targets.add(key)
+                links.append(InvestorLink(
+                    matched_address=addr,
+                    match_kind="domain_match",
+                    match_confidence=0.6,
+                    target=t,
+                ))
+        return links
+
+
+# ---------------------------------------------------------------------------- helpers
+
+_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
+
+
+def _valid_email(s: str) -> bool:
+    return bool(_EMAIL_RE.match(s))
+
+
+def _domains_for_org(row) -> list[str]:
+    out: list[str] = []
+    if row["email"]:
+        _, _, d = row["email"].lower().partition("@")
+        if d:
+            out.append(d)
+    if row["website"]:
+        d = _domain_from_url(row["website"])
+        if d:
+            out.append(d)
+    return list({d for d in out if d})
+
+
+def _domain_from_url(url: str) -> Optional[str]:
+    if not url:
+        return None
+    m = re.match(r"^\s*(?:https?://)?(?:www\.)?([^/:?#\s]+)", url.strip(), re.IGNORECASE)
+    if not m:
+        return None
+    return m.group(1).lower()
@@ -0,0 +1,192 @@
+-- Gmail Integration — Phase 1 migration
+-- Creates all tables for email capture, matching, threading, attachments.
+-- This migration is IDEMPOTENT: safe to re-run.
+-- Applied by email_integration.db.apply_migrations() on server startup when
+-- CRM_GMAIL_INTEGRATION_ENABLED is truthy.
+--
+-- DO NOT modify this file in place after it ships. Create 0002_*.sql, etc.
+
+-- ============================================================================
+-- email_accounts — one row per enrolled team-member mailbox
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_accounts (
+    id                TEXT PRIMARY KEY,
+    user_id           TEXT NOT NULL,
+    email_address     TEXT NOT NULL UNIQUE,
+    auth_method       TEXT NOT NULL,                 -- 'dwd' | 'oauth'
+    oauth_refresh_enc BLOB,
+    oauth_token_enc   BLOB,
+    oauth_token_exp   TEXT,
+    sync_enabled      INTEGER NOT NULL DEFAULT 1,
+    sync_status       TEXT NOT NULL DEFAULT 'pending',
+    sync_error        TEXT,
+    last_history_id   TEXT,
+    last_synced_at    TEXT,
+    backfill_complete INTEGER NOT NULL DEFAULT 0,
+    backfill_cursor   TEXT,
+    created_at        TEXT DEFAULT (datetime('now')),
+    updated_at        TEXT DEFAULT (datetime('now')),
+    FOREIGN KEY(user_id) REFERENCES users(id)
+);
+CREATE INDEX IF NOT EXISTS idx_email_accounts_user ON email_accounts(user_id);
+CREATE INDEX IF NOT EXISTS idx_email_accounts_sync ON email_accounts(sync_enabled, sync_status);
+
+-- ============================================================================
+-- emails — canonical email record, dedup'd across accounts by RFC Message-ID
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS emails (
+    id                 TEXT PRIMARY KEY,
+    rfc_message_id     TEXT NOT NULL UNIQUE,
+    gmail_thread_id    TEXT,
+    rfc_thread_root_id TEXT,
+    thread_id          TEXT,                         -- FK email_threads.id (populated by threads.py)
+    subject            TEXT,
+    from_email         TEXT NOT NULL,
+    from_name          TEXT,
+    to_emails_json     TEXT NOT NULL DEFAULT '[]',
+    cc_emails_json     TEXT NOT NULL DEFAULT '[]',
+    bcc_emails_json    TEXT NOT NULL DEFAULT '[]',
+    reply_to           TEXT,
+    sent_at            TEXT NOT NULL,
+    body_text          TEXT,
+    body_html          TEXT,
+    snippet            TEXT,
+    in_reply_to        TEXT,
+    references_json    TEXT DEFAULT '[]',
+    has_attachments    INTEGER NOT NULL DEFAULT 0,
+    size_estimate      INTEGER,
+    is_matched         INTEGER NOT NULL DEFAULT 0,
+    match_status       TEXT NOT NULL DEFAULT 'unmatched',   -- unmatched|matched|skipped
+    raw_headers_json   TEXT,
+    created_at         TEXT DEFAULT (datetime('now')),
+    updated_at         TEXT DEFAULT (datetime('now'))
+);
+CREATE INDEX IF NOT EXISTS idx_emails_thread      ON emails(gmail_thread_id);
+CREATE INDEX IF NOT EXISTS idx_emails_rfc_thread  ON emails(rfc_thread_root_id);
+CREATE INDEX IF NOT EXISTS idx_emails_thread_fk   ON emails(thread_id);
+CREATE INDEX IF NOT EXISTS idx_emails_from        ON emails(from_email);
+CREATE INDEX IF NOT EXISTS idx_emails_sent_at     ON emails(sent_at);
+CREATE INDEX IF NOT EXISTS idx_emails_matched     ON emails(is_matched, sent_at);
+CREATE INDEX IF NOT EXISTS idx_emails_in_reply_to ON emails(in_reply_to);
+
+-- ============================================================================
+-- email_recipients — denormalized for fast address lookups
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_recipients (
+    id           TEXT PRIMARY KEY,
+    email_id     TEXT NOT NULL,
+    address      TEXT NOT NULL,
+    display_name TEXT,
+    kind         TEXT NOT NULL,              -- from|to|cc|bcc|reply_to
+    FOREIGN KEY(email_id) REFERENCES emails(id) ON DELETE CASCADE
+);
+CREATE INDEX IF NOT EXISTS idx_email_recipients_addr  ON email_recipients(address);
+CREATE INDEX IF NOT EXISTS idx_email_recipients_email ON email_recipients(email_id);
+
+-- ============================================================================
+-- email_account_messages — per-mailbox sighting of an email
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_account_messages (
+    id                TEXT PRIMARY KEY,
+    email_id          TEXT NOT NULL,
+    account_id        TEXT NOT NULL,
+    gmail_message_id  TEXT NOT NULL,
+    gmail_thread_id   TEXT NOT NULL,
+    labels_json       TEXT DEFAULT '[]',
+    is_sent           INTEGER NOT NULL DEFAULT 0,
+    first_seen_at     TEXT DEFAULT (datetime('now')),
+    deleted_at        TEXT,
+    FOREIGN KEY(email_id)   REFERENCES emails(id) ON DELETE CASCADE,
+    FOREIGN KEY(account_id) REFERENCES email_accounts(id) ON DELETE CASCADE,
+    UNIQUE(account_id, gmail_message_id)
+);
+CREATE INDEX IF NOT EXISTS idx_eam_email     ON email_account_messages(email_id);
+CREATE INDEX IF NOT EXISTS idx_eam_account   ON email_account_messages(account_id);
+CREATE INDEX IF NOT EXISTS idx_eam_gmail_msg ON email_account_messages(gmail_message_id);
+
+-- ============================================================================
+-- email_attachments — metadata; bytes on disk under data/email_attachments/
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_attachments (
+    id                  TEXT PRIMARY KEY,
+    email_id            TEXT NOT NULL,
+    gmail_attachment_id TEXT NOT NULL,
+    filename            TEXT NOT NULL,
+    sanitized_filename  TEXT NOT NULL,
+    mime_type           TEXT,
+    size_bytes          INTEGER,
+    sha256_hex          TEXT,
+    storage_path        TEXT NOT NULL,
+    download_status     TEXT NOT NULL DEFAULT 'pending',  -- pending|downloaded|failed|skipped
+    download_attempts   INTEGER NOT NULL DEFAULT 0,
+    download_error      TEXT,
+    downloaded_at       TEXT,
+    created_at          TEXT DEFAULT (datetime('now')),
+    FOREIGN KEY(email_id) REFERENCES emails(id) ON DELETE CASCADE
+);
+CREATE INDEX IF NOT EXISTS idx_attach_email  ON email_attachments(email_id);
+CREATE INDEX IF NOT EXISTS idx_attach_sha    ON email_attachments(sha256_hex);
+CREATE INDEX IF NOT EXISTS idx_attach_status ON email_attachments(download_status);
+
+-- ============================================================================
+-- email_threads — thread roll-up for UI
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_threads (
+    id                 TEXT PRIMARY KEY,
+    gmail_thread_id    TEXT,
+    rfc_thread_root_id TEXT,
+    subject_normalized TEXT,
+    first_message_at   TEXT,
+    last_message_at    TEXT,
+    message_count      INTEGER NOT NULL DEFAULT 0,
+    participant_count  INTEGER NOT NULL DEFAULT 0,
+    participants_json  TEXT DEFAULT '[]',
+    is_matched         INTEGER NOT NULL DEFAULT 0,
+    created_at         TEXT DEFAULT (datetime('now')),
+    updated_at         TEXT DEFAULT (datetime('now'))
+);
+CREATE UNIQUE INDEX IF NOT EXISTS idx_threads_gmail_uniq ON email_threads(gmail_thread_id)
+    WHERE gmail_thread_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_threads_rfc_root ON email_threads(rfc_thread_root_id);
+CREATE INDEX IF NOT EXISTS idx_threads_last_msg ON email_threads(last_message_at);
+
+-- ============================================================================
+-- email_investor_links — matched investors
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_investor_links (
+    id                      TEXT PRIMARY KEY,
+    email_id                TEXT NOT NULL,
+    fundraising_investor_id TEXT,
+    fundraising_contact_id  TEXT,
+    contact_id              TEXT,
+    organization_id         TEXT,
+    matched_address         TEXT NOT NULL,
+    match_kind              TEXT NOT NULL,            -- exact_email|domain_match|manual
+    match_confidence        REAL NOT NULL DEFAULT 1.0,
+    created_at              TEXT DEFAULT (datetime('now')),
+    FOREIGN KEY(email_id) REFERENCES emails(id) ON DELETE CASCADE
+);
+CREATE INDEX IF NOT EXISTS idx_eil_email      ON email_investor_links(email_id);
+CREATE INDEX IF NOT EXISTS idx_eil_investor   ON email_investor_links(fundraising_investor_id);
+CREATE INDEX IF NOT EXISTS idx_eil_fr_contact ON email_investor_links(fundraising_contact_id);
+CREATE INDEX IF NOT EXISTS idx_eil_contact    ON email_investor_links(contact_id);
+
+-- ============================================================================
+-- email_sync_runs — per-run observability
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS email_sync_runs (
+    id                TEXT PRIMARY KEY,
+    account_id        TEXT NOT NULL,
+    kind              TEXT NOT NULL,                  -- backfill|incremental|manual
+    started_at        TEXT NOT NULL,
+    finished_at       TEXT,
+    status            TEXT NOT NULL,                  -- running|ok|error|partial
+    messages_seen     INTEGER NOT NULL DEFAULT 0,
+    messages_stored   INTEGER NOT NULL DEFAULT 0,
+    attachments_saved INTEGER NOT NULL DEFAULT 0,
+    api_calls         INTEGER NOT NULL DEFAULT 0,
+    retries           INTEGER NOT NULL DEFAULT 0,
+    error             TEXT,
+    FOREIGN KEY(account_id) REFERENCES email_accounts(id) ON DELETE CASCADE
+);
+CREATE INDEX IF NOT EXISTS idx_sync_runs_account ON email_sync_runs(account_id, started_at);
@@ -0,0 +1,283 @@
+"""
+Parse a Gmail `users.messages.get` response (format=full) into a flat dict
+ready for db.insert_email().
+
+Input shape (abbreviated):
+  {
+    "id": "...",                     # Gmail message id
+    "threadId": "...",
+    "labelIds": ["INBOX","IMPORTANT",...],
+    "snippet": "...",
+    "historyId": "...",
+    "internalDate": "1713657600000", # ms epoch, authoritative
+    "sizeEstimate": 12345,
+    "payload": {
+      "headers": [{"name":"Subject","value":"..."}, ...],
+      "mimeType": "multipart/mixed",
+      "parts": [...recursive...],
+      "body": {"data": "<base64url>", "size": ...}
+    }
+  }
+"""
+
+import base64
+import email.utils
+import email.header
+import re
+from datetime import datetime, timezone
+from typing import Any, Iterable, Optional
+from html.parser import HTMLParser
+
+
+# ---------------------------------------------------------------------------- public
+
+def parse(message: dict, *, owning_account_address: Optional[str] = None) -> dict:
+    """Parse a Gmail message payload into our canonical dict shape."""
+    headers = _header_map(message.get("payload", {}).get("headers") or [])
+
+    from_name, from_email = _split_addr(headers.get("from", ""))
+    to_list = _parse_address_list(headers.get("to", ""))
+    cc_list = _parse_address_list(headers.get("cc", ""))
+    bcc_list = _parse_address_list(headers.get("bcc", ""))
+    reply_to = _split_addr(headers.get("reply-to", ""))[1] or None
+
+    sent_at = _parse_date_header(headers.get("date"), fallback_ms=message.get("internalDate"))
+
+    rfc_mid = headers.get("message-id", "").strip() or f"synthetic-{message.get('id')}@ten31.local"
+    rfc_mid = _strip_angle_brackets(rfc_mid)
+    in_reply_to = _strip_angle_brackets(headers.get("in-reply-to", "").strip()) or None
+    references = _split_references(headers.get("references", ""))
+    rfc_thread_root_id = references[0] if references else (in_reply_to or rfc_mid)
+
+    body_text, body_html, attachments = _walk_payload(message.get("payload", {}))
+
+    subject = _decode_rfc2047(headers.get("subject") or "")
+
+    labels = message.get("labelIds") or []
+    is_sent = "SENT" in labels
+
+    return {
+        "gmail_message_id": message.get("id"),
+        "gmail_thread_id": message.get("threadId"),
+        "rfc_message_id": rfc_mid,
+        "rfc_thread_root_id": rfc_thread_root_id,
+        "in_reply_to": in_reply_to,
+        "references": references,
+        "subject": subject,
+        "from_email": (from_email or "").lower(),
+        "from_name": from_name,
+        "to": [{"email": e.lower(), "name": n} for n, e in to_list if e],
+        "cc": [{"email": e.lower(), "name": n} for n, e in cc_list if e],
+        "bcc": [{"email": e.lower(), "name": n} for n, e in bcc_list if e],
+        "reply_to": reply_to.lower() if reply_to else None,
+        "sent_at": sent_at,
+        "body_text": _cap_text(body_text),
+        "body_html": _cap_text(body_html),
+        "snippet": message.get("snippet"),
+        "attachments": attachments,
+        "size_estimate": message.get("sizeEstimate"),
+        "labels": labels,
+        "is_sent": is_sent,
+        "raw_headers": headers,
+        "owning_account": owning_account_address,
+    }
+
+
+# ---------------------------------------------------------------------------- headers
+
+def _header_map(header_list: Iterable[dict]) -> dict[str, str]:
+    """Case-insensitive keys. Last-write-wins for duplicates (rare)."""
+    out: dict[str, str] = {}
+    for h in header_list:
+        name = (h.get("name") or "").lower()
+        out[name] = h.get("value") or ""
+    return out
+
+
+def _decode_rfc2047(s: str) -> str:
+    if not s:
+        return ""
+    try:
+        parts = email.header.decode_header(s)
+        pieces = []
+        for text, charset in parts:
+            if isinstance(text, bytes):
+                try:
+                    pieces.append(text.decode(charset or "utf-8", errors="replace"))
+                except LookupError:
+                    pieces.append(text.decode("utf-8", errors="replace"))
+            else:
+                pieces.append(text)
+        return "".join(pieces)
+    except Exception:
+        return s
+
+
+def _split_addr(raw: str) -> tuple[Optional[str], Optional[str]]:
+    if not raw:
+        return (None, None)
+    name, addr = email.utils.parseaddr(raw)
+    return (_decode_rfc2047(name) or None, addr or None)
+
+
+def _parse_address_list(raw: str) -> list[tuple[Optional[str], Optional[str]]]:
+    if not raw:
+        return []
+    parsed = email.utils.getaddresses([raw])
+    return [(_decode_rfc2047(n) or None, a or None) for n, a in parsed if a]
+
+
+def _parse_date_header(raw: Optional[str], *, fallback_ms: Optional[str]) -> str:
+    # Prefer RFC Date header, fall back to Gmail internalDate (epoch ms).
+    if raw:
+        try:
+            dt = email.utils.parsedate_to_datetime(raw)
+            if dt is not None:
+                if dt.tzinfo is None:
+                    dt = dt.replace(tzinfo=timezone.utc)
+                return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+        except (TypeError, ValueError):
+            pass
+    if fallback_ms:
+        try:
+            dt = datetime.fromtimestamp(int(fallback_ms) / 1000.0, tz=timezone.utc)
+            return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
+        except (TypeError, ValueError):
+            pass
+    return datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def _split_references(raw: str) -> list[str]:
+    if not raw:
+        return []
+    return [_strip_angle_brackets(p) for p in raw.split() if p.strip()]
+
+
+def _strip_angle_brackets(s: str) -> str:
+    s = (s or "").strip()
+    if s.startswith("<") and s.endswith(">"):
+        return s[1:-1]
+    return s
+
+
+# ---------------------------------------------------------------------------- MIME walk
+
+def _walk_payload(payload: dict) -> tuple[Optional[str], Optional[str], list[dict]]:
+    """Returns (body_text, body_html, attachments).
+
+    Depth-first walk. First plain/text wins for body_text; first text/html
+    wins for body_html. Anything with a filename or attachment disposition
+    becomes an attachment entry.
+    """
+    text: Optional[str] = None
+    html_body: Optional[str] = None
+    attachments: list[dict] = []
+
+    def visit(part: dict):
+        nonlocal text, html_body
+        mime = (part.get("mimeType") or "").lower()
+        filename = part.get("filename") or ""
+        body = part.get("body") or {}
+        parts = part.get("parts") or []
+
+        headers = _header_map(part.get("headers") or [])
+        disposition = (headers.get("content-disposition") or "").lower()
+        is_attachment = bool(filename) or disposition.startswith("attachment")
+
+        if is_attachment:
+            attachments.append({
+                "filename": filename or f"unnamed.{_ext_for(mime)}",
+                "mime_type": mime or "application/octet-stream",
+                "size": body.get("size"),
+                "gmail_attachment_id": body.get("attachmentId"),
+                # Some tiny attachments come inlined as base64; attachmentId is
+                # then missing and data is in body.data. sync.py handles both.
+                "inline_data_b64": body.get("data"),
+                "content_disposition": "inline" if disposition.startswith("inline") else "attachment",
+            })
+        else:
+            if mime == "text/plain" and text is None:
+                text = _decode_body(body)
+            elif mime == "text/html" and html_body is None:
+                html_body = _decode_body(body)
+
+        for child in parts:
+            visit(child)
+
+    visit(payload)
+
+    # Derive a plain-text body from HTML if only HTML exists.
+    if text is None and html_body:
+        text = _strip_html(html_body)
+
+    return text, html_body, attachments
+
+
+def _decode_body(body: dict) -> Optional[str]:
+    data = body.get("data")
+    if not data:
+        return None
+    try:
+        padding = 4 - (len(data) % 4)
+        if padding != 4:
+            data = data + ("=" * padding)
+        raw = base64.urlsafe_b64decode(data.encode("ascii"))
+        return raw.decode("utf-8", errors="replace").replace("\r\n", "\n")
+    except Exception:
+        return None
+
+
+# ---------------------------------------------------------------------------- HTML stripping
+
+class _HTMLToText(HTMLParser):
+    def __init__(self):
+        super().__init__()
+        self._parts: list[str] = []
+        self._skip_depth = 0
+
+    def handle_starttag(self, tag, attrs):
+        if tag in ("script", "style"):
+            self._skip_depth += 1
+        if tag in ("br", "p", "div", "tr", "li"):
+            self._parts.append("\n")
+
+    def handle_endtag(self, tag):
+        if tag in ("script", "style"):
+            self._skip_depth = max(0, self._skip_depth - 1)
+        if tag in ("p", "div", "tr"):
+            self._parts.append("\n")
+
+    def handle_data(self, data):
+        if self._skip_depth == 0:
+            self._parts.append(data)
+
+    def text(self) -> str:
+        raw = "".join(self._parts)
+        return re.sub(r"\n{3,}", "\n\n", raw).strip()
+
+
+def _strip_html(html: str) -> str:
+    p = _HTMLToText()
+    try:
+        p.feed(html)
+        return p.text()
+    except Exception:
+        return re.sub(r"<[^>]+>", " ", html)
+
+
+def _ext_for(mime: str) -> str:
+    return mime.split("/")[-1] if "/" in mime else "bin"
+
+
+# ---------------------------------------------------------------------------- caps
+
+# Keep bodies bounded to avoid a pathological 500MB message exploding the DB.
+_BODY_CAP_BYTES = 10 * 1024 * 1024  # 10MB
+
+
+def _cap_text(s: Optional[str]) -> Optional[str]:
+    if s is None:
+        return None
+    if len(s.encode("utf-8", errors="ignore")) <= _BODY_CAP_BYTES:
+        return s
+    return s[: _BODY_CAP_BYTES // 2] + "\n\n[TRUNCATED BY CRM — body exceeded 10MB]"
@@ -0,0 +1,462 @@
+"""
+HTTP route handlers for the Gmail integration.
+
+Designed to plug into server.py's CRMHandler (BaseHTTPRequestHandler) pattern.
+The hook is a single function call near the top of do_GET / do_POST that
+lets this module claim any /api/email/* request:
+
+    # in CRMHandler.do_GET and CRMHandler.do_POST, before the 404 fallthrough:
+    from email_integration.routes import try_handle
+    if try_handle(self):
+        return
+
+`try_handle(handler)` inspects `handler.command` and `handler.get_path()` and
+returns True if it handled the request (sent a response).
+
+Every handler respects the same auth / rate-limit model as the rest of server.py
+by calling handler.get_user() and handler.rate_limited(...).
+"""
+
+import json
+import sqlite3
+from typing import Optional
+
+from . import config as _cfg
+from . import credentials as _creds
+from . import crypto as _crypto
+from . import db as _db
+from . import scheduler as _sched
+
+
+# ---------------------------------------------------------------------------- dispatch
+
+_GET_ROUTES = {
+    "/api/email/status": "status",
+    "/api/email/accounts": "list_accounts",
+    "/api/email/threads": "list_threads",
+    "/api/email/oauth/start": "oauth_start",
+    "/api/email/oauth/callback": "oauth_callback",
+}
+
+_POST_ROUTES = {
+    "/api/email/accounts/enroll-all": "enroll_all",
+    "/api/email/accounts/enroll": "enroll_one",
+    "/api/email/sync/run-now": "run_now",
+    "/api/email/rematch": "rematch",
+}
+
+
+def try_handle(handler) -> bool:
+    path = handler.get_path()
+    method = handler.command
+    table = _GET_ROUTES if method == "GET" else _POST_ROUTES if method == "POST" else {}
+    name = table.get(path)
+    if not path.startswith("/api/email/"):
+        return False
+    if not name:
+        # Route is owned by this module but unknown — return a proper 404
+        # instead of letting the main dispatcher's 404 abuse counter fire.
+        handler.send_error_json("Not found", 404)
+        return True
+
+    if not _cfg.CONFIG.enabled:
+        handler.send_error_json("Email integration disabled", 503)
+        return True
+
+    # Also enforce attachment streaming under a different prefix
+    # (handled above via prefix check).
+
+    impl = globals().get(f"_h_{name}")
+    if impl is None:
+        handler.send_error_json("Not implemented", 500)
+        return True
+
+    try:
+        impl(handler)
+    except Exception as e:
+        handler.send_error_json(f"Internal error: {e}", 500)
+    return True
+
+
+# ---------------------------------------------------------------------------- helpers
+
+def _conn() -> sqlite3.Connection:
+    import os
+    db_path = os.environ.get(
+        "CRM_DB_PATH",
+        os.path.join(_cfg.CONFIG.data_dir, "crm.db"),
+    )
+    conn = sqlite3.connect(db_path)
+    conn.execute("PRAGMA journal_mode=WAL")
+    conn.execute("PRAGMA foreign_keys=ON")
+    conn.execute("PRAGMA busy_timeout=5000")
+    conn.row_factory = sqlite3.Row
+    return conn
+
+
+def _require_auth(handler) -> Optional[dict]:
+    user = handler.get_user()
+    if not user:
+        handler.send_error_json("Unauthorized", 401)
+        return None
+    return user
+
+
+def _require_admin(handler) -> Optional[dict]:
+    user = _require_auth(handler)
+    if user is None:
+        return None
+    if user.get("role") != "admin":
+        handler.send_error_json("Admin required", 403)
+        return None
+    return user
+
+
+# ---------------------------------------------------------------------------- GET handlers
+
+def _h_status(handler):
+    user = _require_auth(handler)
+    if not user:
+        return
+    snap = _sched.status_snapshot()
+    conn = _conn()
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT COUNT(*) AS n_accounts, "
+            "SUM(CASE WHEN sync_status='active' THEN 1 ELSE 0 END) AS n_active, "
+            "SUM(CASE WHEN sync_status='error' THEN 1 ELSE 0 END) AS n_error "
+            "FROM email_accounts"
+        )
+        counts = dict(cur.fetchone() or {})
+        cur.execute("SELECT COUNT(*) AS n FROM emails WHERE match_status = 'matched'")
+        snap["matched_emails"] = cur.fetchone()["n"]
+    finally:
+        conn.close()
+    snap["accounts_summary"] = counts
+    handler.send_json(snap)
+
+
+def _h_list_accounts(handler):
+    user = _require_auth(handler)
+    if not user:
+        return
+    conn = _conn()
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT id, user_id, email_address, auth_method, sync_enabled, "
+            "sync_status, sync_error, last_synced_at, backfill_complete "
+            "FROM email_accounts ORDER BY email_address"
+        )
+        rows = [dict(r) for r in cur.fetchall()]
+    finally:
+        conn.close()
+    # Non-admins only see their own row
+    if user.get("role") != "admin":
+        rows = [r for r in rows if r["user_id"] == user["user_id"]]
+    handler.send_json({"accounts": rows})
+
+
+def _h_list_threads(handler):
+    user = _require_auth(handler)
+    if not user:
+        return
+    q = handler.get_query_params()
+    investor_id = q.get("investor_id")
+    limit = min(int(q.get("limit", 50)), 500)
+    conn = _conn()
+    try:
+        cur = conn.cursor()
+        if investor_id:
+            cur.execute(
+                """SELECT t.*
+                   FROM email_threads t
+                   JOIN emails e ON e.thread_id = t.id
+                   JOIN email_investor_links l ON l.email_id = e.id
+                   WHERE l.fundraising_investor_id = ?
+                      OR l.fundraising_contact_id IN (
+                         SELECT id FROM fundraising_contacts WHERE investor_id = ?
+                      )
+                   GROUP BY t.id
+                   ORDER BY t.last_message_at DESC
+                   LIMIT ?""",
+                (investor_id, investor_id, limit),
+            )
+        else:
+            cur.execute(
+                "SELECT * FROM email_threads WHERE is_matched = 1 "
+                "ORDER BY last_message_at DESC LIMIT ?",
+                (limit,),
+            )
+        threads = [dict(r) for r in cur.fetchall()]
+    finally:
+        conn.close()
+    handler.send_json({"threads": threads})
+
+
+def _h_oauth_start(handler):
+    """Begin per-user OAuth consent flow (fallback path)."""
+    user = _require_auth(handler)
+    if not user:
+        return
+    if _cfg.CONFIG.primary_auth != "oauth":
+        return handler.send_error_json(
+            "Per-user OAuth disabled (set CRM_GMAIL_AUTH_METHOD=oauth to enable)", 400
+        )
+    q = handler.get_query_params()
+    account_email = q.get("account_email") or ""
+    if not account_email:
+        return handler.send_error_json("account_email required", 400)
+
+    import secrets
+    import urllib.parse
+    state = secrets.token_urlsafe(32)
+    _oauth_state_store(state, user["user_id"], account_email)
+
+    params = {
+        "client_id": _cfg.CONFIG.oauth_client_id,
+        "redirect_uri": _cfg.CONFIG.oauth_redirect_uri,
+        "response_type": "code",
+        "scope": _creds.GMAIL_READONLY_SCOPE,
+        "access_type": "offline",
+        "prompt": "consent",
+        "state": state,
+        "login_hint": account_email,
+    }
+    url = "https://accounts.google.com/o/oauth2/v2/auth?" + urllib.parse.urlencode(params)
+    handler.send_json({"redirect_url": url})
+
+
+def _h_oauth_callback(handler):
+    """Exchange code for tokens, encrypt refresh token, store."""
+    q = handler.get_query_params()
+    code = q.get("code")
+    state = q.get("state")
+    if not code or not state:
+        return handler.send_error_json("code and state required", 400)
+
+    state_row = _oauth_state_consume(state)
+    if not state_row:
+        return handler.send_error_json("Invalid state", 400)
+
+    import urllib.parse
+    import urllib.request
+    body = urllib.parse.urlencode({
+        "code": code,
+        "client_id": _cfg.CONFIG.oauth_client_id,
+        "client_secret": _cfg.CONFIG.oauth_client_secret,
+        "redirect_uri": _cfg.CONFIG.oauth_redirect_uri,
+        "grant_type": "authorization_code",
+    }).encode("ascii")
+    req = urllib.request.Request(
+        "https://oauth2.googleapis.com/token",
+        data=body,
+        headers={"Content-Type": "application/x-www-form-urlencoded"},
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=15) as resp:
+            payload = json.loads(resp.read())
+    except Exception as e:
+        return handler.send_error_json(f"Token exchange failed: {e}", 500)
+
+    refresh = payload.get("refresh_token")
+    if not refresh:
+        return handler.send_error_json("No refresh_token returned (user may have previously consented; prompt=consent required)", 400)
+
+    enc = _crypto.encrypt(refresh.encode("ascii"), secret_key_b64=_cfg.CONFIG.secret_key_b64)
+
+    conn = _conn()
+    try:
+        _db.upsert_account(conn, user_id=state_row["user_id"],
+                           email_address=state_row["account_email"],
+                           auth_method="oauth")
+        conn.execute(
+            "UPDATE email_accounts SET oauth_refresh_enc = ?, sync_status = 'pending', "
+            "updated_at = datetime('now') WHERE email_address = ?",
+            (enc, state_row["account_email"]),
+        )
+        conn.commit()
+    finally:
+        conn.close()
+
+    handler.send_json({"ok": True, "account_email": state_row["account_email"]})
+
+
+# ---------------------------------------------------------------------------- POST handlers
+
+def _h_enroll_all(handler):
+    """Admin: enroll every CRM user whose email is @workspace_domain via DWD."""
+    user = _require_admin(handler)
+    if not user:
+        return
+    if _cfg.CONFIG.primary_auth != "dwd":
+        return handler.send_error_json("enroll-all only valid in DWD mode", 400)
+    domain = _cfg.CONFIG.workspace_domain
+    if not domain:
+        return handler.send_error_json("CRM_GMAIL_WORKSPACE_DOMAIN not set", 400)
+
+    conn = _conn()
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT id, email FROM users WHERE is_active = 1 AND email LIKE ?",
+            (f"%@{domain}",),
+        )
+        users = cur.fetchall()
+        created = []
+        for u in users:
+            aid = _db.upsert_account(conn, user_id=u["id"],
+                                     email_address=u["email"].lower(),
+                                     auth_method="dwd")
+            created.append({"account_id": aid, "email": u["email"]})
+        conn.commit()
+    finally:
+        conn.close()
+    handler.send_json({"enrolled": created, "count": len(created)})
+
+
+def _h_enroll_one(handler):
+    user = _require_admin(handler)
+    if not user:
+        return
+    body = handler.get_body() or {}
+    # Accept either `email` or `email_address` for ergonomics.
+    email_address = (body.get("email_address") or body.get("email") or "").lower().strip()
+    user_id = body.get("user_id")
+    auth_method = body.get("auth_method") or _cfg.CONFIG.primary_auth
+
+    if not email_address:
+        return handler.send_error_json("email (or email_address) required", 400)
+
+    # If the caller didn't specify a CRM user_id, resolve it from the
+    # users table by matching email. Falls back to the authenticated
+    # admin's own id (handles the common case of a single admin
+    # enrolling themselves without having to paste their UUID).
+    if not user_id:
+        conn = _conn()
+        try:
+            cur = conn.cursor()
+            cur.execute("SELECT id FROM users WHERE LOWER(email) = ?",
+                        (email_address,))
+            row = cur.fetchone()
+            user_id = row["id"] if row else user.get("id")
+        finally:
+            conn.close()
+
+    if not user_id:
+        return handler.send_error_json("could not resolve user_id for that email", 400)
+
+    conn = _conn()
+    try:
+        aid = _db.upsert_account(conn, user_id=user_id,
+                                 email_address=email_address,
+                                 auth_method=auth_method)
+        conn.commit()
+    finally:
+        conn.close()
+    handler.send_json({"account_id": aid, "email": email_address, "user_id": user_id})
+
+
+def _h_run_now(handler):
+    user = _require_admin(handler)
+    if not user:
+        return
+    # Reuse existing rate limit so admins can't hammer this.
+    if handler.rate_limited("email-sync-now", 6):
+        return handler.send_error_json("Too many requests", 429)
+    result = _sched.trigger_run_now()
+    handler.send_json(result)
+
+
+def _h_rematch(handler):
+    """Re-evaluate unmatched emails against the current investor index."""
+    user = _require_admin(handler)
+    if not user:
+        return
+    body = handler.get_body() or {}
+    since = body.get("since")  # optional ISO8601
+    conn = _conn()
+    scanned = 0
+    matched = 0
+    try:
+        from .matcher import InvestorIndex
+        index = InvestorIndex(own_domain=_cfg.CONFIG.workspace_domain)
+        index.rebuild(_conn)
+        cur = conn.cursor()
+        sql = ("SELECT id, from_email, to_emails_json, cc_emails_json "
+               "FROM emails WHERE match_status = 'unmatched'")
+        params: list = []
+        if since:
+            sql += " AND sent_at >= ?"
+            params.append(since)
+        sql += " ORDER BY sent_at DESC LIMIT 10000"
+        cur.execute(sql, params)
+        for row in cur.fetchall():
+            scanned += 1
+            participants = set()
+            if row["from_email"]:
+                participants.add(row["from_email"].lower())
+            for col in ("to_emails_json", "cc_emails_json"):
+                try:
+                    arr = json.loads(row[col] or "[]")
+                except Exception:
+                    arr = []
+                for a in arr:
+                    e = a.get("email") if isinstance(a, dict) else a
+                    if e:
+                        participants.add(e.lower())
+            links = index.match(participants)
+            if not links:
+                continue
+            matched += 1
+            conn.execute(
+                "UPDATE emails SET match_status='matched', is_matched=1, "
+                "updated_at=datetime('now') WHERE id=?",
+                (row["id"],),
+            )
+            for link in links:
+                _db.insert_investor_link(conn, email_id=row["id"], link={
+                    "matched_address": link.matched_address,
+                    "match_kind": link.match_kind,
+                    "match_confidence": link.match_confidence,
+                    "fundraising_investor_id": link.target.fundraising_investor_id,
+                    "fundraising_contact_id": link.target.fundraising_contact_id,
+                    "contact_id": link.target.contact_id,
+                    "organization_id": link.target.organization_id,
+                })
+            # NOTE: body is still missing — we only have headers. A follow-up
+            # job can re-fetch the full message from Gmail using the sighting's
+            # gmail_message_id. Not done inline to keep this endpoint fast.
+        conn.commit()
+    finally:
+        conn.close()
+    handler.send_json({"scanned": scanned, "newly_matched": matched})
+
+
+# ---------------------------------------------------------------------------- OAuth state store (in-memory)
+# For a 5-person CRM the state store doesn't need to be durable — a server
+# restart between start and callback is rare and just requires a retry.
+
+_oauth_states: dict[str, dict] = {}
+_oauth_state_lock = __import__("threading").Lock()
+
+
+def _oauth_state_store(state: str, user_id: str, account_email: str) -> None:
+    import time
+    with _oauth_state_lock:
+        # Prune stale entries (>10 min).
+        cutoff = time.time() - 600
+        for k, v in list(_oauth_states.items()):
+            if v["created"] < cutoff:
+                _oauth_states.pop(k, None)
+        _oauth_states[state] = {
+            "user_id": user_id,
+            "account_email": account_email.lower().strip(),
+            "created": time.time(),
+        }
+
+
+def _oauth_state_consume(state: str) -> Optional[dict]:
+    with _oauth_state_lock:
+        return _oauth_states.pop(state, None)
@@ -0,0 +1,143 @@
+"""
+Background sync scheduler.
+
+Runs as a daemon thread started from server.py main(). One thread; it wakes
+every `sync_interval_sec`, processes all accounts serially, sleeps again.
+
+Singleton: start_sync_scheduler() is idempotent — calling twice won't spawn
+a second thread. stop_sync_scheduler() gracefully signals shutdown (not
+strictly needed since it's daemon, but useful for tests).
+"""
+
+import logging
+import sqlite3
+import threading
+import time
+from typing import Callable, Optional
+
+from . import config as _cfg
+from . import credentials as _creds
+from . import sync as _sync
+from .matcher import InvestorIndex
+
+
+log = logging.getLogger("email_integration.scheduler")
+
+
+_state: dict[str, object] = {
+    "thread": None,
+    "stop": threading.Event(),
+    "last_run": 0.0,
+    "last_result": None,
+    "running_now": False,
+}
+
+
+def _conn_factory_from_env() -> Callable[[], sqlite3.Connection]:
+    """Build a get_db() compatible with server.py's pattern.
+
+    We don't import server.py (avoid circular / startup ordering). Instead
+    we re-implement the same settings. If server.py's DB path differs from
+    the default, CRM_DB_PATH env var should be set — same mechanism.
+    """
+    import os
+    db_path = os.environ.get(
+        "CRM_DB_PATH",
+        os.path.join(_cfg.CONFIG.data_dir, "crm.db"),
+    )
+
+    def get_db() -> sqlite3.Connection:
+        conn = sqlite3.connect(db_path)
+        conn.execute("PRAGMA journal_mode=WAL")
+        conn.execute("PRAGMA foreign_keys=ON")
+        conn.execute("PRAGMA busy_timeout=5000")
+        conn.row_factory = sqlite3.Row
+        return conn
+
+    return get_db
+
+
+def start_sync_scheduler(conn_factory: Optional[Callable] = None) -> None:
+    if _state["thread"] is not None:
+        return  # already running
+
+    if not _cfg.CONFIG.enabled:
+        log.info("email_integration not enabled; scheduler will not start")
+        return
+
+    factory = conn_factory or _conn_factory_from_env()
+
+    try:
+        provider = _creds.build_provider(factory)
+    except Exception as e:
+        log.exception("cannot build credential provider: %s", e)
+        return
+
+    index = InvestorIndex(own_domain=_cfg.CONFIG.workspace_domain)
+    try:
+        index.rebuild(factory)
+    except Exception:
+        log.exception("initial investor-index build failed; scheduler continues")
+
+    stop = threading.Event()
+    _state["stop"] = stop
+
+    def _loop():
+        log.info("email sync scheduler started; interval=%ss", _cfg.CONFIG.sync_interval_sec)
+        # First cycle: short delay to let server finish startup.
+        if stop.wait(10):
+            return
+        while not stop.is_set():
+            _state["running_now"] = True
+            t0 = time.time()
+            try:
+                result = _sync.sync_all(factory, provider, index)
+                _state["last_result"] = result
+            except Exception:
+                log.exception("sync loop crashed; will retry next cycle")
+            finally:
+                _state["running_now"] = False
+                _state["last_run"] = t0
+            if stop.wait(_cfg.CONFIG.sync_interval_sec):
+                return
+
+    t = threading.Thread(target=_loop, name="email-sync", daemon=True)
+    t.start()
+    _state["thread"] = t
+    _state["provider"] = provider
+    _state["index"] = index
+    _state["factory"] = factory
+
+
+def stop_sync_scheduler() -> None:
+    ev: threading.Event = _state["stop"]  # type: ignore
+    ev.set()
+    t = _state.get("thread")
+    if t:
+        try:
+            t.join(timeout=5)
+        except Exception:
+            pass
+    _state["thread"] = None
+
+
+def trigger_run_now() -> dict:
+    """Force a single sync pass synchronously (admin 'sync now' endpoint)."""
+    if _state.get("running_now"):
+        return {"status": "already_running"}
+    factory = _state.get("factory")
+    provider = _state.get("provider")
+    index = _state.get("index")
+    if not (factory and provider and index):
+        return {"status": "not_initialized"}
+    return _sync.sync_all(factory, provider, index)  # type: ignore
+
+
+def status_snapshot() -> dict:
+    return {
+        "enabled": _cfg.CONFIG.enabled,
+        "running": _state["running_now"],
+        "last_run_unix": _state.get("last_run"),
+        "last_result": _state.get("last_result"),
+        "interval_sec": _cfg.CONFIG.sync_interval_sec,
+    }
@@ -0,0 +1,390 @@
+"""
+Sync orchestrator.
+
+Top-level entry points:
+
+  sync_account(conn_factory, credential_provider, account_row, matcher)
+      Full sync pass for one mailbox. Decides backfill vs. incremental based
+      on email_accounts.backfill_complete. Writes a sync_runs row.
+
+  sync_all(conn_factory, credential_provider, matcher)
+      Iterates every sync-enabled account sequentially. Called from
+      scheduler.py every CRM_GMAIL_SYNC_INTERVAL_MIN minutes.
+
+Design: match-only storage (see architecture doc §7). For each message:
+  1. Fetch metadata (cheap, 5 units).
+  2. Run matcher against participant addresses.
+  3. If matched → fetch full message, parse, persist body + register attachments.
+  4. If unmatched → persist header-only row.
+  5. In both cases, record the per-account sighting.
+"""
+
+import logging
+import sqlite3
+import traceback
+from typing import Optional
+
+from . import attachments as _attach
+from . import config as _cfg
+from . import db as _db
+from . import errors as _errors
+from . import gmail_client as _gmail
+from . import parser as _parser
+from . import threads as _threads
+from .matcher import InvestorIndex, InvestorLink
+
+
+log = logging.getLogger("email_integration.sync")
+
+
+METADATA_HEADERS = [
+    "From", "To", "Cc", "Bcc", "Subject", "Date",
+    "Message-ID", "In-Reply-To", "References", "Reply-To",
+]
+
+
+# ---------------------------------------------------------------------------- public
+
+def sync_all(conn_factory, credential_provider, index: InvestorIndex) -> dict:
+    """Run one pass across all enabled accounts. Returns summary stats."""
+    index.rebuild_if_stale(conn_factory)
+
+    conn = conn_factory()
+    try:
+        accounts = _db.list_sync_ready_accounts(conn)
+    finally:
+        conn.close()
+
+    totals = {"accounts": 0, "messages_stored": 0, "errors": 0}
+    for acc in accounts:
+        totals["accounts"] += 1
+        try:
+            stats = sync_account(conn_factory, credential_provider, acc, index)
+            totals["messages_stored"] += stats.get("messages_stored", 0)
+        except Exception:
+            totals["errors"] += 1
+            log.exception("sync failed for account %s", acc["email_address"])
+    return totals
+
+
+def sync_account(conn_factory, credential_provider, account,
+                 index: InvestorIndex) -> dict:
+    """Sync a single mailbox. Returns stats dict."""
+    email_addr = account["email_address"]
+    stats = _gmail.CallStats()
+    client = _gmail.GmailClient(credential_provider, email_addr, stats=stats)
+
+    # Mark running
+    conn = conn_factory()
+    try:
+        run_id = _db.start_sync_run(conn,
+                                    account_id=account["id"],
+                                    kind="backfill" if not account["backfill_complete"] else "incremental")
+        _db.set_account_status(conn, account["id"], status="active", error=None)
+        conn.commit()
+    finally:
+        conn.close()
+
+    run_stats = {"messages_seen": 0, "messages_stored": 0, "attachments_saved": 0}
+    error_str: Optional[str] = None
+    status = "ok"
+
+    try:
+        if not account["backfill_complete"]:
+            _run_backfill(conn_factory, client, account, index, run_stats)
+        else:
+            _run_incremental(conn_factory, client, account, index, run_stats)
+
+        # Drain attachments for this account.
+        conn = conn_factory()
+        try:
+            # Limit to a few cycles' worth of attachments per pass.
+            batched = _attach.drain_pending(conn_factory, client, account["id"], limit=100)
+            run_stats["attachments_saved"] = batched
+        finally:
+            conn.close()
+
+    except _errors.AuthError as e:
+        error_str = f"auth: {e}"
+        status = "error"
+    except _errors.HistoryExpiredError:
+        # Recover: reset to date-based backfill from last_synced_at.
+        error_str = "history expired; fallback to date backfill"
+        status = "partial"
+        _fallback_date_backfill(conn_factory, client, account, index, run_stats)
+    except Exception as e:
+        error_str = f"unexpected: {type(e).__name__}: {e}"
+        status = "error"
+        log.exception("unexpected during sync of %s", email_addr)
+    finally:
+        run_stats["api_calls"] = stats.api_calls
+        run_stats["retries"] = stats.retries
+        conn = conn_factory()
+        try:
+            _db.finish_sync_run(conn, run_id, status=status, stats=run_stats, error=error_str)
+            _db.set_account_status(conn, account["id"],
+                                   status="active" if status == "ok" else status,
+                                   error=error_str)
+            _db.set_account_checkpoint(conn, account["id"],
+                                       last_synced_at=_db._now_iso())
+            conn.commit()
+        finally:
+            conn.close()
+
+    return run_stats
+
+
+# ---------------------------------------------------------------------------- backfill
+
+def _run_backfill(conn_factory, client, account, index: InvestorIndex,
+                  run_stats: dict) -> None:
+    """Initial full-mailbox backfill, resumable via backfill_cursor."""
+    page_token = account["backfill_cursor"]
+    while True:
+        resp = client.list_messages(page_token=page_token,
+                                    max_results=_cfg.CONFIG.backfill_page_size)
+        messages = resp.get("messages") or []
+        for m in messages:
+            run_stats["messages_seen"] += 1
+            try:
+                _process_one_message(conn_factory, client, account, index,
+                                     gmail_message_id=m["id"], run_stats=run_stats)
+            except _errors.GmailError as e:
+                log.warning("skip msg %s on %s: %s", m["id"], account["email_address"], e)
+                continue
+
+        page_token = resp.get("nextPageToken")
+        conn = conn_factory()
+        try:
+            _db.set_account_checkpoint(conn, account["id"],
+                                       backfill_cursor=page_token,
+                                       backfill_complete=(not page_token))
+            conn.commit()
+        finally:
+            conn.close()
+
+        if not page_token:
+            # Capture current historyId as checkpoint for future incrementals.
+            prof = client.get_profile()
+            hid = prof.get("historyId")
+            if hid:
+                conn = conn_factory()
+                try:
+                    _db.set_account_checkpoint(conn, account["id"], history_id=str(hid))
+                    conn.commit()
+                finally:
+                    conn.close()
+            return
+
+
+# ---------------------------------------------------------------------------- incremental
+
+def _run_incremental(conn_factory, client, account, index: InvestorIndex,
+                     run_stats: dict) -> None:
+    start_hid = account["last_history_id"]
+    if not start_hid:
+        # Safety: if checkpoint is missing, re-enter backfill.
+        _run_backfill(conn_factory, client, account, index, run_stats)
+        return
+
+    # history_types filter limits bandwidth to what we care about.
+    new_hid: Optional[str] = None
+    try:
+        for h in client.iter_history(
+            start_history_id=start_hid,
+            history_types=["messageAdded", "messageDeleted", "labelAdded", "labelRemoved"],
+        ):
+            for ma in h.get("messagesAdded") or []:
+                msg = ma.get("message") or {}
+                run_stats["messages_seen"] += 1
+                try:
+                    _process_one_message(conn_factory, client, account, index,
+                                         gmail_message_id=msg.get("id"),
+                                         run_stats=run_stats)
+                except _errors.GmailError as e:
+                    log.warning("skip msg %s on %s: %s", msg.get("id"), account["email_address"], e)
+
+            for md in h.get("messagesDeleted") or []:
+                msg = md.get("message") or {}
+                conn = conn_factory()
+                try:
+                    _db.tombstone_sighting(
+                        conn,
+                        account_id=account["id"],
+                        gmail_message_id=msg.get("id"),
+                    )
+                    conn.commit()
+                finally:
+                    conn.close()
+
+            for la in (h.get("labelsAdded") or []) + (h.get("labelsRemoved") or []):
+                msg = la.get("message") or {}
+                # labels are the resulting label set in Gmail's payload after
+                # the change. We refresh them wholesale.
+                labels = msg.get("labelIds") or []
+                conn = conn_factory()
+                try:
+                    _db.update_sighting_labels(
+                        conn,
+                        account_id=account["id"],
+                        gmail_message_id=msg.get("id"),
+                        labels=labels,
+                    )
+                    conn.commit()
+                finally:
+                    conn.close()
+        new_hid = client.last_history_id
+    except _errors.HistoryExpiredError:
+        raise
+
+    if new_hid:
+        conn = conn_factory()
+        try:
+            _db.set_account_checkpoint(conn, account["id"], history_id=str(new_hid))
+            conn.commit()
+        finally:
+            conn.close()
+
+
+def _fallback_date_backfill(conn_factory, client, account, index, run_stats):
+    """Used when startHistoryId has been pruned by Gmail.
+
+    Pulls everything since last_synced_at (or 14d if unknown), which will
+    hit a large overlap with existing data but upserts are idempotent.
+    """
+    from datetime import datetime, timedelta, timezone
+    since = account["last_synced_at"] or (
+        datetime.now(tz=timezone.utc) - timedelta(days=14)
+    ).strftime("%Y-%m-%dT%H:%M:%SZ")
+    q = f"after:{since.replace('-', '/').split('T')[0]}"
+    for m in client.iter_messages(q=q):
+        run_stats["messages_seen"] += 1
+        try:
+            _process_one_message(conn_factory, client, account, index,
+                                 gmail_message_id=m["id"], run_stats=run_stats)
+        except _errors.GmailError as e:
+            log.warning("skip during date-backfill msg %s: %s", m["id"], e)
+    prof = client.get_profile()
+    hid = prof.get("historyId")
+    if hid:
+        conn = conn_factory()
+        try:
+            _db.set_account_checkpoint(conn, account["id"], history_id=str(hid))
+            conn.commit()
+        finally:
+            conn.close()
+
+
+# ---------------------------------------------------------------------------- per-message
+
+def _process_one_message(conn_factory, client, account, index: InvestorIndex,
+                         *, gmail_message_id: str, run_stats: dict) -> None:
+    """Fetch, match, persist one message. Idempotent."""
+    if not gmail_message_id:
+        return
+
+    # Skip if we've already sighted this message for this account.
+    conn = conn_factory()
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT email_id FROM email_account_messages "
+            "WHERE account_id = ? AND gmail_message_id = ?",
+            (account["id"], gmail_message_id),
+        )
+        if cur.fetchone():
+            return
+    finally:
+        conn.close()
+
+    # 1. Metadata fetch (cheap).
+    meta = client.get_message(gmail_message_id, format="metadata",
+                              metadata_headers=METADATA_HEADERS)
+    meta_parsed = _parser.parse(meta, owning_account_address=account["email_address"])
+
+    participants = set()
+    if meta_parsed.get("from_email"):
+        participants.add(meta_parsed["from_email"])
+    for kind in ("to", "cc", "bcc"):
+        for a in meta_parsed.get(kind, []):
+            if isinstance(a, dict) and a.get("email"):
+                participants.add(a["email"])
+
+    # Exclude owning account's own address so we don't try to "match" ourselves.
+    own = {account["email_address"].lower()}
+    links = index.match(participants, exclude_addresses=own)
+    is_matched = bool(links)
+
+    # 2. If matched, fetch full and parse for body + attachments.
+    if is_matched:
+        full = client.get_message(gmail_message_id, format="full")
+        parsed = _parser.parse(full, owning_account_address=account["email_address"])
+    else:
+        parsed = meta_parsed
+        # Strip any body fields (metadata fetch shouldn't have them but be safe).
+        parsed["body_text"] = None
+        parsed["body_html"] = None
+        parsed["attachments"] = []
+
+    # 3. Persist (idempotent on rfc_message_id).
+    conn = conn_factory()
+    try:
+        existing = _db.find_email_by_rfc_id(conn, parsed["rfc_message_id"])
+        if existing:
+            email_id = existing["id"]
+            # If the email was previously unmatched but now matches (e.g. user
+            # added the investor after first sight), upgrade the row.
+            if is_matched and existing["match_status"] == "unmatched":
+                conn.execute(
+                    "UPDATE emails SET match_status = 'matched', is_matched = 1, "
+                    "body_text = ?, body_html = ?, updated_at = datetime('now') "
+                    "WHERE id = ?",
+                    (parsed.get("body_text"), parsed.get("body_html"), email_id),
+                )
+                _attach.register_stubs(conn,
+                                       email_id=email_id,
+                                       parsed_attachments=parsed.get("attachments") or [])
+                for link in links:
+                    _db.insert_investor_link(conn, email_id=email_id, link=_flatten_link(link))
+        else:
+            match_status = "matched" if is_matched else "unmatched"
+            email_id = _db.insert_email(conn, parsed=parsed, match_status=match_status)
+            thread_id = _threads.resolve_thread_id(conn, parsed)
+            _db.set_email_thread(conn, email_id, thread_id)
+            if is_matched:
+                _attach.register_stubs(conn,
+                                       email_id=email_id,
+                                       parsed_attachments=parsed.get("attachments") or [])
+                for link in links:
+                    _db.insert_investor_link(conn, email_id=email_id, link=_flatten_link(link))
+            _db.rollup_thread(conn, thread_id)
+            run_stats["messages_stored"] += 1
+
+        # Record sighting (always, even if email row was pre-existing).
+        _db.upsert_sighting(
+            conn,
+            email_id=email_id,
+            account_id=account["id"],
+            gmail_message_id=gmail_message_id,
+            gmail_thread_id=parsed.get("gmail_thread_id") or "",
+            labels=parsed.get("labels", []),
+            is_sent=parsed.get("is_sent", False),
+        )
+        conn.commit()
+    except sqlite3.IntegrityError:
+        # Concurrent insert race — re-read and proceed.
+        pass
+    finally:
+        conn.close()
+
+
+def _flatten_link(link: InvestorLink) -> dict:
+    return {
+        "matched_address": link.matched_address,
+        "match_kind": link.match_kind,
+        "match_confidence": link.match_confidence,
+        "fundraising_investor_id": link.target.fundraising_investor_id,
+        "fundraising_contact_id": link.target.fundraising_contact_id,
+        "contact_id": link.target.contact_id,
+        "organization_id": link.target.organization_id,
+    }
@@ -0,0 +1,75 @@
+"""
+Threading resolution.
+
+Given a freshly-inserted emails row (or its about-to-be-inserted parsed dict),
+figure out which email_threads row it belongs to. If none exists, create one.
+
+Priority order (see architecture doc §10):
+  1. Existing email in our DB that shares any RFC Message-ID with this one's
+     References/In-Reply-To chain — inherit its thread.
+  2. Existing thread with the same gmail_thread_id.
+  3. Existing thread with the same rfc_thread_root_id.
+  4. Create a new thread.
+"""
+
+import re
+import sqlite3
+from typing import Optional
+
+from . import db as _db
+
+
+SUBJECT_PREFIX_RE = re.compile(r"^\s*(re|fwd?|aw|sv|antw|回复|fw)\s*:\s*", re.IGNORECASE)
+
+
+def normalize_subject(s: Optional[str]) -> Optional[str]:
+    if not s:
+        return None
+    out = s
+    # Strip up to 5 nested Re:/Fwd: prefixes.
+    for _ in range(5):
+        new = SUBJECT_PREFIX_RE.sub("", out, count=1)
+        if new == out:
+            break
+        out = new
+    return out.strip().lower()
+
+
+def resolve_thread_id(conn: sqlite3.Connection, parsed: dict) -> str:
+    """Returns a thread_id — either an existing one or a newly created one."""
+    # Step 1: RFC cross-link.
+    candidates = list(parsed.get("references") or [])
+    if parsed.get("in_reply_to"):
+        candidates.append(parsed["in_reply_to"])
+
+    if candidates:
+        existing_email_id = _db.find_email_id_by_any_rfc_id(conn, candidates)
+        if existing_email_id:
+            cur = conn.cursor()
+            cur.execute("SELECT thread_id FROM emails WHERE id = ?", (existing_email_id,))
+            row = cur.fetchone()
+            if row and row["thread_id"]:
+                return row["thread_id"]
+
+    # Step 2: gmail_thread_id match.
+    gt = parsed.get("gmail_thread_id")
+    if gt:
+        existing = _db.find_thread_by_gmail_id(conn, gt)
+        if existing:
+            return existing["id"]
+
+    # Step 3: RFC thread-root match.
+    rfc_root = parsed.get("rfc_thread_root_id")
+    if rfc_root:
+        existing = _db.find_thread_by_rfc_root(conn, rfc_root)
+        if existing:
+            return existing["id"]
+
+    # Step 4: create.
+    return _db.create_thread(
+        conn,
+        gmail_thread_id=gt,
+        rfc_thread_root_id=rfc_root,
+        subject_normalized=normalize_subject(parsed.get("subject")),
+        first_message_at=parsed.get("sent_at"),
+    )