# Gmail Integration: Enablement Runbook

*How to turn on the (already-built) Gmail correspondence integration on the live Start9 box, validate it with a small observed backfill, then roll out to the domain. Read-only capture; all mail stays on Ten31 infrastructure.*

Code: `backend/email_integration/`. Schema: `migrations/0001_email_tables.sql`. See `docs/crm-overview.md` §2.4 for the data model.

---

## What this does & the sovereignty posture

- Pulls Gmail messages for enrolled `@ten31.xyz` mailboxes into the CRM's own SQLite DB (`emails`, `email_threads`, `email_attachments`, …), **deduped across inboxes**, **threaded**, and **matched** to investors/contacts (`email_investor_links`).
- **Scope is `https://www.googleapis.com/auth/gmail.readonly`** (`credentials.py:34`): the integration can *read* mail, never send or modify. Lower risk, and it's all the ingest needs.
- **Data path is Google → your Start9 box only.** No new third party, and per guardrail #9 Claude never reads the mail; the correspondence becomes ingest input for *local* embeddings (bge-m3 on the Sparks), not API context. (Contrast with Superhuman's MCP; see §7.)

## 0. Pick the auth method

| Method | When | What you provide |
|---|---|---|
| **DWD (domain-wide delegation)** (recommended) | You administer the `ten31.xyz` Google Workspace and want to capture team mailboxes without per-user consent | One service-account JSON key + a Workspace admin authorization |
| **Per-user OAuth** | Capturing a mailbox you don't admin, or avoiding DWD | OAuth client id/secret + each user clicks through `/api/email/oauth/start` |

The Start9 0.4 entrypoint is built around **DWD** (auto-detects the key, sets `CRM_GMAIL_AUTH_METHOD=dwd`, `CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz`). The rest of this runbook assumes DWD.

## 1. Google-side setup (one time)

You need Workspace **super-admin** + a GCP project.

1. **GCP project** → enable the **Gmail API** (`APIs & Services → Library → Gmail API → Enable`).
2. **Create a service account** (`IAM & Admin → Service Accounts`). Note its **client ID** (a long number) and its email.
3. **Create a JSON key** for it (`Keys → Add key → JSON`). This file is the secret; handle per guardrail #7.
4. **Authorize domain-wide delegation** in the Workspace **Admin console** (`Security → Access and data control → API controls → Domain-wide delegation → Add new`):
   - **Client ID** = the service account's client ID from step 2.
   - **OAuth scopes** = `https://www.googleapis.com/auth/gmail.readonly`
   - Save. (Without this exact scope authorized, sync returns a non-retryable auth error; see `errors.py:21`.)

## 2. Install the key on Start9

1. Copy the JSON key to the service's data volume at **`/data/secrets/gmail-service-account.json`**.
2. Lock it down: `chmod 600 /data/secrets/gmail-service-account.json` (the entrypoint also `chmod 700`s `/data/secrets`).
3. **Restart the service.** On boot the 0.4 entrypoint detects the key and exports: `CRM_GMAIL_INTEGRATION_ENABLED=true`, `CRM_GMAIL_AUTH_METHOD=dwd`, `CRM_GMAIL_SA_KEY_PATH=/data/secrets/gmail-service-account.json`, `CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz`, `CRM_GMAIL_SYNC_INTERVAL_MIN=180`. It logs `Gmail integration: ENABLED (key at …)`.

## 3. Smoke test: ONE mailbox first (the "don't rush it" gate)

Do a single-mailbox run before enrolling the whole team, to shake out auth/matching bugs on a small surface. All calls need an **admin Bearer token**:

```bash
# -k skips TLS certificate verification; -s silences the progress meter
CRM="https://<crm-host>"   # the CRM's address

# log in and extract the bearer token from the JSON response
TOKEN=$(curl -sk "$CRM/api/auth/login" -H 'Content-Type: application/json' \
  -d '{"username":"<username>","password":"<password>"}' \
  | python3 -c 'import sys,json;print(json.load(sys.stdin)["token"])')

# integration alive?
curl -sk "$CRM/api/email/status" -H "Authorization: Bearer $TOKEN"

# enroll just yourself
curl -sk "$CRM/api/email/accounts/enroll" -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' -d '{"email":"you@ten31.xyz"}'

# trigger a sync now (otherwise it runs every 180 min)
curl -sk "$CRM/api/email/sync/run-now" -X POST -H "Authorization: Bearer $TOKEN"
```

**Tip:** to keep the first backfill small, set `CRM_GMAIL_BACKFILL_PAGE_SIZE` low (e.g. `50`) before the restart, watch one page land, then raise it.

## 4. Verify (on the box, read-only SQL)

```sql
-- sync ran cleanly?
SELECT kind, status, messages_seen, messages_stored, attachments_saved, error
FROM email_sync_runs
ORDER BY started_at DESC
LIMIT 3;

-- mail captured + how much got matched to investors/contacts
SELECT COUNT(*) AS total, SUM(is_matched) AS matched FROM emails;

-- who did it match, and how confidently?
SELECT match_kind, COUNT(*) FROM email_investor_links GROUP BY match_kind;
```

Or via the API: `GET /api/email/status` (counts) and `GET /api/email/threads?investor_id=<id>` (matched threads for one investor). If matching looks thin, run `POST /api/email/rematch` with `{"since":""}` after the investor list is populated.

## 5. Roll out to the domain

Once the single mailbox looks right:

```bash
curl -sk "$CRM/api/email/accounts/enroll-all" -X POST -H "Authorization: Bearer $TOKEN"
curl -sk "$CRM/api/email/sync/run-now" -X POST -H "Authorization: Bearer $TOKEN"
```

Incremental sync then runs every `CRM_GMAIL_SYNC_INTERVAL_MIN` (default 180) via the scheduler thread.

## 6. Tuning knobs (env, `config.py`)

`CRM_GMAIL_SYNC_INTERVAL_MIN` (180) · `CRM_GMAIL_BACKFILL_PAGE_SIZE` (500) · `CRM_GMAIL_MAX_ATTACHMENT_MB` (50) · `CRM_GMAIL_ATTACH_CONCURRENCY` (4) · `CRM_GMAIL_RATE_UNITS_SEC` (150) · `CRM_GMAIL_HISTORY_STALE_DAYS` (5, forces a backfill if Gmail pruned history).

## 7. Where Superhuman fits (and where it doesn't)

You have Superhuman connected to Gmail, and it exposes an MCP server.
The two are **complementary, not competing**, and it matters which job each does:

- **Canonical correspondence ingest → use this DWD integration, not Superhuman.** It pulls mail straight into your own `crm.db` on Start9 and feeds the *local* embedding pipeline. Routing bulk ingest through Superhuman's MCP would put your email content through Superhuman's servers and, because an agent/Claude would be driving those calls, through Anthropic, which is exactly what guardrail #1 keeps the corpus away from. DWD keeps the data path Google → your box.
- **Human mail workflow & drafting → Superhuman MCP is great.** Reading/triaging your own inbox, and Closer-style *draft* generation that a human reviews and sends, naturally happen in your real mail client. The `batch-draft-writer` skill already drives the Superhuman MCP for that, and it's usable today, independent of the CRM pipeline.

Net: **DWD = system-of-record correspondence (sovereign, for retrieval). Superhuman MCP = the human's working surface (drafting, triage).** Don't make Superhuman the ingest source of truth.

## 8. Disable / rollback

Remove (or rename) `/data/secrets/gmail-service-account.json` and restart → the entrypoint logs `DISABLED` and routes return 503; captured data remains. To pause one mailbox without disabling the whole integration, set its `email_accounts.sync_enabled = 0`.

## 9. Troubleshooting

- **401/403 from Google on sync** → DWD scope not authorized, wrong client ID, or Gmail API not enabled (steps 1 & 4). This error is non-retryable by design (`errors.py`).
- **`status` says disabled / routes 503** → key not found at `CRM_GMAIL_SA_KEY_PATH`, or `CRM_GMAIL_INTEGRATION_ENABLED` not truthy (the entrypoint only sets it when the key file exists).
- **Mail captured but `matched = 0`** → the investor/contact list was empty or addresses don't match; populate the CRM/grid first, then `POST /api/email/rematch`.
- **Bodies missing on some emails** → by design, unmatched emails are stored metadata-only (no body) until matched (`sync.py`); re-match (`POST /api/email/rematch`) to backfill the bodies.
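
For the "`status` says disabled / routes 503" case above, it can help to check the key file itself before restarting. The following is a minimal sketch, not part of `backend/email_integration/`: the function name `check_sa_key` is hypothetical, the field names are those found in a standard Google service-account JSON key, and the integration's own validation may differ.

```python
import json
import os
import stat

# Fields a Google service-account JSON key normally carries (checked here
# as an illustration; the integration may require others).
REQUIRED_FIELDS = ("type", "client_id", "client_email", "private_key")


def check_sa_key(path):
    """Return a list of problems found with the key file (empty if none)."""
    if not os.path.isfile(path):
        return [f"key file not found at {path}"]

    problems = []

    # Section 2 asks for mode 600: no group/other permission bits set.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions {oct(mode)} are too open; expected 0o600")

    try:
        with open(path, encoding="utf-8") as fh:
            key = json.load(fh)
    except (OSError, ValueError) as exc:
        problems.append(f"cannot parse key as JSON: {exc}")
        return problems

    if not isinstance(key, dict):
        problems.append("key JSON is not an object")
        return problems

    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))

    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"type is {key_type!r}; expected 'service_account'")

    return problems


if __name__ == "__main__":
    # Falls back to the path used in section 2 when the env var is unset.
    key_path = os.environ.get(
        "CRM_GMAIL_SA_KEY_PATH", "/data/secrets/gmail-service-account.json"
    )
    for line in check_sa_key(key_path) or ["key file looks OK"]:
        print(line)
```

An empty result only says the file is present, locked down, and shaped like a service-account key. It does not confirm that domain-wide delegation is authorized; a 401/403 on sync still points back to steps 1 and 4.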