ten31-database/docs/gmail-enablement-runbook.md
Keysat c7ce44d963 Phase 0 foundation: canonical schema, ingest pipeline, CRM MCP server
Workstream A–C substrate for the Ten31 agentic system:
- A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9
- A2: additive/reversible core migration (canonical_entities, entity_links,
  interaction_log, relationship_edges, soft-delete) + ledgered runner
- B1/B3: chunking + deterministic entity resolution (backend/ingest)
- B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks
- C: CRM MCP server (reads, retrieval modes, logged writes) — no outbound tools
- docs: redaction/re-hydration, Gmail enablement runbook
- synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db,
  drop legacy files + start9/0.3.5)

Verified end-to-end on synthetic data + live Sparks (hybrid > dense on entity
queries). Real backfill runs on Ten31 infra; index holds synthetic data only.
Branch snapshot also captures pre-existing working-tree changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 08:13:35 -05:00


Gmail Integration — Enablement Runbook

How to turn on the (already-built) Gmail correspondence integration on the live Start9 box, validate it with a small observed backfill, then roll out to the domain. Read-only capture; all mail stays on Ten31 infrastructure.

Code: backend/email_integration/. Schema: migrations/0001_email_tables.sql. See docs/crm-overview.md §2.4 for the data model.


What this does & the sovereignty posture

  • Pulls Gmail messages for enrolled @ten31.xyz mailboxes into the CRM's own SQLite DB (emails, email_threads, email_attachments, …), deduped across inboxes, threaded, and matched to investors/contacts (email_investor_links).
  • Scope is https://www.googleapis.com/auth/gmail.readonly (credentials.py:34) — the integration can read mail, never send or modify. Lower risk, and it's all the ingest needs.
  • Data path is Google → your Start9 box only. No new third party, and per guardrail #9 Claude never reads the mail — the correspondence becomes ingest input for local embeddings (bge-m3 on the Sparks), not API context. (Contrast with Superhuman's MCP — see §7.)

0. Pick the auth method

| Method | When | What you provide |
|---|---|---|
| DWD (domain-wide delegation), recommended | You administer the ten31.xyz Google Workspace and want to capture team mailboxes without per-user consent | One service-account JSON key + a Workspace admin authorization |
| Per-user OAuth | Capturing a mailbox you don't admin, or avoiding DWD | OAuth client ID/secret + each user clicks through /api/email/oauth/start |

The Start9 0.4 entrypoint is built around DWD (auto-detects the key, sets CRM_GMAIL_AUTH_METHOD=dwd, CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz). The rest of this runbook assumes DWD.

1. Google-side setup (one time)

You need Workspace super-admin + a GCP project.

  1. GCP project → enable the Gmail API (APIs & Services → Library → Gmail API → Enable).
  2. Create a service account (IAM & Admin → Service Accounts). Note its client ID (a long number) and its email.
  3. Create a JSON key for it (Keys → Add key → JSON). This file is the secret — handle per guardrail #7.
  4. Authorize domain-wide delegation in the Workspace Admin console (Security → Access and data control → API controls → Domain-wide delegation → Add new):
    • Client ID = the service account's client ID from step 2.
    • OAuth scopes = https://www.googleapis.com/auth/gmail.readonly
    • Save. (Without this exact scope authorized, sync returns a non-retryable auth error — see errors.py:21.)
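Before pasting anything into the Admin console, you can confirm that the file from step 3 really is a service-account key and extract the numeric client ID that step 4 asks for. The following is a minimal standard-library sketch. `describe_service_account_key` is a hypothetical helper and is not part of backend/email_integration/. The checks cover only fields that every Google Cloud service-account key carries. The helper never returns or prints `private_key`, because that file is the secret.

```python
import json
from pathlib import Path

# Fields every Google Cloud service-account JSON key carries (among others).
REQUIRED_FIELDS = ("type", "client_id", "client_email", "private_key")


def describe_service_account_key(path: str) -> dict:
    """Hypothetical pre-flight check: sanity-check a downloaded key and
    return what step 4 needs. Raises ValueError if it doesn't look right.
    Never returns or prints private_key."""
    data = json.loads(Path(path).read_text())
    if not isinstance(data, dict):
        raise ValueError("key file is not a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not data.get(field)]
    if missing:
        raise ValueError(f"key is missing fields: {missing}")
    if data["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {data['type']!r}")
    # The DWD "Client ID" is the numeric client_id, not the client_email.
    if not str(data["client_id"]).isdigit():
        raise ValueError("client_id should be a long number")
    return {"client_id": data["client_id"], "client_email": data["client_email"]}
```

Usage: `describe_service_account_key("gmail-service-account.json")["client_id"]` is the value to paste into the Client ID field.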

2. Install the key on Start9

  1. Copy the JSON key to the service's data volume at /data/secrets/gmail-service-account.json.
  2. Lock it down: chmod 600 /data/secrets/gmail-service-account.json (the entrypoint also runs chmod 700 on /data/secrets).
  3. Restart the service. On boot the 0.4 entrypoint detects the key and exports the variables below. To confirm, look for the log line Gmail integration: ENABLED (key at …).
    • CRM_GMAIL_INTEGRATION_ENABLED=true
    • CRM_GMAIL_AUTH_METHOD=dwd
    • CRM_GMAIL_SA_KEY_PATH=/data/secrets/gmail-service-account.json
    • CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz
    • CRM_GMAIL_SYNC_INTERVAL_MIN=180
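If you want to know what the restart should produce before doing it, the sketch below restates the detection logic described in step 2 in Python. When the key is present, it returns the five exports listed above. When the key is absent, it returns nothing, which corresponds to the DISABLED path in §8. The real entrypoint's code isn't reproduced here. `gmail_env_from_key` is a hypothetical name. The permission check is an extra that this sketch adds to catch a skipped chmod 600; the runbook doesn't say that the entrypoint enforces it.

```python
import stat
from pathlib import Path

DEFAULT_KEY_PATH = "/data/secrets/gmail-service-account.json"


def gmail_env_from_key(key_path: str = DEFAULT_KEY_PATH) -> dict:
    """Hypothetical restatement of the 0.4 entrypoint's key detection.

    Returns {} when the key is absent (integration DISABLED).
    """
    key = Path(key_path)
    if not key.is_file():
        return {}
    # Extra check (not from the entrypoint): refuse group/other-readable keys.
    mode = stat.S_IMODE(key.stat().st_mode)
    if mode & 0o077:
        raise PermissionError(f"{key_path} is mode {oct(mode)}; expected 0o600")
    return {
        "CRM_GMAIL_INTEGRATION_ENABLED": "true",
        "CRM_GMAIL_AUTH_METHOD": "dwd",
        "CRM_GMAIL_SA_KEY_PATH": str(key),
        "CRM_GMAIL_WORKSPACE_DOMAIN": "ten31.xyz",
        "CRM_GMAIL_SYNC_INTERVAL_MIN": "180",
    }
```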

3. Smoke test — ONE mailbox first (the "don't rush it" gate)

Do a single-mailbox run before enrolling the whole team, to shake out auth/matching bugs on a small surface. All calls need an admin Bearer token:

CRM=https://<your-start9-crm-host>           # the CRM's address
# -s silences progress output; -k skips TLS certificate verification (drop it if this machine trusts the box's certificate)
TOKEN=$(curl -sk "$CRM/api/auth/login" -H 'Content-Type: application/json' \
  -d '{"username":"<admin>","password":"<pw>"}' | python3 -c 'import sys,json;print(json.load(sys.stdin)["token"])')

# integration alive?
curl -sk "$CRM/api/email/status" -H "Authorization: Bearer $TOKEN"

# enroll just yourself (-d makes this a POST)
curl -sk "$CRM/api/email/accounts/enroll" -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' -d '{"email":"you@ten31.xyz"}'

# trigger a sync now (otherwise it runs every 180 min)
curl -sk -X POST "$CRM/api/email/sync/run-now" -H "Authorization: Bearer $TOKEN"

Tip: to keep each page of the first backfill small and easy to observe, set CRM_GMAIL_BACKFILL_PAGE_SIZE low (e.g. 50 instead of the default 500) before the restart in step 2. Confirm that one page lands, then raise it.
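If you'd rather script the smoke test than paste curl commands, the same calls can be built with Python's standard urllib. This is a sketch: the helper names are hypothetical. Like the curl one-liner, it assumes that the login response carries the bearer token under "token". `verify_tls=False` is the equivalent of curl's -k.

```python
import json
import ssl
import urllib.request


def build_login_request(crm: str, username: str, password: str) -> urllib.request.Request:
    """Hypothetical helper: POST /api/auth/login with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{crm}/api/auth/login", data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def build_smoke_test_requests(crm: str, token: str, mailbox: str) -> list:
    """Hypothetical helper: the three step-3 calls, built but not sent."""
    auth = {"Authorization": f"Bearer {token}"}
    return [
        # integration alive?
        urllib.request.Request(f"{crm}/api/email/status", headers=auth),
        # enroll a single mailbox
        urllib.request.Request(
            f"{crm}/api/email/accounts/enroll",
            data=json.dumps({"email": mailbox}).encode(),
            headers={**auth, "Content-Type": "application/json"}, method="POST"),
        # trigger a sync now
        urllib.request.Request(
            f"{crm}/api/email/sync/run-now", data=b"", headers=auth, method="POST"),
    ]


def send(req: urllib.request.Request, verify_tls: bool = True) -> str:
    """Send one request and return the response body as text."""
    ctx = ssl.create_default_context()
    if not verify_tls:  # the equivalent of curl -k
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read().decode()
```

Usage: fetch the token with `json.loads(send(build_login_request(CRM, "<admin>", "<pw>"), verify_tls=False))["token"]`, then `send(...)` each request from `build_smoke_test_requests(CRM, token, "you@ten31.xyz")` in order.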

4. Verify (on the box, read-only SQL)

-- sync ran cleanly?
SELECT kind, status, messages_seen, messages_stored, attachments_saved, error
FROM email_sync_runs ORDER BY started_at DESC LIMIT 3;

-- mail captured + how much got matched to investors/contacts
SELECT COUNT(*) total, SUM(is_matched) matched FROM emails;

-- who did it match, and how confidently?
SELECT match_kind, COUNT(*) FROM email_investor_links GROUP BY match_kind;

Or via the API: GET /api/email/status (counts) and GET /api/email/threads?investor_id=<id> (matched threads for one investor). If matching looks thin, run POST /api/email/rematch with {"since":"<ISO8601>"} after the investor list is populated.
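The three verification queries can also be run from a script. The following minimal sketch uses Python's standard sqlite3 module and assumes the table and column names shown in the SQL above. The database's path on the box isn't given in this runbook, so `db_path` is a placeholder for yours, and `verify_email_capture` is a hypothetical name. Opening the database through a `mode=ro` URI means the connection cannot write, which keeps this step read-only. COALESCE turns the SUM over an empty emails table into 0 instead of NULL.

```python
import sqlite3
from pathlib import Path


def verify_email_capture(db_path: str) -> dict:
    """Hypothetical helper: the step-4 checks over a read-only connection."""
    # mode=ro: SQLite refuses writes on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        runs = conn.execute(
            "SELECT kind, status, messages_seen, messages_stored, attachments_saved, error "
            "FROM email_sync_runs ORDER BY started_at DESC LIMIT 3"
        ).fetchall()
        total, matched = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(is_matched), 0) FROM emails"
        ).fetchone()
        links_by_kind = dict(conn.execute(
            "SELECT match_kind, COUNT(*) FROM email_investor_links GROUP BY match_kind"
        ).fetchall())
    finally:
        conn.close()
    return {
        "recent_runs": runs,
        "total": total,
        "matched": matched,
        "match_rate": matched / total if total else 0.0,
        "links_by_kind": links_by_kind,
    }
```

A low `match_rate` on a populated investor list is the cue for the POST /api/email/rematch call above.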

5. Roll out to the domain

Once the single mailbox looks right:

curl -sk $CRM/api/email/accounts/enroll-all -X POST -H "Authorization: Bearer $TOKEN"
curl -sk $CRM/api/email/sync/run-now -X POST -H "Authorization: Bearer $TOKEN"

Incremental sync then runs every CRM_GMAIL_SYNC_INTERVAL_MIN (default 180) via the scheduler thread.
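The scheduler's internals aren't described in this runbook. The behavior it describes is one sync every CRM_GMAIL_SYNC_INTERVAL_MIN minutes on a background thread, and that behavior can be sketched as follows. `start_sync_scheduler` and `run_sync` are hypothetical names. Because the loop waits before each run, it fits a rollout in which run-now has just triggered the first sync. `Event.wait` doubles as the sleep, so a shutdown request takes effect immediately instead of after the rest of the interval.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger("gmail-sync")


def start_sync_scheduler(run_sync: Callable[[], None], interval_min: float,
                         stop: threading.Event) -> threading.Thread:
    """Hypothetical sketch: call run_sync every interval_min minutes until stop is set."""
    def loop() -> None:
        # wait() returns False on timeout (time to sync) and True once stop is set.
        while not stop.wait(interval_min * 60):
            try:
                run_sync()
            except Exception:
                # One failed run shouldn't kill the scheduler thread.
                log.exception("gmail sync run failed")

    thread = threading.Thread(target=loop, name="gmail-sync-scheduler", daemon=True)
    thread.start()
    return thread
```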

6. Tuning knobs (env, config.py)

  • CRM_GMAIL_SYNC_INTERVAL_MIN (default 180)
  • CRM_GMAIL_BACKFILL_PAGE_SIZE (default 500)
  • CRM_GMAIL_MAX_ATTACHMENT_MB (default 50)
  • CRM_GMAIL_ATTACH_CONCURRENCY (default 4)
  • CRM_GMAIL_RATE_UNITS_SEC (default 150)
  • CRM_GMAIL_HISTORY_STALE_DAYS (default 5): forces a backfill if Gmail has pruned the history
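CRM_GMAIL_RATE_UNITS_SEC appears to be a client-side cap expressed in Gmail API quota units. Google meters each user at 250 quota units per second, and each method has its own cost (messages.get, for example, costs 5 units), so the default of 150 sits below that ceiling. The sketch below shows how such a cap is commonly enforced: a token bucket that refills at `rate` units per second. It assumes that this is roughly how config.py's value is used; the real limiter may differ. `QuotaBucket` is a hypothetical name, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable, Optional


class QuotaBucket:
    """Hypothetical token bucket in Gmail quota units (refills `rate` units/sec)."""

    def __init__(self, rate: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, cost: float) -> float:
        """Reserve `cost` units; return seconds to sleep before making the call.

        Tokens may go negative; later callers then wait longer, so the
        long-run throughput never exceeds `rate` units per second.
        """
        self._refill()
        self.tokens -= cost
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate
```

Usage: before each messages.get, call `time.sleep(bucket.delay_for(5))`. With `QuotaBucket(rate=150)`, the first 30 such calls go through immediately, and each further call waits about 1/30 s.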

7. Where Superhuman fits (and where it doesn't)

You have Superhuman connected to Gmail, and it exposes an MCP server. The two are complementary, not competing, and it matters which job each does:

  • Canonical correspondence ingest → use this DWD integration, not Superhuman. It pulls mail straight into your own crm.db on Start9 and feeds the local embedding pipeline. Routing bulk ingest through Superhuman's MCP would put your email content through Superhuman's servers and — because an agent/Claude would be driving those calls — through Anthropic, which is exactly what guardrail #1 keeps the corpus away from. DWD keeps the data path Google → your box.
  • Human mail workflow & drafting → Superhuman MCP is great. Reading/triaging your own inbox, and Closer-style draft generation that a human reviews and sends, naturally happen in your real mail client. The batch-draft-writer skill already drives the Superhuman MCP for that, and it's usable today — independent of the CRM pipeline.

Net: DWD = system-of-record correspondence (sovereign, for retrieval). Superhuman MCP = the human's working surface (drafting, triage). Don't make Superhuman the ingest source of truth.

8. Disable / rollback

Remove (or rename) /data/secrets/gmail-service-account.json and restart → the entrypoint logs DISABLED and routes return 503; captured data remains. To pause one mailbox without disabling the whole integration, set its email_accounts.sync_enabled = 0.
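Pausing a single mailbox is a one-row update. The sketch below assumes that email_accounts stores the mailbox address in a column named `email`. The enroll call takes {"email": …}, but the column name is a guess, so check migrations/0001_email_tables.sql first. `set_mailbox_sync` is a hypothetical name, and the database path is yours to supply. This writes to the live database, so if a sync happens to hold SQLite's write lock at that moment, expect a brief wait or a "database is locked" error.

```python
import sqlite3


def set_mailbox_sync(db_path: str, email: str, enabled: bool) -> int:
    """Hypothetical helper: pause (enabled=False) or resume one mailbox.

    Returns the number of rows changed (0 means the address wasn't found).
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                # Assumed column name `email`; sync_enabled is from this runbook.
                "UPDATE email_accounts SET sync_enabled = ? WHERE email = ?",
                (1 if enabled else 0, email),
            )
        return cur.rowcount
    finally:
        conn.close()
```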

9. Troubleshooting

  • 401/403 from Google on sync → DWD scope not authorized, wrong client ID, or Gmail API not enabled (steps 1 & 4). This error is non-retryable by design (errors.py).
  • status says disabled / routes 503 → key not found at CRM_GMAIL_SA_KEY_PATH, or CRM_GMAIL_INTEGRATION_ENABLED not truthy (the entrypoint only sets it when the key file exists).
  • Mail captured but matched = 0 → the investor/contact list was empty or addresses don't match; populate the CRM/grid first, then POST /api/email/rematch.
  • Bodies missing on some emails → by design, unmatched emails are stored metadata-only (no body) until matched (sync.py); re-match to backfill.
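One nuance affects the first bullet. Not every 403 from Gmail is a setup problem. Google documents 403s with reason rateLimitExceeded or userRateLimitExceeded, as well as 429, as throttling to retry with exponential backoff. A DWD client ID or scope that isn't authorized typically surfaces earlier, at token exchange, as {"error": "unauthorized_client", …}. The classifier below sketches that distinction. It is hypothetical and not errors.py: it keeps the runbook's rule that 401/403 are non-retryable and adds the throttling exception, which errors.py may or may not already handle.

```python
import json

# Reasons Google documents as rate limiting: back off and retry.
RATE_LIMIT_REASONS = {"rateLimitExceeded", "userRateLimitExceeded"}


def is_retryable(status: int, body: str = "") -> bool:
    """Hypothetical classifier: is a failed Google call worth retrying?"""
    try:
        payload = json.loads(body) if body else {}
    except ValueError:
        payload = {}
    err = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(err, str):
        # OAuth token-endpoint shape, e.g. {"error": "unauthorized_client"}:
        # a DWD setup problem, so retrying won't help.
        return False
    details = err.get("errors", []) if isinstance(err, dict) else []
    reasons = {d.get("reason") for d in details if isinstance(d, dict)}
    if status == 429 or (status == 403 and reasons & RATE_LIMIT_REASONS):
        return True
    # Other 4xx (including auth 401/403) are non-retryable; 5xx are transient.
    return status >= 500
```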