When a sent/received email is matched to an investor, a local-model agent drafts a
one-line dated note and queues it as a PENDING proposal (it never writes the grid
itself). On the Email Capture page a partner sees "Proposed grid notes", can edit the
text, and Approve (appends to that investor's grid notes cell, newest at bottom,
stamped with the approver) or Dismiss. Going-forward only: a cutoff (app_settings
email_activity_since, set on first run) means email dated before the feature was
enabled is never summarized, so the historical backfill makes no noise. Sovereign:
summaries run entirely on the local model (no redaction needed). Gmail sync interval
tightened 180 -> 15 min so outgoing email surfaces quickly.
Backend: migration 0002 (email_activity_proposals); propose_email_activity_notes()
runs via a new scheduler post_sync hook; list/decide functions + routes
GET /api/activity/proposals, POST .../{id}/approve|dismiss. Grid append stamps the
approving user (fundraising_state.updated_by has a FK to users). Test
test_email_activity.py (propose cutoff/idempotency, approve appends + edited note,
dismiss, already-decided guard) under FK enforcement.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
email_integration — Gmail capture for the Venture CRM
Scaffolded Phase 1 of the Gmail integration described in
GMAIL_INTEGRATION_ARCHITECTURE.md (repo root). Everything in this module is
isolated from server.py until you wire it in explicitly.
Contents
| File | Purpose |
|---|---|
| config.py | Env-var loader; exposes CONFIG singleton. |
| errors.py | Exception taxonomy used by the retry loop. |
| crypto.py | AES-GCM wrapper for OAuth refresh-token encryption (only used in OAuth mode). |
| credentials.py | CredentialProvider protocol + DWDCredentialProvider / OAuthCredentialProvider. |
| gmail_client.py | Gmail API HTTP wrapper (rate limit, retry, pagination). |
| db.py | All SQL touching emails_* tables. Migrations live under migrations/. |
| parser.py | Gmail payload → canonical dict (headers, body, attachments). |
| matcher.py | Investor address index + match logic. |
| threads.py | Thread resolution using Gmail threadId + RFC References. |
| attachments.py | Stub rows + on-disk storage + download worker. |
| sync.py | Orchestrator for backfill + incremental sync of one account. |
| scheduler.py | Background thread that runs sync.sync_all on an interval. |
| routes.py | HTTP handlers under /api/email/* compatible with CRMHandler. |
| migrations/0001_email_tables.sql | Table DDL. |
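The table lists rate limiting as one of gmail_client.py's jobs. One common way to implement per-account limiting is a token bucket; the sketch below illustrates that technique under assumptions and is not the module's actual code. The class name TokenBucket, its methods, and the injectable clock are all hypothetical; the 150 units/sec figure in the usage note mirrors CRM_GMAIL_RATE_UNITS_SEC.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: refills `rate` units per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full
        self._clock = clock
        self._last = clock()

    def _refill(self):
        # Credit the units earned since the last check, capped at capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def wait_time(self, cost):
        """Seconds until `cost` units are available (0.0 if available now)."""
        self._refill()
        if self.tokens >= cost:
            return 0.0
        return (cost - self.tokens) / self.rate

    def consume(self, cost, sleep=time.sleep):
        """Block until `cost` units are available, then spend them."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        delay = self.wait_time(cost)
        if delay > 0:
            sleep(delay)
            self._refill()
        self.tokens = max(0.0, self.tokens - cost)
```

A caller would create one bucket per account, for example TokenBucket(rate=150, capacity=150), and call consume() with the quota cost of each Gmail API request before sending it.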
Wiring it in
All changes are in backend/server.py, all guarded by an env flag. Each is
independently revertible. None run unless CRM_GMAIL_INTEGRATION_ENABLED=true.
Patch 1 — migrations (append to init_db() after all existing
cursor.executescript(...) calls, before conn.commit()):
try:
    from email_integration.db import apply_migrations
    apply_migrations(cursor)
except ImportError:
    pass
Patch 2 — scheduler (in main(), after start_backup_scheduler()):
if os.environ.get("CRM_GMAIL_INTEGRATION_ENABLED", "").lower() in ("1", "true", "yes", "on"):
    from email_integration.scheduler import start_sync_scheduler
    start_sync_scheduler()
Patch 3 — routes (add near the top of CRMHandler.do_GET and
CRMHandler.do_POST, after auth/rate-limit pre-checks, before API routing):
try:
    from email_integration.routes import try_handle
    if try_handle(self):
        return
except ImportError:
    pass
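Patch 3 depends on try_handle returning a truthy value only when it has already sent a response. The sketch below illustrates that contract; it is an assumption about the shape of the function, not the code in routes.py. It assumes the handler exposes path, send_response, send_header, end_headers and wfile in the style of http.server.BaseHTTPRequestHandler. The helper names, the environ parameter, and the 404 fallback are hypothetical.

```python
import json
import os

TRUTHY = ("1", "true", "yes", "on")


def integration_enabled(environ=os.environ):
    """Same truthy check that Patch 2 applies to the master flag."""
    return environ.get("CRM_GMAIL_INTEGRATION_ENABLED", "").lower() in TRUTHY


def _send_json(handler, status, payload):
    body = json.dumps(payload).encode("utf-8")
    handler.send_response(status)
    handler.send_header("Content-Type", "application/json")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)


def try_handle(handler, environ=os.environ):
    """Return True if the request was under /api/email/ and a response was sent."""
    if not handler.path.startswith("/api/email/"):
        return False  # not ours; let CRMHandler continue its own routing
    if not integration_enabled(environ):
        _send_json(handler, 503, {"error": "gmail integration disabled"})
        return True
    # Dispatch to the individual endpoints would go here (omitted in this sketch).
    _send_json(handler, 404, {"error": "not found"})
    return True
```

Returning False for every non-/api/email/ path is what keeps the patch safe to leave in place: existing routes never see a behaviour change.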
Environment variables
# Master on/off. Default off; scheduler won't start, routes return 503.
CRM_GMAIL_INTEGRATION_ENABLED=true
# Auth method: "dwd" (default, recommended) or "oauth"
CRM_GMAIL_AUTH_METHOD=dwd
# DWD mode
CRM_GMAIL_SA_KEY_PATH=/path/to/CRM/data/secrets/gmail-service-account.json
CRM_GMAIL_WORKSPACE_DOMAIN=ten31.xyz
# OAuth mode (fallback; not required for DWD)
CRM_GMAIL_OAUTH_CLIENT_ID=...
CRM_GMAIL_OAUTH_CLIENT_SECRET=...
CRM_GMAIL_OAUTH_REDIRECT_URI=https://crm.ten31.xyz/api/email/oauth/callback
CRM_GMAIL_SECRET_KEY=<base64-32-random-bytes> # for encrypting refresh tokens
# Sync
CRM_GMAIL_SYNC_INTERVAL_MIN=180 # default 3h
CRM_GMAIL_BACKFILL_PAGE_SIZE=500
CRM_GMAIL_MAX_ATTACHMENT_MB=50
CRM_GMAIL_ATTACH_CONCURRENCY=4
CRM_GMAIL_RATE_UNITS_SEC=150 # per account, leaves 40% headroom
CRM_GMAIL_RETRY_MAX=5
CRM_GMAIL_HISTORY_STALE_DAYS=5
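As a rough illustration of how these variables could be read into a single config object, here is a minimal sketch. It is not the contents of config.py: GmailConfig, load_config and _flag are hypothetical names, only a subset of the variables is shown, and apart from the documented defaults (integration off, auth method dwd, 180-minute interval) the fallback values simply reuse the example values above, which is an assumption.

```python
import os
from dataclasses import dataclass


def _flag(value):
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class GmailConfig:
    enabled: bool
    auth_method: str
    sync_interval_min: int
    backfill_page_size: int
    max_attachment_mb: int
    rate_units_sec: int
    retry_max: int


def load_config(environ=os.environ):
    auth_method = environ.get("CRM_GMAIL_AUTH_METHOD", "dwd").lower()
    if auth_method not in ("dwd", "oauth"):
        raise ValueError(f"CRM_GMAIL_AUTH_METHOD must be 'dwd' or 'oauth', got {auth_method!r}")
    return GmailConfig(
        enabled=_flag(environ.get("CRM_GMAIL_INTEGRATION_ENABLED", "")),
        auth_method=auth_method,
        sync_interval_min=int(environ.get("CRM_GMAIL_SYNC_INTERVAL_MIN", "180")),
        # Fallbacks below are assumed, copied from the example values.
        backfill_page_size=int(environ.get("CRM_GMAIL_BACKFILL_PAGE_SIZE", "500")),
        max_attachment_mb=int(environ.get("CRM_GMAIL_MAX_ATTACHMENT_MB", "50")),
        rate_units_sec=int(environ.get("CRM_GMAIL_RATE_UNITS_SEC", "150")),
        retry_max=int(environ.get("CRM_GMAIL_RETRY_MAX", "5")),
    )
```

Passing the environment in as a parameter keeps the loader testable without touching the real process environment.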
Google Cloud / Workspace setup (DWD)
See GMAIL_INTEGRATION_ARCHITECTURE.md §3 for the full runbook. Short form:
- Create GCP project, enable Gmail API.
- Create service account, download JSON key, enable domain-wide delegation.
- In Google Admin console → Security → API controls → Manage domain-wide
  delegation, authorize the service account's client ID with scope
  https://www.googleapis.com/auth/gmail.readonly.
- Copy the JSON key to data/secrets/gmail-service-account.json, chmod 600.
- Set env vars in .env.beta, restart CRM.
- As admin, POST /api/email/accounts/enroll-all to create email_accounts rows
  for every active user whose email ends in the Workspace domain.
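To make the delegation step concrete, the sketch below builds the JWT claim set that Google's service-account OAuth flow expects when impersonating a Workspace user: iss is the service account, sub is the mailbox owner, and the lifetime is at most one hour. The function name build_dwd_claims and its parameters are hypothetical, and the signing step (RS256 with the service account's private key, then a POST to the token endpoint) is deliberately left out.

```python
import time

GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly"
TOKEN_URI = "https://oauth2.googleapis.com/token"


def build_dwd_claims(service_account_email, user_email, now=None, lifetime_sec=3600):
    """Return the unsigned claim set for a domain-wide-delegation token request."""
    if not 0 < lifetime_sec <= 3600:
        raise ValueError("lifetime must be between 1 and 3600 seconds")
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": service_account_email,  # client_email from the JSON key
        "sub": user_email,             # Workspace user to impersonate
        "scope": GMAIL_READONLY_SCOPE,
        "aud": TOKEN_URI,
        "iat": issued_at,
        "exp": issued_at + lifetime_sec,
    }
```

If the scope authorized in the Admin console does not match the scope in the claim set, the token request is rejected, which is a common cause of setup failures.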
Adding the crypto dependency
Both auth modes need cryptography. OAuth mode uses it to encrypt refresh
tokens, and DWD mode uses it for the RSA signing of the JWT bearer token. If
you enable the integration in either mode, append this to
backend/requirements.txt:
cryptography==42.0.5
Rollback
To disable instantly: set CRM_GMAIL_INTEGRATION_ENABLED=false and restart.
The scheduler won't start, routes return 503, DB tables remain (unused).
To remove completely: drop the env var, delete data/email_attachments/, and
drop all emails_* and email_* tables. The migration is idempotent and
create-only, so a separate drop script would be required; none is provided in
Phase 1 since you said you're not rushing.
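For reference, a drop script could look roughly like the sketch below. It assumes the CRM database is SQLite (the wiring patches use cursor.executescript) and that every integration table is named emails, emails_* or email_*. The function name drop_email_tables is hypothetical. It is destructive, so run it against a backup first.

```python
import sqlite3


def drop_email_tables(conn):
    """Drop every table named emails, emails_* or email_*; return the dropped names."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    targets = sorted(
        n for n in names
        if n == "emails" or n.startswith(("emails_", "email_"))
    )
    # With FK enforcement on, DROP TABLE performs an implicit DELETE that can
    # fail on references between the tables, so switch it off temporarily.
    fk_was_on = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        for name in targets:
            conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.commit()
    finally:
        # Restore whatever enforcement setting the connection had before.
        conn.execute(f"PRAGMA foreign_keys = {int(fk_was_on)}")
    return targets
```

Tables outside the integration are left untouched because the match is on the name prefix only.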
Local development
The module has zero network dependencies when imported without the scheduler starting, so pieces such as the parser can be exercised offline against a saved Gmail payload:
python3 -c "from email_integration.parser import parse; \
import json; \
print(parse(json.load(open('fixture.json'))))"
Testing checklist (before enabling in production)
- Enable CRM_GMAIL_INTEGRATION_ENABLED=true on a staging copy of the DB only.
- Verify migrations applied: emails, email_accounts, etc. present.
- Enroll one account (yours) via /api/email/accounts/enroll.
- Trigger POST /api/email/sync/run-now.
- Check email_sync_runs for status='ok'.
- Spot-check emails rows against Gmail.
- Verify an attachment downloaded correctly (hash and size).
- Let the scheduler run for 24 hours; monitor /api/email/status.
- Enroll remaining 4 teammates.
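The attachment check in the list above can be done with a few lines of standard-library Python. This is a minimal sketch: the function names are hypothetical, and it assumes you already have the expected SHA-256 and byte size to compare against, for example from the attachment's database row.

```python
import hashlib


def file_sha256_and_size(path, chunk_size=1 << 20):
    """Stream the file in chunks and return (hex digest, size in bytes)."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def attachment_matches(path, expected_sha256, expected_size):
    """True only if both the byte count and the SHA-256 digest agree."""
    actual_hash, actual_size = file_sha256_and_size(path)
    return actual_size == expected_size and actual_hash == expected_sha256.lower()
```

Reading in chunks keeps memory use flat even for files near the CRM_GMAIL_MAX_ATTACHMENT_MB limit.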
What's scaffolded vs. TODO
Scaffolded and complete:
- Schema (migration 0001)
- Config and env parsing
- Error taxonomy + retry classifier
- AES-GCM crypto helpers
- DWD JWT minting + access token caching
- OAuth refresh + consent flow endpoints
- Gmail client (list/get/history/attachments/profile) with rate limit + retry
- Full DB data-access layer
- MIME parser including RFC 2047 subjects and HTML→text fallback
- Investor matcher with exact + domain strategies
- Thread resolution (Gmail threadId + RFC References cross-account)
- Attachment storage with SHA-256 dedup
- Sync orchestrator (backfill + incremental with history-expired fallback)
- Scheduler with manual-trigger hook
- HTTP routes (status, accounts, threads, enroll, run-now, rematch, oauth)
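As an illustration of the "exact + domain" matching listed above, here is a minimal sketch of a two-strategy matcher. It is not the code in matcher.py: InvestorIndex, PUBLIC_DOMAINS, and the rule that a domain match only counts when exactly one investor owns that domain are all assumptions made for the example.

```python
from email.utils import parseaddr

# Assumed list of shared mail providers where a domain match would be meaningless.
PUBLIC_DOMAINS = frozenset({"gmail.com", "outlook.com", "yahoo.com"})


def normalize(raw):
    """Strip any display name and lowercase the bare address."""
    _, address = parseaddr(raw)
    return address.strip().lower()


class InvestorIndex:
    def __init__(self, investors):
        """`investors` maps investor_id -> iterable of known addresses."""
        self.by_address = {}
        self.by_domain = {}
        for investor_id, addresses in investors.items():
            for raw in addresses:
                address = normalize(raw)
                if "@" not in address:
                    continue
                self.by_address[address] = investor_id
                domain = address.rsplit("@", 1)[1]
                if domain not in PUBLIC_DOMAINS:
                    self.by_domain.setdefault(domain, set()).add(investor_id)

    def match(self, raw):
        """Return (investor_id, strategy) or None if nothing matches."""
        address = normalize(raw)
        if address in self.by_address:
            return self.by_address[address], "exact"
        owners = self.by_domain.get(address.rsplit("@", 1)[-1], set())
        if len(owners) == 1:  # ambiguous domains are left unmatched
            return next(iter(owners)), "domain"
        return None
```

Trying the exact strategy first means a known address always wins over a weaker domain-level guess.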
TODO before production (see architecture doc §15):
- Multipart batch metadata fetch in gmail_client.batch_get_metadata (currently serial fallback).
- Unit tests (fixtures for parser, matcher, threads; integration tests with responses-style HTTP mock).
- Frontend UI: a thread list + detail pane in frontend/index.html.
- Sandboxed HTML rendering for email bodies (out of scope here).