ten31-database/docs/guides/email.md
Keysat 5faa5ae4d6 Email-proposal review over Matrix + a bot role (v0.1.0:89)
The email-capture "proposed grid notes" gain two review surfaces:

1. Inline source email — each proposed-note card on the Email Capture page
   gets a "View email" toggle that lazily fetches the existing
   GET /api/email/detail and shows from/to/cc/date/subject + scrollable body,
   so a reviewer can judge the note against the email it was drafted from.

2. CRM->Matrix review bridge — the CRM (box, stdlib, no matrix-nio) can't post
   to Matrix, so the intake bot (Spark) PULLS: GET /api/intake/email-proposals
   returns to_post/open/to_close work-lists; the bot posts a review card
   (metadata + snippet + draft note) to a dedicated review room
   (MATRIX_EMAIL_REVIEW_ROOM) and relays in-thread yes / no / NL-edit
   (POST .../{id}/decide, note revised via local Qwen). Decisions sync both
   ways: web decide -> bot announces + closes the thread; Matrix decide -> the
   web panel's ~25s poll clears the card. State lives CRM-side in the new
   email_proposal_matrix side row (email-integration migration 0003, additive
   + idempotent CREATE TABLE IF NOT EXISTS), so it survives a bot restart.

Adds a 'bot' role (authenticated, never admin; require_bot_or_admin) to gate
the email-proposal endpoints rather than handing the bot full admin — the
principled base for the coming agentic capabilities. Role controls reach;
the draft->approve gate still controls autonomy (a human approves every write).

Deploy split: endpoints + migration + role + frontend ship in the s9pk; the
bot poll loop + review-room handling ship on the Spark. The bot's CRM user
must be flipped member->bot and joined to the review room (one-time).

Tests: backend/test_email_proposal_matrix.py + matrix_intake/test_email_proposals.py
(30/30 suite green, render-smoke green, migration verified twice on a DB copy).
2026-06-18 09:51:41 -05:00


---
paths:
- backend/email_integration/**
- backend/digest_mailer.py
- backend/smtp_send.py
---
# Email capture & drafts (Gmail)
Read this before editing Gmail capture or draft creation.
## What it does
- `backend/email_integration/` captures Gmail via **domain-wide delegation** (`credentials.py`, `matcher.py`, `parser.py`, `db.py`, `sync.py`, `scheduler.py`, `routes.py`) and creates Tier-B in-thread drafts (`compose.py`). It has its own `migrations/`.
- Captured email becomes CRM activity through a **propose → approve** flow — nothing lands on a contact record until a human approves the proposal. The proposed grid notes show on the **Email Capture** page (admin-only): each card has a **View email** toggle that fetches `GET /api/email/detail?id=` and shows the source email inline (from/to/cc/date/subject + scrollable body) so you can judge the note against it. The same proposals can also be reviewed/approved/edited from a **dedicated Matrix room**, kept in sync with this panel (decide on either surface; the other reflects it) — that CRM→Matrix bridge lives in the **review bot**, see `docs/guides/matrix-intake.md`. The proposal model itself (`email_activity_proposals` + the `propose_email_activity_notes` drafter + the decide path) lives in `backend/server.py`, not this package.
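
The propose → approve gate described above can be pictured with a small in-memory model. This is only a sketch: `Proposal`, `ContactRecord`, and `decide` are hypothetical stand-ins, not the real `email_activity_proposals` schema or the decide path in `backend/server.py`. The property it illustrates is that nothing reaches the contact's activity until a human approves.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proposal:
    """Hypothetical stand-in for one proposed grid note."""
    id: int
    contact_id: int
    draft_note: str
    status: str = "pending"  # pending -> approved | rejected
    decided_by: Optional[str] = None


@dataclass
class ContactRecord:
    """Hypothetical contact with an activity log."""
    id: int
    activity: List[str] = field(default_factory=list)


def decide(proposal: Proposal, contact: ContactRecord, decision: str,
           reviewer: str, edited_note: Optional[str] = None) -> Proposal:
    """Apply a human decision; only an approval writes to the contact."""
    if proposal.status != "pending":
        raise ValueError("proposal already decided")
    if decision not in ("approve", "reject"):
        raise ValueError("decision must be 'approve' or 'reject'")
    if edited_note is not None:
        # The reviewer may revise the drafted note before deciding.
        proposal.draft_note = edited_note
    proposal.decided_by = reviewer
    if decision == "approve":
        contact.activity.append(proposal.draft_note)
        proposal.status = "approved"
    else:
        proposal.status = "rejected"
    return proposal
```

Because the status check runs first, a second decision on the same proposal (for example from the other review surface) is refused instead of writing twice.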
## Hard rule
- **Agents draft; humans send.** Never let an agent send email, post, or contact an LP autonomously. Tier-B `compose.py` only *creates* a Gmail draft for human review.
## Outbound mail — the daily digest (internal; exempt from "agents draft")
The CRM sends an internal **daily activity digest** to the fund's own admins. This is the
ONE automated send path, and it does **not** violate the hard rule above: that rule governs
outward **LP/prospect** contact. An internal ops email to the team's own inboxes is a
different category. **Never extend this path to send to LPs/prospects.**
- **Transport selector: `backend/digest_mailer.py`** (top-level, not in this package) —
`send_digest(conn, to_addrs, subject, body)` picks **Gmail-DWD (preferred) → SMTP (fallback)**.
DWD-impersonation sender = `CRM_DIGEST_SENDER` env, else the first active admin.
- **Gmail-DWD path: `gmail_send.py`** (this package) — reuses `credentials.py`'s
`DWDCredentialProvider` with the **`gmail.compose`** scope to call `users.messages.send`
(REST, mirrors `compose.py`; body is `{raw}` not the draft's `{message:{raw}}`). The
deployment's DWD grant includes `gmail.compose` (which authorizes send) but **not** the
narrow `gmail.send` — so request `gmail.compose`. Verified live 2026-06-15 (token mint +
a real `messages.send`).
- **SMTP fallback: `backend/smtp_send.py`** (top-level) — stdlib smtplib reading `SMTP_*` env,
populated on the box by the **Configure Digest SMTP** Start9 action (writes
`/data/secrets/smtp/*`; entrypoint exports `SMTP_*`). A dedicated per-package account,
independent of any StartOS system-wide SMTP.
- The admin **`POST /api/admin/digest/test-email`** restricts recipients to the active-admin
set (not an open relay), and logs send failures rather than echoing them (an auth error can
carry a token/credential).
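
A minimal sketch of the two ideas above: the request-body shapes (`{raw}` for `users.messages.send` versus `{message:{raw}}` for a draft) and the Gmail-DWD → SMTP order. The function names and the injected `gmail_send`/`smtp_send` callables are hypothetical, and falling back on any exception is an assumption; the real `send_digest` may choose its transport differently.

```python
import base64
import logging
from email.message import EmailMessage

log = logging.getLogger("digest_mailer_sketch")


def build_raw(sender, to_addrs, subject, body):
    """Serialize an RFC 5322 message and base64url-encode it, as the Gmail API expects."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = subject
    msg.set_content(body)
    return base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")


def send_payload(raw):
    """Request body for users.messages.send."""
    return {"raw": raw}


def draft_payload(raw):
    """Request body for users.drafts.create (the message is nested)."""
    return {"message": {"raw": raw}}


def send_with_fallback(raw, gmail_send, smtp_send):
    """Try the preferred transport first; fall back to SMTP on failure.

    gmail_send and smtp_send are hypothetical callables injected by the caller.
    """
    try:
        gmail_send(send_payload(raw))
        return "gmail-dwd"
    except Exception as exc:
        # Log only the exception type: the message text may carry a credential.
        log.warning("Gmail-DWD send failed (%s); falling back to SMTP",
                    type(exc).__name__)
        smtp_send(raw)
        return "smtp"
```

Injecting the transports as callables keeps the selector testable without network access, in the same spirit as the stubbed transport in the digest tests.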
### Phase B — the daily digest itself (built)
- **Content builder: `backend/digest_builder.py`** (top-level). `build_digest(conn, since_iso,
until_iso, chat_fn=None)` returns `{subject, body, has_activity, user_count, email_count,
investor_count}` and composes **two sections**:
  - **By team member** (`collect_user_activity`): per registered user, both directions
    (per-mailbox `eam.is_sent`), with **one Spark narrative paragraph** per user
    (`ingest/llm.py` → Spark Control `/v1/chat/completions`), **never Claude** (the digest is
    deliberately un-anonymized: real LP names + substance stay local). Deterministic
    count-only fallback if Spark is unreachable (always-send must not fail).
  - **By investor** (`collect_investor_activity`): re-pivots the same window across the whole
    team, **deduped per email** (a reply to several teammates counts once), direction decided
    at the **email level** (outbound if `from_email` is one of our mailboxes, else inbound).
    Structured list, no extra Spark calls.
  - Soft-delete filters: `email_account_messages.deleted_at IS NULL` + `users.is_active = 1`,
    and the org/contact name joins drop soft-deleted rows (falling back to the matched address).
- **Control is DB-backed, set from the admin panel** — `digest_builder.load_digest_policy(conn)`
reads `app_settings.digest_policy` = `{enabled, send_hour}`. Precedence: **DB row wins**
(the Settings → Admin toggle + send-time dropdown), else `CRM_DIGEST_ENABLED`/
`CRM_DIGEST_SEND_HOUR` seed a first-boot default, else `{false, 18}`. `GET`/`PATCH
/api/admin/digest/policy` (admin-only) read/write it. **Not a StartOS action** — it's an
operational toggle, so it lives in-app where it's discoverable and takes effect live.
- **Scheduler: `backend/email_integration/digest_scheduler.py`** (co-located with the sync
scheduler). One daemon thread, **always started**; each cycle (60s) re-reads the DB policy
and sends once per local day at/after `send_hour` **only when `enabled`** — so toggling in
the panel takes effect with no restart. Content window = (last send, now]; cursor
(`digest_last_sent_at`) + once-per-day guard (`digest_last_sent_date`) live in `app_settings`,
so a missed day rolls into the next digest. Recipients = all active admins.
- **Windowed preview + manual send (Settings → Admin "Manual run & preview"):**
  - **`POST /api/admin/digest/preview`** (admin-only) builds the digest over a chosen window
    and returns `{subject, body, …, window}` **without sending**. It runs the **real Spark
    summarization**, so widening the window is how you verify the summarizer on a quiet day
    (a last-24h window with no activity never calls Spark). Rendered in an in-panel preview.
  - **`POST /api/admin/digest/send-now`** (admin-only) sends over the **same** window to the
    admin set now.
  - Both take the window from the body: default last 24h, `{"hours": N}`, or
    `{"since": "YYYY-MM-DD"}` (a **local** date → that day's midnight). Resolved by
    `digest_builder.resolve_digest_window` (capped at `MAX_WINDOW_DAYS`=92, validated → 400 on
    bad input). The send goes through `digest_scheduler.send_digest_window`, which, like the
    old `force=True` path, **does NOT advance the daily cursor**, so a wide manual preview/send
    never suppresses the scheduled daily digest.
- The **"Send transport test"** button (`POST /api/admin/digest/test-email`) stays as a pure
pipe check (fixed message, admin-recipient-restricted).
- **Decisions (locked):** 6 PM default send · always-send (empty days get a "no activity"
note) · per-user narrative + by-investor structured section · enable/time controlled in the
admin panel. Tests: `backend/test_digest_builder.py` (per-user + per-investor queries,
soft-delete, inbound dedup, two-section compose, fallback, policy resolver, scheduler guards
— stubbed LLM + transport).
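
The policy precedence, the once-per-day guard, and the window resolution described above can be sketched as three pure functions. All three names are hypothetical and do not reproduce `load_digest_policy`, the scheduler, or `resolve_digest_window`. Assumptions: env values parse as `1`/`true`/`yes`, `digest_last_sent_date` is stored as an ISO date string, and an over-long window is clamped to `MAX_WINDOW_DAYS` rather than rejected.

```python
from datetime import datetime, timedelta

DEFAULT_POLICY = {"enabled": False, "send_hour": 18}
MAX_WINDOW_DAYS = 92


def resolve_policy(db_row, env):
    """DB row wins; else env vars seed a default; else {False, 18}."""
    if db_row is not None:
        return {"enabled": bool(db_row["enabled"]),
                "send_hour": int(db_row["send_hour"])}
    if "CRM_DIGEST_ENABLED" in env or "CRM_DIGEST_SEND_HOUR" in env:
        enabled = env.get("CRM_DIGEST_ENABLED", "").lower() in ("1", "true", "yes")
        hour = int(env.get("CRM_DIGEST_SEND_HOUR", DEFAULT_POLICY["send_hour"]))
        return {"enabled": enabled, "send_hour": hour}
    return dict(DEFAULT_POLICY)


def should_send(policy, now, last_sent_date):
    """Send at most once per local day, at or after send_hour, only when enabled."""
    if not policy["enabled"]:
        return False
    if now.hour < policy["send_hour"]:
        return False
    return last_sent_date != now.date().isoformat()


def resolve_window(body, now):
    """Return (since, until) from {}, {"hours": N}, or {"since": "YYYY-MM-DD"}."""
    if "since" in body:
        # strptime raises ValueError on a malformed date (would map to a 400).
        since = datetime.strptime(body["since"], "%Y-%m-%d")
    else:
        hours = body.get("hours", 24)
        if isinstance(hours, bool) or not isinstance(hours, (int, float)) or hours <= 0:
            raise ValueError("hours must be a positive number")
        since = now - timedelta(hours=hours)
    if since >= now:
        raise ValueError("window must start in the past")
    # Assumption: clamp, rather than reject, a window wider than the cap.
    earliest = now - timedelta(days=MAX_WINDOW_DAYS)
    return max(since, earliest), now
```

Re-reading the policy on every scheduler cycle and passing it into `should_send` is what lets a panel toggle take effect without a restart; the manual window path never touches `last_sent_date`, so it cannot suppress the scheduled send.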
## Email-activity panel (Communications tab) — admin-only
The **Communications** tab (frontend) is the admin-only search over captured Gmail. The
classic manual "Log Communication" form was retired (the Fundraising Grid context menu is
the manual-log path). Backed by **`GET /api/email/activity`** (`routes.py:_h_activity`,
`require_admin` server-side) → **`db.query_email_activity(conn, ...)`** (the pure, tested
query). Filters: `investor_id`, `account_id` (mailbox), `direction` (`inbound`/`outbound`),
`q` (free-text over subject/snippet/from). Non-obvious semantics to preserve:
- **Matched-only:** the panel surfaces ONLY email that links to a known
investor/contact (`query_email_activity` gates on `EXISTS email_investor_links`).
Capture still stores unmatched cold/unknown-sender email (metadata only, see "match-only
full storage"), but it is never shown here — the Communications tab is the
investor-relationship view, not the raw mailbox.
- **Soft-delete lives on the per-mailbox sighting**, not the email: `emails` has no
`deleted_at`. An email is "live" iff it has a sighting with `email_account_messages.
deleted_at IS NULL` — the query gates on `EXISTS(... deleted_at IS NULL)`. (Investor
links are email-level and carry no `deleted_at`, so they need no separate filter.)
- **Direction is decided at the email level** — outbound if `from_email` is one of our
`email_accounts` addresses, else inbound — mirroring `digest_builder._own_addresses`.
- **Graveyard investors** are hidden from the filter *dropdown* (CRM-wide `graveyard = 0`),
but their captured email still shows in the list and stays findable by free-text search —
it's an audit surface, so history is never hidden, only the picker is.
- **Typed investor facet (the dropdown).** The picker mirrors what the list resolves: one
entry per distinct matched entity, with the digest's precedence (**grid investor → org →
contact → raw address**) and a **typed key** — `fund:<id>` / `org:<id>` / `contact:<id>`
(`investor_id=` accepts these; a bare id is treated as `fund:` for back-compat). This fixed
the "dropdown only shows *All investors*" bug: matches that land on a **classic contact or
org domain** (no grid id — common, since `fundraising_contacts.email` is sparsely populated)
now resolve to a real name and appear in the picker, instead of the facet coming back empty.
Raw-address-only matches stay out of the *picker* (noisy) but still show + search in the list.
Helpers: `db._resolve_entity` + the shared `_LINK_IDENTITY_COLS`/`_LINK_IDENTITY_JOINS`.
- **Date range:** `since`/`until` filter `e.sent_at` as a half-open `[since, until)`
interval; the UI sends `from` as `…T00:00:00` and `to` as the **next day's** midnight,
so the whole "to" day is included regardless of the stored timestamp's precision/zone.
- **Detail view:** **`GET /api/email/detail?id=`** (`_h_detail`, `require_admin`) →
`db.query_email_detail` returns the full body + to/cc recipients + attachments + typed
identities, **soft-delete-gated on a live sighting** (404 otherwise). The UI renders
`body_text` (escaped) — **never** raw remote `body_html` (XSS); click a row to expand.
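
Three of the semantics above are easy to get subtly wrong, so here is a small sketch of each: typed investor keys, the half-open date range, and email-level direction. The helper names are hypothetical (they are not `db._resolve_entity` or the real query). It assumes integer ids, ISO-8601 `sent_at` strings with a `T` separator, and lowercased mailbox addresses.

```python
from datetime import date, timedelta

VALID_KINDS = ("fund", "org", "contact")


def parse_investor_key(value):
    """Parse 'fund:<id>' / 'org:<id>' / 'contact:<id>'; a bare id means 'fund:'."""
    kind, sep, raw_id = value.partition(":")
    if not sep:
        # Back-compat: no prefix, so the whole value is a fund id.
        kind, raw_id = "fund", value
    if kind not in VALID_KINDS or not raw_id.isdigit():
        raise ValueError(f"bad investor key: {value!r}")
    return kind, int(raw_id)


def ui_date_range(from_day, to_day):
    """Build the [since, until) bounds the UI sends: until is the NEXT day's midnight."""
    since = f"{from_day.isoformat()}T00:00:00"
    until = f"{(to_day + timedelta(days=1)).isoformat()}T00:00:00"
    return since, until


def in_range(sent_at, since, until):
    """Half-open interval check; ISO-8601 strings compare correctly as text."""
    return since <= sent_at < until


def direction(from_email, own_addresses):
    """Outbound if the sender is one of our mailboxes, else inbound."""
    return "outbound" if from_email.strip().lower() in own_addresses else "inbound"
```

Using the next day's midnight as an exclusive upper bound means a message stamped `23:59:59.500` on the "to" day is still included, whatever precision the stored timestamp has.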
## Content search (semantic, over email bodies) — admin-only
The Communications tab has a **Filter ⇄ Search content** toggle. "Search content" is semantic
search over the email *bodies* indexed in Qdrant (distinct from the structured subject/sender
LIKE filters above). **`GET /api/email/search?q=`** (`routes._h_search`, `require_admin`):
- Retrieval = `ingest/search.py:hybrid_search` (dense + BM25, reranked) pre-filtered to
`doc_type='email'`, imported **lazily** (the ingest stack — Spark Control + Qdrant + the
sparse encoder — ships in the Docker image, not the bare CRM); any failure → a clean **503**.
- Only **matched** email bodies are indexed (see `ingest/chunking.py`); the Qdrant payload
carries `source_id`=email_id, `lp_name`, `date_ts`, so hits link straight back to the row.
- **Hydrated + soft-delete-filtered against SQLite (canonical):** `db.search_hit_emails`
drops any hit whose email no longer has a live sighting — the derived index can lag a
deletion, and we never surface a fact from Qdrant that SQLite has tombstoned.
Tests: `backend/email_integration/test_email_activity_panel.py` (panel filters/facets/detail +
the search route's hydrate/drop/503/admin paths, with retrieval stubbed).
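
The hydrate-and-drop step can be sketched against SQLite as follows. This is an illustration, not `db.search_hit_emails`: the function name, the hit shape (`source_id`, `score`), and the column names `email_id` and `subject` are assumptions. It keeps the ranking order from retrieval and discards any hit whose email has no live sighting.

```python
import sqlite3


def hydrate_live_hits(conn, hits):
    """Keep only hits whose email still has a sighting with deleted_at IS NULL."""
    ids = [h["source_id"] for h in hits]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"""
        SELECT e.id, e.subject
        FROM emails e
        WHERE e.id IN ({placeholders})
          AND EXISTS (
              SELECT 1 FROM email_account_messages m
              WHERE m.email_id = e.id AND m.deleted_at IS NULL
          )
        """,
        ids,
    ).fetchall()
    live = {row[0]: row[1] for row in rows}
    # Preserve the retrieval order; SQLite is canonical for what is shown.
    return [
        {"email_id": h["source_id"], "subject": live[h["source_id"]], "score": h["score"]}
        for h in hits
        if h["source_id"] in live
    ]
```

Filtering after retrieval means a stale vector in the derived index costs at most one wasted hit; it never leaks a tombstoned email into the results.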
## Known gap
- Tier-B drafts currently reply to the **LP only**; reply-all is the next change (see AGENTS.md → Current state).
See also `docs/gmail-enablement-runbook.md`.