Keysat 5faa5ae4d6 Email-proposal review over Matrix + a bot role (v0.1.0:89)
The email-capture "proposed grid notes" gain two review surfaces:

1. Inline source email — each proposed-note card on the Email Capture page
   gets a "View email" toggle that lazily fetches the existing
   GET /api/email/detail and shows from/to/cc/date/subject + scrollable body,
   so a reviewer can judge the note against the email it was drafted from.

2. CRM->Matrix review bridge — the CRM (box, stdlib, no matrix-nio) can't post
   to Matrix, so the intake bot (Spark) PULLS: GET /api/intake/email-proposals
   returns to_post/open/to_close work-lists; the bot posts a review card
   (metadata + snippet + draft note) to a dedicated review room
   (MATRIX_EMAIL_REVIEW_ROOM) and relays in-thread yes / no / NL-edit
   (POST .../{id}/decide, note revised via local Qwen). Decisions sync both
   ways: web decide -> bot announces + closes the thread; Matrix decide -> the
   web panel's ~25s poll clears the card. State lives CRM-side in the new
   email_proposal_matrix side row (email-integration migration 0003, additive
   + idempotent CREATE TABLE IF NOT EXISTS), so it survives a bot restart.
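
A minimal sketch of why the side row survives reruns and bot restarts. The migration only creates the table when it is missing, so applying it twice (as in the DB-copy verification) leaves existing rows alone. The column set below is an assumption for illustration; the real email_proposal_matrix schema is whatever migration 0003 defines.

```python
import sqlite3

# Hypothetical columns: the real email_proposal_matrix schema is defined by
# email-integration migration 0003 and may differ.
MIGRATION_0003 = """
CREATE TABLE IF NOT EXISTS email_proposal_matrix (
    proposal_id INTEGER PRIMARY KEY,  -- one side row per email proposal
    room_id     TEXT NOT NULL,        -- review room the card was posted to
    event_id    TEXT NOT NULL,        -- the card's event; replies thread under it
    posted_at   TEXT NOT NULL,
    closed_at   TEXT                  -- set once the thread has been closed
);
"""


def apply_migration_0003(conn: sqlite3.Connection) -> None:
    """Additive and idempotent: a second run is a no-op and keeps existing rows."""
    conn.executescript(MIGRATION_0003)
```

Because the bot's thread state lives in this CRM-side table rather than in the bot process, a restarted bot can rebuild its view from the to_post/open/to_close work-lists.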

Adds a 'bot' role (authenticated, never admin; require_bot_or_admin) to gate
the email-proposal endpoints rather than handing the bot full admin — the
principled base for the coming agentic capabilities. Role controls reach;
the draft->approve gate still controls autonomy (a human approves every write).
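
A sketch of the gate split, assuming roles are plain strings on the authenticated user. The User type and Forbidden exception are hypothetical stand-ins for the CRM's own auth objects and its 403 response. The point is that a bot passes require_bot_or_admin but never require_admin.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Stand-in for the 403 the real handlers would return."""


@dataclass(frozen=True)
class User:
    username: str
    role: str  # assumed values: "member", "bot", "admin"


def require_admin(user: User) -> User:
    # Full admin surface: a bot never gets through here.
    if user.role != "admin":
        raise Forbidden(f"{user.username}: admin role required")
    return user


def require_bot_or_admin(user: User) -> User:
    # Email-proposal endpoints only: human admins, or the bot account.
    if user.role not in ("bot", "admin"):
        raise Forbidden(f"{user.username}: bot or admin role required")
    return user
```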

Deploy split: endpoints + migration + role + frontend ship in the s9pk; the
bot poll loop + review-room handling ship on the Spark. The bot's CRM user
must be flipped member->bot and joined to the review room (one-time).

Tests: backend/test_email_proposal_matrix.py + matrix_intake/test_email_proposals.py
(30/30 suite green, render-smoke green, migration verified twice on a DB copy).
2026-06-18 09:51:41 -05:00


paths
backend/email_integration/**
backend/digest_mailer.py
backend/smtp_send.py

Email capture & drafts (Gmail)

Read this before editing Gmail capture or draft creation.

What it does

  • backend/email_integration/ captures Gmail via domain-wide delegation (credentials.py, matcher.py, parser.py, db.py, sync.py, scheduler.py, routes.py) and creates Tier-B in-thread drafts (compose.py). It has its own migrations/.
  • Captured email becomes CRM activity through a propose → approve flow — nothing lands on a contact record until a human approves the proposal. The proposed grid notes show on the Email Capture page (admin-only): each card has a View email toggle that fetches GET /api/email/detail?id= and shows the source email inline (from/to/cc/date/subject + scrollable body) so you can judge the note against it. The same proposals can also be reviewed/approved/edited from a dedicated Matrix room, kept in sync with this panel (decide on either surface; the other reflects it) — that CRM→Matrix bridge lives in the review bot, see docs/guides/matrix-intake.md. The proposal model itself (email_activity_proposals + the propose_email_activity_notes drafter + the decide path) lives in backend/server.py, not this package.
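
One way to keep both review surfaces consistent, sketched under assumptions. A decide is a conditional UPDATE that only fires while the proposal is still pending, so whichever surface decides first wins and the other observes the new status on its next poll. The status and decided_via columns are hypothetical; the real email_activity_proposals schema and decide path live in backend/server.py.

```python
import sqlite3

# Hypothetical minimal schema for illustration only.
PROPOSALS_SCHEMA = """
CREATE TABLE IF NOT EXISTS email_activity_proposals (
    id          INTEGER PRIMARY KEY,
    status      TEXT NOT NULL DEFAULT 'pending',  -- pending | approved | rejected
    decided_via TEXT                               -- 'web' or 'matrix'
);
"""

DECISIONS = {"approved", "rejected"}


def decide(conn: sqlite3.Connection, proposal_id: int, decision: str, via: str) -> bool:
    """Record a decision only if the proposal is still pending.

    Returns True if this call decided it, False if another surface got there first.
    """
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    cur = conn.execute(
        "UPDATE email_activity_proposals SET status = ?, decided_via = ? "
        "WHERE id = ? AND status = 'pending'",
        (decision, via, proposal_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

An NL edit would revise the draft note before the approve; that step is left out of the sketch.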

Hard rule

  • Agents draft; humans send. Never let an agent send email, post, or contact an LP autonomously. Tier-B compose.py only creates a Gmail draft for human review.

Outbound mail — the daily digest (internal; exempt from "agents draft")

The CRM sends an internal daily activity digest to the fund's own admins. This is the ONE automated send path, and it does not violate the hard rule above: that rule governs outward LP/prospect contact. An internal ops email to the team's own inboxes is a different category. Never extend this path to send to LPs/prospects.

  • Transport selector: backend/digest_mailer.py (top-level, not in this package) — send_digest(conn, to_addrs, subject, body) picks Gmail-DWD (preferred) → SMTP (fallback). DWD-impersonation sender = CRM_DIGEST_SENDER env, else the first active admin.
  • Gmail-DWD path: gmail_send.py (this package) — reuses credentials.py's DWDCredentialProvider with the gmail.compose scope to call users.messages.send (REST, mirrors compose.py; body is {raw} not the draft's {message:{raw}}). The deployment's DWD grant includes gmail.compose (which authorizes send) but not the narrow gmail.send — so request gmail.compose. Verified live 2026-06-15 (token mint + a real messages.send).
  • SMTP fallback: backend/smtp_send.py (top-level) — stdlib smtplib reading SMTP_* env, populated on the box by the Configure Digest SMTP Start9 action (writes /data/secrets/smtp/*; entrypoint exports SMTP_*). A dedicated per-package account, independent of any StartOS system-wide SMTP.
  • The admin POST /api/admin/digest/test-email restricts recipients to the active-admin set (not an open relay), and logs send failures rather than echoing them (an auth error can carry a token/credential).
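
A sketch of the two Gmail payload shapes, the ordered fallback, and the admin-only recipient filter, using only the standard library. The Gmail REST API takes the RFC 2822 message base64url-encoded; messages.send wants {raw} while drafts.create wraps it as {message:{raw}}. send_with_fallback and restrict_to_admins are hypothetical helpers, not the actual send_digest signature; transports are passed in as callables so the sketch needs no network.

```python
import base64
from email.message import EmailMessage
from typing import Callable, Iterable, List, Sequence, Tuple

# (sender, to_addrs, subject, body) -> None; raises on failure.
Transport = Callable[[str, Sequence[str], str, str], None]


def build_raw(sender: str, to_addrs: Sequence[str], subject: str, body: str) -> str:
    """RFC 2822 message, base64url-encoded as the Gmail REST API expects."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = subject
    msg.set_content(body)
    return base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")


def send_body(raw: str) -> dict:
    return {"raw": raw}  # users.messages.send


def draft_body(raw: str) -> dict:
    return {"message": {"raw": raw}}  # users.drafts.create (compose.py's shape)


def restrict_to_admins(requested: Iterable[str], admin_addrs: Iterable[str]) -> List[str]:
    """Drop any recipient outside the active-admin set (no open relay)."""
    allowed = {a.strip().lower() for a in admin_addrs}
    return [a for a in requested if a.strip().lower() in allowed]


def send_with_fallback(
    transports: Sequence[Tuple[str, Transport]],
    sender: str, to_addrs: Sequence[str], subject: str, body: str,
) -> str:
    """Try transports in order (e.g. Gmail-DWD, then SMTP); return the one that worked."""
    failures = []
    for name, send in transports:
        try:
            send(sender, to_addrs, subject, body)
            return name
        except Exception as exc:
            # Keep only the exception type: an auth error's text can carry a token.
            failures.append(f"{name}: {type(exc).__name__}")
    raise RuntimeError("all transports failed: " + "; ".join(failures))
```

Recording only the exception type keeps credentials out of anything that might later be logged or surfaced.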

Phase B — the daily digest itself (built)

  • Content builder: backend/digest_builder.py (top-level). build_digest(conn, since_iso, until_iso, chat_fn=None) returns {subject, body, has_activity, user_count, email_count, investor_count} and composes two sections:
    • By team member (collect_user_activity): per registered user, both directions (per-mailbox eam.is_sent), with one Spark narrative paragraph per user (ingest/llm.py → Spark Control /v1/chat/completions), never Claude (the digest is deliberately un-anonymized; real LP names + substance stay local). Deterministic count-only fallback if Spark is unreachable (always-send must not fail).
    • By investor (collect_investor_activity): re-pivots the same window across the whole team, deduped per email (a reply to several teammates counts once), direction decided at the email level (outbound if from_email is one of our mailboxes, else inbound). Structured list, no extra Spark calls.
    • Soft-delete filters: email_account_messages.deleted_at IS NULL + users.is_active = 1, and the org/contact name joins drop soft-deleted rows (falling back to the matched address).
  • Control is DB-backed, set from the admin panel: digest_builder.load_digest_policy(conn) reads app_settings.digest_policy = {enabled, send_hour}. Precedence: DB row wins (the Settings → Admin toggle + send-time dropdown), else CRM_DIGEST_ENABLED / CRM_DIGEST_SEND_HOUR seed a first-boot default, else {false, 18}. GET/PATCH /api/admin/digest/policy (admin-only) read/write it. Not a StartOS action: it's an operational toggle, so it lives in-app where it's discoverable and takes effect live.
  • Scheduler: backend/email_integration/digest_scheduler.py (co-located with the sync scheduler). One daemon thread, always started; each cycle (60s) re-reads the DB policy and sends once per local day at/after send_hour only when enabled — so toggling in the panel takes effect with no restart. Content window = (last send, now]; cursor (digest_last_sent_at) + once-per-day guard (digest_last_sent_date) live in app_settings, so a missed day rolls into the next digest. Recipients = all active admins.
  • Windowed preview + manual send (Settings → Admin "Manual run & preview"):
    • POST /api/admin/digest/preview (admin-only) builds the digest over a chosen window and returns {subject, body, …, window} without sending — it runs the real Spark summarization, so widening the window is how you verify the summarizer on a quiet day (a last-24h window with no activity never calls Spark). Rendered in an in-panel preview.
    • POST /api/admin/digest/send-now (admin-only) sends over the same window to the admin set now.
    • Both take the window from the body: default last 24h, {"hours": N}, or {"since": "YYYY-MM-DD"} (a local date → that day's midnight). Resolved by digest_builder.resolve_digest_window (capped at MAX_WINDOW_DAYS=92, validated → 400 on bad input). The send goes through digest_scheduler.send_digest_window, which — like the old force=True path — does NOT advance the daily cursor, so a wide manual preview/send never suppresses the scheduled daily digest.
    • The "Send transport test" button (POST /api/admin/digest/test-email) stays as a pure pipe check (fixed message, admin-recipient-restricted).
  • Decisions (locked): 6 PM default send · always-send (empty days get a "no activity" note) · per-user narrative + by-investor structured section · enable/time controlled in the admin panel. Tests: backend/test_digest_builder.py (per-user + per-investor queries, soft-delete, inbound dedup, two-section compose, fallback, policy resolver, scheduler guards — stubbed LLM + transport).
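
The window, policy, and scheduling rules above, sketched as pure functions so they can be checked without a DB or clock. Assumptions: resolve_window clamps an over-long window to MAX_WINDOW_DAYS rather than rejecting it, raises ValueError where the real route would return 400, and an env value of 1/true/yes means enabled. load_policy and is_due are hypothetical names, not the digest_builder / digest_scheduler API.

```python
from datetime import date, datetime, timedelta
from typing import Mapping, Optional, Tuple

MAX_WINDOW_DAYS = 92
DEFAULT_POLICY = {"enabled": False, "send_hour": 18}


def resolve_window(body: Mapping, now: datetime) -> Tuple[datetime, datetime]:
    """Return (since, until=now). Raises ValueError where the route would 400."""
    if "since" in body:
        try:
            day = date.fromisoformat(body["since"])  # a local date -> its midnight
        except (TypeError, ValueError):
            raise ValueError("since must be YYYY-MM-DD") from None
        since = datetime(day.year, day.month, day.day, tzinfo=now.tzinfo)
    elif "hours" in body:
        hours = body["hours"]
        if isinstance(hours, bool) or not isinstance(hours, (int, float)) or not hours > 0:
            raise ValueError("hours must be a positive number")
        since = now - timedelta(hours=min(hours, MAX_WINDOW_DAYS * 24))
    else:
        since = now - timedelta(hours=24)  # default: last 24h
    if since >= now:
        raise ValueError("window must start before now")
    # Assumption: over-long windows are clamped, not rejected.
    return max(since, now - timedelta(days=MAX_WINDOW_DAYS)), now


def load_policy(db_row: Optional[Mapping], env: Mapping[str, str]) -> dict:
    """DB row wins; else env seeds a first-boot default; else {false, 18}."""
    if db_row is not None:
        return {"enabled": bool(db_row["enabled"]), "send_hour": int(db_row["send_hour"])}
    policy = dict(DEFAULT_POLICY)
    if "CRM_DIGEST_ENABLED" in env:
        policy["enabled"] = env["CRM_DIGEST_ENABLED"].strip().lower() in ("1", "true", "yes")
    if "CRM_DIGEST_SEND_HOUR" in env:
        policy["send_hour"] = int(env["CRM_DIGEST_SEND_HOUR"])
    return policy


def is_due(policy: Mapping, now_local: datetime, last_sent_date: Optional[str]) -> bool:
    """Send at most once per local day, at/after send_hour, only when enabled."""
    return bool(
        policy["enabled"]
        and now_local.hour >= policy["send_hour"]
        and last_sent_date != now_local.date().isoformat()
    )
```

Re-reading the policy on every 60s cycle and passing it to a check like is_due is what lets a panel toggle take effect without a restart.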

Email-activity panel (Communications tab) — admin-only

The Communications tab (frontend) is the admin-only search over captured Gmail. The classic manual "Log Communication" form was retired (the Fundraising Grid context menu is the manual-log path). Backed by GET /api/email/activity (routes.py:_h_activity, require_admin server-side) → db.query_email_activity(conn, ...) (the pure, tested query). Filters: investor_id, account_id (mailbox), direction (inbound/outbound), q (free-text over subject/snippet/from). Non-obvious semantics to preserve:

  • Matched-only: the panel surfaces ONLY email that links to a known investor/contact (query_email_activity gates on EXISTS email_investor_links). Capture still stores unmatched cold/unknown-sender email (metadata only, see "match-only full storage"), but it is never shown here — the Communications tab is the investor-relationship view, not the raw mailbox.
  • Soft-delete lives on the per-mailbox sighting, not the email: emails has no deleted_at. An email is "live" iff it has a sighting with email_account_messages.deleted_at IS NULL; the query gates on EXISTS(... deleted_at IS NULL). (Investor links are email-level and carry no deleted_at, so they need no separate filter.)
  • Direction is decided at the email level — outbound if from_email is one of our email_accounts addresses, else inbound — mirroring digest_builder._own_addresses.
  • Graveyard investors are hidden from the filter dropdown (CRM-wide graveyard = 0), but their captured email still shows in the list and stays findable by free-text search — it's an audit surface, so history is never hidden, only the picker is.
  • Typed investor facet (the dropdown). The picker mirrors what the list resolves: one entry per distinct matched entity, with the digest's precedence (grid investor → org → contact → raw address) and a typed key of the form fund:<id> / org:<id> / contact:<id> (investor_id= accepts these; a bare id is treated as fund: for back-compat). This fixed the "dropdown only shows All investors" bug: matches that land on a classic contact or org domain (no grid id, which is common since fundraising_contacts.email is sparsely populated) now resolve to a real name and appear in the picker, instead of the facet coming back empty. Raw-address-only matches stay out of the picker (noisy) but still show + search in the list. Helpers: db._resolve_entity + the shared _LINK_IDENTITY_COLS/_LINK_IDENTITY_JOINS.
  • Date range: since/until filter e.sent_at as a half-open [since, until) interval; the UI sends from as …T00:00:00 and to as the next day's midnight, so the whole "to" day is included regardless of the stored timestamp's precision/zone.
  • Detail view: GET /api/email/detail?id= (_h_detail, require_admin) → db.query_email_detail returns the full body + to/cc recipients + attachments + typed identities, soft-delete-gated on a live sighting (404 otherwise). The UI renders body_text (escaped) — never raw remote body_html (XSS); click a row to expand.
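
A sketch of three of the semantics above: parsing the typed facet key (with the bare-id back-compat), deciding direction at the email level, and turning the UI's inclusive from/to days into a half-open [since, until) window. Function names are hypothetical, and own_addresses is assumed to hold lower-cased email_accounts addresses.

```python
from datetime import date, datetime, time, timedelta
from typing import AbstractSet, Tuple

KEY_KINDS = ("fund", "org", "contact")


def parse_investor_key(raw: str) -> Tuple[str, int]:
    """'fund:<id>' / 'org:<id>' / 'contact:<id>'; a bare id means fund (back-compat)."""
    kind, sep, ident = raw.partition(":")
    if not sep:
        kind, ident = "fund", raw
    if kind not in KEY_KINDS or not ident.isdecimal():
        raise ValueError(f"bad investor key: {raw!r}")
    return kind, int(ident)


def direction(from_email: str, own_addresses: AbstractSet[str]) -> str:
    """Email-level: outbound iff the sender is one of our own mailboxes."""
    return "outbound" if from_email.strip().lower() in own_addresses else "inbound"


def ui_range(from_day: date, to_day: date) -> Tuple[datetime, datetime]:
    """Inclusive UI days -> half-open [since, until): until is the day after to_day."""
    since = datetime.combine(from_day, time.min)
    until = datetime.combine(to_day + timedelta(days=1), time.min)
    return since, until


def in_range(sent_at: datetime, since: datetime, until: datetime) -> bool:
    return since <= sent_at < until
```

Using next-day midnight as an exclusive bound means a message stamped late on the "to" day is included whatever the timestamp's sub-second precision.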

Content search (semantic, over email bodies) — admin-only

The Communications tab has a Filter ⇄ Search content toggle. "Search content" is semantic search over the email bodies indexed in Qdrant (distinct from the structured subject/sender LIKE filters above). GET /api/email/search?q= (routes._h_search, require_admin):

  • Retrieval = ingest/search.py:hybrid_search (dense + BM25, reranked) pre-filtered to doc_type='email', imported lazily (the ingest stack — Spark Control + Qdrant + the sparse encoder — ships in the Docker image, not the bare CRM); any failure → a clean 503.
  • Only matched email bodies are indexed (see ingest/chunking.py); the Qdrant payload carries source_id=email_id, lp_name, date_ts, so hits link straight back to the row.
  • Hydrated + soft-delete-filtered against SQLite (canonical): db.search_hit_emails drops any hit whose email no longer has a live sighting — the derived index can lag a deletion, and we never surface a fact from Qdrant that SQLite has tombstoned.
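
A sketch of the hydrate-and-drop step against SQLite. It assumes hits arrive in reranked order with the email id in source_id (as the payload description above says), and that email_account_messages links a sighting to its email through an email_id column and emails has a subject column (both assumptions). Several chunks of one email collapse to its best-ranked position, and any id without a live sighting is dropped.

```python
import sqlite3
from typing import Dict, Iterable, List


def live_hits(conn: sqlite3.Connection, hits: Iterable[Dict]) -> List[Dict]:
    """Keep retrieval order, one entry per email, only emails with a live sighting."""
    ordered: List[int] = []
    seen = set()
    for hit in hits:  # assumed already reranked, best first
        email_id = hit["source_id"]
        if email_id not in seen:  # several chunks of one email -> best rank wins
            seen.add(email_id)
            ordered.append(email_id)
    if not ordered:
        return []
    placeholders = ",".join("?" * len(ordered))
    rows = conn.execute(
        f"""
        SELECT e.id, e.subject
        FROM emails e
        WHERE e.id IN ({placeholders})
          AND EXISTS (SELECT 1 FROM email_account_messages m
                      WHERE m.email_id = e.id AND m.deleted_at IS NULL)
        """,
        ordered,
    ).fetchall()
    live = {row[0]: row[1] for row in rows}  # SQLite is canonical
    return [{"email_id": i, "subject": live[i]} for i in ordered if i in live]
```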

Tests: backend/email_integration/test_email_activity_panel.py (panel filters/facets/detail + the search route's hydrate/drop/503/admin paths, with retrieval stubbed).

Known gap

  • Tier-B drafts currently reply to the LP only; reply-all is the next change (see AGENTS.md → Current state).

See also docs/gmail-enablement-runbook.md.