Email search/query + windowed digest preview (v0.1.0:83)
Communications tab (search/query roadmap items 1 & 2):
- Fix the investor dropdown: the facet only listed grid investors, so it came back empty whenever email matched a classic contact or org domain (no grid id, the common case). It now mirrors the email list, resolving each link to a typed identity (fund:/org:/contact:/addr:) with precedence grid -> org -> contact -> address; investor_id accepts the typed key (bare id = fund: for back-compat) and an unknown prefix matches nothing.
- Add a date-range filter and a click-to-expand full-body view (GET /api/email/detail, admin, soft-delete-gated; body_text only, never raw remote HTML).
- Add a "Search content" mode: GET /api/email/search wraps the ingest hybrid_search over the Qdrant email index (doc_type=email), hydrated and soft-delete-filtered against SQLite (canonical), 503 if Spark/Qdrant down.

Daily digest:
- Settings -> Admin builds a digest over a chosen window (last 24h or since a date) as an in-app preview before sending (POST /api/admin/digest/preview), so the local-Spark summarizer can be verified on demand even on a quiet day. Manual send uses the same window; neither advances the daily cursor, so a preview never suppresses the scheduled digest.

Code-only, migrations no-op. 22/22 backend tests, render-smoke pass.
+5
-3
@@ -87,7 +87,7 @@
## Backlog (post-Phase-1 agentic)
### Daily activity digest (email to the team)
*Requested 2026-06-15. **Phase A deployed** (v0.1.0:76). **Phase B deployed & verified live in v0.1.0:77 (2026-06-16)** — digest content + Spark summarization + daily scheduler + by-investor section + admin-panel control + on-demand send. Auto-send defaults OFF until an admin enables it in Settings → Admin.*
*Requested 2026-06-15. **Phase A deployed** (v0.1.0:76). **Phase B deployed & verified live in v0.1.0:77 (2026-06-16)** — digest content + Spark summarization + daily scheduler + by-investor section + admin-panel control + on-demand send. Auto-send defaults OFF until an admin enables it in Settings → Admin. **v0.1.0:83 (built, deploy pending): in-app windowed preview** — Settings → Admin builds a digest over a chosen window (last 24h or since a date) and shows it before sending (`POST /api/admin/digest/preview`), so the **real Spark summarizer can be verified on demand** even on a quiet day (the fixed last-24h `send-now` couldn't); manual send uses the same window and never touches the daily cursor.*
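
The windowing rule described above (last 24h or since a chosen date, with the daily cursor left untouched) could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: `resolve_window`, `preview_digest`, the `state` dict holding the cursor, and the `build` callback are all hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple


def resolve_window(now: datetime, since: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end): an explicit since-date if given, else the last 24 hours."""
    start = since if since is not None else now - timedelta(hours=24)
    if start >= now:
        raise ValueError("window start must be earlier than now")
    return start, now


def preview_digest(
    state: dict,
    build: Callable[[datetime, datetime], str],
    now: datetime,
    since: Optional[datetime] = None,
) -> str:
    """Build a digest for the chosen window; state['cursor'] is never written here."""
    start, end = resolve_window(now, since)
    return build(start, end)


if __name__ == "__main__":
    now = datetime(2026, 6, 16, 18, 0, tzinfo=timezone.utc)
    state = {"cursor": now - timedelta(days=1)}  # hypothetical daily cursor
    print(preview_digest(state, lambda s, e: f"digest {s.isoformat()} .. {e.isoformat()}", now))
```

Keeping the cursor out of the preview path entirely, rather than saving and restoring it, is one simple way to guarantee that a preview cannot suppress the scheduled digest.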
**Decisions (locked 2026-06-15):** recipients = **all active admins**; summarization = **Spark-LLM narrative** (never Claude — un-anonymized substance stays local); granularity = **grouped by user** (→ per investor).
@@ -111,13 +111,15 @@ Open design questions (settled at build time): send time = **6 PM box-local** (c
### Email/communication search + natural-language query
*Requested 2026-06-16. Three increments, **sequenced 1 → 2 → 3** (1 and 2 first as a quick increment; 3 is a separate, larger build after). Origin: Grant asked whether we can query "emails sent to a specific investor" / "activity by user," and floated NL queries like "existing investors who have committed capital across our funds that we haven't emailed in a while."*
**Status: items 1 & 2 SHIPPED in v0.1.0:83 (built + verified locally 2026-06-16, deploy pending).** The Communications tab now has the structured activity surface (item 1: typed/fixed investor dropdown, mailbox + direction + **date-range** filters, free-text, **click-to-expand full body** via `GET /api/email/detail`) and a **"Search content"** semantic mode (item 2: `GET /api/email/search` over the Qdrant email index). The dropdown-empty bug (the facet only listed grid investors) was the v83 fix — it now mirrors the list across grid/org/contact matches. **Item 3 (NL→SQL) remains** — the larger, separate build below. Detail: `docs/guides/email.md`.
**Context — the data is captured but currently has NO front-end.** The entire Gmail email schema (`emails`, `email_threads`, `email_investor_links`, `email_account_messages`, `email_activity_proposals`, …) exists and is populated by the DWD capture pipeline, but is surfaced **nowhere** in `frontend/index.html` today (only as inputs to the daily digest). So all three items below are about making already-captured data queryable/visible. Email bodies of *matched* emails are already chunked + embedded into Qdrant with `{lp_id, lp_name, doc_type:"email", date_ts}` metadata.
**Caveat that shapes all three — the two-model join.** "Emails to an investor" link to the **fundraising grid** (`email_investor_links.fundraising_investor_id`); "committed capital" lives in the grid too (`fundraising_commitments`, multi-fund). But manually-logged `communications` and `lp_profiles` (single-fund) live in the **classic** model, and the two models are only bridged by fuzzy email/name matching (no authoritative join key). Any query spanning "committed capital" + "email recency" must reckon with this. Prefer the grid side as the higher-signal source (matcher already does).
**1. Activity query endpoints + panel (do first).** The logic already exists and is tested inside `backend/digest_builder.py` — `collect_user_activity()` (per team-member, sent vs received, with matched investor names) and `collect_investor_activity()` (re-pivoted by investor, team-wide). Expose them as on-demand endpoints (e.g. `GET /api/activity?user_id=…&since=…&until=…` and `…?investor_id=…`) returning the actual records (not just the counts that `/api/reports/activity` gives today), plus a simple UI panel. Answers "emails to investor X" and "what has user Y sent lately" interactively. Small build — mostly assembling tested parts + a thin UI. Soft-delete filter every read.
**1. Activity query endpoints + panel — DONE (v0.1.0:83).** Delivered as the **Communications tab** rather than the originally-sketched `/api/activity` endpoints: `GET /api/email/activity` (`db.query_email_activity`) returns the actual records filterable by investor / mailbox / direction / **date range** / free-text, and `GET /api/email/detail` expands the full body. Answers "emails to investor X" and "what has mailbox Y sent" interactively. Soft-delete filtered throughout; investor identity is typed (`fund:`/`org:`/`contact:`) so org/contact-only matches resolve and are pickable. *(The `collect_user_activity()`/`collect_investor_activity()` digest helpers remain the by-user/by-investor pivot source; a dedicated per-user pivot UI was not needed for the answer Grant wanted, which the mailbox+direction filters already give.)*
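
A minimal sketch of the typed investor key, following the rules stated in the commit message: a bare id is read as `fund:` for back-compat, an unknown prefix matches nothing, and each link resolves by precedence grid → org → contact → address. The function names and the `org_id` / `contact_id` / `address` fields are hypothetical; only `fundraising_investor_id` appears in this document.

```python
from typing import Optional, Tuple

# Known prefixes, listed in precedence order: grid (fund) -> org -> contact -> address.
KNOWN_KINDS = ("fund", "org", "contact", "addr")


def parse_investor_key(raw: str) -> Optional[Tuple[str, str]]:
    """Split 'kind:value' into (kind, value). None means 'match nothing'."""
    raw = raw.strip()
    if not raw:
        return None
    if ":" not in raw:
        return ("fund", raw)  # bare id is treated as fund: for back-compat
    kind, _, value = raw.partition(":")
    if kind not in KNOWN_KINDS or not value:
        return None  # unknown prefix (or empty value) matches nothing
    return (kind, value)


def resolve_identity(link: dict) -> Optional[str]:
    """Return the highest-precedence typed key present on one email-investor link."""
    fields = (
        ("fund", "fundraising_investor_id"),
        ("org", "org_id"),          # hypothetical column name
        ("contact", "contact_id"),  # hypothetical column name
        ("addr", "address"),        # hypothetical column name
    )
    for kind, field in fields:
        value = link.get(field)
        if value:
            return f"{kind}:{value}"
    return None
```

Building the dropdown facet and the list filter from the same resolver is what keeps the two in step, so an org-only or contact-only match that appears in the list is also pickable in the dropdown.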
**2. Email content search box (do first, alongside 1).** Wire a search box onto the email bodies **already indexed in Qdrant** (capability is ~80% built — see the retrieval modes in `backend/ingest/search.py` and the MCP `hybrid_search`/`semantic_search`/`keyword_search` tools). This is semantic/lexical search over email *content* ("find where we discussed the mining deal"), distinct from the structured filters in item 1. Decide placement (global search bar vs. a dedicated email/search page — note there's no email UI at all today, so this may pair naturally with surfacing threads). Small.
**2. Email content search box — DONE (v0.1.0:83).** A **"Search content"** toggle in the Communications tab → `GET /api/email/search?q=` wraps `backend/ingest/search.py:hybrid_search` filtered to `doc_type='email'`; hits are hydrated + soft-delete-filtered against SQLite (canonical) and link back to the full body. Semantic/lexical search over email *content* ("find where we discussed the mining deal"), distinct from item 1's structured filters. 503 (clean "unavailable") when Spark/Qdrant is unreachable.
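
The "hydrated + soft-delete-filtered against SQLite (canonical)" step might be sketched as below. This is an assumption-laden illustration, not the actual endpoint: the hit shape (`email_id`, `score`), the `subject` and `deleted_at` columns, and the `hydrate_hits` name are hypothetical; only the `emails` table name comes from this document.

```python
import sqlite3
from typing import Iterable, List


def hydrate_hits(conn: sqlite3.Connection, hits: Iterable[dict]) -> List[dict]:
    """Keep hits whose email row exists and is not soft-deleted, preserving rank order."""
    hits = list(hits)
    ids = [h["email_id"] for h in hits]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        # deleted_at IS NULL is the assumed soft-delete convention
        f"SELECT id, subject FROM emails WHERE id IN ({placeholders}) AND deleted_at IS NULL",
        ids,
    ).fetchall()
    subject_by_id = {row[0]: row[1] for row in rows}
    return [
        {"email_id": h["email_id"], "subject": subject_by_id[h["email_id"]], "score": h["score"]}
        for h in hits
        if h["email_id"] in subject_by_id
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT, deleted_at TEXT)")
    conn.executemany(
        "INSERT INTO emails VALUES (?, ?, ?)",
        [(1, "Mining deal", None), (2, "Old thread", "2026-06-01")],
    )
    vector_hits = [{"email_id": 2, "score": 0.9}, {"email_id": 1, "score": 0.8}]
    print(hydrate_hits(conn, vector_hits))
```

Treating SQLite as the source of truth means a vector-index entry for a deleted or missing email is simply dropped from the results rather than surfaced from stale index metadata.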
**3. Natural-language → safe structured query (separate, larger, after 1 & 2).** An LLM translates a plain-English question into a **safe, read-only** DB query against the CRM, for relational/analytical questions that semantic search *cannot* answer — Grant's example ("committed across funds AND not emailed in a while") is joins + aggregates + recency, not a text-topic match. Design constraints (locked at request time, refine at build):
- **LLM = Claude behind the redaction boundary** (better at text-to-SQL than local Qwen; the scrub→Claude→re-hydrate path already exists for the PII concern). Not Spark — Spark Control offers embeddings/rerank/RAG + local chat, but **no text-to-SQL**.