Files
ten31-database/ROADMAP.md
T
Keysat c7b74a2704 Email search/query + windowed digest preview (v0.1.0:83)
Communications tab (search/query roadmap items 1 & 2):
- Fix the investor dropdown: the facet only listed grid investors, so it
  came back empty whenever email matched a classic contact or org domain
  (no grid id — the common case). It now mirrors the email list, resolving
  each link to a typed identity (fund:/org:/contact:/addr:) with precedence
  grid -> org -> contact -> address; investor_id accepts the typed key
  (bare id = fund: for back-compat) and an unknown prefix matches nothing.
- Add a date-range filter and a click-to-expand full-body view
  (GET /api/email/detail, admin, soft-delete-gated; body_text only, never
  raw remote HTML).
- Add a "Search content" mode: GET /api/email/search wraps the ingest
  hybrid_search over the Qdrant email index (doc_type=email), hydrated and
  soft-delete-filtered against SQLite (canonical), 503 if Spark/Qdrant down.

Daily digest:
- Settings -> Admin builds a digest over a chosen window (last 24h or since
  a date) as an in-app preview before sending (POST /api/admin/digest/preview),
  so the local-Spark summarizer can be verified on demand even on a quiet day.
  Manual send uses the same window; neither advances the daily cursor, so a
  preview never suppresses the scheduled digest.

Code-only, migrations no-op. 22/22 backend tests, render-smoke pass.
2026-06-16 20:46:15 -05:00

156 lines
21 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Venture CRM Roadmap (Airtable Replacement)
## Current status
- Premium Airtable-like frontend grid exists and is actively iterating.
- Backend now has production-grade APIs for:
- `GET /api/fundraising/state`
- `PUT /api/fundraising/state` (with optimistic version check)
- `GET /api/fundraising/export`
- `POST /api/fundraising/backup`
- `POST /api/fundraising/restore-preview`
- `POST /api/fundraising/restore`
- `GET /api/fundraising/backups`
- `GET/PATCH /api/fundraising/backup-policy`
- `GET /api/fundraising/relational-summary`
- `GET /api/feature-requests`
- `POST /api/feature-requests`
- `PATCH /api/feature-requests/:id`
- New DB tables:
- `fundraising_state`
- `fundraising_investors`
- `fundraising_contacts`
- `fundraising_funds`
- `fundraising_commitments`
- `fundraising_views`
- `feature_requests`
- `app_settings`
- Grid saves/restores now sync into relational fundraising tables automatically.
- Formula engine is now sandboxed (no `eval`/`new Function`) with expanded function support.
- Automation engine v1 added:
- Rule table + toggle API
- List memberships (`main`, `follow_up`, `graveyard`, `longshot`, `all`)
- Automation run log
- Collaboration/reliability additions:
- Unified activity feed API (`audit` + `automation` + `backup`)
- Backup integrity verification API
- Better version-conflict metadata (`updated_at`, `updated_by`)
- Security hardening additions:
- Basic IP rate limiting (login and write APIs)
- Configurable CORS origin (`CRM_CORS_ORIGIN`)
- Production secret enforcement (`CRM_ENV=production` requires `CRM_SECRET_KEY`)
- Security status API + go-live checklist (`SECURITY.md`)
## Phase 1 (Production foundation)
1. Persist grid + views on backend
- Wire frontend fundraising grid reads/writes to `/api/fundraising/state`.
- Keep localStorage only as emergency fallback.
- Add autosave debounce and conflict handling (`expected_version`).
2. Admin-invite auth model
- Disable self-register for non-admin users.
- Add admin-only invite/create-user endpoint.
- Keep role model: `admin`, `member`.
3. Deployment and remote access
- Add `docker-compose` for one-command launch.
- Reverse proxy + TLS option (Caddy/Traefik) for non-Tailscale deployments.
- Recommended for your use case: Tailscale private access to laptop host.
4. Data safety and operations
- Automated nightly SQLite backups and restore test script.
- Add `/api/fundraising/export` for JSON snapshot export.
- Add health/readiness checks.
## Phase 2 (Airtable parity)
1. Advanced views
- Multi-condition filter groups (AND/OR groups)
- Multi-column sorting
- Pinned/frozen columns
- Personal vs shared views
2. Formula engine v2
- Add functions: `SUM`, `MIN`, `MAX`, `ROUND`, `ABS`, `CONCAT` (done)
- Type-aware formulas and better errors
- Dependency graph and recalculation rules
3. Activity + audit
- Record-level change history in UI
- Last modified by / at fields
- Restore archived rows
## Phase 3 (Team workflow and automation)
1. Tasks/reminders tied to investors/contacts
2. Automation rules (graveyard/follow-up triggers)
3. Email/communication integrations (optional)
4. Granular permissions (if team grows)
## Backlog (post-Phase-1 agentic)
### Daily activity digest (email to the team)
*Requested 2026-06-15. **Phase A deployed** (v0.1.0:76). **Phase B deployed & verified live in v0.1.0:77 (2026-06-16)** — digest content + Spark summarization + daily scheduler + by-investor section + admin-panel control + on-demand send. Auto-send defaults OFF until an admin enables it in Settings → Admin. **v0.1.0:83 (built, deploy pending): in-app windowed preview** — Settings → Admin builds a digest over a chosen window (last 24h or since a date) and shows it before sending (`POST /api/admin/digest/preview`), so the **real Spark summarizer can be verified on demand** even on a quiet day (the fixed last-24h `send-now` couldn't); manual send uses the same window and never touches the daily cursor.*
**Decisions (locked 2026-06-15):** recipients = **all active admins**; summarization = **Spark-LLM narrative** (never Claude — un-anonymized substance stays local); granularity = **grouped by user** (→ per investor).
**Send transport — DECIDED 2026-06-15: Gmail domain-wide delegation** (not SMTP). The box's existing service-account grant (which powers email capture) **includes `gmail.compose`**, which authorizes `users.messages.send` — verified by a token-mint probe **and a live `messages.send` to grant**. So the digest sends through the account the CRM already uses: **no app password, no new account, no admin change.** The narrow `gmail.send` scope is *not* granted, so the sender must request `gmail.compose`.
**Phase A — DONE:** (v0.1.0:75) `configureDigestSmtp` Start9 action + `docker_entrypoint.sh` `SMTP_*` export + `backend/smtp_send.py` + admin `POST /api/admin/digest/test-email` (recipient-restricted to the admin set — not an open relay) + Settings button. (v0.1.0:76, redeploy pending) `backend/email_integration/gmail_send.py` (`users.messages.send` via DWD/compose) + `backend/digest_mailer.py` (**Gmail-DWD preferred, SMTP fallback**); the endpoint + button route through it; sender = `CRM_DIGEST_SENDER` else first active admin. Tests: `test_smtp_send.py`, `test_smtp_endpoint.py`, `test_gmail_send.py`.
**Phase B — DONE (2026-06-15/16):** `backend/digest_builder.py` builds **two sections***by team member* (per-user **Spark** narrative + both directions, with a deterministic fallback) and *by investor* (team-wide, inbound + outbound, deduped per email, structured). Soft-delete filtered throughout. `backend/email_integration/digest_scheduler.py` is an always-on daily thread that re-reads a **DB-backed policy** each cycle and sends once/day at the configured hour to all active admins (window cursor in `app_settings`). Control moved out of env into the **admin panel**: `app_settings.digest_policy` + `GET/PATCH /api/admin/digest/policy` + a Settings → Admin **enable toggle + send-time dropdown** (env vars only seed the first-boot default). Plus admin `POST /api/admin/digest/send-now` + a "Send Digest Now" button. Decisions settled: **6 PM default**, **always-send** (empty-day note), **per-user narrative + by-investor section**, **in-app control** (not StartOS). Tests: `backend/test_digest_builder.py`. Detail: `docs/guides/email.md`.
Have the CRM send a **daily digest email** summarizing each registered user's activity — primarily **who emailed which investors and the substance of those emails** — to the fund principal (and eventually other admins). Scales with the synced-user count: 2 users synced today, ~5 eventually.
- **Source data:** the captured email-activity already flowing through the Gmail DWD propose→approve pipeline (`backend/email_integration/`), keyed per registered user → per investor/contact. Optionally fold in other CRM activity (audit feed, automation runs, new opportunities) later.
- **Send path is NEW capability.** Today nothing leaves the box — the system only *captures* Gmail and *creates drafts*. This needs outbound SMTP. StartOS 0.4 has a system-wide SMTP account (since v0.4.0-beta.9): the user configures it once for the whole server and services read it via `sdk.getSystemSmtp(effects).const()`, which returns a `T.SmtpValue` (`host`, `port`, `from`, `username`, `password`, `security`). Wire the digest sender to that rather than hardcoding any account. *Implementation path (researched 2026-06-15, our SDK pin `^0.4.0-beta.66`):* model a `manageSMTP` action on [gitea-startos](https://github.com/Start9Labs/gitea-startos) / [vaultwarden-startos](https://github.com/Start9Labs/vaultwarden-startos) — a three-way `selection` (system / custom / disabled) built on `sdk.inputSpecConstants.smtpInputSpec`, persisted to `storeJson`, with `main.ts` injecting `SMTP_HOST/PORT/USER/PASS/FROM/SECURITY` env vars into the daemon `exec` block (same shape as the existing `setAnthropicApiKey.ts` action). The Python sender reads them via `os.environ` and opens `smtplib.SMTP`/`SMTP_SSL`. **"Custom SMTP" is a dedicated per-package account, fully independent of the server's system SMTP** — the custom branch never calls `getSystemSmtp`, so the digest can send through its own provider even on a box with no system account configured (confirmed in both reference packages). This is the likely fit here: a digest-only mailbox separate from anyone's Gmail. Note StartOS 0.4 dropped the old `Config`/`Properties` manifest spec — SMTP config is an **action + storeJson**, not a manifest config field. **SDK note (verified 2026-06-15):** our pin `^0.4.0-beta.66` resolves to exactly `0.4.0-beta.66` (caret on a prerelease stays within the `0.4.0` tuple), whose SMTP surface — `getSystemSmtp``T.SmtpValue {host, port, from, username, password, security}`, `inputSpecConstants.smtpInputSpec` (providers gmail/ses/sendgrid/mailgun/protonmail/other; selection disabled/system/custom), `smtpShape`, `smtpPrefill` — is **byte-identical** to the 1.5.3 reference packages (verified from published tarballs; repo `node_modules` is absent). Build against beta.66 as-is — **no SDK bump needed** (moving to 1.x is a major-track change with broad blast radius across `startos/`, and nothing about SMTP justifies it).
- **Analysis runs on Spark, never Claude.** The digest is deliberately **un-anonymized** (real LP names + email substance), so any summarization/analysis must go through **Spark Control to local models** — this is the one path that intentionally bypasses the scrub→Claude→re-hydrate boundary, because keeping the substance local is the whole point. Never route digest content to Claude.
- **Exempt from "agents draft, humans send."** That rule governs outward LP/prospect contact. This is an internal ops digest to the team's own inboxes — a different category — so an automated daily send here does not violate the draft-only guardrail. State this explicitly at build time.
- **Scheduling:** a daily cron, naturally co-located with the existing `backend/email_integration/scheduler.py` sync cadence.
- **Soft-delete:** every aggregate/read in the digest must filter `deleted_at IS NULL` (see the standing soft-delete rule).
Open design questions (settled at build time): send time = **6 PM box-local** (configurable in the admin panel), covering the ~24h window up to send; empty days = **always send** with a "no activity" note; summary granularity = **one per-user narrative** plus a **by-investor structured section** (inbound + outbound, team-wide) added 2026-06-16; enable/time live in the **admin panel** (DB-backed), not StartOS actions.
### Email/communication search + natural-language query
*Requested 2026-06-16. Three increments, **sequenced 1 → 2 → 3** (1 and 2 first as a quick increment; 3 is a separate, larger build after). Origin: Grant asked whether we can query "emails sent to a specific investor" / "activity by user," and floated NL queries like "existing investors who have committed capital across our funds that we haven't emailed in a while."*
**Status: items 1 & 2 SHIPPED in v0.1.0:83 (built + verified locally 2026-06-16, deploy pending).** The Communications tab now has the structured activity surface (item 1: typed/fixed investor dropdown, mailbox + direction + **date-range** filters, free-text, **click-to-expand full body** via `GET /api/email/detail`) and a **"Search content"** semantic mode (item 2: `GET /api/email/search` over the Qdrant email index). The dropdown-empty bug (the facet only listed grid investors) was the v83 fix — it now mirrors the list across grid/org/contact matches. **Item 3 (NL→SQL) remains** — the larger, separate build below. Detail: `docs/guides/email.md`.
**Context — the data is captured but currently has NO front-end.** The entire Gmail email schema (`emails`, `email_threads`, `email_investor_links`, `email_account_messages`, `email_activity_proposals`, …) exists and is populated by the DWD capture pipeline, but is surfaced **nowhere** in `frontend/index.html` today (only as inputs to the daily digest). So all three items below are about making already-captured data queryable/visible. Email bodies of *matched* emails are already chunked + embedded into Qdrant with `{lp_id, lp_name, doc_type:"email", date_ts}` metadata.
**Caveat that shapes all three — the two-model join.** "Emails to an investor" link to the **fundraising grid** (`email_investor_links.fundraising_investor_id`); "committed capital" lives in the grid too (`fundraising_commitments`, multi-fund). But manually-logged `communications` and `lp_profiles` (single-fund) live in the **classic** model, and the two models are only bridged by fuzzy email/name matching (no authoritative join key). Any query spanning "committed capital" + "email recency" must reckon with this. Prefer the grid side as the higher-signal source (matcher already does).
**1. Activity query endpoints + panel — DONE (v0.1.0:83).** Delivered as the **Communications tab** rather than the originally-sketched `/api/activity` endpoints: `GET /api/email/activity` (`db.query_email_activity`) returns the actual records filterable by investor / mailbox / direction / **date range** / free-text, and `GET /api/email/detail` expands the full body. Answers "emails to investor X" and "what has mailbox Y sent" interactively. Soft-delete filtered throughout; investor identity is typed (`fund:`/`org:`/`contact:`) so org/contact-only matches resolve and are pickable. *(The `collect_user_activity()`/`collect_investor_activity()` digest helpers remain the by-user/by-investor pivot source; a dedicated per-user pivot UI was not needed for the answer Grant wanted, which the mailbox+direction filters already give.)*
**2. Email content search box — DONE (v0.1.0:83).** A **"Search content"** toggle in the Communications tab → `GET /api/email/search?q=` wraps `backend/ingest/search.py:hybrid_search` filtered to `doc_type='email'`; hits are hydrated + soft-delete-filtered against SQLite (canonical) and link back to the full body. Semantic/lexical search over email *content* ("find where we discussed the mining deal"), distinct from item 1's structured filters. 503 (clean "unavailable") when Spark/Qdrant is unreachable.
**3. Natural-language → safe structured query (separate, larger, after 1 & 2).** An LLM translates a plain-English question into a **safe, read-only** DB query against the CRM, for relational/analytical questions that semantic search *cannot* answer — Grant's example ("committed across funds AND not emailed in a while") is joins + aggregates + recency, not a text-topic match. Design constraints (locked at request time, refine at build):
- **LLM = Claude behind the redaction boundary** (better at text-to-SQL than local Qwen; the scrub→Claude→re-hydrate path already exists for the PII concern). Not Spark — Spark Control offers embeddings/rerank/RAG + local chat, but **no text-to-SQL**.
- **Safety is the hard part, not the parsing.** Do NOT hand the LLM open-ended SQL against the live DB (soft-delete leaks, injection, runaway scans). Constrain it: read-only connection/view, a curated/parameterized query surface or a validated query AST, soft-delete-filtered views, row/time caps. Treat as its own designed feature with its own tests.
- Must reckon with the two-model join caveat above (capital lives in the grid; recency from email links).
### Consolidate on the fundraising grid as canonical; retire vestigial classic-CRM surfaces
*Decided 2026-06-16. The CRM carries two stacked models: the original generic CRM (contacts / lp_profiles / opportunities / manual communications) and the fundraising grid + email capture. The team uses the grid; most classic surfaces are un-adopted (verified on the box: Pipeline + Communications empty, Contacts auto-populated from the grid). **Decision: the fundraising grid + email capture is the canonical system of record;** prune or repurpose the rest rather than maintain a parallel half-empty CRM.*
**Retire `lp_profiles` + LP Tracker — DONE & deployed live (v0.1.0:78, 2026-06-16).** 21/21 backend tests green, `py_compile` clean; installed to the box (`installed-version``0.1.0:78`, migration chain …77→78 clean, server up on :8080).
- Removed the orphaned `LPTrackerPage` component + the `lp-tracker``fundraising-grid` redirect (frontend).
- Removed the `/api/lp-profiles*` endpoints (list/get/create/update) and their handlers, the unused `lp-breakdown` report + route, the contact-dossier LP display (frontend + the `lp_profile` block in `handle_get_contact`), and the demo-seed LP block.
- **Dashboard KPIs repointed:** "Total Committed" now sums `fundraising_investors.total_invested` (the canonical grid rollup), **excluding graveyarded investors** so the headline reflects live committed capital — a deliberate divergence from `/api/fundraising/relational-summary`, which sums all rows. "Total Funded" dropped — the grid has no funded-vs-committed concept and the frontend never rendered it. (If a funded/wired status is wanted later, that's a new grid feature, not a revival of lp_profiles.) Regression-guarded by `test_dashboard_report.py`.
- **Left in place (intentional):** the empty `lp_profiles` table + index (no destructive drop, per never-hard-delete); the contact-delete soft-delete cascade; the `--reset-all-data` clear; and the inert MOCK_MODE `mockDb.lp_profiles` fixtures (dev-only fallback, never hits the backend — its dashboard mock still reads mock lp_profiles, a known dev-only divergence from the real backend). Updated `test_soft_delete_reads.py` to drop the now-removed `lp_profile` assertions (kept its org `total_funded` opportunities-aggregate checks).
**Adopt the Pipeline — wire it to the grid.**
- Pipeline (`opportunities`) is fully built and functional but unused. Keep it: it's the one classic surface that tracks something the grid doesn't — a forward-looking deal funnel (stage, `expected_amount × probability`, owner, close date) vs. the grid's actual committed dollars + flags.
- New idea (Grant, 2026-06-16): let users **flag an investor in the grid as a pipeline opportunity** (a grid column/control) so it **auto-creates / syncs an `opportunities` row** that loads into the Pipeline board. Design the grid↔pipeline link (which fund seeds it? what sets stage/expected amount? keep them reconciled). Turns Pipeline from a disconnected second data-entry surface into a view driven by the canonical grid.
- Revisit the stray contact-create side-door (the "Create Opportunity" modal `POST /api/contacts`, `frontend/index.html:6030`) once the grid-driven flow exists.
**Keep the Contacts table — as the read-only per-person directory it already is.** Confirmed 2026-06-16: the grid models **investor entity → many people** correctly today. The grid "contacts" column is a multi-pill editor; each pill syncs to a `fundraising_contacts` row AND its own classic `contacts` row (5-person family office → 1 investor + 5 contacts, linked via `fundraising_contacts.contact_id`, migration 0004). The Contacts page is **read-only for creation** (header: "added from the Fundraising Grid"; no New-Contact button), edit-only via the detail slide-over — the desired flow already holds. Email capture already rolls **multiple people up to one investor** (matcher indexes each pill's email separately, all → same `fundraising_investor_id`; `email_investor_links` records both investor and specific person). No build here — future email-surfacing UI should present comms grouped by investor across all its people.
### Front-end: pre-compile JSX, drop runtime Babel (optional, larger)
*Logged 2026-06-16 during the v0.1.0:82 vendor+SRI work. The scoped fix shipped: React/ReactDOM/Babel are now vendored + SRI-pinned and served same-origin, and a jsdom render smoke check gates every build (`docs/guides/packaging.md`). This is the bigger alternative we deliberately deferred.*
Today the app ships `@babel/standalone` (~3 MB) and transforms ~5k lines of inline JSX **in the browser on every page load**. A build step that pre-compiles the JSX to plain JS would (a) eliminate the runtime-transform blank-screen class entirely (no Babel in production), and (b) load much faster. **Cost:** it introduces a build step, which contradicts the current **"No build step"** convention (single `frontend/index.html`, inline-Babel React) — so this is a real architecture change, not a tweak. Weigh only if page-load size/latency or render robustness becomes a felt problem; the render-smoke gate already de-risks the status quo. If taken: keep the source `index.html` editable, emit a compiled artifact into the s9pk, and keep the smoke check pointed at the built output.
## Definition of done for "Airtable substitute" v1
- Team can manage all investors in one master table
- Saved views replicate current Airtable workflows
- CSV import from Airtable is reliable and repeatable
- Data persists safely and supports multi-user access
- Auth is invite-only and backups are automated