ten31-database/ROADMAP.md

# Venture CRM Roadmap (Airtable Replacement)

## Current status
- Premium Airtable-like frontend grid exists and is actively iterating.
- Backend now has production-grade APIs for:
  - `GET /api/fundraising/state`
  - `PUT /api/fundraising/state` (with optimistic version check)
  - `GET /api/fundraising/export`
  - `POST /api/fundraising/backup`
  - `POST /api/fundraising/restore-preview`
  - `POST /api/fundraising/restore`
  - `GET /api/fundraising/backups`
  - `GET/PATCH /api/fundraising/backup-policy`
  - `GET /api/fundraising/relational-summary`
  - `GET /api/feature-requests`
  - `POST /api/feature-requests`
  - `PATCH /api/feature-requests/:id`
- New DB tables:
  - `fundraising_state`
  - `fundraising_investors`
  - `fundraising_contacts`
  - `fundraising_funds`
  - `fundraising_commitments`
  - `fundraising_views`
  - `feature_requests`
  - `app_settings`
- Grid saves/restores now sync into relational fundraising tables automatically.
- Formula engine is now sandboxed (no `eval`/`new Function`) with expanded function support.
- Automation engine v1 added:
  - Rule table + toggle API
  - List memberships (`main`, `follow_up`, `graveyard`, `longshot`, `all`)
  - Automation run log
- Collaboration/reliability additions:
  - Unified activity feed API (`audit` + `automation` + `backup`)
  - Backup integrity verification API
  - Better version-conflict metadata (`updated_at`, `updated_by`)
- Security hardening additions:
  - Basic IP rate limiting (login and write APIs)
  - Configurable CORS origin (`CRM_CORS_ORIGIN`)
  - Production secret enforcement (`CRM_ENV=production` requires `CRM_SECRET_KEY`)
  - Security status API + go-live checklist (`SECURITY.md`)

## Phase 1 (Production foundation)
1. Persist grid + views on backend
- Wire frontend fundraising grid reads/writes to `/api/fundraising/state`.
- Keep localStorage only as emergency fallback.
- Add autosave debounce and conflict handling (`expected_version`).

2. Admin-invite auth model
- Disable self-register for non-admin users.
- Add admin-only invite/create-user endpoint.
- Keep role model: `admin`, `member`.

3. Deployment and remote access
- Add `docker-compose` for one-command launch.
- Reverse proxy + TLS option (Caddy/Traefik) for non-Tailscale deployments.
- Recommended for your use case: Tailscale private access to laptop host.

4. Data safety and operations
- Automated nightly SQLite backups and restore test script.
- Add `/api/fundraising/export` for JSON snapshot export.
- Add health/readiness checks.

## Phase 2 (Airtable parity)
1. Advanced views
- Multi-condition filter groups (AND/OR groups)
- Multi-column sorting
- Pinned/frozen columns
- Personal vs shared views

2. Formula engine v2
- Add functions: `SUM`, `MIN`, `MAX`, `ROUND`, `ABS`, `CONCAT` (done)
- Type-aware formulas and better errors
- Dependency graph and recalculation rules

3. Activity + audit
- Record-level change history in UI
- Last modified by / at fields
- Restore archived rows

## Phase 3 (Team workflow and automation)
1. Tasks/reminders tied to investors/contacts
2. Automation rules (graveyard/follow-up triggers)
3. Email/communication integrations (optional)
4. Granular permissions (if team grows)

## Backlog (post-Phase-1 agentic)

### Daily activity digest (email to the team)
*Requested 2026-06-15. **Phase A deployed** (v0.1.0:76). **Phase B deployed & verified live in v0.1.0:77 (2026-06-16)** — digest content + Spark summarization + daily scheduler + by-investor section + admin-panel control + on-demand send. Auto-send defaults OFF until an admin enables it in Settings → Admin. **v0.1.0:83 (built, deploy pending): in-app windowed preview** — Settings → Admin builds a digest over a chosen window (last 24h or since a date) and shows it before sending (`POST /api/admin/digest/preview`), so the **real Spark summarizer can be verified on demand** even on a quiet day (the fixed last-24h `send-now` couldn't); manual send uses the same window and never touches the daily cursor.*

**Decisions (locked 2026-06-15):** recipients = **all active admins**; summarization = **Spark-LLM narrative** (never Claude — un-anonymized substance stays local); granularity = **grouped by user** (→ per investor).

**Send transport — DECIDED 2026-06-15: Gmail domain-wide delegation** (not SMTP). The box's existing service-account grant (which powers email capture) **includes `gmail.compose`**, which authorizes `users.messages.send` — verified by a token-mint probe **and a live `messages.send` to grant**. So the digest sends through the account the CRM already uses: **no app password, no new account, no admin change.** The narrow `gmail.send` scope is *not* granted, so the sender must request `gmail.compose`.

**Phase A — DONE:** (v0.1.0:75) `configureDigestSmtp` Start9 action + `docker_entrypoint.sh` `SMTP_*` export + `backend/smtp_send.py` + admin `POST /api/admin/digest/test-email` (recipient-restricted to the admin set — not an open relay) + Settings button. (v0.1.0:76, redeploy pending) `backend/email_integration/gmail_send.py` (`users.messages.send` via DWD/compose) + `backend/digest_mailer.py` (**Gmail-DWD preferred, SMTP fallback**); the endpoint + button route through it; sender = `CRM_DIGEST_SENDER` else first active admin. Tests: `test_smtp_send.py`, `test_smtp_endpoint.py`, `test_gmail_send.py`.

**Phase B — DONE (2026-06-15/16):** `backend/digest_builder.py` builds **two sections** — *by team member* (per-user **Spark** narrative + both directions, with a deterministic fallback) and *by investor* (team-wide, inbound + outbound, deduped per email, structured). Soft-delete filtered throughout. `backend/email_integration/digest_scheduler.py` is an always-on daily thread that re-reads a **DB-backed policy** each cycle and sends once/day at the configured hour to all active admins (window cursor in `app_settings`). Control moved out of env into the **admin panel**: `app_settings.digest_policy` + `GET/PATCH /api/admin/digest/policy` + a Settings → Admin **enable toggle + send-time dropdown** (env vars only seed the first-boot default). Plus admin `POST /api/admin/digest/send-now` + a "Send Digest Now" button. Decisions settled: **6 PM default**, **always-send** (empty-day note), **per-user narrative + by-investor section**, **in-app control** (not StartOS). Tests: `backend/test_digest_builder.py`. Detail: `docs/guides/email.md`.

Have the CRM send a **daily digest email** summarizing each registered user's activity — primarily **who emailed which investors and the substance of those emails** — to the fund principal (and eventually other admins). Scales with the synced-user count: 2 users synced today, ~5 eventually.

- **Source data:** the captured email-activity already flowing through the Gmail DWD propose→approve pipeline (`backend/email_integration/`), keyed per registered user → per investor/contact. Optionally fold in other CRM activity (audit feed, automation runs, new opportunities) later.
- **Send path is NEW capability.** Today nothing leaves the box — the system only *captures* Gmail and *creates drafts*. This needs outbound SMTP. StartOS 0.4 has a system-wide SMTP account (since v0.4.0-beta.9): the user configures it once for the whole server and services read it via `sdk.getSystemSmtp(effects).const()`, which returns a `T.SmtpValue` (`host`, `port`, `from`, `username`, `password`, `security`). Wire the digest sender to that rather than hardcoding any account. *Implementation path (researched 2026-06-15, our SDK pin `^0.4.0-beta.66`):* model a `manageSMTP` action on [gitea-startos](https://github.com/Start9Labs/gitea-startos) / [vaultwarden-startos](https://github.com/Start9Labs/vaultwarden-startos) — a three-way `selection` (system / custom / disabled) built on `sdk.inputSpecConstants.smtpInputSpec`, persisted to `storeJson`, with `main.ts` injecting `SMTP_HOST/PORT/USER/PASS/FROM/SECURITY` env vars into the daemon `exec` block (same shape as the existing `setAnthropicApiKey.ts` action). The Python sender reads them via `os.environ` and opens `smtplib.SMTP`/`SMTP_SSL`. **"Custom SMTP" is a dedicated per-package account, fully independent of the server's system SMTP** — the custom branch never calls `getSystemSmtp`, so the digest can send through its own provider even on a box with no system account configured (confirmed in both reference packages). This is the likely fit here: a digest-only mailbox separate from anyone's Gmail. Note StartOS 0.4 dropped the old `Config`/`Properties` manifest spec — SMTP config is an **action + storeJson**, not a manifest config field. **SDK note (verified 2026-06-15):** our pin `^0.4.0-beta.66` resolves to exactly `0.4.0-beta.66` (caret on a prerelease stays within the `0.4.0` tuple), whose SMTP surface — `getSystemSmtp` → `T.SmtpValue {host, port, from, username, password, security}`, `inputSpecConstants.smtpInputSpec` (providers gmail/ses/sendgrid/mailgun/protonmail/other; selection disabled/system/custom), `smtpShape`, `smtpPrefill` — is **byte-identical** to the 1.5.3 reference packages (verified from published tarballs; repo `node_modules` is absent). Build against beta.66 as-is — **no SDK bump needed** (moving to 1.x is a major-track change with broad blast radius across `startos/`, and nothing about SMTP justifies it).
- **Analysis runs on Spark, never Claude.** The digest is deliberately **un-anonymized** (real LP names + email substance), so any summarization/analysis must go through **Spark Control to local models** — this is the one path that intentionally bypasses the scrub→Claude→re-hydrate boundary, because keeping the substance local is the whole point. Never route digest content to Claude.
- **Exempt from "agents draft, humans send."** That rule governs outward LP/prospect contact. This is an internal ops digest to the team's own inboxes — a different category — so an automated daily send here does not violate the draft-only guardrail. State this explicitly at build time.
- **Scheduling:** a daily cron, naturally co-located with the existing `backend/email_integration/scheduler.py` sync cadence.
- **Soft-delete:** every aggregate/read in the digest must filter `deleted_at IS NULL` (see the standing soft-delete rule).

Open design questions (settled at build time): send time = **6 PM box-local** (configurable in the admin panel), covering the ~24h window up to send; empty days = **always send** with a "no activity" note; summary granularity = **one per-user narrative** plus a **by-investor structured section** (inbound + outbound, team-wide) added 2026-06-16; enable/time live in the **admin panel** (DB-backed), not StartOS actions.

### Email/communication search + natural-language query
*Requested 2026-06-16. Three increments, **sequenced 1 → 2 → 3** (1 and 2 first as a quick increment; 3 is a separate, larger build after). Origin: Grant asked whether we can query "emails sent to a specific investor" / "activity by user," and floated NL queries like "existing investors who have committed capital across our funds that we haven't emailed in a while."*

**Status: items 1 & 2 SHIPPED in v0.1.0:83 (built + verified locally 2026-06-16, deploy pending).** The Communications tab now has the structured activity surface (item 1: typed/fixed investor dropdown, mailbox + direction + **date-range** filters, free-text, **click-to-expand full body** via `GET /api/email/detail`) and a **"Search content"** semantic mode (item 2: `GET /api/email/search` over the Qdrant email index). The dropdown-empty bug (the facet only listed grid investors) was the v83 fix — it now mirrors the list across grid/org/contact matches. **Item 3 (NL→SQL) remains** — the larger, separate build below. Detail: `docs/guides/email.md`.

**Context — the data is captured but currently has NO front-end.** The entire Gmail email schema (`emails`, `email_threads`, `email_investor_links`, `email_account_messages`, `email_activity_proposals`, …) exists and is populated by the DWD capture pipeline, but is surfaced **nowhere** in `frontend/index.html` today (only as inputs to the daily digest). So all three items below are about making already-captured data queryable/visible. Email bodies of *matched* emails are already chunked + embedded into Qdrant with `{lp_id, lp_name, doc_type:"email", date_ts}` metadata.

**Caveat that shapes all three — the two-model join.** "Emails to an investor" link to the **fundraising grid** (`email_investor_links.fundraising_investor_id`); "committed capital" lives in the grid too (`fundraising_commitments`, multi-fund). But manually-logged `communications` and `lp_profiles` (single-fund) live in the **classic** model, and the two models are only bridged by fuzzy email/name matching (no authoritative join key). Any query spanning "committed capital" + "email recency" must reckon with this. Prefer the grid side as the higher-signal source (matcher already does).

**1. Activity query endpoints + panel — DONE (v0.1.0:83).** Delivered as the **Communications tab** rather than the originally-sketched `/api/activity` endpoints: `GET /api/email/activity` (`db.query_email_activity`) returns the actual records filterable by investor / mailbox / direction / **date range** / free-text, and `GET /api/email/detail` expands the full body. Answers "emails to investor X" and "what has mailbox Y sent" interactively. Soft-delete filtered throughout; investor identity is typed (`fund:`/`org:`/`contact:`) so org/contact-only matches resolve and are pickable. *(The `collect_user_activity()`/`collect_investor_activity()` digest helpers remain the by-user/by-investor pivot source; a dedicated per-user pivot UI was not needed for the answer Grant wanted, which the mailbox+direction filters already give.)*

**2. Email content search box — DONE (v0.1.0:83).** A **"Search content"** toggle in the Communications tab → `GET /api/email/search?q=` wraps `backend/ingest/search.py:hybrid_search` filtered to `doc_type='email'`; hits are hydrated + soft-delete-filtered against SQLite (canonical) and link back to the full body. Semantic/lexical search over email *content* ("find where we discussed the mining deal"), distinct from item 1's structured filters. 503 (clean "unavailable") when Spark/Qdrant is unreachable.

**3. Natural-language → safe structured query (separate, larger, after 1 & 2).** An LLM translates a plain-English question into a **safe, read-only** DB query against the CRM, for relational/analytical questions that semantic search *cannot* answer — Grant's example ("committed across funds AND not emailed in a while") is joins + aggregates + recency, not a text-topic match. Design constraints (locked at request time, refine at build):
  - **LLM = Claude behind the redaction boundary** (better at text-to-SQL than local Qwen; the scrub→Claude→re-hydrate path already exists for the PII concern). Not Spark — Spark Control offers embeddings/rerank/RAG + local chat, but **no text-to-SQL**.
  - **Safety is the hard part, not the parsing.** Do NOT hand the LLM open-ended SQL against the live DB (soft-delete leaks, injection, runaway scans). Constrain it: read-only connection/view, a curated/parameterized query surface or a validated query AST, soft-delete-filtered views, row/time caps. Treat as its own designed feature with its own tests.
  - Must reckon with the two-model join caveat above (capital lives in the grid; recency from email links).

### Consolidate on the fundraising grid as canonical; retire vestigial classic-CRM surfaces
*Decided 2026-06-16. The CRM carries two stacked models: the original generic CRM (contacts / lp_profiles / opportunities / manual communications) and the fundraising grid + email capture. The team uses the grid; most classic surfaces are un-adopted (verified on the box: Pipeline + Communications empty, Contacts auto-populated from the grid). **Decision: the fundraising grid + email capture is the canonical system of record;** prune or repurpose the rest rather than maintain a parallel half-empty CRM.*

**Retire `lp_profiles` + LP Tracker — DONE & deployed live (v0.1.0:78, 2026-06-16).** 21/21 backend tests green, `py_compile` clean; installed to the box (`installed-version`→`0.1.0:78`, migration chain …77→78 clean, server up on :8080).
- Removed the orphaned `LPTrackerPage` component + the `lp-tracker`→`fundraising-grid` redirect (frontend).
- Removed the `/api/lp-profiles*` endpoints (list/get/create/update) and their handlers, the unused `lp-breakdown` report + route, the contact-dossier LP display (frontend + the `lp_profile` block in `handle_get_contact`), and the demo-seed LP block.
- **Dashboard KPIs repointed:** "Total Committed" now sums `fundraising_investors.total_invested` (the canonical grid rollup), **excluding graveyarded investors** so the headline reflects live committed capital — a deliberate divergence from `/api/fundraising/relational-summary`, which sums all rows. "Total Funded" dropped — the grid has no funded-vs-committed concept and the frontend never rendered it. (If a funded/wired status is wanted later, that's a new grid feature, not a revival of lp_profiles.) Regression-guarded by `test_dashboard_report.py`.
- **Left in place (intentional):** the empty `lp_profiles` table + index (no destructive drop, per never-hard-delete); the contact-delete soft-delete cascade; the `--reset-all-data` clear; and the inert MOCK_MODE `mockDb.lp_profiles` fixtures (dev-only fallback, never hits the backend — its dashboard mock still reads mock lp_profiles, a known dev-only divergence from the real backend). Updated `test_soft_delete_reads.py` to drop the now-removed `lp_profile` assertions (kept its org `total_funded` opportunities-aggregate checks).

**Adopt the Pipeline — wire it to the grid.**
- Pipeline (`opportunities`) is fully built and functional but unused. Keep it: it's the one classic surface that tracks something the grid doesn't — a forward-looking deal funnel (stage, `expected_amount × probability`, owner, close date) vs. the grid's actual committed dollars + flags.
- New idea (Grant, 2026-06-16): let users **flag an investor in the grid as a pipeline opportunity** (a grid column/control) so it **auto-creates / syncs an `opportunities` row** that loads into the Pipeline board. Design the grid↔pipeline link (which fund seeds it? what sets stage/expected amount? keep them reconciled). Turns Pipeline from a disconnected second data-entry surface into a view driven by the canonical grid.
- Revisit the stray contact-create side-door (the "Create Opportunity" modal `POST /api/contacts`, `frontend/index.html:6030`) once the grid-driven flow exists.

**Keep the Contacts table — as the read-only per-person directory it already is.** Confirmed 2026-06-16: the grid models **investor entity → many people** correctly today. The grid "contacts" column is a multi-pill editor; each pill syncs to a `fundraising_contacts` row AND its own classic `contacts` row (5-person family office → 1 investor + 5 contacts, linked via `fundraising_contacts.contact_id`, migration 0004). The Contacts page is **read-only for creation** (header: "added from the Fundraising Grid"; no New-Contact button), edit-only via the detail slide-over — the desired flow already holds. Email capture already rolls **multiple people up to one investor** (matcher indexes each pill's email separately, all → same `fundraising_investor_id`; `email_investor_links` records both investor and specific person). No build here — future email-surfacing UI should present comms grouped by investor across all its people.

### Front-end: pre-compile JSX, drop runtime Babel (optional, larger)
*Logged 2026-06-16 during the v0.1.0:82 vendor+SRI work. The scoped fix shipped: React/ReactDOM/Babel are now vendored + SRI-pinned and served same-origin, and a jsdom render smoke check gates every build (`docs/guides/packaging.md`). This is the bigger alternative we deliberately deferred.*

Today the app ships `@babel/standalone` (~3 MB) and transforms ~5k lines of inline JSX **in the browser on every page load**. A build step that pre-compiles the JSX to plain JS would (a) eliminate the runtime-transform blank-screen class entirely (no Babel in production), and (b) load much faster. **Cost:** it introduces a build step, which contradicts the current **"No build step"** convention (single `frontend/index.html`, inline-Babel React) — so this is a real architecture change, not a tweak. Weigh only if page-load size/latency or render robustness becomes a felt problem; the render-smoke gate already de-risks the status quo. If taken: keep the source `index.html` editable, emit a compiled artifact into the s9pk, and keep the smoke check pointed at the built output.

## Definition of done for "Airtable substitute" v1
- Team can manage all investors in one master table
- Saved views replicate current Airtable workflows
- CSV import from Airtable is reliable and repeatable
- Data persists safely and supports multi-user access
- Auth is invite-only and backups are automated