41def0f014
Box and repo on v0.1.0:88; the +Pipeline -> board -> advance-stage -> remove round-trip is verified on the box. Pipeline adoption is closed out; ROADMAP item marked done and the Next list advances to the spark-control intake card.
198 lines
35 KiB
Markdown
198 lines
35 KiB
Markdown
# Venture CRM Roadmap (Airtable Replacement)
|
||
|
||
## Current status
|
||
- Premium Airtable-like frontend grid exists and is actively iterating.
|
||
- Backend now has production-grade APIs for:
|
||
- `GET /api/fundraising/state`
|
||
- `PUT /api/fundraising/state` (with optimistic version check)
|
||
- `GET /api/fundraising/export`
|
||
- `POST /api/fundraising/backup`
|
||
- `POST /api/fundraising/restore-preview`
|
||
- `POST /api/fundraising/restore`
|
||
- `GET /api/fundraising/backups`
|
||
- `GET/PATCH /api/fundraising/backup-policy`
|
||
- `GET /api/fundraising/relational-summary`
|
||
- `GET /api/feature-requests`
|
||
- `POST /api/feature-requests`
|
||
- `PATCH /api/feature-requests/:id`
|
||
- New DB tables:
|
||
- `fundraising_state`
|
||
- `fundraising_investors`
|
||
- `fundraising_contacts`
|
||
- `fundraising_funds`
|
||
- `fundraising_commitments`
|
||
- `fundraising_views`
|
||
- `feature_requests`
|
||
- `app_settings`
|
||
- Grid saves/restores now sync into relational fundraising tables automatically.
|
||
- Formula engine is now sandboxed (no `eval`/`new Function`) with expanded function support.
|
||
- Automation engine v1 added:
|
||
- Rule table + toggle API
|
||
- List memberships (`main`, `follow_up`, `graveyard`, `longshot`, `all`)
|
||
- Automation run log
|
||
- Collaboration/reliability additions:
|
||
- Unified activity feed API (`audit` + `automation` + `backup`)
|
||
- Backup integrity verification API
|
||
- Better version-conflict metadata (`updated_at`, `updated_by`)
|
||
- Security hardening additions:
|
||
- Basic IP rate limiting (login and write APIs)
|
||
- Configurable CORS origin (`CRM_CORS_ORIGIN`)
|
||
- Production secret enforcement (`CRM_ENV=production` requires `CRM_SECRET_KEY`)
|
||
- Security status API + go-live checklist (`SECURITY.md`)
|
||
|
||
## Phase 1 (Production foundation)
|
||
1. Persist grid + views on backend
|
||
- Wire frontend fundraising grid reads/writes to `/api/fundraising/state`.
|
||
- Keep localStorage only as emergency fallback.
|
||
- Add autosave debounce and conflict handling (`expected_version`).
|
||
|
||
2. Admin-invite auth model
|
||
- Disable self-register for non-admin users.
|
||
- Add admin-only invite/create-user endpoint.
|
||
- Keep role model: `admin`, `member`.
|
||
|
||
3. Deployment and remote access
|
||
- Add `docker-compose` for one-command launch.
|
||
- Reverse proxy + TLS option (Caddy/Traefik) for non-Tailscale deployments.
|
||
- Recommended for your use case: Tailscale private access to laptop host.
|
||
|
||
4. Data safety and operations
|
||
- Automated nightly SQLite backups and restore test script.
|
||
- Add `/api/fundraising/export` for JSON snapshot export.
|
||
- Add health/readiness checks.
|
||
|
||
## Phase 2 (Airtable parity)
|
||
1. Advanced views
|
||
- Multi-condition filter groups (AND/OR groups)
|
||
- Multi-column sorting
|
||
- Pinned/frozen columns
|
||
- Personal vs shared views
|
||
|
||
2. Formula engine v2
|
||
- Add functions: `SUM`, `MIN`, `MAX`, `ROUND`, `ABS`, `CONCAT` (done)
|
||
- Type-aware formulas and better errors
|
||
- Dependency graph and recalculation rules
|
||
|
||
3. Activity + audit
|
||
- Record-level change history in UI
|
||
- Last modified by / at fields
|
||
- Restore archived rows
|
||
|
||
## Phase 3 (Team workflow and automation)
|
||
1. Tasks/reminders tied to investors/contacts
|
||
2. Automation rules (graveyard/follow-up triggers)
|
||
3. Email/communication integrations (optional)
|
||
4. Granular permissions (if team grows)
|
||
|
||
## Backlog (post-Phase-1 agentic)
|
||
|
||
### Matrix-bridge intake for the fundraising grid — M1+M2 BUILT (deploy + live smoke pending)
|
||
*Requested 2026-06-16. **M1 (scaffold + parse + in-thread propose) and M2 (match + write-on-approve) built, tested (26/26), not yet deployed** — code in `backend/matrix_intake/`, guide at `docs/guides/matrix-intake.md`. Remaining: install `matrix-nio` + creds on the Spark, create the CRM bot user, and a **live Matrix smoke** (can't run in CI). M3 (business-card photo) deferred until Spark Control has a vision model. Next major build after this is **Pipeline adoption** (see below).*
|
||
|
||
Use the **matrix-bridge** repo's pattern to listen on a dedicated ten31-database Matrix room. Send a message (with an optional business-card photo) and a local LLM **via Spark Control** parses it into the fundraising-grid schema and **auto-creates the investor entity + contact row**. For an existing investor, send a meeting note and it **appends an interaction-log entry**. Approval gate: the bot replies in Matrix with the proposed add/edit; the user approves / rejects / edits in-thread before the write commits (keeps the draft→human-approve guardrail).
|
||
- Fits the "grid is canonical" direction (writes land in `fundraising_*`) and the never-send-autonomously rule (in-thread human approval before any write).
|
||
|
||
**Locked design (2026-06-16, approved) — build now, M1 then M2:**
|
||
- **Separate component, shared scaffold:** new `backend/matrix_intake/` (its own process; lifts matrix-bridge's connect/prime-then-listen/threaded-reply plumbing). `matrix-nio` is isolated to this component's `requirements.txt` — it never enters the stdlib CRM runtime. Keeps the CRM write credential + LP data out of the general-purpose matrix-bridge bot (blast-radius + data-sovereignty), and lets the two iterate independently. Runs on the Spark (placement settled against `standards/guides/placement.md` at deploy).
|
||
- **v1 = text-only.** Business-card photo deferred to M3 — Spark Control fronts chat/embeddings/rerank but **no vision model** today, so photo→fields isn't buildable end-to-end yet.
|
||
- **Parse:** local Qwen via Spark Control `/v1/chat/completions` (temp 0, JSON-only), reusing the existing Spark client pattern (`backend/redaction`/`backend/ingest`).
|
||
- **Approval handshake (the one stateful piece):** in-memory pending-proposal store keyed by Matrix thread root; user replies **yes / edit field=value / no** in-thread. Satisfies never-write-autonomously; exempt from "agents draft, humans send" (internal data entry, like the digest).
|
||
- **CRM-side:** `POST /api/intake/investor` (service-auth) creates a new investor+contact **through the existing grid-save path** (so relational sync + audit + backup-on-write happen as with a UI edit; bot never does whole-blob RMW) or appends a meeting note to the interaction log for an existing investor; `GET /api/intake/match?q=` fuzzy-matches via the existing entity-resolution/email-matcher. New investor needs no fund at intake.
|
||
- **Phases:** M1 = scaffold + parse + in-thread propose, **no writes** (proves Matrix↔Spark). M2 = intake endpoint + match + write-on-approve + tests. M3 (deferred) = business-card photo.
|
||
|
||
**Post-deploy enhancement — fuzzy match + in-thread confirm (Grant, 2026-06-17). DEPLOYED & LIVE 2026-06-17 (v0.1.0:86; box migration chain …85→86 clean, `candidates` endpoint verified); Matrix live-smoke pending.** Today `find_intake_match` is **exact-after-normalization** (`_normalize_text` = lowercase+strip), so near-misses — "Charlie" vs "Charles" (same last name), "Acme Capital" vs "Acme Capital LLC", a one-character email typo — return no match and the bot proposes a **new** investor, risking a duplicate the human approves without realizing a near-match exists. The existing in-thread approval gate is useless against this because the human is never *shown* the near-match. Fix: matcher returns **ranked fuzzy candidates** (deterministic pre-filter: normalized name similarity / token overlap + email edit-distance ≤ ~2), surfaced in-thread for the human to confirm or pick, with the **local Spark LLM optionally re-ranking/judging the shortlist** (good at Charlie/Charles + legal-suffix equivalence; fed only the shortlist, never the whole LP list). Keeps the approval gate but makes it effective against duplicates. Land **after** the live smoke — net-new logic + reply grammar + tests; the current exact match is safe and its failure mode (a duplicate) is recoverable via the existing entity-merge subsystem (`backend/entity_*.py`).
|
||
- **As built:** `find_intake_candidates` in `server.py` (deterministic — stdlib `difflib` name similarity + token-set Jaccard, legal-suffix-aware via `_strip_legal_suffix`, + email Levenshtein ≤ 2; ranked, ≥0.62, top 5). `GET /api/intake/match` now returns `{match, candidates}`. Bot: a new `_stage="disambiguate"` shortlist (`proposals.render_disambiguation` / `interpret_disambiguation` / `attach_to_candidate` / `promote_to_new`) — human picks a number / `new` / `no`. **The optional LLM-judge re-rank was deliberately deferred** (the deterministic filter already surfaces the named cases; an LLM judge is the right *pruner* for shortlist noise — build if the deterministic ranking proves too noisy in practice). Tests: `test_intake_endpoints.py` (server fuzzy cases), `matrix_intake/test_proposals.py` (disambiguation grammar), `matrix_intake/test_crm_client.py` (candidate shape).
|
||
|
||
**Post-deploy enhancement — conversational (LLM-mediated) edits (Grant, 2026-06-17). DEPLOYED & LIVE 2026-06-17 (bot-side; pulled + restarted on the Spark `modelo32`); Matrix live-smoke pending.** Today an in-thread correction uses a rigid grammar (`edit field=value`). Let a free-form reply that isn't `yes`/`no`/a literal `edit …` be treated as a natural-language revision instruction: send {current proposal + the instruction} back through local Qwen (`spark.py`, the same parse leg — no Claude, no scrub) and re-render the revised proposal card for approval (e.g. "add that we met on June 14" → updated Note). Keeps the draft→human-approve gate (the human still confirms the LLM's revision) and subsumes `edit field=value` as a deterministic fast path. Thread the instruction text into `normalize`'s source so the email-integrity rule still holds (a revised email must appear in the original message or the instruction). Pairs naturally with the fuzzy-match item above — build both as one conversational-UX pass after the smoke. (Parsing of free-form *intake* messages already works today via the Qwen parse leg; this item is specifically about the *edit/refine* turn.)
|
||
- **As built:** `parse.revise` + `_apply_revision` (offline-testable; the approval-stage `else` branch in `bot.py` routes any non-yes/no/edit reply here). `parse_message` now stashes `_source_text` so revise can re-check email integrity against {instruction + original}; the model's email field is never trusted. No-op revisions are caught via `proposals.same_fields` (re-prompt, not a false "Updated"). **Known v1 limit:** revise edits fields but does not re-run the matcher on a mid-thread firm rename. Tests: `matrix_intake/test_parse.py` (revise merge + email integrity + match-id preservation).
|
||
|
||
**Managed service — DONE (container) 2026-06-17; dashboard card deferred to a spark-control session.** The bot ran as a bare `nohup` process (silently died on a Spark reboot). Now it's a **docker-compose service** (`docker-compose.yml` at the repo root + `backend/matrix_intake/Dockerfile`; `restart: unless-stopped` → survives reboot; image bundles `backend/matrix_intake` + the stdlib `backend/ingest` Spark client; `.env` mounted read-only). Cutover done on the Spark (nohup stopped, container `matrix-intake` up + listening). **Still bare `docker`/SSH-managed** — a spark-control dashboard card (Update/Restart/Stop/Logs tile like `matrix-bridge`) is a separate task in the spark-control repo: see `docs/handoffs/add-intake-bot-to-spark-control.md`.
|
||
|
||
**Parse mis-identifies the investor when the message names an internal teammate (found in live-smoke, 2026-06-17).** *"jonathan is chatting with wyoming soon about fund commitment"* → the bot picked **jonathan** (a colleague/CRM admin) as the investor and offered a Jonathan/Nathan fuzzy shortlist, when the investor is **Wyoming**. Root cause is upstream of matching: local Qwen has no notion of who's internal, and mis-read the sentence role. **Fix (cheap, high-confidence, near-term):** feed the parse prompt the ~5-person team roster + the frame *"messages are written by a team member about a prospect; a named team member is the person doing outreach, never the investor"* (roster from a config value or a small read — not the admin-gated `/api/users`, since the bot is a member). Offline-testable (stub the model). **Bigger design (deferred, needs more failure samples):** the user's idea of routing inputs through the LLM *with grid context* for entity resolution — feasible (local Qwen, same as the digest, never Claude) but feed a **bounded shortlist, not the full ~400-name grid** (a small model dilutes on a haystack); pairs with the deferred LLM-judge. Also exposes a missing concept: the **internal deal owner** (Jonathan), which the bot doesn't model. Get 3–5 more real intake messages before re-architecting; the roster fix lands regardless.
|
||
|
||
**Long-term — extract the intake bot to its own repo (recommended, not yet done).** Containerizing from this monorepo is the pragmatic now-state, but the bot is a genuinely separate deployable (own process, own `matrix-nio` dep, own lifecycle); its only CRM coupling is the HTTP API (a clean network contract) plus ~40 lines of stdlib Spark client (cheap to vendor). The tell: the spark-control Update button would run `git reset --hard origin/main` on the **whole CRM clone** — wrong blast radius. `matrix-bridge` is already a dedicated repo; mirror it. The extraction is a migration (new Gitea repo, move code + tests + guide, vendor the client, re-point the Spark deploy), so it's deferred until worth the lift — do it *before* wiring the spark-control card if both land in the same push.
|
||
|
||
### Scoped service-credential auth path for automated CRM writers
|
||
*Surfaced 2026-06-17 while deploying the Matrix intake bot. **Decision: defer — the bot uses a dedicated member username/password for now.** The CRM has no API-key/service-token path; its only auth is username+password → JWT. A dedicated **member** login is appropriately scoped against what matters operationally (no admin: can't manage users, reset data, or change settings) and unblocks the live smoke today.*
|
||
|
||
**Accepted residual risk (why this is worth revisiting):** a member credential is far broader than the bot's actual need (two endpoints: `GET /api/intake/match`, `POST /api/fundraising/log-communication`). A member can **read the entire LP/prospect database** — the exact data this system exists to keep off third-party servers — plus broad member-level *write* within the fundraising domain (could create/append on any investor). The credential lives in a `.env` on the Spark, so a Spark compromise leaks read-access to all LP data. Mitigating context: own-infra, LAN-local; the Matrix bot is the **first out-of-process API writer** (the digest runs in-process with direct DB access), so there is exactly **one** consumer today → building a token-scope framework now is premature (YAGNI).
|
||
|
||
**Right long-term design:** a hashed, revocable **service token** with a per-route **scope allowlist** (intake-match + log-communication only), minted/revoked from the admin panel, replacing the bot's member login. Revocation then kills the token without rotating a reused human password.
|
||
|
||
**Build trigger:** when a **second** out-of-process automated writer appears, OR before **any** automated writer is reachable beyond the LAN — whichever comes first. Build it once, properly, at that point.
|
||
|
||
### Admin-only vs. all-users web-UI surface — audit
|
||
*Requested 2026-06-16 (idea, P2).* Have the **explorer agent** report which web-UI functionality is visible only to admins vs. to all users (member role) — a map of the role-gated surface across `frontend/index.html` and the backend route auth checks. Useful input for the consolidation/permissions work.
|
||
|
||
### Daily activity digest (email to the team)
|
||
*Requested 2026-06-15. **Phase A deployed** (v0.1.0:76). **Phase B deployed & verified live in v0.1.0:77 (2026-06-16)** — digest content + Spark summarization + daily scheduler + by-investor section + admin-panel control + on-demand send. Auto-send defaults OFF until an admin enables it in Settings → Admin. **v0.1.0:83 (built, deploy pending): in-app windowed preview** — Settings → Admin builds a digest over a chosen window (last 24h or since a date) and shows it before sending (`POST /api/admin/digest/preview`), so the **real Spark summarizer can be verified on demand** even on a quiet day (the fixed last-24h `send-now` couldn't); manual send uses the same window and never touches the daily cursor.*
|
||
|
||
**Decisions (locked 2026-06-15):** recipients = **all active admins**; summarization = **Spark-LLM narrative** (never Claude — un-anonymized substance stays local); granularity = **grouped by user** (→ per investor).
|
||
|
||
**Send transport — DECIDED 2026-06-15: Gmail domain-wide delegation** (not SMTP). The box's existing service-account grant (which powers email capture) **includes `gmail.compose`**, which authorizes `users.messages.send` — verified by a token-mint probe **and a live `messages.send` to grant**. So the digest sends through the account the CRM already uses: **no app password, no new account, no admin change.** The narrow `gmail.send` scope is *not* granted, so the sender must request `gmail.compose`.
|
||
|
||
**Phase A — DONE:** (v0.1.0:75) `configureDigestSmtp` Start9 action + `docker_entrypoint.sh` `SMTP_*` export + `backend/smtp_send.py` + admin `POST /api/admin/digest/test-email` (recipient-restricted to the admin set — not an open relay) + Settings button. (v0.1.0:76, redeploy pending) `backend/email_integration/gmail_send.py` (`users.messages.send` via DWD/compose) + `backend/digest_mailer.py` (**Gmail-DWD preferred, SMTP fallback**); the endpoint + button route through it; sender = `CRM_DIGEST_SENDER` else first active admin. Tests: `test_smtp_send.py`, `test_smtp_endpoint.py`, `test_gmail_send.py`.
|
||
|
||
**Phase B — DONE (2026-06-15/16):** `backend/digest_builder.py` builds **two sections** — *by team member* (per-user **Spark** narrative + both directions, with a deterministic fallback) and *by investor* (team-wide, inbound + outbound, deduped per email, structured). Soft-delete filtered throughout. `backend/email_integration/digest_scheduler.py` is an always-on daily thread that re-reads a **DB-backed policy** each cycle and sends once/day at the configured hour to all active admins (window cursor in `app_settings`). Control moved out of env into the **admin panel**: `app_settings.digest_policy` + `GET/PATCH /api/admin/digest/policy` + a Settings → Admin **enable toggle + send-time dropdown** (env vars only seed the first-boot default). Plus admin `POST /api/admin/digest/send-now` + a "Send Digest Now" button. Decisions settled: **6 PM default**, **always-send** (empty-day note), **per-user narrative + by-investor section**, **in-app control** (not StartOS). Tests: `backend/test_digest_builder.py`. Detail: `docs/guides/email.md`.
|
||
|
||
Have the CRM send a **daily digest email** summarizing each registered user's activity — primarily **who emailed which investors and the substance of those emails** — to the fund principal (and eventually other admins). Scales with the synced-user count: 2 users synced today, ~5 eventually.
|
||
|
||
- **Source data:** the captured email-activity already flowing through the Gmail DWD propose→approve pipeline (`backend/email_integration/`), keyed per registered user → per investor/contact. Optionally fold in other CRM activity (audit feed, automation runs, new opportunities) later.
|
||
- **Send path is NEW capability.** Today nothing leaves the box — the system only *captures* Gmail and *creates drafts*. This needs outbound SMTP. StartOS 0.4 has a system-wide SMTP account (since v0.4.0-beta.9): the user configures it once for the whole server and services read it via `sdk.getSystemSmtp(effects).const()`, which returns a `T.SmtpValue` (`host`, `port`, `from`, `username`, `password`, `security`). Wire the digest sender to that rather than hardcoding any account. *Implementation path (researched 2026-06-15, our SDK pin `^0.4.0-beta.66`):* model a `manageSMTP` action on [gitea-startos](https://github.com/Start9Labs/gitea-startos) / [vaultwarden-startos](https://github.com/Start9Labs/vaultwarden-startos) — a three-way `selection` (system / custom / disabled) built on `sdk.inputSpecConstants.smtpInputSpec`, persisted to `storeJson`, with `main.ts` injecting `SMTP_HOST/PORT/USER/PASS/FROM/SECURITY` env vars into the daemon `exec` block (same shape as the existing `setAnthropicApiKey.ts` action). The Python sender reads them via `os.environ` and opens `smtplib.SMTP`/`SMTP_SSL`. **"Custom SMTP" is a dedicated per-package account, fully independent of the server's system SMTP** — the custom branch never calls `getSystemSmtp`, so the digest can send through its own provider even on a box with no system account configured (confirmed in both reference packages). This is the likely fit here: a digest-only mailbox separate from anyone's Gmail. Note StartOS 0.4 dropped the old `Config`/`Properties` manifest spec — SMTP config is an **action + storeJson**, not a manifest config field. **SDK note (verified 2026-06-15):** our pin `^0.4.0-beta.66` resolves to exactly `0.4.0-beta.66` (caret on a prerelease stays within the `0.4.0` tuple), whose SMTP surface — `getSystemSmtp` → `T.SmtpValue {host, port, from, username, password, security}`, `inputSpecConstants.smtpInputSpec` (providers gmail/ses/sendgrid/mailgun/protonmail/other; selection disabled/system/custom), `smtpShape`, `smtpPrefill` — is **byte-identical** to the 1.5.3 reference packages (verified from published tarballs; repo `node_modules` is absent). Build against beta.66 as-is — **no SDK bump needed** (moving to 1.x is a major-track change with broad blast radius across `startos/`, and nothing about SMTP justifies it).
|
||
- **Analysis runs on Spark, never Claude.** The digest is deliberately **un-anonymized** (real LP names + email substance), so any summarization/analysis must go through **Spark Control to local models** — this is the one path that intentionally bypasses the scrub→Claude→re-hydrate boundary, because keeping the substance local is the whole point. Never route digest content to Claude.
|
||
- **Exempt from "agents draft, humans send."** That rule governs outward LP/prospect contact. This is an internal ops digest to the team's own inboxes — a different category — so an automated daily send here does not violate the draft-only guardrail. State this explicitly at build time.
|
||
- **Scheduling:** a daily cron, naturally co-located with the existing `backend/email_integration/scheduler.py` sync cadence.
|
||
- **Soft-delete:** every aggregate/read in the digest must filter `deleted_at IS NULL` (see the standing soft-delete rule).
|
||
|
||
Open design questions (settled at build time): send time = **6 PM box-local** (configurable in the admin panel), covering the ~24h window up to send; empty days = **always send** with a "no activity" note; summary granularity = **one per-user narrative** plus a **by-investor structured section** (inbound + outbound, team-wide) added 2026-06-16; enable/time live in the **admin panel** (DB-backed), not StartOS actions.
|
||
|
||
### Email/communication search + natural-language query
|
||
*Requested 2026-06-16. Three increments, **sequenced 1 → 2 → 3** (1 and 2 first as a quick increment; 3 is a separate, larger build after). Origin: Grant asked whether we can query "emails sent to a specific investor" / "activity by user," and floated NL queries like "existing investors who have committed capital across our funds that we haven't emailed in a while."*
|
||
|
||
**Status: items 1 & 2 SHIPPED in v0.1.0:83 (built + verified locally 2026-06-16, deploy pending).** The Communications tab now has the structured activity surface (item 1: typed/fixed investor dropdown, mailbox + direction + **date-range** filters, free-text, **click-to-expand full body** via `GET /api/email/detail`) and a **"Search content"** semantic mode (item 2: `GET /api/email/search` over the Qdrant email index). The dropdown-empty bug (the facet only listed grid investors) was the v83 fix — it now mirrors the list across grid/org/contact matches. **Item 3 (NL→SQL) remains** — the larger, separate build below. Detail: `docs/guides/email.md`.
|
||
|
||
**Context — the data is captured but currently has NO front-end.** The entire Gmail email schema (`emails`, `email_threads`, `email_investor_links`, `email_account_messages`, `email_activity_proposals`, …) exists and is populated by the DWD capture pipeline, but is surfaced **nowhere** in `frontend/index.html` today (only as inputs to the daily digest). So all three items below are about making already-captured data queryable/visible. Email bodies of *matched* emails are already chunked + embedded into Qdrant with `{lp_id, lp_name, doc_type:"email", date_ts}` metadata.
|
||
|
||
**Caveat that shapes all three — the two-model join.** "Emails to an investor" link to the **fundraising grid** (`email_investor_links.fundraising_investor_id`); "committed capital" lives in the grid too (`fundraising_commitments`, multi-fund). But manually-logged `communications` and `lp_profiles` (single-fund) live in the **classic** model, and the two models are only bridged by fuzzy email/name matching (no authoritative join key). Any query spanning "committed capital" + "email recency" must reckon with this. Prefer the grid side as the higher-signal source (matcher already does).
|
||
|
||
**1. Activity query endpoints + panel — DONE (v0.1.0:83).** Delivered as the **Communications tab** rather than the originally-sketched `/api/activity` endpoints: `GET /api/email/activity` (`db.query_email_activity`) returns the actual records filterable by investor / mailbox / direction / **date range** / free-text, and `GET /api/email/detail` expands the full body. Answers "emails to investor X" and "what has mailbox Y sent" interactively. Soft-delete filtered throughout; investor identity is typed (`fund:`/`org:`/`contact:`) so org/contact-only matches resolve and are pickable. *(The `collect_user_activity()`/`collect_investor_activity()` digest helpers remain the by-user/by-investor pivot source; a dedicated per-user pivot UI was not needed for the answer Grant wanted, which the mailbox+direction filters already give.)*
|
||
|
||
**2. Email content search box — DONE (v0.1.0:83).** A **"Search content"** toggle in the Communications tab → `GET /api/email/search?q=` wraps `backend/ingest/search.py:hybrid_search` filtered to `doc_type='email'`; hits are hydrated + soft-delete-filtered against SQLite (canonical) and link back to the full body. Semantic/lexical search over email *content* ("find where we discussed the mining deal"), distinct from item 1's structured filters. 503 (clean "unavailable") when Spark/Qdrant is unreachable.
|
||
|
||
**3. Natural-language → safe structured query (separate, larger, after 1 & 2).** An LLM translates a plain-English question into a **safe, read-only** DB query against the CRM, for relational/analytical questions that semantic search *cannot* answer — Grant's example ("committed across funds AND not emailed in a while") is joins + aggregates + recency, not a text-topic match. Design constraints (locked at request time, refine at build):
|
||
- **LLM = Claude behind the redaction boundary** (better at text-to-SQL than local Qwen; the scrub→Claude→re-hydrate path already exists for the PII concern). Not Spark — Spark Control offers embeddings/rerank/RAG + local chat, but **no text-to-SQL**.
|
||
- **Safety is the hard part, not the parsing.** Do NOT hand the LLM open-ended SQL against the live DB (soft-delete leaks, injection, runaway scans). Constrain it: read-only connection/view, a curated/parameterized query surface or a validated query AST, soft-delete-filtered views, row/time caps. Treat as its own designed feature with its own tests.
|
||
- Must reckon with the two-model join caveat above (capital lives in the grid; recency from email links).
|
||
|
||
### Consolidate on the fundraising grid as canonical; retire vestigial classic-CRM surfaces
|
||
*Decided 2026-06-16. The CRM carries two stacked models: the original generic CRM (contacts / lp_profiles / opportunities / manual communications) and the fundraising grid + email capture. The team uses the grid; most classic surfaces are un-adopted (verified on the box: Pipeline + Communications empty, Contacts auto-populated from the grid). **Decision: the fundraising grid + email capture is the canonical system of record;** prune or repurpose the rest rather than maintain a parallel half-empty CRM.*
|
||
|
||
**Retire `lp_profiles` + LP Tracker — DONE & deployed live (v0.1.0:78, 2026-06-16).** 21/21 backend tests green, `py_compile` clean; installed to the box (`installed-version`→`0.1.0:78`, migration chain …77→78 clean, server up on :8080).
|
||
- Removed the orphaned `LPTrackerPage` component + the `lp-tracker`→`fundraising-grid` redirect (frontend).
|
||
- Removed the `/api/lp-profiles*` endpoints (list/get/create/update) and their handlers, the unused `lp-breakdown` report + route, the contact-dossier LP display (frontend + the `lp_profile` block in `handle_get_contact`), and the demo-seed LP block.
|
||
- **Dashboard KPIs repointed:** "Total Committed" now sums `fundraising_investors.total_invested` (the canonical grid rollup), **excluding graveyarded investors** so the headline reflects live committed capital — a deliberate divergence from `/api/fundraising/relational-summary`, which sums all rows. "Total Funded" dropped — the grid has no funded-vs-committed concept and the frontend never rendered it. (If a funded/wired status is wanted later, that's a new grid feature, not a revival of lp_profiles.) Regression-guarded by `test_dashboard_report.py`.
|
||
- **Left in place (intentional):** the empty `lp_profiles` table + index (no destructive drop, per never-hard-delete); the contact-delete soft-delete cascade; the `--reset-all-data` clear; and the inert MOCK_MODE `mockDb.lp_profiles` fixtures (dev-only fallback, never hits the backend — its dashboard mock still reads mock lp_profiles, a known dev-only divergence from the real backend). Updated `test_soft_delete_reads.py` to drop the now-removed `lp_profile` assertions (kept its org `total_funded` opportunities-aggregate checks).
|
||
|
||
**Adopt the Pipeline — wire it to the grid. — DONE: DEPLOYED & live-smoked 2026-06-18 (v0.1.0:88; migration chain …86→88 clean, `0005_grid_pipeline_link.sql` applied on the box, server up; the full +Pipeline → board → advance-stage → remove round-trip is verified on the box).** *(Was: second build after the Matrix-bridge intake.)*
|
||
- Pipeline (`opportunities`) is fully built and functional but unused. Keep it: it's the one classic surface that tracks something the grid doesn't — a forward-looking deal funnel (stage, `expected_amount × probability`, owner, close date) vs. the grid's actual committed dollars + flags.
|
||
- New idea (Grant, 2026-06-16): let users **flag an investor in the grid as a pipeline opportunity** (a grid column/control) so it **auto-creates / syncs an `opportunities` row** that loads into the Pipeline board. Design the grid↔pipeline link (which fund seeds it? what sets stage/expected amount? keep them reconciled). Turns Pipeline from a disconnected second data-entry surface into a view driven by the canonical grid.
|
||
- Revisit the stray contact-create side-door (the "Create Opportunity" modal `POST /api/contacts`) once the grid-driven flow exists.
|
||
|
||
**As built (decisions locked with Grant 2026-06-17):** UX = **row action + seed modal** ("Add to Pipeline" per grid row → captures primary contact / target fund / expected amount / stage / probability). The durable link is `opportunities.fundraising_investor_id` (**migration 0005**, additive + reversible); "is in pipeline?" / "what stage?" are **derived from a live opp join**, never a denormalized flag (no drift). **Ownership split:** the grid owns whether the link exists + the seed; the **board owns stage/probability/owner/close/next-step** — a grid save never reseeds a live opp (`POST /api/fundraising/pipeline/link` is idempotent: one live opp/investor, re-link returns the existing one). Contact is **reused from the grid's synced `fundraising_contacts.contact_id`** — the `POST /api/contacts` side-door is **gone**. Grid `lead` → opp owner (fallback acting user). Two **read-only** grid columns (Pipeline action + Pipeline Stage) injected on read; their row values are stripped on write so they never persist or dirty the autosave. **Remove from pipeline** (`POST .../unlink`) **soft-deletes the opp; the grid row is left fully intact** (Grant's explicit ask). Deleting an investor from the grid archives its orphaned opp (`reconcile_grid_pipeline_links`, called after `sync_fundraising_relational`). **Folded in:** the P2 soft-delete leak in `handle_pipeline_report` + dashboard pipeline aggregates (archived opps no longer counted). Tests: `backend/test_grid_pipeline_link.py` (link/idempotent/round-trip/guards/unlink-intact/re-link/orphan/aggregates), 28/28 suite green, render-smoke green. **Deploy:** server-side → needs an **s9pk build + install** (v87); get authorization first.
|
||
- **Follow-up (v0.1.0:88, frontend-only, DEPLOYED & verified 2026-06-18):** retired the Pipeline page's **"+ New Opportunity"** button + its create-by-contact modal — an opportunity is now born **only** from a fundraising-grid investor row ("+ Pipeline"), matching how the team works (they live in the grid). The board is now a view + stage-management surface; button replaced with a muted "Add deals from the Fundraising Grid" hint. Removed the dead handler/state + the page's unused `/api/contacts` fetch.
|
||
- **Deferred (not built):** no write-back of committed dollars into grid fund cells (grid stays canonical for committed $); a graveyarded investor with a live opp still shows its stage (deliberate — a live deal is a live deal).
|
||
|
||
**Keep the Contacts table — as the read-only per-person directory it already is.** Confirmed 2026-06-16: the grid models **investor entity → many people** correctly today. The grid "contacts" column is a multi-pill editor; each pill syncs to a `fundraising_contacts` row AND its own classic `contacts` row (5-person family office → 1 investor + 5 contacts, linked via `fundraising_contacts.contact_id`, migration 0004). The Contacts page is **read-only for creation** (header: "added from the Fundraising Grid"; no New-Contact button), edit-only via the detail slide-over — the desired flow already holds. Email capture already rolls **multiple people up to one investor** (matcher indexes each pill's email separately, all → same `fundraising_investor_id`; `email_investor_links` records both investor and specific person). No build here — future email-surfacing UI should present comms grouped by investor across all its people.
|
||
|
||
### Front-end: pre-compile JSX, drop runtime Babel (optional, larger)
|
||
*Logged 2026-06-16 during the v0.1.0:82 vendor+SRI work. The scoped fix shipped: React/ReactDOM/Babel are now vendored + SRI-pinned and served same-origin, and a jsdom render smoke check gates every build (`docs/guides/packaging.md`). This is the bigger alternative we deliberately deferred.*
|
||
|
||
Today the app ships `@babel/standalone` (~3 MB) and transforms ~5k lines of inline JSX **in the browser on every page load**. A build step that pre-compiles the JSX to plain JS would (a) eliminate the runtime-transform blank-screen class entirely (no Babel in production), and (b) load much faster. **Cost:** it introduces a build step, which contradicts the current **"No build step"** convention (single `frontend/index.html`, inline-Babel React) — so this is a real architecture change, not a tweak. Weigh only if page-load size/latency or render robustness becomes a felt problem; the render-smoke gate already de-risks the status quo. If taken: keep the source `index.html` editable, emit a compiled artifact into the s9pk, and keep the smoke check pointed at the built output.
|
||
|
||
## Definition of done for "Airtable substitute" v1
|
||
- Team can manage all investors in one master table
|
||
- Saved views replicate current Airtable workflows
|
||
- CSV import from Airtable is reliable and repeatable
|
||
- Data persists safely and supports multi-user access
|
||
- Auth is invite-only and backups are automated
|