grant/ten31-database

Keysat c23384498b Mark v0.1.0:78 deployed & verified live (lp_profiles/LP Tracker retired)

2026-06-16 10:51:01 -05:00


Venture CRM Roadmap (Airtable Replacement)

Current status

Premium Airtable-like frontend grid exists and is under active iteration.
Backend now has production-grade APIs for:
- GET /api/fundraising/state
- PUT /api/fundraising/state (with optimistic version check)
- GET /api/fundraising/export
- POST /api/fundraising/backup
- POST /api/fundraising/restore-preview
- POST /api/fundraising/restore
- GET /api/fundraising/backups
- GET/PATCH /api/fundraising/backup-policy
- GET /api/fundraising/relational-summary
- GET /api/feature-requests
- POST /api/feature-requests
- PATCH /api/feature-requests/:id
New DB tables:
- fundraising_state
- fundraising_investors
- fundraising_contacts
- fundraising_funds
- fundraising_commitments
- fundraising_views
- feature_requests
- app_settings
Grid saves/restores now sync into relational fundraising tables automatically.
Formula engine is now sandboxed (no eval/new Function) with expanded function support.
Automation engine v1 added:
- Rule table + toggle API
- List memberships (main, follow_up, graveyard, longshot, all)
- Automation run log
Collaboration/reliability additions:
- Unified activity feed API (audit + automation + backup)
- Backup integrity verification API
- Better version-conflict metadata (updated_at, updated_by)
Security hardening additions:
- Basic IP rate limiting (login and write APIs)
- Configurable CORS origin (CRM_CORS_ORIGIN)
- Production secret enforcement (CRM_ENV=production requires CRM_SECRET_KEY)
- Security status API + go-live checklist (SECURITY.md)
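
The basic IP rate limiting listed above could be approached with a sliding-window counter per client address. The sketch below is illustrative only: the class name `SlidingWindowLimiter`, the limits, and the injectable clock are assumptions, not the backend's actual code.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds per client IP."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A login handler would call `allow(client_ip)` before doing any work and return HTTP 429 when it is `False`. State here is in-process only, so it resets on restart.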

Phase 1 (Production foundation)

Persist grid + views on backend

Wire frontend fundraising grid reads/writes to /api/fundraising/state.
Keep localStorage only as emergency fallback.
Add autosave debounce and conflict handling (expected_version).
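
One way to implement the `expected_version` conflict check is a conditional UPDATE that only matches when the stored version equals what the client last read. This is a minimal sketch under assumptions: the `payload` and `version` columns, the single-row `id = 1` layout, and the `save_state` helper are hypothetical; only the `fundraising_state` table name and the `updated_by` metadata come from this roadmap.

```python
import json
import sqlite3


class VersionConflict(Exception):
    """Raised when the client's expected_version is stale."""


def save_state(conn, payload, expected_version, user):
    # The UPDATE only matches when the stored version equals what the
    # client last read, so concurrent writers cannot silently overwrite.
    cur = conn.execute(
        "UPDATE fundraising_state "
        "SET payload = ?, version = version + 1, updated_by = ? "
        "WHERE id = 1 AND version = ?",
        (json.dumps(payload), user, expected_version),
    )
    if cur.rowcount == 0:
        row = conn.execute(
            "SELECT version, updated_by FROM fundraising_state WHERE id = 1"
        ).fetchone()
        raise VersionConflict(
            f"current version is {row[0]}, last saved by {row[1]}"
        )
    conn.commit()
    return expected_version + 1


# Demo setup with a hypothetical single-row state table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fundraising_state ("
    "id INTEGER PRIMARY KEY, payload TEXT, version INTEGER, updated_by TEXT)"
)
conn.execute("INSERT INTO fundraising_state VALUES (1, '{}', 1, 'seed')")
conn.commit()
```

The handler would map `VersionConflict` to an HTTP 409 so the frontend can prompt the user to reload or merge instead of autosaving over someone else's edit.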

Admin-invite auth model

Disable self-register for non-admin users.
Add admin-only invite/create-user endpoint.
Keep role model: admin, member.

Deployment and remote access

Add docker-compose for one-command launch.
Reverse proxy + TLS option (Caddy/Traefik) for non-Tailscale deployments.
Recommended for your use case: Tailscale private access to the laptop host.

Data safety and operations

Automated nightly SQLite backups and restore test script.
Add /api/fundraising/export for JSON snapshot export.
Add health/readiness checks.
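
A nightly backup plus restore test can lean on SQLite's online backup API, which Python exposes as `Connection.backup`. The helper below is a sketch, assuming a hypothetical `backup_and_verify` function and destination path. It copies the database and then runs `PRAGMA integrity_check` on the copy.

```python
import sqlite3


def backup_and_verify(src_conn, dest_path):
    """Copy a live database to dest_path, then integrity-check the copy."""
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup takes a consistent snapshot even while the
        # source database is in use.
        src_conn.backup(dest)
        result = dest.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        dest.close()
    return result == "ok"
```

A cron job or scheduler thread could call this once per night and alert when it returns `False`.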

Phase 2 (Airtable parity)

Advanced views

Multi-condition filter groups (AND/OR groups)
Multi-column sorting
Pinned/frozen columns
Personal vs shared views
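
Multi-condition filter groups can be modelled as a small tree: groups combine children with AND/OR, and leaves compare one field. The sketch below assumes a hypothetical JSON shape (`op`, `conditions`, `field`, `cmp`, `value`) and a tiny operator table; the real grid may store filters differently.

```python
OPS = {
    "eq": lambda a, b: a == b,
    "gte": lambda a, b: a is not None and a >= b,
    "contains": lambda a, b: b.lower() in str(a or "").lower(),
}


def matches(row, node):
    """Evaluate a nested filter tree against one row (a dict)."""
    if "conditions" in node:  # a group: combine children with AND/OR
        results = (matches(row, child) for child in node["conditions"])
        return all(results) if node["op"] == "AND" else any(results)
    # a leaf: compare one field against a value
    return OPS[node["cmp"]](row.get(node["field"]), node["value"])


# status is active AND (committed >= 1M OR name contains "family")
example_filter = {"op": "AND", "conditions": [
    {"field": "status", "cmp": "eq", "value": "active"},
    {"op": "OR", "conditions": [
        {"field": "committed", "cmp": "gte", "value": 1_000_000},
        {"field": "name", "cmp": "contains", "value": "family"},
    ]},
]}
```

Because groups nest, the same evaluator covers both a flat list of conditions and arbitrarily deep AND/OR combinations.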

Formula engine v2

Add functions: SUM, MIN, MAX, ROUND, ABS, CONCAT (done)
Type-aware formulas and better errors
Dependency graph and recalculation rules
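
The "no eval" approach generally means parsing a formula into a syntax tree and interpreting only whitelisted nodes. The actual engine presumably lives in the frontend (the status notes mention `new Function`), so the Python below only illustrates the technique; `evaluate`, `FUNCS`, and the formula syntax are assumptions.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
FUNCS = {
    "SUM": lambda *a: sum(a),
    "MIN": min, "MAX": max, "ROUND": round, "ABS": abs,
    "CONCAT": lambda *a: "".join(str(x) for x in a),
}


def evaluate(formula, row):
    """Evaluate a formula by walking its syntax tree; nothing is executed."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(
                node.value, (int, float, str)):
            return node.value
        if isinstance(node, ast.Name):  # a bare name is a column reference
            return row[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCS and not node.keywords):
            return FUNCS[node.func.id](*(walk(a) for a in node.args))
        # Attribute access, subscripts, lambdas, unknown calls, etc.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return walk(ast.parse(formula, mode="eval").body)
```

Anything outside the whitelist raises `ValueError`, which also gives a natural hook for the "better errors" item above.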

Activity + audit

Record-level change history in UI
Last modified by / at fields
Restore archived rows

Phase 3 (Team workflow and automation)

Tasks/reminders tied to investors/contacts
Automation rules (graveyard/follow-up triggers)
Email/communication integrations (optional)
Granular permissions (if team grows)

Backlog (post-Phase-1 agentic)

Daily activity digest (email to the team)

Requested 2026-06-15. Phase A deployed (v0.1.0:76). Phase B deployed & verified live in v0.1.0:77 (2026-06-16) — digest content + Spark summarization + daily scheduler + by-investor section + admin-panel control + on-demand send. Auto-send defaults OFF until an admin enables it in Settings → Admin.

Decisions (locked 2026-06-15): recipients = all active admins; summarization = Spark-LLM narrative (never Claude — un-anonymized substance stays local); granularity = grouped by user (→ per investor).

Send transport — DECIDED 2026-06-15: Gmail domain-wide delegation (not SMTP). The box's existing service-account grant (which powers email capture) includes gmail.compose, which authorizes users.messages.send — verified by a token-mint probe and a live messages.send to grant. So the digest sends through the account the CRM already uses: no app password, no new account, no admin change. The narrow gmail.send scope is not granted, so the sender must request gmail.compose.
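
For reference, the Gmail API's `users.messages.send` takes a JSON body whose `raw` field is the base64url-encoded RFC 2822 message. The sketch below builds only that body with the standard library. The `build_send_body` helper is hypothetical, and the credential step (a service account impersonating the sender with the `gmail.compose` scope) is deliberately omitted.

```python
import base64
from email.message import EmailMessage

# Scope the sender must request, per the decision above.
GMAIL_COMPOSE_SCOPE = "https://www.googleapis.com/auth/gmail.compose"


def build_send_body(sender, recipients, subject, text):
    """Return the JSON body for users.messages.send: {"raw": <base64url>}."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(text)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"raw": raw}
```

The returned dict would be posted to the send endpoint with an access token minted for the impersonated sender.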

Phase A — DONE: (v0.1.0:75) configureDigestSmtp Start9 action + docker_entrypoint.sh SMTP_* export + backend/smtp_send.py + admin POST /api/admin/digest/test-email (recipient-restricted to the admin set — not an open relay) + Settings button. (v0.1.0:76, redeploy pending) backend/email_integration/gmail_send.py (users.messages.send via DWD/compose) + backend/digest_mailer.py (Gmail-DWD preferred, SMTP fallback); the endpoint + button route through it; sender = CRM_DIGEST_SENDER else first active admin. Tests: test_smtp_send.py, test_smtp_endpoint.py, test_gmail_send.py.

Phase B — DONE (2026-06-15/16): backend/digest_builder.py builds two sections — by team member (per-user Spark narrative + both directions, with a deterministic fallback) and by investor (team-wide, inbound + outbound, deduped per email, structured). Soft-delete filtered throughout. backend/email_integration/digest_scheduler.py is an always-on daily thread that re-reads a DB-backed policy each cycle and sends once/day at the configured hour to all active admins (window cursor in app_settings). Control moved out of env into the admin panel: app_settings.digest_policy + GET/PATCH /api/admin/digest/policy + a Settings → Admin enable toggle + send-time dropdown (env vars only seed the first-boot default). Plus admin POST /api/admin/digest/send-now + a "Send Digest Now" button. Decisions settled: 6 PM default, always-send (empty-day note), per-user narrative + by-investor section, in-app control (not StartOS). Tests: backend/test_digest_builder.py. Detail: docs/guides/email.md.
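
The once-per-day decision the scheduler makes each cycle can be expressed as a small pure function. This is a sketch, not the contents of digest_scheduler.py: the `should_send` name, the policy keys `enabled` and `send_hour`, and the date-string cursor are assumptions. The defaults mirror the notes above (auto-send off, 6 PM).

```python
from datetime import datetime


def should_send(now, policy, last_sent_date):
    """Decide whether the daily digest is due on this scheduler cycle.

    policy: {"enabled": bool, "send_hour": int}, re-read from the DB each cycle.
    last_sent_date: ISO date string of the last send, or None.
    """
    if not policy.get("enabled", False):
        return False
    if now.hour < policy.get("send_hour", 18):
        return False
    # Send at most once per calendar day.
    return last_sent_date != now.date().isoformat()
```

Keeping the decision free of I/O makes it easy to unit-test edge cases such as a restart shortly after the configured hour.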

Have the CRM send a daily digest email summarizing each registered user's activity — primarily who emailed which investors and the substance of those emails — to the fund principal (and eventually other admins). Scales with the synced-user count: 2 users synced today, ~5 eventually.

Source data: the captured email-activity already flowing through the Gmail DWD propose→approve pipeline (backend/email_integration/), keyed per registered user → per investor/contact. Optionally fold in other CRM activity (audit feed, automation runs, new opportunities) later.
Send path is NEW capability. Today nothing leaves the box: the system only captures Gmail and creates drafts. This needs outbound SMTP.
StartOS 0.4 has a system-wide SMTP account (since v0.4.0-beta.9): the user configures it once for the whole server and services read it via sdk.getSystemSmtp(effects).const(), which returns a T.SmtpValue (host, port, from, username, password, security). Wire the digest sender to that rather than hardcoding any account.
Implementation path (researched 2026-06-15, our SDK pin ^0.4.0-beta.66): model a manageSMTP action on gitea-startos / vaultwarden-startos, i.e. a three-way selection (system / custom / disabled) built on sdk.inputSpecConstants.smtpInputSpec, persisted to storeJson, with main.ts injecting SMTP_HOST/PORT/USER/PASS/FROM/SECURITY env vars into the daemon exec block (same shape as the existing setAnthropicApiKey.ts action). The Python sender reads them via os.environ and opens smtplib.SMTP/SMTP_SSL.
"Custom SMTP" is a dedicated per-package account, fully independent of the server's system SMTP: the custom branch never calls getSystemSmtp, so the digest can send through its own provider even on a box with no system account configured (confirmed in both reference packages). This is the likely fit here: a digest-only mailbox separate from anyone's Gmail.
Note StartOS 0.4 dropped the old Config/Properties manifest spec. SMTP config is an action + storeJson, not a manifest config field.
SDK note (verified 2026-06-15): our pin ^0.4.0-beta.66 resolves to exactly 0.4.0-beta.66 (caret on a prerelease stays within the 0.4.0 tuple). Its SMTP surface is byte-identical to the 1.5.3 reference packages (verified from published tarballs; repo node_modules is absent): getSystemSmtp → T.SmtpValue {host, port, from, username, password, security}, inputSpecConstants.smtpInputSpec (providers gmail/ses/sendgrid/mailgun/protonmail/other; selection disabled/system/custom), smtpShape, smtpPrefill.
Build against beta.66 as-is. No SDK bump is needed (moving to 1.x is a major-track change with broad blast radius across startos/, and nothing about SMTP justifies it).
Analysis runs on Spark, never Claude. The digest is deliberately un-anonymized (real LP names + email substance), so any summarization/analysis must go through Spark Control to local models — this is the one path that intentionally bypasses the scrub→Claude→re-hydrate boundary, because keeping the substance local is the whole point. Never route digest content to Claude.
Exempt from "agents draft, humans send." That rule governs outward LP/prospect contact. This is an internal ops digest to the team's own inboxes — a different category — so an automated daily send here does not violate the draft-only guardrail. State this explicitly at build time.
Scheduling: a daily cron, naturally co-located with the existing backend/email_integration/scheduler.py sync cadence.
Soft-delete: every aggregate/read in the digest must filter deleted_at IS NULL (see the standing soft-delete rule).
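
The SMTP path described in this list (env vars injected by the StartOS action, read via os.environ, sent with smtplib) could be sketched as follows. The variable names follow the SMTP_HOST/PORT/USER/PASS/FROM/SECURITY pattern above. The `SmtpConfig` dataclass, the default port, and the security values `"tls"` and `"starttls"` are assumptions, not taken from smtp_send.py.

```python
import os
import smtplib
from dataclasses import dataclass


@dataclass
class SmtpConfig:
    host: str
    port: int
    user: str
    password: str
    sender: str
    security: str  # assumed values: "tls" (implicit TLS) or "starttls"


def load_smtp_config(env=os.environ):
    """Read SMTP_* variables; return None when SMTP is not configured."""
    if not env.get("SMTP_HOST"):
        return None
    return SmtpConfig(
        host=env["SMTP_HOST"],
        port=int(env.get("SMTP_PORT", "587")),
        user=env.get("SMTP_USER", ""),
        password=env.get("SMTP_PASS", ""),
        sender=env.get("SMTP_FROM", ""),
        security=env.get("SMTP_SECURITY", "starttls").lower(),
    )


def send(cfg, msg):
    """Send an email.message.EmailMessage using the configured transport."""
    if cfg.security == "tls":
        server = smtplib.SMTP_SSL(cfg.host, cfg.port, timeout=30)
    else:
        server = smtplib.SMTP(cfg.host, cfg.port, timeout=30)
    with server:
        if cfg.security == "starttls":
            server.starttls()
        if cfg.user:
            server.login(cfg.user, cfg.password)
        server.send_message(msg, from_addr=cfg.sender)
```

Returning `None` when SMTP_HOST is unset lets a caller treat SMTP as an optional fallback rather than a hard requirement.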

Open design questions (settled at build time): send time = 6 PM box-local (configurable in the admin panel), covering the ~24h window up to send; empty days = always send with a "no activity" note; summary granularity = one per-user narrative plus a by-investor structured section (inbound + outbound, team-wide) added 2026-06-16; enable/time live in the admin panel (DB-backed), not StartOS actions.

Email/communication search + natural-language query

Requested 2026-06-16. Three increments, sequenced 1 → 2 → 3 (1 and 2 first as a quick increment; 3 is a separate, larger build after). Origin: Grant asked whether we can query "emails sent to a specific investor" / "activity by user," and floated NL queries like "existing investors who have committed capital across our funds that we haven't emailed in a while."

Context — the data is captured but currently has NO front-end. The entire Gmail email schema (emails, email_threads, email_investor_links, email_account_messages, email_activity_proposals, …) exists and is populated by the DWD capture pipeline, but is surfaced nowhere in frontend/index.html today (only as inputs to the daily digest). So all three items below are about making already-captured data queryable/visible. Email bodies of matched emails are already chunked + embedded into Qdrant with {lp_id, lp_name, doc_type:"email", date_ts} metadata.

Caveat that shapes all three — the two-model join. "Emails to an investor" link to the fundraising grid (email_investor_links.fundraising_investor_id); "committed capital" lives in the grid too (fundraising_commitments, multi-fund). But manually-logged communications and lp_profiles (single-fund) live in the classic model, and the two models are only bridged by fuzzy email/name matching (no authoritative join key). Any query spanning "committed capital" + "email recency" must reckon with this. Prefer the grid side as the higher-signal source (matcher already does).

1. Activity query endpoints + panel (do first). The logic already exists and is tested inside backend/digest_builder.py — collect_user_activity() (per team-member, sent vs received, with matched investor names) and collect_investor_activity() (re-pivoted by investor, team-wide). Expose them as on-demand endpoints (e.g. GET /api/activity?user_id=…&since=…&until=… and …?investor_id=…) returning the actual records (not just the counts that /api/reports/activity gives today), plus a simple UI panel. Answers "emails to investor X" and "what has user Y sent lately" interactively. Small build — mostly assembling tested parts + a thin UI. Soft-delete filter every read.
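
A by-investor activity read that returns records and filters soft-deletes might look like the query below. This is a sketch, not the collect_investor_activity() implementation: the table names and `fundraising_investor_id` come from this roadmap, while the other columns (`sent_at`, `sender`, `subject`, `email_id`, `deleted_at` on both tables) and the `investor_activity` helper are assumptions.

```python
import sqlite3


def investor_activity(conn, investor_id, since, until, limit=200):
    """Return email records linked to one investor in [since, until)."""
    rows = conn.execute(
        """
        SELECT e.id, e.sent_at, e.sender, e.subject
        FROM emails e
        JOIN email_investor_links l ON l.email_id = e.id
        WHERE l.fundraising_investor_id = ?
          AND e.sent_at >= ? AND e.sent_at < ?
          AND e.deleted_at IS NULL
          AND l.deleted_at IS NULL
        ORDER BY e.sent_at DESC
        LIMIT ?
        """,
        (investor_id, since, until, limit),
    ).fetchall()
    return [dict(zip(("id", "sent_at", "sender", "subject"), r)) for r in rows]


# Demo data in a throwaway in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (id INTEGER PRIMARY KEY, sent_at TEXT, sender TEXT,
                     subject TEXT, deleted_at TEXT);
CREATE TABLE email_investor_links (email_id INTEGER,
                                   fundraising_investor_id INTEGER,
                                   deleted_at TEXT);
INSERT INTO emails VALUES
  (1, '2026-06-10T09:00:00', 'alice@example.com', 'Fund update', NULL),
  (2, '2026-06-12T15:30:00', 'bob@example.com', 'Follow-up', NULL),
  (3, '2026-06-13T11:00:00', 'alice@example.com', 'Deleted thread',
   '2026-06-14T00:00:00');
INSERT INTO email_investor_links VALUES (1, 42, NULL), (2, 42, NULL),
                                        (3, 42, NULL);
""")
```

The by-user variant would swap the investor predicate for a filter on the sending or receiving account and keep the same soft-delete and limit clauses.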

2. Email content search box (do first, alongside 1). Wire a search box onto the email bodies already indexed in Qdrant (capability is ~80% built — see the retrieval modes in backend/ingest/search.py and the MCP hybrid_search/semantic_search/keyword_search tools). This is semantic/lexical search over email content ("find where we discussed the mining deal"), distinct from the structured filters in item 1. Decide placement (global search bar vs. a dedicated email/search page — note there's no email UI at all today, so this may pair naturally with surfacing threads). Small.

3. Natural-language → safe structured query (separate, larger, after 1 & 2). An LLM translates a plain-English question into a safe, read-only DB query against the CRM, for relational/analytical questions that semantic search cannot answer — Grant's example ("committed across funds AND not emailed in a while") is joins + aggregates + recency, not a text-topic match. Design constraints (locked at request time, refine at build):

LLM = Claude behind the redaction boundary (better at text-to-SQL than local Qwen; the scrub→Claude→re-hydrate path already exists for the PII concern). Not Spark — Spark Control offers embeddings/rerank/RAG + local chat, but no text-to-SQL.
Safety is the hard part, not the parsing. Do NOT hand the LLM open-ended SQL against the live DB (soft-delete leaks, injection, runaway scans). Constrain it: read-only connection/view, a curated/parameterized query surface or a validated query AST, soft-delete-filtered views, row/time caps. Treat as its own designed feature with its own tests.
Must reckon with the two-model join caveat above (capital lives in the grid; recency from email links).
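
One shape the "curated/parameterized query surface" could take: the LLM only selects a named query and supplies typed parameters, and the server binds them into vetted SQL against a soft-delete-filtered view with a row cap. Everything named here is hypothetical (`QUERY_CATALOG`, `run_named_query`, the `v_active_investors` view, the `last_email_at` column). In particular, the view flattens the two-model join that a real build would have to design.

```python
import sqlite3

# Each entry is vetted SQL with named parameters; the LLM only picks a name
# and supplies parameter values, it never writes SQL.
QUERY_CATALOG = {
    "committed_not_emailed_since": {
        "sql": (
            "SELECT name, total_invested FROM v_active_investors "
            "WHERE total_invested > 0 "
            "AND (last_email_at IS NULL OR last_email_at < :cutoff) "
            "ORDER BY total_invested DESC LIMIT :row_cap"
        ),
        "params": {"cutoff": str},
    },
}
MAX_ROWS = 500


def run_named_query(conn, name, params, row_cap=100):
    spec = QUERY_CATALOG.get(name)
    if spec is None:
        raise ValueError(f"unknown query: {name!r}")
    if set(params) != set(spec["params"]):
        raise ValueError("unexpected or missing parameters")
    for key, expected in spec["params"].items():
        if not isinstance(params[key], expected):
            raise ValueError(f"parameter {key!r} must be {expected.__name__}")
    # Values are bound by the driver, never interpolated into the SQL text.
    bound = dict(params, row_cap=min(int(row_cap), MAX_ROWS))
    return conn.execute(spec["sql"], bound).fetchall()


# Demo schema: the view hides soft-deleted rows from every catalog query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fundraising_investors (name TEXT, total_invested REAL,
                                    last_email_at TEXT, deleted_at TEXT);
CREATE VIEW v_active_investors AS
  SELECT name, total_invested, last_email_at
  FROM fundraising_investors WHERE deleted_at IS NULL;
INSERT INTO fundraising_investors VALUES
  ('Quiet LP', 500000, '2026-01-05', NULL),
  ('Recent LP', 750000, '2026-06-10', NULL),
  ('Never Emailed', 100000, NULL, NULL),
  ('Deleted LP', 900000, '2025-01-01', '2026-02-01'),
  ('Prospect', 0, NULL, NULL);
""")
```

In production the connection itself would also be opened read-only (for SQLite, a `file:...?mode=ro` URI with `uri=True`), so a catalog mistake cannot write.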

Consolidate on the fundraising grid as canonical; retire vestigial classic-CRM surfaces

Decided 2026-06-16. The CRM carries two stacked models: the original generic CRM (contacts / lp_profiles / opportunities / manual communications) and the fundraising grid + email capture. The team uses the grid; most classic surfaces are un-adopted (verified on the box: Pipeline + Communications empty, Contacts auto-populated from the grid). Decision: the fundraising grid + email capture is the canonical system of record; prune or repurpose the rest rather than maintain a parallel half-empty CRM.

Retire lp_profiles + LP Tracker — DONE & deployed live (v0.1.0:78, 2026-06-16). 21/21 backend tests green, py_compile clean; installed to the box (installed-version→0.1.0:78, migration chain …77→78 clean, server up on :8080).

Removed the orphaned LPTrackerPage component + the lp-tracker→fundraising-grid redirect (frontend).
Removed the /api/lp-profiles* endpoints (list/get/create/update) and their handlers, the unused lp-breakdown report + route, the contact-dossier LP display (frontend + the lp_profile block in handle_get_contact), and the demo-seed LP block.
Dashboard KPIs repointed: "Total Committed" now sums fundraising_investors.total_invested (the canonical grid rollup), excluding graveyarded investors so the headline reflects live committed capital — a deliberate divergence from /api/fundraising/relational-summary, which sums all rows. "Total Funded" dropped — the grid has no funded-vs-committed concept and the frontend never rendered it. (If a funded/wired status is wanted later, that's a new grid feature, not a revival of lp_profiles.) Regression-guarded by test_dashboard_report.py.
Left in place (intentional): the empty lp_profiles table + index (no destructive drop, per never-hard-delete); the contact-delete soft-delete cascade; the --reset-all-data clear; and the inert MOCK_MODE mockDb.lp_profiles fixtures (dev-only fallback, never hits the backend — its dashboard mock still reads mock lp_profiles, a known dev-only divergence from the real backend). Updated test_soft_delete_reads.py to drop the now-removed lp_profile assertions (kept its org total_funded opportunities-aggregate checks).
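
The divergence between the dashboard's "Total Committed" and the relational summary comes down to one predicate. The sketch below assumes a hypothetical `list_name` column for list membership and a `deleted_at` column on fundraising_investors; the real schema may record graveyard membership elsewhere, and whether the summary filters soft-deleted rows is also an assumption.

```python
import sqlite3


def total_committed(conn):
    """Dashboard headline: live committed capital, graveyard excluded."""
    return conn.execute(
        "SELECT COALESCE(SUM(total_invested), 0) FROM fundraising_investors "
        "WHERE deleted_at IS NULL "
        "AND COALESCE(list_name, 'main') != 'graveyard'"
    ).fetchone()[0]


def relational_summary_total(conn):
    """Summary-style rollup: all non-deleted rows, graveyard included."""
    return conn.execute(
        "SELECT COALESCE(SUM(total_invested), 0) FROM fundraising_investors "
        "WHERE deleted_at IS NULL"
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fundraising_investors (name TEXT, total_invested REAL,
                                    list_name TEXT, deleted_at TEXT);
INSERT INTO fundraising_investors VALUES
  ('A', 1000000, 'main', NULL),
  ('B', 250000, 'follow_up', NULL),
  ('C', 500000, 'graveyard', NULL),
  ('D', 400000, 'main', '2026-05-01');
""")
```

The `COALESCE` on `list_name` keeps rows with no list assignment in the total, since a bare `!=` comparison against NULL would silently drop them.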

Adopt the Pipeline — wire it to the grid.

Pipeline (opportunities) is fully built and functional but unused. Keep it: it's the one classic surface that tracks something the grid doesn't — a forward-looking deal funnel (stage, expected_amount × probability, owner, close date) vs. the grid's actual committed dollars + flags.
New idea (Grant, 2026-06-16): let users flag an investor in the grid as a pipeline opportunity (a grid column/control) so it auto-creates / syncs an opportunities row that loads into the Pipeline board. Design the grid↔pipeline link (which fund seeds it? what sets stage/expected amount? keep them reconciled). Turns Pipeline from a disconnected second data-entry surface into a view driven by the canonical grid.
Revisit the stray contact-create side-door (the "Create Opportunity" modal POST /api/contacts, frontend/index.html:6030) once the grid-driven flow exists.

Keep the Contacts table — as the read-only per-person directory it already is. Confirmed 2026-06-16: the grid models investor entity → many people correctly today. The grid "contacts" column is a multi-pill editor; each pill syncs to a fundraising_contacts row AND its own classic contacts row (5-person family office → 1 investor + 5 contacts, linked via fundraising_contacts.contact_id, migration 0004). The Contacts page is read-only for creation (header: "added from the Fundraising Grid"; no New-Contact button), edit-only via the detail slide-over — the desired flow already holds. Email capture already rolls multiple people up to one investor (matcher indexes each pill's email separately, all → same fundraising_investor_id; email_investor_links records both investor and specific person). No build here — future email-surfacing UI should present comms grouped by investor across all its people.

Definition of done for "Airtable substitute" v1

Team can manage all investors in one master table
Saved views replicate current Airtable workflows
CSV import from Airtable is reliable and repeatable
Data persists safely and supports multi-user access
Auth is invite-only and backups are automated
