docs: spec the notes-blob → communications unification; record v0.1.0:106
- ROADMAP: full spec to retire the notes blob into a single investor-anchored communications store (schema rebuild, email unification, log-form simplify, derived grid column, LLM-assisted retrofit-then-delete); mark contact_type retirement done
- AGENTS.md: Current state -> v0.1.0:106 deployed; Next points at the census extension + comms unification
@@ -108,12 +108,10 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
## Current state
_**Box live at v0.1.0:105 (deployed + verified 2026-06-20)** — clean StartOS migration chain (…→105) and the in-app SQL chain through `0008_drop_retired_tables` (`lp_profiles` + `feature_requests` physically dropped on the box), server up on :8080. This session = a **removal + bug-fix + feature batch** (v0.1.0:104, below) **+ a TEMPORARY admin contacts-census diagnostic (v0.1.0:105 — delete after use).** **The fundraising grid + email capture is the canonical system of record.** History: git log + `start9/0.4/startos/versions/`._
_**Box live at v0.1.0:106 (deployed + verified 2026-06-21)** — clean StartOS migration chain (…→106), server up on :8080. **The fundraising grid + email capture is the canonical system of record.** History: git log + `start9/0.4/startos/versions/`._
- **Removed (v0.1.0:104):** the **Instructions** + **Feedback** (`feature_requests`) pages + backend, and `lp_profiles` + `investor_type` (across server / ingest / seeds). Migration `0008` drops both empty tables (a sanctioned one-off exception to never-hard-delete); `0001`'s `lp_profiles` ALTER was removed so a fresh DB doesn't break the migration chain. Net −570 lines.
- **Fixes (v0.1.0:104):** [B] email sync no longer terminally parks a mailbox on a transient timeout — `'retrying'` retries every cycle, `'error'` re-included on an hourly backoff, so **Grant's & Jonathan's stuck mailboxes self-heal on this deploy** (`test_sync_ready.py`). [C] clock icon on the mobile email Review-log sets a reminder inline. [D] email-approval cards show date/time. **[Contacts 500-cap]** the mobile Contacts directory now pages through ALL contacts (was truncated at 500 of 720 — hid people from the list *and* search).
- **New (v0.1.0:104):** admin-only **Purge Deleted Data** (Settings → Admin) — guarded, type-to-confirm hard-delete of soft-deleted rows; see the soft-delete convention + `test_purge_soft_deleted.py`.
- **Verification:** **45/45** backend, render-smoke green, reviewer-agent APPROVE after fixing **1 blocker** (contact purge left a dangling `reminders.contact_id` — now NULLed + test-guarded). New UI behavior is **live-smoke / on-device only** (jsdom can't drive touch).
- **Shipped (v0.1.0:106):** **retired `contacts.contact_type` (logical).** Desktop Contacts lost the Investors/Prospects tabs + TYPE badge → a grid-derived **Status** (existing-LP badge + pipeline-stage chip via `contact_grid_signals`); dashboard `total_lps`/`total_prospects` now count grid investor entities (committed>0 vs $0, graveyard + 'Untitled Investor' excluded) + fixed a `total_contacts` soft-delete leak. Column left physically inert; physical DROP deferred to a signed-off table-rebuild migration. **45/45** + new dashboard assertions, render-smoke green, reviewer APPROVE (no blockers).
- **New SPEC in ROADMAP (Grant 2026-06-21): retire the notes blob → unify ALL activity into `communications`.** Traced in code: the blob (`fundraising_investors.notes`) and `communications` are dual-written by `log-communication`; they drift, leak soft-deletes into the grid + grounding corpus, and **emails land in the blob, never in `communications`** (so the timeline misses emails). Plan: rebuild the leaf `communications` table (add `fundraising_investor_id` NOT NULL, relax `contact_id` nullable, drop `duration_minutes`/`attendees`/`outcome`/`opportunity_id`), unify emails into comms, simplify the log form (+ `next_action`→auto-reminder, backdatable date, desktop==mobile), make the grid Notes column a derived view + Log button, retrofit blobs→comms, then DELETE the blob. Multi-session; see the ROADMAP spec.
- **Bug A — Grant is handling:** `odell/marty/finance/ten31@` can't enroll for email capture ("could not resolve user_id") because the enroll flow requires a CRM `users` row; Grant is creating user accounts for those mailboxes.
- **Next:** (A) **retire `contact_type`** (the next build) — replace the Contacts Investors/Prospects tabs + TYPE badge with grid-derived `existing_investor`/`pipeline_stage`, repoint the dashboard `total_lps`/`total_prospects` counts, then drop the column (live UI change → its own small design pass; see ROADMAP); (B) **contacts ↔ `fundraising_contacts` consolidation** — capture A/B/C from the live census (Settings → Admin → "Run census", or `GET /api/admin/contacts-census`), then **DELETE the TEMPORARY census endpoint + handler + route + button** (all tagged `TEMPORARY`; mirrors `backend/scripts/contacts_census.sql`); (C) confirm the two stuck mailboxes pulled current + Grant's 4 new mailbox users enroll; (D) carried: bell approve-on-phone → Matrix-thread-clears round-trip spot-check.
- **Open / risks:** the Contacts pagination, the purge, and the email-sync auto-recovery are **live-smoke / not yet device-confirmed**. Carried: **Claude/Architect path unverified live on the box**; vision OCR small-in-frame misread (`mara.com→marac.com`); doc drift — `crm-overview.md` narrative + `EVALUATION.md` still describe `lp_profiles` (the active API/schema claims were fixed; the deeper Phase-0 narrative is deferred to a doc pass).
- **Next:** (A) begin the **notes-blob → communications** work — start by **extending the contacts census** (count investors with notes but zero contacts; pure-structured vs legacy free-text blobs) to size the retrofit + the contactless gap; (B) **contacts ↔ `fundraising_contacts` consolidation** (path a — every investor ≥1 contact) + **DELETE the TEMPORARY census endpoint/handler/route/button** once A/B/C captured; (C) the deferred `contact_type` physical DROP can ride the `communications` rebuild; (D) confirm the two stuck mailboxes + Grant's 4 new mailbox users enroll; (E) carried: bell approve-on-phone → Matrix-thread-clears spot-check.
- **Open / risks:** v106 desktop Contacts Status + dashboard counts are **live-smoke / not yet device-confirmed**. Carried: **Claude/Architect path unverified live on the box**; vision OCR small-in-frame misread (`mara.com→marac.com`); doc drift — `crm-overview.md` narrative + `EVALUATION.md` still describe `lp_profiles` (active API/schema claims fixed; deeper Phase-0 narrative deferred to a doc pass).
+39
-1
@@ -84,10 +84,48 @@
### Data-model cleanups (deferred from the v0.1.0:104 session)
- **Retire `contacts.contact_type`** (the Contacts Investors/Prospects tabs + TYPE badge). It's a legacy binary that's set mechanically — `'investor'` just means "exists in the grid" (stamped unconditionally by `_upsert_contact_from_fundraising`), `'prospect'` means "imported/added, not in the grid" — and is superseded by the grid-derived signals `contact_grid_signals()` already injects (`existing_investor`/`committed`, `pipeline_stage`). Plan: replace the tabs + TYPE badge with those signals, repoint the dashboard `total_lps`/`total_prospects` counts, then drop the column. Live UI change → its own small design pass. (Grant: "I want to delete it, next session.")
- **Retire `contacts.contact_type` — DONE (logical), deployed v0.1.0:106 (2026-06-21).** The Investors/Prospects tabs + TYPE badge are gone; desktop Contacts shows a grid-derived **Status** (existing-LP badge + pipeline-stage chip via `contact_grid_signals`), the dashboard `total_lps`/`total_prospects` now count grid investor entities (committed>0 vs $0, graveyard + 'Untitled Investor' blank-row excluded; also fixed a `total_contacts` soft-delete leak), and no code reads/writes the column. **Left:** the column is physically inert (`DEFAULT 'prospect'`); a physical DROP is deferred to a signed-off table-rebuild migration (SQLite no-drop-column; `contacts` is FK-referenced) — same retire-then-drop path lp_profiles took (v78→v104). Folds into the comms-unification rebuild below or its own later migration. 45/45 + new dashboard assertions, render-smoke + reviewer APPROVE.
- **Consolidate `contacts` ↔ `fundraising_contacts` into one linked model.** Goal (Grant): everyone in `contacts` maps to a `fundraising_investors` row (an individual maps to their own row). Today `contacts` is the canonical person directory (FK target for `communications`/`opportunities`); `fundraising_contacts.contact_id` (migration `0004`) points INTO it; the mobile Contacts page reads `contacts`. Three populations: **A** linked (grid pill ↔ contact), **B** `contacts`-only (imported prospects / manual adds — need a grid row), **C** pill-only (`fundraising_contacts.contact_id IS NULL` — need a contact row). **Census-first:** before designing any migration, count A/B/C on the box — Grant runs the SQL himself (he is **not** providing a DB copy), so hand him a counts-only script. The census decides whether this is a ~20-row cleanup or a ~300-row structural migration with `communications`/`opportunities` repointing. Then Grant reconciles B (add grid rows/pills) and C (add contact rows) and ensures all are linked. **(v0.1.0:105) A TEMPORARY admin census ships to read A/B/C off the box without shell access: `GET /api/admin/contacts-census` (`handle_contacts_census`) + a Settings → Admin "Run census" button, mirroring `backend/scripts/contacts_census.sql` (counts only). DELETE the endpoint + route + button after the numbers are captured — all tagged `TEMPORARY` in code.**
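A counts-only census for the three populations could look roughly like the sketch below. It does not reproduce `backend/scripts/contacts_census.sql`; the pared-down tables, the sample rows, and the function name `census` are assumptions made for illustration, and only `fundraising_contacts.contact_id` comes from the text above.

```python
import sqlite3

def census(conn: sqlite3.Connection) -> dict:
    """Count populations A (linked), B (contacts-only), C (pill-only)."""
    linked = conn.execute(
        "SELECT COUNT(DISTINCT c.id) FROM contacts c "
        "JOIN fundraising_contacts fc ON fc.contact_id = c.id"
    ).fetchone()[0]
    contacts_only = conn.execute(
        "SELECT COUNT(*) FROM contacts c WHERE NOT EXISTS "
        "(SELECT 1 FROM fundraising_contacts fc WHERE fc.contact_id = c.id)"
    ).fetchone()[0]
    pill_only = conn.execute(
        "SELECT COUNT(*) FROM fundraising_contacts WHERE contact_id IS NULL"
    ).fetchone()[0]
    return {"A": linked, "B": contacts_only, "C": pill_only}

# Hypothetical miniature schema, just enough to exercise the three queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fundraising_contacts (
    id INTEGER PRIMARY KEY,
    name TEXT,
    contact_id INTEGER REFERENCES contacts(id)
);
INSERT INTO contacts VALUES (1, 'Jane'), (2, 'Raj'), (3, 'Imported Prospect');
INSERT INTO fundraising_contacts VALUES
    (10, 'Jane', 1), (11, 'Raj', 2), (12, 'Unlinked Pill', NULL);
""")
counts = census(conn)
print(counts)
```

Because the function returns counts only, the output can be pasted back without exposing any names or notes from the box.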
### Retire the notes blob → unify ALL activity into `communications` (SPEC, Grant 2026-06-21)
*A multi-session structural change. **End goal:** `communications` is the single source of truth for every touchpoint with an investor; the grid's free-text "Notes / Communication / Outreach" blob is **deleted entirely** (no archive — but only after the retrofit is verified). Grounding for the decision below was traced in code 2026-06-21.*
**The problem (two writable stores, one endpoint).** `POST /api/fundraising/log-communication` (`server.py:3389`) writes BOTH a normalized `communications` row (the real record) AND, when `append_note` is set (default true), one appended line to the grid blob (the row's `notes` longtext, also synced to `fundraising_investors.notes`). This dual store has real bugs, not just mess:
- **One-way drift:** editing/soft-deleting a `communication` does NOT update/remove its blob line; editing the blob directly creates no communication. Neither store is trustworthy alone.
- **Soft-delete leak:** a deleted communication's text lives on in the blob, which is shown in the grid AND fed to the Spark grounding corpus — "deleted" content isn't gone (violates the soft-delete-everywhere rule).
- **No per-entry structure** in the blob: no author, no type/filter, no edit/remove of a single line, no precise dates.
- **Email fragmentation (the key finding):** approving an email-activity proposal appends a one-line summary to the **blob** (`_append_grid_note`, `server.py:6754`), and emails are **never** inserted into `communications`. So the per-investor **timeline** (`GET /api/communications?source_row_id` → `NoteTimeline`) shows logged calls/notes but **NOT emails**. The blob is currently the ONLY place email-as-a-touchpoint is unified onto the investor. (The daily digest reports email activity via its own separate `email_*`-table path, `digest_builder.collect_investor_activity`; the grounding corpus already pulls full matched-email bodies directly from `emails`, capped ~4000 chars — so grounding doesn't depend on the blob for emails.) **Consequence: retiring the blob REQUIRES unifying emails into `communications` first, or the timeline loses email activity.**
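The drift and soft-delete leak described in the bullets above can be reproduced in miniature. This is an illustrative sketch only, not the code in `server.py`: the two-table schema, the `investor_id` and `deleted_at` column names, and the simplified `log_communication` helper are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fundraising_investors (id INTEGER PRIMARY KEY, notes TEXT DEFAULT '');
CREATE TABLE communications (
    id INTEGER PRIMARY KEY,
    investor_id INTEGER,
    body TEXT,
    deleted_at TEXT
);
INSERT INTO fundraising_investors (id) VALUES (1);
""")

def log_communication(conn, investor_id, body, append_note=True):
    """Dual write: one communications row plus one appended blob line."""
    cur = conn.execute(
        "INSERT INTO communications (investor_id, body) VALUES (?, ?)",
        (investor_id, body),
    )
    if append_note:
        conn.execute(
            "UPDATE fundraising_investors SET notes = notes || ? WHERE id = ?",
            (body + "\n", investor_id),
        )
    return cur.lastrowid

comm_id = log_communication(conn, 1, "2026-06-20 [call] Jane Smith: intro call")

# The soft-delete touches only the communications row, never the blob.
conn.execute(
    "UPDATE communications SET deleted_at = '2026-06-21' WHERE id = ?", (comm_id,)
)

live = conn.execute(
    "SELECT COUNT(*) FROM communications WHERE deleted_at IS NULL"
).fetchone()[0]
blob = conn.execute(
    "SELECT notes FROM fundraising_investors WHERE id = 1"
).fetchone()[0]
print(live, repr(blob))
```

After the soft-delete there are no live communications, yet the blob still carries the deleted line, which is the content that would keep showing in the grid and the grounding corpus.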
**Schema change — `communications` becomes investor-anchored (LOCKED with Grant 2026-06-21).** `communications` is a **leaf table** (verified: nothing FK-references it), so ONE rebuild migration does everything:
- **Add `fundraising_investor_id` → NOT NULL** (the anchor; backfill existing rows via the `contact_id → fundraising_contacts → fundraising_investors` join). This is path b: a touchpoint is fundamentally with the *investor entity*.
- **Relax `contact_id` → nullable** (the specific person, when known). Lets us log "reached out on the org's LinkedIn / emailed the website alias" before a named contact exists; attach the person later.
- **Drop the dead fields** (Grant, never used / too tedious): `duration_minutes`, `attendees`, `outcome`, `opportunity_id`. (Pipeline decides *who* to contact; a comm needn't be assigned to an opp.)
- The migration is additive where possible, but relaxing `contact_id`'s NOT NULL and dropping columns need the SQLite 12-step table rebuild; ship a `.down.sql` and run it against a **copy** of `crm.db` first.
- **Data-hygiene goal (path a, separate):** every investor SHOULD have ≥1 contact — even a placeholder ("Harvard Endowment") or a found name. Encouraged, not enforced at the comm level. Pairs with the contacts↔`fundraising_contacts` consolidation above.
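The rebuild described in the bullets above might be sketched as follows. This is a minimal illustration of the create-copy-drop-rename pattern, not the actual migration: the column lists are trimmed, `fundraising_contacts.investor_id` is an assumed name for the link to `fundraising_investors`, and the row-count guard is an added suggestion.

```python
import sqlite3

# isolation_level=None gives manual BEGIN/COMMIT control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE fundraising_investors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fundraising_contacts (
    id INTEGER PRIMARY KEY,
    investor_id INTEGER NOT NULL REFERENCES fundraising_investors(id),
    contact_id INTEGER REFERENCES contacts(id)
);
CREATE TABLE communications (
    id INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES contacts(id),
    type TEXT, body TEXT,
    duration_minutes INTEGER, attendees TEXT, outcome TEXT, opportunity_id INTEGER
);
INSERT INTO fundraising_investors VALUES (1, 'Harvard Endowment');
INSERT INTO contacts VALUES (7, 'Jane Smith');
INSERT INTO fundraising_contacts VALUES (100, 1, 7);
INSERT INTO communications (id, contact_id, type, body) VALUES (5, 7, 'call', 'intro');
""")

REBUILD_STEPS = [
    """CREATE TABLE communications_new (
           id INTEGER PRIMARY KEY,
           fundraising_investor_id INTEGER NOT NULL
               REFERENCES fundraising_investors(id),
           contact_id INTEGER REFERENCES contacts(id),  -- now nullable
           type TEXT, body TEXT
       )""",
    # Backfill the anchor via contact_id -> fundraising_contacts -> investor.
    """INSERT INTO communications_new
           (id, fundraising_investor_id, contact_id, type, body)
       SELECT c.id, fc.investor_id, c.contact_id, c.type, c.body
       FROM communications c
       JOIN fundraising_contacts fc ON fc.contact_id = c.contact_id""",
    "DROP TABLE communications",
    "ALTER TABLE communications_new RENAME TO communications",
]

before = conn.execute("SELECT COUNT(*) FROM communications").fetchone()[0]
conn.execute("PRAGMA foreign_keys = OFF")  # has no effect inside a transaction
conn.execute("BEGIN")
for stmt in REBUILD_STEPS:
    conn.execute(stmt)
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
after = conn.execute("SELECT COUNT(*) FROM communications").fetchone()[0]
if violations or after != before:
    conn.execute("ROLLBACK")
    raise RuntimeError("rebuild would lose rows or break foreign keys")
conn.execute("COMMIT")
conn.execute("PRAGMA foreign_keys = ON")

# A comm with no named contact is now allowed.
conn.execute(
    "INSERT INTO communications (fundraising_investor_id, contact_id, type, body) "
    "VALUES (1, NULL, 'note', 'emailed the website alias')"
)
cols = [row[1] for row in conn.execute("PRAGMA table_info(communications)")]
print(before, after, cols)
```

The inner join silently skips any communication whose contact has no `fundraising_contacts` link, and a contact linked to several investors would collide on the primary key, so the before/after count check is there to surface both cases on the DB copy rather than on the box.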
**`log-communication` field set (simplified, desktop == mobile — Grant).** Types: **`email/call/meeting/note`** (drop `text` — redundant with `note`, Grant). Fields: type, **communication date (defaults today, backdatable)**, body (free text — fold the old `outcome`/attendees prose in here), and **`next_action` + `next_action_date`** which, when set, **auto-create a reminder** (the W1 `reminders` table) on the investor. Remove the duration/attendees/outcome/opportunity inputs. **Same fields on desktop and mobile** (one shared form).
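One way the simplified handler could behave is sketched below. Everything about the `reminders` columns (`communication_id`, `due_date`, `title`) is hypothetical, as is the choice to create a reminder only when both `next_action` and `next_action_date` are present; the real W1 table may differ.

```python
import sqlite3
from datetime import date

ALLOWED_TYPES = {"email", "call", "meeting", "note"}  # `text` is dropped

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE communications (
    id INTEGER PRIMARY KEY,
    fundraising_investor_id INTEGER NOT NULL,
    type TEXT NOT NULL,
    communication_date TEXT NOT NULL,
    body TEXT,
    next_action TEXT,
    next_action_date TEXT
);
CREATE TABLE reminders (
    id INTEGER PRIMARY KEY,
    fundraising_investor_id INTEGER NOT NULL,
    communication_id INTEGER,
    due_date TEXT NOT NULL,
    title TEXT NOT NULL
);
""")

def log_communication(conn, investor_id, comm_type, body,
                      communication_date=None,
                      next_action=None, next_action_date=None):
    """Insert one communication; add a reminder when a next action is set."""
    if comm_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {comm_type!r}")
    when = communication_date or date.today().isoformat()  # backdatable
    with conn:  # both inserts commit or roll back together
        comm_id = conn.execute(
            "INSERT INTO communications (fundraising_investor_id, type,"
            " communication_date, body, next_action, next_action_date)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (investor_id, comm_type, when, body, next_action, next_action_date),
        ).lastrowid
        reminder_id = None
        if next_action and next_action_date:
            reminder_id = conn.execute(
                "INSERT INTO reminders (fundraising_investor_id,"
                " communication_id, due_date, title) VALUES (?, ?, ?, ?)",
                (investor_id, comm_id, next_action_date, next_action),
            ).lastrowid
    return comm_id, reminder_id

comm_id, reminder_id = log_communication(
    conn, 1, "call", "Intro call; wants the deck.",
    communication_date="2026-06-18",
    next_action="Send deck", next_action_date="2026-06-25",
)
print(comm_id, reminder_id)
```

Keeping the communication and the reminder in one transaction means a failed reminder insert cannot leave a half-logged touchpoint behind.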
**Email → `communications` unification.** Repoint email-proposal approval (`decide_email_activity_proposal`) to create an **`email`-type communication** (investor_id from `email_investor_links.fundraising_investor_id`; contact_id when the person is a known contact, else null) instead of appending to the blob. Full email body stays in the `emails` table; the communication is the touchpoint record + links back. Then the timeline shows emails, and consumers stop needing a parallel email path.
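A rough sketch of the repointed approval path follows. It is not `decide_email_activity_proposal` itself: the helper name `approve_email`, the `emails.from_address` and `contacts.email` columns, the `email_investor_links.email_id` key, and the `communications.email_id` back-link are all assumptions; only `email_investor_links.fundraising_investor_id` is named in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE emails (
    id INTEGER PRIMARY KEY, from_address TEXT, subject TEXT,
    sent_at TEXT, body TEXT
);
CREATE TABLE email_investor_links (email_id INTEGER, fundraising_investor_id INTEGER);
CREATE TABLE communications (
    id INTEGER PRIMARY KEY,
    fundraising_investor_id INTEGER NOT NULL,
    contact_id INTEGER,
    type TEXT, communication_date TEXT, subject TEXT,
    email_id INTEGER  -- link back; the full body stays in `emails`
);
INSERT INTO contacts VALUES (7, 'jane@example.com');
INSERT INTO emails VALUES
    (1, 'jane@example.com', 'Re: fund update', '2026-06-20', 'full body stays here'),
    (2, 'info@example.org', 'Hello', '2026-06-21', 'full body stays here');
INSERT INTO email_investor_links VALUES (1, 1), (2, 1);
""")

def approve_email(conn, email_id):
    """Record an approved email as an `email`-type communication."""
    row = conn.execute(
        "SELECT e.from_address, e.subject, e.sent_at, l.fundraising_investor_id "
        "FROM emails e JOIN email_investor_links l ON l.email_id = e.id "
        "WHERE e.id = ?",
        (email_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"email {email_id} is not linked to an investor")
    from_address, subject, sent_at, investor_id = row
    contact = conn.execute(
        "SELECT id FROM contacts WHERE email = ?", (from_address,)
    ).fetchone()
    contact_id = contact[0] if contact else None  # unknown sender stays NULL
    with conn:
        cur = conn.execute(
            "INSERT INTO communications (fundraising_investor_id, contact_id,"
            " type, communication_date, subject, email_id)"
            " VALUES (?, ?, 'email', ?, ?, ?)",
            (investor_id, contact_id, sent_at, subject, email_id),
        )
    return cur.lastrowid

approve_email(conn, 1)  # sender is a known contact
approve_email(conn, 2)  # sender is not in contacts
rows = conn.execute(
    "SELECT fundraising_investor_id, contact_id, type, email_id "
    "FROM communications ORDER BY id"
).fetchall()
print(rows)
```

Both emails land on the investor's timeline; only the first carries a `contact_id`, which matches the nullable-contact rule in the schema change.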
**The grid column → derived view + Log button.** "Notes / Communication / Outreach" becomes a **read-only derived view** (render the latest N communications for the row inline), with a quick **Log button** (reuse the existing log-communication modal) for scratch entries — replacing inline free-text editing. The detail timeline already reads from `communications`.
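The derived cell could be fed by a query along these lines. This is a sketch under assumptions: the `deleted_at` soft-delete column, the one-line rendering format, and the helper name `latest_communications` are illustrative, not taken from the codebase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE communications (
    id INTEGER PRIMARY KEY,
    fundraising_investor_id INTEGER NOT NULL,
    communication_date TEXT,
    type TEXT,
    body TEXT,
    deleted_at TEXT
);
INSERT INTO communications VALUES
    (1, 1, '2026-06-01', 'call',    'intro',          NULL),
    (2, 1, '2026-06-10', 'email',   'deck sent',      NULL),
    (3, 1, '2026-06-15', 'note',    'deleted entry',  '2026-06-16'),
    (4, 1, '2026-06-20', 'meeting', 'diligence',      NULL),
    (5, 2, '2026-06-21', 'call',    'other investor', NULL);
""")

def latest_communications(conn, investor_id, limit=3):
    """Render the newest live communications for one grid row, newest first."""
    rows = conn.execute(
        "SELECT communication_date, type, body FROM communications "
        "WHERE fundraising_investor_id = ? AND deleted_at IS NULL "
        "ORDER BY communication_date DESC, id DESC LIMIT ?",
        (investor_id, limit),
    ).fetchall()
    return [f"{when} [{kind}] {body}" for when, kind, body in rows]

cell = latest_communications(conn, 1, limit=2)
print(cell)
```

Because the cell is computed from live rows on every read, a soft-deleted communication disappears from the grid without any second store to clean up.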
**Retrofit (LLM-assisted, two-tier, then delete the blob).** Parse each investor's blob into `communications`:
- **Structured auto-appended lines** (`2026-06-20 [call] Jane Smith: …`, generated by log-communication) → **deterministic** parse, round-trips exactly (date, type, contact-name→contact_id, summary→subject). No LLM needed.
- **Legacy free-text** → feed the blob to the **local LLM via Spark Control** (never Claude — raw LP notes are Tier-2 sensitive; same local-only basis as the digest / intake / grounding-minimize) and ask it to split it into a set of `{date, type, summary}` communication entries, OR fall back to **one blanket `note`-comm** when it can't structure it. Much of the free text already carries explicit dates ("May 25 2025 — got a call", "Jun 16 2024 — sent an email"), so per-event reconstruction is realistic, not just a generic dump.
- **Missing-year caveat (Grant):** entries over the years often omit the year ("contacted Jan 10"). The model can infer it from surrounding entries + the blob's roughly-chronological append order, but it **won't be perfectly precise** — expect gaps. So **flag inferred dates as low-confidence** (don't assert a wrong precise date) and **surface the suggested logs for human review/edit before committing** (fits the draft→approve guardrail). The blanket-fallback comm dates to `notes_last_modified`.
- The whole retrofit is **idempotent + reversible**, run against a **copy** of `crm.db` first. Once verified, **DROP the blob entirely** (the `notes` grid column + `fundraising_investors.notes`) — Grant: no archive, full deletion, but only after the retrofit is confirmed correct.
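The deterministic tier and the missing-year handling from the list above can be sketched like this. The regex is inferred from the single example line given and may not match every line `log-communication` has written; `parse_structured`, `infer_date`, and the `confidence` field are hypothetical names. The LLM tier is not shown.

```python
import re
from datetime import date

# Shape inferred from the example: "<ISO date> [<type>] <contact>: <summary>"
STRUCTURED = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \[(?P<type>\w+)\] "
    r"(?P<contact>[^:]+): (?P<summary>.+)$"
)

def parse_structured(line):
    """Deterministic tier: return a comm dict, or None for legacy free text."""
    m = STRUCTURED.match(line.strip())
    if m is None:
        return None
    date.fromisoformat(m["date"])  # raises ValueError on an impossible date
    return {
        "date": m["date"],
        "type": m["type"],
        "contact_name": m["contact"].strip(),
        "summary": m["summary"].strip(),
        "confidence": "high",
    }

def infer_date(month, day, previous):
    """Guess the year of a year-less entry from the entry before it.

    Assumes roughly chronological append order, so the guess is the first
    matching month/day on or after `previous`. The result is always flagged
    low-confidence. Feb 29 is not handled in this sketch.
    """
    candidate = date(previous.year, month, day)
    if candidate < previous:
        candidate = date(previous.year + 1, month, day)
    return {"date": candidate.isoformat(), "confidence": "low"}

structured = parse_structured("2026-06-20 [call] Jane Smith: intro call, wants the deck")
legacy = parse_structured("contacted Jan 10")
guess = infer_date(1, 10, previous=date(2024, 11, 3))
print(structured, legacy, guess)
```

Lines that return `None` are the ones that would go to the local model, and anything carrying `"confidence": "low"` is a candidate for the human review step rather than a direct insert.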
**Sequencing (safe, not big-bang):**
1. **Extend the census** (the consolidation's temporary endpoint) to also count: investors with notes but **zero contacts**, and blobs that are pure-structured vs. contain legacy free-text — sizes the retrofit + the contactless gap.
2. **Schema rebuild** of `communications` (investor_id NOT NULL + contact_id nullable + drop dead fields), backfill investor_id.
3. **Unify emails** into `communications` (repoint proposal approval) + simplify the log-communication form (+ reminder auto-create) + same desktop/mobile fields.
4. **Retrofit** blobs → comms (DB copy, verified).
5. **Cut over:** grid column → derived view + Log button; grounding drops the notes source; stop writing the blob.
6. **Delete** the blob (column + relational mirror) once everything reads from comms.
**Open / depends-on:** the contacts↔`fundraising_contacts` consolidation (path a) and the W1 reminders table (next_action → reminder). The `contact_type` physical-drop migration can ride the same `communications` rebuild window.
### Captured tweaks (Matrix, 2026-06-18/20)
*Small UI/UX + capture-quality items captured via Matrix; not yet scheduled.*