Matrix intake: fuzzy investor matching + conversational in-thread edits (v0.1.0:86)
Close the two locked post-deploy enhancements for the Matrix intake bot.
Fuzzy matching (server-side, ships in the s9pk): new find_intake_candidates in
server.py returns ranked deterministic near-matches (difflib name similarity +
token-set Jaccard, legal-suffix-aware, + email Levenshtein <= 2); GET
/api/intake/match now returns {match, candidates}. The bot surfaces a numbered
shortlist so a near-duplicate (Charlie/Charles, Acme Capital vs Acme Capital LLC,
a one-char email typo) is confirmed by a human instead of silently creating a
second investor. Exact match still auto-attaches; fuzzy candidates are never
auto-attached. The optional LLM-judge re-rank is deferred.
Conversational edits (bot-side, ships on the Spark): any in-thread reply that
isn't yes/no/edit field=value is treated as a natural-language revision and
re-run through local Qwen (parse.revise). Email integrity is preserved -- a
changed address must literally appear in the instruction; the model's email
field is structurally unreachable. No-op revisions re-prompt.
Docs/current-state brought current; 27/27 backend tests green.
@@ -103,16 +103,17 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
## Current state
_Phase 0 substrate + Phase 1 thesis/outreach are built; **box and repo at v0.1.0:83** (deployed & verified live 2026-06-16). v83 (latest): **email search/query + windowed digest preview** — Communications tab gains a fixed/typed investor dropdown, a date-range filter, a full-body view, and a semantic "Search content" mode; the Daily Digest gains an in-app windowed preview before send. Prior v82: front-end libs vendored + SRI-pinned + jsdom render-smoke build gate. **Decision (2026-06-16): the fundraising grid + email capture is the canonical system of record** — vestigial classic-CRM surfaces get pruned or repurposed (see `ROADMAP.md` → "Consolidate on the fundraising grid as canonical"). Longer-term backlog: `ROADMAP.md`._
_Phase 0 substrate + Phase 1 thesis/outreach built; **box and repo at v0.1.0:85** (deployed & verified live 2026-06-17). **The fundraising grid + email capture is the canonical system of record** (decision 2026-06-16) — vestigial classic-CRM surfaces get pruned/repurposed. Longer-term backlog: `ROADMAP.md`._
- **Built & reviewed, not yet deployed — Matrix intake bot (M1+M2), `backend/matrix_intake/`:** a separate-process bot (its `matrix-nio` dep isolated from the stdlib CRM) that turns a typed message in a dedicated Matrix room into a proposed fundraising-grid add/edit and writes only after **in-thread human approval** (`yes`/`edit field=value`/`no`). Parse = local Qwen via Spark Control (reuses `ingest/llm.py`; no Claude, no scrub needed — local path like the digest). Writes reuse the CRM's own `POST /api/fundraising/log-communication` (create-if-missing + contact upsert + note + relational sync + audit), tagged `source="matrix_intake"`; the one new CRM surface is read-only `GET /api/intake/match` (`find_intake_match`) returning the **grid row id** so an approved note lands on the matched investor (no duplicate). v1 is **text-only** — business-card photo (M3) is deferred (Spark Control has no vision model). Reviewer-passed (double-approve race fixed — `handle_reply` pops before the commit await; edit-grammar fix). **Code-complete, compiles, 26/26 tests green; a live Matrix smoke needs creds + `matrix-nio` on the Spark (can't run in CI).** Guide: `docs/guides/matrix-intake.md` (incl. the `settings.py`-not-`config.py` collision + email-integrity gotchas).
- **Matrix intake bot — DEPLOYED & LIVE (2026-06-17), `backend/matrix_intake/`:** a separate-process bot (its `matrix-nio` dep isolated from the stdlib CRM) turning a typed Matrix-room message into a proposed fundraising-grid add/edit, written only after **in-thread human approval** (`yes`/`edit field=value`/`no`). Parse = local Qwen via Spark Control (no Claude/scrub, like the digest); writes reuse the CRM's own `POST /api/fundraising/log-communication` tagged `source="matrix_intake"`; new-vs-existing via read-only `GET /api/intake/match` (returns the grid row id → no duplicate). **Runs on the Spark** (`modelo32`, nohup+venv; pid `/tmp/intake-bot.pid`, log `/tmp/intake-bot.log`) — **not a systemd service yet** (won't survive a reboot). **Live-smoked end-to-end** (new-investor create + existing-investor note matched & appended, no dup). Server side shipped to the box as **v0.1.0:84** (`/api/intake/match` + `source` provenance — these were missing on v83, so the bot 404'd until v84); then UX adds: main-timeline nudge pointer, top-level-`yes`→thread redirect, clearer commit wording, note text in the grid line (v85 dropped the `[note]` tag). M3 (business-card photo) deferred (no Spark vision model). Guide: `docs/guides/matrix-intake.md`.
- **Matrix intake — fuzzy-match + conversational-edit pass — BUILT 2026-06-17, NOT yet deployed/live-smoked (repo at v0.1.0:86; box still v85).** Closes the two locked post-deploy enhancements (ROADMAP). **(a) Fuzzy matching (server-side, ships in the s9pk):** `find_intake_candidates` in `server.py` (deterministic — stdlib `difflib` name similarity + token-set Jaccard, legal-suffix-aware via `_strip_legal_suffix`, + email Levenshtein ≤ 2; ranked, ≥0.62, top 5); `GET /api/intake/match` now returns `{match, candidates}`. The bot surfaces a numbered shortlist (`_stage="disambiguate"`) so a near-duplicate ("Charlie"/"Charles", "Acme Capital"/"Acme Capital LLC", a one-char email typo) is **confirmed by a human** instead of silently creating a second investor — never auto-attached. **The optional LLM-judge re-rank was deferred** (deterministic filter already surfaces the cases; LLM is the right shortlist *pruner* if noise proves real). **(b) Conversational edits (bot-side, ships on the Spark):** any in-thread reply that isn't `yes`/`no`/`edit field=value` → `parse.revise` re-runs `{proposal + instruction}` through local Qwen and re-renders the card; **email integrity preserved** (a changed address must literally appear in the instruction; the model's email field is never trusted); no-op revisions re-prompt (`same_fields`). **Deploy is split:** the `candidates` need an **s9pk build+install** (v86); the bot's disambiguation+revise need a **Spark `git pull` + restart** — a bot restart alone won't deliver `candidates` (box returns `[]`, bot safely proposes new). Tests green; **needs a Matrix live-smoke** (grammar + Qwen `revise` leg). Guide updated.
- **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Deployed & verified live: v0.1.0:83** (box `$START9_BOX_HOST`/immense-voyage.local; `installed-version`→`0.1.0:83`, migration chain `…82→83` clean, server up on `:8080`, Gmail + ingest + digest schedulers all started; render-smoke gated the build) — **email search/query + windowed digest preview** (code-only, migrations no-op). Communications tab (`CommunicationsPage` + `email_integration/db.query_email_activity`): **fixed the investor dropdown** — the facet now mirrors the list with the digest's precedence (grid → org → contact → address) and **typed keys** (`fund:`/`org:`/`contact:`), so email matched only to a classic contact or org domain (no grid id — the common case, since `fundraising_contacts.email` is sparsely populated) now resolves to a real name and is selectable, instead of the dropdown being empty; added a **date-range filter** (`since`/`until`), and a **click-to-expand full-body view** (`GET /api/email/detail?id=` → `query_email_detail`, admin, soft-delete-gated, renders `body_text` escaped — never raw HTML). New **semantic content search**: a "Search content" toggle → `GET /api/email/search?q=` (`routes._h_search`) wrapping `ingest/search.py:hybrid_search` filtered to `doc_type='email'` (lazy import; **503** if Spark/Qdrant unreachable), **hydrated + soft-delete-filtered against SQLite** (`db.search_hit_emails` — never trust the derived index). **Daily Digest:** Settings → Admin now builds a digest over a chosen window (last 24h or since a date) as an **in-app preview** before sending (`POST /api/admin/digest/preview`); manual send uses the same window (`send-now` + `digest_scheduler.send_digest_window`); window resolved by `digest_builder.resolve_digest_window` (cap 92d). Both run the **real local-Spark summarizer** and **never touch the daily cursor**. Verified: 22/22 backend tests, `py_compile` clean, render-smoke pass. 
**Grant validated both live on the box 2026-06-16** — the digest windowed preview renders real Spark narratives over real activity, and the Communications dropdown / date filter / full-body view / content-search all work. Detail: `docs/guides/email.md`.
- **Deployed & verified live: v0.1.0:82** (box `$START9_BOX_HOST`/immense-voyage.local; `installed-version`→`0.1.0:82`, migration chain `…81→82` clean, server up on `:8080`, schedulers + Gmail integration up). **v82 vendored React 18.3.1 / ReactDOM 18.3.1 / @babel/standalone 7.29.7 into `frontend/assets/vendor/`**, served same-origin with `sha384` SRI (no CDN, no outbound-internet dependency to render the UI), and added **`start9/0.4/render-smoke.mjs`** — a jsdom check (shipped-Babel transform asserts classic/non-module + parseable; real mount asserts the login UI renders) wired into the default `make` goal (`verified-build`), so every build is gated on the frontend actually rendering. Closes the v78 (blank screen) + v79 (Babel-8 ESM-import) class structurally. Detail: `docs/guides/packaging.md`. **Prior shipped & live:** v81 Communications-tab matched-only (`query_email_activity` gates on `EXISTS(email_investor_links)`; unmatched email captured but never shown; `docs/guides/email.md`); v80 admin-only email-activity panel (`GET /api/email/activity`); v78 retired `lp_profiles`/LP Tracker + repointed Dashboard "Total Committed" onto the grid (graveyard-excluded). **Digest fully live:** capture (DWD) → propose→approve; Gmail-DWD→SMTP transport; daily Phase-B digest (`digest_builder.py` + always-on `digest_scheduler.py` reading a DB policy + `send-now`); **daily auto-send is now ENABLED** (Grant turned it on in Settings → Admin, 2026-06-16). Detail: `docs/guides/email.md`.
- **Live since v74 (2026-06-13):** login works; `/assets/` traversal 404s (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). Security/privacy hardening (path-traversal close, outreach NER backstop, get-by-id soft-delete) shipped in v74 — detail in `EVALUATION.md`.
- **Tests (2026-06-16):** **26/26 backend tests green** via `python3 backend/run_tests.py`, `py_compile` clean. (+4 this session for the Matrix intake bot: `matrix_intake/test_parse.py`, `test_proposals.py`, `test_crm_client.py`, and `test_intake_endpoints.py` — the last boots the real server against a temp DB and covers `/api/intake/match`, the create→match no-duplicate contract, and `source="matrix_intake"` provenance.) `test_email_activity_panel.py` now covers the **typed facet + org/contact resolution** (the dropdown fix), the **date-range filter**, the **detail view** (full body / recipients / attachments / soft-delete), and the **content-search route** (hydrate / drop-tombstoned / 503 / admin) with retrieval stubbed; `test_digest_builder.py` adds the **window resolver** + **`send_digest_window`** (no-cursor-touch) cases. Frontend **render smoke check** (`cd start9/0.4 && make render-smoke`) still gates the default `make` build. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
- **Tests (2026-06-17):** **27/27 backend tests green** via `python3 backend/run_tests.py`, `py_compile` clean. (+4 last session for the Matrix intake bot: `matrix_intake/test_parse.py`, `test_proposals.py`, `test_crm_client.py`, and `test_intake_endpoints.py` — the last boots the real server against a temp DB and covers `/api/intake/match`, the create→match no-duplicate contract, and `source="matrix_intake"` provenance.) **This session (v86 fuzzy + conversational pass) added cases to those same files** — `test_intake_endpoints.py`: fuzzy `candidates` (near-spelling, legal-suffix-at-1.0, one-char email typo, exact→no-candidates, nothing-close→empty); `test_proposals.py`: the disambiguation grammar + `attach_to_candidate`/`promote_to_new`/`same_fields`; `test_parse.py`: `revise` merge + email-integrity-from-instruction + match-id preservation; `test_crm_client.py`: the `{match, candidates}` shape + no-query-skips-network. `test_email_activity_panel.py` now covers the **typed facet + org/contact resolution** (the dropdown fix), the **date-range filter**, the **detail view** (full body / recipients / attachments / soft-delete), and the **content-search route** (hydrate / drop-tombstoned / 503 / admin) with retrieval stubbed; `test_digest_builder.py` adds the **window resolver** + **`send_digest_window`** (no-cursor-touch) cases. Frontend **render smoke check** (`cd start9/0.4 && make render-smoke`) still gates the default `make` build. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
- **Decided, not yet built (detail in `ROADMAP.md`):** Pipeline adoption + a grid flag that auto-loads flagged investors as opportunities; **NL→safe-query** feature (search item 3 — the larger, separate build); CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (currently reply to the LP only). *(Done this session, v83: email search item 1 [activity query/panel gaps — typed facet fix + date range + full-body view] and item 2 [semantic content search] both shipped; daily-digest windowed preview→send.)*
- **Known debt (P2, not deploy-blocking):** **reports-subsystem soft-delete sweep** — `handle_pipeline_report` + remaining report/aggregate queries over opportunities/communications still count soft-deleted rows (v78 shrank this surface: the `lp_profiles`/lp-breakdown aggregates are gone and the dashboard "Total Committed" is now grid-sourced); needs a pass + report-endpoint tests. Also `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; **StartOS package icon oversized/zoomed** (research the Start9 icon spec, source a base ten31 logo, produce a correctly sized icon **before the next s9pk upload**); the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`. *(Resolved v82: front-end CDN/SRI risk — libs vendored + SRI-pinned — and the render smoke check is now scripted into the build.)*
- **Doc drift to reconcile:** `crm-overview.md` + `EVALUATION.md` still describe `lp_profiles` as a live model in places — a doc-auditor pass should align them to "grid canonical, `lp_profiles` retired."
- **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine. Live infra now exercised on the box (Gmail capture + schedulers up; local-Spark summarization confirmed via the digest preview; Qdrant via Communications content-search); **Claude/Architect path still unverified live on the box.**
- **Next:** 1) **deploy + live-smoke the Matrix intake bot** (`pip install matrix-nio` + `MATRIX_*`/`CRM_BOT_*` in `.env` on the Spark, create the CRM bot user, `python3 backend/matrix_intake/bot.py`, post a test message); 2) **Pipeline adoption** — grid flag → auto-load opportunities (the agreed next major build); 3) add an **auth regression test** asserting the 3 v79-gated GET endpoints (`/api/users`, `/api/email/status`, `/api/email/accounts`) reject members; 4) **reports-subsystem soft-delete sweep** + report-endpoint tests; 5) `?limit=abc` crash; 6) **email-capture tab error on email sync status** (likely `/api/email/status`); 7) **NL→safe-query** (search item 3 — separate, larger); 8) Grant + Jonathan freeze v2.0 canonical; 9) reply-all for Tier-B drafts. *(Logged to ROADMAP: a build step that pre-compiles JSX to drop runtime Babel entirely — bigger, contradicts the "no build step" convention.)*
- **Next:** 1) **Pipeline adoption** — grid flag → auto-create/sync an `opportunities` row so flagged investors load into the Pipeline board (the agreed next major build; design the grid↔pipeline link first — see ROADMAP "Adopt the Pipeline"); 2) **make the intake bot a managed service** (systemd / restart-on-boot — it's a nohup process today); 3) **deploy + Matrix-smoke the v86 intake pass** — s9pk build+install (carries `find_intake_candidates`) + Spark `git pull`+restart (carries disambiguation + `revise`), then live-smoke the shortlist grammar and the Qwen revise leg (built this session, ROADMAP updated); 4) **reports-subsystem soft-delete sweep** + report-endpoint tests; 5) `?limit=abc` crash; 6) **auth regression test** for the 3 v79-gated GET endpoints (`/api/users`, `/api/email/status`, `/api/email/accounts`); 7) **NL→safe-query** (search item 3 — separate, larger); 8) Grant + Jonathan freeze v2.0 canonical; 9) reply-all for Tier-B drafts.
+4
-2
@@ -100,9 +100,11 @@ Use the **matrix-bridge** repo's pattern to listen on a dedicated ten31-database
- **CRM-side:** `POST /api/intake/investor` (service-auth) creates a new investor+contact **through the existing grid-save path** (so relational sync + audit + backup-on-write happen as with a UI edit; bot never does whole-blob RMW) or appends a meeting note to the interaction log for an existing investor; `GET /api/intake/match?q=` fuzzy-matches via the existing entity-resolution/email-matcher. New investor needs no fund at intake.
- **Phases:** M1 = scaffold + parse + in-thread propose, **no writes** (proves Matrix↔Spark). M2 = intake endpoint + match + write-on-approve + tests. M3 (deferred) = business-card photo.
**Post-deploy enhancement — fuzzy match + in-thread confirm (Grant, 2026-06-17).** Today `find_intake_match` is **exact-after-normalization** (`_normalize_text` = lowercase+strip), so near-misses — "Charlie" vs "Charles" (same last name), "Acme Capital" vs "Acme Capital LLC", a one-character email typo — return no match and the bot proposes a **new** investor, risking a duplicate the human approves without realizing a near-match exists. The existing in-thread approval gate is useless against this because the human is never *shown* the near-match. Fix: matcher returns **ranked fuzzy candidates** (deterministic pre-filter: normalized name similarity / token overlap + email edit-distance ≤ ~2), surfaced in-thread for the human to confirm or pick, with the **local Spark LLM optionally re-ranking/judging the shortlist** (good at Charlie/Charles + legal-suffix equivalence; fed only the shortlist, never the whole LP list). Keeps the approval gate but makes it effective against duplicates. Land **after** the live smoke — net-new logic + reply grammar + tests; the current exact match is safe and its failure mode (a duplicate) is recoverable via the existing entity-merge subsystem (`backend/entity_*.py`).
**Post-deploy enhancement — fuzzy match + in-thread confirm (Grant, 2026-06-17). BUILT 2026-06-17 (v0.1.0:86), not yet deployed / live-smoked.** Today `find_intake_match` is **exact-after-normalization** (`_normalize_text` = lowercase+strip), so near-misses — "Charlie" vs "Charles" (same last name), "Acme Capital" vs "Acme Capital LLC", a one-character email typo — return no match and the bot proposes a **new** investor, risking a duplicate the human approves without realizing a near-match exists. The existing in-thread approval gate is useless against this because the human is never *shown* the near-match. Fix: matcher returns **ranked fuzzy candidates** (deterministic pre-filter: normalized name similarity / token overlap + email edit-distance ≤ ~2), surfaced in-thread for the human to confirm or pick, with the **local Spark LLM optionally re-ranking/judging the shortlist** (good at Charlie/Charles + legal-suffix equivalence; fed only the shortlist, never the whole LP list). Keeps the approval gate but makes it effective against duplicates. Land **after** the live smoke — net-new logic + reply grammar + tests; the current exact match is safe and its failure mode (a duplicate) is recoverable via the existing entity-merge subsystem (`backend/entity_*.py`).
- **As built:** `find_intake_candidates` in `server.py` (deterministic — stdlib `difflib` name similarity + token-set Jaccard, legal-suffix-aware via `_strip_legal_suffix`, + email Levenshtein ≤ 2; ranked, ≥0.62, top 5). `GET /api/intake/match` now returns `{match, candidates}`. Bot: a new `_stage="disambiguate"` shortlist (`proposals.render_disambiguation` / `interpret_disambiguation` / `attach_to_candidate` / `promote_to_new`) — human picks a number / `new` / `no`. **The optional LLM-judge re-rank was deliberately deferred** (the deterministic filter already surfaces the named cases; an LLM judge is the right *pruner* for shortlist noise — build if the deterministic ranking proves too noisy in practice). Tests: `test_intake_endpoints.py` (server fuzzy cases), `matrix_intake/test_proposals.py` (disambiguation grammar), `matrix_intake/test_crm_client.py` (candidate shape).
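A minimal sketch of the deterministic candidate filter described above, under stated assumptions. Only the ingredients come from these notes (stdlib `difflib` similarity, token-set Jaccard, legal-suffix stripping, email Levenshtein ≤ 2, a 0.62 cutoff, top 5). The suffix list, combining the two name scores with `max`, the fixed 0.9 score for an email near-match, and the names `rank_candidates` / `strip_legal_suffix` / `name_score` are illustrative and may not match `find_intake_candidates` or `_strip_legal_suffix` in `server.py`.

```python
import difflib
import re

# Hypothetical suffix list; the real `_strip_legal_suffix` may use a different one.
LEGAL_SUFFIXES = {"llc", "lp", "llp", "inc", "ltd", "corp", "co"}


def strip_legal_suffix(name):
    """Lowercase, drop punctuation, and remove trailing legal-entity tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def jaccard(a, b):
    """Token-set Jaccard overlap of two whitespace-tokenized strings."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a, b):
    """Classic edit distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def name_score(query, candidate):
    """Best of character-level similarity and token overlap, suffix-insensitive."""
    q, c = strip_legal_suffix(query), strip_legal_suffix(candidate)
    if not q or not c:
        return 0.0
    ratio = difflib.SequenceMatcher(None, q, c).ratio()
    return max(ratio, jaccard(q, c))  # assumption: how the two signals combine


def rank_candidates(query_name, query_email, rows, threshold=0.62, limit=5):
    """Return up to `limit` rows scoring at least `threshold`, best first."""
    scored = []
    for row in rows:
        score = name_score(query_name or "", row.get("name") or "")
        email = (row.get("email") or "").lower()
        if query_email and email and levenshtein(query_email.lower(), email) <= 2:
            score = max(score, 0.9)  # assumption: an email near-match always qualifies
        if score >= threshold:
            scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(row, score=round(score, 3)) for score, row in scored[:limit]]
```

With a row named "Acme Capital LLC", the query "Acme Capital" scores 1.0 after suffix stripping, and "Charlie Smith" against "Charles Smith" clears the 0.62 cutoff on character similarity alone. The sketch does not separate out exact matches, which the endpoint reports as `match` rather than as a candidate.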
**Post-deploy enhancement — conversational (LLM-mediated) edits (Grant, 2026-06-17).** Today an in-thread correction uses a rigid grammar (`edit field=value`). Let a free-form reply that isn't `yes`/`no`/a literal `edit …` be treated as a natural-language revision instruction: send {current proposal + the instruction} back through local Qwen (`spark.py`, the same parse leg — no Claude, no scrub) and re-render the revised proposal card for approval (e.g. "add that we met on June 14" → updated Note). Keeps the draft→human-approve gate (the human still confirms the LLM's revision) and subsumes `edit field=value` as a deterministic fast path. Thread the instruction text into `normalize`'s source so the email-integrity rule still holds (a revised email must appear in the original message or the instruction). Pairs naturally with the fuzzy-match item above — build both as one conversational-UX pass after the smoke. (Parsing of free-form *intake* messages already works today via the Qwen parse leg; this item is specifically about the *edit/refine* turn.)
**Post-deploy enhancement — conversational (LLM-mediated) edits (Grant, 2026-06-17). BUILT 2026-06-17 (bot-side, ships on the Spark), not yet deployed / live-smoked.** Today an in-thread correction uses a rigid grammar (`edit field=value`). Let a free-form reply that isn't `yes`/`no`/a literal `edit …` be treated as a natural-language revision instruction: send {current proposal + the instruction} back through local Qwen (`spark.py`, the same parse leg — no Claude, no scrub) and re-render the revised proposal card for approval (e.g. "add that we met on June 14" → updated Note). Keeps the draft→human-approve gate (the human still confirms the LLM's revision) and subsumes `edit field=value` as a deterministic fast path. Thread the instruction text into `normalize`'s source so the email-integrity rule still holds (a revised email must appear in the original message or the instruction). Pairs naturally with the fuzzy-match item above — build both as one conversational-UX pass after the smoke. (Parsing of free-form *intake* messages already works today via the Qwen parse leg; this item is specifically about the *edit/refine* turn.)
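The reply routing this item describes can be sketched roughly as follows. This is an illustration only: `classify_reply`, its return shape, and the exact regex for `edit field=value` are hypothetical, and the real routing lives in the bot's approval-stage handling.

```python
import re

# Assumed shape of the rigid grammar: "edit <field>=<value>".
EDIT_RE = re.compile(r"^edit\s+(\w+)\s*=\s*(.+)$", re.IGNORECASE | re.DOTALL)


def classify_reply(text):
    """Map an in-thread reply to (kind, payload).

    yes / no / edit field=value keep their deterministic meaning; anything
    else is treated as a natural-language revision instruction for the LLM.
    """
    stripped = text.strip()
    lowered = stripped.lower()
    if lowered == "yes":
        return ("approve", None)
    if lowered == "no":
        return ("reject", None)
    match = EDIT_RE.match(stripped)
    if match:
        # Deterministic fast path: no model call needed.
        return ("edit", (match.group(1).lower(), match.group(2).strip()))
    return ("revise", stripped)
```

Keeping `edit field=value` ahead of the fallback means a precise correction never depends on the model, while a reply such as "add that we met on June 14" falls through to the revise leg and still has to be approved by the human.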
- **As built:** `parse.revise` + `_apply_revision` (offline-testable; the approval-stage `else` branch in `bot.py` routes any non-yes/no/edit reply here). `parse_message` now stashes `_source_text` so revise can re-check email integrity against {instruction + original}; the model's email field is never trusted. No-op revisions are caught via `proposals.same_fields` (re-prompt, not a false "Updated"). **Known v1 limit:** revise edits fields but does not re-run the matcher on a mid-thread firm rename. Tests: `matrix_intake/test_parse.py` (revise merge + email integrity + match-id preservation).
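A rough sketch of the email-integrity rule and the no-op check, assuming a flat proposal dict whose internal keys start with an underscore (as `_stage` and `_source_text` do). The function signature, the `_match_id` key used in the usage note, and the email regex are hypothetical; the actual `_apply_revision` and `proposals.same_fields` may differ.

```python
import re

# Deliberately simple address pattern; an assumption, not the bot's own regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_revision(current, model_fields, instruction):
    """Merge the model's revised fields into the proposal.

    The model's own `email` value is ignored entirely: the address changes
    only if one literally appears in the human's instruction text.
    """
    revised = dict(current)
    for key, value in model_fields.items():
        if key == "email" or key.startswith("_"):
            continue  # never trust the model's email; keep internal keys intact
        revised[key] = value
    found = EMAIL_RE.findall(instruction or "")
    if found:
        revised["email"] = found[0].lower()
    return revised


def same_fields(a, b):
    """True when two proposals agree on every non-internal field."""
    def public(d):
        return {k: v for k, v in d.items() if not k.startswith("_")}
    return public(a) == public(b)
```

For example, revising `{"email": "cs@acme.com", "_match_id": 7}` with the instruction "add that we met on June 14" keeps the address and the match id even if the model returns a different `email`; only an instruction that itself contains an address can change it. When `same_fields` reports no difference, the bot can re-prompt instead of announcing an update.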
|
||||||
|
|
||||||
### Scoped service-credential auth path for automated CRM writers
|
### Scoped service-credential auth path for automated CRM writers
|
||||||
*Surfaced 2026-06-17 while deploying the Matrix intake bot. **Decision: defer — the bot uses a dedicated member username/password for now.** The CRM has no API-key/service-token path; its only auth is username+password → JWT. A dedicated **member** login is appropriately scoped against what matters operationally (no admin: can't manage users, reset data, or change settings) and unblocks the live smoke today.*
|
*Surfaced 2026-06-17 while deploying the Matrix intake bot. **Decision: defer — the bot uses a dedicated member username/password for now.** The CRM has no API-key/service-token path; its only auth is username+password → JWT. A dedicated **member** login is appropriately scoped against what matters operationally (no admin: can't manage users, reset data, or change settings) and unblocks the live smoke today.*
|
||||||
|
|||||||
@@ -46,30 +46,49 @@ async def main():
|
|||||||
try:
|
try:
|
||||||
proposal = await asyncio.to_thread(parse.parse_message, text)
|
proposal = await asyncio.to_thread(parse.parse_message, text)
|
||||||
except Exception as exc: # Spark/Qwen unreachable or bad response
|
except Exception as exc: # Spark/Qwen unreachable or bad response
|
||||||
await say(room_id, f"⚠️ couldn't reach the local parser: {exc}", root)
|
await say(room_id, f"⚠️ couldn't reach the local parser: {str(exc)[:200]}", root)
|
||||||
return
|
return
|
||||||
if proposal["intent"] == "unclear":
|
if proposal["intent"] == "unclear":
|
||||||
await say(room_id, UNCLEAR_HELP, root)
|
await say(room_id, UNCLEAR_HELP, root)
|
||||||
return
|
return
|
||||||
# Confirm new-vs-existing against the CRM matcher (read-only). Degrade gracefully if
|
# Resolve new-vs-existing against the CRM matcher (read-only). Degrade gracefully if the
|
||||||
# the CRM is unreachable — still propose, just without the "looks like existing" hint.
|
# CRM is unreachable — still propose as new, just without match/candidate hints.
|
||||||
hint = ""
|
match, candidates = None, []
|
||||||
try:
|
try:
|
||||||
match = await asyncio.to_thread(crm_client.match, proposal)
|
res = await asyncio.to_thread(crm_client.match, proposal)
|
||||||
if match:
|
match = res.get("match")
|
||||||
proposal["intent"] = "meeting_note"
|
candidates = res.get("candidates") or []
|
||||||
proposal["_match_id"] = match["id"]
|
|
||||||
hint = f"\n\n🔎 Looks like an existing investor: **{match['name']}** — this will append a note to them."
|
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
if match:
|
||||||
|
# Confident exact match → auto-attach the note to that investor (no disambiguation).
|
||||||
|
proposal["intent"] = "meeting_note"
|
||||||
|
proposal["_match_id"] = match["id"]
|
||||||
|
proposal["_stage"] = "approval"
|
||||||
store.put(root, proposal)
|
store.put(root, proposal)
|
||||||
|
hint = (f"\n\n🔎 Looks like an existing investor: **{match['name']}** — "
|
||||||
|
"this will append a note to them.")
|
||||||
await say(room_id, proposals.render(proposal) + hint, root)
|
await say(room_id, proposals.render(proposal) + hint, root)
|
||||||
|
await nudge(room_id, proposals.summary_line(proposal), root)
|
||||||
|
return
|
||||||
|
if candidates:
|
||||||
|
# No exact match but near-misses exist → make the human pick one or confirm "new",
|
||||||
|
# so a typo'd/near-duplicate name can't silently create a second investor.
|
||||||
|
proposal["_stage"] = "disambiguate"
|
||||||
|
proposal["_candidates"] = candidates
|
||||||
|
store.put(root, proposal)
|
||||||
|
await say(room_id, proposals.render_disambiguation(proposal), root)
|
||||||
|
await nudge(room_id, proposals.disambiguation_nudge(proposal), root)
|
||||||
|
return
|
||||||
|
# Genuinely new — straight to the new-investor approval card.
|
||||||
|
proposal["_stage"] = "approval"
|
||||||
|
store.put(root, proposal)
|
||||||
|
await say(room_id, proposals.render(proposal), root)
|
||||||
# Also drop a brief, un-threaded reply in the main timeline so the proposal isn't
|
# Also drop a brief, un-threaded reply in the main timeline so the proposal isn't
|
||||||
# easy to miss inside a thread (the full card + yes/edit/no stay in the thread).
|
# easy to miss inside a thread (the full card + yes/edit/no stay in the thread).
|
||||||
await nudge(room_id, proposals.summary_line(proposal), root)
|
await nudge(room_id, proposals.summary_line(proposal), root)
|
||||||
|
|
||||||
async def handle_reply(room_id, root, text):
|
async def handle_reply(room_id, root, text):
|
||||||
action, payload = proposals.interpret_reply(text)
|
|
||||||
# Claim the proposal synchronously — BEFORE any await — so a second reply that
|
# Claim the proposal synchronously — BEFORE any await — so a second reply that
|
||||||
# arrives while a commit is in flight can't double-process it. asyncio is
|
# arrives while a commit is in flight can't double-process it. asyncio is
|
||||||
# cooperative: nothing else runs between here and the first await below, so the
|
# cooperative: nothing else runs between here and the first await below, so the
|
||||||
@@ -77,6 +96,11 @@ async def main():
|
|||||||
proposal = store.pop(root)
|
proposal = store.pop(root)
|
||||||
if proposal is None:
|
if proposal is None:
|
||||||
return
|
return
|
||||||
|
if proposal.get("_stage") == "disambiguate":
|
||||||
|
await handle_disambiguation(room_id, root, text, proposal)
|
||||||
|
return
|
||||||
|
|
||||||
|
action, payload = proposals.interpret_reply(text)
|
||||||
if action == "approve":
|
if action == "approve":
|
||||||
try:
|
try:
|
||||||
summary = await asyncio.to_thread(crm_client.commit, proposal)
|
summary = await asyncio.to_thread(crm_client.commit, proposal)
|
||||||
@@ -92,9 +116,43 @@ async def main():
|
|||||||
proposal = proposals.apply_edit(proposal, field, value)
|
proposal = proposals.apply_edit(proposal, field, value)
|
||||||
store.put(root, proposal) # keep it pending (edited) for the next reply
|
store.put(root, proposal) # keep it pending (edited) for the next reply
|
||||||
await say(room_id, "✏️ Updated:\n\n" + proposals.render(proposal), root)
|
await say(room_id, "✏️ Updated:\n\n" + proposals.render(proposal), root)
|
||||||
else: # unrecognized reply — leave the proposal pending
|
else:
|
||||||
|
# Not yes/no/edit-grammar → treat it as a natural-language revision instruction and
|
||||||
|
# re-run it through local Qwen (no Claude, no scrub). The human still approves the
|
||||||
|
# revised card, so the draft→approve gate holds.
|
||||||
|
try:
|
||||||
|
revised = await asyncio.to_thread(parse.revise, proposal, text)
|
||||||
|
except Exception as exc:
|
||||||
store.put(root, proposal)
|
store.put(root, proposal)
|
||||||
await say(room_id, "Reply **yes** to commit, **edit field=value**, or **no**.", root)
|
await say(room_id, f"⚠️ couldn't apply that change ({str(exc)[:200]}).\n\nReply **yes** "
|
||||||
|
"to commit, **no** to discard, **edit field=value**, or rephrase.", root)
|
||||||
|
return
|
||||||
|
if proposals.same_fields(proposal, revised):
|
||||||
|
store.put(root, proposal)
|
||||||
|
await say(room_id, "I didn't catch a change there. Reply **yes** to commit, **no** "
|
||||||
|
"to discard, **edit field=value**, or tell me what to change.", root)
|
||||||
|
return
|
||||||
|
store.put(root, revised)
|
||||||
|
await say(room_id, "✏️ Updated:\n\n" + proposals.render(revised), root)
|
||||||
|
|
||||||
|
async def handle_disambiguation(room_id, root, text, proposal):
|
||||||
|
cands = proposal.get("_candidates") or []
|
||||||
|
action, payload = proposals.interpret_disambiguation(text, len(cands))
|
||||||
|
if action == "pick":
|
||||||
|
updated = proposals.attach_to_candidate(proposal, cands[payload])
|
||||||
|
store.put(root, updated)
|
||||||
|
await say(room_id, "✏️ Will log against the existing investor:\n\n"
|
||||||
|
+ proposals.render(updated), root)
|
||||||
|
elif action == "new":
|
||||||
|
updated = proposals.promote_to_new(proposal)
|
||||||
|
store.put(root, updated)
|
||||||
|
await say(room_id, "➕ OK — adding as a new investor:\n\n"
|
||||||
|
+ proposals.render(updated), root)
|
||||||
|
elif action == "reject":
|
||||||
|
await say(room_id, "🗑️ Discarded — nothing written.", root)
|
||||||
|
else: # unrecognized — re-show the shortlist
|
||||||
|
store.put(root, proposal)
|
||||||
|
await say(room_id, "I didn't catch that.\n\n" + proposals.render_disambiguation(proposal), root)
|
||||||
|
|
||||||
async def on_message(room: MatrixRoom, event: RoomMessageText):
|
async def on_message(room: MatrixRoom, event: RoomMessageText):
|
||||||
if event.sender == mx["user_id"]:
|
if event.sender == mx["user_id"]:
|
||||||
|
|||||||
@@ -70,19 +70,32 @@ def _authed(method, path, body=None):
|
|||||||
|
|
||||||
|
|
||||||
def match(proposal):
|
def match(proposal):
|
||||||
"""Return {'id', 'name'} for an existing investor matching this proposal, else None."""
|
"""Resolve new-vs-existing for this proposal against the CRM matcher.
|
||||||
|
|
||||||
|
Returns {'match': {...}|None, 'candidates': [...]}:
|
||||||
|
- `match` is a confident EXACT existing investor — {'id', 'name'} — that the bot
|
||||||
|
auto-attaches a note to (no human disambiguation needed).
|
||||||
|
- `candidates` is a ranked list of fuzzy NEAR-matches — each {'id', 'name', 'score',
|
||||||
|
'matched_on'} — surfaced in-thread for the human to pick from (or confirm "new")
|
||||||
|
when there is no exact match, so a typo'd/near-duplicate name doesn't silently
|
||||||
|
create a second investor."""
|
||||||
q = proposal.get("investor_name") or proposal.get("contact_name") or ""
|
q = proposal.get("investor_name") or proposal.get("contact_name") or ""
|
||||||
email = proposal.get("contact_email") or ""
|
email = proposal.get("contact_email") or ""
|
||||||
if not q and not email:
|
if not q and not email:
|
||||||
return None
|
return {"match": None, "candidates": []}
|
||||||
qs = urlencode({"q": q, "email": email})
|
qs = urlencode({"q": q, "email": email})
|
||||||
status, data = _authed("GET", f"/api/intake/match?{qs}")
|
status, data = _authed("GET", f"/api/intake/match?{qs}")
|
||||||
if status != 200:
|
if status != 200:
|
||||||
raise RuntimeError(f"intake match failed ({status}): {data.get('error') or data}")
|
raise RuntimeError(f"intake match failed ({status}): {data.get('error') or data}")
|
||||||
m = (data.get("data") or {}).get("match")
|
payload = data.get("data") or {}
|
||||||
if not m:
|
m = payload.get("match")
|
||||||
return None
|
match_out = {"id": m["id"], "name": m.get("investor_name") or q} if m else None
|
||||||
return {"id": m["id"], "name": m.get("investor_name") or q}
|
candidates = [
|
||||||
|
{"id": c["id"], "name": c.get("investor_name") or "?",
|
||||||
|
"score": c.get("score"), "matched_on": c.get("matched_on")}
|
||||||
|
for c in (payload.get("candidates") or []) if c.get("id")
|
||||||
|
]
|
||||||
|
return {"match": match_out, "candidates": candidates}
|
||||||
|
|
||||||
|
|
||||||
def build_commit_payload(proposal):
|
def build_commit_payload(proposal):
|
||||||
|
|||||||
@@ -2,7 +2,13 @@
|
|||||||
|
|
||||||
The model only EXTRACTS structure; it never decides to write anything. New-vs-existing is
|
The model only EXTRACTS structure; it never decides to write anything. New-vs-existing is
|
||||||
finalized in M2 against the CRM matcher — here `intent` is the model's first read.
|
finalized in M2 against the CRM matcher — here `intent` is the model's first read.
|
||||||
|
|
||||||
|
`revise()` is the conversational-edit leg: a free-form correction the human types in the
|
||||||
|
proposal thread (e.g. "add that we met June 14") is applied to the pending proposal via the
|
||||||
|
same local Qwen — no Claude, no scrub. Email integrity is preserved: a changed address must
|
||||||
|
literally appear in the instruction (or the original message); the model can never mint one.
|
||||||
"""
|
"""
|
||||||
|
import json
|
||||||
import re
|
import re
|
||||||
|
|
||||||
import spark
|
import spark
|
||||||
@@ -60,4 +66,54 @@ def parse_message(text, parse_fn=spark.parse_json):
|
|||||||
"""Parse one intake message. `parse_fn` is injectable for tests (defaults to Spark/Qwen).
|
"""Parse one intake message. `parse_fn` is injectable for tests (defaults to Spark/Qwen).
|
||||||
Returns a normalized proposal dict. On a model/transport failure, raises (caller decides)."""
|
Returns a normalized proposal dict. On a model/transport failure, raises (caller decides)."""
|
||||||
raw = parse_fn(text, system=SYSTEM, max_tokens=400)
|
raw = parse_fn(text, system=SYSTEM, max_tokens=400)
|
||||||
return normalize(raw, source_text=text)
|
proposal = normalize(raw, source_text=text)
|
||||||
|
# Stash the original message so a later revise() can re-check email integrity against it.
|
||||||
|
proposal["_source_text"] = text
|
||||||
|
return proposal
|
||||||
|
|
||||||
|
|
||||||
|
REVISE_SYSTEM = (
|
||||||
|
"You revise a structured investor-intake proposal from a short correction a venture-fund "
|
||||||
|
"team member typed. You are given the CURRENT proposal as JSON and an INSTRUCTION. Apply "
|
||||||
|
"the instruction and reply with ONLY the full revised JSON object, these keys:\n"
|
||||||
|
' "investor_name", "contact_name", "contact_email", "contact_title", "note".\n'
|
||||||
|
"Change ONLY what the instruction asks; copy every other field through unchanged. Use null "
|
||||||
|
"for a field the instruction clears or that is genuinely absent. Never invent an email "
|
||||||
|
"address. Output JSON only."
|
||||||
|
)
|
||||||
|
|
||||||
|
_REVISABLE = ("investor_name", "contact_name", "contact_title", "note")
|
||||||
|
|
||||||
|
|
||||||
|
def _apply_revision(proposal, model_out, instruction):
|
||||||
|
"""Merge the model's revised fields onto the proposal. Pure + offline-testable.
|
||||||
|
|
||||||
|
Preserves control keys (_match_id / _stage / intent / _source_text). Enforces email
|
||||||
|
integrity: a revised address is taken only if it literally appears in the INSTRUCTION the
|
||||||
|
human typed; otherwise the existing (already integrity-checked) address is kept. The model's
|
||||||
|
own email field is never trusted — it must not mint an address."""
|
||||||
|
model_out = model_out or {}
|
||||||
|
out = dict(proposal)
|
||||||
|
for k in _REVISABLE:
|
||||||
|
if k in model_out:
|
||||||
|
out[k] = _clean(model_out.get(k))
|
||||||
|
m = _EMAIL_RE.search(instruction or "")
|
||||||
|
if m:
|
||||||
|
out["contact_email"] = m.group(0).rstrip(".,;:!?)]}>\"'")
|
||||||
|
# else: keep the proposal's current contact_email (already copied into `out` by dict(proposal);
# it is not in _REVISABLE, so the model's value never reaches it)
|
||||||
|
# Don't let a revision strip the proposal down to nothing actionable.
|
||||||
|
if not out.get("investor_name") and not out.get("contact_name"):
|
||||||
|
out["investor_name"] = proposal.get("investor_name")
|
||||||
|
out["contact_name"] = proposal.get("contact_name")
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def revise(proposal, instruction, parse_fn=spark.parse_json):
|
||||||
|
"""Apply a natural-language correction to a pending proposal via local Qwen; return the
|
||||||
|
revised proposal dict. `parse_fn` is injectable for tests (defaults to Spark/Qwen)."""
|
||||||
|
current = {k: proposal.get(k) for k in
|
||||||
|
("investor_name", "contact_name", "contact_email", "contact_title", "note")}
|
||||||
|
prompt = ("CURRENT:\n" + json.dumps(current, ensure_ascii=False)
|
||||||
|
+ "\n\nINSTRUCTION:\n" + (instruction or "").strip())
|
||||||
|
raw = parse_fn(prompt, system=REVISE_SYSTEM, max_tokens=400)
|
||||||
|
return _apply_revision(proposal, raw, instruction)
|
||||||
|
|||||||
@@ -5,7 +5,12 @@ Matrix thread root (the bot's proposal lives in a thread rooted at the user's me
|
|||||||
the user replies inside that thread). In-memory and ephemeral by design — a restart drops
|
the user replies inside that thread). In-memory and ephemeral by design — a restart drops
|
||||||
pending proposals (the user just re-sends), matching matrix-bridge's stateless-by-default
|
pending proposals (the user just re-sends), matching matrix-bridge's stateless-by-default
|
||||||
ethos. Nothing here writes to the CRM; the bot calls the CRM client only after `approve`.
|
ethos. Nothing here writes to the CRM; the bot calls the CRM client only after `approve`.
|
||||||
|
|
||||||
|
A proposal carries a `_stage`: "approval" (the normal yes/edit/no card) or "disambiguate"
|
||||||
|
(a fuzzy-match shortlist the human must resolve — pick a number / "new" / "no" — before it
|
||||||
|
becomes an approval-stage proposal). The shortlist itself rides on `_candidates`.
|
||||||
"""
|
"""
|
||||||
|
import re
|
||||||
|
|
||||||
# field aliases accepted in `edit <field>=<value>`
|
# field aliases accepted in `edit <field>=<value>`
|
||||||
_EDIT_ALIASES = {
|
_EDIT_ALIASES = {
|
||||||
@@ -18,6 +23,10 @@ _EDIT_ALIASES = {
|
|||||||
|
|
||||||
_YES = {"yes", "y", "approve", "approved", "ok", "confirm", "go", "👍", "✅"}
|
_YES = {"yes", "y", "approve", "approved", "ok", "confirm", "go", "👍", "✅"}
|
||||||
_NO = {"no", "n", "cancel", "discard", "reject", "stop", "👎", "❌"}
|
_NO = {"no", "n", "cancel", "discard", "reject", "stop", "👎", "❌"}
|
||||||
|
# "create a new investor anyway" replies to a disambiguation shortlist
|
||||||
|
_NEW = {"new", "none", "new investor", "none of these", "create", "create new", "add new", "neither"}
|
||||||
|
|
||||||
|
_CONTENT_FIELDS = ("intent", "investor_name", "contact_name", "contact_email", "contact_title", "note")
|
||||||
|
|
||||||
|
|
||||||
class ProposalStore:
|
class ProposalStore:
|
||||||
@@ -84,6 +93,75 @@ def apply_edit(proposal, field, value):
|
|||||||
return updated
|
return updated
|
||||||
|
|
||||||
|
|
||||||
|
def same_fields(a, b):
|
||||||
|
"""True if two proposals carry identical content (used to detect a no-op NL revision so we
|
||||||
|
don't tell the human 'Updated' when nothing changed)."""
|
||||||
|
return all((a or {}).get(k) == (b or {}).get(k) for k in _CONTENT_FIELDS)
|
||||||
|
|
||||||
|
|
||||||
|
def interpret_disambiguation(text, n_candidates):
|
||||||
|
"""Classify a reply to a fuzzy-match shortlist.
|
||||||
|
|
||||||
|
Returns ("pick", index) | ("new", None) | ("reject", None) | ("unknown", None). A bare
|
||||||
|
number selects that candidate; "new"/"none" creates a new investor; "no"/"cancel" discards."""
|
||||||
|
t = (text or "").strip().lower()
|
||||||
|
if not t:
|
||||||
|
return ("unknown", None)
|
||||||
|
if t in _NO:
|
||||||
|
return ("reject", None)
|
||||||
|
if t in _NEW:
|
||||||
|
return ("new", None)
|
||||||
|
m = re.fullmatch(r"#?\s*(\d{1,2})", t)
|
||||||
|
if m:
|
||||||
|
idx = int(m.group(1)) - 1
|
||||||
|
if 0 <= idx < n_candidates:
|
||||||
|
return ("pick", idx)
|
||||||
|
return ("unknown", None)
|
||||||
|
|
||||||
|
|
||||||
|
def attach_to_candidate(proposal, candidate):
|
||||||
|
"""Promote a disambiguation pick into an approval-stage meeting note on the chosen investor.
|
||||||
|
The note will target that existing grid row (via _match_id); investor_name is set to the
|
||||||
|
candidate's canonical name so the approval card shows the existing investor. Drops the shortlist."""
|
||||||
|
updated = dict(proposal)
|
||||||
|
updated.pop("_candidates", None)
|
||||||
|
updated["_stage"] = "approval"
|
||||||
|
updated["_match_id"] = candidate["id"]
|
||||||
|
updated["intent"] = "meeting_note"
|
||||||
|
if candidate.get("name"):
|
||||||
|
updated["investor_name"] = candidate["name"]
|
||||||
|
return updated
|
||||||
|
|
||||||
|
|
||||||
|
def promote_to_new(proposal):
|
||||||
|
"""Disambiguation 'new' — discard the shortlist and proceed as a new-investor proposal."""
|
||||||
|
updated = dict(proposal)
|
||||||
|
updated.pop("_candidates", None)
|
||||||
|
updated.pop("_match_id", None)
|
||||||
|
updated["_stage"] = "approval"
|
||||||
|
return updated
|
||||||
|
|
||||||
|
|
||||||
|
def render_disambiguation(proposal):
|
||||||
|
"""Render the fuzzy-match shortlist a human resolves before we create a new investor."""
|
||||||
|
name = proposal.get("investor_name") or proposal.get("contact_name") or "?"
|
||||||
|
cands = proposal.get("_candidates") or []
|
||||||
|
lines = [f"🔎 Before adding **{name}** as new — these existing investors look similar:"]
|
||||||
|
for i, c in enumerate(cands, 1):
|
||||||
|
lines.append(f" **{i}.** {c.get('name') or '?'}")
|
||||||
|
lines.append("")
|
||||||
|
lines.append("Reply a **number** to log this against that investor, **new** to add it as a "
|
||||||
|
"new investor, or **no** to discard.")
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def disambiguation_nudge(proposal):
|
||||||
|
"""Brief main-timeline pointer for a disambiguation proposal (the shortlist is in the thread)."""
|
||||||
|
name = proposal.get("investor_name") or proposal.get("contact_name") or "?"
|
||||||
|
return (f"🔎 **{name}** may match an existing investor — open the **thread** to pick one "
|
||||||
|
"or confirm it's new.")
|
||||||
|
|
||||||
|
|
||||||
def render(proposal):
|
def render(proposal):
|
||||||
"""Render a proposal as the in-thread message a human approves."""
|
"""Render a proposal as the in-thread message a human approves."""
|
||||||
if proposal.get("intent") == "meeting_note":
|
if proposal.get("intent") == "meeting_note":
|
||||||
|
|||||||
@@ -58,6 +58,61 @@ def test_subject_blank_when_note_present_else_provenance_label():
|
|||||||
assert no_note["subject"] == "Intake (Matrix)"
|
assert no_note["subject"] == "Intake (Matrix)"
|
||||||
|
|
||||||
|
|
||||||
|
def _with_stub_authed(reply, capture=None):
|
||||||
|
"""Swap crm_client._authed for a canned (status, data); return a restorer."""
|
||||||
|
orig = crm_client._authed
|
||||||
|
|
||||||
|
def fake(method, path, body=None):
|
||||||
|
if capture is not None:
|
||||||
|
capture["path"] = path
|
||||||
|
return reply
|
||||||
|
|
||||||
|
crm_client._authed = fake
|
||||||
|
return orig
|
||||||
|
|
||||||
|
|
||||||
|
def test_match_parses_exact_match():
|
||||||
|
cap = {}
|
||||||
|
orig = _with_stub_authed((200, {"data": {
|
||||||
|
"match": {"id": "rowAcme", "investor_name": "Acme Capital", "matched_on": "name"},
|
||||||
|
"candidates": [],
|
||||||
|
}}), cap)
|
||||||
|
try:
|
||||||
|
res = crm_client.match({"investor_name": "Acme Capital", "contact_email": ""})
|
||||||
|
finally:
|
||||||
|
crm_client._authed = orig
|
||||||
|
assert res["match"] == {"id": "rowAcme", "name": "Acme Capital"}
|
||||||
|
assert res["candidates"] == []
|
||||||
|
assert "q=Acme" in cap["path"] # the query was forwarded
|
||||||
|
|
||||||
|
|
||||||
|
def test_match_returns_ranked_candidates_when_no_exact():
|
||||||
|
orig = _with_stub_authed((200, {"data": {"match": None, "candidates": [
|
||||||
|
{"id": "rowCharlie", "investor_name": "Charlie Brown", "score": 0.92, "matched_on": "name"},
|
||||||
|
{"id": "rowBeta", "investor_name": "Beta Capital LLC", "score": 0.86, "matched_on": "name"},
|
||||||
|
]}}))
|
||||||
|
try:
|
||||||
|
res = crm_client.match({"investor_name": "Charles Brown"})
|
||||||
|
finally:
|
||||||
|
crm_client._authed = orig
|
||||||
|
assert res["match"] is None
|
||||||
|
assert [c["id"] for c in res["candidates"]] == ["rowCharlie", "rowBeta"]
|
||||||
|
assert res["candidates"][0]["name"] == "Charlie Brown"
|
||||||
|
assert res["candidates"][0]["matched_on"] == "name"
|
||||||
|
|
||||||
|
|
||||||
|
def test_match_no_query_skips_network():
|
||||||
|
def boom(*a, **k):
|
||||||
|
raise AssertionError("should not hit the network when there's nothing to match on")
|
||||||
|
orig = crm_client._authed
|
||||||
|
crm_client._authed = boom
|
||||||
|
try:
|
||||||
|
res = crm_client.match({"investor_name": None, "contact_name": None, "contact_email": None})
|
||||||
|
finally:
|
||||||
|
crm_client._authed = orig
|
||||||
|
assert res == {"match": None, "candidates": []}
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
fns = [v for k, v in sorted(globals().items()) if k.startswith("test_") and callable(v)]
|
fns = [v for k, v in sorted(globals().items()) if k.startswith("test_") and callable(v)]
|
||||||
for fn in fns:
|
for fn in fns:
|
||||||
|
|||||||
@@ -102,6 +102,65 @@ def test_none_model_reply_is_unclear():
|
|||||||
assert p["intent"] == "unclear"
|
assert p["intent"] == "unclear"
|
||||||
|
|
||||||
|
|
||||||
|
def test_parse_message_stashes_source_text():
|
||||||
|
p = parse.parse_message("Acme Capital, Jane jane@acme.com",
|
||||||
|
parse_fn=_stub({"intent": "new_investor", "investor_name": "Acme Capital",
|
||||||
|
"contact_name": "Jane", "contact_email": "jane@acme.com"}))
|
||||||
|
assert p["_source_text"] == "Acme Capital, Jane jane@acme.com"
|
||||||
|
|
||||||
|
|
||||||
|
def test_revise_applies_note_change_and_preserves_control_keys():
|
||||||
|
proposal = parse.parse_message(
|
||||||
|
"New investor Acme Capital, Jane Doe jane@acme.com",
|
||||||
|
parse_fn=_stub({"intent": "new_investor", "investor_name": "Acme Capital",
|
||||||
|
"contact_name": "Jane Doe", "contact_email": "jane@acme.com",
|
||||||
|
"contact_title": None, "note": None}))
|
||||||
|
revised = parse.revise(
|
||||||
|
proposal, "add that we met on June 14",
|
||||||
|
parse_fn=_stub({"investor_name": "Acme Capital", "contact_name": "Jane Doe",
|
||||||
|
"contact_email": "jane@acme.com", "contact_title": None,
|
||||||
|
"note": "met on June 14"}))
|
||||||
|
assert revised["note"] == "met on June 14"
|
||||||
|
assert revised["investor_name"] == "Acme Capital"
|
||||||
|
assert revised["intent"] == "new_investor" # control key preserved
|
||||||
|
assert revised["_source_text"] == proposal["_source_text"] # preserved for email integrity
|
||||||
|
|
||||||
|
|
||||||
|
def test_revise_email_taken_only_from_instruction():
|
||||||
|
proposal = {"intent": "new_investor", "investor_name": "Acme", "contact_name": "Jane",
|
||||||
|
"contact_email": "jane@acme.com", "contact_title": None, "note": None,
|
||||||
|
"_source_text": "Acme, Jane jane@acme.com"}
|
||||||
|
# instruction literally carries the new address → accepted
|
||||||
|
r1 = parse.revise(proposal, "her email is jane@newfirm.com",
|
||||||
|
parse_fn=_stub({"contact_email": "jane@newfirm.com"}))
|
||||||
|
assert r1["contact_email"] == "jane@newfirm.com"
|
||||||
|
# model tries to change the email but the instruction has no address → keep the existing one
|
||||||
|
r2 = parse.revise(proposal, "set her title to GP",
|
||||||
|
parse_fn=_stub({"contact_email": "totally@madeup.test", "contact_title": "GP"}))
|
||||||
|
assert r2["contact_email"] == "jane@acme.com" # model's email ignored (not in instruction)
|
||||||
|
assert r2["contact_title"] == "GP"
|
||||||
|
|
||||||
|
|
||||||
|
def test_revise_preserves_match_id():
|
||||||
|
proposal = {"intent": "meeting_note", "investor_name": "Acme", "contact_name": None,
|
||||||
|
"contact_email": None, "contact_title": None, "note": "old",
|
||||||
|
"_match_id": "rowAcme", "_stage": "approval", "_source_text": "note for Acme: old"}
|
||||||
|
revised = parse.revise(proposal, "change the note to: sent the deck",
|
||||||
|
parse_fn=_stub({"note": "sent the deck"}))
|
||||||
|
assert revised["note"] == "sent the deck"
|
||||||
|
assert revised["_match_id"] == "rowAcme"
|
||||||
|
assert revised["intent"] == "meeting_note"
|
||||||
|
|
||||||
|
|
||||||
|
def test_revise_cannot_empty_the_proposal():
|
||||||
|
proposal = {"intent": "new_investor", "investor_name": "Acme", "contact_name": "Jane",
|
||||||
|
"contact_email": None, "contact_title": None, "note": "x", "_source_text": "Acme Jane"}
|
||||||
|
revised = parse.revise(proposal, "clear it",
|
||||||
|
parse_fn=_stub({"investor_name": None, "contact_name": None,
|
||||||
|
"contact_title": None, "note": None}))
|
||||||
|
assert revised["investor_name"] == "Acme" and revised["contact_name"] == "Jane"
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
fns = [v for k, v in sorted(globals().items()) if k.startswith("test_") and callable(v)]
|
fns = [v for k, v in sorted(globals().items()) if k.startswith("test_") and callable(v)]
|
||||||
for fn in fns:
|
for fn in fns:
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
|
|||||||
"""Tests for the proposal store + approval state machine (pure logic, no network)."""
|
"""Tests for the proposal store + approval state machine (pure logic, no network)."""
|
||||||
|
import copy
|
||||||
import os
|
import os
|
||||||
import sys
|
import sys
|
||||||
|
|
||||||
@@ -105,6 +106,79 @@ def test_summary_line_new_vs_note():
|
|||||||
assert "thread" in new_line.lower()
|
assert "thread" in new_line.lower()
|
||||||
|
|
||||||
|
|
||||||
|
# --- fuzzy-match disambiguation + conversational-revision helpers ---
|
||||||
|
|
||||||
|
DISAMBIG = {"intent": "new_investor", "investor_name": "Charles Brown",
|
||||||
|
"contact_name": "Charles Brown", "contact_email": None, "contact_title": None,
|
||||||
|
"note": "met at conf", "_stage": "disambiguate",
|
||||||
|
"_candidates": [{"id": "rowCharlie", "name": "Charlie Brown", "score": 0.92, "matched_on": "name"},
|
||||||
|
{"id": "rowBeta", "name": "Beta Capital LLC", "score": 0.7, "matched_on": "name"}]}
|
||||||
|
|
||||||
|
|
||||||
|
def test_interpret_disambiguation_pick_number():
|
||||||
|
assert proposals.interpret_disambiguation("1", 2) == ("pick", 0)
|
||||||
|
assert proposals.interpret_disambiguation(" 2 ", 2) == ("pick", 1)
|
||||||
|
assert proposals.interpret_disambiguation("#1", 2) == ("pick", 0)
|
||||||
|
|
||||||
|
|
||||||
|
def test_interpret_disambiguation_out_of_range_is_unknown():
|
||||||
|
assert proposals.interpret_disambiguation("3", 2)[0] == "unknown"
|
||||||
|
assert proposals.interpret_disambiguation("0", 2)[0] == "unknown"
|
||||||
|
|
||||||
|
|
||||||
|
def test_interpret_disambiguation_new_and_no():
    assert proposals.interpret_disambiguation("new", 2)[0] == "new"
    assert proposals.interpret_disambiguation("none of these", 2)[0] == "new"
    assert proposals.interpret_disambiguation("no", 2)[0] == "reject"


def test_interpret_disambiguation_freeform_is_unknown():
    # a free-form reply in the shortlist stage isn't guessed at — re-prompt instead
    assert proposals.interpret_disambiguation("the first one", 2)[0] == "unknown"


def test_attach_to_candidate_promotes_to_meeting_note():
    out = proposals.attach_to_candidate(DISAMBIG, DISAMBIG["_candidates"][0])
    assert out["_match_id"] == "rowCharlie"
    assert out["intent"] == "meeting_note"
    assert out["_stage"] == "approval"
    assert out["investor_name"] == "Charlie Brown"  # canonical existing name shown
    assert "_candidates" not in out
    assert "_candidates" in DISAMBIG  # original untouched


def test_promote_to_new_clears_shortlist_and_match():
    out = proposals.promote_to_new(dict(DISAMBIG, _match_id="rowX"))
    assert out["_stage"] == "approval"
    assert "_candidates" not in out
    assert "_match_id" not in out


def test_disambiguation_pick_then_yes_reaches_approval():
    # Closes the seam between the two state machines: a shortlist pick promotes the proposal to
    # approval stage carrying the chosen investor's row id, and a following 'yes' classifies as
    # approve (the normal commit path) — so pick -> yes lands the note on the existing investor.
    picked = proposals.attach_to_candidate(copy.deepcopy(DISAMBIG), DISAMBIG["_candidates"][0])
    assert picked["_stage"] == "approval"
    assert picked["_match_id"] == "rowCharlie"
    assert picked["intent"] == "meeting_note"
    assert proposals.interpret_reply("yes") == ("approve", None)


def test_render_disambiguation_lists_numbered_candidates():
    text = proposals.render_disambiguation(DISAMBIG)
    assert "Charlie Brown" in text and "Beta Capital LLC" in text
    assert "1." in text and "2." in text
    assert "new" in text.lower() and "no" in text.lower()


def test_same_fields_ignores_control_keys():
    a = dict(SAMPLE)
    assert proposals.same_fields(a, dict(a))
    assert not proposals.same_fields(a, dict(a, note="different"))
    assert proposals.same_fields(a, dict(a, _match_id="r1", _stage="approval"))
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
    fns = [v for k, v in sorted(globals().items()) if k.startswith("test_") and callable(v)]
    for fn in fns:
|
|||||||
+127
-2
@@ -15,6 +15,7 @@ import uuid
|
|||||||
import csv
import io
import re
import difflib
import base64
import threading
from datetime import datetime, timedelta
|
||||||
@@ -1254,6 +1255,124 @@ def find_intake_match(conn, q, email=None):
|
|||||||
return email_hit
|
return email_hit
|
||||||
|
|
||||||
|
|
||||||
|
def _email_edit_distance(a, b):
    """Levenshtein distance between two short strings (emails). Stdlib-only DP; used to flag
    near-miss emails (a one- or two-character typo) for the intake fuzzy matcher."""
    a = (a or '').strip().lower()
    b = (b or '').strip().lower()
    if a == b:
        return 0
    if not a or not b:
        return max(len(a), len(b))
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]
|
||||||
|
|
||||||
|
|
||||||
|
# Legal-entity suffixes stripped before name comparison so 'Acme Capital' ~ 'Acme Capital LLC'.
# Deliberately tight: only true entity types, NOT name-type words (Capital/Ventures/Partners),
# which are distinctive enough to keep. Intentionally EXCLUDES 'sa'/'ag' (Société Anonyme /
# Aktiengesellschaft) — niche for this portfolio and ambiguous enough as coincidental trailing
# tokens that stripping them inflates false 1.0 matches across distinct firms.
_LEGAL_SUFFIX = {"llc", "lp", "llp", "inc", "incorporated", "ltd", "limited", "co", "corp",
                 "corporation", "company", "plc", "gmbh", "pte"}


def _strip_legal_suffix(normalized_name):
    """Drop trailing legal-entity suffix tokens (llc/lp/inc/…) from an already-normalized name."""
    toks = re.findall(r"[a-z0-9]+", normalized_name)
    while toks and toks[-1] in _LEGAL_SUFFIX:
        toks.pop()
    return " ".join(toks)
|
||||||
|
|
||||||
|
|
||||||
|
def _name_similarity(a, b):
    """0..1 fuzzy similarity between two investor names: the max of difflib's sequence ratio
    (catches near-spellings — 'Charlie'/'Charles') and token-set Jaccard overlap (catches
    word-order differences). Legal-entity suffixes are stripped first, so two names differing
    only by 'LLC'/'LP'/'Inc' score 1.0 (a near-certain duplicate to surface — find_intake_match
    won't have caught it, since it compares the full string). Favors recall: a shared common
    name-word ('… Capital') can lift unrelated firms into the 0.6–0.8 band — acceptable noise in
    a ranked, human-confirmed shortlist; semantic pruning is the deferred LLM-judge's job."""
    a = _normalize_text(a)
    b = _normalize_text(b)
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    sa = _strip_legal_suffix(a) or a
    sb = _strip_legal_suffix(b) or b
    if sa == sb:
        return 1.0
    ratio = difflib.SequenceMatcher(None, sa, sb).ratio()
    ta = set(re.findall(r"[a-z0-9]+", sa))
    tb = set(re.findall(r"[a-z0-9]+", sb))
    jaccard = len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0
    return max(ratio, jaccard)
|
||||||
|
|
||||||
|
|
||||||
|
def find_intake_candidates(conn, q, email=None, limit=5, min_score=0.62, max_email_distance=2):
    """Ranked fuzzy near-matches for the intake bot's disambiguation prompt.

    Complements find_intake_match (which is exact-after-normalization): when the exact matcher
    misses, this returns the closest existing grid investors so the bot can surface them
    in-thread and the human can attach to one — instead of unknowingly creating a duplicate.
    Deterministic (stdlib difflib + token overlap + email edit distance), no LLM. Scans the same
    canonical grid blob as find_intake_match, so candidate ids are grid row ids the write targets.
    EXCLUDES exact matches (an identical normalized name or email, which belong to
    find_intake_match); a legal-suffix-only name difference still surfaces at score 1.0.
    Ranks by score."""
    row = conn.execute("SELECT grid_json FROM fundraising_state WHERE id = 'main'").fetchone()
    if not row or not row['grid_json']:
        return []
    try:
        grid = json.loads(row['grid_json'])
    except Exception:
        return []
    rows = grid.get('rows', []) if isinstance(grid, dict) else []
    wanted_name = _normalize_text(q) if q else ''
    wanted_email = (email or '').strip().lower()
    scored = {}
    for r in rows:
        if not isinstance(r, dict):
            continue
        rid = str(r.get('id') or '').strip()
        if not rid:
            continue
        name = str(r.get('investor_name') or '').strip()
        # An exact name match belongs to find_intake_match — never echo it back as a candidate.
        if wanted_name and _normalize_text(name) == wanted_name:
            continue
        name_score = _name_similarity(wanted_name, name) if (wanted_name and name) else 0.0
        email_score = 0.0
        if wanted_email:
            contacts = r.get('contacts')
            if isinstance(contacts, list):
                for c in contacts:
                    if not isinstance(c, dict):
                        continue
                    ce = str(c.get('email') or '').strip().lower()
                    if not ce:
                        continue
                    dist = _email_edit_distance(wanted_email, ce)
                    # dist 0 is an exact email (find_intake_match's); 1→0.9, 2→0.8 are near-misses
                    if 0 < dist <= max_email_distance:
                        email_score = max(email_score, 1.0 - 0.1 * dist)
        score = max(name_score, email_score)
        if score < min_score:  # too weak to be a useful suggestion
            continue
        matched_on = 'email' if email_score >= name_score else 'name'
        # a row can match on both name and email — keep its highest-scoring read
        if rid not in scored or score > scored[rid]['score']:
            scored[rid] = {"id": rid, "investor_name": name,
                           "score": round(score, 3), "matched_on": matched_on}
    out = sorted(scored.values(), key=lambda x: x['score'], reverse=True)
    return out[:limit]
|
||||||
|
|
||||||
|
|
||||||
def ensure_fundraising_state_row(conn):
|
def ensure_fundraising_state_row(conn):
|
||||||
existing = conn.execute("SELECT * FROM fundraising_state WHERE id = 'main'").fetchone()
|
existing = conn.execute("SELECT * FROM fundraising_state WHERE id = 'main'").fetchone()
|
||||||
if not existing:
|
if not existing:
|
||||||
@@ -2950,7 +3069,12 @@ class CRMHandler(BaseHTTPRequestHandler):
|
|||||||
def handle_intake_match(self, user, params):
|
def handle_intake_match(self, user, params):
|
||||||
"""Read-only: does an investor matching this intake already exist? Used by the
|
"""Read-only: does an investor matching this intake already exist? Used by the
|
||||||
Matrix intake bot to label its in-thread proposal new-vs-existing. Returns the
|
Matrix intake bot to label its in-thread proposal new-vs-existing. Returns the
|
||||||
grid row id so an approved note lands on exactly that investor."""
|
grid row id so an approved note lands on exactly that investor.
|
||||||
|
|
||||||
|
`match` is the confident exact match (auto-attached by the bot). When there is no
|
||||||
|
exact match, `candidates` carries ranked fuzzy near-matches so the bot can surface
|
||||||
|
a disambiguation shortlist in-thread (the human picks one or creates new) — closing
|
||||||
|
the duplicate-investor hole the exact-only matcher leaves open."""
|
||||||
q = str(params.get('q') or '').strip()
|
q = str(params.get('q') or '').strip()
|
||||||
email = str(params.get('email') or '').strip()
|
email = str(params.get('email') or '').strip()
|
||||||
if not q and not email:
|
if not q and not email:
|
||||||
@@ -2958,9 +3082,10 @@ class CRMHandler(BaseHTTPRequestHandler):
|
|||||||
conn = get_db()
|
conn = get_db()
|
||||||
try:
|
try:
|
||||||
match = find_intake_match(conn, q, email)
|
match = find_intake_match(conn, q, email)
|
||||||
|
candidates = find_intake_candidates(conn, q, email) if match is None else []
|
||||||
finally:
|
finally:
|
||||||
conn.close()
|
conn.close()
|
||||||
return self.send_json({"data": {"match": match}})
|
return self.send_json({"data": {"match": match, "candidates": candidates}})
|
||||||
|
|
||||||
def handle_update_communication(self, user, comm_id, body):
|
def handle_update_communication(self, user, comm_id, body):
|
||||||
conn = get_db()
|
conn = get_db()
|
||||||
|
|||||||
@@ -71,6 +71,10 @@ GRID = {
|
|||||||
"rows": [
|
"rows": [
|
||||||
{"id": "rowAcme", "investor_name": "Acme Capital", "notes": "",
|
{"id": "rowAcme", "investor_name": "Acme Capital", "notes": "",
|
||||||
"contacts": [{"name": "Jane Doe", "email": "jane@acme.com", "title": "GP"}]},
|
"contacts": [{"name": "Jane Doe", "email": "jane@acme.com", "title": "GP"}]},
|
||||||
|
{"id": "rowCharlie", "investor_name": "Charlie Brown", "notes": "",
|
||||||
|
"contacts": [{"name": "Charlie Brown", "email": "cb@brown.fund", "title": ""}]},
|
||||||
|
{"id": "rowBeta", "investor_name": "Beta Capital LLC", "notes": "",
|
||||||
|
"contacts": [{"name": "Pat Roe", "email": "pat@beta.com", "title": ""}]},
|
||||||
],
|
],
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -119,6 +123,61 @@ def main():
|
|||||||
check(st == 200 and (d or {}).get("data", {}).get("match") is None,
|
check(st == 200 and (d or {}).get("data", {}).get("match") is None,
|
||||||
f"no match -> null (got {st}, {d})")
|
f"no match -> null (got {st}, {d})")
|
||||||
|
|
||||||
|
    print("\n[fuzzy: exact match returns no candidates (bot auto-attaches)]")
    st, d = _req(port, "GET", "/api/intake/match?q=Acme%20Capital", token)
    data = (d or {}).get("data", {})
    check(st == 200 and data.get("match") and data.get("candidates") == [],
          f"exact match -> match set, candidates empty (got {data})")

    print("\n[fuzzy: near-spelling surfaces a candidate (Charles Brown ~ Charlie Brown)]")
    st, d = _req(port, "GET", "/api/intake/match?q=Charles%20Brown", token)
    data = (d or {}).get("data", {})
    cids = [c["id"] for c in data.get("candidates", [])]
    check(data.get("match") is None and "rowCharlie" in cids,
          f"near-spelling -> candidate rowCharlie, no exact (got {data})")

    print("\n[fuzzy: legal-suffix difference surfaces a candidate (Beta Capital ~ Beta Capital LLC)]")
    st, d = _req(port, "GET", "/api/intake/match?q=Beta%20Capital", token)
    data = (d or {}).get("data", {})
    cids = [c["id"] for c in data.get("candidates", [])]
    check(data.get("match") is None and "rowBeta" in cids,
          f"legal-suffix -> candidate rowBeta, no exact (got {data})")

    print("\n[fuzzy: legal-suffix-only difference ranks as a top candidate (Acme Capital LLC ~ Acme Capital)]")
    st, d = _req(port, "GET", "/api/intake/match?q=Acme%20Capital%20LLC", token)
    data = (d or {}).get("data", {})
    top = (data.get("candidates") or [None])[0]
    check(data.get("match") is None and top and top["id"] == "rowAcme" and top["score"] == 1.0,
          f"legal-suffix-only -> rowAcme top candidate @1.0, no exact (got {data})")

    print("\n[fuzzy: one-character email typo surfaces a candidate by email]")
    st, d = _req(port, "GET", "/api/intake/match?email=jhane@acme.com", token)
    data = (d or {}).get("data", {})
    cands = data.get("candidates", [])
    hit = next((c for c in cands if c["id"] == "rowAcme"), None)
    check(data.get("match") is None and hit and hit["matched_on"] == "email",
          f"email typo -> candidate rowAcme matched_on email (got {data})")

    print("\n[fuzzy: two-character email typo (distance 2) still surfaces]")
    st, d = _req(port, "GET", "/api/intake/match?email=jane@acne.con", token)  # acme->acne, com->con
    data = (d or {}).get("data", {})
    hit = next((c for c in data.get("candidates", []) if c["id"] == "rowAcme"), None)
    check(data.get("match") is None and hit and hit["matched_on"] == "email" and hit["score"] == 0.8,
          f"dist-2 email -> rowAcme @0.8 (got {data})")

    print("\n[fuzzy: a row matching on BOTH name and email appears once (deduped)]")
    st, d = _req(port, "GET", "/api/intake/match?q=Acme%20Capitol&email=jhane@acme.com", token)
    data = (d or {}).get("data", {})
    acme_hits = [c for c in data.get("candidates", []) if c["id"] == "rowAcme"]
    check(data.get("match") is None and len(acme_hits) == 1,
          f"name+email both match rowAcme -> single deduped entry (got {data})")

    print("\n[fuzzy: nothing close -> empty candidates]")
    st, d = _req(port, "GET", "/api/intake/match?q=Zphq%20Nobody%20LP", token)
    data = (d or {}).get("data", {})
    check(st == 200 and data.get("match") is None and data.get("candidates") == [],
          f"unrelated query -> no match, no candidates (got {data})")
|
||||||
|
|
||||||
print("\n[match: missing q and email -> 400]")
|
print("\n[match: missing q and email -> 400]")
|
||||||
st, _ = _req(port, "GET", "/api/intake/match", token)
|
st, _ = _req(port, "GET", "/api/intake/match", token)
|
||||||
check(st == 400, f"no params -> 400 (got {st})")
|
check(st == 400, f"no params -> 400 (got {st})")
|
||||||
|
|||||||
@@ -7,8 +7,15 @@ paths:
|
|||||||
|
|
||||||
Read this before editing `backend/matrix_intake/`. The bot turns a typed message in a
|
Read this before editing `backend/matrix_intake/`. The bot turns a typed message in a
|
||||||
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on **in-thread human
|
dedicated Matrix room into a proposed fundraising-grid add/edit, gated on **in-thread human
|
||||||
approval** before any write. Phase status: **M1 + M2 built** (text intake + approval + write);
|
approval** before any write. Phase status: **M1 + M2 deployed & live** (text intake + approval + write; bot on the Spark,
|
||||||
**M3 (business-card photo) deferred** — Spark Control has no vision model yet.
|
CRM endpoints on the box at **v0.1.0:85**; live-smoked 2026-06-17). **M3 (business-card photo)
|
||||||
|
deferred** — Spark Control has no vision model yet.
|
||||||
|
|
||||||
|
**Post-deploy UX pass — BUILT, not yet deployed (2026-06-17):** fuzzy investor matching
|
||||||
|
(server-side, **v0.1.0:86** — needs s9pk build+install) + in-thread disambiguation and
|
||||||
|
conversational natural-language edits (bot-side — needs a Spark `git pull` + restart). See
|
||||||
|
*Fuzzy matching* below. Tests green (27/27 backend + the offline bot suite); **not yet
|
||||||
|
live-smoked** — the disambiguation grammar and the Qwen `revise` leg need a Matrix smoke.
|
||||||
|
|
||||||
## What it is (and isn't)
|
## What it is (and isn't)
|
||||||
|
|
||||||
@@ -27,13 +34,56 @@ approval** before any write. Phase status: **M1 + M2 built** (text intake + appr
|
|||||||
|
|
||||||
1. Top-level message in the intake room → `parse.parse_message` → local **Qwen via Spark
|
1. Top-level message in the intake room → `parse.parse_message` → local **Qwen via Spark
|
||||||
Control** (`spark.py` reuses `backend/ingest/llm.py`; temp 0, JSON only) extracts
|
Control** (`spark.py` reuses `backend/ingest/llm.py`; temp 0, JSON only) extracts
|
||||||
`{intent, investor_name, contact_name, contact_email, contact_title, note}`.
|
`{intent, investor_name, contact_name, contact_email, contact_title, note}`. The original
|
||||||
2. `crm_client.match` (`GET /api/intake/match`) checks new-vs-existing and returns the **grid
|
message text is stashed on the proposal as `_source_text` (needed later for `revise`'s
|
||||||
row id** so an approved note lands on exactly that investor (no duplicate).
|
email-integrity check).
|
||||||
3. The proposal is posted **in a thread** rooted at the user's message; the pending proposal is
|
2. `crm_client.match` (`GET /api/intake/match`) resolves new-vs-existing. It returns **both** an
|
||||||
held in memory keyed by that thread root (`proposals.ProposalStore`).
|
exact `match` (carrying the **grid row id** so an approved note lands on exactly that investor,
|
||||||
4. User replies in-thread: `yes` / `edit field=value` / `no`. On `yes`, `crm_client.commit`
|
no duplicate) **and**, when there's no exact match, a ranked list of fuzzy `candidates` (see
|
||||||
POSTs to `log-communication` tagged `source="matrix_intake"` (provenance in the audit log).
|
*Fuzzy matching* below).
|
||||||
|
3. Three outcomes drive what gets posted, all **in a thread** rooted at the user's message, plus a
|
||||||
|
brief **main-timeline nudge** (a plain reply — `matrix_io.make_reply`) so it isn't missed:
|
||||||
|
- **Exact match** → auto-attach: proposal flips to `meeting_note` with `_match_id` set, rendered
|
||||||
|
as the normal approval card.
|
||||||
|
- **Fuzzy candidates, no exact** → a **disambiguation** card (`proposals.render_disambiguation`):
|
||||||
|
the proposal is held at `_stage="disambiguate"` with `_candidates`, and the human must pick a
|
||||||
|
**number** / `new` / `no` before it becomes an approval-stage proposal.
|
||||||
|
- **Neither** → the new-investor approval card.
|
||||||
|
The nudge is a **pointer only, not a reply target** — you need the thread to act. The pending
|
||||||
|
proposal is held in memory keyed by the thread root (`proposals.ProposalStore`).
|
||||||
|
4. User replies **in the thread**. `handle_reply` branches on `_stage`:
|
||||||
|
- **disambiguate** (`handle_disambiguation`): a number attaches to that candidate (→ `meeting_note`
|
||||||
|
+ `_match_id`, re-rendered for approval); `new` proceeds as a new investor; `no` discards.
|
||||||
|
- **approval**: `yes` commits; `no` discards; `edit field=value` is the deterministic fast-path
|
||||||
|
edit; **anything else is treated as a natural-language revision** — `parse.revise` sends
|
||||||
|
`{current proposal + instruction}` back through local Qwen and re-renders the revised card (a
|
||||||
|
no-op revision is detected via `proposals.same_fields` and re-prompts instead of saying
|
||||||
|
"Updated"). On `yes`, `crm_client.commit` POSTs to `log-communication` tagged
|
||||||
|
`source="matrix_intake"` (provenance in the audit log).
|
||||||
|
A bare `yes`/`no` typed **top-level** (not in the thread) while a proposal is pending gets a
|
||||||
|
"reply in the thread" redirect (`store.any_pending()` guard in `handle_intake`), not a
|
||||||
|
misparsed new intake.
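
The three outcomes in step 3 and the shortlist grammar in step 4 can be sketched roughly as follows. This is a minimal illustration, not the bot's code: `classify_match_response` and the `"pick"` action name are hypothetical, and the real `proposals.interpret_disambiguation` may accept more phrasings than the few shown here. Only the `new` / `reject` / `unknown` labels are taken from the test suite.

```python
import re


def classify_match_response(data):
    """Map a {match, candidates} payload to one of the three posting outcomes.

    Hypothetical helper: the names "attach"/"disambiguate"/"new" are illustrative.
    """
    match = data.get("match")
    candidates = data.get("candidates") or []
    if match:
        return "attach"        # exact match: auto-attach, normal approval card
    if candidates:
        return "disambiguate"  # fuzzy only: numbered shortlist, the human picks
    return "new"               # neither: new-investor approval card


def interpret_disambiguation(text, n_candidates):
    """Classify a shortlist-stage reply as (action, zero_based_index_or_None).

    Assumed grammar: a number in range picks a candidate, 'new' / 'none of these'
    proceeds as a new investor, 'no' discards, anything else is not guessed at.
    """
    reply = text.strip().lower()
    if re.fullmatch(r"\d+", reply):
        pick = int(reply)
        if 1 <= pick <= n_candidates:
            return ("pick", pick - 1)
        return ("unknown", None)   # out-of-range number: re-prompt
    if reply in {"new", "none", "none of these"}:
        return ("new", None)
    if reply == "no":
        return ("reject", None)
    return ("unknown", None)       # free-form text: re-prompt instead of guessing
```

With a two-entry shortlist, `interpret_disambiguation("2", 2)` selects the second candidate, while `"the first one"` comes back as `unknown` and triggers a re-prompt, consistent with the behaviour the offline tests assert.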
|
||||||
|
|
||||||
|
## Fuzzy matching (server-side, ships in the s9pk)
|
||||||
|
|
||||||
|
`GET /api/intake/match` returns `{match, candidates}`. `find_intake_match` is unchanged —
|
||||||
|
**exact-after-normalization**, and an exact match still auto-attaches without disambiguation.
|
||||||
|
`find_intake_candidates` (new) is the fuzzy layer, **deterministic, no LLM**: it scans the same
|
||||||
|
canonical grid blob and scores each row by `max(name similarity, email near-match)`, keeping
|
||||||
|
rows ≥ `min_score` (0.62), ranked, capped at 5:
|
||||||
|
- **Name** (`_name_similarity`): max of stdlib `difflib` sequence ratio (near-spellings —
|
||||||
|
"Charlie"/"Charles") and token-set Jaccard (word-order). **Legal-entity suffixes**
|
||||||
|
(LLC/LP/Inc/… via `_strip_legal_suffix`) are stripped first, so "Acme Capital" ~ "Acme Capital
|
||||||
|
LLC" scores 1.0 (a near-certain duplicate `find_intake_match` misses because it compares the
|
||||||
|
full string) — and is surfaced as a candidate, **never auto-attached** (the human still confirms).
|
||||||
|
- **Email** (`_email_edit_distance`): Levenshtein ≤ 2 against each contact email (dist 1→0.9,
|
||||||
|
2→0.8). Distance 0 is an exact email — that's `find_intake_match`'s job, skipped here.
|
||||||
|
- **Recall-favoring by design:** a shared common name-word ("… Capital") can lift an unrelated firm
|
||||||
|
into the 0.6–0.8 band. Acceptable — it's a *ranked, human-confirmed* shortlist, and the cost of an
|
||||||
|
occasional stray suggestion is far lower than missing a real near-duplicate. **Semantic pruning of
|
||||||
|
the shortlist (the "Charlie really is Charles" judgment) is a deferred LLM-judge re-rank** — fed
|
||||||
|
only the shortlist, never the whole LP list — intentionally NOT built in this pass, because the
|
||||||
|
deterministic filter already surfaces the near-duplicates that clear `min_score`, which the human then resolves.
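
To make the scoring concrete, here is a small standalone sketch of the two signals. It is an approximation for illustration only: `normalize` is an assumed stand-in for the server's `_normalize_text` (whose body is not shown in this change), the suffix set is abbreviated, and the function names are hypothetical.

```python
import difflib
import re

# Abbreviated suffix list for the sketch; the server's _LEGAL_SUFFIX is longer.
LEGAL_SUFFIX = {"llc", "lp", "llp", "inc", "ltd", "corp"}


def normalize(name):
    # ASSUMPTION: lowercase and keep alphanumeric tokens; the real _normalize_text may differ.
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def strip_suffix(normalized):
    # Drop trailing legal-entity tokens so 'acme capital llc' compares as 'acme capital'.
    toks = normalized.split()
    while toks and toks[-1] in LEGAL_SUFFIX:
        toks.pop()
    return " ".join(toks)


def name_score(a, b):
    """max(difflib sequence ratio, token-set Jaccard) after suffix stripping."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    sa, sb = strip_suffix(a) or a, strip_suffix(b) or b
    if sa == sb:
        return 1.0
    ratio = difflib.SequenceMatcher(None, sa, sb).ratio()
    ta, tb = set(sa.split()), set(sb.split())
    return max(ratio, len(ta & tb) / len(ta | tb))


def email_score(distance, max_distance=2):
    """Map an edit distance to a score: 1 -> 0.9, 2 -> 0.8, anything else -> 0.0."""
    if 0 < distance <= max_distance:
        return round(1.0 - 0.1 * distance, 3)
    return 0.0
```

Under these assumptions, "Acme Capital LLC" against "Acme Capital" scores 1.0, "Charles Brown" against "Charlie Brown" clears the 0.62 threshold, and "Beta Capital" against "Acme Capital" also clears it on the shared word alone, which is the recall-favoring noise described above. "Zphq Nobody LP" falls well below the threshold.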
|
||||||
|
|
||||||
## Rules / gotchas
|
## Rules / gotchas
|
||||||
|
|
||||||
@@ -47,9 +97,27 @@ approval** before any write. Phase status: **M1 + M2 built** (text intake + appr
|
|||||||
could attach the wrong one; the human sees it in the proposal and can `edit email=…` before
|
could attach the wrong one; the human sees it in the proposal and can `edit email=…` before
|
||||||
approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal
|
approving. Cross-referencing multiple addresses to the named contact is a deliberate non-goal
|
||||||
for v1.
|
for v1.
|
||||||
|
- **Conversational revise keeps the email rule:** `parse.revise` re-runs a free-form correction
|
||||||
|
through Qwen but **never trusts the model's email field**. A changed address is accepted only if
|
||||||
|
it literally appears in the *instruction text* (searched first), else the existing
|
||||||
|
integrity-checked address is kept (`_apply_revision`). The model can edit name/contact/title/note
|
||||||
|
freely but cannot mint an email. A revision that nulls both investor and contact is rejected (the
|
||||||
|
proposal can't be emptied to something unactionable). Revise edits fields on the current proposal;
|
||||||
|
it does **not** re-run the matcher if you rename the firm mid-thread (a known v1 limit — the human
|
||||||
|
still approves).
|
||||||
|
- **Deploy is split across two surfaces** (mind which one carries a change): the fuzzy
|
||||||
|
**`candidates`** come from `server.py` → ship in the **s9pk** (build + install, version-bumped).
|
||||||
|
The bot's **disambiguation flow + `revise`** live in `backend/matrix_intake/` → ship on the
|
||||||
|
**Spark** via `git pull` + restart. A bot restart alone won't deliver `candidates` (the box would
|
||||||
|
return an empty list and the bot just proposes new — safe, but no fuzzy surfacing until the s9pk
|
||||||
|
is installed). Same lesson as the v83→v84 `/api/intake/match` 404.
|
||||||
- **Double-approve guard:** `handle_reply` pops the pending proposal from the store *before*
|
- **Double-approve guard:** `handle_reply` pops the pending proposal from the store *before*
|
||||||
awaiting the commit, so a second `yes` arriving mid-write is a no-op (asyncio is cooperative;
|
awaiting the commit, so a second `yes` arriving mid-write is a no-op (asyncio is cooperative;
|
||||||
the pop is atomic w.r.t. other events). On commit failure the proposal is restored for retry.
|
the pop is atomic w.r.t. other events). On commit failure the proposal is restored for retry.
|
||||||
|
*Known minor:* in the **disambiguate** stage the pick re-stores an approval-stage proposal
|
||||||
|
before its `await say`, so a rapidly-repeated `1` can have the second one fall through to the
|
||||||
|
NL-revise path (a wasted Spark round-trip that re-prompts) — harmless, nothing commits, not
|
||||||
|
guarded (low likelihood on a ~5-person team).
|
||||||
- **Local-only parse:** intake text is real LP substance but goes ONLY to local Qwen via Spark
|
- **Local-only parse:** intake text is real LP substance but goes ONLY to local Qwen via Spark
|
||||||
Control, never Claude — so no scrub boundary applies (same basis as the digest). Never call a
|
Control, never Claude — so no scrub boundary applies (same basis as the digest). Never call a
|
||||||
Spark directly; always go through `SPARK_CONTROL_URL`.
|
Spark directly; always go through `SPARK_CONTROL_URL`.
|
||||||
@@ -59,6 +127,29 @@ approval** before any write. Phase status: **M1 + M2 built** (text intake + appr
|
|||||||
network; `backend/test_intake_endpoints.py` boots the real server against a temp DB and
|
network; `backend/test_intake_endpoints.py` boots the real server against a temp DB and
|
||||||
covers `/api/intake/match` + the create→match (no-duplicate) contract + provenance. A **live
|
covers `/api/intake/match` + the create→match (no-duplicate) contract + provenance. A **live
|
||||||
Matrix smoke** needs creds + `matrix-nio` installed on the Spark — it can't run in CI.
|
Matrix smoke** needs creds + `matrix-nio` installed on the Spark — it can't run in CI.
|
||||||
|
- **Grid note line:** the bot sends a **blank `subject`** when there's a note so the CRM's
|
||||||
|
one-line note summary shows the note text (the CRM renders subject-or-body); a provenance
|
||||||
|
label is sent only when there's no note. v0.1.0:85 also dropped the redundant `[note]` type
|
||||||
|
tag from that server-side line (informative types like `[call]` keep theirs).
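
The email rule for conversational revisions (a changed address must literally appear in the instruction, and the model's email field is never trusted) can be sketched as below. This is a hedged illustration, not `_apply_revision` itself: `resolve_revised_email`, the regex, and the lowercasing are all assumptions made for the example.

```python
import re

# Deliberately simple address pattern for the sketch; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def resolve_revised_email(current_email, instruction, model_email=None):
    """Return the email to keep after a natural-language revision.

    model_email is accepted only so the call site can pass it; it is ignored on purpose,
    so an address the model invents can never reach the proposal.
    """
    found = EMAIL_RE.findall(instruction or "")
    if found:
        # The human typed this address in the instruction, so it is safe to adopt.
        return found[0].lower()
    # No literal address in the instruction: keep the existing, already-checked one.
    return current_email
```

For example, the instruction "her email is actually jane.doe@acme.com." yields `jane.doe@acme.com`, whereas "she is a GP, not an LP" leaves the existing address untouched regardless of what the model returned.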
|
||||||
|
|
||||||
|
## Deployment & ops
|
||||||
|
|
||||||
|
- **Runs on the Spark** (SSH alias `modelo32`, host `spark-32d0`): repo at
|
||||||
|
`/home/modelo/ten31-database`, deps in a venv (`.venv`; only `matrix-nio`). Launched detached:
|
||||||
|
`nohup ./.venv/bin/python backend/matrix_intake/bot.py >/tmp/intake-bot.log 2>&1 &`, pid in
|
||||||
|
`/tmp/intake-bot.pid`; startup logs `listening as … in room …`.
|
||||||
|
- **Restart after a `git pull` of bot code:** `kill $(cat /tmp/intake-bot.pid)`, relaunch as
|
||||||
|
above, re-write the pid. A restart **drops in-memory pending proposals** (re-send to recover).
|
||||||
|
- **NOT a managed service yet** — won't survive a Spark reboot; restart-on-boot (systemd) is an
|
||||||
|
open TODO.
|
||||||
|
- **Server-side endpoints ship in the s9pk, not the bot.** `GET /api/intake/match` and the
|
||||||
|
`source` provenance on `log-communication` live in `backend/server.py`, so they reach the box
|
||||||
|
only via an **s9pk build + install** — a bot restart won't deliver them. (Missed in v83: the
|
||||||
|
box 404'd `/api/intake/match` until **v0.1.0:84**.)
|
||||||
|
- **`CRM_API_BASE` is the box over the LAN, not localhost** (bot on the Spark, CRM on the box).
|
||||||
|
`https://immense-voyage.local` (443) is the **StartOS dashboard**, not the CRM — the CRM has
|
||||||
|
its own interface address (the URL you open in a browser); container port 8080 isn't
|
||||||
|
LAN-reachable.
|
||||||
|
|
||||||
## Config
|
## Config
|
||||||
|
|
||||||
|
|||||||
@@ -50,8 +50,9 @@ export const PACKAGE_TITLE = 'Ten31 Database'
|
|||||||
// * 0.1.0:82 (vendor + SRI-pin the front-end libs: React/ReactDOM/Babel now ship in the s9pk and load same-origin from /assets/vendor/ with integrity hashes, so a CDN can never swap prod deps [the v78/v79 blank-screen class] and the box needs no outbound internet to render; plus a committed jsdom render smoke check [start9/0.4/render-smoke.mjs] gating the default `make` build)
|
// * 0.1.0:82 (vendor + SRI-pin the front-end libs: React/ReactDOM/Babel now ship in the s9pk and load same-origin from /assets/vendor/ with integrity hashes, so a CDN can never swap prod deps [the v78/v79 blank-screen class] and the box needs no outbound internet to render; plus a committed jsdom render smoke check [start9/0.4/render-smoke.mjs] gating the default `make` build)
|
||||||
// * 0.1.0:83 (email search/query + windowed digest preview, code-only: Communications investor dropdown now mirrors the list with typed keys [fund:/org:/contact:] so classic-contact/org-domain matches show + are pickable [fixes the empty-dropdown bug], plus a date-range filter, a click-to-expand full-body view [GET /api/email/detail], and a semantic "Search content" mode over indexed email bodies [GET /api/email/search -> ingest hybrid_search, soft-delete-filtered, 503 if Spark/Qdrant down]; Daily Digest gains an in-app windowed preview before send [POST /api/admin/digest/preview, send-now takes the same window] that exercises the real Spark summarizer without touching the daily cursor)
|
// * 0.1.0:83 (email search/query + windowed digest preview, code-only: Communications investor dropdown now mirrors the list with typed keys [fund:/org:/contact:] so classic-contact/org-domain matches show + are pickable [fixes the empty-dropdown bug], plus a date-range filter, a click-to-expand full-body view [GET /api/email/detail], and a semantic "Search content" mode over indexed email bodies [GET /api/email/search -> ingest hybrid_search, soft-delete-filtered, 503 if Spark/Qdrant down]; Daily Digest gains an in-app windowed preview before send [POST /api/admin/digest/preview, send-now takes the same window] that exercises the real Spark summarizer without touching the daily cursor)
|
||||||
// * 0.1.0:84 (Matrix intake bot CRM support — ships the server side of commit 7ad0ee7, which was never packaged: new read-only GET /api/intake/match [new-vs-existing lookup against the canonical fundraising grid blob; returns the grid row id so an approved note lands on the matched investor, no duplicate] + source provenance on POST /api/fundraising/log-communication [audit records source, default "fundraising_grid"]; code-only, no schema change)
|
// * 0.1.0:84 (Matrix intake bot CRM support — ships the server side of commit 7ad0ee7, which was never packaged: new read-only GET /api/intake/match [new-vs-existing lookup against the canonical fundraising grid blob; returns the grid row id so an approved note lands on the matched investor, no duplicate] + source provenance on POST /api/fundraising/log-communication [audit records source, default "fundraising_grid"]; code-only, no schema change)
|
||||||
// * Current: 0.1.0:85 (cosmetic: drop the redundant "[note]" tag from the fundraising-grid note line — now "YYYY-MM-DD Contact: summary"; informative comm types [call, meeting, …] keep their "[type]" tag; shared by the Matrix intake bot + grid-UI logging; no schema change)
|
// * 0.1.0:85 (cosmetic: drop the redundant "[note]" tag from the fundraising-grid note line — now "YYYY-MM-DD Contact: summary"; informative comm types [call, meeting, …] keep their "[type]" tag; shared by the Matrix intake bot + grid-UI logging; no schema change)
|
||||||
export const PACKAGE_VERSION = '0.1.0:85'
|
// * Current: 0.1.0:86 (Matrix intake fuzzy matching: GET /api/intake/match now returns ranked `candidates` [fuzzy near-matches — deterministic difflib name similarity + token overlap + email edit-distance ≤ 2, legal-suffix-aware] alongside the exact `match`, so the bot can surface near-duplicates ["Charlie"/"Charles", "Acme Capital"/"Acme Capital LLC", a one-char email typo] for human confirmation instead of silently creating a second investor; the bot-side disambiguation + conversational-edit UX ships on the Spark, not the s9pk; code-only, no schema change)
|
||||||
|
export const PACKAGE_VERSION = '0.1.0:86'
|
||||||
|
|
||||||
export const DATA_MOUNT_PATH = '/data'
|
export const DATA_MOUNT_PATH = '/data'
|
||||||
export const WEB_PORT = 8080
|
export const WEB_PORT = 8080
|
||||||
|
|||||||
@@ -46,8 +46,9 @@ import { v_0_1_0_82 } from './v0.1.0.82'
|
|||||||
import { v_0_1_0_83 } from './v0.1.0.83'
|
import { v_0_1_0_83 } from './v0.1.0.83'
|
||||||
import { v_0_1_0_84 } from './v0.1.0.84'
|
import { v_0_1_0_84 } from './v0.1.0.84'
|
||||||
import { v_0_1_0_85 } from './v0.1.0.85'
|
import { v_0_1_0_85 } from './v0.1.0.85'
|
||||||
|
import { v_0_1_0_86 } from './v0.1.0.86'
|
||||||
|
|
||||||
export const versionGraph = VersionGraph.of({
|
export const versionGraph = VersionGraph.of({
|
||||||
current: v_0_1_0_85,
|
current: v_0_1_0_86,
|
||||||
other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43, v_0_1_0_44, v_0_1_0_45, v_0_1_0_46, v_0_1_0_47, v_0_1_0_48, v_0_1_0_49, v_0_1_0_50, v_0_1_0_51, v_0_1_0_52, v_0_1_0_53, v_0_1_0_54, v_0_1_0_55, v_0_1_0_56, v_0_1_0_57, v_0_1_0_58, v_0_1_0_59, v_0_1_0_60, v_0_1_0_61, v_0_1_0_62, v_0_1_0_63, v_0_1_0_64, v_0_1_0_65, v_0_1_0_66, v_0_1_0_67, v_0_1_0_68, v_0_1_0_69, v_0_1_0_70, v_0_1_0_71, v_0_1_0_72, v_0_1_0_73, v_0_1_0_74, v_0_1_0_75, v_0_1_0_76, v_0_1_0_77, v_0_1_0_78, v_0_1_0_79, v_0_1_0_80, v_0_1_0_81, v_0_1_0_82, v_0_1_0_83, v_0_1_0_84],
|
other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43, v_0_1_0_44, v_0_1_0_45, v_0_1_0_46, v_0_1_0_47, v_0_1_0_48, v_0_1_0_49, v_0_1_0_50, v_0_1_0_51, v_0_1_0_52, v_0_1_0_53, v_0_1_0_54, v_0_1_0_55, v_0_1_0_56, v_0_1_0_57, v_0_1_0_58, v_0_1_0_59, v_0_1_0_60, v_0_1_0_61, v_0_1_0_62, v_0_1_0_63, v_0_1_0_64, v_0_1_0_65, v_0_1_0_66, v_0_1_0_67, v_0_1_0_68, v_0_1_0_69, v_0_1_0_70, v_0_1_0_71, v_0_1_0_72, v_0_1_0_73, v_0_1_0_74, v_0_1_0_75, v_0_1_0_76, v_0_1_0_77, v_0_1_0_78, v_0_1_0_79, v_0_1_0_80, v_0_1_0_81, v_0_1_0_82, v_0_1_0_83, v_0_1_0_84, v_0_1_0_85],
|
||||||
})
|
})
|
||||||
|
|||||||
@@ -0,0 +1,20 @@
|
|||||||
|
import { VersionInfo } from '@start9labs/start-sdk'
|
||||||
|
|
||||||
|
// Matrix intake — fuzzy investor matching. GET /api/intake/match now returns, alongside the
|
||||||
|
// exact `match`, a ranked list of `candidates`: fuzzy near-matches (deterministic difflib name
|
||||||
|
// similarity + token overlap + email edit-distance ≤ 2, legal-suffix-aware) the intake bot can
|
||||||
|
// surface in-thread for the human to pick from — so a near-duplicate name ("Charlie"/"Charles",
|
||||||
|
// "Acme Capital"/"Acme Capital LLC", a one-char email typo) no longer silently creates a second
|
||||||
|
// investor. Server-side only (the bot's disambiguation + conversational-edit UX ships on the
|
||||||
|
// Spark, not in the s9pk). Code-only, no schema change.
|
||||||
|
export const v_0_1_0_86 = VersionInfo.of({
|
||||||
|
version: '0.1.0:86',
|
||||||
|
releaseNotes: {
|
||||||
|
en_US: [
|
||||||
|
'Matrix intake: the new-vs-existing lookup now also returns ranked fuzzy near-matches,',
|
||||||
|
'so a typo’d or near-duplicate investor name is surfaced for confirmation instead of',
|
||||||
|
'silently creating a duplicate. No data changes.',
|
||||||
|
].join(' '),
|
||||||
|
},
|
||||||
|
migrations: { up: async () => {}, down: async () => {} },
|
||||||
|
})
|
||||||
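For readers who want a concrete picture of the matching described in the v0.1.0:86 comment above, here is a minimal sketch of deterministic near-match ranking in Python. It is not the project's `find_intake_candidates`: the function names, the legal-suffix list, the 0.8 threshold, the 0.9 floor for an email hit, and the use of `max` to combine the signals are all assumptions made for illustration. Only the ingredients (difflib name similarity, token-set Jaccard, legal-suffix stripping, email edit distance of at most 2) come from the commit description.

```python
import re
from difflib import SequenceMatcher

# Assumed suffix list; the real server may use a different one.
LEGAL_SUFFIXES = {"llc", "inc", "ltd", "lp", "llp", "corp", "co", "gmbh"}


def _tokens(name):
    """Lowercase, split on non-alphanumerics, drop legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [w for w in words if w not in LEGAL_SUFFIXES]


def name_score(a, b):
    """Best of difflib ratio and token-set Jaccard on suffix-stripped names."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    ratio = SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    sa, sb = set(ta), set(tb)
    jaccard = len(sa & sb) / len(sa | sb)
    return max(ratio, jaccard)


def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_candidates(name, email, rows, threshold=0.8, limit=5):
    """Return ranked near-matches; never decides to attach on its own."""
    scored = []
    for row in rows:
        score = name_score(name, row.get("name", ""))
        row_email = (row.get("email") or "").lower()
        # An email within edit distance 2 counts as a strong signal (assumed floor).
        if email and row_email and levenshtein(email.lower(), row_email) <= 2:
            score = max(score, 0.9)
        if score >= threshold:
            scored.append((score, row["id"]))
    # Sort by score descending, then id, so the ranking is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [{"id": rid, "score": round(s, 3)} for s, rid in scored[:limit]]
```

In this sketch the ranked list is only ever handed to a human for confirmation, which mirrors the stated rule that exact matches auto-attach and fuzzy candidates never do.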