Restrict comms_by_user/email_counts_by_user to matched-investor email
Both NL-query intents counted/listed a user's ENTIRE captured sent corpus (internal, vendor, personal mail) rather than only email to a matched investor: they were missing the `EXISTS email_investor_links` gate that recent_emails and the Communications panel's query_email_activity use. Their own docstrings said "investor emails", so the behavior was wrong, not just loose.

Add the matched-only gate to both, mirroring query_email_activity. The runner test now seeds an unmatched sent email and asserts it is excluded (without the fix, comms_by_user returns 3 instead of 2 and this_week returns 2 instead of 1). The prior fixture linked every email, so the leak went uncaught.

Also document the matched-only rule in the nl-query guide and refresh the AGENTS.md Current state (v93 deployed; this fix is pending a v94 s9pk, since the intents run on the box, not the bot).
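The leak and the gate described above can be reproduced in a small sketch. It assumes a hypothetical, cut-down schema (only the columns the gate touches; the real tables carry more) and three sent emails, one of which has no `email_investor_links` row. The `BASE` and `GATE` names are illustrative, not taken from the repo.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the gate needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (id TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE email_account_messages (
    email_id TEXT, is_sent INTEGER, deleted_at TEXT);
CREATE TABLE email_investor_links (email_id TEXT, investor_id TEXT);
""")

# Three live sent emails; 'eunm' is captured but not linked to any investor.
conn.executemany("INSERT INTO emails VALUES (?, ?)", [
    ("e1", "Fund update"), ("e2", "Follow-up"), ("eunm", "Internal: team lunch")])
conn.executemany("INSERT INTO email_account_messages VALUES (?, 1, NULL)",
                 [("e1",), ("e2",), ("eunm",)])
conn.executemany("INSERT INTO email_investor_links VALUES (?, ?)",
                 [("e1", "i_acme"), ("e2", "i_beta")])

# Soft-delete-correct count of sent mail, without the matched-only gate.
BASE = ("SELECT COUNT(*) FROM emails e "
        "JOIN email_account_messages eam ON eam.email_id = e.id "
        "WHERE eam.deleted_at IS NULL AND eam.is_sent = 1 ")
# The matched-only gate quoted in the commit message.
GATE = "AND EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)"

leaky = conn.execute(BASE).fetchone()[0]         # counts the unmatched email too
gated = conn.execute(BASE + GATE).fetchone()[0]  # investor-linked email only
print(leaky, gated)  # 3 2
```

One property of an `EXISTS` gate, as opposed to an inner join on the links table, is that an email linked to several investors is still counted once, because the subquery filters rows without multiplying them.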
@@ -106,13 +106,13 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude

 ## Current state

-_Phase 0 + Phase 1 built; **box live at v0.1.0:91; repo at v0.1.0:92** (reminders, deploy pending). **The fundraising grid + email capture is the canonical system of record.** Active thread: **W2 natural-language query** (backend + Matrix `@bot` surface built; web "Ask" box next). Deploy/feature history: git log + `start9/0.4/startos/versions/`; longer-term backlog/debt: `ROADMAP.md` / `EVALUATION.md`._
+_Phase 0 + Phase 1 built; **box live at v0.1.0:93; repo at v0.1.0:93** (reminders W1 + NL-query W2 deployed 2026-06-18). **The fundraising grid + email capture is the canonical system of record.** Active thread: **W2 natural-language query** (backend + Matrix Q&A live; web "Ask" box next). Deploy/feature history: git log + `start9/0.4/startos/versions/`; longer-term backlog/debt: `ROADMAP.md` / `EVALUATION.md`._

-- **W2 — natural-language query (read-only): BACKEND + MATRIX `@bot` surface built + tested locally 2026-06-18; web "Ask" box next.** `backend/nl_query/` — 12 curated parameterized queries + a slot validator (the trust boundary; no generic SQL) + a **local-Qwen** translator (question→{intent,slots} via Spark Control; nothing leaves the box, **no Claude, no redaction** — the simplification Grant chose). `POST /api/query/nl` (also accepts direct `{intent,slots}`) + `GET /api/query/catalog`, `require_bot_or_admin`, audited (`entity_type='nl_query'`). **Local Qwen translated 12/12 of Grant's real example questions correctly against the live Spark** — settles local-only (Claude not needed). Soft-delete-correct per table (gotcha: `fundraising_*` has **no `deleted_at`** — `graveyard` is the axis; emails via a live `eam` sighting). Guide: `docs/guides/nl-query.md`. **Step 5 (Matrix Q&A) DONE** — thin client in `backend/matrix_intake/query.py` (trigger grammar + answer rendering) + `crm_client.nl_query` + `bot.py` wiring, read-only (no approval gate), tested in `test_query.py`. **Two entry points (room-per-purpose model):** a **dedicated Q&A room** (`MATRIX_QUERY_ROOM`) where every message is a question, **and** the `?`/`@bot` trigger still working in the intake room as a cross-room convenience. Ships on the **Spark** (git pull + restart, no s9pk for the bot). Q&A room `!RGlJEObVaIUtUVcHtx:matrix.gilliam.ai` created + bot invited (2026-06-18). **BUT the box-side `/api/query/nl` endpoint is NOT live yet** (box v91; verified 404 on 2026-06-18) — it lands with the **v93 s9pk** (reminders + W2). **So DON'T activate the bot deploy (set `MATRIX_QUERY_ROOM` + restart) until v93 is installed**, or every question 404s. Code committed + pushed; bot deploy is staged to follow the v93 install. **Next: step 4 web "Ask" box (Communications tab)** — the last thin client.
+- **W2 — natural-language query (read-only): BACKEND + MATRIX Q&A LIVE (deployed v0.1.0:93, 2026-06-18); web "Ask" box next.** `backend/nl_query/` — 12 curated parameterized queries + a slot validator (the trust boundary; no generic SQL) + a **local-Qwen** translator (question→{intent,slots} via Spark Control; nothing leaves the box, **no Claude, no redaction** — the simplification Grant chose). `POST /api/query/nl` (also accepts direct `{intent,slots}`) + `GET /api/query/catalog`, `require_bot_or_admin`, audited (`entity_type='nl_query'`) — **live on the box** (verified 400/200 post-install). Soft-delete-correct per table (gotcha: `fundraising_*` has **no `deleted_at`** — `graveyard` is the axis; emails via a live `eam` sighting). Guide: `docs/guides/nl-query.md`. **Step 5 (Matrix Q&A) DONE + DEPLOYED** — thin client in `backend/matrix_intake/query.py` (trigger grammar + answer rendering) + `crm_client.nl_query` + `bot.py` wiring, read-only (no approval gate), tested in `test_query.py`. **Two entry points (room-per-purpose model):** a **dedicated Q&A room** (`MATRIX_QUERY_ROOM=!RGlJEObVaIUtUVcHtx:matrix.gilliam.ai`) where every message is a question, **and** the `?`/`@bot` trigger in the intake room as a cross-room convenience. Bot rebuilt + running on the Spark (logs: `answering questions in room …`). **End-to-end verified from inside the bot container** (3 questions → correct intents, live box, no errors; `investors_cold` hits the 500-row cap so Matrix shows 30 + a refine note). **Remaining: the actual in-room Matrix smoke (a human typing a question) — not yet done.** **Matched-only fix (2026-06-18, post-v93):** `comms_by_user` + `email_counts_by_user` were counting/listing the user's *entire* captured sent corpus, not just investor-linked email (missing the `EXISTS email_investor_links` gate that `recent_emails`/`query_email_activity` use) — **fixed + regression-tested in the repo, but the box still runs the leaky v93 behavior until a v94 s9pk** (these intents run on the box, not the bot). **Next: step 4 web "Ask" box (Communications tab)** — the last thin client.

-- **W1 — reminders & follow-ups: BUILT + tested locally (v0.1.0:92), DEPLOY PENDING.** First-class tickler tied to the grid (migration `0006`; CRUD `GET/POST/PATCH/DELETE /api/reminders`; derived `reminder_status` grid column; Reminders page + dashboard card + digest section; the `last_activity_at` recency rollup that W2 reuses). Needs s9pk build + install (authorize first; verify `0006` against a DB copy). Deferred **W1b** = nurture-gap auto-suggested reminders.
+- **W1 — reminders & follow-ups: LIVE (deployed v0.1.0:93, 2026-06-18).** First-class tickler tied to the grid (migration `0006` — applied cleanly on the box per logs; CRUD `GET/POST/PATCH/DELETE /api/reminders`; derived `reminder_status` grid column; Reminders page + dashboard card + digest section; the `last_activity_at` recency rollup that W2 reuses). `0006` was verified up/down against a copy of `crm.db` before install. Deferred **W1b** = nurture-gap auto-suggested reminders.

 - **Done & live (detail in git log / ROADMAP):** email-proposal Matrix review + `bot` role (box v91); grid-driven Pipeline (v88); Matrix intake bot (Spark `matrix-intake` container); Gmail capture (DWD) + propose→approve + daily digest; Thesis Workshop + Architect (Claude, dual-approval); outreach drafts + radar. All draft-only.
 - **Tests:** **35/35 backend green** (`python3 backend/run_tests.py`; +`nl_query/` + matrix `test_query.py` suites), `py_compile` clean; render-smoke gates `make`.
-- **Next (priority order):** 1) **deploy reminders (v92) + W2 together** — bump to **v0.1.0:93**, build s9pk, install, browser-verify (authorize first; verify `0006` against a DB copy) — **this is the gate for the Matrix Q&A: the bot's step-5 surface 404s until `/api/query/nl` is on the box**; THEN activate the bot deploy (set `MATRIX_QUERY_ROOM` on the Spark + git pull + restart) + in-room smoke; 2) **W2 step 4** web Ask box (last NL-query client); 3) **W3** bot grid-mutations behind the Matrix approval gate (local-Qwen parse); 4) **W1b** nurture-gap reminders; 5) Grant + Jonathan freeze v2.0 canonical; 6) in-room smoke of the intake disambiguation numbered-pick grammar; then P2 debt (reports comms-aggregate soft-delete sweep, `?limit=abc` crash, auth regression test, oversized StartOS icon).
+- **Next (priority order):** 1) **in-room Matrix smoke** of the Q&A room (type a real question; confirm the answer renders well on mobile — broad questions like "cold investors" hit the 500-row cap → 30 shown + refine note) + the intake `?`/`@bot` trigger; 2) **W2 step 4** web Ask box (last NL-query client); 3) **W3** bot grid-mutations behind the Matrix approval gate (local-Qwen parse); 4) **W1b** nurture-gap reminders; 5) Grant + Jonathan freeze v2.0 canonical; 6) in-room smoke of the intake disambiguation numbered-pick grammar; then P2 debt (reports comms-aggregate soft-delete sweep, `?limit=abc` crash, auth regression test, oversized StartOS icon).
 - **Open / risks:** W2 translation only **happy-path-validated** (typos/ambiguous/no-match phrasings shake out in live use); **Claude/Architect path still unverified live on the box**; v2.0 reserve-asset spine is the *working approved* spine but **not canonical** (needs dual sign-off); doc drift — `crm-overview.md` + `EVALUATION.md` still call `lp_profiles` live.
@@ -306,8 +306,11 @@ def run_investor_last_contact(conn, slots):


 def run_comms_by_user(conn, slots):
-    """The most recent `limit` outbound investor emails sent by a given user (matched by
-    username or full name). Soft-delete-correct (live sighting, is_sent)."""
+    """The most recent `limit` outbound **investor** emails sent by a given user (matched by
+    username or full name). MATCHED-ONLY: restricted to investor-linked email (an
+    email_investor_links row exists), mirroring query_email_activity / recent_emails — NOT the
+    user's entire sent corpus (internal/vendor/personal mail is captured but never surfaced
+    here). Soft-delete-correct (live sighting, is_sent)."""
     n, pat = slots["limit"], like_contains(slots["user"])
     rows = _rows(conn.execute(
         "SELECT e.subject, e.sent_at, u.full_name AS sender, "
@@ -318,6 +321,7 @@ def run_comms_by_user(conn, slots):
         "AND eam.deleted_at IS NULL AND eam.is_sent = 1 "
         "JOIN email_accounts ea ON ea.id = eam.account_id JOIN users u ON u.id = ea.user_id "
         "WHERE (u.username LIKE ? ESCAPE '\\' OR u.full_name LIKE ? ESCAPE '\\') "
+        "AND EXISTS (SELECT 1 FROM email_investor_links l2 WHERE l2.email_id = e.id) "
         "ORDER BY e.sent_at DESC LIMIT ?", (pat, pat, n)))
     return {"columns": ["sent_at", "subject", "sender", "investor"], "rows": rows,
             "truncated": False,
@@ -325,13 +329,16 @@ def run_comms_by_user(conn, slots):


 def run_email_counts_by_user(conn, slots):
-    """Per-user counts of outbound investor emails over this week / month / year-to-date.
+    """Per-user counts of outbound **investor** emails over this week / month / year-to-date.
+    MATCHED-ONLY: counts only investor-linked email (an email_investor_links row exists),
+    mirroring query_email_activity / recent_emails — not the user's entire sent corpus.
     Windows are calendar-based: week = since Monday, month = since the 1st, ytd = since Jan 1."""
     today = _today()
     wk = (today - timedelta(days=today.weekday())).isoformat()
     mo = today.replace(day=1).isoformat()
     yr = today.replace(month=1, day=1).isoformat()
-    where = "WHERE eam.deleted_at IS NULL AND eam.is_sent = 1"
+    where = ("WHERE eam.deleted_at IS NULL AND eam.is_sent = 1 "
+             "AND EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)")
     params = [wk, mo, yr]
     if slots.get("user"):
         pat = like_contains(slots["user"])
@@ -101,6 +101,15 @@ def seed(conn):
     email("edel", "grant@ten31.xyz", "Grant Smith", 0, "i_beta", "a_grant", 1, deleted=True) # tombstoned
     email("ej", "jon@ten31.xyz", "Jonathan Lee", 0, "i_acme", "a_jon", 1) # jonathan today
     email("ein", "alice@acme.com", "Alice Acme", 3, "i_acme", "a_grant", 0) # inbound 3d
+    # an UNMATCHED sent email by Grant (NO email_investor_links row) — captured, but not to a
+    # known investor. The investor-email intents are matched-only, so it must be EXCLUDED from
+    # comms_by_user / email_counts_by_user; without the matched-only filter it would inflate both.
+    c("INSERT INTO emails (id, rfc_message_id, from_email, from_name, sent_at, subject, "
+      "is_matched, match_status) VALUES ('eunm','rfc_eunm','grant@ten31.xyz','Grant Smith',?,"
+      "'Internal: team lunch',0,'unmatched')", (_ago(0),))
+    c("INSERT INTO email_account_messages (id, email_id, account_id, gmail_message_id, "
+      "gmail_thread_id, is_sent, deleted_at) VALUES "
+      "('eam_eunm','eunm','a_grant','g_eunm','t_eunm',1,NULL)")

     # communications (the other recency leg) — Delta has ONLY comms: one live (5d), one tombstoned
     # (today). If the soft-delete filter broke, Delta would read as contacted today.
@@ -187,9 +196,10 @@ def main():
     r = run("investor_last_contact", {"name": "beta"})
     check(r["rows"][0]["days_since"] >= 39, "investor_last_contact days_since")
     check(run("comms_by_user", {"user": "Grant"})["row_count"] == 2,
-          "comms_by_user: grant's 2 live outbound (tombstoned excluded)")
+          "comms_by_user: grant's 2 live MATCHED outbound (tombstoned + unmatched excluded)")
     r = run("email_counts_by_user", {"user": "grant"})
-    check(r["rows"][0]["this_week"] == 1, "email_counts this_week = 1 live (tombstoned excluded)")
+    check(r["rows"][0]["this_week"] == 1,
+          "email_counts this_week = 1 live matched (tombstoned + unmatched excluded)")
     check(r["rows"][0]["ytd"] >= 1, "email_counts ytd")

     print("trust boundary")
@@ -49,6 +49,17 @@ axis is the **`graveyard` flag** (exclude `graveyard = 1` for "live"). Other tab
 to avoid importing the `__main__` server module — helpers take a `conn`, never import server).
 Keep the two in sync; the soft-delete test guards the copy.

+## Email/comms intents are MATCHED-ONLY
+
+The email-touching intents (`recent_emails`, `comms_by_user`, `email_counts_by_user`,
+`investor_last_contact`) surface only **investor-linked** email — an `email_investor_links` row
+must exist — exactly like the Communications panel's `query_email_activity`. Captured
+internal/vendor/personal mail is never counted or listed. The gate is
+`EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)`. **`comms_by_user` /
+`email_counts_by_user` originally omitted this** and counted the user's *entire* sent corpus —
+fixed; the runner test now seeds an unmatched sent email to guard it. Add this gate to any new
+email intent.
+
 ## Endpoint, caps, audit

 - `POST /api/query/nl` (`require_bot_or_admin`, read-only) — body `{question}` (local translate)