Restrict comms_by_user/email_counts_by_user to matched-investor email

Both NL-query intents counted/listed a user's ENTIRE captured sent corpus
(internal, vendor, personal mail) rather than only email to a matched investor
— they were missing the `EXISTS email_investor_links` gate that recent_emails
and the Communications panel's query_email_activity use. Their own docstrings
said "investor emails", so the behavior was wrong, not just loose.

Add the matched-only gate to both, mirroring query_email_activity. The runner
test now seeds an unmatched sent email and asserts it is excluded: without the
fix, comms_by_user returns 3 instead of 2 and this_week returns 2 instead of 1.
The prior fixture linked every email, so the leak went uncaught.
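
For illustration only, a minimal sketch of what a matched-only gate looks like
against an in-memory SQLite database. This is not the code from the changed
intents: the table names come from the test fixture in this commit, but the
`email_investor_links` columns (`email_id`, `investor_id`), the trimmed-down
schemas, and the `BASE`/`GATE` names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (id TEXT PRIMARY KEY, from_email TEXT, subject TEXT);
CREATE TABLE email_account_messages (
    id TEXT PRIMARY KEY, email_id TEXT, account_id TEXT,
    is_sent INTEGER, deleted_at TEXT);
-- Assumed columns: the real link table is not shown in this commit.
CREATE TABLE email_investor_links (email_id TEXT, investor_id TEXT);
""")

# Three live sent emails from one account; only two are linked to an investor.
conn.executemany("INSERT INTO emails VALUES (?, ?, ?)", [
    ("e1", "grant@ten31.xyz", "Fund update"),
    ("e2", "grant@ten31.xyz", "Follow-up"),
    ("eunm", "grant@ten31.xyz", "Internal: team lunch"),
])
conn.executemany("INSERT INTO email_account_messages VALUES (?, ?, ?, ?, ?)", [
    ("m1", "e1", "a_grant", 1, None),
    ("m2", "e2", "a_grant", 1, None),
    ("m3", "eunm", "a_grant", 1, None),
])
conn.executemany("INSERT INTO email_investor_links VALUES (?, ?)", [
    ("e1", "i_acme"),
    ("e2", "i_beta"),
])

# Hypothetical base query: live (not tombstoned) sent mail for one account.
BASE = """
SELECT COUNT(*) FROM emails e
JOIN email_account_messages m ON m.email_id = e.id
WHERE m.account_id = ? AND m.is_sent = 1 AND m.deleted_at IS NULL
"""
# The matched-only gate: keep an email only if at least one link row exists.
GATE = " AND EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)"

ungated = conn.execute(BASE, ("a_grant",)).fetchone()[0]
gated = conn.execute(BASE + GATE, ("a_grant",)).fetchone()[0]
print(ungated, gated)  # 3 2
```

Using EXISTS rather than a JOIN on the link table keeps one row per email even
if an email is linked to several investors, so counts are not inflated.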

Also document the matched-only rule in the nl-query guide and refresh the
AGENTS.md Current state (v93 deployed; this fix is pending a v94 s9pk because
the intents run on the box, not the bot).
Keysat
2026-06-18 20:24:52 -05:00
parent f7b03ee109
commit 2d43bad6fc
4 changed files with 38 additions and 10 deletions
+12 -2
@@ -101,6 +101,15 @@ def seed(conn):
email("edel", "grant@ten31.xyz", "Grant Smith", 0, "i_beta", "a_grant", 1, deleted=True) # tombstoned
email("ej", "jon@ten31.xyz", "Jonathan Lee", 0, "i_acme", "a_jon", 1) # jonathan today
email("ein", "alice@acme.com", "Alice Acme", 3, "i_acme", "a_grant", 0) # inbound 3d
+ # an UNMATCHED sent email by Grant (NO email_investor_links row) — captured, but not to a
+ # known investor. The investor-email intents are matched-only, so it must be EXCLUDED from
+ # comms_by_user / email_counts_by_user; without the matched-only filter it would inflate both.
+ c("INSERT INTO emails (id, rfc_message_id, from_email, from_name, sent_at, subject, "
+ "is_matched, match_status) VALUES ('eunm','rfc_eunm','grant@ten31.xyz','Grant Smith',?,"
+ "'Internal: team lunch',0,'unmatched')", (_ago(0),))
+ c("INSERT INTO email_account_messages (id, email_id, account_id, gmail_message_id, "
+ "gmail_thread_id, is_sent, deleted_at) VALUES "
+ "('eam_eunm','eunm','a_grant','g_eunm','t_eunm',1,NULL)")
# communications (the other recency leg) — Delta has ONLY comms: one live (5d), one tombstoned
# (today). If the soft-delete filter broke, Delta would read as contacted today.
@@ -187,9 +196,10 @@ def main():
r = run("investor_last_contact", {"name": "beta"})
check(r["rows"][0]["days_since"] >= 39, "investor_last_contact days_since")
check(run("comms_by_user", {"user": "Grant"})["row_count"] == 2,
- "comms_by_user: grant's 2 live outbound (tombstoned excluded)")
+ "comms_by_user: grant's 2 live MATCHED outbound (tombstoned + unmatched excluded)")
r = run("email_counts_by_user", {"user": "grant"})
- check(r["rows"][0]["this_week"] == 1, "email_counts this_week = 1 live (tombstoned excluded)")
+ check(r["rows"][0]["this_week"] == 1,
+ "email_counts this_week = 1 live matched (tombstoned + unmatched excluded)")
check(r["rows"][0]["ytd"] >= 1, "email_counts ytd")
print("trust boundary")