Files

T

Keysat 2d43bad6fc Restrict comms_by_user/email_counts_by_user to matched-investor email

Both NL-query intents counted/listed a user's ENTIRE captured sent corpus
(internal, vendor, personal mail) rather than only email to a matched investor
— they were missing the `EXISTS email_investor_links` gate that recent_emails
and the Communications panel's query_email_activity use. Their own docstrings
said "investor emails", so the behavior was wrong, not just loose.

Add the matched-only gate to both, mirroring query_email_activity. The runner
test now seeds an unmatched sent email and asserts it is excluded (without the
fix comms_by_user returns 3 not 2, this_week 2 not 1) — the prior fixture
linked every email, so the leak went uncaught.

Also documents the matched-only rule in the nl-query guide, and refreshes the
AGENTS.md Current state (v93 deployed; this fix pending a v94 s9pk since the
intents run on the box, not the bot).

2026-06-18 20:24:52 -05:00

5.5 KiB

Raw Blame History

paths

backend/nl_query/**

Natural-language query (W2)

Read this before editing the NL-query surface (backend/nl_query/). It is the read-only "ask the database in plain English" layer — web "Ask" box + Matrix @bot <question>.

The trust model — named intents, not a query language

There is no generic SQL/AST compiler and no dynamically-built identifiers. Every query is a fixed, hand-written, reviewed, parameterized statement in intents.py; the only thing a caller (or the model) controls is a small set of typed slot values, bound as ? params. runner.validate is the trust boundary: it accepts only a known intent key and coerces each slot to its declared type, rejecting anything off-spec. A request that's wrong is rejected; it can never name a table/column, pick an operator, or write SQL. run_query never raises — every failure returns a structured error dict (a bad limit=abc must not crash the thread).

To add a capability: add a run_* + a registry entry (with its slots spec) in intents.py; the translator prompt and the UI pick it up automatically from catalog(). Add a test case.

Local-only — no Claude, no redaction here

Translation (question → {intent, slots}) runs on the local Qwen via Spark Control (translate.py, reusing ingest/llm.py), the same sanctioned local leg as intake/digest. The question never leaves the box, so there is no Claude path and no redaction boundary — that was the whole point of the W2 simplification (the answer is sensitive and never leaves; the question is generic English, translated locally). Validated 12/12 on real example questions against the live Spark (2026-06-18). The model output is still untrusted: it goes straight through runner.validate, so a hallucinated intent is rejected. If the local model ever proves too weak, a Claude-behind-redaction translator could drop in as an alternative chat_fn without touching the validator/executor — deliberately not built.

Results never go to any model. Summaries are deterministic local strings; rows render client-side. Never add a "summarize these rows with an LLM" step — that re-introduces the leak.

Soft-delete per table (the gotcha the design reviews caught)

The fundraising_* tables are a hard-rebuilt projection of the grid blob and have no deleted_at column — do NOT add deleted_at IS NULL to them (it raises). Their live/retired axis is the graveyard flag (exclude graveyard = 1 for "live"). Other tables:

reminders / opportunities / communications → filter deleted_at IS NULL.
emails have no deleted_at; "live" = a non-tombstoned sighting (EXISTS email_account_messages … deleted_at IS NULL), mirroring query_email_activity / the digest.

intents._last_activity_by_investor mirrors server.last_activity_by_investor (duplicated to avoid importing the __main__ server module — helpers take a conn, never import server). Keep the two in sync; the soft-delete test guards the copy.

Email/comms intents are MATCHED-ONLY

The email-touching intents (recent_emails, comms_by_user, email_counts_by_user, investor_last_contact) surface only investor-linked email — an email_investor_links row must exist — exactly like the Communications panel's query_email_activity. Captured internal/vendor/personal mail is never counted or listed. The gate is EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id). comms_by_user / email_counts_by_user originally omitted this and counted the user's entire sent corpus — fixed; the runner test now seeds an unmatched sent email to guard it. Add this gate to any new email intent.

Endpoint, caps, audit

POST /api/query/nl (require_bot_or_admin, read-only) — body {question} (local translate) or {intent, slots} (direct, e.g. a UI re-run). Returns {intent, slots, rows, summary, question}. GET /api/query/catalog returns the askable surface for the UI.
Clients (thin): the Matrix Q&A surface is built — it lives bot-side in backend/matrix_intake/query.py (trigger grammar + deterministic answer rendering) + crm_client.nl_query, and ships on the Spark (no s9pk for the bot). Two entry points: a dedicated Q&A room (MATRIX_QUERY_ROOM, every message is a question) and the ?/@bot trigger in the intake room. It depends on this endpoint being live on the box — which lands with the v93 s9pk (reminders + W2); deploy the bot only after that, or it 404s. See the matrix-intake guide. The web "Ask" box (Communications tab) is the remaining client.
Status: local-model outage → 503; unexpected SQL fault → 500; everything else (a hit, or a soft no_match/unknown_intent) → 200 with the structured result, because the UI always wants the interpreted query back, not a bare code.
Every executed query writes an audit row (audit_log, entity_type='nl_query') so a query through a leaked/automated credential is detectable. Global row ceiling MAX_ROWS=500.

Tests + dev harness

test_nl_query.py (runner: every intent + soft-delete on both recency legs + injection-safety

caps), test_translate.py (offline translator via an injected chat_fn), and test_nl_query_endpoint.py (HTTP auth/wiring/503, local model forced down via a dead SPARK_CONTROL_URL port). try_questions.py is a dev harness (not a test) that fires questions at the real local model and prints the translation — the cheap way to check quality.

5.5 KiB Raw Blame History