Both NL-query intents counted/listed a user's ENTIRE captured sent corpus (internal, vendor, personal mail) rather than only email to a matched investor — they were missing the `EXISTS email_investor_links` gate that recent_emails and the Communications panel's query_email_activity use. Their own docstrings said "investor emails", so the behavior was wrong, not just loose. Add the matched-only gate to both, mirroring query_email_activity. The runner test now seeds an unmatched sent email and asserts it is excluded (without the fix comms_by_user returns 3 not 2, this_week 2 not 1) — the prior fixture linked every email, so the leak went uncaught. Also documents the matched-only rule in the nl-query guide, and refreshes the AGENTS.md Current state (v93 deployed; this fix pending a v94 s9pk since the intents run on the box, not the bot).
Natural-language query (W2)
Read this before editing the NL-query surface (backend/nl_query/). It is the read-only
"ask the database in plain English" layer — web "Ask" box + Matrix @bot <question>.
The trust model — named intents, not a query language
There is no generic SQL/AST compiler and no dynamically-built identifiers. Every query is
a fixed, hand-written, reviewed, parameterized statement in intents.py; the only thing a
caller (or the model) controls is a small set of typed slot values, bound as ? params.
runner.validate is the trust boundary: it accepts only a known intent key and coerces each
slot to its declared type, rejecting anything off-spec. A request that's wrong is rejected;
it can never name a table/column, pick an operator, or write SQL. run_query never raises —
every failure returns a structured error dict (a bad limit=abc must not crash the thread).
To add a capability: add a run_* + a registry entry (with its slots spec) in intents.py;
the translator prompt and the UI pick it up automatically from catalog(). Add a test case.
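The trust boundary described above can be sketched as follows. This is a minimal illustration, not the real intents.py or runner: the open_reminders intent, its limit slot, the error keys, and the INTENTS dict layout are all hypothetical. It shows a fixed parameterized statement, a validator that accepts only known intent keys and coerces slots to their declared types, and a run_query that returns an error dict instead of raising.

```python
import sqlite3

# Hypothetical intent: fixed, hand-written SQL; the only caller-controlled
# value is the typed slot, bound as a ? parameter.
def run_open_reminders(conn, slots):
    cur = conn.execute(
        "SELECT title FROM reminders WHERE deleted_at IS NULL ORDER BY id LIMIT ?",
        (slots["limit"],),
    )
    return [row[0] for row in cur.fetchall()]

# Hypothetical registry layout: intent key -> runner + declared slot types.
INTENTS = {
    "open_reminders": {"run": run_open_reminders, "slots": {"limit": int}},
}

def validate(request):
    """Accept only a known intent key and coerce each slot to its declared type."""
    intent = request.get("intent")
    spec = INTENTS.get(intent) if isinstance(intent, str) else None
    if spec is None:
        return None, {"error": "unknown_intent"}
    raw = request.get("slots") or {}
    for name in raw:
        if name not in spec["slots"]:  # undeclared slot: reject, never pass through
            return None, {"error": "bad_slot", "slot": name}
    slots = {}
    for name, typ in spec["slots"].items():
        try:
            slots[name] = typ(raw[name])
        except (KeyError, TypeError, ValueError):
            return None, {"error": "bad_slot", "slot": name}
    return {"intent": intent, "slots": slots}, None

def run_query(conn, request):
    """Never raises: every failure comes back as a structured error dict."""
    try:
        valid, err = validate(request)
        if err:
            return err
        rows = INTENTS[valid["intent"]]["run"](conn, valid["slots"])
        return {"intent": valid["intent"], "slots": valid["slots"], "rows": rows}
    except Exception as exc:  # deliberate catch-all at the boundary
        return {"error": "internal", "detail": str(exc)}
```

With this shape, a request carrying limit=abc comes back as an error dict rather than an exception, and a request naming an unregistered intent never reaches any SQL.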
Local-only — no Claude, no redaction here
Translation (question → {intent, slots}) runs on the local Qwen via Spark Control
(translate.py, reusing ingest/llm.py), the same sanctioned local leg as intake/digest. The
question never leaves the box, so there is no Claude path and no redaction boundary — that
was the whole point of the W2 simplification (the answer is sensitive and never leaves; the
question is generic English, translated locally). Validated 12/12 on real example
questions against the live Spark (2026-06-18). The model output is still untrusted: it goes
straight through runner.validate, so a hallucinated intent is rejected. If the local model
ever proves too weak, a Claude-behind-redaction translator could drop in as an alternative
chat_fn without touching the validator/executor — deliberately not built.
Results never go to any model. Summaries are deterministic local strings; rows render client-side. Never add a "summarize these rows with an LLM" step — that re-introduces the leak.
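As a rough sketch of the translation leg described above: the function below takes an injected chat_fn and treats its output as untrusted. The prompt wording, the KNOWN_INTENTS subset, the chat_fn signature (prompt string in, raw text out), and the error keys are assumptions for illustration; the real translate.py may differ. Only the question goes to the model, never any rows.

```python
import json

# Illustrative subset; the guide says the real list comes from catalog().
KNOWN_INTENTS = {"recent_emails", "comms_by_user"}

def translate(question, chat_fn):
    """Turn a question into {intent, slots} via an injected chat function.

    chat_fn is assumed to take a prompt string and return the model's raw
    text. Its output is untrusted and is checked before use.
    """
    prompt = (
        "Pick one intent from: " + ", ".join(sorted(KNOWN_INTENTS)) + ".\n"
        'Reply as JSON: {"intent": "...", "slots": {}}.\n'
        "Question: " + question
    )
    raw = chat_fn(prompt)
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return {"error": "no_match"}
    intent = parsed.get("intent") if isinstance(parsed, dict) else None
    if not isinstance(intent, str) or intent not in KNOWN_INTENTS:
        return {"error": "unknown_intent"}  # hallucinated intent is rejected
    slots = parsed.get("slots")
    return {"intent": intent, "slots": slots if isinstance(slots, dict) else {}}
```

Because the model call is a plain parameter, an offline test can pass a lambda that returns canned text, and an alternative translator could be swapped in without touching the checks.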
Soft-delete per table (the gotcha the design reviews caught)
The fundraising_* tables are a hard-rebuilt projection of the grid blob and have no
deleted_at column — do NOT add deleted_at IS NULL to them (it raises). Their live/retired
axis is the graveyard flag (exclude graveyard = 1 for "live"). Other tables:
- reminders / opportunities / communications → filter deleted_at IS NULL.
- emails have no deleted_at; "live" = a non-tombstoned sighting (EXISTS email_account_messages … deleted_at IS NULL), mirroring query_email_activity / the digest.
intents._last_activity_by_investor mirrors server.last_activity_by_investor (duplicated
to avoid importing the __main__ server module — helpers take a conn, never import server).
Keep the two in sync; the soft-delete test guards the copy.
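The per-table rule can be illustrated with an in-memory SQLite database. The table name fundraising_rows and every column other than deleted_at and graveyard are hypothetical; the point is only that the two kinds of table need different "live" filters, and that applying deleted_at IS NULL to a table without that column fails outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas: only deleted_at and graveyard come from the guide.
    CREATE TABLE fundraising_rows (id INTEGER PRIMARY KEY, name TEXT,
                                   graveyard INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT);
    INSERT INTO fundraising_rows (name, graveyard)
        VALUES ('Live Fund', 0), ('Retired Fund', 1);
    INSERT INTO reminders (title, deleted_at)
        VALUES ('call back', NULL), ('stale', '2026-01-01');
""")

# Projection table: the live/retired axis is the graveyard flag.
live_funds = [r[0] for r in conn.execute(
    "SELECT name FROM fundraising_rows WHERE graveyard != 1 ORDER BY id")]

# Soft-deleted table: filter on deleted_at IS NULL.
live_reminders = [r[0] for r in conn.execute(
    "SELECT title FROM reminders WHERE deleted_at IS NULL ORDER BY id")]

# The wrong filter on the projection table raises instead of returning rows.
try:
    conn.execute("SELECT name FROM fundraising_rows WHERE deleted_at IS NULL")
    wrong_filter_error = None
except sqlite3.OperationalError as exc:
    wrong_filter_error = str(exc)

print(live_funds, live_reminders, wrong_filter_error)
```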
Email/comms intents are MATCHED-ONLY
The email-touching intents (recent_emails, comms_by_user, email_counts_by_user,
investor_last_contact) surface only investor-linked email — an email_investor_links row
must exist — exactly like the Communications panel's query_email_activity. Captured
internal/vendor/personal mail is never counted or listed. The gate is
EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id). comms_by_user /
email_counts_by_user originally omitted this and counted the user's entire sent corpus —
fixed; the runner test now seeds an unmatched sent email to guard it. Add this gate to any new
email intent.
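A small sketch of what the gate changes, using an in-memory SQLite database. The sender and subject columns and the sample rows are made up for illustration; only emails.id, email_investor_links.email_id, and the EXISTS clause itself come from the text above. The ungated statement stands in for the old behavior and is shown only for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; real tables have more columns.
    CREATE TABLE emails (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
    CREATE TABLE email_investor_links (email_id INTEGER, investor_id INTEGER);
    INSERT INTO emails (id, sender, subject) VALUES
        (1, 'alice', 'Deck for Fund A'),
        (2, 'alice', 'Follow-up with Fund B'),
        (3, 'alice', 'Lunch order');          -- unmatched: no link row
    INSERT INTO email_investor_links (email_id, investor_id) VALUES (1, 10), (2, 11);
""")

# Old behavior (for contrast): counts the whole sent corpus.
UNGATED_SQL = "SELECT COUNT(*) FROM emails e WHERE e.sender = ?"

# Matched-only: a link row must exist for the email to count.
GATED_SQL = (
    "SELECT COUNT(*) FROM emails e WHERE e.sender = ? "
    "AND EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)"
)

ungated = conn.execute(UNGATED_SQL, ("alice",)).fetchone()[0]
gated = conn.execute(GATED_SQL, ("alice",)).fetchone()[0]
print(ungated, gated)  # 3 without the gate, 2 with it
```

A fixture in which every email is linked cannot tell these two statements apart, which is why a regression test needs at least one unmatched row.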
Endpoint, caps, audit
- POST /api/query/nl (require_bot_or_admin, read-only): body {question} (local translate) or {intent, slots} (direct, e.g. a UI re-run). Returns {intent, slots, rows, summary, question}. GET /api/query/catalog returns the askable surface for the UI.
- Clients (thin): the Matrix Q&A surface is built. It lives bot-side in backend/matrix_intake/query.py (trigger grammar + deterministic answer rendering) + crm_client.nl_query, and ships on the Spark (no s9pk for the bot). Two entry points: a dedicated Q&A room (MATRIX_QUERY_ROOM, every message is a question) and the ? / @bot trigger in the intake room. It depends on this endpoint being live on the box, which lands with the v93 s9pk (reminders + W2); deploy the bot only after that, or it 404s. See the matrix-intake guide. The web "Ask" box (Communications tab) is the remaining client.
- Status: local-model outage → 503; unexpected SQL fault → 500; everything else (a hit, or a soft no_match/unknown_intent) → 200 with the structured result, because the UI always wants the interpreted query back, not a bare code.
- Every executed query writes an audit row (audit_log, entity_type='nl_query') so a query through a leaked/automated credential is detectable. Global row ceiling MAX_ROWS=500.
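The status rule above could be expressed as a lookup on the structured result. This is only a sketch: the error keys model_unavailable and sql_error are hypothetical names, since the guide does not say how the endpoint distinguishes the two hard failures.

```python
# Hypothetical error kinds; only the 503 / 500 / 200 split comes from the guide.
STATUS_BY_ERROR = {
    "model_unavailable": 503,  # local-model outage
    "sql_error": 500,          # unexpected SQL fault
}

def status_for(result):
    """Map a structured result dict to an HTTP status code.

    Hits and soft failures (no_match, unknown_intent) both return 200 so the
    caller always receives the interpreted query, not a bare code.
    """
    return STATUS_BY_ERROR.get(result.get("error"), 200)
```

Keeping soft failures at 200 means a client only has to treat 5xx as "try again later" and can render everything else from the response body.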
Tests + dev harness
test_nl_query.py (runner: every intent + soft-delete on both recency legs + injection-safety + caps), test_translate.py (offline translator via an injected chat_fn), and test_nl_query_endpoint.py (HTTP auth/wiring/503, local model forced down via a dead SPARK_CONTROL_URL port). try_questions.py is a dev harness (not a test) that fires questions at the real local model and prints the translation, which is the cheap way to check quality.