---
paths:
- backend/nl_query/**
---
# Natural-language query (W2)
Read this before editing the NL-query surface (`backend/nl_query/`). It is the read-only
"ask the database in plain English" layer, meant for two clients: the web "Ask" box and Matrix `@bot <question>`.
## The trust model — named intents, not a query language
There is **no generic SQL/AST compiler and no dynamically built identifiers.** Every query is
a fixed, hand-written, reviewed, parameterized statement in `intents.py`; the only thing a
caller (or the model) controls is a small set of typed **slot values**, bound as `?` params.
`runner.validate` is the trust boundary: it accepts only a known intent key and coerces each
slot to its declared type, rejecting anything off-spec. A malformed request is rejected outright;
no request can name a table or column, pick an operator, or write SQL. `run_query` never raises:
every failure returns a structured error dict (a bad `limit=abc` must not crash the thread).

To add a capability, add a `run_*` function and a registry entry (with its `slots` spec) in
`intents.py`; the translator prompt and the UI pick it up automatically from `catalog()`. Add a
test case.
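
A minimal sketch of that shape follows. It assumes an SQLite connection (which the `?` placeholders suggest) and a made-up `open_reminders` intent. Each registry entry pairs one fixed statement with a typed slot spec, `validate` only coerces values, and `run_query` folds every failure into an error dict. `INTENTS`, `InvalidRequest`, the `(coercer, default)` slot tuple and the `bad_slot`/`sql_error` codes are hypothetical. Only the names `validate`, `run_query`, `catalog` and `slots` come from this guide, and their real signatures may differ.

```python
import sqlite3
from typing import Any


class InvalidRequest(ValueError):
    """Hypothetical: a rejected request, carrying a machine-readable code."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(detail)
        self.code = code


def _positive_int(value: Any) -> int:
    """Slot coercer: int("abc") raises ValueError, which validate() turns into bad_slot."""
    n = int(value)
    if n <= 0:
        raise ValueError("must be positive")
    return n


# Hypothetical registry: one fixed SQL string per intent, plus {slot: (coercer, default)}.
INTENTS: dict[str, dict[str, Any]] = {
    "open_reminders": {
        "sql": "SELECT id, title FROM reminders WHERE deleted_at IS NULL ORDER BY id LIMIT ?",
        "slots": {"limit": (_positive_int, 20)},
        "params": ["limit"],  # order in which coerced slots bind to the ? placeholders
    },
}


def catalog() -> list[dict[str, Any]]:
    """The askable surface for the prompt/UI: intent keys and slot names, never SQL."""
    return [{"intent": key, "slots": sorted(spec["slots"])} for key, spec in INTENTS.items()]


def validate(intent: Any, slots: Any) -> dict[str, Any]:
    """Trust boundary: known intent key only, known slot names only, each value coerced."""
    spec = INTENTS.get(intent) if isinstance(intent, str) else None
    if spec is None:
        raise InvalidRequest("unknown_intent", f"unknown intent {intent!r}")
    if not isinstance(slots, dict):
        raise InvalidRequest("bad_slot", "slots must be an object")
    extra = set(slots) - set(spec["slots"])
    if extra:
        raise InvalidRequest("bad_slot", f"unexpected slots {sorted(extra)}")
    clean: dict[str, Any] = {}
    for name, (coerce, default) in spec["slots"].items():
        try:
            clean[name] = coerce(slots.get(name, default))
        except (TypeError, ValueError) as exc:
            raise InvalidRequest("bad_slot", f"{name}: {exc}") from exc
    return clean


def run_query(conn: sqlite3.Connection, intent: Any, slots: Any) -> dict[str, Any]:
    """Never raises: every failure comes back as a structured error dict."""
    try:
        clean = validate(intent, slots)
    except InvalidRequest as exc:
        return {"ok": False, "error": exc.code, "detail": str(exc)}
    spec = INTENTS[intent]
    try:
        rows = conn.execute(spec["sql"], [clean[p] for p in spec["params"]]).fetchall()
    except sqlite3.Error as exc:
        return {"ok": False, "error": "sql_error", "detail": str(exc)}
    return {"ok": True, "intent": intent, "slots": clean, "rows": rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO reminders (title, deleted_at) VALUES (?, ?)",
    [("call LP", None), ("old task", "2026-01-01"), ("send deck", None)],
)
print(run_query(conn, "open_reminders", {"limit": "abc"})["error"])  # bad_slot
print(run_query(conn, "open_reminders", {"limit": "5"})["rows"])  # [(1, 'call LP'), (3, 'send deck')]
```

The SQL text never depends on the input. A slot such as `{"table": "users"}` is rejected as an unknown slot name rather than interpreted.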
## Local-only — no Claude, no redaction here
Translation (question → `{intent, slots}`) runs on the **local Qwen via Spark Control**
(`translate.py`, reusing `ingest/llm.py`), the same sanctioned local leg as intake/digest. The
question never leaves the box, so there is **no Claude path and no redaction boundary**. That
was the whole point of the W2 simplification: the *answer* is sensitive and never leaves; the
*question* is generic English, translated locally. Translation was validated **12/12** on real example
questions against the live Spark (2026-06-18). The model output is still untrusted: it goes
straight through `runner.validate`, so a hallucinated intent is rejected. If the local model
ever proves too weak, a Claude-behind-redaction translator could drop in as an alternative
`chat_fn` without touching the validator/executor; that option is deliberately **not** built.

**Results never go to any model.** Summaries are deterministic local strings; rows render
client-side. Never add a "summarize these rows with an LLM" step, because that re-introduces the leak.
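
The sketch below shows the translation leg with the model injected as a plain `chat_fn`. That injection is also what lets the translator be tested offline. Treat it as a hedged sketch rather than the code in `translate.py`. The prompt wording, the JSON reply shape, the `KNOWN_INTENTS` set (a stand-in for the catalog, checked before `runner.validate` would run) and `summarize` are all assumptions.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for the catalog the real translator prompt is built from.
KNOWN_INTENTS = {"recent_emails", "comms_by_user", "email_counts_by_user"}

ChatFn = Callable[[str], str]  # assumption: prompt text in, raw model text out


def translate(question: str, chat_fn: ChatFn) -> dict[str, Any]:
    """Question -> {intent, slots}. Whatever the model says is treated as untrusted input."""
    prompt = (
        f"Pick one intent from {sorted(KNOWN_INTENTS)} and reply as JSON "
        f"with keys intent and slots.\nQuestion: {question}"
    )
    raw = chat_fn(prompt)
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return {"error": "no_match"}
    if not isinstance(parsed, dict):
        return {"error": "no_match"}
    intent, slots = parsed.get("intent"), parsed.get("slots", {})
    if not isinstance(intent, str) or intent not in KNOWN_INTENTS:
        return {"error": "unknown_intent"}  # a hallucinated intent stops here
    if not isinstance(slots, dict):
        return {"error": "no_match"}
    # Slots are still raw model output; runner.validate must coerce them before any SQL runs.
    return {"intent": intent, "slots": slots}


def summarize(intent: str, rows: list[Any]) -> str:
    """Deterministic local summary; rows are never handed to any model."""
    return f"{intent}: {len(rows)} matching row(s)"


def fake_chat(prompt: str) -> str:
    return '{"intent": "comms_by_user", "slots": {"user": "alice"}}'


def hallucinating_chat(prompt: str) -> str:
    return '{"intent": "drop_table", "slots": {}}'


print(translate("How many investor emails did Alice send?", fake_chat))
print(translate("Delete everything", hallucinating_chat))  # {'error': 'unknown_intent'}
```

Swapping in a different translator only changes what is passed as `chat_fn`. The parsing, rejection and validation path stays the same.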
## Soft-delete per table (the gotcha the design reviews caught)
The `fundraising_*` tables are a **hard-rebuilt projection** of the grid blob and have **no
`deleted_at` column** — do NOT add `deleted_at IS NULL` to them (it raises). Their live/retired
axis is the **`graveyard` flag** (exclude `graveyard = 1` for "live"). Other tables:
- `reminders` / `opportunities` / `communications` → filter `deleted_at IS NULL`.
- `emails` have no `deleted_at`; "live" = a non-tombstoned sighting (`EXISTS email_account_messages … deleted_at IS NULL`), mirroring `query_email_activity` / the digest.

`intents._last_activity_by_investor` **mirrors** `server.last_activity_by_investor` (duplicated
to avoid importing the `__main__` server module; helpers take a `conn` and never import server).
Keep the two in sync; the soft-delete test guards the copy.
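
The three live/retired rules can be exercised against throwaway SQLite tables. This sketch rests on a few assumptions. `fundraising_investors` stands in for any `fundraising_*` table, and the column sets are trimmed to what the filters touch. The `email_account_messages.email_id` join column is a guess, because the guide elides that part of the predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed schemas: only the columns the filters below touch.
    CREATE TABLE fundraising_investors (id INTEGER PRIMARY KEY, name TEXT,
                                        graveyard INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE email_account_messages (email_id INTEGER, deleted_at TEXT);

    INSERT INTO fundraising_investors (name, graveyard) VALUES ('Acme', 0), ('Retired LP', 1);
    INSERT INTO reminders (title, deleted_at) VALUES ('live', NULL), ('gone', '2026-06-01');
    INSERT INTO emails (id, subject) VALUES (1, 'kept'), (2, 'tombstoned'), (3, 'never sighted');
    INSERT INTO email_account_messages VALUES (1, NULL), (2, '2026-06-02');
""")

# fundraising_*: hard-rebuilt projection with no deleted_at; "live" means not graveyarded.
LIVE_INVESTORS = "SELECT name FROM fundraising_investors WHERE graveyard = 0"
# reminders / opportunities / communications: classic soft delete.
LIVE_REMINDERS = "SELECT title FROM reminders WHERE deleted_at IS NULL"
# emails: no deleted_at of their own; live = at least one non-tombstoned sighting.
LIVE_EMAILS = """
    SELECT e.subject FROM emails e
    WHERE EXISTS (SELECT 1 FROM email_account_messages m
                  WHERE m.email_id = e.id AND m.deleted_at IS NULL)
"""

for label, sql in [("investors", LIVE_INVESTORS), ("reminders", LIVE_REMINDERS),
                   ("emails", LIVE_EMAILS)]:
    print(label, [row[0] for row in conn.execute(sql)])

# The gotcha: a deleted_at filter on a fundraising_* table is an error, not a no-op.
try:
    conn.execute("SELECT name FROM fundraising_investors WHERE deleted_at IS NULL")
except sqlite3.OperationalError as exc:
    print("raises:", exc)
```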
## Email/comms intents are MATCHED-ONLY
The email-touching intents (`recent_emails`, `comms_by_user`, `email_counts_by_user`,
`investor_last_contact`) surface only **investor-linked** email (an `email_investor_links` row
must exist), exactly like the Communications panel's `query_email_activity`. Captured
internal/vendor/personal mail is never counted or listed. The gate is
`EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)`. **`comms_by_user` /
`email_counts_by_user` originally omitted this** and counted the user's *entire* sent corpus.
That is fixed, and the runner test now seeds an unmatched sent email to guard it. Add this gate
to any new email intent.
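
The following SQLite reproduction shows both the leak and the gate. Like the runner test, it seeds an unmatched sent email. The `emails.from_user` column, the investor ids and the rows are hypothetical; the `EXISTS` predicate is the one quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed schemas for the demo.
    CREATE TABLE emails (id INTEGER PRIMARY KEY, from_user TEXT, subject TEXT);
    CREATE TABLE email_investor_links (email_id INTEGER, investor_id INTEGER);

    INSERT INTO emails VALUES
        (1, 'alice', 'Q3 update'),
        (2, 'alice', 'Term sheet'),
        (3, 'alice', 'Lunch?'),       -- internal mail: no investor link
        (4, 'bob',   'Intro');
    -- Email 2 is linked to two investors.
    INSERT INTO email_investor_links VALUES (1, 10), (2, 11), (2, 12), (4, 10);
""")

# Matched-only: the EXISTS gate from this guide.
COMMS_BY_USER = """
    SELECT COUNT(*) FROM emails e
    WHERE e.from_user = ?
      AND EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)
"""
# The original bug: no gate, so the user's whole sent corpus is counted.
UNGATED = "SELECT COUNT(*) FROM emails e WHERE e.from_user = ?"
# A tempting alternative: JOIN instead of EXISTS, which double-counts multi-linked email.
JOINED = """
    SELECT COUNT(*) FROM emails e
    JOIN email_investor_links l ON l.email_id = e.id
    WHERE e.from_user = ?
"""

for name, sql in [("gated", COMMS_BY_USER), ("ungated", UNGATED), ("join", JOINED)]:
    print(name, conn.execute(sql, ("alice",)).fetchone()[0])  # gated 2, ungated 3, join 3
```

The `EXISTS` form counts each email once regardless of how many investors it is linked to. A plain `JOIN` would reach the same wrong total here for a different reason.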
## Endpoint, caps, audit
- `POST /api/query/nl` (`require_bot_or_admin`, read-only): the body is either `{question}` (translated locally)
or `{intent, slots}` (direct, e.g. a UI re-run). Returns `{intent, slots, rows, summary,
question}`. `GET /api/query/catalog` returns the askable surface for the UI.
- **Clients (thin):** the **Matrix Q&A** surface is built. It lives bot-side in
`backend/matrix_intake/query.py` (trigger grammar + deterministic answer rendering) plus
`crm_client.nl_query`, and ships on the Spark (no s9pk for the bot). It has two entry points: a
**dedicated Q&A room** (`MATRIX_QUERY_ROOM`, where every message is a question) and the `?`/`@bot`
trigger in the intake room. **It depends on this endpoint being live on the box**, which lands
with the v93 s9pk (reminders + W2); deploy the bot only after that, or it 404s. See the
matrix-intake guide. The **web "Ask" box** (Communications tab) is the remaining client, not yet built.
- Status: local-model outage → **503**; unexpected SQL fault → **500**; everything else
(a hit, or a soft `no_match`/`unknown_intent`) → **200** with the structured result, because
the UI always wants the interpreted query back, not a bare code.
- Every executed query writes an audit row (`audit_log`, `entity_type='nl_query'`) so a query
through a leaked/automated credential is detectable. Global row ceiling `MAX_ROWS=500`.
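
The status rule, audit and cap can be expressed as one small pure function. This is only a sketch. `handle_nl`, `LocalModelDown`, `VALIDATION_ERRORS` and the injected `translate`/`run`/`audit` callables are hypothetical. It also assumes the `MAX_ROWS` trim happens in the handler, which may not be where the real ceiling is enforced.

```python
from typing import Any, Callable

MAX_ROWS = 500  # global row ceiling from this guide
VALIDATION_ERRORS = {"unknown_intent", "bad_slot"}  # hypothetical: rejected before any SQL ran

Result = dict[str, Any]


class LocalModelDown(Exception):
    """Hypothetical: the local translator (Spark Control) is unreachable."""


def handle_nl(body: Result,
              translate: Callable[[str], Result],
              run: Callable[[Any, Any], Result],
              audit: Callable[[Result], None]) -> tuple[int, Result]:
    """POST /api/query/nl as a pure function: returns (HTTP status, JSON payload)."""
    question = body.get("question")
    if question is not None:
        try:
            parsed = translate(question)  # {question}: local translation
        except LocalModelDown:
            return 503, {"error": "local_model_unavailable", "question": question}
    else:
        parsed = {"intent": body.get("intent"), "slots": body.get("slots") or {}}  # direct re-run
    if "error" in parsed:  # soft no_match / unknown_intent: still 200
        return 200, {**parsed, "question": question}

    intent, slots = parsed["intent"], parsed["slots"]
    result = run(intent, slots)
    if result.get("error") not in VALIDATION_ERRORS:
        # The query reached the database: record it, success or fault.
        audit({"entity_type": "nl_query", "intent": intent, "slots": slots})
    if result.get("error") == "sql_error":
        return 500, {"error": "sql_error", "intent": intent, "question": question}
    if "error" in result:  # e.g. a bad slot: soft, still 200
        return 200, {**result, "intent": intent, "slots": slots, "question": question}
    rows = result["rows"][:MAX_ROWS]
    return 200, {"intent": intent, "slots": slots, "rows": rows,
                 "summary": f"{len(rows)} row(s)", "question": question}
```

Returning 200 for every soft outcome means the UI always gets the interpreted `{intent, slots}` back alongside the result. Only a dead model or a genuine fault escalates to 503 or 500.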
## Tests + dev harness
`test_nl_query.py` (runner: every intent + soft-delete on both recency legs + injection-safety
+ caps), `test_translate.py` (offline translator via an injected `chat_fn`), and
`test_nl_query_endpoint.py` (HTTP auth/wiring/503, local model forced down via a dead
`SPARK_CONTROL_URL` port). `try_questions.py` is a dev harness (not a test) that fires
questions at the real local model and prints the translation — the cheap way to check quality.
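
The injection-safety idea reduces to one property. A hostile slot value is bound as a `?` parameter and compared as data, so it can neither widen a `WHERE` clause nor run a second statement. Below is a pytest-style sketch against a throwaway SQLite table. The `investors` table, `FIND_INVESTOR` and the test names are hypothetical, not the contents of `test_nl_query.py`.

```python
import sqlite3

# Hypothetical fixed statement: the caller-controlled value only ever binds to the ?.
FIND_INVESTOR = "SELECT id, name FROM investors WHERE name = ?"


def make_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE investors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO investors (name) VALUES ('Acme Capital')")
    return conn


def test_slot_value_is_bound_not_spliced() -> None:
    conn = make_conn()
    hostile = "Acme Capital' OR '1'='1"
    # Bound as a parameter, the whole string is compared literally: no match, no widening.
    assert conn.execute(FIND_INVESTOR, (hostile,)).fetchall() == []


def test_stacked_statement_cannot_run() -> None:
    conn = make_conn()
    hostile = "x'); DROP TABLE investors; --"
    conn.execute(FIND_INVESTOR, (hostile,)).fetchall()
    # The table still exists and still holds its row.
    assert conn.execute("SELECT COUNT(*) FROM investors").fetchone()[0] == 1


if __name__ == "__main__":
    test_slot_value_is_bound_not_spliced()
    test_stacked_statement_cannot_run()
```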