ten31-database/docs/guides/nl-query.md

---
paths:
  - backend/nl_query/**
---

# Natural-language query (W2)

Read this before editing the NL-query surface (`backend/nl_query/`). It is the read-only
"ask the database in plain English" layer — web "Ask" box + Matrix `@bot <question>`.

## The trust model — named intents, not a query language

There is **no generic SQL/AST compiler and no dynamically-built identifiers.** Every query is
a fixed, hand-written, reviewed, parameterized statement in `intents.py`; the only thing a
caller (or the model) controls is a small set of typed **slot values**, bound as `?` params.
`runner.validate` is the trust boundary: it accepts only a known intent key and coerces each
slot to its declared type, rejecting anything off-spec. A request that's wrong is rejected;
it can never name a table/column, pick an operator, or write SQL. `run_query` never raises —
every failure returns a structured error dict (a bad `limit=abc` must not crash the thread).
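A minimal sketch of what such a trust boundary can look like, not the actual `runner.validate`. The registry shape (`SLOT_SPECS`), the coercer helpers, and the `bad_slot` / `bad_slots` error keys are assumptions made for illustration; only `unknown_intent` and `MAX_ROWS=500` come from this guide.

```python
from typing import Any

MAX_ROWS = 500  # global row ceiling named in this guide


def _as_int(value: Any) -> int:
    if isinstance(value, bool):  # bool is an int subclass; refuse it explicitly
        raise ValueError("bool is not an int slot")
    return int(value)  # int("abc") raises ValueError, int(None) raises TypeError


def _as_str(value: Any) -> str:
    if not isinstance(value, str):
        raise ValueError("expected a string")
    return value.strip()


# Hypothetical registry shape: intent key -> {slot name: coercer}.
SLOT_SPECS = {
    "recent_emails": {"investor": _as_str, "limit": _as_int},
}


def validate(intent: Any, slots: Any) -> dict:
    """Return a cleaned request or a structured error dict; never raise."""
    if not isinstance(intent, str) or intent not in SLOT_SPECS:
        return {"ok": False, "error": "unknown_intent"}
    if not isinstance(slots, dict):
        return {"ok": False, "error": "bad_slots"}
    spec = SLOT_SPECS[intent]
    clean = {}
    for name, value in slots.items():
        coerce = spec.get(name)
        if coerce is None:  # off-spec slot: reject rather than silently ignore
            return {"ok": False, "error": "bad_slot", "slot": name}
        try:
            clean[name] = coerce(value)
        except (TypeError, ValueError, OverflowError):
            return {"ok": False, "error": "bad_slot", "slot": name}
    if "limit" in clean:  # clamp to the global ceiling
        clean["limit"] = max(1, min(clean["limit"], MAX_ROWS))
    return {"ok": True, "intent": intent, "slots": clean}
```

Under these assumptions, `validate("recent_emails", {"limit": "abc"})` comes back as an error dict instead of raising, and a caller has no slot through which to pass a table or column name.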

To add a capability: add a `run_*` function and a registry entry (with its `slots` spec) in
`intents.py`; the translator prompt and the UI pick it up automatically from `catalog()`. Add a
test case for the new intent.
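As a rough illustration of that recipe, the sketch below adds one made-up intent. The intent name `open_reminders`, the `REGISTRY` layout, the `title` / `due_date` columns, and the use of `sqlite3` are all assumptions; the real registry in `intents.py` may be shaped differently.

```python
import sqlite3


def run_open_reminders(conn, slots):
    """Hypothetical intent: live reminders due on or before a date."""
    # Fixed, hand-written SQL. Slot values only ever appear as bound ? params.
    sql = (
        "SELECT id, title, due_date FROM reminders "
        "WHERE deleted_at IS NULL AND due_date <= ? "
        "ORDER BY due_date LIMIT ?"
    )
    cur = conn.execute(sql, (slots["before"], slots["limit"]))
    return [dict(row) for row in cur.fetchall()]


# Hypothetical registry shape: the entry carries the runner and its slot spec.
REGISTRY = {
    "open_reminders": {
        "run": run_open_reminders,
        "description": "Live reminders due on or before a date",
        "slots": {"before": "date", "limit": "int"},
    },
}


def catalog():
    """The askable surface, derived from the registry so there is one list to maintain."""
    return [
        {"intent": key, "description": REGISTRY[key]["description"], "slots": REGISTRY[key]["slots"]}
        for key in sorted(REGISTRY)
    ]


# Tiny in-memory demo against a made-up schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets dict(row) work in the runner
conn.execute(
    "CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, due_date TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO reminders (title, due_date, deleted_at) VALUES (?, ?, ?)",
    [
        ("call LP", "2026-01-05", None),
        ("old task", "2026-01-02", "2026-01-03"),  # soft-deleted, must not appear
        ("later", "2026-03-01", None),  # outside the date window
    ],
)
rows = REGISTRY["open_reminders"]["run"](conn, {"before": "2026-01-31", "limit": 10})
```

Because `catalog()` is derived from the same registry, nothing else needs to be touched for the new intent to be advertised.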

## Local-only — no Claude, no redaction here

Translation (question → `{intent, slots}`) runs on the **local Qwen via Spark Control**
(`translate.py`, reusing `ingest/llm.py`), the same sanctioned local leg as intake/digest. The
question never leaves the box, so there is **no Claude path and no redaction boundary** — that
was the whole point of the W2 simplification (the *answer* is sensitive and never leaves; the
*question* is generic English, translated locally). Validated **12/12** on real example
questions against the live Spark (2026-06-18). The model output is still untrusted: it goes
straight through `runner.validate`, so a hallucinated intent is rejected. If the local model
ever proves too weak, a Claude-behind-redaction translator could drop in as an alternative
`chat_fn` without touching the validator/executor — deliberately **not** built.
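The injected-`chat_fn` seam can be sketched roughly as follows. This is not the code in `translate.py`: the `translate` signature, the `translate_failed` error key, and `stub_validate` (a stand-in for `runner.validate`) are assumptions. Model-outage handling (the 503 case) is deliberately left out.

```python
import json


def translate(question, chat_fn, validate):
    """question -> validated {intent, slots}, or a structured error dict."""
    try:
        parsed = json.loads(chat_fn(question))
    except (TypeError, ValueError):
        # Non-JSON or non-string reply from the model becomes a soft error.
        return {"ok": False, "error": "translate_failed"}
    if not isinstance(parsed, dict):
        return {"ok": False, "error": "translate_failed"}
    # The model output is untrusted, so it goes straight through the validator.
    return validate(parsed.get("intent"), parsed.get("slots") or {})


def stub_validate(intent, slots):
    # Stand-in for runner.validate: one known intent, for illustration only.
    if intent != "recent_emails":
        return {"ok": False, "error": "unknown_intent"}
    return {"ok": True, "intent": intent, "slots": slots}


# Fake chat functions play the role of the local model, so this runs offline.
good = translate(
    "latest emails?",
    lambda q: '{"intent": "recent_emails", "slots": {"limit": 5}}',
    stub_validate,
)
hallucinated = translate("drop it all", lambda q: '{"intent": "drop_tables"}', stub_validate)
garbage = translate("??", lambda q: "sorry, I cannot help", stub_validate)
```

Swapping the translator then means passing a different `chat_fn`; the validator and executor stay untouched, which is the property the paragraph above relies on.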

**Results never go to any model.** Summaries are deterministic local strings; rows render
client-side. Never add a "summarize these rows with an LLM" step — that re-introduces the leak.

## Soft-delete per table (the gotcha the design reviews caught)

The `fundraising_*` tables are a **hard-rebuilt projection** of the grid blob and have **no
`deleted_at` column**, so do NOT add `deleted_at IS NULL` to them (the query raises on the
missing column). Their live/retired axis is the **`graveyard` flag** (exclude `graveyard = 1`
for "live"). Other tables:

- `reminders` / `opportunities` / `communications` → filter `deleted_at IS NULL`.
- `emails` has no `deleted_at`; "live" = a non-tombstoned sighting (`EXISTS email_account_messages … deleted_at IS NULL`), mirroring `query_email_activity` / the digest.
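The per-table rule can be demonstrated with a throwaway in-memory database. Everything about the schema here is assumed: `fundraising_rows` is a hypothetical stand-in for one of the `fundraising_*` tables, the column names are invented, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fundraising_rows (id INTEGER PRIMARY KEY, investor TEXT, graveyard INTEGER);
    CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT);
    INSERT INTO fundraising_rows (investor, graveyard) VALUES ('Acme', 0), ('Retired LP', 1);
    INSERT INTO reminders (title, deleted_at) VALUES ('call Acme', NULL), ('stale', '2026-01-01');
""")

# Projection tables: "live" means the graveyard flag is not set.
live_investors = [
    row[0]
    for row in conn.execute(
        "SELECT investor FROM fundraising_rows "
        "WHERE COALESCE(graveyard, 0) != 1 ORDER BY investor"
    )
]

# Soft-deleted tables: "live" means deleted_at IS NULL.
live_reminders = [
    row[0] for row in conn.execute("SELECT title FROM reminders WHERE deleted_at IS NULL")
]

# The wrong filter on a projection table fails loudly instead of returning rows.
try:
    conn.execute("SELECT investor FROM fundraising_rows WHERE deleted_at IS NULL")
    wrong_filter_error = None
except sqlite3.OperationalError as exc:
    wrong_filter_error = str(exc)
```

In this sketch the mis-applied filter surfaces as an `OperationalError`, which is the failure mode the section warns about.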

`intents._last_activity_by_investor` **mirrors** `server.last_activity_by_investor` (duplicated
to avoid importing the `__main__` server module — helpers take a `conn`, never import server).
Keep the two in sync; the soft-delete test guards the copy.

## Email/comms intents are MATCHED-ONLY

The email-touching intents (`recent_emails`, `comms_by_user`, `email_counts_by_user`,
`investor_last_contact`) surface only **investor-linked** email — an `email_investor_links` row
must exist — exactly like the Communications panel's `query_email_activity`. Captured
internal/vendor/personal mail is never counted or listed. The gate is
`EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)`. **`comms_by_user` /
`email_counts_by_user` originally omitted this** and counted the user's *entire* sent corpus.
That is fixed, and the runner test now seeds an unmatched sent email to guard against a
regression. Add this gate to any new email intent.
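To make the effect of the gate concrete, here is a small self-contained comparison of a gated and an ungated count. The `emails.sender` column, the sample rows, and both query constants are assumptions for illustration; only the `EXISTS` clause itself is taken from this guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
    CREATE TABLE email_investor_links (email_id INTEGER, investor_id INTEGER);
    INSERT INTO emails (id, sender, subject) VALUES
        (1, 'ann', 'Q3 update for Acme'),
        (2, 'ann', 'lunch order'),
        (3, 'bob', 'vendor invoice');
    INSERT INTO email_investor_links (email_id, investor_id) VALUES (1, 42);
""")

# Matched-only: the gate from this guide restricts the count to investor-linked mail.
GATED_SQL = (
    "SELECT COUNT(*) FROM emails e WHERE e.sender = ? "
    "AND EXISTS (SELECT 1 FROM email_investor_links l WHERE l.email_id = e.id)"
)

# The shape of the original bug: no gate, so the whole sent corpus is counted.
UNGATED_SQL = "SELECT COUNT(*) FROM emails e WHERE e.sender = ?"


def count(sql, sender):
    return conn.execute(sql, (sender,)).fetchone()[0]


ann_gated = count(GATED_SQL, "ann")
ann_ungated = count(UNGATED_SQL, "ann")
bob_gated = count(GATED_SQL, "bob")
```

A regression test in this style seeds at least one unmatched sent email, as the runner test described above does, so that dropping the gate changes the count.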

## Endpoint, caps, audit

- `POST /api/query/nl` (`require_bot_or_admin`, read-only) — body `{question}` (local translate)
  or `{intent, slots}` (direct, e.g. a UI re-run). Returns `{intent, slots, rows, summary,
  question}`. `GET /api/query/catalog` returns the askable surface for the UI.
- **Clients (thin):** the **Matrix Q&A** surface is built — it lives bot-side in
  `backend/matrix_intake/query.py` (trigger grammar + deterministic answer rendering) +
  `crm_client.nl_query`, and ships on the Spark (no s9pk for the bot). Two entry points: a
  **dedicated Q&A room** (`MATRIX_QUERY_ROOM`, every message is a question) and the `?`/`@bot`
  trigger in the intake room. **It depends on this endpoint being live on the box** — which lands
  with the v93 s9pk (reminders + W2); deploy the bot only after that, or it 404s. See the
  matrix-intake guide. The **web "Ask" box** (Communications tab) is the remaining client.
- Status: local-model outage → **503**; unexpected SQL fault → **500**; everything else
  (a hit, or a soft `no_match`/`unknown_intent`) → **200** with the structured result, because
  the UI always wants the interpreted query back, not a bare code.
- Every executed query writes an audit row (`audit_log`, `entity_type='nl_query'`) so a query
  through a leaked/automated credential is detectable. Global row ceiling `MAX_ROWS=500`.
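The status rule in the list above can be sketched as a small mapping function. The error keys `model_unavailable` and `sql_fault` are hypothetical names chosen for this example; `no_match` and `unknown_intent` are the soft outcomes named in this guide.

```python
def http_status(result: dict) -> int:
    """Map a structured query result to an HTTP status code."""
    error = result.get("error")
    if error == "model_unavailable":  # hypothetical key: local-model outage
        return 503
    if error == "sql_fault":  # hypothetical key: unexpected SQL fault
        return 500
    # A hit, or a soft no_match / unknown_intent: the UI still wants the
    # interpreted query back, so the structured result ships with a 200.
    return 200
```

Keeping soft failures at 200 means a client can always render the interpreted `{intent, slots}` and only needs to treat 5xx as a transport-level problem.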

## Tests + dev harness

`test_nl_query.py` (runner: every intent + soft-delete on both recency legs + injection-safety
+ caps), `test_translate.py` (offline translator via an injected `chat_fn`), and
`test_nl_query_endpoint.py` (HTTP auth/wiring/503, local model forced down via a dead
`SPARK_CONTROL_URL` port). `try_questions.py` is a dev harness (not a test) that fires
questions at the real local model and prints the translation — the cheap way to check quality.