
Keysat 68106d7a5a Add Matrix NL-query Q&A surface (W2 step 5)

Read-only natural-language query over the curated nl_query endpoint, answered
in-thread. Two entry points (room-per-purpose model): a dedicated Q&A room
(MATRIX_QUERY_ROOM) where every top-level message is a question, plus the
?/@bot trigger in the intake room as a cross-room convenience. Both routes hit
the same handle_query -> crm_client.nl_query -> POST /api/query/nl; translation
runs on the box's local model, nothing leaves the box, and there is no write
path so no approval gate applies.

Pure logic (trigger parsing, answer rendering) in query.py with offline tests;
async room wiring in bot.py (live-smoke only, per the bot's convention).

Bot-side only, ships on the Spark via git pull + restart. Depends on the
box-side /api/query/nl endpoint, which lands with the v93 s9pk (reminders + W2):
until v93 is installed the Q&A surface 404s, so the bot deploy is staged to
follow that install.

2026-06-18 19:46:54 -05:00


paths

backend/nl_query/**

Natural-language query (W2)

Read this before editing the NL-query surface (backend/nl_query/). It is the read-only "ask the database in plain English" layer — web "Ask" box + Matrix @bot <question>.

The trust model — named intents, not a query language

There is no generic SQL/AST compiler and no dynamically-built identifiers. Every query is a fixed, hand-written, reviewed, parameterized statement in intents.py; the only thing a caller (or the model) controls is a small set of typed slot values, bound as ? params. runner.validate is the trust boundary: it accepts only a known intent key and coerces each slot to its declared type, rejecting anything off-spec. A request that's wrong is rejected; it can never name a table/column, pick an operator, or write SQL. run_query never raises — every failure returns a structured error dict (a bad limit=abc must not crash the thread).
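A minimal sketch of that trust boundary, assuming a hypothetical registry shape (intent key mapped to a slot-name to coercion-function spec). The intent name, the `bad_slot`/`bad_slots` error keys, and the return shape are illustrative assumptions, not the contents of runner.py; only `unknown_intent` is a name this guide uses.

```python
from typing import Any, Callable

# Hypothetical registry: intent key -> {slot name -> coercion function}.
# The real registry lives in intents.py; these entries are illustrative only.
INTENTS: dict[str, dict[str, Callable[[Any], Any]]] = {
    "recent_reminders": {"limit": int},
}


def validate(intent: Any, slots: Any) -> dict:
    """Accept only a known intent key and coerce each slot to its declared type.

    Never raises: every off-spec request comes back as a structured error dict.
    """
    if not isinstance(intent, str) or intent not in INTENTS:
        return {"ok": False, "error": "unknown_intent"}
    if not isinstance(slots, dict):
        return {"ok": False, "error": "bad_slots"}  # assumed error key
    spec = INTENTS[intent]
    clean: dict[str, Any] = {}
    for name, value in slots.items():
        if name not in spec:
            # A slot the intent does not declare is rejected, not ignored.
            return {"ok": False, "error": "bad_slot", "slot": name}
        try:
            clean[name] = spec[name](value)
        except (TypeError, ValueError):
            # e.g. limit="abc": rejected here instead of crashing the thread.
            return {"ok": False, "error": "bad_slot", "slot": name}
    return {"ok": True, "intent": intent, "slots": clean}
```

In this sketch the caller can only ever choose a key from the registry and supply values; a string such as `"DROP TABLE x"` is just an unknown intent.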

To add a capability: add a run_* + a registry entry (with its slots spec) in intents.py; the translator prompt and the UI pick it up automatically from catalog(). Add a test case.
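As a rough illustration of those steps, the sketch below adds one hypothetical intent. The registry layout, the `open_reminders` intent, the `title` column, and the fields returned by `catalog()` are all assumptions made for the example; check intents.py for the real shapes.

```python
import sqlite3


def run_open_reminders(conn: sqlite3.Connection, slots: dict) -> list:
    # Fixed, hand-written statement; the only caller-controlled value is
    # the typed slot, bound as a ? parameter.
    return conn.execute(
        "SELECT id, title FROM reminders "
        "WHERE deleted_at IS NULL ORDER BY id LIMIT ?",
        (slots["limit"],),
    ).fetchall()


# Hypothetical registry entry: the run_* function plus its slots spec.
REGISTRY = {
    "open_reminders": {
        "run": run_open_reminders,
        "description": "List reminders that have not been deleted",
        "slots": {"limit": {"type": "int", "default": 20}},
    },
}


def catalog() -> list:
    """Describe the askable surface without exposing the run_* callables."""
    return [
        {"intent": key, "description": entry["description"], "slots": entry["slots"]}
        for key, entry in REGISTRY.items()
    ]


# Demo against a throwaway in-memory database (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO reminders (title, deleted_at) VALUES ('Call back', NULL), ('Old', '2026-01-01')")
rows = REGISTRY["open_reminders"]["run"](conn, {"limit": 5})
```

Because the prompt and the UI are described as reading from catalog(), a new entry in this style would need no separate prompt or UI change.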

Local-only — no Claude, no redaction here

Translation (question → {intent, slots}) runs on the local Qwen via Spark Control (translate.py, reusing ingest/llm.py), the same sanctioned local leg as intake/digest. The question never leaves the box, so there is no Claude path and no redaction boundary — that was the whole point of the W2 simplification (the answer is sensitive and never leaves; the question is generic English, translated locally). Validated 12/12 on real example questions against the live Spark (2026-06-18). The model output is still untrusted: it goes straight through runner.validate, so a hallucinated intent is rejected. If the local model ever proves too weak, a Claude-behind-redaction translator could drop in as an alternative chat_fn without touching the validator/executor — deliberately not built.
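The injectable chat_fn seam can be sketched as follows. This is an assumption-laden outline rather than translate.py itself: it assumes the model replies with a JSON object, and the function signature and fallback shape are hypothetical.

```python
import json
from typing import Callable


def translate(question: str, chat_fn: Callable[[str], str]) -> dict:
    """Turn a question into {intent, slots} using whatever chat_fn is injected.

    The result is still untrusted: it must go through the validator before
    anything is executed.
    """
    raw = chat_fn(question)
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        # Non-JSON or non-string model output: return an empty translation
        # and let the validator reject it.
        return {"intent": None, "slots": {}}
    if not isinstance(parsed, dict):
        return {"intent": None, "slots": {}}
    slots = parsed.get("slots")
    return {
        "intent": parsed.get("intent"),
        "slots": slots if isinstance(slots, dict) else {},
    }


# Offline stand-ins for the local model (no network involved).
def fake_model(question: str) -> str:
    return '{"intent": "open_reminders", "slots": {"limit": 5}}'


def broken_model(question: str) -> str:
    return "Sure! Here is your SQL: SELECT ..."
```

Swapping the translator then means passing a different chat_fn; the validator and executor sit downstream and are not touched.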

Results never go to any model. Summaries are deterministic local strings; rows render client-side. Never add a "summarize these rows with an LLM" step — that re-introduces the leak.
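For illustration, a deterministic summary can be as simple as string formatting over the row count. The function name and wording below are hypothetical, not the strings the backend actually emits.

```python
def summarize(intent: str, rows: list) -> str:
    """Build the summary from the result shape alone; no model sees the rows."""
    count = len(rows)
    if count == 0:
        return f"No results for {intent}."
    noun = "row" if count == 1 else "rows"
    return f"{count} {noun} for {intent}."
```

The same input always yields the same string, and the row contents never leave the process.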

Soft-delete per table (the gotcha the design reviews caught)

The fundraising_* tables are a hard-rebuilt projection of the grid blob and have no deleted_at column, so do NOT add deleted_at IS NULL to them (the query raises on the missing column). Their live/retired axis is the graveyard flag: exclude graveyard = 1 for "live". Other tables:

reminders / opportunities / communications → filter deleted_at IS NULL.
emails have no deleted_at; "live" = a non-tombstoned sighting (EXISTS email_account_messages … deleted_at IS NULL), mirroring query_email_activity / the digest.
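The first two rules above can be checked against a throwaway SQLite database. The table name fundraising_investors and all column names other than graveyard and deleted_at are assumptions for the demo; the email EXISTS filter is left out because its join columns are not spelled out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Illustrative schema only; the real tables have more columns.
    CREATE TABLE fundraising_investors (
        id INTEGER PRIMARY KEY, name TEXT, graveyard INTEGER DEFAULT 0
    );
    CREATE TABLE reminders (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT);
    INSERT INTO fundraising_investors (name, graveyard)
        VALUES ('Acme', 0), ('Retired LP', 1);
    INSERT INTO reminders (title, deleted_at)
        VALUES ('Call back', NULL), ('Old', '2026-01-01');
    """
)

# fundraising_*: "live" means not in the graveyard.
live_investors = conn.execute(
    "SELECT name FROM fundraising_investors WHERE COALESCE(graveyard, 0) = 0"
).fetchall()

# reminders / opportunities / communications: "live" means not soft-deleted.
live_reminders = conn.execute(
    "SELECT title FROM reminders WHERE deleted_at IS NULL"
).fetchall()

# The gotcha: a deleted_at filter on a fundraising_* table fails outright.
try:
    conn.execute("SELECT name FROM fundraising_investors WHERE deleted_at IS NULL")
    raised = False
except sqlite3.OperationalError:
    raised = True
```

The failing query is the reason the filter has to be chosen per table rather than applied by a shared helper.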

intents._last_activity_by_investor mirrors server.last_activity_by_investor (duplicated to avoid importing the __main__ server module — helpers take a conn, never import server). Keep the two in sync; the soft-delete test guards the copy.

Endpoint, caps, audit

POST /api/query/nl (require_bot_or_admin, read-only) — body {question} (local translate) or {intent, slots} (direct, e.g. a UI re-run). Returns {intent, slots, rows, summary, question}. GET /api/query/catalog returns the askable surface for the UI.
Clients (thin): the Matrix Q&A surface is built — it lives bot-side in backend/matrix_intake/query.py (trigger grammar + deterministic answer rendering) + crm_client.nl_query, and ships on the Spark (no s9pk for the bot). Two entry points: a dedicated Q&A room (MATRIX_QUERY_ROOM, every message is a question) and the ?/@bot trigger in the intake room. It depends on this endpoint being live on the box — which lands with the v93 s9pk (reminders + W2); deploy the bot only after that, or it 404s. See the matrix-intake guide. The web "Ask" box (Communications tab) is the remaining client.
Status: local-model outage → 503; unexpected SQL fault → 500; everything else (a hit, or a soft no_match/unknown_intent) → 200 with the structured result, because the UI always wants the interpreted query back, not a bare code.
Every executed query writes an audit row (audit_log, entity_type='nl_query') so a query through a leaked/automated credential is detectable. Global row ceiling MAX_ROWS=500.
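The status rule above can be sketched as a small mapping. The error keys `model_unavailable` and `sql_fault` are hypothetical names chosen for the example; only no_match and unknown_intent appear in this guide.

```python
def http_status(result: dict) -> int:
    """Map a structured query result to the HTTP status the endpoint returns."""
    error = result.get("error")
    if error == "model_unavailable":  # assumed key for a local-model outage
        return 503
    if error == "sql_fault":  # assumed key for an unexpected SQL fault
        return 500
    # A hit, or a soft no_match / unknown_intent: the UI still wants the
    # interpreted query back, so these stay 200 with the structured body.
    return 200
```

Under this mapping a client only needs to treat 503 and 500 as transport-level failures; every other outcome is read from the response body.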

Tests + dev harness

test_nl_query.py (runner: every intent + soft-delete on both recency legs + injection-safety + caps), test_translate.py (offline translator via an injected chat_fn), and test_nl_query_endpoint.py (HTTP auth/wiring/503, local model forced down via a dead SPARK_CONTROL_URL port). try_questions.py is a dev harness (not a test) that fires questions at the real local model and prints the translation; it is the cheap way to check quality.
