Keysat 6c29c22601 Add NL-query backend (W2): local translator + safe named-query runner

Read-only "ask the database in plain English" backend. Translation runs on
the local Qwen via Spark Control (question -> {intent, slots}); nothing leaves
the box, no Claude and no redaction boundary (the simplification chosen after
pressure-testing). The safe surface is a curated catalog of ~12 hand-written
parameterized queries; a slot validator is the trust boundary (no generic SQL,
no dynamic identifiers). POST /api/query/nl + GET /api/query/catalog, gated
require_bot_or_admin, read-only, audited. Soft-delete-correct per table.
Local Qwen translated 12/12 real example questions correctly against the live
Spark. Web "Ask" box and Matrix bot still to come (steps 4-5).

2026-06-18 18:35:41 -05:00

4.2 KiB

paths

backend/nl_query/**

Natural-language query (W2)

Read this before editing the NL-query surface (backend/nl_query/). It is the read-only "ask the database in plain English" layer — web "Ask" box + Matrix @bot <question>.

The trust model — named intents, not a query language

There is no generic SQL/AST compiler and no dynamically-built identifiers. Every query is a fixed, hand-written, reviewed, parameterized statement in intents.py; the only thing a caller (or the model) controls is a small set of typed slot values, bound as ? params. runner.validate is the trust boundary: it accepts only a known intent key and coerces each slot to its declared type, rejecting anything off-spec. A request that's wrong is rejected; it can never name a table/column, pick an operator, or write SQL. run_query never raises — every failure returns a structured error dict (a bad limit=abc must not crash the thread).
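The trust boundary described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of runner.py or intents.py: the registry shape, the reminders_due intent, the error keys other than unknown_intent, and the function signatures are all assumptions made for the example.

```python
import sqlite3

def run_reminders_due(conn, slots):
    # Fixed, hand-written statement; the only caller-controlled value is
    # the typed slot, bound as a ? parameter. (Hypothetical intent.)
    cur = conn.execute(
        "SELECT id, title FROM reminders WHERE deleted_at IS NULL "
        "ORDER BY id LIMIT ?",
        (slots["limit"],),
    )
    return cur.fetchall()

# Hypothetical registry shape: intent key -> slot spec + runner.
REGISTRY = {
    "reminders_due": {"slots": {"limit": int}, "run": run_reminders_due},
}

def validate(intent, slots):
    """Return (clean_slots, None) on success or (None, error_dict)."""
    entry = REGISTRY.get(intent) if isinstance(intent, str) else None
    if entry is None:
        return None, {"error": "unknown_intent"}
    if not isinstance(slots, dict):
        return None, {"error": "bad_slots"}
    spec = entry["slots"]
    extra = sorted(str(k) for k in slots if k not in spec)
    if extra:
        return None, {"error": "unknown_slot", "slots": extra}
    clean = {}
    for name, typ in spec.items():
        if name not in slots:
            return None, {"error": "missing_slot", "slot": name}
        try:
            clean[name] = typ(slots[name])  # coerce to the declared type
        except (TypeError, ValueError):
            return None, {"error": "bad_slot", "slot": name}
    return clean, None

def run_query(conn, intent, slots):
    """Every failure comes back as a structured dict; nothing is raised."""
    clean, err = validate(intent, slots)
    if err is not None:
        return err
    try:
        rows = REGISTRY[intent]["run"](conn, clean)
    except sqlite3.Error as exc:
        return {"error": "sql_fault", "detail": str(exc)}
    return {"intent": intent, "slots": clean, "rows": rows}
```

In this sketch a request with limit=abc comes back as {"error": "bad_slot", "slot": "limit"} instead of an exception, and an extra slot such as table=... is rejected before any SQL runs.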

To add a capability: add a run_* + a registry entry (with its slots spec) in intents.py; the translator prompt and the UI pick it up automatically from catalog(). Add a test case.
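As a rough sketch of that workflow, assuming a plain dict registry (the real structure of intents.py is not shown here), a new capability is one run_* function plus one entry, and catalog() is derived from the registry so the prompt and the UI need no separate edit. The open_reminders intent, the description field, and prompt_catalog_lines are hypothetical names for illustration.

```python
import sqlite3

def run_open_reminders(conn, slots):
    # Hand-written, parameterized statement for the new capability.
    return conn.execute(
        "SELECT id, title FROM reminders WHERE deleted_at IS NULL "
        "ORDER BY id LIMIT ?",
        (slots["limit"],),
    ).fetchall()

# Hypothetical registry entry: description, slots spec, runner.
REGISTRY = {
    "open_reminders": {
        "description": "List open reminders",
        "slots": {"limit": int},
        "run": run_open_reminders,
    },
}

def catalog():
    """JSON-friendly view of the askable surface, derived from REGISTRY."""
    out = []
    for key in sorted(REGISTRY):
        entry = REGISTRY[key]
        out.append({
            "intent": key,
            "description": entry["description"],
            "slots": {n: t.__name__ for n, t in entry["slots"].items()},
        })
    return out

def prompt_catalog_lines():
    """One line per intent, suitable for pasting into a translator prompt."""
    lines = []
    for item in catalog():
        args = ", ".join(f"{n}: {t}" for n, t in item["slots"].items())
        lines.append(f"{item['intent']}({args}): {item['description']}")
    return lines
```

Because both consumers read from the same derived view, the registry stays the single place a reviewer has to look when a capability is added.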

Local-only — no Claude, no redaction here

Translation (question → {intent, slots}) runs on the local Qwen via Spark Control (translate.py, reusing ingest/llm.py), the same sanctioned local leg as intake/digest. The question never leaves the box, so there is no Claude path and no redaction boundary — that was the whole point of the W2 simplification (the answer is sensitive and never leaves; the question is generic English, translated locally). Validated 12/12 on real example questions against the live Spark (2026-06-18). The model output is still untrusted: it goes straight through runner.validate, so a hallucinated intent is rejected. If the local model ever proves too weak, a Claude-behind-redaction translator could drop in as an alternative chat_fn without touching the validator/executor — deliberately not built.
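The injectable chat_fn mentioned above might look roughly like this. It is a sketch under assumptions: the prompt wording, the translate signature, and the model_unavailable error key are invented for the example, and the real translate.py may differ. The point it illustrates is that the model's reply is parsed defensively and handed on as untrusted data.

```python
import json

def translate(question, chat_fn):
    """Ask a chat function for {intent, slots}; never trust the reply.

    chat_fn is any callable taking a prompt string and returning text,
    so a test can pass a fake and a deployment can pass the local model.
    """
    prompt = (
        "Map the question to one intent. "
        'Reply with JSON only: {"intent": "...", "slots": {...}}\n'
        "Question: " + question
    )
    try:
        raw = chat_fn(prompt)
    except OSError:
        # Hypothetical error key for a local-model outage.
        return {"error": "model_unavailable"}
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return {"error": "no_match"}
    if not isinstance(parsed, dict):
        return {"error": "no_match"}
    # Still untrusted: the caller must pass this through the validator.
    return {"intent": parsed.get("intent"), "slots": parsed.get("slots") or {}}
```

A hallucinated intent survives this function unchanged and is only rejected at the validator, which keeps the translator replaceable without widening the trusted code.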

Results never go to any model. Summaries are deterministic local strings; rows render client-side. Never add a "summarize these rows with an LLM" step — that re-introduces the leak.
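A deterministic summary can be as simple as a format string over the row count. The function name and wording below are assumptions for illustration, not the actual summary text the backend produces.

```python
def summarize(intent, rows):
    """Build the summary locally from the row count; no model involved."""
    n = len(rows)
    if n == 0:
        return f"No results for {intent}."
    noun = "row" if n == 1 else "rows"
    return f"{n} {noun} for {intent}."
```

The same input always yields the same string, and no row content is sent anywhere to produce it.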

Soft-delete per table (the gotcha the design reviews caught)

The fundraising_* tables are a hard-rebuilt projection of the grid blob and have no deleted_at column — do NOT add deleted_at IS NULL to them (it raises). Their live/retired axis is the graveyard flag (exclude graveyard = 1 for "live"). Other tables:

reminders / opportunities / communications → filter deleted_at IS NULL.
emails have no deleted_at; "live" = a non-tombstoned sighting (EXISTS email_account_messages … deleted_at IS NULL), mirroring query_email_activity / the digest.
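The three "live" rules above can be exercised against a throwaway in-memory database. The schema here is a guess made for the example: fundraising_investors, the email_id join column, and all column names other than deleted_at and graveyard are hypothetical.

```python
import sqlite3

def demo_conn():
    """In-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fundraising_investors (
            id INTEGER PRIMARY KEY, name TEXT,
            graveyard INTEGER NOT NULL DEFAULT 0);   -- no deleted_at column
        CREATE TABLE reminders (
            id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT);
        CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT);
        CREATE TABLE email_account_messages (email_id INTEGER, deleted_at TEXT);
        INSERT INTO fundraising_investors VALUES (1, 'Acme', 0), (2, 'Old Fund', 1);
        INSERT INTO reminders VALUES (1, 'Call Acme', NULL), (2, 'Stale', '2026-01-01');
        INSERT INTO emails VALUES (1, 'Intro'), (2, 'Gone');
        INSERT INTO email_account_messages VALUES
            (1, NULL), (1, '2026-01-01'), (2, '2026-01-02');
    """)
    return conn

# fundraising_*: the live/retired axis is the graveyard flag.
LIVE_INVESTORS = (
    "SELECT name FROM fundraising_investors WHERE graveyard != 1 ORDER BY id"
)
# reminders / opportunities / communications: filter on deleted_at.
LIVE_REMINDERS = (
    "SELECT title FROM reminders WHERE deleted_at IS NULL ORDER BY id"
)
# emails: live means at least one non-tombstoned sighting exists.
LIVE_EMAILS = """
    SELECT subject FROM emails e
    WHERE EXISTS (
        SELECT 1 FROM email_account_messages m
        WHERE m.email_id = e.id AND m.deleted_at IS NULL)
    ORDER BY e.id
"""
# The mistake to avoid: this raises sqlite3.OperationalError because
# the fundraising table has no deleted_at column.
WRONG_INVESTORS = (
    "SELECT name FROM fundraising_investors WHERE deleted_at IS NULL"
)
```

Running WRONG_INVESTORS against demo_conn() fails with a missing-column error, which is the failure mode the warning above is about.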

intents._last_activity_by_investor mirrors server.last_activity_by_investor (duplicated to avoid importing the __main__ server module — helpers take a conn, never import server). Keep the two in sync; the soft-delete test guards the copy.

Endpoint, caps, audit

POST /api/query/nl (require_bot_or_admin, read-only) — body {question} (local translate) or {intent, slots} (direct, e.g. a UI re-run). Returns {intent, slots, rows, summary, question}. GET /api/query/catalog returns the askable surface for the UI.
Status: local-model outage → 503; unexpected SQL fault → 500; everything else (a hit, or a soft no_match/unknown_intent) → 200 with the structured result, because the UI always wants the interpreted query back, not a bare code.
Every executed query writes an audit row (audit_log, entity_type='nl_query') so a query through a leaked/automated credential is detectable. Global row ceiling MAX_ROWS=500.
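The status rule and the row ceiling can be summarized in a few lines. This is a sketch: the error keys model_unavailable and sql_fault and both function names are assumptions, while no_match, unknown_intent, and MAX_ROWS=500 come from the text above.

```python
MAX_ROWS = 500  # global row ceiling stated above

def http_status(result):
    """Map a structured result dict to an HTTP status code."""
    err = result.get("error")
    if err == "model_unavailable":   # hypothetical key: local-model outage
        return 503
    if err == "sql_fault":           # hypothetical key: unexpected SQL fault
        return 500
    # A hit, or a soft no_match / unknown_intent: the UI still wants
    # the interpreted query back, so the body carries the detail.
    return 200

def cap_rows(rows, limit=None):
    """Apply the caller's limit, never exceeding the global ceiling."""
    n = MAX_ROWS if limit is None else max(0, min(limit, MAX_ROWS))
    return rows[:n]
```

Keeping soft failures at 200 means the client can branch on the body alone and only treat 5xx as "the service itself is unhealthy".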

Tests + dev harness

test_nl_query.py (runner: every intent + soft-delete on both recency legs + injection-safety + caps), test_translate.py (offline translator via an injected chat_fn), and test_nl_query_endpoint.py (HTTP auth/wiring/503, local model forced down via a dead SPARK_CONTROL_URL port). try_questions.py is a dev harness (not a test) that fires questions at the real local model and prints the translation; it is the cheap way to check quality.
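One way to get a "dead port" for the forced-outage test is to let the OS hand out a free loopback port and then release it without ever listening. The helper names below are hypothetical and the actual test may do this differently; there is also a small window in which another process could claim the port.

```python
import socket
from urllib.parse import urlparse

def dead_local_url():
    """Return an http URL on a loopback port that nothing listens on."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))      # port 0: the OS picks a free port
        port = s.getsockname()[1]
    # The socket is closed and never listened, so connects are refused.
    return f"http://127.0.0.1:{port}"

def model_reachable(url, timeout=0.5):
    """True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    try:
        with socket.create_connection(
            (parsed.hostname, parsed.port), timeout=timeout
        ):
            return True
    except OSError:
        return False
```

A test could set SPARK_CONTROL_URL to dead_local_url() in the environment before starting the server and then assert that a {question} request returns 503.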
