Add NL-query backend (W2): local translator + safe named-query runner

Read-only "ask the database in plain English" backend. Translation runs on
the local Qwen via Spark Control (question -> {intent, slots}); nothing leaves
the box, no Claude and no redaction boundary (the simplification chosen after
pressure-testing). The safe surface is a curated catalog of ~12 hand-written
parameterized queries; a slot validator is the trust boundary (no generic SQL,
no dynamic identifiers). POST /api/query/nl + GET /api/query/catalog, gated by
require_bot_or_admin, read-only, audited. Soft-delete handling is correct per table.
The local Qwen translated 12/12 real example questions correctly against the live
Spark. The web "Ask" box and Matrix bot are still to come (steps 4-5).
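The catalog-plus-validator design above can be sketched roughly like this. This is a minimal, hypothetical sketch. The intent names (`investor_by_name`, `top_investors`), the `investors` table with its `deleted_at` soft-delete column, and the helpers `text_slot`, `int_slot`, `validate`, and `run` are all invented for illustration. SQLite stands in for the real database only so the example runs on its own. The model's output is limited to choosing an intent and filling slots. Each slot is checked for type and range before the fixed SQL runs with bound parameters, so model text never becomes SQL text or an identifier.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable


class SlotError(ValueError):
    """Raised when the model's intent or slots fail validation."""


@dataclass(frozen=True)
class NamedQuery:
    sql: str  # fixed, hand-written SQL; only values are bound, never identifiers
    slots: dict[str, Callable[[Any], Any]]  # slot name -> validator returning a clean value


def text_slot(max_len: int = 200) -> Callable[[Any], str]:
    def check(value: Any) -> str:
        if not isinstance(value, str) or not value.strip() or len(value) > max_len:
            raise SlotError(f"expected non-empty text of at most {max_len} chars")
        return value.strip()
    return check


def int_slot(lo: int, hi: int) -> Callable[[Any], int]:
    def check(value: Any) -> int:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not lo <= value <= hi:
            raise SlotError(f"expected an integer in [{lo}, {hi}]")
        return value
    return check


# Hypothetical catalog: intent names, table, and columns are illustrative only.
# Every query excludes soft-deleted rows itself (deleted_at IS NULL).
CATALOG: dict[str, NamedQuery] = {
    "investor_by_name": NamedQuery(
        sql="SELECT name, email FROM investors "
            "WHERE deleted_at IS NULL AND name LIKE '%' || :name || '%'",
        slots={"name": text_slot()},
    ),
    "top_investors": NamedQuery(
        sql="SELECT name, committed FROM investors "
            "WHERE deleted_at IS NULL ORDER BY committed DESC LIMIT :limit",
        slots={"limit": int_slot(1, 100)},
    ),
}


def validate(intent: Any, slots: Any) -> dict[str, Any]:
    """Trust boundary: check the model's output before any SQL is touched."""
    if not isinstance(intent, str) or intent not in CATALOG:
        raise SlotError(f"unknown intent {intent!r}")
    if not isinstance(slots, dict):
        raise SlotError("slots must be a mapping")
    spec = CATALOG[intent].slots
    extra, missing = set(slots) - set(spec), set(spec) - set(slots)
    if extra or missing:
        raise SlotError(
            f"slot mismatch: extra={sorted(map(str, extra))} missing={sorted(missing)}"
        )
    return {name: check(slots[name]) for name, check in spec.items()}


def run(conn: sqlite3.Connection, intent: Any, slots: Any) -> list[tuple]:
    params = validate(intent, slots)  # raises SlotError before any query runs
    return conn.execute(CATALOG[intent].sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE investors (name TEXT, email TEXT, committed INTEGER, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO investors VALUES (?, ?, ?, ?)",
        [
            ("Acme Capital", "ir@acme.example", 5_000_000, None),
            ("Birch Partners", "hi@birch.example", 2_000_000, None),
            ("Old Fund", "x@old.example", 9_000_000, "2026-01-01"),  # soft-deleted
        ],
    )
    print(run(conn, "top_investors", {"limit": 2}))
    # [('Acme Capital', 5000000), ('Birch Partners', 2000000)]
```

Rejecting unknown intents and extra or missing slots matters as much as checking value types. Together, these checks keep the model from reaching any query or parameter that the catalog does not declare.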
Keysat
2026-06-18 18:35:41 -05:00
parent a166b49397
commit 6c29c22601
13 changed files with 1348 additions and 13 deletions
@@ -0,0 +1,51 @@
#!/usr/bin/env python3
"""Dev harness — fire questions at the LOCAL model and print how each is translated.
Lets you eyeball whether the local Qwen maps real questions to the right curated query
(intent + slots), against your real Spark, with NO UI, auth, HTTP, or deploy. This is the
cheap way to validate translation quality before building the web/Matrix surfaces. It only
translates (it does not touch the DB), so no data is needed and nothing leaves the box.

NOT shipped and NOT a test (no `test_` prefix); just a developer convenience.

Needs SPARK_CONTROL_URL set (read from the repo .env) and the Spark reachable.

Run:
    python3 backend/nl_query/try_questions.py                  # the built-in sample set
    python3 backend/nl_query/try_questions.py "when did we last email Acme?"
"""
import os
import sys

# Put backend/ on sys.path so the nl_query package imports when run as a script.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))  # backend/
import nl_query  # noqa: E402

SAMPLES = [
    "Which investors haven't we reached out to in the last 3 months?",
    "Which investors do we owe follow-ups to?",
    "What is Acme Capital's email and how much have they committed across funds?",
    "When did we last reach out to Acme Capital?",
    "What were the last 10 investor emails we sent, and who to?",
    "What were the last 10 investor emails we received?",
    "Who are all the investors located in Austin?",
    "List our top 10 investors by committed capital.",
    "List our top 10 pipeline investors by stage and the most recent conversation.",
    "What is our total pipeline in dollars, split by stage?",
    "What were the last investor emails sent by Grant?",
    "How many emails has Jonathan sent this week, this month, and year to date?",
]


def main():
    # Translate the questions given on the command line, else the built-in samples.
    questions = sys.argv[1:] or SAMPLES
    print(f"Translating {len(questions)} question(s) on the local model "
          f"(SPARK_CONTROL_URL={os.environ.get('SPARK_CONTROL_URL', '(unset)')})\n")
    for q in questions:
        r = nl_query.translate(q)  # translation only; the DB is never touched
        if r.get("error"):
            print(f" ? {q}\n -> [{r['error']}] {r.get('detail', '')}\n")
        else:
            print(f" ? {q}\n -> {r['intent']} slots={r['slots']}\n")


if __name__ == "__main__":
    main()
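The harness depends only on the shape of the dict that `nl_query.translate` returns: `{"intent": ..., "slots": {...}}` on success, or `{"error": ..., "detail": ...}` on failure. The commit does not show how the model's reply becomes that dict. The following is a hypothetical sketch of one way to do it, assuming the local model is prompted to answer with a single JSON object. `parse_translation`, `KNOWN_INTENTS`, and the error codes are invented for illustration and are not the module's actual API.

```python
import json
import re
from typing import Any

# Hypothetical intent names; the real catalog lives in the nl_query package.
KNOWN_INTENTS = frozenset({"investor_by_name", "last_contact", "top_investors"})


def parse_translation(raw: str, known_intents: frozenset = KNOWN_INTENTS) -> dict[str, Any]:
    """Map raw model text to {"intent", "slots"} or {"error", "detail"}."""
    # Models sometimes wrap the JSON in prose or a code fence, so take the
    # span from the first "{" to the last "}" (greedy, across newlines).
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return {"error": "no_json", "detail": raw[:200]}
    try:
        # A valid JSON text that starts with "{" and ends with "}" is always an object.
        obj = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return {"error": "bad_json", "detail": str(exc)}
    intent, slots = obj.get("intent"), obj.get("slots", {})
    if not isinstance(intent, str) or intent not in known_intents:
        return {"error": "unknown_intent", "detail": repr(intent)}
    if not isinstance(slots, dict):
        return {"error": "bad_slots", "detail": "slots must be a JSON object"}
    return {"intent": intent, "slots": slots}
```

This step only checks shape. Under the design in the commit message, slot values still pass through the catalog's slot validator before any query runs, because that validator is the trust boundary.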