Add NL-query backend (W2): local translator + safe named-query runner

Read-only "ask the database in plain English" backend. Translation (question -> {intent, slots}) runs on the local Qwen via Spark Control, so nothing leaves the box; there is no Claude call and no redaction boundary (the simplification chosen after pressure-testing). The safe surface is a curated catalog of ~12 hand-written parameterized queries, and the slot validator is the trust boundary: no generic SQL, no dynamic identifiers. Endpoints: POST /api/query/nl and GET /api/query/catalog, both gated by require_bot_or_admin, read-only, and audited. Soft-delete handling is correct per table. The local Qwen translated 12/12 real example questions correctly against the live Spark. The web "Ask" box and Matrix bot are still to come (steps 4-5).
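
For readers unfamiliar with the "curated catalog + slot validator" pattern, here is a minimal sketch of the idea. It is not the code in this commit: the catalog entries, table and column names, the SlotSpec type, and the validate() function are all hypothetical, and only the {intent, slots} input shape and the {error, detail} failure shape are taken from the harness below. The point it illustrates is that model output only ever selects a pre-written query and supplies bound parameters; it never contributes SQL text or identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotSpec:
    """Hypothetical description of one allowed slot."""
    kind: type
    required: bool = True
    lo: Optional[int] = None   # inclusive bounds, ints only
    hi: Optional[int] = None
    max_len: int = 200         # length cap, strings only


# Hypothetical catalog: intent -> fixed SQL with named placeholders.
# Table/column names are illustrative, not the real schema.
CATALOG = {
    "top_investors_by_commitment": {
        "sql": "SELECT name, committed FROM investors "
               "WHERE deleted_at IS NULL ORDER BY committed DESC LIMIT :limit",
        "slots": {"limit": SlotSpec(int, lo=1, hi=100)},
    },
    "investors_in_city": {
        "sql": "SELECT name FROM investors "
               "WHERE deleted_at IS NULL AND city = :city",
        "slots": {"city": SlotSpec(str)},
    },
}


def _err(code: str, detail: str) -> dict:
    return {"error": code, "detail": detail}


def validate(translation: dict) -> dict:
    """Check a model-produced {intent, slots} against the catalog.

    Returns {"sql", "params"} on success, or {"error", "detail"} on failure.
    """
    intent = translation.get("intent")
    if not isinstance(intent, str) or intent not in CATALOG:
        return _err("unknown_intent", repr(intent))
    entry = CATALOG[intent]

    slots = translation.get("slots") or {}
    if not isinstance(slots, dict):
        return _err("bad_slot", "slots must be an object")

    # Reject anything the catalog entry did not declare.
    extra = sorted(set(slots) - set(entry["slots"]))
    if extra:
        return _err("unexpected_slot", ", ".join(extra))

    params = {}
    for name, spec in entry["slots"].items():
        if name not in slots:
            if spec.required:
                return _err("missing_slot", name)
            continue
        value = slots[name]
        # Exact type match, so True/False is not accepted as an int.
        if type(value) is not spec.kind:
            return _err("bad_slot", f"{name}: expected {spec.kind.__name__}")
        if spec.kind is int:
            if (spec.lo is not None and value < spec.lo) or \
               (spec.hi is not None and value > spec.hi):
                return _err("bad_slot", f"{name}: out of range")
        if spec.kind is str and not (0 < len(value.strip()) <= spec.max_len):
            return _err("bad_slot", f"{name}: empty or too long")
        params[name] = value

    # The SQL text is fixed; values travel only as bound parameters.
    return {"sql": entry["sql"], "params": params}
```

Under these assumptions, validate({"intent": "top_investors_by_commitment", "slots": {"limit": 10}}) yields the fixed SQL plus {"limit": 10} as bound parameters, while an unknown intent, an undeclared slot, or an out-of-range value comes back as an error dict instead of ever reaching the database.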
2026-06-18 18:35:41 -05:00
parent a166b49397
commit 6c29c22601
13 changed files with 1348 additions and 13 deletions
@@ -0,0 +1,51 @@
+#!/usr/bin/env python3
+"""Dev harness — fire questions at the LOCAL model and print how each is translated.
+
+Lets you eyeball whether the local Qwen maps real questions to the right curated query
+(intent + slots), against your real Spark, with NO UI, auth, HTTP, or deploy. This is the
+cheap way to validate translation quality before building the web/Matrix surfaces. It only
+translates (it does not touch the DB), so no data is needed and nothing leaves the box.
+
+NOT shipped and NOT a test (no `test_` prefix) — a developer convenience.
+
+Needs SPARK_CONTROL_URL set (read from the repo .env) and the Spark reachable.
+Run:
+  python3 backend/nl_query/try_questions.py                 # the built-in sample set
+  python3 backend/nl_query/try_questions.py "when did we last email Acme?"
+"""
+import os
+import sys
+
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))  # backend/
+import nl_query  # noqa: E402
+
+SAMPLES = [
+    "Which investors haven't we reached out to in the last 3 months?",
+    "Which investors do we owe follow-ups to?",
+    "What is Acme Capital's email and how much have they committed across funds?",
+    "When did we last reach out to Acme Capital?",
+    "What were the last 10 investor emails we sent, and who to?",
+    "What were the last 10 investor emails we received?",
+    "Who are all the investors located in Austin?",
+    "List our top 10 investors by committed capital.",
+    "List our top 10 pipeline investors by stage and the most recent conversation.",
+    "What is our total pipeline in dollars, split by stage?",
+    "What were the last investor emails sent by Grant?",
+    "How many emails has Jonathan sent this week, this month, and year to date?",
+]
+
+
+def main():
+    questions = sys.argv[1:] or SAMPLES
+    print(f"Translating {len(questions)} question(s) on the local model "
+          f"(SPARK_CONTROL_URL={os.environ.get('SPARK_CONTROL_URL', '(unset)')})\n")
+    for q in questions:
+        r = nl_query.translate(q)
+        if r.get("error"):
+            print(f"  ?  {q}\n     -> [{r['error']}] {r.get('detail', '')}\n")
+        else:
+            print(f"  ?  {q}\n     -> {r['intent']}  slots={r['slots']}\n")
+
+
+if __name__ == "__main__":
+    main()