Architect grounding boundary: redaction/re-hydration privacy gate (v0.1.0:55)
Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP
objections without any LP identity reaching the Claude API. Layered,
defense-in-depth, fail-closed by construction (docs/redaction-rehydration.md).

backend/redaction/:
- scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured
  account/wire/SSN/IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP
  entities (dictionary from the canonical layer, unicode-folded +
  hyphen-extended) and structured PII (emails, scheme-less/social URLs,
  intl+ext phones, currency-cued amounts, ISO/worded/numeric/quarter dates,
  addresses, bare long digit runs); pre-neutralizes injected [TYPE_N] strings;
  single-pass rehydrate; metadata-only audit logging (the pseudonym map is the
  de-anon key: local-only, never logged/sent). Hardened across THREE
  adversarial leak-hunts (worded/coded amounts, intl phones,
  NFD/ligature/zero-width names, slash/comma SSN, SWIFT, alpha-prefixed
  accounts, substance-preserving false-positive fixes).
- client.py: Boundary, one scrub/rehydrate contract, SCRUB_BACKEND=local
  (default) or gateway (Spark Control /scrub + /rehydrate). Fails closed
  (db_path required; dictionary build errors propagate; strict rehydrate
  returns tokenized-not-de-anon text).
- test_scrub_leak.py, test_reidentification.py: golden-file leak +
  re-identification suites (synthetic only, guardrail #9), regression-locking
  every leak-hunt vector.

backend/mcp/architect_grounding.py: the flow: retrieve (local) ->
minimize-first (local Qwen) -> scrub (+ local-Qwen NER backstop for unknown
names) -> Claude over the de-identified register only -> re-hydrate locally ->
human review. FAILS CLOSED if the local model is unreachable or a hallucinated
token appears. test_grounding_boundary.py proves nothing sensitive reaches
Claude and the three fail-closed paths.

server.py: POST /api/architect/ground (admin) wires retrieval ->
ground_objections.
docker_entrypoint.sh: SCRUB_BACKEND (default local).
docs/spark-control-scrub-endpoints.md: the gateway handover spec (Option 1:
caller supplies the entity dictionary).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
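
The tokenize / pre-neutralize / single-pass rehydrate contract described in the commit message can be illustrated with a minimal sketch. This is not the code in scrub.py, which is not shown on this page: the names `scrub`, `rehydrate`, `TOKEN_RE` and `HallucinatedTokenError`, the exact-match dictionary lookup, and the choice to raise on an unknown token are all assumptions made for illustration. Only the `[TYPE_N]` token shape and the fail-closed intent come from the message. The sample text is synthetic.

```python
import re

# Token shape taken from the commit message: [TYPE_N], e.g. [LP_1].
TOKEN_RE = re.compile(r"\[([A-Z]+)_(\d+)\]")


class HallucinatedTokenError(ValueError):
    """Raised when text contains a token that is not in the pseudonym map."""


def scrub(text, entities):
    """Tokenize known entity names.

    `entities` maps an exact name to a type label, e.g. {"Acme Capital": "LP"}.
    Returns (scrubbed_text, pseudonym_map) where pseudonym_map maps each token
    back to the original name. The map is the de-anonymization key and should
    stay local.
    """
    # Pre-neutralize token-shaped strings already present in the input so
    # injected text such as "[LP_9]" cannot pass as a real token later.
    text = TOKEN_RE.sub(lambda m: f"({m.group(1)}_{m.group(2)})", text)
    if not entities:
        return text, {}

    # One alternation, longest names first, so the whole text is rewritten in
    # a single pass and freshly inserted tokens are never re-scanned.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))

    pseudonyms = {}  # token -> original name
    assigned = {}    # original name -> token
    counters = {}    # type label -> last index used

    def replace(match):
        name = match.group(0)
        if name not in assigned:
            kind = entities[name]
            counters[kind] = counters.get(kind, 0) + 1
            token = f"[{kind}_{counters[kind]}]"
            assigned[name] = token
            pseudonyms[token] = name
        return assigned[name]

    return pattern.sub(replace, text), pseudonyms


def rehydrate(text, pseudonyms):
    """Restore original names in one pass; fail closed on unknown tokens."""
    found = {m.group(0) for m in TOKEN_RE.finditer(text)}
    unknown = found - set(pseudonyms)
    if unknown:
        # Assumed policy for this sketch: refuse rather than guess.
        raise HallucinatedTokenError(f"unknown tokens: {sorted(unknown)}")
    return TOKEN_RE.sub(lambda m: pseudonyms[m.group(0)], text)
```

In this sketch the pseudonym map never leaves the caller: only the first return value of `scrub` would be sent to a remote model, and `rehydrate` is applied locally to whatever comes back.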
@@ -59,9 +59,11 @@ try:
     sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "mcp"))
     import architect_tools as _architect_tools  # type: ignore
     import architect_agent as _architect_agent  # type: ignore
+    import architect_grounding as _architect_grounding  # type: ignore
 except Exception:
     _architect_tools = None
     _architect_agent = None
+    _architect_grounding = None

 # ─── Configuration ────────────────────────────────────────────────────────────

@@ -1894,6 +1896,8 @@ class CRMHandler(BaseHTTPRequestHandler):
             return self.handle_generate_options(user, path.split('/')[-2], body)
         if re.match(r'^/api/thesis/nodes/[^/]+/feedback$', path):
             return self.handle_node_feedback(user, path.split('/')[-2], body)
+        if path == '/api/architect/ground':
+            return self.handle_architect_ground(user, body)
         if path == '/api/thesis/lines':
             return self.handle_create_thesis_line(user, body)
         if re.match(r'^/api/thesis/lines/[^/]+/nodes$', path):
@@ -3701,6 +3705,44 @@ class CRMHandler(BaseHTTPRequestHandler):
             return self.send_error_json(res.get('raw') or res['error'], 502)
         return self.send_json({"data": res})

+    def _ground_feedback_corpus(self, conn, limit=60):
+        """Raw LP-feedback prose for grounding (communications + grid notes). Sensitive
+        Tier-2-heavy text; ONLY ever passed into the redaction boundary, never to Claude
+        directly."""
+        items = []
+        for q in ("SELECT body FROM communications WHERE body IS NOT NULL AND TRIM(body)<>'' ORDER BY communication_date DESC LIMIT ?",
+                  "SELECT notes FROM fundraising_investors WHERE notes IS NOT NULL AND TRIM(notes)<>'' LIMIT ?"):
+            try:
+                items += [r[0] for r in conn.execute(q, (limit,))]
+            except Exception:
+                pass
+        return items[:limit]
+
+    def handle_architect_ground(self, user, body):
+        """Ground an objection register in real LP feedback THROUGH the redaction boundary
+        (Workstream D). Retrieval + minimization + scrub stay local; only the de-identified
+        register reaches Claude; the re-hydrated draft is for human review (guardrail #4)."""
+        if not require_admin(user):
+            return self.send_error_json("Admin required", 403)
+        if _architect_grounding is None:
+            return self.send_error_json("Unavailable", 503)
+        body = body or {}
+        segment_key = body.get('segment_key')
+        feedback = body.get('feedback_items')
+        conn = get_db()
+        try:
+            if not feedback:
+                feedback = self._ground_feedback_corpus(conn)
+            if not feedback:
+                return self.send_error_json("No LP feedback found to ground against", 404)
+            res = _architect_grounding.ground_objections(feedback, segment_key=segment_key,
+                                                         db_path=DB_PATH, conn=conn)
+        except Exception as exc:
+            return self.send_error_json(str(exc), 502)
+        finally:
+            conn.close()
+        return self.send_json({"data": res})
+
     # ─── Architect thesis (Phase 1) ───
     def handle_list_thesis_lines(self, user):
         if thesis_review is None:
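
For orientation, a caller of the new route might build its request roughly as follows. This is a hedged sketch, not part of the commit: the helper name `build_ground_request`, the base URL, and the bearer-token header are hypothetical, since the diff does not show how authentication is carried. The path `/api/architect/ground`, the body fields `segment_key` and `feedback_items`, and the `{"data": ...}` response envelope are taken from the handler above. The feedback string is synthetic.

```python
import json
import urllib.request


def build_ground_request(base_url, feedback_items=None, segment_key=None, auth_token=None):
    """Build (but do not send) a POST request for /api/architect/ground.

    If feedback_items is omitted or empty, the handler in the diff falls back
    to its own local corpus query.
    """
    payload = {}
    if segment_key is not None:
        payload["segment_key"] = segment_key
    if feedback_items:
        payload["feedback_items"] = list(feedback_items)

    headers = {"Content-Type": "application/json"}
    if auth_token:
        # Hypothetical auth scheme; the real server may use cookies or another header.
        headers["Authorization"] = f"Bearer {auth_token}"

    return urllib.request.Request(
        base_url.rstrip("/") + "/api/architect/ground",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Usage (not executed here): send with urllib.request.urlopen(req) and read
# json.load(response)["data"]. Per the handler, expect 403 for non-admin users,
# 503 if the grounding module failed to import, 404 if no feedback is found,
# and 502 if grounding raises.
req = build_ground_request(
    "http://localhost:8000",  # hypothetical host and port
    feedback_items=["Synthetic LP says the fee structure is unclear."],
    segment_key="example-segment",  # hypothetical segment key
)
```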