Architect grounding boundary: redaction/re-hydration privacy gate (v0.1.0:55)

Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP
objections without any LP identity reaching the Claude API. Layered, defense-in-depth,
fail-closed by construction (docs/redaction-rehydration.md).

backend/redaction/:
- scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured account/wire/SSN/
  IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP entities (dictionary from
  the canonical layer, unicode-folded + hyphen-extended) and structured PII (emails,
  scheme-less/social URLs, intl+ext phones, currency-cued amounts, ISO/worded/numeric/
  quarter dates, addresses, bare long digit runs); pre-neutralizes injected [TYPE_N]
  strings; single-pass rehydrate; metadata-only audit logging (the pseudonym map is the
  de-anon key — local-only, never logged/sent). Hardened across THREE adversarial
  leak-hunts (worded/coded amounts, intl phones, NFD/ligature/zero-width names, slash/
  comma SSN, SWIFT, alpha-prefixed accounts, substance-preserving false-positive fixes).
- client.py: Boundary — one scrub/rehydrate contract, SCRUB_BACKEND=local (default) or
  gateway (Spark Control /scrub + /rehydrate). Fails closed (db_path required; dictionary
  build errors propagate; strict rehydrate returns tokenized-not-de-anon text).
- test_scrub_leak.py, test_reidentification.py: golden-file leak + re-identification
  suites (synthetic only, guardrail #9), regression-locking every leak-hunt vector.

backend/mcp/architect_grounding.py: the flow — retrieve (local) -> minimize-first
(local Qwen) -> scrub (+ local-Qwen NER backstop for unknown names) -> Claude over the
de-identified register only -> re-hydrate locally -> human review. FAILS CLOSED if the
local model is unreachable or a hallucinated token appears. test_grounding_boundary.py
proves nothing sensitive reaches Claude and the three fail-closed paths.

server.py: POST /api/architect/ground (admin) wires retrieval -> ground_objections.
docker_entrypoint.sh: SCRUB_BACKEND (default local). docs/spark-control-scrub-endpoints.md:
the gateway handover spec (Option 1 — caller supplies the entity dictionary).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Keysat
2026-06-05 17:06:29 -05:00
parent 300041a7ec
commit 2e70b34592
12 changed files with 1371 additions and 4 deletions
+141
View File
@@ -0,0 +1,141 @@
"""Architect grounding (Phase 1 Workstream D) — ground the thesis in real LP feedback
WITHOUT leaking identity, by routing the Claude-facing synthesis through the redaction
boundary. The corpus is a defensibility oracle, not a generator.
Flow (redaction-rehydration.md):
retrieve (local CRM)
-> MINIMIZE local Qwen condenses raw feedback into anonymous objection THEMES
("first ask if Claude needs the record content at all")
-> SCRUB tokenize any residual identifiers (defense-in-depth, leak-proof core)
-> REASON Claude synthesizes the strongest honest rebuttals on the DE-IDENTIFIED
register (placeholders only)
-> RE-HYDRATE map placeholders back locally
-> HUMAN a partner reviews before anything is used (guardrail #4).
The Architect only drafts; nothing is sent outbound. Every scrub/rehydrate is logged
(counts only) to interaction_log. minimize_fn / claude_fn are injectable so the boundary
enforcement is testable offline without Spark or Claude.
"""
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "redaction"))
from client import Boundary # noqa: E402
_MINIMIZE_SYSTEM = (
"You are a local privacy-preserving summarizer running on Ten31's own infrastructure. "
"Given raw notes about investor conversations, extract ONLY the recurring objections, "
"concerns, and sentiment as generic themes. STRIP every identifier: no names, firms, "
"emails, amounts, dates, or places. Output a short de-identified bullet list of distinct "
"objection themes with a rough frequency. Never include who said it.")
_NER_SYSTEM = (
"You are a local NER extractor running on Ten31's own infrastructure. Return STRICT "
"JSON only: {\"entities\":[{\"text\":\"...\",\"type\":\"PERSON|ORG|LOC\"}]}. Extract every "
"person name, organization/firm/fund name, and specific location mentioned. Include "
"partial names and nicknames. No commentary, JSON only.")
def _minimize_local(feedback_items, segment_key):
"""Condense raw LP feedback into anonymous objection themes on the LOCAL model
(Spark Control /v1/chat/completions). RAISES if the local model is unreachable — the
caller must then FAIL CLOSED, because without the local model neither this nor the
NER backstop can run, and raw prose must never reach Claude unprotected."""
raw = "\n".join(f"- {t}" for t in feedback_items if t)
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "ingest"))
import llm # noqa: E402
out = llm.chat(f"Segment: {segment_key or 'all'}\nRaw notes:\n{raw}\n\nDe-identified objection themes:",
system=_MINIMIZE_SYSTEM, max_tokens=500)
if not (out or "").strip():
raise RuntimeError("local minimize produced no output")
return out
def _ner_local(text):
"""Local-Qwen NER backstop for UNKNOWN names not in the CRM dictionary. Returns
[(surface, type)]. RAISES on a connection failure (caller fails closed); an empty
result (model reachable, found nothing) is fine."""
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "ingest"))
import llm # noqa: E402
data = llm.chat_json(f"Text:\n{text}\n\nEntities JSON:", system=_NER_SYSTEM, max_tokens=700)
if not data:
return []
return [(e.get("text"), e.get("type")) for e in (data.get("entities") or []) if e.get("text")]
def _synthesize_claude(register_text, segment_key):
"""Claude drafts the strongest honest rebuttals over the DE-IDENTIFIED register.
Reuses the Architect's Claude client. Raises if not configured (handled by caller)."""
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import architect_agent as aa # noqa: E402
client = aa._client()
user = (
# NB: the raw segment_key is intentionally NOT interpolated here — it is an
# admin free-text field and must not become an un-scrubbed channel to Claude.
"Below is a DE-IDENTIFIED register of recurring LP objections for one of our LP "
"segments. Every identifier is a placeholder like [PERSON_1] or "
"[ORG_1]; keep all placeholders intact and never invent new ones.\n\n"
f"{register_text}\n\n"
"For each distinct objection, draft Ten31's strongest HONEST rebuttal grounded in our "
"thesis, and flag each as 'substantiated' or 'hand-wavy'. Also do a brief counter-evidence "
"sweep (where might the objection be right?). Return readable markdown.")
resp = client.messages.create(
model=aa.MODEL, max_tokens=2000,
system=[{"type": "text",
"text": "You are the Architect, pressure-testing Ten31's thesis against real LP "
"objections. " + aa.VOICE,
"cache_control": {"type": "ephemeral"}}],
messages=[{"role": "user", "content": user}])
return "".join(b.text for b in resp.content if getattr(b, "type", None) == "text")
def ground_objections(feedback_items, segment_key=None, db_path=None, actor="architect",
minimize_fn=None, claude_fn=None, ner_fn="default", conn=None):
"""Build a grounded objection register through the redaction boundary.
FAILS CLOSED at every step: no Claude call happens unless the local minimize ran and
the scrub (dict + regex + NER backstop) succeeded; a hallucinated token in Claude's
output quarantines the draft. Always flagged needs_human_review (guardrail #4)."""
minimize_fn = minimize_fn or _minimize_local
claude_fn = claude_fn or _synthesize_claude
if ner_fn == "default":
ner_fn = _ner_local # local-Qwen backstop for unknown names (None disables it, for tests)
base = {"segment_key": segment_key, "needs_human_review": True}
# 1) MINIMIZE locally. If the local model is unreachable -> FAIL CLOSED (no raw onward).
try:
themes = minimize_fn(feedback_items, segment_key)
except Exception as exc:
return {**base, "status": "local_model_unavailable", "reason": str(exc)}
# 2) SCRUB. The Boundary fails closed if db_path is missing or the dictionary can't be
# built, and the NER backstop propagates a local-model failure here too.
try:
boundary = Boundary(db_path=db_path, actor=actor, ner_fn=ner_fn)
# bucket=False: tokenize amounts/dates REVERSIBLY. The objection register needs no
# magnitudes, so opaque placeholders both remove any inference signal AND avoid the
# irreversible substance corruption a coarse bucket could introduce on a misparse.
scrubbed = boundary.scrub([themes], task_id=f"ground:{segment_key or 'all'}", bucket=False, conn=conn)
except Exception as exc:
return {**base, "status": "scrub_unavailable", "reason": str(exc)}
register = scrubbed["items"][0]
handle = scrubbed["handle"]
out = {**base, "scrubbed_register": register, "scrub_stats": scrubbed["stats"]}
# 3) REASON on Claude over the de-identified register only.
try:
draft = claude_fn(register, segment_key)
except Exception as exc:
boundary.forget(handle)
return {**out, "status": "claude_not_configured", "reason": str(exc)}
# 4) RE-HYDRATE (strict). A residual placeholder = a Claude-hallucinated/smuggled token
# -> HARD STOP: quarantine the draft, never surface a de-anonymized version.
rehy = boundary.rehydrate(draft, handle, strict=True, conn=conn)
boundary.forget(handle)
if rehy.get("error"):
return {**out, "status": "rehydrate_failed", "draft_quarantined": True,
"unknown_tokens": rehy.get("unknown_tokens")}
return {**out, "status": "ok", "draft": rehy["text"]}
+126
View File
@@ -0,0 +1,126 @@
#!/usr/bin/env python3
"""Boundary-enforcement + FAIL-CLOSED test for the Architect grounding flow.
Proves: (1) whatever reaches Claude is de-identified and the draft is re-hydrated locally;
(2) the local-Qwen NER backstop tokenizes UNKNOWN names not in the CRM dictionary;
(3) the flow FAILS CLOSED — no Claude call when the local model is unavailable, the scrub
refuses without a db_path, and a Claude-hallucinated token quarantines the draft.
Offline + synthetic (guardrail #9): minimize, Claude, and NER are injected as stubs.
Run: cd backend && python3 mcp/test_grounding_boundary.py
"""
import os
import sqlite3
import sys
import tempfile
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import architect_grounding as G # noqa: E402
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "redaction"))
from client import Boundary # noqa: E402
FAILS = []
def check(cond, msg):
print((" PASS " if cond else " FAIL ") + msg)
if not cond:
FAILS.append(msg)
def make_db():
db = os.path.join(tempfile.mkdtemp(), "crm.db")
c = sqlite3.connect(db)
c.executescript("""
CREATE TABLE canonical_entities (id TEXT PRIMARY KEY, entity_kind TEXT, display_name TEXT, primary_email TEXT, deleted_at TEXT);
CREATE TABLE contacts (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT, deleted_at TEXT);
CREATE TABLE interaction_log (id TEXT PRIMARY KEY, ts TEXT, actor_type TEXT, actor_id TEXT, action TEXT,
target_type TEXT, target_id TEXT, payload TEXT, source TEXT, created_at TEXT);
""")
c.execute("INSERT INTO canonical_entities VALUES ('per_1','person','Jonathan Reyes','jon@cedarpoint.example',NULL)")
c.execute("INSERT INTO canonical_entities VALUES ('inv_1','investor','Cedar Point Capital',NULL,NULL)")
c.execute("INSERT INTO contacts VALUES ('c1','Jonathan','Reyes','jon@cedarpoint.example',NULL)")
c.commit()
c.close()
return db
FEEDBACK = [
"Jonathan Reyes at Cedar Point Capital (jon@cedarpoint.example) is cooling; Reyes wants better terms "
"and a $5,000,000 minimum. Wire acct 000123456789 flagged. Objection: fee load and lock-up.",
"Another LP echoed the lock-up concern and questioned the energy thesis timeline.",
]
SENSITIVE = ["Jonathan Reyes", "Reyes", "Cedar Point Capital", "jon@cedarpoint.example", "$5,000,000", "000123456789"]
def main():
db = make_db()
conn = sqlite3.connect(db)
passthrough = lambda items, seg: "\n".join(items) # worst case: no minimization
# ── A) deterministic enforcement (NER off): nothing sensitive reaches Claude ──
print("\n[A — deterministic enforcement]")
captured = {}
res = G.ground_objections(FEEDBACK, segment_key="institution", db_path=db, conn=conn,
minimize_fn=passthrough, ner_fn=None,
claude_fn=lambda reg, seg: (captured.__setitem__("sent", reg), reg)[1])
check(res.get("status") == "ok", f"grounding ok (status={res.get('status')})")
for v in SENSITIVE:
check(v not in captured.get("sent", ""), f"de-identified payload to Claude has NO {v!r}")
check("fee load" in captured.get("sent", ""), "objection substance survives to Claude")
check("000123456789" not in res.get("draft", ""), "Tier-1 account number never re-hydrated")
check("Jonathan Reyes" in res.get("draft", ""), "rehydrate restored real Tier-2 values locally")
blob = " ".join(r[0] for r in conn.execute("SELECT payload FROM interaction_log WHERE action LIKE 'redaction.%'"))
check(all(v not in blob for v in SENSITIVE), "interaction_log carries NO sensitive value")
# ── B) NER backstop tokenizes an UNKNOWN name not in the CRM dictionary ──
print("\n[B — NER backstop for unknown names]")
cap2 = {}
fb = ["New intro: Penelope Ashworth-Vane runs a family office and is cooling on the lock-up."]
ner_stub = lambda text: [("Penelope Ashworth-Vane", "PERSON")]
res2 = G.ground_objections(fb, db_path=db, conn=conn, minimize_fn=passthrough, ner_fn=ner_stub,
claude_fn=lambda reg, seg: (cap2.__setitem__("sent", reg), reg)[1])
check("Penelope Ashworth-Vane" not in cap2.get("sent", ""), "unknown name tokenized by NER backstop (absent from Claude payload)")
check("Penelope Ashworth-Vane" in res2.get("draft", ""), "unknown name re-hydrated locally for the human")
# ── C) FAIL CLOSED: local model unavailable -> no Claude call ──
print("\n[C — fail closed: local model down]")
called = {"claude": False}
def boom(items, seg):
raise RuntimeError("Spark Control unreachable")
res3 = G.ground_objections(FEEDBACK, db_path=db, conn=conn, minimize_fn=boom, ner_fn=None,
claude_fn=lambda reg, seg: called.__setitem__("claude", True) or reg)
check(res3.get("status") == "local_model_unavailable", f"status local_model_unavailable (got {res3.get('status')})")
check(called["claude"] is False, "Claude was NOT called when minimize failed (fail closed)")
# ── D) FAIL CLOSED: a Claude-hallucinated token quarantines the draft ──
print("\n[D — fail closed: hallucinated token]")
res4 = G.ground_objections(FEEDBACK, db_path=db, conn=conn, minimize_fn=passthrough, ner_fn=None,
claude_fn=lambda reg, seg: reg + " Also loop in [PERSON_99].")
check(res4.get("status") == "rehydrate_failed", f"status rehydrate_failed (got {res4.get('status')})")
check("draft" not in res4 and res4.get("draft_quarantined"), "de-anonymized draft quarantined, not returned")
# ── E) FAIL CLOSED: local scrub backend requires a db_path ──
print("\n[E — fail closed: missing db_path]")
raised = False
try:
Boundary(db_path=None, backend="local")
except ValueError:
raised = True
check(raised, "Boundary(local) without db_path raises (never runs name-blind)")
res5 = G.ground_objections(FEEDBACK, db_path=None, conn=conn, minimize_fn=passthrough, ner_fn=None,
claude_fn=lambda reg, seg: reg)
check(res5.get("status") == "scrub_unavailable", f"grounding fails closed without db_path (got {res5.get('status')})")
conn.close()
print()
if FAILS:
print(f"FAILED ({len(FAILS)}):")
for f in FAILS:
print(f" - {f}")
sys.exit(1)
print("ALL PASS (grounding boundary enforcement + fail-closed)")
if __name__ == "__main__":
main()