Architect grounding boundary: redaction/re-hydration privacy gate (v0.1.0:55)
Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP objections without any LP identity reaching the Claude API. Layered, defense-in-depth, fail-closed by construction (docs/redaction-rehydration.md). backend/redaction/: - scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured account/wire/SSN/ IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP entities (dictionary from the canonical layer, unicode-folded + hyphen-extended) and structured PII (emails, scheme-less/social URLs, intl+ext phones, currency-cued amounts, ISO/worded/numeric/ quarter dates, addresses, bare long digit runs); pre-neutralizes injected [TYPE_N] strings; single-pass rehydrate; metadata-only audit logging (the pseudonym map is the de-anon key — local-only, never logged/sent). Hardened across THREE adversarial leak-hunts (worded/coded amounts, intl phones, NFD/ligature/zero-width names, slash/ comma SSN, SWIFT, alpha-prefixed accounts, substance-preserving false-positive fixes). - client.py: Boundary — one scrub/rehydrate contract, SCRUB_BACKEND=local (default) or gateway (Spark Control /scrub + /rehydrate). Fails closed (db_path required; dictionary build errors propagate; strict rehydrate returns tokenized-not-de-anon text). - test_scrub_leak.py, test_reidentification.py: golden-file leak + re-identification suites (synthetic only, guardrail #9), regression-locking every leak-hunt vector. backend/mcp/architect_grounding.py: the flow — retrieve (local) -> minimize-first (local Qwen) -> scrub (+ local-Qwen NER backstop for unknown names) -> Claude over the de-identified register only -> re-hydrate locally -> human review. FAILS CLOSED if the local model is unreachable or a hallucinated token appears. test_grounding_boundary.py proves nothing sensitive reaches Claude and the three fail-closed paths. server.py: POST /api/architect/ground (admin) wires retrieval -> ground_objections. docker_entrypoint.sh: SCRUB_BACKEND (default local). docs/spark-control-scrub-endpoints.md: the gateway handover spec (Option 1 — caller supplies the entity dictionary). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,141 @@
|
||||
"""Architect grounding (Phase 1 Workstream D) — ground the thesis in real LP feedback
|
||||
WITHOUT leaking identity, by routing the Claude-facing synthesis through the redaction
|
||||
boundary. The corpus is a defensibility oracle, not a generator.
|
||||
|
||||
Flow (redaction-rehydration.md):
|
||||
retrieve (local CRM)
|
||||
-> MINIMIZE local Qwen condenses raw feedback into anonymous objection THEMES
|
||||
("first ask if Claude needs the record content at all")
|
||||
-> SCRUB tokenize any residual identifiers (defense-in-depth, leak-proof core)
|
||||
-> REASON Claude synthesizes the strongest honest rebuttals on the DE-IDENTIFIED
|
||||
register (placeholders only)
|
||||
-> RE-HYDRATE map placeholders back locally
|
||||
-> HUMAN a partner reviews before anything is used (guardrail #4).
|
||||
|
||||
The Architect only drafts; nothing is sent outbound. Every scrub/rehydrate is logged
|
||||
(counts only) to interaction_log. minimize_fn / claude_fn are injectable so the boundary
|
||||
enforcement is testable offline without Spark or Claude.
|
||||
"""
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "redaction"))
|
||||
from client import Boundary # noqa: E402
|
||||
|
||||
_MINIMIZE_SYSTEM = (
|
||||
"You are a local privacy-preserving summarizer running on Ten31's own infrastructure. "
|
||||
"Given raw notes about investor conversations, extract ONLY the recurring objections, "
|
||||
"concerns, and sentiment as generic themes. STRIP every identifier: no names, firms, "
|
||||
"emails, amounts, dates, or places. Output a short de-identified bullet list of distinct "
|
||||
"objection themes with a rough frequency. Never include who said it.")
|
||||
|
||||
|
||||
_NER_SYSTEM = (
|
||||
"You are a local NER extractor running on Ten31's own infrastructure. Return STRICT "
|
||||
"JSON only: {\"entities\":[{\"text\":\"...\",\"type\":\"PERSON|ORG|LOC\"}]}. Extract every "
|
||||
"person name, organization/firm/fund name, and specific location mentioned. Include "
|
||||
"partial names and nicknames. No commentary, JSON only.")
|
||||
|
||||
|
||||
def _minimize_local(feedback_items, segment_key):
|
||||
"""Condense raw LP feedback into anonymous objection themes on the LOCAL model
|
||||
(Spark Control /v1/chat/completions). RAISES if the local model is unreachable — the
|
||||
caller must then FAIL CLOSED, because without the local model neither this nor the
|
||||
NER backstop can run, and raw prose must never reach Claude unprotected."""
|
||||
raw = "\n".join(f"- {t}" for t in feedback_items if t)
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "ingest"))
|
||||
import llm # noqa: E402
|
||||
out = llm.chat(f"Segment: {segment_key or 'all'}\nRaw notes:\n{raw}\n\nDe-identified objection themes:",
|
||||
system=_MINIMIZE_SYSTEM, max_tokens=500)
|
||||
if not (out or "").strip():
|
||||
raise RuntimeError("local minimize produced no output")
|
||||
return out
|
||||
|
||||
|
||||
def _ner_local(text):
|
||||
"""Local-Qwen NER backstop for UNKNOWN names not in the CRM dictionary. Returns
|
||||
[(surface, type)]. RAISES on a connection failure (caller fails closed); an empty
|
||||
result (model reachable, found nothing) is fine."""
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "ingest"))
|
||||
import llm # noqa: E402
|
||||
data = llm.chat_json(f"Text:\n{text}\n\nEntities JSON:", system=_NER_SYSTEM, max_tokens=700)
|
||||
if not data:
|
||||
return []
|
||||
return [(e.get("text"), e.get("type")) for e in (data.get("entities") or []) if e.get("text")]
|
||||
|
||||
|
||||
def _synthesize_claude(register_text, segment_key):
|
||||
"""Claude drafts the strongest honest rebuttals over the DE-IDENTIFIED register.
|
||||
Reuses the Architect's Claude client. Raises if not configured (handled by caller)."""
|
||||
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
|
||||
import architect_agent as aa # noqa: E402
|
||||
client = aa._client()
|
||||
user = (
|
||||
# NB: the raw segment_key is intentionally NOT interpolated here — it is an
|
||||
# admin free-text field and must not become an un-scrubbed channel to Claude.
|
||||
"Below is a DE-IDENTIFIED register of recurring LP objections for one of our LP "
|
||||
"segments. Every identifier is a placeholder like [PERSON_1] or "
|
||||
"[ORG_1]; keep all placeholders intact and never invent new ones.\n\n"
|
||||
f"{register_text}\n\n"
|
||||
"For each distinct objection, draft Ten31's strongest HONEST rebuttal grounded in our "
|
||||
"thesis, and flag each as 'substantiated' or 'hand-wavy'. Also do a brief counter-evidence "
|
||||
"sweep (where might the objection be right?). Return readable markdown.")
|
||||
resp = client.messages.create(
|
||||
model=aa.MODEL, max_tokens=2000,
|
||||
system=[{"type": "text",
|
||||
"text": "You are the Architect, pressure-testing Ten31's thesis against real LP "
|
||||
"objections. " + aa.VOICE,
|
||||
"cache_control": {"type": "ephemeral"}}],
|
||||
messages=[{"role": "user", "content": user}])
|
||||
return "".join(b.text for b in resp.content if getattr(b, "type", None) == "text")
|
||||
|
||||
|
||||
def ground_objections(feedback_items, segment_key=None, db_path=None, actor="architect",
|
||||
minimize_fn=None, claude_fn=None, ner_fn="default", conn=None):
|
||||
"""Build a grounded objection register through the redaction boundary.
|
||||
|
||||
FAILS CLOSED at every step: no Claude call happens unless the local minimize ran and
|
||||
the scrub (dict + regex + NER backstop) succeeded; a hallucinated token in Claude's
|
||||
output quarantines the draft. Always flagged needs_human_review (guardrail #4)."""
|
||||
minimize_fn = minimize_fn or _minimize_local
|
||||
claude_fn = claude_fn or _synthesize_claude
|
||||
if ner_fn == "default":
|
||||
ner_fn = _ner_local # local-Qwen backstop for unknown names (None disables it, for tests)
|
||||
|
||||
base = {"segment_key": segment_key, "needs_human_review": True}
|
||||
|
||||
# 1) MINIMIZE locally. If the local model is unreachable -> FAIL CLOSED (no raw onward).
|
||||
try:
|
||||
themes = minimize_fn(feedback_items, segment_key)
|
||||
except Exception as exc:
|
||||
return {**base, "status": "local_model_unavailable", "reason": str(exc)}
|
||||
|
||||
# 2) SCRUB. The Boundary fails closed if db_path is missing or the dictionary can't be
|
||||
# built, and the NER backstop propagates a local-model failure here too.
|
||||
try:
|
||||
boundary = Boundary(db_path=db_path, actor=actor, ner_fn=ner_fn)
|
||||
# bucket=False: tokenize amounts/dates REVERSIBLY. The objection register needs no
|
||||
# magnitudes, so opaque placeholders both remove any inference signal AND avoid the
|
||||
# irreversible substance corruption a coarse bucket could introduce on a misparse.
|
||||
scrubbed = boundary.scrub([themes], task_id=f"ground:{segment_key or 'all'}", bucket=False, conn=conn)
|
||||
except Exception as exc:
|
||||
return {**base, "status": "scrub_unavailable", "reason": str(exc)}
|
||||
register = scrubbed["items"][0]
|
||||
handle = scrubbed["handle"]
|
||||
out = {**base, "scrubbed_register": register, "scrub_stats": scrubbed["stats"]}
|
||||
|
||||
# 3) REASON on Claude over the de-identified register only.
|
||||
try:
|
||||
draft = claude_fn(register, segment_key)
|
||||
except Exception as exc:
|
||||
boundary.forget(handle)
|
||||
return {**out, "status": "claude_not_configured", "reason": str(exc)}
|
||||
|
||||
# 4) RE-HYDRATE (strict). A residual placeholder = a Claude-hallucinated/smuggled token
|
||||
# -> HARD STOP: quarantine the draft, never surface a de-anonymized version.
|
||||
rehy = boundary.rehydrate(draft, handle, strict=True, conn=conn)
|
||||
boundary.forget(handle)
|
||||
if rehy.get("error"):
|
||||
return {**out, "status": "rehydrate_failed", "draft_quarantined": True,
|
||||
"unknown_tokens": rehy.get("unknown_tokens")}
|
||||
return {**out, "status": "ok", "draft": rehy["text"]}
|
||||
@@ -0,0 +1,126 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Boundary-enforcement + FAIL-CLOSED test for the Architect grounding flow.
|
||||
|
||||
Proves: (1) whatever reaches Claude is de-identified and the draft is re-hydrated locally;
|
||||
(2) the local-Qwen NER backstop tokenizes UNKNOWN names not in the CRM dictionary;
|
||||
(3) the flow FAILS CLOSED — no Claude call when the local model is unavailable, the scrub
|
||||
refuses without a db_path, and a Claude-hallucinated token quarantines the draft.
|
||||
Offline + synthetic (guardrail #9): minimize, Claude, and NER are injected as stubs.
|
||||
|
||||
Run: cd backend && python3 mcp/test_grounding_boundary.py
|
||||
"""
|
||||
import os
|
||||
import sqlite3
|
||||
import sys
|
||||
import tempfile
|
||||
|
||||
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
|
||||
import architect_grounding as G # noqa: E402
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "redaction"))
|
||||
from client import Boundary # noqa: E402
|
||||
|
||||
FAILS = []
|
||||
|
||||
|
||||
def check(cond, msg):
|
||||
print((" PASS " if cond else " FAIL ") + msg)
|
||||
if not cond:
|
||||
FAILS.append(msg)
|
||||
|
||||
|
||||
def make_db():
|
||||
db = os.path.join(tempfile.mkdtemp(), "crm.db")
|
||||
c = sqlite3.connect(db)
|
||||
c.executescript("""
|
||||
CREATE TABLE canonical_entities (id TEXT PRIMARY KEY, entity_kind TEXT, display_name TEXT, primary_email TEXT, deleted_at TEXT);
|
||||
CREATE TABLE contacts (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT, deleted_at TEXT);
|
||||
CREATE TABLE interaction_log (id TEXT PRIMARY KEY, ts TEXT, actor_type TEXT, actor_id TEXT, action TEXT,
|
||||
target_type TEXT, target_id TEXT, payload TEXT, source TEXT, created_at TEXT);
|
||||
""")
|
||||
c.execute("INSERT INTO canonical_entities VALUES ('per_1','person','Jonathan Reyes','jon@cedarpoint.example',NULL)")
|
||||
c.execute("INSERT INTO canonical_entities VALUES ('inv_1','investor','Cedar Point Capital',NULL,NULL)")
|
||||
c.execute("INSERT INTO contacts VALUES ('c1','Jonathan','Reyes','jon@cedarpoint.example',NULL)")
|
||||
c.commit()
|
||||
c.close()
|
||||
return db
|
||||
|
||||
|
||||
FEEDBACK = [
|
||||
"Jonathan Reyes at Cedar Point Capital (jon@cedarpoint.example) is cooling; Reyes wants better terms "
|
||||
"and a $5,000,000 minimum. Wire acct 000123456789 flagged. Objection: fee load and lock-up.",
|
||||
"Another LP echoed the lock-up concern and questioned the energy thesis timeline.",
|
||||
]
|
||||
SENSITIVE = ["Jonathan Reyes", "Reyes", "Cedar Point Capital", "jon@cedarpoint.example", "$5,000,000", "000123456789"]
|
||||
|
||||
|
||||
def main():
|
||||
db = make_db()
|
||||
conn = sqlite3.connect(db)
|
||||
passthrough = lambda items, seg: "\n".join(items) # worst case: no minimization
|
||||
|
||||
# ── A) deterministic enforcement (NER off): nothing sensitive reaches Claude ──
|
||||
print("\n[A — deterministic enforcement]")
|
||||
captured = {}
|
||||
res = G.ground_objections(FEEDBACK, segment_key="institution", db_path=db, conn=conn,
|
||||
minimize_fn=passthrough, ner_fn=None,
|
||||
claude_fn=lambda reg, seg: (captured.__setitem__("sent", reg), reg)[1])
|
||||
check(res.get("status") == "ok", f"grounding ok (status={res.get('status')})")
|
||||
for v in SENSITIVE:
|
||||
check(v not in captured.get("sent", ""), f"de-identified payload to Claude has NO {v!r}")
|
||||
check("fee load" in captured.get("sent", ""), "objection substance survives to Claude")
|
||||
check("000123456789" not in res.get("draft", ""), "Tier-1 account number never re-hydrated")
|
||||
check("Jonathan Reyes" in res.get("draft", ""), "rehydrate restored real Tier-2 values locally")
|
||||
blob = " ".join(r[0] for r in conn.execute("SELECT payload FROM interaction_log WHERE action LIKE 'redaction.%'"))
|
||||
check(all(v not in blob for v in SENSITIVE), "interaction_log carries NO sensitive value")
|
||||
|
||||
# ── B) NER backstop tokenizes an UNKNOWN name not in the CRM dictionary ──
|
||||
print("\n[B — NER backstop for unknown names]")
|
||||
cap2 = {}
|
||||
fb = ["New intro: Penelope Ashworth-Vane runs a family office and is cooling on the lock-up."]
|
||||
ner_stub = lambda text: [("Penelope Ashworth-Vane", "PERSON")]
|
||||
res2 = G.ground_objections(fb, db_path=db, conn=conn, minimize_fn=passthrough, ner_fn=ner_stub,
|
||||
claude_fn=lambda reg, seg: (cap2.__setitem__("sent", reg), reg)[1])
|
||||
check("Penelope Ashworth-Vane" not in cap2.get("sent", ""), "unknown name tokenized by NER backstop (absent from Claude payload)")
|
||||
check("Penelope Ashworth-Vane" in res2.get("draft", ""), "unknown name re-hydrated locally for the human")
|
||||
|
||||
# ── C) FAIL CLOSED: local model unavailable -> no Claude call ──
|
||||
print("\n[C — fail closed: local model down]")
|
||||
called = {"claude": False}
|
||||
def boom(items, seg):
|
||||
raise RuntimeError("Spark Control unreachable")
|
||||
res3 = G.ground_objections(FEEDBACK, db_path=db, conn=conn, minimize_fn=boom, ner_fn=None,
|
||||
claude_fn=lambda reg, seg: called.__setitem__("claude", True) or reg)
|
||||
check(res3.get("status") == "local_model_unavailable", f"status local_model_unavailable (got {res3.get('status')})")
|
||||
check(called["claude"] is False, "Claude was NOT called when minimize failed (fail closed)")
|
||||
|
||||
# ── D) FAIL CLOSED: a Claude-hallucinated token quarantines the draft ──
|
||||
print("\n[D — fail closed: hallucinated token]")
|
||||
res4 = G.ground_objections(FEEDBACK, db_path=db, conn=conn, minimize_fn=passthrough, ner_fn=None,
|
||||
claude_fn=lambda reg, seg: reg + " Also loop in [PERSON_99].")
|
||||
check(res4.get("status") == "rehydrate_failed", f"status rehydrate_failed (got {res4.get('status')})")
|
||||
check("draft" not in res4 and res4.get("draft_quarantined"), "de-anonymized draft quarantined, not returned")
|
||||
|
||||
# ── E) FAIL CLOSED: local scrub backend requires a db_path ──
|
||||
print("\n[E — fail closed: missing db_path]")
|
||||
raised = False
|
||||
try:
|
||||
Boundary(db_path=None, backend="local")
|
||||
except ValueError:
|
||||
raised = True
|
||||
check(raised, "Boundary(local) without db_path raises (never runs name-blind)")
|
||||
res5 = G.ground_objections(FEEDBACK, db_path=None, conn=conn, minimize_fn=passthrough, ner_fn=None,
|
||||
claude_fn=lambda reg, seg: reg)
|
||||
check(res5.get("status") == "scrub_unavailable", f"grounding fails closed without db_path (got {res5.get('status')})")
|
||||
|
||||
conn.close()
|
||||
print()
|
||||
if FAILS:
|
||||
print(f"FAILED ({len(FAILS)}):")
|
||||
for f in FAILS:
|
||||
print(f" - {f}")
|
||||
sys.exit(1)
|
||||
print("ALL PASS (grounding boundary enforcement + fail-closed)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Reference in New Issue
Block a user