Keysat 2e70b34592 Architect grounding boundary: redaction/re-hydration privacy gate (v0.1.0:55)
Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP
objections without any LP identity reaching the Claude API. Layered, defense-in-depth,
fail-closed by construction (docs/redaction-rehydration.md).

backend/redaction/:
- scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured account/wire/SSN/
  IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP entities (dictionary from
  the canonical layer, unicode-folded + hyphen-extended) and structured PII (emails,
  scheme-less/social URLs, intl+ext phones, currency-cued amounts, ISO/worded/numeric/
  quarter dates, addresses, bare long digit runs); pre-neutralizes injected [TYPE_N]
  strings; single-pass rehydrate; metadata-only audit logging (the pseudonym map is the
  de-anon key — local-only, never logged/sent). Hardened across THREE adversarial
  leak-hunts (worded/coded amounts, intl phones, NFD/ligature/zero-width names, slash/
  comma SSN, SWIFT, alpha-prefixed accounts, substance-preserving false-positive fixes).
- client.py: Boundary — one scrub/rehydrate contract, SCRUB_BACKEND=local (default) or
  gateway (Spark Control /scrub + /rehydrate). Fails closed (db_path required; dictionary
  build errors propagate; strict rehydrate returns tokenized-not-de-anon text).
- test_scrub_leak.py, test_reidentification.py: golden-file leak + re-identification
  suites (synthetic only, guardrail #9), regression-locking every leak-hunt vector.
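
A minimal sketch of the tokenize / pre-neutralize / single-pass rehydrate idea described above. It is not the code in scrub.py: `toy_scrub`, `toy_rehydrate`, and the map layout are hypothetical names for illustration, only dictionary entities are handled (no Tier-1 drop, no structured PII), and the strict-mode behaviour is one reading of "returns tokenized-not-de-anon text".

```python
import re

# Shape of a pseudonym token such as [PERSON_1] (assumed format for this sketch).
TOKEN_RE = re.compile(r"\[([A-Z]+)_(\d+)\]")


def toy_scrub(text, known_entities):
    """Replace known entity strings with [TYPE_N] tokens.

    Returns (outbound_text, pseudonym_map). The map is the de-anonymization
    key and would stay local in a real deployment.
    """
    # Pre-neutralize token-shaped strings already in the input so an injected
    # "[PERSON_9]" cannot be mistaken for a real token at rehydrate time.
    text = TOKEN_RE.sub(lambda m: f"({m.group(1)}_{m.group(2)})", text)

    kind_of = {v.lower(): kind for kind, vals in known_entities.items() for v in vals}
    if not kind_of:
        return text, {}

    # One alternation, longest value first, so "Jonathan Reyes" wins over "Reyes".
    ordered = sorted(kind_of, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered), re.IGNORECASE)

    mapping, token_for, counters = {}, {}, {}

    def repl(match):
        key = match.group(0).lower()
        if key not in token_for:
            kind = kind_of.get(key, "ENTITY")
            counters[kind] = counters.get(kind, 0) + 1
            token_for[key] = f"[{kind}_{counters[kind]}]"
            mapping[token_for[key]] = match.group(0)
        return token_for[key]

    return pattern.sub(repl, text), mapping


def toy_rehydrate(text, mapping, strict=True):
    """Swap tokens back in a single pass; substituted values are never re-scanned."""
    unknown = [m.group(0) for m in TOKEN_RE.finditer(text) if m.group(0) not in mapping]
    if strict and unknown:
        # Fail closed: hand back the still-tokenized text rather than a partial de-anon.
        return text
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Using one regex substitution for the rehydrate step means a map value that itself looks like a token is emitted as-is instead of being expanded a second time.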

backend/mcp/architect_grounding.py: the flow — retrieve (local) -> minimize-first
(local Qwen) -> scrub (+ local-Qwen NER backstop for unknown names) -> Claude over the
de-identified register only -> re-hydrate locally -> human review. FAILS CLOSED if the
local model is unreachable or a hallucinated token appears. test_grounding_boundary.py
proves nothing sensitive reaches Claude and the three fail-closed paths.
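
The fail-closed shape of that flow can be sketched as follows. This is an illustration under stated assumptions, not architect_grounding.py: `ground`, `GroundingRefused`, and the three callables are hypothetical, the NER backstop and human review are omitted, and only two of the fail-closed paths (local model unreachable, hallucinated token) are shown.

```python
import re

TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")  # assumed token shape, e.g. [PERSON_1]


class GroundingRefused(Exception):
    """Raised when the flow stops rather than risk a leak or a bad re-hydration."""


def ground(raw, scrub, remote_model, local_minimize):
    """Minimize locally, scrub, call the remote model, then re-hydrate locally.

    scrub(text) must return (outbound_text, pseudonym_map).
    """
    try:
        minimized = local_minimize(raw)
    except ConnectionError as exc:
        # Fail closed: without the local pass, nothing is sent to the remote model.
        raise GroundingRefused("local model unreachable") from exc

    outbound, mapping = scrub(minimized)
    reply = remote_model(outbound)

    # A token the scrubber never issued cannot be re-hydrated safely.
    hallucinated = [t for t in TOKEN_RE.findall(reply) if t not in mapping]
    if hallucinated:
        raise GroundingRefused(f"unknown tokens in reply: {hallucinated}")

    return TOKEN_RE.sub(lambda m: mapping[m.group(0)], reply)
```

Raising before the remote call, rather than falling back to an unminimized prompt, is what makes the first path fail closed instead of fail open.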

server.py: POST /api/architect/ground (admin) wires retrieval -> ground_objections.
docker_entrypoint.sh: SCRUB_BACKEND (default local). docs/spark-control-scrub-endpoints.md:
the gateway handover spec (Option 1 — caller supplies the entity dictionary).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:06:29 -05:00

134 lines
5.6 KiB
Python

#!/usr/bin/env python3
"""Re-identification spot-check (redaction-rehydration.md §6) — OFFLINE + SYNTHETIC.

A deterministic approximation of "feed only the scrubbed prompt to a model and try to
recover who it is." Three probes:

  A. Exact/normalized leak gate (MUST PASS): re-scan the scrubbed payload for ANY known
     real value or Tier-1 string under normalization (case, punctuation, reversed
     'Last First', email local-part). Any hit = tokenizer miss = FAIL.
  B. Descriptive-identifier residual (BOUNDED): phrases that re-identify even with names
     tokenized ("the family that sold the mining company in Texas"). The deterministic
     scrubber is not expected to catch these (the on-infra local-Qwen pass + the
     minimize-first summary do); this probe MEASURES the residual and fails only if it
     EXCEEDS a committed ceiling, so leakage can't silently grow.
  C. Inference via bucketing: with bucket=True, no exact amount/identity-date survives,
     and a (amount-band, year, sector) tuple is not unique to one synthetic entity.

Run:  cd backend && python3 redaction/test_reidentification.py
"""
import os
import re
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import scrub as R  # noqa: E402

FAILS = []


def check(cond, msg):
    print(("  PASS  " if cond else "  FAIL  ") + msg)
    if not cond:
        FAILS.append(msg)


def _norm(s):
    # Lowercase and collapse every non-alphanumeric run to a single space.
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


# Probe A — known synthetic entities that must NOT be recoverable from the scrubbed text
KB = {
    "persons": ["Jonathan Reyes", "Marta Quine"],
    "orgs": ["Cedar Point Capital"],
    "emails": ["jon@cedarpoint.example"],
    "tier1": ["000123456789", "123-45-6789"],
}
RAW_A = ("Jonathan Reyes at Cedar Point Capital (jon@cedarpoint.example) is cooling; Reyes wants better "
         "terms. Marta Quine disagrees. acct 000123456789. Substance: fee and lock-up objections.")
KNOWN_A = {"persons": ["Jonathan Reyes", "Reyes", "Marta Quine", "Marta"],
           "orgs": ["Cedar Point Capital"], "funds": [], "emails": ["jon@cedarpoint.example"]}


def probe_a():
    print("\n[probe A — exact/normalized leak gate]")
    outbound, _, _ = R.scrub(RAW_A, known_entities=KNOWN_A, bucket=False)
    nout = _norm(outbound)
    hits = []
    for cat in ("persons", "orgs", "emails", "tier1"):
        for v in KB[cat]:
            variants = {_norm(v)}
            if " " in v and cat == "persons":
                a, b = v.split()[0], v.split()[-1]
                variants.add(_norm(f"{b} {a}"))  # reversed Last First
                variants.add(_norm(b))  # bare surname
            if "@" in v:
                variants.add(_norm(v.split("@")[0]))  # email local-part
            for var in variants:
                if var and var in nout:
                    hits.append((v, var))
    check(not hits, f"no known identifier recoverable from scrubbed text (hits={hits})")


# Probe B — descriptive re-identifiers (bounded residual)
DESCRIPTIVE = [
    "the family that sold the mining company in Texas",
    "the former CTO of a well-known payments unicorn",
    "the senator's brother who runs a family office",
]
RAW_B = "Notes: our contact is " + DESCRIPTIVE[0] + ". Another is " + DESCRIPTIVE[1] + ". A third is " + DESCRIPTIVE[2] + "."
RESIDUAL_CEILING = 3  # known residual the deterministic scrubber alone does not catch;
                      # the on-infra Qwen pass + minimize-first summary drive this toward 0.


def probe_b():
    print("\n[probe B — descriptive-identifier residual, bounded]")
    outbound, _, _ = R.scrub(RAW_B, known_entities={"persons": [], "orgs": [], "funds": [], "emails": []})
    surviving = [d for d in DESCRIPTIVE if d in outbound]
    for d in surviving:
        print(f"  flagged residual (handled on-infra by Qwen/minimize-first): {d!r}")
    check(len(surviving) <= RESIDUAL_CEILING,
          f"descriptive residual within committed ceiling ({len(surviving)} <= {RESIDUAL_CEILING})")


# Probe C — bucketing destroys exact values + singling-out
ENTITIES_C = [
    {"name": "A", "amount": "$5,200,000", "date": "1986-02-10", "sector": "energy"},
    {"name": "B", "amount": "$4,800,000", "date": "1986-03-20", "sector": "energy"},  # same band/year/sector as A
    {"name": "C", "amount": "$25,000,000", "date": "1991-09-01", "sector": "bitcoin"},
]


def probe_c():
    print("\n[probe C — inference via bucketing]")
    raw = " ".join(f"Investor commits {e['amount']} on {e['date']} in {e['sector']}." for e in ENTITIES_C)
    outbound, _, _ = R.scrub(raw, known_entities={"persons": [], "orgs": [], "funds": [], "emails": []}, bucket=True)
    check(re.search(r"\$\s?\d[\d,]{2,}", outbound) is None, "no exact $ amount survives bucketing")
    check(re.search(r"\b(?:19|20)\d{2}-\d{2}-\d{2}\b", outbound) is None, "no exact date survives bucketing")
    # singling-out: the (amount-band, year, sector) tuple must not be unique to one entity
    tuples = {}
    for e in ENTITIES_C:
        band = R._bucket_amount(e["amount"])
        year = e["date"][:4]
        tuples.setdefault((band, year, e["sector"]), []).append(e["name"])
    # A and B collapse to the same bucket-tuple; C is alone but that's an accepted single in this fixture
    check(any(len(v) > 1 for v in tuples.values()),
          f"bucketing collapses distinct entities into shared bands (tuples={tuples})")


def main():
    probe_a()
    probe_b()
    probe_c()
    print()
    if FAILS:
        print(f"FAILED ({len(FAILS)}):")
        for f in FAILS:
            print(f"  - {f}")
        sys.exit(1)
    print("ALL PASS (re-identification spot-check)")


if __name__ == "__main__":
    main()