Architect grounding boundary: redaction/re-hydration privacy gate (v0.1.0:55)

Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP objections without any LP identity reaching the Claude API. Layered, defense-in-depth, fail-closed by construction (docs/redaction-rehydration.md). backend/redaction/: - scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured account/wire/SSN/ IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP entities (dictionary from the canonical layer, unicode-folded + hyphen-extended) and structured PII (emails, scheme-less/social URLs, intl+ext phones, currency-cued amounts, ISO/worded/numeric/ quarter dates, addresses, bare long digit runs); pre-neutralizes injected [TYPE_N] strings; single-pass rehydrate; metadata-only audit logging (the pseudonym map is the de-anon key — local-only, never logged/sent). Hardened across THREE adversarial leak-hunts (worded/coded amounts, intl phones, NFD/ligature/zero-width names, slash/ comma SSN, SWIFT, alpha-prefixed accounts, substance-preserving false-positive fixes). - client.py: Boundary — one scrub/rehydrate contract, SCRUB_BACKEND=local (default) or gateway (Spark Control /scrub + /rehydrate). Fails closed (db_path required; dictionary build errors propagate; strict rehydrate returns tokenized-not-de-anon text). - test_scrub_leak.py, test_reidentification.py: golden-file leak + re-identification suites (synthetic only, guardrail #9), regression-locking every leak-hunt vector. backend/mcp/architect_grounding.py: the flow — retrieve (local) -> minimize-first (local Qwen) -> scrub (+ local-Qwen NER backstop for unknown names) -> Claude over the de-identified register only -> re-hydrate locally -> human review. FAILS CLOSED if the local model is unreachable or a hallucinated token appears. test_grounding_boundary.py proves nothing sensitive reaches Claude and the three fail-closed paths. server.py: POST /api/architect/ground (admin) wires retrieval -> ground_objections. docker_entrypoint.sh: SCRUB_BACKEND (default local). docs/spark-control-scrub-endpoints.md: the gateway handover spec (Option 1 — caller supplies the entity dictionary). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:06:29 -05:00
parent 300041a7ec
commit 2e70b34592
12 changed files with 1371 additions and 4 deletions
@@ -0,0 +1,175 @@
+# Spark Control — `/scrub` + `/rehydrate` endpoints (handover prompt)
+
+*Hand this to the Spark Control developer to build the gateway redaction endpoints.
+**OPTIONAL for Phase 1** — the Architect's grounding boundary already ships in our app
+repo (the `backend/redaction/` module, deterministic scrub + golden-file leak tests).
+Build this when we move to **multi-agent** enforcement (Phase 2 Analyst, Phase 3 Closer),
+where we want ONE bypass-proof point with the pseudonym map living next to the local
+models. Our in-repo module is the **reference implementation**; the contract below is what
+our `SCRUB_BACKEND=gateway` client already calls, so aim for behavioral parity and we cut
+agents over with zero app changes. Related: `docs/redaction-rehydration.md`.*
+
+---
+
+## Purpose
+
+Add two gateway endpoints, `POST /scrub` and `POST /rehydrate`, that let any agent send
+LP-specific context to the Claude API without exposing sovereign data. `/scrub`
+de-identifies an agent's assembled context; the agent sends the de-identified text to
+Claude; `/rehydrate` puts the real values back into Claude's response for human review.
+The **pseudonym map** that links tokens to real values is the de-anonymization key — it
+**must stay on the Spark, next to the local models, and never be sent or logged to any
+third party.** This sits behind the same trusted Spark Control URL, TLS, and access
+control that already front `/v1/chat/completions`, `/v1/embeddings`, `/v1/rerank`, and
+`/api/search`. Spark Control does **not** call Claude here — it is the scrub and rehydrate
+transform pair plus a server-held map.
+
+## The three-tier classification (the rules `/scrub` enforces)
+
+Classify every span into exactly one tier:
+
+- **TIER-1 NEVER-SEND** — excluded entirely, not even tokenized: full LP list/export or
+  bulk relationship graph, raw account/wire/routing numbers, SSN/passport/gov-ID, anything
+  under a confidentiality obligation. Default: excise the span. `tier1_action="reject"`
+  makes the whole call fail-closed (422) if any Tier-1 is detected.
+- **TIER-2 TOKENIZE** — stable placeholders, swapped back locally after Claude: person
+  names, org/fund names, emails, phones, addresses, exact $ amounts, identity-pinning
+  dates. Placeholders are `[TYPE_N]`: `[PERSON_1] [ORG_1] [FUND_1] [EMAIL_1] [PHONE_1]
+  [ADDR_1] [AMOUNT_1] [DATE_1] [LOC_1] [MISC_1]`. `N` is 1-based and **stable within a
+  task**: the same real entity maps to the same token across every item in the call and
+  across later `/scrub` calls reusing the same `map_handle`, so Claude can reason about
+  relationships (`[PERSON_1] introduced [PERSON_2] to [FUND_1]`).
+- **SEND-AS-IS** — passed through untouched: the substance Claude needs (objections,
+  sentiment, generic deal mechanics, the drafted message body minus identifiers).
+
+## The round-trip
+
+`SCRUB` (your endpoint) → `REASON` (the agent calls Claude with placeholders only — your
+gateway does NOT call Claude) → `RE-HYDRATE` (your endpoint) → human review (in our app).
+
+## Request / response contracts
+
+### `POST /scrub`
+```json
+{
+  "task_id": "string, required, caller-chosen, stable across the round-trip",
+  "actor": "string, agent name e.g. 'analyst' | 'closer' (for logging)",
+  "items": [ {"id": "ctx_1", "text": "..."} ],
+  "known_entities": {                         // CALLER-SUPPLIED dictionary (see "Dictionary" below)
+    "persons": ["..."], "orgs": ["..."], "funds": ["..."], "emails": ["..."]
+  },
+  "tier1_action": "drop | reject",            // default "drop"
+  "bucket": {"amounts": false, "dates": false}, // default FALSE for grounding: tokenize reversibly (no magnitudes to Claude). bucket=true only when a caller genuinely needs coarse magnitude
+  "ner": "auto | rules_only | qwen",          // default "auto"
+  "map_handle": "string, optional, reuse/extend an existing task map"
+}
+```
+Response `200`:
+```json
+{
+  "task_id": "...",
+  "map_handle": "opaque server key to the map (NOT the map itself)",
+  "items": [ {"id":"ctx_1","scrubbed_text":"...","tokens_used":["PERSON_1","AMOUNT_1"]} ],
+  "stats": {"tier1_dropped": 2, "tier2_tokenized": 14, "distinct_entities": 9,
+            "descriptive_flags": [{"item":"ctx_1","span":"the family that sold the mining company in Texas","action":"redacted"}]},
+  "expires_at": "ISO-8601 map TTL (short-lived, e.g. 2h)"
+}
+```
+`422 {"error":"tier1_detected","spans":[...]}` when `tier1_action="reject"` and Tier-1 found.
+`400` on malformed input.
+
+### `POST /rehydrate`
+```json
+{
+  "task_id": "string, required",
+  "map_handle": "string, required, from /scrub",
+  "items": [ {"id":"out_1","text":"...with [PERSON_1] tokens..."} ],
+  "actor": "string",
+  "strict": true   // default true
+}
+```
+Response `200`:
+```json
+{ "items": [ {"id":"out_1","rehydrated_text":"...real values..."} ],
+  "stats": {"tokens_substituted": 6, "unknown_tokens": []} }
+```
+`409 {"error":"unknown_tokens","tokens":["PERSON_9"]}` when `strict` and the text contains a
+token with no map entry — this is your tripwire for a Claude-hallucinated/smuggled token; do
+NOT silently pass it through. `410 {"error":"map_expired"}` if the map TTL lapsed.
+
+## Dictionary: CALLER-SUPPLIED (Option 1 — decided)
+
+The known-entity dictionary is supplied by the **caller** in each `/scrub` request
+(`known_entities`), NOT read by Spark Control from the CRM. We chose this over giving the
+gateway CRM access because it keeps Spark Control **generic and portable** (no coupling to
+Ten31's schema, so the package still works for another dual-Spark user), needs **no CRM
+credentials on the gateway** (least privilege — the LP list lives in one place), stays
+**fresh** (built live from the CRM each call), and matches the reference `scrub()` signature.
+Our app builds it with its `build_known_entities` (names + name-parts + email local-parts)
+and sends it scoped to the request. Treat it as **sensitive** (a slice of the LP list): hold
+it only transiently with the map, never log or forward it. Tokens are per-task (stable within
+a `task_id`/`map_handle`), which is also the lower re-identification-risk choice. If per-call
+payload ever becomes a perf issue, an optional gateway-side cached export (Option 3) is the
+escape hatch — but do not build that coupling now.
+
+## Local-Qwen NER (how to find Tier-2 entities), cheapest/most-authoritative first
+
+1. **The caller-supplied `known_entities`.** Tokenize every supplied person/org/fund/email
+   surface deterministically (case- and unicode-fold insensitive, longest-match-first, with
+   hyphenated-surname extension). This is the bulk of the work and needs no model. (This is
+   the deterministic FLOOR — load-bearing but NOT complete; the NER pass below is required.)
+2. **Rules** for structured PII: regexes for emails, phones, $ amounts, dates, and
+   SSN/account-number shapes (account/SSN shapes route to Tier-1).
+3. **Local-Qwen NER pass** for residual named entities the map and rules miss (a person/org
+   in free text not yet in the CRM). Call the SAME local Qwen you already serve at
+   `/v1/chat/completions` (enable_thinking=false, temperature 0), with a strict JSON-only
+   extraction prompt returning `{entities:[{text,type,tier}]}`. Never a remote model — the
+   input is exactly the sensitive text we are keeping local. `ner="rules_only"` skips this;
+   `ner="qwen"` forces it; `auto` runs it only on spans unresolved by steps 1–2. Qwen should
+   also flag **descriptive** re-identifiers (e.g. "the family that sold the mining company in
+   Texas"), not just named entities.
+
+## Map-stays-local (non-negotiable)
+
+The pseudonym map `{token -> real_value}` is stored ONLY on the Spark, keyed by
+`map_handle`, in memory or a short-lived local store, TTL-expired (default ~2h). Never
+returned in full, never written to any log/metric/trace that leaves the box, never in any
+Claude-bound payload. `/scrub` returns only the opaque `map_handle` and counts. If you add
+`GET /scrub/map/{map_handle}` for same-box debugging, gate it behind the same auth and keep
+it off by default.
+
+## Logging
+
+One row per call to our `interaction_log` (we give you a write path or ingest an emitted
+event): `action='redaction.scrub' | 'redaction.rehydrate'`, `actor_type='agent'`,
+`actor_id=<actor>`, `source='spark_control'`, `payload` = **COUNTS ONLY** (tiers dropped,
+tokens by type, distinct_entities, descriptive_flags spans WITHOUT real values, model used).
+The payload MUST NOT contain any real Tier-2 value, any Tier-1 content, or the map.
+
+## Acceptance tests (must pass before agents route through it)
+
+1. **Golden-file diff** — fixed inputs, recorded expected scrubbed output; assert NO Tier-1
+   string and NO real Tier-2 identifier appears in any `/scrub` response or Claude-bound
+   payload. (We will share our `backend/redaction/` fixtures + golden files.)
+2. **Round-trip identity** — scrub → echo tokens → rehydrate reproduces the original real
+   values exactly; token stability holds across items and across a second `/scrub` reusing
+   the same `map_handle`.
+3. **Re-identification spot-check** — feed ONLY the de-identified prompt to the local Qwen
+   and ask it to name the real people/orgs/amounts; anything it recovers (esp. descriptive
+   re-identifiers and amount+date+sector inference) is flagged and must be driven to zero or
+   escalated to Tier-1/bucketing.
+4. **Map-leak assertion** — scan every response body, log row, and Claude-bound payload in
+   the suite; assert the map and all real values are absent.
+5. **Strict-rehydrate tripwire** — a rehydrate input with an unmapped token returns 409.
+6. **Fail-closed** — `tier1_action="reject"` returns 422 on any Tier-1 input, emits nothing.
+
+## Notes
+
+- This does NOT replace **minimization**: the agent must first ask "does Claude need this
+  record content at all?" — often a local retrieval summary suffices. These endpoints are
+  for when the answer is genuinely yes.
+- Keep placeholders **cache-stable** (deterministic token assignment per task_id/map_handle)
+  so they compose with prompt caching on the Claude side.
+- We will hand you our in-repo `redaction` module + golden files as the reference; aim for
+  behavioral parity first, then we cut agents over to the gateway as the single enforcement
+  point.