Architect grounding boundary: redaction/re-hydration privacy gate (v0.1.0:55)

Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP
objections without any LP identity reaching the Claude API. Layered, defense-in-depth,
fail-closed by construction (docs/redaction-rehydration.md).

backend/redaction/:
- scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured account/wire/SSN/
  IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP entities (dictionary from
  the canonical layer, unicode-folded + hyphen-extended) and structured PII (emails,
  scheme-less/social URLs, intl+ext phones, currency-cued amounts, ISO/worded/numeric/
  quarter dates, addresses, bare long digit runs); pre-neutralizes injected [TYPE_N]
  strings; single-pass rehydrate; metadata-only audit logging (the pseudonym map is the
  de-anon key — local-only, never logged/sent). Hardened across THREE adversarial
  leak-hunts (worded/coded amounts, intl phones, NFD/ligature/zero-width names, slash/
  comma SSN, SWIFT, alpha-prefixed accounts, substance-preserving false-positive fixes).
- client.py: Boundary — one scrub/rehydrate contract, SCRUB_BACKEND=local (default) or
  gateway (Spark Control /scrub + /rehydrate). Fails closed (db_path required; dictionary
  build errors propagate; strict rehydrate returns tokenized-not-de-anon text).
- test_scrub_leak.py, test_reidentification.py: golden-file leak + re-identification
  suites (synthetic only, guardrail #9), regression-locking every leak-hunt vector.

backend/mcp/architect_grounding.py: the flow — retrieve (local) -> minimize-first
(local Qwen) -> scrub (+ local-Qwen NER backstop for unknown names) -> Claude over the
de-identified register only -> re-hydrate locally -> human review. FAILS CLOSED if the
local model is unreachable or a hallucinated token appears. test_grounding_boundary.py
proves nothing sensitive reaches Claude and the three fail-closed paths.

server.py: POST /api/architect/ground (admin) wires retrieval -> ground_objections.
docker_entrypoint.sh: SCRUB_BACKEND (default local). docs/spark-control-scrub-endpoints.md:
the gateway handover spec (Option 1 — caller supplies the entity dictionary).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Keysat
2026-06-05 17:06:29 -05:00
parent 300041a7ec
commit 2e70b34592
12 changed files with 1371 additions and 4 deletions
+175
View File
@@ -0,0 +1,175 @@
# Spark Control — `/scrub` + `/rehydrate` endpoints (handover prompt)
*Hand this to the Spark Control developer to build the gateway redaction endpoints.
**OPTIONAL for Phase 1** — the Architect's grounding boundary already ships in our app
repo (the `backend/redaction/` module, deterministic scrub + golden-file leak tests).
Build this when we move to **multi-agent** enforcement (Phase 2 Analyst, Phase 3 Closer),
where we want ONE bypass-proof point with the pseudonym map living next to the local
models. Our in-repo module is the **reference implementation**; the contract below is what
our `SCRUB_BACKEND=gateway` client already calls, so aim for behavioral parity and we cut
agents over with zero app changes. Related: `docs/redaction-rehydration.md`.*
---
## Purpose
Add two gateway endpoints, `POST /scrub` and `POST /rehydrate`, that let any agent send
LP-specific context to the Claude API without exposing sovereign data. `/scrub`
de-identifies an agent's assembled context; the agent sends the de-identified text to
Claude; `/rehydrate` puts the real values back into Claude's response for human review.
The **pseudonym map** that links tokens to real values is the de-anonymization key — it
**must stay on the Spark, next to the local models, and never be sent or logged to any
third party.** This sits behind the same trusted Spark Control URL, TLS, and access
control that already front `/v1/chat/completions`, `/v1/embeddings`, `/v1/rerank`, and
`/api/search`. Spark Control does **not** call Claude here — it is the scrub and rehydrate
transform pair plus a server-held map.
## The three-tier classification (the rules `/scrub` enforces)
Classify every span into exactly one tier:
- **TIER-1 NEVER-SEND** — excluded entirely, not even tokenized: full LP list/export or
bulk relationship graph, raw account/wire/routing numbers, SSN/passport/gov-ID, anything
under a confidentiality obligation. Default: excise the span. `tier1_action="reject"`
makes the whole call fail-closed (422) if any Tier-1 is detected.
- **TIER-2 TOKENIZE** — stable placeholders, swapped back locally after Claude: person
names, org/fund names, emails, phones, addresses, exact $ amounts, identity-pinning
dates. Placeholders are `[TYPE_N]`: `[PERSON_1] [ORG_1] [FUND_1] [EMAIL_1] [PHONE_1]
[ADDR_1] [AMOUNT_1] [DATE_1] [LOC_1] [MISC_1]`. `N` is 1-based and **stable within a
task**: the same real entity maps to the same token across every item in the call and
across later `/scrub` calls reusing the same `map_handle`, so Claude can reason about
relationships (`[PERSON_1] introduced [PERSON_2] to [FUND_1]`).
- **SEND-AS-IS** — passed through untouched: the substance Claude needs (objections,
sentiment, generic deal mechanics, the drafted message body minus identifiers).
## The round-trip
`SCRUB` (your endpoint) → `REASON` (the agent calls Claude with placeholders only — your
gateway does NOT call Claude) → `RE-HYDRATE` (your endpoint) → human review (in our app).
## Request / response contracts
### `POST /scrub`
```json
{
"task_id": "string, required, caller-chosen, stable across the round-trip",
"actor": "string, agent name e.g. 'analyst' | 'closer' (for logging)",
"items": [ {"id": "ctx_1", "text": "..."} ],
"known_entities": { // CALLER-SUPPLIED dictionary (see "Dictionary" below)
"persons": ["..."], "orgs": ["..."], "funds": ["..."], "emails": ["..."]
},
"tier1_action": "drop | reject", // default "drop"
"bucket": {"amounts": false, "dates": false}, // default FALSE for grounding: tokenize reversibly (no magnitudes to Claude). bucket=true only when a caller genuinely needs coarse magnitude
"ner": "auto | rules_only | qwen", // default "auto"
"map_handle": "string, optional, reuse/extend an existing task map"
}
```
Response `200`:
```json
{
"task_id": "...",
"map_handle": "opaque server key to the map (NOT the map itself)",
"items": [ {"id":"ctx_1","scrubbed_text":"...","tokens_used":["PERSON_1","AMOUNT_1"]} ],
"stats": {"tier1_dropped": 2, "tier2_tokenized": 14, "distinct_entities": 9,
"descriptive_flags": [{"item":"ctx_1","span":"the family that sold the mining company in Texas","action":"redacted"}]},
"expires_at": "ISO-8601 map TTL (short-lived, e.g. 2h)"
}
```
`422 {"error":"tier1_detected","spans":[...]}` when `tier1_action="reject"` and Tier-1 found.
`400` on malformed input.
### `POST /rehydrate`
```json
{
"task_id": "string, required",
"map_handle": "string, required, from /scrub",
"items": [ {"id":"out_1","text":"...with [PERSON_1] tokens..."} ],
"actor": "string",
"strict": true // default true
}
```
Response `200`:
```json
{ "items": [ {"id":"out_1","rehydrated_text":"...real values..."} ],
"stats": {"tokens_substituted": 6, "unknown_tokens": []} }
```
`409 {"error":"unknown_tokens","tokens":["PERSON_9"]}` when `strict` and the text contains a
token with no map entry — this is your tripwire for a Claude-hallucinated/smuggled token; do
NOT silently pass it through. `410 {"error":"map_expired"}` if the map TTL lapsed.
## Dictionary: CALLER-SUPPLIED (Option 1 — decided)
The known-entity dictionary is supplied by the **caller** in each `/scrub` request
(`known_entities`), NOT read by Spark Control from the CRM. We chose this over giving the
gateway CRM access because it keeps Spark Control **generic and portable** (no coupling to
Ten31's schema, so the package still works for another dual-Spark user), needs **no CRM
credentials on the gateway** (least privilege — the LP list lives in one place), stays
**fresh** (built live from the CRM each call), and matches the reference `scrub()` signature.
Our app builds it with its `build_known_entities` (names + name-parts + email local-parts)
and sends it scoped to the request. Treat it as **sensitive** (a slice of the LP list): hold
it only transiently with the map, never log or forward it. Tokens are per-task (stable within
a `task_id`/`map_handle`), which is also the lower re-identification-risk choice. If per-call
payload ever becomes a perf issue, an optional gateway-side cached export (Option 3) is the
escape hatch — but do not build that coupling now.
## Local-Qwen NER (how to find Tier-2 entities), cheapest/most-authoritative first
1. **The caller-supplied `known_entities`.** Tokenize every supplied person/org/fund/email
surface deterministically (case- and unicode-fold insensitive, longest-match-first, with
hyphenated-surname extension). This is the bulk of the work and needs no model. (This is
the deterministic FLOOR — load-bearing but NOT complete; the NER pass below is required.)
2. **Rules** for structured PII: regexes for emails, phones, $ amounts, dates, and
SSN/account-number shapes (account/SSN shapes route to Tier-1).
3. **Local-Qwen NER pass** for residual named entities the map and rules miss (a person/org
in free text not yet in the CRM). Call the SAME local Qwen you already serve at
`/v1/chat/completions` (enable_thinking=false, temperature 0), with a strict JSON-only
extraction prompt returning `{entities:[{text,type,tier}]}`. Never a remote model — the
input is exactly the sensitive text we are keeping local. `ner="rules_only"` skips this;
`ner="qwen"` forces it; `auto` runs it only on spans unresolved by steps 12. Qwen should
also flag **descriptive** re-identifiers (e.g. "the family that sold the mining company in
Texas"), not just named entities.
## Map-stays-local (non-negotiable)
The pseudonym map `{token -> real_value}` is stored ONLY on the Spark, keyed by
`map_handle`, in memory or a short-lived local store, TTL-expired (default ~2h). Never
returned in full, never written to any log/metric/trace that leaves the box, never in any
Claude-bound payload. `/scrub` returns only the opaque `map_handle` and counts. If you add
`GET /scrub/map/{map_handle}` for same-box debugging, gate it behind the same auth and keep
it off by default.
## Logging
One row per call to our `interaction_log` (we give you a write path or ingest an emitted
event): `action='redaction.scrub' | 'redaction.rehydrate'`, `actor_type='agent'`,
`actor_id=<actor>`, `source='spark_control'`, `payload` = **COUNTS ONLY** (tiers dropped,
tokens by type, distinct_entities, descriptive_flags spans WITHOUT real values, model used).
The payload MUST NOT contain any real Tier-2 value, any Tier-1 content, or the map.
## Acceptance tests (must pass before agents route through it)
1. **Golden-file diff** — fixed inputs, recorded expected scrubbed output; assert NO Tier-1
string and NO real Tier-2 identifier appears in any `/scrub` response or Claude-bound
payload. (We will share our `backend/redaction/` fixtures + golden files.)
2. **Round-trip identity** — scrub → echo tokens → rehydrate reproduces the original real
values exactly; token stability holds across items and across a second `/scrub` reusing
the same `map_handle`.
3. **Re-identification spot-check** — feed ONLY the de-identified prompt to the local Qwen
and ask it to name the real people/orgs/amounts; anything it recovers (esp. descriptive
re-identifiers and amount+date+sector inference) is flagged and must be driven to zero or
escalated to Tier-1/bucketing.
4. **Map-leak assertion** — scan every response body, log row, and Claude-bound payload in
the suite; assert the map and all real values are absent.
5. **Strict-rehydrate tripwire** — a rehydrate input with an unmapped token returns 409.
6. **Fail-closed**`tier1_action="reject"` returns 422 on any Tier-1 input, emits nothing.
## Notes
- This does NOT replace **minimization**: the agent must first ask "does Claude need this
record content at all?" — often a local retrieval summary suffices. These endpoints are
for when the answer is genuinely yes.
- Keep placeholders **cache-stable** (deterministic token assignment per task_id/map_handle)
so they compose with prompt caching on the Claude side.
- We will hand you our in-repo `redaction` module + golden files as the reference; aim for
behavioral parity first, then we cut agents over to the gateway as the single enforcement
point.