2e70b34592
Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP objections without any LP identity reaching the Claude API. Layered, defense-in-depth, fail-closed by construction (docs/redaction-rehydration.md). backend/redaction/: - scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured account/wire/SSN/ IBAN/SWIFT/passport, separator-tolerant); tokenizes known LP entities (dictionary from the canonical layer, unicode-folded + hyphen-extended) and structured PII (emails, scheme-less/social URLs, intl+ext phones, currency-cued amounts, ISO/worded/numeric/ quarter dates, addresses, bare long digit runs); pre-neutralizes injected [TYPE_N] strings; single-pass rehydrate; metadata-only audit logging (the pseudonym map is the de-anon key — local-only, never logged/sent). Hardened across THREE adversarial leak-hunts (worded/coded amounts, intl phones, NFD/ligature/zero-width names, slash/ comma SSN, SWIFT, alpha-prefixed accounts, substance-preserving false-positive fixes). - client.py: Boundary — one scrub/rehydrate contract, SCRUB_BACKEND=local (default) or gateway (Spark Control /scrub + /rehydrate). Fails closed (db_path required; dictionary build errors propagate; strict rehydrate returns tokenized-not-de-anon text). - test_scrub_leak.py, test_reidentification.py: golden-file leak + re-identification suites (synthetic only, guardrail #9), regression-locking every leak-hunt vector. backend/mcp/architect_grounding.py: the flow — retrieve (local) -> minimize-first (local Qwen) -> scrub (+ local-Qwen NER backstop for unknown names) -> Claude over the de-identified register only -> re-hydrate locally -> human review. FAILS CLOSED if the local model is unreachable or a hallucinated token appears. test_grounding_boundary.py proves nothing sensitive reaches Claude and the three fail-closed paths. server.py: POST /api/architect/ground (admin) wires retrieval -> ground_objections. docker_entrypoint.sh: SCRUB_BACKEND (default local). docs/spark-control-scrub-endpoints.md: the gateway handover spec (Option 1 — caller supplies the entity dictionary). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
176 lines
10 KiB
Markdown
176 lines
10 KiB
Markdown
# Spark Control — `/scrub` + `/rehydrate` endpoints (handover prompt)
|
||
|
||
*Hand this to the Spark Control developer to build the gateway redaction endpoints.
|
||
**OPTIONAL for Phase 1** — the Architect's grounding boundary already ships in our app
|
||
repo (the `backend/redaction/` module, deterministic scrub + golden-file leak tests).
|
||
Build this when we move to **multi-agent** enforcement (Phase 2 Analyst, Phase 3 Closer),
|
||
where we want ONE bypass-proof point with the pseudonym map living next to the local
|
||
models. Our in-repo module is the **reference implementation**; the contract below is what
|
||
our `SCRUB_BACKEND=gateway` client already calls, so aim for behavioral parity and we cut
|
||
agents over with zero app changes. Related: `docs/redaction-rehydration.md`.*
|
||
|
||
---
|
||
|
||
## Purpose
|
||
|
||
Add two gateway endpoints, `POST /scrub` and `POST /rehydrate`, that let any agent send
|
||
LP-specific context to the Claude API without exposing sovereign data. `/scrub`
|
||
de-identifies an agent's assembled context; the agent sends the de-identified text to
|
||
Claude; `/rehydrate` puts the real values back into Claude's response for human review.
|
||
The **pseudonym map** that links tokens to real values is the de-anonymization key — it
|
||
**must stay on the Spark, next to the local models, and never be sent or logged to any
|
||
third party.** This sits behind the same trusted Spark Control URL, TLS, and access
|
||
control that already front `/v1/chat/completions`, `/v1/embeddings`, `/v1/rerank`, and
|
||
`/api/search`. Spark Control does **not** call Claude here — it is the scrub and rehydrate
|
||
transform pair plus a server-held map.
|
||
|
||
## The three-tier classification (the rules `/scrub` enforces)
|
||
|
||
Classify every span into exactly one tier:
|
||
|
||
- **TIER-1 NEVER-SEND** — excluded entirely, not even tokenized: full LP list/export or
|
||
bulk relationship graph, raw account/wire/routing numbers, SSN/passport/gov-ID, anything
|
||
under a confidentiality obligation. Default: excise the span. `tier1_action="reject"`
|
||
makes the whole call fail-closed (422) if any Tier-1 is detected.
|
||
- **TIER-2 TOKENIZE** — stable placeholders, swapped back locally after Claude: person
|
||
names, org/fund names, emails, phones, addresses, exact $ amounts, identity-pinning
|
||
dates. Placeholders are `[TYPE_N]`: `[PERSON_1] [ORG_1] [FUND_1] [EMAIL_1] [PHONE_1]
|
||
[ADDR_1] [AMOUNT_1] [DATE_1] [LOC_1] [MISC_1]`. `N` is 1-based and **stable within a
|
||
task**: the same real entity maps to the same token across every item in the call and
|
||
across later `/scrub` calls reusing the same `map_handle`, so Claude can reason about
|
||
relationships (`[PERSON_1] introduced [PERSON_2] to [FUND_1]`).
|
||
- **SEND-AS-IS** — passed through untouched: the substance Claude needs (objections,
|
||
sentiment, generic deal mechanics, the drafted message body minus identifiers).
|
||
|
||
## The round-trip
|
||
|
||
`SCRUB` (your endpoint) → `REASON` (the agent calls Claude with placeholders only — your
|
||
gateway does NOT call Claude) → `RE-HYDRATE` (your endpoint) → human review (in our app).
|
||
|
||
## Request / response contracts
|
||
|
||
### `POST /scrub`
|
||
```json
|
||
{
|
||
"task_id": "string, required, caller-chosen, stable across the round-trip",
|
||
"actor": "string, agent name e.g. 'analyst' | 'closer' (for logging)",
|
||
"items": [ {"id": "ctx_1", "text": "..."} ],
|
||
"known_entities": { // CALLER-SUPPLIED dictionary (see "Dictionary" below)
|
||
"persons": ["..."], "orgs": ["..."], "funds": ["..."], "emails": ["..."]
|
||
},
|
||
"tier1_action": "drop | reject", // default "drop"
|
||
"bucket": {"amounts": false, "dates": false}, // default FALSE for grounding: tokenize reversibly (no magnitudes to Claude). bucket=true only when a caller genuinely needs coarse magnitude
|
||
"ner": "auto | rules_only | qwen", // default "auto"
|
||
"map_handle": "string, optional, reuse/extend an existing task map"
|
||
}
|
||
```
|
||
Response `200`:
|
||
```json
|
||
{
|
||
"task_id": "...",
|
||
"map_handle": "opaque server key to the map (NOT the map itself)",
|
||
"items": [ {"id":"ctx_1","scrubbed_text":"...","tokens_used":["PERSON_1","AMOUNT_1"]} ],
|
||
"stats": {"tier1_dropped": 2, "tier2_tokenized": 14, "distinct_entities": 9,
|
||
"descriptive_flags": [{"item":"ctx_1","span":"the family that sold the mining company in Texas","action":"redacted"}]},
|
||
"expires_at": "ISO-8601 map TTL (short-lived, e.g. 2h)"
|
||
}
|
||
```
|
||
`422 {"error":"tier1_detected","spans":[...]}` when `tier1_action="reject"` and Tier-1 found.
|
||
`400` on malformed input.
|
||
|
||
### `POST /rehydrate`
|
||
```json
|
||
{
|
||
"task_id": "string, required",
|
||
"map_handle": "string, required, from /scrub",
|
||
"items": [ {"id":"out_1","text":"...with [PERSON_1] tokens..."} ],
|
||
"actor": "string",
|
||
"strict": true // default true
|
||
}
|
||
```
|
||
Response `200`:
|
||
```json
|
||
{ "items": [ {"id":"out_1","rehydrated_text":"...real values..."} ],
|
||
"stats": {"tokens_substituted": 6, "unknown_tokens": []} }
|
||
```
|
||
`409 {"error":"unknown_tokens","tokens":["PERSON_9"]}` when `strict` and the text contains a
|
||
token with no map entry — this is your tripwire for a Claude-hallucinated/smuggled token; do
|
||
NOT silently pass it through. `410 {"error":"map_expired"}` if the map TTL lapsed.
|
||
|
||
## Dictionary: CALLER-SUPPLIED (Option 1 — decided)
|
||
|
||
The known-entity dictionary is supplied by the **caller** in each `/scrub` request
|
||
(`known_entities`), NOT read by Spark Control from the CRM. We chose this over giving the
|
||
gateway CRM access because it keeps Spark Control **generic and portable** (no coupling to
|
||
Ten31's schema, so the package still works for another dual-Spark user), needs **no CRM
|
||
credentials on the gateway** (least privilege — the LP list lives in one place), stays
|
||
**fresh** (built live from the CRM each call), and matches the reference `scrub()` signature.
|
||
Our app builds it with its `build_known_entities` (names + name-parts + email local-parts)
|
||
and sends it scoped to the request. Treat it as **sensitive** (a slice of the LP list): hold
|
||
it only transiently with the map, never log or forward it. Tokens are per-task (stable within
|
||
a `task_id`/`map_handle`), which is also the lower re-identification-risk choice. If per-call
|
||
payload ever becomes a perf issue, an optional gateway-side cached export (Option 3) is the
|
||
escape hatch — but do not build that coupling now.
|
||
|
||
## Local-Qwen NER (how to find Tier-2 entities), cheapest/most-authoritative first
|
||
|
||
1. **The caller-supplied `known_entities`.** Tokenize every supplied person/org/fund/email
|
||
surface deterministically (case- and unicode-fold insensitive, longest-match-first, with
|
||
hyphenated-surname extension). This is the bulk of the work and needs no model. (This is
|
||
the deterministic FLOOR — load-bearing but NOT complete; the NER pass below is required.)
|
||
2. **Rules** for structured PII: regexes for emails, phones, $ amounts, dates, and
|
||
SSN/account-number shapes (account/SSN shapes route to Tier-1).
|
||
3. **Local-Qwen NER pass** for residual named entities the map and rules miss (a person/org
|
||
in free text not yet in the CRM). Call the SAME local Qwen you already serve at
|
||
`/v1/chat/completions` (enable_thinking=false, temperature 0), with a strict JSON-only
|
||
extraction prompt returning `{entities:[{text,type,tier}]}`. Never a remote model — the
|
||
input is exactly the sensitive text we are keeping local. `ner="rules_only"` skips this;
|
||
`ner="qwen"` forces it; `auto` runs it only on spans unresolved by steps 1–2. Qwen should
|
||
also flag **descriptive** re-identifiers (e.g. "the family that sold the mining company in
|
||
Texas"), not just named entities.
|
||
|
||
## Map-stays-local (non-negotiable)
|
||
|
||
The pseudonym map `{token -> real_value}` is stored ONLY on the Spark, keyed by
|
||
`map_handle`, in memory or a short-lived local store, TTL-expired (default ~2h). Never
|
||
returned in full, never written to any log/metric/trace that leaves the box, never in any
|
||
Claude-bound payload. `/scrub` returns only the opaque `map_handle` and counts. If you add
|
||
`GET /scrub/map/{map_handle}` for same-box debugging, gate it behind the same auth and keep
|
||
it off by default.
|
||
|
||
## Logging
|
||
|
||
One row per call to our `interaction_log` (we give you a write path or ingest an emitted
|
||
event): `action='redaction.scrub' | 'redaction.rehydrate'`, `actor_type='agent'`,
|
||
`actor_id=<actor>`, `source='spark_control'`, `payload` = **COUNTS ONLY** (tiers dropped,
|
||
tokens by type, distinct_entities, descriptive_flags spans WITHOUT real values, model used).
|
||
The payload MUST NOT contain any real Tier-2 value, any Tier-1 content, or the map.
|
||
|
||
## Acceptance tests (must pass before agents route through it)
|
||
|
||
1. **Golden-file diff** — fixed inputs, recorded expected scrubbed output; assert NO Tier-1
|
||
string and NO real Tier-2 identifier appears in any `/scrub` response or Claude-bound
|
||
payload. (We will share our `backend/redaction/` fixtures + golden files.)
|
||
2. **Round-trip identity** — scrub → echo tokens → rehydrate reproduces the original real
|
||
values exactly; token stability holds across items and across a second `/scrub` reusing
|
||
the same `map_handle`.
|
||
3. **Re-identification spot-check** — feed ONLY the de-identified prompt to the local Qwen
|
||
and ask it to name the real people/orgs/amounts; anything it recovers (esp. descriptive
|
||
re-identifiers and amount+date+sector inference) is flagged and must be driven to zero or
|
||
escalated to Tier-1/bucketing.
|
||
4. **Map-leak assertion** — scan every response body, log row, and Claude-bound payload in
|
||
the suite; assert the map and all real values are absent.
|
||
5. **Strict-rehydrate tripwire** — a rehydrate input with an unmapped token returns 409.
|
||
6. **Fail-closed** — `tier1_action="reject"` returns 422 on any Tier-1 input, emits nothing.
|
||
|
||
## Notes
|
||
|
||
- This does NOT replace **minimization**: the agent must first ask "does Claude need this
|
||
record content at all?" — often a local retrieval summary suffices. These endpoints are
|
||
for when the answer is genuinely yes.
|
||
- Keep placeholders **cache-stable** (deterministic token assignment per task_id/map_handle)
|
||
so they compose with prompt caching on the Claude side.
|
||
- We will hand you our in-repo `redaction` module + golden files as the reference; aim for
|
||
behavioral parity first, then we cut agents over to the gateway as the single enforcement
|
||
point.
|