ten31-database/docs/guides/redaction.md

---
paths:
  - backend/redaction/**
  - backend/mcp/**
---

# Redaction & the Claude privacy boundary

Read this before editing anything that sends data to a Claude model — the redaction layer or any MCP agent/tool path.

## The boundary

- `backend/redaction/` (`scrub.py` + `client.py`) is the **scrub → Claude → re-hydrate** boundary. It is exposed as `Boundary`, configured with `SCRUB_BACKEND=local|gateway`, and **fail-closed**.
- `SCRUB_BACKEND=local` scrubs in-process. `SCRUB_BACKEND=gateway` routes scrubbing through Spark Control (caller-supplied dict). If scrubbing can't run, the call fails closed: it does not pass raw text through.
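
The scrub → model → re-hydrate flow and its fail-closed behavior can be sketched as follows. This is a minimal illustration, not the code in `backend/redaction/`: `SketchBoundary`, `ScrubUnavailable`, the `[ENTITY_n]` token format, and the `scrub_available` flag are all hypothetical names chosen for the example, and the real `Boundary` signature may differ.

```python
class ScrubUnavailable(RuntimeError):
    """Raised when scrubbing cannot run; callers must not fall back to raw text."""


class SketchBoundary:
    """Illustrative stand-in for the real `Boundary` class (assumed shape)."""

    def __init__(self, known_entities, scrub_available=True):
        # Longest first, so "Acme Capital" is tokenized before "Acme".
        self.known_entities = sorted(known_entities, key=len, reverse=True)
        self.scrub_available = scrub_available
        self.token_map = {}  # placeholder token -> original value

    def scrub(self, text):
        if not self.scrub_available:
            # Fail closed: raise instead of returning the raw text.
            raise ScrubUnavailable("scrub backend unavailable")
        for i, entity in enumerate(self.known_entities):
            token = f"[ENTITY_{i}]"
            if entity in text:
                self.token_map[token] = entity
                text = text.replace(entity, token)
        return text

    def rehydrate(self, text):
        # Swap placeholder tokens in the model reply back to the real values.
        for token, entity in self.token_map.items():
            text = text.replace(token, entity)
        return text

    def call(self, model_fn, text):
        scrubbed = self.scrub(text)   # raises before model_fn is ever reached
        reply = model_fn(scrubbed)    # the model only sees placeholder tokens
        return self.rehydrate(reply)
```

The point of the shape is ordering: `scrub` runs, and can raise, before the model function is invoked, so there is no code path on which raw text reaches the model.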

## Hard rules

- **Keep real LP data out of Claude.** Develop only against code, schema, and synthetic or locally redacted data. Route any real record substance through `backend/redaction` before it reaches a Claude model.
- **Never bulk-export the LP list** to any third party. Send only minimal, non-sensitive context to Claude.
- **Never call a Spark directly** — go through Spark Control (`SPARK_CONTROL_URL`).

## When adding a new Claude/MCP call

Trace the data path: any field carrying LP substance must cross `Boundary` first. A new MCP tool that reads CRM rows and hands them to a model without scrubbing is a leak — add it to the redaction path and extend the leak tests in `backend/redaction/test_*.py`.
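
A leak test for a new tool path might look roughly like this. It is a sketch on synthetic data only: `scrub_row`, `build_prompt`, `find_leaks`, and the field names are hypothetical stand-ins, not the helpers in `backend/redaction/`, and a real test would exercise the actual tool and `Boundary`.

```python
# Synthetic rows only; never put real LP records in a test fixture.
SYNTHETIC_ROWS = [
    {"lp_name": "Test Person Alpha", "firm": "Example Holdings LLC", "stage": "diligence"},
]
SENSITIVE_FIELDS = ("lp_name", "firm")  # assumed field names


def scrub_row(row):
    """Hypothetical scrubber: replace sensitive field values with placeholders."""
    return {k: (f"[{k.upper()}]" if k in SENSITIVE_FIELDS else v) for k, v in row.items()}


def build_prompt(rows):
    """Hypothetical tool path: scrub every row before formatting the prompt."""
    return "\n".join(str(scrub_row(r)) for r in rows)


def find_leaks(prompt, rows):
    """Return every sensitive value from `rows` that appears verbatim in `prompt`."""
    return [r[f] for r in rows for f in SENSITIVE_FIELDS if r[f] in prompt]


def test_prompt_has_no_lp_substance():
    prompt = build_prompt(SYNTHETIC_ROWS)
    assert find_leaks(prompt, SYNTHETIC_ROWS) == []
```

Asserting on the final prompt string, rather than on the scrubber in isolation, catches the case where a tool formats rows before or outside the scrub step.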

A Claude path that sends **free-prose** LP content (email bodies, notes) must pass `ner_fn=_ner_local` to `Boundary` and **fail closed** if the local model is down. The dictionary+regex floor only tokenizes KNOWN CRM entities, so without NER, unknown people and firms in prose leak. See `backend/mcp/architect_grounding.py` (does it right) and `backend/mcp/outreach_agent.py`.
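
Why the dictionary floor is not enough for prose, and what failing closed on NER means, can be shown with a small sketch. Everything here is assumed for illustration: `scrub_prose`, `NerUnavailable`, and `ner_down` are hypothetical, and the real `_ner_local` and `Boundary` interfaces may look different.

```python
class NerUnavailable(RuntimeError):
    """Raised when the NER step cannot run; the prose must not be sent."""


def scrub_prose(text, known_entities, ner_fn):
    """Tokenize known entities plus whatever ner_fn detects in the text."""
    try:
        detected = ner_fn(text)
    except Exception as exc:
        # Fail closed: do not fall back to the dictionary floor alone.
        raise NerUnavailable("local NER unavailable; refusing to send prose") from exc
    # Longest first (ties broken alphabetically) so overlapping names tokenize cleanly.
    entities = sorted(set(known_entities) | set(detected), key=lambda e: (-len(e), e))
    for i, entity in enumerate(entities):
        text = text.replace(entity, f"[ENTITY_{i}]")
    return text


def ner_down(text):
    """Simulates the local model being offline."""
    raise ConnectionError("model offline")
```

With an `ner_fn` that detects nothing, a name that is absent from `known_entities` passes through untouched, which is the leak described above. With `ner_down`, the call raises instead of degrading to dictionary-only scrubbing.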

See also `docs/redaction-rehydration.md` and `docs/spark-control-scrub-endpoints.md`.