Keysat aec2b7775b Harden privacy boundary and asset serving (v0.1.0:74)
Fixes from the 2026-06-12 full-eval (P0 + two P1s); code-only, no schema
change. Without these the "private CRM" premise was breachable on the LAN:

- P0: the /assets/ route joined the request path onto FRONTEND_DIR without
  normalizing '..' (get_path/urlparse pass it through), so an unauthenticated
  GET /assets/../../data/crm.db read any file the process could — the LP DB,
  the JWT signing secret (-> admin-token forgery), the Gmail key. Add a realpath
  containment check that 404s anything resolving outside FRONTEND_ROOT.
- P1: the LP-outreach drafter built its redaction Boundary with no ner_fn, so
  unknown people/firms in raw email bodies reached Claude in the clear. Pass the
  local-Qwen NER backstop (ner_fn=_ner_local), matching architect_grounding;
  fails closed via the existing scrub_unavailable path if the local model is down.
- P1: get-by-id handlers leaked soft-deleted records by direct ID. Add
  deleted_at IS NULL to every get-by-id path — contacts, organizations,
  opportunities, lp_profiles — and to the nested related-data sub-selects in
  the contact/opportunity detail payloads, matching the list-handler convention.
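
The P0 fix can be illustrated with a minimal sketch of a realpath containment check. This is not the handler from this commit: the helper name resolve_asset, its signature, and the way the /assets/ prefix is stripped are assumptions made for illustration only.

```python
import os
from typing import Optional


def resolve_asset(frontend_root: str, request_path: str) -> Optional[str]:
    """Return the real path of a servable asset, or None (caller answers 404).

    Hypothetical helper; the names and route prefix are illustrative.
    """
    root = os.path.realpath(frontend_root)
    # Drop the route prefix and leading slashes so os.path.join cannot be
    # reset to an absolute path by the request.
    relative = request_path.removeprefix("/assets/").lstrip("/")
    # realpath collapses '..' segments and follows symlinks before we compare.
    candidate = os.path.realpath(os.path.join(root, relative))
    # commonpath avoids the startswith() pitfall where /srv/frontend-old
    # would pass as being "inside" /srv/frontend.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate if os.path.isfile(candidate) else None
```

Comparing resolved paths rather than raw strings is the point of the check: a request such as /assets/../data/crm.db only reveals that it escapes the root after '..' has been collapsed.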

Bumps the package to v0.1.0:74 (utils.ts + versions/v0.1.0.74.ts + graph).
Full report in EVALUATION.md; remaining P2/P3 triaged in AGENTS.md Current state.
2026-06-12 18:01:48 -05:00


paths
backend/redaction/**
backend/mcp/**

Redaction & the Claude privacy boundary

Read this before editing anything that sends data to a Claude model — the redaction layer or any MCP agent/tool path.

The boundary

  • backend/redaction/ (scrub.py + client.py) implements the scrub → Claude → re-hydrate boundary. Its pieces are Boundary, the SCRUB_BACKEND=local|gateway switch, and fail-closed behavior.
  • SCRUB_BACKEND=gateway routes scrubbing through Spark Control (caller-supplied dict); SCRUB_BACKEND=local scrubs in-process. If scrubbing can't run, the call fails closed: it does not pass raw text through.
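
The fail-closed rule above can be sketched as follows. This is an illustration under stated assumptions, not the code in scrub.py: the ScrubUnavailable exception, the scrub and scrub_local functions, and the gateway_fn parameter are all hypothetical names. Only the SCRUB_BACKEND variable and its two values come from this document.

```python
import os
from typing import Callable, Dict, Optional


class ScrubUnavailable(RuntimeError):
    """Raised when scrubbing cannot run; callers must not send raw text."""


def scrub_local(text: str, known: Dict[str, str]) -> str:
    # Stand-in for in-process scrubbing: swap known entities for tokens.
    for name, token in known.items():
        text = text.replace(name, token)
    return text


def scrub(
    text: str,
    known: Dict[str, str],
    backend: Optional[str] = None,
    gateway_fn: Optional[Callable[[str, Dict[str, str]], str]] = None,
) -> str:
    backend = backend or os.environ.get("SCRUB_BACKEND", "local")
    if backend == "local":
        return scrub_local(text, known)
    if backend == "gateway":
        if gateway_fn is None:
            raise ScrubUnavailable("gateway scrubber not configured")
        try:
            # Hypothetical call through Spark Control, passing the caller's dict.
            return gateway_fn(text, known)
        except Exception as exc:
            # Any gateway failure becomes a hard stop, never a raw pass-through.
            raise ScrubUnavailable("gateway scrub failed") from exc
    raise ScrubUnavailable(f"unknown SCRUB_BACKEND: {backend!r}")
```

Every exit path either returns scrubbed text or raises, so there is no branch in which the original text reaches the caller unchanged because of an error.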

Hard rules

  • Keep real LP data out of Claude. Develop only against code, schema, and synthetic or locally redacted data. Route any real record substance through backend/redaction before it reaches a Claude model.
  • Never bulk-export the LP list to any third party. Send only minimal, non-sensitive context to Claude.
  • Never call a Spark directly — go through Spark Control (SPARK_CONTROL_URL).

When adding a new Claude/MCP call

Trace the data path: any field carrying LP substance must cross Boundary first. A new MCP tool that reads CRM rows and hands them to a model without scrubbing is a leak — add it to the redaction path and extend the leak tests in backend/redaction/test_*.py.
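
One possible shape for such a leak test is sketched below. It does not mirror the existing tests in backend/redaction/. The tool summarize_contact, the RecordingModel fake, and the find_leaks helper are hypothetical names, and the record is synthetic.

```python
from typing import Callable, Dict, Iterable, List


def find_leaks(outbound: str, sensitive: Iterable[str]) -> List[str]:
    """Return every sensitive string that appears (case-insensitively) in outbound."""
    lowered = outbound.lower()
    return [s for s in sensitive if s and s.lower() in lowered]


class RecordingModel:
    """Fake model client: records each prompt instead of calling Claude."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return "ok"


def summarize_contact(row: Dict[str, str], scrub: Callable[[str], str],
                      model: RecordingModel) -> str:
    # Hypothetical MCP tool: the CRM row is scrubbed before the model sees it.
    prompt = scrub(f"Summarize {row['name']} at {row['org']}: {row['note']}")
    return model.complete(prompt)


def test_summarize_contact_does_not_leak() -> None:
    # Synthetic record only; never use real LP data in tests.
    row = {"name": "Ada Lovelace", "org": "Analytical Capital", "note": "met in March"}
    sensitive = [row["name"], row["org"]]

    def scrub(text: str) -> str:
        for i, value in enumerate(sensitive, 1):
            text = text.replace(value, f"[ENT_{i}]")
        return text

    model = RecordingModel()
    summarize_contact(row, scrub, model)
    assert model.prompts, "the tool should have called the model"
    for prompt in model.prompts:
        assert find_leaks(prompt, sensitive) == []
```

Asserting on the recorded outbound prompt, rather than on the tool's return value, checks the thing that matters: what would have crossed the boundary.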

A Claude path that sends free-prose LP content (email bodies, notes) must pass ner_fn=_ner_local to Boundary and fail closed if the local model is down — the dictionary+regex floor only tokenizes KNOWN CRM entities, so unknown people/firms in prose leak otherwise. See backend/mcp/architect_grounding.py (does it right) and backend/mcp/outreach_agent.py.
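
To make the reasoning concrete, here is a minimal sketch of a dictionary floor followed by an NER backstop. It is not the real Boundary API: redact_prose, rehydrate, ScrubUnavailable, and the assumption that ner_fn takes text and returns a list of entity strings are all illustrative.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class ScrubUnavailable(RuntimeError):
    """Raised instead of sending unscrubbed prose to a model."""


def redact_prose(
    text: str,
    known_entities: Iterable[str],
    ner_fn: Optional[Callable[[str], List[str]]],
) -> Tuple[str, Dict[str, str]]:
    """Tokenize known entities, then whatever the NER backstop detects."""
    if ner_fn is None:
        # Free prose without a backstop would leak unknown names.
        raise ScrubUnavailable("free prose requires an NER backstop")
    mapping: Dict[str, str] = {}

    def tokenize(names: Iterable[str], prefix: str) -> None:
        nonlocal text
        # Longest names first, so a longer name is not split by a shorter one.
        for name in sorted(set(names), key=lambda n: (-len(n), n)):
            if name and name in text:
                token = f"[{prefix}_{len(mapping) + 1}]"
                text = text.replace(name, token)
                mapping[token] = name

    tokenize(known_entities, "KNOWN")  # floor: only catches known CRM entities
    try:
        detected = ner_fn(text)        # backstop: people/firms not in the CRM
    except Exception as exc:
        raise ScrubUnavailable("local NER model unavailable") from exc
    tokenize(detected, "NER")
    return text, mapping


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Restore original names in the model's reply."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text
```

In this sketch a missing or failing ner_fn raises before any text is returned, which is the fail-closed behavior the paragraph above requires for prose paths.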

See also docs/redaction-rehydration.md and docs/spark-control-scrub-endpoints.md.