Add regression tests for v74 fixes; close soft-delete leak in list-view aggregates

Lock in the three v0.1.0:74 security/privacy fixes with regression tests, and
fix a same-class soft-delete leak surfaced while writing them.

- backend/test_assets_traversal.py: boots the real server, proves /assets/
  path-traversal vectors (incl. a real decoy file and the live crm.db, plain
  and URL-encoded) 404 and leak nothing, while a legit asset still serves 200.
- backend/test_soft_delete_reads.py: get-by-id 404s soft-deleted rows and
  nested + list-view aggregates exclude soft-deleted children.
- backend/mcp/test_outreach_redaction.py: an unknown free-prose name is
  tokenized away from the Claude payload but re-hydrated locally, and the path
  fails closed (no Claude call) when the local NER model is down.
- backend/run_tests.py: aggregate runner (each backend/**/test_*.py in its own
  subprocess); replaces the manual for-loop. 16/16 green.
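
The containment idea behind the /assets/ test can be sketched as follows. This is a
minimal illustration, assuming a realpath-then-compare check; `resolve_contained` is a
hypothetical helper, not the code in server.py.

```python
import os
import tempfile


def resolve_contained(root, requested):
    """Return the resolved path if it stays under root, else None.

    Hypothetical helper for illustration; not the code in server.py.
    """
    root_real = os.path.realpath(root)
    # Join first, then resolve: realpath collapses '..' and follows symlinks.
    target = os.path.realpath(os.path.join(root_real, requested.lstrip("/")))
    # commonpath avoids the string-prefix trap where /srv/frontend-evil
    # would "start with" /srv/frontend.
    if os.path.commonpath([root_real, target]) != root_real:
        return None
    return target


# Throwaway tree: base/frontend/assets/app.css is servable, base/secret.txt is not.
base = tempfile.mkdtemp()
frontend = os.path.join(base, "frontend")
os.makedirs(os.path.join(frontend, "assets"))
with open(os.path.join(frontend, "assets", "app.css"), "w") as f:
    f.write("/* ok */")
with open(os.path.join(base, "secret.txt"), "w") as f:
    f.write("decoy")

in_bounds = resolve_contained(frontend, "assets/app.css")
escaped = resolve_contained(frontend, "assets/../../secret.txt")
print(in_bounds is not None, escaped is None)
```

A handler built on a check like this would answer 404 whenever the helper returns None,
which is the behavior the regression test asserts from the outside.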

A reviewer pass on the tests confirmed the soft-delete filter was missing from
list-view aggregate sub-selects: org contact_count/total_funded and contacts
comm_count/last_contact_date counted soft-deleted rows. Add `deleted_at IS NULL`
to those four (server.py); the new test asserts contact_count, total_funded, and
comm_count (last_contact_date is fixed but not yet asserted).
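
The aggregate leak described above can be reproduced in isolation. The sketch below uses
an in-memory SQLite database with a cut-down, assumed schema (only the columns the
sub-select needs), not the real CRM schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id TEXT PRIMARY KEY, name TEXT, deleted_at TEXT);
    CREATE TABLE contacts (id TEXT PRIMARY KEY, organization_id TEXT, deleted_at TEXT);
    INSERT INTO organizations (id, name) VALUES ('orgA', 'Example Org');
    INSERT INTO contacts (id, organization_id) VALUES ('c1', 'orgA'), ('c2', 'orgA');
    -- one soft-deleted contact under the same org
    INSERT INTO contacts VALUES ('c3', 'orgA', '2026-06-01T00:00:00');
""")

# The outer query filters deleted orgs, but the aggregate sub-select does not
# filter deleted children, so the count includes the soft-deleted contact.
LEAKY = """
    SELECT (SELECT COUNT(*) FROM contacts WHERE organization_id = o.id) AS contact_count
    FROM organizations o WHERE o.deleted_at IS NULL
"""
# Same query with the filter repeated inside the sub-select.
FIXED = """
    SELECT (SELECT COUNT(*) FROM contacts
            WHERE organization_id = o.id AND deleted_at IS NULL) AS contact_count
    FROM organizations o WHERE o.deleted_at IS NULL
"""

leaky_count = conn.execute(LEAKY).fetchone()[0]
fixed_count = conn.execute(FIXED).fetchone()[0]
print(leaky_count, fixed_count)  # 3 2
```

The point is that the outer `WHERE` never reaches into a correlated sub-select; each
sub-select needs its own `deleted_at IS NULL`.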

The reports subsystem (dashboard/pipeline/LP-breakdown, ~16 aggregate queries)
has the same leak and is logged as P2 for a dedicated pass. This commit is not yet
built or deployed; bump the package version before the next s9pk build.
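
For the outreach redaction test listed above, the fail-closed contract can be sketched
roughly as below. This is an illustration under stated assumptions: `scrub_then_call`,
its token format, and both stubs are hypothetical and do not mirror outreach_agent.py;
only the `scrub_unavailable` status and the "no remote call when NER is down" rule come
from the test.

```python
def scrub_then_call(text, ner_fn, call_model):
    """Tokenize detected names, call the model, re-hydrate; fail closed on NER errors."""
    try:
        entities = ner_fn(text)
    except Exception as exc:
        # Fail closed: never fall through to the remote call with unscrubbed text.
        return {"status": "scrub_unavailable", "reason": str(exc)}

    mapping = {}
    scrubbed = text
    for i, (name, label) in enumerate(entities):
        token = f"[{label}_{i}]"          # hypothetical token format
        mapping[token] = name
        scrubbed = scrubbed.replace(name, token)

    reply = call_model(scrubbed)          # only de-identified text leaves the process
    for token, name in mapping.items():
        reply = reply.replace(token, name)  # re-hydrate locally for the human
    return {"status": "ok", "draft": reply}


sent = []


def fake_model(text):
    # Stand-in for the remote model: records what it received and echoes it back.
    sent.append(text)
    return text


def ner_down(text):
    raise RuntimeError("NER model unreachable")


NAME = "Penelope Ashworth-Vane"
ok = scrub_then_call(f"My partner {NAME} has a lock-up objection.",
                     lambda t: [(NAME, "PERSON")], fake_model)
down = scrub_then_call(f"My partner {NAME} has a lock-up objection.",
                       ner_down, fake_model)
print(ok["status"], down["status"], len(sent))
```

The design choice worth noting is that the error branch returns before any call to the
model, so a broken scrubber can only produce a refusal, never an unscrubbed request.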
Keysat
2026-06-13 00:26:22 -05:00
parent a74a540295
commit 7285bb0e52
6 changed files with 488 additions and 11 deletions
+8 -7
@@ -25,8 +25,8 @@
 python3 -m py_compile backend/server.py
 # Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
 python3 backend/redaction/test_scrub_leak.py     # substitute any backend/**/test_*.py
-# Run all tests (no aggregate runner exists)
-for t in $(find backend -name 'test_*.py'); do echo "== $t"; python3 "$t" || break; done
+# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
+python3 backend/run_tests.py                      # add substrings to filter, e.g. `... soft_delete redaction`
 # Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
 cd start9/0.4 && make
 ```
@@ -64,7 +64,7 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 ## Conventions
 - **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
-- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` not just list handlers but get-by-id and nested related-data sub-selects too (the 2026-06-12 audit found both leaking soft-deleted rows the list handlers already hid). (Thesis has a subtlety here — see the thesis guide.)
+- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
 - **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`.
 - **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
@@ -100,10 +100,11 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 _Phase 0 substrate + Phase 1 thesis/outreach are built; current package is **v0.1.0:74**. Longer-term backlog: `ROADMAP.md`._
 - **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
-- **Deployed:** v0.1.0:74 is committed, pushed (`main` @ `aec2b77`), built, and **installed to the box** (`$START9_BOX_HOST` / immense-voyage.local now reports v0.1.0:74, up from v72). On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). **Unverified post-deploy:** service health after the v72→v74 migration, and the security fixes behaving live (no box CRM URL/auth on hand).
+- **Deployed & verified live (2026-06-13):** v0.1.0:74 is **installed and healthy on the box** (`$START9_BOX_HOST` / immense-voyage.local). Grant confirms login works; `/assets/` traversal 404s live (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
+- **Repo ahead of the box (committed, NOT yet built/deployed):** since v74, `main` adds the **list-view soft-delete aggregate fix** (`server.py`: org `contact_count`/`total_funded`, contacts `comm_count`/`last_contact_date` now filter `deleted_at`), three **regression tests** (traversal/soft-delete/NER), and an **aggregate test runner**. The deployed box is still pristine v74 — **bump the version before the next s9pk build** to ship these.
 - **Shipped in v0.1.0:74** (security/privacy hardening from the 2026-06-12 full-eval; report in `EVALUATION.md`): closed a pre-auth `/assets/` path traversal (could read crm.db / JWT secret / Gmail key); wired the local-Qwen NER backstop into the outreach redaction boundary (free-prose email bodies were reaching Claude with unknown names in the clear); added `deleted_at IS NULL` to every get-by-id + nested sub-select read path. Verified locally (py_compile, query exec, redaction/outreach tests, containment logic) + two reviewer passes.
-- **Local verification (2026-06-12):** all documented commands run clean — `py_compile` OK, **13/13 backend tests green**, `./start.sh`/`./start_beta.sh` boot (health 200, auth 401), `make` builds the x86 s9pk (v0.1.0:74), `/assets/` traversal 404s locally (incl. URL-encoded). The 2 stale thesis tests are fixed (seed structure now documented in `docs/guides/thesis.md`). Box-only checks still open: live service health + security fixes on `$START9_BOX_HOST`.
+- **Tests (2026-06-13):** **16/16 backend tests green** via `python3 backend/run_tests.py` (the new aggregate runner; +3 regression tests this session). `py_compile` clean; `./start.sh`/`./start_beta.sh` boot (health 200, auth 401); `make` builds the x86 s9pk. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
 - **Decided, not yet built:** CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (drafts currently reply to the LP only).
-- **Known debt (P2, not deploy-blocking):** no aggregate test runner (the Commands `for` loop is it); `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
+- **Known debt (P2, not deploy-blocking):** the **reports subsystem** (`handle_dashboard_report`/`handle_pipeline_report`/`handle_lp_breakdown_report`, ~16 aggregate queries over contacts/opportunities/communications/lp_profiles) still counts soft-deleted rows — the list/detail aggregates were fixed (v74 + the org/contacts list-view follow-up) but the reports were not; needs its own pass + report-endpoint tests; `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
 - **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
-- **Next:** 1) verify v0.1.0:74 live on the box — service health + `curl --path-as-is .../assets/../../data/crm.db` → 404; 2) clear P2 debt (next: aggregate test runner + add traversal/soft-delete/NER regression tests; 2 stale thesis tests already realigned); 3) Grant + Jonathan freeze v2.0 canonical; 4) build reply-all; 5) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
+- **Next:** 1) **reports-subsystem soft-delete sweep** — ~16 dashboard/pipeline/LP aggregate queries still count soft-deleted rows; fix + add report-endpoint tests; 2) **bump version + rebuild/redeploy** to ship the list-view fix + tests now sitting ahead of the box; 3) `?limit=abc` crash (P2); 4) Grant + Jonathan freeze v2.0 canonical; 5) build reply-all; 6) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
+123
@@ -0,0 +1,123 @@
#!/usr/bin/env python3
"""Regression test for the outreach NER-backstop wiring (v0.1.0:74).
The outreach draft path scrubs free-prose LP context (CRM notes + email bodies) before
it reaches Claude. The dictionary+regex floor only tokenizes KNOWN CRM entities, so an
UNKNOWN person/firm mentioned in an email body would otherwise reach Claude in the clear.
The v74 fix wired the local-Qwen NER backstop into draft_outreach (outreach_agent.py:
`Boundary(..., ner_fn=_ner_local)`) and made it FAIL CLOSED when the local model is down.
This drives the real draft_outreach with Claude and the NER model stubbed (offline,
synthetic — guardrail #9) and proves:
(1) an unknown name in an email body is tokenized AWAY from the Claude payload;
(2) it is re-hydrated locally so the human still sees the real name;
(3) the interaction_log captures no sensitive value;
(4) when the local NER model raises (unreachable), the path returns scrub_unavailable
and Claude is never called.
Run: cd backend && python3 mcp/test_outreach_redaction.py
"""
import os
import sqlite3
import sys
import tempfile

_HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, _HERE)  # backend/mcp
sys.path.insert(0, os.path.dirname(_HERE))  # backend (for the redaction package)

import outreach_agent as oa  # noqa: E402
import architect_grounding as G  # noqa: E402
import architect_agent as aa  # noqa: E402 (imports OK offline; client is lazy)

FAILS = []
UNKNOWN = "Penelope Ashworth-Vane"  # a person in NO CRM table -> only NER can catch her
INVESTOR = "Harbor & Vine"  # a known org (fundraising_investors) -> dictionary floor


def check(cond, msg):
    print((" PASS " if cond else " FAIL ") + msg)
    if not cond:
        FAILS.append(msg)


def make_db():
    path = os.path.join(tempfile.mkdtemp(), "crm.db")
    c = sqlite3.connect(path)
    c.row_factory = sqlite3.Row
    c.executescript("""
        CREATE TABLE fundraising_investors (id TEXT PRIMARY KEY, investor_name TEXT, notes TEXT);
        CREATE TABLE emails (id TEXT PRIMARY KEY, subject TEXT, body_text TEXT, snippet TEXT, sent_at TEXT,
            from_email TEXT, to_emails_json TEXT, thread_id TEXT, is_matched INT);
        CREATE TABLE email_investor_links (id TEXT, email_id TEXT, fundraising_investor_id TEXT);
        CREATE TABLE interaction_log (id TEXT PRIMARY KEY, ts TEXT, actor_type TEXT, actor_id TEXT, action TEXT,
            target_type TEXT, target_id TEXT, payload TEXT, source TEXT, created_at TEXT);
    """)
    c.execute("INSERT INTO fundraising_investors VALUES ('inv1',?,?)",
              (INVESTOR, "Warm on Fund III; weighing lock-up terms."))
    # The active-thread email body names an UNKNOWN person in free prose.
    c.execute("INSERT INTO emails (id,subject,body_text,sent_at,thread_id,is_matched) VALUES "
              "('e1','Re: Fund III',?,?,'t1',1)",
              (f"Thanks for the call. My partner {UNKNOWN} still has a lock-up objection.", "2026-06-02T10:00:00"))
    c.execute("INSERT INTO email_investor_links (id,email_id,fundraising_investor_id) VALUES ('l1','e1','inv1')")
    c.commit()
    return path, c


def main():
    db_path, conn = make_db()

    # Stub the thesis fetch (avoid the thesis DB dependency) and Claude. The NER stub stands
    # in for the local-Qwen model; _draft_with_claude echoes the de-identified text back so
    # re-hydration is exercised and we can inspect exactly what would have reached Claude.
    aa.at.get_thesis = lambda *a, **k: {}
    captured = {}

    def fake_claude(aa_mod, thesis, type_desc, deident_target, deident_voice, guidance):
        captured["target"] = deident_target
        return deident_target  # passthrough -> rehydrate must restore the real name

    oa._draft_with_claude = fake_claude
    G._ner_local = lambda text: [(UNKNOWN, "PERSON")]  # local model UP, finds the unknown name

    # ── A) unknown name is tokenized away from Claude, restored locally ──
    print("\n[A — NER backstop tokenizes an unknown name in outreach]")
    res = oa.draft_outreach(conn, "inv1", "follow_up", "", db_path, sender_email=None)
    check(res.get("status") == "ok", f"draft ok (status={res.get('status')})")
    sent = captured.get("target", "")
    check(UNKNOWN not in sent, "unknown name absent from the Claude payload (NER tokenized it)")
    check(INVESTOR not in sent, "known investor org absent from the Claude payload (dictionary floor)")
    check("lock-up" in sent, "objection substance survives to Claude")
    check(UNKNOWN in res.get("draft", ""), "unknown name re-hydrated locally for the human")
    blob = " ".join(r[0] for r in conn.execute("SELECT payload FROM interaction_log WHERE payload IS NOT NULL"))
    check(UNKNOWN not in blob and INVESTOR not in blob, "interaction_log carries NO sensitive value")

    # ── B) FAIL CLOSED: local NER model unreachable -> no Claude call ──
    print("\n[B — fail closed: local NER model down]")
    called = {"claude": False}

    def boom(text):
        raise RuntimeError("Spark Control unreachable")

    G._ner_local = boom
    oa._draft_with_claude = lambda *a, **k: called.__setitem__("claude", True) or a[3]
    res2 = oa.draft_outreach(conn, "inv1", "follow_up", "", db_path, sender_email=None)
    check(res2.get("status") == "scrub_unavailable", f"status scrub_unavailable (got {res2.get('status')})")
    check(bool(res2.get("reason")), "scrub_unavailable carries the propagated NER failure reason (non-vacuous)")
    check(called["claude"] is False, "Claude was NOT called when the NER model is down (fail closed)")
    check("draft" not in res2, "no draft returned when scrub fails closed")
    conn.close()

    print()
    if FAILS:
        print(f"FAILED ({len(FAILS)}):")
        for f in FAILS:
            print(f" - {f}")
        sys.exit(1)
    print("ALL PASS (outreach NER-backstop wiring + fail-closed)")


if __name__ == "__main__":
    main()
+67
@@ -0,0 +1,67 @@
#!/usr/bin/env python3
"""Aggregate test runner for the backend suite.
The backend tests are standalone scripts (each with `if __name__ == "__main__"`, no
pytest). This discovers every backend/**/test_*.py and runs each in its OWN subprocess
(tests set os.environ and import `server` with different configs, so isolation matters),
prints a one-line PASS/FAIL per test, dumps output only for failures, and exits non-zero
if any test fails.
Run: python3 backend/run_tests.py (from the repo root)
or: cd backend && python3 run_tests.py
Filter: python3 backend/run_tests.py soft_delete redaction # substring match on path
"""
import os
import subprocess
import sys
import time

BACKEND = os.path.dirname(os.path.abspath(__file__))


def discover(filters):
    found = []
    for root, dirs, files in os.walk(BACKEND):
        dirs[:] = [d for d in dirs if d != "__pycache__"]
        for f in files:
            if f.startswith("test_") and f.endswith(".py"):
                path = os.path.join(root, f)
                rel = os.path.relpath(path, BACKEND)
                if not filters or any(flt in rel for flt in filters):
                    found.append(path)
    return sorted(found)


def main():
    filters = sys.argv[1:]
    tests = discover(filters)
    if not tests:
        print("No tests matched.")
        sys.exit(1)
    print(f"Running {len(tests)} backend test(s)\n")
    passed, failed = [], []
    t0 = time.time()
    for path in tests:
        rel = os.path.relpath(path, BACKEND)
        proc = subprocess.run([sys.executable, path], cwd=BACKEND,
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if proc.returncode == 0:
            passed.append(rel)
            print(f" PASS {rel}")
        else:
            failed.append(rel)
            print(f" FAIL {rel}")
            sys.stdout.write(proc.stdout.decode("utf-8", "replace").rstrip() + "\n")
    print(f"\n{len(passed)}/{len(tests)} passed in {time.time() - t0:.1f}s")
    if failed:
        print("FAILED:")
        for f in failed:
            print(f" - {f}")
        sys.exit(1)
    print("ALL PASS")


if __name__ == "__main__":
    main()
+4 -4
@@ -2136,8 +2136,8 @@ class CRMHandler(BaseHTTPRequestHandler):
 conn = get_db()
 query = """
 SELECT c.*, o.name as organization_name,
-(SELECT COUNT(*) FROM communications WHERE contact_id = c.id) as comm_count,
-(SELECT MAX(communication_date) FROM communications WHERE contact_id = c.id) as last_contact_date
+(SELECT COUNT(*) FROM communications WHERE contact_id = c.id AND deleted_at IS NULL) as comm_count,
+(SELECT MAX(communication_date) FROM communications WHERE contact_id = c.id AND deleted_at IS NULL) as last_contact_date
 FROM contacts c
 LEFT JOIN organizations o ON c.organization_id = o.id
 WHERE 1=1 AND c.deleted_at IS NULL
@@ -2345,8 +2345,8 @@ class CRMHandler(BaseHTTPRequestHandler):
 conn = get_db()
 query = """
 SELECT o.*,
-(SELECT COUNT(*) FROM contacts WHERE organization_id = o.id) as contact_count,
-(SELECT COALESCE(SUM(commitment_amount), 0) FROM opportunities WHERE organization_id = o.id AND stage = 'funded') as total_funded
+(SELECT COUNT(*) FROM contacts WHERE organization_id = o.id AND deleted_at IS NULL) as contact_count,
+(SELECT COALESCE(SUM(commitment_amount), 0) FROM opportunities WHERE organization_id = o.id AND stage = 'funded' AND deleted_at IS NULL) as total_funded
 FROM organizations o WHERE 1=1 AND o.deleted_at IS NULL
 """
 args = []
+126
@@ -0,0 +1,126 @@
#!/usr/bin/env python3
"""Regression test for the /assets/ path-traversal containment fix (v0.1.0:74).
Before the fix, get_path()/urlparse did NOT normalize '..', so an unauthenticated
GET /assets/../../data/crm.db (raw client, no client-side normalization) escaped the
frontend root and read any file the process could — the LP DB, the JWT secret, the
Gmail key. The fix resolves the target with os.path.realpath and 404s anything that
does not stay under FRONTEND_ROOT (server.py, the `/assets/` branch of do_GET).
This boots the REAL server in-process against a throwaway frontend root, plants a
decoy "secret" OUTSIDE that root, and proves: (1) traversal vectors that resolve to a
real readable file outside the root still 404 and leak no bytes; (2) the live crm.db
path is 404'd; (3) URL-encoded separators don't help; (4) a legit in-bounds asset
still serves 200 (the fix isn't over-broad). Synthetic only (guardrail #9).
Run: cd backend && python3 test_assets_traversal.py
"""
import http.client
import os
import sys
import tempfile
import threading
from http.server import ThreadingHTTPServer

# Lay out a throwaway tree BEFORE importing server (FRONTEND_DIR/ROOT resolve at import):
#   base/frontend/{index.html,assets/app.css}  <- the served root
#   base/secret.txt                            <- a real file a traversal would target
#   base/data/crm.db                           <- the live DB, created by init_db()
_BASE = tempfile.mkdtemp()
_FRONTEND = os.path.join(_BASE, "frontend")
os.makedirs(os.path.join(_FRONTEND, "assets"))
_DATA = os.path.join(_BASE, "data")
os.makedirs(_DATA)
with open(os.path.join(_FRONTEND, "index.html"), "w") as f:
    f.write("<!doctype html><title>crm</title>")
_CSS_MARKER = "/* legit-asset-marker-7f3a */"
with open(os.path.join(_FRONTEND, "assets", "app.css"), "w") as f:
    f.write(_CSS_MARKER)
_SECRET_MARKER = "TOPSECRET-JWT-zq19"
with open(os.path.join(_BASE, "secret.txt"), "w") as f:
    f.write(_SECRET_MARKER)

os.environ["CRM_FRONTEND_DIR"] = _FRONTEND
os.environ["CRM_DATA_DIR"] = _DATA
os.environ["CRM_DB_PATH"] = os.path.join(_DATA, "crm.db")
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

import server  # noqa: E402

FAILS = []


def check(cond, msg):
    print((" PASS " if cond else " FAIL ") + msg)
    if not cond:
        FAILS.append(msg)


class _Quiet(server.CRMHandler):
    def log_message(self, *a):  # keep the test output clean
        pass


def _get(port, path):
    """Raw GET with the path sent verbatim — http.client does NOT normalize '..',
    which is exactly the unauthenticated raw-client threat the fix defends against."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read().decode("utf-8", "replace")
    conn.close()
    return resp.status, body


def main():
    server.init_db()  # creates base/data/crm.db and the full schema
    check(os.path.exists(os.environ["CRM_DB_PATH"]), "init_db created the live crm.db (a real traversal target)")
    httpd = ThreadingHTTPServer(("127.0.0.1", 0), _Quiet)
    port = httpd.server_address[1]
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    try:
        # ── legit in-bounds asset still serves (containment is not over-broad) ──
        print("\n[legit asset]")
        st, body = _get(port, "/assets/app.css")
        check(st == 200, f"in-bounds /assets/app.css serves 200 (got {st})")
        check(_CSS_MARKER in body, "in-bounds asset body is served intact")

        # ── traversal to a REAL file outside the root: 404, zero bytes leaked ──
        print("\n[traversal -> decoy secret outside the root]")
        for vec in ["/assets/../../secret.txt",
                    "/assets/../../../secret.txt",
                    "/assets/..%2f..%2fsecret.txt",  # urlparse won't decode %2f
                    "/assets/..%2F..%2Fsecret.txt"]:  # …nor uppercase %2F (some clients send it)
            st, body = _get(port, vec)
            check(st == 404, f"{vec} -> 404 (got {st})")
            check(_SECRET_MARKER not in body, f"{vec} leaks no secret bytes")

        # ── traversal to the live crm.db (the headline vector from the eval) ──
        print("\n[traversal -> live crm.db]")
        for vec in ["/assets/../../data/crm.db",
                    "/assets/../data/crm.db",
                    "/assets/..%2f..%2fdata%2fcrm.db"]:
            st, body = _get(port, vec)
            check(st == 404, f"{vec} -> 404 (got {st})")
            check("SQLite format 3" not in body, f"{vec} leaks no DB header")

        # ── deep absolute-style escape ──
        print("\n[deep escape]")
        deep = "/assets/../../../../../../../../etc/passwd"
        st, body = _get(port, deep)
        # Report the vector actually sent, so a failure line matches the request.
        check(st == 404, f"{deep} -> 404 (got {st})")
        check("root:" not in body, "/etc/passwd not leaked")
    finally:
        httpd.shutdown()

    print()
    if FAILS:
        print(f"FAILED ({len(FAILS)}):")
        for f in FAILS:
            print(f" - {f}")
        sys.exit(1)
    print("ALL PASS (assets path-traversal containment)")


if __name__ == "__main__":
    main()
+160
@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""Regression test for the soft-delete READ-path fix (v0.1.0:74).
Guardrail #3 is soft-delete only (deleted_at), and the 2026-06-12 audit found that
while LIST handlers filtered `deleted_at IS NULL`, the get-by-id handlers and their
nested related-data sub-selects did not — so a soft-deleted contact/org was still
readable by id, and soft-deleted children still surfaced inside a parent's detail
payload. The fix added `deleted_at IS NULL` to every get-by-id + nested sub-select
(server.py handle_get_contact / handle_get_organization).
This boots the REAL server, hand-builds active + soft-deleted rows across the five
soft-deletable tables, and drives the live HTTP read paths with a real token. It
asserts: get-by-id 404s a soft-deleted contact/org; nested sub-selects
(org->contacts/opportunities, contact->communications/opportunities/lp_profile)
omit soft-deleted children while keeping the live ones; and list-view aggregates
(org contact_count/total_funded, contact comm_count) count live rows only.
Synthetic only (guardrail #9).
Run: cd backend && python3 test_soft_delete_reads.py
"""
import http.client
import json
import os
import sqlite3
import sys
import tempfile
import threading
from http.server import ThreadingHTTPServer

_DATA = tempfile.mkdtemp()
os.environ["CRM_DATA_DIR"] = _DATA
os.environ["CRM_DB_PATH"] = os.path.join(_DATA, "crm.db")
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

import server  # noqa: E402

FAILS = []
DEL = "2026-06-01T00:00:00"  # any non-NULL deleted_at marks a row soft-deleted


def check(cond, msg):
    print((" PASS " if cond else " FAIL ") + msg)
    if not cond:
        FAILS.append(msg)


class _Quiet(server.CRMHandler):
    def log_message(self, *a):
        pass


def _get(port, path, token):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    conn.request("GET", path, headers={"Authorization": "Bearer " + token})
    resp = conn.getresponse()
    body = resp.read().decode("utf-8", "replace")
    conn.close()
    data = None
    if body:
        try:
            data = json.loads(body)
        except ValueError:
            pass
    return resp.status, data


def seed():
    """Build a fixed graph of live + soft-deleted rows directly in the migrated DB."""
    c = sqlite3.connect(os.environ["CRM_DB_PATH"])
    c.execute("INSERT INTO users (id,username,email,password_hash,full_name,role,is_active) "
              "VALUES ('u1','grant','grant@ten31.example','x','Grant','admin',1)")
    # organizations: one live, one soft-deleted
    c.execute("INSERT INTO organizations (id,name) VALUES ('orgA','Harbor & Vine')")
    c.execute("INSERT INTO organizations (id,name,deleted_at) VALUES ('orgX','Deleted Org',?)", (DEL,))
    # contacts under orgA: one live (with children), one soft-deleted, one live w/ deleted lp
    c.execute("INSERT INTO contacts (id,first_name,last_name,organization_id) VALUES ('cLive','Ada','Live','orgA')")
    c.execute("INSERT INTO contacts (id,first_name,last_name,organization_id,deleted_at) VALUES ('cDead','Boris','Gone','orgA',?)", (DEL,))
    c.execute("INSERT INTO contacts (id,first_name,last_name,organization_id) VALUES ('cLp','Cora','Lp','orgA')")
    # opportunities on cLive (also tied to orgA so they appear in the org detail too)
    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id) VALUES ('opLive','Live Opp','cLive','orgA','u1')")
    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id,deleted_at) VALUES ('opDead','Dead Opp','cLive','orgA','u1',?)", (DEL,))
    # funded opportunities on orgA — one live, one soft-deleted (for the org-list total_funded aggregate)
    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id,stage,commitment_amount) VALUES ('opFundLive','Funded Live','cLive','orgA','u1','funded',1000000)")
    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id,stage,commitment_amount,deleted_at) VALUES ('opFundDead','Funded Dead','cLive','orgA','u1','funded',500000,?)", (DEL,))
    # communications on cLive
    c.execute("INSERT INTO communications (id,contact_id,communication_date,created_by,subject) VALUES ('cmLive','cLive','2026-05-01','u1','Live note')")
    c.execute("INSERT INTO communications (id,contact_id,communication_date,created_by,subject,deleted_at) VALUES ('cmDead','cLive','2026-05-02','u1','Dead note',?)", (DEL,))
    # lp_profiles: live one on cLive, soft-deleted one on cLp
    c.execute("INSERT INTO lp_profiles (id,contact_id,fund_name) VALUES ('lpLive','cLive','Fund III')")
    c.execute("INSERT INTO lp_profiles (id,contact_id,fund_name,deleted_at) VALUES ('lpDead','cLp','Fund III',?)", (DEL,))
    c.commit()
    c.close()


def main():
    server.init_db()
    seed()
    token = server.create_token("u1", "grant", "admin")
    httpd = ThreadingHTTPServer(("127.0.0.1", 0), _Quiet)
    port = httpd.server_address[1]
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    try:
        # ── get-by-id: soft-deleted rows are not found ──
        print("\n[get-by-id excludes soft-deleted]")
        st, _ = _get(port, "/api/contacts/cDead", token)
        check(st == 404, f"GET soft-deleted contact -> 404 (got {st})")
        st, _ = _get(port, "/api/organizations/orgX", token)
        check(st == 404, f"GET soft-deleted organization -> 404 (got {st})")
        st, live = _get(port, "/api/contacts/cLive", token)
        check(st == 200, f"GET live contact -> 200 (got {st})")

        # ── contact detail nested sub-selects exclude soft-deleted children ──
        print("\n[contact detail nested sub-selects]")
        d = (live or {}).get("data", {})
        comm_ids = {x["id"] for x in d.get("communications", [])}
        opp_ids = {x["id"] for x in d.get("opportunities", [])}
        check("cmLive" in comm_ids and "cmDead" not in comm_ids, f"communications: live only (got {comm_ids})")
        check("opLive" in opp_ids and "opDead" not in opp_ids, f"opportunities: live only (got {opp_ids})")
        check(bool(d.get("lp_profile")) and d["lp_profile"].get("id") == "lpLive", "live lp_profile present on contact")
        # soft-deleted lp_profile must read back as None (nested single-row sub-select)
        _, lpc = _get(port, "/api/contacts/cLp", token)
        check((lpc or {}).get("data", {}).get("lp_profile") is None, "soft-deleted lp_profile reads back as None")

        # ── organization detail nested sub-selects exclude soft-deleted children ──
        print("\n[organization detail nested sub-selects]")
        _, org = _get(port, "/api/organizations/orgA", token)
        od = (org or {}).get("data", {})
        org_contacts = {x["id"] for x in od.get("contacts", [])}
        org_opps = {x["id"] for x in od.get("opportunities", [])}
        check("cLive" in org_contacts and "cLp" in org_contacts and "cDead" not in org_contacts,
              f"org.contacts: both live contacts present, soft-deleted absent (got {org_contacts})")
        check("opLive" in org_opps and "opDead" not in org_opps, f"org.opportunities: live only (got {org_opps})")

        # ── list-view aggregates exclude soft-deleted rows (org contact_count/total_funded, contact comm_count) ──
        print("\n[list-view aggregates]")
        _, orglist = _get(port, "/api/organizations", token)
        rowA = next((x for x in (orglist or {}).get("data", []) if x.get("id") == "orgA"), None)
        check(rowA is not None, "orgA present in org list")
        if rowA:
            check(rowA.get("contact_count") == 2, f"org contact_count: live only (cLive,cLp -> 2; got {rowA.get('contact_count')})")
            check(rowA.get("total_funded") == 1000000, f"org total_funded: live funded only (1,000,000; got {rowA.get('total_funded')})")
        _, ctlist = _get(port, "/api/contacts", token)
        rowC = next((x for x in (ctlist or {}).get("data", []) if x.get("id") == "cLive"), None)
        check(rowC is not None, "cLive present in contact list")
        if rowC:
            check(rowC.get("comm_count") == 1, f"contact comm_count: live only (cmLive -> 1; got {rowC.get('comm_count')})")
    finally:
        httpd.shutdown()

    print()
    if FAILS:
        print(f"FAILED ({len(FAILS)}):")
        for f in FAILS:
            print(f" - {f}")
        sys.exit(1)
    print("ALL PASS (soft-delete read-path containment)")


if __name__ == "__main__":
    main()