Add regression tests for v74 fixes; close soft-delete leak in list-view aggregates

Lock in the three v0.1.0:74 security/privacy fixes with regression tests, and fix a same-class soft-delete leak surfaced while writing them.

- backend/test_assets_traversal.py: boots the real server, proves /assets/ path-traversal vectors (incl. a real decoy file and the live crm.db, plain and URL-encoded) 404 and leak nothing, while a legit asset still serves 200.
- backend/test_soft_delete_reads.py: get-by-id 404s soft-deleted rows and nested + list-view aggregates exclude soft-deleted children.
- backend/mcp/test_outreach_redaction.py: an unknown free-prose name is tokenized away from the Claude payload but re-hydrated locally, and the path fails closed (no Claude call) when the local NER model is down.
- backend/run_tests.py: aggregate runner (each backend/**/test_*.py in its own subprocess); replaces the manual for-loop. 16/16 green.

A reviewer pass on the tests confirmed the soft-delete filter was missing from list-view aggregate sub-selects: org contact_count/total_funded and contacts comm_count/last_contact_date counted soft-deleted rows. Add `deleted_at IS NULL` to those four (server.py) and regression-cover them. The reports subsystem (dashboard/pipeline/LP-breakdown, ~16 aggregate queries) has the same leak and is logged as P2 for a dedicated pass.

Not yet built or deployed: bump the package version before the next s9pk build.
2026-06-13 00:26:22 -05:00
parent a74a540295
commit 7285bb0e52
6 changed files with 488 additions and 11 deletions
@@ -25,8 +25,8 @@
 python3 -m py_compile backend/server.py
 # Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
 python3 backend/redaction/test_scrub_leak.py        # substitute any backend/**/test_*.py
-# Run all tests (no aggregate runner exists)
-for t in $(find backend -name 'test_*.py'); do echo "== $t"; python3 "$t" || break; done
+# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
+python3 backend/run_tests.py                         # add substrings to filter, e.g. `... soft_delete redaction`
 # Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
 cd start9/0.4 && make
 ```
@@ -64,7 +64,7 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 ## Conventions

 - **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — not just list handlers but get-by-id and nested related-data sub-selects too (the 2026-06-12 audit found both leaking soft-deleted rows the list handlers already hid). (Thesis has a subtlety here — see the thesis guide.)
+- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
 - **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`.
 - **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.

@@ -100,10 +100,11 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 _Phase 0 substrate + Phase 1 thesis/outreach are built; current package is **v0.1.0:74**. Longer-term backlog: `ROADMAP.md`._

 - **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Deployed:** v0.1.0:74 is committed, pushed (`main` @ `aec2b77`), built, and **installed to the box** (`$START9_BOX_HOST` / immense-voyage.local now reports v0.1.0:74, up from v72). On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). **Unverified post-deploy:** service health after the v72→v74 migration, and the security fixes behaving live (no box CRM URL/auth on hand).
+- **Deployed & verified live (2026-06-13):** v0.1.0:74 is **installed and healthy on the box** (`$START9_BOX_HOST` / immense-voyage.local). Grant confirms login works; `/assets/` traversal 404s live (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
+- **Repo ahead of the box (committed, NOT yet built/deployed):** since v74, `main` adds the **list-view soft-delete aggregate fix** (`server.py`: org `contact_count`/`total_funded`, contacts `comm_count`/`last_contact_date` now filter `deleted_at`), three **regression tests** (traversal/soft-delete/NER), and an **aggregate test runner**. The deployed box is still pristine v74 — **bump the version before the next s9pk build** to ship these.
 - **Shipped in v0.1.0:74** (security/privacy hardening from the 2026-06-12 full-eval; report in `EVALUATION.md`): closed a pre-auth `/assets/` path traversal (could read crm.db / JWT secret / Gmail key); wired the local-Qwen NER backstop into the outreach redaction boundary (free-prose email bodies were reaching Claude with unknown names in the clear); added `deleted_at IS NULL` to every get-by-id + nested sub-select read path. Verified locally (py_compile, query exec, redaction/outreach tests, containment logic) + two reviewer passes.
- **Local verification (2026-06-12):** all documented commands run clean — `py_compile` OK, **13/13 backend tests green**, `./start.sh`/`./start_beta.sh` boot (health 200, auth 401), `make` builds the x86 s9pk (v0.1.0:74), `/assets/` traversal 404s locally (incl. URL-encoded). The 2 stale thesis tests are fixed (seed structure now documented in `docs/guides/thesis.md`). Box-only checks still open: live service health + security fixes on `$START9_BOX_HOST`.
+- **Tests (2026-06-13):** **16/16 backend tests green** via `python3 backend/run_tests.py` (the new aggregate runner; +3 regression tests this session). `py_compile` clean; `./start.sh`/`./start_beta.sh` boot (health 200, auth 401); `make` builds the x86 s9pk. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
 - **Decided, not yet built:** CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (drafts currently reply to the LP only).
- **Known debt (P2, not deploy-blocking):** no aggregate test runner (the Commands `for` loop is it); `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
+- **Known debt (P2, not deploy-blocking):** the **reports subsystem** (`handle_dashboard_report`/`handle_pipeline_report`/`handle_lp_breakdown_report`, ~16 aggregate queries over contacts/opportunities/communications/lp_profiles) still counts soft-deleted rows — the list/detail aggregates were fixed (v74 + the org/contacts list-view follow-up) but the reports were not; needs its own pass + report-endpoint tests; `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
 - **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
- **Next:** 1) verify v0.1.0:74 live on the box — service health + `curl --path-as-is .../assets/../../data/crm.db` → 404; 2) clear P2 debt (next: aggregate test runner + add traversal/soft-delete/NER regression tests; 2 stale thesis tests already realigned); 3) Grant + Jonathan freeze v2.0 canonical; 4) build reply-all; 5) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
+- **Next:** 1) **reports-subsystem soft-delete sweep** — ~16 dashboard/pipeline/LP aggregate queries still count soft-deleted rows; fix + add report-endpoint tests; 2) **bump version + rebuild/redeploy** to ship the list-view fix + tests now sitting ahead of the box; 3) `?limit=abc` crash (P2); 4) Grant + Jonathan freeze v2.0 canonical; 5) build reply-all; 6) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+"""Regression test for the outreach NER-backstop wiring (v0.1.0:74).
+
+The outreach draft path scrubs free-prose LP context (CRM notes + email bodies) before
+it reaches Claude. The dictionary+regex floor only tokenizes KNOWN CRM entities, so an
+UNKNOWN person/firm mentioned in an email body would otherwise reach Claude in the clear.
+The v74 fix wired the local-Qwen NER backstop into draft_outreach (outreach_agent.py:
+`Boundary(..., ner_fn=_ner_local)`) and made it FAIL CLOSED when the local model is down.
+
+This drives the real draft_outreach with Claude and the NER model stubbed (offline,
+synthetic — guardrail #9) and proves:
+  (1) an unknown name in an email body is tokenized AWAY from the Claude payload;
+  (2) it is re-hydrated locally so the human still sees the real name;
+  (3) the interaction_log captures no sensitive value;
+  (4) when the local NER model raises (unreachable), the path returns scrub_unavailable
+      and Claude is never called.
+
+Run: cd backend && python3 mcp/test_outreach_redaction.py
+"""
+import os
+import sqlite3
+import sys
+import tempfile
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+sys.path.insert(0, _HERE)                       # backend/mcp
+sys.path.insert(0, os.path.dirname(_HERE))      # backend (for the redaction package)
+
+import outreach_agent as oa            # noqa: E402
+import architect_grounding as G        # noqa: E402
+import architect_agent as aa           # noqa: E402  (imports OK offline; client is lazy)
+
+FAILS = []
+
+UNKNOWN = "Penelope Ashworth-Vane"      # a person in NO CRM table -> only NER can catch her
+INVESTOR = "Harbor & Vine"              # a known org (fundraising_investors) -> dictionary floor
+
+
+def check(cond, msg):
+    print(("  PASS " if cond else "  FAIL ") + msg)
+    if not cond:
+        FAILS.append(msg)
+
+
+def make_db():
+    path = os.path.join(tempfile.mkdtemp(), "crm.db")
+    c = sqlite3.connect(path)
+    c.row_factory = sqlite3.Row
+    c.executescript("""
+        CREATE TABLE fundraising_investors (id TEXT PRIMARY KEY, investor_name TEXT, notes TEXT);
+        CREATE TABLE emails (id TEXT PRIMARY KEY, subject TEXT, body_text TEXT, snippet TEXT, sent_at TEXT,
+            from_email TEXT, to_emails_json TEXT, thread_id TEXT, is_matched INT);
+        CREATE TABLE email_investor_links (id TEXT, email_id TEXT, fundraising_investor_id TEXT);
+        CREATE TABLE interaction_log (id TEXT PRIMARY KEY, ts TEXT, actor_type TEXT, actor_id TEXT, action TEXT,
+            target_type TEXT, target_id TEXT, payload TEXT, source TEXT, created_at TEXT);
+    """)
+    c.execute("INSERT INTO fundraising_investors VALUES ('inv1',?,?)",
+              (INVESTOR, "Warm on Fund III; weighing lock-up terms."))
+    # The active-thread email body names an UNKNOWN person in free prose.
+    c.execute("INSERT INTO emails (id,subject,body_text,sent_at,thread_id,is_matched) VALUES "
+              "('e1','Re: Fund III',?,?,'t1',1)",
+              (f"Thanks for the call. My partner {UNKNOWN} still has a lock-up objection.", "2026-06-02T10:00:00"))
+    c.execute("INSERT INTO email_investor_links (id,email_id,fundraising_investor_id) VALUES ('l1','e1','inv1')")
+    c.commit()
+    return path, c
+
+
+def main():
+    db_path, conn = make_db()
+
+    # Stub the thesis fetch (avoid the thesis DB dependency) and Claude. The NER stub stands
+    # in for the local-Qwen model; _draft_with_claude echoes the de-identified text back so
+    # re-hydration is exercised and we can inspect exactly what would have reached Claude.
+    aa.at.get_thesis = lambda *a, **k: {}
+    captured = {}
+
+    def fake_claude(aa_mod, thesis, type_desc, deident_target, deident_voice, guidance):
+        captured["target"] = deident_target
+        return deident_target   # passthrough -> rehydrate must restore the real name
+
+    oa._draft_with_claude = fake_claude
+    G._ner_local = lambda text: [(UNKNOWN, "PERSON")]   # local model UP, finds the unknown name
+
+    # ── A) unknown name is tokenized away from Claude, restored locally ──
+    print("\n[A — NER backstop tokenizes an unknown name in outreach]")
+    res = oa.draft_outreach(conn, "inv1", "follow_up", "", db_path, sender_email=None)
+    check(res.get("status") == "ok", f"draft ok (status={res.get('status')})")
+    sent = captured.get("target", "")
+    check(UNKNOWN not in sent, "unknown name absent from the Claude payload (NER tokenized it)")
+    check(INVESTOR not in sent, "known investor org absent from the Claude payload (dictionary floor)")
+    check("lock-up" in sent, "objection substance survives to Claude")
+    check(UNKNOWN in res.get("draft", ""), "unknown name re-hydrated locally for the human")
+
+    blob = " ".join(r[0] for r in conn.execute("SELECT payload FROM interaction_log WHERE payload IS NOT NULL"))
+    check(UNKNOWN not in blob and INVESTOR not in blob, "interaction_log carries NO sensitive value")
+
+    # ── B) FAIL CLOSED: local NER model unreachable -> no Claude call ──
+    print("\n[B — fail closed: local NER model down]")
+    called = {"claude": False}
+
+    def boom(text):
+        raise RuntimeError("Spark Control unreachable")
+
+    G._ner_local = boom
+    oa._draft_with_claude = lambda *a, **k: called.__setitem__("claude", True) or a[3]
+    res2 = oa.draft_outreach(conn, "inv1", "follow_up", "", db_path, sender_email=None)
+    check(res2.get("status") == "scrub_unavailable", f"status scrub_unavailable (got {res2.get('status')})")
+    check(bool(res2.get("reason")), "scrub_unavailable carries the propagated NER failure reason (non-vacuous)")
+    check(called["claude"] is False, "Claude was NOT called when the NER model is down (fail closed)")
+    check("draft" not in res2, "no draft returned when scrub fails closed")
+
+    conn.close()
+    print()
+    if FAILS:
+        print(f"FAILED ({len(FAILS)}):")
+        for f in FAILS:
+            print(f"  - {f}")
+        sys.exit(1)
+    print("ALL PASS (outreach NER-backstop wiring + fail-closed)")
+
+
+if __name__ == "__main__":
+    main()
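
The fail-closed behaviour this test pins down can be sketched in isolation. The following is a minimal illustration only, not the project's `Boundary` or `draft_outreach`: `scrub_then_draft`, its parameters, and the `[ENTITY_nnn]` token format are hypothetical names chosen for this example. It assumes the NER callable returns `(name, label)` pairs, as the stub in the test does.

```python
def scrub_then_draft(text, known_entities, ner_fn, draft_fn):
    """Hypothetical sketch: tokenize names, call the remote drafter, re-hydrate locally."""
    try:
        found = ner_fn(text)  # local NER backstop; may raise if the model is down
    except Exception as exc:
        # Fail closed: without the backstop nothing is sent to the remote model.
        return {"status": "scrub_unavailable", "reason": str(exc)}

    names = set(known_entities) | {name for name, _label in found}
    mapping = {}
    deident = text
    # Replace longer names first so a short name never splits a longer one.
    for i, name in enumerate(sorted(names, key=len, reverse=True)):
        token = f"[ENTITY_{i:03d}]"
        if name in deident:
            mapping[token] = name
            deident = deident.replace(name, token)

    reply = draft_fn(deident)  # only de-identified text leaves the machine
    for token, name in mapping.items():
        reply = reply.replace(token, name)  # restore real names for the human
    return {"status": "ok", "draft": reply}


text = "My partner Penelope Ashworth-Vane at Harbor & Vine has a lock-up objection."
sent = {}

def echo(deident):
    sent["payload"] = deident
    return deident

ok = scrub_then_draft(text, ["Harbor & Vine"],
                      lambda t: [("Penelope Ashworth-Vane", "PERSON")], echo)

def ner_down(t):
    raise RuntimeError("NER model unreachable")

calls = []
closed = scrub_then_draft(text, ["Harbor & Vine"], ner_down,
                          lambda d: calls.append(d) or d)
```

The key design choice is that the NER call happens before any remote call and its failure short-circuits the function, so an outage degrades to "no draft" rather than "unredacted draft".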
@@ -0,0 +1,67 @@
+#!/usr/bin/env python3
+"""Aggregate test runner for the backend suite.
+
+The backend tests are standalone scripts (each with `if __name__ == "__main__"`, no
+pytest). This discovers every backend/**/test_*.py and runs each in its OWN subprocess
+(tests set os.environ and import `server` with different configs, so isolation matters),
+prints a one-line PASS/FAIL per test, dumps output only for failures, and exits non-zero
+if any test fails.
+
+Run:  python3 backend/run_tests.py        (from the repo root)
+  or: cd backend && python3 run_tests.py
+Filter: python3 backend/run_tests.py soft_delete redaction   # substring match on path
+"""
+import os
+import subprocess
+import sys
+import time
+
+BACKEND = os.path.dirname(os.path.abspath(__file__))
+
+
+def discover(filters):
+    found = []
+    for root, dirs, files in os.walk(BACKEND):
+        dirs[:] = [d for d in dirs if d != "__pycache__"]
+        for f in files:
+            if f.startswith("test_") and f.endswith(".py"):
+                path = os.path.join(root, f)
+                rel = os.path.relpath(path, BACKEND)
+                if not filters or any(flt in rel for flt in filters):
+                    found.append(path)
+    return sorted(found)
+
+
+def main():
+    filters = sys.argv[1:]
+    tests = discover(filters)
+    if not tests:
+        print("No tests matched.")
+        sys.exit(1)
+    print(f"Running {len(tests)} backend test(s)\n")
+
+    passed, failed = [], []
+    t0 = time.time()
+    for path in tests:
+        rel = os.path.relpath(path, BACKEND)
+        proc = subprocess.run([sys.executable, path], cwd=BACKEND,
+                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        if proc.returncode == 0:
+            passed.append(rel)
+            print(f"  PASS  {rel}")
+        else:
+            failed.append(rel)
+            print(f"  FAIL  {rel}")
+            sys.stdout.write(proc.stdout.decode("utf-8", "replace").rstrip() + "\n")
+
+    print(f"\n{len(passed)}/{len(tests)} passed in {time.time() - t0:.1f}s")
+    if failed:
+        print("FAILED:")
+        for f in failed:
+            print(f"  - {f}")
+        sys.exit(1)
+    print("ALL PASS")
+
+
+if __name__ == "__main__":
+    main()
@@ -2136,8 +2136,8 @@ class CRMHandler(BaseHTTPRequestHandler):
        conn = get_db()
        query = """
            SELECT c.*, o.name as organization_name,
-                   (SELECT COUNT(*) FROM communications WHERE contact_id = c.id) as comm_count,
-                   (SELECT MAX(communication_date) FROM communications WHERE contact_id = c.id) as last_contact_date
+                   (SELECT COUNT(*) FROM communications WHERE contact_id = c.id AND deleted_at IS NULL) as comm_count,
+                   (SELECT MAX(communication_date) FROM communications WHERE contact_id = c.id AND deleted_at IS NULL) as last_contact_date
            FROM contacts c
            LEFT JOIN organizations o ON c.organization_id = o.id
            WHERE 1=1 AND c.deleted_at IS NULL
@@ -2345,8 +2345,8 @@ class CRMHandler(BaseHTTPRequestHandler):
        conn = get_db()
        query = """
            SELECT o.*,
-                   (SELECT COUNT(*) FROM contacts WHERE organization_id = o.id) as contact_count,
-                   (SELECT COALESCE(SUM(commitment_amount), 0) FROM opportunities WHERE organization_id = o.id AND stage = 'funded') as total_funded
+                   (SELECT COUNT(*) FROM contacts WHERE organization_id = o.id AND deleted_at IS NULL) as contact_count,
+                   (SELECT COALESCE(SUM(commitment_amount), 0) FROM opportunities WHERE organization_id = o.id AND stage = 'funded' AND deleted_at IS NULL) as total_funded
            FROM organizations o WHERE 1=1 AND o.deleted_at IS NULL
        """
        args = []
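
The effect of the added `AND deleted_at IS NULL` guard on a correlated aggregate sub-select can be reproduced with an in-memory SQLite database. This is a sketch under stated assumptions: the two-table schema is cut down to the columns the aggregates touch, the rows mirror the `cmLive`/`cmDead` seed data from the soft-delete test, and `list_contacts` is a hypothetical helper rather than the handler in server.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id TEXT PRIMARY KEY, deleted_at TEXT);
    CREATE TABLE communications (
        id TEXT PRIMARY KEY, contact_id TEXT,
        communication_date TEXT, deleted_at TEXT);
    INSERT INTO contacts VALUES ('cLive', NULL);
    INSERT INTO communications VALUES ('cmLive', 'cLive', '2026-05-01', NULL);
    INSERT INTO communications VALUES ('cmDead', 'cLive', '2026-05-02', '2026-06-01T00:00:00');
""")


def list_contacts(filter_children):
    # The outer query already hides soft-deleted contacts; the guard decides
    # whether the aggregate sub-selects also hide soft-deleted children.
    guard = "AND deleted_at IS NULL" if filter_children else ""
    sql = f"""
        SELECT c.id,
               (SELECT COUNT(*) FROM communications
                 WHERE contact_id = c.id {guard}) AS comm_count,
               (SELECT MAX(communication_date) FROM communications
                 WHERE contact_id = c.id {guard}) AS last_contact_date
        FROM contacts c
        WHERE c.deleted_at IS NULL
    """
    return conn.execute(sql).fetchall()


leaky = list_contacts(filter_children=False)
fixed = list_contacts(filter_children=True)
print(leaky)  # [('cLive', 2, '2026-05-02')]  soft-deleted row counted and wins MAX
print(fixed)  # [('cLive', 1, '2026-05-01')]  live rows only
```

Note that the leak affects `MAX` as well as `COUNT`: without the guard, the soft-deleted communication supplies the reported last-contact date.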
@@ -0,0 +1,126 @@
+#!/usr/bin/env python3
+"""Regression test for the /assets/ path-traversal containment fix (v0.1.0:74).
+
+Before the fix, get_path()/urlparse did NOT normalize '..', so an unauthenticated
+GET /assets/../../data/crm.db (raw client, no client-side normalization) escaped the
+frontend root and read any file the process could — the LP DB, the JWT secret, the
+Gmail key. The fix resolves the target with os.path.realpath and 404s anything that
+does not stay under FRONTEND_ROOT (server.py, the `/assets/` branch of do_GET).
+
+This boots the REAL server in-process against a throwaway frontend root, plants a
+decoy "secret" OUTSIDE that root, and proves: (1) traversal vectors that resolve to a
+real readable file outside the root still 404 and leak no bytes; (2) the live crm.db
+path is 404'd; (3) URL-encoded separators don't help; (4) a legit in-bounds asset
+still serves 200 (the fix isn't over-broad). Synthetic only (guardrail #9).
+
+Run: cd backend && python3 test_assets_traversal.py
+"""
+import http.client
+import os
+import sys
+import tempfile
+import threading
+from http.server import ThreadingHTTPServer
+
+# Lay out a throwaway tree BEFORE importing server (FRONTEND_DIR/ROOT resolve at import):
+#   base/frontend/{index.html,assets/app.css}   <- the served root
+#   base/secret.txt                             <- a real file a traversal would target
+#   base/data/crm.db                            <- the live DB, created by init_db()
+_BASE = tempfile.mkdtemp()
+_FRONTEND = os.path.join(_BASE, "frontend")
+os.makedirs(os.path.join(_FRONTEND, "assets"))
+_DATA = os.path.join(_BASE, "data")
+os.makedirs(_DATA)
+with open(os.path.join(_FRONTEND, "index.html"), "w") as f:
+    f.write("<!doctype html><title>crm</title>")
+_CSS_MARKER = "/* legit-asset-marker-7f3a */"
+with open(os.path.join(_FRONTEND, "assets", "app.css"), "w") as f:
+    f.write(_CSS_MARKER)
+_SECRET_MARKER = "TOPSECRET-JWT-zq19"
+with open(os.path.join(_BASE, "secret.txt"), "w") as f:
+    f.write(_SECRET_MARKER)
+
+os.environ["CRM_FRONTEND_DIR"] = _FRONTEND
+os.environ["CRM_DATA_DIR"] = _DATA
+os.environ["CRM_DB_PATH"] = os.path.join(_DATA, "crm.db")
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+import server  # noqa: E402
+
+FAILS = []
+
+
+def check(cond, msg):
+    print(("  PASS " if cond else "  FAIL ") + msg)
+    if not cond:
+        FAILS.append(msg)
+
+
+class _Quiet(server.CRMHandler):
+    def log_message(self, *a):  # keep the test output clean
+        pass
+
+
+def _get(port, path):
+    """Raw GET with the path sent verbatim — http.client does NOT normalize '..',
+    which is exactly the unauthenticated raw-client threat the fix defends against."""
+    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
+    conn.request("GET", path)
+    resp = conn.getresponse()
+    body = resp.read().decode("utf-8", "replace")
+    conn.close()
+    return resp.status, body
+
+
+def main():
+    server.init_db()  # creates base/data/crm.db and the full schema
+    check(os.path.exists(os.environ["CRM_DB_PATH"]), "init_db created the live crm.db (a real traversal target)")
+
+    httpd = ThreadingHTTPServer(("127.0.0.1", 0), _Quiet)
+    port = httpd.server_address[1]
+    threading.Thread(target=httpd.serve_forever, daemon=True).start()
+    try:
+        # ── legit in-bounds asset still serves (containment is not over-broad) ──
+        print("\n[legit asset]")
+        st, body = _get(port, "/assets/app.css")
+        check(st == 200, f"in-bounds /assets/app.css serves 200 (got {st})")
+        check(_CSS_MARKER in body, "in-bounds asset body is served intact")
+
+        # ── traversal to a REAL file outside the root: 404, zero bytes leaked ──
+        print("\n[traversal -> decoy secret outside the root]")
+        for vec in ["/assets/../../secret.txt",
+                    "/assets/../../../secret.txt",
+                    "/assets/..%2f..%2fsecret.txt",         # urlparse won't decode %2f
+                    "/assets/..%2F..%2Fsecret.txt"]:        # …nor uppercase %2F (some clients send it)
+            st, body = _get(port, vec)
+            check(st == 404, f"{vec} -> 404 (got {st})")
+            check(_SECRET_MARKER not in body, f"{vec} leaks no secret bytes")
+
+        # ── traversal to the live crm.db (the headline vector from the eval) ──
+        print("\n[traversal -> live crm.db]")
+        for vec in ["/assets/../../data/crm.db",
+                    "/assets/../data/crm.db",
+                    "/assets/..%2f..%2fdata%2fcrm.db"]:
+            st, body = _get(port, vec)
+            check(st == 404, f"{vec} -> 404 (got {st})")
+            check("SQLite format 3" not in body, f"{vec} leaks no DB header")
+
+        # ── deep absolute-style escape ──
+        print("\n[deep escape]")
+        st, body = _get(port, "/assets/../../../../../../../../etc/passwd")
+        check(st == 404, f"deep /assets/ escape to /etc/passwd -> 404 (got {st})")
+        check("root:" not in body, "/etc/passwd not leaked")
+    finally:
+        httpd.shutdown()
+
+    print()
+    if FAILS:
+        print(f"FAILED ({len(FAILS)}):")
+        for f in FAILS:
+            print(f"  - {f}")
+        sys.exit(1)
+    print("ALL PASS (assets path-traversal containment)")
+
+
+if __name__ == "__main__":
+    main()
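
The containment check the docstring describes (resolve with `os.path.realpath`, reject anything that leaves the frontend root) can be sketched as a standalone function. This is an illustration under assumptions, not the code in server.py: `resolve_asset` is a hypothetical name, and the use of `os.path.commonpath` for the "stays under the root" comparison is one possible way to express it.

```python
import os
import tempfile


def resolve_asset(frontend_root, url_path):
    """Return the real file path for url_path if it stays under frontend_root, else None."""
    root = os.path.realpath(frontend_root)
    # realpath collapses '..' segments and follows symlinks before the comparison.
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None  # escaped the root: the caller should answer 404
    return candidate if os.path.isfile(candidate) else None


# Throwaway tree shaped like the one the test builds.
base = tempfile.mkdtemp()
root = os.path.join(base, "frontend")
os.makedirs(os.path.join(root, "assets"))
with open(os.path.join(root, "assets", "app.css"), "w") as f:
    f.write("/* ok */")
with open(os.path.join(base, "secret.txt"), "w") as f:
    f.write("secret")

inside = resolve_asset(root, "/assets/app.css")
escaped = resolve_asset(root, "/assets/../../secret.txt")
encoded = resolve_asset(root, "/assets/..%2f..%2fsecret.txt")
```

In this sketch the URL-encoded vector is rejected for a different reason than the plain one: the undecoded `%2f` stays a literal part of a filename inside the root, and no such file exists.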
@@ -0,0 +1,160 @@
+#!/usr/bin/env python3
+"""Regression test for the soft-delete READ-path fix (v0.1.0:74).
+
+Guardrail #3 is soft-delete only (deleted_at), and the 2026-06-12 audit found that
+while LIST handlers filtered `deleted_at IS NULL`, the get-by-id handlers and their
+nested related-data sub-selects did not — so a soft-deleted contact/org was still
+readable by id, and soft-deleted children still surfaced inside a parent's detail
+payload. The fix added `deleted_at IS NULL` to every get-by-id + nested sub-select
+(server.py handle_get_contact / handle_get_organization).
+
+This boots the REAL server, hand-builds active + soft-deleted rows across the five
+soft-deletable tables, and drives the live HTTP read paths with a real token. It asserts:
+get-by-id 404s a soft-deleted contact/org; nested sub-selects (org->contacts/opportunities,
+contact->communications/opportunities/lp_profile) omit soft-deleted children; and list-view
+aggregates (contact_count, total_funded, comm_count) count live rows only. Synthetic only (guardrail #9).
+
+Run: cd backend && python3 test_soft_delete_reads.py
+"""
+import http.client
+import json
+import os
+import sqlite3
+import sys
+import tempfile
+import threading
+from http.server import ThreadingHTTPServer
+
+_DATA = tempfile.mkdtemp()
+os.environ["CRM_DATA_DIR"] = _DATA
+os.environ["CRM_DB_PATH"] = os.path.join(_DATA, "crm.db")
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+import server  # noqa: E402
+
+FAILS = []
+DEL = "2026-06-01T00:00:00"   # any non-NULL deleted_at marks a row soft-deleted
+
+
+def check(cond, msg):
+    print(("  PASS " if cond else "  FAIL ") + msg)
+    if not cond:
+        FAILS.append(msg)
+
+
+class _Quiet(server.CRMHandler):
+    def log_message(self, *a):
+        pass
+
+
+def _get(port, path, token):
+    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
+    conn.request("GET", path, headers={"Authorization": "Bearer " + token})
+    resp = conn.getresponse()
+    body = resp.read().decode("utf-8", "replace")
+    conn.close()
+    data = None
+    if body:
+        try:
+            data = json.loads(body)
+        except ValueError:
+            pass
+    return resp.status, data
+
+
+def seed():
+    """Build a fixed graph of live + soft-deleted rows directly in the migrated DB."""
+    c = sqlite3.connect(os.environ["CRM_DB_PATH"])
+    c.execute("INSERT INTO users (id,username,email,password_hash,full_name,role,is_active) "
+              "VALUES ('u1','grant','grant@ten31.example','x','Grant','admin',1)")
+    # organizations: one live, one soft-deleted
+    c.execute("INSERT INTO organizations (id,name) VALUES ('orgA','Harbor & Vine')")
+    c.execute("INSERT INTO organizations (id,name,deleted_at) VALUES ('orgX','Deleted Org',?)", (DEL,))
+    # contacts under orgA: one live (with children), one soft-deleted, one live w/ deleted lp
+    c.execute("INSERT INTO contacts (id,first_name,last_name,organization_id) VALUES ('cLive','Ada','Live','orgA')")
+    c.execute("INSERT INTO contacts (id,first_name,last_name,organization_id,deleted_at) VALUES ('cDead','Boris','Gone','orgA',?)", (DEL,))
+    c.execute("INSERT INTO contacts (id,first_name,last_name,organization_id) VALUES ('cLp','Cora','Lp','orgA')")
+    # opportunities on cLive (also tied to orgA so they appear in the org detail too)
+    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id) VALUES ('opLive','Live Opp','cLive','orgA','u1')")
+    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id,deleted_at) VALUES ('opDead','Dead Opp','cLive','orgA','u1',?)", (DEL,))
+    # funded opportunities on orgA — one live, one soft-deleted (for the org-list total_funded aggregate)
+    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id,stage,commitment_amount) VALUES ('opFundLive','Funded Live','cLive','orgA','u1','funded',1000000)")
+    c.execute("INSERT INTO opportunities (id,name,contact_id,organization_id,owner_id,stage,commitment_amount,deleted_at) VALUES ('opFundDead','Funded Dead','cLive','orgA','u1','funded',500000,?)", (DEL,))
+    # communications on cLive
+    c.execute("INSERT INTO communications (id,contact_id,communication_date,created_by,subject) VALUES ('cmLive','cLive','2026-05-01','u1','Live note')")
+    c.execute("INSERT INTO communications (id,contact_id,communication_date,created_by,subject,deleted_at) VALUES ('cmDead','cLive','2026-05-02','u1','Dead note',?)", (DEL,))
+    # lp_profiles: live one on cLive, soft-deleted one on cLp
+    c.execute("INSERT INTO lp_profiles (id,contact_id,fund_name) VALUES ('lpLive','cLive','Fund III')")
+    c.execute("INSERT INTO lp_profiles (id,contact_id,fund_name,deleted_at) VALUES ('lpDead','cLp','Fund III',?)", (DEL,))
+    c.commit()
+    c.close()
+
+
+def main():
+    server.init_db()
+    seed()
+    token = server.create_token("u1", "grant", "admin")
+
+    httpd = ThreadingHTTPServer(("127.0.0.1", 0), _Quiet)
+    port = httpd.server_address[1]
+    threading.Thread(target=httpd.serve_forever, daemon=True).start()
+    try:
+        # ── get-by-id: soft-deleted rows are not found ──
+        print("\n[get-by-id excludes soft-deleted]")
+        st, _ = _get(port, "/api/contacts/cDead", token)
+        check(st == 404, f"GET soft-deleted contact -> 404 (got {st})")
+        st, _ = _get(port, "/api/organizations/orgX", token)
+        check(st == 404, f"GET soft-deleted organization -> 404 (got {st})")
+        st, live = _get(port, "/api/contacts/cLive", token)
+        check(st == 200, f"GET live contact -> 200 (got {st})")
+
+        # ── contact detail nested sub-selects exclude soft-deleted children ──
+        print("\n[contact detail nested sub-selects]")
+        d = (live or {}).get("data", {})
+        comm_ids = {x["id"] for x in d.get("communications", [])}
+        opp_ids = {x["id"] for x in d.get("opportunities", [])}
+        check("cmLive" in comm_ids and "cmDead" not in comm_ids, f"communications: live only (got {comm_ids})")
+        check("opLive" in opp_ids and "opDead" not in opp_ids, f"opportunities: live only (got {opp_ids})")
+        check(bool(d.get("lp_profile")) and d["lp_profile"].get("id") == "lpLive", "live lp_profile present on contact")
+
+        # soft-deleted lp_profile must read back as None (nested single-row sub-select)
+        _, lpc = _get(port, "/api/contacts/cLp", token)
+        check((lpc or {}).get("data", {}).get("lp_profile") is None, "soft-deleted lp_profile reads back as None")
+
+        # ── organization detail nested sub-selects exclude soft-deleted children ──
+        print("\n[organization detail nested sub-selects]")
+        _, org = _get(port, "/api/organizations/orgA", token)
+        od = (org or {}).get("data", {})
+        org_contacts = {x["id"] for x in od.get("contacts", [])}
+        org_opps = {x["id"] for x in od.get("opportunities", [])}
+        check("cLive" in org_contacts and "cLp" in org_contacts and "cDead" not in org_contacts,
+              f"org.contacts: both live contacts present, soft-deleted absent (got {org_contacts})")
+        check("opLive" in org_opps and "opDead" not in org_opps, f"org.opportunities: live only (got {org_opps})")
+
+        # ── list-view aggregates exclude soft-deleted rows (org contact_count/total_funded, contact comm_count) ──
+        print("\n[list-view aggregates]")
+        _, orglist = _get(port, "/api/organizations", token)
+        rowA = next((x for x in (orglist or {}).get("data", []) if x.get("id") == "orgA"), None)
+        check(rowA is not None, "orgA present in org list")
+        if rowA:
+            check(rowA.get("contact_count") == 2, f"org contact_count: live only (cLive,cLp -> 2; got {rowA.get('contact_count')})")
+            check(rowA.get("total_funded") == 1000000, f"org total_funded: live funded only (1,000,000; got {rowA.get('total_funded')})")
+        _, ctlist = _get(port, "/api/contacts", token)
+        rowC = next((x for x in (ctlist or {}).get("data", []) if x.get("id") == "cLive"), None)
+        check(rowC is not None, "cLive present in contact list")
+        if rowC:
+            check(rowC.get("comm_count") == 1, f"contact comm_count: live only (cmLive -> 1; got {rowC.get('comm_count')})")
+    finally:
+        httpd.shutdown()
+
+    print()
+    if FAILS:
+        print(f"FAILED ({len(FAILS)}):")
+        for f in FAILS:
+            print(f"  - {f}")
+        sys.exit(1)
+    print("ALL PASS (soft-delete read-path containment)")
+
+
+if __name__ == "__main__":
+    main()