Add regression tests for v74 fixes; close soft-delete leak in list-view aggregates

Lock in the three v0.1.0:74 security/privacy fixes with regression tests, and fix a same-class soft-delete leak surfaced while writing them. - backend/test_assets_traversal.py: boots the real server, proves /assets/ path-traversal vectors (incl. a real decoy file and the live crm.db, plain and URL-encoded) 404 and leak nothing, while a legit asset still serves 200. - backend/test_soft_delete_reads.py: get-by-id 404s soft-deleted rows and nested + list-view aggregates exclude soft-deleted children. - backend/mcp/test_outreach_redaction.py: an unknown free-prose name is tokenized away from the Claude payload but re-hydrated locally, and the path fails closed (no Claude call) when the local NER model is down. - backend/run_tests.py: aggregate runner (each backend/**/test_*.py in its own subprocess); replaces the manual for-loop. 16/16 green. A reviewer pass on the tests confirmed the soft-delete filter was missing from list-view aggregate sub-selects: org contact_count/total_funded and contacts comm_count/last_contact_date counted soft-deleted rows. Add `deleted_at IS NULL` to those four (server.py) and regression-cover them. The reports subsystem (dashboard/pipeline/LP-breakdown, ~16 aggregate queries) has the same leak and is logged as P2 for a dedicated pass. Not yet built or deployed — bump the package version before the next s9pk build.
2026-06-13 00:26:22 -05:00
parent a74a540295
commit 7285bb0e52
6 changed files with 488 additions and 11 deletions
@@ -25,8 +25,8 @@
 python3 -m py_compile backend/server.py
 # Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
 python3 backend/redaction/test_scrub_leak.py        # substitute any backend/**/test_*.py
-# Run all tests (no aggregate runner exists)
-for t in $(find backend -name 'test_*.py'); do echo "== $t"; python3 "$t" || break; done
+# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
+python3 backend/run_tests.py                         # add substrings to filter, e.g. `... soft_delete redaction`
 # Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
 cd start9/0.4 && make
 ```
@@ -64,7 +64,7 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 ## Conventions

 - **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — not just list handlers but get-by-id and nested related-data sub-selects too (the 2026-06-12 audit found both leaking soft-deleted rows the list handlers already hid). (Thesis has a subtlety here — see the thesis guide.)
+- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
 - **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`.
 - **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.

@@ -100,10 +100,11 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 _Phase 0 substrate + Phase 1 thesis/outreach are built; current package is **v0.1.0:74**. Longer-term backlog: `ROADMAP.md`._

 - **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
- **Deployed:** v0.1.0:74 is committed, pushed (`main` @ `aec2b77`), built, and **installed to the box** (`$START9_BOX_HOST` / immense-voyage.local now reports v0.1.0:74, up from v72). On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). **Unverified post-deploy:** service health after the v72→v74 migration, and the security fixes behaving live (no box CRM URL/auth on hand).
+- **Deployed & verified live (2026-06-13):** v0.1.0:74 is **installed and healthy on the box** (`$START9_BOX_HOST` / immense-voyage.local). Grant confirms login works; `/assets/` traversal 404s live (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
+- **Repo ahead of the box (committed, NOT yet built/deployed):** since v74, `main` adds the **list-view soft-delete aggregate fix** (`server.py`: org `contact_count`/`total_funded`, contacts `comm_count`/`last_contact_date` now filter `deleted_at`), three **regression tests** (traversal/soft-delete/NER), and an **aggregate test runner**. The deployed box is still pristine v74 — **bump the version before the next s9pk build** to ship these.
 - **Shipped in v0.1.0:74** (security/privacy hardening from the 2026-06-12 full-eval; report in `EVALUATION.md`): closed a pre-auth `/assets/` path traversal (could read crm.db / JWT secret / Gmail key); wired the local-Qwen NER backstop into the outreach redaction boundary (free-prose email bodies were reaching Claude with unknown names in the clear); added `deleted_at IS NULL` to every get-by-id + nested sub-select read path. Verified locally (py_compile, query exec, redaction/outreach tests, containment logic) + two reviewer passes.
- **Local verification (2026-06-12):** all documented commands run clean — `py_compile` OK, **13/13 backend tests green**, `./start.sh`/`./start_beta.sh` boot (health 200, auth 401), `make` builds the x86 s9pk (v0.1.0:74), `/assets/` traversal 404s locally (incl. URL-encoded). The 2 stale thesis tests are fixed (seed structure now documented in `docs/guides/thesis.md`). Box-only checks still open: live service health + security fixes on `$START9_BOX_HOST`.
+- **Tests (2026-06-13):** **16/16 backend tests green** via `python3 backend/run_tests.py` (the new aggregate runner; +3 regression tests this session). `py_compile` clean; `./start.sh`/`./start_beta.sh` boot (health 200, auth 401); `make` builds the x86 s9pk. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
 - **Decided, not yet built:** CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (drafts currently reply to the LP only).
- **Known debt (P2, not deploy-blocking):** no aggregate test runner (the Commands `for` loop is it); `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
+- **Known debt (P2, not deploy-blocking):** the **reports subsystem** (`handle_dashboard_report`/`handle_pipeline_report`/`handle_lp_breakdown_report`, ~16 aggregate queries over contacts/opportunities/communications/lp_profiles) still counts soft-deleted rows — the list/detail aggregates were fixed (v74 + the org/contacts list-view follow-up) but the reports were not; needs its own pass + report-endpoint tests; `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
 - **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
- **Next:** 1) verify v0.1.0:74 live on the box — service health + `curl --path-as-is .../assets/../../data/crm.db` → 404; 2) clear P2 debt (next: aggregate test runner + add traversal/soft-delete/NER regression tests; 2 stale thesis tests already realigned); 3) Grant + Jonathan freeze v2.0 canonical; 4) build reply-all; 5) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
+- **Next:** 1) **reports-subsystem soft-delete sweep** — ~16 dashboard/pipeline/LP aggregate queries still count soft-deleted rows; fix + add report-endpoint tests; 2) **bump version + rebuild/redeploy** to ship the list-view fix + tests now sitting ahead of the box; 3) `?limit=abc` crash (P2); 4) Grant + Jonathan freeze v2.0 canonical; 5) build reply-all; 6) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.