diff --git a/AGENTS.md b/AGENTS.md
index b590668..858488b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -25,8 +25,8 @@ python3 -m py_compile backend/server.py
 
 # Run ONE test (tests are standalone scripts with `if __name__ == "__main__"`; no pytest installed)
 python3 backend/redaction/test_scrub_leak.py # substitute any backend/**/test_*.py
-# Run all tests (no aggregate runner exists)
-for t in $(find backend -name 'test_*.py'); do echo "== $t"; python3 "$t" || break; done
+# Run all tests (aggregate runner — runs each backend/**/test_*.py in its own subprocess)
+python3 backend/run_tests.py # add substrings to filter, e.g. `... soft_delete redaction`
 # Build + install the s9pk — BUMP THE VERSION FIRST. See docs/guides/packaging.md.
 cd start9/0.4 && make
 ```
@@ -64,7 +64,7 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 ## Conventions
 
 - **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
-- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — not just list handlers but get-by-id and nested related-data sub-selects too (the 2026-06-12 audit found both leaking soft-deleted rows the list handlers already hid). (Thesis has a subtlety here — see the thesis guide.)
+- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — list handlers, get-by-id, nested related-data sub-selects, **and aggregate sub-selects (`COUNT`/`SUM`/`MAX`)**. Audits found leaks in all of these (2026-06-12 detail + nested; 2026-06-13 list-view `contact_count`/`total_funded`/`comm_count`); the **reports** subsystem aggregates still leak (see Current state). Regression-guarded by `backend/test_soft_delete_reads.py`. (Thesis has a subtlety here — see the thesis guide.)
 - **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`.
 - **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
 
@@ -100,10 +100,11 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 _Phase 0 substrate + Phase 1 thesis/outreach are built; current package is **v0.1.0:74**. Longer-term backlog: `ROADMAP.md`._
 
 - **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
-- **Deployed:** v0.1.0:74 is committed, pushed (`main` @ `aec2b77`), built, and **installed to the box** (`$START9_BOX_HOST` / immense-voyage.local now reports v0.1.0:74, up from v72). On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible). **Unverified post-deploy:** service health after the v72→v74 migration, and the security fixes behaving live (no box CRM URL/auth on hand).
+- **Deployed & verified live (2026-06-13):** v0.1.0:74 is **installed and healthy on the box** (`$START9_BOX_HOST` / immense-voyage.local). Grant confirms login works; `/assets/` traversal 404s live (plain + URL-encoded), root health 200. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
+- **Repo ahead of the box (committed, NOT yet built/deployed):** since v74, `main` adds the **list-view soft-delete aggregate fix** (`server.py`: org `contact_count`/`total_funded`, contacts `comm_count`/`last_contact_date` now filter `deleted_at`), three **regression tests** (traversal/soft-delete/NER), and an **aggregate test runner**. The deployed box is still pristine v74 — **bump the version before the next s9pk build** to ship these.
 - **Shipped in v0.1.0:74** (security/privacy hardening from the 2026-06-12 full-eval; report in `EVALUATION.md`): closed a pre-auth `/assets/` path traversal (could read crm.db / JWT secret / Gmail key); wired the local-Qwen NER backstop into the outreach redaction boundary (free-prose email bodies were reaching Claude with unknown names in the clear); added `deleted_at IS NULL` to every get-by-id + nested sub-select read path. Verified locally (py_compile, query exec, redaction/outreach tests, containment logic) + two reviewer passes.
-- **Local verification (2026-06-12):** all documented commands run clean — `py_compile` OK, **13/13 backend tests green**, `./start.sh`/`./start_beta.sh` boot (health 200, auth 401), `make` builds the x86 s9pk (v0.1.0:74), `/assets/` traversal 404s locally (incl. URL-encoded). The 2 stale thesis tests are fixed (seed structure now documented in `docs/guides/thesis.md`). Box-only checks still open: live service health + security fixes on `$START9_BOX_HOST`.
+- **Tests (2026-06-13):** **16/16 backend tests green** via `python3 backend/run_tests.py` (the new aggregate runner; +3 regression tests this session). `py_compile` clean; `./start.sh`/`./start_beta.sh` boot (health 200, auth 401); `make` builds the x86 s9pk. The 2 stale thesis tests stay fixed (seed structure in `docs/guides/thesis.md`).
 - **Decided, not yet built:** CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (drafts currently reply to the LP only).
-- **Known debt (P2, not deploy-blocking):** no aggregate test runner (the Commands `for` loop is it); `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
+- **Known debt (P2, not deploy-blocking):** the **reports subsystem** (`handle_dashboard_report`/`handle_pipeline_report`/`handle_lp_breakdown_report`, ~16 aggregate queries over contacts/opportunities/communications/lp_profiles) still counts soft-deleted rows — the list/detail aggregates were fixed (v74 + the org/contacts list-view follow-up) but the reports were not; needs its own pass + report-endpoint tests; `?limit=abc` crashes the request thread (authenticated list path); scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
 - **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
-- **Next:** 1) verify v0.1.0:74 live on the box — service health + `curl --path-as-is .../assets/../../data/crm.db` → 404; 2) clear P2 debt (next: aggregate test runner + add traversal/soft-delete/NER regression tests; 2 stale thesis tests already realigned); 3) Grant + Jonathan freeze v2.0 canonical; 4) build reply-all; 5) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
+- **Next:** 1) **reports-subsystem soft-delete sweep** — ~16 dashboard/pipeline/LP aggregate queries still count soft-deleted rows; fix + add report-endpoint tests; 2) **bump version + rebuild/redeploy** to ship the list-view fix + tests now sitting ahead of the box; 3) `?limit=abc` crash (P2); 4) Grant + Jonathan freeze v2.0 canonical; 5) build reply-all; 6) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
diff --git a/backend/mcp/test_outreach_redaction.py b/backend/mcp/test_outreach_redaction.py
new file mode 100644
index 0000000..cd61de1
--- /dev/null
+++ b/backend/mcp/test_outreach_redaction.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+"""Regression test for the outreach NER-backstop wiring (v0.1.0:74).
+
+The outreach draft path scrubs free-prose LP context (CRM notes + email bodies) before
+it reaches Claude. The dictionary+regex floor only tokenizes KNOWN CRM entities, so an
+UNKNOWN person/firm mentioned in an email body would otherwise reach Claude in the clear.
+The v74 fix wired the local-Qwen NER backstop into draft_outreach (outreach_agent.py:
+`Boundary(..., ner_fn=_ner_local)`) and made it FAIL CLOSED when the local model is down.
+
+This drives the real draft_outreach with Claude and the NER model stubbed (offline,
+synthetic — guardrail #9) and proves:
+  (1) an unknown name in an email body is tokenized AWAY from the Claude payload;
+  (2) it is re-hydrated locally so the human still sees the real name;
+  (3) the interaction_log captures no sensitive value;
+  (4) when the local NER model raises (unreachable), the path returns scrub_unavailable
+      and Claude is never called.
+
+Run: cd backend && python3 mcp/test_outreach_redaction.py
+"""
+import os
+import sqlite3
+import sys
+import tempfile
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+sys.path.insert(0, _HERE)  # backend/mcp
+sys.path.insert(0, os.path.dirname(_HERE))  # backend (for the redaction package)
+
+import outreach_agent as oa  # noqa: E402
+import architect_grounding as G  # noqa: E402
+import architect_agent as aa  # noqa: E402 (imports OK offline; client is lazy)
+
+FAILS = []
+
+UNKNOWN = "Penelope Ashworth-Vane"  # a person in NO CRM table -> only NER can catch her
+INVESTOR = "Harbor & Vine"  # a known org (fundraising_investors) -> dictionary floor
+
+
+def check(cond, msg):
+    print((" PASS " if cond else " FAIL ") + msg)
+    if not cond:
+        FAILS.append(msg)
+
+
+def make_db():
+    path = os.path.join(tempfile.mkdtemp(), "crm.db")
+    c = sqlite3.connect(path)
+    c.row_factory = sqlite3.Row
+    c.executescript("""
+        CREATE TABLE fundraising_investors (id TEXT PRIMARY KEY, investor_name TEXT, notes TEXT);
+        CREATE TABLE emails (id TEXT PRIMARY KEY, subject TEXT, body_text TEXT, snippet TEXT, sent_at TEXT,
+            from_email TEXT, to_emails_json TEXT, thread_id TEXT, is_matched INT);
+        CREATE TABLE email_investor_links (id TEXT, email_id TEXT, fundraising_investor_id TEXT);
+        CREATE TABLE interaction_log (id TEXT PRIMARY KEY, ts TEXT, actor_type TEXT, actor_id TEXT, action TEXT,
+            target_type TEXT, target_id TEXT, payload TEXT, source TEXT, created_at TEXT);
+    """)
+    c.execute("INSERT INTO fundraising_investors VALUES ('inv1',?,?)",
+              (INVESTOR, "Warm on Fund III; weighing lock-up terms."))
+    # The active-thread email body names an UNKNOWN person in free prose.
+    c.execute("INSERT INTO emails (id,subject,body_text,sent_at,thread_id,is_matched) VALUES "
+              "('e1','Re: Fund III',?,?,'t1',1)",
+              (f"Thanks for the call. My partner {UNKNOWN} still has a lock-up objection.", "2026-06-02T10:00:00"))
+    c.execute("INSERT INTO email_investor_links (id,email_id,fundraising_investor_id) VALUES ('l1','e1','inv1')")
+    c.commit()
+    return path, c
+
+
+def main():
+    db_path, conn = make_db()
+
+    # Stub the thesis fetch (avoid the thesis DB dependency) and Claude. The NER stub stands
+    # in for the local-Qwen model; _draft_with_claude echoes the de-identified text back so
+    # re-hydration is exercised and we can inspect exactly what would have reached Claude.
+    aa.at.get_thesis = lambda *a, **k: {}
+    captured = {}
+
+    def fake_claude(aa_mod, thesis, type_desc, deident_target, deident_voice, guidance):
+        captured["target"] = deident_target
+        return deident_target  # passthrough -> rehydrate must restore the real name
+
+    oa._draft_with_claude = fake_claude
+    G._ner_local = lambda text: [(UNKNOWN, "PERSON")]  # local model UP, finds the unknown name
+
+    # ── A) unknown name is tokenized away from Claude, restored locally ──
+    print("\n[A — NER backstop tokenizes an unknown name in outreach]")
+    res = oa.draft_outreach(conn, "inv1", "follow_up", "", db_path, sender_email=None)
+    check(res.get("status") == "ok", f"draft ok (status={res.get('status')})")
+    sent = captured.get("target", "")
+    check(UNKNOWN not in sent, "unknown name absent from the Claude payload (NER tokenized it)")
+    check(INVESTOR not in sent, "known investor org absent from the Claude payload (dictionary floor)")
+    check("lock-up" in sent, "objection substance survives to Claude")
+    check(UNKNOWN in res.get("draft", ""), "unknown name re-hydrated locally for the human")
+
+    blob = " ".join(r[0] for r in conn.execute("SELECT payload FROM interaction_log WHERE payload IS NOT NULL"))
+    check(UNKNOWN not in blob and INVESTOR not in blob, "interaction_log carries NO sensitive value")
+
+    # ── B) FAIL CLOSED: local NER model unreachable -> no Claude call ──
+    print("\n[B — fail closed: local NER model down]")
+    called = {"claude": False}
+
+    def boom(text):
+        raise RuntimeError("Spark Control unreachable")
+
+    G._ner_local = boom
+    oa._draft_with_claude = lambda *a, **k: called.__setitem__("claude", True) or a[3]
+    res2 = oa.draft_outreach(conn, "inv1", "follow_up", "", db_path, sender_email=None)
+    check(res2.get("status") == "scrub_unavailable", f"status scrub_unavailable (got {res2.get('status')})")
+    check(bool(res2.get("reason")), "scrub_unavailable carries the propagated NER failure reason (non-vacuous)")
+    check(called["claude"] is False, "Claude was NOT called when the NER model is down (fail closed)")
+    check("draft" not in res2, "no draft returned when scrub fails closed")
+
+    conn.close()
+    print()
+    if FAILS:
+        print(f"FAILED ({len(FAILS)}):")
+        for f in FAILS:
+            print(f" - {f}")
+        sys.exit(1)
+    print("ALL PASS (outreach NER-backstop wiring + fail-closed)")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/backend/run_tests.py b/backend/run_tests.py
new file mode 100644
index 0000000..d4582be
--- /dev/null
+++ b/backend/run_tests.py
@@ -0,0 +1,67 @@
+#!/usr/bin/env python3
+"""Aggregate test runner for the backend suite.
+
+The backend tests are standalone scripts (each with `if __name__ == "__main__"`, no
+pytest). This discovers every backend/**/test_*.py and runs each in its OWN subprocess
+(tests set os.environ and import `server` with different configs, so isolation matters),
+prints a one-line PASS/FAIL per test, dumps output only for failures, and exits non-zero
+if any test fails.
+
+Run: python3 backend/run_tests.py (from the repo root)
+ or: cd backend && python3 run_tests.py
+Filter: python3 backend/run_tests.py soft_delete redaction # substring match on path
+"""
+import os
+import subprocess
+import sys
+import time
+
+BACKEND = os.path.dirname(os.path.abspath(__file__))
+
+
+def discover(filters):
+    found = []
+    for root, dirs, files in os.walk(BACKEND):
+        dirs[:] = [d for d in dirs if d != "__pycache__"]
+        for f in files:
+            if f.startswith("test_") and f.endswith(".py"):
+                path = os.path.join(root, f)
+                rel = os.path.relpath(path, BACKEND)
+                if not filters or any(flt in rel for flt in filters):
+                    found.append(path)
+    return sorted(found)
+
+
+def main():
+    filters = sys.argv[1:]
+    tests = discover(filters)
+    if not tests:
+        print("No tests matched.")
+        sys.exit(1)
+    print(f"Running {len(tests)} backend test(s)\n")
+
+    passed, failed = [], []
+    t0 = time.time()
+    for path in tests:
+        rel = os.path.relpath(path, BACKEND)
+        proc = subprocess.run([sys.executable, path], cwd=BACKEND,
+                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        if proc.returncode == 0:
+            passed.append(rel)
+            print(f" PASS {rel}")
+        else:
+            failed.append(rel)
+            print(f" FAIL {rel}")
+            sys.stdout.write(proc.stdout.decode("utf-8", "replace").rstrip() + "\n")
+
+    print(f"\n{len(passed)}/{len(tests)} passed in {time.time() - t0:.1f}s")
+    if failed:
+        print("FAILED:")
+        for f in failed:
+            print(f" - {f}")
+        sys.exit(1)
+    print("ALL PASS")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/backend/server.py b/backend/server.py
index 504c454..b0011c9 100644
--- a/backend/server.py
+++ b/backend/server.py
@@ -2136,8 +2136,8 @@ class CRMHandler(BaseHTTPRequestHandler):
         conn = get_db()
         query = """
             SELECT c.*, o.name as organization_name,
-                (SELECT COUNT(*) FROM communications WHERE contact_id = c.id) as comm_count,
-                (SELECT MAX(communication_date) FROM communications WHERE contact_id = c.id) as last_contact_date
+                (SELECT COUNT(*) FROM communications WHERE contact_id = c.id AND deleted_at IS NULL) as comm_count,
+                (SELECT MAX(communication_date) FROM communications WHERE contact_id = c.id AND deleted_at IS NULL) as last_contact_date
             FROM contacts c
             LEFT JOIN organizations o ON c.organization_id = o.id
             WHERE 1=1 AND c.deleted_at IS NULL
@@ -2345,8 +2345,8 @@ class CRMHandler(BaseHTTPRequestHandler):
         conn = get_db()
         query = """
             SELECT o.*,
-                (SELECT COUNT(*) FROM contacts WHERE organization_id = o.id) as contact_count,
-                (SELECT COALESCE(SUM(commitment_amount), 0) FROM opportunities WHERE organization_id = o.id AND stage = 'funded') as total_funded
+                (SELECT COUNT(*) FROM contacts WHERE organization_id = o.id AND deleted_at IS NULL) as contact_count,
+                (SELECT COALESCE(SUM(commitment_amount), 0) FROM opportunities WHERE organization_id = o.id AND stage = 'funded' AND deleted_at IS NULL) as total_funded
             FROM organizations o WHERE 1=1 AND o.deleted_at IS NULL
         """
         args = []
diff --git a/backend/test_assets_traversal.py b/backend/test_assets_traversal.py
new file mode 100644
index 0000000..0f5341f
--- /dev/null
+++ b/backend/test_assets_traversal.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python3
+"""Regression test for the /assets/ path-traversal containment fix (v0.1.0:74).
+
+Before the fix, get_path()/urlparse did NOT normalize '..', so an unauthenticated
+GET /assets/../../data/crm.db (raw client, no client-side normalization) escaped the
+frontend root and read any file the process could — the LP DB, the JWT secret, the
+Gmail key. The fix resolves the target with os.path.realpath and 404s anything that
+does not stay under FRONTEND_ROOT (server.py, the `/assets/` branch of do_GET).
+
+This boots the REAL server in-process against a throwaway frontend root, plants a
+decoy "secret" OUTSIDE that root, and proves: (1) traversal vectors that resolve to a
+real readable file outside the root still 404 and leak no bytes; (2) the live crm.db
+path is 404'd; (3) URL-encoded separators don't help; (4) a legit in-bounds asset
+still serves 200 (the fix isn't over-broad). Synthetic only (guardrail #9).
+
+Run: cd backend && python3 test_assets_traversal.py
+"""
+import http.client
+import os
+import sys
+import tempfile
+import threading
+from http.server import ThreadingHTTPServer
+
+# Lay out a throwaway tree BEFORE importing server (FRONTEND_DIR/ROOT resolve at import):
+# base/frontend/{index.html,assets/app.css} <- the served root
+# base/secret.txt <- a real file a traversal would target
+# base/data/crm.db <- the live DB, created by init_db()
+_BASE = tempfile.mkdtemp()
+_FRONTEND = os.path.join(_BASE, "frontend")
+os.makedirs(os.path.join(_FRONTEND, "assets"))
+_DATA = os.path.join(_BASE, "data")
+os.makedirs(_DATA)
+with open(os.path.join(_FRONTEND, "index.html"), "w") as f:
+    f.write("