ten31-database/EVALUATION.md

# Evaluation — CRM (Ten31 Venture CRM + Agentic System) — 2026-06-12

Intent: A self-hosted venture-fund CRM for Ten31 (replacing Airtable to keep sensitive LP/prospect data off third-party servers) managing contacts/organizations/opportunities/communications/LP profiles, with a new in-house AI/agentic layer that drafts outreach and sharpens the investment thesis — packaged as a StartOS s9pk, currently Phase 0/1 (agents draft, humans send).

Agents run: evaluator, security-auditor, exerciser, start9-spec-checker. Skipped: reviewer (working tree is clean — no uncommitted diff to review).

## Verdict

This is a functional, deliberately stdlib-only CRM (one ~5,400-line `backend/server.py` monolith over SQLite) with a genuinely well-engineered privacy layer bolted on; it largely achieves its intent, and the redaction boundary that keeps LP data out of Claude is the strongest part of the codebase. **The headline risk is a single P0: an unauthenticated path-traversal in the `/assets/` route (`backend/server.py:1717`) that runs *before* the auth gate and lets any LAN/Tailnet client read arbitrary files — the live LP database, the JWT signing secret (→ forge an admin token), and the Gmail service-account private key.** A separate P1 punches a hole in the privacy premise from the other side: the outreach drafter sends raw email bodies to Claude with the NER backstop disabled, leaking any non-CRM names in cleartext. The project's own test suite is not green today — two thesis tests fail against the shipped v0.1.0:73 seed (stale assertions, not a runtime bug). Net: not ready to trust as "private" until the traversal and the outreach leak are closed; everything else is hygiene or polish.

## Cross-referenced findings

- **Path traversal (the P0).** Found independently by the **evaluator** (P1, `server.py:1717-1718`) and the **security-auditor** (P0, escalated because the same read also yields `/data/.crm-secret` → admin-token forgery and the Gmail DWD key). The **exerciser** ran a path-traversal probe and reported "blocked" — a **false negative**: `curl` collapses `../` client-side, so the test never sent a literal `..`. The two code-reading agents are correct (`get_path()`/`urlparse` does not normalize `..`; a raw client like `curl --path-as-is` reaches it). Merged as **one P0** with the auditor's higher severity, on the strength of the secret/key exposure.
- **Two thesis tests fail.** The **evaluator** (P1) and the **exerciser** (P2) independently reproduced and root-caused the same failure: `test_thesis_seed.py` / `test_thesis_actions.py` assert the `positioning` variant_group has 2 members, but `ensure_positioning_framings` (added after the tests) seeds 5 more → 7. Merged as one **P2** (no runtime impact, but "all tests pass" is false today).
- **`X-Forwarded-For` trusted** for rate-limit/ban keying — flagged by both **evaluator** (P3) and **auditor** (P3). One P3.
- **CORS default `*`** — flagged by both **evaluator** (P3) and **auditor** (P3); both note it's benign today (Bearer auth, no cookies). One P3.
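The exerciser's false negative comes down to where `..` gets resolved. `urlparse` splits a URL without resolving dot segments, so a raw `/assets/../../data/crm.db` reaches the handler intact, while `curl` without `--path-as-is` squashes `/../` before sending. The sketch below shows the `realpath` + `commonpath` containment check recommended for the fix. It is a minimal illustration, not the project's code: `ASSETS_ROOT` and `resolve_asset` are hypothetical names, and the asset directory layout is assumed.

```python
import os
from typing import Optional
from urllib.parse import unquote, urlparse

# Hypothetical asset directory; the real layout in server.py may differ.
ASSETS_ROOT = "/app/frontend/assets"


def resolve_asset(request_path: str, root: str = ASSETS_ROOT) -> Optional[str]:
    """Map an /assets/... request path to a file inside `root`, or None if it escapes."""
    # urlparse only splits the URL; ".." segments in the path survive untouched.
    path = unquote(urlparse(request_path).path)
    if not path.startswith("/assets/"):
        return None
    relative = path[len("/assets/"):]
    root_real = os.path.realpath(root)
    # realpath collapses ".." and "." and resolves symlinks before the check.
    candidate = os.path.realpath(os.path.join(root_real, relative))
    # commonpath equals root_real only when candidate lies at or below root_real;
    # the root directory itself is rejected too, since it is not a servable file.
    if candidate == root_real or os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate


print(urlparse("/assets/../../data/crm.db").path)  # /assets/../../data/crm.db
print(resolve_asset("/assets/../../data/crm.db"))  # None
```

Percent-encoded dots (`%2e%2e`) are decoded before the check, so they are rejected the same way as a literal `..`; a `None` result would map to a 404 before any file is opened.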

## Priority queue

- [P0] Unauthenticated path traversal in `/assets/` → arbitrary file read (LP DB, JWT secret → admin forgery, Gmail key); runs before auth — `backend/server.py:1717-1732`, `1671-1681` — evaluator + security-auditor (exerciser's "blocked" was a false negative)
- [P1] Outreach drafter sends raw email bodies to Claude with NER backstop disabled (`Boundary(...)` built with no `ner_fn`), leaking non-CRM names — contradicts documented fail-closed design — `backend/mcp/outreach_agent.py:230`, `106-118`, `151-170` — security-auditor
- [P1] Soft-deleted contacts and organizations remain fully readable by direct ID (`GET /api/{contacts,organizations}/{id}` omit `deleted_at IS NULL`) — exerciser
- [P2] Two thesis tests fail against the shipped seed (stale 2-vs-7 member assertion) — `backend/test_thesis_seed.py:50`, `backend/test_thesis_actions.py:40` — evaluator + exerciser
- [P2] Non-integer query params (`?limit=abc`, `?offset=abc`) raise unhandled `ValueError` and crash the request thread → connection reset, no error body (8 list endpoints) — exerciser
- [P2] 5,383-line single handler class with 6 near-identical copy-pasted CRUD update blocks — raises cost of safe change — `backend/server.py:1523`, `2267-2293` — evaluator
- [P2] Frontend loads React + Babel-standalone from `unpkg.com` with no SRI — offline-fragile (contradicts data-sovereignty premise) + unpinned supply chain + in-browser transpile cost — `frontend/index.html:9-11` — evaluator
- [P2] TLS verification disabled on the scrub-gateway path (`CERT_NONE`, `check_hostname=False`) and shipped pre-wired; if the gateway backend is enabled, a MITM can read the LP-name dictionary — `backend/ingest/http_util.py:11-17`, `docker_entrypoint.sh:86` — security-auditor
- [P2] `cryptography==42.0.5` shipped in image carries a known bundled-OpenSSL advisory (used for Gmail RS256) — bump to ≥43 — `start9/0.4/Dockerfile:50` — security-auditor
- [P2] `assets/ABOUT.md` is stale and user-visible on the box: claims first-boot seeds the volume (removed in v0.1.0:40) — fresh install shows an empty CRM with no explanation — `start9/0.4/assets/ABOUT.md:9` — start9-spec-checker
- [P2] Hardcoded LAN IPs (`192.168.1.72` Spark, `192.168.1.87` Qdrant) compiled into the s9pk (16 occurrences) — network change forces edit+rebuild+reinstall — `start9/0.4/startos/actions/*.ts`, `docker_entrypoint.sh:85-87` — start9-spec-checker
- [P3] `X-Forwarded-For` trusted verbatim for rate-limit/ban keying — spoofable to evade ban or poison another IP's bucket — `backend/server.py:1588-1592` — evaluator + security-auditor
- [P3] CORS default `*` — benign with Bearer auth today, pin in prod — `backend/server.py:81` — evaluator + security-auditor
- [P3] DB connections closed only on success paths (no `try/finally`) — hygiene, not a leak in practice — `backend/server.py` (many handlers) — evaluator
- [P3] `get_body` reads `Content-Length` bytes with no max-size cap → memory-exhaustion DoS on write routes — `backend/server.py:1538-1554` — security-auditor
- [P3] Container runs as root (no `USER` in Dockerfile) — drop privileges — `start9/0.4/Dockerfile` — security-auditor
- [P3] `requirements.txt` lists ~12 vulnerable/unused deps (e.g. python-jose 3.3.0, CVE-2024-33663) — not imported at runtime, but a trap if ever installed — `backend/requirements.txt` — security-auditor
- [P3] `gmailResult.gmail_url` rendered into an `href` without scheme validation (server-generated today) — `frontend/index.html:10134` — security-auditor
- [P3] No uniqueness constraint on `contacts.email` — duplicates silently accepted — exerciser
- [P3] Create-opportunity does not validate the `stage` field (only the PATCH stage route does) — arbitrary stage strings stored — exerciser
- [P3] No server-side length limits on text fields (10k-char names accepted, stored, and exported) — exerciser
- [P3] `POST /api/fundraising/log-communication` ignores a valid `investor_id` and demands `row_id`/`investor_name` — grid-vs-contacts model mismatch — exerciser
- [P3] Deprecated `datetime.utcnow()` calls emit `DeprecationWarning` (deprecated since Python 3.12; may be removed in a future release) — exerciser
- [P3] `npm audit`: fast-xml-parser/-builder transitive advisories (build-time only, via start-sdk) — `npm audit fix` — security-auditor
- [P3] `anthropic` dependency unpinned — reproducibility/supply-chain gap — security-auditor
- [P3] Manifest declares `aarch64` but no native arm image is built (runs via QEMU; `fastembed`/`mcp` unverified on arm) — drop the arch or build it — `start9/0.4/startos/manifest/index.ts:23` — start9-spec-checker
- [P3] `start9/0.4/README.md:39-40` describes the removed seed mechanism (developer-facing) — start9-spec-checker
- [P3] `packageRepo`/`upstreamRepo` manifest URLs 404 (private/nonexistent) — fine for private use, fails registry validation — start9-spec-checker
- [P3] Stale `start9/0.4/javascript.tmp.1776377780/` build artifact on disk (gitignored, harmless) — start9-spec-checker
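The P1 soft-delete leak comes down to a missing predicate in the get-by-ID queries. A minimal sketch of the filtered lookup, assuming a `deleted_at` column that is `NULL` for live rows (as the finding implies); the helper name `get_live_by_id` and the demo schema are hypothetical and far simpler than the real tables:

```python
import sqlite3
from typing import Optional

# Only tables with a deleted_at column; SQLite cannot bind table names as parameters.
SOFT_DELETE_TABLES = {"contacts", "organizations"}


def get_live_by_id(conn: sqlite3.Connection, table: str, row_id: int) -> Optional[tuple]:
    """Return the row if it exists and is not soft-deleted, else None (caller sends 404)."""
    if table not in SOFT_DELETE_TABLES:
        raise ValueError(f"unsupported table: {table}")
    sql = f"SELECT * FROM {table} WHERE id = ? AND deleted_at IS NULL"
    return conn.execute(sql, (row_id,)).fetchone()


# Demo on an in-memory database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO contacts VALUES (1, 'Live', NULL), (2, 'Gone', '2026-06-01')")
print(get_live_by_id(conn, "contacts", 1))  # (1, 'Live', None)
print(get_live_by_id(conn, "contacts", 2))  # None
```

The table name is interpolated into the SQL only after the allowlist check, while the ID stays a bound parameter.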

## Scorecard

The evaluator's six-lens table, unadjusted (other agents' evidence reinforces but does not contradict it):

| Lens | Score /5 | Justification |
|---|---|---|
| Architecture | 3 | Clean module separation (`redaction`/`ingest`/`mcp`/`email_integration`) + consistent central dispatch, but all CRM logic in one 5,383-line handler — `server.py:1523`. |
| Security | 2 | Strong auth crypto (pbkdf2 200k, `compare_digest`, pinned HS256) and clean secrets hygiene, undermined by the P0 traversal + the P1 outreach leak the auditor added — `server.py:1717`, `outreach_agent.py:230`. |
| Performance | 4 | Per-request connections + WAL + 42 indexes + lock-guarded rate-limit/ban state, right-sized for ~400 records / 5 users. |
| Testing | 3 | 13 fast isolated tests covering the hard parts (redaction leak-hunts, grounding boundary), but 2 are red against the current seed. |
| Code quality | 3 | Comments explain *why* well; the monolith and copy-pasted CRUD blocks raise change cost. |
| Documentation | 5 | AGENTS.md + 6 scoped guides are accurate and verifiable (the "FastAPI is vestigial" claim checks out — zero runtime imports). |

Note: the auditor's P0 escalation and added P1 leak both land on the Security lens; they corroborate the score of 2 rather than push it lower (2 is already the lowest score in the table).

## Disagreements & gaps

- **Path traversal — exerciser vs. the two readers.** The exerciser reported path traversal "blocked"; the evaluator and auditor both found it exploitable by code reading. Resolution: the exerciser's tool (`curl`) normalized `../` before sending, so the probe never tested the vuln — it is **real and P0**. Lesson: black-box probes for traversal must use a raw, non-normalizing client (`curl --path-as-is`).
- **Shared blind spot — the differentiating Phase-1 AI paths are unverified at runtime.** No agent could exercise live Claude/Anthropic calls (no `ANTHROPIC_API_KEY` in env), Qdrant ingest (`/api/index/*`), or Gmail draft creation. So `/api/outreach/draft`, `/api/architect/ground`, and thesis generation were reached only in their degraded/no-key form. The P1 outreach-leak finding is from code reading, not a live capture — confidence is high but a live request would confirm it. This is the one gap every agent shares.
- **StartOS spec — two UNVERIFIED items.** The spec-checker could not confirm whether 0.4 still requires `instructions.md`/`prepare.sh` (docs pages 404'd) or measure the expanded image size; `start-sdk verify` failed only because the machine has the wrong-era (0.3.5) binary — `start-cli s9pk inspect` succeeds and the artifact is valid. No packaging blocker for private sideload.
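Although the outreach leak was not captured live, the fail-closed property itself can be checked offline with a stub NER function. Below is a minimal sketch of the behavior the design calls for (refuse to send when the NER backstop is missing or unreachable). It is written as a standalone function rather than the project's `Boundary` class, whose interface is not shown here; `redact_for_llm`, `RedactionUnavailable`, and the `[PERSON_n]` placeholder format are hypothetical, and plain string replacement stands in for the real redaction logic.

```python
from typing import Callable, Iterable, List, Optional

# A NER function returns the person names it finds in a piece of text.
NerFn = Callable[[str], List[str]]


class RedactionUnavailable(RuntimeError):
    """Raised when redaction cannot be guaranteed; the caller must not call the LLM."""


def redact_for_llm(text: str, known_names: Iterable[str],
                   ner_fn: Optional[NerFn]) -> str:
    """Mask CRM-known names plus NER-detected names, failing closed without NER."""
    if ner_fn is None:
        raise RedactionUnavailable("NER backstop not configured; refusing to send")
    try:
        detected = ner_fn(text)
    except Exception as exc:  # network error, timeout, model crash, ...
        raise RedactionUnavailable("NER backstop unreachable; refusing to send") from exc
    names = {n for n in list(known_names) + list(detected) if n}
    # Mask longer names first so "Ann Lee" is not half-replaced via "Ann".
    for i, name in enumerate(sorted(names, key=len, reverse=True)):
        text = text.replace(name, f"[PERSON_{i}]")
    return text
```

Whatever the real `Boundary` does internally, a unit test along these lines (a `None` or failing `ner_fn` must raise rather than return text) would pin the fail-closed guarantee without needing an `ANTHROPIC_API_KEY`.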

## Suggested order of work

1. **Close the P0 traversal first** — `os.path.realpath` + `os.path.commonpath` containment check (or the stdlib `http.server.SimpleHTTPRequestHandler.translate_path`, which drops `..` segments) on the `/assets/` branch in `backend/server.py:1717`, and confirm with `curl --path-as-is 'http://host/assets/../../data/crm.db'`. Until this is fixed the "private" claim is false; do not deploy.
2. **Fix the P1 outreach leak** — pass `ner_fn=_ner_local` to the `Boundary` in `outreach_agent.py:230` (mirroring `architect_grounding.py`), fail closed if NER is unreachable, add a minimize-first pass. Both #1 and #2 directly protect the LP data the project exists to protect.
3. **Make the test suite green and authoritative** — fix the two stale thesis assertions (assert structurally, not on an exact count) and add a one-line aggregate runner so "do the tests pass" has a single answer; then the suite can gate the next deploy.
4. **Fix the two functional bugs** — add `AND deleted_at IS NULL` to the get-by-ID handlers (P1), and wrap query-param `int()` parsing to return a 400 instead of crashing the thread (P2).
5. **Deploy-prep the package** — update `ABOUT.md` (and README) to current first-boot behavior so a fresh install isn't a mystery empty CRM, and lift the hardcoded Spark/Qdrant IPs into config/env before the box ever moves networks.
6. **Then verify the live Phase-1 paths on the box** with a real `ANTHROPIC_API_KEY` — the outreach/architect/thesis features that no agent could exercise here.
7. **Hardening sweep** (P3 batch) — bump `cryptography`, drop the vestigial vulnerable `requirements.txt` entries, stop trusting `X-Forwarded-For`, cap request-body size, run as non-root.
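For the two request-handling fixes (item 4's query-param crash and item 7's body-size cap), a minimal stdlib sketch follows. It assumes a `BaseHTTPRequestHandler`-style server (the report describes a stdlib-only monolith) and uses hypothetical names: `BadRequest`, `int_param`, `read_body`, and the 1 MiB `MAX_BODY_BYTES` limit are illustrative, not taken from `server.py`.

```python
from typing import BinaryIO, Mapping, Optional
from urllib.parse import parse_qs

MAX_BODY_BYTES = 1024 * 1024  # hypothetical 1 MiB cap for JSON write routes


class BadRequest(Exception):
    """Client error carrying the HTTP status the handler should send."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def int_param(query: str, name: str, default: int,
              lo: int = 0, hi: Optional[int] = None) -> int:
    """Parse ?name=<int> from a raw query string; bad input becomes a 400, not a crash."""
    values = parse_qs(query).get(name)
    if not values:
        return default
    try:
        value = int(values[0])
    except ValueError:
        raise BadRequest(400, f"{name} must be an integer") from None
    if value < lo or (hi is not None and value > hi):
        raise BadRequest(400, f"{name} is out of range")
    return value


def read_body(headers: Mapping[str, str], rfile: BinaryIO,
              limit: int = MAX_BODY_BYTES) -> bytes:
    """Read Content-Length bytes, refusing invalid or oversized lengths up front."""
    try:
        length = int(headers.get("Content-Length", "0"))
    except ValueError:
        raise BadRequest(400, "invalid Content-Length") from None
    if length < 0:
        raise BadRequest(400, "invalid Content-Length")
    if length > limit:
        raise BadRequest(413, "request body too large")
    return rfile.read(length)
```

In the handler, catching `BadRequest` and calling `self.send_error(e.status, str(e))` would turn both failures into a proper error response instead of a reset connection or an unbounded read.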