ten31-database/EVALUATION.md
Keysat aec2b7775b Harden privacy boundary and asset serving (v0.1.0:74)
Fixes from the 2026-06-12 full-eval (P0 + two P1s); code-only, no schema
change. Without these the "private CRM" premise was breachable on the LAN:

- P0: the /assets/ route joined the request path onto FRONTEND_DIR without
  normalizing '..' (get_path/urlparse pass it through), so an unauthenticated
  GET /assets/../../data/crm.db read any file the process could — the LP DB,
  the JWT signing secret (-> admin-token forgery), the Gmail key. Add a realpath
  containment check that 404s anything resolving outside FRONTEND_ROOT.
- P1: the LP-outreach drafter built its redaction Boundary with no ner_fn, so
  unknown people/firms in raw email bodies reached Claude in the clear. Pass the
  local-Qwen NER backstop (ner_fn=_ner_local), matching architect_grounding;
  fails closed via the existing scrub_unavailable path if the local model is down.
- P1: get-by-id handlers leaked soft-deleted records by direct ID. Add
  deleted_at IS NULL to every get-by-id path — contacts, organizations,
  opportunities, lp_profiles — and to the nested related-data sub-selects in
  the contact/opportunity detail payloads, matching the list-handler convention.

Bumps the package to v0.1.0:74 (utils.ts + versions/v0.1.0.74.ts + graph).
Full report in EVALUATION.md; remaining P2/P3 triaged in AGENTS.md Current state.
2026-06-12 18:01:48 -05:00


Evaluation — CRM (Ten31 Venture CRM + Agentic System) — 2026-06-12

Intent: A self-hosted venture-fund CRM for Ten31 (replacing Airtable to keep sensitive LP/prospect data off third-party servers) managing contacts/organizations/opportunities/communications/LP profiles, with a new in-house AI/agentic layer that drafts outreach and sharpens the investment thesis — packaged as a StartOS s9pk, currently Phase 0/1 (agents draft, humans send).

Agents run: evaluator, security-auditor, exerciser, start9-spec-checker. Skipped: reviewer (working tree is clean — no uncommitted diff to review).

Verdict

This is a functional, deliberately stdlib-only CRM (one ~5,400-line backend/server.py monolith over SQLite) with a genuinely well-engineered privacy layer bolted on; it largely achieves its intent, and the redaction boundary that keeps LP data out of Claude is the strongest part of the codebase. The headline risk is a single P0: an unauthenticated path-traversal in the /assets/ route (backend/server.py:1717) that runs before the auth gate and lets any LAN/Tailnet client read arbitrary files — the live LP database, the JWT signing secret (→ forge an admin token), and the Gmail service-account private key. A separate P1 punches a hole in the privacy premise from the other side: the outreach drafter sends raw email bodies to Claude with the NER backstop disabled, leaking any non-CRM names in cleartext. The project's own test suite is not green today — two thesis tests fail against the shipped v0.1.0:73 seed (stale assertions, not a runtime bug). Net: not ready to trust as "private" until the traversal and the outreach leak are closed; everything else is hygiene or polish.

Cross-referenced findings

  • Path traversal (the P0). Found independently by the evaluator (P1, server.py:1717-1718) and the security-auditor (P0, escalated because the same read also yields /data/.crm-secret → admin-token forgery and the Gmail DWD key). The exerciser ran a path-traversal probe and reported "blocked", which was a false negative: curl collapses ../ client-side, so the test never sent a literal '..' segment. The two code-reading agents are correct (get_path()/urlparse does not normalize '..'; a raw client such as curl --path-as-is reaches it). Merged as one P0 with the auditor's higher severity, on the strength of the secret/key exposure.
  • Two thesis tests fail. The evaluator (P1) and the exerciser (P2) independently reproduced and root-caused the same failure: test_thesis_seed.py / test_thesis_actions.py assert the positioning variant_group has 2 members, but ensure_positioning_framings (added after the tests) seeds 5 more → 7. Merged as one P2 (no runtime impact; but "all tests pass" is false today).
  • X-Forwarded-For trusted for rate-limit/ban keying — flagged by both evaluator (P3) and auditor (P3). One P3.
  • CORS default * — flagged by both evaluator (P3) and auditor (P3); both note it's benign today (Bearer auth, no cookies). One P3.
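
The containment check that closes the P0 can be sketched as follows. This is a minimal illustration, not the code in backend/server.py: the function name resolve_under_root and its signature are hypothetical, and it assumes the handler already has the requested path relative to the frontend root.

```python
import os
from typing import Optional


def resolve_under_root(root: str, relative_path: str) -> Optional[str]:
    """Return the real path of a file under root, or None if it escapes root.

    Hypothetical helper: the caller is expected to answer 404 on None.
    """
    real_root = os.path.realpath(root)
    # Drop leading slashes so os.path.join cannot be reset by an absolute path.
    candidate = os.path.realpath(os.path.join(real_root, relative_path.lstrip("/")))
    try:
        # commonpath equals the root only when candidate is the root or below it.
        contained = os.path.commonpath([real_root, candidate]) == real_root
    except ValueError:
        # Raised for paths that cannot be compared (e.g. different drives on Windows).
        contained = False
    if not contained or not os.path.isfile(candidate):
        return None
    return candidate
```

Because both sides go through realpath first, '..' segments and symlinks are resolved before the comparison, so a request such as assets/../../data/crm.db resolves outside the root and yields None.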

Priority queue

  • [P0] Unauthenticated path traversal in /assets/ → arbitrary file read (LP DB, JWT secret → admin forgery, Gmail key); runs before auth — backend/server.py:1717-1732, 1671-1681 — evaluator + security-auditor (exerciser's "blocked" was a false negative)
  • [P1] Outreach drafter sends raw email bodies to Claude with NER backstop disabled (Boundary(...) built with no ner_fn), leaking non-CRM names — contradicts documented fail-closed design — backend/mcp/outreach_agent.py:230, 106-118, 151-170 — security-auditor
  • [P1] Soft-deleted contacts and organizations remain fully readable by direct ID (GET /api/{contacts,organizations}/{id} omit deleted_at IS NULL) — exerciser
  • [P2] Two thesis tests fail against the shipped seed (stale 2-vs-7 member assertion) — backend/test_thesis_seed.py:50, backend/test_thesis_actions.py:40 — evaluator + exerciser
  • [P2] Non-integer query params (?limit=abc, ?offset=abc) raise unhandled ValueError and crash the request thread → connection reset, no error body (8 list endpoints) — exerciser
  • [P2] 5,383-line single handler class with 6 near-identical copy-pasted CRUD update blocks — raises cost of safe change — backend/server.py:1523, 2267-2293 — evaluator
  • [P2] Frontend loads React + Babel-standalone from unpkg.com with no SRI — offline-fragile (contradicts data-sovereignty premise) + unpinned supply chain + in-browser transpile cost — frontend/index.html:9-11 — evaluator
  • [P2] TLS verification disabled on the scrub-gateway path (CERT_NONE, check_hostname=False); ships pre-wired, MITM exposes the LP-name dictionary if gateway backend enabled — backend/ingest/http_util.py:11-17, docker_entrypoint.sh:86 — security-auditor
  • [P2] cryptography==42.0.5 shipped in image carries a known bundled-OpenSSL advisory (used for Gmail RS256) — bump to ≥43 — start9/0.4/Dockerfile:50 — security-auditor
  • [P2] assets/ABOUT.md is stale and user-visible on the box: claims first-boot seeds the volume (removed in v0.1.0:40) — fresh install shows an empty CRM with no explanation — start9/0.4/assets/ABOUT.md:9 — start9-spec-checker
  • [P2] Hardcoded LAN IPs (192.168.1.72 Spark, 192.168.1.87 Qdrant) compiled into the s9pk (16 occurrences) — network change forces edit+rebuild+reinstall — start9/0.4/startos/actions/*.ts, docker_entrypoint.sh:85-87 — start9-spec-checker
  • [P3] X-Forwarded-For trusted verbatim for rate-limit/ban keying — spoofable to evade ban or poison another IP's bucket — backend/server.py:1588-1592 — evaluator + security-auditor
  • [P3] CORS default * — benign with Bearer auth today, pin in prod — backend/server.py:81 — evaluator + security-auditor
  • [P3] DB connections closed only on success paths (no try/finally) — hygiene, not a leak in practice — backend/server.py (many handlers) — evaluator
  • [P3] get_body reads Content-Length bytes with no max-size cap → memory-exhaustion DoS on write routes — backend/server.py:1538-1554 — security-auditor
  • [P3] Container runs as root (no USER in Dockerfile) — drop privileges — start9/0.4/Dockerfile — security-auditor
  • [P3] requirements.txt lists ~12 vulnerable/unused deps (e.g. python-jose 3.3.0, CVE-2024-33663) — not imported at runtime, but a trap if ever installed — backend/requirements.txt — security-auditor
  • [P3] gmailResult.gmail_url rendered into an href without scheme validation (server-generated today) — frontend/index.html:10134 — security-auditor
  • [P3] No uniqueness constraint on contacts.email — duplicates silently accepted — exerciser
  • [P3] Create-opportunity does not validate the stage field (only the PATCH stage route does) — arbitrary stage strings stored — exerciser
  • [P3] No server-side length limits on text fields (10k-char names accepted, stored, and exported) — exerciser
  • [P3] POST /api/fundraising/log-communication ignores a valid investor_id and demands row_id/investor_name — grid-vs-contacts model mismatch — exerciser
  • [P3] Deprecated datetime.utcnow() calls emit warnings (will break on a future Python) — exerciser
  • [P3] npm audit: fast-xml-parser/-builder transitive advisories (build-time only, via start-sdk) — npm audit fix — security-auditor
  • [P3] anthropic dependency unpinned — reproducibility/supply-chain gap — security-auditor
  • [P3] Manifest declares aarch64 but no native arm image is built (runs via QEMU; fastembed/mcp unverified on arm) — drop the arch or build it — start9/0.4/startos/manifest/index.ts:23 — start9-spec-checker
  • [P3] start9/0.4/README.md:39-40 describes the removed seed mechanism (developer-facing) — start9-spec-checker
  • [P3] packageRepo/upstreamRepo manifest URLs 404 (private/nonexistent) — fine for private use, fails registry validation — start9-spec-checker
  • [P3] Stale start9/0.4/javascript.tmp.1776377780/ build artifact on disk (gitignored, harmless) — start9-spec-checker
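
For the ?limit=abc / ?offset=abc crash, one possible shape for the fix is a small parsing helper that raises a dedicated exception the handler can map to a 400. The names int_param and BadQueryParam, and the default bounds, are assumptions made for illustration; they are not taken from the codebase.

```python
from urllib.parse import parse_qs


class BadQueryParam(ValueError):
    """Hypothetical error type; the request handler would turn it into HTTP 400."""


def int_param(query: str, name: str, default: int,
              minimum: int = 0, maximum: int = 500) -> int:
    """Read an integer query parameter, falling back to default when absent."""
    values = parse_qs(query).get(name)
    if not values:
        return default
    try:
        value = int(values[-1])  # last occurrence wins if the key is repeated
    except ValueError:
        raise BadQueryParam(f"{name} must be an integer") from None
    if not minimum <= value <= maximum:
        raise BadQueryParam(f"{name} must be between {minimum} and {maximum}")
    return value
```

A handler would wrap its int_param calls in a try/except BadQueryParam block and send a 400 with the message as the error body, instead of letting the ValueError reach the request thread.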

Scorecard

The evaluator's six-lens table, unadjusted (other agents' evidence reinforces but does not contradict it):

| Lens | Score /5 | Justification |
|---|---|---|
| Architecture | 3 | Clean module separation (redaction/ingest/mcp/email_integration) + consistent central dispatch, but all CRM logic in one 5,383-line handler (server.py:1523). |
| Security | 2 | Strong auth crypto (pbkdf2 200k, compare_digest, pinned HS256) and clean secrets hygiene, undermined by the P0 traversal + the P1 outreach leak the auditor added (server.py:1717, outreach_agent.py:230). |
| Performance | 4 | Per-request connections + WAL + 42 indexes + locked abuse state, right-sized for ~400 records / 5 users. |
| Testing | 3 | 13 fast isolated tests covering the hard parts (redaction leak-hunts, grounding boundary), but 2 are red against the current seed. |
| Code quality | 3 | Comments explain why well; the monolith and copy-pasted CRUD blocks raise change cost. |
| Documentation | 5 | AGENTS.md + 6 scoped guides are accurate and verifiable (the "FastAPI is vestigial" claim checks out: zero runtime imports). |

Note: the auditor's P0 escalation and added P1 leak both land on the Security lens; they corroborate the score of 2 rather than push it lower (it was already the floor of the table).

Disagreements & gaps

  • Path traversal — exerciser vs. the two readers. The exerciser reported path traversal "blocked"; the evaluator and auditor both found it exploitable by code reading. Resolution: the exerciser's tool (curl) normalized ../ before sending, so the probe never tested the vuln — it is real and P0. Lesson: black-box probes for traversal must use a raw, non-normalizing client (curl --path-as-is).
  • Shared blind spot — the differentiating Phase-1 AI paths are unverified at runtime. No agent could exercise live Claude/Anthropic calls (no ANTHROPIC_API_KEY in env), Qdrant ingest (/api/index/*), or Gmail draft creation. So /api/outreach/draft, /api/architect/ground, and thesis generation were reached only in their degraded/no-key form. The P1 outreach-leak finding is from code reading, not a live capture — confidence is high but a live request would confirm it. This is the one gap every agent shares.
  • StartOS spec — two UNVERIFIED items. The spec-checker could not confirm whether 0.4 still requires instructions.md/prepare.sh (docs pages 404'd) or measure the expanded image size; start-sdk verify failed only because the machine has the wrong-era (0.3.5) binary — start-cli s9pk inspect succeeds and the artifact is valid. No packaging blocker for private sideload.
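
As an alternative to curl --path-as-is, a traversal probe can write the request line itself so that nothing on the client side rewrites the path. The sketch below is illustrative only; build_request and probe are hypothetical names, and the host, port and path are whatever the tester supplies.

```python
import socket


def build_request(host: str, raw_path: str) -> bytes:
    """Build a GET request whose path is sent exactly as given (no '..' collapsing)."""
    return (
        f"GET {raw_path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def probe(host: str, port: int, raw_path: str, timeout: float = 5.0) -> int:
    """Send the raw request and return the HTTP status code from the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, raw_path))
        status_line = sock.makefile("rb").readline()
    # Status line looks like b"HTTP/1.0 404 Not Found"; the second field is the code.
    return int(status_line.split()[1].decode("ascii"))
```

Against a fixed server, probe(host, port, "/assets/../../data/crm.db") should come back 404; a 200 means the literal '..' segments reached the file system.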

Suggested order of work

  1. Close the P0 traversal first: add an os.path.realpath + commonpath containment check (or stdlib translate_path) on the /assets/ branch in backend/server.py:1717, and confirm with curl --path-as-is 'http://host/assets/../../data/crm.db'. Until this is fixed the "private" claim is false; do not deploy.
  2. Fix the P1 outreach leak — pass ner_fn=_ner_local to the Boundary in outreach_agent.py:230 (mirroring architect_grounding.py), fail closed if NER is unreachable, add a minimize-first pass. Both #1 and #2 directly protect the LP data the project exists to protect.
  3. Make the test suite green and authoritative — fix the two stale thesis assertions (assert structurally, not on an exact count) and add a one-line aggregate runner so "do the tests pass" has a single answer; then the suite can gate the next deploy.
  4. Fix the two functional bugs — add AND deleted_at IS NULL to the get-by-ID handlers (P1), and wrap query-param int() parsing to return a 400 instead of crashing the thread (P2).
  5. Deploy-prep the package — update ABOUT.md (and README) to current first-boot behavior so a fresh install isn't a mystery empty CRM, and lift the hardcoded Spark/Qdrant IPs into config/env before the box ever moves networks.
  6. Then verify the live Phase-1 paths on the box with a real ANTHROPIC_API_KEY — the outreach/architect/thesis features that no agent could exercise here.
  7. Hardening sweep (P3 batch, plus the P2 cryptography bump): bump cryptography to ≥43, drop the vestigial vulnerable requirements.txt entries, stop trusting X-Forwarded-For, cap request-body size, run as non-root.
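
The soft-delete fix in step 4 amounts to one extra predicate on every get-by-ID query. A minimal sketch with the stdlib sqlite3 module follows; the helper get_live_row is hypothetical, and it assumes each table has an integer id column alongside deleted_at.

```python
import sqlite3
from typing import Optional, Tuple

# Tables named in the report as using soft deletes; used as an allow-list so the
# table name can be interpolated into the SQL safely.
SOFT_DELETE_TABLES = frozenset(
    {"contacts", "organizations", "opportunities", "lp_profiles"}
)


def get_live_row(conn: sqlite3.Connection, table: str, row_id: int) -> Optional[Tuple]:
    """Fetch a row by id, treating soft-deleted rows as if they did not exist."""
    if table not in SOFT_DELETE_TABLES:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE id = ? AND deleted_at IS NULL",
        (row_id,),
    )
    return cursor.fetchone()  # None means "not found", i.e. the handler returns 404
```

Routing every get-by-ID handler through one helper like this keeps the predicate from being forgotten on a new table, which is how the list handlers and the detail handlers drifted apart in the first place.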