Harden privacy boundary and asset serving (v0.1.0:74)
Fixes from the 2026-06-12 full-eval (P0 + two P1s); code-only, no schema change. Without these the "private CRM" premise was breachable on the LAN:

- P0: the /assets/ route joined the request path onto FRONTEND_DIR without normalizing '..' (get_path/urlparse pass it through), so an unauthenticated GET /assets/../../data/crm.db read any file the process could — the LP DB, the JWT signing secret (-> admin-token forgery), the Gmail key. Add a realpath containment check that 404s anything resolving outside FRONTEND_ROOT.
- P1: the LP-outreach drafter built its redaction Boundary with no ner_fn, so unknown people/firms in raw email bodies reached Claude in the clear. Pass the local-Qwen NER backstop (ner_fn=_ner_local), matching architect_grounding; fails closed via the existing scrub_unavailable path if the local model is down.
- P1: get-by-id handlers leaked soft-deleted records by direct ID. Add deleted_at IS NULL to every get-by-id path — contacts, organizations, opportunities, lp_profiles — and to the nested related-data sub-selects in the contact/opportunity detail payloads, matching the list-handler convention.

Bumps the package to v0.1.0:74 (utils.ts + versions/v0.1.0.74.ts + graph). Full report in EVALUATION.md; remaining P2/P3 triaged in AGENTS.md Current state.
@@ -64,7 +64,7 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 ## Conventions
 
 - **Two coexisting investor models** (classic `contacts`/`lp_profiles` + the `fundraising_*` grid). Reconciling them to canonical IDs is the core entity-resolution task — see `docs/crm-overview.md`.
-- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. (Thesis has a subtlety here — see the thesis guide.)
+- **Soft-delete only:** `deleted_at` and/or `status='retired'`; never hard-delete. Every READ path must filter `deleted_at IS NULL` — not just list handlers but get-by-id and nested related-data sub-selects too (the 2026-06-12 audit found both leaking soft-deleted rows the list handlers already hid). (Thesis has a subtlety here — see the thesis guide.)
 - **Env:** secrets in `.env` (gitignored); names in `.env.example`. Verified names: `ANTHROPIC_API_KEY`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `X_API_KEY`, `CRM_DB_PATH`, `CRM_DEV_DB_PATH`. Also used: `CRM_SECRET_KEY` (beta/prod), `CRM_HOST`/`CRM_PORT`, `CRM_DATA_DIR`.
 - **Commit style:** imperative subject, concise body explaining the *why*; put the package version in the subject (`… (v0.1.0:NN)`) for shippable changes. **No AI co-author / attribution trailers** — commits are authored by the user.
 
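The soft-delete read rule in the Conventions hunk can be illustrated with an in-memory SQLite table. This is a minimal sketch: the two-row schema and the `get_contact` helper are assumptions made for illustration, not the project's handlers. Only the `deleted_at IS NULL` predicate comes from the convention itself.

```python
import sqlite3

# Hypothetical minimal schema: just enough columns to show the filter.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO contacts (id, name, deleted_at) VALUES (1, 'Live', NULL)")
conn.execute("INSERT INTO contacts (id, name, deleted_at) VALUES (2, 'Gone', '2026-06-01T00:00:00Z')")


def get_contact(conn, contact_id):
    """Get-by-id read that hides soft-deleted rows, as the list handlers already do."""
    row = conn.execute(
        "SELECT * FROM contacts WHERE id = ? AND deleted_at IS NULL",
        (contact_id,),
    ).fetchone()
    return dict(row) if row else None


live = get_contact(conn, 1)   # row is returned
gone = get_contact(conn, 2)   # soft-deleted row is treated as missing
```

A handler built this way can answer a soft-deleted ID with the same 404 it uses for an ID that never existed.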
@@ -97,10 +97,12 @@ Subsystem rules live in `docs/guides/` and lazy-load in Claude Code via `.claude
 
 ## Current state
 
-_Phase 0 substrate + Phase 1 thesis/outreach are built; current package is **v0.1.0:73**. Longer-term backlog: `ROADMAP.md`._
+_Phase 0 substrate + Phase 1 thesis/outreach are built; current package is **v0.1.0:74**. Longer-term backlog: `ROADMAP.md`._
 
 - **Working (all draft-only):** CRM + ingest (chunk→embed→Qdrant + retrieval) + redaction boundary; Gmail capture (DWD) + email-activity propose→approve; Thesis Workshop + Architect (Claude) with dual-approval gate; Outreach Draft Assistant + follow-up radar + per-user voice + Tier-B in-thread Gmail draft creation.
-- **In progress:** v0.1.0:73 is committed and built but **not installed** — the box (`$START9_BOX_HOST`) runs v0.1.0:72, awaiting deploy authorization. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
-- **Decided, not yet built:** CRM is the canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts is next (drafts currently reply to the LP only).
-- **Known gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical and not fed to the engine; on an already-seeded box the AI/energy-operator *segment* angle still shows old copy (gated on the banner decision); live features are unverified on the box.
-- **Next:** 1) deploy v0.1.0:73 (on OK); 2) Grant + Jonathan freeze v2.0 canonical in the Workshop; 3) build reply-all; 4) confirm Appendix-A figures + Maple/OpenSecret/Primal, then promote; 5) verify live features on the box.
+- **In progress:** v0.1.0:74 is committed and reviewer-approved but **not pushed, not built, not installed** — the box (`$START9_BOX_HOST`) still runs v0.1.0:72 (:73 was built, never deployed). Pushing `main` and deploying both await user authorization. On boot, `ensure_thesis_v2_promoted` makes the v2.0 reserve-asset spine the working *approved* spine (node-level, reversible).
+- **Shipped in v0.1.0:74** (security/privacy hardening from the 2026-06-12 full-eval; report in `EVALUATION.md`): closed a pre-auth `/assets/` path traversal (could read crm.db / JWT secret / Gmail key); wired the local-Qwen NER backstop into the outreach redaction boundary (free-prose email bodies were reaching Claude with unknown names in the clear); added `deleted_at IS NULL` to every get-by-id + nested sub-select read path. **Traversal fix verified locally, not yet live on the box.**
+- **Decided, not yet built:** CRM as canonical thesis backbone with the signal-engine reading from it (reconciliation unwired); reply-all for Tier-B drafts (drafts currently reply to the LP only).
+- **Known debt (P2, not deploy-blocking):** 2 thesis tests red vs. the v73 seed + no aggregate runner; `?limit=abc` crashes the request thread; scrub-gateway TLS verify off; `cryptography==42.0.5`; unpkg/no-SRI frontend; stale user-visible `start9/0.4/assets/ABOUT.md`; hardcoded Spark/Qdrant IPs in the s9pk; the 5.4k-line `server.py` monolith. P3 batch + full list in `EVALUATION.md`.
+- **Other gaps:** the v2.0 spine is the *working* spine but **not a canonical `thesis_version`** (needs Grant + Jonathan dual sign-off); Appendix-A conviction/exposure (incl. ~40% Strike) stay Grant's working read, not canonical, not fed to the engine; live features (Claude/Qdrant/Gmail) unverified on the box.
+- **Next:** 1) push `main` + build/deploy v0.1.0:74 (on OK), verify the traversal fix live; 2) clear P2 debt (start: 2 red thesis tests + aggregate runner + add traversal/soft-delete/NER regression tests); 3) Grant + Jonathan freeze v2.0 canonical; 4) build reply-all; 5) confirm Appendix-A + Maple/OpenSecret/Primal, then promote.
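The `?limit=abc` item under Known debt comes down to calling `int()` on untrusted query text. One possible shape for a fix is sketched below. The helper name `parse_int_param`, its bounds, and its `(value, error)` return convention are all assumptions for illustration; the project's own handlers may parse query strings differently.

```python
from urllib.parse import parse_qs


def parse_int_param(query, name, default, minimum=0, maximum=1000):
    """Return (value, error); error is a message for a 400 response, or None."""
    values = parse_qs(query).get(name)
    if not values:
        # Parameter absent (or blank): fall back to the default.
        return default, None
    try:
        value = int(values[0])
    except ValueError:
        # Non-integer input becomes a client error instead of an unhandled exception.
        return None, f"{name} must be an integer"
    if value < minimum or value > maximum:
        return None, f"{name} must be between {minimum} and {maximum}"
    return value, None


ok = parse_int_param("limit=25&offset=0", "limit", 50)
bad = parse_int_param("limit=abc", "limit", 50)
missing = parse_int_param("", "limit", 50)
```

A caller would check the second element and send a 400 with that message when it is not `None`, so the request thread never sees the `ValueError`.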
@@ -0,0 +1,79 @@
+# Evaluation — CRM (Ten31 Venture CRM + Agentic System) — 2026-06-12
+
+Intent: A self-hosted venture-fund CRM for Ten31 (replacing Airtable to keep sensitive LP/prospect data off third-party servers) managing contacts/organizations/opportunities/communications/LP profiles, with a new in-house AI/agentic layer that drafts outreach and sharpens the investment thesis — packaged as a StartOS s9pk, currently Phase 0/1 (agents draft, humans send).
+
+Agents run: evaluator, security-auditor, exerciser, start9-spec-checker. Skipped: reviewer (working tree is clean — no uncommitted diff to review).
+
+## Verdict
+
+This is a functional, deliberately stdlib-only CRM (one ~5,400-line `backend/server.py` monolith over SQLite) with a genuinely well-engineered privacy layer bolted on; it largely achieves its intent, and the redaction boundary that keeps LP data out of Claude is the strongest part of the codebase. **The headline risk is a single P0: an unauthenticated path-traversal in the `/assets/` route (`backend/server.py:1717`) that runs *before* the auth gate and lets any LAN/Tailnet client read arbitrary files — the live LP database, the JWT signing secret (→ forge an admin token), and the Gmail service-account private key.** A separate P1 punches a hole in the privacy premise from the other side: the outreach drafter sends raw email bodies to Claude with the NER backstop disabled, leaking any non-CRM names in cleartext. The project's own test suite is not green today — two thesis tests fail against the shipped v0.1.0:73 seed (stale assertions, not a runtime bug). Net: not ready to trust as "private" until the traversal and the outreach leak are closed; everything else is hygiene or polish.
+
+## Cross-referenced findings
+
+- **Path traversal (the P0).** Found independently by the **evaluator** (P1, `server.py:1717-1718`) and the **security-auditor** (P0, escalated because the same read also yields `/data/.crm-secret` → admin-token forgery and the Gmail DWD key). The **exerciser** ran a path-traversal probe and reported "blocked" — a **false negative**: `curl` collapses `../` client-side, so the test never sent a literal `..`. The two code-reading agents are correct (`get_path()`/`urlparse` does not normalize `..`; a raw client like `curl --path-as-is` reaches it). Merged as **one P0** with the auditor's higher severity, on the strength of the secret/key exposure.
+- **Two thesis tests fail.** The **evaluator** (P1) and the **exerciser** (P2) independently reproduced and root-caused the same failure: `test_thesis_seed.py` / `test_thesis_actions.py` assert the `positioning` variant_group has 2 members, but `ensure_positioning_framings` (added after the tests) seeds 5 more → 7. Merged as one **P2** (no runtime impact; but "all tests pass" is false today).
+- **`X-Forwarded-For` trusted** for rate-limit/ban keying — flagged by both **evaluator** (P3) and **auditor** (P3). One P3.
+- **CORS default `*`** — flagged by both **evaluator** (P3) and **auditor** (P3); both note it's benign today (Bearer auth, no cookies). One P3.
+
+## Priority queue
+
+- [P0] Unauthenticated path traversal in `/assets/` → arbitrary file read (LP DB, JWT secret → admin forgery, Gmail key); runs before auth — `backend/server.py:1717-1732`, `1671-1681` — evaluator + security-auditor (exerciser's "blocked" was a false negative)
+- [P1] Outreach drafter sends raw email bodies to Claude with NER backstop disabled (`Boundary(...)` built with no `ner_fn`), leaking non-CRM names — contradicts documented fail-closed design — `backend/mcp/outreach_agent.py:230`, `106-118`, `151-170` — security-auditor
+- [P1] Soft-deleted contacts and organizations remain fully readable by direct ID (`GET /api/{contacts,organizations}/{id}` omit `deleted_at IS NULL`) — exerciser
+- [P2] Two thesis tests fail against the shipped seed (stale 2-vs-7 member assertion) — `backend/test_thesis_seed.py:50`, `backend/test_thesis_actions.py:40` — evaluator + exerciser
+- [P2] Non-integer query params (`?limit=abc`, `?offset=abc`) raise unhandled `ValueError` and crash the request thread → connection reset, no error body (8 list endpoints) — exerciser
+- [P2] 5,383-line single handler class with 6 near-identical copy-pasted CRUD update blocks — raises cost of safe change — `backend/server.py:1523`, `2267-2293` — evaluator
+- [P2] Frontend loads React + Babel-standalone from `unpkg.com` with no SRI — offline-fragile (contradicts data-sovereignty premise) + unpinned supply chain + in-browser transpile cost — `frontend/index.html:9-11` — evaluator
+- [P2] TLS verification disabled on the scrub-gateway path (`CERT_NONE`, `check_hostname=False`); ships pre-wired, MITM exposes the LP-name dictionary if gateway backend enabled — `backend/ingest/http_util.py:11-17`, `docker_entrypoint.sh:86` — security-auditor
+- [P2] `cryptography==42.0.5` shipped in image carries a known bundled-OpenSSL advisory (used for Gmail RS256) — bump to ≥43 — `start9/0.4/Dockerfile:50` — security-auditor
+- [P2] `assets/ABOUT.md` is stale and user-visible on the box: claims first-boot seeds the volume (removed in v0.1.0:40) — fresh install shows an empty CRM with no explanation — `start9/0.4/assets/ABOUT.md:9` — start9-spec-checker
+- [P2] Hardcoded LAN IPs (`192.168.1.72` Spark, `192.168.1.87` Qdrant) compiled into the s9pk (16 occurrences) — network change forces edit+rebuild+reinstall — `start9/0.4/startos/actions/*.ts`, `docker_entrypoint.sh:85-87` — start9-spec-checker
+- [P3] `X-Forwarded-For` trusted verbatim for rate-limit/ban keying — spoofable to evade ban or poison another IP's bucket — `backend/server.py:1588-1592` — evaluator + security-auditor
+- [P3] CORS default `*` — benign with Bearer auth today, pin in prod — `backend/server.py:81` — evaluator + security-auditor
+- [P3] DB connections closed only on success paths (no `try/finally`) — hygiene, not a leak in practice — `backend/server.py` (many handlers) — evaluator
+- [P3] `get_body` reads `Content-Length` bytes with no max-size cap → memory-exhaustion DoS on write routes — `backend/server.py:1538-1554` — security-auditor
+- [P3] Container runs as root (no `USER` in Dockerfile) — drop privileges — `start9/0.4/Dockerfile` — security-auditor
+- [P3] `requirements.txt` lists ~12 vulnerable/unused deps (e.g. python-jose 3.3.0, CVE-2024-33663) — not imported at runtime, but a trap if ever installed — `backend/requirements.txt` — security-auditor
+- [P3] `gmailResult.gmail_url` rendered into an `href` without scheme validation (server-generated today) — `frontend/index.html:10134` — security-auditor
+- [P3] No uniqueness constraint on `contacts.email` — duplicates silently accepted — exerciser
+- [P3] Create-opportunity does not validate the `stage` field (only the PATCH stage route does) — arbitrary stage strings stored — exerciser
+- [P3] No server-side length limits on text fields (10k-char names accepted, stored, and exported) — exerciser
+- [P3] `POST /api/fundraising/log-communication` ignores a valid `investor_id` and demands `row_id`/`investor_name` — grid-vs-contacts model mismatch — exerciser
+- [P3] Deprecated `datetime.utcnow()` calls emit warnings (will break on a future Python) — exerciser
+- [P3] `npm audit`: fast-xml-parser/-builder transitive advisories (build-time only, via start-sdk) — `npm audit fix` — security-auditor
+- [P3] `anthropic` dependency unpinned — reproducibility/supply-chain gap — security-auditor
+- [P3] Manifest declares `aarch64` but no native arm image is built (runs via QEMU; `fastembed`/`mcp` unverified on arm) — drop the arch or build it — `start9/0.4/startos/manifest/index.ts:23` — start9-spec-checker
+- [P3] `start9/0.4/README.md:39-40` describes the removed seed mechanism (developer-facing) — start9-spec-checker
+- [P3] `packageRepo`/`upstreamRepo` manifest URLs 404 (private/nonexistent) — fine for private use, fails registry validation — start9-spec-checker
+- [P3] Stale `start9/0.4/javascript.tmp.1776377780/` build artifact on disk (gitignored, harmless) — start9-spec-checker
+
+## Scorecard
+
+The evaluator's six-lens table, unadjusted (other agents' evidence reinforces but does not contradict it):
+
+| Lens | Score /5 | Justification |
+|---|---|---|
+| Architecture | 3 | Clean module separation (`redaction`/`ingest`/`mcp`/`email_integration`) + consistent central dispatch, but all CRM logic in one 5,383-line handler — `server.py:1523`. |
+| Security | 2 | Strong auth crypto (pbkdf2 200k, `compare_digest`, pinned HS256) and clean secrets hygiene, undermined by the P0 traversal + the P1 outreach leak the auditor added — `server.py:1717`, `outreach_agent.py:230`. |
+| Performance | 4 | Per-request connections + WAL + 42 indexes + locked abuse state, right-sized for ~400 records / 5 users. |
+| Testing | 3 | 13 fast isolated tests covering the hard parts (redaction leak-hunts, grounding boundary), but 2 are red against the current seed. |
+| Code quality | 3 | Comments explain *why* well; the monolith and copy-pasted CRUD blocks raise change cost. |
+| Documentation | 5 | AGENTS.md + 6 scoped guides are accurate and verifiable (the "FastAPI is vestigial" claim checks out — zero runtime imports). |
+
+Note: the auditor's P0 escalation and added P1 leak both land on the Security lens; they corroborate the score of 2 rather than push it lower (it was already the floor of the table).
+
+## Disagreements & gaps
+
+- **Path traversal — exerciser vs. the two readers.** The exerciser reported path traversal "blocked"; the evaluator and auditor both found it exploitable by code reading. Resolution: the exerciser's tool (`curl`) normalized `../` before sending, so the probe never tested the vuln — it is **real and P0**. Lesson: black-box probes for traversal must use a raw, non-normalizing client (`curl --path-as-is`).
+- **Shared blind spot — the differentiating Phase-1 AI paths are unverified at runtime.** No agent could exercise live Claude/Anthropic calls (no `ANTHROPIC_API_KEY` in env), Qdrant ingest (`/api/index/*`), or Gmail draft creation. So `/api/outreach/draft`, `/api/architect/ground`, and thesis generation were reached only in their degraded/no-key form. The P1 outreach-leak finding is from code reading, not a live capture — confidence is high but a live request would confirm it. This is the one gap every agent shares.
+- **StartOS spec — two UNVERIFIED items.** The spec-checker could not confirm whether 0.4 still requires `instructions.md`/`prepare.sh` (docs pages 404'd) or measure the expanded image size; `start-sdk verify` failed only because the machine has the wrong-era (0.3.5) binary — `start-cli s9pk inspect` succeeds and the artifact is valid. No packaging blocker for private sideload.
+
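The exerciser's false negative can be reproduced without a network. The sketch below is an illustration only: `posixpath.normpath` stands in for the dot-segment collapsing a normalizing client performs before sending, which is an approximation of that behavior and not curl's actual code. The first half shows the point the two code readers made, that `urlparse` hands `..` through untouched.

```python
import posixpath
from urllib.parse import urlparse

raw_target = "/assets/../../data/crm.db"

# What the server sees from a raw client: urlparse leaves dot segments alone.
server_path = urlparse(raw_target).path

# What a normalizing client effectively requests after collapsing dot segments.
# posixpath.normpath is a stand-in for that client-side step.
normalized_path = posixpath.normpath(raw_target)

# The normalized probe no longer starts with /assets/, so it never reaches
# the vulnerable branch and the traversal looks "blocked".
probe_reaches_assets_branch = normalized_path.startswith("/assets/")
```

This is why the probe needs a client that sends the path exactly as written.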
+## Suggested order of work
+
+1. **Close the P0 traversal first** — `os.path.realpath` + `commonpath` containment check (or stdlib `translate_path`) on the `/assets/` branch in `backend/server.py:1717`, and confirm with `curl --path-as-is 'http://host/assets/../../data/crm.db'`. Until this is fixed the "private" claim is false; do not deploy.
+2. **Fix the P1 outreach leak** — pass `ner_fn=_ner_local` to the `Boundary` in `outreach_agent.py:230` (mirroring `architect_grounding.py`), fail closed if NER is unreachable, add a minimize-first pass. Both #1 and #2 directly protect the LP data the project exists to protect.
+3. **Make the test suite green and authoritative** — fix the two stale thesis assertions (assert structurally, not on an exact count) and add a one-line aggregate runner so "do the tests pass" has a single answer; then the suite can gate the next deploy.
+4. **Fix the two functional bugs** — add `AND deleted_at IS NULL` to the get-by-ID handlers (P1), and wrap query-param `int()` parsing to return a 400 instead of crashing the thread (P2).
+5. **Deploy-prep the package** — update `ABOUT.md` (and README) to current first-boot behavior so a fresh install isn't a mystery empty CRM, and lift the hardcoded Spark/Qdrant IPs into config/env before the box ever moves networks.
+6. **Then verify the live Phase-1 paths on the box** with a real `ANTHROPIC_API_KEY` — the outreach/architect/thesis features that no agent could exercise here.
+7. **Hardening sweep** (P3 batch) — bump `cryptography`, drop the vestigial vulnerable `requirements.txt` entries, stop trusting `X-Forwarded-For`, cap request-body size, run as non-root.
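Step 1 names `os.path.realpath` plus `commonpath` as one way to contain the `/assets/` route. A minimal sketch of that variant follows, assuming a hypothetical helper `is_contained` and a throwaway directory layout. The commit itself uses a `startswith(FRONTEND_ROOT + os.sep)` comparison instead, so this is an alternative formulation and not the shipped code.

```python
import os
import tempfile


def is_contained(root, request_path):
    """True if request_path, joined onto root, resolves inside root."""
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, request_path.lstrip("/")))
    # commonpath compares whole path components, so a sibling such as
    # /srv/frontend-evil is not mistaken for a child of /srv/frontend.
    return os.path.commonpath([real_root, candidate]) == real_root


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring frontend/ next to data/.
    frontend = os.path.join(tmp, "frontend")
    os.makedirs(os.path.join(frontend, "assets"))
    os.makedirs(os.path.join(tmp, "data"))
    inside = is_contained(frontend, "/assets/app.css")
    escaped = is_contained(frontend, "/assets/../../data/crm.db")
```

Because `realpath` also resolves symlinks, a link inside the frontend directory that points elsewhere is rejected by the same comparison.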
@@ -12,6 +12,7 @@ import os
 import sys
 
 _HERE = os.path.dirname(os.path.abspath(__file__))
+sys.path.insert(0, _HERE)  # backend/mcp on path for sibling imports (architect_grounding, architect_agent)
 
 # outreach_type -> human description woven into the prompt
 OUTREACH_TYPES = {
@@ -223,11 +224,15 @@ def draft_outreach(conn, investor_id, outreach_type, guidance, db_path, sender_e
     voice_blocks, voice_meta = _voice_examples(conn, sender_email, outreach_type)
 
     # 1) Scrub the sender's voice examples + the recipient context TOGETHER (shared token
-    # space). Nothing reaches Claude in the clear; the voice examples are reference only.
+    # space). The recipient context is free-prose email bodies, so the dictionary+regex
+    # floor is NOT enough — pass the local-Qwen NER backstop (as architect_grounding does)
+    # to tokenize unknown people/firms not in the CRM. FAILS CLOSED: if the local model is
+    # unreachable, _ner_local raises here and no de-anonymized draft is returned.
     try:
         sys.path.insert(0, os.path.dirname(_HERE))  # backend/ for the redaction package
         from redaction.client import Boundary
-        boundary = Boundary(db_path=db_path, actor="closer")
+        from architect_grounding import _ner_local  # local-Qwen NER backstop (sibling module)
+        boundary = Boundary(db_path=db_path, actor="closer", ner_fn=_ner_local)
         scrubbed = boundary.scrub(list(voice_blocks) + [context], bucket=False, conn=conn)
     except Exception as exc:
         return {"status": "scrub_unavailable", "reason": str(exc)}
@@ -237,7 +242,6 @@ def draft_outreach(conn, investor_id, outreach_type, guidance, db_path, sender_e
 
     # 2) Claude drafts over the de-identified context + voice + (non-sensitive) thesis.
     try:
-        sys.path.insert(0, _HERE)
         import architect_agent as aa
         thesis = aa.at.get_thesis("core", db=db_path)
         raw = _draft_with_claude(aa, thesis, type_desc, deident_target, deident_voice, guidance)
@@ -73,6 +73,7 @@ BASE_DIR = os.path.dirname(os.path.abspath(__file__))
 PROJECT_DIR = os.path.dirname(BASE_DIR)
 DATA_DIR = os.environ.get("CRM_DATA_DIR", os.path.join(PROJECT_DIR, "data"))
 FRONTEND_DIR = os.environ.get("CRM_FRONTEND_DIR", os.path.join(PROJECT_DIR, "frontend"))
+FRONTEND_ROOT = os.path.realpath(FRONTEND_DIR)  # resolved once; the /assets/ containment boundary
 DB_PATH = os.environ.get("CRM_DB_PATH", os.path.join(DATA_DIR, "crm.db"))
 SECRET_KEY = os.environ.get("CRM_SECRET_KEY", "venture-crm-secret-change-in-production-" + str(uuid.uuid4()))
 TOKEN_EXPIRY_HOURS = 24
@@ -1716,6 +1717,14 @@ class CRMHandler(BaseHTTPRequestHandler):
             return self.send_file(os.path.join(FRONTEND_DIR, 'index.html'))
         if path.startswith('/assets/'):
             filepath = os.path.join(FRONTEND_DIR, path.lstrip('/'))
+            # Containment check: get_path()/urlparse does NOT normalize '..', so without
+            # this an unauthenticated GET /assets/../../data/crm.db (raw client) would read
+            # any file the process can — the LP DB, the JWT secret, the Gmail key. Resolve
+            # and require the target stay under FRONTEND_ROOT; 404 (not 403) so it looks like
+            # any other miss and still trips the scanner abuse counter.
+            _real = os.path.realpath(filepath)
+            if _real != FRONTEND_ROOT and not _real.startswith(FRONTEND_ROOT + os.sep):
+                return self.send_error_json("File not found", 404)
             ext = os.path.splitext(path)[1].lower()
             content_types = {
                 '.css': 'text/css',
@@ -2185,7 +2194,7 @@ class CRMHandler(BaseHTTPRequestHandler):
             SELECT c.*, o.name as organization_name
             FROM contacts c
             LEFT JOIN organizations o ON c.organization_id = o.id
-            WHERE c.id = ?
+            WHERE c.id = ? AND c.deleted_at IS NULL
         """, (contact_id,)).fetchone()
 
         if not contact:
@@ -2198,16 +2207,16 @@ class CRMHandler(BaseHTTPRequestHandler):
         result['communications'] = rows_to_list(conn.execute(
             """SELECT cm.*, u.full_name as created_by_name
                FROM communications cm LEFT JOIN users u ON cm.created_by = u.id
-               WHERE cm.contact_id = ? ORDER BY cm.communication_date DESC LIMIT 20""",
+               WHERE cm.contact_id = ? AND cm.deleted_at IS NULL ORDER BY cm.communication_date DESC LIMIT 20""",
             (contact_id,)
         ).fetchall())
 
         result['opportunities'] = rows_to_list(conn.execute(
-            "SELECT * FROM opportunities WHERE contact_id = ? ORDER BY updated_at DESC",
+            "SELECT * FROM opportunities WHERE contact_id = ? AND deleted_at IS NULL ORDER BY updated_at DESC",
             (contact_id,)
         ).fetchall())
 
-        lp = conn.execute("SELECT * FROM lp_profiles WHERE contact_id = ?", (contact_id,)).fetchone()
+        lp = conn.execute("SELECT * FROM lp_profiles WHERE contact_id = ? AND deleted_at IS NULL", (contact_id,)).fetchone()
         result['lp_profile'] = row_to_dict(lp) if lp else None
 
         conn.close()
@@ -2362,17 +2371,17 @@ class CRMHandler(BaseHTTPRequestHandler):
 
     def handle_get_organization(self, user, org_id):
         conn = get_db()
-        org = conn.execute("SELECT * FROM organizations WHERE id = ?", (org_id,)).fetchone()
+        org = conn.execute("SELECT * FROM organizations WHERE id = ? AND deleted_at IS NULL", (org_id,)).fetchone()
         if not org:
             conn.close()
             return self.send_error_json("Organization not found", 404)
 
         result = row_to_dict(org)
         result['contacts'] = rows_to_list(conn.execute(
-            "SELECT * FROM contacts WHERE organization_id = ? ORDER BY last_name", (org_id,)
+            "SELECT * FROM contacts WHERE organization_id = ? AND deleted_at IS NULL ORDER BY last_name", (org_id,)
         ).fetchall())
         result['opportunities'] = rows_to_list(conn.execute(
-            "SELECT * FROM opportunities WHERE organization_id = ? ORDER BY updated_at DESC", (org_id,)
+            "SELECT * FROM opportunities WHERE organization_id = ? AND deleted_at IS NULL ORDER BY updated_at DESC", (org_id,)
         ).fetchall())
         conn.close()
         return self.send_json({"data": result})
@@ -2498,7 +2507,7 @@ class CRMHandler(BaseHTTPRequestHandler):
             LEFT JOIN contacts c ON op.contact_id = c.id
             LEFT JOIN organizations o ON op.organization_id = o.id
             LEFT JOIN users u ON op.owner_id = u.id
-            WHERE op.id = ?
+            WHERE op.id = ? AND op.deleted_at IS NULL
         """, (opp_id,)).fetchone()
 
         if not opp:
@@ -2509,7 +2518,7 @@ class CRMHandler(BaseHTTPRequestHandler):
         result['communications'] = rows_to_list(conn.execute(
             """SELECT cm.*, u.full_name as created_by_name
                FROM communications cm LEFT JOIN users u ON cm.created_by = u.id
-               WHERE cm.opportunity_id = ? ORDER BY cm.communication_date DESC""",
+               WHERE cm.opportunity_id = ? AND cm.deleted_at IS NULL ORDER BY cm.communication_date DESC""",
             (opp_id,)
         ).fetchall())
 
@@ -2975,7 +2984,7 @@ class CRMHandler(BaseHTTPRequestHandler):
             FROM lp_profiles lp
             LEFT JOIN contacts c ON lp.contact_id = c.id
             LEFT JOIN organizations o ON c.organization_id = o.id
-            WHERE lp.id = ?
+            WHERE lp.id = ? AND lp.deleted_at IS NULL
         """, (lp_id,)).fetchone()
         if not lp:
             conn.close()
@@ -23,4 +23,6 @@ Read this before editing anything that sends data to a Claude model — the reda
 
 Trace the data path: any field carrying LP substance must cross `Boundary` first. A new MCP tool that reads CRM rows and hands them to a model without scrubbing is a leak — add it to the redaction path and extend the leak tests in `backend/redaction/test_*.py`.
 
+A Claude path that sends **free-prose** LP content (email bodies, notes) must pass `ner_fn=_ner_local` to `Boundary` and **fail closed** if the local model is down — the dictionary+regex floor only tokenizes KNOWN CRM entities, so unknown people/firms in prose leak otherwise. See `backend/mcp/architect_grounding.py` (does it right) and `backend/mcp/outreach_agent.py`.
+
 See also `docs/redaction-rehydration.md` and `docs/spark-control-scrub-endpoints.md`.
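The rule added to the guide has two parts: a dictionary pass over known entities, and an NER backstop that must abort the send if it cannot run. The toy scrubber below sketches both. It is not the repository's `Boundary` class. `scrub`, `ScrubUnavailable`, and the `[ENTITY_n]` token format are hypothetical names chosen for illustration, and `ner_fn` is assumed to be any callable that takes text and returns the names it finds.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


class ScrubUnavailable(Exception):
    """Raised when the NER backstop cannot run; the caller must not send."""


def scrub(
    text: str,
    known_entities: Iterable[str],
    ner_fn: Optional[Callable[[str], Iterable[str]]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Replace entity names with tokens; return the text and a token map."""
    names = set(known_entities)
    if ner_fn is not None:
        try:
            # Backstop: catches people and firms the dictionary has never seen.
            names.update(ner_fn(text))
        except Exception as exc:
            # Fail closed: never fall back to the dictionary-only result.
            raise ScrubUnavailable("NER backstop failed") from exc
    mapping: Dict[str, str] = {}
    # Longest names first so "Acme Capital" is replaced before "Acme".
    for name in sorted(names, key=lambda n: (-len(n), n)):
        if name and name in text:
            token = f"[ENTITY_{len(mapping) + 1}]"
            mapping[token] = name
            text = text.replace(name, token)
    return text, mapping
```

Called without `ner_fn`, the sketch reproduces the leak described above: only names already in the dictionary are tokenized. Raising instead of degrading is the fail-closed half of the rule.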
@@ -38,8 +38,9 @@ export const PACKAGE_TITLE = 'Ten31 Database'
 // * 0.1.0:70 (outreach voice upgrade — per-user voice from own emails + transparency; active-thread context)
 // * 0.1.0:71 (voice by-purpose larger sample + Tier-B: create Gmail draft w/ in-thread reply)
 // * 0.1.0:72 (stage v2.0 reserve-asset thesis spine as Workshop candidates)
-// * Current: 0.1.0:73 (replace old settlement spine with v2.0 reserve-asset spine across Architect + outreach prompts, seed constants, and docs; promote v2.0 to the working approved spine + soft-retire old settlement nodes, reversibly, node-level only)
-export const PACKAGE_VERSION = '0.1.0:73'
+// * 0.1.0:73 (replace old settlement spine with v2.0 reserve-asset spine across Architect + outreach prompts, seed constants, and docs; promote v2.0 to the working approved spine + soft-retire old settlement nodes, reversibly, node-level only)
+// * Current: 0.1.0:74 (security/privacy hardening — full-eval P0+2×P1: close /assets/ path traversal, add NER backstop to the outreach redaction boundary, filter deleted_at on get-by-id)
+export const PACKAGE_VERSION = '0.1.0:74'
 
 export const DATA_MOUNT_PATH = '/data'
 export const WEB_PORT = 8080
@@ -34,8 +34,9 @@ import { v_0_1_0_70 } from './v0.1.0.70'
 import { v_0_1_0_71 } from './v0.1.0.71'
 import { v_0_1_0_72 } from './v0.1.0.72'
 import { v_0_1_0_73 } from './v0.1.0.73'
+import { v_0_1_0_74 } from './v0.1.0.74'
 
 export const versionGraph = VersionGraph.of({
-current: v_0_1_0_73,
-other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43, v_0_1_0_44, v_0_1_0_45, v_0_1_0_46, v_0_1_0_47, v_0_1_0_48, v_0_1_0_49, v_0_1_0_50, v_0_1_0_51, v_0_1_0_52, v_0_1_0_53, v_0_1_0_54, v_0_1_0_55, v_0_1_0_56, v_0_1_0_57, v_0_1_0_58, v_0_1_0_59, v_0_1_0_60, v_0_1_0_61, v_0_1_0_62, v_0_1_0_63, v_0_1_0_64, v_0_1_0_65, v_0_1_0_66, v_0_1_0_67, v_0_1_0_68, v_0_1_0_69, v_0_1_0_70, v_0_1_0_71, v_0_1_0_72],
+current: v_0_1_0_74,
+other: [v_0_1_0_39, v_0_1_0_40, v_0_1_0_41, v_0_1_0_42, v_0_1_0_43, v_0_1_0_44, v_0_1_0_45, v_0_1_0_46, v_0_1_0_47, v_0_1_0_48, v_0_1_0_49, v_0_1_0_50, v_0_1_0_51, v_0_1_0_52, v_0_1_0_53, v_0_1_0_54, v_0_1_0_55, v_0_1_0_56, v_0_1_0_57, v_0_1_0_58, v_0_1_0_59, v_0_1_0_60, v_0_1_0_61, v_0_1_0_62, v_0_1_0_63, v_0_1_0_64, v_0_1_0_65, v_0_1_0_66, v_0_1_0_67, v_0_1_0_68, v_0_1_0_69, v_0_1_0_70, v_0_1_0_71, v_0_1_0_72, v_0_1_0_73],
 })
@@ -0,0 +1,27 @@
+import { VersionInfo } from '@start9labs/start-sdk'
+
+// Security/privacy hardening from the 2026-06-12 full-eval (P0 + two P1s). Code-only,
+// no schema change (migrations are no-ops):
+// * P0 — pre-auth path traversal in the /assets/ route (server.py): get_path()/urlparse
+// does not normalize '..', so an unauthenticated GET /assets/../../data/crm.db (raw
+// client) read any file the process could — the LP DB, the JWT signing secret (-> admin
+// token forgery), the Gmail service-account key. Added a realpath containment check that
+// 404s anything resolving outside FRONTEND_DIR.
+// * P1 — the LP-outreach drafter (mcp/outreach_agent.py) built its redaction Boundary with
+// no ner_fn, so unknown people/firms in raw email bodies reached Claude in the clear.
+// Now passes the local-Qwen NER backstop (ner_fn=_ner_local) like architect_grounding;
+// fails closed via the existing scrub_unavailable path if the local model is down.
+// * P1 — get-by-ID handlers for contacts and organizations (server.py) omitted the
+// deleted_at IS NULL filter, so soft-deleted records stayed readable by direct ID.
+export const v_0_1_0_74 = VersionInfo.of({
+version: '0.1.0:74',
+releaseNotes: {
+en_US: [
+'Security hardening: close an unauthenticated file-read in static-asset serving (could expose',
+'the database, the auth secret, and the Gmail key), tighten the LP-outreach privacy boundary so',
+'unknown names in email bodies are de-identified before reaching Claude, and stop soft-deleted',
+'contacts and organizations from being readable by direct link.',
+].join(' '),
+},
+migrations: { up: async () => {}, down: async () => {} },
+})
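The P0 note above describes a realpath containment check for the `/assets/` route. The `server.py` hunk that implements it is not shown in this part of the diff, so the following is only a sketch of the general technique under stated assumptions. `resolve_asset` and its parameters are hypothetical names, and the caller is assumed to have already stripped the `/assets/` prefix from the request path.

```python
import os
from typing import Optional


def resolve_asset(frontend_root: str, relative_path: str) -> Optional[str]:
    """Return the real path of a servable file, or None (treat as a 404)."""
    root = os.path.realpath(frontend_root)
    # Drop leading slashes so os.path.join cannot discard the root.
    candidate = os.path.realpath(os.path.join(root, relative_path.lstrip("/")))
    try:
        # realpath has already collapsed '..' and followed symlinks, so a
        # candidate outside the root no longer shares the root as its prefix.
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    if not inside or not os.path.isfile(candidate):
        return None
    return candidate
```

Comparing with `os.path.commonpath` instead of a string `startswith` avoids accepting a sibling directory such as `frontend_evil` when the root is `frontend`. Resolving both sides with `realpath` also rejects symlinks inside the root that point outside it.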