2e70b34592
Phase 1 Workstream D. Lets the Architect ground the thesis in REAL recurring LP
objections without any LP identity reaching the Claude API. Layered,
defense-in-depth, fail-closed by construction (docs/redaction-rehydration.md).

backend/redaction/:
- scrub.py: the leak-proof core. Drops Tier-1 (labelled/structured
  account/wire/SSN/IBAN/SWIFT/passport, separator-tolerant); tokenizes known
  LP entities (dictionary from the canonical layer, unicode-folded +
  hyphen-extended) and structured PII (emails, scheme-less/social URLs,
  intl+ext phones, currency-cued amounts, ISO/worded/numeric/quarter dates,
  addresses, bare long digit runs); pre-neutralizes injected [TYPE_N] strings;
  single-pass rehydrate; metadata-only audit logging (the pseudonym map is the
  de-anon key: local-only, never logged or sent). Hardened across THREE
  adversarial leak-hunts (worded/coded amounts, intl phones,
  NFD/ligature/zero-width names, slash/comma SSN, SWIFT, alpha-prefixed
  accounts, substance-preserving false-positive fixes).
- client.py: the boundary. One scrub/rehydrate contract, SCRUB_BACKEND=local
  (default) or gateway (Spark Control /scrub + /rehydrate). Fails closed
  (db_path required; dictionary build errors propagate; strict rehydrate
  returns tokenized, not de-anonymized, text).
- test_scrub_leak.py, test_reidentification.py: golden-file leak and
  re-identification suites (synthetic data only, guardrail #9),
  regression-locking every leak-hunt vector.

backend/mcp/architect_grounding.py: the flow is retrieve (local) ->
minimize-first (local Qwen) -> scrub (+ local-Qwen NER backstop for unknown
names) -> Claude over the de-identified register only -> re-hydrate locally
-> human review. FAILS CLOSED if the local model is unreachable or a
hallucinated token appears. test_grounding_boundary.py proves that nothing
sensitive reaches Claude and covers the three fail-closed paths.

server.py: POST /api/architect/ground (admin) wires retrieval ->
ground_objections.
docker_entrypoint.sh: SCRUB_BACKEND (default local).
docs/spark-control-scrub-endpoints.md: the gateway handover spec (Option 1:
caller supplies the entity dictionary).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
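The single-pass rehydrate and the fail-closed behaviour on a hallucinated token described in this commit can be illustrated with a small POSIX shell + awk sketch. It is hypothetical and is not the code in backend/redaction/ or backend/mcp/architect_grounding.py: the function name `rehydrate`, the tab-separated map-file format (one `TOKEN<TAB>original` pair per line) and the `[A-Z]+_[0-9]+` token shape are assumptions for illustration, and the example values are synthetic.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the code in backend/redaction/. Assumes a pseudonym
# map file with one "TOKEN<TAB>original" pair per line and tokens shaped [TYPE_N].
set -eu

# rehydrate MAPFILE < tokenized_text
#   Replaces each [TYPE_N] token in a single left-to-right pass: the inserted
#   original text goes straight to the output and is never rescanned.
#   Fails closed: any token absent from the map (e.g. one the model invented)
#   makes it print nothing and exit 3. An unreadable map yields an empty map,
#   so every token fails closed as well.
rehydrate() {
    awk -v mapfile="$1" '
        BEGIN {
            while ((getline line < mapfile) > 0) {
                split(line, kv, "\t")
                map[kv[1]] = kv[2]
            }
            close(mapfile)
        }
        {
            out = ""
            rest = $0
            while (match(rest, /\[[A-Z]+_[0-9]+]/)) {
                tok = substr(rest, RSTART, RLENGTH)
                if (!(tok in map)) {
                    print "rehydrate: unknown token " tok > "/dev/stderr"
                    failed = 1
                    exit 3
                }
                out = out substr(rest, 1, RSTART - 1) map[tok]
                rest = substr(rest, RSTART + RLENGTH)
            }
            buf = buf out rest "\n"   # buffer; emit only if every line succeeds
        }
        END {
            if (failed) exit 3
            printf "%s", buf
        }'
}

# Example with synthetic data only.
map="$(mktemp)"
printf '[PERSON_1]\tJane Doe\n[LP_1]\tExample Family Office\n' > "$map"
printf 'Objection raised by [PERSON_1] at [LP_1].\n' | rehydrate "$map"
# prints: Objection raised by Jane Doe at Example Family Office.
rm -f "$map"
```

Because each replacement is appended to the output and never rescanned, an original value that happens to look like a token is emitted literally instead of being expanded again. Because output is buffered until the end, a caller that checks the exit status never sees partially re-hydrated text.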
113 lines
6.6 KiB
Bash
Executable File
#!/bin/sh
# ═══════════════════════════════════════════════════════════════
# Ten31 Database container entrypoint (StartOS 0.4 wrapper)
# ═══════════════════════════════════════════════════════════════
#
# Responsibilities:
#   1. Ensure the mounted /data volume directories exist.
#   2. Ensure a persistent CRM_SECRET_KEY exists so issued JWTs
#      survive container restarts.
#   3. Export the Gmail integration env vars when the service-account
#      key is present, and load the Architect's Anthropic API key.
#   4. Export the ingest / retrieval / redaction env and start the
#      background ingest sync scheduler.
#   5. Launch the Python backend server.
#
# Note: This entrypoint NO LONGER seeds /data from a baked-in
# snapshot. The 0.3.5 → 0.4 migration is complete; from 0.1.0:40
# forward, the live /data volume on the StartOS host is the sole
# source of truth. StartOS preserves /data across sideloads, so
# upgrades will not disturb live data.
# ═══════════════════════════════════════════════════════════════

set -eu

DATA_DIR="${CRM_DATA_DIR:-/data}"
SECRET_FILE="$DATA_DIR/.crm-secret"
SECRETS_DIR="$DATA_DIR/secrets"
EMAIL_ATTACHMENTS_DIR="$DATA_DIR/email_attachments"
GMAIL_SA_KEY="$SECRETS_DIR/gmail-service-account.json"

mkdir -p "$DATA_DIR" "$DATA_DIR/backups" "$SECRETS_DIR" "$EMAIL_ATTACHMENTS_DIR"
# /data/secrets holds the Gmail service-account key and the Anthropic API
# key; lock it down so only the container user can read the directory.
# chmod on the key files themselves is the operator's responsibility when
# they drop them in.
chmod 700 "$SECRETS_DIR" 2>/dev/null || true

# ── Persistent JWT secret ───────────────────────────────────────
if [ -z "${CRM_SECRET_KEY:-}" ]; then
    if [ -f "$SECRET_FILE" ]; then
        CRM_SECRET_KEY="$(cat "$SECRET_FILE")"
    else
        CRM_SECRET_KEY="$(head -c 48 /dev/urandom | base64 | tr -d '\n' | tr '/+' 'ab')"
        # Create the file under umask 077 so it is 0600 from the moment it
        # exists (no window where the default umask leaves it readable).
        (umask 077 && printf '%s' "$CRM_SECRET_KEY" > "$SECRET_FILE")
        chmod 600 "$SECRET_FILE"
    fi
    export CRM_SECRET_KEY
fi

# ── Gmail integration env vars ──────────────────────────────────
# The integration is enabled only if the service-account key file is
# actually present on the /data volume. This makes the package
# self-disabling on fresh installs until an operator drops the key in.
if [ -f "$GMAIL_SA_KEY" ]; then
    export CRM_GMAIL_INTEGRATION_ENABLED="${CRM_GMAIL_INTEGRATION_ENABLED:-true}"
    export CRM_GMAIL_AUTH_METHOD="${CRM_GMAIL_AUTH_METHOD:-dwd}"
    export CRM_GMAIL_SA_KEY_PATH="${CRM_GMAIL_SA_KEY_PATH:-$GMAIL_SA_KEY}"
    export CRM_GMAIL_WORKSPACE_DOMAIN="${CRM_GMAIL_WORKSPACE_DOMAIN:-ten31.xyz}"
    export CRM_GMAIL_SYNC_INTERVAL_MIN="${CRM_GMAIL_SYNC_INTERVAL_MIN:-180}"
    echo "[entrypoint] Gmail integration: ENABLED (key at $GMAIL_SA_KEY)"
else
    echo "[entrypoint] Gmail integration: DISABLED (no key at $GMAIL_SA_KEY)"
fi

# ── Architect (Claude) API key ──────────────────────────────────
# The Architect agent (thesis generation) runs on Claude. Drop your Anthropic
# API key into $SECRETS_DIR/anthropic-api-key to enable it; the key stays on
# the box. Self-disabling until the key is present (generation endpoints
# return a clear "not configured" error). An ANTHROPIC_API_KEY already set in
# the service environment takes precedence over the file.
ANTHROPIC_KEY_FILE="$SECRETS_DIR/anthropic-api-key"
if [ -z "${ANTHROPIC_API_KEY:-}" ] && [ -f "$ANTHROPIC_KEY_FILE" ]; then
    export ANTHROPIC_API_KEY="$(tr -d '\n\r' < "$ANTHROPIC_KEY_FILE")"
    echo "[entrypoint] Architect: ANTHROPIC_API_KEY loaded from $ANTHROPIC_KEY_FILE"
elif [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "[entrypoint] Architect: no API key yet (drop it at $ANTHROPIC_KEY_FILE to enable thesis generation)"
fi

# ── Phase-0 ingest / retrieval env ──────────────────────────────
# These are consumed by the ingest pipeline (backend/ingest/) and the MCP
# server (backend/mcp/), NOT by the CRM web server, which ignores them.
# They are exported here so the "Build search index" StartOS action and any
# manual `python3 /app/backend/ingest/...` / `backend/mcp/server.py` run on the
# box inherit them.
#
# OPERATOR: the values below are LAN defaults for the Ten31 deployment. Set the
# real ones for your network, either by editing them here before building the
# image, or by overriding the env vars in the StartOS service environment.
# Point SPARK_CONTROL_URL at the Spark Control gateway (TLS, self-signed by
# default → SPARK_CONTROL_VERIFY_TLS=false) and QDRANT_URL at Qdrant on Spark 2.
export CRM_DB_PATH="${CRM_DB_PATH:-$DATA_DIR/crm.db}"
export SPARK_CONTROL_URL="${SPARK_CONTROL_URL:-https://192.168.1.72:62419}"
export SPARK_CONTROL_VERIFY_TLS="${SPARK_CONTROL_VERIFY_TLS:-false}"
export QDRANT_URL="${QDRANT_URL:-http://192.168.1.87:6333}"
# Redaction boundary backend for the Architect's grounding step (Workstream D):
#   local (default) = in-repo deterministic scrubber (backend/redaction/); the
#                     pseudonym map stays in-process.
#   gateway         = Spark Control POST /scrub + /rehydrate, once that ships.
# Flip to 'gateway' only after the Spark Control endpoints are live (same contract).
export SCRUB_BACKEND="${SCRUB_BACKEND:-local}"
# OPERATOR: how often (minutes) the background sync scheduler re-runs the
# incremental ingest sync to keep the Qdrant search index fresh. Default 60.
export CRM_INGEST_SYNC_INTERVAL_MIN="${CRM_INGEST_SYNC_INTERVAL_MIN:-60}"

# ── Background ingest sync scheduler ────────────────────────────
# Keep the Qdrant search index fresh hands-off: sync_scheduler.py loops the
# incremental sync every CRM_INGEST_SYNC_INTERVAL_MIN minutes. It runs as a
# BACKGROUND process (not a StartOS daemon); see INGEST_PACKAGING.md for the
# daemon-vs-background-process tradeoff. Started only when ingest is configured,
# i.e. both Spark Control and Qdrant endpoints are set; otherwise the loop would
# just error every interval with nothing to talk to.
# NOTE: both URLs get `${VAR:-default}` fallbacks above, and `:-` also replaces
# an empty value, so as written this guard always passes. It only takes effect
# if those defaults are removed.
if [ -n "${SPARK_CONTROL_URL:-}" ] && [ -n "${QDRANT_URL:-}" ]; then
    # Use the same DB path and data dir as the rest of this script, so an
    # overridden CRM_DATA_DIR / CRM_DB_PATH is honoured here too.
    (cd /app/backend/ingest && python3 sync_scheduler.py --db "$CRM_DB_PATH" >> "$DATA_DIR/ingest-sync.log" 2>&1 &)
    echo "[entrypoint] ingest sync scheduler: STARTED"
else
    echo "[entrypoint] ingest sync scheduler: SKIPPED (Spark/Qdrant not configured)"
fi

# ── Launch the app ──────────────────────────────────────────────
exec python3 /app/backend/server.py
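The entrypoint's persistent JWT secret step (reuse the key in `$DATA_DIR/.crm-secret` if it exists, otherwise generate and persist one) can be exercised outside the container. The sketch below is hypothetical: `ensure_secret` is an illustrative helper, not part of the entrypoint, and a throwaway `mktemp -d` directory stands in for /data. It creates the file under `umask 077`, so the key is mode 0600 from the moment the file exists, and it assumes `head -c`, `base64` and `mktemp -d` are available, as they are in typical GNU or BusyBox images. Calling it twice against the same file stands in for a container restart.

```shell
#!/bin/sh
# Hypothetical helper (not part of the entrypoint) mirroring its
# reuse-or-generate logic for CRM_SECRET_KEY.
set -eu

ensure_secret() {
    secret_file=$1
    if [ -f "$secret_file" ]; then
        # Restart path: reuse the key persisted on the volume.
        cat "$secret_file"
    else
        # First boot: 48 random bytes -> 64 base64 chars (no '=' padding);
        # '/' and '+' are mapped to letters so the key is plain alphanumeric.
        key="$(head -c 48 /dev/urandom | base64 | tr -d '\n' | tr '/+' 'ab')"
        # umask 077 makes the new file 0600 from the moment it is created.
        (umask 077 && printf '%s' "$key" > "$secret_file")
        printf '%s' "$key"
    fi
}

tmp="$(mktemp -d)"                            # stand-in for /data
first="$(ensure_secret "$tmp/.crm-secret")"   # first boot: generates
second="$(ensure_secret "$tmp/.crm-secret")"  # "restart": reuses
if [ "$first" = "$second" ]; then
    echo "secret survived restart (${#first} chars)"   # prints 64 chars
fi
rm -rf "$tmp"
```

Because the key lives on the /data volume rather than in the image or the container's writable layer, JWTs signed before a restart or sideload still verify afterwards.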