Phase 0 foundation: canonical schema, ingest pipeline, CRM MCP server

Workstream A–C substrate for the Ten31 agentic system:
- A1: docs/crm-overview.md; CLAUDE.md conventions + guardrail #9
- A2: additive/reversible core migration (canonical_entities, entity_links,
  interaction_log, relationship_edges, soft-delete) + ledgered runner
- B1/B3: chunking + deterministic entity resolution (backend/ingest)
- B2: dense (bge-m3) + BM25 sparse ingest to Qdrant crm_chunks
- C: CRM MCP server (reads, retrieval modes, logged writes) — no outbound tools
- docs: redaction/re-hydration, Gmail enablement runbook
- synthetic test data; .env.example; housekeeping (.gitignore, untrack crm.db,
  drop legacy files + start9/0.3.5)

Verified end-to-end on synthetic data + live Sparks (hybrid > dense on entity
queries). Real backfill runs on Ten31 infra; index holds synthetic data only.
Branch snapshot also captures pre-existing working-tree changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keysat
2026-06-05 08:11:28 -05:00
parent 7027efd777
commit c7ce44d963
99 changed files with 10676 additions and 7817 deletions
@@ -0,0 +1,116 @@
#!/bin/bash
# ═══════════════════════════════════════════════════════════════
# refresh_seed.sh
# Pull the live Ten31 Database data off a StartOS 0.3.5 host
# and stage it as the seed snapshot baked into the 0.4 image.
# ═══════════════════════════════════════════════════════════════
#
# Usage:
#   ./refresh_seed.sh <ssh-user@host> [remote-data-dir]
#
# Examples:
#   ./refresh_seed.sh start9@192.168.1.50
#   ./refresh_seed.sh embassy@embassy.local \
#       /embassy-data/package-data/volumes/ten-database/data/main
#
# What it does:
#   1. Finds the remote /data directory for the ten-database service.
#   2. Copies crm.db, backups/, and (optionally) .crm-secret into
#      start9/0.4/seed/data/ on this machine.
#   3. Prints a directory listing, a SQLite integrity check, and
#      per-table row counts so you can verify the content.
#
# After it finishes, run:
#   make clean && make x86
# from this (start9/0.4/) directory to rebuild the .s9pk.
# ═══════════════════════════════════════════════════════════════
set -eu

if [ $# -lt 1 ]; then
  echo "Usage: $0 <ssh-user@host> [remote-data-dir]"
  echo ""
  echo "Remote data dir defaults (tried in order):"
  echo " /embassy-data/package-data/volumes/ten-database/data/main"
  echo " /mnt/embassy-os/package-data/volumes/ten-database/data/main"
  echo " /var/lib/embassy/services/ten-database/data"
  exit 1
fi
REMOTE="$1"
REMOTE_DIR="${2:-}"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SEED_DIR="$SCRIPT_DIR/seed/data"
echo ""
echo " Staging production seed from $REMOTE"
echo " into $SEED_DIR"
echo ""
# Auto-detect remote data dir if not supplied
if [ -z "$REMOTE_DIR" ]; then
  echo " Probing for remote data directory..."
  for candidate in \
      "/embassy-data/package-data/volumes/ten-database/data/main" \
      "/mnt/embassy-os/package-data/volumes/ten-database/data/main" \
      "/var/lib/embassy/services/ten-database/data"; do
    if ssh "$REMOTE" "[ -f \"$candidate/crm.db\" ]" 2>/dev/null; then
      REMOTE_DIR="$candidate"
      echo " found: $REMOTE_DIR"
      break
    fi
  done
  if [ -z "$REMOTE_DIR" ]; then
    echo " Could not auto-detect a valid data directory with crm.db on $REMOTE."
    echo " Re-run this script and pass the path explicitly as the 2nd argument."
    exit 2
  fi
fi
mkdir -p "$SEED_DIR/backups"
echo ""
echo " Copying crm.db ..."
scp "$REMOTE:$REMOTE_DIR/crm.db" "$SEED_DIR/crm.db"
echo " Copying backups/ (if present) ..."
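if ssh "$REMOTE" "[ -d \"$REMOTE_DIR/backups\" ]" 2>/dev/null; then
  # Backups are best-effort, but report a partial copy instead of hiding it.
  scp -r "$REMOTE:$REMOTE_DIR/backups/." "$SEED_DIR/backups/" \
    || echo " warning: some backups failed to copy; continuing"
else
  echo " (none found, skipping)"
fi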
echo " Copying .crm-secret (optional — keeps existing JWTs valid) ..."
if ssh "$REMOTE" "[ -f \"$REMOTE_DIR/.crm-secret\" ]" 2>/dev/null; then
  # If stdin is not a terminal, read hits EOF and returns non-zero; default
  # to "no" instead of letting set -e abort the script before the summary.
  read -r -p " Include .crm-secret in the baked image? [y/N] " ans || ans=""
  case "$ans" in
    [yY]*) scp "$REMOTE:$REMOTE_DIR/.crm-secret" "$SEED_DIR/.crm-secret" ;;
    *) echo " skipping .crm-secret; a fresh secret will be generated on first boot" ;;
  esac
else
  echo " (no .crm-secret on remote)"
fi
echo ""
echo " Summary of staged seed:"
ls -la "$SEED_DIR"
echo ""
if command -v python3 >/dev/null 2>&1 && [ -f "$SEED_DIR/crm.db" ]; then
  # Quoted heredoc: bash performs no expansion inside it, and the DB path is
  # passed as argv[1] so quotes or spaces in the path cannot break the Python.
  python3 - "$SEED_DIR/crm.db" <<'PY'
import sqlite3
import sys

db = sqlite3.connect(sys.argv[1])
cur = db.cursor()
cur.execute("PRAGMA integrity_check")
print(" integrity_check:", cur.fetchone()[0])
for t in ("users", "fundraising_state", "fundraising_funds", "fundraising_views",
          "contacts", "organizations", "audit_log", "feature_requests", "app_settings"):
    try:
        cur.execute(f"SELECT COUNT(*) FROM {t}")
        print(f" {t:30s} {cur.fetchone()[0]} rows")
    except sqlite3.Error as e:
        # Table missing in this schema version: report it rather than fail.
        print(f" {t}: n/a ({e})")
db.close()
PY
fi
echo ""
echo " Seed refreshed. Next: cd $SCRIPT_DIR && make clean && make x86"
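The directory probe in the auto-detect loop could also be factored into a reusable function. This is a minimal sketch, assuming bash; the helper names `first_matching_dir`, `local_has_db`, and `remote_has_db` are hypothetical and not part of this commit. Passing the check in as a command lets the same loop run against local directories for a dry run before it is pointed at a live host over ssh.

```shell
#!/bin/bash
# Sketch only: first_matching_dir, local_has_db and remote_has_db are
# hypothetical helpers, not part of refresh_seed.sh.
set -eu

# Print the first candidate for which "$check <candidate>" succeeds.
# Returns 1 and prints nothing if no candidate matches.
first_matching_dir() {
  local check="$1"
  shift
  local candidate
  for candidate in "$@"; do
    # Running the check inside "if" keeps set -e from aborting on a miss.
    if "$check" "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Local check for a dry run: does <dir>/crm.db exist on this machine?
local_has_db() {
  [ -f "$1/crm.db" ]
}

# Remote check mirroring the script's ssh probe; expects REMOTE to be set.
remote_has_db() {
  ssh "$REMOTE" "[ -f \"$1/crm.db\" ]" 2>/dev/null
}
```

In the script, the three default paths would be passed as arguments with `remote_has_db` as the check. Because an assignment from a failing command substitution trips `set -e`, the call should be guarded, e.g. `REMOTE_DIR="$(first_matching_dir remote_has_db "$@")" || { echo " Could not auto-detect ..."; exit 2; }`, which keeps the existing `exit 2` behavior.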