Phase 0 go-live polish: hands-off incremental sync + refresh action
- backend/ingest/sync_scheduler.py: periodic incremental-sync loop (every CRM_INGEST_SYNC_INTERVAL_MIN min); resilient, --once for testing. - start9/0.4: "Refresh search index" action (incremental sync.py); entrypoint launches the scheduler as a background process when Spark/Qdrant are set; CRM_INGEST_SYNC_INTERVAL_MIN env; pre-release note on fastembed/mcp pins. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
+105
-18
@@ -13,10 +13,11 @@ unchanged.
|
||||
|
||||
| File | Change |
|
||||
| --- | --- |
|
||||
| `Dockerfile` | `COPY backend/ingest` and `COPY backend/mcp` into the image alongside `backend/server.py`. Added two runtime deps to the existing `pip install`: `fastembed==0.4.2` (client-side BM25 / `Qdrant/bm25` for the sparse retrieval leg) and `mcp==1.2.0` (MCP Python SDK, only for `backend/mcp/server.py`). |
|
||||
| `docker_entrypoint.sh` | Added an export block for the ingest/retrieval env: `CRM_DB_PATH`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, with LAN-default placeholder values and an operator comment. The CRM web server ignores these; they exist so manual `python3 /app/backend/ingest/...` and `backend/mcp/server.py` runs on the box inherit them. |
|
||||
| `startos/actions/buildSearchIndex.ts` | **New.** A one-shot "Build search index" StartOS action (Steps 3–4 of the runbook). |
|
||||
| `startos/actions/index.ts` | Registered the new action: `sdk.Actions.of().addAction(buildSearchIndex)`. |
|
||||
| `Dockerfile` | `COPY backend/ingest` and `COPY backend/mcp` into the image alongside `backend/server.py`. Added two runtime deps to the existing `pip install`: `fastembed==0.4.2` (client-side BM25 / `Qdrant/bm25` for the sparse retrieval leg) and `mcp==1.2.0` (MCP Python SDK, only for `backend/mcp/server.py`). **These two pins carry a pre-release multi-arch verification requirement — see "Pre-release checks" below.** |
|
||||
| `docker_entrypoint.sh` | Added an export block for the ingest/retrieval env: `CRM_DB_PATH`, `SPARK_CONTROL_URL`, `SPARK_CONTROL_VERIFY_TLS`, `QDRANT_URL`, `CRM_INGEST_SYNC_INTERVAL_MIN`, with LAN-default placeholder values and an operator comment. The CRM web server ignores these; they exist so manual `python3 /app/backend/ingest/...` and `backend/mcp/server.py` runs on the box inherit them. Also launches the **background ingest sync scheduler** (`sync_scheduler.py`) before `exec`-ing the web server, guarded so it only starts when Spark Control + Qdrant are configured — see "Automatic scheduled refresh" below. |
|
||||
| `startos/actions/buildSearchIndex.ts` | **New.** A one-shot "Build search index" StartOS action (Steps 3–4 of the runbook) — full rebuild with `--recreate`. |
|
||||
| `startos/actions/refreshSearchIndex.ts` | **New.** A manual "Refresh search index" action — incremental, idempotent `sync.py` (no `--recreate`); the manual counterpart to the background scheduler. |
|
||||
| `startos/actions/index.ts` | Registered both actions: `sdk.Actions.of().addAction(buildSearchIndex).addAction(refreshSearchIndex)`. |
|
||||
| `startos/versions/v0.1.0.44.ts` + `versions/index.ts` | New version `0.1.0:44` (image-only change, no data migration) set as `current`; `0.1.0:43` moved to `other`. |
|
||||
| `startos/utils.ts` | Bumped the informational `PACKAGE_VERSION` constant to `0.1.0:44`. |
|
||||
|
||||
@@ -50,6 +51,63 @@ siblings by bare name, e.g. `import config`, so they must run from that
|
||||
directory). It uses `allowedStatuses: 'any'` — SQLite WAL mode makes a
|
||||
concurrently-running CRM safe for these reads/derived writes.
|
||||
|
||||
## Keeping the index fresh (hands-off refresh)
|
||||
|
||||
The "Build search index" action above is a full one-shot rebuild. To keep the
|
||||
index current as the CRM changes, there are now two incremental paths — both run
|
||||
`sync.py` (chunk → dense+BM25 → Qdrant upsert) for **changed records only**, with
|
||||
NO `--recreate`, so they never drop the collection and are safe to run any time.
|
||||
|
||||
### Manual: "Refresh search index" action
|
||||
|
||||
`startos/actions/refreshSearchIndex.ts` adds a second StartOS action,
|
||||
**Refresh search index** (id `refresh-search-index`). It mirrors
|
||||
`buildSearchIndex.ts` exactly — same subcontainer, same `/data` mount, same
|
||||
explicit `ingestEnv` — but runs `python3 sync.py --db /data/crm.db` (no
|
||||
`--recreate`) with `cwd = /app/backend/ingest`. An incremental delta is usually
|
||||
seconds to a few minutes (the action allows up to 30 min of headroom). Use it
|
||||
for an on-demand refresh; use "Build search index" only for a full rebuild.
|
||||
|
||||
### Automatic: background sync scheduler
|
||||
|
||||
For hands-off freshness, `docker_entrypoint.sh` launches
|
||||
`backend/ingest/sync_scheduler.py` as a **background process** just before it
|
||||
`exec`s the web server. `sync_scheduler.py` loops the incremental sync every
|
||||
`CRM_INGEST_SYNC_INTERVAL_MIN` minutes (default **60**, exported in the
|
||||
entrypoint's env block with an operator comment). It logs to
|
||||
`/data/ingest-sync.log`.
|
||||
|
||||
The launch is **guarded**: it only starts when both `SPARK_CONTROL_URL` and
|
||||
`QDRANT_URL` are set (both are exported just above it, so the default LAN values
|
||||
satisfy the guard; an operator who clears them to disable ingest also disables
|
||||
the scheduler). The entrypoint prints `STARTED` or `SKIPPED (Spark/Qdrant not
|
||||
configured)` so the choice is visible in the service logs.
|
||||
|
||||
#### Why a background process and not a StartOS daemon
|
||||
|
||||
The prior agent deliberately avoided adding the stdio MCP server as a daemon
|
||||
because StartOS daemons are built around a network port + `checkPortListening`
|
||||
health check, and a portless process has no liveness signal to probe (see "MCP
|
||||
server" below). `sync_scheduler.py` is the same shape — a long-running loop with
|
||||
no port — so adding it as a second daemon in `main.ts` would hit the same
|
||||
mismatch.
|
||||
|
||||
Launching it as a child of the entrypoint sidesteps that entirely:
|
||||
|
||||
- **Pro:** no portless-daemon contortion; it shares the `primary` container's
|
||||
`/data` and inherited env; the existing `primary` daemon and its
|
||||
`checkPortListening` health check are untouched.
|
||||
- **Con:** StartOS does not supervise it independently. If the scheduler dies it
|
||||
is not auto-restarted on its own (the container as a whole is still
|
||||
health-checked via the web server), and it has no separate status tile in the
|
||||
UI. Crashes surface only in `/data/ingest-sync.log`. The manual "Refresh
|
||||
search index" action is the always-available fallback.
|
||||
|
||||
If a future phase wants first-class supervision/visibility, promote it to a real
|
||||
StartOS daemon — but, as with the MCP server, only after giving the work a
|
||||
network transport (e.g. a tiny HTTP health endpoint) so it has a meaningful
|
||||
`checkPortListening` probe.
|
||||
|
||||
## Env / config the operator must set (Spark URLs)
|
||||
|
||||
The ingest run reaches out to **Spark Control** (dense embeddings) and **Qdrant**
|
||||
@@ -63,18 +121,22 @@ the Ten31 LAN defaults:
|
||||
| `SPARK_CONTROL_VERIFY_TLS` | `false` (Spark Control uses a self-signed cert) | TLS verification toggle |
|
||||
| `QDRANT_URL` | `http://192.168.1.87:6333` | Qdrant collection admin + upserts |
|
||||
| `CRM_DB_PATH` | `/data/crm.db` | both scripts + MCP server (already correct) |
|
||||
| `CRM_INGEST_SYNC_INTERVAL_MIN` | `60` | background sync scheduler loop interval (entrypoint only) |
|
||||
|
||||
Where to set them:
|
||||
|
||||
- **`docker_entrypoint.sh`** — for manual `python3` / MCP runs via the running
|
||||
container. Edit the `${VAR:-default}` block, or override via the StartOS
|
||||
service environment.
|
||||
- **`startos/actions/buildSearchIndex.ts`** (`ingestEnv`) — for the "Build search
|
||||
index" action, which runs in its own subcontainer and does **not** execute the
|
||||
entrypoint, so it carries its own copy of the values. Edit these to match.
|
||||
container and for the background sync scheduler. Edit the `${VAR:-default}`
|
||||
block, or override via the StartOS service environment.
|
||||
- **`startos/actions/buildSearchIndex.ts`** and
|
||||
**`startos/actions/refreshSearchIndex.ts`** (`ingestEnv`) — for the "Build
|
||||
search index" and "Refresh search index" actions, which run in their own
|
||||
subcontainers and do **not** execute the entrypoint, so each carries its own
|
||||
copy of the values. Edit these to match. (`CRM_INGEST_SYNC_INTERVAL_MIN` only
|
||||
matters to the entrypoint's scheduler loop, not to the actions.)
|
||||
|
||||
> Keep the two copies in sync. They are duplicated because the action's
|
||||
> subcontainer never runs `docker_entrypoint.sh`; there is no shared config
|
||||
> Keep the copies in sync. They are duplicated because the actions'
|
||||
> subcontainers never run `docker_entrypoint.sh`; there is no shared config
|
||||
> store wired into this package today (see "Still needed" below).
|
||||
|
||||
Verify reachability from the box before running the action:
|
||||
@@ -118,13 +180,16 @@ That is deliberately deferred to a later phase.
|
||||
|
||||
- **MCP-as-a-service** — see above. Deferred until there is a live agent and a
|
||||
network transport; today it is manual/stdio only.
|
||||
- **Incremental sync (runbook Step 6 / Workstream B4)** — the action does a full
|
||||
one-shot rebuild. Keeping the index fresh as the CRM changes needs an
|
||||
incremental, idempotent sync on a schedule. Until that exists, re-running the
|
||||
"Build search index" action is the refresh path. When built, it could be wired
|
||||
as a recurring StartOS action/task rather than a manual re-run.
|
||||
- ~~**Incremental sync (runbook Step 6 / Workstream B4)**~~ — **done.** The
|
||||
background sync scheduler (`sync_scheduler.py`, started by the entrypoint) keeps
|
||||
the index fresh automatically, and the manual "Refresh search index" action
|
||||
provides an on-demand incremental sync. See "Keeping the index fresh" above. A
|
||||
future enhancement could still promote the scheduler to a first-class StartOS
|
||||
daemon (with a network transport for a real health check) for independent
|
||||
supervision/visibility.
|
||||
- **Single source of truth for Spark/Qdrant config** — currently duplicated in
|
||||
`docker_entrypoint.sh` and `buildSearchIndex.ts`. A small StartOS config
|
||||
`docker_entrypoint.sh`, `buildSearchIndex.ts`, and `refreshSearchIndex.ts`. A
|
||||
small StartOS config
|
||||
store + input form (the SDK supports `Action.withInput` and a service config)
|
||||
would let the operator set the endpoints once in the UI; deferred to keep this
|
||||
change minimal and reviewable.
|
||||
@@ -133,10 +198,32 @@ That is deliberately deferred to a later phase.
|
||||
env). Not required given the exported env above, but available as an
|
||||
alternative if the operator prefers a file.
|
||||
|
||||
## Pre-release checks
|
||||
|
||||
Verify before cutting a release:
|
||||
|
||||
- **Multi-arch dependency build (BLOCKER).** The `fastembed==0.4.2` and
|
||||
`mcp==1.2.0` pins in `Dockerfile` were chosen best-effort and have **not** been
|
||||
confirmed to build on **both** `x86_64` and `aarch64`. StartOS targets arm64,
|
||||
and `fastembed` pulls `onnxruntime` (which may have no prebuilt arm64 wheel and
|
||||
fall back to a slow source build) plus downloads a model on first use. Build the
|
||||
image on aarch64 and run the ingest once end-to-end before release. Do not bump
|
||||
either pin without re-verifying on both arches. (Flagged inline above the pip
|
||||
line in `Dockerfile`.)
|
||||
- **Scheduler smoke test.** With Spark Control + Qdrant reachable, start the
|
||||
container and confirm the entrypoint logs
|
||||
`[entrypoint] ingest sync scheduler: STARTED`, that `/data/ingest-sync.log`
|
||||
accumulates sync output, and that clearing one of the endpoints flips the log to
|
||||
`SKIPPED`.
|
||||
- **Actions present.** Confirm both **Build search index** and **Refresh search
|
||||
index** appear under the service's Actions in the StartOS UI and run to success.
|
||||
|
||||
## Constraints honored
|
||||
|
||||
- No files under `backend/ingest/`, `backend/mcp/`, `backend/server.py`,
|
||||
`backend/core_migrations.py`, `backend/migrations/`, or `data/` were modified;
|
||||
only `start9/0.4/**` and this new doc.
|
||||
only `start9/0.4/**` and this doc. The entrypoint and the refresh action
|
||||
reference `backend/ingest/sync_scheduler.py` and `backend/ingest/sync.py` by
|
||||
path only — those scripts are owned/created by a separate process.
|
||||
- No build/deploy commands were run. `npx tsc --noEmit` was used only to verify
|
||||
the new TypeScript compiles against the SDK types.
|
||||
|
||||
Reference in New Issue
Block a user