Record intake-bot containerization; log parse fix, card handoff, and repo-extraction follow-ons

Bot now runs as a docker-compose service on the Spark (verified live, listening). Docs (matrix-intake guide ops, ROADMAP, AGENTS Current state) updated. Also logs the live-smoke parse bug (teammate read as investor -> team-roster fix), the spark-control dashboard-card handoff, and the long-term dedicated-repo extraction.
2026-06-17 20:13:35 -05:00
parent b470ea2659
commit cae2dbc8b9
3 changed files with 23 additions and 10 deletions
@@ -106,6 +106,12 @@ Use the **matrix-bridge** repo's pattern to listen on a dedicated ten31-database
 **Post-deploy enhancement — conversational (LLM-mediated) edits (Grant, 2026-06-17). DEPLOYED & LIVE 2026-06-17 (bot-side; pulled + restarted on the Spark `modelo32`); Matrix live-smoke pending.** Today an in-thread correction uses a rigid grammar (`edit field=value`). Let a free-form reply that isn't `yes`/`no`/a literal `edit …` be treated as a natural-language revision instruction: send {current proposal + the instruction} back through local Qwen (`spark.py`, the same parse leg — no Claude, no scrub) and re-render the revised proposal card for approval (e.g. "add that we met on June 14" → updated Note). Keeps the draft→human-approve gate (the human still confirms the LLM's revision) and subsumes `edit field=value` as a deterministic fast path. Thread the instruction text into `normalize`'s source so the email-integrity rule still holds (a revised email must appear in the original message or the instruction). Pairs naturally with the fuzzy-match item above — build both as one conversational-UX pass after the smoke. (Parsing of free-form *intake* messages already works today via the Qwen parse leg; this item is specifically about the *edit/refine* turn.)
 - **As built:** `parse.revise` + `_apply_revision` (offline-testable; the approval-stage `else` branch in `bot.py` routes any non-yes/no/edit reply here). `parse_message` now stashes `_source_text` so revise can re-check email integrity against {instruction + original}; the model's email field is never trusted. No-op revisions are caught via `proposals.same_fields` (re-prompt, not a false "Updated"). **Known v1 limit:** revise edits fields but does not re-run the matcher on a mid-thread firm rename. Tests: `matrix_intake/test_parse.py` (revise merge + email integrity + match-id preservation).

+**Managed service — DONE (container) 2026-06-17; dashboard card deferred to a spark-control session.** The bot ran as a bare `nohup` process (silently died on a Spark reboot). Now it's a **docker-compose service** (`docker-compose.yml` at the repo root + `backend/matrix_intake/Dockerfile`; `restart: unless-stopped` → survives reboot; image bundles `backend/matrix_intake` + the stdlib `backend/ingest` Spark client; `.env` mounted read-only). Cutover done on the Spark (nohup stopped, container `matrix-intake` up + listening). **Still bare `docker`/SSH-managed** — a spark-control dashboard card (Update/Restart/Stop/Logs tile like `matrix-bridge`) is a separate task in the spark-control repo: see `docs/handoffs/add-intake-bot-to-spark-control.md`.
+
+**Parse mis-identifies the investor when the message names an internal teammate (found in live-smoke, 2026-06-17).** *"jonathan is chatting with wyoming soon about fund commitment"* → the bot picked **jonathan** (a colleague/CRM admin) as the investor and offered a Jonathan/Nathan fuzzy shortlist, when the investor is **Wyoming**. Root cause is upstream of matching: local Qwen has no notion of who's internal, and mis-read the sentence role. **Fix (cheap, high-confidence, near-term):** feed the parse prompt the ~5-person team roster + the frame *"messages are written by a team member about a prospect; a named team member is the person doing outreach, never the investor"* (roster from a config value or a small read — not the admin-gated `/api/users`, since the bot is a member). Offline-testable (stub the model). **Bigger design (deferred, needs more failure samples):** the user's idea of routing inputs through the LLM *with grid context* for entity resolution — feasible (local Qwen, same as the digest, never Claude) but feed a **bounded shortlist, not the full ~400-name grid** (a small model dilutes on a haystack); pairs with the deferred LLM-judge. Also exposes a missing concept: the **internal deal owner** (Jonathan), which the bot doesn't model. Get 3–5 more real intake messages before re-architecting; the roster fix lands regardless.
+
+**Long-term — extract the intake bot to its own repo (recommended, not yet done).** Containerizing from this monorepo is the pragmatic now-state, but the bot is a genuinely separate deployable (own process, own `matrix-nio` dep, own lifecycle); its only CRM coupling is the HTTP API (a clean network contract) plus ~40 lines of stdlib Spark client (cheap to vendor). The tell: the spark-control Update button would run `git reset --hard origin/main` on the **whole CRM clone** — wrong blast radius. `matrix-bridge` is already a dedicated repo; mirror it. The extraction is a migration (new Gitea repo, move code + tests + guide, vendor the client, re-point the Spark deploy), so it's deferred until worth the lift — do it *before* wiring the spark-control card if both land in the same push.
+
 ### Scoped service-credential auth path for automated CRM writers
 *Surfaced 2026-06-17 while deploying the Matrix intake bot. **Decision: defer — the bot uses a dedicated member username/password for now.** The CRM has no API-key/service-token path; its only auth is username+password → JWT. A dedicated **member** login is appropriately scoped against what matters operationally (no admin: can't manage users, reset data, or change settings) and unblocks the live smoke today.*