diff --git a/AGENTS.md b/AGENTS.md index b1ebe66..c574600 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -70,6 +70,9 @@ the full answer back into the room (ask mode, D12). - `AGENTS.md` — this file (canonical; `CLAUDE.md` is a relative symlink to it). - `ROADMAP.md` — Phases 1–4+ with falsifiable exits, plus deferred/future directions. - `README.md` — human-facing intro. +- `docs/spark-control-integration.md` — Phase 3 spec for the Spark Control dev: the SSH + command contract (status / restart / git-pull update) the dashboard drives, plus the one-time + conversion of the Spark's `~/matrix-bridge` to a Gitea clone. matrix-bridge needs no code change. - `scripts/launch-claude.sh` — the Mac-side launch wrapper (the only seam that knows the Mac's environment). - `config.example.toml` — room→repo mapping template; the real `config.toml` is gitignored. @@ -164,12 +167,16 @@ once" is not done. ## Infra facts (proven — stable reference) -- **WireGuard (`starttunnel`), not LAN:** Mac `10.59.211.5`; Spark (`spark-32d0`, user `modelo`) - `10.59.211.6`. The Spark is not on the Mac's LAN subnet. +- **WireGuard (`starttunnel`) for Mac↔Spark:** Mac `10.59.211.5`; Spark (`spark-32d0`, user `modelo`) + `10.59.211.6`. The Mac↔Spark seam runs over WireGuard (not the Mac's LAN subnet). The Spark *is* + on the LAN, same as the Start9 host (`immense-voyage`) — so Spark→Gitea (`immense-voyage.local:59916`) + resolves and works directly. - **Spark → Mac:** SSH alias `mac-bridge` → the Mac as user `macpro`, dedicated key (`~/.ssh/id_ed25519` on the Spark, in the Mac's `authorized_keys`). The Spark host's `~/.ssh/config` needs `IdentitiesOnly yes` because a `Host *` rule shadows the default key; the container regenerates a clean config from `config.toml [mac]`. -- **Mac → Spark:** no authorized key — Spark-side ops (deploy/restart) are owner-run until Phase 3. +- **Mac → Spark:** no authorized key — direct Mac-initiated Spark ops stay owner-run. 
(This is *not* + what Phase 3 closes: Spark Control already has its own SSH channel into `spark-32d0`, so its + status/update/restart buttons ride that, not a Mac→Spark key.) - **Matrix:** homeserver `https://matrix.gilliam.ai` (StartOS Synapse), bot `@agent:matrix.gilliam.ai`, device `matrix-bridge-bot`. The bot reuses the stored access token (`.env`) — never re-logs in (avoids device churn). No E2EE (D9); bot↔Synapse is clearnet TLS, softening D9's WireGuard-only rationale. @@ -190,12 +197,19 @@ once" is not done. drivable session on the phone via Remote Control. - **Ask mode** (`?`-prefixed message): `ssh mac-bridge → ask-claude.sh → claude -p`, full answer posted back into the room (chunked, no truncation). See D12. -- **Phase 2 (multi-room routing)** is effectively satisfied — the bot is built multi-room and routes by - `room_id`; only a formal N=3 confirmation pass remains. -- **Next — Phase 3 (deferred to next session by owner):** Spark Control integration — bot container - status + one-click update/restart on the dashboard; also closes the Mac-has-no-key-into-Spark gap. -- **Open / risks:** (a) a `?`-ask in a repo `claude` has never opened may stall on the folder-trust gate - — add a trust flag to `ask-claude.sh` if/when hit, not preemptively; (b) owner TODO: clean up the - accidental MacBook docker deploy (`docker compose down` + `docker image rm matrix-bridge-bot`). -- **Repo:** tree clean; `master` == `phase-1` == `8ad1cd8`, pushed to Gitea. No test suite (pre-existing); - this session's changes were syntax/unit-checked locally, fresh-eyes reviewed, and proven live. +- **Phase 2 (multi-room routing) — DONE.** Owner confirmed the N=3 pass: routes by `room_id`, + correct repo, zero wrong-directory launches. 
+- **Phase 3 (Spark Control integration) — spec drafted, handed to the Spark Control dev (2026-06-15).** + See `docs/spark-control-integration.md`: the SSH command contract (status via `docker inspect`; + restart via `docker restart`; update via `git fetch && git reset --hard origin/master && + docker compose up -d --build`) plus a one-time conversion of the Spark's `~/matrix-bridge` from + scp'd loose files to a Gitea clone (secrets are gitignored, so `reset --hard` preserves them). + Decisions this session: update source = git-pull-from-Gitea (not scp-from-Mac); Spark Control + already SSHes into `spark-32d0`, so no new key. **matrix-bridge needs no code change** — the work + is now Spark Control-side (status tile + buttons) + the one-time Spark migration. Awaiting the dev. +- **Open / risks:** a `?`-ask in a repo `claude` has never opened may stall on the folder-trust gate + — add a trust flag to `ask-claude.sh` if/when hit, not preemptively. (Resolved this session: the + accidental MacBook docker deploy was cleaned up by the owner.) +- **Repo:** `master` == `phase-1` == `ee8408d` pushed to Gitea; this session adds the Phase 3 spec + doc + these AGENTS.md edits on top (uncommitted — propose committing as the handoff). No test suite + (pre-existing); the doc is a spec, no code changed. diff --git a/ROADMAP.md b/ROADMAP.md index 31ffdf8..bd8d33e 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -26,19 +26,23 @@ after it. - **Exit (falsifiable):** 3 consecutive real messages each correctly launch a drivable session on the phone. -## Phase 2 — Multi-room routing +## Phase 2 — Multi-room routing — DONE (2026-06-15) - Room → repo mapping table; the bot routes by `room_id` (config over code). - **Exit (falsifiable):** 3 real uses across ≥2 rooms, correct repo every time, zero - wrong-directory launches. + wrong-directory launches. 
*Met — owner-confirmed N=3 pass.* -## Phase 3 — Spark Control integration +## Phase 3 — Spark Control integration — SPEC DRAFTED (2026-06-15), awaiting Spark Control dev - Bot container status surfaced on the Spark Control dashboard. - One-click update (pull + restart) wired the same way Spark Control drives the Sparks today (SSH/commands behind a button). - **Exit (falsifiable):** bot status is visible and the bot can be updated/restarted from the panel. +- **Spec:** `docs/spark-control-integration.md` — the SSH command contract + one-time Spark + migration to a Gitea clone. Decided: update = git-pull-from-Gitea; Spark Control's existing + SSH into `spark-32d0` carries the buttons (no new key). matrix-bridge needs no code change; + remaining work is Spark Control-side + the one-time migration. ## Phase 4+ — Future direction (documented, not yet scoped to build) diff --git a/docs/spark-control-integration.md b/docs/spark-control-integration.md new file mode 100644 index 0000000..51faa65 --- /dev/null +++ b/docs/spark-control-integration.md @@ -0,0 +1,186 @@ +# Phase 3 — Spark Control integration (spec for the Spark Control dev) + +**Goal (ROADMAP Phase 3):** surface the matrix-bridge bot's container status on the Spark +Control dashboard, and add one-click **update** (pull + rebuild + restart) and **restart**, +wired the same SSH-behind-buttons way Spark Control already drives the Sparks. + +**Exit (falsifiable):** bot status is visible on the panel, and the bot can be +updated/restarted from the panel. + +This document is the **contract**: what to run, where, and what the output means. The +matrix-bridge side is fixed below; map the buttons onto Spark Control's existing +managed-service pattern however that codebase already models a Spark/service. No changes to +matrix-bridge are required for this. + +--- + +## What the bot is + +A single Docker container on the DGX Spark. 
+ +| Fact | Value | +|---|---| +| Host | `spark-32d0` (`10.59.211.6` on WireGuard), user **`modelo`** | +| Project dir | `/home/modelo/matrix-bridge` (`~/matrix-bridge` for modelo) | +| Compose service | `bot` | +| Container name | `matrix-bridge` (fixed via `container_name:`) | +| Image | `matrix-bridge-bot` | +| Lifecycle | host networking, `restart: unless-stopped` (survives Spark reboot) | +| Secrets | `.env`, `config.toml` — **gitignored**, live only on the Spark, never in git | + +Spark Control already SSHes into `spark-32d0`, so these ride the existing channel — **no new +key needed.** All commands below assume they run **as `modelo`** (owner of the dir, member of +the `docker` group). If Spark Control's channel connects as a different user, wrap each command +in `sudo -iu modelo bash -lc '<command>'` — running `git` in modelo's repo as root trips git's +"dubious ownership" guard, so don't skip this. + +--- + +## One-time prerequisites (owner, not Spark Control dev) + +The bot dir on the Spark was originally populated by `scp` of loose files. To make +git-pull-based updates work it must become a git clone of the Gitea repo **without disturbing +the gitignored secrets** (`.env`, `config.toml`). Because those two files are gitignored, +`git reset --hard` never touches them — so we can convert the existing dir in place. + +**0a. Confirm the Spark can reach + authenticate to Gitea (fail loud here, not at first button press):** + +```sh +git ls-remote ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git >/dev/null \ + && echo "gitea reachable" || echo "FIX gitea access first" +``` + +The Spark is on the same LAN as the Start9 host running Gitea, so `immense-voyage.local` +resolves directly — this should just work. If it doesn't, the likely gap is that `modelo` +has no key authorized to read the Gitea repo (add a deploy key or authorize an existing key). +Don't proceed until `git ls-remote` succeeds. + +**0b.
Convert `~/matrix-bridge` to a clone tracking `master` (run as `modelo`):** + +```sh +cd /home/modelo/matrix-bridge +git init -b master +git remote add origin ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git +git fetch origin +git reset --hard origin/master # secrets are gitignored → untouched +git branch --set-upstream-to=origin/master master +``` + +Verify the secrets survived and the container still comes up clean: + +```sh +ls -la /home/modelo/matrix-bridge/.env /home/modelo/matrix-bridge/config.toml # both present +git -C /home/modelo/matrix-bridge status --ignored # .env/config.toml show as ignored, tree clean +cd /home/modelo/matrix-bridge && docker compose up -d --build && docker ps --filter name=^/matrix-bridge$ +``` + +`master` is the release branch (today `master == phase-1`). Track whatever you treat as the +release line; the commands below assume `origin/master`. + +--- + +## The contract — commands behind each control + +Run from `/home/modelo/matrix-bridge` as `modelo`. Each is idempotent and fail-loud +(non-zero exit ⇒ surface it on the panel; don't swallow). + +### Status (poll for the badge) + +```sh +docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge +``` + +- Output e.g. `running|2026-06-15T18:02:11.4Z|0`. Parse field 1 for the badge: + - `running` → green/up. Field 3 (`RestartCount`) climbing while status flips to + `restarting` ⇒ **crash loop** — show it; that's the most useful signal a dashboard gives here. + - `exited` → stopped/crashed. + - `restarting` → unhealthy / boot-looping. +- **Non-zero exit** (`No such object: matrix-bridge`) ⇒ **not deployed** — distinct from + "stopped". Show that state rather than erroring out. + +Friendlier one-liner for a human-readable badge (empty string when not running): + +```sh +docker ps --filter name=^/matrix-bridge$ --format '{{.Status}}' # e.g.
"Up 2 hours" +``` + +### Logs (optional "view logs" action — handy for diagnosing a red badge) + +```sh +docker logs --tail 100 matrix-bridge +``` + +### Restart (no code change) + +```sh +docker restart matrix-bridge +``` + +### Update (pull latest code + rebuild + recreate) — the headline button + +```sh +cd /home/modelo/matrix-bridge \ + && git fetch origin \ + && git reset --hard origin/master \ + && docker compose up -d --build +``` + +- `git reset --hard origin/master` is the deploy-box "always match remote" semantic: never gets + stuck on divergence, and gitignored secrets are preserved. (If you'd rather detect divergence, + `git pull --ff-only` is the gentler alternative — but then a wedged tree needs manual help.) +- `docker compose up -d --build` rebuilds the image and recreates the container only if the + build changed. First build after a base-image bump is slow (minutes); subsequent builds hit + the layer cache. **Treat update as long-running**: stream/await output, set a generous + timeout (≥10 min), and don't block the dashboard on it. + +### Stop / Start (optional) + +```sh +docker stop matrix-bridge # stop +cd /home/modelo/matrix-bridge && docker compose up -d # start (recreates if needed) +``` + +--- + +## Spark Control-side wiring (for the dev) + +Map the above onto however Spark Control already registers a managed Spark/service: + +1. **Register `matrix-bridge`** as a managed service (a tile), targeting `spark-32d0` over the + existing SSH channel, commands run as `modelo`. +2. **Status badge** ← poll the *Status* command on the panel's normal refresh cadence; map the + four states above (running / exited / restarting / not-deployed) to your existing badge + vocabulary. Surface `RestartCount` if your tile can show a secondary metric — a climbing + count is the crash-loop tell. +3. **Buttons:** `Update`, `Restart` (required for the exit criterion); `Logs`, `Stop`/`Start` + (optional, nice-to-have). +4. 
**Fail-loud, surfaced.** Every command's non-zero exit + stderr must reach the panel, not a + silent failure — this mirrors matrix-bridge's own discipline (a bad launch reports back into + the room rather than hanging). Especially: a failed `git fetch` (Gitea unreachable) or a + failed build should show the error, not a stuck spinner. +5. **`Update` is long-running** — see the timeout/streaming note above. + +What I deliberately left generic: the tile's exact place in Spark Control's code, its UI, and +its config schema — that's yours to fit to the existing pattern. If a precise drop-in matters, +share how a Spark is currently registered (config entry + the command-runner seam) and I'll +tailor steps 1–5 to it. + +--- + +## Acceptance (maps to the ROADMAP exit) + +- [ ] Status tile shows the bot's live state and flips correctly across a manual + `docker stop` / `docker start` on the Spark. +- [ ] `Restart` from the panel cycles the container (status returns to `running`). +- [ ] `Update` from the panel pulls a new commit, rebuilds, and recreates the container — and + surfaces a clear error if Gitea is unreachable or the build fails. + +--- + +## Note — optional future enhancement (not required for Phase 3) + +The *Status* command reports container liveness (process up), not Matrix connectivity — the bot +can be `running` yet disconnected from Synapse. A truer signal would need a Docker `HEALTHCHECK` +backed by a bot-side liveness signal (e.g. the bot touches a file or exposes a tiny endpoint on +each successful sync loop), after which Status could read `{{.State.Health.Status}}`. That's a +matrix-bridge-side change, out of scope here — flag it if/when "running but silent" actually bites.