Phase 3 — Spark Control integration (spec for the Spark Control dev)
Goal (ROADMAP Phase 3): surface the matrix-bridge bot's container status on the Spark Control dashboard, and add one-click update (pull + rebuild + restart) and restart, wired the same SSH-behind-buttons way Spark Control already drives the Sparks.
Exit (falsifiable): bot status is visible on the panel, and the bot can be updated/restarted from the panel.
This document is the contract: what to run, where, and what the output means. The matrix-bridge side is fixed below; map the buttons onto Spark Control's existing managed-service pattern however that codebase already models a Spark/service. No changes to matrix-bridge are required for this.
What the bot is
A single Docker container on the DGX Spark.
| Fact | Value |
|---|---|
| Host | spark-32d0 (10.59.211.6 on WireGuard), user modelo |
| Project dir | /home/modelo/matrix-bridge (~/matrix-bridge for modelo) |
| Compose service | bot |
| Container name | matrix-bridge (fixed via container_name:) |
| Image | matrix-bridge-bot |
| Lifecycle | host networking, restart: unless-stopped (survives Spark reboot) |
| Secrets | .env, config.toml — gitignored, live only on the Spark, never in git |
Spark Control already SSHes into spark-32d0, so these ride the existing channel — no new
key needed. All commands below assume they run as modelo (owner of the dir, member of
the docker group). If Spark Control's channel connects as a different user, wrap each command
in sudo -iu modelo bash -lc '<command>' — running git in modelo's repo as root trips git's
"dubious ownership" guard, so don't skip this.
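The user-wrapping rule above could be sketched as a small helper that builds the command line to send over SSH. This is only an illustration: wrap_for_modelo is a hypothetical name, and it assumes Spark Control knows which user its channel logs in as.

```bash
# wrap_for_modelo SSH_USER COMMAND   (hypothetical helper, not part of Spark Control)
# Prints the command line to send over SSH: COMMAND unchanged when the channel
# already logs in as modelo, otherwise wrapped in sudo -iu modelo bash -lc '...'.
wrap_for_modelo() {
  local ssh_user="$1" cmd="$2" quoted
  if [ "$ssh_user" = "modelo" ]; then
    printf '%s\n' "$cmd"
    return 0
  fi
  # Escape embedded single quotes ( ' becomes '\'' ) so the whole command
  # survives as one single-quoted argument to bash -lc.
  quoted=$(printf '%s' "$cmd" | sed "s/'/'\\\\''/g")
  printf "sudo -iu modelo bash -lc '%s'\n" "$quoted"
}

# Example: wrap_for_modelo root 'docker restart matrix-bridge'
```

Building the string in one place keeps the quoting consistent across every button, which matters most for the multi-step Update command.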
One-time prerequisites (owner, not Spark Control dev)
The bot dir on the Spark was originally populated by scp of loose files. To make
git-pull-based updates work it must become a git clone of the Gitea repo without disturbing
the gitignored secrets (.env, config.toml). Because those two files are untracked and gitignored,
git reset --hard (which only rewrites tracked files) never touches them, so we can convert the existing dir in place.
0a. Confirm the Spark can reach + authenticate to Gitea (fail loud here, not at first button press):
git ls-remote ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git >/dev/null \
&& echo "gitea reachable" || echo "FIX gitea access first"
The Spark is on the same LAN as the Start9 host running Gitea, so immense-voyage.local
resolves directly and this should just work. If it doesn't, the most likely gap is that modelo
has no SSH key authorized to read the Gitea repo (add a deploy key or reuse an existing key).
Don't proceed until git ls-remote succeeds.
0b. Convert ~/matrix-bridge to a clone tracking master (run as modelo):
cd /home/modelo/matrix-bridge
git init -b master
git remote add origin ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git
git fetch origin
git reset --hard origin/master # secrets are gitignored → untouched
git branch --set-upstream-to=origin/master master
Verify the secrets survived and the container still comes up clean:
ls -la /home/modelo/matrix-bridge/.env /home/modelo/matrix-bridge/config.toml # both present
git -C /home/modelo/matrix-bridge status # tree clean; .env/config.toml are not listed (add --ignored to see them)
docker compose up -d --build && docker ps --filter name=^/matrix-bridge$
master is the release branch (today master == phase-1). Track whatever you treat as the
release line; the commands below assume origin/master.
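The "secrets survived" check from the conversion step could be scripted roughly as follows. This is a sketch, not the owner's procedure: check_secrets is a hypothetical name, and it relies on git check-ignore -q, which exits 0 when the given path matches an ignore rule.

```bash
# check_secrets DIR   (hypothetical helper)
# Succeeds only if .env and config.toml both exist in DIR and are ignored by git.
check_secrets() {
  local dir="$1" f
  for f in .env config.toml; do
    [ -f "$dir/$f" ] || { echo "MISSING: $f" >&2; return 1; }
    git -C "$dir" check-ignore -q "$f" || { echo "NOT IGNORED: $f" >&2; return 1; }
  done
  echo "secrets present and ignored"
}

# Example: check_secrets /home/modelo/matrix-bridge
```

Running it right after the reset, and again after the first panel-driven update, gives a quick confirmation that neither file was clobbered or accidentally tracked.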
The contract — commands behind each control
Run from /home/modelo/matrix-bridge as modelo. Each is idempotent and fail-loud
(non-zero exit ⇒ surface it on the panel; don't swallow).
Status (poll for the badge)
docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge
- Output e.g. running|2026-06-15T18:02:11.4Z|0. Parse field 1 for the badge:
  - running → green/up.
  - exited → stopped/crashed.
  - restarting → unhealthy / boot-looping.
  - Field 3 (RestartCount) climbing while status flips to restarting ⇒ crash loop. Show it; that's the most useful signal a dashboard gives here.
- Non-zero exit (No such object: matrix-bridge) ⇒ not deployed, which is distinct from "stopped". Show that state rather than erroring out.
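The parsing rules above could be sketched as two small functions. The names badge_state and restart_count and the badge labels (up, stopped, unhealthy, not-deployed) are placeholders for whatever vocabulary Spark Control already uses. Following the contract, any non-zero exit is treated as "not deployed"; a stricter version might also check stderr for "No such object".

```bash
# badge_state EXIT_CODE OUTPUT   (hypothetical helper)
# Maps the Status command's exit code and output line to a badge label.
badge_state() {
  local rc="$1" line="$2" status
  if [ "$rc" -ne 0 ]; then
    echo "not-deployed"
    return 0
  fi
  status=${line%%|*}   # field 1: everything before the first "|"
  case "$status" in
    running)    echo "up" ;;
    exited)     echo "stopped" ;;
    restarting) echo "unhealthy" ;;
    *)          echo "unknown:$status" ;;   # e.g. created, paused, dead
  esac
}

# restart_count OUTPUT: prints field 3 (everything after the last "|").
restart_count() {
  printf '%s\n' "${1##*|}"
}

# Usage on the Spark:
#   out=$(docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge 2>/dev/null)
#   badge_state "$?" "$out"
```

Keeping the exit code as an explicit input makes the "not deployed" case testable without a Docker daemon.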
Friendlier one-liner for a human-readable badge (empty string when not running):
docker ps --filter name=^/matrix-bridge$ --format '{{.Status}}' # e.g. "Up 2 hours"
Logs (optional "view logs" action — handy for diagnosing a red badge)
docker logs --tail 100 matrix-bridge
Restart (no code change)
docker restart matrix-bridge
Update (pull latest code + rebuild + recreate) — the headline button
cd /home/modelo/matrix-bridge \
&& git fetch origin \
&& git reset --hard origin/master \
&& docker compose up -d --build
- git reset --hard origin/master is the deploy-box "always match remote" semantic: never gets stuck on divergence, and gitignored secrets are preserved. (If you'd rather detect divergence, git pull --ff-only is the gentler alternative, but then a wedged tree needs manual help.)
- docker compose up -d --build rebuilds the image and recreates the container only if the build changed. First build after a base-image bump is slow (minutes); subsequent builds hit the layer cache. Treat update as long-running: stream/await output, set a generous timeout (≥10 min), and don't block the dashboard on it.
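One way to make the update sequence report which step failed, rather than just a bare exit code, is a thin step runner. This is a sketch under assumptions: run_steps and update_bot are hypothetical names, and Spark Control's own command runner may already provide the same behavior.

```bash
# run_steps STEP...   (hypothetical helper)
# Runs each argument as a shell command line, stops at the first failure,
# names the failing step on stderr, and returns that step's exit code.
run_steps() {
  local step rc
  for step in "$@"; do
    sh -c "$step" || {
      rc=$?
      echo "FAILED (exit $rc): $step" >&2
      return "$rc"
    }
  done
}

# The Update button's sequence expressed as steps (same commands as above).
update_bot() {
  run_steps \
    "git -C /home/modelo/matrix-bridge fetch origin" \
    "git -C /home/modelo/matrix-bridge reset --hard origin/master" \
    "cd /home/modelo/matrix-bridge && docker compose up -d --build"
}
```

The timeout is left to the caller on purpose: the panel's runner is the natural place to enforce the ≥10 min limit and to stream output while the build runs.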
Stop / Start (optional)
docker stop matrix-bridge # stop
cd /home/modelo/matrix-bridge && docker compose up -d # start (recreates if needed)
Spark Control-side wiring (for the dev)
Map the above onto however Spark Control already registers a managed Spark/service:
1. Register matrix-bridge as a managed service (a tile), targeting spark-32d0 over the existing SSH channel, commands run as modelo.
2. Status badge ← poll the Status command on the panel's normal refresh cadence; map the four states above (running / exited / restarting / not-deployed) to your existing badge vocabulary. Surface RestartCount if your tile can show a secondary metric; a climbing count is the crash-loop tell.
3. Buttons: Update, Restart (required for the exit criterion); Logs, Stop/Start (optional, nice-to-have).
4. Fail-loud, surfaced. Every command's non-zero exit + stderr must reach the panel rather than fail silently. This mirrors matrix-bridge's own discipline (a bad launch reports back into the room rather than hanging). Especially: a failed git fetch (Gitea unreachable) or a failed build should show the error, not a stuck spinner.
5. Update is long-running; see the timeout/streaming note above.
What I deliberately left generic: the tile's exact place in Spark Control's code, its UI, and its config schema — that's yours to fit to the existing pattern. If a precise drop-in matters, share how a Spark is currently registered (config entry + the command-runner seam) and I'll tailor steps 1–5 to it.
Acceptance (maps to the ROADMAP exit)
- Status tile shows the bot's live state and flips correctly across a manual docker stop / docker start on the Spark.
- Restart from the panel cycles the container (status returns to running).
- Update from the panel pulls a new commit, rebuilds, and recreates the container, and surfaces a clear error if Gitea is unreachable or the build fails.
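The "status returns to running" criterion could be checked with a short polling loop. This is a sketch: wait_until_running is a hypothetical name, and the try count and delay are placeholder values to tune against how long the bot takes to come back.

```bash
# wait_until_running STATUS_CMD [TRIES] [DELAY_SECONDS]   (hypothetical helper)
# Runs STATUS_CMD repeatedly until it prints exactly "running".
# Returns 0 on success, 1 if the state never reached "running".
wait_until_running() {
  local status_cmd="$1" tries="${2:-10}" delay="${3:-1}" i=0 state=""
  while [ "$i" -lt "$tries" ]; do
    state=$(sh -c "$status_cmd" 2>/dev/null) || state=""
    if [ "$state" = "running" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still not running after $tries checks (last state: ${state:-none})" >&2
  return 1
}

# Usage on the Spark after a restart:
#   docker restart matrix-bridge \
#     && wait_until_running "docker inspect -f '{{.State.Status}}' matrix-bridge" 30 2
```

Passing the status command in as a string keeps the loop independent of Docker, so the same check works against a stub when testing the panel.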
Note — optional future enhancement (not required for Phase 3)
The Status command reports container liveness (process up), not Matrix connectivity — the bot
can be running yet disconnected from Synapse. A truer signal would need a Docker HEALTHCHECK
backed by a bot-side liveness signal (e.g. the bot touches a file or exposes a tiny endpoint on
each successful sync loop), after which Status could read {{.State.Health.Status}}. That's a
matrix-bridge-side change, out of scope here — flag it if/when "running but silent" actually bites.
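If that enhancement is ever picked up, the "bot touches a file" variant could be checked with something like the following. Everything here is an assumption for illustration: heartbeat_fresh is a hypothetical name, the heartbeat path is invented, and the bot does not write such a file today. It uses find -mmin, a widely supported extension rather than a POSIX feature.

```bash
# heartbeat_fresh FILE MAX_AGE_MINUTES   (hypothetical helper)
# Succeeds if FILE exists and was modified less than MAX_AGE_MINUTES ago.
heartbeat_fresh() {
  local file="$1" max_age="$2"
  # find prints the path only when the mtime is recent enough; a missing
  # file makes find fail and print nothing, which also counts as stale.
  [ -n "$(find "$file" -mmin "-$max_age" 2>/dev/null)" ]
}

# Example (path is hypothetical): heartbeat_fresh /tmp/matrix-bridge.heartbeat 2
```

A Docker HEALTHCHECK could run a check like this inside the container, after which {{.State.Health.Status}} would reflect sync activity rather than mere process liveness.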