matrix-bridge/docs/spark-control-integration.md
Keysat 28c974fe1d Mark Phase 3 (Spark Control) done; trim spec to live command contract
Shipped in Spark Control v0.21.0: status badge + Update/Restart/Stop-Start/Logs
tile. All three exit criteria confirmed. matrix-bridge needed no code change.

- AGENTS.md: Current state + ROADMAP Phase 3 -> DONE; Deploy switched scp -> git
  pull (Update button); D10 stamped; new Infra fact for the Spark->Gitea path and
  the load-bearing IdentitiesOnly ssh-config pin the Update button depends on.
- spark-control-integration.md: trimmed from dev spec to live contract (dropped
  sudo -iu fallback and dev-side scaffolding; folded in direct-as-modelo, the
  Gitea key gotcha, restart cadence, and the LAN-only HTTP API).
- README: dropped stale "pre-Phase 0" status; Setup reframed for a fresh install.

Deferred follow-up: badge reflects container liveness only, not Matrix
connectivity; HEALTHCHECK + {{.State.Health.Status}} is the matrix-bridge-side fix.
2026-06-15 23:19:30 -05:00

Phase 3 — Spark Control integration (live command contract)

Status: DONE (2026-06-16), shipped in Spark Control v0.21.0. The matrix-bridge bot has a tile on the Spark Control dashboard under "Always-on services" — a live status badge plus Update, Restart, Stop/Start, and View logs buttons. All three ROADMAP Phase 3 exit criteria are met (status visible + reflects the container; update works; restart works). matrix-bridge needed no code change.

This document is the contract: what each control runs on the Spark, and what the output means. It is kept as the reference for what the buttons actually do, and for reproducing them by hand if the dashboard is ever unavailable.


What the bot is

A single Docker container on the DGX Spark.

| Fact | Value |
|---|---|
| Host | spark-32d0 (10.59.211.6 on WireGuard), user modelo |
| Project dir | /home/modelo/matrix-bridge, a Gitea clone tracking master |
| Compose service | bot |
| Container name | matrix-bridge (fixed via container_name:) |
| Image | matrix-bridge-bot |
| Lifecycle | host networking, restart: unless-stopped (survives Spark reboot) |
| Secrets | .env, config.toml (gitignored; live only on the Spark, never in git) |

Spark Control SSHes into spark-32d0 as modelo (the same login it already uses for Spark 2), so these controls ride the existing channel and need no new key. There is also no sudo wrap: this Spark has no passwordless sudo, and because the channel already logs in as modelo (owner of the project dir and a member of the docker group), every command runs directly as the right user. (The original spec's sudo -iu modelo fallback, meant for a different login user, therefore never applies here.)

Registration on the Spark Control side: the bot's SSH user is a config field (set to modelo), the host reuses the existing Spark 2 connection, and container / dir / branch use the defaults (matrix-bridge / ~/matrix-bridge / master). The tile auto-hides when that user is blank or the container is absent, so it stays out of the way on installs that don't run the bot.


One-time prerequisites — DONE

~/matrix-bridge was originally loose files copied over with scp; it is now a git clone of the Gitea repo, converted in place. The gitignored .env and config.toml were untouched, because git reset --hard only rewrites tracked files and these two are not tracked.

Load-bearing gotcha, now fixed: on the Spark, git offered the wrong SSH key first and Gitea rejected it (Permission denied (publickey)) even though the deploy key was correctly registered. The fix was to pin the deploy key for the Gitea host in modelo's ~/.ssh/config with IdentitiesOnly yes. The Update button depends on that block staying in place, so flag it if modelo's account is ever rebuilt.

The conversion, for reference:

cd /home/modelo/matrix-bridge
git init -b master
git remote add origin ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git
git fetch origin
git reset --hard origin/master          # secrets are gitignored → untouched
git branch --set-upstream-to=origin/master master
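
Because the Update button depends on the IdentitiesOnly pin, a quick check after any rebuild of modelo's account may be useful. The following is a minimal sketch, not part of Spark Control: gitea_pin_present is a hypothetical helper that only understands literal host names in plain "Host" blocks (no wildcards, no Match blocks, no key=value syntax).

```shell
# Hypothetical helper: succeed if the ssh config file <$1> contains
# "IdentitiesOnly yes" inside a "Host" block that names <$2> literally.
gitea_pin_present() {
  awk -v host="$2" '
    tolower($1) == "host" {
      # A new Host line starts a new block; check whether it names our host.
      in_block = 0
      for (i = 2; i <= NF; i++) if ($i == host) in_block = 1
      next
    }
    in_block && tolower($1) == "identitiesonly" && tolower($2) == "yes" { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# By hand on the Spark, as modelo:
#   gitea_pin_present ~/.ssh/config immense-voyage.local && echo "pin present"
```

A non-zero exit here would suggest the pin is missing and that Update could fail again with Permission denied (publickey).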

The contract — commands behind each control

Run from /home/modelo/matrix-bridge as modelo. Each is idempotent and fail-loud: non-zero exit + stderr is surfaced on the panel, not swallowed.
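
When reproducing a control by hand, the fail-loud behaviour can be approximated with a small wrapper. This is a sketch under assumptions: run_fail_loud and its message format are hypothetical, not what Spark Control actually runs or prints.

```shell
# Hypothetical wrapper: run a command, and on failure report the exit code
# together with the captured stderr instead of swallowing either.
run_fail_loud() {
  _rfl_err=$(mktemp)
  if "$@" 2>"$_rfl_err"; then
    _rfl_rc=0
  else
    _rfl_rc=$?   # exit status of the failed command
    printf 'FAILED (exit %d): %s\n' "$_rfl_rc" "$(cat "$_rfl_err")" >&2
  fi
  rm -f "$_rfl_err"
  return "$_rfl_rc"
}

# By hand on the Spark:
#   run_fail_loud docker restart matrix-bridge
```

Note that stderr from a successful command is discarded by this sketch; only failures are reported.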

Status (poll for the badge)

docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge
  • running → up · exited → stopped/crashed · restarting → unhealthy/boot-looping · non-zero exit (No such object: matrix-bridge) → not deployed (tile hides). A climbing RestartCount while status flips to restarting is the crash-loop tell.
  • Badge = container liveness only, not Matrix connectivity — a bot that's running but disconnected from Synapse still shows Healthy. See the HEALTHCHECK note below.
  • Cadence note: a fast docker restart won't visibly flip the badge red — the panel re-checks status only after the command returns, by which point the container is already back up. A full docker stop turns it red within ~5s. Polling cadence, not a bug.
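
The mapping in the first bullet can be sketched as a small function over the Status command's output. The label strings are illustrative assumptions, not the strings Spark Control displays, and badge_from_inspect is a hypothetical name.

```shell
# Map one line of "status|startedAt|restartCount" to a badge label.
# Pass an empty string when docker inspect failed (container absent).
badge_from_inspect() {
  status=${1%%|*}   # keep everything before the first "|"
  case "$status" in
    running)    echo up ;;
    exited)     echo stopped ;;
    restarting) echo boot-looping ;;
    "")         echo not-deployed ;;
    *)          echo unknown ;;   # created, paused, removing, dead
  esac
}

# By hand on the Spark:
#   badge_from_inspect "$(docker inspect -f \
#     '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' \
#     matrix-bridge 2>/dev/null)"
```

The third field (RestartCount) is ignored here; a real poller would compare it across polls to spot a crash loop.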

Logs

docker logs --tail 100 matrix-bridge

Restart

docker restart matrix-bridge

Update (pull + rebuild + recreate) — the headline button

cd /home/modelo/matrix-bridge \
  && git fetch origin \
  && git reset --hard origin/master \
  && docker compose up -d --build

git reset --hard origin/master gives the deploy box "always match remote" semantics: it never gets stuck on divergence, and gitignored secrets are preserved. Output is streamed live on the panel with a ceiling of about 25 minutes; a non-zero exit and its stderr are surfaced. Workflow: push to Gitea, then click Update.
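
The "always match remote, keep the secrets" behaviour can be checked in a throwaway sandbox without touching the Spark. This is a sketch under assumptions: it uses two temporary local repositories as stand-ins for Gitea and the deploy box, needs git 2.28 or newer for init -b, and leaves the temp dir behind for inspection.

```shell
set -eu

demo=$(mktemp -d)
# Wrapper that supplies a throwaway identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Stand-in for the Gitea repo: one commit that ignores .env.
g init -q -b master "$demo/remote"
printf '.env\n' > "$demo/remote/.gitignore"
g -C "$demo/remote" add .gitignore
g -C "$demo/remote" commit -q -m "initial"

# Stand-in for the deploy box: a clone with a secret and a divergent commit.
g clone -q "$demo/remote" "$demo/box"
printf 'TOKEN=placeholder\n' > "$demo/box/.env"
printf 'local edit\n' > "$demo/box/stray.txt"
g -C "$demo/box" add stray.txt
g -C "$demo/box" commit -q -m "divergent local commit"

# The first two steps of the Update chain.
g -C "$demo/box" fetch -q origin
g -C "$demo/box" reset -q --hard origin/master

box_head=$(g -C "$demo/box" rev-parse HEAD)
remote_head=$(g -C "$demo/remote" rev-parse HEAD)
ls -A "$demo/box"   # .env should still be listed, stray.txt should not
```

After the reset, the box's HEAD equals the remote's, the divergent commit's file is gone, and the ignored .env is still in place.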

Stop / Start

docker stop matrix-bridge                                    # stop
cd /home/modelo/matrix-bridge && docker compose up -d        # start (recreates if needed)

Programmatic interface (LAN-only)

The same controls are reachable over HTTP if scripting is ever wanted:

  • POST /api/matrix-bridge/update → returns an id; GET .../update/{id} and .../update/{id}/stream (SSE) for progress.
  • GET /api/matrix-bridge/logs?tail=N
  • status via GET /api/services
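
For scripting, the endpoints above could be wrapped as follows. This is a sketch under stated assumptions: the base URL is a placeholder (the real host and port are not given in this document), the `...` in the list is read as shorthand for /api/matrix-bridge, and the function names are hypothetical.

```shell
# Placeholder base URL; override with the real Spark Control address.
SPARK_CONTROL_URL=${SPARK_CONTROL_URL:-http://spark-control.example:8080}

update_url()        { printf '%s/api/matrix-bridge/update\n' "$SPARK_CONTROL_URL"; }
update_status_url() { printf '%s/api/matrix-bridge/update/%s\n' "$SPARK_CONTROL_URL" "$1"; }
update_stream_url() { printf '%s/api/matrix-bridge/update/%s/stream\n' "$SPARK_CONTROL_URL" "$1"; }
logs_url()          { printf '%s/api/matrix-bridge/logs?tail=%s\n' "$SPARK_CONTROL_URL" "${1:-100}"; }

# By hand, with curl. The shape of the POST response is not documented here,
# so extracting the id from it is left open:
#   curl -fsS -X POST "$(update_url)"
#   curl -fsS -N "$(update_stream_url "<id>")"   # -N disables buffering for SSE
#   curl -fsS "$(logs_url 50)"
```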

Future enhancement — truer status (not required; matrix-bridge-side)

Status reports container liveness, not Matrix connectivity — the bot can be running yet disconnected from Synapse. A truer signal needs a Docker HEALTHCHECK backed by a bot-side liveness signal (e.g. the bot touches a file or exposes a tiny endpoint on each successful sync loop), after which Status could read {{.State.Health.Status}}. That's a matrix-bridge-side change — do it if/when "running but silent" actually bites, then tell the Spark Control dev to read the health field.
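
One possible shape for the file-based variant, as a minimal sketch: the bot would touch a heartbeat file after each successful sync, and the health probe would check that the file is recent. The file path, the three-minute threshold, and the name heartbeat_fresh are all assumptions; nothing like this exists in matrix-bridge today.

```shell
# Hypothetical probe for a Dockerfile line such as:
#   HEALTHCHECK --interval=60s --timeout=5s CMD /app/healthcheck.sh
# Docker treats exit 0 as healthy and exit 1 as unhealthy.

# Succeed if file <$1> exists and was modified less than <$2> minutes ago.
heartbeat_fresh() {
  [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# As the last line of the probe script:
#   heartbeat_fresh /tmp/matrix-bridge.heartbeat 3
```

A missing file counts as unhealthy, which also covers a bot that has never completed a sync since the container started.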