matrix-bridge/docs/spark-control-integration.md
Keysat 28c974fe1d Mark Phase 3 (Spark Control) done; trim spec to live command contract
Shipped in Spark Control v0.21.0: status badge + Update/Restart/Stop-Start/Logs
tile. All three exit criteria confirmed. matrix-bridge needed no code change.

- AGENTS.md: Current state + ROADMAP Phase 3 -> DONE; Deploy switched scp -> git
  pull (Update button); D10 stamped; new Infra fact for the Spark->Gitea path and
  the load-bearing IdentitiesOnly ssh-config pin the Update button depends on.
- spark-control-integration.md: trimmed from dev spec to live contract (dropped
  sudo -iu fallback and dev-side scaffolding; folded in direct-as-modelo, the
  Gitea key gotcha, restart cadence, and the LAN-only HTTP API).
- README: dropped stale "pre-Phase 0" status; Setup reframed for a fresh install.

Deferred follow-up: badge reflects container liveness only, not Matrix
connectivity; HEALTHCHECK + {{.State.Health.Status}} is the matrix-bridge-side fix.
2026-06-15 23:19:30 -05:00


# Phase 3 — Spark Control integration (live command contract)
**Status: DONE (2026-06-16), shipped in Spark Control v0.21.0.** The matrix-bridge bot has a
tile on the Spark Control dashboard under "Always-on services" — a live status badge plus
**Update**, **Restart**, **Stop/Start**, and **View logs** buttons. All three ROADMAP Phase 3
exit criteria are met (status visible + reflects the container; update works; restart works).
matrix-bridge needed no code change.

This document is the **contract**: what each control runs on the Spark, and what the output
means. It is kept as the reference for what the buttons actually do, and for reproducing them by
hand if the dashboard is ever unavailable.

---
## What the bot is
A single Docker container on the DGX Spark.
| Fact | Value |
|---|---|
| Host | `spark-32d0` (`10.59.211.6` on WireGuard), user **`modelo`** |
| Project dir | `/home/modelo/matrix-bridge` — a **Gitea clone tracking `master`** |
| Compose service | `bot` |
| Container name | `matrix-bridge` (fixed via `container_name:`) |
| Image | `matrix-bridge-bot` |
| Lifecycle | host networking, `restart: unless-stopped` (survives Spark reboot) |
| Secrets | `.env`, `config.toml`: **gitignored**, live only on the Spark, never in git |

Spark Control SSHes into `spark-32d0` as **`modelo`** (the same login it already uses for Spark 2),
so the bot's commands ride the existing channel: no new key, and **no `sudo` wrap**. This Spark has
no passwordless sudo, and since the channel already logs in as `modelo` (owner of the dir, member
of the `docker` group), every command runs as the right user directly. (The original spec's
`sudo -iu modelo` different-user fallback therefore never applies here.)

Registration on the Spark Control side: the bot's SSH user is a config field (set to `modelo`),
the host reuses the existing Spark 2 connection, and container / dir / branch use the defaults
(`matrix-bridge` / `~/matrix-bridge` / `master`). The tile auto-hides when that user is blank or
the container is absent, so it stays out of the way on installs that don't run the bot.

---
## One-time prerequisites — DONE
`~/matrix-bridge` was originally loose files from `scp`; it's now a git clone of the Gitea repo,
converted in place. The gitignored `.env`/`config.toml` were untouched, because
`git reset --hard` rewrites only tracked files and neither secret is tracked.

**Load-bearing gotcha that's now fixed:** on the Spark, git offered the wrong SSH key first and
Gitea rejected it (`Permission denied (publickey)`) even though the deploy key was correctly
registered. Fixed by pinning it in modelo's `~/.ssh/config` with `IdentitiesOnly yes` for the
Gitea host. **The Update button depends on that block staying in place — flag it if modelo's
account is ever rebuilt.**
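Because the Update button silently depends on that pin, a quick check that it is still in effect can help after any account rebuild. The sketch below is a minimal, hypothetical helper (`gitea_pin_ok` is not part of either project): it reads the output of `ssh -G`, which prints the effective client config for a destination without connecting, and succeeds only if `identitiesonly yes` appears. The commented `Host` block is an assumed shape of the pin, and its key path is a placeholder.

```sh
# Assumed shape of the pin in modelo's ~/.ssh/config (IdentityFile path is a placeholder):
#   Host immense-voyage.local
#       Port 59916
#       User git
#       IdentityFile ~/.ssh/gitea_deploy_key
#       IdentitiesOnly yes

# Hypothetical helper: reads `ssh -G` output on stdin and succeeds only if
# IdentitiesOnly is in effect for that destination.
gitea_pin_ok() {
  grep -qx 'identitiesonly yes'
}
```

On the Spark, `ssh -G -p 59916 git@immense-voyage.local | gitea_pin_ok || echo 'IdentitiesOnly pin missing; Update will fail' >&2` would flag a missing pin before the button does.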
The conversion, for reference:
```sh
cd /home/modelo/matrix-bridge
git init -b master
git remote add origin ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git
git fetch origin
git reset --hard origin/master # secrets are gitignored → untouched
git branch --set-upstream-to=origin/master master
```
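The conversion is only safe while the secrets stay untracked upstream: if `origin/master` ever started tracking `.env` or `config.toml`, the `git reset --hard` above (and every Update after it) would overwrite the live copies. A minimal preflight sketch, run from the project dir after `git fetch origin`. The function name is hypothetical; it relies only on `git ls-tree --name-only`, which prints the named paths if they exist in the given tree.

```sh
# Hypothetical preflight: succeeds only if origin/master does NOT track the
# secret files, so `git reset --hard origin/master` cannot overwrite them.
secrets_untracked_upstream() {
  tracked=$(git ls-tree --name-only origin/master .env config.toml) || return 2
  if [ -n "$tracked" ]; then
    echo "tracked upstream, reset would overwrite: $tracked" >&2
    return 1
  fi
  return 0
}
```

Usage: `secrets_untracked_upstream && git reset --hard origin/master`.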
---
## The contract — commands behind each control
Run from `/home/modelo/matrix-bridge` as `modelo`. Each is idempotent and fail-loud: non-zero
exit + stderr is surfaced on the panel, not swallowed.
### Status (poll for the badge)
```sh
docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge
```
- `running` → up · `exited` → stopped/crashed · `restarting` → unhealthy/boot-looping ·
non-zero exit (`No such object: matrix-bridge`) → **not deployed** (tile hides). A climbing
`RestartCount` while status flips to `restarting` is the crash-loop tell.
- **Badge = container liveness only, not Matrix connectivity** — a bot that's `running` but
disconnected from Synapse still shows Healthy. See the HEALTHCHECK note below.
- *Cadence note:* a fast `docker restart` won't visibly flip the badge red — the panel re-checks
status only after the command returns, by which point the container is already back up. A full
`docker stop` turns it red within ~5s. Polling cadence, not a bug.
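The mapping above is small enough to reproduce by hand when the dashboard is down. A minimal sketch, assuming a POSIX shell on the Spark; `badge` is a hypothetical name and its labels are illustrative, not the panel's wording. It runs the same `docker inspect` format string and splits the `status|startedAt|restartCount` line with parameter expansion.

```sh
# Hypothetical helper mirroring the status mapping above. Prints a short label
# per state; the labels are illustrative, not the panel's wording.
badge() {
  out=$(docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge 2>/dev/null) ||
    { echo not-deployed; return 0; }
  status=${out%%|*}    # first field: container state
  restarts=${out##*|}  # last field: RestartCount
  case $status in
    running)    echo up ;;
    exited)     echo stopped ;;
    restarting) echo "restarting (RestartCount=$restarts)" ;;
    *)          echo "other: $status" ;;
  esac
}
```

Running it twice a few seconds apart and watching the `RestartCount` grow is the by-hand version of the crash-loop tell.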
### Logs
```sh
docker logs --tail 100 matrix-bridge
```
### Restart
```sh
docker restart matrix-bridge
```
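Per the cadence note, a fast restart never shows up as a red badge, so the badge alone can't confirm that a restart happened. One way to confirm it by hand is to compare `.State.StartedAt` before and after, since Docker updates that timestamp on every start. A hypothetical sketch (`restart_and_confirm` is not an existing command):

```sh
# Hypothetical helper: restart, then prove it took by checking that
# .State.StartedAt moved (the badge may never flip for a fast restart).
restart_and_confirm() {
  before=$(docker inspect -f '{{.State.StartedAt}}' matrix-bridge) || return 1
  docker restart matrix-bridge >/dev/null || return 1
  after=$(docker inspect -f '{{.State.StartedAt}}' matrix-bridge) || return 1
  if [ "$before" = "$after" ]; then
    echo "StartedAt unchanged ($after): restart did not take" >&2
    return 1
  fi
  echo "restarted: StartedAt $before -> $after"
}
```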
### Update (pull + rebuild + recreate) — the headline button
```sh
cd /home/modelo/matrix-bridge \
&& git fetch origin \
&& git reset --hard origin/master \
&& docker compose up -d --build
```
`git reset --hard origin/master` is the deploy-box "always match remote" semantic: never stuck on
divergence, gitignored secrets are preserved, and any hand edits to tracked files on the Spark are
discarded. Output streams live on the panel with a ~25-min ceiling; non-zero exit + stderr are
surfaced. **Workflow: push to Gitea, then click Update.**
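Run by hand, the same chain can be wrapped so it also reports which commit was deployed. A minimal sketch; `update_bot` is a hypothetical name. The `( )` body runs in a subshell so the `cd` doesn't leak into the caller, and the `&&` chain keeps the fail-loud behavior (the first failing step's exit status is returned). Unlike the panel, it applies no time ceiling.

```sh
# Hypothetical by-hand Update: fetch, hard-reset to origin/master, rebuild, recreate.
# Optional first argument overrides the project dir.
update_bot() (
  cd "${1:-/home/modelo/matrix-bridge}" &&
    git fetch origin &&
    git reset --hard origin/master &&
    echo "deploying $(git rev-parse --short HEAD)" &&
    docker compose up -d --build
)
```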
### Stop / Start
```sh
docker stop matrix-bridge # stop
cd /home/modelo/matrix-bridge && docker compose up -d # start (recreates if needed)
```
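The Stop/Start pair can be folded into one toggle keyed off the current status. Start goes through `docker compose up -d` rather than `docker start` because compose recreates the container if it is missing, matching the command above. A hypothetical sketch (`toggle_bot` is illustrative only):

```sh
# Hypothetical toggle: stop if running, otherwise (re)start via compose.
toggle_bot() (
  cd "${1:-/home/modelo/matrix-bridge}" || exit 1
  status=$(docker inspect -f '{{.State.Status}}' matrix-bridge 2>/dev/null)
  if [ "$status" = running ]; then
    docker stop matrix-bridge
  else
    # also covers a missing container: inspect fails and status is empty
    docker compose up -d
  fi
)
```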
---
## Programmatic interface (LAN-only)
The same controls are reachable over HTTP if scripting is ever wanted:
- `POST /api/matrix-bridge/update` → returns an id; `GET .../update/{id}` and
`.../update/{id}/stream` (SSE) for progress.
- `GET /api/matrix-bridge/logs?tail=N`
- status via `GET /api/services`
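For scripting, a minimal `curl` sketch that starts an Update and follows its SSE stream. Two things here are assumptions not stated in this doc: the dashboard's base URL (read from a `SPARK_CONTROL_URL` variable you must set), and the shape of the POST response, which the `sed` below assumes is JSON carrying an `"id"` field. Check the real response before relying on it. `curl -N` turns off output buffering so events print as they arrive.

```sh
# Hypothetical client for the LAN-only API. SPARK_CONTROL_URL must be set to the
# dashboard's base URL (not documented here). The update response is ASSUMED to
# be JSON with an "id" field, e.g. {"id":"abc"} or {"id": 42}.
update_and_follow() {
  if [ -z "${SPARK_CONTROL_URL:-}" ]; then
    echo "set SPARK_CONTROL_URL to the dashboard base URL" >&2
    return 1
  fi
  resp=$(curl -fsS -X POST "$SPARK_CONTROL_URL/api/matrix-bridge/update") || return 1
  # Pull the value after "id": and strip quotes/spaces.
  id=$(printf '%s\n' "$resp" | sed -n 's/.*"id"[[:space:]]*:\([^,}]*\).*/\1/p' | tr -d '" ')
  if [ -z "$id" ]; then
    echo "no id in response: $resp" >&2
    return 1
  fi
  # Follow progress over SSE; -N disables buffering.
  curl -fsSN "$SPARK_CONTROL_URL/api/matrix-bridge/update/$id/stream"
}
```

The logs endpoint is a plain GET, e.g. `curl -fsS "$SPARK_CONTROL_URL/api/matrix-bridge/logs?tail=50"`.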
---
## Future enhancement — truer status (not required; matrix-bridge-side)
Status reports container liveness, not Matrix connectivity — the bot can be `running` yet
disconnected from Synapse. A truer signal needs a Docker `HEALTHCHECK` backed by a bot-side
liveness signal (e.g. the bot touches a file or exposes a tiny endpoint on each successful sync
loop), after which Status could read `{{.State.Health.Status}}`. That's a matrix-bridge-side
change — do it if/when "running but silent" actually bites, then tell the Spark Control dev to
read the health field.
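As a sketch of the file-based variant, assuming the bot were changed to touch a heartbeat file after each successful sync loop, the container's healthcheck could succeed only while that file is recent. The path, the 120-second threshold, and the `heartbeat_fresh` name are all hypothetical, and `stat -c %Y` (mtime in epoch seconds) is the GNU/BusyBox form.

```sh
# Hypothetical healthcheck body. Assumes the bot touches $HEARTBEAT after every
# successful sync loop; path and threshold are placeholders, not existing config.
HEARTBEAT=${HEARTBEAT:-/tmp/matrix-bridge.heartbeat}
MAX_AGE=${MAX_AGE:-120}   # seconds without a sync before reporting unhealthy

heartbeat_fresh() {
  [ -f "$HEARTBEAT" ] || return 1
  now=$(date +%s)
  mtime=$(stat -c %Y "$HEARTBEAT") || return 1   # GNU/BusyBox stat
  [ $((now - mtime)) -le "$MAX_AGE" ]
}
```

Wired in through a Dockerfile `HEALTHCHECK` (or a compose `healthcheck:` entry) that runs this check, Docker would mark the container `healthy` or `unhealthy`, and Status could then read `{{.State.Health.Status}}`. That template only works once a healthcheck exists, since `.State.Health` is absent otherwise.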