Add Phase 3 Spark Control integration spec; mark Phase 2 done
docs/spark-control-integration.md: the SSH command contract (status via docker inspect; restart via docker restart; update via git fetch + reset --hard origin/master + docker compose up -d --build) plus the one-time conversion of the Spark's ~/matrix-bridge to a Gitea clone. No bot code change. Update source = git-pull-from-Gitea; rides Spark Control's existing SSH into spark-32d0 (no new key). Corrected the infra note: Spark is on the LAN with the Start9/Gitea host, so Spark->Gitea resolves directly.
@@ -0,0 +1,186 @@
# Phase 3: Spark Control integration (spec for the Spark Control dev)

**Goal (ROADMAP Phase 3):** surface the matrix-bridge bot's container status on the Spark
Control dashboard, and add one-click **update** (pull + rebuild + restart) and **restart**,
wired the same SSH-behind-buttons way Spark Control already drives the Sparks.

**Exit (falsifiable):** bot status is visible on the panel, and the bot can be
updated/restarted from the panel.

This document is the **contract**: what to run, where, and what the output means. The
matrix-bridge side is fixed below; map the buttons onto Spark Control's existing
managed-service pattern however that codebase already models a Spark/service. No changes to
matrix-bridge are required for this.

---

## What the bot is

A single Docker container on the DGX Spark.

| Fact | Value |
|---|---|
| Host | `spark-32d0` (`10.59.211.6` on WireGuard), user **`modelo`** |
| Project dir | `/home/modelo/matrix-bridge` (`~/matrix-bridge` for modelo) |
| Compose service | `bot` |
| Container name | `matrix-bridge` (fixed via `container_name:`) |
| Image | `matrix-bridge-bot` |
| Lifecycle | host networking, `restart: unless-stopped` (survives Spark reboot) |
| Secrets | `.env`, `config.toml`: **gitignored**, live only on the Spark, never in git |

Spark Control already SSHes into `spark-32d0`, so these ride the existing channel: **no new
key needed.** All commands below assume they run **as `modelo`** (owner of the dir, member of
the `docker` group). If Spark Control's channel connects as a different user, wrap each command
in `sudo -iu modelo bash -lc '<command>'`. Running `git` in modelo's repo as root trips git's
"dubious ownership" guard, so don't skip this.

---

## One-time prerequisites (owner, not Spark Control dev)

The bot dir on the Spark was originally populated by `scp` of loose files. To make
git-pull-based updates work it must become a git clone of the Gitea repo **without disturbing
the gitignored secrets** (`.env`, `config.toml`). Because those two files are untracked and
gitignored, `git reset --hard` never touches them, so we can convert the existing dir in place.

**0a. Confirm the Spark can reach + authenticate to Gitea (fail loud here, not at first button press):**

```sh
git ls-remote ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git >/dev/null \
  && echo "gitea reachable" || echo "FIX gitea access first"
```

The Spark is on the same LAN as the Start9 host running Gitea, so `immense-voyage.local`
resolves directly and this should just work. If it doesn't, the most likely gap is that
`modelo` has no key authorized for read on the Gitea repo (add a deploy key or authorize an
existing key). Don't proceed until `git ls-remote` succeeds.

**0b. Convert `~/matrix-bridge` to a clone tracking `master` (run as `modelo`):**

```sh
cd /home/modelo/matrix-bridge
git init -b master                  # -b needs git >= 2.28
git remote add origin ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git
git fetch origin
git reset --hard origin/master      # secrets are gitignored → untouched
git branch --set-upstream-to=origin/master master
```

Verify the secrets survived and the container still comes up clean:

```sh
ls -la /home/modelo/matrix-bridge/.env /home/modelo/matrix-bridge/config.toml   # both present
git -C /home/modelo/matrix-bridge status --ignored   # .env/config.toml listed as ignored, tree otherwise clean
cd /home/modelo/matrix-bridge && docker compose up -d --build \
  && docker ps --filter 'name=^/matrix-bridge$'
```

`master` is the release branch (today `master == phase-1`). Track whatever you treat as the
release line; the commands below assume `origin/master`.

---

## The contract: commands behind each control

Run from `/home/modelo/matrix-bridge` as `modelo`. Each is idempotent and fail-loud
(non-zero exit ⇒ surface it on the panel; don't swallow).

### Status (poll for the badge)

```sh
docker inspect -f '{{.State.Status}}|{{.State.StartedAt}}|{{.RestartCount}}' matrix-bridge
```

- Output e.g. `running|2026-06-15T18:02:11.4Z|0`. Parse field 1 for the badge:
  - `running` → green/up. Field 3 (`RestartCount`) climbing while status flips to
    `restarting` ⇒ **crash loop**. Show it; that's the most useful signal a dashboard gives here.
  - `exited` → stopped/crashed.
  - `restarting` → unhealthy / boot-looping.
- **Non-zero exit** (`No such object: matrix-bridge`) ⇒ **not deployed**, distinct from
  "stopped". Show that state rather than erroring out.
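
The state mapping above can be sketched as a small shell helper. This is a minimal sketch, not Spark Control's actual code: the function name `badge_from_inspect` and the badge labels (`up`, `stopped`, `crash-loop`, `not-deployed`, `unknown`) are hypothetical; only the `status|startedAt|restartCount` line format comes from the Status command.

```sh
#!/bin/sh
# Hypothetical helper: map one line of Status output to a badge label.
# $1 = stdout of the docker inspect command ("" when it exited non-zero).
badge_from_inspect() {
  line=$1
  if [ -z "$line" ]; then
    echo "not-deployed"
    return 0
  fi
  status=${line%%|*}     # field 1: State.Status
  restarts=${line##*|}   # field 3: RestartCount
  case $status in
    running)    echo "up" ;;
    exited)     echo "stopped" ;;
    restarting) echo "crash-loop (restarts: $restarts)" ;;
    *)          echo "unknown ($status)" ;;   # created, paused, dead, ...
  esac
}
```

A caller would pass in the inspect output, for example `badge_from_inspect "$(docker inspect -f '...' matrix-bridge)"`. Treating every empty result as "not deployed" is a simplification: an unreachable Docker daemon also produces a non-zero exit, so the real wiring may want to look at stderr before choosing that label.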

Friendlier one-liner for a human-readable badge (empty string when not running):

```sh
docker ps --filter 'name=^/matrix-bridge$' --format '{{.Status}}'   # e.g. "Up 2 hours"
```

### Logs (optional "view logs" action, handy for diagnosing a red badge)

```sh
docker logs --tail 100 matrix-bridge
```

### Restart (no code change)

```sh
docker restart matrix-bridge
```

### Update (pull latest code + rebuild + recreate): the headline button

```sh
cd /home/modelo/matrix-bridge \
  && git fetch origin \
  && git reset --hard origin/master \
  && docker compose up -d --build
```

- `git reset --hard origin/master` is the deploy-box "always match remote" semantic: never gets
  stuck on divergence, and gitignored secrets are preserved. (If you'd rather detect divergence,
  `git pull --ff-only` is the gentler alternative, but then a wedged tree needs manual help.)
- `docker compose up -d --build` rebuilds the image and recreates the container only if the
  resulting image or the service config changed. First build after a base-image bump is slow
  (minutes); subsequent builds hit the layer cache. **Treat update as long-running**:
  stream/await output, set a generous timeout (≥10 min), and don't block the dashboard on it.
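
One way to bound the Update command is the coreutils `timeout` utility. This is a minimal sketch, assuming GNU `timeout` is available wherever the command is launched. `run_bounded` is a hypothetical name, and exit code 124 is what GNU `timeout` returns when the limit expires.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command under a deadline and say how it ended.
# $1 = limit in seconds, remaining args = command and its arguments.
run_bounded() {
  limit=$1; shift
  timeout "$limit" "$@" && rc=0 || rc=$?
  case $rc in
    0)   echo "done" ;;
    124) echo "timed out after ${limit}s" ;;   # GNU timeout's code on expiry
    *)   echo "failed (exit $rc)" ;;
  esac
  return "$rc"
}
```

For the Update button this could be called as `run_bounded 600 sh -c '<update command>'`, with 600 seconds matching the 10-minute floor suggested above. The command's own output still streams through unchanged, so the panel can show build progress while it waits.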

### Stop / Start (optional)

```sh
docker stop matrix-bridge                               # stop
cd /home/modelo/matrix-bridge && docker compose up -d   # start (recreates if needed)
```

---

## Spark Control-side wiring (for the dev)

Map the above onto however Spark Control already registers a managed Spark/service:

1. **Register `matrix-bridge`** as a managed service (a tile), targeting `spark-32d0` over the
   existing SSH channel, commands run as `modelo`.
2. **Status badge** ← poll the *Status* command on the panel's normal refresh cadence; map the
   four states above (running / exited / restarting / not-deployed) to your existing badge
   vocabulary. Surface `RestartCount` if your tile can show a secondary metric: a climbing
   count is the crash-loop tell.
3. **Buttons:** `Update`, `Restart` (required for the exit criterion); `Logs`, `Stop`/`Start`
   (optional, nice-to-have).
4. **Fail-loud, surfaced.** Every command's non-zero exit + stderr must reach the panel, not a
   silent failure. This mirrors matrix-bridge's own discipline (a bad launch reports back into
   the room rather than hanging). Especially: a failed `git fetch` (Gitea unreachable) or a
   failed build should show the error, not a stuck spinner.
5. **`Update` is long-running**: see the timeout/streaming note above.
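
The fail-loud rule in step 4 can be sketched as a runner that keeps the exit code and stderr together. This is only an illustration under assumptions: `run_loud` and the `exit=… stderr=…` line format are hypothetical, and Spark Control's real command-runner seam may already capture both.

```sh
#!/bin/sh
# Hypothetical runner: on failure, emit one line carrying exit code and stderr
# so the panel has something concrete to display instead of a stuck spinner.
run_loud() {
  err=$(mktemp) || return 1
  "$@" 2>"$err" && rc=0 || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "exit=$rc stderr=$(cat "$err")"
  fi
  rm -f "$err"
  return "$rc"
}
```

On success the function prints nothing beyond the command's own stdout and returns 0; on failure the caller gets both the non-zero status and the captured error text.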

What I deliberately left generic: the tile's exact place in Spark Control's code, its UI, and
its config schema. That's yours to fit to the existing pattern. If a precise drop-in matters,
share how a Spark is currently registered (config entry + the command-runner seam) and I'll
tailor steps 1–5 to it.

---

## Acceptance (maps to the ROADMAP exit)

- [ ] Status tile shows the bot's live state and flips correctly across a manual
  `docker stop` / `docker start` on the Spark.
- [ ] `Restart` from the panel cycles the container (status returns to `running`).
- [ ] `Update` from the panel pulls a new commit, rebuilds, and recreates the container, and
  surfaces a clear error if Gitea is unreachable or the build fails.

---

## Note: optional future enhancement (not required for Phase 3)

The *Status* command reports container liveness (process up), not Matrix connectivity: the bot
can be `running` yet disconnected from Synapse. A truer signal would need a Docker `HEALTHCHECK`
backed by a bot-side liveness signal (e.g. the bot touches a file or exposes a tiny endpoint on
each successful sync loop), after which Status could read `{{.State.Health.Status}}`. That's a
matrix-bridge-side change, out of scope here. Flag it if/when "running but silent" actually bites.
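
If the file-touch variant were ever adopted, the health probe itself could be as small as a freshness check. This is a sketch under loud assumptions: matrix-bridge does not write a heartbeat file today, the function name `heartbeat_fresh` is hypothetical, and `find -mmin` is a GNU/BSD extension rather than POSIX.

```sh
#!/bin/sh
# Hypothetical probe: succeed only if the heartbeat file was modified recently.
# $1 = heartbeat file path, $2 = maximum age in minutes.
heartbeat_fresh() {
  # find prints the path only when the file exists and is newer than the limit.
  [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}
```

A `HEALTHCHECK` could then call something like `heartbeat_fresh <heartbeat path> 2`, where the path and the two-minute window are placeholders to be chosen on the matrix-bridge side. A missing file counts as unhealthy, which is the desired reading for a bot that has never completed a sync.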