matrix-bridge/AGENTS.md
Keysat a7529eb0b7 Containerize Phase 1 bot: Docker deployment on the Spark
Add Dockerfile, docker-compose.yml, docker-entrypoint.sh, and .dockerignore
so the bot runs detached and survives reboots, replacing the foreground venv run.

The image is generic (no secrets/deployment specifics baked in): host networking
reaches both Synapse and the Mac; .env, config.toml, and the SSH key are mounted
read-only. The entrypoint is the container's environment seam (D4 analog of
launch-claude.sh) — it generates ~/.ssh/config for the mac-bridge alias from
config.toml [mac] (new hostname/user fields) so the bot's `ssh mac-bridge` stays
unchanged. SSH key mounted not baked; first connect uses accept-new host trust.

Proven live on the Spark: container connects to Synapse and real messages launched
drivable sessions on the phone across 2 rooms via the full chain.
2026-06-15 18:40:05 -05:00

matrix-bridge — AGENTS.md

A single-user Matrix bot that turns a message in a project room into a live Claude Code session in that project's repo on the Mac — surfaced to the phone via Claude Code Remote Control. It makes the trigger portable: from anywhere on the WireGuard network, a Matrix message starts a session on the Mac in the correct repo, and Remote Control pushes it to the phone to drive interactively. Single user, private home network, no multi-user/product scope.

Inbox check: At session start, if ~/Projects/standards/INBOX.md exists, scan it for items tagged (matrix-bridge) and surface them before proposing next steps; triage with /triage.

Core flow (v1)

Matrix message in a project room
  → bot (matrix-nio, on the DGX Spark) receives it
  → looks up which repo that room maps to (explicit config — no classification)
  → SSHes to the Mac and runs scripts/launch-claude.sh with (repo_dir, message_text)
  → wrapper cd's into the repo and launches `claude` with the message as the prompt
  → Claude Code Remote Control (auto-enabled) pushes a notification to the phone
  → tap in and drive the session from the Claude app

Room determines the repo; the message text becomes the initial prompt. That is the entire v1 decision surface.

Stack

  • Bot: Python, matrix-nio (from the nio-template scaffold), single Docker container.
  • Runs on: a DGX Spark (always-on Linux, Docker). Not Start9, not the Mac.
  • Mac seam: scripts/launch-claude.sh, a zsh login-shell wrapper that owns all environment setup and launches claude.
  • Config: a readable room→repo mapping file (TOML) — adding a project is a config edit.
  • State: none beyond config in v1; SQLite or flat files only if a later phase needs them.

Placement

| Question | Decision | Rationale |
|---|---|---|
| Sensitivity / sovereignty | Local-only when an LLM is ever involved | v1 makes no LLM call; future intent-parsing must run on a local model via Spark Control — message content may reference investor/portfolio context. Never wire a frontier API to message payloads. |
| Runtime shape | Long-running service (always-listening bot) | Must be up unattended to catch messages. |
| Host | DGX Spark, Docker container | Always-on Linux with Docker; co-located with Qwen3 for future local intent-parsing; reaches both Synapse (network) and the Mac (SSH). |
| s9pk vs container | Plain container | Not on Start9 at all — StartOS only runs s9pk packages; don't pay packaging cost, don't touch Synapse. |
| Model routing | None in v1; future Qwen3 via Spark Control | Keeps the sovereignty boundary; deterministic core first. |
| Data layer | Config file (TOML) | v1 needs no datastore. |
| Interface | Matrix (Element) + phone via Remote Control | "Reachable from phone" already satisfied by WireGuard + Remote Control. |
| Repo home | Local + Gitea backup | ssh://git@immense-voyage.local:59916/grant/matrix-bridge.git |

Commands

  • scripts/launch-claude.sh <repo_dir> <prompt> — the Mac wrapper (Phase 0 deliverable; validate by hand before any bot code).
  • Bot (Phase 1), containerized on the Spark — preferred: from ~/matrix-bridge, docker compose up -d --build (host networking, restart: unless-stopped so it survives reboots; read-only mounts of .env/config.toml/SSH key). Logs: docker compose logs -f. The entrypoint generates ~/.ssh/config for the mac-bridge alias from config.toml [mac] (hostname/user), so the alias resolves inside the container. Override the host key path with MB_SSH_KEY_HOST if it isn't /home/modelo/.ssh/id_ed25519.
  • Bot — venv (dev/fallback): python3 -m venv .venv && .venv/bin/pip install -r requirements.txt, then .venv/bin/python src/bot.py — uses modelo's host ~/.ssh/config for the alias. MB_SSH_ALIAS overrides the SSH target for testing.
  • Deploy: pull the bot files from the Mac (no Gitea needed) — scp mac-bridge:/Users/macpro/Projects/matrix-bridge/{Dockerfile,docker-compose.yml,docker-entrypoint.sh,requirements.txt,config.toml,.env} . and scp -r mac-bridge:/Users/macpro/Projects/matrix-bridge/src ., then rebuild.

Layout

  • AGENTS.md — this file (canonical; CLAUDE.md is a relative symlink to it).
  • ROADMAP.md — Phases 1-4+ with falsifiable exits, plus deferred/future directions.
  • README.md — human-facing intro.
  • scripts/launch-claude.sh — the Mac-side launch wrapper (the only seam that knows the Mac's environment).
  • config.example.toml — room→repo mapping template; the real config.toml is gitignored.
  • scripts/gui-launch.sh — opens the desktop Terminal via osascript (Approach B, D11); calls launch-claude.sh inside it. The bot invokes this over SSH.
  • src/bot.py — the matrix-nio bot (Phase 1): listens in mapped rooms; on a message runs ssh mac-bridge gui-launch.sh; fans out for all-projects; reports failures back to the room.
  • requirements.txt (matrix-nio) · .env.example (credential schema; real .env gitignored).
  • .claude/ — Claude wiring (dir only for now).
  • Dockerfile · docker-compose.yml · docker-entrypoint.sh · .dockerignore — the Phase 1 container (Spark). Generic image (no secrets/deployment specifics baked in); host networking; read-only mounts of .env/config.toml/SSH key. The entrypoint generates ~/.ssh/config for the mac-bridge alias from config.toml [mac] — the container's environment seam (D4 analog of launch-claude.sh).

Decisions (already made — don't relitigate without new information)

Condensed from the scoping workshop. Each: the call, why, what it beat.

  • D1 — matrix-nio, not Maubot. Full control for one custom bot with real SSH-orchestration logic; keeps Spark Control as the single dashboard. Beat: Maubot (competing web UI, management layer we don't need), SimpleMatrixBotLib.
  • D2 — Bot runs on the Spark, not Start9 or the Mac. Always-on Linux + Docker, co-located with Qwen3, reaches Synapse + the Mac. Beat: Start9 (no s9pk), Mac (not always-on; it's the execution target, not the orchestrator).
  • D3 — Synapse stays untouched. Treat the existing StartOS Synapse as a fixed external homeserver; the bot logs in as an ordinary Matrix user over WireGuard/LAN.
  • D4 — The Mac wrapper is the environment seam. A #!/bin/zsh -l wrapper owns PATH/credentials/cd/exec claude; the bot stays dumb and only invokes it over SSH. Beat: inlining source ~/.zprofile && … from the bot (brittle); relying on the default non-interactive SSH shell (the core failure mode — minimal shell loads neither .zprofile nor .zshrc).
  • D5 — Remote Control is the phone-control layer. Native, E2EE, already auto-enabled; execution stays on the Mac. The bot only needs to start the session. Note: outside server mode, one remote session per Claude Code instance.
  • D6 — Room = repo; routing is deterministic in v1. No classification, no LLM, no path branching. Beat (for v1): LLM intent parsing → deferred to D8.
  • D7 — No Nextcloud / CalDAV in v1. Not the pain point; the interesting future (routing Claude/bot outputs into Nextcloud) is real but unscoped.
  • D8 — Intent parsing deferred, but as a "routing brain." When added (Phase 4+): a smart dispatcher that, knowing all repos/contexts, decides which repo applies and what context to inject — not a task-vs-session classifier. MUST run on a local model via Spark Control. Revisit when: the deterministic core (Phases 1-2) is proven.
  • D9 — E2EE deferred (documented tradeoff). Single-user bot over WireGuard on a private LAN; transport is already private and matrix-nio E2EE adds libolm overhead. Revisit when: the bot ever handles sensitive content over untrusted transport.
  • D10 — Spark Control manages the bot (Phase 3). Status on the dashboard + one-click update/restart, the same SSH-behind-buttons pattern Spark Control uses for the Sparks today.
  • D11 — Launch into a desktop Terminal, not a headless token (Phase 0). The SSH session can't reach the GUI login Keychain, so a plain ssh … claude reports "Not logged in." Rather than mint a long-lived claude setup-token, the launcher (scripts/gui-launch.sh) uses osascript to open a Terminal.app window in the GUI session, where claude inherits the existing Keychain login and a real TTY. Beat: the long-lived OAuth token (Approach A) — works and is fully unattended, but adds a credential to manage; kept as the documented fallback if the Mac is ever driven headless (logged out). Cost: requires the Mac logged in + a one-time Terminal Automation grant.

Sovereignty constraint

v1 sends nothing to external services except what is deliberately typed into the Claude Code session itself. The bot's own logic is fully local. When intent parsing is added later it MUST run on a local model via Spark Control — never a frontier API — because it reads message content that may reference investor/LP/portfolio context. Never wire an external API call that carries message payloads.

Implementation guardrails (from the workshop)

  • Quoting through SSH is the known footgun. Message text crosses two shells (the Spark's, then the Mac's). Use shlex.quote (or equivalent) when building the remote command — never naive string-concatenate user text into the SSH command.
  • Fail loud on a bad directory. If a room maps to a missing dir, the wrapper exits non-zero (cd "$1" || exit 1) and the bot reports the failure back into the room — never launches Claude in the wrong place.
  • Config over code for the room→repo mapping.
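The quoting guardrail can be sketched as follows. This is a minimal illustration rather than the code in `src/bot.py`: the function name `build_ssh_argv` is hypothetical, and it assumes the bot starts `ssh` from an argument list instead of a shell string.

```python
import shlex


def build_ssh_argv(alias: str, launcher: str, repo_dir: str, message: str) -> list[str]:
    """Build an argv for ssh in which the remote shell sees exactly three words."""
    # Quote each part so the Mac's shell cannot re-split or expand user text.
    remote = " ".join(shlex.quote(part) for part in (launcher, repo_dir, message))
    # An argv list (no shell=True) means no local shell parses the message,
    # so only the remote shell's parsing has to be defended against.
    return ["ssh", alias, remote]
```

For example, `build_ssh_argv("mac-bridge", "gui-launch.sh", "/Users/macpro/Projects/recap", "fix the 'login' bug")` yields a single remote-command string in which the apostrophes are escaped, so the launcher receives the message as one argument.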

Definition of done per phase

Substance threshold N = 3 real uses, defined per phase in ROADMAP.md. "Done" means falsifiable, scaled substance (it worked 3 real times), never a checkbox. A phase that "works once" is not done.

Current state

  • Scaffolded 2026-06-15 from a prior scoping package (SPEC/DECISIONS/CLAUDE/KICKOFF), folded into this AGENTS.md (decisions + placement), ROADMAP.md (phases), and the wrapper + config skeleton. No bot code yet — by design.
  • Phase 0 — SSH leg proven (2026-06-15). Mac Remote Login is on. The Spark spark-32d0 (user modelo) reaches the Mac over starttunnel/WireGuard at 10.59.211.5, not the LAN (the Spark isn't on the Mac's LAN subnet). A dedicated per-machine key (spark-control@spark-32d0 = ~/.ssh/id_ed25519 on the Spark) is in the Mac's authorized_keys. SSH alias mac-bridge in the Spark's ~/.ssh/config selects that key (IdentityFile ~/.ssh/id_ed25519 + IdentitiesOnly yes) — required because the pre-existing Host * → id_ed25519_shared rule otherwise shadows the default key. The bot's entire Mac hop is therefore ssh mac-bridge '<command>'. Phase 1: bake the dedicated key + an equivalent alias/config into the bot's Docker image (modelo's ~/.ssh/config won't exist in the container).
  • Phase 0 — launch chain proven end-to-end (2026-06-15). ssh mac-bridge → gui-launch.sh → launch-claude.sh → authenticated claude → phone via Remote Control works against a real repo (premier-gunner). Chose Approach B (desktop Terminal) over a headless token — see D11. Two pieces it took: (1) ~/.local/bin (where claude lives) had to be added to ~/.zprofile, because a non-interactive login shell skips .zshrc; (2) scripts/gui-launch.sh opens a Terminal.app window via osascript so claude runs inside the GUI session (login Keychain + real TTY) — needed a one-time "Allow ssh to control Terminal" Automation grant. Known caveats for the bot: (a) a never-trusted repo stalls at Claude's first-run folder-trust gate — unattended launches must target already-trusted repos or pass a skip flag; (b) if the TCC Automation grant ever resets, a launch stalls until someone clicks Allow — the bot should detect a failed launch and report it back to the room, not hang.
  • Phase 0 — Matrix bot user live (2026-06-15). Homeserver is the StartOS Synapse exposed on clearnet at https://matrix.gilliam.ai (server_name = matrix.gilliam.ai, Synapse 1.154.0) — not the stale @gilliam:<onion> account found in Element. Created a dedicated non-admin bot @agent:matrix.gilliam.ai (type bot) via the Synapse Admin Dashboard (StartOS "Create Bot User" is appservice-only/greyed out). Minted a long-lived access token (fixed device_id matrix-bridge-bot), verified via whoami, and stored homeserver/user/token/device_id (+ password for recovery) in the gitignored .env (chmod 600). config.toml holds homeserver+user; .env.example documents the schema. Bot reuses the stored token — never re-login per start (avoids device churn); no E2EE (D9). Note: the bot↔Synapse hop is now public-internet TLS, which softens D9's "transport already WireGuard-private" rationale (still TLS to the user's own server, single-user content) — revisit if it matters.
  • Phase 0 — rooms mapped (2026-06-15). 9 project rooms in config.toml (premier-gunner, recap, recap-relay, spark-control, ten31-transcripts, ten31-signal-engine, keysat, proof-of-work, ten31-database), each room_id → /Users/macpro/Projects/<repo>. @agent is joined to all 9 (via its token), so the Phase-1 bot will see messages in each. Manual by-hand launches must keep message text free of single and double quotes (' and ") — the typed SSH command line breaks on them (PS2 > hang); the Phase-1 bot avoids this via shlex.quote.
  • Phase 0 — PROVEN / DONE (2026-06-15). N=3 by-hand runs succeeded across multiple rooms (recap, spark-control, premier-gunner): each opened a Terminal in the right repo, started claude on the message, and pushed a drivable session to the phone. The deterministic core holds. Added session naming: launch-claude.sh now runs claude -n "<repo> - <topic>" (topic from the message, overridable via $MB_SESSION_NAME) so Remote Control's phone index is readable — confirmed -n drives the phone app's conversation label.
  • Phase 1 — bot working, sub-steps 1-3 PROVEN (2026-06-15). src/bot.py (matrix-nio) logs in as @agent with the stored token, listens in all 12 rooms, and on a message runs ssh mac-bridge gui-launch.sh <repo> <message> (via shlex.quote), replies in-room, fans out for #all-projects (each session named <repo> - <date>), and reports failures back (fail-loud). Tested on the Spark (~/matrix-bridge, venv) — launches worked across several rooms (N=3). Now 11 project rooms + all-projects; config.toml has a [mac] section (ssh_alias + launcher).
  • Phase 1 — DONE: containerized + proven on the Spark (2026-06-15). The bot runs as a Docker container on the Spark (~/matrix-bridge, docker compose up -d --build): generic image (python:3.12-slim + openssh-client), host networking, restart: unless-stopped (survives reboots), read-only mounts of .env/config.toml/SSH key. docker-entrypoint.sh generates ~/.ssh/config for mac-bridge from config.toml [mac] (added hostname=10.59.211.5, user=macpro) — the container's env seam (D4 analog of launch-claude.sh); SSH key mounted not baked; first connect uses StrictHostKeyChecking=accept-new (private-WireGuard tradeoff, D9). Proven live: container connects to Synapse (listening as @agent… 11 rooms) and real messages in 2 different rooms each launched a drivable session on the phone via the full chain (container → ssh mac-bridge → gui-launch.sh → claude → phone), rc=0 — confirming the new container→Mac SSH hop over WireGuard (mounted key + accept-new host trust). Formal exit was N=3; the owner accepted 2 live launches across 2 rooms + the clear repeatable pattern as done. Build-time checks on the Mac also passed (image builds, ssh -G mac-bridge resolves, entrypoint perms 700/600).
  • Spark-side ops are owner-run. The Mac has no authorized SSH key into the Spark (modelo@10.59.211.6 — reachable over WireGuard but not authenticated; Phase 0 only set up the reverse, mac-bridge). So deploys/restarts on the Spark are run by the owner from the Spark, not driven from the Mac — until Phase 3 wires it behind Spark Control.
  • Next (open — discuss before building): Phase 2 (multi-room routing) is effectively already satisfied — the bot was built multi-room (11 rooms + all-projects) and routed correctly across 2 rooms in the Phase 1 proof; only a formal confirmation pass remains. Live candidates: Phase 3 (Spark Control: bot status + one-click update/restart on the dashboard, the SSH-behind-buttons pattern — also closes the owner-run-ops gap above) or the headless "ask" mode from ROADMAP.md (a message runs claude -p and posts the answer back into the room).
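Caveat (b) in the Phase 0 launch-chain notes above asks the bot to detect a stalled or failed launch and report it rather than hang. A minimal sketch of that behavior, assuming the launch is a subprocess with a bounded wait; the function name `run_launch` and the 30-second default are illustrative, not taken from `src/bot.py`.

```python
import subprocess


def run_launch(argv: list[str], timeout_s: float = 30.0) -> tuple[bool, str]:
    """Run a launch command; return (ok, detail) without blocking past timeout_s."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # e.g. a launch stuck waiting on an Automation prompt on the Mac
        return False, f"launch timed out after {timeout_s:g}s"
    except OSError as exc:
        return False, f"could not start launch command: {exc}"
    if proc.returncode != 0:
        return False, f"launch failed (rc={proc.returncode}): {proc.stderr.strip()}"
    return True, "launched"
```

The `detail` string is what the bot would post back into the room on failure, so a stalled launch surfaces as a message instead of silence.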