# Cluster coordination through Spark Control (v0.25.0)

Spark Control is the **GPU arbiter, not a job runner.** Your recurring pipelines (model-warming crons, "daily X" generators, batch jobs) live in your own services and *drive Spark Control's swap API*. This page documents the safety layer around that: a **swap reservation lock**, a **swap-event webhook**, and a **read-only schedule registry**. If only the dashboard ever swaps models, you don't need any of this; it's for when something automated also swaps.

All endpoints are on the Spark Control host (same LAN/VPN URL as the LLM, audio, and embeddings proxies). There is no API-token auth by design (LAN + split-tunnel VPN only); a non-browser client passes the same-origin guard automatically.

---

## 1. Swap reservation lock

The lock is a short, TTL-bounded reservation of the swap path. While a lock is held, **any real swap that doesn't present the holder's token is refused with `423 Locked`**, including the dashboard's manual swap. The holder *name* is descriptive; the returned **token** is the secret that authorises swaps and the release.

The lock is in-memory: it resets to *unlocked* if Spark Control restarts (the safe-for-availability default). Independently of the lock, the swap engine's own in-progress guard prevents two swaps from running at once.

### `POST /api/swap/lock`: acquire (or extend)

```json
// request
{ "holder": "openclaw-daily-vol", "ttl_seconds": 900, "note": "daily vol run" }

// 200 response
{
  "held": true,
  "holder": "openclaw-daily-vol",
  "acquired_at": "2026-06-17T12:00:00+00:00",
  "expires_at": "2026-06-17T12:15:00+00:00",
  "seconds_remaining": 900,
  "note": "daily vol run",
  "token": "a1b2c3…"   // SECRET: store it; needed to swap and to release
}
```

- `ttl_seconds` is optional (default 900) and clamped to `[1, 86400]`.
- **`409`** if a *different* holder already holds it (the response body includes the current `lock` state).
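To illustrate the acquire call, here is a minimal Python sketch that uses only the standard library and the request/response shapes shown above. The `http_send` transport, the `try_acquire_lock` name, and the `(acquired, body)` return shape are hypothetical choices for this example. They are not part of Spark Control.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# send(method, url, headers, body) -> (status, parsed JSON body); swap in a fake for tests.
Send = Callable[[str, str, dict, Optional[bytes]], Tuple[int, dict]]


def http_send(method: str, url: str, headers: dict, body: Optional[bytes]) -> Tuple[int, dict]:
    """Stdlib transport that returns (status, JSON) for both 2xx and 4xx replies."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read() or b"{}")


def try_acquire_lock(base_url: str, holder: str, ttl_seconds: int = 900,
                     note: Optional[str] = None, send: Send = http_send) -> Tuple[bool, dict]:
    """POST /api/swap/lock (hypothetical helper).

    Returns (True, response) on 200; keep the response's `token`, it is the secret.
    Returns (False, lock_state) on 409, when a different holder owns the lock.
    """
    payload = {"holder": holder, "ttl_seconds": ttl_seconds}
    if note is not None:
        payload["note"] = note
    status, data = send("POST", f"{base_url}/api/swap/lock",
                        {"Content-Type": "application/json"}, json.dumps(payload).encode())
    if status == 200:
        return True, data
    if status == 409:
        return False, data.get("lock", {})
    raise RuntimeError(f"unexpected {status} from /api/swap/lock: {data}")
```

Usage might look like `acquired, info = try_acquire_lock("http://spark-control.lan:8080", "openclaw-daily-vol", note="daily vol run")`, where the host URL is a placeholder. The helper returns `(False, lock_state)` on `409` instead of raising, so a scheduler can log `info["holder"]` and skip the run. The `send` parameter exists only so the transport can be replaced by a fake.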
To **extend** your own lock, POST again with the same `holder` **and** your `token`; the token is preserved and the window slides forward.

### `GET /api/swap/lock`: status (no token)

```json
{ "held": true, "holder": "openclaw-daily-vol", "expires_at": "…", "seconds_remaining": 612, "note": "…" }
// or
{ "held": false }
```

### `DELETE /api/swap/lock`: release

Send your token in the `X-Swap-Lock-Token` header (or as the `?token=` query parameter):

```
DELETE /api/swap/lock
X-Swap-Lock-Token: a1b2c3…
```

- **`403`** if the token doesn't match.

The dashboard's human override is `DELETE /api/swap/lock?force=true` (no token).

### Swapping while you hold the lock

Pass the token on the swap call. The dashboard, which sends no token, is then blocked:

```
POST /api/swap
X-Swap-Lock-Token: a1b2c3…

{ "model_key": "gemma-3-27b" }
```

Recommended scheduler flow: **acquire → swap (with token) → poll `/api/swap/{id}` → release**. Always release in a `finally`; if your process crashes, the TTL expiry frees the lock.

> `POST /api/swap/{key}/validate` (pre-flight) and dry-run swaps are **not**
> blocked by the lock, because they don't touch the cluster.

---

## 2. Swap-event webhook

Configure a URL in **Configure Sparks → "Swap webhook URL"**. After every real swap, Spark Control POSTs:

```json
{
  "event": "swap_complete",     // or "swap_failed"
  "job_id": "1a2b3c4d",
  "model_key": "gemma-3-27b",
  "state": "ready",             // or "failed"
  "returncode": 0,
  "started_at": "2026-06-17T12:00:00+00:00",
  "finished_at": "2026-06-17T12:03:11+00:00",
  "dry_run": false
}
```

Headers: `X-Spark-Event: swap_complete`. If you set a **webhook secret**, the body is signed: the `X-Spark-Signature` header is `sha256=` followed by the hex-encoded HMAC-SHA256 of the raw body, keyed with the shared secret.
Verify it like:

```python
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    # Recompute "sha256=<hex HMAC-SHA256 of the raw bytes>" and compare in constant time.
    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Pass the body bytes exactly as received (before any JSON parsing) and the header value, e.g. `request.headers.get("X-Spark-Signature", "")`.

Delivery is best-effort and fire-and-forget (5 s timeout, no retries); a webhook failure never affects the swap itself. Dry runs don't fire the webhook.

---

## 3. Schedule registry (read-only display)

So the dashboard can show *what's scheduled to touch the GPU and when*, your schedulers register their jobs here. **Spark Control only displays these; it never executes them.**

### `POST /api/schedule`: register / update

```json
// request (pass a stable `id` to update in place on re-register)
{
  "id": "daily-vol",
  "name": "Daily Vol",
  "owner": "openclaw",
  "cron": "0 6 * * *",
  "next_run": "2026-06-18T06:00:00Z",
  "description": "Swaps to the big model, generates the vol report"
}
// response: the stored entry (an id is generated if you omit one)
```

`name` is required. `id` (if given) may contain only the characters `[A-Za-z0-9_.-]` and be at most 64 characters long.

### `GET /api/schedule`: list

```json
{
  "schedules": [
    {
      "id": "daily-vol",
      "name": "Daily Vol",
      "owner": "openclaw",
      "cron": "0 6 * * *",
      "next_run": "…",
      "description": "…",
      "registered_at": "…",
      "updated_at": "…"
    }
  ]
}
```

### `DELETE /api/schedule/{id}`: deregister

```json
{ "deleted": true }
```

The registry is in-memory. Re-register your schedules when your own service starts so they survive a Spark Control restart.
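As a rough illustration of how the pieces fit together, a scheduler process might re-register its entry on startup and then run the recommended acquire → swap → poll → release flow at each tick. This Python sketch uses only the standard library and makes several assumptions that the API description does not pin down:

- `POST /api/swap` answers with a `job_id`.
- `GET /api/swap/{id}` reports a `state` that ends at `ready` or `failed` (the values the webhook uses).
- Any 2xx status means success.

`http_send`, `register_schedule`, and `run_swap_job` are hypothetical names.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# send(method, url, headers, body) -> (status, parsed JSON); injectable so tests can fake it.
Send = Callable[[str, str, dict, Optional[bytes]], Tuple[int, dict]]

# Assumption: the swap job's terminal states match the webhook's `state` values.
TERMINAL_STATES = {"ready", "failed"}


def http_send(method: str, url: str, headers: dict, body: Optional[bytes]) -> Tuple[int, dict]:
    """Stdlib transport returning (status, JSON) for both success and HTTP-error replies."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read() or b"{}")


def _json(payload: dict) -> Tuple[dict, bytes]:
    """Headers and encoded body for a JSON request."""
    return {"Content-Type": "application/json"}, json.dumps(payload).encode()


def register_schedule(base_url: str, entry: dict, send: Send = http_send) -> dict:
    """POST /api/schedule; a stable `id` in `entry` makes re-registration update in place."""
    headers, body = _json(entry)
    status, data = send("POST", f"{base_url}/api/schedule", headers, body)
    if not 200 <= status < 300:
        raise RuntimeError(f"schedule registration failed: {status} {data}")
    return data


def run_swap_job(base_url: str, holder: str, model_key: str, send: Send = http_send,
                 sleep: Callable[[float], None] = time.sleep,
                 poll_interval: float = 5.0, max_polls: int = 120) -> Optional[str]:
    """Acquire -> swap (with token) -> poll -> release.

    Returns the final job state, or None if a different holder owns the lock (409).
    """
    headers, body = _json({"holder": holder, "ttl_seconds": 900})
    status, lock = send("POST", f"{base_url}/api/swap/lock", headers, body)
    if status == 409:
        return None  # someone else is swapping; skip this run
    if status != 200:
        raise RuntimeError(f"lock acquire failed: {status} {lock}")
    token_header = {"X-Swap-Lock-Token": lock["token"]}
    try:
        headers, body = _json({"model_key": model_key})
        status, job = send("POST", f"{base_url}/api/swap", {**headers, **token_header}, body)
        if not 200 <= status < 300:
            raise RuntimeError(f"swap refused: {status} {job}")
        # Assumption: the swap response carries `job_id`, and polling returns `state`.
        poll_url = f"{base_url}/api/swap/{job['job_id']}"
        for _ in range(max_polls):
            _, info = send("GET", poll_url, {}, None)
            if info.get("state") in TERMINAL_STATES:
                return info["state"]
            sleep(poll_interval)
        raise TimeoutError(f"swap job {job['job_id']} did not reach a terminal state")
    finally:
        # Always release; if this process dies instead, the lock's TTL frees it.
        send("DELETE", f"{base_url}/api/swap/lock", token_header, None)
```

On startup you might call `register_schedule(base, {"id": "daily-vol", "name": "Daily Vol", "owner": "openclaw", "cron": "0 6 * * *"})`. At each cron tick you might call `run_swap_job(base, "openclaw-daily-vol", "gemma-3-27b")` and treat a `None` result as "another holder has the lock; skip this run". The 900 s lock TTL is deliberately longer than the worst-case poll window under these defaults (`poll_interval * max_polls` = 600 s), so the lock should not lapse mid-swap.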