v0.25.0:0 - cluster coordination layer (swap lock + webhook + schedule registry)

GPU-arbiter safety layer for when automation, not just the dashboard, swaps
models:
- swap reservation lock (POST/GET/DELETE /api/swap/lock); 423-enforced in
  post_swap via a single-read gate, TTL-bounded, secret-token auth, human
  force-release override + dashboard banner
- swap webhook (swap_complete/swap_failed) fired outside the swap lock, optional
  HMAC signature, configurable URL+secret
- read-only schedule registry (GET/POST/DELETE /api/schedule) + dashboard panel

New module image/app/coordination.py; docs/COORDINATION.md for consumers; 22
offline tests in test_coordination.py.
This commit is contained in:
Keysat
2026-06-18 07:07:08 -05:00
parent dd3d1412d4
commit 7ae6ab3ba8
15 changed files with 1026 additions and 15 deletions
@@ -35,6 +35,11 @@ export const sparkConfigSchema = z.object({
open_webui_url: z.string().catch(''),
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
ngc_api_key: z.string().catch(''),
// Optional coordination webhook: POSTed on swap_complete/swap_failed so
// downstream consumers re-point their model config. Blank => disabled.
swap_webhook_url: z.string().catch(''),
// Optional shared secret; if set, the webhook body is HMAC-signed.
swap_webhook_secret: z.string().catch(''),
})
export type SparkConfig = z.infer<typeof sparkConfigSchema>