spark-control

Files

Grant 1602b3b3b4 v0.8.0:4 - vLLM deep-health: 'no model loaded' is idle, not a wedge

Previously, a ConnectError on /v1/models classified vLLM as failing, which fed into the wedge auto-restart heuristic. But when no model is loaded (the normal idle state between swaps, or after a failed swap leaves the vllm_node container up with no process serving), nothing is listening on port 8888. That is by design, not a wedge.

The vLLM probe now does a two-step check:
  1. GET /v1/models. ConnectError or empty list -> ok=true with note='no model currently loaded (idle)'. No auto-restart is triggered (it wouldn't help anyway: restarting vllm_node kills any loaded model and doesn't load a new one).
  2. If a model is loaded, POST a 1-token chat completion. A 5xx here is a genuine wedge worth restarting for.

Result: deep-health correctly reports 'no model loaded' as informational rather than flagging it as a failure. Auto-restart for vLLM fires only when a model is actually loaded AND inference fails, which is the intended semantics.
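
The two-step decision described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `probe_vllm`, `ProbeResult`, the local `ConnectError` stand-in, and the two injected callables are hypothetical names, and the HTTP calls are passed in as functions so the logic can be exercised without a running server. Treating any status below 500 as healthy is also an assumption; the commit message only says that a 5xx counts as a wedge.

```python
from dataclasses import dataclass
from typing import Callable, List


class ConnectError(Exception):
    """Hypothetical stand-in for the HTTP client's connection error."""


@dataclass
class ProbeResult:
    ok: bool        # what deep-health reports for vLLM
    note: str       # human-readable explanation
    restart: bool   # whether the wedge auto-restart heuristic should fire


def probe_vllm(
    list_models: Callable[[], List[str]],
    chat_status: Callable[[str], int],
) -> ProbeResult:
    """Classify vLLM health with the two-step check.

    list_models: performs GET /v1/models and returns model ids (assumed shape).
    chat_status: performs the 1-token chat completion for a model id and
                 returns the HTTP status code (assumed shape).
    """
    # Step 1: nothing listening, or an empty list, means idle rather than wedged.
    try:
        models = list_models()
    except ConnectError:
        models = []
    if not models:
        return ProbeResult(True, "no model currently loaded (idle)", False)

    # Step 2: a model is loaded, so a failed inference is a genuine wedge.
    status = chat_status(models[0])
    if status >= 500:
        return ProbeResult(False, f"inference returned HTTP {status}", True)

    # Assumption: any non-5xx response counts as healthy in this sketch.
    return ProbeResult(True, "inference ok", False)
```

Separating the classification from the HTTP transport keeps the restart decision easy to unit-test: a refused connection and an empty model list both yield ok=True with no restart, while only a loaded model plus a 5xx response sets the restart flag.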

2026-05-12 14:50:00 -05:00

static

v0.8.0 - Deep health probes + auto-restart on CUDA wedge

2026-05-12 14:41:01 -05:00

__init__.py

Initial scaffold: image/ FastAPI app, models.yaml, docs

2026-05-12 09:29:13 -05:00

config.py

v0.4.0 - NIM installer + dashboard resilience

2026-05-12 12:32:29 -05:00

connectivity.py

v0.6.0 - Service-level connectivity tracking + passive failure-report endpoint