v0.7.0 - Pre-flight launch validation (Test button on every model card)

validate.py:
- Builds the same args list a real swap would pass to 'vllm serve'
- SSHes into Spark 1 and runs vLLM's own argparse layer inside the running vllm_node container, WITHOUT initializing the engine
- Uses FlexibleArgumentParser (from vllm.utils.argparse_utils, with fallback to engine.arg_utils) + make_arg_parser, the exact same parser the 'vllm serve' CLI uses. Earlier attempt with bare argparse.ArgumentParser was too strict (rejected '--moe_backend' with underscore that the real CLI accepts via FlexibleArgumentParser's normalization)
- Returns structured {ok, stage, error, cmd_args, launch_cmd} so the UI can surface the exact failure cause

Endpoint: POST /api/swap/{key}/validate. Cheap (~5s), no engine init, no disruption to the currently-loaded model.

Frontend: 'Test' button on every model card, inline result below the action row (green check or red detailed error). Result stays visible until the user reloads or clicks Test again.

Catches: typos in flag names, deprecated/removed flags after a vLLM upgrade, type mismatches.

Does NOT catch runtime-only failures (Mamba block-size assertion, OOM at load, kernel-compat). Ok=true is necessary-but-not-sufficient; ok=false is definitive 'don't bother running it'.
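
The error-capture idea behind the validator can be sketched with the standard library alone. This is a minimal illustration, not the code shipped in validate.py: `build_parser`, `ArgError`, and the two flags are hypothetical stand-ins for the parser vLLM builds for 'vllm serve'.

```python
import argparse


class ArgError(Exception):
    """Raised in place of argparse's default behaviour of exiting with code 2."""


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; the real validator uses vLLM's own parser.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--model", required=True)
    parser.add_argument("--max-model-len", type=int)

    def _raise(message: str) -> None:
        raise ArgError(message)

    # Shadow the bound method so parse errors surface as exceptions.
    parser.error = _raise
    return parser


def validate(arglist: list[str]) -> dict:
    """Parse only; nothing is launched. Returns a structured verdict."""
    parser = build_parser()
    try:
        ns = parser.parse_args(arglist)
    except ArgError as e:
        return {"ok": False, "stage": "parse", "error": str(e)}
    return {"ok": True, "model": ns.model}
```

A misspelled flag such as `--max-modle-len` comes back as `ok: False` with argparse's own message in `error`, which is the kind of text the UI can show verbatim.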
v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch
2026-05-12 13:37:37 -05:00 · 2026-05-12 13:22:24 -05:00 · 2026-05-12 13:19:27 -05:00 · 2026-05-12 12:51:49 -05:00
13 changed files with 746 additions and 9 deletions
@@ -84,6 +84,24 @@ Other services on your LAN can hit `GET /api/endpoints` to learn where the curre

 `base_url` is filled in whenever Configure Sparks has been completed (even if the underlying service isn't currently up). Pair the URL with `ready: true` to safely route traffic.

+## Reporting failures from external apps
+
+Spark Control polls every 5 s, so a brief blip in Parakeet/Magpie/vLLM availability can slip between polls and never make it into the connectivity log. To capture short failures, an external app (e.g. Open WebUI) can POST whenever a call fails (or succeeds):
+
+```bash
+curl -X POST http://<dashboard-url>/api/health-event \
+  -H 'content-type: application/json' \
+  -d '{
+    "service": "parakeet",
+    "ok": false,
+    "source": "open-webui",
+    "error": "HTTP 503",
+    "ms": 420
+  }'
+```
+
+Fields: `service` (required), `ok` (required), `source` (optional, free-form), `error` (optional), `ms` (optional latency in milliseconds). Each POST appends a `report` event to the connectivity log alongside the polling-based transition events. In the stored event, `error` appears as `detail` and `ms` as `latency_ms`.
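
For apps that would rather report from code than shell out to curl, the same request can be sketched in Python with only the standard library. The helper names `build_health_event` and `send_health_event` and the 2-second timeout are illustrative assumptions; only the endpoint path and field names come from the description above.

```python
import json
import urllib.request
from typing import Optional


def build_health_event(base_url: str, service: str, ok: bool,
                       source: Optional[str] = None,
                       error: Optional[str] = None,
                       ms: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/health-event."""
    body = {"service": service, "ok": ok}
    # Optional fields are omitted entirely when not provided.
    for name, value in (("source", source), ("error", error), ("ms", ms)):
        if value is not None:
            body[name] = value
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/health-event",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def send_health_event(request: urllib.request.Request, timeout: float = 2.0) -> bool:
    """Send the report; never raise, so reporting cannot break the caller."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

Swallowing network errors in `send_health_event` is a deliberate choice in this sketch: a health report is best-effort and should not turn one failed call into two.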
+
 ## Status

 **v0.2.3** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
@@ -0,0 +1,190 @@
+"""Track up/down transitions for any subject (Sparks AND services) and cache MACs.
+
+Persisted to /data/connectivity.json. Schema:
+
+    {
+      "macs": { "spark1": "aa:bb:..", "spark2": "11:22:.." },
+      "current": { "spark1": "up", "parakeet": "up", "magpie": "down", ... },
+      "last_change": { ... },
+      "events": [
+        # Active-probe transition (logged when state flips during polling)
+        { "subject": "spark2", "at": "...", "kind": "transition",
+          "transition": "down" },
+        { "subject": "spark2", "at": "...", "kind": "transition",
+          "transition": "up", "down_seconds": 4500 },
+
+        # Passive report (logged whenever an external app POSTs to
+        # /api/health-event regardless of state change)
+        { "subject": "parakeet", "at": "...", "kind": "report",
+          "ok": false, "source": "open-webui",
+          "detail": "Connection refused", "latency_ms": 320 },
+      ]
+    }
+
+Legacy events from v0.5 with `spark` instead of `subject` and no `kind` field
+are read transparently as kind="transition".
+"""
+from __future__ import annotations
+import json
+import os
+import threading
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Optional
+
+
+MAX_EVENTS = 200  # rolling window — plenty for showing recent history
+
+
+def _path() -> str:
+    return os.environ.get("CONNECTIVITY_LOG", "/data/connectivity.json")
+
+
+_lock = threading.Lock()
+
+
+def _read() -> dict:
+    try:
+        with open(_path()) as f:
+            return json.load(f) or {}
+    except (FileNotFoundError, json.JSONDecodeError):
+        return {}
+
+
+def _write(data: dict) -> None:
+    p = _path()
+    Path(p).parent.mkdir(parents=True, exist_ok=True)
+    tmp = p + ".tmp"
+    with open(tmp, "w") as f:
+        json.dump(data, f, indent=2, sort_keys=False)
+    os.replace(tmp, p)
+
+
+def load() -> dict:
+    with _lock:
+        d = _read()
+        d.setdefault("macs", {})
+        d.setdefault("current", {})
+        d.setdefault("last_change", {})
+        d.setdefault("events", [])
+        return d
+
+
+def record_mac(subject: str, mac: Optional[str]) -> None:
+    if not mac:
+        return
+    with _lock:
+        d = _read()
+        d.setdefault("macs", {})
+        if d["macs"].get(subject) != mac:
+            d["macs"][subject] = mac
+            _write(d)
+
+
+def record_state(subject: str, reachable: bool) -> Optional[dict]:
+    """Update current state for `subject`. If it differs from the last seen
+    state, append a transition event. Returns the event dict if a transition
+    was recorded, else None.
+
+    `subject` can be a Spark host key (spark1/spark2) or a service name
+    (parakeet/magpie/vllm).
+    """
+    new_state = "up" if reachable else "down"
+    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
+    with _lock:
+        d = _read()
+        d.setdefault("macs", {})
+        d.setdefault("current", {})
+        d.setdefault("last_change", {})
+        d.setdefault("events", [])
+        prev = d["current"].get(subject)
+        if prev == new_state:
+            return None
+        event: dict = {
+            "subject": subject,
+            "at": now,
+            "kind": "transition",
+            "transition": new_state,
+        }
+        # When we have a previous state and timestamp, compute duration
+        last_change = d["last_change"].get(subject)
+        if prev and last_change:
+            try:
+                prev_dt = datetime.fromisoformat(last_change.replace("Z", "+00:00"))
+                duration = (datetime.now(timezone.utc) - prev_dt).total_seconds()
+                if prev == "down" and new_state == "up":
+                    event["down_seconds"] = round(duration)
+                if prev == "up" and new_state == "down":
+                    event["up_seconds"] = round(duration)
+            except ValueError:
+                pass
+        d["current"][subject] = new_state
+        d["last_change"][subject] = now
+        d["events"].append(event)
+        if len(d["events"]) > MAX_EVENTS:
+            d["events"] = d["events"][-MAX_EVENTS:]
+        _write(d)
+        return event
+
+
+def record_report(
+    subject: str,
+    *,
+    ok: bool,
+    source: str = "external",
+    detail: str = "",
+    latency_ms: Optional[int] = None,
+) -> dict:
+    """Record a passive report from an external caller (e.g. Open WebUI got a
+    503 calling Parakeet). Always appended to the events list; does NOT change
+    the active-probe state, for which only the polling probe is authoritative.
+    """
+    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
+    with _lock:
+        d = _read()
+        d.setdefault("events", [])
+        event: dict = {
+            "subject": subject,
+            "at": now,
+            "kind": "report",
+            "ok": bool(ok),
+            "source": source or "external",
+        }
+        if detail:
+            event["detail"] = detail
+        if latency_ms is not None:
+            event["latency_ms"] = int(latency_ms)
+        d["events"].append(event)
+        if len(d["events"]) > MAX_EVENTS:
+            d["events"] = d["events"][-MAX_EVENTS:]
+        _write(d)
+        return event
+
+
+def get_mac(subject: str) -> Optional[str]:
+    d = load()
+    return d.get("macs", {}).get(subject)
+
+
+def _normalize_event(e: dict) -> dict:
+    """Promote legacy v0.5 events to the v0.6 shape so the UI sees one schema."""
+    if "subject" in e:
+        e.setdefault("kind", "transition")
+        return e
+    # Legacy: had "spark" + "transition" only
+    if "spark" in e:
+        e["subject"] = e.pop("spark")
+        e.setdefault("kind", "transition")
+    return e
+
+
+def summary() -> dict:
+    """Compact summary for the UI: known MACs, current state, recent events."""
+    d = load()
+    events = [_normalize_event(dict(e)) for e in d.get("events", [])]
+    return {
+        "macs": d.get("macs", {}),
+        "current": d.get("current", {}),
+        "last_change": d.get("last_change", {}),
+        "events": events[-80:],
+    }
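
The transition logic in `record_state` above can be exercised without touching disk. The following is a simplified in-memory sketch under stated assumptions: the `log` dict and the injected `now` parameter are illustrative, and locking and JSON persistence are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_EVENTS = 200  # same rolling-window size as the module above


def record_state(log: dict, subject: str, reachable: bool,
                 now: datetime) -> Optional[dict]:
    """Append a transition event only when the state actually flips."""
    new_state = "up" if reachable else "down"
    prev = log["current"].get(subject)
    if prev == new_state:
        return None  # deduplicated: no flip, nothing logged
    event = {"subject": subject, "at": now.isoformat(),
             "kind": "transition", "transition": new_state}
    last = log["last_change"].get(subject)
    if prev and last:
        # How long the previous state lasted before this flip.
        elapsed = round((now - last).total_seconds())
        event["down_seconds" if prev == "down" else "up_seconds"] = elapsed
    log["current"][subject] = new_state
    log["last_change"][subject] = now
    log["events"].append(event)
    del log["events"][:-MAX_EVENTS]  # keep only the newest MAX_EVENTS entries
    return event


log = {"current": {}, "last_change": {}, "events": []}
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
first = record_state(log, "spark2", False, t0)
repeat = record_state(log, "spark2", False, t0 + timedelta(seconds=5))
back = record_state(log, "spark2", True, t0 + timedelta(seconds=4500))
```

Here `repeat` is `None` because the state did not change, and `back` carries `down_seconds` equal to 4500, matching the schema example in the module docstring.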
@@ -10,6 +10,7 @@ import time
 from typing import Any

 from .config import Settings
+from .connectivity import record_mac, record_state
 from .ssh import ssh_run


@@ -23,6 +24,8 @@ echo MEMORY=$(free -b 2>/dev/null | awk '/^Mem:/ {print $2, $3}')
 echo DISK=$(df -B1 / 2>/dev/null | awk 'NR==2 {print $2, $3}')
 echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.draw,memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
 echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
+DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
+echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
 """.strip()


@@ -78,6 +81,9 @@ def _parse(out: str) -> dict:
    # Sum per-process compute memory (works even on unified-memory systems)
    if info.get("gpu_mem_used_mib"):
        parsed["gpu_mem_used_mib"] = _parse_int(info["gpu_mem_used_mib"])
+    # MAC address on the default-route interface (for Wake-on-LAN)
+    if info.get("mac"):
+        parsed["mac"] = info["mac"].lower()
    return parsed


@@ -118,12 +124,14 @@ class HardwareProbe:
            # marked this host unreachable, return the cached failure immediately.
            rc, out, err = await ssh_run(host, user, _PROBE, self.settings, timeout=6)
            if rc != 0:
-                # Cache failures for a slightly longer TTL so the dashboard isn't
-                # blocked behind 6 s of SSH timeout on every poll.
                result = {"reachable": False, "configured": True, "host": host, "error": err.strip() or out.strip() or f"rc={rc}"}
                self._cache[key] = (now, result)
-                # Override the TTL effectively by inserting a sentinel into the cache age
+                record_state(key, False)
                return result
-            result = {"reachable": True, "configured": True, "host": host, **_parse(out)}
+            parsed = _parse(out)
+            result = {"reachable": True, "configured": True, "host": host, **parsed}
            self._cache[key] = (now, result)
+            record_state(key, True)
+            if parsed.get("mac"):
+                record_mac(key, parsed["mac"])
            return result
@@ -10,6 +10,7 @@ from pydantic import BaseModel
 from typing import Literal

 from .config import Settings
+from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
 from .custom_services import add_custom_service, delete_custom_service
 from .download import DownloadManager
 from .hardware import HardwareProbe
@@ -21,6 +22,8 @@ from .services import docker_state, run_action, services_from_settings
 from .ssh import ssh_run
 from .swap import SwapManager
 from .updates import UpdateManager, get_update_status
+from .validate import validate_launch
+from .wol import send_local_broadcast, send_via_peer


 settings = Settings.from_env()
@@ -128,6 +131,81 @@ async def get_hardware() -> dict:
    return await hardware_probe.fetch()


+@app.get("/api/connectivity")
+async def get_connectivity() -> dict:
+    """Transition and report log per subject (Sparks and services) + cached MACs."""
+    return connectivity_summary()
+
+
+class HealthEventBody(BaseModel):
+    service: str                 # e.g. "parakeet", "magpie", "vllm"
+    ok: bool                     # true on success, false on failure
+    source: str | None = None    # what app reported (e.g. "open-webui")
+    error: str | None = None     # optional detail
+    ms: int | None = None        # optional latency
+
+
+@app.post("/api/health-event")
+async def post_health_event(body: HealthEventBody) -> dict:
+    """Passive endpoint: any LAN app can POST here when its call to one of our
+    services succeeds or (more usefully) fails. We log the report into the
+    connectivity history so a brief blip that polling misses still surfaces.
+
+    Example:
+        curl -X POST http://<dashboard>/api/health-event \\
+          -H 'content-type: application/json' \\
+          -d '{"service":"parakeet","ok":false,"error":"503","source":"open-webui","ms":420}'
+    """
+    if not body.service.strip():
+        raise HTTPException(400, "service is required")
+    event = record_report(
+        body.service.strip(),
+        ok=body.ok,
+        source=(body.source or "external").strip(),
+        detail=(body.error or "").strip(),
+        latency_ms=body.ms,
+    )
+    return {"ok": True, "recorded": event}
+
+
+@app.post("/api/spark/{name}/wake")
+async def wake_spark(name: str) -> dict:
+    """Send a Wake-on-LAN magic packet for the named Spark.
+
+    Tries the OTHER Spark (if reachable) first because the packet has to
+    originate on the target's LAN segment to be reliable. Falls back to a
+    direct UDP broadcast from this container.
+    """
+    if name not in ("spark1", "spark2"):
+        raise HTTPException(404, f"unknown spark: {name}")
+    mac = get_mac(name)
+    if not mac:
+        raise HTTPException(400, f"MAC for {name} not yet known; bring it up once so we can probe it, then this will work next time it sleeps")
+
+    # Resolve the peer Spark's SSH host/user; it relays the packet if reachable.
+    other = "spark2" if name == "spark1" else "spark1"
+    other_host = settings.spark1_host if other == "spark1" else settings.spark2_host
+    other_user = settings.spark1_user if other == "spark1" else settings.spark2_user
+
+    delivered_via = None
+    via_peer_ok = False
+    via_peer_err = ""
+    if other_host and other_user:
+        via_peer_ok, via_peer_err = await send_via_peer(other_host, other_user, mac, settings)
+        if via_peer_ok:
+            delivered_via = other
+
+    if not via_peer_ok:
+        # Fall back to direct from this container
+        try:
+            send_local_broadcast(mac)
+            delivered_via = "container"
+        except Exception as e:
+            raise HTTPException(500, f"WoL failed: peer={via_peer_err!r} container={e!r}")
+
+    return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}
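
The `.wol` module is not part of this excerpt, so the following is only a sketch of the standard Wake-on-LAN magic packet that a helper like `send_local_broadcast` would need to emit. The function names, the `255.255.255.255` broadcast address, and UDP port 9 are assumptions chosen because they are common defaults, not details taken from this commit.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC, got {mac!r}")
    # bytes.fromhex raises ValueError on non-hex characters.
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Send the packet as a UDP broadcast on the local segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Broadcasts generally do not cross routers, which is consistent with the endpoint preferring to relay through the peer Spark on the target's own segment.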
+
+
@app.get("/api/services")
 async def get_services() -> dict:
    """Lifecycle state of always-on support services (Parakeet, Magpie, …).
@@ -170,6 +248,8 @@ async def get_services() -> dict:
    results = await asyncio.gather(*[one(n) for n in services.keys()])
    for name, info in results:
        out[name] = info
+        # Feed http reachability into the connectivity log (transition-only)
+        record_state(name, bool(info.get("http_ready")))
    return out


@@ -326,6 +406,10 @@ async def get_status() -> dict:
        check_parakeet(settings),
        check_magpie(settings),
    )
+    # Feed health into the connectivity log (deduped — only logs on transition)
+    record_state("vllm", bool(vllm.get("ok")))
+    record_state("parakeet", bool(parakeet.get("ok")))
+    record_state("magpie", bool(magpie.get("ok")))
    current_key = _identify_current_model(vllm.get("current_model"))
    return {
        "configured": settings.configured,
@@ -351,6 +435,15 @@ class SwapRequest(BaseModel):
    dry_run: bool = False


+@app.post("/api/swap/{key}/validate")
+async def validate_swap(key: str) -> dict:
+    """Pre-flight check: run vLLM's argparse layer against the proposed launch
+    command WITHOUT starting an engine. Cheap (~5 s) and doesn't disturb the
+    currently-loaded model.
+    """
+    return await validate_launch(key, catalog, settings)
+
+
@app.post("/api/swap")
 async def post_swap(req: SwapRequest) -> dict:
    if not settings.configured and not req.dry_run:
@@ -73,8 +73,10 @@ function renderCards() {
        <button class="btn ${isActive ? '' : 'primary'}" data-swap-key="${key}" ${isActive || isSwapping ? 'disabled' : ''}>
          ${isActive ? 'Current' : 'Switch to this'}
        </button>
+        <button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
        <button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>
      </div>
+      <div class="test-result hidden" data-test-result-for="${key}"></div>
    `;
    root.appendChild(card);
  }
@@ -84,6 +86,37 @@ function renderCards() {
  for (const btn of root.querySelectorAll('[data-adv-key]')) {
    btn.addEventListener('click', () => openAdvanced(btn.dataset.advKey));
  }
+  for (const btn of root.querySelectorAll('[data-test-key]')) {
+    btn.addEventListener('click', () => testLaunch(btn.dataset.testKey, btn));
+  }
+}
+
+async function testLaunch(key, btn) {
+  const resultEl = document.querySelector(`[data-test-result-for="${key}"]`);
+  if (!resultEl) return;
+  const originalText = btn.textContent;
+  btn.disabled = true;
+  btn.textContent = 'Testing…';
+  resultEl.classList.remove('hidden', 'ok', 'fail');
+  resultEl.innerHTML = '<span class="muted small">Checking launch args against vLLM\'s parser…</span>';
+  try {
+    const r = await fetchJSON(`/api/swap/${encodeURIComponent(key)}/validate`, { method: 'POST' });
+    if (r.ok) {
+      resultEl.classList.add('ok');
+      resultEl.innerHTML = `<span class="ok-mark">✓</span> Launch args parse OK. <span class="muted small">(Doesn't guarantee runtime success — only catches argparse-level issues.)</span>`;
+    } else {
+      resultEl.classList.add('fail');
+      const err = escapeHtml(r.error || 'unknown error');
+      const stage = r.stage ? ` <span class="muted small">(${escapeHtml(r.stage)})</span>` : '';
+      resultEl.innerHTML = `<span class="fail-mark">✗</span> Would fail: ${err}${stage}`;
+    }
+  } catch (e) {
+    resultEl.classList.add('fail');
+    resultEl.innerHTML = `<span class="fail-mark">✗</span> Test failed: ${escapeHtml(e.message)}`;
+  } finally {
+    btn.disabled = false;
+    btn.textContent = originalText;
+  }
 }

 function renderCurrent(status) {
@@ -121,10 +154,110 @@ function bar(usedPct, warn) {
 async function pollHardware() {
  try {
    state.hardware = await fetchJSON('/api/hardware');
+    try { state.connectivity = await fetchJSON('/api/connectivity'); } catch {}
    renderHardware();
  } catch (e) { console.warn('hardware poll failed', e); }
 }

+function fmtDuration(sec) {
+  if (sec == null) return '';
+  if (sec < 60) return `${Math.round(sec)}s`;
+  if (sec < 3600) return `${Math.round(sec / 60)}m`;
+  if (sec < 86400) {
+    const h = Math.floor(sec / 3600);
+    const m = Math.round((sec % 3600) / 60);
+    return m ? `${h}h ${m}m` : `${h}h`;
+  }
+  const d = Math.floor(sec / 86400);
+  const h = Math.round((sec % 86400) / 3600);
+  return h ? `${d}d ${h}h` : `${d}d`;
+}
+
+function openConnectivityDialog() {
+  const dlg = el('#connectivity-dialog');
+  const content = el('#connectivity-content');
+  const c = state.connectivity || {};
+  const events = c.events || [];
+  if (events.length === 0) {
+    content.innerHTML = '<div class="muted small">No events recorded yet. Once a Spark or service goes down and back up (or an external app reports a failure), entries appear here.</div>';
+    dlg.showModal();
+    return;
+  }
+  const bySubject = {};
+  for (const e of events) {
+    const subj = e.subject || e.spark || 'unknown';  // legacy fallback
+    (bySubject[subj] = bySubject[subj] || []).push(e);
+  }
+  // Sort subjects: hosts first, then services, alphabetical
+  const hostOrder = ['spark1', 'spark2'];
+  const subjects = Object.keys(bySubject).sort((a, b) => {
+    const ia = hostOrder.indexOf(a);
+    const ib = hostOrder.indexOf(b);
+    if (ia >= 0 && ib >= 0) return ia - ib;
+    if (ia >= 0) return -1;
+    if (ib >= 0) return 1;
+    return a.localeCompare(b);
+  });
+
+  const html = subjects.map((subj) => {
+    const evs = bySubject[subj];
+    const transitions = evs.filter(e => (e.kind || 'transition') === 'transition');
+    const reports = evs.filter(e => e.kind === 'report');
+    const downs = transitions.filter(e => e.transition === 'down').length;
+    const failedReports = reports.filter(e => !e.ok).length;
+    const mac = c.macs?.[subj];
+    const summaryParts = [];
+    if (transitions.length) summaryParts.push(`${transitions.length} probe transition${transitions.length===1?'':'s'} (${downs} down)`);
+    if (reports.length) summaryParts.push(`${reports.length} app report${reports.length===1?'':'s'} (${failedReports} failed)`);
+    const isHost = hostOrder.includes(subj);
+    return `
+      <div class="conn-spark">
+        <h4>${escapeHtml(subj)}${isHost ? ' <span class="muted small">[host]</span>' : ' <span class="muted small">[service]</span>'}${mac ? ` <span class="muted small">${escapeHtml(mac)}</span>` : ''}</h4>
+        <div class="conn-summary">${summaryParts.join(' · ') || 'no events'}</div>
+        ${evs.slice(-30).reverse().map(e => renderConnEvent(e)).join('')}
+      </div>
+    `;
+  }).join('');
+  content.innerHTML = html;
+  dlg.showModal();
+}
+
+function renderConnEvent(e) {
+  const when = escapeHtml((e.at || '').replace('T', ' ').replace('Z', ''));
+  const kind = e.kind || 'transition';
+  if (kind === 'report') {
+    const ok = !!e.ok;
+    const source = escapeHtml(e.source || 'external');
+    const detail = e.detail ? ` — ${escapeHtml(e.detail)}` : '';
+    const latency = e.latency_ms != null ? ` (${e.latency_ms} ms)` : '';
+    return `
+      <div class="conn-event ${ok ? 'up' : 'down'} report">
+        <span class="when">${when}</span>
+        <span class="what">${ok ? '◷ report: ok' : '◷ report: failed'} <span class="muted">from</span> ${source}${detail}</span>
+        <span class="dur">${latency}</span>
+      </div>
+    `;
+  }
+  const down = e.down_seconds != null ? `was down ${fmtDuration(e.down_seconds)}` : '';
+  const up = e.up_seconds != null ? `was up ${fmtDuration(e.up_seconds)}` : '';
+  return `
+    <div class="conn-event ${e.transition}">
+      <span class="when">${when}</span>
+      <span class="what">${e.transition === 'up' ? '↑ came back online' : '↓ dropped offline'}</span>
+      <span class="dur">${down}${up}</span>
+    </div>
+  `;
+}
+
+async function wakeSpark(name) {
+  try {
+    const r = await fetchJSON(`/api/spark/${name}/wake`, { method: 'POST' });
+    alert(`Wake-on-LAN sent to ${name} (MAC ${r.mac}, via ${r.delivered_via}). Give it ~30 seconds to wake; the card will go green when it comes back.`);
+  } catch (e) {
+    alert(`Wake failed: ${e.message}`);
+  }
+}
+
 function renderHardware() {
  const panel = el('#hardware-panel');
  const grid = el('#hardware-grid');
@@ -138,14 +271,23 @@ function renderHardware() {
    const card = document.createElement('div');
    if (!s.reachable) {
      card.className = 'hw-card unreachable';
+      const mac = state.connectivity?.macs?.[key];
+      const wolRow = mac
+        ? `<div class="wol-row">
+             <span class="mac-display">${escapeHtml(mac)}</span>
+             <span class="spacer"></span>
+             <button class="btn" data-wake="${escapeHtml(key)}">Wake (WoL)</button>
+           </div>`
+        : `<div class="muted small">MAC not yet known. After this Spark has been up at least once with this dashboard installed, "Wake" will appear here.</div>`;
      card.innerHTML = `
        <div class="head">
          <span class="name">${escapeHtml(key)}</span>
          <span class="meta">unreachable</span>
        </div>
        <div class="muted small">${escapeHtml(s.host || '')} — ${escapeHtml(s.error || 'no response')}</div>
+        ${wolRow}
        <div class="muted small" style="line-height:1.5">
-          Spark Control can't restart a Spark that won't answer SSH. Steps to try:
+          If Wake-on-LAN doesn't bring it back, manual steps:
          <ol style="margin: 6px 0 0 18px; padding: 0;">
            <li>Verify it's powered on (check the front LED).</li>
            <li>Ping it from another LAN device.</li>
@@ -1307,6 +1449,13 @@ async function init() {
  el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
  el('#nim-form').addEventListener('submit', submitNim);
  el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
+  el('#open-connectivity').addEventListener('click', openConnectivityDialog);
+  el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
+  // Wake-on-LAN buttons live on unreachable hardware cards; delegate.
+  el('#hardware-grid').addEventListener('click', (e) => {
+    const btn = e.target.closest('[data-wake]');
+    if (btn) wakeSpark(btn.dataset.wake);
+  });
  setupCatalogDialog();
  setupAdvancedDialog();
  // Open WebUI link from /api/config
@@ -26,8 +26,22 @@
    </section>

    <section id="hardware-panel" class="hardware-panel hidden">
-      <h2 class="section-title">Spark hardware</h2>
+      <div class="section-header">
+        <h2 class="section-title">Spark hardware</h2>
+        <button id="open-connectivity" class="btn small-btn">Connectivity log</button>
+      </div>
      <div id="hardware-grid" class="hardware-grid"></div>
+
+      <dialog id="connectivity-dialog" class="modal">
+        <form method="dialog" class="modal-form">
+          <h3>Spark connectivity history</h3>
+          <p class="muted small">Most recent up/down transitions and app reports per Spark and service. Tracked since this dashboard was installed.</p>
+          <div id="connectivity-content" class="connectivity-content"></div>
+          <div class="modal-actions">
+            <button type="button" id="connectivity-close" class="btn">Close</button>
+          </div>
+        </form>
+      </dialog>
    </section>

    <section id="endpoint-panel" class="endpoint-panel hidden">
@@ -377,6 +377,44 @@ main {
 .hw-card.unreachable { border-color: rgba(239, 68, 68, 0.4); }
 .hw-card.unreachable .name { color: var(--error); }
 .hw-card.unreachable ol { color: var(--muted); }
+.hw-card .wol-row {
+  margin-top: 8px;
+  display: flex;
+  align-items: center;
+  gap: 8px;
+  font-size: 12px;
+  color: var(--muted);
+}
+.hw-card .wol-row .btn { padding: 5px 10px; font-size: 12px; }
+.hw-card .mac-display { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; }
+
+.connectivity-content {
+  max-height: 360px;
+  overflow-y: auto;
+  border: 1px solid var(--border);
+  border-radius: 6px;
+  padding: 10px;
+  background: var(--surface-2);
+}
+.conn-spark { margin-bottom: 16px; }
+.conn-spark h4 { font-size: 13px; margin: 0 0 8px; color: var(--text); }
+.conn-event {
+  font-size: 12px;
+  display: flex;
+  gap: 10px;
+  padding: 4px 0;
+  border-bottom: 1px solid rgba(255,255,255,0.04);
+  font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
+}
+.conn-event:last-child { border-bottom: 0; }
+.conn-event .when { color: var(--muted); flex-shrink: 0; }
+.conn-event .what { flex: 1; }
+.conn-event.up .what { color: var(--accent); }
+.conn-event.down .what { color: var(--error); }
+.conn-event.report .what { font-style: italic; }
+.conn-event .muted { color: var(--muted); font-style: normal; }
+.conn-event .dur { color: var(--muted); }
+.conn-summary { color: var(--muted); font-size: 11px; padding: 4px 0 10px; }
 .hw-metric { display: flex; align-items: center; gap: 10px; font-size: 12px; }
 .hw-metric .label { color: var(--muted); width: 56px; flex-shrink: 0; text-transform: uppercase; letter-spacing: 0.05em; font-size: 11px; }
 .hw-metric .bar { flex: 1; height: 8px; background: var(--surface-2); border-radius: 4px; overflow: hidden; position: relative; }
@@ -663,9 +701,24 @@ main {
 .card.active .btn { background: rgba(74, 222, 128, 0.12); color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
 .card-actions { display: flex; gap: 6px; }
 .card-actions .btn.primary { flex: 1; }
-.card .adv-btn { padding: 8px 12px; font-size: 12px; }
+.card .adv-btn,
+.card .test-btn { padding: 8px 12px; font-size: 12px; }
 .card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }

+.test-result {
+  font-size: 12px;
+  line-height: 1.45;
+  padding: 8px 10px;
+  border-radius: 5px;
+  margin-top: 4px;
+  border: 1px solid var(--border);
+  background: var(--surface-2);
+}
+.test-result.ok { border-color: rgba(74, 222, 128, 0.4); background: rgba(74, 222, 128, 0.04); }
+.test-result.fail { border-color: rgba(239, 68, 68, 0.45); background: rgba(239, 68, 68, 0.06); word-break: break-word; }
+.test-result .ok-mark { color: var(--accent); font-weight: 600; }
+.test-result .fail-mark { color: var(--error); font-weight: 600; }
+
 .footer {
  margin-top: 28px;
  padding-top: 16px;
@@ -0,0 +1,137 @@
+"""Pre-flight validation of a proposed vLLM launch command.
+
+Runs vLLM's own argparse layer (the `vllm serve` parser, falling back to bare
+EngineArgs) inside the vllm_node container WITHOUT starting the engine. Catches:
+
+  * unknown flag names (typos)
+  * bad types / values that argparse rejects
+  * deprecated flags removed in the installed vLLM version
+
+Does NOT catch (these surface only during real engine init):
+  * model-architecture-specific constraints (e.g. Qwen3.6 Mamba block_size)
+  * OOM at weight-loading time
+  * Triton / CUDA-kernel compatibility errors
+
+A pre-flight check that returns "ok" is therefore NOT a guarantee — but a
+"failed" verdict is a definitive 'don't bother with the real swap'.
+"""
+from __future__ import annotations
+import json
+import shlex
+from typing import Any
+
+from .config import Settings
+from .models import Catalog, build_launch_command
+from .ssh import ssh_run
+
+
+# Validates the proposed args against the same combined parser vLLM uses for
+# `vllm serve` (engine args + server args + frontend args). Returns one JSON
+# line on stdout: {"ok": true, ...} or {"ok": false, ...}.
+_VALIDATOR_SCRIPT = r"""
+import argparse, json, sys
+
+# Mirror what `vllm serve` does internally: FlexibleArgumentParser (which is
+# more lenient about dashes vs underscores) wrapped with make_arg_parser
+# (which adds engine + server + frontend args).
+parser = None
+try:
+    # Newer vLLM path
+    from vllm.utils.argparse_utils import FlexibleArgumentParser
+except Exception:
+    try:
+        # Older fallback
+        from vllm.engine.arg_utils import FlexibleArgumentParser
+    except Exception:
+        FlexibleArgumentParser = argparse.ArgumentParser  # type: ignore
+
+try:
+    from vllm.entrypoints.openai.cli_args import make_arg_parser
+    parser = make_arg_parser(FlexibleArgumentParser(add_help=False))
+except Exception:
+    pass
+if parser is None:
+    try:
+        from vllm.engine.arg_utils import EngineArgs
+        parser = FlexibleArgumentParser(add_help=False)
+        EngineArgs.add_cli_args(parser)
+    except Exception as e:
+        print(json.dumps({"ok": False, "stage": "import", "error": f"{type(e).__name__}: {e}"}))
+        sys.exit(0)
+
+class _ArgError(Exception):
+    pass
+
+def _err(message):
+    raise _ArgError(message)
+
+parser.error = _err  # capture argparse errors instead of sys.exit(2)
+
+try:
+    raw = sys.stdin.read()
+    arglist = json.loads(raw)
+    ns = parser.parse_args(arglist)
+    print(json.dumps({"ok": True, "model": getattr(ns, "model", None)}))
+except _ArgError as e:
+    print(json.dumps({"ok": False, "stage": "parse", "error": str(e)}))
+except SystemExit as e:
+    print(json.dumps({"ok": False, "stage": "parse", "error": f"argparse exit {e.code}"}))
+except Exception as e:
+    print(json.dumps({"ok": False, "stage": "parse", "error": f"{type(e).__name__}: {e}"}))
+"""
+
+
+def _vllm_arg_list(key: str, model_def, catalog: Catalog) -> list[str]:
+    """Reconstruct the args list passed to `vllm serve` (without the positional model)."""
+    cmd = build_launch_command(key, model_def, catalog.defaults)
+    # build_launch_command yields:
+    #   ./launch-cluster.sh [--solo] -d exec vllm serve <repo> <args...>
+    # We just want the bits after `vllm serve <repo>`.
+    tokens = shlex.split(cmd)
+    if "serve" not in tokens:
+        return []
+    i = tokens.index("serve")
+    after = tokens[i + 1 :]  # repo, then args
+    if not after:
+        return []
+    args = after[1:]  # drop the repo
+    # EngineArgs expects --model=REPO rather than positional, so prepend it.
+    return [f"--model={after[0]}", *args]
+
+
+async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dict:
+    if key not in catalog.models:
+        return {"ok": False, "stage": "lookup", "error": f"unknown model: {key}"}
+    if not settings.spark1_host or not settings.spark1_user:
+        return {"ok": False, "stage": "config", "error": "spark1 not configured"}
+
+    model = catalog.models[key]
+    arg_list = _vllm_arg_list(key, model, catalog)
+    if not arg_list:
+        return {"ok": False, "stage": "build", "error": "failed to build args list"}
+
+    payload = shlex.quote(json.dumps(arg_list))
+    # Pipe the JSON args list into `python3 -c` in the container. The validator
+    # reads it from stdin, so only the JSON blob itself needs shell quoting.
+    cmd = (
+        f"printf '%s' {payload} | docker exec -i vllm_node python3 -c "
+        + shlex.quote(_VALIDATOR_SCRIPT)
+    )
+
+    rc, out, err = await ssh_run(settings.spark1_host, settings.spark1_user, cmd, settings, timeout=20)
+    if rc != 0 and not out.strip():
+        return {
+            "ok": False,
+            "stage": "ssh",
+            "error": err.strip() or f"rc={rc}",
+            "cmd_args": arg_list,
+            "launch_cmd": build_launch_command(key, model, catalog.defaults),
+        }
+    last = out.strip().splitlines()[-1] if out.strip() else ""
+    try:
+        result: dict[str, Any] = json.loads(last)
+    except json.JSONDecodeError:
+        result = {"ok": False, "stage": "decode", "error": "validator did not return JSON", "raw": out[-500:]}
+    result["cmd_args"] = arg_list
+    result["launch_cmd"] = build_launch_command(key, model, catalog.defaults)
+    return result
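The validator above relies on replacing `parser.error` so that argparse raises an exception instead of calling `sys.exit(2)`. That technique can be tried without vLLM installed. The sketch below is a minimal stand-in, assuming a plain `argparse.ArgumentParser` with two made-up flags in place of the vLLM parser; `check_args`, `ArgError`, and the flags are illustrative names, not part of validate.py.

```python
import argparse
import json


class ArgError(Exception):
    """Raised in place of argparse's default sys.exit(2)."""


def _raise(message):
    raise ArgError(message)


def check_args(arglist):
    """Parse arglist with a stand-in parser and return a result dict."""
    # Stand-in parser (assumption): the real validator builds vLLM's parser.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--model")
    parser.add_argument("--max-model-len", type=int)
    parser.error = _raise  # capture errors instead of exiting the process
    try:
        ns = parser.parse_args(arglist)
    except ArgError as exc:
        return {"ok": False, "stage": "parse", "error": str(exc)}
    return {"ok": True, "model": ns.model}


if __name__ == "__main__":
    print(json.dumps(check_args(["--model=demo", "--max-model-len=4096"])))
    print(json.dumps(check_args(["--model=demo", "--no-such-flag"])))
```

Because the override is set on the parser instance, unknown flags and type-conversion failures both arrive as one exception type, which is what lets the caller report a single `parse` stage.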
@@ -0,0 +1,69 @@
+"""Wake-on-LAN.
+
+Two delivery paths, tried in order:
+
+  1. SSH into the other Spark and have IT broadcast — most reliable because the
+     packet originates from the same LAN subnet as the sleeping Spark.
+  2. Direct UDP broadcast from this container. May or may not work depending
+     on the StartOS container's network namespace.
+
+The DGX Spark's NIC must have WoL enabled in firmware/OS for either path to
+actually wake the box; this module just delivers the magic packet correctly.
+"""
+from __future__ import annotations
+import asyncio
+import re
+import socket
+
+from .config import Settings
+from .ssh import ssh_run
+
+
+_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}([:-]?[0-9a-fA-F]{2}){5}$")
+
+
+def normalize_mac(mac: str) -> str:
+    mac = mac.strip().lower()
+    if not _MAC_RE.match(mac):
+        raise ValueError(f"invalid MAC address: {mac!r}")
+    return ":".join(re.findall(r"[0-9a-f]{2}", mac))  # always colon-separated
+
+
+def build_magic_packet(mac: str) -> bytes:
+    mac_bytes = bytes.fromhex(normalize_mac(mac).replace(":", ""))
+    return b"\xff" * 6 + mac_bytes * 16
+
+
+def send_local_broadcast(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
+    """Send from THIS container. May not reach the LAN in some topologies."""
+    pkt = build_magic_packet(mac)
+    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+    try:
+        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
+        s.sendto(pkt, (broadcast, port))
+        # Also send to port 7 (alternate WoL convention) for safety
+        s.sendto(pkt, (broadcast, 7))
+    finally:
+        s.close()
+
+
+async def send_via_peer(host: str, user: str, mac: str, settings: Settings) -> tuple[bool, str]:
+    """Use a different (reachable) Spark to send the WoL packet to its peer.
+
+    Uses Python 3 (always present on the Sparks for vLLM) to avoid depending on
+    wakeonlan / etherwake being installed.
+    """
+    normalized = normalize_mac(mac)
+    mac_hex = normalized.replace(":", "")
+    py = (
+        "python3 -c \""
+        "import socket; "
+        f"m=bytes.fromhex('{mac_hex}'); "
+        "s=socket.socket(socket.AF_INET, socket.SOCK_DGRAM); "
+        "s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1); "
+        "s.sendto(b'\\xff'*6 + m*16, ('255.255.255.255', 9)); "
+        "s.sendto(b'\\xff'*6 + m*16, ('255.255.255.255', 7)); "
+        "print('sent')\""
+    )
+    rc, out, err = await ssh_run(host, user, py, settings, timeout=8)
+    return rc == 0 and "sent" in out, (err.strip() or out.strip() or f"rc={rc}")
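Both delivery paths send the same payload: six 0xFF bytes followed by the 6-byte MAC repeated 16 times, 102 bytes in total. The standalone sketch below builds that layout and checks it; `magic_packet` is a local stand-in for `build_magic_packet`, and the MAC is a made-up example.

```python
def magic_packet(mac: str) -> bytes:
    """Return a Wake-on-LAN magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.strip().lower().replace(":", "").replace("-", "")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    # Sync stream (6 x 0xFF) followed by the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


if __name__ == "__main__":
    pkt = magic_packet("AA-BB-CC-DD-EE-FF")
    print(len(pkt))         # 102
    print(pkt[:6].hex())    # ffffffffffff
    print(pkt[6:12].hex())  # aabbccddeeff
```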
@@ -66,6 +66,7 @@ models:
    vllm_args:
      - --gpu-memory-utilization=0.85
      - --max-model-len=65536
+      - --max-num-batched-tokens=16384
      - --reasoning-parser=qwen3
      - --moe_backend=flashinfer_cutlass
      - --load-format=fastsafetensors
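The hunk above adds `--max-num-batched-tokens=16384` to the qwen36 entry. Since argparse-level validation cannot see the Mamba block-size constraint, a catalog-side check is one way to reason about it. The sketch below is hypothetical: `batched_tokens` and `mamba_block_fits` are illustrative helpers, and the numbers 2096 and 2048 are taken from the assertion message quoted in the known-issues note, so they may differ for other models or vLLM versions.

```python
def batched_tokens(vllm_args, default=2048):
    """Return the --max-num-batched-tokens value from a vllm_args list.

    `default` is the vLLM default reported in the known-issues note
    (an assumption for any other vLLM version).
    """
    for arg in vllm_args:
        # Accept dash or underscore spelling, in --flag=value form only.
        name, sep, value = arg.partition("=")
        if sep and name.replace("_", "-") == "--max-num-batched-tokens":
            return int(value)
    return default


def mamba_block_fits(vllm_args, block_size=2096):
    """True if block_size <= max_num_batched_tokens, per the quoted assertion."""
    return block_size <= batched_tokens(vllm_args)


if __name__ == "__main__":
    before = ["--gpu-memory-utilization=0.85", "--max-model-len=65536"]
    after = before + ["--max-num-batched-tokens=16384"]
    print(mamba_block_fits(before))  # False
    print(mamba_block_fits(after))   # True
```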
@@ -20,6 +20,10 @@ The trick is the `docker run --rm alpine chown` — it runs as root inside the t

 This flag is Blackwell-specific. If vLLM in the container reports `unrecognized arguments: --moe_backend` or similar, edit `models.yaml` for `qwen36` and drop that flag. The swap UI does NOT auto-fallback in v0.1 — failure surfaces in the log stream.

+## Qwen3.6 Mamba block-size assertion (fixed in v0.6.0:1)
+
+Qwen3.6 uses a Mamba-attention hybrid that requires `--max-num-batched-tokens >= 2096`. vLLM's default is 2048, which trips `AssertionError: In Mamba cache align mode, block_size (2096) must be <= max_num_batched_tokens (2048)`. Fix: bake `--max-num-batched-tokens=16384` into the bundled qwen36 entry — matches the upstream qwen3.5-35b-a3b-fp8 recipe.
+
 ## Two SSH paths to Spark 1 from the laptop

 `ssh <spark-user>@<spark-1-ip>` does NOT work from the laptop because the NVIDIA Sync ssh_config only has a Host entry for `<spark-1-host>.local`. Always use the `.local` hostname or `<spark-2-ip>`-style entries that ARE matched.
@@ -50,6 +50,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
        MAGPIE_CONTAINER: cfg.magpie_container,
        MODELS_OVERRIDES: '/data/models-overrides.yaml',
        SERVICES_OVERRIDES: '/data/services-overrides.yaml',
+        CONNECTIVITY_LOG: '/data/connectivity.json',
        OPEN_WEBUI_URL: cfg.open_webui_url,
        NGC_API_KEY: cfg.ngc_api_key,
        BIND_PORT: String(uiPort),
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'

 export const v0_1_0 = VersionInfo.of({
-  version: '0.4.0:0',
+  version: '0.7.0:2',
  releaseNotes: {
    en_US:
-      'v0.4: install NIM containers from the dashboard. New "+ Install NIM" button next to the services panel shows a curated catalog (Parakeet, Magpie, Riva...) plus a free-form image field. Streams docker pull + docker run output with phase + elapsed timer; persists installed services to /data/services-overrides.yaml so they show up in the services panel after install. Configure Sparks now has an NGC API key field (masked) needed for nvcr.io. v0.3.1 hotfix bundled in: hardware/services SSH timeouts shortened (6 s) and failures cached for 25 s so an unreachable Spark doesn\'t hang the whole dashboard. Hardware card for an unreachable Spark now includes troubleshooting steps.',
+      'v0.7: pre-flight launch validation. New "Test" button on every model card runs vLLM\'s argparse against the proposed launch command inside the running vllm_node container — without starting an engine. Catches unknown flags, bad types, and version-removed flags in about 5 seconds, before disrupting the currently-loaded model. (Runtime-only failures like the Qwen3.6 Mamba block-size assertion still only surface during a real swap, but argparse-stage bugs are now caught up front.)',
  },
  migrations: {
    up: async ({ effects }) => {},
Author	SHA1	Message	Date
Grant	6434b01a95	v0.7.0 - Pre-flight launch validation (Test button on every model card) validate.py: - Builds the same args list a real swap would pass to 'vllm serve' - SSHes into Spark 1 and runs vLLM's own argparse layer inside the running vllm_node container, WITHOUT initializing the engine - Uses FlexibleArgumentParser (from vllm.utils.argparse_utils, with fallback to engine.arg_utils) + make_arg_parser — the exact same parser the 'vllm serve' CLI uses. Earlier attempt with bare argparse.ArgumentParser was too strict (rejected '--moe_backend' with underscore that the real CLI accepts via FlexibleArgumentParser's normalization) - Returns structured {ok, stage, error, cmd_args, launch_cmd} so the UI can surface the exact failure cause Endpoint: POST /api/swap/{key}/validate. Cheap (~5s), no engine init, no disruption to the currently-loaded model. Frontend: 'Test' button on every model card, inline result below the action row (green check or red detailed error). Result stays visible until the user reloads or clicks Test again. Catches: typos in flag names, deprecated/removed flags after a vLLM upgrade, type mismatches. Does NOT catch runtime-only failures (Mamba block-size assertion, OOM at load, kernel-compat). Ok=true is necessary-but-not-sufficient; ok=false is definitive 'don't bother running it'.	2026-05-12 13:37:37 -05:00
Grant	5827683a09	v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch vLLM trips on launching Qwen3.6-35B-A3B-NVFP4 with: AssertionError: In Mamba cache align mode, block_size (2096) must be <= max_num_batched_tokens (2048). Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 is just under the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value. Earlier qwen36 swaps in this session worked because vLLM hadn't reached the Mamba-validation code path on that prior path (different attention backend pick or auto-retry). Whatever the reason, the explicit flag avoids the dance. Also documented in known-issues.md.	2026-05-12 13:22:24 -05:00
Grant	ee8c2406b8	v0.6.0 - Service-level connectivity tracking + passive failure-report endpoint connectivity.py: - Generalized 'spark' subject to any string; renamed 'spark' field to 'subject' - Legacy v0.5 events with the old 'spark' field are migrated transparently on read (kind defaults to 'transition') - New record_report(subject, ok, source, detail, latency_ms): always appends an event with kind='report'; does NOT mutate the current state (only active polling is authoritative) - summary() returns events normalized to the new schema Wiring: - /api/status now calls record_state for vllm/parakeet/magpie (dedup on no-change) - /api/services calls record_state for each service after its http check - Result: dashboard observes service-level transitions automatically with no extra polling Passive endpoint: - POST /api/health-event with {service, ok, source?, error?, ms?} - Useful for external apps (e.g. Open WebUI) to surface sub-poll-interval failures the dashboard would otherwise miss UI: - Connectivity dialog groups events by subject (hosts ordered first, then services) - Per-subject summary shows transition count, down count, report count, failed-report count - Transitions and reports render inline with distinct styling; reports show source app + error + latency - Legacy v0.5 events render unchanged Docs: - README documents /api/health-event with a curl example Package: bump to 0.6.0:0	2026-05-12 13:19:27 -05:00
Grant	a02f4db850	v0.5.0 - Wake-on-LAN + connectivity history wol.py: - build_magic_packet(): standard 6x0xFF + 16x MAC layout - send_local_broadcast(): direct from container (ports 9 + 7 for safety) - send_via_peer(): preferred path; SSHes to the OTHER Spark and runs a Python one-liner there so the packet originates on the target's LAN segment (most reliable) - MAC validation + normalization connectivity.py: - /data/connectivity.json persistence (thread-safe, atomic rename) - Stores per-Spark current state + last_change timestamp + rolling 200-event log - Records up/down transitions; computes down_seconds / up_seconds durations - MAC cache populated lazily during hardware probes hardware.py: - Probe now reads MAC via /sys/class/net/<default-route-iface>/address - After each probe, record_state() emits a transition event if state changed - record_mac() caches the address so WoL works when the Spark next goes down Endpoints: - GET /api/connectivity: macs, current state, last_change, events[] - POST /api/spark/{name}/wake: tries via-peer first, falls back to direct broadcast UI: - Unreachable hardware card shows the cached MAC + 'Wake (WoL)' button (only if MAC known) - New 'Connectivity log' button opens a modal with per-Spark transition history (last 25 each), including duration of each prior up/down period - pollHardware also pulls /api/connectivity so WoL buttons appear without an extra fetch Package: bump 0.5.0:0; main.ts sets CONNECTIVITY_LOG=/data/connectivity.json	2026-05-12 12:51:49 -05:00
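The v0.7.0 commit message describes the validation result as a structured dict with `ok`, `stage`, and `error`, where `ok=true` is necessary but not sufficient and `ok=false` is definitive. A caller might turn that into a one-line verdict as sketched below; `describe_result` and its wording are illustrative assumptions, and only the field names come from the commit.

```python
def describe_result(result: dict) -> str:
    """Summarize a validation result dict as a single human-readable line."""
    if result.get("ok"):
        # Passing the parser says nothing about OOM, kernel, or Mamba checks.
        return "args parse cleanly; runtime-only failures are still possible"
    stage = result.get("stage", "unknown")
    error = result.get("error", "no error text")
    return f"failed at {stage}: {error}"


if __name__ == "__main__":
    print(describe_result({"ok": True, "model": "demo"}))
    print(describe_result(
        {"ok": False, "stage": "parse", "error": "unrecognized arguments: --foo"}
    ))
```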