v0.3.0 - Hardware dashboard + knob context + Explain context + Open WebUI link

Hardware dashboard:
- New hardware.py module: probes each Spark over SSH for hostname, uptime, load + cores, RAM, disk, GPU (name, util, temp, power), and the sum of per-process GPU memory
- DGX Spark uses unified memory (nvidia-smi memory.total returns N/A); fall back to the per-process compute memory sum and calculate the used fraction against system RAM. Marked with gpu_unified_memory=true.
- 4s TTL cache in HardwareProbe to avoid hammering the Sparks
- /api/hardware returns a per-Spark snapshot
- UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk); bars with warn threshold styling
- Polls every 8s

Knob context (tied to live hardware):
- Each Advanced knob now shows plain-English help text
- 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM
- 'Max context' shows '~N pages of text'
- Toggles show tradeoff descriptions

Explain context:
- '✨ Explain context' button on the update banner
- /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE
- Renders into an expandable 'Explained by the loaded LLM' section under Pending commits
- Reasoning tokens shown italicized when the model emits them

Open WebUI integration:
- New 'Open WebUI URL' optional field in Configure Sparks
- /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set

Downloads:
- Third radio option: Spark 1 only / Spark 2 only / Both Sparks
- Backend picks SSH target based on mode
- HF repo link icon next to the input
- Helper line about NVFP4 for Blackwell

Model cards:
- Repo name is now a clickable link to its Hugging Face page

Package: bump 0.3.0:0
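Note on the unified-memory fallback: the sketch below illustrates the idea only and is not the code in hardware.py. The names `parse_mib`, `gpu_memory`, and `GpuMemory` are hypothetical, and the input strings are assumed to be raw nvidia-smi fields such as `2048 MiB` or `[N/A]`. It also assumes the per-process sum is used as the "used" figure in both the unified and the discrete case.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GpuMemory:
    used_mib: int
    total_mib: int
    fraction: float
    gpu_unified_memory: bool


def parse_mib(value: str) -> Optional[int]:
    """Parse a field like '2048 MiB' or '2048'; return None for N/A or empty."""
    stripped = value.strip()
    token = stripped.split()[0] if stripped else ""
    return int(token) if token.isdigit() else None


def gpu_memory(total_field: str,
               per_process_fields: Iterable[str],
               system_ram_mib: int) -> GpuMemory:
    """Sum per-process GPU memory; if memory.total is N/A, use system RAM as the total."""
    used = sum(m for m in map(parse_mib, per_process_fields) if m is not None)
    total = parse_mib(total_field)
    unified = total is None  # assumption: N/A total means unified memory
    if unified:
        total = system_ram_mib
    fraction = used / total if total else 0.0
    return GpuMemory(used, total, fraction, unified)
```

With a total of `[N/A]`, two processes holding 2048 MiB and 1024 MiB, and 131072 MiB of system RAM, this yields 3072 MiB used and the unified flag set, which is the shape of value the 'GPU mem (unified)' bar would need.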
2026-05-12 12:00:15 -05:00
parent c6da6b0784
commit 64ce0fca10
11 changed files with 609 additions and 11 deletions
@@ -16,6 +16,7 @@
    <div class="current" id="current">
      <span class="muted">connecting…</span>
    </div>
+    <a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
  </header>

  <main>
@@ -24,6 +25,11 @@
      <span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
    </section>

+    <section id="hardware-panel" class="hardware-panel hidden">
+      <h2 class="section-title">Spark hardware</h2>
+      <div id="hardware-grid" class="hardware-grid"></div>
+    </section>
+
    <section id="endpoint-panel" class="endpoint-panel hidden">
      <div class="ep-title muted small">OpenAI-compatible endpoint</div>
      <div class="ep-row">
@@ -133,11 +139,20 @@
          <label class="dl-row">
            <span class="dl-label">HuggingFace repo</span>
            <input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
+            <a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
          </label>
+          <div class="dl-help muted small">
+            <a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
+            · NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
+          </div>
          <div class="dl-row">
            <span class="dl-label">Where</span>
-            <label class="radio"><input type="radio" name="dl-mode" value="solo" checked> Spark 1 only (solo)</label>
-            <label class="radio"><input type="radio" name="dl-mode" value="cluster"> Both Sparks (cluster, copy in parallel)</label>
+            <label class="radio"><input type="radio" name="dl-mode" value="spark1" checked> Spark 1 only</label>
+            <label class="radio"><input type="radio" name="dl-mode" value="spark2"> Spark 2 only</label>
+            <label class="radio"><input type="radio" name="dl-mode" value="cluster"> Both Sparks (for cluster models)</label>
+          </div>
+          <div class="dl-help muted small">
+            For <strong>solo</strong> models, download to wherever you'll run them. For <strong>cluster</strong> models (-tp 2), both Sparks need the weights — "Both" downloads to one Spark and rsyncs to the other in parallel.
          </div>
          <div class="dl-actions">
            <button id="dl-cancel" class="btn">Cancel</button>
@@ -178,6 +193,7 @@
      <div class="ub-row">
        <span id="ub-text">Checking for updates…</span>
        <span class="spacer"></span>
+        <button id="ub-explain" class="btn small-btn hidden">✨ Explain context</button>
        <button id="ub-details" class="btn small-btn hidden">Show details</button>
        <button id="ub-apply" class="btn small-btn primary hidden">Apply update</button>
      </div>
@@ -185,6 +201,10 @@
        <summary class="muted small">Pending commits</summary>
        <pre id="ub-log" class="snippet"></pre>
      </details>
+      <details id="ub-explain-section" class="hidden">
+        <summary class="muted small">Explained by the loaded LLM</summary>
+        <div id="ub-explain-content" class="explain-content"></div>
+      </details>
      <div id="ub-progress" class="hidden">
        <div class="phase-row">
          <div class="phase" id="ub-phase">Applying update…</div>