v0.3.0 - Hardware dashboard + knob context + Explain context + Open WebUI link

Hardware dashboard:
- New hardware.py module: probes each Spark over SSH for hostname, uptime, load + core count, RAM, disk, GPU (name, util, temp, power), and the sum of per-process GPU memory
- DGX Spark uses unified memory (nvidia-smi memory.total returns N/A), so fall back to summing per-process compute memory and computing the fraction against system RAM. The snapshot is marked with gpu_unified_memory=true.
- 4s TTL cache in HardwareProbe to avoid hammering the Sparks with SSH probes
- /api/hardware returns per-Spark snapshot
- UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk), drawn as bars with warning-threshold styling
- UI polls /api/hardware every 8s
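
A minimal sketch of the unified-memory fallback and the TTL cache described above, not the actual hardware.py code. The function name, the snapshot keys other than gpu_unified_memory, and the TTLCache class are hypothetical; the "[N/A]" form is an assumption about how nvidia-smi prints the missing field.

```python
import time
from typing import Any, Callable


def gpu_memory_snapshot(memory_total_field: str,
                        per_process_mib: list[int],
                        system_ram_mib: int) -> dict:
    """Build the GPU-memory part of a snapshot (hypothetical shape).

    memory_total_field is the raw text reported for memory.total,
    e.g. "24576 MiB", or "N/A" / "[N/A]" on unified-memory systems.
    """
    used_mib = sum(per_process_mib)
    field = memory_total_field.strip().strip("[]")
    if field.upper() == "N/A":
        # Unified memory: there is no dedicated VRAM total,
        # so the fraction is computed against system RAM instead.
        total_mib = system_ram_mib
        unified = True
    else:
        total_mib = int(field.split()[0])  # "24576 MiB" -> 24576
        unified = False
    fraction = used_mib / total_mib if total_mib else 0.0
    return {
        "gpu_mem_used_mib": used_mib,
        "gpu_mem_total_mib": total_mib,
        "gpu_mem_fraction": round(fraction, 3),
        "gpu_unified_memory": unified,
    }


class TTLCache:
    """Reuse one probe result until it is ttl seconds old."""

    def __init__(self, probe: Callable[[], Any], ttl: float = 4.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._has_value = False
        self._value: Any = None
        self._stamp = 0.0

    def get(self) -> Any:
        now = self._clock()
        if not self._has_value or now - self._stamp >= self._ttl:
            # Stale or empty: run the (expensive) probe again.
            self._value = self._probe()
            self._stamp = now
            self._has_value = True
        return self._value
```

With a 4s TTL and an 8s UI poll, a single browser tab triggers a fresh probe on every poll; the cache mainly helps when several tabs or clients poll at once.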

Knob context (tied to live hardware):
- Each Advanced knob now shows plain-English help text
- 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM
- 'Max context' shows '~N pages of text'
- Toggles show tradeoff descriptions
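
A rough sketch of how the two hint strings above could be computed. The function names are hypothetical, and the tokens-per-page figure is an assumed rule of thumb, not a value taken from this commit.

```python
# Assumption: roughly 500 tokens per page of text; the real UI may use another figure.
TOKENS_PER_PAGE = 500


def gpu_memory_hint(percent: float, total_ram_gb: float) -> str:
    """Turn the 'GPU memory %' knob into a GB figure using the Spark's actual RAM."""
    allocated = total_ram_gb * percent / 100.0
    left = total_ram_gb - allocated
    return f"~{allocated:.0f} GB allocated · ~{left:.0f} GB left for OS/buffers"


def context_hint(max_tokens: int, tokens_per_page: int = TOKENS_PER_PAGE) -> str:
    """Turn the 'Max context' knob into an approximate page count."""
    pages = max_tokens // tokens_per_page
    return f"~{pages} pages of text"


if __name__ == "__main__":
    print(gpu_memory_hint(80, 120))
    print(context_hint(32768))
```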

Explain context:
- '✨ Explain context' button on the update banner
- /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE
- Renders into an expandable 'Explained by the loaded LLM' section under Pending commits
- Reasoning tokens shown italicized when the model emits them
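
A sketch of the relay step, assuming the upstream is vLLM's OpenAI-compatible streaming chat endpoint. The event names, the relay_stream helper, and the reasoning_content delta field are assumptions for illustration; the field name in particular depends on the vLLM version and reasoning parser in use.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Format one server-sent event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def relay_stream(upstream_lines: Iterable[str]) -> Iterator[str]:
    """Convert OpenAI-style streaming lines into SSE events for the browser.

    Reasoning text and answer text go out as separate event types so the UI
    can render reasoning in italics.
    """
    for line in upstream_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            yield sse_event("done", {})
            return
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks rather than abort the stream
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        reasoning = delta.get("reasoning_content")  # assumed field name
        if reasoning:
            yield sse_event("reasoning", {"text": reasoning})
        content = delta.get("content")
        if content:
            yield sse_event("content", {"text": content})
```

Keeping the conversion as a pure generator over lines makes it testable without a running model; the HTTP handler only needs to feed it the upstream response lines and write out what it yields.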

Open WebUI integration:
- New 'Open WebUI URL' optional field in Configure Sparks
- /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set

Downloads:
- Added a third radio option; the choices are now Spark 1 only / Spark 2 only / Both Sparks
- Backend picks SSH target based on mode
- HF repo link icon next to the input
- Helper line about NVFP4 for Blackwell
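
A sketch of the target selection and the repo link, using the radio values from the form (spark1, spark2, cluster). The helper names are hypothetical, and which Spark downloads first in 'Both' mode is an assumption; the commit only says one Spark downloads and the other receives an rsync.

```python
import re
from typing import Optional

# Loose check for "owner/name" repo ids (an assumption, not Hugging Face's exact rules).
_REPO_RE = re.compile(r"^[A-Za-z0-9][\w.\-]*/[\w.\-]+$")


def pick_download_targets(mode: str, spark1_host: str,
                          spark2_host: str) -> tuple[str, Optional[str]]:
    """Return (host that downloads, host to rsync to or None)."""
    if mode == "spark1":
        return spark1_host, None
    if mode == "spark2":
        return spark2_host, None
    if mode == "cluster":
        # Assumption: Spark 1 downloads, then the weights are rsynced to Spark 2.
        return spark1_host, spark2_host
    raise ValueError(f"unknown download mode: {mode!r}")


def hf_repo_url(repo: str) -> Optional[str]:
    """Build the Hugging Face page URL for a repo id, or None if it looks invalid."""
    repo = repo.strip()
    if not _REPO_RE.match(repo):
        return None
    return f"https://huggingface.co/{repo}"
```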

Model cards:
- Repo name is now a clickable link to its Hugging Face page

Package: bump to 0.3.0:0
Grant
2026-05-12 12:00:15 -05:00
parent c6da6b0784
commit 64ce0fca10
11 changed files with 609 additions and 11 deletions
+22 -2
@@ -16,6 +16,7 @@
<div class="current" id="current">
<span class="muted">connecting…</span>
</div>
<a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
</header>
<main>
@@ -24,6 +25,11 @@
<span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
</section>
<section id="hardware-panel" class="hardware-panel hidden">
<h2 class="section-title">Spark hardware</h2>
<div id="hardware-grid" class="hardware-grid"></div>
</section>
<section id="endpoint-panel" class="endpoint-panel hidden">
<div class="ep-title muted small">OpenAI-compatible endpoint</div>
<div class="ep-row">
@@ -133,11 +139,20 @@
<label class="dl-row">
<span class="dl-label">HuggingFace repo</span>
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face"></a>
</label>
<div class="dl-help muted small">
<a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
</div>
<div class="dl-row">
<span class="dl-label">Where</span>
<label class="radio"><input type="radio" name="dl-mode" value="solo" checked> Spark 1 only (solo)</label>
<label class="radio"><input type="radio" name="dl-mode" value="cluster"> Both Sparks (cluster, copy in parallel)</label>
<label class="radio"><input type="radio" name="dl-mode" value="spark1" checked> Spark 1 only</label>
<label class="radio"><input type="radio" name="dl-mode" value="spark2"> Spark 2 only</label>
<label class="radio"><input type="radio" name="dl-mode" value="cluster"> Both Sparks (for cluster models)</label>
</div>
<div class="dl-help muted small">
For <strong>solo</strong> models, download to wherever you'll run them. For <strong>cluster</strong> models (-tp 2), both Sparks need the weights — "Both" downloads to one Spark and rsyncs to the other in parallel.
</div>
<div class="dl-actions">
<button id="dl-cancel" class="btn">Cancel</button>
@@ -178,6 +193,7 @@
<div class="ub-row">
<span id="ub-text">Checking for updates…</span>
<span class="spacer"></span>
<button id="ub-explain" class="btn small-btn hidden">✨ Explain context</button>
<button id="ub-details" class="btn small-btn hidden">Show details</button>
<button id="ub-apply" class="btn small-btn primary hidden">Apply update</button>
</div>
@@ -185,6 +201,10 @@
<summary class="muted small">Pending commits</summary>
<pre id="ub-log" class="snippet"></pre>
</details>
<details id="ub-explain-section" class="hidden">
<summary class="muted small">Explained by the loaded LLM</summary>
<div id="ub-explain-content" class="explain-content"></div>
</details>
<div id="ub-progress" class="hidden">
<div class="phase-row">
<div class="phase" id="ub-phase">Applying update…</div>