v0.3.0 - Hardware dashboard + knob context + Explain context + Open WebUI link
Hardware dashboard:
- New hardware.py module: SSH probes each Spark for hostname, uptime, load+cores, RAM, disk, GPU (name, util, temp, power) + per-process GPU memory sum
- DGX Spark uses unified memory (nvidia-smi memory.total returns N/A); fall back to summing per-process compute memory and computing the fraction against system RAM. Marked with gpu_unified_memory=true.
- 4s TTL cache in HardwareProbe to avoid hammering the Sparks with SSH probes
- /api/hardware returns per-Spark snapshot
- UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk) — bars with warn threshold styling
- Polls every 8s
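
The unified-memory fallback described above can be sketched roughly as follows. This is an illustration, not the code in hardware.py: the names parse_mib and gpu_memory_snapshot, the dict keys other than gpu_unified_memory, and the MiB units are all assumptions.

```python
from typing import List, Optional


def parse_mib(field: str) -> Optional[float]:
    """Return a numeric MiB value, or None for non-numeric output such as N/A."""
    try:
        return float(field.strip().split()[0])
    except (ValueError, IndexError):
        return None


def gpu_memory_snapshot(total_field: str,
                        process_mib: List[float],
                        system_ram_mib: float) -> dict:
    """Hypothetical helper: derive GPU memory usage for one Spark."""
    used = sum(process_mib)            # per-process GPU memory sum
    total = parse_mib(total_field)     # None when memory.total is not numeric
    unified = total is None
    if unified:
        total = system_ram_mib         # unified memory: measure against system RAM
    fraction = used / total if total else 0.0
    return {
        "gpu_mem_used_mib": used,      # key names are assumptions
        "gpu_mem_total_mib": total,
        "gpu_mem_fraction": fraction,
        "gpu_unified_memory": unified,
    }
```

With a non-numeric total and two processes using 1024 and 3072 MiB on a 131072 MiB machine, the sketch reports 4096 MiB used and sets gpu_unified_memory to True.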
Knob context (tied to live hardware):
- Each Advanced knob now shows plain-English help text
- 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM
- 'Max context' shows '~N pages of text'
- Toggles show tradeoff descriptions
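
The arithmetic behind the two knob hints is probably along these lines. The real code lives in the UI and is not shown in this commit, so treat this as a sketch: gpu_memory_hint and context_pages are made-up names, and the 500 tokens-per-page figure is an assumption chosen only for illustration.

```python
def gpu_memory_hint(percent: float, total_ram_gb: float) -> str:
    """Hypothetical helper: text for the 'GPU memory %' knob."""
    allocated = total_ram_gb * percent / 100
    left = total_ram_gb - allocated
    return f"~{round(allocated)} GB allocated · ~{round(left)} GB left for OS/buffers"


def context_pages(max_tokens: int, tokens_per_page: int = 500) -> str:
    """Hypothetical helper: text for the 'Max context' knob.

    tokens_per_page is an assumed conversion factor, not taken from the commit.
    """
    return f"~{round(max_tokens / tokens_per_page)} pages of text"
```

For a Spark reporting 128 GB of RAM, gpu_memory_hint(85, 128) yields '~109 GB allocated · ~19 GB left for OS/buffers'.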
Explain context:
- '✨ Explain context' button on the update banner
- /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE
- Renders into an expandable 'Explained by the loaded LLM' section under Pending commits
- Reasoning tokens shown italicized when the model emits them
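
For reference, each chunk streamed back as SSE has to be framed as one or more data: lines followed by a blank line. A minimal framing helper might look like this; sse_event is a hypothetical name, and using an event field to separate reasoning tokens from the answer is an assumption about how the UI could tell them apart, not something this commit confirms.

```python
from typing import Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

A client could then render frames tagged event: reasoning in italics and everything else as normal text.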
Open WebUI integration:
- New 'Open WebUI URL' optional field in Configure Sparks
- /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set
Downloads:
- Target radio now has three options: Spark 1 only / Spark 2 only / Both Sparks
- Backend picks SSH target based on mode
- HF repo link icon next to the input
- Helper line about NVFP4 for Blackwell
Model cards:
- Repo name is now a clickable link to its Hugging Face page
Package: bump 0.3.0:0
@@ -19,7 +19,7 @@ from .config import Settings
 from .ssh import ssh_stream, StreamHandle


-Mode = Literal["solo", "cluster"]
+Mode = Literal["spark1", "spark2", "cluster"]


 _TQDM_RE = re.compile(
@@ -113,17 +113,26 @@ class DownloadManager:

     async def _do(self, job: DownloadJob) -> None:
         s = self.settings
-        if not s.spark1_host or not s.spark1_user:
-            raise RuntimeError("spark1 not configured")
+        # Pick the SSH target and hf-download flags from the mode.
+        if job.mode == "spark2":
+            target_host, target_user = s.spark2_host, s.spark2_user
+            flags = ""
+        elif job.mode == "cluster":
+            target_host, target_user = s.spark1_host, s.spark1_user
+            flags = "-c --copy-parallel"
+        else:  # spark1
+            target_host, target_user = s.spark1_host, s.spark1_user
+            flags = ""
+        if not target_host or not target_user:
+            raise RuntimeError(f"{job.mode} host not configured")

-        flags = "-c --copy-parallel" if job.mode == "cluster" else ""
         cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {job.repo} {flags}".strip()
         job.append(f"$ {cmd}")
         job.state = "downloading"
         job.progress.phase = "Connecting to Hugging Face…"

         handle = StreamHandle()
-        async for line in ssh_stream(s.spark1_host, s.spark1_user, cmd, s, handle=handle):
+        async for line in ssh_stream(target_host, target_user, cmd, s, handle=handle):
             job.append(line)
             self._update_progress(job, line)
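
The target selection in the second hunk is easy to check in isolation if it is pulled out into a pure function. The sketch below mirrors the branch logic from the diff. SparkSettings, pick_target and build_command are hypothetical names introduced here, and SparkSettings stands in for only the four Settings fields the diff reads.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SparkSettings:
    """Hypothetical stand-in for the four Settings fields used by _do()."""
    spark1_host: Optional[str] = None
    spark1_user: Optional[str] = None
    spark2_host: Optional[str] = None
    spark2_user: Optional[str] = None


def pick_target(mode: str, s: SparkSettings) -> Tuple[str, str, str]:
    """Return (host, user, flags) for a download mode, as in the diff."""
    if mode == "spark2":
        host, user, flags = s.spark2_host, s.spark2_user, ""
    elif mode == "cluster":
        # Cluster downloads run from Spark 1 with the copy flags.
        host, user, flags = s.spark1_host, s.spark1_user, "-c --copy-parallel"
    else:  # spark1
        host, user, flags = s.spark1_host, s.spark1_user, ""
    if not host or not user:
        raise RuntimeError(f"{mode} host not configured")
    return host, user, flags


def build_command(repo: str, flags: str) -> str:
    """Build the remote command; strip() drops the trailing space when flags is empty."""
    return f"cd ~/spark-vllm-docker && ./hf-download.sh {repo} {flags}".strip()
```

One behavior worth noting: in cluster mode only Spark 1 is validated, so a missing Spark 2 configuration would surface later, inside hf-download.sh, rather than as the RuntimeError raised here.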