v0.4.0 - NIM installer + dashboard resilience

Hotfix (was v0.3.1):
- services.py: cache 'unreachable' per (host,user) for 25s so a dead Spark doesn't hang every /api/services call behind 6s ssh timeout
- ssh_run timeout reduced 10 -> 6s for docker_state probes
- hardware probe: shorter SSH timeout (6s), longer cache TTL for failures (25s)
- JS pollStatus retries loadModels() if state.models is empty (recovers from cold-start proxy timeout)
- Unreachable hardware card now includes troubleshooting steps (Spark Control cannot SSH into an unreachable Spark to restart it)

v0.4 NIM installer:
- nim.py module: curated SUGGESTED_NIMS list (Parakeet, Magpie, Riva) + NimManager that runs docker login nvcr.io + docker pull + docker run -d --gpus all -p PORT:PORT -v VOL:/opt/nim/.cache -e NGC_API_KEY -e ... --restart=unless-stopped + chown the volume to uid 1000 + restart. Streams all output via SSE; redacts the API key from log lines.
- custom_services.py: persists installed NIMs to /data/services-overrides.yaml so they appear in the services panel after install
- services.py: merges custom services into the panel
- /api/nim/catalog GET, /api/nim/install POST + GET/SSE
- /api/services/{name} DELETE for custom services
- UI: '+ Install NIM' button next to 'Always-on services'; modal lists curated images each with a 'Pick' button + a custom-image form; installation runs in a second dialog with phase + elapsed timer + collapsible log
- NGC API key field added to Configure Sparks (masked); injected as NGC_API_KEY env var into the container

Package: bump 0.4.0:0; main.ts adds SERVICES_OVERRIDES + NGC_API_KEY env vars
This commit is contained in:
Grant
2026-05-12 12:32:29 -05:00
parent e88fdcfde4
commit 1889ab45fb
13 changed files with 690 additions and 10 deletions
+32
View File
@@ -376,6 +376,7 @@ main {
.hw-card .head .meta { color: var(--muted); font-size: 12px; margin-left: auto; }
.hw-card.unreachable { border-color: rgba(239, 68, 68, 0.4); }
.hw-card.unreachable .name { color: var(--error); }
.hw-card.unreachable ol { color: var(--muted); }
.hw-metric { display: flex; align-items: center; gap: 10px; font-size: 12px; }
.hw-metric .label { color: var(--muted); width: 56px; flex-shrink: 0; text-transform: uppercase; letter-spacing: 0.05em; font-size: 11px; }
.hw-metric .bar { flex: 1; height: 8px; background: var(--surface-2); border-radius: 4px; overflow: hidden; position: relative; }
@@ -477,6 +478,37 @@ main {
#dl-log-details { margin-top: 12px; }
#dl-log-details summary { cursor: pointer; padding: 4px 0; }
/* ===== NIM install dialog ===== */
.modal#nim-dialog,
.modal#nim-progress-dialog { max-width: 640px; }
.nim-grid {
display: grid;
gap: 8px;
grid-template-columns: 1fr;
max-height: 240px;
overflow-y: auto;
margin-bottom: 4px;
}
.nim-card {
background: var(--surface-2);
border: 1px solid var(--border);
border-radius: 6px;
padding: 10px 12px;
display: flex;
gap: 10px;
align-items: flex-start;
}
.nim-card .info { flex: 1; }
.nim-card .name { font-weight: 600; font-size: 13px; }
.nim-card .desc { color: var(--muted); font-size: 12px; margin-top: 4px; }
.nim-card .img { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; color: #6b6b75; font-size: 11px; margin-top: 4px; word-break: break-all; }
.nim-card .btn { padding: 6px 12px; font-size: 12px; flex-shrink: 0; }
.nim-card .links { font-size: 11px; margin-top: 4px; }
.nim-card .links a { color: var(--info); text-decoration: none; }
.nim-card .links a:hover { text-decoration: underline; }
.nim-key-warn { color: var(--warn); }
/* ===== Section titles ===== */
.section-title {