v0.4.0 - NIM installer + dashboard resilience
Hotfix (was v0.3.1):
- services.py: cache 'unreachable' per (host,user) for 25s so a dead Spark doesn't hang every /api/services call behind a 6s SSH timeout
- ssh_run timeout reduced from 10s to 6s for docker_state probes
- hardware probe: shorter SSH timeout (6s), longer cache TTL for failures (25s)
- JS pollStatus retries loadModels() if state.models is empty (recovers from cold-start proxy timeout)
- Unreachable hardware card now includes troubleshooting steps (Spark Control cannot SSH into an unreachable Spark to restart it)
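
A minimal sketch of the per-(host,user) 'unreachable' cache described in the hotfix notes above, not the actual services.py code. Only the 25s TTL and the (host,user) key come from this commit; UnreachableCache, probe, run_ssh and the use of TimeoutError as the failure signal are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

UNREACHABLE_TTL_S = 25.0  # TTL taken from the commit message


class UnreachableCache:
    """Remembers (host, user) pairs that recently failed to answer (hypothetical helper)."""

    def __init__(self, ttl_s: float = UNREACHABLE_TTL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests do not have to sleep
        self._failed_at: Dict[Tuple[str, str], float] = {}

    def mark_unreachable(self, host: str, user: str) -> None:
        self._failed_at[(host, user)] = self._clock()

    def mark_reachable(self, host: str, user: str) -> None:
        self._failed_at.pop((host, user), None)

    def is_unreachable(self, host: str, user: str) -> bool:
        failed_at = self._failed_at.get((host, user))
        if failed_at is None:
            return False
        if self._clock() - failed_at >= self._ttl_s:
            # Entry expired: forget it so the next call probes again.
            del self._failed_at[(host, user)]
            return False
        return True


def probe(cache: UnreachableCache, host: str, user: str,
          run_ssh: Callable[[str, str], str]) -> Optional[str]:
    """Return the probe output, or None if the host is (or recently was) unreachable."""
    if cache.is_unreachable(host, user):
        return None  # answer immediately instead of waiting on another SSH timeout
    try:
        out = run_ssh(host, user)
    except TimeoutError:  # assumption: the SSH helper raises this on timeout
        cache.mark_unreachable(host, user)
        return None
    cache.mark_reachable(host, user)
    return out
```

With a cache like this, only the first request after a Spark goes down pays the SSH timeout; later calls inside the TTL return at once, and the host is probed again once the entry expires.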
v0.4 NIM installer:
- nim.py module: curated SUGGESTED_NIMS list (Parakeet, Magpie, Riva) plus a NimManager that runs, in order: docker login nvcr.io; docker pull; docker run -d --gpus all -p PORT:PORT -v VOL:/opt/nim/.cache -e NGC_API_KEY -e ... --restart=unless-stopped; chown the volume to uid 1000; restart. Streams all output via SSE and redacts the API key from log lines.
- custom_services.py: persists installed NIMs to /data/services-overrides.yaml so they appear in the services panel after install
- services.py: merges custom services into the panel
- /api/nim/catalog GET, /api/nim/install POST + GET/SSE
- /api/services/{name} DELETE for custom services
- UI: '+ Install NIM' button next to 'Always-on services'; modal lists curated images each with a 'Pick' button + a custom-image form; installation runs in a second dialog with phase + elapsed timer + collapsible log
- NGC API key field added to Configure Sparks (masked); injected as NGC_API_KEY env var into the container
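
A rough sketch of the docker run argv and the log redaction listed above, not the actual nim.py code. The flags mirror the commit message; build_docker_run, redact, the --name flag, the "***" placeholder and the example image name are assumptions added for illustration. The elided "-e ..." variables are left out.

```python
from typing import List

REDACTED = "***"  # hypothetical placeholder; the real redaction marker is not shown in this commit


def build_docker_run(image: str, container: str, port: int, volume: str) -> List[str]:
    """Build an argv list for the docker run step (flags as listed in the commit message)."""
    return [
        "docker", "run", "-d",
        "--name", container,                 # assumption: the container name from the form is used here
        "--gpus", "all",
        "-p", f"{port}:{port}",
        "-v", f"{volume}:/opt/nim/.cache",
        "-e", "NGC_API_KEY",                 # name only: docker copies the value from the caller's environment
        "--restart=unless-stopped",
        image,
    ]


def redact(line: str, secret: str) -> str:
    """Replace every occurrence of the secret in a log line before it is streamed."""
    if not secret:
        return line  # nothing configured, nothing to hide
    return line.replace(secret, REDACTED)
```

Passing an argv list rather than a shell string keeps the key out of the command text itself; redact is then applied to each output line before it is sent to the progress log.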
Package: bump to 0.4.0:0; main.ts adds SERVICES_OVERRIDES + NGC_API_KEY env vars
@@ -76,8 +76,66 @@
 </section>

 <section id="services-panel" class="services hidden">
-  <h2 class="section-title">Always-on services</h2>
+  <div class="section-header">
+    <h2 class="section-title">Always-on services</h2>
+    <button id="open-nim" class="btn small-btn">+ Install NIM</button>
+  </div>
   <div id="services-grid" class="services-grid"></div>
+
+  <dialog id="nim-dialog" class="modal">
+    <form method="dialog" class="modal-form" id="nim-form">
+      <h3>Install a NVIDIA NIM container</h3>
+      <p class="muted small" id="nim-key-warn"></p>
+      <p class="muted small">Pick a curated container below or paste any image from <a href="#" id="nim-catalog-link" target="_blank" rel="noopener">the NGC NIM catalog</a>. Spark Control will <code>docker pull</code> and <code>docker run</code> it on the target Spark.</p>
+
+      <div id="nim-suggested" class="nim-grid"></div>
+
+      <fieldset class="modal-fieldset">
+        <legend>Custom image</legend>
+        <label class="modal-row"><span>Image (nvcr.io/...)</span><input type="text" id="nim-image" placeholder="nvcr.io/nim/nvidia/<name>:latest"></label>
+        <label class="modal-row"><span>Container name</span><input type="text" id="nim-container" placeholder="my-service"></label>
+        <label class="modal-row"><span>Port</span><input type="number" id="nim-port" min="1" max="65535"></label>
+        <label class="modal-row"><span>Kind</span>
+          <select id="nim-kind">
+            <option value="nim">NIM (other)</option>
+            <option value="stt">STT (speech-to-text)</option>
+            <option value="tts">TTS (text-to-speech)</option>
+            <option value="vision">Vision</option>
+            <option value="embedding">Embedding</option>
+          </select>
+        </label>
+        <label class="modal-row"><span>Target Spark</span>
+          <select id="nim-host">
+            <option value="spark2">Spark 2 (default for support services)</option>
+            <option value="spark1">Spark 1 (head node)</option>
+          </select>
+        </label>
+      </fieldset>
+
+      <div class="modal-actions">
+        <button type="button" id="nim-cancel" class="btn">Cancel</button>
+        <button type="submit" class="btn primary" id="nim-start">Install</button>
+      </div>
+    </form>
+  </dialog>
+
+  <dialog id="nim-progress-dialog" class="modal">
+    <form method="dialog" class="modal-form">
+      <h3 id="nim-prog-title">Installing…</h3>
+      <div class="phase-row">
+        <div class="phase" id="nim-prog-phase">Starting…</div>
+        <span class="spacer"></span>
+        <span class="timer" id="nim-prog-elapsed">0:00</span>
+      </div>
+      <details open>
+        <summary class="muted small">Log</summary>
+        <pre id="nim-prog-log" class="log"></pre>
+      </details>
+      <div class="modal-actions">
+        <button type="button" id="nim-prog-close" class="btn">Close</button>
+      </div>
+    </form>
+  </dialog>
 </section>

 <section id="models-section">