v0.2.3 - Per-model Advanced settings + catalog-add for downloaded models
Backend:
- overrides.py: read/write /data/models-overrides.yaml (knobs + custom entries)
- apply_knobs_to_args(): strip matching flags from bundled vllm_args and append knob values, so knob changes properly override bundled defaults
- extract_knobs_from_args(): seed UI knob values from bundled args so the Advanced dialog has correct starting state
- models.py: load_catalog merges overrides on top of bundled yaml
- GET /api/models returns effective_knobs per model
- PUT /api/models/{key}/knobs persists knob changes
- POST /api/models adds a custom catalog entry
- DELETE /api/models/{key} removes a custom entry (bundled models cannot be deleted)
- swap_manager.reload_catalog() called after each mutation so swaps see latest
Frontend:
- New 'Advanced' button on every card opens a modal dialog: max-model-len input, gpu-memory-utilization slider, three optimization checkboxes (fastsafetensors, prefix caching, FP8 KV cache). Save persists; Cancel discards. Custom models also have a Delete button.
- After a successful download, automatically open the 'Add to catalog' dialog pre-filled with the repo, with the same knob defaults — user just enters key, display name, and clicks Save.
- Custom catalog entries are tagged with a blue 'custom' pill on the card.
Package: bump 0.2.3:0; main.ts sets MODELS_OVERRIDES=/data/models-overrides.yaml so overrides persist on the StartOS volume.
This commit is contained in:
@@ -74,6 +74,54 @@
|
||||
<button id="open-download" class="btn small-btn">+ Download a new model</button>
|
||||
</div>
|
||||
|
||||
<dialog id="catalog-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form" id="catalog-form">
|
||||
<h3>Add downloaded model to catalog</h3>
|
||||
<p class="muted small">It will appear as a new card you can swap to. Knob values become its default launch flags — you can tweak later via the model's "Advanced" panel.</p>
|
||||
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="cd-key" required pattern="[a-zA-Z0-9_-]+"></label>
|
||||
<label class="modal-row"><span>Display name</span><input type="text" id="cd-name" required></label>
|
||||
<label class="modal-row"><span>Repo (read-only)</span><input type="text" id="cd-repo" readonly></label>
|
||||
<label class="modal-row"><span>Size (GB)</span><input type="number" id="cd-size" step="0.1" min="0"></label>
|
||||
<label class="modal-row"><span>Mode</span>
|
||||
<select id="cd-mode">
|
||||
<option value="solo">solo (Spark 1 only)</option>
|
||||
<option value="cluster">cluster (both Sparks via Ray)</option>
|
||||
</select>
|
||||
</label>
|
||||
<label class="modal-row"><span>Description (optional)</span><textarea id="cd-desc" rows="3"></textarea></label>
|
||||
<fieldset class="modal-fieldset">
|
||||
<legend>Default launch knobs</legend>
|
||||
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="cd-mml" step="1024" min="1024" value="32768"></label>
|
||||
<label class="modal-row"><span>GPU memory %</span><input type="range" id="cd-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="cd-gmu-out">0.85</output></label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="cd-fst" checked> Fast safetensors loading</label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="cd-pcache" checked> Prefix caching</label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="cd-fp8" checked> FP8 KV cache</label>
|
||||
</fieldset>
|
||||
<div class="modal-actions">
|
||||
<button type="button" id="cd-cancel" class="btn">Cancel</button>
|
||||
<button type="submit" class="btn primary">Add to catalog</button>
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<dialog id="advanced-dialog" class="modal">
|
||||
<form method="dialog" class="modal-form" id="advanced-form">
|
||||
<h3 id="adv-title">Advanced settings</h3>
|
||||
<p class="muted small">Custom values are stored in the package volume and survive package updates. Empty fields fall back to defaults.</p>
|
||||
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="adv-mml" step="1024" min="1024"></label>
|
||||
<label class="modal-row"><span>GPU memory %</span><input type="range" id="adv-gmu" min="0.5" max="0.95" step="0.01"> <output id="adv-gmu-out"></output></label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="adv-fst"> Fast safetensors loading <span class="muted small">(faster cold start)</span></label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="adv-pcache"> Prefix caching <span class="muted small">(speeds up repeated prefixes)</span></label>
|
||||
<label class="modal-row inline"><input type="checkbox" id="adv-fp8"> FP8 KV cache <span class="muted small">(halves context memory)</span></label>
|
||||
<div class="modal-actions">
|
||||
<button type="button" id="adv-delete" class="btn danger hidden">Delete model</button>
|
||||
<span class="spacer"></span>
|
||||
<button type="button" id="adv-cancel" class="btn">Cancel</button>
|
||||
<button type="submit" class="btn primary">Save</button>
|
||||
</div>
|
||||
</form>
|
||||
</dialog>
|
||||
|
||||
<section id="download-panel" class="download-panel hidden">
|
||||
<div class="download-form" id="download-form">
|
||||
<label class="dl-row">
|
||||
|
||||
Reference in New Issue
Block a user