From 75c0ecfd08c27c4855041bfab85ed960fb366883 Mon Sep 17 00:00:00 2001 From: Grant Date: Tue, 12 May 2026 11:31:14 -0500 Subject: [PATCH] docs: update README with v0.2 feature summary --- README.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 9e1c92c..37b5493 100644 --- a/README.md +++ b/README.md @@ -86,6 +86,15 @@ Other services on your LAN can hit `GET /api/endpoints` to learn where the curre ## Status -**v0.1** — local-only, single-cluster, no auth (trusts LAN). Five LLMs in the catalog: qwen3-vl (cluster), gemma4, qwen36, plus two legacy entries. Magpie surfaces red until its container is fixed. +**v0.2.3** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI. -v0.2 in progress: service-discovery API, magpie crash fix, Parakeet/Magpie lifecycle, model download driving, spark-vllm-docker update checks, configurable flag tiers. +### What v0.2 added on top of v0.1 + +- **Service discovery API** (`/api/endpoints`) for other LAN services +- **Magpie crash fix** documented (chown the model-cache volume to uid 1000) +- **Always-on services panel** with Start/Stop/Restart for Parakeet + Magpie, plus per-service host configuration in Configure Sparks (so Parakeet/Magpie can live on Spark 1, Spark 2, or anywhere) +- **Model download** from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled. +- **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log +- **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike. + +v0.3+ roadmap (loose): richer dashboard (SSH/GPU/tokens-per-sec), Open WebUI deep-link integration, optional auth, multi-cluster.