docs: update README with v0.2 feature summary

2026-05-12 11:31:14 -05:00
parent 75fd0846b4
commit 75c0ecfd08
1 changed file with 11 additions and 2 deletions
@@ -86,6 +86,15 @@ Other services on your LAN can hit `GET /api/endpoints` to learn where the curre
 ## Status
-**v0.1** — local-only, single-cluster, no auth (trusts LAN). Five LLMs in the catalog: qwen3-vl (cluster), gemma4, qwen36, plus two legacy entries. Magpie surfaces red until its container is fixed.
+**v0.2.3** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
-v0.2 in progress: service-discovery API, magpie crash fix, Parakeet/Magpie lifecycle, model download driving, spark-vllm-docker update checks, configurable flag tiers.
+### What v0.2 added on top of v0.1
 - **Service discovery API** (`/api/endpoints`) for other LAN services
 - **Magpie crash fix** documented (chown the model-cache volume to uid 1000)
 - **Always-on services panel** with Start/Stop/Restart for Parakeet + Magpie, plus per-service host configuration in Configure Sparks (so Parakeet/Magpie can live on Spark 1, Spark 2, or anywhere)
 - **Model download** from the dashboard — paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
 - **spark-vllm-docker update check** — banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
 - **Per-model Advanced settings** — knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
 v0.3+ roadmap (loose): richer dashboard (SSH/GPU/tokens-per-sec), Open WebUI deep-link integration, optional auth, multi-cluster.
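
The **Model download** bullet in this hunk mentions percent progress with bytes/rate/ETA. The README does not say how the dashboard derives those figures, so the following is only a minimal sketch of the usual arithmetic, assuming the downloader knows the bytes transferred, the total size, and the elapsed time. The names `Progress` and `download_progress` are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Progress:
    percent: float                # 0 to 100
    rate_bps: float               # average bytes per second so far
    eta_seconds: Optional[float]  # None when no rate can be estimated yet


def download_progress(done_bytes: int, total_bytes: int,
                      elapsed_seconds: float) -> Progress:
    """Derive percent, average rate, and ETA from raw byte counters.

    Hypothetical helper: the real dashboard may smooth the rate or
    read these values from the download tool itself.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")

    # Clamp so a slightly stale or overshooting counter never reports >100%.
    done = min(max(done_bytes, 0), total_bytes)
    percent = 100.0 * done / total_bytes

    # Average rate since the start; 0.0 until some time has elapsed.
    rate = done / elapsed_seconds if elapsed_seconds > 0 else 0.0

    remaining = total_bytes - done
    if remaining == 0:
        eta: Optional[float] = 0.0
    elif rate > 0:
        eta = remaining / rate
    else:
        eta = None  # nothing transferred yet, so no estimate is possible

    return Progress(percent=percent, rate_bps=rate, eta_seconds=eta)


if __name__ == "__main__":
    p = download_progress(done_bytes=250, total_bytes=1000, elapsed_seconds=5)
    print(f"{p.percent:.1f}% at {p.rate_bps:.0f} B/s, ETA {p.eta_seconds:.0f}s")
```

An average-since-start rate is the simplest choice; a moving average over the last few seconds would react faster to speed changes, at the cost of a jumpier ETA.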