docs: update README with v0.2 feature summary

Grant
2026-05-12 11:31:14 -05:00
parent 75fd0846b4
commit 75c0ecfd08
+11 -2
@@ -86,6 +86,15 @@ Other services on your LAN can hit `GET /api/endpoints` to learn where the curre
 ## Status
-**v0.1**: local-only, single-cluster, no auth (trusts LAN). Five LLMs in the catalog: qwen3-vl (cluster), gemma4, qwen36, plus two legacy entries. Magpie surfaces red until its container is fixed.
+**v0.2.3**: installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
-v0.2 in progress: service-discovery API, magpie crash fix, Parakeet/Magpie lifecycle, model download driving, spark-vllm-docker update checks, configurable flag tiers.
+### What v0.2 added on top of v0.1
+- **Service discovery API** (`/api/endpoints`) for other LAN services
+- **Magpie crash fix** documented (chown the model-cache volume to uid 1000)
+- **Always-on services panel** with Start/Stop/Restart for Parakeet + Magpie, plus per-service host configuration in Configure Sparks (so Parakeet/Magpie can live on Spark 1, Spark 2, or anywhere)
+- **Model download** from the dashboard: paste an HF repo, pick solo or cluster, watch percent progress with bytes/rate/ETA. After completion, an "Add to catalog" dialog appears pre-filled.
+- **spark-vllm-docker update check**: banner shows "N commits behind upstream"; Apply Update runs `git pull && ./build-and-copy.sh -c` over SSH with a streamed log
+- **Per-model Advanced settings**: knobs for max context, GPU memory %, and three optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). Persisted to `/data/models-overrides.yaml` so they survive package updates. Bundled and custom models alike.
+v0.3+ roadmap (loose): richer dashboard (SSH/GPU/tokens-per-sec), Open WebUI deep-link integration, optional auth, multi-cluster.
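
For readers wondering how the "percent progress with bytes/rate/ETA" figures in the model-download bullet relate to one another, here is a minimal sketch. It is not taken from the project: the `Progress` type, the `download_progress` helper, and the choice of a simple average rate since the start of the download are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Progress:
    percent: float                # 0.0 to 100.0
    rate_bytes_per_s: float       # average rate since the download started
    eta_seconds: Optional[float]  # None while no rate estimate is available


def download_progress(done_bytes: int, total_bytes: int, elapsed_s: float) -> Progress:
    """Derive percent, average rate, and ETA from raw byte counters.

    Hypothetical helper: names and the averaging strategy are assumptions,
    not the dashboard's actual implementation.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")

    # Clamp so a slightly over-reported counter never exceeds 100%.
    done = min(max(done_bytes, 0), total_bytes)
    percent = 100.0 * done / total_bytes

    # Average rate; zero until some time has elapsed.
    rate = done / elapsed_s if elapsed_s > 0 else 0.0

    remaining = total_bytes - done
    if remaining == 0:
        eta: Optional[float] = 0.0
    elif rate > 0:
        eta = remaining / rate
    else:
        eta = None  # nothing downloaded yet, so no estimate

    return Progress(percent, rate, eta)
```

A real progress display would more likely smooth the rate over a recent window rather than the whole download, so that the ETA reacts to changes in throughput; the average is used here only to keep the arithmetic easy to follow.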