A 12MP photo expands past vLLM's ~4096-image-token limit -> 400. Cap via
--mm-processor-kwargs max_pixels in the qwen36 recipe so big images auto-
downscale server-side for every /v1 consumer (verified live: 400->200).
Remove the v0.27.2 in-dashboard vision-check button per owner request; the
vision badge already signals capability.
Qwen3.6-35B-A3B is multimodal (vision tower on disk) but was labelled
text-only. Mark it [vision, reasoning] and add a 'Vision check' button on
the running vision-capable card: upload an image + prompt -> existing /v1
passthrough proxy -> show the model's text. Confirmed 7/7 fields on a
business card. Records the Gemma-4-26B deferral + research findings.
@@ -55,11 +55,13 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
## Current state
- **Live: v0.27.3:0 — Qwen3.6 vision works end-to-end (incl. full-size phone photos).** Commit `1f359e3`; installed on `immense-voyage` (`start-cli` confirms `0.27.3:0`); pushed to gitea master; **published to Clankistry + Gitea release** (`v0.27.3`). Two-part story: **(A) the daily driver `RedHatAI/Qwen3.6-35B-A3B-NVFP4` is itself a vision model** (`Qwen3_5MoeForConditionalGeneration`, `vision_config` + `model_visual.safetensors` on disk) — recipe was mislabelled `[reasoning]`, now `[vision, reasoning]`. Real business card read **7/7 fields perfect** (~97 tok/s, no patches). **(B) oversized-image fix:** a 12MP phone photo expands to ~11.8k vision tokens → exceeds vLLM's ~4096-image-token cap → **400 "Failed to apply Qwen3VLProcessor … token count mismatch."** Fix = cap resolution server-side via `'--mm-processor-kwargs={"max_pixels": 2000000}'` in the qwen36 recipe (auto-downscales big images for *every*`/v1` consumer; verified live — the 12MP image went 400→200). Quoting survives the stack because `launch-cluster.sh` does `printf "%q"` on the serve args (line 163) and `build_launch_command` shlex-quotes (round-trip test passes). **An in-dashboard "Vision check" button shipped in v0.27.2 then was removed in v0.27.3 at the owner's request** (clutter; the `vision` badge already signals capability — don't re-add it). The `/v1/chat/completions` proxy is a dumb passthrough that already forwards image content, so no backend change was needed. 161 pytest green.
- **Gemma-4-26B-A4B-NVFP4 eval — RESOLVED as "defer; Qwen covers vision better."** Two independent deep-research agents (this session) confirmed: it does NOT run on the stock `eugr/spark-vllm-docker` stack (crashes on `tie_weights``NotImplementedError` — the checkpoint declares compressed-tensors in config.json but is modelopt NVFP4). The working path needs the **`vllm/vllm-openai:gemma4-0505-arm64-cu130`** image (lacks Ray → can't go through `launch-cluster.sh`, needs **raw `docker run`** = the deferred raw-docker-swap feature) **+ a bind-mounted patched `gemma4.py`** (upstream PR #39084 unmerged) **+ `--moe-backend marlin`**, AND even then **vision is degraded** by open vLLM bug #40106 (wrong attention on image tokens — hurts OCR specifically). ~52 tok/s vs Qwen's 97. Net: more duct tape for worse vision than the Qwen Grant already runs. Revisit when #40106 + #39084 land. Alternatives agent also flagged **`RedHatAI/Qwen3.5-122B-A10B-NVFP4`** as the proven single-Spark *reasoning* step-up (30–51 tok/s, fits 128 GB, no patches) — a future daily-driver upgrade, orthogonal to vision.
- **Live: v0.27.1:0 — fix: "Download a new model" button (uvx PATH).** Commit `1e1e1cb`; installed on `immense-voyage` (`start-cli package list` confirms `0.27.1:0`); pushed to gitea master; **published to Clankistry** (`~/.spark-control/publish.sh`). Root cause: `hf-download.sh` shells out to `uvx`, which the uv installer puts in `~/.local/bin`; Spark Control's *non-interactive* SSH session doesn't source the user's profile, so `~/.local/bin` is off PATH and the download died with "uvx: command not found" (same class as the matrix-bridge non-interactive-SSH gotcha). Fix: `download.build_download_command` prepends `export PATH="$HOME/.local/bin:$PATH"` (server-side `$HOME`, generic for any adopter); extracted to a pure helper with regression tests (`test_download.py`: PATH prefix, no-trailing-space, cluster flags, shlex round-trip). 161 pytest green; verified live. Prompted by Grant adding **Gemma-4-26B**: he downloaded `nvidia/Gemma-4-26B-A4B-NVFP4` (recipe `gemma4-26b` already in catalog) via the now-fixed button — **fix confirmed end-to-end** — and is swapping to it. **Pending: business-card OCR / vision test** once it's up.
- **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:**`GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
- **Next on this thread (small, externally gated):** (a) **adopter reply is drafted** (in the session — corrects the vLLM-port misconception → set 8000 in the gear, confirms the port knobs + swap/lock fix, asks the disk-scan diagnostic) — **pending Grant to send** + pick the distribution-channel wording. (b) **Optional Gitea tag + `make release`** so the adopter can pull v0.27 from Gitea Releases (NOT done this session — only registry + sideload shipped); do it only if that adopter pulls from Gitea Releases rather than subscribing to Clankistry. (c) **Un-diagnosed:** adopter's disk-scan shows Gemma "not on disk" — needs them to run `ls ~/.cache/huggingface/hub` as the SSH user vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption (likely a custom `HF_HOME`/container-volume/different-user cache path → would need a configurable cache path).
- **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
- **Next (owner-driven, concrete): Gemma-4-26B-A4B vision daily-driver eval.** The `gemma4-26b` recipe is in the catalog (NVFP4 MoE; `--moe_backend=marlin` set — the fast CUTLASS FP4 path errors on GB10; vision+tools). Not yet downloaded or swap-tested. Owner wants vision for business-card OCR and is weighing it against the text-only Qwen3.6 35B daily driver (research: Gemma ~52 tok/s vs Qwen's ~97, slightly weaker reasoning). Next: download it, swap-test, try a business card.
- **Gemma-4-26B-A4B vision eval — DONE this session (deferred; see the v0.27.2 + Gemma bullets up top).** The `gemma4-26b` recipe stays in the catalog but is known not to launch on the stock stack; the owner's vision/OCR goal is met by the Qwen3.6 daily driver instead.
- **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
'v0.27.1:0 — bug fix: "Download a new model" now works on its own. The downloader on the Spark relies on a helper tool (uvx, part of Astral\'s uv) that the standard installer places under your home directory in ~/.local/bin. Spark Control runs downloads over an automated SSH session that wasn\'t looking there, so a download failed immediately with "uvx: command not found" even though the tool was installed. Spark Control now includes ~/.local/bin on the path when it runs a download, so the Download button works with no manual setup. No other changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
'v0.27.3:0 — Qwen3.6 vision now works end-to-end, including full-size phone photos. (1) Qwen3.6-35B-A3B reads images (e.g. business-card OCR) and now shows a "vision" badge on its card. (2) Fix: large/high-resolution images (e.g. a 12-megapixel phone photo) were being rejected by the model with a 400 error — a single big image expands to more vision tokens than vLLM allows. The Qwen launch now caps image resolution (max_pixels) so oversized images are automatically downscaled to a size the model accepts; the dashboard, Open WebUI, and any downstream app can now send full-size photos to the /v1 endpoint without errors, and OCR stays sharp. No consumer-API changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
},
migrations:{
up: async({effects})=>{},
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.