diff --git a/AGENTS.md b/AGENTS.md
index 1492b72..a893cbc 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -55,7 +55,7 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
## Current state
-- **Live: v0.27.2:0 — "Vision check" tool + Qwen3.6 marked vision-capable.** Installed on `immense-voyage` (`start-cli` confirms `0.27.2:0`); **committed/pushed; registry+Gitea-release publish pending Grant's real-card UI confirmation** (couldn't self-verify the front-end — mDNS package subdomain won't resolve from the agent shell). Prompted by a key discovery: **the daily-driver `RedHatAI/Qwen3.6-35B-A3B-NVFP4` is itself a vision model** (`Qwen3_5MoeForConditionalGeneration`, `vision_config` + `model_visual.safetensors` on disk) — its recipe was just mislabelled `[reasoning]`. Verified end-to-end: a business-card OCR request to its `/v1` vision API returned **7/7 fields perfect** (~97 tok/s, no patches). So: (1) `models.yaml` qwen36 now `[vision, reasoning]`; (2) new **Vision check** button on the *running* vision-capable card (`app.js` `openVisionCheck`/`runVisionCheck`) — upload an image + prompt → POST to the existing dumb-passthrough `/v1/chat/completions` proxy (no backend change; same-origin so CSRF-clean) → shows the model's text. 161 pytest green.
+- **Live: v0.27.3:0 — Qwen3.6 vision works end-to-end (incl. full-size phone photos).** Installed on `immense-voyage` (`start-cli` confirms `0.27.3:0`). Two-part story: **(A) the daily driver `RedHatAI/Qwen3.6-35B-A3B-NVFP4` is itself a vision model** (`Qwen3_5MoeForConditionalGeneration`, `vision_config` + `model_visual.safetensors` on disk) — recipe was mislabelled `[reasoning]`, now `[vision, reasoning]`. Real business card read **7/7 fields perfect** (~97 tok/s, no patches). **(B) oversized-image fix:** a 12MP phone photo expands to ~11.8k vision tokens → exceeds vLLM's ~4096-image-token cap → **400 "Failed to apply Qwen3VLProcessor … token count mismatch."** Fix = cap resolution server-side via `'--mm-processor-kwargs={"max_pixels": 2000000}'` in the qwen36 recipe (auto-downscales big images for *every* `/v1` consumer; verified live — the 12MP image went 400→200). Quoting survives the stack because `launch-cluster.sh` does `printf "%q"` on the serve args (line 163) and `build_launch_command` shlex-quotes (round-trip test passes). **An in-dashboard "Vision check" button shipped in v0.27.2 then was removed in v0.27.3 at the owner's request** (clutter; the `vision` badge already signals capability — don't re-add it). The `/v1/chat/completions` proxy is a dumb passthrough that already forwards image content, so no backend change was needed. 161 pytest green.
- **Gemma-4-26B-A4B-NVFP4 eval — RESOLVED as "defer; Qwen covers vision better."** Two independent deep-research agents (this session) confirmed: it does NOT run on the stock `eugr/spark-vllm-docker` stack (crashes on `tie_weights` `NotImplementedError` — the checkpoint declares compressed-tensors in config.json but is modelopt NVFP4). The working path needs the **`vllm/vllm-openai:gemma4-0505-arm64-cu130`** image (lacks Ray → can't go through `launch-cluster.sh`, needs **raw `docker run`** = the deferred raw-docker-swap feature) **+ a bind-mounted patched `gemma4.py`** (upstream PR #39084 unmerged) **+ `--moe-backend marlin`**, AND even then **vision is degraded** by open vLLM bug #40106 (wrong attention on image tokens — hurts OCR specifically). ~52 tok/s vs Qwen's 97. Net: more duct tape for worse vision than the Qwen Grant already runs. Revisit when #40106 + #39084 land. Alternatives agent also flagged **`RedHatAI/Qwen3.5-122B-A10B-NVFP4`** as the proven single-Spark *reasoning* step-up (30–51 tok/s, fits 128 GB, no patches) — a future daily-driver upgrade, orthogonal to vision.
- **Live: v0.27.1:0 — fix: "Download a new model" button (uvx PATH).** Commit `1e1e1cb`; installed on `immense-voyage` (`start-cli package list` confirms `0.27.1:0`); pushed to gitea master; **published to Clankistry** (`~/.spark-control/publish.sh`). Root cause: `hf-download.sh` shells out to `uvx`, which the uv installer puts in `~/.local/bin`; Spark Control's *non-interactive* SSH session doesn't source the user's profile, so `~/.local/bin` is off PATH and the download died with "uvx: command not found" (same class as the matrix-bridge non-interactive-SSH gotcha). Fix: `download.build_download_command` prepends `export PATH="$HOME/.local/bin:$PATH"` (server-side `$HOME`, generic for any adopter); extracted to a pure helper with regression tests (`test_download.py`: PATH prefix, no-trailing-space, cluster flags, shlex round-trip). 161 pytest green; verified live. Prompted by Grant adding **Gemma-4-26B**: he downloaded `nvidia/Gemma-4-26B-A4B-NVFP4` (recipe `gemma4-26b` already in catalog) via the now-fixed button — **fix confirmed end-to-end** — and is swapping to it. **Pending: business-card OCR / vision test** once it's up.
- **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:** `GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
diff --git a/image/app/static/app.js b/image/app/static/app.js
index 1c66f7b..0208118 100644
--- a/image/app/static/app.js
+++ b/image/app/static/app.js
@@ -4,7 +4,6 @@ const state = {
models: {},
defaults: {},
current_model_key: null,
- vllm_model: null, // model id vLLM currently reports serving (for the vision check)
swap_job_id: null,
swap_eventsource: null,
swap_started_at: null,
@@ -121,11 +120,6 @@ function renderCards() {
const recipeActions = m.needs_setup ? '' : `
`;
- // "Vision check" only makes sense for the model that's actually loaded, and
- // only if it can take images — send an image to it and see what it reads.
- const visionBtn = (isActive && (m.capabilities || []).includes('vision'))
- ? ``
- : '';
card.innerHTML = `
diff --git a/image/app/static/style.css b/image/app/static/style.css
index bdba71d..9879496 100644
--- a/image/app/static/style.css
+++ b/image/app/static/style.css
@@ -805,18 +805,6 @@ main {
.test-result .ok-mark { color: var(--accent); font-weight: 600; }
.test-result .fail-mark { color: var(--error); font-weight: 600; }
-/* Vision check modal */
-.vc-preview { display: block; max-width: 100%; max-height: 180px; border-radius: 8px; margin: 4px 0 10px; border: 1px solid var(--border); }
-.vc-result {
- margin-top: 4px; padding: 10px 12px;
- border: 1px solid var(--border); border-radius: 8px;
- background: var(--surface-2);
- white-space: pre-wrap; word-break: break-word;
- font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
- font-size: 12px; line-height: 1.5; max-height: 280px; overflow: auto;
-}
-.vc-result.fail { border-color: rgba(239, 68, 68, 0.45); color: var(--error); }
-
.footer {
margin-top: 28px;
padding-top: 16px;
diff --git a/image/models.yaml b/image/models.yaml
index 4eef12b..e41284f 100644
--- a/image/models.yaml
+++ b/image/models.yaml
@@ -110,3 +110,8 @@ models:
- --load-format=fastsafetensors
- --enable-prefix-caching
- --kv-cache-dtype=fp8
+ # Cap image resolution: a large phone photo (e.g. 12MP) otherwise expands
+ # to ~11.8k vision tokens, blowing past vLLM's ~4096-image-token limit and
+ # getting rejected with a 400. ~2MP auto-downscales big images server-side
+ # (so every /v1 consumer is covered) while staying sharp enough for OCR.
+ - '--mm-processor-kwargs={"max_pixels": 2000000}'
diff --git a/package/startos/versions/v0_1_0.ts b/package/startos/versions/v0_1_0.ts
index 25b8b49..566ec15 100644
--- a/package/startos/versions/v0_1_0.ts
+++ b/package/startos/versions/v0_1_0.ts
@@ -1,10 +1,10 @@
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({
- version: '0.27.2:0',
+ version: '0.27.3:0',
releaseNotes: {
en_US:
- 'v0.27.2:0 — vision support is now visible and testable in the dashboard. (1) Qwen3.6-35B-A3B is a vision model (it reads images, including OCR), but its card was mislabelled text-only — it now shows the "vision" badge. (2) NEW: a "Vision check" button appears on the running model\'s card when it supports images. Upload a picture (e.g. a business card) with a prompt and see what the model reads back, right in the dashboard — confirmed reading a business card cleanly on the Qwen3.6 vision model. It uses the same on-LAN /v1 endpoint your apps already use, so nothing leaves your network. No consumer-API changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
+ 'v0.27.3:0 — Qwen3.6 vision now works end-to-end, including full-size phone photos. (1) Qwen3.6-35B-A3B reads images (e.g. business-card OCR) and now shows a "vision" badge on its card. (2) Fix: large/high-resolution images (e.g. a 12-megapixel phone photo) were being rejected by the model with a 400 error — a single big image expands to more vision tokens than vLLM allows. The Qwen launch now caps image resolution (max_pixels) so oversized images are automatically downscaled to a size the model accepts; the dashboard, Open WebUI, and any downstream app can now send full-size photos to the /v1 endpoint without errors, and OCR stays sharp. No consumer-API changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
},
migrations: {
up: async ({ effects }) => {},