After the recent eugr/spark-vllm-docker update, vLLM became stricter about multimodal token budgets:
ValueError: Chunked MM input disabled but max_tokens_per_mm_item (2496) is
larger than max_num_batched_tokens (2048). Please increase max_num_batched_tokens.
Each image input can take up to 2496 tokens, which exceeds vLLM's default --max-num-batched-tokens of 2048. Same class of bug as the Qwen3.6 Mamba block-size assertion we fixed in 0.6.0:1, now surfacing on different models.
Fix: bake --max-num-batched-tokens=16384 into every multimodal model entry. Now applied to:
- qwen36 (already had it for the Mamba constraint; works for multimodal too since Qwen3.6 has vision)
- gemma4 (crashed today on engine init)
- qwen3-vl (would crash with the same error if anyone tried it)
The pre-flight Test button only validates argument parsing (argparse). The 2048 < 2496 check runs at engine init, so Test does not catch it; only an actual load does. In other words, v0.7's Test catches errors in flag *syntax* but not in *semantics*, and runtime errors like this still surface only on a real swap. This known limitation is documented in the v0.7 release notes.
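The comparison that fails at engine init can be sketched in plain Python. This is only an illustration of the arithmetic: `check_mm_token_budget` is a hypothetical helper, not part of vLLM or of the Test button, and the two numbers are taken from the error message above.

```python
from __future__ import annotations


def check_mm_token_budget(
    max_num_batched_tokens: int, max_tokens_per_mm_item: int
) -> str | None:
    """Return an error string if one multimodal item cannot fit in a batch.

    Hypothetical helper: it mirrors the comparison reported in the
    ValueError, not vLLM's actual implementation.
    """
    if max_tokens_per_mm_item > max_num_batched_tokens:
        return (
            f"max_tokens_per_mm_item ({max_tokens_per_mm_item}) is larger "
            f"than max_num_batched_tokens ({max_num_batched_tokens})"
        )
    return None


# Default budget from the error message: the check fails.
print(check_mm_token_budget(2048, 2496))
# Budget after the fix: the check passes and returns None.
print(check_mm_token_budget(16384, 2496))
```

With the default of 2048 the helper returns a message; with 16384 it returns None, which is why the baked-in flag avoids the crash.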
vLLM fails when launching Qwen3.6-35B-A3B-NVFP4 with:
AssertionError: In Mamba cache align mode, block_size (2096) must be
<= max_num_batched_tokens (2048).
Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 is just under the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value.
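A minimal sketch of passing the flag explicitly, assuming the server is started through `vllm serve`. `build_serve_args` and `fits_mamba_block` are hypothetical helpers for illustration, and the model argument is copied from the message above as a placeholder rather than a verified repo id.

```python
from __future__ import annotations


def fits_mamba_block(block_size: int, max_num_batched_tokens: int) -> bool:
    """Mirror the assertion from the error: block_size must not exceed the budget."""
    return block_size <= max_num_batched_tokens


def build_serve_args(
    model: str, max_num_batched_tokens: int | None = None
) -> list[str]:
    """Build an argv list for `vllm serve`, adding the flag only when set."""
    args = ["vllm", "serve", model]
    if max_num_batched_tokens is not None:
        args += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return args


# Values from the assertion message: 2096 does not fit into 2048.
assert not fits_mamba_block(2096, 2048)
# The value used by the sibling recipe leaves ample headroom.
assert fits_mamba_block(2096, 16384)

print(build_serve_args("Qwen3.6-35B-A3B-NVFP4", 16384))
```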
Earlier qwen36 swaps in this session worked, presumably because vLLM never reached the Mamba validation code path on those runs (a different attention backend pick, or an auto-retry). Whatever the reason, setting the flag explicitly avoids the problem.
Also documented in known-issues.md.
- models.yaml: add 'description' field for all 5 models (generic, anyone-can-use)
- ModelDef gains optional description: str | None field
- UI: render description below meta tags; mute the repo line further
- escapeHtml() for safety in case descriptions/names contain HTML chars
- Update runbook: how to add a new model with description
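The optional field and the escaping step listed above could look roughly like this. It is a sketch under assumptions: the real ModelDef may be a pydantic model or have other fields, `name` and `repo` are hypothetical, and `render_description` is a Python stand-in for the UI's escapeHtml() using the standard library's `html.escape`.

```python
from __future__ import annotations

import html
from dataclasses import dataclass


@dataclass
class ModelDef:
    name: str                       # hypothetical field
    repo: str                       # hypothetical field
    description: str | None = None  # optional, as described in the change


def model_from_entry(entry: dict) -> ModelDef:
    """Build a ModelDef from one parsed models.yaml entry (assumed to be a dict)."""
    return ModelDef(
        name=entry["name"],
        repo=entry["repo"],
        description=entry.get("description"),  # missing key -> None
    )


def render_description(model: ModelDef) -> str:
    """Return HTML-safe description text, or an empty string if none is set."""
    if model.description is None:
        return ""
    return html.escape(model.description)


with_desc = model_from_entry(
    {"name": "demo", "repo": "example-org/demo", "description": "Fast <general> chat"}
)
without_desc = model_from_entry({"name": "demo2", "repo": "example-org/demo2"})

print(render_description(with_desc))
print(repr(render_description(without_desc)))
```

Entries without a description still load, so existing models.yaml files need no change.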