spark-control

Author	SHA1	Message	Date
Grant	8ac455f5f5	v0.8.0:3 - add --max-num-batched-tokens=16384 to vision models (gemma4, qwen3-vl) After the recent eugr/spark-vllm-docker update, vLLM became stricter about multimodal token budgets: ValueError: Chunked MM input disabled but max_tokens_per_mm_item (2496) is larger than max_num_batched_tokens (2048). Please increase max_num_batched_tokens. Each image input produces 2496 tokens, but vLLM's default --max-num-batched-tokens of 2048 falls just short of that. Same class of bug as the Qwen3.6 Mamba block-size assertion we fixed in 0.6.0:1, surfacing on different models. Fix: bake --max-num-batched-tokens=16384 into every multimodal model entry. Now applied to: - qwen36 (already had it for the Mamba constraint; works for multimodal too since Qwen3.6 has vision) - gemma4 (crashed today on engine init) - qwen3-vl (would crash with the same error if anyone tried it) The pre-flight Test button validates argparse, but the 2048 < 2496 check happens at runtime engine init, so Test does not catch it; only an actual model load does. v0.7's Test catches syntax errors in the launch flags but not semantic ones like this, so runtime errors of this kind still surface only on a real swap. Known limitation documented in v0.7 release notes.	2026-05-12 14:47:32 -05:00
Grant	5827683a09	v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch vLLM trips on launching Qwen3.6-35B-A3B-NVFP4 with: AssertionError: In Mamba cache align mode, block_size (2096) must be <= max_num_batched_tokens (2048). Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 falls just short of the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value. Earlier qwen36 swaps in this session worked, presumably because vLLM had not reached the Mamba-validation code path (a different attention backend pick, or an auto-retry). Whatever the reason, setting the flag explicitly avoids the problem. Also documented in known-issues.md.	2026-05-12 13:22:24 -05:00
Grant	87334f85f0	Add per-model descriptions + repo-cleanup polish - models.yaml: add 'description' field for all 5 models (generic, anyone-can-use) - ModelDef gains optional description: str | None field - UI: render description below meta tags; mute the repo line further - escapeHtml() for safety in case descriptions/names contain HTML chars - Update runbook: how to add a new model with description	2026-05-12 10:19:09 -05:00
Grant	72bf754baa	Pack spark-control_x86_64.s9pk (55 MB) - Move models.yaml into image/ so the docker build context is self-contained - Fix manifest: dockerfile=../image/Dockerfile, workdir=../image - Add LICENSE (MIT) and assets/README.md (StartOS marketplace listing) - s9pk validates: id=spark-control, version=0.1.0:0, osVersion=0.4.0-beta.6, sdkVersion=1.3.3 - Image embeds python:3.12-slim + openssh-client + FastAPI app + models.yaml	2026-05-12 09:52:53 -05:00

4 Commits
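
The two most recent commits describe the same failure class: a per-model token requirement (2496 for max_tokens_per_mm_item, 2096 for the Mamba block_size) exceeds vLLM's default --max-num-batched-tokens of 2048, and argparse-level validation cannot see it. A minimal sketch of a semantic pre-flight check for that case follows. It is not spark-control's actual Test implementation; `BudgetRequirement`, `parse_max_num_batched_tokens`, and `check_budget` are hypothetical names, and the sketch assumes the per-model requirements are maintained by hand (the commits do not say how they could be obtained before engine init).

```python
from dataclasses import dataclass

# Default reported in the vLLM error messages quoted in the commit log.
VLLM_DEFAULT_MAX_NUM_BATCHED_TOKENS = 2048
FLAG = "--max-num-batched-tokens"


@dataclass
class BudgetRequirement:
    """Hypothetical record of a token count one batch must be able to hold."""
    name: str    # e.g. "max_tokens_per_mm_item" or "block_size"
    tokens: int  # value taken from the corresponding vLLM error message


def parse_max_num_batched_tokens(args: list[str]) -> int:
    """Return the flag's value from a launch arg list, or the default if absent."""
    for i, arg in enumerate(args):
        if arg.startswith(FLAG + "="):
            return int(arg.split("=", 1)[1])
        if arg == FLAG and i + 1 < len(args):
            return int(args[i + 1])
    return VLLM_DEFAULT_MAX_NUM_BATCHED_TOKENS


def check_budget(args: list[str], requirements: list[BudgetRequirement]) -> list[str]:
    """List every requirement that does not fit in the configured batch budget."""
    budget = parse_max_num_batched_tokens(args)
    return [
        f"{req.name} ({req.tokens}) exceeds max_num_batched_tokens ({budget})"
        for req in requirements
        if req.tokens > budget
    ]


# Requirements taken from the two error messages in the log above.
requirements = [
    BudgetRequirement("max_tokens_per_mm_item", 2496),
    BudgetRequirement("block_size", 2096),
]

problems_default = check_budget([], requirements)
problems_fixed = check_budget([FLAG + "=16384"], requirements)
```

With no flag set, both requirements are reported as exceeding the budget; with the flag set to 16384, as in the fix, the list is empty. A check like this would only cover requirements someone has already recorded, so it complements rather than replaces an actual load.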