spark-control

Files

Grant 5827683a09 v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch

vLLM fails at launch on Qwen3.6-35B-A3B-NVFP4 with:
  AssertionError: In Mamba cache align mode, block_size (2096) must be
  <= max_num_batched_tokens (2048).

Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 is just under the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value.
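
As a rough illustration of why 2048 fails and 16384 passes, the constraint in the assertion message can be written as a standalone check. This is only a sketch: fits_mamba_align is a hypothetical helper, not vLLM's code, and the numbers are copied from the error text above.

```python
def fits_mamba_align(block_size: int, max_num_batched_tokens: int) -> bool:
    """Hypothetical helper: the inequality stated in the assertion message."""
    return block_size <= max_num_batched_tokens


BLOCK_SIZE = 2096  # block_size reported in the AssertionError above

# Compare the default budget with the value taken from the sibling recipe.
for budget in (2048, 16384):
    status = "ok" if fits_mamba_align(BLOCK_SIZE, budget) else "too small"
    print(f"--max-num-batched-tokens {budget}: {status}")
```

Any value of at least 2096 would satisfy the inequality; 16384 is chosen only to match the sibling recipe.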

Earlier qwen36 swaps in this session worked, likely because vLLM never reached the Mamba-validation code path (a different attention backend pick, or an auto-retry). Whatever the reason, setting the flag explicitly avoids the dance.

Also documented in known-issues.md.

2026-05-12 13:22:24 -05:00

actions

v0.4.0 - NIM installer + dashboard resilience

2026-05-12 12:32:29 -05:00

fileModels

v0.4.0 - NIM installer + dashboard resilience

2026-05-12 12:32:29 -05:00

i18n

0.1.0:4 - expose /api/endpoints as separate StartOS service interface

2026-05-12 11:07:51 -05:00

init

Add StartOS 0.4 package scaffold (manifest, main, interfaces, 2 actions)