spark-control

grant/spark-control

Commit Graph


Author

SHA1

Message

Date

Grant

5827683a09

v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch

vLLM trips on launching Qwen3.6-35B-A3B-NVFP4 with:
  AssertionError: In Mamba cache align mode, block_size (2096) must be
  <= max_num_batched_tokens (2048).

Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 is just under the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value.

Earlier qwen36 swaps in this session worked, presumably because vLLM did not reach the Mamba-validation code path on those launches (a different attention backend pick or an auto-retry). Whatever the reason, setting the flag explicitly avoids the failure.

Also documented in known-issues.md.

2026-05-12 13:22:24 -05:00
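
The constraint behind commit 5827683a09 can be sketched in a few lines of Python. This is only an illustration of the inequality quoted in the error message, not vLLM's actual code: `check_mamba_block_size` and `with_batched_tokens` are hypothetical helpers, and representing the launch arguments as a flat list is an assumption about how a recipe might be expanded.

```python
def check_mamba_block_size(block_size: int, max_num_batched_tokens: int) -> None:
    """Hypothetical restatement of the constraint quoted in the error message."""
    if block_size > max_num_batched_tokens:
        raise AssertionError(
            f"In Mamba cache align mode, block_size ({block_size}) must be "
            f"<= max_num_batched_tokens ({max_num_batched_tokens})."
        )


def with_batched_tokens(args: list[str], value: int) -> list[str]:
    """Return a copy of args with --max-num-batched-tokens set to value.

    Any existing occurrence of the flag (either "--flag value" or
    "--flag=value" form) is dropped first, so the result carries it once.
    """
    flag = "--max-num-batched-tokens"
    out: list[str] = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value belonging to a previous "--flag value" pair.
            skip_next = False
            continue
        if arg == flag:
            skip_next = True
            continue
        if arg.startswith(flag + "="):
            continue
        out.append(arg)
    return out + [flag, str(value)]


# Values taken from the commit message: the default of 2048 fails, 16384 passes.
launch_args = with_batched_tokens(["--max-num-batched-tokens", "2048"], 16384)
check_mamba_block_size(2096, 16384)
```

With the default of 2048 the check raises, matching the reported assertion; with 16384 it passes silently.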

Grant

87334f85f0

Add per-model descriptions + repo-cleanup polish

- models.yaml: add 'description' field for all 5 models (generic, anyone-can-use)
- ModelDef gains optional description: str | None field
- UI: render description below meta tags; mute the repo line further
- escapeHtml() for safety in case descriptions/names contain HTML chars
- Update runbook: how to add a new model with description

2026-05-12 10:19:09 -05:00

Grant

72bf754baa

Pack spark-control_x86_64.s9pk (55 MB)

- Move models.yaml into image/ so the docker build context is self-contained
- Fix manifest: dockerfile=../image/Dockerfile, workdir=../image
- Add LICENSE (MIT) and assets/README.md (StartOS marketplace listing)
- s9pk validates: id=spark-control, version=0.1.0:0, osVersion=0.4.0-beta.6, sdkVersion=1.3.3
- Image embeds python:3.12-slim + openssh-client + FastAPI app + models.yaml

2026-05-12 09:52:53 -05:00

3 Commits