v0.27.2:0 - vision check tool + mark Qwen3.6 vision-capable
Qwen3.6-35B-A3B is multimodal (vision tower on disk) but was labelled text-only. Mark it [vision, reasoning] and add a 'Vision check' button on the running vision-capable card: upload an image + prompt -> existing /v1 passthrough proxy -> show the model's text. Confirmed 7/7 fields on a business card. Records the Gemma-4-26B deferral + research findings.
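As a rough illustration of the upload-and-prompt flow described above, the request that the Vision check sends through the `/v1` proxy can be sketched as a pure function. `buildVisionRequest` is a hypothetical helper, not something in this commit (the commit builds the body inline in `runVisionCheck`), and the `data:` URL guard is an added assumption. The model id and image bytes in the usage lines are placeholders.

```javascript
// Hypothetical helper (not part of the commit): builds the JSON body for an
// OpenAI-style /v1/chat/completions request carrying one text part and one image part.
function buildVisionRequest(modelId, prompt, dataUrl) {
  if (!modelId) throw new Error('no running model detected');
  // Assumption: the image arrives as a data: URL, as FileReader.readAsDataURL produces.
  if (!/^data:image\//.test(dataUrl)) throw new Error('expected an image data: URL');
  return {
    model: modelId,
    max_tokens: 800,
    temperature: 0, // deterministic decoding suits OCR-style checks
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: prompt || 'Describe this image.' },
        { type: 'image_url', image_url: { url: dataUrl } },
      ],
    }],
  };
}

// Placeholder model id and image payload, for illustration only.
const body = buildVisionRequest(
  'RedHatAI/Qwen3.6-35B-A3B-NVFP4',
  'Output only the JSON.',
  'data:image/png;base64,iVBORw0KGgo=',
);
console.log(JSON.stringify(body.messages[0].content.map((part) => part.type)));
// prints ["text","image_url"]
```

Keeping the body construction separate from the `fetch` call makes the payload shape easy to unit-test without a browser or a running model.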
@@ -55,11 +55,13 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 
 ## Current state
 
+- **Live: v0.27.2:0 — "Vision check" tool + Qwen3.6 marked vision-capable.** Installed on `immense-voyage` (`start-cli` confirms `0.27.2:0`); **committed/pushed; registry+Gitea-release publish pending Grant's real-card UI confirmation** (couldn't self-verify the front-end — mDNS package subdomain won't resolve from the agent shell). Prompted by a key discovery: **the daily-driver `RedHatAI/Qwen3.6-35B-A3B-NVFP4` is itself a vision model** (`Qwen3_5MoeForConditionalGeneration`, `vision_config` + `model_visual.safetensors` on disk) — its recipe was just mislabelled `[reasoning]`. Verified end-to-end: a business-card OCR request to its `/v1` vision API returned **7/7 fields perfect** (~97 tok/s, no patches). So: (1) `models.yaml` qwen36 now `[vision, reasoning]`; (2) new **Vision check** button on the *running* vision-capable card (`app.js` `openVisionCheck`/`runVisionCheck`) — upload an image + prompt → POST to the existing dumb-passthrough `/v1/chat/completions` proxy (no backend change; same-origin so CSRF-clean) → shows the model's text. 161 pytest green.
+- **Gemma-4-26B-A4B-NVFP4 eval — RESOLVED as "defer; Qwen covers vision better."** Two independent deep-research agents (this session) confirmed: it does NOT run on the stock `eugr/spark-vllm-docker` stack (crashes on `tie_weights` `NotImplementedError` — the checkpoint declares compressed-tensors in config.json but is modelopt NVFP4). The working path needs the **`vllm/vllm-openai:gemma4-0505-arm64-cu130`** image (lacks Ray → can't go through `launch-cluster.sh`, needs **raw `docker run`** = the deferred raw-docker-swap feature) **+ a bind-mounted patched `gemma4.py`** (upstream PR #39084 unmerged) **+ `--moe-backend marlin`**, AND even then **vision is degraded** by open vLLM bug #40106 (wrong attention on image tokens — hurts OCR specifically). ~52 tok/s vs Qwen's 97. Net: more duct tape for worse vision than the Qwen Grant already runs. Revisit when #40106 + #39084 land. Alternatives agent also flagged **`RedHatAI/Qwen3.5-122B-A10B-NVFP4`** as the proven single-Spark *reasoning* step-up (30–51 tok/s, fits 128 GB, no patches) — a future daily-driver upgrade, orthogonal to vision.
 - **Live: v0.27.1:0 — fix: "Download a new model" button (uvx PATH).** Commit `1e1e1cb`; installed on `immense-voyage` (`start-cli package list` confirms `0.27.1:0`); pushed to gitea master; **published to Clankistry** (`~/.spark-control/publish.sh`). Root cause: `hf-download.sh` shells out to `uvx`, which the uv installer puts in `~/.local/bin`; Spark Control's *non-interactive* SSH session doesn't source the user's profile, so `~/.local/bin` is off PATH and the download died with "uvx: command not found" (same class as the matrix-bridge non-interactive-SSH gotcha). Fix: `download.build_download_command` prepends `export PATH="$HOME/.local/bin:$PATH"` (server-side `$HOME`, generic for any adopter); extracted to a pure helper with regression tests (`test_download.py`: PATH prefix, no-trailing-space, cluster flags, shlex round-trip). 161 pytest green; verified live. Prompted by Grant adding **Gemma-4-26B**: he downloaded `nvidia/Gemma-4-26B-A4B-NVFP4` (recipe `gemma4-26b` already in catalog) via the now-fixed button — **fix confirmed end-to-end** — and is swapping to it. **Pending: business-card OCR / vision test** once it's up.
 - **Live: v0.27.0:0 — in-app Settings gear + two bug fixes** (commit `7e07598`; installed on `immense-voyage` — `start-cli package list` confirms `0.27.0:0`; published to Clankistry; pushed to gitea master). Prompted by the second adopter's v0.25 feedback. (1) StartOS "Configure Sparks" action trimmed to the **four required fields**; all optional knobs moved to a **⚙ Settings gear** in the dashboard, backed by a `/data/app_settings.json` overlay (`app_settings.py`) keyed by env-var names, overlaid on `os.environ`, applied **live** via in-place `Settings.reload()` (architecture + the snapshot-holder gotcha are in the fastapi-image guide). Existing installs' values **migrate automatically** on first boot (`seed_from_env`). (2) **Support-service ports now configurable** (`PARAKEET_PORT`/`KOKORO_PORT`/`EMBED_PORT`/`QDRANT_PORT`; `VLLM_PORT` surfaced) — fixes the adopter's false "vLLM down" (theirs is on 8000, not launch-cluster.sh's 8888) and Parakeet 404 (remapped off 8000). (3) **Bug fix:** `GET /api/swap/lock` 404 (was shadowed by `/api/swap/{job_id}`; lock routes now register first). Code review caught a real P1 (the `WebhookNotifier` snapshot — fixed via `swap_webhook.update()` after reload, regression-tested). 157 pytest + live smoke all green.
 - **Next on this thread (small, externally gated):** (a) **adopter reply is drafted** (in the session — corrects the vLLM-port misconception → set 8000 in the gear, confirms the port knobs + swap/lock fix, asks the disk-scan diagnostic) — **pending Grant to send** + pick the distribution-channel wording. (b) **Optional Gitea tag + `make release`** so the adopter can pull v0.27 from Gitea Releases (NOT done this session — only registry + sideload shipped); do it only if that adopter pulls from Gitea Releases rather than subscribing to Clankistry. (c) **Un-diagnosed:** adopter's disk-scan shows Gemma "not on disk" — needs them to run `ls ~/.cache/huggingface/hub` as the SSH user vs `disk.py`'s `$HOME/.cache/huggingface/hub` assumption (likely a custom `HF_HOME`/container-volume/different-user cache path → would need a configurable cache path).
 - **Live: v0.26.0:0 — disk-driven model menu** (installed on the server 2026-06-18, `installed-version` confirms; also published to the self-hosted StartOS registry). The dashboard lists what's *actually downloaded* on the Sparks; `models.yaml`/overrides are **launch recipes** matched by `repo`, not the menu; an on-disk model with no recipe shows `needs_setup` and infers its launch flags from `config.json` (operator confirms once). Delete removes weights **and** the card; dropped the two legacy Qwen recipes. Architecture (`discovery.py`/`build_menu`/`infer_recipe`, the recipe-vs-disk split) is in the fastapi-image guide.
-- **Next (owner-driven, concrete): Gemma-4-26B-A4B vision daily-driver eval.** The `gemma4-26b` recipe is in the catalog (NVFP4 MoE; `--moe_backend=marlin` set — the fast CUTLASS FP4 path errors on GB10; vision+tools). Not yet downloaded or swap-tested. Owner wants vision for business-card OCR and is weighing it against the text-only Qwen3.6 35B daily driver (research: Gemma ~52 tok/s vs Qwen's ~97, slightly weaker reasoning). Next: download it, swap-test, try a business card.
+- **Gemma-4-26B-A4B vision eval — DONE this session (deferred; see the v0.27.2 + Gemma bullets up top).** The `gemma4-26b` recipe stays in the catalog but is known not to launch on the stock stack; the owner's vision/OCR goal is met by the Qwen3.6 daily driver instead.
 - **Live: v0.25.0:0** (installed 2026-06-18). The OpenClaw/Johnny-5 coexistence epic is fully shipped & live: configurable `VLLM_PORT` (v0.22, blank ⇒ 8888), local/fine-tuned models (v0.23), configurable topology (v0.24 — `VLLM_CONTAINER`, `DISABLED_SERVICES` hide-list, second-Spark `kind: vllm` monitor), coordination layer (v0.25 — swap reservation lock with `423`-enforced manual-swap pause + `?force=true` Release override, `swap_complete`/`swap_failed` webhook, read-only schedule registry; consumer API in `docs/COORDINATION.md`).
 - **Other live features:** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware badge. Security hardening (v0.19 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) stable (`EVALUATION.md`). Spark 2 audio/embeddings stack healthy.
 - **matrix-bridge bot tile (v0.21.0:1, live):** `bot`-kind tile (docker-state badge; Update/Restart/Stop-Start/View-logs) for the Matrix bot on Spark 2, driven as `modelo` (no `sudo -iu`; blank `matrix_bridge_user` ⇒ tile hidden; host reuses `spark2_host`). Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}`. **Load-bearing:** Update's `git fetch` runs as `modelo` and needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` (else publickey denial). Optional next only if the bot dev asks: Docker `HEALTHCHECK`.
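The "7/7 fields" result in the v0.27.2 note above could be checked mechanically with something like the sketch below. `scoreCardReply` and `CARD_KEYS` are hypothetical names, not code from this commit; the key list is taken from the default Vision check prompt, and the sample reply is made-up data. It assumes the model answers with a JSON object, possibly wrapped in a Markdown fence.

```javascript
// Keys requested by the default Vision check prompt.
const CARD_KEYS = ['name', 'title', 'company', 'phone', 'email', 'website', 'address'];

// Hypothetical helper: reports which expected fields the model's reply filled in.
function scoreCardReply(replyText) {
  // Models sometimes wrap JSON in a Markdown fence; strip it if present.
  const cleaned = replyText.trim()
    .replace(/^`{3}(?:json)?\s*/i, '')
    .replace(/\s*`{3}$/, '');
  let card;
  try {
    card = JSON.parse(cleaned);
  } catch {
    return { ok: false, found: [], missing: CARD_KEYS.slice() };
  }
  if (card === null || typeof card !== 'object' || Array.isArray(card)) {
    return { ok: false, found: [], missing: CARD_KEYS.slice() };
  }
  // A field counts only if it is a non-empty string.
  const found = CARD_KEYS.filter(
    (k) => typeof card[k] === 'string' && card[k].trim() !== '',
  );
  const missing = CARD_KEYS.filter((k) => !found.includes(k));
  return { ok: missing.length === 0, found, missing };
}

// Made-up sample reply with an empty address field.
const sampleReply = JSON.stringify({
  name: 'Ada Lovelace', title: 'Engineer', company: 'Example Co',
  phone: '555-0100', email: 'ada@example.com', website: 'example.com', address: '',
});
const score = scoreCardReply(sampleReply);
console.log(`${score.found.length}/${CARD_KEYS.length}`, score.missing);
// reports 6/7 with "address" missing
```

This only checks that each field is present and non-empty; whether the extracted values match the physical card still needs a human look.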
+84
-1
@@ -4,6 +4,7 @@ const state = {
   models: {},
   defaults: {},
   current_model_key: null,
+  vllm_model: null, // model id vLLM currently reports serving (for the vision check)
   swap_job_id: null,
   swap_eventsource: null,
   swap_started_at: null,
@@ -120,6 +121,11 @@ function renderCards() {
     const recipeActions = m.needs_setup ? '' : `
       <button class="btn test-btn" data-test-key="${key}" title="Pre-flight check the launch command without starting the engine">Test</button>
       <button class="btn adv-btn" data-adv-key="${key}" title="Advanced settings">Advanced</button>`;
+    // "Vision check" only makes sense for the model that's actually loaded, and
+    // only if it can take images — send an image to it and see what it reads.
+    const visionBtn = (isActive && (m.capabilities || []).includes('vision'))
+      ? `<button class="btn vision-btn" data-vision-key="${key}" title="Send an image to the running model (e.g. a business card) and see what it reads">Vision check</button>`
+      : '';
     card.innerHTML = `
       <div class="name">${escapeHtml(m.display_name)}</div>
       <div class="meta">
@@ -138,7 +144,7 @@ function renderCards() {
       </div>
       <div class="spacer"></div>
       <div class="card-actions">
-        ${primaryBtn}${recipeActions}
+        ${primaryBtn}${recipeActions}${visionBtn}
         ${trashBtn}
       </div>
       <div class="test-result hidden" data-test-result-for="${key}"></div>
@@ -160,6 +166,80 @@ function renderCards() {
   for (const btn of root.querySelectorAll('[data-disk-del-key]')) {
     btn.addEventListener('click', () => openDiskDeleteDialog(btn.dataset.diskDelKey));
   }
+  for (const btn of root.querySelectorAll('[data-vision-key]')) {
+    btn.addEventListener('click', () => openVisionCheck(btn.dataset.visionKey));
+  }
 }
 
+// ===================== vision check =====================
+
+function openVisionCheck(key) {
+  const m = state.models[key];
+  el('#vc-model').textContent = m ? ` — ${m.display_name}` : '';
+  el('#vc-file').value = '';
+  el('#vc-preview').classList.add('hidden');
+  el('#vc-preview').removeAttribute('src');
+  const res = el('#vc-result');
+  res.classList.add('hidden');
+  res.textContent = '';
+  el('#vision-dialog').showModal();
+}
+
+function previewVisionImage() {
+  const file = el('#vc-file').files[0];
+  const img = el('#vc-preview');
+  if (!file) { img.classList.add('hidden'); return; }
+  img.src = URL.createObjectURL(file);
+  img.classList.remove('hidden');
+}
+
+function readFileAsDataURL(file) {
+  return new Promise((resolve, reject) => {
+    const fr = new FileReader();
+    fr.onload = () => resolve(fr.result);
+    fr.onerror = () => reject(new Error('could not read the image file'));
+    fr.readAsDataURL(file);
+  });
+}
+
+async function runVisionCheck() {
+  const file = el('#vc-file').files[0];
+  const res = el('#vc-result');
+  if (!file) { alert('Pick an image first.'); return; }
+  const modelId = state.vllm_model;
+  if (!modelId) { alert('No running model detected — switch to a model first.'); return; }
+  const prompt = el('#vc-prompt').value.trim() || 'Describe this image.';
+  const btn = el('#vc-run');
+  btn.disabled = true; btn.textContent = 'Running…';
+  res.classList.remove('hidden', 'fail');
+  res.textContent = 'Sending the image to the model…';
+  try {
+    const dataUrl = await readFileAsDataURL(file);
+    const r = await fetchJSON('/v1/chat/completions', {
+      method: 'POST',
+      headers: { 'content-type': 'application/json' },
+      body: JSON.stringify({
+        model: modelId,
+        max_tokens: 800,
+        temperature: 0,
+        messages: [{
+          role: 'user',
+          content: [
+            { type: 'text', text: prompt },
+            { type: 'image_url', image_url: { url: dataUrl } },
+          ],
+        }],
+      }),
+    });
+    const msg = r.choices && r.choices[0] && r.choices[0].message;
+    const text = (msg && msg.content && msg.content.trim()) || '(model returned no text)';
+    res.textContent = text;
+  } catch (e) {
+    res.classList.add('fail');
+    res.textContent = 'Failed: ' + e.message;
+  } finally {
+    btn.disabled = false; btn.textContent = 'Run';
+  }
+}
+
 const trashIcon = '<svg viewBox="0 0 24 24" width="14" height="14" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><polyline points="3 6 5 6 21 6"></polyline><path d="M19 6l-1 14a2 2 0 0 1-2 2H8a2 2 0 0 1-2-2L5 6"></path><path d="M10 11v6"></path><path d="M14 11v6"></path><path d="M9 6V4a2 2 0 0 1 2-2h2a2 2 0 0 1 2 2v2"></path></svg>';
@@ -1143,6 +1223,7 @@ async function pollStatus() {
   try {
     const status = await fetchJSON('/api/status');
     state.current_model_key = status.current_model_key;
+    state.vllm_model = (status.vllm || {}).current_model || null;
     state.configured = status.configured;
     renderBanner(status);
     renderCurrent(status);
@@ -2329,6 +2410,8 @@ async function init() {
   });
   el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
   el('#open-local').addEventListener('click', openLocalModelDialog);
+  el('#vc-run').addEventListener('click', runVisionCheck);
+  el('#vc-file').addEventListener('change', previewVisionImage);
   el('#lock-release').addEventListener('click', releaseLock);
   setupCatalogDialog();
   setupAdvancedDialog();
@@ -365,6 +365,21 @@
     </form>
   </dialog>
 
+  <dialog id="vision-dialog" class="modal">
+    <form method="dialog" class="modal-form" id="vision-form">
+      <h3>Vision check<span id="vc-model" class="muted small"></span></h3>
+      <p class="muted small">Send an image to the running model and see what it reads back — handy for confirming OCR on a real photo (e.g. a business card). Sent over the same <code>/v1</code> endpoint your apps use; nothing leaves the LAN.</p>
+      <label class="modal-row"><span>Image</span><input type="file" id="vc-file" accept="image/*"></label>
+      <img id="vc-preview" class="vc-preview hidden" alt="selected image preview">
+      <label class="modal-row"><span>Prompt</span><textarea id="vc-prompt" rows="3">This is a business card. Extract every field as JSON with keys: name, title, company, phone, email, website, address. Output only the JSON.</textarea></label>
+      <div class="vc-result hidden" id="vc-result"></div>
+      <div class="modal-actions">
+        <button type="button" id="vc-run" class="btn primary">Run</button>
+        <button class="btn" value="cancel">Close</button>
+      </div>
+    </form>
+  </dialog>
+
   <section id="download-panel" class="download-panel hidden">
     <div class="download-form" id="download-form">
       <label class="dl-row">
@@ -805,6 +805,18 @@ main {
 .test-result .ok-mark { color: var(--accent); font-weight: 600; }
 .test-result .fail-mark { color: var(--error); font-weight: 600; }
 
+/* Vision check modal */
+.vc-preview { display: block; max-width: 100%; max-height: 180px; border-radius: 8px; margin: 4px 0 10px; border: 1px solid var(--border); }
+.vc-result {
+  margin-top: 4px; padding: 10px 12px;
+  border: 1px solid var(--border); border-radius: 8px;
+  background: var(--surface-2);
+  white-space: pre-wrap; word-break: break-word;
+  font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
+  font-size: 12px; line-height: 1.5; max-height: 280px; overflow: auto;
+}
+.vc-result.fail { border-color: rgba(239, 68, 68, 0.45); color: var(--error); }
+
 .footer {
   margin-top: 28px;
   padding-top: 16px;
+4
-1
@@ -96,7 +96,10 @@ models:
     repo: RedHatAI/Qwen3.6-35B-A3B-NVFP4
     size_gb: 20
     mode: solo
-    capabilities: [reasoning]
+    # Qwen3.6-35B-A3B is natively multimodal (Qwen3_5MoeForConditionalGeneration,
+    # vision tower ships in the checkpoint). Confirmed reading a business card
+    # cleanly on this cluster — use the "Vision check" button on the live card.
+    capabilities: [vision, reasoning]
     expected_ready_seconds: 300
     vllm_args:
       - --gpu-memory-utilization=0.85
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
 
 export const v0_1_0 = VersionInfo.of({
-  version: '0.27.1:0',
+  version: '0.27.2:0',
   releaseNotes: {
     en_US:
-      'v0.27.1:0 — bug fix: "Download a new model" now works on its own. The downloader on the Spark relies on a helper tool (uvx, part of Astral\'s uv) that the standard installer places under your home directory in ~/.local/bin. Spark Control runs downloads over an automated SSH session that wasn\'t looking there, so a download failed immediately with "uvx: command not found" even though the tool was installed. Spark Control now includes ~/.local/bin on the path when it runs a download, so the Download button works with no manual setup. No other changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
+      'v0.27.2:0 — vision support is now visible and testable in the dashboard. (1) Qwen3.6-35B-A3B is a vision model (it reads images, including OCR), but its card was mislabelled text-only — it now shows the "vision" badge. (2) NEW: a "Vision check" button appears on the running model\'s card when it supports images. Upload a picture (e.g. a business card) with a prompt and see what the model reads back, right in the dashboard — confirmed reading a business card cleanly on the Qwen3.6 vision model. It uses the same on-LAN /v1 endpoint your apps already use, so nothing leaves your network. No consumer-API changes; the /v1 proxy, swap, and coordination APIs are unchanged.',
   },
   migrations: {
     up: async ({ effects }) => {},