Add per-model descriptions + repo-cleanup polish

- models.yaml: add 'description' field for all 5 models (generic, anyone-can-use)
- ModelDef gains optional description: str | None field
- UI: render description below meta tags; mute the repo line further
- escapeHtml() for safety in case descriptions/names contain HTML chars
- Update runbook: how to add a new model with description
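
As a rough illustration of the "ModelDef gains optional description" item, here is a minimal sketch of what such a definition could look like. The commit does not show the actual class, so everything here is an assumption: the dataclass layout, the `key` field, and the `model_from_mapping` helper are hypothetical; only the field names `display_name`, `description`, `repo`, `size_gb`, and `mode` come from models.yaml.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ModelDef:
    """Hypothetical shape of one models.yaml entry (not the project's real class)."""

    key: str
    display_name: str
    repo: str
    size_gb: int
    mode: str
    # Optional, so entries written before this commit still load.
    description: str | None = None


def model_from_mapping(key: str, raw: dict[str, Any]) -> ModelDef:
    """Build a ModelDef from one already-parsed YAML mapping (hypothetical helper)."""
    description = raw.get("description")
    if description is not None:
        # Treat an empty or whitespace-only description as absent.
        description = str(description).strip() or None
    return ModelDef(
        key=key,
        display_name=raw["display_name"],
        repo=raw["repo"],
        size_gb=int(raw["size_gb"]),
        mode=raw["mode"],
        description=description,
    )
```

Defaulting `description` to `None` means the UI can skip the description line for any entry that omits the field, rather than failing to load the catalog.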
@@ -15,6 +15,11 @@ defaults:
 models:
   qwen3-vl:
     display_name: "Qwen3-VL 235B (vision)"
+    description: >-
+      Qwen's flagship multimodal model. 235B total parameters with ~22B
+      active per token (Mixture-of-Experts). Handles text, images, and
+      many languages. The most capable model in this catalog — also the
+      slowest to load because it splits across both Sparks.
     repo: RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4
     size_gb: 135
     mode: cluster
@@ -28,6 +33,10 @@ models:
 
   gemma4:
     display_name: "Gemma 4 31B"
+    description: >-
+      Google's mid-size reasoning model. 31B dense parameters with built-in
+      thinking mode and function-calling. Strong on math, logic, and
+      structured outputs; also supports vision input. Runs solo on one Spark.
     repo: RedHatAI/gemma-4-31B-it-NVFP4
     size_gb: 23
     mode: solo
@@ -45,6 +54,10 @@ models:
 
   qwen36:
     display_name: "Qwen3.6 35B-A3B (daily driver)"
+    description: >-
+      Qwen's latest fast Mixture-of-Experts model: 35B total parameters but
+      only ~3B active per token, making inference quick. Long 64K-token
+      context window. A good default for everyday chat and longer documents.
     repo: RedHatAI/Qwen3.6-35B-A3B-NVFP4
     size_gb: 20
     mode: solo
@@ -61,6 +74,10 @@ models:
 
   qwen3-235b-fp8:
     display_name: "Qwen3 235B-A22B FP8 (legacy)"
+    description: >-
+      Earlier generation of the Qwen 235B family in native FP8 precision.
+      Runs across both Sparks. Mostly superseded by Qwen3-VL above; keep
+      around for text-only baseline comparisons.
     repo: Qwen/Qwen3-235B-A22B-FP8
     size_gb: 220
     mode: cluster
@@ -74,6 +91,9 @@ models:
 
   qwen25-72b:
     display_name: "Qwen2.5 72B (legacy)"
+    description: >-
+      Last-generation 72B dense model. Cluster mode required due to size.
+      Kept for compatibility and baseline comparison against newer Qwens.
     repo: Qwen/Qwen2.5-72B-Instruct
     size_gb: 145
     mode: cluster
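
The descriptions added above contain apostrophes ("Qwen's", "Google's"), and a future entry could contain `<`, `>`, or `&`, which is the case the commit's escapeHtml() guards against. The UI helper itself is not shown in this diff, so the following is only a sketch of the same idea using Python's standard `html.escape`; the `render_description` function and the `model-description` class name are hypothetical.

```python
from __future__ import annotations

from html import escape


def render_description(description: str | None) -> str:
    """Return an HTML fragment for a model description, or "" if there is none.

    Hypothetical helper: it mirrors the idea of the commit's escapeHtml(),
    not its actual implementation.
    """
    if not description:
        return ""
    # quote=True also escapes " and ', so the text stays safe even if it is
    # later placed inside an attribute value.
    safe = escape(description, quote=True)
    return '<p class="model-description">' + safe + "</p>"
```

Escaping at render time, rather than when models.yaml is loaded, keeps the stored description as plain text, so the same value can be reused in non-HTML contexts such as logs or a CLI listing.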