64ce0fca10
Hardware dashboard:
- New hardware.py module: probes each Spark over SSH for hostname, uptime, load + core count, RAM, disk, GPU (name, util, temp, power) and the per-process GPU memory sum
- DGX Spark uses unified memory (nvidia-smi memory.total returns N/A), so fall back to summing per-process compute memory and computing the fraction against system RAM; those snapshots are marked gpu_unified_memory=true
- 4s TTL cache in HardwareProbe to avoid hammering the Sparks with SSH probes
- /api/hardware returns per-Spark snapshot
- UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk), shown as bars with warn-threshold styling
- UI polls every 8s
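A rough TypeScript sketch of two ideas above: the unified-memory fallback and a short-lived cache in front of the SSH probes. The real implementation lives in hardware.py; `gpuMemoryView`, `TtlCache`, and all field names here are hypothetical, and this cache shares one result (including an in-flight one) per TTL window.

```typescript
// Hypothetical probe shapes; field names are illustrative, not hardware.py's schema.
interface GpuProcess {
  pid: number
  usedMiB: number
}

interface GpuMemoryReading {
  totalMiB: number | null // null when nvidia-smi reports "N/A"
  usedMiB: number | null
  processes: GpuProcess[] // per-process compute memory
  systemRamMiB: number
}

interface GpuMemoryView {
  usedMiB: number
  totalMiB: number
  fraction: number
  unified: boolean // corresponds to gpu_unified_memory=true
}

// Use nvidia-smi's totals when present; otherwise sum per-process memory and
// measure it against system RAM, which the GPU shares on unified memory.
function gpuMemoryView(r: GpuMemoryReading): GpuMemoryView {
  if (r.totalMiB !== null && r.usedMiB !== null && r.totalMiB > 0) {
    return { usedMiB: r.usedMiB, totalMiB: r.totalMiB, fraction: r.usedMiB / r.totalMiB, unified: false }
  }
  const used = r.processes.reduce((sum, p) => sum + p.usedMiB, 0)
  const total = r.systemRamMiB
  return { usedMiB: used, totalMiB: total, fraction: total > 0 ? used / total : 0, unified: true }
}

// Keeps one in-flight or completed result for ttlMs, so callers arriving
// within the window share it instead of opening new SSH sessions.
class TtlCache<T> {
  private pending: Promise<T> | null = null
  private fetchedAt = 0
  private readonly ttlMs: number
  private readonly now: () => number

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  get(load: () => Promise<T>): Promise<T> {
    const t = this.now()
    if (this.pending !== null && t - this.fetchedAt < this.ttlMs) {
      return this.pending
    }
    const p: Promise<T> = load().catch((err: unknown) => {
      // Drop a failed result so the next call retries instead of replaying the error.
      if (this.pending === p) this.pending = null
      throw err
    })
    this.pending = p
    this.fetchedAt = t
    return p
  }
}
```

With a 4 s TTL (`new TtlCache(4000)`) and an 8 s poll interval, a single browser tab gets a fresh probe on nearly every poll; the cache mainly absorbs bursts such as several tabs polling at once.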
Knob context (tied to live hardware):
- Each Advanced knob now shows plain-English help text
- 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM
- 'Max context' shows '~N pages of text'
- Toggles show tradeoff descriptions
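The knob help strings could be derived along these lines. This is a sketch: `describeGpuMemory` and `describeMaxContext` are hypothetical names, the knob is assumed to store a 0 to 100 percentage, and the tokens-per-page figure is a rough rule of thumb rather than anything this commit specifies.

```typescript
// Assumption: ~0.75 words per token and ~375 words per page, i.e. ~500 tokens per page.
const TOKENS_PER_PAGE = 500

// Help text for the "GPU memory %" knob, computed from the Spark's total RAM
// because the GPU draws from the same unified pool.
function describeGpuMemory(gpuMemoryPercent: number, totalRamGb: number): string {
  const pct = Math.min(100, Math.max(0, gpuMemoryPercent)) // clamp out-of-range input
  const allocatedGb = (totalRamGb * pct) / 100
  const leftGb = totalRamGb - allocatedGb
  return `~${Math.round(allocatedGb)} GB allocated · ~${Math.round(leftGb)} GB left for OS/buffers`
}

// Help text for the "Max context" knob, from a token count.
function describeMaxContext(maxTokens: number): string {
  const pages = Math.max(1, Math.round(maxTokens / TOKENS_PER_PAGE))
  return `~${pages} pages of text`
}
```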
Explain context:
- '✨ Explain context' button on the update banner
- /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE
- Renders into an expandable 'Explained by the loaded LLM' section under Pending commits
- Reasoning tokens shown italicized when the model emits them
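A minimal sketch of the parsing side of the streaming, assuming the backend calls vLLM's OpenAI-compatible `/v1/chat/completions` endpoint with `"stream": true`. The chunk shape, and in particular the `reasoning_content` / `reasoning` delta fields used for reasoning tokens, are assumptions about what the server emits; `splitSseEvents` and `parseSseEvent` are hypothetical helpers. The parser treats each `data:` line as one complete JSON chunk, as OpenAI-style streams do, rather than implementing general multi-line SSE events.

```typescript
// Assumed shape of one streamed chunk from an OpenAI-compatible server.
interface ChatChunk {
  choices?: {
    delta?: {
      content?: string | null
      reasoning_content?: string | null
      reasoning?: string | null
    }
  }[]
}

interface StreamPiece {
  content: string // answer text
  reasoning: string // reasoning tokens, rendered in italics
  done: boolean // saw the "[DONE]" sentinel
}

// Splits buffered stream text on blank lines; the trailing partial event is
// returned as `rest` to prepend to the next network read.
function splitSseEvents(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split('\n\n')
  const rest = parts.pop() ?? ''
  return { events: parts.filter((p) => p.trim() !== ''), rest }
}

// Extracts answer and reasoning text from one event's `data:` lines.
function parseSseEvent(event: string): StreamPiece {
  const piece: StreamPiece = { content: '', reasoning: '', done: false }
  for (const line of event.split('\n')) {
    if (!line.startsWith('data:')) continue // ignore comments, event:, id:
    const data = line.slice('data:'.length).trim()
    if (data === '') continue
    if (data === '[DONE]') {
      piece.done = true
      continue
    }
    const delta = (JSON.parse(data) as ChatChunk).choices?.[0]?.delta
    if (!delta) continue
    piece.content += delta.content ?? ''
    piece.reasoning += delta.reasoning_content ?? delta.reasoning ?? ''
  }
  return piece
}
```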
Open WebUI integration:
- New 'Open WebUI URL' optional field in Configure Sparks
- /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set
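Because the Configure Sparks action stores blank optional fields as empty strings, the top-bar button only needs a presence and sanity check. A sketch, with `openWebUiLink` as a hypothetical helper that hides the button for blank or non-HTTP(S) values:

```typescript
// Returns a normalized URL for the "Open chat" button, or null to hide it.
function openWebUiLink(configured: string | null | undefined): string | null {
  const raw = (configured ?? '').trim()
  if (raw === '') return null // blank means "not set"
  try {
    const url = new URL(raw)
    // Only allow web links; rejects e.g. "javascript:" URLs.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return null
    return url.toString()
  } catch {
    return null // not a parseable absolute URL
  }
}
```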
Downloads:
- Radio gains a third option: Spark 1 only / Spark 2 only / Both Sparks
- Backend picks the SSH target(s) based on the selected mode
- HF repo link icon next to the input
- Helper line about NVFP4 for Blackwell
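One way the backend could map the radio selection to SSH targets, reusing the `spark1_host`/`spark1_user`/`spark2_host`/`spark2_user` keys from the Configure Sparks form. The mode identifiers and `sshTargetsFor` are hypothetical:

```typescript
// Hypothetical identifiers for the three radio options.
type DownloadMode = 'spark1' | 'spark2' | 'both'

interface SparkHosts {
  spark1_host: string
  spark1_user: string
  spark2_host: string
  spark2_user: string
}

// Returns the user@host targets the download should run on.
function sshTargetsFor(mode: DownloadMode, cfg: SparkHosts): string[] {
  const spark1 = `${cfg.spark1_user}@${cfg.spark1_host}`
  const spark2 = `${cfg.spark2_user}@${cfg.spark2_host}`
  switch (mode) {
    case 'spark1':
      return [spark1]
    case 'spark2':
      return [spark2]
    case 'both':
      return [spark1, spark2]
    default: {
      // Compile-time exhaustiveness check: adding a mode without a case fails here.
      const unhandled: never = mode
      throw new Error(`unknown download mode: ${String(unhandled)}`)
    }
  }
}
```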
Model cards:
- Repo name is now a clickable link to its Hugging Face page
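Hugging Face model pages live at `https://huggingface.co/<owner>/<name>`, so both the repo link icon and the clickable repo name can be built from the repo id. A sketch; `hfRepoUrl` is hypothetical and its pattern is a loose check, not Hugging Face's exact naming rules:

```typescript
// Builds a model page URL from an "owner/name" repo id, or null if it doesn't look like one.
function hfRepoUrl(repoId: string): string | null {
  const id = repoId.trim()
  // Loose check: exactly one slash, word characters, dots and dashes only.
  if (!/^[A-Za-z0-9][\w.-]*\/[\w.-]+$/.test(id)) return null
  return `https://huggingface.co/${id}`
}
```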
Package: bump to 0.3.0:0
114 lines
3.7 KiB
TypeScript
import { sdk } from '../sdk'
import { sparkConfigYaml } from '../fileModels/sparkConfig.yaml'

const { InputSpec, Value } = sdk

const inputSpec = InputSpec.of({
  spark1_host: Value.text({
    name: 'Spark 1 hostname or IP',
    description:
      'The head node of your DGX Spark cluster: the one with ~/spark-vllm-docker cloned that runs the vLLM container. Enter its LAN IP (recommended) or hostname.',
    required: true,
    default: null,
    placeholder: 'e.g. 192.168.1.10',
    masked: false,
  }),
  spark1_user: Value.text({
    name: 'Spark 1 SSH user',
    description:
      'The user account on Spark 1 to SSH in as, i.e. whatever you log in as when you SSH into it manually.',
    required: true,
    default: null,
    placeholder: 'your SSH username',
    masked: false,
  }),
  spark2_host: Value.text({
    name: 'Spark 2 hostname or IP',
    description:
      'The worker node of your DGX Spark cluster (also runs always-on services like Parakeet/Magpie). Enter its LAN IP or hostname.',
    required: true,
    default: null,
    placeholder: 'e.g. 192.168.1.11',
    masked: false,
  }),
  spark2_user: Value.text({
    name: 'Spark 2 SSH user',
    description:
      'The user account on Spark 2 to SSH in as. Usually the same as Spark 1.',
    required: true,
    default: null,
    placeholder: 'your SSH username',
    masked: false,
  }),
  parakeet_host: Value.text({
    name: 'Parakeet host (optional)',
    description:
      'Override the host running the Parakeet STT container. Leave blank if Parakeet runs on Spark 2 (the default). Set this if you run Parakeet on Spark 1 or a different machine.',
    required: false,
    default: null,
    placeholder: 'leave blank to use Spark 2',
    masked: false,
  }),
  parakeet_container: Value.text({
    name: 'Parakeet container name (optional)',
    description:
      'Docker container name for Parakeet. Defaults to "parakeet-asr"; change it only if you named yours something else.',
    required: false,
    default: null,
    placeholder: 'parakeet-asr',
    masked: false,
  }),
  magpie_host: Value.text({
    name: 'Magpie host (optional)',
    description:
      'Override the host running the Magpie TTS container. Leave blank if Magpie runs on Spark 2.',
    required: false,
    default: null,
    placeholder: 'leave blank to use Spark 2',
    masked: false,
  }),
  magpie_container: Value.text({
    name: 'Magpie container name (optional)',
    description:
      'Docker container name for Magpie. Defaults to "magpie-tts".',
    required: false,
    default: null,
    placeholder: 'magpie-tts',
    masked: false,
  }),
  open_webui_url: Value.text({
    name: 'Open WebUI URL (optional)',
    description:
      'If you also run Open WebUI on your LAN, paste its URL here. Spark Control will then show a one-click "Open chat" button so you can jump straight to it.',
    required: false,
    default: null,
    placeholder: 'e.g. https://open-webui.yourserver.local',
    masked: false,
  }),
})

export const configureSparks = sdk.Action.withInput(
  'configure-sparks',
  async () => ({
    name: 'Configure Sparks',
    description:
      'Set the hostnames and SSH users for your two Spark nodes, plus optional Parakeet, Magpie, and Open WebUI settings.',
    warning: null,
    visibility: 'enabled',
    allowedStatuses: 'any',
    group: null,
  }),
  async () => inputSpec,
  async ({ effects }) => {
    const cfg = await sparkConfigYaml.read().once()
    return cfg ?? null
  },
  async ({ effects, input }) => {
    // Optional fields come through as `null`; coerce to empty string for the schema.
    const normalized = Object.fromEntries(
      Object.entries(input).map(([k, v]) => [k, v ?? '']),
    ) as Record<string, string>
    await sparkConfigYaml.merge(effects, normalized)
    return null
  },
)
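The optional host and container fields describe fallbacks (Spark 2, `parakeet-asr`, `magpie-tts`) that consumers of the config have to apply. A sketch of that resolution, assuming the stored config has the same keys as the form above with blank optionals saved as `''`; `SparkConfig` and `resolveServices` are hypothetical:

```typescript
// Hypothetical shape of the stored config: same keys as the form, all strings,
// with optional fields saved as '' when left blank.
interface SparkConfig {
  spark1_host: string
  spark1_user: string
  spark2_host: string
  spark2_user: string
  parakeet_host: string
  parakeet_container: string
  magpie_host: string
  magpie_container: string
  open_webui_url: string
}

interface ServiceTarget {
  host: string
  container: string
}

// A blank override falls back to the documented default.
const orDefault = (value: string, fallback: string): string =>
  value.trim() === '' ? fallback : value.trim()

function resolveServices(cfg: SparkConfig): { parakeet: ServiceTarget; magpie: ServiceTarget } {
  return {
    parakeet: {
      host: orDefault(cfg.parakeet_host, cfg.spark2_host),
      container: orDefault(cfg.parakeet_container, 'parakeet-asr'),
    },
    magpie: {
      host: orDefault(cfg.magpie_host, cfg.spark2_host),
      container: orDefault(cfg.magpie_container, 'magpie-tts'),
    },
  }
}
```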