64ce0fca10
Hardware dashboard:
- New hardware.py module: probes each Spark over SSH for hostname, uptime, load + cores, RAM, disk, GPU (name, util, temp, power), and the sum of per-process GPU memory
- DGX Spark uses unified memory (nvidia-smi memory.total returns N/A), so fall back to summing per-process compute memory and computing the fraction against system RAM. Such snapshots are marked with gpu_unified_memory=true.
- 4s TTL cache in HardwareProbe to avoid hammering the Sparks with repeated SSH probes
- /api/hardware returns per-Spark snapshot
- UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk), drawn as bars with warn-threshold styling
- UI polls every 8s
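
The unified-memory fallback and the TTL cache described above can be sketched as follows. This is a minimal illustration, not the code in hardware.py (which is Python and is not shown in this commit view): the names `unifiedGpuMemory` and `TtlCache`, the MiB units, and the result shape are all assumptions.

```typescript
// Hypothetical result shape; only the gpu_unified_memory flag is named in the commit message.
interface GpuMemory {
  usedMiB: number
  totalMiB: number
  fraction: number
  gpu_unified_memory: boolean
}

// When memory.total is N/A, sum per-process compute memory and
// express it as a fraction of system RAM instead.
function unifiedGpuMemory(perProcessMiB: number[], systemRamMiB: number): GpuMemory {
  const usedMiB = perProcessMiB.reduce((sum, m) => sum + m, 0)
  return {
    usedMiB,
    totalMiB: systemRamMiB,
    fraction: systemRamMiB > 0 ? usedMiB / systemRamMiB : 0,
    gpu_unified_memory: true,
  }
}

// Time-to-live cache: re-run the expensive probe only when the cached
// value is at least ttlMs old. The clock is injectable for testing.
class TtlCache<T> {
  private value: T | undefined = undefined
  private fetchedAt = Number.NEGATIVE_INFINITY
  private readonly ttlMs: number
  private readonly probe: () => T
  private readonly now: () => number

  constructor(ttlMs: number, probe: () => T, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs
    this.probe = probe
    this.now = now
  }

  get(): T {
    const t = this.now()
    if (t - this.fetchedAt >= this.ttlMs) {
      this.value = this.probe()
      this.fetchedAt = t
    }
    return this.value as T
  }
}

// Usage with a fake clock: three reads, two probes.
let fakeClockMs = 0
let probeCalls = 0
const hardwareCache = new TtlCache(4000, () => ++probeCalls, () => fakeClockMs)
hardwareCache.get() // first read probes
fakeClockMs = 3999
hardwareCache.get() // still inside the 4s window, served from cache
fakeClockMs = 4000
hardwareCache.get() // window expired, probes again

const exampleGpuMemory = unifiedGpuMemory([1024, 3072], 8192)
```

A cache of this kind mainly helps when several browser tabs or clients poll at once, since they can then share one SSH round trip per TTL window.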
Knob context (tied to live hardware):
- Each Advanced knob now shows plain-English help text
- 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM
- 'Max context' shows '~N pages of text'
- Toggles show tradeoff descriptions
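
The 'GPU memory %' hint can be illustrated with a small helper. This is a sketch under assumptions: the function name, the GB rounding, and the 128 GB example figure are illustrative and do not come from this commit.

```typescript
// Hypothetical helper: turn a GPU memory fraction (0..1) and the Spark's
// total RAM in GB into the help text shown next to the knob.
function describeGpuMemoryKnob(totalRamGb: number, fraction: number): string {
  const allocatedGb = totalRamGb * fraction
  const leftGb = totalRamGb - allocatedGb
  return `~${Math.round(allocatedGb)} GB allocated · ~${Math.round(leftGb)} GB left for OS/buffers`
}

// Assumed example: a Spark reporting 128 GB of RAM with the knob at 75%.
const knobHint = describeGpuMemoryKnob(128, 0.75)
// knobHint === '~96 GB allocated · ~32 GB left for OS/buffers'
```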
Explain context:
- '✨ Explain context' button on the update banner
- POST /api/explain-updates: forwards the pending commits to the loaded vLLM model and streams its response back as SSE
- Renders into an expandable 'Explained by the loaded LLM' section under Pending commits
- Reasoning tokens are shown in italics when the model emits them
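
On the client side, an SSE response body arrives in arbitrary chunks, so the UI has to buffer text until a full event (terminated by a blank line) is available. The following is a minimal sketch of that step. The function name is hypothetical, the payload carried in each `data:` field is not specified by this commit, and only `\n` line endings are handled.

```typescript
// Hypothetical helper: pull every complete SSE event out of a text buffer
// and return the unconsumed tail so the caller can append the next chunk.
function drainSseEvents(buffer: string): { events: string[]; rest: string } {
  const events: string[] = []
  let rest = buffer
  let boundary = rest.indexOf('\n\n')
  while (boundary !== -1) {
    const frame = rest.slice(0, boundary)
    rest = rest.slice(boundary + 2)
    // Join all "data:" lines of the frame, dropping one optional leading space each.
    const data = frame
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, ''))
      .join('\n')
    if (data !== '') events.push(data)
    boundary = rest.indexOf('\n\n')
  }
  return { events, rest }
}

// Usage: the second event is incomplete, so it stays in `rest`.
const drained = drainSseEvents('data: Hello\n\ndata: wor')
// drained.events is ['Hello'] and drained.rest is 'data: wor'
```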
Open WebUI integration:
- New 'Open WebUI URL' optional field in Configure Sparks
- /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set
Downloads:
- Third radio option: Spark 1 only / Spark 2 only / Both Sparks
- Backend picks SSH target based on mode
- HF repo link icon next to the input
- Helper line about NVFP4 for Blackwell
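
Picking the SSH target from the download mode might look like the sketch below. The mode strings, type names, and example hosts are assumptions; only the `spark1_*` and `spark2_*` config keys appear in the source file.

```typescript
// Hypothetical mode values for the three radio options.
type DownloadMode = 'spark1' | 'spark2' | 'both'

interface SshTarget {
  host: string
  user: string
}

// Subset of the Spark config keys used by main.ts.
interface SparkHosts {
  spark1_host: string
  spark1_user: string
  spark2_host: string
  spark2_user: string
}

// Map the selected mode to the list of SSH targets to download on.
function targetsForMode(mode: DownloadMode, cfg: SparkHosts): SshTarget[] {
  const spark1: SshTarget = { host: cfg.spark1_host, user: cfg.spark1_user }
  const spark2: SshTarget = { host: cfg.spark2_host, user: cfg.spark2_user }
  switch (mode) {
    case 'spark1':
      return [spark1]
    case 'spark2':
      return [spark2]
    case 'both':
      return [spark1, spark2]
  }
}

// Placeholder hosts for illustration only.
const exampleHosts: SparkHosts = {
  spark1_host: 'spark-1.local',
  spark1_user: 'admin',
  spark2_host: 'spark-2.local',
  spark2_user: 'admin',
}
const bothTargets = targetsForMode('both', exampleHosts)
```

Returning a list for every mode keeps the caller uniform: it loops over the targets regardless of how many were selected.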
Model cards:
- Repo name is now a clickable link to its Hugging Face page
Package: bump to 0.3.0:0
66 lines
2.0 KiB
TypeScript
import { i18n } from './i18n'
import { sdk } from './sdk'
import { uiPort } from './utils'
import { sparkConfigYaml } from './fileModels/sparkConfig.yaml'

export const main = sdk.setupMain(async ({ effects }) => {
  console.info(i18n('Starting Spark Control…'))

  // Reactively read SSH targets from the user-configured yaml file.
  // Changing this file via the "Configure Sparks" action restarts the daemon.
  const cfg = (await sparkConfigYaml.read().const(effects)) ?? {
    spark1_host: '',
    spark1_user: '',
    spark2_host: '',
    spark2_user: '',
    parakeet_host: '',
    parakeet_user: '',
    parakeet_container: '',
    magpie_host: '',
    magpie_user: '',
    magpie_container: '',
    open_webui_url: '',
  }

  return sdk.Daemons.of(effects).addDaemon('primary', {
    subcontainer: await sdk.SubContainer.of(
      effects,
      { imageId: 'spark-control' },
      sdk.Mounts.of().mountVolume({
        volumeId: 'main',
        subpath: null,
        mountpoint: '/data',
        readonly: false,
      }),
      'spark-control-sub',
    ),
    exec: {
      command: ['/app/entrypoint.sh'],
      env: {
        SPARK1_HOST: cfg.spark1_host,
        SPARK1_USER: cfg.spark1_user,
        SPARK2_HOST: cfg.spark2_host,
        SPARK2_USER: cfg.spark2_user,
        PARAKEET_HOST: cfg.parakeet_host,
        PARAKEET_USER: cfg.parakeet_user,
        PARAKEET_CONTAINER: cfg.parakeet_container,
        MAGPIE_HOST: cfg.magpie_host,
        MAGPIE_USER: cfg.magpie_user,
        MAGPIE_CONTAINER: cfg.magpie_container,
        MODELS_OVERRIDES: '/data/models-overrides.yaml',
        OPEN_WEBUI_URL: cfg.open_webui_url,
        BIND_PORT: String(uiPort),
      },
    },
    ready: {
      display: i18n('Web Interface'),
      fn: () =>
        sdk.healthCheck.checkPortListening(effects, uiPort, {
          successMessage: i18n('The web interface is ready'),
          errorMessage: i18n('The web interface is not ready'),
        }),
    },
    requires: [],
  })
})