spark-control

Author	SHA1	Message	Date
Keysat	7e0759846f	v0.27.0:0 - in-app settings gear + swap-lock route fix Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks" action (now just the 4 required fields) and into a dashboard ⚙ Settings gear, backed by a /data/app_settings.json overlay keyed by env-var names. One shared mutable Settings instance + Settings.reload() applies edits live without a restart; existing installs' values migrate automatically on first boot. Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the /api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its url/secret reload live too.	2026-06-18 13:41:28 -05:00
Keysat	df9f244eae	v0.26.0:0 - disk-driven model menu (scan sparks; recipes; needs-setup) The dashboard menu is now the set of models actually downloaded on the Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as launch recipes matched to an on-disk model by repo; an on-disk model with no recipe is flagged needs_setup and its launch settings are inferred from its config.json for a one-time operator confirmation (discovery.py). - delete now removes weights AND the menu card (delete_from_disk sweeps all hosts; the delete endpoint resolves keys via the live menu) - new GET /api/models/suggest; /api/models returns the menu + a recipes list (download autocomplete); GET /api/models/disk-status removed - dropped the two legacy Qwen recipes (235B FP8, 2.5 72B) - tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge)	2026-06-18 11:09:56 -05:00
Keysat	7ae6ab3ba8	v0.25.0:0 - cluster coordination layer (swap lock + webhook + schedule registry) GPU-arbiter safety layer for when automation, not just the dashboard, swaps models: - swap reservation lock (POST/GET/DELETE /api/swap/lock); 423-enforced in post_swap via a single-read gate, TTL-bounded, secret-token auth, human force-release override + dashboard banner - swap webhook (swap_complete/swap_failed) fired outside the swap lock, optional HMAC signature, configurable URL+secret - read-only schedule registry (GET/POST/DELETE /api/schedule) + dashboard panel New module image/app/coordination.py; docs/COORDINATION.md for consumers; 22 offline tests in test_coordination.py.	2026-06-18 07:07:08 -05:00
Keysat	26070eb191	v0.24.0:0 - configurable cluster topology (vllm container name, hide services, second-vllm monitor) Make the cluster topology configurable so an adopter wired differently (vLLM on both Sparks, port 8000, different container name, no Parakeet) can monitor without forking. Covers the OpenClaw report P4/P5/#6. - VLLM_CONTAINER override (default vllm_node), validated at the boundary and quote_arg-quoted into the swap log-tail + pre-flight validator exec. - DISABLED_SERVICES list: hidden services show no tile and are skipped by status/deep-health/connectivity probes (kills the Parakeet-on-8000 collision). - kind: vllm custom service monitors a second Spark's vLLM via the shared probe_vllm_endpoint; /api/endpoints gains a disabled flag. Swap mechanism intentionally not generalized to raw docker run (that's coordination, roadmap item 4).	2026-06-17 23:03:33 -05:00
Keysat	e783653ef0	v0.23.0:0 - local / fine-tuned model support Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes), not just Hugging Face repos. - ModelDef gains local_path; a model must set exactly one of repo / local_path. The validator also enforces the local-path whitelist and that any --chat-template lives inside local_path (only that dir is mounted). - build_launch_command bind-mounts the dir into the vLLM container at the SAME host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook, then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream expands that var unquoted; contract noted in runbook.md). - shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'. - POST /api/models validates the full entry via ModelDef before persisting, so a bad entry can't be written and then break catalog load; _merge_overrides skips an invalid override entry instead of failing the whole catalog. - disk.py size-probes a local path with du; disk-delete refused for local models. - UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF link, delete button hidden for local models. - Tests: local launch + injection round-trip, chat-template location, traversal, exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent pass; findings addressed.	2026-06-17 22:27:41 -05:00
Keysat	39f8410623	v0.21.0:0 - matrix-bridge bot tile (status, update, restart, logs)	2026-06-15 22:57:40 -05:00
Keysat	e87158c492	v0.20.0:0 - per-spark ssh-key copy + wireguard status badge	2026-06-15 09:53:40 -05:00
Keysat	1c4e861783	v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf Triaged from a full independent evaluation (EVALUATION.md). Addresses the three P0/P1 code findings; the proxy/data APIs that downstream apps consume are deliberately untouched. - ssh command injection (P0): new shellsafe.py validates + shlex.quotes every user-supplied value crossing into an SSH command on the Sparks (model repo, vllm args/knobs, NIM image/container/volume/port/env, service names). Boundary validation on POST /api/models and POST /api/nim/install; quoting at every sink in models/download/nim/services. NGC key now quoted too. - qdrant path injection (P1): /api/search validates the collection name against a metacharacter-free whitelist and URL-encodes the path segment. - csrf (P1): csrf_guard middleware enforces same-origin on state-changing control endpoints; /v1/, /scrub, /rehydrate, /api/search, /api/audio/ and /api/health-event are exempt so external consumers are unaffected. Verified: injection survives only as a single quoted token, vLLM preflight shlex.split round-trip intact, CSRF behaviors covered via TestClient, both offline redaction suites still pass, tsc clean, s9pk rebuilt.	2026-06-12 16:36:33 -05:00
Keysat	8d839e3714	v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API - Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests) - Add embeddings proxy and spark_embed service (Dockerfile + main.py) - Expand audio_proxy with speaker-aware handling; deep_health/health/server updates - Package: configureSparks action + sparkConfig model updates, manifest/main wiring - Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh	2026-06-11 17:45:57 -05:00
Keysat	4a75274db3	v0.13.0:3 - proxy /v1/chat/completions through Spark Control to vLLM Recap Relay dev caught that all audio endpoints route through Spark Control but chat-completions didn't — clients had to know about both SC AND the direct vLLM URL on Spark 1. Closes that last gap. New endpoints: POST /v1/chat/completions — OpenAI-shape, forwards to vLLM on Spark 1 POST /v1/completions — legacy OpenAI completions, same path Implementation (image/app/llm_proxy.py): - Dumb forwarder: request body passed through verbatim, response body streamed back chunk-by-chunk. No transformation. vLLM already speaks the same shape; adding any logic here would just create skew. - Streaming: parses body for `stream: true` and uses httpx.AsyncClient .stream() + FastAPI StreamingResponse if so. Non-streaming path is a simple post-and-return. - 30-minute timeout to accommodate large-context completions (default httpx 5s would kill anything substantial). - On upstream non-200 in streaming mode: emits one SSE `error` event so the client's parser doesn't hang on an empty stream forever. - On upstream connection error: HTTP 502 with "vllm unreachable" detail. Now clients can use ONE host for everything: POST https://spark-control/api/audio/diarize-chunk POST https://spark-control/v1/audio/transcriptions POST https://spark-control/v1/chat/completions GET https://spark-control/api/endpoints (still works for clients that prefer the direct URLs) No parakeet container changes. No Reapply patches needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 19:58:19 -05:00
Keysat	95524f4983	v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we never got a working docker build. The fundamental constraint isn't patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch 2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere. WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't satisfy without rebuilding torchaudio against torch 2.10's alpha API. Walking away cleanly is better than another night of chasing. Removed from the codebase: - image/whisperx_container/* (Dockerfile + requirements + app/main.py) - image/app/whisperx_install.py (install manager + SSH ship-context logic) - image/Dockerfile COPY whisperx_container - WHISPERX_* config keys in config.py - whisperx service entry in services.py - WhisperX-preferred branch in audio_proxy.py - /api/whisperx/* endpoints in server.py - install banner + progress dialog in index.html - render + handlers in app.js - .whisperx-install styles in style.css Spark 2 cleaned in tandem (user-authorized): container removed, ~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of builder cache reclaimed. parakeet-asr and magpie-tts unaffected and healthy throughout. The audio path is back to exactly what shipped in v0.11.0:3: POST /api/audio/transcribe-with-speakers → Parakeet (transcription) + Sortformer (diarization) in parallel → merged by timestamp into speaker-labeled blocks v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour was meant to address: 1. memory cap on the parakeet-asr container so a long-audio crash can't swap-thrash Spark 2 again 2. a chunking proxy in /api/audio/transcribe-with-speakers that splits inputs >10 min before Sortformer Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 08:03:19 -05:00
Keysat	5a0bfba6a3	v0.12.0:0 - WhisperX as a one-click dashboard install + managed service Replaces the manual rsync+build+run with a proper spark-control feature. The first audio-path feature that doesn't require shell access on Spark 2. What's in the box ───────────────── * image/whisperx_container/ - the build context (Dockerfile, requirements, app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT + pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint /v1/audio/transcribe-with-speakers returns the exact same shape spark-control's existing endpoint does, so the recap-relay PR spec needs no changes when we cut over. * image/app/whisperx_install.py - install manager. Ships the build context to Spark 2 over SSH, runs `docker build`, runs `docker run` with a 40 GB memory cap (vs Sortformer's unbounded container, which thrashed Spark 2 on a 90-min file), polls /health until both Whisper + pyannote report loaded. * Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX when its /health reports diarizer_loaded=true, falls back to the legacy Parakeet + Sortformer path otherwise. Same response shape either way. Clean cutover, easy rollback (`docker rm whisperx-asr`). * Dashboard (Audio / Speech tab): - "Add WhisperX" banner appears when not installed, with a primary "Install WhisperX" button. One click triggers the install. - Build progress dialog with phase + elapsed timer + live build log via SSE (`/api/whisperx/install/{job_id}/stream`). - After install, WhisperX auto-registers as a managed service alongside Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart). - Banner self-hides once /api/whisperx/status reports healthy. New endpoints ───────────── GET /api/whisperx/status POST /api/whisperx/install GET /api/whisperx/install/{job_id} GET /api/whisperx/install/{job_id}/stream (SSE phase + log) Config additions (env) ────────────────────── WHISPERX_HOST (defaults to spark2_host) WHISPERX_USER (defaults to spark2_user) WHISPERX_CONTAINER (default: whisperx-asr) WHISPERX_PORT (default: 8002) WHISPERX_MODEL (default: medium; tiny/base/small/medium/large-v3) Dockerfile ────────── Added COPY whisperx_container /app/whisperx_container so the runtime install manager can read the build context from inside the spark-control image and ship it over SSH. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:02:26 -05:00
Keysat	391117f705	v0.11.0:0 - Speech model patches panel (lifecycle for v0.10.0 overlays) Folds the image/parakeet_patches/apply.sh script into a one-click dashboard action and adds drift detection so you can see at a glance whether the parakeet-asr container has the latest Sortformer overlays that spark-control ships. Backend: * image/app/speech_models.py - SpeechModelsManager: reads /health from Parakeet, sha256s the local overlay files inside spark-control's Docker image (/app/parakeet_patches), sha256s the same files inside the parakeet-asr container via `docker exec ... sha256sum`, surfaces in_sync / drift / missing status per file. * GET /api/speech-models - status payload * POST /api/speech-models/reapply - copies overlays into container, verifies python syntax, restarts, polls /health for ~120s, returns step-by-step result * POST /api/speech-models/restart - plain `docker restart parakeet-asr` Dockerfile: now COPY parakeet_patches into the image at /app/parakeet_patches so the runtime can read them. Future spark-control releases auto-carry newer overlay versions; the panel surfaces drift after upgrade. Frontend: new "Speech model patches" section on the dashboard with * Status pill (in sync / drift / missing) * Per-file SHA comparison (local vs container) * Loaded-models pills (ASR + diarizer) * Reapply + Restart buttons (both with confirmation modals) * Live progress display during reapply with per-step ✓/✗ Verified post-install against the running cluster: GET /api/speech-models shows both files in_sync (SHAs match) and both models loaded ready on Spark 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:58:13 -05:00
Keysat	197655a62b	v0.9.0:2 - audio proxy: turn Parakeet wedge 500 into clean 503 + immediate auto-restart Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention) fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the proxy relayed the upstream 500 verbatim, Open WebUI showed "Server connection error" with no signal to retry, and recovery took up to 5 minutes (waiting for the next periodic deep-health probe). Now the proxy: 1. Detects 500 from /v1/audio/transcriptions 2. Fires deep_health.run_one("parakeet") as a background asyncio task (which contains the same wedge-detect + rate-limited auto-restart logic, but runs immediately instead of waiting for the next tick) 3. Returns 503 with a clear detail message and Retry-After: 60 The client (Open WebUI, Home Assistant, etc.) gets a proper retry signal; the auto-restart triggers inside seconds; the next attempt ~60s later succeeds. Rate-limiting (3 restarts per 30 min) is inherited from the deep-health module so this can't cause restart storms. server.py: pass deep_health into build_audio_router(). audio_proxy.py: new 503-with-restart branch; signature now accepts deep_health as an optional dependency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 18:07:35 -05:00
Keysat	f44e7f8b03	v0.9.0:0 - OpenAI-compatible audio proxy for Open WebUI / Home Assistant Adds three new endpoints to spark-control that translate OpenAI's audio API shapes to the Parakeet (STT) and Magpie (TTS, NVIDIA Riva) services on the Sparks: GET /v1/models — STT model + Magpie's 60+ voices POST /v1/audio/speech — OpenAI body -> Magpie multipart synthesize (returns audio/wav passthrough) POST /v1/audio/transcriptions — relay to Parakeet (already compatible) Verified shapes against the live services: - Parakeet returns OpenAI-style {"text": "..."} or verbose_json with segments+words. Already a perfect drop-in for OpenAI clients. - Magpie returns raw WAV bytes with Content-Type: audio/wav. NOT base64-wrapped JSON as one might assume. The proxy is literally a body-translation on the request side; response is passthrough. Voice language is auto-derived from the voice name (e.g. Magpie-Multilingual.EN-US.Mia -> language=en-US) so clients don't need to set it explicitly. Open WebUI / Home Assistant / Recap Relay can now all point at one URL — https://<spark-control>.local/v1 — and get LLM, STT, TTS behind a single identity. No shim service to deploy. Pure addition: no existing routes touched; the dashboard, /api/*, download flow, deep-health, hardware probes are all unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 16:41:48 -05:00
Keysat	9ff7ee9c1e	v0.8.1:0 - delete model weights from disk via card trash icon Each model card now shows whether its weights are present on disk (with GB size) or not yet downloaded. When present and the model isn't currently loaded, a trash icon appears; clicking it pops a confirmation showing exactly how many GB will be freed and on which Spark(s), then runs rm -rf on the HF cache directory via SSH. Cluster-mode models are removed from both Sparks; solo-mode from Spark 1 only. Safety rails: refuses to delete the currently-loaded model, refuses during an in-flight swap or download, and the catalog entry stays intact so it can be re-downloaded anytime. Backend: - new image/app/disk.py: probe_disk + delete_from_disk over SSH - GET /api/models/disk-status — parallel probe across all catalog models - DELETE /api/models/{key}/disk — guarded rm -rf, logs to connectivity events Frontend: - on-disk / not-downloaded pills on every card - trash icon-btn in card-actions row (hidden when not on disk) - confirmation dialog showing per-host bytes-to-free - disk-status re-checked every 60s Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:07:20 -05:00
Grant	000c55febe	v0.8.0 - Deep health probes + auto-restart on CUDA wedge deep_health.py: - Synthetic probes per service, all payloads generated in-memory (BytesIO), never written to disk: - Parakeet: 1s of digital silence via in-memory WAV → POST /v1/audio/transcriptions - Magpie: short 'hi' text → POST /v1/audio/synthesize (multipart form-data, real TTS API endpoint discovered via openapi.json) - vLLM: 1-token completion against currently-loaded model - Background loop runs every 5 minutes (configurable). Best-effort: exceptions in the loop never kill it. - Auto-restart on wedge-pattern errors (cudaErrorUnknown / CUFFT_INTERNAL_ERROR / 500 / Engine core init failed): docker restart of the affected container. - Rate-limited: max 3 restarts per service per 30 min. - Cooldown: 120 s between consecutive restarts on the same service. - 60 s startup grace before any auto-restart can fire after the app boots. - Probe failures + recoveries logged via record_report(source='deep-health') into the connectivity history alongside the polling-based transitions. API: - GET /api/deep-health: per-service last result + auto-restart counters - POST /api/deep-health/{service}/run: manual trigger now UI: - Service cards show 'Deep check ok/FAILED <time> <latency>' inline, plus a ↻ button to run-now - Auto-restart count in 30-min window surfaced on the card when > 0 - Inline error excerpt shown for failed probes Bug fix: server.py app startup hook was placed before the FastAPI app object was constructed (would crash on import). Moved after.	2026-05-12 14:41:01 -05:00
Grant	6434b01a95	v0.7.0 - Pre-flight launch validation (Test button on every model card) validate.py: - Builds the same args list a real swap would pass to 'vllm serve' - SSHes into Spark 1 and runs vLLM's own argparse layer inside the running vllm_node container, WITHOUT initializing the engine - Uses FlexibleArgumentParser (from vllm.utils.argparse_utils, with fallback to engine.arg_utils) + make_arg_parser — the exact same parser the 'vllm serve' CLI uses. Earlier attempt with bare argparse.ArgumentParser was too strict (rejected '--moe_backend' with underscore that the real CLI accepts via FlexibleArgumentParser's normalization) - Returns structured {ok, stage, error, cmd_args, launch_cmd} so the UI can surface the exact failure cause Endpoint: POST /api/swap/{key}/validate. Cheap (~5s), no engine init, no disruption to the currently-loaded model. Frontend: 'Test' button on every model card, inline result below the action row (green check or red detailed error). Result stays visible until the user reloads or clicks Test again. Catches: typos in flag names, deprecated/removed flags after a vLLM upgrade, type mismatches. Does NOT catch runtime-only failures (Mamba block-size assertion, OOM at load, kernel-compat). Ok=true is necessary-but-not-sufficient; ok=false is definitive 'don't bother running it'.	2026-05-12 13:37:37 -05:00
Grant	ee8c2406b8	v0.6.0 - Service-level connectivity tracking + passive failure-report endpoint connectivity.py: - Generalized 'spark' subject to any string; renamed 'spark' field to 'subject' - Legacy v0.5 events with the old 'spark' field are migrated transparently on read (kind defaults to 'transition') - New record_report(subject, ok, source, detail, latency_ms): always appends an event with kind='report'; does NOT mutate the current state (only active polling is authoritative) - summary() returns events normalized to the new schema Wiring: - /api/status now calls record_state for vllm/parakeet/magpie (dedup on no-change) - /api/services calls record_state for each service after its http check - Result: dashboard observes service-level transitions automatically with no extra polling Passive endpoint: - POST /api/health-event with {service, ok, source?, error?, ms?} - Useful for external apps (e.g. Open WebUI) to surface sub-poll-interval failures the dashboard would otherwise miss UI: - Connectivity dialog groups events by subject (hosts ordered first, then services) - Per-subject summary shows transition count, down count, report count, failed-report count - Transitions and reports render inline with distinct styling; reports show source app + error + latency - Legacy v0.5 events render unchanged Docs: - README documents /api/health-event with a curl example Package: bump to 0.6.0:0	2026-05-12 13:19:27 -05:00
Grant	a02f4db850	v0.5.0 - Wake-on-LAN + connectivity history wol.py: - build_magic_packet(): standard 6x0xFF + 16x MAC layout - send_local_broadcast(): direct from container (ports 9 + 7 for safety) - send_via_peer(): preferred path; SSHes to the OTHER Spark and runs a Python one-liner there so the packet originates on the target's LAN segment (most reliable) - MAC validation + normalization connectivity.py: - /data/connectivity.json persistence (thread-safe, atomic rename) - Stores per-Spark current state + last_change timestamp + rolling 200-event log - Records up/down transitions; computes down_seconds / up_seconds durations - MAC cache populated lazily during hardware probes hardware.py: - Probe now reads MAC via /sys/class/net/<default-route-iface>/address - After each probe, record_state() emits a transition event if state changed - record_mac() caches the address so WoL works when the Spark next goes down Endpoints: - GET /api/connectivity: macs, current state, last_change, events[] - POST /api/spark/{name}/wake: tries via-peer first, falls back to direct broadcast UI: - Unreachable hardware card shows the cached MAC + 'Wake (WoL)' button (only if MAC known) - New 'Connectivity log' button opens a modal with per-Spark transition history (last 25 each), including duration of each prior up/down period - pollHardware also pulls /api/connectivity so WoL buttons appear without an extra fetch Package: bump 0.5.0:0; main.ts sets CONNECTIVITY_LOG=/data/connectivity.json	2026-05-12 12:51:49 -05:00
Grant	1889ab45fb	v0.4.0 - NIM installer + dashboard resilience Hotfix (was v0.3.1): - services.py: cache 'unreachable' per (host,user) for 25s so a dead Spark doesn't hang every /api/services call behind 6s ssh timeout - ssh_run timeout reduced 10 -> 6s for docker_state probes - hardware probe: shorter SSH timeout (6s), longer cache TTL for failures (25s) - JS pollStatus retries loadModels() if state.models is empty (recovers from cold-start proxy timeout) - Unreachable hardware card now includes troubleshooting steps (Spark Control cannot SSH into an unreachable Spark to restart it) v0.4 NIM installer: - nim.py module: curated SUGGESTED_NIMS list (Parakeet, Magpie, Riva) + NimManager that runs docker login nvcr.io + docker pull + docker run -d --gpus all -p PORT:PORT -v VOL:/opt/nim/.cache -e NGC_API_KEY -e ... --restart=unless-stopped + chown the volume to uid 1000 + restart. Streams all output via SSE; redacts the API key from log lines. - custom_services.py: persists installed NIMs to /data/services-overrides.yaml so they appear in the services panel after install - services.py: merges custom services into the panel - /api/nim/catalog GET, /api/nim/install POST + GET/SSE - /api/services/{name} DELETE for custom services - UI: '+ Install NIM' button next to 'Always-on services'; modal lists curated images each with a 'Pick' button + a custom-image form; installation runs in a second dialog with phase + elapsed timer + collapsible log - NGC API key field added to Configure Sparks (masked); injected as NGC_API_KEY env var into the container Package: bump 0.4.0:0; main.ts adds SERVICES_OVERRIDES + NGC_API_KEY env vars	2026-05-12 12:32:29 -05:00
Grant	64ce0fca10	v0.3.0 - Hardware dashboard + knob context + Explain context + Open WebUI link Hardware dashboard: - New hardware.py module: SSH probes each Spark for hostname, uptime, load+cores, RAM, disk, GPU (name, util, temp, power) + per-process GPU memory sum - DGX Spark uses unified memory (nvidia-smi memory.total returns N/A); fall back to per-process compute memory and compute fraction against system RAM. Marks with gpu_unified_memory=true. - 4s TTL cache in HardwareProbe to avoid hammering - /api/hardware returns per-Spark snapshot - UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk) — bars with warn threshold styling - Polls every 8s Knob context (tied to live hardware): - Each Advanced knob now shows plain-English help text - 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM - 'Max context' shows '~N pages of text' - Toggles show tradeoff descriptions Explain context: - '✨ Explain context' button on the update banner - /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE - Renders into an expandable 'Explained by the loaded LLM' section under Pending commits - Reasoning tokens shown italicized when the model emits them Open WebUI integration: - New 'Open WebUI URL' optional field in Configure Sparks - /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set Downloads: - Third radio option: Spark 1 only / Spark 2 only / Both Sparks - Backend picks SSH target based on mode - HF repo link icon next to the input - Helper line about NVFP4 for Blackwell Model cards: - Repo name is now a clickable link to its Hugging Face page Package: bump 0.3.0:0	2026-05-12 12:00:15 -05:00
Grant	75fd0846b4	v0.2.3 - Per-model Advanced settings + catalog-add for downloaded models Backend: - overrides.py: read/write /data/models-overrides.yaml (knobs + custom entries) - apply_knobs_to_args(): strip matching flags from bundled vllm_args and append knob values, so knob changes properly override bundled defaults - extract_knobs_from_args(): seed UI knob values from bundled args so the Advanced dialog has correct starting state - models.py: load_catalog merges overrides on top of bundled yaml - GET /api/models returns effective_knobs per model - PUT /api/models/{key}/knobs persists knob changes - POST /api/models adds a custom catalog entry - DELETE /api/models/{key} removes a custom entry (bundled models cannot be deleted) - swap_manager.reload_catalog() called after each mutation so swaps see latest Frontend: - New 'Advanced' button on every card opens a modal dialog: max-model-len input, gpu-memory-utilization slider, three optimization checkboxes (fastsafetensors, prefix caching, FP8 KV cache). Save persists; Cancel discards. Custom models also have a Delete button. - After a successful download, automatically open the 'Add to catalog' dialog pre-filled with the repo, with the same knob defaults — user just enters key, display name, and clicks Save. - Custom catalog entries are tagged with a blue 'custom' pill on the card. Package: bump 0.2.3:0; main.ts sets MODELS_OVERRIDES=/data/models-overrides.yaml so overrides persist on the StartOS volume.	2026-05-12 11:30:47 -05:00
Grant	474417b458	v0.2.2 - spark-vllm-docker update checks + Apply Update Backend: - updates.py: get_update_status() runs git fetch + git rev-list --left-right --count HEAD...origin/main to learn ahead/behind/dirty, plus git log for pending commits - UpdateManager class with asyncio.Lock; one update at a time - POST /api/updates/apply triggers "git pull --ff-only && ./build-and-copy.sh -c" over SSH with streamed log + phase detection (Pulling / Building the vLLM container / Copying to peer Sparks) - GET /api/updates returns {ok, behind, ahead, dirty, current, log[], branch} Frontend: - Persistent banner near footer: hidden when up-to-date, blue when N commits behind, warn (orange) when local dirty changes block update - 'Show details' expands a list of pending commits - 'Apply update' triggers the long-running build with phase + elapsed timer + collapsible logs - Confirmation dialog explains the 5–40 min duration Package: bump 0.2.2:0	2026-05-12 11:26:55 -05:00
Grant	9dde938348	v0.2.1 - Model download with % progress Backend: - download.py module: drives ./hf-download.sh <repo> [-c --copy-parallel] over SSH, parses tqdm output (regex matches '8%|...| 2.06G/25.1G [03:20<18:35, 20.6MB/s]') into percent + bytes done/total + elapsed + ETA + rate - DownloadManager: in-memory job tracking with asyncio.Lock (one download at a time) - POST /api/download, GET /api/download/{id}, SSE /api/download/{id}/stream - Phase detection: Connecting / Fetching N files / Downloading / Copying to peer Sparks / Done Frontend: - '+ Download a new model' button next to LLM swap section title - Inline form: HF repo text field + solo/cluster radio + Cancel/Start - Progress UI: spinner, elapsed timer, phase label, percent fill, stats line (bytes/rate/ETA), collapsible raw logs Package: bump 0.2.1:0	2026-05-12 11:24:31 -05:00
Grant	27699a2469	v0.2.0 - Always-on services panel with per-service host config Dashboard: - New 'Always-on services' section with cards for Parakeet and Magpie - Each card: host:port, model loaded, status pill (Healthy/Unhealthy/Starting/Not configured) - Start, Restart, Stop buttons. Buttons disabled when not applicable for current state - Restart counter shown when > 1 (would have surfaced the old magpie crash loop) Backend: - New /api/services GET: docker container state + http health for each support service - New POST /api/services/{name}/{action} for start | stop | restart - services.py module: docker_state, run_action via SSH - config.py: PARAKEET_HOST/USER/CONTAINER and MAGPIE_* env vars, default to spark2_* - health.py: use per-service hosts (no longer hard-wired to spark2_host) Package: - sparkConfig.yaml.ts: add 6 new optional fields - configureSparks action: optional 'Parakeet host', 'Parakeet container', 'Magpie host', 'Magpie container' fields; descriptions explain they default to Spark 2 when blank - Handler normalizes nulls to empty strings before merge - main.ts: pass new env vars to container - bump to 0.2.0:0	2026-05-12 11:21:15 -05:00
Grant	2ba3da55b1	0.1.0:3 - Show Public Key layout + /api/endpoints service-discovery - showPublicKey now uses result.group: install command and raw key are each their own one-click copy box; description is brief - /api/endpoints returns stable shape { vllm, parakeet, magpie } with base_url + model + ready, for other LAN services to consume without hardcoding Spark IPs - health.py: parakeet/magpie now also expose base_url - README: documented /api/endpoints shape	2026-05-12 10:52:57 -05:00
Grant	ae8efa1754	Initial scaffold: image/ FastAPI app, models.yaml, docs - image/ FastAPI app: /api/status, /api/swap, /api/swap/{id}/stream, /api/test-connection - models.yaml: 5-model catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen25-72b) - README, runbook, known-issues - Dry-run swap verified against live Spark 1 (gemma4 currently loaded)	2026-05-12 09:29:13 -05:00

28 Commits
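The v0.27.0:0 commit describes one shared mutable `Settings` instance backed by a `/data/app_settings.json` overlay keyed by env-var names, where `Settings.reload()` applies edits without a restart. The sketch below shows that pattern under stated assumptions: the precedence order (defaults, then environment, then overlay), the `DEFAULTS` keys, and the injectable `load_overlay` callable are all hypothetical, not the project's actual code.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, Mapping, Optional

OVERLAY_PATH = Path("/data/app_settings.json")

# Hypothetical subset of optional knobs, keyed by env-var name.
DEFAULTS: Dict[str, str] = {
    "VLLM_PORT": "8000",
    "WEBHOOK_URL": "",
    "WEBHOOK_SECRET": "",
}


def read_overlay_file(path: Path = OVERLAY_PATH) -> Dict[str, object]:
    """Return the overlay dict, or {} before the first in-app save."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return {}


class Settings:
    """reload() swaps new values into this same object, so every holder sees them."""

    def __init__(
        self,
        environ: Optional[Mapping[str, str]] = None,
        load_overlay: Callable[[], Dict[str, object]] = read_overlay_file,
    ) -> None:
        self._environ = os.environ if environ is None else environ
        self._load_overlay = load_overlay
        self._values: Dict[str, str] = {}
        self.reload()

    def reload(self) -> None:
        values = dict(DEFAULTS)
        # Assumed precedence: environment overrides built-in defaults...
        for key in DEFAULTS:
            if key in self._environ:
                values[key] = self._environ[key]
        # ...and the in-app overlay overrides both; unknown keys are ignored.
        for key, value in self._load_overlay().items():
            if key in DEFAULTS:
                values[key] = str(value)
        self._values = values

    def get(self, key: str) -> str:
        return self._values[key]
```

Because callers hold a reference to the one instance rather than copying values at import time, a save handler only needs to write the overlay and call `reload()`.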
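The v0.25.0:0 commit adds a swap webhook (`swap_complete`/`swap_failed`) with an optional HMAC signature. A minimal sketch of both sides follows, assuming HMAC-SHA256 over the exact body bytes; the header name `X-Spark-Signature` and the `sha256=` prefix are hypothetical, and docs/COORDINATION.md is the authority on the real format.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional, Tuple

SIGNATURE_HEADER = "X-Spark-Signature"  # hypothetical header name


def sign_body(body: bytes, secret: str) -> str:
    """HMAC-SHA256 over the exact request bytes, hex-encoded."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def build_webhook(event: str, data: dict, secret: Optional[str]) -> Tuple[bytes, Dict[str, str]]:
    """Serialize once, then sign those bytes, so the receiver verifies what was sent."""
    body = json.dumps({"event": event, **data}, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if secret:  # signing is optional when no secret is configured
        headers[SIGNATURE_HEADER] = sign_body(body, secret)
    return body, headers


def verify(body: bytes, secret: str, header_value: str) -> bool:
    """Consumer side: constant-time compare against a locally computed signature."""
    return hmac.compare_digest(sign_body(body, secret), header_value)
```

A consumer should verify against the raw request body before parsing JSON; re-serializing a parsed payload can change whitespace or key order and break the signature.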
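The same commit describes a TTL-bounded swap reservation with secret-token auth, a single-read gate in `post_swap` that answers 423 when blocked, and a human force-release override. The sketch below models only the in-memory state machine; the `SwapLock` and `Reservation` names, the 16-byte token, and the injectable clock are assumptions, and HTTP wiring is left out.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reservation:
    holder: str
    token: str
    expires_at: float


class SwapLock:
    """TTL-bounded reservation; a swap without the holder's token should get HTTP 423."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._res: Optional[Reservation] = None

    def current(self) -> Optional[Reservation]:
        # Expired reservations are dropped lazily, so a crashed holder cannot block swaps forever.
        if self._res is not None and self._clock() >= self._res.expires_at:
            self._res = None
        return self._res

    def acquire(self, holder: str, ttl_s: float) -> Optional[str]:
        if self.current() is not None:
            return None  # already reserved
        token = secrets.token_urlsafe(16)
        self._res = Reservation(holder, token, self._clock() + ttl_s)
        return token

    def swap_allowed(self, token: Optional[str]) -> bool:
        res = self.current()  # read once, then decide on that snapshot
        if res is None:
            return True
        return token is not None and secrets.compare_digest(token, res.token)

    def release(self, token: str) -> bool:
        res = self.current()
        if res is not None and secrets.compare_digest(token, res.token):
            self._res = None
            return True
        return False

    def force_release(self) -> None:
        """Human override, e.g. from the dashboard banner."""
        self._res = None
```

In a request handler, `swap_allowed()` returning `False` would map to a 423 Locked response.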
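The v0.23.0:0 commit names `shellsafe.validate_local_path` (absolute path, charset whitelist, no `.`/`..`) and notes that the upstream launch script expands `VLLM_SPARK_EXTRA_DOCKER_ARGS` unquoted. The sketch below assumes a specific whitelist and a hypothetical `extra_docker_args` helper. One consequence worth making explicit: quotes inside an unquoted expansion are not removed by the shell, so the whitelist, not `shlex.quote`, is what keeps the mount spec a single safe word.

```python
import re

# Assumed whitelist: letters, digits, '_', '.', '-', '/'; no whitespace, glob, or shell metacharacters.
_PATH_RE = re.compile(r"/[A-Za-z0-9_./-]*")


def validate_local_path(path: str) -> str:
    if not path.startswith("/"):
        raise ValueError("local_path must be absolute")
    if not _PATH_RE.fullmatch(path):
        raise ValueError("local_path contains disallowed characters")
    if any(seg in (".", "..") for seg in path.split("/")):
        raise ValueError("local_path must not contain '.' or '..' segments")
    return path


def extra_docker_args(path: str) -> str:
    """Hypothetical value for the VLLM_SPARK_EXTRA_DOCKER_ARGS hook.

    The variable is word-split on expansion and embedded quotes would be passed
    literally, so the path is validated rather than quoted. The mount uses the
    same path on host and container, as the commit describes.
    """
    p = validate_local_path(path)
    return f"-v {p}:{p}"
```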
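The v0.19.0:0 commit pairs boundary validation (for example on `POST /api/models`) with `shlex` quoting at every SSH sink, and checks that an injection attempt survives only as a single quoted token. A minimal sketch of both layers follows; the repo-id regex is an assumption about Hugging Face naming, and `validate_repo`/`remote_command` are hypothetical names. `shlex.join` (Python 3.8+) applies `shlex.quote` to each argument.

```python
import re
import shlex

# Assumed shape of a Hugging Face repo id: "<org>/<name>" using [A-Za-z0-9._-].
_REPO_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_repo(repo: str) -> str:
    """Boundary check: reject at the API edge, before anything is persisted."""
    if not _REPO_RE.fullmatch(repo):
        raise ValueError(f"invalid model repo: {repo!r}")
    return repo


def remote_command(*argv: str) -> str:
    """Sink-side quoting: each argument becomes exactly one word in the remote shell."""
    return shlex.join(argv)
```

The two layers are deliberately redundant: even if a value slipped past `validate_repo`, `shlex.split(remote_command(...))` round-trips to the original argument list, so shell metacharacters never execute.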
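The same commit adds a `csrf_guard` middleware that enforces same-origin on state-changing control endpoints and exempts `/v1/`, `/scrub`, `/rehydrate`, `/api/search`, `/api/audio/` and `/api/health-event`. The decision logic can be sketched as a pure predicate. Comparing the `Origin` (or, failing that, `Referer`) host against the `Host` header is an assumption, as is refusing a state change that carries neither header.

```python
from typing import Optional
from urllib.parse import urlsplit

# Exempt prefixes as listed in the v0.19.0:0 commit message.
EXEMPT_PREFIXES = ("/v1/", "/scrub", "/rehydrate", "/api/search", "/api/audio/", "/api/health-event")
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def csrf_allowed(method: str, path: str, host: str,
                 origin: Optional[str], referer: Optional[str]) -> bool:
    if method.upper() in SAFE_METHODS:
        return True
    if path.startswith(EXEMPT_PREFIXES):
        return True  # external API consumers keep working unchanged
    source = origin or referer
    if not source:
        return False  # assumption: refuse state changes with neither header
    return urlsplit(source).netloc.lower() == host.lower()
```

In FastAPI this predicate would sit inside an HTTP middleware that returns 403 when it is `False`; that wiring is omitted to keep the block dependency-free.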
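The v0.13.0:3 commit keeps `llm_proxy.py` a pass-through forwarder. It peeks at the body only to see whether `stream: true` is set, and on an upstream non-200 in streaming mode it emits one SSE `error` event so client parsers don't hang. The two helpers below sketch those steps. The JSON shape inside the error event is an assumption.

```python
import json


def wants_stream(body: bytes) -> bool:
    """Decide the streaming path; the body itself is still forwarded verbatim."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(payload, dict) and payload.get("stream") is True


def sse_error_event(status: int, detail: str) -> bytes:
    """One terminal SSE event, so a client's stream parser does not wait forever."""
    data = json.dumps({"error": {"code": status, "message": detail}})
    return f"event: error\ndata: {data}\n\n".encode("utf-8")
```

The check is strict (`is True`), so a string `"true"` falls back to the non-streaming path, which matches how vLLM's OpenAI server would treat a non-boolean rather than guessing.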
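The v0.11.0:0 commit's `SpeechModelsManager` compares SHA-256 digests of the overlay files baked into the spark-control image against `docker exec ... sha256sum` output from the parakeet-asr container, reporting `in_sync`, `drift` or `missing` per file. A minimal sketch of the parsing and comparison step follows, with hypothetical function names. A file absent from the container produces no `sha256sum` line (only a stderr message), so it maps to `missing`.

```python
import hashlib
from typing import Dict, Optional


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_sha256sum(output: str) -> Dict[str, str]:
    """Parse `sha256sum` lines: '<hex>  <path>' (text mode) or '<hex> *<path>' (binary mode)."""
    result: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        path = rest[1:] if rest[:1] in (" ", "*") else rest
        result[path] = digest
    return result


def file_status(local_sha: str, container_sha: Optional[str]) -> str:
    if container_sha is None:
        return "missing"
    return "in_sync" if local_sha == container_sha else "drift"
```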
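The v0.9.0:0 commit derives the Magpie request language from the voice name (`Magpie-Multilingual.EN-US.Mia` becomes `language=en-US`). A minimal sketch, assuming the `<model>.<LANG-REGION>.<name>` layout shown in that example; falling back to `en-US` for names without a locale segment is an assumption.

```python
def language_from_voice(voice: str, default: str = "en-US") -> str:
    """'Magpie-Multilingual.EN-US.Mia' -> 'en-US'; otherwise return `default`."""
    parts = voice.split(".")
    if len(parts) >= 3:
        lang, sep, region = parts[1].partition("-")
        if sep and lang.isalpha() and region.isalpha():
            # BCP 47 convention: lowercase language, uppercase region.
            return f"{lang.lower()}-{region.upper()}"
    return default
```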
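The v0.8.0 commit bounds auto-restarts: at most 3 per service per 30 min, a 120 s cooldown between restarts, and a 60 s startup grace. The v0.9.0:2 audio-proxy change inherits the same limits by calling `deep_health.run_one("parakeet")`. The limiter below is a sliding-window sketch with hypothetical class and method names; the wedge patterns are copied from the v0.8.0 message.

```python
from collections import deque
from typing import Deque, Dict

WEDGE_PATTERNS = ("cudaErrorUnknown", "CUFFT_INTERNAL_ERROR", "Engine core init failed")


def looks_wedged(status_code: int, error_text: str) -> bool:
    return status_code == 500 or any(p in error_text for p in WEDGE_PATTERNS)


class RestartLimiter:
    def __init__(self, started_at: float, max_restarts: int = 3, window_s: float = 1800,
                 cooldown_s: float = 120, grace_s: float = 60) -> None:
        self.started_at = started_at
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.grace_s = grace_s
        self._history: Dict[str, Deque[float]] = {}

    def may_restart(self, service: str, now: float) -> bool:
        if now - self.started_at < self.grace_s:
            return False  # startup grace after the app boots
        hist = self._history.setdefault(service, deque())
        while hist and now - hist[0] >= self.window_s:
            hist.popleft()  # slide the 30-minute window
        if len(hist) >= self.max_restarts:
            return False
        if hist and now - hist[-1] < self.cooldown_s:
            return False  # too soon after the previous restart
        return True

    def record_restart(self, service: str, now: float) -> None:
        self._history.setdefault(service, deque()).append(now)
```

Keeping one limiter per process (shared by the periodic loop and the proxy-triggered probe) is what prevents restart storms when both paths detect the same wedge.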
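The v0.5.0 commit lists `build_magic_packet()` (6 × 0xFF followed by the MAC 16 times), MAC validation and normalization, and a direct broadcast on ports 9 and 7. A minimal sketch follows; `normalize_mac` and the `send_local_broadcast` signature are hypothetical, and the preferred `send_via_peer` path (sending from the other Spark over SSH) is not shown.

```python
import re
import socket
from typing import Iterable

_HEX12 = re.compile(r"[0-9a-f]{12}")


def normalize_mac(mac: str) -> str:
    """'AA:BB:CC:DD:EE:FF', 'aa-bb-cc-dd-ee-ff' or 'aabb.ccdd.eeff' -> 'aa:bb:cc:dd:ee:ff'."""
    hex_only = re.sub(r"[:\-.]", "", mac.strip()).lower()
    if not _HEX12.fullmatch(hex_only):
        raise ValueError(f"invalid MAC address: {mac!r}")
    return ":".join(hex_only[i:i + 2] for i in range(0, 12, 2))


def build_magic_packet(mac: str) -> bytes:
    """6 x 0xFF, then the 6-byte MAC repeated 16 times: 102 bytes total."""
    mac_bytes = bytes.fromhex(normalize_mac(mac).replace(":", ""))
    return b"\xff" * 6 + mac_bytes * 16


def send_local_broadcast(packet: bytes, ports: Iterable[int] = (9, 7),
                         address: str = "255.255.255.255") -> None:
    """Send the packet as a UDP broadcast on each port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for port in ports:
            sock.sendto(packet, (address, port))
```

A broadcast from inside a container may not reach the target's LAN segment, which is why the commit prefers originating the packet on the peer Spark.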
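The v0.2.3 commit's `apply_knobs_to_args()` strips flags that a knob controls from the bundled `vllm_args` and then appends the knob values, so knobs override bundled defaults. The sketch below handles both the `--flag value` and `--flag=value` forms. The knob-name-to-flag maps are hypothetical, although `--max-model-len`, `--gpu-memory-utilization` and `--enable-prefix-caching` are real `vllm serve` options.

```python
from typing import Dict, List, Union

# Hypothetical knob -> flag maps.
VALUE_KNOBS: Dict[str, str] = {
    "max_model_len": "--max-model-len",
    "gpu_memory_utilization": "--gpu-memory-utilization",
}
BOOL_KNOBS: Dict[str, str] = {
    "prefix_caching": "--enable-prefix-caching",
}


def apply_knobs_to_args(args: List[str], knobs: Dict[str, Union[int, float, bool]]) -> List[str]:
    """Drop bundled flags that a knob controls, then append the knob's value."""
    value_flags = {VALUE_KNOBS[k] for k in knobs if k in VALUE_KNOBS}
    bool_flags = {BOOL_KNOBS[k] for k in knobs if k in BOOL_KNOBS}
    out: List[str] = []
    skip_next = False
    for arg in args:
        if skip_next:  # the value that followed a stripped "--flag value" pair
            skip_next = False
            continue
        name = arg.split("=", 1)[0]
        if name in value_flags:
            skip_next = "=" not in arg  # "--flag=value" carries its own value
            continue
        if name in bool_flags:
            continue
        out.append(arg)
    for knob, value in knobs.items():
        if knob in VALUE_KNOBS:
            out += [VALUE_KNOBS[knob], str(value)]
        elif knob in BOOL_KNOBS and value:
            out.append(BOOL_KNOBS[knob])
    return out
```

A knob set to `False` removes the bundled flag without re-adding it, which is how a toggle can switch off something the bundled recipe enabled.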
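The v0.2.2 commit learns ahead/behind/dirty via `git fetch` plus `git rev-list --left-right --count HEAD...origin/main`. That command prints two tab-separated counts: commits reachable only from the left side (`HEAD`, i.e. ahead) and only from the right (`origin/main`, i.e. behind). A small parsing sketch follows. Using `git status --porcelain` for the dirty check is an assumption, since the commit does not say how dirtiness is detected.

```python
from typing import Tuple


def parse_left_right_count(output: str) -> Tuple[int, int]:
    """Parse '<ahead>\\t<behind>' from `git rev-list --left-right --count HEAD...origin/main`."""
    ahead_s, behind_s = output.split()
    return int(ahead_s), int(behind_s)


def is_dirty(porcelain_output: str) -> bool:
    """Assumed check: `git status --porcelain` prints nothing for a clean working tree."""
    return bool(porcelain_output.strip())
```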
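The v0.2.1 commit parses tqdm output such as `8%|...| 2.06G/25.1G [03:20<18:35, 20.6MB/s]` into percent, bytes done/total, elapsed, ETA and rate. The regex below is one way to do it, assuming `hf-download.sh` emits standard tqdm bars with scaled units. The byte counts are kept as tqdm-formatted strings because the unit divisor (1000 vs 1024) depends on how the bar was configured.

```python
import re
from typing import Dict, Optional

# Matches e.g. "8%|▊         | 2.06G/25.1G [03:20<18:35, 20.6MB/s]"
TQDM_RE = re.compile(
    r"(?P<percent>\d{1,3})%\|[^|]*\|\s*"
    r"(?P<done>[\d.]+[kKMGT]?)/(?P<total>[\d.]+[kKMGT]?)\s*"
    r"\[(?P<elapsed>[\d:]+)<(?P<eta>[\d:?]+),\s*(?P<rate>[^\]]+)\]"
)


def parse_tqdm_line(line: str) -> Optional[Dict[str, object]]:
    m = TQDM_RE.search(line)
    if m is None:
        return None  # not a progress line (phase text, warnings, ...)
    return {
        "percent": int(m["percent"]),
        "done": m["done"],
        "total": m["total"],
        "elapsed": m["elapsed"],
        "eta": m["eta"],
        "rate": m["rate"].strip(),
    }
```

tqdm redraws its bar with carriage returns rather than newlines, so the SSH output stream should be split on both (for example `re.split(r"[\r\n]+", chunk)`) before each piece is passed to `parse_tqdm_line`.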