spark-control

Author	SHA1	Message	Date
Keysat	9ef9226e0a	docs: split CLAUDE.md into path-scoped .claude/rules; fix dev/test commands - CLAUDE.md trimmed to whole-repo facts (58 lines); subsystem guidance moved to .claude/rules/{startos-package,fastapi-image,redaction,audio-speech}.md with paths: frontmatter so each loads only when matching files are touched - .gitignore: track .claude/rules/ while keeping the rest of .claude/ (settings.local.json) ignored - test-audio-with-speakers.sh: require audio-file arg in docs, replace owner-specific SPARK_CONTROL/VLLM defaults with generic ones (localhost dev server + Spark Control vLLM proxy), discover the loaded LLM via /api/status since /v1/models lists audio models only - document REDACTION_MAP_DB + CONNECTIVITY_LOG as required for local dev (/data only exists in the container) - prettier pass over startos/actions (formatting drift)	2026-06-11 19:12:23 -05:00
Keysat	8d839e3714	v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API - Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests) - Add embeddings proxy and spark_embed service (Dockerfile + main.py) - Expand audio_proxy with speaker-aware handling; deep_health/health/server updates - Package: configureSparks action + sparkConfig model updates, manifest/main wiring - Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh	2026-06-11 17:45:57 -05:00
Keysat	4a75274db3	v0.13.0:3 - proxy /v1/chat/completions through Spark Control to vLLM Recap Relay dev caught that all audio endpoints route through Spark Control but chat-completions didn't — clients had to know about both SC AND the direct vLLM URL on Spark 1. Closes that last gap. New endpoints: POST /v1/chat/completions — OpenAI-shape, forwards to vLLM on Spark 1 POST /v1/completions — legacy OpenAI completions, same path Implementation (image/app/llm_proxy.py): - Dumb forwarder: request body passed through verbatim, response body streamed back chunk-by-chunk. No transformation. vLLM already speaks the same shape; adding any logic here would just create skew. - Streaming: parses body for `stream: true` and uses httpx.AsyncClient .stream() + FastAPI StreamingResponse if so. Non-streaming path is a simple post-and-return. - 30-minute timeout to accommodate large-context completions (default httpx 5s would kill anything substantial). - On upstream non-200 in streaming mode: emits one SSE `error` event so the client's parser doesn't hang on an empty stream forever. - On upstream connection error: HTTP 502 with "vllm unreachable" detail. Now clients can use ONE host for everything: POST https://spark-control/api/audio/diarize-chunk POST https://spark-control/v1/audio/transcriptions POST https://spark-control/v1/chat/completions GET https://spark-control/api/endpoints (still works for clients that prefer the direct URLs) No parakeet container changes. No Reapply patches needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 19:58:19 -05:00
Keysat	c7f94381e7	v0.13.0:2 - per-segment confidence in diarize-chunk response Recap Relay dev asked: can the diarization output include a confidence level per segment so the UI can render "Speaker_0?" for uncertain assignments rather than confidently mislabeling? Answer: yes. Sortformer's diarize() with include_tensor_outputs=True returns the per-frame per-speaker sigmoid scores (shape [B, T, 4spk], ~12.6 fps frame rate). The current code argmaxes those into segment strings and throws the raw scores away. Now: for each output segment, compute mean probability of the assigned speaker across the segment's frames → confidence in [0, 1]. Implementation: - diarizer.py: diarize_chunk() now calls diarize() with include_tensor_outputs=True, and a new _attach_confidence() helper derives the per-segment mean probability after parsing the segment strings. The frame-rate is computed from tensor shape vs audio duration (no need to hard-code the model's stride). - All failure paths return confidence=None gracefully — Recap Relay can treat None as "no info" or fall back to a default threshold. Endpoint shape change: segments[] now have an optional `confidence` field in [0, 1] (or None). All other fields unchanged. Existing callers that ignore the field aren't affected. Verified with a 5s test signal that the tensor has shape [1, 63, 4] (63 frames / 5s = 12.6 fps) and values in [0, 1] (sigmoid outputs, independent per speaker so overlap detection works). Real speech values will be much higher than the near-zero values of the pure-tone test signal. Reapply patches on the Speech Models card after installing v0.13.0:2 to pick up the updated diarizer.py + main.py in the parakeet container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 12:36:25 -05:00
Keysat	e775906caa	v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints Spark Control now exposes a per-chunk worker designed for Recap Relay to orchestrate against. Recap Relay does the chunking + global speaker clustering (consistent with how it already handles the Gemini path); Spark Control handles the GPU-bound per-chunk work. Parakeet container: - diarizer.py: now also loads NVIDIA TitaNet speaker-verification model (~25 MB, NeMo-native, no torchaudio). New diarize_chunk() method runs Sortformer + extracts one 192-dim voice fingerprint per detected local speaker (concatenating each speaker's audio across the chunk and running TitaNet's get_embedding). - main.py: new POST /v1/audio/diarize-chunk endpoint that returns segments + speakers_detected + fingerprints + models in one shot. Spark Control: - new POST /api/audio/diarize-chunk that proxies to parakeet's new endpoint. Same CUDA-wedge recovery (503 + deep-health probe + 60s retry-after) as the other audio endpoints. Returns the raw JSON upstream because Recap Relay is the consumer; no merging needed. Response shape Recap Relay receives per chunk: { "duration": 300.0, "segments": [{"start_s","end_s","speaker"}, ...], # LOCAL labels "speakers_detected": ["Speaker_0","Speaker_1",...], "fingerprints": {"Speaker_0":[192 floats], ...}, "models": {"diarization":"...","embedding":"..."} } Recap Relay's job: 1. Chunk audio (existing chunking infrastructure) 2. POST each chunk to /api/audio/diarize-chunk in parallel 3. Collect all fingerprints from all chunks 4. sklearn AgglomerativeClustering(distance_threshold=0.7, metric=cosine) 5. Re-label segments with global cluster IDs 6. Concatenate transcripts (from a separate parallel call to /v1/audio/transcriptions) with timestamp offsets and merge with re-labeled diar segments After installing v0.13.0:1, click "Reapply patches" on the Speech Models card to push the updated diarizer.py + main.py into the parakeet container; TitaNet will download (~25 MB) on first call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 11:37:05 -05:00
Keysat	95524f4983	v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we never got a working docker build. The fundamental constraint isn't patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch 2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere. WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't satisfy without rebuilding torchaudio against torch 2.10's alpha API. Walking away cleanly is better than another night of chasing. Removed from the codebase: - image/whisperx_container/* (Dockerfile + requirements + app/main.py) - image/app/whisperx_install.py (install manager + SSH ship-context logic) - image/Dockerfile COPY whisperx_container - WHISPERX_* config keys in config.py - whisperx service entry in services.py - WhisperX-preferred branch in audio_proxy.py - /api/whisperx/* endpoints in server.py - install banner + progress dialog in index.html - render + handlers in app.js - .whisperx-install styles in style.css Spark 2 cleaned in tandem (user-authorized): container removed, ~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of builder cache reclaimed. parakeet-asr and magpie-tts unaffected and healthy throughout. The audio path is back to exactly what shipped in v0.11.0:3: POST /api/audio/transcribe-with-speakers → Parakeet (transcription) + Sortformer (diarization) in parallel → merged by timestamp into speaker-labeled blocks v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour was meant to address: 1. memory cap on the parakeet-asr container so a long-audio crash can't swap-thrash Spark 2 again 2. a chunking proxy in /api/audio/transcribe-with-speakers that splits inputs >10 min before Sortformer Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 08:03:19 -05:00
Keysat	a24610ad2a	v0.12.0:4 - hotfix: torchaudio build fails without --no-build-isolation Build was crashing inside torchaudio's setup.py with: ModuleNotFoundError: No module named 'torch' PIP_CONSTRAINT was correctly pinning torch/torchvision in the install target env, but pip's PEP 517 build isolation creates a SEPARATE fresh Python env just for the build wheel step — and that env has no torch in it. torchaudio's setup.py imports torch to discover CUDA flags, so it crashes. Pip even printed a deprecation warning that this isolation behavior is hardening, not relaxing. Fix: 1. Pre-install torchaudio's build deps (setuptools, wheel, ninja, pybind11) into the main env since we're disabling isolation. 2. Add --no-build-isolation to the torchaudio install so the build uses NGC's torch directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:53:43 -05:00
Keysat	09a1d3590d	v0.12.0:3 - hotfix: build torchaudio from source against NGC's torch NGC PyTorch (the only base with working torch on Spark's ARM64 + sm_120 Blackwell) doesn't ship torchaudio. Stock pip wheels are amd64-only AND ABI-incompatible with NGC's custom torch 2.10.0a anyway. Pip install just fails or crashes at runtime. Real fix: - apt install git cmake build-essential ninja-build - pip install git+https://github.com/pytorch/audio.git@v2.5.1 with TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0" (sm_120 for Blackwell GB10) - this compiles torchaudio against the torch already in the image, so ABI matches by construction Then constraints.txt locks torch + torchvision + torchaudio so the later `pip install whisperx` can't swap any of them. Cost: +3-5 min to the first install. Docker layer cache reuses the built torchaudio on every subsequent rebuild. Torchaudio v2.5.1 is the last tag that builds cleanly against torch 2.5-2.10 — main branch is too volatile against NGC's alpha torch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:40:50 -05:00
Keysat	98aeef8779	v0.12.0:2 - hotfix: pin NGC's torch versions so pip can't break the ABI WhisperX docker build was crashing at the model-prewarm step: OSError: undefined symbol: torch_library_impl Root cause: the NGC PyTorch base ships custom builds of torch + torchaudio + torchvision matched together for Blackwell (sm_120). When pip installed whisperx, it pulled the latest stock torchaudio wheel as a transitive dep, which was compiled against a different libtorch and won't load against NGC's. Fix: at build time, capture NGC's actual torch/torchaudio/torchvision versions into /tmp/torch-constraints.txt, then `pip install -c` that constraint for all subsequent installs. pip can't swap torch out, so the ABI stays consistent. whisperx and pyannote are happy with torch>=2.0 — NGC's 2.10.0a0 satisfies that easily. The pinned versions print to the build log so you can see them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:26:08 -05:00
Keysat	ce5aee1920	v0.12.0:1 - hotfix: WhisperX install fails on first scp because ~ doesn't expand inside shlex.quote() Symptom: "Failed to ship Dockerfile — bash: line 1: ~/whisperx-build/ Dockerfile: No such file or directory" Same bug pattern as v0.8.1:1 (disk probe). shlex.quote() wraps in single quotes, and the remote shell doesn't do tilde expansion inside single quotes — so it tries to write to a literal directory named "~". Fix: use $HOME in double-quoted shell context, which the remote shell expands correctly. The file names (Dockerfile, requirements.txt, etc.) are hardcoded so they're safe to embed unquoted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:16:44 -05:00
Keysat	5a0bfba6a3	v0.12.0:0 - WhisperX as a one-click dashboard install + managed service Replaces the manual rsync+build+run with a proper spark-control feature. First in the audio path that doesn't require shell access on Spark 2. What's in the box ───────────────── * image/whisperx_container/ - the build context (Dockerfile, requirements, app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT + pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint /v1/audio/transcribe-with-speakers returns the exact same shape spark-control's existing endpoint does, so the recap-relay PR spec needs no changes when we cut over. * image/app/whisperx_install.py - install manager. ships build context to Spark 2 over SSH, runs `docker build`, runs `docker run` with 40 GB memory cap (vs Sortformer's unbounded which thrashed Spark 2 on a 90-min file), polls /health until both Whisper + pyannote report loaded. * Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX when its /health reports diarizer_loaded=true, falls back to the legacy Parakeet + Sortformer path otherwise. Same response shape either way. Clean cutover, easy rollback (`docker rm whisperx-asr`). * Dashboard (Audio / Speech tab): - "Add WhisperX" banner appears when not installed, with a primary "Install WhisperX" button. One click triggers the install. - Build progress dialog with phase + elapsed timer + live build log via SSE (`/api/whisperx/install/{job_id}/stream`). - After install, WhisperX auto-registers as a managed service alongside Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart). - Banner self-hides once /api/whisperx/status reports healthy. New endpoints ───────────── GET /api/whisperx/status POST /api/whisperx/install GET /api/whisperx/install/{job_id} GET /api/whisperx/install/{job_id}/stream (SSE phase + log) Config additions (env) ────────────────────── WHISPERX_HOST (defaults to spark2_host) WHISPERX_USER (defaults to spark2_user) WHISPERX_CONTAINER (default: whisperx-asr) WHISPERX_PORT (default: 8002) WHISPERX_MODEL (default: medium; tiny/base/small/medium/large-v3) Dockerfile ────────── Added COPY whisperx_container /app/whisperx_container so the runtime install manager can read the build context from inside the spark-control image and ship it over SSH. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:02:26 -05:00
Keysat	cfc1c408d4	v0.11.0:3 - button sizing fix: unify base .btn to 12px / 6px 12px User feedback: every action button OUTSIDE the parakeet/magpie service cards looked too big. Specifically called out: "Reapply patches", "Restart container", "Switch to this", "Download". The ones on the service cards (Start/Restart/Stop) were the size he liked. Root cause: the base .btn used font: inherit, so it picked up 15px from body. .service-actions .btn was the only place with an explicit font-size: 12px + padding: 6px 12px override. Fix: change .btn base directly to font-size: 12px + padding: 6px 12px. Every button across the dashboard now matches the service-card button footprint. The existing per-context overrides become redundant but remain in place; they no longer create visible differences. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:54:46 -05:00
Keysat	3d273223f2	v0.11.0:2 - pill sizing fix: match .tag exactly to .status "Healthy" pill User feedback: every pill outside the Always-On Services cards was rendering visually taller than the "Healthy" status pill they liked. Root cause was the .tag additions in 0.11.0:1 (line-height: 1.5, display: inline-block) that didn't match the .status pill on service cards (which has neither). Dropped both additions, bumped font-size from 11px → 12px so .tag is now pixel-identical to .status: font-size: 12px; padding: 2px 8px; border-radius: 999px; background: var(--surface-2); border: 1px solid var(--border); Every pill on the dashboard (mode-cluster/mode-solo/cap/on-disk/not-on-disk/custom-pill/.tag.ok/.tag.warn/.tag.bad) now renders at the same footprint as the Healthy/Unhealthy/Starting pills on the service cards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:46:57 -05:00
Keysat	4aa6cf5046	v0.11.0:1 - dashboard polish: tabs, collapsible endpoint, pill consistency Three UX improvements, all client-side; no backend or behavior changes. 1. LLM / Audio tabs under the hardware section. The single long column got split into two tabbed views: * LLM -> model swap + download panel + spark-vllm-docker updates * Audio -> Parakeet/Magpie services + speech-model patches Selection persists in localStorage; default is LLM. The swap-panel (in-flight LLM swap) sits ABOVE the tab strip so it stays visible regardless of which tab is active. 2. Collapsible OpenAI-compatible Endpoint card. New chevron in the card header collapses everything except the title. State persists per browser via localStorage. Defaults to collapsed since you rarely need the URL/model details visible (and the same info is one tab swap away). 3. Unified pill sizing. The .sm-pill class in speech-models was rendering subtly larger than .tag pills on model cards. Dropped .sm-pill entirely and reused .tag with semantic color modifiers (.tag.ok / .tag.warn / .tag.bad). Same 11px / 2px×8px footprint everywhere now. Also added explicit line-height: 1.5 + display: inline-block to .tag to lock down vertical sizing. No new endpoints, no new dependencies. Tested locally with node --check and ast.parse(). Verified the tab DOM structure wraps the right sections and the speech-models panel still self-shows/hides on data load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:58:43 -05:00
Keysat	391117f705	v0.11.0:0 - Speech model patches panel (lifecycle for v0.10.0 overlays) Folds the image/parakeet_patches/apply.sh script into a one-click dashboard action and adds drift detection so you can see at a glance whether the parakeet-asr container has the latest Sortformer overlays that spark-control ships. Backend: * image/app/speech_models.py - SpeechModelsManager: reads /health from Parakeet, sha256s the local overlay files inside spark-control's Docker image (/app/parakeet_patches), sha256s the same files inside the parakeet-asr container via `docker exec ... sha256sum`, surfaces in_sync / drift / missing status per file. * GET /api/speech-models - status payload * POST /api/speech-models/reapply - copies overlays into container, verifies python syntax, restarts, polls /health for ~120s, returns step-by-step result * POST /api/speech-models/restart - plain `docker restart parakeet-asr` Dockerfile: now COPY parakeet_patches into the image at /app/parakeet_patches so the runtime can read them. Future spark-control releases auto-carry newer overlay versions; the panel surfaces drift after upgrade. Frontend: new "Speech model patches" section on the dashboard with * Status pill (in sync / drift / missing) * Per-file SHA comparison (local vs container) * Loaded-models pills (ASR + diarizer) * Reapply + Restart buttons (both with confirmation modals) * Live progress display during reapply with per-step ✓/✗ Verified post-install against the running cluster: GET /api/speech-models shows both files in_sync (SHAs match) and both models loaded ready on Spark 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:58:13 -05:00
Keysat	fda23088fe	v0.10.0:1 - hotfix: merge function now joins words with proper spacing Smoke testing v0.10.0:0 against a real anarlog audio.mp3 showed the output running words together: "I'mrecordingrightnow", "don'tyoutry". Root cause: _merge_words_with_speakers was doing "".join(cur_words), assuming Parakeet returns words with leading whitespace (which the hyprnote local Parakeet does, but the Spark-hosted Parakeet does not). Rewrote the join with a small helper that: - Strips each token (handles both leading-space and no-leading-space word formats) - Joins with a single space - Keeps punctuation tight — no space before period/comma/colon/etc. Verified post-install with the same test audio: [00:06] Speaker_0: I'm I'm recording right now. [00:18] Speaker_1: you're you're on your computer and your phone, right? No other changes — Parakeet container patches and the endpoint shape stay identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:42:04 -05:00
Keysat	713cd09cc2	v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers Adds a new pipeline for diarized transcription that any client (recap-relay, ad-hoc curl, future Mac-side tools) can call. Pure data pipeline, no LLM or UI included — name resolution / analysis happen downstream where prompts and rendering are configurable. Architecture: Spark 2 / parakeet-asr container: + /opt/parakeet/app/diarizer.py (new: SortformerDiarizer class) + /opt/parakeet/app/main.py (patched: loads diarizer, adds /v1/audio/diarize endpoint) Model: nvidia/diar_sortformer_4spk-v1 (~150 MB, ungated, NeMo native) Spark Control: + POST /api/audio/transcribe-with-speakers Body: multipart file Returns: { duration, language, speakers_detected, segments: [{start_ms, end_ms, speaker, text}, ...], models: {transcription, diarization} } Runs Parakeet ASR + Sortformer in parallel, merges words to speaker turns by timestamp, groups into speaker-change blocks (breaks also on >1.5s silence gaps). + If Parakeet 500s mid-pipeline, kicks deep-health probe and returns 503/Retry-After: 60 — same wedge-recovery pattern as v0.9.0:2. Apply Sortformer patches to the running Parakeet container with: bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user> Patches are reversible — apply.sh backs up the original main.py inside the container at main.py.pre-sortformer before overwriting. Restore by copying that file back and removing diarizer.py, then docker restart. v0.11 follow-up: dashboard "Speech Models" panel to swap/update model versions from the UI instead of needing to re-run apply.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:14:48 -05:00
Keysat	197655a62b	v0.9.0:2 - audio proxy: turn Parakeet wedge 500 into clean 503 + immediate auto-restart Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention) fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the proxy relayed the upstream 500 verbatim, Open WebUI showed "Server connection error" with no signal to retry, and recovery took up to 5 minutes (waiting for the next periodic deep-health probe). Now the proxy: 1. Detects 500 from /v1/audio/transcriptions 2. Fires deep_health.run_one("parakeet") as a background asyncio task (which contains the same wedge-detect + rate-limited auto-restart logic, but runs immediately instead of waiting for the next tick) 3. Returns 503 with a clear detail message and Retry-After: 60 The client (Open WebUI, Home Assistant, etc.) gets a proper retry signal; the auto-restart triggers inside seconds; the next attempt ~60s later succeeds. Rate-limiting (3 restarts per 30 min) is inherited from the deep-health module so this can't cause restart storms. server.py: pass deep_health into build_audio_router(). audio_proxy.py: new 503-with-restart branch; signature now accepts deep_health as an optional dependency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 18:07:35 -05:00
Keysat	b37d7e998b	v0.9.0:1 - hotfix: add python-multipart for /v1/audio/transcriptions v0.9.0:0 introduced the OpenAI audio proxy whose /v1/audio/transcriptions endpoint uses FastAPI's Form() + File() parameters. Those require python-multipart at runtime; it wasn't in image/pyproject.toml because none of the prior endpoints needed multipart. Result: FastAPI raised RuntimeError("Form data requires python-multipart") during route registration, the entrypoint exited 1, and StartOS's reverse proxy started closing TLS handshakes with PR_END_OF_FILE_ERROR because there was no upstream to forward to. Fix: add python-multipart>=0.0.9 to dependencies. Dashboard, /api/, and the new /v1/ audio endpoints all come back up cleanly. No other code changes. Verified post-install: Uvicorn running on http://0.0.0.0:9999, "Application startup complete" in the logs, package status 'installed'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 17:01:52 -05:00
Keysat	f44e7f8b03	v0.9.0:0 - OpenAI-compatible audio proxy for Open WebUI / Home Assistant Adds three new endpoints to spark-control that translate OpenAI's audio API shapes to the Parakeet (STT) and Magpie (TTS, NVIDIA Riva) services on the Sparks: GET /v1/models — STT model + Magpie's 60+ voices POST /v1/audio/speech — OpenAI body -> Magpie multipart synthesize (returns audio/wav passthrough) POST /v1/audio/transcriptions — relay to Parakeet (already compatible) Verified shapes against the live services: - Parakeet returns OpenAI-style {"text": "..."} or verbose_json with segments+words. Already a perfect drop-in for OpenAI clients. - Magpie returns raw WAV bytes with Content-Type: audio/wav. NOT base64-wrapped JSON as one might assume. The proxy is literally a body-translation on the request side; response is passthrough. Voice language is auto-derived from the voice name (e.g. Magpie-Multilingual.EN-US.Mia -> language=en-US) so clients don't need to set it explicitly. Open WebUI / Home Assistant / Recap Relay can now all point at one URL — https://<spark-control>.local/v1 — and get LLM, STT, TTS behind a single identity. No shim service to deploy. Pure addition: no existing routes touched; the dashboard, /api/*, download flow, deep-health, hardware probes are all unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 16:41:48 -05:00
Keysat	befedf0852	v0.8.1:2 - card button flips to blue "Download" when weights are absent When a model's weights aren't on disk, the green "Switch to this" button on the card is replaced by a blue "Download" button that calls /api/download directly with the model's repo and the right mode (solo -> spark1, cluster -> both). One-click re-install of a previously-deleted model, no more pasting the repo into the manual download form. Also adds a confirmation dialog showing the model name, size, and target Spark(s) before kicking off the download — and disables the button when another download is already in flight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 09:30:51 -05:00
Keysat	513c78bfa5	v0.8.1:1 - fix disk probe: $HOME wasn't expanding inside shlex.quote The 0.8.1:0 probe wrapped the entire path (including $HOME) in shlex.quote, which produces single quotes — preventing shell variable expansion. The resulting `[ -d '$HOME/.cache/...' ]` test looked for a literal path starting with the string $HOME and always failed, so every model reported as "not downloaded" and no trash icons rendered. Fix: embed $HOME in a double-quoted shell context (which allows expansion) and validate the cache dirname against a whitelist [A-Za-z0-9._-]+ rather than relying on shlex quoting. The dirname is fully constrained by HF's naming rules + our org--name munging, so the whitelist is tight enough. Verified against Spark 1: probe now correctly reports the 25,075,981,924 bytes (23.4 GB) of Qwen3.6's cache dir. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:58:43 -05:00
Keysat	9ff7ee9c1e	v0.8.1:0 - delete model weights from disk via card trash icon Each model card now shows whether its weights are present on disk (with GB size) or not yet downloaded. When present and the model isn't currently loaded, a trash icon appears; clicking it pops a confirmation showing exactly how many GB will be freed and on which Spark(s), then runs rm -rf on the HF cache directory via SSH. Cluster-mode models are removed from both Sparks; solo-mode from Spark 1 only. Safety rails: refuses to delete the currently-loaded model, refuses during an in-flight swap or download, and the catalog entry stays intact so it can be re-downloaded anytime. Backend: - new image/app/disk.py: probe_disk + delete_from_disk over SSH - GET /api/models/disk-status — parallel probe across all catalog models - DELETE /api/models/{key}/disk — guarded rm -rf, logs to connectivity events Frontend: - on-disk / not-downloaded pills on every card - trash icon-btn in card-actions row (hidden when not on disk) - confirmation dialog showing per-host bytes-to-free - disk-status re-checked every 60s Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:07:20 -05:00
Grant	1602b3b3b4	v0.8.0:4 - vLLM deep-health: 'no model loaded' is idle, not a wedge Previously a ConnectError on /v1/models classified vLLM as failing, which would feed into the wedge auto-restart heuristic. But when no model is loaded (the normal idle state between swaps, or after a failed swap leaves the vllm_node container up with no process serving), nothing is listening on 8888 — that's by design, not a wedge. The vLLM probe now does a two-step check: 1. GET /v1/models. ConnectError or empty list -> ok=true with note='no model currently loaded (idle)'. No auto-restart triggered (it wouldn't help anyway — restarting vllm_node kills any loaded model and doesn't load a new one). 2. If a model is loaded, POST 1-token chat completion. A 5xx here is a genuine wedge worth restarting for. Result: deep-health correctly reports 'no model loaded' as informational rather than flagging it as a failure. Auto-restart for vLLM only fires when a model is actually loaded AND inference fails — the right semantics.	2026-05-12 14:50:00 -05:00
Grant	8ac455f5f5	v0.8.0:3 - add --max-num-batched-tokens=16384 to vision models (gemma4, qwen3-vl) After the recent eugr/spark-vllm-docker update, vLLM became stricter about multimodal token budgets: ValueError: Chunked MM input disabled but max_tokens_per_mm_item (2496) is larger than max_num_batched_tokens (2048). Please increase max_num_batched_tokens. Each image input produces 2496 tokens, but vLLM's default --max-num-batched-tokens of 2048 is just under. Same class of bug as the Qwen3.6 Mamba block-size assertion we fixed in 0.6.0:1, surfacing on different models. Fix: bake --max-num-batched-tokens=16384 into every multimodal model entry. Now applied to: - qwen36 (already had it for the Mamba constraint; works for multimodal too since Qwen3.6 has vision) - gemma4 (crashed today on engine init) - qwen3-vl (would crash with the same error if anyone tried it) The pre-flight Test button validates argparse but the 2048<2496 check happens at runtime engine init, so it's not caught by Test — only by actually trying to load. This is exactly the kind of bug v0.7's Test catches the syntax of but not the semantics; runtime errors like this still surface only on real swap. Known limitation documented in v0.7 release notes.	2026-05-12 14:47:32 -05:00
Grant	000c55febe	v0.8.0 - Deep health probes + auto-restart on CUDA wedge deep_health.py: - Synthetic probes per service, all payloads generated in-memory (BytesIO), never written to disk: - Parakeet: 1s of digital silence via in-memory WAV → POST /v1/audio/transcriptions - Magpie: short 'hi' text → POST /v1/audio/synthesize (multipart form-data, real TTS API endpoint discovered via openapi.json) - vLLM: 1-token completion against currently-loaded model - Background loop runs every 5 minutes (configurable). Best-effort: exceptions in the loop never kill it. - Auto-restart on wedge-pattern errors (cudaErrorUnknown / CUFFT_INTERNAL_ERROR / 500 / Engine core init failed): docker restart of the affected container. - Rate-limited: max 3 restarts per service per 30 min. - Cooldown: 120 s between consecutive restarts on the same service. - 60 s startup grace before any auto-restart can fire after the app boots. - Probe failures + recoveries logged via record_report(source='deep-health') into the connectivity history alongside the polling-based transitions. API: - GET /api/deep-health: per-service last result + auto-restart counters - POST /api/deep-health/{service}/run: manual trigger now UI: - Service cards show 'Deep check ok/FAILED <time> <latency>' inline, plus a ↻ button to run-now - Auto-restart count in 30-min window surfaced on the card when > 0 - Inline error excerpt shown for failed probes Bug fix: server.py app startup hook was placed before the FastAPI app object was constructed (would crash on import). Moved after.	2026-05-12 14:41:01 -05:00
Grant	6434b01a95	v0.7.0 - Pre-flight launch validation (Test button on every model card) validate.py: - Builds the same args list a real swap would pass to 'vllm serve' - SSHes into Spark 1 and runs vLLM's own argparse layer inside the running vllm_node container, WITHOUT initializing the engine - Uses FlexibleArgumentParser (from vllm.utils.argparse_utils, with fallback to engine.arg_utils) + make_arg_parser — the exact same parser the 'vllm serve' CLI uses. Earlier attempt with bare argparse.ArgumentParser was too strict (rejected '--moe_backend' with underscore that the real CLI accepts via FlexibleArgumentParser's normalization) - Returns structured {ok, stage, error, cmd_args, launch_cmd} so the UI can surface the exact failure cause Endpoint: POST /api/swap/{key}/validate. Cheap (~5s), no engine init, no disruption to the currently-loaded model. Frontend: 'Test' button on every model card, inline result below the action row (green check or red detailed error). Result stays visible until the user reloads or clicks Test again. Catches: typos in flag names, deprecated/removed flags after a vLLM upgrade, type mismatches. Does NOT catch runtime-only failures (Mamba block-size assertion, OOM at load, kernel-compat). Ok=true is necessary-but-not-sufficient; ok=false is definitive 'don't bother running it'.	2026-05-12 13:37:37 -05:00
Grant	5827683a09	v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch vLLM trips on launching Qwen3.6-35B-A3B-NVFP4 with: AssertionError: In Mamba cache align mode, block_size (2096) must be <= max_num_batched_tokens (2048). Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 is just under the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value. Earlier qwen36 swaps in this session worked because vLLM hadn't reached the Mamba-validation code path on that prior path (different attention backend pick or auto-retry). Whatever the reason, the explicit flag avoids the dance. Also documented in known-issues.md.	2026-05-12 13:22:24 -05:00
Grant	ee8c2406b8	v0.6.0 - Service-level connectivity tracking + passive failure-report endpoint connectivity.py: - Generalized 'spark' subject to any string; renamed 'spark' field to 'subject' - Legacy v0.5 events with the old 'spark' field are migrated transparently on read (kind defaults to 'transition') - New record_report(subject, ok, source, detail, latency_ms): always appends an event with kind='report'; does NOT mutate the current state (only active polling is authoritative) - summary() returns events normalized to the new schema Wiring: - /api/status now calls record_state for vllm/parakeet/magpie (dedup on no-change) - /api/services calls record_state for each service after its http check - Result: dashboard observes service-level transitions automatically with no extra polling Passive endpoint: - POST /api/health-event with {service, ok, source?, error?, ms?} - Useful for external apps (e.g. Open WebUI) to surface sub-poll-interval failures the dashboard would otherwise miss UI: - Connectivity dialog groups events by subject (hosts ordered first, then services) - Per-subject summary shows transition count, down count, report count, failed-report count - Transitions and reports render inline with distinct styling; reports show source app + error + latency - Legacy v0.5 events render unchanged Docs: - README documents /api/health-event with a curl example Package: bump to 0.6.0:0	2026-05-12 13:19:27 -05:00
Grant	a02f4db850	v0.5.0 - Wake-on-LAN + connectivity history wol.py: - build_magic_packet(): standard 6x0xFF + 16x MAC layout - send_local_broadcast(): direct from container (ports 9 + 7 for safety) - send_via_peer(): preferred path; SSHes to the OTHER Spark and runs a Python one-liner there so the packet originates on the target's LAN segment (most reliable) - MAC validation + normalization connectivity.py: - /data/connectivity.json persistence (thread-safe, atomic rename) - Stores per-Spark current state + last_change timestamp + rolling 200-event log - Records up/down transitions; computes down_seconds / up_seconds durations - MAC cache populated lazily during hardware probes hardware.py: - Probe now reads MAC via /sys/class/net/<default-route-iface>/address - After each probe, record_state() emits a transition event if state changed - record_mac() caches the address so WoL works when the Spark next goes down Endpoints: - GET /api/connectivity: macs, current state, last_change, events[] - POST /api/spark/{name}/wake: tries via-peer first, falls back to direct broadcast UI: - Unreachable hardware card shows the cached MAC + 'Wake (WoL)' button (only if MAC known) - New 'Connectivity log' button opens a modal with per-Spark transition history (last 25 each), including duration of each prior up/down period - pollHardware also pulls /api/connectivity so WoL buttons appear without an extra fetch Package: bump 0.5.0:0; main.ts sets CONNECTIVITY_LOG=/data/connectivity.json	2026-05-12 12:51:49 -05:00
Grant	1889ab45fb	v0.4.0 - NIM installer + dashboard resilience Hotfix (was v0.3.1): - services.py: cache 'unreachable' per (host,user) for 25s so a dead Spark doesn't hang every /api/services call behind 6s ssh timeout - ssh_run timeout reduced 10 -> 6s for docker_state probes - hardware probe: shorter SSH timeout (6s), longer cache TTL for failures (25s) - JS pollStatus retries loadModels() if state.models is empty (recovers from cold-start proxy timeout) - Unreachable hardware card now includes troubleshooting steps (Spark Control cannot SSH into an unreachable Spark to restart it) v0.4 NIM installer: - nim.py module: curated SUGGESTED_NIMS list (Parakeet, Magpie, Riva) + NimManager that runs docker login nvcr.io + docker pull + docker run -d --gpus all -p PORT:PORT -v VOL:/opt/nim/.cache -e NGC_API_KEY -e ... --restart=unless-stopped + chown the volume to uid 1000 + restart. Streams all output via SSE; redacts the API key from log lines. - custom_services.py: persists installed NIMs to /data/services-overrides.yaml so they appear in the services panel after install - services.py: merges custom services into the panel - /api/nim/catalog GET, /api/nim/install POST + GET/SSE - /api/services/{name} DELETE for custom services - UI: '+ Install NIM' button next to 'Always-on services'; modal lists curated images each with a 'Pick' button + a custom-image form; installation runs in a second dialog with phase + elapsed timer + collapsible log - NGC API key field added to Configure Sparks (masked); injected as NGC_API_KEY env var into the container Package: bump 0.4.0:0; main.ts adds SERVICES_OVERRIDES + NGC_API_KEY env vars	2026-05-12 12:32:29 -05:00
Grant	e88fdcfde4	v0.3.0:1 - hotfix: parallel SSH probes + longer timeout - Hardware probes for spark1 and spark2 now run via asyncio.gather (parallel) so the worst-case wall time is max(per-probe), not sum - Bump per-probe SSH timeout from 8s to 12s to absorb first-call overhead (StrictHostKeyChecking=accept-new on first connect + nvidia-smi cold start) - Unreachable Spark now shows up cleanly in the UI as a single 'unreachable' card with the error message	2026-05-12 12:14:36 -05:00
Grant	64ce0fca10	v0.3.0 - Hardware dashboard + knob context + Explain context + Open WebUI link Hardware dashboard: - New hardware.py module: SSH probes each Spark for hostname, uptime, load+cores, RAM, disk, GPU (name, util, temp, power) + per-process GPU memory sum - DGX Spark uses unified memory (nvidia-smi memory.total returns N/A); fall back to per-process compute memory and compute fraction against system RAM. Marks with gpu_unified_memory=true. - 4s TTL cache in HardwareProbe to avoid hammering - /api/hardware returns per-Spark snapshot - UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk) — bars with warn threshold styling - Polls every 8s Knob context (tied to live hardware): - Each Advanced knob now shows plain-English help text - 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM - 'Max context' shows '~N pages of text' - Toggles show tradeoff descriptions Explain context: - '✨ Explain context' button on the update banner - /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE - Renders into an expandable 'Explained by the loaded LLM' section under Pending commits - Reasoning tokens shown italicized when the model emits them Open WebUI integration: - New 'Open WebUI URL' optional field in Configure Sparks - /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set Downloads: - Third radio option: Spark 1 only / Spark 2 only / Both Sparks - Backend picks SSH target based on mode - HF repo link icon next to the input - Helper line about NVFP4 for Blackwell Model cards: - Repo name is now a clickable link to its Hugging Face page Package: bump 0.3.0:0	2026-05-12 12:00:15 -05:00
Grant	c6da6b0784	v0.2.4 - Hotfix: Unknown status + copy UX + update banner context Bug fix: - config.py: empty PARAKEET_CONTAINER / MAGPIE_CONTAINER env vars (from migrating to v0.2.0+ where the field is optional and saved as '') now fall back to 'parakeet-asr' / 'magpie-tts' via the 'or' idiom. Confirmed live: services classify as 'running' instead of 'unknown'. UX: - Replaced text 'Copy' buttons with compact icon buttons (clipboard SVG) - Endpoint Base URL + Model ID + curl snippet are now click-to-copy themselves (the value AND a separate icon button) - Service cards: host, base URL, and model are now three separate copyable rows - Update banner: leading explanatory line — 'Updates to eugr/spark-vllm-docker — the upstream project that orchestrates vLLM on your Sparks. These are not firmware, OS, or model updates.' with a link to the repo.	2026-05-12 11:45:55 -05:00
Grant	75fd0846b4	v0.2.3 - Per-model Advanced settings + catalog-add for downloaded models Backend: - overrides.py: read/write /data/models-overrides.yaml (knobs + custom entries) - apply_knobs_to_args(): strip matching flags from bundled vllm_args and append knob values, so knob changes properly override bundled defaults - extract_knobs_from_args(): seed UI knob values from bundled args so the Advanced dialog has correct starting state - models.py: load_catalog merges overrides on top of bundled yaml - GET /api/models returns effective_knobs per model - PUT /api/models/{key}/knobs persists knob changes - POST /api/models adds a custom catalog entry - DELETE /api/models/{key} removes a custom entry (bundled models cannot be deleted) - swap_manager.reload_catalog() called after each mutation so swaps see latest Frontend: - New 'Advanced' button on every card opens a modal dialog: max-model-len input, gpu-memory-utilization slider, three optimization checkboxes (fastsafetensors, prefix caching, FP8 KV cache). Save persists; Cancel discards. Custom models also have a Delete button. - After a successful download, automatically open the 'Add to catalog' dialog pre-filled with the repo, with the same knob defaults — user just enters key, display name, and clicks Save. - Custom catalog entries are tagged with a blue 'custom' pill on the card. Package: bump 0.2.3:0; main.ts sets MODELS_OVERRIDES=/data/models-overrides.yaml so overrides persist on the StartOS volume.	2026-05-12 11:30:47 -05:00
Grant	474417b458	v0.2.2 - spark-vllm-docker update checks + Apply Update Backend: - updates.py: get_update_status() runs git fetch + git rev-list --left-right --count HEAD...origin/main to learn ahead/behind/dirty, plus git log for pending commits - UpdateManager class with asyncio.Lock; one update at a time - POST /api/updates/apply triggers "git pull --ff-only && ./build-and-copy.sh -c" over SSH with streamed log + phase detection (Pulling / Building the vLLM container / Copying to peer Sparks) - GET /api/updates returns {ok, behind, ahead, dirty, current, log[], branch} Frontend: - Persistent banner near footer: hidden when up-to-date, blue when N commits behind, warn (orange) when local dirty changes block update - 'Show details' expands a list of pending commits - 'Apply update' triggers the long-running build with phase + elapsed timer + collapsible logs - Confirmation dialog explains the 5–40 min duration Package: bump 0.2.2:0	2026-05-12 11:26:55 -05:00
Grant	9dde938348	v0.2.1 - Model download with %% progress Backend: - download.py module: drives ./hf-download.sh <repo> [-c --copy-parallel] over SSH, parses tqdm output (regex matches '8%|...| 2.06G/25.1G [03:20<18:35, 20.6MB/s]') into percent + bytes done/total + elapsed + ETA + rate - DownloadManager: in-memory job tracking with asyncio.Lock (one download at a time) - POST /api/download, GET /api/download/{id}, SSE /api/download/{id}/stream - Phase detection: Connecting / Fetching N files / Downloading / Copying to peer Sparks / Done Frontend: - '+ Download a new model' button next to LLM swap section title - Inline form: HF repo text field + solo/cluster radio + Cancel/Start - Progress UI: spinner, elapsed timer, phase label, percent fill, stats line (bytes/rate/ETA), collapsible raw logs Package: bump 0.2.1:0	2026-05-12 11:24:31 -05:00
Grant	27699a2469	v0.2.0 - Always-on services panel with per-service host config Dashboard: - New 'Always-on services' section with cards for Parakeet and Magpie - Each card: host:port, model loaded, status pill (Healthy/Unhealthy/Starting/Not configured) - Start, Restart, Stop buttons. Buttons disabled when not applicable for current state - Restart counter shown when > 1 (would have surfaced the old magpie crash loop) Backend: - New /api/services GET: docker container state + http health for each support service - New POST /api/services/{name}/{action} for start | stop | restart - services.py module: docker_state, run_action via SSH - config.py: PARAKEET_HOST/USER/CONTAINER and MAGPIE_* env vars, default to spark2_* - health.py: use per-service hosts (no longer hard-wired to spark2_host) Package: - sparkConfig.yaml.ts: add 6 new optional fields - configureSparks action: optional 'Parakeet host', 'Parakeet container', 'Magpie host', 'Magpie container' fields; descriptions explain they default to Spark 2 when blank - Handler normalizes nulls to empty strings before merge - main.ts: pass new env vars to container - bump to 0.2.0:0	2026-05-12 11:21:15 -05:00
Grant	4cda453c8a	0.1.0:4 - expose /api/endpoints as separate StartOS service interface Adds a second sdk.createInterface with type='api' and path='/api/endpoints' on the same uiPort (9999). StartOS dashboard now shows two service interfaces: Web UI and OpenAI-compatible API. The API URL is discoverable to other services without users needing to remember the /api/endpoints suffix.	2026-05-12 11:07:51 -05:00
Grant	2ba3da55b1	0.1.0:3 - Show Public Key layout + /api/endpoints service-discovery - showPublicKey now uses result.group: install command and raw key are each their own one-click copy box; description is brief - /api/endpoints returns stable shape { vllm, parakeet, magpie } with base_url + model + ready, for other LAN services to consume without hardcoding Spark IPs - health.py: parakeet/magpie now also expose base_url - README: documented /api/endpoints shape	2026-05-12 10:52:57 -05:00
Grant	51804b2e5e	0.1.0:2 - remove '<spark-user>' default everywhere (it's Alice's username, not factory) Per user correction: '<spark-user>' is not the DGX Spark factory default. Generic-ize: - configureSparks: no default user; placeholder 'your SSH username' - sparkConfig schema: empty string defaults - main.ts env fallback: empty - showPublicKey: drop the '<spark-user>' fallback; skip Spark if user not configured - Update feedback memory with the correction	2026-05-12 10:39:57 -05:00
Grant	0ddab99468	Bump to 0.1.0:1 — portability + endpoint display - configureSparks.ts: generic placeholders (e.g. 192.168.1.10), no Alice-specific IPs; descriptions explain the role of each node instead of naming his hardware - showPublicKey.ts: reads sparkConfig.yaml; emits a ready-to-paste one-liner (KEY='...' followed by 'ssh user@host "echo $KEY >> authorized_keys"' for each configured Spark). Falls back to generic instructions if Configure Sparks hasn't been run yet. - /api/status now includes vllm.base_url for the OpenAI endpoint - New endpoint panel in UI: base URL + model ID rows with copy buttons + collapsible curl example - Bump version to 0.1.0:1	2026-05-12 10:38:18 -05:00
Grant	72bf754baa	Pack spark-control_x86_64.s9pk (55 MB) - Move models.yaml into image/ so the docker build context is self-contained - Fix manifest: dockerfile=../image/Dockerfile, workdir=../image - Add LICENSE (MIT) and assets/README.md (StartOS marketplace listing) - s9pk validates: id=spark-control, version=0.1.0:0, osVersion=0.4.0-beta.6, sdkVersion=1.3.3 - Image embeds python:3.12-slim + openssh-client + FastAPI app + models.yaml	2026-05-12 09:52:53 -05:00
Grant	dd9d53060b	Add StartOS 0.4 package scaffold (manifest, main, interfaces, 2 actions) - package/Makefile + s9pk.mk + package.json + tsconfig.json - startos/manifest: dockerBuild source pointing at ../image/Dockerfile - startos/main: reads /data/config.yaml reactively, passes env vars to container - startos/interfaces: binds port 9999 as HTTP UI - startos/actions: showPublicKey (read /data/ssh/id_ed25519.pub), configureSparks - TS + JS bundle compile clean (tsc --noEmit, ncc build)	2026-05-12 09:36:15 -05:00
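
Note on the v0.5.0 entry (a02f4db850): the message describes wol.py as building a magic packet with the "standard 6x0xFF + 16x MAC layout", validating and normalizing the MAC, and broadcasting on ports 9 and 7. The following is a minimal sketch of that layout in Python, not the repository's code. The function names build_magic_packet and send_local_broadcast come from the commit message; normalize_mac, the accepted separators, and the 255.255.255.255 broadcast address are assumptions made for illustration.

```python
import re
import socket

_HEX12 = re.compile(r"[0-9a-f]{12}")


def normalize_mac(mac: str) -> str:
    """Return the MAC as 12 lowercase hex digits, or raise ValueError.

    Hypothetical helper: accepts ':', '-' and '.' as separators.
    """
    compact = re.sub(r"[:\-.]", "", mac.strip().lower())
    if not _HEX12.fullmatch(compact):
        raise ValueError(f"invalid MAC address: {mac!r}")
    return compact


def build_magic_packet(mac: str) -> bytes:
    """Six 0xFF bytes followed by the 6-byte MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(normalize_mac(mac))
    return b"\xff" * 6 + mac_bytes * 16


def send_local_broadcast(mac: str, ports=(9, 7)) -> None:
    """Send the packet as a UDP broadcast on each port.

    The broadcast address is an assumption; a real deployment may need
    the subnet's directed broadcast address instead.
    """
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for port in ports:
            sock.sendto(packet, ("255.255.255.255", port))
```

The packet is always 102 bytes (6 + 16 x 6). The commit's preferred send_via_peer path would run equivalent logic on the other Spark over SSH so the packet originates on the target's LAN segment.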

44 Commits
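
Note on the v0.8.0 entry (000c55febe): the message lists three guards on auto-restart: at most 3 restarts per service per 30 minutes, a 120 s cooldown between consecutive restarts of the same service, and a 60 s startup grace after the app boots. The sketch below shows one way those three rules could combine. It is illustrative only: the RestartPolicy class, its method names, and the sliding-window reading of "per 30 min" are assumptions, not taken from deep_health.py.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class RestartPolicy:
    """Hypothetical gate deciding whether an auto-restart may fire."""

    def __init__(
        self,
        max_restarts: int = 3,
        window_s: float = 30 * 60,
        cooldown_s: float = 120.0,
        startup_grace_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_restarts
        self._window = window_s
        self._cooldown = cooldown_s
        self._grace = startup_grace_s
        self._clock = clock
        self._started = clock()
        self._history: Dict[str, Deque[float]] = {}

    def recent_count(self, service: str) -> int:
        """Number of restarts still inside the sliding window."""
        now = self._clock()
        hist = self._history.setdefault(service, deque())
        while hist and now - hist[0] >= self._window:
            hist.popleft()  # drop restarts that aged out of the window
        return len(hist)

    def may_restart(self, service: str) -> bool:
        now = self._clock()
        if now - self._started < self._grace:
            return False  # still inside the startup grace period
        if self.recent_count(service) >= self._max:
            return False  # rate limit reached for this service
        hist = self._history[service]
        if hist and now - hist[-1] < self._cooldown:
            return False  # too soon after the previous restart
        return True

    def record_restart(self, service: str) -> None:
        self._history.setdefault(service, deque()).append(self._clock())
```

A caller would check may_restart(service) after a wedge-pattern probe failure, run the container restart, and then call record_restart(service). Injecting the clock keeps the timing rules testable without sleeping.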