spark-control

Author	SHA1	Message	Date
Keysat	b67e001642	docs: v0.26.0:0 live + published to registry; surface Gemma-26B eval as next	2026-06-18 12:35:16 -05:00
Keysat	df9f244eae	v0.26.0:0 - disk-driven model menu (scan sparks; recipes; needs-setup) The dashboard menu is now the set of models actually downloaded on the Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as launch recipes matched to an on-disk model by repo; an on-disk model with no recipe is flagged needs_setup and its launch settings are inferred from its config.json for a one-time operator confirmation (discovery.py). - delete now removes weights AND the menu card (delete_from_disk sweeps all hosts; the delete endpoint resolves keys via the live menu) - new GET /api/models/suggest; /api/models returns the menu + a recipes list (download autocomplete); GET /api/models/disk-status removed - dropped the two legacy Qwen recipes (235B FP8, 2.5 72B) - tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge) v0.26.0	2026-06-18 11:09:56 -05:00
Keysat	c0b35184ba	docs: trim Current state to live status — coordination epic shipped	2026-06-18 08:09:59 -05:00
Keysat	7ecd77f1e5	docs: defer raw-docker swap generalization — multi-node rationale recorded	2026-06-18 07:58:25 -05:00
Keysat	6bcda6e348	docs: v0.25.0:0 installed live — update Current state	2026-06-18 07:11:33 -05:00
Keysat	7ae6ab3ba8	v0.25.0:0 - cluster coordination layer (swap lock + webhook + schedule registry) GPU-arbiter safety layer for when automation, not just the dashboard, swaps models: - swap reservation lock (POST/GET/DELETE /api/swap/lock); 423-enforced in post_swap via a single-read gate, TTL-bounded, secret-token auth, human force-release override + dashboard banner - swap webhook (swap_complete/swap_failed) fired outside the swap lock, optional HMAC signature, configurable URL+secret - read-only schedule registry (GET/POST/DELETE /api/schedule) + dashboard panel New module image/app/coordination.py; docs/COORDINATION.md for consumers; 22 offline tests in test_coordination.py. v0.25.0	2026-06-18 07:07:08 -05:00
Keysat	dd3d1412d4	docs: v0.24.0:0 committed/tagged/pushed — Gitea release asset + live install still pending	2026-06-17 23:11:14 -05:00
Keysat	26070eb191	v0.24.0:0 - configurable cluster topology (vllm container name, hide services, second-vllm monitor) Make the cluster topology configurable so an adopter wired differently (vLLM on both Sparks, port 8000, different container name, no Parakeet) can monitor without forking. Covers the OpenClaw report P4/P5/#6. - VLLM_CONTAINER override (default vllm_node), validated at the boundary and quote_arg-quoted into the swap log-tail + pre-flight validator exec. - DISABLED_SERVICES list: hidden services show no tile and are skipped by status/deep-health/connectivity probes (kills the Parakeet-on-8000 collision). - kind: vllm custom service monitors a second Spark's vLLM via the shared probe_vllm_endpoint; /api/endpoints gains a disabled flag. Swap mechanism intentionally not generalized to raw docker run (that's coordination, roadmap item 4). v0.24.0	2026-06-17 23:03:33 -05:00
Keysat	90394f891b	docs: v0.23.0 published, live install pending (mDNS); runbook sideload troubleshooting	2026-06-17 22:36:41 -05:00
Keysat	e783653ef0	v0.23.0:0 - local / fine-tuned model support Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes), not just Hugging Face repos. - ModelDef gains local_path; a model must set exactly one of repo / local_path. The validator also enforces the local-path whitelist and that any --chat-template lives inside local_path (only that dir is mounted). - build_launch_command bind-mounts the dir into the vLLM container at the SAME host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook, then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream expands that var unquoted; contract noted in runbook.md). - shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'. - POST /api/models validates the full entry via ModelDef before persisting, so a bad entry can't be written and then break catalog load; _merge_overrides skips an invalid override entry instead of failing the whole catalog. - disk.py size-probes a local path with du; disk-delete refused for local models. - UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF link, delete button hidden for local models. - Tests: local launch + injection round-trip, chat-template location, traversal, exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent pass; findings addressed. v0.23.0	2026-06-17 22:27:41 -05:00
Keysat	57a893000e	docs: document the Gitea release ritual in startos-package guide	2026-06-17 21:29:27 -05:00
Keysat	56f7ea4444	fix: gitea-release.sh tolerate 404 on tag lookup; report HTTP errors; mark v0.22.0 published	2026-06-17 21:23:21 -05:00
Keysat	aaad57d88f	docs: mark v0.22.0:0 shipped + record Gitea-release distribution decision	2026-06-17 19:47:49 -05:00
Keysat	136a4713a1	v0.22.0:0 - configurable vllm port; gitea-release tooling; coexistence roadmap - Configure Sparks gains a vLLM port field (blank => 8888, our launch-cluster.sh default); VLLM_PORT plumbed configureSparks -> sparkConfig.yaml -> main.ts env -> config.py. So an adopter whose vLLM listens elsewhere (e.g. 8000) can fix the "vLLM unreachable" health check without rebuilding the package. - Harden numeric env parsing (config._env_int): a blank or malformed port now falls back to its default instead of crashing daemon startup (closes a P3 tech-debt item; the Configure panel passes unset optional fields as ""). - Add scripts/gitea-release.sh + `make release` to publish the built s9pk to Gitea Releases, so the OpenClaw adopter pulls updates with a read-only token instead of being hand-sent the package. - Capture the OpenClaw/Johnny-5 coexistence epic and the "control plane, not a job runner" stance in ROADMAP.md and Current state. v0.22.0	2026-06-17 19:45:09 -05:00
Keysat	c179389731	docs: trim Current state post-matrix-bridge ship; add bot-tile ops note to runbook	2026-06-15 23:18:28 -05:00
Keysat	9debeb4bbe	v0.21.0:1 - tidy host display for port-less bot tile	2026-06-15 23:09:24 -05:00
Keysat	39f8410623	v0.21.0:0 - matrix-bridge bot tile (status, update, restart, logs)	2026-06-15 22:57:40 -05:00
Keysat	e307a08f05	docs: refresh Current state for handoff — harness shipped, parakeet deferred, finished narrative pruned	2026-06-15 18:32:57 -05:00
Keysat	89338c97f5	test: cover shellsafe validators (repo/image/container injection boundary)	2026-06-15 18:17:35 -05:00
Keysat	d9c098262f	docs(roadmap): defer parakeet long-audio guard; record rationale + impl shortcut	2026-06-15 17:44:48 -05:00
Keysat	6238ac88f7	test: add offline pytest harness (build_launch_command injection, label-merge)	2026-06-15 17:24:49 -05:00
Keysat	17a9973ba2	docs(roadmap): add local-path / fine-tuned model support to backlog	2026-06-15 16:23:44 -05:00
Keysat	e87158c492	v0.20.0:0 - per-spark ssh-key copy + wireguard status badge	2026-06-15 09:53:40 -05:00
Keysat	5341fcc506	Add inbox-check line; align .gitignore with canonical .claude policy Cross-repo git-hygiene audit remediation: surface ~/Projects/standards/INBOX.md items at session start, and switch .gitignore to the deny-by-default .claude/* block (shared wiring allow-listed) plus the canonical secrets/env lines — per standards/portability.md.	2026-06-14 12:17:16 -05:00
Keysat	05d03beeeb	docs: handoff — trim Current state, move full-eval debt to ROADMAP, record SSH-input + CSRF conventions - AGENTS.md: rewrite Current state lean for v0.19.0:0; drop the now-completed full-eval triage block (history lives in git log + EVALUATION.md). - docs/guides/fastapi-image.md: add two durable conventions — user values crossing into SSH must go through shellsafe; new endpoints and the csrf_guard exempt-prefix rule. - ROADMAP.md: park the remaining non-blocking P2/P3 tech debt from the eval.	2026-06-12 17:10:03 -05:00
Keysat	56a519ff4f	docs: record git-history scrub; fix stale SHAs and IP-fragment remnants History was rewritten with git filter-repo to purge owner-specific values (IPs, hostnames, SSH username, key name, personal names) from all commits, tags, and messages — including three LAN IPs and one Start9 address the v0.18.0:1 working-tree scrub had missed (one still live in HEAD at docs/AUDIO_API.md). Verified 0 hits across all refs. - AGENTS.md: Portability + Repo-wart + work-queue #2 + shipping note updated; commit-SHA references repointed to post-rewrite SHAs (367d986->8d839e3). - EVALUATION.md: P0 owner-data finding marked resolved; cleaned shorthand IP-octet fragments (/.87, /11) left by the placeholder substitution.	2026-06-12 16:55:08 -05:00
Keysat	1c4e861783	v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf Triaged from a full independent evaluation (EVALUATION.md). Addresses the three P0/P1 code findings; the proxy/data APIs that downstream apps consume are deliberately untouched. - ssh command injection (P0): new shellsafe.py validates + shlex.quotes every user-supplied value crossing into an SSH command on the Sparks (model repo, vllm args/knobs, NIM image/container/volume/port/env, service names). Boundary validation on POST /api/models and POST /api/nim/install; quoting at every sink in models/download/nim/services. NGC key now quoted too. - qdrant path injection (P1): /api/search validates the collection name against a metacharacter-free whitelist and URL-encodes the path segment. - csrf (P1): csrf_guard middleware enforces same-origin on state-changing control endpoints; /v1/, /scrub, /rehydrate, /api/search, /api/audio/ and /api/health-event are exempt so external consumers are unaffected. Verified: injection survives only as a single quoted token, vLLM preflight shlex.split round-trip intact, CSRF behaviors covered via TestClient, both offline redaction suites still pass, tsc clean, s9pk rebuilt.	2026-06-12 16:36:33 -05:00
Keysat	98988057a2	v0.18.0:1 - scrub owner-specific hostnames, ips, usernames, names from tracked files Replace real cluster IPs/hosts/usernames and example names with neutral placeholders across docs, ops notes, package install text, and the offline redaction test; delete the obsolete build-time starter prompt. Closes the portability audit's single blocker. No runtime behavior change.	2026-06-12 15:07:34 -05:00
Keysat	5e6db2f63b	docs: record canonical AGENTS.md / symlink layout convention	2026-06-12 14:31:54 -05:00
Keysat	6a6112a15f	restructure: AGENTS.md canonical + docs/guides with .claude/rules symlinks Rename CLAUDE.md -> AGENTS.md (cross-vendor standard) with a relative CLAUDE.md symlink so Claude Code still loads it. Move each .claude/rules file into docs/guides/ (paths: frontmatter preserved) and replace the rules file with a relative symlink into the guide. Repoint the AGENTS.md index paragraph at docs/guides/ so non-Claude agents find the guides.	2026-06-12 14:27:17 -05:00
Keysat	d8975bebf7	docs: note self-hosted gitea remote in current state	2026-06-11 19:25:21 -05:00
Keysat	9ef9226e0a	docs: split CLAUDE.md into path-scoped .claude/rules; fix dev/test commands - CLAUDE.md trimmed to whole-repo facts (58 lines); subsystem guidance moved to .claude/rules/{startos-package,fastapi-image,redaction, audio-speech}.md with paths: frontmatter so each loads only when matching files are touched - .gitignore: track .claude/rules/ while keeping the rest of .claude/ (settings.local.json) ignored - test-audio-with-speakers.sh: require audio-file arg in docs, replace owner-specific SPARK_CONTROL/VLLM defaults with generic ones (localhost dev server + Spark Control vLLM proxy), discover the loaded LLM via /api/status since /v1/models lists audio models only - document REDACTION_MAP_DB + CONNECTIVITY_LOG as required for local dev (/data only exists in the container) - prettier pass over startos/actions (formatting drift)	2026-06-11 19:12:23 -05:00
Keysat	7e8175d857	docs: add CLAUDE.md (agent guide) + ROADMAP.md (longer-term backlog)	2026-06-11 17:59:08 -05:00
Keysat	8d839e3714	v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API - Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests) - Add embeddings proxy and spark_embed service (Dockerfile + main.py) - Expand audio_proxy with speaker-aware handling; deep_health/health/server updates - Package: configureSparks action + sparkConfig model updates, manifest/main wiring - Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh	2026-06-11 17:45:57 -05:00
Keysat	4a75274db3	v0.13.0:3 - proxy /v1/chat/completions through Spark Control to vLLM Recap Relay dev caught that all audio endpoints route through Spark Control but chat-completions didn't — clients had to know about both SC AND the direct vLLM URL on Spark 1. Closes that last gap. New endpoints: POST /v1/chat/completions — OpenAI-shape, forwards to vLLM on Spark 1 POST /v1/completions — legacy OpenAI completions, same path Implementation (image/app/llm_proxy.py): - Dumb forwarder: request body passed through verbatim, response body streamed back chunk-by-chunk. No transformation. vLLM already speaks the same shape; adding any logic here would just create skew. - Streaming: parses body for `stream: true` and uses httpx.AsyncClient .stream() + FastAPI StreamingResponse if so. Non-streaming path is a simple post-and-return. - 30-minute timeout to accommodate large-context completions (default httpx 5s would kill anything substantial). - On upstream non-200 in streaming mode: emits one SSE `error` event so the client's parser doesn't hang on an empty stream forever. - On upstream connection error: HTTP 502 with "vllm unreachable" detail. Now clients can use ONE host for everything: POST https://spark-control/api/audio/diarize-chunk POST https://spark-control/v1/audio/transcriptions POST https://spark-control/v1/chat/completions GET https://spark-control/api/endpoints (still works for clients that prefer the direct URLs) No parakeet container changes. No Reapply patches needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 19:58:19 -05:00
Keysat	c7f94381e7	v0.13.0:2 - per-segment confidence in diarize-chunk response Recap Relay dev asked: can the diarization output include a confidence level per segment so the UI can render "Speaker_0?" for uncertain assignments rather than confidently mislabeling? Answer: yes. Sortformer's diarize() with include_tensor_outputs=True returns the per-frame per-speaker sigmoid scores (shape [B, T, 4spk], ~12.6 fps frame rate). The current code argmaxes those into segment strings and throws the raw scores away. Now: for each output segment, compute mean probability of the assigned speaker across the segment's frames → confidence in [0, 1]. Implementation: - diarizer.py: diarize_chunk() now calls diarize() with include_tensor_outputs=True, and a new _attach_confidence() helper derives the per-segment mean probability after parsing the segment strings. The frame-rate is computed from tensor shape vs audio duration (no need to hard-code the model's stride). - All failure paths return confidence=None gracefully — Recap Relay can treat None as "no info" or fall back to a default threshold. Endpoint shape change: segments[] now have an optional `confidence` field in [0, 1] (or None). All other fields unchanged. Existing callers that ignore the field aren't affected. Verified with a 5s test signal that the tensor has shape [1, 63, 4] (63 frames / 5s = 12.6 fps) and values in [0, 1] (sigmoid outputs, independent per speaker so overlap detection works). Real speech values will be much higher than the near-zero values of the pure-tone test signal. Reapply patches on the Speech Models card after installing v0.13.0:2 to pick up the updated diarizer.py + main.py in the parakeet container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 12:36:25 -05:00
Keysat	e775906caa	v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints Spark Control now exposes a per-chunk worker designed for Recap Relay to orchestrate against. Recap Relay does the chunking + global speaker clustering (consistent with how it already handles the Gemini path); Spark Control handles the GPU-bound per-chunk work. Parakeet container: - diarizer.py: now also loads NVIDIA TitaNet speaker-verification model (~25 MB, NeMo-native, no torchaudio). New diarize_chunk() method runs Sortformer + extracts one 192-dim voice fingerprint per detected local speaker (concatenating each speaker's audio across the chunk and running TitaNet's get_embedding). - main.py: new POST /v1/audio/diarize-chunk endpoint that returns segments + speakers_detected + fingerprints + models in one shot. Spark Control: - new POST /api/audio/diarize-chunk that proxies to parakeet's new endpoint. Same CUDA-wedge recovery (503 + deep-health probe + 60s retry-after) as the other audio endpoints. Returns the raw JSON upstream because Recap Relay is the consumer; no merging needed. Response shape Recap Relay receives per chunk: { "duration": 300.0, "segments": [{"start_s","end_s","speaker"}, ...], # LOCAL labels "speakers_detected": ["Speaker_0","Speaker_1",...], "fingerprints": {"Speaker_0":[192 floats], ...}, "models": {"diarization":"...","embedding":"..."} } Recap Relay's job: 1. Chunk audio (existing chunking infrastructure) 2. POST each chunk to /api/audio/diarize-chunk in parallel 3. Collect all fingerprints from all chunks 4. sklearn AgglomerativeClustering(distance_threshold=0.7, metric=cosine) 5. Re-label segments with global cluster IDs 6. Concatenate transcripts (from a separate parallel call to /v1/audio/transcriptions) with timestamp offsets and merge with re-labeled diar segments After installing v0.13.0:1, click "Reapply patches" on the Speech Models card to push the updated diarizer.py + main.py into the parakeet container — TitaNet will download (~25 MB) on first call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 11:37:05 -05:00
Keysat	95524f4983	v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we never got a working docker build. The fundamental constraint isn't patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch 2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere. WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't satisfy without rebuilding torchaudio against torch 2.10's alpha API. Walking away cleanly is better than another night of chasing. Removed from the codebase: - image/whisperx_container/* (Dockerfile + requirements + app/main.py) - image/app/whisperx_install.py (install manager + SSH ship-context logic) - image/Dockerfile COPY whisperx_container - WHISPERX_* config keys in config.py - whisperx service entry in services.py - WhisperX-preferred branch in audio_proxy.py - /api/whisperx/* endpoints in server.py - install banner + progress dialog in index.html - render + handlers in app.js - .whisperx-install styles in style.css Spark 2 cleaned in tandem (user-authorized): container removed, ~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of builder cache reclaimed. parakeet-asr and magpie-tts unaffected and healthy throughout. The audio path is back to exactly what shipped in v0.11.0:3: POST /api/audio/transcribe-with-speakers → Parakeet (transcription) + Sortformer (diarization) in parallel → merged by timestamp into speaker-labeled blocks v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour was meant to address: 1. memory cap on the parakeet-asr container so a long-audio crash can't swap-thrash Spark 2 again 2. a chunking proxy in /api/audio/transcribe-with-speakers that splits inputs >10 min before Sortformer Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 08:03:19 -05:00
Keysat	a24610ad2a	v0.12.0:4 - hotfix: torchaudio build fails without --no-build-isolation Build was crashing inside torchaudio's setup.py with: ModuleNotFoundError: No module named 'torch' PIP_CONSTRAINT was correctly pinning torch/torchvision in the install target env, but pip's PEP 517 build isolation creates a SEPARATE fresh Python env just for the build wheel step — and that env has no torch in it. torchaudio's setup.py imports torch to discover CUDA flags, so it crashes. Pip even printed a deprecation warning that this isolation behavior is hardening, not relaxing. Fix: 1. Pre-install torchaudio's build deps (setuptools, wheel, ninja, pybind11) into the main env since we're disabling isolation. 2. Add --no-build-isolation to the torchaudio install so the build uses NGC's torch directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:53:43 -05:00
Keysat	09a1d3590d	v0.12.0:3 - hotfix: build torchaudio from source against NGC's torch NGC PyTorch (the only base with working torch on Spark's ARM64 + sm_120 Blackwell) doesn't ship torchaudio. Stock pip wheels are amd64-only AND ABI-incompatible with NGC's custom torch 2.10.0a anyway. Pip install just fails or crashes at runtime. Real fix: - apt install git cmake build-essential ninja-build - pip install git+https://github.com/pytorch/audio.git@v2.5.1 with TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0" (sm_120 for Blackwell GB10) - this compiles torchaudio against the torch already in the image, so ABI matches by construction Then constraints.txt locks torch + torchvision + torchaudio so the later `pip install whisperx` can't swap any of them. Cost: +3-5 min to the first install. Docker layer cache reuses the built torchaudio on every subsequent rebuild. Torchaudio v2.5.1 is the last tag that builds cleanly against torch 2.5-2.10 — main branch is too volatile against NGC's alpha torch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:40:50 -05:00
Keysat	98aeef8779	v0.12.0:2 - hotfix: pin NGC's torch versions so pip can't break the ABI WhisperX docker build was crashing at the model-prewarm step: OSError: undefined symbol: torch_library_impl Root cause: the NGC PyTorch base ships custom builds of torch + torchaudio + torchvision matched together for Blackwell (sm_120). When pip installed whisperx, it pulled the latest stock torchaudio wheel as a transitive dep, which was compiled against a different libtorch and won't load against NGC's. Fix: at build time, capture NGC's actual torch/torchaudio/torchvision versions into /tmp/torch-constraints.txt, then `pip install -c` that constraint for all subsequent installs. pip can't swap torch out, so the ABI stays consistent. whisperx and pyannote are happy with torch>=2.0 — NGC's 2.10.0a0 satisfies that easily. The pinned versions print to the build log so you can see them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:26:08 -05:00
Keysat	ce5aee1920	v0.12.0:1 - hotfix: WhisperX install fails on first scp because ~ doesn't expand inside shlex.quote() Symptom: "Failed to ship Dockerfile — bash: line 1: ~/whisperx-build/ Dockerfile: No such file or directory" Same bug pattern as v0.8.1:1 (disk probe). shlex.quote() wraps in single quotes, and the remote shell doesn't do tilde expansion inside single quotes — so it tries to write to a literal directory named "~". Fix: use $HOME in double-quoted shell context, which the remote shell expands correctly. The file names (Dockerfile, requirements.txt, etc.) are hardcoded so they're safe to embed unquoted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:16:44 -05:00
Keysat	5a0bfba6a3	v0.12.0:0 - WhisperX as a one-click dashboard install + managed service Replaces the manual rsync+build+run with a proper spark-control feature. First in the audio path that doesn't require shell access on Spark 2. What's in the box ───────────────── * image/whisperx_container/ - the build context (Dockerfile, requirements, app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT + pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint /v1/audio/transcribe-with-speakers returns the exact same shape spark-control's existing endpoint does, so the recap-relay PR spec needs no changes when we cut over. * image/app/whisperx_install.py - install manager. ships build context to Spark 2 over SSH, runs `docker build`, runs `docker run` with 40 GB memory cap (vs Sortformer's unbounded which thrashed Spark 2 on a 90-min file), polls /health until both Whisper + pyannote report loaded. * Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX when its /health reports diarizer_loaded=true, falls back to the legacy Parakeet + Sortformer path otherwise. Same response shape either way. Clean cutover, easy rollback (`docker rm whisperx-asr`). * Dashboard (Audio / Speech tab): - "Add WhisperX" banner appears when not installed, with a primary "Install WhisperX" button. One click triggers the install. - Build progress dialog with phase + elapsed timer + live build log via SSE (`/api/whisperx/install/{job_id}/stream`). - After install, WhisperX auto-registers as a managed service alongside Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart). - Banner self-hides once /api/whisperx/status reports healthy. New endpoints ───────────── GET /api/whisperx/status POST /api/whisperx/install GET /api/whisperx/install/{job_id} GET /api/whisperx/install/{job_id}/stream (SSE phase + log) Config additions (env) ────────────────────── WHISPERX_HOST (defaults to spark2_host) WHISPERX_USER (defaults to spark2_user) WHISPERX_CONTAINER (default: whisperx-asr) WHISPERX_PORT (default: 8002) WHISPERX_MODEL (default: medium; tiny/base/small/medium/large-v3) Dockerfile ────────── Added COPY whisperx_container /app/whisperx_container so the runtime install manager can read the build context from inside the spark-control image and ship it over SSH. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:02:26 -05:00
Keysat	cfc1c408d4	v0.11.0:3 - button sizing fix: unify base .btn to 12px / 6px 12px User feedback: every action button OUTSIDE the parakeet/magpie service cards looked too big. Specifically called out: "Reapply patches", "Restart container", "Switch to this", "Download". The ones on the service cards (Start/Restart/Stop) were the size he liked. Root cause: the base .btn used font: inherit, so it picked up 15px from body. .service-actions .btn was the only place with an explicit font-size: 12px + padding: 6px 12px override. Fix: change .btn base directly to font-size: 12px + padding: 6px 12px. Every button across the dashboard now matches the service-card button footprint. The existing per-context overrides become redundant but remain in place; they no longer create visible differences. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:54:46 -05:00
Keysat	3d273223f2	v0.11.0:2 - pill sizing fix: match .tag exactly to .status "Healthy" pill User feedback: every pill outside the Always-On Services cards was rendering visually taller than the "Healthy" status pill they liked. Root cause was the .tag additions in 0.11.0:1 (line-height: 1.5, display: inline-block) that didn't match the .status pill on service cards (which has neither). Dropped both additions, bumped font-size from 11px → 12px so .tag is now pixel-identical to .status: font-size: 12px; padding: 2px 8px; border-radius: 999px; background: var(--surface-2); border: 1px solid var(--border); Every pill on the dashboard (mode-cluster/mode-solo/cap/on-disk/not-on-disk/ custom-pill/.tag.ok/.tag.warn/.tag.bad) now renders at the same footprint as the Healthy/Unhealthy/Starting pills on the service cards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:46:57 -05:00
Keysat	4aa6cf5046	v0.11.0:1 - dashboard polish: tabs, collapsible endpoint, pill consistency Three UX improvements, all client-side; no backend or behavior changes. 1. LLM / Audio tabs under the hardware section. The single long column got split into two tabbed views: * LLM -> model swap + download panel + spark-vllm-docker updates * Audio -> Parakeet/Magpie services + speech-model patches Selection persists in localStorage; default is LLM. The swap-panel (in-flight LLM swap) sits ABOVE the tab strip so it stays visible regardless of which tab is active. 2. Collapsible OpenAI-compatible Endpoint card. New chevron in the card header collapses everything except the title. State persists per browser via localStorage. Defaults to collapsed since you rarely need the URL/ model details visible (and the same info is one tab swap away). 3. Unified pill sizing. The .sm-pill class in speech-models was rendering subtly larger than .tag pills on model cards. Dropped .sm-pill entirely and reused .tag with semantic color modifiers (.tag.ok / .tag.warn / .tag.bad). Same 11px / 2px×8px footprint everywhere now. Also added explicit line-height: 1.5 + display: inline-block to .tag to lock down vertical sizing. No new endpoints, no new dependencies. Tested locally with node --check and ast.parse(). Verified the tab DOM structure wraps the right sections and the speech-models panel still self-shows/hides on data load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:33:16 -05:00
Keysat	391117f705	v0.11.0:0 - Speech model patches panel (lifecycle for v0.10.0 overlays) Folds the image/parakeet_patches/apply.sh script into a one-click dashboard action and adds drift detection so you can see at a glance whether the parakeet-asr container has the latest Sortformer overlays that spark-control ships. Backend: * image/app/speech_models.py - SpeechModelsManager: reads /health from Parakeet, sha256s the local overlay files inside spark-control's Docker image (/app/parakeet_patches), sha256s the same files inside the parakeet-asr container via `docker exec ... sha256sum`, surfaces in_sync / drift / missing status per file. * GET /api/speech-models - status payload * POST /api/speech-models/reapply - copies overlays into container, verifies python syntax, restarts, polls /health for ~120s, returns step-by-step result * POST /api/speech-models/restart - plain `docker restart parakeet-asr` Dockerfile: now COPY parakeet_patches into the image at /app/parakeet_patches so the runtime can read them. Future spark-control releases auto-carry newer overlay versions; the panel surfaces drift after upgrade. Frontend: new "Speech model patches" section on the dashboard with * Status pill (in sync / drift / missing) * Per-file SHA comparison (local vs container) * Loaded-models pills (ASR + diarizer) * Reapply + Restart buttons (both with confirmation modals) * Live progress display during reapply with per-step ✓/✗ Verified post-install against the running cluster: GET /api/speech-models shows both files in_sync (SHAs match) and both models loaded ready on Spark 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:58:13 -05:00
Keysat	fda23088fe	v0.10.0:1 - hotfix: merge function now joins words with proper spacing Smoke testing v0.10.0:0 against a real anarlog audio.mp3 showed the output running words together: "I'mrecordingrightnow", "don'tyoutry". Root cause: _merge_words_with_speakers was doing "".join(cur_words), assuming Parakeet returns words with leading whitespace (which the hyprnote local Parakeet does, but the Spark-hosted Parakeet does not). Rewrote the join with a small helper that: - Strips each token (handles both leading-space and no-leading-space word formats) - Joins with a single space - Keeps punctuation tight — no space before period/comma/colon/etc. Verified post-install with the same test audio: [00:06] Speaker_0: I'm I'm recording right now. [00:18] Speaker_1: you're you're on your computer and your phone, right? No other changes — Parakeet container patches and the endpoint shape stay identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:42:04 -05:00
Keysat	713cd09cc2	v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers Adds a new pipeline for diarized transcription that any client (recap-relay, ad-hoc curl, future Mac-side tools) can call. Pure data pipeline, no LLM or UI included — name resolution / analysis happen downstream where prompts and rendering are configurable. Architecture: Spark 2 / parakeet-asr container: + /opt/parakeet/app/diarizer.py (new: SortformerDiarizer class) + /opt/parakeet/app/main.py (patched: loads diarizer, adds /v1/audio/diarize endpoint) Model: nvidia/diar_sortformer_4spk-v1 (~150 MB, ungated, NeMo native) Spark Control: + POST /api/audio/transcribe-with-speakers Body: multipart file Returns: { duration, language, speakers_detected, segments: [{start_ms, end_ms, speaker, text}, ...], models: {transcription, diarization} } Runs Parakeet ASR + Sortformer in parallel, merges words to speaker turns by timestamp, groups into speaker-change blocks (breaks also on >1.5s silence gaps). + If Parakeet 500s mid-pipeline, kicks deep-health probe and returns 503/Retry-After: 60 — same wedge-recovery pattern as v0.9.0:2. Apply Sortformer patches to the running Parakeet container with: bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user> Patches are reversible — apply.sh backs up the original main.py inside the container at main.py.pre-sortformer before overwriting. Restore by copying that file back and removing diarizer.py, then docker restart. v0.11 follow-up: dashboard "Speech Models" panel to swap/update model versions from the UI instead of needing to re-run apply.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:14:48 -05:00
Keysat	197655a62b	v0.9.0:2 - audio proxy: turn Parakeet wedge 500 into clean 503 + immediate auto-restart Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention) fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the proxy relayed the upstream 500 verbatim, Open WebUI showed "Server connection error" with no signal to retry, and recovery took up to 5 minutes (waiting for the next periodic deep-health probe). Now the proxy: 1. Detects 500 from /v1/audio/transcriptions 2. Fires deep_health.run_one("parakeet") as a background asyncio task (which contains the same wedge-detect + rate-limited auto-restart logic, but runs immediately instead of waiting for the next tick) 3. Returns 503 with a clear detail message and Retry-After: 60 The client (Open WebUI, Home Assistant, etc.) gets a proper retry signal; the auto-restart triggers inside seconds; the next attempt ~60s later succeeds. Rate-limiting (3 restarts per 30 min) is inherited from the deep-health module so this can't cause restart storms. server.py: pass deep_health into build_audio_router(). audio_proxy.py: new 503-with-restart branch; signature now accepts deep_health as an optional dependency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 18:07:35 -05:00
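
The v0.9.0:2 entry above describes turning an upstream 500 into a 503 with Retry-After: 60, with restarts capped at 3 per 30 minutes. A minimal sketch of that pattern follows. The names RestartLimiter and map_upstream_status are hypothetical, and this is not the code in audio_proxy.py or the deep-health module; it only illustrates a sliding-window cap under those stated numbers.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class RestartLimiter:
    """Sliding-window cap: at most max_restarts within window_s seconds.

    Hypothetical helper; the 3-per-30-min defaults come from the commit text.
    """

    def __init__(
        self,
        max_restarts: int = 3,
        window_s: float = 30 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._clock = clock  # injectable so the window can be tested offline
        self._stamps: Deque[float] = deque()

    def allow(self) -> bool:
        """Record and permit a restart unless the window is already full."""
        now = self._clock()
        # Drop restart timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False
        self._stamps.append(now)
        return True


def map_upstream_status(
    status: int, limiter: RestartLimiter
) -> Tuple[int, Dict[str, str], bool]:
    """Map an upstream status to (client status, headers, restart allowed).

    Assumption: any upstream 500 is treated as a wedge. The client always
    gets 503 + Retry-After; the limiter only decides whether a restart runs.
    """
    if status == 500:
        return 503, {"Retry-After": "60"}, limiter.allow()
    return status, {}, False
```

Keeping the limiter separate from the status mapping means the client still gets a retry signal when the restart budget is exhausted, which is what prevents restart storms without hiding the failure.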


84 Commits