Commit Graph

84 Commits

Author SHA1 Message Date
Keysat b67e001642 docs: v0.26.0:0 live + published to registry; surface Gemma-26B eval as next 2026-06-18 12:35:16 -05:00
Keysat df9f244eae v0.26.0:0 - disk-driven model menu (scan sparks; recipes; needs-setup)
The dashboard menu is now the set of models actually downloaded on the
Sparks, not a hard-coded catalog. models.yaml + overrides are reframed as
launch recipes matched to an on-disk model by repo; an on-disk model with
no recipe is flagged needs_setup and its launch settings are inferred from
its config.json for a one-time operator confirmation (discovery.py).

- delete now removes weights AND the menu card (delete_from_disk sweeps all
  hosts; the delete endpoint resolves keys via the live menu)
- new GET /api/models/suggest; /api/models returns the menu + a recipes list
  (download autocomplete); GET /api/models/disk-status removed
- dropped the two legacy Qwen recipes (235B FP8, 2.5 72B)
- tests: +test_discovery.py (cache parsing, infer_recipe, build_menu merge)
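
The recipe-to-disk merge described above can be sketched roughly as follows. This is only an illustration: the function signature and the dict shapes are assumptions, not the contents of discovery.py.

```python
def build_menu(on_disk_repos, recipes):
    """Merge on-disk models with launch recipes, matched by repo.

    on_disk_repos: repo ids found on the Sparks (assumed shape).
    recipes: list of dicts carrying at least a "repo" key (assumed shape).
    """
    by_repo = {recipe["repo"]: recipe for recipe in recipes}
    menu = []
    for repo in sorted(set(on_disk_repos)):
        recipe = by_repo.get(repo)
        if recipe is None:
            # On disk but no recipe: flag for a one-time operator confirmation.
            menu.append({"repo": repo, "needs_setup": True})
        else:
            menu.append({**recipe, "needs_setup": False})
    return menu


menu = build_menu(
    ["org/model-a", "org/model-b"],
    [{"repo": "org/model-a", "args": ["--max-model-len", "8192"]}],
)
```

A recipe with no matching on-disk model never reaches the menu in this sketch; it would only appear in the separate recipes list used for download autocomplete.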
v0.26.0
2026-06-18 11:09:56 -05:00
Keysat c0b35184ba docs: trim Current state to live status — coordination epic shipped 2026-06-18 08:09:59 -05:00
Keysat 7ecd77f1e5 docs: defer raw-docker swap generalization — multi-node rationale recorded 2026-06-18 07:58:25 -05:00
Keysat 6bcda6e348 docs: v0.25.0:0 installed live — update Current state 2026-06-18 07:11:33 -05:00
Keysat 7ae6ab3ba8 v0.25.0:0 - cluster coordination layer (swap lock + webhook + schedule registry)
GPU-arbiter safety layer for when automation, not just the dashboard, swaps
models:
- swap reservation lock (POST/GET/DELETE /api/swap/lock); 423-enforced in
  post_swap via a single-read gate, TTL-bounded, secret-token auth, human
  force-release override + dashboard banner
- swap webhook (swap_complete/swap_failed) fired outside the swap lock, optional
  HMAC signature, configurable URL+secret
- read-only schedule registry (GET/POST/DELETE /api/schedule) + dashboard panel
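
A minimal in-memory sketch of a TTL-bounded reservation lock with secret-token auth, as the first bullet describes. The class name, method names, and injectable clock are hypothetical; the real logic lives in image/app/coordination.py and may differ.

```python
import secrets
import time


class SwapLock:
    """Illustrative reservation lock: one holder, bounded by a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def _held(self):
        return self._token is not None and self._clock() < self._expires_at

    def acquire(self, ttl_s):
        """Return a secret token, or None while another holder is active."""
        if self._held():
            return None
        self._token = secrets.token_hex(16)
        self._expires_at = self._clock() + ttl_s
        return self._token

    def release(self, token):
        """Release only when the caller presents the current token."""
        if self._held() and secrets.compare_digest(token, self._token):
            self._token = None
            return True
        return False

    def swap_status(self, token=None):
        """HTTP status a swap request would see: 423 when someone else holds the lock."""
        if not self._held():
            return 200
        if token is not None and secrets.compare_digest(token, self._token):
            return 200
        return 423
```

Because the lock expires on its own, a crashed automation client cannot block dashboard swaps for longer than the TTL it requested.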

New module image/app/coordination.py; docs/COORDINATION.md for consumers; 22
offline tests in test_coordination.py.
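
For consumers of the swap webhook, the optional HMAC signature could be produced and checked along these lines. The choice of SHA-256, the hex encoding, and the payload field name are assumptions for illustration; docs/COORDINATION.md is the authority on the actual scheme.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex HMAC over the raw request body (SHA-256 is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_payload(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison so the check does not leak timing."""
    return hmac.compare_digest(sign_payload(secret, body), signature)


# Hypothetical payload shape; only the event names come from the commit.
body = json.dumps({"event": "swap_complete"}).encode()
sig = sign_payload(b"shared-secret", body)
```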
v0.25.0
2026-06-18 07:07:08 -05:00
Keysat dd3d1412d4 docs: v0.24.0:0 committed/tagged/pushed — Gitea release asset + live install still pending 2026-06-17 23:11:14 -05:00
Keysat 26070eb191 v0.24.0:0 - configurable cluster topology (vllm container name, hide services, second-vllm monitor)
Make the cluster topology configurable so an adopter wired differently
(vLLM on both Sparks, port 8000, different container name, no Parakeet)
can monitor without forking. Covers the OpenClaw report P4/P5/#6.

- VLLM_CONTAINER override (default vllm_node), validated at the boundary
  and quote_arg-quoted into the swap log-tail + pre-flight validator exec.
- DISABLED_SERVICES list: hidden services show no tile and are skipped by
  status/deep-health/connectivity probes (kills the Parakeet-on-8000
  collision).
- kind: vllm custom service monitors a second Spark's vLLM via the shared
  probe_vllm_endpoint; /api/endpoints gains a disabled flag.

Swap mechanism intentionally not generalized to raw docker run (that's
coordination, roadmap item 4).
v0.24.0
2026-06-17 23:03:33 -05:00
Keysat 90394f891b docs: v0.23.0 published, live install pending (mDNS); runbook sideload troubleshooting 2026-06-17 22:36:41 -05:00
Keysat e783653ef0 v0.23.0:0 - local / fine-tuned model support
Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes),
not just Hugging Face repos.

- ModelDef gains local_path; a model must set exactly one of repo / local_path.
  The validator also enforces the local-path whitelist and that any
  --chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME
  host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook,
  then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream
  expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'.
- POST /api/models validates the full entry via ModelDef before persisting, so a
  bad entry can't be written and then break catalog load; _merge_overrides skips
  an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF
  link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal,
  exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent
  pass; findings addressed.
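
The three local-path rules listed above (absolute path, charset whitelist, no '.'/'..' segments) can be sketched as below. `check_local_path` is a stand-in name and the exact character set is an assumption; the real validator is shellsafe.validate_local_path.

```python
import re

# Assumed whitelist: letters, digits, dot, underscore, slash, hyphen.
_SAFE_PATH = re.compile(r"/[A-Za-z0-9._/-]+")


def check_local_path(path: str) -> str:
    """Return the path unchanged if it passes all three rules, else raise."""
    if not _SAFE_PATH.fullmatch(path):
        raise ValueError("local_path must be absolute and use only [A-Za-z0-9._/-]")
    if any(part in (".", "..") for part in path.split("/")):
        raise ValueError("local_path must not contain '.' or '..' segments")
    return path
```

Rejecting '..' segments outright, rather than normalizing them, keeps the value that is validated identical to the value that is later bind-mounted.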
v0.23.0
2026-06-17 22:27:41 -05:00
Keysat 57a893000e docs: document the Gitea release ritual in startos-package guide 2026-06-17 21:29:27 -05:00
Keysat 56f7ea4444 fix: gitea-release.sh tolerate 404 on tag lookup; report HTTP errors; mark v0.22.0 published 2026-06-17 21:23:21 -05:00
Keysat aaad57d88f docs: mark v0.22.0:0 shipped + record Gitea-release distribution decision 2026-06-17 19:47:49 -05:00
Keysat 136a4713a1 v0.22.0:0 - configurable vllm port; gitea-release tooling; coexistence roadmap
- Configure Sparks gains a vLLM port field (blank => 8888, our launch-cluster.sh
  default); VLLM_PORT plumbed configureSparks -> sparkConfig.yaml -> main.ts env
  -> config.py. So an adopter whose vLLM listens elsewhere (e.g. 8000) can fix
  the "vLLM unreachable" health check without rebuilding the package.
- Harden numeric env parsing (config._env_int): a blank or malformed port now
  falls back to its default instead of crashing daemon startup (closes a P3
  tech-debt item; the Configure panel passes unset optional fields as "").
- Add scripts/gitea-release.sh + `make release` to publish the built s9pk to
  Gitea Releases, so the OpenClaw adopter pulls updates with a read-only token
  instead of being hand-sent the package.
- Capture the OpenClaw/Johnny-5 coexistence epic and the "control plane, not a
  job runner" stance in ROADMAP.md and Current state.
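
The hardened numeric parsing described in the second bullet amounts to something like the following. This is a sketch, not the body of config._env_int; the `environ` parameter is added here only to make the example testable.

```python
import os


def env_int(name: str, default: int, environ=os.environ) -> int:
    """Parse an integer env var; blank or malformed values fall back to default."""
    raw = environ.get(name, "").strip()
    try:
        return int(raw)
    except ValueError:
        # Covers "" (unset optional field passed as empty string) and junk like "80a".
        return default


port = env_int("VLLM_PORT", 8888, {"VLLM_PORT": ""})
```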
v0.22.0
2026-06-17 19:45:09 -05:00
Keysat c179389731 docs: trim Current state post-matrix-bridge ship; add bot-tile ops note to runbook 2026-06-15 23:18:28 -05:00
Keysat 9debeb4bbe v0.21.0:1 - tidy host display for port-less bot tile 2026-06-15 23:09:24 -05:00
Keysat 39f8410623 v0.21.0:0 - matrix-bridge bot tile (status, update, restart, logs) 2026-06-15 22:57:40 -05:00
Keysat e307a08f05 docs: refresh Current state for handoff — harness shipped, parakeet deferred, finished narrative pruned 2026-06-15 18:32:57 -05:00
Keysat 89338c97f5 test: cover shellsafe validators (repo/image/container injection boundary) 2026-06-15 18:17:35 -05:00
Keysat d9c098262f docs(roadmap): defer parakeet long-audio guard; record rationale + impl shortcut 2026-06-15 17:44:48 -05:00
Keysat 6238ac88f7 test: add offline pytest harness (build_launch_command injection, label-merge) 2026-06-15 17:24:49 -05:00
Keysat 17a9973ba2 docs(roadmap): add local-path / fine-tuned model support to backlog 2026-06-15 16:23:44 -05:00
Keysat e87158c492 v0.20.0:0 - per-spark ssh-key copy + wireguard status badge 2026-06-15 09:53:40 -05:00
Keysat 5341fcc506 Add inbox-check line; align .gitignore with canonical .claude policy
Cross-repo git-hygiene audit remediation: surface ~/Projects/standards/INBOX.md items at session start, and switch .gitignore to the deny-by-default .claude/* block (shared wiring allow-listed) plus the canonical secrets/env lines — per standards/portability.md.
2026-06-14 12:17:16 -05:00
Keysat 05d03beeeb docs: handoff — trim Current state, move full-eval debt to ROADMAP, record SSH-input + CSRF conventions
- AGENTS.md: rewrite Current state lean for v0.19.0:0; drop the now-completed
  full-eval triage block (history lives in git log + EVALUATION.md).
- docs/guides/fastapi-image.md: add two durable conventions — user values
  crossing into SSH must go through shellsafe; new endpoints and the
  csrf_guard exempt-prefix rule.
- ROADMAP.md: park the remaining non-blocking P2/P3 tech debt from the eval.
2026-06-12 17:10:03 -05:00
Keysat 56a519ff4f docs: record git-history scrub; fix stale SHAs and IP-fragment remnants
History was rewritten with git filter-repo to purge owner-specific values
(IPs, hostnames, SSH username, key name, personal names) from all commits,
tags, and messages — including three LAN IPs and one Start9 address the
v0.18.0:1 working-tree scrub had missed (one still live in HEAD at
docs/AUDIO_API.md). Verified 0 hits across all refs.

- AGENTS.md: Portability + Repo-wart + work-queue #2 + shipping note updated;
  commit-SHA references repointed to post-rewrite SHAs (367d986->8d839e3).
- EVALUATION.md: P0 owner-data finding marked resolved; cleaned shorthand
  IP-octet fragments (/.87, /11) left by the placeholder substitution.
2026-06-12 16:55:08 -05:00
Keysat 1c4e861783 v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf
Triaged from a full independent evaluation (EVALUATION.md). Addresses the
three P0/P1 code findings; the proxy/data APIs that downstream apps consume
are deliberately untouched.

- ssh command injection (P0): new shellsafe.py validates + shlex.quotes every
  user-supplied value crossing into an SSH command on the Sparks (model repo,
  vllm args/knobs, NIM image/container/volume/port/env, service names).
  Boundary validation on POST /api/models and POST /api/nim/install; quoting at
  every sink in models/download/nim/services. NGC key now quoted too.
- qdrant path injection (P1): /api/search validates the collection name against
  a metacharacter-free whitelist and URL-encodes the path segment.
- csrf (P1): csrf_guard middleware enforces same-origin on state-changing
  control endpoints; /v1/*, /scrub, /rehydrate, /api/search, /api/audio/* and
  /api/health-event are exempt so external consumers are unaffected.
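
The qdrant fix combines a whitelist check with URL-encoding of the path segment. A minimal sketch, assuming a letters/digits/underscore/hyphen whitelist and Qdrant's points-search path (both are assumptions about what /api/search does internally):

```python
import re
from urllib.parse import quote

# Assumed whitelist; the real one may be stricter or looser.
_COLLECTION = re.compile(r"[A-Za-z0-9_-]{1,255}")


def collection_search_path(name: str) -> str:
    """Validate the collection name, then encode it as a single path segment."""
    if not _COLLECTION.fullmatch(name):
        raise ValueError("invalid collection name")
    # safe="" also encodes "/", so the name can never span two segments.
    return "/collections/" + quote(name, safe="") + "/points/search"
```

The encoding step is redundant for names that pass the whitelist; keeping both means a later loosening of the regex does not reopen the path injection.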

Verified: injection survives only as a single quoted token, vLLM preflight
shlex.split round-trip intact, CSRF behaviors covered via TestClient, both
offline redaction suites still pass, tsc clean, s9pk rebuilt.
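
The "single quoted token" check mentioned above can be reproduced with the standard library alone. The command shape is hypothetical; the point is only that `shlex.quote` followed by `shlex.split` returns the hostile value as one argument.

```python
import shlex


def build_remote_command(repo: str) -> str:
    """Hypothetical sink: embed a user-supplied repo in a remote shell command."""
    return "vllm serve " + shlex.quote(repo)


hostile = "org/model; rm -rf ~"
cmd = build_remote_command(hostile)
# A POSIX shell would tokenize the command the same way shlex.split does here.
tokens = shlex.split(cmd)
```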
2026-06-12 16:36:33 -05:00
Keysat 98988057a2 v0.18.0:1 - scrub owner-specific hostnames, ips, usernames, names from tracked files
Replace real cluster IPs/hosts/usernames and example names with neutral
placeholders across docs, ops notes, package install text, and the offline
redaction test; delete the obsolete build-time starter prompt. Closes the
portability audit's single blocker. No runtime behavior change.
2026-06-12 15:07:34 -05:00
Keysat 5e6db2f63b docs: record canonical AGENTS.md / symlink layout convention 2026-06-12 14:31:54 -05:00
Keysat 6a6112a15f restructure: AGENTS.md canonical + docs/guides with .claude/rules symlinks
Rename CLAUDE.md -> AGENTS.md (cross-vendor standard) with a relative
CLAUDE.md symlink so Claude Code still loads it. Move each .claude/rules
file into docs/guides/ (paths: frontmatter preserved) and replace the
rules file with a relative symlink into the guide. Repoint the AGENTS.md
index paragraph at docs/guides/ so non-Claude agents find the guides.
2026-06-12 14:27:17 -05:00
Keysat d8975bebf7 docs: note self-hosted gitea remote in current state 2026-06-11 19:25:21 -05:00
Keysat 9ef9226e0a docs: split CLAUDE.md into path-scoped .claude/rules; fix dev/test commands
- CLAUDE.md trimmed to whole-repo facts (58 lines); subsystem guidance
  moved to .claude/rules/{startos-package,fastapi-image,redaction,
  audio-speech}.md with paths: frontmatter so each loads only when
  matching files are touched
- .gitignore: track .claude/rules/ while keeping the rest of .claude/
  (settings.local.json) ignored
- test-audio-with-speakers.sh: require audio-file arg in docs, replace
  owner-specific SPARK_CONTROL/VLLM defaults with generic ones
  (localhost dev server + Spark Control vLLM proxy), discover the
  loaded LLM via /api/status since /v1/models lists audio models only
- document REDACTION_MAP_DB + CONNECTIVITY_LOG as required for local
  dev (/data only exists in the container)
- prettier pass over startos/actions (formatting drift)
2026-06-11 19:12:23 -05:00
Keysat 7e8175d857 docs: add CLAUDE.md (agent guide) + ROADMAP.md (longer-term backlog) 2026-06-11 17:59:08 -05:00
Keysat 8d839e3714 v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API
- Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests)
- Add embeddings proxy and spark_embed service (Dockerfile + main.py)
- Expand audio_proxy with speaker-aware handling; deep_health/health/server updates
- Package: configureSparks action + sparkConfig model updates, manifest/main wiring
- Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh
2026-06-11 17:45:57 -05:00
Keysat 4a75274db3 v0.13.0:3 - proxy /v1/chat/completions through Spark Control to vLLM
Recap Relay dev caught that all audio endpoints route through Spark
Control but chat-completions didn't — clients had to know about both
SC AND the direct vLLM URL on Spark 1. Closes that last gap.

New endpoints:
  POST /v1/chat/completions   — OpenAI-shape, forwards to vLLM on Spark 1
  POST /v1/completions        — legacy OpenAI completions, same path

Implementation (image/app/llm_proxy.py):
  - Dumb forwarder: request body passed through verbatim, response body
    streamed back chunk-by-chunk. No transformation. vLLM already speaks
    the same shape; adding any logic here would just create skew.
  - Streaming: parses body for `stream: true` and uses httpx.AsyncClient
    .stream() + FastAPI StreamingResponse if so. Non-streaming path is
    a simple post-and-return.
  - 30-minute timeout to accommodate large-context completions (default
    httpx 5s would kill anything substantial).
  - On upstream non-200 in streaming mode: emits one SSE `error` event
    so the client's parser doesn't hang on an empty stream forever.
  - On upstream connection error: HTTP 502 with "vllm unreachable" detail.
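
Two small pieces of the forwarder are easy to show in isolation: detecting `stream: true` in the request body, and formatting a single SSE `error` event. This is a sketch with hypothetical helper names and a hypothetical error payload; it leaves out the httpx and FastAPI plumbing in image/app/llm_proxy.py.

```python
import json


def wants_stream(raw_body: bytes) -> bool:
    """True only when the JSON body is an object with stream set to true."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes; treat as non-streaming.
        return False
    return isinstance(payload, dict) and payload.get("stream") is True


def sse_error_event(status: int, detail: str) -> str:
    """One Server-Sent Events frame: event name, data line, blank-line terminator."""
    data = json.dumps({"status": status, "detail": detail})
    return f"event: error\ndata: {data}\n\n"
```

The body itself is still forwarded verbatim; it is parsed only to choose between the streaming and non-streaming paths.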

Now clients can use ONE host for everything:
  POST https://spark-control/api/audio/diarize-chunk
  POST https://spark-control/v1/audio/transcriptions
  POST https://spark-control/v1/chat/completions
  GET  https://spark-control/api/endpoints  (still works for clients that
                                              prefer the direct URLs)

No parakeet container changes. No Reapply patches needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 19:58:19 -05:00
Keysat c7f94381e7 v0.13.0:2 - per-segment confidence in diarize-chunk response
Recap Relay dev asked: can the diarization output include a confidence
level per segment so the UI can render "Speaker_0?" for uncertain
assignments rather than confidently mislabeling?

Answer: yes. Sortformer's diarize() with include_tensor_outputs=True
returns the per-frame per-speaker sigmoid scores (shape [B, T, 4spk],
~12.6 fps frame rate). The current code argmaxes those into segment
strings and throws the raw scores away. Now: for each output segment,
compute mean probability of the assigned speaker across the segment's
frames → confidence in [0, 1].

Implementation:
  - diarizer.py: diarize_chunk() now calls diarize() with
    include_tensor_outputs=True, and a new _attach_confidence() helper
    derives the per-segment mean probability after parsing the segment
    strings. The frame-rate is computed from tensor shape vs audio
    duration (no need to hard-code the model's stride).
  - All failure paths return confidence=None gracefully — Recap Relay
    can treat None as "no info" or fall back to a default threshold.
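
The per-segment confidence described above (mean probability of the assigned speaker over the segment's frames, with the frame rate derived from tensor length and audio duration) can be sketched in plain Python. The function name and the list-of-lists input are assumptions; the actual helper is _attach_confidence() in diarizer.py and works on tensors.

```python
def segment_confidence(scores, duration_s, start_s, end_s, speaker_idx):
    """Mean score of one speaker across a segment's frames, or None on failure.

    scores: per-frame lists of per-speaker sigmoid scores, shape [T][S].
    """
    if not scores or duration_s <= 0:
        return None
    # Frame rate from shape vs. duration, e.g. 63 frames / 5 s = 12.6 fps.
    fps = len(scores) / duration_s
    first = max(0, int(start_s * fps))
    last = min(len(scores), max(first + 1, int(round(end_s * fps))))
    frames = scores[first:last]
    if not frames:
        return None
    return sum(frame[speaker_idx] for frame in frames) / len(frames)
```

A consumer could then render "Speaker_0?" whenever the returned value falls below a threshold of its choosing, and treat None as "no info".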

Endpoint shape change: segments[] now have an optional `confidence`
field in [0, 1] (or None). All other fields unchanged. Existing callers
that ignore the field aren't affected.

Verified with a 5s test signal that the tensor has shape [1, 63, 4]
(63 frames / 5s = 12.6 fps) and values in [0, 1] (sigmoid outputs,
independent per speaker so overlap detection works). Real speech values
will be much higher than the near-zero values of the pure-tone test
signal.

Reapply patches on the Speech Models card after installing v0.13.0:2
to pick up the updated diarizer.py + main.py in the parakeet container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:36:25 -05:00
Keysat e775906caa v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints
Spark Control now exposes a per-chunk worker designed for Recap Relay
to orchestrate against. Recap Relay does the chunking + global speaker
clustering (consistent with how it already handles the Gemini path);
Spark Control handles the GPU-bound per-chunk work.

Parakeet container:
  - diarizer.py: now also loads NVIDIA TitaNet speaker-verification model
    (~25 MB, NeMo-native, no torchaudio). New diarize_chunk() method
    runs Sortformer + extracts one 192-dim voice fingerprint per detected
    local speaker (concatenating each speaker's audio across the chunk
    and running TitaNet's get_embedding).
  - main.py: new POST /v1/audio/diarize-chunk endpoint that returns
    segments + speakers_detected + fingerprints + models in one shot.

Spark Control:
  - new POST /api/audio/diarize-chunk that proxies to parakeet's new
    endpoint. Same CUDA-wedge recovery (503 + deep-health probe + 60s
    retry-after) as the other audio endpoints. Returns the raw upstream
    JSON because Recap Relay is the consumer; no merging needed.

Response shape Recap Relay receives per chunk:
  {
    "duration": 300.0,
    "segments":  [{"start_s","end_s","speaker"}, ...],   # LOCAL labels
    "speakers_detected": ["Speaker_0","Speaker_1",...],
    "fingerprints": {"Speaker_0":[192 floats], ...},
    "models": {"diarization":"...","embedding":"..."}
  }

Recap Relay's job:
  1. Chunk audio (existing chunking infrastructure)
  2. POST each chunk to /api/audio/diarize-chunk in parallel
  3. Collect all fingerprints from all chunks
  4. sklearn AgglomerativeClustering(distance_threshold=0.7, metric=cosine)
  5. Re-label segments with global cluster IDs
  6. Concatenate transcripts (from a separate parallel call to
     /v1/audio/transcriptions) with timestamp offsets and merge with
     re-labeled diar segments
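
Steps 4 to 6 on the consumer side can be illustrated without any clustering library. The sketch below shows the cosine distance that the 0.7 threshold is compared against, and the re-labeling of one chunk's segments once a local-to-global speaker mapping exists. The `global_ids` mapping and both function names are hypothetical; producing the mapping is the clustering step's job.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def relabel_chunk(segments, chunk_index, chunk_offset_s, global_ids):
    """Shift chunk-local times to absolute times and swap in global speaker IDs.

    global_ids maps (chunk_index, local_label) -> global label (assumed shape).
    """
    return [
        {
            "start_s": seg["start_s"] + chunk_offset_s,
            "end_s": seg["end_s"] + chunk_offset_s,
            "speaker": global_ids[(chunk_index, seg["speaker"])],
        }
        for seg in segments
    ]
```

Local labels are only meaningful within one chunk, so "Speaker_0" in chunk 0 and "Speaker_0" in chunk 1 may map to different global IDs.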

After installing v0.13.0:1, click "Reapply patches" on the Speech Models
card to push the updated diarizer.py + main.py into the parakeet
container — TitaNet will download (~25 MB) on first call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:37:05 -05:00
Keysat 95524f4983 v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer
After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we
never got a working docker build. The fundamental constraint isn't
patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that
runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch
2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere.
WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't
satisfy without rebuilding torchaudio against torch 2.10's alpha API.
Walking away cleanly is better than another night of chasing.

Removed from the codebase:
  - image/whisperx_container/* (Dockerfile + requirements + app/main.py)
  - image/app/whisperx_install.py (install manager + SSH ship-context logic)
  - image/Dockerfile COPY whisperx_container
  - WHISPERX_* config keys in config.py
  - whisperx service entry in services.py
  - WhisperX-preferred branch in audio_proxy.py
  - /api/whisperx/* endpoints in server.py
  - install banner + progress dialog in index.html
  - render + handlers in app.js
  - .whisperx-install styles in style.css

Spark 2 cleaned in tandem (user-authorized): container removed,
~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of
builder cache reclaimed. parakeet-asr and magpie-tts unaffected and
healthy throughout.

The audio path is back to exactly what shipped in v0.11.0:3:
  POST /api/audio/transcribe-with-speakers
    → Parakeet (transcription) + Sortformer (diarization) in parallel
    → merged by timestamp into speaker-labeled blocks

v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour
was meant to address:
  1. memory cap on the parakeet-asr container so a long-audio crash
     can't swap-thrash Spark 2 again
  2. a chunking proxy in /api/audio/transcribe-with-speakers that
     splits inputs >10 min before Sortformer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:03:19 -05:00
Keysat a24610ad2a v0.12.0:4 - hotfix: torchaudio build fails without --no-build-isolation
Build was crashing inside torchaudio's setup.py with:
  ModuleNotFoundError: No module named 'torch'

PIP_CONSTRAINT was correctly pinning torch/torchvision in the install
target env, but pip's PEP 517 build isolation creates a SEPARATE fresh
Python env just for the build wheel step — and that env has no torch
in it. torchaudio's setup.py imports torch to discover CUDA flags, so
it crashes. Pip even printed a deprecation warning that this isolation
behavior is hardening, not relaxing.

Fix:
  1. Pre-install torchaudio's build deps (setuptools, wheel, ninja,
     pybind11) into the main env since we're disabling isolation.
  2. Add --no-build-isolation to the torchaudio install so the build
     uses NGC's torch directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:53:43 -05:00
Keysat 09a1d3590d v0.12.0:3 - hotfix: build torchaudio from source against NGC's torch
NGC PyTorch (the only base with working torch on Spark's ARM64 + sm_120
Blackwell) doesn't ship torchaudio. Stock pip wheels are amd64-only AND
ABI-incompatible with NGC's custom torch 2.10.0a anyway. Pip install
just fails or crashes at runtime.

Real fix:
  - apt install git cmake build-essential ninja-build
  - pip install git+https://github.com/pytorch/audio.git@v2.5.1
    with TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0" (sm_120 for Blackwell GB10)
  - this compiles torchaudio against the torch already in the image, so
    ABI matches by construction

Then constraints.txt locks torch + torchvision + torchaudio so the later
`pip install whisperx` can't swap any of them.

Cost: +3-5 min to the first install. Docker layer cache reuses the
built torchaudio on every subsequent rebuild.

Torchaudio v2.5.1 is the last tag that builds cleanly against
torch 2.5-2.10 — main branch is too volatile against NGC's alpha torch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:40:50 -05:00
Keysat 98aeef8779 v0.12.0:2 - hotfix: pin NGC's torch versions so pip can't break the ABI
WhisperX docker build was crashing at the model-prewarm step:
  OSError: undefined symbol: torch_library_impl

Root cause: the NGC PyTorch base ships custom builds of torch +
torchaudio + torchvision matched together for Blackwell (sm_120). When
pip installed whisperx, it pulled the latest stock torchaudio wheel as
a transitive dep, which was compiled against a different libtorch and
won't load against NGC's.

Fix: at build time, capture NGC's actual torch/torchaudio/torchvision
versions into /tmp/torch-constraints.txt, then `pip install -c` that
constraint for all subsequent installs. pip can't swap torch out, so
the ABI stays consistent. whisperx and pyannote are happy with
torch>=2.0 — NGC's 2.10.0a0 satisfies that easily.

The pinned versions print to the build log so you can see them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:26:08 -05:00
Keysat ce5aee1920 v0.12.0:1 - hotfix: WhisperX install fails on first scp because ~ doesn't expand inside shlex.quote()

Symptom: "Failed to ship Dockerfile — bash: line 1: ~/whisperx-build/
Dockerfile: No such file or directory"

Same bug pattern as v0.8.1:1 (disk probe). shlex.quote() wraps in single
quotes, and the remote shell doesn't do tilde expansion inside single
quotes — so it tries to write to a literal directory named "~".

Fix: use $HOME in double-quoted shell context, which the remote shell
expands correctly. The file names (Dockerfile, requirements.txt, etc.)
are hardcoded so they're safe to embed unquoted.
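
The quoting behavior behind this bug is easy to reproduce locally. The `mkdir` command is only an example; the actual commands in whisperx_install.py are not shown here.

```python
import shlex

remote_dir = "~/whisperx-build"

# shlex.quote wraps the value in single quotes, so the remote shell
# sees a literal "~" and never performs tilde expansion.
broken = "mkdir -p " + shlex.quote(remote_dir)

# $HOME inside double quotes is still expanded by the remote shell.
# The directory name is hardcoded, so nothing user-supplied is embedded.
fixed = 'mkdir -p "$HOME/whisperx-build"'
```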

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:16:44 -05:00
Keysat 5a0bfba6a3 v0.12.0:0 - WhisperX as a one-click dashboard install + managed service
Replaces the manual rsync+build+run with a proper spark-control feature.
It is the first piece of the audio path that doesn't require shell access
on Spark 2.

What's in the box
─────────────────
* image/whisperx_container/   - the build context (Dockerfile, requirements,
  app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT +
  pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint
  /v1/audio/transcribe-with-speakers returns the exact same shape spark-
  control's existing endpoint does, so the recap-relay PR spec needs no
  changes when we cut over.

* image/app/whisperx_install.py - install manager. Ships the build context to
  Spark 2 over SSH, runs `docker build`, runs `docker run` with a 40 GB
  memory cap (Sortformer's container was unbounded and thrashed Spark 2 on a
  90-min file), polls /health until both Whisper and pyannote report loaded.

* Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX
  when its /health reports diarizer_loaded=true, falls back to the legacy
  Parakeet + Sortformer path otherwise. Same response shape either way.
  Clean cutover, easy rollback (`docker rm whisperx-asr`).

* Dashboard (Audio / Speech tab):
  - "Add WhisperX" banner appears when not installed, with a primary
    "Install WhisperX" button. One click triggers the install.
  - Build progress dialog with phase + elapsed timer + live build log via
    SSE (`/api/whisperx/install/{job_id}/stream`).
  - After install, WhisperX auto-registers as a managed service alongside
    Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart).
  - Banner self-hides once /api/whisperx/status reports healthy.

New endpoints
─────────────
  GET  /api/whisperx/status
  POST /api/whisperx/install
  GET  /api/whisperx/install/{job_id}
  GET  /api/whisperx/install/{job_id}/stream  (SSE phase + log)

Config additions (env)
──────────────────────
  WHISPERX_HOST       (defaults to spark2_host)
  WHISPERX_USER       (defaults to spark2_user)
  WHISPERX_CONTAINER  (default: whisperx-asr)
  WHISPERX_PORT       (default: 8002)
  WHISPERX_MODEL      (default: medium; tiny/base/small/medium/large-v3)

Dockerfile
──────────
Added COPY whisperx_container /app/whisperx_container so the runtime
install manager can read the build context from inside the spark-control
image and ship it over SSH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:02:26 -05:00
Keysat cfc1c408d4 v0.11.0:3 - button sizing fix: unify base .btn to 12px / 6px 12px
User feedback: every action button OUTSIDE the parakeet/magpie service
cards looked too big. Specifically called out: "Reapply patches",
"Restart container", "Switch to this", "Download". The ones on the
service cards (Start/Restart/Stop) were the size he liked.

Root cause: the base .btn used font: inherit, so it picked up 15px from
body. .service-actions .btn was the only place with an explicit
font-size: 12px + padding: 6px 12px override.

Fix: change .btn base directly to font-size: 12px + padding: 6px 12px.
Every button across the dashboard now matches the service-card button
footprint. The existing per-context overrides become redundant but
remain in place; they no longer create visible differences.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:54:46 -05:00
Keysat 3d273223f2 v0.11.0:2 - pill sizing fix: match .tag exactly to .status "Healthy" pill
User feedback: every pill outside the Always-On Services cards was rendering
visually taller than the "Healthy" status pill they liked. Root cause was
the .tag additions in 0.11.0:1 (line-height: 1.5, display: inline-block)
that didn't match the .status pill on service cards (which has neither).

Dropped both additions, bumped font-size from 11px → 12px so .tag is now
pixel-identical to .status:
  font-size: 12px;
  padding: 2px 8px;
  border-radius: 999px;
  background: var(--surface-2);
  border: 1px solid var(--border);

Every pill on the dashboard (mode-cluster/mode-solo/cap/on-disk/not-on-disk/
custom-pill/.tag.ok/.tag.warn/.tag.bad) now renders at the same footprint
as the Healthy/Unhealthy/Starting pills on the service cards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:46:57 -05:00
Keysat 4aa6cf5046 v0.11.0:1 - dashboard polish: tabs, collapsible endpoint, pill consistency
Three UX improvements, all client-side; no backend or behavior changes.

1. LLM / Audio tabs under the hardware section. The single long column got
   split into two tabbed views:
     * LLM       -> model swap + download panel + spark-vllm-docker updates
     * Audio     -> Parakeet/Magpie services + speech-model patches
   Selection persists in localStorage; default is LLM. The swap-panel
   (in-flight LLM swap) sits ABOVE the tab strip so it stays visible
   regardless of which tab is active.

2. Collapsible OpenAI-compatible Endpoint card. New chevron in the card
   header collapses everything except the title. State persists per browser
   via localStorage. Defaults to collapsed since you rarely need the URL/
   model details visible (and the same info is one tab swap away).

3. Unified pill sizing. The .sm-pill class in speech-models was rendering
   subtly larger than .tag pills on model cards. Dropped .sm-pill entirely
   and reused .tag with semantic color modifiers (.tag.ok / .tag.warn /
   .tag.bad). Same 11px / 2px×8px footprint everywhere now. Also added
   explicit line-height: 1.5 + display: inline-block to .tag to lock down
   vertical sizing.

No new endpoints, no new dependencies. Tested locally with node --check
and ast.parse(). Verified the tab DOM structure wraps the right sections
and the speech-models panel still self-shows/hides on data load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:33:16 -05:00
Keysat 391117f705 v0.11.0:0 - Speech model patches panel (lifecycle for v0.10.0 overlays)
Folds the image/parakeet_patches/apply.sh script into a one-click
dashboard action and adds drift detection so you can see at a glance
whether the parakeet-asr container has the latest Sortformer overlays
that spark-control ships.

Backend:
  * image/app/speech_models.py - SpeechModelsManager: reads /health from
    Parakeet, sha256s the local overlay files inside spark-control's
    Docker image (/app/parakeet_patches), sha256s the same files inside
    the parakeet-asr container via `docker exec ... sha256sum`, surfaces
    in_sync / drift / missing status per file.
  * GET  /api/speech-models           - status payload
  * POST /api/speech-models/reapply   - copies overlays into container,
                                         verifies python syntax, restarts,
                                         polls /health for ~120s, returns
                                         step-by-step result
  * POST /api/speech-models/restart   - plain `docker restart parakeet-asr`
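
The drift detection reduces to comparing two SHA-256 digests per overlay file. A minimal sketch, assuming the container-side digest arrives as a hex string (or None when the file is absent); the function names are hypothetical and the `docker exec ... sha256sum` step is left out.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Digest of the local overlay file's contents."""
    return hashlib.sha256(data).hexdigest()


def overlay_status(local_sha, container_sha):
    """Classify one file as in_sync, drift, or missing."""
    if container_sha is None:
        return "missing"
    return "in_sync" if local_sha == container_sha else "drift"


local = sha256_hex(b"print('diarizer v2')\n")
```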

Dockerfile: now COPY parakeet_patches into the image at /app/parakeet_patches
so the runtime can read them. Future spark-control releases auto-carry
newer overlay versions; the panel surfaces drift after upgrade.

Frontend: new "Speech model patches" section on the dashboard with
  * Status pill (in sync / drift / missing)
  * Per-file SHA comparison (local vs container)
  * Loaded-models pills (ASR + diarizer)
  * Reapply + Restart buttons (both with confirmation modals)
  * Live progress display during reapply with per-step ✓/✗

Verified post-install against the running cluster:
  GET /api/speech-models shows both files in_sync (SHAs match) and both
  models loaded ready on Spark 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:58:13 -05:00
Keysat fda23088fe v0.10.0:1 - hotfix: merge function now joins words with proper spacing
Smoke testing v0.10.0:0 against a real anarlog audio.mp3 showed the
output running words together: "I'mrecordingrightnow", "don'tyoutry".

Root cause: _merge_words_with_speakers was doing "".join(cur_words),
assuming Parakeet returns words with leading whitespace (which the
hyprnote local Parakeet does, but the Spark-hosted Parakeet does not).

Rewrote the join with a small helper that:
  - Strips each token (handles both leading-space and no-leading-space
    word formats)
  - Joins with a single space
  - Keeps punctuation tight — no space before period/comma/colon/etc.
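
A sketch of what such a helper might look like, not the code shipped in this commit. The name join_tokens and the exact punctuation set are assumptions; only the three rules listed above come from the commit.

```python
# Hypothetical punctuation set; the commit only says "period/comma/colon/etc."
TIGHT_PUNCT = {".", ",", ":", ";", "!", "?"}


def join_tokens(tokens):
    """Join word tokens with single spaces, keeping punctuation tight."""
    pieces = []
    for raw in tokens:
        tok = raw.strip()  # handles leading-space and no-leading-space formats
        if not tok:
            continue
        if pieces and all(ch in TIGHT_PUNCT for ch in tok):
            # A punctuation-only token attaches to the previous word.
            pieces[-1] += tok
        else:
            pieces.append(tok)
    return " ".join(pieces)


print(join_tokens([" I'm", " recording", " right", " now", "."]))
print(join_tokens(["don't", "you", "try"]))
```

Both word formats produce the same spacing because every token is stripped before joining.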

Verified post-install with the same test audio:
  [00:06] Speaker_0: I'm I'm recording right now.
  [00:18] Speaker_1: you're you're on your computer and your phone, right?

No other changes — Parakeet container patches and the endpoint shape
stay identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:42:04 -05:00
Keysat 713cd09cc2 v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers
Adds a new pipeline for diarized transcription that any client (recap-relay,
ad-hoc curl, future Mac-side tools) can call. Pure data pipeline, no LLM
or UI included — name resolution / analysis happen downstream where prompts
and rendering are configurable.

Architecture:
  Spark 2 / parakeet-asr container:
    + /opt/parakeet/app/diarizer.py        (new: SortformerDiarizer class)
    + /opt/parakeet/app/main.py            (patched: loads diarizer, adds
                                            /v1/audio/diarize endpoint)
    Model: nvidia/diar_sortformer_4spk-v1  (~150 MB, ungated, NeMo native)

  Spark Control:
    + POST /api/audio/transcribe-with-speakers
      Body: multipart file
      Returns: {
        duration, language, speakers_detected,
        segments: [{start_ms, end_ms, speaker, text}, ...],
        models: {transcription, diarization}
      }
      Runs Parakeet ASR + Sortformer in parallel, merges words to speaker
      turns by timestamp, groups into speaker-change blocks (breaks also
      on >1.5s silence gaps).
    + If Parakeet 500s mid-pipeline, kicks off a deep-health probe and returns
      503/Retry-After: 60 — same wedge-recovery pattern as v0.9.0:2.
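
A rough sketch of the merge step described above, under stated assumptions: words and speaker turns are dicts with millisecond timestamps, a word is assigned to the turn containing its midpoint, and a word outside every turn is labelled "unknown". The function names and input shapes are hypothetical; only the output fields and the 1.5 s gap rule come from this commit.

```python
GAP_MS = 1500  # break a block on silence gaps longer than 1.5 s


def speaker_for(word, turns):
    """Pick the speaker whose turn contains the word's midpoint (assumed rule)."""
    mid = (word["start_ms"] + word["end_ms"]) / 2
    for turn in turns:
        if turn["start_ms"] <= mid < turn["end_ms"]:
            return turn["speaker"]
    return "unknown"


def merge_words(words, turns):
    """Group time-ordered words into blocks, splitting on speaker change or gap."""
    segments = []
    for word in words:
        speaker = speaker_for(word, turns)
        last = segments[-1] if segments else None
        same_block = (
            last is not None
            and last["speaker"] == speaker
            and word["start_ms"] - last["end_ms"] <= GAP_MS
        )
        if same_block:
            last["end_ms"] = word["end_ms"]
            last["text"] += " " + word["text"]
        else:
            segments.append({
                "start_ms": word["start_ms"],
                "end_ms": word["end_ms"],
                "speaker": speaker,
                "text": word["text"],
            })
    return segments
```

Text is joined with a plain space here to keep the sketch short; token spacing and punctuation handling are out of scope for it.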

Apply Sortformer patches to the running Parakeet container with:
  bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user>

Patches are reversible — apply.sh backs up the original main.py inside the
container at main.py.pre-sortformer before overwriting. Restore by copying
that file back and removing diarizer.py, then docker restart.

v0.11 follow-up: dashboard "Speech Models" panel to swap/update model
versions from the UI instead of needing to re-run apply.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:14:48 -05:00
Keysat 197655a62b v0.9.0:2 - audio proxy: turn Parakeet wedge 500 into clean 503 + immediate auto-restart
Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention)
fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the
proxy relayed the upstream 500 verbatim, Open WebUI showed "Server
connection error" with no signal to retry, and recovery took up to
5 minutes (waiting for the next periodic deep-health probe).

Now the proxy:
  1. Detects 500 from /v1/audio/transcriptions
  2. Fires deep_health.run_one("parakeet") as a background asyncio task
     (which contains the same wedge-detect + rate-limited auto-restart
     logic, but runs immediately instead of waiting for the next tick)
  3. Returns 503 with a clear detail message and Retry-After: 60

The client (Open WebUI, Home Assistant, etc.) gets a proper retry
signal; the auto-restart triggers within seconds; the next attempt
~60s later succeeds. Rate-limiting (3 restarts per 30 min) is
inherited from the deep-health module so this can't cause restart
storms.
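
The three steps above can be sketched in plain asyncio, without the web framework. This is an illustration under assumptions, not the code in audio_proxy.py: relay_transcription, its tuple return shape, and the detail text are hypothetical, and run_one stands in for deep_health.run_one.

```python
import asyncio

RETRY_AFTER_SECONDS = 60
_background = set()  # keep references so fire-and-forget tasks are not collected


async def relay_transcription(upstream_status, upstream_body, run_one):
    """Map an upstream 500 to a 503 and start a health probe in the background."""
    if upstream_status != 500:
        # Any other status is relayed unchanged.
        return upstream_status, {}, upstream_body

    # Start the probe immediately instead of waiting for the next periodic tick.
    task = asyncio.create_task(run_one("parakeet"))
    _background.add(task)
    task.add_done_callback(_background.discard)

    headers = {"Retry-After": str(RETRY_AFTER_SECONDS)}
    body = {"detail": "Transcription backend is restarting; retry shortly."}
    return 503, headers, body
```

The probe is not awaited, so the client gets its 503 right away; restart rate-limiting is left to the probe itself, as the commit describes.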

server.py: pass deep_health into build_audio_router().
audio_proxy.py: new 503-with-restart branch; signature now accepts
                deep_health as an optional dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:07:35 -05:00