58 Commits

Author SHA1 Message Date
Keysat bcdfb22e06 v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf
Triaged from a full independent evaluation (EVALUATION.md). Addresses the
three P0/P1 code findings; the proxy/data APIs that downstream apps consume
are deliberately untouched.

- ssh command injection (P0): new shellsafe.py validates + shlex.quotes every
  user-supplied value crossing into an SSH command on the Sparks (model repo,
  vllm args/knobs, NIM image/container/volume/port/env, service names).
  Boundary validation on POST /api/models and POST /api/nim/install; quoting at
  every sink in models/download/nim/services. NGC key now quoted too.
- qdrant path injection (P1): /api/search validates the collection name against
  a metacharacter-free whitelist and URL-encodes the path segment.
- csrf (P1): csrf_guard middleware enforces same-origin on state-changing
  control endpoints; /v1/*, /scrub, /rehydrate, /api/search, /api/audio/* and
  /api/health-event are exempt so external consumers are unaffected.

Verified: injection survives only as a single quoted token, vLLM preflight
shlex.split round-trip intact, CSRF behaviors covered via TestClient, both
offline redaction suites still pass, tsc clean, s9pk rebuilt.
2026-06-12 16:36:33 -05:00
Keysat 50c67cde34 v0.18.0:1 - scrub owner-specific hostnames, ips, usernames, names from tracked files
Replace real cluster IPs/hosts/usernames and example names with neutral
placeholders across docs, ops notes, package install text, and the offline
redaction test; delete the obsolete build-time starter prompt. Closes the
portability audit's single blocker. No runtime behavior change.
2026-06-12 15:07:34 -05:00
Keysat 7a1056241e docs: record canonical AGENTS.md / symlink layout convention 2026-06-12 14:31:54 -05:00
Keysat 8e94712808 restructure: AGENTS.md canonical + docs/guides with .claude/rules symlinks
Rename CLAUDE.md -> AGENTS.md (cross-vendor standard) with a relative
CLAUDE.md symlink so Claude Code still loads it. Move each .claude/rules
file into docs/guides/ (paths: frontmatter preserved) and replace the
rules file with a relative symlink into the guide. Repoint the AGENTS.md
index paragraph at docs/guides/ so non-Claude agents find the guides.
2026-06-12 14:27:17 -05:00
Keysat 4c8ba620cd docs: note self-hosted gitea remote in current state 2026-06-11 19:25:21 -05:00
Keysat c18050cb87 docs: split CLAUDE.md into path-scoped .claude/rules; fix dev/test commands
- CLAUDE.md trimmed to whole-repo facts (58 lines); subsystem guidance
  moved to .claude/rules/{startos-package,fastapi-image,redaction,
  audio-speech}.md with paths: frontmatter so each loads only when
  matching files are touched
- .gitignore: track .claude/rules/ while keeping the rest of .claude/
  (settings.local.json) ignored
- test-audio-with-speakers.sh: require audio-file arg in docs, replace
  owner-specific SPARK_CONTROL/VLLM defaults with generic ones
  (localhost dev server + Spark Control vLLM proxy), discover the
  loaded LLM via /api/status since /v1/models lists audio models only
- document REDACTION_MAP_DB + CONNECTIVITY_LOG as required for local
  dev (/data only exists in the container)
- prettier pass over startos/actions (formatting drift)
2026-06-11 19:12:23 -05:00
Keysat 8e978b44cf docs: add CLAUDE.md (agent guide) + ROADMAP.md (longer-term backlog) 2026-06-11 17:59:08 -05:00
Keysat 367d9869da v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API
- Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests)
- Add embeddings proxy and spark_embed service (Dockerfile + main.py)
- Expand audio_proxy with speaker-aware handling; deep_health/health/server updates
- Package: configureSparks action + sparkConfig model updates, manifest/main wiring
- Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh
2026-06-11 17:45:57 -05:00
Keysat 81c448f70c v0.13.0:3 - proxy /v1/chat/completions through Spark Control to vLLM
Recap Relay dev caught that all audio endpoints route through Spark
Control but chat-completions didn't — clients had to know about both
SC AND the direct vLLM URL on Spark 1. Closes that last gap.

New endpoints:
  POST /v1/chat/completions   — OpenAI-shape, forwards to vLLM on Spark 1
  POST /v1/completions        — legacy OpenAI completions, same path

Implementation (image/app/llm_proxy.py):
  - Dumb forwarder: request body passed through verbatim, response body
    streamed back chunk-by-chunk. No transformation. vLLM already speaks
    the same shape; adding any logic here would just create skew.
  - Streaming: parses body for `stream: true` and uses httpx.AsyncClient
    .stream() + FastAPI StreamingResponse if so. Non-streaming path is
    a simple post-and-return.
  - 30-minute timeout to accommodate large-context completions (default
    httpx 5s would kill anything substantial).
  - On upstream non-200 in streaming mode: emits one SSE `error` event
    so the client's parser doesn't hang on an empty stream forever.
  - On upstream connection error: HTTP 502 with "vllm unreachable" detail.

Now clients can use ONE host for everything:
  POST https://spark-control/api/audio/diarize-chunk
  POST https://spark-control/v1/audio/transcriptions
  POST https://spark-control/v1/chat/completions
  GET  https://spark-control/api/endpoints  (still works for clients that
                                              prefer the direct URLs)

No parakeet container changes. No Reapply patches needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 19:58:19 -05:00
Keysat 01c5ab784d v0.13.0:2 - per-segment confidence in diarize-chunk response
Recap Relay dev asked: can the diarization output include a confidence
level per segment so the UI can render "Speaker_0?" for uncertain
assignments rather than confidently mislabeling?

Answer: yes. Sortformer's diarize() with include_tensor_outputs=True
returns the per-frame per-speaker sigmoid scores (shape [B, T, 4spk],
~12.6 fps frame rate). The current code argmaxes those into segment
strings and throws the raw scores away. Now: for each output segment,
compute mean probability of the assigned speaker across the segment's
frames → confidence in [0, 1].

Implementation:
  - diarizer.py: diarize_chunk() now calls diarize() with
    include_tensor_outputs=True, and a new _attach_confidence() helper
    derives the per-segment mean probability after parsing the segment
    strings. The frame-rate is computed from tensor shape vs audio
    duration (no need to hard-code the model's stride).
  - All failure paths return confidence=None gracefully — Recap Relay
    can treat None as "no info" or fall back to a default threshold.

Endpoint shape change: segments[] now have an optional `confidence`
field in [0, 1] (or None). All other fields unchanged. Existing callers
that ignore the field aren't affected.

Verified with a 5s test signal that the tensor has shape [1, 63, 4]
(63 frames / 5s = 12.6 fps) and values in [0, 1] (sigmoid outputs,
independent per speaker so overlap detection works). Real speech values
will be much higher than the near-zero values of the pure-tone test
signal.

Reapply patches on the Speech Models card after installing v0.13.0:2
to pick up the updated diarizer.py + main.py in the parakeet container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:36:25 -05:00
Keysat 9bcb45789d v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints
Spark Control now exposes a per-chunk worker designed for Recap Relay
to orchestrate against. Recap Relay does the chunking + global speaker
clustering (consistent with how it already handles the Gemini path);
Spark Control handles the GPU-bound per-chunk work.

Parakeet container:
  - diarizer.py: now also loads NVIDIA TitaNet speaker-verification model
    (~25 MB, NeMo-native, no torchaudio). New diarize_chunk() method
    runs Sortformer + extracts one 192-dim voice fingerprint per detected
    local speaker (concatenating each speaker's audio across the chunk
    and running TitaNet's get_embedding).
  - main.py: new POST /v1/audio/diarize-chunk endpoint that returns
    segments + speakers_detected + fingerprints + models in one shot.

Spark Control:
  - new POST /api/audio/diarize-chunk that proxies to parakeet's new
    endpoint. Same CUDA-wedge recovery (503 + deep-health probe + 60s
    retry-after) as the other audio endpoints. Returns the raw JSON
    upstream because Recap Relay is the consumer; no merging needed.

Response shape Recap Relay receives per chunk:
  {
    "duration": 300.0,
    "segments":  [{"start_s","end_s","speaker"}, ...],   # LOCAL labels
    "speakers_detected": ["Speaker_0","Speaker_1",...],
    "fingerprints": {"Speaker_0":[192 floats], ...},
    "models": {"diarization":"...","embedding":"..."}
  }

Recap Relay's job:
  1. Chunk audio (existing chunking infrastructure)
  2. POST each chunk to /api/audio/diarize-chunk in parallel
  3. Collect all fingerprints from all chunks
  4. sklearn AgglomerativeClustering(distance_threshold=0.7, metric=cosine)
  5. Re-label segments with global cluster IDs
  6. Concatenate transcripts (from a separate parallel call to
     /v1/audio/transcriptions) with timestamp offsets and merge with
     re-labeled diar segments

After installing v0.13.0:1, click "Reapply patches" on the Speech Models
card to push the updated diarizer.py + main.py into the parakeet
container — TitaNet will download (~25 MB) on first call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:37:05 -05:00
Keysat 8f8efbf0fc v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer
After five hotfix iterations on the WhisperX install (v0.12.0:0–:4) we
never got a working docker build. The fundamental constraint isn't
patchable from outside NVIDIA: NGC PyTorch on ARM64 (the only base that
runs on Spark 2's GB10 Blackwell) ships a custom-versioned torch
2.10.0a0+b558c98 that has no pre-built torchaudio match anywhere.
WhisperX → pyannote → torchaudio is a hard dependency chain we couldn't
satisfy without rebuilding torchaudio against torch 2.10's alpha API.
Walking away cleanly is better than another night of chasing.

Removed from the codebase:
  - image/whisperx_container/* (Dockerfile + requirements + app/main.py)
  - image/app/whisperx_install.py (install manager + SSH ship-context logic)
  - image/Dockerfile COPY whisperx_container
  - WHISPERX_* config keys in config.py
  - whisperx service entry in services.py
  - WhisperX-preferred branch in audio_proxy.py
  - /api/whisperx/* endpoints in server.py
  - install banner + progress dialog in index.html
  - render + handlers in app.js
  - .whisperx-install styles in style.css

Spark 2 cleaned in tandem (user-authorized): container removed,
~/whisperx-build/ removed, 5.4 GB of dangling image layers + 1.3 GB of
builder cache reclaimed. parakeet-asr and magpie-tts unaffected and
healthy throughout.

The audio path is back to exactly what shipped in v0.11.0:3:
  POST /api/audio/transcribe-with-speakers
    → Parakeet (transcription) + Sortformer (diarization) in parallel
    → merged by timestamp into speaker-labeled blocks

v0.13.0:1+ will add the actually-needed fixes that the WhisperX detour
was meant to address:
  1. memory cap on the parakeet-asr container so a long-audio crash
     can't swap-thrash Spark 2 again
  2. a chunking proxy in /api/audio/transcribe-with-speakers that
     splits inputs >10 min before Sortformer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:03:19 -05:00
Keysat ef5d6ec334 v0.12.0:4 - hotfix: torchaudio build fails without --no-build-isolation
Build was crashing inside torchaudio's setup.py with:
  ModuleNotFoundError: No module named 'torch'

PIP_CONSTRAINT was correctly pinning torch/torchvision in the install
target env, but pip's PEP 517 build isolation creates a SEPARATE fresh
Python env just for the build wheel step — and that env has no torch
in it. torchaudio's setup.py imports torch to discover CUDA flags, so
it crashes. Pip even printed a deprecation warning that this isolation
behavior is hardening, not relaxing.

Fix:
  1. Pre-install torchaudio's build deps (setuptools, wheel, ninja,
     pybind11) into the main env since we're disabling isolation.
  2. Add --no-build-isolation to the torchaudio install so the build
     uses NGC's torch directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:53:43 -05:00
Keysat 99a50a6776 v0.12.0:3 - hotfix: build torchaudio from source against NGC's torch
NGC PyTorch (the only base with working torch on Spark's ARM64 + sm_120
Blackwell) doesn't ship torchaudio. Stock pip wheels are amd64-only AND
ABI-incompatible with NGC's custom torch 2.10.0a anyway. Pip install
just fails or crashes at runtime.

Real fix:
  - apt install git cmake build-essential ninja-build
  - pip install git+https://github.com/pytorch/audio.git@v2.5.1
    with TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0" (sm_120 for Blackwell GB10)
  - this compiles torchaudio against the torch already in the image, so
    ABI matches by construction

Then constraints.txt locks torch + torchvision + torchaudio so the later
`pip install whisperx` can't swap any of them.

Cost: +3-5 min to the first install. Docker layer cache reuses the
built torchaudio on every subsequent rebuild.

Torchaudio v2.5.1 is the last tag that builds cleanly against
torch 2.5-2.10 — main branch is too volatile against NGC's alpha torch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:40:50 -05:00
Keysat 15210e9590 v0.12.0:2 - hotfix: pin NGC's torch versions so pip can't break the ABI
WhisperX docker build was crashing at the model-prewarm step:
  OSError: undefined symbol: torch_library_impl

Root cause: the NGC PyTorch base ships custom builds of torch +
torchaudio + torchvision matched together for Blackwell (sm_120). When
pip installed whisperx, it pulled the latest stock torchaudio wheel as
a transitive dep, which was compiled against a different libtorch and
won't load against NGC's.

Fix: at build time, capture NGC's actual torch/torchaudio/torchvision
versions into /tmp/torch-constraints.txt, then `pip install -c` that
constraint for all subsequent installs. pip can't swap torch out, so
the ABI stays consistent. whisperx and pyannote are happy with
torch>=2.0 — NGC's 2.10.0a0 satisfies that easily.

The pinned versions print to the build log so you can see them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:26:08 -05:00
Keysat 0ba2a3a3fc v0.12.0:1 - hotfix: WhisperX install fails on first scp because ~ doesn't
expand inside shlex.quote()

Symptom: "Failed to ship Dockerfile — bash: line 1: ~/whisperx-build/
Dockerfile: No such file or directory"

Same bug pattern as v0.8.1:1 (disk probe). shlex.quote() wraps in single
quotes, and the remote shell doesn't do tilde expansion inside single
quotes — so it tries to write to a literal directory named "~".

Fix: use $HOME in double-quoted shell context, which the remote shell
expands correctly. The file names (Dockerfile, requirements.txt, etc.)
are hardcoded so they're safe to embed unquoted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:16:44 -05:00
Keysat b87cb0f99b v0.12.0:0 - WhisperX as a one-click dashboard install + managed service
Replaces the manual rsync+build+run with a proper spark-control feature.
First in the audio path that doesn't require shell access on Spark 2.

What's in the box
─────────────────
* image/whisperx_container/   - the build context (Dockerfile, requirements,
  app/main.py FastAPI wrapper). Mainline pipeline: faster-whisper for STT +
  pyannote 3.1 for diarization + wav2vec2 forced alignment. Single endpoint
  /v1/audio/transcribe-with-speakers returns the exact same shape spark-
  control's existing endpoint does, so the recap-relay PR spec needs no
  changes when we cut over.

* image/app/whisperx_install.py - install manager. ships build context to
  Spark 2 over SSH, runs `docker build`, runs `docker run` with 40 GB
  memory cap (vs Sortformer's unbounded which thrashed Spark 2 on a 90-min
  file), polls /health until both Whisper + pyannote report loaded.

* Audio proxy: /api/audio/transcribe-with-speakers now prefers WhisperX
  when its /health reports diarizer_loaded=true, falls back to the legacy
  Parakeet + Sortformer path otherwise. Same response shape either way.
  Clean cutover, easy rollback (`docker rm whisperx-asr`).

* Dashboard (Audio / Speech tab):
  - "Add WhisperX" banner appears when not installed, with a primary
    "Install WhisperX" button. One click triggers the install.
  - Build progress dialog with phase + elapsed timer + live build log via
    SSE (`/api/whisperx/install/{job_id}/stream`).
  - After install, WhisperX auto-registers as a managed service alongside
    Parakeet and Magpie (Start/Restart/Stop, deep-check, auto-restart).
  - Banner self-hides once /api/whisperx/status reports healthy.

New endpoints
─────────────
  GET  /api/whisperx/status
  POST /api/whisperx/install
  GET  /api/whisperx/install/{job_id}
  GET  /api/whisperx/install/{job_id}/stream  (SSE phase + log)

Config additions (env)
──────────────────────
  WHISPERX_HOST       (defaults to spark2_host)
  WHISPERX_USER       (defaults to spark2_user)
  WHISPERX_CONTAINER  (default: whisperx-asr)
  WHISPERX_PORT       (default: 8002)
  WHISPERX_MODEL      (default: medium; tiny/base/small/medium/large-v3)

Dockerfile
──────────
Added COPY whisperx_container /app/whisperx_container so the runtime
install manager can read the build context from inside the spark-control
image and ship it over SSH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:02:26 -05:00
Keysat 346df907d2 v0.11.0:3 - button sizing fix: unify base .btn to 12px / 6px 12px
User feedback: every action button OUTSIDE the parakeet/magpie service
cards looked too big. Specifically called out: "Reapply patches",
"Restart container", "Switch to this", "Download". The ones on the
service cards (Start/Restart/Stop) were the size he liked.

Root cause: the base .btn used font: inherit, so it picked up 15px from
body. .service-actions .btn was the only place with an explicit
font-size: 12px + padding: 6px 12px override.

Fix: change .btn base directly to font-size: 12px + padding: 6px 12px.
Every button across the dashboard now matches the service-card button
footprint. The existing per-context overrides become redundant but
remain in place; they no longer create visible differences.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:54:46 -05:00
Keysat 82224f53e7 v0.11.0:2 - pill sizing fix: match .tag exactly to .status "Healthy" pill
User feedback: every pill outside the Always-On Services cards was rendering
visually taller than the "Healthy" status pill they liked. Root cause was
the .tag additions in 0.11.0:1 (line-height: 1.5, display: inline-block)
that didn't match the .status pill on service cards (which has neither).

Dropped both additions, bumped font-size from 11px → 12px so .tag is now
pixel-identical to .status:
  font-size: 12px;
  padding: 2px 8px;
  border-radius: 999px;
  background: var(--surface-2);
  border: 1px solid var(--border);

Every pill on the dashboard (mode-cluster/mode-solo/cap/on-disk/not-on-disk/
custom-pill/.tag.ok/.tag.warn/.tag.bad) now renders at the same footprint
as the Healthy/Unhealthy/Starting pills on the service cards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:46:57 -05:00
Keysat a7102105aa v0.11.0:1 - dashboard polish: tabs, collapsible endpoint, pill consistency
Three UX improvements, all client-side; no backend or behavior changes.

1. LLM / Audio tabs under the hardware section. The single long column got
   split into two tabbed views:
     * LLM       -> model swap + download panel + spark-vllm-docker updates
     * Audio     -> Parakeet/Magpie services + speech-model patches
   Selection persists in localStorage; default is LLM. The swap-panel
   (in-flight LLM swap) sits ABOVE the tab strip so it stays visible
   regardless of which tab is active.

2. Collapsible OpenAI-compatible Endpoint card. New chevron in the card
   header collapses everything except the title. State persists per browser
   via localStorage. Defaults to collapsed since you rarely need the URL/
   model details visible (and the same info is one tab swap away).

3. Unified pill sizing. The .sm-pill class in speech-models was rendering
   subtly larger than .tag pills on model cards. Dropped .sm-pill entirely
   and reused .tag with semantic color modifiers (.tag.ok / .tag.warn /
   .tag.bad). Same 11px / 2px×8px footprint everywhere now. Also added
   explicit line-height: 1.5 + display: inline-block to .tag to lock down
   vertical sizing.

No new endpoints, no new dependencies. Tested locally with node --check
and ast.parse(). Verified the tab DOM structure wraps the right sections
and the speech-models panel still self-shows/hides on data load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:33:16 -05:00
Keysat ddfd508c2f v0.11.0:0 - Speech model patches panel (lifecycle for v0.10.0 overlays)
Folds the image/parakeet_patches/apply.sh script into a one-click
dashboard action and adds drift detection so you can see at a glance
whether the parakeet-asr container has the latest Sortformer overlays
that spark-control ships.

Backend:
  * image/app/speech_models.py - SpeechModelsManager: reads /health from
    Parakeet, sha256s the local overlay files inside spark-control's
    Docker image (/app/parakeet_patches), sha256s the same files inside
    the parakeet-asr container via `docker exec ... sha256sum`, surfaces
    in_sync / drift / missing status per file.
  * GET  /api/speech-models           - status payload
  * POST /api/speech-models/reapply   - copies overlays into container,
                                         verifies python syntax, restarts,
                                         polls /health for ~120s, returns
                                         step-by-step result
  * POST /api/speech-models/restart   - plain `docker restart parakeet-asr`

Dockerfile: now COPY parakeet_patches into the image at /app/parakeet_patches
so the runtime can read them. Future spark-control releases auto-carry
newer overlay versions; the panel surfaces drift after upgrade.

Frontend: new "Speech model patches" section on the dashboard with
  * Status pill (in sync / drift / missing)
  * Per-file SHA comparison (local vs container)
  * Loaded-models pills (ASR + diarizer)
  * Reapply + Restart buttons (both with confirmation modals)
  * Live progress display during reapply with per-step ✓/✗

Verified post-install against the running cluster:
  GET /api/speech-models shows both files in_sync (SHAs match) and both
  models loaded ready on Spark 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:58:13 -05:00
Keysat 6664543dec v0.10.0:1 - hotfix: merge function now joins words with proper spacing
Smoke testing v0.10.0:0 against a real anarlog audio.mp3 showed the
output running words together: "I'mrecordingrightnow", "don'tyoutry".

Root cause: _merge_words_with_speakers was doing "".join(cur_words),
assuming Parakeet returns words with leading whitespace (which the
hyprnote local Parakeet does, but the Spark-hosted Parakeet does not).

Rewrote the join with a small helper that:
  - Strips each token (handles both leading-space and no-leading-space
    word formats)
  - Joins with a single space
  - Keeps punctuation tight — no space before period/comma/colon/etc.

Verified post-install with the same test audio:
  [00:06] Speaker_0: I'm I'm recording right now.
  [00:18] Speaker_1: you're you're on your computer and your phone, right?

No other changes — Parakeet container patches and the endpoint shape
stay identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:42:04 -05:00
Keysat 33768ae3d7 v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers
Adds a new pipeline for diarized transcription that any client (recap-relay,
ad-hoc curl, future Mac-side tools) can call. Pure data pipeline, no LLM
or UI included — name resolution / analysis happen downstream where prompts
and rendering are configurable.

Architecture:
  Spark 2 / parakeet-asr container:
    + /opt/parakeet/app/diarizer.py        (new: SortformerDiarizer class)
    + /opt/parakeet/app/main.py            (patched: loads diarizer, adds
                                            /v1/audio/diarize endpoint)
    Model: nvidia/diar_sortformer_4spk-v1  (~150 MB, ungated, NeMo native)

  Spark Control:
    + POST /api/audio/transcribe-with-speakers
      Body: multipart file
      Returns: {
        duration, language, speakers_detected,
        segments: [{start_ms, end_ms, speaker, text}, ...],
        models: {transcription, diarization}
      }
      Runs Parakeet ASR + Sortformer in parallel, merges words to speaker
      turns by timestamp, groups into speaker-change blocks (breaks also
      on >1.5s silence gaps).
    + If Parakeet 500s mid-pipeline, kicks deep-health probe and returns
      503/Retry-After: 60 — same wedge-recovery pattern as v0.9.0:2.

Apply Sortformer patches to the running Parakeet container with:
  bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user>

Patches are reversible — apply.sh backs up the original main.py inside the
container at main.py.pre-sortformer before overwriting. Restore by copying
that file back and removing diarizer.py, then docker restart.

v0.11 follow-up: dashboard "Speech Models" panel to swap/update model
versions from the UI instead of needing to re-run apply.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:14:48 -05:00
Keysat 8a5862dcb2 v0.9.0:2 - audio proxy: turn Parakeet wedge 500 into clean 503 + immediate auto-restart
Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention)
fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the
proxy relayed the upstream 500 verbatim, Open WebUI showed "Server
connection error" with no signal to retry, and recovery took up to
5 minutes (waiting for the next periodic deep-health probe).

Now the proxy:
  1. Detects 500 from /v1/audio/transcriptions
  2. Fires deep_health.run_one("parakeet") as a background asyncio task
     (which contains the same wedge-detect + rate-limited auto-restart
     logic, but runs immediately instead of waiting for the next tick)
  3. Returns 503 with a clear detail message and Retry-After: 60

The client (Open WebUI, Home Assistant, etc.) gets a proper retry
signal; the auto-restart triggers inside seconds; the next attempt
~60s later succeeds. Rate-limiting (3 restarts per 30 min) is
inherited from the deep-health module so this can't cause restart
storms.

server.py: pass deep_health into build_audio_router().
audio_proxy.py: new 503-with-restart branch; signature now accepts
                deep_health as an optional dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:07:35 -05:00
Keysat 589b3e59ab v0.9.0:1 - hotfix: add python-multipart for /v1/audio/transcriptions
v0.9.0:0 introduced the OpenAI audio proxy whose /v1/audio/transcriptions
endpoint uses FastAPI's Form() + File() parameters. Those require
python-multipart at runtime; it wasn't in image/pyproject.toml because
none of the prior endpoints needed multipart.

Result: FastAPI raised RuntimeError("Form data requires python-multipart")
during route registration, the entrypoint exited 1, and StartOS's
reverse proxy started closing TLS handshakes with PR_END_OF_FILE_ERROR
because there was no upstream to forward to.

Fix: add python-multipart>=0.0.9 to dependencies. Dashboard, /api/*,
and the new /v1/* audio endpoints all come back up cleanly. No other
code changes.

Verified post-install: Uvicorn running on http://0.0.0.0:9999,
"Application startup complete" in the logs, package status 'installed'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:01:52 -05:00
Keysat d67152624e v0.9.0:0 - OpenAI-compatible audio proxy for Open WebUI / Home Assistant
Adds three new endpoints to spark-control that translate OpenAI's
audio API shapes to the Parakeet (STT) and Magpie (TTS, NVIDIA Riva)
services on the Sparks:

  GET  /v1/models                — STT model + Magpie's 60+ voices
  POST /v1/audio/speech          — OpenAI body -> Magpie multipart synthesize
                                    (returns audio/wav passthrough)
  POST /v1/audio/transcriptions  — relay to Parakeet (already compatible)

Verified shapes against the live services:
  - Parakeet returns OpenAI-style {"text": "..."} or verbose_json with
    segments+words. Already a perfect drop-in for OpenAI clients.
  - Magpie returns raw WAV bytes with Content-Type: audio/wav. NOT
    base64-wrapped JSON as one might assume. The proxy is literally a
    body-translation on the request side; response is passthrough.

Voice language is auto-derived from the voice name (e.g.
Magpie-Multilingual.EN-US.Mia -> language=en-US) so clients don't
need to set it explicitly.

Open WebUI / Home Assistant / Recap Relay can now all point at one
URL — https://<spark-control>.local/v1 — and get LLM, STT, TTS
behind a single identity. No shim service to deploy.

Pure addition: no existing routes touched; the dashboard, /api/*,
download flow, deep-health, hardware probes are all unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:41:48 -05:00
Keysat 26382dc932 v0.8.1:2 - card button flips to blue "Download" when weights are absent
When a model's weights aren't on disk, the green "Switch to this"
button on the card is replaced by a blue "Download" button that
calls /api/download directly with the model's repo and the right
mode (solo -> spark1, cluster -> both). One-click re-install of a
previously-deleted model, no more pasting the repo into the manual
download form.

Also adds a confirmation dialog showing the model name, size, and
target Spark(s) before kicking off the download — and disables the
button when another download is already in flight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:30:51 -05:00
Keysat c07eaeb4ee v0.8.1:1 - fix disk probe: $HOME wasn't expanding inside shlex.quote
The 0.8.1:0 probe wrapped the entire path (including $HOME) in
shlex.quote, which produces single quotes — preventing shell
variable expansion. The resulting `[ -d '$HOME/.cache/...' ]` test
looked for a literal path starting with the string $HOME and
always failed, so every model reported as "not downloaded" and no
trash icons rendered.

Fix: embed $HOME in a double-quoted shell context (which allows
expansion) and validate the cache dirname against a whitelist
[A-Za-z0-9._-]+ rather than relying on shlex quoting. The dirname
is fully constrained by HF's naming rules + our org--name munging,
so the whitelist is tight enough.

Verified against Spark 1: probe now correctly reports the
25,075,981,924 bytes (23.4 GB) of Qwen3.6's cache dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:58:43 -05:00
Keysat 36ca99f73b v0.8.1:0 - delete model weights from disk via card trash icon
Each model card now shows whether its weights are present on disk
(with GB size) or not yet downloaded. When present and the model
isn't currently loaded, a trash icon appears; clicking it pops a
confirmation showing exactly how many GB will be freed and on
which Spark(s), then runs rm -rf on the HF cache directory via SSH.

Cluster-mode models are removed from both Sparks; solo-mode from
Spark 1 only. Safety rails: refuses to delete the currently-loaded
model, refuses during an in-flight swap or download, and the
catalog entry stays intact so it can be re-downloaded anytime.

Backend:
  - new image/app/disk.py: probe_disk + delete_from_disk over SSH
  - GET  /api/models/disk-status — parallel probe across all catalog models
  - DELETE /api/models/{key}/disk — guarded rm -rf, logs to connectivity events

Frontend:
  - on-disk / not-downloaded pills on every card
  - trash icon-btn in card-actions row (hidden when not on disk)
  - confirmation dialog showing per-host bytes-to-free
  - disk-status re-checked every 60s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:07:20 -05:00
Grant 5662d957af v0.8.0:4 - vLLM deep-health: 'no model loaded' is idle, not a wedge
Previously a ConnectError on /v1/models classified vLLM as failing, which would feed into the wedge auto-restart heuristic. But when no model is loaded (the normal idle state between swaps, or after a failed swap leaves the vllm_node container up with no process serving), nothing is listening on 8888 — that's by design, not a wedge.

The vLLM probe now does a two-step check:
  1. GET /v1/models. ConnectError or empty list -> ok=true with note='no model currently loaded (idle)'. No auto-restart triggered (it wouldn't help anyway — restarting vllm_node kills any loaded model and doesn't load a new one).
  2. If a model is loaded, POST 1-token chat completion. A 5xx here is a genuine wedge worth restarting for.

Result: deep-health correctly reports 'no model loaded' as informational rather than flagging it as a failure. Auto-restart for vLLM only fires when a model is actually loaded AND inference fails — the right semantics.
2026-05-12 14:50:00 -05:00
Grant 5a5634a3a9 v0.8.0:3 - add --max-num-batched-tokens=16384 to vision models (gemma4, qwen3-vl)
After the recent eugr/spark-vllm-docker update, vLLM became stricter about multimodal token budgets:

  ValueError: Chunked MM input disabled but max_tokens_per_mm_item (2496) is
  larger than max_num_batched_tokens (2048). Please increase max_num_batched_tokens.

Each image input produces 2496 tokens, but vLLM's default --max-num-batched-tokens of 2048 is just under. Same class of bug as the Qwen3.6 Mamba block-size assertion we fixed in 0.6.0:1, surfacing on different models.

Fix: bake --max-num-batched-tokens=16384 into every multimodal model entry. Now applied to:
  - qwen36 (already had it for the Mamba constraint; works for multimodal too since Qwen3.6 has vision)
  - gemma4 (crashed today on engine init)
  - qwen3-vl (would crash with the same error if anyone tried it)

The pre-flight Test button validates argparse but the 2048<2496 check happens at runtime engine init, so it's not caught by Test — only by actually trying to load. This is exactly the kind of bug v0.7's Test catches the *syntax* of but not the *semantics*; runtime errors like this still surface only on real swap. Known limitation documented in v0.7 release notes.
2026-05-12 14:47:32 -05:00
Grant 9edb70418e v0.8.0 - Deep health probes + auto-restart on CUDA wedge
deep_health.py:
- Synthetic probes per service, all payloads generated in-memory (BytesIO), never written to disk:
  - Parakeet: 1s of digital silence via in-memory WAV → POST /v1/audio/transcriptions
  - Magpie:   short 'hi' text → POST /v1/audio/synthesize (multipart form-data, real TTS API endpoint discovered via openapi.json)
  - vLLM:     1-token completion against currently-loaded model
- Background loop runs every 5 minutes (configurable). Best-effort: exceptions in the loop never kill it.
- Auto-restart on wedge-pattern errors (cudaErrorUnknown / CUFFT_INTERNAL_ERROR / 500 / Engine core init failed): docker restart of the affected container.
  - Rate-limited: max 3 restarts per service per 30 min.
  - Cooldown: 120 s between consecutive restarts on the same service.
  - 60 s startup grace before any auto-restart can fire after the app boots.
- Probe failures + recoveries logged via record_report(source='deep-health') into the connectivity history alongside the polling-based transitions.

API:
- GET /api/deep-health: per-service last result + auto-restart counters
- POST /api/deep-health/{service}/run: manual trigger now

UI:
- Service cards show 'Deep check ok/FAILED <time> <latency>' inline, plus a ↻ button to run-now
- Auto-restart count in 30-min window surfaced on the card when > 0
- Inline error excerpt shown for failed probes

Bug fix: server.py app startup hook was placed before the FastAPI app object was constructed (would crash on import). Moved after.
2026-05-12 14:41:01 -05:00
Grant b7be1bab24 v0.7.0 - Pre-flight launch validation (Test button on every model card)
validate.py:
- Builds the same args list a real swap would pass to 'vllm serve'
- SSHes into Spark 1 and runs vLLM's own argparse layer inside the running vllm_node container, WITHOUT initializing the engine
- Uses FlexibleArgumentParser (from vllm.utils.argparse_utils, with fallback to engine.arg_utils) + make_arg_parser — the exact same parser the 'vllm serve' CLI uses. Earlier attempt with bare argparse.ArgumentParser was too strict (rejected '--moe_backend' with underscore that the real CLI accepts via FlexibleArgumentParser's normalization)
- Returns structured {ok, stage, error, cmd_args, launch_cmd} so the UI can surface the exact failure cause

Endpoint: POST /api/swap/{key}/validate. Cheap (~5s), no engine init, no disruption to the currently-loaded model.

Frontend: 'Test' button on every model card, inline result below the action row (green check or red detailed error). Result stays visible until the user reloads or clicks Test again.

Catches: typos in flag names, deprecated/removed flags after a vLLM upgrade, type mismatches. Does NOT catch runtime-only failures (Mamba block-size assertion, OOM at load, kernel-compat). Ok=true is necessary-but-not-sufficient; ok=false is definitive 'don't bother running it'.
2026-05-12 13:37:37 -05:00
Grant 6b799113c4 v0.6.0:1 - fix Qwen3.6 Mamba block-size assertion at launch
vLLM trips on launching Qwen3.6-35B-A3B-NVFP4 with:
  AssertionError: In Mamba cache align mode, block_size (2096) must be
  <= max_num_batched_tokens (2048).

Qwen3.6 uses a Mamba-attention hybrid. The default --max-num-batched-tokens of 2048 is just under the model's required block_size of 2096. The upstream sibling recipe (qwen3.5-35b-a3b-fp8.yaml) sets it to 16384; use the same value.

Earlier qwen36 swaps in this session worked because vLLM hadn't reached the Mamba-validation code path on that prior path (different attention backend pick or auto-retry). Whatever the reason, the explicit flag avoids the dance.

Also documented in known-issues.md.
2026-05-12 13:22:24 -05:00
Grant d90e7a230a v0.6.0 - Service-level connectivity tracking + passive failure-report endpoint
connectivity.py:
- Generalized 'spark' subject to any string; renamed 'spark' field to 'subject'
- Legacy v0.5 events with the old 'spark' field are migrated transparently on read (kind defaults to 'transition')
- New record_report(subject, ok, source, detail, latency_ms): always appends an event with kind='report'; does NOT mutate the current state (only active polling is authoritative)
- summary() returns events normalized to the new schema

Wiring:
- /api/status now calls record_state for vllm/parakeet/magpie (dedup on no-change)
- /api/services calls record_state for each service after its http check
- Result: dashboard observes service-level transitions automatically with no extra polling

Passive endpoint:
- POST /api/health-event with {service, ok, source?, error?, ms?}
- Useful for external apps (e.g. Open WebUI) to surface sub-poll-interval failures the dashboard would otherwise miss

UI:
- Connectivity dialog groups events by subject (hosts ordered first, then services)
- Per-subject summary shows transition count, down count, report count, failed-report count
- Transitions and reports render inline with distinct styling; reports show source app + error + latency
- Legacy v0.5 events render unchanged

Docs:
- README documents /api/health-event with a curl example

Package: bump to 0.6.0:0
2026-05-12 13:19:27 -05:00
Grant 6209c40f79 v0.5.0 - Wake-on-LAN + connectivity history
wol.py:
- build_magic_packet(): standard 6x0xFF + 16x MAC layout
- send_local_broadcast(): direct from container (ports 9 + 7 for safety)
- send_via_peer(): preferred path; SSHes to the OTHER Spark and runs a Python one-liner there so the packet originates on the target's LAN segment (most reliable)
- MAC validation + normalization

connectivity.py:
- /data/connectivity.json persistence (thread-safe, atomic rename)
- Stores per-Spark current state + last_change timestamp + rolling 200-event log
- Records up/down transitions; computes down_seconds / up_seconds durations
- MAC cache populated lazily during hardware probes

hardware.py:
- Probe now reads MAC via /sys/class/net/<default-route-iface>/address
- After each probe, record_state() emits a transition event if state changed
- record_mac() caches the address so WoL works when the Spark next goes down

Endpoints:
- GET /api/connectivity: macs, current state, last_change, events[]
- POST /api/spark/{name}/wake: tries via-peer first, falls back to direct broadcast

UI:
- Unreachable hardware card shows the cached MAC + 'Wake (WoL)' button (only if MAC known)
- New 'Connectivity log' button opens a modal with per-Spark transition history (last 25 each), including duration of each prior up/down period
- pollHardware also pulls /api/connectivity so WoL buttons appear without an extra fetch

Package: bump 0.5.0:0; main.ts sets CONNECTIVITY_LOG=/data/connectivity.json
2026-05-12 12:51:49 -05:00
Grant e332363004 v0.4.0 - NIM installer + dashboard resilience
Hotfix (was v0.3.1):
- services.py: cache 'unreachable' per (host,user) for 25s so a dead Spark doesn't hang every /api/services call behind 6s ssh timeout
- ssh_run timeout reduced 10 -> 6s for docker_state probes
- hardware probe: shorter SSH timeout (6s), longer cache TTL for failures (25s)
- JS pollStatus retries loadModels() if state.models is empty (recovers from cold-start proxy timeout)
- Unreachable hardware card now includes troubleshooting steps (Spark Control cannot SSH into an unreachable Spark to restart it)

v0.4 NIM installer:
- nim.py module: curated SUGGESTED_NIMS list (Parakeet, Magpie, Riva) + NimManager that runs docker login nvcr.io + docker pull + docker run -d --gpus all -p PORT:PORT -v VOL:/opt/nim/.cache -e NGC_API_KEY -e ... --restart=unless-stopped + chown the volume to uid 1000 + restart. Streams all output via SSE; redacts the API key from log lines.
- custom_services.py: persists installed NIMs to /data/services-overrides.yaml so they appear in the services panel after install
- services.py: merges custom services into the panel
- /api/nim/catalog GET, /api/nim/install POST + GET/SSE
- /api/services/{name} DELETE for custom services
- UI: '+ Install NIM' button next to 'Always-on services'; modal lists curated images each with a 'Pick' button + a custom-image form; installation runs in a second dialog with phase + elapsed timer + collapsible log
- NGC API key field added to Configure Sparks (masked); injected as NGC_API_KEY env var into the container

Package: bump 0.4.0:0; main.ts adds SERVICES_OVERRIDES + NGC_API_KEY env vars
2026-05-12 12:32:29 -05:00
Grant ea35ac03ef v0.3.0:1 - hotfix: parallel SSH probes + longer timeout
- Hardware probes for spark1 and spark2 now run via asyncio.gather (parallel) so the worst-case wall time is max(per-probe), not sum
- Bump per-probe SSH timeout from 8s to 12s to absorb first-call overhead (StrictHostKeyChecking=accept-new on first connect + nvidia-smi cold start)
- Unreachable Spark now shows up cleanly in the UI as a single 'unreachable' card with the error message
2026-05-12 12:14:36 -05:00
Grant 1a86fb0bf0 v0.3.0 - Hardware dashboard + knob context + Explain context + Open WebUI link
Hardware dashboard:
- New hardware.py module: SSH probes each Spark for hostname, uptime, load+cores, RAM, disk, GPU (name, util, temp, power) + per-process GPU memory sum
- DGX Spark uses unified memory (nvidia-smi memory.total returns N/A); fall back to per-process compute memory and compute fraction against system RAM. Marks with gpu_unified_memory=true.
- 4s TTL cache in HardwareProbe to avoid hammering
- /api/hardware returns per-Spark snapshot
- UI: 'Spark hardware' section at the top with per-Spark cards (CPU load, RAM, GPU mem (unified), GPU util + temp + power, disk) — bars with warn threshold styling
- Polls every 8s

Knob context (tied to live hardware):
- Each Advanced knob now shows plain-English help text
- 'GPU memory %' shows '~N GB allocated · ~M GB left for OS/buffers' computed from actual Spark RAM
- 'Max context' shows '~N pages of text'
- Toggles show tradeoff descriptions

Explain context:
- ' Explain context' button on the update banner
- /api/explain-updates POST: forwards pending commits to the loaded vLLM model and streams its response back as SSE
- Renders into an expandable 'Explained by the loaded LLM' section under Pending commits
- Reasoning tokens shown italicized when the model emits them

Open WebUI integration:
- New 'Open WebUI URL' optional field in Configure Sparks
- /api/config exposes it; UI shows 'Open chat ↗' button in the top bar if set

Downloads:
- Third radio option: Spark 1 only / Spark 2 only / Both Sparks
- Backend picks SSH target based on mode
- HF repo link icon next to the input
- Helper line about NVFP4 for Blackwell

Model cards:
- Repo name is now a clickable link to its Hugging Face page

Package: bump 0.3.0:0
2026-05-12 12:00:15 -05:00
Grant 66be0c1fc1 v0.2.4 - Hotfix: Unknown status + copy UX + update banner context
Bug fix:
- config.py: empty PARAKEET_CONTAINER / MAGPIE_CONTAINER env vars (from migrating to v0.2.0+ where the field is optional and saved as '') now fall back to 'parakeet-asr' / 'magpie-tts' via the 'or' idiom. Confirmed live: services classify as 'running' instead of 'unknown'.

UX:
- Replaced text 'Copy' buttons with compact icon buttons (clipboard SVG)
- Endpoint Base URL + Model ID + curl snippet are now click-to-copy themselves (the value AND a separate icon button)
- Service cards: host, base URL, and model are now three separate copyable rows
- Update banner: leading explanatory line — 'Updates to eugr/spark-vllm-docker — the upstream project that orchestrates vLLM on your Sparks. These are not firmware, OS, or model updates.' with a link to the repo.
2026-05-12 11:45:55 -05:00
Grant 91b5d6d6a6 docs: update README with v0.2 feature summary 2026-05-12 11:31:14 -05:00
Grant 4c67ccd28d v0.2.3 - Per-model Advanced settings + catalog-add for downloaded models
Backend:
- overrides.py: read/write /data/models-overrides.yaml (knobs + custom entries)
- apply_knobs_to_args(): strip matching flags from bundled vllm_args and append knob values, so knob changes properly override bundled defaults
- extract_knobs_from_args(): seed UI knob values from bundled args so the Advanced dialog has correct starting state
- models.py: load_catalog merges overrides on top of bundled yaml
- GET /api/models returns effective_knobs per model
- PUT /api/models/{key}/knobs persists knob changes
- POST /api/models adds a custom catalog entry
- DELETE /api/models/{key} removes a custom entry (bundled models cannot be deleted)
- swap_manager.reload_catalog() called after each mutation so swaps see latest

Frontend:
- New 'Advanced' button on every card opens a modal dialog: max-model-len input, gpu-memory-utilization slider, three optimization checkboxes (fastsafetensors, prefix caching, FP8 KV cache). Save persists; Cancel discards. Custom models also have a Delete button.
- After a successful download, automatically open the 'Add to catalog' dialog pre-filled with the repo, with the same knob defaults — user just enters key, display name, and clicks Save.
- Custom catalog entries are tagged with a blue 'custom' pill on the card.

Package: bump 0.2.3:0; main.ts sets MODELS_OVERRIDES=/data/models-overrides.yaml so overrides persist on the StartOS volume.
2026-05-12 11:30:47 -05:00
Grant ea328c2e2f v0.2.2 - spark-vllm-docker update checks + Apply Update
Backend:
- updates.py: get_update_status() runs git fetch + git rev-list --left-right --count HEAD...origin/main to learn ahead/behind/dirty, plus git log for pending commits
- UpdateManager class with asyncio.Lock; one update at a time
- POST /api/updates/apply triggers "git pull --ff-only && ./build-and-copy.sh -c" over SSH with streamed log + phase detection (Pulling / Building the vLLM container / Copying to peer Sparks)
- GET /api/updates returns {ok, behind, ahead, dirty, current, log[], branch}

Frontend:
- Persistent banner near footer: hidden when up-to-date, blue when N commits behind, warn (orange) when local dirty changes block update
- 'Show details' expands a list of pending commits
- 'Apply update' triggers the long-running build with phase + elapsed timer + collapsible logs
- Confirmation dialog explains the 5–40 min duration

Package: bump 0.2.2:0
2026-05-12 11:26:55 -05:00
Grant 27bfc2d6fd v0.2.1 - Model download with %% progress
Backend:
- download.py module: drives ./hf-download.sh <repo> [-c --copy-parallel] over SSH, parses tqdm output (regex matches '8%|...| 2.06G/25.1G [03:20<18:35, 20.6MB/s]') into percent + bytes done/total + elapsed + ETA + rate
- DownloadManager: in-memory job tracking with asyncio.Lock (one download at a time)
- POST /api/download, GET /api/download/{id}, SSE /api/download/{id}/stream
- Phase detection: Connecting / Fetching N files / Downloading / Copying to peer Sparks / Done

Frontend:
- '+ Download a new model' button next to LLM swap section title
- Inline form: HF repo text field + solo/cluster radio + Cancel/Start
- Progress UI: spinner, elapsed timer, phase label, percent fill, stats line (bytes/rate/ETA), collapsible raw logs

Package: bump 0.2.1:0
2026-05-12 11:24:31 -05:00
Grant 61e5d5cce8 v0.2.0 - Always-on services panel with per-service host config
Dashboard:
- New 'Always-on services' section with cards for Parakeet and Magpie
- Each card: host:port, model loaded, status pill (Healthy/Unhealthy/Starting/Not configured)
- Start, Restart, Stop buttons. Buttons disabled when not applicable for current state
- Restart counter shown when > 1 (would have surfaced the old magpie crash loop)

Backend:
- New /api/services GET: docker container state + http health for each support service
- New POST /api/services/{name}/{action} for start | stop | restart
- services.py module: docker_state, run_action via SSH
- config.py: PARAKEET_HOST/USER/CONTAINER and MAGPIE_* env vars, default to spark2_*
- health.py: use per-service hosts (no longer hard-wired to spark2_host)

Package:
- sparkConfig.yaml.ts: add 6 new optional fields
- configureSparks action: optional 'Parakeet host', 'Parakeet container', 'Magpie host', 'Magpie container' fields; descriptions explain they default to Spark 2 when blank
- Handler normalizes nulls to empty strings before merge
- main.ts: pass new env vars to container
- bump to 0.2.0:0
2026-05-12 11:21:15 -05:00
Grant 0aa4bfb303 known-issues: mark magpie crash loop RESOLVED with chown fix recipe
Volume magpie-model-cache was owned by root, container drops to uid 1000. Fix:
docker run --rm -v magpie-model-cache:/cache alpine chown -R 1000:1000 /cache
+ docker restart magpie-tts. After ~3 GB NGC model download, healthy on :9000.
2026-05-12 11:12:25 -05:00
Grant ad68e0e16e 0.1.0:4 - expose /api/endpoints as separate StartOS service interface
Adds a second sdk.createInterface with type='api' and path='/api/endpoints' on the
same uiPort (9999). StartOS dashboard now shows two service interfaces: Web UI and
OpenAI-compatible API. The API URL is discoverable to other services without users
needing to remember the /api/endpoints suffix.
2026-05-12 11:07:51 -05:00
Grant d6fefec017 0.1.0:3 - Show Public Key layout + /api/endpoints service-discovery
- showPublicKey now uses result.group: install command and raw key are each their own one-click copy box; description is brief
- /api/endpoints returns stable shape { vllm, parakeet, magpie } with base_url + model + ready, for other LAN services to consume without hardcoding Spark IPs
- health.py: parakeet/magpie now also expose base_url
- README: documented /api/endpoints shape
2026-05-12 10:52:57 -05:00
Grant 22c817a4ec 0.1.0:2 - remove 'modelo' default everywhere (it's Grant's username, not factory)
Per user correction: 'modelo' is not the DGX Spark factory default. Generic-ize:
- configureSparks: no default user; placeholder 'your SSH username'
- sparkConfig schema: empty string defaults
- main.ts env fallback: empty
- showPublicKey: drop the 'modelo' fallback; skip Spark if user not configured
- Update feedback memory with the correction
2026-05-12 10:39:57 -05:00
Grant d718a3b78a Bump to 0.1.0:1 — portability + endpoint display
- configureSparks.ts: generic placeholders (e.g. 192.168.1.10), no Grant-specific IPs; descriptions explain the role of each node instead of naming his hardware
- showPublicKey.ts: reads sparkConfig.yaml; emits a ready-to-paste one-liner (KEY='...' followed by 'ssh user@host "echo $KEY >> authorized_keys"' for each configured Spark). Falls back to generic instructions if Configure Sparks hasn't been run yet.
- /api/status now includes vllm.base_url for the OpenAI endpoint
- New endpoint panel in UI: base URL + model ID rows with copy buttons + collapsible curl example
- Bump version to 0.1.0:1
2026-05-12 10:38:18 -05:00
Grant f99241ec3e Add per-model descriptions + repo-cleanup polish
- models.yaml: add 'description' field for all 5 models (generic, anyone-can-use)
- ModelDef gains optional description: str | None field
- UI: render description below meta tags; mute the repo line further
- escapeHtml() for safety in case descriptions/names contain HTML chars
- Update runbook: how to add a new model with description
2026-05-12 10:19:09 -05:00
Grant d6f4390372 Add friendly swap UI: timer + phase indicator + progress bar + collapsible logs
- Elapsed timer (mm:ss) in top-right of swap panel
- Phase display: Stopping / Starting / Loading weights (N/M shards) / Compiling / Warming up / Ready
- Progress bar with smooth fill mapped from phase
- Raw vLLM logs hidden behind <details> 'Show technical logs'
- Detection from log content (safetensors %, torch.compile, Application startup, Ray cluster join)
- Backfill from /api/swap/{id} on reattach (mid-swap reload works)
2026-05-12 10:11:14 -05:00
Grant 8a95609504 Add Spark prerequisites section to runbook (spark-vllm-docker is upstream + Spark-side) 2026-05-12 10:05:17 -05:00
Grant f553547c32 Update README with build flow + post-install steps; note IPv6/mDNS quirk 2026-05-12 10:03:37 -05:00
Grant 49f776f172 Pack spark-control_x86_64.s9pk (55 MB)
- Move models.yaml into image/ so the docker build context is self-contained
- Fix manifest: dockerfile=../image/Dockerfile, workdir=../image
- Add LICENSE (MIT) and assets/README.md (StartOS marketplace listing)
- s9pk validates: id=spark-control, version=0.1.0:0, osVersion=0.4.0-beta.6, sdkVersion=1.3.3
- Image embeds python:3.12-slim + openssh-client + FastAPI app + models.yaml
2026-05-12 09:52:53 -05:00
Grant 4a3eeb4f20 Add safe optimization flags to gemma4 + qwen36 (fastsafetensors, prefix-caching, fp8 kv)
Aligned with sibling recipes in eugr/spark-vllm-docker. Applies on next swap to each model.
First real swap gemma4 -> qwen36 succeeded in 5:30 with --moe_backend=flashinfer_cutlass.
2026-05-12 09:49:08 -05:00
Grant f2beb500e7 Add StartOS 0.4 package scaffold (manifest, main, interfaces, 2 actions)
- package/Makefile + s9pk.mk + package.json + tsconfig.json
- startos/manifest: dockerBuild source pointing at ../image/Dockerfile
- startos/main: reads /data/config.yaml reactively, passes env vars to container
- startos/interfaces: binds port 9999 as HTTP UI
- startos/actions: showPublicKey (read /data/ssh/id_ed25519.pub), configureSparks
- TS + JS bundle compile clean (tsc --noEmit, ncc build)
2026-05-12 09:36:15 -05:00
Grant db2f4269da Initial scaffold: image/ FastAPI app, models.yaml, docs
- image/ FastAPI app: /api/status, /api/swap, /api/swap/{id}/stream, /api/test-connection
- models.yaml: 5-model catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen25-72b)
- README, runbook, known-issues
- Dry-run swap verified against live Spark 1 (gemma4 currently loaded)
2026-05-12 09:29:13 -05:00
41 changed files with 108 additions and 1845 deletions
-6
View File
@@ -11,11 +11,5 @@ node_modules/
dist/
build/
.DS_Store
# Claude Code — deny by default, allow-list shared wiring (see standards/portability.md)
.claude/*
!.claude/rules/
!.claude/agents/
!.claude/commands/
!.claude/skills/
!.claude/settings.json
+34 -13
View File
@@ -6,9 +6,6 @@ Browser-based StartOS 0.4 package controlling a dual NVIDIA DGX Spark AI cluster
Subsystem guidance lives in `docs/guides/` and loads when matching files are touched (Claude Code lazy-loads via `.claude/rules/` symlinks; other agents read the guides directly): `startos-package.md` (build/versioning, `package/**`), `fastapi-image.md` (dev server/env/layout, `image/**`), `redaction.md` (vendoring + test gates), `audio-speech.md` (parakeet patches, cluster-container footguns, audio testing). **Read `docs/guides/audio-speech.md` before touching the Sparks' containers over SSH** — ops sessions don't trip the path scoping.
> **Inbox check:** At session start, if `~/Projects/standards/INBOX.md` exists, scan it for
> items tagged `(spark-control)` and surface them before proposing next steps; triage with `/triage`.
## Stack
- Two halves, always coordinated:
@@ -23,7 +20,6 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
```bash
(cd package && make x86) # build the s9pk; make install sideloads (restarts live service — ask first)
(cd image && uvicorn app.server:app --port 9999) # local dev — needs env vars, see fastapi-image rule
(cd image && .venv/bin/python -m pytest) # offline unit suite (launch-cmd injection, label-merge)
(cd image && .venv/bin/python -m app.redaction.test_gateway) # offline redaction suite 1
(cd image && .venv/bin/python app/redaction/test_scrub_leak.py) # offline redaction suite 2
./scripts/test-audio-with-speakers.sh <audio-file> # e2e audio — hits the LIVE cluster
@@ -55,12 +51,37 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
## Current state
- **Live service runs v0.22.0:0** (installed and serving); **v0.23.0:0 is built, committed (`e783653`), tagged, and published to Gitea Releases but its live install is PENDING** — see the P3 line below. Working features: swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge; configurable vLLM port (Configure Sparks field, blank ⇒ 8888). Local/fine-tuned model support lands live once v0.23.0:0 installs. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
- **matrix-bridge bot tile (done, v0.21.0:1, verified live):** `bot`-kind service tile — status badge from docker-state only (no HTTP port), plus **Update** / Restart / Stop/Start / **View logs**. Code: `app/matrix_bridge.py` + `/api/matrix-bridge/{update,logs}` (update streams; 25-min cap; fail-loud). Driven directly as `modelo` on Spark 2 (**no `sudo -iu`** — spark2 has no passwordless sudo). User is a blank-default Configure-Sparks field (`matrix_bridge_user`); blank → tile hidden (portable). Host reuses `spark2_host` (`192.168.1.87` = the bot's box `spark-32d0`); container/dir/branch are env-overridable defaults. **Load-bearing ops dep:** Update's `git fetch` runs as `modelo`, which needs `modelo`'s `~/.ssh/config` pinning the Gitea deploy key with `IdentitiesOnly yes` — else the wrong key is offered and Gitea denies (publickey). Optional next, only if the bot dev asks: Docker `HEALTHCHECK` for running-but-disconnected detection (spec §Note).
- **Tests:** offline pytest harness in `image/tests/``cd image && .venv/bin/python -m pytest` (102 passing). Covers `build_launch_command` (incl. the shell-injection round-trip + local-model bind-mount), the transcript↔diarizer label-merge, the `shellsafe` validators, `matrix_bridge.build_update_command` (+ phase detection), and the configurable-topology layer (`test_topology.py`: `DISABLED_SERVICES` parsing, `vllm_container` override, disabled-service skip in `services_from_settings` + `check_*`, `probe_vllm_endpoint`). Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 14s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — SHIPPED **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). Committed `136a471`, pushed, tagged `v0.22.0`, rebuilt clean, installed, and **published to the self-hosted Gitea Releases** 2026-06-17 (`make release``scripts/gitea-release.sh`, takes `GITEA_URL` + a write token). **Distribution model (decided 2026-06-17):** Gitea Releases + a read-only token the adopter's agent uses to pull the latest s9pk (`GET /api/v1/repos/grant/spark-control/releases/latest` → download the `.s9pk` asset → sideload). Note: Gitea returns `browser_download_url` on its `.local` ROOT_URL, which won't resolve off-LAN — a remote adopter pulls via whatever address reaches the Gitea (the WireGuard IP). (2) **local-path/fine-tuned models** — DONE in tree, staged as **v0.23.0:0** (`ModelDef.local_path` + exactly-one-source validator; swap bind-mounts the dir at the same container path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook, **no `launch-cluster.sh` change**; "+ Add local model" UI form + `local` badge; `validate_local_path`; disk-delete refused for local; 94 tests pass. Reviewer-agent pass done, findings addressed (path validation + chat-template-location guard folded into the `ModelDef` validator so YAML/override entries are checked too; `_merge_overrides` skips a bad entry instead of failing the whole catalog; `VLLM_SPARK_EXTRA_DOCKER_ARGS` contract documented in `runbook.md`). **Committed `e783653`, tagged `v0.23.0`, built clean, published to Gitea Releases — but `make install` to the live Start9 FAILED: `immense-voyage.local` wasn't resolving via mDNS from the Mac (server up at `192.168.1.72`; `start-cli -H <ip>` reaches it but returns UNAUTHORIZED, auth bound to the registered `.local` host). FINISH-HERE: flush mDNS (`sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder`) or add a hosts entry, then re-run `cd package && make install`** (details in runbook → "Sideload can't reach the server"). (3) **configurable topology** — DONE in tree, staged as **v0.24.0:0** (built clean, not yet committed/installed). Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`, threaded into the swap log-tail + validator exec via `quote_arg`); "services to hide" (`DISABLED_SERVICES` comma list → `Settings.disabled_services` frozenset, skipped by `services_from_settings`, the `check_*` probes, deep-health `run_all`, and connectivity logging — kills the Parakeet-on-8000 collision); second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (`probe_vllm_endpoint` shared with `check_vllm`). `/api/endpoints` gained a `disabled` flag; the health-dot hides when disabled. 102 tests pass (+8 in `test_topology.py`). Swap mechanism deliberately NOT generalized to raw `docker run` (that's coordination, item 4). Install pending — same mDNS situation as v0.23.0. Next: (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
- **Working (v0.18.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack is healthy (11k+ requests/12h, all 200).
- **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 14s unresponsiveness while the single GPU is continuously busy. Remedy is client-side; a drafted message (in-flight cap 2, hard ceiling 3 global across audio endpoints, retry-with-backoff on timeout/503) is with the owner to forward to that dev.
- **Decided, not implemented:** remote access stays WireGuard/Tailscale split-tunnel — no public interface, so no API auth built; an empirical concurrency sweep is offered but needs the owner's explicit OK in a quiet window. **Revisit (full-eval 2026-06-12):** the "LAN-only, so no auth" call is now load-bearing against RCE — unquoted user input reaches the SSH shell on several endpoints, so the network boundary is the *only* thing preventing cluster takeover. Quoting the injection sinks (work queue) is needed regardless of the auth decision; a defense-in-depth auth/CSRF gate is the follow-on.
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
- **Portability:** working tree scrubbed 2026-06-12 — all owner-specific IPs/hostnames/usernames/names replaced with placeholders in tracked files; `claude-code-starter-prompt.md` deleted (old build-time prompt). Real cluster values live only in StartOS install config, shell env vars, and the gitignored `settings.local.json`. **Caveat (full-eval 2026-06-12): git *history* was not rewritten** — the old IPs/hosts/user `modelo`/key name are still recoverable pre-`50c67cd`. The scrub is working-tree-only; treat the repo as private until history is rewritten (see work queue below).
- **Repo wart:** commit `367d986` is labeled `v0.13.0:4` but actually contains everything through v0.18.0:0 — per-version commits for v0.14v0.18 are missing. Keep commit messages accurate going forward.
- **Hosting:** repo pushes to the owner's self-hosted Gitea — remote `gitea`, branch `master`, over SSH (host alias + key live in the local `~/.ssh/config`; no owner-specific details belong in the repo). Push there after committing.
- **Next (pre-eval backlog):** (1) owner forwards the concurrency note to the Signal Engine dev; (2) run the concurrency sweep if the dev wants the measured knee; (3) add the `--memory` cap to parakeet-asr via the Reapply-patches action; (4) pick the next item from ROADMAP.md.
### Full-eval triage (2026-06-12)
Source: `EVALUATION.md` at repo root (full evidence, file:line pointers, scorecard). Findings triaged below; do these before the pre-eval backlog above where they overlap.
**Work queue — P0/P1, fix before sharing the package wider:**
1. ~~**[P0] Shell-quote/validate every user value crossing into SSH**~~**DONE (code, 2026-06-12; not yet shipped).** New `image/app/shellsafe.py` (`validate_repo`/`validate_image`/`validate_container` whitelists + `quote_arg`/`quote_args`). Boundary validation added to `POST /api/models` (repo) and `POST /api/nim/install` (image+container); `shlex.quote` applied at every SSH sink — `models.build_launch_command` (repo+args, covers `vllm_args`+knobs), `download._do` (repo), `nim._do` (image/container/volume/port/env), `services.docker_state`+`run_action` (container). Verified: injection survives only as a single quoted token, vLLM preflight `shlex.split` round-trip intact, both redaction suites still pass. Side-benefit: NGC key now `shlex.quote`'d in `nim._do` (was single-quoted) — closes the quote-breakout half of the P2 NGC-key item; the process-list-exposure half remains. **Ship step pending:** version bump + release notes + rebuilt s9pk.
2. **[P0] Decide the git-history question** — owner IPs/hosts/user `modelo`/key name persist pre-`50c67cd` despite the working-tree scrub. Either rewrite history (`git-filter-repo`) + rotate the `id_ed25519_shared` key, or keep the repo private-forever. Blocks any public/shared publish. **(Open — git-ops decision, not code.)**
3. ~~**[P1] Defense-in-depth gate on mutating endpoints**~~**DONE (code, 2026-06-12; not yet shipped).** `csrf_guard` HTTP middleware in `server.py` rejects state-changing requests whose `Origin`/`Referer` hostname ≠ the served host. Scoped to control endpoints; the programmatic API surface is exempt (`/v1/*`, `/scrub`, `/rehydrate`, `/api/search`, `/api/audio/`, `/api/health-event`) so downstream consumers are unaffected. No app-layer token auth (deliberate — would break consumers + the non-technical owner). Verified via TestClient: cross-origin control POST→403, same-origin/no-Origin→pass, exempt prefixes always pass, GET never blocked. **Verify on-box:** confirm the StartOS reverse proxy passes `Host`/`Origin` so the dashboard isn't false-positive-blocked.
4. ~~**[P1] Validate the Qdrant `collection`**~~**DONE (code, 2026-06-12; not yet shipped).** `_safe_collection` whitelist (`[A-Za-z0-9._-]`, rejects `..`) + URL-encoded path segment in `embeddings_proxy.py`. The raw `filter` is left as a passthrough (Qdrant parses it; pydantic enforces `dict`) — locking it to an allowlist would break hybrid-search consumers; the path segment was the real injection vector.
**Shipping (all of #1/#3/#4 batched):** version bumped `0.18.0:1``0.19.0:0` with release notes (`versions/v0_1_0.ts`). Rebuild `make x86`; `make install` (live-service restart) needs explicit go-ahead. Not committed yet.
**Known debt — P2, track but not blocking:**
- Test coverage is redaction-only; swap state machine, proxies, SSH wrapper, and the package have zero automated tests. Live-cluster paths (swap exec, audio, embeddings/search) couldn't be exercised at all — biggest blind spot.
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml:6-13`).
- StartOS registry blockers (only if pursuing the registry): source not public + `packageRepo`/`upstreamRepo` are `example.com` placeholders (`manifest/index.ts:12-13`).
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` unset in dev (write to read-only `/data`) — catch the `OSError`.
- NGC API key inlined single-quoted into a remote shell command (`nim.py:147`) — pass via stdin/env.
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py:107`) — latent race as concurrency grows.
- Container runs uvicorn as **root** bound to `0.0.0.0:9999` (no `USER` in Dockerfile) — amplifies any RCE blast radius.
**Parked — P3+, do in bulk when next touching docs/packaging:**
- README Status block stale (`v0.2.3 / 0.13.0:4` → v0.18.0:1, undercounts features); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js:177`) + `task_id` reflected in scrub JSON.
- Packaging cosmetics: `marketingUrl` placeholder; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though manifest declares `aarch64`; release notes describe the scrub, not capabilities.
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
+3 -3
View File
@@ -12,13 +12,13 @@ This is a capable, well-documented single-operator control plane: a ~960-line Fa
- **Command injection → cluster RCE** is reported by *both* the evaluator (P1) and the security-auditor (P0) at the same sinks (`models.py:80`, `swap.py:101`, `download.py:129`, `nim.py:145-166`, `services.py:144`). The evaluator demonstrated `build_launch_command` producing a live `;`-separated command from a hostile `repo`. Merged as **one P0** — the auditor's adversarial evidence (browser/CSRF reachability over plaintext HTTP, no auth) escalates the evaluator's network-gated P1.
- **No auth on state-mutating endpoints** is the shared root enabler: the evaluator filed it P2 (documented/intentional), the auditor filed the **CSRF** angle P1 (a malicious page in the operator's browser can `fetch()` the mutating routes and chain into the P0 injections). Merged into one P1, noting the auditor's CSRF evidence escalates the evaluator's original P2.
- **Owner data exposure**: the evaluator flagged real IPs/username in the (gitignored, untracked) `.claude/settings.local.json`; the auditor independently found the same class of data — IPs, hostnames, user `<spark-user>`, key name — persisting in **git history** despite the v0.18.0:1 working-tree scrub. These are the same concern at two locations; the git-history copy is the P0.
- **Owner data exposure**: the evaluator flagged real IPs/username in the (gitignored, untracked) `.claude/settings.local.json`; the auditor independently found the same class of data — IPs, hostnames, user `modelo`, key name — persisting in **git history** despite the v0.18.0:1 working-tree scrub. These are the same concern at two locations; the git-history copy is the P0.
- **Front-end output hygiene**: the evaluator flagged `current_model` rendered via `innerHTML` without `escapeHtml` (`app.js:177`, P3); the exerciser noted `task_id` echoed verbatim in scrub JSON. The auditor read the UI as broadly `escapeHtml`-clean — see Disagreements.
## Priority queue
- [P0] Command injection via unquoted user input (`repo`, `vllm_args`, NIM `image`/`container`/`port`, custom-service `container`) interpolated into SSH shell commands → arbitrary RCE as the SSH user on the Sparks — `models.py:80`, `swap.py:101`, `download.py:129`, `nim.py:145-166`, `services.py:144`; demonstrated via `build_launch_command` — evaluator + security-auditor
- [P0] Owner infra topology (IPs `<spark-1-ip>`/`<spark-2-ip>`, QSFP `<spark-1-qsfp-ip>`/`<spark-2-qsfp-ip>`, hosts `<spark-1-host>`/`<spark-2-host>`, user `<spark-user>`, key `<ssh-key>`) persisted in git history despite the working-tree scrub → target list for the unauthenticated endpoints — security-auditor [RESOLVED 2026-06-12: history rewritten with git filter-repo; 0 hits across all refs]
- [P0] Owner infra topology (IPs `192.168.1.103/.87`, QSFP `192.168.100.10/11`, hosts `spark-27ea`/`spark-32d0`, user `modelo`, key `id_ed25519_shared`) persists in git history pre-`50c67cd` despite the working-tree scrub → target list for the unauthenticated endpoints — security-auditor
- [P1] No auth + no CSRF protection on state-changing endpoints (plaintext `http`, `interfaces.ts:8`) → any LAN peer, or a malicious page in the operator's browser, can drive swap/install/stop/delete and chain into the P0 injections — security-auditor (CSRF P1) + evaluator (auth P2, escalated)
- [P1] SSRF / Qdrant path injection: caller `collection` interpolated into the Qdrant URL with no validation and raw `filter` forwarded verbatim — `embeddings_proxy.py:237,175,204` — security-auditor
- [P2] Test coverage is redaction-only; the swap state machine, proxies, SSH wrapper, and the StartOS package have zero automated tests — evaluator
@@ -62,7 +62,7 @@ No lens score was overturned by cross-agent evidence; Security stays at 2 with t
## Suggested order of work
1. **Close the injection sinks**`shlex.quote` or strict-regex-validate every user-controlled value crossing into SSH (`repo`, `vllm_args`, NIM `image`/`container`/`port`, custom-service names); the safe pattern already exists in `disk.py:_SAFE_DIRNAME`. Cheap, local, independent of the auth decision. (P0)
2. **Decide the git-history question** before any wider sharing — rewrite history (`git-filter-repo`) and rotate the named `<ssh-key>` key, or commit to keeping the repo private-forever. (P0)
2. **Decide the git-history question** before any wider sharing — rewrite history (`git-filter-repo`) and rotate the named `id_ed25519_shared` key, or commit to keeping the repo private-forever. (P0)
3. **Add a defense-in-depth gate** on mutating endpoints — an `Origin`/referer check or a shared-token header in middleware — so a misconfigured StartOS exposure isn't instant RCE; leave read-only probes open. (P1)
4. **Harden the remaining inputs** — validate the Qdrant `collection`, pin dependency floors + commit a lockfile, add upload size caps, drop the root container `USER`. (P1P2)
5. **Add a minimal pytest harness** for `build_launch_command` (incl. injection cases), the swap state transitions, and `_merge_words_with_speakers` — the untested core. (P2)
+1 -1
View File
@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2026 Alice
Copyright (c) 2026 Grant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
+1 -34
View File
@@ -2,23 +2,8 @@
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)
Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU).
**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap``GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability.
Sequenced:
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
2. **Local-path / fine-tuned model support** — DONE, v0.23.0:0. Catalog/`ModelDef` gained `local_path` (exactly one of `repo`/`local_path`); swap bind-mounts the dir into the vLLM container at the same path via the launch script's `VLLM_SPARK_EXTRA_DOCKER_ARGS` hook (no `launch-cluster.sh` change); "+ Add local model" form + `local` badge; disk-delete refused for local models; `validate_local_path` boundary check. His merged `ten31-v2` was the motivating case.
3. **Configurable topology** — DONE, v0.24.0:0. Three optional Configure-Sparks knobs: vLLM container name (`VLLM_CONTAINER`, blank ⇒ `vllm_node`; threaded through the swap log-tail + pre-flight validator via `quote_arg`); "services to hide" (`DISABLED_SERVICES`, comma list — hidden services show no tile and are skipped by status/deep-health/connectivity probes, killing the Parakeet-on-8000 collision); and a second-Spark vLLM monitor via a `kind: vllm` custom service in `services-overrides.yaml` (read-only tile probed through the shared `probe_vllm_endpoint`). `/api/endpoints` gained a `disabled` flag. Covers report P4/P5/#6. (Generalizing the *swap* mechanism to the adopter's raw `docker run` was deliberately left out — that's coordination, item 4; he swaps via his own crons and uses Spark Control to monitor.)
4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
- **Schedule visibility** — read-only view the dashboard surfaces, *registered by* external schedulers (Spark Control does not own the schedule).
## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- parakeet-asr `--memory` cap, shipped via the Reapply-patches action (guards against swap-thrash on very long audio).
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
## Audio quality
@@ -37,21 +22,3 @@ Sequenced:
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.
## Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)
P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:
**P2 — track:**
- No automated tests beyond the two redaction suites — swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for `build_launch_command` (incl. injection cases), swap transitions, and `_merge_words_with_speakers` is the highest-value start.
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml`).
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` unset in dev (write to read-only `/data`) — catch the `OSError`.
- NGC API key still appears on the remote process command line (`nim.py`) — the quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py`) — latent race as concurrency grows.
- Container runs uvicorn as **root** bound to `0.0.0.0:9999` (no `USER` in Dockerfile) — amplifies any RCE blast radius.
**P3 — bulk-fix when next touching docs/packaging:**
- README Status block stale (`v0.2.3 / 0.13.0:4` → now v0.19.0:0); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js`) + `task_id` reflected in scrub JSON.
- Packaging: `marketingUrl`/`packageRepo`/`upstreamRepo` are `example.com` placeholders; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though the manifest declares `aarch64`.
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
- StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
+1 -1
View File
@@ -9,7 +9,7 @@ from the live deployment.
## 1. Connection / auth
- **Base URL:** `https://<spark-control-host>` (the operator's Start9 LAN address,
e.g. `https://<spark-control-host>:62419`). A `.local` form also exists (survives IP
e.g. `https://192.168.1.72:62419`). A `.local` form also exists (survives IP
changes); the operator can provide it.
- **TLS:** Start9's self-signed Root CA. On the LAN, set `verify=False` /
`rejectUnauthorized:false` (curl `-k`), or install the Start9 Root CA into your
+1 -6
View File
@@ -24,17 +24,12 @@ Other env vars: `BIND_PORT`, `MODELS_YAML`, `SSH_DIR`, `SSH_KNOWN_HOSTS`, `MODEL
## Tests
Two kinds, both run with the `image/.venv` interpreter (system python3 has no deps):
- **pytest unit suite** — offline, pure functions, no cluster. `.venv/bin/python -m pytest` from `image/`. Lives in `image/tests/`; currently covers `build_launch_command` (incl. the shell-injection / `shlex` round-trip invariant) and the transcript↔diarizer label-merge (`_merge_words_with_speakers`). Install the test dep once with `pip install -e '.[dev]'`. Add new pure-function coverage here.
- **Standalone scripts** — the redaction suites and the live-cluster audio e2e are run directly (not via pytest). See the redaction and audio rules.
No pytest harness — each suite is a standalone script run with the `image/.venv` interpreter (system python3 has no deps). See the redaction and audio rules for the suites themselves.
## Conventions
- Pydantic request models go at **module scope**, never inside a `build_router()` body (FastAPI silently 422s otherwise).
- New external-facing endpoints get documented in `docs/` (`AUDIO_API.md`, `EMBEDDINGS.md`, `REDACTION_GATEWAY.md`) and noted in release notes.
- **SSH-input safety:** any user-supplied value that reaches an SSH command on the Sparks MUST go through `app/shellsafe.py` — validate against a whitelist at the API boundary, then `quote_arg`/`quote_args` (`shlex.quote`) at the sink. Never raw f-string a user value into a command string. Existing sinks: `models.build_launch_command`, `download`, `nim`, `services`; `disk.py` keeps its own `_SAFE_DIRNAME` because it needs `$HOME` to expand server-side. The vLLM pre-flight (`validate.py`) relies on `shlex.split` cleanly reversing this quoting — preserve that invariant.
- **CSRF / same-origin:** state-mutating *control* endpoints are guarded by the `csrf_guard` middleware in `server.py` (rejects requests whose `Origin`/`Referer` host ≠ the served host). A new endpoint meant to be called **cross-origin by downstream apps** (a proxy/data endpoint) must be added to `_CSRF_EXEMPT_PREFIXES`, or browser POSTs from those apps will 403. No app-layer token auth by design (LAN/VPN-only; would break consumers).
## Layout
-16
View File
@@ -25,22 +25,6 @@ npm run prettier # prettier --write startos (no semicolons, single quotes, tra
- Version format is `X.Y.Z:N` (`:N` = revision). Bump in `package/startos/versions/v0_1_0.ts`; **replace** the release notes — never leave old notes behind under an extra key (any unknown key fails `tsc`).
- New external-facing endpoints get noted in release notes for downstream app developers (Recap Relay, Ten31 Transcripts, CRM, Signal Engine consume these APIs).
## Releasing to Gitea
The s9pk is distributed via Gitea **Releases** (the binary is gitignored — never commit it). Adopters pull the latest asset with a read-only token. Per-version ritual:
```bash
# 1. bump version in startos/versions/v0_1_0.ts (+ replace release notes), then:
cd package && make x86 # build
# 2. commit + push the source change
git tag vX.Y.Z && git push gitea vX.Y.Z # tag — plain vX.Y.Z, NO ':' (git refs forbid it)
make install # optional: sideload to your own server (restarts it — go/no-go)
# 3. publish the s9pk as a release asset (needs a write-scoped token):
GITEA_URL=https://<gitea-host> GITEA_TOKEN=<write-token> make release
```
`make release``scripts/gitea-release.sh`: creates/reuses the release for the tag and uploads (replacing) the s9pk asset; idempotent, fails loud on real HTTP errors. `GITEA_INSECURE=1` skips TLS verify for a self-signed LAN cert. Hand adopters a **read-only** token (repository: Read), ideally on a dedicated reader account; their agent then `GET`s `/api/v1/repos/<owner>/spark-control/releases/latest` and downloads the `.s9pk` asset. Note Gitea returns `browser_download_url` on its configured ROOT_URL (may be a `.local` name) — an off-LAN adopter pulls via whatever address actually reaches the Gitea.
## Layout
- `package/startos/` — manifest, interfaces, actions (`configureSparks`, `showPublicKey`), `versions/v0_1_0.ts` (current version string + release notes).
+2 -2
View File
@@ -41,7 +41,7 @@ from .config import Settings
logger = logging.getLogger("spark-control.audio")
# Kokoro default voice. The four curated voices below were Alice-tested for
# Kokoro default voice. The four curated voices below were Grant-tested for
# narration/recap-style content; bm_george is the default. Clients can pass
# any of Kokoro's 67 voices in the `voice` field — see /v1/models.
DEFAULT_VOICE = "bm_george"
@@ -100,7 +100,7 @@ def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
"kind": "stt",
},
]
# Curated first — these are the four Alice chose for narration/recap.
# Curated first — these are the four Grant chose for narration/recap.
seen = set()
for v in CURATED_VOICES:
data.append({
+7 -77
View File
@@ -1,54 +1,13 @@
from __future__ import annotations
import logging
import os
from dataclasses import dataclass
from pathlib import Path
from .shellsafe import validate_container
log = logging.getLogger(__name__)
def _env(name: str, default: str = "") -> str:
return os.environ.get(name, default)
def _env_container(name: str, default: str) -> str:
"""Resolve a container-name env var, validating it at the config boundary.
The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
the sink — but per the repo's two-layer convention it's also whitelist-checked
here. A malformed optional value falls back to `default` rather than crashing
daemon startup (mirrors `_env_int` for VLLM_PORT)."""
val = os.environ.get(name, "") or default
try:
return validate_container(val)
except ValueError:
log.warning("ignoring invalid %s=%r; using %r", name, val, default)
return default
def _env_set(name: str) -> frozenset[str]:
"""Parse a comma-separated env var into a lowercased frozenset of keys.
Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
support service can switch its tile + probes off entirely (rather than have
the probe hit whatever else listens on that port — e.g. a vLLM sharing
Parakeet's default 8000)."""
raw = os.environ.get(name, "")
return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
def _env_int(name: str, default: int) -> int:
"""Parse an int env var, falling back to `default` when unset, blank, or
malformed. The StartOS Configure panel passes optional numeric fields as an
empty string when left blank, so a bare int("") would crash daemon startup."""
try:
return int(os.environ.get(name, "") or default)
except (TypeError, ValueError):
return default
def _resolve_models_yaml() -> str:
if env := os.environ.get("MODELS_YAML"):
return env
@@ -83,19 +42,12 @@ class Settings:
qdrant_user: str
qdrant_container: str
qdrant_collection: str
matrix_bridge_host: str
matrix_bridge_user: str
matrix_bridge_container: str
matrix_bridge_dir: str
matrix_bridge_branch: str
redaction_map_db: str
redaction_map_ttl: int
ssh_key_path: str
ssh_known_hosts: str
models_yaml: str
vllm_port: int
vllm_container: str
disabled_services: frozenset[str]
parakeet_port: int
kokoro_port: int
embed_port: int
@@ -129,40 +81,18 @@ class Settings:
qdrant_user=_env("QDRANT_USER") or spark2_user,
qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
qdrant_collection=_env("QDRANT_COLLECTION", ""),
# matrix-bridge bot container, driven as its own SSH user (the owner
# of the ~/matrix-bridge git clone) so git/docker run unprivileged.
# The user is BLANK by default and set via the "Configure Sparks"
# action; leaving it blank reports the service as unconfigured, which
# hides the tile. That keeps the shared package portable — a
# deployment without the bot never shows a stray tile or a hardcoded
# username. Host defaults to Spark 2 (same box); container/dir/branch
# are sensible defaults. All are env-overridable.
matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
# Redaction gateway pseudonym-map store (server-held de-anon key).
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")),
ssh_key_path=_env("SSH_KEY_PATH"),
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
models_yaml=_resolve_models_yaml(),
vllm_port=_env_int("VLLM_PORT", 8888),
# Container name for the swappable vLLM on Spark 1. Defaults to the
# bundled launch-cluster.sh container; override if you named yours
# something else (the swap log-tail and pre-flight validator exec
# into it by name).
vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
# Built-in support-service keys (parakeet, kokoro, embeddings,
# qdrant) the deployment doesn't run — hidden from the dashboard and
# never probed.
disabled_services=_env_set("DISABLED_SERVICES"),
parakeet_port=_env_int("PARAKEET_PORT", 8000),
kokoro_port=_env_int("KOKORO_PORT", 8880),
embed_port=_env_int("EMBED_PORT", 8088),
qdrant_port=_env_int("QDRANT_PORT", 6333),
bind_port=_env_int("BIND_PORT", 9999),
vllm_port=int(_env("VLLM_PORT", "8888")),
parakeet_port=int(_env("PARAKEET_PORT", "8000")),
kokoro_port=int(_env("KOKORO_PORT", "8880")),
embed_port=int(_env("EMBED_PORT", "8088")),
qdrant_port=int(_env("QDRANT_PORT", "6333")),
bind_port=int(_env("BIND_PORT", "9999")),
open_webui_url=_env("OPEN_WEBUI_URL", ""),
ngc_api_key=_env("NGC_API_KEY", ""),
)
-11
View File
@@ -10,17 +10,6 @@ Format:
port: 8001
health_path: /health
image: nvcr.io/nim/nvidia/riva-multilingual:latest
A `kind: vllm` entry monitors an additional vLLM on another Spark (read-only —
the swap machinery only drives the primary Spark 1 vLLM). It gets a health tile
probed via /v1/models plus container state and start/stop/restart:
custom:
- key: vllm-spark2
kind: vllm
host: <spark-2-ip>
user: <ssh-user>
container: vllm_node
port: 8000
"""
from __future__ import annotations
import os
-4
View File
@@ -377,10 +377,6 @@ class DeepHealth:
async def run_all(self) -> dict[str, ProbeResult]:
results = {}
for name in self.PROBES:
# Don't deep-probe a service the deployment switched off — its port
# may be answered by something else (e.g. a vLLM on Parakeet's 8000).
if name in self.settings.disabled_services:
continue
results[name] = await self.run_one(name)
return results
+4 -41
View File
@@ -15,7 +15,6 @@ from dataclasses import dataclass
from typing import Optional
from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run
@@ -77,52 +76,16 @@ async def probe_host(host: str, user: str, repo: str, settings: Settings) -> Hos
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
async def probe_local_host(host: str, user: str, path: str, settings: Settings) -> HostDiskResult:
"""Return whether a local model directory exists on this host and its size.
For locally fine-tuned models (a Spark directory, not an HF cache entry). The
path is whitelisted at the API boundary (shellsafe.validate_local_path); we
shlex-quote it here in depth.
"""
if not host or not user:
return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
qp = quote_arg(path)
cmd = f"if [ -d {qp} ]; then du -sb {qp} 2>/dev/null | cut -f1; else echo MISSING; fi"
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
if rc != 0:
return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
raw = out.strip()
if raw == "MISSING" or raw == "":
return HostDiskResult(host=host, on_disk=False)
try:
size = int(raw.splitlines()[-1])
except ValueError:
return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
async def probe_disk(
repo: str, mode: str, settings: Settings, *, local_path: str | None = None
) -> DiskStatus:
"""Probe one model across the relevant Sparks based on its mode (solo|cluster).
A local model (local_path set) is probed by directory; otherwise by HF cache.
"""
async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
"""Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
if mode == "cluster" and settings.spark2_host:
hosts.append((settings.spark2_host, settings.spark2_user))
if local_path:
results = await asyncio.gather(
*(probe_local_host(h, u, local_path, settings) for h, u in hosts)
)
key = local_path
else:
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
key = repo
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
on_disk = any(r.on_disk for r in results)
total = sum(r.size_bytes for r in results)
return DiskStatus(repo=key, on_disk=on_disk, total_bytes=total, per_host=list(results))
return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))
async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
-8
View File
@@ -26,9 +26,6 @@ echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.dra
echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
WGIF=$(ip -o link show type wireguard 2>/dev/null | awk -F': ' 'NR==1 {print $2}')
echo WG_IFACE=$WGIF
echo WG_ADDR=$(ip -o -4 addr show "$WGIF" 2>/dev/null | awk 'NR==1 {print $4}')
""".strip()
@@ -87,11 +84,6 @@ def _parse(out: str) -> dict:
# MAC address on the default-route interface (for Wake-on-LAN)
if info.get("mac"):
parsed["mac"] = info["mac"].lower()
# WireGuard tunnel membership: name + address of the first wg interface, if
# any. Read-only and unprivileged (`ip` needs no root), so it never depends
# on sudo and never breaks the probe — absence just yields no badge.
parsed["wg_iface"] = info.get("wg_iface") or None
parsed["wg_addr"] = info.get("wg_addr") or None
return parsed
+9 -34
View File
@@ -6,28 +6,17 @@ from .config import Settings
_TIMEOUT = 3.0
def _disabled(settings: Settings, key: str) -> dict | None:
"""A clean 'disabled' verdict if `key` is in DISABLED_SERVICES, else None.
Lets an adopter who doesn't run a given support service switch its probe off
entirely — so the probe never hits whatever else listens on that port, and
the connectivity log doesn't record it as perpetually down."""
if key in settings.disabled_services:
return {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
return None
async def probe_vllm_endpoint(host: str, port: int) -> dict:
"""Probe any OpenAI-compatible vLLM at host:port via /v1/models.
Shared by the primary (Spark 1) health check and any extra vLLM registered
as a custom service (kind: vllm) to monitor a second Spark."""
base_url = f"http://{host}:{port}/v1" if host else None
if not host:
return {"ok": False, "error": "vllm host not configured", "base_url": base_url}
async def check_vllm(settings: Settings) -> dict:
base_url = (
f"http://{settings.spark1_host}:{settings.vllm_port}/v1"
if settings.spark1_host
else None
)
if not settings.spark1_host:
return {"ok": False, "error": "spark1 not configured", "base_url": base_url}
try:
async with httpx.AsyncClient(timeout=_TIMEOUT) as c:
r = await c.get(f"http://{host}:{port}/v1/models")
r = await c.get(f"http://{settings.spark1_host}:{settings.vllm_port}/v1/models")
r.raise_for_status()
ids = [m["id"] for m in r.json().get("data", [])]
return {
@@ -40,15 +29,7 @@ async def probe_vllm_endpoint(host: str, port: int) -> dict:
return {"ok": False, "error": str(e), "base_url": base_url}
async def check_vllm(settings: Settings) -> dict:
if not settings.spark1_host:
return {"ok": False, "error": "spark1 not configured", "base_url": None}
return await probe_vllm_endpoint(settings.spark1_host, settings.vllm_port)
async def check_parakeet(settings: Settings) -> dict:
if d := _disabled(settings, "parakeet"):
return d
base_url = (
f"http://{settings.parakeet_host}:{settings.parakeet_port}"
if settings.parakeet_host
@@ -66,8 +47,6 @@ async def check_parakeet(settings: Settings) -> dict:
async def check_kokoro(settings: Settings) -> dict:
if d := _disabled(settings, "kokoro"):
return d
base_url = (
f"http://{settings.kokoro_host}:{settings.kokoro_port}"
if settings.kokoro_host
@@ -89,8 +68,6 @@ async def check_kokoro(settings: Settings) -> dict:
async def check_embeddings(settings: Settings) -> dict:
if d := _disabled(settings, "embeddings"):
return d
base_url = (
f"http://{settings.embed_host}:{settings.embed_port}"
if settings.embed_host
@@ -112,8 +89,6 @@ async def check_embeddings(settings: Settings) -> dict:
async def check_qdrant(settings: Settings) -> dict:
if d := _disabled(settings, "qdrant"):
return d
base_url = (
f"http://{settings.qdrant_host}:{settings.qdrant_port}"
if settings.qdrant_host
-186
View File
@@ -1,186 +0,0 @@
"""Update + logs for the matrix-bridge bot container on the Spark.
matrix-bridge is a single Docker container managed by docker compose out of a
git clone at `~matrix_bridge_user/matrix-bridge`. Status (the badge) and
start/stop/restart ride the generic service machinery in `services.py`
(`docker_state` / `run_action`). The two things that don't fit that mould live
here:
- **Update** — `git fetch && git reset --hard origin/<branch> && docker
compose up -d --build`. Long-running (docker build), so it streams like the
vLLM `UpdateManager`: fire-and-forget job, SSE stream, fail-loud rc.
- **Logs** — a one-shot `docker logs --tail N` for diagnosing a red badge.
We connect **directly as the configured user** (`modelo` — the repo owner), so
git never trips its dubious-ownership guard and docker runs via the user's
docker-group membership. We deliberately do NOT `sudo -iu modelo`: this Spark
has no passwordless sudo, so a sudo wrap would hang in SSH BatchMode.
"""
from __future__ import annotations
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run, ssh_stream, StreamHandle
# Hard ceiling on a single update. A first build after a base-image bump is
# slow (minutes); the cache makes later ones quick. 25 min is generous headroom
# without letting a genuinely wedged build spin forever.
_UPDATE_TIMEOUT_S = 1500
def build_update_command(directory: str, branch: str) -> str:
"""The update one-liner, run from the bot's git clone as its owner.
`directory` and `branch` come from operator config (not request input), so
they're interpolated directly — same trust model as the Spark hostnames in
`health`/`updates`. `directory` may be `~/...`, which must stay unquoted so
the remote login shell expands it; quoting would defeat that.
"""
return (
f"cd {directory} && "
f"git fetch origin && "
f"git reset --hard origin/{branch} && "
f"docker compose up -d --build"
)
def _phase_for(line: str) -> Optional[str]:
"""Map a streamed output line to a human-readable phase, or None to keep
the current phase. Kept loose — compose/buildkit output varies by version."""
low = line.lower()
if "git reset" in low or "head is now at" in low:
return "Resetting to the latest release…"
if "docker compose" in low or "buildkit" in low or low.startswith("step ") or "=> " in line or "building " in low:
return "Building the bot image…"
if "recreate" in low or "starting" in low or "started" in low or "container matrix-bridge" in low:
return "Recreating the container…"
if "already up to date" in low:
return "No new code; rebuilding…"
return None
@dataclass
class UpdateJob:
id: str
started_at: str
state: str = "starting"
lines: list[str] = field(default_factory=list)
returncode: Optional[int] = None
finished_at: Optional[str] = None
phase: str = "Starting…"
def append(self, line: str) -> None:
self.lines.append(line)
if len(self.lines) > 1000:
del self.lines[: len(self.lines) - 1000]
class MatrixBridgeManager:
def __init__(self, settings: Settings) -> None:
self.settings = settings
self.lock = asyncio.Lock()
self.jobs: dict[str, UpdateJob] = {}
self.current_job_id: Optional[str] = None
def _configured(self) -> bool:
s = self.settings
return bool(s.matrix_bridge_host and s.matrix_bridge_user)
def get(self, job_id: str) -> UpdateJob | None:
return self.jobs.get(job_id)
async def fetch_logs(self, tail: int = 100) -> dict:
"""One-shot `docker logs --tail N <container>` (stderr merged in)."""
s = self.settings
if not self._configured():
return {"ok": False, "error": "matrix-bridge host not configured"}
tail = max(1, min(int(tail), 1000))
# tail is already int-clamped, but quote at the sink anyway so the
# shellsafe convention (no raw interpolation into an SSH command) holds
# regardless of caller.
cmd = f"docker logs --tail {quote_arg(str(tail))} {quote_arg(s.matrix_bridge_container)} 2>&1"
rc, out, err = await ssh_run(
s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, timeout=20
)
return {
"ok": rc == 0,
"rc": rc,
"container": s.matrix_bridge_container,
"output": (out or err).strip(),
}
async def trigger_update(self) -> UpdateJob:
if not self._configured():
raise RuntimeError("matrix-bridge host not configured")
if self.lock.locked():
raise RuntimeError("An update is already in progress")
job = UpdateJob(
id=uuid.uuid4().hex[:8],
started_at=datetime.now(timezone.utc).isoformat(),
)
self.jobs[job.id] = job
self.current_job_id = job.id
asyncio.create_task(self._run(job))
return job
async def _run(self, job: UpdateJob) -> None:
async with self.lock:
try:
await self._do(job)
if job.state != "failed":
job.state = "done"
job.returncode = 0
job.phase = "Done"
except asyncio.TimeoutError:
job.append(f"[error] update timed out after {_UPDATE_TIMEOUT_S}s")
job.state = "failed"
job.returncode = 124
job.phase = "Timed out"
except Exception as e:
job.append(f"[error] {type(e).__name__}: {e}")
job.state = "failed"
if job.returncode is None:
job.returncode = 1
finally:
job.finished_at = datetime.now(timezone.utc).isoformat()
if self.current_job_id == job.id:
self.current_job_id = None
async def _do(self, job: UpdateJob) -> None:
s = self.settings
cmd = build_update_command(s.matrix_bridge_dir, s.matrix_bridge_branch)
job.append(f"$ {cmd}")
job.state = "running"
job.phase = "Fetching latest code…"
handle = StreamHandle()
gen = ssh_stream(s.matrix_bridge_host, s.matrix_bridge_user, cmd, s, handle=handle)
deadline = time.monotonic() + _UPDATE_TIMEOUT_S
try:
while True:
remaining = deadline - time.monotonic()
if remaining <= 0:
raise asyncio.TimeoutError
try:
line = await asyncio.wait_for(gen.__anext__(), timeout=remaining)
except StopAsyncIteration:
break
job.append(line)
phase = _phase_for(line)
if phase:
job.phase = phase
finally:
# Closing the generator terminates the underlying ssh process and
# populates handle.returncode via ssh_stream's finally block.
await gen.aclose()
rc = handle.returncode or 0
if rc != 0:
job.state = "failed"
job.returncode = rc
+8 -78
View File
@@ -1,33 +1,15 @@
from __future__ import annotations
import logging
from typing import Literal, Optional
import yaml
from pydantic import BaseModel, Field, model_validator
from pydantic import BaseModel, Field
from .overrides import apply_knobs_to_args, load_overrides
from .shellsafe import quote_arg, quote_args, validate_local_path
log = logging.getLogger(__name__)
def _chat_template_path(vllm_args: list[str]) -> str | None:
"""Extract the path from a `--chat-template=<path>` arg, if present."""
for a in vllm_args:
if a.startswith("--chat-template="):
return a.split("=", 1)[1]
return None
def _is_within(path: str, base: str) -> bool:
"""True if `path` is `base` itself or lives inside it (lexical check)."""
base = base.rstrip("/")
return path == base or path.startswith(base + "/")
from .shellsafe import quote_arg, quote_args
class ModelDef(BaseModel):
display_name: str
repo: str = "" # HF 'org/name'; empty for a local model
local_path: str | None = None # absolute dir on the Spark; set => local model
repo: str
size_gb: float
mode: Literal["solo", "cluster"]
capabilities: list[str] = Field(default_factory=list)
@@ -37,38 +19,6 @@ class ModelDef(BaseModel):
knobs: dict | None = None # user-customized; merged at launch time
custom: bool = False # True if this came from /data overrides
@model_validator(mode="after")
def _validate_source(self) -> "ModelDef":
if bool(self.repo) == bool(self.local_path):
raise ValueError(
f"model {self.display_name!r} must set exactly one of 'repo' (HF) "
f"or 'local_path' (Spark directory)"
)
if self.local_path:
# Single place that enforces the path whitelist, so YAML/override
# entries get the same boundary check as the API. The quote_arg sink
# is still defense-in-depth.
validate_local_path(self.local_path)
# Only local_path is bind-mounted into the vLLM container, so any
# --chat-template path must live inside it or vLLM can't find it.
tmpl = _chat_template_path(self.vllm_args)
if tmpl is not None and not _is_within(tmpl, self.local_path):
raise ValueError(
f"--chat-template path {tmpl!r} must be inside the model "
f"directory {self.local_path!r} (only that directory is mounted "
f"into the container)"
)
return self
@property
def is_local(self) -> bool:
return bool(self.local_path)
@property
def source(self) -> str:
"""What `vllm serve` is pointed at: the local dir if set, else the HF repo."""
return self.local_path if self.local_path else self.repo
class Defaults(BaseModel):
port: int = 8888
@@ -97,8 +47,7 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
continue
defaults_dump = {
"display_name": entry.get("display_name", key),
"repo": entry.get("repo", ""),
"local_path": entry.get("local_path"),
"repo": entry["repo"],
"size_gb": float(entry.get("size_gb", 0)),
"mode": entry.get("mode", "solo"),
"capabilities": entry.get("capabilities") or [],
@@ -108,12 +57,7 @@ def _merge_overrides(catalog: Catalog) -> Catalog:
"knobs": entry.get("knobs"),
"custom": True,
}
# A single malformed override entry (bad path, missing source, etc.) must
# not take down the whole catalog — skip it and keep the rest loadable.
try:
new_models[key] = ModelDef.model_validate(defaults_dump)
except Exception as e:
log.warning("skipping invalid custom model %r: %s", key, e)
new_models[key] = ModelDef.model_validate(defaults_dump)
return Catalog(defaults=catalog.defaults, models=new_models)
@@ -134,21 +78,7 @@ def build_launch_command(key: str, model: ModelDef, defaults: Defaults) -> str:
solo = "--solo " if model.mode == "solo" else ""
base_args = apply_knobs_to_args(list(model.vllm_args), model.knobs)
args = [f"--port={defaults.port}", f"--host={defaults.host}", *base_args]
# source + args are user-controlled (custom models, knobs); shlex.quote each
# so they cannot break out of the SSH shell command. shlex.split (used by the
# repo + args are user-controlled (custom models, knobs); shlex.quote each so
# they cannot break out of the SSH shell command. shlex.split (used by the
# vLLM pre-flight validator) cleanly reverses this quoting.
prefix = ""
if model.local_path:
# A local model's directory isn't in the HF cache the launch script
# already mounts, so bind-mount it at the SAME path inside the vllm
# container via the script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook. Same
# path inside and out means `vllm serve <dir>` and any
# `--chat-template=<dir>/...` arg both resolve. No launch-cluster.sh
# change needed. (The env assignment sits before the script, so the
# validator's `serve`-keyed shlex round-trip is unaffected.)
mount = quote_arg(f"-v {model.local_path}:{model.local_path}")
prefix = f"VLLM_SPARK_EXTRA_DOCKER_ARGS={mount} "
return (
f"{prefix}./launch-cluster.sh {solo}-d exec vllm serve "
f"{quote_arg(model.source)} {quote_args(args)}"
)
return f"./launch-cluster.sh {solo}-d exec vllm serve {quote_arg(model.repo)} {quote_args(args)}"
+1 -7
View File
@@ -14,7 +14,7 @@ Shape:
custom:
- key: my-new-model
display_name: My New Model (from download)
repo: my-org/my-model # an HF repo; OR set local_path instead (exactly one)
repo: my-org/my-model
size_gb: 20
mode: solo
description: null
@@ -25,12 +25,6 @@ Shape:
fastsafetensors: true
prefix_caching: true
kv_cache_dtype: fp8
- key: my-finetune # a local/fine-tuned model (a directory on the Spark)
display_name: My Fine-tune
local_path: /home/you/models/my-finetune
size_gb: 59
mode: solo
vllm_args: [--chat-template=/home/you/models/my-finetune/chat_template.jinja]
"""
from __future__ import annotations
import os
+18 -188
View File
@@ -3,10 +3,10 @@ import asyncio
import json
from pathlib import Path
from fastapi import FastAPI, HTTPException, Query, Request
from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, ValidationError
from pydantic import BaseModel
from typing import Literal
from .config import Settings
@@ -20,9 +20,8 @@ from .llm_proxy import build_router as build_llm_router
from .embeddings_proxy import build_router as build_embeddings_router
from .redaction_gateway import build_router as build_redaction_router, MapStore
from .hardware import HardwareProbe
from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant, probe_vllm_endpoint
from .matrix_bridge import MatrixBridgeManager
from .models import ModelDef, load_catalog
from .health import check_kokoro, check_parakeet, check_vllm, check_embeddings, check_qdrant
from .models import load_catalog
from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
from .services import docker_state, run_action, services_from_settings
@@ -44,7 +43,6 @@ hardware_probe = HardwareProbe(settings)
nim_manager = NimManager(settings)
deep_health = DeepHealth(settings)
speech_models = SpeechModelsManager(settings)
matrix_bridge = MatrixBridgeManager(settings)
app = FastAPI(title="spark-control", version="0.1.0")
@@ -183,8 +181,7 @@ async def put_model_knobs(key: str, body: KnobsBody) -> dict:
class CustomModelBody(BaseModel):
key: str
display_name: str
repo: str = ""
local_path: str | None = None
repo: str
size_gb: float = 0
mode: Literal["solo", "cluster"] = "solo"
description: str | None = None
@@ -197,17 +194,8 @@ class CustomModelBody(BaseModel):
async def post_model(body: CustomModelBody) -> dict:
if not body.key or not body.key.replace("-", "").replace("_", "").isalnum():
raise HTTPException(400, "key must be alphanumeric/-/_ only")
# Validate the full entry BEFORE persisting (exactly-one source, local-path
# whitelist, chat-template location). Doing it via ModelDef means the API and
# the YAML-override path share one set of rules, and a bad entry can't be
# written to /data and then break catalog load.
try:
ModelDef.model_validate(body.model_dump())
if body.repo:
validate_repo(body.repo) # HF charset (the model only validates local paths)
except ValidationError as e:
msg = e.errors()[0]["msg"] if e.errors() else str(e)
raise HTTPException(400, msg.removeprefix("Value error, "))
validate_repo(body.repo)
except ValueError as e:
raise HTTPException(400, str(e))
if body.key in catalog.models and not catalog.models[body.key].custom:
@@ -239,13 +227,7 @@ async def get_models_disk_status() -> dict:
return {"configured": False, "models": {}}
keys = list(catalog.models.keys())
statuses = await asyncio.gather(*(
probe_disk(
catalog.models[k].repo,
catalog.models[k].mode,
settings,
local_path=catalog.models[k].local_path,
)
for k in keys
probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
), return_exceptions=True)
out: dict[str, dict] = {}
for k, s in zip(keys, statuses):
@@ -276,14 +258,6 @@ async def del_model_disk(key: str) -> dict:
raise HTTPException(404, f"unknown model: {key}")
m = catalog.models[key]
# Never rm a local fine-tune directory from the dashboard — it's irreplaceable
# training output the user placed by hand, not a re-downloadable HF cache.
if m.local_path:
raise HTTPException(
400,
"this is a local model; its directory must be managed on the Spark, not deleted from here",
)
# Refuse if currently loaded
try:
vllm = await check_vllm(settings)
@@ -427,53 +401,6 @@ async def wake_spark(name: str) -> dict:
return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}
@app.post("/api/spark/{name}/ssh-key")
async def spark_ssh_key(name: str) -> dict:
"""Ensure the named Spark has an ed25519 keypair and return its PUBLIC key.
This is the Spark's *outbound* identity — the key it uses to log in to other
machines (e.g. the operator's Mac). It is the opposite direction from, and
distinct from, the package's own key shown by the StartOS "Show Public Key"
action (which grants this dashboard SSH access to the Sparks).
Non-destructive: generates the key only if absent, never overwrites an
existing one (which may already be an identity the Spark uses elsewhere).
Public keys are not secret, so returning it is safe. No request-supplied
value reaches the command — `name` is constrained to a fixed set and
host/user come from operator config — so there is nothing to shell-quote.
"""
if name not in ("spark1", "spark2"):
raise HTTPException(404, f"unknown spark: {name}")
host = settings.spark1_host if name == "spark1" else settings.spark2_host
user = settings.spark1_user if name == "spark1" else settings.spark2_user
if not host or not user:
raise HTTPException(400, f"{name} is not configured")
# Empty passphrase so the key is usable unattended; comment carries the
# remote hostname so it's identifiable in an authorized_keys file later.
cmd = (
"set -e; "
"mkdir -p ~/.ssh && chmod 700 ~/.ssh; "
"if [ ! -f ~/.ssh/id_ed25519 ]; then "
'ssh-keygen -t ed25519 -N "" -C "spark-control@$(hostname)" -f ~/.ssh/id_ed25519 >/dev/null 2>&1; '
"echo CREATED=1; else echo CREATED=0; fi; "
"[ -f ~/.ssh/id_ed25519.pub ] || ssh-keygen -y -f ~/.ssh/id_ed25519 > ~/.ssh/id_ed25519.pub; "
"echo PUBKEY=$(cat ~/.ssh/id_ed25519.pub)"
)
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=15)
if rc != 0:
raise HTTPException(502, f"couldn't read/create the SSH key on {name}: {err.strip() or out.strip() or f'rc={rc}'}")
created = False
pubkey = ""
for line in out.splitlines():
if line.startswith("CREATED="):
created = line.strip() == "CREATED=1"
elif line.startswith("PUBKEY="):
pubkey = line[len("PUBKEY="):].strip()
if not pubkey:
raise HTTPException(502, f"no public key returned from {name}")
return {"ok": True, "spark": name, "host": host, "user": user, "pubkey": pubkey, "created": created}
@app.get("/api/services")
async def get_services() -> dict:
"""Lifecycle state of always-on support services (Parakeet, Kokoro, …).
@@ -500,15 +427,6 @@ async def get_services() -> dict:
http = await check_embeddings(settings)
elif name == "qdrant":
http = await check_qdrant(settings)
elif svc.kind == "vllm":
# An extra vLLM monitored on another Spark (registered as a custom
# service). Probe its own host/port, not the primary Spark 1 one.
http = await probe_vllm_endpoint(svc.host, svc.port)
elif svc.kind == "bot":
# No HTTP health endpoint (host networking, no port) — judged purely
# by docker state. http_ready stays None so the badge isn't pinned
# to a "Starting…" verdict that can never clear.
http = {"ok": None, "base_url": None}
else:
# Custom services expose a /health endpoint by convention.
http = await check_kokoro(settings) if svc.kind == "tts" else {"ok": None, "base_url": svc.host and f"http://{svc.host}:{svc.port}"}
@@ -519,13 +437,11 @@ async def get_services() -> dict:
"container": svc.container,
"kind": svc.kind,
"base_url": http.get("base_url"),
# None (not False) for services with no HTTP surface (the bot), so
# the UI judges them by docker state alone instead of "Starting…".
"http_ready": None if svc.kind == "bot" else bool(http.get("ok")),
"http_ready": bool(http.get("ok")),
# Prefer the check fn's own top-level model key (embeddings reports
# it there); fall back to a model field inside detail for services
# whose /health embeds it (parakeet).
"model": http.get("model") or http.get("current_model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
"model": http.get("model") or ((http.get("detail") or {}).get("model") if isinstance(http.get("detail"), dict) else None),
"docker_state": docker.get("state"),
"restart_count": docker.get("restart_count"),
"started_at": docker.get("started_at"),
@@ -537,11 +453,8 @@ async def get_services() -> dict:
results = await asyncio.gather(*[one(n) for n in services.keys()])
for name, info in results:
out[name] = info
# Feed http reachability into the connectivity log (transition-only).
# Skip services with no HTTP surface (http_ready is None) — they'd
# otherwise register as perpetually "down".
if info.get("http_ready") is not None:
record_state(name, bool(info.get("http_ready")))
# Feed http reachability into the connectivity log (transition-only)
record_state(name, bool(info.get("http_ready")))
return out
@@ -646,7 +559,7 @@ async def stream_nim_install(job_id: str):
@app.delete("/api/services/{name}")
async def del_service(name: str) -> dict:
# Only allow deleting custom services (not the bundled built-in keys)
if name in ("parakeet", "kokoro", "embeddings", "qdrant", "matrix-bridge"):
if name in ("parakeet", "kokoro", "embeddings", "qdrant"):
raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)")
delete_custom_service(name)
return {"ok": True, "name": name}
@@ -665,81 +578,6 @@ async def service_action(name: str, action: str) -> dict:
return {"name": name, "action": action, **result}
# ---- matrix-bridge bot: update (git pull + rebuild) + logs ----
# Status badge + start/stop/restart ride the generic /api/services machinery
# above (the bot is a registered ServiceDef). Only the long-running Update and
# the logs view need bespoke endpoints.
def _serialize_mb_update(job) -> dict:
return {
"id": job.id,
"state": job.state,
"phase": job.phase,
"started_at": job.started_at,
"finished_at": job.finished_at,
"returncode": job.returncode,
"lines": job.lines,
}
@app.post("/api/matrix-bridge/update")
async def post_matrix_bridge_update() -> dict:
"""Pull latest code, rebuild, and recreate the bot container. Long-running
(docker build) — returns a job id to stream."""
try:
job = await matrix_bridge.trigger_update()
except RuntimeError as e:
raise HTTPException(409 if "in progress" in str(e) else 503, str(e))
return {"job_id": job.id, "state": job.state}
@app.get("/api/matrix-bridge/update/{job_id}")
async def get_matrix_bridge_update(job_id: str) -> dict:
job = matrix_bridge.get(job_id)
if job is None:
raise HTTPException(404, "no such job")
return _serialize_mb_update(job)
@app.get("/api/matrix-bridge/update/{job_id}/stream")
async def stream_matrix_bridge_update(job_id: str, request: Request):
job = matrix_bridge.get(job_id)
if job is None:
raise HTTPException(404, "no such job")
async def gen():
sent = 0
last_phase = None
while True:
# An update can run for minutes; bail promptly if the client is gone
# rather than spinning the poll loop until the job's 25-min ceiling.
if await request.is_disconnected():
return
n = len(job.lines)
if n > sent:
for line in job.lines[sent:n]:
yield f"data: {json.dumps({'line': line})}\n\n"
sent = n
if job.phase != last_phase:
yield f"event: phase\ndata: {json.dumps({'state': job.state, 'phase': job.phase})}\n\n"
last_phase = job.phase
if job.returncode is not None and sent >= len(job.lines):
yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
return
await asyncio.sleep(0.5)
return StreamingResponse(gen(), media_type="text/event-stream")
@app.get("/api/matrix-bridge/logs")
async def get_matrix_bridge_logs(tail: int = Query(100, ge=1, le=1000)) -> dict:
"""Last N lines of `docker logs` for the bot container (stderr merged)."""
result = await matrix_bridge.fetch_logs(tail=tail)
if not result.get("ok"):
raise HTTPException(502, result.get("output") or result.get("error") or "could not read logs")
return result
# ---- Speech model patch management ----
@app.get("/api/speech-models")
@@ -803,20 +641,17 @@ async def get_endpoints() -> dict:
"base_url": vllm.get("base_url"),
"model": vllm.get("current_model"),
"openai_compat": True,
"disabled": bool(vllm.get("disabled")),
},
"parakeet": {
"ready": bool(parakeet.get("ok")),
"base_url": parakeet.get("base_url"),
"kind": "stt",
"model": (parakeet.get("detail") or {}).get("model") if isinstance(parakeet.get("detail"), dict) else None,
"disabled": bool(parakeet.get("disabled")),
},
"kokoro": {
"ready": bool(kokoro.get("ok")),
"base_url": kokoro.get("base_url"),
"kind": "tts",
"disabled": bool(kokoro.get("disabled")),
},
"embeddings": {
"ready": bool(embeddings.get("ok")),
@@ -825,14 +660,12 @@ async def get_endpoints() -> dict:
"model": embeddings.get("model"),
# The proxied OpenAI-compatible endpoints live on Spark Control itself.
"openai_endpoints": ["/v1/embeddings", "/v1/rerank", "/api/search"],
"disabled": bool(embeddings.get("disabled")),
},
"qdrant": {
"ready": bool(qdrant.get("ok")),
"base_url": qdrant.get("base_url"),
"kind": "vectordb",
"collection": settings.qdrant_collection or None,
"disabled": bool(qdrant.get("disabled")),
},
}
@@ -846,15 +679,12 @@ async def get_status() -> dict:
check_embeddings(settings),
check_qdrant(settings),
)
# Feed health into the connectivity log (deduped — only logs on transition).
# Skip services switched off via DISABLED_SERVICES — they'd otherwise log as
# perpetually down.
for _name, _r in (
("vllm", vllm), ("parakeet", parakeet), ("kokoro", kokoro),
("embeddings", embeddings), ("qdrant", qdrant),
):
if not _r.get("disabled"):
record_state(_name, bool(_r.get("ok")))
# Feed health into the connectivity log (deduped — only logs on transition)
record_state("vllm", bool(vllm.get("ok")))
record_state("parakeet", bool(parakeet.get("ok")))
record_state("kokoro", bool(kokoro.get("ok")))
record_state("embeddings", bool(embeddings.get("ok")))
record_state("qdrant", bool(qdrant.get("ok")))
current_key = _identify_current_model(vllm.get("current_model"))
return {
"configured": settings.configured,
+2 -24
View File
@@ -5,7 +5,6 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
appropriate host.
"""
from __future__ import annotations
import logging
import time
from dataclasses import dataclass
from typing import Literal, Optional
@@ -14,8 +13,6 @@ from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run
log = logging.getLogger(__name__)
# Cache the "unreachable" verdict per (host, user) for a short period so that a
# repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
@@ -92,27 +89,10 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
container=s.qdrant_container,
port=s.qdrant_port,
),
# matrix-bridge Matrix bot. No HTTP port to probe (host networking, no
# health endpoint) — judged purely by docker state. Driven as its own
# SSH user (modelo, the repo owner) so git/docker run unprivileged.
"matrix-bridge": ServiceDef(
name="matrix-bridge",
kind="bot",
host=s.matrix_bridge_host,
user=s.matrix_bridge_user,
container=s.matrix_bridge_container,
port=0,
),
}
for entry in load_custom_services():
key = entry.get("key")
if not key:
continue
if key in out:
# A custom entry can't shadow a built-in (parakeet/kokoro/…); warn so
# an adopter who picked a colliding key for, say, a second vLLM sees
# why no tile appeared instead of a silent no-op.
log.warning("custom service %r collides with a built-in name; ignoring", key)
if not key or key in out:
continue
out[key] = ServiceDef(
name=key,
@@ -122,9 +102,7 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
container=entry.get("container", key),
port=int(entry.get("port", 0)),
)
# Drop services the deployment has switched off (DISABLED_SERVICES) so they
# show no tile and are never probed/auto-restarted.
return {k: v for k, v in out.items() if k not in s.disabled_services}
return out
async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
-25
View File
@@ -28,12 +28,6 @@ _IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")
# Docker container / volume name (Docker's own rule).
_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")
# Absolute filesystem path to a local model directory on a Spark. Conservative
# charset (letters, digits, and safe path punctuation) with a required leading
# '/', so it carries no shell metacharacters and no whitespace. Traversal ('.'
# and '..' segments) is rejected separately in validate_local_path.
_LOCAL_PATH_RE = re.compile(r"^/[A-Za-z0-9._+/-]+$")
def validate_repo(repo: str) -> str:
"""Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
@@ -56,25 +50,6 @@ def validate_container(name: str) -> str:
return name
def validate_local_path(path: str) -> str:
"""Return `path` if it is a safe absolute model directory path; else ValueError.
For locally fine-tuned models served by directory (not an HF repo). Requires
an absolute path, a metacharacter-free charset, and no '.'/'..' segments so a
caller cannot traverse out of an intended models directory. The `quote_arg`
sink still quotes it in depth — this is the boundary check.
"""
p = path or ""
if len(p) > 512 or not _LOCAL_PATH_RE.fullmatch(p):
raise ValueError(
f"invalid local model path (expected an absolute path, no spaces or "
f"shell metacharacters): {path!r}"
)
if any(seg in (".", "..") for seg in p.split("/")):
raise ValueError(f"local model path must not contain '.' or '..' segments: {path!r}")
return p
def quote_arg(value: object) -> str:
"""shlex.quote a single token for safe embedding in a shell command string."""
return shlex.quote(str(value))
+9 -258
View File
@@ -13,7 +13,6 @@ const state = {
swap_progress: 0, // 01
services: {},
service_action_in_flight: null, // e.g. "parakeet:restart"
mb_update_in_flight: false, // matrix-bridge update job running
hardware: {},
config: {},
configured: true,
@@ -60,7 +59,6 @@ function renderCards() {
? `<div class="desc">${escapeHtml(m.description)}</div>`
: '';
const customPill = m.custom ? `<span class="tag custom-pill">custom</span>` : '';
const localPill = m.local_path ? `<span class="tag local-pill" title="Served from a directory on the Spark, not Hugging Face">local</span>` : '';
// Disk-presence pill + trash button. Until /api/models/disk-status comes back,
// we don't know — render a neutral placeholder.
const disk = state.disk_status[key];
@@ -74,10 +72,8 @@ function renderCards() {
}
}
// Trash button — hidden if not on disk; disabled (with tooltip) if currently loaded.
// Never offered for local models: their directory is hand-placed training output,
// not a re-downloadable HF cache (the server refuses the delete too).
let trashBtn = '';
if (state.disk_status_loaded && disk && disk.on_disk && !m.local_path) {
if (state.disk_status_loaded && disk && disk.on_disk) {
const disabled = isActive || isSwapping;
const tip = isActive
? 'Currently loaded — switch to another model first'
@@ -95,9 +91,6 @@ function renderCards() {
primaryBtn = `<button class="btn" disabled>Current</button>`;
} else if (isOnDisk) {
primaryBtn = `<button class="btn primary" data-swap-key="${key}" ${isSwapping ? 'disabled' : ''}>Switch to this</button>`;
} else if (m.local_path) {
// A local model can't be "downloaded" — its directory has to exist on the Spark.
primaryBtn = `<button class="btn" disabled title="Directory not found on the Spark — create it there, then refresh">Not found on Spark</button>`;
} else {
const tip = dlInFlight ? 'A download is already in progress' : 'Download weights to the Spark(s)';
primaryBtn = `<button class="btn info" data-download-key="${key}" title="${escapeHtml(tip)}" ${dlInFlight ? 'disabled' : ''}>Download</button>`;
@@ -108,15 +101,12 @@ function renderCards() {
<span class="tag mode-${m.mode}">${m.mode}</span>
<span class="tag">${m.size_gb} GB</span>
${customPill}
${localPill}
${diskPill}
${(m.capabilities || []).map(c => `<span class="tag cap">${escapeHtml(c)}</span>`).join('')}
</div>
${desc}
<div class="muted small repo">
${m.local_path
? `<span class="local-path" title="Local model directory on the Spark">${escapeHtml(m.local_path)}</span>`
: `<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>`}
<a href="https://huggingface.co/${encodeURIComponent(m.repo)}" target="_blank" rel="noopener" title="View on Hugging Face">${escapeHtml(m.repo)} <span class="hf-icon">↗</span></a>
</div>
<div class="spacer"></div>
<div class="card-actions">
@@ -315,32 +305,6 @@ async function wakeSpark(name) {
}
}
// Generate-if-missing + copy this Spark's OUTBOUND ssh public key (the key the
// Spark uses to log in to other machines, e.g. the Mac). Distinct from the
// package's own key in the StartOS "Show Public Key" action.
async function copySparkSshKey(name, btn) {
if (btn) btn.disabled = true;
try {
const r = await fetchJSON(`/api/spark/${name}/ssh-key`, { method: 'POST' });
// Best-effort clipboard copy; on plain-HTTP this no-ops, but the dialog
// below always shows the key for manual selection.
await copyText(r.pubkey, btn);
const label = r.host ? `${name} (${r.host})` : name;
el('#sshkey-title').textContent = `${name} — SSH public key`;
el('#sshkey-intro').textContent = r.created
? `Generated a new SSH key on ${label} and copied it to your clipboard. This is the key ${name} uses to log in to OTHER machines.`
: `${label} already had an SSH key; copied its public key to your clipboard. This is the key ${name} uses to log in to OTHER machines.`;
el('#sshkey-value').textContent = r.pubkey;
el('#sshkey-install').textContent =
`mkdir -p ~/.ssh && echo '${r.pubkey}' >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys`;
el('#sshkey-dialog').showModal();
} catch (e) {
alert(`Couldn't get the SSH key for ${name}: ${e.message}`);
} finally {
if (btn) btn.disabled = false;
}
}
function renderHardware() {
const panel = el('#hardware-panel');
const grid = el('#hardware-grid');
@@ -394,21 +358,11 @@ function renderHardware() {
if (s.gpu_temp_c != null) gpuExtras.push(`${s.gpu_temp_c}°C`);
if (s.gpu_power_w != null) gpuExtras.push(`${s.gpu_power_w.toFixed(0)}W`);
const gpuExtrasStr = gpuExtras.length ? ` · ${gpuExtras.join(' · ')}` : '';
// Read-only WireGuard badge: shown only when the Spark has a wg interface up.
// "VPN <ip>" means it's a peer on that tunnel (reachable off-LAN when the
// tunnel is up); it reflects interface presence, not live peer reachability.
const wgIp = s.wg_addr ? String(s.wg_addr).split('/')[0] : '';
const wgBadge = s.wg_iface
? ` · <span class="wg-badge" title="On WireGuard tunnel '${escapeHtml(s.wg_iface)}'${wgIp ? ' as ' + escapeHtml(wgIp) : ''} — reachable off-LAN while the tunnel is up">VPN${wgIp ? ' ' + escapeHtml(wgIp) : ''}</span>`
: '';
card.className = 'hw-card';
card.innerHTML = `
<div class="head">
<span class="name">${escapeHtml(s.hostname || key)}</span>
<span class="meta">${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}${wgBadge}</span>
<button class="icon-btn ssh-key-btn" data-ssh-key="${escapeHtml(key)}" title="Copy this Spark's SSH public key (creates one if it doesn't have one) — e.g. to let it log in to your Mac" aria-label="Copy SSH public key">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
</button>
<span class="meta">${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}</span>
</div>
<div class="hw-metric">
<span class="label">CPU</span>
@@ -448,13 +402,8 @@ function classifyService(s) {
if (s.docker_state === 'missing') return 'missing';
if (s.docker_state === 'restarting') return 'unhealthy';
if (s.docker_state === 'exited') return 'unhealthy';
if (s.docker_state === 'running') {
// http_ready === false means an HTTP probe is expected but failing → still
// warming up. null means the service has no HTTP surface (e.g. the bot), so
// a running container is simply healthy.
if (s.http_ready === false) return 'starting';
return 'running';
}
if (s.docker_state === 'running' && !s.http_ready) return 'starting';
if (s.docker_state === 'running' && s.http_ready) return 'running';
return s.docker_state || 'unknown';
}
@@ -486,11 +435,6 @@ async function renderServices() {
grid.innerHTML = '';
for (const [name, s] of entries) {
const cls = classifyService(s);
const isBot = s.kind === 'bot';
// The bot tile is opt-in: it only belongs to deployments that actually run
// matrix-bridge. When the container is absent (missing) or the host isn't
// configured, hide the tile entirely rather than show a stray red card.
if (isBot && (cls === 'missing' || cls === 'unconfigured')) continue;
const card = document.createElement('div');
card.className = `service-card ${cls}`;
const inFlight = state.service_action_in_flight && state.service_action_in_flight.startsWith(name + ':');
@@ -503,7 +447,7 @@ async function renderServices() {
return false;
};
const copyIcon = `<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>`;
const hostStr = s.host ? (s.port ? `${s.host}:${s.port}` : s.host) : '';
const hostStr = s.host ? `${s.host}:${s.port}` : '';
const hostRow = s.host
? `<div class="row"><span class="k">Host</span><span class="v copyable" data-copy-self title="Click to copy">${escapeHtml(hostStr)}</span><button class="icon-btn" data-copy-text="${escapeHtml(hostStr)}" title="Copy host" aria-label="Copy">${copyIcon}</button></div>`
: `<div class="row"><span class="k">Host</span><span class="v muted-v">not configured</span></div>`;
@@ -557,11 +501,9 @@ async function renderServices() {
${restartsRow}
${deepRow}
<div class="service-actions">
${isBot ? `<button class="btn primary" data-mb-update title="Pull latest code, rebuild, and recreate the bot" ${inFlight || state.mb_update_in_flight ? 'disabled' : ''}>Update</button>` : ''}
<button class="btn" data-svc-action="${name}:start" ${disable('start') ? 'disabled' : ''}>Start</button>
<button class="btn" data-svc-action="${name}:restart" ${disable('restart') ? 'disabled' : ''}>Restart</button>
<button class="btn danger" data-svc-action="${name}:stop" ${disable('stop') ? 'disabled' : ''}>Stop</button>
${isBot ? `<button class="btn" data-mb-logs title="Show the last 100 log lines">View logs</button>` : ''}
</div>
`;
grid.appendChild(card);
@@ -569,10 +511,6 @@ async function renderServices() {
for (const btn of grid.querySelectorAll('.btn[data-svc-action]')) {
btn.addEventListener('click', () => onServiceAction(btn.dataset.svcAction));
}
const mbUpdateBtn = grid.querySelector('[data-mb-update]');
if (mbUpdateBtn) mbUpdateBtn.addEventListener('click', onMatrixBridgeUpdate);
const mbLogsBtn = grid.querySelector('[data-mb-logs]');
if (mbLogsBtn) mbLogsBtn.addEventListener('click', openMatrixBridgeLogs);
for (const btn of grid.querySelectorAll('[data-dh-run]')) {
btn.addEventListener('click', () => onDeepHealthRun(btn.dataset.dhRun, btn));
}
@@ -751,118 +689,6 @@ async function onServiceAction(key) {
}
}
// ===================== matrix-bridge bot (update + logs) =====================
const mbState = { job_id: null, eventsource: null, timer: null, started_at: null };
function mbTimerStart(at) {
mbState.started_at = at;
if (mbState.timer) clearInterval(mbState.timer);
const tick = () => {
if (!mbState.started_at) return;
const sec = Math.max(0, Math.floor((Date.now() - mbState.started_at) / 1000));
el('#mb-update-elapsed').textContent = `${Math.floor(sec / 60)}:${(sec % 60).toString().padStart(2, '0')}`;
};
tick();
mbState.timer = setInterval(tick, 500);
}
async function onMatrixBridgeUpdate() {
if (state.mb_update_in_flight) return;
if (!confirm('Update the matrix-bridge bot?\n\nThis pulls the latest code, rebuilds the container image, and recreates the container. The first build after a base-image change can take several minutes. The bot is briefly offline while it restarts.')) return;
state.mb_update_in_flight = true;
renderServices();
try {
const r = await fetchJSON('/api/matrix-bridge/update', { method: 'POST' });
attachMbUpdateProgress(r.job_id);
} catch (e) {
state.mb_update_in_flight = false;
renderServices();
alert('Update failed to start: ' + e.message);
}
}
async function attachMbUpdateProgress(jobId) {
mbState.job_id = jobId;
el('#mb-update-log').textContent = '';
el('#mb-update-title').textContent = 'Updating matrix-bridge…';
el('#mb-update-phase').textContent = 'Starting…';
el('#mb-update-dialog').showModal();
try {
const snap = await fetchJSON(`/api/matrix-bridge/update/${jobId}`);
mbTimerStart(Date.parse(snap.started_at));
el('#mb-update-phase').textContent = snap.phase || 'Working…';
el('#mb-update-log').textContent = (snap.lines || []).join('\n');
if (snap.returncode !== null) { onMbUpdateDone(snap); return; }
} catch { mbTimerStart(Date.now()); }
const es = new EventSource(`/api/matrix-bridge/update/${jobId}/stream`);
mbState.eventsource = es;
es.onmessage = ev => {
try {
const d = JSON.parse(ev.data);
if (d.line !== undefined) {
const log = el('#mb-update-log');
log.textContent += d.line + '\n';
log.scrollTop = log.scrollHeight;
}
} catch {}
};
es.addEventListener('phase', ev => {
try { el('#mb-update-phase').textContent = JSON.parse(ev.data).phase; } catch {}
});
es.addEventListener('done', ev => {
let d = {}; try { d = JSON.parse(ev.data); } catch {}
onMbUpdateDone(d);
});
es.onerror = () => {
// Don't leave the Update button wedged-disabled on a dropped stream. The
// job keeps running server-side; re-clicking Update returns a clean 409.
es.close();
mbState.eventsource = null;
state.mb_update_in_flight = false;
el('#mb-update-phase').textContent = 'Lost connection to the update stream — reopen or check logs.';
renderServices();
};
}
function onMbUpdateDone(d) {
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
state.mb_update_in_flight = false;
if (d.state === 'failed') {
el('#mb-update-title').textContent = `Update failed (rc=${d.returncode})`;
el('#mb-update-phase').textContent = 'Failed — see the log above.';
} else {
el('#mb-update-title').textContent = 'Update complete';
el('#mb-update-phase').textContent = 'Done ✓';
}
// Refresh the tile's badge.
(async () => { try { state.services = await fetchJSON('/api/services'); } catch {} renderServices(); })();
}
async function openMatrixBridgeLogs() {
const pre = el('#mb-logs-pre');
el('#mb-logs-title').textContent = 'matrix-bridge logs';
pre.textContent = 'Loading…';
el('#mb-logs-dialog').showModal();
await loadMatrixBridgeLogs();
}
async function loadMatrixBridgeLogs() {
const pre = el('#mb-logs-pre');
const btn = el('#mb-logs-refresh');
if (btn) btn.disabled = true;
try {
const r = await fetchJSON('/api/matrix-bridge/logs?tail=100');
pre.textContent = r.output || '(no output)';
pre.scrollTop = pre.scrollHeight;
} catch (e) {
pre.textContent = 'Could not read logs: ' + e.message;
} finally {
if (btn) btn.disabled = false;
}
}
function renderEndpoint(status) {
const v = status.vllm || {};
const panel = el('#endpoint-panel');
@@ -932,10 +758,6 @@ function renderHealth(status) {
function setDot(id, ok, payload) {
const item = el(id);
if (!item) return;
// A service switched off via DISABLED_SERVICES isn't part of this
// deployment — hide its indicator entirely rather than show it as down.
if (payload && payload.disabled) { item.classList.add('hidden'); return; }
item.classList.remove('hidden');
const dot = item.querySelector('.dot');
dot.classList.remove('ok', 'bad', 'warn');
if (ok === true) dot.classList.add('ok');
@@ -1684,60 +1506,6 @@ function setupAdvancedDialog() {
el('#adv-gmu').addEventListener('input', (e) => { el('#adv-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
}
function openLocalModelDialog() {
const dlg = el('#local-model-dialog');
el('#lm-key').value = '';
el('#lm-name').value = '';
el('#lm-path').value = '';
el('#lm-chat').value = '';
el('#lm-size').value = '';
el('#lm-mode').value = 'solo';
el('#lm-desc').value = '';
el('#lm-mml').value = 32768;
el('#lm-gmu').value = 0.85;
el('#lm-gmu-out').value = '0.85';
el('#lm-fst').checked = true;
el('#lm-pcache').checked = true;
el('#lm-fp8').checked = true;
dlg.showModal();
}
function setupLocalModelDialog() {
el('#lm-cancel').addEventListener('click', () => el('#local-model-dialog').close());
el('#lm-gmu').addEventListener('input', (e) => { el('#lm-gmu-out').value = parseFloat(e.target.value).toFixed(2); });
el('#local-model-form').addEventListener('submit', async (e) => {
e.preventDefault();
const chat = el('#lm-chat').value.trim();
const body = {
key: el('#lm-key').value.trim(),
display_name: el('#lm-name').value.trim(),
local_path: el('#lm-path').value.trim(),
size_gb: parseFloat(el('#lm-size').value) || 0,
mode: el('#lm-mode').value,
description: el('#lm-desc').value.trim() || null,
// A fine-tune's chat template (if any) rides along as a launch flag.
vllm_args: chat ? [`--chat-template=${chat}`] : [],
knobs: {
max_model_len: parseInt(el('#lm-mml').value, 10) || 32768,
gpu_memory_utilization: parseFloat(el('#lm-gmu').value),
fastsafetensors: el('#lm-fst').checked,
prefix_caching: el('#lm-pcache').checked,
kv_cache_dtype: el('#lm-fp8').checked ? 'fp8' : 'auto',
},
};
try {
await fetchJSON('/api/models', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(body),
});
el('#local-model-dialog').close();
await loadModels();
pollStatus();
} catch (e) { alert('Add local model failed: ' + e.message); }
});
}
// ===================== NIM installer =====================
const nimState = {
@@ -2079,32 +1847,15 @@ async function init() {
el('#nim-cancel').addEventListener('click', () => el('#nim-dialog').close());
el('#nim-form').addEventListener('submit', submitNim);
el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
el('#mb-update-close').addEventListener('click', () => el('#mb-update-dialog').close());
// Dismissing the modal (Close or Esc) stops streaming; the job runs on
// server-side and re-clicking Update returns a 409 if still in progress.
el('#mb-update-dialog').addEventListener('close', () => {
if (mbState.eventsource) { mbState.eventsource.close(); mbState.eventsource = null; }
if (mbState.timer) { clearInterval(mbState.timer); mbState.timer = null; }
state.mb_update_in_flight = false;
renderServices();
});
el('#mb-logs-close').addEventListener('click', () => el('#mb-logs-dialog').close());
el('#mb-logs-refresh').addEventListener('click', loadMatrixBridgeLogs);
el('#open-connectivity').addEventListener('click', openConnectivityDialog);
el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
// Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on
// reachable ones) are rendered dynamically, so delegate from the grid.
// Wake-on-LAN buttons live on unreachable hardware cards; delegate.
el('#hardware-grid').addEventListener('click', (e) => {
const wbtn = e.target.closest('[data-wake]');
if (wbtn) { wakeSpark(wbtn.dataset.wake); return; }
const kbtn = e.target.closest('[data-ssh-key]');
if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
const btn = e.target.closest('[data-wake]');
if (btn) wakeSpark(btn.dataset.wake);
});
el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
el('#open-local').addEventListener('click', openLocalModelDialog);
setupCatalogDialog();
setupAdvancedDialog();
setupLocalModelDialog();
// Open WebUI link from /api/config
try {
state.config = await fetchJSON('/api/config');
-81
View File
@@ -164,37 +164,6 @@
</div>
</form>
</dialog>
<dialog id="mb-update-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="mb-update-title">Updating matrix-bridge…</h3>
<div class="phase-row">
<div class="phase" id="mb-update-phase">Starting…</div>
<span class="spacer"></span>
<span class="timer" id="mb-update-elapsed">0:00</span>
</div>
<details open>
<summary class="muted small">Log</summary>
<pre id="mb-update-log" class="log"></pre>
</details>
<div class="modal-actions">
<button type="button" id="mb-update-close" class="btn">Close</button>
</div>
</form>
</dialog>
<dialog id="mb-logs-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="mb-logs-title">matrix-bridge logs</h3>
<p class="muted small">Last 100 lines from <code>docker logs</code> on the Spark.</p>
<pre id="mb-logs-pre" class="log"></pre>
<div class="modal-actions">
<button type="button" id="mb-logs-refresh" class="btn">Refresh</button>
<span class="spacer"></span>
<button type="button" id="mb-logs-close" class="btn">Close</button>
</div>
</form>
</dialog>
</section>
<section id="speech-models-panel" class="speech-models hidden">
@@ -229,7 +198,6 @@
<div class="section-header">
<h2 class="section-title">LLM swap</h2>
<button id="open-download" class="btn small-btn">+ Download a new model</button>
<button id="open-local" class="btn small-btn">+ Add local model</button>
</div>
<dialog id="catalog-dialog" class="modal">
@@ -262,37 +230,6 @@
</form>
</dialog>
<dialog id="local-model-dialog" class="modal">
<form method="dialog" class="modal-form" id="local-model-form">
<h3>Add a local / fine-tuned model</h3>
<p class="muted small">For a model that lives as a directory on a Spark (e.g. a fine-tune), not a Hugging Face repo. The directory is bind-mounted into the vLLM container at the same path when you swap to it. It must already exist on the Spark.</p>
<label class="modal-row"><span>Key (URL-safe id)</span><input type="text" id="lm-key" required pattern="[a-zA-Z0-9_-]+"></label>
<label class="modal-row"><span>Display name</span><input type="text" id="lm-name" required></label>
<label class="modal-row"><span>Model directory (absolute path on the Spark)</span><input type="text" id="lm-path" required placeholder="e.g. /home/you/models/my-finetune"></label>
<label class="modal-row"><span>Chat template path (optional)</span><input type="text" id="lm-chat" placeholder="e.g. /home/you/models/my-finetune/chat_template.jinja"></label>
<label class="modal-row"><span>Size (GB)</span><input type="number" id="lm-size" step="0.1" min="0"></label>
<label class="modal-row"><span>Mode</span>
<select id="lm-mode">
<option value="solo">solo (Spark 1 only)</option>
<option value="cluster">cluster (both Sparks via Ray)</option>
</select>
</label>
<label class="modal-row"><span>Description (optional)</span><textarea id="lm-desc" rows="3"></textarea></label>
<fieldset class="modal-fieldset">
<legend>Default launch knobs</legend>
<label class="modal-row"><span>Max context (tokens)</span><input type="number" id="lm-mml" step="1024" min="1024" value="32768"></label>
<label class="modal-row"><span>GPU memory %</span><input type="range" id="lm-gmu" min="0.5" max="0.95" step="0.01" value="0.85"> <output id="lm-gmu-out">0.85</output></label>
<label class="modal-row inline"><input type="checkbox" id="lm-fst" checked> Fast safetensors loading</label>
<label class="modal-row inline"><input type="checkbox" id="lm-pcache" checked> Prefix caching</label>
<label class="modal-row inline"><input type="checkbox" id="lm-fp8" checked> FP8 KV cache</label>
</fieldset>
<div class="modal-actions">
<button type="button" id="lm-cancel" class="btn">Cancel</button>
<button type="submit" class="btn primary">Add local model</button>
</div>
</form>
</dialog>
<dialog id="disk-delete-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3>Delete model weights from disk?</h3>
@@ -307,24 +244,6 @@
</form>
</dialog>
<dialog id="sshkey-dialog" class="modal">
<form method="dialog" class="modal-form">
<h3 id="sshkey-title">SSH public key</h3>
<p id="sshkey-intro" class="muted small"></p>
<div class="sshkey-row">
<pre id="sshkey-value" class="snippet copyable" data-copy-self title="Click to copy"></pre>
<button type="button" class="icon-btn" data-copy="#sshkey-value" title="Copy public key" aria-label="Copy public key">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
</button>
</div>
<p class="muted small">To let this Spark log in to another machine (e.g. your Mac), run this in a terminal <em>on that machine</em>:</p>
<pre id="sshkey-install" class="snippet copyable" data-copy-self title="Click to copy"></pre>
<div class="modal-actions">
<button type="button" id="sshkey-close" class="btn">Close</button>
</div>
</form>
</dialog>
<dialog id="advanced-dialog" class="modal">
<form method="dialog" class="modal-form" id="advanced-form">
<h3 id="adv-title">Advanced settings</h3>
+2 -16
View File
@@ -374,12 +374,6 @@ main {
}
.hw-card .head .name { font-weight: 600; font-size: 15px; }
.hw-card .head .meta { color: var(--muted); font-size: 12px; margin-left: auto; }
/* WireGuard "VPN <ip>" badge in the meta line — accent (green) = on a tunnel. */
.hw-card .head .meta .wg-badge { color: var(--accent); font-weight: 600; cursor: help; }
/* Copy-this-Spark's-ssh-key button pins to the top-right corner; meta keeps
its margin-left:auto so name/meta/button read left→right→corner. */
.hw-card .head .ssh-key-btn { align-self: flex-start; padding: 3px 6px; }
.hw-card .head .ssh-key-btn svg { width: 13px; height: 13px; }
.hw-card.unreachable { border-color: rgba(239, 68, 68, 0.4); }
.hw-card.unreachable .name { color: var(--error); }
.hw-card.unreachable ol { color: var(--muted); }
@@ -393,10 +387,6 @@ main {
}
.hw-card .wol-row .btn { padding: 5px 10px; font-size: 12px; }
.hw-card .mac-display { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; }
/* SSH-key dialog: key line beside its copy button; long key wraps rather than scrolls. */
.sshkey-row { display: flex; align-items: flex-start; gap: 8px; }
.sshkey-row .snippet { flex: 1; margin: 0; white-space: pre-wrap; word-break: break-all; }
#sshkey-install { white-space: pre-wrap; word-break: break-all; }
.connectivity-content {
max-height: 360px;
@@ -526,12 +516,10 @@ main {
#dl-log-details { margin-top: 12px; }
#dl-log-details summary { cursor: pointer; padding: 4px 0; }
/* ===== NIM install + matrix-bridge dialogs ===== */
/* ===== NIM install dialog ===== */
.modal#nim-dialog,
.modal#nim-progress-dialog,
.modal#mb-update-dialog,
.modal#mb-logs-dialog { max-width: 640px; }
.modal#nim-progress-dialog { max-width: 640px; }
.nim-grid {
display: grid;
gap: 8px;
@@ -694,7 +682,6 @@ main {
.card .repo a { color: inherit; text-decoration: none; }
.card .repo a:hover { color: var(--info); text-decoration: underline; }
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
.card .repo .local-path { font-family: var(--mono, ui-monospace, monospace); opacity: 0.85; }
.tag {
background: var(--surface-2);
border: 1px solid var(--border);
@@ -739,7 +726,6 @@ main {
.card .adv-btn,
.card .test-btn { padding: 8px 12px; font-size: 12px; }
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
.card .local-pill { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
+1 -2
View File
@@ -7,7 +7,6 @@ from typing import Optional
from .config import Settings
from .models import Catalog, build_launch_command
from .shellsafe import quote_arg
from .ssh import ssh_run, ssh_stream, StreamHandle
@@ -113,7 +112,7 @@ class SwapManager:
# Step 3: tail logs until the ready marker (or timeout)
job.state = "tailing"
tail_cmd = f"docker logs -f --tail 50 {quote_arg(s.vllm_container)}"
tail_cmd = "docker logs -f --tail 50 vllm_node"
job.append(f"$ {tail_cmd}")
timeout = max(model.expected_ready_seconds * 2, 600)
handle = StreamHandle()
+1 -2
View File
@@ -22,7 +22,6 @@ from typing import Any
from .config import Settings
from .models import Catalog, build_launch_command
from .shellsafe import quote_arg
from .ssh import ssh_run
@@ -115,7 +114,7 @@ async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dic
# Pipe the JSON args list to a here-doc Python invocation. The validator
# reads from stdin to avoid shell-escaping the args themselves.
cmd = (
f"echo '{payload}' | docker exec -i {quote_arg(settings.vllm_container)} python3 -c "
f"echo '{payload}' | docker exec -i vllm_node python3 -c "
+ shlex.quote(_VALIDATOR_SCRIPT)
)
-6
View File
@@ -12,12 +12,6 @@ dependencies = [
"python-multipart>=0.0.9",
]
[project.optional-dependencies]
dev = ["pytest>=8"]
[tool.pytest.ini_options]
testpaths = ["tests"]
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
-17
View File
@@ -1,17 +0,0 @@
"""Shared pytest setup.
These suites are pure/offline — they exercise pure functions and never touch the
Sparks, /data, or the network. We still pin the env vars the app modules expect
(documented in docs/guides/fastapi-image.md) to tmp paths so importing them can
never write to the container-only /data path.
"""
import os
import sys
from pathlib import Path
# Let `import app...` resolve whether or not the package is pip-installed.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
os.environ.setdefault("REDACTION_MAP_DB", "/tmp/spark_control_test_maps.db")
os.environ.setdefault("CONNECTIVITY_LOG", "/tmp/spark_control_test_connectivity.json")
os.environ.setdefault("MODELS_OVERRIDES", "/tmp/spark_control_test_overrides.yaml")
-69
View File
@@ -1,69 +0,0 @@
"""_merge_words_with_speakers + _assign_speaker_to_word: the transcript/diarizer
merge that turns Parakeet words + Sortformer turns into speaker-labelled blocks.
Pure functions, no cluster — this is the core of transcribe-with-speakers.
"""
from app.audio_proxy import _assign_speaker_to_word, _merge_words_with_speakers
def _w(start, end, text):
return {"start": start, "end": end, "text": text}
def _t(start, end, speaker):
return {"start_s": start, "end_s": end, "speaker": speaker}
# ---- _assign_speaker_to_word ----
def test_assign_by_midpoint_containment():
turns = [_t(0.0, 2.0, "Speaker_0"), _t(2.0, 4.0, "Speaker_1")]
assert _assign_speaker_to_word(2.4, 2.8, turns) == "Speaker_1"
def test_assign_falls_back_to_max_overlap_when_midpoint_outside():
# midpoint 5.0 is in no turn; word span overlaps Speaker_0 more than Speaker_1.
turns = [_t(0.0, 4.9, "Speaker_0"), _t(6.0, 8.0, "Speaker_1")]
assert _assign_speaker_to_word(4.0, 6.0, turns) == "Speaker_0"
def test_assign_unknown_when_no_overlap():
turns = [_t(0.0, 1.0, "Speaker_0")]
assert _assign_speaker_to_word(10.0, 11.0, turns) == "Speaker_unknown"
# ---- _merge_words_with_speakers ----
def test_empty_words_returns_empty():
assert _merge_words_with_speakers([], [_t(0, 1, "Speaker_0")]) == []
def test_consecutive_same_speaker_words_join_into_one_block():
words = [_w(0.0, 0.5, "good"), _w(0.5, 1.0, "morning")]
turns = [_t(0.0, 2.0, "Speaker_0")]
blocks = _merge_words_with_speakers(words, turns)
assert blocks == [
{"start_ms": 0, "end_ms": 1000, "speaker": "Speaker_0", "text": "good morning"}
]
def test_speaker_change_splits_blocks():
words = [_w(0.0, 1.0, "hi"), _w(2.1, 3.0, "hello")]
turns = [_t(0.0, 2.0, "Speaker_0"), _t(2.0, 4.0, "Speaker_1")]
blocks = _merge_words_with_speakers(words, turns)
assert [b["speaker"] for b in blocks] == ["Speaker_0", "Speaker_1"]
assert [b["text"] for b in blocks] == ["hi", "hello"]
def test_long_silence_breaks_block_for_same_speaker():
# >1.5s gap between two words of the same speaker forces a new block.
words = [_w(0.0, 0.5, "one"), _w(3.0, 3.5, "two")]
turns = [_t(0.0, 4.0, "Speaker_0")]
blocks = _merge_words_with_speakers(words, turns)
assert len(blocks) == 2
assert [b["text"] for b in blocks] == ["one", "two"]
def test_punctuation_token_joins_without_leading_space():
words = [_w(0.0, 0.5, "hello"), _w(0.5, 0.7, ".")]
turns = [_t(0.0, 2.0, "Speaker_0")]
assert _merge_words_with_speakers(words, turns)[0]["text"] == "hello."
-148
View File
@@ -1,148 +0,0 @@
"""build_launch_command: argument assembly + the shell-injection invariant.
The security-critical property is that every user-controllable value (repo,
vllm_args, knobs) is shlex-quoted at the sink, so `shlex.split` cleanly reverses
the command back into the exact token list. The vLLM pre-flight validator
(validate.py) depends on this round-trip — these tests lock it in.
"""
import shlex
import pytest
from pydantic import ValidationError
from app.models import Defaults, ModelDef, build_launch_command
DEFAULTS = Defaults(port=8888, host="0.0.0.0")
def _model(**kw) -> ModelDef:
base = dict(display_name="X", repo="org/name", size_gb=1.0, mode="solo")
base.update(kw)
return ModelDef(**base)
def test_solo_model_emits_solo_flag_and_ordered_args():
cmd = build_launch_command("k", _model(vllm_args=["--max-model-len=1000"]), DEFAULTS)
assert cmd == (
"./launch-cluster.sh --solo -d exec vllm serve org/name "
"--port=8888 --host=0.0.0.0 --max-model-len=1000"
)
def test_cluster_model_omits_solo_flag():
cmd = build_launch_command("k", _model(mode="cluster", vllm_args=["-tp=2"]), DEFAULTS)
assert " --solo " not in cmd
assert cmd.startswith("./launch-cluster.sh -d exec vllm serve org/name")
def test_knob_overrides_matching_bundled_flag():
# bundled arg sets max-model-len; the knob must win (single occurrence).
m = _model(vllm_args=["--max-model-len=1000"], knobs={"max_model_len": 65536})
cmd = build_launch_command("k", m, DEFAULTS)
assert "--max-model-len=65536" in cmd
assert "--max-model-len=1000" not in cmd
def test_repo_with_shell_metacharacters_is_quoted_not_executed():
# build_launch_command quotes even a hostile repo (validate_repo guards the
# API boundary; this proves the sink itself is safe in depth).
evil = "org/name; rm -rf ~ #"
cmd = build_launch_command("k", _model(repo=evil), DEFAULTS)
# The raw metacharacters must not appear unquoted...
assert "; rm -rf" not in cmd.replace(shlex.quote(evil), "")
# ...and shlex.split must recover the repo as one literal token.
tokens = shlex.split(cmd)
assert evil in tokens
def test_command_string_round_trips_through_shlex_split():
# The invariant validate.py relies on: every arg survives quote -> split intact.
args = ["--max-model-len=32768", "--load-format=fastsafetensors", "--note=a b c"]
cmd = build_launch_command("k", _model(vllm_args=args), DEFAULTS)
tokens = shlex.split(cmd)
for a in args:
assert a in tokens
def test_injection_via_vllm_arg_stays_literal():
payload = "--foo=$(touch /tmp/pwned)"
cmd = build_launch_command("k", _model(vllm_args=[payload]), DEFAULTS)
assert payload in shlex.split(cmd) # preserved as one inert token
# ---- local / fine-tuned models (served by directory, not HF repo) ----
def test_local_model_bind_mounts_dir_and_serves_the_path():
m = _model(repo="", local_path="/home/u/models/ft-v2", vllm_args=["--max-model-len=2048"])
cmd = build_launch_command("k", m, DEFAULTS)
tokens = shlex.split(cmd)
# The launch script's hook bind-mounts the host dir at the SAME container path.
assert tokens[0] == (
"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v /home/u/models/ft-v2:/home/u/models/ft-v2"
)
# vLLM is pointed at the directory, not an HF repo id.
i = tokens.index("serve")
assert tokens[i + 1] == "/home/u/models/ft-v2"
assert "--max-model-len=2048" in tokens
def test_local_model_chat_template_arg_survives_round_trip():
m = _model(
repo="",
local_path="/m/ft",
vllm_args=["--chat-template=/m/ft/chat_template.jinja"],
)
cmd = build_launch_command("k", m, DEFAULTS)
assert "--chat-template=/m/ft/chat_template.jinja" in shlex.split(cmd)
def test_local_path_with_metacharacters_is_quoted_not_executed():
# The validator rejects a hostile path at the boundary; bypass it with
# model_construct to prove the quote_arg sink is safe in depth even if a bad
# value somehow reaches build_launch_command.
evil = "/m/ft; rm -rf ~"
m = ModelDef.model_construct(
display_name="X", repo="", local_path=evil, size_gb=1.0, mode="solo",
vllm_args=[], knobs=None, custom=False, capabilities=[],
expected_ready_seconds=300, description=None,
)
cmd = build_launch_command("k", m, DEFAULTS)
tokens = shlex.split(cmd)
i = tokens.index("serve")
assert tokens[i + 1] == evil # recovered as one literal token, not executed
assert tokens[0] == f"VLLM_SPARK_EXTRA_DOCKER_ARGS=-v {evil}:{evil}"
def test_model_requires_exactly_one_source():
with pytest.raises(ValidationError):
ModelDef(display_name="x", size_gb=1, mode="solo") # neither repo nor local_path
with pytest.raises(ValidationError):
ModelDef(display_name="x", repo="o/n", local_path="/p", size_gb=1, mode="solo") # both
def test_local_model_rejects_chat_template_outside_dir():
# Only local_path is mounted into the container, so a chat-template elsewhere
# would silently 404 inside vLLM — reject it up front.
with pytest.raises(ValidationError):
ModelDef(
display_name="x", repo="", local_path="/m/ft", size_gb=1, mode="solo",
vllm_args=["--chat-template=/other/dir/t.jinja"],
)
def test_invalid_local_path_rejected_by_model():
with pytest.raises(ValidationError):
ModelDef(display_name="x", repo="", local_path="/m/../etc", size_gb=1, mode="solo")
def test_merge_overrides_loads_local_and_skips_invalid(monkeypatch):
# YAML/override-added local models get the same validation as the API; a single
# bad entry is skipped (logged) rather than breaking the whole catalog load.
from app import models as M
monkeypatch.setattr(M, "load_overrides", lambda: {"knobs": {}, "custom": [
{"key": "good", "display_name": "G", "local_path": "/home/u/m", "size_gb": 1, "mode": "solo"},
{"key": "bad", "display_name": "B", "local_path": "/home/u/../etc", "size_gb": 1, "mode": "solo"},
]})
cat = M._merge_overrides(M.Catalog(models={}))
assert cat.models["good"].is_local and cat.models["good"].source == "/home/u/m"
assert "bad" not in cat.models # traversal path skipped, not catalog-fatal
-47
View File
@@ -1,47 +0,0 @@
"""build_update_command: the matrix-bridge update one-liner.
Pure string assembly, no cluster. Locks in the contract from
docs/spark-control-integration.md (matrix-bridge repo): fetch, hard-reset to the
release branch, then rebuild/recreate via docker compose — chained with `&&` so
any failure (e.g. Gitea unreachable) aborts before the build and surfaces a
non-zero exit. The clone dir must stay unquoted so a `~` expands server-side.
"""
from app.matrix_bridge import build_update_command, _phase_for
def test_command_is_the_contract_chain():
cmd = build_update_command("~/matrix-bridge", "master")
assert cmd == (
"cd ~/matrix-bridge && "
"git fetch origin && "
"git reset --hard origin/master && "
"docker compose up -d --build"
)
def test_fail_loud_chaining():
# Every step is &&-chained: a failed fetch never reaches the build.
cmd = build_update_command("~/matrix-bridge", "master")
assert "; " not in cmd
assert cmd.count(" && ") == 3
assert cmd.index("git fetch") < cmd.index("git reset") < cmd.index("docker compose")
def test_tilde_dir_left_unquoted_for_server_side_expansion():
cmd = build_update_command("~/matrix-bridge", "master")
assert "cd ~/matrix-bridge &&" in cmd
assert "'~" not in cmd # quoting would defeat the home-dir expansion
def test_absolute_dir_and_custom_branch():
cmd = build_update_command("/home/modelo/matrix-bridge", "phase-1")
assert cmd.startswith("cd /home/modelo/matrix-bridge && ")
assert "git reset --hard origin/phase-1 &&" in cmd
def test_phase_detection_maps_known_lines():
assert _phase_for("HEAD is now at 1a2b3c4 some commit") == "Resetting to the latest release…"
assert _phase_for("#5 building image") == "Building the bot image…"
assert _phase_for("Container matrix-bridge Recreate") == "Recreating the container…"
assert _phase_for("Already up to date.") == "No new code; rebuilding…"
assert _phase_for("some unremarkable line") is None
-127
View File
@@ -1,127 +0,0 @@
"""shellsafe validators: the API-boundary whitelist behind the v0.19.0 SSH
command-injection hardening. The quoting *sink* is covered in
test_launch_command.py; this locks in the *boundary* — that hostile input is
rejected early, and that a valid value passes through unchanged so callers can
use `validate_x(v)` inline.
"""
import pytest
from app.shellsafe import (
validate_container,
validate_image,
validate_local_path,
validate_repo,
)
# Shell metacharacters that must never survive any validator — these are the
# actual injection vectors. (Path traversal like "../" is NOT in scope here:
# validate_image legitimately permits "/" and "." for real image refs such as
# nvcr.io/nim/...; the defense for images is "no shell metacharacters" + the
# quote_arg sink, not path-shape. Slash-rejection is tested directly for repo
# and container, where "/" is disallowed.)
HOSTILE = [
"; rm -rf /",
" a b",
"$(touch pwned)",
"`id`",
"x|cat",
"x&y",
"x>out",
"x\nrm",
]
# ---- validate_repo: HF 'org/name', exactly one slash ----
@pytest.mark.parametrize("repo", [
"RedHatAI/Qwen3.6-35B-A3B-NVFP4", # the live production model
"org/name",
"a.b_c-d/x.y_z-1",
])
def test_repo_valid_passes_through_unchanged(repo):
assert validate_repo(repo) == repo
@pytest.mark.parametrize("repo", [
"",
"noslash",
"a/b/c", # two slashes
"/name", # empty org
"org/", # empty name
] + [f"org/name{h}" for h in HOSTILE])
def test_repo_rejects_malformed_and_hostile(repo):
with pytest.raises(ValueError):
validate_repo(repo)
# ---- validate_image: registry/path:tag@digest ----
@pytest.mark.parametrize("image", [
"nvcr.io/nim/nvidia/parakeet-1_1b-ctc-en-us:latest",
"ubuntu",
"img@sha256:deadbeefcafe",
"a.b/c:1.2_3-4",
])
def test_image_valid_passes_through_unchanged(image):
assert validate_image(image) == image
@pytest.mark.parametrize("image", [
"",
"-leading", # must start alphanumeric
".leading",
"/leading",
":leading",
"a" * 513, # over the 512 cap
] + [f"img{h}" for h in HOSTILE])
def test_image_rejects_malformed_and_hostile(image):
with pytest.raises(ValueError):
validate_image(image)
# ---- validate_container: Docker name rule, no slash ----
@pytest.mark.parametrize("name", [
"parakeet-asr",
"a",
"vol_1.2-3",
])
def test_container_valid_passes_through_unchanged(name):
assert validate_container(name) == name
@pytest.mark.parametrize("name", [
"",
"_leading", # underscore is not a valid first char
"-leading",
".leading",
"has/slash", # slash not allowed in a container name
"a" * 129, # over the 128 cap
] + [f"name{h}" for h in HOSTILE])
def test_container_rejects_malformed_and_hostile(name):
with pytest.raises(ValueError):
validate_container(name)
# ---- validate_local_path: absolute model dir, no traversal/metacharacters ----
@pytest.mark.parametrize("path", [
"/home/modelo/models/gemma-4-31B-ten31-v2",
"/data/models/ft.v2_1",
"/srv/m/a-b/c",
])
def test_local_path_valid_passes_through_unchanged(path):
assert validate_local_path(path) == path
@pytest.mark.parametrize("path", [
"",
"relative/path", # must be absolute
"~/models/x", # no ~ expansion
"/models/../etc/shadow", # '..' traversal
"/models/./x", # '.' segment
"/a" * 300, # over the 512 cap (600 chars)
] + [f"/models/x{h}" for h in HOSTILE])
def test_local_path_rejects_relative_traversal_and_hostile(path):
with pytest.raises(ValueError):
validate_local_path(path)
-120
View File
@@ -1,120 +0,0 @@
"""Configurable topology: DISABLED_SERVICES, vLLM container override, and the
extra-vLLM probe. All offline — the disabled checks short-circuit before any
network call, and the probes are exercised only on the not-configured path.
"""
import asyncio
from app.config import Settings
from app.health import (
check_embeddings,
check_kokoro,
check_parakeet,
check_qdrant,
check_vllm,
probe_vllm_endpoint,
)
from app.services import services_from_settings
def _settings(monkeypatch, **env) -> Settings:
# Pin the topology env vars under test; default the rest to blank so a stray
# value in the real environment can't leak into the assertion.
keys = [
"SPARK1_HOST", "SPARK1_USER", "SPARK2_HOST", "SPARK2_USER",
"DISABLED_SERVICES", "VLLM_CONTAINER",
]
for k in keys:
monkeypatch.delenv(k, raising=False)
for k, v in env.items():
monkeypatch.setenv(k, v)
return Settings.from_env()
# ---- DISABLED_SERVICES parsing ----
def test_disabled_services_parsed_lowercased_and_trimmed(monkeypatch):
s = _settings(monkeypatch, DISABLED_SERVICES="parakeet, Kokoro ,,")
assert s.disabled_services == frozenset({"parakeet", "kokoro"})
def test_disabled_services_blank_is_empty(monkeypatch):
assert _settings(monkeypatch).disabled_services == frozenset()
# ---- vLLM container override ----
def test_vllm_container_defaults_to_vllm_node(monkeypatch):
assert _settings(monkeypatch).vllm_container == "vllm_node"
def test_vllm_container_override(monkeypatch):
assert _settings(monkeypatch, VLLM_CONTAINER="vllm-gemma4").vllm_container == "vllm-gemma4"
def test_vllm_container_invalid_falls_back(monkeypatch):
# A malformed value (space / shell metachar) is rejected at the boundary and
# falls back to the default rather than crashing startup or reaching a sink.
assert _settings(monkeypatch, VLLM_CONTAINER="bad name; rm -rf").vllm_container == "vllm_node"
# ---- services map honors the disable list ----
def test_services_from_settings_drops_disabled(monkeypatch):
s = _settings(
monkeypatch,
SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
SPARK2_HOST="10.0.0.2", SPARK2_USER="u",
DISABLED_SERVICES="parakeet,qdrant",
)
svcs = services_from_settings(s)
assert "parakeet" not in svcs and "qdrant" not in svcs
assert "kokoro" in svcs and "embeddings" in svcs
def test_custom_vllm_service_registered(monkeypatch):
from app import custom_services
monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
{"key": "vllm-spark2", "kind": "vllm", "host": "10.0.0.2",
"user": "u", "container": "vllm_node", "port": 8000},
])
s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
svc = services_from_settings(s)["vllm-spark2"]
assert svc.kind == "vllm" and svc.port == 8000 and svc.container == "vllm_node"
def test_custom_service_colliding_with_builtin_is_ignored(monkeypatch):
# A custom entry can't shadow a built-in key — the built-in wins.
from app import custom_services
monkeypatch.setattr(custom_services, "load_custom_services", lambda: [
{"key": "parakeet", "kind": "vllm", "host": "10.0.0.9", "user": "u", "port": 8000},
])
s = _settings(monkeypatch, SPARK1_HOST="10.0.0.1", SPARK1_USER="u",
SPARK2_HOST="10.0.0.2", SPARK2_USER="u")
assert services_from_settings(s)["parakeet"].kind == "stt"
# ---- disabled health checks short-circuit (no network) ----
def test_disabled_check_returns_disabled_verdict(monkeypatch):
s = _settings(
monkeypatch,
SPARK2_HOST="10.0.0.2", SPARK2_USER="u", # host set, but disable wins
DISABLED_SERVICES="parakeet,kokoro,embeddings,qdrant",
)
for check in (check_parakeet, check_kokoro, check_embeddings, check_qdrant):
r = asyncio.run(check(s))
assert r == {"ok": False, "disabled": True, "error": "disabled", "base_url": None}
# ---- vLLM probe: not-configured path is pure ----
def test_probe_vllm_endpoint_unconfigured(monkeypatch):
r = asyncio.run(probe_vllm_endpoint("", 8000))
assert r["ok"] is False and "not configured" in r["error"]
def test_check_vllm_unconfigured_without_spark1(monkeypatch):
s = _settings(monkeypatch) # no SPARK1_HOST
r = asyncio.run(check_vllm(s))
assert r["ok"] is False and "spark1 not configured" in r["error"]
+1 -1
View File
@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2026 Alice
Copyright (c) 2026 Grant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
-11
View File
@@ -1,14 +1,3 @@
ARCHES := x86
# overrides to s9pk.mk must precede the include statement
include s9pk.mk
# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only
# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN;
# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh.
RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts)
.PHONY: release
release:
@test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; }
GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \
../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk"
@@ -40,33 +40,6 @@ const inputSpec = InputSpec.of({
placeholder: 'your SSH username',
masked: false,
}),
vllm_port: Value.text({
name: 'vLLM port (optional)',
description:
"The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
required: false,
default: null,
placeholder: 'leave blank for 8888',
masked: false,
}),
vllm_container: Value.text({
name: 'vLLM container name (optional)',
description:
'Docker container name for the swappable vLLM on Spark 1. Defaults to "vllm_node" (what the bundled launch-cluster.sh creates). Change this only if you run your vLLM under a different container name — the model-swap log view and the pre-flight validator exec into it by name.',
required: false,
default: null,
placeholder: 'leave blank for vllm_node',
masked: false,
}),
disabled_services: Value.text({
name: 'Services to hide (optional)',
description:
"Comma-separated list of built-in services your cluster doesn't run, so Spark Control hides their tiles and stops probing them. Valid names: parakeet, kokoro, embeddings, qdrant. Example: if you only run vLLM, set this to 'parakeet,kokoro,embeddings,qdrant'. Leave blank to monitor all of them. (Useful when, say, your vLLM shares port 8000 with Parakeet's default — hide Parakeet so its probe doesn't hit vLLM.)",
required: false,
default: null,
placeholder: 'e.g. parakeet,kokoro',
masked: false,
}),
parakeet_host: Value.text({
name: 'Parakeet host (optional)',
description:
@@ -146,15 +119,6 @@ const inputSpec = InputSpec.of({
placeholder: 'e.g. crm_chunks',
masked: false,
}),
matrix_bridge_user: Value.text({
name: 'matrix-bridge bot SSH user (optional)',
description:
"If you run the matrix-bridge Matrix bot on Spark 2, enter the SSH user that owns its ~/matrix-bridge folder (e.g. 'modelo'). Spark Control then shows a tile to update, restart, and view logs for the bot. Leave blank if you don't run the bot — the tile stays hidden. Note: this package's SSH public key must be authorized for that user (Show Public Key action) unless it's the same as your Spark 2 user.",
required: false,
default: null,
placeholder: 'e.g. modelo',
masked: false,
}),
open_webui_url: Value.text({
name: 'Open WebUI URL (optional)',
description:
@@ -7,13 +7,6 @@ export const sparkConfigSchema = z.object({
spark1_user: z.string().catch(''),
spark2_host: z.string().catch(''),
spark2_user: z.string().catch(''),
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
vllm_port: z.string().catch(''),
// Optional vLLM container-name override (Spark 1). Blank => "vllm_node".
vllm_container: z.string().catch(''),
// Optional comma-separated list of built-in services to switch off
// (parakeet, kokoro, embeddings, qdrant). Blank => all enabled.
disabled_services: z.string().catch(''),
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
parakeet_host: z.string().catch(''),
parakeet_user: z.string().catch(''),
@@ -29,8 +22,6 @@ export const sparkConfigSchema = z.object({
qdrant_user: z.string().catch(''),
qdrant_container: z.string().catch(''),
qdrant_collection: z.string().catch(''),
// Optional matrix-bridge bot. Blank => no tile. Host reuses Spark 2.
matrix_bridge_user: z.string().catch(''),
// Optional Open WebUI deep-link
open_webui_url: z.string().catch(''),
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
-8
View File
@@ -13,9 +13,6 @@ export const main = sdk.setupMain(async ({ effects }) => {
spark1_user: '',
spark2_host: '',
spark2_user: '',
vllm_port: '',
vllm_container: '',
disabled_services: '',
parakeet_host: '',
parakeet_user: '',
parakeet_container: '',
@@ -29,7 +26,6 @@ export const main = sdk.setupMain(async ({ effects }) => {
qdrant_user: '',
qdrant_container: '',
qdrant_collection: '',
matrix_bridge_user: '',
open_webui_url: '',
ngc_api_key: '',
}
@@ -53,9 +49,6 @@ export const main = sdk.setupMain(async ({ effects }) => {
SPARK1_USER: cfg.spark1_user,
SPARK2_HOST: cfg.spark2_host,
SPARK2_USER: cfg.spark2_user,
VLLM_PORT: cfg.vllm_port,
VLLM_CONTAINER: cfg.vllm_container,
DISABLED_SERVICES: cfg.disabled_services,
PARAKEET_HOST: cfg.parakeet_host,
PARAKEET_USER: cfg.parakeet_user,
PARAKEET_CONTAINER: cfg.parakeet_container,
@@ -69,7 +62,6 @@ export const main = sdk.setupMain(async ({ effects }) => {
QDRANT_USER: cfg.qdrant_user,
QDRANT_CONTAINER: cfg.qdrant_container,
QDRANT_COLLECTION: cfg.qdrant_collection,
MATRIX_BRIDGE_USER: cfg.matrix_bridge_user,
MODELS_OVERRIDES: '/data/models-overrides.yaml',
SERVICES_OVERRIDES: '/data/services-overrides.yaml',
CONNECTIVITY_LOG: '/data/connectivity.json',
+2 -2
View File
@@ -1,10 +1,10 @@
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({
version: '0.24.0:0',
version: '0.19.0:0',
releaseNotes: {
en_US:
"v0.24.0:0 — configurable cluster topology. Spark Control no longer assumes our exact layout, so a cluster that's wired differently can be monitored without forking. Three new optional settings in Configure Sparks: (1) vLLM container name — defaults to \"vllm_node\"; set it if your swappable vLLM runs under a different container name (the swap log view and pre-flight validator exec into it by name). (2) Services to hide — a comma-separated list of built-in services your cluster doesn't run (parakeet, kokoro, embeddings, qdrant); hidden ones show no tile and are never probed, so e.g. a vLLM sharing Parakeet's default port 8000 no longer gets a confusing Parakeet probe. (3) Monitor a second vLLM — register a vLLM on another Spark as a custom service with kind \"vllm\" (in /data/services-overrides.yaml); it gets a read-only health tile (loaded model + container state + start/stop/restart) alongside the swappable one. API: /api/endpoints now reports a `disabled` flag per service.",
'v0.19.0:0 — security hardening of the cluster-control surface (no change to the proxy/data APIs your other apps use). (1) Every user-supplied value that reaches an SSH command on the Sparks — model repo, vLLM args/knobs, NIM image/container, service names — is now strictly validated and shell-quoted, closing a command-injection path. (2) The Qdrant collection name in /api/search is validated so it can no longer be used to reach other collections. (3) State-changing dashboard endpoints (model swap, NIM install, service start/stop, disk delete, etc.) now require a same-origin request, blocking cross-site (CSRF) attacks from a malicious page open in your browser. The OpenAI-compatible proxies (/v1/*), the redaction gateway (/scrub, /rehydrate), /api/search, /api/audio/*, and /api/health-event are exempt, so Recap Relay, the CRM, Open WebUI and other consumers are unaffected.',
},
migrations: {
up: async ({ effects }) => {},
-55
View File
@@ -34,44 +34,6 @@ These take effect on the **next swap to that model**. If a swap fails after this
- Status auto-refreshes every 5 s.
- A swap takes 36 minutes depending on the model. Don't close the tab — but if you do, the swap continues; reopen and you'll re-attach to the log stream.
## matrix-bridge bot tile (optional)
If you run the matrix-bridge bot container on a Spark, set its SSH user in **Configure Sparks** (e.g. the user that owns `~/matrix-bridge`) and a tile appears under "Always-on services" with status, Update, Restart, Stop/Start, and View logs. Status is docker-state only (no HTTP health), so a `running` badge means the container is up, not necessarily that the bot is connected.
The **Update** button runs `git fetch && git reset --hard origin/<branch> && docker compose up -d --build` as that SSH user. For it to reach your git remote:
1. `~/matrix-bridge` must be a clone of the repo (not loose files). Gitignored secrets (`.env`, etc.) survive a `git reset --hard`.
2. If that user has more than one SSH key, pin the remote's key so git doesn't offer the wrong one first (a common `Permission denied (publickey)` cause). In the user's `~/.ssh/config`:
```
Host <your-git-host>
Port <port>
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
```
3. Spark Control's own package key must be authorized for that SSH user (Show Public Key → add to their `authorized_keys`) unless it's the same user Spark Control already uses for that Spark.
## Configurable topology (v0.24.0+)
For a cluster wired differently from the reference layout, three optional knobs in **Configure Sparks** (no fork needed):
- **vLLM container name** — defaults to `vllm_node`. Set it if your swappable vLLM on Spark 1 runs under a different container name; the swap log-tail and the pre-flight validator `docker exec` into it by name.
- **Services to hide** — comma-separated `parakeet,kokoro,embeddings,qdrant`. Hidden services show no tile and are never probed (status, deep-health, or connectivity log). Use this when a service you don't run would otherwise be probed at a port something else answers — e.g. a vLLM on port 8000 colliding with Parakeet's default.
- **Monitor a second vLLM** — the swap machinery only drives the Spark 1 vLLM, but you can *monitor* a vLLM on another Spark by adding a custom service of `kind: vllm` to `/data/services-overrides.yaml`:
```yaml
custom:
- key: vllm-spark2
kind: vllm
host: <spark-2-ip>
user: <ssh-user>
container: vllm_node
port: 8000
```
It gets a read-only tile: loaded model (via `/v1/models`), container state, and start/stop/restart. (Spark Control's SSH key must be authorized for that user — Show Public Key.)
## Adding a new model
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
@@ -80,12 +42,6 @@ For a cluster wired differently from the reference layout, three optional knobs
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
### Local / fine-tuned models (v0.23.0+)
A model that lives as a directory on a Spark (e.g. a LoRA-merged fine-tune) instead of an HF repo: use the **"+ Add local model"** button under LLM swap (or a `custom:` entry with `local_path` instead of `repo` in the override YAML). The directory must already exist on the Spark; only its parent dir is mounted, so a `--chat-template` must live **inside** `local_path`.
**Load-bearing contract:** on swap, spark-control prefixes the launch with `VLLM_SPARK_EXTRA_DOCKER_ARGS="-v <path>:<path>"` so `launch-cluster.sh` bind-mounts the dir into the vLLM container at the same path. This relies on the upstream `eugr/spark-vllm-docker` `launch-cluster.sh` expanding `$VLLM_SPARK_EXTRA_DOCKER_ARGS` **unquoted** into its `docker run` (verified against the on-Spark script 2026-06-17: line ~11 appends it to `DOCKER_ARGS`, used unquoted in `docker run`). If a future upstream version quotes that variable, local-model mounts would silently fail — re-check this before pulling launch-cluster.sh updates.
## Manual swap fallback
If the UI is unavailable and you need to swap by hand:
@@ -101,17 +57,6 @@ cd ~/spark-vllm-docker
docker logs -f vllm_node # wait for "Application startup complete."
```
## Sideload (`make install`) can't reach the server
Symptom: `make install` fails with `package.sideload: error sending request for url (https://immense-voyage.local/rpc/v1)`. Cause seen 2026-06-17: `immense-voyage.local` stopped resolving via mDNS from the Mac (`curl https://immense-voyage.local/...` → exit 6, "couldn't resolve host"), even though the server is up — `curl -sk https://<server-ip>/rpc/v1` returns 200.
- **Don't** work around it with `start-cli -H https://<server-ip> package install`: TLS connects but it returns `UNAUTHORIZED`, because start-cli's stored credential is bound to the registered `.local` host, not the IP.
- **Fix:** make the name resolve again, then re-run `make install`:
- `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (flush mDNS), or
- `echo "<server-ip> immense-voyage.local" | sudo tee -a /etc/hosts` (deterministic; remove later).
Note this only blocks installing to *your own* Start9 — building and publishing the s9pk to Gitea Releases is unaffected (adopters still pull the latest).
## Diagnostics
```bash
-65
View File
@@ -1,65 +0,0 @@
#!/usr/bin/env bash
# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the
# latest package with a read-only token instead of being hand-sent the file.
#
# GITEA_URL=https://gitea.example:3000 GITEA_TOKEN=<write-token> \
# scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk
#
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
# reuses an existing release for the tag and replaces a same-named asset.
# Set GITEA_INSECURE=1 to skip TLS verification (self-signed cert on a LAN box).
set -euo pipefail
VERSION="${1:-}"; S9PK="${2:-}"
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository read+write access}"
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
ASSET="$(basename "$S9PK")"
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
CURL=(curl -sS) # no -f: we inspect HTTP codes ourselves
[ "${GITEA_INSECURE:-}" = "1" ] && CURL+=(-k)
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
# api METHOD URL [extra curl args...] -> sets globals HTTP_CODE and BODY
api() {
local method="$1" url="$2"; shift 2
local out
out="$("${CURL[@]}" -X "$method" -H "Authorization: token ${GITEA_TOKEN}" "$@" \
-w $'\n%{http_code}' "$url")"
HTTP_CODE="${out##*$'\n'}"
BODY="${out%$'\n'*}"
}
# Reuse an existing release for this tag, otherwise create one.
api GET "$API/releases/tags/$TAG"
if [ "$HTTP_CODE" = 200 ]; then
id="$(printf '%s' "$BODY" | jq -r '.id')"
elif [ "$HTTP_CODE" = 404 ]; then
api POST "$API/releases" -H 'Content-Type: application/json' \
--data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
'{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')"
[ "$HTTP_CODE" = 201 ] || { echo "create release failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
id="$(printf '%s' "$BODY" | jq -r '.id')"
else
echo "release lookup failed (HTTP $HTTP_CODE) — check GITEA_URL and the token's scope: $BODY" >&2
exit 1
fi
[ -n "$id" ] && [ "$id" != null ] || { echo "could not parse release id: $BODY" >&2; exit 1; }
# Replace a same-named asset so re-runs don't 409.
api GET "$API/releases/$id/assets"
old="$(printf '%s' "$BODY" | jq -r --arg n "$ASSET" '.[]? | select(.name==$n) | .id')"
[ -n "$old" ] && { api DELETE "$API/releases/$id/assets/$old"; }
api POST "$API/releases/$id/assets?name=$ASSET" \
-F "attachment=@${S9PK};type=application/octet-stream"
[ "$HTTP_CODE" = 201 ] || { echo "asset upload failed (HTTP $HTTP_CODE): $BODY" >&2; exit 1; }
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"