
Keysat e783653ef0 v0.23.0:0 - local / fine-tuned model support

Add models that live as a directory on a Spark (e.g. LoRA-merged fine-tunes),
not just Hugging Face repos.

- ModelDef gains local_path; a model must set exactly one of repo / local_path.
  The validator also enforces the local-path whitelist and that any
  --chat-template lives inside local_path (only that dir is mounted).
- build_launch_command bind-mounts the dir into the vLLM container at the SAME
  host==container path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook,
  then `vllm serve <dir>`. No launch-cluster.sh change (verified the upstream
  expands that var unquoted; contract noted in runbook.md).
- shellsafe.validate_local_path: absolute path, charset whitelist, no '.'/'..'.
- POST /api/models validates the full entry via ModelDef before persisting, so a
  bad entry can't be written and then break catalog load; _merge_overrides skips
  an invalid override entry instead of failing the whole catalog.
- disk.py size-probes a local path with du; disk-delete refused for local models.
- UI: "+ Add local model" dialog, `local` badge, path shown instead of an HF
  link, delete button hidden for local models.
- Tests: local launch + injection round-trip, chat-template location, traversal,
  exactly-one-source, _merge_overrides skip-invalid (94 pass). Reviewer-agent
  pass; findings addressed.
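
The three validate_local_path rules listed above (absolute path, charset whitelist, no '.'/'..' components) could be sketched roughly as follows. This is an illustration under assumptions, not the actual shellsafe code: the exact allowed character set, the ValueError error type, the rejection of a bare "/", and the example path are all guesses.

```python
import re

# ASSUMED whitelist: letters, digits, '_', '-', '.', and '/' as the separator.
# The real charset used by shellsafe is not stated in the commit message.
_ALLOWED = re.compile(r"[A-Za-z0-9_./-]+")


def validate_local_path(path: str) -> str:
    """Return the path unchanged if it passes all checks, else raise ValueError."""
    if not path.startswith("/"):
        raise ValueError("local_path must be absolute")
    # fullmatch rejects spaces, quotes, ';', '$' and anything else a shell
    # could interpret, since the path ends up on a launch command line.
    if not _ALLOWED.fullmatch(path):
        raise ValueError("local_path contains characters outside the whitelist")
    parts = [p for p in path.split("/") if p]
    if not parts:
        # ASSUMPTION: mounting the filesystem root is never intended.
        raise ValueError("local_path must not be the filesystem root")
    # Reject '.' and '..' components so the path cannot traverse elsewhere.
    if any(p in (".", "..") for p in parts):
        raise ValueError("local_path must not contain '.' or '..' components")
    return path


if __name__ == "__main__":
    print(validate_local_path("/models/ten31-v2"))  # hypothetical location
```

Checking components after splitting on '/' (rather than searching for the substring "..") keeps directory names such as "v1..2" legal while still refusing real traversal.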

2026-06-17 22:27:41 -05:00

7.6 KiB


ROADMAP

Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.

Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)

Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on both Sparks, port 8000, raw docker run, container vllm-gemma4) and an automated cron physically swaps models — so his notes are partly portability gaps (the package hard-codes our layout) and partly coordination gaps (his dashboard and his crons fight over the GPU).

Design stance (decided): Spark Control is the control plane / GPU arbiter, not a job runner. Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in separate application services that call Spark Control's swap API. The dividing line is what a scheduled job does: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (POST /api/swap → GET /api/swap/{id} / …/stream, POST /api/swap/{key}/validate) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps today — the items below add the safety layer, not the capability.
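
Since swaps are already API-driven, an external scheduler could start one and poll it to completion along these lines. Only the endpoint paths come from the text above. The request body shape ({"model": ...}), the "id" and "status" response fields, and the terminal status names are assumptions made for illustration, and the HTTP layer is passed in as plain callables so the sketch does not presume any particular client library.

```python
import time
from typing import Callable

# post(path, json_body) -> dict and get(path) -> dict are supplied by the
# caller (for example thin wrappers over an HTTP client). They are
# hypothetical helpers, not part of Spark Control.
Post = Callable[[str, dict], dict]
Get = Callable[[str], dict]


def drive_swap(post: Post, get: Get, model_key: str,
               poll_s: float = 2.0, timeout_s: float = 600.0,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Start a swap, then poll until it reaches a terminal state.

    ASSUMED, not confirmed by the roadmap: the {"model": ...} request body,
    the "id" / "status" response fields, and the "complete" / "failed" names.
    """
    started = post("/api/swap", {"model": model_key})
    swap_id = started["id"]
    waited = 0.0
    while waited <= timeout_s:
        state = get(f"/api/swap/{swap_id}")
        if state.get("status") in ("complete", "failed"):
            return state
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"swap {swap_id} did not finish within {timeout_s}s")
```

A scheduler would wrap its own HTTP client in post and get, then treat a "failed" result or a TimeoutError as the signal to skip the downstream job rather than run it against the wrong model.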

Sequenced:

Configurable VLLM_PORT — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
Local-path / fine-tuned model support — DONE, v0.23.0:0. Catalog/ModelDef gained local_path (exactly one of repo/local_path); swap bind-mounts the dir into the vLLM container at the same path via the launch script's VLLM_SPARK_EXTRA_DOCKER_ARGS hook (no launch-cluster.sh change); "+ Add local model" form + local badge; disk-delete refused for local models; validate_local_path boundary check. His merged ten31-v2 was the motivating case.
Configurable topology — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on both Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
Coordination layer — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
- Swap lock with holder + TTL (POST / GET / DELETE /api/swap/lock). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
- Swap-event webhook (swap_complete / swap_failed) to a configurable URL, so downstream consumers update their provider config when the running model changes.
- Schedule visibility — read-only view the dashboard surfaces, registered by external schedulers (Spark Control does not own the schedule).
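
The swap lock with holder + TTL described above might be modelled like this. It is a minimal in-memory sketch under assumptions: the class and method names are hypothetical, there is no persistence across restarts, and the mapping onto POST / GET / DELETE /api/swap/lock is left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LockInfo:
    holder: str        # who holds the GPU
    expires_at: float  # clock value after which the lock is void


class SwapLock:
    """Single-holder lock with a TTL (hypothetical design, in-memory only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._info: Optional[LockInfo] = None

    def status(self) -> Optional[LockInfo]:
        """Return the current holder, or None if the lock is free or expired."""
        if self._info is not None and self._info.expires_at <= self._clock():
            self._info = None  # lazily drop an expired lock
        return self._info

    def acquire(self, holder: str, ttl_s: float) -> bool:
        """Take or renew the lock; False if a different holder has it."""
        current = self.status()
        if current is not None and current.holder != holder:
            return False
        self._info = LockInfo(holder, self._clock() + ttl_s)
        return True

    def release(self, holder: str) -> bool:
        """Release the lock; only the current holder may do so."""
        current = self.status()
        if current is None or current.holder != holder:
            return False
        self._info = None
        return True

    def may_swap(self, requester: str) -> bool:
        """Checked inside the swap path itself, which makes the lock enforced."""
        current = self.status()
        return current is None or current.holder == requester
```

The TTL is what keeps a crashed scheduler from holding the GPU indefinitely: nothing has to clean up, because status() treats an expired entry as free. The methods contain no await points, so in a single event loop they would run atomically; a multi-threaded server would need a mutex around them.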

Near term

parakeet-asr long-audio memory guard — deferred 2026-06-15, low priority. A duration cap on /v1/audio/diarize: Sortformer runs the whole file in one pass (diarizer.py:128-135) over Spark 2's shared 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. Precautionary — no observed incident, and the production consumer (Recap Relay) already chunks via /diarize-chunk (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full /diarize. When picked up: add a configurable MAX_DIARIZE_SECONDS guard in diarizer.py right after duration is computed (~line 130) → raise → HTTP 413 in main.py (mirrors the existing MAX_UPLOAD_MB 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
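
The MAX_DIARIZE_SECONDS guard proposed above could take roughly this shape. It is a sketch under assumptions: the exception name, the helper names, and the convention that a cap of 0 or less disables the check are hypothetical, and the real diarizer.py / main.py wiring is not shown.

```python
class AudioTooLongError(Exception):
    """Raised when a clip exceeds the configured diarization cap."""

    def __init__(self, duration_s: float, max_seconds: float) -> None:
        super().__init__(
            f"audio is {duration_s:.0f}s, diarization limit is {max_seconds:.0f}s"
        )
        self.duration_s = duration_s
        self.max_seconds = max_seconds


def check_diarize_duration(duration_s: float, max_seconds: float) -> None:
    """Call right after the duration is computed, before the model runs.

    ASSUMPTION: max_seconds <= 0 means "no cap configured".
    """
    if max_seconds > 0 and duration_s > max_seconds:
        raise AudioTooLongError(duration_s, max_seconds)


def http_status_for(exc: Exception) -> int:
    """Map the guard's exception to 413, mirroring the MAX_UPLOAD_MB case."""
    return 413 if isinstance(exc, AudioTooLongError) else 500
```

Raising before the single-pass model is invoked is the point of the guard: the request is refused while only the duration has been computed, so memory use stays flat.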

Audio quality

Echo cancellation for dual-channel label-merge — removes the mic-bleed limit when the local user isn't wearing headphones.
LLM "referee" pass for low-confidence label-merge speaker naming.

Platform hardening

Qdrant auth (API key) + scheduled snapshots/backups.
Observability: request metrics + GPU-busy tracking, so load questions are answered from data instead of log archaeology.
API-key auth on Spark Control — only if public (non-VPN) exposure is ever needed; current stance is LAN + split-tunnel VPN only.

Throughput (only if audio load outgrows one GPU)

Second audio worker / queueing layer; revisit which services share Spark 2.

Dashboard

Per-model configurable vLLM flags editable from the UI (today: edit models.yaml and rebuild).
Spark host update actions (OS/driver) from the UI.
Open WebUI link-out integration; richer per-service detail views.

Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)

P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:

P2 — track:

No automated tests beyond the two redaction suites — swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for build_launch_command (incl. injection cases), swap transitions, and _merge_words_with_speakers is the highest-value start.
Loose dependency floors permit vulnerable python-multipart/starlette (DoS CVEs) on rebuild; no lockfile; no upload size caps (pyproject.toml).
Opaque HTTP 500 on POST /api/models / PUT /knobs when MODELS_OVERRIDES unset in dev (write to read-only /data) — catch the OSError.
NGC API key still appears on the remote process command line (nim.py) — the quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
Global mutable catalog reassigned via global, shared across async requests with no snapshot (server.py) — latent race as concurrency grows.
Container runs uvicorn as root bound to 0.0.0.0:9999 (no USER in Dockerfile) — amplifies any RCE blast radius.
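
For the global mutable catalog item above, one possible direction is to publish immutable snapshots, so a request that has read the catalog keeps a consistent view even if another request replaces it. A minimal sketch; the CatalogStore name and its interface are hypothetical and not taken from server.py.

```python
import threading
from types import MappingProxyType
from typing import Any, Mapping


class CatalogStore:
    """Holds the current catalog as a read-only snapshot (hypothetical)."""

    def __init__(self, initial: Mapping[str, Any]) -> None:
        self._lock = threading.Lock()
        self._current: Mapping[str, Any] = MappingProxyType(dict(initial))

    def snapshot(self) -> Mapping[str, Any]:
        """Take one reference per request and use only that reference."""
        return self._current

    def replace(self, new_catalog: Mapping[str, Any]) -> None:
        """Build the new read-only view first, then swap the reference."""
        # Shallow copy: the top-level mapping is frozen, the values are not.
        frozen = MappingProxyType(dict(new_catalog))
        with self._lock:
            self._current = frozen
```

A handler would call snapshot() once at the top and pass that object down, instead of re-reading a module-level global between awaits.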

P3 — bulk-fix when next touching docs/packaging:

README Status block stale (v0.2.3 / 0.13.0:4 → now v0.19.0:0); deprecated @app.on_event + hardcoded app.version="0.1.0"; NimInstallBody.register shadows BaseModel (rename → register_service); httpx class names leak into TTS/speech-models error text; one unescaped innerHTML sink (app.js) + task_id reflected in scrub JSON.
Packaging: marketingUrl/packageRepo/upstreamRepo are example.com placeholders; broken instructions.md source link; per-service SSH users (parakeet_user etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); Makefile builds only x86 though the manifest declares aarch64.
Hardening misc: no body/upload size limits on /v1/audio/*, /v1/chat/completions, /scrub; int(_env(...)) startup crash on bad VLLM_PORT; upstream error text echoed to clients.
StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
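
The int(_env(...)) startup crash listed under hardening is the kind of failure that a fall-back parser avoids, in the spirit of "blank ⇒ 8888" for VLLM_PORT. A minimal sketch, assuming a hypothetical int_setting helper; the range check and its bounds are additions for illustration, not documented behavior.

```python
import os
from typing import Mapping


def int_setting(name: str, default: int,
                env: Mapping[str, str] = os.environ,
                lo: int = 1, hi: int = 65535) -> int:
    """Read an integer setting, falling back instead of raising.

    Blank, missing, non-numeric, or out-of-range values all yield the default.
    ASSUMPTION: the lo/hi bounds suit a TCP port; other settings would pass
    their own range.
    """
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if lo <= value <= hi else default


if __name__ == "__main__":
    # With VLLM_PORT unset this prints the default.
    print(int_setting("VLLM_PORT", 8888))
```

Returning the default silently keeps startup alive; logging the rejected value at warning level would make a typo in the setting easier to spot.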
