ROADMAP

Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.

Near term

parakeet-asr long-audio memory guard (deferred 2026-06-15, low priority). Add a duration cap on /v1/audio/diarize: Sortformer runs the whole file in one pass (diarizer.py:128-135) in Spark 2's shared 128 GB unified memory, which also serves Kokoro, embeddings, and Qdrant, so a single very large file can push the host into swap. This is precautionary: there has been no observed incident, and the production consumer (Recap Relay) already chunks via /diarize-chunk (~5-min chunks, already bounded), so the only exposed path is a consumer POSTing one huge file to the full /diarize. When picked up: add a configurable MAX_DIARIZE_SECONDS guard in diarizer.py right after duration is computed (~line 130) that raises, and map the exception to HTTP 413 in main.py (mirroring the existing MAX_UPLOAD_MB 413); ship via the Reapply-patches action, which restarts the live parakeet-asr container and so needs a go/no-go. Leave transcription out of v1 (upstream, un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
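
For the long-audio memory guard item above, the "check duration, raise, map to 413" shape could look roughly like the sketch below. It is a minimal illustration and not the actual diarizer.py or main.py code: AudioTooLongError, max_diarize_seconds, check_duration, to_http_response, and the 1800-second default are all hypothetical names and values. Only MAX_DIARIZE_SECONDS and the 413 status come from the item itself.

```python
import os
from typing import Mapping, Tuple

# Hypothetical default; the real value should come from measurement.
DEFAULT_MAX_DIARIZE_SECONDS = 1800.0


class AudioTooLongError(Exception):
    """Raised when a file exceeds the configured diarization duration cap."""

    def __init__(self, duration_s: float, limit_s: float) -> None:
        super().__init__(
            f"audio is {duration_s:.0f}s; /diarize accepts at most {limit_s:.0f}s"
        )
        self.duration_s = duration_s
        self.limit_s = limit_s


def max_diarize_seconds(environ: Mapping[str, str] = os.environ) -> float:
    """Read the cap from MAX_DIARIZE_SECONDS; unset or unparsable uses the default."""
    raw = environ.get("MAX_DIARIZE_SECONDS", "").strip()
    try:
        return float(raw) if raw else DEFAULT_MAX_DIARIZE_SECONDS
    except ValueError:
        return DEFAULT_MAX_DIARIZE_SECONDS


def check_duration(duration_s: float, limit_s: float) -> None:
    """Raise if the audio is longer than the cap; a cap <= 0 disables the guard."""
    if limit_s > 0 and duration_s > limit_s:
        raise AudioTooLongError(duration_s, limit_s)


def to_http_response(exc: AudioTooLongError) -> Tuple[int, dict]:
    """Translate the guard exception into a (status, body) pair for the web layer."""
    return 413, {"detail": str(exc)}
```

In the service, check_duration would be called right after the duration is computed and before the Sortformer pass, with the web layer catching AudioTooLongError and returning the 413 body. Keeping the guard free of framework imports also makes it easy to unit-test.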

Echo cancellation for dual-channel label-merge — removes the mic-bleed limit when the local user isn't wearing headphones.
LLM "referee" pass for low-confidence label-merge speaker naming.

Qdrant auth (API key) + scheduled snapshots/backups.
Observability: request metrics + GPU-busy tracking, so load questions are answered from data instead of log archaeology.
API-key auth on Spark Control — only if public (non-VPN) exposure is ever needed; current stance is LAN + split-tunnel VPN only.

Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (models.yaml + custom overrides) and the "Add custom model" path (POST /api/models) only accepts an HF org/name repo (shellsafe._HF_REPO_RE), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the org/name regex (path whitelist + shlex.quote, no traversal); (b) models.build_launch_command / launch-cluster.sh able to vllm serve <path>; (c) disk.py size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15 — a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
Per-model configurable vLLM flags editable from the UI (today: edit models.yaml and rebuild).
Spark host update actions (OS/driver) from the UI.
Open WebUI link-out integration; richer per-service detail views.
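
For the local-path model item above, part (a) asks for path validation with a whitelist, shlex.quote, and no traversal. A minimal sketch of that check follows, under stated assumptions: ALLOWED_MODEL_ROOTS, its two example directories, and validate_local_model_path are hypothetical, and the check is purely lexical because the path lives on the Spark, not on the machine running the validator (so symlinks are not resolved here).

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical whitelist of Spark-side directories that may hold models.
ALLOWED_MODEL_ROOTS = (
    PurePosixPath("/models"),
    PurePosixPath("/data/finetunes"),
)


def validate_local_model_path(raw: str) -> str:
    """Return a shell-quoted absolute path, or raise ValueError.

    Lexical checks only: the path is on a remote host, so nothing is
    resolved against the local filesystem.
    """
    if not raw or any(ch in raw for ch in ("\x00", "\n", "\r")):
        raise ValueError("empty path or control characters")
    path = PurePosixPath(raw)
    if not path.is_absolute():
        raise ValueError("path must be absolute")
    if ".." in path.parts:
        raise ValueError("path traversal is not allowed")
    inside = any(root == path or root in path.parents for root in ALLOWED_MODEL_ROOTS)
    if not inside:
        raise ValueError("path is outside the allowed model roots")
    # Quote for safe interpolation into a remote shell command line.
    return shlex.quote(str(path))


def build_serve_command(raw_path: str) -> str:
    """Illustrative only: the launch form named in the roadmap item."""
    return f"vllm serve {validate_local_model_path(raw_path)}"
```

A path such as /models/../etc/passwd is rejected before quoting, and a path containing spaces or shell metacharacters under an allowed root comes back single-quoted. Quoting alone would not stop traversal and the whitelist alone would not stop shell breakout, which is why the sketch applies both.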

P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:

P2 — track:

No automated tests beyond the two redaction suites — swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for build_launch_command (incl. injection cases), swap transitions, and _merge_words_with_speakers is the highest-value start.
Loose dependency floors in pyproject.toml permit vulnerable python-multipart/starlette versions (DoS CVEs) on rebuild; there is no lockfile and no upload size cap.
Opaque HTTP 500 on POST /api/models / PUT /knobs when MODELS_OVERRIDES is unset in dev (the write goes to read-only /data): catch the OSError and return a clear error.
NGC API key still appears on the remote process command line (nim.py) — the quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
The module-level catalog in server.py is mutable, reassigned via a global statement, and shared across async requests with no per-request snapshot: a latent race as concurrency grows.
Container runs uvicorn as root bound to 0.0.0.0:9999 (no USER in Dockerfile) — amplifies any RCE blast radius.
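
For the NGC API key item above, the idea of handing the secret to the child over stdin instead of argv can be sketched as follows. This is an illustration under assumptions, not the nim.py code: run_with_secret_on_stdin and the stand-in child are hypothetical, and a local subprocess stands in for the remote command.

```python
import subprocess
import sys
from typing import List


def run_with_secret_on_stdin(argv: List[str], secret: str) -> subprocess.CompletedProcess:
    """Run argv, delivering the secret on stdin so it never appears in the
    argument list (and therefore not in process listings)."""
    if not secret:
        raise ValueError("secret must not be empty")
    if any(secret in arg for arg in argv):
        raise ValueError("secret must not be embedded in argv")
    return subprocess.run(
        argv,
        input=secret + "\n",   # written to the child's stdin, then closed
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in child: reads one line from stdin and reports only its length.
CHILD = "import sys; key = sys.stdin.readline().strip(); print(len(key))"


def demo(secret: str) -> str:
    """Run the stand-in child and return what it printed."""
    result = run_with_secret_on_stdin([sys.executable, "-c", CHILD], secret)
    return result.stdout.strip()
```

The receiving command has to be written to read the key from stdin (or from an environment variable set for that process only), so the change touches both ends of the call.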

P3 — bulk-fix when next touching docs/packaging:

README Status block stale (v0.2.3 / 0.13.0:4 → now v0.19.0:0); deprecated @app.on_event + hardcoded app.version="0.1.0"; NimInstallBody.register shadows BaseModel (rename → register_service); httpx class names leak into TTS/speech-models error text; one unescaped innerHTML sink (app.js) + task_id reflected in scrub JSON.
Packaging: marketingUrl/packageRepo/upstreamRepo are example.com placeholders; broken instructions.md source link; per-service SSH users (parakeet_user etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); Makefile builds only x86 though the manifest declares aarch64.
Hardening misc: no body/upload size limits on /v1/audio/*, /v1/chat/completions, /scrub; int(_env(...)) startup crash on bad VLLM_PORT; upstream error text echoed to clients.
StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
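
For the int(_env(...)) startup crash on a bad VLLM_PORT noted under hardening above, one possible shape for a tolerant port reader is sketched below. env_port and its fallback behaviour are assumptions made for illustration. The existing _env helper is not reproduced here because its signature is not given in this document.

```python
import logging
import os
from typing import Mapping

log = logging.getLogger("startup")


def env_port(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read a TCP port from the environment without crashing on bad input.

    Unset, non-integer, or out-of-range values fall back to the default
    and log a warning naming the offending variable.
    """
    raw = environ.get(name, "").strip()
    if not raw:
        return default
    try:
        port = int(raw)
    except ValueError:
        log.warning("%s=%r is not an integer; using %d", name, raw, default)
        return default
    if not 1 <= port <= 65535:
        log.warning("%s=%d is outside 1-65535; using %d", name, port, default)
        return default
    return port
```

Falling back keeps the service up in the face of a typo. Failing fast with a one-line message naming the variable is an equally reasonable choice if a silently wrong port is considered worse than a refused start.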