# ROADMAP

Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.

## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.

## Audio quality
- Echo cancellation for dual-channel label-merge — removes the mic-bleed limit when the local user isn't wearing headphones.
- LLM "referee" pass for low-confidence label-merge speaker naming.

## Platform hardening
- Qdrant auth (API key) + scheduled snapshots/backups.
- Observability: request metrics + GPU-busy tracking, so load questions are answered from data instead of log archaeology.
- API-key auth on Spark Control — only if public (non-VPN) exposure is ever needed; current stance is LAN + split-tunnel VPN only.

## Throughput (only if audio load outgrows one GPU)
- Second audio worker / queueing layer; revisit which services share Spark 2.

## Dashboard
- Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (`models.yaml` + custom overrides) and the "Add custom model" path (`POST /api/models`) only accepts an HF `org/name` repo (`shellsafe._HF_REPO_RE`), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the `org/name` regex (path whitelist + `shlex.quote`, no traversal); (b) `models.build_launch_command` / `launch-cluster.sh` able to `vllm serve <path>`; (c) `disk.py` size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15 — a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.

## Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)

P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:

**P2 — track:**
- No automated tests beyond the two redaction suites — swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for `build_launch_command` (incl. injection cases), swap transitions, and `_merge_words_with_speakers` is the highest-value start.
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml`).
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` unset in dev (write to read-only `/data`) — catch the `OSError`.
- NGC API key still appears on the remote process command line (`nim.py`) — the quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py`) — latent race as concurrency grows.
- Container runs uvicorn as **root** bound to `0.0.0.0:9999` (no `USER` in Dockerfile) — amplifies any RCE blast radius.

**P3 — bulk-fix when next touching docs/packaging:**
- README Status block stale (`v0.2.3 / 0.13.0:4` → now v0.19.0:0); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js`) + `task_id` reflected in scrub JSON.
- Packaging: `marketingUrl`/`packageRepo`/`upstreamRepo` are `example.com` placeholders; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though the manifest declares `aarch64`.
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
- StartOS registry (only if ever pursuing it): source must be public + real repo URLs.