ROADMAP
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
Near term
- parakeet-asr long-audio memory guard (deferred 2026-06-15, low priority). A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's shared 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. Precautionary: no incident has been observed, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- Controlled concurrency sweep of the audio endpoints in a quiet window: replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
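The duration guard described in the first item could look roughly like the sketch below. It is only an illustration: `AudioTooLongError`, `load_max_diarize_seconds`, `check_duration`, and `status_for` are hypothetical names, and treating an unset or zero `MAX_DIARIZE_SECONDS` as "no cap" is an assumption, not a decision recorded here.

```python
import os


class AudioTooLongError(ValueError):
    """Hypothetical exception: audio exceeds the diarization duration cap."""


def load_max_diarize_seconds(env=None) -> float:
    """Read MAX_DIARIZE_SECONDS; unset, empty, or 0 means "no cap" (assumed)."""
    env = os.environ if env is None else env
    raw = env.get("MAX_DIARIZE_SECONDS", "").strip()
    if not raw:
        return 0.0
    value = float(raw)  # a ValueError on garbage input surfaces at startup
    if value < 0:
        raise ValueError("MAX_DIARIZE_SECONDS must be >= 0")
    return value


def check_duration(duration: float, cap: float) -> None:
    """Raise before the single-pass model run if the file is too long."""
    if cap > 0 and duration > cap:
        raise AudioTooLongError(
            f"audio is {duration:.0f}s; diarization is capped at {cap:.0f}s"
        )


def status_for(exc: Exception) -> int:
    """Status code the HTTP layer would return for a diarization error."""
    return 413 if isinstance(exc, AudioTooLongError) else 500
```

In the service itself, `check_duration` would sit right after `duration` is computed, and the HTTP layer would translate the exception into a 413 response the same way it already does for oversized uploads.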
Audio quality
- Echo cancellation for dual-channel label-merge — removes the mic-bleed limit when the local user isn't wearing headphones.
- LLM "referee" pass for low-confidence label-merge speaker naming.
Platform hardening
- Qdrant auth (API key) + scheduled snapshots/backups.
- Observability: request metrics + GPU-busy tracking, so load questions are answered from data instead of log archaeology.
- API-key auth on Spark Control — only if public (non-VPN) exposure is ever needed; current stance is LAN + split-tunnel VPN only.
Throughput (only if audio load outgrows one GPU)
- Second audio worker / queueing layer; revisit which services share Spark 2.
Dashboard
- Support local-path / fine-tuned models in the swap catalog. Today the catalog is static (`models.yaml` + custom overrides) and the "Add custom model" path (`POST /api/models`) only accepts an HF `org/name` repo (`shellsafe._HF_REPO_RE`), so a model that exists only as a directory on a Spark (the usual fine-tuning output) can't be registered or swapped. Needs: (a) a "local model" add form/field taking a Spark-side directory path, with its own safe validation instead of the `org/name` regex (path whitelist + `shlex.quote`, no traversal); (b) `models.build_launch_command` / `launch-cluster.sh` able to `vllm serve <path>`; (c) `disk.py` size-probe handling a path instead of deriving the HF cache dir from a repo id. Raised 2026-06-15: a colleague's locally fine-tuned model doesn't appear because nothing scans the machine; the list is a curated catalog, not a discovery probe.
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.
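For the local-model item, requirement (a) (path whitelist, `shlex.quote`, no traversal) might be sketched as follows. The root `/srv/models` and the function name `validate_local_model_path` are hypothetical. The check is purely lexical because the path lives on the Spark, not on the dashboard host, so this sketch does not resolve symlinks.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical whitelist of Spark-side directories that may hold models.
ALLOWED_MODEL_ROOTS = (PurePosixPath("/srv/models"),)


def validate_local_model_path(raw: str, roots=ALLOWED_MODEL_ROOTS) -> str:
    """Return a shell-quoted path, or raise ValueError if it is not acceptable."""
    if not raw or any(ord(ch) < 32 for ch in raw):
        raise ValueError("path is empty or contains control characters")
    path = PurePosixPath(raw)
    if not path.is_absolute():
        raise ValueError("path must be absolute")
    if ".." in path.parts:
        raise ValueError("path must not contain '..'")
    # The model directory must sit strictly below one of the allowed roots.
    if not any(root in path.parents for root in roots):
        raise ValueError("path is outside the allowed model roots")
    # Quote for safe interpolation into a remote shell command.
    return shlex.quote(str(path))
```

A path such as `/srv/models/llama-ft` passes through unchanged, while one containing spaces or shell metacharacters comes back single-quoted. A remote-side `realpath` check would still be needed if symlinks under the whitelisted root are a concern.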
Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)
P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:
P2 — track:
- No automated tests beyond the two redaction suites: swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for `build_launch_command` (incl. injection cases), swap transitions, and `_merge_words_with_speakers` is the highest-value start.
- Loose dependency floors permit vulnerable `python-multipart` / `starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml`).
- Opaque HTTP 500 on `POST /api/models` / `PUT /knobs` when `MODELS_OVERRIDES` is unset in dev (write to read-only `/data`); catch the `OSError`.
- NGC API key still appears on the remote process command line (`nim.py`). The quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py`): latent race as concurrency grows.
- Container runs uvicorn as root bound to `0.0.0.0:9999` (no `USER` in Dockerfile), which amplifies any RCE blast radius.
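The stdin route suggested for the NGC API key can be illustrated with a local child process. This is a generic sketch, not how `nim.py` works today: `run_with_secret_on_stdin` and `CHILD` are hypothetical, and in the real flow the child would be the SSH invocation, with the remote command reading the key from its standard input.

```python
import subprocess
import sys


def run_with_secret_on_stdin(argv, secret: str) -> str:
    """Run argv, writing the secret to the child's stdin instead of its argv."""
    result = subprocess.run(
        argv,
        input=secret + "\n",   # delivered over a pipe, never on the command line
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in child process: reads the key from stdin and reports only its length.
CHILD = [
    sys.executable,
    "-c",
    "import sys; key = sys.stdin.readline().strip(); print(len(key))",
]
```

Calling `run_with_secret_on_stdin(CHILD, key)` hands the key to the child without it ever appearing in `CHILD`, so a process listing on either host shows only the command itself.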
P3 — bulk-fix when next touching docs/packaging:
- README Status block stale (`v0.2.3 / 0.13.0:4` → now v0.19.0:0); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js`) + `task_id` reflected in scrub JSON.
- Packaging: `marketingUrl` / `packageRepo` / `upstreamRepo` are `example.com` placeholders; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though the manifest declares `aarch64`.
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
- StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
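One way to address the `int(_env(...))` startup crash is a tolerant port parser, sketched below. The helper name `env_port`, the default of 8000 in the usage note, and the choice to fall back with a warning rather than exit with a clear message are all assumptions.

```python
import logging
import os

log = logging.getLogger(__name__)


def env_port(name: str, default: int, env=None) -> int:
    """Parse a TCP port from the environment, falling back to a default."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        port = int(raw)
    except ValueError:
        log.warning("%s=%r is not an integer; using %d", name, raw, default)
        return default
    if not 1 <= port <= 65535:
        log.warning("%s=%d is out of range; using %d", name, port, default)
        return default
    return port
```

With this in place, `env_port("VLLM_PORT", 8000)` would replace the bare `int(...)` call. If a misconfigured port should stop the service instead, the two fallback branches can raise an error that names the variable.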