05d03beeeb
- AGENTS.md: rewrite Current state lean for v0.19.0:0; drop the now-completed full-eval triage block (history lives in git log + EVALUATION.md).
- docs/guides/fastapi-image.md: add two durable conventions: user values crossing into SSH must go through shellsafe; new endpoints and the csrf_guard exempt-prefix rule.
- ROADMAP.md: park the remaining non-blocking P2/P3 tech debt from the eval.
ROADMAP
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
Near term
- parakeet-asr `--memory` cap, shipped via the Reapply-patches action (guards against swap-thrash on very long audio).
- Controlled concurrency sweep of the audio endpoints in a quiet window: replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
Audio quality
- Echo cancellation for dual-channel label-merge — removes the mic-bleed limit when the local user isn't wearing headphones.
- LLM "referee" pass for low-confidence label-merge speaker naming.
Platform hardening
- Qdrant auth (API key) + scheduled snapshots/backups.
- Observability: request metrics + GPU-busy tracking, so load questions are answered from data instead of log archaeology.
- API-key auth on Spark Control — only if public (non-VPN) exposure is ever needed; current stance is LAN + split-tunnel VPN only.
Throughput (only if audio load outgrows one GPU)
- Second audio worker / queueing layer; revisit which services share Spark 2.
Dashboard
- Per-model configurable vLLM flags editable from the UI (today: edit `models.yaml` and rebuild).
- Spark host update actions (OS/driver) from the UI.
- Open WebUI link-out integration; richer per-service detail views.
Tech debt (from the 2026-06-12 full-eval — see EVALUATION.md)
P0/P1 security findings are all fixed in v0.19.0:0. Remaining, none blocking:
P2 — track:
- No automated tests beyond the two redaction suites: the swap state machine, proxies, SSH wrapper, and the StartOS package are untested; live-cluster paths (swap exec, audio, embeddings/search) are exercised only by hand. Biggest coverage gap; a small pytest harness for `build_launch_command` (incl. injection cases), swap transitions, and `_merge_words_with_speakers` is the highest-value start.
- Loose dependency floors permit vulnerable `python-multipart`/`starlette` (DoS CVEs) on rebuild; no lockfile; no upload size caps (`pyproject.toml`).
- Opaque HTTP 500 on `POST /api/models`/`PUT /knobs` when `MODELS_OVERRIDES` is unset in dev (write to read-only `/data`); catch the `OSError`.
- NGC API key still appears on the remote process command line (`nim.py`). The quote-breakout risk is fixed; pass via stdin/env to also remove the process-list exposure.
- Global mutable `catalog` reassigned via `global`, shared across async requests with no snapshot (`server.py`): latent race as concurrency grows.
- Container runs uvicorn as root bound to `0.0.0.0:9999` (no `USER` in Dockerfile), which amplifies any RCE blast radius.
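For the NGC API key item above, one way to keep a secret out of the process list is to deliver it on stdin instead of in the arguments. The following is a minimal sketch, not the project's `nim.py` code: a local child Python process stands in for the remote command, and `run_with_secret` and `CHILD` are hypothetical names introduced only for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the remote command: it reads the secret from
# stdin and reports only its length, never echoing the value itself.
CHILD = "import sys; key = sys.stdin.readline().strip(); print(len(key))"


def run_with_secret(argv: list[str], secret: str) -> str:
    """Run argv, passing the secret on stdin so it never appears in argv."""
    # Refuse to run if the caller accidentally embedded the secret in the
    # command line, which is exactly the exposure being avoided.
    if any(secret in arg for arg in argv):
        raise ValueError("secret must not be embedded in the command line")
    completed = subprocess.run(
        argv,
        input=secret + "\n",   # delivered on the child's stdin
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # The key below is a placeholder, not a real credential.
    print(run_with_secret([sys.executable, "-c", CHILD], "example-not-a-real-key"))
```

Over SSH the same idea would mean writing the key to the SSH process's stdin and having the remote side read it; the exact remote command is outside the scope of this sketch.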
P3 — bulk-fix when next touching docs/packaging:
- README Status block stale (`v0.2.3 / 0.13.0:4` → now v0.19.0:0); deprecated `@app.on_event` + hardcoded `app.version="0.1.0"`; `NimInstallBody.register` shadows `BaseModel` (rename → `register_service`); httpx class names leak into TTS/speech-models error text; one unescaped `innerHTML` sink (`app.js`) + `task_id` reflected in scrub JSON.
- Packaging: `marketingUrl`/`packageRepo`/`upstreamRepo` are `example.com` placeholders; broken `instructions.md` source link; per-service SSH users (`parakeet_user` etc.) absent from the Configure-Sparks action inputSpec (silent default-empty); `Makefile` builds only x86 though the manifest declares `aarch64`.
- Hardening misc: no body/upload size limits on `/v1/audio/*`, `/v1/chat/completions`, `/scrub`; `int(_env(...))` startup crash on bad `VLLM_PORT`; upstream error text echoed to clients.
- StartOS registry (only if ever pursuing it): source must be public + real repo URLs.
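The `VLLM_PORT` startup crash listed under hardening could be avoided by validating the value before converting it. This is a minimal sketch, assuming a fallback-with-warning policy is acceptable. `env_port` and the default of 8000 are hypothetical and not taken from the project, whose `_env` helper is not shown here.

```python
import logging
import os
from collections.abc import Mapping

log = logging.getLogger(__name__)


def env_port(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read a TCP port from the environment, falling back to default on bad input."""
    raw = environ.get(name, "").strip()
    if not raw:
        return default
    try:
        port = int(raw)
    except ValueError:
        # A non-numeric value would otherwise crash startup with a bare traceback.
        log.warning("%s=%r is not an integer; using %d", name, raw, default)
        return default
    if not 1 <= port <= 65535:
        log.warning("%s=%d is outside 1..65535; using %d", name, port, default)
        return default
    return port


if __name__ == "__main__":
    # 8000 is an illustrative default only.
    print(env_port("VLLM_PORT", 8000))
```

Failing fast with a clear error message instead of falling back would be an equally reasonable choice; the point is only that the raw `int(...)` call should not be the first thing to see unvalidated input.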