v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf

Triaged from a full independent evaluation (EVALUATION.md). Addresses the
three P0/P1 code findings; the proxy/data APIs that downstream apps consume
are deliberately untouched.

- ssh command injection (P0): new shellsafe.py validates + shlex.quotes every
  user-supplied value crossing into an SSH command on the Sparks (model repo,
  vllm args/knobs, NIM image/container/volume/port/env, service names).
  Boundary validation on POST /api/models and POST /api/nim/install; quoting at
  every sink in models/download/nim/services. NGC key now quoted too.
- qdrant path injection (P1): /api/search validates the collection name against
  a metacharacter-free whitelist and URL-encodes the path segment.
- csrf (P1): csrf_guard middleware enforces same-origin on state-changing
  control endpoints; /v1/*, /scrub, /rehydrate, /api/search, /api/audio/* and
  /api/health-event are exempt so external consumers are unaffected.

Verified: injection survives only as a single quoted token, vLLM preflight
shlex.split round-trip intact, CSRF behaviors covered via TestClient, both
offline redaction suites still pass, tsc clean, s9pk rebuilt.
This commit is contained in:
Keysat
2026-06-12 16:36:33 -05:00
parent 98988057a2
commit 1c4e861783
10 changed files with 260 additions and 24 deletions
+3 -2
View File
@@ -10,6 +10,7 @@ from dataclasses import dataclass
from typing import Literal, Optional
from .config import Settings
from .shellsafe import quote_arg
from .ssh import ssh_run
@@ -111,7 +112,7 @@ async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
if _is_recently_unreachable(svc.host, svc.user):
return {"state": "unreachable", "host_unreachable": True, "restart_count": None, "uptime": None}
cmd = (
f"docker inspect {svc.container} "
f"docker inspect {quote_arg(svc.container)} "
f"--format '{{{{.State.Status}}}}|{{{{.State.StartedAt}}}}|{{{{.RestartCount}}}}|{{{{.State.ExitCode}}}}|{{{{.State.Error}}}}' "
f"2>&1 || echo 'NOT_FOUND'"
)
@@ -141,7 +142,7 @@ async def run_action(settings: Settings, svc: ServiceDef, action: ServiceAction)
"""Run docker start/stop/restart on the target host."""
if not svc.host or not svc.user:
return {"ok": False, "error": "service host not configured"}
cmd = f"docker {action} {svc.container}"
cmd = f"docker {action} {quote_arg(svc.container)}"
rc, out, err = await ssh_run(svc.host, svc.user, cmd, settings, timeout=30)
return {
"ok": rc == 0,