v0.19.0:0 - harden cluster-control surface: ssh injection, qdrant path, csrf

Triaged from a full independent evaluation (EVALUATION.md). Addresses the
three P0/P1 code findings; the proxy/data APIs that downstream apps consume
are deliberately untouched.

- ssh command injection (P0): new shellsafe.py validates + shlex.quotes every
  user-supplied value crossing into an SSH command on the Sparks (model repo,
  vllm args/knobs, NIM image/container/volume/port/env, service names).
  Boundary validation on POST /api/models and POST /api/nim/install; quoting at
  every sink in models/download/nim/services. NGC key now quoted too.
- qdrant path injection (P1): /api/search validates the collection name against
  a metacharacter-free whitelist and URL-encodes the path segment.
- csrf (P1): csrf_guard middleware enforces same-origin on state-changing
  control endpoints; /v1/*, /scrub, /rehydrate, /api/search, /api/audio/* and
  /api/health-event are exempt so external consumers are unaffected.
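
The same-origin rule in the csrf bullet can be sketched as a pure decision
function. This is a minimal illustration, not the actual csrf_guard code: the
function name, the safe-method set, and the choice to reject a missing Origin
header are all assumptions; only the exempt paths come from the list above.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: read-only methods are never treated as state-changing.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

# Exempt prefixes taken from the commit message; prefix matching is a
# simplification chosen for this sketch.
EXEMPT_PREFIXES = (
    "/v1/", "/scrub", "/rehydrate", "/api/search",
    "/api/audio/", "/api/health-event",
)


def is_allowed(method: str, path: str, host: str, origin: Optional[str]) -> bool:
    """Return True if the request passes the (hypothetical) same-origin check."""
    if method.upper() in SAFE_METHODS:
        return True
    if path.startswith(EXEMPT_PREFIXES):
        return True
    if not origin:
        # Assumption: a state-changing request without Origin is rejected.
        return False
    # Same-origin here means the Origin's host[:port] equals the Host header.
    return urlsplit(origin).netloc.lower() == host.lower()
```

With this shape, a cross-site POST to /api/models is refused while the same
POST to /v1/... still goes through, which is the split the bullet describes.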

Verified: injection payloads survive only as a single quoted token, vLLM preflight
shlex.split round-trip intact, CSRF behaviors covered via TestClient, both
offline redaction suites still pass, tsc clean, s9pk rebuilt.
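
The "single quoted token" and shlex.split round-trip checks can be reproduced
along these lines. This is a sketch under assumptions: build_remote_cmd,
is_plausible_repo, and the repo pattern are hypothetical stand-ins, not the
contents of shellsafe.py; only shlex.quote and shlex.split are real calls.

```python
import re
import shlex
from typing import List

# Hypothetical boundary pattern for an "org/model" style repo id.
_REPO_RE = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*(/[A-Za-z0-9][A-Za-z0-9._-]*)?"
)


def is_plausible_repo(repo: str) -> bool:
    """Boundary validation: reject anything outside the conservative pattern."""
    return _REPO_RE.fullmatch(repo) is not None


def build_remote_cmd(argv: List[str]) -> str:
    """Sink quoting: every argument is shell-quoted before joining."""
    return " ".join(shlex.quote(arg) for arg in argv)


payload = "org/model; rm -rf ~"

# Validation at the boundary rejects the payload outright.
rejected = not is_plausible_repo(payload)

# Even if it reached a sink, quoting keeps it a single argument.
cmd = build_remote_cmd(["echo", payload])
round_trip = shlex.split(cmd)
```

Here round_trip equals ["echo", payload], so the ';' never reaches the remote
shell as a command separator.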
Keysat
2026-06-12 16:36:33 -05:00
parent 98988057a2
commit 1c4e861783
10 changed files with 260 additions and 24 deletions
+17 -1
@@ -25,8 +25,10 @@ vector is supplied, /api/search degrades cleanly to dense + rerank.
"""
from __future__ import annotations
import logging
import re
import time
from typing import Any, Optional, Union
from urllib.parse import quote as urlquote
import httpx
from fastapi import APIRouter, HTTPException
@@ -36,6 +38,19 @@ from .config import Settings
logger = logging.getLogger("spark-control.embeddings")
# Qdrant collection name: caller-supplied and interpolated into the Qdrant URL
# path. Restrict to a metacharacter-free whitelist so it cannot inject path
# segments ('/', '..'), a query string ('?'), or a fragment ('#') and pivot to
# other collections/endpoints on the internal Qdrant. (Qdrant's own names are
# alphanumerics + dot/dash/underscore.)
_COLLECTION_RE = re.compile(r"^[A-Za-z0-9._-]+$")
def _safe_collection(name: str) -> str:
    if not name or ".." in name or not _COLLECTION_RE.fullmatch(name):
        raise HTTPException(400, f"invalid collection name: {name!r}")
    return name
# Embedding/rerank can be slow on a cold model; search is interactive.
EMBED_TIMEOUT = 120.0
QDRANT_TIMEOUT = 30.0
@@ -175,6 +190,7 @@ def build_router(settings: Settings) -> APIRouter:
collection = body.collection or settings.qdrant_collection
if not collection:
    raise HTTPException(400, "collection is required (no default configured)")
collection = _safe_collection(collection)
top_k = max(1, min(body.top_k, 100))
retrieve_n = body.retrieve_n or max(50, top_k * 10)
@@ -234,7 +250,7 @@ def build_router(settings: Settings) -> APIRouter:
t1 = time.time()
qr = await _post(
-    f"{_qdrant_base()}/collections/{collection}/points/query",
+    f"{_qdrant_base()}/collections/{urlquote(collection, safe='')}/points/query",
     query_body, QDRANT_TIMEOUT, "qdrant",
)
if qr.status_code == 404: