v0.27.0:0 - in-app settings gear + swap-lock route fix

Move the ~20 optional cluster knobs out of the StartOS "Configure Sparks"
action (now just the 4 required fields) and into a dashboard ⚙ Settings gear,
backed by a /data/app_settings.json overlay keyed by env-var names. One shared
mutable Settings instance + Settings.reload() applies edits live without a
restart; existing installs' values migrate automatically on first boot.
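
A minimal sketch of why one shared, mutable instance makes edits go live. DemoSettings, Holder and the two-field layout are hypothetical stand-ins for illustration, not the real Settings class; only the reassign-fields-in-place idea is taken from this commit.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoSettings:
    """Hypothetical two-field stand-in for the real Settings dataclass."""
    vllm_port: int
    kokoro_port: int

    @classmethod
    def build(cls, src):
        # A blank or missing value falls through to the code default.
        return cls(
            vllm_port=int(src.get("VLLM_PORT") or 8888),
            kokoro_port=int(src.get("KOKORO_PORT") or 8880),
        )

    def reload(self, src):
        # Recompute every field, then assign onto this same object so that
        # anything holding a reference sees the new values.
        fresh = DemoSettings.build(src)
        for f in fields(self):
            setattr(self, f.name, getattr(fresh, f.name))


class Holder:
    """Stands in for a router or manager that stored `settings` at startup."""
    def __init__(self, settings):
        self.settings = settings


process_env = {"VLLM_PORT": "8000"}
settings = DemoSettings.build(process_env)
holder = Holder(settings)

# A save in the gear writes the overlay; the overlay is layered over the env.
overlay = {"VLLM_PORT": "9001"}
settings.reload({**process_env, **overlay})
print(holder.settings.vllm_port)
```

Because the holder keeps a reference rather than a copy, no holder needs to be rebuilt after a save.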

Also: support-service ports (parakeet/kokoro/embed/qdrant + vllm) are now
configurable, and GET /api/swap/lock no longer 404s (it was shadowed by the
/api/swap/{job_id} catch-all). WebhookNotifier is re-pointed on save so its
url/secret reload live too.
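
The shadowing can be reproduced with a toy first-match router. This is a sketch only: compile_route and first_match are hypothetical helpers, not FastAPI internals, and they model just the "first registered route that matches wins" behaviour described above.

```python
import re


def compile_route(template):
    """Turn '/api/swap/{job_id}' into a regex with one named group per parameter."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def first_match(routes, path):
    """Return (template, params) for the first route that matches, else None."""
    for template in routes:
        m = compile_route(template).match(path)
        if m:
            return template, m.groupdict()
    return None


# Before the fix: the parametric route is registered first and swallows "lock".
shadowed = ["/api/swap/{job_id}", "/api/swap/lock"]
# After the fix: the static route is registered first.
fixed = ["/api/swap/lock", "/api/swap/{job_id}"]

print(first_match(shadowed, "/api/swap/lock"))
print(first_match(fixed, "/api/swap/lock"))
```

With the shadowed order the path binds job_id="lock", which is the lookup that produced the 404; with the fixed order the static route wins and ordinary job ids still reach the parametric route.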
Keysat
2026-06-18 13:41:28 -05:00
parent b67e001642
commit 7e0759846f
15 changed files with 797 additions and 268 deletions
+286
@@ -0,0 +1,286 @@
"""App-owned settings overlay: the in-dashboard 'gear' knobs.

Spark Control's *required* wiring — the two Spark IPs and SSH users — is set once
via the StartOS "Configure Sparks" action and arrives as env vars. Everything
else (ports, container names, support-service hosts, integrations, webhook) is
optional and lives here: a small JSON overlay on /data that the dashboard gear
reads and writes, so an operator never has to open StartOS actions to tune the
cluster. This follows the StartOS 0.4 convention (minimal setup action; routine
config in the app's own UI) and stays inside the package's backup volume, so the
file is backed up and restored for free.

Each overlay entry is keyed by the *same env var name* config.Settings already
reads, so the overlay is simply an env-var override store. Precedence (see
config._effective_env): process env first, this overlay on top — so a knob set
in the gear wins, while an un-touched knob falls through to whatever the StartOS
action injected, then to the code default.

First-run migration: when the overlay file doesn't exist yet (e.g. an existing
install upgrading into this version), it's seeded from the current env so any
value previously set via the StartOS action carries over into the gear with no
operator action and nothing lost.
"""
from __future__ import annotations

import json
import logging
import os
import re
import tempfile
from pathlib import Path
from typing import Mapping

log = logging.getLogger(__name__)

# Field metadata drives BOTH the /api/settings response (the front-end renders
# the form generically from this) and light server-side validation. `key` is the
# env var name; `type` is one of text|int|csv|secret. `secret` values are
# write-only — never echoed back to the browser.
FIELDS: list[dict] = [
# --- vLLM (Spark 1) ---
{"group": "vLLM (Spark 1)", "key": "VLLM_PORT", "label": "vLLM port", "type": "int",
"placeholder": "8888",
"help": "Port your vLLM listens on. Blank ⇒ 8888 (the bundled launch-cluster.sh). Set 8000 for vanilla vLLM, or wherever yours listens."},
{"group": "vLLM (Spark 1)", "key": "VLLM_CONTAINER", "label": "vLLM container name", "type": "text",
"placeholder": "vllm_node",
"help": "Docker container the swappable vLLM runs in. Blank ⇒ vllm_node. The swap log-tail and pre-flight validator exec into it by name."},
# --- Monitoring ---
{"group": "Monitoring", "key": "DISABLED_SERVICES", "label": "Services to hide", "type": "csv",
"placeholder": "e.g. parakeet,kokoro",
"help": "Comma-separated built-in services your cluster doesn't run, so their tiles are hidden and never probed. Valid: parakeet, kokoro, embeddings, qdrant. Blank ⇒ monitor all."},
# --- Parakeet (STT) ---
{"group": "Parakeet (STT)", "key": "PARAKEET_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the Parakeet STT container. Blank ⇒ Spark 2."},
{"group": "Parakeet (STT)", "key": "PARAKEET_PORT", "label": "Port", "type": "int",
"placeholder": "8000",
"help": "Port Parakeet listens on. Blank ⇒ 8000. Set this if you remapped it (e.g. because your vLLM holds 8000)."},
{"group": "Parakeet (STT)", "key": "PARAKEET_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "parakeet-asr",
"help": "Docker container name for Parakeet. Blank ⇒ parakeet-asr."},
{"group": "Parakeet (STT)", "key": "PARAKEET_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the Parakeet container. Blank ⇒ your Spark 2 user."},
# --- Kokoro (TTS) ---
{"group": "Kokoro (TTS)", "key": "KOKORO_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the Kokoro TTS container. Blank ⇒ Spark 2."},
{"group": "Kokoro (TTS)", "key": "KOKORO_PORT", "label": "Port", "type": "int",
"placeholder": "8880",
"help": "Port Kokoro listens on. Blank ⇒ 8880."},
{"group": "Kokoro (TTS)", "key": "KOKORO_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "kokoro-tts",
"help": "Docker container name for Kokoro. Blank ⇒ kokoro-tts."},
{"group": "Kokoro (TTS)", "key": "KOKORO_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the Kokoro container. Blank ⇒ your Spark 2 user."},
# --- Embeddings ---
{"group": "Embeddings", "key": "EMBED_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the spark-embed container (bge-m3 + reranker). Blank ⇒ Spark 2."},
{"group": "Embeddings", "key": "EMBED_PORT", "label": "Port", "type": "int",
"placeholder": "8088",
"help": "Port the embedding server listens on. Blank ⇒ 8088."},
{"group": "Embeddings", "key": "EMBED_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "spark-embed",
"help": "Docker container name for the embedding server. Blank ⇒ spark-embed."},
{"group": "Embeddings", "key": "EMBED_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the embedding container. Blank ⇒ your Spark 2 user."},
# --- Qdrant ---
{"group": "Qdrant", "key": "QDRANT_HOST", "label": "Host", "type": "text",
"placeholder": "leave blank for Spark 2",
"help": "Host running the Qdrant vector database. Blank ⇒ Spark 2."},
{"group": "Qdrant", "key": "QDRANT_PORT", "label": "Port", "type": "int",
"placeholder": "6333",
"help": "Port Qdrant's REST API listens on. Blank ⇒ 6333."},
{"group": "Qdrant", "key": "QDRANT_CONTAINER", "label": "Container name", "type": "text",
"placeholder": "qdrant",
"help": "Docker container name for Qdrant. Blank ⇒ qdrant."},
{"group": "Qdrant", "key": "QDRANT_USER", "label": "SSH user", "type": "text",
"placeholder": "leave blank for Spark 2 user",
"help": "SSH user that owns the Qdrant container. Blank ⇒ your Spark 2 user."},
{"group": "Qdrant", "key": "QDRANT_COLLECTION", "label": "Default collection", "type": "text",
"placeholder": "e.g. crm_chunks",
"help": "Collection used by /api/search when a request doesn't name one. Blank ⇒ callers must pass a collection."},
# --- Integrations ---
{"group": "Integrations", "key": "OPEN_WEBUI_URL", "label": "Open WebUI URL", "type": "text",
"placeholder": "e.g. https://open-webui.yourserver.local",
"help": "If set, the header shows a one-click 'Open chat' button to your Open WebUI."},
{"group": "Integrations", "key": "MATRIX_BRIDGE_USER", "label": "matrix-bridge bot SSH user", "type": "text",
"placeholder": "e.g. modelo",
"help": "SSH user owning the bot's ~/matrix-bridge clone (Spark 2). Set this to show the bot tile (update/restart/logs). Blank ⇒ tile hidden."},
{"group": "Integrations", "key": "NGC_API_KEY", "label": "NGC API key", "type": "secret",
"placeholder": "starts with nvapi-…",
"help": "NVIDIA NGC personal key, needed only to install NIM containers from nvcr.io. Stored on this server."},
{"group": "Integrations", "key": "SWAP_WEBHOOK_URL", "label": "Swap webhook URL", "type": "text",
"placeholder": "e.g. https://my-service.local/spark-swap",
"help": "POSTed a small JSON event (swap_complete / swap_failed) after every model swap, so automation can re-point to the new model. Blank ⇒ disabled."},
{"group": "Integrations", "key": "SWAP_WEBHOOK_SECRET", "label": "Swap webhook secret", "type": "secret",
"placeholder": "a random shared string",
"help": "If set, each webhook is HMAC-signed (X-Spark-Signature) so the receiver can verify it. Blank ⇒ unsigned."},
]
_BY_KEY = {f["key"]: f for f in FIELDS}
_SECRET_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "secret")
_INT_KEYS = frozenset(f["key"] for f in FIELDS if f["type"] == "int")
# Reject control characters (incl. newlines) — these values flow into env vars,
# URLs, and SSH command lines (quoted at the sink, but defence in depth).
_BAD_CHARS = re.compile(r"[\x00-\x1f\x7f]")
# A secret's value is never echoed back, so a blank submit means "keep the stored
# one" (you can't see it to retype it). To actually *remove* a stored secret the
# UI sends this sentinel instead of a real value. Surfaced to the front-end via
# public_view so the two stay in sync.
CLEAR_SENTINEL = "__clear__"


def _path() -> Path:
    return Path(os.environ.get("APP_SETTINGS_FILE", "/data/app_settings.json"))


def field_keys() -> frozenset[str]:
    return frozenset(_BY_KEY)


def load_overlay() -> dict[str, str]:
    """Return the overlay as {ENV_KEY: value}, filtered to known keys with
    non-empty values.

    Pure read (no side effects) — called on every Settings (re)build, so it must
    not write. Missing/corrupt file ⇒ {}. The file is tiny."""
    p = _path()
    if not p.exists():
        return {}
    try:
        raw = json.loads(p.read_text())
    except (ValueError, OSError) as e:
        log.warning("ignoring unreadable %s: %s", p, e)
        return {}
    if not isinstance(raw, dict):
        return {}
    return {k: str(v) for k, v in raw.items() if k in _BY_KEY and v not in (None, "")}


def seed_from_env(env: Mapping[str, str]) -> None:
    """One-time migration, called once at startup: if no overlay exists yet, seed
    it from the current env so any optional value previously set via the StartOS
    action carries into the gear automatically (nothing lost on upgrade). No-op
    if the file already exists or the env carries no known non-empty knob — a
    fresh install then starts with no overlay and pure defaults. Values run
    through the same validation as apply(); a malformed one (e.g. a paste-error
    port) is skipped rather than written, matching the gear's own guards."""
    if _path().exists():
        return
    seeded: dict[str, str] = {}
    for k in _BY_KEY:
        v = env.get(k)
        if not v:
            continue
        try:
            cleaned = _validate(k, v)
        except SettingsError as e:
            log.warning("skipping invalid env value while seeding overlay: %s", e)
            continue
        if cleaned and cleaned != CLEAR_SENTINEL:
            seeded[k] = cleaned
    if seeded:
        _write(seeded)
        log.info("seeded settings overlay from env (%d keys): %s", len(seeded), _path())


def _write(overlay: dict[str, str]) -> None:
    p = _path()
    p.parent.mkdir(parents=True, exist_ok=True)
    # Atomic replace so a crash mid-write never leaves a truncated overlay.
    fd, tmp = tempfile.mkstemp(dir=str(p.parent), prefix=".app_settings.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(overlay, fh, indent=2, sort_keys=True)
        os.replace(tmp, p)
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


def public_view() -> dict:
    """Shape the gear form for the browser: ordered groups of fields with their
    current overlay value. Secret values are never sent — only a `set` flag."""
    overlay = load_overlay()
    groups: list[dict] = []
    index: dict[str, dict] = {}
    for f in FIELDS:
        g = index.get(f["group"])
        if g is None:
            g = {"name": f["group"], "fields": []}
            index[f["group"]] = g
            groups.append(g)
        entry = {
            "key": f["key"],
            "label": f["label"],
            "type": f["type"],
            "placeholder": f.get("placeholder", ""),
            "help": f.get("help", ""),
        }
        if f["type"] == "secret":
            entry["set"] = bool(overlay.get(f["key"]))
        else:
            entry["value"] = overlay.get(f["key"], "")
        g["fields"].append(entry)
    return {"groups": groups, "clear_sentinel": CLEAR_SENTINEL}


class SettingsError(ValueError):
    """Bad input to apply() — surfaced as 422 by the endpoint."""


def _validate(key: str, value) -> str:
    """Clean + validate one value; raise SettingsError on bad input. Returns the
    stripped string ('' is valid and means 'unset'). The CLEAR_SENTINEL passes
    through for the caller to interpret (secret removal)."""
    if key not in _BY_KEY:
        raise SettingsError(f"unknown setting: {key}")
    val = ("" if value is None else str(value)).strip()
    if val == CLEAR_SENTINEL:
        return val
    if _BAD_CHARS.search(val):
        raise SettingsError(f"{key}: control characters are not allowed")
    if key in _INT_KEYS and val:
        # isascii() guards against non-ASCII digits (e.g. "²") that pass
        # isdigit() but make int() raise a bare ValueError instead of a 422.
        if not (val.isascii() and val.isdigit()) or not (1 <= int(val) <= 65535):
            raise SettingsError(f"{key}: must be a port number between 1 and 65535")
    return val


def apply(updates: Mapping[str, str]) -> dict[str, str]:
    """Validate `updates` and merge them into the overlay, then persist.

    Rules per key:
      - unknown key / bad int / control chars → reject (422, via _validate)
      - secret + CLEAR_SENTINEL → delete the stored secret
      - secret + blank value → leave the stored secret unchanged (don't wipe)
      - non-secret + blank → delete the key (revert to env/default)
      - otherwise → set the key

    Returns the new overlay. The caller reloads Settings so the change goes live.
    """
    overlay = load_overlay()
    for key, value in updates.items():
        val = _validate(key, value)
        if key in _SECRET_KEYS:
            if val == CLEAR_SENTINEL:
                overlay.pop(key, None)
            elif val:
                overlay[key] = val
            # blank secret ⇒ leave the existing value in place
        elif val and val != CLEAR_SENTINEL:
            overlay[key] = val
        else:
            overlay.pop(key, None)
    _write(overlay)
    return overlay
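
A worked example of the per-key merge rules listed in apply(), as a simplified, file-free restatement. merge_updates and the one-entry SECRET_KEYS set are hypothetical; validation and persistence are deliberately left out so that only the keep/clear/revert behaviour is visible.

```python
CLEAR_SENTINEL = "__clear__"
SECRET_KEYS = frozenset({"NGC_API_KEY"})  # hypothetical subset, for illustration


def merge_updates(overlay, updates):
    """Apply the merge rules to a copy of `overlay` and return the result."""
    merged = dict(overlay)
    for key, raw in updates.items():
        val = (raw or "").strip()
        if key in SECRET_KEYS:
            if val == CLEAR_SENTINEL:
                merged.pop(key, None)   # explicit removal of a stored secret
            elif val:
                merged[key] = val       # a typed value replaces the secret
            # blank secret: keep whatever is stored
        elif val and val != CLEAR_SENTINEL:
            merged[key] = val           # ordinary set
        else:
            merged.pop(key, None)       # blank non-secret: revert to env/default
    return merged


stored = {"VLLM_PORT": "8000", "NGC_API_KEY": "nvapi-example"}

# Blank port reverts to the default; blank secret is left untouched.
after_blank = merge_updates(stored, {"VLLM_PORT": "", "NGC_API_KEY": ""})
# The sentinel is the only way to drop a stored secret.
after_clear = merge_updates(stored, {"NGC_API_KEY": CLEAR_SENTINEL})
print(after_blank, after_clear)
```

The asymmetry is intentional: a secret is never echoed back to the form, so a blank field cannot be read as "delete".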
+84 -58
@@ -1,26 +1,28 @@
from __future__ import annotations
import logging
import os
from dataclasses import dataclass
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Mapping
from . import app_settings
from .shellsafe import validate_container
log = logging.getLogger(__name__)
def _env(name: str, default: str = "") -> str:
return os.environ.get(name, default)
def _env(src: Mapping[str, str], name: str, default: str = "") -> str:
return src.get(name, default)
def _env_container(name: str, default: str) -> str:
def _env_container(src: Mapping[str, str], name: str, default: str) -> str:
"""Resolve a container-name env var, validating it at the config boundary.
The value flows into `docker logs`/`docker exec` over SSH, so it's quoted at
the sink — but per the repo's two-layer convention it's also whitelist-checked
here. A malformed optional value falls back to `default` rather than crashing
daemon startup (mirrors `_env_int` for VLLM_PORT)."""
val = os.environ.get(name, "") or default
daemon startup (mirrors `_env_int`)."""
val = src.get(name, "") or default
try:
return validate_container(val)
except ValueError:
@@ -28,23 +30,23 @@ def _env_container(name: str, default: str) -> str:
return default
def _env_set(name: str) -> frozenset[str]:
def _env_set(src: Mapping[str, str], name: str) -> frozenset[str]:
"""Parse a comma-separated env var into a lowercased frozenset of keys.
Used by DISABLED_SERVICES so an adopter whose cluster doesn't run a given
support service can switch its tile + probes off entirely (rather than have
the probe hit whatever else listens on that port — e.g. a vLLM sharing
Parakeet's default 8000)."""
raw = os.environ.get(name, "")
raw = src.get(name, "")
return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
def _env_int(name: str, default: int) -> int:
def _env_int(src: Mapping[str, str], name: str, default: int) -> int:
"""Parse an int env var, falling back to `default` when unset, blank, or
malformed. The StartOS Configure panel passes optional numeric fields as an
empty string when left blank, so a bare int("") would crash daemon startup."""
malformed. Optional numeric fields arrive as an empty string when left blank,
so a bare int("") would crash daemon startup."""
try:
return int(os.environ.get(name, "") or default)
return int(src.get(name, "") or default)
except (TypeError, ValueError):
return default
@@ -64,8 +66,23 @@ def _resolve_models_yaml() -> str:
return str(candidates[0]) # let load fail with a clear path
@dataclass(frozen=True)
def _effective_env() -> dict[str, str]:
"""The env Settings is built from: process env first, the in-app settings
overlay on top. The overlay (the dashboard 'gear') is keyed by the same env
var names, so a knob set in the UI overrides the value the StartOS action
injected — while an un-touched knob keeps falling through to the action's
value, then to the code default. See app_settings."""
return {**os.environ, **app_settings.load_overlay()}
@dataclass
class Settings:
# NOTE: intentionally NOT frozen. There is exactly one Settings instance,
# shared by reference across every router closure and manager (build_router,
# self.settings = settings). `reload()` mutates it in place so a change saved
# via the in-app settings gear goes live for all of them without rebuilding
# the app — the only window of inconsistency is the microseconds it takes to
# reassign the fields, acceptable for a single-operator config save.
spark1_host: str
spark1_user: str
spark2_host: str
@@ -107,73 +124,82 @@ class Settings:
swap_webhook_secret: str
@classmethod
def from_env(cls) -> "Settings":
spark2_host = _env("SPARK2_HOST")
spark2_user = _env("SPARK2_USER")
def from_env(cls, src: Mapping[str, str] | None = None) -> "Settings":
src = _effective_env() if src is None else src
spark2_host = _env(src, "SPARK2_HOST")
spark2_user = _env(src, "SPARK2_USER")
# Parakeet (STT) and Kokoro (TTS) default to Spark 2 unless overridden.
return cls(
spark1_host=_env("SPARK1_HOST"),
spark1_user=_env("SPARK1_USER"),
spark1_host=_env(src, "SPARK1_HOST"),
spark1_user=_env(src, "SPARK1_USER"),
spark2_host=spark2_host,
spark2_user=spark2_user,
parakeet_host=_env("PARAKEET_HOST") or spark2_host,
parakeet_user=_env("PARAKEET_USER") or spark2_user,
parakeet_container=_env("PARAKEET_CONTAINER") or "parakeet-asr",
kokoro_host=_env("KOKORO_HOST") or spark2_host,
kokoro_user=_env("KOKORO_USER") or spark2_user,
kokoro_container=_env("KOKORO_CONTAINER") or "kokoro-tts",
parakeet_host=_env(src, "PARAKEET_HOST") or spark2_host,
parakeet_user=_env(src, "PARAKEET_USER") or spark2_user,
parakeet_container=_env(src, "PARAKEET_CONTAINER") or "parakeet-asr",
kokoro_host=_env(src, "KOKORO_HOST") or spark2_host,
kokoro_user=_env(src, "KOKORO_USER") or spark2_user,
kokoro_container=_env(src, "KOKORO_CONTAINER") or "kokoro-tts",
# Embeddings (spark-embed: bge-m3 dense + reranker) and Qdrant
# (vector storage) default to Spark 2 unless overridden.
embed_host=_env("EMBED_HOST") or spark2_host,
embed_user=_env("EMBED_USER") or spark2_user,
embed_container=_env("EMBED_CONTAINER") or "spark-embed",
qdrant_host=_env("QDRANT_HOST") or spark2_host,
qdrant_user=_env("QDRANT_USER") or spark2_user,
qdrant_container=_env("QDRANT_CONTAINER") or "qdrant",
qdrant_collection=_env("QDRANT_COLLECTION", ""),
embed_host=_env(src, "EMBED_HOST") or spark2_host,
embed_user=_env(src, "EMBED_USER") or spark2_user,
embed_container=_env(src, "EMBED_CONTAINER") or "spark-embed",
qdrant_host=_env(src, "QDRANT_HOST") or spark2_host,
qdrant_user=_env(src, "QDRANT_USER") or spark2_user,
qdrant_container=_env(src, "QDRANT_CONTAINER") or "qdrant",
qdrant_collection=_env(src, "QDRANT_COLLECTION", ""),
# matrix-bridge bot container, driven as its own SSH user (the owner
# of the ~/matrix-bridge git clone) so git/docker run unprivileged.
# The user is BLANK by default and set via the "Configure Sparks"
# action; leaving it blank reports the service as unconfigured, which
# hides the tile. That keeps the shared package portable — a
# deployment without the bot never shows a stray tile or a hardcoded
# username. Host defaults to Spark 2 (same box); container/dir/branch
# are sensible defaults. All are env-overridable.
matrix_bridge_host=_env("MATRIX_BRIDGE_HOST") or spark2_host,
matrix_bridge_user=_env("MATRIX_BRIDGE_USER"),
matrix_bridge_container=_env("MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
matrix_bridge_dir=_env("MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
# The user is BLANK by default and set via the settings gear; leaving
# it blank reports the service as unconfigured, which hides the tile.
# That keeps the shared package portable — a deployment without the
# bot never shows a stray tile or a hardcoded username. Host defaults
# to Spark 2 (same box); container/dir/branch are sensible defaults.
matrix_bridge_host=_env(src, "MATRIX_BRIDGE_HOST") or spark2_host,
matrix_bridge_user=_env(src, "MATRIX_BRIDGE_USER"),
matrix_bridge_container=_env(src, "MATRIX_BRIDGE_CONTAINER") or "matrix-bridge",
matrix_bridge_dir=_env(src, "MATRIX_BRIDGE_DIR") or "~/matrix-bridge",
matrix_bridge_branch=_env(src, "MATRIX_BRIDGE_BRANCH") or "master",
# Redaction gateway pseudonym-map store (server-held de-anon key).
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
ssh_key_path=_env("SSH_KEY_PATH"),
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
redaction_map_db=_env(src, "REDACTION_MAP_DB", "/data/redaction_maps.db"),
redaction_map_ttl=_env_int(src, "REDACTION_MAP_TTL", 7200),
ssh_key_path=_env(src, "SSH_KEY_PATH"),
ssh_known_hosts=_env(src, "SSH_KNOWN_HOSTS"),
models_yaml=_resolve_models_yaml(),
vllm_port=_env_int("VLLM_PORT", 8888),
vllm_port=_env_int(src, "VLLM_PORT", 8888),
# Container name for the swappable vLLM on Spark 1. Defaults to the
# bundled launch-cluster.sh container; override if you named yours
# something else (the swap log-tail and pre-flight validator exec
# into it by name).
vllm_container=_env_container("VLLM_CONTAINER", "vllm_node"),
vllm_container=_env_container(src, "VLLM_CONTAINER", "vllm_node"),
# Built-in support-service keys (parakeet, kokoro, embeddings,
# qdrant) the deployment doesn't run — hidden from the dashboard and
# never probed.
disabled_services=_env_set("DISABLED_SERVICES"),
parakeet_port=_env_int("PARAKEET_PORT", 8000),
kokoro_port=_env_int("KOKORO_PORT", 8880),
embed_port=_env_int("EMBED_PORT", 8088),
qdrant_port=_env_int("QDRANT_PORT", 6333),
bind_port=_env_int("BIND_PORT", 9999),
open_webui_url=_env("OPEN_WEBUI_URL", ""),
ngc_api_key=_env("NGC_API_KEY", ""),
disabled_services=_env_set(src, "DISABLED_SERVICES"),
parakeet_port=_env_int(src, "PARAKEET_PORT", 8000),
kokoro_port=_env_int(src, "KOKORO_PORT", 8880),
embed_port=_env_int(src, "EMBED_PORT", 8088),
qdrant_port=_env_int(src, "QDRANT_PORT", 6333),
bind_port=_env_int(src, "BIND_PORT", 9999),
open_webui_url=_env(src, "OPEN_WEBUI_URL", ""),
ngc_api_key=_env(src, "NGC_API_KEY", ""),
# Coordination layer: fire a swap-lifecycle webhook to this URL so
# downstream consumers re-point their model config on a swap. Blank
# ⇒ disabled. The optional secret HMAC-signs the body (X-Spark-Signature).
swap_webhook_url=_env("SWAP_WEBHOOK_URL", ""),
swap_webhook_secret=_env("SWAP_WEBHOOK_SECRET", ""),
swap_webhook_url=_env(src, "SWAP_WEBHOOK_URL", ""),
swap_webhook_secret=_env(src, "SWAP_WEBHOOK_SECRET", ""),
)
def reload(self) -> None:
"""Recompute every field from the current env + settings overlay and
assign it onto this same instance, so all holders of the reference see
the change without an app restart. Called after the gear writes the
overlay (see server.post_settings)."""
fresh = Settings.from_env()
for f in fields(self):
setattr(self, f.name, getattr(fresh, f.name))
@property
def configured(self) -> bool:
return bool(self.spark1_host)
+8
@@ -239,6 +239,14 @@ class WebhookNotifier:
self.secret = secret or ""
self.timeout = timeout
def update(self, url: str, secret: str = "") -> None:
"""Re-point after a live settings change. The notifier holds snapshot
copies of these two fields (not the Settings object), so Settings.reload()
can't reach it — server.post_settings calls this explicitly so editing the
webhook URL/secret in the dashboard gear takes effect without a restart."""
self.url = (url or "").strip()
self.secret = secret or ""
@property
def enabled(self) -> bool:
return bool(self.url)
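
For a receiver of the swap webhook, a hedged sketch of verifying the X-Spark-Signature header. This diff does not show how the notifier computes the signature, so the HMAC-SHA256 hex digest over the raw request body used here is an assumption; sign and verify are hypothetical helper names. Check the notifier's signing code before relying on this.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    # ASSUMPTION: hex-encoded HMAC-SHA256 of the raw body. The digest and the
    # encoding are not confirmed by this commit.
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, header_value) -> bool:
    """Constant-time comparison of the received header against our own digest."""
    expected = sign(secret, body).encode("ascii")
    received = (header_value or "").encode("utf-8")
    return hmac.compare_digest(expected, received)


body = b'{"event": "swap_complete"}'
signature = sign("a random shared string", body)
print(verify("a random shared string", body, signature))
```

hmac.compare_digest is used instead of == so the comparison time does not leak how many leading characters matched.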
+89 -46
@@ -1,6 +1,7 @@
from __future__ import annotations
import asyncio
import json
import os
from pathlib import Path
from fastapi import FastAPI, HTTPException, Query, Request
@@ -9,6 +10,7 @@ from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, ValidationError
from typing import Literal
from . import app_settings
from .config import Settings
from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
from .coordination import LockHeld, ScheduleRegistry, SwapLockManager, WebhookNotifier, valid_schedule_id
@@ -37,6 +39,10 @@ from .validate import validate_launch
from .wol import send_local_broadcast, send_via_peer
# One-time migration: seed the in-app settings overlay from env (values set via
# the StartOS action on a pre-gear install) before building Settings, so nothing
# is lost on upgrade. No-op once the overlay exists. See app_settings.
app_settings.seed_from_env(os.environ)
settings = Settings.from_env()
catalog = load_catalog(settings.models_yaml)
# Coordination layer (GPU arbiter): swap-lifecycle webhook, the swap reservation
@@ -156,6 +162,35 @@ async def get_config() -> dict:
}
# ---- In-app settings ('gear') ----
# The optional cluster knobs (ports, container names, support-service hosts,
# integrations) live in an app-owned overlay on /data, edited here instead of in
# the StartOS action — which keeps to just the four required setup fields. See
# app_settings. Writes apply live: we rewrite the overlay then reload the shared
# Settings instance in place, so every router/manager holding the reference picks
# up the change with no container restart.
@app.get("/api/settings")
async def get_settings() -> dict:
return app_settings.public_view()
class SettingsUpdate(BaseModel):
values: dict[str, str]
@app.post("/api/settings")
async def post_settings(req: SettingsUpdate) -> dict:
try:
app_settings.apply(req.values)
except app_settings.SettingsError as e:
raise HTTPException(422, str(e))
settings.reload()
# WebhookNotifier snapshots url/secret (not the Settings object), so reload()
# can't reach it — re-point it explicitly so a webhook edit applies live too.
swap_webhook.update(settings.swap_webhook_url, settings.swap_webhook_secret)
return app_settings.public_view()
def _reload_catalog() -> None:
global catalog
catalog = load_catalog(settings.models_yaml)
@@ -947,6 +982,56 @@ async def post_swap(req: SwapRequest, request: Request) -> dict:
return {"job_id": job.id, "model_key": job.model_key, "state": job.state}
# ---- Swap reservation lock (the GPU arbiter) ----
# ROUTE ORDER IS LOAD-BEARING: these static `/api/swap/lock` routes MUST be
# registered before the parametric `/api/swap/{job_id}` below. FastAPI matches in
# registration order, so if `{job_id}` came first, GET /api/swap/lock would bind
# job_id="lock", look up a (non-existent) swap job, and 404 — which is exactly
# the bug this ordering fixes. Keep these above the {job_id} routes.
# CSRF: these are control-surface, not browser-exempt — an external scheduler is
# a non-browser client (no Origin header) so it passes the guard already, the
# same way it calls /api/swap; the dashboard is same-origin.
class LockAcquireRequest(BaseModel):
holder: str
ttl_seconds: int | None = None
note: str = ""
token: str | None = None # present only to extend an existing hold
@app.post("/api/swap/lock")
async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
"""Reserve the GPU swap path. Returns a secret token used to swap (header
X-Swap-Lock-Token) and to release. 409 if held by another holder."""
try:
lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
except ValueError as e:
raise HTTPException(422, str(e))
except LockHeld as e:
raise HTTPException(status_code=409, detail={
"error": "swap lock is held by another holder",
"lock": e.state,
})
return {**swap_lock.status(), "token": lock.token}
@app.get("/api/swap/lock")
async def get_swap_lock() -> dict:
"""Public, token-free view of the reservation: held? who? until when?"""
return swap_lock.status()
@app.delete("/api/swap/lock")
async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
"""Release the reservation. Needs the matching X-Swap-Lock-Token unless
?force=true (the human override from the dashboard)."""
token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
try:
released = swap_lock.release(token, force=force)
except PermissionError as e:
raise HTTPException(403, str(e))
return {"released": released, **swap_lock.status()}
@app.get("/api/swap/{job_id}")
async def get_swap(job_id: str) -> dict:
job = swap_manager.get(job_id)
@@ -992,52 +1077,10 @@ async def stream_swap(job_id: str):
return StreamingResponse(gen(), media_type="text/event-stream")
# ---- Coordination layer: swap lock + schedule registry ----
# Endpoints are control-surface, not browser-exempt: an external scheduler is a
# non-browser client (no Origin header) so it passes the CSRF guard already, the
# same way it calls /api/swap today; the dashboard is same-origin.
class LockAcquireRequest(BaseModel):
holder: str
ttl_seconds: int | None = None
note: str = ""
token: str | None = None # present only to extend an existing hold
@app.post("/api/swap/lock")
async def acquire_swap_lock(req: LockAcquireRequest) -> dict:
"""Reserve the GPU swap path. Returns a secret token used to swap (header
X-Swap-Lock-Token) and to release. 409 if held by another holder."""
try:
lock = swap_lock.acquire(req.holder, req.ttl_seconds, req.note, token=req.token)
except ValueError as e:
raise HTTPException(422, str(e))
except LockHeld as e:
raise HTTPException(status_code=409, detail={
"error": "swap lock is held by another holder",
"lock": e.state,
})
return {**swap_lock.status(), "token": lock.token}
@app.get("/api/swap/lock")
async def get_swap_lock() -> dict:
"""Public, token-free view of the reservation: held? who? until when?"""
return swap_lock.status()
@app.delete("/api/swap/lock")
async def release_swap_lock(request: Request, force: bool = Query(False)) -> dict:
"""Release the reservation. Needs the matching X-Swap-Lock-Token unless
?force=true (the human override from the dashboard)."""
token = request.headers.get("x-swap-lock-token") or request.query_params.get("token")
try:
released = swap_lock.release(token, force=force)
except PermissionError as e:
raise HTTPException(403, str(e))
return {"released": released, **swap_lock.status()}
# ---- Coordination layer: read-only schedule registry ----
# (The swap reservation lock lives above, next to the swap routes.) Same CSRF
# posture: control-surface, not browser-exempt — external schedulers send no
# Origin header so they pass the guard; the dashboard is same-origin.
class ScheduleRequest(BaseModel):
name: str
id: str | None = None
+96
@@ -2192,8 +2192,104 @@ function handleUpdateDone(d) {
setTimeout(pollUpdates, 2000);
}
// ===================== settings ('gear') =====================
// Renders the optional cluster knobs from /api/settings (server-driven field
// list, so adding a knob server-side needs no JS change) and POSTs edits back.
// The server reloads its config in place, so changes take effect immediately.
let settingsClearSentinel = '__clear__';
function renderSettingsForm(data) {
settingsClearSentinel = data.clear_sentinel || settingsClearSentinel;
const body = el('#settings-body');
body.innerHTML = (data.groups || []).map((g) => {
const rows = g.fields.map((f) => {
const help = f.help ? `<span class="muted small settings-help">${escapeHtml(f.help)}</span>` : '';
let input;
let clearToggle = '';
if (f.type === 'secret') {
const ph = f.set ? 'set — leave blank to keep' : (f.placeholder || '');
input = `<input type="password" autocomplete="off" data-key="${f.key}" data-secret="1" placeholder="${escapeHtml(ph)}">`;
// A stored secret is never echoed back, so blank means "keep". Offer an
// explicit way to remove it.
if (f.set) clearToggle = `<label class="settings-clear muted small"><input type="checkbox" data-clear-for="${f.key}"> clear stored value</label>`;
} else if (f.type === 'int') {
input = `<input type="number" min="1" max="65535" data-key="${escapeHtml(f.key)}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
} else {
input = `<input type="text" autocomplete="off" data-key="${escapeHtml(f.key)}" value="${escapeHtml(f.value || '')}" placeholder="${escapeHtml(f.placeholder || '')}">`;
}
return `<div class="settings-field"><label class="modal-row"><span>${escapeHtml(f.label)}</span>${input}</label>${clearToggle}${help}</div>`;
}).join('');
return `<fieldset class="modal-fieldset"><legend>${escapeHtml(g.name)}</legend>${rows}</fieldset>`;
}).join('');
}
async function openSettingsDialog() {
const dlg = el('#settings-dialog');
const err = el('#settings-error');
err.classList.add('hidden');
el('#settings-body').innerHTML = '<p class="muted small">Loading…</p>';
dlg.showModal();
try {
renderSettingsForm(await fetchJSON('/api/settings'));
} catch (e) {
el('#settings-body').innerHTML = '';
err.textContent = 'Could not load settings: ' + e.message;
err.classList.remove('hidden');
}
}
async function saveSettings(e) {
e.preventDefault();
const err = el('#settings-error');
err.classList.add('hidden');
const values = {};
$$('#settings-body [data-key]').forEach((inp) => {
const key = inp.dataset.key;
const v = inp.value.trim();
if (inp.dataset.secret) {
// "clear" checkbox wins; else a typed value sets it; else omit (keep the
// stored one — we can't see it to retype it).
const clear = el(`[data-clear-for="${key}"]`);
if (clear && clear.checked) values[key] = settingsClearSentinel;
else if (v) values[key] = v;
} else {
values[key] = v; // blank non-secret ⇒ server reverts it to the default
}
});
const btn = el('#settings-save');
btn.disabled = true;
try {
await fetchJSON('/api/settings', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({ values }),
});
el('#settings-dialog').close();
// Re-pull everything a knob can move: the Open WebUI link, health probes,
// service tiles, and the model menu (host/port changes alter all of them).
try {
state.config = await fetchJSON('/api/config');
const a = el('#open-webui-link');
if (state.config.open_webui_url) { a.href = state.config.open_webui_url; a.classList.remove('hidden'); }
else { a.classList.add('hidden'); }
} catch (e3) { console.warn('post-save /api/config refresh failed:', e3); }
pollStatus();
renderServices();
loadModels();
} catch (e2) {
err.textContent = 'Save failed: ' + e2.message.replace(/^\d+ [^:]*:\s*/, '');
err.classList.remove('hidden');
} finally {
btn.disabled = false;
}
}
async function init() {
setupCopyButtons();
el('#open-settings').addEventListener('click', openSettingsDialog);
el('#settings-cancel').addEventListener('click', () => el('#settings-dialog').close());
el('#settings-form').addEventListener('submit', saveSettings);
el('#open-download').addEventListener('click', openDownloadForm);
el('#dl-cancel').addEventListener('click', closeDownloadPanel);
el('#dl-start').addEventListener('click', startDownload);
+15 -1
@@ -17,14 +17,28 @@
<span class="muted">connecting…</span>
</div>
<a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
<button id="open-settings" class="topbar-btn" type="button" title="Settings" aria-label="Open cluster settings">⚙ Settings</button>
</header>
<main>
<section id="setup-banner" class="banner hidden">
<strong>Configuration needed.</strong>
<span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
<span>Run the <em>Configure Sparks</em> action in StartOS to set your two Spark IPs and SSH users. Everything else (ports, services, integrations) lives under <em>⚙ Settings</em> above.</span>
</section>
<dialog id="settings-dialog" class="modal">
<form method="dialog" class="modal-form" id="settings-form">
<h3>Settings</h3>
<p class="muted small">Optional cluster knobs — vLLM/service ports, container names, support-service hosts, and integrations. The two Spark IPs and SSH users are set once via the <em>Configure Sparks</em> action in StartOS; everything else is here. Changes apply immediately. Stored on this server and included in StartOS backups.</p>
<div id="settings-body" class="settings-body"><p class="muted small">Loading…</p></div>
<p id="settings-error" class="muted small dd-error hidden"></p>
<div class="modal-actions">
<button type="button" id="settings-cancel" class="btn">Cancel</button>
<button type="submit" id="settings-save" class="btn primary">Save</button>
</div>
</form>
</dialog>
<section id="hardware-panel" class="hardware-panel hidden">
<div class="section-header">
<h2 class="section-title">Spark hardware</h2>
+10
@@ -964,3 +964,13 @@ main {
.tab-content.active { display: block; }
/* (WhisperX install banner styles removed in v0.13.0:0 — see release notes) */
/* ===== Settings ('gear') dialog ===== */
.modal#settings-dialog { max-width: 560px; }
/* Cap the (tall) form so the Save/Cancel actions stay reachable; the grouped
fields scroll within. */
#settings-body { max-height: 60vh; overflow-y: auto; padding-right: 6px; display: flex; flex-direction: column; gap: 12px; }
.settings-field { display: flex; flex-direction: column; gap: 2px; }
.settings-help { display: block; line-height: 1.35; }
.settings-clear { display: inline-flex; align-items: center; gap: 6px; margin-top: 2px; cursor: pointer; }
.settings-clear input { width: auto; }