v0.22.0:0 - configurable vllm port; gitea-release tooling; coexistence roadmap

- Configure Sparks gains a vLLM port field (blank => 8888, our launch-cluster.sh
  default); VLLM_PORT plumbed configureSparks -> sparkConfig.yaml -> main.ts env
  -> config.py. So an adopter whose vLLM listens elsewhere (e.g. 8000) can fix
  the "vLLM unreachable" health check without rebuilding the package.
- Harden numeric env parsing (config._env_int): a blank or malformed port now
  falls back to its default instead of crashing daemon startup (closes a P3
  tech-debt item; the Configure panel passes unset optional fields as "").
- Add scripts/gitea-release.sh + `make release` to publish the built s9pk to
  Gitea Releases, so the OpenClaw adopter pulls updates with a read-only token
  instead of being hand-sent the package.
- Capture the OpenClaw/Johnny-5 coexistence epic and the "control plane, not a
  job runner" stance in ROADMAP.md and Current state.
Keysat
2026-06-17 19:45:09 -05:00
parent c179389731
commit 136a4713a1
9 changed files with 104 additions and 10 deletions
+1 -1
@@ -63,4 +63,4 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns).
- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
-- **Next:** (1) audio concurrency sweep — only if the Signal Engine dev wants the measured knee; needs owner OK in a quiet window. (2) Otherwise pull from `ROADMAP.md`: local-path/fine-tuned model support (new) or P2 tech-debt. (matrix-bridge Phase 3 shipped v0.21.0:1; only open item is the optional Docker `HEALTHCHECK` if the bot dev asks. Parakeet long-audio guard deferred rationale in ROADMAP.)
+- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). Sequence: (1) **configurable `VLLM_PORT`** — DONE in tree, staged as **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). **Not yet built/installed/committed — awaiting go/no-go.** (2) local-path/fine-tuned models (in ROADMAP under Dashboard). (3) configurable topology (service→Spark→port map + container names). (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP).
+15
@@ -2,6 +2,21 @@
Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up.
## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16)
Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU).
**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap`, `GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability.
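As a rough illustration of "an external scheduler can drive swaps today", the sketch below builds (but does not send) a `POST /api/swap` request with the standard library. Only the endpoint paths come from the stance above; the host name, the `model_key` body field, and the helper names are assumptions made for the example, since the real request schema is not shown here.

```python
import json
import urllib.request


def build_swap_request(base_url: str, model_key: str) -> urllib.request.Request:
    """Build a POST /api/swap request.

    The JSON body shape ({"model_key": ...}) is a placeholder, not the
    documented schema.
    """
    body = json.dumps({"model_key": model_key}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/swap",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def swap_status_url(base_url: str, swap_id: str) -> str:
    """URL an external scheduler would poll: GET /api/swap/{id}."""
    return f"{base_url.rstrip('/')}/api/swap/{swap_id}"


# Hypothetical host; 9999 is the BIND_PORT default in config.py.
req = build_swap_request("http://spark-control.lan:9999/", "example-model")
print(req.get_method(), req.full_url)
```

A cron-driven service would pass the request to `urllib.request.urlopen` and then poll the status URL; that part is left out so the sketch stays runnable without a live Spark Control instance.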
Sequenced:
1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000.
2. **Local-path / fine-tuned model support** — see the dedicated item under "## Dashboard" below. Independently wanted; his merged `ten31-v2` (a directory, not an HF repo) is the motivating case.
3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.)
4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models):
- **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory.
- **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes.
- **Schedule visibility** — read-only view the dashboard surfaces, *registered by* external schedulers (Spark Control does not own the schedule).
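The swap lock in item 4 is not built yet, so the following is only a minimal in-memory sketch of "holder + TTL, enforced by the swap path". The class and method names are hypothetical, and the choice to let the current holder renew its own lock is an assumption, not a decided behaviour.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LockState:
    holder: str        # who holds the GPU, e.g. a scheduler name
    expires_at: float  # clock value at which the lock lapses


class SwapLock:
    """Hypothetical holder + TTL lock consulted by the swap path."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state: Optional[LockState] = None

    def current(self) -> Optional[LockState]:
        """Return the live lock, dropping it first if its TTL has lapsed."""
        if self._state is not None and self._clock() >= self._state.expires_at:
            self._state = None
        return self._state

    def acquire(self, holder: str, ttl_seconds: float) -> bool:
        """Take the lock, or renew it if `holder` already owns it (assumed)."""
        state = self.current()
        if state is not None and state.holder != holder:
            return False
        self._state = LockState(holder, self._clock() + ttl_seconds)
        return True

    def release(self, holder: str) -> bool:
        """Release the lock; only the current holder may do so."""
        state = self.current()
        if state is None or state.holder != holder:
            return False
        self._state = None
        return True

    def may_swap(self, requester: str) -> bool:
        """Enforcement check: swaps pass only when unlocked or held by requester."""
        state = self.current()
        return state is None or state.holder == requester
```

In this model the dashboard's manual swap would call `may_swap("dashboard")` and refuse when it returns `False`, using `current()` to show who holds the GPU and until when.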
## Near term
- parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files.
- Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee.
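The deferred long-audio guard could look roughly like the sketch below: compare the computed duration against `MAX_DIARIZE_SECONDS`, raise, and map the error to HTTP 413. The exception name, the helper names, the 1800-second default, and the "0 disables the cap" convention are all assumptions for illustration; none of them are in `diarizer.py` today.

```python
import os


class AudioTooLongError(Exception):
    """Hypothetical error raised when a file exceeds the duration cap."""

    def __init__(self, duration: float, limit: float) -> None:
        super().__init__(f"audio is {duration:.0f}s, limit is {limit:.0f}s")
        self.duration = duration
        self.limit = limit


def max_diarize_seconds(default: float = 1800.0) -> float:
    """Read MAX_DIARIZE_SECONDS, falling back when unset, blank, or malformed."""
    raw = os.environ.get("MAX_DIARIZE_SECONDS", "")
    try:
        return float(raw) if raw.strip() else default
    except ValueError:
        return default


def check_duration(duration: float, limit: float) -> None:
    """Raise when the file is longer than the cap; a limit of 0 disables it."""
    if limit > 0 and duration > limit:
        raise AudioTooLongError(duration, limit)


def status_for(duration: float, limit: float) -> int:
    """Stand-in for the HTTP layer: 413 when the guard trips, else 200."""
    try:
        check_duration(duration, limit)
    except AudioTooLongError:
        return 413
    return 200
```

The chunked `/diarize-chunk` path would never trip this check, since its inputs are already bounded at roughly five minutes.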
+17 -7
@@ -8,6 +8,16 @@ def _env(name: str, default: str = "") -> str:
    return os.environ.get(name, default)


def _env_int(name: str, default: int) -> int:
    """Parse an int env var, falling back to `default` when unset, blank, or
    malformed. The StartOS Configure panel passes optional numeric fields as an
    empty string when left blank, so a bare int("") would crash daemon startup."""
    try:
        return int(os.environ.get(name, "") or default)
    except (TypeError, ValueError):
        return default


def _resolve_models_yaml() -> str:
    if env := os.environ.get("MODELS_YAML"):
        return env
@@ -101,16 +111,16 @@ class Settings:
matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master",
# Redaction gateway pseudonym-map store (server-held de-anon key).
redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"),
-redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")),
+redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200),
ssh_key_path=_env("SSH_KEY_PATH"),
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
models_yaml=_resolve_models_yaml(),
-vllm_port=int(_env("VLLM_PORT", "8888")),
-parakeet_port=int(_env("PARAKEET_PORT", "8000")),
-kokoro_port=int(_env("KOKORO_PORT", "8880")),
-embed_port=int(_env("EMBED_PORT", "8088")),
-qdrant_port=int(_env("QDRANT_PORT", "6333")),
-bind_port=int(_env("BIND_PORT", "9999")),
+vllm_port=_env_int("VLLM_PORT", 8888),
+parakeet_port=_env_int("PARAKEET_PORT", 8000),
+kokoro_port=_env_int("KOKORO_PORT", 8880),
+embed_port=_env_int("EMBED_PORT", 8088),
+qdrant_port=_env_int("QDRANT_PORT", 6333),
+bind_port=_env_int("BIND_PORT", 9999),
open_webui_url=_env("OPEN_WEBUI_URL", ""),
ngc_api_key=_env("NGC_API_KEY", ""),
)
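The fallback behaviour of `_env_int` can be checked in isolation. The sketch below copies the helper from the `config.py` change and runs it against an unset, a blank, a malformed, and a valid value; the `DEMO_PORT` variable and the `port_for` wrapper are made up for the demonstration.

```python
import os


def _env_int(name: str, default: int) -> int:
    """Same helper as in config.py: fall back to `default` when the variable
    is unset, blank, or not a valid integer."""
    try:
        return int(os.environ.get(name, "") or default)
    except (TypeError, ValueError):
        return default


def port_for(raw):
    """Resolve the hypothetical DEMO_PORT setting from a raw env value.

    `None` means "leave the variable unset".
    """
    if raw is None:
        os.environ.pop("DEMO_PORT", None)
    else:
        os.environ["DEMO_PORT"] = raw
    return _env_int("DEMO_PORT", 8888)


for raw in (None, "", "abc", "8000"):
    print(repr(raw), "->", port_for(raw))
```

The blank case is the one the Configure panel produces: `"" or default` evaluates to the integer default before `int()` ever sees the empty string.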
+11
@@ -1,3 +1,14 @@
ARCHES := x86
# overrides to s9pk.mk must precede the include statement
include s9pk.mk
# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only
# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN;
# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh.
RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts)
.PHONY: release
release:
	@test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; }
	GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \
		../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk"
@@ -40,6 +40,15 @@ const inputSpec = InputSpec.of({
placeholder: 'your SSH username',
masked: false,
}),
vllm_port: Value.text({
name: 'vLLM port (optional)',
description:
"The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.",
required: false,
default: null,
placeholder: 'leave blank for 8888',
masked: false,
}),
parakeet_host: Value.text({
name: 'Parakeet host (optional)',
description:
@@ -7,6 +7,8 @@ export const sparkConfigSchema = z.object({
spark1_user: z.string().catch(''),
spark2_host: z.string().catch(''),
spark2_user: z.string().catch(''),
// Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default).
vllm_port: z.string().catch(''),
// Optional per-service overrides. Blank => use spark2_host / spark2_user.
parakeet_host: z.string().catch(''),
parakeet_user: z.string().catch(''),
+2
@@ -13,6 +13,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
spark1_user: '',
spark2_host: '',
spark2_user: '',
vllm_port: '',
parakeet_host: '',
parakeet_user: '',
parakeet_container: '',
@@ -50,6 +51,7 @@ export const main = sdk.setupMain(async ({ effects }) => {
SPARK1_USER: cfg.spark1_user,
SPARK2_HOST: cfg.spark2_host,
SPARK2_USER: cfg.spark2_user,
VLLM_PORT: cfg.vllm_port,
PARAKEET_HOST: cfg.parakeet_host,
PARAKEET_USER: cfg.parakeet_user,
PARAKEET_CONTAINER: cfg.parakeet_container,
+2 -2
@@ -1,10 +1,10 @@
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
export const v0_1_0 = VersionInfo.of({
-version: '0.21.0:1',
+version: '0.22.0:0',
releaseNotes: {
en_US:
-"v0.21.0:1 - matrix-bridge bot tile. The Matrix bot container on Spark 2 now appears as a tile under \"Always-on services\" with a live status badge (judged by the container itself, since the bot has no health port). Buttons: Update (pulls the latest code, rebuilds the image, and recreates the container — long-running, with a streamed log and a generous timeout), Restart, Stop/Start, and View logs (last 100 lines). Everything fails loud: a non-zero exit or stderr shows in the panel rather than a silent stall. To enable it, set the bot's SSH user (the owner of ~/matrix-bridge, e.g. 'modelo') in the Configure Sparks action — leave it blank and no tile appears, so this stays out of the way on systems that don't run the bot. New endpoints (LAN-only, browser-driven): POST /api/matrix-bridge/update (+ /{id} and /{id}/stream for progress), GET /api/matrix-bridge/logs. One-time setup on the Spark (owner): make ~/matrix-bridge a git clone of your Gitea repo, and — unless that SSH user is the same as your Spark 2 user — authorize this package's SSH public key for it (Show Public Key, then add it to that user's authorized_keys). There is no passwordless sudo on the Spark, so commands run directly as that user rather than via sudo.",
+"v0.22.0:0 - configurable vLLM port. The port Spark Control uses to reach vLLM on Spark 1 (the health check and the chat proxy) is now a field in the Configure Sparks action, so you can point it at a vLLM that listens on a non-default port without rebuilding the package. Leave it blank to keep the previous default of 8888 — what the bundled launch-cluster.sh wrapper uses; set it to 8000 (vLLM's own default) or any other port if your vLLM listens elsewhere. Also hardened numeric-setting parsing so a blank or malformed port value falls back to its default instead of crashing daemon startup.",
},
migrations: {
up: async ({ effects }) => {},
+45
@@ -0,0 +1,45 @@
#!/usr/bin/env bash
# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the
# latest package with a read-only token instead of being hand-sent the file.
#
# GITEA_URL=https://gitea.example:3000 GITEA_TOKEN=<write-token> \
# scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk
#
# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed
# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it
# reuses an existing release for the tag and replaces a same-named asset.
set -euo pipefail
VERSION="${1:-}"; S9PK="${2:-}"
[ -n "$VERSION" ] && [ -n "$S9PK" ] || {
  echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version e.g. 0.22.0:0> <s9pk path>" >&2; exit 2; }
: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}"
: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository write access}"
[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; }
TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0
ASSET="$(basename "$S9PK")"
SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control
API="${GITEA_URL%/}/api/v1/repos/${SLUG}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}")
echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}"
# Reuse an existing release for this tag, otherwise create one. The `|| true`
# matters: with `set -e` and pipefail, curl's failure on a 404 (no release yet)
# would otherwise abort the script before the create branch runs.
id="$(curl -fsS "${AUTH[@]}" "$API/releases/tags/$TAG" 2>/dev/null | jq -r '.id // empty' || true)"
if [ -z "$id" ]; then
  id="$(curl -fsS -X POST "${AUTH[@]}" -H 'Content-Type: application/json' \
    --data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \
      '{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')" \
    "$API/releases" | jq -r '.id')"
fi
[ -n "$id" ] && [ "$id" != null ] || { echo "could not obtain release id (check URL/token/tag)" >&2; exit 1; }
# Replace a same-named asset so re-runs don't 409.
old="$(curl -fsS "${AUTH[@]}" "$API/releases/$id/assets" | jq -r --arg n "$ASSET" '.[] | select(.name==$n) | .id')"
[ -n "$old" ] && curl -fsS -X DELETE "${AUTH[@]}" "$API/releases/$id/assets/$old" >/dev/null || true
curl -fsS -X POST "${AUTH[@]}" -F "attachment=@${S9PK};type=application/octet-stream" \
"$API/releases/$id/assets?name=$ASSET" >/dev/null
echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
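The two string transformations the script relies on (version to tag, remote URL to repo slug) are easy to get subtly wrong, so here is a small Python rendering of the same logic for checking inputs by hand. The function names and the example host names are assumptions; the regular expression mirrors the script's `sed` expression, including its behaviour of leaving the input unchanged when the URL does not end in `.git`.

```python
import re


def release_tag(version: str) -> str:
    """Mirror TAG="v${VERSION%%:*}": drop everything from the first colon."""
    return "v" + version.split(":", 1)[0]


def repo_slug(remote_url: str) -> str:
    """Mirror the sed expression: keep the trailing owner/repo, strip .git.

    Like sed, this returns the input unchanged when the pattern does not match.
    """
    return re.sub(r".*[:/]([^/:]+/[^/]+)\.git$", r"\1", remote_url)


print(release_tag("0.22.0:0"))
print(repo_slug("git@gitea.lan:grant/spark-control.git"))
```

A remote configured without the `.git` suffix passes through untouched, which would produce a malformed API URL in the script; checking the slug before publishing is a cheap safeguard.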