diff --git a/AGENTS.md b/AGENTS.md index e515a74..398819b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -63,4 +63,4 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers; matrix-bridge badge won't visibly flip on a fast `docker restart` (status re-checked only after the command returns). - **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag. - **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.) -- **Next:** (1) audio concurrency sweep — only if the Signal Engine dev wants the measured knee; needs owner OK in a quiet window. (2) Otherwise pull from `ROADMAP.md`: local-path/fine-tuned model support (new) or P2 tech-debt. (matrix-bridge Phase 3 shipped v0.21.0:1; only open item is the optional Docker `HEALTHCHECK` if the bot dev asks. Parakeet long-audio guard deferred — rationale in ROADMAP.) +- **Next — committed 2026-06-17: OpenClaw/Johnny-5 coexistence epic (full plan + design stance in `ROADMAP.md` → "Cluster coordination").** Stance: Spark Control = control plane / GPU arbiter, **not** a job runner; business cron jobs live in separate services that *call* its swap API (swaps are already API-driven via `POST /api/swap`). 
Sequence: (1) **configurable `VLLM_PORT`** — DONE in tree, staged as **v0.22.0:0** (Configure-Sparks field, blank ⇒ 8888; + `_env_int` hardening in `config.py` so a blank/bad port no longer crashes startup, killing a P3 tech-debt item). **Not yet built/installed/committed — awaiting go/no-go.** (2) local-path/fine-tuned models (in ROADMAP under Dashboard). (3) configurable topology (service→Spark→port map + container names). (4) coordination layer (swap lock + swap webhook + schedule visibility) — only when our own automation lands. Still-open older threads: audio concurrency sweep (only if the Signal Engine dev wants the knee; needs a quiet window); optional matrix-bridge Docker `HEALTHCHECK` if the bot dev asks; Parakeet long-audio guard deferred (rationale in ROADMAP). diff --git a/ROADMAP.md b/ROADMAP.md index d4aaa2f..74ca0bf 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,6 +2,21 @@ Longer-term backlog, roughly ordered. An item moves to "Current state" in CLAUDE.md when picked up. +## Cluster coordination — OpenClaw coexistence (committed 2026-06-17, from Johnny 5 report 2026-06-16) + +Driven by the one other Spark Control adopter (a colleague running OpenClaw + cron jobs against his own dual Sparks; report at the date above). His cluster is configured differently from ours (vLLM on **both** Sparks, port 8000, raw `docker run`, container `vllm-gemma4`) and an automated cron physically swaps models — so his notes are partly *portability gaps* (the package hard-codes our layout) and partly *coordination gaps* (his dashboard and his crons fight over the GPU). + +**Design stance (decided):** Spark Control is the **control plane / GPU arbiter, not a job runner.** Recurring business pipelines (his "Daily Vol" generator; our own future scheduled jobs) live in *separate* application services that *call* Spark Control's swap API. 
The dividing line is what a scheduled job *does*: control-plane actions (swap a model, warm it, restart a service, run a health sweep) are in scope for an in-package scheduler; business logic (scrape / summarize / build / deploy) stays in the app layer. Swaps are already API-driven (`POST /api/swap` → `GET /api/swap/{id}` / `…/stream`, `POST /api/swap/{key}/validate`) and non-browser clients pass the CSRF guard, so an external scheduler can drive swaps **today** — the items below add the *safety* layer, not the capability. + +Sequenced: +1. **Configurable `VLLM_PORT`** — DONE, v0.22.0:0. Field in Configure Sparks (blank ⇒ 8888); numeric-setting parsing hardened so a blank/bad value falls back instead of crashing startup. Was the immediate "vLLM unreachable" bug for an adopter on port 8000. +2. **Local-path / fine-tuned model support** — see the dedicated item under "## Dashboard" below. Independently wanted; his merged `ten31-v2` (a directory, not an HF repo) is the motivating case. +3. **Configurable topology** — make the service→Spark→port map and container names configurable so the package stops assuming our exact layout. Lets an adopter monitor vLLM on *both* Sparks, use a different container name, and stop the Parakeet probe from hitting a vLLM that shares its port — without forking. (Covers report P4 multi-Spark vLLM, P5 container name, and the Parakeet-port collision #6.) +4. **Coordination layer** — build when our own automation actually lands (zero value until something other than the dashboard swaps models): + - **Swap lock** with holder + TTL (`POST` / `GET` / `DELETE /api/swap/lock`). An external scheduler acquires it before swapping; the dashboard then refuses manual swaps and shows who holds the GPU and until when. Enforced by the swap path, not advisory. + - **Swap-event webhook** (`swap_complete` / `swap_failed`) to a configurable URL, so downstream consumers update their provider config when the running model changes. 
+ - **Schedule visibility** — read-only view the dashboard surfaces, *registered by* external schedulers (Spark Control does not own the schedule). + ## Near term - parakeet-asr long-audio memory guard — **deferred 2026-06-15, low priority.** A duration cap on `/v1/audio/diarize`: Sortformer runs the whole file in one pass (`diarizer.py:128-135`) over Spark 2's *shared* 128 GB unified memory (also feeding Kokoro/embeddings/Qdrant), so one giant single file can thrash into swap. **Precautionary — no observed incident**, and the production consumer (Recap Relay) already chunks via `/diarize-chunk` (~5-min, already bounded), so the only exposed path is a consumer POSTing one huge file to the full `/diarize`. When picked up: add a configurable `MAX_DIARIZE_SECONDS` guard in `diarizer.py` right after `duration` is computed (~line 130) → raise → HTTP 413 in `main.py` (mirrors the existing `MAX_UPLOAD_MB` 413); ship via the Reapply-patches action (restarts the live parakeet-asr container → needs go/no-go). Leave transcription out of v1 (upstream/un-patched file; parakeet-TDT handles long audio better). Revisit only if a consumer starts sending long single files. - Controlled concurrency sweep of the audio endpoints in a quiet window — replace the reasoned in-flight cap (2, ceiling 3) with the measured knee. diff --git a/image/app/config.py b/image/app/config.py index 5aa0830..e0d50aa 100644 --- a/image/app/config.py +++ b/image/app/config.py @@ -8,6 +8,16 @@ def _env(name: str, default: str = "") -> str: return os.environ.get(name, default) +def _env_int(name: str, default: int) -> int: + """Parse an int env var, falling back to `default` when unset, blank, or + malformed. 
The StartOS Configure panel passes optional numeric fields as an + empty string when left blank, so a bare int("") would crash daemon startup.""" + try: + return int(os.environ.get(name, "") or default) + except (TypeError, ValueError): + return default + + def _resolve_models_yaml() -> str: if env := os.environ.get("MODELS_YAML"): return env @@ -101,16 +111,16 @@ class Settings: matrix_bridge_branch=_env("MATRIX_BRIDGE_BRANCH") or "master", # Redaction gateway pseudonym-map store (server-held de-anon key). redaction_map_db=_env("REDACTION_MAP_DB", "/data/redaction_maps.db"), - redaction_map_ttl=int(_env("REDACTION_MAP_TTL", "7200")), + redaction_map_ttl=_env_int("REDACTION_MAP_TTL", 7200), ssh_key_path=_env("SSH_KEY_PATH"), ssh_known_hosts=_env("SSH_KNOWN_HOSTS"), models_yaml=_resolve_models_yaml(), - vllm_port=int(_env("VLLM_PORT", "8888")), - parakeet_port=int(_env("PARAKEET_PORT", "8000")), - kokoro_port=int(_env("KOKORO_PORT", "8880")), - embed_port=int(_env("EMBED_PORT", "8088")), - qdrant_port=int(_env("QDRANT_PORT", "6333")), - bind_port=int(_env("BIND_PORT", "9999")), + vllm_port=_env_int("VLLM_PORT", 8888), + parakeet_port=_env_int("PARAKEET_PORT", 8000), + kokoro_port=_env_int("KOKORO_PORT", 8880), + embed_port=_env_int("EMBED_PORT", 8088), + qdrant_port=_env_int("QDRANT_PORT", 6333), + bind_port=_env_int("BIND_PORT", 9999), open_webui_url=_env("OPEN_WEBUI_URL", ""), ngc_api_key=_env("NGC_API_KEY", ""), ) diff --git a/package/Makefile b/package/Makefile index 927e1fc..b6e66c0 100644 --- a/package/Makefile +++ b/package/Makefile @@ -1,3 +1,14 @@ ARCHES := x86 # overrides to s9pk.mk must precede the include statement include s9pk.mk + +# Publish the built s9pk to Gitea Releases (adopters pull it with a read-only +# token instead of being hand-sent the package). Needs GITEA_URL + GITEA_TOKEN; +# the vX.Y.Z git tag must already be pushed. See ../scripts/gitea-release.sh. 
+RELEASE_VERSION := $(shell sed -n "s/.*version: '\([^']*\)'.*/\1/p" startos/versions/v0_1_0.ts) + +.PHONY: release +release: + @test -f "$(PACKAGE_ID)_x86_64.s9pk" || { echo "Build first: make x86"; exit 1; } + GITEA_URL="$(GITEA_URL)" GITEA_TOKEN="$(GITEA_TOKEN)" \ + ../scripts/gitea-release.sh "$(RELEASE_VERSION)" "$(PACKAGE_ID)_x86_64.s9pk" diff --git a/package/startos/actions/configureSparks.ts b/package/startos/actions/configureSparks.ts index 6b81be6..abd8168 100644 --- a/package/startos/actions/configureSparks.ts +++ b/package/startos/actions/configureSparks.ts @@ -40,6 +40,15 @@ const inputSpec = InputSpec.of({ placeholder: 'your SSH username', masked: false, }), + vllm_port: Value.text({ + name: 'vLLM port (optional)', + description: + "The port your vLLM server listens on, on Spark 1 — used by the health check and the chat proxy. Leave blank to use 8888, which is what the bundled launch-cluster.sh wrapper uses. Set this to 8000 (vLLM's own default) or another port if your vLLM listens elsewhere.", + required: false, + default: null, + placeholder: 'leave blank for 8888', + masked: false, + }), parakeet_host: Value.text({ name: 'Parakeet host (optional)', description: diff --git a/package/startos/fileModels/sparkConfig.yaml.ts b/package/startos/fileModels/sparkConfig.yaml.ts index dccec0c..85a63b6 100644 --- a/package/startos/fileModels/sparkConfig.yaml.ts +++ b/package/startos/fileModels/sparkConfig.yaml.ts @@ -7,6 +7,8 @@ export const sparkConfigSchema = z.object({ spark1_user: z.string().catch(''), spark2_host: z.string().catch(''), spark2_user: z.string().catch(''), + // Optional vLLM port override (Spark 1). Blank => 8888 (launch-cluster.sh default). + vllm_port: z.string().catch(''), // Optional per-service overrides. Blank => use spark2_host / spark2_user. 
parakeet_host: z.string().catch(''), parakeet_user: z.string().catch(''), diff --git a/package/startos/main.ts b/package/startos/main.ts index 03336cc..9595fa6 100644 --- a/package/startos/main.ts +++ b/package/startos/main.ts @@ -13,6 +13,7 @@ export const main = sdk.setupMain(async ({ effects }) => { spark1_user: '', spark2_host: '', spark2_user: '', + vllm_port: '', parakeet_host: '', parakeet_user: '', parakeet_container: '', @@ -50,6 +51,7 @@ export const main = sdk.setupMain(async ({ effects }) => { SPARK1_USER: cfg.spark1_user, SPARK2_HOST: cfg.spark2_host, SPARK2_USER: cfg.spark2_user, + VLLM_PORT: cfg.vllm_port, PARAKEET_HOST: cfg.parakeet_host, PARAKEET_USER: cfg.parakeet_user, PARAKEET_CONTAINER: cfg.parakeet_container, diff --git a/package/startos/versions/v0_1_0.ts b/package/startos/versions/v0_1_0.ts index 8da74f4..61bee24 100644 --- a/package/startos/versions/v0_1_0.ts +++ b/package/startos/versions/v0_1_0.ts @@ -1,10 +1,10 @@ import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk' export const v0_1_0 = VersionInfo.of({ - version: '0.21.0:1', + version: '0.22.0:0', releaseNotes: { en_US: - "v0.21.0:1 — matrix-bridge bot tile. The Matrix bot container on Spark 2 now appears as a tile under \"Always-on services\" with a live status badge (judged by the container itself, since the bot has no health port). Buttons: Update (pulls the latest code, rebuilds the image, and recreates the container — long-running, with a streamed log and a generous timeout), Restart, Stop/Start, and View logs (last 100 lines). Everything fails loud: a non-zero exit or stderr shows in the panel rather than a silent stall. To enable it, set the bot's SSH user (the owner of ~/matrix-bridge, e.g. 'modelo') in the Configure Sparks action — leave it blank and no tile appears, so this stays out of the way on systems that don't run the bot. 
New endpoints (LAN-only, browser-driven): POST /api/matrix-bridge/update (+ /{id} and /{id}/stream for progress), GET /api/matrix-bridge/logs. One-time setup on the Spark (owner): make ~/matrix-bridge a git clone of your Gitea repo, and — unless that SSH user is the same as your Spark 2 user — authorize this package's SSH public key for it (Show Public Key, then add it to that user's authorized_keys). There is no passwordless sudo on the Spark, so commands run directly as that user rather than via sudo.", + "v0.22.0:0 — configurable vLLM port. The port Spark Control uses to reach vLLM on Spark 1 (the health check and the chat proxy) is now a field in the Configure Sparks action, so you can point it at a vLLM that listens on a non-default port without rebuilding the package. Leave it blank to keep the previous default of 8888 — what the bundled launch-cluster.sh wrapper uses; set it to 8000 (vLLM's own default) or any other port if your vLLM listens elsewhere. Also hardened numeric-setting parsing so a blank or malformed port value falls back to its default instead of crashing daemon startup.", }, migrations: { up: async ({ effects }) => {}, diff --git a/scripts/gitea-release.sh b/scripts/gitea-release.sh new file mode 100755 index 0000000..95939ea --- /dev/null +++ b/scripts/gitea-release.sh @@ -0,0 +1,45 @@ +#!/usr/bin/env bash +# Publish a built Spark Control s9pk to Gitea Releases, so adopters can pull the +# latest package with a read-only token instead of being hand-sent the file. +# +# GITEA_URL=https://gitea.example:3000 GITEA_TOKEN= \ +# scripts/gitea-release.sh 0.22.0:0 package/spark-control_x86_64.s9pk +# +# The git tag (vX.Y.Z, derived from the version) must already exist and be pushed +# (`git tag v0.22.0 && git push gitea v0.22.0`). Re-running is idempotent: it +# reuses an existing release for the tag and replaces a same-named asset. 
+set -euo pipefail + +VERSION="${1:-}"; S9PK="${2:-}" +[ -n "$VERSION" ] && [ -n "$S9PK" ] || { + echo "usage: GITEA_URL=.. GITEA_TOKEN=.. $0 <version> <s9pk>" >&2; exit 2; } +: "${GITEA_URL:?set GITEA_URL to your Gitea base URL, e.g. https://gitea.lan:3000}" +: "${GITEA_TOKEN:?set GITEA_TOKEN to a token with repository write access}" +[ -f "$S9PK" ] || { echo "s9pk not found: $S9PK" >&2; exit 1; } + +TAG="v${VERSION%%:*}" # 0.22.0:0 -> v0.22.0 +ASSET="$(basename "$S9PK")" +SLUG="$(git remote get-url gitea | sed -E 's#.*[:/]([^/:]+/[^/]+)\.git$#\1#')" # grant/spark-control +API="${GITEA_URL%/}/api/v1/repos/${SLUG}" +AUTH=(-H "Authorization: token ${GITEA_TOKEN}") + +echo "repo ${SLUG} | tag ${TAG} | asset ${ASSET} | ${GITEA_URL}" + +# Reuse an existing release for this tag, otherwise create one (|| true: a 404 on the lookup must not abort the script under set -e / pipefail). +id="$(curl -fsS "${AUTH[@]}" "$API/releases/tags/$TAG" 2>/dev/null | jq -r '.id // empty' || true)" +if [ -z "$id" ]; then + id="$(curl -fsS -X POST "${AUTH[@]}" -H 'Content-Type: application/json' \ + --data "$(jq -n --arg t "$TAG" --arg n "$VERSION" \ + '{tag_name:$t, name:$n, body:("Spark Control "+$n+". See AGENTS.md / release notes.")}')" \ + "$API/releases" | jq -r '.id')" +fi +[ -n "$id" ] && [ "$id" != null ] || { echo "could not obtain release id (check URL/token/tag)" >&2; exit 1; } + +# Replace a same-named asset so re-runs don't 409. +old="$(curl -fsS "${AUTH[@]}" "$API/releases/$id/assets" | jq -r --arg n "$ASSET" '.[] | select(.name==$n) | .id')" +[ -n "$old" ] && curl -fsS -X DELETE "${AUTH[@]}" "$API/releases/$id/assets/$old" >/dev/null || true + +curl -fsS -X POST "${AUTH[@]}" -F "attachment=@${S9PK};type=application/octet-stream" \ + "$API/releases/$id/assets?name=$ASSET" >/dev/null + +echo "published: ${GITEA_URL%/}/${SLUG}/releases/tag/${TAG}"
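
Review note on the `_env_int` hardening in `image/app/config.py`: the sketch below is one way to check the fallback behaviour the release notes describe (a blank or malformed `VLLM_PORT` falls back to 8888 instead of crashing startup). `_env_int` is copied from the diff. `check_port` is a hypothetical helper that exists only for this illustration and is not part of the package.

```python
import os


def _env_int(name: str, default: int) -> int:
    """Copy of the helper added to image/app/config.py in this diff."""
    try:
        return int(os.environ.get(name, "") or default)
    except (TypeError, ValueError):
        return default


def check_port(raw, default=8888):
    """Hypothetical helper: set VLLM_PORT to `raw` (None means unset) and parse it."""
    if raw is None:
        os.environ.pop("VLLM_PORT", None)
    else:
        os.environ["VLLM_PORT"] = raw
    return _env_int("VLLM_PORT", default)


if __name__ == "__main__":
    # Unset, blank, valid, padded, malformed, and out-of-range inputs.
    for raw in (None, "", "8000", " 8000 ", "abc", "80.5", "70000"):
        print(repr(raw), "->", check_port(raw))
```

Unset, blank, and malformed values (`"abc"`, `"80.5"`) all resolve to the default, and a padded value such as `" 8000 "` still parses because `int()` strips surrounding whitespace. The helper does not range-check, so a value like `70000` is passed through unchanged. If that matters, a bounds check (1 to 65535) could be added in a follow-up; this is a suggestion, not something this diff does.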