diff --git a/AGENTS.md b/AGENTS.md
index 033c106..175fc9d 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -54,13 +54,14 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 
 ## Current state
 
-- **Working (v0.19.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack healthy (11k+ requests/12h, all 200).
+- **Working (v0.20.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack healthy (11k+ requests/12h, all 200).
 - **Security hardening shipped (v0.19.0:0, 2026-06-12):** closed an SSH command-injection path (`shellsafe.py` validates + `shlex.quote`s every user value crossing into a Spark command), a Qdrant collection path-injection, and added a same-origin (CSRF) guard on control endpoints (proxy/data API exempt, consumers unaffected). Full evidence in `EVALUATION.md`; remaining non-blocking P2/P3 debt now lives in `ROADMAP.md`.
 - **Git history scrubbed (2026-06-12):** owner-specific IPs/hosts/user/key-name/personal-names purged from all commits/tags/messages via `git filter-repo`, force-pushed to `gitea` (every SHA changed); 0 hits across all refs. Pre-rewrite backup bundle: `../spark-control-prehistory-rewrite.bundle`. Owner declined SSH-key rotation (only the key *name* leaked, never the material) — don't re-flag.
-- **Only unverified bit of v0.19.0:0:** an on-box click-through of one control action (swap / service start/stop) to confirm the CSRF guard doesn't false-positive-block the dashboard behind the StartOS proxy. If a normal action ever returns "cross-origin request … blocked," the fix is loosening the `Host`/`Origin` check in `csrf_guard`.
+- **Shipped — Spark connectivity helpers (v0.20.0:0, built + installed 2026-06-15):** two read-mostly hardware-card additions. (a) **SSH-key copy:** small copy icon top-right of each reachable card → `POST /api/spark/{name}/ssh-key` (generate-if-missing + return the Spark's *outbound* pubkey; non-destructive; CSRF-guarded; no request input reaches the command so no shellsafe). UI pops `#sshkey-dialog` (key + paste-on-Mac one-liner) since plain-HTTP blocks `navigator.clipboard`. Opposite direction from the StartOS `showPublicKey` action (that grants the *dashboard* access to the Sparks). (b) **WireGuard status badge:** the `hardware.py` probe now also reports `wg_iface`/`wg_addr` via unprivileged `ip -o link show type wireguard` (no root/sudo, ends in a pipe to awk so it can't trip the probe's `set -e`); `renderHardware` shows a `VPN ` badge in the meta line when a tunnel is up. Reflects interface presence, not live peer reachability (true handshake age would need `sudo wg show`). Verified: clean `make x86` + `start-cli package install` exit 0, the real `ip ... type wireguard` output on spark2 matches the parser, and — **confirmed in-browser** — the SSH-key icon works. That also closes the long-open v0.19.0 question: the same-origin CSRF guard does NOT false-block control endpoints behind the StartOS proxy (the SSH-key POST goes through it). The `VPN 10.59.211.6` badge render is confirmed in-browser too — feature fully verified.
+- **spark2 joined the `starttunnel` WireGuard subnet (2026-06-15):** config installed at `/etc/wireguard/starttunnel.conf`, interface `starttunnel` up at `10.59.211.6/24`, `wg-quick@starttunnel` enabled (survives reboot). Split tunnel (`AllowedIPs = 10.59.211.0/24`) so the Spark keeps its LAN route — the dashboard's SSH is unaffected. Purpose: let a bot on spark2 reach the owner's Mac off-LAN. **Finding:** passwordless sudo is NOT configured on spark2 (`sudo wg show` → "a password is required") — the earlier assumption was wrong; harmless here since the badge is sudo-free, but note it before designing any dashboard feature that needs root on a Spark.
 - **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 1–4s unresponsiveness while the single GPU is continuously busy. Client-side remedy drafted (in-flight cap 2, hard ceiling 3 across audio endpoints, retry-with-backoff on timeout/503), with the owner to forward to that dev.
 - **Decided, not implemented:** no public interface / no API token auth — LAN + WireGuard/Tailscale split-tunnel only (the CSRF guard now covers the browser-driven vector). An empirical audio concurrency sweep is offered but needs the owner's OK in a quiet window.
 - **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
 - **Repo wart:** commit `8d839e3` (was `367d986` pre-rewrite) is labeled `v0.13.0:4` but contains everything through v0.18.0:0 — per-version commits for v0.14–v0.18 don't exist. Keep commit messages accurate.
 - **Hosting:** pushes to the owner's self-hosted Gitea — remote `gitea`, branch `master`, over SSH. Push after committing.
-- **Next:** (1) on-box CSRF click-through; (2) owner forwards the concurrency note to the Signal Engine dev; (3) concurrency sweep if the dev wants the measured knee; (4) parakeet-asr `--memory` cap via Reapply-patches; (5) start the `ROADMAP.md` tech-debt list (a pytest harness first).
+- **Next:** (1) owner forwards the concurrency note to the Signal Engine dev; (2) concurrency sweep if the dev wants the measured knee; (3) parakeet-asr `--memory` cap via Reapply-patches; (4) start the `ROADMAP.md` tech-debt list (a pytest harness first).
diff --git a/image/app/hardware.py b/image/app/hardware.py
index 5561c38..edff198 100644
--- a/image/app/hardware.py
+++ b/image/app/hardware.py
@@ -26,6 +26,9 @@ echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.dra
 echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
 DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
 echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
+WGIF=$(ip -o link show type wireguard 2>/dev/null | awk -F': ' 'NR==1 {print $2}')
+echo WG_IFACE=$WGIF
+echo WG_ADDR=$(ip -o -4 addr show "$WGIF" 2>/dev/null | awk 'NR==1 {print $4}')
 """.strip()
 
 
@@ -84,6 +87,11 @@ def _parse(out: str) -> dict:
     # MAC address on the default-route interface (for Wake-on-LAN)
     if info.get("mac"):
         parsed["mac"] = info["mac"].lower()
+    # WireGuard tunnel membership: name + address of the first wg interface, if
+    # any. Read-only and unprivileged (`ip` needs no root), so it never depends
+    # on sudo and never breaks the probe — absence just yields no badge.
+    parsed["wg_iface"] = info.get("wg_iface") or None
+    parsed["wg_addr"] = info.get("wg_addr") or None
     return parsed
 
 
diff --git a/image/app/server.py b/image/app/server.py
index c79a5e8..bea8125 100644
--- a/image/app/server.py
+++ b/image/app/server.py
@@ -401,6 +401,53 @@ async def wake_spark(name: str) -> dict:
     return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}
 
 
+@app.post("/api/spark/{name}/ssh-key")
+async def spark_ssh_key(name: str) -> dict:
+    """Ensure the named Spark has an ed25519 keypair and return its PUBLIC key.
+
+    This is the Spark's *outbound* identity — the key it uses to log in to other
+    machines (e.g. the operator's Mac). It is the opposite direction from, and
+    distinct from, the package's own key shown by the StartOS "Show Public Key"
+    action (which grants this dashboard SSH access to the Sparks).
+
+    Non-destructive: generates the key only if absent, never overwrites an
+    existing one (which may already be an identity the Spark uses elsewhere).
+    Public keys are not secret, so returning it is safe. No request-supplied
+    value reaches the command — `name` is constrained to a fixed set and
+    host/user come from operator config — so there is nothing to shell-quote.
+    """
+    if name not in ("spark1", "spark2"):
+        raise HTTPException(404, f"unknown spark: {name}")
+    host = settings.spark1_host if name == "spark1" else settings.spark2_host
+    user = settings.spark1_user if name == "spark1" else settings.spark2_user
+    if not host or not user:
+        raise HTTPException(400, f"{name} is not configured")
+    # Empty passphrase so the key is usable unattended; comment carries the
+    # remote hostname so it's identifiable in an authorized_keys file later.
+    cmd = (
+        "set -e; "
+        "mkdir -p ~/.ssh && chmod 700 ~/.ssh; "
+        "if [ ! -f ~/.ssh/id_ed25519 ]; then "
+        'ssh-keygen -t ed25519 -N "" -C "spark-control@$(hostname)" -f ~/.ssh/id_ed25519 >/dev/null 2>&1; '
+        "echo CREATED=1; else echo CREATED=0; fi; "
+        "[ -f ~/.ssh/id_ed25519.pub ] || ssh-keygen -y -f ~/.ssh/id_ed25519 > ~/.ssh/id_ed25519.pub; "
+        "echo PUBKEY=$(cat ~/.ssh/id_ed25519.pub)"
+    )
+    rc, out, err = await ssh_run(host, user, cmd, settings, timeout=15)
+    if rc != 0:
+        raise HTTPException(502, f"couldn't read/create the SSH key on {name}: {err.strip() or out.strip() or f'rc={rc}'}")
+    created = False
+    pubkey = ""
+    for line in out.splitlines():
+        if line.startswith("CREATED="):
+            created = line.strip() == "CREATED=1"
+        elif line.startswith("PUBKEY="):
+            pubkey = line[len("PUBKEY="):].strip()
+    if not pubkey:
+        raise HTTPException(502, f"no public key returned from {name}")
+    return {"ok": True, "spark": name, "host": host, "user": user, "pubkey": pubkey, "created": created}
+
+
 @app.get("/api/services")
 async def get_services() -> dict:
     """Lifecycle state of always-on support services (Parakeet, Kokoro, …).
diff --git a/image/app/static/app.js b/image/app/static/app.js
index 252fbb5..7ac4939 100644
--- a/image/app/static/app.js
+++ b/image/app/static/app.js
@@ -305,6 +305,32 @@ async function wakeSpark(name) {
   }
 }
 
+// Generate-if-missing + copy this Spark's OUTBOUND ssh public key (the key the
+// Spark uses to log in to other machines, e.g. the Mac). Distinct from the
+// package's own key in the StartOS "Show Public Key" action.
+async function copySparkSshKey(name, btn) {
+  if (btn) btn.disabled = true;
+  try {
+    const r = await fetchJSON(`/api/spark/${name}/ssh-key`, { method: 'POST' });
+    // Best-effort clipboard copy; on plain-HTTP this no-ops, but the dialog
+    // below always shows the key for manual selection.
+    await copyText(r.pubkey, btn);
+    const label = r.host ? `${name} (${r.host})` : name;
+    el('#sshkey-title').textContent = `${name} — SSH public key`;
+    el('#sshkey-intro').textContent = r.created
+      ? `Generated a new SSH key on ${label} and copied it to your clipboard. This is the key ${name} uses to log in to OTHER machines.`
+      : `${label} already had an SSH key; copied its public key to your clipboard. This is the key ${name} uses to log in to OTHER machines.`;
+    el('#sshkey-value').textContent = r.pubkey;
+    el('#sshkey-install').textContent =
+      `mkdir -p ~/.ssh && echo '${r.pubkey}' >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys`;
+    el('#sshkey-dialog').showModal();
+  } catch (e) {
+    alert(`Couldn't get the SSH key for ${name}: ${e.message}`);
+  } finally {
+    if (btn) btn.disabled = false;
+  }
+}
+
 function renderHardware() {
   const panel = el('#hardware-panel');
   const grid = el('#hardware-grid');
@@ -358,11 +384,21 @@ function renderHardware() {
     if (s.gpu_temp_c != null) gpuExtras.push(`${s.gpu_temp_c}°C`);
     if (s.gpu_power_w != null) gpuExtras.push(`${s.gpu_power_w.toFixed(0)}W`);
     const gpuExtrasStr = gpuExtras.length ? ` · ${gpuExtras.join(' · ')}` : '';
+    // Read-only WireGuard badge: shown only when the Spark has a wg interface up.
+    // "VPN " means it's a peer on that tunnel (reachable off-LAN when the
+    // tunnel is up); it reflects interface presence, not live peer reachability.
+    const wgIp = s.wg_addr ? String(s.wg_addr).split('/')[0] : '';
+    const wgBadge = s.wg_iface
+      ? ` · VPN${wgIp ? ' ' + escapeHtml(wgIp) : ''}`
+      : '';
     card.className = 'hw-card';
     card.innerHTML = `
 ${escapeHtml(s.hostname || key)}
-${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}
+${escapeHtml(key)} · ${escapeHtml(s.gpu_name || '')} · ${escapeHtml(s.uptime || '')}${wgBadge}
+
 CPU
@@ -1849,11 +1885,15 @@ async function init() {
   el('#nim-prog-close').addEventListener('click', () => el('#nim-progress-dialog').close());
   el('#open-connectivity').addEventListener('click', openConnectivityDialog);
   el('#connectivity-close').addEventListener('click', () => el('#connectivity-dialog').close());
-  // Wake-on-LAN buttons live on unreachable hardware cards; delegate.
+  // Hardware-card buttons (Wake-on-LAN on unreachable cards; SSH-key copy on
+  // reachable ones) are rendered dynamically, so delegate from the grid.
   el('#hardware-grid').addEventListener('click', (e) => {
-    const btn = e.target.closest('[data-wake]');
-    if (btn) wakeSpark(btn.dataset.wake);
+    const wbtn = e.target.closest('[data-wake]');
+    if (wbtn) { wakeSpark(wbtn.dataset.wake); return; }
+    const kbtn = e.target.closest('[data-ssh-key]');
+    if (kbtn) { copySparkSshKey(kbtn.dataset.sshKey, kbtn); return; }
   });
+  el('#sshkey-close').addEventListener('click', () => el('#sshkey-dialog').close());
   setupCatalogDialog();
   setupAdvancedDialog();
   // Open WebUI link from /api/config
diff --git a/image/app/static/index.html b/image/app/static/index.html
index 9df3dfc..2057858 100644
--- a/image/app/static/index.html
+++ b/image/app/static/index.html
@@ -244,6 +244,24 @@
+
+
+
+
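
Review note: the two awk extractions added to the `hardware.py` probe can be sanity-checked off-box with a small Python mirror. This is only a sketch and is not part of the change. The helper names `first_wg_iface` and `first_wg_addr` are hypothetical, and the sample lines are illustrative text in the usual `ip -o` one-line layout (the interface index and flags are assumed, not captured from spark2).

```python
from typing import Optional


def first_wg_iface(link_output: str) -> Optional[str]:
    """Mirror of awk -F': ' 'NR==1 {print $2}': field 2 of the first line."""
    lines = link_output.splitlines()
    if not lines:
        return None
    fields = lines[0].split(": ")
    return fields[1] if len(fields) > 1 and fields[1] else None


def first_wg_addr(addr_output: str) -> Optional[str]:
    """Mirror of awk 'NR==1 {print $4}': 4th whitespace field of the first line."""
    lines = addr_output.splitlines()
    if not lines:
        return None
    fields = lines[0].split()
    return fields[3] if len(fields) > 3 else None


# Illustrative samples (assumed layout of `ip -o link` / `ip -o -4 addr` output).
SAMPLE_LINK = (
    "5: starttunnel: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue "
    "state UNKNOWN mode DEFAULT group default qlen 1000\\    link/none "
)
SAMPLE_ADDR = (
    "5: starttunnel    inet 10.59.211.6/24 scope global starttunnel\\"
    "       valid_lft forever preferred_lft forever"
)

iface = first_wg_iface(SAMPLE_LINK)
addr = first_wg_addr(SAMPLE_ADDR)
# Same trimming the dashboard does before rendering the badge: drop the prefix length.
badge_ip = addr.split("/")[0] if addr else ""
print(iface, addr, badge_ip)
```

With no WireGuard interface the probe emits empty values, which the mirror maps to `None`, matching the `or None` normalisation in `_parse`.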