docs: refresh Current state for handoff — harness shipped, parakeet deferred, finished narrative pruned

Keysat
2026-06-15 18:32:57 -05:00
parent 89338c97f5
commit e307a08f05
+8 -10
@@ -55,13 +55,11 @@ Subsystem guidance lives in `docs/guides/` and loads when matching files are tou
 ## Current state
-- **Working (v0.20.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel mode. Spark 2 audio stack healthy (11k+ requests/12h, all 200).
-- **Security hardening shipped (v0.19.0:0, 2026-06-12):** closed an SSH command-injection path (`shellsafe.py` validates + `shlex.quote`s every user value crossing into a Spark command), a Qdrant collection path-injection, and added a same-origin (CSRF) guard on control endpoints (proxy/data API exempt, consumers unaffected). Full evidence in `EVALUATION.md`; remaining non-blocking P2/P3 debt now lives in `ROADMAP.md`.
-- **Git history scrubbed (2026-06-12):** owner-specific IPs/hosts/user/key-name/personal-names purged from all commits/tags/messages via `git filter-repo`, force-pushed to `gitea` (every SHA changed); 0 hits across all refs. Pre-rewrite backup bundle: `../spark-control-prehistory-rewrite.bundle`. Owner declined SSH-key rotation (only the key *name* leaked, never the material) — don't re-flag.- **Shipped — Spark connectivity helpers (v0.20.0:0, built + installed 2026-06-15):** two read-mostly hardware-card additions. (a) **SSH-key copy:** small copy icon top-right of each reachable card → `POST /api/spark/{name}/ssh-key` (generate-if-missing + return the Spark's *outbound* pubkey; non-destructive; CSRF-guarded; no request input reaches the command so no shellsafe). UI pops `#sshkey-dialog` (key + paste-on-Mac one-liner) since plain-HTTP blocks `navigator.clipboard`. Opposite direction from the StartOS `showPublicKey` action (that grants the *dashboard* access to the Sparks). (b) **WireGuard status badge:** the `hardware.py` probe now also reports `wg_iface`/`wg_addr` via unprivileged `ip -o link show type wireguard` (no root/sudo, ends in a pipe to awk so it can't trip the probe's `set -e`); `renderHardware` shows a `VPN <ip>` badge in the meta line when a tunnel is up. Reflects interface presence, not live peer reachability (true handshake age would need `sudo wg show`). Verified: clean `make x86` + `start-cli package install` exit 0, the real `ip ... type wireguard` output on spark2 matches the parser, and — **confirmed in-browser** — the SSH-key icon works. That also closes the long-open v0.19.0 question: the same-origin CSRF guard does NOT false-block control endpoints behind the StartOS proxy (the SSH-key POST goes through it). The `VPN 10.59.211.6` badge render is confirmed in-browser too — feature fully verified.
-- **spark2 joined the `starttunnel` WireGuard subnet (2026-06-15):** config installed at `/etc/wireguard/starttunnel.conf`, interface `starttunnel` up at `10.59.211.6/24`, `wg-quick@starttunnel` enabled (survives reboot). Split tunnel (`AllowedIPs = 10.59.211.0/24`) so the Spark keeps its LAN route — the dashboard's SSH is unaffected. Purpose: let a bot on spark2 reach the owner's Mac off-LAN. **Finding:** passwordless sudo is NOT configured on spark2 (`sudo wg show` → "a password is required") — the earlier assumption was wrong; harmless here since the badge is sudo-free, but note it before designing any dashboard feature that needs root on a Spark.
-- **In progress — Signal Engine "flakiness":** diagnosed, not a server bug — transient 14s unresponsiveness while the single GPU is continuously busy. Client-side remedy drafted (in-flight cap 2, hard ceiling 3 across audio endpoints, retry-with-backoff on timeout/503), with the owner to forward to that dev.
-- **Decided, not implemented:** no public interface / no API token auth — LAN + WireGuard/Tailscale split-tunnel only (the CSRF guard now covers the browser-driven vector). An empirical audio concurrency sweep is offered but needs the owner's OK in a quiet window.
-- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; the connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
-- **Repo wart:** commit `8d839e3` (was `367d986` pre-rewrite) is labeled `v0.13.0:4` but contains everything through v0.18.0:0 — per-version commits for v0.14-v0.18 don't exist. Keep commit messages accurate.
-- **Hosting:** pushes to the owner's self-hosted Gitea — remote `gitea`, branch `master`, over SSH. Push after committing.
-- **Next:** (1) owner forwards the concurrency note to the Signal Engine dev; (2) concurrency sweep if the dev wants the measured knee; (3) parakeet-asr `--memory` cap via Reapply-patches; (4) start the `ROADMAP.md` tech-debt list (a pytest harness first).
+- **Working (v0.20.0:0, installed and serving):** swap dashboard; chat / transcribe / diarize(+chunk) / TTS proxies; embeddings + rerank + hybrid search (Qdrant); `/scrub` + `/rehydrate`; label-merge incl. dual-channel; per-Spark SSH-key copy + WireGuard `VPN <ip>` hardware-card badge. Spark 2 audio stack healthy. Security hardening (v0.19.0:0 — shellsafe SSH-injection guard, Qdrant path-injection, same-origin CSRF guard) shipped and stable; evidence in `EVALUATION.md`.
+- **Tests:** offline pytest harness in `image/tests/`: `cd image && .venv/bin/python -m pytest` (65 passing). Covers `build_launch_command` (incl. the shell-injection round-trip), the transcript↔diarizer label-merge, and the `shellsafe` validators. Mock-heavy swap/proxy tests deliberately skipped (low ROI). Redaction + live-audio suites remain standalone scripts.
+- **Signal Engine "flakiness":** diagnosed as *not* a server bug — transient 14s unresponsiveness while the single GPU is busy. Client-side remedy (in-flight cap 2 / ceiling 3 / retry-on-timeout+503) drafted and **forwarded to that dev (owner confirmed 2026-06-15)**. Awaiting whether they want the measured concurrency knee.
+- **Stance (decided, not built):** no public interface / no API-token auth — LAN + WireGuard/Tailscale split-tunnel only; the CSRF guard covers the browser-driven vector.
+- **Known limits:** `/health` blips while the GPU is busy (mitigated client-side); dual-channel can miss a quiet local word under loud remote bleed; connectivity log misses sub-5s outages between 5s polls; diarizer caps at 4 speakers.
+- **Infra gotcha (safety):** passwordless sudo is NOT configured on spark2 — design unprivileged probes for any Spark feature (the badge uses `ip`, not `sudo wg show`). spark2 sits on the `starttunnel` WireGuard subnet (`10.59.211.6/24`, survives reboot). Owner declined SSH-key rotation after the 2026-06-12 history scrub (only the key *name* leaked) — don't re-flag.
+- **Hosting:** self-hosted Gitea — remote `gitea`, branch `master`, over SSH; push after committing. (Wart: commit `8d839e3` is mislabeled `v0.13.0:4` but contains through v0.18.0:0.)
+- **Next:** (1) audio concurrency sweep — only if the Signal Engine dev wants the measured knee; needs owner OK in a quiet window. (2) Otherwise pull from `ROADMAP.md`: local-path/fine-tuned model support (new) or P2 tech-debt. Parakeet long-audio guard is deferred (rationale in ROADMAP).
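The client-side remedy named in the Signal Engine entry (in-flight cap 2, retry on timeout or 503) is only described in prose. A minimal Python sketch of that shape follows, for illustration only. `call_with_retry`, `send`, the attempt count, and the backoff delays are hypothetical names and values, not taken from the note that was forwarded, and the separate hard ceiling of 3 across audio endpoints is not modeled here.

```python
import threading
import time

MAX_IN_FLIGHT = 2         # in-flight cap from the entry above
RETRYABLE_STATUS = {503}  # timeouts are handled separately via TimeoutError

# Shared by every caller in the process, so at most two requests run at once.
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def call_with_retry(send, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run send() under the in-flight cap, retrying on timeout or 503.

    send is a hypothetical zero-argument callable that returns an HTTP
    status code and raises TimeoutError when the request times out.
    """
    status = None
    last_error = None
    for attempt in range(attempts):
        with _slots:  # blocks while MAX_IN_FLIGHT calls are already running
            try:
                status = send()
            except TimeoutError as exc:
                status, last_error = None, exc
        if status is not None and status not in RETRYABLE_STATUS:
            return status
        if attempt < attempts - 1:
            # Exponential backoff, taken after the slot has been released.
            sleep(base_delay * 2 ** attempt)
    if status is None and last_error is not None:
        raise last_error  # the final attempt timed out
    return status         # the final attempt returned a retryable status
```

Sleeping outside the semaphore means a caller that is backing off does not hold one of the two slots, which matters if the server stays unresponsive only while its single GPU is busy.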