Add full-eval report (EVALUATION.md)
This commit is contained in:
@@ -0,0 +1,75 @@
|
|||||||
|
# Evaluation — recap-relay — 2026-06-13
|
||||||
|
|
||||||
|
Intent: An operator-side, credit-metered relay that fronts Gemini and the operator's local AI hardware ("Spark Control"), owning transcription/diarization/analysis routing, the cloud Pro/Max tier + expiry, self-serve billing settlement (BTCPay/Zaprite), and the internal-meetings pipeline — a private Start9 s9pk that ships only to the operator's own box.
|
||||||
|
|
||||||
|
Agents run: evaluator, security-auditor, exerciser, doc-auditor, start9-spec-checker. (reviewer skipped — working tree is clean, no diff to review.)
|
||||||
|
|
||||||
|
## Verdict
|
||||||
|
|
||||||
|
Recap Relay substantially achieves its stated intent: the pipeline works end-to-end, 47/47 tests pass, the server boots clean, webhook signatures verify correctly, and secret hygiene (on disk, over the wire, in git history) is genuinely careful with unusually good "why" comments. The headline risk is a **server-side request forgery on `/relay/transcribe-url` and `/relay/summarize-url`** — three agents independently landed on it: `downloadDirect` fetches any caller-supplied `media_url` with redirect-following and no scheme/host allowlist, and the route is reachable by anyone presenting a self-chosen `X-Recap-Install-Id`, so it is unauthenticated in practice and sits on the operator's LAN beside other StartOS services. Close behind are a **billing money-leak** (mid-cycle renewal resets the monthly credit counter to zero) and a **known-vulnerable `multer@1.4.5` DoS line**. None of these block a single-operator private sideload today, but the SSRF and the billing reset should be fixed before this relay sees any broader or adversarial traffic. Packaging is well-formed for sideload; the only public-registry blocker (a missing README) is moot for a package that intentionally never leaves the operator's box.
|
||||||
|
|
||||||
|
## Cross-referenced findings
|
||||||
|
|
||||||
|
- **SSRF via `media_url`** — corroborated by **three** agents. evaluator (P1) and security-auditor (P1) both pin `server/routes/transcribe-url.js:99-159` `downloadDirect` doing `fetch(url, {redirect:"follow"})` with no validation; the exerciser independently observed `file://`/`javascript://`/empty-host URLs accepted at input (failing only downstream). security-auditor's added insight: any client can self-assign an install-id (auto-creates a Core row, no allowlist), so it's effectively unauthenticated. **One P1, three kinds of evidence.**
|
||||||
|
- **Path traversal on internal-meetings `:id`** — security-auditor (P2) and exerciser (P2) both reproduced `path.join(meetingsDir, id + '.json')` resolving outside the directory (e.g. `../../../etc/passwd.json`). Admin-cookie gated, so self-limited on a one-operator box, but a latent traversal. `output-store.js:52` already shows the correct sanitize pattern the meetings store omits. **One P2, two reproductions.**
|
||||||
|
- **Non-constant-time operator-key compare** — evaluator (P2) and security-auditor (P2) both flag `server/identity.js:43/84` using `!==` on the high-value shared `relay_cloud_operator_key`, inconsistent with the `timingSafeEqual` used on the admin/webhook paths. **One P2.**
|
||||||
|
- **Vulnerable `multer@1.4.5-lts.2`** — security-auditor (P1, CVE-2025-47944/47935/48997/7338 DoS) and exerciser (npm install warning; worth tracking v2). **One P1.**
|
||||||
|
- **Stale reported version `0.2.11`** — exerciser (`/relay/health` reports `0.2.11`) and doc-auditor (`server/package.json` lags the StartOS `0.2.124`) describe the same root cause from two angles. **One P3.**
|
||||||
|
- **126 `startos/versions/*.ts` files** — evaluator (P3, dead weight) and start9-spec-checker (Surprise: ~one revision per deploy) both noted the version-graph bulk; the graph is well-formed, just heavy. **One P3.**
|
||||||
|
- **Billing-grant integrity** — evaluator's renewal-reset (P1) and security-auditor's webhook-replay-double-credit (P2) are distinct bugs in the same money path that *compound*: a webhook double-fire across a restart both re-grants and resets the monthly counter. Kept separate below, cross-referenced.
|
||||||
|
|
||||||
|
## Priority queue
|
||||||
|
|
||||||
|
- [P1] SSRF: `downloadDirect` fetches any `media_url` with redirect-follow, no scheme/host allowlist; reachable via self-chosen install-id — `server/routes/transcribe-url.js:99-159`, `summarize-url.js` — evaluator, security-auditor, exerciser
|
||||||
|
- [P1] Mid-cycle/early renewal resets `monthly_consumed = 0`, handing back a full monthly credit allotment for free — `server/credits.js:770` `extendUserTier` → `setUserTier` `:641` — evaluator
|
||||||
|
- [P1] `multer@1.4.5-lts.2` DoS CVEs (malformed multipart crashes the process); upgrade to `^2.0.1` — `server/package.json:15`, used at `routes/internal-meetings.js:1669` — security-auditor, exerciser
|
||||||
|
- [P2] Path traversal on internal-meetings `:id` (admin-gated); validate against `^[A-Za-z0-9_-]+$` before `path.join` — `routes/internal-meetings.js:84,91,242` — security-auditor, exerciser
|
||||||
|
- [P2] Non-constant-time operator-key comparison (timing oracle on a high-value key) — `server/identity.js:43,84` — evaluator, security-auditor
|
||||||
|
- [P2] Webhook replay double-credit/double-extend: dedup Set is in-memory, lost on restart; persist processed invoice/order ids — `routes/credits.js:63`, `zaprite-webhook.js:27` — security-auditor
|
||||||
|
- [P2] Malformed JSON body returns a full Node stack trace (local FS paths) — no Express error handler for `entity.parse.failed` — any JSON-body route, repro on `/relay/analyze` — exerciser
|
||||||
|
- [P2] BTCPay declared `optional:false` / `kind:'running'` despite comments calling it optional → StartOS refuses to start the relay unless BTCPay is co-installed and running — `startos/manifest/index.ts:38-49`, `startos/dependencies.ts` — start9-spec-checker
|
||||||
|
- [P2] No unit tests on the money path (`commitCredit`, `refundCredit`, `applyTierPromotion`, `planBackend`, webhook grant handlers) — this is why the P1 renewal bug ships green — `server/test/` — evaluator
|
||||||
|
- [P2] `routes/internal-meetings.js` is 2225 lines mixing storage, two backfill migrations, MD/HTML formatters, the upload pipeline, and ~14 handlers; extract formatters + storage — evaluator
|
||||||
|
- [P2] `cors()` is fully open and applies to `/admin/*` too; scope origins, don't wildcard admin — `server/index.js:54` — evaluator
|
||||||
|
- [P2] AGENTS.md:8 says all `/relay/*` sit "behind the operator key" — wrong; most are per-call header auth, only `routes/user-tier.js` needs the operator key — doc-auditor
|
||||||
|
- [P2] AGENTS.md:61 omits the third admin-exempt path `/admin/btcpay/callback` (state-token, not cookie) — `server/admin-auth.js:70` — doc-auditor
|
||||||
|
- [P3] No request rate limiting on public `/relay/*`; expensive transcribe/SSE paths can be hammered — evaluator
|
||||||
|
- [P3] Container likely runs as root: entrypoint `chown`s `/data` to uid 1001 but the image declares no such user and no `USER` directive — `docker_entrypoint.sh`, `Dockerfile` — security-auditor
|
||||||
|
- [P3] Stored XSS surface: dashboard renders user-supplied `title`/`media_url` via `innerHTML`; prefer `textContent`/escape (HTML export already escapes) — `public/dashboard.html` — security-auditor
|
||||||
|
- [P3] `lan-fetch.js` disables TLS verification (`rejectUnauthorized:false`); safe only because the Spark Control URL is admin-set — document that invariant — security-auditor
|
||||||
|
- [P3] `_ln_debug`/`_debug` and verbose error `message` fields returned to clients on credits/tier flows; trim before any broad exposure — security-auditor
|
||||||
|
- [P3] 126 `startos/versions/*.ts` files for a one-box package; prune to the last installed baseline — evaluator, start9-spec-checker
|
||||||
|
- [P3] `/relay/health` reports `0.2.11` (server/package.json never bumped past 0.2.11; StartOS is at 0.2.124) — misleading to monitoring — exerciser, doc-auditor
|
||||||
|
- [P3] No `README.md` at repo root (would block public-registry submission only; non-blocking for the private sideload) — start9-spec-checker
|
||||||
|
- [P3] `yt-dlp` installed unpinned in the Dockerfile (`pip3 install ... yt-dlp`) — pin a version for reproducible builds — start9-spec-checker
|
||||||
|
- [P3] AGENTS.md:46 `test/` layout lists 3 files; six exist (adds `tier-expiry`, `zaprite`, `polish-speaker-labels`) — doc-auditor
|
||||||
|
- [P3] `server/index.js:3-6` header comment still says "Two public endpoints" — file now mounts 17 relay routes + admin groups — doc-auditor, evaluator
|
||||||
|
- [P3] `POST /admin/logout` (`admin-auth.js:131`) is undocumented in the AGENTS.md endpoint contract — doc-auditor
|
||||||
|
- [P3] Dockerfile per-subdir `COPY` footgun: a new `server/<subdir>/` is silently omitted from the image (comment warns, but no static check) — start9-spec-checker
|
||||||
|
- [P3] Manifest polish (informational, registry-only): `license:'Proprietary'` not SPDX, `docsUrls:[]`, bare-domain repo URLs, `icon.png` vs `.svg`, Dockerfile Node 20 vs documented Node 22 — start9-spec-checker
|
||||||
|
|
||||||
|
## Scorecard
|
||||||
|
|
||||||
|
| Lens | Score /5 | Notes (synthesis) |
|
||||||
|
|---|---|---|
|
||||||
|
| Architecture | 4 | Clean per-route auth, decoupled identity/credits/jobs layers, sound `planBackend` routing. Scale pain: one 2225-line route file, 126 packaging version files. (evaluator) |
|
||||||
|
| Security | 3 | Strong webhook HMAC verify, masked secrets, `0o600` files, server-UUID ids. Gaps corroborated by the auditor: SSRF (P1), multer CVE (P1), timing compare, open CORS, likely container-root, no rate limiting. (evaluator + security-auditor concur) |
|
||||||
|
| Performance | 4 | Realistic load is one operator; serial JSON-ledger writes adequate and acknowledged; bounded O(N³) clustering; audit log rotates at 64MB; minor ledger TOCTOU. (evaluator) |
|
||||||
|
| Testing | 3 | 47 pass, clustering/edits-heavy. No money-path or route-level tests — the P1 billing bug ships green because `tier-expiry.test.js:63` asserts only the expiry date, not the credit counter. (evaluator, exerciser) |
|
||||||
|
| Code quality | 4 | Consistent style, excellent why-comments, honest TODOs. Dead-weight risk in the 2225-line file; SSRF/timing gaps suggest no security pass; no linter config. (evaluator) |
|
||||||
|
| Documentation | 3.5 | **Adjusted down from evaluator's 4.** AGENTS.md + the subsystem guide are largely accurate, but doc-auditor found moderate drift — notably AGENTS.md:8 mis-stating `/relay/*` auth, a missing exempt path, and a stale test list. No README. (evaluator, sharpened by doc-auditor) |
|
||||||
|
|
||||||
|
## Disagreements & gaps
|
||||||
|
|
||||||
|
- **Apparent disagreement on the SSRF severity** — the exerciser saw hostile `media_url`s "fail gracefully downstream" while evaluator and security-auditor rate it P1. Not a real conflict: graceful *download* failure doesn't stop the request from reaching internal hosts, and the failure timing/messages are themselves the SSRF signal. The exerciser simply had no internal target to hit from its sandbox. Resolution: P1 stands.
|
||||||
|
- **Shared blind spot — the live upload/analyze pipeline.** Every agent skipped or only sampled the deep internals of `chunked-analyze.js`, `post-cluster-polish.js`, `meeting-extras.js`, `gemini.js`, and `hardware.js`, and none could exercise `POST /admin/internal-meetings/upload` end-to-end (admin-gated + needs live Spark Control). The most complex state-mutating path (upload → merge → recluster → repolish) is unverified outside its unit tests.
|
||||||
|
- **Dependency audit ran offline.** `npm audit` had no network in-sandbox (`0 vulns` is unreliable); the multer finding was confirmed via web advisories, but transitive CVEs could be missed. A live `npm audit`/`osv-scanner` would raise confidence.
|
||||||
|
|
||||||
|
## Suggested order of work
|
||||||
|
|
||||||
|
1. **Close the SSRF (P1)** — in `downloadDirect`, reject non-http(s) schemes, DNS-resolve and reject private/loopback/link-local hosts, and re-validate on each redirect hop (or stop following redirects to internal ranges). This is the one finding with adversarial reach.
|
||||||
|
2. **Fix the billing money-path (P1) and lock it with tests** — make `extendUserTier` preserve `monthly_consumed`/`last_renewal_at` and only roll them on a true period boundary; persist webhook dedup so a restart can't double-grant; add unit tests for `commitCredit`/`refundCredit`/`applyTierPromotion` and the grant handlers (do this before trusting any green run on billing).
|
||||||
|
3. **Upgrade `multer` to ^2.0.1 (P1)** and re-test the `file` upload field.
|
||||||
|
4. **Add input-validation guards (P2)** — validate `:id` against the UUID shape before `path.join`, and register an Express error handler that turns `entity.parse.failed` into a JSON 400 instead of a stack trace.
|
||||||
|
5. **Resolve the BTCPay dependency contradiction (P2)** — decide optional vs required, then make the manifest (`optional`), `dependencies.ts` (`kind`), and the code comment agree; this governs whether the relay can even start without BTCPay.
|
||||||
|
6. **Tighten the small security gaps and refresh the docs (P2–P3)** — constant-time operator-key compare, scope CORS off `/admin/*`, then correct AGENTS.md (the `/relay/*` auth statement, the `/admin/btcpay/callback` exempt path, the test list) and bump `server/package.json` so `/relay/health` stops reporting `0.2.11`.
|
||||||
Reference in New Issue
Block a user