From d4c742d6e76d74791f36789131940138c510b35b Mon Sep 17 00:00:00 2001 From: Keysat Date: Sun, 14 Jun 2026 09:39:46 -0500 Subject: [PATCH] Add full-evaluation report Independent six-lens evaluation (evaluator, security-auditor, exerciser, doc-auditor, start9-spec-checker). Surfaces three P0s on the cloud surface and a registry-submission block; full priority queue in the file. --- EVALUATION.md | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 EVALUATION.md diff --git a/EVALUATION.md b/EVALUATION.md new file mode 100644 index 0000000..e51ba92 --- /dev/null +++ b/EVALUATION.md @@ -0,0 +1,102 @@ +# Evaluation — recap (Recaps) — 2026-06-13 + +Intent: Recaps is a YouTube + podcast summarizer and personal library, served as a single-page vanilla-JS app from a Node.js (ES module) backend — shipping both as a single-mode StartOS `.s9pk` self-host package and as the multi-tenant `recaps.cc` cloud (SQLite, credit-metered, talking to a sibling "relay" service). + +Agents run: evaluator, security-auditor, exerciser, doc-auditor, start9-spec-checker. Reviewer **skipped** (working tree clean — no diff to review). All five completed; none failed. + +## Verdict + +The pure-logic core is genuinely well-built — 103 passing tests, parameterized SQL everywhere, scrypt + `timingSafeEqual` auth, hashed single-use magic-link tokens, server-derived tenant identity that resists `X-Recap-User-Id` spoofing, and a correctly server-to-server billing/settlement path. But three independent agents surfaced **three P0 issues** that the test suite can't see because the riskiest files are untested: an **arbitrary file write** via `/api/library/import` (reproduced live — `../../` in a session key escapes the scope dir), an **SSRF-with-read-back** in the podcast download path (reachable by an anonymous trial), and a **live, still-active Gemini API key committed to git history**. On top of that the multi-tenant claim is undercut by a process-global lock that serializes every concurrent cloud summarize to one-at-a-time. None of these is hard to fix, but the cloud is **not ready for untrusted users** until the three P0s are closed and the key is rotated. Self-host single-mode is in much better shape (the file-write and SSRF matter far less with one trusted operator), but a StartOS **community-registry submission is BLOCKED** on packaging gaps regardless. + +## Cross-referenced findings (corroboration / merges) + +- **SSE error-boundary leak — TWO agents, ONE finding.** The evaluator (`index.js:3432`, `:3003`, `:4246`) and the security-auditor (`relay.js:135-144` → `index.js:~4246`) independently flagged that relay backend errors are forwarded to cloud users verbatim, violating the AGENTS.md "scrub operator-internal language" contract. The evaluator's surprise sharpens it: the pipeline's own regex at `index.js:3419` detects "Parakeet/Gemma/CUDA" in the error string and then forwards that exact string two lines later. **One P2, two evidence kinds.** +- **The file-write P0 and a doc finding share a root cause.** The exerciser reproduced arbitrary file write in `library.js` because the session key is used as a filename without `safeFilename()`. The doc-auditor independently found that `safeFilename()` is *module-private in `history.js`*, not an exported shared util — which is *why* `library.js` couldn't reuse it. AGENTS.md tells contributors "`safeFilename()` already exists — use it," but it isn't importable. Fix the code and the doc together. +- **The concurrency lock cuts both ways.** The evaluator's P1 (global `currentFreeJob` lock serializes all tenants) and the security-auditor's P2 (concurrent credit over-spend *when the lock is absent*, i.e. on licensed installs) are two faces of the same `isFreeUser()`/`currentFreeJob` mechanism: when the lock applies it breaks multi-tenant concurrency; when it doesn't, it lets a 1-credit user fan out N parallel requests before any debit lands. +- **"Riskiest files are untested" — evaluator + exerciser agree.** Both note zero coverage on `/api/process` gating, `relay.js`, `tenant-auth.js`, and billing. Every P0/P1 code bug above lives in an untested file. The exerciser further notes the real summarize→save→debit happy path could not be run end-to-end (no Gemini key / relay credits) — a shared blind spot, below. +- **Secrets — evaluator + security-auditor.** Both flag `cookies.txt` as sensitive plaintext in the repo root; the security-auditor escalates with the far more serious committed-and-active Gemini key in history `d5046a0:.env`. Working tree itself is clean. +- **Version-file sprawl — evaluator + spec-checker.** Both independently noted 175 StartOS version files with only ~3 carrying real migrations. + +## Priority queue + +**P0 — block the cloud; fix before any untrusted use** +- [P0] Arbitrary file write via `/api/library/import` — `../../` in a session key escapes scope; reproduced writing `/tmp/rce_test.json` & `recap/evil.json` — `server/library.js:131-139` (key used verbatim, no `safeFilename()`) — **exerciser** *(confirmed in single-mode; confirm multi-mode tenant reachability — it's per-scope so any authed tenant should reach it)* +- [P0] SSRF with read-back via podcast URL — anon trial can POST `{type:"podcast", url:"http://169.254.169.254/..."}`; unguarded `http.get` follows redirects to any host and the body is transcribed back to the attacker — `server/audio.js:78-97` (`downloadPodcastAudio`), gate `server/index.js:2747`, reached `:3455` — **security-auditor** +- [P0] Live Gemini API key committed to git history and still the active key — `git show d5046a0:.env`, pushed to `origin/master` — **security-auditor** *(rotate now; purge from history)* + +**P1 — correctness / availability / submission blockers** +- [P1] `require("crypto")` inside an ESM module throws `ReferenceError` on the anon license-purchase settle path — `server/license-purchase.js:423` (called from `:353-354`) — **evaluator** +- [P1] Global single-flight `currentFreeJob` lock serializes the *entire* multi-tenant cloud (a 2nd concurrent `/api/process` from any user gets 409); `isFreeUser()` returns true for every tenant in multi-mode — `server/index.js:2621`, `server/license-middleware.js:143` — **evaluator** +- [P1] Trial IP-cap + magic-link rate-limit bypass via spoofed `X-Forwarded-For` (no `trust proxy` / XFF normalization) — unlimited free trial credits if the edge proxy appends rather than replaces XFF — `server/anon-trial.js:50-57`, `server/auth-routes.js:209` — **security-auditor** *(deployment-dependent on the recaps.cc proxy)* +- [P1] StartOS community-registry submission **BLOCKED** (3 blockers): missing root `instructions.md`; `packageRepo`/`upstreamRepo` point to `https://ten31.xyz` (a homepage, not a source repo); `license: 'Proprietary'` fails the "source available" gate — `startos/manifest/index.ts`, repo root — **start9-spec-checker** *(only blocks registry submission; does NOT affect `make install`)* + +**P2 — real risk / quality, not release-blocking for self-host** +- [P2] Operator-internal language (Parakeet/Gemma/CUDA/LAN IPs/`*.local`) leaks to cloud users at the SSE error boundary; no scrub exists — `server/index.js:3432,3003,4246` + `server/providers/relay.js:135-144` — **evaluator + security-auditor** +- [P2] Concurrent credit over-spend (TOCTOU) on licensed installs — N parallel requests pass the `total>0` check before any blind `debitOne` lands — gate `server/index.js:2497-2550` vs debit `:3158,:4197` (`anon-trial.js:299`) — **security-auditor** +- [P2] Multi-mode tenant can spend the operator's *server* Gemini key directly (bypasses relay metering) by selecting `transcriptionProvider:"gemini"` with an empty key — `server/providers/index.js:104-114` — **security-auditor** +- [P2] `GET /api/history` reads + `JSON.parse`s every full session file (transcript+summary, MB each) just to list ~8 metadata fields — `server/history.js:418-437` — **evaluator** +- [P2] Dependency CVEs: nodemailer 6.10.1 (high — SMTP injection/DoS, low practical reach here), ws 8.20.0, qs/express, protobufjs 7.5.6 (moderate) — `npm audit` in `server/` — **security-auditor** +- [P2] Unsanitized IDs persisted into `_meta.json` via array-form import and `PUT /api/history/move` (corrupts the listing; no direct file-path escape) — `server/library.js`, history move handler — **exerciser** +- [P2] `PUT /api/history/meta` accepts arbitrary JSON shapes with no schema (a wrong-typed body, e.g. `{"folders":"not-an-array"}`, can break the listing) — **exerciser** +- [P2] `server/index.js` is 4351 lines mixing routing, the full `/api/process` pipeline, yt-dlp orchestration, and SSE (`:2463-4270`) — **evaluator** +- [P2] Zero tests on the highest-risk files (`/api/process` gating, `relay.js`, `tenant-auth.js`, billing) — every code P0/P1 above lives here — **evaluator + exerciser** + +**P2/P3 — documentation drift (doc-auditor)** +- [P2] Documented credit-gate order in AGENTS.md omits the "paid cloud user" bypass state (between license and free-tenant) — `AGENTS.md:77` vs `server/index.js:2464-2472` — **doc-auditor** +- [P2] Operator-facing copy is stale Gemini-first / Gemini-only; relay is the default provider — `startos-registry/packages/recap/INSTRUCTIONS.md:5,23`, `assets/ABOUT.md:3,18` — **doc-auditor** +- [P3] AGENTS.md directory layout omits ~25 server modules (audio, library, credits-purchase, tts-routes, relay-default/-capabilities/-state, url-resolver, ytdlp, …) — `AGENTS.md:34-60` — **doc-auditor** +- [P3] `relay-client.md` Authorization header missing the `Bearer` scheme — says `Authorization: `, code sends `Authorization: Bearer ` — `docs/guides/relay-client.md:17` (== `.claude/rules/relay-client.md:17`) — **doc-auditor** +- [P3] `index.html` line count stated `~10k`, actually 12,515 — `AGENTS.md:51` — **doc-auditor** +- [P3] AGENTS.md `safeFilename()` convention implies an importable util, but it's module-private in `history.js` — `AGENTS.md:79` — **doc-auditor** *(see P0 file-write)* + +**P3 — hardening / hygiene** +- [P3] 100MB JSON body limit on all routes is a cheap memory-exhaustion lever for unauthenticated `/api/process` — `server/index.js:203` — **evaluator** +- [P3] `/api/credits/claim`: a leaked anon BTCPay invoice ID lets any signed-in account claim those credits — `server/credits-purchase.js:342` — **security-auditor** +- [P3] `downloadPodcastAudio` has no size/time cap (disk/memory fill) — `server/audio.js` — **security-auditor** +- [P3] Container runs as root (no `USER` in `Dockerfile`) — acceptable under StartOS isolation, add non-root for the cloud image — **security-auditor** +- [P3] In-memory auth rate-limit buckets reset on restart — `server/auth-routes.js:106` — **security-auditor + exerciser** +- [P3] Stale `youtube-summarizer_x86_64.s9pk` (~223MB) + root `package.json` still named `youtube-summarizer-startos` — leftover from the package-ID rename — **start9-spec-checker** +- [P3] Manifest `docsUrls: []` empty; multi-tenant cloud actions exposed in the single-mode StartOS menu must be verified to run cleanly (`enableMultiTenantMode` et al.) — `startos/actions/index.ts` — **start9-spec-checker** +- [P3] `cookies.txt` (119KB, sensitive yt-dlp plaintext) sits in the repo root and is expiring imminently (`/api/health` already reports `fileExpiring:true`) — correctly gitignored — **evaluator + exerciser** +- [P3] 175 StartOS version files, only ~3 with real migrations (172 empty `up`/`down` stubs) — maintenance surface — **evaluator + start9-spec-checker** + +## Scorecard + +The evaluator's six-lens table, with two lenses adjusted on cross-agent evidence: + +| Lens | Evaluator | Adjusted | Why adjusted | +|---|---|---|---| +| Architecture | 3 | 3 | — | +| Security | 4 | **2** | The evaluator sampled `index.js`/`relay.js` via grep and found no P0; the exerciser (file write, reproduced) and security-auditor (SSRF read-back + committed live key) each found a P0 the evaluator's sample missed. Three P0s on the cloud surface contradict a 4. | +| Performance | 3 | 3 | — | +| Testing | 3 | 3 | Corroborated by the exerciser; every code P0/P1 lives in an untested file. Borderline 2. | +| Code quality | 3 | 3 | — | +| Documentation | 4 | **3** | The doc-auditor found moderate drift the evaluator's "candid docs" read didn't weigh: a missing gate-order state, a directory layout missing half the server, and stale Gemini-first operator instructions. | + +Not a lens, but tracked separately: **StartOS packaging = BLOCKED for registry submission** (3 hard blockers), fine for `make install`. + +## Disagreements & gaps + +- **Disagreement on Security posture.** The evaluator rated Security 4 ("fundamentals done right"); the security-auditor and exerciser found three P0s. Both are partly right — the *fundamentals* (SQL, auth, billing trust model) genuinely are strong, but the *user-controlled-input → side-effect* paths (`library.js` filename, `audio.js` URL fetch) were the evaluator's blind spot because it sampled the big files by grep rather than reading `library.js`/`audio.js` in full. Adjusted Security to 2 to reflect the reproduced exploit. +- **Shared blind spot — the real pipeline was never run end-to-end.** No agent had a Gemini key or relay credits, so the actual summarize → `saveToHistory` → credit-debit happy path under concurrency is unverified by *anyone*. The credit-TOCTOU (P2) and the concurrency lock (P1) are reasoned from code, not observed. This is the single highest-value gap to close with a live integration test. +- **Deployment-dependent finding.** The `X-Forwarded-For` bypass (P1) can't be confirmed from this repo — it hinges on whether the recaps.cc edge proxy appends or overwrites XFF. Self-hosted StartOS (tunnel sets XFF) is unaffected. +- **Spec-checker couldn't fully verify two things read-only:** whether the local `start-cli` pack hard-fails without `instructions.md` (its `list-ingredients` omits it — ambiguous), and whether the multi-tenant actions run cleanly in single mode (needs an on-device run). + +## Surprises (carried forward from every agent) + +- The pipeline detects relay errors containing "Parakeet/Gemma/CUDA" (`index.js:3419`) and then forwards that exact string to cloud users two lines later — it knows the secret and leaks it anyway. (evaluator) +- The SSRF guard the codebase clearly knows it needs — it strips `::ffff:` and validates IPs for the trial cap — is entirely absent from the one place that fetches a user-controlled URL. (security-auditor) +- The billing/settlement trust model (a common place for funds bugs) is correctly server-to-server, relay-verified, and idempotent — a *positive* surprise. (security-auditor) +- The first `/api/process` after boot triggers a yt-dlp + Homebrew self-update, adding 5–10s latency before the request completes. (exerciser) +- AGENTS.md's server directory layout documents only a skeleton — more than half the `server/*.js` files are unlisted; it reads like it was written when the server was much smaller. (doc-auditor) +- `start-cli s9pk list-ingredients` does not include `instructions.md` even though the docs say the build fails without it, and the root `package.json` is still named `youtube-summarizer-startos`. (start9-spec-checker) + +## Suggested order of work + +1. **Rotate the leaked Gemini key now**, then purge `d5046a0:.env` from history (BFG/filter-repo) and force-push. Independent of all code — do it first. *(P0 secret)* +2. **Close the two reachable code P0s:** add an SSRF guard to `downloadPodcastAudio` (reject private/link-local/loopback IPs, https-only, re-validate on redirect, size/time cap — folds in the P3 cap), and validate the import session key. Export `safeFilename()` as a shared util and call it from `library.js` + the history-move/import paths (this also closes the P2 `_meta.json` pollution). *(P0 file-write + SSRF)* +3. **Fix the two P1 correctness bugs:** the ESM `require()` in `license-purchase.js`, and scope the concurrency lock per-identity (or skip it entirely) in multi-mode. *(P1)* +4. **Add a live integration test** for `/api/process` gating + library import + a happy-path summarize-with-debit (with a stub/real key) — this is the shared blind spot and the regression net for steps 2–3. Then close the credit TOCTOU and the SSE error-boundary scrub against that test. *(P2)* +5. **Patch dependency CVEs** (`npm audit fix`; nodemailer is a major bump) and the multi-mode server-key fallback. *(P2)* +6. **Reconcile the docs:** add the "paid cloud user" gate-order state, rewrite the operator INSTRUCTIONS.md / ABOUT.md as relay-default (not Gemini-first), fix the directory layout and the Bearer/line-count/`safeFilename` nits. *(P2/P3)* +7. **Only if submitting to the registry:** add `instructions.md`, point `packageRepo`/`upstreamRepo` at a public source repo, choose a source-available license, and remove the stale `youtube-summarizer_x86_64.s9pk`. Otherwise these are non-issues for `make install`. *(P1 conditional)*