Files
recap/EVALUATION.md
Keysat d4c742d6e7 Add full-evaluation report
Independent six-lens evaluation (evaluator, security-auditor, exerciser,
doc-auditor, start9-spec-checker). Surfaces three P0s on the cloud surface
and a registry-submission block; full priority queue in the file.
2026-06-14 09:39:46 -05:00

103 lines
16 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Evaluation — recap (Recaps) — 2026-06-13
Intent: Recaps is a YouTube + podcast summarizer and personal library, served as a single-page vanilla-JS app from a Node.js (ES module) backend — shipping both as a single-mode StartOS `.s9pk` self-host package and as the multi-tenant `recaps.cc` cloud (SQLite, credit-metered, talking to a sibling "relay" service).
Agents run: evaluator, security-auditor, exerciser, doc-auditor, start9-spec-checker. Reviewer **skipped** (working tree clean — no diff to review). All five completed; none failed.
## Verdict
The pure-logic core is genuinely well-built — 103 passing tests, parameterized SQL everywhere, scrypt + `timingSafeEqual` auth, hashed single-use magic-link tokens, server-derived tenant identity that resists `X-Recap-User-Id` spoofing, and a correctly server-to-server billing/settlement path. But three independent agents surfaced **three P0 issues** that the test suite can't see because the riskiest files are untested: an **arbitrary file write** via `/api/library/import` (reproduced live — `../../` in a session key escapes the scope dir), an **SSRF-with-read-back** in the podcast download path (reachable by an anonymous trial), and a **live, still-active Gemini API key committed to git history**. On top of that the multi-tenant claim is undercut by a process-global lock that serializes every concurrent cloud summarize to one-at-a-time. None of these is hard to fix, but the cloud is **not ready for untrusted users** until the three P0s are closed and the key is rotated. Self-host single-mode is in much better shape (the file-write and SSRF matter far less with one trusted operator), but a StartOS **community-registry submission is BLOCKED** on packaging gaps regardless.
## Cross-referenced findings (corroboration / merges)
- **SSE error-boundary leak — TWO agents, ONE finding.** The evaluator (`index.js:3432`, `:3003`, `:4246`) and the security-auditor (`relay.js:135-144``index.js:~4246`) independently flagged that relay backend errors are forwarded to cloud users verbatim, violating the AGENTS.md "scrub operator-internal language" contract. The evaluator's surprise sharpens it: the pipeline's own regex at `index.js:3419` detects "Parakeet/Gemma/CUDA" in the error string and then forwards that exact string two lines later. **One P2, two evidence kinds.**
- **The file-write P0 and a doc finding share a root cause.** The exerciser reproduced arbitrary file write in `library.js` because the session key is used as a filename without `safeFilename()`. The doc-auditor independently found that `safeFilename()` is *module-private in `history.js`*, not an exported shared util — which is *why* `library.js` couldn't reuse it. AGENTS.md tells contributors "`safeFilename()` already exists — use it," but it isn't importable. Fix the code and the doc together.
- **The concurrency lock cuts both ways.** The evaluator's P1 (global `currentFreeJob` lock serializes all tenants) and the security-auditor's P2 (concurrent credit over-spend *when the lock is absent*, i.e. on licensed installs) are two faces of the same `isFreeUser()`/`currentFreeJob` mechanism: when the lock applies it breaks multi-tenant concurrency; when it doesn't, it lets a 1-credit user fan out N parallel requests before any debit lands.
- **"Riskiest files are untested" — evaluator + exerciser agree.** Both note zero coverage on `/api/process` gating, `relay.js`, `tenant-auth.js`, and billing. Every P0/P1 code bug above lives in an untested file. The exerciser further notes the real summarize→save→debit happy path could not be run end-to-end (no Gemini key / relay credits) — a shared blind spot, below.
- **Secrets — evaluator + security-auditor.** Both flag `cookies.txt` as sensitive plaintext in the repo root; the security-auditor escalates with the far more serious committed-and-active Gemini key in history `d5046a0:.env`. Working tree itself is clean.
- **Version-file sprawl — evaluator + spec-checker.** Both independently noted 175 StartOS version files with only ~3 carrying real migrations.
## Priority queue
**P0 — block the cloud; fix before any untrusted use**
- [P0] Arbitrary file write via `/api/library/import``../../` in a session key escapes scope; reproduced writing `/tmp/rce_test.json` & `recap/evil.json``server/library.js:131-139` (key used verbatim, no `safeFilename()`) — **exerciser** *(confirmed in single-mode; confirm multi-mode tenant reachability — it's per-scope so any authed tenant should reach it)*
- [P0] SSRF with read-back via podcast URL — anon trial can POST `{type:"podcast", url:"http://169.254.169.254/..."}`; unguarded `http.get` follows redirects to any host and the body is transcribed back to the attacker — `server/audio.js:78-97` (`downloadPodcastAudio`), gate `server/index.js:2747`, reached `:3455`**security-auditor**
- [P0] Live Gemini API key committed to git history and still the active key — `git show d5046a0:.env`, pushed to `origin/master`**security-auditor** *(rotate now; purge from history)*
**P1 — correctness / availability / submission blockers**
- [P1] `require("crypto")` inside an ESM module throws `ReferenceError` on the anon license-purchase settle path — `server/license-purchase.js:423` (called from `:353-354`) — **evaluator**
- [P1] Global single-flight `currentFreeJob` lock serializes the *entire* multi-tenant cloud (a 2nd concurrent `/api/process` from any user gets 409); `isFreeUser()` returns true for every tenant in multi-mode — `server/index.js:2621`, `server/license-middleware.js:143`**evaluator**
- [P1] Trial IP-cap + magic-link rate-limit bypass via spoofed `X-Forwarded-For` (no `trust proxy` / XFF normalization) — unlimited free trial credits if the edge proxy appends rather than replaces XFF — `server/anon-trial.js:50-57`, `server/auth-routes.js:209`**security-auditor** *(deployment-dependent on the recaps.cc proxy)*
- [P1] StartOS community-registry submission **BLOCKED** (3 blockers): missing root `instructions.md`; `packageRepo`/`upstreamRepo` point to `https://ten31.xyz` (a homepage, not a source repo); `license: 'Proprietary'` fails the "source available" gate — `startos/manifest/index.ts`, repo root — **start9-spec-checker** *(only blocks registry submission; does NOT affect `make install`)*
**P2 — real risk / quality, not release-blocking for self-host**
- [P2] Operator-internal language (Parakeet/Gemma/CUDA/LAN IPs/`*.local`) leaks to cloud users at the SSE error boundary; no scrub exists — `server/index.js:3432,3003,4246` + `server/providers/relay.js:135-144`**evaluator + security-auditor**
- [P2] Concurrent credit over-spend (TOCTOU) on licensed installs — N parallel requests pass the `total>0` check before any blind `debitOne` lands — gate `server/index.js:2497-2550` vs debit `:3158,:4197` (`anon-trial.js:299`) — **security-auditor**
- [P2] Multi-mode tenant can spend the operator's *server* Gemini key directly (bypasses relay metering) by selecting `transcriptionProvider:"gemini"` with an empty key — `server/providers/index.js:104-114`**security-auditor**
- [P2] `GET /api/history` reads + `JSON.parse`s every full session file (transcript+summary, MB each) just to list ~8 metadata fields — `server/history.js:418-437`**evaluator**
- [P2] Dependency CVEs: nodemailer 6.10.1 (high — SMTP injection/DoS, low practical reach here), ws 8.20.0, qs/express, protobufjs 7.5.6 (moderate) — `npm audit` in `server/`**security-auditor**
- [P2] Unsanitized IDs persisted into `_meta.json` via array-form import and `PUT /api/history/move` (corrupts the listing; no direct file-path escape) — `server/library.js`, history move handler — **exerciser**
- [P2] `PUT /api/history/meta` accepts arbitrary JSON shapes with no schema (a wrong-typed body, e.g. `{"folders":"not-an-array"}`, can break the listing) — **exerciser**
- [P2] `server/index.js` is 4351 lines mixing routing, the full `/api/process` pipeline, yt-dlp orchestration, and SSE (`:2463-4270`) — **evaluator**
- [P2] Zero tests on the highest-risk files (`/api/process` gating, `relay.js`, `tenant-auth.js`, billing) — every code P0/P1 above lives here — **evaluator + exerciser**
**P2/P3 — documentation drift (doc-auditor)**
- [P2] Documented credit-gate order in AGENTS.md omits the "paid cloud user" bypass state (between license and free-tenant) — `AGENTS.md:77` vs `server/index.js:2464-2472`**doc-auditor**
- [P2] Operator-facing copy is stale Gemini-first / Gemini-only; relay is the default provider — `startos-registry/packages/recap/INSTRUCTIONS.md:5,23`, `assets/ABOUT.md:3,18`**doc-auditor**
- [P3] AGENTS.md directory layout omits ~25 server modules (audio, library, credits-purchase, tts-routes, relay-default/-capabilities/-state, url-resolver, ytdlp, …) — `AGENTS.md:34-60`**doc-auditor**
- [P3] `relay-client.md` Authorization header missing the `Bearer` scheme — says `Authorization: <license>`, code sends `Authorization: Bearer <license_key>``docs/guides/relay-client.md:17` (== `.claude/rules/relay-client.md:17`) — **doc-auditor**
- [P3] `index.html` line count stated `~10k`, actually 12,515 — `AGENTS.md:51`**doc-auditor**
- [P3] AGENTS.md `safeFilename()` convention implies an importable util, but it's module-private in `history.js``AGENTS.md:79`**doc-auditor** *(see P0 file-write)*
**P3 — hardening / hygiene**
- [P3] 100MB JSON body limit on all routes is a cheap memory-exhaustion lever for unauthenticated `/api/process``server/index.js:203`**evaluator**
- [P3] `/api/credits/claim`: a leaked anon BTCPay invoice ID lets any signed-in account claim those credits — `server/credits-purchase.js:342`**security-auditor**
- [P3] `downloadPodcastAudio` has no size/time cap (disk/memory fill) — `server/audio.js`**security-auditor**
- [P3] Container runs as root (no `USER` in `Dockerfile`) — acceptable under StartOS isolation, add non-root for the cloud image — **security-auditor**
- [P3] In-memory auth rate-limit buckets reset on restart — `server/auth-routes.js:106`**security-auditor + exerciser**
- [P3] Stale `youtube-summarizer_x86_64.s9pk` (~223MB) + root `package.json` still named `youtube-summarizer-startos` — leftover from the package-ID rename — **start9-spec-checker**
- [P3] Manifest `docsUrls: []` empty; multi-tenant cloud actions exposed in the single-mode StartOS menu must be verified to run cleanly (`enableMultiTenantMode` et al.) — `startos/actions/index.ts`**start9-spec-checker**
- [P3] `cookies.txt` (119KB, sensitive yt-dlp plaintext) sits in the repo root and is expiring imminently (`/api/health` already reports `fileExpiring:true`) — correctly gitignored — **evaluator + exerciser**
- [P3] 175 StartOS version files, only ~3 with real migrations (172 empty `up`/`down` stubs) — maintenance surface — **evaluator + start9-spec-checker**
## Scorecard
The evaluator's six-lens table, with two lenses adjusted on cross-agent evidence:
| Lens | Evaluator | Adjusted | Why adjusted |
|---|---|---|---|
| Architecture | 3 | 3 | — |
| Security | 4 | **2** | The evaluator sampled `index.js`/`relay.js` via grep and found no P0; the exerciser (file write, reproduced) and security-auditor (SSRF read-back + committed live key) each found a P0 the evaluator's sample missed. Three P0s on the cloud surface contradict a 4. |
| Performance | 3 | 3 | — |
| Testing | 3 | 3 | Corroborated by the exerciser; every code P0/P1 lives in an untested file. Borderline 2. |
| Code quality | 3 | 3 | — |
| Documentation | 4 | **3** | The doc-auditor found moderate drift the evaluator's "candid docs" read didn't weigh: a missing gate-order state, a directory layout missing half the server, and stale Gemini-first operator instructions. |
Not a lens, but tracked separately: **StartOS packaging = BLOCKED for registry submission** (3 hard blockers), fine for `make install`.
## Disagreements & gaps
- **Disagreement on Security posture.** The evaluator rated Security 4 ("fundamentals done right"); the security-auditor and exerciser found three P0s. Both are partly right — the *fundamentals* (SQL, auth, billing trust model) genuinely are strong, but the *user-controlled-input → side-effect* paths (`library.js` filename, `audio.js` URL fetch) were the evaluator's blind spot because it sampled the big files by grep rather than reading `library.js`/`audio.js` in full. Adjusted Security to 2 to reflect the reproduced exploit.
- **Shared blind spot — the real pipeline was never run end-to-end.** No agent had a Gemini key or relay credits, so the actual summarize → `saveToHistory` → credit-debit happy path under concurrency is unverified by *anyone*. The credit-TOCTOU (P2) and the concurrency lock (P1) are reasoned from code, not observed. This is the single highest-value gap to close with a live integration test.
- **Deployment-dependent finding.** The `X-Forwarded-For` bypass (P1) can't be confirmed from this repo — it hinges on whether the recaps.cc edge proxy appends or overwrites XFF. Self-hosted StartOS (tunnel sets XFF) is unaffected.
- **Spec-checker couldn't fully verify two things read-only:** whether the local `start-cli` pack hard-fails without `instructions.md` (its `list-ingredients` omits it — ambiguous), and whether the multi-tenant actions run cleanly in single mode (needs an on-device run).
## Surprises (carried forward from every agent)
- The pipeline detects relay errors containing "Parakeet/Gemma/CUDA" (`index.js:3419`) and then forwards that exact string to cloud users two lines later — it knows the secret and leaks it anyway. (evaluator)
- The SSRF guard the codebase clearly knows it needs — it strips `::ffff:` and validates IPs for the trial cap — is entirely absent from the one place that fetches a user-controlled URL. (security-auditor)
- The billing/settlement trust model (a common place for funds bugs) is correctly server-to-server, relay-verified, and idempotent — a *positive* surprise. (security-auditor)
- The first `/api/process` after boot triggers a yt-dlp + Homebrew self-update, adding 510s latency before the request completes. (exerciser)
- AGENTS.md's server directory layout documents only a skeleton — more than half the `server/*.js` files are unlisted; it reads like it was written when the server was much smaller. (doc-auditor)
- `start-cli s9pk list-ingredients` does not include `instructions.md` even though the docs say the build fails without it, and the root `package.json` is still named `youtube-summarizer-startos`. (start9-spec-checker)
## Suggested order of work
1. **Rotate the leaked Gemini key now**, then purge `d5046a0:.env` from history (BFG/filter-repo) and force-push. Independent of all code — do it first. *(P0 secret)*
2. **Close the two reachable code P0s:** add an SSRF guard to `downloadPodcastAudio` (reject private/link-local/loopback IPs, https-only, re-validate on redirect, size/time cap — folds in the P3 cap), and validate the import session key. Export `safeFilename()` as a shared util and call it from `library.js` + the history-move/import paths (this also closes the P2 `_meta.json` pollution). *(P0 file-write + SSRF)*
3. **Fix the two P1 correctness bugs:** the ESM `require()` in `license-purchase.js`, and scope the concurrency lock per-identity (or skip it entirely) in multi-mode. *(P1)*
4. **Add a live integration test** for `/api/process` gating + library import + a happy-path summarize-with-debit (with a stub/real key) — this is the shared blind spot and the regression net for steps 23. Then close the credit TOCTOU and the SSE error-boundary scrub against that test. *(P2)*
5. **Patch dependency CVEs** (`npm audit fix`; nodemailer is a major bump) and the multi-mode server-key fallback. *(P2)*
6. **Reconcile the docs:** add the "paid cloud user" gate-order state, rewrite the operator INSTRUCTIONS.md / ABOUT.md as relay-default (not Gemini-first), fix the directory layout and the Bearer/line-count/`safeFilename` nits. *(P2/P3)*
7. **Only if submitting to the registry:** add `instructions.md`, point `packageRepo`/`upstreamRepo` at a public source repo, choose a source-available license, and remove the stale `youtube-summarizer_x86_64.s9pk`. Otherwise these are non-issues for `make install`. *(P1 conditional)*