Files
recap/EVALUATION.md
T
Keysat d4c742d6e7 Add full-evaluation report
Independent six-lens evaluation (evaluator, security-auditor, exerciser,
doc-auditor, start9-spec-checker). Surfaces three P0s on the cloud surface
and a registry-submission block; full priority queue in the file.
2026-06-14 09:39:46 -05:00

16 KiB
Raw Blame History

Evaluation — recap (Recaps) — 2026-06-13

Intent: Recaps is a YouTube + podcast summarizer and personal library, served as a single-page vanilla-JS app from a Node.js (ES module) backend — shipping both as a single-mode StartOS .s9pk self-host package and as the multi-tenant recaps.cc cloud (SQLite, credit-metered, talking to a sibling "relay" service).

Agents run: evaluator, security-auditor, exerciser, doc-auditor, start9-spec-checker. Reviewer skipped (working tree clean — no diff to review). All five completed; none failed.

Verdict

The pure-logic core is genuinely well-built — 103 passing tests, parameterized SQL everywhere, scrypt + timingSafeEqual auth, hashed single-use magic-link tokens, server-derived tenant identity that resists X-Recap-User-Id spoofing, and a correctly server-to-server billing/settlement path. But three independent agents surfaced three P0 issues that the test suite can't see because the riskiest files are untested: an arbitrary file write via /api/library/import (reproduced live — ../../ in a session key escapes the scope dir), an SSRF-with-read-back in the podcast download path (reachable by an anonymous trial), and a live, still-active Gemini API key committed to git history. On top of that the multi-tenant claim is undercut by a process-global lock that serializes every concurrent cloud summarize to one-at-a-time. None of these is hard to fix, but the cloud is not ready for untrusted users until the three P0s are closed and the key is rotated. Self-host single-mode is in much better shape (the file-write and SSRF matter far less with one trusted operator), but a StartOS community-registry submission is BLOCKED on packaging gaps regardless.

Cross-referenced findings (corroboration / merges)

  • SSE error-boundary leak — TWO agents, ONE finding. The evaluator (index.js:3432, :3003, :4246) and the security-auditor (relay.js:135-144index.js:~4246) independently flagged that relay backend errors are forwarded to cloud users verbatim, violating the AGENTS.md "scrub operator-internal language" contract. The evaluator's surprise sharpens it: the pipeline's own regex at index.js:3419 detects "Parakeet/Gemma/CUDA" in the error string and then forwards that exact string two lines later. One P2, two evidence kinds.
  • The file-write P0 and a doc finding share a root cause. The exerciser reproduced arbitrary file write in library.js because the session key is used as a filename without safeFilename(). The doc-auditor independently found that safeFilename() is module-private in history.js, not an exported shared util — which is why library.js couldn't reuse it. AGENTS.md tells contributors "safeFilename() already exists — use it," but it isn't importable. Fix the code and the doc together.
  • The concurrency lock cuts both ways. The evaluator's P1 (global currentFreeJob lock serializes all tenants) and the security-auditor's P2 (concurrent credit over-spend when the lock is absent, i.e. on licensed installs) are two faces of the same isFreeUser()/currentFreeJob mechanism: when the lock applies it breaks multi-tenant concurrency; when it doesn't, it lets a 1-credit user fan out N parallel requests before any debit lands.
  • "Riskiest files are untested" — evaluator + exerciser agree. Both note zero coverage on /api/process gating, relay.js, tenant-auth.js, and billing. Every P0/P1 code bug above lives in an untested file. The exerciser further notes the real summarize→save→debit happy path could not be run end-to-end (no Gemini key / relay credits) — a shared blind spot, below.
  • Secrets — evaluator + security-auditor. Both flag cookies.txt as sensitive plaintext in the repo root; the security-auditor escalates with the far more serious committed-and-active Gemini key in history d5046a0:.env. Working tree itself is clean.
  • Version-file sprawl — evaluator + spec-checker. Both independently noted 175 StartOS version files with only ~3 carrying real migrations.

Priority queue

P0 — block the cloud; fix before any untrusted use

  • [P0] Arbitrary file write via /api/library/import../../ in a session key escapes scope; reproduced writing /tmp/rce_test.json & recap/evil.jsonserver/library.js:131-139 (key used verbatim, no safeFilename()) — exerciser (confirmed in single-mode; confirm multi-mode tenant reachability — it's per-scope so any authed tenant should reach it)
  • [P0] SSRF with read-back via podcast URL — anon trial can POST {type:"podcast", url:"http://169.254.169.254/..."}; unguarded http.get follows redirects to any host and the body is transcribed back to the attacker — server/audio.js:78-97 (downloadPodcastAudio), gate server/index.js:2747, reached :3455security-auditor
  • [P0] Live Gemini API key committed to git history and still the active key — git show d5046a0:.env, pushed to origin/mastersecurity-auditor (rotate now; purge from history)

P1 — correctness / availability / submission blockers

  • [P1] require("crypto") inside an ESM module throws ReferenceError on the anon license-purchase settle path — server/license-purchase.js:423 (called from :353-354) — evaluator
  • [P1] Global single-flight currentFreeJob lock serializes the entire multi-tenant cloud (a 2nd concurrent /api/process from any user gets 409); isFreeUser() returns true for every tenant in multi-mode — server/index.js:2621, server/license-middleware.js:143evaluator
  • [P1] Trial IP-cap + magic-link rate-limit bypass via spoofed X-Forwarded-For (no trust proxy / XFF normalization) — unlimited free trial credits if the edge proxy appends rather than replaces XFF — server/anon-trial.js:50-57, server/auth-routes.js:209security-auditor (deployment-dependent on the recaps.cc proxy)
  • [P1] StartOS community-registry submission BLOCKED (3 blockers): missing root instructions.md; packageRepo/upstreamRepo point to https://ten31.xyz (a homepage, not a source repo); license: 'Proprietary' fails the "source available" gate — startos/manifest/index.ts, repo root — start9-spec-checker (only blocks registry submission; does NOT affect make install)

P2 — real risk / quality, not release-blocking for self-host

  • [P2] Operator-internal language (Parakeet/Gemma/CUDA/LAN IPs/*.local) leaks to cloud users at the SSE error boundary; no scrub exists — server/index.js:3432,3003,4246 + server/providers/relay.js:135-144evaluator + security-auditor
  • [P2] Concurrent credit over-spend (TOCTOU) on licensed installs — N parallel requests pass the total>0 check before any blind debitOne lands — gate server/index.js:2497-2550 vs debit :3158,:4197 (anon-trial.js:299) — security-auditor
  • [P2] Multi-mode tenant can spend the operator's server Gemini key directly (bypasses relay metering) by selecting transcriptionProvider:"gemini" with an empty key — server/providers/index.js:104-114security-auditor
  • [P2] GET /api/history reads + JSON.parses every full session file (transcript+summary, MB each) just to list ~8 metadata fields — server/history.js:418-437evaluator
  • [P2] Dependency CVEs: nodemailer 6.10.1 (high — SMTP injection/DoS, low practical reach here), ws 8.20.0, qs/express, protobufjs 7.5.6 (moderate) — npm audit in server/security-auditor
  • [P2] Unsanitized IDs persisted into _meta.json via array-form import and PUT /api/history/move (corrupts the listing; no direct file-path escape) — server/library.js, history move handler — exerciser
  • [P2] PUT /api/history/meta accepts arbitrary JSON shapes with no schema (a wrong-typed body, e.g. {"folders":"not-an-array"}, can break the listing) — exerciser
  • [P2] server/index.js is 4351 lines mixing routing, the full /api/process pipeline, yt-dlp orchestration, and SSE (:2463-4270) — evaluator
  • [P2] Zero tests on the highest-risk files (/api/process gating, relay.js, tenant-auth.js, billing) — every code P0/P1 above lives here — evaluator + exerciser

P2/P3 — documentation drift (doc-auditor)

  • [P2] Documented credit-gate order in AGENTS.md omits the "paid cloud user" bypass state (between license and free-tenant) — AGENTS.md:77 vs server/index.js:2464-2472doc-auditor
  • [P2] Operator-facing copy is stale Gemini-first / Gemini-only; relay is the default provider — startos-registry/packages/recap/INSTRUCTIONS.md:5,23, assets/ABOUT.md:3,18doc-auditor
  • [P3] AGENTS.md directory layout omits ~25 server modules (audio, library, credits-purchase, tts-routes, relay-default/-capabilities/-state, url-resolver, ytdlp, …) — AGENTS.md:34-60doc-auditor
  • [P3] relay-client.md Authorization header missing the Bearer scheme — says Authorization: <license>, code sends Authorization: Bearer <license_key>docs/guides/relay-client.md:17 (== .claude/rules/relay-client.md:17) — doc-auditor
  • [P3] index.html line count stated ~10k, actually 12,515 — AGENTS.md:51doc-auditor
  • [P3] AGENTS.md safeFilename() convention implies an importable util, but it's module-private in history.jsAGENTS.md:79doc-auditor (see P0 file-write)

P3 — hardening / hygiene

  • [P3] 100MB JSON body limit on all routes is a cheap memory-exhaustion lever for unauthenticated /api/processserver/index.js:203evaluator
  • [P3] /api/credits/claim: a leaked anon BTCPay invoice ID lets any signed-in account claim those credits — server/credits-purchase.js:342security-auditor
  • [P3] downloadPodcastAudio has no size/time cap (disk/memory fill) — server/audio.jssecurity-auditor
  • [P3] Container runs as root (no USER in Dockerfile) — acceptable under StartOS isolation, add non-root for the cloud image — security-auditor
  • [P3] In-memory auth rate-limit buckets reset on restart — server/auth-routes.js:106security-auditor + exerciser
  • [P3] Stale youtube-summarizer_x86_64.s9pk (~223MB) + root package.json still named youtube-summarizer-startos — leftover from the package-ID rename — start9-spec-checker
  • [P3] Manifest docsUrls: [] empty; multi-tenant cloud actions exposed in the single-mode StartOS menu must be verified to run cleanly (enableMultiTenantMode et al.) — startos/actions/index.tsstart9-spec-checker
  • [P3] cookies.txt (119KB, sensitive yt-dlp plaintext) sits in the repo root and is expiring imminently (/api/health already reports fileExpiring:true) — correctly gitignored — evaluator + exerciser
  • [P3] 175 StartOS version files, only ~3 with real migrations (172 empty up/down stubs) — maintenance surface — evaluator + start9-spec-checker

Scorecard

The evaluator's six-lens table, with two lenses adjusted on cross-agent evidence:

Lens Evaluator Adjusted Why adjusted
Architecture 3 3
Security 4 2 The evaluator sampled index.js/relay.js via grep and found no P0; the exerciser (file write, reproduced) and security-auditor (SSRF read-back + committed live key) each found a P0 the evaluator's sample missed. Three P0s on the cloud surface contradict a 4.
Performance 3 3
Testing 3 3 Corroborated by the exerciser; every code P0/P1 lives in an untested file. Borderline 2.
Code quality 3 3
Documentation 4 3 The doc-auditor found moderate drift the evaluator's "candid docs" read didn't weigh: a missing gate-order state, a directory layout missing half the server, and stale Gemini-first operator instructions.

Not a lens, but tracked separately: StartOS packaging = BLOCKED for registry submission (3 hard blockers), fine for make install.

Disagreements & gaps

  • Disagreement on Security posture. The evaluator rated Security 4 ("fundamentals done right"); the security-auditor and exerciser found three P0s. Both are partly right — the fundamentals (SQL, auth, billing trust model) genuinely are strong, but the user-controlled-input → side-effect paths (library.js filename, audio.js URL fetch) were the evaluator's blind spot because it sampled the big files by grep rather than reading library.js/audio.js in full. Adjusted Security to 2 to reflect the reproduced exploit.
  • Shared blind spot — the real pipeline was never run end-to-end. No agent had a Gemini key or relay credits, so the actual summarize → saveToHistory → credit-debit happy path under concurrency is unverified by anyone. The credit-TOCTOU (P2) and the concurrency lock (P1) are reasoned from code, not observed. This is the single highest-value gap to close with a live integration test.
  • Deployment-dependent finding. The X-Forwarded-For bypass (P1) can't be confirmed from this repo — it hinges on whether the recaps.cc edge proxy appends or overwrites XFF. Self-hosted StartOS (tunnel sets XFF) is unaffected.
  • Spec-checker couldn't fully verify two things read-only: whether the local start-cli pack hard-fails without instructions.md (its list-ingredients omits it — ambiguous), and whether the multi-tenant actions run cleanly in single mode (needs an on-device run).

Surprises (carried forward from every agent)

  • The pipeline detects relay errors containing "Parakeet/Gemma/CUDA" (index.js:3419) and then forwards that exact string to cloud users two lines later — it knows the secret and leaks it anyway. (evaluator)
  • The SSRF guard the codebase clearly knows it needs — it strips ::ffff: and validates IPs for the trial cap — is entirely absent from the one place that fetches a user-controlled URL. (security-auditor)
  • The billing/settlement trust model (a common place for funds bugs) is correctly server-to-server, relay-verified, and idempotent — a positive surprise. (security-auditor)
  • The first /api/process after boot triggers a yt-dlp + Homebrew self-update, adding 510s latency before the request completes. (exerciser)
  • AGENTS.md's server directory layout documents only a skeleton — more than half the server/*.js files are unlisted; it reads like it was written when the server was much smaller. (doc-auditor)
  • start-cli s9pk list-ingredients does not include instructions.md even though the docs say the build fails without it, and the root package.json is still named youtube-summarizer-startos. (start9-spec-checker)

Suggested order of work

  1. Rotate the leaked Gemini key now, then purge d5046a0:.env from history (BFG/filter-repo) and force-push. Independent of all code — do it first. (P0 secret)
  2. Close the two reachable code P0s: add an SSRF guard to downloadPodcastAudio (reject private/link-local/loopback IPs, https-only, re-validate on redirect, size/time cap — folds in the P3 cap), and validate the import session key. Export safeFilename() as a shared util and call it from library.js + the history-move/import paths (this also closes the P2 _meta.json pollution). (P0 file-write + SSRF)
  3. Fix the two P1 correctness bugs: the ESM require() in license-purchase.js, and scope the concurrency lock per-identity (or skip it entirely) in multi-mode. (P1)
  4. Add a live integration test for /api/process gating + library import + a happy-path summarize-with-debit (with a stub/real key) — this is the shared blind spot and the regression net for steps 23. Then close the credit TOCTOU and the SSE error-boundary scrub against that test. (P2)
  5. Patch dependency CVEs (npm audit fix; nodemailer is a major bump) and the multi-mode server-key fallback. (P2)
  6. Reconcile the docs: add the "paid cloud user" gate-order state, rewrite the operator INSTRUCTIONS.md / ABOUT.md as relay-default (not Gemini-first), fix the directory layout and the Bearer/line-count/safeFilename nits. (P2/P3)
  7. Only if submitting to the registry: add instructions.md, point packageRepo/upstreamRepo at a public source repo, choose a source-available license, and remove the stale youtube-summarizer_x86_64.s9pk. Otherwise these are non-issues for make install. (P1 conditional)