Files

T

Keysat d4c742d6e7 Add full-evaluation report

Independent six-lens evaluation (evaluator, security-auditor, exerciser,
doc-auditor, start9-spec-checker). Surfaces three P0s on the cloud surface
and a registry-submission block; full priority queue in the file.

2026-06-14 09:39:46 -05:00

16 KiB

Raw Blame History

Evaluation — recap (Recaps) — 2026-06-13

Intent: Recaps is a YouTube + podcast summarizer and personal library, served as a single-page vanilla-JS app from a Node.js (ES module) backend — shipping both as a single-mode StartOS .s9pk self-host package and as the multi-tenant recaps.cc cloud (SQLite, credit-metered, talking to a sibling "relay" service).

Agents run: evaluator, security-auditor, exerciser, doc-auditor, start9-spec-checker. Reviewer skipped (working tree clean — no diff to review). All five completed; none failed.

Verdict

The pure-logic core is genuinely well-built — 103 passing tests, parameterized SQL everywhere, scrypt + timingSafeEqual auth, hashed single-use magic-link tokens, server-derived tenant identity that resists X-Recap-User-Id spoofing, and a correctly server-to-server billing/settlement path. But three independent agents surfaced three P0 issues that the test suite can't see because the riskiest files are untested: an arbitrary file write via /api/library/import (reproduced live — ../../ in a session key escapes the scope dir), an SSRF-with-read-back in the podcast download path (reachable by an anonymous trial), and a live, still-active Gemini API key committed to git history. On top of that the multi-tenant claim is undercut by a process-global lock that serializes every concurrent cloud summarize to one-at-a-time. None of these is hard to fix, but the cloud is not ready for untrusted users until the three P0s are closed and the key is rotated. Self-host single-mode is in much better shape (the file-write and SSRF matter far less with one trusted operator), but a StartOS community-registry submission is BLOCKED on packaging gaps regardless.

Cross-referenced findings (corroboration / merges)

SSE error-boundary leak — TWO agents, ONE finding. The evaluator (index.js:3432, :3003, :4246) and the security-auditor (relay.js:135-144 → index.js:~4246) independently flagged that relay backend errors are forwarded to cloud users verbatim, violating the AGENTS.md "scrub operator-internal language" contract. The evaluator's surprise sharpens it: the pipeline's own regex at index.js:3419 detects "Parakeet/Gemma/CUDA" in the error string and then forwards that exact string two lines later. One P2, two evidence kinds.
The file-write P0 and a doc finding share a root cause. The exerciser reproduced arbitrary file write in library.js because the session key is used as a filename without safeFilename(). The doc-auditor independently found that safeFilename() is module-private in history.js, not an exported shared util — which is why library.js couldn't reuse it. AGENTS.md tells contributors "safeFilename() already exists — use it," but it isn't importable. Fix the code and the doc together.
The concurrency lock cuts both ways. The evaluator's P1 (global currentFreeJob lock serializes all tenants) and the security-auditor's P2 (concurrent credit over-spend when the lock is absent, i.e. on licensed installs) are two faces of the same isFreeUser()/currentFreeJob mechanism: when the lock applies it breaks multi-tenant concurrency; when it doesn't, it lets a 1-credit user fan out N parallel requests before any debit lands.
"Riskiest files are untested" — evaluator + exerciser agree. Both note zero coverage on /api/process gating, relay.js, tenant-auth.js, and billing. Every P0/P1 code bug above lives in an untested file. The exerciser further notes the real summarize→save→debit happy path could not be run end-to-end (no Gemini key / relay credits) — a shared blind spot, below.
Secrets — evaluator + security-auditor. Both flag cookies.txt as sensitive plaintext in the repo root; the security-auditor escalates with the far more serious committed-and-active Gemini key in history d5046a0:.env. Working tree itself is clean.
Version-file sprawl — evaluator + spec-checker. Both independently noted 175 StartOS version files with only ~3 carrying real migrations.

Priority queue

P0 — block the cloud; fix before any untrusted use

[P0] Arbitrary file write via /api/library/import — ../../ in a session key escapes scope; reproduced writing /tmp/rce_test.json & recap/evil.json — server/library.js:131-139 (key used verbatim, no safeFilename()) — exerciser (confirmed in single-mode; confirm multi-mode tenant reachability — it's per-scope so any authed tenant should reach it)
[P0] SSRF with read-back via podcast URL — anon trial can POST {type:"podcast", url:"http://169.254.169.254/..."}; unguarded http.get follows redirects to any host and the body is transcribed back to the attacker — server/audio.js:78-97 (downloadPodcastAudio), gate server/index.js:2747, reached :3455 — security-auditor
[P0] Live Gemini API key committed to git history and still the active key — git show d5046a0:.env, pushed to origin/master — security-auditor (rotate now; purge from history)

P1 — correctness / availability / submission blockers

[P1] require("crypto") inside an ESM module throws ReferenceError on the anon license-purchase settle path — server/license-purchase.js:423 (called from :353-354) — evaluator
[P1] Global single-flight currentFreeJob lock serializes the entire multi-tenant cloud (a 2nd concurrent /api/process from any user gets 409); isFreeUser() returns true for every tenant in multi-mode — server/index.js:2621, server/license-middleware.js:143 — evaluator
[P1] Trial IP-cap + magic-link rate-limit bypass via spoofed X-Forwarded-For (no trust proxy / XFF normalization) — unlimited free trial credits if the edge proxy appends rather than replaces XFF — server/anon-trial.js:50-57, server/auth-routes.js:209 — security-auditor (deployment-dependent on the recaps.cc proxy)
[P1] StartOS community-registry submission BLOCKED (3 blockers): missing root instructions.md; packageRepo/upstreamRepo point to https://ten31.xyz (a homepage, not a source repo); license: 'Proprietary' fails the "source available" gate — startos/manifest/index.ts, repo root — start9-spec-checker (only blocks registry submission; does NOT affect make install)

P2 — real risk / quality, not release-blocking for self-host

[P2] Operator-internal language (Parakeet/Gemma/CUDA/LAN IPs/*.local) leaks to cloud users at the SSE error boundary; no scrub exists — server/index.js:3432,3003,4246 + server/providers/relay.js:135-144 — evaluator + security-auditor
[P2] Concurrent credit over-spend (TOCTOU) on licensed installs — N parallel requests pass the total>0 check before any blind debitOne lands — gate server/index.js:2497-2550 vs debit :3158,:4197 (anon-trial.js:299) — security-auditor
[P2] Multi-mode tenant can spend the operator's server Gemini key directly (bypasses relay metering) by selecting transcriptionProvider:"gemini" with an empty key — server/providers/index.js:104-114 — security-auditor
[P2] GET /api/history reads + JSON.parses every full session file (transcript+summary, MB each) just to list ~8 metadata fields — server/history.js:418-437 — evaluator
[P2] Dependency CVEs: nodemailer 6.10.1 (high — SMTP injection/DoS, low practical reach here), ws 8.20.0, qs/express, protobufjs 7.5.6 (moderate) — npm audit in server/ — security-auditor
[P2] Unsanitized IDs persisted into _meta.json via array-form import and PUT /api/history/move (corrupts the listing; no direct file-path escape) — server/library.js, history move handler — exerciser
[P2] PUT /api/history/meta accepts arbitrary JSON shapes with no schema (a wrong-typed body, e.g. {"folders":"not-an-array"}, can break the listing) — exerciser
[P2] server/index.js is 4351 lines mixing routing, the full /api/process pipeline, yt-dlp orchestration, and SSE (:2463-4270) — evaluator
[P2] Zero tests on the highest-risk files (/api/process gating, relay.js, tenant-auth.js, billing) — every code P0/P1 above lives here — evaluator + exerciser

P2/P3 — documentation drift (doc-auditor)

[P2] Documented credit-gate order in AGENTS.md omits the "paid cloud user" bypass state (between license and free-tenant) — AGENTS.md:77 vs server/index.js:2464-2472 — doc-auditor
[P2] Operator-facing copy is stale Gemini-first / Gemini-only; relay is the default provider — startos-registry/packages/recap/INSTRUCTIONS.md:5,23, assets/ABOUT.md:3,18 — doc-auditor
[P3] AGENTS.md directory layout omits ~25 server modules (audio, library, credits-purchase, tts-routes, relay-default/-capabilities/-state, url-resolver, ytdlp, …) — AGENTS.md:34-60 — doc-auditor
[P3] relay-client.md Authorization header missing the Bearer scheme — says Authorization: <license>, code sends Authorization: Bearer <license_key> — docs/guides/relay-client.md:17 (== .claude/rules/relay-client.md:17) — doc-auditor
[P3] index.html line count stated ~10k, actually 12,515 — AGENTS.md:51 — doc-auditor
[P3] AGENTS.md safeFilename() convention implies an importable util, but it's module-private in history.js — AGENTS.md:79 — doc-auditor (see P0 file-write)

P3 — hardening / hygiene

[P3] 100MB JSON body limit on all routes is a cheap memory-exhaustion lever for unauthenticated /api/process — server/index.js:203 — evaluator
[P3] /api/credits/claim: a leaked anon BTCPay invoice ID lets any signed-in account claim those credits — server/credits-purchase.js:342 — security-auditor
[P3] downloadPodcastAudio has no size/time cap (disk/memory fill) — server/audio.js — security-auditor
[P3] Container runs as root (no USER in Dockerfile) — acceptable under StartOS isolation, add non-root for the cloud image — security-auditor
[P3] In-memory auth rate-limit buckets reset on restart — server/auth-routes.js:106 — security-auditor + exerciser
[P3] Stale youtube-summarizer_x86_64.s9pk (~223MB) + root package.json still named youtube-summarizer-startos — leftover from the package-ID rename — start9-spec-checker
[P3] Manifest docsUrls: [] empty; multi-tenant cloud actions exposed in the single-mode StartOS menu must be verified to run cleanly (enableMultiTenantMode et al.) — startos/actions/index.ts — start9-spec-checker
[P3] cookies.txt (119KB, sensitive yt-dlp plaintext) sits in the repo root and is expiring imminently (/api/health already reports fileExpiring:true) — correctly gitignored — evaluator + exerciser
[P3] 175 StartOS version files, only ~3 with real migrations (172 empty up/down stubs) — maintenance surface — evaluator + start9-spec-checker

Scorecard

The evaluator's six-lens table, with two lenses adjusted on cross-agent evidence:

Lens	Evaluator	Adjusted	Why adjusted
Architecture	3	3	—
Security	4	2	The evaluator sampled `index.js`/`relay.js` via grep and found no P0; the exerciser (file write, reproduced) and security-auditor (SSRF read-back + committed live key) each found a P0 the evaluator's sample missed. Three P0s on the cloud surface contradict a 4.
Performance	3	3	—
Testing	3	3	Corroborated by the exerciser; every code P0/P1 lives in an untested file. Borderline 2.
Code quality	3	3	—
Documentation	4	3	The doc-auditor found moderate drift the evaluator's "candid docs" read didn't weigh: a missing gate-order state, a directory layout missing half the server, and stale Gemini-first operator instructions.

Not a lens, but tracked separately: StartOS packaging = BLOCKED for registry submission (3 hard blockers), fine for make install.

Disagreements & gaps

Disagreement on Security posture. The evaluator rated Security 4 ("fundamentals done right"); the security-auditor and exerciser found three P0s. Both are partly right — the fundamentals (SQL, auth, billing trust model) genuinely are strong, but the user-controlled-input → side-effect paths (library.js filename, audio.js URL fetch) were the evaluator's blind spot because it sampled the big files by grep rather than reading library.js/audio.js in full. Adjusted Security to 2 to reflect the reproduced exploit.
Shared blind spot — the real pipeline was never run end-to-end. No agent had a Gemini key or relay credits, so the actual summarize → saveToHistory → credit-debit happy path under concurrency is unverified by anyone. The credit-TOCTOU (P2) and the concurrency lock (P1) are reasoned from code, not observed. This is the single highest-value gap to close with a live integration test.
Deployment-dependent finding. The X-Forwarded-For bypass (P1) can't be confirmed from this repo — it hinges on whether the recaps.cc edge proxy appends or overwrites XFF. Self-hosted StartOS (tunnel sets XFF) is unaffected.
Spec-checker couldn't fully verify two things read-only: whether the local start-cli pack hard-fails without instructions.md (its list-ingredients omits it — ambiguous), and whether the multi-tenant actions run cleanly in single mode (needs an on-device run).

Surprises (carried forward from every agent)

The pipeline detects relay errors containing "Parakeet/Gemma/CUDA" (index.js:3419) and then forwards that exact string to cloud users two lines later — it knows the secret and leaks it anyway. (evaluator)
The SSRF guard the codebase clearly knows it needs — it strips ::ffff: and validates IPs for the trial cap — is entirely absent from the one place that fetches a user-controlled URL. (security-auditor)
The billing/settlement trust model (a common place for funds bugs) is correctly server-to-server, relay-verified, and idempotent — a positive surprise. (security-auditor)
The first /api/process after boot triggers a yt-dlp + Homebrew self-update, adding 5–10s latency before the request completes. (exerciser)
AGENTS.md's server directory layout documents only a skeleton — more than half the server/*.js files are unlisted; it reads like it was written when the server was much smaller. (doc-auditor)
start-cli s9pk list-ingredients does not include instructions.md even though the docs say the build fails without it, and the root package.json is still named youtube-summarizer-startos. (start9-spec-checker)

Suggested order of work

Rotate the leaked Gemini key now, then purge d5046a0:.env from history (BFG/filter-repo) and force-push. Independent of all code — do it first. (P0 secret)
Close the two reachable code P0s: add an SSRF guard to downloadPodcastAudio (reject private/link-local/loopback IPs, https-only, re-validate on redirect, size/time cap — folds in the P3 cap), and validate the import session key. Export safeFilename() as a shared util and call it from library.js + the history-move/import paths (this also closes the P2 _meta.json pollution). (P0 file-write + SSRF)
Fix the two P1 correctness bugs: the ESM require() in license-purchase.js, and scope the concurrency lock per-identity (or skip it entirely) in multi-mode. (P1)
Add a live integration test for /api/process gating + library import + a happy-path summarize-with-debit (with a stub/real key) — this is the shared blind spot and the regression net for steps 2–3. Then close the credit TOCTOU and the SSE error-boundary scrub against that test. (P2)
Patch dependency CVEs (npm audit fix; nodemailer is a major bump) and the multi-mode server-key fallback. (P2)
Reconcile the docs: add the "paid cloud user" gate-order state, rewrite the operator INSTRUCTIONS.md / ABOUT.md as relay-default (not Gemini-first), fix the directory layout and the Bearer/line-count/safeFilename nits. (P2/P3)
Only if submitting to the registry: add instructions.md, point packageRepo/upstreamRepo at a public source repo, choose a source-available license, and remove the stale youtube-summarizer_x86_64.s9pk. Otherwise these are non-issues for make install. (P1 conditional)

16 KiB Raw Blame History Unescape Escape