v1.1.0:8 — admin-gate whole-DB routes + AI custom-URL providers; SSRF guard

Multi-user authorization hardening from a full security evaluation (EVALUATION.md): - P0: /api/settings/{export,import}-db are now admin-only. Previously any signed-in user could download the whole instance DB (all bcrypt hashes + plaintext AI keys) or replace it wholesale. Per-user CSV export/import stays open. - AI custom-URL providers (Ollama, OpenAI-compatible) are now admin-only, and every server fetch to a user-supplied URL passes through assertSafeProviderUrl (blocks link-local/cloud-metadata; private LAN allowed by design). Fixed-URL cloud providers stay per-user. Removed the dead legacy /api/ai/config route. - Dev: fixed broken quick-start (added npm run create-admin; rewrote README; dropped dead CLAUDE_API_KEY) and the export-db 0-byte path resolution (resolveDatabasePath now matches Prisma). ExVer bumped to 1.1.0:8 (no schema/data migration). Tests 197 pass, build green, tsc clean.
2026-06-12 23:15:09 -05:00
parent 09eeef249d
commit 988a3cca9a
30 changed files with 815 additions and 195 deletions
@@ -0,0 +1,78 @@
+# Evaluation — proof-of-work (Workout-log) — 2026-06-13
+
+Intent: A self-hosted, multi-user workout planner and logger (Next.js 14 App Router + server actions/SSE, Prisma/SQLite, bcrypt auth) with an AI program-suggestion subsystem (5 LLM providers, background generation), packaged as a StartOS 0.4 s9pk.
+
+Agents run: evaluator, security-auditor, exerciser, start9-spec-checker. (reviewer skipped — working tree is clean, no uncommitted diff to review.)
+
+## Verdict
+
+This is a well-built, genuinely sophisticated app — clean provider abstraction, careful idempotent StartOS migrations, 177 passing tests, green build/tsc/lint, and consistent per-object authorization across the domain routes. But it ships **one release-blocking multi-tenant breach with three independent confirmations**: `/api/settings/export-db` and `/api/settings/import-db` operate on the *entire instance database* yet are gated only by "is logged in," so any ordinary user can download every other user's data plus all bcrypt hashes and plaintext AI keys, or overwrite the whole DB to mint themselves an admin. It is harmless in single-user mode and catastrophic the moment a second user exists — which is a first-class advertised feature. Below that sit a real authenticated SSRF, a vulnerable Next.js version, several uncaught-input 500s, and a broken README quick-start. Community-registry packaging is separately BLOCKED on four items, but the user has parked registry submission, so those are deferred rather than urgent.
+
+## Cross-referenced findings
+
+- **Whole-instance DB export/import un-gated (the P0).** Flagged by all three code agents: evaluator (two P0s), security-auditor (two P0s, with the import path traced to full admin takeover via an injected known-hash User row), and exerciser (observed `/api/settings/export-db` returns 200 to a non-admin, plus a separate dev-only 0-byte bug at the same endpoint). One root cause, two endpoints, three kinds of evidence → kept as the top two P0s.
+- **SSRF via user-controlled provider `baseUrl`.** Both evaluator (P2) and security-auditor (P1) independently found `lib/ai/providers/{ollama,openai}.ts` fetch an arbitrary URL reachable from `/api/ai/test` and `/api/ai/ollama/models` with no private-range blocking. Merged; severity reconciled to P1 (see Disagreements).
+- **Login enumeration / timing + CSP `unsafe-eval`.** Both evaluator and security-auditor noted the unknown-email login path lacks a dummy bcrypt compare (timing oracle) and that the shipped CSP allows `unsafe-eval` while the in-repo comment only justifies `unsafe-inline`. Merged into single P3s.
+- **`/api/health` info disclosure.** Security-auditor (P3) and exerciser both noted the health endpoint returns user count + signupsOpen unauthenticated. Merged.
+- **Stale install alert / "0.3.5 data snapshot" copy.** start9-spec-checker flagged this in both the manifest install alert and the long description; the evaluator separately confirmed the seed ships zero users. Merged into one packaging item.
+
+## Priority queue
+
+- [P0] Any authenticated user can GET `/api/settings/export-db` and download the whole instance DB (all bcrypt hashes + plaintext AI API keys) — `app/api/settings/export-db/route.ts:15`, UI shown to all at `components/settings/SettingsForm.tsx:260`, `app/main/settings/page.tsx:33` — evaluator, security-auditor, exerciser
+- [P0] Any authenticated user can POST `/api/settings/import-db` and replace the whole instance DB → admin takeover + data destruction — `app/api/settings/import-db/route.ts:18`, replace at `:135` — evaluator, security-auditor
+- [P1] Authenticated SSRF via attacker-controlled provider `baseUrl` (probe LAN / Start9 services / metadata) — `lib/ai/providers/openai.ts:24`, `lib/ai/providers/ollama.ts:22`, reached via `app/api/ai/test/route.ts:66` and `app/api/ai/ollama/models/route.ts:39` — security-auditor (P1), evaluator (P2)
+- [P1] Vulnerable Next.js 14.2.35 (RSC DoS, WS-upgrade SSRF, App Router XSS, cache-poisoning) — `proof-of-work/package-lock.json`; fixes land in 15.5.16+ (major bump, test first) — security-auditor
+- [P1] README quick-start login is broken — `admin@local` fails Zod `.email()` (no TLD) → HTTP 400; seed ships zero users so fresh clone cannot log in via documented creds — `proof-of-work/README.md:23` — exerciser
+- [P1] `GET /api/settings/export-db` returns a 0-byte file in dev — `resolveDatabasePath()` picks the empty `data/app.db` created by `prisma db push` over the live `prisma/data/app.db` — `proof-of-work/lib/db-file.ts` — exerciser
+- [P2] No rate limiting on `POST /api/auth` (raw API endpoint) — credential brute-force bypasses the UI server-action's 10/15min cap — `app/api/auth/route.ts` — exerciser
+- [P2] Login/signup rate limit defeatable via spoofed leftmost `X-Forwarded-For` — `lib/rateLimit.ts:53` (mitigated only if the StartOS proxy overwrites XFF) — security-auditor
+- [P2] Invalid `date` string in `POST /api/workouts` → HTTP 500 instead of 400 (Prisma throws, generic catch) — `app/api/workouts/route.ts` — exerciser
+- [P2] Malformed JSON body → HTTP 500 instead of 400 (`request.json()` SyntaxError uncaught; also `/api/import/exercises/seed`) — exerciser
+- [P2] Negative pagination `offset=-5` on `GET /api/workouts` → HTTP 500 (Prisma rejects negative skip) — exerciser
+- [P2] Container runs as root — no `USER nextjs` directive before entrypoint, maximizing RCE blast radius — `start9/0.4/Dockerfile` — security-auditor
+- [P2] Packaging blocker: `license` is `"Proprietary"`, not a valid SPDX id; `LICENSE` file mismatches — `start9/0.4/startos/manifest/index.ts:23`, `start9/0.4/LICENSE` — start9-spec-checker
+- [P2] Packaging blocker: `instructions.md` absent (required; build fails / user-facing in UI) — `start9/0.4/` — start9-spec-checker
+- [P2] Packaging blocker: `packageRepo` + `upstreamRepo` both 404 (`github.com/keysat-xyz/proof-of-work`) — `start9/0.4/startos/manifest/index.ts:23-24` — start9-spec-checker
+- [P2] Packaging blocker: stale install alert + long description claim a "one-time 0.3.5 /data snapshot baked into the image" that no longer happens — `start9/0.4/startos/manifest/i18n.ts:22-33` — start9-spec-checker
+- [P3] Login timing oracle — unknown-email path returns early with no dummy bcrypt compare, leaking account existence — `app/auth/login/actions.ts:27` — security-auditor, evaluator
+- [P3] CSP ships `script-src 'unsafe-eval'` while the in-repo comment only justifies `'unsafe-inline'` — `proof-of-work/next.config.js:19` — evaluator, security-auditor
+- [P3] `/api/health` returns user count + `signupsOpen` to unauthenticated callers — `app/api/health/route.ts:44` — security-auditor, exerciser
+- [P3] Rate-limit map never evicts fully-expired keys — unbounded distinct-IP growth over process lifetime — `lib/rateLimit.ts:24` — evaluator
+- [P3] `workout PATCH` / `sets POST` accept `exerciseId` without verifying caller ownership (unlike `programs PATCH`) — `app/api/workouts/[id]/route.ts:128`, `app/api/workouts/[id]/sets/route.ts:61` — security-auditor
+- [P3] Session tokens valid 30 days, no idle timeout, no rotation on privilege change — `lib/auth.ts` — security-auditor
+- [P3] No max-length validation on any text field (10KB/100KB names accepted) — exerciser
+- [P3] `defaultRestSeconds` silently dropped by `POST /api/preferences` though it appears in Settings UI — exerciser
+- [P3] WAL not enabled in local dev (health-check warns `journal_mode='delete'`) — degraded backup safety for dev only — exerciser
+- [P3] Stale `.env.example` lists dead `CLAUDE_API_KEY` (keys live per-user in DB) — `proof-of-work/.env.example` — evaluator
+- [P3] Drifted docs: AGENTS.md/CHANGELOG say "34 tests"; suite is now 177 — evaluator
+- [P3] Git history (initial commit `1b64c45`) contains a committed SQLite DB with a default admin bcrypt hash — purge before ever making history public — security-auditor
+- [P3] Packaging warnings: icon is PNG vs template SVG; README is migration-era prose missing spec sections; no `.github/workflows/`; `docsUrls` points at generic Start9 docs; Dockerfile on Node 20 vs LTS 22 — start9-spec-checker
+- [P3] Repo cruft: legacy `start9/0.4/workout-log_x86_64.s9pk` artifact not removed by `make clean`; `bcryptjs` listed in `start9/0.4/package.json` appears unused (app uses native `bcrypt`) — start9-spec-checker
+
+## Scorecard
+
+| Lens | Score /5 | Justification |
+|---|---|---|
+| Architecture | 4 | Clean provider abstraction (`lib/ai/providers/index.ts:7`), helpers split out of routes, sound schema with intentional composite indexes, process-local background bus by design. |
+| Security | 2 | Two P0 instance-DB routes un-gated; otherwise strong (CSPRNG sessions, double-gated admin actions, no signup enumeration, redacted keys). Corroborated by auditor — no adjustment. |
+| Performance | 4 | Hot paths indexed, WAL tuning at boot, throttled progress flush; only an unbounded rate-limiter map. |
+| Testing | 4 | 177 tests pass in ~4s covering AI parsing, auth, admin guards, route CRUD, idempotency. *Note:* exerciser found multiple uncaught-input 500s the suite doesn't cover, and no test asserts the instance-DB admin gate — evidence the suite has an input-validation/authz-regression blind spot, but the existing coverage is real, so held at 4. |
+| Code quality | 4 | Consistent route shape, thorough "why" comments, tsc/lint clean; two minor validation styles coexist. |
+| Documentation | 4 | README/AGENTS/CHANGELOG detailed and mostly true; broken quick-start creds, stale `.env.example`, drifted test counts, migration-era packaging README. |
+
+## Disagreements & gaps
+
+- **SSRF severity (P1 vs P2).** security-auditor rated it P1, evaluator P2 — the split is purely about the self-hosted threat model (evaluator weighted "operator owns the box" down; auditor weighted "real in multi-user mode" up). Resolved to **P1**: the same multi-user reality that makes the DB-export a P0 also makes an authenticated user reaching the operator's LAN a real cross-tenant capability. Noted rather than averaged.
+- **Packaging "BLOCKER" → P2 mapping.** start9-spec-checker correctly reports the package as BLOCKED for *community-registry submission*. The user has explicitly parked registry publishing (sideload via `make install` only), so these are mapped to P2 (must-fix-before-submit, not on the current critical path) rather than P0. If/when community submission is back on the table, treat all four as hard blockers.
+- **Shared blind spot — no live/multi-user runtime test.** Every code agent assessed multi-user authz and the AI subsystem *from source*; the exerciser couldn't test live AI generation (no API keys) or true concurrent multi-user load, and nobody exercised the StartOS proxy. So two severity calls (the XFF rate-limit bypass, secure-cookie/HTTPS posture) rest on whether the StartOS reverse proxy rewrites headers — unverified here.
+- **Untested by design:** live LLM generation end-to-end, PWA/service-worker, StartOS Actions/backup-restore on a real box, and the `import-db` restore path with a valid .db file.
+
+## Suggested order of work
+
+1. **Close the P0 first.** Gate `export-db` + `import-db` on `user.isAdmin` (return 403) in both route handlers *and* hide them in `SettingsForm`/`page.tsx`; better, move whole-DB replace into the StartOS operator action layer entirely. Per-user export already exists at `/api/me/export`.
+2. **Add a regression test** asserting a non-admin gets 403 from both instance-DB routes — this is the exact gap the suite has today, and it locks the P0 fix.
+3. **Fix the P1 SSRF** — resolve provider `baseUrl` hosts and reject loopback/link-local/private/reserved ranges; restrict scheme to http/https (or limit provider config to admins).
+4. **Fix the developer-facing P1s** — correct the README quick-start (the seed ships zero users; document the first-run setup flow instead of `admin@local`), and fix `resolveDatabasePath()` so export doesn't pick an empty 0-byte DB.
+5. **Sweep the input-validation 500s together** — wrap `request.json()` in try/catch→400 and add Zod guards for date/offset across the workouts routes (one shared pattern fixes several P2s); add rate limiting to `POST /api/auth`.
+6. **Harden the container** — add `USER nextjs` to the Dockerfile; plan the Next.js 15 upgrade as its own tested change.
+7. **Defer packaging fixes** until registry submission is back on — then clear all four blockers (SPDX license, `instructions.md`, repo URLs, stale install alert) plus the warnings in one pass. Quick wins anytime: delete the legacy `workout-log_x86_64.s9pk`, drop dead `CLAUDE_API_KEY`/`bcryptjs`, fix drifted test counts.