From ffdb59aa8f970d219e37a10db205747cc7ca3fbf Mon Sep 17 00:00:00 2001
From: Keysat <licensing@keysat.xyz>
Date: Fri, 12 Jun 2026 22:42:29 -0500
Subject: [PATCH] Document P0 webhook fix + test seam; ship :54; track
 EVALUATION.md

---
 AGENTS.md               | 48 ++++++++++++++-------------
 EVALUATION.md           | 72 +++++++++++++++++++++++++++++++++++++++++
 docs/guides/payments.md | 21 ++++++++++--
 docs/guides/testing.md  | 29 +++++++++--------
 4 files changed, 131 insertions(+), 39 deletions(-)
 create mode 100644 EVALUATION.md

diff --git a/AGENTS.md b/AGENTS.md
index 4aabd37..89ac066 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -58,7 +58,8 @@ tests/crosscheck/                 cross-language LIC1 verifier → guides/crypto
 Note: the daemon (`licensing-service-startos`, repo `keysat`), each SDK, and
 `plans/` are **separate git repos** — commit code/plan changes in their own repo.
 The root `Licensing` repo (`keysat-root`) tracks only `AGENTS.md` + `docs/guides/`
-+ `.claude/rules/`. **Remotes differ per repo**: the daemon's `main` tracks
++ `.claude/rules/` + `EVALUATION.md` (the latest full-eval report; overwritten
+each run, history in git log). **Remotes differ per repo**: the daemon's `main` tracks
 **GitHub** (`origin`, the public upstream) with a `gitea` backup — plain `git push`
 goes to GitHub, so also `git push gitea main`; root + plans are **Gitea-only**.
 Run `git remote -v` (full) and check what the branch tracks before pushing.
@@ -97,17 +98,19 @@ Operator-specific memories at `~/.claude/projects/-Users-macpro-Projects-licensi
 
 ## Current state (2026-06-13)
 
-- **Live**: server `immense-voyage.local` runs daemon `0.2.0:53` (migrations
-  0020–0022 applied). Registry `registry.keysat.xyz` now publishes `:53` too
-  (GitHub release `v0.2.0-53` cut; `files.keysat.xyz` serves the s9pk). Four SDKs
+- **Live**: server `immense-voyage.local` runs daemon `0.2.0:54` (migrations
+  0020–0022 applied). Registry `registry.keysat.xyz` publishes `:54` too
+  (GitHub release `v0.2.0-54` cut; `files.keysat.xyz` serves the s9pk). Four SDKs
   published; `keysat.xyz` + `docs.keysat.xyz` deployed.
 - **`:52`/`:53` = multi-provider/merchant-profile model**: data model + backend
   resolution shipped and audited sound; the resolution/CRUD query surface now has
   test coverage. See `docs/guides/payments.md`.
-- **Purchase-path bug fixed and shipped**: the `:52` ambiguous-column bug (broke
-  *every* paid purchase) was fixed in daemon `31f4670`; `:53` (version bump
-  `8c4bacc`) built, installed to prod, and published to the registry on
-  2026-06-13. The live purchase path works again.
+- **Two payment-path fixes shipped 2026-06-13**: (a) `:53` fixed the `:52`
+  ambiguous-column bug that broke *every* paid purchase (daemon `31f4670`); (b)
+  `:54` fixed the **P0 Zaprite webhook-forgery** — settle now re-confirms against
+  the provider API before issuing (daemon `783372c`, bump `495fbbf`). Both built,
+  installed to prod, and published to the registry. Live purchase + settle paths
+  are sound.
 - **GAP — multi-profile is non-functional end-to-end**: nothing in the shipped
   app writes `products.merchant_profile_id` (the INSERT in
   `create_product_with_currency` omits it; `update_product_with_currency` has no
@@ -118,23 +121,22 @@ Operator-specific memories at `~/.claude/projects/-Users-macpro-Projects-licensi
   product→profile **write path** is missing. **This is the gating piece for
   multi-profile** — see the scoped slice below.
 - **Triage from `EVALUATION.md` (full-eval, 2026-06-13)** — P0/P1 = work queue,
-  P2 = known debt, P3+ = deferred. The report at repo root has file:line evidence.
+  P2 = known debt, P3+ = deferred. The report at repo root has file:line evidence;
+  it's tracked, so re-running full-eval overwrites it and `git log -- EVALUATION.md`
+  preserves prior runs.
 
 - **Work queue (P0/P1 — do first, in this order)**:
-  1. **[P1] Provider-injection test seam — BEFORE #2's fix.** The prod purchase/
-     settle path has no mock seam (the mock injects into the dead `state.payment`
-     singleton, not `resolve_provider_for_profile_rail` — this is how the `:52`
-     500 shipped). Add an always-compiled `Option<Arc<dyn PaymentProvider>>`
-     override on `AppState`, checked first in `resolve_provider_for_profile_rail`.
-     Greens the two `paid_purchase_*` red tests and gives #2 a regression harness.
-     **Delete** the dead `payment_provider_preference_round_trip` in the same pass.
-  2. **[P0] Zaprite webhook forgery.** The settle-webhook is unauthenticated — a
-     forged `order.change`/`status=PAID` with a buyer-visible order id mints a
-     signed license (the `externalUniqId` "trust anchor" in the comments is never
-     read in the webhook path). Fix: on settle, re-fetch `get_invoice_status` from
-     the provider and require local `pending` state + matching amount/currency
-     before issuing (mirror the reconciler's safe re-fetch). Closes the P1 below
-     too. `payment/zaprite/provider.rs:234-362`, `api/webhook.rs:62-196,121-194`.
+  1. ✅ **SHIPPED in `:54` — Provider-injection test seam.** Added the
+     always-compiled `AppState::provider_override` + `provider_from_row` helper at
+     every resolution site; greened the two `paid_purchase_*` tests; deleted the
+     dead `payment_provider_preference_round_trip`. See `docs/guides/testing.md`.
+  2. ✅ **SHIPPED in `:54` — Zaprite webhook forgery fix.** `webhook.rs::handle_inner`
+     re-fetches `provider.get_invoice_status` and requires `Settled` before any
+     settle-derived action; acks 200 (no issue) when the provider is unreachable.
+     Two regression tests (forged-settle, provider-unreachable). api 47/47.
+     **Still open (auditor P1):** a literal paid-amount/currency check — needs a
+     trait change (`get_invoice_status` returns only a status enum). See
+     `docs/guides/payments.md`. **This is now the top remaining security item.**
   3. **[P1] Scoped API keys (`ks_…`) are non-functional** — issuable but 403 on
      every admin endpoint; the `require_admin`→`require_scope` migration was never
      done. Finish it, or stop advertising/issuing them. `api/api_keys.rs:14`.
diff --git a/EVALUATION.md b/EVALUATION.md
new file mode 100644
index 0000000..4273868
--- /dev/null
+++ b/EVALUATION.md
@@ -0,0 +1,72 @@
+# Evaluation — Keysat (licensing-service-startos) — 2026-06-13
+
+Intent: a self-hosted, Bitcoin-native software-licensing service that runs as a StartOS 0.4.x package — a Rust/axum/sqlx+SQLite daemon with Ed25519 license signing plus a TypeScript StartOS wrapper — recently extended with a multi-merchant-profile, multi-provider (BTCPay required, Zaprite optional) payment model.
+
+Evaluation target: `licensing-service-startos/` (the daemon + StartOS wrapper). The SDK repos, public sites, and `plans/` were out of scope.
+
+Agents run: evaluator, security-auditor, exerciser, start9-spec-checker (all completed). reviewer **skipped** — the guide runs it only on uncommitted changes, and the daemon tree was clean at evaluation time.
+
+## Verdict
+
+Keysat substantially achieves its intent: the cryptographic licensing core (offline-verifiable LIC1 keys, Ed25519, cross-language-tested) is reference-grade, the payment-provider trait is well-factored, the SQL surface is injection-clear despite runtime-prepared queries, and the security fundamentals (Argon2id, constant-time HMAC/compare, 256-bit sessions, generic-error recovery, FK enforcement confirmed) are above the bar for a project this young. The one release-blocking defect is the **Zaprite settle-webhook: it performs no authentication of any kind and issues a signed license off a buyer-visible order id — anyone who starts a real Zaprite checkout can forge a "paid" callback and mint licenses without paying** (the `externalUniqId` "trust anchor" the code comments describe is never actually read in the webhook path). Behind that sits a systemic test gap — the production purchase/settle path has no integration-test seam, which is exactly how the `:52` 500-on-every-purchase bug shipped — and a cluster of API-hygiene and StartOS-submission issues. None of the deeper architecture is rotten; the work is hardening a sound core, not rebuilding it.
+
+## Cross-referenced findings
+
+- **Zaprite webhook forgery (P0).** Flagged by BOTH the evaluator (as P2) and the security-auditor (escalated to **P0** after tracing it end-to-end: public route → no signature → `externalUniqId` never read → no amount check → no `pending`-state precondition → `webhook.rs:194` issues a signed license). One finding, two agents, severity resolved to P0 (funds loss). `payment/zaprite/provider.rs:234-362`, `api/webhook.rs:62-196`, route `api/mod.rs:429-430`.
+- **Untested production purchase path (P1).** Evaluator named it the #1 issue (mock injects into the bypassed legacy `state.payment` singleton at `tests/api.rs:457`, not the real `resolve_provider_for_profile_rail` at `purchase.rs:482`); the exerciser independently reported it could not exercise the live payment/settle path at all (medium confidence). Same gap, two kinds of evidence. This is also the root of the two `paid_purchase_*` red tests.
+- **FK enforcement (resolved, not a finding).** A prior "latent/unchecked" worry; the security-auditor and daemon-architecture both confirm the sqlx pool sets `foreign_keys(true)` per connection (`db/mod.rs`). Closed.
+
+## Priority queue
+
+- **[P0]** Zaprite settle-webhook is unauthenticated — forged `order.change`/`status=PAID` with a buyer-visible order id mints a free signed license; no amount or state check — `payment/zaprite/provider.rs:234-362` + `api/webhook.rs:62-196` — security-auditor (P0) + evaluator (P2)
+- **[P1]** Production purchase/settle path has no integration-test seam; mock targets the dead `state.payment` singleton, not `resolve_provider_for_profile_rail` — `payment/mod.rs:177`, `tests/api.rs:457`, `purchase.rs:482` — evaluator + exerciser
+- **[P1]** Settle handler issues without verifying paid amount/currency or requiring `pending` state (underpaid/replayed invoices honored at full entitlement; BTCPay forgery blocked by HMAC but replay is a no-op only after first issuance) — `api/webhook.rs:121-194` — security-auditor
+- **[P1]** Scoped API keys (`ks_…`) are advertised + issuable but return 403 on every admin endpoint — the `require_admin`→`require_scope` migration was never done — `api/api_keys.rs:14` — exerciser
+- **[P2]** `/v1/purchase` and `/v1/redeem` have no rate limiting (invoice-spam / upstream-provider exhaustion; unthrottled `free_license` code guessing) — `api/mod.rs:343-345` — security-auditor
+- **[P2]** Per-IP rate-limit bucket keys on attacker-controllable `X-Forwarded-For`; bypassable if the daemon's shared `0.0.0.0:8080` listener is reachable without the StartOS proxy rewriting XFF (unverified) — `recover.rs:65-70`, `validate.rs:95-111`, `admin.rs:56-67` — security-auditor
+- **[P2]** All `422`/`415` responses return naked axum plain-text rejection strings instead of the JSON error envelope — SDK consumers that `JSON.parse()` a 422 throw — every JSON endpoint — exerciser
+- **[P2]** Product `slug` has no validation — empty string, 300-char, and SQL-meta characters all stored as valid slugs; empty-slug product pollutes the public catalog — `POST /v1/admin/products` — exerciser
+- **[P2]** `GET /v1/admin/products` returns 405 though the served OpenAPI documents it (`products:read`); only `post` is wired — `api/mod.rs:431` — exerciser
+- **[P2]** Dependency advisories: `sqlx 0.7.4` (RUSTSEC-2024-0363, fixed ≥0.8.1), `rustls-webpki 0.101.7` (CRL panic / name-constraint bypass, fixed ≥0.103.12); `rsa` reachable only via unused MySQL backend; `paste`/`rustls-pemfile` unmaintained — `cargo audit` — security-auditor
+- **[P2]** StartOS submission blocker: `instructions.md` absent (spec says build should fail; `start-cli` built silently anyway) — start9-spec-checker
+- **[P2]** StartOS submission blocker: `packageRepo` is a dead link — manifest points to `…/keysat-startos` (404); real repo is `…/keysat` — `startos/manifest/index.ts` — start9-spec-checker
+- **[P2]** StartOS submission blocker: `docsUrls` dead link — points to `/docs/INTEGRATION.md`; file is at `/licensing-service/docs/INTEGRATION.md` — `startos/manifest/index.ts` — start9-spec-checker
+- **[P2]** StartOS submission blocker: manifest declares `aarch64` but only x86_64 is published (Start9 review tests on Pi-class hardware) — `publish.sh` / manifest `arch` — start9-spec-checker (corroborates existing AGENTS.md TODO)
+- **[P2]** No CI; `cargo test`/`clippy`/`fmt` all unenforced (56 files differ from `fmt`); the only release gate is `publish.sh` — for a payment daemon the untested suite is the load-bearing weakness behind the P0/P1s — `docs/guides/testing.md` — evaluator
+- **[P3]** Dead test `payment_provider_preference_round_trip` inserts into the dropped `btcpay_config`/`zaprite_config` tables — delete it — `tests/api.rs:1224` — evaluator + exerciser
+- **[P3]** `/v1/purchase` (400) and `/v1/btcpay/webhook` (503) return different status/error codes for the same "no provider configured" condition — exerciser
+- **[P3]** `POST /v1/admin/discount-codes` requires an undocumented `kind` field → opaque 422 — exerciser
+- **[P3]** Field-naming inconsistencies across related endpoints: `license_id` vs `id`; machines use `key` vs documented `license_key`; `redeem`/`purchase` use `product` vs `validate`'s `product_slug` — exerciser (Surprises)
+- **[P3]** Migration self-heal auto-deletes `_sqlx_migrations` rows on checksum drift, gated only by a hand-maintained allowlist — foot-gun as the list grows — `db/mod.rs` / migration 0009 — evaluator (Surprise) + security-auditor
+- **[P3]** Zaprite `validate_webhook` WARN-logs up to 2 KB of raw unauthenticated payload — log-flood / log-injection from a public endpoint — `provider.rs:262-278` — security-auditor
+- **[P3]** Outbound webhook delivery POSTs to operator-supplied URLs with no SSRF allow-list (operator-only input; a stolen admin token could pivot to internal hosts) — `webhooks.rs:150-156` — security-auditor
+- **[P3]** Stale comment in `startos/versions/v0.2.0.ts:3-4` says "NOT YET WIRED INTO versions/index.ts" but it is wired as `current` at `:53` — evaluator
+
+## Scorecard
+
+The evaluator's lens table, with one adjustment from cross-agent evidence:
+
+| Lens | Score /5 | Note |
+|------|----------|------|
+| Architecture | 4 | Clean trait + per-profile resolver; legacy `state.payment` singleton coexists and is what tests inject into — the fork behind the test gap. |
+| Security | **3** (was 4) | **Adjusted down** by the security-auditor's P0 forgery + P1 missing-amount-check (funds loss). Fundamentals remain 4–5 (Argon2id, HMAC, injection-clear, FK-enforced); the Zaprite P0 is the pull. Zaprite is optional, which bounds blast radius. |
+| Performance | 4 | WAL + 8-conn pool, indexed lookups, bounded workers; no load test run. |
+| Testing | 3 | Cross-language LIC1 fixtures are a real strength; the production purchase/settle path is untested — corroborated by the exerciser's inability to reach it and the shipped-but-broken scoped-API-key feature. |
+| Code quality | 5 | Disciplined dynamic SQL (all `.bind()`, const-only interpolation); honest comments. Caveat (not score-changing): the exerciser found absent input validation (slug) and plain-text error leaks. |
+| Documentation | 5 | Scoped guides + wire-format spec are reference-grade and candid (testing.md names its own known failures). |
+
+## Disagreements & gaps
+
+- **Severity disagreement, resolved:** evaluator rated the Zaprite webhook P2; security-auditor traced it end-to-end and rated it P0. Resolved to **P0** — it is unauthenticated, public, and mints signed licenses (direct funds loss for any operator running the optional Zaprite provider — which includes the master operator).
+- **Shared blind spot (all agents):** none could exercise the live BTCPay/Zaprite settlement→issuance path without real provider credentials. The evaluator and exerciser both labeled their payment-path confidence "medium." The single highest-value verification missing is a full purchase cycle against a real BTCPay/Zaprite sandbox — which also exposes the P0/P1 settle-path logic to a real test.
+- **Quality tension:** evaluator scored code-quality 5 on SQL discipline; the exerciser found input-validation gaps (slug) and framework error leaks. Both are true at different layers — the score holds, the caveat is noted.
+
+## Suggested order of work
+
+1. **Fix the Zaprite P0 with its test alongside.** On any settle event, re-fetch `get_invoice_status` from the provider and require `pending` local state + matching paid amount/currency before issuing — this closes both the P0 forgery and the P1 missing-amount-check in one change. (The reconciler already does the safe re-fetch; mirror it in the webhook handler.)
+2. **Add the provider-injection test seam first** (an always-compiled `Option<Arc<dyn PaymentProvider>>` override on `AppState`, checked in `resolve_provider_for_profile_rail`). This is a prerequisite for trusting any payment-path test result — it greens the two `paid_purchase_*` red tests AND gives step 1 a regression harness. Delete the dead `payment_provider_preference_round_trip` in the same pass.
+3. **Decide scoped API keys:** either finish the `require_scope` migration or stop advertising/issuing `ks_…` keys until it lands — a shipped-but-403 feature is worse than an absent one.
+4. **Quick-win hardening batch:** rate-limit `/v1/purchase` + `/v1/redeem`; JSON-ify 422/415; add slug validation; wire `GET /v1/admin/products`; bump `sqlx` ≥0.8.1 and `rustls`/`rustls-webpki`.
+5. **StartOS submission fixes** (before any registry submission): the two manifest dead links and the aarch64-vs-manifest decision are trivial; add `instructions.md`; confirm with Start9 whether the custom `LicenseRef-Keysat-1.0` satisfies their "source available" bar.
+6. **Add a CI gate** (Gitea Actions/Woodpecker running `cargo test` + `clippy`) so the honor-system test gap — the meta-cause behind the shipped `:52` and Zaprite defects — closes permanently.
+7. **Then resume feature work:** the product→merchant-profile picker (already scoped at the top of the AGENTS.md roadmap; the write path is the gating piece that makes multi-profile usable end-to-end) and the remaining deferred UIs.
diff --git a/docs/guides/payments.md b/docs/guides/payments.md
index e9bede1..4291456 100644
--- a/docs/guides/payments.md
+++ b/docs/guides/payments.md
@@ -51,9 +51,24 @@ card+lightning+onchain), NOT stored per row.
 3. else earliest-`connected_at` (deterministic) + a warning.
 
 `payment::build_provider(&row, ...)` constructs a **real** `BtcpayProvider` /
-`ZapriteProvider` from the DB row — there is **no mock seam** in this path, so
-integration tests can't drive it with `MockPaymentProvider` without one. The legacy
-`state.payment` arc is a compat shim and is NOT consulted by the new resolver.
+`ZapriteProvider` from the DB row. Tests swap in a mock via the always-compiled
+`AppState::provider_override` seam (`None` in production), honored by
+`AppState::provider_from_row` at every resolution site — see [testing](testing.md).
+The legacy `state.payment` arc is a compat shim and is NOT consulted by the new
+resolver.
+
+## Webhook settle confirmation (anti-forgery)
+
+Zaprite webhooks carry no signature, so the settle handler does **not** trust the
+webhook body's claim. `api/webhook.rs::handle_inner` re-fetches
+`provider.get_invoice_status` and requires `Settled` before persisting status or
+taking ANY settle-derived action (license issuance, tier-change, subscription
+renewal — the guard sits ahead of all of them). On a provider-API error it acks
+`200` without issuing — the reconcile loop re-confirms and issues on its next tick
+(fail-closed on issuance, and a 2xx avoids a provider retry-storm). **Not yet
+done**: a literal paid-amount/currency check (the trait exposes only a status
+enum); trusting the provider's own `Settled` determination is the current
+boundary — see the auditor's open P1.
 
 ## Provider connect
 
diff --git a/docs/guides/testing.md b/docs/guides/testing.md
index f21cfcc..7d0bfa4 100644
--- a/docs/guides/testing.md
+++ b/docs/guides/testing.md
@@ -37,21 +37,24 @@ purchase returning 500 on `:52`) shipped this way because the new merchant-profi
 resolution path had no passing test. When adding/altering a repo query, add a test
 that actually executes it. See [payments](payments.md).
 
-## Known-failing tests (3 in tests/api.rs)
+## Driving the purchase / settle path — the provider seam
 
-Test-debt from the `:52` payments transition; the backend is sound, these are a
-test-strategy decision, not bugs:
+Integration tests are a separate crate, so `#[cfg(test)]` doesn't reach the lib
+and `payment::build_provider` only ever makes real BTCPay/Zaprite clients. To
+drive the real resolver (`resolve_provider_for_profile_rail` /
+`payment_provider_by_id`) with a `MockPaymentProvider`, set the always-compiled
+`AppState::provider_override` field (`None` in production; honored by
+`provider_from_row`). `install_mock_provider` / `make_test_state_with_mock_provider`
+wire a mock into BOTH that field AND the legacy `state.payment` singleton (the
+back-compat `/v1/{kind}/webhook` route reads the singleton), and seed a real
+`payment_providers` row on the default profile so profile/rail/row resolution
+still runs for real. `MockPaymentProvider::new_unconfirmed()` /
+`new_status_unavailable()` vary the `get_invoice_status` answer to exercise the
+webhook settle-confirmation guard (see [payments](payments.md)).
 
-- `paid_purchase_creates_invoice_via_provider`, `paid_purchase_in_usd_records_listed_currency_and_rate`
-  — fail with a legitimate 400 ("no payment providers connected"); the fixture
-  seeds no provider. Reaching 200 needs the mock wired through
-  `resolve_provider_for_profile_rail` (`build_provider` only makes real clients;
-  `#[cfg(test)]` does not apply to integration tests against the lib).
-- `payment_provider_preference_round_trip` — inserts into the dropped
-  `btcpay_config`/`zaprite_config`; superseded by
-  `merchant_profile_provider_resolution_queries_round_trip` (delete or rewrite).
-
-The other 43 api tests + all other suites pass.
+The 3 `:52`-era known-failing tests are **resolved**: the two `paid_purchase_*`
+greened via the seam, the dead `payment_provider_preference_round_trip` deleted.
+api suite is now **47 pass / 0 fail**; all other suites green.
 
 ## Cross-language wire-format tests