Capture: recap subscription entitlement-gate verification

Keysat
2026-06-20 09:10:09 -05:00
parent dc8312a944
commit 41f80881ef
@@ -47,3 +47,4 @@ Example:
- [ ] (proof-of-work) [feature][P2] brainstorm better tracking of cardio logging and cardio program planning (in-week variety and long term programs) — via matrix, 2026-06-19
- [ ] (matrix-bridge) [bug][P2] what are the open brackets when you log an inbox item through matrix, eg “📥 captured → - [ ] (proof-of-work) [feature][P2] brainstorm better tracking of cardio logging and cardio program planning (in-week variety and long term programs) — via matrix, 2026-06-19” — via matrix, 2026-06-19
- [ ] (recap-relay) [bug][P1] Analyze-phase hang permanently jams the single in-memory hardware FIFO slot → ALL YouTube processing (manual + background subscriptions) blocks; caused a multi-hour (possibly ~week) operator-box outage. ROOT CAUSE (confirmed via operator dashboard + relay logs — job 9514ee26, video mxg6OsCl7Oc: download 14s ok → transcribe 200s ok → "starting analyze" then silent ~700min, status UNKNOWN): analyzeText's ai.models.generateContent (server/backends/gemini.js:587 analyze; :281 transcribe) passes NO AbortSignal, and the client-wide httpOptions.timeout (900_000ms, gemini.js:216) doesn't fire on a half-open/stalled connection so the await never settles; chunked-analyze retry (server/chunked-analyze.js:642) is bounded and only catches rejections (a hang never throws); the worker releases the hardware slot only in its finally (server/routes/summarize-url.js:1105), which a never-completing try never reaches; server/hardware-queue.js acquireHardwareSlot (:51-98) is a single in-memory FIFO with NO acquire-timeout / dead-holder detection and jobs.js has no stuck-"running" watchdog, so later jobs wait forever ("queued at position N"). FIX (2 parts + a unit test; recap-relay is its own repo — own version bump + make install, never deploy to registry): (1) PROXIMATE — thread AbortSignal.timeout(...) into the Gemini analyze + transcribe generateContent calls so a stall REJECTS → retry/fallback + worker finally run → job fails cleanly instead of zombie-ing (verify @google/genai per-request abort param, config.abortSignal vs signal, against installed SDK); (2) SYSTEMIC — add a dead-holder watchdog (worker- or slot-level) that force-releases the slot + marks the job failed past a hard ceiling so no future hang can permanently jam the FIFO SPOF. IMMEDIATE UNBLOCK (operational, not the fix): restart recap-relay (job map + slot are in-memory) — recurs until fixed. SECONDARY: occasional "[chunked-analyze] invalid JSON in window response — retrying" self-recovers (look only if frequent); and the full week-long subscription silence exceeds this one ~12h job — check https://recaps.cc/api/sub-check-log whether earlier jobs hung identically or the background sub-check entitlement gate (recap server/index.js:1400, Keysat-license-gated, may skip silently post-core-decoupling) is the cause — diagnosed from dashboard + logs, not yet fixed, 2026-06-20
- [ ] (recap) [bug][P2] Verify whether cloud background subscription processing is silently skipped post-core-decoupling: _checkSubscriptionsInner returns early ("Skipped: subscriptions require a Pro license") unless licenseMW.LIC.entitlements.has("subscriptions") (server/index.js:1400), but cloud paid status moved to users.tier (relay-owned), NOT a per-user Keysat license — so the owner/operator scope may lack that entitlement and the daily background sub-check could skip silently while manual processing still works. UNCONFIRMED lead surfaced while diagnosing the 2026-06-20 relay analyze-hang outage (separate issue; see the recap-relay inbox item). Check via https://recaps.cc/api/sub-check-log (signed in) — 2026-06-20
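
The PROXIMATE fix in the recap-relay item depends on one mechanism: a stalled await has to reject so that the worker's `finally` can run. The sketch below illustrates that mechanism in isolation. `withDeadline`, `fakeStalledCall`, and `runJob` are hypothetical names, not code from recap-relay or `@google/genai`. It uses `AbortController` plus `setTimeout` rather than `AbortSignal.timeout(...)` so the timer can be cleared on success. How the signal is handed to `generateContent` is still the open question named in the item.

```javascript
// Sketch only: every name here is hypothetical, not taken from recap-relay.
async function withDeadline(task, ms) {
  const controller = new AbortController();
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`no response within ${ms} ms`);
      err.name = "TimeoutError";
      controller.abort(err); // lets a signal-aware callee tear down its request
      reject(err);           // settles the await even if the callee ignores the signal
    }, ms);
  });
  try {
    return await Promise.race([task(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // leave no timer behind after a normal completion
  }
}

// Simulates a half-open connection: this promise never resolves or rejects.
const fakeStalledCall = (_signal) => new Promise(() => {});

async function runJob(log) {
  log.push("slot acquired");
  try {
    await withDeadline(fakeStalledCall, 50);
    log.push("analyze ok");
  } catch (err) {
    log.push(`job failed: ${err.name}`);
  } finally {
    log.push("slot released"); // reached because the stall now rejects
  }
  return log;
}
```

Without the race, `runJob` would stop at the `await` forever and never log the release, which matches the zombie job described in the item.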
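
The SYSTEMIC fix in the recap-relay item (a dead-holder watchdog on the single FIFO slot) could look roughly like the sketch below. It is a toy model under stated assumptions, not the contents of server/hardware-queue.js. `SingleSlot`, `maxHoldMs`, and `onExpired` are hypothetical names, and a real version would also need to mark the job failed in the job map.

```javascript
// Sketch only: a one-slot FIFO whose holder is evicted after maxHoldMs.
class SingleSlot {
  constructor({ maxHoldMs, onExpired }) {
    this.maxHoldMs = maxHoldMs; // hard ceiling on how long one job may hold the slot
    this.onExpired = onExpired; // callback, e.g. to mark the job as failed
    this.waiters = [];          // FIFO of pending grants
    this.busy = false;
  }

  // Resolves with a release function once the slot is granted to jobId.
  acquire(jobId) {
    return new Promise((resolve) => {
      const grant = () => {
        this.busy = true;
        let released = false;
        let timer;
        const release = () => {
          if (released) return; // idempotent: a late release from an evicted job is a no-op
          released = true;
          clearTimeout(timer);
          const next = this.waiters.shift();
          if (next) next();     // hand the slot straight to the next waiter
          else this.busy = false;
        };
        // Watchdog: force-release if the holder never calls release().
        timer = setTimeout(() => {
          if (released) return;
          this.onExpired(jobId);
          release();
        }, this.maxHoldMs);
        resolve(release);
      };
      if (this.busy) this.waiters.push(grant);
      else grant();
    });
  }
}
```

Binding each release function to its own grant is the design choice that matters here: if an evicted job's `finally` eventually runs, it cannot release a slot that now belongs to a later job.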
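
The unconfirmed lead in the recap item can be stated as a small model of the gate it describes. This is not the real server/index.js code. `subCheckGate` and the `"pro"` tier value are assumptions made for illustration. Only the `entitlements.has("subscriptions")` check and the skip message come from the item.

```javascript
// Sketch only: hypothetical stand-in for the gate described in the item.
function subCheckGate(entitlements, userTier) {
  // The gate consults the license entitlement set only.
  if (!entitlements.has("subscriptions")) {
    return { run: false, reason: "Skipped: subscriptions require a Pro license", userTier };
  }
  // userTier is carried along for logging; it never influences the decision.
  return { run: true, reason: null, userTier };
}

// A paid tier with no license entitlement would still skip.
const paidButUnlicensed = subCheckGate(new Set(), "pro");
const licensed = subCheckGate(new Set(["subscriptions"]), "pro");
```

If the real gate behaves like this model, the sub-check log should show the skip reason on every daily run for the operator scope, which is what the item proposes checking.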