recap/docs/daily-digest-plan.md
Keysat b4fa5d7be8 Add opt-in Daily Digest (daily email of last 24h of library recaps)
Multi-mode, off by default. Each new recap is synthesized into a 1-2
paragraph overview via the relay (operator-absorbed) and cached onto the
session JSON; a daily 08:00 scan emails opted-in users their fresh
recaps, deduped by a per-user watermark that never skips a failed or
over-cap recap. One-click tokenized unsubscribe; settings-modal toggle;
admin test trigger. Bumps to 0.2.158.
2026-06-15 19:50:48 -05:00

Daily Digest — plan

Status: proposed (awaiting go-ahead). Captures the design agreed with Grant on 2026-06-15. Build only after sign-off.

Goal

An opt-in (off by default) daily "wake-up" email to recaps.cc users: the recaps added to their library in the last ~24 hours, each shown as a synthesized 1-2 paragraph overview generated from that recap's existing per-topic summaries. Turns passive subscriptions into a daily touchpoint without making the user open the app.

Decisions (locked 2026-06-15)

  • Content — "overnight recaps": library additions since the user's last digest.
  • Audience / opt-in — multi-mode (recaps.cc) first; off by default; per-user toggle.
  • Per-episode depth: a 1-2 paragraph overview synthesized from the stored topic summaries (chunks). NOT raw full text (too long, Gmail clips >~102 KB), NOT a one-sentence blurb (too thin). This is Grant's call and it's what bounds email size.
  • Volume — per-episode size is bounded by the 2-paragraph synthesis. Still cap at ~10 episodes per email with an "and N more in your library →" overflow link for extreme days.
  • Cadence — once per user per ~24h at a fixed server-time hour (default 08:00). Timezone-aware send is a v2. Skip the email entirely when nothing is new.
  • Dedup — a per-user last_digest_at watermark; each digest covers recaps created since that instant, so nothing repeats and nothing is missed.
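
The dedup and volume decisions above can be sketched as one pure selection step. This is a minimal illustration, not the repo's code: the function name, the `{ id, createdAt }` session shape, and the oldest-first ordering are all assumptions made for the example.

```javascript
// Hypothetical sketch: choose which recaps one digest email should cover.
// Assumes each session looks like { id, createdAt } with an ISO createdAt.
const MAX_EPISODES = 10; // the "~10 episodes per email" cap

function selectEpisodesSketch(sessions, lastDigestAtMs, cap = MAX_EPISODES) {
  const since = lastDigestAtMs ?? 0; // a null watermark means "never sent"
  const fresh = sessions
    .filter((s) => Date.parse(s.createdAt) > since)
    // Oldest first (an assumption): anything past the cap is the newest material.
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
  return {
    episodes: fresh.slice(0, cap),
    // Drives the "and N more in your library" overflow link.
    overflowCount: Math.max(0, fresh.length - cap),
  };
}
```

An empty `episodes` array is the signal to skip the email entirely.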

Data (grounded in code)

  • Saved recap record (server/history.js saveToHistory): id, title, type, url, createdAt (ISO), topicCount, chunks (topics, each with bullet summaries), entries (transcript), speakers/speakerNames. No top-level summary is stored → the 1-2 paragraph overview must be synthesized.
  • Multi-mode users live in the users table (id, email, …); a user's library scope is their user id.

Architecture

Mirror server/subscription-reminders.js (the proven daily-scan-plus-email pattern: self-gating, deduped, never throws).

  • server/daily-digest.js (new)
    • runDigestScan({ force }): gate on isSmtpReady() + public URL set. For each opted-in user, list sessions with createdAt > last_digest_at; if none, skip. For each new recap, get-or-generate its overview (see below), render the email, sendMail, then advance the watermark. Returns a {sent, skipped} summary; never throws.
    • startDigestScheduler(): boot delay + interval, fires near the target hour. Idempotent; safe to start unconditionally in multi mode.
  • Synthesis: synthesizeEpisodeOverview(record) sends the recap's topic titles + bullet summaries to the relay LLM with a "write a 1-2 paragraph overview" prompt. Cache the result back onto the session JSON (e.g. digestOverview) so it's generated once and could later power an in-app episode overview. Sanitize operator-internal strings at this boundary (Parakeet/CUDA/LAN IPs etc. must not reach cloud users, per existing repo convention).
  • Email: renderDigestEmail({ brandName, episodes, manageUrl, unsubscribeUrl }) in server/email-template.js, matching the existing reminder/magic-link templates.
  • Opt-in storage — migration in server/db.js: add users.digest_enabled (default 0) and users.last_digest_at (ms, nullable). Toggle endpoint in server/account-routes.js (requires session). Settings-modal toggle in public/index.html.
  • Unsubscribe — a one-click tokenized GET link in every email that flips digest_enabled = 0 without requiring login (signed token), plus the in-app toggle. Consent + deliverability hygiene on the young recaps.cc domain.
  • Operator test trigger: POST /api/admin/digest/run { test_email }, mirroring the reminders test hook, so it can be smoke-tested without waiting a day.
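
The sanitize-at-the-boundary step in the Synthesis bullet could look roughly like the sketch below. The function name, the term list, and the replacement strings are assumptions for illustration; the only inputs taken from this plan are the categories themselves (infra names such as Parakeet, and LAN/private hosts).

```javascript
// Hypothetical term list; the real set of operator-internal names is an assumption.
const INTERNAL_TERMS = ["Parakeet"];

// RFC 1918 private IPv4 ranges plus loopback, with an optional :port suffix.
const PRIVATE_HOST_RE =
  /\b(?:10(?:\.\d{1,3}){3}|192\.168(?:\.\d{1,3}){2}|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2}|127(?:\.\d{1,3}){3})(?::\d{1,5})?\b/g;

function scrubSketch(text) {
  // Hosts first, then whole-word internal names (case-insensitive).
  let out = String(text ?? "").replace(PRIVATE_HOST_RE, "[internal host]");
  for (const term of INTERNAL_TERMS) {
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    out = out.replace(new RegExp(`\\b${escaped}\\b`, "gi"), "[redacted]");
  }
  return out;
}
```

Public addresses and ordinary technical vocabulary pass through unchanged, which keeps the scrub a conservative backstop rather than a content filter.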

Cost / credits

The synthesis is one small relay LLM call per new recap per opted-in user, run once and cached. Bounded by (opted-in users × new recaps/day). Recommend operator-absorbed (it's a retention feature, input is already-short topic summaries) rather than drawing the user's credits. Confirm.

Open questions (defaults chosen; confirm or adjust)

  1. Synthesis cost owner: operator-absorbed (default) vs user credits? RESOLVED 2026-06-15: operator-absorbed, zero operator action. The synthesis provider is built with resolveProviderOpts("relay", { req: null }) → the operator's install identity, the same relay credit pool free signed-in users' summaries already draw from (providers/index.js pickRelayIdentity). No comped system user-id needed. Flipping to user-billing later = pass the recipient's cloud identity at the marked line in daily-digest.js buildSynthesisProvider().
  2. Send hour — 08:00 server time (default)?
  3. Single-mode operator digest — defer to a follow-on (default: multi-mode only v1)?
  4. Relay contract: does an existing relay endpoint (/relay/analyze) fit? RESOLVED 2026-06-15: /relay/analyze fits as-is, no new relay capability. The route (recap-relay/server/routes/analyze.js) takes a free-form { prompt: string } and returns { result: { text } }; the client already wraps it as relay.js analyzeText({ prompt }) → result.text. "Topic sections JSON" is only what today's chunked-analyze.js caller asks for in its prompt; the endpoint itself is generic. Synthesis = build a "summarize these summaries into 1-2 paragraphs" prompt, read result.text. No cross-repo change. (Aside: relay AGENTS.md:78 still describes this endpoint as { transcript, … } → topic sections JSON, which is stale; flag for that repo.) Billing: each standalone analyze charges 1 credit on the call's credit key unless it shares an X-Recap-Job-Id; that's the Q1 (cost-owner) mechanism, decided at phase 2.
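
The relay contract resolved in question 4 reduces to two small pure steps: build a prompt string, then read result.text. The sketch below assumes a record shape of `{ title, chunks: [{ title, summary }] }` and uses hypothetical function names and prompt wording; only the `{ prompt }` in and `{ result: { text } }` out shapes come from this plan.

```javascript
// Hypothetical sketch of the prompt builder (record shape is an assumption).
function buildOverviewPromptSketch(record) {
  const topics = (record.chunks ?? [])
    .map((c, i) => `${i + 1}. ${c.title ?? "Untitled topic"}\n${c.summary ?? ""}`.trim())
    .join("\n\n");
  return [
    "Write a 1-2 paragraph overview of the episode below.",
    "Use only the topic summaries provided; do not add facts.",
    "",
    `Episode: ${record.title ?? "Untitled"}`,
    "",
    topics,
  ].join("\n");
}

// Reads the overview out of a { result: { text } } response.
// Returns null for a missing or blank text so the caller can treat it as a failure.
function readOverviewText(response) {
  const text = response?.result?.text;
  return typeof text === "string" && text.trim() ? text.trim() : null;
}
```

Keeping both steps free of I/O means they can be unit-tested without a relay or network access.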

Build phases

  1. BUILT 2026-06-15. Schema + opt-in toggle. db.js: users.digest_enabled (default 0) + users.last_digest_at (ms, nullable) via SCHEMA_SQL + migrateUserDigestPrefs. account-routes.js: GET/POST /api/account/digest (enabling stamps last_digest_at = now so the first send isn't a backlog dump). public/index.html: settings-modal toggle (renderDigestBlock + loadMyDigest / setDigestEnabled, optimistic with revert).
  2. BUILT 2026-06-15. Synthesis + cache → server/daily-digest.js: buildOverviewPrompt (pure), scrubOperatorStrings (conservative backstop — infra proper nouns + LAN/private hosts; dropped CUDA to avoid mangling legit tech content), synthesizeEpisodeOverview (relay analyzeText, operator-absorbed identity, stable per-episode jobId), getOrCreateEpisodeOverview (digestOverview cache + best-effort patchSession write-back). NOT wired into a scheduler yet — dormant until phase 3. Tests: test/daily-digest.test.js (12, pass). Note: chunks carry a summary text per topic (not bullets — the Data section's "bullet summaries" wording was loose).
  3. BUILT 2026-06-15. Email + scan + scheduler + dedup + overflow cap. email-template.js renderDigestEmail (minimal inline style, per-episode title→source link + overview, overflow line, one-click unsubscribe). daily-digest.js: selectDigestEpisodes (pure: watermark filter + cap + overflow), runDigestScan (hourly tick, acts at SEND_HOUR=8; per-user MIN_RESEND_MS=20h + watermark dedup; skips empty; advances watermark only on successful send; never throws), startDigestScheduler, setupDigestRoutes (public GET /api/digest/unsubscribe?token=). history.js listScopeSessions. db.js adds users.digest_unsub_token (minted lazily on first send). Wired in index.js (multi-mode) + tenant-auth.js public path.
  4. BUILT 2026-06-15. POST /api/admin/digest/run {test_email} sends a sample render; bare body forces a real scan now (bypasses the hour gate, not the resend gate). Mirrors /api/admin/reminders/run.
  5. DONE. test/daily-digest.test.js — 19 tests (prompt, scrub, synth/cache, selectDigestEpisodes watermark/cap/overflow/empty, scopeForUser, email render). Full suite 138 pass. Verified on a real multi-mode boot: migrations apply, scheduler starts, and the unsubscribe route (400/404/200 + flips digest_enabled) works end-to-end.
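
The two send gates described in phases 3 and 4 (hourly tick acting at SEND_HOUR, plus a per-user minimum resend interval that a forced run does not bypass) can be sketched as a single predicate. The function name and the `lastSentAtMs` field are hypothetical; which stored timestamp the real resend gate reads is not stated here.

```javascript
const SEND_HOUR = 8;                       // server-local hour, from phase 3
const MIN_RESEND_MS = 20 * 60 * 60 * 1000; // 20h, from phase 3

// Hypothetical helper: is this user due on the current tick?
// `force` skips only the hour check, matching the phase 4 note.
function isUserDueSketch({ now, lastSentAtMs, force = false }) {
  const hourOk = force || now.getHours() === SEND_HOUR;
  const resendOk =
    lastSentAtMs == null || now.getTime() - lastSentAtMs >= MIN_RESEND_MS;
  return hourOk && resendOk;
}
```

A 20h floor rather than a strict 24h one leaves slack for a tick that fires a little early or late without letting two sends land on the same day.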

Status: feature-complete, awaiting on-box smoke test

Built end-to-end but not yet installed (no version bump). The relay synthesis call and SMTP send can only be exercised on the operator's box. Operator smoke test: POST /api/admin/digest/run {test_email} to eyeball the render; then opt in, add a recap, and force a scan (or wait for 08:00) to see a real synthesized digest.

Fresh-eyes review applied (2026-06-15). Three correctness fixes after a reviewer pass: (1) the watermark now advances to the newest sent recap but never past a failed/deferred one (nextDigestWatermark) — the old now stamp silently dropped both synthesis-failures and over-cap overflow recaps forever; (2) force no longer bypasses the in-progress lock, so an operator force-run during the scheduled tick can't double-send; (3) idx_users_unsub_token is created in the migration, not SCHEMA_SQL (the latter runs before the column exists on upgraded DBs → would crash boot). Existing-DB upgrade verified on a realistic pre-digest schema. Also added an index on the unauthenticated token lookup + a null-scope guard.
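
The watermark rule from fix (1) can be sketched as a pure function. This is one possible reading of "advance to the newest sent recap but never past a failed/deferred one", not the actual nextDigestWatermark; the name, the `{ createdAtMs }` item shape, and a numeric `currentMs` are assumptions.

```javascript
// Hypothetical sketch of the watermark rule.
// `sent` and `deferred` are arrays of { createdAtMs }; `deferred` covers both
// synthesis failures and over-cap overflow recaps.
function nextWatermarkSketch(currentMs, sent, deferred) {
  if (sent.length === 0) return currentMs; // nothing went out: do not move
  const newestSent = Math.max(...sent.map((r) => r.createdAtMs));
  if (deferred.length === 0) return Math.max(currentMs, newestSent);
  // Stop just before the oldest recap still owed, so a
  // "createdAt > watermark" filter picks it up on the next scan.
  const oldestDeferred = Math.min(...deferred.map((r) => r.createdAtMs));
  return Math.max(currentMs, Math.min(newestSent, oldestDeferred - 1));
}
```

Under this reading, a recap that was sent but is newer than a deferred one would be selected again on the next scan, so a real implementation would need either oldest-first sending or a per-recap sent marker to avoid repeats.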