recap/docs/daily-digest-plan.md
Keysat b4fa5d7be8 Add opt-in Daily Digest (daily email of last 24h of library recaps)
Multi-mode, off by default. Each new recap is synthesized into a 1-2
paragraph overview via the relay (operator-absorbed) and cached onto the
session JSON; a daily 08:00 scan emails opted-in users their fresh
recaps, deduped by a per-user watermark that never skips a failed or
over-cap recap. One-click tokenized unsubscribe; settings-modal toggle;
admin test trigger. Bumps to 0.2.158.
2026-06-15 19:50:48 -05:00

# Daily Digest — plan
Status: **proposed** (awaiting go-ahead). Captures the design agreed with Grant on
2026-06-15. Build only after sign-off.
## Goal
An **opt-in** (off by default) daily "wake-up" email to recaps.cc users: the recaps
added to their library in the last ~24 hours, each shown as a **synthesized 1-2
paragraph overview** generated from that recap's existing per-topic summaries. Turns
passive subscriptions into a daily touchpoint without making the user open the app.
## Decisions (locked 2026-06-15)
- **Content** — "overnight recaps": library additions since the user's last digest.
- **Audience / opt-in** — multi-mode (recaps.cc) first; **off by default**; per-user toggle.
- **Per-episode depth** — a 1-2 paragraph overview *synthesized from the stored topic
summaries* (`chunks`). NOT raw full text (too long, Gmail clips >~102 KB), NOT a
one-sentence blurb (too thin). This is Grant's call and it's what bounds email size.
- **Volume** — per-episode size is bounded by the 2-paragraph synthesis. Still cap at
~10 episodes per email with an "and N more in your library →" overflow link for
extreme days.
- **Cadence** — once per user per ~24h at a fixed server-time hour (default 08:00).
Timezone-aware send is a v2. **Skip the email entirely when nothing is new.**
- **Dedup** — a per-user `last_digest_at` watermark; each digest covers recaps created
since that instant, so nothing repeats and nothing is missed.
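
The volume cap and the watermark rule above can be pictured as one pure selection step. The following is a minimal sketch, not the repository's code: the function name, the result shape, and the oldest-first ordering are assumptions made for illustration.

```javascript
// Hypothetical sketch: names, result shape, and oldest-first ordering are assumptions.
const MAX_EPISODES = 10; // the "~10 episodes per email" cap from the plan

function selectDigestEpisodesSketch(sessions, lastDigestAt, cap = MAX_EPISODES) {
  const since = lastDigestAt ?? 0; // a null watermark means "never sent"
  const fresh = sessions
    // createdAt is an ISO string on the record; the watermark is in ms.
    .filter((s) => Date.parse(s.createdAt) > since)
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
  return {
    episodes: fresh.slice(0, cap),             // rendered in the email
    overflow: Math.max(0, fresh.length - cap), // feeds the "and N more" link
  };
}
```

Taking the oldest recaps first means anything over the cap is strictly newer than what was sent, which keeps a single timestamp watermark workable. Whether the real selection orders this way is not stated in the plan.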
## Data (grounded in code)
- Saved recap record (`server/history.js` `saveToHistory`): `id`, `title`, `type`,
`url`, `createdAt` (ISO), `topicCount`, `chunks` (topics, each with bullet
summaries), `entries` (transcript), `speakers`/`speakerNames`. **No top-level
summary is stored** → the 1-2 paragraph overview must be synthesized.
- Multi-mode users live in the `users` table (`id`, `email`, …); a user's library
scope is their user id.
## Architecture
Mirror `server/subscription-reminders.js` (the proven daily-scan-plus-email pattern:
self-gating, deduped, never throws).
- **`server/daily-digest.js`** (new)
  - `runDigestScan({ force })`: gate on `isSmtpReady()` + public URL set. For each
    opted-in user, list sessions with `createdAt > last_digest_at`; if none, skip. For
    each new recap, get-or-generate its overview (see below), render the email,
    `sendMail`, then advance the watermark. Returns a `{sent, skipped}` summary; never
    throws.
  - `startDigestScheduler()`: boot delay + interval, fires near the target hour.
    Idempotent; safe to start unconditionally in multi mode.
- **Synthesis** — `synthesizeEpisodeOverview(record)`: send the recap's topic titles +
bullet summaries to the relay LLM with a "write a 1-2 paragraph overview" prompt.
**Cache** the result back onto the session JSON (e.g. `digestOverview`) so it's
generated once and could later power an in-app episode overview. **Sanitize
operator-internal strings at this boundary** (Parakeet/CUDA/LAN IPs etc. must not
reach cloud users — existing repo convention).
- **Email** — `renderDigestEmail({ brandName, episodes, manageUrl, unsubscribeUrl })`
in `server/email-template.js`, matching the existing reminder/magic-link templates.
- **Opt-in storage** — migration in `server/db.js`: add `users.digest_enabled`
(default 0) and `users.last_digest_at` (ms, nullable). Toggle endpoint in
`server/account-routes.js` (requires session). Settings-modal toggle in
`public/index.html`.
- **Unsubscribe** — a one-click tokenized GET link in every email that flips
`digest_enabled = 0` without requiring login (signed token), plus the in-app toggle.
Consent + deliverability hygiene on the young recaps.cc domain.
- **Operator test trigger** — `POST /api/admin/digest/run { test_email }`, mirroring
the reminders test hook, so it can be smoke-tested without waiting a day.
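
The scan flow described for `runDigestScan` can be sketched roughly as below. This is illustrative only: every dependency is injected, and all of the `deps.*` names are hypothetical stand-ins rather than the module's real helpers.

```javascript
// Hypothetical sketch of the scan loop; every name on `deps` is an assumption.
async function runDigestScanSketch(deps) {
  const summary = { sent: 0, skipped: 0 };
  try {
    // Self-gating: do nothing unless mail can actually go out.
    if (!deps.isSmtpReady() || !deps.publicUrl) return summary;

    for (const user of await deps.listOptedInUsers()) {
      try {
        const fresh = await deps.listNewRecaps(user); // createdAt > last_digest_at
        if (fresh.length === 0) {
          summary.skipped += 1; // nothing new: send no email at all
          continue;
        }
        const episodes = [];
        for (const recap of fresh) {
          episodes.push({ recap, overview: await deps.getOverview(recap) });
        }
        await deps.sendMail(user.email, deps.renderEmail(episodes));
        await deps.advanceWatermark(user, fresh); // only after a successful send
        summary.sent += 1;
      } catch (err) {
        summary.skipped += 1; // one user's failure must not stop the scan
      }
    }
  } catch (err) {
    // Swallowed on purpose: the scan reports a summary and never throws.
  }
  return summary;
}
```

Advancing the watermark only after `sendMail` resolves is what lets a failed send be retried on the next tick instead of being silently dropped.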
## Cost / credits
The synthesis is one small relay LLM call per new recap per opted-in user, run once and
cached. Bounded by (opted-in users × new recaps/day). **Recommend operator-absorbed**
(it's a retention feature, input is already-short topic summaries) rather than drawing
the user's credits. Confirm.
## Open questions (defaults chosen; confirm or adjust)
1. **Synthesis cost owner**: ~~operator-absorbed (default) vs user credits?~~
**RESOLVED 2026-06-15: operator-absorbed, zero operator action.** The synthesis
provider is built with `resolveProviderOpts("relay", { req: null })` → the operator's
install identity, the *same* relay credit pool free signed-in users' summaries already
draw from (`providers/index.js` `pickRelayIdentity`). No comped system user-id needed.
Flipping to user-billing later = pass the recipient's cloud identity at the marked line
in `daily-digest.js` `buildSynthesisProvider()`.
2. **Send hour** — 08:00 server time (default)?
3. **Single-mode operator digest** — defer to a follow-on (default: multi-mode only v1)?
4. **Relay contract**: ~~does an existing relay endpoint (`/relay/analyze`) fit?~~
**RESOLVED 2026-06-15: `/relay/analyze` fits as-is, no new relay capability.** The
route (`recap-relay/server/routes/analyze.js`) takes a free-form `{ prompt: string }`
and returns `{ result: { text } }`; the client already wraps it as
`relay.js` `analyzeText({ prompt }) → result.text`. "Topic sections JSON" is only what
today's `chunked-analyze.js` caller asks for in *its* prompt — the endpoint is generic.
Synthesis = build a "summarize these summaries into 1-2 paragraphs" prompt, read
`result.text`. **No cross-repo change.** (Aside: relay `AGENTS.md:78` still describes
this endpoint as `{ transcript, … } → topic sections JSON` — stale; flag for that repo.)
Billing: each standalone analyze charges 1 credit on the call's credit key unless it
shares an `X-Recap-Job-Id` — that's the Q1 (cost-owner) mechanism, decided at phase 2.
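
To make the resolved relay contract concrete, here is a minimal sketch of building the prompt and caching the result. It is not the repository's implementation: the prompt wording, the function names, and the per-chunk `title` field are assumptions, and `analyzeText` is injected and assumed to resolve to the relay's `result.text` string.

```javascript
// Hypothetical sketch: prompt wording, helper names, and chunk.title are assumptions.
function buildOverviewPromptSketch(record) {
  const topics = (record.chunks || [])
    .map((c, i) => `${i + 1}. ${c.title}: ${c.summary}`)
    .join("\n");
  return [
    `Write a 1-2 paragraph overview of "${record.title}".`,
    "Base it only on these topic summaries:",
    topics,
  ].join("\n");
}

// analyzeText is injected; it is assumed to resolve to the relay's result.text.
async function getOrCreateOverviewSketch(record, analyzeText) {
  if (record.digestOverview) return record.digestOverview; // cache hit: no relay call
  const prompt = buildOverviewPromptSketch(record);
  const text = (await analyzeText({ prompt })).trim();
  record.digestOverview = text; // real code would also persist this to the session JSON
  return text;
}
```

Keeping the prompt builder pure makes it testable without a relay, and the cache check ahead of the call keeps the cost to one synthesis per recap.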
## Build phases
1. **BUILT 2026-06-15.** Schema + opt-in toggle. `db.js`: `users.digest_enabled`
(default 0) + `users.last_digest_at` (ms, nullable) via SCHEMA_SQL +
`migrateUserDigestPrefs`. `account-routes.js`: `GET`/`POST /api/account/digest`
(enabling stamps `last_digest_at = now` so the first send isn't a backlog dump).
`public/index.html`: settings-modal toggle (`renderDigestBlock` + `loadMyDigest` /
`setDigestEnabled`, optimistic with revert).
2. **BUILT 2026-06-15.** Synthesis + cache → `server/daily-digest.js`:
`buildOverviewPrompt` (pure), `scrubOperatorStrings` (conservative backstop — infra
proper nouns + LAN/private hosts; dropped CUDA to avoid mangling legit tech content),
`synthesizeEpisodeOverview` (relay `analyzeText`, operator-absorbed identity, stable
per-episode jobId), `getOrCreateEpisodeOverview` (`digestOverview` cache + best-effort
`patchSession` write-back). NOT wired into a scheduler yet — dormant until phase 3.
Tests: `test/daily-digest.test.js` (12, pass). Note: chunks carry a `summary` text per
topic (not bullets — the Data section's "bullet summaries" wording was loose).
3. **BUILT 2026-06-15.** Email + scan + scheduler + dedup + overflow cap.
`email-template.js` `renderDigestEmail` (minimal inline style, per-episode title→source
link + overview, overflow line, one-click unsubscribe). `daily-digest.js`:
`selectDigestEpisodes` (pure: watermark filter + cap + overflow), `runDigestScan`
(hourly tick, acts at `SEND_HOUR=8`; per-user `MIN_RESEND_MS=20h` + watermark dedup;
skips empty; advances watermark only on successful send; never throws),
`startDigestScheduler`, `setupDigestRoutes` (public `GET /api/digest/unsubscribe?token=`).
`history.js` `listScopeSessions`. `db.js` adds `users.digest_unsub_token` (minted lazily
on first send). Wired in `index.js` (multi-mode) + `tenant-auth.js` public path.
4. **BUILT 2026-06-15.** `POST /api/admin/digest/run`: a `{test_email}` body sends a sample
render; bare body forces a real scan now (bypasses the hour gate, not the resend gate).
Mirrors `/api/admin/reminders/run`.
5. **DONE.** `test/daily-digest.test.js` — 19 tests (prompt, scrub, synth/cache,
`selectDigestEpisodes` watermark/cap/overflow/empty, `scopeForUser`, email render).
Full suite **138 pass**. Verified on a real multi-mode boot: migrations apply, scheduler
starts, and the unsubscribe route (400/404/200 + flips `digest_enabled`) works end-to-end.
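
The phase 3 and 4 gating (hourly tick, act at `SEND_HOUR`, per-user `MIN_RESEND_MS`, and a forced run that bypasses only the hour gate) can be sketched as a single predicate. This is a hedged illustration: the function name and the `lastSentAt` field (an assumed per-user "last send" timestamp in ms) do not come from the source.

```javascript
const SEND_HOUR = 8;                       // from the plan
const MIN_RESEND_MS = 20 * 60 * 60 * 1000; // 20h, from the plan

// Hypothetical helper; lastSentAt is an assumed per-user timestamp in ms.
function shouldSendNowSketch({ now, lastSentAt, force = false }) {
  // force skips the hour gate only, matching the admin trigger's description.
  const hourOk = force || new Date(now).getHours() === SEND_HOUR; // server-local hour
  const resendOk = lastSentAt == null || now - lastSentAt >= MIN_RESEND_MS;
  return hourOk && resendOk;
}
```

A 20-hour minimum rather than a full 24 leaves slack for an hourly tick that fires a little early or late while still preventing two sends on the same morning.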
## Status: feature-complete, awaiting on-box smoke test
Built end-to-end but **not yet installed** (no version bump). The relay synthesis call and
SMTP send can only be exercised on the operator's box. Operator smoke test:
`POST /api/admin/digest/run {test_email}` to eyeball the render; then opt in, add a recap,
and force a scan (or wait for 08:00) to see a real synthesized digest.
**Fresh-eyes review applied (2026-06-15).** Three correctness fixes after a reviewer pass:
(1) the watermark now advances to the newest *sent* recap but never past a failed/deferred
one (`nextDigestWatermark`) — the old `now` stamp silently dropped both synthesis-failures
and over-cap overflow recaps forever; (2) `force` no longer bypasses the in-progress lock,
so an operator force-run during the scheduled tick can't double-send; (3) `idx_users_unsub_token`
is created in the migration, not `SCHEMA_SQL` (the latter runs before the column exists on
upgraded DBs → would crash boot). Existing-DB upgrade verified on a realistic pre-digest
schema. Also added an index on the unauthenticated token lookup + a null-scope guard.
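
Fix (1) can be illustrated with a small pure function. The sketch below is one possible reading of the rule "advance to the newest sent recap but never past a failed or deferred one"; the function name, the `{ createdAtMs, status }` result shape, and the status strings are assumptions, and the real `nextDigestWatermark` may differ.

```javascript
// Hypothetical sketch; the { createdAtMs, status } result shape is an assumption.
function nextDigestWatermarkSketch(previous, results) {
  // Anything not sent (failed synthesis, over-cap deferral) pins a ceiling.
  const held = results
    .filter((r) => r.status !== "sent")
    .map((r) => r.createdAtMs);
  const ceiling = held.length ? Math.min(...held) : Infinity;

  let next = previous;
  for (const r of results) {
    if (r.status === "sent" && r.createdAtMs < ceiling) {
      next = Math.max(next ?? 0, r.createdAtMs);
    }
  }
  return next; // unchanged when nothing could safely be covered
}
```

Under this reading, a recap sent after the earliest held one stays above the watermark and would be picked up again on the next scan, so this sketch trades a possible repeat for never losing a recap. How the actual implementation handles that case is not stated here.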