Snapshot of the working tree before cleanup. Captures: - Keysat licensing: server/license.js, /api/license/* endpoints in server/index.js, activation modal in public/index.html, embedded Ed25519 issuer key (assets/issuer.pub). - StartOS 0.4 expansion: setApiKey action, version files v0.1.1 through v0.1.15, file-models/config.json.ts, manifest updates. - Self-hosted registry server (startos-registry/). - Build/deploy scripts (bin/bump-version.sh, bin/deploy.sh, vendored yt-dlp binary), .gitignore, .deploy.env.example. - Recent design docs (KEYSAT_INTEGRATION.md, UPGRADE-DESIGN.md) — retained here so they remain recoverable when removed in the follow-up cleanup commit.
YouTube Summarizer — Upgrade design notes
Design sketches for three follow-up features:
- Bundled-LLM relay — let buyers use the operator's API credentials without exposing them
- Multi-provider LLM support — OpenAI, Anthropic Claude, VeniceAI, etc. as Pro alternatives to Gemini
- OpenWebUI integration — connect to a buyer's local LLM running on the same Start9 box
These are not mutually exclusive; the relay design is the architectural foundation that the other two slot into.
1. Bundled-LLM relay
The user's first idea — "use my running youtube-summarizer instance"
"If I already have my own youtube-summarizer instance running on clearnet, maybe other users can somehow hit that instance and my instance will send them the json files?"
This is coherent and would work, but the tradeoffs aren't great:
- Privacy. The relay sees every YouTube URL every customer processes — channel, video, timing. This makes you a de facto traffic-analysis honeypot whether you want one or not. Customers who care about Start9-style sovereignty will balk.
- Bandwidth. youtube-summarizer downloads the audio with yt-dlp, splits it with ffmpeg, then sends the audio to Gemini for transcription. If your instance does that work, every customer's video transits your bandwidth twice (down from YouTube, up to Google). A typical hour-long podcast is 30–80 MB, so 100 active customers running 5 videos/day each comes to 15–40 GB/day in each direction.
- Self-host violation. Customers chose Start9 specifically to keep media-access local. Routing it through your box silently undoes that choice.
- Operational scope creep. Your instance now needs hardened uptime, capacity planning, abuse handling, ban-evasion when YouTube rate-limits you, etc. — turning a side product into infrastructure.
So while it'd technically work, I'd retire that variant in favor of a narrower relay design that keeps user-data and self-host benefits intact.
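To put the bandwidth bullet in numbers, here is a back-of-envelope sketch. The function name is hypothetical, and it assumes the audio uploaded to Google is about the same size as what was downloaded from YouTube.

```typescript
// Rough daily transfer for the full-pipeline relay variant.
// Inputs are the illustrative figures from the bullet above.
interface TransferEstimate {
  perDirectionGb: number // pulled from YouTube, and again pushed to Google
  totalGb: number        // both directions combined
}

function estimateDailyTransfer(
  customers: number,
  videosPerDay: number,
  mbPerVideo: number,
): TransferEstimate {
  const perDirectionGb = (customers * videosPerDay * mbPerVideo) / 1000
  return { perDirectionGb, totalGb: perDirectionGb * 2 }
}

const lowEstimate = estimateDailyTransfer(100, 5, 30)
const highEstimate = estimateDailyTransfer(100, 5, 80)
console.log(`${lowEstimate.perDirectionGb}-${highEstimate.perDirectionGb} GB/day per direction`)
console.log(`${lowEstimate.totalGb}-${highEstimate.totalGb} GB/day total`)
```

At 100 customers this is already a non-trivial hosting line item, and it scales linearly with the customer count.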
The architecture: a thin LLM proxy
Keep yt-dlp, ffmpeg, and the orchestration on the customer's box. Move only the LLM API call through your relay. Your Gemini key never leaves your server.
[customer's Start9 box]                [your relay]                      [Google]
youtube-summarizer      ──signed──►    relay.keysat.xyz       ──────►    generativelanguage.googleapis.com
  yt-dlp + ffmpeg                        verifies LIC1                   (your API key)
  builds Gemini prompt                   enforces tier limits
  POSTs to relay                         forwards to Gemini
                        ◄──stream──      streams response back    ◄────────
  receives chunks JSON
  saves to history
The customer's app does all the heavy lifting (download, audio split, prompt assembly, history). The only thing that crosses the wire to your relay is the actual Gemini API call body — which would have gone to Google directly anyway.
Relay API contract (v1)
Single endpoint, mirrors Gemini's REST shape so the customer's existing Gemini client code barely changes:
POST https://relay.keysat.xyz/v1/proxy/gemini/{model}:generateContent
Headers:
  X-Keysat-License: LIC1-...            # the customer's license key
  X-Keysat-Product: youtube-summarizer
  Content-Type: application/json
Body:
  { ...exactly what they would have sent to Google... }
Response:
  Either Gemini's response verbatim, or:
  402 { error: "license_required", message: "..." }
  402 { error: "feature_not_in_tier", feature: "bundled_api", message: "..." }
  402 { error: "rate_limit_exceeded", reset_at: "...", message: "..." }
  401 { error: "license_invalid", reason: "revoked|expired|product_mismatch" }
A streaming variant for :streamGenerateContent does the same thing with chunked transfer encoding so first-token latency stays low.
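On the customer side, the error bodies in this contract map naturally onto a discriminated union. The sketch below is one way the app might decide what to show; the `ClientAction` names are hypothetical placeholders, not part of the contract.

```typescript
// Error bodies from the relay contract, modelled as a discriminated union.
type RelayError =
  | { error: 'license_required'; message?: string }
  | { error: 'feature_not_in_tier'; feature: string; message?: string }
  | { error: 'rate_limit_exceeded'; reset_at: string; message?: string }
  | { error: 'license_invalid'; reason: 'revoked' | 'expired' | 'product_mismatch' }

// Hypothetical UI actions the app could take for each error.
type ClientAction =
  | 'prompt_activation'
  | 'show_upgrade'
  | 'show_wait_until_reset'
  | 'prompt_reactivation'

function actionForRelayError(err: RelayError): ClientAction {
  switch (err.error) {
    case 'license_required':
      return 'prompt_activation'
    case 'feature_not_in_tier':
      return 'show_upgrade'
    case 'rate_limit_exceeded':
      return 'show_wait_until_reset'
    case 'license_invalid':
      return 'prompt_reactivation'
  }
}
```

Keying on the `error` string rather than the HTTP status keeps the client working even if the status codes are later revised.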
The relay does four things, in order, per request:
- Verify the license offline using the embedded Keysat public key (the same Verifier the customer's app uses). Reject with 401 if the signature fails or the product slug doesn't match.
- Check entitlements. The license needs whatever entitlement gates this feature. For example, Pro tier might have bundled_api; Core wouldn't.
- Enforce rate limits per license per day, persisted in a small KV store (Redis, SQLite, even a flat JSON file for v0). Default: Core = 0, Pro = N requests/day. Configurable at deploy time.
- Forward to Gemini with your real API key. Stream the response back. Log the license_id, model, token count, and rough cost for billing visibility; never log the prompt content (privacy).
Implementation outline
A v1 relay is small — ~200 lines of Node, deployable on Cloudflare Workers, Fly.io, or your own Start9 box.
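// pseudocode for the core forward handler (non-streaming generateContent)
// usageStore, metrics, entitlementsToDailyCap and tomorrowMidnightUtc are helpers assumed to exist elsewhere
import { Verifier, PublicKey } from '@keysat/licensing-client'
import { Hono } from 'hono'

const ISSUER_PEM = process.env.ISSUER_PEM! // your Keysat issuer pubkey
const GEMINI_KEY = process.env.GEMINI_API_KEY!
const verifier = new Verifier(PublicKey.fromPem(ISSUER_PEM))
const app = new Hono()

app.post('/v1/proxy/gemini/:model{.+}', async (c) => {
  const license = c.req.header('X-Keysat-License')
  if (!license) return c.json({ error: 'license_required' }, 402)

  // 1. Verify signature, then product slug
  let payload
  try { payload = verifier.verify(license).payload }
  catch (e) {
    const reason = e instanceof Error ? e.message : String(e)
    return c.json({ error: 'license_invalid', reason }, 401)
  }
  if (payload.productSlug !== 'youtube-summarizer') {
    // same shape as the contract: license_invalid with a reason
    return c.json({ error: 'license_invalid', reason: 'product_mismatch' }, 401)
  }

  // 2. Entitlement
  if (!payload.entitlements.includes('bundled_api')) {
    return c.json({ error: 'feature_not_in_tier', feature: 'bundled_api' }, 402)
  }

  // 3. Rate limit (per license per UTC day)
  const today = new Date().toISOString().slice(0, 10)
  const used = await usageStore.incr(`${payload.licenseId}:${today}`)
  const cap = entitlementsToDailyCap(payload.entitlements) // e.g. Pro = 50
  if (used > cap) {
    return c.json({ error: 'rate_limit_exceeded', reset_at: tomorrowMidnightUtc() }, 402)
  }

  // 4. Forward. :model also carries the method suffix, e.g. "<model>:generateContent"
  const model = c.req.param('model')
  const upstream = `https://generativelanguage.googleapis.com/v1/models/${model}`
  const r = await fetch(upstream, {
    method: 'POST',
    // key sent as a header rather than in the query string, so it stays out of URL logs
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': GEMINI_KEY },
    // buffered, so no streaming-body options are needed on Node's fetch
    body: await c.req.arrayBuffer(),
  })
  const bodyText = await r.text()

  // log usage (no body content); token counts are read from usageMetadata in the JSON body
  let usage
  try { usage = JSON.parse(bodyText).usageMetadata } catch { /* non-JSON error body */ }
  await metrics.record({
    license_id: payload.licenseId,
    model,
    tokens_in: usage?.promptTokenCount,
    tokens_out: usage?.candidatesTokenCount,
    status: r.status,
  })

  // pass through status and content type only; copying every upstream header
  // can forward a content-encoding or content-length that no longer matches
  return new Response(bodyText, {
    status: r.status,
    headers: { 'Content-Type': r.headers.get('content-type') ?? 'application/json' },
  })
})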
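The handler calls `usageStore.incr`, `entitlementsToDailyCap`, and `tomorrowMidnightUtc` without defining them. A minimal in-memory sketch of those helpers might look like the following; the cap value of 50 is illustrative, and a real deployment would back the counter with Redis or SQLite so it survives restarts.

```typescript
// In-memory stand-ins for the helpers the relay handler references.
const usageCounts = new Map<string, number>()

const usageStore = {
  // Increment and return the counter for a `${licenseId}:${utcDay}` key.
  async incr(key: string): Promise<number> {
    const next = (usageCounts.get(key) ?? 0) + 1
    usageCounts.set(key, next)
    return next
  },
}

// Illustrative caps: licenses with bundled_api get 50/day, everything else 0.
function entitlementsToDailyCap(entitlements: string[]): number {
  return entitlements.includes('bundled_api') ? 50 : 0
}

// ISO timestamp of the next UTC midnight, suitable for reset_at.
function tomorrowMidnightUtc(now: Date = new Date()): string {
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1)
  return new Date(next).toISOString()
}
```

Because the key includes the UTC date, counters reset implicitly at midnight UTC; old keys only need occasional garbage collection.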
Customer-side change
In the youtube-summarizer's server/index.js, the existing GoogleGenAI instantiation gets a base URL override when the user has the bundled-API option enabled:
// today
const ai = new GoogleGenAI({ apiKey, httpOptions: { ... } })

// with relay
const useBundled = LIC.entitlements.has('bundled_api') && !clientKey
const ai = new GoogleGenAI({
  apiKey: useBundled ? LIC.licenseKey : apiKey, // license key stands in for the API key
  // in @google/genai, the base URL and extra headers are set through httpOptions
  httpOptions: useBundled
    ? {
        baseUrl: RELAY_BASE_URL,
        headers: {
          'X-Keysat-License': LIC.licenseKey,
          'X-Keysat-Product': 'youtube-summarizer',
        },
      }
    : undefined,
})
The user never sees a Gemini API key field if their license includes bundled_api. They install, activate, and it works.
Operational considerations
- Cost monitoring. Wire the metrics output into a billing alert (e.g. Cloudflare Workers Analytics, or a Slack webhook on threshold). A misconfigured policy that gives Core users bundled_api could blow through your budget overnight.
- Provider hot-swap. Because the relay sits between customer and Gemini, you can swap to a different LLM provider without any client update. Useful for cost optimization or if Google ever revokes your key.
- Self-hostable relay. Ship the relay code as its own s9pk so customers who want to bring their own API key can run their own relay. Same code, their key. This keeps the "Start9 sovereignty" pitch intact for the subset of users who care.
- Resilience. Add an X-Keysat-Bundled-Off header (or a license entitlement flag) the customer's app can use to bypass the relay if it's down, falling back to "you need to enter a key in settings."
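The resilience fallback can be expressed as a small routing decision in the customer's app. This is a sketch under assumptions: `relayReachable` stands for the result of some health probe that is not specified here, and `bundledOff` models the opt-out flag as a simple boolean.

```typescript
type LlmRoute =
  | { kind: 'relay' }     // bundled API through the operator's relay
  | { kind: 'direct' }    // customer's own key, straight to the provider
  | { kind: 'needs_key' } // nothing usable: send the user to settings

interface RouteInputs {
  entitlements: Set<string>
  userApiKey?: string     // key the customer entered in settings, if any
  relayReachable: boolean // hypothetical health-probe result
  bundledOff: boolean     // opt-out flag for the bundled relay
}

function chooseLlmRoute(inputs: RouteInputs): LlmRoute {
  // A customer-supplied key always wins, matching the `!clientKey` check.
  if (inputs.userApiKey) return { kind: 'direct' }
  const bundledAllowed = inputs.entitlements.has('bundled_api') && !inputs.bundledOff
  if (bundledAllowed && inputs.relayReachable) return { kind: 'relay' }
  return { kind: 'needs_key' }
}
```

Returning `needs_key` rather than throwing lets the UI show the "enter a key in settings" prompt without treating a relay outage as an error.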
Variants worth considering
- Two-tier rate limits inside Pro. Pro_lite (10/day), Pro_unlimited (no cap, costs more). Just two policies in Keysat admin with different bundled_api rate annotations.
- Token-budget instead of request-count. Daily Gemini token cap rather than request cap. More accurate cost control but harder to communicate to buyers.
- Stripe-style metered billing later. If usage ever justifies it, the relay's per-license metrics are exactly what you'd hand to a metered-billing system.
Top-up credits — pay-as-you-go beyond the daily cap
A natural extension of the daily-rate-limit model: when a buyer hits their cap and wants more right now (instead of waiting for the UTC reset), let them buy a credit pack. Each pack adds N requests to their license's available pool, drawn down as they go.
Why this works for your business model. It captures the high-engagement long tail: the buyer who occasionally has a "need to summarize 30 podcasts today for research" day. Without credits, they either upgrade to a higher tier they don't actually need (overcharge), or they hit the wall and leave annoyed (lost engagement). Credits convert those moments into incremental Bitcoin revenue with no commitment.
Why it works architecturally. Keysat already has BTCPay integration and a buy-flow for products. Credit packs are just another product slug in your Keysat admin (youtube-summarizer-credits-100, -500, etc.) priced in sats. The buy flow returns a credit token — a short signed blob, similar shape to a LIC1-... license but smaller — that the customer's app posts to your relay to redeem. The relay verifies the signature, increments that license's credit balance in its KV store, marks the token as used.
Token format and verification. Same Ed25519 trust root as licenses, different prefix:
CRED1-AIBAH5T...4LXMZW2A
Payload includes: credit_pack_id (UUID), license_id (the license being topped up), units (e.g. 100), issued_at, expires_at (or 0 for never), signature. The relay's redeem endpoint:
POST https://relay.keysat.xyz/v1/credits/redeem
Headers: X-Keysat-License: LIC1-...
Body: { "credit_token": "CRED1-..." }
200 → { "ok": true, "credits_added": 100, "balance": 134 }
400 → { "error": "already_redeemed" | "license_mismatch" | "expired" | "bad_signature" }
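The redeem endpoint's bookkeeping could be sketched as below. It assumes the `CRED1-...` token has already passed Ed25519 signature verification (so `bad_signature` is handled upstream), and that `expires_at` is a Unix timestamp in seconds; neither detail is fixed by the design above. The `CreditLedger` class is a hypothetical in-memory stand-in for the relay's KV store.

```typescript
interface CreditPayload {
  credit_pack_id: string
  license_id: string
  units: number
  expires_at: number // assumed unix seconds; 0 means never
}

type RedeemResult =
  | { ok: true; credits_added: number; balance: number }
  | { ok: false; error: 'already_redeemed' | 'license_mismatch' | 'expired' }

class CreditLedger {
  private balances = new Map<string, number>()
  private redeemedPacks = new Set<string>()

  // `payload` must come from a token whose signature was already verified.
  redeem(licenseId: string, payload: CreditPayload, nowSec: number): RedeemResult {
    if (payload.license_id !== licenseId) return { ok: false, error: 'license_mismatch' }
    if (payload.expires_at !== 0 && payload.expires_at <= nowSec) {
      return { ok: false, error: 'expired' }
    }
    if (this.redeemedPacks.has(payload.credit_pack_id)) {
      return { ok: false, error: 'already_redeemed' }
    }
    // Mark the pack as used before crediting so a replay cannot double-count.
    this.redeemedPacks.add(payload.credit_pack_id)
    const balance = (this.balances.get(licenseId) ?? 0) + payload.units
    this.balances.set(licenseId, balance)
    return { ok: true, credits_added: payload.units, balance }
  }
}
```

Keying the replay check on `credit_pack_id` means the same token can be posted any number of times and only the first attempt changes the balance.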
Relay accounting. Modify the rate-limit step in §"Implementation outline" to consult both the daily cap and the credit pool:
1. Today's daily allotment used N of M (Pro = 50/day, say)
2. If N < M → allow, increment daily counter
3. If N >= M → check credit balance for this license
4. If credits > 0 → allow, decrement credit balance
5. Else → 402 rate_limit_exceeded with "buy more" hint
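The five steps above translate into a short admission function. This is a sketch with hypothetical names; it mutates the usage record in place, which a real KV-backed store would need to do atomically.

```typescript
interface LicenseUsage {
  usedToday: number // N: requests already counted against today's allotment
  credits: number   // remaining top-up credits for this license
}

type Admission =
  | { allowed: true; source: 'daily' | 'credits' }
  | { allowed: false; error: 'rate_limit_exceeded' }

function admitRequest(usage: LicenseUsage, dailyCap: number): Admission {
  // Steps 1-2: still inside the daily allotment.
  if (usage.usedToday < dailyCap) {
    usage.usedToday += 1
    return { allowed: true, source: 'daily' }
  }
  // Steps 3-4: allotment exhausted, draw down a credit if one is available.
  if (usage.credits > 0) {
    usage.credits -= 1
    return { allowed: true, source: 'credits' }
  }
  // Step 5: nothing left; the caller turns this into the 402 response.
  return { allowed: false, error: 'rate_limit_exceeded' }
}
```

Spending the daily allotment before credits means purchased credits are only consumed when the buyer would otherwise have been blocked.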
The 402 response includes a buy-credits URL the app can deep-link to, same pattern as the existing /buy/youtube-summarizer link:
{
"error": "rate_limit_exceeded",
"reset_at": "2026-05-08T00:00:00Z",
"buy_credits_url": "https://licensing.keysat.xyz/buy/youtube-summarizer-credits-100",
"message": "You've used today's allotment. Buy a credit pack or wait until midnight UTC."
}
Customer-side UX. In settings, the License block now shows usage:
Pro license — Active
Today: 50 / 50 used. Credits: 23 available.
[Buy 100 more credits — 5,000 sats]
When they tap "Buy", an await client.startPurchase('youtube-summarizer-credits-100')-style flow opens BTCPay, settles, returns a CRED1-... token, the app POSTs it to /v1/credits/redeem, balance updates. Same friction profile as the original license activation — paste-from-clipboard or open-URL — except now it's mid-flow rather than first-launch.
Operator pricing knobs. Each credit pack you list in Keysat admin has its own price + units. Common shapes:
- 100 credits @ 5K sats ($3), impulse-buy size (50 sats/credit)
- 500 credits @ 20K sats ($12), moderate top-up (40 sats/credit, 20% off the 100-pack rate)
- 2000 credits @ 60K sats ($35), power user (30 sats/credit, 40% off the 100-pack rate)
Set the underlying cost-per-credit so 1 credit ≈ 1 Gemini call ≈ your cost × margin. The pack-discount structure encourages bigger top-ups, which matches your incentive (you want fewer, larger top-ups to amortize the BTCPay fee per transaction).
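The per-credit price and discount for each pack follow directly from the pack sizes and prices listed. A small worked calculation, using those figures and hypothetical helper names:

```typescript
interface CreditPack {
  units: number
  priceSats: number
}

const basePack: CreditPack = { units: 100, priceSats: 5000 }
const creditPacks: CreditPack[] = [
  basePack,
  { units: 500, priceSats: 20000 },
  { units: 2000, priceSats: 60000 },
]

function satsPerCredit(pack: CreditPack): number {
  return pack.priceSats / pack.units
}

// Discount relative to the smallest pack, as a whole percentage.
function discountPct(pack: CreditPack, base: CreditPack): number {
  const baseRate = satsPerCredit(base)
  return Math.round((100 * (baseRate - satsPerCredit(pack))) / baseRate)
}

for (const pack of creditPacks) {
  console.log(
    `${pack.units} credits: ${satsPerCredit(pack)} sats/credit, ` +
      `${discountPct(pack, basePack)}% off the base pack`,
  )
}
// 100 credits: 50 sats/credit, 0% off the base pack
// 500 credits: 40 sats/credit, 20% off the base pack
// 2000 credits: 30 sats/credit, 40% off the base pack
```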
Edge cases worth thinking about.
- Refunds. Credits are non-refundable by default (they're prepaid API time, not durable goods). State this in the buy-page copy.
- Expiration. Decide whether credits expire. Pros of expiration: predictable cost liability on your books. Cons: customer-hostile. A 12-month rolling expiration is a fair compromise.
- Per-product slug or universal. Credit packs could be product-specific (youtube-summarizer-credits-100) or operator-wide (keysat-credits-100, redeemable across any of your products). Universal is more flexible but harder to price coherently if the underlying API costs differ across products.
- Multi-machine. Credits attach to a license, and licenses can move between machines if the operator allows. Make sure the credit balance follows the license, not the fingerprint. (Mostly a non-issue if your relay state is keyed by license_id.)
Effort estimate. ~2 days on top of the base relay:
- Credit-token format + Ed25519 signing helper in the licensing service: 0.5 day (mirrors the existing license format).
- BTCPay product slug for credit packs in Keysat admin: 0.25 day (reuse existing buy-flow).
- Relay redeem endpoint + balance accounting: 0.5 day.
- Customer app: balance display + buy-flow integration: 0.5 day.
- Test end-to-end (buy → redeem → relay deduct → exhaust → buy again): 0.25 day.
Worth shipping with the v1 relay or after? I'd ship credits a release after the base relay. Validate the daily-cap model is the right primitive first; if buyers regularly hit caps, that's the demand signal that justifies adding credits. Shipping both at once risks over-investing in a monetization layer no one needs.
2. Multi-provider LLM support
Why this is a real lift
Each provider's API is similar in spirit but different in detail, especially around:
- Audio handling. Today's youtube-summarizer ships audio chunks directly to Gemini, which natively transcribes them. Most other providers don't accept audio at all — you have to transcribe first, then send text. So adding non-Gemini providers means adding a transcription layer, which is a separate API surface.
- Structured output. youtube-summarizer's prompts ask Gemini for JSON-formatted topic chunks. Different providers express "give me JSON" differently (function-calling, response-format JSON mode, schema-guided decoding, or just "trust the prompt"). Inconsistent reliability.
- Streaming. Each provider's stream protocol is different (SSE shape, delta encoding). Either standardize on a unified stream type internally, or accept that streaming UX differs per provider.
- Token limits and context windows. A 3-hour podcast that fits in Gemini 2.5 Pro's 1M-token window won't necessarily fit in Claude Opus's 200K. This forces chunking strategy decisions per provider.
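For the context-window point, a per-provider chunking step might look like the sketch below. The 4-characters-per-token ratio is a rough sizing assumption for English text, not a property of any particular tokenizer, and the function name is hypothetical.

```typescript
// Rough heuristic for sizing only; real tokenizers vary by provider.
const CHARS_PER_TOKEN = 4

// Pack whole words into chunks that stay under an approximate token budget.
// A single word longer than the budget becomes its own oversized chunk.
function splitByTokenBudget(transcript: string, maxTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN
  const words = transcript.split(/\s+/).filter((w) => w.length > 0)
  const chunks: string[] = []
  let current = ''
  for (const word of words) {
    if (current.length > 0 && current.length + 1 + word.length > maxChars) {
      chunks.push(current)
      current = word
    } else {
      current = current.length > 0 ? `${current} ${word}` : word
    }
  }
  if (current.length > 0) chunks.push(current)
  return chunks
}
```

Each provider would call this with its own budget, leaving headroom for the prompt and the expected output, and then summarize the per-chunk results in a second pass if more than one chunk comes back.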
Provider taxonomy
| Provider | Native audio | Long context | OpenAI-compat API | Notes |
|---|---|---|---|---|
| Google Gemini 2.5 | yes | 1M tokens | no | What you have today. |
| OpenAI GPT-4o | yes (audio modality) | 128K | yes (it's the reference) | Separate audio.transcriptions endpoint for Whisper. |
| Anthropic Claude | no | 200K | partial (via wrappers) | Need external transcription. |
| VeniceAI | no (most models) | varies | yes (OpenAI-compatible) | Privacy-focused; uses open models. |
| OpenAI-compatible local (Ollama, OpenWebUI, vLLM) | usually no | varies | yes | See section 3. |
The realistic picture: there are two LLM steps in your pipeline (transcription + topic analysis), and providers split into "can do both," "can do only the second," and "can't do either with quality." Multi-provider support means designing both steps as pluggable.
Provider abstraction
Define a small interface in server/:
// server/providers/types.ts
export interface TranscriptionProvider {
  name: string
  transcribe(audioChunk: Buffer, opts: { language?: string }): Promise<TranscriptResult>
}

export interface AnalysisProvider {
  name: string
  analyze(transcript: string, prompt: PromptSpec): AsyncIterable<AnalysisChunk>
}

export interface ProviderBundle {
  transcribe: TranscriptionProvider
  analyze: AnalysisProvider
}
Then concrete implementations:
server/providers/
  gemini.ts        # both transcribe + analyze, native audio
  openai.ts        # both transcribe (Whisper) + analyze
  claude.ts        # analyze only, pairs with whisper or deepgram
  venice.ts        # analyze only (OpenAI-compatible), pairs with whisper
  openwebui.ts     # analyze only, OpenAI-compatible at custom base URL
  whisper-cpp.ts   # local transcribe via whisper.cpp binary
  deepgram.ts      # remote transcribe, very cheap
Then a small registry that picks the right combo:
const BUNDLES: Record<string, ProviderBundle> = {
  'gemini':         { transcribe: gemini,  analyze: gemini },
  'openai-gpt4o':   { transcribe: openai,  analyze: openai },
  'claude+whisper': { transcribe: openai,  analyze: claude },    // transcription via OpenAI-hosted Whisper
  'venice+whisper': { transcribe: openai,  analyze: venice },    // transcription via OpenAI-hosted Whisper
  'local':          { transcribe: whisper, analyze: openwebui }, // whisper = the local whisper.cpp provider
}
The user picks a bundle in settings; /api/process reads it and dispatches accordingly.
What this looks like for the user
In the settings panel, the existing "Analysis Model" section becomes "Provider":
Provider:
◉ Gemini (default, fast, includes audio transcription)
○ OpenAI GPT-4o (Pro feature)
○ Anthropic Claude + Whisper (Pro feature)
○ VeniceAI + Whisper (Pro feature, privacy-focused)
○ Local LLM via OpenWebUI (Pro feature, see "Connect OpenWebUI")
Model: [model dropdown — provider-specific]
API Key: [______________] (per-provider, stored locally)
Each provider has its own API key field. Pro tier unlocks non-Gemini providers via a multi_provider entitlement.
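The entitlement gate can sit in front of the bundle registry. The sketch below uses the bundle names from the registry and falls back to `gemini` when the requested bundle is unknown or not unlocked; that fallback behaviour is an assumption, and the app could equally return an error instead.

```typescript
const KNOWN_BUNDLES = ['gemini', 'openai-gpt4o', 'claude+whisper', 'venice+whisper', 'local']
const UNGATED_BUNDLES = new Set(['gemini'])

// True when the license may use the named bundle.
function canUseBundle(bundleName: string, entitlements: string[]): boolean {
  if (!KNOWN_BUNDLES.includes(bundleName)) return false
  if (UNGATED_BUNDLES.has(bundleName)) return true
  return entitlements.includes('multi_provider')
}

// Resolve the user's saved setting to a bundle name that is safe to dispatch.
function resolveBundleName(requested: string, entitlements: string[]): string {
  return canUseBundle(requested, entitlements) ? requested : 'gemini'
}
```

Doing this check on the server in `/api/process`, not only in the settings UI, keeps a downgraded or expired license from continuing to use a Pro-only provider through a stale setting.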
Pricing/tier tie-in
This pairs cleanly with the relay design from §1: your bundled relay can support multiple providers behind the same bundled_api entitlement. Customer's tier determines which providers are reachable.
Core tier: ["core", "history", "library"]
→ Gemini only, BYO key
Pro tier: ["core", "history", "library", "subscriptions", "clips", "multi_provider", "bundled_api"]
→ all providers; bundled relay covers Gemini + OpenAI; BYO available for any
Effort estimate
- Provider interface + Gemini refactor: 1 day. Move existing logic into gemini.ts matching the new interface.
- OpenAI provider: 1 day. The OpenAI Node SDK is straightforward; transcription via audio.transcriptions.create({ model: 'whisper-1' }).
- Claude provider: 1 day. Pair with OpenAI's Whisper (or Deepgram) for audio. Prompt-engineer JSON output (Claude prefers <json>...</json> tags or function calling).
- VeniceAI: 0.5 day if it's truly OpenAI-compatible; it is basically the OpenAI provider with a different base URL.
- Frontend provider switcher + per-provider key fields: 1 day.
- Testing across providers: 2 days. Different audio quality, different JSON adherence, different latencies.
About a week of focused work for v1. Worth doing only if you have buyers actively asking, since maintenance scales with provider count (each one breaks differently when the vendor changes pricing/APIs).
3. OpenWebUI / local LLM integration
Why this is special
OpenWebUI on a Start9 box gives the user a self-hosted local LLM (Llama 3, Mistral, whatever they've pulled). It exposes an OpenAI-compatible API. From youtube-summarizer's perspective, it's "OpenAI provider but pointed at http://openwebui-internal.local:3000/v1" instead of https://api.openai.com/v1.
The Start9-specific superpower: service mesh dependencies. youtube-summarizer's manifest can declare a soft dependency on OpenWebUI, and StartOS will inject a hostname/port the customer's box can reach internally. No customer-typed URL required.
Architectural shape
[customer's Start9 box]
+--------------------------+ +-------------------------+
| youtube-summarizer | | OpenWebUI |
| (this app) |◄────►| (local LLM via Ollama) |
+--------------------------+ +-------------------------+
uses openwebui's hosts the actual model
OpenAI-compatible API (Llama 3 70B, etc.)
In manifest/index.ts:
dependencies: {
  'openwebui': {
    type: 'opt-in',
    description: 'Optional: use a local LLM running on this server.',
    versionRange: '>=0.4.0',
    requirement: 'optional',
  }
}
When the dependency is present and started, StartOS gives youtube-summarizer the OpenWebUI hostname (e.g. http://openwebui.embassy:8080 on the internal mesh). The app's "Local LLM" provider option becomes auto-configured.
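Once the base URL is known, the "Local LLM" provider only has to build a standard OpenAI-style chat request. The sketch below constructs the request without sending it. The `/chat/completions` suffix follows the OpenAI convention and is an assumption here: the exact path prefix and whether an API key is required depend on how the OpenWebUI or Ollama endpoint is exposed. The system prompt text is a placeholder.

```typescript
interface ChatRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildAnalysisRequest(
  baseUrl: string,
  model: string,
  transcript: string,
  apiKey?: string,
): ChatRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' }
  if (apiKey) headers['Authorization'] = `Bearer ${apiKey}`
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, '')}/chat/completions`,
    method: 'POST',
    headers,
    body: JSON.stringify({
      model,
      messages: [
        { role: 'system', content: 'Summarize the transcript as JSON topic chunks.' },
        { role: 'user', content: transcript },
      ],
    }),
  }
}
```

The result can be passed to `fetch(request.url, request)`. Because only `baseUrl` differs, the same builder would serve the hosted OpenAI-compatible providers as well.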
Limitations to call out clearly
- No native audio. Most local models won't accept audio; they're text-only. Need a separate transcription path: ship whisper.cpp in the youtube-summarizer container (~30 MB) and run it locally. The customer's CPU does the transcription. Slow on Pi-class hardware (a 1-hour podcast might take 10+ minutes); fine on desktop-class.
- JSON adherence varies. Local models are less reliable at structured output than Gemini/Claude. Need defensive parsing + retries. Consider using a JSON-schema-guided decoder (xgrammar, llguidance) if available in OpenWebUI's runner.
- Context windows are smaller. Llama 3.1 70B = 128K. Long podcasts may need chunked summarization-of-summarizations strategy. Existing chunking logic adapts but needs tuning.
- Compute cost. Running 70B inference on a Start9 box with no GPU is ~5–30 sec/token. Fine for a topic-summary of a transcript chunk; rough for a long-form summary. Consider Llama 3.1 8B as default.
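The "defensive parsing" mentioned in the JSON-adherence bullet can start with a tolerant extractor like the one below. It is a sketch, not an exhaustive repair routine: it tries the raw text, then the contents of a fenced block, then the outermost bracketed span, and returns `undefined` so the caller can retry the model call.

```typescript
// Built at runtime so this snippet does not contain a literal code fence.
const FENCE = '`'.repeat(3)

function extractJson(raw: string): unknown {
  const candidates: string[] = [raw.trim()]

  // Contents of a fenced block, if the model wrapped its answer in one.
  const fenceStart = raw.indexOf(FENCE)
  const fenceEnd = raw.lastIndexOf(FENCE)
  if (fenceStart !== -1 && fenceEnd > fenceStart) {
    const inner = raw.slice(fenceStart + FENCE.length, fenceEnd)
    candidates.push(inner.replace(/^[a-zA-Z]*\n/, '').trim()) // drop the language tag line
  }

  // Outermost {...} or [...] span, for answers with prose around the JSON.
  const open = raw.search(/[{\[]/)
  const close = Math.max(raw.lastIndexOf('}'), raw.lastIndexOf(']'))
  if (open !== -1 && close > open) candidates.push(raw.slice(open, close + 1))

  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate)
    } catch {
      // not valid JSON; try the next candidate
    }
  }
  return undefined
}
```

A caller would typically retry once or twice when this returns `undefined`, and validate the parsed shape before saving it to history.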
Pricing tie-in
This one's interesting — it's a Pro feature but has zero operator cost (the customer's hardware does the work). So it's pure margin once Pro is bought. Could even be marketed as: "Pro gives you Bundled Gemini OR your own local LLM — no API bills either way."
Effort estimate
- Manifest dependency declaration: 1 hour.
- OpenWebUI provider (subset of the OpenAI provider work): 1 day. Just a different base URL + auto-config from the StartOS-injected hostname.
- Local Whisper integration: 1 day. Ship the whisper.cpp binary, expose a transcription endpoint, fall back to Gemini-based transcription if the binary errors.
- Tuning prompts for smaller local models: 2 days. Llama 3.1 needs different prompting than Gemini. Iterative.
- Frontend "OpenWebUI detected" affordance: 0.5 day. When the dependency is present, show a green badge and one-click switch.
About 4–5 days. Cleanest if done after the multi-provider abstraction (§2) lands, since OpenWebUI is just another provider in that taxonomy.
How they all stack
There's a natural ordering of work:
- Land the relay first (§1). It's the foundation — once Gemini-via-relay works, every later provider plugs into the same relay pipeline.
- Multi-provider abstraction (§2). The provider interface is what makes the relay support multiple upstreams without growing fragile if-statements.
- OpenWebUI as a provider (§3). Just another bundle in the registry once §2 is done.
If you ship in that order, the work compounds. If you ship them out of order, each one needs partial rework when the next lands.
A reasonable cadence: relay v1 (~1 week) → multi-provider Pro feature (~1 week) → OpenWebUI integration (~4–5 days, per the estimate in §3). Roughly 3 weeks of focused work for the full vision, parallelizable in places.
Pricing scenarios for reference
Sketches only — actual numbers depend on volume. Goal is to show how the architecture supports different pricing strategies.
Scenario A: bundled-only, simple
- Core (one-time) → BYO Gemini key, all single-video features
- Pro (one-time) → Bundled Gemini (50 videos/day), all features
- Worst-case operator margin per Pro license = (Pro price) − (50 videos/day × cost-per-video × expected license lifetime in days), assuming the license uses its full cap every day
Scenario A+: bundled with credit top-ups
- Same as A, plus: when a Pro license hits its daily cap, buyer can purchase credit packs (100/500/2000) via BTCPay to extend within the same license. Captures heavy-use moments without forcing tier upgrades.
- Margin per credit pack = (pack price) − (units × cost-per-call). Set so each pack discount tier still nets ≥40% margin even at full burn-down.
- See section 1 for the architecture; effort is ~2 days on top of base relay.
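The margin formulas for Scenarios A and A+ can be written out as below. All amounts are in sats; the function names are hypothetical, and the 18 sats/call figure in the example is derived from the 40% target rather than from any measured Gemini cost.

```typescript
// Scenario A worst case: the license burns its full daily cap every day.
function proWorstCaseMargin(
  priceSats: number,
  dailyCap: number,
  costPerVideoSats: number,
  lifetimeDays: number,
): number {
  return priceSats - dailyCap * costPerVideoSats * lifetimeDays
}

// Scenario A+: margin on one credit pack at full burn-down, as a fraction of price.
function packMarginRatio(priceSats: number, units: number, costPerCallSats: number): number {
  return (priceSats - units * costPerCallSats) / priceSats
}

// Highest per-call cost that still leaves `targetRatio` margin on a pack.
function maxCostPerCall(priceSats: number, units: number, targetRatio: number): number {
  return (priceSats * (1 - targetRatio)) / units
}

// For the 2000-credit pack at 60,000 sats, a 40% margin target allows
// roughly 18 sats of upstream cost per call.
console.log(maxCostPerCall(60000, 2000, 0.4).toFixed(2))
console.log(packMarginRatio(60000, 2000, 18).toFixed(2))
```

Because the largest pack has the lowest per-credit price, it is the one that sets the ceiling on per-call cost; if it clears the margin target, the smaller packs do too.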
Scenario B: bundled subscription
- Core (one-time) → BYO key, all single-video features
- Pro Lite (subscription) → Bundled Gemini (10/day), all features
- Pro Unlimited (subscription) → Bundled Gemini (no cap), multi-provider, all features
- Per-month subscription billing → recurring cost coverage
Scenario C: provider-tiered
- Core (one-time) → BYO Gemini key only
- Pro Standard (one-time) → Bundled Gemini, OpenAI, multi-provider
- Pro Local (one-time) → Multi-provider including OpenWebUI/local LLM, no bundled
- Splits the customer base by what they actually want
The current tier setup (Core/Pro split) is structurally compatible with all of the scenarios above. You'd just adjust the entitlement-to-feature mapping in Keysat admin policies.