Keysat d4557304a5
docs(ai): record model-output robustness patterns
Capture what the first live SparkControl/Qwen run taught: looseInt decimal tolerance, the exerciseMatch name->library auto-mapping, and the thinking-token latency characteristic + its lever. Durable subsystem knowledge for future sessions touching the generate flows.
2026-06-19 16:23:36 -05:00

---
paths:
- proof-of-work/lib/ai/**
- proof-of-work/app/api/ai/**
---
# AI subsystem
Scoped guidance for the AI generation subsystem (`proof-of-work/lib/ai/**` and the
generate/generations route handlers). Whole-repo rules live in `AGENTS.md`.
## Architecture
- `generate/route.ts` kicks off a **detached background runner** (`generationRunner.ts`)
and returns an id; the client attaches via SSE (`generations/[id]/stream`) and can also
poll the row. Navigating away does NOT cancel generation.
- System prompt = `systemPromptBase.ts` (output contract: JSON-only, library
`exerciseId`s only, suggested weights) + the template's coaching prompt +
`PROGRAM_OUTPUT_SHAPE` + library + optional history block (`historyContext.ts`).
- Multi-config: `AIConfigProfile` rows per user; `UserPreferences.activeAIConfigId`
points at the active one and is mirrored into the legacy `ai*` columns for back-compat.
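The detached-runner flow can be sketched in memory. This is a sketch under assumptions: `kickoff`, `attach`, `poll` and the row shape are hypothetical, a `Map` stands in for the `AIGeneration` table, and a listener set stands in for the SSE stream. It shows the property the bullets rely on. The route returns an id before the model finishes, the runner writes to the row as it goes, and removing a listener never stops the run.

```typescript
// Sketch only: in-memory stand-ins for the AIGeneration row and the SSE fan-out.
// kickoff/attach/poll and these shapes are hypothetical, not generationRunner.ts.
type Status = "running" | "done" | "error";
type RunnerEvent = { type: "text"; text: string } | { type: "end"; status: Status };
type Listener = (event: RunnerEvent) => void;

interface GenerationRow {
  id: string;
  status: Status;
  output: string; // accumulated model text, persisted so polling works without a stream
}

const rows = new Map<string, GenerationRow>();
const listeners = new Map<string, Set<Listener>>();
let counter = 0;

function emit(id: string, event: RunnerEvent): void {
  for (const fn of listeners.get(id) ?? []) fn(event);
}

async function run(row: GenerationRow, source: () => AsyncIterable<string>): Promise<void> {
  try {
    for await (const text of source()) {
      row.output += text; // write-through to the row first
      emit(row.id, { type: "text", text }); // then notify whoever is attached (maybe nobody)
    }
    row.status = "done";
  } catch {
    row.status = "error";
  }
  emit(row.id, { type: "end", status: row.status });
}

// Route-handler side: create the row, start the runner WITHOUT awaiting it, return the id.
export function kickoff(source: () => AsyncIterable<string>): string {
  const row: GenerationRow = { id: `gen_${++counter}`, status: "running", output: "" };
  rows.set(row.id, row);
  void run(row, source); // detached: the HTTP response does not wait for the model
  return row.id;
}

// Client side (SSE-like): subscribing/unsubscribing never touches run(),
// so navigating away (unsubscribe) does not cancel the generation.
export function attach(id: string, fn: Listener): () => void {
  const set = listeners.get(id) ?? new Set<Listener>();
  set.add(fn);
  listeners.set(id, set);
  return () => {
    set.delete(fn);
  };
}

export function poll(id: string): GenerationRow | undefined {
  return rows.get(id);
}
```

Because output accumulates on the row, a client that only polls still sees everything generated so far.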
## Two generation kinds (`AIGeneration.kind`)
The runner spine is shared by two output shapes, discriminated by `AIGeneration.kind`
("program" | "workout", default "program"). The runner picks the parser by kind and
stores the JSON in the (reused) `parsedProgram` column.
- **program** (`kind: 'program'`): `generate/route.ts` → `programSchema.ts`
(`PROGRAM_OUTPUT_SHAPE` / `parseAIProgram`). Applied to DB rows via `apply.ts`.
Shown in AI · History (which filters `kind: 'program'`).
- **workout** (`kind: 'workout'`) — `generate-workout/route.ts` (uses
`workoutPrompt.ts` + `workoutSchema.ts`: `WORKOUT_OUTPUT_SHAPE` / `parseAIWorkout`).
A single day's session. **No server-side apply**: the client (`GenerateWorkoutClient.tsx`)
stashes the reviewed suggestion in `sessionStorage` and routes to
`/main/workouts/new?from=ai`, where `AiWorkoutPrefill.tsx` expands it (via
`workoutDraft.ts::buildPrefillExercises`) and pre-fills the normal `WorkoutForm`;
nothing persists until the user saves through the regular workout path.
**Refine = a new workout generation** seeded with the prior suggestion JSON
(`priorWorkout` in the route body → REVISION mode in `workoutPrompt.ts`). These rows
are ephemeral, so they're excluded from the program-shaped AI · History.
- Adding a new kind: extend the union in `KickoffOpts`, add a parser + output-shape,
branch the parser selection in `generationRunner.ts`, and decide whether it belongs in
History (filtered by kind).
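A sketch of the kind-discriminated spine, assuming stub parsers in place of `parseAIProgram`/`parseAIWorkout`; the `weeks`/`exercises` fields and all helper names are hypothetical. It shows the default kind, parser selection by kind, both shapes landing in the reused `parsedProgram` column, and History keeping only `program` rows.

```typescript
// Sketch: GenerationKind mirrors AIGeneration.kind; parser stubs and payload
// fields ("weeks", "exercises") are hypothetical stand-ins for the real parsers.
type GenerationKind = "program" | "workout";

interface ParsedProgram { kind: "program"; weeks: unknown[] }
interface ParsedWorkout { kind: "workout"; exercises: unknown[] }
type Parsed = ParsedProgram | ParsedWorkout;

interface GenerationRow {
  id: string;
  kind: GenerationKind;
  parsedProgram: Parsed | null; // reused column: holds either shape
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function parseProgramStub(json: unknown): ParsedProgram {
  if (!isRecord(json) || !Array.isArray(json.weeks)) throw new Error("not a program");
  return { kind: "program", weeks: json.weeks };
}

function parseWorkoutStub(json: unknown): ParsedWorkout {
  if (!isRecord(json) || !Array.isArray(json.exercises)) throw new Error("not a workout");
  return { kind: "workout", exercises: json.exercises };
}

// Record<GenerationKind, ...> is exhaustive: adding a kind to the union without
// a parser entry here fails to compile.
const PARSERS: Record<GenerationKind, (json: unknown) => Parsed> = {
  program: parseProgramStub,
  workout: parseWorkoutStub,
};

function newRow(id: string, kind: GenerationKind = "program"): GenerationRow {
  return { id, kind, parsedProgram: null };
}

// The runner picks the parser by kind and stores the result in the shared column.
function storeResult(row: GenerationRow, rawJson: string): void {
  row.parsedProgram = PARSERS[row.kind](JSON.parse(rawJson));
}

// AI · History lists program-shaped rows only.
function historyRows(all: GenerationRow[]): GenerationRow[] {
  return all.filter((r) => r.kind === "program");
}
```

Typing the parser table as a `Record` over the kind union turns a forgotten "branch the parser selection" step into a compile error; whether `generationRunner.ts` uses a table or a `switch` is not stated here.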
## Provider abstraction
- Each provider yields an async iterable of `GenerateChunk` (`text` / `usage` / `done` /
`error`); add new ones under `lib/ai/providers/` and register in `index.ts`.
`openai.ts` exports both `openai` and `openai-compatible`, so the five provider files
register **6** providers (`claude`, `openai`, `openai-compatible`, `gemini`, `ollama`,
`sparkcontrol`).
- **SparkControl** (`sparkcontrol.ts`) — the operator's own self-hosted local-inference
gateway. OpenAI-compatible wire format, so it reuses `generateOpenAIStyle` with
`{ requireApiKey: false }` (keyless on the LAN — the streamer omits the `Authorization`
header when no key is set). Reached over the **internal same-box StartOS address**
(`http://spark-control.startos:9999/v1`, plain HTTP — no TLS, no cert-skip). Custom base
URL ⇒ SSRF-guarded + admin-only, same as Ollama. The Settings UI auto-detects the loaded
vLLM model via `app/api/ai/sparkcontrol/model` (probes SparkControl's `/api/endpoints`
`vllm.model`), mirroring the Ollama `/api/tags` auto-detect. Free in the cost UI.
- **Base-URL hygiene:** only custom-URL providers (`requiresBaseUrl`: ollama,
openai-compatible, sparkcontrol) store a base URL. Both config write paths
(`configs` POST + `[id]` PATCH) null it for fixed-URL providers, and the Settings form
clears it on provider change — otherwise a stale URL silently rides along to
claude/openai/gemini, which ignore it and hit their hardcoded endpoints.
- Streaming AI uses SSE; partial JSON is recovered with `lib/ai/lenientJson.ts`.
- Pricing/model menus live in `lib/ai/pricing.ts` (`PRICES`, `MODEL_MENU`) — keep them
paired so every menu model has a price entry (there's a test enforcing this).
- **Adding a provider** (precedent: `sparkcontrol`, 1.2.0:7) fans out across ~8 spots; miss
one and it half-works:
  - the provider file, the `ProviderId` union (`types.ts`), and registration in
    `providers/index.ts` (`ALL` + `PROVIDER_ORDER`);
  - the zod `provider` enum in **both** `configs` POST and `[id]` PATCH (plus the
    `defaultName` PRETTY map);
  - the UI `PROVIDERS` list in `AIIntegration.tsx` (`requiresKey`/`requiresUrl` must mirror
    the server `requiresApiKey`/`requiresBaseUrl`);
  - `MODEL_MENU` (`[]` if there is no curated menu) and an `estimateCost` branch (free/null
    for self-hosted).

  A custom-URL provider is admin-only and SSRF-guarded everywhere (configs POST/PATCH,
  `ai/test`, any probe route) and must appear in those routes' 403 enumeration strings.
  `ai/test` and `generate` work for free once it's in `getProvider`.
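The base-URL hygiene and admin-only rules both key off one per-provider flag. A minimal sketch, assuming the six provider ids listed above and recording only the `requiresBaseUrl` flag this guide states; `normalizeBaseUrl`, `assertMayConfigure` and `modelsMissingPrice` are hypothetical helpers, and the menu/price shapes are not `pricing.ts`'s real types.

```typescript
// Sketch: flags limited to what this guide states; helper names are hypothetical.
type ProviderId = "claude" | "openai" | "openai-compatible" | "gemini" | "ollama" | "sparkcontrol";

const REQUIRES_BASE_URL: Record<ProviderId, boolean> = {
  claude: false,
  openai: false,
  "openai-compatible": true,
  gemini: false,
  ollama: true,
  sparkcontrol: true,
};

// Meant to run in BOTH config write paths (POST and PATCH) before persisting.
function normalizeBaseUrl(provider: ProviderId, baseUrl: string | null | undefined): string | null {
  if (!REQUIRES_BASE_URL[provider]) return null; // fixed-URL provider: drop any stale URL
  const trimmed = baseUrl?.trim();
  return trimmed ? trimmed : null;
}

// Custom-URL providers are admin-only on every route that accepts them.
function assertMayConfigure(provider: ProviderId, isAdmin: boolean): void {
  if (REQUIRES_BASE_URL[provider] && !isAdmin) {
    throw new Error(`403: ${provider} is admin-only`);
  }
}

// Pairing invariant: every model offered in a menu needs a price entry.
// Assumed shapes: menu = provider -> model ids, prices = model id -> anything.
function modelsMissingPrice(menu: Record<string, string[]>, prices: Record<string, unknown>): string[] {
  return Object.values(menu)
    .flat()
    .filter((model) => !(model in prices));
}
```

Routing both write paths through one normalizer is what keeps a stale URL from riding along after a provider switch; the Settings form clearing the field is then a UX nicety rather than the only defense.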
## Model-output robustness (esp. local models)
Local models (Qwen via SparkControl, Ollama) don't honor the JSON contract as tightly
as the cloud APIs, so the parse/apply path is deliberately tolerant. Two layers, both
added after the first SparkControl run surfaced the failures live:
- **Decimal integers** (1.2.0:8): models emit `"rpe": 7.5` / `"reps": 8.0` where the
schema expects ints. `looseInt(z.number().int()…)` (`programSchema.ts`, used by
`workoutSchema.ts`) rounds a number to the nearest int **before** the `.int()` check —
wrap every integer field in both schemas with it. Transform-before-validate, so inferred
types are unchanged. Without it, one stray decimal fails the ENTIRE parse.
- **Exercise→library name matching** (1.2.0:9): models return a good `exerciseName` with a
null or invented `exerciseId`. `lib/ai/exerciseMatch.ts` (`resolveExerciseIds`) normalizes
the name (lowercase, strip the `(barbell)`-style qualifier + punctuation) and auto-maps
only **unique confident** matches; ambiguous/unknown stay null so the UI flags them for
manual mapping. Wired into BOTH generate flows at the parse→display boundary
(`GenerateWorkoutClient`, `GenerateClient`) — re-resolve there if you add a third flow.
- **Latency characteristic (not a bug):** a thinking model (Qwen3.x) spends most of its
output tokens on internal reasoning, streamed as `reasoning_content`, which the OpenAI
streamer ignores (it reads only `delta.content`). So `tokensOut` can be ~10× the visible
JSON, and a generation can run for minutes: e.g. 7.4k tokens out for 2.8k chars of JSON took
~3 min on a DGX Spark at ~41 tok/s (7,400 / 41 ≈ 180 s). The lever is **disabling thinking
on the vLLM/SparkControl side** (or per request via
`"chat_template_kwargs": {"enable_thinking": false}`); it stays on by the owner's choice.
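Both tolerance layers fit in a few lines. This sketch is zod-free so it runs standalone; with zod, `z.preprocess(fn, schema)` runs `fn` before `schema` validates, which is the same ordering, though the guide doesn't say how `looseInt` is built. `looseIntSketch`, `normalizeName`, `resolveIdsSketch` and the exercise shapes are hypothetical. `Math.round` sends `.5` up; the real tie-breaking isn't stated. Treating punctuation as a word break (so `Back-Squat!` matches `Back Squat (Barbell)`) is likewise an assumption.

```typescript
// Hypothetical stand-ins for looseInt (programSchema.ts) and exerciseMatch.ts.

// Layer 1: round BEFORE the integer check, so 7.5 / 8.0 no longer fail the whole parse.
function looseIntSketch(value: unknown): number {
  const v = typeof value === "number" && Number.isFinite(value) ? Math.round(value) : value;
  if (typeof v !== "number" || !Number.isInteger(v)) {
    throw new Error(`expected an integer, got ${JSON.stringify(value)}`);
  }
  return v;
}

// Layer 2: name -> library id, auto-mapping only unique confident matches.
interface LibraryExercise { id: string; name: string }
interface SuggestedExercise { exerciseName: string; exerciseId: string | null }

function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ") // drop "(barbell)"-style qualifiers
    .replace(/[^a-z0-9\s]/g, " ") // punctuation becomes a word break
    .replace(/\s+/g, " ")
    .trim();
}

function resolveIdsSketch(items: SuggestedExercise[], library: LibraryExercise[]): SuggestedExercise[] {
  const knownIds = new Set(library.map((e) => e.id));
  const idsByName = new Map<string, string[]>();
  for (const e of library) {
    const key = normalizeName(e.name);
    idsByName.set(key, [...(idsByName.get(key) ?? []), e.id]);
  }
  return items.map((item) => {
    if (item.exerciseId !== null && knownIds.has(item.exerciseId)) return item; // real id: keep
    const candidates = idsByName.get(normalizeName(item.exerciseName)) ?? [];
    // Unique match -> map it; none (unknown) or several (ambiguous) -> null for manual mapping.
    return { ...item, exerciseId: candidates.length === 1 ? candidates[0] : null };
  });
}
```

Note that stripping the qualifier is what makes `Row (Dumbbell)` and `Row (Cable)` collide; leaving such collisions null is the "unique confident" rule doing its job rather than guessing.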
## SSRF / provider-URL safety
- Any `fetch` to a user-supplied provider base URL MUST go through
`assertSafeProviderUrl` (`lib/ai/safeUrl.ts`) first — it enforces http(s) and blocks
link-local/cloud-metadata (169.254/16, fe80::/10) + unspecified. **Private-LAN +
loopback are allowed on purpose** (reaching `ollama.startos`/LAN gateways is the
feature). Currently wired into `providers/ollama.ts`, the `openai-compatible` path in
`providers/openai.ts` (NOT the fixed `api.openai.com` path), and the `ai/ollama/models`
probe. Add the guard to any new user-URL fetch path.
- Custom-URL providers (those with `requiresBaseUrl`: ollama, openai-compatible,
sparkcontrol) are **admin-only**: `isCustomUrlProvider` gates `ai/configs` POST + `[id]`
PATCH + `ai/test`, and `ai/ollama/models` is fully admin-only. The Settings UI hides them
from non-admins. This is a second defense layer on top of the IP block; keep both when
adding routes.
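The IP-block layer can be sketched with the WHATWG `URL` parser, which canonicalizes hosts (IPv4 shorthand to dotted quads, IPv6 to compressed lowercase hex) before any check runs. This is a sketch, not `lib/ai/safeUrl.ts`: `blockedReason` and `assertSafeProviderUrlSketch` are hypothetical names. It inspects literal IP hosts only, so a DNS name that resolves to `169.254.169.254` would need a resolve-then-check step that this sketch does not include. IPv4-mapped IPv6 (`::ffff:a9fe:a9fe`) is unwrapped so it can't smuggle a metadata address past the IPv4 rule.

```typescript
// Sketch of the checks a guard like assertSafeProviderUrl performs; NOT safeUrl.ts.
// Limitation: literal IP hosts only; hostnames are not resolved here.

function ipv4Octets(host: string): number[] | null {
  const m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(host);
  return m ? m.slice(1).map(Number) : null;
}

function blockedIpv4(o: number[]): string | null {
  if (o.every((x) => x === 0)) return "unspecified address";
  if (o[0] === 169 && o[1] === 254) return "link-local / cloud metadata (169.254/16)";
  return null; // private LAN and loopback are allowed on purpose
}

function blockedIpv6(host: string): string | null {
  if (host === "::") return "unspecified address";
  // IPv4-mapped form, as serialized by URL: ::ffff:hhhh:hhhh
  const mapped = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(host);
  if (mapped) {
    const hi = parseInt(mapped[1], 16);
    const lo = parseInt(mapped[2], 16);
    return blockedIpv4([hi >> 8, hi & 0xff, lo >> 8, lo & 0xff]);
  }
  const first = parseInt(host.split(":")[0] || "0", 16);
  if ((first & 0xffc0) === 0xfe80) return "link-local (fe80::/10)";
  return null; // includes ::1 loopback
}

export function blockedReason(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "not a valid URL";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return "only http(s) is allowed";
  const host = url.hostname;
  if (host.startsWith("[") && host.endsWith("]")) return blockedIpv6(host.slice(1, -1));
  const v4 = ipv4Octets(host);
  return v4 ? blockedIpv4(v4) : null;
}

export function assertSafeProviderUrlSketch(raw: string): URL {
  const reason = blockedReason(raw);
  if (reason) throw new Error(`Blocked provider URL (${reason}): ${raw}`);
  return new URL(raw);
}
```

Parsing first and checking the canonical `hostname` (rather than regex-matching the raw string) is what defeats encodings like `0xa9.254.169.254`.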