d4557304a5
Capture what the first live SparkControl/Qwen run taught: looseInt decimal tolerance, the exerciseMatch name->library auto-mapping, and the thinking-token latency characteristic + its lever. Durable subsystem knowledge for future sessions touching the generate flows.
116 lines
7.6 KiB
Markdown
---
paths:
- proof-of-work/lib/ai/**
- proof-of-work/app/api/ai/**
---

# AI subsystem

Scoped guidance for the AI generation subsystem (`proof-of-work/lib/ai/**` and the
generate/generations route handlers). Whole-repo rules live in `AGENTS.md`.

## Architecture

- `generate/route.ts` kicks off a **detached background runner** (`generationRunner.ts`)
  and returns an id; the client attaches via SSE (`generations/[id]/stream`) and can also
  poll the `AIGeneration` row. Navigating away does NOT cancel generation.
- System prompt = `systemPromptBase.ts` (output contract: JSON-only, library
  `exerciseId`s only, suggested weights) + the template's coaching prompt +
  `PROGRAM_OUTPUT_SHAPE` + the exercise library + an optional history block
  (`historyContext.ts`).
- Multi-config: `AIConfigProfile` rows per user; `UserPreferences.activeAIConfigId`
  points at the active one and is mirrored into the legacy `ai*` columns for back-compat.

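
A minimal sketch of the kickoff-then-detach pattern, assuming a long-lived Node process. `kickoff`, `poll`, and the in-memory `Map` are hypothetical stand-ins: the real runner writes to the `AIGeneration` row and streams over SSE, and neither is modeled here. The key point is that the runner promise is started but never awaited, so the id comes back immediately and the work outlives the request.

```typescript
// Hypothetical in-memory stand-in for the AIGeneration table.
type Status = "running" | "done" | "error";

interface GenerationRow {
  id: string;
  status: Status;
  text: string;
  error?: string;
}

const rows = new Map<string, GenerationRow>();
let seq = 0;

// Starts the runner WITHOUT awaiting it and returns the id immediately,
// so a route handler can respond before generation finishes.
function kickoff(produce: () => AsyncIterable<string>): string {
  const id = `gen-${++seq}`;
  const row: GenerationRow = { id, status: "running", text: "" };
  rows.set(id, row);

  void (async () => {
    try {
      for await (const chunk of produce()) {
        row.text += chunk; // pollers read progress from the row
      }
      row.status = "done";
    } catch (err) {
      row.status = "error";
      row.error = err instanceof Error ? err.message : String(err);
    }
  })();

  return id;
}

// Polling path: read whatever the detached runner has written so far.
function poll(id: string): GenerationRow | undefined {
  return rows.get(id);
}
```

Because nothing awaits the runner, a client that navigates away only drops its SSE connection. The generation keeps writing to its row, and a later poll sees the final state.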
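
The system-prompt concatenation order can be sketched as follows. `buildSystemPrompt`, the `PromptParts` field names, and the blank-line separator are hypothetical; the real pieces and joining logic live in the files named above.

```typescript
// Hypothetical inputs mirroring the documented order.
interface PromptParts {
  base: string;           // systemPromptBase.ts output contract
  coachingPrompt: string; // the template's coaching prompt
  outputShape: string;    // e.g. PROGRAM_OUTPUT_SHAPE
  library: string;        // serialized exercise library (ids + names)
  history?: string;       // optional block from historyContext.ts
}

function buildSystemPrompt(p: PromptParts): string {
  const sections = [p.base, p.coachingPrompt, p.outputShape, p.library];
  // Omit the history section entirely instead of emitting an empty one.
  if (p.history !== undefined && p.history.trim() !== "") sections.push(p.history);
  return sections.join("\n\n");
}
```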
## Two generation kinds (`AIGeneration.kind`)

The runner spine is shared by two output shapes, discriminated by `AIGeneration.kind`
("program" | "workout", default "program"). The runner picks the parser by kind and
stores the resulting JSON in the `parsedProgram` column, which is reused for both kinds.

- **program** (`kind: 'program'`): `generate/route.ts` → `programSchema.ts`
  (`PROGRAM_OUTPUT_SHAPE` / `parseAIProgram`). Applied to DB rows via `apply.ts`.
  Shown in AI · History (which filters `kind: 'program'`).
- **workout** (`kind: 'workout'`): `generate-workout/route.ts` (uses
  `workoutPrompt.ts` + `workoutSchema.ts`: `WORKOUT_OUTPUT_SHAPE` / `parseAIWorkout`).
  A single day's session. **No server-side apply**: the client (`GenerateWorkoutClient.tsx`)
  stashes the reviewed suggestion in `sessionStorage` and routes to
  `/main/workouts/new?from=ai`, where `AiWorkoutPrefill.tsx` expands it (via
  `workoutDraft.ts::buildPrefillExercises`) and pre-fills the normal `WorkoutForm`;
  nothing persists until the user saves through the regular workout path.
  **Refine = a new workout generation** seeded with the prior suggestion JSON
  (`priorWorkout` in the route body → REVISION mode in `workoutPrompt.ts`). Workout rows
  are ephemeral, so they're excluded from the program-shaped AI · History.
- Adding a new kind: extend the union in `KickoffOpts`, add a parser + output shape,
  branch the parser selection in `generationRunner.ts`, and decide whether it belongs in
  History (which filters by kind).

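
One way to keep kind-to-parser selection exhaustive is a `Record` keyed by the kind union. The stub parsers below are hypothetical stand-ins for `parseAIProgram` / `parseAIWorkout`, and the real runner may branch differently. Defaulting to `"program"` mirrors the column default.

```typescript
type GenerationKind = "program" | "workout";

interface Parsed {
  kind: GenerationKind;
  data: unknown;
}

// Hypothetical stubs; the real parsers validate against programSchema.ts / workoutSchema.ts.
function parseProgramStub(raw: string): Parsed {
  return { kind: "program", data: JSON.parse(raw) };
}

function parseWorkoutStub(raw: string): Parsed {
  return { kind: "workout", data: JSON.parse(raw) };
}

// Keying a Record by the union makes the compiler reject a new kind
// that has no parser registered.
const PARSERS: Record<GenerationKind, (raw: string) => Parsed> = {
  program: parseProgramStub,
  workout: parseWorkoutStub,
};

function pickParser(kind: GenerationKind = "program"): (raw: string) => Parsed {
  return PARSERS[kind];
}
```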
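
The client-only handoff for workouts can be sketched against the Web Storage methods that `sessionStorage` implements (`getItem`, `setItem`, `removeItem`). The storage key, the `WorkoutSuggestion` shape, and the read-once-then-clear policy are assumptions for illustration. They do not necessarily match what `GenerateWorkoutClient.tsx` and `AiWorkoutPrefill.tsx` actually do.

```typescript
// Subset of the Web Storage API that sessionStorage implements.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key and payload shape.
const AI_WORKOUT_KEY = "ai-workout-suggestion";

interface WorkoutSuggestion {
  title: string;
  exercises: { exerciseId: string | null; exerciseName: string }[];
}

// Review page: stash the reviewed suggestion before routing to /main/workouts/new?from=ai.
function stashSuggestion(storage: StorageLike, s: WorkoutSuggestion): void {
  storage.setItem(AI_WORKOUT_KEY, JSON.stringify(s));
}

// Prefill page: read once and clear (an assumed policy) so a stale suggestion
// is not applied to a later blank form. No server call happens here.
function takeSuggestion(storage: StorageLike): WorkoutSuggestion | null {
  const raw = storage.getItem(AI_WORKOUT_KEY);
  if (raw === null) return null;
  storage.removeItem(AI_WORKOUT_KEY);
  try {
    return JSON.parse(raw) as WorkoutSuggestion;
  } catch {
    return null; // corrupted entry: fall back to an empty form
  }
}
```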
## Provider abstraction

- Each provider yields an async iterable of `GenerateChunk` (`text` / `usage` / `done` /
  `error`); add new ones under `lib/ai/providers/` and register them in `index.ts`.
  `openai.ts` exports both `openai` and `openai-compatible`, so the five provider files
  register **6** providers (`claude`, `openai`, `openai-compatible`, `gemini`, `ollama`,
  `sparkcontrol`).
- **SparkControl** (`sparkcontrol.ts`): the operator's own self-hosted local-inference
  gateway. It speaks the OpenAI-compatible wire format, so it reuses `generateOpenAIStyle`
  with `{ requireApiKey: false }` (keyless on the LAN; the streamer omits the
  `Authorization` header when no key is set). It is reached over the **internal same-box
  StartOS address** (`http://spark-control.startos:9999/v1`, plain HTTP: no TLS, no
  cert-skip). Custom base URL ⇒ SSRF-guarded + admin-only, same as Ollama. The Settings UI
  auto-detects the loaded vLLM model via `app/api/ai/sparkcontrol/model` (which probes
  SparkControl's `/api/endpoints` → `vllm.model`), mirroring the Ollama `/api/tags`
  auto-detect. Shown as free in the cost UI.
- **Base-URL hygiene:** only custom-URL providers (`requiresBaseUrl`: ollama,
  openai-compatible, sparkcontrol) store a base URL. Both config write paths
  (`configs` POST + `[id]` PATCH) null it for fixed-URL providers, and the Settings form
  clears it on provider change. Otherwise a stale URL silently rides along to
  claude/openai/gemini, which ignore it and hit their hardcoded endpoints.
- Streaming AI uses SSE; partial JSON is recovered with `lib/ai/lenientJson.ts`.
- Pricing/model menus live in `lib/ai/pricing.ts` (`PRICES`, `MODEL_MENU`). Keep them
  paired so every menu model has a price entry (a test enforces this).
- **Adding a provider** (precedent: `sparkcontrol`, 1.2.0:7) is a fan-out across ~8 spots,
  and missing one leaves it half-working: the provider file + `ProviderId` union
  (`types.ts`) + register in `providers/index.ts` (`ALL` + `PROVIDER_ORDER`); the zod
  `provider` enum in **both** `configs` POST and `[id]` PATCH (+ `defaultName` PRETTY map);
  the UI `PROVIDERS` list in `AIIntegration.tsx` (`requiresKey`/`requiresUrl` must mirror
  the server `requiresApiKey`/`requiresBaseUrl`); `MODEL_MENU` (`[]` if no curated menu) +
  an `estimateCost` branch (free/null for self-hosted). A custom-URL provider is admin-only
  + SSRF-guarded everywhere (configs POST/PATCH, `ai/test`, any probe route) and must appear
  in those routes' 403 enumeration strings. `ai/test` and `generate` work for free once it's
  in `getProvider`.

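
A sketch of draining a provider stream. It assumes `GenerateChunk` is a discriminated union over the four variants listed above. The `type` discriminant and the payload fields (`text`, `tokensIn`, `tokensOut`, `message`) are guesses, so check `types.ts` for the real shape.

```typescript
// Assumed shape: variant names come from this doc, field names are hypothetical.
type GenerateChunk =
  | { type: "text"; text: string }
  | { type: "usage"; tokensIn: number; tokensOut: number }
  | { type: "done" }
  | { type: "error"; message: string };

interface Collected {
  text: string;
  tokensIn: number;
  tokensOut: number;
}

// Accumulate text, keep the latest usage report, stop on `done`,
// and surface `error` as a thrown Error.
async function collect(stream: AsyncIterable<GenerateChunk>): Promise<Collected> {
  const out: Collected = { text: "", tokensIn: 0, tokensOut: 0 };
  for await (const chunk of stream) {
    switch (chunk.type) {
      case "text":
        out.text += chunk.text;
        break;
      case "usage":
        out.tokensIn = chunk.tokensIn;
        out.tokensOut = chunk.tokensOut;
        break;
      case "done":
        return out;
      case "error":
        throw new Error(chunk.message);
    }
  }
  return out; // stream ended without an explicit `done`
}
```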
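
The base-URL rule that both write paths apply can be sketched as a single normalizer. `normalizeBaseUrl` and this local `ProviderId` copy are hypothetical. The provider set mirrors the `requiresBaseUrl` list above, and treating an empty URL as `null` is an assumption (a real route may reject it instead).

```typescript
type ProviderId =
  | "claude" | "openai" | "openai-compatible" | "gemini" | "ollama" | "sparkcontrol";

// Providers that take a user-supplied base URL (requiresBaseUrl).
const REQUIRES_BASE_URL: ReadonlySet<ProviderId> = new Set<ProviderId>([
  "ollama",
  "openai-compatible",
  "sparkcontrol",
]);

// What a config write should persist for baseUrl. Fixed-URL providers always
// get null, so a stale URL from a previous provider cannot ride along.
function normalizeBaseUrl(provider: ProviderId, baseUrl: string | null | undefined): string | null {
  if (!REQUIRES_BASE_URL.has(provider)) return null;
  const trimmed = baseUrl?.trim() ?? "";
  return trimmed === "" ? null : trimmed;
}
```

On a PATCH that changes the provider, the normalizer has to run against the resulting provider, not the stored one. That is what clears the stale value.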
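
The menu/price pairing invariant is easy to check mechanically. This sketch assumes `MODEL_MENU` maps provider → model ids and `PRICES` is keyed by model id; the real structures in `pricing.ts` and the existing test may differ. The model names and prices in the test are placeholders, not real entries.

```typescript
// Assumed price shape (per million tokens); illustrative only.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Returns "provider/model" for every menu entry without a price.
function modelsMissingPrices(
  menu: Record<string, readonly string[]>,
  prices: Record<string, Price>,
): string[] {
  const missing: string[] = [];
  for (const [provider, models] of Object.entries(menu)) {
    for (const model of models) {
      // Own-property check, so names like "constructor" are not accidentally "priced".
      if (!Object.prototype.hasOwnProperty.call(prices, model)) {
        missing.push(`${provider}/${model}`);
      }
    }
  }
  return missing;
}
```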
## Model-output robustness (esp. local models)

Local models (Qwen via SparkControl, Ollama) don't honor the JSON contract as tightly
as the cloud APIs, so the parse/apply path is deliberately tolerant. Two tolerance layers
were added after the first SparkControl run surfaced these failures live; the same run
also exposed a latency characteristic:

- **Decimal integers** (1.2.0:8): models emit `"rpe": 7.5` / `"reps": 8.0` where the
  schema expects ints. `looseInt(z.number().int()…)` (`programSchema.ts`, used by
  `workoutSchema.ts`) rounds a number to the nearest int **before** the `.int()` check;
  wrap every integer field in both schemas with it. Because it transforms before it
  validates, inferred types are unchanged. Without it, one stray decimal fails the
  ENTIRE parse.
- **Exercise→library name matching** (1.2.0:9): models return a good `exerciseName` with a
  null or invented `exerciseId`. `lib/ai/exerciseMatch.ts` (`resolveExerciseIds`) normalizes
  the name (lowercase, strip the `(barbell)`-style qualifier + punctuation) and auto-maps
  only **unique, confident** matches; ambiguous/unknown names stay null so the UI flags
  them for manual mapping. Wired into BOTH generate flows at the parse→display boundary
  (`GenerateWorkoutClient`, `GenerateClient`); re-resolve there if you add a third flow.
- **Latency characteristic (not a bug):** a thinking model (Qwen3.x) spends most of its
  output tokens on internal reasoning, streamed as `reasoning_content`, which the OpenAI
  streamer ignores (it reads only `delta.content`). So `tokensOut` can be ~10× the visible
  JSON, and a generation can run for minutes (e.g. 7.4k output tokens for 2.8k chars of
  JSON took ~3 min on a DGX Spark at ~41 tok/s: 7,400 / 41 ≈ 180 s). The lever is
  **disabling thinking on the vLLM/SparkControl side** (or via a
  `chat_template_kwargs:{enable_thinking:false}` request param). Thinking is left on by
  the owner's choice.

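
The transform-before-validate idea, sketched without zod so it runs standalone. `Validator`, `int`, and this `looseInt` are hypothetical stand-ins for the zod schemas; in zod itself, the usual pre-validation hook is `z.preprocess`. Only finite numbers are rounded. Anything else passes through untouched, so the inner check still rejects it.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };
type Validator<T> = (input: unknown) => Result<T>;

// Strict integer check, standing in for z.number().int().
const int: Validator<number> = (v) =>
  typeof v === "number" && Number.isInteger(v)
    ? { ok: true, value: v }
    : { ok: false, error: `expected integer, got ${JSON.stringify(v)}` };

// Round finite numbers to the nearest integer BEFORE the inner check runs,
// so 7.5 -> 8 and 8.0 -> 8 pass; non-numbers are left for the inner check to reject.
function looseInt(inner: Validator<number>): Validator<number> {
  return (v) => inner(typeof v === "number" && Number.isFinite(v) ? Math.round(v) : v);
}

const rpe = looseInt(int);
```

`Math.round` rounds halves toward +∞, so `7.5` becomes `8`. That is an acceptable bias for RPE/reps, but worth knowing if a field ever allows negatives.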
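
A sketch of normalize-then-unique-match. It assumes that "confident" means exact equality after normalization and that a returned `exerciseId` already present in the library is kept as-is. `normalizeName`, `resolveIds`, and the two interfaces are hypothetical, and `resolveExerciseIds` may use different rules.

```typescript
interface LibraryExercise { id: string; name: string }
interface SuggestedExercise { exerciseId: string | null; exerciseName: string }

// Lowercase, drop "(barbell)"-style qualifiers, turn punctuation into spaces, collapse whitespace.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ")
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function resolveIds(items: SuggestedExercise[], library: LibraryExercise[]): SuggestedExercise[] {
  const knownIds = new Set(library.map((e) => e.id));
  // normalized name -> every library id that normalizes to it
  const byName = new Map<string, string[]>();
  for (const e of library) {
    const key = normalizeName(e.name);
    byName.set(key, [...(byName.get(key) ?? []), e.id]);
  }
  return items.map((item) => {
    if (item.exerciseId !== null && knownIds.has(item.exerciseId)) return item; // already valid
    const candidates = byName.get(normalizeName(item.exerciseName)) ?? [];
    // Only a unique match is auto-mapped; ambiguous or unknown names stay null for manual mapping.
    return { ...item, exerciseId: candidates.length === 1 ? candidates[0] : null };
  });
}
```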
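
If the request-param route were taken, the body would carry vLLM's `chat_template_kwargs` extension alongside the standard OpenAI chat fields, and Qwen3 chat templates read `enable_thinking` from it. This is a hedged sketch: `buildChatBody` is hypothetical, and nothing in this doc says `generateOpenAIStyle` sends the field today.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical body builder for an OpenAI-style /v1/chat/completions call.
// chat_template_kwargs is a vLLM extension; other OpenAI-compatible servers may ignore or reject it.
function buildChatBody(model: string, messages: ChatMessage[], opts: { thinking: boolean }) {
  return {
    model,
    messages,
    stream: true,
    // Send the extension only when turning thinking off, so default requests stay plain OpenAI.
    ...(opts.thinking ? {} : { chat_template_kwargs: { enable_thinking: false } }),
  };
}
```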
## SSRF / provider-URL safety

- Any `fetch` to a user-supplied provider base URL MUST go through
  `assertSafeProviderUrl` (`lib/ai/safeUrl.ts`) first. It enforces http(s) and blocks
  link-local/cloud-metadata addresses (169.254.0.0/16, fe80::/10) plus the unspecified
  addresses (`0.0.0.0`, `::`). **Private-LAN + loopback are allowed on purpose** (reaching
  `ollama.startos`/LAN gateways is the feature). Currently wired into `providers/ollama.ts`,
  the `openai-compatible` path in `providers/openai.ts` (NOT the fixed `api.openai.com`
  path), and the `ai/ollama/models` probe. Add the guard to any new user-URL fetch path.
- Custom-URL providers (those with `requiresBaseUrl`: ollama, openai-compatible,
  sparkcontrol) are **admin-only**: `isCustomUrlProvider` gates `ai/configs` POST + `[id]`
  PATCH + `ai/test`, and `ai/ollama/models` is fully admin-only. The Settings UI hides them
  from non-admins. This is a second defense layer on top of the IP block; keep both when
  adding routes.

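
A sketch of the checks described above, with stated limits. It inspects only the literal host and does no DNS resolution, so a hostname that *resolves* to 169.254.169.254 is not caught here. `assertSafeProviderUrlSketch`, `ipv4Octets`, and `UnsafeProviderUrlError` are hypothetical names. The sketch relies on WHATWG `URL` parsing, which normalizes numeric IPv4 hosts such as `http://2852039166/` to dotted quads before the check runs.

```typescript
class UnsafeProviderUrlError extends Error {}

// Parse a dotted-quad IPv4 host; returns null for anything else.
function ipv4Octets(host: string): number[] | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return null;
  const octets = m.slice(1).map(Number);
  return octets.every((o) => o <= 255) ? octets : null;
}

// http(s) only; block link-local/metadata (169.254.0.0/16, fe80::/10) and
// unspecified addresses; private-LAN and loopback are allowed on purpose.
function assertSafeProviderUrlSketch(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new UnsafeProviderUrlError(`invalid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new UnsafeProviderUrlError(`scheme not allowed: ${url.protocol}`);
  }

  // IPv6 literals come back bracketed from URL.hostname, e.g. "[fe80::1]".
  const host = url.hostname.replace(/^\[(.*)\]$/, "$1").toLowerCase();

  const v4 = ipv4Octets(host);
  if (v4) {
    if (v4[0] === 169 && v4[1] === 254) throw new UnsafeProviderUrlError("link-local/metadata address");
    if (v4.every((o) => o === 0)) throw new UnsafeProviderUrlError("unspecified address");
    return url;
  }

  if (host.includes(":")) {
    if (host === "::") throw new UnsafeProviderUrlError("unspecified address");
    const firstHextet = parseInt(host.split(":")[0] || "0", 16);
    // fe80::/10: the top 10 bits equal 1111111010, i.e. fe80 through febf.
    if ((firstHextet & 0xffc0) === 0xfe80) throw new UnsafeProviderUrlError("link-local address");
  }
  return url; // hostnames (e.g. ollama.startos) and LAN/loopback IPs pass
}
```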
|