AI subsystem
Scoped guidance for the AI generation subsystem (proof-of-work/lib/ai/** and the
generate/generations route handlers). Whole-repo rules live in AGENTS.md.
Architecture
- generate/route.ts kicks off a detached background runner (generationRunner.ts) and returns an id; the client attaches via SSE (generations/[id]/stream) and can also poll the row. Navigating away does NOT cancel generation.
- System prompt = systemPromptBase.ts (output contract: JSON-only, library exerciseIds only, suggested weights) + the template's coaching prompt + PROGRAM_OUTPUT_SHAPE + library + optional history block (historyContext.ts).
- Multi-config: AIConfigProfile rows per user; UserPreferences.activeAIConfigId points at the active one and is mirrored into the legacy ai* columns for back-compat.
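The prompt composition described above can be sketched as a simple concatenation. This is a minimal illustration, not the real assembly code: the `PromptParts` shape, the function name, and the blank-line separator are all assumptions; only the order of the sections comes from the list above.

```typescript
// Hypothetical input shape: one string per section named in the list above.
type PromptParts = {
  base: string;            // output contract (systemPromptBase.ts)
  coaching: string;        // the template's coaching prompt
  outputShape: string;     // e.g. PROGRAM_OUTPUT_SHAPE
  library: string;         // serialized exercise library
  history?: string | null; // optional history block (historyContext.ts)
};

// Joins the sections in the documented order; the history block is appended
// only when it is present and non-empty. The "\n\n" separator is an assumption.
function buildSystemPromptSketch(parts: PromptParts): string {
  const sections = [parts.base, parts.coaching, parts.outputShape, parts.library];
  if (parts.history && parts.history.trim() !== "") {
    sections.push(parts.history);
  }
  return sections.map((section) => section.trim()).join("\n\n");
}
```

Keeping the history block last and optional means a user with no logged workouts gets the same prompt prefix as everyone else.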
Two generation kinds (AIGeneration.kind)
The runner spine is shared by two output shapes, discriminated by AIGeneration.kind
("program" | "workout", default "program"). The runner picks the parser by kind and
stores the JSON in the (reused) parsedProgram column.
- program (kind: 'program'): generate/route.ts → programSchema.ts (PROGRAM_OUTPUT_SHAPE / parseAIProgram). Applied to DB rows via apply.ts. Shown in AI · History (which filters kind: 'program').
- workout (kind: 'workout'): generate-workout/route.ts (uses workoutPrompt.ts + workoutSchema.ts: WORKOUT_OUTPUT_SHAPE / parseAIWorkout). A single day's session. No server-side apply: the client (GenerateWorkoutClient.tsx) stashes the reviewed suggestion in sessionStorage and routes to /main/workouts/new?from=ai, where AiWorkoutPrefill.tsx expands it (via workoutDraft.ts::buildPrefillExercises) and pre-fills the normal WorkoutForm; nothing persists until the user saves through the regular workout path. Refine = a new workout generation seeded with the prior suggestion JSON (priorWorkout in the route body → REVISION mode in workoutPrompt.ts). These rows are ephemeral, so they're excluded from the program-shaped AI · History.
- Adding a new kind: extend the union in KickoffOpts, add a parser + output-shape, branch the parser selection in generationRunner.ts, and decide whether it belongs in History (filtered by kind).
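The "branch the parser selection" step can be sketched as an exhaustive switch over the kind union. The parsers and result shapes below (`parseProgramSketch`, `parseWorkoutSketch`, the `weeks` and `exercises` fields) are hypothetical stand-ins for parseAIProgram / parseAIWorkout, used only to show the pattern.

```typescript
type GenerationKind = "program" | "workout";

// Illustrative result shapes; the real schemas live in programSchema.ts / workoutSchema.ts.
type ParsedProgram = { kind: "program"; weeks: unknown[] };
type ParsedWorkout = { kind: "workout"; exercises: unknown[] };
type Parsed = ParsedProgram | ParsedWorkout;

function parseProgramSketch(raw: string): ParsedProgram {
  const data = JSON.parse(raw) as { weeks?: unknown };
  if (!Array.isArray(data.weeks)) throw new Error("program: missing weeks[]");
  return { kind: "program", weeks: data.weeks };
}

function parseWorkoutSketch(raw: string): ParsedWorkout {
  const data = JSON.parse(raw) as { exercises?: unknown };
  if (!Array.isArray(data.exercises)) throw new Error("workout: missing exercises[]");
  return { kind: "workout", exercises: data.exercises };
}

// Picks the parser by kind. The `never` assignment makes the compiler flag
// this switch when a new member is added to GenerationKind without a branch.
function parseByKind(kind: GenerationKind, raw: string): Parsed {
  switch (kind) {
    case "program":
      return parseProgramSketch(raw);
    case "workout":
      return parseWorkoutSketch(raw);
    default: {
      const unreachable: never = kind;
      throw new Error(`unknown generation kind: ${String(unreachable)}`);
    }
  }
}
```

With this shape, adding a third kind without a parser branch is a compile error rather than a runtime surprise.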
Provider abstraction
- Each provider yields an async iterable of GenerateChunk (text/usage/done/error); add new ones under lib/ai/providers/ and register in index.ts. openai.ts exports both openai and openai-compatible, so the five provider files register six providers (claude, openai, openai-compatible, gemini, ollama, sparkcontrol).
- SparkControl (sparkcontrol.ts): the operator's own self-hosted local-inference gateway. OpenAI-compatible wire format, so it reuses generateOpenAIStyle with { requireApiKey: false } (keyless on the LAN: the streamer omits the Authorization header when no key is set). Reached over the internal same-box StartOS address (http://spark-control.startos:9999/v1, plain HTTP: no TLS, no cert-skip). Custom base URL ⇒ SSRF-guarded + admin-only, same as Ollama. The Settings UI auto-detects the loaded vLLM model via app/api/ai/sparkcontrol/model (probes SparkControl's /api/endpoints → vllm.model), mirroring the Ollama /api/tags auto-detect. Free in the cost UI.
- Base-URL hygiene: only custom-URL providers (requiresBaseUrl: ollama, openai-compatible, sparkcontrol) store a base URL. Both config write paths (configs POST + [id] PATCH) null it for fixed-URL providers, and the Settings form clears it on provider change; otherwise a stale URL silently rides along to claude/openai/gemini, which ignore it and hit their hardcoded endpoints.
- Streaming AI uses SSE; partial JSON is recovered with lib/ai/lenientJson.ts.
- Pricing/model menus live in lib/ai/pricing.ts (PRICES, MODEL_MENU). Keep them paired so every menu model has a price entry (there's a test enforcing this).
- Adding a provider (precedent: sparkcontrol, 1.2.0:7) is a fan-out across ~8 spots; miss one and it half-works: the provider file + ProviderId union (types.ts) + register in providers/index.ts (ALL + PROVIDER_ORDER); the zod provider enum in both configs POST and [id] PATCH (+ defaultName PRETTY map); the UI PROVIDERS list in AIIntegration.tsx (requiresKey/requiresUrl must mirror the server requiresApiKey/requiresBaseUrl); MODEL_MENU ([] if no curated menu) + an estimateCost branch (free/null for self-hosted). A custom-URL provider is admin-only + SSRF-guarded everywhere (configs POST/PATCH, ai/test, any probe route) and must appear in those routes' 403 enumeration strings. ai/test and generate work for free once it's in getProvider.
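The provider contract (an async iterable of GenerateChunk, looked up through a registry) can be sketched as below. Only the four chunk kinds and the requiresApiKey / requiresBaseUrl flags come from the notes above; the field names on each chunk, the `ProviderSketch` interface, the `echo` provider, and the Map-based registry are assumptions made for illustration.

```typescript
// Chunk kinds from the notes above; the payload field names are assumed.
type GenerateChunk =
  | { type: "text"; text: string }
  | { type: "usage"; tokensIn: number; tokensOut: number }
  | { type: "done" }
  | { type: "error"; message: string };

interface ProviderSketch {
  id: string;
  requiresApiKey: boolean;
  requiresBaseUrl: boolean;
  generate(prompt: string): AsyncIterable<GenerateChunk>;
}

// A toy provider that streams a JSON object in two text chunks.
const echoProvider: ProviderSketch = {
  id: "echo",
  requiresApiKey: false,
  requiresBaseUrl: false,
  async *generate(prompt: string): AsyncGenerator<GenerateChunk> {
    yield { type: "text", text: '{"echo":' };
    yield { type: "text", text: JSON.stringify(prompt) + "}" };
    yield { type: "usage", tokensIn: 1, tokensOut: 2 };
    yield { type: "done" };
  },
};

// Hypothetical registry; the real one is ALL + PROVIDER_ORDER in providers/index.ts.
const registry = new Map<string, ProviderSketch>([[echoProvider.id, echoProvider]]);

function getProviderSketch(id: string): ProviderSketch {
  const provider = registry.get(id);
  if (!provider) throw new Error(`unknown provider: ${id}`);
  return provider;
}

// Drains a stream: concatenates text, records usage, and surfaces errors.
async function collectText(
  stream: AsyncIterable<GenerateChunk>,
): Promise<{ text: string; tokensOut: number | null }> {
  let text = "";
  let tokensOut: number | null = null;
  for await (const chunk of stream) {
    switch (chunk.type) {
      case "text":
        text += chunk.text;
        break;
      case "usage":
        tokensOut = chunk.tokensOut;
        break;
      case "error":
        throw new Error(chunk.message);
      case "done":
        return { text, tokensOut };
    }
  }
  return { text, tokensOut };
}
```

Because every provider speaks the same chunk union, the runner and the SSE route only need one consumer like `collectText`, regardless of which backend produced the stream.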
Model-output robustness (esp. local models)
Local models (Qwen via SparkControl, Ollama) don't honor the JSON contract as tightly as the cloud APIs, so the parse/apply path is deliberately tolerant. Two layers, both added after the first SparkControl run surfaced the failures live:
- Decimal integers (1.2.0:8): models emit "rpe": 7.5 / "reps": 8.0 where the schema expects ints. looseInt(z.number().int()…) (programSchema.ts, used by workoutSchema.ts) rounds a number to the nearest int before the .int() check; wrap every integer field in both schemas with it. Transform-before-validate, so inferred types are unchanged. Without it, one stray decimal fails the ENTIRE parse.
- Exercise→library name matching (1.2.0:9): models return a good exerciseName with a null or invented exerciseId. lib/ai/exerciseMatch.ts (resolveExerciseIds) normalizes the name (lowercase, strip the (barbell)-style qualifier + punctuation) and auto-maps only unique confident matches; ambiguous/unknown stay null so the UI flags them for manual mapping. Wired into BOTH generate flows at the parse→display boundary (GenerateWorkoutClient, GenerateClient); re-resolve there if you add a third flow.
- Latency characteristic (not a bug): a thinking model (Qwen3.x) spends most of its output tokens on internal reasoning, streamed as reasoning_content, which the OpenAI streamer ignores (it reads only delta.content). So tokensOut can be ~10× the visible JSON and a generation runs minutes (e.g. 7.4k out, 2.8k-char JSON, ~3 min on a DGX Spark at ~41 tok/s: 7,400 ÷ 41 ≈ 180 s). The lever is disabling thinking on the vLLM/SparkControl side (or via a chat_template_kwargs: {enable_thinking: false} request param); left on by owner's choice.
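The two tolerance layers above can be sketched without zod as follows. This is not the code in programSchema.ts or exerciseMatch.ts: the function names, the `LibraryExercise` shape, and the exact normalization regexes are assumptions that follow the described behavior (round before the integer check; auto-map only a unique normalized match).

```typescript
// Layer 1: round a decimal before the integer check (stand-in for looseInt).
function looseIntSketch(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new Error(`expected a finite number, got ${String(value)}`);
  }
  return Math.round(value);
}

// Layer 2: normalized name matching against the exercise library.
type LibraryExercise = { id: string; name: string };

function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ")  // drop "(barbell)"-style qualifiers
    .replace(/[^a-z0-9]+/g, " ") // collapse punctuation and whitespace
    .trim();
}

// Returns an id only when exactly one library entry matches; ambiguous or
// unknown names stay null so the UI can ask for a manual mapping.
function resolveExerciseIdSketch(
  exerciseName: string,
  library: LibraryExercise[],
): string | null {
  const key = normalizeName(exerciseName);
  if (key === "") return null;
  const matches = library.filter((entry) => normalizeName(entry.name) === key);
  const first = matches[0];
  return matches.length === 1 && first !== undefined ? first.id : null;
}
```

In this sketch, "Bench Press" against a library holding both a barbell and a dumbbell bench press stays null, because stripping the qualifier makes the two entries indistinguishable; that is the "ambiguous stays null" rule in action.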
SSRF / provider-URL safety
- Any fetch to a user-supplied provider base URL MUST go through assertSafeProviderUrl (lib/ai/safeUrl.ts) first. It enforces http(s) and blocks link-local/cloud-metadata (169.254.0.0/16, fe80::/10) + unspecified addresses. Private-LAN + loopback are allowed on purpose (reaching ollama.startos / LAN gateways is the feature). Currently wired into providers/ollama.ts, the openai-compatible path in providers/openai.ts (NOT the fixed api.openai.com path), and the ai/ollama/models probe. Add the guard to any new user-URL fetch path.
- Custom-URL providers (those with requiresBaseUrl: ollama, openai-compatible, sparkcontrol) are admin-only: isCustomUrlProvider gates ai/configs POST + [id] PATCH + ai/test, and ai/ollama/models is fully admin-only. The Settings UI hides them from non-admins. This is a second defense layer on top of the IP block; keep both when adding routes.
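The URL rules above (http(s) only; block link-local, cloud-metadata, and unspecified addresses; allow private-LAN and loopback) can be sketched as a hostname check. This is not the real assertSafeProviderUrl: the function name and error messages are assumptions, and the sketch inspects only the literal host, so it does not resolve DNS or unwrap IPv4-mapped IPv6 addresses.

```typescript
// Throws when the URL is not http(s) or targets a blocked address literal;
// returns the parsed URL otherwise.
function assertSafeProviderUrlSketch(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error(`invalid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }

  // WHATWG URL keeps the brackets around IPv6 literals; strip them.
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();

  if (host === "0.0.0.0" || host === "::") {
    throw new Error("unspecified address is not allowed");
  }

  // IPv4 link-local / cloud-metadata range: 169.254.0.0/16.
  const v4 = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host);
  if (v4 && v4[1] === "169" && v4[2] === "254") {
    throw new Error("link-local IPv4 address is not allowed");
  }

  // IPv6 link-local range: fe80::/10 (first group fe80 through febf).
  if (host.includes(":")) {
    const firstGroup = parseInt(host.split(":")[0] ?? "", 16);
    if (!Number.isNaN(firstGroup) && (firstGroup & 0xffc0) === 0xfe80) {
      throw new Error("link-local IPv6 address is not allowed");
    }
  }

  return url;
}
```

Loopback (127.0.0.1, ::1) and private ranges such as 192.168.0.0/16 pass on purpose, which is why the admin-only gate remains necessary as the second layer.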