proof-of-work/docs/guides/ai-subsystem.md
Keysat d4557304a5
docs(ai): record model-output robustness patterns
Capture what the first live SparkControl/Qwen run taught: looseInt decimal tolerance, the exerciseMatch name->library auto-mapping, and the thinking-token latency characteristic + its lever. Durable subsystem knowledge for future sessions touching the generate flows.
2026-06-19 16:23:36 -05:00


paths
proof-of-work/lib/ai/**
proof-of-work/app/api/ai/**

AI subsystem

Scoped guidance for the AI generation subsystem (proof-of-work/lib/ai/** and the generate/generations route handlers). Whole-repo rules live in AGENTS.md.

Architecture

  • generate/route.ts kicks off a detached background runner (generationRunner.ts) and returns an id; the client attaches via SSE (generations/[id]/stream) and can also poll the row. Navigating away does NOT cancel generation.
  • System prompt = systemPromptBase.ts (output contract: JSON-only, library exerciseIds only, suggested weights) + the template's coaching prompt + PROGRAM_OUTPUT_SHAPE + library + optional history block (historyContext.ts).
  • Multi-config: AIConfigProfile rows per user; UserPreferences.activeAIConfigId points at the active one and is mirrored into the legacy ai* columns for back-compat.
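
The kickoff/attach split in the first bullet is a fire-and-forget pattern: the route starts the work, does not await it, and returns an id the client can attach to. A minimal sketch, assuming an in-memory `Map` stands in for the `AIGeneration` row; `kickoff`, `getStatus`, and `GenerationRecord` are hypothetical names, not the `generationRunner.ts` API.

```typescript
// Hypothetical in-memory stand-in for the AIGeneration row.
type GenerationStatus = "running" | "done" | "error";

interface GenerationRecord {
  id: string;
  status: GenerationStatus;
  output?: string;
  error?: string;
}

const generations = new Map<string, GenerationRecord>();
let nextId = 0;

// Starts `work` detached and returns the id immediately. The caller never
// awaits the promise, so a route handler can respond before generation ends.
function kickoff(work: () => Promise<string>): string {
  const id = `gen-${++nextId}`;
  const record: GenerationRecord = { id, status: "running" };
  generations.set(id, record);

  work()
    .then((output) => {
      record.status = "done";
      record.output = output;
    })
    .catch((err: unknown) => {
      // Failures are written to the record instead of escaping as unhandled rejections.
      record.status = "error";
      record.error = err instanceof Error ? err.message : String(err);
    });

  return id;
}

// What a polling client or an SSE handler would read.
function getStatus(id: string): GenerationStatus | undefined {
  return generations.get(id)?.status;
}

const id = kickoff(async () => "{}");
console.log(id, getStatus(id)); // gen-1 running (the .then callback has not run yet)
```

Nothing holds the promise except the record it writes to, so closing the SSE stream or navigating away leaves nothing to cancel, which matches the behavior described above.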

Two generation kinds (AIGeneration.kind)

The runner spine is shared by two output shapes, discriminated by AIGeneration.kind ("program" | "workout", default "program"). The runner picks the parser by kind and stores the JSON in the (reused) parsedProgram column.

  • program (kind: 'program') — generate/route.ts → programSchema.ts (PROGRAM_OUTPUT_SHAPE / parseAIProgram). Applied to DB rows via apply.ts. Shown in AI · History (which filters kind: 'program').
  • workout (kind: 'workout') — generate-workout/route.ts (uses workoutPrompt.ts + workoutSchema.ts: WORKOUT_OUTPUT_SHAPE / parseAIWorkout). A single day's session. No server-side apply: the client (GenerateWorkoutClient.tsx) stashes the reviewed suggestion in sessionStorage and routes to /main/workouts/new?from=ai, where AiWorkoutPrefill.tsx expands it (via workoutDraft.ts::buildPrefillExercises) and pre-fills the normal WorkoutForm — nothing persists until the user saves through the regular workout path. Refine = a new workout generation seeded with the prior suggestion JSON (priorWorkout in the route body → REVISION mode in workoutPrompt.ts). These rows are ephemeral, so they're excluded from the program-shaped AI · History.
  • Adding a new kind: extend the union in KickoffOpts, add a parser + output-shape, branch the parser selection in generationRunner.ts, and decide whether it belongs in History (filtered by kind).
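
The runner's parser selection can be pictured as a table keyed by the `kind` discriminant. A sketch only: the toy parsers below check a single top-level array field whose names (`weeks`, `exercises`) are assumptions, standing in for the zod-backed `parseAIProgram` / `parseAIWorkout`.

```typescript
type GenerationKind = "program" | "workout";

type ParseResult = { ok: true; value: unknown } | { ok: false; error: string };
type Parser = (text: string) => ParseResult;

// Builds a toy parser that only checks for one top-level array field.
// The field names used below are illustrative, not the real output shapes.
function requireArrayField(field: string): Parser {
  return (text) => {
    let value: unknown = undefined;
    try {
      value = JSON.parse(text);
    } catch {
      return { ok: false, error: "invalid JSON" };
    }
    const obj = value as Record<string, unknown> | null;
    if (obj === null || typeof obj !== "object" || !Array.isArray(obj[field])) {
      return { ok: false, error: `missing ${field}[]` };
    }
    return { ok: true, value };
  };
}

// One parser per kind; the Record type forces an entry for every union member.
const PARSERS: Record<GenerationKind, Parser> = {
  program: requireArrayField("weeks"),
  workout: requireArrayField("exercises"),
};

// Mirrors the runner's default: an absent kind means "program".
function parseByKind(text: string, kind: GenerationKind = "program"): ParseResult {
  return PARSERS[kind](text);
}
```

Typing the table as `Record<GenerationKind, Parser>` turns part of the checklist above into a compile-time check: extending the union without adding a parser fails to type-check.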

Provider abstraction

  • Each provider yields an async iterable of GenerateChunk (text / usage / done / error); add new ones under lib/ai/providers/ and register in index.ts. openai.ts exports both openai and openai-compatible, so the five provider files register 6 providers (claude, openai, openai-compatible, gemini, ollama, sparkcontrol).
  • SparkControl (sparkcontrol.ts) — the operator's own self-hosted local-inference gateway. OpenAI-compatible wire format, so it reuses generateOpenAIStyle with { requireApiKey: false } (keyless on the LAN — the streamer omits the Authorization header when no key is set). Reached over the internal same-box StartOS address (http://spark-control.startos:9999/v1, plain HTTP — no TLS, no cert-skip). Custom base URL ⇒ SSRF-guarded + admin-only, same as Ollama. The Settings UI auto-detects the loaded vLLM model via app/api/ai/sparkcontrol/model (probes SparkControl's /api/endpoints → vllm.model), mirroring the Ollama /api/tags auto-detect. Free in the cost UI.
  • Base-URL hygiene: only custom-URL providers (requiresBaseUrl: ollama, openai-compatible, sparkcontrol) store a base URL. Both config write paths (configs POST + [id] PATCH) null it for fixed-URL providers, and the Settings form clears it on provider change — otherwise a stale URL silently rides along to claude/openai/gemini, which ignore it and hit their hardcoded endpoints.
  • Streaming AI uses SSE; partial JSON is recovered with lib/ai/lenientJson.ts.
  • Pricing/model menus live in lib/ai/pricing.ts (PRICES, MODEL_MENU) — keep them paired so every menu model has a price entry (there's a test enforcing this).
  • Adding a provider (precedent: sparkcontrol, 1.2.0:7) is a fan-out across ~8 spots — miss one and it half-works: the provider file + ProviderId union (types.ts) + register in providers/index.ts (ALL + PROVIDER_ORDER); the zod provider enum in both configs POST and [id] PATCH (+ defaultName PRETTY map); the UI PROVIDERS list in AIIntegration.tsx (requiresKey/requiresUrl must mirror the server requiresApiKey/requiresBaseUrl); MODEL_MENU ([] if no curated menu) + an estimateCost branch (free/null for self-hosted). A custom-URL provider is admin-only + SSRF-guarded everywhere (configs POST/PATCH, ai/test, any probe route) and must appear in those routes' 403 enumeration strings. ai/test and generate work for free once it's in getProvider.
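
Base-URL hygiene and the keyless SparkControl path reduce to two small rules. A sketch, assuming a `ProviderId` union of the six registered providers; `REQUIRES_BASE_URL`, `normalizeBaseUrl`, and `buildHeaders` are hypothetical helpers, not the actual config routes or `generateOpenAIStyle` internals.

```typescript
type ProviderId =
  | "claude"
  | "openai"
  | "openai-compatible"
  | "gemini"
  | "ollama"
  | "sparkcontrol";

// Providers that take a user-supplied base URL (requiresBaseUrl).
const REQUIRES_BASE_URL: ReadonlySet<ProviderId> = new Set<ProviderId>([
  "ollama",
  "openai-compatible",
  "sparkcontrol",
]);

// Config write paths: null the base URL for fixed-URL providers so a stale
// value cannot ride along after the user switches provider.
function normalizeBaseUrl(provider: ProviderId, baseUrl: string | null): string | null {
  return REQUIRES_BASE_URL.has(provider) ? baseUrl : null;
}

// OpenAI-style request headers: with no key configured (keyless LAN gateway),
// the Authorization header is left out entirely rather than sent empty.
function buildHeaders(apiKey: string | null): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return headers;
}
```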

Model-output robustness (esp. local models)

Local models (Qwen via SparkControl, Ollama) don't honor the JSON contract as tightly as the cloud APIs, so the parse/apply path is deliberately tolerant. Two layers, both added after the first SparkControl run surfaced the failures live:

  • Decimal integers (1.2.0:8): models emit "rpe": 7.5 / "reps": 8.0 where the schema expects ints. looseInt(z.number().int()…) (programSchema.ts, used by workoutSchema.ts) rounds a number to the nearest int before the .int() check — wrap every integer field in both schemas with it. Transform-before-validate, so inferred types are unchanged. Without it, one stray decimal fails the ENTIRE parse.
  • Exercise→library name matching (1.2.0:9): models return a good exerciseName with a null or invented exerciseId. lib/ai/exerciseMatch.ts (resolveExerciseIds) normalizes the name (lowercase, strip the (barbell)-style qualifier + punctuation) and auto-maps only unique confident matches; ambiguous/unknown stay null so the UI flags them for manual mapping. Wired into BOTH generate flows at the parse→display boundary (GenerateWorkoutClient, GenerateClient) — re-resolve there if you add a third flow.
  • Latency characteristic (not a bug): a thinking model (Qwen3.x) spends most of its output tokens on internal reasoning, streamed as reasoning_content — which the OpenAI streamer ignores (it reads only delta.content). So tokensOut can be ~10× the visible JSON and a generation runs minutes (e.g. 7.4k out, 2.8k-char JSON, ~3 min on a DGX Spark at ~41 tok/s). The lever is disabling thinking on the vLLM/SparkControl side (or via a chat_template_kwargs:{enable_thinking:false} request param); left on by the owner's choice.
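
Both tolerance layers can be sketched without dependencies. `roundLooseInt` isolates the round-before-`.int()` step that `looseInt` performs (the real helper wraps a zod schema; this one does not, and `Math.round` is an assumed rounding rule), and `resolveExerciseId` follows the normalize-then-unique-match rule described for `resolveExerciseIds`. The regexes, the `LibraryExercise` shape, and keeping an already-valid id are assumptions.

```typescript
// Layer 1: decimal integers.
// Round a finite number before the integer check so 7.5 or 8.0 pass instead of
// failing the whole parse. Anything else passes through untouched, so the
// validator still rejects strings, NaN, etc.
function roundLooseInt(value: unknown): unknown {
  return typeof value === "number" && Number.isFinite(value) ? Math.round(value) : value;
}

// Stand-in for the z.number().int() check that runs after the transform.
function isInt(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value);
}

// Layer 2: exercise name -> library id.
interface LibraryExercise {
  id: string;
  name: string;
}

// Lowercase, drop parenthetical qualifiers such as "(Barbell)", turn other
// punctuation into spaces, and collapse whitespace.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ")
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Keep an id that already exists in the library (assumed behavior). Otherwise
// auto-map only when exactly one library name normalizes to the same string;
// ambiguous or unknown names return null so the UI can flag them.
function resolveExerciseId(
  exerciseName: string,
  exerciseId: string | null,
  library: readonly LibraryExercise[],
): string | null {
  if (exerciseId !== null && library.some((e) => e.id === exerciseId)) {
    return exerciseId;
  }
  const target = normalizeName(exerciseName);
  const matches = library.filter((e) => normalizeName(e.name) === target);
  return matches.length === 1 ? matches[0].id : null;
}
```

With a library holding both "Bench Press (Barbell)" and "Bench Press (Dumbbell)", a model's "Bench Press" normalizes to a name shared by two entries, so it stays null instead of guessing.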

SSRF / provider-URL safety

  • Any fetch to a user-supplied provider base URL MUST go through assertSafeProviderUrl (lib/ai/safeUrl.ts) first — it enforces http(s) and blocks link-local/cloud-metadata (169.254/16, fe80::/10) + the unspecified address (0.0.0.0, ::). Private-LAN + loopback are allowed on purpose (reaching ollama.startos/LAN gateways is the feature). Currently wired into providers/ollama.ts, the openai-compatible path in providers/openai.ts (NOT the fixed api.openai.com path), and the ai/ollama/models probe. Add the guard to any new user-URL fetch path.
  • Custom-URL providers (those with requiresBaseUrl: ollama, openai-compatible, sparkcontrol) are admin-only: isCustomUrlProvider gates ai/configs POST + [id] PATCH + ai/test, and ai/ollama/models is fully admin-only. The Settings UI hides them from non-admins. This is a second defense layer on top of the IP block; keep both when adding routes.
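
A rough sketch of the address rules `assertSafeProviderUrl` is described as enforcing. It inspects only the literal host in the URL: it does not resolve DNS names and does not unpack IPv4-mapped IPv6 forms, and nothing here claims how `safeUrl.ts` handles either. `checkProviderUrl` and its null-or-reason return are hypothetical.

```typescript
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Returns null when the URL is acceptable, otherwise a reason string.
function checkProviderUrl(raw: string): string | null {
  const url = parseUrl(raw);
  if (url === null) return "not a valid URL";
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return `scheme ${url.protocol} not allowed`;
  }

  // WHATWG URL keeps IPv6 hosts bracketed, e.g. "[fe80::1]".
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();

  // IPv4 link-local / cloud metadata (169.254.0.0/16) and unspecified.
  if (/^169\.254\.\d+\.\d+$/.test(host)) return "link-local/metadata address blocked";
  if (host === "0.0.0.0") return "unspecified address blocked";

  // IPv6 link-local fe80::/10 spans fe80:: through febf::.
  if (/^fe[89ab][0-9a-f]:/.test(host)) return "IPv6 link-local address blocked";
  if (host === "::") return "unspecified address blocked";

  // Loopback and private-LAN hosts (e.g. ollama.startos) are deliberately allowed.
  return null;
}
```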