docs(ai): record model-output robustness patterns
Capture what the first live SparkControl/Qwen run taught: looseInt decimal tolerance, the exerciseMatch name->library auto-mapping, and the thinking-token latency characteristic + its lever. Durable subsystem knowledge for future sessions touching the generate flows.
@@ -76,6 +76,30 @@ stores the JSON in the (reused) `parsedProgram` column.
(configs POST/PATCH, `ai/test`, any probe route) and must appear in those routes' 403
enumeration strings. `ai/test` and `generate` work for free once it's in `getProvider`.
## Model-output robustness (esp. local models)

Local models (Qwen via SparkControl, Ollama) don't honor the JSON contract as tightly
as the cloud APIs, so the parse/apply path is deliberately tolerant. Two tolerance layers,
both added after the first live SparkControl run surfaced the failures, plus one latency
note:

- **Decimal integers** (1.2.0:8): models emit `"rpe": 7.5` / `"reps": 8.0` where the
  schema expects ints. `looseInt(z.number().int()…)` (`programSchema.ts`, used by
  `workoutSchema.ts`) rounds a number to the nearest int **before** the `.int()` check;
  wrap every integer field in both schemas with it. It transforms before validating, so
  inferred types are unchanged. Without it, one stray decimal fails the ENTIRE parse.
- **Exercise→library name matching** (1.2.0:9): models return a good `exerciseName` with a
  null or invented `exerciseId`. `lib/ai/exerciseMatch.ts` (`resolveExerciseIds`) normalizes
  the name (lowercase, strip the `(barbell)`-style qualifier + punctuation) and auto-maps
  only **unique confident** matches; ambiguous/unknown names stay null so the UI flags them
  for manual mapping. Wired into BOTH generate flows at the parse→display boundary
  (`GenerateWorkoutClient`, `GenerateClient`); re-resolve at the same boundary if you add a
  third flow.
- **Latency characteristic (not a bug):** a thinking model (Qwen3.x) spends most of its
  output tokens on internal reasoning, streamed as `reasoning_content`, which the OpenAI
  streamer ignores (it reads only `delta.content`). So `tokensOut` can be ~10× the visible
  JSON and a generation can run for minutes (e.g. 7.4k tokens out for a 2.8k-char JSON,
  ~3 min on a DGX Spark at ~41 tok/s). The lever is **disabling thinking on the
  vLLM/SparkControl side** (or via a `chat_template_kwargs:{enable_thinking:false}` request
  param); it is currently left on by the owner's choice.
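
To make the decimal-integer bullet concrete, here is a minimal, dependency-free sketch of the round-then-validate idea. It is not the code in `programSchema.ts`: the real `looseInt` wraps a Zod schema (presumably through something like `z.preprocess`, which is an assumption), whereas `Check` and `int` below are hypothetical stand-ins, and the use of `Math.round` for "nearest int" is also assumed.

```typescript
// Hypothetical stand-in for a schema check; the real code wraps a Zod schema.
type Check<T> = (value: unknown) => T;

// Strict integer check: rejects non-numbers and non-integer numbers.
const int: Check<number> = (value) => {
  if (typeof value !== "number" || !Number.isInteger(value)) {
    throw new Error(`expected integer, got ${JSON.stringify(value)}`);
  }
  return value;
};

// Round finite numbers first, then run the wrapped check.
// Anything that is not a finite number is passed through unchanged,
// so the inner check still rejects strings, null, NaN, and so on.
function looseInt<T>(check: Check<T>): Check<T> {
  return (value) =>
    check(
      typeof value === "number" && Number.isFinite(value)
        ? Math.round(value)
        : value,
    );
}

// Usage: the wrapped check accepts what the strict one would reject.
const reps = looseInt(int);
const parsedReps = reps(8.0); // 8
const parsedRpe = reps(7.5); // 8 (Math.round rounds .5 up)
```

Because the wrapper returns the same `Check<T>` it was given, the output type stays `number`, which mirrors the "inferred types are unchanged" point above.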
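
The latency figures in the last bullet can be checked with a little arithmetic, and the lever can be shown as a request body. The sketch below is hedged throughout: `StreamDelta`, `visibleText`, `estimateSeconds`, and `buildChatBody` are hypothetical helpers rather than names from the codebase, and the model name passed in is a placeholder. Only `delta.content`, `reasoning_content`, and `chat_template_kwargs: { enable_thinking: false }` come from the note itself.

```typescript
// Shape of one streamed delta, limited to the two fields discussed above.
interface StreamDelta {
  content?: string | null;
  reasoning_content?: string | null;
}

// A streamer that reads only `content` never sees the reasoning text,
// even though those tokens still count toward tokensOut.
function visibleText(deltas: StreamDelta[]): string {
  return deltas.map((d) => d.content ?? "").join("");
}

// Rough generation time from output tokens and decode throughput.
function estimateSeconds(tokensOut: number, tokensPerSecond: number): number {
  return tokensOut / tokensPerSecond;
}

// Body for an OpenAI-compatible chat completion with the thinking switch
// passed through chat_template_kwargs.
function buildChatBody(model: string, prompt: string, enableThinking: boolean) {
  return {
    model,
    stream: true,
    messages: [{ role: "user", content: prompt }],
    chat_template_kwargs: { enable_thinking: enableThinking },
  };
}

// 7.4k output tokens at ~41 tok/s is about 180 s, i.e. roughly 3 minutes.
const estimatedSeconds = estimateSeconds(7400, 41);

const shown = visibleText([
  { reasoning_content: "Let me plan the week first..." },
  { content: '{"days":' },
  { content: "[]}" },
]);

const body = buildChatBody("placeholder-model", "Generate a program", false);
```

The estimate lines up with the example in the note: 7400 / 41 is about 180 seconds. If the 2.8k-char JSON is taken as roughly 700 tokens (an assumed ~4 characters per token), the ~10× ratio between `tokensOut` and the visible output follows as well.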
## SSRF / provider-URL safety
- Any `fetch` to a user-supplied provider base URL MUST go through