Package v0.2.12→v0.2.124: manifest, actions, version graph

This commit is contained in:
Keysat
2026-06-13 13:36:30 -05:00
parent 318c6c4b81
commit 1243f4414c
126 changed files with 2052 additions and 441 deletions
+13
View File
@@ -0,0 +1,13 @@
import { VersionInfo } from '@start9labs/start-sdk'
export const v_0_2_114 = VersionInfo.of({
version: '0.2.114:0',
releaseNotes: {
en_US:
"Spark Control 503 'CUDA wedge' retry — proper multi-attempt loop on both transcribe AND diarize paths. The Parakeet wrapper and the Sortformer/TitaNet wrapper both return HTTP 503 with a body of {detail: 'Parakeet returned a transient error (likely CUDA wedge). Auto-restart triggered; retry in ~60s.'} when the GPU container wedges and Spark Control kicks an automatic container restart. Before this ship: the diarize path retried ONCE on 503 (then failed), and the transcribe path didn't have a 503-aware retry at all — its 'retrying bare' fallback was a PARAMETER fix (drop verbose_json) that ran immediately and just hit the same wedge again, then permanently failed the chunk. Symptom: a multi-chunk job that starts two chunks in parallel can hit a single wedge at job start and lose both chunks, breaking the whole pipeline. Now: each path waits and retries up to 4 attempts total per chunk, honoring the Retry-After header when supplied (clamped to [5s, 120s], default 60s if absent), with ±5s jitter so parallel chunks don't synchronize their wake-ups and pile back onto the freshly-restarted container at the exact same instant (would just re-wedge). The transcribe path's existing rich → bare parameter fallback still runs for non-503 4xx/5xx (a different failure mode). Error messages now distinguish 503-after-N-retries ('operator container may need manual restart') from one-shot failures. Net effect for an operator on Spark Control: a fresh container restart that takes 60-120s now mostly gets absorbed by the retry loop instead of killing the pipeline.",
},
migrations: {
up: async ({ effects }) => {},
down: async ({ effects }) => {},
},
})