197655a62b
Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention)
fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the
proxy relayed the upstream 500 verbatim, Open WebUI showed "Server
connection error" with no signal to retry, and recovery took up to
5 minutes (waiting for the next periodic deep-health probe).
Now the proxy:
1. Detects 500 from /v1/audio/transcriptions
2. Fires deep_health.run_one("parakeet") as a background asyncio task
(which contains the same wedge-detect + rate-limited auto-restart
logic, but runs immediately instead of waiting for the next tick)
3. Returns 503 with a clear detail message and Retry-After: 60
The client (Open WebUI, Home Assistant, etc.) gets a proper retry
signal; the auto-restart triggers inside seconds; the next attempt
~60s later succeeds. Rate-limiting (3 restarts per 30 min) is
inherited from the deep-health module so this can't cause restart
storms.
server.py: pass deep_health into build_audio_router().
audio_proxy.py: new 503-with-restart branch; signature now accepts
deep_health as an optional dependency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
14 lines
965 B
TypeScript
14 lines
965 B
TypeScript
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
|
|
|
export const v0_1_0 = VersionInfo.of({
|
|
version: '0.9.0:2',
|
|
releaseNotes: {
|
|
en_US:
|
|
'v0.9.0:2 — Open WebUI voice mode UX fix. Parakeet has a recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention) that fires reliably on Open WebUI\'s WebM/Opus→MP3 audio. Previously the proxy just relayed the upstream 500, Open WebUI showed "Server connection error", and you had to wait up to 5 min for the periodic deep-health probe to detect+restart Parakeet. Now: when Parakeet returns 500, the proxy fires deep-health\'s probe immediately in the background (which contains the same wedge-detect + rate-limited auto-restart logic) and returns 503 with Retry-After: 60 instead. The client gets a clear retry signal and the auto-restart kicks in within seconds. Retrying ~60s later should succeed reliably.',
|
|
},
|
|
migrations: {
|
|
up: async ({ effects }) => {},
|
|
down: IMPOSSIBLE,
|
|
},
|
|
})
|