Commit Graph

2 Commits

Author SHA1 Message Date
Keysat 197655a62b v0.9.0:2 - audio proxy: turn Parakeet wedge 500 into clean 503 + immediate auto-restart
Parakeet's recurring CUDA wedge (CUBLAS_STATUS_*_ERROR mid-attention)
fires reliably on Open WebUI's WebM/Opus->MP3 audio. Previously the
proxy relayed the upstream 500 verbatim, Open WebUI showed "Server
connection error" with no signal to retry, and recovery took up to
5 minutes (waiting for the next periodic deep-health probe).

Now the proxy:
  1. Detects 500 from /v1/audio/transcriptions
  2. Fires deep_health.run_one("parakeet") as a background asyncio task
     (which contains the same wedge-detect + rate-limited auto-restart
     logic, but runs immediately instead of waiting for the next tick)
  3. Returns 503 with a clear detail message and Retry-After: 60

The client (Open WebUI, Home Assistant, etc.) gets a proper retry
signal; the auto-restart triggers inside seconds; the next attempt
~60s later succeeds. Rate-limiting (3 restarts per 30 min) is
inherited from the deep-health module so this can't cause restart
storms.

server.py: pass deep_health into build_audio_router().
audio_proxy.py: new 503-with-restart branch; signature now accepts
                deep_health as an optional dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:07:35 -05:00
Keysat f44e7f8b03 v0.9.0:0 - OpenAI-compatible audio proxy for Open WebUI / Home Assistant
Adds three new endpoints to spark-control that translate OpenAI's
audio API shapes to the Parakeet (STT) and Magpie (TTS, NVIDIA Riva)
services on the Sparks:

  GET  /v1/models                — STT model + Magpie's 60+ voices
  POST /v1/audio/speech          — OpenAI body -> Magpie multipart synthesize
                                    (returns audio/wav passthrough)
  POST /v1/audio/transcriptions  — relay to Parakeet (already compatible)

Verified shapes against the live services:
  - Parakeet returns OpenAI-style {"text": "..."} or verbose_json with
    segments+words. Already a perfect drop-in for OpenAI clients.
  - Magpie returns raw WAV bytes with Content-Type: audio/wav. NOT
    base64-wrapped JSON as one might assume. The proxy is literally a
    body-translation on the request side; response is passthrough.

Voice language is auto-derived from the voice name (e.g.
Magpie-Multilingual.EN-US.Mia -> language=en-US) so clients don't
need to set it explicitly.

Open WebUI / Home Assistant / Recap Relay can now all point at one
URL — https://<spark-control>.local/v1 — and get LLM, STT, TTS
behind a single identity. No shim service to deploy.

Pure addition: no existing routes touched; the dashboard, /api/*,
download flow, deep-health, hardware probes are all unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:41:48 -05:00