v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API
- Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests)
- Add embeddings proxy and spark_embed service (Dockerfile + main.py)
- Expand audio_proxy with speaker-aware handling; deep_health/health/server updates
- Package: configureSparks action + sparkConfig model updates, manifest/main wiring
- Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh
+9
-1
@@ -1,6 +1,14 @@
# Known issues
## ~~magpie-tts crash loop (Spark 2)~~ — RESOLVED 2026-05-12
## Magpie removed in v0.14.0 (2026-06-03)
**Why**: Magpie/Riva's TTS decoder had a structural defect: it truncated output on ~30% of short inputs and on 50% or more of multi-sentence inputs, and a fresh-container restart did not help. The defect was reproduced server-side and confirmed in Riva's own logs (`status:0` with an implausibly short `audio_duration`). Switching to Riva's streaming endpoint produced the same failure rate. Even with v0.13.0:5's retry layer and v0.13.0:6's server-side chunking, end-to-end reliability capped at ~85%.

**What replaced it**: Kokoro-82M (Apache 2.0) via `ghcr.io/remsky/kokoro-fastapi-gpu`. It rendered 24/24 inputs successfully across the same input lengths on which Magpie failed 13 of 24 times, at ~1 s wall-clock per call and 1.3 GB of GPU memory (vs. Magpie's 49 GB). No retry or chunking layer is needed in the proxy. The default voice is `bm_george`; curated quick-picks include `bf_emma`, `am_michael`, and `af_heart`.

The old chunking/retry workaround in `audio_proxy.py` and the Magpie sections in the dashboard, config, services, and deep_health modules were all removed in v0.14.0. Migration: existing users need to pull and run the Kokoro container on Spark 2 (one `docker run` command), then either let Spark Control auto-discover it or, if it runs on a non-default host, update Configure Sparks.
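
The truncation signature noted under **Why** (a success status paired with an implausibly short `audio_duration`) can be checked mechanically when validating any TTS backend. The following is a minimal sketch, not the removed workaround from `audio_proxy.py`: the function names and the `min_seconds_per_word` floor are hypothetical, and the threshold is an illustrative guess rather than a measured value.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def looks_truncated(text: str, audio_duration_s: float,
                    min_seconds_per_word: float = 0.15) -> bool:
    """Flag audio that is implausibly short for the text it should contain.

    min_seconds_per_word is a hypothetical lower bound on speaking time
    per word; tune it against known-good renders before relying on it.
    """
    word_count = len(text.split())
    if word_count == 0:
        return False  # nothing to speak, so nothing can be truncated
    return audio_duration_s < word_count * min_seconds_per_word
```

A check like this only catches gross truncation; audio that loses just its last word or two can still pass the floor.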
## ~~magpie-tts crash loop (Spark 2)~~ — RESOLVED 2026-05-12, then Magpie removed entirely 2026-06-03
**What Magpie is:** NVIDIA's multilingual text-to-speech (TTS) model, served via the NIM (NVIDIA Inference Microservices) framework — a Riva Speech Server container that converts text into spoken audio. It's the counterpart to Parakeet (which is speech-to-text / STT). When working, it exposes `/v1/audio/speech` on port 9000 and is used by clients like Open WebUI for the "read aloud" feature.
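
For orientation, a client request to the `/v1/audio/speech` route could be built roughly as below. This is a sketch under assumptions: the JSON field names follow the OpenAI-style speech API, the `kokoro` model name and the `wav` response format are guesses, and `build_speech_request` is a hypothetical helper. Whether the proxy still serves this route on port 9000 after the Kokoro migration is not confirmed here, so the base URL is left to the caller.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str,
                         voice: str = "bm_george",
                         model: str = "kokoro") -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech.

    The payload field names and the default model name are assumptions
    modeled on the OpenAI-style speech API, not taken from this repo.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen(req, timeout=30)` sends the request and the response body holds the audio bytes, assuming the server accepts this payload.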