Add safe optimization flags to gemma4 + qwen36 (fastsafetensors, prefix-caching, fp8 kv)
Aligned with sibling recipes in eugr/spark-vllm-docker. Applies on next swap to each model. First real swap gemma4 -> qwen36 succeeded in 5:30 with --moe_backend=flashinfer_cutlass.
This commit is contained in:
@@ -2,6 +2,8 @@
|
||||
|
||||
## magpie-tts crash loop (Spark 2)
|
||||
|
||||
**What Magpie is:** NVIDIA's multilingual text-to-speech (TTS) model, served via the NIM (NVIDIA Inference Microservices) framework — a Riva Speech Server container that converts text into spoken audio. It's the counterpart to Parakeet (which is speech-to-text / STT). When working, it exposes `/v1/audio/speech` on port 9000 and is used by clients like Open WebUI for the "read aloud" feature.
|
||||
|
||||
The `magpie-tts` container at `nvcr.io/nim/nvidia/magpie-tts-multilingual:latest` is in a restart loop and `:9000` is not reachable. **Status as of 2026-05-12: unfixed. UI surfaces a red dot.**
|
||||
|
||||
**Root cause (from `docker logs magpie-tts`):**
|
||||
|
||||
Reference in New Issue
Block a user