# Known issues

## magpie-tts crash loop (Spark 2)

**What Magpie is:** NVIDIA's multilingual text-to-speech (TTS) model, served as a NIM (NVIDIA Inference Microservices) container, specifically a Riva Speech Server container that converts text into spoken audio. It is the counterpart to Parakeet, which does speech-to-text (STT). When working, it exposes `/v1/audio/speech` on port 9000, which clients such as Open WebUI use for their "read aloud" feature.

The `magpie-tts` container (`nvcr.io/nim/nvidia/magpie-tts-multilingual:latest`) is stuck in a restart loop, and `:9000` is not reachable. **Status as of 2026-05-12: unfixed. The UI shows a red dot for it.**

**Root cause (from `docker logs magpie-tts`):**
```
nimlib.exceptions.ManifestDownloadError: Error downloading manifest:
I/O error Permission denied (os error 13)
```
The container exits with status 1 from `nimutils.download_models()` while fetching the `nim/nvidia/magpie-tts-multilingual` model files from NGC. Although it happens during a download, "Permission denied" (os error 13, `EACCES`) is a local filesystem error: the container cannot write the model cache where it expects to.

**To diagnose further:**
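```bash
ssh <spark-user>@<spark-2-ip>
# Mounts:      host path ("Source") behind the model-cache mount.
# Config.User: user the container runs as (the UID to chown the cache to).
# Config.Env:  whether NGC_API_KEY is set (note: this prints its value in plain text).
docker inspect magpie-tts | jq '.[].Mounts, .[].Config.User, .[].Config.Env'
```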
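With the mount's host path (`Source` in the `Mounts` output) and the container user in hand, a check like the one below compares the cache directory's owner with that UID and reports free space on its filesystem. This is a sketch, not part of the project: `check_cache_dir` is a hypothetical helper, it assumes GNU coreutils `stat` and `df` on the Spark, it needs a numeric UID, and it only looks at ownership, so group or mode bits that might also grant write access are not considered.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): does WANT_UID own the cache dir?
# DIR:      host-side "Source" of the model-cache mount from `docker inspect`.
# WANT_UID: numeric UID the container runs as.
# Exit codes: 0 = owner matches, 1 = owner differs, 2 = DIR does not exist.
check_cache_dir() {
  local dir="$1" want_uid="$2" owner free_kb
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 2
  fi
  owner=$(stat -c '%u' "$dir")   # GNU stat: numeric owner UID
  # POSIX-format df output: line 2, column 4 is available space in 1K blocks.
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "owner_uid=$owner want_uid=$want_uid free_kb=$free_kb"
  [ "$owner" = "$want_uid" ]
}
```

Usage: `check_cache_dir <cache-source-path> <container-uid>`, where both arguments are placeholders for the values read from `docker inspect`.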
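**Likely fixes (untried):**

1. Chown the bind-mounted cache directory on Spark 2 to the UID the container runs as.
2. Set the `NGC_API_KEY` environment variable (NIM containers need it for non-public artifacts).
3. Confirm the cache volume has free disk space. On its own this is unlikely to explain the error: a full disk normally reports `No space left on device (os error 28)`, not os error 13.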
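After trying one of these fixes and restarting the container (`docker restart magpie-tts`), a plain TCP probe shows whether anything is listening on `:9000` again. The `port_open` helper below is a hypothetical sketch: it relies on bash's `/dev/tcp` redirection and coreutils `timeout`, and an open port only means the server is listening, not that speech synthesis works.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST:PORT accepts a TCP connection within 3 s.
# /dev/tcp is a bash feature (not a real file), so this needs bash, not sh.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `port_open <spark-2-ip> 9000 && echo up || echo down` from the laptop. It will likely keep failing while the container is still downloading the model.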
## Qwen3.6-35B-A3B `--moe_backend=flashinfer_cutlass` may fail on launch

This flag is Blackwell-specific. If vLLM in the container reports `unrecognized arguments: --moe_backend` or a similar error, edit the `qwen36` entry in `models.yaml` to drop the flag. The swap UI does NOT fall back automatically in v0.1; the failure only shows up in the log stream.
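If the swap log has been saved to a file, a grep like the one below tells whether vLLM rejected the flag. It is a sketch: `moe_flag_rejected` and the saved log file are hypothetical, and besides the `unrecognized arguments` error named above it also matches argparse's `invalid choice` message, on the assumption that a vLLM build that knows the option but not the `flashinfer_cutlass` value would fail that way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a saved launch log shows vLLM's argument parser
# rejecting the MoE backend flag (spelled with "_" or "-"), 1 otherwise.
moe_flag_rejected() {
  local log="$1"
  grep -Eq -e 'unrecognized arguments:.*--moe[_-]backend' \
           -e '--moe[_-]backend.*invalid choice' "$log"
}
```

If `moe_flag_rejected <saved-log-file>` succeeds, remove `--moe_backend=flashinfer_cutlass` from the `qwen36` entry and swap again.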
## Two SSH paths to Spark 1 from the laptop

`ssh <spark-user>@<spark-1-ip>` does NOT work from the laptop, because the NVIDIA Sync ssh_config only has a `Host` entry for `<spark-1-host>.local`, and ssh matches `Host` patterns against the hostname exactly as given on the command line. Always use a name that IS matched: the `.local` hostname, or `<spark-2-ip>`-style entries that have their own `Host` line.
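To check which `Host` block a given name actually hits, `ssh -G` prints the configuration ssh would use for that name and exits without connecting. The sketch below (the `ssh_effective` helper is hypothetical; `-G` needs OpenSSH 6.8 or newer) keeps the fields that usually decide whether login works. If the Sync entry is what supplies the user and key, comparing `ssh_effective <spark-1-host>.local` with `ssh_effective <spark-1-ip>` should show different `user`/`identityfile` values for the two names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the effective ssh settings for HOST without connecting.
# Optional CONFIG is passed as `ssh -F`, replacing the usual config files (handy for testing).
ssh_effective() {
  local host="$1" cfg="${2:-}"
  local -a opts=()
  if [ -n "$cfg" ]; then
    opts=(-F "$cfg")
  fi
  # ssh -G prints one "keyword value" pair per line, keywords in lowercase.
  ssh -G "${opts[@]}" "$host" | awk '$1 == "hostname" || $1 == "user" || $1 == "identityfile"'
}
```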
## Older models in `models.yaml`

The `qwen3-235b-fp8` and `qwen25-72b` catalog entries use conservative guesses for their vLLM flags: the models are on disk but were never the focus of this project. The first launch of either may fail or run suboptimally; record the flags that work here.