Compare commits
43 Commits
v0.2
..
0ba2a3a3fc
| Author | SHA1 | Date | |
|---|---|---|---|
| 0ba2a3a3fc | |||
| b87cb0f99b | |||
| 346df907d2 | |||
| 82224f53e7 | |||
| a7102105aa | |||
| ddfd508c2f | |||
| 6664543dec | |||
| 33768ae3d7 | |||
| 8a5862dcb2 | |||
| 589b3e59ab | |||
| d67152624e | |||
| 26382dc932 | |||
| c07eaeb4ee | |||
| 36ca99f73b | |||
| 5662d957af | |||
| 5a5634a3a9 | |||
| 9edb70418e | |||
| b7be1bab24 | |||
| 6b799113c4 | |||
| d90e7a230a | |||
| 6209c40f79 | |||
| e332363004 | |||
| ea35ac03ef | |||
| 1a86fb0bf0 | |||
| 66be0c1fc1 | |||
| 91b5d6d6a6 | |||
| 4c67ccd28d | |||
| ea328c2e2f | |||
| 27bfc2d6fd | |||
| 61e5d5cce8 | |||
| 0aa4bfb303 | |||
| ad68e0e16e | |||
| d6fefec017 | |||
| 22c817a4ec | |||
| d718a3b78a | |||
| f99241ec3e | |||
| d6f4390372 | |||
| 8a95609504 | |||
| f553547c32 | |||
| 49f776f172 | |||
| 4a3eeb4f20 | |||
| f2beb500e7 | |||
| db2f4269da |
@@ -1,6 +1,6 @@
|
|||||||
MIT License
|
MIT License
|
||||||
|
|
||||||
Copyright (c) 2026 Alice
|
Copyright (c) 2026 Grant
|
||||||
|
|
||||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
of this software and associated documentation files (the "Software"), to deal
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
|||||||
@@ -31,17 +31,17 @@ Two layers in this repo:
|
|||||||
cd image
|
cd image
|
||||||
python3 -m venv .venv && source .venv/bin/activate
|
python3 -m venv .venv && source .venv/bin/activate
|
||||||
pip install -e .
|
pip install -e .
|
||||||
export SPARK1_HOST=<spark-1-ip>
|
export SPARK1_HOST=192.168.1.103
|
||||||
export SPARK1_USER=<spark-user>
|
export SPARK1_USER=modelo
|
||||||
export SPARK2_HOST=<spark-2-ip>
|
export SPARK2_HOST=192.168.1.87
|
||||||
export SPARK2_USER=<spark-user>
|
export SPARK2_USER=modelo
|
||||||
export SSH_KEY_PATH="$HOME/Library/Application Support/NVIDIA/Sync/config/nvsync.key"
|
export SSH_KEY_PATH="$HOME/Library/Application Support/NVIDIA/Sync/config/nvsync.key"
|
||||||
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload
|
uvicorn app.server:app --host 0.0.0.0 --port 9999 --reload
|
||||||
```
|
```
|
||||||
|
|
||||||
Open <http://localhost:9999>.
|
Open <http://localhost:9999>.
|
||||||
|
|
||||||
> **Note:** use the **IP** `<spark-1-ip>` for Spark 1, not `<spark-1-host>.local`. mDNS resolves to IPv6 first and `httpx` hangs on it because vLLM only binds IPv4.
|
> **Note:** use the **IP** `192.168.1.103` for Spark 1, not `spark-27ea.local`. mDNS resolves to IPv6 first and `httpx` hangs on it because vLLM only binds IPv4.
|
||||||
|
|
||||||
## Build the StartOS package
|
## Build the StartOS package
|
||||||
|
|
||||||
@@ -58,8 +58,8 @@ To sideload onto your Start9: `make install` (needs `host:` set in `~/.startos/c
|
|||||||
## Post-install setup (one-time per Start9 install)
|
## Post-install setup (one-time per Start9 install)
|
||||||
|
|
||||||
1. Open the Spark Control service → **Actions** → **Show Public Key** → copy the line.
|
1. Open the Spark Control service → **Actions** → **Show Public Key** → copy the line.
|
||||||
2. SSH to each Spark and append the line to `~/.ssh/authorized_keys` for the `<spark-user>` user.
|
2. SSH to each Spark and append the line to `~/.ssh/authorized_keys` for the `modelo` user.
|
||||||
3. **Actions** → **Configure Sparks** → enter `<spark-1-ip>` / `<spark-user>` for Spark 1 and `<spark-2-ip>` / `<spark-user>` for Spark 2.
|
3. **Actions** → **Configure Sparks** → enter `192.168.1.103` / `modelo` for Spark 1 and `192.168.1.87` / `modelo` for Spark 2.
|
||||||
4. Start the service. Open the Web UI — current model + health should show within ~5 s.
|
4. Start the service. Open the Web UI — current model + health should show within ~5 s.
|
||||||
|
|
||||||
## Repo layout
|
## Repo layout
|
||||||
@@ -76,14 +76,32 @@ Other services on your LAN can hit `GET /api/endpoints` to learn where the curre
|
|||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"vllm": { "ready": true, "base_url": "http://<spark-1-ip>:8888/v1", "model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4", "openai_compat": true },
|
"vllm": { "ready": true, "base_url": "http://192.168.1.103:8888/v1", "model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4", "openai_compat": true },
|
||||||
"parakeet":{ "ready": true, "base_url": "http://<spark-2-ip>:8000", "kind": "stt", "model": "nvidia/parakeet-tdt-0.6b-v3" },
|
"parakeet":{ "ready": true, "base_url": "http://192.168.1.87:8000", "kind": "stt", "model": "nvidia/parakeet-tdt-0.6b-v3" },
|
||||||
"magpie": { "ready": false, "base_url": "http://<spark-2-ip>:9000", "kind": "tts" }
|
"magpie": { "ready": false, "base_url": "http://192.168.1.87:9000", "kind": "tts" }
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
`base_url` is filled in whenever Configure Sparks has been completed (even if the underlying service isn't currently up). Pair the URL with `ready: true` to safely route traffic.
|
`base_url` is filled in whenever Configure Sparks has been completed (even if the underlying service isn't currently up). Pair the URL with `ready: true` to safely route traffic.
|
||||||
|
|
||||||
|
## Reporting failures from external apps
|
||||||
|
|
||||||
|
Spark Control polls every 5 s, so a brief blip in Parakeet/Magpie/vLLM availability can slip between polls and never make it into the connectivity log. To capture short failures, an external app (e.g. Open WebUI) can POST whenever a call fails (or succeeds):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://<dashboard-url>/api/health-event \
|
||||||
|
-H 'content-type: application/json' \
|
||||||
|
-d '{
|
||||||
|
"service": "parakeet",
|
||||||
|
"ok": false,
|
||||||
|
"source": "open-webui",
|
||||||
|
"error": "HTTP 503",
|
||||||
|
"ms": 420
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Fields: `service` (required), `ok` (required), `source` (optional, free-form), `error` (optional), `ms` (optional latency). Each POST appends a `report` event to the connectivity log alongside the polling-based transition events.
|
||||||
|
|
||||||
## Status
|
## Status
|
||||||
|
|
||||||
**v0.2.3** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
|
**v0.2.3** — installed and verified on a Start9 server. Five bundled LLMs in the catalog (qwen3-vl, gemma4, qwen36, qwen3-235b-fp8, qwen2.5-72b), plus any custom models added through the UI.
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
# Project: spark-control — Model switcher web UI for dual DGX Spark cluster
|
# Project: spark-control — Model switcher web UI for dual DGX Spark cluster
|
||||||
|
|
||||||
> **Update 2026-05-12 — Direction change:** the web UI is being built as a
|
> **Update 2026-05-12 — Direction change:** the web UI is being built as a
|
||||||
> **StartOS 0.4 package** (sideloaded onto Alice's existing Start9 server),
|
> **StartOS 0.4 package** (sideloaded onto Grant's existing Start9 server),
|
||||||
> **not** as a FastAPI service running directly on Spark 1. The Start9 server
|
> **not** as a FastAPI service running directly on Spark 1. The Start9 server
|
||||||
> shares a LAN with the Sparks and SSHes into Spark 1 to invoke
|
> shares a LAN with the Sparks and SSHes into Spark 1 to invoke
|
||||||
> `launch-cluster.sh`. StartOS handles `.local` exposure and HTTPS; SSH
|
> `launch-cluster.sh`. StartOS handles `.local` exposure and HTTPS; SSH
|
||||||
@@ -38,8 +38,8 @@ The web UI itself, when deployed, will run on **Spark 1** (where it can directly
|
|||||||
From my laptop I can SSH to either Spark directly:
|
From my laptop I can SSH to either Spark directly:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-1-ip> # Spark 1
|
ssh modelo@192.168.1.103 # Spark 1
|
||||||
ssh <spark-user>@<spark-2-ip> # Spark 2
|
ssh modelo@192.168.1.87 # Spark 2
|
||||||
```
|
```
|
||||||
|
|
||||||
(I can also use SSH key auth — set up earlier.)
|
(I can also use SSH key auth — set up earlier.)
|
||||||
@@ -47,7 +47,7 @@ ssh <spark-user>@<spark-2-ip> # Spark 2
|
|||||||
When you need to run a command on a Spark, use this pattern:
|
When you need to run a command on a Spark, use this pattern:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-1-ip> 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
|
ssh modelo@192.168.1.103 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
|
||||||
```
|
```
|
||||||
|
|
||||||
For multi-line commands or scripts, you can pipe a heredoc or just SSH in directly and run them interactively. Either works — but always tell me what you're about to run so I can review.
|
For multi-line commands or scripts, you can pipe a heredoc or just SSH in directly and run them interactively. Either works — but always tell me what you're about to run so I can review.
|
||||||
@@ -55,19 +55,19 @@ For multi-line commands or scripts, you can pipe a heredoc or just SSH in direct
|
|||||||
For file transfers between my laptop and the Sparks, use `rsync`:
|
For file transfers between my laptop and the Sparks, use `rsync`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
rsync -avz ~/Projects/spark-control/ <spark-user>@<spark-1-ip>:~/spark-control/
|
rsync -avz ~/Projects/spark-control/ modelo@192.168.1.103:~/spark-control/
|
||||||
```
|
```
|
||||||
|
|
||||||
## My hardware and what's running
|
## My hardware and what's running
|
||||||
|
|
||||||
**Two NVIDIA DGX Spark units** networked together:
|
**Two NVIDIA DGX Spark units** networked together:
|
||||||
|
|
||||||
- **Spark 1** — hostname `<spark-1-host>`, LAN IP `<spark-1-ip>`, QSFP IP `<spark-1-qsfp-ip>`. Head node for the vLLM cluster.
|
- **Spark 1** — hostname `spark-27ea`, LAN IP `192.168.1.103`, QSFP IP `192.168.100.10`. Head node for the vLLM cluster.
|
||||||
- **Spark 2** — hostname `<spark-2-host>`, LAN IP `<spark-2-ip>`, QSFP IP `<spark-2-qsfp-ip>`. Worker node for vLLM cluster, also hosts standalone services.
|
- **Spark 2** — hostname `spark-32d0`, LAN IP `192.168.1.87`, QSFP IP `192.168.100.11`. Worker node for vLLM cluster, also hosts standalone services.
|
||||||
|
|
||||||
Both run Ubuntu 24.04, NVIDIA driver 580.x, CUDA 13.0, Docker, and have 128 GB unified memory each. They share a QSFP cable for high-speed (200 Gb/s) inter-node networking.
|
Both run Ubuntu 24.04, NVIDIA driver 580.x, CUDA 13.0, Docker, and have 128 GB unified memory each. They share a QSFP cable for high-speed (200 Gb/s) inter-node networking.
|
||||||
|
|
||||||
Passwordless SSH works in both directions via `~/.ssh/<ssh-key>` key. My Linux username on both machines is `<spark-user>`.
|
Passwordless SSH works in both directions via `~/.ssh/id_ed25519_shared` key. My Linux username on both machines is `modelo`.
|
||||||
|
|
||||||
**Currently running:**
|
**Currently running:**
|
||||||
- One LLM at a time on the cluster (via the `eugr/spark-vllm-docker` project — see below)
|
- One LLM at a time on the cluster (via the `eugr/spark-vllm-docker` project — see below)
|
||||||
@@ -88,7 +88,7 @@ Key commands (all run from `~/spark-vllm-docker` on Spark 1):
|
|||||||
|
|
||||||
Container names: `vllm_node` (the main vLLM container), `ray_head` and `ray_worker` (Ray cluster), plus support containers.
|
Container names: `vllm_node` (the main vLLM container), `ray_head` and `ray_worker` (Ray cluster), plus support containers.
|
||||||
|
|
||||||
The vLLM server binds to port **8888** and exposes an OpenAI-compatible API at `http://<spark-1-ip>:8888/v1`.
|
The vLLM server binds to port **8888** and exposes an OpenAI-compatible API at `http://192.168.1.103:8888/v1`.
|
||||||
|
|
||||||
## Models I have on disk (both Sparks)
|
## Models I have on disk (both Sparks)
|
||||||
|
|
||||||
@@ -154,7 +154,7 @@ Note: the `--moe_backend flashinfer_cutlass` flag is Blackwell-specific. If it e
|
|||||||
- Status check: `./launch-cluster.sh status`
|
- Status check: `./launch-cluster.sh status`
|
||||||
- See vLLM logs: `docker logs vllm_node` (add `-f` to follow)
|
- See vLLM logs: `docker logs vllm_node` (add `-f` to follow)
|
||||||
- Hard reset if stuck: `./launch-cluster.sh stop && docker ps -aq | xargs -r docker rm -f`
|
- Hard reset if stuck: `./launch-cluster.sh stop && docker ps -aq | xargs -r docker rm -f`
|
||||||
- Health check (is API responding?): `curl -s http://<spark-1-ip>:8888/v1/models`
|
- Health check (is API responding?): `curl -s http://192.168.1.103:8888/v1/models`
|
||||||
|
|
||||||
### "Ready" signal
|
### "Ready" signal
|
||||||
The model is ready to serve when `docker logs vllm_node` contains the line `Application startup complete.` Until then, it's still loading weights or compiling CUDA graphs.
|
The model is ready to serve when `docker logs vllm_node` contains the line `Application startup complete.` Until then, it's still loading weights or compiling CUDA graphs.
|
||||||
@@ -163,8 +163,8 @@ The model is ready to serve when `docker logs vllm_node` contains the line `Appl
|
|||||||
|
|
||||||
These don't get touched by model swaps:
|
These don't get touched by model swaps:
|
||||||
|
|
||||||
- **`parakeet-asr`** — STT on port 8000. Already running 24/7. Verify with `curl http://<spark-2-ip>:8000/health` which should return `{"status":"ready",...}`.
|
- **`parakeet-asr`** — STT on port 8000. Already running 24/7. Verify with `curl http://192.168.1.87:8000/health` which should return `{"status":"ready",...}`.
|
||||||
- **`magpie-tts`** — TTS on port 9000. May or may not be running; verify with `docker ps` on Spark 2 and `curl http://<spark-2-ip>:9000/v1/health/ready`.
|
- **`magpie-tts`** — TTS on port 9000. May or may not be running; verify with `docker ps` on Spark 2 and `curl http://192.168.1.87:9000/v1/health/ready`.
|
||||||
|
|
||||||
## What I want you to build
|
## What I want you to build
|
||||||
|
|
||||||
@@ -201,7 +201,7 @@ spark-control/
|
|||||||
5. Return exit code 0 on success, non-zero on failure
|
5. Return exit code 0 on success, non-zero on failure
|
||||||
|
|
||||||
Two versions might be useful:
|
Two versions might be useful:
|
||||||
- The version that runs on **my laptop** — wraps everything in `ssh <spark-user>@<spark-1-ip> ...`
|
- The version that runs on **my laptop** — wraps everything in `ssh modelo@192.168.1.103 ...`
|
||||||
- A simpler version that lives on **Spark 1** — runs commands directly without SSH (used by the deployed web UI)
|
- A simpler version that lives on **Spark 1** — runs commands directly without SSH (used by the deployed web UI)
|
||||||
|
|
||||||
You can either share one script with a `--remote` flag, or make them two distinct files. Your call — propose the cleaner option.
|
You can either share one script with a `--remote` flag, or make them two distinct files. Your call — propose the cleaner option.
|
||||||
@@ -246,14 +246,14 @@ The web UI runs on **Spark 1** so it can directly invoke `launch-cluster.sh` wit
|
|||||||
## First task
|
## First task
|
||||||
|
|
||||||
1. First, **verify SSH access to both Sparks** from my laptop:
|
1. First, **verify SSH access to both Sparks** from my laptop:
|
||||||
- `ssh <spark-user>@<spark-1-ip> hostname` should return `<spark-1-host>`
|
- `ssh modelo@192.168.1.103 hostname` should return `spark-27ea`
|
||||||
- `ssh <spark-user>@<spark-2-ip> hostname` should return `<spark-2-host>`
|
- `ssh modelo@192.168.1.87 hostname` should return `spark-32d0`
|
||||||
2. Then **verify the current state of the cluster** via SSH:
|
2. Then **verify the current state of the cluster** via SSH:
|
||||||
- Confirm `~/spark-vllm-docker` exists on Spark 1 and `launch-cluster.sh` is there: `ssh <spark-user>@<spark-1-ip> 'ls ~/spark-vllm-docker/launch-cluster.sh'`
|
- Confirm `~/spark-vllm-docker` exists on Spark 1 and `launch-cluster.sh` is there: `ssh modelo@192.168.1.103 'ls ~/spark-vllm-docker/launch-cluster.sh'`
|
||||||
- Check which LLM (if any) is currently loaded: `ssh <spark-user>@<spark-1-ip> 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'` and `ssh <spark-user>@<spark-1-ip> 'curl -s http://localhost:8888/v1/models'`
|
- Check which LLM (if any) is currently loaded: `ssh modelo@192.168.1.103 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'` and `ssh modelo@192.168.1.103 'curl -s http://localhost:8888/v1/models'`
|
||||||
- Verify which models are downloaded: `ssh <spark-user>@<spark-1-ip> 'ls ~/.cache/huggingface/hub/ | grep -iE "qwen|gemma"'`
|
- Verify which models are downloaded: `ssh modelo@192.168.1.103 'ls ~/.cache/huggingface/hub/ | grep -iE "qwen|gemma"'`
|
||||||
- Specifically check if `Qwen3.6-35B-A3B-NVFP4` is downloaded; if not, that's the prerequisite step (run the `hf-download.sh` command on Spark 1)
|
- Specifically check if `Qwen3.6-35B-A3B-NVFP4` is downloaded; if not, that's the prerequisite step (run the `hf-download.sh` command on Spark 1)
|
||||||
- Check what's running on Spark 2: `ssh <spark-user>@<spark-2-ip> 'docker ps'` (looking for parakeet-asr and possibly magpie-tts)
|
- Check what's running on Spark 2: `ssh modelo@192.168.1.87 'docker ps'` (looking for parakeet-asr and possibly magpie-tts)
|
||||||
3. Then create the repo structure on my laptop at `~/Projects/spark-control/`
|
3. Then create the repo structure on my laptop at `~/Projects/spark-control/`
|
||||||
4. Then propose the design for `models.yaml` and the swap script before implementing
|
4. Then propose the design for `models.yaml` and the swap script before implementing
|
||||||
|
|
||||||
|
|||||||
@@ -12,6 +12,18 @@ RUN chmod +x /app/entrypoint.sh
|
|||||||
|
|
||||||
COPY models.yaml /app/models.yaml
|
COPY models.yaml /app/models.yaml
|
||||||
|
|
||||||
|
# Parakeet container wrapper patches (diarizer.py + main.py overlay).
|
||||||
|
# Shipped inside spark-control so the "Reapply speech-model patches" action
|
||||||
|
# can copy these into the parakeet-asr container on Spark 2 over SSH at any
|
||||||
|
# time — survives docker rm + redeploy of the parakeet container.
|
||||||
|
COPY parakeet_patches /app/parakeet_patches
|
||||||
|
|
||||||
|
# WhisperX container build context (Dockerfile + requirements.txt + app/).
|
||||||
|
# The "Install WhisperX" action in spark-control ships these files to Spark 2
|
||||||
|
# over SSH, then runs `docker build` + `docker run` there. The container
|
||||||
|
# becomes a managed always-on service alongside parakeet-asr and magpie-tts.
|
||||||
|
COPY whisperx_container /app/whisperx_container
|
||||||
|
|
||||||
RUN pip install --no-cache-dir -e .
|
RUN pip install --no-cache-dir -e .
|
||||||
|
|
||||||
ENV BIND_PORT=9999
|
ENV BIND_PORT=9999
|
||||||
|
|||||||
@@ -0,0 +1,434 @@
|
|||||||
|
"""OpenAI-compatible audio proxy: lets any OpenAI-shaped client (Open WebUI,
|
||||||
|
Home Assistant, etc.) talk to Parakeet (STT) and Magpie (TTS) through one URL.
|
||||||
|
|
||||||
|
Endpoints exposed on spark-control's port (same as the dashboard):
|
||||||
|
GET /v1/models — lists STT model + Magpie voices in OpenAI shape
|
||||||
|
POST /v1/audio/speech — OpenAI TTS → Magpie /v1/audio/synthesize
|
||||||
|
POST /v1/audio/transcriptions — forward to Parakeet (already OpenAI-compatible)
|
||||||
|
|
||||||
|
Both downstream services already speak HTTP on the LAN; this module just adapts
|
||||||
|
request/response shapes so OpenAI clients don't need a custom integration.
|
||||||
|
|
||||||
|
When Parakeet returns a 500 (commonly the recurring CUDA wedge), the proxy
|
||||||
|
returns a clearer 503 with Retry-After=60, and fires the deep-health probe in
|
||||||
|
the background — which detects the wedge and triggers a rate-limited container
|
||||||
|
restart inside seconds. The client's next attempt ~60s later then succeeds.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
from typing import Any, Optional
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
from fastapi import APIRouter, Form, HTTPException, Request, UploadFile, File
|
||||||
|
from fastapi.responses import Response, StreamingResponse
|
||||||
|
from pydantic import BaseModel
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
|
||||||
|
logger = logging.getLogger("spark-control.audio")
|
||||||
|
|
||||||
|
# Magpie voice name encodes its language. Example:
|
||||||
|
# Magpie-Multilingual.EN-US.Mia -> en-US
|
||||||
|
# Magpie-Multilingual.ES-US.Diego -> es-US
|
||||||
|
# Magpie-Multilingual.FR-FR.Pascal -> fr-FR
|
||||||
|
def _lang_from_voice(voice: str) -> str:
|
||||||
|
try:
|
||||||
|
parts = voice.split(".")
|
||||||
|
# parts = ["Magpie-Multilingual", "EN-US", "Mia"] (or with emotion suffix)
|
||||||
|
if len(parts) >= 2 and "-" in parts[1]:
|
||||||
|
lang_part = parts[1] # "EN-US"
|
||||||
|
primary, region = lang_part.split("-", 1)
|
||||||
|
return f"{primary.lower()}-{region.upper()}"
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
return "en-US"
|
||||||
|
|
||||||
|
|
||||||
|
# Default voice: configurable, falls back to a sensible English voice if unset.
|
||||||
|
DEFAULT_VOICE = "Magpie-Multilingual.EN-US.Mia"
|
||||||
|
|
||||||
|
|
||||||
|
class SpeechRequest(BaseModel):
|
||||||
|
"""OpenAI /v1/audio/speech request body."""
|
||||||
|
model: Optional[str] = None # ignored — Magpie has one model
|
||||||
|
input: str # the text to speak
|
||||||
|
voice: Optional[str] = None # e.g. "Magpie-Multilingual.EN-US.Mia"
|
||||||
|
response_format: Optional[str] = "wav" # only "wav" supported today
|
||||||
|
speed: Optional[float] = 1.0 # ignored by Magpie
|
||||||
|
# Magpie-specific extensions (clients may pass these through)
|
||||||
|
language: Optional[str] = None
|
||||||
|
sample_rate_hz: Optional[int] = 22050
|
||||||
|
encoding: Optional[str] = "LINEAR_PCM"
|
||||||
|
|
||||||
|
|
||||||
|
def build_router(settings: Settings, deep_health: Any = None) -> APIRouter:
|
||||||
|
"""Build the audio proxy router.
|
||||||
|
|
||||||
|
If `deep_health` is provided, 500s from Parakeet trigger an immediate
|
||||||
|
background probe (which contains the same wedge-detect → auto-restart
|
||||||
|
logic as the 5-minute periodic loop, but fires now instead of waiting).
|
||||||
|
"""
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
def _parakeet_base() -> str:
|
||||||
|
return f"http://{settings.parakeet_host}:{settings.parakeet_port}"
|
||||||
|
|
||||||
|
def _magpie_base() -> str:
|
||||||
|
return f"http://{settings.magpie_host}:{settings.magpie_port}"
|
||||||
|
|
||||||
|
# ---- /v1/models ----
|
||||||
|
@router.get("/v1/models")
|
||||||
|
async def list_models() -> dict:
|
||||||
|
"""Advertise the STT model + a small voice menu so clients can
|
||||||
|
populate their voice-picker UIs. Falls back gracefully if Magpie
|
||||||
|
is offline (returns just the STT entry)."""
|
||||||
|
data: list[dict] = [
|
||||||
|
{
|
||||||
|
"id": "parakeet-tdt-0.6b-v3",
|
||||||
|
"object": "model",
|
||||||
|
"owned_by": "nvidia",
|
||||||
|
"kind": "stt",
|
||||||
|
},
|
||||||
|
]
|
||||||
|
# Try to enumerate voices from Magpie; if unreachable, just skip.
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||||
|
r = await client.get(f"{_magpie_base()}/v1/audio/list_voices")
|
||||||
|
if r.status_code == 200:
|
||||||
|
voices_by_locales = r.json()
|
||||||
|
seen = set()
|
||||||
|
for _locales, payload in voices_by_locales.items():
|
||||||
|
for v in payload.get("voices", []):
|
||||||
|
# Collapse emotion variants — expose only the base voice name.
|
||||||
|
# "Magpie-Multilingual.EN-US.Mia.Angry" -> "Magpie-Multilingual.EN-US.Mia"
|
||||||
|
parts = v.split(".")
|
||||||
|
base = ".".join(parts[:3]) if len(parts) >= 3 else v
|
||||||
|
if base not in seen:
|
||||||
|
seen.add(base)
|
||||||
|
data.append({
|
||||||
|
"id": base,
|
||||||
|
"object": "model",
|
||||||
|
"owned_by": "nvidia",
|
||||||
|
"kind": "tts",
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("magpie voice list unavailable: %s", e)
|
||||||
|
return {"object": "list", "data": data}
|
||||||
|
|
||||||
|
# ---- /v1/audio/speech (TTS) ----
|
||||||
|
@router.post("/v1/audio/speech")
|
||||||
|
async def speech(body: SpeechRequest) -> Response:
|
||||||
|
"""OpenAI-style TTS. Translates to Magpie's multipart synth call.
|
||||||
|
|
||||||
|
Returns raw WAV bytes (Content-Type: audio/wav) — browsers and most
|
||||||
|
clients play these directly.
|
||||||
|
"""
|
||||||
|
text = (body.input or "").strip()
|
||||||
|
if not text:
|
||||||
|
raise HTTPException(400, "input text is required")
|
||||||
|
|
||||||
|
voice = body.voice or DEFAULT_VOICE
|
||||||
|
language = body.language or _lang_from_voice(voice)
|
||||||
|
sample_rate = int(body.sample_rate_hz or 22050)
|
||||||
|
encoding = body.encoding or "LINEAR_PCM"
|
||||||
|
|
||||||
|
form = {
|
||||||
|
"text": text,
|
||||||
|
"language": language,
|
||||||
|
"voice": voice,
|
||||||
|
"sample_rate_hz": str(sample_rate),
|
||||||
|
"encoding": encoding,
|
||||||
|
}
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=120.0) as client:
|
||||||
|
r = await client.post(f"{_magpie_base()}/v1/audio/synthesize", data=form)
|
||||||
|
except httpx.HTTPError as e:
|
||||||
|
raise HTTPException(502, f"magpie unreachable: {e}")
|
||||||
|
|
||||||
|
if r.status_code != 200:
|
||||||
|
# Surface Magpie's error message verbatim so clients can debug voice/lang typos.
|
||||||
|
raise HTTPException(r.status_code, r.text[:500])
|
||||||
|
|
||||||
|
# Magpie returns WAV bytes already (Content-Type: audio/wav). Pass through.
|
||||||
|
media_type = r.headers.get("content-type", "audio/wav")
|
||||||
|
return Response(content=r.content, media_type=media_type)
|
||||||
|
|
||||||
|
# ---- /v1/audio/transcriptions (STT) ----
|
||||||
|
@router.post("/v1/audio/transcriptions")
|
||||||
|
async def transcriptions(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
model: Optional[str] = Form(default=None),
|
||||||
|
language: Optional[str] = Form(default=None),
|
||||||
|
prompt: Optional[str] = Form(default=None),
|
||||||
|
response_format: Optional[str] = Form(default="json"),
|
||||||
|
temperature: Optional[float] = Form(default=None),
|
||||||
|
) -> Response:
|
||||||
|
"""Forward to Parakeet's already-OpenAI-compatible endpoint.
|
||||||
|
|
||||||
|
We relay rather than redirect so clients only need to know one URL
|
||||||
|
(spark-control's) — and so any future client-side rewrites of the
|
||||||
|
request shape (e.g. translating Whisper-format params) happen here.
|
||||||
|
"""
|
||||||
|
body = await file.read()
|
||||||
|
files = {"file": (file.filename or "audio.wav", body, file.content_type or "application/octet-stream")}
|
||||||
|
data: dict[str, str] = {}
|
||||||
|
if model: data["model"] = model
|
||||||
|
if language: data["language"] = language
|
||||||
|
if prompt: data["prompt"] = prompt
|
||||||
|
if response_format: data["response_format"] = response_format
|
||||||
|
if temperature is not None: data["temperature"] = str(temperature)
|
||||||
|
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=300.0) as client:
|
||||||
|
r = await client.post(
|
||||||
|
f"{_parakeet_base()}/v1/audio/transcriptions",
|
||||||
|
files=files, data=data,
|
||||||
|
)
|
||||||
|
except httpx.HTTPError as e:
|
||||||
|
raise HTTPException(502, f"parakeet unreachable: {e}")
|
||||||
|
|
||||||
|
if r.status_code == 500:
|
||||||
|
# Parakeet 500s are almost always the CUDA wedge (CUBLAS_*_ERROR
|
||||||
|
# mid-attention). Kick deep-health to detect+restart in the
|
||||||
|
# background, and return a clean retry signal to the client.
|
||||||
|
err_snippet = r.text[:400]
|
||||||
|
logger.warning("parakeet 500 — firing deep-health probe in background. detail=%s", err_snippet)
|
||||||
|
if deep_health is not None:
|
||||||
|
try:
|
||||||
|
asyncio.create_task(deep_health.run_one("parakeet"))
|
||||||
|
except Exception as e:
|
||||||
|
logger.error("failed to schedule deep-health probe: %s", e)
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=503,
|
||||||
|
detail="Parakeet returned a transient error (likely CUDA wedge). Auto-restart triggered; retry in ~60s.",
|
||||||
|
headers={"Retry-After": "60"},
|
||||||
|
)
|
||||||
|
|
||||||
|
if r.status_code != 200:
|
||||||
|
raise HTTPException(r.status_code, r.text[:500])
|
||||||
|
return Response(content=r.content, media_type=r.headers.get("content-type", "application/json"))
|
||||||
|
|
||||||
|
def _whisperx_base() -> str:
|
||||||
|
return f"http://{settings.whisperx_host}:{settings.whisperx_port}"
|
||||||
|
|
||||||
|
async def _whisperx_healthy() -> bool:
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=2.0) as client:
|
||||||
|
r = await client.get(f"{_whisperx_base()}/health")
|
||||||
|
return r.status_code == 200 and bool(r.json().get("diarizer_loaded"))
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# ---- /api/audio/transcribe-with-speakers (STT + diarization, merged) ----
|
||||||
|
@router.post("/api/audio/transcribe-with-speakers")
|
||||||
|
async def transcribe_with_speakers(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
) -> dict:
|
||||||
|
"""Diarized transcription: run Parakeet ASR and Sortformer diarization on
|
||||||
|
the same audio in parallel, then merge by timestamp.
|
||||||
|
|
||||||
|
Response shape (designed for downstream UIs like recap-relay):
|
||||||
|
|
||||||
|
{
|
||||||
|
"duration": 90.5,
|
||||||
|
"language": "en",
|
||||||
|
"speakers_detected": ["Speaker_0", "Speaker_1"],
|
||||||
|
"segments": [
|
||||||
|
{"start_ms": 39308, "end_ms": 51000,
|
||||||
|
"speaker": "Speaker_0", "text": "good morning i think..."},
|
||||||
|
...
|
||||||
|
],
|
||||||
|
"models": {
|
||||||
|
"transcription": "parakeet-tdt-0.6b-v3",
|
||||||
|
"diarization": "nvidia/diar_sortformer_4spk-v1"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Each segment is a block of consecutive words by the same speaker. Speaker
|
||||||
|
labels are anonymous (Speaker_0, Speaker_1, ...) — name resolution is the
|
||||||
|
caller's responsibility (LLM analysis with optional participant hints,
|
||||||
|
or manual mapping UI).
|
||||||
|
"""
|
||||||
|
body = await file.read()
|
||||||
|
if not body:
|
||||||
|
raise HTTPException(400, "Empty file")
|
||||||
|
filename = file.filename or "audio.wav"
|
||||||
|
content_type = file.content_type or "application/octet-stream"
|
||||||
|
|
||||||
|
# Prefer WhisperX (single-pipeline, handles long audio properly) when it's
|
||||||
|
# installed and healthy. Fall back to Parakeet + Sortformer otherwise.
|
||||||
|
if await _whisperx_healthy():
|
||||||
|
files = {"file": (filename, body, content_type)}
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=1800.0) as client:
|
||||||
|
r = await client.post(
|
||||||
|
f"{_whisperx_base()}/v1/audio/transcribe-with-speakers",
|
||||||
|
files=files,
|
||||||
|
)
|
||||||
|
except httpx.HTTPError as e:
|
||||||
|
raise HTTPException(502, f"whisperx unreachable: {e}")
|
||||||
|
if r.status_code != 200:
|
||||||
|
raise HTTPException(r.status_code, r.text[:500])
|
||||||
|
return r.json()
|
||||||
|
|
||||||
|
# ── Legacy fallback: Parakeet ASR + Sortformer diarizer in parallel ──
|
||||||
|
async def _call_transcribe(client: httpx.AsyncClient) -> dict:
|
||||||
|
files = {"file": (filename, body, content_type)}
|
||||||
|
data = {"response_format": "verbose_json"}
|
||||||
|
r = await client.post(
|
||||||
|
f"{_parakeet_base()}/v1/audio/transcriptions",
|
||||||
|
files=files, data=data,
|
||||||
|
)
|
||||||
|
r.raise_for_status()
|
||||||
|
return r.json()
|
||||||
|
|
||||||
|
async def _call_diarize(client: httpx.AsyncClient) -> dict:
|
||||||
|
files = {"file": (filename, body, content_type)}
|
||||||
|
r = await client.post(
|
||||||
|
f"{_parakeet_base()}/v1/audio/diarize",
|
||||||
|
files=files,
|
||||||
|
)
|
||||||
|
r.raise_for_status()
|
||||||
|
return r.json()
|
||||||
|
|
||||||
|
# Run both in parallel against the same Parakeet container — Sortformer
|
||||||
|
# and Parakeet ASR are independent forward passes that share the GPU.
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=600.0) as client:
|
||||||
|
stt, diar = await asyncio.gather(
|
||||||
|
_call_transcribe(client),
|
||||||
|
_call_diarize(client),
|
||||||
|
)
|
||||||
|
except httpx.HTTPStatusError as e:
|
||||||
|
# Surface upstream errors. If transcribe wedged, kick deep-health.
|
||||||
|
if e.response.status_code == 500 and deep_health is not None:
|
||||||
|
try:
|
||||||
|
asyncio.create_task(deep_health.run_one("parakeet"))
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=503,
|
||||||
|
detail="Parakeet transient error (likely CUDA wedge). Auto-restart triggered; retry in ~60s.",
|
||||||
|
headers={"Retry-After": "60"},
|
||||||
|
)
|
||||||
|
raise HTTPException(e.response.status_code, e.response.text[:500])
|
||||||
|
except httpx.HTTPError as e:
|
||||||
|
raise HTTPException(502, f"parakeet unreachable: {e}")
|
||||||
|
|
||||||
|
merged = _merge_words_with_speakers(
|
||||||
|
words=stt.get("words", []),
|
||||||
|
diar_turns=diar.get("segments", []),
|
||||||
|
)
|
||||||
|
return {
|
||||||
|
"duration": stt.get("duration") or diar.get("duration") or 0.0,
|
||||||
|
"language": stt.get("language", "en"),
|
||||||
|
"speakers_detected": diar.get("speakers_detected", []),
|
||||||
|
"segments": merged,
|
||||||
|
"models": {
|
||||||
|
"transcription": stt.get("model") if isinstance(stt.get("model"), str) else "parakeet",
|
||||||
|
"diarization": diar.get("model", "sortformer"),
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
return router
|
||||||
|
|
||||||
|
|
||||||
|
# ---- Merge helper: assign speaker to each word, then group into blocks ----
|
||||||
|
|
||||||
|
def _assign_speaker_to_word(word_start_s: float, word_end_s: float, diar_turns: list[dict]) -> str:
|
||||||
|
"""Find the diarization turn that contains this word, or has the most
|
||||||
|
overlap with it. Returns the speaker label, or 'Speaker_unknown' if no
|
||||||
|
turn overlaps at all."""
|
||||||
|
word_mid = (word_start_s + word_end_s) / 2.0
|
||||||
|
# Fast path: find the turn containing the midpoint
|
||||||
|
for t in diar_turns:
|
||||||
|
if t["start_s"] <= word_mid <= t["end_s"]:
|
||||||
|
return t["speaker"]
|
||||||
|
# Slow path: pick the turn with max overlap with the word's span
|
||||||
|
best_speaker = "Speaker_unknown"
|
||||||
|
best_overlap = 0.0
|
||||||
|
for t in diar_turns:
|
||||||
|
overlap = max(0.0, min(word_end_s, t["end_s"]) - max(word_start_s, t["start_s"]))
|
||||||
|
if overlap > best_overlap:
|
||||||
|
best_overlap = overlap
|
||||||
|
best_speaker = t["speaker"]
|
||||||
|
return best_speaker
|
||||||
|
|
||||||
|
|
||||||
|
def _merge_words_with_speakers(words: list[dict], diar_turns: list[dict]) -> list[dict]:
|
||||||
|
"""Group consecutive same-speaker words into blocks.
|
||||||
|
|
||||||
|
Each input word: {"start": float_s, "end": float_s, "text": str} (Parakeet
|
||||||
|
verbose_json format; values are seconds).
|
||||||
|
Each input turn: {"start_s": float, "end_s": float, "speaker": str}.
|
||||||
|
|
||||||
|
Output: [{"start_ms": int, "end_ms": int, "speaker": str, "text": str}, ...]
|
||||||
|
|
||||||
|
Also breaks a block on a long silence gap (>1.5 s) even within the same
|
||||||
|
speaker — keeps blocks readable in UI rendering.
|
||||||
|
"""
|
||||||
|
if not words:
|
||||||
|
return []
|
||||||
|
SILENCE_BREAK_S = 1.5
|
||||||
|
|
||||||
|
def _join_words(parts: list[str]) -> str:
|
||||||
|
"""Join word tokens with proper spacing. Different STT outputs vary —
|
||||||
|
some include leading spaces in the word text (' morning'), some don't
|
||||||
|
('morning'). Normalize by stripping each token then joining with one
|
||||||
|
space; collapse multiple spaces. Keeps punctuation tight (no space
|
||||||
|
before period/comma/etc.)."""
|
||||||
|
cleaned = [p.strip() for p in parts if p and p.strip()]
|
||||||
|
if not cleaned:
|
||||||
|
return ""
|
||||||
|
out = cleaned[0]
|
||||||
|
for token in cleaned[1:]:
|
||||||
|
# No leading space before pure-punctuation tokens
|
||||||
|
if token and token[0] in ".,;:!?)]}'\"":
|
||||||
|
out += token
|
||||||
|
else:
|
||||||
|
out += " " + token
|
||||||
|
return out
|
||||||
|
|
||||||
|
blocks: list[dict] = []
|
||||||
|
cur_words: list[str] = []
|
||||||
|
cur_speaker: Optional[str] = None
|
||||||
|
cur_start_s: Optional[float] = None
|
||||||
|
cur_end_s: Optional[float] = None
|
||||||
|
|
||||||
|
for w in words:
|
||||||
|
ws = float(w.get("start", 0.0))
|
||||||
|
we = float(w.get("end", ws))
|
||||||
|
wt = str(w.get("text", ""))
|
||||||
|
spk = _assign_speaker_to_word(ws, we, diar_turns)
|
||||||
|
|
||||||
|
is_new_block = (
|
||||||
|
cur_speaker is None
|
||||||
|
or spk != cur_speaker
|
||||||
|
or (cur_end_s is not None and ws - cur_end_s > SILENCE_BREAK_S)
|
||||||
|
)
|
||||||
|
if is_new_block:
|
||||||
|
if cur_speaker is not None:
|
||||||
|
blocks.append({
|
||||||
|
"start_ms": int(cur_start_s * 1000),
|
||||||
|
"end_ms": int(cur_end_s * 1000),
|
||||||
|
"speaker": cur_speaker,
|
||||||
|
"text": _join_words(cur_words),
|
||||||
|
})
|
||||||
|
cur_words = [wt]
|
||||||
|
cur_speaker = spk
|
||||||
|
cur_start_s = ws
|
||||||
|
cur_end_s = we
|
||||||
|
else:
|
||||||
|
cur_words.append(wt)
|
||||||
|
cur_end_s = we
|
||||||
|
|
||||||
|
if cur_speaker is not None and cur_words:
|
||||||
|
blocks.append({
|
||||||
|
"start_ms": int(cur_start_s * 1000),
|
||||||
|
"end_ms": int(cur_end_s * 1000),
|
||||||
|
"speaker": cur_speaker,
|
||||||
|
"text": _join_words(cur_words),
|
||||||
|
})
|
||||||
|
|
||||||
|
return blocks
|
||||||
+17
-3
@@ -35,6 +35,11 @@ class Settings:
|
|||||||
magpie_host: str
|
magpie_host: str
|
||||||
magpie_user: str
|
magpie_user: str
|
||||||
magpie_container: str
|
magpie_container: str
|
||||||
|
whisperx_host: str
|
||||||
|
whisperx_user: str
|
||||||
|
whisperx_container: str
|
||||||
|
whisperx_port: int
|
||||||
|
whisperx_model: str
|
||||||
ssh_key_path: str
|
ssh_key_path: str
|
||||||
ssh_known_hosts: str
|
ssh_known_hosts: str
|
||||||
models_yaml: str
|
models_yaml: str
|
||||||
@@ -42,12 +47,14 @@ class Settings:
|
|||||||
parakeet_port: int
|
parakeet_port: int
|
||||||
magpie_port: int
|
magpie_port: int
|
||||||
bind_port: int
|
bind_port: int
|
||||||
|
open_webui_url: str
|
||||||
|
ngc_api_key: str
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def from_env(cls) -> "Settings":
|
def from_env(cls) -> "Settings":
|
||||||
spark2_host = _env("SPARK2_HOST")
|
spark2_host = _env("SPARK2_HOST")
|
||||||
spark2_user = _env("SPARK2_USER")
|
spark2_user = _env("SPARK2_USER")
|
||||||
# Parakeet and Magpie default to Spark 2 unless explicitly overridden.
|
# Parakeet, Magpie, and WhisperX all default to Spark 2 unless overridden.
|
||||||
return cls(
|
return cls(
|
||||||
spark1_host=_env("SPARK1_HOST"),
|
spark1_host=_env("SPARK1_HOST"),
|
||||||
spark1_user=_env("SPARK1_USER"),
|
spark1_user=_env("SPARK1_USER"),
|
||||||
@@ -55,10 +62,15 @@ class Settings:
|
|||||||
spark2_user=spark2_user,
|
spark2_user=spark2_user,
|
||||||
parakeet_host=_env("PARAKEET_HOST") or spark2_host,
|
parakeet_host=_env("PARAKEET_HOST") or spark2_host,
|
||||||
parakeet_user=_env("PARAKEET_USER") or spark2_user,
|
parakeet_user=_env("PARAKEET_USER") or spark2_user,
|
||||||
parakeet_container=_env("PARAKEET_CONTAINER", "parakeet-asr"),
|
parakeet_container=_env("PARAKEET_CONTAINER") or "parakeet-asr",
|
||||||
magpie_host=_env("MAGPIE_HOST") or spark2_host,
|
magpie_host=_env("MAGPIE_HOST") or spark2_host,
|
||||||
magpie_user=_env("MAGPIE_USER") or spark2_user,
|
magpie_user=_env("MAGPIE_USER") or spark2_user,
|
||||||
magpie_container=_env("MAGPIE_CONTAINER", "magpie-tts"),
|
magpie_container=_env("MAGPIE_CONTAINER") or "magpie-tts",
|
||||||
|
whisperx_host=_env("WHISPERX_HOST") or spark2_host,
|
||||||
|
whisperx_user=_env("WHISPERX_USER") or spark2_user,
|
||||||
|
whisperx_container=_env("WHISPERX_CONTAINER") or "whisperx-asr",
|
||||||
|
whisperx_port=int(_env("WHISPERX_PORT", "8002")),
|
||||||
|
whisperx_model=_env("WHISPERX_MODEL", "medium"),
|
||||||
ssh_key_path=_env("SSH_KEY_PATH"),
|
ssh_key_path=_env("SSH_KEY_PATH"),
|
||||||
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
|
ssh_known_hosts=_env("SSH_KNOWN_HOSTS"),
|
||||||
models_yaml=_resolve_models_yaml(),
|
models_yaml=_resolve_models_yaml(),
|
||||||
@@ -66,6 +78,8 @@ class Settings:
|
|||||||
parakeet_port=int(_env("PARAKEET_PORT", "8000")),
|
parakeet_port=int(_env("PARAKEET_PORT", "8000")),
|
||||||
magpie_port=int(_env("MAGPIE_PORT", "9000")),
|
magpie_port=int(_env("MAGPIE_PORT", "9000")),
|
||||||
bind_port=int(_env("BIND_PORT", "9999")),
|
bind_port=int(_env("BIND_PORT", "9999")),
|
||||||
|
open_webui_url=_env("OPEN_WEBUI_URL", ""),
|
||||||
|
ngc_api_key=_env("NGC_API_KEY", ""),
|
||||||
)
|
)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
|
|||||||
@@ -0,0 +1,190 @@
|
|||||||
|
"""Track up/down transitions for any subject (Sparks AND services) and cache MACs.
|
||||||
|
|
||||||
|
Persisted to /data/connectivity.json. Schema:
|
||||||
|
|
||||||
|
{
|
||||||
|
"macs": { "spark1": "aa:bb:..", "spark2": "11:22:.." },
|
||||||
|
"current": { "spark1": "up", "parakeet": "up", "magpie": "down", ... },
|
||||||
|
"last_change": { ... },
|
||||||
|
"events": [
|
||||||
|
# Active-probe transition (logged when state flips during polling)
|
||||||
|
{ "subject": "spark2", "at": "...", "kind": "transition",
|
||||||
|
"transition": "down" },
|
||||||
|
{ "subject": "spark2", "at": "...", "kind": "transition",
|
||||||
|
"transition": "up", "down_seconds": 4500 },
|
||||||
|
|
||||||
|
# Passive report (logged whenever an external app POSTs to
|
||||||
|
# /api/health-event regardless of state change)
|
||||||
|
{ "subject": "parakeet", "at": "...", "kind": "report",
|
||||||
|
"ok": false, "source": "open-webui",
|
||||||
|
"detail": "Connection refused", "latency_ms": 320 },
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
Legacy events from v0.5 with `spark` instead of `subject` and no `kind` field
|
||||||
|
are read transparently as kind="transition".
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import threading
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
|
||||||
|
MAX_EVENTS = 200 # rolling window — plenty for showing recent history
|
||||||
|
|
||||||
|
|
||||||
|
def _path() -> str:
|
||||||
|
return os.environ.get("CONNECTIVITY_LOG", "/data/connectivity.json")
|
||||||
|
|
||||||
|
|
||||||
|
_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
def _read() -> dict:
|
||||||
|
try:
|
||||||
|
with open(_path()) as f:
|
||||||
|
return json.load(f) or {}
|
||||||
|
except (FileNotFoundError, json.JSONDecodeError):
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def _write(data: dict) -> None:
|
||||||
|
p = _path()
|
||||||
|
Path(p).parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
tmp = p + ".tmp"
|
||||||
|
with open(tmp, "w") as f:
|
||||||
|
json.dump(data, f, indent=2, sort_keys=False)
|
||||||
|
os.replace(tmp, p)
|
||||||
|
|
||||||
|
|
||||||
|
def load() -> dict:
|
||||||
|
with _lock:
|
||||||
|
d = _read()
|
||||||
|
d.setdefault("macs", {})
|
||||||
|
d.setdefault("current", {})
|
||||||
|
d.setdefault("last_change", {})
|
||||||
|
d.setdefault("events", [])
|
||||||
|
return d
|
||||||
|
|
||||||
|
|
||||||
|
def record_mac(subject: str, mac: Optional[str]) -> None:
|
||||||
|
if not mac:
|
||||||
|
return
|
||||||
|
with _lock:
|
||||||
|
d = _read()
|
||||||
|
d.setdefault("macs", {})
|
||||||
|
if d["macs"].get(subject) != mac:
|
||||||
|
d["macs"][subject] = mac
|
||||||
|
_write(d)
|
||||||
|
|
||||||
|
|
||||||
|
def record_state(subject: str, reachable: bool) -> Optional[dict]:
|
||||||
|
"""Update current state for `subject`. If it differs from the last seen
|
||||||
|
state, append a transition event. Returns the event dict if a transition
|
||||||
|
was recorded, else None.
|
||||||
|
|
||||||
|
`subject` can be a Spark host key (spark1/spark2) or a service name
|
||||||
|
(parakeet/magpie/vllm).
|
||||||
|
"""
|
||||||
|
new_state = "up" if reachable else "down"
|
||||||
|
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
with _lock:
|
||||||
|
d = _read()
|
||||||
|
d.setdefault("macs", {})
|
||||||
|
d.setdefault("current", {})
|
||||||
|
d.setdefault("last_change", {})
|
||||||
|
d.setdefault("events", [])
|
||||||
|
prev = d["current"].get(subject)
|
||||||
|
if prev == new_state:
|
||||||
|
return None
|
||||||
|
event: dict = {
|
||||||
|
"subject": subject,
|
||||||
|
"at": now,
|
||||||
|
"kind": "transition",
|
||||||
|
"transition": new_state,
|
||||||
|
}
|
||||||
|
# When we have a previous state and timestamp, compute duration
|
||||||
|
last_change = d["last_change"].get(subject)
|
||||||
|
if prev and last_change:
|
||||||
|
try:
|
||||||
|
prev_dt = datetime.fromisoformat(last_change.replace("Z", "+00:00"))
|
||||||
|
duration = (datetime.now(timezone.utc) - prev_dt).total_seconds()
|
||||||
|
if prev == "down" and new_state == "up":
|
||||||
|
event["down_seconds"] = round(duration)
|
||||||
|
if prev == "up" and new_state == "down":
|
||||||
|
event["up_seconds"] = round(duration)
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
d["current"][subject] = new_state
|
||||||
|
d["last_change"][subject] = now
|
||||||
|
d["events"].append(event)
|
||||||
|
if len(d["events"]) > MAX_EVENTS:
|
||||||
|
d["events"] = d["events"][-MAX_EVENTS:]
|
||||||
|
_write(d)
|
||||||
|
return event
|
||||||
|
|
||||||
|
|
||||||
|
def record_report(
|
||||||
|
subject: str,
|
||||||
|
*,
|
||||||
|
ok: bool,
|
||||||
|
source: str = "external",
|
||||||
|
detail: str = "",
|
||||||
|
latency_ms: Optional[int] = None,
|
||||||
|
) -> dict:
|
||||||
|
"""Record a passive report from an external caller (e.g. Open WebUI got a
|
||||||
|
503 calling Parakeet). Always appended to the events list; does NOT change
|
||||||
|
the active-probe state (which only the polling probe is authoritative on).
|
||||||
|
"""
|
||||||
|
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
with _lock:
|
||||||
|
d = _read()
|
||||||
|
d.setdefault("events", [])
|
||||||
|
event: dict = {
|
||||||
|
"subject": subject,
|
||||||
|
"at": now,
|
||||||
|
"kind": "report",
|
||||||
|
"ok": bool(ok),
|
||||||
|
"source": source or "external",
|
||||||
|
}
|
||||||
|
if detail:
|
||||||
|
event["detail"] = detail
|
||||||
|
if latency_ms is not None:
|
||||||
|
event["latency_ms"] = int(latency_ms)
|
||||||
|
d["events"].append(event)
|
||||||
|
if len(d["events"]) > MAX_EVENTS:
|
||||||
|
d["events"] = d["events"][-MAX_EVENTS:]
|
||||||
|
_write(d)
|
||||||
|
return event
|
||||||
|
|
||||||
|
|
||||||
|
def get_mac(subject: str) -> Optional[str]:
|
||||||
|
d = load()
|
||||||
|
return d.get("macs", {}).get(subject)
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_event(e: dict) -> dict:
|
||||||
|
"""Promote legacy v0.5 events to the v0.6 shape so the UI sees one schema."""
|
||||||
|
if "subject" in e:
|
||||||
|
e.setdefault("kind", "transition")
|
||||||
|
return e
|
||||||
|
# Legacy: had "spark" + "transition" only
|
||||||
|
if "spark" in e:
|
||||||
|
e["subject"] = e.pop("spark")
|
||||||
|
e.setdefault("kind", "transition")
|
||||||
|
return e
|
||||||
|
|
||||||
|
|
||||||
|
def summary() -> dict:
|
||||||
|
"""Compact summary for the UI: known MACs, current state, recent events."""
|
||||||
|
d = load()
|
||||||
|
events = [_normalize_event(dict(e)) for e in d.get("events", [])]
|
||||||
|
return {
|
||||||
|
"macs": d.get("macs", {}),
|
||||||
|
"current": d.get("current", {}),
|
||||||
|
"last_change": d.get("last_change", {}),
|
||||||
|
"events": events[-80:],
|
||||||
|
}
|
||||||
@@ -0,0 +1,59 @@
|
|||||||
|
"""User-installed services persist in /data/services-overrides.yaml.
|
||||||
|
|
||||||
|
Format:
|
||||||
|
custom:
|
||||||
|
- key: my-riva
|
||||||
|
kind: stt
|
||||||
|
host: 192.168.1.87
|
||||||
|
user: modelo
|
||||||
|
container: riva-asr
|
||||||
|
port: 8001
|
||||||
|
health_path: /health
|
||||||
|
image: nvcr.io/nim/nvidia/riva-multilingual:latest
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
import yaml
|
||||||
|
|
||||||
|
|
||||||
|
def _path() -> str:
|
||||||
|
return os.environ.get("SERVICES_OVERRIDES", "/data/services-overrides.yaml")
|
||||||
|
|
||||||
|
|
||||||
|
def load_custom_services() -> list[dict]:
|
||||||
|
try:
|
||||||
|
with open(_path()) as f:
|
||||||
|
data = yaml.safe_load(f) or {}
|
||||||
|
except FileNotFoundError:
|
||||||
|
return []
|
||||||
|
return data.get("custom") or []
|
||||||
|
|
||||||
|
|
||||||
|
def add_custom_service(entry: dict) -> None:
|
||||||
|
p = _path()
|
||||||
|
Path(p).parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
data: dict = {}
|
||||||
|
try:
|
||||||
|
with open(p) as f:
|
||||||
|
data = yaml.safe_load(f) or {}
|
||||||
|
except FileNotFoundError:
|
||||||
|
pass
|
||||||
|
custom = data.get("custom") or []
|
||||||
|
custom = [c for c in custom if c.get("key") != entry["key"]]
|
||||||
|
custom.append(entry)
|
||||||
|
data["custom"] = custom
|
||||||
|
with open(p, "w") as f:
|
||||||
|
yaml.safe_dump(data, f, sort_keys=False)
|
||||||
|
|
||||||
|
|
||||||
|
def delete_custom_service(key: str) -> None:
|
||||||
|
p = _path()
|
||||||
|
try:
|
||||||
|
with open(p) as f:
|
||||||
|
data = yaml.safe_load(f) or {}
|
||||||
|
except FileNotFoundError:
|
||||||
|
return
|
||||||
|
data["custom"] = [c for c in (data.get("custom") or []) if c.get("key") != key]
|
||||||
|
with open(p, "w") as f:
|
||||||
|
yaml.safe_dump(data, f, sort_keys=False)
|
||||||
@@ -0,0 +1,363 @@
|
|||||||
|
"""Deep health probes for each service.
|
||||||
|
|
||||||
|
Why this exists: Triton's /health endpoint returns 200 as long as the HTTP
|
||||||
|
layer is alive and the model is registered. It does NOT verify that the CUDA
|
||||||
|
context inside the worker process is healthy. We've observed Parakeet getting
|
||||||
|
its CUDA context wedged after an OOM, where /health stays green but every
|
||||||
|
real transcription returns 500 cudaErrorUnknown.
|
||||||
|
|
||||||
|
So this module sends *real* but tiny synthetic inference requests:
|
||||||
|
- Parakeet: 1 second of digital silence (16 kHz mono PCM, in-memory WAV)
|
||||||
|
- Magpie: short text-to-speech, response audio discarded
|
||||||
|
- vLLM: 1-token chat completion against whatever model is loaded
|
||||||
|
|
||||||
|
All synthetic payloads are generated on demand into BytesIO, sent over HTTP,
|
||||||
|
and never touched the filesystem (on either spark-control's side or the
|
||||||
|
target service's side beyond normal Triton/Riva working memory).
|
||||||
|
|
||||||
|
When a probe fails with a signal that looks like a CUDA wedge, we
|
||||||
|
automatically issue `docker restart <container>`. Rate-limited to 3 restarts
|
||||||
|
per service per 30 minutes to avoid restart loops.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import io
|
||||||
|
import time
|
||||||
|
import wave
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .connectivity import record_report
|
||||||
|
from .services import ServiceDef, run_action, services_from_settings
|
||||||
|
|
||||||
|
|
||||||
|
# Default 5-minute interval, controllable via env. Sub-minute is silly for a
|
||||||
|
# heavy synthetic probe; we just want to catch wedges within a reasonable
|
||||||
|
# window — much faster than the user noticing on their next real call.
|
||||||
|
DEFAULT_INTERVAL_SEC = 300.0
|
||||||
|
PROBE_TIMEOUT_SEC = 20.0
|
||||||
|
RESTART_RATE_LIMIT = 3 # max auto-restarts per service
|
||||||
|
RESTART_RATE_WINDOW_SEC = 1800.0 # within a 30-min window
|
||||||
|
RESTART_COOLDOWN_SEC = 120.0 # don't restart again within this many seconds of the last one
|
||||||
|
STARTUP_GRACE_SEC = 60.0 # don't auto-restart for the first minute after this app boots
|
||||||
|
|
||||||
|
|
||||||
|
def _silence_wav(seconds: float = 1.0, sample_rate: int = 16000) -> io.BytesIO:
|
||||||
|
"""Return an in-memory WAV file containing `seconds` of digital silence."""
|
||||||
|
n_frames = int(seconds * sample_rate)
|
||||||
|
buf = io.BytesIO()
|
||||||
|
with wave.open(buf, "wb") as w:
|
||||||
|
w.setnchannels(1)
|
||||||
|
w.setsampwidth(2) # int16
|
||||||
|
w.setframerate(sample_rate)
|
||||||
|
w.writeframes(b"\x00\x00" * n_frames)
|
||||||
|
buf.seek(0)
|
||||||
|
return buf
|
||||||
|
|
||||||
|
|
||||||
|
def _looks_like_wedge(error: str) -> bool:
|
||||||
|
"""Heuristic: does this error string look like a stuck CUDA context that
|
||||||
|
a container restart would clear? We want to be conservative — only act
|
||||||
|
on signals we're confident about, otherwise leave the user in charge."""
|
||||||
|
err = (error or "").lower()
|
||||||
|
needles = [
|
||||||
|
"cudaerrorunknown",
|
||||||
|
"cuda error: unknown",
|
||||||
|
"cuda kernel errors",
|
||||||
|
"internal server error",
|
||||||
|
"engine core initialization failed",
|
||||||
|
"503", # service unavailable from a dependency
|
||||||
|
"500", # generic 5xx with a body that may not parse
|
||||||
|
]
|
||||||
|
return any(n in err for n in needles)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ProbeResult:
|
||||||
|
ok: bool
|
||||||
|
at: str
|
||||||
|
latency_ms: Optional[int] = None
|
||||||
|
error: str = ""
|
||||||
|
note: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ServiceState:
|
||||||
|
last: Optional[ProbeResult] = None
|
||||||
|
last_ok_at: Optional[str] = None
|
||||||
|
restarts: list[float] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
class DeepHealth:
|
||||||
|
def __init__(self, settings: Settings, interval_sec: float = DEFAULT_INTERVAL_SEC) -> None:
|
||||||
|
self.settings = settings
|
||||||
|
self.interval_sec = interval_sec
|
||||||
|
self.state: dict[str, ServiceState] = {
|
||||||
|
"parakeet": ServiceState(),
|
||||||
|
"magpie": ServiceState(),
|
||||||
|
"vllm": ServiceState(),
|
||||||
|
}
|
||||||
|
self._stop = asyncio.Event()
|
||||||
|
self._boot_at = time.monotonic()
|
||||||
|
|
||||||
|
# ---- probes ---------------------------------------------------------
|
||||||
|
|
||||||
|
async def probe_parakeet(self) -> ProbeResult:
|
||||||
|
s = self.settings
|
||||||
|
now_iso = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
if not s.parakeet_host:
|
||||||
|
return ProbeResult(ok=False, at=now_iso, error="not configured")
|
||||||
|
url = f"http://{s.parakeet_host}:{s.parakeet_port}/v1/audio/transcriptions"
|
||||||
|
wav = _silence_wav(1.0)
|
||||||
|
t0 = time.monotonic()
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=PROBE_TIMEOUT_SEC) as c:
|
||||||
|
r = await c.post(
|
||||||
|
url,
|
||||||
|
files={"file": ("probe.wav", wav, "audio/wav")},
|
||||||
|
data={"model": "parakeet-tdt-0.6b-v3"},
|
||||||
|
)
|
||||||
|
latency = round((time.monotonic() - t0) * 1000)
|
||||||
|
if 200 <= r.status_code < 300:
|
||||||
|
return ProbeResult(ok=True, at=now_iso, latency_ms=latency)
|
||||||
|
return ProbeResult(
|
||||||
|
ok=False,
|
||||||
|
at=now_iso,
|
||||||
|
latency_ms=latency,
|
||||||
|
error=f"HTTP {r.status_code}: {r.text[:240]}",
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
return ProbeResult(ok=False, at=now_iso, error=f"{type(e).__name__}: {e}")
|
||||||
|
|
||||||
|
async def probe_magpie(self) -> ProbeResult:
|
||||||
|
s = self.settings
|
||||||
|
now_iso = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
if not s.magpie_host:
|
||||||
|
return ProbeResult(ok=False, at=now_iso, error="not configured")
|
||||||
|
# Magpie /v1/audio/synthesize expects multipart form-data, not JSON.
|
||||||
|
# The (None, value) tuple in httpx's `files=` produces a non-file form field.
|
||||||
|
url = f"http://{s.magpie_host}:{s.magpie_port}/v1/audio/synthesize"
|
||||||
|
form: dict = {"text": (None, "hi"), "language": (None, "en-US")}
|
||||||
|
t0 = time.monotonic()
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=PROBE_TIMEOUT_SEC) as c:
|
||||||
|
r = await c.post(url, files=form)
|
||||||
|
latency = round((time.monotonic() - t0) * 1000)
|
||||||
|
if 200 <= r.status_code < 300:
|
||||||
|
return ProbeResult(ok=True, at=now_iso, latency_ms=latency)
|
||||||
|
# 4xx that aren't 5xx mean server is alive but our payload is off —
|
||||||
|
# don't classify as wedge.
|
||||||
|
if 400 <= r.status_code < 500:
|
||||||
|
return ProbeResult(
|
||||||
|
ok=True,
|
||||||
|
at=now_iso,
|
||||||
|
latency_ms=latency,
|
||||||
|
note=f"{r.status_code} — server alive (probe payload may need a voice name)",
|
||||||
|
)
|
||||||
|
return ProbeResult(
|
||||||
|
ok=False,
|
||||||
|
at=now_iso,
|
||||||
|
latency_ms=latency,
|
||||||
|
error=f"HTTP {r.status_code}: {r.text[:240]}",
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
return ProbeResult(ok=False, at=now_iso, error=f"{type(e).__name__}: {e}")
|
||||||
|
|
||||||
|
async def probe_vllm(self) -> ProbeResult:
|
||||||
|
s = self.settings
|
||||||
|
now_iso = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
if not s.spark1_host:
|
||||||
|
return ProbeResult(ok=False, at=now_iso, error="not configured")
|
||||||
|
base = f"http://{s.spark1_host}:{s.vllm_port}"
|
||||||
|
# Step 1: is there a model loaded?
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=5.0) as c:
|
||||||
|
r = await c.get(f"{base}/v1/models")
|
||||||
|
if 200 <= r.status_code < 300:
|
||||||
|
models = r.json().get("data") or []
|
||||||
|
else:
|
||||||
|
# 5xx on /v1/models suggests something wedged after a model loaded
|
||||||
|
return ProbeResult(
|
||||||
|
ok=False,
|
||||||
|
at=now_iso,
|
||||||
|
error=f"list_models HTTP {r.status_code}: {r.text[:240]}",
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
# Connection refused / timeout: usually means no vLLM process listening
|
||||||
|
# (the vllm_node container is alive but no `vllm serve` is running yet).
|
||||||
|
# That's an idle state, not a wedge — don't trigger auto-restart.
|
||||||
|
return ProbeResult(
|
||||||
|
ok=True,
|
||||||
|
at=now_iso,
|
||||||
|
note="no model currently loaded (idle)",
|
||||||
|
)
|
||||||
|
|
||||||
|
if not models:
|
||||||
|
return ProbeResult(
|
||||||
|
ok=True,
|
||||||
|
at=now_iso,
|
||||||
|
note="no model currently loaded (idle)",
|
||||||
|
)
|
||||||
|
|
||||||
|
model_id = models[0]["id"]
|
||||||
|
# Step 2: model is loaded; verify it can actually complete a 1-token request.
|
||||||
|
t0 = time.monotonic()
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=PROBE_TIMEOUT_SEC) as c:
|
||||||
|
r = await c.post(
|
||||||
|
f"{base}/v1/chat/completions",
|
||||||
|
json={
|
||||||
|
"model": model_id,
|
||||||
|
"messages": [{"role": "user", "content": "hi"}],
|
||||||
|
"max_tokens": 1,
|
||||||
|
"temperature": 0,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
latency = round((time.monotonic() - t0) * 1000)
|
||||||
|
if 200 <= r.status_code < 300:
|
||||||
|
return ProbeResult(ok=True, at=now_iso, latency_ms=latency, note=f"model={model_id}")
|
||||||
|
return ProbeResult(
|
||||||
|
ok=False,
|
||||||
|
at=now_iso,
|
||||||
|
latency_ms=latency,
|
||||||
|
error=f"HTTP {r.status_code}: {r.text[:240]}",
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
return ProbeResult(ok=False, at=now_iso, error=f"{type(e).__name__}: {e}")
|
||||||
|
|
||||||
|
# ---- orchestration --------------------------------------------------
|
||||||
|
|
||||||
|
PROBES = {
|
||||||
|
"parakeet": "probe_parakeet",
|
||||||
|
"magpie": "probe_magpie",
|
||||||
|
"vllm": "probe_vllm",
|
||||||
|
}
|
||||||
|
|
||||||
|
async def run_one(self, service: str) -> ProbeResult:
|
||||||
|
fn = getattr(self, self.PROBES[service])
|
||||||
|
result: ProbeResult = await fn()
|
||||||
|
st = self.state[service]
|
||||||
|
prev_ok = st.last.ok if st.last else None
|
||||||
|
st.last = result
|
||||||
|
if result.ok:
|
||||||
|
st.last_ok_at = result.at
|
||||||
|
|
||||||
|
# Log to connectivity history: every failure, plus the first success
|
||||||
|
# after a failure (recovery), plus the first probe ever — but skip
|
||||||
|
# the "still ok" steady-state to keep the log readable.
|
||||||
|
if not result.ok:
|
||||||
|
record_report(
|
||||||
|
service,
|
||||||
|
ok=False,
|
||||||
|
source="deep-health",
|
||||||
|
detail=result.error[:240],
|
||||||
|
latency_ms=result.latency_ms,
|
||||||
|
)
|
||||||
|
elif prev_ok is False:
|
||||||
|
record_report(
|
||||||
|
service,
|
||||||
|
ok=True,
|
||||||
|
source="deep-health",
|
||||||
|
detail="recovered" + (f" — {result.note}" if result.note else ""),
|
||||||
|
latency_ms=result.latency_ms,
|
||||||
|
)
|
||||||
|
elif prev_ok is None:
|
||||||
|
record_report(
|
||||||
|
service,
|
||||||
|
ok=True,
|
||||||
|
source="deep-health",
|
||||||
|
detail="first probe ok" + (f" — {result.note}" if result.note else ""),
|
||||||
|
latency_ms=result.latency_ms,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Maybe auto-restart
|
||||||
|
if not result.ok and _looks_like_wedge(result.error):
|
||||||
|
await self._maybe_restart(service, result.error)
|
||||||
|
return result
|
||||||
|
|
||||||
|
async def _maybe_restart(self, service: str, error: str) -> None:
|
||||||
|
# No restarts during the boot grace period.
|
||||||
|
if time.monotonic() - self._boot_at < STARTUP_GRACE_SEC:
|
||||||
|
return
|
||||||
|
st = self.state[service]
|
||||||
|
now = time.monotonic()
|
||||||
|
st.restarts = [t for t in st.restarts if now - t < RESTART_RATE_WINDOW_SEC]
|
||||||
|
if st.restarts and now - st.restarts[-1] < RESTART_COOLDOWN_SEC:
|
||||||
|
return # already restarted recently, give it time
|
||||||
|
if len(st.restarts) >= RESTART_RATE_LIMIT:
|
||||||
|
record_report(
|
||||||
|
service,
|
||||||
|
ok=False,
|
||||||
|
source="deep-health",
|
||||||
|
detail=f"rate-limited; not auto-restarting (would be #{len(st.restarts)+1} in 30 min)",
|
||||||
|
)
|
||||||
|
return
|
||||||
|
services = services_from_settings(self.settings)
|
||||||
|
if service not in services:
|
||||||
|
return
|
||||||
|
svc = services[service]
|
||||||
|
if not svc.host or not svc.user:
|
||||||
|
return
|
||||||
|
result = await run_action(self.settings, svc, "restart")
|
||||||
|
st.restarts.append(now)
|
||||||
|
ok = result.get("ok", False)
|
||||||
|
record_report(
|
||||||
|
service,
|
||||||
|
ok=False,
|
||||||
|
source="deep-health",
|
||||||
|
detail=f"auto-restart triggered (wedge: {error[:120]}); restart {'OK' if ok else 'FAILED'}",
|
||||||
|
)
|
||||||
|
|
||||||
|
async def run_all(self) -> dict[str, ProbeResult]:
|
||||||
|
results = {}
|
||||||
|
for name in self.PROBES:
|
||||||
|
results[name] = await self.run_one(name)
|
||||||
|
return results
|
||||||
|
|
||||||
|
async def run_periodic(self) -> None:
|
||||||
|
"""Long-running loop. Cancel via .stop()."""
|
||||||
|
# Brief initial wait to let app finish startup
|
||||||
|
try:
|
||||||
|
await asyncio.wait_for(self._stop.wait(), timeout=10.0)
|
||||||
|
return
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
pass
|
||||||
|
while not self._stop.is_set():
|
||||||
|
try:
|
||||||
|
await self.run_all()
|
||||||
|
except Exception:
|
||||||
|
# Never let the loop die; the periodic check is best-effort
|
||||||
|
pass
|
||||||
|
try:
|
||||||
|
await asyncio.wait_for(self._stop.wait(), timeout=self.interval_sec)
|
||||||
|
return
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
continue
|
||||||
|
|
||||||
|
def stop(self) -> None:
|
||||||
|
self._stop.set()
|
||||||
|
|
||||||
|
def summary(self) -> dict:
|
||||||
|
out = {}
|
||||||
|
for name, st in self.state.items():
|
||||||
|
last = st.last
|
||||||
|
out[name] = {
|
||||||
|
"last_ok_at": st.last_ok_at,
|
||||||
|
"last": (
|
||||||
|
{
|
||||||
|
"ok": last.ok,
|
||||||
|
"at": last.at,
|
||||||
|
"latency_ms": last.latency_ms,
|
||||||
|
"error": last.error,
|
||||||
|
"note": last.note,
|
||||||
|
}
|
||||||
|
if last
|
||||||
|
else None
|
||||||
|
),
|
||||||
|
"auto_restarts_window": len(st.restarts),
|
||||||
|
}
|
||||||
|
return out
|
||||||
@@ -0,0 +1,134 @@
|
|||||||
|
"""On-disk presence + deletion for Hugging Face model caches on the Sparks.
|
||||||
|
|
||||||
|
The HF cache layout for a repo `org/name` is:
|
||||||
|
|
||||||
|
~/.cache/huggingface/hub/models--org--name/
|
||||||
|
|
||||||
|
We use `du -sb` to measure size (bytes) and `rm -rf` to free it. All operations
|
||||||
|
are gated by the server endpoints, which refuse to delete a currently-loaded
|
||||||
|
model or one tied to an in-flight swap/download.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import re
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
|
# HF cache dirnames are `models--<org>--<name>` where <org> and <name> only contain
|
||||||
|
# Hugging Face's allowed identifier chars: letters, digits, dot, dash, underscore.
|
||||||
|
# Validate against this whitelist so we can safely embed the dirname into a shell
|
||||||
|
# command without quoting (we need $HOME outside the quotes to expand).
|
||||||
|
_SAFE_DIRNAME = re.compile(r"^[A-Za-z0-9._\-]+$")
|
||||||
|
|
||||||
|
|
||||||
|
def repo_to_cache_dirname(repo: str) -> str:
|
||||||
|
"""Convert 'org/name' to 'models--org--name' (the HF hub cache directory)."""
|
||||||
|
if "/" not in repo:
|
||||||
|
raise ValueError(f"repo must be in 'org/name' form: {repo!r}")
|
||||||
|
dn = "models--" + repo.replace("/", "--")
|
||||||
|
if not _SAFE_DIRNAME.fullmatch(dn):
|
||||||
|
raise ValueError(f"unsafe cache dirname (rejected by whitelist): {dn!r}")
|
||||||
|
return dn
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class HostDiskResult:
|
||||||
|
host: str
|
||||||
|
on_disk: bool
|
||||||
|
size_bytes: int = 0
|
||||||
|
error: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class DiskStatus:
|
||||||
|
repo: str
|
||||||
|
on_disk: bool # True if present on AT LEAST one host
|
||||||
|
total_bytes: int # sum across hosts
|
||||||
|
per_host: list[HostDiskResult]
|
||||||
|
|
||||||
|
|
||||||
|
async def probe_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
|
||||||
|
"""Return whether the model's cache dir exists on this host and its size."""
|
||||||
|
if not host or not user:
|
||||||
|
return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
|
||||||
|
dn = repo_to_cache_dirname(repo) # whitelisted; safe to embed
|
||||||
|
# $HOME must expand server-side, so we build the path with double quotes
|
||||||
|
# (which DO allow variable expansion) rather than shlex.quote single quotes.
|
||||||
|
cmd = (
|
||||||
|
f'P="$HOME/.cache/huggingface/hub/{dn}"; '
|
||||||
|
f'if [ -d "$P" ]; then du -sb "$P" 2>/dev/null | cut -f1; '
|
||||||
|
f'else echo MISSING; fi'
|
||||||
|
)
|
||||||
|
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=20.0)
|
||||||
|
if rc != 0:
|
||||||
|
return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
|
||||||
|
raw = out.strip()
|
||||||
|
if raw == "MISSING" or raw == "":
|
||||||
|
return HostDiskResult(host=host, on_disk=False)
|
||||||
|
try:
|
||||||
|
size = int(raw.splitlines()[-1])
|
||||||
|
except ValueError:
|
||||||
|
return HostDiskResult(host=host, on_disk=False, error=f"unparsable du output: {raw!r}")
|
||||||
|
return HostDiskResult(host=host, on_disk=True, size_bytes=size)
|
||||||
|
|
||||||
|
|
||||||
|
async def probe_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
|
||||||
|
"""Probe one model across the relevant Sparks based on its mode (solo|cluster)."""
|
||||||
|
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
|
||||||
|
if mode == "cluster" and settings.spark2_host:
|
||||||
|
hosts.append((settings.spark2_host, settings.spark2_user))
|
||||||
|
|
||||||
|
results = await asyncio.gather(*(probe_host(h, u, repo, settings) for h, u in hosts))
|
||||||
|
on_disk = any(r.on_disk for r in results)
|
||||||
|
total = sum(r.size_bytes for r in results)
|
||||||
|
return DiskStatus(repo=repo, on_disk=on_disk, total_bytes=total, per_host=list(results))
|
||||||
|
|
||||||
|
|
||||||
|
async def delete_host(host: str, user: str, repo: str, settings: Settings) -> HostDiskResult:
|
||||||
|
"""Probe + rm -rf on one host. Returns bytes freed (0 if the dir wasn't there)."""
|
||||||
|
if not host or not user:
|
||||||
|
return HostDiskResult(host=host or "?", on_disk=False, error="host not configured")
|
||||||
|
dn = repo_to_cache_dirname(repo) # whitelisted; safe to embed
|
||||||
|
# Compute size first, then remove. If absent, still return success (idempotent).
|
||||||
|
# $HOME is in double-quoted context so it expands; the dirname is whitelisted.
|
||||||
|
cmd = (
|
||||||
|
f'set -e; '
|
||||||
|
f'P="$HOME/.cache/huggingface/hub/{dn}"; '
|
||||||
|
f'if [ -d "$P" ]; then '
|
||||||
|
f' SIZE=$(du -sb "$P" 2>/dev/null | cut -f1); '
|
||||||
|
f' rm -rf -- "$P"; '
|
||||||
|
f' echo "FREED $SIZE"; '
|
||||||
|
f'else '
|
||||||
|
f' echo "FREED 0"; '
|
||||||
|
f'fi'
|
||||||
|
)
|
||||||
|
rc, out, err = await ssh_run(host, user, cmd, settings, timeout=120.0)
|
||||||
|
if rc != 0:
|
||||||
|
return HostDiskResult(host=host, on_disk=False, error=(err or out).strip() or f"rc={rc}")
|
||||||
|
# Parse the "FREED N" line
|
||||||
|
freed = 0
|
||||||
|
for line in out.splitlines():
|
||||||
|
parts = line.strip().split()
|
||||||
|
if len(parts) == 2 and parts[0] == "FREED":
|
||||||
|
try:
|
||||||
|
freed = int(parts[1])
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
break
|
||||||
|
return HostDiskResult(host=host, on_disk=False, size_bytes=freed)
|
||||||
|
|
||||||
|
|
||||||
|
async def delete_from_disk(repo: str, mode: str, settings: Settings) -> DiskStatus:
|
||||||
|
"""rm -rf the model's cache dir on the relevant Sparks. Idempotent."""
|
||||||
|
hosts: list[tuple[str, str]] = [(settings.spark1_host, settings.spark1_user)]
|
||||||
|
if mode == "cluster" and settings.spark2_host:
|
||||||
|
hosts.append((settings.spark2_host, settings.spark2_user))
|
||||||
|
|
||||||
|
results = await asyncio.gather(*(delete_host(h, u, repo, settings) for h, u in hosts))
|
||||||
|
total_freed = sum(r.size_bytes for r in results)
|
||||||
|
# After deletion, on_disk should be False on all hosts.
|
||||||
|
return DiskStatus(repo=repo, on_disk=False, total_bytes=total_freed, per_host=list(results))
|
||||||
+14
-5
@@ -19,7 +19,7 @@ from .config import Settings
|
|||||||
from .ssh import ssh_stream, StreamHandle
|
from .ssh import ssh_stream, StreamHandle
|
||||||
|
|
||||||
|
|
||||||
Mode = Literal["solo", "cluster"]
|
Mode = Literal["spark1", "spark2", "cluster"]
|
||||||
|
|
||||||
|
|
||||||
_TQDM_RE = re.compile(
|
_TQDM_RE = re.compile(
|
||||||
@@ -113,17 +113,26 @@ class DownloadManager:
|
|||||||
|
|
||||||
async def _do(self, job: DownloadJob) -> None:
|
async def _do(self, job: DownloadJob) -> None:
|
||||||
s = self.settings
|
s = self.settings
|
||||||
if not s.spark1_host or not s.spark1_user:
|
# Pick the SSH target and hf-download flags from the mode.
|
||||||
raise RuntimeError("spark1 not configured")
|
if job.mode == "spark2":
|
||||||
|
target_host, target_user = s.spark2_host, s.spark2_user
|
||||||
|
flags = ""
|
||||||
|
elif job.mode == "cluster":
|
||||||
|
target_host, target_user = s.spark1_host, s.spark1_user
|
||||||
|
flags = "-c --copy-parallel"
|
||||||
|
else: # spark1
|
||||||
|
target_host, target_user = s.spark1_host, s.spark1_user
|
||||||
|
flags = ""
|
||||||
|
if not target_host or not target_user:
|
||||||
|
raise RuntimeError(f"{job.mode} host not configured")
|
||||||
|
|
||||||
flags = "-c --copy-parallel" if job.mode == "cluster" else ""
|
|
||||||
cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {job.repo} {flags}".strip()
|
cmd = f"cd ~/spark-vllm-docker && ./hf-download.sh {job.repo} {flags}".strip()
|
||||||
job.append(f"$ {cmd}")
|
job.append(f"$ {cmd}")
|
||||||
job.state = "downloading"
|
job.state = "downloading"
|
||||||
job.progress.phase = "Connecting to Hugging Face…"
|
job.progress.phase = "Connecting to Hugging Face…"
|
||||||
|
|
||||||
handle = StreamHandle()
|
handle = StreamHandle()
|
||||||
async for line in ssh_stream(s.spark1_host, s.spark1_user, cmd, s, handle=handle):
|
async for line in ssh_stream(target_host, target_user, cmd, s, handle=handle):
|
||||||
job.append(line)
|
job.append(line)
|
||||||
self._update_progress(job, line)
|
self._update_progress(job, line)
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,137 @@
|
|||||||
|
"""Per-Spark hardware snapshots: RAM, disk, GPU memory + utilization, CPU load, uptime.
|
||||||
|
|
||||||
|
Drives via a single SSH command per Spark that runs `free`, `df`, `nvidia-smi`,
|
||||||
|
`/proc/loadavg`, and `uptime -p` and prints labeled lines back. We parse those
|
||||||
|
labels in `_parse`.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import time
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .connectivity import record_mac, record_state
|
||||||
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
|
_PROBE = r"""
|
||||||
|
set -e
|
||||||
|
echo HOSTNAME=$(hostname)
|
||||||
|
echo UPTIME=$(uptime -p 2>/dev/null || uptime)
|
||||||
|
echo LOAD=$(awk '{print $1, $2, $3}' /proc/loadavg)
|
||||||
|
echo CORES=$(nproc 2>/dev/null || echo 0)
|
||||||
|
echo MEMORY=$(free -b 2>/dev/null | awk '/^Mem:/ {print $2, $3}')
|
||||||
|
echo DISK=$(df -B1 / 2>/dev/null | awk 'NR==2 {print $2, $3}')
|
||||||
|
echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.draw,memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
|
||||||
|
echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
|
||||||
|
DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
|
||||||
|
echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
|
||||||
|
""".strip()
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_int(s: str) -> int | None:
|
||||||
|
try: return int(s)
|
||||||
|
except (TypeError, ValueError): return None
|
||||||
|
|
||||||
|
|
||||||
|
def _parse(out: str) -> dict:
|
||||||
|
info: dict[str, Any] = {}
|
||||||
|
for raw in out.splitlines():
|
||||||
|
if "=" not in raw:
|
||||||
|
continue
|
||||||
|
k, v = raw.split("=", 1)
|
||||||
|
info[k.strip().lower()] = v.strip()
|
||||||
|
parsed: dict[str, Any] = {}
|
||||||
|
parsed["hostname"] = info.get("hostname")
|
||||||
|
parsed["uptime"] = info.get("uptime")
|
||||||
|
parsed["cores"] = _parse_int(info.get("cores", ""))
|
||||||
|
# Load average -> (1m, 5m, 15m)
|
||||||
|
if info.get("load"):
|
||||||
|
loads = info["load"].split()
|
||||||
|
try:
|
||||||
|
parsed["load"] = [float(x) for x in loads[:3]]
|
||||||
|
except ValueError:
|
||||||
|
parsed["load"] = None
|
||||||
|
# Memory: total used in bytes
|
||||||
|
if info.get("memory"):
|
||||||
|
mem = info["memory"].split()
|
||||||
|
if len(mem) == 2:
|
||||||
|
tot, used = _parse_int(mem[0]), _parse_int(mem[1])
|
||||||
|
parsed["ram_total_bytes"] = tot
|
||||||
|
parsed["ram_used_bytes"] = used
|
||||||
|
# Disk: total used in bytes
|
||||||
|
if info.get("disk"):
|
||||||
|
dk = info["disk"].split()
|
||||||
|
if len(dk) == 2:
|
||||||
|
parsed["disk_total_bytes"] = _parse_int(dk[0])
|
||||||
|
parsed["disk_used_bytes"] = _parse_int(dk[1])
|
||||||
|
# GPU: "name, util_gpu, temp_C, power_W, memory_total_MiB"
|
||||||
|
if info.get("gpu"):
|
||||||
|
parts = [p.strip() for p in info["gpu"].split(",")]
|
||||||
|
if len(parts) >= 5:
|
||||||
|
name, ug, temp, power, mt = parts[0], parts[1], parts[2], parts[3], parts[4]
|
||||||
|
parsed["gpu_name"] = name
|
||||||
|
parsed["gpu_util_pct"] = _parse_int(ug)
|
||||||
|
parsed["gpu_temp_c"] = _parse_int(temp)
|
||||||
|
try: parsed["gpu_power_w"] = float(power)
|
||||||
|
except ValueError: parsed["gpu_power_w"] = None
|
||||||
|
# memory.total may be "[N/A]" on unified-memory systems (DGX Spark)
|
||||||
|
parsed["gpu_mem_total_mib"] = _parse_int(mt)
|
||||||
|
parsed["gpu_unified_memory"] = parsed["gpu_mem_total_mib"] is None
|
||||||
|
# Sum per-process compute memory (works even on unified-memory systems)
|
||||||
|
if info.get("gpu_mem_used_mib"):
|
||||||
|
parsed["gpu_mem_used_mib"] = _parse_int(info["gpu_mem_used_mib"])
|
||||||
|
# MAC address on the default-route interface (for Wake-on-LAN)
|
||||||
|
if info.get("mac"):
|
||||||
|
parsed["mac"] = info["mac"].lower()
|
||||||
|
return parsed
|
||||||
|
|
||||||
|
|
||||||
|
class HardwareProbe:
|
||||||
|
"""Caches results briefly to avoid hammering the Sparks."""
|
||||||
|
|
||||||
|
def __init__(self, settings: Settings, ttl_sec: float = 4.0, fail_ttl_sec: float = 25.0) -> None:
|
||||||
|
self.settings = settings
|
||||||
|
self.ttl_sec = ttl_sec
|
||||||
|
self.fail_ttl_sec = fail_ttl_sec
|
||||||
|
self._cache: dict[str, tuple[float, dict]] = {}
|
||||||
|
self._locks: dict[str, asyncio.Lock] = {}
|
||||||
|
|
||||||
|
def _ttl_for(self, value: dict) -> float:
|
||||||
|
return self.ttl_sec if value.get("reachable") else self.fail_ttl_sec
|
||||||
|
|
||||||
|
def _lock(self, key: str) -> asyncio.Lock:
|
||||||
|
if key not in self._locks:
|
||||||
|
self._locks[key] = asyncio.Lock()
|
||||||
|
return self._locks[key]
|
||||||
|
|
||||||
|
async def fetch(self) -> dict:
|
||||||
|
s1, s2 = await asyncio.gather(
|
||||||
|
self._one("spark1", self.settings.spark1_host, self.settings.spark1_user),
|
||||||
|
self._one("spark2", self.settings.spark2_host, self.settings.spark2_user),
|
||||||
|
)
|
||||||
|
return {"spark1": s1, "spark2": s2}
|
||||||
|
|
||||||
|
async def _one(self, key: str, host: str, user: str) -> dict:
|
||||||
|
if not host or not user:
|
||||||
|
return {"reachable": False, "configured": False}
|
||||||
|
async with self._lock(key):
|
||||||
|
now = time.monotonic()
|
||||||
|
cached = self._cache.get(key)
|
||||||
|
if cached and (now - cached[0] < self._ttl_for(cached[1])):
|
||||||
|
return cached[1]
|
||||||
|
# Use a shorter timeout for the connect phase; if a previous probe
|
||||||
|
# marked this host unreachable, return the cached failure immediately.
|
||||||
|
rc, out, err = await ssh_run(host, user, _PROBE, self.settings, timeout=6)
|
||||||
|
if rc != 0:
|
||||||
|
result = {"reachable": False, "configured": True, "host": host, "error": err.strip() or out.strip() or f"rc={rc}"}
|
||||||
|
self._cache[key] = (now, result)
|
||||||
|
record_state(key, False)
|
||||||
|
return result
|
||||||
|
parsed = _parse(out)
|
||||||
|
result = {"reachable": True, "configured": True, "host": host, **parsed}
|
||||||
|
self._cache[key] = (now, result)
|
||||||
|
record_state(key, True)
|
||||||
|
if parsed.get("mac"):
|
||||||
|
record_mac(key, parsed["mac"])
|
||||||
|
return result
|
||||||
@@ -0,0 +1,202 @@
|
|||||||
|
"""NVIDIA NIM container install / lifecycle.
|
||||||
|
|
||||||
|
Two pieces:
|
||||||
|
* A small curated catalog of NIM images (so users don't have to copy/paste
|
||||||
|
huge nvcr.io URLs).
|
||||||
|
* An installer that SSHes into the target Spark, runs `docker pull` then
|
||||||
|
`docker run -d --gpus all -p PORT:PORT -v VOLUME:/opt/nim/.cache
|
||||||
|
-e NGC_API_KEY=... IMAGE` and streams output.
|
||||||
|
|
||||||
|
Custom services also persist via `overrides.add_custom_service()` so the
|
||||||
|
Services panel can show them.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import uuid
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .ssh import ssh_stream, StreamHandle
|
||||||
|
|
||||||
|
|
||||||
|
# Curated list. These are the most useful NIM containers for a dual-Spark
|
||||||
|
# audio-and-LLM setup. Browse the full catalog at
|
||||||
|
# https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia
|
||||||
|
CATALOG_URL = "https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers"
|
||||||
|
|
||||||
|
|
||||||
|
SUGGESTED_NIMS: list[dict] = [
|
||||||
|
{
|
||||||
|
"key": "parakeet-tdt-0.6b-v3",
|
||||||
|
"name": "Parakeet TDT 0.6B v3",
|
||||||
|
"image": "nvcr.io/nim/nvidia/parakeet-tdt-0-6b-v3:latest",
|
||||||
|
"default_container": "parakeet-asr",
|
||||||
|
"default_port": 8000,
|
||||||
|
"kind": "stt",
|
||||||
|
"description": "Streaming speech-to-text (English). Used by Open WebUI for voice input. ~1 GB.",
|
||||||
|
"homepage": "https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/parakeet-tdt-0-6b-v3",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"key": "magpie-tts-multilingual",
|
||||||
|
"name": "Magpie TTS Multilingual",
|
||||||
|
"image": "nvcr.io/nim/nvidia/magpie-tts-multilingual:latest",
|
||||||
|
"default_container": "magpie-tts",
|
||||||
|
"default_port": 9000,
|
||||||
|
"kind": "tts",
|
||||||
|
"description": "Multilingual text-to-speech. Counterpart to Parakeet for 'read aloud'. ~3 GB.",
|
||||||
|
"homepage": "https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/magpie-tts-multilingual",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"key": "riva-multilingual",
|
||||||
|
"name": "Riva Multilingual ASR",
|
||||||
|
"image": "nvcr.io/nim/nvidia/riva-multilingual:latest",
|
||||||
|
"default_container": "riva-asr",
|
||||||
|
"default_port": 8001,
|
||||||
|
"kind": "stt",
|
||||||
|
"description": "NVIDIA Riva speech-recognition multi-language model. Larger and more accurate than Parakeet.",
|
||||||
|
"homepage": "https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia",
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class NimInstallJob:
|
||||||
|
id: str
|
||||||
|
image: str
|
||||||
|
container: str
|
||||||
|
port: int
|
||||||
|
host: str
|
||||||
|
user: str
|
||||||
|
volume: Optional[str]
|
||||||
|
started_at: str
|
||||||
|
state: str = "starting" # starting | pulling | running | done | failed
|
||||||
|
phase: str = "Starting…"
|
||||||
|
lines: list[str] = field(default_factory=list)
|
||||||
|
returncode: Optional[int] = None
|
||||||
|
finished_at: Optional[str] = None
|
||||||
|
|
||||||
|
def append(self, line: str) -> None:
|
||||||
|
self.lines.append(line)
|
||||||
|
if len(self.lines) > 1000:
|
||||||
|
del self.lines[: len(self.lines) - 1000]
|
||||||
|
|
||||||
|
|
||||||
|
class NimManager:
|
||||||
|
def __init__(self, settings: Settings) -> None:
|
||||||
|
self.settings = settings
|
||||||
|
self.lock = asyncio.Lock()
|
||||||
|
self.jobs: dict[str, NimInstallJob] = {}
|
||||||
|
self.current_job_id: Optional[str] = None
|
||||||
|
|
||||||
|
def get(self, job_id: str) -> NimInstallJob | None:
|
||||||
|
return self.jobs.get(job_id)
|
||||||
|
|
||||||
|
async def trigger(
|
||||||
|
self,
|
||||||
|
*,
|
||||||
|
image: str,
|
||||||
|
container: str,
|
||||||
|
port: int,
|
||||||
|
host: str,
|
||||||
|
user: str,
|
||||||
|
volume: str | None = None,
|
||||||
|
extra_env: dict[str, str] | None = None,
|
||||||
|
) -> NimInstallJob:
|
||||||
|
if self.lock.locked():
|
||||||
|
raise RuntimeError("Another NIM install is already in progress")
|
||||||
|
if not host or not user:
|
||||||
|
raise RuntimeError("target host not configured")
|
||||||
|
if not self.settings.ngc_api_key:
|
||||||
|
raise RuntimeError(
|
||||||
|
"NGC_API_KEY is not set. Open Configure Sparks in StartOS and paste your NGC personal API key (free at https://ngc.nvidia.com/setup/personal-key)."
|
||||||
|
)
|
||||||
|
|
||||||
|
job = NimInstallJob(
|
||||||
|
id=uuid.uuid4().hex[:8],
|
||||||
|
image=image,
|
||||||
|
container=container,
|
||||||
|
port=port,
|
||||||
|
host=host,
|
||||||
|
user=user,
|
||||||
|
volume=volume or f"{container}-cache",
|
||||||
|
started_at=datetime.now(timezone.utc).isoformat(),
|
||||||
|
)
|
||||||
|
self.jobs[job.id] = job
|
||||||
|
self.current_job_id = job.id
|
||||||
|
asyncio.create_task(self._run(job, extra_env or {}))
|
||||||
|
return job
|
||||||
|
|
||||||
|
async def _run(self, job: NimInstallJob, extra_env: dict[str, str]) -> None:
|
||||||
|
async with self.lock:
|
||||||
|
try:
|
||||||
|
await self._do(job, extra_env)
|
||||||
|
if job.state != "failed":
|
||||||
|
job.state = "done"
|
||||||
|
job.returncode = 0
|
||||||
|
job.phase = "Done"
|
||||||
|
except Exception as e:
|
||||||
|
job.append(f"[error] {type(e).__name__}: {e}")
|
||||||
|
job.state = "failed"
|
||||||
|
if job.returncode is None:
|
||||||
|
job.returncode = 1
|
||||||
|
finally:
|
||||||
|
job.finished_at = datetime.now(timezone.utc).isoformat()
|
||||||
|
if self.current_job_id == job.id:
|
||||||
|
self.current_job_id = None
|
||||||
|
|
||||||
|
async def _do(self, job: NimInstallJob, extra_env: dict[str, str]) -> None:
|
||||||
|
# Build the bash one-liner. We use docker login non-interactively with the NGC API key.
|
||||||
|
env_parts = [f'-e NGC_API_KEY=$NGC_API_KEY']
|
||||||
|
for k, v in extra_env.items():
|
||||||
|
env_parts.append(f"-e {k}={v}")
|
||||||
|
env_str = " ".join(env_parts)
|
||||||
|
cmd = (
|
||||||
|
f"set -e; "
|
||||||
|
f"export NGC_API_KEY='{self.settings.ngc_api_key}'; "
|
||||||
|
f"echo '=== docker login nvcr.io ==='; "
|
||||||
|
f"echo \"$NGC_API_KEY\" | docker login nvcr.io -u '$oauthtoken' --password-stdin; "
|
||||||
|
f"echo '=== docker pull {job.image} (this can be 1-10 GB) ==='; "
|
||||||
|
f"docker pull {job.image}; "
|
||||||
|
f"echo '=== remove any prior container with the same name ==='; "
|
||||||
|
f"docker rm -f {job.container} 2>/dev/null || true; "
|
||||||
|
f"echo '=== docker run -d --gpus all -p {job.port}:{job.port} -v {job.volume}:/opt/nim/.cache {env_str} --name {job.container} --restart unless-stopped {job.image} ==='; "
|
||||||
|
f"docker run -d --gpus all "
|
||||||
|
f"-p {job.port}:{job.port} "
|
||||||
|
f"-v {job.volume}:/opt/nim/.cache "
|
||||||
|
f"{env_str} "
|
||||||
|
f"--name {job.container} "
|
||||||
|
f"--restart unless-stopped "
|
||||||
|
f"{job.image}; "
|
||||||
|
f"echo '=== ensuring cache volume is writable by uid 1000 (riva-server) ==='; "
|
||||||
|
f"docker run --rm -v {job.volume}:/cache alpine chown -R 1000:1000 /cache && "
|
||||||
|
f"docker restart {job.container}; "
|
||||||
|
f"echo '=== install complete; container is starting up and will download its model on first boot ==='"
|
||||||
|
)
|
||||||
|
job.append(f"$ <install command for {job.image} on {job.host}>")
|
||||||
|
job.state = "pulling"
|
||||||
|
job.phase = "Pulling image from nvcr.io (this can take a few minutes)…"
|
||||||
|
|
||||||
|
handle = StreamHandle()
|
||||||
|
async for line in ssh_stream(job.host, job.user, cmd, self.settings, handle=handle):
|
||||||
|
# Don't log lines containing the api key
|
||||||
|
if self.settings.ngc_api_key and self.settings.ngc_api_key in line:
|
||||||
|
continue
|
||||||
|
job.append(line)
|
||||||
|
if "docker pull" in line:
|
||||||
|
job.phase = "Pulling image from nvcr.io…"
|
||||||
|
elif "Login Succeeded" in line:
|
||||||
|
job.phase = "Logged in to NGC; pulling image…"
|
||||||
|
elif "Pull complete" in line:
|
||||||
|
job.phase = "Pulling layers…"
|
||||||
|
elif "Status: Downloaded newer image" in line or "Image is up to date" in line:
|
||||||
|
job.phase = "Image ready; starting container…"
|
||||||
|
elif "docker run -d" in line:
|
||||||
|
job.state = "running"
|
||||||
|
job.phase = "Container starting; downloading model on first boot…"
|
||||||
|
|
||||||
|
rc = handle.returncode or 0
|
||||||
|
if rc != 0:
|
||||||
|
job.state = "failed"
|
||||||
|
job.returncode = rc
|
||||||
+516
-1
@@ -10,14 +10,25 @@ from pydantic import BaseModel
|
|||||||
from typing import Literal
|
from typing import Literal
|
||||||
|
|
||||||
from .config import Settings
|
from .config import Settings
|
||||||
|
from .connectivity import get_mac, record_report, record_state, summary as connectivity_summary
|
||||||
|
from .custom_services import add_custom_service, delete_custom_service
|
||||||
|
from .audio_proxy import build_router as build_audio_router
|
||||||
|
from .deep_health import DeepHealth
|
||||||
|
from .disk import delete_from_disk, probe_disk
|
||||||
from .download import DownloadManager
|
from .download import DownloadManager
|
||||||
|
from .hardware import HardwareProbe
|
||||||
from .health import check_magpie, check_parakeet, check_vllm
|
from .health import check_magpie, check_parakeet, check_vllm
|
||||||
from .models import load_catalog
|
from .models import load_catalog
|
||||||
|
from .nim import SUGGESTED_NIMS, CATALOG_URL, NimManager
|
||||||
from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
|
from .overrides import add_custom, delete_custom, extract_knobs_from_args, load_overrides, set_knobs
|
||||||
from .services import docker_state, run_action, services_from_settings
|
from .services import docker_state, run_action, services_from_settings
|
||||||
|
from .speech_models import SpeechModelsManager
|
||||||
from .ssh import ssh_run
|
from .ssh import ssh_run
|
||||||
|
from .whisperx_install import WhisperXInstaller
|
||||||
from .swap import SwapManager
|
from .swap import SwapManager
|
||||||
from .updates import UpdateManager, get_update_status
|
from .updates import UpdateManager, get_update_status
|
||||||
|
from .validate import validate_launch
|
||||||
|
from .wol import send_local_broadcast, send_via_peer
|
||||||
|
|
||||||
|
|
||||||
settings = Settings.from_env()
|
settings = Settings.from_env()
|
||||||
@@ -25,12 +36,36 @@ catalog = load_catalog(settings.models_yaml)
|
|||||||
swap_manager = SwapManager(settings, catalog)
|
swap_manager = SwapManager(settings, catalog)
|
||||||
download_manager = DownloadManager(settings)
|
download_manager = DownloadManager(settings)
|
||||||
update_manager = UpdateManager(settings)
|
update_manager = UpdateManager(settings)
|
||||||
|
hardware_probe = HardwareProbe(settings)
|
||||||
|
nim_manager = NimManager(settings)
|
||||||
|
deep_health = DeepHealth(settings)
|
||||||
|
speech_models = SpeechModelsManager(settings)
|
||||||
|
whisperx_installer = WhisperXInstaller(settings)
|
||||||
|
|
||||||
app = FastAPI(title="spark-control", version="0.1.0")
|
app = FastAPI(title="spark-control", version="0.1.0")
|
||||||
|
|
||||||
|
|
||||||
|
@app.on_event("startup")
|
||||||
|
async def _start_deep_health() -> None:
|
||||||
|
# Fire-and-forget; the loop catches its own exceptions.
|
||||||
|
asyncio.create_task(deep_health.run_periodic())
|
||||||
|
|
||||||
|
|
||||||
|
@app.on_event("shutdown")
|
||||||
|
async def _stop_deep_health() -> None:
|
||||||
|
deep_health.stop()
|
||||||
|
|
||||||
|
|
||||||
_STATIC_DIR = Path(__file__).resolve().parent / "static"
|
_STATIC_DIR = Path(__file__).resolve().parent / "static"
|
||||||
app.mount("/static", StaticFiles(directory=_STATIC_DIR), name="static")
|
app.mount("/static", StaticFiles(directory=_STATIC_DIR), name="static")
|
||||||
|
|
||||||
|
# OpenAI-compatible audio proxy: /v1/audio/speech, /v1/audio/transcriptions, /v1/models.
|
||||||
|
# Lets Open WebUI, Home Assistant, and any other OpenAI-shaped client talk to
|
||||||
|
# Parakeet (STT) and Magpie (TTS) through a single spark-control URL.
|
||||||
|
# Passing deep_health lets the proxy fire an immediate wedge-detect + auto-restart
|
||||||
|
# when Parakeet returns 500, instead of waiting up to 5 min for the periodic probe.
|
||||||
|
app.include_router(build_audio_router(settings, deep_health=deep_health))
|
||||||
|
|
||||||
|
|
||||||
@app.get("/", include_in_schema=False)
|
@app.get("/", include_in_schema=False)
|
||||||
async def index() -> FileResponse:
|
async def index() -> FileResponse:
|
||||||
@@ -44,6 +79,7 @@ async def get_config() -> dict:
|
|||||||
"spark1_host": settings.spark1_host,
|
"spark1_host": settings.spark1_host,
|
||||||
"spark2_host": settings.spark2_host,
|
"spark2_host": settings.spark2_host,
|
||||||
"vllm_port": settings.vllm_port,
|
"vllm_port": settings.vllm_port,
|
||||||
|
"open_webui_url": settings.open_webui_url or None,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -116,6 +152,191 @@ async def del_model(key: str) -> dict:
|
|||||||
return {"ok": True, "key": key}
|
return {"ok": True, "key": key}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/models/disk-status")
|
||||||
|
async def get_models_disk_status() -> dict:
|
||||||
|
"""Probe each catalog model's HF cache on the appropriate Spark(s) in parallel.
|
||||||
|
|
||||||
|
Result is keyed by model key: {on_disk, total_bytes, per_host:[{host,on_disk,size_bytes,error?}]}.
|
||||||
|
Designed to be called once on dashboard load; takes ~1–3s depending on Spark count.
|
||||||
|
"""
|
||||||
|
if not settings.configured:
|
||||||
|
return {"configured": False, "models": {}}
|
||||||
|
keys = list(catalog.models.keys())
|
||||||
|
statuses = await asyncio.gather(*(
|
||||||
|
probe_disk(catalog.models[k].repo, catalog.models[k].mode, settings) for k in keys
|
||||||
|
), return_exceptions=True)
|
||||||
|
out: dict[str, dict] = {}
|
||||||
|
for k, s in zip(keys, statuses):
|
||||||
|
if isinstance(s, Exception):
|
||||||
|
out[k] = {"on_disk": False, "total_bytes": 0, "per_host": [], "error": str(s)}
|
||||||
|
continue
|
||||||
|
out[k] = {
|
||||||
|
"on_disk": s.on_disk,
|
||||||
|
"total_bytes": s.total_bytes,
|
||||||
|
"per_host": [
|
||||||
|
{"host": r.host, "on_disk": r.on_disk, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
|
||||||
|
for r in s.per_host
|
||||||
|
],
|
||||||
|
}
|
||||||
|
return {"configured": True, "models": out}
|
||||||
|
|
||||||
|
|
||||||
|
@app.delete("/api/models/{key}/disk")
|
||||||
|
async def del_model_disk(key: str) -> dict:
|
||||||
|
"""Delete a model's weights from the Spark filesystem(s). The catalog entry stays.
|
||||||
|
|
||||||
|
Safety rails:
|
||||||
|
- Refuses if the model is currently loaded on vLLM.
|
||||||
|
- Refuses if a swap or download is in flight.
|
||||||
|
- Idempotent: if the cache dir is already gone on a host, that host reports 0 bytes freed.
|
||||||
|
"""
|
||||||
|
if key not in catalog.models:
|
||||||
|
raise HTTPException(404, f"unknown model: {key}")
|
||||||
|
m = catalog.models[key]
|
||||||
|
|
||||||
|
# Refuse if currently loaded
|
||||||
|
try:
|
||||||
|
vllm = await check_vllm(settings)
|
||||||
|
except Exception:
|
||||||
|
vllm = {}
|
||||||
|
if vllm.get("ok") and vllm.get("current_model") == m.repo:
|
||||||
|
raise HTTPException(
|
||||||
|
409,
|
||||||
|
f"'{m.display_name}' is the currently loaded model. Switch to a different model first, then try again."
|
||||||
|
)
|
||||||
|
|
||||||
|
# Refuse if a swap is in flight
|
||||||
|
if swap_manager.current_job_id:
|
||||||
|
raise HTTPException(409, "a model swap is in progress; wait for it to finish")
|
||||||
|
|
||||||
|
# Refuse if a download is in flight for this same repo (a different model's download is fine)
|
||||||
|
if download_manager.current_job_id:
|
||||||
|
job = download_manager.get(download_manager.current_job_id)
|
||||||
|
if job and job.repo == m.repo:
|
||||||
|
raise HTTPException(409, "this model is currently downloading; cancel or wait for it to finish")
|
||||||
|
|
||||||
|
status = await delete_from_disk(m.repo, m.mode, settings)
|
||||||
|
# Audit log
|
||||||
|
record_report(
|
||||||
|
f"disk:{key}",
|
||||||
|
ok=True,
|
||||||
|
source="disk-delete",
|
||||||
|
detail=f"freed {status.total_bytes} bytes across {len(status.per_host)} host(s)",
|
||||||
|
)
|
||||||
|
return {
|
||||||
|
"ok": True,
|
||||||
|
"key": key,
|
||||||
|
"repo": m.repo,
|
||||||
|
"bytes_freed": status.total_bytes,
|
||||||
|
"per_host": [
|
||||||
|
{"host": r.host, "size_bytes": r.size_bytes, **({"error": r.error} if r.error else {})}
|
||||||
|
for r in status.per_host
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/hardware")
|
||||||
|
async def get_hardware() -> dict:
|
||||||
|
"""Per-Spark hardware snapshot — RAM, disk, GPU mem + util, CPU load, uptime."""
|
||||||
|
return await hardware_probe.fetch()
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/connectivity")
|
||||||
|
async def get_connectivity() -> dict:
|
||||||
|
"""Up/down transition log per Spark + cached MACs."""
|
||||||
|
return connectivity_summary()
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/deep-health")
|
||||||
|
async def get_deep_health() -> dict:
|
||||||
|
"""Last result + auto-restart counters for each service's synthetic probe."""
|
||||||
|
return deep_health.summary()
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/deep-health/{service}/run")
|
||||||
|
async def run_deep_health(service: str) -> dict:
|
||||||
|
"""Manually run a single service's deep-health probe right now."""
|
||||||
|
if service not in deep_health.PROBES:
|
||||||
|
raise HTTPException(404, f"unknown service: {service}")
|
||||||
|
result = await deep_health.run_one(service)
|
||||||
|
return {
|
||||||
|
"ok": result.ok,
|
||||||
|
"at": result.at,
|
||||||
|
"latency_ms": result.latency_ms,
|
||||||
|
"error": result.error,
|
||||||
|
"note": result.note,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class HealthEventBody(BaseModel):
|
||||||
|
service: str # e.g. "parakeet", "magpie", "vllm"
|
||||||
|
ok: bool # true on success, false on failure
|
||||||
|
source: str | None = None # what app reported (e.g. "open-webui")
|
||||||
|
error: str | None = None # optional detail
|
||||||
|
ms: int | None = None # optional latency
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/health-event")
|
||||||
|
async def post_health_event(body: HealthEventBody) -> dict:
|
||||||
|
"""Passive endpoint: any LAN app can POST here when its call to one of our
|
||||||
|
services succeeds or (more usefully) fails. We log the report into the
|
||||||
|
connectivity history so a brief blip that polling misses still surfaces.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
curl -X POST http://<dashboard>/api/health-event \\
|
||||||
|
-H 'content-type: application/json' \\
|
||||||
|
-d '{"service":"parakeet","ok":false,"error":"503","source":"open-webui","ms":420}'
|
||||||
|
"""
|
||||||
|
if not body.service.strip():
|
||||||
|
raise HTTPException(400, "service is required")
|
||||||
|
event = record_report(
|
||||||
|
body.service.strip(),
|
||||||
|
ok=body.ok,
|
||||||
|
source=(body.source or "external").strip(),
|
||||||
|
detail=(body.error or "").strip(),
|
||||||
|
latency_ms=body.ms,
|
||||||
|
)
|
||||||
|
return {"ok": True, "recorded": event}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/spark/{name}/wake")
|
||||||
|
async def wake_spark(name: str) -> dict:
|
||||||
|
"""Send a Wake-on-LAN magic packet for the named Spark.
|
||||||
|
|
||||||
|
Tries the OTHER Spark (if reachable) first because the packet has to
|
||||||
|
originate on the target's LAN segment to be reliable. Falls back to a
|
||||||
|
direct UDP broadcast from this container.
|
||||||
|
"""
|
||||||
|
if name not in ("spark1", "spark2"):
|
||||||
|
raise HTTPException(404, f"unknown spark: {name}")
|
||||||
|
mac = get_mac(name)
|
||||||
|
if not mac:
|
||||||
|
raise HTTPException(400, f"MAC for {name} not yet known; bring it up once so we can probe it, then this will work next time it sleeps")
|
||||||
|
|
||||||
|
# Find the peer's connectivity to decide the path.
|
||||||
|
other = "spark2" if name == "spark1" else "spark1"
|
||||||
|
other_host = settings.spark1_host if other == "spark1" else settings.spark2_host
|
||||||
|
other_user = settings.spark1_user if other == "spark1" else settings.spark2_user
|
||||||
|
|
||||||
|
delivered_via = None
|
||||||
|
via_peer_ok = False
|
||||||
|
via_peer_err = ""
|
||||||
|
if other_host and other_user:
|
||||||
|
via_peer_ok, via_peer_err = await send_via_peer(other_host, other_user, mac, settings)
|
||||||
|
if via_peer_ok:
|
||||||
|
delivered_via = other
|
||||||
|
|
||||||
|
if not via_peer_ok:
|
||||||
|
# Fall back to direct from this container
|
||||||
|
try:
|
||||||
|
send_local_broadcast(mac)
|
||||||
|
delivered_via = "container"
|
||||||
|
except Exception as e:
|
||||||
|
raise HTTPException(500, f"WoL failed: peer={via_peer_err!r} container={e!r}")
|
||||||
|
|
||||||
|
return {"ok": True, "spark": name, "mac": mac, "delivered_via": delivered_via}
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/services")
|
@app.get("/api/services")
|
||||||
async def get_services() -> dict:
|
async def get_services() -> dict:
|
||||||
"""Lifecycle state of always-on support services (Parakeet, Magpie, …).
|
"""Lifecycle state of always-on support services (Parakeet, Magpie, …).
|
||||||
@@ -158,9 +379,113 @@ async def get_services() -> dict:
|
|||||||
results = await asyncio.gather(*[one(n) for n in services.keys()])
|
results = await asyncio.gather(*[one(n) for n in services.keys()])
|
||||||
for name, info in results:
|
for name, info in results:
|
||||||
out[name] = info
|
out[name] = info
|
||||||
|
# Feed http reachability into the connectivity log (transition-only)
|
||||||
|
record_state(name, bool(info.get("http_ready")))
|
||||||
return out
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/nim/catalog")
|
||||||
|
async def get_nim_catalog() -> dict:
|
||||||
|
return {
|
||||||
|
"catalog_url": CATALOG_URL,
|
||||||
|
"ngc_key_configured": bool(settings.ngc_api_key),
|
||||||
|
"suggested": SUGGESTED_NIMS,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class NimInstallBody(BaseModel):
|
||||||
|
image: str
|
||||||
|
container: str
|
||||||
|
port: int
|
||||||
|
host: Literal["spark1", "spark2"] = "spark2"
|
||||||
|
kind: str = ""
|
||||||
|
register: bool = True # write to custom services overrides after install
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/nim/install")
|
||||||
|
async def post_nim_install(body: NimInstallBody) -> dict:
|
||||||
|
target_host = settings.spark1_host if body.host == "spark1" else settings.spark2_host
|
||||||
|
target_user = settings.spark1_user if body.host == "spark1" else settings.spark2_user
|
||||||
|
try:
|
||||||
|
job = await nim_manager.trigger(
|
||||||
|
image=body.image,
|
||||||
|
container=body.container,
|
||||||
|
port=body.port,
|
||||||
|
host=target_host,
|
||||||
|
user=target_user,
|
||||||
|
)
|
||||||
|
except RuntimeError as e:
|
||||||
|
raise HTTPException(409 if "in progress" in str(e) else 400, str(e))
|
||||||
|
|
||||||
|
if body.register:
|
||||||
|
# Persist in custom services so the panel shows it after install.
|
||||||
|
add_custom_service({
|
||||||
|
"key": body.container,
|
||||||
|
"kind": body.kind or "nim",
|
||||||
|
"host": target_host,
|
||||||
|
"user": target_user,
|
||||||
|
"container": body.container,
|
||||||
|
"port": body.port,
|
||||||
|
"image": body.image,
|
||||||
|
})
|
||||||
|
return {"job_id": job.id, "image": job.image, "container": job.container, "state": job.state}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/nim/install/{job_id}")
|
||||||
|
async def get_nim_install(job_id: str) -> dict:
|
||||||
|
job = nim_manager.get(job_id)
|
||||||
|
if job is None:
|
||||||
|
raise HTTPException(404, "no such job")
|
||||||
|
return {
|
||||||
|
"id": job.id,
|
||||||
|
"image": job.image,
|
||||||
|
"container": job.container,
|
||||||
|
"port": job.port,
|
||||||
|
"host": job.host,
|
||||||
|
"state": job.state,
|
||||||
|
"phase": job.phase,
|
||||||
|
"started_at": job.started_at,
|
||||||
|
"finished_at": job.finished_at,
|
||||||
|
"returncode": job.returncode,
|
||||||
|
"lines": job.lines,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/nim/install/{job_id}/stream")
|
||||||
|
async def stream_nim_install(job_id: str):
|
||||||
|
job = nim_manager.get(job_id)
|
||||||
|
if job is None:
|
||||||
|
raise HTTPException(404, "no such job")
|
||||||
|
|
||||||
|
async def gen():
|
||||||
|
sent = 0
|
||||||
|
last_phase = None
|
||||||
|
while True:
|
||||||
|
n = len(job.lines)
|
||||||
|
if n > sent:
|
||||||
|
for line in job.lines[sent:n]:
|
||||||
|
yield f"data: {json.dumps({'line': line})}\n\n"
|
||||||
|
sent = n
|
||||||
|
if job.phase != last_phase:
|
||||||
|
yield f"event: phase\ndata: {json.dumps({'state': job.state, 'phase': job.phase})}\n\n"
|
||||||
|
last_phase = job.phase
|
||||||
|
if job.returncode is not None and sent >= len(job.lines):
|
||||||
|
yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
|
||||||
|
return
|
||||||
|
await asyncio.sleep(0.5)
|
||||||
|
|
||||||
|
return StreamingResponse(gen(), media_type="text/event-stream")
|
||||||
|
|
||||||
|
|
||||||
|
@app.delete("/api/services/{name}")
|
||||||
|
async def del_service(name: str) -> dict:
|
||||||
|
# Only allow deleting custom services (not the bundled parakeet/magpie keys)
|
||||||
|
if name in ("parakeet", "magpie"):
|
||||||
|
raise HTTPException(400, "built-in service; cannot delete (use Configure Sparks to point at a different host)")
|
||||||
|
delete_custom_service(name)
|
||||||
|
return {"ok": True, "name": name}
|
||||||
|
|
||||||
|
|
||||||
@app.post("/api/services/{name}/{action}")
|
@app.post("/api/services/{name}/{action}")
|
||||||
async def service_action(name: str, action: str) -> dict:
|
async def service_action(name: str, action: str) -> dict:
|
||||||
services = services_from_settings(settings)
|
services = services_from_settings(settings)
|
||||||
@@ -174,6 +499,108 @@ async def service_action(name: str, action: str) -> dict:
|
|||||||
return {"name": name, "action": action, **result}
|
return {"name": name, "action": action, **result}
|
||||||
|
|
||||||
|
|
||||||
|
# ---- Speech model patch management ----
|
||||||
|
|
||||||
|
@app.get("/api/speech-models")
|
||||||
|
async def get_speech_models() -> dict:
|
||||||
|
"""Status of the parakeet-asr container + the spark-control overlay patches
|
||||||
|
(diarizer.py + main.py). Drift between local shipped patches and what's
|
||||||
|
inside the container is surfaced so the UI can prompt for reapply."""
|
||||||
|
return await speech_models.status()
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/speech-models/reapply")
|
||||||
|
async def post_speech_models_reapply() -> dict:
|
||||||
|
"""Copy spark-control's shipped diarizer.py + patched main.py into the
|
||||||
|
parakeet-asr container, verify Python syntax, restart the container, and
|
||||||
|
wait for both models (Parakeet ASR + Sortformer) to reload. ~60–120 seconds."""
|
||||||
|
try:
|
||||||
|
result = await speech_models.reapply_patches()
|
||||||
|
except RuntimeError as e:
|
||||||
|
raise HTTPException(409, str(e))
|
||||||
|
if not result.get("ok"):
|
||||||
|
# Bubble up which step failed for client-side error rendering.
|
||||||
|
raise HTTPException(500, {"detail": "patch reapply failed", "result": result})
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/speech-models/restart")
|
||||||
|
async def post_speech_models_restart() -> dict:
|
||||||
|
"""`docker restart parakeet-asr` only — no file changes. Useful when the
|
||||||
|
container's models look wedged but patches are already current."""
|
||||||
|
try:
|
||||||
|
result = await speech_models.restart_container()
|
||||||
|
except RuntimeError as e:
|
||||||
|
raise HTTPException(409, str(e))
|
||||||
|
if not result.get("ok"):
|
||||||
|
raise HTTPException(500, {"detail": "container restart failed", "result": result})
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
# ---- WhisperX install (Phase 2 of the WhisperX migration) ----
|
||||||
|
|
||||||
|
@app.get("/api/whisperx/status")
|
||||||
|
async def get_whisperx_status() -> dict:
|
||||||
|
"""Is WhisperX installed + healthy on Spark 2 right now?"""
|
||||||
|
return await whisperx_installer.status()
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/whisperx/install")
|
||||||
|
async def post_whisperx_install() -> dict:
|
||||||
|
"""One-click install: ships the WhisperX build context from inside
|
||||||
|
spark-control to Spark 2, runs `docker build` + `docker run`, polls
|
||||||
|
/health until both models are loaded. Streams progress via the matching
|
||||||
|
GET /api/whisperx/install/{job_id}/stream SSE endpoint."""
|
||||||
|
try:
|
||||||
|
job = await whisperx_installer.trigger()
|
||||||
|
except RuntimeError as e:
|
||||||
|
raise HTTPException(409, str(e))
|
||||||
|
return {"job_id": job.id, "started_at": job.started_at}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/whisperx/install/{job_id}")
|
||||||
|
async def get_whisperx_install(job_id: str) -> dict:
|
||||||
|
job = whisperx_installer.get(job_id)
|
||||||
|
if not job:
|
||||||
|
raise HTTPException(404, "unknown job")
|
||||||
|
return {
|
||||||
|
"id": job.id,
|
||||||
|
"state": job.state,
|
||||||
|
"phase": job.phase,
|
||||||
|
"lines": job.lines,
|
||||||
|
"started_at": job.started_at,
|
||||||
|
"finished_at": job.finished_at,
|
||||||
|
"returncode": job.returncode,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/whisperx/install/{job_id}/stream")
|
||||||
|
async def stream_whisperx_install(job_id: str) -> StreamingResponse:
|
||||||
|
job = whisperx_installer.get(job_id)
|
||||||
|
if not job:
|
||||||
|
raise HTTPException(404, "unknown job")
|
||||||
|
|
||||||
|
async def event_stream():
|
||||||
|
last_idx = 0
|
||||||
|
last_phase = ""
|
||||||
|
last_state = ""
|
||||||
|
while True:
|
||||||
|
new_lines = job.lines[last_idx:]
|
||||||
|
last_idx = len(job.lines)
|
||||||
|
for line in new_lines:
|
||||||
|
yield f"data: {json.dumps({'line': line})}\n\n"
|
||||||
|
if job.phase != last_phase or job.state != last_state:
|
||||||
|
yield f"event: phase\ndata: {json.dumps({'phase': job.phase, 'state': job.state})}\n\n"
|
||||||
|
last_phase = job.phase
|
||||||
|
last_state = job.state
|
||||||
|
if job.finished_at:
|
||||||
|
yield f"event: done\ndata: {json.dumps({'state': job.state, 'returncode': job.returncode})}\n\n"
|
||||||
|
return
|
||||||
|
await asyncio.sleep(0.6)
|
||||||
|
|
||||||
|
return StreamingResponse(event_stream(), media_type="text/event-stream")
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/endpoints")
|
@app.get("/api/endpoints")
|
||||||
async def get_endpoints() -> dict:
|
async def get_endpoints() -> dict:
|
||||||
"""Service-discovery summary. Stable shape; other apps on the LAN can poll this
|
"""Service-discovery summary. Stable shape; other apps on the LAN can poll this
|
||||||
@@ -212,6 +639,10 @@ async def get_status() -> dict:
|
|||||||
check_parakeet(settings),
|
check_parakeet(settings),
|
||||||
check_magpie(settings),
|
check_magpie(settings),
|
||||||
)
|
)
|
||||||
|
# Feed health into the connectivity log (deduped — only logs on transition)
|
||||||
|
record_state("vllm", bool(vllm.get("ok")))
|
||||||
|
record_state("parakeet", bool(parakeet.get("ok")))
|
||||||
|
record_state("magpie", bool(magpie.get("ok")))
|
||||||
current_key = _identify_current_model(vllm.get("current_model"))
|
current_key = _identify_current_model(vllm.get("current_model"))
|
||||||
return {
|
return {
|
||||||
"configured": settings.configured,
|
"configured": settings.configured,
|
||||||
@@ -237,6 +668,15 @@ class SwapRequest(BaseModel):
|
|||||||
dry_run: bool = False
|
dry_run: bool = False
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/swap/{key}/validate")
|
||||||
|
async def validate_swap(key: str) -> dict:
|
||||||
|
"""Pre-flight check: run vLLM's argparse layer against the proposed launch
|
||||||
|
command WITHOUT starting an engine. Cheap (~5 s) and doesn't disturb the
|
||||||
|
currently-loaded model.
|
||||||
|
"""
|
||||||
|
return await validate_launch(key, catalog, settings)
|
||||||
|
|
||||||
|
|
||||||
@app.post("/api/swap")
|
@app.post("/api/swap")
|
||||||
async def post_swap(req: SwapRequest) -> dict:
|
async def post_swap(req: SwapRequest) -> dict:
|
||||||
if not settings.configured and not req.dry_run:
|
if not settings.configured and not req.dry_run:
|
||||||
@@ -297,7 +737,7 @@ async def stream_swap(job_id: str):
|
|||||||
|
|
||||||
class DownloadRequest(BaseModel):
|
class DownloadRequest(BaseModel):
|
||||||
repo: str
|
repo: str
|
||||||
mode: Literal["solo", "cluster"] = "solo"
|
mode: Literal["spark1", "spark2", "cluster"] = "spark1"
|
||||||
|
|
||||||
|
|
||||||
@app.post("/api/download")
|
@app.post("/api/download")
|
||||||
@@ -376,6 +816,81 @@ async def get_updates() -> dict:
|
|||||||
return await get_update_status(settings)
|
return await get_update_status(settings)
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/api/explain-updates")
|
||||||
|
async def explain_updates():
|
||||||
|
"""Stream a layman's explanation of the pending commits from the currently-loaded vLLM model."""
|
||||||
|
import httpx
|
||||||
|
info = await get_update_status(settings)
|
||||||
|
if not info.get("ok"):
|
||||||
|
async def err_gen():
|
||||||
|
yield f"event: done\ndata: {json.dumps({'error': info.get('error', 'unknown')})}\n\n"
|
||||||
|
return StreamingResponse(err_gen(), media_type="text/event-stream")
|
||||||
|
|
||||||
|
vllm = await check_vllm(settings)
|
||||||
|
if not vllm.get("ok") or not vllm.get("current_model"):
|
||||||
|
async def err_gen():
|
||||||
|
yield f"event: done\ndata: {json.dumps({'error': 'no vLLM model loaded — swap to a model first'})}\n\n"
|
||||||
|
return StreamingResponse(err_gen(), media_type="text/event-stream")
|
||||||
|
|
||||||
|
commits = "\n".join(info.get("log", []))
|
||||||
|
if not commits.strip():
|
||||||
|
async def empty_gen():
|
||||||
|
yield f"event: done\ndata: {json.dumps({'error': 'no pending commits'})}\n\n"
|
||||||
|
return StreamingResponse(empty_gen(), media_type="text/event-stream")
|
||||||
|
|
||||||
|
prompt = (
|
||||||
|
"You are reviewing pending git commits to `eugr/spark-vllm-docker`, an upstream community project that "
|
||||||
|
"orchestrates vLLM on dual NVIDIA DGX Spark hardware (Blackwell GPUs, cluster via Ray, recipes per model). "
|
||||||
|
"The reader has a setup running models like Qwen3.6-35B-A3B-NVFP4 (daily driver, solo), Qwen3-VL 235B (cluster), "
|
||||||
|
"and Gemma 4 31B. The reader is technically literate but is NOT a vLLM expert.\n\n"
|
||||||
|
"For the commit list below: give a short overall verdict (Apply / Optional / Skip and why), then a brief "
|
||||||
|
"bullet per commit grouping similar ones. Call out anything that would break a working setup or that "
|
||||||
|
"requires re-downloading models. Avoid jargon. ~250 words max.\n\n"
|
||||||
|
f"Pending commits:\n{commits}"
|
||||||
|
)
|
||||||
|
|
||||||
|
async def gen():
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=httpx.Timeout(300.0, connect=5.0)) as c:
|
||||||
|
async with c.stream(
|
||||||
|
"POST",
|
||||||
|
f"{vllm['base_url']}/chat/completions",
|
||||||
|
json={
|
||||||
|
"model": vllm["current_model"],
|
||||||
|
"stream": True,
|
||||||
|
"messages": [{"role": "user", "content": prompt}],
|
||||||
|
"max_tokens": 600,
|
||||||
|
"temperature": 0.4,
|
||||||
|
},
|
||||||
|
) as r:
|
||||||
|
r.raise_for_status()
|
||||||
|
async for line in r.aiter_lines():
|
||||||
|
if not line.startswith("data: "):
|
||||||
|
continue
|
||||||
|
data = line[6:].strip()
|
||||||
|
if data == "[DONE]":
|
||||||
|
break
|
||||||
|
try:
|
||||||
|
chunk = json.loads(data)
|
||||||
|
choices = chunk.get("choices") or []
|
||||||
|
if not choices:
|
||||||
|
continue
|
||||||
|
delta = choices[0].get("delta") or {}
|
||||||
|
text = delta.get("content")
|
||||||
|
reasoning = delta.get("reasoning")
|
||||||
|
if text:
|
||||||
|
yield f"data: {json.dumps({'content': text})}\n\n"
|
||||||
|
elif reasoning:
|
||||||
|
yield f"data: {json.dumps({'reasoning': reasoning})}\n\n"
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
continue
|
||||||
|
except Exception as e:
|
||||||
|
yield f"data: {json.dumps({'error': f'{type(e).__name__}: {e}'})}\n\n"
|
||||||
|
yield f"event: done\ndata: {json.dumps({'ok': True})}\n\n"
|
||||||
|
|
||||||
|
return StreamingResponse(gen(), media_type="text/event-stream")
|
||||||
|
|
||||||
|
|
||||||
class UpdateRequest(BaseModel):
|
class UpdateRequest(BaseModel):
|
||||||
mode: Literal["solo", "cluster"] = "cluster"
|
mode: Literal["solo", "cluster"] = "cluster"
|
||||||
|
|
||||||
|
|||||||
+50
-2
@@ -5,6 +5,7 @@ machinery. We just run `docker start|stop|restart <container>` via SSH on the
|
|||||||
appropriate host.
|
appropriate host.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
import time
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from typing import Literal, Optional
|
from typing import Literal, Optional
|
||||||
|
|
||||||
@@ -12,6 +13,25 @@ from .config import Settings
|
|||||||
from .ssh import ssh_run
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
|
# Cache the "unreachable" verdict per (host, user) for a short period so that a
|
||||||
|
# repeated docker_state call doesn't re-pay the 6 s SSH connect timeout each time.
|
||||||
|
_UNREACHABLE_TTL = 25.0
|
||||||
|
_unreachable_cache: dict[tuple[str, str], float] = {}
|
||||||
|
|
||||||
|
|
||||||
|
def _is_recently_unreachable(host: str, user: str) -> bool:
|
||||||
|
ts = _unreachable_cache.get((host, user))
|
||||||
|
return bool(ts and time.monotonic() - ts < _UNREACHABLE_TTL)
|
||||||
|
|
||||||
|
|
||||||
|
def _mark_unreachable(host: str, user: str) -> None:
|
||||||
|
_unreachable_cache[(host, user)] = time.monotonic()
|
||||||
|
|
||||||
|
|
||||||
|
def _clear_unreachable(host: str, user: str) -> None:
|
||||||
|
_unreachable_cache.pop((host, user), None)
|
||||||
|
|
||||||
|
|
||||||
ServiceName = Literal["parakeet", "magpie"]
|
ServiceName = Literal["parakeet", "magpie"]
|
||||||
ServiceAction = Literal["start", "stop", "restart"]
|
ServiceAction = Literal["start", "stop", "restart"]
|
||||||
|
|
||||||
@@ -27,7 +47,8 @@ class ServiceDef:
|
|||||||
|
|
||||||
|
|
||||||
def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
|
def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
|
||||||
return {
|
from .custom_services import load_custom_services
|
||||||
|
out: dict[str, ServiceDef] = {
|
||||||
"parakeet": ServiceDef(
|
"parakeet": ServiceDef(
|
||||||
name="parakeet",
|
name="parakeet",
|
||||||
kind="stt",
|
kind="stt",
|
||||||
@@ -44,20 +65,47 @@ def services_from_settings(s: Settings) -> dict[str, ServiceDef]:
|
|||||||
container=s.magpie_container,
|
container=s.magpie_container,
|
||||||
port=s.magpie_port,
|
port=s.magpie_port,
|
||||||
),
|
),
|
||||||
|
"whisperx": ServiceDef(
|
||||||
|
name="whisperx",
|
||||||
|
kind="stt+diarize",
|
||||||
|
host=s.whisperx_host,
|
||||||
|
user=s.whisperx_user,
|
||||||
|
container=s.whisperx_container,
|
||||||
|
port=s.whisperx_port,
|
||||||
|
),
|
||||||
}
|
}
|
||||||
|
for entry in load_custom_services():
|
||||||
|
key = entry.get("key")
|
||||||
|
if not key or key in out:
|
||||||
|
continue
|
||||||
|
out[key] = ServiceDef(
|
||||||
|
name=key,
|
||||||
|
kind=entry.get("kind", ""),
|
||||||
|
host=entry.get("host", ""),
|
||||||
|
user=entry.get("user", ""),
|
||||||
|
container=entry.get("container", key),
|
||||||
|
port=int(entry.get("port", 0)),
|
||||||
|
)
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
|
async def docker_state(settings: Settings, svc: ServiceDef) -> dict:
|
||||||
"""Get docker state (running, exited, restarting, etc.) + restart count."""
|
"""Get docker state (running, exited, restarting, etc.) + restart count."""
|
||||||
if not svc.host or not svc.user:
|
if not svc.host or not svc.user:
|
||||||
return {"state": "unconfigured", "restart_count": None, "uptime": None}
|
return {"state": "unconfigured", "restart_count": None, "uptime": None}
|
||||||
|
if _is_recently_unreachable(svc.host, svc.user):
|
||||||
|
return {"state": "unreachable", "host_unreachable": True, "restart_count": None, "uptime": None}
|
||||||
cmd = (
|
cmd = (
|
||||||
f"docker inspect {svc.container} "
|
f"docker inspect {svc.container} "
|
||||||
f"--format '{{{{.State.Status}}}}|{{{{.State.StartedAt}}}}|{{{{.RestartCount}}}}|{{{{.State.ExitCode}}}}|{{{{.State.Error}}}}' "
|
f"--format '{{{{.State.Status}}}}|{{{{.State.StartedAt}}}}|{{{{.RestartCount}}}}|{{{{.State.ExitCode}}}}|{{{{.State.Error}}}}' "
|
||||||
f"2>&1 || echo 'NOT_FOUND'"
|
f"2>&1 || echo 'NOT_FOUND'"
|
||||||
)
|
)
|
||||||
rc, out, _ = await ssh_run(svc.host, svc.user, cmd, settings, timeout=10)
|
rc, out, _ = await ssh_run(svc.host, svc.user, cmd, settings, timeout=6)
|
||||||
out = out.strip()
|
out = out.strip()
|
||||||
|
if rc == 124 or "timeout after" in out.lower():
|
||||||
|
_mark_unreachable(svc.host, svc.user)
|
||||||
|
return {"state": "unreachable", "host_unreachable": True, "restart_count": None, "uptime": None}
|
||||||
|
_clear_unreachable(svc.host, svc.user)
|
||||||
if rc != 0 or out.startswith("NOT_FOUND") or "Error" in out and "no such object" in out.lower():
|
if rc != 0 or out.startswith("NOT_FOUND") or "Error" in out and "no such object" in out.lower():
|
||||||
return {"state": "missing", "restart_count": None, "uptime": None, "raw": out}
|
return {"state": "missing", "restart_count": None, "uptime": None, "raw": out}
|
||||||
parts = out.split("|")
|
parts = out.split("|")
|
||||||
|
|||||||
@@ -0,0 +1,319 @@
|
|||||||
|
"""Speech-model patch management for the parakeet-asr container on Spark 2.
|
||||||
|
|
||||||
|
The parakeet-asr container ships with a stock FastAPI wrapper that only supports
|
||||||
|
ASR (Parakeet TDT). Spark Control augments it with two overlay files —
|
||||||
|
`diarizer.py` and a patched `main.py` — that add Sortformer-based diarization
|
||||||
|
and the `/v1/audio/diarize` endpoint.
|
||||||
|
|
||||||
|
These overlays survive `docker restart` (writable layer) but NOT `docker rm`
|
||||||
|
(volume rebuild). If the parakeet container is ever recreated, the overlays
|
||||||
|
need to be re-applied. This module handles that:
|
||||||
|
|
||||||
|
- GET /api/speech-models → current state (loaded models, patch
|
||||||
|
checksums, drift detection)
|
||||||
|
- POST /api/speech-models/reapply → copy overlays from spark-control's
|
||||||
|
shipped /app/parakeet_patches into
|
||||||
|
the parakeet container + restart
|
||||||
|
- POST /api/speech-models/restart → just `docker restart parakeet-asr`,
|
||||||
|
no overlay changes
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
import shlex
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .connectivity import record_report
|
||||||
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
|
# /app/parakeet_patches inside the spark-control container image (set up by
|
||||||
|
# the Dockerfile COPY directive). Each file under here is the canonical
|
||||||
|
# version we'd push into the parakeet container.
|
||||||
|
PATCHES_DIR = Path(__file__).resolve().parent.parent / "parakeet_patches"
|
||||||
|
|
||||||
|
# Files we manage. Mapped local-source-path -> destination-path-in-container.
|
||||||
|
MANAGED_FILES = {
|
||||||
|
"diarizer.py": "/opt/parakeet/app/diarizer.py",
|
||||||
|
"main.py": "/opt/parakeet/app/main.py",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _sha256_short(text: bytes) -> str:
|
||||||
|
return hashlib.sha256(text).hexdigest()[:12]
|
||||||
|
|
||||||
|
|
||||||
|
def _local_patches() -> dict[str, dict]:
|
||||||
|
"""Read the canonical patch files shipped inside spark-control.
|
||||||
|
|
||||||
|
Returns: {local_name: {"path": str, "sha": str, "size": int, "missing": bool}}
|
||||||
|
"""
|
||||||
|
out: dict[str, dict] = {}
|
||||||
|
for local_name in MANAGED_FILES:
|
||||||
|
p = PATCHES_DIR / local_name
|
||||||
|
if not p.exists():
|
||||||
|
out[local_name] = {"path": str(p), "missing": True}
|
||||||
|
continue
|
||||||
|
body = p.read_bytes()
|
||||||
|
out[local_name] = {
|
||||||
|
"path": str(p),
|
||||||
|
"sha": _sha256_short(body),
|
||||||
|
"size": len(body),
|
||||||
|
"missing": False,
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
async def _parakeet_health(settings: Settings) -> dict:
|
||||||
|
"""Pull current model loading state from Parakeet's /health endpoint."""
|
||||||
|
url = f"http://{settings.parakeet_host}:{settings.parakeet_port}/health"
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=4.0) as client:
|
||||||
|
r = await client.get(url)
|
||||||
|
if r.status_code == 200:
|
||||||
|
return r.json()
|
||||||
|
return {"reachable": False, "status_code": r.status_code, "error": r.text[:200]}
|
||||||
|
except Exception as e:
|
||||||
|
return {"reachable": False, "error": f"{type(e).__name__}: {e}"}
|
||||||
|
|
||||||
|
|
||||||
|
async def _remote_file_sha(settings: Settings, container_path: str) -> Optional[str]:
|
||||||
|
"""sha256 of a file inside the parakeet container, or None if missing/error."""
|
||||||
|
if not settings.parakeet_host or not settings.parakeet_user:
|
||||||
|
return None
|
||||||
|
cmd = (
|
||||||
|
f"docker exec parakeet-asr sh -c "
|
||||||
|
f"'[ -f {shlex.quote(container_path)} ] && "
|
||||||
|
f"sha256sum {shlex.quote(container_path)} 2>/dev/null | cut -c1-12 || echo MISSING'"
|
||||||
|
)
|
||||||
|
rc, out, _ = await ssh_run(settings.parakeet_host, settings.parakeet_user, cmd, settings, timeout=15)
|
||||||
|
if rc != 0:
|
||||||
|
return None
|
||||||
|
s = out.strip()
|
||||||
|
if s == "MISSING" or not s:
|
||||||
|
return None
|
||||||
|
return s
|
||||||
|
|
||||||
|
|
||||||
|
class SpeechModelsManager:
|
||||||
|
"""Tracks last-reapply state in-memory; persists nothing across spark-control
|
||||||
|
restarts (the source-of-truth is what's actually inside the parakeet
|
||||||
|
container, which we read fresh on every status call)."""
|
||||||
|
|
||||||
|
def __init__(self, settings: Settings) -> None:
|
||||||
|
self.settings = settings
|
||||||
|
self.last_reapply_at: Optional[str] = None
|
||||||
|
self.last_reapply_result: Optional[dict] = None
|
||||||
|
self.last_restart_at: Optional[str] = None
|
||||||
|
self._reapply_lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def status(self) -> dict:
|
||||||
|
"""Build the full speech-models status payload for the UI.
|
||||||
|
|
||||||
|
Compares the SHAs of files we shipped inside spark-control vs what's
|
||||||
|
actually running inside the parakeet container — surfaces drift if
|
||||||
|
patches were applied from an older spark-control version, or never
|
||||||
|
applied at all.
|
||||||
|
"""
|
||||||
|
local = _local_patches()
|
||||||
|
health = await _parakeet_health(self.settings)
|
||||||
|
|
||||||
|
# Probe remote SHAs in parallel
|
||||||
|
async def _probe(local_name: str) -> tuple[str, Optional[str]]:
|
||||||
|
return local_name, await _remote_file_sha(self.settings, MANAGED_FILES[local_name])
|
||||||
|
|
||||||
|
remote_results = await asyncio.gather(*(_probe(n) for n in MANAGED_FILES))
|
||||||
|
remote = {name: sha for name, sha in remote_results}
|
||||||
|
|
||||||
|
files = []
|
||||||
|
all_in_sync = True
|
||||||
|
any_missing_remote = False
|
||||||
|
for local_name in MANAGED_FILES:
|
||||||
|
local_info = local.get(local_name, {})
|
||||||
|
local_sha = local_info.get("sha")
|
||||||
|
remote_sha = remote.get(local_name)
|
||||||
|
in_sync = bool(local_sha) and (local_sha == remote_sha)
|
||||||
|
if not in_sync:
|
||||||
|
all_in_sync = False
|
||||||
|
if remote_sha is None:
|
||||||
|
any_missing_remote = True
|
||||||
|
files.append({
|
||||||
|
"name": local_name,
|
||||||
|
"container_path": MANAGED_FILES[local_name],
|
||||||
|
"local_sha": local_sha,
|
||||||
|
"remote_sha": remote_sha,
|
||||||
|
"in_sync": in_sync,
|
||||||
|
"size_bytes": local_info.get("size"),
|
||||||
|
})
|
||||||
|
|
||||||
|
# Coarse status for the UI to render a single pill
|
||||||
|
if any_missing_remote:
|
||||||
|
patch_status = "missing" # overlay files missing in container
|
||||||
|
elif all_in_sync:
|
||||||
|
patch_status = "in_sync"
|
||||||
|
else:
|
||||||
|
patch_status = "drift" # local files newer than container
|
||||||
|
|
||||||
|
return {
|
||||||
|
"container_health": health,
|
||||||
|
"patches": {
|
||||||
|
"status": patch_status,
|
||||||
|
"files": files,
|
||||||
|
"last_reapply_at": self.last_reapply_at,
|
||||||
|
"last_reapply_result": self.last_reapply_result,
|
||||||
|
"last_restart_at": self.last_restart_at,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
async def reapply_patches(self) -> dict:
|
||||||
|
"""Copy the patches shipped inside spark-control into the parakeet
|
||||||
|
container, verify syntax, and restart it. Same logic as apply.sh but
|
||||||
|
run from inside spark-control's FastAPI process."""
|
||||||
|
if self._reapply_lock.locked():
|
||||||
|
raise RuntimeError("a patch reapply is already in progress")
|
||||||
|
async with self._reapply_lock:
|
||||||
|
return await self._do_reapply()
|
||||||
|
|
||||||
|
async def _do_reapply(self) -> dict:
|
||||||
|
s = self.settings
|
||||||
|
if not s.parakeet_host or not s.parakeet_user:
|
||||||
|
raise RuntimeError("parakeet host/user not configured")
|
||||||
|
|
||||||
|
steps: list[dict] = []
|
||||||
|
|
||||||
|
# 0. Verify local patches present
|
||||||
|
local = _local_patches()
|
||||||
|
for name, info in local.items():
|
||||||
|
if info.get("missing"):
|
||||||
|
steps.append({"step": "verify_local", "ok": False, "name": name, "error": "patch file missing inside spark-control image"})
|
||||||
|
return self._finish_reapply(False, steps)
|
||||||
|
steps.append({"step": "verify_local", "ok": True, "files": list(local.keys())})
|
||||||
|
|
||||||
|
# 1. Backup main.py inside container (idempotent — only if backup doesn't already exist)
|
||||||
|
backup_cmd = (
|
||||||
|
"docker exec parakeet-asr sh -c '"
|
||||||
|
"test -f /opt/parakeet/app/main.py.pre-sortformer || "
|
||||||
|
"cp /opt/parakeet/app/main.py /opt/parakeet/app/main.py.pre-sortformer"
|
||||||
|
"'"
|
||||||
|
)
|
||||||
|
rc, out, err = await ssh_run(s.parakeet_host, s.parakeet_user, backup_cmd, s, timeout=15)
|
||||||
|
steps.append({"step": "backup_original", "ok": rc == 0, "stdout": out.strip()[:200], "stderr": err.strip()[:200]})
|
||||||
|
if rc != 0:
|
||||||
|
return self._finish_reapply(False, steps)
|
||||||
|
|
||||||
|
# 2. Copy each patch file into the container via `docker exec -i ... 'cat > path'`
|
||||||
|
for local_name, container_path in MANAGED_FILES.items():
|
||||||
|
local_body = (PATCHES_DIR / local_name).read_bytes()
|
||||||
|
copy_cmd = f"docker exec -i parakeet-asr sh -c {shlex.quote('cat > ' + container_path)}"
|
||||||
|
ok, out, err = await self._ssh_pipe_to_remote(
|
||||||
|
s.parakeet_host, s.parakeet_user, copy_cmd, local_body, s, timeout=30
|
||||||
|
)
|
||||||
|
steps.append({"step": "copy_file", "name": local_name, "ok": ok,
|
||||||
|
"bytes": len(local_body), "stdout": out[:200], "stderr": err[:200]})
|
||||||
|
if not ok:
|
||||||
|
return self._finish_reapply(False, steps)
|
||||||
|
|
||||||
|
# 3. Verify Python syntax inside the container
|
||||||
|
syntax_cmd = (
|
||||||
|
"docker exec parakeet-asr python3 -c "
|
||||||
|
"'import ast; "
|
||||||
|
"ast.parse(open(\"/opt/parakeet/app/diarizer.py\").read()); "
|
||||||
|
"ast.parse(open(\"/opt/parakeet/app/main.py\").read()); "
|
||||||
|
"print(\"py OK\")'"
|
||||||
|
)
|
||||||
|
rc, out, err = await ssh_run(s.parakeet_host, s.parakeet_user, syntax_cmd, s, timeout=30)
|
||||||
|
ok = rc == 0 and "py OK" in out
|
||||||
|
steps.append({"step": "verify_syntax", "ok": ok, "stdout": out.strip()[:300], "stderr": err.strip()[:300]})
|
||||||
|
if not ok:
|
||||||
|
return self._finish_reapply(False, steps)
|
||||||
|
|
||||||
|
# 4. Restart the container
|
||||||
|
restart_cmd = "docker restart parakeet-asr"
|
||||||
|
rc, out, err = await ssh_run(s.parakeet_host, s.parakeet_user, restart_cmd, s, timeout=60)
|
||||||
|
steps.append({"step": "docker_restart", "ok": rc == 0, "stdout": out.strip()[:200], "stderr": err.strip()[:200]})
|
||||||
|
if rc != 0:
|
||||||
|
return self._finish_reapply(False, steps)
|
||||||
|
|
||||||
|
# 5. Poll /health until both models are loaded again (up to ~120s)
|
||||||
|
loaded = False
|
||||||
|
for _ in range(40):
|
||||||
|
await asyncio.sleep(3)
|
||||||
|
h = await _parakeet_health(s)
|
||||||
|
if h.get("asr_loaded") and h.get("diarizer_loaded"):
|
||||||
|
loaded = True
|
||||||
|
steps.append({"step": "verify_health", "ok": True, "asr_loaded": True, "diarizer_loaded": True})
|
||||||
|
break
|
||||||
|
if not loaded:
|
||||||
|
steps.append({"step": "verify_health", "ok": False, "error": "models did not load within 120s"})
|
||||||
|
return self._finish_reapply(False, steps)
|
||||||
|
|
||||||
|
return self._finish_reapply(True, steps)
|
||||||
|
|
||||||
|
def _finish_reapply(self, success: bool, steps: list[dict]) -> dict:
|
||||||
|
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
self.last_reapply_at = now
|
||||||
|
result = {"ok": success, "at": now, "steps": steps}
|
||||||
|
self.last_reapply_result = result
|
||||||
|
record_report(
|
||||||
|
"parakeet",
|
||||||
|
ok=success,
|
||||||
|
source="speech-models-reapply",
|
||||||
|
detail=f"reapply patches: {'OK' if success else 'FAILED at step ' + str([s for s in steps if not s.get('ok')][:1])}",
|
||||||
|
)
|
||||||
|
return result
|
||||||
|
|
||||||
|
async def restart_container(self) -> dict:
|
||||||
|
"""Restart the parakeet-asr container without changing any files."""
|
||||||
|
s = self.settings
|
||||||
|
if not s.parakeet_host or not s.parakeet_user:
|
||||||
|
raise RuntimeError("parakeet host/user not configured")
|
||||||
|
rc, out, err = await ssh_run(s.parakeet_host, s.parakeet_user,
|
||||||
|
"docker restart parakeet-asr", s, timeout=60)
|
||||||
|
ok = rc == 0
|
||||||
|
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
|
||||||
|
self.last_restart_at = now
|
||||||
|
record_report(
|
||||||
|
"parakeet",
|
||||||
|
ok=ok,
|
||||||
|
source="speech-models-restart",
|
||||||
|
detail=f"manual restart: {'OK' if ok else 'rc=' + str(rc) + ' ' + err.strip()[:120]}",
|
||||||
|
)
|
||||||
|
return {"ok": ok, "at": now, "stdout": out.strip()[:200], "stderr": err.strip()[:200]}
|
||||||
|
|
||||||
|
async def _ssh_pipe_to_remote(
|
||||||
|
self,
|
||||||
|
host: str,
|
||||||
|
user: str,
|
||||||
|
remote_cmd: str,
|
||||||
|
payload: bytes,
|
||||||
|
settings: Settings,
|
||||||
|
timeout: float = 30.0,
|
||||||
|
) -> tuple[bool, str, str]:
|
||||||
|
"""Run `ssh user@host <remote_cmd>` while piping `payload` to its stdin.
|
||||||
|
This is the bash equivalent of `ssh ... '<cmd>' < local_file`.
|
||||||
|
|
||||||
|
Returns (success, stdout_str, stderr_str)."""
|
||||||
|
from .ssh import _base_args
|
||||||
|
args = _base_args(settings) + [f"{user}@{host}", remote_cmd]
|
||||||
|
proc = await asyncio.create_subprocess_exec(
|
||||||
|
*args,
|
||||||
|
stdin=asyncio.subprocess.PIPE,
|
||||||
|
stdout=asyncio.subprocess.PIPE,
|
||||||
|
stderr=asyncio.subprocess.PIPE,
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
stdout_b, stderr_b = await asyncio.wait_for(
|
||||||
|
proc.communicate(input=payload), timeout=timeout
|
||||||
|
)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
proc.kill()
|
||||||
|
await proc.wait()
|
||||||
|
return False, "", f"timeout after {timeout}s"
|
||||||
|
ok = proc.returncode == 0
|
||||||
|
return ok, stdout_b.decode(errors="replace"), stderr_b.decode(errors="replace")
|
||||||
+1116
-24
File diff suppressed because it is too large
Load Diff
+212
-12
@@ -16,6 +16,7 @@
|
|||||||
<div class="current" id="current">
|
<div class="current" id="current">
|
||||||
<span class="muted">connecting…</span>
|
<span class="muted">connecting…</span>
|
||||||
</div>
|
</div>
|
||||||
|
<a id="open-webui-link" class="topbar-btn hidden" href="#" target="_blank" rel="noopener" title="Open Open WebUI">Open chat ↗</a>
|
||||||
</header>
|
</header>
|
||||||
|
|
||||||
<main>
|
<main>
|
||||||
@@ -24,23 +25,55 @@
|
|||||||
<span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
|
<span>Run the <em>Configure Sparks</em> action in StartOS to set hostnames, then run <em>Test Connection</em>.</span>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="endpoint-panel" class="endpoint-panel hidden">
|
<section id="hardware-panel" class="hardware-panel hidden">
|
||||||
|
<div class="section-header">
|
||||||
|
<h2 class="section-title">Spark hardware</h2>
|
||||||
|
<button id="open-connectivity" class="btn small-btn">Connectivity log</button>
|
||||||
|
</div>
|
||||||
|
<div id="hardware-grid" class="hardware-grid"></div>
|
||||||
|
|
||||||
|
<dialog id="connectivity-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form">
|
||||||
|
<h3>Spark connectivity history</h3>
|
||||||
|
<p class="muted small">Most recent up/down transitions per Spark. Tracked since this dashboard was installed.</p>
|
||||||
|
<div id="connectivity-content" class="connectivity-content"></div>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="connectivity-close" class="btn">Close</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section id="endpoint-panel" class="endpoint-panel hidden collapsed">
|
||||||
|
<div class="ep-header">
|
||||||
<div class="ep-title muted small">OpenAI-compatible endpoint</div>
|
<div class="ep-title muted small">OpenAI-compatible endpoint</div>
|
||||||
|
<button type="button" class="icon-btn ep-collapse-btn" id="ep-collapse" title="Show / hide endpoint details" aria-label="Toggle endpoint details">
|
||||||
|
<svg viewBox="0 0 24 24" width="14" height="14" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><polyline points="6 9 12 15 18 9"></polyline></svg>
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
<div class="ep-body">
|
||||||
<div class="ep-row">
|
<div class="ep-row">
|
||||||
<span class="ep-label">Base URL</span>
|
<span class="ep-label">Base URL</span>
|
||||||
<code class="ep-value" id="ep-url">—</code>
|
<code class="ep-value copyable" id="ep-url" data-copy-self title="Click to copy">—</code>
|
||||||
<button class="copy-btn" data-copy="#ep-url" title="Copy base URL">Copy</button>
|
<button class="icon-btn" data-copy="#ep-url" title="Copy base URL" aria-label="Copy">
|
||||||
|
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
||||||
|
</button>
|
||||||
</div>
|
</div>
|
||||||
<div class="ep-row">
|
<div class="ep-row">
|
||||||
<span class="ep-label">Model ID</span>
|
<span class="ep-label">Model ID</span>
|
||||||
<code class="ep-value" id="ep-model">—</code>
|
<code class="ep-value copyable" id="ep-model" data-copy-self title="Click to copy">—</code>
|
||||||
<button class="copy-btn" data-copy="#ep-model" title="Copy model ID">Copy</button>
|
<button class="icon-btn" data-copy="#ep-model" title="Copy model ID" aria-label="Copy">
|
||||||
|
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
||||||
|
</button>
|
||||||
</div>
|
</div>
|
||||||
<details class="ep-curl">
|
<details class="ep-curl">
|
||||||
<summary class="muted small">curl example</summary>
|
<summary class="muted small">curl example</summary>
|
||||||
<pre id="ep-curl-snippet" class="snippet"></pre>
|
<pre id="ep-curl-snippet" class="snippet copyable" data-copy-self title="Click to copy"></pre>
|
||||||
<button class="copy-btn small" data-copy="#ep-curl-snippet">Copy snippet</button>
|
<button class="icon-btn" data-copy="#ep-curl-snippet" title="Copy snippet" aria-label="Copy">
|
||||||
|
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>
|
||||||
|
</button>
|
||||||
</details>
|
</details>
|
||||||
|
</div><!-- /.ep-body -->
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="swap-panel" class="swap-panel hidden">
|
<section id="swap-panel" class="swap-panel hidden">
|
||||||
@@ -63,11 +96,144 @@
|
|||||||
</details>
|
</details>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="services-panel" class="services hidden">
|
<nav id="dashboard-tabs" class="dashboard-tabs hidden" role="tablist">
|
||||||
<h2 class="section-title">Always-on services</h2>
|
<button type="button" class="dashboard-tab" data-tab="llm" role="tab" aria-selected="true">LLM</button>
|
||||||
<div id="services-grid" class="services-grid"></div>
|
<button type="button" class="dashboard-tab" data-tab="audio" role="tab" aria-selected="false">Audio / Speech</button>
|
||||||
|
</nav>
|
||||||
|
|
||||||
|
<div class="tab-content" id="tab-audio" role="tabpanel" aria-labelledby="tab-audio-trigger">
|
||||||
|
|
||||||
|
<section id="whisperx-install-card" class="whisperx-install hidden">
|
||||||
|
<div class="wx-install-body">
|
||||||
|
<div class="wx-install-title">
|
||||||
|
<strong>Add WhisperX</strong>
|
||||||
|
<span class="tag ok">recommended</span>
|
||||||
|
</div>
|
||||||
|
<p class="muted small">
|
||||||
|
WhisperX is a single-container speech pipeline (faster-whisper for transcription + pyannote 3.1 for diarization)
|
||||||
|
designed to handle long audio cleanly. Replaces the Parakeet + Sortformer combo we patched together,
|
||||||
|
which crashed on a 90-min meeting. Pulled and built directly on Spark 2 (~10–15 min first time;
|
||||||
|
you only do this once).
|
||||||
|
</p>
|
||||||
|
<p class="muted small">
|
||||||
|
Requires a Hugging Face token at <code>~/.cache/huggingface/token</code> on Spark 2 (already set up).
|
||||||
|
</p>
|
||||||
|
<div class="wx-install-actions">
|
||||||
|
<button id="wx-install" class="btn primary">Install WhisperX</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<dialog id="whisperx-progress-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form">
|
||||||
|
<h3 id="wx-prog-title">Installing WhisperX…</h3>
|
||||||
|
<div class="phase-row">
|
||||||
|
<span class="spinner"></span>
|
||||||
|
<div class="phase" id="wx-prog-phase">Starting…</div>
|
||||||
|
<span class="spacer"></span>
|
||||||
|
<span class="timer" id="wx-prog-elapsed">0:00</span>
|
||||||
|
</div>
|
||||||
|
<details open>
|
||||||
|
<summary class="muted small">Build log</summary>
|
||||||
|
<pre id="wx-prog-log" class="log"></pre>
|
||||||
|
</details>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="wx-prog-close" class="btn">Close</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
|
||||||
|
<section id="services-panel" class="services hidden">
|
||||||
|
<div class="section-header">
|
||||||
|
<h2 class="section-title">Always-on services</h2>
|
||||||
|
<button id="open-nim" class="btn small-btn">+ Install NIM</button>
|
||||||
|
</div>
|
||||||
|
<div id="services-grid" class="services-grid"></div>
|
||||||
|
|
||||||
|
<dialog id="nim-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form" id="nim-form">
|
||||||
|
<h3>Install a NVIDIA NIM container</h3>
|
||||||
|
<p class="muted small" id="nim-key-warn"></p>
|
||||||
|
<p class="muted small">Pick a curated container below or paste any image from <a href="#" id="nim-catalog-link" target="_blank" rel="noopener">the NGC NIM catalog</a>. Spark Control will <code>docker pull</code> and <code>docker run</code> it on the target Spark.</p>
|
||||||
|
|
||||||
|
<div id="nim-suggested" class="nim-grid"></div>
|
||||||
|
|
||||||
|
<fieldset class="modal-fieldset">
|
||||||
|
<legend>Custom image</legend>
|
||||||
|
<label class="modal-row"><span>Image (nvcr.io/...)</span><input type="text" id="nim-image" placeholder="nvcr.io/nim/nvidia/<name>:latest"></label>
|
||||||
|
<label class="modal-row"><span>Container name</span><input type="text" id="nim-container" placeholder="my-service"></label>
|
||||||
|
<label class="modal-row"><span>Port</span><input type="number" id="nim-port" min="1" max="65535"></label>
|
||||||
|
<label class="modal-row"><span>Kind</span>
|
||||||
|
<select id="nim-kind">
|
||||||
|
<option value="nim">NIM (other)</option>
|
||||||
|
<option value="stt">STT (speech-to-text)</option>
|
||||||
|
<option value="tts">TTS (text-to-speech)</option>
|
||||||
|
<option value="vision">Vision</option>
|
||||||
|
<option value="embedding">Embedding</option>
|
||||||
|
</select>
|
||||||
|
</label>
|
||||||
|
<label class="modal-row"><span>Target Spark</span>
|
||||||
|
<select id="nim-host">
|
||||||
|
<option value="spark2">Spark 2 (default for support services)</option>
|
||||||
|
<option value="spark1">Spark 1 (head node)</option>
|
||||||
|
</select>
|
||||||
|
</label>
|
||||||
|
</fieldset>
|
||||||
|
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="nim-cancel" class="btn">Cancel</button>
|
||||||
|
<button type="submit" class="btn primary" id="nim-start">Install</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
|
||||||
|
<dialog id="nim-progress-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form">
|
||||||
|
<h3 id="nim-prog-title">Installing…</h3>
|
||||||
|
<div class="phase-row">
|
||||||
|
<div class="phase" id="nim-prog-phase">Starting…</div>
|
||||||
|
<span class="spacer"></span>
|
||||||
|
<span class="timer" id="nim-prog-elapsed">0:00</span>
|
||||||
|
</div>
|
||||||
|
<details open>
|
||||||
|
<summary class="muted small">Log</summary>
|
||||||
|
<pre id="nim-prog-log" class="log"></pre>
|
||||||
|
</details>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="nim-prog-close" class="btn">Close</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section id="speech-models-panel" class="speech-models hidden">
|
||||||
|
<div class="section-header">
|
||||||
|
<h2 class="section-title">Speech model patches</h2>
|
||||||
|
</div>
|
||||||
|
<p class="muted small sm-blurb">
|
||||||
|
Spark Control adds Sortformer speaker diarization to the third-party Parakeet ASR
|
||||||
|
container via two Python overlays (<code>diarizer.py</code> + a patched <code>main.py</code>).
|
||||||
|
Overlays survive container restart but not a fresh redeploy — if the parakeet container is
|
||||||
|
ever rebuilt, click <strong>Reapply patches</strong> below to restore them.
|
||||||
|
</p>
|
||||||
|
<div id="speech-models-card" class="speech-models-card"></div>
|
||||||
|
|
||||||
|
<dialog id="speech-models-progress-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form">
|
||||||
|
<h3>Reapplying speech-model patches…</h3>
|
||||||
|
<p class="muted small">Copying overlays into the parakeet container, verifying syntax, restarting, waiting for both models to load. Takes ~60–120 s.</p>
|
||||||
|
<div id="sm-prog-steps" class="sm-prog-steps"></div>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="sm-prog-close" class="btn" disabled>Close</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
</div><!-- /#tab-audio -->
|
||||||
|
|
||||||
|
<div class="tab-content" id="tab-llm" role="tabpanel" aria-labelledby="tab-llm-trigger">
|
||||||
|
|
||||||
<section id="models-section">
|
<section id="models-section">
|
||||||
<div class="section-header">
|
<div class="section-header">
|
||||||
<h2 class="section-title">LLM swap</h2>
|
<h2 class="section-title">LLM swap</h2>
|
||||||
@@ -104,6 +270,20 @@
|
|||||||
</form>
|
</form>
|
||||||
</dialog>
|
</dialog>
|
||||||
|
|
||||||
|
<dialog id="disk-delete-dialog" class="modal">
|
||||||
|
<form method="dialog" class="modal-form">
|
||||||
|
<h3>Delete model weights from disk?</h3>
|
||||||
|
<p id="dd-summary" class="muted small"></p>
|
||||||
|
<ul class="muted small dd-hosts" id="dd-hosts"></ul>
|
||||||
|
<p class="muted small">This is reversible — you can re-download from the catalog at any time. The catalog entry stays intact.</p>
|
||||||
|
<p id="dd-error" class="muted small dd-error hidden"></p>
|
||||||
|
<div class="modal-actions">
|
||||||
|
<button type="button" id="dd-cancel" class="btn">Cancel</button>
|
||||||
|
<button type="button" id="dd-confirm" class="btn danger">Delete from disk</button>
|
||||||
|
</div>
|
||||||
|
</form>
|
||||||
|
</dialog>
|
||||||
|
|
||||||
<dialog id="advanced-dialog" class="modal">
|
<dialog id="advanced-dialog" class="modal">
|
||||||
<form method="dialog" class="modal-form" id="advanced-form">
|
<form method="dialog" class="modal-form" id="advanced-form">
|
||||||
<h3 id="adv-title">Advanced settings</h3>
|
<h3 id="adv-title">Advanced settings</h3>
|
||||||
@@ -127,11 +307,20 @@
|
|||||||
<label class="dl-row">
|
<label class="dl-row">
|
||||||
<span class="dl-label">HuggingFace repo</span>
|
<span class="dl-label">HuggingFace repo</span>
|
||||||
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
|
<input type="text" id="dl-repo" placeholder="e.g. RedHatAI/Qwen3.6-35B-A3B-NVFP4" autocomplete="off">
|
||||||
|
<a id="dl-hf-link" class="dl-hf-link hidden" href="#" target="_blank" rel="noopener" title="Open on Hugging Face">↗</a>
|
||||||
</label>
|
</label>
|
||||||
|
<div class="dl-help muted small">
|
||||||
|
<a href="https://huggingface.co/models?other=vllm" target="_blank" rel="noopener">Browse vLLM-compatible models</a>
|
||||||
|
· NVFP4-quantized models (e.g. <code>RedHatAI/...</code>) are best for Blackwell hardware
|
||||||
|
</div>
|
||||||
<div class="dl-row">
|
<div class="dl-row">
|
||||||
<span class="dl-label">Where</span>
|
<span class="dl-label">Where</span>
|
||||||
<label class="radio"><input type="radio" name="dl-mode" value="solo" checked> Spark 1 only (solo)</label>
|
<label class="radio"><input type="radio" name="dl-mode" value="spark1" checked> Spark 1 only</label>
|
||||||
<label class="radio"><input type="radio" name="dl-mode" value="cluster"> Both Sparks (cluster, copy in parallel)</label>
|
<label class="radio"><input type="radio" name="dl-mode" value="spark2"> Spark 2 only</label>
|
||||||
|
<label class="radio"><input type="radio" name="dl-mode" value="cluster"> Both Sparks (for cluster models)</label>
|
||||||
|
</div>
|
||||||
|
<div class="dl-help muted small">
|
||||||
|
For <strong>solo</strong> models, download to wherever you'll run them. For <strong>cluster</strong> models (-tp 2), both Sparks need the weights — "Both" downloads to one Spark and rsyncs to the other in parallel.
|
||||||
</div>
|
</div>
|
||||||
<div class="dl-actions">
|
<div class="dl-actions">
|
||||||
<button id="dl-cancel" class="btn">Cancel</button>
|
<button id="dl-cancel" class="btn">Cancel</button>
|
||||||
@@ -165,9 +354,14 @@
|
|||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="update-banner" class="update-banner hidden">
|
<section id="update-banner" class="update-banner hidden">
|
||||||
|
<div class="ub-context muted small">
|
||||||
|
Updates to <strong><a href="https://github.com/eugr/spark-vllm-docker" target="_blank" rel="noopener">eugr/spark-vllm-docker</a></strong>
|
||||||
|
— the upstream project that orchestrates vLLM on your Sparks (launch-cluster.sh, recipes, mods). These are <em>not</em> firmware, OS, or model updates.
|
||||||
|
</div>
|
||||||
<div class="ub-row">
|
<div class="ub-row">
|
||||||
<span id="ub-text">Checking for updates…</span>
|
<span id="ub-text">Checking for updates…</span>
|
||||||
<span class="spacer"></span>
|
<span class="spacer"></span>
|
||||||
|
<button id="ub-explain" class="btn small-btn hidden">✨ Explain context</button>
|
||||||
<button id="ub-details" class="btn small-btn hidden">Show details</button>
|
<button id="ub-details" class="btn small-btn hidden">Show details</button>
|
||||||
<button id="ub-apply" class="btn small-btn primary hidden">Apply update</button>
|
<button id="ub-apply" class="btn small-btn primary hidden">Apply update</button>
|
||||||
</div>
|
</div>
|
||||||
@@ -175,6 +369,10 @@
|
|||||||
<summary class="muted small">Pending commits</summary>
|
<summary class="muted small">Pending commits</summary>
|
||||||
<pre id="ub-log" class="snippet"></pre>
|
<pre id="ub-log" class="snippet"></pre>
|
||||||
</details>
|
</details>
|
||||||
|
<details id="ub-explain-section" class="hidden">
|
||||||
|
<summary class="muted small">Explained by the loaded LLM</summary>
|
||||||
|
<div id="ub-explain-content" class="explain-content"></div>
|
||||||
|
</details>
|
||||||
<div id="ub-progress" class="hidden">
|
<div id="ub-progress" class="hidden">
|
||||||
<div class="phase-row">
|
<div class="phase-row">
|
||||||
<div class="phase" id="ub-phase">Applying update…</div>
|
<div class="phase" id="ub-phase">Applying update…</div>
|
||||||
@@ -188,6 +386,8 @@
|
|||||||
</div>
|
</div>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
</div><!-- /#tab-llm -->
|
||||||
|
|
||||||
<footer class="footer">
|
<footer class="footer">
|
||||||
<div class="health">
|
<div class="health">
|
||||||
<span class="health-item" id="h-vllm"><span class="dot"></span> vLLM</span>
|
<span class="health-item" id="h-vllm"><span class="dot"></span> vLLM</span>
|
||||||
|
|||||||
+412
-9
@@ -45,6 +45,17 @@ body {
|
|||||||
.logo-dot { width: 10px; height: 10px; border-radius: 50%; background: var(--accent); box-shadow: 0 0 12px var(--accent); }
|
.logo-dot { width: 10px; height: 10px; border-radius: 50%; background: var(--accent); box-shadow: 0 0 12px var(--accent); }
|
||||||
.current { flex: 1; text-align: right; font-size: 14px; }
|
.current { flex: 1; text-align: right; font-size: 14px; }
|
||||||
.current strong { color: var(--accent); }
|
.current strong { color: var(--accent); }
|
||||||
|
.topbar-btn {
|
||||||
|
background: var(--surface-2);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
color: var(--text);
|
||||||
|
padding: 5px 10px;
|
||||||
|
border-radius: 6px;
|
||||||
|
font-size: 12px;
|
||||||
|
text-decoration: none;
|
||||||
|
transition: border-color 0.15s, background 0.15s;
|
||||||
|
}
|
||||||
|
.topbar-btn:hover { background: #24242c; border-color: var(--accent); color: var(--accent); }
|
||||||
|
|
||||||
main {
|
main {
|
||||||
max-width: 880px;
|
max-width: 880px;
|
||||||
@@ -97,7 +108,8 @@ main {
|
|||||||
overflow-x: auto;
|
overflow-x: auto;
|
||||||
white-space: nowrap;
|
white-space: nowrap;
|
||||||
}
|
}
|
||||||
.copy-btn {
|
.copy-btn,
|
||||||
|
.icon-btn {
|
||||||
appearance: none;
|
appearance: none;
|
||||||
background: var(--surface-2);
|
background: var(--surface-2);
|
||||||
border: 1px solid var(--border);
|
border: 1px solid var(--border);
|
||||||
@@ -108,15 +120,27 @@ main {
|
|||||||
cursor: pointer;
|
cursor: pointer;
|
||||||
transition: color 0.15s, border-color 0.15s, background 0.15s;
|
transition: color 0.15s, border-color 0.15s, background 0.15s;
|
||||||
flex-shrink: 0;
|
flex-shrink: 0;
|
||||||
|
display: inline-flex;
|
||||||
|
align-items: center;
|
||||||
|
justify-content: center;
|
||||||
}
|
}
|
||||||
.copy-btn:hover { color: var(--text); border-color: #34343c; }
|
.icon-btn { padding: 5px 7px; }
|
||||||
.copy-btn.copied {
|
.icon-btn svg { width: 14px; height: 14px; display: block; }
|
||||||
|
.copy-btn:hover,
|
||||||
|
.icon-btn:hover { color: var(--text); border-color: #34343c; }
|
||||||
|
.copy-btn.copied,
|
||||||
|
.icon-btn.copied {
|
||||||
color: var(--accent);
|
color: var(--accent);
|
||||||
border-color: rgba(74, 222, 128, 0.4);
|
border-color: rgba(74, 222, 128, 0.4);
|
||||||
background: rgba(74, 222, 128, 0.08);
|
background: rgba(74, 222, 128, 0.08);
|
||||||
}
|
}
|
||||||
|
.icon-btn.copied svg { color: var(--accent); }
|
||||||
.copy-btn.small { padding: 3px 8px; font-size: 11px; }
|
.copy-btn.small { padding: 3px 8px; font-size: 11px; }
|
||||||
|
|
||||||
|
.copyable { cursor: pointer; }
|
||||||
|
.copyable:hover { outline: 1px solid rgba(96, 165, 250, 0.5); }
|
||||||
|
.copyable.copied { outline: 1px solid var(--accent); background: rgba(74, 222, 128, 0.05); }
|
||||||
|
|
||||||
.ep-curl { margin-top: 8px; }
|
.ep-curl { margin-top: 8px; }
|
||||||
.ep-curl summary { cursor: pointer; padding: 4px 0; }
|
.ep-curl summary { cursor: pointer; padding: 4px 0; }
|
||||||
.ep-curl[open] summary { margin-bottom: 6px; }
|
.ep-curl[open] summary { margin-bottom: 6px; }
|
||||||
@@ -255,6 +279,14 @@ main {
|
|||||||
font: 13px ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace;
|
font: 13px ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace;
|
||||||
}
|
}
|
||||||
.modal-row textarea { font-family: inherit; resize: vertical; }
|
.modal-row textarea { font-family: inherit; resize: vertical; }
|
||||||
|
.modal-row .knob-hint {
|
||||||
|
color: var(--muted);
|
||||||
|
font-size: 11px;
|
||||||
|
line-height: 1.5;
|
||||||
|
margin-top: 2px;
|
||||||
|
padding-left: 2px;
|
||||||
|
}
|
||||||
|
.modal-row.inline .knob-hint { width: 100%; margin-left: 22px; margin-top: 0; }
|
||||||
.modal-row input:focus, .modal-row textarea:focus, .modal-row select:focus { outline: 1px solid var(--info); border-color: var(--info); }
|
.modal-row input:focus, .modal-row textarea:focus, .modal-row select:focus { outline: 1px solid var(--info); border-color: var(--info); }
|
||||||
.modal-row input[type='range'] { padding: 0; flex: 1; }
|
.modal-row input[type='range'] { padding: 0; flex: 1; }
|
||||||
.modal-fieldset {
|
.modal-fieldset {
|
||||||
@@ -274,10 +306,39 @@ main {
|
|||||||
background: var(--surface);
|
background: var(--surface);
|
||||||
border: 1px solid rgba(96, 165, 250, 0.4);
|
border: 1px solid rgba(96, 165, 250, 0.4);
|
||||||
border-radius: var(--radius);
|
border-radius: var(--radius);
|
||||||
padding: 10px 14px;
|
padding: 12px 14px;
|
||||||
margin-top: 18px;
|
margin-top: 18px;
|
||||||
font-size: 13px;
|
font-size: 13px;
|
||||||
}
|
}
|
||||||
|
.ub-context { margin-bottom: 8px; line-height: 1.5; }
|
||||||
|
.ub-context a { color: var(--info); text-decoration: none; }
|
||||||
|
.ub-context a:hover { text-decoration: underline; }
|
||||||
|
.ub-context em { font-style: normal; color: var(--text); font-weight: 500; }
|
||||||
|
|
||||||
|
#ub-explain-section { margin-top: 8px; }
|
||||||
|
#ub-explain-section summary { cursor: pointer; padding: 4px 0; }
|
||||||
|
.explain-content {
|
||||||
|
background: #08080b;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 6px;
|
||||||
|
padding: 12px 14px;
|
||||||
|
margin-top: 8px;
|
||||||
|
font-size: 13px;
|
||||||
|
line-height: 1.6;
|
||||||
|
color: #c7c7d1;
|
||||||
|
white-space: pre-wrap;
|
||||||
|
word-break: break-word;
|
||||||
|
max-height: 320px;
|
||||||
|
overflow: auto;
|
||||||
|
}
|
||||||
|
.explain-content .reasoning {
|
||||||
|
color: var(--muted);
|
||||||
|
font-style: italic;
|
||||||
|
font-size: 11px;
|
||||||
|
border-left: 2px solid var(--border);
|
||||||
|
padding-left: 10px;
|
||||||
|
margin: 4px 0;
|
||||||
|
}
|
||||||
.update-banner.up-to-date {
|
.update-banner.up-to-date {
|
||||||
border-color: var(--border);
|
border-color: var(--border);
|
||||||
color: var(--muted);
|
color: var(--muted);
|
||||||
@@ -289,6 +350,90 @@ main {
|
|||||||
#ub-list summary { cursor: pointer; padding: 4px 0; }
|
#ub-list summary { cursor: pointer; padding: 4px 0; }
|
||||||
#ub-progress { margin-top: 10px; }
|
#ub-progress { margin-top: 10px; }
|
||||||
|
|
||||||
|
/* ===== Hardware dashboard ===== */
|
||||||
|
|
||||||
|
.hardware-grid {
|
||||||
|
display: grid;
|
||||||
|
gap: 14px;
|
||||||
|
grid-template-columns: repeat(auto-fill, minmax(320px, 1fr));
|
||||||
|
}
|
||||||
|
.hw-card {
|
||||||
|
background: var(--surface);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: var(--radius);
|
||||||
|
padding: 14px 16px;
|
||||||
|
display: flex;
|
||||||
|
flex-direction: column;
|
||||||
|
gap: 8px;
|
||||||
|
}
|
||||||
|
.hw-card .head {
|
||||||
|
display: flex;
|
||||||
|
align-items: baseline;
|
||||||
|
gap: 8px;
|
||||||
|
margin-bottom: 4px;
|
||||||
|
}
|
||||||
|
.hw-card .head .name { font-weight: 600; font-size: 15px; }
|
||||||
|
.hw-card .head .meta { color: var(--muted); font-size: 12px; margin-left: auto; }
|
||||||
|
.hw-card.unreachable { border-color: rgba(239, 68, 68, 0.4); }
|
||||||
|
.hw-card.unreachable .name { color: var(--error); }
|
||||||
|
.hw-card.unreachable ol { color: var(--muted); }
|
||||||
|
.hw-card .wol-row {
|
||||||
|
margin-top: 8px;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 8px;
|
||||||
|
font-size: 12px;
|
||||||
|
color: var(--muted);
|
||||||
|
}
|
||||||
|
.hw-card .wol-row .btn { padding: 5px 10px; font-size: 12px; }
|
||||||
|
.hw-card .mac-display { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; }
|
||||||
|
|
||||||
|
.connectivity-content {
|
||||||
|
max-height: 360px;
|
||||||
|
overflow-y: auto;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 6px;
|
||||||
|
padding: 10px;
|
||||||
|
background: var(--surface-2);
|
||||||
|
}
|
||||||
|
.conn-spark { margin-bottom: 16px; }
|
||||||
|
.conn-spark h4 { font-size: 13px; margin: 0 0 8px; color: var(--text); }
|
||||||
|
.conn-event {
|
||||||
|
font-size: 12px;
|
||||||
|
display: flex;
|
||||||
|
gap: 10px;
|
||||||
|
padding: 4px 0;
|
||||||
|
border-bottom: 1px solid rgba(255,255,255,0.04);
|
||||||
|
font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
}
|
||||||
|
.conn-event:last-child { border-bottom: 0; }
|
||||||
|
.conn-event .when { color: var(--muted); flex-shrink: 0; }
|
||||||
|
.conn-event .what { flex: 1; }
|
||||||
|
.conn-event.up .what { color: var(--accent); }
|
||||||
|
.conn-event.down .what { color: var(--error); }
|
||||||
|
.conn-event.report .what { font-style: italic; }
|
||||||
|
.conn-event .muted { color: var(--muted); font-style: normal; }
|
||||||
|
.conn-event .dur { color: var(--muted); }
|
||||||
|
.conn-summary { color: var(--muted); font-size: 11px; padding: 4px 0 10px; }
|
||||||
|
.hw-metric { display: flex; align-items: center; gap: 10px; font-size: 12px; }
|
||||||
|
.hw-metric .label { color: var(--muted); width: 56px; flex-shrink: 0; text-transform: uppercase; letter-spacing: 0.05em; font-size: 11px; }
|
||||||
|
.hw-metric .bar { flex: 1; height: 8px; background: var(--surface-2); border-radius: 4px; overflow: hidden; position: relative; }
|
||||||
|
.hw-metric .bar > span {
|
||||||
|
display: block;
|
||||||
|
height: 100%;
|
||||||
|
background: linear-gradient(90deg, var(--info), var(--accent));
|
||||||
|
border-radius: 4px;
|
||||||
|
transition: width 0.4s ease-out;
|
||||||
|
}
|
||||||
|
.hw-metric .bar.warn > span { background: linear-gradient(90deg, var(--warn), var(--error)); }
|
||||||
|
.hw-metric .val {
|
||||||
|
font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace;
|
||||||
|
font-size: 12px;
|
||||||
|
color: var(--text);
|
||||||
|
min-width: 110px;
|
||||||
|
text-align: right;
|
||||||
|
}
|
||||||
|
|
||||||
/* ===== Section header (title + action button) ===== */
|
/* ===== Section header (title + action button) ===== */
|
||||||
|
|
||||||
.section-header {
|
.section-header {
|
||||||
@@ -341,6 +486,24 @@ main {
|
|||||||
min-width: 200px;
|
min-width: 200px;
|
||||||
}
|
}
|
||||||
.dl-row input[type='text']:focus { outline: 1px solid var(--info); border-color: var(--info); }
|
.dl-row input[type='text']:focus { outline: 1px solid var(--info); border-color: var(--info); }
|
||||||
|
.dl-hf-link {
|
||||||
|
display: inline-flex;
|
||||||
|
align-items: center;
|
||||||
|
justify-content: center;
|
||||||
|
background: var(--surface-2);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
color: var(--info);
|
||||||
|
padding: 7px 10px;
|
||||||
|
border-radius: 6px;
|
||||||
|
text-decoration: none;
|
||||||
|
font-size: 14px;
|
||||||
|
flex-shrink: 0;
|
||||||
|
}
|
||||||
|
.dl-hf-link:hover { background: rgba(96, 165, 250, 0.08); border-color: var(--info); }
|
||||||
|
.dl-help { padding-left: 122px; line-height: 1.6; }
|
||||||
|
.dl-help a { color: var(--info); text-decoration: none; }
|
||||||
|
.dl-help a:hover { text-decoration: underline; }
|
||||||
|
.dl-help code { background: var(--surface-2); padding: 1px 5px; border-radius: 3px; font-size: 11px; }
|
||||||
.radio { display: inline-flex; align-items: center; gap: 6px; font-size: 13px; color: var(--text); cursor: pointer; }
|
.radio { display: inline-flex; align-items: center; gap: 6px; font-size: 13px; color: var(--text); cursor: pointer; }
|
||||||
.radio input { accent-color: var(--accent); }
|
.radio input { accent-color: var(--accent); }
|
||||||
.dl-actions { display: flex; gap: 8px; justify-content: flex-end; margin-top: 10px; }
|
.dl-actions { display: flex; gap: 8px; justify-content: flex-end; margin-top: 10px; }
|
||||||
@@ -353,6 +516,37 @@ main {
|
|||||||
#dl-log-details { margin-top: 12px; }
|
#dl-log-details { margin-top: 12px; }
|
||||||
#dl-log-details summary { cursor: pointer; padding: 4px 0; }
|
#dl-log-details summary { cursor: pointer; padding: 4px 0; }
|
||||||
|
|
||||||
|
/* ===== NIM install dialog ===== */
|
||||||
|
|
||||||
|
.modal#nim-dialog,
|
||||||
|
.modal#nim-progress-dialog { max-width: 640px; }
|
||||||
|
.nim-grid {
|
||||||
|
display: grid;
|
||||||
|
gap: 8px;
|
||||||
|
grid-template-columns: 1fr;
|
||||||
|
max-height: 240px;
|
||||||
|
overflow-y: auto;
|
||||||
|
margin-bottom: 4px;
|
||||||
|
}
|
||||||
|
.nim-card {
|
||||||
|
background: var(--surface-2);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 6px;
|
||||||
|
padding: 10px 12px;
|
||||||
|
display: flex;
|
||||||
|
gap: 10px;
|
||||||
|
align-items: flex-start;
|
||||||
|
}
|
||||||
|
.nim-card .info { flex: 1; }
|
||||||
|
.nim-card .name { font-weight: 600; font-size: 13px; }
|
||||||
|
.nim-card .desc { color: var(--muted); font-size: 12px; margin-top: 4px; }
|
||||||
|
.nim-card .img { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; color: #6b6b75; font-size: 11px; margin-top: 4px; word-break: break-all; }
|
||||||
|
.nim-card .btn { padding: 6px 12px; font-size: 12px; flex-shrink: 0; }
|
||||||
|
.nim-card .links { font-size: 11px; margin-top: 4px; }
|
||||||
|
.nim-card .links a { color: var(--info); text-decoration: none; }
|
||||||
|
.nim-card .links a:hover { text-decoration: underline; }
|
||||||
|
.nim-key-warn { color: var(--warn); }
|
||||||
|
|
||||||
/* ===== Section titles ===== */
|
/* ===== Section titles ===== */
|
||||||
|
|
||||||
.section-title {
|
.section-title {
|
||||||
@@ -409,13 +603,38 @@ main {
|
|||||||
|
|
||||||
.service-card .row {
|
.service-card .row {
|
||||||
display: flex;
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
font-size: 12px;
|
font-size: 12px;
|
||||||
color: var(--muted);
|
color: var(--muted);
|
||||||
gap: 6px;
|
gap: 6px;
|
||||||
}
|
}
|
||||||
.service-card .row .k { width: 60px; flex-shrink: 0; }
|
.service-card .row .k { width: 60px; flex-shrink: 0; }
|
||||||
.service-card .row .v { color: var(--text); font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; word-break: break-all; }
|
.service-card .row .v {
|
||||||
|
color: var(--text);
|
||||||
|
font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace;
|
||||||
|
word-break: break-all;
|
||||||
|
flex: 1;
|
||||||
|
padding: 2px 4px;
|
||||||
|
border-radius: 4px;
|
||||||
|
}
|
||||||
.service-card .row .v.muted-v { color: var(--muted); font-family: inherit; }
|
.service-card .row .v.muted-v { color: var(--muted); font-family: inherit; }
|
||||||
|
.service-card .row .v.copyable:hover { outline: 1px solid rgba(96, 165, 250, 0.5); }
|
||||||
|
.service-card .row .v.copyable.copied { outline: 1px solid var(--accent); background: rgba(74, 222, 128, 0.05); }
|
||||||
|
.service-card .row .icon-btn { padding: 3px 6px; }
|
||||||
|
.service-card .row .icon-btn svg { width: 12px; height: 12px; }
|
||||||
|
.service-card .deep-row .deep-v { display: flex; align-items: center; gap: 6px; font-family: inherit; flex-wrap: wrap; }
|
||||||
|
.service-card .dh-ok { color: var(--accent); }
|
||||||
|
.service-card .dh-fail { color: var(--error); font-weight: 500; }
|
||||||
|
.service-card .dh-run-btn { font-family: inherit; }
|
||||||
|
.service-card .deep-error {
|
||||||
|
padding: 4px 8px;
|
||||||
|
background: rgba(239, 68, 68, 0.06);
|
||||||
|
border-left: 2px solid var(--error);
|
||||||
|
border-radius: 4px;
|
||||||
|
font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
font-size: 11px;
|
||||||
|
word-break: break-word;
|
||||||
|
}
|
||||||
|
|
||||||
.service-actions {
|
.service-actions {
|
||||||
display: flex;
|
display: flex;
|
||||||
@@ -460,26 +679,35 @@ main {
|
|||||||
font-size: 11px;
|
font-size: 11px;
|
||||||
color: #5c5c66;
|
color: #5c5c66;
|
||||||
}
|
}
|
||||||
|
.card .repo a { color: inherit; text-decoration: none; }
|
||||||
|
.card .repo a:hover { color: var(--info); text-decoration: underline; }
|
||||||
|
.card .repo .hf-icon { font-size: 13px; opacity: 0.7; }
|
||||||
.tag {
|
.tag {
|
||||||
background: var(--surface-2);
|
background: var(--surface-2);
|
||||||
border: 1px solid var(--border);
|
border: 1px solid var(--border);
|
||||||
padding: 2px 8px;
|
padding: 2px 8px;
|
||||||
border-radius: 999px;
|
border-radius: 999px;
|
||||||
font-size: 11px;
|
font-size: 12px;
|
||||||
}
|
}
|
||||||
.tag.mode-cluster { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
.tag.mode-cluster { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
||||||
.tag.mode-solo { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
.tag.mode-solo { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
||||||
.tag.cap { color: var(--muted); }
|
.tag.cap { color: var(--muted); }
|
||||||
|
/* Semantic status pills — reuse .tag sizing so every pill on the page
|
||||||
|
renders at the same 11px / 2px×8px footprint. */
|
||||||
|
.tag.ok { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
||||||
|
.tag.warn { color: var(--warn); border-color: rgba(245, 158, 11, 0.4); }
|
||||||
|
.tag.bad { color: var(--error); border-color: rgba(239, 68, 68, 0.4); }
|
||||||
|
|
||||||
.btn {
|
.btn {
|
||||||
appearance: none;
|
appearance: none;
|
||||||
border: 1px solid var(--border);
|
border: 1px solid var(--border);
|
||||||
background: var(--surface-2);
|
background: var(--surface-2);
|
||||||
color: var(--text);
|
color: var(--text);
|
||||||
padding: 8px 14px;
|
padding: 6px 12px;
|
||||||
border-radius: 8px;
|
border-radius: 8px;
|
||||||
cursor: pointer;
|
cursor: pointer;
|
||||||
font: inherit;
|
font: inherit;
|
||||||
|
font-size: 12px;
|
||||||
font-weight: 500;
|
font-weight: 500;
|
||||||
transition: background 0.15s, border-color 0.15s, opacity 0.15s;
|
transition: background 0.15s, border-color 0.15s, opacity 0.15s;
|
||||||
}
|
}
|
||||||
@@ -489,11 +717,37 @@ main {
|
|||||||
.btn:disabled { opacity: 0.45; cursor: not-allowed; }
|
.btn:disabled { opacity: 0.45; cursor: not-allowed; }
|
||||||
.btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); }
|
.btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); }
|
||||||
.btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); }
|
.btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); }
|
||||||
|
.btn.info { background: var(--info); color: #0a1e3d; border-color: var(--info); }
|
||||||
|
.btn.info:hover:not(:disabled) { background: #82baff; border-color: #82baff; }
|
||||||
.card.active .btn { background: rgba(74, 222, 128, 0.12); color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
.card.active .btn { background: rgba(74, 222, 128, 0.12); color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
||||||
.card-actions { display: flex; gap: 6px; }
|
.card-actions { display: flex; gap: 6px; }
|
||||||
.card-actions .btn.primary { flex: 1; }
|
.card-actions .btn.primary,
|
||||||
.card .adv-btn { padding: 8px 12px; font-size: 12px; }
|
.card-actions .btn.info { flex: 1; }
|
||||||
|
.card .adv-btn,
|
||||||
|
.card .test-btn { padding: 8px 12px; font-size: 12px; }
|
||||||
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
.card .custom-pill { color: var(--info); border-color: rgba(96, 165, 250, 0.4); }
|
||||||
|
.tag.on-disk { color: var(--accent); border-color: rgba(74, 222, 128, 0.4); }
|
||||||
|
.tag.not-on-disk { color: var(--muted); border-color: var(--border); opacity: 0.7; }
|
||||||
|
.card-actions .icon-btn.danger { color: var(--error); border-color: rgba(239, 68, 68, 0.3); margin-left: auto; }
|
||||||
|
.card-actions .icon-btn.danger:hover:not(:disabled) { background: rgba(239, 68, 68, 0.08); border-color: var(--error); color: var(--error); }
|
||||||
|
.card-actions .icon-btn.danger:disabled { opacity: 0.35; cursor: not-allowed; }
|
||||||
|
.dd-hosts { padding-left: 18px; margin: 4px 0 8px; }
|
||||||
|
.dd-hosts code { background: var(--surface-2); padding: 1px 5px; border-radius: 4px; }
|
||||||
|
.dd-error { color: var(--error); }
|
||||||
|
|
||||||
|
.test-result {
|
||||||
|
font-size: 12px;
|
||||||
|
line-height: 1.45;
|
||||||
|
padding: 8px 10px;
|
||||||
|
border-radius: 5px;
|
||||||
|
margin-top: 4px;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
background: var(--surface-2);
|
||||||
|
}
|
||||||
|
.test-result.ok { border-color: rgba(74, 222, 128, 0.4); background: rgba(74, 222, 128, 0.04); }
|
||||||
|
.test-result.fail { border-color: rgba(239, 68, 68, 0.45); background: rgba(239, 68, 68, 0.06); word-break: break-word; }
|
||||||
|
.test-result .ok-mark { color: var(--accent); font-weight: 600; }
|
||||||
|
.test-result .fail-mark { color: var(--error); font-weight: 600; }
|
||||||
|
|
||||||
.footer {
|
.footer {
|
||||||
margin-top: 28px;
|
margin-top: 28px;
|
||||||
@@ -516,3 +770,152 @@ main {
|
|||||||
main { padding: 16px 14px 80px; }
|
main { padding: 16px 14px 80px; }
|
||||||
.cards { grid-template-columns: 1fr; }
|
.cards { grid-template-columns: 1fr; }
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* ===== Speech model patches (v0.11) ===== */
|
||||||
|
.speech-models { margin-top: 28px; }
|
||||||
|
.sm-blurb { max-width: 880px; margin-bottom: 14px; }
|
||||||
|
.sm-blurb code {
|
||||||
|
background: var(--surface-2);
|
||||||
|
padding: 1px 6px;
|
||||||
|
border-radius: 4px;
|
||||||
|
font-size: 12px;
|
||||||
|
}
|
||||||
|
.speech-models-card {
|
||||||
|
background: var(--surface);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 10px;
|
||||||
|
padding: 16px;
|
||||||
|
display: flex;
|
||||||
|
flex-direction: column;
|
||||||
|
gap: 14px;
|
||||||
|
}
|
||||||
|
.sm-header {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 10px;
|
||||||
|
}
|
||||||
|
.sm-title {
|
||||||
|
font-weight: 600;
|
||||||
|
color: var(--text);
|
||||||
|
}
|
||||||
|
/* .sm-pill removed in v0.11.0:1 — speech-models pills now reuse the shared
|
||||||
|
.tag styling (+ .tag.ok / .tag.warn / .tag.bad color modifiers) so every
|
||||||
|
pill on the page renders identically. */
|
||||||
|
|
||||||
|
.sm-models { display: flex; flex-direction: column; gap: 6px; }
|
||||||
|
.sm-model-row {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: 160px 1fr auto;
|
||||||
|
align-items: center;
|
||||||
|
gap: 12px;
|
||||||
|
padding: 6px 0;
|
||||||
|
border-top: 1px solid var(--border);
|
||||||
|
}
|
||||||
|
.sm-model-row:first-child { border-top: none; }
|
||||||
|
.sm-model-kind { color: var(--muted); font-size: 13px; }
|
||||||
|
.sm-model-name { font-family: ui-monospace, monospace; font-size: 12px; word-break: break-all; }
|
||||||
|
|
||||||
|
.sm-files { display: flex; flex-direction: column; gap: 4px; }
|
||||||
|
.sm-file-row {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: 160px 100px 1fr;
|
||||||
|
gap: 12px;
|
||||||
|
font-size: 12px;
|
||||||
|
padding: 4px 0;
|
||||||
|
}
|
||||||
|
.sm-file-name code {
|
||||||
|
background: var(--surface-2);
|
||||||
|
padding: 1px 6px;
|
||||||
|
border-radius: 4px;
|
||||||
|
}
|
||||||
|
.sm-file-ok { color: var(--accent); }
|
||||||
|
.sm-file-warn { color: var(--warn); }
|
||||||
|
.sm-file-bad { color: var(--error); }
|
||||||
|
.sm-file-sha code {
|
||||||
|
background: var(--surface-2);
|
||||||
|
padding: 1px 4px;
|
||||||
|
border-radius: 3px;
|
||||||
|
font-size: 11px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.sm-meta { margin-top: 4px; }
|
||||||
|
.sm-actions { display: flex; gap: 10px; }
|
||||||
|
|
||||||
|
.sm-prog-steps {
|
||||||
|
display: flex;
|
||||||
|
flex-direction: column;
|
||||||
|
gap: 6px;
|
||||||
|
margin: 12px 0;
|
||||||
|
font-size: 13px;
|
||||||
|
}
|
||||||
|
.sm-prog-step {
|
||||||
|
padding: 6px 10px;
|
||||||
|
background: var(--surface-2);
|
||||||
|
border-radius: 6px;
|
||||||
|
}
|
||||||
|
.sm-prog-done {
|
||||||
|
font-weight: 600;
|
||||||
|
margin-top: 8px;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ===== Collapsible endpoint card (v0.11.0:1) ===== */
|
||||||
|
.endpoint-panel .ep-header {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 10px;
|
||||||
|
}
|
||||||
|
.endpoint-panel .ep-title { flex: 1; margin: 0; }
|
||||||
|
.endpoint-panel .ep-collapse-btn {
|
||||||
|
flex-shrink: 0;
|
||||||
|
transition: transform 0.2s;
|
||||||
|
}
|
||||||
|
.endpoint-panel.collapsed .ep-body { display: none; }
|
||||||
|
.endpoint-panel.collapsed .ep-collapse-btn svg { transform: rotate(-90deg); }
|
||||||
|
.endpoint-panel:not(.collapsed) .ep-header { margin-bottom: 10px; }
|
||||||
|
|
||||||
|
/* ===== Dashboard tabs (LLM / Audio) (v0.11.0:1) ===== */
|
||||||
|
.dashboard-tabs {
|
||||||
|
display: flex;
|
||||||
|
gap: 4px;
|
||||||
|
margin-top: 8px;
|
||||||
|
margin-bottom: 16px;
|
||||||
|
border-bottom: 1px solid var(--border);
|
||||||
|
padding: 0 2px;
|
||||||
|
}
|
||||||
|
.dashboard-tab {
|
||||||
|
appearance: none;
|
||||||
|
background: transparent;
|
||||||
|
border: 1px solid transparent;
|
||||||
|
border-bottom: none;
|
||||||
|
color: var(--muted);
|
||||||
|
padding: 8px 16px;
|
||||||
|
border-radius: 6px 6px 0 0;
|
||||||
|
cursor: pointer;
|
||||||
|
font: inherit;
|
||||||
|
font-size: 14px;
|
||||||
|
font-weight: 500;
|
||||||
|
margin-bottom: -1px;
|
||||||
|
transition: color 0.15s, background 0.15s, border-color 0.15s;
|
||||||
|
}
|
||||||
|
.dashboard-tab:hover { color: var(--text); }
|
||||||
|
.dashboard-tab.active {
|
||||||
|
color: var(--text);
|
||||||
|
background: var(--surface);
|
||||||
|
border-color: var(--border);
|
||||||
|
border-bottom: 1px solid var(--surface);
|
||||||
|
}
|
||||||
|
.tab-content { display: none; }
|
||||||
|
.tab-content.active { display: block; }
|
||||||
|
|
||||||
|
/* ===== WhisperX install banner (v0.12) ===== */
|
||||||
|
.whisperx-install {
|
||||||
|
background: var(--surface);
|
||||||
|
border: 1px solid var(--info);
|
||||||
|
border-radius: var(--radius);
|
||||||
|
padding: 16px 18px;
|
||||||
|
margin-bottom: 20px;
|
||||||
|
}
|
||||||
|
.wx-install-body { display: flex; flex-direction: column; gap: 10px; }
|
||||||
|
.wx-install-title { display: flex; align-items: center; gap: 10px; }
|
||||||
|
.wx-install-title strong { font-size: 15px; color: var(--text); }
|
||||||
|
.wx-install-actions { display: flex; gap: 10px; margin-top: 4px; }
|
||||||
|
|||||||
@@ -0,0 +1,137 @@
|
|||||||
|
"""Pre-flight validation of a proposed vLLM launch command.
|
||||||
|
|
||||||
|
Runs vLLM's own argparse layer (EngineArgs) inside the vllm_node container WITHOUT
|
||||||
|
starting the engine. Catches:
|
||||||
|
|
||||||
|
* unknown flag names (typos)
|
||||||
|
* bad types / values that argparse rejects
|
||||||
|
* deprecated flags removed in the installed vLLM version
|
||||||
|
|
||||||
|
Does NOT catch (these surface only during real engine init):
|
||||||
|
* model-architecture-specific constraints (e.g. Qwen3.6 Mamba block_size)
|
||||||
|
* OOM at weight-loading time
|
||||||
|
* Triton / CUDA-kernel compatibility errors
|
||||||
|
|
||||||
|
A pre-flight check that returns "ok" is therefore NOT a guarantee — but a
|
||||||
|
"failed" verdict is a definitive 'don't bother with the real swap'.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import json
|
||||||
|
import shlex
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .models import Catalog, build_launch_command
|
||||||
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
|
# Validates the proposed args against the same combined parser vLLM uses for
|
||||||
|
# `vllm serve` (engine args + server args + frontend args). Returns one JSON
|
||||||
|
# line on stdout: {"ok": true, ...} or {"ok": false, ...}.
|
||||||
|
_VALIDATOR_SCRIPT = r"""
|
||||||
|
import argparse, json, sys
|
||||||
|
|
||||||
|
# Mirror what `vllm serve` does internally: FlexibleArgumentParser (which is
|
||||||
|
# more lenient about dashes vs underscores) wrapped with make_arg_parser
|
||||||
|
# (which adds engine + server + frontend args).
|
||||||
|
parser = None
|
||||||
|
try:
|
||||||
|
# Newer vLLM path
|
||||||
|
from vllm.utils.argparse_utils import FlexibleArgumentParser
|
||||||
|
except Exception:
|
||||||
|
try:
|
||||||
|
# Older fallback
|
||||||
|
from vllm.engine.arg_utils import FlexibleArgumentParser
|
||||||
|
except Exception:
|
||||||
|
FlexibleArgumentParser = argparse.ArgumentParser # type: ignore
|
||||||
|
|
||||||
|
try:
|
||||||
|
from vllm.entrypoints.openai.cli_args import make_arg_parser
|
||||||
|
parser = make_arg_parser(FlexibleArgumentParser(add_help=False))
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
if parser is None:
|
||||||
|
try:
|
||||||
|
from vllm.engine.arg_utils import EngineArgs
|
||||||
|
parser = FlexibleArgumentParser(add_help=False)
|
||||||
|
EngineArgs.add_cli_args(parser)
|
||||||
|
except Exception as e:
|
||||||
|
print(json.dumps({"ok": False, "stage": "import", "error": f"{type(e).__name__}: {e}"}))
|
||||||
|
sys.exit(0)
|
||||||
|
|
||||||
|
class _ArgError(Exception):
|
||||||
|
pass
|
||||||
|
|
||||||
|
def _err(message):
|
||||||
|
raise _ArgError(message)
|
||||||
|
|
||||||
|
parser.error = _err # capture argparse errors instead of sys.exit(2)
|
||||||
|
|
||||||
|
try:
|
||||||
|
raw = sys.stdin.read()
|
||||||
|
arglist = json.loads(raw)
|
||||||
|
ns = parser.parse_args(arglist)
|
||||||
|
print(json.dumps({"ok": True, "model": getattr(ns, "model", None)}))
|
||||||
|
except _ArgError as e:
|
||||||
|
print(json.dumps({"ok": False, "stage": "parse", "error": str(e)}))
|
||||||
|
except SystemExit as e:
|
||||||
|
print(json.dumps({"ok": False, "stage": "parse", "error": f"argparse exit {e.code}"}))
|
||||||
|
except Exception as e:
|
||||||
|
print(json.dumps({"ok": False, "stage": "parse", "error": f"{type(e).__name__}: {e}"}))
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def _vllm_arg_list(key: str, model_def, catalog: Catalog) -> list[str]:
|
||||||
|
"""Reconstruct the args list passed to `vllm serve` (without the positional model)."""
|
||||||
|
cmd = build_launch_command(key, model_def, catalog.defaults)
|
||||||
|
# build_launch_command yields:
|
||||||
|
# ./launch-cluster.sh [--solo] -d exec vllm serve <repo> <args...>
|
||||||
|
# We just want the bits after `vllm serve <repo>`.
|
||||||
|
tokens = shlex.split(cmd)
|
||||||
|
if "serve" not in tokens:
|
||||||
|
return []
|
||||||
|
i = tokens.index("serve")
|
||||||
|
after = tokens[i + 1 :] # repo, then args
|
||||||
|
if not after:
|
||||||
|
return []
|
||||||
|
args = after[1:] # drop the repo
|
||||||
|
# EngineArgs expects --model=REPO rather than positional, so prepend it.
|
||||||
|
return [f"--model={after[0]}", *args]
|
||||||
|
|
||||||
|
|
||||||
|
async def validate_launch(key: str, catalog: Catalog, settings: Settings) -> dict:
|
||||||
|
if key not in catalog.models:
|
||||||
|
return {"ok": False, "stage": "lookup", "error": f"unknown model: {key}"}
|
||||||
|
if not settings.spark1_host or not settings.spark1_user:
|
||||||
|
return {"ok": False, "stage": "config", "error": "spark1 not configured"}
|
||||||
|
|
||||||
|
model = catalog.models[key]
|
||||||
|
arg_list = _vllm_arg_list(key, model, catalog)
|
||||||
|
if not arg_list:
|
||||||
|
return {"ok": False, "stage": "build", "error": "failed to build args list"}
|
||||||
|
|
||||||
|
payload = json.dumps(arg_list).replace("'", "'\\''")
|
||||||
|
# Pipe the JSON args list to a here-doc Python invocation. The validator
|
||||||
|
# reads from stdin to avoid shell-escaping the args themselves.
|
||||||
|
cmd = (
|
||||||
|
f"echo '{payload}' | docker exec -i vllm_node python3 -c "
|
||||||
|
+ shlex.quote(_VALIDATOR_SCRIPT)
|
||||||
|
)
|
||||||
|
|
||||||
|
rc, out, err = await ssh_run(settings.spark1_host, settings.spark1_user, cmd, settings, timeout=20)
|
||||||
|
if rc != 0 and not out.strip():
|
||||||
|
return {
|
||||||
|
"ok": False,
|
||||||
|
"stage": "ssh",
|
||||||
|
"error": err.strip() or f"rc={rc}",
|
||||||
|
"cmd_args": arg_list,
|
||||||
|
"launch_cmd": build_launch_command(key, model, catalog.defaults),
|
||||||
|
}
|
||||||
|
last = out.strip().splitlines()[-1] if out.strip() else ""
|
||||||
|
try:
|
||||||
|
result: dict[str, Any] = json.loads(last)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
result = {"ok": False, "stage": "decode", "error": "validator did not return JSON", "raw": out[-500:]}
|
||||||
|
result["cmd_args"] = arg_list
|
||||||
|
result["launch_cmd"] = build_launch_command(key, model, catalog.defaults)
|
||||||
|
return result
|
||||||
@@ -0,0 +1,267 @@
|
|||||||
|
"""WhisperX install action — ships the build context from inside spark-control
|
||||||
|
to Spark 2 over SSH, then runs `docker build` + `docker run` on Spark 2 and
|
||||||
|
streams progress back as SSE.
|
||||||
|
|
||||||
|
Pattern mirrors NimManager (see nim.py) but for a locally-built container
|
||||||
|
rather than an `nvcr.io` pull. Build context lives at
|
||||||
|
/app/whisperx_container/ inside the spark-control Docker image (set up by
|
||||||
|
the Dockerfile COPY directive).
|
||||||
|
|
||||||
|
Endpoints:
|
||||||
|
POST /api/whisperx/install — kick off
|
||||||
|
GET /api/whisperx/install/{job_id} — snapshot
|
||||||
|
GET /api/whisperx/install/{job_id}/stream — SSE phase + log lines
|
||||||
|
GET /api/whisperx/status — installed + healthy?
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import shlex
|
||||||
|
import uuid
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .ssh import _base_args, ssh_run, ssh_stream, StreamHandle
|
||||||
|
|
||||||
|
|
||||||
|
# Build context shipped inside the spark-control image (Dockerfile COPYs it).
|
||||||
|
BUILD_CONTEXT_DIR = Path(__file__).resolve().parent.parent / "whisperx_container"
|
||||||
|
|
||||||
|
# Files we ship to Spark 2's build dir. Mapped local-name → remote-relative-path.
|
||||||
|
BUILD_FILES = {
|
||||||
|
"Dockerfile": "Dockerfile",
|
||||||
|
"requirements.txt": "requirements.txt",
|
||||||
|
"README.md": "README.md",
|
||||||
|
"app/main.py": "app/main.py",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class WhisperXInstallJob:
|
||||||
|
id: str
|
||||||
|
started_at: str
|
||||||
|
state: str = "starting" # starting | sending | building | running | done | failed
|
||||||
|
phase: str = "Starting…"
|
||||||
|
lines: list[str] = field(default_factory=list)
|
||||||
|
returncode: Optional[int] = None
|
||||||
|
finished_at: Optional[str] = None
|
||||||
|
|
||||||
|
def append(self, line: str) -> None:
|
||||||
|
self.lines.append(line)
|
||||||
|
if len(self.lines) > 1500:
|
||||||
|
del self.lines[: len(self.lines) - 1500]
|
||||||
|
|
||||||
|
|
||||||
|
class WhisperXInstaller:
|
||||||
|
def __init__(self, settings: Settings) -> None:
|
||||||
|
self.settings = settings
|
||||||
|
self.lock = asyncio.Lock()
|
||||||
|
self.jobs: dict[str, WhisperXInstallJob] = {}
|
||||||
|
self.current_job_id: Optional[str] = None
|
||||||
|
|
||||||
|
def get(self, job_id: str) -> WhisperXInstallJob | None:
|
||||||
|
return self.jobs.get(job_id)
|
||||||
|
|
||||||
|
async def status(self) -> dict:
|
||||||
|
"""Probe whether WhisperX is installed + healthy on its configured host."""
|
||||||
|
s = self.settings
|
||||||
|
host_present = bool(s.whisperx_host and s.whisperx_user)
|
||||||
|
if not host_present:
|
||||||
|
return {"configured": False, "installed": False, "healthy": False}
|
||||||
|
# Probe HTTP health
|
||||||
|
url = f"http://{s.whisperx_host}:{s.whisperx_port}/health"
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=3.0) as client:
|
||||||
|
r = await client.get(url)
|
||||||
|
if r.status_code == 200:
|
||||||
|
body = r.json()
|
||||||
|
return {
|
||||||
|
"configured": True,
|
||||||
|
"installed": True,
|
||||||
|
"healthy": True,
|
||||||
|
"model": body.get("model"),
|
||||||
|
"device": body.get("device"),
|
||||||
|
"diarizer_loaded": body.get("diarizer_loaded", False),
|
||||||
|
}
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
# No HTTP — check if the container exists at all
|
||||||
|
container_present = await self._container_exists()
|
||||||
|
return {
|
||||||
|
"configured": True,
|
||||||
|
"installed": container_present,
|
||||||
|
"healthy": False,
|
||||||
|
"current_job_id": self.current_job_id,
|
||||||
|
}
|
||||||
|
|
||||||
|
async def _container_exists(self) -> bool:
|
||||||
|
s = self.settings
|
||||||
|
cmd = f"docker ps -a --filter name=^{s.whisperx_container}$ --format '{{{{.Names}}}}'"
|
||||||
|
rc, out, _ = await ssh_run(s.whisperx_host, s.whisperx_user, cmd, s, timeout=10)
|
||||||
|
return rc == 0 and s.whisperx_container in out
|
||||||
|
|
||||||
|
async def trigger(self) -> WhisperXInstallJob:
|
||||||
|
if self.lock.locked():
|
||||||
|
raise RuntimeError("a WhisperX install is already in progress")
|
||||||
|
s = self.settings
|
||||||
|
if not s.whisperx_host or not s.whisperx_user:
|
||||||
|
raise RuntimeError("whisperx host/user not configured")
|
||||||
|
for local_name in BUILD_FILES:
|
||||||
|
if not (BUILD_CONTEXT_DIR / local_name).exists():
|
||||||
|
raise RuntimeError(f"build context file missing inside spark-control image: {local_name}")
|
||||||
|
job = WhisperXInstallJob(
|
||||||
|
id=uuid.uuid4().hex[:8],
|
||||||
|
started_at=datetime.now(timezone.utc).isoformat(),
|
||||||
|
)
|
||||||
|
self.jobs[job.id] = job
|
||||||
|
self.current_job_id = job.id
|
||||||
|
asyncio.create_task(self._run(job))
|
||||||
|
return job
|
||||||
|
|
||||||
|
async def _run(self, job: WhisperXInstallJob) -> None:
|
||||||
|
async with self.lock:
|
||||||
|
try:
|
||||||
|
await self._do(job)
|
||||||
|
if job.state != "failed":
|
||||||
|
job.state = "done"
|
||||||
|
job.returncode = 0
|
||||||
|
job.phase = "Done — WhisperX is running on port 8002"
|
||||||
|
except Exception as e:
|
||||||
|
job.append(f"[error] {type(e).__name__}: {e}")
|
||||||
|
job.state = "failed"
|
||||||
|
if job.returncode is None:
|
||||||
|
job.returncode = 1
|
||||||
|
finally:
|
||||||
|
job.finished_at = datetime.now(timezone.utc).isoformat()
|
||||||
|
if self.current_job_id == job.id:
|
||||||
|
self.current_job_id = None
|
||||||
|
|
||||||
|
async def _ssh_pipe(self, host: str, user: str, remote_cmd: str,
|
||||||
|
payload: bytes, timeout: float = 60.0) -> tuple[bool, str, str]:
|
||||||
|
"""ssh user@host <remote_cmd> with payload piped to stdin."""
|
||||||
|
args = _base_args(self.settings) + [f"{user}@{host}", remote_cmd]
|
||||||
|
proc = await asyncio.create_subprocess_exec(
|
||||||
|
*args,
|
||||||
|
stdin=asyncio.subprocess.PIPE,
|
||||||
|
stdout=asyncio.subprocess.PIPE,
|
||||||
|
stderr=asyncio.subprocess.PIPE,
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
stdout_b, stderr_b = await asyncio.wait_for(
|
||||||
|
proc.communicate(input=payload), timeout=timeout
|
||||||
|
)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
proc.kill(); await proc.wait()
|
||||||
|
return False, "", f"timeout after {timeout}s"
|
||||||
|
return proc.returncode == 0, stdout_b.decode(errors="replace"), stderr_b.decode(errors="replace")
|
||||||
|
|
||||||
|
async def _do(self, job: WhisperXInstallJob) -> None:
|
||||||
|
s = self.settings
|
||||||
|
host = s.whisperx_host
|
||||||
|
user = s.whisperx_user
|
||||||
|
# NOTE: `~` does not expand inside shlex.quote() single-quotes (bit us
|
||||||
|
# in v0.12.0:0). Use a $HOME-relative path that the REMOTE shell
|
||||||
|
# expands; all path components are hardcoded so injection is moot.
|
||||||
|
build_dir_remote = "\"$HOME\"/whisperx-build"
|
||||||
|
build_dir_display = "~/whisperx-build"
|
||||||
|
|
||||||
|
# ── Phase 1: stage build context on Spark 2 ──
|
||||||
|
job.state = "sending"
|
||||||
|
job.phase = "Sending build context to Spark 2…"
|
||||||
|
job.append(f"$ ssh {user}@{host} 'mkdir -p {build_dir_display}/app'")
|
||||||
|
rc, out, err = await ssh_run(
|
||||||
|
host, user,
|
||||||
|
f"mkdir -p {build_dir_remote}/app && "
|
||||||
|
f"rm -f {build_dir_remote}/Dockerfile {build_dir_remote}/requirements.txt "
|
||||||
|
f"{build_dir_remote}/README.md {build_dir_remote}/app/main.py",
|
||||||
|
s, timeout=10,
|
||||||
|
)
|
||||||
|
if rc != 0:
|
||||||
|
job.append(f"[mkdir failed] {err.strip()}")
|
||||||
|
raise RuntimeError("failed to create build directory")
|
||||||
|
for local_name, remote_rel in BUILD_FILES.items():
|
||||||
|
local_path = BUILD_CONTEXT_DIR / local_name
|
||||||
|
body = local_path.read_bytes()
|
||||||
|
remote_path_for_shell = f"{build_dir_remote}/{remote_rel}"
|
||||||
|
# remote_rel is hardcoded ("Dockerfile" / "app/main.py" etc.) — safe
|
||||||
|
# to embed unquoted inside the double-quoted $HOME path.
|
||||||
|
cmd = f"cat > {remote_path_for_shell}"
|
||||||
|
ok, out, err = await self._ssh_pipe(host, user, cmd, body, timeout=30)
|
||||||
|
if not ok:
|
||||||
|
job.append(f"[scp {local_name} failed] {err.strip()[:200]}")
|
||||||
|
raise RuntimeError(f"failed to ship {local_name}")
|
||||||
|
job.append(f" → {build_dir_display}/{remote_rel} ({len(body)} bytes)")
|
||||||
|
|
||||||
|
# ── Phase 2: docker build ──
|
||||||
|
job.state = "building"
|
||||||
|
job.phase = "Building Docker image on Spark 2 (this is the slow part — 5–15 min if base layers aren't cached)…"
|
||||||
|
build_cmd = (
|
||||||
|
f"set -e; "
|
||||||
|
f"cd {build_dir_remote}; "
|
||||||
|
f"echo '=== docker build -t {s.whisperx_container}:latest . ==='; "
|
||||||
|
f"docker build -t {s.whisperx_container}:latest ."
|
||||||
|
)
|
||||||
|
job.append(f"$ {build_cmd}")
|
||||||
|
handle = StreamHandle()
|
||||||
|
async for line in ssh_stream(host, user, build_cmd, s, handle=handle):
|
||||||
|
job.append(line)
|
||||||
|
if "Step " in line and "/" in line:
|
||||||
|
# docker build progress: "Step 5/10 : RUN pip install ..."
|
||||||
|
job.phase = f"Building: {line.strip()[:120]}"
|
||||||
|
elif "Successfully built" in line or "naming to" in line:
|
||||||
|
job.phase = "Image built — preparing to start container…"
|
||||||
|
if (handle.returncode or 0) != 0:
|
||||||
|
job.returncode = handle.returncode
|
||||||
|
raise RuntimeError(f"docker build failed (rc={handle.returncode})")
|
||||||
|
|
||||||
|
# ── Phase 3: docker run ──
|
||||||
|
job.state = "running"
|
||||||
|
job.phase = "Starting container…"
|
||||||
|
run_cmd = (
|
||||||
|
f"set -e; "
|
||||||
|
f"echo '=== removing any prior {s.whisperx_container} container ==='; "
|
||||||
|
f"docker rm -f {s.whisperx_container} 2>/dev/null || true; "
|
||||||
|
f"echo '=== docker run -d --restart unless-stopped --name {s.whisperx_container} ==='; "
|
||||||
|
f"HF_TOKEN=$(cat ~/.cache/huggingface/token 2>/dev/null || true); "
|
||||||
|
f"if [ -z \"$HF_TOKEN\" ]; then echo 'WARN: no HF_TOKEN found at ~/.cache/huggingface/token — diarization will be disabled until you set one'; fi; "
|
||||||
|
f"docker run -d --restart unless-stopped "
|
||||||
|
f"--name {s.whisperx_container} "
|
||||||
|
f"--gpus all --memory=40g "
|
||||||
|
f"-p {s.whisperx_port}:{s.whisperx_port} "
|
||||||
|
f"-v whisperx-models:/root/.cache/huggingface "
|
||||||
|
f"-e HF_TOKEN=\"$HF_TOKEN\" "
|
||||||
|
f"-e WHISPER_MODEL={s.whisperx_model} "
|
||||||
|
f"{s.whisperx_container}:latest"
|
||||||
|
)
|
||||||
|
job.append(f"$ {run_cmd}")
|
||||||
|
rc, out, err = await ssh_run(host, user, run_cmd, s, timeout=60)
|
||||||
|
if rc != 0:
|
||||||
|
job.append(f"[docker run failed rc={rc}] {(err or out).strip()[:300]}")
|
||||||
|
raise RuntimeError("docker run failed")
|
||||||
|
job.append(out.strip())
|
||||||
|
|
||||||
|
# ── Phase 4: wait for /health to report ready ──
|
||||||
|
job.phase = "Container is starting; loading whisper + alignment + pyannote models (~60–120 s on first boot)…"
|
||||||
|
url = f"http://{s.whisperx_host}:{s.whisperx_port}/health"
|
||||||
|
ready = False
|
||||||
|
for i in range(60): # up to ~180 s
|
||||||
|
await asyncio.sleep(3)
|
||||||
|
try:
|
||||||
|
async with httpx.AsyncClient(timeout=4.0) as client:
|
||||||
|
r = await client.get(url)
|
||||||
|
if r.status_code == 200:
|
||||||
|
body = r.json()
|
||||||
|
if body.get("status") == "ready":
|
||||||
|
ready = True
|
||||||
|
job.append(f"[ready] {body}")
|
||||||
|
break
|
||||||
|
job.phase = f"Loading models (transcribe={body.get('transcribe_loaded')}, align={body.get('align_loaded')}, diarize={body.get('diarizer_loaded')})…"
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
if not ready:
|
||||||
|
raise RuntimeError("container started but /health did not report ready within ~180 s — check `docker logs whisperx-asr` on Spark 2")
|
||||||
|
job.phase = "Done — WhisperX is healthy and reachable on port 8002"
|
||||||
@@ -0,0 +1,69 @@
|
|||||||
|
"""Wake-on-LAN.
|
||||||
|
|
||||||
|
Two delivery paths, tried in order:
|
||||||
|
|
||||||
|
1. SSH into the other Spark and have IT broadcast — most reliable because the
|
||||||
|
packet originates from the same LAN subnet as the sleeping Spark.
|
||||||
|
2. Direct UDP broadcast from this container. May or may not work depending
|
||||||
|
on the StartOS container's network namespace.
|
||||||
|
|
||||||
|
The DGX Spark's NIC must have WoL enabled in firmware/OS for either path to
|
||||||
|
actually wake the box; this module just delivers the magic packet correctly.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import asyncio
|
||||||
|
import re
|
||||||
|
import socket
|
||||||
|
|
||||||
|
from .config import Settings
|
||||||
|
from .ssh import ssh_run
|
||||||
|
|
||||||
|
|
||||||
|
_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}([:-]?[0-9a-fA-F]{2}){5}$")
|
||||||
|
|
||||||
|
|
||||||
|
def normalize_mac(mac: str) -> str:
|
||||||
|
mac = mac.strip().lower()
|
||||||
|
if not _MAC_RE.match(mac):
|
||||||
|
raise ValueError(f"invalid MAC address: {mac!r}")
|
||||||
|
return mac.replace("-", ":")
|
||||||
|
|
||||||
|
|
||||||
|
def build_magic_packet(mac: str) -> bytes:
|
||||||
|
mac_bytes = bytes.fromhex(normalize_mac(mac).replace(":", ""))
|
||||||
|
return b"\xff" * 6 + mac_bytes * 16
|
||||||
|
|
||||||
|
|
||||||
|
def send_local_broadcast(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
|
||||||
|
"""Send from THIS container. May not reach the LAN in some topologies."""
|
||||||
|
pkt = build_magic_packet(mac)
|
||||||
|
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
|
||||||
|
try:
|
||||||
|
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
|
||||||
|
s.sendto(pkt, (broadcast, port))
|
||||||
|
# Also send to port 7 (alternate WoL convention) for safety
|
||||||
|
s.sendto(pkt, (broadcast, 7))
|
||||||
|
finally:
|
||||||
|
s.close()
|
||||||
|
|
||||||
|
|
||||||
|
async def send_via_peer(host: str, user: str, mac: str, settings: Settings) -> tuple[bool, str]:
|
||||||
|
"""Use a different (reachable) Spark to send the WoL packet to its peer.
|
||||||
|
|
||||||
|
Uses Python 3 (always present on the Sparks for vLLM) to avoid depending on
|
||||||
|
wakeonlan / etherwake being installed.
|
||||||
|
"""
|
||||||
|
normalized = normalize_mac(mac)
|
||||||
|
mac_hex = normalized.replace(":", "")
|
||||||
|
py = (
|
||||||
|
"python3 -c \""
|
||||||
|
"import socket; "
|
||||||
|
f"m=bytes.fromhex('{mac_hex}'); "
|
||||||
|
"s=socket.socket(socket.AF_INET, socket.SOCK_DGRAM); "
|
||||||
|
"s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1); "
|
||||||
|
"s.sendto(b'\\xff'*6 + m*16, ('255.255.255.255', 9)); "
|
||||||
|
"s.sendto(b'\\xff'*6 + m*16, ('255.255.255.255', 7)); "
|
||||||
|
"print('sent')\""
|
||||||
|
)
|
||||||
|
rc, out, err = await ssh_run(host, user, py, settings, timeout=8)
|
||||||
|
return rc == 0 and "sent" in out, (err.strip() or out.strip() or f"rc={rc}")
|
||||||
@@ -30,6 +30,7 @@ models:
|
|||||||
- -tp=2
|
- -tp=2
|
||||||
- --distributed-executor-backend=ray
|
- --distributed-executor-backend=ray
|
||||||
- --max-model-len=32768
|
- --max-model-len=32768
|
||||||
|
- --max-num-batched-tokens=16384
|
||||||
|
|
||||||
gemma4:
|
gemma4:
|
||||||
display_name: "Gemma 4 31B"
|
display_name: "Gemma 4 31B"
|
||||||
@@ -45,6 +46,7 @@ models:
|
|||||||
vllm_args:
|
vllm_args:
|
||||||
- --gpu-memory-utilization=0.8
|
- --gpu-memory-utilization=0.8
|
||||||
- --max-model-len=32768
|
- --max-model-len=32768
|
||||||
|
- --max-num-batched-tokens=16384
|
||||||
- --reasoning-parser=gemma4
|
- --reasoning-parser=gemma4
|
||||||
- --tool-call-parser=gemma4
|
- --tool-call-parser=gemma4
|
||||||
- --enable-auto-tool-choice
|
- --enable-auto-tool-choice
|
||||||
@@ -66,6 +68,7 @@ models:
|
|||||||
vllm_args:
|
vllm_args:
|
||||||
- --gpu-memory-utilization=0.85
|
- --gpu-memory-utilization=0.85
|
||||||
- --max-model-len=65536
|
- --max-model-len=65536
|
||||||
|
- --max-num-batched-tokens=16384
|
||||||
- --reasoning-parser=qwen3
|
- --reasoning-parser=qwen3
|
||||||
- --moe_backend=flashinfer_cutlass
|
- --moe_backend=flashinfer_cutlass
|
||||||
- --load-format=fastsafetensors
|
- --load-format=fastsafetensors
|
||||||
|
|||||||
Executable
+54
@@ -0,0 +1,54 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# Apply Sortformer diarization patches to a running parakeet-asr container.
|
||||||
|
#
|
||||||
|
# Run from the spark-control repo root on the laptop:
|
||||||
|
# bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user>
|
||||||
|
#
|
||||||
|
# What it does:
|
||||||
|
# 1. Backs up the current /opt/parakeet/app/main.py inside the container
|
||||||
|
# (writable layer; survives docker restart but NOT docker rm).
|
||||||
|
# 2. Copies the patched main.py + new diarizer.py into the container.
|
||||||
|
# 3. Restarts the container so the new code + Sortformer model load.
|
||||||
|
#
|
||||||
|
# Reversibility:
|
||||||
|
# - The backup of main.py is at /opt/parakeet/app/main.py.pre-sortformer
|
||||||
|
# inside the container. Restore with:
|
||||||
|
# docker exec parakeet-asr cp /opt/parakeet/app/main.py.pre-sortformer /opt/parakeet/app/main.py
|
||||||
|
# docker exec parakeet-asr rm -f /opt/parakeet/app/diarizer.py
|
||||||
|
# docker restart parakeet-asr
|
||||||
|
# - If the container is ever `docker rm`'d (volume rebuild), re-run this
|
||||||
|
# script. We will eventually fold this into spark-control as an action.
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
HOST="${1:?usage: apply.sh <spark2-host> <ssh-user>}"
|
||||||
|
USER="${2:?usage: apply.sh <spark2-host> <ssh-user>}"
|
||||||
|
CONTAINER="${CONTAINER:-parakeet-asr}"
|
||||||
|
|
||||||
|
REPO_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||||
|
|
||||||
|
echo "→ Backing up current main.py inside ${CONTAINER}..."
|
||||||
|
ssh "${USER}@${HOST}" "docker exec ${CONTAINER} sh -c \
|
||||||
|
'test -f /opt/parakeet/app/main.py.pre-sortformer || cp /opt/parakeet/app/main.py /opt/parakeet/app/main.py.pre-sortformer'"
|
||||||
|
|
||||||
|
echo "→ Copying diarizer.py into container..."
|
||||||
|
ssh "${USER}@${HOST}" "docker exec -i ${CONTAINER} sh -c \
|
||||||
|
'cat > /opt/parakeet/app/diarizer.py'" < "${REPO_DIR}/diarizer.py"
|
||||||
|
|
||||||
|
echo "→ Copying patched main.py into container..."
|
||||||
|
ssh "${USER}@${HOST}" "docker exec -i ${CONTAINER} sh -c \
|
||||||
|
'cat > /opt/parakeet/app/main.py'" < "${REPO_DIR}/main.py"
|
||||||
|
|
||||||
|
echo "→ Verifying syntax inside container..."
|
||||||
|
ssh "${USER}@${HOST}" "docker exec ${CONTAINER} python3 -c \
|
||||||
|
'import ast; ast.parse(open(\"/opt/parakeet/app/diarizer.py\").read()); ast.parse(open(\"/opt/parakeet/app/main.py\").read()); print(\"py OK\")'"
|
||||||
|
|
||||||
|
echo "→ Restarting ${CONTAINER}..."
|
||||||
|
ssh "${USER}@${HOST}" "docker restart ${CONTAINER}"
|
||||||
|
|
||||||
|
echo
|
||||||
|
echo "✔ Patches applied. Sortformer model (~150 MB) will download on first load — wait ~30s before testing."
|
||||||
|
echo
|
||||||
|
echo "Test once it's ready:"
|
||||||
|
echo " curl -sS http://${HOST}:8000/health"
|
||||||
|
echo " curl -sS -X POST http://${HOST}:8000/v1/audio/diarize -F file=@some-audio.mp3 | head -c 500"
|
||||||
@@ -0,0 +1,164 @@
|
|||||||
|
"""Speaker diarization via NVIDIA NeMo Sortformer.
|
||||||
|
|
||||||
|
This module is dropped into the Parakeet container at /opt/parakeet/app/diarizer.py
|
||||||
|
and loaded alongside the existing ASR model. The Sortformer model identifies who
|
||||||
|
is speaking when in an audio file, output as a list of {start_s, end_s, speaker}
|
||||||
|
turns. It does NOT transcribe — pair its output with Parakeet's word-level
|
||||||
|
timestamps to produce a diarized transcript.
|
||||||
|
|
||||||
|
Model: nvidia/diar_sortformer_4spk-v1 (~150 MB, NeMo ecosystem, ungated)
|
||||||
|
|
||||||
|
Memory: adds ~200 MB to the running container. Same GPU as Parakeet (Spark 2
|
||||||
|
unified GB10). No interference with Parakeet inference because they're called
|
||||||
|
on separate code paths and CUDA handles concurrent kernels.
|
||||||
|
"""
|
||||||
|
import io
|
||||||
|
import os
|
||||||
|
import logging
|
||||||
|
import tempfile
|
||||||
|
import subprocess
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import torch
|
||||||
|
import soundfile as sf
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
DIARIZER_MODEL = os.getenv("DIARIZER_MODEL", "nvidia/diar_sortformer_4spk-v1")
|
||||||
|
TARGET_SAMPLE_RATE = 16000
|
||||||
|
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
|
||||||
|
|
||||||
|
|
||||||
|
def _convert_to_wav_16k_mono(audio_bytes: bytes, original_filename: str) -> str:
|
||||||
|
"""Same conversion as transcriber.py — keeps a uniform input format
|
||||||
|
for the diarizer regardless of upload mime type."""
|
||||||
|
suffix = Path(original_filename).suffix.lower() if original_filename else ".wav"
|
||||||
|
with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp_in:
|
||||||
|
tmp_in.write(audio_bytes)
|
||||||
|
tmp_in_path = tmp_in.name
|
||||||
|
tmp_out_path = tmp_in_path + ".converted.wav"
|
||||||
|
try:
|
||||||
|
cmd = ["ffmpeg", "-y", "-i", tmp_in_path, "-ac", "1", "-ar", "16000",
|
||||||
|
"-sample_fmt", "s16", "-f", "wav", tmp_out_path]
|
||||||
|
result = subprocess.run(cmd, capture_output=True, timeout=300)
|
||||||
|
if result.returncode != 0:
|
||||||
|
raise RuntimeError(f"ffmpeg failed: {result.stderr.decode()[:500]}")
|
||||||
|
return tmp_out_path
|
||||||
|
finally:
|
||||||
|
try: os.unlink(tmp_in_path)
|
||||||
|
except OSError: pass
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_sortformer_segments(raw_output) -> list[dict]:
|
||||||
|
"""Sortformer.diarize() returns List[List[str]] where each inner list is
|
||||||
|
per-file results: each entry is a space-separated 'start_s end_s speaker_label'
|
||||||
|
triplet (e.g., '0.00 4.50 speaker_0'). Normalize to our canonical format."""
|
||||||
|
if not raw_output:
|
||||||
|
return []
|
||||||
|
# Single-file invocation → take first inner list
|
||||||
|
entries = raw_output[0] if isinstance(raw_output, list) and raw_output and isinstance(raw_output[0], list) else raw_output
|
||||||
|
segments = []
|
||||||
|
for entry in entries:
|
||||||
|
if not entry:
|
||||||
|
continue
|
||||||
|
if isinstance(entry, str):
|
||||||
|
parts = entry.strip().split()
|
||||||
|
if len(parts) >= 3:
|
||||||
|
try:
|
||||||
|
start = float(parts[0])
|
||||||
|
end = float(parts[1])
|
||||||
|
speaker_raw = parts[2]
|
||||||
|
# Normalize "speaker_0" / "spk_0" / "0" → "Speaker_0"
|
||||||
|
if speaker_raw.lower().startswith("speaker_"):
|
||||||
|
idx = speaker_raw.split("_", 1)[1]
|
||||||
|
elif speaker_raw.lower().startswith("spk_"):
|
||||||
|
idx = speaker_raw.split("_", 1)[1]
|
||||||
|
elif speaker_raw.isdigit():
|
||||||
|
idx = speaker_raw
|
||||||
|
else:
|
||||||
|
idx = speaker_raw
|
||||||
|
segments.append({
|
||||||
|
"start_s": start,
|
||||||
|
"end_s": end,
|
||||||
|
"speaker": f"Speaker_{idx}",
|
||||||
|
})
|
||||||
|
except (ValueError, IndexError) as e:
|
||||||
|
logger.warning(f"unparsable sortformer entry: {entry!r} ({e})")
|
||||||
|
continue
|
||||||
|
return segments
|
||||||
|
|
||||||
|
|
||||||
|
class SortformerDiarizer:
|
||||||
|
def __init__(self):
|
||||||
|
self.model = None
|
||||||
|
self._loaded = False
|
||||||
|
|
||||||
|
def load_model(self):
|
||||||
|
if self._loaded:
|
||||||
|
return
|
||||||
|
logger.info(f"Loading diarizer {DIARIZER_MODEL} on {DEVICE}...")
|
||||||
|
from nemo.collections.asr.models import SortformerEncLabelModel
|
||||||
|
self.model = SortformerEncLabelModel.from_pretrained(DIARIZER_MODEL)
|
||||||
|
self.model.eval()
|
||||||
|
if DEVICE == "cuda":
|
||||||
|
self.model = self.model.cuda()
|
||||||
|
self._loaded = True
|
||||||
|
logger.info(f"Diarizer loaded on {DEVICE}")
|
||||||
|
|
||||||
|
def diarize(self, audio_bytes: bytes, filename: str = "audio.wav") -> dict:
|
||||||
|
"""Run diarization on a single audio file.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
{
|
||||||
|
"segments": [{"start_s": float, "end_s": float, "speaker": str}, ...],
|
||||||
|
"speakers_detected": ["Speaker_0", "Speaker_1", ...],
|
||||||
|
"duration": float,
|
||||||
|
"model": str,
|
||||||
|
"device": str,
|
||||||
|
}
|
||||||
|
|
||||||
|
Speaker labels are zero-indexed strings like "Speaker_0", "Speaker_1",
|
||||||
|
etc. They are NOT real names — that mapping happens downstream via LLM
|
||||||
|
analysis or manual UI correction.
|
||||||
|
"""
|
||||||
|
if not self._loaded:
|
||||||
|
self.load_model()
|
||||||
|
if not audio_bytes:
|
||||||
|
raise ValueError("empty audio")
|
||||||
|
wav_path = None
|
||||||
|
try:
|
||||||
|
wav_path = _convert_to_wav_16k_mono(audio_bytes, filename)
|
||||||
|
data, sr = sf.read(wav_path)
|
||||||
|
duration = len(data) / sr
|
||||||
|
logger.info(f"Diarizing {duration:.1f}s of audio ({filename})")
|
||||||
|
|
||||||
|
with torch.no_grad():
|
||||||
|
raw = self.model.diarize(
|
||||||
|
audio=[wav_path],
|
||||||
|
batch_size=1,
|
||||||
|
verbose=False,
|
||||||
|
)
|
||||||
|
|
||||||
|
segments = _parse_sortformer_segments(raw)
|
||||||
|
speakers = sorted({s["speaker"] for s in segments})
|
||||||
|
logger.info(f"Detected {len(speakers)} speakers across {len(segments)} turns")
|
||||||
|
|
||||||
|
if DEVICE == "cuda":
|
||||||
|
torch.cuda.empty_cache()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"segments": segments,
|
||||||
|
"speakers_detected": speakers,
|
||||||
|
"duration": round(duration, 3),
|
||||||
|
"model": DIARIZER_MODEL,
|
||||||
|
"device": DEVICE,
|
||||||
|
}
|
||||||
|
finally:
|
||||||
|
if wav_path:
|
||||||
|
try: os.unlink(wav_path)
|
||||||
|
except OSError: pass
|
||||||
|
|
||||||
|
|
||||||
|
diarizer = SortformerDiarizer()
|
||||||
@@ -0,0 +1,158 @@
|
|||||||
|
import os
|
||||||
|
import time
|
||||||
|
import logging
|
||||||
|
from contextlib import asynccontextmanager
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from fastapi import FastAPI, File, Form, UploadFile, HTTPException
|
||||||
|
from fastapi.responses import JSONResponse
|
||||||
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
|
||||||
|
from app.transcriber import transcriber, MODEL_NAME, DEVICE
|
||||||
|
from app.diarizer import diarizer, DIARIZER_MODEL
|
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO,
|
||||||
|
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
|
||||||
|
logger = logging.getLogger("parakeet-api")
|
||||||
|
|
||||||
|
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app: FastAPI):
|
||||||
|
logger.info(f"Loading ASR model {MODEL_NAME} on {DEVICE}")
|
||||||
|
transcriber.load_model()
|
||||||
|
logger.info("ASR model ready")
|
||||||
|
logger.info(f"Loading diarizer {DIARIZER_MODEL} on {DEVICE}")
|
||||||
|
diarizer.load_model()
|
||||||
|
logger.info("Diarizer ready")
|
||||||
|
yield
|
||||||
|
|
||||||
|
|
||||||
|
app = FastAPI(title="Parakeet ASR + Sortformer Diarization API", version="1.2.0", lifespan=lifespan)
|
||||||
|
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_credentials=True,
|
||||||
|
allow_methods=["*"], allow_headers=["*"])
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/")
|
||||||
|
async def root():
|
||||||
|
return {"service": "parakeet-asr", "model": MODEL_NAME, "diarizer": DIARIZER_MODEL, "device": DEVICE,
|
||||||
|
"endpoints": {"transcribe": "/v1/audio/transcriptions",
|
||||||
|
"diarize": "/v1/audio/diarize",
|
||||||
|
"models": "/v1/models", "health": "/health"}}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/health")
|
||||||
|
async def health():
|
||||||
|
return {"status": "ready" if (transcriber._loaded and diarizer._loaded) else "loading",
|
||||||
|
"asr_loaded": transcriber._loaded,
|
||||||
|
"diarizer_loaded": diarizer._loaded,
|
||||||
|
"model": MODEL_NAME,
|
||||||
|
"diarizer_model": DIARIZER_MODEL,
|
||||||
|
"device": DEVICE}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/v1/models")
|
||||||
|
async def list_models():
|
||||||
|
return {"object": "list", "data": [
|
||||||
|
{"id": "parakeet-tdt-0.6b-v3", "object": "model", "owned_by": "nvidia", "kind": "stt"},
|
||||||
|
{"id": "whisper-1", "object": "model", "owned_by": "nvidia", "kind": "stt"},
|
||||||
|
{"id": DIARIZER_MODEL.split("/")[-1], "object": "model", "owned_by": "nvidia", "kind": "diarization"}]}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/transcriptions")
|
||||||
|
async def transcribe(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
model: Optional[str] = Form(default="parakeet-tdt-0.6b-v3"),
|
||||||
|
language: Optional[str] = Form(default=None),
|
||||||
|
response_format: Optional[str] = Form(default="json"),
|
||||||
|
temperature: Optional[float] = Form(default=0.0),
|
||||||
|
prompt: Optional[str] = Form(default=None),
|
||||||
|
):
|
||||||
|
if not transcriber._loaded:
|
||||||
|
raise HTTPException(status_code=503, detail="Model loading")
|
||||||
|
audio_bytes = await file.read()
|
||||||
|
if len(audio_bytes) == 0:
|
||||||
|
raise HTTPException(status_code=400, detail="Empty file")
|
||||||
|
|
||||||
|
max_size = int(os.getenv("MAX_UPLOAD_MB", "200")) * 1024 * 1024
|
||||||
|
if len(audio_bytes) > max_size:
|
||||||
|
raise HTTPException(status_code=413, detail=f"File too large")
|
||||||
|
|
||||||
|
want_timestamps = response_format == "verbose_json"
|
||||||
|
start_time = time.time()
|
||||||
|
try:
|
||||||
|
result = transcriber.transcribe(
|
||||||
|
audio_bytes, file.filename, language, timestamps=want_timestamps
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.exception("Transcription failed")
|
||||||
|
raise HTTPException(status_code=500, detail=f"Failed: {e}")
|
||||||
|
elapsed = time.time() - start_time
|
||||||
|
duration = result.get("duration", 0)
|
||||||
|
rtfx = duration / elapsed if elapsed > 0 else 0
|
||||||
|
logger.info(f"Done: {duration:.1f}s in {elapsed:.1f}s ({rtfx:.0f}x rt)")
|
||||||
|
|
||||||
|
if response_format == "text":
|
||||||
|
return JSONResponse(content=result["text"], media_type="text/plain")
|
||||||
|
if response_format == "verbose_json":
|
||||||
|
return {
|
||||||
|
"task": "transcribe",
|
||||||
|
"language": language or "en",
|
||||||
|
"duration": duration,
|
||||||
|
"text": result["text"],
|
||||||
|
"segments": result.get("segments", []),
|
||||||
|
"words": result.get("words", []),
|
||||||
|
}
|
||||||
|
return {"text": result["text"]}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/translations")
|
||||||
|
async def translate(file: UploadFile = File(...),
|
||||||
|
model: Optional[str] = Form(default="parakeet-tdt-0.6b-v3"),
|
||||||
|
language: Optional[str] = Form(default=None),
|
||||||
|
response_format: Optional[str] = Form(default="json")):
|
||||||
|
return await transcribe(file=file, model=model, language=language,
|
||||||
|
response_format=response_format)
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/diarize")
|
||||||
|
async def diarize(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
):
|
||||||
|
"""Speaker diarization via Sortformer.
|
||||||
|
|
||||||
|
Returns who-spoke-when as a list of turns. Does NOT transcribe — pair this
|
||||||
|
output with /v1/audio/transcriptions (verbose_json) and merge by timestamp
|
||||||
|
to produce a diarized transcript.
|
||||||
|
|
||||||
|
Response shape:
|
||||||
|
{
|
||||||
|
"segments": [{"start_s": 0.00, "end_s": 4.50, "speaker": "Speaker_0"}, ...],
|
||||||
|
"speakers_detected": ["Speaker_0", "Speaker_1"],
|
||||||
|
"duration": 90.5,
|
||||||
|
"model": "nvidia/diar_sortformer_4spk-v1",
|
||||||
|
"device": "cuda"
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
if not diarizer._loaded:
|
||||||
|
raise HTTPException(status_code=503, detail="Diarizer loading")
|
||||||
|
audio_bytes = await file.read()
|
||||||
|
if len(audio_bytes) == 0:
|
||||||
|
raise HTTPException(status_code=400, detail="Empty file")
|
||||||
|
|
||||||
|
max_size = int(os.getenv("MAX_UPLOAD_MB", "200")) * 1024 * 1024
|
||||||
|
if len(audio_bytes) > max_size:
|
||||||
|
raise HTTPException(status_code=413, detail="File too large")
|
||||||
|
|
||||||
|
start_time = time.time()
|
||||||
|
try:
|
||||||
|
result = diarizer.diarize(audio_bytes, file.filename or "audio.wav")
|
||||||
|
except Exception as e:
|
||||||
|
logger.exception("Diarization failed")
|
||||||
|
raise HTTPException(status_code=500, detail=f"Failed: {e}")
|
||||||
|
elapsed = time.time() - start_time
|
||||||
|
duration = result.get("duration", 0)
|
||||||
|
rtfx = duration / elapsed if elapsed > 0 else 0
|
||||||
|
logger.info(f"Diarized {duration:.1f}s in {elapsed:.1f}s ({rtfx:.0f}x rt), "
|
||||||
|
f"{len(result['speakers_detected'])} speakers, {len(result['segments'])} turns")
|
||||||
|
return result
|
||||||
@@ -0,0 +1,105 @@
|
|||||||
|
import os
|
||||||
|
import time
|
||||||
|
import logging
|
||||||
|
from contextlib import asynccontextmanager
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from fastapi import FastAPI, File, Form, UploadFile, HTTPException
|
||||||
|
from fastapi.responses import JSONResponse
|
||||||
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
|
||||||
|
from app.transcriber import transcriber, MODEL_NAME, DEVICE
|
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO,
|
||||||
|
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
|
||||||
|
logger = logging.getLogger("parakeet-api")
|
||||||
|
|
||||||
|
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app: FastAPI):
|
||||||
|
logger.info(f"Loading model {MODEL_NAME} on {DEVICE}")
|
||||||
|
transcriber.load_model()
|
||||||
|
logger.info("Model ready")
|
||||||
|
yield
|
||||||
|
|
||||||
|
|
||||||
|
app = FastAPI(title="Parakeet ASR API", version="1.1.0", lifespan=lifespan)
|
||||||
|
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_credentials=True,
|
||||||
|
allow_methods=["*"], allow_headers=["*"])
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/")
|
||||||
|
async def root():
|
||||||
|
return {"service": "parakeet-asr", "model": MODEL_NAME, "device": DEVICE,
|
||||||
|
"endpoints": {"transcribe": "/v1/audio/transcriptions",
|
||||||
|
"models": "/v1/models", "health": "/health"}}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/health")
|
||||||
|
async def health():
|
||||||
|
return {"status": "ready" if transcriber._loaded else "loading",
|
||||||
|
"model": MODEL_NAME, "device": DEVICE}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/v1/models")
|
||||||
|
async def list_models():
|
||||||
|
return {"object": "list", "data": [
|
||||||
|
{"id": "parakeet-tdt-0.6b-v3", "object": "model", "owned_by": "nvidia"},
|
||||||
|
{"id": "whisper-1", "object": "model", "owned_by": "nvidia"}]}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/transcriptions")
|
||||||
|
async def transcribe(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
model: Optional[str] = Form(default="parakeet-tdt-0.6b-v3"),
|
||||||
|
language: Optional[str] = Form(default=None),
|
||||||
|
response_format: Optional[str] = Form(default="json"),
|
||||||
|
temperature: Optional[float] = Form(default=0.0),
|
||||||
|
prompt: Optional[str] = Form(default=None),
|
||||||
|
):
|
||||||
|
if not transcriber._loaded:
|
||||||
|
raise HTTPException(status_code=503, detail="Model loading")
|
||||||
|
audio_bytes = await file.read()
|
||||||
|
if len(audio_bytes) == 0:
|
||||||
|
raise HTTPException(status_code=400, detail="Empty file")
|
||||||
|
|
||||||
|
max_size = int(os.getenv("MAX_UPLOAD_MB", "200")) * 1024 * 1024
|
||||||
|
if len(audio_bytes) > max_size:
|
||||||
|
raise HTTPException(status_code=413, detail=f"File too large")
|
||||||
|
|
||||||
|
want_timestamps = response_format == "verbose_json"
|
||||||
|
start_time = time.time()
|
||||||
|
try:
|
||||||
|
result = transcriber.transcribe(
|
||||||
|
audio_bytes, file.filename, language, timestamps=want_timestamps
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.exception("Transcription failed")
|
||||||
|
raise HTTPException(status_code=500, detail=f"Failed: {e}")
|
||||||
|
elapsed = time.time() - start_time
|
||||||
|
duration = result.get("duration", 0)
|
||||||
|
rtfx = duration / elapsed if elapsed > 0 else 0
|
||||||
|
logger.info(f"Done: {duration:.1f}s in {elapsed:.1f}s ({rtfx:.0f}x rt)")
|
||||||
|
|
||||||
|
if response_format == "text":
|
||||||
|
return JSONResponse(content=result["text"], media_type="text/plain")
|
||||||
|
if response_format == "verbose_json":
|
||||||
|
return {
|
||||||
|
"task": "transcribe",
|
||||||
|
"language": language or "en",
|
||||||
|
"duration": duration,
|
||||||
|
"text": result["text"],
|
||||||
|
"segments": result.get("segments", []),
|
||||||
|
"words": result.get("words", []),
|
||||||
|
}
|
||||||
|
return {"text": result["text"]}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/translations")
|
||||||
|
async def translate(file: UploadFile = File(...),
|
||||||
|
model: Optional[str] = Form(default="parakeet-tdt-0.6b-v3"),
|
||||||
|
language: Optional[str] = Form(default=None),
|
||||||
|
response_format: Optional[str] = Form(default="json")):
|
||||||
|
return await transcribe(file=file, model=model, language=language,
|
||||||
|
response_format=response_format)
|
||||||
@@ -9,6 +9,7 @@ dependencies = [
|
|||||||
"pydantic>=2.9",
|
"pydantic>=2.9",
|
||||||
"pyyaml>=6.0",
|
"pyyaml>=6.0",
|
||||||
"httpx>=0.27",
|
"httpx>=0.27",
|
||||||
|
"python-multipart>=0.0.9",
|
||||||
]
|
]
|
||||||
|
|
||||||
[build-system]
|
[build-system]
|
||||||
|
|||||||
@@ -0,0 +1,51 @@
|
|||||||
|
# WhisperX ASR + diarization container for Spark 2 (Blackwell GB10, sm_120).
|
||||||
|
#
|
||||||
|
# Replaces the custom Parakeet wrapper + Sortformer overlay with a single
|
||||||
|
# mainline pipeline: faster-whisper for transcription + pyannote.audio 3.1
|
||||||
|
# for diarization + wav2vec2 forced alignment for word-level timestamps.
|
||||||
|
#
|
||||||
|
# Build (on Spark 2, where Blackwell + nvcr.io credentials are available):
|
||||||
|
# docker build -t whisperx-asr:latest .
|
||||||
|
#
|
||||||
|
# Run:
|
||||||
|
# docker run -d --restart unless-stopped --name whisperx-asr \
|
||||||
|
# --gpus all --memory=40g \
|
||||||
|
# -p 8002:8002 \
|
||||||
|
# -v whisperx-models:/root/.cache/huggingface \
|
||||||
|
# -e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
|
||||||
|
# -e WHISPER_MODEL=medium \
|
||||||
|
# whisperx-asr:latest
|
||||||
|
#
|
||||||
|
# The memory cap is intentional: even if WhisperX hits a pathological input,
|
||||||
|
# it gets OOM-killed cleanly instead of swap-thrashing the whole Spark.
|
||||||
|
|
||||||
|
FROM nvcr.io/nvidia/pytorch:25.11-py3
|
||||||
|
|
||||||
|
# WhisperX runs ffmpeg under the hood for audio decoding
|
||||||
|
RUN apt-get update \
|
||||||
|
&& apt-get install -y --no-install-recommends ffmpeg \
|
||||||
|
&& rm -rf /var/lib/apt/lists/*
|
||||||
|
|
||||||
|
# Install whisperx + the FastAPI wrapper deps. --break-system-packages because
|
||||||
|
# the NGC PyTorch image has its own managed Python that's flagged "system".
|
||||||
|
COPY requirements.txt /tmp/requirements.txt
|
||||||
|
RUN pip install --break-system-packages --no-cache-dir -r /tmp/requirements.txt
|
||||||
|
|
||||||
|
# Pre-warm the default Whisper + alignment models at build time so first-call
|
||||||
|
# latency on a fresh container is small. (~3 GB cached into the image; if you
|
||||||
|
# want a smaller image, comment this out and accept the first-call download.)
|
||||||
|
ARG WHISPER_MODEL=medium
|
||||||
|
ENV WHISPER_MODEL=${WHISPER_MODEL}
|
||||||
|
RUN python3 -c "import whisperx; whisperx.load_model('${WHISPER_MODEL}', 'cpu', compute_type='int8')" \
|
||||||
|
&& python3 -c "import whisperx; whisperx.load_align_model(language_code='en', device='cpu')"
|
||||||
|
|
||||||
|
WORKDIR /opt/whisperx
|
||||||
|
COPY app /opt/whisperx/app
|
||||||
|
|
||||||
|
# Expose for spark-control's proxy on Spark 2
|
||||||
|
EXPOSE 8002
|
||||||
|
|
||||||
|
HEALTHCHECK --interval=30s --timeout=10s --start-period=180s \
|
||||||
|
CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8002/health')" || exit 1
|
||||||
|
|
||||||
|
CMD ["python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8002", "--workers", "1"]
|
||||||
@@ -0,0 +1,74 @@
|
|||||||
|
# WhisperX container for Spark 2
|
||||||
|
|
||||||
|
Replaces the custom Parakeet wrapper + Sortformer overlay (v0.10/v0.11) with a
|
||||||
|
single mainline pipeline:
|
||||||
|
|
||||||
|
- **faster-whisper** (CTranslate2-optimized) for STT
|
||||||
|
- **pyannote.audio 3.1** for speaker diarization (sliding-window — handles
|
||||||
|
long files in bounded memory, fixes the Sortformer OOM on 90-min audio)
|
||||||
|
- **wav2vec2 forced alignment** for word-level timestamps
|
||||||
|
|
||||||
|
Exposes the same API surface spark-control already proxies to, so the cutover
|
||||||
|
is a one-URL change in the audio proxy:
|
||||||
|
|
||||||
|
- `GET /health` — readiness probe
|
||||||
|
- `GET /v1/models` — model list
|
||||||
|
- `POST /v1/audio/transcriptions` — OpenAI-shaped STT
|
||||||
|
- `POST /v1/audio/transcribe-with-speakers` — merged diarized transcript
|
||||||
|
(matches spark-control's response shape exactly)
|
||||||
|
|
||||||
|
## Deploy to Spark 2
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Copy this directory to Spark 2
|
||||||
|
rsync -av --delete image/whisperx_container/ modelo@192.168.1.87:~/whisperx-build/
|
||||||
|
|
||||||
|
# 2. SSH in and build
|
||||||
|
ssh modelo@192.168.1.87
|
||||||
|
cd ~/whisperx-build
|
||||||
|
docker build -t whisperx-asr:latest .
|
||||||
|
|
||||||
|
# 3. Run alongside the existing parakeet-asr (which stays on 8000 for now)
|
||||||
|
docker run -d --restart unless-stopped --name whisperx-asr \
|
||||||
|
--gpus all --memory=40g \
|
||||||
|
-p 8002:8002 \
|
||||||
|
-v whisperx-models:/root/.cache/huggingface \
|
||||||
|
-e HF_TOKEN="$(cat ~/.cache/huggingface/token)" \
|
||||||
|
-e WHISPER_MODEL=medium \
|
||||||
|
whisperx-asr:latest
|
||||||
|
|
||||||
|
# 4. Watch first-start logs (model load + first health check)
|
||||||
|
docker logs -f whisperx-asr
|
||||||
|
```
|
||||||
|
|
||||||
|
## Model size knobs
|
||||||
|
|
||||||
|
`WHISPER_MODEL` env var. Defaults to `medium`. Options:
|
||||||
|
|
||||||
|
| Model | Size | Speed (GB10) | Quality |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `tiny` | ~75M | ~120x rt | low |
|
||||||
|
| `base` | ~74M | ~80x rt | ok |
|
||||||
|
| `small` | ~244M | ~50x rt | good |
|
||||||
|
| `medium`| ~769M | ~30x rt | excellent (**default**) |
|
||||||
|
| `large-v3`| ~1.5B | ~15x rt | best |
|
||||||
|
|
||||||
|
For a 90-min file, medium takes ~3 min STT + ~9 min diarize ≈ ~12 min total.
|
||||||
|
|
||||||
|
## Memory budget
|
||||||
|
|
||||||
|
The `--memory=40g` cap is intentional. Spark 2 has 122 GB unified, of which
|
||||||
|
~35 GB is consumed by parakeet-asr + magpie-tts. The 40 GB cap leaves
|
||||||
|
comfortable headroom for both the model weights (~5 GB) and pyannote's
|
||||||
|
in-memory features (~5–15 GB for a 90-min audio). If WhisperX hits a
|
||||||
|
pathological input it gets OOM-killed cleanly instead of swap-thrashing the
|
||||||
|
whole Spark — the symptom we hit with the unbounded Sortformer container.
|
||||||
|
|
||||||
|
## Rollback to Parakeet+Sortformer
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker stop whisperx-asr && docker rm whisperx-asr
|
||||||
|
```
|
||||||
|
|
||||||
|
The parakeet-asr container stays running throughout — spark-control's proxy
|
||||||
|
URL switch is reversible via config or version downgrade.
|
||||||
@@ -0,0 +1,355 @@
|
|||||||
|
"""WhisperX FastAPI wrapper — STT + speaker diarization in a single endpoint.
|
||||||
|
|
||||||
|
Endpoints (designed to be drop-in compatible with the existing spark-control
|
||||||
|
audio API surface, so the proxy just changes its upstream URL):
|
||||||
|
|
||||||
|
GET / — service info
|
||||||
|
GET /health — readiness probe
|
||||||
|
GET /v1/models — list loaded models
|
||||||
|
POST /v1/audio/transcriptions — OpenAI-shaped STT (no speakers)
|
||||||
|
POST /v1/audio/transcribe-with-speakers — merged diarized transcript
|
||||||
|
|
||||||
|
The /transcribe-with-speakers response shape EXACTLY matches what
|
||||||
|
spark-control's /api/audio/transcribe-with-speakers returns today (the one
|
||||||
|
that recap-relay's PR spec was written against), so swapping the upstream
|
||||||
|
from Parakeet+Sortformer to WhisperX is a one-URL change in the proxy.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import os
|
||||||
|
import time
|
||||||
|
import tempfile
|
||||||
|
import logging
|
||||||
|
from contextlib import asynccontextmanager
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import torch
|
||||||
|
import whisperx
|
||||||
|
from fastapi import FastAPI, File, Form, UploadFile, HTTPException
|
||||||
|
from fastapi.responses import JSONResponse
|
||||||
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger("whisperx-api")
|
||||||
|
|
||||||
|
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
|
||||||
|
COMPUTE_TYPE = os.getenv("COMPUTE_TYPE", "float16" if DEVICE == "cuda" else "int8")
|
||||||
|
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "medium")
|
||||||
|
DEFAULT_LANG = os.getenv("DEFAULT_LANGUAGE", "en")
|
||||||
|
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "16"))
|
||||||
|
HF_TOKEN = os.getenv("HF_TOKEN") or None
|
||||||
|
|
||||||
|
|
||||||
|
class WhisperXEngine:
|
||||||
|
def __init__(self) -> None:
|
||||||
|
self.transcribe_model = None
|
||||||
|
self.align_model = None
|
||||||
|
self.align_metadata = None
|
||||||
|
self.diarize_model = None
|
||||||
|
self._loaded = False
|
||||||
|
|
||||||
|
def load(self) -> None:
|
||||||
|
if self._loaded:
|
||||||
|
return
|
||||||
|
logger.info(f"Loading whisper-{WHISPER_MODEL} on {DEVICE} ({COMPUTE_TYPE})")
|
||||||
|
self.transcribe_model = whisperx.load_model(
|
||||||
|
WHISPER_MODEL, DEVICE, compute_type=COMPUTE_TYPE
|
||||||
|
)
|
||||||
|
logger.info(f"Loading alignment model for {DEFAULT_LANG}")
|
||||||
|
self.align_model, self.align_metadata = whisperx.load_align_model(
|
||||||
|
language_code=DEFAULT_LANG, device=DEVICE
|
||||||
|
)
|
||||||
|
if HF_TOKEN:
|
||||||
|
logger.info("Loading pyannote diarization pipeline (3.1)")
|
||||||
|
try:
|
||||||
|
self.diarize_model = whisperx.DiarizationPipeline(
|
||||||
|
use_auth_token=HF_TOKEN, device=DEVICE
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.exception(f"Diarization pipeline failed to load: {e}")
|
||||||
|
self.diarize_model = None
|
||||||
|
else:
|
||||||
|
logger.warning(
|
||||||
|
"HF_TOKEN not set — diarization disabled. /transcribe-with-speakers "
|
||||||
|
"will return 503. /transcriptions still works."
|
||||||
|
)
|
||||||
|
self._loaded = True
|
||||||
|
logger.info("WhisperX engine ready")
|
||||||
|
|
||||||
|
def transcribe(self, audio_bytes: bytes, filename: str, want_timestamps: bool = True) -> dict:
|
||||||
|
if not self._loaded:
|
||||||
|
self.load()
|
||||||
|
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
|
||||||
|
tmp.write(audio_bytes)
|
||||||
|
tmp_path = tmp.name
|
||||||
|
try:
|
||||||
|
audio = whisperx.load_audio(tmp_path)
|
||||||
|
duration = float(audio.shape[0]) / 16000.0
|
||||||
|
result = self.transcribe_model.transcribe(
|
||||||
|
audio, batch_size=BATCH_SIZE, language=DEFAULT_LANG
|
||||||
|
)
|
||||||
|
language = result.get("language") or DEFAULT_LANG
|
||||||
|
if want_timestamps:
|
||||||
|
aligned = whisperx.align(
|
||||||
|
result["segments"],
|
||||||
|
self.align_model,
|
||||||
|
self.align_metadata,
|
||||||
|
audio,
|
||||||
|
DEVICE,
|
||||||
|
return_char_alignments=False,
|
||||||
|
)
|
||||||
|
segments = aligned.get("segments", [])
|
||||||
|
else:
|
||||||
|
segments = result.get("segments", [])
|
||||||
|
full_text = " ".join(s.get("text", "").strip() for s in segments).strip()
|
||||||
|
return {
|
||||||
|
"duration": duration,
|
||||||
|
"language": language,
|
||||||
|
"text": full_text,
|
||||||
|
"segments": segments,
|
||||||
|
"audio_path": tmp_path,
|
||||||
|
"audio": audio, # caller can reuse for diarization without re-loading
|
||||||
|
}
|
||||||
|
finally:
|
||||||
|
# NOTE: caller is responsible for unlinking the temp file. We expose it
|
||||||
|
# in the return dict so diarization can run on the same audio without
|
||||||
|
# disk re-IO. The unlink happens in the request handler's finally.
|
||||||
|
pass
|
||||||
|
|
||||||
|
def diarize(self, audio) -> dict:
|
||||||
|
if self.diarize_model is None:
|
||||||
|
raise RuntimeError(
|
||||||
|
"Diarization pipeline not loaded (HF_TOKEN missing or load failed)"
|
||||||
|
)
|
||||||
|
diar = self.diarize_model(audio)
|
||||||
|
return diar
|
||||||
|
|
||||||
|
|
||||||
|
engine = WhisperXEngine()
|
||||||
|
|
||||||
|
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app: FastAPI):
|
||||||
|
engine.load()
|
||||||
|
yield
|
||||||
|
|
||||||
|
|
||||||
|
app = FastAPI(
|
||||||
|
title="WhisperX ASR + Diarization",
|
||||||
|
version="1.0.0",
|
||||||
|
lifespan=lifespan,
|
||||||
|
)
|
||||||
|
app.add_middleware(
|
||||||
|
CORSMiddleware,
|
||||||
|
allow_origins=["*"],
|
||||||
|
allow_credentials=True,
|
||||||
|
allow_methods=["*"],
|
||||||
|
allow_headers=["*"],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/")
|
||||||
|
async def root() -> dict:
|
||||||
|
return {
|
||||||
|
"service": "whisperx",
|
||||||
|
"device": DEVICE,
|
||||||
|
"models": {
|
||||||
|
"transcription": f"whisper-{WHISPER_MODEL}",
|
||||||
|
"alignment": f"wav2vec2-{DEFAULT_LANG}",
|
||||||
|
"diarization": "pyannote-speaker-diarization-3.1" if engine.diarize_model else None,
|
||||||
|
},
|
||||||
|
"endpoints": {
|
||||||
|
"transcriptions": "/v1/audio/transcriptions",
|
||||||
|
"transcribe_with_speakers": "/v1/audio/transcribe-with-speakers",
|
||||||
|
"models": "/v1/models",
|
||||||
|
"health": "/health",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/health")
|
||||||
|
async def health() -> dict:
|
||||||
|
return {
|
||||||
|
"status": "ready" if engine._loaded else "loading",
|
||||||
|
"transcribe_loaded": engine.transcribe_model is not None,
|
||||||
|
"align_loaded": engine.align_model is not None,
|
||||||
|
"diarizer_loaded": engine.diarize_model is not None,
|
||||||
|
"model": f"whisper-{WHISPER_MODEL}",
|
||||||
|
"device": DEVICE,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/v1/models")
|
||||||
|
async def list_models() -> dict:
|
||||||
|
data = [
|
||||||
|
{"id": f"whisper-{WHISPER_MODEL}", "object": "model", "owned_by": "openai", "kind": "stt"},
|
||||||
|
]
|
||||||
|
if engine.diarize_model is not None:
|
||||||
|
data.append(
|
||||||
|
{"id": "pyannote-speaker-diarization-3.1", "object": "model",
|
||||||
|
"owned_by": "pyannote", "kind": "diarization"}
|
||||||
|
)
|
||||||
|
return {"object": "list", "data": data}
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_speaker(label: str) -> str:
|
||||||
|
"""WhisperX/pyannote uses 'SPEAKER_00' / 'SPEAKER_01' / ... — normalize to
|
||||||
|
the same 'Speaker_0' shape spark-control's existing endpoint returns."""
|
||||||
|
if not label:
|
||||||
|
return "Speaker_unknown"
|
||||||
|
if label.upper().startswith("SPEAKER_"):
|
||||||
|
idx = label.split("_", 1)[1].lstrip("0") or "0"
|
||||||
|
return f"Speaker_{idx}"
|
||||||
|
return label
|
||||||
|
|
||||||
|
|
||||||
|
def _segments_to_blocks(segments: list[dict]) -> list[dict]:
|
||||||
|
"""Convert WhisperX's per-utterance segments into the
|
||||||
|
[{start_ms, end_ms, speaker, text}, ...] block shape spark-control returns
|
||||||
|
today. Groups consecutive same-speaker segments into one block."""
|
||||||
|
blocks: list[dict] = []
|
||||||
|
cur = None
|
||||||
|
for s in segments:
|
||||||
|
spk_raw = s.get("speaker") or "Speaker_unknown"
|
||||||
|
spk = _normalize_speaker(spk_raw)
|
||||||
|
text = (s.get("text") or "").strip()
|
||||||
|
start_ms = int(float(s.get("start", 0)) * 1000)
|
||||||
|
end_ms = int(float(s.get("end", 0)) * 1000)
|
||||||
|
if not text:
|
||||||
|
continue
|
||||||
|
if cur is None or cur["speaker"] != spk or start_ms - cur["end_ms"] > 1500:
|
||||||
|
if cur is not None:
|
||||||
|
blocks.append(cur)
|
||||||
|
cur = {"start_ms": start_ms, "end_ms": end_ms, "speaker": spk, "text": text}
|
||||||
|
else:
|
||||||
|
cur["text"] = (cur["text"] + " " + text).strip()
|
||||||
|
cur["end_ms"] = end_ms
|
||||||
|
if cur is not None:
|
||||||
|
blocks.append(cur)
|
||||||
|
return blocks
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/transcriptions")
|
||||||
|
async def transcribe(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
model: Optional[str] = Form(default=None),
|
||||||
|
language: Optional[str] = Form(default=None),
|
||||||
|
response_format: Optional[str] = Form(default="json"),
|
||||||
|
temperature: Optional[float] = Form(default=None),
|
||||||
|
prompt: Optional[str] = Form(default=None),
|
||||||
|
):
|
||||||
|
if not engine._loaded:
|
||||||
|
raise HTTPException(status_code=503, detail="Engine loading")
|
||||||
|
audio_bytes = await file.read()
|
||||||
|
if not audio_bytes:
|
||||||
|
raise HTTPException(status_code=400, detail="Empty file")
|
||||||
|
|
||||||
|
start_t = time.time()
|
||||||
|
audio_path = None
|
||||||
|
try:
|
||||||
|
result = engine.transcribe(
|
||||||
|
audio_bytes,
|
||||||
|
file.filename or "audio.wav",
|
||||||
|
want_timestamps=(response_format == "verbose_json"),
|
||||||
|
)
|
||||||
|
audio_path = result.pop("audio_path", None)
|
||||||
|
result.pop("audio", None)
|
||||||
|
except Exception as e:
|
||||||
|
logger.exception("Transcription failed")
|
||||||
|
raise HTTPException(status_code=500, detail=f"Failed: {e}")
|
||||||
|
finally:
|
||||||
|
if audio_path:
|
||||||
|
try: os.unlink(audio_path)
|
||||||
|
except OSError: pass
|
||||||
|
|
||||||
|
elapsed = time.time() - start_t
|
||||||
|
duration = result.get("duration", 0.0)
|
||||||
|
logger.info(f"Transcribed {duration:.1f}s in {elapsed:.1f}s ({duration/elapsed:.0f}x rt)")
|
||||||
|
|
||||||
|
if response_format == "text":
|
||||||
|
return JSONResponse(content=result["text"], media_type="text/plain")
|
||||||
|
if response_format == "verbose_json":
|
||||||
|
words = []
|
||||||
|
for s in result.get("segments", []):
|
||||||
|
for w in s.get("words", []) or []:
|
||||||
|
words.append({
|
||||||
|
"word": w.get("word"),
|
||||||
|
"start": w.get("start"),
|
||||||
|
"end": w.get("end"),
|
||||||
|
"score": w.get("score"),
|
||||||
|
})
|
||||||
|
return {
|
||||||
|
"task": "transcribe",
|
||||||
|
"language": result.get("language", "en"),
|
||||||
|
"duration": duration,
|
||||||
|
"text": result["text"],
|
||||||
|
"segments": [
|
||||||
|
{"start": s.get("start"), "end": s.get("end"), "text": s.get("text", "").strip()}
|
||||||
|
for s in result.get("segments", [])
|
||||||
|
],
|
||||||
|
"words": words,
|
||||||
|
}
|
||||||
|
return {"text": result["text"]}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/v1/audio/transcribe-with-speakers")
|
||||||
|
async def transcribe_with_speakers(file: UploadFile = File(...)) -> dict:
|
||||||
|
"""Merged STT + diarization. Response shape matches spark-control's
|
||||||
|
/api/audio/transcribe-with-speakers exactly — recap-relay's PR spec
|
||||||
|
needs no changes when we cut over."""
|
||||||
|
if not engine._loaded:
|
||||||
|
raise HTTPException(status_code=503, detail="Engine loading")
|
||||||
|
if engine.diarize_model is None:
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=503,
|
||||||
|
detail="Diarization unavailable — HF_TOKEN not set or pyannote failed to load",
|
||||||
|
)
|
||||||
|
audio_bytes = await file.read()
|
||||||
|
if not audio_bytes:
|
||||||
|
raise HTTPException(status_code=400, detail="Empty file")
|
||||||
|
|
||||||
|
start_t = time.time()
|
||||||
|
audio_path = None
|
||||||
|
try:
|
||||||
|
result = engine.transcribe(
|
||||||
|
audio_bytes, file.filename or "audio.wav", want_timestamps=True
|
||||||
|
)
|
||||||
|
audio_path = result.pop("audio_path", None)
|
||||||
|
audio = result.pop("audio")
|
||||||
|
# Diarize on the in-memory audio (no second decode)
|
||||||
|
logger.info("Running pyannote diarization…")
|
||||||
|
diar = engine.diarize(audio)
|
||||||
|
# whisperx.assign_word_speakers writes speaker labels into the
|
||||||
|
# aligned segments + their nested words
|
||||||
|
result_with_speakers = whisperx.assign_word_speakers(
|
||||||
|
diar, {"segments": result["segments"]}
|
||||||
|
)
|
||||||
|
segments_in = result_with_speakers.get("segments", [])
|
||||||
|
blocks = _segments_to_blocks(segments_in)
|
||||||
|
speakers = sorted({b["speaker"] for b in blocks if b["speaker"] != "Speaker_unknown"})
|
||||||
|
except Exception as e:
|
||||||
|
logger.exception("Diarized transcription failed")
|
||||||
|
raise HTTPException(status_code=500, detail=f"Failed: {e}")
|
||||||
|
finally:
|
||||||
|
if audio_path:
|
||||||
|
try: os.unlink(audio_path)
|
||||||
|
except OSError: pass
|
||||||
|
|
||||||
|
elapsed = time.time() - start_t
|
||||||
|
duration = result.get("duration", 0.0)
|
||||||
|
logger.info(
|
||||||
|
f"Transcribed+diarized {duration:.1f}s in {elapsed:.1f}s "
|
||||||
|
f"({duration/elapsed:.0f}x rt), {len(speakers)} speakers, {len(blocks)} blocks"
|
||||||
|
)
|
||||||
|
return {
|
||||||
|
"duration": duration,
|
||||||
|
"language": result.get("language", "en"),
|
||||||
|
"speakers_detected": speakers,
|
||||||
|
"segments": blocks,
|
||||||
|
"models": {
|
||||||
|
"transcription": f"whisper-{WHISPER_MODEL}",
|
||||||
|
"diarization": "pyannote-speaker-diarization-3.1",
|
||||||
|
},
|
||||||
|
}
|
||||||
@@ -0,0 +1,5 @@
|
|||||||
|
whisperx==3.4.3
|
||||||
|
fastapi>=0.115
|
||||||
|
uvicorn[standard]>=0.32
|
||||||
|
python-multipart>=0.0.9
|
||||||
|
soundfile>=0.12
|
||||||
+10
-2
@@ -9,7 +9,7 @@
|
|||||||
**Fix:**
|
**Fix:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-2-host> 'docker run --rm -v magpie-model-cache:/cache alpine chown -R 1000:1000 /cache && docker restart magpie-tts'
|
ssh modelo@<spark-2-host> 'docker run --rm -v magpie-model-cache:/cache alpine chown -R 1000:1000 /cache && docker restart magpie-tts'
|
||||||
```
|
```
|
||||||
|
|
||||||
The trick is the `docker run --rm alpine chown` — it runs as root inside the throwaway container, which is enough to chown the bind-mounted volume on the host, without needing `sudo` on the host itself. After the chown + restart, magpie downloaded its ~3 GB model from NGC into the cache and came up healthy on `:9000`.
|
The trick is the `docker run --rm alpine chown` — it runs as root inside the throwaway container, which is enough to chown the bind-mounted volume on the host, without needing `sudo` on the host itself. After the chown + restart, magpie downloaded its ~3 GB model from NGC into the cache and came up healthy on `:9000`.
|
||||||
@@ -20,9 +20,17 @@ The trick is the `docker run --rm alpine chown` — it runs as root inside the t
|
|||||||
|
|
||||||
This flag is Blackwell-specific. If vLLM in the container reports `unrecognized arguments: --moe_backend` or similar, edit `models.yaml` for `qwen36` and drop that flag. The swap UI does NOT auto-fallback in v0.1 — failure surfaces in the log stream.
|
This flag is Blackwell-specific. If vLLM in the container reports `unrecognized arguments: --moe_backend` or similar, edit `models.yaml` for `qwen36` and drop that flag. The swap UI does NOT auto-fallback in v0.1 — failure surfaces in the log stream.
|
||||||
|
|
||||||
|
## Qwen3.6 Mamba block-size assertion (fixed in v0.6.0:1)
|
||||||
|
|
||||||
|
Qwen3.6 uses a Mamba-attention hybrid that requires `--max-num-batched-tokens >= 2096`. vLLM's default is 2048, which trips `AssertionError: In Mamba cache align mode, block_size (2096) must be <= max_num_batched_tokens (2048)`. Fix: bake `--max-num-batched-tokens=16384` into the bundled qwen36 entry — matches the upstream qwen3.5-35b-a3b-fp8 recipe.
|
||||||
|
|
||||||
|
## Multimodal token budget for vision models (fixed in v0.8.0:1)
|
||||||
|
|
||||||
|
After the eugr/spark-vllm-docker update, vLLM became stricter about multimodal token budgets. Vision-capable models like Gemma 4 31B and Qwen3-VL crash at engine init with `ValueError: Chunked MM input disabled but max_tokens_per_mm_item (2496) is larger than max_num_batched_tokens (2048)`. Fix: bake `--max-num-batched-tokens=16384` into every model that has the `vision` capability. Now applied to qwen3-vl, gemma4, and qwen36 (which was already set for the Mamba issue).
|
||||||
|
|
||||||
## Two SSH paths to Spark 1 from the laptop
|
## Two SSH paths to Spark 1 from the laptop
|
||||||
|
|
||||||
`ssh <spark-user>@<spark-1-ip>` does NOT work from the laptop because the NVIDIA Sync ssh_config only has a Host entry for `<spark-1-host>.local`. Always use the `.local` hostname or `<spark-2-ip>`-style entries that ARE matched.
|
`ssh modelo@192.168.1.103` does NOT work from the laptop because the NVIDIA Sync ssh_config only has a Host entry for `spark-27ea.local`. Always use the `.local` hostname or `192.168.1.87`-style entries that ARE matched.
|
||||||
|
|
||||||
## Older models in `models.yaml`
|
## Older models in `models.yaml`
|
||||||
|
|
||||||
|
|||||||
+1
-1
@@ -1,6 +1,6 @@
|
|||||||
MIT License
|
MIT License
|
||||||
|
|
||||||
Copyright (c) 2026 Alice
|
Copyright (c) 2026 Grant
|
||||||
|
|
||||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
of this software and associated documentation files (the "Software"), to deal
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
|||||||
@@ -19,7 +19,7 @@ This package SSHes into your Spark server to run cluster commands, so it needs a
|
|||||||
```bash
|
```bash
|
||||||
echo "<paste-pubkey-here>" >> ~/.ssh/authorized_keys
|
echo "<paste-pubkey-here>" >> ~/.ssh/authorized_keys
|
||||||
```
|
```
|
||||||
3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username (usually `<spark-user>`).
|
3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username (usually `modelo`).
|
||||||
4. **Open the Web UI.** It will hit each Spark to confirm. If both indicators are green you're done.
|
4. **Open the Web UI.** It will hit each Spark to confirm. If both indicators are green you're done.
|
||||||
|
|
||||||
## Using Spark Control
|
## Using Spark Control
|
||||||
|
|||||||
@@ -19,7 +19,7 @@ This package SSHes into your Spark server to run cluster commands, so it needs a
|
|||||||
```bash
|
```bash
|
||||||
echo "<paste-pubkey-here>" >> ~/.ssh/authorized_keys
|
echo "<paste-pubkey-here>" >> ~/.ssh/authorized_keys
|
||||||
```
|
```
|
||||||
3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username (usually `<spark-user>`).
|
3. **Open Actions → Configure Sparks.** Enter the LAN hostnames or IPs for Spark 1 and Spark 2, plus the SSH username (usually `modelo`).
|
||||||
4. **Open the Web UI.** It will hit each Spark to confirm. If both indicators are green you're done.
|
4. **Open the Web UI.** It will hit each Spark to confirm. If both indicators are green you're done.
|
||||||
|
|
||||||
## Using Spark Control
|
## Using Spark Control
|
||||||
|
|||||||
@@ -76,6 +76,24 @@ const inputSpec = InputSpec.of({
|
|||||||
placeholder: 'magpie-tts',
|
placeholder: 'magpie-tts',
|
||||||
masked: false,
|
masked: false,
|
||||||
}),
|
}),
|
||||||
|
open_webui_url: Value.text({
|
||||||
|
name: 'Open WebUI URL (optional)',
|
||||||
|
description:
|
||||||
|
'If you also run Open WebUI on your LAN, paste its URL here. Spark Control will then show a one-click "Open chat" button next to the current model so you can jump straight to it.',
|
||||||
|
required: false,
|
||||||
|
default: null,
|
||||||
|
placeholder: 'e.g. https://open-webui.yourserver.local',
|
||||||
|
masked: false,
|
||||||
|
}),
|
||||||
|
ngc_api_key: Value.text({
|
||||||
|
name: 'NGC API key (optional)',
|
||||||
|
description:
|
||||||
|
'NVIDIA NGC personal API key — needed to install NIM containers (Parakeet, Magpie, etc.) from nvcr.io. Get one free at https://ngc.nvidia.com/setup/personal-key. Stored only on this Start9 server; passed to docker as the NGC_API_KEY env var when installing NIM services.',
|
||||||
|
required: false,
|
||||||
|
default: null,
|
||||||
|
placeholder: 'starts with "nvapi-..."',
|
||||||
|
masked: true,
|
||||||
|
}),
|
||||||
})
|
})
|
||||||
|
|
||||||
export const configureSparks = sdk.Action.withInput(
|
export const configureSparks = sdk.Action.withInput(
|
||||||
|
|||||||
@@ -14,6 +14,10 @@ export const sparkConfigSchema = z.object({
|
|||||||
magpie_host: z.string().catch(''),
|
magpie_host: z.string().catch(''),
|
||||||
magpie_user: z.string().catch(''),
|
magpie_user: z.string().catch(''),
|
||||||
magpie_container: z.string().catch(''),
|
magpie_container: z.string().catch(''),
|
||||||
|
// Optional Open WebUI deep-link
|
||||||
|
open_webui_url: z.string().catch(''),
|
||||||
|
// Optional NGC API key for pulling NIM containers from nvcr.io/nim/...
|
||||||
|
ngc_api_key: z.string().catch(''),
|
||||||
})
|
})
|
||||||
|
|
||||||
export type SparkConfig = z.infer<typeof sparkConfigSchema>
|
export type SparkConfig = z.infer<typeof sparkConfigSchema>
|
||||||
|
|||||||
@@ -19,6 +19,8 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
magpie_host: '',
|
magpie_host: '',
|
||||||
magpie_user: '',
|
magpie_user: '',
|
||||||
magpie_container: '',
|
magpie_container: '',
|
||||||
|
open_webui_url: '',
|
||||||
|
ngc_api_key: '',
|
||||||
}
|
}
|
||||||
|
|
||||||
return sdk.Daemons.of(effects).addDaemon('primary', {
|
return sdk.Daemons.of(effects).addDaemon('primary', {
|
||||||
@@ -47,6 +49,10 @@ export const main = sdk.setupMain(async ({ effects }) => {
|
|||||||
MAGPIE_USER: cfg.magpie_user,
|
MAGPIE_USER: cfg.magpie_user,
|
||||||
MAGPIE_CONTAINER: cfg.magpie_container,
|
MAGPIE_CONTAINER: cfg.magpie_container,
|
||||||
MODELS_OVERRIDES: '/data/models-overrides.yaml',
|
MODELS_OVERRIDES: '/data/models-overrides.yaml',
|
||||||
|
SERVICES_OVERRIDES: '/data/services-overrides.yaml',
|
||||||
|
CONNECTIVITY_LOG: '/data/connectivity.json',
|
||||||
|
OPEN_WEBUI_URL: cfg.open_webui_url,
|
||||||
|
NGC_API_KEY: cfg.ngc_api_key,
|
||||||
BIND_PORT: String(uiPort),
|
BIND_PORT: String(uiPort),
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
|
||||||
|
|
||||||
export const v0_1_0 = VersionInfo.of({
|
export const v0_1_0 = VersionInfo.of({
|
||||||
version: '0.2.3:0',
|
version: '0.12.0:1',
|
||||||
releaseNotes: {
|
releaseNotes: {
|
||||||
en_US:
|
en_US:
|
||||||
'Per-model Advanced settings + downloaded-model catalog flow. Each card now has an Advanced button: max context tokens, GPU memory %, and optimization toggles (fastsafetensors, prefix caching, FP8 KV cache). After a download finishes, a dialog appears to add the model to the catalog with those same knobs as launch defaults. Custom models can be deleted. Overrides persist in /data/models-overrides.yaml and survive package updates.',
|
'v0.12.0:1 — hotfix: 0.12.0:0\'s install action used shlex.quote() on the remote build path, which wraps `~/whisperx-build/...` in single quotes — the remote shell then doesn\'t expand the tilde and treats it as a literal directory named `~`. Result: "bash: line 1: ~/whisperx-build/Dockerfile: No such file or directory" on the very first file copy. Same bug pattern we hit before with $HOME in the disk probe. Rewrote to embed $HOME in double-quoted remote shell strings; hardcoded file names (Dockerfile, requirements.txt, README.md, app/main.py) embed unquoted inside that scope. All other 0.12.0 behavior is unchanged.',
|
||||||
},
|
},
|
||||||
migrations: {
|
migrations: {
|
||||||
up: async ({ effects }) => {},
|
up: async ({ effects }) => {},
|
||||||
|
|||||||
+8
-8
@@ -37,7 +37,7 @@ These take effect on the **next swap to that model**. If a swap fails after this
|
|||||||
## Adding a new model
|
## Adding a new model
|
||||||
|
|
||||||
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
|
1. Add an entry to `image/models.yaml`. Required fields: `display_name`, `repo`, `size_gb`, `mode` (`solo` or `cluster`), `vllm_args`. Optional but recommended: `description` (one paragraph — what the model is, what it's good for, how it differs from others; renders below the meta tags in each card), `capabilities` (tags like `[vision, reasoning, tools]`), `expected_ready_seconds`.
|
||||||
2. Confirm the weights are on the Spark: `ssh <spark-user>@<spark-1-host>.local 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
|
2. Confirm the weights are on the Spark: `ssh modelo@spark-27ea.local 'ls ~/.cache/huggingface/hub/'`. If not, download with `./hf-download.sh <repo>` on Spark 1.
|
||||||
3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
|
3. Rebuild + redeploy the package: `cd package && make x86 && make install`.
|
||||||
|
|
||||||
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
|
If `description` is omitted, the card simply hides that section — no need to populate it for every model. Keep descriptions generic (not user-specific) so the catalog stays portable.
|
||||||
@@ -47,7 +47,7 @@ If `description` is omitted, the card simply hides that section — no need to p
|
|||||||
If the UI is unavailable and you need to swap by hand:
|
If the UI is unavailable and you need to swap by hand:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-1-host>.local
|
ssh modelo@spark-27ea.local
|
||||||
cd ~/spark-vllm-docker
|
cd ~/spark-vllm-docker
|
||||||
./launch-cluster.sh stop
|
./launch-cluster.sh stop
|
||||||
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
|
./launch-cluster.sh --solo -d exec vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
|
||||||
@@ -61,19 +61,19 @@ docker logs -f vllm_node # wait for "Application startup complete."
|
|||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Is vLLM serving?
|
# Is vLLM serving?
|
||||||
curl -s http://<spark-1-ip>:8888/v1/models | jq .
|
curl -s http://192.168.1.103:8888/v1/models | jq .
|
||||||
|
|
||||||
# Cluster status (containers up?)
|
# Cluster status (containers up?)
|
||||||
ssh <spark-user>@<spark-1-host>.local 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
|
ssh modelo@spark-27ea.local 'cd ~/spark-vllm-docker && ./launch-cluster.sh status'
|
||||||
|
|
||||||
# Tail current model's logs
|
# Tail current model's logs
|
||||||
ssh <spark-user>@<spark-1-host>.local 'docker logs --tail 200 -f vllm_node'
|
ssh modelo@spark-27ea.local 'docker logs --tail 200 -f vllm_node'
|
||||||
|
|
||||||
# Parakeet
|
# Parakeet
|
||||||
curl -s http://<spark-2-ip>:8000/health
|
curl -s http://192.168.1.87:8000/health
|
||||||
|
|
||||||
# Magpie (see known-issues.md)
|
# Magpie (see known-issues.md)
|
||||||
curl -s http://<spark-2-ip>:9000/v1/health/ready
|
curl -s http://192.168.1.87:9000/v1/health/ready
|
||||||
```
|
```
|
||||||
|
|
||||||
## Hard reset
|
## Hard reset
|
||||||
@@ -81,7 +81,7 @@ curl -s http://<spark-2-ip>:9000/v1/health/ready
|
|||||||
If launch-cluster.sh gets stuck:
|
If launch-cluster.sh gets stuck:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh <spark-user>@<spark-1-host>.local
|
ssh modelo@spark-27ea.local
|
||||||
cd ~/spark-vllm-docker
|
cd ~/spark-vllm-docker
|
||||||
./launch-cluster.sh stop
|
./launch-cluster.sh stop
|
||||||
docker ps -aq | xargs -r docker rm -f
|
docker ps -aq | xargs -r docker rm -f
|
||||||
|
|||||||
Reference in New Issue
Block a user