v0.13.0:3 - proxy /v1/chat/completions through Spark Control to vLLM
Recap Relay dev caught that all audio endpoints route through Spark
Control but chat-completions didn't — clients had to know about both
SC AND the direct vLLM URL on Spark 1. Closes that last gap.
New endpoints:
POST /v1/chat/completions — OpenAI-shape, forwards to vLLM on Spark 1
POST /v1/completions — legacy OpenAI completions, same path
Implementation (image/app/llm_proxy.py):
- Dumb forwarder: request body passed through verbatim, response body
streamed back chunk-by-chunk. No transformation. vLLM already speaks
the same shape; adding any logic here would just create skew.
- Streaming: parses body for `stream: true` and uses httpx.AsyncClient
.stream() + FastAPI StreamingResponse if so. Non-streaming path is
a simple post-and-return.
- 30-minute timeout to accommodate large-context completions (default
httpx 5s would kill anything substantial).
- On upstream non-200 in streaming mode: emits one SSE `error` event
so the client's parser doesn't hang on an empty stream forever.
- On upstream connection error: HTTP 502 with "vllm unreachable" detail.
Now clients can use ONE host for everything:
POST https://spark-control/api/audio/diarize-chunk
POST https://spark-control/v1/audio/transcriptions
POST https://spark-control/v1/chat/completions
GET https://spark-control/api/endpoints (still works for clients that
prefer the direct URLs)
No parakeet container changes. No Reapply patches needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -16,6 +16,7 @@ from .audio_proxy import build_router as build_audio_router
|
||||
from .deep_health import DeepHealth
|
||||
from .disk import delete_from_disk, probe_disk
|
||||
from .download import DownloadManager
|
||||
from .llm_proxy import build_router as build_llm_router
|
||||
from .hardware import HardwareProbe
|
||||
from .health import check_magpie, check_parakeet, check_vllm
|
||||
from .models import load_catalog
|
||||
@@ -64,6 +65,12 @@ app.mount("/static", StaticFiles(directory=_STATIC_DIR), name="static")
|
||||
# when Parakeet returns 500, instead of waiting up to 5 min for the periodic probe.
|
||||
app.include_router(build_audio_router(settings, deep_health=deep_health))
|
||||
|
||||
# OpenAI-compatible LLM proxy: /v1/chat/completions, /v1/completions.
|
||||
# Forwards to whatever vLLM is currently running on Spark 1 (per the LLM swap
|
||||
# state). Supports SSE streaming when stream=true. Same trusted-host model
|
||||
# as the audio proxy — clients only need one URL for everything.
|
||||
app.include_router(build_llm_router(settings))
|
||||
|
||||
|
||||
@app.get("/", include_in_schema=False)
|
||||
async def index() -> FileResponse:
|
||||
|
||||
Reference in New Issue
Block a user