v0.13.0:4 - redaction gateway, embeddings proxy, expanded audio API
- Add redaction gateway (redaction_gateway.py, redaction/ scrub + tests)
- Add embeddings proxy and spark_embed service (Dockerfile + main.py)
- Expand audio_proxy with speaker-aware handling; deep_health/health/server updates
- Package: configureSparks action + sparkConfig model updates, manifest/main wiring
- Docs: AUDIO_API, EMBEDDINGS, REDACTION_GATEWAY; HANDOFF and runbook/known-issues refresh
@@ -0,0 +1,36 @@
# spark-embed — dense embeddings (bge-m3) + reranker (bge-reranker-v2-m3)
# Built FROM the NGC PyTorch image that is already proven to run on the DGX
# Spark's GB10 (sm_121) GPU — the same base behind our vLLM and Kokoro work.
#
# Why not HF Text Embeddings Inference (TEI)? As of 2026 TEI ships no arm64
# CUDA image (all *-cuda tags are amd64-only), so it won't run on the Spark.
# Building on NGC torch sidesteps that AND avoids torchaudio (the dependency
# that sank the WhisperX attempt). bge-m3 + the reranker are XLM-RoBERTa
# encoders — no flash-attn, no torchaudio, just SDPA attention on torch.
FROM nvcr.io/nvidia/pytorch:25.11-py3

WORKDIR /app

# Hard-pin the NGC torch version in a constraints file so pip CANNOT replace it
# while resolving sentence-transformers. NGC's torch carries a local version
# string (…nv25.11) not on PyPI; pinning it makes pip treat the already-installed
# build as satisfying the requirement instead of pulling a PyPI wheel that
# wouldn't have sm_121 kernels. (Same technique as the v0.12.0 torch-ABI work.)
# transformers is NOT preinstalled in this NGC base, so it installs fresh from
# PyPI; we cap it (<5) so a future major can't silently change loading behavior.
RUN python -c "import torch; \
    open('/tmp/constraints.txt','w').write('torch==%s\n' % torch.__version__)" \
    && cat /tmp/constraints.txt \
    && pip install --no-cache-dir -c /tmp/constraints.txt \
       "sentence-transformers>=3.0" "transformers<5" "fastapi>=0.115" "uvicorn[standard]>=0.30"

COPY main.py /app/main.py

# Persist HuggingFace model downloads (bge-m3 ~2.3GB + reranker ~2.3GB) on a
# mounted volume so container recreates don't re-download.
ENV HF_HOME=/data/hf
ENV DENSE_MODEL=BAAI/bge-m3
ENV RERANK_MODEL=BAAI/bge-reranker-v2-m3

EXPOSE 8088
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8088"]
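
The constraints-file step in the Dockerfile above can be sketched in plain Python. This is a minimal illustration and not part of the commit: the helper names `constraint_line`, `has_local_label`, and `installed_constraint` are hypothetical, and the version string in the example is a made-up placeholder, not the real NGC build string. It is meant to show why pinning the exact installed version, local label included, should leave pip with only the already-installed build as a match, assuming builds with a local label are not published on PyPI.

```python
from importlib import metadata


def constraint_line(package: str, version: str) -> str:
    """Build a pip constraints entry pinning `package` to exactly `version`."""
    return f"{package}=={version}\n"


def has_local_label(version: str) -> bool:
    """True if the version carries a PEP 440 local label (the part after '+')."""
    return "+" in version


def installed_constraint(package: str) -> str:
    """Pin whatever version of `package` is installed right now.

    The Dockerfile reads torch.__version__ instead; package metadata is used
    here only so the sketch runs without torch being installed.
    """
    return constraint_line(package, metadata.version(package))


# Placeholder version for illustration only; NOT the real NGC torch string.
example = "1.2.3+vendor.build"
line = constraint_line("torch", example)
```

Writing `line` to a file and passing that file to `pip install -c` corresponds to what the `RUN` step does with `/tmp/constraints.txt`. A check like `has_local_label` could be used to fail the build early if the base image ever ships a torch without a local label, since the pin would then no longer distinguish the NGC build from a PyPI wheel.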