spark-control

Files

T

Keysat e775906caa v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints

Spark Control now exposes a per-chunk worker designed for Recap Relay
to orchestrate against. Recap Relay does the chunking + global speaker
clustering (consistent with how it already handles the Gemini path);
Spark Control handles the GPU-bound per-chunk work.

Parakeet container:
  - diarizer.py: now also loads NVIDIA TitaNet speaker-verification model
    (~25 MB, NeMo-native, no torchaudio). New diarize_chunk() method
    runs Sortformer + extracts one 192-dim voice fingerprint per detected
    local speaker (concatenating each speaker's audio across the chunk
    and running TitaNet's get_embedding).
  - main.py: new POST /v1/audio/diarize-chunk endpoint that returns
    segments + speakers_detected + fingerprints + models in one shot.

Spark Control:
  - new POST /api/audio/diarize-chunk that proxies to parakeet's new
    endpoint. Same CUDA-wedge recovery (503 + deep-health probe + 60s
    retry-after) as the other audio endpoints. Returns the raw JSON
    upstream because Recap Relay is the consumer; no merging needed.

Response shape Recap Relay receives per chunk:
  {
    "duration": 300.0,
    "segments":  [{"start_s","end_s","speaker"}, ...],   # LOCAL labels
    "speakers_detected": ["Speaker_0","Speaker_1",...],
    "fingerprints": {"Speaker_0":[192 floats], ...},
    "models": {"diarization":"...","embedding":"..."}
  }

Recap Relay's job:
  1. Chunk audio (existing chunking infrastructure)
  2. POST each chunk to /api/audio/diarize-chunk in parallel
  3. Collect all fingerprints from all chunks
  4. sklearn AgglomerativeClustering(distance_threshold=0.7, metric=cosine)
  5. Re-label segments with global cluster IDs
  6. Concatenate transcripts (from a separate parallel call to
     /v1/audio/transcriptions) with timestamp offsets and merge with
     re-labeled diar segments

After installing v0.13.0:1, click "Reapply patches" on the Speech Models
card to push the updated diarizer.py + main.py into the parakeet
container — TitaNet will download (~25 MB) on first call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-19 11:37:05 -05:00

app

v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints

2026-05-19 11:37:05 -05:00

parakeet_patches

v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints

2026-05-19 11:37:05 -05:00

Dockerfile

v0.13.0:0 - revert WhisperX migration; back to Parakeet + Sortformer

2026-05-19 08:03:19 -05:00

entrypoint.sh

Initial scaffold: image/ FastAPI app, models.yaml, docs

2026-05-12 09:29:13 -05:00

models.yaml

v0.8.0:3 - add --max-num-batched-tokens=16384 to vision models (gemma4, qwen3-vl)

2026-05-12 14:47:32 -05:00

pyproject.toml

v0.9.0:1 - hotfix: add python-multipart for /v1/audio/transcriptions

2026-05-17 17:01:52 -05:00