v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints
Spark Control now exposes a per-chunk worker designed for Recap Relay
to orchestrate against. Recap Relay does the chunking + global speaker
clustering (consistent with how it already handles the Gemini path);
Spark Control handles the GPU-bound per-chunk work.
Parakeet container:
- diarizer.py: now also loads the NVIDIA TitaNet speaker-verification
  model (~25 MB, NeMo-native, no torchaudio). The new diarize_chunk()
  method runs Sortformer, then extracts one 192-dim voice fingerprint
  per detected local speaker by concatenating that speaker's audio
  across the chunk and running TitaNet's get_embedding on it.
- main.py: new POST /v1/audio/diarize-chunk endpoint that returns
  segments, speakers_detected, fingerprints, and models in one response.
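The per-speaker concatenation step can be sketched roughly as follows. This is a minimal illustration, not the code in diarizer.py: the helper name `audio_per_speaker`, the 16 kHz sample rate, and the plain-list audio representation are all assumptions made for the example. Each resulting buffer would then be handed to the embedding model.

```python
from collections import defaultdict

SAMPLE_RATE = 16_000  # assumed sample rate; the commit does not state one


def audio_per_speaker(samples, segments, sample_rate=SAMPLE_RATE):
    """Concatenate each local speaker's samples across one chunk.

    samples:  mono audio for the whole chunk, as a sequence of floats
    segments: dicts with "start_s", "end_s", "speaker" (local labels)
    Hypothetical helper; not the actual diarize_chunk() implementation.
    """
    per_speaker = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s["start_s"]):
        # Convert seconds to sample indices and clamp to the chunk bounds.
        start = max(0, int(round(seg["start_s"] * sample_rate)))
        end = min(len(samples), int(round(seg["end_s"] * sample_rate)))
        if end > start:
            per_speaker[seg["speaker"]].extend(samples[start:end])
    return dict(per_speaker)
```

A speaker whose segments are all empty after clamping simply does not appear in the result, so a caller would need to decide whether to skip the fingerprint for that label.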
Spark Control:
- new POST /api/audio/diarize-chunk that proxies to parakeet's new
  endpoint. Same CUDA-wedge recovery (503 + deep-health probe + 60s
  retry-after) as the other audio endpoints. Passes the upstream JSON
  through unmodified because Recap Relay is the consumer; no merging
  needed.
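On the caller's side, the 503 recovery path might be handled along these lines. This sketch assumes the 503 carries a standard `Retry-After` header expressed in seconds; the commit only says "60s retry-after", so the header name and format are assumptions, and `retry_delay` is a hypothetical helper.

```python
DEFAULT_RETRY_AFTER_S = 60.0  # mirrors the 60s retry-after mentioned above


def retry_delay(status, headers):
    """Return seconds to wait before retrying, or None if no retry is needed.

    Assumes a wedged GPU is signalled with HTTP 503 and, optionally, a
    Retry-After header in seconds (an assumption, not confirmed here).
    """
    if status != 503:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        delay = float(lowered.get("retry-after"))
    except (TypeError, ValueError):
        # Header missing or not a number: fall back to the documented 60s.
        return DEFAULT_RETRY_AFTER_S
    return delay if delay >= 0 else DEFAULT_RETRY_AFTER_S
```

Because chunks are posted in parallel, a caller would probably want to pause the whole batch on the first 503 instead of letting every in-flight chunk retry independently.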
Response shape Recap Relay receives per chunk:
{
  "duration": 300.0,
  "segments": [{"start_s","end_s","speaker"}, ...],   # LOCAL labels
  "speakers_detected": ["Speaker_0","Speaker_1",...],
  "fingerprints": {"Speaker_0":[192 floats], ...},
  "models": {"diarization":"...","embedding":"..."}
}
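A consumer could sanity-check each response before clustering. The sketch below is illustrative only: `check_chunk_response` is a hypothetical helper, and the invariants it checks (every detected speaker has a 192-dim fingerprint, every segment references a detected speaker and lies within the chunk duration) are inferred from the shape above, not guarantees stated by the service.

```python
EMBEDDING_DIM = 192  # fingerprint size stated in this commit message


def check_chunk_response(resp):
    """Return a list of problems found in one diarize-chunk response."""
    problems = []
    speakers = set(resp.get("speakers_detected", []))
    fingerprints = resp.get("fingerprints", {})
    duration = resp.get("duration", float("inf"))
    for seg in resp.get("segments", []):
        if seg["speaker"] not in speakers:
            problems.append(f"segment uses unknown speaker {seg['speaker']!r}")
        if not 0.0 <= seg["start_s"] < seg["end_s"] <= duration:
            problems.append(f"segment has bad times: {seg}")
    for name in sorted(speakers):
        vec = fingerprints.get(name)
        if vec is None:
            problems.append(f"no fingerprint for {name!r}")
        elif len(vec) != EMBEDDING_DIM:
            problems.append(f"{name!r} fingerprint has {len(vec)} dims")
    return problems
```

An empty list means the response matches the assumed shape; anything else is worth logging before the fingerprints reach the clustering step.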
Recap Relay's job:
1. Chunk audio (existing chunking infrastructure)
2. POST each chunk to /api/audio/diarize-chunk in parallel
3. Collect all fingerprints from all chunks
4. sklearn AgglomerativeClustering(n_clusters=None,
   distance_threshold=0.7, metric="cosine") with a non-ward linkage
   (ward, the default, only supports euclidean)
5. Re-label segments with global cluster IDs
6. Concatenate transcripts (from a separate parallel call to
   /v1/audio/transcriptions) with timestamp offsets and merge with
   the re-labeled diarization segments
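Steps 3 to 5 can be sketched without dependencies as below. This is a stand-in for the sklearn call, not Recap Relay's code: it implements average-linkage agglomerative clustering on cosine distance, which should behave like `AgglomerativeClustering(n_clusters=None, distance_threshold=0.7, metric="cosine", linkage="average")`. The choice of average linkage, the `Global_N` label format, the function names, and the assumption of fixed-length, non-overlapping chunks are all illustrative.

```python
import math


def cosine_distance(a, b):
    # Assumes non-zero vectors; a zero fingerprint would divide by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_fingerprints(vectors, threshold=0.7):
    """Average-linkage agglomerative clustering; one label per vector."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d >= threshold:  # clusters at or above the threshold stay apart
            break
        absorbed = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(absorbed)
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def relabel(chunks, chunk_len_s, threshold=0.7):
    """Map local speaker labels to global ones and shift times by chunk offset.

    chunks: per-chunk responses in chunk order (hypothetical input layout).
    """
    keys, vectors = [], []
    for idx, chunk in enumerate(chunks):
        for name, vec in chunk["fingerprints"].items():
            keys.append((idx, name))
            vectors.append(vec)
    labels = cluster_fingerprints(vectors, threshold)
    global_id = {key: f"Global_{label}" for key, label in zip(keys, labels)}
    merged = []
    for idx, chunk in enumerate(chunks):
        offset = idx * chunk_len_s  # assumes equal-length chunks, no overlap
        for seg in chunk["segments"]:
            merged.append({
                "start_s": seg["start_s"] + offset,
                "end_s": seg["end_s"] + offset,
                "speaker": global_id[(idx, seg["speaker"])],
            })
    return merged
```

The pairwise search is cubic in the number of fingerprints, which is acceptable for a handful of speakers per chunk; a real deployment would more likely call sklearn directly. Merging with transcript text (step 6) is left out of the sketch.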
After installing v0.13.0:1, click "Reapply patches" on the Speech Models
card to push the updated diarizer.py + main.py into the parakeet
container — TitaNet will download (~25 MB) on first call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,10 +1,10 @@
 import { VersionInfo, IMPOSSIBLE } from '@start9labs/start-sdk'
 
 export const v0_1_0 = VersionInfo.of({
-  version: '0.13.0:0',
+  version: '0.13.0:1',
   releaseNotes: {
     en_US:
-      'v0.13.0 — WhisperX migration reverted. Five hotfixes deep with no working build; the fundamental problem (NGC PyTorch on ARM64 ships a custom-versioned torch with no matching torchaudio anywhere) was always going to bite. All WhisperX install plumbing has been removed from spark-control: the install banner + progress dialog, the install endpoints, the audio-proxy WhisperX-preferred branch, the whisperx service registration, the WHISPERX_* env vars, and the build-context files. Spark 2 has been cleaned (container removed, build dir removed, ~6.8 GB of dangling layers + builder cache reclaimed). The dashboard now looks as it did before the migration attempt: Parakeet + Sortformer is the only audio path, unchanged. v0.13.0:1+ will add the actually-needed fixes: a memory cap on the parakeet container (so the 90-min audio crash can\'t take down Spark 2 again — worst case is a clean OOM-kill of the container), and a chunking proxy that splits long audio before sending to Sortformer.',
+      'v0.13.0:1 — per-chunk diarization worker with voice fingerprints. Adds POST /api/audio/diarize-chunk to Spark Control: given one audio chunk, returns Sortformer diarization segments (with LOCAL speaker labels) PLUS a 192-dim TitaNet voice fingerprint per detected speaker. Designed for Recap Relay to call per-chunk and then cluster fingerprints across chunks via cosine similarity for globally consistent speaker IDs. Parakeet container also gets a new /v1/audio/diarize-chunk endpoint and loads NVIDIA TitaNet (nvidia/speakerverification_en_titanet_large, ~25 MB, NeMo-native, no torchaudio drama). Click Reapply patches on the Speech Models card after install to pick up the diarizer.py + main.py updates. Sortformer + Parakeet + Magpie unchanged.',
   },
   migrations: {
     up: async ({ effects }) => {},