spark-control

grant/spark-control

Commit Graph


Author: Keysat
SHA1: e775906caa

v0.13.0:1 - per-chunk diarization worker with TitaNet voice fingerprints

Spark Control now exposes a per-chunk worker designed for Recap Relay
to orchestrate against. Recap Relay does the chunking + global speaker
clustering (consistent with how it already handles the Gemini path);
Spark Control handles the GPU-bound per-chunk work.

Parakeet container:
  - diarizer.py: now also loads NVIDIA TitaNet speaker-verification model
    (~25 MB, NeMo-native, no torchaudio). New diarize_chunk() method
    runs Sortformer + extracts one 192-dim voice fingerprint per detected
    local speaker (concatenating each speaker's audio across the chunk
    and running TitaNet's get_embedding).
  - main.py: new POST /v1/audio/diarize-chunk endpoint that returns
    segments + speakers_detected + fingerprints + models in one shot.
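A minimal sketch of the per-speaker concatenation step in diarize_chunk(). It assumes Sortformer segments arrive as {"start_s", "end_s", "speaker"} dicts and that the chunk is a mono NumPy array. `embed` is a hypothetical stand-in for TitaNet's get_embedding so the sketch runs without NeMo, and concat_speaker_audio / fingerprint_speakers are illustrative names, not the functions in diarizer.py.

```python
from typing import Callable, Dict, List, Sequence, TypedDict

import numpy as np


class Segment(TypedDict):
    start_s: float
    end_s: float
    speaker: str  # local label, e.g. "Speaker_0"


def concat_speaker_audio(
    audio: np.ndarray, sample_rate: int, segments: Sequence[Segment]
) -> Dict[str, np.ndarray]:
    """Concatenate every slice of `audio` that each local speaker owns."""
    pieces: Dict[str, List[np.ndarray]] = {}
    for seg in segments:
        start = max(0, int(round(seg["start_s"] * sample_rate)))
        end = min(len(audio), int(round(seg["end_s"] * sample_rate)))
        if end > start:  # skip empty or out-of-range segments
            pieces.setdefault(seg["speaker"], []).append(audio[start:end])
    return {spk: np.concatenate(chunks) for spk, chunks in pieces.items()}


def fingerprint_speakers(
    audio: np.ndarray,
    sample_rate: int,
    segments: Sequence[Segment],
    embed: Callable[[np.ndarray, int], np.ndarray],  # hypothetical TitaNet wrapper
) -> Dict[str, List[float]]:
    """One embedding per local speaker, computed on that speaker's concatenated audio."""
    clips = concat_speaker_audio(audio, sample_rate, segments)
    return {spk: [float(x) for x in embed(clip, sample_rate)] for spk, clip in clips.items()}
```

Concatenating first means each speaker gets a single embedding per chunk, matching the "one fingerprint per detected local speaker" shape returned below.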

Spark Control:
  - new POST /api/audio/diarize-chunk that proxies to parakeet's new
    endpoint, with the same CUDA-wedge recovery (503 + deep-health probe +
    Retry-After: 60) as the other audio endpoints. It returns parakeet's
    JSON unmodified: Recap Relay is the consumer, so no merging is needed.
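The wedge-recovery branch of the proxy could look like the framework-free sketch below. relay_diarize_chunk, ProxyResponse and kick_deep_health_probe are hypothetical names; treating every 5xx (not only 500) as a possible wedge and the JSON error body are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ProxyResponse:
    status: int
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def relay_diarize_chunk(
    upstream_status: int,
    upstream_body: bytes,
    kick_deep_health_probe: Callable[[], None],  # hypothetical probe trigger
    retry_after_s: int = 60,
) -> ProxyResponse:
    """Map a parakeet /v1/audio/diarize-chunk reply onto the Spark Control reply."""
    if upstream_status >= 500:
        # Assumed: any 5xx may mean a wedged CUDA context. Start the deep-health
        # probe and ask the client to retry later instead of surfacing the 500.
        kick_deep_health_probe()
        return ProxyResponse(
            status=503,
            body=b'{"error": "parakeet unavailable, retry later"}',
            headers={"Retry-After": str(retry_after_s)},
        )
    # Otherwise pass parakeet's status and raw JSON through untouched.
    return ProxyResponse(status=upstream_status, body=upstream_body)
```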

Response shape Recap Relay receives per chunk:
  {
    "duration": 300.0,
    "segments": [{"start_s": float, "end_s": float, "speaker": "Speaker_0"}, ...],  # LOCAL (per-chunk) labels
    "speakers_detected": ["Speaker_0", "Speaker_1", ...],
    "fingerprints": {"Speaker_0": [192 floats], ...},
    "models": {"diarization": "...", "embedding": "..."}
  }
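Recap Relay could sanity-check each reply against this shape before clustering. The checks below mirror the description in this commit (one 192-float fingerprint per detected local speaker); check_chunk_response is a hypothetical helper, not part of either service.

```python
from typing import Any, Dict

EMBEDDING_DIM = 192  # TitaNet fingerprint size quoted in this commit
REQUIRED_KEYS = ("duration", "segments", "speakers_detected", "fingerprints", "models")


def check_chunk_response(payload: Dict[str, Any]) -> None:
    """Raise ValueError if a diarize-chunk reply deviates from the shape above."""
    missing = [k for k in REQUIRED_KEYS if k not in payload]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    speakers = set(payload["speakers_detected"])
    for seg in payload["segments"]:
        if not 0 <= seg["start_s"] <= seg["end_s"]:
            raise ValueError(f"bad segment times: {seg}")
        if seg["speaker"] not in speakers:
            raise ValueError(f"segment uses undetected speaker: {seg['speaker']}")
    # One fingerprint per detected local speaker, each EMBEDDING_DIM floats long.
    if set(payload["fingerprints"]) != speakers:
        raise ValueError("fingerprints and speakers_detected disagree")
    for spk, vec in payload["fingerprints"].items():
        if len(vec) != EMBEDDING_DIM:
            raise ValueError(f"{spk}: expected {EMBEDDING_DIM} floats, got {len(vec)}")
```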

Recap Relay's job:
  1. Chunk audio (existing chunking infrastructure)
  2. POST each chunk to /api/audio/diarize-chunk in parallel
  3. Collect all fingerprints from all chunks
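  4. sklearn AgglomerativeClustering(n_clusters=None, distance_threshold=0.7,
     metric="cosine") with a non-ward linkage (e.g. "average"), since the
     default ward linkage only accepts Euclidean distances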
  5. Re-label segments with global cluster IDs
  6. Concatenate transcripts (from a separate parallel call to
     /v1/audio/transcriptions) with timestamp offsets and merge with
     re-labeled diar segments

After installing v0.13.0:1, click "Reapply patches" on the Speech Models
card to push the updated diarizer.py + main.py into the parakeet
container — TitaNet will download (~25 MB) on first call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Date: 2026-05-19 11:37:05 -05:00

Author: Keysat
SHA1: 713cd09cc2

v0.10.0:0 - speaker diarization via Sortformer + merged transcribe-with-speakers

Adds a new pipeline for diarized transcription that any client (recap-relay,
ad-hoc curl, future Mac-side tools) can call. Pure data pipeline, no LLM
or UI included — name resolution / analysis happen downstream where prompts
and rendering are configurable.

Architecture:
  Spark 2 / parakeet-asr container:
    + /opt/parakeet/app/diarizer.py        (new: SortformerDiarizer class)
    + /opt/parakeet/app/main.py            (patched: loads diarizer, adds
                                            /v1/audio/diarize endpoint)
    Model: nvidia/diar_sortformer_4spk-v1  (~150 MB, ungated, NeMo native)

  Spark Control:
    + POST /api/audio/transcribe-with-speakers
      Body: multipart file
      Returns: {
        duration, language, speakers_detected,
        segments: [{start_ms, end_ms, speaker, text}, ...],
        models: {transcription, diarization}
      }
      Runs Parakeet ASR + Sortformer in parallel, merges words to speaker
      turns by timestamp, groups into speaker-change blocks (breaks also
      on >1.5s silence gaps).
    + If Parakeet 500s mid-pipeline, kicks deep-health probe and returns
      503/Retry-After: 60 — same wedge-recovery pattern as v0.9.0:2.
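A minimal sketch of the word-to-turn merge and block grouping, assuming ASR words carry start_ms/end_ms and Sortformer turns carry start_ms/end_ms/speaker. Assigning each word by its midpoint and falling back to the nearest turn are assumptions, and speaker_at / merge_words are hypothetical names, not the Spark Control code.

```python
from typing import Dict, List, Optional

MAX_GAP_MS = 1500  # silence longer than 1.5 s starts a new block


def speaker_at(t_ms: float, turns: List[Dict]) -> Optional[str]:
    """Speaker whose diarization turn covers t_ms, else the nearest turn's speaker."""
    best, best_dist = None, float("inf")
    for turn in turns:
        if turn["start_ms"] <= t_ms <= turn["end_ms"]:
            return turn["speaker"]
        dist = min(abs(t_ms - turn["start_ms"]), abs(t_ms - turn["end_ms"]))
        if dist < best_dist:
            best, best_dist = turn["speaker"], dist
    return best


def merge_words(words: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Attach a speaker to each ASR word by its midpoint, then group into blocks."""
    blocks: List[Dict] = []
    for w in sorted(words, key=lambda w: w["start_ms"]):
        spk = speaker_at((w["start_ms"] + w["end_ms"]) / 2, turns)
        last = blocks[-1] if blocks else None
        # Extend the current block only for the same speaker with a short gap.
        if (last is not None and last["speaker"] == spk
                and w["start_ms"] - last["end_ms"] <= MAX_GAP_MS):
            last["end_ms"] = w["end_ms"]
            last["text"] += " " + w["word"]
        else:
            blocks.append({"start_ms": w["start_ms"], "end_ms": w["end_ms"],
                           "speaker": spk, "text": w["word"]})
    return blocks
```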

Apply Sortformer patches to the running Parakeet container with:
  bash image/parakeet_patches/apply.sh <spark2-host> <ssh-user>

Patches are reversible — apply.sh backs up the original main.py inside the
container at main.py.pre-sortformer before overwriting. Restore by copying
that file back and removing diarizer.py, then docker restart.

v0.11 follow-up: dashboard "Speech Models" panel to swap/update model
versions from the UI instead of needing to re-run apply.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Date: 2026-05-18 15:14:48 -05:00

2 Commits