v0.10.0:1 - hotfix: merge function now joins words with proper spacing

Smoke testing v0.10.0:0 against a real anarlog audio.mp3 showed the
output running words together: "I'mrecordingrightnow", "don'tyoutry".

Root cause: _merge_words_with_speakers was doing "".join(cur_words),
assuming Parakeet returns words with leading whitespace (which the
hyprnote local Parakeet does, but the Spark-hosted Parakeet does not).

Rewrote the join with a small helper that:
  - Strips each token (handles both leading-space and no-leading-space
    word formats)
  - Joins with a single space
  - Keeps punctuation tight — no space before period/comma/colon/etc.

Verified post-install with the same test audio:
  [00:06] Speaker_0: I'm I'm recording right now.
  [00:18] Speaker_1: you're you're on your computer and your phone, right?

No other changes — Parakeet container patches and the endpoint shape
stay identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Keysat
2026-05-18 15:42:04 -05:00
parent 713cd09cc2
commit fda23088fe
2 changed files with 22 additions and 4 deletions
+20 -2
View File
@@ -344,6 +344,24 @@ def _merge_words_with_speakers(words: list[dict], diar_turns: list[dict]) -> lis
return []
SILENCE_BREAK_S = 1.5
def _join_words(parts: list[str]) -> str:
"""Join word tokens with proper spacing. Different STT outputs vary —
some include leading spaces in the word text (' morning'), some don't
('morning'). Normalize by stripping each token then joining with one
space; collapse multiple spaces. Keeps punctuation tight (no space
before period/comma/etc.)."""
cleaned = [p.strip() for p in parts if p and p.strip()]
if not cleaned:
return ""
out = cleaned[0]
for token in cleaned[1:]:
# No leading space before pure-punctuation tokens
if token and token[0] in ".,;:!?)]}'\"":
out += token
else:
out += " " + token
return out
blocks: list[dict] = []
cur_words: list[str] = []
cur_speaker: Optional[str] = None
@@ -367,7 +385,7 @@ def _merge_words_with_speakers(words: list[dict], diar_turns: list[dict]) -> lis
"start_ms": int(cur_start_s * 1000),
"end_ms": int(cur_end_s * 1000),
"speaker": cur_speaker,
"text": "".join(cur_words).strip(),
"text": _join_words(cur_words),
})
cur_words = [wt]
cur_speaker = spk
@@ -382,7 +400,7 @@ def _merge_words_with_speakers(words: list[dict], diar_turns: list[dict]) -> lis
"start_ms": int(cur_start_s * 1000),
"end_ms": int(cur_end_s * 1000),
"speaker": cur_speaker,
"text": "".join(cur_words).strip(),
"text": _join_words(cur_words),
})
return blocks