fix: reject near-silent audio to prevent Whisper hallucinations by egsok · Pull Request #1229 · cjpais/Handy

egsok · 2026-04-05T11:11:26Z

Summary

Add RMS energy gate (threshold 0.005) in the transcription pipeline — skips audio that passed VAD but is too quiet for real speech
Add minimum speech duration check (100ms / 1600 samples) in audio manager — discards tiny VAD leakage fragments before they get zero-padded to 1.25s

Problem

When recording a few seconds of "silence", Whisper hallucinates text like "Subtitles by the Amara.org community". Real microphones pick up ambient noise that occasionally exceeds VAD's threshold, causing SmoothedVad to flush its prefill buffer (~8000+ samples of near-silence). The padding logic then inflates this to 1.25 seconds, and Whisper hallucinates on it.
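To put numbers on the inflation: at 16 kHz, 1.25 seconds is 20,000 samples, so a flushed 8,000-sample prefill makes up only 40% of what Whisper receives, and the rest is zeros. A rough sketch of that padding step, assuming plain trailing zero-padding (the function and constant names are hypothetical and not taken from the Handy source):

```rust
/// Sample rate stated in the PR description.
const SAMPLE_RATE_HZ: usize = 16_000;
/// 1.25 s expressed in samples: 16_000 * 5 / 4 = 20_000.
const PAD_TARGET_SAMPLES: usize = SAMPLE_RATE_HZ * 5 / 4;

/// Hypothetical helper: append zeros until the buffer reaches the target length.
/// Buffers already at or above the target are returned unchanged.
fn zero_pad_to_target(mut samples: Vec<f32>) -> Vec<f32> {
    if samples.len() < PAD_TARGET_SAMPLES {
        samples.resize(PAD_TARGET_SAMPLES, 0.0);
    }
    samples
}
```

Under this assumption, any fragment that slips past VAD, however short, reaches the model as a full 1.25-second clip of mostly silence.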

How it works

Primary defense — RMS energy gate (transcription.rs): Computes RMS of the audio buffer and rejects anything below 0.005. Mic self-noise is ~0.0001–0.001, ambient noise that fools VAD is ~0.001–0.005, while whispered speech starts at ~0.01. This catches all near-silent audio regardless of how it got through VAD.
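A minimal sketch of such a gate, assuming the buffer holds `f32` samples normalized to [-1.0, 1.0]. The names `rms`, `is_near_silent`, and `SILENCE_RMS_THRESHOLD` are illustrative and may not match what the PR adds to transcription.rs:

```rust
/// Threshold from the PR description; the constant name is hypothetical.
const SILENCE_RMS_THRESHOLD: f32 = 0.005;

/// Root-mean-square energy of a buffer of normalized samples.
/// An empty buffer is treated as having zero energy.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    // Accumulate in f64 so long recordings do not lose precision.
    let sum_sq: f64 = samples.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / samples.len() as f64).sqrt() as f32
}

/// Returns true when the buffer is too quiet to be worth transcribing.
fn is_near_silent(samples: &[f32]) -> bool {
    rms(samples) < SILENCE_RMS_THRESHOLD
}
```

A caller would check `is_near_silent(&audio)` before invoking the model and return an empty transcription when it holds. Because RMS is averaged over the whole buffer, a short quiet word inside a long, otherwise silent recording lowers the value, which is one reason the threshold deserves testing on real microphones.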

Secondary defense — min duration (audio.rs): Fragments shorter than 100ms (1600 samples at 16kHz) are discarded as VAD leakage — SmoothedVad's minimum real output is ~8000 samples. A cheap safety net for edge cases.
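The duration check amounts to a length comparison on the sample buffer. A sketch under the stated 16 kHz assumption (the constant and function names are hypothetical, not the identifiers used in audio.rs):

```rust
/// Sample rate stated in the PR description.
const SAMPLE_RATE_HZ: usize = 16_000;
/// Minimum speech duration from the PR description.
const MIN_SPEECH_MS: usize = 100;
/// 16_000 samples/s * 100 ms / 1000 = 1600 samples.
const MIN_SPEECH_SAMPLES: usize = SAMPLE_RATE_HZ * MIN_SPEECH_MS / 1000;

/// Hypothetical helper: drop fragments shorter than the minimum duration,
/// pass everything else through untouched.
fn discard_if_too_short(samples: Vec<f32>) -> Option<Vec<f32>> {
    if samples.len() < MIN_SPEECH_SAMPLES {
        None
    } else {
        Some(samples)
    }
}
```

Running this check before padding matters: once a fragment has been zero-padded to 1.25 seconds, its original length is no longer visible from the buffer size.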

Test plan

Record 2–3 seconds of silence → no transcription output, no hallucination
Record a short word (~0.3s) → transcribes normally
Record normal speech → identical behavior to before
Debug mode (Ctrl+Shift+D): "Audio RMS ... below silence threshold" logged when recording silence

Add RMS energy gate (threshold 0.005) in transcription pipeline to skip audio that passed VAD but is too quiet for meaningful speech. Also add minimum speech duration check (100ms) in audio manager to discard tiny VAD leakage fragments before they get zero-padded to 1.25s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cjpais · 2026-04-05T12:03:57Z

I think this is probably a good change. I will test it and pull it in. I may need to pull in beta testers too, to get more feedback and make sure there are no unintended consequences. But I think if the RMS is very low there should be no big problem.

cjpais · 2026-04-07T08:55:10Z

Hmmm... I need to think more on this. I am not convinced that we can ship this broadly to everyone without consequences.

egsok wants to merge 1 commit into cjpais:main from egsok:fix/silence-hallucination
