Add VAD pre-filter for transcription #69

Merged: marcelklehr merged 3 commits into nextcloud:main from nilsimda:feat/vad-filter on May 11, 2026

@nilsimda (Contributor) commented Apr 26, 2026

Summary

Exposes the Silero VAD that is already bundled with faster-whisper as an opt-in pre-filter, configurable via env vars. Default behaviour is unchanged (STT_WHISPER2_VAD_FILTER defaults to 0), so this is a no-op for existing deployments.

Why

On longer recordings, or recordings with long silences, Whisper starts to hallucinate or repeat phrases. VAD helps mitigate this while also cutting down the amount of audio that needs to be transcribed.

How

faster_whisper.WhisperModel.transcribe() already accepts vad_filter and vad_parameters. The VAD model is part of the package's bundled assets, so there is no extra download and no extra dependency. This PR just plumbs the flags through from environment variables.
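
As a rough illustration of the plumbing described above (not the code from this PR), the toggle could be read from the environment and turned into extra keyword arguments for transcribe(). The helper name `vad_transcribe_kwargs` is hypothetical, and the default of "0" follows the opt-in behaviour stated in this description.

```python
import os


def vad_transcribe_kwargs(env=os.environ):
    """Build extra keyword arguments for WhisperModel.transcribe().

    Hypothetical helper: returns an empty dict when the filter is off,
    so the transcribe() call stays identical to the pre-PR behaviour.
    """
    # Assumption: only the literal string "1" enables the filter.
    enabled = env.get("STT_WHISPER2_VAD_FILTER", "0") == "1"
    if not enabled:
        return {}
    return {"vad_filter": True}


# Hypothetical call site, assuming `model` is a faster_whisper.WhisperModel:
#   segments, info = model.transcribe("audio.wav", **vad_transcribe_kwargs())
```

Returning an empty dict when disabled keeps the call site free of branching: the same transcribe() line works whether or not the filter is on.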

Config

All variables are optional. Defaults are the same as faster-whisper's defaults.

| Env var | Default | Notes |
| --- | --- | --- |
| STT_WHISPER2_VAD_FILTER | 0 | Set to 1 to enable. |
| STT_WHISPER2_VAD_THRESHOLD | 0.5 | Speech probability threshold. |
| STT_WHISPER2_VAD_MIN_SPEECH_MS | 0 | Drop speech chunks shorter than this. |
| STT_WHISPER2_VAD_MIN_SILENCE_MS | 2000 | Only collapse silences longer than this. Lower (e.g. 500) for aggressive filtering on call recordings. |
| STT_WHISPER2_VAD_SPEECH_PAD_MS | 400 | Padding around detected speech to avoid clipping word edges. |
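
To make the table concrete, here is a minimal sketch of how the four tuning variables might be mapped onto a vad_parameters dict. The option names on the right (threshold, min_speech_duration_ms, min_silence_duration_ms, speech_pad_ms) are faster-whisper's VAD option names; the pairing of each env var with an option, and the helper name `vad_parameters_from_env`, are assumptions for illustration rather than the code in lib/main.py.

```python
import os

# (env var, faster-whisper VAD option, default from the table, parser)
# The env-var-to-option pairing is an assumption based on the names.
_VAD_ENV_MAP = [
    ("STT_WHISPER2_VAD_THRESHOLD", "threshold", 0.5, float),
    ("STT_WHISPER2_VAD_MIN_SPEECH_MS", "min_speech_duration_ms", 0, int),
    ("STT_WHISPER2_VAD_MIN_SILENCE_MS", "min_silence_duration_ms", 2000, int),
    ("STT_WHISPER2_VAD_SPEECH_PAD_MS", "speech_pad_ms", 400, int),
]


def vad_parameters_from_env(env=os.environ):
    """Return a dict suitable for transcribe(vad_parameters=...)."""
    params = {}
    for var, option, default, parse in _VAD_ENV_MAP:
        raw = env.get(var)
        # Unset or empty variables fall back to the documented default.
        params[option] = parse(raw) if raw not in (None, "") else default
    return params
```

For example, setting STT_WHISPER2_VAD_MIN_SILENCE_MS=500 for call recordings would change only min_silence_duration_ms and leave the other three options at their defaults.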

If there is interest in adding this: is there a good place to document these? The README and/or the wiki?

Testing

I'm unsure if I should add an integration test for this path. This would double the CI pipeline time, so I decided to wait for feedback first.

@marcelklehr (Member) commented

Hey!
Thank you for this contribution! Is there a reason you made this feature opt-in? I'm thinking it might be worthwhile to make it opt-out.

@nilsimda (Contributor, Author) commented

Thanks for the quick reply! Don't have a strong reason, I was being conservative. Going to flip it to opt-out :)

faster-whisper already bundles Silero VAD; this exposes it via
STT_WHISPER2_VAD_FILTER and four tuning env vars. Enabled by default;
set STT_WHISPER2_VAD_FILTER=0 to restore previous behaviour.

Signed-off-by: Nils Imdahl <imdahlnils@gmail.com>
@nilsimda nilsimda changed the title Add optional VAD pre-filter for transcription Add VAD pre-filter for transcription Apr 27, 2026
@nilsimda nilsimda marked this pull request as ready for review April 28, 2026 11:50
@marcelklehr marcelklehr requested a review from Copilot May 7, 2026 06:03

@marcelklehr (Member) left a comment

Looks good, tests pass

Copilot AI left a comment

Pull request overview

Plumbs faster-whisper’s built-in Silero VAD options through to the transcription call so deployments can optionally pre-filter silence via environment variables, helping reduce hallucinations on long/silent recordings.

Changes:

  • Add VAD env-var configuration (filter toggle + parameter mapping) in lib/main.py.
  • Pass vad_filter / vad_parameters into WhisperModel.transcribe() when enabled.

Comment thread lib/main.py Outdated
Comment thread lib/main.py Outdated
@marcelklehr marcelklehr merged commit 4073859 into nextcloud:main May 11, 2026
5 of 6 checks passed