Add VAD pre-filter for transcription by nilsimda · Pull Request #69 · nextcloud/stt_whisper2

nilsimda · 2026-04-26T22:41:04Z

Summary

Exposes the Silero VAD that is already bundled with faster-whisper as an opt-in pre-filter, configurable via env vars. Default behaviour is unchanged (STT_WHISPER2_VAD_FILTER defaults to 0), so this is a no-op for existing deployments.

Why

Whisper tends to hallucinate or repeat phrases on long recordings and on recordings with long silences. VAD helps mitigate this and also cuts down the amount of audio that needs to be transcribed.

How

faster_whisper.WhisperModel.transcribe() already accepts vad_filter and vad_parameters. The VAD model is part of the package's bundled assets, so there is no extra download and no extra dependency. This PR just plumbs the flags through from environment variables.
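
As a rough sketch of what this plumbing amounts to, the helper below forwards the two keyword arguments to transcribe(). The helper name `transcribe_with_vad` and the way the options are handed in are assumptions for illustration, not the code in this PR; `model` is expected to be a faster_whisper.WhisperModel.

```python
from typing import Any, Optional


def transcribe_with_vad(
    model: Any,
    audio_path: str,
    vad_filter: bool,
    vad_parameters: Optional[dict] = None,
):
    """Forward the VAD options to faster-whisper (hypothetical helper).

    `model` is expected to be a faster_whisper.WhisperModel, whose
    transcribe() accepts `vad_filter` and `vad_parameters` keywords.
    """
    kwargs: dict = {"vad_filter": vad_filter}
    if vad_filter and vad_parameters:
        # Only send tuning values when the filter is switched on.
        kwargs["vad_parameters"] = vad_parameters
    return model.transcribe(audio_path, **kwargs)
```

With a real model, the return value is whatever transcribe() returns (a segment iterator plus an info object).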

Config

All variables are optional. Defaults are the same as faster-whisper's defaults.

Env var	Default	Notes
STT_WHISPER2_VAD_FILTER	0	Set to 1 to enable.
STT_WHISPER2_VAD_THRESHOLD	0.5	Speech probability threshold.
STT_WHISPER2_VAD_MIN_SPEECH_MS	0	Drop speech chunks shorter than this.
STT_WHISPER2_VAD_MIN_SILENCE_MS	2000	Only collapse silences longer than this. Lower (e.g. 500) for aggressive filtering on call recordings.
STT_WHISPER2_VAD_SPEECH_PAD_MS	400	Padding around detected speech to avoid clipping word edges.
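
One way to read the table: each tuning variable corresponds to a key of the vad_parameters dict that faster-whisper accepts. The sketch below assumes that pairing. The key names (`threshold`, `min_speech_duration_ms`, `min_silence_duration_ms`, `speech_pad_ms`) are faster-whisper's; the pairing with the env vars and the helper name `read_vad_parameters` are assumptions, not taken from the diff.

```python
import os
from typing import Mapping

# env var -> (assumed faster-whisper VAD key, converter, default from the table)
VAD_ENV_SPEC = {
    "STT_WHISPER2_VAD_THRESHOLD": ("threshold", float, 0.5),
    "STT_WHISPER2_VAD_MIN_SPEECH_MS": ("min_speech_duration_ms", int, 0),
    "STT_WHISPER2_VAD_MIN_SILENCE_MS": ("min_silence_duration_ms", int, 2000),
    "STT_WHISPER2_VAD_SPEECH_PAD_MS": ("speech_pad_ms", int, 400),
}


def read_vad_parameters(env: Mapping[str, str] = os.environ) -> dict:
    """Build a vad_parameters dict, falling back to the documented defaults."""
    params = {}
    for var, (key, convert, default) in VAD_ENV_SPEC.items():
        raw = env.get(var)
        # Unset or empty variables keep the default value.
        params[key] = convert(raw) if raw not in (None, "") else default
    return params
```

For the call-recording case mentioned in the table, `read_vad_parameters({"STT_WHISPER2_VAD_MIN_SILENCE_MS": "500"})` would lower only the silence threshold and leave the other three values at their defaults.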

If there is interest in adding this: Is there a good place to document these? The README and/or the wiki?

Testing

I'm unsure if I should add an integration test for this path. This would double the cli pipeline time, so I decided to wait on feedback first.

marcelklehr · 2026-04-27T08:06:38Z

Hey!
Thank you for this contribution! Is there a reason you made this feature opt-in? I'm thinking it might be worthwhile to make it opt-out.

nilsimda · 2026-04-27T09:16:47Z

Thanks for the quick reply! Don't have a strong reason, I was being conservative. Going to flip it to opt-out :)

faster-whisper already bundles Silero VAD; this exposes it via STT_WHISPER2_VAD_FILTER and four tuning env vars. Enabled by default; set STT_WHISPER2_VAD_FILTER=0 to restore previous behaviour.

Signed-off-by: Nils Imdahl <imdahlnils@gmail.com>
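
With the flip to opt-out, the toggle would be read roughly as follows. This is a sketch only: the commit message confirms that the filter is on by default and that 0 turns it off; treating every other value as enabled, and the helper name `vad_enabled`, are assumptions.

```python
import os
from typing import Mapping


def vad_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Opt-out toggle: on unless STT_WHISPER2_VAD_FILTER is set to "0"."""
    # Unset means enabled (the new default).
    # Assumption: any value other than "0" also counts as enabled.
    return env.get("STT_WHISPER2_VAD_FILTER", "1").strip() != "0"
```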

marcelklehr

Looks good, tests pass

Copilot

Pull request overview

Plumbs faster-whisper’s built-in Silero VAD options through to the transcription call so deployments can optionally pre-filter silence via environment variables, helping reduce hallucinations on long/silent recordings.

Changes:

Add VAD env-var configuration (filter toggle + parameter mapping) in lib/main.py.
Pass vad_filter / vad_parameters into WhisperModel.transcribe() when enabled.


nilsimda force-pushed the feat/vad-filter branch from 4ecd163 to 91f0c1e April 27, 2026 10:59

nilsimda changed the title from "Add optional VAD pre-filter for transcription" to "Add VAD pre-filter for transcription" Apr 27, 2026

nilsimda marked this pull request as ready for review April 28, 2026 11:50

marcelklehr requested a review from Copilot May 7, 2026 06:03

marcelklehr approved these changes May 7, 2026

Copilot started reviewing on behalf of marcelklehr May 7, 2026 06:04

Copilot AI reviewed May 7, 2026

Comment thread lib/main.py Outdated

Comment thread lib/main.py Outdated

marcelklehr added 2 commits May 11, 2026 09:43

fix: Add a nice error message for parse errors

42bf6d3
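
The kind of guard this commit title describes could look like the sketch below. It is not the committed code: the helper name `parse_env_number` and the message wording are made up for illustration. The idea is only to name the offending variable instead of surfacing a bare ValueError from int() or float().

```python
import os
from typing import Callable, Mapping, Union

Number = Union[int, float]


def parse_env_number(
    name: str,
    convert: Callable[[str], Number],
    default: Number,
    env: Mapping[str, str] = os.environ,
) -> Number:
    """Parse a numeric env var, naming the variable if the value is invalid."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return convert(raw)
    except ValueError as exc:
        # Re-raise with the variable name and the bad value in the message.
        raise ValueError(
            f"Invalid value {raw!r} for {name}: expected {convert.__name__}"
        ) from exc
```

A misconfigured deployment, say STT_WHISPER2_VAD_THRESHOLD=abc, would then fail with a message that points at the variable rather than at a line inside the parser.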

fix: Add VAD filter env vars to info.xml

3e7a4c8

marcelklehr merged commit 4073859 into nextcloud:main May 11, 2026
5 of 6 checks passed
