[FEAT] Voxtral Support by jorge-menjivar · Pull Request #3036 · huggingface/candle

jorge-menjivar · 2025-07-27T15:13:50Z

Adds support for mistralai/Voxtral-Mini-3B-2507.
Should also support mistralai/Voxtral-Small-24B-2507, but this is not tested.

These models currently only support tekken tokenizer files, so I created an optional tekken tokenizer crate (tekken-rs) and added it to the examples workspace. It is enabled with the tekken feature.

fixes #3028

Run the example with:

cargo run --example voxtral --features tekken,symphonia,cuda,cudnn --release
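A minimal sketch of how an example binary might fail early, with a clear message, when it is built without the tekken feature. It relies only on Rust's built-in `cfg!` macro; the function name `check_tokenizer_support` and the error text are hypothetical, and the Cargo.toml wiring for the optional tekken-rs dependency is not shown.

```rust
/// Hypothetical guard: Voxtral checkpoints ship tekken tokenizer files,
/// so the example cannot decode text unless the `tekken` feature is on.
/// `cfg!` expands to a compile-time `true`/`false` for the given feature.
fn check_tokenizer_support() -> Result<(), String> {
    if cfg!(feature = "tekken") {
        Ok(())
    } else {
        Err("Voxtral needs a tekken tokenizer; rebuild with `--features tekken`".to_string())
    }
}
```

Calling this at the top of `main` and printing the error turns a confusing missing-tokenizer failure into an actionable build hint.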

jorge-menjivar · 2025-07-27T15:14:57Z

@maximizemaxwell Thank you for your earlier work! Here is the working code if you want to try it.

maximizemaxwell · 2025-07-28T11:19:05Z

@greenrazer Could you review this feature implementation?

benedikt-schaber · 2025-07-29T20:44:27Z

Hey, thank you for the implementation. I had a rough one myself, but it was missing the tekken part, among other flaws. I went through the code and I think I have some pointers for improvement. I am new to audio transformers, however, so please be lenient with any mistakes :)

For the example:

Unused code: the example loads the mel filters when creating the model but never uses them. Further, the reimplementation of pcm_to_filer in audio_processing.rs is unused.
audio_utils.rs only contains a resample function (pcm_decode is its own module), so I think the file could be named more specifically.
Separating transcribe_audio_with_tokens from transcribe_audio_internal seems unnecessary.
post_process_transcription seems very specific, especially the word combinators. Is it necessary, or should it be generalized? I only ran a small number of tests (around 100 audio files of 30 seconds each) without the post-processing, and I did not encounter those issues.

For the main code:

voxtral_llama.rs might be mergeable into llama.rs: it only adds the explicit head_dim option and Voxtral defaults, "fixes" dtype consistency, and adds single-sequence support.
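For context, a rough sketch of what an explicit head_dim option changes, assuming a Llama-style config where head_dim otherwise defaults to hidden_size / num_attention_heads. The struct and field names are hypothetical, not candle's actual llama.rs config, and the numbers below are illustrative rather than Voxtral's real values.

```rust
/// Hypothetical subset of a Llama-style text-decoder config.
#[derive(Debug, Clone)]
struct DecoderConfig {
    hidden_size: usize,
    num_attention_heads: usize,
    num_key_value_heads: usize,
    /// Explicit per-head dimension; `None` means "derive it from hidden_size".
    head_dim: Option<usize>,
}

impl DecoderConfig {
    /// Prefer the explicit value, otherwise fall back to hidden_size / num_attention_heads.
    fn head_dim(&self) -> usize {
        self.head_dim
            .unwrap_or(self.hidden_size / self.num_attention_heads)
    }

    /// Output width of the query projection: num_attention_heads * head_dim.
    fn q_proj_out(&self) -> usize {
        self.num_attention_heads * self.head_dim()
    }

    /// Output width of each key/value projection under grouped-query attention.
    fn kv_proj_out(&self) -> usize {
        self.num_key_value_heads * self.head_dim()
    }
}
```

With hidden_size = 3072, 32 attention heads, 8 KV heads and head_dim = Some(128), the query projection is 4096 wide, whereas a derived head_dim of 96 would give 3072. When a checkpoint sets head_dim explicitly, deriving it instead produces mismatched weight shapes.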

greenrazer

Thank you for the contribution, great job!

Just fix the clippy errors and we should be good:

cargo clippy --workspace --tests --examples --fix -- -D warnings

jorge-menjivar · 2025-08-01T07:02:44Z

Thank you @greenrazer and @benedikt-schaber for the suggestions.

I have made the requested changes.

I have also tested whisper's and snac's examples to make sure they run correctly after moving the pcm_decode and resample functions to the shared module.
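Since resample is now a shared helper, here is a rough idea of what such a function does. This sketch swaps in plain linear interpolation on mono f32 samples (the thread does not show the shared helper's actual algorithm). The name `resample_linear` is hypothetical, and the 16 kHz target in the tests is an assumption based on Whisper-style audio encoders.

```rust
/// Resample mono PCM samples from `from_hz` to `to_hz` by linear interpolation.
/// Simplified stand-in, not the shared candle-examples helper: it applies no
/// anti-aliasing filter, so downsampling can alias high frequencies.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    // Number of output samples covering the same duration.
    let out_len = ((input.len() as u64 * to_hz as u64) / from_hz as u64) as usize;
    // Distance in input samples between consecutive output samples.
    let step = from_hz as f64 / to_hz as f64;
    let mut out = Vec::with_capacity(out_len);
    for i in 0..out_len {
        // Fractional position of this output sample within the input.
        let pos = i as f64 * step;
        let idx = pos.floor() as usize;
        let frac = (pos - idx as f64) as f32;
        let a = input[idx];
        // Clamp to the last sample so we never read past the end.
        let b = input[(idx + 1).min(input.len() - 1)];
        out.push(a + (b - a) * frac);
    }
    out
}
```

Linear interpolation keeps the sketch short, but a band-limited resampler is preferable for downsampling real recordings.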

greenrazer

Looks good, Thanks!

* feat: implement some configs in voxtral
* fix: fixed imports, implement more func
* feat: implemented full version, need fixes
* fix: fixed some compile errors
* feat: add initial examples
* fix: fixed voxtral.rs
* fix: fixed compile errors in examples
* fix: fixed compile errors
* fix: update model integration
* First working example
* Remove unused melfilters code
* Remove unused code
* Reuse whisper's pcm_decode
* Simplify generation function
* Remove unnecessary post-process fun
* Reuse snac's resample
* Apply clippy suggestions
* Remove unused filters
* Improve example
* Update tekken-rs
* Clippy fixes

---------

Co-authored-by: Max <naturale@hufs.ac.kr>

maximizemaxwell and others added 11 commits July 20, 2025 13:41

feat: implement some configs in voxtral

baad80d

fix: fixed imports, implement more func

66fd6d9

feat: implemented full version, need fixes

412125a

fix: fixed some compile errors

586154f

feat: add initial examples

161d3d5

fix: fixed voxtral.rs

81f4c2d

fix: fixed compile errors in examples

13e22cf

fix: fixed compile errors

3c6827e

fix: update model integration

d1ffe9f

First working example

fd6d330

Merge branch 'main' into jorge-voxtral-3

559d4ac

greenrazer self-assigned this Jul 29, 2025

greenrazer suggested changes Jul 29, 2025


Comment thread candle-transformers/src/models/voxtral/model.rs

Comment thread candle-examples/examples/voxtral/pcm_decode.rs Outdated

Comment thread candle-examples/examples/voxtral/audio_utils.rs Outdated

jorge-menjivar added 10 commits July 31, 2025 14:40

Remove unused melfilters code

fe5f1a0

Remove unused code

bddb513

Reuse whisper's pcm_decode

08da41e

Simplify generation function

de7f962

Remove unnecessary post-process fun

8d27762

Reuse snac's resample

b3285b0

Apply clippy suggestions

bc5f3a5

Remove unused filters

11e3d00

Improve example

588bb01

Update tekken-rs

550de96

jorge-menjivar requested a review from greenrazer August 1, 2025 07:02

Clippy fixes

49d3d48

greenrazer approved these changes Aug 4, 2025


greenrazer merged commit 21032cb into huggingface:main Aug 4, 2025
9 checks passed

jorge-menjivar mentioned this pull request Dec 25, 2025

Feature Request: voxtral implementation #3030 (Draft, 1 task)

jorge-menjivar mentioned this pull request Apr 7, 2026

Add support for overriding model dir jorge-menjivar/super-stt#194 (Merged)

greenrazer merged 22 commits into huggingface:main from jorge-menjivar:jorge-voxtral-3
4 participants