Feature Request: voxtral implementation by maximizemaxwell · Pull Request #3030 · huggingface/candle

maximizemaxwell · 2025-07-22T12:03:43Z

What does this PR do?

Implements Voxtral in Candle, along with examples.

Issue

Part of #3028

Requirements

The code still needs fixes before it runs end to end.

jorge-menjivar · 2025-07-23T00:28:22Z

Proposal

The Candle code should replicate the output of the Transformers implementation of Voxtral, word by word, when run with deterministic values. My current test consists of three different audio files.
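A word-by-word check like the one proposed above could be sketched as follows. This is only an illustration, not code from this PR: the `Divergence` struct and `first_divergence` function are hypothetical names, and it assumes both transcripts are already available as plain strings (for example, the Transformers output saved to a text file and the Candle output captured from the example binary).

```rust
/// Position and content of the first word where two transcripts disagree.
/// (Hypothetical helper type, not part of Candle.)
#[derive(Debug, PartialEq)]
pub struct Divergence<'a> {
    /// Zero-based index of the first mismatching word.
    pub index: usize,
    /// Word from the reference transcript, or None if it ended early.
    pub reference: Option<&'a str>,
    /// Word from the candidate transcript, or None if it ended early.
    pub candidate: Option<&'a str>,
}

/// Compares two transcripts word by word, ignoring differences in whitespace.
/// Returns None when every word matches and both transcripts have the same length.
pub fn first_divergence<'a>(reference: &'a str, candidate: &'a str) -> Option<Divergence<'a>> {
    let mut ref_words = reference.split_whitespace();
    let mut cand_words = candidate.split_whitespace();
    let mut index = 0;
    loop {
        match (ref_words.next(), cand_words.next()) {
            // Both transcripts ended at the same time: no divergence.
            (None, None) => return None,
            // Same word on both sides: keep going.
            (r, c) if r == c => index += 1,
            // Different words, or one transcript ended before the other.
            (r, c) => {
                return Some(Divergence {
                    index,
                    reference: r,
                    candidate: c,
                })
            }
        }
    }
}
```

Reporting the index of the first mismatch, rather than a plain pass/fail, makes it easier to tell whether the two implementations drift apart early (which points at preprocessing) or only late in a long transcript.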

Challenges

I have encountered the following challenges while working on the current implementation:

Tekken

The Tekken tokenizer only seems to be officially supported in Python so far.

My current solution has been to reimplement the tokenizer completely in Rust to remove the Python dependencies. I hope that we can find a better solution for this; otherwise, I will publish my code as its own crate.

Mel Spectrogram

The current implementation of Whisper's pcm_to_mel() in Candle does not seem to precisely match Python's WhisperFeatureExtractor, causing it to output different values.

My current solution is to rewrite the whole pcm_to_mel function externally (in the example) to better align with Python's. This is where I am struggling to understand what's going on, since Whisper is working fine in Candle.
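One way to narrow down where the two feature extractors disagree is to compare their outputs element by element. The sketch below is a debugging aid under stated assumptions, not part of this PR: `max_abs_diff` is a hypothetical helper, and it assumes both mel spectrograms have already been flattened into `f32` slices in the same layout (for instance, the Python values dumped to a file and loaded separately).

```rust
/// Returns the index and size of the largest absolute difference between two
/// equally sized buffers (hypothetical debugging helper, not part of Candle).
/// Returns None if the lengths differ or both buffers are empty.
/// NaN values are not handled specially.
pub fn max_abs_diff(reference: &[f32], candidate: &[f32]) -> Option<(usize, f32)> {
    if reference.len() != candidate.len() {
        return None;
    }
    let mut worst: Option<(usize, f32)> = None;
    for (i, (r, c)) in reference.iter().zip(candidate).enumerate() {
        let diff = (r - c).abs();
        // Keep the first element seen, then any element with a larger difference.
        if worst.map_or(true, |(_, w)| diff > w) {
            worst = Some((i, diff));
        }
    }
    worst
}
```

A length mismatch is reported as `None` on purpose: a different number of frames usually means the padding or framing differs, which is worth fixing before looking at individual values.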

eugenehp · 2025-12-25T04:08:12Z

Thank you for the good work! I'm curious whether you had a chance to work further on this model, and whether you have any drafts of the Tekken tokenizer and the PCM-to-mel conversion, or insights into why the output diverges between the Python and Rust implementations.

Thanks again and happy holidays!

jorge-menjivar · 2025-12-25T04:20:42Z

@eugenehp Yes! Support for Voxtral ended up being merged via #3036 and should produce the exact same output as Python. I ended up creating the tekken-rs crate to support Tekken in Rust.

eugenehp · 2025-12-25T08:36:23Z

Awesome and thank you!

maximizemaxwell added 9 commits July 20, 2025 13:41

feat: implement some configs in voxtral

baad80d

fix: fixed imports, implement more func

66fd6d9

feat: implemented full version, need fixes

412125a

fix: fixed some compile errors

586154f

feat: add initial examples

161d3d5

fix: fixed voxtral.rs

81f4c2d

fix: fixed compile errors in examples

13e22cf

fix: fixed compile errors

3c6827e

fix: update model integration

d1ffe9f

maximizemaxwell marked this pull request as draft July 22, 2025 12:08

maximizemaxwell wants to merge 9 commits into huggingface:main from maximizemaxwell:voxtral-support
