Feature Request: voxtral implementation #3030

Draft
maximizemaxwell wants to merge 9 commits into huggingface:main from maximizemaxwell:voxtral-support

Conversation

@maximizemaxwell

Contributor

What does this PR do?

Implements Voxtral and accompanying examples in Candle.

Issue

Part of #3028

Requirements

  • Needs further fixes before the code runs end to end

@maximizemaxwell maximizemaxwell marked this pull request as draft July 22, 2025 12:08
@jorge-menjivar

jorge-menjivar commented Jul 23, 2025

Contributor

Proposal

The Candle code should replicate the output of the Transformers implementation of Voxtral, word for word, when run with deterministic values. My current test consists of three different audio files.
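As a rough illustration of the word-level comparison described above, here is a minimal sketch that reports the first position where two transcripts diverge. The function name `first_word_mismatch` is hypothetical and is not part of Candle or Transformers; it assumes both transcripts are already available as plain strings and that splitting on whitespace is an acceptable definition of a word.

```rust
/// Hypothetical helper (not part of Candle): finds the first word position
/// where two transcripts differ. Returns `None` when they match word for word.
/// A `None` inside the tuple means that transcript ended early.
fn first_word_mismatch<'a>(
    expected: &'a str,
    actual: &'a str,
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let mut exp = expected.split_whitespace();
    let mut act = actual.split_whitespace();
    let mut index = 0;
    loop {
        match (exp.next(), act.next()) {
            // Both transcripts ended at the same time: no mismatch.
            (None, None) => return None,
            // Same word on both sides: keep going.
            (e, a) if e == a => index += 1,
            // Different words, or one transcript is shorter.
            (e, a) => return Some((index, e, a)),
        }
    }
}
```

Calling it with the Transformers transcript as `expected` and the Candle transcript as `actual` would give the index of the first differing word, which can help narrow down where in the audio the two implementations start to disagree.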

Challenges

I have encountered the following challenges while working on the current implementation:

Tekken

The Tekken tokenizer only seems to be officially supported in Python so far.

My current solution has been to reimplement the tokenizer completely in Rust to remove the Python dependencies. I hope that we can find a better solution for this; otherwise, I will publish my code as its own crate.

Mel Spectrogram

The current implementation of Whisper's pcm_to_mel() in Candle does not seem to precisely match Python's WhisperFeatureExtractor, causing it to output different values.

My current solution is to rewrite the whole pcm_to_mel function externally (in the example) to better align with the Python version. This is where I am struggling to understand what's going on, since Whisper is working fine in Candle.
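One way to localize this kind of divergence is to dump the mel values from both sides and compare them element by element. The following is a minimal sketch under stated assumptions: `MelDiff` and `compare_mel` are hypothetical names, both spectrograms are assumed to be flattened into `f32` slices in the same layout, and the reference values are assumed to have been exported from the Python feature extractor by some other means.

```rust
/// Hypothetical summary of how two flattened mel spectrograms differ.
#[derive(Debug, PartialEq)]
struct MelDiff {
    /// Largest absolute difference seen across all elements.
    max_abs_diff: f32,
    /// Index of the first element whose difference exceeds the tolerance.
    first_bad_index: Option<usize>,
}

/// Hypothetical helper (not part of Candle): compares two mel spectrograms
/// element-wise. Returns an error if the lengths differ, since that usually
/// points to a framing or padding mismatch rather than a numeric one.
fn compare_mel(reference: &[f32], candidate: &[f32], tolerance: f32) -> Result<MelDiff, String> {
    if reference.len() != candidate.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            reference.len(),
            candidate.len()
        ));
    }
    let mut diff = MelDiff { max_abs_diff: 0.0, first_bad_index: None };
    for (i, (r, c)) in reference.iter().zip(candidate).enumerate() {
        let d = (r - c).abs();
        if d > diff.max_abs_diff {
            diff.max_abs_diff = d;
        }
        if d > tolerance && diff.first_bad_index.is_none() {
            diff.first_bad_index = Some(i);
        }
    }
    Ok(diff)
}
```

A length mismatch and a small uniform numeric drift suggest different causes (padding or frame count versus window or filter-bank details), so reporting them separately can help decide where to look first. Note that NaN values compare as false here and would not be flagged.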

@eugenehp

Contributor

Thank you for the good work! I'm curious whether you had a chance to work further on this model, and whether you have any drafts of the Tekken tokenizer and the PCM-to-mel conversion, or insights into why the output diverges between the Python and Rust implementations.

Thanks again and happy holidays!

@jorge-menjivar

Contributor

@eugenehp Yes! Support for Voxtral ended up being merged via #3036 and should produce exactly the same output as Python. I ended up creating the tekken-rs crate to support Tekken in Rust.

@eugenehp

Contributor

Awesome and thank you!
