Feature Request: Voxtral implementation #3030
Proposal
The Candle code should replicate the output of the Transformers implementation of Voxtral, word by word, when run with deterministic values. My current test consists of three different audio files.
Challenges
I have encountered the following challenges while working on the current implementation:
Tekken
The Tekken tokenizer only seems to be officially supported in Python so far. My current solution has been to reimplement the tokenizer completely in Rust to remove the Python deps. I hope that we can find a better solution for this; otherwise, I will publish my code as its own crate.
Mel Spectrogram
The current implementation of Whisper's My current solution is to rewrite the whole
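For the word-by-word check described in the proposal, a minimal sketch of a comparison helper could look like the following. The names `Comparison` and `compare_transcripts` are hypothetical and not part of Candle or Transformers. It assumes both transcripts are already available as plain strings and that whitespace-separated tokens are an acceptable notion of a "word".

```rust
/// Outcome of comparing a reference transcript with a candidate transcript.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum Comparison {
    /// Every word matched and both transcripts have the same length.
    Match,
    /// The transcripts differ at word position `index` (0-based).
    /// `None` means that transcript ended before the other one.
    Diverges {
        index: usize,
        expected: Option<String>,
        actual: Option<String>,
    },
}

/// Compares two transcripts word by word and reports the first divergence.
/// `expected` would be the reference output (e.g. from the Python run),
/// `actual` the output of the Rust port. Assumption: words are split on
/// whitespace, so differences in spacing alone are ignored.
fn compare_transcripts(expected: &str, actual: &str) -> Comparison {
    let mut exp = expected.split_whitespace();
    let mut act = actual.split_whitespace();
    let mut index = 0;
    loop {
        match (exp.next(), act.next()) {
            // Both transcripts ended at the same time: full match.
            (None, None) => return Comparison::Match,
            // Same word on both sides: move on to the next position.
            (e, a) if e == a => index += 1,
            // Different words, or one transcript ended early.
            (e, a) => {
                return Comparison::Diverges {
                    index,
                    expected: e.map(str::to_owned),
                    actual: a.map(str::to_owned),
                }
            }
        }
    }
}
```

Reporting the index of the first differing word, rather than only pass or fail, makes it easier to see where in the audio the two implementations start to drift apart.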
Thank you for the good work! I'm curious whether you had a chance to work further on this model, and whether you have drafts of the Tekken tokenizer and the PCM-to-mel conversion, or any insights into why the output diverges between the Python and Rust implementations. Thanks again and happy holidays!
Awesome and thank you!
What does this PR do?
Implements Voxtral and accompanying examples in Candle.
Issue
Part of #3028
Requirements