Conversation
Hey @kozistr, thanks for this PR, but I'm afraid that since the model was trained in BF16, FP16 won't work: the hidden states will overflow and become NaN, i.e., fall out of the range they were trained for, as the maximum value FP16 supports is 65504. This is also the reason we didn't add support for it in the first place; we spotted it while collaborating with the Perplexity AI team on shipping this. Note that something similar happens with Gemma3, and ideally adding native BF16 support on both CUDA and Metal would work, but we're still yet to update
Thanks for the pointer, and fair enough! I was just thinking we could keep FP16 support as an option, though there's a chance of running into NaN issues. Anyway, let's just roll with it :)
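The overflow concern above can be sketched numerically. This is a minimal illustration (not code from this PR) using numpy's half-precision type: a value that is well within BF16's dynamic range overflows to infinity when cast to FP16, and subsequent arithmetic on infinities yields NaN.

```python
import numpy as np

# FP16's largest finite value is 65504; BF16 keeps FP32's 8-bit exponent
# and can represent magnitudes up to ~3.4e38.
assert np.finfo(np.float16).max == 65504.0

# An activation magnitude that is harmless in BF16...
x = np.float32(70000.0)

# ...overflows to inf when cast down to FP16.
as_fp16 = np.float16(x)
print(np.isinf(as_fp16))  # True

# Once an inf appears, downstream ops (e.g. residual subtraction,
# softmax normalization) can produce NaN: inf - inf is NaN.
print(np.float16(np.inf) - np.float16(np.inf))  # nan
```

This is why simply enabling FP16 for a BF16-trained checkpoint can silently poison the hidden states rather than fail loudly.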
What does this PR do?
To support FP16 precision for the pplx1 model, I added FP16 support along with the flash-attention version.
Note: this still needs to be tested on GPU.
Before submitting
insta snapshots?

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@alvarobartt