Support Flash Pplx1 model #848

Closed
kozistr wants to merge 2 commits into huggingface:main from kozistr:feature/support-pplx-fp16

Conversation

@kozistr
Contributor

@kozistr kozistr commented Mar 22, 2026

What does this PR do?

This PR adds FP16 support for the pplx1 model, along with a flash-attention version.

Note: this still needs to be tested on a GPU.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@alvarobartt

@alvarobartt alvarobartt self-requested a review March 22, 2026 10:58
@alvarobartt
Member

Hey @kozistr, thanks for this PR, but I'm afraid that because the model was trained in BF16, FP16 won't work: the hidden states will overflow and become NaN, i.e., fall outside the range the weights were trained for, since the maximum value FP16 supports is 65504.

Also, this is the reason we didn't add support for it in the first place: we spotted the issue while collaborating with the Perplexity AI team on shipping this model.

Note that something similar happens with Gemma3. Ideally, adding native BF16 support on both CUDA and Metal would solve this, but we still need to update candle to the latest version and merge #809.
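The overflow behavior described above can be reproduced outside the model. The following is a minimal sketch (using numpy, not code from this PR) of why activations that are ordinary for a BF16-trained model blow up in FP16: FP16's largest finite value is 65504, while BF16 shares float32's exponent range, so a moderately large activation overflows to inf and then poisons downstream arithmetic with NaN.

```python
import numpy as np

# FP16's largest representable finite value.
print(np.finfo(np.float16).max)  # 65504.0

# A hypothetical activation value that BF16 represents comfortably
# (BF16 has float32's exponent range, max ~3.4e38)...
activation = np.float32(1e5)

# ...overflows to infinity when cast down to FP16:
fp16_val = np.float16(activation)
print(fp16_val)  # inf

# Downstream arithmetic on the overflowed value then yields NaN,
# which is how the hidden states "explode" in practice:
print(fp16_val - fp16_val)  # nan
```

This is why casting BF16 checkpoints to FP16 cannot be fixed by scaling alone: any intermediate value above 65504 is irrecoverably lost.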

@kozistr
Contributor Author

kozistr commented Mar 23, 2026

Hey @kozistr, thanks for this PR, but I'm afraid that because the model was trained in BF16, FP16 won't work: the hidden states will overflow and become NaN, i.e., fall outside the range the weights were trained for, since the maximum value FP16 supports is 65504.

Also, this is the reason we didn't add support for it in the first place: we spotted the issue while collaborating with the Perplexity AI team on shipping this model.

Note that something similar happens with Gemma3. Ideally, adding native BF16 support on both CUDA and Metal would solve this, but we still need to update candle to the latest version and merge #809.

Thanks for pointing that out, and fair enough! I was thinking we could keep FP16 support as an option, even though there's a risk of NaN issues, but let's just roll with it :)

@kozistr kozistr deleted the feature/support-pplx-fp16 branch March 23, 2026 12:52
