Conversation
Hey @kozistr, thanks for this PR, but I'm afraid that since the model was trained in BF16, FP16 won't work: the hidden states will overflow and become NaN, i.e., fall out of the range they were trained for, as the maximum value FP16 supports is 65504. This is also the reason we didn't add support for it in the first place; we spotted it while collaborating with the Perplexity AI team on shipping this. Note that something similar happens with Gemma3, and ideally adding native BF16 support on both CUDA and Metal would work, but we're still yet to update
Thanks for the pointer, and fair enough! I was just thinking we could keep FP16 support as an option, though there's a chance of running into NaN issues. Anyway, let's just roll with it :)
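The overflow concern above can be sketched numerically. This is a minimal illustration (not code from this PR) using numpy's half-precision type: a value that is well within BF16's dynamic range overflows to infinity when cast to FP16, and subsequent arithmetic on infinities yields NaN.

```python
import numpy as np

# FP16's largest finite value is 65504; BF16 keeps FP32's 8-bit exponent
# and can represent magnitudes up to ~3.4e38.
assert np.finfo(np.float16).max == 65504.0

# An activation magnitude that is harmless in BF16...
x = np.float32(70000.0)

# ...overflows to inf when cast down to FP16.
as_fp16 = np.float16(x)
print(np.isinf(as_fp16))  # True

# Once an inf appears, downstream ops (e.g. residual subtraction,
# softmax normalization) can produce NaN: inf - inf is NaN.
print(np.float16(np.inf) - np.float16(np.inf))  # nan
```

This is why simply enabling FP16 for a BF16-trained checkpoint can silently poison the hidden states rather than fail loudly.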
What does this PR do?
To support FP16 precision for the pplx1 model, I added FP16 support along with the flash-attention version.
Note: this still needs to be tested on GPU.
Before submitting
insta snapshots?

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@alvarobartt