fix clipping logic, add test for clipping functions #73
|
I've merged both clamps into one. Could you double-check whether it still looks good? I'll get the tests to run/pass in a few hours. |
06fb5f7 to
d099575
- fix clip_at:
* remove the redundant clip_at = max(clip_at, eps) in
_compilable_rmsnorm_clip_
* remove the clip_at = max(clip_at, eps) in _clip and in
_compilable_global_rmsnorm_clip_. These lines made the numerator
of the scalar equal to max(clip_at, eps) instead of just clip_at.
- divide only once by numel in _compilable_global_rmsnorm_clip_ for
better perf/elegance
- other straightforward fixes
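
To make the two changes in this commit message concrete, here is a minimal pure-Python sketch. It is not the library's code: `rms`, `global_rms`, and `clip_scale` are hypothetical names, lists of floats stand in for tensors, and the `max(norm, eps)` guard on the denominator is an assumption made only for illustration.

```python
import math


def rms(values):
    """Root-mean-square of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def global_rms(tensors):
    """RMS over several 'tensors' (lists), dividing by the element count once."""
    total_sq = sum(v * v for t in tensors for v in t)
    total_numel = sum(len(t) for t in tensors)
    return math.sqrt(total_sq / total_numel)


def clip_scale(norm, clip_at, eps=1e-8, clamp_numerator=False):
    """Factor that scales an update so its norm is at most clip_at.

    clamp_numerator=True mimics the removed `clip_at = max(clip_at, eps)`
    line, which turns the numerator into max(clip_at, eps).
    """
    numerator = max(clip_at, eps) if clamp_numerator else clip_at
    denominator = max(norm, eps)  # assumed guard against division by zero
    return min(1.0, numerator / denominator)


update = [3.0, 4.0]
norm = rms(update)
plain = clip_scale(norm, clip_at=0.0)
clamped = clip_scale(norm, clip_at=0.0, clamp_numerator=True)
```

With `clip_at=0.0`, `plain` is exactly zero while `clamped` is a small positive number, so the two versions only disagree when `clip_at` is below `eps`.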
|
Sorry for the delay. I have been recovering from surgery.

There are some issues with the edits. I fixed the straightforward ones in the new commit. Please take a look at it.

Now for the more nuanced "issues" (considerations): The equality that you wrote down is true, but the original expression is not equal to This being said, the use of So, the code is not wrong, but it does not match PyTorch. As I stated in the issue, PyTorch uses the following logic (which is what I copied exactly):

If you think/know that using Now consider the value you used for |
|
Awesome, thank you for the detailed breakdown! Yeah, the max is on purpose. In my tests, the max empirically converges better by preserving numerical accuracy for longer. This is not the same semantics as torch, but it's consistent with how eps is handled in the rest of the library. I especially appreciate you catching the x vs x32 multiplication issue! Merging this now. Feel free to contribute again if you spot other issues! Regardless, I hope your surgery went well and wish you a swift recovery. |
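
For readers following the thread, here is a rough sketch of the semantic difference discussed above. `torch.nn.utils.clip_grad_norm_` computes its coefficient as `max_norm / (total_norm + 1e-6)` and clamps it to at most 1.0. The second function is a hypothetical max-based variant, assuming the max under discussion acts as a floor on the norm. Plain floats stand in for tensors, and both function names are made up for this example.

```python
def additive_eps_coef(total_norm, max_norm, eps=1e-6):
    """PyTorch-style coefficient: eps is added to the norm, result capped at 1."""
    return min(1.0, max_norm / (total_norm + eps))


def max_eps_coef(total_norm, max_norm, eps=1e-6):
    """Hypothetical variant: eps is only a lower bound on the norm."""
    return min(1.0, max_norm / max(total_norm, eps))


# For a norm well above eps the two differ only slightly:
# the additive form always shrinks the update a little more.
a = additive_eps_coef(2.0, 1.0)
b = max_eps_coef(2.0, 1.0)
```

Under these assumptions, the max-based form leaves the ratio untouched whenever the norm exceeds `eps`, while the additive form perturbs every ratio by a small amount.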
Fixes #72.
Current state of the test:
I believe the ValueError is due to some unrelated issue. The AssertionError occurs at the very beginning of a clip function, before anything is done to the input tensors. Therefore, it seems that this error is due to Muon just not behaving well when its updates/grads are clipped. I am not really all that familiar with Muon, so let me know if the presence of this error makes sense to you and what you would like me to change to remedy it.