v3.1.1
- New Optimizer:
  - KLShampoo (adapted from https://arxiv.org/abs/2509.03378)
    - Note: HeavyBall's KLShampoo deviates from the paper to improve stability over long training horizons. It adds (1) PSGD's dampening and (2) PSGD's preconditioner balancing.
- cautious_weight_decay: a boolean flag, available in every HeavyBall optimizer, that switches between standard weight decay and cautious weight decay.
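Preconditioner balancing, as used in PSGD, relies on one property of Kronecker-factored preconditioners: scaling one factor up and another down by the same amount leaves their Kronecker product unchanged. Rescaling the factors to a common magnitude therefore keeps any single factor from drifting toward overflow or underflow over long runs. The sketch below illustrates the idea on plain Python matrices. It equalizes each factor's largest absolute entry, which follows some public PSGD Kron implementations. The helpers `balance_factors` and `kron` are hypothetical, and this is not HeavyBall's KLShampoo code.

```python
import math


def balance_factors(factors):
    """Hypothetical helper: rescale Kronecker factors so each has the same
    largest absolute entry, without changing their Kronecker product.

    Assumes every factor has at least one nonzero entry.
    """
    # Largest absolute entry of each factor (matrices are lists of lists).
    norms = [max(abs(x) for row in f for x in row) for f in factors]
    # Geometric mean of the norms; the scales g / n_i multiply to exactly 1,
    # so the Kronecker product of the rescaled factors is unchanged.
    g = math.prod(norms) ** (1.0 / len(norms))
    return [[[x * (g / n) for x in row] for row in f] for f, n in zip(factors, norms)]


def kron(a, b):
    """Hypothetical helper: dense Kronecker product of two list-of-list matrices."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


# Two factors with very different magnitudes.
q1 = [[100.0, 0.0], [0.0, 50.0]]
q2 = [[0.01, 0.0], [0.0, 0.02]]
b1, b2 = balance_factors([q1, q2])
```

After balancing, both factors share the same largest entry (about 1.414 here) while `kron(b1, b2)` matches `kron(q1, q2)` up to floating-point rounding.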
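Cautious weight decay is commonly described as decoupled weight decay applied only to coordinates where the optimizer update and the parameter share a sign. On those coordinates the update is already shrinking the parameter toward zero, so the decay does not fight the update direction. The sketch below contrasts the two modes that the flag selects between, on scalar parameters. The helper `step_with_decay` is hypothetical. Counting a zero product as sign agreement is an assumption of this sketch, and HeavyBall's actual implementation may differ.

```python
def step_with_decay(params, updates, lr, weight_decay, cautious):
    """Hypothetical helper: one step p <- p - lr * (u + weight_decay * m * p).

    Plain weight decay uses m = 1 for every coordinate. Cautious weight decay
    uses m = 1 only where the update u and the parameter p share a sign
    (u * p >= 0; treating 0 as agreement is an assumption), and m = 0 elsewhere.
    """
    new_params = []
    for p, u in zip(params, updates):
        apply_decay = (u * p >= 0) if cautious else True
        decay = weight_decay * p if apply_decay else 0.0
        new_params.append(p - lr * (u + decay))
    return new_params


params = [1.0, 1.0, -2.0]
updates = [0.5, -0.5, -1.0]
plain = step_with_decay(params, updates, lr=0.1, weight_decay=0.1, cautious=False)
careful = step_with_decay(params, updates, lr=0.1, weight_decay=0.1, cautious=True)
```

Only the middle coordinate differs between the two runs. Its update (-0.5) points away from zero relative to the parameter (1.0), so cautious mode skips its decay, giving 1.05 instead of 1.04.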