v3.1.1

@ClashLuke released this 27 Apr 15:46
  • New Optimizer: KLShampoo (adapted from https://arxiv.org/abs/2509.03378)
    • Note: HeavyBall's KLShampoo differs from the paper to improve stability over long training horizons. It adds (1) PSGD's dampening and (2) PSGD's preconditioner balancing.
  • cautious_weight_decay: a boolean flag in every HeavyBall optimizer that switches between standard weight decay and cautious weight decay
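
To illustrate what the cautious_weight_decay flag toggles, here is a minimal pure-Python sketch. It is not HeavyBall's implementation. It assumes the common formulation of cautious weight decay, in which decay is applied only to coordinates where the optimizer update and the parameter have the same sign; the function name `decay_step` and its arguments are hypothetical and exist only for this example.

```python
def decay_step(params, updates, lr, weight_decay, cautious):
    """One SGD-style step with decoupled weight decay over flat lists of floats.

    `updates` holds the direction that is subtracted from the parameters.
    Assumption (not taken from HeavyBall's source): with `cautious=True`,
    decay is applied only where update and parameter share a sign.
    """
    new_params = []
    for p, u in zip(params, updates):
        # Standard decay shrinks every coordinate; the cautious variant
        # skips coordinates where the update and the parameter disagree in sign.
        apply_decay = (not cautious) or (p * u > 0)
        decay = weight_decay * p if apply_decay else 0.0
        new_params.append(p - lr * (u + decay))
    return new_params


params = [1.0, -2.0, 3.0]
updates = [0.5, 0.5, -1.0]

standard = decay_step(params, updates, lr=0.1, weight_decay=0.1, cautious=False)
masked = decay_step(params, updates, lr=0.1, weight_decay=0.1, cautious=True)
```

In this toy input only the first coordinate has matching signs, so the two variants agree there and differ on the other two coordinates, where the cautious variant applies the update without any decay.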