HeavySOAP, Buckets, and bugfixes

@ClashLuke released this 13 May 15:31
24fce43
  • Speed
    • Bucketing: same-shape params are batched into one kernel call, which is 10–20% faster. There is less launch overhead and better batching in operations that support it
  • New Optimizers
    • HeavyKLSOAP / HeavyKLShampoo use the Moore–Penrose pseudo-inverse instead of an eps-regularized inverse on the KL eigenvalues, so collapsed directions don't blow up
    • The HeavyKLSOAP / HeavySOAP / HeavySOAPNAdam / HeavySOAPLaProp / HeavySOAPAdEMAMix / HeavySOLP family tracks the eigenbasis with all moments
  • Bugfixes
    • ADOPT now runs ADOPT; it previously ran SGD
    • SAM no longer recompiles per ball_size
    • Caution flag no longer leaks across multi_tensor=False params
  • Breaking
    • Checkpoints now load under weights_only=True. This might cause issues with pickled custom precond schedules
    • Unknown kwargs are now errors, not warnings, so typos fail at construction. This will cause issues with custom functions that rely on uncaptured kwargs!
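
The pseudo-inverse change listed under New Optimizers can be illustrated on a bare list of eigenvalues. This is a minimal sketch, not the library's code: the function names, the `eps` value, and the relative cutoff `rel_tol` are all assumptions chosen for illustration.

```python
def eps_regularized_inverse(eigenvalues, eps=1e-8):
    """Invert each eigenvalue after adding eps (eps is an assumed value).

    A collapsed (near-zero) eigenvalue maps to roughly 1 / eps, which is huge.
    """
    return [1.0 / (lam + eps) for lam in eigenvalues]


def pseudo_inverse(eigenvalues, rel_tol=1e-6):
    """Moore-Penrose style inverse of a diagonal spectrum.

    Eigenvalues at or below rel_tol * max(|eigenvalue|) are treated as zero
    and map to zero instead of a very large number. The cutoff rule is an
    assumption made for this sketch.
    """
    largest = max((abs(lam) for lam in eigenvalues), default=0.0)
    cutoff = rel_tol * largest
    return [1.0 / lam if abs(lam) > cutoff else 0.0 for lam in eigenvalues]


spectrum = [2.0, 0.5, 0.0]  # the last direction has collapsed
regularized = eps_regularized_inverse(spectrum)
pinv = pseudo_inverse(spectrum)
```

With the eps-regularized inverse, the collapsed direction is scaled by about 1 / eps, while the pseudo-inverse leaves it at zero and inverts the healthy directions exactly.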
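
For code that forwards a shared config dict to an optimizer, the stricter kwargs check under Breaking may require separating known from unknown keys before construction. The sketch below uses only the standard library; `filter_kwargs` and `StrictOptimizer` are hypothetical names introduced for illustration and are not part of the released API.

```python
import inspect


def filter_kwargs(factory, kwargs):
    """Split kwargs into those `factory` accepts by name and those it does not."""
    params = inspect.signature(factory).parameters
    # A factory that declares **kwargs accepts any name, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


class StrictOptimizer:
    """Hypothetical stand-in for an optimizer that rejects unknown kwargs."""

    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        self.params = params
        self.lr = lr
        self.weight_decay = weight_decay


config = {"lr": 0.01, "weight_decay": 0.1, "warmup_stpes": 100}  # note the typo
accepted, rejected = filter_kwargs(StrictOptimizer, config)
opt = StrictOptimizer([], **accepted)
```

Inspecting `rejected` rather than discarding it keeps the benefit of the new behavior: a misspelled key such as `warmup_stpes` is surfaced instead of being silently ignored.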