Skip to content

Koratahiu/Advanced_Optimizers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

103 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Advanced Optimizers (AIO)

A comprehensive, all-in-one collection of state-of-the-art optimization algorithms for deep learning. Designed for maximum efficiency, minimal memory footprint, and superior performance across diverse model architectures and training scenarios.

PyPI version Python versions License


📦 Installation

pip install adv_optm

Requires PyTorch 2.3+ for torch.compile support.


What's New

🌟 Version 2.5.x: The Massive Refactor

This major update introduces a complete architectural refactor of the library:

🆕 New Optimizers & Scaling

  • SinkSGD_adv: Added a powerful new optimizer to the lineup.
  • Spectral Scaling: Now available across all optimizers, achieving width/rank invariant updates for highly stable training.

💾 Memory & State Precision Control

  • Granular State Precision (state_precision): Drastically reduce memory overhead with new optimizer state modes:
    • factored (Rank-2 factored mode)
    • fp32 (Full precision)
    • bf16_sr & int8_sr (BF16/Int8 with Stochastic Rounding)
  • Factored Second Moment (factored_2nd): Available for all Adam variants. Works seamlessly alongside any state_precision setting to further slash memory usage.

⚙️ Advanced Dynamics & Momentum

  • Variance Normalized Momentum (normed_momentum): Applies optimizer normalization before momentum (Normalization then Momentum/NtM). Available for AdamW_adv, SignSGD_adv, and SinkSGD_adv.
  • Universal Nesterov Momentum: Replaced the hard-to-tune Simplified_AdEMAMix with Nesterov momentum (nesterov) and a dedicated coefficient (nesterov_coef) across all optimizers.
  • Preconditioning & Signs:
    • Added Variance/Confidence Preconditioning (snr_cond) for SignSGD_adv and SinkSGD_adv (requires normed_momentum). Read the technical reports: AASS & sink-v.
    • Added Adaptive Stochastic Sign with $L_\infty$ preconditioning (stochastic_sign) for SignSGD_Adv and Lion_adv.
  • Improved CANS (accelerated_ns): Enhanced for Muon variants by integrating a dynamic lower bound.
  • New OrthoGrad modes (orthogonal_gradient): Standard OrthoGrad flattened and a new matrix-wise mode iterative.

⚓ Weight Decay Innovations

  • Centered Weight Decay (centered_wd): Pulls weights toward their pre-train state (anchor). To save memory, anchor precision (centered_wd_mode) can be set to full, float8, int8, or int4.
  • Fisher Weight Decay (fisher_wd): Now available for Adam variants based on the FAdam paper.
  • Geometric Weight Decay: Added specifically for SinkSGD_adv and SignSGD_adv.

(Note: Lion_Prodigy_adv, Simplified_AdEMAMix, and heuristic cautious/grams modes have been deprecated in favor of these superior, theoretically-grounded features).

Click to see older release notes (v1.2.x - v2.1.x)

Version 2.1.x

  • New Optimizer: Added Signum (SignSGD with momentum) to the SignSGD_adv family.

Version 2.0.x

  • torch.compile Support: Fully implemented for all advanced optimizers. Enable via compiled_optimizer=True to heavily fuse and optimize the optimizer step path.
  • 📉 1-Bit Factored Mode: Vastly improved implementation via nnmf_factor=True.
  • 🛠️ Broad performance and stability improvements across all optimizers.

Version 1.2.x

  • Advanced Muon Variants: Brought the groundbreaking Muon optimizer into the fold, enriched with features from recent literature.
Optimizer Description
Muon_adv Advanced Muon implementation featuring CANS, NorMuon, Low-Rank Orthogonalization, and more.
AdaMuon_adv Combines Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization.
  • Prodigy Speedup: Prodigy variants are now 50% faster by eliminating unnecessary CUDA syncs (Shoutout to @dxqb!).
  • Stochastic Rounding for BF16: Parameter updates and weight decay now accumulate in float32 and round once at the end.
  • Cautious Weight Decay: Implemented for all advanced optimizers (Paper).
  • Fused Operations: Transitioned to fused and in-place operations wherever possible.

💡 Core Innovations

(Documentation expanding on the theory and usage of these features is coming soon!)

About

A family of highly efficient, lightweight yet powerful optimizers.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages