feat(pt): optimize HybridMuon by borrowing some ideas from the DeepSeek-V4 paper #5424
OutisLi wants to merge 1 commit into deepmodeling:master from
Conversation
Pull request overview
This PR updates the PyTorch HybridMuon optimizer configuration and implementation to align with reported DeepSeek‑V4 calibration choices (two-stage Newton–Schulz schedule and match-RMS coefficient), and adjusts related training/config plumbing.
Changes:
- Change the HybridMuon match-RMS scaling default (`lr_adjust_coeff`) from 0.2 to 0.18 and update the documentation accordingly.
- Remove the training-time auto-disable of `enable_gram` in distributed mode.
- Update HybridMuon's orthogonalization and Adam epsilon behavior (two-stage Newton–Schulz; `ADAM_EPS=1e-20`) and introduce a Dynamo cache size tweak (a rough sketch of the two-stage schedule follows this list).
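The two-stage schedule replaces a single run of Newton–Schulz iterations with a fast phase followed by a short polish phase. Below is a minimal sketch of the idea; the coefficients, step counts, and the `two_stage_newton_schulz` helper name are illustrative assumptions, not the calibrated values or API this PR adds to `deepmd/pt/optimizer/hybrid_muon.py`.

```python
import torch

# Illustrative placeholders -- the PR's calibrated values live in hybrid_muon.py.
NS_EPS = 1e-7
FAST_STEPS, FAST_COEFFS = 4, (3.4445, -4.7750, 2.0315)   # aggressive quintic steps
POLISH_STEPS, POLISH_COEFFS = 2, (1.5, -0.5, 0.0)        # classical cubic refinement


def two_stage_newton_schulz(grad: torch.Tensor) -> torch.Tensor:
    """Approximately orthogonalize a 2-D gradient in two phases."""
    x = grad / (grad.norm() + NS_EPS)       # scale so singular values are <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:                           # iterate on the wide orientation
        x = x.mT
    for steps, (a, b, c) in ((FAST_STEPS, FAST_COEFFS), (POLISH_STEPS, POLISH_COEFFS)):
        for _ in range(steps):
            gram = x @ x.mT                  # Gram matrix of the current iterate
            x = a * x + (b * gram + c * (gram @ gram)) @ x
    return x.mT if transposed else x
```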
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `deepmd/utils/argcheck.py` | Updates HybridMuon optimizer defaults/docs (notably `lr_adjust_coeff`) and removes the distributed-disable wording for `enable_gram` (a hypothetical config fragment follows the table). |
| `deepmd/pt/train/training.py` | Changes HybridMuon optimizer construction to pass `enable_gram` directly (no longer forced off under distributed training). |
| `deepmd/pt/optimizer/hybrid_muon.py` | Implements the DeepSeek-style two-stage Newton–Schulz schedule, updates defaults/docs, changes Adam epsilon handling, and adjusts Dynamo cache behavior. |
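For reference, a hypothetical fragment of the optimizer section of a training config combining the touched knobs could look like the sketch below; the key names mirror the PR description, but the exact nesting and any surrounding keys are assumptions, so the merged `deepmd/utils/argcheck.py` docs remain the source of truth.

```python
# Hypothetical optimizer-config fragment; nesting and extra keys are assumptions.
opt_config = {
    "opt_type": "HybridMuon",
    "lr_adjust_coeff": 0.18,  # new match-RMS default (was 0.2)
    "enable_gram": True,      # no longer auto-disabled under distributed training
}
```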
📝 Walkthrough
The pull request updates the HybridMuon optimizer with a two-stage Newton–Schulz orthogonalization schedule (fast and polish phases), introduces distinct epsilon values for Adam operations, adjusts default parameters, and integrates PyTorch Dynamo cache sizing. It also simplifies the distributed training logic for the `enable_gram` option.
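To make the match-RMS coefficient and the "distinct epsilon values" points concrete, here is a rough sketch under stated assumptions: the `lr_adjust_coeff * sqrt(max(dim))` form follows the common Muon match-RMS heuristic and may not be exactly what HybridMuon computes, and the helper names are invented for illustration.

```python
import math

import torch

ADAM_EPS = 1e-20          # tiny epsilon this PR uses for the Adam denominator
LR_ADJUST_COEFF = 0.18    # new match-RMS coefficient default


def match_rms_scale(ortho_update: torch.Tensor) -> torch.Tensor:
    """Rescale an orthogonalized update so its magnitude is comparable to Adam's."""
    return ortho_update * LR_ADJUST_COEFF * math.sqrt(max(ortho_update.shape))


def adam_denominator(exp_avg_sq: torch.Tensor) -> torch.Tensor:
    """Adam-style denominator; the very small ADAM_EPS barely perturbs sqrt(v)."""
    return exp_avg_sq.sqrt() + ADAM_EPS
```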
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
deepmd/pt/optimizer/hybrid_muon.py (1)
120-129: Avoid mutating `torch._dynamo.config.cache_size_limit` at module import time.

Importing this module unconditionally bumps a process-global PyTorch Dynamo setting, even for callers that never instantiate `HybridMuonOptimizer` (e.g., serving / inference entry points that merely import the optimizer registry). The cache-size bump is only needed for `_GramNewtonSchulzOrthogonalizer._compiled_call` (the sole `torch.compile` site in this file), so the side effect should be scoped to where it's actually required.

♻️ Move the bump into `_GramNewtonSchulzOrthogonalizer.__init__`
```diff
 import torch
-import torch._dynamo.config as _dynamo_config
 from torch.optim.optimizer import (
     Optimizer,
 )

 DYNAMO_CACHE_SIZE_LIMIT = 64
-_dynamo_config.cache_size_limit = max(
-    int(_dynamo_config.cache_size_limit),
-    DYNAMO_CACHE_SIZE_LIMIT,
-)

     def __init__(self) -> None:
+        # Gram NS compiles per-shape; bump Dynamo's cache budget locally so the
+        # repeated recompilation across MLIP parameter shapes doesn't spill,
+        # without polluting global state for unrelated torch.compile users.
+        import torch._dynamo.config as _dynamo_config
+
+        _dynamo_config.cache_size_limit = max(
+            int(_dynamo_config.cache_size_limit),
+            DYNAMO_CACHE_SIZE_LIMIT,
+        )
         # Gram path uses NS_EPS (same numerical role as Standard NS norm clamp).
```
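Whichever placement is chosen, the important property is that the bump stays idempotent (a `max` of the current and target limits), so running it more than once is harmless. A tiny sketch of that pattern as a shared helper, with an assumed name and the PR's limit of 64 inlined as the default:

```python
# Sketch of a shared, idempotent bump helper; the function name is an assumption.
def _ensure_dynamo_cache_budget(limit: int = 64) -> None:
    import torch._dynamo.config as _dynamo_config

    _dynamo_config.cache_size_limit = max(int(_dynamo_config.cache_size_limit), limit)
```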
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7a925eec-d673-4562-93b0-abf4226eb0cf
📒 Files selected for processing (3)
- deepmd/pt/optimizer/hybrid_muon.py
- deepmd/pt/train/training.py
- deepmd/utils/argcheck.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #5424      +/-   ##
==========================================
+ Coverage   82.39%   82.42%    +0.03%
==========================================
  Files         824      824
  Lines       87395    87418       +23
  Branches     4197     4197
==========================================
+ Hits        72009    72055       +46
+ Misses      14111    14088       -23
  Partials     1275     1275
```

☔ View full report in Codecov by Sentry.
Summary by CodeRabbit
Release Notes