## Summary
Fixes #1373
The learning rate warmup formula in `megatron/learning_rates.py` produces LR=0 at step 0, causing the first training step to be a complete no-op: the step-1 checkpoint is identical to the step-0 checkpoint.

## The Bug
The warmup formula on line 70 is:
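```python
# Reconstructed from the summary above; the surrounding code in
# megatron/learning_rates.py may differ slightly.
return float(self.start_lr) * num_iters_ / self.warmup_iter
```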
At step 0, `num_iters_` is 0, so the returned LR is `start_lr * 0 / warmup_iter = 0`. This means the gradient update at step 0 is multiplied by zero: the model parameters don't change at all.
## The Fix
Using `max(1, num_iters_)` ensures step 0 gets the same small nonzero LR that step 1 would have received (`start_lr / warmup_iter`), rather than zero. This is the minimal fix: it avoids introducing new parameters or changing the config API. The one-line change is sketched below.

Before/after behavior (`warmup_iter=1000`, `start_lr=1e-3`):
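| Step | LR before fix | LR after fix |
|------|---------------|--------------|
| 0    | 0.0           | 1e-6         |
| 1    | 1e-6          | 1e-6         |
| 2    | 2e-6          | 2e-6         |
| 1000 | 1e-3          | 1e-3         |

(Values follow directly from the warmup formula above.)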
Steps 1+ are completely unchanged. Only step 0 is affected.
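For reference, a minimal standalone sketch of the change (a hypothetical helper, not the actual `AnnealingLR` class) that reproduces the numbers above:

```python
def warmup_lr(step, start_lr=1e-3, warmup_iter=1000, fixed=True):
    """Warmup LR at a given step; `fixed` applies the max(1, ...) patch."""
    num_iters = max(1, step) if fixed else step
    return start_lr * num_iters / warmup_iter

for step in (0, 1, 2):
    print(step, warmup_lr(step, fixed=False), warmup_lr(step, fixed=True))
# 0 0.0 1e-06   <- before the fix, step 0 trains with LR=0 (a no-op)
# 1 1e-06 1e-06
# 2 2e-06 2e-06
```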
## Impact
This bug affects all models trained with gpt-neox using LR warmup, including all Pythia models on the HuggingFace Hub (as noted by @StellaAthena in the issue). In every case, the step-1 checkpoint is identical to the step-0 checkpoint because the first training step does nothing.