
feat: online reward normalization (Welford’s algorithm) #427

Open
RUFFY-369 wants to merge 4 commits into NousResearch:main from RUFFY-369:feat/reward-normalization

RUFFY-369 commented Mar 30, 2026

PR Type

  • Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

Added an online reward normalizer to BaseEnv to keep training stable as the reward distribution shifts. I used Welford’s online algorithm to maintain the running mean and variance behind the z-score calculation, which keeps memory at O(1) and avoids storing large reward histories.

I included a configurable warmup_steps phase so normalization doesn't start shifting the reward distribution until the running mean/std estimates have stabilized. This is intended to mitigate the gradient explosions often seen in the early stages of RL training.
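For reviewers unfamiliar with the technique, here is a minimal, standalone sketch of a Welford-style running z-score with a warmup gate. It is not the code in this PR: the class name `RunningZScore`, the `eps` parameter, the use of population variance, and the choice to return the raw reward during warmup are all assumptions made for illustration.

```python
import math


class RunningZScore:
    """Hypothetical sketch of a running z-score (not the PR's RewardNormalizer)."""

    def __init__(self, warmup_steps: int = 10, eps: float = 1e-8):
        self.warmup_steps = warmup_steps  # assumed meaning: samples seen before normalizing
        self.eps = eps                    # assumed guard against division by zero
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, reward: float) -> None:
        # Welford's update: O(1) memory, no reward history kept.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; sample variance would divide by count - 1.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, reward: float) -> float:
        self.update(reward)
        if self.count < self.warmup_steps:
            # Assumption: rewards pass through unchanged during warmup.
            return reward
        return (reward - self.mean) / (self.std + self.eps)
```

With `warmup_steps=3`, the first two rewards come back unchanged and the third is the first to be returned as a z-score against the statistics accumulated so far.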

Related Issues

Type of Change

  • New feature (non-breaking change which adds functionality)

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes (21/21 verified)
  • Docstrings added for all new public classes / functions
  • If .env vars required, did you add it to the .env.example in repo root? (N/A)

RUFFY-369 and others added 4 commits March 28, 2026 03:31
…lity

Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.
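The min-max mode, clipping, and checkpoint support listed above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: `MinMaxState`, `clip`, `save_state`, and `load_state` are hypothetical names, and JSON is an assumed serialization format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MinMaxState:
    """Hypothetical running min/max tracker (not the PR's RewardNormalizer)."""

    count: int = 0
    lo: float = float("inf")
    hi: float = float("-inf")

    def update(self, reward: float) -> None:
        # Track the observed range without storing individual rewards.
        self.count += 1
        self.lo = min(self.lo, reward)
        self.hi = max(self.hi, reward)

    def normalize(self, reward: float, eps: float = 1e-8) -> float:
        # Maps rewards inside the observed range to [0, 1].
        span = self.hi - self.lo
        if self.count == 0 or span < eps:
            return 0.0  # assumption: degenerate range maps to 0
        return (reward - self.lo) / span


def clip(value: float, limit: float) -> float:
    """Clamp a normalized reward to [-limit, limit]."""
    return max(-limit, min(limit, value))


def save_state(state: MinMaxState) -> str:
    """Serialize the running statistics for a checkpoint."""
    return json.dumps(asdict(state))


def load_state(payload: str) -> MinMaxState:
    """Restore running statistics from a checkpoint payload."""
    return MinMaxState(**json.loads(payload))
```

Saving only the running statistics keeps checkpoints small, which matches the "no data storage" property of the Welford-based mode.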