feat: reward ensembles and inter-rater reliability metrics by RUFFY-369 · Pull Request #426 · NousResearch/atropos

RUFFY-369 · 2026-03-30T21:02:28Z

PR Type

RL Environment PR - Complete Environment Snapshot & Zero-Training sections
Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

I’ve added a new EnsembleReward class to atroposlib/envs/reward_fns/ to mitigate reward hacking and high reward variance in RL training. Instead of relying on a single scoring model, it aggregates the outputs of multiple scorers using one of four strategies: mean, median, min, or majority_vote.
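The following is a minimal, standalone sketch of the four aggregation strategies. It is not the EnsembleReward code from this PR. The names `aggregate`, `ensemble_reward`, `Scorer` and the `threshold` parameter are hypothetical. The majority_vote semantics are also an assumption: each score is binarized at a threshold, and the result is 1.0 when a strict majority of scorers pass.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical scorer type: maps a completion string to a float reward.
Scorer = Callable[[str], float]


def aggregate(scores: Sequence[float], strategy: str = "mean", threshold: float = 0.5) -> float:
    """Combine per-scorer rewards for one item into a single reward."""
    if not scores:
        raise ValueError("need at least one score")
    if strategy == "mean":
        return statistics.fmean(scores)
    if strategy == "median":
        # Robust to a single outlier scorer in either direction.
        return statistics.median(scores)
    if strategy == "min":
        # Pessimistic: the item is only as good as its harshest scorer.
        return min(scores)
    if strategy == "majority_vote":
        # Assumption: a scorer "votes" pass when its score >= threshold;
        # the item gets 1.0 if a strict majority passes, else 0.0.
        votes = sum(1 for s in scores if s >= threshold)
        return 1.0 if votes * 2 > len(scores) else 0.0
    raise ValueError(f"unknown strategy: {strategy!r}")


def ensemble_reward(completion: str, scorers: Sequence[Scorer], strategy: str = "mean") -> float:
    """Score one completion with every scorer, then aggregate."""
    return aggregate([score(completion) for score in scorers], strategy)
```

With scores [0.9, 0.8, 0.1], mean gives about 0.6, median 0.8, min 0.1, and majority_vote 1.0 (two of three scorers clear 0.5). min is the most conservative choice when any single scorer might be exploitable, while median tolerates one outlier out of three.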

I also integrated Krippendorff's alpha to track inter-rater reliability (IRR) across the ensemble. It is an observability signal for catching cases where the scorers fundamentally disagree, which is often a sign that the agent has found an unaligned edge case.
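A minimal sketch of Krippendorff's alpha follows. It assumes the interval (squared-difference) metric, which suits continuous reward scores. The PR description does not say which distance metric is used, and `krippendorff_alpha_interval` is a hypothetical name. The sketch computes alpha = 1 - D_o / D_e. Here D_o is the observed disagreement within each item, and D_e is the disagreement expected if all scores were pooled across items.

```python
from itertools import combinations
from typing import Optional, Sequence


def krippendorff_alpha_interval(ratings: Sequence[Sequence[Optional[float]]]) -> float:
    """Krippendorff's alpha with the interval (squared-difference) metric.

    ratings[u][r] is scorer r's reward for item u, or None if missing.
    """
    # Keep only items rated by at least two scorers ("pairable" values).
    units = [[v for v in unit if v is not None] for unit in ratings]
    units = [u for u in units if len(u) >= 2]
    all_values = [v for u in units for v in u]
    n = len(all_values)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: within-item squared differences, each item
    # weighted by 1 / (m_u - 1). Unordered pairs are doubled to count
    # both orderings, matching the coincidence-matrix definition.
    observed = 0.0
    for u in units:
        within = sum((a - b) ** 2 for a, b in combinations(u, 2))
        observed += 2.0 * within / (len(u) - 1)
    d_o = observed / n

    # Expected disagreement: squared differences across all pairable values.
    expected = 2.0 * sum((a - b) ** 2 for a, b in combinations(all_values, 2))
    d_e = expected / (n * (n - 1))
    if d_e == 0:
        # All values identical: alpha is 0/0; returning 1.0 is a convention.
        return 1.0
    return 1.0 - d_o / d_e
```

For example, `[[1, 2], [3, 4]]` (two items, two scorers each) gives alpha = 0.7. Values near 1 mean the scorers agree, values near 0 mean agreement no better than chance, and negative values mean systematic disagreement. The pairwise loop is O(n²) in the number of ratings. For large batches, the expected term can instead be computed from the sum of values and the sum of squared values.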

Related Issues

Part of [Enhancement] RL Training Infrastructure Stabilization & Observability #431 (RL Infrastructure Enhancements)

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update
Code refactor (no functional changes)
Build/CI/CD related changes
Other (please describe):

✅ Developer & Reviewer Checklist

Code follows project style (black, isort, flake8 pass with pre-commit)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
New and existing unit tests pass locally with my changes (17/17 verified)
Docstrings added for all new public classes / functions
If .env vars required, did you add it to the .env.example in repo root? (N/A)

…ability

Add EnsembleReward to atroposlib/envs/reward_fns/ with:
- Multiple aggregation strategies: mean, median, min, majority_vote
- Krippendorff's alpha inter-rater reliability metric
- Per-item disagreement tracking for reward hacking detection
- Full integration with RewardRegistry

17/17 tests passing.
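Per-item disagreement tracking could look roughly like the following sketch, which flags items whose scores spread too far apart across scorers. The `ItemDisagreement` dataclass, the `track_disagreement` function, and the `max_spread` threshold are hypothetical. The PR does not document how per-item disagreement is measured.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ItemDisagreement:
    index: int     # position of the item in the batch
    spread: float  # max - min across scorers
    stdev: float   # population standard deviation across scorers
    flagged: bool  # True if spread exceeds the threshold


def track_disagreement(ratings: Sequence[Sequence[float]], max_spread: float = 0.5) -> List[ItemDisagreement]:
    """Summarize scorer disagreement for each item (ratings[item][scorer])."""
    report = []
    for i, scores in enumerate(ratings):
        spread = max(scores) - min(scores)
        report.append(
            ItemDisagreement(
                index=i,
                spread=spread,
                stdev=statistics.pstdev(scores),
                flagged=spread > max_spread,
            )
        )
    return report
```

The range is cheap to compute and easy to threshold. The standard deviation is included as a smoother alternative. Flagged items are candidates for manual inspection for reward hacking.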

for more information, see https://pre-commit.ci

RUFFY-369 and others added 3 commits March 28, 2026 03:22

Merge branch 'NousResearch:main' into feat/reward-ensemble

f60fd92

[pre-commit.ci] auto fixes from pre-commit.com hooks

f68ae5e


This was referenced Mar 30, 2026:

feat: online reward normalization (Welford’s algorithm) #427 (Open)

feat: API performance tracking and final infra integration #430 (Open)

[Enhancement] RL Training Infrastructure Stabilization & Observability #431 (Open)

RUFFY-369 wants to merge 3 commits into NousResearch:main from RUFFY-369:feat/reward-ensemble
