use more robust validation, defined at dataset level by CarlosGomes98 · Pull Request #806 · mlcommons/training

CarlosGomes98 · 2025-07-23T12:58:51Z

This PR changes the validation procedure to make it uniform across submitters, so that implementation errors are harder to make. The changes are as follows:

Background

During a model forward step, the model attempts to denoise a latent. To do this, we take a latent from an image (this will be our ground truth) and add noise to it. The amount of noise we add depends on the timestep, a value from 0 to 1. Naturally, the more noise we add, the harder the denoising task, and so the larger the loss we should expect.
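As a rough illustration of how the timestep controls the noise level, here is a minimal sketch. The linear blend between latent and noise is an assumption (a rectified-flow-style interpolation); this description does not state the exact formula, and `add_noise` is a hypothetical helper, not a function from the reference.

```python
import random

def add_noise(latent, noise, t):
    """Blend a clean latent with noise: t=0 keeps the latent, t close to 1 is mostly noise.

    ASSUMPTION: linear interpolation; the reference may use a different schedule.
    """
    return [(1.0 - t) * x + t * n for x, n in zip(latent, noise)]

rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for an encoded image
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]

def distance_from_truth(t):
    """Euclidean distance between the noised latent and the ground-truth latent."""
    noisy = add_noise(latent, noise, t)
    return sum((a - b) ** 2 for a, b in zip(noisy, latent)) ** 0.5

for t in (0.0, 0.5, 0.875):
    # The noised latent drifts further from the ground truth as t grows.
    print(t, distance_from_truth(t))
```

Under this assumption the distance from the ground truth grows linearly with t, which matches the intuition that larger timesteps give a harder denoising task.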

Validation

For validation, we follow the Flux paper. We use 8 equally spaced timesteps from [0, 1), i.e. (0, 1/8, 2/8, 3/8, 4/8, 5/8, 6/8, 7/8), and try to sample equally from them.
This means we have to select one of these timesteps for each validation sample.
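The timestep grid itself is small enough to write out. A minimal sketch, where `NUM_TIMESTEPS` and `timesteps` are illustrative names only:

```python
NUM_TIMESTEPS = 8

# Eight equally spaced values in [0, 1): 0, 1/8, ..., 7/8.
timesteps = [i / NUM_TIMESTEPS for i in range(NUM_TIMESTEPS)]
print(timesteps)
```

All eight values are exactly representable as binary floats, so comparing them for equality is safe here.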

Current approach

Currently, this is done dynamically at train time. If each sample has an index # that corresponds to its order, its timestep is (# % 8) / 8. In other words, we cycle from 0 to 7 over and over.
While this is fine for standard training, I realized there are a few edge cases that make it less than ideal for a benchmark:

There might be some unusual combinations of batch sizes and numbers of devices that don't evenly divide the validation dataset. This might mean some timesteps get slightly more samples than others, so submitters would calculate slightly different validation losses.
We rely on submitters to correctly implement the same logic as the reference. If they don't, they might calculate a different validation loss.
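A small sketch of how the dynamic rule can drift. It compares the rule applied to a global sample order with a hypothetical implementation in which every device restarts its own counter. The sharding scheme and function names are assumptions made for illustration, not the behaviour of any particular submission.

```python
from collections import Counter

NUM_TIMESTEPS = 8

def dynamic_timestep(order):
    """The rule described above: (# % 8) / 8."""
    return (order % NUM_TIMESTEPS) / NUM_TIMESTEPS

def global_histogram(num_samples):
    """Timestep counts when every sample uses its global order."""
    return Counter(dynamic_timestep(i) for i in range(num_samples))

def per_device_histogram(num_samples, num_devices):
    """Timestep counts when each device counts its own samples from zero.

    ASSUMPTION: contiguous shards whose sizes differ by at most one.
    """
    counts = Counter()
    base, extra = divmod(num_samples, num_devices)
    for rank in range(num_devices):
        shard_size = base + (1 if rank < extra else 0)
        counts.update(dynamic_timestep(i) for i in range(shard_size))
    return counts

print(sorted(global_histogram(30).items()))
print(sorted(per_device_histogram(30, 4).items()))
```

With 30 samples on 4 devices the two histograms disagree (for example, timestep 7/8 is used 3 times with a global order but only 2 times with per-device counters), so the averaged validation loss would differ as well.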

Proposed solution

Rather than doing this dynamically, I propose that, at validation dataset creation time, we associate each sample with a timestep. This sample will always be evaluated with the same timestep, regardless of parallelism scheme or framework. This ensures everyone calculates exactly the same validation metric.
The order used for this is exactly the same as the one the reference currently generates dynamically, so there is no need to regenerate RCPs (I verified this anyway, and the convergence is unchanged).
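A minimal sketch of the idea, assuming each validation record is a dictionary with hypothetical `id` and `timestep` fields (the actual dataset format is not described here). The timestep is fixed once at creation time, so any later sharding reads back the same value for every sample.

```python
NUM_TIMESTEPS = 8

def build_validation_set(sample_ids):
    """Attach a fixed timestep to every sample at dataset creation time."""
    return [
        {"id": sample_id, "timestep": (order % NUM_TIMESTEPS) / NUM_TIMESTEPS}
        for order, sample_id in enumerate(sample_ids)
    ]

def shard(records, num_devices):
    """Round-robin sharding (hypothetical); any other scheme would do."""
    return [records[rank::num_devices] for rank in range(num_devices)]

def timesteps_seen(shards):
    """Map sample id -> timestep as read back from the shards."""
    return {rec["id"]: rec["timestep"] for part in shards for rec in part}

records = build_validation_set([f"img_{i:03d}" for i in range(30)])

# The per-sample timestep does not depend on how many devices are used.
print(timesteps_seen(shard(records, 1)) == timesteps_seen(shard(records, 4)))
```

Because the timestep travels with the sample, the number of devices and the batch size no longer influence which timestep a sample is evaluated at.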

github-actions · 2025-07-23T12:59:00Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

ShriyaRishab · 2025-07-23T15:38:33Z

To avoid confusion, can we update the pseudocode in the Quality metric section so that t is not computed but is obtained directly from the dataset? It will help submitters to see pseudocode for what they need to implement, rather than for how the validation dataset was originally generated.

You can add an appendix at the end with the pseudocode used to generate the timesteps for the validation dataset, as an FYI.
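For illustration, a validation loop that reads t from the dataset could look like the sketch below. The record fields, `validation_loss`, and `toy_loss` are hypothetical, and averaging per-sample losses with a plain mean is an assumption; the Quality metric section of the README remains the authoritative definition.

```python
def validation_loss(records, loss_fn):
    """Average per-sample loss, reading t from each record instead of computing it."""
    total = 0.0
    for rec in records:
        t = rec["timestep"]  # stored in the dataset, never derived from sample order
        total += loss_fn(rec["latent"], t)
    return total / len(records)

def toy_loss(latent, t):
    """Placeholder loss that grows with t; a real run would call the model here."""
    return t * sum(x * x for x in latent)

records = [
    {"latent": [1.0, 2.0], "timestep": 0.0},
    {"latent": [1.0, 2.0], "timestep": 0.5},
    {"latent": [1.0, 2.0], "timestep": 0.875},
]

# Processing order does not change which timestep each sample is paired with.
print(validation_loss(records, toy_loss))
print(validation_loss(list(reversed(records)), toy_loss))
```

Since each record carries its own timestep, reordering or resharding the records leaves every (sample, t) pair unchanged.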

use more robust validation, defined at dataset level

20e83ee

CarlosGomes98 requested a review from a team as a code owner July 23, 2025 12:58

Merge branch 'master' into flux/robust_validation

2789097

ShriyaRishab reviewed Jul 23, 2025

CarlosGomes98 added 2 commits July 23, 2025 17:41

clearer readme

91cdaa7

change name to flux1, add randomness info to readme

3e11b7c

ShriyaRishab approved these changes Jul 25, 2025

ShriyaRishab merged commit c627b4a into mlcommons:master Jul 25, 2025
1 check passed

github-actions Bot locked and limited conversation to collaborators Jul 25, 2025

ShriyaRishab merged 4 commits into mlcommons:master from CarlosGomes98:flux/robust_validation
