In the configs for both B200 and MI355X, the target loss is set to 3.2, whereas the rules specify 3.34.
Config:
https://github.com/mlcommons/training/blob/master/small_llm_moe_pretraining/primus/config_MI355X_1x8x1.sh#L75
Rules:
https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#41-closed-division