Add distribution_strategy and all_reduce_alg flags to TensorFlow BERT pretraining#745

Closed
rapsealk wants to merge 2 commits into
mlcommons:masterfrom
rapsealk:rapsealk/add-flags-to-tf-bert
@rapsealk rapsealk commented Jun 5, 2024

Member

Hello mlcommons team!

I've noticed that some utilities, such as additional flags, are missing from BERT pretraining, unlike ResNet50 image classification. This pull request should make it easier to run BERT training in a distributed environment.

References are below:

if distribution_strategy:
    flags.DEFINE_string(
        name="distribution_strategy", short_name="ds", default="mirrored",
        help=help_wrap("The Distribution Strategy to use for training. "
                       "Accepted values are 'off', 'one_device', "
                       "'mirrored', 'parameter_server', 'collective', "
                       "case insensitive. 'off' means not to use "
                       "Distribution Strategy; 'default' means to choose "
                       "from `MirroredStrategy` or `OneDeviceStrategy` "
                       "according to the number of GPUs.")
    )

if all_reduce_alg:
    flags.DEFINE_string(
        name="all_reduce_alg", short_name="ara", default=None,
        help=help_wrap("Defines the algorithm to use for performing all-reduce."
                       "When specified with MirroredStrategy for single "
                       "worker, this controls "
                       "tf.contrib.distribute.AllReduceCrossTowerOps. When "
                       "specified with MultiWorkerMirroredStrategy, this "
                       "controls "
                       "tf.distribute.experimental.CollectiveCommunication; "
                       "valid options are `ring` and `nccl`."))

Refs
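
For readers who want to see how the two flag values described in the help text might be normalized before a strategy is built, here is a minimal sketch. It is not code from this pull request or from the referenced utilities: `resolve_distribution`, `ACCEPTED_STRATEGIES`, `COLLECTIVE_ALGS`, and the `num_gpus` parameter are hypothetical names. The rule used for `'default'` (more than one GPU selects `mirrored`, otherwise `one_device`) and the case-insensitive handling of `all_reduce_alg` are assumptions; the help text only says the choice depends on the number of GPUs.

```python
# Hypothetical helper for illustration only; not part of the PR.
# Strategy names come from the quoted help text.
ACCEPTED_STRATEGIES = (
    "off", "one_device", "mirrored", "parameter_server", "collective", "default",
)
# The help text lists these as valid with MultiWorkerMirroredStrategy.
COLLECTIVE_ALGS = ("ring", "nccl")


def resolve_distribution(distribution_strategy="mirrored",
                         all_reduce_alg=None,
                         num_gpus=1):
    """Return a normalized (strategy, all_reduce_alg) pair or raise ValueError."""
    # The help text says strategy names are case insensitive.
    strategy = distribution_strategy.lower()
    if strategy not in ACCEPTED_STRATEGIES:
        raise ValueError(f"Unknown distribution_strategy: {distribution_strategy!r}")

    # Assumption: 'default' picks mirrored for several GPUs, one_device otherwise.
    if strategy == "default":
        strategy = "mirrored" if num_gpus > 1 else "one_device"

    # Assumption: the algorithm name is also treated as case insensitive.
    alg = all_reduce_alg.lower() if all_reduce_alg is not None else None

    # Only the collective case has its valid options spelled out in the help text.
    if strategy == "collective" and alg is not None and alg not in COLLECTIVE_ALGS:
        raise ValueError(f"all_reduce_alg must be one of {COLLECTIVE_ALGS}, got {alg!r}")

    return strategy, alg


if __name__ == "__main__":
    print(resolve_distribution("Default", None, num_gpus=8))
    print(resolve_distribution("collective", "NCCL", num_gpus=8))
```

The sketch stays free of TensorFlow imports on purpose, so the validation logic can be read and run on its own; mapping the returned names to actual `tf.distribute` strategy objects is left out.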

@rapsealk rapsealk requested a review from a team as a code owner June 5, 2024 00:51
github-actions Bot commented Jun 5, 2024


MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@rapsealk rapsealk marked this pull request as draft June 5, 2024 03:35
@rapsealk rapsealk marked this pull request as ready for review June 5, 2024 04:21
ShriyaRishab commented Aug 8, 2025

Contributor

Closing, as BERT is retired and has been replaced with Llama 3.1 8B.

@github-actions github-actions Bot locked and limited conversation to collaborators Aug 8, 2025