[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs on single node #808

Merged: ShriyaRishab merged 3 commits into mlcommons:master from jameslai-dev:dev_ssd_custom_dataset_gpus on Aug 11, 2025
@jameslai-dev jameslai-dev commented Jul 28, 2025

Dear MLCommons team,

I appreciate your work, which has helped me verify the training performance after applying hardware resource virtualization to our bare-metal server.

I want to validate model training performance within a reasonable time in both virtualized and non-virtualized environments. However, this is very challenging with the default Open Images dataset and our relatively small GPUs in a single node. I therefore made some changes to the performance evaluation script, which may help others with similar use cases.

This pull request modifies the run_and_time.sh of SSD training, and introduces the following changes:

@jameslai-dev jameslai-dev requested a review from a team as a code owner July 28, 2025 08:32
github-actions Bot commented Jul 28, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@jameslai-dev

recheck

EVALBATCHSIZE=${EVALBATCHSIZE:-${BATCHSIZE}}
NUMEPOCHS=${NUMEPOCHS:-30}
LOG_INTERVAL=${LOG_INTERVAL:-20}
DATASET=${DATASET:-"openimages-mlperf"}

Contributor

MLPerf requires us to use the same dataset as the reference so that results are comparable across submissions, so we should not make it a changeable parameter.

Contributor Author

Understood. I've reverted the dataset customization commit.
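
For readers less familiar with the pattern under review: the `${VAR:-default}` expansions in the quoted lines take a value from the environment when it is set and non-empty, and fall back to the default otherwise. The following is only a sketch of how a script might keep tuning knobs overridable while pinning the dataset, as the reviewer asks. It is not the actual content of `run_and_time.sh`; the variable name `NUM_GPUS` and the `BATCHSIZE` default of 32 are hypothetical placeholders, not taken from this pull request.

```shell
# Sketch only: NOT the real run_and_time.sh.

# Overridable from the environment; the last three mirror the quoted lines.
BATCHSIZE=${BATCHSIZE:-32}                    # 32 is a hypothetical placeholder
EVALBATCHSIZE=${EVALBATCHSIZE:-${BATCHSIZE}}  # falls back to BATCHSIZE
NUMEPOCHS=${NUMEPOCHS:-30}
LOG_INTERVAL=${LOG_INTERVAL:-20}

# Hypothetical name for the single-node GPU count; defaults to 1.
NUM_GPUS=${NUM_GPUS:-1}

# Reject anything that is not a positive integer.
case "$NUM_GPUS" in
  ''|*[!0-9]*|0) echo "NUM_GPUS must be a positive integer" >&2; exit 1 ;;
esac

# Pinned: a plain assignment ignores any DATASET exported by the caller,
# so the reference dataset cannot be swapped from the environment.
DATASET="openimages-mlperf"
readonly DATASET

echo "gpus=${NUM_GPUS} batch=${BATCHSIZE} eval_batch=${EVALBATCHSIZE} dataset=${DATASET}"
```

Under these assumptions, a caller could run something like `NUM_GPUS=2 ./run_and_time.sh` to change the GPU count, while exporting `DATASET` would have no effect.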

@jameslai-dev changed the title from "[single_stage_detector] Updated run_and_time.sh for customizing datasets and number of GPUs" to "[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs" on Aug 9, 2025
@jameslai-dev changed the title from "[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs" to "[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs on single node" on Aug 9, 2025
@ShriyaRishab ShriyaRishab merged commit c7b2283 into mlcommons:master Aug 11, 2025
1 check passed
@github-actions github-actions Bot locked and limited conversation to collaborators Aug 11, 2025

2 participants