included llama3.1 8b small llm training scripts #799

Closed
ZixianWangAMD wants to merge 29 commits into mlcommons:master from ZixianWangAMD:small_llm_pretraining_new

@ZixianWangAMD

Contributor

No description provided.

@ZixianWangAMD ZixianWangAMD requested a review from a team as a code owner June 25, 2025 05:21
@github-actions

MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form](https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
@zixian Wang
Zixian Wang does not appear to be a GitHub user. You need a GitHub account once you become an MLCommons member. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting `recheck` in this Pull Request.

Comment thread small_language_model_pretraining/nemo/README.md Outdated
Comment thread small_language_model_pretraining/nemo/README.md Outdated
Comment thread small_language_model_pretraining/nemo/README.md Outdated
Comment thread small_language_model_pretraining/nemo/config_MI325X_1x8x1_8b.sh Outdated
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py Outdated
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py Outdated
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py Outdated
Comment thread small_language_model_pretraining/nemo/utils/download_hf_llama3.sh Outdated
@mmarcinkiewicz

Contributor

@ZixianWangAMD may I ask for a Dockerfile that I can use to test it on H100? Or at least a hint on how to modify the existing Dockerfile?

@ShriyaRishab

Contributor

Based on Training WG feedback, can you please rename the folder from `small_language_model_pretraining` to `small_llm_pretraining`, since that is the agreed-upon long name for this benchmark?
The short name (which will be used everywhere in the code and logging) will be `llama3.1_8b`.

Comment thread small_language_model_pretraining/nemo/config_H200_1x8x1_8b.sh Outdated
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py Outdated
# warmup_steps = math.ceil(57600 * 8192 / 8192 / gbs * 0.1)
# # 230k samples
max_steps = math.ceil(230000 * 8192 / 8192 / gbs)
warmup_steps = math.ceil(230000 * 8192 / 8192 / gbs * 0.1)

Contributor

max_steps should be fixed at 1.2M
warmup_steps should be parametrizable from the config
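
One way the request above could look in `pretrain_llama31.py` is sketched below. It is an illustration under assumptions, not the change that was made: "1.2M" is read as 1,200,000 steps, and the `WARMUP_STEPS` environment variable, the `resolve_warmup_steps` helper, and its default of 0 are hypothetical names chosen for the example, on the assumption that the `config_*.sh` files export settings to the training script.

```python
import os

# Assumption: "1.2M" in the review comment means 1,200,000 optimizer steps.
MAX_STEPS = 1_200_000


def resolve_warmup_steps(env=os.environ, default=0):
    """Read warmup steps from a hypothetical WARMUP_STEPS config variable.

    Falls back to `default` when the variable is not set, and rejects
    values that cannot fit inside the fixed schedule length.
    """
    raw = env.get("WARMUP_STEPS")
    if raw is None:
        return default
    steps = int(raw)
    if not 0 <= steps <= MAX_STEPS:
        raise ValueError(f"WARMUP_STEPS must be in [0, {MAX_STEPS}], got {steps}")
    return steps


# max_steps no longer depends on the global batch size;
# warmup_steps comes from whatever the config exported.
max_steps = MAX_STEPS
warmup_steps = resolve_warmup_steps()
```

With this shape, a config file would only need a line such as `export WARMUP_STEPS=512` (value illustrative) to change the warmup length, while the schedule length stays the same across systems.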

Comment thread small_language_model_pretraining/nemo/config_H200_1x8x1_8b.sh Outdated
Comment thread small_language_model_pretraining/nemo/config_MI325X_1x8x1_8b.sh Outdated
Comment thread small_language_model_pretraining/nemo/config_MI325X_1x8x1_8b.sh Outdated
Comment thread small_language_model_pretraining/nemo/dev/run_docker.sh
Comment thread small_language_model_pretraining/nemo/dev/run_llama31.sh
Comment thread small_language_model_pretraining/nemo/dev/run_llama31.sh Outdated
Comment thread small_language_model_pretraining/nemo/dev/run_llama31.sh Outdated
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py
Comment thread small_language_model_pretraining/nemo/dev/run_llama31.sh
Comment thread small_language_model_pretraining/nemo/pretrain_llama31.py
Once Rclone is installed, run the following command to authenticate with the bucket:

```
rclone config create mlc-training s3 provider=Cloudflare access_key_id=76ea42eadb867e854061a1806220ee1e secret_access_key=a53625c4d45e3ca8ac0df8a353ea3a41ffc3292aa25259addd8b7dc5a6ce2936 endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```

Will there be a new pre-tokenized dataset to download? This still points to the dataset for 405B.

Contributor

Yes, it is being uploaded; once that is done, we will update the instructions.

ShriyaRishab previously approved these changes Aug 8, 2025
@ShriyaRishab

ShriyaRishab commented Aug 8, 2025

Contributor

@ZixianWangAMD - can you please sign the CLA?

@ethanself

ethanself commented Aug 13, 2025

Contributor

> @ZixianWangAMD - can you please sign the CLA?

Zixian has signed the CLA as user ZixianWangAMD, but the check is failing because of an incorrect Git config: the commits were made locally with the author name "Zixian Wang", and GitHub usernames cannot contain spaces.

I did notice that Zixian appears to have multiple GitHub accounts.

To set the correct global configuration:
git config --global user.name "Your Correct Name Here"
git config --global user.email "your.email@example.com"

OR

Set local configuration for a specific repo:
git config user.name "Project Specific Name"
git config user.email "project.email@example.com"

Once the config is fixed, the local branch will need to be rebased to correct the author recorded on the existing commits.
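
The rebase step could be scripted roughly as below. This is a sketch, not the procedure used on this PR: the helper names and the `upstream/master` base ref are illustrative assumptions. It relies on `git rebase --exec` together with `git commit --amend --no-edit --reset-author`, which re-commits each replayed commit with the currently configured `user.name` and `user.email`.

```python
import shlex
import subprocess
from typing import List


def reset_author_rebase_cmd(base: str) -> List[str]:
    """Build a rebase command that rewrites the author of every commit after `base`.

    After each replayed commit, the --exec step amends it in place and resets
    its author to the identity currently set in the Git config.
    """
    return [
        "git", "rebase", base,
        "--exec", "git commit --amend --no-edit --reset-author",
    ]


def rewrite_authors(base: str, repo_dir: str = ".") -> None:
    """Run the rebase inside repo_dir; raises CalledProcessError if git fails."""
    subprocess.run(reset_author_rebase_cmd(base), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only print the command here; "upstream/master" is a placeholder base ref.
    print(shlex.join(reset_author_rebase_cmd("upstream/master")))
```

Because the rewritten commits get new hashes, the PR branch would then have to be force-pushed, for example with `git push --force-with-lease`.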

@suachong suachong mentioned this pull request Aug 13, 2025
@ShriyaRishab

Contributor

Closing as a duplicate of #814.

@github-actions github-actions Bot locked and limited conversation to collaborators Aug 15, 2025
7 participants