-
Notifications
You must be signed in to change notification settings - Fork 8
Modify TheRock manylinux Dockerfile to take an arch build arg. #470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,6 +5,8 @@ FROM quay.io/pypa/manylinux_2_28_x86_64 | |
| ARG THEROCK_INDEX_URL=https://rocm.nightlies.amd.com/v2/gfx950-dcgpu/ | ||
| # TheRock version number (e.g. "7.13.0a20260401" or "" for latest). | ||
| ARG THEROCK_VERSION="" | ||
| # GPU architecture / device family (selects the rocm[device-<arch>] extra). | ||
| ARG GFX_ARCH=gfx950 | ||
|
|
||
| # Install patchelf and headers for numactl | ||
| RUN --mount=type=cache,target=/var/cache/dnf \ | ||
|
|
@@ -20,7 +22,7 @@ RUN --mount=type=cache,target=/var/cache/dnf \ | |
| RUN --mount=type=cache,mode=0755,target=/root/.cache/pip \ | ||
| /opt/python/cp312-cp312/bin/python3 -m pip install \ | ||
| --pre --index-url "${THEROCK_INDEX_URL}" \ | ||
| "rocm[libraries,devel,device-gfx950]${THEROCK_VERSION:+==}${THEROCK_VERSION}" && \ | ||
| "rocm[libraries,devel,device-${GFX_ARCH}]${THEROCK_VERSION:+==}${THEROCK_VERSION}" && \ | ||
|
Comment on lines
-23
to
+25
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are many cases where you may want to install support for multiple devices, but this only allows a single See the docs at https://github.com/ROCm/TheRock/blob/main/RELEASES.md#installing-multi-arch-rocm-python-packages.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess you could technically set the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ScottTodd We usually only run tests in our CI with a single arch at a time. Would this be a blocker for TheRock CI? I understand that the workaround is a little unwieldy. I am making a PR to clean up the other comments but none of those will change current functionality.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here's some of what we do with gfx arch lists:
Practically speaking, in the CI context when we build for a specific list of targets that list will be what those packages treat as "all" (in tarballs, in python package device extras, etc.), so jobs like the pytorch and jax builds could either use an explicit list of all targets that were built or they can look up the list from the provided rocm packages / use "all". So I don't think this would necessarily be blocking for integration with TheRock's CI/CD as
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ScottTodd I have just opened a PR here: #473 to fix this just in case. |
||
| ln -sf /opt/python/cp312-cp312/bin/rocm-sdk /usr/local/bin/rocm-sdk && \ | ||
| rocm-sdk init | ||
|
|
||
|
|
@@ -61,4 +63,5 @@ RUN --mount=type=cache,target=/var/cache/dnf \ | |
| dnf clean all | ||
|
|
||
| LABEL com.amdgpu.therock_index_url="$THEROCK_INDEX_URL" \ | ||
| com.amdgpu.therock_version="$THEROCK_VERSION" | ||
| com.amdgpu.therock_version="$THEROCK_VERSION" \ | ||
| com.amdgpu.gfx_arch="$GFX_ARCH" | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still using a per-family index as the default, and that index does not include packages supporting any
device-*extras (pip will warn but not error when you try to install an extra that does not exist).I'd suggest switching to https://rocm.nightlies.amd.com/whl-multi-arch/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a default which is never used. The multi-arch index is passed in through CI so we never actually use the single-arch index anymore. This shouldn't affect functionality. I can clean this up though. You are right that this shouldn't be around, even as a leftover artifact.