feat: Add JAX release support v0.10.0#6054
Conversation
✅ All Checks Passed — Ready for Review
📖 Need help? See the Policy FAQ for details on every check and how to fix failures. |
|
🎉 All checks passed! This PR is ready for review. |
There was a problem hiding this comment.
It would be great if we could move to wheels instead of tarballs and environment variable testing. Unit testing with theRock wheels without any environment variables also allows us to surface packaging issues if there are any. That is how we caught previous packaging issues. This way theRock CI not only tests libraries and systems bugs, it will also test the actual packaging
|
@ScottTodd , Is there any blocker for now from your side? If not. good to merge? |
|
I can do another review today. 15 other PRs in the queue. |
| build_mode: | ||
| description: "Build mode: native or manylinux." | ||
| type: string | ||
| required: true |
There was a problem hiding this comment.
nit: If there are only two expected string values, for workflow_dispatch you could use a "choice" input type to make this easier to use correctly.
| jax_repository: | ||
| description: "Repository containing the JAX release branch." | ||
| type: string | ||
| required: true |
There was a problem hiding this comment.
Can you add a default here, or at least mention which repositories are expected in the description? What syntax is this expecting, ROCm/jax or https://github.com/ROCm/jax?
| gfx_arch: | ||
| description: "GFX architecture used for the manylinux image build." | ||
| type: string | ||
| required: false | ||
| default: "" |
There was a problem hiding this comment.
Add an example to the description? We have a few syntax styles (gfx94X, gfx942, gfx942-dcgpu, etc.) and that helps a lot when working across multiple workflows.
| repository: | ||
| description: "Repository to checkout. Defaults to github.repository." | ||
| type: string | ||
| default: "" | ||
| ref: | ||
| description: "Branch, tag, or SHA to checkout. Defaults to the triggering ref." | ||
| type: string | ||
| default: "" |
There was a problem hiding this comment.
These repository and ref inputs aren't commonly provided when triggered by developers, so I'd prefer to keep them as the last inputs in the list. Looks like a few other workflows have similarly started adding inputs beyond them though.
I would move jax_repository, build_mode, and gfx_arch near the top of the inputs list since they will be set frequently.
| - name: Build JAX Wheels in manylinux container | ||
| if: ${{ inputs.build_mode == 'manylinux' }} | ||
| working-directory: jax-source | ||
| env: | ||
| ROCM_VERSION: ${{ inputs.rocm_version }} | ||
| PYTHON_VERSION: ${{ inputs.python_version }} | ||
| run: | | ||
| docker run --rm \ |
There was a problem hiding this comment.
Can you also update the docs at https://github.com/ROCm/TheRock/tree/main/external-builds/jax#build-instructions ? (The general progression for development work and feature enablement should be local builds --> dev builds --> release builds, this jumps straight to release builds in github actions workflows which is the hardest to debug)
| rocm_package_find_links_url: | ||
| description: "ROCm package index / find-links URL for the JAX manylinux build." | ||
| type: string | ||
| default: "https://rocm.devreleases.amd.com/whl-multi-arch/" |
There was a problem hiding this comment.
Remove default that won't work here too
| - name: Install ROCm packages | ||
| run: | | ||
| python -m pip install --upgrade pip | ||
| python -m pip install --pre \ | ||
| --index-url "${ROCM_PACKAGE_FIND_LINKS_URL}" \ |
There was a problem hiding this comment.
Wait. Which URL format do you want? They are different. The variable name here says "find links url" but it's installing with --index-url?
| --index-url "${ROCM_PACKAGE_FIND_LINKS_URL}" \ | ||
| rocm \ | ||
| rocm-sdk-devel \ | ||
| rocm-sdk-device-gfx942 |
There was a problem hiding this comment.
test_amdgpu_family is an input but this assumes that gfx942 is all that needs to be installed?
Use this code instead:
- ROCm:
TheRock/.github/workflows/test_rocm_wheels.yml
Lines 147 to 176 in 1371136
- torch:
TheRock/.github/workflows/test_pytorch_wheels.yml
Lines 148 to 157 in 1371136
TheRock/.github/workflows/test_pytorch_wheels.yml
Lines 198 to 212 in 1371136
| python -m pip install --pre \ | ||
| --index-url "${ROCM_PACKAGE_FIND_LINKS_URL}" \ | ||
| rocm \ | ||
| rocm-sdk-devel \ |
There was a problem hiding this comment.
Do JAX tests need rocm-sdk-devel? They shouldn't. Devel packages are for building, and most users of frameworks are not expected to install them.
See my other comment, this should just install rocm[libraries,device-gfxXXXX], or maybe something like jax[device-gfxXXXX] in the future if we want to go the same route as we did for torch extras.
| rocm-sdk-device-gfx942 | ||
| rocm-sdk init | ||
| echo "ROCM_ROOT=$(rocm-sdk path --root)" >> "$GITHUB_ENV" | ||
| ln -sfn "$(rocm-sdk path --root)" /opt/rocm |
There was a problem hiding this comment.
Please don't symlink rocm python packages to /opt/rocm/.
- Framework should not be dependent on the devel package at runtime (e.g. in tests)
- Python packages exist in python environments, they should not leak outside to the full system like this
Pre-commit check failed⛔ pre-commit failed Please run locally:
This repo uses |
Summary
ISSUE ID: #3878
Related PR: ROCm/rockrel#65
Extend the JAX multi-arch release workflows to support both
rocm-jaxlib-v0.9.1androcm-jaxlib-v0.10.0release lines.Changes
Release workflow
rocm-jaxlib-v0.9.1rocm-jaxlib-v0.10.0jax_repositorybuild_modegfx_archBuild workflow
rocm-jaxlib-v0.9.1
ROCm/rocm-jax.build/ci_build.rocm-jaxlib-v0.10.0
ROCm/jax.ROCm/rocm-jax.build/build.py.Testing
On rockrel:
From
Multi-Arch Release Linux JAX Wheelsworkflow: https://github.com/ROCm/rockrel/actions/runs/28078871973 (Completed)From
Multi-Arch Releaseworkflow https://github.com/ROCm/rockrel/actions/runs/28078897916 (Completed)Preserve existing GPU-family based runner selection through
configure_target_run.py.Add a JAX device visibility sanity check before running tests: