feat: add SM87 and SM110 compute capability support for Jetson devices #844
thomas-hiddenpeak wants to merge 1 commit into huggingface:main
Conversation
Force-pushed from 09ea208 to ad8f9ab
alvarobartt
left a comment
Thanks @thomas-hiddenpeak!
Given that both sm_87 and sm_110 (i.e., 8.7 and 11.0) compute capabilities are only for Jetson devices, don't you think it'd be better to create a custom Dockerfile for those, as Dockerfile-jetson, that builds both targets and comes with an entrypoint to forward to one or the other?
You're already familiar with it but see https://developer.nvidia.com/cuda/gpus
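For an entrypoint that forwards to one target or the other, the compute capability reported by the driver (e.g. "8.7", "11.0") has to be normalized to the integer form used in the binary names. A minimal sketch of that conversion; `cap_to_int` is a hypothetical helper name, not something from this PR:

```shell
#!/usr/bin/env bash
# Convert a "major.minor" compute capability string (as reported by
# nvidia-smi --query-gpu=compute_cap) into the integer form used by the
# text-embeddings-router-<cap> binary naming convention.
cap_to_int() {
    echo "$1" | sed 's/\.//g'
}
```

With this, "8.7" maps to 87 and "11.0" maps to 110, so a single numeric comparison chain can pick the right binary.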
cuda-all-entrypoint.sh (Outdated)
```shell
elif [ ${compute_cap} -ge 80 -a ${compute_cap} -lt 87 ]; then
    exec text-embeddings-router-80 "$@"
elif [ ${compute_cap} -eq 87 ]; then
    exec text-embeddings-router-87 "$@"
```
Note that this will lead to 8.9 being considered unsupported, whereas 8.9 should instead fall back to the 8.0 compute capability when a dedicated target is not built, as there's no performance loss between those compute capabilities (Ampere and Ada Lovelace).
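The intended fallback can be sketched as a routing helper where the Ampere range is left open up to (but not including) 9.0, so 8.9 reuses the 8.0 binary. This is a sketch, not the actual cuda-all-entrypoint.sh; `route_for_cap` is a hypothetical function and the exact set of built targets is an assumption:

```shell
#!/usr/bin/env bash
# Route an integer compute capability to a router binary, with 8.x
# (including 8.9, Ada Lovelace) falling back to the 8.0 Ampere build.
route_for_cap() {
    local compute_cap=$1
    if [ "${compute_cap}" -ge 75 ] && [ "${compute_cap}" -lt 80 ]; then
        echo "text-embeddings-router-75"
    elif [ "${compute_cap}" -ge 80 ] && [ "${compute_cap}" -lt 90 ]; then
        # Covers 8.0, 8.6 and, crucially, 8.9: Ada falls back to the
        # Ampere build with no performance loss.
        echo "text-embeddings-router-80"
    elif [ "${compute_cap}" -eq 90 ]; then
        echo "text-embeddings-router-90"
    else
        echo "unsupported compute cap: ${compute_cap}" >&2
        return 1
    fi
}
```

Keeping the 8.x branch as an open range avoids the bug above, where inserting `-lt 87` silently dropped 8.9 from the supported set.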
Good catch! This has been resolved — we've reverted all changes to cuda-all-entrypoint.sh, so the SM89 → SM80 fallback behavior is untouched and works as before.
Force-pushed from ad8f9ab to dcbfbae
Hi @alvarobartt, great suggestion! You're right — both SM87 (Jetson Orin) and SM110 (Jetson Thor) are Jetson-specific compute capabilities, so a dedicated Dockerfile-jetson is the cleaner approach. I've updated the PR accordingly:

- Add (87, 87) and (110, 110) match arms in compute_cap.rs for dedicated Jetson Orin (SM87) and Jetson Thor (SM110) binary support
- Add Dockerfile-jetson: builds the SM87 binary using L4T JetPack r36.4.0 (CUDA 12.6) as the build base and l4t-cuda:12.6.11-runtime for deployment
- Add jetson-entrypoint.sh: runtime GPU detection for Jetson Orin (SM87)
- Add comprehensive test coverage for SM87 and SM110 cross-compatibility

The Jetson support is now fully isolated — no changes to existing Docker infrastructure. Please take another look when you get a chance. Thanks! 🤗
Force-pushed from dcbfbae to bf9ecfc
Hi @alvarobartt, a quick follow-up on the latest changes — I realized my previous comment wasn't fully accurate, so I want to clarify what's been updated.

What changed since my last comment:

What stays the same:

Sorry for any confusion from the previous message. Let me know if you have any questions or further suggestions!
alvarobartt
left a comment
Thanks @thomas-hiddenpeak, but I'd also add the 11.0 compute capability within Dockerfile-jetson to route to either 8.7 or 11.0 depending on the host compute capability, right? I.e., this image should work for both rather than being dedicated to only one target.
```shell
# On Jetson L4T, CUDA libraries are provided by the host via nvidia-container-runtime.
# Add compat path if it exists.
if [ -d /usr/local/cuda/compat ]; then
    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
fi
```
AFAIK this might not be required, right?
You're right — on Jetson L4T, CUDA libraries are mounted from the host by nvidia-container-runtime, so the compat path is typically not needed. I've simplified this to just a conditional guard in case the path exists, but happy to remove it entirely if you prefer. The original cuda-all-entrypoint.sh has more elaborate version checking logic for the standard CUDA images, but that doesn't apply here.
```shell
if [ ${compute_cap} -eq 87 ]; then
    exec text-embeddings-router-87 "$@"
```
Shouldn't we also build the target for text-embeddings-router-110 and add it here based on the host compute capability?
Yes, that would be the ideal setup! Unfortunately SM110 (Jetson Thor) can't be reliably built with the currently available toolchain — CUDA versions before 13.0 misidentify SM110 as SM101 at compile time, and there are other incompatibilities. JetPack 7 (with CUDA 13.x) will fix this, but its official L4T container images aren't on NGC yet.
The compute_cap.rs changes already include the (110, 110) match arm, so once JetPack 7 images are available, adding SM110 here will be a straightforward update — just add the second build target and this entrypoint route.
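Once an SM110 binary can be built, the entrypoint extension described above could look like this sketch; `jetson_route` is a hypothetical helper, and the binary names follow the PR's text-embeddings-router-&lt;cap&gt; convention:

```shell
#!/usr/bin/env bash
# Sketch of the future Jetson routing: Orin (SM87) and Thor (SM110) each
# get a dedicated binary once JetPack 7 / CUDA 13.x images are available.
jetson_route() {
    local compute_cap=$1
    if [ "${compute_cap}" -eq 87 ]; then
        echo "text-embeddings-router-87"    # Jetson Orin
    elif [ "${compute_cap}" -eq 110 ]; then
        echo "text-embeddings-router-110"   # Jetson Thor
    else
        echo "unsupported compute cap for Jetson image: ${compute_cap}" >&2
        return 1
    fi
}
```

In the real entrypoint the selected binary would be launched via `exec "$(jetson_route "${compute_cap}")" "$@"` after detecting the host compute capability.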
Hi @alvarobartt, thanks for the feedback! I totally agree that ideally the Jetson image should support both SM87 and SM110 in one build. However, there's a practical blocker for SM110 (Jetson Thor) right now:
What I'd suggest:
This way we get Jetson Orin users unblocked now without shipping something untested for Jetson Thor. Does that sound reasonable to you?
Summary
Add CUDA compute capability support for Jetson devices:
Changes
- backends/candle/src/compute_cap.rs: add (87, 87) => true and (110, 110) => true match arms for dedicated Jetson binary support
- Dockerfile-jetson (new): builds with nvcr.io/nvidia/l4t-jetpack:r36.4.0 (JetPack 6.1, CUDA 12.6) as the build base and nvcr.io/nvidia/l4t-cuda:12.6.11-runtime (minimal aarch64 runtime) for deployment
- jetson-entrypoint.sh (new)

Testing
Notes
- No changes to Dockerfile-cuda-all or cuda-all-entrypoint.sh
- SM110 is included in compute_cap.rs to future-proof; the Dockerfile will be extended when official L4T images for JetPack 7 (Jetson Thor) are available