
feat: add SM87 and SM110 compute capability support for Jetson devices #844

Open
thomas-hiddenpeak wants to merge 1 commit into huggingface:main from thomas-hiddenpeak:split-sm87-support


thomas-hiddenpeak commented Mar 12, 2026

Summary

Add CUDA compute capability support for Jetson devices:

  • SM87 (8.7) — Jetson Orin series (Ampere architecture)
  • SM110 (11.0) — Jetson Thor (Blackwell architecture)

Changes

backends/candle/src/compute_cap.rs

  • Add (87, 87) => true and (110, 110) => true match arms for dedicated Jetson binary support
  • Add comprehensive test coverage for SM87 and SM110 cross-compatibility

Dockerfile-jetson (new)

  • Dedicated Dockerfile for Jetson devices using official NVIDIA L4T images
  • Build stage: nvcr.io/nvidia/l4t-jetpack:r36.4.0 (JetPack 6.1, CUDA 12.6)
  • Runtime stage: nvcr.io/nvidia/l4t-cuda:12.6.11-runtime (minimal aarch64 runtime)
  • Currently builds SM87 (Jetson Orin) target only
  • SM110 (Jetson Thor) will be added when JetPack 7 L4T images become available on NGC

jetson-entrypoint.sh (new)

  • Runtime GPU detection and routing to the SM87 binary
  • Simplified CUDA compat handling for L4T environment
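The routing step described for jetson-entrypoint.sh can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the helper name `select_router` is hypothetical, and how the compute capability is detected on the device is left out on purpose, since the description above does not say which mechanism the script uses.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): map a compute capability to the
# name of the router binary built for it.
# Accepts either "8.7" or "87"; detecting the value on the device is out of scope.
select_router() {
    # Drop the dot so that "8.7" becomes "87".
    cap=$(printf '%s' "$1" | tr -d '.')
    case "$cap" in
        87)
            # Jetson Orin: the only target this Dockerfile currently builds.
            echo "text-embeddings-router-87"
            ;;
        *)
            echo "unsupported compute capability: $1" >&2
            return 1
            ;;
    esac
}
```

An entrypoint could then run something like `router=$(select_router "$compute_cap") || exit 1` followed by `exec "$router" "$@"`, so that an unknown GPU fails early with a clear message instead of starting the wrong binary.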

Testing

  • SM87: Verified on Jetson AGX Orin 64GB with JetPack 6.1 / CUDA 12.6
  • Both embedding and reranking models work correctly

Notes

  • No changes to existing Dockerfile-cuda-all or cuda-all-entrypoint.sh
  • Jetson support is fully isolated in its own Dockerfile as suggested by @alvarobartt
  • SM110 runtime matching is included in compute_cap.rs for future-proofing; the Dockerfile will be extended when official L4T images for JetPack 7 (Jetson Thor) are available

thomas-hiddenpeak force-pushed the split-sm87-support branch 2 times, most recently from 09ea208 to ad8f9ab, on March 12, 2026 17:23
thomas-hiddenpeak changed the title from "Add SM87 Docker image support" to "Add SM87 and SM110 compute capability support" Mar 12, 2026
Member

alvarobartt left a comment


Thanks @thomas-hiddenpeak!

Given that both sm_87 and sm_110 (i.e., compute capabilities 8.7 and 11.0) are only used by Jetson devices, don't you think it'd be better to create a dedicated Dockerfile for them, Dockerfile-jetson, that builds both targets and comes with an entrypoint that forwards to one or the other?

You're already familiar with it but see https://developer.nvidia.com/cuda/gpus

Comment on lines +33 to +36
elif [ ${compute_cap} -ge 80 -a ${compute_cap} -lt 87 ]; then
    exec text-embeddings-router-80 "$@"
elif [ ${compute_cap} -eq 87 ]; then
    exec text-embeddings-router-87 "$@"
Member


Note that this will cause 8.9 to be treated as unsupported, whereas 8.9 should fall back to the 8.0 target when no dedicated target is built, as there's no performance loss between those compute capabilities (Ampere and Ada Lovelace).

Author


Good catch! This has been resolved — we've reverted all changes to cuda-all-entrypoint.sh, so the SM89 → SM80 fallback behavior is untouched and works as before.
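The fallback concern raised in this thread can be illustrated with a small sketch. It assumes a hypothetical helper, `select_8x_router`, and is not the contents of cuda-all-entrypoint.sh. It checks the dedicated 8.7 target first and then lets every other 8.x capability, including 8.9, fall back to the 8.0 binary, which is the behaviour the reviewer asks to preserve.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): pick a binary for an 8.x compute
# capability given as an integer such as 80, 86, 87 or 89.
select_8x_router() {
    cap="$1"
    if [ "$cap" -eq 87 ]; then
        # A dedicated target exists, so match it before the range check.
        echo "text-embeddings-router-87"
    elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
        # Remaining 8.x capabilities (8.0, 8.6, 8.9, ...) use the 8.0 binary.
        echo "text-embeddings-router-80"
    else
        echo "compute capability $cap is outside the 8.x range" >&2
        return 1
    fi
}
```

Ordering the exact match before the range keeps the upper bound at 90, so adding a dedicated target does not shrink the set of capabilities that can still fall back.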

alvarobartt added this to the v1.10.0 milestone Mar 12, 2026
thomas-hiddenpeak changed the title from "Add SM87 and SM110 compute capability support" to "feat: add SM87 and SM110 compute capability support for Jetson devices" Mar 12, 2026
@thomas-hiddenpeak
Author

Hi @alvarobartt, great suggestion! You're right — both SM87 (Jetson Orin) and SM110 (Jetson Thor) are Jetson-specific compute capabilities, so a dedicated Dockerfile-jetson makes much more sense than modifying the existing Dockerfile-cuda-all.

I've updated the PR accordingly:

  • Reverted all changes to Dockerfile-cuda-all, cuda-all-entrypoint.sh, and matrix.json
  • Added Dockerfile-jetson: builds both SM87 and SM110 targets in a single image, with architecture-aware sccache (supports aarch64 natively)
  • Added jetson-entrypoint.sh: detects GPU compute capability at runtime and routes to the correct binary
  • Kept the compute_cap.rs changes for SM87/SM110 matching logic and tests

The Jetson support is now fully isolated — no changes to existing Docker infrastructure. Please take another look when you get a chance. Thanks! 🤗

- Add (87, 87) and (110, 110) match arms in compute_cap.rs for dedicated
  Jetson Orin (SM87) and Jetson Thor (SM110) binary support
- Add Dockerfile-jetson: builds SM87 binary using L4T JetPack r36.4.0
  (CUDA 12.6) as build base and l4t-cuda:12.6.11-runtime for deployment
- Add jetson-entrypoint.sh: runtime GPU detection for Jetson Orin (SM87)
- Add comprehensive test coverage for SM87 and SM110 cross-compatibility
@thomas-hiddenpeak
Author

Hi @alvarobartt, a quick follow-up on the latest changes — I realized my previous comment wasn't fully accurate, so I want to clarify what's been updated:

What changed since my last comment:

  1. Base images: Switched from nvidia/cuda to official NVIDIA L4T images, which are purpose-built for Jetson:

    • Build stage: nvcr.io/nvidia/l4t-jetpack:r36.4.0 (JetPack 6.1, CUDA 12.6, aarch64)
    • Runtime stage: nvcr.io/nvidia/l4t-cuda:12.6.11-runtime (lightweight aarch64 runtime)
  2. SM110 removed from Dockerfile: My previous comment mentioned building both SM87 and SM110 targets, but after further research, JetPack 7 (which supports Jetson Thor / SM110) doesn't have official L4T container images on NGC yet. So the Dockerfile now only builds SM87 (Jetson Orin) to keep things working and verifiable today.

  3. SM110 kept in compute_cap.rs: The runtime matching logic for SM110 is still included as a forward-looking addition — it will be ready to use once JetPack 7 L4T images become available and we extend the Dockerfile.

What stays the same:

  • Fully isolated Jetson support via Dockerfile-jetson + jetson-entrypoint.sh (no changes to existing Docker infrastructure)
  • compute_cap.rs SM87/SM110 matching logic and comprehensive tests

Sorry for any confusion from the previous message. Let me know if you have any questions or further suggestions!

Member

alvarobartt left a comment


Thanks @thomas-hiddenpeak, but I'd also add the 11.0 compute capability to Dockerfile-jetson so the entrypoint routes to either 8.7 or 11.0 depending on the host compute capability, right? I.e., this image should work for both targets rather than being dedicated to only one.

Comment on lines +7 to +12

# On Jetson L4T, CUDA libraries are provided by the host via nvidia-container-runtime.
# Add compat path if it exists.
if [ -d /usr/local/cuda/compat ]; then
    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
fi
Member


AFAIK this might not be required, right?

Suggested change
# On Jetson L4T, CUDA libraries are provided by the host via nvidia-container-runtime.
# Add compat path if it exists.
if [ -d /usr/local/cuda/compat ]; then
    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
fi

Author


You're right — on Jetson L4T, CUDA libraries are mounted from the host by nvidia-container-runtime, so the compat path is typically not needed. I've simplified this to just a conditional guard in case the path exists, but happy to remove it entirely if you prefer. The original cuda-all-entrypoint.sh has more elaborate version checking logic for the standard CUDA images, but that doesn't apply here.
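If the conditional guard is kept, one possible refinement is sketched below. The helper name `prepend_lib_path` is hypothetical and not part of the PR. It prepends the directory only when it exists, and it adds the `:` separator only when `LD_LIBRARY_PATH` is already set, which avoids a trailing empty entry (the dynamic loader interprets an empty entry as the current working directory).

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): prepend a directory to
# LD_LIBRARY_PATH, but only if that directory exists.
prepend_lib_path() {
    dir="$1"
    if [ -d "$dir" ]; then
        # ${VAR:+...} expands to ":<old value>" only when the variable is
        # set and non-empty, so no trailing colon is produced otherwise.
        export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    fi
}
```

In an entrypoint this would be called as `prepend_lib_path /usr/local/cuda/compat`; on hosts where the path is absent the call is a no-op.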

Comment on lines +16 to +17
if [ ${compute_cap} -eq 87 ]; then
    exec text-embeddings-router-87 "$@"
Member


Shouldn't we also build the target for text-embeddings-router-110 and add it here based on the host compute capability?

Author


Yes, that would be the ideal setup! Unfortunately SM110 (Jetson Thor) can't be reliably built with the currently available toolchain — CUDA versions before 13.0 misidentify SM110 as SM101 at compile time, and there are other incompatibilities. JetPack 7 (with CUDA 13.x) will fix this, but its official L4T container images aren't on NGC yet.

The compute_cap.rs changes already include the (110, 110) match arm, so once JetPack 7 images are available, adding SM110 here will be a straightforward update — just add the second build target and this entrypoint route.

@thomas-hiddenpeak
Author

Hi @alvarobartt, thanks for the feedback! I totally agree that ideally the Jetson image should support both SM87 and SM110 in one build.

However, there's a practical blocker for SM110 (Jetson Thor) right now:

  1. CUDA version limitation: The current L4T images ship with CUDA 12.6 (JetPack 6.x). CUDA versions before 13.0 misidentify SM110 as SM101 at compile time, and there are other incompatibilities as well — so we can't reliably build a working SM110 binary with the available toolchain today.

  2. No JetPack 7 L4T images on NGC yet: JetPack 7, which will ship with CUDA 13.x and proper SM110 support, doesn't have official L4T container images available on NGC at this time.

What I'd suggest:

  • Merge the current PR with SM87 Dockerfile support (Jetson Orin — verified and working)
  • The compute_cap.rs changes already include SM110 matching logic, so the runtime side is ready
  • Once JetPack 7 L4T images land on NGC, we can add SM110 to the Dockerfile-jetson as a straightforward follow-up — just adding the second build target and entrypoint route

This way we get Jetson Orin users unblocked now without shipping something untested for Jetson Thor. Does that sound reasonable to you?
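The CUDA version constraint described in this comment could be expressed as a build-time guard along these lines. This is only a sketch: the helper names `cuda_supports_sm110` and `jetson_build_targets` are hypothetical, and the 13.0 threshold is taken directly from the statement above rather than verified independently.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeed only when the given CUDA
# version (e.g. "12.6" or "13.0") is at least 13, the threshold stated above.
cuda_supports_sm110() {
    # Keep only the major component: "12.6" -> "12".
    major=${1%%.*}
    [ "$major" -ge 13 ]
}

# Hypothetical helper: list the compute capabilities to build for a given
# CUDA version. SM87 is always built; SM110 is added once the toolchain allows it.
jetson_build_targets() {
    if cuda_supports_sm110 "$1"; then
        echo "87 110"
    else
        echo "87"
    fi
}
```

With the CUDA 12.6 toolchain from the current L4T images, `jetson_build_targets 12.6` yields only `87`, which matches what the Dockerfile builds today.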
