Vulkan: support Linux/Windows desktop GPUs and opt-in wheel builds by Reubend · Pull Request #20138 · pytorch/executorch

Reubend · 2026-06-09T03:53:55Z

Summary

We already provide good desktop support for NVIDIA GPUs through our CUDA backend, which works well on both Linux and Windows. However, our Vulkan backend only provides solid support for Android, leaving AMD GPUs without sufficient support. That's a shame since Vulkan is well supported on every operating system and with every major GPU manufacturer.

This PR gets the Vulkan backend building and running correctly on Linux and Windows desktop GPUs (NVIDIA, AMD, Intel), and adds an opt-in way to build pre-built Vulkan binaries. It leaves everything Android-related the same so that we don't regress anything for that platform.

Most of the backend was already portable, so this is mostly build fixes, a few small fixes for desktop GPUs, and packaging/CI plumbing.

Fixes #20140

Changes

Picks the right exception flag for MSVC vs GCC/Clang, finds glslc on Windows, suppresses third-party (VMA) warnings on GCC/MSVC, and gives a clear error if the Vulkan submodules aren't checked out.
Compiles the cooperative-matrix shader. It needs a newer Vulkan target than the default, so that one shader now builds against Vulkan 1.3.
Fixes correctness on desktop GPUs. Turns on the device features the shaders actually use (int16/int64/float64), which were never enabled before; picks a discrete GPU instead of the first one found if one exists (with an ETVK_DEVICE_INDEX override for multi-GPU machines); avoids an invalid image-copy on compute-only queues; and fixes a buffer-size check that compared the wrong units.
Makes the small_texture_limits export option work as intended. It was being silently dropped; now it round-trips so you can target GPUs with smaller texture limits. Adds a small unit test for it.
Adds opt-in packaging and CI. Behind the EXECUTORCH_BUILD_VULKAN flag, the wheel build can include Vulkan; adds a real-GPU (NVIDIA) test job and a Windows/MSVC build job. The default wheel and other backends are untouched.
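
The device-selection rule described above (prefer a discrete GPU if one exists, otherwise keep the first device, and let ETVK_DEVICE_INDEX override the choice) can be sketched roughly as follows. This is an illustration in Python, not the code in Runtime.cpp: the `pick_device` helper is hypothetical, and treating an out-of-range override as an error is an assumption. Only the numeric device-type values come from the Vulkan `VkPhysicalDeviceType` enum.

```python
import os
from typing import Mapping, Optional, Sequence

# Values of VkPhysicalDeviceType from the Vulkan headers.
VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU = 1
VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU = 2
VK_PHYSICAL_DEVICE_TYPE_CPU = 4  # e.g. a software implementation


def pick_device(
    device_types: Sequence[int],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the index of the device to use (hypothetical helper)."""
    if not device_types:
        raise ValueError("no Vulkan devices found")
    env = os.environ if env is None else env

    # An explicit override wins over any heuristic.
    override = env.get("ETVK_DEVICE_INDEX")
    if override is not None:
        index = int(override)
        # Assumption: an out-of-range index is reported as an error.
        if not 0 <= index < len(device_types):
            raise ValueError(f"ETVK_DEVICE_INDEX={index} is out of range")
        return index

    # Prefer the first discrete GPU if one exists.
    for index, device_type in enumerate(device_types):
        if device_type == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU:
            return index

    # Otherwise fall back to the first enumerated device.
    return 0
```

On a machine that enumerates a software device, an integrated GPU, and a discrete GPU in that order, this sketch returns index 2 unless the override is set.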

Android Safety

Every change is gated so Android behaves exactly as before: build differences are behind compiler/OS checks, the new export and runtime options are opt-in and default to today's behavior, and the device-selection / feature changes are based on what the GPU reports. The existing SwiftShader CI job is unchanged.

Testing

Built with the Vulkan SDK (glslc on PATH) and run on an NVIDIA A100 (driver 580.126.09, Vulkan 1.4.312):

# Build the backend + a runner (the Vulkan SDK's glslc must be on PATH)
cmake -B cmake-out -S . -GNinja -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON

cmake --build cmake-out --target vulkan_backend executor_runner   # 402/402, 0 errors

# Export a model to Vulkan and run it on the GPU
python -m examples.vulkan.export -m mv2 -o .
./cmake-out/executor_runner --model_path mv2.pte

I ran a small fp32 model and an int8 model on the A100 and matched the reference output (fp32 to 5 decimals, int8 to 4 decimals). The int8 run exercises the integer-dot-product / int16 shaders that SwiftShader can't run.
All the shaders compiled fine and the new unit tests added here passed.
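
For readers who want to reproduce the "matched to N decimals" comparison, here is one possible reading of it as a minimal sketch. The `matches_to_decimals` helper is hypothetical, and interpreting "to 5 decimals" as an absolute tolerance of half a unit in the fifth decimal place is an assumption; the PR does not say how the comparison was actually done.

```python
from typing import Sequence


def matches_to_decimals(
    actual: Sequence[float],
    reference: Sequence[float],
    decimals: int,
) -> bool:
    """True if every element agrees with the reference to `decimals` places.

    Assumption: agreement means |actual - reference| <= 0.5 * 10**-decimals.
    """
    if len(actual) != len(reference):
        return False
    tolerance = 0.5 * 10 ** -decimals
    return all(abs(a - r) <= tolerance for a, r in zip(actual, reference))
```

Under this reading, the fp32 run would be checked with `decimals=5` and the int8 run with `decimals=4`.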

I didn't test the Windows build yet, so I'll be relying on CI for that.

TODO

We also need to publish a Vulkan wheel to PyPI. The build supports it (EXECUTORCH_BUILD_VULKAN=1 + glslc), but we need a Vulkan entry added to the shared build-wheels-*.yml workflows.

cc @SS-JIA @manuelcandales @digantdesai @cbilgin

pytorch-bot · 2026-06-09T03:53:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20138

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

❌ 44 New Failures, 8 Unclassified Failures

As of commit 3c00f2e with merge base e285edf:

NEW FAILURES - The following jobs have failed:

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t 3f524ea4171b0ae2e14f523261eb87331d92edeee0f9e4ae102ef17ffaa1f0ff /exec failed with exit code 1
pull / test-llama-runner-qnn-linux (fp32, qnn_8a8w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t e688f87ac15a877a8ecf18979592f8ccd86b6c182166b86f2d999608ae0efc92 /exec failed with exit code 1
pull / test-qnn-buck-build-linux / linux-job (gh)
RuntimeError: Command docker exec -t ffe0e523c9d09d0d50b9a5b5c86cfdfce558dda6893ae71a4f8164d87f897280 /exec failed with exit code 1
pull / test-qnn-delegate-linux / linux-job (gh)
RuntimeError: Command docker exec -t cbe7a121bd0e286b4b51a90dd95b762d87e66f5c28f39acbbd0307f9692b5c94 /exec failed with exit code 1
pull / test-qnn-models-linux (dl3) / linux-job (gh)
RuntimeError: Command docker exec -t 3f429b59863206a705397c2ad649859535d85ed3c6b97b7f78b327a0f4e5c5ef /exec failed with exit code 1
pull / test-qnn-models-linux (mv2) / linux-job (gh)
RuntimeError: Command docker exec -t 2d81e201aab208d3097c897e5de8a55f3d842f9a35082490d1d8a19f351bfaa5 /exec failed with exit code 1
pull / test-qnn-models-linux (mv3) / linux-job (gh)
RuntimeError: Command docker exec -t 1d4692def2b5f20b784d308bb8a8d73c88ca1c2c345a03384fdb743261c58faf /exec failed with exit code 1
pull / test-qnn-passes-linux / linux-job (gh)
RuntimeError: Command docker exec -t 2e7df9fbc55265323c1e86ba7f25c1be81954b5f37e10b02cfb65b9599f61c57 /exec failed with exit code 1
pull / test-qnn-python-imports-linux / linux-job (gh)
RuntimeError: Command docker exec -t 92c7039a3aca117ae905c998de194355cae822ac67463173858a1d399b1cd382 /exec failed with exit code 1
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh)
RuntimeError: Command docker exec -t 76bb1ded25756f585acfa1e8c0b17dcf592edb32a2d58a27cd37dfb00c6ba9ae /exec failed with exit code 1
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, operators) / linux-job (gh)
RuntimeError: Command docker exec -t 803e5a1992af5deee27f89fbdeac7cea90dfcea393fc16c216eba23c4954c8c4 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t 837265d730cbf57013e925bf7486f9bc0e18fc6b4b150d6094b5aaffecd7a364 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.11) / linux-job (gh)
RuntimeError: Command docker exec -t 54114fce1db9085d027ecc301834dddd09cabad361b29b3140c59ef8876b7fff /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
RuntimeError: Command docker exec -t d4de591e04f5eb73dc6caeb6e8e128e3631f91aef647c75bdba02394bdd0e5b5 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.13) / linux-job (gh)
RuntimeError: Command docker exec -t f919ecff437dd8be687dc0a93762291ec572ad9e97e139af34a1ea242b241b1f /exec failed with exit code 1
pull / test-sqnr-static-llm-qnn-linux (smollm2_135m) / linux-job (gh)
RuntimeError: Command docker exec -t c20c898383c3bb09b6c1b3ad4255d7f629f155059591469855288291fdae95e4 /exec failed with exit code 1
pull / test-static-llama-qnn-linux (stories_110m) / linux-job (gh)
RuntimeError: Command docker exec -t 859df49e85bab93e91109979c7e9a41d8d5664b55b2cfcc0d10de68081cc1020 /exec failed with exit code 1
pull / test-static-llama-qnn-linux (stories_260k_bc) / linux-job (gh)
RuntimeError: Command docker exec -t 70f698ad01ba1d0974cc01f22fef6f9c89c3c07331922d72bdfbc1a70afeca89 /exec failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch
Test QNN Backend / test-qnn / test-backend-linux (qnn, models) / linux-job (gh)
RuntimeError: Command docker exec -t 7c732f70ef05cc2f9c810ec4ee527f00eb12bdfeaea0a32ac9a08d9bbaec6e7e /exec failed with exit code 1
Test QNN Backend / test-qnn / test-backend-linux (qnn, operators) / linux-job (gh)
RuntimeError: Command docker exec -t a30fd13ea7531488d5a83c35a92447a1ecca739ca55d237f6994f648553ff435 /exec failed with exit code 1
trunk / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t 7957566ee736476008e794991cb581e6e6ff1d77b6e114b038205e185ed7c884 /exec failed with exit code 1
trunk / test-llama-runner-qnn-linux (fp32, qnn_8a8w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t eb14c3852bd4717652cbe3748450a829a46ee3c4d4b0ca7e4d019073e1588ecf /exec failed with exit code 1
trunk / test-qnn-model (fp32, conv_former) / linux-job (gh)
RuntimeError: Command docker exec -t 736db03cc86899d93d0d9bdb9d43ec08e5ffe38de15f55c891d9652b768d1a9f /exec failed with exit code 1
trunk / test-qnn-model (fp32, dl3) / linux-job (gh)
RuntimeError: Command docker exec -t 640c359f8bf0f6f2b6ef2297beee972b5cec8727f918a0edff12e647b7c816f4 /exec failed with exit code 1
trunk / test-qnn-model (fp32, ic3) / linux-job (gh)
RuntimeError: Command docker exec -t ab4c994aa83f1986bc90bbb7d9262b121d21bfe5d0614fb5ef0c79b77d75d123 /exec failed with exit code 1
trunk / test-qnn-model (fp32, ic4) / linux-job (gh)
RuntimeError: Command docker exec -t 05f6ecfe7444b742dd50699809e711723d4eb15a3493a08d0ab37f6d0eeb2fd9 /exec failed with exit code 1
trunk / test-qnn-model (fp32, mb) / linux-job (gh)
RuntimeError: Command docker exec -t b46cf2814795ed184a32a8a741ef9b0c00fc9494fd33a02b1e82b7f761c07702 /exec failed with exit code 1
trunk / test-qnn-model (fp32, mv2) / linux-job (gh)
RuntimeError: Command docker exec -t acc2766e6d62d83abb497ec76bb4f88d6d3b64650a40dcadc52cd1db23ae1374 /exec failed with exit code 1
trunk / test-qnn-model (fp32, mv3) / linux-job (gh)
RuntimeError: Command docker exec -t 1907fe59da748dc1eefb28d543cc44e90b64c69e91fd0dff5b91690c725277eb /exec failed with exit code 1
trunk / test-qnn-model (fp32, vit) / linux-job (gh)
RuntimeError: Command docker exec -t ab14a135ae0958be6b3ee07a000dd3dbd75118c4ff4cdc65e30d5fab3c33db7d /exec failed with exit code 1
trunk / test-qnn-model (fp32, w2l) / linux-job (gh)
RuntimeError: Command docker exec -t 9e7ffde8f4a6f77e706d725f38d12e209ae26cf62996cfcc44fa1f7df5729cbb /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, albert) / linux-job (gh)
RuntimeError: Command docker exec -t 2de0885e62f3c0cfd6f801b9802add81371d3a846f8bca628c4139ff8c84a907 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, bert) / linux-job (gh)
RuntimeError: Command docker exec -t acb70ec1548dc7dc57bce35203cc3399962350417d665fa2a4d522420fbea04f /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, cvt) / linux-job (gh)
RuntimeError: Command docker exec -t b764ad6231d9adc734d0f5e3e653c8cb69a6b9e03b7b37dea1dc01af0d3b7ec0 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, distilbert) / linux-job (gh)
RuntimeError: Command docker exec -t 464252a522cff2eada8376a2b5b587fde1ba986372c19a134f31fc685896d210 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, dit) / linux-job (gh)
RuntimeError: Command docker exec -t 3d0dcdfead7f3855798775708b14189332eb877604616697de9a177bdfbf6aed /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, efficientnet) / linux-job (gh)
RuntimeError: Command docker exec -t a38e686906f0822f13e8e330aa9c6ffeae7e462511c268b54ec405c00cfa5e42 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, focalnet) / linux-job (gh)
RuntimeError: Command docker exec -t 442d499305e3483876a6ef99bc3a05c988ef0190e0c3e8e90dc37566614dad11 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, mobilevit_v1) / linux-job (gh)
RuntimeError: Command docker exec -t 5b706646822cfc2e4f065c65f70fa73f7f299b0aa1abd316ba81748b371dbf4d /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, mobilevit_v2) / linux-job (gh)
RuntimeError: Command docker exec -t 16bb7898cc1af3ba5d7699042f466b7e79f5461803409f930c25193da751e382 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, pvt) / linux-job (gh)
RuntimeError: Command docker exec -t 0a0c9bbeaebf862b30e68482a87e42de84e1fa33ad3fe4467cff6646c14459d1 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, roberta) / linux-job (gh)
RuntimeError: Command docker exec -t 1aed280cb9244d3e336df6a93709465f3a458983cb06c40318a0b883b6afdd0f /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, swin) / linux-job (gh)
RuntimeError: Command docker exec -t 489fe7b5d7846f8610a5dafcd1d2431d80efe91a064b3f81800407964dbac472 /exec failed with exit code 1

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Linux Wheels / pytorch/executorch / build-manywheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Build Linux Wheels / pytorch/executorch / upload / upload-manywheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x86_64
Cadence Build & Test / hifi-build / hifi4 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Input required and not supplied: aws-region
Cadence Build & Test / vision-build / vision (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Input required and not supplied: aws-region
pull / test-qnn-direct-build-linux / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t c6bb94d866737718e5c09ff2878a57e8de1b35af2277a80bf571c90382bbe041 /exec failed with exit code 127
Test Vulkan Backend (specialized runners) / test-vulkan-nvidia / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t 2f9cbced40b59a4a436a78130cdec12ad652c94bcb06ef8cfa98fef0755ce8fd /exec failed with exit code 1
trunk / test-cortex-m-ops / run (cortex-m0plus) / cortex-m-ops-cortex-m0plus (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t 92b74693abf5b191b11fd17820cb22a7d193cbe339f9a9c24714166d05d731be /exec failed with exit code 4
trunk / test-cortex-m-ops / run (cortex-m7) / cortex-m-ops-cortex-m7 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t f17aabb17bf48e93c0712edae572e1cb29ba22bab40ea5096842d442fe0bdecb /exec failed with exit code 4

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-06-09T03:54:04Z

The committers listed above are authorized under a signed CLA.

✅ login: Reubend / name: Reuben Dunn (427011e)

Reubend · 2026-06-09T16:56:20Z

I took a look at the 3 test failures that the CI found, and I don't think that they're caused by any of my changes here. Please let me know if you find anything that indicates differently.

SS-JIA · 2026-06-11T22:52:39Z

ack. Taking a look soon.

SS-JIA

⏺ PR #20138 — changes by theme

  Theme 1: Build portability (make it compile off-Android)

  - backends/vulkan/CMakeLists.txt — per-compiler exception flag (/EHsc MSVC vs -fexceptions); submodule-existence check w/ actionable error.
  - backends/vulkan/cmake/ShaderLibrary.cmake — find glslc via VULKAN_SDK HINTS (Windows); graceful skip in wheel builds.
  - backends/vulkan/runtime/vk_api/memory/vma_api.h — warning suppression for GCC + MSVC (was clang-only).
  - backends/vulkan/runtime/gen_vulkan_spv.py — NameError guard when cache_dir is None.

  Theme 2: Shader compilation

  - backends/vulkan/runtime/graph/ops/glsl/coopmat_mm.yaml — target Vulkan 1.3 so GL_KHR_cooperative_matrix (needs SPIR-V 1.6) compiles.

  Theme 3: Runtime correctness on desktop GPUs

  - vk_api/Adapter.cpp — enable int16/int64/float64 device features shaders already use.
  - vk_api/Runtime.cpp — prefer discrete GPU over software/integrated; ETVK_DEVICE_INDEX override.
  - api/Context.cpp — guard blit against compute-only queues.
  - api/containers/Tensor.cpp — fix buffer-size check (bytes vs numel).

  Theme 4: AOT compile-option round-trip fix

  - vulkan_preprocess.py — deserialize small_texture_limits + skip_memory_planning (were silently dropped); fix tuple-mutation bug.
  - partitioner/vulkan_partitioner.py — wire small_texture_limits into partition texture limits.
  - utils.py — add SMALL_TEXTURE_LIMITS constant.
  - test/test_vulkan_compile_options.py — regression test for round-trip.

  Theme 5: Packaging (opt-in wheel)

  - setup.py — build vulkan_backend target when EXECUTORCH_BUILD_VULKAN.
  - tools/cmake/preset/pybind.cmake — enable Vulkan in Linux/Windows wheel if flag + glslc present.

  Theme 6: CI infra

  - .github/workflows/test-backend-vulkan.yml — new real-GPU (NVIDIA) job.
  - .github/workflows/vulkan-windows.yml — new MSVC build job.
  - .ci/scripts/test_backend.sh — pick real-gpu vs swiftshader ICD.
  - .ci/scripts/setup-vulkan-linux-deps.sh — arg-based SwiftShader/real-GPU setup.
  - .ci/scripts/setup-vulkan-windows-deps.ps1, setup-windows-msvc-vulkan.ps1 — Windows glslc + build.
  - .ci/scripts/wheel/pre_build_script.sh — provision Vulkan SDK/submodules when flag set.
  - .ci/scripts/wheel/test_linux.py, test_windows.py — check VulkanBackend registered.

All changes from Theme 1 through 5 look super solid! Thanks for putting together those fixes.

For Theme 6, I just have a small request regarding the new CI tests added. Overall, the PR looks great!

I'll also import this PR as a diff just so that these changes can run against our Meta internal CI flows. Note that this diff won't be landed; I will abandon it and unlink it from the PR once we get the CI signals.

SS-JIA · 2026-06-12T02:38:35Z

+  // expose compute-only queue families this could otherwise be invalid usage.
+  // On mobile the single universal queue always has these bits set.
+  VK_CHECK_COND(
+      queue_.capabilities & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_TRANSFER_BIT),


Potentially valid point raised by Claude:

Context.cpp blit guard — ⚠️ one real question. Guard allows GRAPHICS | TRANSFER. But per Vulkan spec vkCmdBlitImage requires GRAPHICS specifically — transfer queues do not support blit. Compute+transfer-without-graphics queue passes check but blit still invalid usage. In practice selected compute queue on desktop usually has graphics, so works. Still: check should likely be VK_QUEUE_GRAPHICS_BIT only. Strictly better than no guard (today), but the TRANSFER_BIT could let an invalid case slip silently.

If true, seems that only GRAPHICS_BIT should be checked.
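
To make the concern concrete, here is a small sketch of the two candidate masks, written in Python rather than the C++ of Context.cpp. The flag values are the standard `VkQueueFlagBits` values; the two guard functions are hypothetical stand-ins for the `VK_CHECK_COND` condition, and the conclusion assumes the spec reading quoted above is correct.

```python
# Standard VkQueueFlagBits values from the Vulkan headers.
VK_QUEUE_GRAPHICS_BIT = 0x1
VK_QUEUE_COMPUTE_BIT = 0x2
VK_QUEUE_TRANSFER_BIT = 0x4


def guard_as_written(capabilities: int) -> bool:
    """Mirrors the PR's check: GRAPHICS or TRANSFER is enough."""
    return bool(capabilities & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_TRANSFER_BIT))


def guard_graphics_only(capabilities: int) -> bool:
    """The stricter check suggested in the review."""
    return bool(capabilities & VK_QUEUE_GRAPHICS_BIT)


# A universal queue, as described for mobile in the code comment.
universal_queue = (
    VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT
)
# A compute + transfer queue without graphics, as some desktop GPUs expose.
compute_transfer_queue = VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT

print(guard_as_written(compute_transfer_queue))     # passes the current guard
print(guard_graphics_only(compute_transfer_queue))  # rejected by the stricter one
```

Both guards accept the universal queue; they differ only for the compute + transfer queue, which is the case the review is worried about.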

SS-JIA · 2026-06-12T02:40:13Z

@@ -312,7 +356,7 @@ Runtime::Runtime(const RuntimeConfig config)
    try {
      switch (config.default_selector) {
        case AdapterSelector::First:


optional, but with the updated behaviour it seems First is no longer accurate. Would recommend renaming to Auto, perhaps.

SS-JIA · 2026-06-12T03:08:33Z

@@ -0,0 +1,48 @@
+name: Test Vulkan Backend Windows Build


I would recommend creating a new vulkan.yml workflow file and just putting all vulkan CI jobs that require special runners there (i.e. build-vulkan-windows-msvc and test-vulkan-real-gpu, which perhaps should be renamed to test-vulkan-nvidia), for example:

    # .github/workflows/vulkan.yml
    name: Test Vulkan Backend (specialized runners)

    on:
      push:
        branches: [main, release/*]
        tags: [ciflow/nightly/*]
      pull_request:
        paths:
          - .github/workflows/vulkan.yml
          - backends/vulkan/**
          - .ci/scripts/setup-vulkan-linux-deps.sh
          - .ci/scripts/setup-vulkan-windows-deps.ps1
          - .ci/scripts/setup-windows-msvc-vulkan.ps1
      workflow_dispatch:

    concurrency:
      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
      cancel-in-progress: true

    permissions:
      contents: read

    jobs:
      changed-files:
        name: Get changed files
        uses: ./.github/workflows/_get-changed-files.yml
        with:
          include-push-diff: true  # so push commits can also be path-filtered

      run-decision:
        name: CI run decision
        uses: ./.github/workflows/_ci-run-decision.yml

      test-vulkan-nvidia:
        needs: [changed-files, run-decision]
        # Path-filtered: skip commits that don't touch Vulkan-relevant paths,
        # except on sampled full runs (see _ci-run-decision.yml).
        if: |
          github.event_name != 'pull_request' && (
            contains(needs.changed-files.outputs.changed-files, 'backends/vulkan/') ||
            contains(needs.changed-files.outputs.changed-files, '.ci/scripts/setup-vulkan-linux-deps.sh') ||
            contains(needs.changed-files.outputs.changed-files, '.github/workflows/vulkan.yml') ||
            needs.run-decision.outputs.is-full-run == 'true'
          )
        uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
        # ... (existing real-gpu config)

      build-vulkan-windows-msvc:
        needs: [changed-files, run-decision]
        if: |
          contains(needs.changed-files.outputs.changed-files, 'backends/vulkan/') ||
          contains(needs.changed-files.outputs.changed-files, '.ci/scripts/setup-vulkan-windows-deps.ps1') ||
          contains(needs.changed-files.outputs.changed-files, '.ci/scripts/setup-windows-msvc-vulkan.ps1') ||
          contains(needs.changed-files.outputs.changed-files, '.github/workflows/vulkan.yml') ||
          needs.run-decision.outputs.is-full-run == 'true'
        uses: pytorch/test-infra/.github/workflows/windows_job.yml@main
        # ... (existing windows config)

Also on the topic of test-vulkan-real-gpu / test-vulkan-nvidia, I think we can also consider:

Have it replicate the workflow of both test-vulkan-models-linux and test-vulkan-operators-linux for full coverage

Remove the github.event_name != 'pull_request' check, since the changed-files mechanism + on:pull_request:paths mechanism should drastically reduce the frequency at which this job is triggered.
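
The path-filter condition in the proposed workflow can be modelled in a few lines, which makes it easier to see what removing the `github.event_name != 'pull_request'` clause changes. This is a sketch only: `should_run` is a hypothetical helper, and it assumes the `changed-files` output is a single string that is searched by substring, the way the `contains()` expression function treats a string. The actual output format of `_get-changed-files.yml` is not shown in this thread.

```python
from typing import Iterable

# Paths the NVIDIA job watches in the proposed workflow.
NVIDIA_JOB_PATHS = (
    "backends/vulkan/",
    ".ci/scripts/setup-vulkan-linux-deps.sh",
    ".github/workflows/vulkan.yml",
)


def should_run(
    changed_files: str,
    is_full_run: bool,
    watched_paths: Iterable[str] = NVIDIA_JOB_PATHS,
    event_name: str = "push",
    skip_pull_requests: bool = False,
) -> bool:
    """Model of the job-level `if:` (hypothetical helper).

    `skip_pull_requests=True` corresponds to keeping the
    `github.event_name != 'pull_request'` clause.
    """
    if skip_pull_requests and event_name == "pull_request":
        return False
    touches_vulkan = any(path in changed_files for path in watched_paths)
    return touches_vulkan or is_full_run
```

With the clause removed, a pull request that touches `backends/vulkan/` runs the job, while one that only touches unrelated paths still skips it unless a sampled full run is selected.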

meta-codesync · 2026-06-12T03:19:32Z

@SS-JIA has imported this pull request. If you are a Meta employee, you can view this in D108371716.

SS-JIA · 2026-06-12T20:28:00Z

+  // max_buffer_numel() returns maxStorageBufferRange, which is a size in bytes,
+  // so compare it against the buffer size in bytes (not the element count).
+  VK_CHECK_COND(
+      element_size(dtype) * numel <=


Although this updated check is correct, it seems to be breaking some of our Meta Internal tests 😅

For context, internally we have some tests that run LLMs on-device, which require large tensors for the embedding weights. Since the check now multiplies numel by element_size(dtype), the tests are now failing with:

libc++abi: terminating due to uncaught exception of type N9vkcompute5vkapi5ErrorE: Exception raised from allocate_buffer at xplat/executorch/backends/vulkan/runtime/api/containers/Tensor.cpp:687: (element_size(dtype) * numel <= context_ptr->adapter_ptr()->max_buffer_numel()) is false!

I do realize this means the tests were hitting undefined behaviour (or, alternatively, that the driver is not reporting the maximum allowed buffer size in bytes correctly, since the models still function correctly). But to avoid breaking the tests, I would recommend leaving the check as-is for now and documenting the discrepancy while I figure out a way to handle this scenario for LLMs.
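
A quick arithmetic sketch shows why a large embedding table can pass the element-count comparison yet fail the byte-count one. The numbers below are illustrative assumptions, not values from the failing tests: a 1 GiB `maxStorageBufferRange` and an fp32 table of 128,256 × 4,096 elements. The two helper functions are hypothetical stand-ins for the old and new forms of the check.

```python
def fits_by_numel(numel: int, max_storage_buffer_range: int) -> bool:
    """Old form of the check: compares an element count against a byte limit."""
    return numel <= max_storage_buffer_range


def fits_by_bytes(numel: int, element_size: int, max_storage_buffer_range: int) -> bool:
    """New form of the check: compares bytes against bytes."""
    return element_size * numel <= max_storage_buffer_range


# Illustrative values only (assumptions, not taken from the PR).
max_storage_buffer_range = 2**30  # 1 GiB reported limit
numel = 128_256 * 4_096           # 525,336,576 elements
element_size = 4                  # fp32

print(fits_by_numel(numel, max_storage_buffer_range))                # True
print(fits_by_bytes(numel, element_size, max_storage_buffer_range))  # False
```

Here the table is about 2.1 GB, so the byte-based check rejects it even though the element count is below the reported limit.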

The Vulkan backend was developed for Android GPUs. This makes it build and run on Linux/Windows desktop discrete GPUs (NVIDIA/AMD/Intel) and adds opt-in pre-built Vulkan wheels, with no change to Android behavior (build divergence is behind compile-time guards; runtime changes key off queried capabilities). Covers build portability, discrete-GPU correctness fixes (real-GPU device/ICD selection, shaderInt16/Int64/Float64 enablement, a blit queue guard, and texel-rounded buffer allocations to avoid out-of-bounds vec4 reads), and EXECUTORCH_BUILD_VULKAN-gated CI/packaging (a new vulkan.yml runs the real-GPU NVIDIA and Windows MSVC jobs, plus opt-in wheel plumbing). Tested on an NVIDIA A100; the SwiftShader CI path is unchanged. This change was authored with Claude.

Reubend · 2026-06-13T01:20:14Z

Thanks @SS-JIA! I addressed the comments and also fixed a separate issue I found inside of allocate_buffer. For the VK_CHECK_COND that was breaking internal tests, I reverted it to the way it was but left a comment indicating how we need to fix it in the future.

Reubend added the module: vulkan and release notes: vulkan labels Jun 9, 2026

meta-cla Bot added the CLA Signed label Jun 9, 2026

Reubend force-pushed the vulkan-compatibility branch 2 times, most recently from 23db5fc to 1904c5c Compare June 9, 2026 04:47

Reubend marked this pull request as ready for review June 9, 2026 16:55

Reubend requested review from SS-JIA, kirklandsign and larryliu0820 as code owners June 9, 2026 16:55

Reubend added this to ExecuTorch Vulkan Backend Jun 9, 2026

github-project-automation Bot moved this to To triage in ExecuTorch Vulkan Backend Jun 9, 2026

SS-JIA requested changes Jun 12, 2026

SS-JIA reviewed Jun 12, 2026

Reubend force-pushed the vulkan-compatibility branch from 1904c5c to 3c00f2e Compare June 13, 2026 01:17

Reubend had a problem deploying to cadence June 13, 2026 01:17 — with GitHub Actions Failure

Reubend wants to merge 1 commit into pytorch:main from Reubend:vulkan-compatibility

3 participants
