[Doc] Cuda Plugin EP integration by tianleiwu · Pull Request #2236 · microsoft/onnxruntime-genai

tianleiwu · 2026-06-22T18:45:12Z

No description provided.

Copilot

Pull request overview

Adds a new documentation page describing how ONNX Runtime GenAI integrates the CUDA Execution Provider when used as a V2 plugin, including intended selection flow, deployment layouts, build knobs, and testing guidance.

Changes:

- Introduces `docs/cuda_plugin_ep_integration.md` to document CUDA plugin EP registration and selection behavior.
- Includes diagrams and step-by-step guidance for deployment and validation scenarios.

+- **CUDA plugin EP** — a standalone shared library
+  (`libonnxruntime_providers_cuda_plugin.so` / `onnxruntime_providers_cuda_plugin.dll`)
+  registered with the `OrtEnv` and consumed through the **V2 plugin API**
+  (`AppendExecutionProvider_V2` over `OrtEpDevice`s), discovered under the EP name
+  `CudaPluginExecutionProvider`.
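On the V2 path, the plugin is reached by picking the EP devices exposed under its EP name. The sketch below models only that filtering step in plain Python. `EpDevice` and `select_devices_for_ep` are hypothetical stand-ins for illustration, not the ONNX Runtime API, which exposes real `OrtEpDevice` objects through its own calls.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for an OrtEpDevice: it pairs an EP name with one hardware device.
@dataclass(frozen=True)
class EpDevice:
    ep_name: str
    device_id: int

PLUGIN_EP_NAME = "CudaPluginExecutionProvider"

def select_devices_for_ep(devices: List[EpDevice], ep_name: str = PLUGIN_EP_NAME) -> List[EpDevice]:
    """Keep only the devices exposed under `ep_name`, in their original order."""
    return [d for d in devices if d.ep_name == ep_name]

# Example inventory, assuming the plugin library has already been registered.
devices = [
    EpDevice("CPUExecutionProvider", 0),
    EpDevice(PLUGIN_EP_NAME, 0),
    EpDevice(PLUGIN_EP_NAME, 1),
]
selected = select_devices_for_ep(devices)  # the two CUDA plugin devices
```

If the plugin library was never registered, no device carries that EP name and the filter returns an empty list, which is the condition that sends the flow to its fallbacks.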


+```
+GetDeviceInterface(CUDA)                  // genai-owned device + compute stream
+AddCudaStreamConfig(...)                  // share genai's stream via user_compute_stream
+
+[bundled build only] TryRegisterBundledCudaPluginEp()   // self-register default library
+
+if AppendExecutionProvider_V2("CudaPluginExecutionProvider")   // plugin EP, if registered
+    -> done
+else if AppendExecutionProvider_V2("CUDAExecutionProvider")    // built-in EP, if exposed as a V2 EP device
+    -> done
+else
+    AppendProviderBridgeExecutionProvider(...)                 // built-in provider bridge (CUDA_V2 options)
+```
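The selection order above can be modelled as a small pure function. This is a sketch under stated assumptions: `choose_cuda_path` and its string return values are hypothetical, the device/stream setup steps are omitted, and "available" simply means an EP name that `AppendExecutionProvider_V2` would find.

```python
from typing import Callable, Optional, Set

PLUGIN_EP = "CudaPluginExecutionProvider"
BUILTIN_EP = "CUDAExecutionProvider"

def choose_cuda_path(
    available_v2_eps: Set[str],
    bundled_build: bool,
    try_register_bundled: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Return the path the documented flow would take for the `cuda` provider."""
    eps = set(available_v2_eps)  # copy so the caller's set is left untouched
    if bundled_build and try_register_bundled is not None:
        # Best-effort self-registration: yields the registered EP name, or None on failure.
        registered = try_register_bundled()
        if registered is not None:
            eps.add(registered)
    if PLUGIN_EP in eps:
        return "v2:" + PLUGIN_EP   # plugin EP wins when present
    if BUILTIN_EP in eps:
        return "v2:" + BUILTIN_EP  # built-in EP exposed through V2
    return "provider_bridge"       # legacy built-in provider bridge
```

For example, `choose_cuda_path(set(), bundled_build=True, try_register_bundled=lambda: None)` models a bundled build whose plugin library failed to load, and it ends on the provider bridge.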


+```mermaid
+flowchart TD
+    A[AppendExecutionProvider cuda] --> B[GetDeviceInterface CUDA<br/>AddCudaStreamConfig]
+    B --> C{bundled build?}
+    C -- yes --> D[TryRegisterBundledCudaPluginEp<br/>register default library name]
+    C -- no --> E
+    D --> E{plugin EP registered?<br/>CudaPluginExecutionProvider}
+    E -- yes --> F[AppendExecutionProvider_V2<br/>plugin EP]
+    E -- no --> G{built-in plugged in?<br/>CUDAExecutionProvider}
+    G -- yes --> H[AppendExecutionProvider_V2<br/>built-in EP]
+    G -- no --> I[AppendProviderBridgeExecutionProvider<br/>CUDA_V2 options]
+```


+- C API: `OgaRegisterExecutionProviderLibrary("CudaPluginExecutionProvider", path)`
+- Python: `og.register_execution_provider_library("CudaPluginExecutionProvider", path)`
+- C#: `OrtEnv.Instance().RegisterExecutionProviderLibrary("CudaPluginExecutionProvider", path)`
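All three calls need the full path of the plugin library. The helper below is a hypothetical sketch that builds that path from a plugin directory. The Linux and Windows file names come from this document; the macOS name is an assumption made by analogy with the Linux name.

```python
import os
import platform
from typing import Optional

# Linux/Windows names as given in this document; the macOS name is an assumption.
_PLUGIN_FILE_NAMES = {
    "Linux": "libonnxruntime_providers_cuda_plugin.so",
    "Windows": "onnxruntime_providers_cuda_plugin.dll",
    "Darwin": "libonnxruntime_providers_cuda_plugin.dylib",  # assumed
}

def cuda_plugin_library_path(plugin_dir: str, system: Optional[str] = None) -> str:
    """Join `plugin_dir` with the platform's CUDA plugin library file name."""
    system = system or platform.system()
    try:
        file_name = _PLUGIN_FILE_NAMES[system]
    except KeyError:
        raise ValueError(f"unsupported platform: {system}") from None
    return os.path.join(plugin_dir, file_name)
```

The resulting string would then be passed as `path` to `og.register_execution_provider_library("CudaPluginExecutionProvider", path)` before the model is created.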


+In the bundled layout, genai calls `TryRegisterBundledCudaPluginEp()`, which registers the
+platform default library file name (`libonnxruntime_providers_cuda_plugin.so` /
+`.dylib` / `.dll`). The bare file name is resolved by the OS loader through the
+genai/onnxruntime RPATH (`$ORIGIN` on Linux), so the plugin is found next to
+`libonnxruntime`. Registration is idempotent (deduped by
+`EnsureExecutionProviderLibraryRegistered`) and best-effort: if the library is missing or
+fails to load, a warning is logged and the flow falls back to the built-in CUDA EP.
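The idempotent, best-effort behavior described above can be sketched as follows. `PluginLibraryRegistry` is hypothetical; keying the dedupe on the registration name is an assumption (the document only says registration is deduped by `EnsureExecutionProviderLibraryRegistered`), and the loader is injected so the fallback can be exercised without a GPU.

```python
import logging
from typing import Callable, Set

logger = logging.getLogger("cuda_plugin_registration_sketch")

class PluginLibraryRegistry:
    """Hypothetical model of deduped, best-effort EP library registration."""

    def __init__(self, load_library: Callable[[str], None]) -> None:
        self._load_library = load_library   # raises OSError when the library cannot be loaded
        self._registered: Set[str] = set()  # registration names that succeeded (assumed dedupe key)

    def ensure_registered(self, registration_name: str, library_name: str) -> bool:
        """Register `library_name` once under `registration_name`; never raise on load failure."""
        if registration_name in self._registered:
            return True  # idempotent: repeat calls do not reload the library
        try:
            self._load_library(library_name)
        except OSError as exc:
            # Best-effort: warn and let the caller fall back to the built-in CUDA EP.
            logger.warning("could not load %s (%s); falling back to built-in CUDA EP",
                           library_name, exc)
            return False
        self._registered.add(registration_name)
        return True
```

A failed load is not recorded, so a later call retries; whether the real implementation retries in that case is not stated in this document.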


+### Build option
+
+Declared in [`cmake/options.cmake`](../cmake/options.cmake):
+
+```cmake
+include(CMakeDependentOption)  # provides cmake_dependent_option()
+cmake_dependent_option(REGISTER_BUNDLED_CUDA_PLUGIN_EP
+  "Auto-register the bundled CUDA plugin EP library" OFF "USE_CUDA" OFF)
+```
+
+When `ON`, [`cmake/check_cuda.cmake`](../cmake/check_cuda.cmake) adds the compile
+definition `ORTGENAI_REGISTER_BUNDLED_CUDA_PLUGIN_EP=1`, which gates the
+`TryRegisterBundledCudaPluginEp()` call and the default-library-name constants in
+`session_options.cpp`. The option is only available when `USE_CUDA` is on (otherwise it is
+forced `OFF`) and defaults to `OFF`, so the default build uses the "caller registers
+out-of-band" model.
+
+Build examples:
+
+```bash
+# Default: separate-directory layout, caller registers the plugin out-of-band.
+python build.py --use_cuda
+
+# Bundled layout: genai self-registers the plugin shipped next to libonnxruntime.
+python build.py --use_cuda --cmake_extra_defines REGISTER_BUNDLED_CUDA_PLUGIN_EP=ON
+```
+
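How the option resolves into the compile definition can be modelled in a few lines. This is a simplified sketch: `dependent_option` mirrors only the resulting value of `cmake_dependent_option` (user value or default when the dependency holds, otherwise the forced value) and ignores cache visibility, and both function names are hypothetical.

```python
from typing import List, Optional

def dependent_option(depends_satisfied: bool, user_value: Optional[bool],
                     default: bool, force: bool) -> bool:
    """Simplified cmake_dependent_option(): user value or default if deps hold, else `force`."""
    if not depends_satisfied:
        return force
    return default if user_value is None else user_value

def cuda_plugin_compile_definitions(use_cuda: bool,
                                    register_bundled: Optional[bool] = None) -> List[str]:
    """Definitions the documented build logic would emit for the bundled-registration option."""
    enabled = dependent_option(use_cuda, register_bundled, default=False, force=False)
    return ["ORTGENAI_REGISTER_BUNDLED_CUDA_PLUGIN_EP=1"] if enabled else []
```

The two `build.py` invocations above correspond to `cuda_plugin_compile_definitions(True)` and `cuda_plugin_compile_definitions(True, True)`; requesting the option without CUDA yields no definition.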


+| EP | V2 registration name | V1 / built-in fallback | Library registration |
+| --- | --- | --- | --- |
+| WebGPU | `WebGpuExecutionProvider` | legacy by name | auto-surfaced by libonnxruntime (in-tree) |
+| NvTensorRtRtx | `NvTensorRTRTXExecutionProvider` | legacy by name | caller, out-of-band |
+| CUDA | `CudaPluginExecutionProvider` | `CUDAExecutionProvider` via V2, else built-in provider bridge | caller, out-of-band (default) **or** genai self-registration (bundled build option) |


+## Files changed
+
+| File | Change |
+| --- | --- |
+| [`cmake/options.cmake`](../cmake/options.cmake) | Added `REGISTER_BUNDLED_CUDA_PLUGIN_EP` option (default `OFF`). |
+| [`cmake/check_cuda.cmake`](../cmake/check_cuda.cmake) | Emits `ORTGENAI_REGISTER_BUNDLED_CUDA_PLUGIN_EP=1` when the option is `ON`. |
+| [`src/cuda/session_options.cpp`](../src/cuda/session_options.cpp) | Try-plugin-then-built-in selection; bundled self-registration guarded by the compile definition. |


+- **Separate-directory layout (default build):** register the plugin out-of-band, e.g.
+  `og.register_execution_provider_library("CudaPluginExecutionProvider", "<plugin dir>/libonnxruntime_providers_cuda_plugin.so")`,
+  then run a model with the `cuda` provider and confirm the plugin EP is selected (the
+  graph capture/replay log lines mention `CudaPluginExecutionProvider`).
+- **Bundled layout (`REGISTER_BUNDLED_CUDA_PLUGIN_EP=ON`):** place the plugin library next
+  to `libonnxruntime`, run a model with no caller-side registration, and confirm the plugin
+  EP is selected automatically.
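Both checks come down to seeing which CUDA EP name shows up in the logs. The helper below is a hypothetical sketch; the sample log lines in its usage are invented for illustration, since this document only says the graph capture/replay log lines mention the EP name.

```python
from typing import Iterable, Set

CUDA_EP_NAMES = ("CudaPluginExecutionProvider", "CUDAExecutionProvider")

def cuda_eps_mentioned(log_lines: Iterable[str]) -> Set[str]:
    """Return which CUDA EP names appear anywhere in the given log lines."""
    found: Set[str] = set()
    for line in log_lines:
        for name in CUDA_EP_NAMES:
            if name in line:  # case-sensitive; neither name is a substring of the other
                found.add(name)
    return found
```

In both scenarios the expected result is `{"CudaPluginExecutionProvider"}`; seeing `CUDAExecutionProvider` instead would suggest the flow fell back to the built-in EP.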


Commit `3302c85`: add doc

Copilot AI review requested due to automatic review settings June 22, 2026 18:45

Copilot started reviewing on behalf of tianleiwu June 22, 2026 18:47

Copilot AI reviewed Jun 22, 2026

tianleiwu marked this pull request as draft June 24, 2026 00:31

tianleiwu wants to merge 1 commit into `main` from `tlwu/cuda_plug_ep_integration`.
