ONNX Runtime
SD.Next includes support for ONNX Runtime.
--use-directml is currently not available because torch-directml is not released for the latest PyTorch.
This does not prevent use of DmlExecutionProvider.
Set Diffusers pipeline to ONNX Stable Diffusion on the System tab.
The performance depends on the execution provider.
Currently, CUDAExecutionProvider and DmlExecutionProvider are supported.
| Provider | ONNX | Olive | GPU | CPU |
|---|---|---|---|---|
| CPUExecutionProvider | ✅ | ❌ | ❌ | ✅ |
| DmlExecutionProvider | ✅ | ✅ | ✅ | ❌ |
| CUDAExecutionProvider | ✅ | ✅ | ✅ | ❌ |
| ROCMExecutionProvider | ✅ | 🚧 | ✅ | ❌ |
| OpenVINOExecutionProvider | ✅ | ✅ | ✅ | ✅ |
CPUExecutionProvider is not recommended, but it is enabled by default.
You can select DmlExecutionProvider by installing onnxruntime-directml.
DirectX 12 is required (Windows or WSL).
You can select CUDAExecutionProvider by installing onnxruntime-gpu (it may already be installed).
Olive support for ROCm is a work in progress.
OpenVINOExecutionProvider is under development.
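To see which of these providers your environment actually exposes, you can ask ONNX Runtime directly. The sketch below is illustrative only: `pick_provider` and its preference order are assumptions made for this example, not SD.Next's own selection logic. `onnxruntime.get_available_providers()` is the real ONNX Runtime call, and the import is guarded so the script still runs when the package is missing.

```python
# Hypothetical preference order for illustration; adjust to your hardware.
PREFERRED = [
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "ROCMExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]


def available_providers():
    """Return the providers reported by ONNX Runtime, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def pick_provider(available, preferred=PREFERRED):
    """Return the first preferred provider that is available, else the CPU provider."""
    for name in preferred:
        if name in available:
            return name
    return "CPUExecutionProvider"


if __name__ == "__main__":
    found = available_providers()
    print("Available:", found)
    print("Would use:", pick_provider(found))
```

If the output lists only CPUExecutionProvider, the GPU-enabled package (onnxruntime-directml or onnxruntime-gpu) is probably not the one installed in the active venv.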
Supported:
- Models from Hugging Face
- Hires and second pass (without the SDXL refiner)
- .safetensors VAE

Known issues:
- SD Inpaint may not work.
- The SD Upscale pipeline is not tested.
- The SDXL Refiner does not work (due to an onnxruntime issue).
I'm getting OnnxStableDiffusionPipeline.__init__() missing 4 required positional arguments: 'vae_encoder', 'vae_decoder', 'text_encoder', and 'unet'
This is usually caused by a broken model cache generated by a failed conversion or Olive run.
Remove the affected cache in models/ONNX/cache.
You can also manage cache from the ONNX tab in the UI (enable it in settings if hidden).
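If you prefer to clean the cache from a script, something like the following could work. This is a minimal sketch: `clear_onnx_cache` is a hypothetical helper written for this page, and it assumes each cached model is a separate entry directly under models/ONNX/cache whose name contains the model name. Check the folder contents before deleting anything; the function defaults to a dry run for that reason.

```python
import shutil
from pathlib import Path


def clear_onnx_cache(cache_dir, model_name=None, dry_run=True):
    """List cache entries matching model_name; delete them when dry_run is False.

    Assumption: every cached model is one file or folder directly inside cache_dir.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    matched = []
    for entry in sorted(cache_dir.iterdir()):
        # With no model_name given, every entry matches.
        if model_name is not None and model_name not in entry.name:
            continue
        matched.append(entry.name)
        if not dry_run:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return matched


if __name__ == "__main__":
    # Dry run: only prints what would be removed.
    print(clear_onnx_cache("models/ONNX/cache"))
```

Run it once with the default `dry_run=True` to review the list, then again with `dry_run=False` to remove the entries.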
Olive is a hardware-aware model optimization tool that combines compression, optimization, and compilation (from PyPI).
As Olive optimizes the models in ONNX format, you should set up ONNX Runtime first.
- Go to System tab → Compute Settings.
- Select Model, Text Encoder and VAE in Compile Model.
- Set Model compile backend to olive-ai.
Olive-specific settings are under Olive in Compute Settings.
Run these commands using PowerShell.
.\venv\Scripts\activate
pip uninstall torch-directml
pip install torch torchvision --upgrade
pip install onnxruntime-directml
.\webui.bat

Model optimization occurs automatically before generation.
Supported model inputs include .safetensors, .ckpt, and Diffusers pretrained models.
Optimization time depends on your system and execution provider.
The optimized models are automatically cached and used later to create images of the same size (height and width).
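The effect of size-specific caching can be pictured as a lookup keyed by model and image dimensions. The class below is a conceptual sketch, not SD.Next's implementation: `OptimizedModelCache` and the `optimize` callback are hypothetical names, and the real cache lives on disk rather than in memory.

```python
from typing import Callable, Dict, Tuple

# Assumed cache key: (model name, height, width).
CacheKey = Tuple[str, int, int]


class OptimizedModelCache:
    """Reuse an optimized model only when model name, height and width all match."""

    def __init__(self, optimize: Callable[[str, int, int], str]):
        self._optimize = optimize  # stand-in for the slow optimization step
        self._entries: Dict[CacheKey, str] = {}
        self.misses = 0  # how many times optimization actually ran

    def get(self, model: str, height: int, width: int) -> str:
        key = (model, height, width)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._optimize(model, height, width)
        return self._entries[key]


if __name__ == "__main__":
    cache = OptimizedModelCache(lambda m, h, w: f"{m}-{h}x{w}")
    cache.get("sd15", 512, 512)  # optimizes
    cache.get("sd15", 512, 512)  # served from cache
    cache.get("sd15", 768, 512)  # new size, optimizes again
    print("optimization runs:", cache.misses)
```

In practice this means the first generation at each new resolution pays the optimization cost, while later generations at the same resolution start right away.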
If your system does not have enough memory for local optimization, or you want to skip local optimization, download an optimized model from Hugging Face.
Go to the Models → Huggingface tab and download an optimized model.
TBA
| Property | Value |
|---|---|
| Prompt | a castle, best quality |
| Negative Prompt | worst quality |
| Sampler | Euler |
| Sampling Steps | 20 |
| Device | RX 7900 XTX 24GB |
| Version | olive-ai(0.4.0) onnxruntime-directml(1.16.3) ROCm(5.6) torch(olive: 2.1.2, rocm: 2.1.0) |
| Model | runwayml/stable-diffusion-v1-5 (ROCm), lshqqytiger/stable-diffusion-v1-5-olive (Olive) |
| Precision | fp16 |
| Token Merging | Olive(0, not supported) ROCm(0.5) |
(Image comparison: Olive with DmlExecutionProvider on the left, ROCm on the right.)
Pros:
- Generation is faster.
- Uses less graphics memory.

Cons:
- Optimization is required for every model and image size.
- Some features are unavailable.
After activating the Python venv, run this command and try again:
(venv) $ pip uninstall onnxruntime onnxruntime-... -y
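Before uninstalling, it can help to see which onnxruntime variants are present in the venv. The snippet below is a small sketch using the standard library's `importlib.metadata`; the helper names are hypothetical, and it assumes the relevant packages all have names starting with `onnxruntime`.

```python
from importlib import metadata


def installed_distribution_names():
    """Return the names of all distributions installed in the current environment."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]


def onnxruntime_packages(names):
    """Filter a list of package names down to those starting with 'onnxruntime'."""
    return sorted({n for n in names if n and n.lower().startswith("onnxruntime")})


if __name__ == "__main__":
    # Run inside the activated venv so the right environment is inspected.
    for name in onnxruntime_packages(installed_distribution_names()):
        print(name)
```

Each name printed is a candidate for the `pip uninstall` command above.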
