
support npu multinomial #10668

Merged: ShawnXuan merged 2 commits into master from support_npu_multinomial on Aug 20, 2025

@ShawnXuan

Collaborator

No description provided.

@github-actions

Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions

Contributor
Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.2ms (= 4323.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 58.1ms (= 5810.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.34 (= 58.1ms / 43.2ms)

OneFlow resnet50 time: 26.5ms (= 2646.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3743.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.41 (= 37.4ms / 26.5ms)

OneFlow resnet50 time: 17.8ms (= 3552.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.3ms (= 7064.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.99 (= 35.3ms / 17.8ms)

OneFlow resnet50 time: 15.6ms (= 3120.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 32.1ms (= 6421.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 2.06 (= 32.1ms / 15.6ms)

OneFlow resnet50 time: 15.0ms (= 3007.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.5ms (= 5897.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.96 (= 29.5ms / 15.0ms)

OneFlow swin dataloader time: 0.212s (= 42.444s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.684s / 200, num_workers=1)
Relative speed: 0.605 (= 0.128s / 0.212s)

OneFlow swin dataloader time: 0.057s (= 11.372s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.468s / 200, num_workers=4)
Relative speed: 0.569 (= 0.032s / 0.057s)

OneFlow swin dataloader time: 0.030s (= 5.974s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.388s / 200, num_workers=8)
Relative speed: 0.567 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 52.1ms (= 5210.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.4ms (= 6644.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 66.4ms / 52.1ms)

OneFlow resnet50 time: 37.2ms (= 3722.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 52.6ms (= 5255.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.41 (= 52.6ms / 37.2ms)

OneFlow resnet50 time: 27.9ms (= 5576.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 42.1ms (= 8425.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 42.1ms / 27.9ms)

OneFlow resnet50 time: 27.0ms (= 5397.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.6ms (= 8113.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.50 (= 40.6ms / 27.0ms)

OneFlow resnet50 time: 27.0ms (= 5407.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.8ms (= 7362.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 36.8ms / 27.0ms)
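
For readers checking the figures above, here is a minimal sketch of how each "Relative speed" line can be reproduced from the reported totals. The function name `relative_speed` is hypothetical and is not taken from the CI script. It assumes the ratio is PyTorch time divided by OneFlow time, computed from the unrounded per-iteration times; that assumption fits the batch-size-4 case, where the rounded values 35.3 / 17.8 would give about 1.98 rather than the reported 1.99.

```python
def relative_speed(oneflow_total: float, pytorch_total: float, iters: int) -> float:
    """Return PyTorch per-iteration time divided by OneFlow per-iteration time.

    A value above 1 means OneFlow was faster in that run. Both totals must use
    the same unit (ms or s) and the same iteration count.
    """
    oneflow_per_iter = oneflow_total / iters
    pytorch_per_iter = pytorch_total / iters
    return pytorch_per_iter / oneflow_per_iter


# resnet50, input_shape=[4, 3, 224, 224]: totals in ms over 200 iterations
print(round(relative_speed(3552.7, 7064.8, 200), 2))  # 1.99

# swin dataloader, num_workers=1: totals in s over 200 iterations
print(round(relative_speed(42.444, 25.684, 200), 3))  # 0.605
```

By this measure the resnet50 runs favour OneFlow (ratios above 1), while the swin dataloader runs favour PyTorch (ratios below 1).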

@github-actions

Contributor
Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.7ms (= 4370.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.3ms (= 5729.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.31 (= 57.3ms / 43.7ms)

OneFlow resnet50 time: 26.3ms (= 2626.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.3ms (= 3834.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.46 (= 38.3ms / 26.3ms)

OneFlow resnet50 time: 19.2ms (= 3844.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.0ms (= 7209.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.88 (= 36.0ms / 19.2ms)

OneFlow resnet50 time: 17.6ms (= 3512.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 33.6ms (= 6727.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.92 (= 33.6ms / 17.6ms)

OneFlow resnet50 time: 17.4ms (= 3483.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.4ms (= 5674.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.63 (= 28.4ms / 17.4ms)

OneFlow swin dataloader time: 0.200s (= 40.030s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.397s / 200, num_workers=1)
Relative speed: 0.634 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.055s (= 11.030s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.565s / 200, num_workers=4)
Relative speed: 0.595 (= 0.033s / 0.055s)

OneFlow swin dataloader time: 0.030s (= 6.038s / 200, num_workers=8)
PyTorch swin dataloader time: 0.016s (= 3.285s / 200, num_workers=8)
Relative speed: 0.544 (= 0.016s / 0.030s)

❌ OneFlow resnet50 time: 49.4ms (= 4944.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.9ms (= 6594.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 65.9ms / 49.4ms)

OneFlow resnet50 time: 36.7ms (= 3666.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.4ms (= 4637.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 46.4ms / 36.7ms)

OneFlow resnet50 time: 27.8ms (= 5560.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.9ms (= 7782.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 38.9ms / 27.8ms)

OneFlow resnet50 time: 25.3ms (= 5053.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.8ms (= 7758.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 38.8ms / 25.3ms)

OneFlow resnet50 time: 24.8ms (= 4954.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.8ms (= 7158.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 35.8ms / 24.8ms)

@ShawnXuan merged commit 25c8978 into master on Aug 20, 2025
20 checks passed
@ShawnXuan deleted the support_npu_multinomial branch on August 20, 2025 10:29
