vulkan int8 kernels by nihui · Pull Request #6751 · Tencent/ncnn

nihui · 2026-05-28T02:35:02Z

No description provided.

Implement Vulkan int8 paths for convolution, convolutiondepthwise, innerproduct, and gemm with scalar fallback and integer dot product shader branches. Add goal.md and enable existing int8 unit tests for Vulkan coverage.
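As a rough illustration of the two branches named above, the following CPU sketch compares a scalar int8 accumulation with an emulated packed 4x8-bit signed dot product. It is not code from this PR: `dot_scalar`, `pack4`, and `dot_packed4x8` are hypothetical names, and the emulation only mimics what integer dot product hardware would do in one instruction.

```cpp
#include <cassert>
#include <cstdint>

// Scalar fallback: accumulate int8 products into a 32-bit integer.
int32_t dot_scalar(const int8_t* a, const int8_t* b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}

// Pack 4 signed bytes into one 32-bit word, lowest byte first.
uint32_t pack4(const int8_t* v)
{
    return (uint32_t)(uint8_t)v[0] | ((uint32_t)(uint8_t)v[1] << 8)
           | ((uint32_t)(uint8_t)v[2] << 16) | ((uint32_t)(uint8_t)v[3] << 24);
}

// CPU emulation of a packed 4x8-bit signed dot product (hypothetical helper).
int32_t dot_packed4x8(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int k = 0; k < 4; k++)
    {
        // Extract byte k from each word and reinterpret it as signed.
        const int8_t x = (int8_t)(uint8_t)(a >> (8 * k));
        const int8_t y = (int8_t)(uint8_t)(b >> (8 * k));
        sum += (int32_t)x * (int32_t)y;
    }
    return sum;
}
```

Both functions should agree on any 4-element input, which is the property a scalar fallback has to preserve.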

tencent-adm · 2026-05-28T02:35:18Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2026-05-28T02:37:53Z

Codecov Report

❌ Patch coverage is 75.15734% with 750 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.79%. Comparing base (a5392dd) to head (e9c21e8).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
src/layer/vulkan/convolution_vulkan.cpp	66.40%	387 Missing ⚠️
src/layer/vulkan/gemm_vulkan.cpp	52.81%	268 Missing ⚠️
src/layer/vulkan/padding_vulkan.cpp	86.18%	50 Missing ⚠️
src/gpu.cpp	88.23%	10 Missing ⚠️
src/layer/vulkan/convolutiondepthwise_vulkan.cpp	97.76%	10 Missing ⚠️
src/layer/vulkan/innerproduct_vulkan.cpp	96.96%	9 Missing ⚠️
src/layer/x86/convolution_x86.cpp	55.55%	4 Missing ⚠️
src/layer/arm/convolution_arm.cpp	60.00%	2 Missing ⚠️
src/layer/convolution.cpp	60.00%	2 Missing ⚠️
src/layer/loongarch/convolution_loongarch.cpp	71.42%	2 Missing ⚠️
... and 3 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6751      +/-   ##
==========================================
- Coverage   95.49%   94.79%   -0.70%     
==========================================
  Files         946      946              
  Lines      410295   413288    +2993     
==========================================
- Hits       391791   391777      -14     
- Misses      18504    21511    +3007


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae1231abec

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-11T05:10:17Z

+
+    const int elempack = opt.use_packing_layout && num_input % 4 == 0 ? 4 : 1;
+
+    if (!bottom_is_int8)


Repack existing int8 convolution inputs

When the input is already an 8-bit VkMat, this path skips the quantize helper that would create the num_input % 4 == 0 ? 4 : 1 layout, and it never converts bottom.elempack to the elempack specialization used by the int8 convolution shaders. For an int8 pack1 producer feeding a convolution whose input channel count is divisible by 4, the channel-count check still passes, but convolution_packed_int8.comp takes its elempack == 4 branch and reads the pack1 buffer through the bottom_blob_4 binding. Mirror the depthwise path by converting already-int8 inputs to the expected pack before dispatch.
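To make the layout mismatch concrete, here is a minimal CPU sketch of a pack1 to pack4 conversion. It assumes the usual planar versus interleaved layouts and ignores the channel-step padding of real `Mat`/`VkMat` storage. `repack_pack1_to_pack4` and `expected_elempack` are hypothetical helpers, not ncnn functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of ncnn: convert a planar (pack1) int8 blob
// with `channels` planes of `size` elements each into an interleaved pack4
// blob, where every spatial position stores 4 consecutive channels together.
std::vector<int8_t> repack_pack1_to_pack4(const std::vector<int8_t>& src, int channels, int size)
{
    assert(channels % 4 == 0);
    assert((int)src.size() == channels * size);

    std::vector<int8_t> dst(src.size());
    for (int q = 0; q < channels / 4; q++)
    {
        for (int i = 0; i < size; i++)
        {
            for (int k = 0; k < 4; k++)
            {
                // pack1 index is channel-major; pack4 index is (group, position, lane).
                dst[(q * size + i) * 4 + k] = src[(q * 4 + k) * size + i];
            }
        }
    }
    return dst;
}

// The pack selection from the quoted diff line, isolated for illustration.
int expected_elempack(bool use_packing_layout, int num_input)
{
    return use_packing_layout && num_input % 4 == 0 ? 4 : 1;
}
```

A shader that indexes the buffer as pack4 while the data is still in the pack1 order would read the wrong channel for most lanes, which is why the input pack should match `expected_elempack` before dispatch.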


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f11fe82ca


chatgpt-codex-connector · 2026-06-11T07:29:30Z

+    int out_elempack = 1;
+    if (use_cooperative_matrix)


Honor output_elempack without cooperative matrix

When int8 Gemm runs the non-cooperative path (for example on devices without int8 cooperative matrix support, or when cooperative matrices are disabled) and the model sets output_elempack to 4, this leaves out_elempack at 1 and allocates a scalar output even though the caller requested pack4. The regular Vulkan Gemm applies output_elempack before allocation, while gemm_int8.comp only writes a scalar sfp buffer, so downstream layers/shapes that expect the requested pack4 layout receive the wrong layout.
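One way to read the suggestion is sketched below. This is a hypothetical selection function, not the ncnn implementation. It assumes that a `requested` value of 0 means no explicit `output_elempack`, and that the cooperative matrix path defaults to pack4 when the row count allows it.

```cpp
#include <cassert>

// Hypothetical sketch: choose the output pack for a Gemm with M output rows.
// `requested` mirrors the layer's output_elempack parameter
// (0 is assumed to mean "no explicit request").
int resolve_out_elempack(int M, int requested, bool use_cooperative_matrix)
{
    int out_elempack = 1;

    if (use_cooperative_matrix)
    {
        // Assumed default for the cooperative matrix path.
        out_elempack = M % 4 == 0 ? 4 : 1;
    }

    // Honor an explicit request on every path, not only the cooperative one.
    if (requested != 0)
        out_elempack = requested;

    return out_elempack;
}
```

If the shader can only write scalars, the requested pack would still need a conversion step after the dispatch; the sketch covers only the allocation decision.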


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12dcacfc4e


chatgpt-codex-connector · 2026-06-11T11:19:15Z

+
+    sumfp32 = activation_afpvec4(sumfp32, activation_type, activation_param_0, activation_param_1);
+
+    buffer_st4(top_blob_data, gx / 4, sumfp32);


Guard tail output lanes for scalar int8 InnerProduct

When the non-GEMM int8 InnerProduct path has num_output not divisible by 4 and the sum8 path is not used, forward_int8 allocates top_blob with out_elempack == 1, but this shader still dispatches ceil(num_output/4) groups and writes a full sfpvec4 for the last group. For example num_output == 5 allocates 5 scalar outputs while the second invocation writes lanes 4..7, overrunning or corrupting data beyond the VkMat; the tail needs scalar/bounded stores or the output must be allocated as padded pack4.
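The bounded-store option can be emulated on the CPU as follows. This is a sketch under stated assumptions, not shader code from the PR: `store_group_bounded` and `group_count` are hypothetical names, and each call stands in for one invocation that has computed four lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CPU emulation of one invocation's store. `group` is the
// per-invocation output group index and `lanes` holds its 4 computed values.
void store_group_bounded(std::vector<float>& top, int group, const float lanes[4])
{
    const int num_output = (int)top.size();
    const int base = group * 4;
    // Only lanes that map to real outputs are written; may be 0 or negative.
    const int valid = std::min(4, num_output - base);
    for (int k = 0; k < valid; k++)
        top[base + k] = lanes[k];
}

// Number of groups dispatched for num_output scalars, 4 lanes each.
int group_count(int num_output)
{
    return (num_output + 3) / 4;
}
```

With `num_output == 5`, two groups run and the second one writes a single lane, so nothing lands past the fifth element.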


chatgpt-codex-connector · 2026-06-11T11:19:15Z

+
+    sumfp32 = activation_afpvec4(sumfp32, activation_type, activation_param_0, activation_param_1);
+
+    buffer_st4(top_blob_data, gx, sumfp32);


Guard tail output lanes in sum8 reduction

When the large-input int8 InnerProduct uses the sum8 path and num_output is not divisible by 4, top_blob is still allocated as scalar (out_elempack == 1), but the reducer writes one full sfpvec4 for each rounded output group. A case like num_output == 5 makes the final reducer invocation store four lanes even though only one output remains, so the shader writes past the scalar output buffer; mirror the tail handling here or allocate a padded pack4 output.
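The padded-allocation alternative could look like the CPU sketch below. It assumes, from the name only, that the sum8 path leaves 8 partial sums per output to be reduced; the layout `partial[p * padded + o]` and both function names are hypothetical and not taken from ncnn.

```cpp
#include <cassert>
#include <vector>

// Round the output count up to a multiple of 4 so full vec4 stores stay in bounds.
int padded_output_count(int num_output)
{
    return (num_output + 3) / 4 * 4;
}

// Hypothetical reducer: every group writes all 4 lanes into a padded buffer,
// and the padding is dropped afterwards. The factor 8 is an assumption.
std::vector<float> reduce_sum8_padded(const std::vector<float>& partial, int num_output)
{
    const int padded = padded_output_count(num_output);
    assert((int)partial.size() == 8 * padded);

    std::vector<float> top(padded, 0.f);
    for (int g = 0; g < padded / 4; g++)
    {
        for (int k = 0; k < 4; k++)
        {
            float sum = 0.f;
            for (int p = 0; p < 8; p++)
                sum += partial[p * padded + g * 4 + k];
            top[g * 4 + k] = sum; // in bounds because the buffer is padded
        }
    }

    top.resize(num_output); // keep only the real outputs
    return top;
}
```

The trade-off against bounded stores is a few wasted elements in exchange for a reducer with no tail branch.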


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 520fd75d5f


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf9019e11d


nihui · 2026-06-17T07:59:41Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9c21e857a


vulkan: add int8 kernels for core layers

afa0a11


github-actions Bot added core vulkan test labels May 28, 2026

nihui marked this pull request as draft May 28, 2026 02:35

nihui added 6 commits May 28, 2026 16:35

wip

3203318

wip

fd24d89

w

ef9bf88

w

511d385

w

9daec25

doc

48b2b89

github-actions Bot added the doc label May 28, 2026

Merge branch 'master' into vulkan-int-2

fbf4170

nihui mentioned this pull request May 29, 2026

feature plan #6753

Open

6 tasks

w

670ab6b

github-actions Bot added layer x86 labels May 29, 2026

nihui added 2 commits May 29, 2026 16:28

wip

6a7cad2

w

5c1a6c7

github-actions Bot added riscv arm loongarch mips labels Jun 1, 2026

nihui and others added 3 commits June 1, 2026 14:39

w

4bf0cc3

w

e6fc454

apply code-format changes

56479b6

nihui closed this Jun 1, 2026

nihui reopened this Jun 1, 2026

nihui added 2 commits June 11, 2026 12:39

w

cd120be

f

a9dcb3f

nihui force-pushed the vulkan-int-2 branch from c626f96 to a9dcb3f Compare June 11, 2026 05:03

apply code-format changes

ae1231a

chatgpt-codex-connector Bot reviewed Jun 11, 2026


nihui added 3 commits June 11, 2026 13:24

w

e53d306

Merge branch 'vulkan-int-2' of https://github.com/nihui/ncnn into vulkan-int-2

c96f47f

w

9f11fe8

chatgpt-codex-connector Bot reviewed Jun 11, 2026


nihui added 4 commits June 11, 2026 16:07

b

ea98740

b

0acc622

w

5c7a631

Revert "w"

12dcacf

This reverts commit 5c7a631.

chatgpt-codex-connector Bot reviewed Jun 11, 2026


nihui added 4 commits June 11, 2026 19:22

w

1c660ec

w

a0f5504

vulkan: keep int8 convolution bias in fp32

a6d87e8

cc

520fd75

chatgpt-codex-connector Bot reviewed Jun 16, 2026


Comment thread src/layer/vulkan/shader/padding_pack4_int8.comp

nihui and others added 6 commits June 16, 2026 16:55

w

c42faab

apply code-format changes

5a1c4dc

cc

7030978

Merge branch 'master' into vulkan-int-2

14e2d4a

f

a5b3314

f

cf9019e

chatgpt-codex-connector Bot reviewed Jun 17, 2026


Comment thread src/layer/convolution.cpp Outdated

f

e9c21e8

chatgpt-codex-connector Bot reviewed Jun 17, 2026


Comment thread src/gpu.cpp

nihui merged commit f6f734f into Tencent:master Jun 17, 2026
125 of 132 checks passed

