vulkan int8 kernels #6751
Implement Vulkan int8 paths for convolution, convolutiondepthwise, innerproduct, and gemm with scalar fallback and integer dot product shader branches. Add goal.md and enable existing int8 unit tests for Vulkan coverage.
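The description mentions a scalar fallback next to the integer dot product shader branches. As a hedged illustration (not the PR's shader code), the C++ sketch below shows what a scalar fallback for a packed 4x int8 dot product has to compute: four signed 8-bit lanes packed into one 32-bit word, multiplied lane by lane and accumulated in 32 bits. The helper names `pack_i8x4`, `lane_i8` and `dot4_i8_scalar` are hypothetical, and the lane order (lane 0 in the lowest byte) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four signed 8-bit lanes into one 32-bit word.
// Assumption: lane 0 occupies the lowest byte.
inline uint32_t pack_i8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3)
{
    return (uint32_t)(uint8_t)x0
           | ((uint32_t)(uint8_t)x1 << 8)
           | ((uint32_t)(uint8_t)x2 << 16)
           | ((uint32_t)(uint8_t)x3 << 24);
}

// Extract lane i (0..3) and sign-extend it to 32 bits.
inline int32_t lane_i8(uint32_t w, int i)
{
    return (int32_t)(int8_t)(uint8_t)((w >> (8 * i)) & 0xffu);
}

// Scalar fallback: multiply the four lane pairs and add them to a 32-bit
// accumulator, i.e. a non-saturating signed 4x8-bit dot product.
inline int32_t dot4_i8_scalar(uint32_t a, uint32_t b, int32_t acc)
{
    for (int i = 0; i < 4; i++)
        acc += lane_i8(a, i) * lane_i8(b, i);
    return acc;
}
```

On devices that expose native integer dot product support (for example via `VK_KHR_shader_integer_dot_product`), the shader can take the hardware branch instead; whichever branch runs, it has to produce the same integer sums so both pass the same int8 unit tests.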
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #6751      +/-   ##
==========================================
- Coverage   95.49%   94.79%   -0.70%
==========================================
  Files         946      946
  Lines      410295   413288    +2993
==========================================
- Hits       391791   391777      -14
- Misses      18504    21511    +3007
```
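The report's numbers are internally consistent and show where the drop comes from: the patch adds 2993 lines, misses grow by 3007 and hits fall by 14, so on balance the new code adds no covered lines in the measured runs. A small sketch of the arithmetic, assuming coverage is hits / (hits + misses) and that the displayed percentage is truncated (not rounded) to two decimals, which is what these figures imply (94.795% is shown as 94.79%):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Coverage in percent: the share of tracked lines that were executed.
inline double coverage_percent(int64_t hits, int64_t misses)
{
    return 100.0 * (double)hits / (double)(hits + misses);
}

// Truncate (not round) to two decimal places.
inline double truncate2(double x)
{
    return std::floor(x * 100.0) / 100.0;
}
```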
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae1231abec
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```cpp
const int elempack = opt.use_packing_layout && num_input % 4 == 0 ? 4 : 1;

if (!bottom_is_int8)
```
Repack existing int8 convolution inputs
When the input is already an 8-bit `VkMat`, this path skips the quantize helper that would create the `num_input % 4 == 0 ? 4 : 1` layout, and it never converts `bottom.elempack` to the elempack specialization used by the int8 convolution shaders. For an int8 pack1 producer feeding a convolution whose input channel count is divisible by 4, the channel-count check still passes, but `convolution_packed_int8.comp` takes its `elempack == 4` branch and reads the pack1 buffer through the `bottom_blob_4` binding. Mirror the depthwise path by converting already-int8 inputs to the expected pack before dispatch.
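To make the suggested fix concrete, here is a CPU-side C++ sketch of the two pieces involved: the elempack the int8 convolution expects (reusing the selection expression from the diff line the comment is attached to) and a pack1-to-pack4 repack of an int8 blob. Both functions are hypothetical, not ncnn's API; the repack assumes a dense channel-major layout and ignores the per-channel `cstep` alignment a real `VkMat` has.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same selection as the diff line: pack4 only when packing is enabled
// and the input channel count is divisible by 4.
inline int expected_elempack(bool use_packing_layout, int num_input)
{
    return use_packing_layout && num_input % 4 == 0 ? 4 : 1;
}

// Hypothetical CPU reference: repack an int8 blob from pack1
// (channels x size, channel-major) to pack4 (channels/4 x size x 4).
// Assumes channels % 4 == 0 and no per-channel padding.
std::vector<int8_t> repack_pack1_to_pack4(const std::vector<int8_t>& src, int channels, int size)
{
    std::vector<int8_t> dst(src.size());
    for (int q = 0; q < channels / 4; q++)
        for (int i = 0; i < size; i++)
            for (int k = 0; k < 4; k++)
                dst[(q * size + i) * 4 + k] = src[(q * 4 + k) * size + i];
    return dst;
}
```

In the layer itself the same decision would happen before dispatch: if `bottom.elempack` differs from the expected value, convert the blob (as the depthwise path does, per the review) rather than letting the shader reinterpret a pack1 buffer as pack4.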
💡 Codex Review
Reviewed commit: 9f11fe82ca
```cpp
int out_elempack = 1;
if (use_cooperative_matrix)
```
Honor output_elempack without cooperative matrix
When int8 Gemm runs the non-cooperative path (for example, on devices without int8 cooperative matrix support, or when cooperative matrices are disabled) and the model sets `output_elempack` to 4, this code leaves `out_elempack` at 1 and allocates a scalar output even though the caller requested pack4. The regular Vulkan Gemm applies `output_elempack` before allocation, while `gemm_int8.comp` only writes a scalar `sfp` buffer, so downstream layers and shape logic that expect the requested pack4 layout receive the wrong layout.
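A hedged sketch of the allocation-side decision the review asks for, in C++: an explicit `output_elempack` request wins on every path, and the cooperative-matrix choice only matters when nothing was requested. `resolve_out_elempack` is a hypothetical name; treating `output_elempack == 0` as "no request" and the pack4 default on the cooperative path are assumptions, not taken from the PR.

```cpp
#include <cassert>

// Hypothetical resolution of the output packing for int8 Gemm.
// Assumption: output_elempack == 0 means "no explicit request".
// Assumption: the cooperative-matrix path prefers pack4 when outh allows it.
inline int resolve_out_elempack(int output_elempack, bool use_cooperative_matrix,
                                bool use_packing_layout, int outh)
{
    if (output_elempack != 0)
        return output_elempack; // honor the request regardless of the path
    if (use_cooperative_matrix && use_packing_layout && outh % 4 == 0)
        return 4;
    return 1; // non-cooperative default, matching `int out_elempack = 1;`
}
```

Allocating pack4 on the non-cooperative path is only half of the fix: since the review notes that `gemm_int8.comp` writes a scalar buffer, that path would also need a pack4 store (or a conversion after the dispatch) so the data matches the allocated layout.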
💡 Codex Review
Reviewed commit: 12dcacfc4e
```glsl
sumfp32 = activation_afpvec4(sumfp32, activation_type, activation_param_0, activation_param_1);

buffer_st4(top_blob_data, gx / 4, sumfp32);
```
Guard tail output lanes for scalar int8 InnerProduct
When the non-GEMM int8 InnerProduct path has `num_output` not divisible by 4 and the sum8 path is not used, `forward_int8` allocates `top_blob` with `out_elempack == 1`, but this shader still dispatches one invocation per group of 4 outputs (`ceil(num_output / 4)` in total) and writes a full `sfpvec4` for the last group. For example, `num_output == 5` allocates 5 scalar outputs, while the second invocation writes lanes 4..7, overrunning or corrupting data beyond the `VkMat`. The tail needs scalar or bounded stores, or the output must be allocated as padded pack4.
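A minimal CPU emulation of the bounded store, written in C++ rather than GLSL for illustration; `num_groups` and `store4_guarded` are hypothetical names. Each simulated invocation owns four consecutive outputs and writes only the lanes that exist. In the shader this would correspond to replacing the single 4-wide store with per-lane scalar stores behind an index check.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Number of 4-wide output groups, i.e. ceil(num_output / 4).
inline int num_groups(int num_output)
{
    return (num_output + 3) / 4;
}

// Emulates one invocation's store for output group `group`.
// `out` holds the num_output scalar results (elempack 1); only lanes that map
// to a real output are written, so the last group cannot run past the end.
inline void store4_guarded(std::vector<float>& out, int num_output, int group,
                           const std::array<float, 4>& v)
{
    for (int k = 0; k < 4; k++)
    {
        const int idx = group * 4 + k;
        if (idx < num_output)
            out[idx] = v[k];
    }
}
```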
```glsl
sumfp32 = activation_afpvec4(sumfp32, activation_type, activation_param_0, activation_param_1);

buffer_st4(top_blob_data, gx, sumfp32);
```
Guard tail output lanes in sum8 reduction
When the large-input int8 InnerProduct uses the sum8 path and `num_output` is not divisible by 4, `top_blob` is still allocated as scalar (`out_elempack == 1`), but the reducer writes one full `sfpvec4` for each rounded-up output group. With `num_output == 5`, the final reducer invocation stores four lanes even though only one output remains, so the shader writes past the end of the scalar output buffer. Mirror the tail handling here, or allocate a padded pack4 output.
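The other option, a padded pack4 output, can be sketched in C++ as well. This is hypothetical: it assumes the sum8 path leaves 8 partial sums per output in a `partials[s * num_output + o]` layout (an assumption, not taken from the PR), skips reads for the missing tail lanes, and writes whole 4-lane groups into a buffer rounded up to a multiple of 4, so the extra lanes land in padding instead of past the end.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reducer: sums 8 partials per output and writes whole 4-lane
// groups into an output padded to a multiple of 4 (padded pack4).
// Assumed partial layout: partials[s * num_output + o], s in [0, 8).
std::vector<float> reduce_sum8_padded(const std::vector<float>& partials, int num_output)
{
    const int groups = (num_output + 3) / 4;
    std::vector<float> out(groups * 4, 0.f); // padded allocation
    for (int g = 0; g < groups; g++)
    {
        float v[4] = {0.f, 0.f, 0.f, 0.f};
        for (int k = 0; k < 4; k++)
        {
            const int o = g * 4 + k;
            if (o >= num_output)
                continue; // tail lane: there are no partials to read
            for (int s = 0; s < 8; s++)
                v[k] += partials[s * num_output + o];
        }
        for (int k = 0; k < 4; k++)
            out[g * 4 + k] = v[k]; // full 4-lane store stays in bounds
    }
    return out;
}
```

With this layout, consumers must treat the result as pack4 or ignore the padded lanes; the guarded-store approach keeps the scalar layout instead.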
💡 Codex Review
Reviewed commit: 520fd75d5f
💡 Codex Review
Reviewed commit: cf9019e11d
@codex review
💡 Codex Review
Reviewed commit: e9c21e857a