unaryop avx512 mask optimization#6098

Merged
nihui merged 1 commit into Tencent:master from lfalive:unaryop-avx512-mask-optimization
May 30, 2025

@lfalive lfalive commented May 30, 2025

Contributor

@github-actions github-actions Bot added the x86 label May 30, 2025
@github-actions

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
| --- | --- | --- | --- |
| x86_64 | 16511232 | 16503040 | -8192 😘 |
| armhf | 7369820 | 7369820 | 0 😘 |
| aarch64 | 10775560 | 10775560 | 0 😘 |

codecov-commenter commented May 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.70%. Comparing base (7fd167f) to head (1e98457).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6098      +/-   ##
==========================================
+ Coverage   95.59%   95.70%   +0.10%     
==========================================
  Files         827      827              
  Lines      270122   270128       +6     
==========================================
+ Hits       258232   258533     +301     
+ Misses      11890    11595     -295     

@nihui nihui requested a review from Copilot May 30, 2025 11:17

Copilot AI left a comment

Pull Request Overview

This PR optimizes the unary operation implementation by leveraging AVX512 mask instructions to handle the remaining elements when processing vectors.

  • Removed obsolete preprocessor guards for SSE2/AVX in the AVX512 code block
  • Added a remainder-handling branch that uses a __mmask16 mask to process the trailing elements when the element count is not a multiple of 16
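
The masked-remainder idea summarized above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the function name `square_inplace` and the choice of squaring as the unary operation are assumptions made for the example. When the compiler does not define `__AVX512F__`, the sketch falls back to a plain scalar loop so it still builds and runs.

```cpp
#include <cassert>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper (not ncnn's actual code): squares n floats in place.
static void square_inplace(float* ptr, int n)
{
    assert(n >= 0);
    int i = 0;
#if defined(__AVX512F__)
    // Main loop: full 16-float vectors.
    for (; i + 15 < n; i += 16)
    {
        __m512 p = _mm512_loadu_ps(ptr + i);
        p = _mm512_mul_ps(p, p);
        _mm512_storeu_ps(ptr + i, p);
    }
    // Remainder: one masked load/store covers the last 1..15 elements.
    if (i < n)
    {
        const unsigned int remain = (unsigned int)(n - i);
        // Low `remain` bits set, so only the valid lanes are touched.
        const __mmask16 mask = (__mmask16)((1u << remain) - 1u);
        // Masked-off lanes are zeroed on load and are not read from memory.
        __m512 p = _mm512_maskz_loadu_ps(mask, ptr + i);
        p = _mm512_mul_ps(p, p);
        // Masked-off lanes are not written back.
        _mm512_mask_storeu_ps(ptr + i, mask, p);
        i = n;
    }
#endif
    // Scalar fallback: handles everything when AVX512F is unavailable,
    // and nothing when the masked branch above already finished the tail.
    for (; i < n; i++)
    {
        ptr[i] = ptr[i] * ptr[i];
    }
}
```

With the masked branch in place, the AVX512 path needs no narrower AVX/SSE2 or scalar tail of its own, which would be consistent with the guards removed in the first bullet and with the slightly smaller x86_64 binary reported earlier in the thread.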

Comment thread src/layer/x86/unaryop_x86.cpp
@nihui nihui merged commit 2ef954b into Tencent:master May 30, 2025
79 of 81 checks passed

nihui commented May 30, 2025

Member

Thanks for your contribution!
