Optimize RVV backend for Deconvolution and QuantizedAdd #4547

Open
jxgxxx wants to merge 1 commit into alibaba:master from jxgxxx:rvv-optimization-v2

jxgxxx commented Jun 16, 2026

Contributor

Description

This PR adds RISC-V Vector (RVV) optimizations to the CPU backend, targeting two code paths: the Deconvolution weight transformation (MNNDeconvTransformWeightC4) and the QuantizedAdd operator.
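For readers unfamiliar with the operator, here is a minimal portable sketch of what an element-wise quantized add computes and how a vector-length-agnostic loop is typically structured. It is not the code from this PR: the `QuantParams` struct, the function names, the int8 element type, and the float-scale requantization are all assumptions made for illustration (the real kernel may use fixed-point multipliers and different types). The `maxVl` parameter is a hypothetical stand-in for the vector length an RVV kernel would obtain from `vsetvl` on each pass.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical per-tensor quantization parameters (not MNN's actual struct).
struct QuantParams {
    float scale;
    int32_t zeroPoint;
};

// Reference semantics for one element: dequantize both inputs, add them,
// requantize to the output scale, then clamp to the int8 range.
inline int8_t quantizedAddOne(int8_t a, int8_t b, const QuantParams& qa,
                              const QuantParams& qb, const QuantParams& qo) {
    const float real = qa.scale * static_cast<float>(a - qa.zeroPoint) +
                       qb.scale * static_cast<float>(b - qb.zeroPoint);
    const int32_t q =
        static_cast<int32_t>(std::lround(real / qo.scale)) + qo.zeroPoint;
    return static_cast<int8_t>(
        std::max<int32_t>(-128, std::min<int32_t>(127, q)));
}

// Strip-mined loop: each pass handles at most maxVl elements. The final pass
// simply gets a shorter length, so no separate scalar tail loop is needed.
inline void quantizedAddStripMined(const int8_t* a, const int8_t* b,
                                   int8_t* out, std::size_t n,
                                   const QuantParams& qa, const QuantParams& qb,
                                   const QuantParams& qo,
                                   std::size_t maxVl = 16) {
    if (maxVl == 0) {
        maxVl = 1;  // guard against a zero chunk size, which would never advance
    }
    for (std::size_t i = 0; i < n;) {
        const std::size_t vl = std::min(maxVl, n - i);  // stand-in for vsetvl
        for (std::size_t j = 0; j < vl; ++j) {
            out[i + j] = quantizedAddOne(a[i + j], b[i + j], qa, qb, qo);
        }
        i += vl;
    }
}
```

In an actual RVV kernel the inner loop would be replaced by vector loads, widening arithmetic, a narrowing clip, and a vector store, all operating on `vl` elements at once.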

Performance Benchmarks:

  • Hardware: SG2044 64-Core RISC-V 64-bit
  • OS: PolyOS Server 24.03 LTS

| Function / Operator | Input Dimensions & Configuration | Baseline (C++) | RVV Optimized | Speedup | Correctness |
| --- | --- | --- | --- | --- | --- |
| MNNDeconvTransformWeightC4 | M/OC=130, K/IC=64, kernel=3x3, area=9, outC4=33 | 605.405 ms | 482.462 ms | 1.25x | PASS |
| MNNDeconvTransformWeightC4 | M/OC=128, K/IC=64, kernel=3x3, area=9, outC4=32 | 573.147 ms | 462.189 ms | 1.24x | PASS |
| CPUQuantizedAdd | elements=802816 (1x112x112x64), 1-thread, iter=200 | 4417.90 ms | 194.379 ms | 22.73x | PASS |
| CPUQuantizedAdd | elements=802816 (1x112x112x64), 4-threads, iter=500 | 49751.6 ms | 22560.6 ms | 2.21x | PASS |
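
The derived figures in the benchmark rows above (outC4 and Speedup) can be reproduced with a few lines of arithmetic. The sketch below assumes that outC4 is the output channel count rounded up to a multiple of 4 and divided by 4, and that Speedup is baseline time divided by optimized time, rounded to two decimals. Both assumptions are consistent with the reported numbers but are not stated in the PR. The helper names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Number of blocks of size `block` needed to cover `value` (rounds up).
inline std::size_t ceilDiv(std::size_t value, std::size_t block) {
    return (value + block - 1) / block;
}

// Speedup as the ratio of baseline time to optimized time.
inline double speedup(double baselineMs, double optimizedMs) {
    return baselineMs / optimizedMs;
}

// Round to two decimal places, matching the precision used in the table.
inline double roundTo2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, `ceilDiv(130, 4)` gives 33 and `ceilDiv(128, 4)` gives 32, matching the outC4 values of the two deconvolution rows, and `roundTo2(speedup(4417.90, 194.379))` gives 22.73 for the single-threaded CPUQuantizedAdd row.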

Module

CPU (specifically RVV backend)

Type

  • Feature
  • Bugfix
  • Perf
  • Refact
  • Style
  • Doc
  • Test
  • Chore

Checklist

  • Commit message follows [Module:Type] Description format
  • Code compiles without errors
  • Tested on relevant platform(s)
  • No unrelated format or style changes included

Co-authored-by: jxgxxx <1955992348@qq.com>
Co-authored-by: typer-J <2236066784@qq.com>
Co-authored-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>
Co-authored-by: lyd1992 <liuyudong@iscas.ac.cn>
Co-authored-by: YuanSheng <yuansheng@isrc.iscas.ac.cn>
wangzhaode self-assigned this Jun 17, 2026