feat: add RainFusion sparse attention for wan2.2. #1867
…r Wan2.2

Replace RainFusion V3 with V2 block sparse attention using frame-pairing approach and aclnnRainFusionAttention kernel.

Key changes:
- Add npu_rain_fusion_attention wrapping aclnnRainFusionAttention
- Add npu_block_sparse_attention support
- Refactor V2 rearrange/inverse using merge B*N approach, split into two stages to avoid >8 dim tensors on NPU
- Cat QKV before rearrange for 1 call instead of 3
- Fix inverse rearrange output order (first_frame before rest)
- Fix transpose selectIdx per aclnn spec, capture N/D dims before reshape
- Perf: explicit contiguous before to_flat/from_flat, use view not reshape
- Simplify RainFusion V2 config, remove V3-only flags
- Add roundtrip check (rearrange + inverse) to verify correctness
- Add dit flags into global_config.h
Code Review
This pull request introduces RainFusionV2 block-wise sparse attention for the Wan video generation model, adding new NPU kernels (npu_block_sparse_attention and npu_rain_fusion_attention), configuration flags, and integration into the transformer and pipeline. Feedback on the changes includes removing a leftover debug statement containing an absolute local path in the pipeline, eliminating redundant optional copies in the block sparse attention kernel, and resolving an anonymous namespace anti-pattern in the rainfusion_attention.h header. Additionally, several style guide violations need to be addressed, specifically replacing at:: namespace usage with torch:: and replacing functional/C-style casts with static_cast or appropriate literals.
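To illustrate the cast and header feedback above, here is a minimal sketch. The `rainfusion_detail` namespace and the `num_blocks` helper are hypothetical and do not come from the PR; they only show `static_cast` replacing a functional/C-style cast, and a named namespace with `inline` functions replacing an anonymous namespace in a header.

```cpp
#include <cstdint>

// Hypothetical header-style helper, not taken from the PR.
// A named namespace plus `inline` avoids giving every translation unit
// its own private copy, which is what an anonymous namespace in a
// header would do.
namespace rainfusion_detail {

// Number of blocks needed to cover seq_len tokens (ceiling division).
inline int64_t num_blocks(int64_t seq_len, int pool_size) {
  // Instead of `int64_t(pool_size)` or `(int64_t)pool_size`:
  const int64_t block = static_cast<int64_t>(pool_size);
  // Instead of `(int64_t)1`, use a typed literal:
  return (seq_len + block - int64_t{1}) / block;
}

}  // namespace rainfusion_detail
```

For example, `rainfusion_detail::num_blocks(300, 128)` covers 300 tokens with three blocks of 128.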
a. rainfusion_enabled, bool: whether to enable RainFusion attention.
b. rainfusion_sparsity, double, 0.5~0.9: a higher value is faster but less precise.
c. rainfusion_pool_size, int: block size, default 128.
d. rainfusion_sparse_start_step, int: the step at which sparsity starts being applied; a higher value is faster but less precise.
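The four options above could be grouped and range-checked roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: the `RainFusionConfig` struct, `validate`, and `use_sparse_attention` are hypothetical names, the defaults other than the pool size of 128 are guesses, and treating the start step as inclusive and 0-based is also an assumption.

```cpp
#include <string>

// Hypothetical container for the options listed above; the real flags
// are defined in the PR (e.g. in global_config.h) and may differ.
struct RainFusionConfig {
  bool rainfusion_enabled = false;       // assumed default
  double rainfusion_sparsity = 0.5;      // assumed default, documented range 0.5~0.9
  int rainfusion_pool_size = 128;        // block size, default from the description
  int rainfusion_sparse_start_step = 0;  // assumed default
};

// Returns an empty string when the config is usable, otherwise a message.
inline std::string validate(const RainFusionConfig& c) {
  if (!c.rainfusion_enabled) return "";  // other fields are ignored when disabled
  if (c.rainfusion_sparsity < 0.5 || c.rainfusion_sparsity > 0.9) {
    return "rainfusion_sparsity must be within [0.5, 0.9]";
  }
  if (c.rainfusion_pool_size <= 0) {
    return "rainfusion_pool_size must be positive";
  }
  if (c.rainfusion_sparse_start_step < 0) {
    return "rainfusion_sparse_start_step must be non-negative";
  }
  return "";
}

// Assumption: steps are 0-based and sparsity applies from the start step onward.
inline bool use_sparse_attention(const RainFusionConfig& c, int step) {
  return c.rainfusion_enabled && step >= c.rainfusion_sparse_start_step;
}
```

With `rainfusion_enabled = true` and `rainfusion_sparse_start_step = 5`, this sketch keeps dense attention for steps 0 to 4 and switches to sparse attention from step 5.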