feat: add RainFusion sparse attention for wan2.2. #1867

Open
ethan686 wants to merge 2 commits into jd-opensource:main from ethan686:wan22_dev

ethan686 (Collaborator) commented Jul 1, 2026

  1. Add two new aclnn ops: aclnnRainFusionAttention and aclnnBlockSparseAttention (for future A5; not currently used).
  2. Add four new DiT configs:
    a. rainfusion_enabled, bool: whether to enable RainFusion attention.
    b. rainfusion_sparsity, double, 0.5~0.9: a higher value is faster but less precise.
    c. rainfusion_pool_size, int: block size, default 128.
    d. rainfusion_sparse_start_step, int: the step at which sparsity starts being used; a higher value is faster but less precise.
  3. Precision comparison between runs with and without sparsity (sparsity=0.8, start_step=15): the mindiesd result is 0.9333 and the xllm result is 0.9555.
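
As a reading aid, here is a minimal sketch of how the four flags above might be grouped and checked. The struct, the function names, and every default except the block size of 128 are hypothetical and are not taken from the PR. The gating rule (sparse attention from `sparse_start_step` onward) and the ceiling division for the block count are assumptions about how the flags are meant to be used.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical holder for the four DiT flags listed in the description.
// Only the pool_size default (128) comes from the PR text.
struct RainFusionConfig {
  bool enabled = false;                 // rainfusion_enabled
  double sparsity = 0.8;                // rainfusion_sparsity, documented range 0.5~0.9
  std::int32_t pool_size = 128;         // rainfusion_pool_size (block size)
  std::int32_t sparse_start_step = 15;  // rainfusion_sparse_start_step
};

// Rejects values outside the ranges stated in the description.
inline void validate(const RainFusionConfig& cfg) {
  if (cfg.sparsity < 0.5 || cfg.sparsity > 0.9) {
    throw std::invalid_argument("rainfusion_sparsity must be in [0.5, 0.9]");
  }
  if (cfg.pool_size <= 0) {
    throw std::invalid_argument("rainfusion_pool_size must be positive");
  }
  if (cfg.sparse_start_step < 0) {
    throw std::invalid_argument("rainfusion_sparse_start_step must be >= 0");
  }
}

// Assumption: steps before sparse_start_step run dense attention.
inline bool use_sparse_attention(const RainFusionConfig& cfg, std::int32_t step) {
  return cfg.enabled && step >= cfg.sparse_start_step;
}

// Assumption: the sequence is split into ceil(seq_len / pool_size) blocks.
inline std::int64_t num_blocks(const RainFusionConfig& cfg, std::int64_t seq_len) {
  return (seq_len + cfg.pool_size - 1) / cfg.pool_size;
}
```

With the defaults in this sketch, a 300-token sequence would be split into 3 blocks, and step 14 would still use dense attention.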

…r Wan2.2

Replace RainFusion V3 with V2 block sparse attention using frame-pairing
approach and aclnnRainFusionAttention kernel. Key changes:

- Add npu_rain_fusion_attention wrapping aclnnRainFusionAttention
- Add npu_block_sparse_attention support
- Refactor V2 rearrange/inverse using merge B*N approach, split into two
  stages to avoid >8 dim tensors on NPU
- Cat QKV before rearrange for 1 call instead of 3
- Fix inverse rearrange output order (first_frame before rest)
- Fix transpose selectIdx per aclnn spec, capture N/D dims before reshape
- Perf: explicit contiguous before to_flat/from_flat, use view not reshape
- Simplify RainFusion V2 config, remove V3-only flags
- Add roundtrip check (rearrange + inverse) to verify correctness
- Add dit flags into global_config.h
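
The roundtrip check mentioned in the commit message can be illustrated with a generic permutation. This is only a sketch: it works on a flat `std::vector<float>` rather than NPU tensors, the function names are hypothetical, and the permutation is an arbitrary stand-in, not the frame-pairing layout used by the PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Gather: out[i] = in[perm[i]]. Assumes perm is a valid permutation of 0..n-1.
inline std::vector<float> rearrange(const std::vector<float>& in,
                                    const std::vector<std::size_t>& perm) {
  if (perm.size() != in.size()) {
    throw std::invalid_argument("perm and input must have the same length");
  }
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    out[i] = in.at(perm[i]);
  }
  return out;
}

// Scatter: out[perm[i]] = in[i], which undoes rearrange for the same perm.
inline std::vector<float> inverse_rearrange(const std::vector<float>& in,
                                            const std::vector<std::size_t>& perm) {
  if (perm.size() != in.size()) {
    throw std::invalid_argument("perm and input must have the same length");
  }
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    out.at(perm[i]) = in[i];
  }
  return out;
}

// Roundtrip check: rearrange followed by its inverse must return the input.
inline bool roundtrip_ok(const std::vector<float>& in,
                         const std::vector<std::size_t>& perm) {
  return inverse_rearrange(rearrange(in, perm), perm) == in;
}
```

Exact equality is safe here because the values are only copied, never recomputed. A check on real attention outputs would likely need a tolerance instead.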

gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request introduces RainFusionV2 block-wise sparse attention for the Wan video generation model, adding new NPU kernels (npu_block_sparse_attention and npu_rain_fusion_attention), configuration flags, and integration into the transformer and pipeline. Feedback on the changes includes removing a leftover debug statement containing an absolute local path in the pipeline, eliminating redundant optional copies in the block sparse attention kernel, and resolving an anonymous namespace anti-pattern in the rainfusion_attention.h header. Additionally, several style guide violations need to be addressed, specifically replacing at:: namespace usage with torch:: and replacing functional/C-style casts with static_cast or appropriate literals.
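
Two of the style points in the summary above can be shown in a small header-style sketch. The namespace and function names are hypothetical and are not the code under review. The sketch assumes the anti-pattern in question is defining helpers inside an unnamed namespace in a header, which gives every translation unit that includes the header its own internal-linkage copy.

```cpp
#include <cstdint>

// In a header, prefer inline functions in a named namespace over
// an unnamed namespace.
namespace rainfusion_example {

// static_cast instead of a C-style cast like (double)kept_blocks or a
// functional cast like double(kept_blocks).
inline double block_density(std::int64_t kept_blocks, std::int64_t total_blocks) {
  return static_cast<double>(kept_blocks) / static_cast<double>(total_blocks);
}

// A correctly typed literal (0.5f) instead of a cast such as float(0.5).
inline float half(float x) {
  return x * 0.5f;
}

}  // namespace rainfusion_example
```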


Comment thread xllm/models/dit/pipelines/pipeline_wan_i2v.h
Comment thread xllm/core/kernels/npu/npu_rain_fusion_attention.cpp Outdated
Comment thread xllm/core/kernels/npu/npu_block_sparse_attention.cpp Outdated
Comment thread xllm/models/dit/utils/rainfusion_attention.h Outdated
Comment thread xllm/models/dit/utils/rainfusion_attention.h Outdated
Comment thread xllm/models/dit/utils/rainfusion_attention.h Outdated
Comment thread xllm/models/dit/utils/rainfusion_attention.h Outdated