Question about Autoregressive Diffusion Training #45

I am working on an action-controlled Wan2.2 TI2V (text-image-to-video) model. After converting the model from a bidirectional flow model into an autoregressive teacher for distillation, I ran into the following issue:

- **Training predictions:** look normal and reasonable.
- **Inference results:** are significantly worse, with poor quality.

I have attached the following for reference:

Training prediction samples:

https://github.com/user-attachments/assets/9c1c27f0-4bad-454e-96df-5659f8267790

Inference output samples:

https://github.com/user-attachments/assets/0424b2f8-9ad9-4e77-a442-435b58d07bf8

Training loss curves:

<img width="1197" height="297" alt="Image" src="https://github.com/user-attachments/assets/b74f4faf-2ba9-490d-bf7d-fef431072a77" />

What could be causing this discrepancy between training and inference quality?
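
One check that might help narrow this down is to compare the per-frame error when each prediction is conditioned on ground-truth context (as in training) with the error when it is conditioned on the model's own previous outputs (as in inference). The sketch below is a toy illustration only, not the Wan2.2 pipeline: `teacher_forced_errors`, `rollout_errors`, `toy_step`, and the synthetic `frames` are hypothetical names introduced here, and the one-step predictor is a deliberately simple stand-in for a real model.

```python
import numpy as np


def teacher_forced_errors(step_fn, frames):
    """Per-step MSE when each prediction sees the ground-truth previous frame."""
    return np.array([
        np.mean((step_fn(frames[t]) - frames[t + 1]) ** 2)
        for t in range(len(frames) - 1)
    ])


def rollout_errors(step_fn, frames):
    """Per-step MSE when each prediction sees the model's own previous output."""
    errors = []
    current = frames[0]  # only the first frame comes from ground truth
    for t in range(len(frames) - 1):
        current = step_fn(current)
        errors.append(np.mean((current - frames[t + 1]) ** 2))
    return np.array(errors)


# Toy data (assumption): a static scene, so every frame equals the first one.
rng = np.random.default_rng(0)
first = rng.normal(size=(4, 4))
frames = np.stack([first for _ in range(21)])


def toy_step(frame):
    # Hypothetical predictor with a small systematic one-step error (5% gain).
    return 1.05 * frame


tf_err = teacher_forced_errors(toy_step, frames)
ro_err = rollout_errors(toy_step, frames)
print("teacher-forced, last step:", tf_err[-1])
print("rollout, last step:       ", ro_err[-1])
```

In this toy setting the teacher-forced error stays constant while the rollout error grows with the frame index, even though the one-step predictor is the same in both cases. If the real model showed a similar pattern, that would point towards error accumulation during autoregressive rollout rather than a problem with the loss itself; if both errors were equally large, a mismatch between the training and inference code paths would seem the more likely place to look.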
