Hi authors,
Thanks for the great work!
I am very interested in your work and would like to reproduce it. I would like to ask how the timestep modulation should be handled when the student model's input contains tokens at two different noise levels. Should we use the larger timestep to modulate all tokens, or should we perform token-wise timestep modulation?
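For concreteness, here is a minimal sketch of the two options I have in mind. It assumes a DiT-style adaLN modulation (`x * (1 + scale) + shift`) driven by a sinusoidal timestep embedding. `shift_scale` is a hypothetical stand-in for the learned modulation MLP, and none of these names come from your code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def timestep_embedding(t: float, dim: int) -> Vector:
    """Sinusoidal timestep embedding (DDPM/DiT style); dim must be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]


def shift_scale(t: float, dim: int) -> Tuple[Vector, Vector]:
    # Hypothetical stand-in for the learned adaLN MLP:
    # the cos half is used as shift, the sin half as scale.
    emb = timestep_embedding(t, 2 * dim)
    return emb[:dim], emb[dim:]


def modulate(x: Vector, shift: Vector, scale: Vector) -> Vector:
    # adaLN-style modulation: x * (1 + scale) + shift
    return [xi * (1.0 + sc) + sh for xi, sh, sc in zip(x, shift, scale)]


def modulate_global_max(tokens: List[Vector], timesteps: List[float]) -> List[Vector]:
    """Option 1: condition every token on the largest (noisiest) timestep."""
    dim = len(tokens[0])
    shift, scale = shift_scale(max(timesteps), dim)
    return [modulate(x, shift, scale) for x in tokens]


def modulate_per_token(tokens: List[Vector], timesteps: List[float]) -> List[Vector]:
    """Option 2: each token is modulated with its own timestep."""
    dim = len(tokens[0])
    out = []
    for x, t in zip(tokens, timesteps):
        shift, scale = shift_scale(t, dim)
        out.append(modulate(x, shift, scale))
    return out


# Example: four identical tokens, two at a high noise level and two at a lower one.
tokens = [[1.0, 0.5, -0.5, 2.0] for _ in range(4)]
timesteps = [999.0, 999.0, 250.0, 250.0]

global_out = modulate_global_max(tokens, timesteps)
token_out = modulate_per_token(tokens, timesteps)
```

With option 1, the less-noisy tokens receive the same conditioning as the noisiest ones. With option 2, each token's modulation reflects its own noise level, which would need per-token shift/scale tensors in the actual model.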