optimize inference performance by zhuxiaoxuhit · Pull Request #21 · meituan-longcat/LongCat-AudioDiT

zhuxiaoxuhit · 2026-04-10T07:10:09Z

odeint_euler: keep only the final state instead of stacking the full trajectory, cutting memory from O(steps * batch * seq * dim) to O(batch * seq * dim)
clone x in the ode function so it no longer mutates the integrator state in-place
fix _project(): .to(device_type) drops the cuda device index (e.g. cuda:1 becomes cuda), which breaks multi-gpu runs
move the einops import to module level
use nn.Sequential for the attention to_out
reuse the encode_prompt_audio() result in the inference scripts to avoid duplicate vae encoding and duplicate audio loading
add prompt_latent / prompt_duration_frames params to forward() so a pre-encoded prompt can be passed in
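
To illustrate the first two items, here is a minimal framework-free sketch, not the code from this PR. It assumes a fixed-step Euler loop over a list of time points; the names odeint_euler_final and decay are hypothetical, and plain Python lists stand in for tensors (list(x) plays the role of x.clone()).

```python
from typing import Callable, List, Sequence

State = List[float]


def odeint_euler_final(
    f: Callable[[float, State], State],
    x0: Sequence[float],
    t: Sequence[float],
) -> State:
    """Fixed-step Euler that returns only the state at t[-1].

    No per-step trajectory is stored, so memory stays at one state
    regardless of how many steps are taken.
    """
    x = list(x0)
    for t0, t1 in zip(t[:-1], t[1:]):
        dt = t1 - t0
        dx = f(t0, x)  # f must not modify x in place
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


def decay(t: float, x: State) -> State:
    """dx/dt = -x, written with an in-place update on a private copy."""
    x = list(x)  # stand-in for x.clone(): protects the integrator's state
    for i in range(len(x)):
        x[i] = -x[i]
    return x


x0 = [1.0, 2.0]
steps = 4
t = [i / steps for i in range(steps + 1)]
final = odeint_euler_final(decay, x0, t)
```

Without the copy inside decay, dx would alias x, so the in-place negation would corrupt the state that the Euler update reads from.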

zhuxiaoxuhit added 2 commits April 10, 2026 14:43

fix ode memory waste, in-place mutation and multi-gpu device bug

5d1854b

avoid duplicate vae encoding in inference

635857d
