optimize inference performance #21

Open
zhuxiaoxuhit wants to merge 2 commits into meituan-longcat:main from zhuxiaoxuhit:optimize-inference

@zhuxiaoxuhit

  • odeint_euler: keep only the final state instead of stacking the full trajectory; this saves O(steps * batch * seq * dim) memory
  • clone x in the ODE function to avoid mutating the integrator state in place
  • fix _project(), which called .to(device_type) and so lost the CUDA device index on multi-GPU setups
  • move the einops import to module level
  • use nn.Sequential for the attention to_out
  • reuse the encode_prompt_audio() result to avoid duplicate VAE encoding and duplicate audio loading in the inference scripts
  • add prompt_latent / prompt_duration_frames params to forward() for a pre-encoded prompt
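
For reviewers, the first two bullets can be illustrated with a small framework-free sketch. This is not the code from this PR: it uses plain Python lists in place of tensors, and the names `odeint_euler_final`, `odeint_euler_trajectory`, and `velocity` are hypothetical. It only shows why returning the last state gives the same result as indexing the stacked trajectory, and why the ODE function should work on a copy of its input.

```python
from typing import Callable, List, Sequence

State = List[float]
Field = Callable[[float, State], State]


def odeint_euler_final(func: Field, x0: Sequence[float], t: Sequence[float]) -> State:
    """Fixed-step Euler integration that keeps only the state at t[-1]."""
    x = list(x0)  # copy so the caller's initial state is left untouched
    for t0, t1 in zip(t[:-1], t[1:]):
        dt = t1 - t0
        dx = func(t0, x)
        # Build a new state; the previous one can be freed immediately.
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


def odeint_euler_trajectory(func: Field, x0: Sequence[float], t: Sequence[float]) -> List[State]:
    """Reference version that stores every intermediate state (len(t) states)."""
    traj = [list(x0)]
    for t0, t1 in zip(t[:-1], t[1:]):
        dt = t1 - t0
        dx = func(t0, traj[-1])
        traj.append([xi + dt * dxi for xi, dxi in zip(traj[-1], dx)])
    return traj


def velocity(t: float, x: State) -> State:
    """Hypothetical vector field dx/dt = -x that edits a buffer in place."""
    x = list(x)  # analogue of x.clone(): never touch the integrator's state
    for i in range(len(x)):
        x[i] = -x[i]
    return x


times = [0.0, 0.25, 0.5, 0.75, 1.0]
start = [1.0, 2.0]
final_only = odeint_euler_final(velocity, start, times)
full = odeint_euler_trajectory(velocity, start, times)
```

If only the last time point is consumed downstream, as the first bullet implies, the two functions are interchangeable, and the final-state version holds one state at a time instead of `len(t)` of them. Removing the copy in `velocity` would make `dx` alias the integrator state and silently corrupt every step.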
