First of all, thanks for the great work!
I'd like to test the model without flash attention, using SDPA instead.
When I run the inference code from the README, I get an OOM error.
Even though I changed max_frames in the conversation content from 180 to 16, the model still tries to allocate a huge amount of memory (e.g. 92.4 GiB).
Does anyone know how to solve it?
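A rough back-of-envelope sketch of how attention memory can blow up. It assumes the attention call ends up on a path that materializes the full score matrix, as PyTorch's math fallback for scaled_dot_product_attention does, unlike flash attention. In that case the score matrix for a single layer grows quadratically with sequence length. The head count and sequence lengths below are hypothetical placeholders, not values taken from this model.

```python
def attention_scores_gib(batch: int, num_heads: int, seq_len: int, bytes_per_elem: int = 2) -> float:
    """Return the size in GiB of one layer's materialized attention score matrix.

    The matrix has shape (batch, num_heads, seq_len, seq_len); bytes_per_elem=2
    assumes fp16/bf16 activations.
    """
    n_bytes = batch * num_heads * seq_len * seq_len * bytes_per_elem
    return n_bytes / 1024**3


# Hypothetical numbers for illustration only (28 heads is an assumption, not the model's config).
for seq_len in (4_096, 16_384, 65_536):
    print(seq_len, attention_scores_gib(batch=1, num_heads=28, seq_len=seq_len))
```

Because the cost is quadratic, a 4x longer sequence needs 16x more memory for the scores. Whether lowering max_frames shortens the sequence enough depends on how many visual tokens per frame actually reach the language model.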