CUDA out of memory when running the example inference code

First of all, thanks for the great work!
I'd like to test the model without flash attention, using SDPA instead.
When I run the inference code from the README, it fails with an out-of-memory (OOM) error.
Even after I reduced `max_frames` in the conversation content from 180 to 16, the model still tries to allocate a huge amount of memory (e.g., 92.4 GiB).
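One way to reason about where an allocation that large could come from is to estimate the size of a fully materialized attention-score tensor. This assumes (unconfirmed for this model) that without flash attention, the attention path stores a full `batch × heads × seq_len × seq_len` matrix at once. The head count and token count below are hypothetical placeholders, not values taken from the model's config.

```python
def attention_scores_gib(batch: int, heads: int, seq_len: int, bytes_per_elem: int = 2) -> float:
    """Size in GiB of one materialized (batch, heads, seq_len, seq_len) score tensor.

    Assumption: the full attention matrix is stored at once (no flash or
    memory-efficient kernel). bytes_per_elem=2 corresponds to fp16/bf16.
    """
    n_bytes = batch * heads * seq_len * seq_len * bytes_per_elem
    return n_bytes / 2**30


# Hypothetical numbers, NOT from the model's config:
# 1 sample, 28 attention heads, 20,000 tokens (video + text), bf16.
print(f"{attention_scores_gib(batch=1, heads=28, seq_len=20_000):.1f} GiB")
```

Because this size grows with the square of the token count, cutting the number of input tokens should shrink such a tensor sharply; if the allocation barely changes after lowering `max_frames`, the token count may not be dropping as expected.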
Does anyone know how to solve it?