Hi,
Thank you for the great work.
I'm using the audio_visual branch with VideoLLaMA2.1-7B-AV (audio encoder: fine-tuned BEATs_iter3+ (AS2M) (cpt2); video encoder: siglip-so400m-patch14-384).
The result I get is: Overall Accuracy: 65.75%, Total Responses: 9185,
which is significantly lower than the reported result of 80.9.
Could you please give me some suggestions on how to reproduce the results, or point out what I might be doing incorrectly?