Hello,
My name is Gangmin Choi, and I am an undergraduate student at Kyung Hee University in Korea. After reading the LLAVIDAL paper, I tried to reproduce the results on MSVD, MSRVTT, and ActivityNet QA using your model, so that I could compare it fairly with VideoChatGPT under the same baseline settings.
While evaluating VideoChatGPT on MSVD, MSRVTT, and ActivityNet QA, I consistently observed higher performance than what was reported in the original paper.
The results I obtained using your model are as follows:
ActivityNet QA: 40.68
MSVD QA: 68.74
MSRVTT QA: 57.19
The MSVD and MSRVTT numbers above are from the test sets. I also evaluated on the validation sets, and the results were similar (MSVD: 68.1, MSRVTT: 57.1).
For the experiments, I downloaded and used the following models from Hugging Face:
Model: models--mmaaz60--LLaVA-7B-Lightening-v1-1
Projection path: models--MBZUAI--Video-ChatGPT-7B
I also used the GPT-3.5-Turbo API for evaluation. However, I am not sure if there is an issue with my setup or if I might be missing something important.
May I ask if you have recently evaluated the model’s performance on these benchmarks? Any guidance or clarification would be greatly appreciated.
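In case it helps locate a difference in setup, here is a minimal sketch of how the final accuracy could be aggregated from the GPT-3.5-Turbo judgments. The `pred`/`score` verdict fields and the function name `summarize_verdicts` are assumptions for illustration, not taken from the official evaluation script, and the example verdicts are made-up values rather than real outputs.

```python
from typing import Iterable, Mapping, Tuple


def summarize_verdicts(verdicts: Iterable[Mapping[str, object]]) -> Tuple[float, float]:
    """Aggregate per-question judge verdicts into (accuracy in %, mean score).

    Assumption: each verdict looks like {"pred": "yes" or "no", "score": 0..5}.
    """
    items = list(verdicts)
    if not items:
        raise ValueError("no verdicts to aggregate")

    # A question counts as correct only when the judge answered "yes".
    correct = sum(1 for v in items if str(v["pred"]).strip().lower() == "yes")
    accuracy = 100.0 * correct / len(items)

    # The mean score is averaged over all questions, correct or not.
    mean_score = sum(float(v["score"]) for v in items) / len(items)
    return accuracy, mean_score


# Made-up verdicts, only to show the expected input shape.
example = [
    {"pred": "yes", "score": 5},
    {"pred": "no", "score": 1},
    {"pred": "Yes", "score": 4},
    {"pred": "yes", "score": 4},
]

accuracy, mean_score = summarize_verdicts(example)
print(f"accuracy={accuracy:.2f} mean_score={mean_score:.2f}")
```

If the official script counts correct answers differently (for example, by handling malformed judge responses in another way), that alone could shift the reported numbers.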