Evaluation results (ActivityNet-QA, MSVD-QA, MSRVTT-QA) are higher than the paper's table #135

Hello,
My name is Gangmin Choi, and I am an undergraduate student at Kyung Hee University in Korea. After reading the LLAVIDAL paper, I tried to reproduce the results on MSVD, MSRVTT, and ActivityNet QA using your model, in order to compare it fairly with VideoChatGPT under the same baseline settings.

While evaluating VideoChatGPT on MSVD, MSRVTT, and ActivityNet QA, I consistently observed higher performance than what was reported in the original paper.

The results I obtained using your model are as follows:

ActivityNet QA: 40.68

MSVD QA: 68.74

MSRVTT QA: 57.19

The MSVD and MSRVTT numbers above are from the test sets. I also tried the validation sets, and the results are similar (val: MSVD 68.1, MSRVTT 57.1).

For the experiments, I downloaded and used the following models from Hugging Face:

Model: models--mmaaz60--LLaVA-7B-Lightening-v1-1

Projection path: models--MBZUAI--Video-ChatGPT-7B

I also used the GPT-3.5-Turbo API for evaluation. However, I am not sure if there is an issue with my setup or if I might be missing something important.
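For context, the accuracy numbers above are the share of answers the GPT judge marks as correct. Below is a minimal sketch of that aggregation, assuming each judge response has already been parsed into a dict with a `pred` field of "yes" or "no". The field name and the helper `judge_accuracy` are hypothetical illustrations, not taken from the repository. The sketch also counts unparseable responses, since silently dropping them changes the denominator and could shift the score.

```python
from typing import Iterable, Mapping


def judge_accuracy(verdicts: Iterable[Mapping[str, object]]) -> dict:
    """Aggregate yes/no judge verdicts into an accuracy percentage.

    Assumption: each verdict is a dict whose "pred" value is "yes" or "no".
    Anything else is counted as skipped and excluded from the denominator.
    """
    yes = no = skipped = 0
    for verdict in verdicts:
        pred = str(verdict.get("pred", "")).strip().lower()
        if pred == "yes":
            yes += 1
        elif pred == "no":
            no += 1
        else:
            skipped += 1  # malformed or missing judge output

    judged = yes + no
    accuracy = 100.0 * yes / judged if judged else 0.0
    return {"accuracy": accuracy, "yes": yes, "no": no, "skipped": skipped}


if __name__ == "__main__":
    sample = [{"pred": "yes"}, {"pred": "no"}, {"pred": "Yes "}, {}]
    print(judge_accuracy(sample))
```

If the `skipped` count is large, or if the judge model version differs from the one used for the paper, that may be one source of the gap.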

May I ask if you have recently evaluated the model’s performance on these benchmarks? Any guidance or clarification would be greatly appreciated.


Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Evaluation results(activitynetqa, msvd-qa, msrvttqa) is higher than paper table. #135

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions