Evaluation results (ActivityNet-QA, MSVD-QA, MSRVTT-QA) are higher than in the paper's table. #135

@choi-gang-min

Hello,
My name is Gangmin Choi, and I am an undergraduate student at Kyung Hee University in Korea. After reading the LLAVIDAL paper, I tried to reproduce the results on MSVD-QA, MSRVTT-QA, and ActivityNet-QA using your model, so that I could compare it fairly with VideoChatGPT under the same baseline settings.

While evaluating VideoChatGPT on MSVD, MSRVTT, and ActivityNet QA, I consistently observed higher performance than what was reported in the original paper.

The results I obtained using your model are as follows:

ActivityNet QA: 40.68

MSVD QA: 68.74

MSRVTT QA: 57.19

The MSVD and MSRVTT numbers above are from the test sets. I also evaluated on the validation sets, and the results were similar (val: MSVD 68.1, MSRVTT 57.1).

For the experiments, I downloaded and used the following models from Hugging Face:

Model: models--mmaaz60--LLaVA-7B-Lightening-v1-1

Projection path: models--MBZUAI--Video-ChatGPT-7B

I also used the GPT-3.5-Turbo API for evaluation. However, I am not sure if there is an issue with my setup or if I might be missing something important.
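
To make the aggregation step concrete, here is a minimal sketch of how per-question judge verdicts could be turned into an accuracy figure like the ones above. It assumes each verdict is a dict with a `pred` field ("yes" or "no") and a numeric `score` field. These field names and the `aggregate` helper are illustrative assumptions, not necessarily what the repository's evaluation script does.

```python
from typing import Iterable, Mapping, Tuple


def aggregate(verdicts: Iterable[Mapping[str, object]]) -> Tuple[float, float]:
    """Return (accuracy in percent, mean score) over well-formed verdicts.

    Assumed (hypothetical) verdict shape: {"pred": "yes" | "no", "score": number}.
    Verdicts whose "pred" is neither "yes" nor "no" are skipped.
    """
    yes = no = 0
    total_score = 0.0
    for verdict in verdicts:
        pred = str(verdict.get("pred", "")).strip().lower()
        if pred == "yes":
            yes += 1
        elif pred == "no":
            no += 1
        else:
            continue  # malformed verdict: excluded from both metrics
        total_score += float(verdict.get("score", 0))
    counted = yes + no
    if counted == 0:
        raise ValueError("no well-formed verdicts to aggregate")
    return 100.0 * yes / counted, total_score / counted


if __name__ == "__main__":
    sample = [
        {"pred": "yes", "score": 4},
        {"pred": "no", "score": 2},
        {"pred": "yes", "score": 5},
    ]
    accuracy, mean_score = aggregate(sample)
    print(f"{accuracy:.2f} {mean_score:.2f}")  # prints "66.67 3.67"
```

Note that in this sketch malformed verdicts are dropped rather than counted as wrong, so the denominator is the number of parsed verdicts, not the number of questions.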

May I ask if you have recently evaluated the model’s performance on these benchmarks? Any guidance or clarification would be greatly appreciated.
