Reproduction gap for MiniCPM-V-4.6 on AI2D_TEST and MMMU_DEV_VAL with VLMEvalKit

Hi, I am trying to reproduce the MiniCPM-V-4.6 results reported in the README using VLMEvalKit, but I observe a noticeable gap.

Model:
- openbmb/MiniCPM-V-4.6

Evaluation tool:
- VLMEvalKit
- Datasets: AI2D_TEST, MMMU_DEV_VAL
- Judge: exact_matching
- Mode: all

Command:
```bash
python run.py \
  --data MMMU_DEV_VAL AI2D_TEST \
  --model MiniCPM-V-4_6 \
  --mode all \
  --judge exact_matching \
  --reuse \
  --verbose
```

Results:

| Dataset (VLMEvalKit) | Observed | README entry | README score |
|---|---|---|---|
| AI2D_TEST | 78.76 | AI2D | 84.2 |
| MMMU_DEV_VAL | 42.56 | MMMU | 53.6 |
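
For reference, the size of the gap follows directly from the numbers above. This is only arithmetic on the reported scores; the variable and function names are illustrative, and it assumes the README entries "AI2D" and "MMMU" correspond to the AI2D_TEST and MMMU_DEV_VAL splits.

```python
# Scores copied from the results above (accuracy, in percent).
observed = {"AI2D_TEST": 78.76, "MMMU_DEV_VAL": 42.56}
# Assumption: the README rows "AI2D" and "MMMU" refer to these same splits.
readme = {"AI2D_TEST": 84.2, "MMMU_DEV_VAL": 53.6}


def score_gaps(observed, readme):
    """Return README minus observed, in percentage points, per dataset."""
    return {name: round(readme[name] - observed[name], 2) for name in observed}


gaps = score_gaps(observed, readme)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points below README")
```

So the gap is roughly 5.4 points on AI2D and 11.0 points on MMMU.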

I see this warning during generation: `Setting pad_token_id to eos_token_id:248044 for open-end generation.`
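
As far as I can tell, this warning is informational: when no pad token is configured, `generate` in transformers falls back to the EOS token. The sketch below mirrors that fallback in plain Python so the value can be passed explicitly. `resolve_pad_token_id` is a hypothetical helper, not a transformers or VLMEvalKit function.

```python
def resolve_pad_token_id(pad_token_id, eos_token_id):
    """Pick the pad token id to pass to generate().

    Hypothetical helper: mirrors the fallback the warning describes
    (use EOS when no pad token is configured).
    """
    if pad_token_id is not None:
        return pad_token_id
    # eos_token_id may be a single id or a list of ids; use the first one.
    if isinstance(eos_token_id, (list, tuple)):
        return eos_token_id[0]
    return eos_token_id


# 248044 is the EOS id reported in the warning.
pad_id = resolve_pad_token_id(None, 248044)
print(pad_id)
```

Passing `pad_token_id=pad_id` in the generation kwargs should silence the warning. I would not expect it to change scores when generating one sample at a time, but I have not verified that.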

I am not using an official MiniCPM-V-4.6 adapter from upstream VLMEvalKit. Instead, I added a local `MiniCPM_V_4_6` adapter in `vlmeval/vlm/minicpm_v.py`, registered as `MiniCPM-V-4_6`. It inherits from `MiniCPM_V_4` for prompt building, CoT, and answer extraction, and only changes model loading and generation to use `AutoModelForImageTextToText` and `AutoProcessor` with `openbmb/MiniCPM-V-4.6`.

Parameters:
- `downsample_mode = "16x"`
- `max_slice_nums = 36`
- `num_beams = 3`
- `do_sample = False`
- `max_new_tokens = 2048` for CoT datasets
- `max_new_tokens = 1024` for non-CoT datasets
- `add_generation_prompt = True`
- `processor_kwargs.images_kwargs.downsample_mode = "16x"`
- `processor_kwargs.images_kwargs.max_slice_nums = 36`
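
To make the settings above unambiguous, they could be collected as follows. This is a simplified sketch under stated assumptions, not the adapter code itself: `build_generation_kwargs`, `build_processor_kwargs`, and `CHAT_TEMPLATE_KWARGS` are illustrative names, and the nesting under `images_kwargs` simply follows the dotted parameter names listed above.

```python
# Assumption: add_generation_prompt is passed when applying the chat template.
CHAT_TEMPLATE_KWARGS = {"add_generation_prompt": True}


def build_generation_kwargs(use_cot):
    """Generation settings listed above; use_cot selects the token budget."""
    return {
        "num_beams": 3,
        "do_sample": False,
        # 2048 new tokens for CoT datasets, 1024 otherwise.
        "max_new_tokens": 2048 if use_cot else 1024,
    }


def build_processor_kwargs():
    """Image settings, nested as processor_kwargs.images_kwargs.* above."""
    return {"images_kwargs": {"downsample_mode": "16x", "max_slice_nums": 36}}


print(build_generation_kwargs(use_cot=True))
print(build_generation_kwargs(use_cot=False))
print(build_processor_kwargs())
```

If the official protocol uses different values for any of these keys, that difference would be the first thing to rule out.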

Could you please help check whether my evaluation setup is correct?

If there is any mismatch with the official evaluation protocol, could you suggest the recommended way to reproduce the reported AI2D/MMMU results for MiniCPM-V-4.6?

Please let me know if additional logs, prediction files, or environment details would be useful.

