Hi, thank you for releasing this great model.
I tried to evaluate your model on olmocr-bench, but I was not able to reproduce the reported 83.1 score.
In my experiments, I obtained the following results:
chandra_original_md: Average Score 79.6% ± 1.0% (unweighted average of the per-JSONL scores below)

| Test set | Score | Tests passed |
| --- | --- | --- |
| arxiv_math.jsonl | 79.0% | 2311/2927 |
| baseline | 97.6% | 1361/1394 |
| headers_footers.jsonl | 91.7% | 697/760 |
| long_tiny_text.jsonl | 82.8% | 366/442 |
| multi_column.jsonl | 76.2% | 674/884 |
| old_scans.jsonl | 48.1% | 253/526 |
| old_scans_math.jsonl | 77.7% | 356/458 |
| table_tests.jsonl | 84.0% | 858/1022 |
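For clarity, the headline number is just the unweighted mean of the eight per-JSONL scores above; a quick sanity check with the values hardcoded from the table:

```python
# Unweighted mean of the per-JSONL scores reported above.
scores = [79.0, 97.6, 91.7, 82.8, 76.2, 48.1, 77.7, 84.0]
print(sum(scores) / len(scores))  # -> 79.6375, reported as 79.6%
```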
To produce the candidate outputs, I implemented a `parse_markdown` wrapper around the model and then ran the official olmocr-bench evaluation code on its outputs; a rough sketch of the wrapper is below.
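This is roughly the shape of the wrapper I used (a minimal sketch, not the exact code; the endpoint, model name, and prompt are placeholders, assuming the model is served behind an OpenAI-compatible API, e.g. via vLLM):

```python
import base64
from openai import OpenAI

# Placeholder endpoint and credentials: adjust to however the model is served.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def parse_markdown(image_path: str) -> str:
    """Render one page image to Markdown by querying the model."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = client.chat.completions.create(
        model="chandra",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text",
                 "text": "Convert this page to Markdown."},  # placeholder prompt
            ],
        }],
        temperature=0.0,
    )
    return resp.choices[0].message.content
```

If your prompt or decoding settings differ from something like this, that alone could explain part of the gap.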
Could you please share more details about the exact evaluation settings you used to obtain the 83.1 score?
In particular, the exact prompt, decoding parameters (e.g. temperature and max tokens), and any post-processing applied before scoring would be very helpful to know.
Thank you in advance for your help.