Hi, thank you for releasing this great model.
I tried to evaluate your model on olmocr-bench, but I was not able to reproduce the reported 83.1 score.
In my experiments, I obtained the following results:
chandra_original_md: Average Score 79.6% ± 1.0% (unweighted average of the per-JSONL scores below)

| Test set | Score | Tests passed |
| --- | --- | --- |
| arxiv_math.jsonl | 79.0% | 2311/2927 |
| baseline | 97.6% | 1361/1394 |
| headers_footers.jsonl | 91.7% | 697/760 |
| long_tiny_text.jsonl | 82.8% | 366/442 |
| multi_column.jsonl | 76.2% | 674/884 |
| old_scans.jsonl | 48.1% | 253/526 |
| old_scans_math.jsonl | 77.7% | 356/458 |
| table_tests.jsonl | 84.0% | 858/1022 |
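For clarity, the headline number is just the unweighted mean of the eight per-JSONL scores above; a quick sanity check with the values hardcoded from the table:

```python
# Unweighted mean of the per-JSONL scores reported above.
scores = [79.0, 97.6, 91.7, 82.8, 76.2, 48.1, 77.7, 84.0]
print(sum(scores) / len(scores))  # -> 79.6375, reported as 79.6%
```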
To produce the candidate outputs, I implemented a `parse_markdown` wrapper around the model and then ran the official olmocr-bench evaluation code on its outputs; a rough sketch of the wrapper is below.
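This is roughly the shape of the wrapper I used (a minimal sketch, not the exact code; the endpoint, model name, and prompt are placeholders, assuming the model is served behind an OpenAI-compatible API, e.g. via vLLM):

```python
import base64
from openai import OpenAI

# Placeholder endpoint and credentials: adjust to however the model is served.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def parse_markdown(image_path: str) -> str:
    """Render one page image to Markdown by querying the model."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = client.chat.completions.create(
        model="chandra",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text",
                 "text": "Convert this page to Markdown."},  # placeholder prompt
            ],
        }],
        temperature=0.0,
    )
    return resp.choices[0].message.content
```

If your prompt or decoding settings differ from something like this, that alone could explain part of the gap.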
Could you please share more details about the exact evaluation settings you used to obtain the 83.1 score?
In particular, the exact prompt, decoding parameters (e.g. temperature and max tokens), and any post-processing applied before scoring would be very helpful to know.
Thank you in advance for your help.