
Unable to reproduce OlmOCR performance on olmocr-bench #63

@Backdrop9019


Hi, thank you for releasing this great model.

I tried to evaluate your model on olmocr-bench, but I was not able to reproduce the reported 83.1 score.
In my experiments, I obtained the following results:

chandra_original_md : Average Score: 79.6% ± 1.0% (average of per-JSONL scores)

    arxiv_math.jsonl              : 79.0% (2311/2927 tests)
    baseline                      : 97.6% (1361/1394 tests)
    headers_footers.jsonl         : 91.7% (697/760 tests)
    long_tiny_text.jsonl          : 82.8% (366/442 tests)
    multi_column.jsonl            : 76.2% (674/884 tests)
    old_scans.jsonl               : 48.1% (253/526 tests)
    old_scans_math.jsonl          : 77.7% (356/458 tests)
    table_tests.jsonl             : 84.0% (858/1022 tests)
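For reference, here is a minimal sketch of the arithmetic behind the summary line. It uses only the per-file counts listed above. The 79.6% headline matches an unweighted mean of the eight per-file pass rates, with `baseline` counted as one of the eight. Pooling all 8,413 individual tests gives a higher figure. The function names are illustrative only, the sketch does not reproduce the bench's ± interval, and it makes no assumption about which averaging the reported 83.1 uses.

```python
# Per-file results copied from the run above: name -> (passed, total)
results = {
    "arxiv_math.jsonl": (2311, 2927),
    "baseline": (1361, 1394),
    "headers_footers.jsonl": (697, 760),
    "long_tiny_text.jsonl": (366, 442),
    "multi_column.jsonl": (674, 884),
    "old_scans.jsonl": (253, 526),
    "old_scans_math.jsonl": (356, 458),
    "table_tests.jsonl": (858, 1022),
}


def macro_average(res):
    """Mean of per-file pass rates: every file is weighted equally."""
    return sum(p / t for p, t in res.values()) / len(res)


def micro_average(res):
    """Pooled pass rate: every individual test is weighted equally."""
    passed = sum(p for p, _ in res.values())
    total = sum(t for _, t in res.values())
    return passed / total


print(f"macro (per-JSONL mean): {macro_average(results):.1%}")  # 79.6%
print(f"micro (pooled tests):   {micro_average(results):.1%}")  # 81.7%
```

The two numbers differ mainly because `old_scans.jsonl` has a low pass rate. It counts as a full eighth of the macro average but holds only about 6% of the individual tests.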

I implemented parse_markdown and then ran the model using the official olmocr-bench evaluation code.

Could you please share more details about the exact evaluation settings you used to obtain the 83.1 score?

Any additional details would be very helpful.
Thank you in advance for your help.
