Inconsistent table structure output between OCRFast and OCRHighQuality #10

@mirinterplay

Description

I found inconsistent behavior in the table recognition results of OCRFast and OCRHighQuality in uniparser-tools.

When using:

  • OCRFast
    • the table reconstruction requires combining:
      • placeholders
      • contents
      • structure
  • OCRHighQuality
    • the structure field already contains the final table data directly

This forces downstream parsing logic to branch on the OCR mode, because the output schema and semantics differ between the two modes.


Expected Behavior

Ideally, both OCR modes should provide a consistent table structure format.

For example:

  • either both return normalized table structures requiring reconstruction
  • or both return fully reconstructed table data

A unified output format would make integration and downstream processing much easier.
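
Until the two modes agree, a downstream consumer could hide the difference behind a small adapter. The following is only a sketch of that idea: `normalize_table` and the `reconstruct` callback are hypothetical names, not part of uniparser-tools, and nothing is assumed about the internal shape of the `placeholders`, `contents`, and `structure` fields beyond what is described in this issue.

```python
from typing import Any, Callable, Mapping

# Hypothetical callback type: the caller supplies the logic that combines
# the three OCRFast fields into final table data.
Reconstruct = Callable[[Any, Any, Any], Any]


def normalize_table(table: Mapping[str, Any], mode: str, reconstruct: Reconstruct) -> Any:
    """Return final table data regardless of which OCR mode produced `table`.

    Assumptions (taken from this issue, not from the library docs):
    - OCRFast output must be rebuilt from placeholders, contents, and structure.
    - OCRHighQuality output already holds the final table data in `structure`.
    """
    if mode == "OCRFast":
        return reconstruct(table["placeholders"], table["contents"], table["structure"])
    if mode == "OCRHighQuality":
        return table["structure"]
    raise ValueError(f"unknown OCR mode: {mode!r}")
```

With an adapter like this, the mode-specific branch lives in one place, and the rest of the pipeline only sees the final table data. It is a workaround on the consumer side, not a substitute for a unified format in the library.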


Actual Behavior

OCRFast

Table data has to be reconstructed manually from:

{
  "placeholders": ...,
  "contents": ...,
  "structure": ...
}
