Description
I found that table recognition results are inconsistent between OCRFast and OCRHighQuality in uniparser-tools.
When using:
OCRFast
- table reconstruction requires combining the placeholders, contents, and structure fields
OCRHighQuality
- the structure field already contains the final table data directly
This forces downstream parsing logic to branch on the OCR mode, because the output schema and semantics differ between the two modes.
Expected Behavior
Ideally, both OCR modes should provide a consistent table structure format.
For example:
- either both return normalized table structures requiring reconstruction
- or both return fully reconstructed table data
A unified output format would make integration and downstream processing much easier.
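Until the formats are unified, a caller-side adapter along these lines could hide the difference. This is only a sketch: normalize_table, needs_reconstruction, and the reconstruct callback are hypothetical names, not part of uniparser-tools, and it assumes that an OCRHighQuality result does not carry placeholders and contents alongside structure. The reconstruction logic itself is left to the caller because it depends on the actual field formats.

```python
from typing import Any, Callable, Mapping

# Keys that, per this issue, an OCRFast table result needs for reconstruction.
RECONSTRUCTION_KEYS = ("placeholders", "contents", "structure")


def needs_reconstruction(table: Mapping[str, Any]) -> bool:
    """Return True when all three parts are present (the OCRFast-style shape).

    Assumption: OCRHighQuality results do not include all three keys.
    """
    return all(key in table for key in RECONSTRUCTION_KEYS)


def normalize_table(
    table: Mapping[str, Any],
    reconstruct: Callable[[Any, Any, Any], Any],
) -> Any:
    """Return final table data regardless of which OCR mode produced `table`.

    `reconstruct` is a caller-supplied function (hypothetical) that merges
    placeholders, contents, and structure into the final table data.
    """
    if needs_reconstruction(table):
        return reconstruct(
            table["placeholders"], table["contents"], table["structure"]
        )
    # Otherwise the structure field is assumed to already hold the final data.
    return table["structure"]
```

Downstream code would then call normalize_table on every table result and never inspect the OCR mode directly.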
Actual Behavior
OCRFast
The table data has to be reconstructed manually from:
{
"placeholders": ...,
"contents": ...,
"structure": ...
}