fix: expose do_reading_order and force_backend_text pipeline options #117

Open
Frank-Schruefer wants to merge 1 commit into docling-project:main from Frank-Schruefer:fix/expose-do-reading-order-force-backend-text

@Frank-Schruefer

Problem

PdfPipelineOptions already supports do_reading_order and force_backend_text, but these options were not reachable via the API:

  • ConvertDocumentsOptions had no corresponding fields
  • manager.py did not pass them through to PdfPipelineOptions

Fix

  • Add do_reading_order field to ConvertDocumentsOptions (default true)
  • Add force_backend_text field to ConvertDocumentsOptions (default false)
  • Wire both fields through in manager.py
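
The change described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual diff: the two classes are stand-ins for ConvertDocumentsOptions and PdfPipelineOptions (the real ones are not imported here), and build_pipeline_options is a hypothetical helper name standing in for whatever manager.py does when it assembles the pipeline options.

```python
from dataclasses import dataclass


@dataclass
class PdfPipelineOptionsSketch:
    """Stand-in for PdfPipelineOptions; only the two fields discussed in this PR."""

    do_reading_order: bool
    force_backend_text: bool


@dataclass
class ConvertDocumentsOptionsSketch:
    """Stand-in for the API-facing ConvertDocumentsOptions."""

    do_reading_order: bool = True  # default true, per the PR description
    force_backend_text: bool = False  # default false, per the PR description


def build_pipeline_options(
    request: ConvertDocumentsOptionsSketch,
) -> PdfPipelineOptionsSketch:
    """Hypothetical helper: pass both request fields through to the pipeline options."""
    return PdfPipelineOptionsSketch(
        do_reading_order=request.do_reading_order,
        force_backend_text=request.force_backend_text,
    )


# A caller that wants to skip reading-order prediction and rely on the native text layer.
opts = build_pipeline_options(
    ConvertDocumentsOptionsSketch(do_reading_order=False, force_backend_text=True)
)
```

With the defaults left untouched, the request behaves as before the change: reading-order prediction stays on and backend text is not forced.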

Why it matters

do_reading_order=false is useful for PDFs where the reading-order predictor produces incorrect results (e.g. scanned PDFs with a native text layer and many small orphan clusters). force_backend_text is useful for PDFs with reliable programmatic text layers where layout-model text detection is unnecessary.

PdfPipelineOptions already supports do_reading_order and force_backend_text,
but ConvertDocumentsOptions had no corresponding fields and manager.py did
not pass them through to the pipeline. Add the missing fields and wire them
up so callers can control reading-order prediction and native text extraction
via the API.

Signed-off-by: Frank Schruefer <frank.schruefer@t-online.de>
Signed-off-by: stone <frank.schruefer@t-online.de>
@github-actions
Contributor

github-actions bot commented Apr 5, 2026

DCO Check Passed

Thanks @Frank-Schruefer, all your commits are properly signed off. 🎉

@mergify
Contributor

mergify bot commented Apr 5, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
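
For illustration, the title rule above can be checked locally with a short sketch. It assumes the pattern behaves the same under Python's re module as it does in the merge-protection engine; matches_conventional_commit is a hypothetical helper name.

```python
import re

# Pattern copied from the merge-protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def matches_conventional_commit(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


pr_title = "fix: expose do_reading_order and force_backend_text pipeline options"
title_ok = matches_conventional_commit(pr_title)
```

The title of this pull request begins with `fix:`, which is why the rule reports success.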
