
An empty page_range is treated as 'all pages' instead of 'no pages' #497

@YizukiAme


Bug Description

load_pdf() branches on if page_range: rather than if page_range is not None:. An explicitly empty selection such as [] or range(0) is falsy, so it falls into the else branch and is replaced with list(range(last_page)). The entire PDF is processed instead of zero pages.
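Python's truth-value rules are what make the two cases collapse: empty sequences such as [] and range(0) are falsy, yet neither is None. A small standalone check (plain Python, no surya imports; the candidate values are just examples) shows the difference between the two tests:

```python
# Truthiness versus an explicit None check for values a caller might pass
# as page_range. Plain Python; nothing here depends on surya.
candidates = {
    "None": None,
    "[]": [],
    "range(0)": range(0),
    "[0]": [0],
}

results = {}
for label, value in candidates.items():
    # bool(value) is what `if page_range:` tests;
    # `value is not None` is what the suggested fix tests.
    results[label] = (bool(value), value is not None)
    print(f"{label:>9}  truthy={bool(value)!s:<5}  not None={value is not None}")
```

Only [0] passes the truthiness test, while [], range(0) and [0] all pass the is not None test, so only the latter separates "not provided" from "empty".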

Programmatic callers using load_pdf(), load_from_file(), load_from_folder(), or CLILoader with a precomputed empty page selection get the opposite of what they asked for: every page instead of none.

Root Cause

surya/input/load.py L19-27:

```python
def load_pdf(pdf_path, page_range: List[int] | None = None, dpi=settings.IMAGE_DPI):
    doc = open_pdf(pdf_path)
    last_page = len(doc)

    if page_range:      # ← [] is falsy, falls through!
        assert all([0 <= page < last_page for page in page_range])
    else:
        page_range = list(range(last_page))  # ← processes ALL pages
```
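To reproduce the branch without a PDF on disk, the selection logic can be lifted into a pure function. The helper below is hypothetical (resolve_pages_current is not part of surya); it mirrors only the lines above and returns the page indices load_pdf would go on to render:

```python
from typing import List, Optional


def resolve_pages_current(page_range: Optional[List[int]], last_page: int) -> List[int]:
    """Hypothetical mirror of the page-selection branch in load_pdf, minus PDF I/O."""
    if page_range:  # same truthiness test as load_pdf; [] and range(0) are falsy
        assert all(0 <= page < last_page for page in page_range)
    else:
        page_range = list(range(last_page))  # empty selection becomes "every page"
    return list(page_range)


# A 5-page document: None and [] end up selecting the same pages.
print(resolve_pages_current(None, 5))    # [0, 1, 2, 3, 4]
print(resolve_pages_current([], 5))      # [0, 1, 2, 3, 4]
print(resolve_pages_current([1, 3], 5))  # [1, 3]
```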

Steps to Reproduce

```python
from surya.input.load import load_pdf

# User explicitly requests zero pages
images, names = load_pdf("document.pdf", page_range=[])
print(len(images))  # Expected: 0; actual: the document's full page count
```

Expected Behavior

page_range=[] should result in no pages being processed, or should raise a clear ValueError.
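If the ValueError option is preferred, one possible shape is sketched below. It assumes the policy "None means all pages, an empty selection is an error"; resolve_pages_strict is a hypothetical name. It also swaps the assert for a ValueError, so the bounds check is not skipped when Python runs with -O:

```python
from typing import Iterable, List, Optional


def resolve_pages_strict(page_range: Optional[Iterable[int]], last_page: int) -> List[int]:
    """Hypothetical strict policy: None means all pages, empty is an error."""
    if page_range is None:
        return list(range(last_page))
    pages = list(page_range)  # materialize so range objects and generators work
    if not pages:
        raise ValueError("page_range is empty; pass None to process all pages")
    bad = [page for page in pages if not 0 <= page < last_page]
    if bad:
        # ValueError instead of assert so the check survives `python -O`
        raise ValueError(f"page indices out of range for {last_page}-page PDF: {bad}")
    return pages


print(resolve_pages_strict(None, 3))  # [0, 1, 2]
try:
    resolve_pages_strict([], 3)
except ValueError as exc:
    print(exc)
```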

Suggested Fix

```python
if page_range is not None:  # distinguish "not provided" from "empty"
    if len(page_range) == 0:
        doc.close()  # release the PDF before returning early
        return [], []  # or raise ValueError("Empty page range")
    assert all(0 <= page < last_page for page in page_range)
else:
    page_range = list(range(last_page))
```
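A PDF-free mirror of the suggested fix can double as a regression check. FakeDoc and load_pages_fixed are hypothetical stand-ins: FakeDoc only mimics len() and close() on whatever open_pdf returns, the page indices stand in for rendered images, and "document" is a placeholder name. The try/finally is one way to make sure the document is still closed when the empty-selection branch returns early:

```python
from typing import List, Optional, Tuple


class FakeDoc:
    """Hypothetical stand-in for the object open_pdf() returns: has a length and close()."""

    def __init__(self, n_pages: int) -> None:
        self.n_pages = n_pages
        self.closed = False

    def __len__(self) -> int:
        return self.n_pages

    def close(self) -> None:
        self.closed = True


def load_pages_fixed(doc: FakeDoc, page_range: Optional[List[int]] = None) -> Tuple[List[int], List[str]]:
    """Suggested fix applied to a PDF-free mirror of load_pdf; returns (pages, names)."""
    try:
        last_page = len(doc)
        if page_range is not None:  # distinguish "not provided" from "empty"
            if len(page_range) == 0:
                return [], []
            assert all(0 <= page < last_page for page in page_range)
        else:
            page_range = list(range(last_page))
        pages = list(page_range)  # stand-in for rendering each selected page
        names = ["document" for _ in pages]  # placeholder for the real file name
        return pages, names
    finally:
        doc.close()  # runs on the early return as well as the normal path
```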
