Bug Description
load_pdf() branches on if page_range: rather than page_range is not None. An explicitly empty selection such as [] or range(0) falls into the else branch and is replaced with list(range(last_page)) — processing the entire PDF instead of zero pages.
Programmatic callers using load_pdf(), load_from_file(), load_from_folder(), or CLILoader with a precomputed empty page selection get the opposite of what they asked for.
Root Cause
surya/input/load.py L19-27:
def load_pdf(pdf_path, page_range: List[int] | None = None, dpi=settings.IMAGE_DPI):
    doc = open_pdf(pdf_path)
    last_page = len(doc)
    if page_range:  # ← [] is falsy, falls through!
        assert all([0 <= page < last_page for page in page_range])
    else:
        page_range = list(range(last_page))  # ← processes ALL pages
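The truthiness problem can be shown without a PDF. The sketch below is a hypothetical stand-in (`select_pages_truthy` is not part of surya) that copies only the branching logic quoted above, assuming the rest of `load_pdf()` simply iterates over the resulting list:

```python
from typing import List, Optional, Sequence


def select_pages_truthy(page_range: Optional[Sequence[int]], last_page: int) -> List[int]:
    """Hypothetical stand-in that mirrors the truthiness check quoted above."""
    if page_range:  # [] and range(0) are falsy, so they take the else branch
        return list(page_range)
    return list(range(last_page))


# Empty sequences are falsy, so the check cannot tell them apart from None.
print(bool([]), bool(range(0)), bool(None))  # False False False
print(select_pages_truthy([], 3))            # [0, 1, 2]
print(select_pages_truthy(None, 3))          # [0, 1, 2]
```

An explicit empty selection and an omitted argument produce the same result, which is the behavior described in this report.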
Steps to Reproduce
from surya.input.load import load_pdf
# User explicitly requests zero pages
images, names = load_pdf("document.pdf", page_range=[])
print(len(images)) # Expected: 0, Actual: len(document) — all pages!
Expected Behavior
page_range=[] should result in no pages being processed, or should raise a clear ValueError.
Suggested Fix
if page_range is not None:  # distinguish "not provided" from "empty"
    if len(page_range) == 0:
        return [], []  # or raise ValueError("Empty page range")
    assert all([0 <= page < last_page for page in page_range])
else:
    page_range = list(range(last_page))
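One possible way to keep this logic testable is to move it into a small helper. The sketch below is only an illustration: `resolve_page_range` is a hypothetical name that does not exist in surya, and it raises `ValueError` for out-of-bounds pages instead of using `assert`, which is an assumption on my part (assertions are skipped when Python runs with `-O`).

```python
from typing import List, Optional, Sequence


def resolve_page_range(page_range: Optional[Sequence[int]], last_page: int) -> List[int]:
    """Hypothetical helper: return the page indices that should be loaded."""
    if page_range is None:
        # Argument not provided: default to every page in the document.
        return list(range(last_page))

    pages = list(page_range)  # accepts lists, ranges, and tuples alike
    out_of_bounds = [p for p in pages if not 0 <= p < last_page]
    if out_of_bounds:
        raise ValueError(
            f"Pages out of bounds for a {last_page}-page document: {out_of_bounds}"
        )
    # An explicitly empty selection stays empty.
    return pages


print(resolve_page_range(None, 3))      # [0, 1, 2]
print(resolve_page_range([], 3))        # []
print(resolve_page_range(range(0), 3))  # []
```

With a helper like this, `load_pdf()` could return `[], []` early whenever the resolved list is empty, or the helper could raise `ValueError` for an empty selection if that is the preferred behavior.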