Summary by CodeRabbit
Release Notes
Walkthrough
This pull request performs extensive structural refactoring across the pdf_craft codebase. Changes include: (1) reorganizing and consolidating imports in
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@pdf_craft/error.py`:
- Around line 27-34: InterruptedError.__init__ declares self._kind but never
assigns it; update InterruptedError to accept a kind: InterruptedKind parameter
(e.g., def __init__(self, metering: OCRTokensMetering, kind: InterruptedKind) ->
None) and store it to self._kind, and then update the caller in
to_interrupted_error to pass the computed kind when constructing
InterruptedError so the instance preserves the interruption kind.
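A minimal sketch of the suggested fix, assuming simplified stand-ins: the `InterruptedKind` members and `OCRTokensMetering` fields below are hypothetical (the real definitions live in `pdf_craft/metering.py`), and the review does not show how the caller computes `kind`, so this `to_interrupted_error` simply forwards it.

```python
from dataclasses import dataclass
from enum import Enum, auto


class InterruptedKind(Enum):  # hypothetical members; see pdf_craft/metering.py
    ABORT = auto()
    TOKEN_LIMIT_EXCEEDED = auto()


@dataclass(frozen=True)
class OCRTokensMetering:  # hypothetical fields; see pdf_craft/metering.py
    input_tokens: int
    output_tokens: int


class InterruptedError(Exception):
    def __init__(self, metering: OCRTokensMetering, kind: InterruptedKind) -> None:
        super().__init__(f"OCR interrupted ({kind.name})")
        self._metering = metering
        self._kind = kind  # previously annotated but never assigned

    @property
    def metering(self) -> OCRTokensMetering:
        return self._metering

    @property
    def kind(self) -> InterruptedKind:
        return self._kind


def to_interrupted_error(kind: InterruptedKind, metering: OCRTokensMetering) -> InterruptedError:
    # Hypothetical caller: whatever logic computes `kind` now passes it through
    # instead of dropping it.
    return InterruptedError(metering, kind)


err = to_interrupted_error(InterruptedKind.ABORT, OCRTokensMetering(10, 5))
```

Exposing `kind` through a read-only property keeps the stored value private while letting callers branch on the interruption reason.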
In `@pdf_craft/pdf/__init__.py`:
- Line 4: Replace the wildcard import from .ref with an explicit import of
TITLE_TAGS: change the line "from .ref import *" to import only TITLE_TAGS
(e.g., "from .ref import TITLE_TAGS") and, if you need to control exports from
this package, add or update __all__ to include "TITLE_TAGS" so static analysis
(ruff F403) no longer flags the wildcard import.
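To illustrate what the explicit import plus `__all__` changes, the sketch below builds a throwaway package on disk and checks what a star-import of it exposes. The package name `demo_pdf_pkg`, the value of `TITLE_TAGS`, and the extra `HELPER` name are all made up for the demo; only the two-line `__init__.py` mirrors the suggestion. Listing the name in `__all__` also marks it as an intentional re-export, so linters do not report it as an unused import (F401).

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_demo_package(root: Path) -> None:
    """Create a throwaway package mirroring the suggested pdf_craft/pdf layout."""
    pkg = root / "demo_pdf_pkg"  # hypothetical package name
    pkg.mkdir()
    # Stand-in for pdf_craft/pdf/ref.py; the value of TITLE_TAGS is made up.
    (pkg / "ref.py").write_text('TITLE_TAGS = ("title", "subtitle")\nHELPER = object()\n')
    # Explicit import instead of `from .ref import *`, plus __all__.
    (pkg / "__init__.py").write_text('from .ref import TITLE_TAGS\n\n__all__ = ["TITLE_TAGS"]\n')


with tempfile.TemporaryDirectory() as tmp:
    write_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        pkg = importlib.import_module("demo_pdf_pkg")
        namespace: dict[str, object] = {}
        # What a downstream `from demo_pdf_pkg import *` would pull in.
        exec("from demo_pdf_pkg import *", namespace)
    finally:
        sys.path.remove(tmp)

star_exported = sorted(name for name in namespace if not name.startswith("__"))
```

Only `TITLE_TAGS` crosses the package boundary; `HELPER` stays private to `ref.py`.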
In `@pdf_craft/sequence/mark.py`:
- Around line 207-222: The DOUBLE_CIRCLED_NUMBER tuple currently maps (0, "⓵")
which misaligns numeric values with their glyphs; in the
NumberStyle.DOUBLE_CIRCLED_NUMBER mapping update the integer keys to match the
visual digits (i.e., start at 1 for "⓵", 2 for "⓶", etc.) so each tuple pair
correctly associates the number with its corresponding double-circled Unicode
character.
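One way to keep the integers and glyphs aligned is to derive the pairs from code points rather than typing them by hand. This is a sketch under assumptions: the name `DOUBLE_CIRCLED_NUMBERS` and the flat tuple-of-pairs shape are hypothetical (the real table sits under `NumberStyle.DOUBLE_CIRCLED_NUMBER` in `mark.py`). It relies on U+24F5 (⓵, DOUBLE CIRCLED DIGIT ONE) through U+24FE (⓾, DOUBLE CIRCLED NUMBER TEN) being contiguous, and uses `unicodedata.numeric` to catch any remaining off-by-one.

```python
import unicodedata

# Hypothetical name and shape for the table in pdf_craft/sequence/mark.py.
# U+24F5..U+24FE are the double-circled values 1..10; trim the range if the
# project's table stops earlier.
DOUBLE_CIRCLED_NUMBERS: tuple[tuple[int, str], ...] = tuple(
    (n, chr(0x24F4 + n)) for n in range(1, 11)
)


def misaligned(pairs: tuple[tuple[int, str], ...]) -> list[tuple[int, str]]:
    """Return pairs whose integer disagrees with the glyph's Unicode numeric value."""
    return [(n, glyph) for n, glyph in pairs if unicodedata.numeric(glyph, None) != n]
```

A check like `misaligned(...) == []` in the test suite would flag the current `(0, "⓵")` entry.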
🧹 Nitpick comments (9)
pdf_craft/common/xml.py (1)
21-25: XML parsing security: defusedxml recommended but low priority given input sources.
The static analysis tool flags `fromstring` as potentially vulnerable to XXE and entity-expansion attacks (Billion Laughs, quadratic blowup). However, reviewing the codebase shows that `read_xml()` is only used for internal, project-generated XML files (TOC structure, page metadata, chapter headers) stored locally, not for untrusted sources such as user uploads or external APIs.
While using `defusedxml.ElementTree.fromstring` would be a good security hardening practice and aligns with Python security best practices, it is not critical for this use case given the trusted input boundary. If you decide to adopt defusedxml, add it as a dependency in `pyproject.toml`.
pdf_craft/common/asset.py (1)
16-34: Well-implemented asset deduplication with proper cleanup.
The `clip` method correctly handles:
- Temp file creation with UUID to avoid collisions
- Hash-based deduplication (skipping if target exists)
- Cleanup on both success and exception paths
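The flow described above can be sketched roughly as follows. This is not the project's `AssetHub.clip`: the function name `clip_asset`, the writer callback, SHA-256 as the hash, and the `.png` suffix are all assumptions made so the sketch runs on its own. It also uses the bare `raise` suggested below.

```python
import hashlib
import tempfile
import uuid
from collections.abc import Callable
from pathlib import Path


def clip_asset(assets_dir: Path, write: Callable[[Path], None], suffix: str = ".png") -> str:
    """Hypothetical stand-in for AssetHub.clip: write to a temp file, then dedupe by hash."""
    assets_dir.mkdir(parents=True, exist_ok=True)
    # A UUID-named temp file avoids collisions between concurrent writers.
    temp_path = assets_dir / f"{uuid.uuid4().hex}.tmp"
    try:
        write(temp_path)
        digest = hashlib.sha256(temp_path.read_bytes()).hexdigest()  # hash choice is an assumption
        target = assets_dir / f"{digest}{suffix}"
        if target.exists():
            temp_path.unlink()  # identical content already stored: discard the temp copy
        else:
            temp_path.rename(target)
        return target.name
    except Exception:
        if temp_path.exists():
            temp_path.unlink()
        raise  # bare raise re-raises the exception currently being handled


# Usage: clipping the same bytes twice leaves a single file on disk,
# and a failing writer leaves no temp file behind.
with tempfile.TemporaryDirectory() as tmp:
    assets = Path(tmp) / "assets"
    first = clip_asset(assets, lambda p: p.write_bytes(b"fake image bytes"))
    second = clip_asset(assets, lambda p: p.write_bytes(b"fake image bytes"))

    def failing_writer(path: Path) -> None:
        path.write_bytes(b"partial")
        raise OSError("simulated write failure")

    try:
        clip_asset(assets, failing_writer)
    except OSError:
        pass
    files_on_disk = sorted(p.name for p in assets.iterdir())
```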
Minor style suggestion: `raise e` can be simplified to `raise` for cleaner exception re-raising.
♻️ Optional simplification
 except Exception as e:
     if temp_path.exists():
         temp_path.unlink()
-    raise e
+    raise
pdf_craft/markdown/render/layouts.py (1)
62-79: Duplicated `render_member` closures.
The `render_member` closure is implemented identically in both `render_paragraph` (lines 62-79) and `_render_asset` (lines 98-115). Consider extracting it into a shared factory function or a standalone helper to reduce duplication.
♻️ Suggested refactor
+def _create_member_render(ref_id_to_number: RefIdMap | None) -> _MemberRender:
+    def render_member(part: BlockMember | str) -> Generator[str, None, None]:
+        if isinstance(part, str):
+            yield to_markdown_string(
+                kind=ExpressionKind.TEXT,
+                content=part,
+            )
+        elif isinstance(part, InlineExpression):
+            latex_content = part.content.strip()
+            if latex_content:
+                yield to_markdown_string(
+                    kind=part.kind,
+                    content=latex_content,
+                )
+        elif ref_id_to_number and isinstance(part, Reference):
+            ref_number = ref_id_to_number.get(part.id, 1)
+            yield "[^"
+            yield str(ref_number)
+            yield "]"
+    return render_member
Then use `render_member = _create_member_render(ref_id_to_number)` in both functions.
Also applies to: 98-115
pdf_craft/pdf/types.py (1)
78-80: Consider using explicit validation instead of assertions.
Assertions can be disabled with `python -O`, making these order checks ineffective in optimized production builds. If order consistency is a critical invariant, consider raising an explicit `ValueError` instead.
♻️ Suggested refactor
 for i, layout in enumerate(page.body_layouts):
-    assert layout.order == i, (
-        f"body_layouts[{i}].order should be {i}, got {layout.order}"
-    )
+    if layout.order != i:
+        raise ValueError(
+            f"body_layouts[{i}].order should be {i}, got {layout.order}"
+        )
     body_element.append(_encode_layout(layout))
Apply the same pattern for `footnotes_layouts`.
Also applies to: 86-88
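Since the same check is needed for both `body_layouts` and `footnotes_layouts`, a small shared helper is one option. The sketch below is hypothetical: `check_layout_orders` is not in the codebase, and `Layout` is a stand-in that keeps only the `order` field. Unlike the suggested diff, it validates the whole list before encoding rather than interleaving the check with `_encode_layout`.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Layout:  # hypothetical stand-in for PageLayout; only `order` matters here
    order: int


def check_layout_orders(field_name: str, layouts: Sequence[Layout]) -> None:
    """Raise ValueError unless layouts[i].order == i; unlike assert, this survives `python -O`."""
    for i, layout in enumerate(layouts):
        if layout.order != i:
            raise ValueError(f"{field_name}[{i}].order should be {i}, got {layout.order}")


check_layout_orders("body_layouts", [Layout(0), Layout(1)])  # consistent: no error

error_message = ""
try:
    check_layout_orders("footnotes_layouts", [Layout(0), Layout(2)])
except ValueError as exc:
    error_message = str(exc)
```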
pdf_craft/pdf/ocr.py (1)
148-150: Consider simplifying DPI default handling.
The multi-line conditional could be simplified. Since `dpi=0` is likely not a valid value, `dpi or 300` behaves the same in practice (it differs only in also mapping an explicit `0` to 300).
♻️ Suggested simplification
 image = ref.render(
-    dpi=dpi
-    if dpi is not None
-    else 300,  # DPI=300 for scanned page
+    dpi=dpi or 300,  # DPI=300 for scanned page
     max_image_file_size=max_page_image_file_size,
 )
Alternatively, set the default in the function signature: `dpi: int = 300` instead of `dpi: int | None = None`.
pdf_craft/sequence/mark.py (1)
112-395: Ruff RUF001 warnings are false positives for this file.
The static analysis warnings about "ambiguous characters" (Roman numerals, fullwidth digits, mathematical digits) are expected: this file intentionally maps special Unicode number characters for OCR mark detection, so these characters must remain as-is.
Consider adding a `# ruff: noqa: RUF001` directive at the top of the file (a plain `# noqa: RUF001` only suppresses the line it is on), or configuring ruff to ignore this rule for this file via `per-file-ignores`.
pdf_craft/sequence/chapter.py (2)
338-394: Move the `decode_block_member` definition outside the loop for efficiency.
The nested function `decode_block_member` is defined inside the `for block_el in parent.findall("block")` loop (starting at line 308), so it is recreated on every iteration. Since it only captures `context_tag` and `references_map`, which don't change during iteration, move the definition before the loop.
♻️ Suggested refactor
 def _decode_block_elements(
     parent: Element,
     context_tag: str,
     references_map: dict[tuple[int, int], Reference] | None = None,
 ) -> list[BlockLayout]:
+    def decode_block_member(child: Element) -> BlockMember:
+        if child.tag == "ref":
+            ref_id = child.get("id")
+            if ref_id is None:
+                raise ValueError(
+                    f"<{context_tag}><block><ref> missing required attribute 'id'"
+                )
+            # ... rest of the function body unchanged ...
+        elif child.tag == "inline_expr":
+            # ... unchanged ...
+        else:
+            raise ValueError(
+                f"<{context_tag}><block> contains unknown element: <{child.tag}>"
+            )
+
     blocks: list[BlockLayout] = []
     for block_el in parent.findall("block"):
         # ... attribute parsing unchanged ...
-        def decode_block_member(child: Element) -> BlockMember:
-            # ... function body ...
         blocks.append(
             BlockLayout(
                 # ... unchanged ...
             )
         )
     return blocks
468-469: Move the import to the top of the file.
The late import of `transform2mark` inside `_decode_reference` (line 468) re-executes the import statement on every call. After the first call this is only a `sys.modules` lookup, so the overhead is small but unnecessary. Since `mark.py` does not import from `chapter.py`, there is no circular dependency preventing this import from being moved to the top of the file with the other imports.
pdf_craft/__init__.py (1)
3-9: Consider renaming `InterruptedError` to avoid shadowing the built-in.
`InterruptedError` shadows Python's built-in `InterruptedError` exception. While the custom exception serves a different purpose (OCR/PDF processing interruption) and has a different signature (it requires a `metering` parameter), the naming collision could cause confusion: users familiar with the built-in exception for I/O operations might inadvertently reference the wrong exception type.
Renaming it to `OCRInterruptedError` or `ProcessingInterruptedError` would improve clarity without changing runtime behavior, although callers that import the current name from `pdf_craft` would need to update their imports.
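The practical effect of the collision is that, in any module where the custom class is bound to the name `InterruptedError`, an `except InterruptedError:` clause no longer catches the built-in (an `OSError` subclass raised on `EINTR`). The sketch below shows this with a stand-in class; the real pdf_craft class also takes a `metering` argument, which is omitted here.

```python
import builtins


class InterruptedError(Exception):
    """Hypothetical stand-in for pdf_craft's InterruptedError (constructor simplified)."""


def caught_by_name(exc: Exception) -> bool:
    """Report whether `except InterruptedError` in this module catches `exc`."""
    try:
        raise exc
    except InterruptedError:  # resolves to the module-level class, not the built-in
        return True
    except Exception:
        return False


builtin_caught = caught_by_name(builtins.InterruptedError("EINTR"))
custom_caught = caught_by_name(InterruptedError())
```

`builtin_caught` ends up `False`, which is exactly the kind of surprise a distinct name such as `OCRInterruptedError` avoids.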
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (56)
pdf_craft/__init__.py
pdf_craft/common/__init__.py
pdf_craft/common/asset.py
pdf_craft/common/cv_splitter.py
pdf_craft/common/folder.py
pdf_craft/common/reader.py
pdf_craft/common/statistics.py
pdf_craft/common/xml.py
pdf_craft/epub/latex_to_text.py
pdf_craft/epub/render.py
pdf_craft/epub/toc_collection.py
pdf_craft/error.py
pdf_craft/expression.py
pdf_craft/functions.py
pdf_craft/language.py
pdf_craft/markdown/paragraph/__init__.py
pdf_craft/markdown/paragraph/parser.py
pdf_craft/markdown/paragraph/render.py
pdf_craft/markdown/paragraph/tags.py
pdf_craft/markdown/paragraph/types.py
pdf_craft/markdown/render/layouts.py
pdf_craft/markdown/render/render.py
pdf_craft/metering.py
pdf_craft/pdf/__init__.py
pdf_craft/pdf/handler.py
pdf_craft/pdf/ngrams.py
pdf_craft/pdf/ocr.py
pdf_craft/pdf/page_extractor.py
pdf_craft/pdf/page_ref.py
pdf_craft/pdf/types.py
pdf_craft/sequence/__init__.py
pdf_craft/sequence/analyse_level.py
pdf_craft/sequence/chapter.py
pdf_craft/sequence/content.py
pdf_craft/sequence/generation.py
pdf_craft/sequence/jointer.py
pdf_craft/sequence/mark.py
pdf_craft/sequence/reader.py
pdf_craft/sequence/reading_serials.py
pdf_craft/sequence/reference.py
pdf_craft/toc/__init__.py
pdf_craft/toc/analysing.py
pdf_craft/toc/text.py
pdf_craft/toc/toc_levels.py
pdf_craft/toc/toc_pages.py
pdf_craft/toc/types.py
pdf_craft/transform.py
scripts/clean_analysing.py
scripts/gen_epub.py
scripts/gen_md.py
test.py
tests/test_cv_splitter.py
tests/test_expression.py
tests/test_jointer.py
tests/test_parser.py
tests/test_reading_serials.py
🧰 Additional context used
🧬 Code graph analysis (29)
tests/test_parser.py (2)
pdf_craft/markdown/paragraph/types.py (1)
HTMLTag(11-14)
pdf_craft/markdown/paragraph/parser.py (1)
parse_raw_markdown(8-56)
tests/test_reading_serials.py (1)
pdf_craft/pdf/types.py (1)
PageLayout(24-29)
pdf_craft/common/reader.py (1)
pdf_craft/common/xml.py (1)
read_xml(21-25)
pdf_craft/markdown/paragraph/render.py (2)
pdf_craft/language.py (1)
is_chinese_char(5-19)
pdf_craft/markdown/paragraph/types.py (1)
HTMLTag(11-14)
pdf_craft/markdown/paragraph/__init__.py (2)
pdf_craft/markdown/paragraph/tags.py (5)
HTMLTagDefinition(53-58), is_protocol_allowed(547-561), is_tag_filtered(539-540), is_tag_ignored(543-544), tag_definition(535-536)
pdf_craft/markdown/paragraph/types.py (3)
HTMLTag(11-14), decode(25-49), encode(52-75)
pdf_craft/markdown/paragraph/parser.py (2)
pdf_craft/markdown/paragraph/tags.py (4)
is_protocol_allowed(547-561), is_tag_filtered(539-540), is_tag_ignored(543-544), tag_definition(535-536)
pdf_craft/markdown/paragraph/types.py (1)
HTMLTag(11-14)
pdf_craft/sequence/jointer.py (7)
pdf_craft/expression.py (3)
ExpressionKind(6-11), ParsedItem(15-20), parse_latex_expressions(68-193)
pdf_craft/language.py (1)
is_latin_letter(1-2)
pdf_craft/markdown/paragraph/parser.py (1)
parse_raw_markdown(8-56)
pdf_craft/pdf/types.py (1)
PageLayout(24-29)
pdf_craft/sequence/chapter.py (4)
AssetLayout(50-57), BlockLayout(61-65), InlineExpression(28-30), ParagraphLayout(21-24)
pdf_craft/sequence/content.py (3)
expand_text_in_content(42-56), first(9-16), last(19-26)
pdf_craft/sequence/reading_serials.py (1)
split_reading_serials(22-69)
pdf_craft/sequence/analyse_level.py (2)
pdf_craft/sequence/chapter.py (1)
Chapter(14-17)
pdf_craft/common/cv_splitter.py (1)
split_by_cv(47-75)
pdf_craft/sequence/generation.py (10)
pdf_craft/common/xml.py (1)
save_xml(28-40)
pdf_craft/pdf/types.py (3)
Page(14-20), decode(44-67), encode(70-91)
pdf_craft/sequence/chapter.py (7)
decode(85-115), AssetLayout(50-57), Chapter(14-17), ParagraphLayout(21-24), Reference(34-42), encode(118-144), id(41-42)
pdf_craft/toc/types.py (5)
decode(50-93), Toc(15-20), TocInfo(9-11), iter_toc(23-26), encode(29-47)
pdf_craft/sequence/analyse_level.py (1)
analyse_chapter_internal_levels(10-24)
pdf_craft/sequence/content.py (2)
expand_text_in_content(42-56), join_texts_in_content(29-39)
pdf_craft/sequence/jointer.py (1)
Jointer(77-292)
pdf_craft/sequence/mark.py (2)
Mark(36-55), search_marks(83-89)
pdf_craft/sequence/reference.py (2)
References(11-126), page_index(26-27)
pdf_craft/pdf/page_ref.py (1)
page_index(83-84)
pdf_craft/toc/toc_pages.py (3)
pdf_craft/language.py (1)
is_latin_letter(1-2)
pdf_craft/toc/text.py (1)
normalize_text(364-370)
pdf_craft/pdf/page_ref.py (2)
PageRef(73-121), page_index(83-84)
pdf_craft/markdown/render/render.py (4)
pdf_craft/metering.py (1)
check_aborted(8-12)
pdf_craft/sequence/chapter.py (3)
Reference(34-42), references_to_map(78-82), search_references_in_chapter(68-75)
pdf_craft/sequence/reader.py (1)
create_chapters_reader(8-26)
pdf_craft/markdown/render/layouts.py (1)
render_layouts(21-50)
pdf_craft/sequence/reference.py (3)
pdf_craft/sequence/chapter.py (4)
AssetLayout(50-57), BlockLayout(61-65), ParagraphLayout(21-24), Reference(34-42)
pdf_craft/sequence/mark.py (2)
Mark(36-55), transform2mark(74-80)
pdf_craft/pdf/page_ref.py (1)
page_index(83-84)
scripts/gen_md.py (5)
pdf_craft/pdf/ocr.py (1)
OCREventKind(20-26)
pdf_craft/functions.py (1)
transform_markdown(25-69)
pdf_craft/transform.py (1)
transform_markdown(43-109)
pdf_craft/pdf/page_ref.py (1)
page_index(83-84)
pdf_craft/sequence/reference.py (1)
page_index(26-27)
pdf_craft/epub/toc_collection.py (1)
pdf_craft/toc/types.py (2)
Toc(15-20), decode(50-93)
tests/test_expression.py (1)
pdf_craft/expression.py (2)
ExpressionKind(6-11), parse_latex_expressions(68-193)
pdf_craft/pdf/handler.py (5)
pdf_craft/error.py (1)
PDFError(6-9)
pdf_craft/pdf/types.py (1)
PDFDocumentMetadata(33-41)
pdf_craft/pdf/page_ref.py (2)
pages_count(41-43), page_index(83-84)
pdf_craft/pdf/ocr.py (1)
metadata(60-65)
pdf_craft/sequence/reference.py (2)
page_index(26-27), get(29-30)
pdf_craft/pdf/types.py (1)
pdf_craft/common/xml.py (1)
indent(5-18)
pdf_craft/sequence/reading_serials.py (3)
pdf_craft/common/statistics.py (1)
avg(4-14)
pdf_craft/common/cv_splitter.py (2)
split_by_cv(47-75), size(26-34)
pdf_craft/pdf/types.py (1)
PageLayout(24-29)
pdf_craft/sequence/reader.py (3)
pdf_craft/common/reader.py (1)
XMLReader(11-37)
pdf_craft/common/xml.py (1)
read_xml(21-25)
pdf_craft/sequence/chapter.py (2)
Chapter(14-17), decode(85-115)
scripts/gen_epub.py (4)
pdf_craft/pdf/ocr.py (1)
OCREventKind(20-26)
pdf_craft/functions.py (1)
transform_epub(72-124)
pdf_craft/transform.py (1)
transform_epub(111-185)
scripts/gen_md.py (1)
_format_duration(29-38)
pdf_craft/error.py (1)
pdf_craft/metering.py (2)
InterruptedKind(21-23), OCRTokensMetering(16-18)
pdf_craft/transform.py (8)
pdf_craft/common/folder.py (1)
EnsureFolder(5-24)
pdf_craft/error.py (3)
PDFError(6-9), is_inline_error(19-20), to_interrupted_error(37-53)
pdf_craft/pdf/page_ref.py (1)
render(86-111)
pdf_craft/metering.py (1)
OCRTokensMetering(16-18)
pdf_craft/pdf/ocr.py (1)
OCR(40-244)
pdf_craft/sequence/generation.py (1)
generate_chapter_files(22-40)
pdf_craft/to_path.py (1)
to_path(5-9)
pdf_craft/toc/analysing.py (1)
analyse_toc(16-24)
pdf_craft/sequence/content.py (2)
pdf_craft/markdown/paragraph/types.py (1)
HTMLTag(11-14)
pdf_craft/sequence/generation.py (1)
expand(172-180)
pdf_craft/sequence/chapter.py (5)
pdf_craft/common/xml.py (1)
indent(5-18)
pdf_craft/expression.py (3)
ExpressionKind(6-11), decode_expression_kind(36-48), encode_expression_kind(23-33)
pdf_craft/markdown/paragraph/types.py (3)
HTMLTag(11-14), decode(25-49), encode(52-75)
pdf_craft/sequence/reference.py (2)
page_index(26-27), get(29-30)
pdf_craft/sequence/mark.py (1)
transform2mark(74-80)
pdf_craft/markdown/render/layouts.py (3)
pdf_craft/expression.py (2)
ExpressionKind(6-11), to_markdown_string(51-65)
pdf_craft/sequence/chapter.py (4)
AssetLayout(50-57), InlineExpression(28-30), ParagraphLayout(21-24), Reference(34-42)
pdf_craft/markdown/paragraph/render.py (1)
render_markdown_paragraph(7-16)
pdf_craft/pdf/__init__.py (3)
pdf_craft/pdf/handler.py (4)
DefaultPDFDocument(47-182), DefaultPDFHandler(31-41), PDFDocument(13-23), PDFHandler(27-28)
pdf_craft/pdf/page_ref.py (1)
pdf_pages_count(11-27)
pdf_craft/pdf/types.py (3)
Page(14-20), PageLayout(24-29), PDFDocumentMetadata(33-41)
pdf_craft/toc/text.py (1)
pdf_craft/language.py (1)
is_latin_letter(1-2)
pdf_craft/pdf/ocr.py (8)
pdf_craft/common/asset.py (1)
AssetHub(12-41)
pdf_craft/error.py (2)
OCRError(12-16), PDFError(6-9)
pdf_craft/metering.py (1)
check_aborted(8-12)
pdf_craft/to_path.py (1)
to_path(5-9)
pdf_craft/pdf/handler.py (2)
DefaultPDFHandler(31-41), PDFHandler(27-28)
pdf_craft/pdf/types.py (4)
Page(14-20), PageLayout(24-29), PDFDocumentMetadata(33-41), encode(70-91)
pdf_craft/pdf/page_extractor.py (1)
PageExtractorNode(15-180)
pdf_craft/pdf/page_ref.py (1)
PageRefContext(30-66)
pdf_craft/pdf/page_ref.py (4)
pdf_craft/error.py (1)
PDFError(6-9)
pdf_craft/pdf/handler.py (3)
DefaultPDFHandler(31-41), PDFDocument(13-23), PDFHandler(27-28)
pdf_craft/toc/toc_pages.py (1)
PageRef(18-21)
pdf_craft/sequence/reference.py (1)
page_index(26-27)
🪛 Ruff (0.14.11)
pdf_craft/toc/toc_levels.py
42-42: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
124-124: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
150-150: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
tests/test_reading_serials.py
168-168: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
pdf_craft/sequence/jointer.py
22-22: String contains ambiguous ！ (FULLWIDTH EXCLAMATION MARK). Did you mean ! (EXCLAMATION MARK)?
(RUF001)
23-23: String contains ambiguous ？ (FULLWIDTH QUESTION MARK). Did you mean ? (QUESTION MARK)?
(RUF001)
25-25: String contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF001)
28-28: String contains ambiguous ； (FULLWIDTH SEMICOLON). Did you mean ; (SEMICOLON)?
(RUF001)
44-44: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF001)
48-48: String contains ambiguous ‐ (HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF001)
49-49: String contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF001)
50-50: String contains ambiguous ‒ (FIGURE DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
51-51: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
169-169: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
pdf_craft/common/__init__.py
3-3: from .folder import * used; unable to detect undefined names
(F403)
4-4: from .reader import * used; unable to detect undefined names
(F403)
5-5: from .statistics import * used; unable to detect undefined names
(F403)
pdf_craft/sequence/analyse_level.py
12-12: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
14-14: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
scripts/clean_analysing.py
63-63: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF001)
pdf_craft/common/xml.py
23-23: Using xml to parse untrusted data is known to be vulnerable to XML attacks; use defusedxml equivalents
(S314)
25-25: Avoid specifying long messages outside the exception class
(TRY003)
pdf_craft/toc/toc_pages.py
11-11: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
38-38: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
38-38: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
168-168: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
pdf_craft/epub/toc_collection.py
86-88: Avoid specifying long messages outside the exception class
(TRY003)
pdf_craft/pdf/page_extractor.py
90-94: Avoid specifying long messages outside the exception class
(TRY003)
110-110: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
110-110: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
pdf_craft/pdf/handler.py
130-132: Avoid specifying long messages outside the exception class
(TRY003)
141-143: Avoid specifying long messages outside the exception class
(TRY003)
pdf_craft/error.py
27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
pdf_craft/transform.py
107-109: Avoid specifying long messages outside the exception class
(TRY003)
183-185: Avoid specifying long messages outside the exception class
(TRY003)
pdf_craft/sequence/chapter.py
157-157: Avoid specifying long messages outside the exception class
(TRY003)
159-161: Avoid specifying long messages outside the exception class
(TRY003)
164-164: Avoid specifying long messages outside the exception class
(TRY003)
168-170: Avoid specifying long messages outside the exception class
(TRY003)
185-187: Avoid specifying long messages outside the exception class
(TRY003)
192-192: Avoid specifying long messages outside the exception class
(TRY003)
294-296: Avoid specifying long messages outside the exception class
(TRY003)
298-298: Avoid specifying long messages outside the exception class
(TRY003)
311-313: Avoid specifying long messages outside the exception class
(TRY003)
317-319: Avoid specifying long messages outside the exception class
(TRY003)
323-325: Avoid specifying long messages outside the exception class
(TRY003)
329-331: Avoid specifying long messages outside the exception class
(TRY003)
342-344: Avoid specifying long messages outside the exception class
(TRY003)
349-351: Abstract raise to an inner function
(TRY301)
349-351: Avoid specifying long messages outside the exception class
(TRY003)
355-357: Avoid specifying long messages outside the exception class
(TRY003)
364-366: Avoid specifying long messages outside the exception class
(TRY003)
368-370: Avoid specifying long messages outside the exception class
(TRY003)
375-377: Avoid specifying long messages outside the exception class
(TRY003)
383-385: Avoid specifying long messages outside the exception class
(TRY003)
448-448: Avoid specifying long messages outside the exception class
(TRY003)
453-455: Abstract raise to an inner function
(TRY301)
453-455: Avoid specifying long messages outside the exception class
(TRY003)
459-461: Avoid specifying long messages outside the exception class
(TRY003)
465-465: Avoid specifying long messages outside the exception class
(TRY003)
pdf_craft/sequence/mark.py
118-118: String contains ambiguous Ⅰ (ROMAN NUMERAL ONE). Did you mean I (LATIN CAPITAL LETTER I)?
(RUF001)
122-122: String contains ambiguous Ⅴ (ROMAN NUMERAL FIVE). Did you mean V (LATIN CAPITAL LETTER V)?
(RUF001)
127-127: String contains ambiguous Ⅹ (ROMAN NUMERAL TEN). Did you mean X (LATIN CAPITAL LETTER X)?
(RUF001)
136-136: String contains ambiguous ⅰ (SMALL ROMAN NUMERAL ONE). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
140-140: String contains ambiguous ⅴ (SMALL ROMAN NUMERAL FIVE). Did you mean v (LATIN SMALL LETTER V)?
(RUF001)
145-145: String contains ambiguous ⅹ (SMALL ROMAN NUMERAL TEN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
318-318: String contains ambiguous ０ (FULLWIDTH DIGIT ZERO). Did you mean 0 (DIGIT ZERO)?
(RUF001)
319-319: String contains ambiguous １ (FULLWIDTH DIGIT ONE). Did you mean 1 (DIGIT ONE)?
(RUF001)
320-320: String contains ambiguous ２ (FULLWIDTH DIGIT TWO). Did you mean 2 (DIGIT TWO)?
(RUF001)
321-321: String contains ambiguous ３ (FULLWIDTH DIGIT THREE). Did you mean 3 (DIGIT THREE)?
(RUF001)
322-322: String contains ambiguous ４ (FULLWIDTH DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?
(RUF001)
323-323: String contains ambiguous ５ (FULLWIDTH DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?
(RUF001)
324-324: String contains ambiguous ６ (FULLWIDTH DIGIT SIX). Did you mean 6 (DIGIT SIX)?
(RUF001)
325-325: String contains ambiguous ７ (FULLWIDTH DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?
(RUF001)
326-326: String contains ambiguous ８ (FULLWIDTH DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?
(RUF001)
327-327: String contains ambiguous ９ (FULLWIDTH DIGIT NINE). Did you mean 9 (DIGIT NINE)?
(RUF001)
334-334: String contains ambiguous 𝟬 (MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
335-335: String contains ambiguous 𝟭 (MATHEMATICAL SANS-SERIF BOLD DIGIT ONE). Did you mean I (LATIN CAPITAL LETTER I)?
(RUF001)
336-336: String contains ambiguous 𝟮 (MATHEMATICAL SANS-SERIF BOLD DIGIT TWO). Did you mean 2 (DIGIT TWO)?
(RUF001)
337-337: String contains ambiguous 𝟯 (MATHEMATICAL SANS-SERIF BOLD DIGIT THREE). Did you mean 3 (DIGIT THREE)?
(RUF001)
338-338: String contains ambiguous 𝟰 (MATHEMATICAL SANS-SERIF BOLD DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?
(RUF001)
339-339: String contains ambiguous 𝟱 (MATHEMATICAL SANS-SERIF BOLD DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?
(RUF001)
340-340: String contains ambiguous 𝟲 (MATHEMATICAL SANS-SERIF BOLD DIGIT SIX). Did you mean 6 (DIGIT SIX)?
(RUF001)
341-341: String contains ambiguous 𝟳 (MATHEMATICAL SANS-SERIF BOLD DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?
(RUF001)
342-342: String contains ambiguous 𝟴 (MATHEMATICAL SANS-SERIF BOLD DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?
(RUF001)
343-343: String contains ambiguous 𝟵 (MATHEMATICAL SANS-SERIF BOLD DIGIT NINE). Did you mean 9 (DIGIT NINE)?
(RUF001)
350-350: String contains ambiguous 𝟎 (MATHEMATICAL BOLD DIGIT ZERO). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
351-351: String contains ambiguous 𝟏 (MATHEMATICAL BOLD DIGIT ONE). Did you mean I (LATIN CAPITAL LETTER I)?
(RUF001)
352-352: String contains ambiguous 𝟐 (MATHEMATICAL BOLD DIGIT TWO). Did you mean 2 (DIGIT TWO)?
(RUF001)
353-353: String contains ambiguous 𝟑 (MATHEMATICAL BOLD DIGIT THREE). Did you mean 3 (DIGIT THREE)?
(RUF001)
354-354: String contains ambiguous 𝟒 (MATHEMATICAL BOLD DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?
(RUF001)
355-355: String contains ambiguous 𝟓 (MATHEMATICAL BOLD DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?
(RUF001)
356-356: String contains ambiguous 𝟔 (MATHEMATICAL BOLD DIGIT SIX). Did you mean 6 (DIGIT SIX)?
(RUF001)
357-357: String contains ambiguous 𝟕 (MATHEMATICAL BOLD DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?
(RUF001)
358-358: String contains ambiguous 𝟖 (MATHEMATICAL BOLD DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?
(RUF001)
359-359: String contains ambiguous 𝟗 (MATHEMATICAL BOLD DIGIT NINE). Did you mean 9 (DIGIT NINE)?
(RUF001)
366-366: String contains ambiguous 𝟘 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
367-367: String contains ambiguous 𝟙 (MATHEMATICAL DOUBLE-STRUCK DIGIT ONE). Did you mean I (LATIN CAPITAL LETTER I)?
(RUF001)
368-368: String contains ambiguous 𝟚 (MATHEMATICAL DOUBLE-STRUCK DIGIT TWO). Did you mean 2 (DIGIT TWO)?
(RUF001)
369-369: String contains ambiguous 𝟛 (MATHEMATICAL DOUBLE-STRUCK DIGIT THREE). Did you mean 3 (DIGIT THREE)?
(RUF001)
370-370: String contains ambiguous 𝟜 (MATHEMATICAL DOUBLE-STRUCK DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?
(RUF001)
371-371: String contains ambiguous 𝟝 (MATHEMATICAL DOUBLE-STRUCK DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?
(RUF001)
372-372: String contains ambiguous 𝟞 (MATHEMATICAL DOUBLE-STRUCK DIGIT SIX). Did you mean 6 (DIGIT SIX)?
(RUF001)
373-373: String contains ambiguous 𝟟 (MATHEMATICAL DOUBLE-STRUCK DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?
(RUF001)
374-374: String contains ambiguous 𝟠 (MATHEMATICAL DOUBLE-STRUCK DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?
(RUF001)
375-375: String contains ambiguous 𝟡 (MATHEMATICAL DOUBLE-STRUCK DIGIT NINE). Did you mean 9 (DIGIT NINE)?
(RUF001)
pdf_craft/pdf/__init__.py
4-4: from .ref import * used; unable to detect undefined names
(F403)
pdf_craft/toc/text.py
58-58: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
58-58: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
59-59: String contains ambiguous ‐ (HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF001)
60-60: String contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF001)
61-61: String contains ambiguous ‒ (FIGURE DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
62-62: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
69-69: String contains ambiguous ‚ (SINGLE LOW-9 QUOTATION MARK). Did you mean , (COMMA)?
(RUF001)
70-70: String contains ambiguous ‛ (SINGLE HIGH-REVERSED-9 QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?
(RUF001)
78-78: String contains ambiguous ․ (ONE DOT LEADER). Did you mean . (FULL STOP)?
(RUF001)
82-82: String contains ambiguous ′ (PRIME). Did you mean ` (GRAVE ACCENT)?
(RUF001)
85-85: String contains ambiguous ‵ (REVERSED PRIME). Did you mean ` (GRAVE ACCENT)?
(RUF001)
88-88: String contains ambiguous ‹ (SINGLE LEFT-POINTING ANGLE QUOTATION MARK). Did you mean < (LESS-THAN SIGN)?
(RUF001)
89-89: String contains ambiguous › (SINGLE RIGHT-POINTING ANGLE QUOTATION MARK). Did you mean > (GREATER-THAN SIGN)?
(RUF001)
96-96: String contains ambiguous ⁁ (CARET INSERTION POINT). Did you mean / (SOLIDUS)?
(RUF001)
98-98: String contains ambiguous ⁃ (HYPHEN BULLET). Did you mean - (HYPHEN-MINUS)?
(RUF001)
99-99: String contains ambiguous ⁄ (FRACTION SLASH). Did you mean / (SOLIDUS)?
(RUF001)
109-109: String contains ambiguous ⁎ (LOW ASTERISK). Did you mean * (ASTERISK)?
(RUF001)
114-114: String contains ambiguous ⁓ (SWUNG DASH). Did you mean ~ (TILDE)?
(RUF001)
121-121: String contains ambiguous ⁚ (TWO DOT PUNCTUATION). Did you mean : (COLON)?
(RUF001)
126-126: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
126-126: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
191-191: String contains ambiguous ⹀ (DOUBLE HYPHEN). Did you mean = (EQUALS SIGN)?
(RUF001)
207-207: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
207-207: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
220-220: String contains ambiguous 〔 (LEFT TORTOISE SHELL BRACKET). Did you mean ( (LEFT PARENTHESIS)?
(RUF001)
221-221: String contains ambiguous 〕 (RIGHT TORTOISE SHELL BRACKET). Did you mean ) (RIGHT PARENTHESIS)?
(RUF001)
235-235: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
235-235: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
236-236: String contains ambiguous ！ (FULLWIDTH EXCLAMATION MARK). Did you mean ! (EXCLAMATION MARK)?
(RUF001)
237-237: String contains ambiguous ＂ (FULLWIDTH QUOTATION MARK). Did you mean " (QUOTATION MARK)?
(RUF001)
238-238: String contains ambiguous ＃ (FULLWIDTH NUMBER SIGN). Did you mean # (NUMBER SIGN)?
(RUF001)
239-239: String contains ambiguous ％ (FULLWIDTH PERCENT SIGN). Did you mean % (PERCENT SIGN)?
(RUF001)
240-240: String contains ambiguous ＆ (FULLWIDTH AMPERSAND). Did you mean & (AMPERSAND)?
(RUF001)
241-241: String contains ambiguous ＇ (FULLWIDTH APOSTROPHE). Did you mean ` (GRAVE ACCENT)?
(RUF001)
242-242: String contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF001)
243-243: String contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF001)
244-244: String contains ambiguous ＊ (FULLWIDTH ASTERISK). Did you mean * (ASTERISK)?
(RUF001)
245-245: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF001)
246-246: String contains ambiguous ． (FULLWIDTH FULL STOP). Did you mean . (FULL STOP)?
(RUF001)
247-247: String contains ambiguous ／ (FULLWIDTH SOLIDUS). Did you mean / (SOLIDUS)?
(RUF001)
248-248: String contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?
(RUF001)
249-249: String contains ambiguous ； (FULLWIDTH SEMICOLON). Did you mean ; (SEMICOLON)?
(RUF001)
250-250: String contains ambiguous ? (FULLWIDTH QUESTION MARK). Did you mean ? (QUESTION MARK)?
(RUF001)
251-251: String contains ambiguous @ (FULLWIDTH COMMERCIAL AT). Did you mean @ (COMMERCIAL AT)?
(RUF001)
252-252: String contains ambiguous [ (FULLWIDTH LEFT SQUARE BRACKET). Did you mean [ (LEFT SQUARE BRACKET)?
(RUF001)
253-253: String contains ambiguous \ (FULLWIDTH REVERSE SOLIDUS). Did you mean \ (REVERSE SOLIDUS)?
(RUF001)
254-254: String contains ambiguous ] (FULLWIDTH RIGHT SQUARE BRACKET). Did you mean ] (RIGHT SQUARE BRACKET)?
(RUF001)
255-255: String contains ambiguous ^ (FULLWIDTH CIRCUMFLEX ACCENT). Did you mean ^ (CIRCUMFLEX ACCENT)?
(RUF001)
256-256: String contains ambiguous _ (FULLWIDTH LOW LINE). Did you mean _ (LOW LINE)?
(RUF001)
257-257: String contains ambiguous ` (FULLWIDTH GRAVE ACCENT). Did you mean ` (GRAVE ACCENT)?
(RUF001)
258-258: String contains ambiguous { (FULLWIDTH LEFT CURLY BRACKET). Did you mean { (LEFT CURLY BRACKET)?
(RUF001)
259-259: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
260-260: String contains ambiguous } (FULLWIDTH RIGHT CURLY BRACKET). Did you mean } (RIGHT CURLY BRACKET)?
(RUF001)
261-261: String contains ambiguous ~ (FULLWIDTH TILDE). Did you mean ~ (TILDE)?
(RUF001)
269-269: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
269-269: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
273-273: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
273-273: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
275-275: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
275-275: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
276-276: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
276-276: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
276-276: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
281-281: String contains ambiguous ؍ (ARABIC DATE SEPARATOR). Did you mean , (COMMA)?
(RUF001)
288-288: String contains ambiguous ٫ (ARABIC DECIMAL SEPARATOR). Did you mean , (COMMA)?
(RUF001)
290-290: String contains ambiguous ٭ (ARABIC FIVE POINTED STAR). Did you mean * (ASTERISK)?
(RUF001)
294-294: String contains ambiguous ׀ (HEBREW PUNCTUATION PASEQ). Did you mean l (LATIN SMALL LETTER L)?
(RUF001)
295-295: String contains ambiguous ׃ (HEBREW PUNCTUATION SOF PASUQ). Did you mean : (COLON)?
(RUF001)
297-297: String contains ambiguous ׳ (HEBREW PUNCTUATION GERESH). Did you mean ` (GRAVE ACCENT)?
(RUF001)
326-326: String contains ambiguous ᙮ (CANADIAN SYLLABICS FULL STOP). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
336-336: String contains ambiguous ᠃ (MONGOLIAN FULL STOP). Did you mean : (COLON)?
(RUF001)
342-342: String contains ambiguous ᠉ (MONGOLIAN MANCHU FULL STOP). Did you mean : (COLON)?
(RUF001)
357-357: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
357-357: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
358-358: String contains ambiguous ⁁ (CARET INSERTION POINT). Did you mean / (SOLIDUS)?
(RUF001)
359-359: String contains ambiguous ⁓ (SWUNG DASH). Did you mean ~ (TILDE)?
(RUF001)
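Every fullwidth character reported for mark.py lines 236-261 is a compatibility variant of a printable ASCII character, sitting exactly 0xFEE0 code points above it. This shared origin is why Ruff flags them as lookalikes. The sketch below checks that offset against Python's NFKC normalization. It covers only the fullwidth block, not the Arabic, Hebrew, Canadian, Mongolian, or other characters in the list. The helper name `fold_fullwidth` is hypothetical and not part of pdf_craft.

```python
import unicodedata


def fold_fullwidth(text: str) -> str:
    """Replace fullwidth ASCII variants (U+FF01..U+FF5E) with their ASCII originals."""
    return "".join(
        chr(ord(ch) - 0xFEE0) if 0xFF01 <= ord(ch) <= 0xFF5E else ch for ch in text
    )


# Fullwidth characters Ruff reported in mark.py lines 236-261, written as code
# points so this snippet's own source stays ASCII-only.
REPORTED = [
    0xFF01, 0xFF02, 0xFF03, 0xFF05, 0xFF06, 0xFF07, 0xFF08, 0xFF09, 0xFF0A,
    0xFF0C, 0xFF0E, 0xFF0F, 0xFF1A, 0xFF1B, 0xFF1F, 0xFF20, 0xFF3B, 0xFF3C,
    0xFF3D, 0xFF3E, 0xFF3F, 0xFF40, 0xFF5B, 0xFF5C, 0xFF5D, 0xFF5E,
]

for cp in REPORTED:
    ch = chr(cp)
    # NFKC compatibility normalization folds each fullwidth form to the same ASCII char.
    assert unicodedata.normalize("NFKC", ch) == fold_fullwidth(ch)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {fold_fullwidth(ch)!r}")
```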
pdf_craft/pdf/page_ref.py
108-111: Avoid specifying long messages outside the exception class
(TRY003)
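Ruff's TRY003 flags a `raise` that builds a long message literal at the raise site. The code at page_ref.py lines 108-111 is not shown in this review. The sketch below therefore uses a hypothetical `PageIndexOutOfRangeError` and `check_page_index` only to illustrate the shape the rule asks for. The message is built inside the exception class, and the raise site passes only data.

```python
class PageIndexOutOfRangeError(ValueError):
    """Hypothetical error type; not taken from pdf_craft/pdf/page_ref.py."""

    def __init__(self, page_index: int, pages_count: int) -> None:
        self.page_index = page_index
        self.pages_count = pages_count
        # The long message lives inside the exception class, which satisfies TRY003.
        super().__init__(
            f"page index {page_index} is out of range "
            f"for a document with {pages_count} pages"
        )


def check_page_index(page_index: int, pages_count: int) -> None:
    """Hypothetical guard: raise if page_index is not in [0, pages_count)."""
    if not 0 <= page_index < pages_count:
        # Only structured data at the raise site; no long string literal here.
        raise PageIndexOutOfRangeError(page_index, pages_count)
```

Keeping the values as attributes also lets callers inspect `page_index` and `pages_count` instead of parsing the message.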
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
```python
# Do not use doc-page-extractor's Error directly: everything in that library is lazily loaded, so exposing it would defeat lazy loading.
class InterruptedError(Exception):
    """Raised when the operation is interrupted by the user."""

    def __init__(self, metering: OCRTokensMetering) -> None:
        super().__init__()
        self._kind: InterruptedKind
        self._metering: OCRTokensMetering = metering
```
InterruptedError._kind is declared but never assigned.
Line 33 declares self._kind: InterruptedKind but it's never assigned a value. In to_interrupted_error, the kind variable is computed but not passed to InterruptedError.__init__ or set on the instance.
If _kind is intended to be accessible, the constructor should accept and store it:
🔧 Proposed fix
```diff
 class InterruptedError(Exception):
     """Raised when the operation is interrupted by the user."""
-    def __init__(self, metering: OCRTokensMetering) -> None:
+    def __init__(self, kind: InterruptedKind, metering: OCRTokensMetering) -> None:
         super().__init__()
-        self._kind: InterruptedKind
+        self._kind: InterruptedKind = kind
         self._metering: OCRTokensMetering = metering
```
And update the instantiation in to_interrupted_error:
```diff
 if kind is not None:
     return InterruptedError(
+        kind,
         OCRTokensMetering(
             input_tokens=error.input_tokens,
             output_tokens=error.output_tokens,
         )
     )
```
🧰 Tools
🪛 Ruff (0.14.11)
27-27: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
27-27: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
27-27: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
🤖 Prompt for AI Agents
In `@pdf_craft/error.py` around lines 27 - 34, InterruptedError.__init__ declares
self._kind but never assigns it; update InterruptedError to accept a kind:
InterruptedKind parameter (e.g., def __init__(self, kind: InterruptedKind,
metering: OCRTokensMetering) -> None, matching the proposed fix above) and store
it in self._kind, and then update the caller in to_interrupted_error to pass the
computed kind when constructing InterruptedError so the instance preserves the
interruption kind.
```python
from .page_ref import pdf_pages_count
from .types import decode, encode, Page, PageLayout, PDFDocumentMetadata, DeepSeekOCRSize
from .handler import PDFHandler, PDFDocument, DefaultPDFHandler, DefaultPDFDocument
from .ref import *
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, locate the ref.py file in pdf_craft/pdf
fd -t f "ref.py" pdf_craft/pdf
```
Repository: oomol-lab/pdf-craft
Length of output: 109
🏁 Script executed:
```shell
# Check the content of ref.py to see if it defines __all__
cat -n pdf_craft/pdf/ref.py
```
Repository: oomol-lab/pdf-craft
Length of output: 105
🏁 Script executed:
```shell
# Check the current state of __init__.py
cat -n pdf_craft/pdf/__init__.py
```
Repository: oomol-lab/pdf-craft
Length of output: 451
🏁 Script executed:
```shell
# Get the exact line count of ref.py
wc -l pdf_craft/pdf/ref.py
```
Repository: oomol-lab/pdf-craft
Length of output: 85
🏁 Script executed:
```shell
# Search for __all__ definition in ref.py specifically
rg "__all__" pdf_craft/pdf/ref.py
```
Repository: oomol-lab/pdf-craft
Length of output: 45
🏁 Script executed:
```shell
# Get the complete content of ref.py with all details
cat pdf_craft/pdf/ref.py
```
Repository: oomol-lab/pdf-craft
Length of output: 98
Replace wildcard import with explicit import.
The `from .ref import *` statement triggers ruff F403 and makes it impossible to statically determine which names are exported. Since `.ref` only exports `TITLE_TAGS` and does not define `__all__`, replace it with an explicit import:
```python
from .ref import TITLE_TAGS
```
This eliminates the F403 warning and improves code clarity.
🧰 Tools
🪛 Ruff (0.14.11)
4-4: from .ref import * used; unable to detect undefined names
(F403)
🤖 Prompt for AI Agents
In `@pdf_craft/pdf/__init__.py` at line 4, Replace the wildcard import from .ref
with an explicit import of TITLE_TAGS: change the line "from .ref import *" to
import only TITLE_TAGS (e.g., "from .ref import TITLE_TAGS") and, if you need to
control exports from this package, add or update __all__ to include "TITLE_TAGS"
so static analysis (ruff F403) no longer flags the wildcard import.
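A wildcard import without `__all__` binds every public (non-underscore) name in the source module, including modules it merely imported. This is part of why F403 makes exports hard to reason about. The sketch below simulates this with in-memory modules. The module names, the `import re` line, and the `TITLE_TAGS` value are hypothetical stand-ins, since the real contents of `pdf_craft/pdf/ref.py` are not shown in this review.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create and register an in-memory module (stand-in for pdf_craft/pdf/ref.py)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def imported_names(statement: str) -> set[str]:
    """Run an import statement in a fresh namespace and report the names it bound."""
    namespace: dict = {}
    exec(statement, namespace)
    namespace.pop("__builtins__", None)  # added by exec itself, not by the import
    return set(namespace)


# Hypothetical module bodies: one without __all__, one that declares it.
make_module("ref_without_all", "import re\nTITLE_TAGS = ('h1', 'h2')\n_helper = 1\n")
make_module(
    "ref_with_all",
    "import re\n__all__ = ['TITLE_TAGS']\nTITLE_TAGS = ('h1', 'h2')\n_helper = 1\n",
)

print(sorted(imported_names("from ref_without_all import *")))           # ['TITLE_TAGS', 're']
print(sorted(imported_names("from ref_with_all import *")))              # ['TITLE_TAGS']
print(sorted(imported_names("from ref_without_all import TITLE_TAGS")))  # ['TITLE_TAGS']
```

The explicit import gives the same result as declaring `__all__`, and static tools such as ruff can see it without executing anything.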
```python
(
    NumberClass.CIRCLED_NUMBER,
    NumberStyle.DOUBLE_CIRCLED_NUMBER,
    (
        (0, "⓵"),
        (1, "⓶"),
        (2, "⓷"),
        (3, "⓸"),
        (4, "⓹"),
        (5, "⓺"),
        (6, "⓻"),
        (7, "⓼"),
        (8, "⓽"),
        (9, "⓾"),
    ),
),
```
Verify: DOUBLE_CIRCLED_NUMBER mapping starts at 0 but uses ⓵ (double circled one).
The mapping (0, "⓵") appears incorrect. The Unicode character ⓵ (U+24F5) is "DOUBLE CIRCLED DIGIT ONE", not zero. This would cause Mark.number to be 0 when the visual character represents 1.
Compare with CIRCLED_NUMBER, which correctly maps (0, "⓪") to the circled zero character.
🔧 Suggested fix
```diff
 (
     NumberClass.CIRCLED_NUMBER,
     NumberStyle.DOUBLE_CIRCLED_NUMBER,
     (
-        (0, "⓵"),
-        (1, "⓶"),
-        (2, "⓷"),
-        (3, "⓸"),
-        (4, "⓹"),
-        (5, "⓺"),
-        (6, "⓻"),
-        (7, "⓼"),
-        (8, "⓽"),
-        (9, "⓾"),
+        (1, "⓵"),
+        (2, "⓶"),
+        (3, "⓷"),
+        (4, "⓸"),
+        (5, "⓹"),
+        (6, "⓺"),
+        (7, "⓻"),
+        (8, "⓼"),
+        (9, "⓽"),
+        (10, "⓾"),
     ),
 ),
```
🤖 Prompt for AI Agents
In `@pdf_craft/sequence/mark.py` around lines 207 - 222, The DOUBLE_CIRCLED_NUMBER
tuple currently maps (0, "⓵") which misaligns numeric values with their glyphs;
in the NumberStyle.DOUBLE_CIRCLED_NUMBER mapping update the integer keys to
match the visual digits (i.e., start at 1 for "⓵", 2 for "⓶", etc.) so each
tuple pair correctly associates the number with its corresponding double-circled
Unicode character.
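The Unicode double circled numbers occupy U+24F5 (⓵, one) through U+24FE (⓾, ten). Unicode has no double circled zero, so a correct mapping starts at 1. The sketch below builds that table from code points and cross-checks each glyph against its Unicode numeric value via `unicodedata.numeric`. The names `DOUBLE_CIRCLED_NUMBERS`, `AS_WRITTEN`, and `check_number_glyphs` are hypothetical, and a check like this could serve as a regression test for the other number styles in mark.py.

```python
import unicodedata

# U+24F5..U+24FE are DOUBLE CIRCLED DIGIT ONE .. DOUBLE CIRCLED NUMBER TEN.
DOUBLE_CIRCLED_BASE = 0x24F4  # code point just before ⓵

DOUBLE_CIRCLED_NUMBERS: tuple[tuple[int, str], ...] = tuple(
    (n, chr(DOUBLE_CIRCLED_BASE + n)) for n in range(1, 11)
)

# The mapping as currently written in mark.py: keys 0..9 paired with ⓵..⓾.
AS_WRITTEN: tuple[tuple[int, str], ...] = tuple(
    (n, chr(0x24F5 + n)) for n in range(10)
)


def check_number_glyphs(pairs: tuple[tuple[int, str], ...]) -> list[tuple[int, str]]:
    """Return every (number, glyph) pair whose Unicode numeric value disagrees."""
    mismatched = []
    for number, glyph in pairs:
        # unicodedata.numeric returns a float (e.g. 1.0); None if the char has no value.
        if unicodedata.numeric(glyph, None) != number:
            mismatched.append((number, glyph))
    return mismatched


print(len(check_number_glyphs(AS_WRITTEN)))     # 10: every pair is off by one
print(check_number_glyphs(DOUBLE_CIRCLED_NUMBERS))  # []
```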
No description provided.