style: fix all *.py by ruff #331

Merged
Moskize91 merged 1 commit into main from style on Jan 14, 2026

Conversation

@Moskize91
Contributor

No description provided.

@coderabbitai

coderabbitai bot commented Jan 14, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added image clipping functionality to asset management.
    • New OCR event type RENDERED for enhanced processing visibility.
    • Extended numbering and marking styles for document structures.
    • Added statistical median() calculation function.
  • Improvements

    • Enhanced duration formatting with improved time unit display.
    • Expanded metadata structures for richer document information.
    • Added page_index property to references for easier navigation.
  • Refactoring

    • Reorganized imports and public API surfaces for clarity.
    • Made PDF error page_index parameter optional.


Walkthrough

This pull request performs extensive structural refactoring across the pdf_craft codebase. Changes include: (1) reorganizing and consolidating imports in __init__.py files across modules (pdf_craft, common, pdf, sequence, toc) to expose new symbols for error handling, statistics, and metadata; (2) adding new public data structures (PDFDocumentMetadata, PageLayout, Page) and expanding enums (OCREventKind, NumberClass, NumberStyle); (3) modifying existing dataclasses (OCRTokensMetering, Toc, Reference, Chapter, AssetLayout) with additional fields; (4) introducing new functions and methods (median(), clip(), read_xml/save_xml, layout decoding/encoding helpers, page_index property); (5) standardizing imports and formatting across 50+ files; and (6) refining logic in layout filtering, error handling, and type checking. Most changes preserve existing behavior while reorganizing code structure and expanding the public API surface.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | No description was provided by the author, making it impossible to verify if the description relates to the changeset. | Add a description explaining the purpose of the style fixes, such as the scope of ruff formatting applied and any significant changes to the codebase structure. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title follows the required format with type and subject: 'style: fix all *.py by ruff' clearly indicates code style formatting changes across Python files. |



@Moskize91 Moskize91 merged commit f100980 into main Jan 14, 2026
1 of 2 checks passed
@Moskize91 Moskize91 deleted the style branch January 14, 2026 02:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@pdf_craft/error.py`:
- Around line 27-34: InterruptedError.__init__ declares self._kind but never
assigns it; update InterruptedError to accept a kind: InterruptedKind parameter
(e.g., def __init__(self, metering: OCRTokensMetering, kind: InterruptedKind) ->
None) and store it to self._kind, and then update the caller in
to_interrupted_error to pass the computed kind when constructing
InterruptedError so the instance preserves the interruption kind.

In `@pdf_craft/pdf/__init__.py`:
- Line 4: Replace the wildcard import from .ref with an explicit import of
TITLE_TAGS: change the line "from .ref import *" to import only TITLE_TAGS
(e.g., "from .ref import TITLE_TAGS") and, if you need to control exports from
this package, add or update __all__ to include "TITLE_TAGS" so static analysis
(ruff F403) no longer flags the wildcard import.

In `@pdf_craft/sequence/mark.py`:
- Around line 207-222: The DOUBLE_CIRCLED_NUMBER tuple currently maps (0, "⓵")
which misaligns numeric values with their glyphs; in the
NumberStyle.DOUBLE_CIRCLED_NUMBER mapping update the integer keys to match the
visual digits (i.e., start at 1 for "⓵", 2 for "⓶", etc.) so each tuple pair
correctly associates the number with its corresponding double-circled Unicode
character.
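
One way to rule out this kind of off-by-one is to derive the pairs from code points rather than typing them by hand. The sketch below assumes the glyphs are the contiguous Unicode range U+24F5 (DOUBLE CIRCLED DIGIT ONE) through U+24FE (DOUBLE CIRCLED NUMBER TEN). The names `DOUBLE_CIRCLED_PAIRS` and `check_pairs` are illustrative and are not taken from mark.py.

```python
import unicodedata

# Hypothetical stand-in for the tuple in mark.py: (value, glyph) pairs.
# U+24F5 is DOUBLE CIRCLED DIGIT ONE, so numbering starts at 1, not 0.
DOUBLE_CIRCLED_PAIRS: tuple[tuple[int, str], ...] = tuple(
    (value, chr(0x24F5 + value - 1)) for value in range(1, 11)
)


def check_pairs(pairs: tuple[tuple[int, str], ...]) -> None:
    """Raise ValueError if an integer disagrees with its glyph's Unicode numeric value."""
    for value, glyph in pairs:
        if unicodedata.numeric(glyph) != value:
            raise ValueError(
                f"{glyph!r} ({unicodedata.name(glyph)}) is paired with {value}"
            )


check_pairs(DOUBLE_CIRCLED_PAIRS)  # passes: every value matches its glyph
```

A check like this could also run in a unit test against the hand-written tuple, so a shifted key is caught before it reaches mark detection.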
🧹 Nitpick comments (9)
pdf_craft/common/xml.py (1)

21-25: XML parsing security: defusedxml recommended but low priority given input sources.

The static analysis tool flags fromstring as potentially vulnerable to XXE and entity-expansion attacks (Billion-Laughs, quadratic blowup). However, reviewing the codebase shows that read_xml() is only used for internal project-generated XML files (TOC structure, page metadata, chapter headers) stored locally—not untrusted sources like user uploads or external APIs.

While using defusedxml.ElementTree.fromstring would be a good security hardening practice and aligns with Python security best practices, it is not critical for this use case given the trusted input boundary. If you decide to adopt defusedxml, ensure it's added as a dependency in pyproject.toml.
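
If the hardening is adopted, a gradual migration could look like the sketch below. It assumes a helper that parses an XML string; `read_xml_text` is a hypothetical name and not the project's `read_xml()`, whose exact signature is not shown here. The fallback branch keeps the code working when defusedxml is not installed, which is only appropriate for trusted input.

```python
from xml.etree.ElementTree import Element

try:
    # Third-party hardened parser; it must be declared as a dependency to be available.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Standard-library fallback (acceptable only for trusted, locally generated XML).
    from xml.etree.ElementTree import fromstring


def read_xml_text(text: str) -> Element:
    """Parse an XML string, using defusedxml when it is installed."""
    return fromstring(text)


root = read_xml_text("<toc><item id='1'/></toc>")
print(root.tag)
```

Once defusedxml is a hard dependency, the `try`/`except` could be dropped so that the hardened parser is always used.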

pdf_craft/common/asset.py (1)

16-34: Well-implemented asset deduplication with proper cleanup.

The clip method correctly handles:

  • Temp file creation with UUID to avoid collisions
  • Hash-based deduplication (skipping if target exists)
  • Cleanup on both success and exception paths

Minor style suggestion: raise e can be simplified to raise for cleaner exception re-raising.

♻️ Optional simplification
         except Exception as e:
             if temp_path.exists():
                 temp_path.unlink()
-            raise e
+            raise
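
The pattern praised above (unique temp file, hash-based deduplication, cleanup on every path, bare `raise`) can be sketched in isolation as follows. This is not the project's `clip` method: `store_deduplicated`, its parameters, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import uuid
from pathlib import Path


def store_deduplicated(data: bytes, folder: Path, suffix: str = ".png") -> str:
    """Store data under a content-hash file name and return the hash."""
    folder.mkdir(parents=True, exist_ok=True)
    # A UUID-based temp name avoids collisions between concurrent writers.
    temp_path = folder / f"{uuid.uuid4().hex}.tmp"
    try:
        temp_path.write_bytes(data)
        digest = hashlib.sha256(data).hexdigest()
        target = folder / f"{digest}{suffix}"
        if target.exists():
            temp_path.unlink()  # identical content already stored: drop the temp file
        else:
            temp_path.rename(target)
        return digest
    except Exception:
        if temp_path.exists():
            temp_path.unlink()
        raise  # bare raise re-raises the active exception unchanged
```

Calling the function twice with the same bytes returns the same hash and leaves a single file in the folder.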
pdf_craft/markdown/render/layouts.py (1)

62-79: Duplicated render_member closures.

The render_member closure is implemented identically in both render_paragraph (lines 62-79) and _render_asset (lines 98-115). Consider extracting this into a shared factory function or a standalone helper to reduce duplication.

♻️ Suggested refactor
+def _create_member_render(ref_id_to_number: RefIdMap | None) -> _MemberRender:
+    def render_member(part: BlockMember | str) -> Generator[str, None, None]:
+        if isinstance(part, str):
+            yield to_markdown_string(
+                kind=ExpressionKind.TEXT,
+                content=part,
+            )
+        elif isinstance(part, InlineExpression):
+            latex_content = part.content.strip()
+            if latex_content:
+                yield to_markdown_string(
+                    kind=part.kind,
+                    content=latex_content,
+                )
+        elif ref_id_to_number and isinstance(part, Reference):
+            ref_number = ref_id_to_number.get(part.id, 1)
+            yield "[^"
+            yield str(ref_number)
+            yield "]"
+    return render_member

Then use render_member = _create_member_render(ref_id_to_number) in both functions.

Also applies to: 98-115

pdf_craft/pdf/types.py (1)

78-80: Consider using explicit validation instead of assertions.

Assertions can be disabled with python -O, making these order checks ineffective in optimized production builds. If order consistency is a critical invariant, consider raising an explicit ValueError instead.

♻️ Suggested refactor
         for i, layout in enumerate(page.body_layouts):
-            assert layout.order == i, (
-                f"body_layouts[{i}].order should be {i}, got {layout.order}"
-            )
+            if layout.order != i:
+                raise ValueError(
+                    f"body_layouts[{i}].order should be {i}, got {layout.order}"
+                )
             body_element.append(_encode_layout(layout))

Apply the same pattern for footnotes_layouts.

Also applies to: 86-88
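
Because the same check is suggested for both lists, it could be factored into one helper. The sketch below uses a hypothetical `Layout` stand-in with only an `order` field and a hypothetical `check_order` function; neither name comes from pdf_craft/pdf/types.py.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical stand-in for the project's layout type."""

    order: int


def check_order(layouts: list[Layout], name: str) -> None:
    """Raise ValueError unless each layout's order equals its list index.

    Unlike assert, this check still runs under python -O.
    """
    for i, layout in enumerate(layouts):
        if layout.order != i:
            raise ValueError(f"{name}[{i}].order should be {i}, got {layout.order}")


check_order([Layout(0), Layout(1)], "body_layouts")  # passes silently
```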

pdf_craft/pdf/ocr.py (1)

148-150: Consider simplifying DPI default handling.

The multi-line conditional could be simplified. Assuming dpi=0 is never a valid value, dpi or 300 yields the same result as the explicit None check.

♻️ Suggested simplification
                     image = ref.render(
-                            dpi=dpi
-                            if dpi is not None
-                            else 300,  # DPI=300 for scanned page
+                            dpi=dpi or 300,  # DPI=300 for scanned page
                             max_image_file_size=max_page_image_file_size,
                         )

Alternatively, set the default in the function signature: dpi: int = 300 instead of dpi: int | None = None.
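
The two spellings differ only for falsy values other than None, which is worth confirming before applying the simplification. The helper names below are hypothetical and exist only to compare the two expressions side by side.

```python
from typing import Optional


def resolve_dpi_explicit(dpi: Optional[int]) -> int:
    """Original form: only None falls back to the default."""
    return dpi if dpi is not None else 300


def resolve_dpi_truthy(dpi: Optional[int]) -> int:
    """Suggested form: any falsy value (None or 0) falls back to the default."""
    return dpi or 300


for value in (None, 0, 150):
    print(value, resolve_dpi_explicit(value), resolve_dpi_truthy(value))
```

The forms agree for None and for positive integers; they diverge only for 0, which the explicit form passes through and the truthy form replaces with 300.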

pdf_craft/sequence/mark.py (1)

112-395: Ruff RUF001 warnings are false positives for this file.

The static analysis warnings about "ambiguous characters" (Roman numerals, fullwidth digits, mathematical numbers) are intentional. This file specifically maps special Unicode number characters for OCR mark detection. These characters must remain as-is.

Consider adding a file-level # ruff: noqa: RUF001 directive, or configuring ruff's per-file-ignores to skip this rule for this file.

pdf_craft/sequence/chapter.py (2)

338-394: Move decode_block_member definition outside the loop for efficiency.

The nested function decode_block_member is defined inside the for block_el in parent.findall("block") loop (starting at line 308), causing it to be recreated on every iteration. Since it only captures context_tag and references_map which don't change during iteration, move the definition before the loop.

♻️ Suggested refactor
 def _decode_block_elements(
     parent: Element,
     context_tag: str,
     references_map: dict[tuple[int, int], Reference] | None = None,
 ) -> list[BlockLayout]:
+    def decode_block_member(child: Element) -> BlockMember:
+        if child.tag == "ref":
+            ref_id = child.get("id")
+            if ref_id is None:
+                raise ValueError(
+                    f"<{context_tag}><block><ref> missing required attribute 'id'"
+                )
+            # ... rest of the function body unchanged ...
+        elif child.tag == "inline_expr":
+            # ... unchanged ...
+        else:
+            raise ValueError(
+                f"<{context_tag}><block> contains unknown element: <{child.tag}>"
+            )
+
     blocks: list[BlockLayout] = []
     for block_el in parent.findall("block"):
         # ... attribute parsing unchanged ...
-        def decode_block_member(child: Element) -> BlockMember:
-            # ... function body ...
 
         blocks.append(
             BlockLayout(
                 # ... unchanged ...
             )
         )
     return blocks

468-469: Move the import to the top of the file.

The late import of transform2mark inside _decode_reference (line 468) adds unnecessary overhead on each call. Since mark.py does not import from chapter.py, there is no circular dependency preventing this import from being moved to the top of the file with the other imports.

pdf_craft/__init__.py (1)

3-9: Consider renaming InterruptedError to avoid shadowing the built-in.

InterruptedError shadows Python's built-in InterruptedError exception. While the custom exception serves a different purpose (OCR/PDF processing interruption) and has a different signature (requires metering parameter), the naming collision could cause confusion. Users familiar with the built-in exception for I/O operations might inadvertently reference the wrong exception type.

Renaming to OCRInterruptedError or ProcessingInterruptedError would improve clarity without functional impact.
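
A renamed exception that also stores the interruption kind (the first actionable comment) might look like the sketch below. `OCRInterruptedError` is one of the names suggested above; the enum members and the metering fields are hypothetical stand-ins, since the real definitions in pdf_craft/metering.py are not shown here.

```python
import builtins
from dataclasses import dataclass
from enum import Enum, auto


class InterruptedKind(Enum):
    """Hypothetical members; the real enum lives in pdf_craft/metering.py."""

    ABORT = auto()
    TOKEN_LIMIT = auto()


@dataclass
class OCRTokensMetering:
    """Hypothetical fields, for illustration only."""

    input_tokens: int = 0
    output_tokens: int = 0


class OCRInterruptedError(Exception):
    """Distinct name, so builtins.InterruptedError (an OSError) is not shadowed."""

    def __init__(self, metering: OCRTokensMetering, kind: InterruptedKind) -> None:
        super().__init__(f"processing interrupted: {kind.name}")
        self._metering = metering
        self._kind = kind  # stored, so callers can inspect why processing stopped

    @property
    def kind(self) -> InterruptedKind:
        return self._kind

    @property
    def metering(self) -> OCRTokensMetering:
        return self._metering


error = OCRInterruptedError(OCRTokensMetering(), InterruptedKind.ABORT)
print(error.kind, isinstance(error, builtins.InterruptedError))
```

With the distinct name, an `except InterruptedError` clause written for I/O code no longer risks catching (or missing) the OCR exception by accident.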

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8e0f80 and 2d0dd5f.

📒 Files selected for processing (56)
  • pdf_craft/__init__.py
  • pdf_craft/common/__init__.py
  • pdf_craft/common/asset.py
  • pdf_craft/common/cv_splitter.py
  • pdf_craft/common/folder.py
  • pdf_craft/common/reader.py
  • pdf_craft/common/statistics.py
  • pdf_craft/common/xml.py
  • pdf_craft/epub/latex_to_text.py
  • pdf_craft/epub/render.py
  • pdf_craft/epub/toc_collection.py
  • pdf_craft/error.py
  • pdf_craft/expression.py
  • pdf_craft/functions.py
  • pdf_craft/language.py
  • pdf_craft/markdown/paragraph/__init__.py
  • pdf_craft/markdown/paragraph/parser.py
  • pdf_craft/markdown/paragraph/render.py
  • pdf_craft/markdown/paragraph/tags.py
  • pdf_craft/markdown/paragraph/types.py
  • pdf_craft/markdown/render/layouts.py
  • pdf_craft/markdown/render/render.py
  • pdf_craft/metering.py
  • pdf_craft/pdf/__init__.py
  • pdf_craft/pdf/handler.py
  • pdf_craft/pdf/ngrams.py
  • pdf_craft/pdf/ocr.py
  • pdf_craft/pdf/page_extractor.py
  • pdf_craft/pdf/page_ref.py
  • pdf_craft/pdf/types.py
  • pdf_craft/sequence/__init__.py
  • pdf_craft/sequence/analyse_level.py
  • pdf_craft/sequence/chapter.py
  • pdf_craft/sequence/content.py
  • pdf_craft/sequence/generation.py
  • pdf_craft/sequence/jointer.py
  • pdf_craft/sequence/mark.py
  • pdf_craft/sequence/reader.py
  • pdf_craft/sequence/reading_serials.py
  • pdf_craft/sequence/reference.py
  • pdf_craft/toc/__init__.py
  • pdf_craft/toc/analysing.py
  • pdf_craft/toc/text.py
  • pdf_craft/toc/toc_levels.py
  • pdf_craft/toc/toc_pages.py
  • pdf_craft/toc/types.py
  • pdf_craft/transform.py
  • scripts/clean_analysing.py
  • scripts/gen_epub.py
  • scripts/gen_md.py
  • test.py
  • tests/test_cv_splitter.py
  • tests/test_expression.py
  • tests/test_jointer.py
  • tests/test_parser.py
  • tests/test_reading_serials.py
🧰 Additional context used
🧬 Code graph analysis (29)
tests/test_parser.py (2)
pdf_craft/markdown/paragraph/types.py (1)
  • HTMLTag (11-14)
pdf_craft/markdown/paragraph/parser.py (1)
  • parse_raw_markdown (8-56)
tests/test_reading_serials.py (1)
pdf_craft/pdf/types.py (1)
  • PageLayout (24-29)
pdf_craft/common/reader.py (1)
pdf_craft/common/xml.py (1)
  • read_xml (21-25)
pdf_craft/markdown/paragraph/render.py (2)
pdf_craft/language.py (1)
  • is_chinese_char (5-19)
pdf_craft/markdown/paragraph/types.py (1)
  • HTMLTag (11-14)
pdf_craft/markdown/paragraph/__init__.py (2)
pdf_craft/markdown/paragraph/tags.py (5)
  • HTMLTagDefinition (53-58)
  • is_protocol_allowed (547-561)
  • is_tag_filtered (539-540)
  • is_tag_ignored (543-544)
  • tag_definition (535-536)
pdf_craft/markdown/paragraph/types.py (3)
  • HTMLTag (11-14)
  • decode (25-49)
  • encode (52-75)
pdf_craft/markdown/paragraph/parser.py (2)
pdf_craft/markdown/paragraph/tags.py (4)
  • is_protocol_allowed (547-561)
  • is_tag_filtered (539-540)
  • is_tag_ignored (543-544)
  • tag_definition (535-536)
pdf_craft/markdown/paragraph/types.py (1)
  • HTMLTag (11-14)
pdf_craft/sequence/jointer.py (7)
pdf_craft/expression.py (3)
  • ExpressionKind (6-11)
  • ParsedItem (15-20)
  • parse_latex_expressions (68-193)
pdf_craft/language.py (1)
  • is_latin_letter (1-2)
pdf_craft/markdown/paragraph/parser.py (1)
  • parse_raw_markdown (8-56)
pdf_craft/pdf/types.py (1)
  • PageLayout (24-29)
pdf_craft/sequence/chapter.py (4)
  • AssetLayout (50-57)
  • BlockLayout (61-65)
  • InlineExpression (28-30)
  • ParagraphLayout (21-24)
pdf_craft/sequence/content.py (3)
  • expand_text_in_content (42-56)
  • first (9-16)
  • last (19-26)
pdf_craft/sequence/reading_serials.py (1)
  • split_reading_serials (22-69)
pdf_craft/sequence/analyse_level.py (2)
pdf_craft/sequence/chapter.py (1)
  • Chapter (14-17)
pdf_craft/common/cv_splitter.py (1)
  • split_by_cv (47-75)
pdf_craft/sequence/generation.py (10)
pdf_craft/common/xml.py (1)
  • save_xml (28-40)
pdf_craft/pdf/types.py (3)
  • Page (14-20)
  • decode (44-67)
  • encode (70-91)
pdf_craft/sequence/chapter.py (7)
  • decode (85-115)
  • AssetLayout (50-57)
  • Chapter (14-17)
  • ParagraphLayout (21-24)
  • Reference (34-42)
  • encode (118-144)
  • id (41-42)
pdf_craft/toc/types.py (5)
  • decode (50-93)
  • Toc (15-20)
  • TocInfo (9-11)
  • iter_toc (23-26)
  • encode (29-47)
pdf_craft/sequence/analyse_level.py (1)
  • analyse_chapter_internal_levels (10-24)
pdf_craft/sequence/content.py (2)
  • expand_text_in_content (42-56)
  • join_texts_in_content (29-39)
pdf_craft/sequence/jointer.py (1)
  • Jointer (77-292)
pdf_craft/sequence/mark.py (2)
  • Mark (36-55)
  • search_marks (83-89)
pdf_craft/sequence/reference.py (2)
  • References (11-126)
  • page_index (26-27)
pdf_craft/pdf/page_ref.py (1)
  • page_index (83-84)
pdf_craft/toc/toc_pages.py (3)
pdf_craft/language.py (1)
  • is_latin_letter (1-2)
pdf_craft/toc/text.py (1)
  • normalize_text (364-370)
pdf_craft/pdf/page_ref.py (2)
  • PageRef (73-121)
  • page_index (83-84)
pdf_craft/markdown/render/render.py (4)
pdf_craft/metering.py (1)
  • check_aborted (8-12)
pdf_craft/sequence/chapter.py (3)
  • Reference (34-42)
  • references_to_map (78-82)
  • search_references_in_chapter (68-75)
pdf_craft/sequence/reader.py (1)
  • create_chapters_reader (8-26)
pdf_craft/markdown/render/layouts.py (1)
  • render_layouts (21-50)
pdf_craft/sequence/reference.py (3)
pdf_craft/sequence/chapter.py (4)
  • AssetLayout (50-57)
  • BlockLayout (61-65)
  • ParagraphLayout (21-24)
  • Reference (34-42)
pdf_craft/sequence/mark.py (2)
  • Mark (36-55)
  • transform2mark (74-80)
pdf_craft/pdf/page_ref.py (1)
  • page_index (83-84)
scripts/gen_md.py (5)
pdf_craft/pdf/ocr.py (1)
  • OCREventKind (20-26)
pdf_craft/functions.py (1)
  • transform_markdown (25-69)
pdf_craft/transform.py (1)
  • transform_markdown (43-109)
pdf_craft/pdf/page_ref.py (1)
  • page_index (83-84)
pdf_craft/sequence/reference.py (1)
  • page_index (26-27)
pdf_craft/epub/toc_collection.py (1)
pdf_craft/toc/types.py (2)
  • Toc (15-20)
  • decode (50-93)
tests/test_expression.py (1)
pdf_craft/expression.py (2)
  • ExpressionKind (6-11)
  • parse_latex_expressions (68-193)
pdf_craft/pdf/handler.py (5)
pdf_craft/error.py (1)
  • PDFError (6-9)
pdf_craft/pdf/types.py (1)
  • PDFDocumentMetadata (33-41)
pdf_craft/pdf/page_ref.py (2)
  • pages_count (41-43)
  • page_index (83-84)
pdf_craft/pdf/ocr.py (1)
  • metadata (60-65)
pdf_craft/sequence/reference.py (2)
  • page_index (26-27)
  • get (29-30)
pdf_craft/pdf/types.py (1)
pdf_craft/common/xml.py (1)
  • indent (5-18)
pdf_craft/sequence/reading_serials.py (3)
pdf_craft/common/statistics.py (1)
  • avg (4-14)
pdf_craft/common/cv_splitter.py (2)
  • split_by_cv (47-75)
  • size (26-34)
pdf_craft/pdf/types.py (1)
  • PageLayout (24-29)
pdf_craft/sequence/reader.py (3)
pdf_craft/common/reader.py (1)
  • XMLReader (11-37)
pdf_craft/common/xml.py (1)
  • read_xml (21-25)
pdf_craft/sequence/chapter.py (2)
  • Chapter (14-17)
  • decode (85-115)
scripts/gen_epub.py (4)
pdf_craft/pdf/ocr.py (1)
  • OCREventKind (20-26)
pdf_craft/functions.py (1)
  • transform_epub (72-124)
pdf_craft/transform.py (1)
  • transform_epub (111-185)
scripts/gen_md.py (1)
  • _format_duration (29-38)
pdf_craft/error.py (1)
pdf_craft/metering.py (2)
  • InterruptedKind (21-23)
  • OCRTokensMetering (16-18)
pdf_craft/transform.py (8)
pdf_craft/common/folder.py (1)
  • EnsureFolder (5-24)
pdf_craft/error.py (3)
  • PDFError (6-9)
  • is_inline_error (19-20)
  • to_interrupted_error (37-53)
pdf_craft/pdf/page_ref.py (1)
  • render (86-111)
pdf_craft/metering.py (1)
  • OCRTokensMetering (16-18)
pdf_craft/pdf/ocr.py (1)
  • OCR (40-244)
pdf_craft/sequence/generation.py (1)
  • generate_chapter_files (22-40)
pdf_craft/to_path.py (1)
  • to_path (5-9)
pdf_craft/toc/analysing.py (1)
  • analyse_toc (16-24)
pdf_craft/sequence/content.py (2)
pdf_craft/markdown/paragraph/types.py (1)
  • HTMLTag (11-14)
pdf_craft/sequence/generation.py (1)
  • expand (172-180)
pdf_craft/sequence/chapter.py (5)
pdf_craft/common/xml.py (1)
  • indent (5-18)
pdf_craft/expression.py (3)
  • ExpressionKind (6-11)
  • decode_expression_kind (36-48)
  • encode_expression_kind (23-33)
pdf_craft/markdown/paragraph/types.py (3)
  • HTMLTag (11-14)
  • decode (25-49)
  • encode (52-75)
pdf_craft/sequence/reference.py (2)
  • page_index (26-27)
  • get (29-30)
pdf_craft/sequence/mark.py (1)
  • transform2mark (74-80)
pdf_craft/markdown/render/layouts.py (3)
pdf_craft/expression.py (2)
  • ExpressionKind (6-11)
  • to_markdown_string (51-65)
pdf_craft/sequence/chapter.py (4)
  • AssetLayout (50-57)
  • InlineExpression (28-30)
  • ParagraphLayout (21-24)
  • Reference (34-42)
pdf_craft/markdown/paragraph/render.py (1)
  • render_markdown_paragraph (7-16)
pdf_craft/pdf/__init__.py (3)
pdf_craft/pdf/handler.py (4)
  • DefaultPDFDocument (47-182)
  • DefaultPDFHandler (31-41)
  • PDFDocument (13-23)
  • PDFHandler (27-28)
pdf_craft/pdf/page_ref.py (1)
  • pdf_pages_count (11-27)
pdf_craft/pdf/types.py (3)
  • Page (14-20)
  • PageLayout (24-29)
  • PDFDocumentMetadata (33-41)
pdf_craft/toc/text.py (1)
pdf_craft/language.py (1)
  • is_latin_letter (1-2)
pdf_craft/pdf/ocr.py (8)
pdf_craft/common/asset.py (1)
  • AssetHub (12-41)
pdf_craft/error.py (2)
  • OCRError (12-16)
  • PDFError (6-9)
pdf_craft/metering.py (1)
  • check_aborted (8-12)
pdf_craft/to_path.py (1)
  • to_path (5-9)
pdf_craft/pdf/handler.py (2)
  • DefaultPDFHandler (31-41)
  • PDFHandler (27-28)
pdf_craft/pdf/types.py (4)
  • Page (14-20)
  • PageLayout (24-29)
  • PDFDocumentMetadata (33-41)
  • encode (70-91)
pdf_craft/pdf/page_extractor.py (1)
  • PageExtractorNode (15-180)
pdf_craft/pdf/page_ref.py (1)
  • PageRefContext (30-66)
pdf_craft/pdf/page_ref.py (4)
pdf_craft/error.py (1)
  • PDFError (6-9)
pdf_craft/pdf/handler.py (3)
  • DefaultPDFHandler (31-41)
  • PDFDocument (13-23)
  • PDFHandler (27-28)
pdf_craft/toc/toc_pages.py (1)
  • PageRef (18-21)
pdf_craft/sequence/reference.py (1)
  • page_index (26-27)
🪛 Ruff (0.14.11)
pdf_craft/toc/toc_levels.py

42-42: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


124-124: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


150-150: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

tests/test_reading_serials.py

168-168: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

pdf_craft/sequence/jointer.py

22-22: String contains ambiguous ！ (FULLWIDTH EXCLAMATION MARK). Did you mean ! (EXCLAMATION MARK)?

(RUF001)


23-23: String contains ambiguous ？ (FULLWIDTH QUESTION MARK). Did you mean ? (QUESTION MARK)?

(RUF001)


25-25: String contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


28-28: String contains ambiguous ； (FULLWIDTH SEMICOLON). Did you mean ; (SEMICOLON)?

(RUF001)


44-44: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


48-48: String contains ambiguous ‐ (HYPHEN). Did you mean - (HYPHEN-MINUS)?

(RUF001)


49-49: String contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?

(RUF001)


50-50: String contains ambiguous ‒ (FIGURE DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


51-51: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


169-169: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

pdf_craft/common/__init__.py

3-3: from .folder import * used; unable to detect undefined names

(F403)


4-4: from .reader import * used; unable to detect undefined names

(F403)


5-5: from .statistics import * used; unable to detect undefined names

(F403)

pdf_craft/sequence/analyse_level.py

12-12: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


14-14: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

scripts/clean_analysing.py

63-63: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)

pdf_craft/common/xml.py

23-23: Using xml to parse untrusted data is known to be vulnerable to XML attacks; use defusedxml equivalents

(S314)


25-25: Avoid specifying long messages outside the exception class

(TRY003)

pdf_craft/toc/toc_pages.py

11-11: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


38-38: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


38-38: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


168-168: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

pdf_craft/epub/toc_collection.py

86-88: Avoid specifying long messages outside the exception class

(TRY003)

pdf_craft/pdf/page_extractor.py

90-94: Avoid specifying long messages outside the exception class

(TRY003)


110-110: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


110-110: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)

pdf_craft/pdf/handler.py

130-132: Avoid specifying long messages outside the exception class

(TRY003)


141-143: Avoid specifying long messages outside the exception class

(TRY003)

pdf_craft/error.py

27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

pdf_craft/transform.py

107-109: Avoid specifying long messages outside the exception class

(TRY003)


183-185: Avoid specifying long messages outside the exception class

(TRY003)

pdf_craft/sequence/chapter.py

157-157: Avoid specifying long messages outside the exception class

(TRY003)


159-161: Avoid specifying long messages outside the exception class

(TRY003)


164-164: Avoid specifying long messages outside the exception class

(TRY003)


168-170: Avoid specifying long messages outside the exception class

(TRY003)


185-187: Avoid specifying long messages outside the exception class

(TRY003)


192-192: Avoid specifying long messages outside the exception class

(TRY003)


294-296: Avoid specifying long messages outside the exception class

(TRY003)


298-298: Avoid specifying long messages outside the exception class

(TRY003)


311-313: Avoid specifying long messages outside the exception class

(TRY003)


317-319: Avoid specifying long messages outside the exception class

(TRY003)


323-325: Avoid specifying long messages outside the exception class

(TRY003)


329-331: Avoid specifying long messages outside the exception class

(TRY003)


342-344: Avoid specifying long messages outside the exception class

(TRY003)


349-351: Abstract raise to an inner function

(TRY301)


349-351: Avoid specifying long messages outside the exception class

(TRY003)


355-357: Avoid specifying long messages outside the exception class

(TRY003)


364-366: Avoid specifying long messages outside the exception class

(TRY003)


368-370: Avoid specifying long messages outside the exception class

(TRY003)


375-377: Avoid specifying long messages outside the exception class

(TRY003)


383-385: Avoid specifying long messages outside the exception class

(TRY003)


448-448: Avoid specifying long messages outside the exception class

(TRY003)


453-455: Abstract raise to an inner function

(TRY301)


453-455: Avoid specifying long messages outside the exception class

(TRY003)


459-461: Avoid specifying long messages outside the exception class

(TRY003)


465-465: Avoid specifying long messages outside the exception class

(TRY003)

pdf_craft/sequence/mark.py

118-118: String contains ambiguous Ⅰ (ROMAN NUMERAL ONE). Did you mean I (LATIN CAPITAL LETTER I)?

(RUF001)


122-122: String contains ambiguous Ⅴ (ROMAN NUMERAL FIVE). Did you mean V (LATIN CAPITAL LETTER V)?

(RUF001)


127-127: String contains ambiguous Ⅹ (ROMAN NUMERAL TEN). Did you mean X (LATIN CAPITAL LETTER X)?

(RUF001)


136-136: String contains ambiguous ⅰ (SMALL ROMAN NUMERAL ONE). Did you mean i (LATIN SMALL LETTER I)?

(RUF001)


140-140: String contains ambiguous ⅴ (SMALL ROMAN NUMERAL FIVE). Did you mean v (LATIN SMALL LETTER V)?

(RUF001)


145-145: String contains ambiguous ⅹ (SMALL ROMAN NUMERAL TEN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


318-318: String contains ambiguous ０ (FULLWIDTH DIGIT ZERO). Did you mean 0 (DIGIT ZERO)?

(RUF001)


319-319: String contains ambiguous １ (FULLWIDTH DIGIT ONE). Did you mean 1 (DIGIT ONE)?

(RUF001)


320-320: String contains ambiguous ２ (FULLWIDTH DIGIT TWO). Did you mean 2 (DIGIT TWO)?

(RUF001)


321-321: String contains ambiguous ３ (FULLWIDTH DIGIT THREE). Did you mean 3 (DIGIT THREE)?

(RUF001)


322-322: String contains ambiguous ４ (FULLWIDTH DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?

(RUF001)


323-323: String contains ambiguous ５ (FULLWIDTH DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?

(RUF001)


324-324: String contains ambiguous ６ (FULLWIDTH DIGIT SIX). Did you mean 6 (DIGIT SIX)?

(RUF001)


325-325: String contains ambiguous ７ (FULLWIDTH DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?

(RUF001)


326-326: String contains ambiguous ８ (FULLWIDTH DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?

(RUF001)


327-327: String contains ambiguous ９ (FULLWIDTH DIGIT NINE). Did you mean 9 (DIGIT NINE)?

(RUF001)


334-334: String contains ambiguous 𝟬 (MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO). Did you mean O (LATIN CAPITAL LETTER O)?

(RUF001)


335-335: String contains ambiguous 𝟭 (MATHEMATICAL SANS-SERIF BOLD DIGIT ONE). Did you mean I (LATIN CAPITAL LETTER I)?

(RUF001)


336-336: String contains ambiguous 𝟮 (MATHEMATICAL SANS-SERIF BOLD DIGIT TWO). Did you mean 2 (DIGIT TWO)?

(RUF001)


337-337: String contains ambiguous 𝟯 (MATHEMATICAL SANS-SERIF BOLD DIGIT THREE). Did you mean 3 (DIGIT THREE)?

(RUF001)


338-338: String contains ambiguous 𝟰 (MATHEMATICAL SANS-SERIF BOLD DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?

(RUF001)


339-339: String contains ambiguous 𝟱 (MATHEMATICAL SANS-SERIF BOLD DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?

(RUF001)


340-340: String contains ambiguous 𝟲 (MATHEMATICAL SANS-SERIF BOLD DIGIT SIX). Did you mean 6 (DIGIT SIX)?

(RUF001)


341-341: String contains ambiguous 𝟳 (MATHEMATICAL SANS-SERIF BOLD DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?

(RUF001)


342-342: String contains ambiguous 𝟴 (MATHEMATICAL SANS-SERIF BOLD DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?

(RUF001)


343-343: String contains ambiguous 𝟵 (MATHEMATICAL SANS-SERIF BOLD DIGIT NINE). Did you mean 9 (DIGIT NINE)?

(RUF001)


350-350: String contains ambiguous 𝟎 (MATHEMATICAL BOLD DIGIT ZERO). Did you mean O (LATIN CAPITAL LETTER O)?

(RUF001)


351-351: String contains ambiguous 𝟏 (MATHEMATICAL BOLD DIGIT ONE). Did you mean I (LATIN CAPITAL LETTER I)?

(RUF001)


352-352: String contains ambiguous 𝟐 (MATHEMATICAL BOLD DIGIT TWO). Did you mean 2 (DIGIT TWO)?

(RUF001)


353-353: String contains ambiguous 𝟑 (MATHEMATICAL BOLD DIGIT THREE). Did you mean 3 (DIGIT THREE)?

(RUF001)


354-354: String contains ambiguous 𝟒 (MATHEMATICAL BOLD DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?

(RUF001)


355-355: String contains ambiguous 𝟓 (MATHEMATICAL BOLD DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?

(RUF001)


356-356: String contains ambiguous 𝟔 (MATHEMATICAL BOLD DIGIT SIX). Did you mean 6 (DIGIT SIX)?

(RUF001)


357-357: String contains ambiguous 𝟕 (MATHEMATICAL BOLD DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?

(RUF001)


358-358: String contains ambiguous 𝟖 (MATHEMATICAL BOLD DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?

(RUF001)


359-359: String contains ambiguous 𝟗 (MATHEMATICAL BOLD DIGIT NINE). Did you mean 9 (DIGIT NINE)?

(RUF001)


366-366: String contains ambiguous 𝟘 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO). Did you mean O (LATIN CAPITAL LETTER O)?

(RUF001)


367-367: String contains ambiguous 𝟙 (MATHEMATICAL DOUBLE-STRUCK DIGIT ONE). Did you mean I (LATIN CAPITAL LETTER I)?

(RUF001)


368-368: String contains ambiguous 𝟚 (MATHEMATICAL DOUBLE-STRUCK DIGIT TWO). Did you mean 2 (DIGIT TWO)?

(RUF001)


369-369: String contains ambiguous 𝟛 (MATHEMATICAL DOUBLE-STRUCK DIGIT THREE). Did you mean 3 (DIGIT THREE)?

(RUF001)


370-370: String contains ambiguous 𝟜 (MATHEMATICAL DOUBLE-STRUCK DIGIT FOUR). Did you mean 4 (DIGIT FOUR)?

(RUF001)


371-371: String contains ambiguous 𝟝 (MATHEMATICAL DOUBLE-STRUCK DIGIT FIVE). Did you mean 5 (DIGIT FIVE)?

(RUF001)


372-372: String contains ambiguous 𝟞 (MATHEMATICAL DOUBLE-STRUCK DIGIT SIX). Did you mean 6 (DIGIT SIX)?

(RUF001)


373-373: String contains ambiguous 𝟟 (MATHEMATICAL DOUBLE-STRUCK DIGIT SEVEN). Did you mean 7 (DIGIT SEVEN)?

(RUF001)


374-374: String contains ambiguous 𝟠 (MATHEMATICAL DOUBLE-STRUCK DIGIT EIGHT). Did you mean 8 (DIGIT EIGHT)?

(RUF001)


375-375: String contains ambiguous 𝟡 (MATHEMATICAL DOUBLE-STRUCK DIGIT NINE). Did you mean 9 (DIGIT NINE)?

(RUF001)

pdf_craft/pdf/__init__.py

4-4: from .ref import * used; unable to detect undefined names

(F403)

pdf_craft/toc/text.py

58-58: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


58-58: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


59-59: String contains ambiguous ‐ (HYPHEN). Did you mean - (HYPHEN-MINUS)?

(RUF001)


60-60: String contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?

(RUF001)


61-61: String contains ambiguous ‒ (FIGURE DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


62-62: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


69-69: String contains ambiguous ‚ (SINGLE LOW-9 QUOTATION MARK). Did you mean , (COMMA)?

(RUF001)


70-70: String contains ambiguous ‛ (SINGLE HIGH-REVERSED-9 QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF001)


78-78: String contains ambiguous ․ (ONE DOT LEADER). Did you mean . (FULL STOP)?

(RUF001)


82-82: String contains ambiguous ′ (PRIME). Did you mean ` (GRAVE ACCENT)?

(RUF001)


85-85: String contains ambiguous ‵ (REVERSED PRIME). Did you mean ` (GRAVE ACCENT)?

(RUF001)


88-88: String contains ambiguous ‹ (SINGLE LEFT-POINTING ANGLE QUOTATION MARK). Did you mean < (LESS-THAN SIGN)?

(RUF001)


89-89: String contains ambiguous › (SINGLE RIGHT-POINTING ANGLE QUOTATION MARK). Did you mean > (GREATER-THAN SIGN)?

(RUF001)


96-96: String contains ambiguous ⁁ (CARET INSERTION POINT). Did you mean / (SOLIDUS)?

(RUF001)


98-98: String contains ambiguous ⁃ (HYPHEN BULLET). Did you mean - (HYPHEN-MINUS)?

(RUF001)


99-99: String contains ambiguous ⁄ (FRACTION SLASH). Did you mean / (SOLIDUS)?

(RUF001)


109-109: String contains ambiguous ⁎ (LOW ASTERISK). Did you mean * (ASTERISK)?

(RUF001)


114-114: String contains ambiguous ⁓ (SWUNG DASH). Did you mean ~ (TILDE)?

(RUF001)


121-121: String contains ambiguous ⁚ (TWO DOT PUNCTUATION). Did you mean : (COLON)?

(RUF001)


126-126: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


126-126: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


191-191: String contains ambiguous ⹀ (DOUBLE HYPHEN). Did you mean = (EQUALS SIGN)?

(RUF001)


207-207: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


207-207: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


220-220: String contains ambiguous 〔 (LEFT TORTOISE SHELL BRACKET). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


221-221: String contains ambiguous 〕 (RIGHT TORTOISE SHELL BRACKET). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


235-235: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


235-235: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


236-236: String contains ambiguous ！ (FULLWIDTH EXCLAMATION MARK). Did you mean ! (EXCLAMATION MARK)?

(RUF001)


237-237: String contains ambiguous ＂ (FULLWIDTH QUOTATION MARK). Did you mean " (QUOTATION MARK)?

(RUF001)


238-238: String contains ambiguous ＃ (FULLWIDTH NUMBER SIGN). Did you mean # (NUMBER SIGN)?

(RUF001)


239-239: String contains ambiguous ％ (FULLWIDTH PERCENT SIGN). Did you mean % (PERCENT SIGN)?

(RUF001)


240-240: String contains ambiguous ＆ (FULLWIDTH AMPERSAND). Did you mean & (AMPERSAND)?

(RUF001)


241-241: String contains ambiguous ＇ (FULLWIDTH APOSTROPHE). Did you mean ` (GRAVE ACCENT)?

(RUF001)


242-242: String contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


243-243: String contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


244-244: String contains ambiguous ＊ (FULLWIDTH ASTERISK). Did you mean * (ASTERISK)?

(RUF001)


245-245: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


246-246: String contains ambiguous ． (FULLWIDTH FULL STOP). Did you mean . (FULL STOP)?

(RUF001)


247-247: String contains ambiguous ／ (FULLWIDTH SOLIDUS). Did you mean / (SOLIDUS)?

(RUF001)


248-248: String contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF001)


249-249: String contains ambiguous ； (FULLWIDTH SEMICOLON). Did you mean ; (SEMICOLON)?

(RUF001)


250-250: String contains ambiguous ？ (FULLWIDTH QUESTION MARK). Did you mean ? (QUESTION MARK)?

(RUF001)


251-251: String contains ambiguous ＠ (FULLWIDTH COMMERCIAL AT). Did you mean @ (COMMERCIAL AT)?

(RUF001)


252-252: String contains ambiguous ［ (FULLWIDTH LEFT SQUARE BRACKET). Did you mean [ (LEFT SQUARE BRACKET)?

(RUF001)


253-253: String contains ambiguous ＼ (FULLWIDTH REVERSE SOLIDUS). Did you mean \ (REVERSE SOLIDUS)?

(RUF001)


254-254: String contains ambiguous ］ (FULLWIDTH RIGHT SQUARE BRACKET). Did you mean ] (RIGHT SQUARE BRACKET)?

(RUF001)


255-255: String contains ambiguous ＾ (FULLWIDTH CIRCUMFLEX ACCENT). Did you mean ^ (CIRCUMFLEX ACCENT)?

(RUF001)


256-256: String contains ambiguous ＿ (FULLWIDTH LOW LINE). Did you mean _ (LOW LINE)?

(RUF001)


257-257: String contains ambiguous ｀ (FULLWIDTH GRAVE ACCENT). Did you mean ` (GRAVE ACCENT)?

(RUF001)


258-258: String contains ambiguous ｛ (FULLWIDTH LEFT CURLY BRACKET). Did you mean { (LEFT CURLY BRACKET)?

(RUF001)


259-259: String contains ambiguous ｜ (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?

(RUF001)


260-260: String contains ambiguous ｝ (FULLWIDTH RIGHT CURLY BRACKET). Did you mean } (RIGHT CURLY BRACKET)?

(RUF001)


261-261: String contains ambiguous ～ (FULLWIDTH TILDE). Did you mean ~ (TILDE)?

(RUF001)


269-269: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


269-269: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


273-273: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


273-273: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


275-275: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


275-275: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


276-276: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


276-276: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


276-276: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


281-281: String contains ambiguous ؍ (ARABIC DATE SEPARATOR). Did you mean , (COMMA)?

(RUF001)


288-288: String contains ambiguous ٫ (ARABIC DECIMAL SEPARATOR). Did you mean , (COMMA)?

(RUF001)


290-290: String contains ambiguous ٭ (ARABIC FIVE POINTED STAR). Did you mean * (ASTERISK)?

(RUF001)


294-294: String contains ambiguous ׀ (HEBREW PUNCTUATION PASEQ). Did you mean l (LATIN SMALL LETTER L)?

(RUF001)


295-295: String contains ambiguous ׃ (HEBREW PUNCTUATION SOF PASUQ). Did you mean : (COLON)?

(RUF001)


297-297: String contains ambiguous ׳ (HEBREW PUNCTUATION GERESH). Did you mean ` (GRAVE ACCENT)?

(RUF001)


326-326: String contains ambiguous ᙮ (CANADIAN SYLLABICS FULL STOP). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


336-336: String contains ambiguous ᠃ (MONGOLIAN FULL STOP). Did you mean : (COLON)?

(RUF001)


342-342: String contains ambiguous ᠉ (MONGOLIAN MANCHU FULL STOP). Did you mean : (COLON)?

(RUF001)


357-357: Comment contains ambiguous （ (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


357-357: Comment contains ambiguous ） (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


358-358: String contains ambiguous ⁁ (CARET INSERTION POINT). Did you mean / (SOLIDUS)?

(RUF001)


359-359: String contains ambiguous ⁓ (SWUNG DASH). Did you mean ~ (TILDE)?

(RUF001)

pdf_craft/pdf/page_ref.py

108-111: Avoid specifying long messages outside the exception class

(TRY003)
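
For context on TRY003: the rule asks that long messages be built inside the exception class rather than at each raise site. The code at lines 108-111 of page_ref.py is not shown in this review, so the following is only a generic sketch; `PageIndexOutOfRangeError` and `check_page_index` are hypothetical names, not part of pdf_craft.

```python
class PageIndexOutOfRangeError(ValueError):
    """Hypothetical exception that owns its own message (TRY003-friendly)."""

    def __init__(self, page_index: int, pages_count: int) -> None:
        self.page_index = page_index
        self.pages_count = pages_count
        # The long message lives here, once, instead of at every raise site.
        super().__init__(
            f"page index {page_index} is out of range "
            f"for a document with {pages_count} pages"
        )


def check_page_index(page_index: int, pages_count: int) -> int:
    """Return page_index unchanged, or raise if it is outside [0, pages_count)."""
    if not 0 <= page_index < pages_count:
        # The raise site passes data only; no inline f-string message.
        raise PageIndexOutOfRangeError(page_index, pages_count)
    return page_index
```

Keeping the raw values as attributes also lets callers inspect the failure without parsing the message text.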

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test

Comment on lines 27 to 34
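# 不可直接用 doc-page-extractor 的 Error，该库的一切都是懒加载，若暴露，则无法懒加载
# (Translation: do not use doc-page-extractor's Error directly; everything in that
# library is lazily loaded, and exposing it here would break lazy loading.)
class InterruptedError(Exception):
    """Raised when the operation is interrupted by the user."""

    def __init__(self, metering: OCRTokensMetering) -> None:
        super().__init__()
        self._kind: InterruptedKind
        self._metering: OCRTokensMetering = metering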

⚠️ Potential issue | 🟠 Major

InterruptedError._kind is declared but never assigned.

Line 33 declares self._kind: InterruptedKind but it's never assigned a value. In to_interrupted_error, the kind variable is computed but not passed to InterruptedError.__init__ or set on the instance.

If _kind is intended to be accessible, the constructor should accept and store it:

🔧 Proposed fix
 class InterruptedError(Exception):
     """Raised when the operation is interrupted by the user."""

-    def __init__(self, metering: OCRTokensMetering) -> None:
+    def __init__(self, kind: InterruptedKind, metering: OCRTokensMetering) -> None:
         super().__init__()
-        self._kind: InterruptedKind
+        self._kind: InterruptedKind = kind
         self._metering: OCRTokensMetering = metering

And update the instantiation in to_interrupted_error:

         if kind is not None:
             return InterruptedError(
+                kind,
                 OCRTokensMetering(
                     input_tokens=error.input_tokens,
                     output_tokens=error.output_tokens,
                 )
             )
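
To illustrate why the bare annotation matters, here is a minimal, self-contained sketch. `InterruptedKind` and `OCRTokensMetering` below are simplified stand-ins (the enum members are assumptions, not the real definitions), and the two exception classes are renamed so both variants can sit side by side. It shows that an annotation without a value creates no attribute, so reading `_kind` fails at runtime.

```python
from dataclasses import dataclass
from enum import Enum, auto


class InterruptedKind(Enum):
    """Stand-in enum; the real members in pdf_craft may differ."""

    ABORT = auto()
    TOKEN_LIMIT_EXCEEDED = auto()


@dataclass
class OCRTokensMetering:
    """Stand-in with the two fields used in the review snippet."""

    input_tokens: int
    output_tokens: int


class BrokenInterruptedError(Exception):
    """Mirrors the current code: _kind is annotated but never assigned."""

    def __init__(self, metering: OCRTokensMetering) -> None:
        super().__init__()
        self._kind: InterruptedKind  # annotation only, no attribute is created
        self._metering = metering


class FixedInterruptedError(Exception):
    """Mirrors the proposed fix: kind is accepted and stored."""

    def __init__(self, kind: InterruptedKind, metering: OCRTokensMetering) -> None:
        super().__init__()
        self._kind = kind
        self._metering = metering

    @property
    def kind(self) -> InterruptedKind:
        return self._kind


metering = OCRTokensMetering(input_tokens=10, output_tokens=20)
broken = BrokenInterruptedError(metering)
fixed = FixedInterruptedError(InterruptedKind.ABORT, metering)
print(hasattr(broken, "_kind"), fixed.kind)
```

The `kind` property is an addition for the sketch only; the review does not say how `_kind` is meant to be read.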
🧰 Tools
🪛 Ruff (0.14.11)

27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


27-27: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

🤖 Prompt for AI Agents
In `@pdf_craft/error.py` around lines 27 - 34, InterruptedError.__init__ declares
self._kind but never assigns it; update InterruptedError to accept a kind:
InterruptedKind parameter (e.g., def __init__(self, kind: InterruptedKind,
metering: OCRTokensMetering) -> None) and store it in self._kind, and then update
the caller in to_interrupted_error to pass the computed kind when constructing
InterruptedError so the instance preserves the interruption kind.

from .page_ref import pdf_pages_count
from .types import decode, encode, Page, PageLayout, PDFDocumentMetadata, DeepSeekOCRSize
from .handler import PDFHandler, PDFDocument, DefaultPDFHandler, DefaultPDFDocument
from .ref import *

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate the ref.py file in pdf_craft/pdf
fd -t f "ref.py" pdf_craft/pdf

Repository: oomol-lab/pdf-craft

Length of output: 109


🏁 Script executed:

# Check the content of ref.py to see if it defines __all__
cat -n pdf_craft/pdf/ref.py

Repository: oomol-lab/pdf-craft

Length of output: 105


🏁 Script executed:

# Check the current state of __init__.py
cat -n pdf_craft/pdf/__init__.py

Repository: oomol-lab/pdf-craft

Length of output: 451


🏁 Script executed:

# Get the exact line count of ref.py
wc -l pdf_craft/pdf/ref.py

Repository: oomol-lab/pdf-craft

Length of output: 85


🏁 Script executed:

# Search for __all__ definition in ref.py specifically
rg "__all__" pdf_craft/pdf/ref.py

Repository: oomol-lab/pdf-craft

Length of output: 45


🏁 Script executed:

# Get the complete content of ref.py with all details
cat pdf_craft/pdf/ref.py

Repository: oomol-lab/pdf-craft

Length of output: 98


Replace wildcard import with explicit import.

The from .ref import * triggers ruff F403 and makes it impossible to statically determine what names are exported. Since .ref only exports TITLE_TAGS and does not define __all__, replace this with an explicit import:

from .ref import TITLE_TAGS

This eliminates the F403 warning and improves code clarity.
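
As a rough illustration of what a wildcard import pulls in when the source module has no `__all__`, the sketch below builds a throwaway module in memory and star-imports it. The module name `demo_ref` and its contents are hypothetical; only `TITLE_TAGS` is borrowed from the review, and its value here is made up.

```python
import sys
import types


def star_import_names(module_source: str, module_name: str = "demo_ref") -> set[str]:
    """Return the names that `from <module_name> import *` would bind."""
    module = types.ModuleType(module_name)
    exec(module_source, module.__dict__)
    sys.modules[module_name] = module  # make the throwaway module importable
    try:
        namespace: dict[str, object] = {}
        exec(f"from {module_name} import *", namespace)
    finally:
        del sys.modules[module_name]
    # exec() injects __builtins__ into the namespace; it is not an imported name.
    return {name for name in namespace if name != "__builtins__"}


WITHOUT_ALL = "import re\nTITLE_TAGS = ('title',)\n_private = 1\n"
WITH_ALL = WITHOUT_ALL + "__all__ = ['TITLE_TAGS']\n"

leaky = star_import_names(WITHOUT_ALL)   # also re-exports the imported `re`
strict = star_import_names(WITH_ALL)     # limited to what __all__ lists
print(sorted(leaky), sorted(strict))
```

Without `__all__`, every public name in the module namespace leaks through, including modules it merely imported, which is why the explicit import is easier to reason about.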

🧰 Tools
🪛 Ruff (0.14.11)

4-4: from .ref import * used; unable to detect undefined names

(F403)

🤖 Prompt for AI Agents
In `@pdf_craft/pdf/__init__.py` at line 4, Replace the wildcard import from .ref
with an explicit import of TITLE_TAGS: change the line "from .ref import *" to
import only TITLE_TAGS (e.g., "from .ref import TITLE_TAGS") and, if you need to
control exports from this package, add or update __all__ to include "TITLE_TAGS"
so static analysis (ruff F403) no longer flags the wildcard import.

Comment on lines +207 to +222
        (
            NumberClass.CIRCLED_NUMBER,
            NumberStyle.DOUBLE_CIRCLED_NUMBER,
            (
                (0, "⓵"),
                (1, "⓶"),
                (2, "⓷"),
                (3, "⓸"),
                (4, "⓹"),
                (5, "⓺"),
                (6, "⓻"),
                (7, "⓼"),
                (8, "⓽"),
                (9, "⓾"),
            ),
        ),

⚠️ Potential issue | 🟡 Minor

Verify: DOUBLE_CIRCLED_NUMBER mapping starts at 0 but uses ⓵ (double circled one).

The mapping (0, "⓵") appears incorrect. The Unicode character ⓵ (U+24F5) is "DOUBLE CIRCLED DIGIT ONE", not zero. This would cause Mark.number to be 0 when the glyph visually represents 1.

Compare with CIRCLED_NUMBER which correctly maps (0, "⓪") to the circled zero character.

🔧 Suggested fix
         (
             NumberClass.CIRCLED_NUMBER,
             NumberStyle.DOUBLE_CIRCLED_NUMBER,
             (
-                (0, "⓵"),
-                (1, "⓶"),
-                (2, "⓷"),
-                (3, "⓸"),
-                (4, "⓹"),
-                (5, "⓺"),
-                (6, "⓻"),
-                (7, "⓼"),
-                (8, "⓽"),
-                (9, "⓾"),
+                (1, "⓵"),
+                (2, "⓶"),
+                (3, "⓷"),
+                (4, "⓸"),
+                (5, "⓹"),
+                (6, "⓺"),
+                (7, "⓻"),
+                (8, "⓼"),
+                (9, "⓽"),
+                (10, "⓾"),
             ),
         ),
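
One way to double-check a table like this is to ask the Unicode database for each glyph's numeric value. The sketch below uses only the standard `unicodedata` module; the helper names `numbered` and `misaligned` are made up for illustration and are not part of pdf_craft.

```python
import unicodedata

DOUBLE_CIRCLED = "⓵⓶⓷⓸⓹⓺⓻⓼⓽⓾"  # U+24F5 through U+24FE


def numbered(glyphs: str) -> tuple[tuple[int, str], ...]:
    """Pair each glyph with the numeric value Unicode assigns to it."""
    return tuple((int(unicodedata.numeric(glyph)), glyph) for glyph in glyphs)


def misaligned(pairs: tuple[tuple[int, str], ...]) -> list[tuple[int, str]]:
    """Return the (number, glyph) pairs whose number disagrees with the glyph."""
    return [
        (number, glyph)
        for number, glyph in pairs
        if int(unicodedata.numeric(glyph)) != number
    ]


current = tuple(enumerate(DOUBLE_CIRCLED))  # the 0-based mapping under review
suggested = numbered(DOUBLE_CIRCLED)        # the 1-based mapping from the fix

print(unicodedata.name(DOUBLE_CIRCLED[0]))
print(len(misaligned(current)), len(misaligned(suggested)))
```

A check along these lines could also run as a unit test over every numbering table, assuming all of their glyphs carry a Unicode numeric value.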
🤖 Prompt for AI Agents
In `@pdf_craft/sequence/mark.py` around lines 207 - 222, The DOUBLE_CIRCLED_NUMBER
tuple currently maps (0, "⓵") which misaligns numeric values with their glyphs;
in the NumberStyle.DOUBLE_CIRCLED_NUMBER mapping update the integer keys to
match the visual digits (i.e., start at 1 for "⓵", 2 for "⓶", etc.) so each
tuple pair correctly associates the number with its corresponding double-circled
Unicode character.

