[BUG: Breaking] 'SuryaDecoderConfig' object has no attribute 'pad_token_id' #484

## 🧨 Describe the Bug

After installing surya in a fresh venv, running `surya_ocr` fails for me with `'SuryaDecoderConfig' object has no attribute 'pad_token_id'`.

Manually downgrading transformers to 4.57.3 fixes the issue.
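To check whether an environment is likely to hit this, here is a minimal sketch that reads the installed transformers version. The helper names `is_affected` and `check_installed_transformers` are made up for illustration, and the cut-off at major version 5 is an assumption: this report only confirms that 5.0.0 fails and 4.57.3 works.

```python
from importlib import metadata


def is_affected(version: str) -> bool:
    """Return True if the major version component is 5 or higher.

    Assumption: the breakage starts at the 5.x major release. The report
    only confirms that 5.0.0 fails and 4.57.3 works.
    """
    major = int(version.split(".")[0])
    return major >= 5


def check_installed_transformers() -> str:
    """Describe whether the installed transformers version looks affected."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if is_affected(installed):
        return (
            f"transformers {installed} may trigger the error; "
            "4.57.3 is reported to work"
        )
    return f"transformers {installed} predates 5.x"


if __name__ == "__main__":
    print(check_installed_transformers())
```

If the check reports an affected version, pinning `transformers==4.57.3` in the venv is the workaround described above.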

Paste the **complete** stack trace or error output, if available.

<details>
<summary>Click to expand</summary>

Traceback (most recent call last):
  File "/home/jamie/numbers/.venv/bin/surya_ocr", line 7, in <module>
    sys.exit(ocr_text_cli())
             ~~~~~~~~~~~~^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/click/core.py", line 1485, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/click/core.py", line 1406, in main
    rv = self.invoke(ctx)
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/click/core.py", line 1269, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/click/core.py", line 824, in invoke
    return callback(*args, **kwargs)
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/scripts/ocr_text.py", line 29, in ocr_text_cli
    foundation_predictor = FoundationPredictor()
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/foundation/__init__.py", line 113, in __init__
    super().__init__(checkpoint, device, dtype, attention_implementation)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/common/predictor.py", line 37, in __init__
    self.model = loader.model(device, dtype, attention_implementation)
                 ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/foundation/loader.py", line 69, in model
    model = model_cls.from_pretrained(
            ~~~~~~~~~~~~~~~~~~~~~~~~~^
        self.checkpoint, dtype=dtype, config=config, ignore_mismatched_sizes=True
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ).to(device)
    ^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/common/s3.py", line 182, in from_pretrained
    return super().from_pretrained(local_path, *args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/transformers/modeling_utils.py", line 4072, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/common/surya/__init__.py", line 123, in __init__
    decoder = SuryaDecoderModel(config.decoder)
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/surya/common/surya/decoder/__init__.py", line 447, in __init__
    self.padding_idx = config.pad_token_id
                       ^^^^^^^^^^^^^^^^^^^
  File "/home/jamie/numbers/.venv/lib/python3.13/site-packages/transformers/configuration_utils.py", line 164, in __getattribute__
    return super().__getattribute__(key)
           ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
AttributeError: 'SuryaDecoderConfig' object has no attribute 'pad_token_id'

</details>
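The last frames of the trace show a direct read of `config.pad_token_id` raising `AttributeError`. The sketch below reproduces that failure mode with a stand-in class and shows a defensive `getattr` lookup. `StandInDecoderConfig` and `resolve_padding_idx` are hypothetical names, not surya or transformers code, and whether `None` is an acceptable padding index for the surya decoder is an assumption I have not verified.

```python
class StandInDecoderConfig:
    """Hypothetical stand-in for a config object; not the real SuryaDecoderConfig."""

    def __init__(self, **kwargs):
        # Only the attributes passed in are set, so pad_token_id may be absent.
        for key, value in kwargs.items():
            setattr(self, key, value)


def direct_access_fails(config) -> bool:
    """Return True if reading config.pad_token_id raises AttributeError."""
    try:
        config.pad_token_id
    except AttributeError:
        return True
    return False


def resolve_padding_idx(config, default=None):
    """Read pad_token_id if present, otherwise fall back to a default.

    Assumption: a fallback value is acceptable to the caller; the real
    model may need a specific token id instead.
    """
    return getattr(config, "pad_token_id", default)


if __name__ == "__main__":
    cfg = StandInDecoderConfig(vocab_size=10)
    print(direct_access_fails(cfg))   # direct attribute read raises
    print(resolve_padding_idx(cfg))   # defensive read falls back to the default
```

This only illustrates the attribute lookup; it does not explain why the attribute is missing under transformers 5.0.0.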

## ⚙️ Environment

Please fill in all relevant details:

- surya-ocr==0.17.1
- Python 3.13.7
- torch==2.9.1+cpu
- transformers==5.0.0
- Arch Linux

## 📟 Command or Code Used

Paste the **exact bash command** or **Python code** you used to run Marker:

<details>
<summary>Click to expand</summary>

surya_ocr test.png --images --output_dir test

</details>

