
torch.AcceleratorError in unpack_qkv_with_mask on MPS (Apple Silicon) #490

@kiki830621


Bug Description

surya-ocr crashes with torch.AcceleratorError: index out of bounds during layout recognition and OCR text recognition on Apple Silicon (MPS backend) when processing longer PDFs (60+ pages).

The crash occurs in `surya/common/surya/encoder/__init__.py:438`:

```python
def unpack_qkv_with_mask(self, q, k, v, cu_seqlens):
    seq_lengths = cu_seqlens[1:] - cu_seqlens[:-1]
    max_seq_len = seq_lengths.max().item()  # ← crash here
```

```
torch.AcceleratorError: index 7984 is out of bounds: 2, range 0 to 2184
```

Root Cause

This appears to be a PyTorch MPS backend bug: the `.max()` reduction kernel returns an incorrect index for certain tensor shapes. The tensor itself is tiny (per-image sequence lengths), but the MPS kernel hits an edge case.
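One way to sanity-check this is to compare the on-device reduction against a CPU reference for a `cu_seqlens`-style tensor (a hypothetical diagnostic, not part of surya; the sample values below are made up):

```python
import torch

def check_max_consistency(cu_seqlens: torch.Tensor) -> bool:
    """Hypothetical diagnostic: does .max() agree with the CPU reference?

    Mirrors the computation in unpack_qkv_with_mask: cumulative sequence
    lengths -> per-segment lengths -> max length.
    """
    seq_lengths = cu_seqlens[1:] - cu_seqlens[:-1]
    device_result = seq_lengths.max().item()       # may be wrong on MPS
    cpu_result = seq_lengths.cpu().max().item()    # CPU reference value
    return device_result == cpu_result

# Made-up values; on a healthy backend this prints True.
cu = torch.tensor([0, 512, 1024, 2184])
print(check_max_consistency(cu))
```

Running this on an affected MPS setup (e.g. with `cu = cu.to("mps")`) should surface the mismatch, if the kernel bug is shape-dependent as suspected.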

Environment

  • surya-ocr: 0.17.1
  • PyTorch: 2.7.0
  • macOS: Darwin 25.3.0 (Apple M4 Pro)
  • Python: 3.13

Suggested Fix

Option 1: use an alternative tensor op that may route through a different MPS kernel:

```python
max_seq_len = torch.amax(seq_lengths, dim=0).item()
```

Option 2: a surgical `.cpu()` on just this tiny tensor (negligible perf impact):

```python
max_seq_len = seq_lengths.cpu().max().item()
```

The heavy attention/matmul computations all stay on MPS. Only this small indexing operation (tens of elements) moves to CPU, so performance cost is effectively zero.
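Option 2 could be wrapped in a small helper so only MPS tensors take the CPU detour (a sketch; `safe_max_item` is a hypothetical name, not an existing surya function):

```python
import torch

def safe_max_item(t: torch.Tensor) -> int:
    """Hypothetical helper: .max().item() that sidesteps the suspected
    buggy MPS reduction by hopping to CPU, only for MPS tensors."""
    if t.device.type == "mps":
        t = t.cpu()  # tiny tensor, so the copy is effectively free
    return t.max().item()

# Inside unpack_qkv_with_mask, this would replace the crashing line:
#     max_seq_len = safe_max_item(seq_lengths)
```

Other backends (CPU, CUDA) pass through unchanged, so the guard costs nothing where the kernel is not affected.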

Workarounds Tested

| Workaround | Result |
| --- | --- |
| `TORCH_DEVICE=cpu` | ✅ Works but ~10x slower, unusable for long docs |
| `PYTORCH_ENABLE_MPS_FALLBACK=1` | ❌ Doesn't help (MPS "supports" `.max()`, it's just buggy) |
| Smaller batches via `--page_range` | ⚠️ Reduces frequency but doesn't eliminate the crash |
