
fix: use sequential read instead of seek for multipart uploads on Windows#3912

Open
s-zx wants to merge 1 commit into huggingface:main from s-zx:fix/3871-windows-2gb-upload

Conversation


@s-zx s-zx commented Mar 10, 2026

Summary

File uploads over 2GB get stuck at ~1.98GB on Windows 11 with Python 3.12. The upload speed gradually drops, then the transfer stops entirely.

Root Cause

On Windows, file.seek() with offset >= 2GB can fail due to 32-bit signed integer limits in some C runtime APIs. The multipart LFS upload used SliceFileObj, which seeks to chunk_size * part_idx for each part. When that offset exceeds 2^31 bytes, the seek fails and the upload hangs.
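As a rough illustration of where the failure kicks in, the following sketch computes the first part index whose seek offset crosses the 32-bit signed boundary. The 20 MiB chunk size is an assumption for illustration only, not the actual LFS part size:

```python
# Illustrative only: chunk_size is an assumed value, not the real LFS part size.
chunk_size = 20 * 1024 * 1024          # 20 MiB (assumption)
limit = 2**31                          # first offset that no longer fits a signed 32-bit int

# Smallest part index whose seek target (chunk_size * part_idx) reaches the limit.
first_failing_part = -(-limit // chunk_size)   # ceiling division
offset = chunk_size * first_failing_part

print(first_failing_part, offset, offset >= limit)
```

Every part before that index seeks fine; the first seek at or past `2**31` is where the upload hangs.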

Fix

Replace SliceFileObj + seek with sequential reads: file.read(chunk_size) for each chunk. This avoids large-offset seeks entirely and works for files of any size on Windows.

Changes:

  • src/huggingface_hub/lfs.py: _upload_parts_iteratively now reads chunks sequentially
  • src/huggingface_hub/cli/lfs.py: lfs_multipart_upload (git-lfs custom transfer) uses the same approach

Fixes #3871
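A minimal sketch of the sequential-read approach described above, including the fail-fast guard on empty reads mentioned in the overview. The function and parameter names here are hypothetical, not the actual `lfs.py` implementation:

```python
import io

def iter_parts(fileobj, chunk_size: int, num_parts: int):
    """Yield each part's bytes via sequential read(), never calling seek().

    Hypothetical sketch: avoids large-offset seeks by relying on the file
    position advancing naturally with each read().
    """
    for part_idx in range(num_parts):
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # Fail fast on a truncated file instead of stalling the upload.
            raise ValueError(
                f"Unexpected empty read at part {part_idx}/{num_parts}"
            )
        yield chunk

# Example: a 10-byte "file" split into 3 parts of up to 4 bytes each.
parts = list(iter_parts(io.BytesIO(b"abcdefghij"), chunk_size=4, num_parts=3))
```

Note that each `read(chunk_size)` materializes a full chunk in memory, which is the trade-off discussed in the review below.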


Note

Low Risk
Small, localized change to multipart upload chunk reading logic; primary risk is behavior differences around EOF/truncated files and memory usage per chunk.

Overview
Fixes multipart LFS uploads hanging on Windows for files >2GB by removing SliceFileObj/seek()-based chunking and switching to sequential read(chunk_size) uploads in both the CLI custom transfer agent (cli/lfs.py) and the library uploader (lfs._upload_parts_iteratively).

Adds a guard that raises a ValueError if a chunk unexpectedly reads empty before all parts are uploaded, making truncated reads fail fast instead of stalling.

Written by Cursor Bugbot for commit 062c99b.

fix: use sequential read instead of seek for multipart uploads on Windows

File uploads over 2GB get stuck on Windows because file.seek() with offset >= 2GB
can fail due to 32-bit signed integer limits in some Windows APIs. The multipart
LFS upload used SliceFileObj which seeks to chunk_size * part_idx for each part;
when that offset exceeds 2^31 bytes, the seek fails and the upload hangs.

Fix by reading chunks sequentially (file.read(chunk_size)) instead of seeking
to each part offset. This avoids large-offset seeks entirely and works for
files of any size on Windows.

Fixes huggingface#3871
Signed-off-by: s-zx <s-zx@users.noreply.github.com>
@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@Wauplin Wauplin left a comment


Hi @s-zx, thanks for raising this issue. I had never thought of this limitation with 32-bit APIs. I do understand the need for a fix, but I think it should rather be done inside SliceFileObj instead, with iterative file.seek() calls if needed.

The problem with the current solution is that it introduces a potentially large memory footprint, which we want to avoid at all costs (especially since it fixes an issue that only occurs in rare cases). Could you rewrite the PR differently? Thanks in advance.

# Read chunk sequentially instead of using SliceFileObj + seek.
# On Windows, file.seek() with offset >= 2GB can fail due to 32-bit
# signed integer limits in some APIs (see issue #3871).
chunk_data = file.read(chunk_size)
Contributor


This will load the entire chunk_data into memory which we absolutely want to avoid on large files.

s-zx pushed a commit to s-zx/huggingface_hub that referenced this pull request Mar 25, 2026
Move the 32-bit seek overflow fix into SliceFileObj itself rather than
removing SliceFileObj usage from callers. This keeps the streaming
behavior (no large chunks loaded into memory) while avoiding seek()
calls with offsets >= 2**31 that can fail on Windows due to 32-bit signed limits.

Add a _safe_seek() helper that breaks large SEEK_SET/SEEK_CUR offsets
into multiple incremental SEEK_CUR steps, each within the 32-bit safe
range. Small offsets pass straight through with no overhead.

Addresses review feedback on PR huggingface#3912.
@s-zx
Author

s-zx commented Mar 25, 2026

@Wauplin Thanks for the feedback — reworked in the latest commit.

The fix is now entirely inside SliceFileObj:

  • Added _safe_seek() helper in _lfs.py that breaks large SEEK_SET offsets (>= 2^31) into incremental SEEK_CUR steps of at most 2^31 - 1 bytes
  • Updated the three seek call sites in SliceFileObj.__enter__, __exit__, and seek() to use _safe_seek()
  • Reverted cli/lfs.py and lfs.py back to original SliceFileObj usage — no more loading chunks into memory

Zero-copy, no memory overhead, and the callers don't need to know about the 32-bit limitation.

