Add already_sharded option to skip double-sharding of DataLoaders by devangpratap · Pull Request #4087 · huggingface/accelerate

devangpratap · 2026-06-25T23:52:17Z

What does this PR do?

When a user already shards their DataLoader across processes (for example with a rank-aware DistributedSampler), Accelerator.prepare shards it a second time. The data ends up split twice, so each process only iterates a fraction of the samples it should. This is a recurring report: #3520, #4062, #4075.
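To make the failure mode concrete, here is a minimal pure-Python sketch of what "split twice" means. It models each split as a simple strided slice over dataset indices; this is a simplification for illustration and not Accelerate's actual sharding code. The `shard` helper and the sizes used are hypothetical.

```python
def shard(indices, rank, world_size):
    """Give `rank` every `world_size`-th item, starting at offset `rank`."""
    return indices[rank::world_size]


world_size = 4
dataset_indices = list(range(16))

# First split: what a rank-aware sampler set up by the user would do.
once = [shard(dataset_indices, r, world_size) for r in range(world_size)]

# Second split: the same striding applied again to the already-sharded data.
twice = [shard(once[r], r, world_size) for r in range(world_size)]

# After one split, the processes together cover the whole dataset.
seen_once = sorted(i for s in once for i in s)
# After two splits, each process keeps only 1 of its 4 samples.
seen_twice = sorted(i for s in twice for i in s)

print(seen_once)   # all 16 indices
print(seen_twice)  # only 4 of the 16 indices survive
```

In this toy model, each of the 4 processes should iterate 4 samples but ends up with 1, and 12 of the 16 samples are never seen by any process.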

This adds an already_sharded option to DataLoaderConfiguration (threaded through to prepare_data_loader). When set, Accelerate assumes each process's DataLoader is already sharded and skips its own sharding, while still keeping the rest of the wrapper behaviour: device placement, set_epoch forwarding, and dataloader state tracking.

from torch.utils.data import DataLoader, DistributedSampler

from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

accelerator = Accelerator(
    dataloader_config=DataLoaderConfiguration(already_sharded=True)
)

# `dataset` and `batch_size` are assumed to be defined by the user.
# The sampler is rank-aware, so each process already sees only its own shard.
sampler = DistributedSampler(dataset, num_replicas=accelerator.num_processes, rank=accelerator.process_index)
dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
dataloader = accelerator.prepare(dataloader)  # not split a second time

already_sharded=True is rejected together with dispatch_batches=True or split_batches=True, since both conflict with a dataloader that is already split per process. Accelerate's even_batches padding is also skipped in this mode, so the user is responsible for each process iterating the same number of batches.
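The rejection rule above could be sketched roughly as follows. This is an illustrative stand-in, not the code from this PR: `ShardingOptions` is a hypothetical class that only mirrors the three relevant field names, and the exception type and message are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShardingOptions:
    """Hypothetical stand-in for the relevant DataLoaderConfiguration fields."""

    already_sharded: bool = False
    dispatch_batches: Optional[bool] = None
    split_batches: bool = False

    def __post_init__(self):
        if not self.already_sharded:
            return
        # Both modes re-split data that the user has already split per process.
        conflicts = [
            name
            for name in ("dispatch_batches", "split_batches")
            if getattr(self, name)
        ]
        if conflicts:
            raise ValueError(
                "already_sharded=True cannot be combined with: "
                + ", ".join(conflicts)
            )


# Valid: user-side sharding only.
ok = ShardingOptions(already_sharded=True)

# Invalid: collect the error instead of letting it propagate.
try:
    ShardingOptions(already_sharded=True, split_batches=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Validating at construction time means a conflicting configuration fails before any DataLoader is prepared, rather than producing silently wrong shards during training.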

Fixes #4075 (also addresses #3520 and #4062).

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Linked above ([Feature request] Support already-sharded DataLoaders in Accelerator.prepare #4075).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

@SunMarc

When a user shards their DataLoader per-process (e.g. with a rank-aware DistributedSampler), Accelerator.prepare shards it a second time, so each process ends up iterating only a fraction of its intended data. This adds an `already_sharded` flag to DataLoaderConfiguration (and prepare_data_loader) that tells Accelerate to keep the user's sharding and skip its own, while still applying device placement, set_epoch forwarding, and state tracking. It is rejected together with dispatch_batches or split_batches, since those modes conflict with a dataloader that is already split per process.

devangpratap force-pushed the feat/already-sharded-dataloader branch from 135e74a to c4dd843 Compare June 26, 2026 00:29

Merge branch 'main' into feat/already-sharded-dataloader

7ca9876

devangpratap wants to merge 2 commits into huggingface:main from devangpratap:feat/already-sharded-dataloader
