Add already_sharded option to skip double-sharding of DataLoaders#4087

Open: devangpratap wants to merge 2 commits into huggingface:main from devangpratap:feat/already-sharded-dataloader

@devangpratap

What does this PR do?

When a user already shards their DataLoader across processes (for example with a rank-aware DistributedSampler), Accelerator.prepare shards it a second time. The data ends up split twice, so each process only iterates a fraction of the samples it should. This is a recurring report: #3520, #4062, #4075.
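To make the "split twice" effect concrete, here is a minimal sketch in plain Python. It is a simplified model, not Accelerate's actual code path: it assumes both the user's sampler and the second sharding step behave like a strided split over sample indices, and the helper name `shard` is hypothetical.

```python
def shard(indices, rank, world_size):
    """Strided split: rank r keeps items r, r + world_size, r + 2 * world_size, ..."""
    return indices[rank::world_size]


world_size = 4
dataset_indices = list(range(16))

# First split: what a rank-aware sampler hands to each process.
once = [shard(dataset_indices, rank, world_size) for rank in range(world_size)]

# Second split: the same per-process shard is split again by rank.
twice = [shard(once[rank], rank, world_size) for rank in range(world_size)]

# Samples seen across all processes after one split and after two splits.
covered_once = sorted(i for part in once for i in part)
covered_twice = sorted(i for part in twice for i in part)

print(len(covered_once), len(covered_twice))
```

Under this model, one split covers all 16 samples across the 4 processes, while the second split leaves each process with a single sample, so only 4 of the 16 samples are ever visited.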

This adds an already_sharded option to DataLoaderConfiguration (threaded through to prepare_data_loader). When set, Accelerate assumes each process's DataLoader is already sharded and skips its own sharding, while still keeping the rest of the wrapper behaviour: device placement, set_epoch forwarding, and dataloader state tracking.

from torch.utils.data import DataLoader, DistributedSampler

from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

accelerator = Accelerator(
    dataloader_config=DataLoaderConfiguration(already_sharded=True)
)

sampler = DistributedSampler(dataset, num_replicas=accelerator.num_processes, rank=accelerator.process_index)
dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
dataloader = accelerator.prepare(dataloader)  # not split a second time

already_sharded=True is rejected together with dispatch_batches=True or split_batches=True, since both conflict with a dataloader that is already split per process. Accelerate's even_batches padding is also skipped in this mode, so the user is responsible for each process iterating the same number of batches.
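The two constraints in the paragraph above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `ShardingOptions` is a hypothetical stand-in for the relevant `DataLoaderConfiguration` fields (modelled here as plain booleans), and `batches_per_rank` is a hypothetical helper that assumes a strided split with no padding and no dropped samples.

```python
import math
from dataclasses import dataclass


@dataclass
class ShardingOptions:
    """Hypothetical stand-in for the relevant DataLoaderConfiguration fields."""

    already_sharded: bool = False
    dispatch_batches: bool = False
    split_batches: bool = False

    def __post_init__(self):
        # Both modes re-split data inside the wrapper, which conflicts with
        # a dataloader the user has already split per process.
        if self.already_sharded and (self.dispatch_batches or self.split_batches):
            raise ValueError(
                "already_sharded=True cannot be combined with "
                "dispatch_batches=True or split_batches=True"
            )


def batches_per_rank(num_samples, world_size, batch_size):
    """Batch count on each rank for a strided split without padding."""
    counts = []
    for rank in range(world_size):
        shard_len = len(range(rank, num_samples, world_size))
        counts.append(math.ceil(shard_len / batch_size))
    return counts


uneven = batches_per_rank(num_samples=10, world_size=4, batch_size=2)
even = batches_per_rank(num_samples=16, world_size=4, batch_size=2)
print(uneven, even)
```

With 10 samples on 4 processes, the first two ranks get one more batch than the last two, which is the situation the user has to rule out once `even_batches` padding no longer applies. A sampler that pads or drops the tail so every rank holds the same number of samples avoids it.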

Fixes #4075 (also addresses #3520 and #4062).

Before submitting

Who can review?

@SunMarc

When a user shards their DataLoader per-process (e.g. with a rank-aware
DistributedSampler), Accelerator.prepare shards it a second time, so each
process ends up iterating only a fraction of its intended data. This adds
an `already_sharded` flag to DataLoaderConfiguration (and prepare_data_loader)
that tells Accelerate to keep the user's sharding and skip its own, while
still applying device placement, set_epoch forwarding, and state tracking.

It is rejected together with dispatch_batches or split_batches, since those
modes conflict with a dataloader that is already split per process.
@devangpratap force-pushed the feat/already-sharded-dataloader branch from 135e74a to c4dd843 on June 26, 2026 at 00:29
Development

Successfully merging this pull request may close these issues.

[Feature request] Support already-sharded DataLoaders in Accelerator.prepare