
Batch prefetching #1461

Open
dxqb wants to merge 5 commits into Nerogar:master from dxqb:prefetch-next-batch



dxqb commented May 17, 2026

Collaborator

"Dataloader Threads" is a misnomer. The name suggests that multiple threads load data during training, but they don't: the setting is the number of threads used to build the cache.

Loading from the cache currently happens sequentially in the training loop: load batch 1, train batch 1, load batch 2, train batch 2, ...

This can have a major performance impact if the cache lives on an HDD.

This PR renames "Dataloader Threads" to "Caching Threads" and introduces batch prefetching: while batch 1 is being trained, batch 2 is loaded from disk.
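
To illustrate the overlap described above, here is a minimal sketch using only the Python standard library. It is not the code from this PR: `prefetch`, `load_batches` and `train` are hypothetical names, and `time.sleep` stands in for the disk read and the training step.

```python
import queue
import threading
import time


def prefetch(batches, depth=1):
    """Yield items from `batches` while a background thread loads ahead.

    At most `depth` finished batches wait in the buffer. If the consumer
    stops early, the daemon producer thread simply stays blocked.
    """
    buffer = queue.Queue(maxsize=depth)
    end = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for batch in batches:
                buffer.put((batch, None))  # blocks while the buffer is full
            buffer.put((end, None))
        except Exception as exc:
            buffer.put((None, exc))  # forward loader errors to the consumer

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item, error = buffer.get()
        if error is not None:
            raise error
        if item is end:
            return
        yield item


def load_batches(n, load_time):
    """Hypothetical loader: each batch costs `load_time` seconds of I/O."""
    for i in range(n):
        time.sleep(load_time)
        yield i


def train(batches, step_time):
    """Hypothetical training loop: returns the batches seen and elapsed time."""
    start = time.perf_counter()
    seen = []
    for batch in batches:
        time.sleep(step_time)
        seen.append(batch)
    return seen, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential = train(load_batches(4, 0.05), 0.05)
    _, overlapped = train(prefetch(load_batches(4, 0.05)), 0.05)
    print(f"sequential: {sequential:.2f}s, prefetched: {overlapped:.2f}s")
```

With load and train times of similar size, the prefetched run should approach the cost of the slower of the two per batch rather than their sum.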

@Calamdor

[image]

dxqb and others added 2 commits May 17, 2026 09:29
Adds prefetch_next_batch option that loads the next batch on a background
thread, overlapping disk reads with the current training step. Most
beneficial when caching is enabled. Renames dataloader_threads to
caching_threads to better reflect its purpose.

The UI places Prefetch Next Batch above Clear cache before training.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g loop

Tensor uploads to the GPU in OutputPipelineModule were enqueued on the
default CUDA stream, so each H2D transfer had to wait for the current
training step's GPU work to finish before it could start.  Running the
producer under its own stream lets uploads proceed independently,
allowing the prefetch queue to stay ahead of the training loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dxqb commented May 17, 2026

Collaborator Author
  • Using prefetch when the cache is disabled might cause issues: the text encoder and/or VAE then run in parallel with the training step. That's probably okay, but they could theoretically still be running while the model is moved off the GPU for sampling, backup, ...
    Either add something to wait for the prefetcher, or disable prefetching when the disk cache isn't used.
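
One possible shape for the "wait for the prefetcher" option, as a rough standard-library sketch. `PausablePrefetcher`, `make_batch` and `demo` are hypothetical names, not part of this PR or of OneTrainer; the idea is only that the producer holds a lock while it builds a batch, so the training loop can take the same lock before moving the model.

```python
import queue
import threading
import time


class PausablePrefetcher:
    """Background loader that can be held between batches (hypothetical helper)."""

    def __init__(self, make_batch, depth=1):
        self._make_batch = make_batch
        self._buffer = queue.Queue(maxsize=depth)
        self._gate = threading.Lock()  # held while a batch is being produced
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            with self._gate:
                batch = self._make_batch()  # may use the text encoder / VAE
            # Enqueue outside the gate so a full buffer never blocks paused().
            while not self._stop.is_set():
                try:
                    self._buffer.put(batch, timeout=0.1)
                    break
                except queue.Full:
                    pass

    def next_batch(self):
        return self._buffer.get()

    def paused(self):
        """Use as `with prefetcher.paused():`; waits for the in-flight batch."""
        return self._gate

    def close(self):
        self._stop.set()
        self._thread.join()


def demo():
    produced = []

    def make_batch():
        time.sleep(0.01)  # stands in for encoding plus disk I/O
        produced.append(len(produced))
        return produced[-1]

    prefetcher = PausablePrefetcher(make_batch)
    first = prefetcher.next_batch()
    with prefetcher.paused():
        # Safe point: e.g. move the model off the GPU for sampling or backup.
        count_before = len(produced)
        time.sleep(0.05)
        count_after = len(produced)
    prefetcher.close()
    return first, count_before == count_after
```

The simpler alternative from the bullet, disabling prefetching whenever the disk cache is off, avoids the extra synchronization entirely.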

dxqb added the preview label (merged in the preview branch) May 29, 2026
dxqb added a commit to TheForgotten69/OneTrainer that referenced this pull request Jun 3, 2026
dxqb added a commit that referenced this pull request Jun 4, 2026

dxqb commented Jun 14, 2026

Collaborator Author

Claude: Heads up, the migration for the dataloader_threads → caching_threads rename (__migration_10, the step `if "dataloader_threads" in migrated_data: migrated_data["caching_threads"] = migrated_data.pop("dataloader_threads")`) won't run for the built-in training_presets/#*.json files.

BaseTopBarView.__load_current_config forces loaded_dict["__version"] = default_config.config_version for built-in presets (files starting with #), on the assumption that they're "saved in the most recent version". This skips the entire migration chain. The 21 built-in presets that still have "dataloader_threads": 1 will silently lose that value (the key is no longer among TrainConfig's fields, so it's ignored), and caching_threads falls back to its default of 2 instead.

Not data-destroying since 2 is a sane default, but worth either updating those 21 preset files to "caching_threads" directly, or keeping dataloader_threads as a deprecated alias read at load time.
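
The failure mode and the "deprecated alias" option can be sketched as follows. This is a simplified model, not OneTrainer's loader: `load_config`, `migrate_10`, `accept_alias` and the version number 11 are hypothetical stand-ins, and only the rename step itself is taken from the comment above.

```python
CURRENT_VERSION = 11  # hypothetical; stands in for default_config.config_version


def migrate_10(data):
    """The rename step quoted in the comment above."""
    migrated = dict(data)
    if "dataloader_threads" in migrated:
        migrated["caching_threads"] = migrated.pop("dataloader_threads")
    return migrated


def load_config(raw, is_builtin_preset, accept_alias):
    data = dict(raw)
    if is_builtin_preset:
        # Built-in presets are assumed current, so migrations are skipped.
        data["__version"] = CURRENT_VERSION
    if data.get("__version", 0) < CURRENT_VERSION:
        data = migrate_10(data)
        data["__version"] = CURRENT_VERSION
    if accept_alias and "caching_threads" not in data:
        # Deprecated alias: read the old key at load time if it is present.
        if "dataloader_threads" in data:
            data["caching_threads"] = data.pop("dataloader_threads")
    # An unknown old key is ignored; the new key falls back to its default.
    data.setdefault("caching_threads", 2)
    return data


preset = {"__version": 9, "dataloader_threads": 1}

# User config: the migration runs and the value survives.
assert load_config(preset, False, False)["caching_threads"] == 1
# Built-in preset without the alias: the value is silently replaced by 2.
assert load_config(preset, True, False)["caching_threads"] == 2
# Built-in preset with the alias: the value survives.
assert load_config(preset, True, True)["caching_threads"] == 1
```

Updating the preset files directly removes the need for the alias branch altogether.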

dxqb added a commit that referenced this pull request Jun 14, 2026
This PR renamed dataloader_threads to caching_threads in TrainConfig,
but built-in presets are loaded with migrate=False, so the old key in
these preset files was silently dropped, leaving caching_threads at its
default of 2. That conflicts with the offloading guard in create.py
("layer offloading can not be activated if caching_threads > 1") for
any preset combining offloading with the old key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dxqb added a commit to dxqb/OneTrainer that referenced this pull request Jun 17, 2026
… key

These presets were added after PR Nerogar#1461 renamed dataloader_threads to
caching_threads, but were copied from an already-stale template. None
of the anima/ideogram/lens model branches have Nerogar#1461 merged, so this
fixes them directly on preview.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dxqb and others added 2 commits June 19, 2026 07:48
# Conflicts:
#	modules/dataLoader/ErnieBaseDataLoader.py
#	modules/dataLoader/Flux2BaseDataLoader.py
#	modules/dataLoader/ZImageBaseDataLoader.py
#	training_presets/#flux2 Finetune 16GB.json
#	training_presets/#flux2 Finetune 24GB.json
#	training_presets/#flux2 LoRA 16GB.json
#	training_presets/#flux2 LoRA 8GB.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dxqb added a commit that referenced this pull request Jun 19, 2026