Batch prefetching by dxqb · Pull Request #1461 · Nerogar/OneTrainer

dxqb · 2026-05-17T07:42:33Z

"Dataloader Threads" is a misnomer. It sounds like we are using multiple threads to load data. We actually don't:
The setting is the number of threads used to build the cache.

Loading from the cache is currently done sequentially in the training loop: load batch 1 - train batch 1 - load batch 2 - train batch 2 - ...

This can have a major performance impact if the cache lives on an HDD.

This PR renames "Dataloader Threads" to "Caching Threads" and introduces batch prefetching:
while batch 1 is being trained, batch 2 is loaded from disk.
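
A minimal sketch of the overlap described above, not the code in this PR: a background thread loads the next batch into a bounded queue while the consumer trains on the current one. The names `prefetch`, `depth` and `load_batches` are hypothetical and only serve the illustration.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(batches: Iterable[T], depth: int = 1) -> Iterator[T]:
    """Yield batches in order while a background thread loads ahead.

    Hypothetical helper: `depth` is the queue size, so the producer can be
    at most `depth` queued batches (plus the one it is loading) ahead.
    """
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for batch in batches:
                buffer.put(("item", batch))  # blocks while the queue is full
            buffer.put(("done", None))
        except Exception as exc:  # forward loader errors to the consumer
            buffer.put(("error", exc))

    # Daemon thread: it will not keep the process alive if the consumer
    # stops iterating early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        kind, payload = buffer.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


def load_batches(count: int) -> Iterator[int]:
    """Stand-in for reading cached batches from disk."""
    for index in range(count):
        yield index


for batch in prefetch(load_batches(3)):
    print("train on batch", batch)
```

The bounded queue is the design choice that matters here: it keeps memory use fixed and makes the loader wait when training is the slower side.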

@Calamdor

Adds prefetch_next_batch option that loads the next batch on a background thread, overlapping disk reads with the current training step. Most beneficial when caching is enabled. Renames dataloader_threads to caching_threads to better reflect its purpose. The UI places Prefetch Next Batch above Clear cache before training.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…g loop

Tensor uploads to the GPU in OutputPipelineModule were enqueued on the default CUDA stream, so each H2D transfer had to wait for the current training step's GPU work to finish before it could start. Running the producer under its own stream lets uploads proceed independently, allowing the prefetch queue to stay ahead of the training loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dxqb · 2026-05-17T09:34:43Z

Using prefetch when the cache is disabled might cause issues: the text encoder and/or VAE then run in parallel with the training step. That's probably okay, but they could theoretically also still be running while the model is moved off the GPU for sampling, backup, ...
Either add something to wait for the prefetcher, or disable prefetching when the disk cache isn't used.
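
One way the "wait for the prefetcher" option could look, as a sketch under assumptions rather than a proposal for the actual code: the background thread holds a lock while it produces a batch, and the training loop takes the same lock around sampling or backup. Taking the lock waits for an in-flight load to finish and blocks new ones. `PausablePrefetcher` and `paused` are hypothetical names.

```python
import queue
import threading
from contextlib import contextmanager


class PausablePrefetcher:
    """Hypothetical prefetcher whose background loads can be paused."""

    def __init__(self, batches, depth=1):
        self._batches = iter(batches)
        self._buffer = queue.Queue(maxsize=depth)
        self._gate = threading.Lock()  # held while a batch is being produced
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            with self._gate:
                try:
                    message = ("item", next(self._batches))
                except StopIteration:
                    message = ("done", None)
                except Exception as exc:
                    message = ("error", exc)
            # Put outside the gate so a full queue never blocks paused().
            self._buffer.put(message)
            if message[0] != "item":
                return

    @contextmanager
    def paused(self):
        """Wait for the in-flight load, then block new loads inside the block."""
        with self._gate:
            yield

    def __iter__(self):
        while True:
            kind, payload = self._buffer.get()
            if kind == "done":
                return
            if kind == "error":
                raise payload
            yield payload


prefetcher = PausablePrefetcher(range(3))
for step, batch in enumerate(prefetcher):
    print("train on batch", batch)
    if step == 1:
        with prefetcher.paused():
            print("sampling: no batch is being produced right now")
```

The simpler alternative named above, disabling prefetching whenever the disk cache is not used, avoids the extra synchronization entirely.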

dxqb · 2026-06-14T15:34:02Z

Claude: Heads up: the dataloader_threads → caching_threads rename's migration (__migration_10, the if "dataloader_threads" in migrated_data: migrated_data["caching_threads"] = migrated_data.pop("dataloader_threads") step) won't run for the built-in training_presets/#*.json files.

BaseTopBarView.__load_current_config forces loaded_dict["__version"] = default_config.config_version for built-in presets (files starting with #), on the assumption that they're "saved in the most recent version". This skips the entire migration chain. The 21 built-in presets that still have "dataloader_threads": 1 will silently lose that value (the key is no longer in TrainConfig's fields, so it's ignored), and caching_threads falls back to its default of 2 instead.

Not data-destroying since 2 is a sane default, but worth either updating those 21 preset files to "caching_threads" directly, or keeping dataloader_threads as a deprecated alias read at load time.
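
A simplified model of the behaviour described above, to make the failure mode and the alias option concrete. This is not OneTrainer's loader: `load_config`, `KNOWN_FIELDS`, `accept_alias` and the version numbers are assumptions made for the sketch. Only the rename step and the default of 2 come from the comment.

```python
CONFIG_VERSION = 11                    # hypothetical current config version
KNOWN_FIELDS = {"caching_threads": 2}  # field and default as described above


def migration_10(data: dict) -> dict:
    """The rename step quoted in the comment above."""
    migrated = dict(data)
    if "dataloader_threads" in migrated:
        migrated["caching_threads"] = migrated.pop("dataloader_threads")
    return migrated


def load_config(raw: dict, builtin_preset: bool, accept_alias: bool = False) -> dict:
    data = dict(raw)
    if builtin_preset:
        # Built-in presets are treated as already current, so no migration runs.
        data["__version"] = CONFIG_VERSION
    if data.get("__version", 0) <= 10:
        data = migration_10(data)
    if accept_alias and "dataloader_threads" in data:
        # Deprecated alias: honour the old key unless the new one is present.
        data.setdefault("caching_threads", data["dataloader_threads"])
    # Unknown keys are ignored; missing ones fall back to their defaults.
    return {key: data.get(key, default) for key, default in KNOWN_FIELDS.items()}


stale_preset = {"dataloader_threads": 1}
print(load_config(stale_preset, builtin_preset=True))                        # {'caching_threads': 2}
print(load_config(stale_preset, builtin_preset=True, accept_alias=True))     # {'caching_threads': 1}
print(load_config({"__version": 10, **stale_preset}, builtin_preset=False))  # {'caching_threads': 1}
```

The first call shows the silent fallback, the second the deprecated-alias option, and the third a user config that still passes through the migration chain.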

This PR renamed dataloader_threads to caching_threads in TrainConfig, but built-in presets are loaded with migrate=False, so the old key in these preset files was silently dropped, leaving caching_threads at its default of 2. That conflicts with the offloading guard in create.py ("layer offloading can not be activated if caching_threads > 1") for any preset combining offloading with the old key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… key

These presets were added after PR Nerogar#1461 renamed dataloader_threads to caching_threads, but were copied from an already-stale template. None of the anima/ideogram/lens model branches have Nerogar#1461 merged, so this fixes them directly on preview.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
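
Fixing preset files directly, as these commits do, could be scripted roughly as follows. This is a sketch, not the script used for the commits: `rename_key_in_presets` is a hypothetical helper, and it assumes the key sits at the top level of each preset and that 4-space JSON indentation is acceptable. The renamed key moves to the end of the object.

```python
import json
from pathlib import Path


def rename_key_in_presets(
    preset_dir: str,
    old: str = "dataloader_threads",
    new: str = "caching_threads",
) -> list:
    """Rename a top-level key in every built-in preset (files starting with #).

    Returns the names of the files that were rewritten.
    """
    changed = []
    for path in sorted(Path(preset_dir).glob("#*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if old not in data:
            continue
        value = data.pop(old)
        # Keep an existing new-style value if the file already has one.
        data.setdefault(new, value)
        path.write_text(json.dumps(data, indent=4) + "\n", encoding="utf-8")
        changed.append(path.name)
    return changed


# Example call (path taken from the comments above):
# rename_key_in_presets("training_presets")
```

Running it on a checkout and reviewing the resulting diff before committing would catch presets where the old value and the offloading guard disagree.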

# Conflicts:
#   modules/dataLoader/ErnieBaseDataLoader.py
#   modules/dataLoader/Flux2BaseDataLoader.py
#   modules/dataLoader/ZImageBaseDataLoader.py
#   training_presets/#flux2 Finetune 16GB.json
#   training_presets/#flux2 Finetune 24GB.json
#   training_presets/#flux2 LoRA 16GB.json
#   training_presets/#flux2 LoRA 8GB.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dxqb and others added 2 commits May 17, 2026 09:29

dxqb added the "preview" label (merged in the preview branch) May 29, 2026

dxqb added a commit to TheForgotten69/OneTrainer that referenced this pull request Jun 3, 2026

Merge PR Nerogar#1461 (Batch prefetching) into preview

61e063d

dxqb added a commit that referenced this pull request Jun 4, 2026

Merge PR #1461 (Batch prefetching) into preview

6667a35

dxqb added a commit that referenced this pull request Jun 14, 2026

Merge PR #1461 (Batch prefetching) into preview

501f97e

dxqb and others added 2 commits June 19, 2026 07:48

Remove accidentally committed local CLAUDE.md symlink

d378d33

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dxqb added a commit that referenced this pull request Jun 19, 2026

Merge PR #1461 (Batch prefetching) into preview

eac416e

Batch prefetching #1461
dxqb wants to merge 5 commits into Nerogar:master from dxqb:prefetch-next-batch
