Experiment with and tune num_workers, batch_size and prefetch_factor. Depending on how much variance there is across models, this could require per-model/per-GPU-type configuration.
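One way to structure the experiment is a small grid sweep, sketched below. This is only an illustration: `sweep`, `run_trial` and the grid values are hypothetical names and numbers, not part of any existing tooling. `run_trial` stands in for whatever builds the data loader with the given settings and runs a fixed number of batches.

```python
import itertools
import time


def sweep(run_trial, grid):
    """Time run_trial(**config) for every combination in grid, fastest first."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        start = time.perf_counter()
        run_trial(**config)  # hypothetical: run a fixed number of batches
        results.append((time.perf_counter() - start, config))
    # Sort on elapsed time only; the config dicts are not comparable.
    return sorted(results, key=lambda r: r[0])


# Placeholder values, assumed for illustration; pick ranges that suit the hardware.
EXAMPLE_GRID = {
    "num_workers": [2, 4, 8],
    "batch_size": [32, 64],
    "prefetch_factor": [2, 4],
}
```

Calling `sweep(my_trial, EXAMPLE_GRID)` returns a list of `(seconds, config)` pairs. Running the same sweep per model and per GPU type would show whether a single shared configuration is good enough.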
Recommend using something like this to record GPU utilization during the runs:
nvidia-smi --query-gpu=timestamp,name,utilization.gpu,memory.total,memory.used,memory.free --format=csv -l 5 > gpu_log.csv
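Once a run has produced gpu_log.csv, the samples can be reduced to a few numbers per GPU model so that configurations are easy to compare. The sketch below assumes the usual nvidia-smi CSV layout (a header row, comma-plus-space separators, values with unit suffixes such as "45 %"). The function name `summarize_gpu_log` is made up for this example. Because the query above does not include the GPU index, several GPUs of the same model are pooled under one name.

```python
import csv
from statistics import mean


def summarize_gpu_log(lines):
    """Return mean/max GPU utilization per GPU name from nvidia-smi CSV rows."""
    reader = csv.reader(lines, skipinitialspace=True)
    header = [h.strip() for h in next(reader)]
    # Header cells may carry unit suffixes, e.g. "utilization.gpu [%]".
    name_col = next(i for i, h in enumerate(header) if h.startswith("name"))
    util_col = next(i for i, h in enumerate(header) if h.startswith("utilization.gpu"))

    samples = {}
    for row in reader:
        if len(row) <= max(name_col, util_col):
            continue  # skip blank or truncated rows
        parts = row[util_col].split()
        if not parts:
            continue
        try:
            util = float(parts[0])  # "45 %" -> 45.0
        except ValueError:
            continue  # skip non-numeric readings
        samples.setdefault(row[name_col].strip(), []).append(util)

    return {
        name: {"mean": mean(vals), "max": max(vals), "samples": len(vals)}
        for name, vals in samples.items()
    }
```

For example, `with open("gpu_log.csv", newline="") as f: print(summarize_gpu_log(f))` prints one summary per GPU model. A low mean utilization for a given configuration suggests the GPU is waiting on data loading.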