Tune worker to prod GPUs

Experiment with and tune `num_workers`, `batch_size`, and `prefetch_factor`. Depending on how much the best values vary across models, this could become a per-model/per-GPU-type configuration.
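
Below is a minimal sketch of such a sweep, assuming a PyTorch-style loader. The helper `sweep_loader_configs` is hypothetical. For every combination in a parameter grid it builds a fresh loader and discards a few warm-up batches, which absorb worker start-up cost. It then times a fixed number of batches and ranks the configurations by approximate samples per second. It measures data-loading throughput only, not the training step.

```python
import itertools
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple


def sweep_loader_configs(
    make_loader: Callable[..., Iterable[Any]],
    grid: Dict[str, List[Any]],
    num_batches: int = 50,
    warmup_batches: int = 5,
) -> List[Tuple[Dict[str, Any], float]]:
    """Time every combination in `grid`; return (config, samples/sec), fastest first.

    Hypothetical helper: `make_loader(**config)` must return a fresh iterable of
    batches, and `grid` must include a "batch_size" key.
    """
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        batches = iter(make_loader(**config))
        # Discard the first few batches: they include worker start-up cost.
        for _ in itertools.islice(batches, warmup_batches):
            pass
        timed = 0
        start = time.perf_counter()
        for _ in itertools.islice(batches, num_batches):
            timed += 1
        elapsed = max(time.perf_counter() - start, 1e-9)
        # Approximate: assumes every timed batch holds exactly batch_size samples.
        results.append((config, timed * config["batch_size"] / elapsed))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

With PyTorch, `make_loader` could be `functools.partial(torch.utils.data.DataLoader, train_dataset)`, where `train_dataset` stands in for your dataset. A matching grid might be `{"num_workers": [2, 4, 8], "batch_size": [32, 64], "prefetch_factor": [2, 4]}`. Keep `num_workers` above 0 in the grid, because `DataLoader` only accepts `prefetch_factor` when worker processes are used. The fastest loader-only config is a starting point. Confirm it end to end with the GPU utilization log, since GPU compute overlaps with loading during real training.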

Recommend using something like this to record GPU utilization during the runs:
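```shell
# -l 1 samples every second until interrupted (Ctrl+C); without it nvidia-smi queries once and exits
nvidia-smi --query-gpu=timestamp,name,utilization.gpu,memory.total,memory.used,memory.free --format=csv -l 1 > gpu_log.csv
```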
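
After a run, the log can be reduced to a per-GPU summary. The sketch below uses hypothetical helper names (`summarize_gpu_log`, `_number`). It assumes the default `--format=csv` layout: a header row whose columns carry unit suffixes such as `utilization.gpu [%]`, and values such as `87 %` or `40960 MiB`. It also accepts `nounits` output, but it needs the header row. Rows with non-numeric values such as `[N/A]` are skipped. The query above includes only `name`, so identical GPUs on one host are pooled together. Adding `index` to `--query-gpu` makes the summary per device.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def _number(field: str) -> float:
    """Parse '87 %' or '40960 MiB' (or a bare '87' from nounits output) into a float."""
    return float(field.split()[0])


def summarize_gpu_log(path: str) -> Dict[str, Dict[str, float]]:
    """Return per-GPU sample count, mean/max utilization and peak memory used."""
    util: Dict[str, List[float]] = defaultdict(list)
    mem_used: Dict[str, List[float]] = defaultdict(list)
    with open(path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader)
        # Map column names to positions, dropping unit suffixes like ' [%]'.
        col = {name.split(" [")[0]: i for i, name in enumerate(header)}
        # Group by GPU index when it was queried, otherwise by GPU name.
        key_col = col["index"] if "index" in col else col["name"]
        for row in reader:
            if not row or row[0] == header[0]:
                continue  # skip blank lines and repeated header rows, if any
            try:
                u = _number(row[col["utilization.gpu"]])
                m = _number(row[col["memory.used"]])
            except (ValueError, IndexError):
                continue  # e.g. '[N/A]' values or a truncated last line
            util[row[key_col]].append(u)
            mem_used[row[key_col]].append(m)
    return {
        gpu: {
            "samples": len(values),
            "mean_util_pct": mean(values),
            "max_util_pct": max(values),
            "peak_mem_used_mib": max(mem_used[gpu]),
        }
        for gpu, values in util.items()
    }
```

For example, `summarize_gpu_log("gpu_log.csv")` returns a dict keyed by GPU. Consistently low mean utilization during training often means the GPU is waiting on input. That is the signal to revisit the loader settings.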