Hi,
the paper reports a wall-time of 6 weeks for training the biggest model.
Do you have estimates for the training of the smaller models?
Best
Leonhard