I pretrained a condenser-roberta-base on the same data and hyperparameters, but the results on downstream tasks were not good. Have you ever tried Condenser pretraining on RoBERTa-base? Thank you.