I trained the model with the following command:

```
nlp-train transformer_glue \
    --task_name mrpc \
    --model_name_or_path bert-base-uncased \
    --model_type quant_bert \
    --learning_rate 2e-5 \
    --output_dir /tmp/mrpc-8bit \
    --evaluate_during_training \
    --data_dir /path/to/MRPC \
    --do_lower_case
```

and ran inference with:

```
nlp-inference transformer_glue \
    --model_path /tmp/mrpc-8bit \
    --task_name mrpc \
    --model_type quant_bert \
    --output_dir /tmp/mrpc-8bit \
    --data_dir /path/to/MRPC \
    --do_lower_case \
    --overwrite_output_dir \
    --load_quantized_model
```

However, I got the same performance as I do without the `--load_quantized_model` flag. How can I improve the inference performance?