First of all, I would like to express my sincere appreciation for your research.
I am reaching out because I am experiencing difficulties obtaining a DashScope API key.
Since I am unable to obtain the key, I am currently using the Hugging Face version of the Qwen2.5-72B-Instruct model as a substitute. Could you clarify whether there are any differences between this version and the DashScope-hosted Qwen2.5-72B-Instruct model referenced in your code?
Additionally, I noticed that the default input sequence length for the Qwen2.5-72B-Instruct model is set to 32,768 tokens. Could you kindly let me know what sequence length you used in your experiments, or if you extended it beyond the default?
Thank you very much for your time and assistance.