I am trying to understand how you evaluated your models on the HSK dataset. Is the evaluation code released? I cannot find it. Could you also publish the dataset?
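We evaluated several models from two perspectives, teachers and learners, measuring their performance on the International Chinese Language Teacher Certification exam and the Chinese Proficiency Test (HSK). For HSK, we used the official past exam papers published in 2018 and selected one set for each level from 1 to 6. For the teacher certification exam, we used the official past papers published in 2021. The tests consist mainly of objective questions, and subjective questions are not scored. Taking HSK levels 4 to 6 as an example: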
Exam (objective questions) | Taoli 1.0 | GPT-4
-- | -- | --
HSK4 | 55 | 78
HSK5 | 60 | 85
HSK6 | 42 | 76
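The excerpt does not include the evaluation code, and it does not state how the numbers in the table were computed. The following is a minimal sketch of the scoring rule it describes (objective questions are scored, subjective ones are skipped). It assumes each objective item has a single answer key and that the model's answer has already been extracted as an option letter. `Question` and `score_objective` are hypothetical names and are not taken from the Taoli repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Question:
    qid: str
    objective: bool  # subjective questions are excluded from scoring
    gold: str = ""   # answer key for objective questions, e.g. "B" (assumed format)


def score_objective(questions: List[Question],
                    predictions: Dict[str, str]) -> Tuple[int, int]:
    """Return (correct, total) counted over objective questions only.

    Hypothetical sketch: predictions map question id -> extracted option letter.
    A missing prediction counts as wrong.
    """
    correct = total = 0
    for q in questions:
        if not q.objective:
            continue  # subjective items do not contribute to the score
        total += 1
        pred = predictions.get(q.qid, "").strip().upper()
        if pred == q.gold.strip().upper():
            correct += 1
    return correct, total


if __name__ == "__main__":
    qs = [
        Question("q1", objective=True, gold="A"),
        Question("q2", objective=True, gold="C"),
        Question("q3", objective=False),  # e.g. a writing task, not scored
    ]
    preds = {"q1": "a", "q2": "B", "q3": "some essay"}
    correct, total = score_objective(qs, preds)
    print(correct, total, 100 * correct / total)  # prints: 1 2 50.0
```

Whether the published figures are raw counts or percentages is not specified, so the sketch returns both counts and leaves the conversion to the caller.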