HSK evaluation? #2

@Jeffwan

I am trying to understand how you evaluated your models on the HSK dataset. Is the evaluation code released? I cannot find it. Could you also publish the dataset?

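From the perspectives of both teachers and learners, we evaluated several models on the International Chinese Language Teacher Certificate exam and the Chinese Proficiency Test (HSK). For HSK, we used the official past exam papers published in 2018, selecting one set from each of Levels 1 through 6. For the International Chinese Language Teacher Certificate exam, we used the official past exam papers published in 2021. The questions are mainly objective; subjective questions are not scored. Taking HSK Levels 4-6 as an example: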

Exam (objective questions) | Taoli 1.0 | GPT-4
-- | -- | --
HSK4 | 55 | 78
HSK5 | 60 | 85
HSK6 | 42 | 76
