Could you disclose the experimental settings for SC-MoE in Figure 4? #39

@yimeng0701

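As the title says.
Could you share the specific model architectures, experimental settings, and number of training tokens for the comparison experiments in Figure 4: (a) 2.4B-16B with MLA, (b) 3B-20B with MHA, and (c) 15B-193B with GQA?

In my own tests, SC-MoE does not seem to perform this well, so I would like to learn where my experimental setup differs from yours.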
