
ulysses: num_attention_head divisibility issue #5

Description

@xs1997zju

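Hello. According to the paper and the code, sequence parallelism uses DeepSpeed-Ulysses with the parallel degree set to the world size. However, DeepSpeed-Ulysses requires num_attn_heads to be divisible by the parallel degree. On a single node with 8 GPUs, Qwen2.5 has num_attn_heads = 28, which is not divisible by 8. Can this actually run?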
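The divisibility constraint comes from how Ulysses redistributes work. Its all-to-all turns a sequence-sharded layout (each rank holds seq/P tokens for all heads) into a head-sharded layout (each rank holds all tokens for heads/P heads), so the heads must split evenly across the P ranks. Below is a minimal sketch of that check, assuming the head count of 28 from the question and an 8-GPU world size. `ulysses_heads_per_rank` and `valid_sp_degrees` are hypothetical helpers for illustration, not DeepSpeed APIs.

```python
def ulysses_heads_per_rank(num_heads: int, sp_degree: int) -> int:
    """Heads each rank holds after the Ulysses all-to-all (hypothetical helper).

    Ulysses swaps a sequence-sharded layout [seq/P, heads, dim] for a
    head-sharded layout [seq, heads/P, dim], so heads must split evenly.
    """
    if num_heads % sp_degree != 0:
        raise ValueError(
            f"num_heads={num_heads} is not divisible by sp_degree={sp_degree}"
        )
    return num_heads // sp_degree


def valid_sp_degrees(num_heads: int, world_size: int) -> list[int]:
    """Sequence-parallel degrees up to world_size that evenly divide num_heads."""
    return [p for p in range(1, world_size + 1) if num_heads % p == 0]


if __name__ == "__main__":
    num_heads, world_size = 28, 8  # head count from the question, one 8-GPU node
    print(valid_sp_degrees(num_heads, world_size))  # [1, 2, 4, 7]
    try:
        ulysses_heads_per_rank(num_heads, world_size)
    except ValueError as err:
        print(err)  # num_heads=28 is not divisible by sp_degree=8
```

Under these assumptions, an even head split on one 8-GPU node is possible only with a sequence-parallel degree of 1, 2, 4, or 7, not 8.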
