MTP weights cannot be exported from DCP to HF safetensors with megatron export or swift export #9504

@HowardZorn

Checklist

  • I have searched existing issues, and this is a new bug report.

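Bug Description

If you choose to save checkpoints in DCP format during training, generally the only way to convert them to HF safetensors is MS-Swift's built-in export commands. However, neither megatron export nor swift export produces the MTP weights when processing a DCP checkpoint.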

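How to Reproduce

Take any model with MTP (for example, the Qwen3.5 series) and train it with the checkpoint format set to DCP. Then convert the checkpoint with megatron export or swift export to reproduce the problem.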
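One way to confirm the symptom after conversion is to list the parameter names in the exported checkpoint and look for MTP entries. The following is a minimal sketch, not part of MS-Swift: it assumes the export is a sharded HF checkpoint with a `model.safetensors.index.json` file, and it assumes MTP parameter names contain the substring `mtp`. The helper names `load_weight_map` and `find_mtp_keys` are hypothetical.

```python
import json
from pathlib import Path


def load_weight_map(export_dir):
    """Read the parameter-name -> shard-file map of a sharded HF export.

    Assumes the export directory contains `model.safetensors.index.json`.
    """
    index_path = Path(export_dir) / "model.safetensors.index.json"
    with index_path.open(encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def find_mtp_keys(weight_map, marker="mtp"):
    """Return the sorted parameter names that contain `marker`.

    The `mtp` marker is an assumption about how MTP weights are named.
    """
    return sorted(name for name in weight_map if marker in name.lower())
```

Calling `find_mtp_keys(load_weight_map(export_dir))` on the converted directory and getting an empty list, while the original HF model's index does contain such names, would match the behavior described here.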

Additional Information

I suspect the problem is in this code:
https://github.com/modelscope/ms-swift/blob/277c9702ecd738877c064a6409881bfb32a8784b/swift/megatron/arguments/megatron_args.py#L663-664

If args.json is loaded directly, the MTP arguments are ignored. Is this a feature or a bug? Should I submit a PR to fix it?
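To help narrow this down, it may be useful to check which MTP-related settings the checkpoint's args.json actually records before they reach the export code. This is a rough sketch under assumptions: it treats args.json as a flat JSON object and treats any key containing `mtp` (such as Megatron's `mtp_num_layers`) as MTP-related. The helper names `load_args` and `mtp_args` are hypothetical and are not MS-Swift APIs.

```python
import json


def load_args(path):
    """Load a checkpoint's args.json as a plain dict (assumed flat JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mtp_args(args, marker="mtp"):
    """Return only the entries whose key contains `marker`.

    The `mtp` marker is an assumption about how MTP arguments are named.
    """
    return {key: value for key, value in args.items() if marker in key.lower()}
```

If `mtp_args(load_args(path))` shows the MTP settings in the file but the export still runs without them, that would point at the loading step rather than at what was saved.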

Labels: bug