
Performance of image level visual language tasks for alpha-clip LLaVA-1.5 #68

Description

@Lay-du

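Hello, the paper mentions that during Alpha-CLIP training the text encoder is frozen while the image encoder is fully fine-tuned on GRIT20M, and that the code implementation also loads the pretrained CLIP weights beforehand. Are the image encoder's initial weights loaded from the image encoder of the pretrained CLIP? If I directly replace the CLIP in LLaVA with clip_l14@336_grit_20m_4xe and then run the same pretraining and instruction-tuning stages as LLaVA, will the model's performance on image-level vision-language task benchmarks drop noticeably?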
