
Why only mlp2x_gelu pretrained weights when code supports multiple projector types? #94

Description

@swetha4444

I notice that the VideoLLaMA3 codebase has support for multiple projector architectures:

- mlp2x_gelu (currently used in all pretrained models)
- mlp3x_gelu
- linear
- simp_spatial_conv

However, all of the officially released models (VideoLLaMA3-2B, VideoLLaMA3-7B, etc.) use only the mlp2x_gelu projector.
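
For context, here is a minimal sketch of how I read these projector names. It assumes the LLaVA-style convention in which `mlpNx_gelu` means N linear layers with GELU activations between them; I have not confirmed that VideoLLaMA3 builds its projectors exactly this way. `describe_projector` and the dimensions are hypothetical, and `simp_spatial_conv` is left undecoded because I do not know its layout.

```python
import re


def describe_projector(projector_type, in_dim, out_dim):
    """Hypothetical helper: list the layers implied by a projector type name.

    Assumes the LLaVA-style naming convention; not taken from VideoLLaMA3.
    """
    if projector_type == "linear":
        # A single projection from vision features to the LLM hidden size.
        return [f"Linear({in_dim}, {out_dim})"]

    match = re.fullmatch(r"mlp(\d+)x_gelu", projector_type)
    if match:
        depth = int(match.group(1))
        if depth < 1:
            raise ValueError("MLP depth must be at least 1")
        # First layer maps in_dim -> out_dim; later layers stay at out_dim.
        layers = [f"Linear({in_dim}, {out_dim})"]
        for _ in range(depth - 1):
            layers.append("GELU()")
            layers.append(f"Linear({out_dim}, {out_dim})")
        return layers

    # Other types (e.g. simp_spatial_conv) are not decoded in this sketch.
    raise ValueError(f"Projector type not covered by this sketch: {projector_type}")


if __name__ == "__main__":
    for name in ("linear", "mlp2x_gelu", "mlp3x_gelu"):
        print(name, describe_projector(name, 4, 8))
```

Under this reading, mlp3x_gelu differs from mlp2x_gelu only by one extra GELU and linear layer, which is part of why I am curious whether it was ever compared.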

Why were other projector types not trained and released as alternatives?
Are there plans to release variants with different projector types?
Can you share insights on the performance comparison between different projector types?
