Why are only mlp2x_gelu pretrained weights released when the code supports multiple projector types? #94

I noticed that the VideoLLaMA3 codebase supports multiple projector architectures:

mlp2x_gelu (currently used in all pretrained models)
mlp3x_gelu
linear
simp_spatial_conv

However, all of the officially released models (VideoLLaMA3-2B, VideoLLaMA3-7B, etc.) use only the mlp2x_gelu projector.
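For context on what the type names mean: in LLaVA-style codebases (and, as an assumption here, in VideoLLaMA3's projector builder), `mlp{N}x_gelu` denotes N linear layers with a GELU between consecutive layers, and `linear` is a single linear layer. The sketch below is plain Python (no PyTorch). It expands those type strings into a layer list and counts parameters to show how the variants differ in size. The function names `projector_layers` and `param_count` are hypothetical. The widths 1152 (vision features) and 1536 (LLM hidden size) are illustrative assumptions, not values read from the VideoLLaMA3 configs. `simp_spatial_conv` is deliberately not modeled because its structure isn't described in this issue.

```python
import re

def projector_layers(projector_type, vision_dim, llm_dim):
    """Expand a projector type string into a layer list (hypothetical helper,
    assuming the LLaVA-style naming convention)."""
    if projector_type == "linear":
        # A single linear map from vision features to the LLM hidden size.
        return [("linear", vision_dim, llm_dim)]
    match = re.fullmatch(r"mlp(\d+)x_gelu", projector_type)
    if match:
        depth = int(match.group(1))
        # First layer changes the width; later layers stay at llm_dim.
        layers = [("linear", vision_dim, llm_dim)]
        for _ in range(depth - 1):
            layers.append(("gelu",))
            layers.append(("linear", llm_dim, llm_dim))
        return layers
    raise ValueError(f"projector type not covered by this sketch: {projector_type}")

def param_count(layers):
    """Count weights + biases of the linear layers; GELU has no parameters."""
    total = 0
    for layer in layers:
        if layer[0] == "linear":
            _, d_in, d_out = layer
            total += d_in * d_out + d_out  # weight matrix + bias vector
    return total

# Illustrative widths (assumed): 1152-d vision features, 1536-d LLM hidden size.
for name in ("linear", "mlp2x_gelu", "mlp3x_gelu"):
    print(name, param_count(projector_layers(name, 1152, 1536)))
# linear 1771008
# mlp2x_gelu 4131840
# mlp3x_gelu 6492672
```

Under these assumed widths, each additional GELU + linear stage adds `llm_dim * llm_dim + llm_dim` parameters (about 2.36M here). Parameter count alone says nothing about accuracy, which is what the questions below ask about.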

Why were the other projector types not trained and released as alternatives?
Are there plans to release variants with different projector types?
Could you share any insights into how the different projector types compare in performance?
