I notice that the VideoLLaMA3 codebase has support for multiple projector architectures:
mlp2x_gelu (currently used in all pretrained models)
mlp3x_gelu
linear
simp_spatial_conv
However, all of the officially released models (VideoLLaMA3-2B, VideoLLaMA3-7B, etc.) use only the mlp2x_gelu projector.
Why were other projector types not trained and released as alternatives?
Are there plans to release variants with different projector types?
Can you share any insights on how the different projector types compare in performance?