Hi,
Thank you for sharing your wonderful study! The concepts of prefusion and vision compression are very intriguing.
I have made several attempts to reproduce your results with the LLaVA-1.5-7B model, but without success: the training loss decreases much as it does for the original LLaVA, yet downstream performance is much worse.
- So far I have tried varying the learning hyperparameters, running separate intermediate training stages for prefusion and compression with the projector and LLM frozen, and initializing from the LLaVA checkpoint.
Could you please share the exact training script used for the paper and release the model checkpoint of LLaVA-v1.5-Vicuna-7B for future research?
Thank you for looking into this.