Thank you for your great work! I have encountered gaps when reproducing the reported results of some baseline models.
For example, the reported avg QA GPT-acc of LLaVA-1.5 is 17.18, but I only get 11.46 when I try to reproduce it.
Could you kindly release the evaluation prompts and scripts used for the baseline models?
Thanks for your time and effort.