I have tried the OpenVLA-7B-GRAPE-Simpler and OpenVLA-7B-SFT-Simpler checkpoints that you released on Hugging Face on the same task, PutCarrotOnthePlate, in Simpler-Env, but the success rate (SR) is very unstable across runs. The SR for SFT sometimes reaches 40% and sometimes falls to 16%. The same happens for GRAPE, whose SR jumps between 18% and 38%. Judging by the success rate on this one task, GRAPE does not seem to outperform SFT by much. Did you encounter this problem? I wonder whether it is related to the random initial state of each episode.
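To help judge whether a spread like this could come from sampling noise alone, here is a minimal sketch that computes a 95% Wilson score interval for a measured success rate. The episode count `N_EPISODES = 50` is a hypothetical value chosen only for illustration, not the number used in the runs above, and `wilson_interval` is a helper written for this post, not part of Simpler-Env or OpenVLA.

```python
import math


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    if not 0 <= successes <= episodes:
        raise ValueError("successes must be between 0 and episodes")

    p = successes / episodes
    z2 = z * z
    denom = 1.0 + z2 / episodes
    center = (p + z2 / (2 * episodes)) / denom
    half_width = z * math.sqrt(p * (1 - p) / episodes + z2 / (4 * episodes**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical number of evaluation episodes per run (an assumption).
N_EPISODES = 50

# Success rates reported above, as fractions.
reported = {
    "SFT (low)": 0.16,
    "SFT (high)": 0.40,
    "GRAPE (low)": 0.18,
    "GRAPE (high)": 0.38,
}

for label, sr in reported.items():
    successes = round(sr * N_EPISODES)
    low, high = wilson_interval(successes, N_EPISODES)
    print(f"{label}: SR={sr:.2f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

If the intervals for two runs overlap substantially at the real episode count, the difference between those runs may not be meaningful. Fixing the seed for the initial states, or averaging over more episodes, would be one way to separate policy differences from episode-to-episode randomness.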