Thanks for your remarkable work. I can see the policy data collected from the 1st and 2nd iterations of MCTS*. My question is: what is the initial question set $D_G$ that you use to SFT the initial policy model $\pi_{S_0}$ in Algorithm 1 and to run MCTS* over the iterations?
I guess a user-customized $D_G$ would work too, but I hope $D_G$ can be released just like $D_{V_0}$. Am I missing something? Is $D_G$ also constructed from SciInstruct, or do you use the training set corresponding to each test set (e.g., math-train as $D_G$ when evaluating on math-test)?
RewindL changed the title from "D_S0 for policy SFT in 0-th iteration ?" to "What dataset is D_G for policy SFT in 0-th iteration and further MCTS* ?" on Feb 11, 2025.