What dataset is D_G for policy SFT in 0-th iteration and further MCTS* ? #26

RewindL · 2025-02-07T06:35:51Z

Thanks for your remarkable work. I see the policy data collected from the 1st/2nd iteration of MCTS*. My question is: what is the initial question set $D_G$ that you use to SFT the initial policy model $\pi_{S_0}$ in Algorithm 1 and to run MCTS* in each iteration?

I guess a user-customized $D_G$ would work too, but I wish the $D_G$ dataset could be released just like $D_{V_0}$. Am I missing something? Is $D_G$ also constructed from SciInstruct, or do you use the train set corresponding to each test set (e.g. math-train as $D_G$ when evaluating on math-test)?

RewindL changed the title ~~D_S0 for policy SFT in 0-th iteration ?~~ What dataset is D_G for policy SFT in 0-th iteration and further MCTS* ? Feb 11, 2025
