What dataset is D_G for policy SFT in 0-th iteration and further MCTS* ? #26

Open
RewindL opened this issue Feb 7, 2025 · 0 comments

RewindL commented Feb 7, 2025

Thanks for your remarkable work. I see the policy data collected from the 1st and 2nd iterations of MCTS*. My question is: what is the initial question set $D_G$ that you use to SFT the initial policy model $\pi_{S_0}$ in Algorithm 1 and to run MCTS* in the subsequent iterations?

I guess a user-customized $D_G$ would work too, but I wish the dataset $D_G$ could be released just like $D_{V_0}$. Am I missing something? Is $D_G$ also constructed from SciInstruct, or is it the train set corresponding to each test set (e.g. math-train as $D_G$ when evaluating on math-test)?

RewindL changed the title from "D_S0 for policy SFT in 0-th iteration ?" to "What dataset is D_G for policy SFT in 0-th iteration and further MCTS* ?" on Feb 11, 2025