Hi, great work on LogicPuzzleRL!
Quick question: in Equation 1 (Section 3.2), you describe an intermediate reward r_int that evaluates reasoning steps. I can find r_fmt and r_final in the code, but I can't locate where r_int is implemented. Could you point me to it?
Thanks