[NOT MEANT TO MERGE!] GRPO reward func for coding dataset #105

August-murr · 2025-01-29T08:00:23Z

Refers to #28.

This is an example of how to use the OpenCoders dataset to build a reward function for the GRPOTrainer. The reward function parses the generated code, creates an evaluation script, and executes it using E2B. It scores accuracy by counting the number of test cases that passed, and it also measures the execution time.
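For anyone who wants to try the idea without reading the diff, here is a minimal sketch of the parse, build, execute, and score steps. It is not the code from this PR. It assumes completions are plain strings, that the dataset exposes a hypothetical `tests` column holding single-line `assert` statements, and it uses a local `subprocess` as a stand-in for the E2B sandbox, so it should only be run on trusted code. All helper names (`extract_code`, `build_eval_script`, `run_script`, `code_reward`) are made up for illustration.

```python
import re
import subprocess
import sys
import tempfile
import time
from pathlib import Path

FENCE = "`" * 3  # markdown code fence, built this way to keep the example easy to paste
CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> str:
    """Return the last fenced code block in a completion, or "" if there is none."""
    blocks = CODE_BLOCK.findall(completion)
    return blocks[-1].strip() if blocks else ""


def build_eval_script(code: str, tests: list[str]) -> str:
    """Append a harness that runs each single-line test and prints the pass count."""
    lines = [code, "", "_passed = 0"]
    for test in tests:
        lines += ["try:", f"    {test}", "    _passed += 1", "except Exception:", "    pass"]
    lines.append('print(f"PASSED={_passed}")')
    return "\n".join(lines)


def run_script(script: str, timeout: float = 10.0) -> tuple[int, float]:
    """Run the script in a subprocess (assumed stand-in for a sandbox).

    Returns (tests passed, wall-clock seconds). A crash or timeout counts as 0 passed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "eval.py"
        path.write_text(script)
        start = time.perf_counter()
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0, timeout
        elapsed = time.perf_counter() - start
    # Take the last marker so that the harness line wins over earlier output.
    found = re.findall(r"PASSED=(\d+)", proc.stdout)
    return (int(found[-1]) if found else 0), elapsed


def code_reward(completions: list[str], tests: list[list[str]], **kwargs) -> list[float]:
    """Reward = fraction of test cases passed, one float per completion."""
    rewards = []
    for completion, cases in zip(completions, tests):
        code = extract_code(completion)
        if not code or not cases:
            rewards.append(0.0)
            continue
        passed, _elapsed = run_script(build_eval_script(code, cases))
        rewards.append(passed / len(cases))
    return rewards
```

The sketch returns the execution time from `run_script` but does not fold it into the reward. How to weight speed against accuracy is a separate design choice that this example leaves open.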

It's not perfect since there are edge cases and potential failures that have not been addressed.

I haven't been able to test it with GRPO due to other issues. I would appreciate it if anyone could try this out and see if there are any other issues or if it works properly.

A similar approach was used to train R1, although it likely relied solely on a LeetCode dataset.

kalogyu · 2025-01-31T16:28:22Z

Thanks for sharing the approach. I understand that this implementation is still in the testing phase and has some potential edge cases. I'll be happy to help test it out and look for any issues.

August-murr added 2 commits January 29, 2025 07:43

grpo reward func for coding

30944fe

Delete .gitpod.yml

62edc88

August-murr mentioned this pull request Feb 10, 2025

Can't find: for LeetCode problems, a compiler can be used to generate feedback based on predefined test cases. #261

Closed
