Wrong repetition penalty imported #276

casper-hansen · 2025-02-10T16:54:57Z

Hi maintainers of open-r1. I wanted to comment on the latest repetition penalty after talking with one of the authors of the Demystifying Long CoT paper.

I just noticed that you imported the wrong repetition penalty from the Demystifying Long CoT code. The one actually used in the paper is the `RepetitionDensePenalty` class.

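This may be a problem because the two implementations differ in granularity: `get_repetition_penalty` computes a single global penalty for the whole sequence, whereas `RepetitionDensePenalty` applies penalties at the token level.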

I had an LLM generate a feature comparison:

| Feature | `RepetitionDensePenalty` | `get_repetition_penalty` |
| --- | --- | --- |
| Granularity | Token-level penalties | Global sequence penalty |
| Penalty Type | Fixed value per token | Scaled based on repetition |
| Implementation | Modifies token rewards directly | Computes a single penalty score |
| Use Case | Reinforcement learning (RL) reward models | Global repetition control in scoring |
| Effect on Rewards | Directly affects token-level reward tensor | Adjusts overall sequence score |
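
To make the global vs token-level distinction concrete, here is a minimal sketch in plain Python. It is not the code from either repository: the function names `global_repetition_penalty` and `dense_repetition_penalty`, their parameters, and the choice to penalize every token of a repeated n-gram are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def global_repetition_penalty(tokens: Sequence[str], n: int, max_penalty: float) -> float:
    """Hypothetical global variant: one scalar for the whole sequence.

    The penalty is scaled by the fraction of n-grams that are repeats.
    `max_penalty` is expected to be <= 0.
    """
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    repeated_fraction = 1.0 - len(set(grams)) / len(grams)
    return repeated_fraction * max_penalty


def dense_repetition_penalty(tokens: Sequence[str], n: int, penalty: float) -> List[float]:
    """Hypothetical token-level variant: one reward entry per token.

    Every token that belongs to an n-gram already seen earlier in the
    sequence receives the fixed `penalty`; all other tokens get 0.0.
    """
    rewards = [0.0] * len(tokens)
    seen = set()
    for start, gram in enumerate(ngrams(tokens, n)):
        if gram in seen:
            for i in range(start, start + n):
                rewards[i] = penalty
        seen.add(gram)
    return rewards


tokens = "a b c a b c d".split()
sequence_score = global_repetition_penalty(tokens, n=2, max_penalty=-1.0)
token_rewards = dense_repetition_penalty(tokens, n=2, penalty=-1.0)
print(sequence_score)  # a single number for the whole completion
print(token_rewards)   # one entry per token, nonzero only on repeats
```

In this sketch the global variant tells the policy only that the completion as a whole was repetitive, while the dense variant points at the exact positions where repetition happened, which is the difference the table above describes.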

edbeeching · 2025-02-12T17:42:41Z

Thanks, I will look into this tomorrow.

kashif linked a pull request Feb 12, 2025 that will close this issue

[rewards] use dense rep penalty #296

Open
