Hi maintainers of open-r1. I wanted to comment on the recently added repetition penalty after talking with one of the authors of the Demystifying Long CoT paper.
I noticed that you imported the wrong repetition penalty from the Demystifying Long CoT code. The penalty actually used in the paper is the class `RepetitionDensePenalty`.
This may be a problem because the two implementations differ in granularity: `get_repetition_penalty` produces a single global penalty for the whole sequence, whereas `RepetitionDensePenalty` applies the penalty at the token level.
I had an LLM generate a feature comparison of the two:
| Feature | RepetitionDensePenalty | get_repetition_penalty |
| --- | --- | --- |
| Granularity | Token-level penalties | Global sequence penalty |
| Penalty Type | Fixed value per token | Scaled based on repetition |
| Implementation | Modifies token rewards directly | Computes a single penalty score |
| Use Case | Reinforcement learning (RL) reward models | Global repetition control in scoring |
| Effect on Rewards | Directly affects token-level reward tensor | Adjusts overall sequence score |
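To make the global vs. token-level distinction concrete, here is a minimal sketch of the two styles of penalty. It is not copied from either codebase: the function names, the n-gram size, and the penalty values are all hypothetical, and I am assuming both variants detect repetition via repeated n-grams.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def global_repetition_penalty(
    tokens: Sequence[str], n: int = 3, max_penalty: float = -1.0
) -> float:
    """Hypothetical sequence-level penalty: one scalar per completion.

    The penalty scales with the fraction of n-grams that are repeats.
    """
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    repeated_fraction = 1.0 - len(set(grams)) / len(grams)
    return repeated_fraction * max_penalty


def dense_repetition_penalty(
    tokens: Sequence[str], n: int = 3, penalty: float = -0.05
) -> List[float]:
    """Hypothetical token-level penalty: one value per token.

    Every token inside an n-gram that already appeared earlier in the
    sequence receives the same fixed penalty; all other tokens get 0.
    """
    per_token = [0.0] * len(tokens)
    seen = set()
    for start, gram in enumerate(ngrams(tokens, n)):
        if gram in seen:
            for i in range(start, start + n):
                per_token[i] = penalty
        seen.add(gram)
    return per_token
```

With the global variant, every token in the completion shares the same scalar, so the learning signal does not say where the repetition happened. With the dense variant, only the tokens inside the repeated spans are penalized, which is the difference I am worried about here.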