Fix: Default value of cosine_min_value_wrong parameter #305

Open
wants to merge 1 commit into
base: main


zhangsheng377

As the parameter names imply, min_value should be smaller than max_value, but the current default of cosine_min_value_wrong does not satisfy this. It is also inconsistent with the default of min_value_wrong in the get_cosine_scaled_reward function in rewards.py, which is correct:

def get_cosine_scaled_reward(
    min_value_wrong: float = -1.0,
    ...
):
...

Inside get_cosine_scaled_reward, min_value and max_value are swapped when the answer is incorrect:

if is_correct:
    min_value = min_value_correct
    max_value = max_value_correct
else:
    # Swap min/max for incorrect answers
    min_value = max_value_wrong
    max_value = min_value_wrong

reward = min_value + 0.5 * (max_value - min_value) * (1.0 + cosine)

Because of the swap, max_value - min_value in the formula evaluates to min_value_wrong - max_value_wrong. With the correct default this difference is negative; with the current default it is positive, so the shorter a wrong answer is, the higher its score. This contradicts the description in the get_cosine_scaled_reward function comment: "Longer incorrect solutions are penalized less than shorter ones."
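The sign argument above can be checked with a small standalone sketch. It is not the code from rewards.py: the helper name `wrong_answer_reward`, the `MAX_LEN` value, the definition of `cosine` as `cos(pi * gen_len / max_len)`, and every numeric value except `min_value_wrong = -1.0` are assumptions chosen for illustration.

```python
import math


def wrong_answer_reward(gen_len, max_len, min_value_wrong, max_value_wrong):
    """Reward for an incorrect answer, following the swap-and-interpolate step quoted above."""
    # Assumption: cosine falls from 1 (empty answer) to -1 (answer at max_len).
    cosine = math.cos(math.pi * gen_len / max_len)
    # Swap min/max for incorrect answers, as in the quoted snippet.
    min_value = max_value_wrong
    max_value = min_value_wrong
    return min_value + 0.5 * (max_value - min_value) * (1.0 + cosine)


MAX_LEN = 1000  # hypothetical maximum length

# Ordered pair (min < max). -1.0 is the default quoted above; -0.5 is an assumed value.
ordered = {"min_value_wrong": -1.0, "max_value_wrong": -0.5}
# Inverted pair (min > max). Both values are hypothetical stand-ins for a bad default.
inverted = {"min_value_wrong": 0.0, "max_value_wrong": -0.5}

for name, params in (("ordered", ordered), ("inverted", inverted)):
    short = wrong_answer_reward(0, MAX_LEN, **params)
    long_ = wrong_answer_reward(MAX_LEN, MAX_LEN, **params)
    print(f"{name}: shortest={short:.2f} longest={long_:.2f}")
```

Under these assumptions the ordered pair gives the shortest wrong answer the lowest reward (-1.0) and the longest the highest (-0.5), matching the documented behavior, while the inverted pair reverses that ordering.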

2 participants