Hi authors, thank you for the excellent paper and well-organized repository!
I have a question regarding the implementation of the demonstration-based personalization described in the paper. The method section states that informative sample demonstrations are incorporated alongside the synthesized persona to personalize the reward model. However, in the actual implementation (the `LLMAsAJudgeProgramPersona` class), it looks like only the persona is provided to the judge, without any demonstrations of previous preferences.
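For concreteness, here is a minimal sketch of what the persona-only path appears to amount to, reconstructed from the prompt trace below. This is DSPy-style pseudocode based on my reading; `make_persona_judge` and the exact instruction splicing are my guesses, not the repo's actual code:

```python
from typing import Literal

import dspy


# Signature inferred from the rendered prompt below; field names and
# descriptions match the trace, but this is a reconstruction, not the
# repo's code.
class JudgeCompletions(dspy.Signature):
    """Given a conversation and two completions from different models, alongside some
    prior judgements and a user persona, determine which completion the human judge
    is more likely to prefer."""

    conversation: str = dspy.InputField(desc="The conversation context leading up to the completions.")
    first_completion: str = dspy.InputField(desc="The first of the two possible completions to judge between.")
    second_completion: str = dspy.InputField(desc="The second of the two possible completions to judge between.")
    preference: Literal["First", "Second"] = dspy.OutputField(
        desc="The completion that the judge is more likely to prefer."
    )


def make_persona_judge(persona: str) -> dspy.Module:
    # The persona is spliced into the signature instructions; crucially,
    # no few-shot demonstrations are attached anywhere on this path.
    instructions = (
        JudgeCompletions.instructions
        + "\nThe user you are judging completions for has the FOLLOWING PERSONA: ===\n"
        + persona
        + "\n===\n"
    )
    return dspy.ChainOfThought(JudgeCompletions.with_instructions(instructions))
```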
I also ran the code and inspected a real call trace, which looked like this:
A judge call prompt:

```
Your input fields are:
1. `conversation` (str): The conversation context leading up to the completions.
2. `first_completion` (str): The first of the two possible completions to judge between.
3. `second_completion` (str): The second of the two possible completions to judge between.
Your output fields are:
1. `reasoning` (str)
2. `preference` (Literal['First', 'Second']): The completion that the judge is more likely to prefer. Possible values are 'First' and 'Second'.
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## conversation ## ]]
{conversation}
[[ ## first_completion ## ]]
{first_completion}
[[ ## second_completion ## ]]
{second_completion}
[[ ## reasoning ## ]]
{reasoning}
[[ ## preference ## ]]
{preference} # note: the value you produce must exactly match (no extra characters) one of: First; Second
[[ ## completed ## ]]
In adhering to this structure, your objective is:
Given a conversation and two completions from different models, alongside some prior judgements and a user persona, determine which completion the human judge is more likely to prefer. Use any provided context as well as the provided persona to speculate about the personal preferences of the judge. You are a personalized reward model for this user, so think carefully about what this user will like.
The user you are judging completions for has the FOLLOWING PERSONA: ===
The synthesized persona is a thoughtful and empathetic individual who values clarity, depth, and emotional sensitivity in responses. They are interested in understanding complex social and personal issues with nuance and appreciate guidance that combines practical advice with respect for individual feelings and boundaries. This user likely navigates interpersonal relationships carefully and seeks to foster harmony and understanding, especially in family and romantic contexts. They prefer explanations that educate and clarify, helping them to better comprehend and engage with diverse perspectives. This persona values communication that is both informative and compassionate, aiming to support personal growth and healthy relationships.
===
Now, given the conversation and two completions, decide which completion the user is more likely to prefer. Remember to consider the user's persona and preferences as you make your decision.
```
I don’t see demonstration examples being injected anywhere in this path.
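By contrast, this is roughly what I expected to find somewhere in the pipeline: prior judgements attached as DSPy few-shot demos, which the adapter would then render as completed `[[ ## ... ## ]]` blocks ahead of the live inputs. All names and record fields here are hypothetical:

```python
import dspy


def with_demonstrations(judge: dspy.Module, prior_judgements: list[dict]) -> dspy.Module:
    # Hypothetical sketch: turn stored preference records into dspy.Example demos.
    demos = [
        dspy.Example(
            conversation=j["conversation"],
            first_completion=j["first_completion"],
            second_completion=j["second_completion"],
            preference=j["preference"],  # the user's recorded choice
        ).with_inputs("conversation", "first_completion", "second_completion")
        for j in prior_judgements
    ]
    # Attach the demos to every predictor in the module; the chat adapter
    # would then emit them as completed examples before the actual query.
    for predictor in judge.predictors():
        predictor.demos = demos
    return judge
```

If demos were set like this, the trace above would contain filled-in conversation/preference blocks between the instructions and the final input, which it doesn't.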
My question is: Am I misunderstanding the intended design, or is the current implementation effectively doing “persona-only” personalization without demonstrations?