Support Issue #4
User Simulation for Multi-Turn RL
We propose adding a user simulator to an RL framework to create realistic, varied user interactions for better training and testing.
Motivation:
The motivation for modifying the user simulation mechanism is to create a more realistic and flexible environment for training and evaluating AI agents. The previous approach relied on static or overly simplistic user feedback, which limited the diversity and authenticity of simulated interactions. By enhancing the simulation, we aim to better mimic real-world user behavior, improve the robustness of agent training, and enable more accurate benchmarking of agent performance.
Key Points:
Persona-Based Feedback:
The new mechanism samples user personas from a configurable dataset, allowing simulated feedback to reflect a variety of user backgrounds and preferences. This increases the diversity and realism of the feedback provided to the agent.
Dynamic Feedback Generation:
Instead of using only pre-defined feedback, the system now leverages an LLM (e.g., DeepSeek) to generate contextually relevant, concise, and constructive feedback based on the persona and the agent’s response. This makes the feedback more adaptive and nuanced.
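A sketch of this flow, with the LLM call injected as a callable so the fallback path is explicit (the prompt wording, function names, and fallback list below are illustrative assumptions, not the actual prompt):

```python
import random

# Hypothetical fallback messages used when the LLM is unavailable.
FALLBACK_FEEDBACKS = [
    "Could you make the answer more concise?",
    "That mostly works, but please double-check the details.",
]


def build_feedback_prompt(persona: dict, agent_response: str) -> str:
    """Compose the instruction sent to the feedback LLM (assumed format)."""
    return (
        f"You are a user with this persona: {persona}.\n"
        f"The assistant replied:\n{agent_response}\n"
        "Give one concise, constructive piece of feedback, in character."
    )


def generate_feedback(persona: dict, agent_response: str, llm_call=None) -> str:
    """Ask the LLM for persona-conditioned feedback; fall back on failure.

    `llm_call` is any callable mapping a prompt string to a completion
    string (e.g. a thin wrapper around a DeepSeek client).
    """
    if llm_call is not None:
        try:
            return llm_call(build_feedback_prompt(persona, agent_response)).strip()
        except Exception:
            pass  # network or model failure: degrade to a canned message
    return random.choice(FALLBACK_FEEDBACKS)
```

Injecting the client as a callable also makes the feedback path trivially testable without a live model.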
Configurable and Extensible:
The simulation parameters, such as feedback probability, persona dataset, and fallback feedback messages, are now easily configurable via YAML files. This design allows straightforward extension and customization for different research or application needs.
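A configuration file along these lines might look as follows; the key names and values are a hypothetical illustration of the parameters listed above, not the actual schema:

```yaml
user_simulation:
  feedback_probability: 0.5          # chance that a turn receives simulated feedback
  persona_dataset: data/personas.jsonl
  llm:
    provider: deepseek               # any OpenAI-compatible endpoint could be swapped in
    model: deepseek-chat
  fallback_feedbacks:
    - "Could you make the answer more concise?"
    - "Please double-check the details."
```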
Seamless Integration:
The new mechanism is integrated into the agent’s workflow, ensuring that feedback is only provided when required (controlled by a flag), and that the feedback is appended to the next observation in a way that is compatible with the agent’s prompt structure.