Add user simulation #13

Open
JiwenJ wants to merge 9 commits into Simple-Efficient:main from JiwenJ:add_user_simulation

@JiwenJ JiwenJ commented Jun 3, 2025

Supports issue #4.

User Simulation for Multi-Turn RL

This PR adds a user simulator to the RL framework to produce realistic, varied user interactions for training and evaluation.

Motivation:
The previous user simulation relied on static or overly simplistic feedback, which limited the diversity and authenticity of simulated interactions. A more realistic and flexible simulator better mimics real-world user behavior, makes agent training more robust, and allows more accurate benchmarking of agent performance.

Key Points:

  1. Persona-Based Feedback:
    The new mechanism samples user personas from a configurable dataset, allowing simulated feedback to reflect a variety of user backgrounds and preferences. This increases the diversity and realism of the feedback provided to the agent.

  2. Dynamic Feedback Generation:
    Instead of relying only on predefined feedback, the system uses an LLM (e.g., DeepSeek) to generate concise, constructive, contextually relevant feedback conditioned on the persona and the agent’s response. This makes the feedback more adaptive and nuanced.

  3. Configurable and Extensible:
    Simulation parameters such as feedback probability, persona dataset, and fallback feedback messages are configurable via YAML files, so the simulator can be extended and customized for different research or application needs.

  4. Seamless Integration:
    The mechanism is integrated into the agent’s workflow: feedback is provided only when a flag enables it, and it is appended to the next observation in a form compatible with the agent’s prompt structure.
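
The four points above can be illustrated with a minimal sketch. It is not the code in this PR: `UserSimConfig`, `UserSimulator`, `maybe_feedback`, `append_feedback`, and the `"User feedback:"` prefix are all hypothetical names chosen for illustration, the config values are inlined rather than loaded from YAML, and the LLM call is abstracted as a plain callable so that no particular client API is assumed.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class UserSimConfig:
    # Hypothetical fields; in the PR these values would come from a YAML file.
    feedback_probability: float = 0.5
    personas: Sequence[str] = ("a novice user who prefers short answers",)
    fallback_feedback: Sequence[str] = ("Please explain your reasoning more clearly.",)


class UserSimulator:
    """Samples a persona and produces feedback, falling back to canned text."""

    def __init__(
        self,
        config: UserSimConfig,
        generate_fn: Optional[Callable[[str, str], str]] = None,
        seed: Optional[int] = None,
    ) -> None:
        self.config = config
        # generate_fn(persona, agent_response) stands in for the LLM call.
        self.generate_fn = generate_fn
        self.rng = random.Random(seed)

    def maybe_feedback(self, agent_response: str, enabled: bool = True) -> Optional[str]:
        # The flag and the feedback probability both gate whether feedback is given.
        if not enabled or self.rng.random() >= self.config.feedback_probability:
            return None
        persona = self.rng.choice(list(self.config.personas))
        if self.generate_fn is not None:
            try:
                text = self.generate_fn(persona, agent_response)
                if text and text.strip():
                    return text.strip()
            except Exception:
                # Any generation failure falls through to the canned feedback.
                pass
        return self.rng.choice(list(self.config.fallback_feedback))


def append_feedback(observation: str, feedback: Optional[str]) -> str:
    """Attach feedback to the next observation; the prefix format is an assumption."""
    if feedback is None:
        return observation
    return f"{observation}\n\nUser feedback: {feedback}"
```

In a rollout loop, the environment would call `maybe_feedback` on each agent response and pass the result through `append_feedback` before building the next prompt. Keeping the LLM behind a callable makes the fallback path easy to exercise in tests without network access.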

@gjyin gjyin marked this pull request as draft October 3, 2025 08:50
@gjyin gjyin marked this pull request as ready for review October 3, 2025 08:51