
Tune hyperparameters for RLHF model #4

@GrantorShadow

Description

Increase the training iterations: Train the PPO model for more iterations, as it may not have converged yet.
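
Before simply adding iterations, it can help to check whether the reward curve is still improving. A minimal sketch (the helper name, window size, and tolerance are illustrative, not from this issue):

```python
def reward_plateaued(rewards, window=10, tol=1e-2):
    """Return True when the mean episode reward over the most recent
    window is no longer improving versus the window before it,
    suggesting that more iterations alone may not help."""
    if len(rewards) < 2 * window:
        return False  # not enough history to judge convergence
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return recent - previous < tol
```

If this returns False, extending training (a larger `total_timesteps` in `model.learn(...)`) is a reasonable first step; if True, the other suggestions below are more likely to pay off.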

Adjust the PPO hyperparameters: Experiment with different hyperparameters such as the learning rate, batch size, and discount factor; refer to the Stable Baselines documentation for details.
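
One concrete knob worth trying is a decaying learning rate. Stable Baselines3 accepts a callable of the remaining training progress (1.0 at the start, 0.0 at the end) for `learning_rate`; a linear decay can be sketched as follows (the initial value 3e-4 is just an example):

```python
def linear_schedule(initial_lr):
    """Return a schedule mapping progress_remaining (1.0 -> 0.0 over
    training) to a linearly decayed learning rate."""
    def schedule(progress_remaining):
        return progress_remaining * initial_lr
    return schedule

# Usage sketch (assuming Stable Baselines3):
# model = PPO("MlpPolicy", env, learning_rate=linear_schedule(3e-4))
```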

Modify the MLP architecture: Adjust the number of layers and neurons in the MLP, as well as the activation functions, to improve the model's capacity to learn complex patterns.
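
In Stable Baselines3, the MLP layers and activation are configured through `policy_kwargs`; a small helper sketching this (the helper itself and the layer sizes are hypothetical, and `activation_fn` would typically be a class like `torch.nn.Tanh`):

```python
def make_policy_kwargs(hidden_sizes, activation_fn=None):
    """Build a policy_kwargs dict with separate actor (pi) and
    critic (vf) layer lists plus an optional activation class."""
    kwargs = {"net_arch": {"pi": list(hidden_sizes), "vf": list(hidden_sizes)}}
    if activation_fn is not None:
        kwargs["activation_fn"] = activation_fn
    return kwargs

# Usage sketch:
# model = PPO("MlpPolicy", env, policy_kwargs=make_policy_kwargs([128, 128]))
```

Wider or deeper networks increase capacity but also training time and overfitting risk, so it is worth changing one dimension at a time.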

Experiment with other algorithms: The Stable Baselines library offers several other reinforcement learning algorithms, such as DDPG, SAC, and A2C; experiment with these as well.
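
Note that not every algorithm fits every environment: SAC and DDPG only handle continuous action spaces, while PPO and A2C handle both. A rough selection guide (the mapping below is a simplified sketch; check the library's docs for the full support matrix):

```python
# Which algorithms support which action-space types (simplified).
ALGO_ACTION_SPACES = {
    "PPO":  {"discrete", "continuous"},
    "A2C":  {"discrete", "continuous"},
    "SAC":  {"continuous"},
    "DDPG": {"continuous"},
}

def candidate_algos(action_space):
    """Return algorithm names applicable to the given action-space type."""
    return sorted(a for a, s in ALGO_ACTION_SPACES.items() if action_space in s)
```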
