Train Strategy PPO agent #923
Unanswered
101AlexMartin
asked this question in Q&A
Replies: 0 comments
Hello. I'm relatively new to RL and I'm trying to train a PPO agent, but its performance over episodes is very unstable: sometimes the average return (over a window of 50 episodes) is positive, sometimes negative. I wonder if the way I'm training the PPO agent is correct. This is what I currently do:
This does not make much sense to me based on my experience with ANNs, where in every epoch the loss is computed over the whole training set. To train the PPO agent, what would make sense to me is to use the whole experience dataset and only update the agent after a certain number of new episodes have been added to it. But I wonder whether in RL this is not the case, and agents are instead retrained every time a new episode is added to the dataset, using a batch size smaller than the total number of episodes in the dataset.
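For context on the "collect a batch of new episodes, then train" scheme described above, this is a minimal sketch of how a standard on-policy PPO loop is typically configured. It assumes the Stable-Baselines3 library and the CartPole-v1 environment, which are illustrative choices and not taken from the post itself:

```python
# Minimal sketch of a standard on-policy PPO setup (assumes Stable-Baselines3
# and CartPole-v1; both are illustrative choices, not from the original post).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    n_steps=2048,    # transitions collected per rollout before each update
    batch_size=64,   # minibatch size used within one update
    n_epochs=10,     # passes over the freshly collected rollout per update
    verbose=1,
)

# learn() alternates between collecting n_steps of fresh experience and
# running n_epochs of minibatch gradient steps on that rollout buffer;
# the buffer is then discarded rather than kept and grown, because PPO
# only trains on data gathered by the current policy (on-policy).
model.learn(total_timesteps=100_000)
```

In this kind of setup the agent is not updated after every single episode on an ever-growing dataset; it is updated after each fixed-size batch of fresh experience, which may be part of the behavior being asked about here.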