A 3D Unity project for training AI models with reinforcement learning. It corresponds to the group practical assignment of the ARTIFICIAL INTELLIGENCE (95.25) course at the Faculty of Engineering of the University of Buenos Aires (FIUBA).
The idea of the project was to create two new examples of relatively simple environments and to refine or improve two of the examples provided by the ML-Agents team in the toolkit (four examples in total), in order to cover more aspects of the subject while keeping the scope manageable. This approach was chosen because the group had no previous experience with Unity or with reinforcement learning.
Reinforcement learning is a training technique that rewards desired behaviors and penalizes undesired ones. It is learning by experience: the agent constantly searches for the decisions that reward it and avoids the paths that, based on its own experience, are penalized. The main concepts are:
- Agent: The entity that learns and makes decisions.
- Environment: The context in which the agent interacts and receives feedback.
- Observations: The different elements that make up the environment. They correspond to the input layer of the neural network.
- Actions: The options the agent can take in response to the observations of the environment. They correspond to the output layer of the neural network.
- Rewards: The positive or negative feedback that the agent receives for its actions.
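In ML-Agents these concepts map onto a C# class that inherits from `Agent`: observations are collected in `CollectObservations`, actions arrive in `OnActionReceived`, and rewards are assigned with `AddReward`. The following is a minimal, hypothetical sketch; the class, field names, and reward values are illustrative and not taken from this project.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Minimal illustrative agent: names and reward values are hypothetical.
public class ExampleAgent : Agent
{
    public Transform target;   // object the agent tries to reach

    public override void OnEpisodeBegin()
    {
        // Reset agent and environment state at the start of each episode.
        transform.localPosition = Vector3.zero;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observations: what the agent "sees" (input layer of the network).
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Actions: what the agent decides to do (output layer of the network).
        var move = new Vector3(actions.ContinuousActions[0], 0f, actions.ContinuousActions[1]);
        transform.localPosition += move * Time.deltaTime;

        // Rewards: feedback that shapes the learned behavior.
        if (Vector3.Distance(transform.localPosition, target.localPosition) < 1f)
        {
            AddReward(1.0f);
            EndEpisode();
        }
    }
}
```

The `Heuristic` method can also be overridden to control the agent manually while testing the environment.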
This is a simple example created from scratch where the agent learns on its own within the environment. With a limited set of observations and actions, it tries to score a basketball into a hoop; it is rewarded when it succeeds and is also penalized under certain conditions so that it reaches the desired behavior more quickly (a sketch of such a reward scheme is shown below).
Start | Final Result |
---|---|
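As a rough illustration of the reward scheme described above, a scoring trigger can grant a reward and end the episode; the names and values below are hypothetical and not the project's exact ones.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical scoring trigger for the Basketball environment.
public class HoopTrigger : MonoBehaviour
{
    public Agent shooter;   // the agent being trained

    private void OnTriggerEnter(Collider other)
    {
        // Reward the agent when the ball passes through the hoop.
        if (other.CompareTag("Ball"))
        {
            shooter.AddReward(1.0f);
            shooter.EndEpisode();
        }
    }
}
```

A small negative reward on every step (for example, `AddReward(-0.001f)` inside the agent's `OnActionReceived`) is one common way to penalize behaviors that take too long.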
Again, this is an example of an agent that learns on its own in the environment; in this case it is one of the examples provided by the Unity ML-Agents toolkit, which we sought to improve. The focus was on achieving a more human-like walking behavior for the agent, an iterative process with various tests until a satisfactory result was finally reached.
Start | Final Result |
---|---|
This example was also created from scratch, with the aim of covering agent vs. agent training, where the agents learn by playing against each other. Several problems arose in achieving the desired behavior, since the agents maximized their rewards by exploiting unforeseen situations, but the expected result was finally achieved with a broad set of rewards.
Start | Final Result |
---|---|
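A common way to structure this kind of adversarial training is a symmetric, zero-sum reward: the winner of a point gets +1, the loser gets -1, and both episodes end together. The sketch below is an assumption for illustration, not the project's exact code.

```csharp
using Unity.MLAgents;

// Hypothetical zero-sum reward assignment for two competing agents.
public static class MatchResolver
{
    public static void ResolvePoint(Agent winner, Agent loser)
    {
        // Symmetric, zero-sum rewards are the usual setup for self-play training.
        winner.SetReward(1.0f);
        loser.SetReward(-1.0f);

        winner.EndEpisode();
        loser.EndEpisode();
    }
}
```

Self-play itself is enabled on the training side by adding a `self_play` section to the behavior's trainer configuration.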
Finally, this example explores agents-vs.-agents learning, that is, groups of agents playing against each other as teams. Once again we worked on an example provided in the toolkit, which consisted of two teams of two agents. This was expanded to six agents per team, and agents with different positions on the field (for example, goalkeeper), and therefore different behaviors, were introduced. The end result was achieved with a reward set that is complex relative to the other examples.
Start | Final Result |
---|---|
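For this kind of team training, ML-Agents provides the `SimpleMultiAgentGroup` class, which lets a whole team share rewards so that the MA-POCA trainer can assign credit to individual agents. The sketch below is a simplified, hypothetical illustration; names and values are not the project's exact ones.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical team manager: registers players in a group and
// hands out shared (group) rewards when a goal is scored.
public class TeamManager : MonoBehaviour
{
    public Agent[] players;                 // e.g. six agents per team
    private SimpleMultiAgentGroup _group;

    private void Start()
    {
        _group = new SimpleMultiAgentGroup();
        foreach (var player in players)
        {
            _group.RegisterAgent(player);
        }
    }

    public void OnGoalScored(bool scoredByThisTeam)
    {
        // The whole team is rewarded or penalized together;
        // MA-POCA learns how much each agent contributed.
        _group.AddGroupReward(scoredByThisTeam ? 1.0f : -1.0f);
        _group.EndGroupEpisode();
    }
}
```

Individual rewards (for example, a small penalty for a goalkeeper that leaves its area) can still be added per agent with `AddReward`, on top of the shared group reward.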
To develop the project we used ML-Agents, a reinforcement learning framework developed by [Unity Technologies](https://store.unity.com/download) that allows developers of games and other simulation environments to train artificial intelligence (AI) agents in virtual environments.
To visualize training over time we used TensorBoard, the visualization toolkit developed by the TensorFlow team. Within the application you can analyze the training statistics as well as how the model's policy changes over time. To run TensorBoard, use:
$ tensorboard --logdir results
Where `results` is the folder generated by ML-Agents with the respective neural network models.
PyTorch is an open-source machine learning library used to build and train deep learning models. Many of the Unity ML-Agents toolkit models are implemented on top of this library.
- Python (3.8.13 or higher)
- Unity (2021.3 or later)
- Unity package com.unity.ml-agents
- Unity package com.unity.ml-agents.extensions
$ python -m pip install mlagents==0.30.0
$ pip3 install torch~=1.7.1 -f https://download.pytorch.org/whl/torch_stable.html
$ pip3 install tensorboard
With the exception of the Soccer example, which uses MA-POCA because it involves learning in groups, the examples use PPO (Proximal Policy Optimization), the algorithm developed by OpenAI. PPO uses a neural network to approximate the ideal function that maps an agent's observations to the best action the agent can take in a given state. Training is an iterative process in which we train, visualize the training metrics, and adjust the hyperparameters accordingly.
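For reference, the core of PPO is the clipped surrogate objective from the original OpenAI paper; this is where the `epsilon` hyperparameter in the table below comes from:

$$
L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\mathrm{old}}(a_t \mid s_t)}
$$

where $\hat{A}_t$ is the advantage estimate. Clipping the probability ratio $r_t(\theta)$ keeps each update close to the previous policy, which is why a smaller `epsilon` gives more stable but slower training.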
Variable | Description |
---|---|
entropy | Uncertainty measure. This corresponds to how random an agent's decisions are. |
beta | It corresponds to the strength of the entropy regularization, which makes the policy "more random". This ensures that agents properly explore the action space during training. |
gamma | Discount factor for future rewards. This can be thought of as how far into the future the agent should be concerned with possible rewards. In situations where the agent should be acting in the present to prepare for rewards in the far future, this value should be large. In cases where the rewards are more immediate, it may be less. |
epsilon | Acceptable threshold of divergence between the old and new policy during a gradient descent update. Setting this to a small value results in more stable updates but also slows down the training process. |
buffer_size | How many experiences (agent observations, actions, and rewards obtained) should be collected before any learning or model update is done. Too high a value can impair training. |
batch_size | The number of experiences used for one iteration of a gradient descent update. This should always be a fraction of buffer_size. |
learning_rate | The strength of each gradient descent update step. |
num_layers | How many hidden layers are present after the observation input. |
hidden_units | How many units are in each fully connected layer of the neural network. |
max_steps | How many simulation steps the training will last. For more complex problems this number should be raised. |
As an example, the configuration used for the Walker behavior:

behaviors:
  Walker:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 30000000
    time_horizon: 1000
    summary_freq: 30000
To start a training session, all you have to do is have the scene open in Unity with the agent you want to train and run:
$ mlagents-learn <path to configuration file> --run-id=<unique id of the neural network model>
The following flags can be used:
- --resume : Resume a training session for a given id.
- --force : Overwrite an id.
- --initialize-from= : Start a training session for a new id from a pretrained model.
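For example, resuming an existing run might look like this (the configuration file and run id below are placeholders, not files from this repository):

$ mlagents-learn config/Walker.yaml --run-id=WalkerTest --resume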
- Training intelligent adversaries using self-play with ML-Agents
- Training In Cooperative Multi-Agent Environments with MA-POCA
- Some of the examples provided in the Unity ML-Agents Toolkit were used as a base.
- For the Basketball environment, a modified version of Basketball hoop by hotdoghans was used under license CC BY 4.0.
- For the background of the menu, an image by Luke Chesser was used.