pip install -r requirements.txt
python env.py
- Python 3.10+
- NumPy
- PyTorch
- Gymnasium
-
What is Q-Learning?
-
A Model-Free Reinforcement Learning algorithm to learn the Quality value of taking an Action in a particular State.
-
Following the Bellman update equation, we can train an agent to take high-quality actions that maximize the cumulative reward it receives.
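The Bellman update referred to here can be written in its standard form (reconstructed, since the original equation image is not in this text; symbols are the conventional ones: learning rate α, discount factor γ):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```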
-
We construct a Quality table indexed by state and action, and iteratively update its entries with the Bellman update as rewards are observed.
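A minimal sketch of the tabular update in NumPy (sizes, hyperparameter values, and the `q_update` helper are illustrative, not from this repo):

```python
import numpy as np

# Hypothetical sizes and hyperparameters for illustration
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (assumed values)

Q = np.zeros((n_states, n_actions))  # the Quality table

def q_update(state, action, reward, next_state, terminated):
    """One tabular Q-learning (Bellman) update."""
    # No bootstrapping from a terminal state
    target = reward if terminated else reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

# One example transition: Q[0, 1] moves a step toward the target
q_update(state=0, action=1, reward=1.0, next_state=2, terminated=False)
```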
-
-
Applying Deep Learning
-
Instead of storing a table of Q-values, use a neural network to approximate the Q function.
Why? When dealing with extremely large or continuous state spaces, storing the Quality function in a table is no longer feasible.
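A sketch of such a function approximator in PyTorch (the class name, layer sizes, and dimensions are illustrative assumptions, not this repo's actual network):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, .): maps a state vector to one Q-value per action."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action
        )

    def forward(self, state):
        return self.net(state)

# Illustration: a batch of 32 states with 8 features each (sizes assumed)
q_net = QNetwork(state_dim=8, n_actions=4)
q_values = q_net(torch.randn(32, 8))  # shape: (32, 4)
```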
-
Replay Buffer
- Represents the agent's memory
- Store transitions on every step (state, action, reward, next_state, terminated)
- Circular insertion
- Samples batches of transitions for neural network training
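The bullets above can be sketched as a small class (names and structure are my own, not necessarily this repo's implementation):

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state terminated")

class ReplayBuffer:
    """Fixed-capacity memory with circular insertion and uniform sampling."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.pos = 0  # next write index; wraps around once the buffer is full

    def push(self, state, action, reward, next_state, terminated):
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.pos] = Transition(state, action, reward, next_state, terminated)
        self.pos = (self.pos + 1) % self.capacity  # circular insertion

    def sample(self, batch_size):
        """Uniformly sample a batch of stored transitions for training."""
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

# Illustration: store a few transitions, then sample a training batch
buf = ReplayBuffer(capacity=1000)
for t in range(4):
    buf.push([float(t)], 0, 1.0, [float(t + 1)], False)
batch = buf.sample(2)  # list of 2 random Transition tuples
```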
-
New Update Equation:
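The new objective (reconstructed here in its standard DQN form, since the original equation image is not in this text; θ are the online network's parameters and θ⁻ the target network's):

```latex
L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```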
-
Note: Mean Squared Error is used for the loss, with gradients computed by backpropagation and applied via Stochastic Gradient Descent
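One training step on a sampled batch might look like this (stand-in networks, batch sizes, and tensor shapes are assumed for illustration):

```python
import torch
import torch.nn as nn

gamma = 0.99
q_net = nn.Linear(4, 2)       # stand-in Q-network: 4 state features, 2 actions
target_net = nn.Linear(4, 2)  # stand-in target network (a delayed copy of q_net)
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

# A batch of 8 transitions, as sampled from the replay buffer (shapes assumed)
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8, 1))
rewards = torch.randn(8)
next_states = torch.randn(8, 4)
terminated = torch.zeros(8)

q_sa = q_net(states).gather(1, actions).squeeze(1)  # Q(s, a) for the taken actions
with torch.no_grad():
    next_max = target_net(next_states).max(dim=1).values  # max_a' Q(s', a')
    target = rewards + gamma * (1 - terminated) * next_max

loss = nn.functional.mse_loss(q_sa, target)  # Mean Squared Error
optimizer.zero_grad()
loss.backward()    # backpropagation
optimizer.step()   # Stochastic Gradient Descent step
```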
-
-
Modifications
-
Double Deep Q Networks
Purpose: Stabilize training by reducing the overestimation of Q-values (the online network selects the next action; the target network evaluates it)
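The Double DQN target swaps the single max for a select-then-evaluate split; a sketch with stand-in networks (shapes and names assumed):

```python
import torch
import torch.nn as nn

gamma = 0.99
online_net = nn.Linear(4, 2)  # stand-in online Q-network
target_net = nn.Linear(4, 2)  # stand-in target network

next_states = torch.randn(8, 4)
rewards = torch.randn(8)
terminated = torch.zeros(8)

with torch.no_grad():
    # Double DQN: the online network *chooses* the next action...
    best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # ...but the target network *evaluates* it, curbing overestimation.
    next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    target = rewards + gamma * (1 - terminated) * next_q
```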
-
Double Dueling Deep Q Networks
Purpose: Faster convergence by separately estimating the state value V(s) and per-action advantages A(s, a)
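The dueling head can be sketched as follows (class name, layer sizes, and the mean-subtraction aggregation follow the standard formulation, not necessarily this repo's exact code):

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Splits the head into a state-value stream V(s) and an advantage stream A(s, a)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state):
        h = self.features(state)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); the mean-subtraction
        # makes the V/A decomposition identifiable
        return v + a - a.mean(dim=1, keepdim=True)

# Illustration: same input/output shapes as a plain Q-network (sizes assumed)
net = DuelingQNetwork(state_dim=8, n_actions=4)
q = net(torch.randn(32, 8))  # shape: (32, 4)
```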
-
References:
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning.
- van Hasselt, H., Guez, A. & Silver, D. (2015). Deep Reinforcement Learning with Double Q-learning.
- Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M. & de Freitas, N. (2015). Dueling Network Architectures for Deep Reinforcement Learning.