Winter 2020 COMP 767 Reinforcement Learning
Notebooks Containing the Implemented Algorithms
Multi-armed Bandits and Dynamic Programming: https://colab.research.google.com/drive/13nmNXWho9nYPhSY7pQ4G8kvMPX1mwFQ5#scrollTo=tj3JQdfmY9s5
Continuous Random Walk: https://colab.research.google.com/drive/1gTMDA-xn8CsZUmiyIrNBpJYCIbxnDGuV
Q-Learning: https://colab.research.google.com/drive/1T17Y267DB4bQP3cWesqPWiXYYaUbi-Sa
SARSA: https://colab.research.google.com/drive/10xDdPaS5hS_s1LxL8yMyME-14_k9-1x8#scrollTo=uCUPei6ditZH
Expected SARSA: https://colab.research.google.com/drive/1dRbApM006FAOrjkeUNGHxWzBx6aWXRIQ
Baird's Counter-Example: https://colab.research.google.com/drive/1XuvgcUjTf7kzRavsnJM4WKadXjenlE09
REINFORCE/Actor-Critic: https://colab.research.google.com/drive/1qXu1aWxGQu2jlsxrrZnp1ToOtJKAfLTI
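For orientation only: the Q-Learning, SARSA and Expected SARSA notebooks above are linked by name, and the three tabular methods differ mainly in the bootstrap target used in the update Q(s, a) += alpha * (target - Q(s, a)). The sketch below is not taken from the notebooks, whose code may differ. It assumes a NumPy Q-table indexed as Q[state, action] and an epsilon-greedy target policy for Expected SARSA. The function names (epsilon_greedy_probs, td_target, td_update) are hypothetical.

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy over one row of Q.
    Ties in argmax go to the first maximal action (an assumption of this sketch)."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs

def td_target(Q, r, s_next, a_next, gamma, epsilon, method, terminal=False):
    """Bootstrap target for one transition (s, a, r, s_next[, a_next])."""
    if terminal:
        # No bootstrapping from a terminal state.
        return r
    if method == "q_learning":
        # Off-policy: value of the greedy action in s_next.
        return r + gamma * np.max(Q[s_next])
    if method == "sarsa":
        # On-policy: value of the action actually taken in s_next.
        return r + gamma * Q[s_next, a_next]
    if method == "expected_sarsa":
        # Expectation of Q(s_next, .) under the epsilon-greedy policy.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        return r + gamma * np.dot(probs, Q[s_next])
    raise ValueError(f"unknown method: {method}")

def td_update(Q, s, a, r, s_next, a_next, alpha, gamma, epsilon, method, terminal=False):
    """In-place tabular update: Q[s, a] += alpha * (target - Q[s, a])."""
    target = td_target(Q, r, s_next, a_next, gamma, epsilon, method, terminal)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

With Q[1] = [1.0, 3.0], r = 1, gamma = 0.5 and epsilon = 0.2, the targets for a transition into state 1 (with next action 0) are 2.5 for Q-learning, 1.5 for SARSA and 2.4 for Expected SARSA, which shows how each method weighs the next state's action values.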