Studying Professor Sutton's reinforcement learning textbook
- Sutton & Barto, Reinforcement Learning: An Introduction, 2nd ed.
- Write down questions (?) and comments (!)
- Refactor the simulation code (in the style of CleanRL)
- introduction
- multi-armed bandits
- finite Markov decision processes
- dynamic programming
- Monte Carlo methods
- temporal-difference learning
- n-step bootstrapping
- planning and learning with tabular methods
- on-policy prediction with approximation
- on-policy control with approximation
- off-policy methods with approximation
- eligibility traces
- policy gradient methods
- psychology
- neuroscience
- applications and case studies
- frontiers