See the slides and the recorded video of the lecture on YouTube.
- MDP framework, Terminology, Bellman equation
- Markov Decision Process
- Bellman equation
- Bellman optimality equation
- Infinite horizon discounted problem
- Fixed-point iteration (contraction mapping)
- Dynamic Programming
- Value iteration (VI)
- Policy iteration (PI)
- Policy Evaluation
- Policy Improvement
- Approximate PI
- Bellman Error
- Tabular TD(0)-learning
- Q-factor
- Q-learning as stochastic VI (off-policy)
- Optimistic PI for Q-factors: SARSA (on-policy)
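
As a rough illustration of how the Bellman optimality equation, fixed-point iteration, and value iteration from the list above fit together, here is a minimal sketch in plain Python. The two-state, two-action MDP, the discount factor, and all names (`P`, `R`, `GAMMA`, `q_value`, `bellman_optimality_backup`, `value_iteration`, `greedy_policy`) are made up for this example and are not taken from the lecture.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
GAMMA = 0.9  # discount factor of the infinite-horizon discounted problem

# P[s][a] is a list of (probability, next_state) pairs.
P = {
    0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
# R[s][a] is the expected immediate reward for taking action a in state s.
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}


def q_value(V, s, a):
    """Q-factor of (s, a) under the value estimate V."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])


def bellman_optimality_backup(V):
    """Apply the Bellman optimality operator T once: (TV)(s) = max_a Q(s, a)."""
    return {s: max(q_value(V, s, a) for a in P[s]) for s in P}


def value_iteration(tol=1e-10, max_iters=10_000):
    """Iterate V <- TV until successive iterates differ by less than tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        V_new = bellman_optimality_backup(V)
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


def greedy_policy(V):
    """Policy improvement step: act greedily with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V_star = value_iteration()
pi_star = greedy_policy(V_star)
```

Because the backup is a contraction with modulus `GAMMA` in the sup norm, the loop converges from any starting `V`. Policy iteration would instead alternate a full policy evaluation with the `greedy_policy` step.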
An overview of modern (Deep) Reinforcement Learning Algorithms:
- Things not yet covered
- On-policy/Off-policy
- Safe exploration
Discuss in the Google Groups!