References:

  1. Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., & Moritz, P. (2015, July). Trust region policy optimization. In ICML (Vol. 37, pp. 1889-1897).
  2. Kakade, S., & Langford, J. (2002, July). Approximately optimal approximate reinforcement learning. In ICML (Vol. 2, pp. 267-274).
  3. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  4. Kakade, S. M. (2002). A natural policy gradient. In Advances in Neural Information Processing Systems (pp. 1531-1538).