Slides

Short Abstract: Policy gradient methods have achieved substantial empirical success in challenging reinforcement learning tasks with large (continuous) action spaces. However, little is known about their theoretical properties, such as the rate of convergence to the global optimum. Most recent works treat the problem as non-concave optimization and establish a sublinear convergence rate. In this talk, we instead take a policy iteration view and show that policy gradient methods enjoy a linear convergence rate.
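One way to make the policy iteration view concrete: with direct (simplex) parameterization, the exact gradient of the discounted objective satisfies dJ(pi)/dpi(a|s) = d^pi(s) Q^pi(s,a) / (1 - gamma), so a projected gradient ascent step with a large step size pushes each state's action distribution toward the greedy action of Q^pi, which is essentially a policy iteration step. The sketch below is my own illustration, not code from the talk or the paper; the MDP, the step size, and all names are assumptions. It runs exact projected policy gradient on a small random MDP and prints the optimality gap, which shrinks geometrically, i.e., at a linear rate.

```python
import numpy as np

# Illustrative sketch only: a small random MDP with exact (not sampled)
# gradients. All sizes, names, and the step size are my own assumptions.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
R = rng.uniform(size=(S, A))                 # deterministic reward R[s, a]
rho = np.full(S, 1.0 / S)                    # initial state distribution

def policy_eval(pi):
    """Exact V^pi and Q^pi for a stochastic policy pi[s, a]."""
    P_pi = np.einsum("sa,sat->st", pi, P)    # state-to-state transitions under pi
    R_pi = (pi * R).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return V, R + gamma * P @ V

def visitation(pi):
    """Discounted state-visitation distribution d^pi (sums to 1)."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    return (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v + (1 - css[k]) / (k + 1), 0)

# V* via value iteration, used only to measure the optimality gap.
V_star = np.zeros(S)
for _ in range(3000):
    V_star = (R + gamma * P @ V_star).max(axis=1)

pi = np.full((S, A), 1.0 / A)                # uniform initial policy
eta = 100.0                                  # large step: update ~ one policy iteration step
for t in range(10):
    V, Q = policy_eval(pi)
    grad = visitation(pi)[:, None] * Q / (1 - gamma)   # exact policy gradient
    pi = np.array([project_simplex(pi[s] + eta * grad[s]) for s in range(S)])
    print(f"iter {t}: optimality gap rho.(V* - V^pi) = {rho @ (V_star - V):.3e}")
```

With a step size this large, the projection puts nearly all mass on argmax_a Q^pi(s, a) in every state, so the printed gap contracts essentially at the rate of policy iteration; as I read the abstract, this policy-iteration-like contraction is what underlies the claimed linear rate.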

Reference: https://arxiv.org/abs/2007.11120