Tight regret bounds for model-based reinforcement learning with greedy policies

Speaker: Ziniu Li
Title: Tight regret bounds for model-based reinforcement learning with greedy policies

Reference: Efroni, Yonathan, et al. “Tight regret bounds for model-based reinforcement learning with greedy policies.” NeurIPS 2019.

Highlights: this paper 1) revisits an old idea, real-time dynamic programming (RTDP), which is a one-step planning algorithm; 2) proves that RTDP improves the computational efficiency of model-based reinforcement learning (MBRL) methods without sacrificing regret; 3) sheds light on the design of practical algorithms.
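To make the "one-step planning" idea concrete, here is a minimal sketch of an RTDP episode on a tabular finite-horizon MDP. It is an illustration under simplifying assumptions, not the algorithm analyzed in the paper: the model (`P`, `R`) is assumed known here, whereas in the MBRL setting it would be estimated from data. The function name `rtdp_episode` and all argument names are hypothetical. The value table is assumed to be initialized optimistically (for rewards in [0, 1], `V[h, s] = H - h`).

```python
import numpy as np

def rtdp_episode(P, R, V, s0, rng):
    """Run one episode of finite-horizon RTDP, updating V in place.

    P: transition probabilities, shape (S, A, S)   (assumed known here)
    R: expected rewards in [0, 1], shape (S, A)
    V: value estimates, shape (H + 1, S), with V[H] == 0
    Returns the total reward collected in the episode.
    """
    H = V.shape[0] - 1
    n_states = P.shape[2]
    s, total = s0, 0.0
    for h in range(H):
        # One-step lookahead at the visited state only: O(S * A) work,
        # instead of a full planning sweep over all states.
        q = R[s] + P[s] @ V[h + 1]
        a = int(np.argmax(q))      # act greedily w.r.t. the lookahead
        V[h, s] = q[a]             # back up the visited state only
        total += R[s, a]
        s = int(rng.choice(n_states, p=P[s, a]))
    return total
```

Each step touches a single state, so an episode costs roughly O(H * S * A) arithmetic operations, compared with solving the whole MDP before every episode. On a small deterministic example, the greedy policy typically becomes optimal after a few episodes once the optimistic values of poor states have been lowered.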