Tight regret bounds for model-based reinforcement learning with greedy policies
Ziniu Li (CUHK-Shenzhen)
Reference: Efroni, Yonathan, et al. “Tight regret bounds for model-based reinforcement learning with greedy policies.” NeurIPS 2019.
Highlights: this paper 1) revisits an old idea, real-time dynamic programming (RTDP), a one-step planning algorithm; 2) proves that RTDP improves the computational efficiency of model-based RL (MBRL) methods without sacrificing regret guarantees; 3) sheds light on the design of practical algorithms.
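To make the "one-step planning" idea concrete, here is a minimal sketch of RTDP on a toy random MDP (a hypothetical illustration, not the paper's exact algorithm or analysis): at each visited state the agent performs a single Bellman backup using its model, then acts greedily, instead of fully solving the planning problem before acting.

```python
import numpy as np

n_states, n_actions, horizon = 5, 2, 20
rng = np.random.default_rng(0)

# A known (or estimated) model: P[s, a] is a distribution over next states,
# R[s, a] is the immediate reward in [0, 1). In MBRL these would be learned.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))

gamma = 0.9
# Optimistic initialization: an upper bound on achievable value,
# which is what drives exploration in RTDP-style algorithms.
V = np.ones(n_states) / (1 - gamma)

s = 0
for t in range(horizon):
    # One-step lookahead with the model: Q(s, a) = R(s, a) + gamma * E[V(s')]
    q = R[s] + gamma * P[s] @ V
    V[s] = q.max()          # real-time backup at the current state only
    a = int(q.argmax())     # act greedily w.r.t. the just-updated values
    s = rng.choice(n_states, p=P[s, a])
```

The key computational point is visible in the loop: each step costs one backup at the current state, rather than a full sweep over the state space as in value iteration.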