On the sample complexity of best policy identification in reinforcement learning
Tian Xu (NJU)
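Short Abstract: In this talk, we consider the best policy identification (BPI) setting, in which an agent explores an unknown Markov decision process and must output a near-optimal policy with high probability. We introduce BPI-UCBVI, a provably efficient algorithm whose sample complexity for BPI is minimax optimal up to poly-logarithmic terms.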
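To make the BPI success criterion concrete, here is a minimal sketch of the check a BPI algorithm must pass: the returned policy is epsilon-optimal from the initial state, i.e. V*_1(s_1) - V^pi_1(s_1) <= epsilon. It assumes a small tabular episodic MDP with horizon H, stationary transitions, and a *known* model, which is only for illustration. It does not implement BPI-UCBVI, which has to reach this guarantee from samples alone. The function names `policy_value`, `optimal_value`, and `is_eps_optimal` are hypothetical.

```python
# Illustrative sketch only: checks the BPI epsilon-optimality criterion on a
# known tabular episodic MDP via backward induction. Not BPI-UCBVI itself.
# Assumptions (hypothetical): P[s][a][s'] transition probabilities,
# R[s][a] mean rewards in [0, 1], stationary across steps h = 0..H-1.

def policy_value(P, R, policy, H):
    """Value V[h][s] of a deterministic policy, where policy[h][s] is an action."""
    S = len(R)
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] = 0 (terminal)
    for h in range(H - 1, -1, -1):
        for s in range(S):
            a = policy[h][s]
            V[h][s] = R[s][a] + sum(p * v for p, v in zip(P[s][a], V[h + 1]))
    return V


def optimal_value(P, R, H):
    """Optimal values V*[h][s] and a greedy optimal policy via backward induction."""
    S, A = len(R), len(R[0])
    V = [[0.0] * S for _ in range(H + 1)]
    pi = [[0] * S for _ in range(H)]
    for h in range(H - 1, -1, -1):
        for s in range(S):
            q = [R[s][a] + sum(p * v for p, v in zip(P[s][a], V[h + 1]))
                 for a in range(A)]
            best = max(range(A), key=lambda a: q[a])  # ties -> lowest action index
            pi[h][s] = best
            V[h][s] = q[best]
    return V, pi


def is_eps_optimal(P, R, policy, H, s1, eps):
    """BPI success event: V*_1(s1) - V^policy_1(s1) <= eps."""
    v_star, _ = optimal_value(P, R, H)
    v_pi = policy_value(P, R, policy, H)
    return v_star[0][s1] - v_pi[0][s1] <= eps


# Toy MDP (hypothetical): the action chooses the next state; reward 1 in state 1.
S, A, H = 2, 2, 3
P = [[[1.0 if s2 == a else 0.0 for s2 in range(S)] for a in range(A)]
     for _ in range(S)]
R = [[0.0, 0.0], [1.0, 1.0]]
```

In BPI the learner never sees `P` and `R`; the guarantee is that the event checked by `is_eps_optimal` holds with probability at least 1 - delta after the algorithm stops, and the sample complexity counts the episodes needed before stopping.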
Reference: 1) https://arxiv.org/abs/2007.13442