On the sample complexity for best policy identification in reinforcement learning

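Short Abstract: In this talk, we consider the best policy identification (BPI) setting, in which an agent interacting with an unknown episodic Markov decision process must return an ε-optimal policy with probability at least 1 − δ while using as few episodes as possible. We introduce BPI-UCBVI, a provably efficient algorithm that achieves minimax-optimal sample complexity for BPI up to poly-logarithmic factors.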
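To make the BPI success criterion concrete, here is a minimal sketch that only *checks* it, assuming a finite-horizon tabular MDP whose transitions P (shape H × S × A × S) and rewards R (shape H × S × A) are fully known. A returned policy π̂ counts as ε-optimal when V*_1(s_1) − V^{π̂}_1(s_1) ≤ ε. The function names are hypothetical, and this is not BPI-UCBVI: the algorithm discussed in the talk must reach such a policy from sampled episodes without access to P.

```python
import numpy as np

def optimal_value(P, R, H):
    """Backward induction for V*[h, s], with V*[H, :] = 0.
    Hypothetical helper: P has shape (H, S, A, S), R has shape (H, S, A)."""
    S = P.shape[1]
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        Q = R[h] + P[h] @ V[h + 1]  # Q[s, a] = r + E[V_{h+1}(s')], shape (S, A)
        V[h] = Q.max(axis=1)        # greedy (optimal) action at each state
    return V

def policy_value(P, R, H, pi):
    """Evaluates a deterministic policy given as an int array pi[h, s] -> action."""
    S = P.shape[1]
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        Q = R[h] + P[h] @ V[h + 1]
        V[h] = Q[np.arange(S), pi[h]]  # follow the policy's chosen action
    return V

def is_eps_optimal(P, R, H, pi, s1, eps):
    """Returns (passes, gap) where gap = V*_1(s1) - V^pi_1(s1)."""
    gap = optimal_value(P, R, H)[0, s1] - policy_value(P, R, H, pi)[0, s1]
    return gap <= eps, gap

# Toy MDP (an illustrative assumption): action 0 stays put, action 1 moves
# to state 1, and every step spent in state 1 earns reward 1.
H, S, A = 2, 2, 2
P = np.zeros((H, S, A, S))
for h in range(H):
    for s in range(S):
        P[h, s, 0, s] = 1.0
        P[h, s, 1, 1] = 1.0
R = np.zeros((H, S, A))
R[:, 1, :] = 1.0

stay = np.zeros((H, S), dtype=int)
ok, gap = is_eps_optimal(P, R, H, stay, s1=0, eps=0.5)
# gap is 1.0 here, so "always stay" fails the check for eps = 0.5
```

Evaluating the criterion with a known model is easy; the difficulty BPI algorithms address is guaranteeing it with probability at least 1 − δ from as few sampled episodes as possible.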

Reference: https://arxiv.org/abs/2007.13442