Deep Exploration via Randomized Value Functions

Slides

Short Abstract: In this talk, we review the randomized least-squares value iteration (RLSVI) algorithm and present its regret bound for the finite-horizon, time-inhomogeneous tabular setting. RLSVI is theoretically sound and scales in practice, offering a principled alternative to optimistic exploration algorithms.
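
To make the idea concrete, here is a minimal sketch of one round of tabular RLSVI value sampling for the finite-horizon, time-inhomogeneous case. It is an illustration under stated assumptions, not the exact procedure analyzed in the talk or the paper: the function name `rlsvi_sample_q`, the array layout, the regularizer `lam`, and the noise scale `sigma / sqrt(n + lam)` are all assumed for this example.

```python
import numpy as np

def rlsvi_sample_q(counts, reward_sums, trans_counts, sigma=1.0, lam=1.0, rng=None):
    """Sample one randomized Q-function by backward induction (hypothetical helper).

    counts:       (H, S, A)    visit counts n_h(s, a)
    reward_sums:  (H, S, A)    sum of rewards observed at (h, s, a)
    trans_counts: (H, S, A, S) counts of observed next states
    sigma, lam:   assumed noise scale and ridge regularizer
    """
    rng = np.random.default_rng() if rng is None else rng
    H, S, A = counts.shape
    q = np.zeros((H + 1, S, A))  # q[H] stays zero: no value after the last step
    for h in range(H - 1, -1, -1):
        v_next = q[h + 1].max(axis=1)  # greedy value of the sampled Q at step h + 1
        # Sum over observed transitions of (reward + sampled next-step value).
        target_sums = reward_sums[h] + trans_counts[h] @ v_next
        precision = counts[h] + lam
        mean = target_sums / precision  # regularized least-squares estimate
        # Gaussian perturbation that shrinks as (s, a) is visited more often.
        noise = rng.normal(size=(S, A)) * sigma / np.sqrt(precision)
        q[h] = mean + noise
    return q[:H]
```

In an episode, the agent would draw one such sample and act greedily with respect to it, for example `action = q[h, state].argmax()`. Rarely visited state-action pairs receive larger perturbations, which is what drives exploration in place of an explicit optimism bonus.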

Reference: Osband, I., Van Roy, B., Russo, D. J., & Wen, Z. (2019). Deep Exploration via Randomized Value Functions. Journal of Machine Learning Research, 20(124), 1-62. Etc.