Slides

Short Abstract: The generative-oracle setting provides a simple way to analyze and design RL algorithms. In this setting we do not need to worry about online-learning issues such as exploration, so we can study the basic properties of the sampling-based Bellman operator under minimal assumptions. In particular, this talk aims to answer the following questions: 1) why can model-based algorithms be more effective? 2) how can we improve naive sampling-based methods via variance reduction? 3) what is the best way to induce a good policy from an imperfect value function?
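To make the setting concrete, here is a minimal Python sketch (not code from the referenced papers) of sampled value iteration with a generative oracle on a toy tabular MDP, plus a variance-reduced variant in the spirit of reference 2). The toy MDP and all names (oracle, sampled_bellman, vr_sampled_bellman) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP: S states, A actions, known rewards; transitions are
# hidden and accessible only through a generative oracle sample(s, a) -> s'.
S, A, gamma = 20, 4, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # true transition kernel (hidden)
R = rng.uniform(0.0, 1.0, size=(S, A))      # reward table

def oracle(s, a, n):
    """Generative oracle: draw n i.i.d. next states from P(. | s, a)."""
    return rng.choice(S, size=n, p=P[s, a])

def sampled_bellman(V, n):
    """Empirical Bellman optimality operator: replace E[V(s')] with a
    Monte Carlo average over n oracle samples per (s, a) pair."""
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            Q[s, a] = R[s, a] + gamma * V[oracle(s, a, n)].mean()
    return Q.max(axis=1), Q.argmax(axis=1)

def vr_sampled_bellman(V, V_ref, T_ref, n_small):
    """Variance-reduced variant: reuse an accurate one-time estimate T_ref
    of E[V_ref(s')] and estimate only the correction E[V(s') - V_ref(s')]
    with a few fresh samples; the correction has small range once the
    iterates are close to V_ref, so n_small can be small."""
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            s_next = oracle(s, a, n_small)
            correction = (V[s_next] - V_ref[s_next]).mean()
            Q[s, a] = R[s, a] + gamma * (T_ref[s, a] + correction)
    return Q.max(axis=1), Q.argmax(axis=1)

# Phase 1: naive sampled value iteration to get a rough reference V_ref.
V = np.zeros(S)
for _ in range(50):
    V, pi = sampled_bellman(V, n=100)

# Phase 2: pay a one-time cost for an accurate estimate of E[V_ref(s')],
# then iterate cheaply with few fresh samples per step.
V_ref = V.copy()
T_ref = np.array([[V_ref[oracle(s, a, 2000)].mean() for a in range(A)]
                  for s in range(S)])
for _ in range(50):
    V, pi = vr_sampled_bellman(V, V_ref, T_ref, n_small=20)

# Greedy policy pi induced from the (imperfect) value estimate V.
print("value estimate at state 0:", V[0], "greedy action:", pi[0])
```

The point of the variance-reduced version is that the expensive, high-accuracy sampling happens once at the reference point, while each subsequent iteration only estimates a small-range difference, which is one way to see why it can beat the naive estimator per sample.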

References:
1) https://hal.archives-ouvertes.fr/hal-00831875/document
2) https://arxiv.org/abs/1710.09988