Slides

Short Abstract: In the offline reinforcement learning setting, we have access to a fixed dataset but can no longer interact with the environment, so exploration is not a concern. In contrast to the online setting, where the optimism principle drives exploration, offline RL follows the pessimism principle: value estimates are kept conservative wherever the dataset offers little coverage. In this talk, we introduce two methods that realize the pessimism principle in the offline RL setting and present their theoretical guarantees.
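
To make the pessimism principle concrete, here is a minimal sketch of lower-confidence-bound value iteration in a tabular setting: the empirical Bellman backup is penalized in proportion to how rarely a state-action pair appears in the dataset. This is a generic illustration under assumed count/reward/transition arrays, not the specific algorithms from the referenced papers.

```python
import numpy as np

def pessimistic_value_iteration(counts, rewards, transitions,
                                gamma=0.99, beta=1.0, iters=100):
    """Toy LCB-style value iteration on an offline dataset.

    counts[s, a]        : visit counts of (s, a) in the dataset
    rewards[s, a]       : empirical mean rewards
    transitions[s, a, :]: empirical next-state distribution
    beta                : scale of the uncertainty penalty
    """
    S, A = counts.shape
    Q = np.zeros((S, A))
    # Penalty shrinks as a state-action pair is better covered by the data.
    penalty = beta / np.sqrt(np.maximum(counts, 1))
    for _ in range(iters):
        V = Q.max(axis=1)                      # greedy state values
        Q = rewards + gamma * transitions @ V  # empirical Bellman backup
        Q = np.maximum(Q - penalty, 0.0)       # subtract penalty: pessimism
    return Q

# The greedy policy w.r.t. the pessimistic Q avoids poorly covered actions,
# which is the behavior the pessimism principle is meant to enforce.
```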

References:
1) https://arxiv.org/abs/2007.08202
2) https://arxiv.org/abs/2007.11091