Slides

Speaker: Xuhui Liu

Time: Apr 2 2pm-5pm

Short Abstract: Broadly speaking, there are two main approaches to off-policy evaluation: the direct method and importance sampling. However, the direct method suffers from high bias, while importance sampling suffers from high variance. To overcome the drawbacks of both, MAGIC blends the two approaches to trade off bias against variance. Empirically, MAGIC often produces estimates with orders of magnitude lower mean squared error than existing methods, making more efficient use of the available data.
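Below is a minimal Python sketch of the bias-variance tradeoff the abstract describes, using a toy one-step bandit problem: a biased but low-variance direct-method estimate, an unbiased but high-variance importance-sampling estimate, and a simple convex combination of the two. This is not the MAGIC algorithm itself; the fixed blending weight alpha, the injected model bias, and all variable names are illustrative assumptions, whereas MAGIC selects its blending weights by approximately minimizing an estimate of the MSE.

    import numpy as np

    # Toy illustration of the bias-variance tradeoff between the direct method (DM)
    # and importance sampling (IS) for off-policy evaluation, and of blending the two.
    # NOT the MAGIC algorithm: MAGIC chooses its blending weights data-dependently
    # by approximately minimizing estimated MSE; here alpha is fixed for illustration.

    rng = np.random.default_rng(0)

    # One-step bandit setup: two actions with known true expected rewards.
    true_reward = np.array([1.0, 0.0])        # expected reward of actions 0 and 1
    pi_b = np.array([0.8, 0.2])               # behavior policy (collects the data)
    pi_e = np.array([0.2, 0.8])               # evaluation policy (to be evaluated)
    true_value = float(pi_e @ true_reward)    # ground-truth value of pi_e

    n = 200
    actions = rng.choice(2, size=n, p=pi_b)
    rewards = true_reward[actions] + rng.normal(0.0, 1.0, size=n)

    # Direct method: fit a (deliberately biased) reward model, evaluate pi_e on it.
    model_reward = np.array([rewards[actions == a].mean() for a in (0, 1)]) + 0.3
    dm_estimate = float(pi_e @ model_reward)           # low variance, biased

    # Importance sampling: reweight observed rewards by pi_e / pi_b.
    weights = pi_e[actions] / pi_b[actions]
    is_estimate = float(np.mean(weights * rewards))    # unbiased, high variance

    # Blended estimator: a convex combination of the two estimates.
    alpha = 0.5
    blended = alpha * is_estimate + (1 - alpha) * dm_estimate

    print(f"true value        : {true_value:.3f}")
    print(f"direct method     : {dm_estimate:.3f}")
    print(f"importance sampling: {is_estimate:.3f}")
    print(f"blended (alpha=0.5): {blended:.3f}")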

Reference:

Philip S. Thomas and Emma Brunskill. "Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning." In Proceedings of the 33rd International Conference on Machine Learning (ICML), PMLR 48, 2016. http://proceedings.mlr.press/v48/thomasa16.pdf