Slides

Speaker: Yingru Li

Title: Eluder dimension and potential lemma

Short Abstract: The first contribution establishes a connection between posterior sampling and upper confidence bound (UCB) algorithms. This result lets us convert regret bounds developed for UCB algorithms into Bayesian regret bounds for posterior sampling.
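
As a sketch of the key identity behind this conversion (notation mine, following the primary reference): if \(U_t\) is any upper confidence bound that is measurable with respect to the history \(H_t\), then, because the posterior-sampled action \(A_t\) and the optimal action \(A^\star\) have the same distribution conditional on \(H_t\),

\[
\mathbb{E}\big[f_\theta(A^\star) - f_\theta(A_t)\big]
  = \mathbb{E}\big[f_\theta(A^\star) - U_t(A^\star)\big]
  + \mathbb{E}\big[U_t(A_t) - f_\theta(A_t)\big].
\]

The first term is controlled whenever \(U_t\) is a valid upper bound, and the second term is exactly the confidence width that existing UCB analyses already bound.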

The second theoretical contribution is a Bayesian regret bound for posterior sampling that applies broadly to general function classes and can be specialized to many model classes. This bound depends on a new notion called the eluder dimension, which measures the degree of dependence among action rewards. Compared with Bayesian regret bounds for UCB algorithms on specific model classes, the general bound matches the best available bound for linear models and is stronger than the best available bound for generalized linear models.
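
For concreteness, the definition can be paraphrased as follows (notation mine): an action \(a\) is \(\epsilon\)-dependent on actions \(a_1,\dots,a_n\) with respect to a function class \(\mathcal{F}\) if every pair \(f, \tilde f \in \mathcal{F}\) satisfying

\[
\sqrt{\textstyle\sum_{i=1}^{n} \big(f(a_i) - \tilde f(a_i)\big)^2} \le \epsilon
\quad\text{also satisfies}\quad
|f(a) - \tilde f(a)| \le \epsilon,
\]

and \(\epsilon\)-independent otherwise. The eluder dimension \(\dim_E(\mathcal{F}, \epsilon)\) is the length of the longest sequence of actions in which every element is \(\epsilon'\)-independent of its predecessors for some \(\epsilon' \ge \epsilon\).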

Finally, if time permits, I would like to discuss the conceptual relationship between the eluder dimension and the celebrated elliptical potential lemma, from both a linear-algebra viewpoint and an information-theory viewpoint.
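
For context, one standard statement of that lemma (as used in the linear bandit literature; notation mine): with \(V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top\) and \(\|x_s\|_2 \le L\),

\[
\sum_{t=1}^{T} \min\!\big(1, \|x_t\|_{V_{t-1}^{-1}}^{2}\big)
\;\le\; 2 \log \frac{\det V_T}{\det(\lambda I)}
\;\le\; 2 d \log\!\Big(1 + \frac{T L^2}{d \lambda}\Big),
\]

so at most \(O(d \log T)\) rounds can carry large elliptical uncertainty. The middle expression is a log-determinant (the linear-algebra reading) and is also, up to constants, the information gain of a Gaussian linear model (the information-theory reading).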

Primary Reference: Russo, Daniel, and Benjamin Van Roy. “Learning to optimize via posterior sampling.” Mathematics of Operations Research 39.4 (2014): 1221-1243.

Russo, Daniel, and Benjamin Van Roy. “Eluder Dimension and the Sample Complexity of Optimistic Exploration.” NIPS. 2013.