Slides

Speaker: Lizhang Miao

Time: Mar 5 2pm-5pm

Short Abstract: Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel Ω(n2/3) dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm showing that that Θ(n2/3) is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free O(n‾√) regret upper bound under an additional assumption on the magnitude of the signal for relevant features.

Reference:

Hao, Botao, Tor Lattimore, and Mengdi Wang. “High-Dimensional Sparse Linear Bandits.” arXiv preprint arXiv:2011.04020 (2020).