Distributionally-Aware Exploration for CVaR Bandits

Slides

Short Abstract: In this talk, we consider stochastic MAB with a popular risk-sensitive objective called the Conditional Value at Risk (CVaR). We present a an novel optimism-based algorithm for this setting: instead of adding bonuses to the CVaR estimate of each arm, we apply optimism at the sample level, generating an optimistic set of samples for each arm and then computing CVaR estimates from them instead. The regret bounds for the algorithm along with experiments showing order-of-magnitude improvements over baselines and prior work will be shown.

Reference:

1) https://rkeramati.github.io/assets/pdf/CVaR_Bandit.pdf

2) https://www0.gsb.columbia.edu/faculty/azeevi/PAPERS/mab_general_risk-MOR.pdf