Increasing the Action Gap
Mark Gluzman (iDDA, CUHK-Shenzhen)
See the slides and the recorded video of the lecture on YouTube.
Bellemare, Marc G., Ostrovski, Georg, Guez, Arthur, Thomas, Philip S., and Munos, Rémi. Increasing the action gap: New operators for reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
Some insights on how to interpret and choose \alpha for the AL operator: Greg Farquhar, The Advantage Learning Operator, 2016, http://aims.robots.ox.ac.uk/wp-content/uploads/2015/07/RL minproject v1-Greg.pdf
New optimality-preserving operators on Q-functions: Yingdong Lu, Mark S. Squillante, Chai Wah Wu, "A General Family of Robust Stochastic Operators for Reinforcement Learning", 2018, https://arxiv.org/abs/1805.08122
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
van Hasselt, H. 2010. Double Q-learning. In Advances in Neural Information Processing Systems 23.