Reward-free Exploration for Reinforcement Learning
Tian Xu (NJU)
Short Abstract: We study the exploration problem when reward feedback is absent. We present an algorithm named RF-Express, which achieves nearly minimax-optimal sample complexity in episodic non-stationary MDPs.
Reference: Fast active learning for pure exploration in reinforcement learning