Online Target Q-Learning with Reverse Experience Replay: Efficiently Finding the Optimal Policy for Linear MDPs
Ziniu Li (CUHK-Shenzhen)