Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Document pages: 26 pages

Abstract: This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Processes (RMDPs) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation. We prove the convergence of this algorithm using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy. We also give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of our RLSPI algorithm on some standard benchmark problems.
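
For readers unfamiliar with least-squares policy evaluation, the sketch below shows the standard, non-robust batch LSTD(0) estimator with linear features. It is not the paper's Robust Least Squares Policy Evaluation algorithm, whose multi-step online updates and uncertainty-set handling are not described in this abstract. The names `solve_linear`, `lstd`, and `one_hot`, the regularization parameter `reg`, and the two-state chain are all hypothetical and chosen only for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd(transitions, phi, gamma, n_features, reg=1e-6):
    """Standard (non-robust) LSTD(0): solve A w = b for the value weights w.

    transitions: iterable of (state, reward, next_state) sampled under the policy.
    phi: feature map, state -> list of n_features floats.
    reg: small ridge term (an assumption here) that keeps A invertible.
    """
    k = n_features
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve_linear(A, b)


def one_hot(state, n=2):
    """Tabular features: one indicator per state."""
    return [1.0 if i == state else 0.0 for i in range(n)]


# Toy chain: state 0 moves to state 1 with reward 0; state 1 loops with reward 1.
samples = [(0, 0.0, 1), (1, 1.0, 1)]
weights = lstd(samples, one_hot, gamma=0.9, n_features=2)
# With gamma = 0.9 the true values are V(0) = 9 and V(1) = 10;
# the ridge term makes the estimates very slightly smaller.
```

With one-hot features the weights are the state values themselves, so the toy result can be checked by hand: V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 · V(1) = 9. A robust variant would replace the sampled next-state term with a worst-case value over an uncertainty set, which this sketch does not attempt.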
