Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Abstract: Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty. However, while most algorithms distinguish these two uncertainties for learning the model, they ignore it when optimizing the policy, which leads to greedy and insufficient exploration. At the same time, there are no practical solvers for optimistic exploration algorithms. In this paper, we propose a practical optimistic exploration algorithm (H-UCRL). H-UCRL reparameterizes the set of plausible models and hallucinates control directly on the epistemic uncertainty. By augmenting the input space with the hallucinated inputs, H-UCRL can be solved using standard greedy planners. Furthermore, we analyze H-UCRL and construct a general regret bound for well-calibrated models, which is provably sublinear in the case of Gaussian Process models. Based on this theoretical foundation, we show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms and different probabilistic models. Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions, a setting that is notoriously difficult for existing model-based reinforcement learning algorithms.
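
The abstract describes the core mechanism of H-UCRL: the planner is given an extra "hallucinated" control that selects an optimistic dynamics model from within the epistemic confidence set, so optimistic exploration reduces to greedy planning over an augmented input space. The following is a minimal sketch of that idea; the function and interface names (e.g. model.predict returning a mean and an epistemic standard deviation) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hallucinated_step(model, state, action, eta, beta=1.0):
    """One step of the optimistically reparameterized dynamics.

    model.predict(state, action) is assumed to return the predicted mean
    next state and its epistemic standard deviation. eta is the hallucinated
    control in [-1, 1]^d that picks a model inside the confidence set
    mean +/- beta * std.
    """
    mean, epistemic_std = model.predict(state, action)
    return mean + beta * epistemic_std * np.clip(eta, -1.0, 1.0)

def rollout_return(model, reward_fn, policy, state, horizon=20, beta=1.0):
    """Greedy planning objective over the augmented input space.

    The policy outputs both the real action and the hallucinated control,
    and the optimistic model is rolled out for a fixed horizon.
    """
    total = 0.0
    for _ in range(horizon):
        action, eta = policy(state)  # augmented input: (action, hallucinated control)
        total += reward_fn(state, action)
        state = hallucinated_step(model, state, action, eta, beta)
    return total
```

Because the hallucinated control enters the planner like any other action dimension, any standard greedy planner or policy-search method can optimize it jointly with the real actions, which is what makes the optimistic exploration practical.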
