
Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

Document pages: 25 pages

Abstract: In this paper we propose several novel distributed gradient-based temporal-difference algorithms for multi-agent off-policy learning of a linear approximation of the value function in Markov decision processes with strict information structure constraints, limiting inter-agent communications to small neighborhoods. The algorithms are composed of: 1) local parameter updates based on single-agent off-policy gradient temporal-difference learning algorithms, including eligibility traces with state-dependent parameters, and 2) linear stochastic time-varying consensus schemes, represented by directed graphs. The proposed algorithms differ in their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations. The main contribution of the paper is a convergence analysis based on the general properties of the underlying Feller-Markov processes and the stochastic time-varying consensus model. We prove, under general assumptions, that the parameter estimates generated by all the proposed algorithms weakly converge to the corresponding ordinary differential equations (ODE) with precisely defined invariant sets. It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints. The variance reduction effect of the proposed algorithms is demonstrated by formulating and analyzing an asymptotic stochastic differential equation. Specific guidelines for communication network design are provided. The algorithms' superior properties are illustrated by characteristic simulation results.
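The two-part structure described in the abstract (a local off-policy gradient temporal-difference update, followed by a consensus step over a directed graph) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it assumes a TDC-style update in one common form without eligibility traces (lambda = 0), a fixed rather than time-varying consensus matrix, and hypothetical names throughout (`local_tdc_step`, `consensus_step`, `spread`, the matrix `A`).

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def local_tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One TDC-style off-policy update for a single agent (assumed form, lambda = 0).

    theta: value-function weights, w: auxiliary weights,
    rho: importance-sampling ratio, alpha/beta: step sizes.
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    pw = dot(phi, w)
    new_theta = [t + alpha * rho * (delta * f - gamma * pw * fn)
                 for t, f, fn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (rho * delta - pw) * f for wi, f in zip(w, phi)]
    return new_theta, new_w


def consensus_step(params, A):
    """Each agent i replaces its vector by sum_j A[i][j] * params[j].

    A is assumed row-stochastic; A[i][j] > 0 only if agent j is an
    in-neighbour of agent i in the directed communication graph.
    """
    n, d = len(params), len(params[0])
    return [[sum(A[i][j] * params[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def spread(params):
    """Largest coordinate-wise disagreement between agents."""
    d = len(params[0])
    return max(max(p[k] for p in params) - min(p[k] for p in params)
               for k in range(d))


# Hypothetical 3-agent directed ring with self-loops (row-stochastic).
A = [[0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]

# Three agents start from different local estimates and only run consensus.
thetas = [[0.0, 1.0], [1.0, -1.0], [2.0, 3.0]]
for _ in range(50):
    thetas = consensus_step(thetas, A)
```

In a full iteration each agent would call `local_tdc_step` on its own observed transition and then `consensus_step` would mix the resulting vectors; the loop above isolates the consensus part to show that repeated mixing over a connected directed graph drives the local estimates toward agreement.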
