Название: Multi-Agent Coordination
Автор: Amit Konar
Издательство: John Wiley & Sons Limited
Жанр: Программы
isbn: 9781119699026
isbn:
Several algorithms for multi‐agent learning are available in the literature, each with one specific flavor to optimize certain learning intents of the agents. Of these algorithms, quite a few interesting works on the MAQL have been reported in the literature. Among the state‐of‐the‐art MAQL algorithms, the following need special mentions. Claus and Boutilier, aimed at solving the coordination problem using two types of reinforcement learners. The first one, called independent learner (IL), takes care of the learning behavior of individual agents by ignoring the presence of other agents. The second one, called joint action learner (JAL), considers all agents including the self to learn at joint action‐space. Unlike JAL, in Team Q‐learning proposed by Littman, an agent updates its Q‐value at a joint state–action pair without utilizing associated agents' reward; rather the value function of the agent at the next joint state is evaluated by obtaining the maximum Q‐value among the joint actions at the next joint state. Ville proposed Asymmetric‐Q learning (AQL) algorithm, where the leader agents are capable of maintaining all the agents' Q‐tables. However, the follower agents are not allowed to maintain all the agents' Q‐tables and hence, they just maximize their own rewards. In AQL, agents always achieve the pure strategy Nash equilibrium (NE), although there does exist mixed strategy NE. Hu and Wellman extended the Littman's Minimax Q‐learning to the general‐sum stochastic game (where the summation of all agents' payoff is neither zero nor constant) by taking into account of other agents' dynamics using NE. They also offered a proof of convergence of their algorithm. In case of multiple NE occurrences, one is selected optimally. Littman proposed Friend‐or‐Foe Q‐learning (FQL) algorithm for general‐sum games. In this algorithm, the learner is instructed to treat each other agent either as a friend in Friend Q‐learning or as a foe in Foe Q‐learning. FQL provides a stronger convergence guarantee in comparison to that of the existing NE‐based learning rule. Greenwald and Hall proposed correlated Q‐learning (CQL) employing correlated equilibrium (CE) to generalize both Nash Q‐learning (NQL) and FQL. The bottlenecks of the above MAQL algorithms are update policy selection for adaptation of the Q‐tables in joint state–action space and the curse of dimensionality with an increase in the number of learning agents. Several attempts have been made to handle the curse of dimensionality in MAQL. Jelle and Nikos proposed Sparse Cooperative Q‐learning, where a sparse representation of the joint state–action space of the agents is done by identifying the need for coordination among the agents at a joint state. Here, agents undertake coordination by their actions only in a few joint states. Hence, each agent maintains two Q‐tables: one is the individual‐action Q‐table for uncoordinated joint states and another one is the joint action Q‐table to represent the coordinated joint states. In case of uncoordinated states, a global Q‐value is evaluated by adding the individual Q‐values. Zinkevich offers a neural network‐based approach for generalized representation of the state‐space for multi‐agent coordination. By such generalization, agents (here robots) can avoid collision with an obstacle or other robots by collecting minimum information from the sensors. Reinaldo et al. proposed a novel algorithm to heuristically accelerate the TMAQL algorithms.
In the literature of MAQL, agents either converge to NE or CE. The equilibrium‐based MAQL algorithms are most popular for their inherent ability to determine optimal strategy (equilibrium) at a given joint state. Hu et al. identified the phenomenon of similar equilibria in different joint states and introduced the concept of equilibrium transfer to accelerate the state‐of‐the‐art equilibrium‐based MAQL (NQL and CQL). In equilibrium transfer, agents recycle the previously computed equilibria having very small transfer loss. Recently, Zhang et al. attempted to reduce the dimension of the Q‐tables in NQL. The reduction is done by allowing the agents to store the Q‐values in joint state–individual action space, instead of joint state–action space.
In the state‐of‐the‐art MAQL (NQL and CQL), balancing exploration/exploitation during the learning phase is an important issue. Traditional approaches used to balance exploration/exploitation in MAQL are summarized here. The greedy exploration, although has wide publicity, needs to tune the value of which is time‐costly. In the Boltzmann strategy, the action selection probability is controlled by tuning a control parameter (temperature) and by utilizing the Q‐values due to all actions at a given state. Here, the setting of temperature to infinity (zero) implies pure exploration (exploitation). Unfortunately, the Boltzmann strategy antagonistically affects the speed of learning. Evolution of the Boltzmann strategy toward better performance is observed in a series of literature. However, the above selection mechanisms are not suitable for selecting a joint action preferred for the team (all the agents) because of the dissimilar joint Q‐values offered by the agents at a common joint state–action pair. There are traces of literature concerning joint action selection at a joint state during learning. However, with the best of our knowledge, there is no work in the literature, which considers the work, presented in this book.
The book includes six chapters. Chapter 1 provides an introduction to the multi‐robot coordination algorithms for complex real‐world problems, including transportation of a box/stick, formation control for defense applications and soccer playing by multiple robots utilizing the principles of RL, the theory of games, dynamic programming, and/or EA. Naturally, this chapter provides a thorough survey of the existing literature of RL with a brief overview of the evolutionary optimization to examine the role of the algorithms in the context of multi‐agent coordination. Chapter 1 includes multi‐robot coordination employing evolutionary optimization, and especially RL for cooperative, competitive, and their composition for application to static and dynamic games. The latter part of the chapter deals with an overview of the metrics used to compare the performance of the algorithms while coordinating. Fundamental metrics for performance analysis are defined to study the learning and planning algorithms.
Chapter 2 offers learning‐based planning algorithms, by extending the traditional multi‐agent Q‐learning algorithms (NQL and CQL) for multi‐robot coordination and planning. This extension is achieved by employing two interesting properties. The first property deals with the exploration of the team‐goal (simultaneous success of all the robots) and the other property is related to the selection of joint action at a given joint state. The exploration of team‐goal is realized by allowing the agents, capable of reaching their goals, to wait at their individual goal states, until the remaining agents explore their individual goals synchronously or asynchronously. Selection of joint action, which is a crucial problem in traditional multi‐agent Q‐learning, is performed here by taking the intersection of individual preferred joint actions of all the agents. In case the resulting intersection is a null set, the individual actions are selected randomly or otherwise following classical techniques. The superiority of the proposed learning and learning‐based planning algorithms are validated over contestant algorithms in terms of the speed of convergence and run‐time complexity, respectively.
In Chapter 3, it is shown that robots may select the suboptimal equilibrium in the presence of multiple types of equilibria (here NE or CE). In the above perspective, robots need to adapt to such a strategy, which can select the optimal equilibrium in each step of the learning and the planning. To address the bottleneck of the optimal equilibrium selection among multiple types, Chapter СКАЧАТЬ