Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2016, Article ID 5192423, 11 pages
http://dx.doi.org/10.1155/2016/5192423


Research Article
A New Decentralized Approach of Multiagent Cooperative Pursuit Based on the Iterated Elimination of Dominated Strategies Model

Mohammed El Habib Souidi¹,² and Songhao Piao¹

¹Harbin Institute of Technology, Computer Science and Technology, Harbin 150001, China
²Department of Computer Science, University of Khenchela, 40000 Khenchela, Algeria

Correspondence should be addressed to Mohammed El Habib Souidi; mohamedsouidi1989@hotmail.com

Received 20 June 2016; Revised 7 September 2016; Accepted 25 September 2016

Academic Editor: Vladimir Turetsky

Copyright © 2016 M. E. H. Souidi and S. Piao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Game Theory is a promising approach to acquire coalition formations in multiagent systems. This paper is focused on the importance of distributed computation and the dynamic formation and reformation of pursuit groups in pursuit-evasion problems. In order to address this task, we propose a decentralized coalition formation algorithm based on the Iterated Elimination of Dominated Strategies (IEDS). This Game Theory process is commonly used to solve problems requiring the iterative withdrawal of dominated strategies. Furthermore, we have used Markov Decision Process (MDP) principles to control the motion strategy of the agents in the environment. The simulation results demonstrate the feasibility and the validity of the given approach in comparison with different decentralized methods.

1. Introduction

A Multiagent System (MAS) is the cooperation of an organized set of intelligent agents situated in an environment in order to coordinate their performances and resolve complex problems. Coalition formation, considered as a major focus in social systems, is an important method of cooperation. In general, coalition formation between the agents is goal-directed and short-lived: coalitions are formed to achieve a specific objective and dissolve when it is accomplished. Coalition formation has received a considerable amount of attention in recent research [1, 2]. Further, some research activities are based on the notion of Organization, which allows the coalition of the agents in the form of groups as well as the cooperation between their members. Consequently, Ferber et al. [3] proposed the AALAADIN organizational model based on three principal axes, Agent, Group, and Role, used simultaneously to describe concrete agent organizations (AGR).

Multiagent cooperative pursuit is a known multiagent problem [4, 5]. Based on identical conditions between pursuers and evaders, the pursuit-evasion problem can be classified into single-object pursuit and multiobject pursuit. This problem has been considered in many references, and its applications were inspired by an equally diverse set of approaches and useful techniques, such as Organization [6], in which we used the principles of the AGR organizational model to propose a pursuit coalition formation algorithm. Also, in order to equip each pursuit group with a dynamic access mechanism, we introduced a flexible organizational model extended from AGR through the application of fuzzy logic principles, which determines the membership degree of each pursuer in relation to each group [7].

Furthermore, Cai et al. introduced an economical auction mechanism (MPMEGBTBA) [8], where an advanced task negotiation process based on Task Bundle Auction was proposed in order to allocate tasks dynamically through dynamic coalition formation of multiple agents. We can also find several works treating the pursuit problem through different principles, such as Graph Theory [9], Polygonal Environments [10, 11], Data Mining [12], and Reinforcement Learning [13].

In this kind of problem, the pursuers and evaders are presented as different types. The type of a pursuer denotes its pursuit capacity, whereas the type of an evader reflects the number and type of pursuers required to perform its capture. The value of an evader indicates the expected rewards that should be returned to the relevant pursuers after the achievement of the capture.

Game Theory can be considered the simplest way to model situations of conflict, and it studies the interactions between interested agents. The classic question relating Game Theory to multiagent systems is "what is the best action that an agent can perform?" This principle has been widely used in multiagent pursuit problems [14–17]. Negotiation based on Game Theory is focused on the value and rewards of each agent, which appropriately reflect the objective of the agent's negotiation (satisfying the goal of each agent). The main advantage provided by Game Theory algorithms in MAS is the coordination appearing through the hypothesis of mutual rationality of the agents. Therefore, Game Theory algorithms are used to coordinate autonomous rational agents without a coordination mechanism explicitly integrated into the agent model. They also provide different methods that define the optimal agents' coalitions in several types of problems. On the other hand, the disadvantages of these algorithms concern the agents, which are frequently considered perfectly rational. Moreover, Game Theory algorithms focus on the value of the optimal solution and overlook the most efficient method to achieve it.

In this paper, we focus on the Iterated Elimination of Dominated Strategies (IEDS) Game Theory technique to propose a coalition formation algorithm for pursuit-evasion problems. A strategy is the complete specification of the agent's behavior in any situation (in the case of an extensive-form game, it means what behavior the agent must undertake according to the set of information provided). Moreover, we use Markov Decision Process (MDP) principles in order to control the motion strategy of each agent. This process (MDP) provides a formalism to model and resolve planning and learning problems under uncertainty [18, 19].

The paper is organized as follows: in Section 2, we discuss the main related works based on the same principles used in this paper. In Section 3, we focus on the pursuit-evasion problem by submitting a detailed explanation of the simulation environment and its different contents; we also describe the environmental agents by defining and clarifying the principal characteristics of pursuers and evaders. In Section 4, the Iterated Elimination of Dominated Strategies principle is described and detailed through an application example of this Game Theory process. In Section 5, the basic principles of the Markov Decision Process are presented; in particular, we clarify the principles of the primary functions, better known as the reward and transition functions. In Section 6, we introduce the distributed coalition formation algorithm with a detailed clarification of the coalition progress. A simulation of the pursuit-evasion game example is shown in Section 7; in this part, we describe our simulation environment in a specific manner and present the results achieved in comparison with outcomes based on other theories. Finally, Section 8 contains concluding remarks.

2. Related Work

There exist many works based on game-theoretic principles regarding the PE problem, such as [14], where the author described the control of an autonomous agents' team tracking an intelligent evader in a non-accurately mapped terrain, based on a method that calculates the Nash equilibrium policies by resolving an equivalent zero-sum matrix game. In this example, among all Nash equilibria, the evader selects the one which optimizes its deterministic distance to the pursuers' team. In order to resolve the problems often encountered in pursuit-evasion game algorithms, such as computational complexity and the lack of universality, Dong et al. [15] propose a hybrid algorithm founded on an improved dynamic artificial potential field and differential game, where the Nash equilibrium solution is optimal for both pursuer and evader in a barrier-free zone, and the algorithm is applied flexibly in accordance with environment changes around the pursuit elements. Moreover, in [16], Amigoni and Basilico presented an approach to calculate the optimal pursuer's strategy that maximizes the probability of the target's capture in a given environment. This approach is based on the definition of a game-theoretic model of pursuit-evasion as well as on its resolution through mathematical programming.

More recently, Lin et al. [20] proposed a pursuit-evasion differential game based on Nash strategies involving limited observations. On the one hand, the evader undertakes the standard feedback Nash strategy; on the other hand, the pursuers undertake Nash strategies based on the novel concept of best achievable performance indices. This model has potential applications in cases where several weakly equipped pursuing vehicles are tracking a well-equipped unmanned vehicle.

In relation to PE, MDP is usually used to provide the motion planning for the mobile pursuers through the maximization of the rewards obtained during the pursuit. In [21], a Partially Observable Markov Decision Process (POMDP) algorithm is used to search for a mobile target in a known graph; the main objective is to ensure the capture of the targets via the clearing of the graph in minimal time. In [22], the authors propose a new approach, Continuous-Time Markov Decision Process (CTMDP), to address the PE problem. In relation to MDP, CTMDP takes into account the impact of the transition time between the states, yielding strong robustness against changes in transition probability. In [23], the authors proposed an innovative approach totally based on MDP with the aim of resolving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. In other words, they determined a value iteration algorithm to compute optimal policies that recognizes and reasons about Coordination Problems.

Furthermore, we can consider other works based on MDP and IEDS, such as [24], in which an exact dynamic programming algorithm for partially observable stochastic games (POSGs) is developed; it is proven that the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game when it is applied to finite-horizon POSGs. Otherwise, several types of coordination mechanisms are currently used, such as Stochastic Clustering Auctions (SCAs) [25, 26], a class of cooperative auction methods based on the modified Swendsen-Wang method, which permits each robot to reconstitute the tasks that have been linked and applies to heterogeneous teams. Other mechanisms are market-based, such as TraderBots [27], applied to greedy agents in order to provide a detailed analysis of the requirements for robust and efficient multirobot coordination in dynamic environments. From the point of view of Game Theory, some research activities [28] investigated the optimal coordination approach for multiagent foraging: they built the equivalence between the optimal solution of the MAS and the equilibrium of the game according to the same case, and then introduced an evolutionarily stable strategy to help resolve the equilibrium selection problem of traditional Game Theory.

3. Problem Description

In this section, we focus on the cooperation problem in which n pursuers situated in a limitary toroidal grid environment X have to capture m evaders of different types. The expressions P = {P_1, ..., P_n} and E = {E_1, ..., E_m} represent the collections of n pursuers and m evaders, respectively. Pursuers and evaders represent the roles that the agents can play. Each evader is characterized by a type Re, with Re ∈ {I, II, III, IV}, to indicate how many pursuers are required to capture it. Here we suppose that the pursuers can evaluate the evaders' types after localization. There exist some fixed obstacles of different shapes and sizes in the environment X. Positions are characterized by the mapping mp: X → {0, 1}, such that, for all x ∈ X, mp(x) = 1 means that x is an obstacle.
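The environment description above can be sketched as follows; the grid size and the obstacle cells are illustrative assumptions, not values from the paper:

```python
# Sketch of the toroidal grid environment X with the obstacle mapping mp.
# Cell values: 1 = obstacle, 0 = free.
GRID = 10  # illustrative size (the paper's experiments use a 100 x 100 grid)

obstacles = {(2, 3), (7, 7)}  # hypothetical fixed obstacle cells

def mp(x):
    """mp: X -> {0, 1}; mp(x) = 1 means that cell x is an obstacle."""
    return 1 if x in obstacles else 0

def wrap(cell):
    """Toroidal topology: coordinates wrap around the grid borders."""
    i, j = cell
    return (i % GRID, j % GRID)

print(mp((2, 3)))      # obstacle cell
print(mp((0, 0)))      # free cell
print(wrap((10, -1)))  # wraps to (0, 9)
```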

In our proposal, the strategies of each pursuer are guided by determining factors that reflect the individual development of the pursuer during the execution of the assigned tasks. These factors are detailed as follows.

Self-Confidence Degree. In multiagent systems, each agent must be able to execute the services requested by the other agents. The self-confidence degree is the assessment of the agent's success in relation to the assigned tasks. It is denoted and computed in the following way:

$$\forall\,\mathrm{Conf}\in[0.1,1],\qquad \mathrm{Conf}=\max\left(0.1,\ \frac{C_s}{C_t}\right) \tag{1}$$

C_s is the number of tasks that the agent has accomplished; C_t is the number of tasks in which the agent has participated.

The Credit. In the case where the agent cannot perform a task, its credit will be affected. The credit of an agent is designated and calculated as follows:

$$\forall\,\mathrm{Credit}\in[0,1],\qquad \mathrm{Credit}=\min\left(1,\ 1-\frac{C_b}{C_t-C_s}\right) \tag{2}$$

C_b is the number of tasks abandoned by the agent.

Environment Position. The position of the agent in the environment is a crucial criterion for the pursuit sequences, because the capture will be easier if the pursuer is closer to the evader. The position Pos is computed as follows:

$$\mathrm{Pos}=\mathrm{Dist}(S_P,S_E) \tag{3}$$

S_P is the state (cell) of the pursuer; S_E is the state (cell) of the evader; Dist is the distance between the pursuer and the evader:

$$\mathrm{Dist}(S_P,S_E)=\sqrt{(CC_{Pi}-CC_{Ei})^2+(CC_{Pj}-CC_{Ej})^2} \tag{4}$$

(CC_{Pi}, CC_{Pj}) are the Cartesian coordinates of the pursuer; (CC_{Ei}, CC_{Ej}) are the Cartesian coordinates of the evader.
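The three factors of equations (1), (2), and (4) can be sketched as follows; the task counters and cell coordinates are illustrative values:

```python
import math

def self_confidence(cs, ct):
    """Eq. (1): Conf = max(0.1, Cs/Ct), bounded to [0.1, 1]."""
    return max(0.1, cs / ct)

def credit(cb, ct, cs):
    """Eq. (2): Credit = min(1, 1 - Cb/(Ct - Cs))."""
    return min(1.0, 1.0 - cb / (ct - cs))

def dist(p, e):
    """Eq. (4): Euclidean distance between pursuer and evader cells."""
    return math.sqrt((p[0] - e[0]) ** 2 + (p[1] - e[1]) ** 2)

# Illustrative values: 8 tasks accomplished out of 10, 1 abandoned.
print(self_confidence(8, 10))    # 0.8
print(credit(1, 10, 8))          # 0.5
print(dist((40, 98), (43, 94)))  # 5.0
```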

In order to distinguish the different coalitions, each pursuer belonging to a coalition calculates the value returned to itself through this strategy. This computation is based on the factors characterizing the pursuers. For example, if a pursuer (P_1) belongs to the coalition Co, the value of this coalition in relation to this pursuer is calculated as follows:

$$\mathrm{Co}(\mathrm{val}_{P_1})=\frac{\mathrm{Coef}_1\,\mathrm{Conf}_1+\mathrm{Coef}_2\,\mathrm{Credit}_1+\mathrm{Coef}_3\,\mathrm{Pos}_1}{\sum_{k=1}^{3}\mathrm{Coef}_k}+\sum_{i=2}^{\mathrm{Re}}\frac{\mathrm{Coef}_1\,\mathrm{Conf}_i+\mathrm{Coef}_2\,\mathrm{Credit}_i+\mathrm{Coef}_3\,\mathrm{Pos}_i}{\mathrm{Re}\cdot\sum_{k=1}^{3}\mathrm{Coef}_k} \tag{5}$$

Coef_k is the coefficient of each factor.

On the basis of these values, and using the IEDS method, our mechanism will be able to select the optimal pursuit coalition for each evader detected, as detailed in Section 6.
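Equation (5) can be sketched as follows, assuming each pursuer is represented by a dict of its three factors and that all coefficients are equal; the factor values are illustrative:

```python
def coalition_value(members, coefs=(1.0, 1.0, 1.0), re=None):
    """Eq. (5): value of a coalition for its first member (P1).

    members: list of dicts with keys 'conf', 'credit', 'pos';
    members[0] is the pursuer evaluating the coalition.
    re: evader type (number of required pursuers); defaults to len(members).
    """
    re = re if re is not None else len(members)
    c1, c2, c3 = coefs
    total = c1 + c2 + c3

    def weighted(m):
        return c1 * m['conf'] + c2 * m['credit'] + c3 * m['pos']

    own = weighted(members[0]) / total                       # first summand
    others = sum(weighted(m) for m in members[1:]) / (re * total)
    return own + others

# Illustrative coalition of two pursuers for an evader of type Re = II.
p1 = {'conf': 0.8, 'credit': 0.5, 'pos': 0.9}
p2 = {'conf': 0.6, 'credit': 1.0, 'pos': 0.4}
print(round(coalition_value([p1, p2]), 4))  # 1.0667
```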

4. The Iterated Elimination of Dominated Strategies (IEDS)

The coalition is a set of pursuers required to capture the detected evaders. In the coalition, each pursuer must correspond to a specific strategy. In our proposal, a pure strategy s_i defines a specific pursuit-group integration that the pursuer will follow in every possible and attainable situation during the pursuit. Such coalitions may not be random or drawn from a distribution, as in the case of mixed strategies. A strategy str_i dominates another strategy str′_i if and only if, for every potential combination of the other players' actions str_{-i},

$$\mu_i(\mathrm{str}_i,\mathrm{str}_{-i})\ \ge\ \mu_i(\mathrm{str}'_i,\mathrm{str}_{-i}) \tag{6}$$

μ is the function that returns the result obtained through the application of a specific strategy.

Consider the strategic game shown in Table 1, where the column player has three pure strategies and the row player has only two (a). The values shown in each cell represent the expected payoffs returned to the players when the corresponding strategies are selected.

Table 1: Application of the IEDS technique. Each cell lists the payoffs (row player, column player); panels (a)-(d) of the original repeat this matrix, with bold fonts marking the dominated strategy deleted at each step.

          Left     Center    Right
Up        5, 4     3, 8      1, 5
Down      6, 6     6, 0      -3, -1

Playing Center is always better than playing Right for the column player. Consequently, we can assume he will eventually stop playing Right, because it is a dominated strategy (b), so we can ignore the Right column after its elimination. Now the row player has a dominated strategy, Up; eventually, the row player stops playing Up, and the Up row gets eliminated (c). Finally, we have two remaining choices, (Down, Left) and (Down, Center), and the column player notices that it can only win by playing Left (d). So we can deduce that the IEDS solution is (Down, Left), with payoff (6, 6).
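The elimination sequence above can be checked with a short sketch of the IEDS procedure (strict dominance between pure strategies) applied to the payoff bimatrix of Table 1:

```python
# Payoff bimatrix from Table 1: payoffs[row][col] = (row payoff, column payoff).
payoffs = {
    'Up':   {'Left': (5, 4), 'Center': (3, 8), 'Right': (1, 5)},
    'Down': {'Left': (6, 6), 'Center': (6, 0), 'Right': (-3, -1)},
}

def iterated_elimination(payoffs):
    """Iteratively delete strictly dominated pure strategies for both players."""
    rows = list(payoffs)
    cols = list(payoffs[rows[0]])
    changed = True
    while changed:
        changed = False
        # Row strategy r is dominated if some r2 is strictly better vs every column.
        for r in rows[:]:
            if any(all(payoffs[r2][c][0] > payoffs[r][c][0] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # Column strategy c is dominated symmetrically.
        for c in cols[:]:
            if any(all(payoffs[r][c2][1] > payoffs[r][c][1] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

print(iterated_elimination(payoffs))  # (['Down'], ['Left'])
```

The run reproduces the order described in the text: Right is deleted first, then Up, then Center, leaving the solution (Down, Left).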

5. Markov Decision Process Principles

Markov Decision Processes (MDPs) provide a mathematical framework to model decision making in situations where outcomes are partly random and partly under the control of a decision maker. In cooperative multiagent systems, an MDP allows the formalization of sequential decision problems. This process only models cooperative systems in which the reward function is shared by all players. An MDP is defined by the tuple ⟨N, S, A, T, R⟩ as follows:

N is the number of agents Ag_i in the system, i ∈ {1, ..., N}.
S corresponds to the set of agents' states s.
A = A_1 × A_2 × ... × A_N defines the set of joint actions of the agents, where A_i is the set of local actions of the agent Ag_i.
T is the transition function; it returns the probability T(s, a, s′) that the agent goes into the state s′ if it executes the joint action a ∈ A from state s.
R defines the reward function; R(s, a, s′) represents the reward obtained by the agent when it transits from the state s to the state s′ by the execution of the action a.
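The tuple above can be captured in an illustrative container; the field names and the toy dynamics below are our assumptions, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

State = tuple  # a grid cell (i, j)
Action = str   # e.g. 'up', 'down', 'left', 'right'

@dataclass
class MDP:
    """Container for the tuple <N, S, A, T, R> described above."""
    n_agents: int
    states: Sequence[State]
    actions: Sequence[Action]
    transition: Callable[[State, Action, State], float]  # T(s, a, s')
    reward: Callable[[State, Action, State], float]      # R(s, a, s')

# Illustrative instantiation with trivial dynamics.
toy = MDP(n_agents=2, states=[(0, 0), (0, 1)], actions=['up', 'down'],
          transition=lambda s, a, s2: 0.5, reward=lambda s, a, s2: 1.0)
print(toy.n_agents)  # 2
```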

5.1. Reward Function. In an MDP problem, the next states selected are the states returning the maximum definitive reward. In our proposal, we have used heuristic functions in order to calculate the immediate reward of each state. The reward function defines the goals that the pursuers have to achieve and identifies the environmental obstacles. To calculate this function, we relied on the agents' environment position detailed in Section 3, which allows a fair distribution of the rewards over the environmental cells. The reward in each state s concerned is calculated as follows:

$$R(s,a)=\begin{cases}\gamma & \text{if } E_i\subseteq s\\ 0 & \text{if } \mathrm{mp}(x)=1\\ \gamma-\mathrm{Val}(\mathrm{Dist}(CC_P,CC_E)) & \text{otherwise}\end{cases} \tag{7}$$

γ is the maximum reward; Val(Dist(CC_P, CC_E)) represents the distance value.

Regarding the distribution of the rewards in the standard cells, we note that the reward function is inversely proportional to the distance function.
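The reward function of equation (7) can be sketched as below; the value of γ and the conversion of a distance into a value (here, the identity) are illustrative choices:

```python
import math

GAMMA = 100.0  # maximum reward (illustrative)

def val(distance):
    """Hypothetical conversion of a distance into a reward penalty."""
    return distance

def reward(cell, evader_cell, is_obstacle):
    """Eq. (7): gamma on the evader's cell, 0 on obstacles,
    and gamma minus the distance value elsewhere."""
    if cell == evader_cell:
        return GAMMA
    if is_obstacle:
        return 0.0
    return GAMMA - val(math.dist(cell, evader_cell))

print(reward((40, 98), (40, 98), False))  # 100.0 on the evader's cell
print(reward((37, 94), (40, 98), False))  # 95.0 at distance 5
print(reward((33, 91), (40, 98), True))   # 0.0 on an obstacle
```

The decreasing-with-distance pattern mirrors the cell values displayed in Figure 1.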

Figure 1 illustrates a part of our simulation environment, detailed in Section 7. The values displayed in the different cells, [V_1, V_2, V_3], represent the gains generated by the reward function. The dynamic rewards will be awarded to any pursuer situated in the cell concerned during the pursuit:

V_1: the reward that could be obtained if the pursuer concerned tracks the first evader.
V_2: the reward that could be obtained if the pursuer concerned tracks the second evader.
V_3: the index of the cell (occupied or free).

5.2. Transition Function. The transition probabilities (ρ) describe the dynamics of the environment. They play the role of the next-state function in a problem-solving search, knowing that every state could be the possible next state according to the action undertaken in the current state. Our approach is developed in a grid-of-cells environment where each agent can move to four different states: s_up, s_down, s_left, and s_right.

The transition probabilities of the pursuers are based on the reward degree, as shown:

$$\sum_{s'}\rho(s'\mid s,a)=1,\qquad \rho(s'\mid s,a)=\frac{R(s',a)}{\gamma},$$
$$\rho(s'\mid s,a)=\max\bigl(\rho(s\mid s,a),\ \rho(s_{\mathrm{up}}\mid s,a),\ \rho(s_{\mathrm{down}}\mid s,a),\ \rho(s_{\mathrm{right}}\mid s,a),\ \rho(s_{\mathrm{left}}\mid s,a)\bigr)\quad\forall s,a \tag{8}$$
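A sketch of equation (8): each candidate state's probability is proportional to its reward normalized by γ (with an explicit re-normalization so the probabilities sum to 1), and the pursuer moves to the state of maximum probability. The reward values are illustrative:

```python
GAMMA = 100.0  # maximum reward (illustrative, as in eq. (7))

def transition_probabilities(rewards):
    """Eq. (8): rho(s'|s,a) proportional to R(s',a)/gamma, normalized so that
    the probabilities over the candidate states sum to 1."""
    raw = {s: r / GAMMA for s, r in rewards.items()}
    z = sum(raw.values())
    return {s: p / z for s, p in raw.items()}

def next_state(rewards):
    """The selected next state maximizes rho, per the last line of eq. (8)."""
    probs = transition_probabilities(rewards)
    return max(probs, key=probs.get)

# Illustrative rewards of the current cell and its four neighbours.
rewards = {'stay': 95.0, 'up': 96.0, 'down': 94.0, 'left': 95.0, 'right': 97.0}
print(next_state(rewards))  # 'right'
```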


[Figure 1: Reward function applied to the grid environment. Cells with a red frame: the selected states; blue agents: pursuers; green agents: evaders; black cells: cells containing obstacles.]

The linkages between the evader and each pursuer shown in Figure 2 reflect the optimal trajectories provided by the application of the method proposed in this section during each pursuit step.

6. Coalition Formation Algorithm Based on IEDS

A number of coalition formation algorithms have been developed to define which of the potential coalitions should actually be formed. To do so, they typically compute a value for each coalition, known as the coalition value, which provides an indication of the expected results that could be derived if this coalition is constituted. Then, having calculated all the coalitional values, the optimal coalition to form can be selected. We employ an iterative algorithm in order to determine the optimal coalitions of agents: it begins with a complete set of coalitions (agent-strategy combinations) and iteratively eliminates the coalitions that have lower contribution values to the MAS efficiency. The pseudocode of our algorithm is shown in Algorithm 1.

First, the algorithm calculates all the possible coalitions (Nbr_cl) that the pursuers can form, before their filtration as needed. The expected number of possible coalitions to form is calculated according to the following:

$$\mathrm{Nbr_{cl}}=\frac{n!}{(n-\mathrm{Re}_1)!\,\mathrm{Re}_1!}\times\frac{(n-\mathrm{Re}_1)!}{(n-(\mathrm{Re}_1+\mathrm{Re}_2))!\,\mathrm{Re}_2!}\times\cdots\times\frac{(n-(\mathrm{Re}_1+\cdots+\mathrm{Re}_{N-1}))!}{(n-(\mathrm{Re}_1+\mathrm{Re}_2+\cdots+\mathrm{Re}_N))!\,\mathrm{Re}_N!}=\prod_{j=1}^{N}\frac{\left(n-\sum_{k=0}^{j-1}\mathrm{Re}_k\right)!}{\left(n-\sum_{k=0}^{j}\mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{9}$$

n is the number of pursuers in the environment; N is the number of evaders detected; Re_0 = 0.

In order to distribute the calculation of the possible coalitions among the pursuers, the possible general coalitions (Ω) will be calculated first. A general coalition enrolls all the pursuers required to capture the set of evaders detected.

Algorithm 1:
    n: the number of pursuers
    i = 0; k = 0
    j: indicator of the chase iteration
    Calculate the possible coalitions
    while (C_life > 0) do
        Calculate the value of each coalition
        while (number of coalitions > 1) do
            Eliminate the dominated strategy of P_i
            i ← i mod n + 1
        end while
        Assign the pursuers' roles according to the selected coalition
        Chase iteration
    end while
    if (capture = true) then
        while (k ≤ n) do
            Update(Reward_{P_k})
            k++
        end while
    else
        The guilty pursuers pay some fines
    end if
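The inner elimination loop of Algorithm 1 (each pursuer, in turn, dropping the remaining coalition of lowest value to itself until one coalition survives) can be sketched as follows; the coalition values are illustrative:

```python
def select_coalition(values, n_pursuers):
    """Round-robin elimination: values[c][p] is the value of coalition c
    for pursuer p (from eq. (5)); pursuers alternately eliminate their
    worst remaining coalition until a single coalition remains."""
    remaining = set(values)
    i = 0
    while len(remaining) > 1:
        p = i % n_pursuers
        # Pursuer p eliminates the remaining coalition of lowest value to itself.
        worst = min(remaining, key=lambda c: values[c][p])
        remaining.discard(worst)
        i += 1
    return remaining.pop()

# Three candidate coalitions valued by two pursuers (illustrative numbers).
values = {'C1': [0.9, 0.2], 'C2': [0.5, 0.8], 'C3': [0.4, 0.7]}
print(select_coalition(values, 2))  # 'C2'
```

Here P_1 first drops C3 (its worst), then P_2 drops C1, leaving C2 as the selected coalition.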

$$\Omega=\frac{n!}{(n-\lambda)!\,\lambda!} \tag{10}$$

where λ = Re_1 + Re_2 + ... + Re_N.

The general coalitions generated will be equitably distributed among the agents playing the role Pursuer. Specifically, each general coalition will be composed of N pursuit groups.

[Figure 2: Pursuers' behaviors prediction after the transition function application.]

From each general coalition generated through the preceding calculation, equation (10), a number of possible coalition formations (Φ) will be computed:

$$\Phi=\frac{\lambda!}{(\lambda-\mathrm{Re}_1)!\,\mathrm{Re}_1!}\times\frac{(\lambda-\mathrm{Re}_1)!}{(\lambda-(\mathrm{Re}_1+\mathrm{Re}_2))!\,\mathrm{Re}_2!}\times\cdots\times\frac{(\lambda-(\mathrm{Re}_1+\cdots+\mathrm{Re}_{N-1}))!}{(\lambda-(\mathrm{Re}_1+\mathrm{Re}_2+\cdots+\mathrm{Re}_N))!\,\mathrm{Re}_N!}=\prod_{j=1}^{N}\frac{\left(\lambda-\sum_{k=0}^{j-1}\mathrm{Re}_k\right)!}{\left(\lambda-\sum_{k=0}^{j}\mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{11}$$

$$\mathrm{Nbr_{cl}}=\Omega\times\Phi=\frac{n!}{(n-\lambda)!\,\lambda!}\times\prod_{j=1}^{N}\frac{\left(\lambda-\sum_{k=0}^{j-1}\mathrm{Re}_k\right)!}{\left(\lambda-\sum_{k=0}^{j}\mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{12}$$
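Equations (9)-(12) can be checked numerically. With the simulation setting of Section 7 (n = 10 pursuers and two evaders of type Re = IV), the factorization Ω × Φ reproduces Nbr_cl:

```python
from math import factorial, prod

def nbr_cl(n, re):
    """Eq. (9): total number of possible coalition formations, as a product
    of multinomial terms over the detected evaders (Re_0 = 0)."""
    cum = [0]
    for r in re:
        cum.append(cum[-1] + r)  # cumulative sums of the Re_k
    return prod(
        factorial(n - cum[j - 1]) // (factorial(n - cum[j]) * factorial(re[j - 1]))
        for j in range(1, len(re) + 1))

def omega(n, lam):
    """Eq. (10): number of general coalitions of lambda pursuers."""
    return factorial(n) // (factorial(n - lam) * factorial(lam))

def phi(lam, re):
    """Eq. (11): coalition formations inside one general coalition
    (same product as eq. (9), with lambda pursuers)."""
    return nbr_cl(lam, re)

n, re = 10, [4, 4]  # Section 7 setting: two evaders of type IV
lam = sum(re)
print(nbr_cl(n, re))                                  # 3150
print(omega(n, lam))                                  # 45
print(phi(lam, re))                                   # 70
print(omega(n, lam) * phi(lam, re) == nbr_cl(n, re))  # True, eq. (12)
```

The 45 general coalitions and the 70 formations per general coalition are consistent with the per-pursuer workloads listed in Table 2 (5 or 4 general coalitions each, i.e., 350 or 280 possible coalitions).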

This decentralized technique aims to balance the computation of the possible coalition formations among the pursuers; it is further detailed in Section 7 via its application to the case study. Note that the value of each coalition generated, in relation to each pursuer it contains, is calculated according to (5). Each pursuer shares the coalitions calculated with the others to start the coalition selection process. Secondly, we apply the Iterated Elimination of Dominated Strategies principle with the aim of finding the optimal coalition through this process, knowing that each strategy is represented by a possible coalition formation. Alternately, each pursuer eliminates the coalition with the lowest value in relation to itself and sends the update to the next pursuer concerned. Pursuers are assigned in accordance with the selected coalition, and each pursuer performs only one chase iteration. The algorithm repeats these instructions until the end of the chase life. When C_life = 0 and the captures are accomplished, rewards are attributed to each of the participating pursuers; the rewards are determined as follows:

$$\mathrm{Rewards}_p=\frac{R(s,a)}{L} \tag{13}$$

L is the number of the coalition's members.

Otherwise, in the case of capture failure, the guilty pursuers must pay fines to the rest of the coalition's members. These fines are calculated in the following manner:

\gamma = (s_0, a_1, s_1, a_2, s_2, \ldots, s_h, a_h), \qquad \mathrm{Fines} = \sum_{i=w}^{h-1} R(s_i, a_{i+1}) \quad (14)
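The bookkeeping in (13) and (14) can be sketched as follows; the function names and numeric values are purely illustrative:

```python
def capture_reward_share(r_sa, members):
    # (13): the capture reward R(s, a) is divided equally among
    # the L members of the coalition
    return r_sa / members

def failure_fine(path_rewards, w):
    # (14): a guilty pursuer refunds the rewards R(s_i, a_{i+1}) collected
    # along its state sequence gamma, from the coalition's beginning
    # (index w) to the end of the chase
    return sum(path_rewards[w:])

share = capture_reward_share(100.0, 4)       # each of 4 members receives 25.0
fine_total = failure_fine([5.0, 7.0, 9.0], 1)  # refund of 16.0
print(share, fine_total)
```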

Mathematical Problems in Engineering 7

Table 2: The distribution of the possible coalitions' computation.

Pursuers                       P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
General coalitions              5    5    5    5    5    4    4    4    4    4
Possible coalitions generated  350  350  350  350  350  280  280  280  280  280

[Figure 3: Flow chart of the algorithm. Nodes: agents' localization → possible coalitions' calculation → value of coalitions' calculation → dominated strategy's elimination (looped while Nbr_cl > 1) → pursuers' assignment → chase iteration → capture test; when C_life = 0, rewards are attributed on capture and fines otherwise.]

γ is the set of states regarding the guilty pursuer, and 0 ≤ w ≤ h, where w represents the index of the coalition's beginning.

Figure 3 shows the flow chart of this pursuit algorithm, summarizing the different steps explained in this section, from the detection to the capture of the existing evaders.

7 Simulation Experiments

In order to evaluate the approach presented in this paper, we run our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid of 100 × 100 cells. The environment also contains some obstacles, characterized by their constancy and solidity. As regards the environmental agents, our simulations are based on ten (10) pursuers and two (2) evaders of type Re = IV; Figure 4 specifically details how an evader of this type can be captured. Each agent is marked with an ID number. Both pursuers and evaders have the same speed (one cell per iteration) and an excellent communication system. The pursuers' teams are fully capable of determining their actual positions, and the evaders disappear after the capture is accomplished. When the capture of an evader is performed, the coalition created for this pursuit is automatically dissolved.

Table 2 summarizes the results obtained after applying the decentralized computation of the possible coalitions to this case study, according to the process explained in Section 6. In this case, and according to (10), there are 45 possible general coalitions (Ω), which are distributed among the existing pursuers as shown in Table 2. From each general coalition, 70 coalitions are generated according to (11).

Moreover, we have studied the number of possible coalitions generated in parallel by the pursuers in relation to the number of existing pursuers, as shown in Figure 5. Compared with the centralized method, in which only one pursuer computes the possible coalitions, the decentralized method significantly decreases the computation time by dividing it among the existing pursuers.
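The paper does not spell out the balancing rule; a plausible minimal sketch that reproduces the split of Table 2 is a round-robin distribution of the 45 general coalitions over the 10 pursuers (the function name and the rule itself are assumptions):

```python
def distribute(num_general_coalitions, num_pursuers):
    # round-robin split: the first (num % pursuers) pursuers get one extra
    base, extra = divmod(num_general_coalitions, num_pursuers)
    return [base + 1 if i < extra else base for i in range(num_pursuers)]

shares = distribute(45, 10)
print(shares)                    # [5, 5, 5, 5, 5, 4, 4, 4, 4, 4], as in Table 2
print([s * 70 for s in shares])  # 350/280 possible coalitions per pursuer
```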

In order to vary the types of coordination mechanisms used in our simulations, we compare this work with our recent pursuit-evasion research based on the AGR organizational model [6], as well as with the results achieved by the auction mechanism illustrated in Case-C- [8]. Note that these two methods are based on decentralized coalition formation.

Case-A- is a pursuit based on the AGR organizational model [6].

Case-B- is our new approach based on the Iterated Elimination of Dominated Strategies (IEDS) principle.

Case-C- is a pursuit based on an economical auction mechanism (MPMEGBTBA) [8].

The results shown in Figure 6 represent the average capturing time achieved during forty (40) different simulation case studies (episodes), from the beginning to the end of each one. In order to showcase the differences between the cases, we take into consideration the iteration concept, which determines the number of state changes of each agent during the pursuits.

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. Furthermore, we note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate attribution of roles provided by this auction mechanism. However, the results obtained through the application of the IEDS


[Figure 4: Example evader of the type Re = IV after the capture. The figure shows a grid of cell entries of the form [i, j, flag] around the captured evader.]

[Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers. Axes: number of pursuers (10–15) vs. number of possible coalitions (0–600,000); curves for the decentralized and centralized methods.]

coalition formation algorithm revealed an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formation and reformation of the pursuit teams.

Finally, we have focused on the average pursuers' rewards obtained at each chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of rewards achieved by a pursuer, and the x-axis represents the chase iterations. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA,

[Figure 6: Average capturing time after (40) different pursuits. Axes: time in episodes (1–40) vs. average capturing time in iterations (40–200); curves for Case-A-, Case-B-, and Case-C-.]

in which the average pursuer's rewards achieved reach 0.59 and 0.507, respectively. Otherwise, in IEDS, the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the various coordination mechanisms applied. Moreover, we can note the superiority of the results obtained through IEDS in relation to the other cases, provoked by the


[Figure 7: The pursuers' rewards development. Axes: time in iterations (1–78) vs. pursuers' rewards development (30–120); curves for Case-A-, Case-B-, and Case-C-.]

[Figure 8: Average pursuers' reward per iteration, with one panel per case (Case-A-, Case-B-, Case-C-). Axes: time in iterations (0–50) vs. average pursuers' rewards obtained (−1.7 to 3.4).]

Table 3: Pursuit results.

                                                  AGR      IEDS    MPMEGBTBA
Average capturing time (iterations)              144.225    78      100.57
Average pursuers' rewards obtained per iteration   0.59      0.88     0.507
Average pursuers' self-confidence development      0.408     0.533    0.451

[Figure 9: Pursuers' learning development during the pursuit. Axes: pursuit development in percent (0–100) vs. pursuers' self-confidence development (0–8); curves for Case-A-, Case-B-, and Case-C-.]

dynamism of the coalition formations and the optimality of the task sharing provided by our algorithm.

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms both the algorithm based on the AGR organizational model and the auction mechanism based on MPMEGBTBA, regarding the reward development as well as the capturing time. The leading cause is the dynamism of our coalitional groups: this flexible mechanism improves the intelligence of the pursuers concerning displacements and reward acquisition, knowing that the team reward is optimal when each pursuer undertakes the best path.

8 Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we have built our coalition algorithm on the Iterated Elimination of Dominated Strategies; this process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we have focused on the Markov Decision Process as a motion


strategy of our pursuers in the environment (a grid of cells). To highlight our proposal, we have developed a comparative study between our algorithm, a decentralized coalition strategy based on the AGR organizational model, and an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512–516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225–230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Müller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214–230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1–13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075–1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235–244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804–4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875–884, 2005.

[11] J. Thunberg, P. Ögren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506–4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165–172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482–486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272–2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452–1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155–2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107–110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413–420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154–159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347–1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57–65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215–1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478–485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709–715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541–558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: A New Paradigm for Robust and Efficient Multirobot Coordination in Dynamic Environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, Multiagent Systems, InTech, Rijeka, Croatia, 2009.



evaders are presented in different types. The type of a pursuer denotes its pursuit capacity, whereas the type of an evader reflects the number and type of pursuers required to perform its capture. The value of an evader indicates the expected rewards that should be returned to the relevant pursuers after the achievement of the capture.

Game Theory can be considered as the simplest way to model situations of conflict; it studies the interactions between interested agents. The classic question relating game theory to multiagent systems is "what is the best action that an agent can perform?" This principle has been widely used in multiagent pursuit problems [14–17]. Negotiation based on Game Theory is focused on the value and rewards of each agent, which appropriately reflect the objective of the agent's negotiation (satisfying the goal of each agent). The main advantage provided by Game Theory algorithms in MAS is the coordination appearing through the hypothesis of mutual rationality of the agents. Therefore, Game Theory algorithms can coordinate autonomous rational agents without a coordination mechanism explicitly integrated in the agents' model. They also provide different methods that define the optimal agents' coalitions in several types of problems. On the other hand, the disadvantages of these algorithms concern the agents, which are frequently considered as perfectly rational. Moreover, Game Theory algorithms focus on the value of the optimal solution and overlook the most efficient method to achieve it.

In this paper, we have focused on the Iterated Elimination of Dominated Strategies (IEDS) Game Theory technique to propose a coalition formation algorithm for pursuit-evasion problems. A strategy is the complete specification of the agent's behavior in any situation (in the case of an extensive form game, it specifies what behavior the agent must undertake according to the set of information provided). Moreover, we have used Markov Decision Process (MDP) principles in order to control the motion strategy of each agent. MDP provides a formalism to model and resolve planning and learning problems under uncertainty [18, 19].

The paper is organized as follows. In Section 2, we discuss the main related works based on the same principles used in this paper. In Section 3, we focus on the pursuit-evasion problem, giving a detailed explanation of the simulation environment and its different contents; we also introduce the environmental agents by defining and clarifying the principal characteristics of pursuers and evaders. In Section 4, the Iterated Elimination of Dominated Strategies principle is described and illustrated with an application example of this Game Theory process. In Section 5, the basic principles of the Markov Decision Process are motivated; we clarify the principles of the primary functions, better known as the reward and transition functions. In Section 6, we introduce the distributed coalition formation algorithm with a detailed clarification of the coalition progress. A simulation of the pursuit-evasion game example is shown in Section 7, in which we describe our simulation environment in a specific manner and present the results achieved in comparison with outcomes based on other theories. Finally, Section 8 contains concluding remarks.

2 Related Work

Many works based on game-theoretic principles address the PE problem, such as [14], where the authors described the control of an autonomous agents' team tracking an intelligent evader in a non-accurately mapped terrain, based on a method that calculates the Nash equilibrium policies by resolving an equivalent zero-sum matrix game. In this example, among all Nash equilibria, the evader selects the one which optimizes its deterministic distance to the pursuers' team. In order to resolve the problems often encountered in the algorithms of pursuit-evasion games, such as computational complexity and the lack of universality, Dong et al. [15] propose a hybrid algorithm founded on an improved dynamic artificial potential field and differential game, where the Nash equilibrium solution is optimal for both pursuer and evader in a barrier-free zone, and the algorithm is applied flexibly in accordance with environment changes around the pursuit elements. Moreover, in [16] Amigoni and Basilico have presented an approach to calculate the optimal pursuer's strategy that maximizes the probability of the target's capture in a given environment. This approach is based on the definition of a pursuit-evasion game-theoretic model as well as on its resolution through mathematical programming.

More recently, Lin et al. [20] proposed a pursuit-evasion differential game based on Nash strategies involving limited observations. On the one hand, the evader undertakes the standard feedback Nash strategy; on the other hand, the pursuers undertake Nash strategies based on the proposed novel concept of best achievable performance indices. This model has potential applications in cases where several weakly equipped pursuing vehicles are tracking a well-equipped unmanned vehicle.

In relation to PE, MDP is usually used to provide the motion planning for the mobile pursuers through the maximization of the rewards obtained during the pursuit. In [21], a Partially Observable Markov Decision Process (POMDP) algorithm is used to search for a mobile target in a known graph; the main objective is to ensure the capture of the targets via the clearing of the graph in minimal time. In [22], the authors propose a new approach, the Continuous-Time Markov Decision Process (CTMDP), to address the PE problem. In contrast to MDP, CTMDP takes into account the impact of the transition time between the states, yielding strong robustness against changes in transition probability. In [23], the authors proposed an innovative approach totally based on MDP with the aim of resolving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. In other words, they determined a value iteration algorithm to compute optimal policies that recognizes and reasons about Coordination Problems.

Furthermore, we can consider other works based on MDP and IEDS, such as [24], in which an exact dynamic programming algorithm for partially observable stochastic games (POSGs) is developed. It is also proven that the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the


game when it is applied to finite-horizon POSGs. Otherwise, several other types of coordination mechanisms are currently used, such as Stochastic Clustering Auctions (SCAs) [25, 26], a class of cooperative auction methods based on the modified Swendsen-Wang method; it permits each robot to reconstitute the tasks that have been linked and applies to heterogeneous teams. Other mechanisms are market-based, such as TraderBots [27], applied to greedy agents in order to provide a detailed analysis of the requirements for robust and efficient multirobot coordination in dynamic environments. From the point of view of Game Theory, some research activities [28] investigated the optimal coordination approach for multiagent foraging; indeed, they built the equivalence between the optimal solution of the MAS and the equilibrium of the game for the same case, and then introduced an evolutionarily stable strategy to help resolve the equilibrium selection problem of traditional Game Theory.

3 Problem Description

In this section, we focus on the cooperation problem in which n pursuers situated in a limitary toroidal grid environment X have to capture m evaders of different types. The expressions P = {P_1, ..., P_n} and E = {E_1, ..., E_m} represent the collections of n pursuers and m evaders, respectively. Pursuers and evaders represent the roles that the agents can play. Each evader is characterized by a type Re, with Re ∈ {I, II, III, IV}, indicating how many pursuers are required to capture it. Here, we suppose that the pursuers can evaluate the evaders' types after localization. There exist some fixed obstacles with different shapes and sizes in the environment X. Obstacle positions are given by the mapping mp: X → {0, 1}, such that, for all x ∈ X, if mp(x) = 1 then x is an obstacle.

In our proposal, the strategies of each pursuer are guided by determining factors that reflect the individual development of the pursuer during the execution of the assigned tasks. These factors are detailed as follows.

Self-Confidence Degree. In multiagent systems, each agent must be able to execute the services requested by the other agents. The self-confidence degree is the assessment of the agent's success in relation to the assigned tasks. It is denoted and computed in the following way:

\forall \mathrm{Conf} \in [0.1, 1], \quad \mathrm{Conf} = \max\left(0.1, \frac{C_s}{C_t}\right) \quad (1)

C_s is the number of tasks that the agent has accomplished; C_t is the number of tasks in which the agent has participated.

The Credit. In the case where the agent cannot perform a task, its credit is affected. The credit of an agent is designated and calculated as follows:

\forall \mathrm{Credit} \in [0, 1], \quad \mathrm{Credit} = \min\left(1, 1 - \frac{C_b}{C_t - C_s}\right) \quad (2)

C_b is the number of tasks abandoned by the agent.

Environment Position. The position of the agent in the environment is a crucial criterion for the pursuit sequences, because the capture is easier when the pursuer is closer to the evader. The position Pos is computed as follows:

\mathrm{Pos} = \mathrm{Dist}(S_P, S_E) \quad (3)

S_P is the state (cell) of the pursuer; S_E is the state (cell) of the evader; Dist is the distance between the pursuer and the evader:

\mathrm{Dist}(S_P, S_E) = \sqrt{(CC_{Pi} - CC_{Ei})^2 + (CC_{Pj} - CC_{Ej})^2} \quad (4)

(CC_{Pi}, CC_{Pj}) are the Cartesian coordinates of the pursuer; (CC_{Ei}, CC_{Ej}) are the Cartesian coordinates of the evader.

In order to distinguish the different coalitions, each pursuer belonging to a coalition calculates the value returned to itself through this strategy. This computation is based on the factors characterizing the pursuers. For example, if a pursuer P_1 belongs to the coalition Co, the value of this coalition in relation to this pursuer is calculated as follows:

\mathrm{Co}(\mathrm{val}_{P_1}) = \frac{\mathrm{Coef}_1 \cdot \mathrm{Conf}_1 + \mathrm{Coef}_2 \cdot \mathrm{Credit}_1 + \mathrm{Coef}_3 \cdot \mathrm{Pos}_1}{\sum_{k=1}^{3} \mathrm{Coef}_k} + \sum_{i=2}^{\mathrm{Re}} \frac{\mathrm{Coef}_1 \cdot \mathrm{Conf}_i + \mathrm{Coef}_2 \cdot \mathrm{Credit}_i + \mathrm{Coef}_3 \cdot \mathrm{Pos}_i}{\mathrm{Re} \cdot \sum_{k=1}^{3} \mathrm{Coef}_k} \quad (5)

Coef_k is the coefficient of each factor.

On the basis of these values, and using the IEDS method, our mechanism is able to select the optimal pursuit coalition for each detected evader, as detailed in Section 6.
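As an illustration, the factors (1)–(4) and the coalition value (5) can be sketched as below. The member factor values and coefficients are hypothetical, and the code follows (5) literally (the pursuer's own factors weighted fully, the partners' factors averaged over Re):

```python
from math import sqrt

def self_confidence(cs, ct):
    # (1): ratio of accomplished to assigned tasks, floored at 0.1
    return max(0.1, cs / ct)

def credit(cb, ct, cs):
    # (2): penalizes tasks abandoned among the unaccomplished ones
    return min(1.0, 1.0 - cb / (ct - cs))

def position(pursuer_cell, evader_cell):
    # (3)-(4): Euclidean distance between the pursuer's and evader's cells
    return sqrt((pursuer_cell[0] - evader_cell[0]) ** 2
                + (pursuer_cell[1] - evader_cell[1]) ** 2)

def coalition_value(members, coefs):
    # (5): members[0] is the pursuer itself; the remaining Re - 1 tuples
    # are the partners, each divided by Re times the coefficient sum
    s = sum(coefs)
    re = len(members)
    conf, cred, pos = members[0]
    value = (coefs[0] * conf + coefs[1] * cred + coefs[2] * pos) / s
    for conf, cred, pos in members[1:]:
        value += (coefs[0] * conf + coefs[1] * cred + coefs[2] * pos) / (re * s)
    return value

# hypothetical two-member coalition for an evader of type Re = II
members = [(0.8, 0.9, 5.0), (0.6, 0.7, 8.0)]
value = coalition_value(members, (1.0, 1.0, 1.0))
print(value)
```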

4 The Iterated Elimination of Dominated Strategies (IEDS)

The coalition is a set of pursuers required to capture the detected evaders. In the coalition, each pursuer must correspond to a specific strategy. In our proposal, a pure strategy s_i defines a specific pursuit group integration that the pursuer will follow in every possible and attainable situation during the pursuit. Such coalitions may not be random or drawn from a distribution, as in the case of mixed strategies. A strategy str_i dominates another strategy str'_i if and only if, for every potential combination of the other players' actions str_{-i},

\mu_i(\mathrm{str}_i, \mathrm{str}_{-i}) \ge \mu_i(\mathrm{str}'_i, \mathrm{str}_{-i}) \quad (6)

μ is a function that returns the results obtained through the application of a specific strategy.

Consider the strategic game shown in Table 1, where the column player has three pure strategies and the row player has only two (a). The values shown in each cell represent the expected payoffs returned to the players when the corresponding strategies are selected. Playing Center is always better than playing Right for the column


Table 1: Application of the IEDS technique. Panels (a)–(d) of the original repeat the same payoff matrix, with bold fonts marking the dominated strategies as they are deleted step by step. The payoffs (row player, column player) are:

         Left    Center    Right
Up       5, 4    3, 8      1, 5
Down     6, 6    6, 0      −3, −1

player. Consequently, we can assume he will eventually stop playing Right because it is a dominated strategy (b), so we can ignore the Right column after its elimination. Now the row player has a dominated strategy, Up; eventually the row player stops playing Up, and the Up row gets eliminated (c). Finally, we have two remaining choices, Down-Left and Down-Center, and the column player notices that it can only win by playing Left (d). So we deduce that the IEDS solution is (Down, Left), with the payoff (6, 6).
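The elimination procedure can be sketched for this example. A minimal implementation using strict domination over pure strategies (a simplification of the weak-domination condition (6), which avoids order-dependence):

```python
def ieds(payoffs, rows, cols):
    # Iterated elimination of strictly dominated pure strategies for a
    # two-player game; payoffs[row][col] -> (row payoff, column payoff)
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        # eliminate row strategies dominated for the row player
        for r in list(rows):
            if any(all(payoffs[o][c][0] > payoffs[r][c][0] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        # eliminate column strategies dominated for the column player
        for c in list(cols):
            if any(all(payoffs[r][o][1] > payoffs[r][c][1] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols

game = {
    "Up":   {"Left": (5, 4), "Center": (3, 8), "Right": (1, 5)},
    "Down": {"Left": (6, 6), "Center": (6, 0), "Right": (-3, -1)},
}
rows_left, cols_left = ieds(game, game.keys(), ["Left", "Center", "Right"])
print(rows_left, cols_left)  # ['Down'] ['Left'], the IEDS solution of Table 1
```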

5 Markov Decision Process Principles

Markov Decision Processes (MDPs) provide a mathematical framework to model decision making in situations where outcomes are partly random and partly under the control of a decision maker. In cooperative multiagent systems, MDP allows the formalization of sequential decision problems. This process models only cooperative systems in which the reward function is shared by all players. An MDP is defined by the tuple ⟨N, S, A, T, R⟩ as follows:

N is the number of agents Ag_i in the system, i ∈ {1, ..., N}.

S corresponds to the set of agents' states s.

A = A_1 × A_2 × ⋯ × A_N defines the set of joint actions of the agents, where A_i is the set of local actions of the agent Ag_i.

T is the transition function; it returns the probability T(s, a, s') that the agent goes into the state s' if it runs the joint action a ∈ A from state s.

R defines the reward function; R(s, a, s') represents the reward obtained by the agent when it transits from the state s to the state s' by the execution of the action a.

5.1. Reward Function. In an MDP, the next states selected are those returning the maximum reward. In our proposal, we have used heuristic functions to calculate the immediate reward of each state. The reward function defines the goals that the pursuers have to achieve and identifies the environmental obstacles. To calculate this function, we relied on the agents' environment position detailed in Section 3, which allows a fair distribution of the rewards over the environmental cells. The reward of each state $s$ concerned is calculated as follows:

$$R(s, a) = \begin{cases} \gamma & \text{if } E_i \subseteq s \\ 0 & \text{if } \mathrm{mp}(x) = 1 \\ \gamma - \mathrm{Val}(\mathrm{Dist}(\mathrm{CC}_P, \mathrm{CC}_E)) & \text{otherwise} \end{cases} \quad (7)$$

where $\gamma$ is the maximum reward and $\mathrm{Val}(\mathrm{Dist}(\mathrm{CC}_P, \mathrm{CC}_E))$ represents the distance value.

Regarding the distribution of the rewards in the standard cells, we note that the reward function is inversely proportional to the distance function.

Figure 1 illustrates a part of our simulation environment, detailed in Section 7. The values displayed in the different cells, $[V_1, V_2, V_3]$, represent the gains generated by the reward function. These dynamic rewards are awarded to any pursuer situated in the cell concerned during the pursuit:

$V_1$ is the reward obtained if the pursuer concerned tracks the first evader.
$V_2$ is the reward obtained if the pursuer concerned tracks the second evader.
$V_3$ is the index of the cell (occupied or free).
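A minimal sketch of the reward rule of (7), under stated assumptions: we set $\gamma = 100$ and let $\mathrm{Val}(\cdot)$ round the Euclidean distance of (4); both choices are illustrative, since the section leaves them open.

```python
import math

GAMMA = 100  # assumed maximum reward (illustrative)

def reward(cell, evader_cell, is_obstacle):
    """Immediate reward of a candidate state while chasing one evader."""
    if is_obstacle:           # mp(x) = 1: obstacle cells earn nothing
        return 0
    if cell == evader_cell:   # the evader occupies the state: maximum reward
        return GAMMA
    dist = math.hypot(cell[0] - evader_cell[0], cell[1] - evader_cell[1])
    return GAMMA - round(dist)   # reward decays with distance to the evader

# The closer the pursuer's cell is to the evader, the larger the reward.
print(reward((3, 4), (3, 4), False))   # 100
print(reward((0, 0), (3, 4), False))   # 95  (Euclidean distance 5)
print(reward((0, 0), (3, 4), True))    # 0
```

This reproduces the pattern of Figure 1, where each cell's entries decrease as the cell gets farther from the corresponding evader.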

5.2. Transition Function. The transition probabilities ($\rho$) describe the dynamism of the environment. They play the role of the next-state function in a problem-solving search, knowing that every state could be the possible next state according to the action undertaken in the current state. Our approach is developed in a grid-of-cells environment where each agent can move to four different states: $s_{\mathrm{up}}$, $s_{\mathrm{down}}$, $s_{\mathrm{left}}$, and $s_{\mathrm{right}}$.

The transition probabilities of the pursuers are based on the reward degree, as shown:

$$\sum_{s'} \rho(s' \mid s, a) = 1, \qquad \rho(s' \mid s, a) = \frac{R(s', a)}{\gamma},$$

$$\rho(s' \mid s, a) = \max\big(\rho(s \mid s, a),\, \rho(s_{\mathrm{up}} \mid s, a),\, \rho(s_{\mathrm{down}} \mid s, a),\, \rho(s_{\mathrm{right}} \mid s, a),\, \rho(s_{\mathrm{left}} \mid s, a)\big) \quad \forall s, a \quad (8)$$
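A self-contained sketch of the transition rule of (8): each candidate next state (the current cell and its four neighbours) is scored by its reward divided by $\gamma$, the scores are normalised so they sum to 1, and the pursuer selects the most probable state. The reward values below are illustrative, not taken from the paper, and at least one candidate is assumed to have a positive reward.

```python
GAMMA = 100  # assumed maximum reward (illustrative)

def next_state(state, rewards):
    """Pick the next state among `state` and its four neighbours (Eq. (8))."""
    x, y = state
    candidates = [state, (x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)]
    scores = {s: rewards.get(s, 0) / GAMMA for s in candidates}
    total = sum(scores.values())
    probs = {s: v / total for s, v in scores.items()}   # now sums to 1
    return max(probs, key=probs.get)                    # most probable state

rewards = {(5, 5): 40, (5, 6): 41, (5, 4): 39, (6, 5): 42, (4, 5): 40}
print(next_state((5, 5), rewards))   # (6, 5): the highest-reward neighbour
```

Because the scores are proportional to the rewards, the argmax over probabilities is the argmax over rewards, which matches the greedy next-state choice described above.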


(Figure 1: Reward function applied to the grid environment. Cells with a red frame are the selected states; blue agents are pursuers; green agents are evaders; black cells contain obstacles.)

The linkages between the evader and each pursuer shown in Figure 2 reflect the optimal trajectories provided by the application of the method proposed in this section during each pursuit step.

6 Coalition Formation Algorithm Based on IEDS

A number of coalition formation algorithms have been developed to define which of the potential coalitions should actually be formed. To do so, they typically compute a value for each coalition, known as the coalition value, which provides an indication of the expected results that could be derived if this coalition is constituted. Then, having calculated all the coalitional values, the optimal coalition to form can be selected. We employ an iterative algorithm in order to determine the optimal coalitions of agents. It begins with a complete set of coalitions (agent-strategy combinations) and iteratively eliminates the coalitions that contribute less to the MAS efficiency. The pseudocode of our algorithm is shown in Algorithm 1.

First, the algorithm calculates all the possible coalitions (Nbrcl) that the pursuers can form, before filtering them as needed. The expected number of possible coalitions is calculated according to the following:

$$\mathrm{Nbrcl} = \frac{n!}{(n-\mathrm{Re}_1)!\,\mathrm{Re}_1!} \times \frac{(n-\mathrm{Re}_1)!}{(n-(\mathrm{Re}_1+\mathrm{Re}_2))!\,\mathrm{Re}_2!} \times \cdots \times \frac{(n-(\mathrm{Re}_1+\cdots+\mathrm{Re}_{N-1}))!}{(n-(\mathrm{Re}_1+\cdots+\mathrm{Re}_N))!\,\mathrm{Re}_N!} = \prod_{j=1}^{N} \frac{\big(n-\sum_{k=0}^{j-1}\mathrm{Re}_k\big)!}{\big(n-\sum_{k=0}^{j}\mathrm{Re}_k\big)!\,\mathrm{Re}_j!} \quad (9)$$

where $n$ is the number of pursuers in the environment, $N$ is the number of evaders detected, and $\mathrm{Re}_0 = 0$.
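Each factor of (9) is a binomial coefficient: the $\mathrm{Re}_j$ pursuers of the $j$-th group are chosen without replacement from the pursuers not yet assigned. A sketch (the function name is ours), checked against the Section 7 case study ($n = 10$, two evaders of type Re = IV):

```python
from math import comb

def nbr_coalitions(n, requirements):
    """Number of possible coalition formations, Eq. (9)."""
    total, remaining = 1, n
    for re in requirements:
        total *= comb(remaining, re)   # choose Re_j pursuers for evader j
        remaining -= re                # they are no longer available
    return total

print(nbr_coalitions(10, [4, 4]))   # 3150 = C(10,4) * C(6,4)
```

The same count factors as $\Omega \times \Psi = \binom{10}{8} \times \binom{8}{4} = 45 \times 70 = 3150$, matching equations (10)-(12) and the totals of Table 2.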

In order to distribute the calculation of the possible coalitions among the pursuers, the possible general coalitions

n: the number of pursuers
i = 0; k = 0
j: indicator of the chase iteration
Calculate the possible coalitions
While (C_life > 0) do
    Calculate the value of each coalition
    While (number of coalitions > 1) do
        Eliminate the dominated strategy of P_i
        i <- i mod n + 1
    end while
    Assign the pursuers' roles according to the selected coalition
    Chase iteration
End while
If (capture = true) then
    While (k <= n)
        Update(Reward_{P_k})
        k++
    end while
else
    The guilty pursuers pay some fines
end if

Algorithm 1

(Ω) will be calculated first. A general coalition enrolls all the pursuers required to capture the set of evaders detected:

$$\Omega = \frac{n!}{(n-\lambda)!\,\lambda!} \quad (10)$$

where $\lambda = \mathrm{Re}_1 + \mathrm{Re}_2 + \cdots + \mathrm{Re}_N$.

The general coalitions generated will be equitably distributed among the agents playing the role of pursuer. Specifically, each general coalition will be composed of $N$ pursuit groups. From each general coalition generated through


(Figure 2: Pursuers' behavior prediction after the transition function application.)

the precedent calculation, equation (10), a number of possible coalition formations (denoted $\Psi$ here) will be computed:

$$\Psi = \frac{\lambda!}{(\lambda-\mathrm{Re}_1)!\,\mathrm{Re}_1!} \times \frac{(\lambda-\mathrm{Re}_1)!}{(\lambda-(\mathrm{Re}_1+\mathrm{Re}_2))!\,\mathrm{Re}_2!} \times \cdots \times \frac{(\lambda-(\mathrm{Re}_1+\cdots+\mathrm{Re}_{N-1}))!}{(\lambda-(\mathrm{Re}_1+\cdots+\mathrm{Re}_N))!\,\mathrm{Re}_N!} = \prod_{j=1}^{N} \frac{\big(\lambda-\sum_{k=0}^{j-1}\mathrm{Re}_k\big)!}{\big(\lambda-\sum_{k=0}^{j}\mathrm{Re}_k\big)!\,\mathrm{Re}_j!} \quad (11)$$

$$\mathrm{Nbrcl} = \Omega \times \Psi = \frac{n!}{(n-\lambda)!\,\lambda!} \times \prod_{j=1}^{N} \frac{\big(\lambda-\sum_{k=0}^{j-1}\mathrm{Re}_k\big)!}{\big(\lambda-\sum_{k=0}^{j}\mathrm{Re}_k\big)!\,\mathrm{Re}_j!} \quad (12)$$

This decentralized technique aims to balance the computation of the possible coalition formations among the pursuers; its application to the case study is detailed further in Section 7. Note that the value of each coalition generated, in relation to each pursuer it contains, is calculated according to (5). Each pursuer shares the coalitions it has calculated with the others to start the coalition selection process. Secondly, we apply the Iterated Elimination of Dominated Strategies principle with the aim of finding the optimal coalition through this process, knowing that each strategy is represented by a possible coalition formation. Alternately, each pursuer eliminates the coalition with the lowest value in relation to itself and sends the update to the next pursuer concerned. Pursuers are then assigned in accordance with the selected coalition, and each pursuer performs only one chase iteration. The algorithm repeats these instructions until the end of the chase life. When $C_{\mathrm{life}} = 0$ and the captures are accomplished, rewards are attributed to each of the participating pursuers, determined as follows:

Rewards119901 = 119877 (119904 119886)119871 (13)

where $L$ is the number of the coalition's members.

Otherwise, in the case of capture failure, the guilty pursuers must pay some fines to the rest of the coalition's members. These fines are calculated in the following manner:

$$\gamma = (s_0, a_1, s_1, a_2, s_2, \ldots, s_h, a_h), \qquad \mathrm{Fines} = \sum_{i=w}^{h-1} R(s_i, a_{i+1}) \quad (14)$$
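A minimal sketch of the fine of (14): the guilty pursuer forfeits the rewards accumulated along its state/action history from the step $w$ at which it joined the coalition. The states, actions, and reward table below are illustrative.

```python
def fines(states, actions, rewards, w):
    """Fines = sum of R(s_i, a_{i+1}) for i = w .. h-1, Eq. (14)."""
    return sum(rewards[(states[i], actions[i + 1])]
               for i in range(w, len(states) - 1))

states  = ["s0", "s1", "s2", "s3"]           # s_0 .. s_h with h = 3
actions = ["a0", "a1", "a2", "a3"]           # a_i: action taken at step i
rewards = {("s1", "a2"): 40, ("s2", "a3"): 41}
print(fines(states, actions, rewards, 1))    # 81, with w = 1
```
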


Table 2: The distribution of the possible coalitions' computation.

Pursuers                         P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
General coalitions                5    5    5    5    5    4    4    4    4    4
Possible coalitions generated   350  350  350  350  350  280  280  280  280  280

(Figure 3: Flow chart of the algorithm: agents' localization, possible coalitions' calculation, value of coalitions' calculation, dominated strategy's elimination (repeated while Nbrcl > 1), pursuers' assignment, chase iteration, and capture test; rewards are paid on capture and fines otherwise; the loop runs until C_life = 0.)

Here $\gamma$ is the set of states regarding the guilty pursuer, and $0 \le w \le h$, where $w$ represents the index of the coalition's beginning.

Figure 3 shows the flow chart of this pursuit algorithm, resuming the different steps explained in this section, from the detection to the capture of the existing evaders.

7 Simulation Experiments

In order to evaluate the approach presented in this paper, we realize our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid of 100 × 100 cells. The environment also contains some fixed, solid obstacles. As regards the environmental agents, our simulations are based on ten (10) pursuers and two (2) evaders of type Re = IV; Figure 4 specifically details how an evader of this type is captured. Each agent is marked with an ID number. Both pursuers and evaders have the same speed (one cell per iteration) and an excellent communication system. The pursuers' teams are fully capable of determining their actual positions, and the evaders disappear once their capture is accomplished. When the capture of an evader is performed, the coalition created for this pursuit is automatically dissolved.

Table 2 summarizes the results obtained after applying the decentralized computation of the possible coalitions to this case study, following the process explained in Section 6. In this case, according to (10), there are Ω = 45 possible general coalitions, which are distributed over the existing pursuers as shown in Table 2. From each general coalition, 70 coalitions are generated according to (11).

Moreover, we have studied the number of possible coalitions generated in parallel by the pursuers in relation to the number of existing pursuers, as shown in Figure 5. Compared with the centralized method, in which only one pursuer computes the possible coalitions, the decentralized method significantly decreases the computation time by dividing the work over the number of existing pursuers.

In order to vary the types of coordination mechanisms used in our simulations, we compare this work with our recent pursuit-evasion research based on the AGR organizational model [6], as well as with the results achieved by the auction mechanism illustrated in Case-C- [8]. Both of these methods are based on decentralized coalition formation:

Case-A- is a pursuit based on the AGR organizational model [6].
Case-B- is our new approach based on the Iterated Elimination of Dominated Strategies (IEDS) principle.
Case-C- is a pursuit based on an economical auction mechanism (MPMEGBTBA) [8].

The results shown in Figure 6 represent the average capturing time achieved over forty (40) different simulation case studies (episodes), from the beginning to the end of each one. In order to highlight the difference between the cases, we take into consideration the iteration concept, which determines the number of state changes of each agent during the pursuits.

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. Furthermore, we note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate roles' attribution provided by this auction mechanism. However, the results obtained through the application of the IEDS


(Figure 4: Example of an evader of type Re = IV after the capture.)

(Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers.)

coalition formation algorithm revealed an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formations and reformations of the pursuit teams.

Finally, we have focused on the study of the average pursuers' rewards obtained in each chase iteration during a full pursuit. In Figure 8, the $y$-axis represents the value of the rewards achieved by a pursuer, and the $x$-axis represents the chase iterations. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA,

(Figure 6: Average capturing time after (40) different pursuits, for Case-A-, Case-B-, and Case-C-.)

in which the average pursuer's rewards achieved reach 0.59 and 0.507, respectively. Otherwise, in IEDS, the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the different coordination mechanisms applied. Moreover, we can note the superiority of the results obtained through IEDS in relation to the other cases, provoked by the


(Figure 7: The pursuers' rewards development, for Case-A-, Case-B-, and Case-C-.)

(Figure 8: Average pursuers' rewards per iteration, for Case-A-, Case-B-, and Case-C-.)

Table 3: Pursuit results.

                                                   AGR      IEDS    MPMEGBTBA
Average capturing time (iterations)              144.225     78      100.57
Average pursuers' rewards obtained per iteration   0.59      0.88      0.507
Average pursuers' self-confidence development      0.408     0.533     0.451

(Figure 9: Pursuers' learning development during the pursuit, for Case-A-, Case-B-, and Case-C-.)

dynamism of the coalition formations and the optimality of the task sharing provided by our algorithm.

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms both the algorithm based on the AGR organizational model and the auction mechanism based on MPMEGBTBA, regarding reward development as well as capturing time. The leading cause of this fact is the dynamism of our coalitional groups. This flexible mechanism improves the intelligence of the pursuers concerning displacements and reward acquisition, knowing that the team reward is optimal when each pursuer undertakes the best path.

8 Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we built our coalition algorithm on the Iterated Elimination of Dominated Strategies. This process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we have focused on the Markov Decision Process as a motion


strategy of our pursuers in the environment (a grid of cells). To highlight our proposal, we have developed a comparative study between our algorithm, a decentralized coalition strategy based on the AGR organizational model, and an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512-516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225-230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Muller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214-230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1-13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075-1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235-244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804-4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875-884, 2005.

[11] J. Thunberg, P. Ogren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506-4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165-172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482-486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272-2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452-1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155-2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107-110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413-420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154-159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347-1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57-65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215-1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478-485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709-715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541-558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: a new paradigm for robust and efficient multirobot coordination in dynamic environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, "Evolutionary Game Theory based cooperation algorithm in multi-agent system," in Multiagent Systems, InTech, Rijeka, Croatia, 2009.




game when it is applied to finite-horizon POSGs. Otherwise, several types of coordination mechanisms are currently used, such as Stochastic Clustering Auctions (SCAs) [25, 26], which represent a class of cooperative auction methods based on the modified Swendsen-Wang method; this approach permits each robot to reconstitute the tasks that have been linked, and it applies to heterogeneous teams. Other mechanisms are market-based, such as TraderBots [27], applied to greedy agents in order to provide a detailed analysis of the requirements for robust and efficient multirobot coordination in dynamic environments. From the point of view of Game Theory, some research activities [28] investigated the optimal coordination approach for multiagent foraging; indeed, they built the equivalence between the optimal solution of the MAS and the equilibrium of the game in the same case, and then introduced the evolutionarily stable strategy to help resolve the equilibrium selection problem of traditional Game Theory.

3 Problem Description

In this section, we focus on the cooperation problem in which $n$ pursuers situated in a limited toroidal grid environment $X$ have to capture $m$ evaders of different types. The expressions $P = \{P_1, \ldots, P_n\}$ and $E = \{E_1, \ldots, E_m\}$ represent the collections of $n$ pursuers and $m$ evaders, respectively. Pursuers and evaders represent the roles that the agents can play. Each evader is characterized by a type Re, with Re ∈ {I, II, III, IV}, indicating how many pursuers are required to capture it. Here, we suppose that the pursuers can evaluate the evaders' types after localization. There exist some fixed obstacles of different shapes and sizes in the environment $X$. The position is handled by the mapping mp: $X \to \{0, 1\}$, such that $\forall x \in X$, if mp(x) = 1 then $x$ is an obstacle.

In our proposal, the strategies of each pursuer are guided by determining factors that reflect the individual development of the pursuer during the execution of the assigned tasks. These factors are detailed as follows.

Self-Confidence Degree. In multiagent systems, each agent must be able to execute the services requested by the other agents. The self-confidence degree is the assessment of the agent's success in relation to the assigned tasks. It is denoted and computed in the following way:

$$\forall \mathrm{Conf} \in [0.1, 1], \quad \mathrm{Conf} = \max\left(0.1, \frac{C_s}{C_t}\right) \quad (1)$$

where $C_s$ is the number of tasks that the agent has accomplished and $C_t$ is the number of tasks in which the agent has participated.

The Credit. In the case where the agent cannot perform a task, its credit is affected. The credit of an agent is designated and calculated as follows:

$$\forall \mathrm{Credit} \in [0, 1], \quad \mathrm{Credit} = \min\left(1, 1 - \frac{C_b}{C_t - C_s}\right) \quad (2)$$

where $C_b$ is the number of tasks abandoned by the agent.
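The two factors can be sketched directly from (1) and (2); the guard for $C_t = C_s$ is our own addition (the formula otherwise divides by $C_t - C_s$), and the sample counts are illustrative.

```python
def self_confidence(cs, ct):
    """Conf = max(0.1, Cs/Ct): share of assigned tasks accomplished, Eq. (1)."""
    return max(0.1, cs / ct)

def credit(cb, cs, ct):
    """Credit = min(1, 1 - Cb/(Ct - Cs)): penalty for abandoned tasks, Eq. (2)."""
    if ct == cs:
        return 1.0        # every task succeeded; nothing was abandoned
    return min(1.0, 1 - cb / (ct - cs))

# An agent that accomplished 8 of 10 tasks and abandoned 1 of the rest:
print(self_confidence(8, 10))   # 0.8
print(credit(1, 8, 10))         # 0.5
```
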

Environment Position. The position of the agent in the environment is a crucial criterion for the pursuit sequences, because the capture will be easier if the pursuer is closer to the evader. The position Pos is computed as follows:

$$\mathrm{Pos} = \mathrm{Dist}(S_P, S_E) \tag{3}$$

$S_P$ is the state (cell) of the pursuer; $S_E$ is the state (cell) of the evader; Dist is the distance between the pursuer and the evader:

$$\mathrm{Dist}(S_P, S_E) = \sqrt{(CC_{Pi} - CC_{Ei})^2 + (CC_{Pj} - CC_{Ej})^2} \tag{4}$$

$(CC_{Pi}, CC_{Pj})$ are the Cartesian coordinates of the pursuer; $(CC_{Ei}, CC_{Ej})$ are the Cartesian coordinates of the evader.
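As a concrete illustration, the three factors above can be computed directly from an agent's task history and its grid coordinates. The sketch below is our own and not part of the original paper; the function and variable names are assumptions, and it takes $C_t > C_s$ so that the credit denominator in (2) is nonzero.

```python
import math

def self_confidence(tasks_done, tasks_joined):
    """Self-confidence degree, following Eq. (1): clamped to [0.1, 1]."""
    return max(0.1, tasks_done / tasks_joined)

def credit(tasks_abandoned, tasks_joined, tasks_done):
    """Credit, following Eq. (2): penalizes abandoned tasks.
    Assumes tasks_joined > tasks_done (nonzero denominator)."""
    return min(1.0, 1.0 - tasks_abandoned / (tasks_joined - tasks_done))

def distance(pursuer_cell, evader_cell):
    """Euclidean distance between two grid cells, following Eq. (4)."""
    (pi, pj), (ei, ej) = pursuer_cell, evader_cell
    return math.sqrt((pi - ei) ** 2 + (pj - ej) ** 2)
```

For instance, an agent that accomplished 8 of the 10 tasks it joined has a self-confidence of 0.8, while one that accomplished none is floored at 0.1.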

In order to distinguish the different coalitions, each pursuer belonging to a coalition calculates the value returned to itself through this strategy. This computation is based on the factors characterizing the pursuers. For example, if a pursuer $P_1$ belongs to the coalition Co, the value of this coalition in relation to this pursuer is calculated as follows:

$$\mathrm{Co}(\mathrm{val}_{P_1}) = \frac{\mathrm{Coef}_1 \mathrm{Conf}_1 + \mathrm{Coef}_2 \mathrm{Credit}_1 + \mathrm{Coef}_3 \mathrm{Pos}_1}{\sum_{k=1}^{3} \mathrm{Coef}_k} + \sum_{i=2}^{\mathrm{Re}} \frac{\mathrm{Coef}_1 \mathrm{Conf}_i + \mathrm{Coef}_2 \mathrm{Credit}_i + \mathrm{Coef}_3 \mathrm{Pos}_i}{\mathrm{Re} \cdot \sum_{k=1}^{3} \mathrm{Coef}_k} \tag{5}$$

$\mathrm{Coef}_k$ is the coefficient of each factor. On the basis of these values, and using the IEDS method, our mechanism will be able to select the optimal pursuit coalition for each detected evader, as detailed in Section 6.
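Equation (5) is a coefficient-weighted score of the pursuer's own factors, plus the scores of the other coalition members averaged over the coalition size Re. A minimal sketch, with our own naming rather than the authors' code:

```python
def coalition_value(own, others, coef, re_type):
    """Value of a coalition for one pursuer, following Eq. (5).
    own     -- (Conf, Credit, Pos) of the evaluating pursuer
    others  -- factor triples of the other coalition members
    coef    -- the three factor coefficients Coef_1..Coef_3
    re_type -- Re, the number of pursuers the evader requires"""
    total = sum(coef)
    weighted = lambda factors: sum(c * f for c, f in zip(coef, factors))
    # Own contribution, normalized by the sum of the coefficients.
    value = weighted(own) / total
    # Contribution of the other members, further divided by Re.
    value += sum(weighted(f) for f in others) / (re_type * total)
    return value
```

With unit coefficients and unit factors for a two-pursuer coalition, the value is 1 + 1/2 = 1.5, which shows how the evaluating pursuer's own factors weigh more than each teammate's.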

4. The Iterated Elimination of Dominated Strategies (IEDS)

The coalition is a set of pursuers required to capture the detected evaders. In the coalition, each pursuer must correspond to a specific strategy. In our proposal, a pure strategy $s_i$ defines a specific pursuit group's integration that the pursuer will follow in every possible and attainable situation during the pursuit. Such coalitions may not be random or drawn from a distribution, as in the case of mixed strategies. A strategy $\mathrm{str}_i$ dominates another strategy $\mathrm{str}'_i$ if and only if, for every potential combination of the other players' actions $\mathrm{str}_{-i}$,

$$\mu_i(\mathrm{str}_i, \mathrm{str}_{-i}) \ge \mu_i(\mathrm{str}'_i, \mathrm{str}_{-i}) \tag{6}$$

$\mu$ is a function that returns the results obtained through the application of a specific strategy.

Consider the strategic game shown in Table 1, where the column player has three pure strategies and the row player has only two (a). The values shown in each cell represent the expected payoffs returned to the players in the case of selecting the current strategy. Playing Center is always better than playing Right for the column

4 Mathematical Problems in Engineering

Table 1: Application of the IEDS technique.

(a)
         Left    Center   Right
Up       5, 4    3, 8     1, 5
Down     6, 6    6, 0     -3, -1

(b)
         Left    Center   Right
Up       5, 4    3, 8     1, 5
Down     6, 6    6, 0     -3, -1

(c)
         Left    Center   Right
Up       5, 4    3, 8     1, 5
Down     6, 6    6, 0     -3, -1

(d)
         Left    Center   Right
Up       5, 4    3, 8     1, 5
Down     6, 6    6, 0     -3, -1

Bold fonts reflect how the dominated strategies are deleted.

player. Consequently, we can assume that he will eventually stop playing Right, because it is a dominated strategy (b). So we can ignore the Right column after its elimination. Now the row player has a dominated strategy, Up. Eventually the row player stops playing Up, and the Up row gets eliminated (c). Finally, we have two remaining choices, Down-Left and Down-Center, and the column player notices that it can only win by playing Left (d). So we can deduce that the IEDS solution is (Down, Left), with the payoff (6, 6).
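The elimination walk-through above can be reproduced mechanically. The sketch below (our own code, not from the paper) runs iterated elimination of strictly dominated pure strategies on the bimatrix game of Table 1 and recovers (Down, Left):

```python
def ieds(row_payoff, col_payoff, rows, cols):
    """Iterated elimination of strictly dominated pure strategies
    for a two-player game given as per-player payoff tables."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        # Row player: remove any row strictly dominated by another row.
        for r in rows[:]:
            if any(all(row_payoff[r2, c] > row_payoff[r, c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # Column player: remove any column strictly dominated by another column.
        for c in cols[:]:
            if any(all(col_payoff[r, c2] > col_payoff[r, c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Payoff matrix of Table 1: cell -> (row player's payoff, column player's payoff).
game = {
    ("Up", "Left"): (5, 4), ("Up", "Center"): (3, 8), ("Up", "Right"): (1, 5),
    ("Down", "Left"): (6, 6), ("Down", "Center"): (6, 0), ("Down", "Right"): (-3, -1),
}
row_p = {k: v[0] for k, v in game.items()}
col_p = {k: v[1] for k, v in game.items()}
survivors = ieds(row_p, col_p, ["Up", "Down"], ["Left", "Center", "Right"])
# survivors == (["Down"], ["Left"]), the IEDS solution with payoff (6, 6)
```

The eliminations happen in the same order as in the text: Right first (dominated by Center), then Up (dominated by Down once Right is gone), then Center.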

5. Markov Decision Process Principles

Markov Decision Processes (MDPs) provide a mathematical framework to model decision making in situations where outcomes are partly random and partly under the control of a decision maker. In cooperative multiagent systems, an MDP allows the formalization of sequential decision problems. This process only models the cooperative systems in which the reward function is shared by all players. The MDP is defined by the tuple ⟨N, S, A, T, R⟩ as follows:

- N is the number of agents Ag_i in the system, i ∈ {1, ..., N}.
- S corresponds to the set of agents' states s.
- A = A_1 × A_2 × ··· × A_N defines the set of joint actions of the agents, where A_i is the set of local actions of the agent Ag_i.
- T is the transition function; it returns the probability T(s, a, s′) that the agent goes into the state s′ if it executes the joint action a ∈ A from the state s.
- R defines the reward function; R(s, a, s′) represents the reward obtained by the agent when it transits from the state s to the state s′ by the execution of the action a.

5.1. Reward Function. In an MDP problem, the next states selected are the states returning the maximum definitive reward. In our proposal, we have used heuristic functions in order to calculate the immediate reward of each state. The reward function defines the goals that the pursuers have to achieve and identifies the environmental obstacles. To calculate this function, we relied on the agents' environment position detailed in Section 3, which allows a fair distribution of the rewards over the environmental cells. The calculation of the reward in each state $s$ concerned is performed as follows:

$$R(s, a) = \begin{cases} \gamma & \text{if } E_i \subseteq s \\ 0 & \text{if } mp(x) = 1 \\ \gamma - \mathrm{Val}(\mathrm{Dist}(CC_P, CC_E)) & \text{otherwise} \end{cases} \tag{7}$$

$\gamma$ is the maximum reward; $\mathrm{Val}(\mathrm{Dist}(CC_P, CC_E))$ represents the distance value.

Regarding the distribution of the rewards in the standard cells, we note that the reward function is inversely proportional to the distance function.

Figure 1 illustrates a part of our simulation environment, detailed in Section 7. The values displayed in the different cells, [V1, V2, V3], represent the gains generated by the reward function. The dynamic rewards will be awarded to any pursuer situated in the cell concerned during the pursuit:

- V1: the reward that could be obtained if the pursuer concerned tracks the first evader.
- V2: the reward that could be obtained if the pursuer concerned tracks the second evader.
- V3: the index of the cell (occupied or free).

5.2. Transition Function. The transition probabilities (ρ) describe the dynamism of the environment. They play the role of the next-state function in a problem-solving search, knowing that every state could be the possible next state according to the action undertaken in the current state. Our approach is developed in a grid-of-cells environment, where each agent can move toward four different states: $s_{\mathrm{up}}$, $s_{\mathrm{down}}$, $s_{\mathrm{left}}$, and $s_{\mathrm{right}}$.

The transition probabilities of the pursuers are based on the reward degree, as shown:

$$\sum_{s'} \rho(s' \mid s, a) = 1, \qquad \rho(s' \mid s, a) = \frac{R(s', a)}{\gamma},$$
$$\rho(s' \mid s, a) = \max\big(\rho(s \mid s, a),\ \rho(s_{\mathrm{up}} \mid s, a),\ \rho(s_{\mathrm{down}} \mid s, a),\ \rho(s_{\mathrm{right}} \mid s, a),\ \rho(s_{\mathrm{left}} \mid s, a)\big) \quad \forall s, a \tag{8}$$


Figure 1: Reward function applied to the grid environment. Cells with a red frame: the selected states; blue agents: pursuers; green agents: evaders; black cells: cells containing obstacles.

The linkages between the evader and each pursuer shown in Figure 2 reflect the optimal trajectories provided by the application of the method proposed in this section during each different pursuit step.

6. Coalition Formation Algorithm Based on IEDS

A number of coalition formation algorithms have been developed to define which of the potential coalitions should actually be formed. To do so, they typically compute a value for each coalition, known as the coalition value, which provides an indication of the expected results that could be derived if this coalition is constituted. Then, having calculated all the coalitional values, the optimal coalition to form can be selected. We employ an iterative algorithm in order to determine the optimal coalitions of agents. It begins with a complete set of coalitions (agent-strategy combinations) and iteratively eliminates the coalitions that have lower contribution values to the MAS efficiency. The pseudocode of our algorithm is shown in Algorithm 1.

First, the algorithm calculates all the possible coalitions (Nbrcl) that the pursuers can form, before their filtration as needed. The expected number of the possible coalitions to form is calculated according to the following:

$$\mathrm{Nbrcl} = \frac{n!}{(n - \mathrm{Re}_1)!\,\mathrm{Re}_1!} \times \frac{(n - \mathrm{Re}_1)!}{(n - (\mathrm{Re}_1 + \mathrm{Re}_2))!\,\mathrm{Re}_2!} \times \cdots \times \frac{(n - (\mathrm{Re}_1 + \cdots + \mathrm{Re}_{N-1}))!}{(n - (\mathrm{Re}_1 + \cdots + \mathrm{Re}_N))!\,\mathrm{Re}_N!} = \prod_{j=1}^{N} \frac{\left(n - \sum_{k=0}^{j-1} \mathrm{Re}_k\right)!}{\left(n - \sum_{k=0}^{j} \mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{9}$$

$n$ is the number of pursuers in the environment; $N$ is the number of evaders detected; $\mathrm{Re}_0 = 0$.
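Equation (9) is simply a product of binomial coefficients: choose Re_1 of the n pursuers for the first evader, Re_2 of the remaining pursuers for the second, and so on. A short sketch (our own code, not the paper's):

```python
from math import comb

def nbrcl(n, re_types):
    """Number of possible coalition formations, following Eq. (9)."""
    total, remaining = 1, n
    for re in re_types:
        total *= comb(remaining, re)  # choose Re_j pursuers for evader j
        remaining -= re
    return total
```

For the case study of Section 7 (n = 10 pursuers, two evaders of type Re = IV), nbrcl(10, [4, 4]) gives 3150, consistent with the product 45 × 70 in (12).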

In order to distribute the calculation of the possible coalitions among the pursuers, the possible general coalitions

    n: the number of pursuers
    i = 0; k = 0
    j: indicator of the chase iteration
    Calculate the possible coalitions
    While (C_life > 0) do
        Calculate the value of each coalition
        While (number of coalitions > 1) do
            Eliminate the dominated strategy of P_i
            i <- i mod n + 1
        end while
        Assign the pursuers' roles according to the selected coalition
        Chase iteration
    End while
    If (capture = true) then
        While (k <= n)
            Update(Reward_{P_k})
            k++
        end while
    Else
        The guilty pursuers pay some fines
    end if

Algorithm 1
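The inner elimination loop of Algorithm 1 can be sketched as follows: in round-robin order (i ← i mod n + 1), each pursuer discards the remaining coalition that has the lowest value, computed by (5), for itself, until a single coalition survives. This is an illustrative sketch with our own naming, not the authors' code:

```python
def select_coalition(coalitions, values, pursuers):
    """Round-robin elimination of dominated coalitions (Algorithm 1).
    values[(p, c)] is the Eq. (5) value of coalition c for pursuer p."""
    remaining = list(coalitions)
    i = 0
    while len(remaining) > 1:
        p = pursuers[i]
        # Pursuer p eliminates its lowest-valued (dominated) coalition.
        remaining.remove(min(remaining, key=lambda c: values[(p, c)]))
        i = (i + 1) % len(pursuers)   # i <- i mod n + 1
    return remaining[0]
```

With two pursuers and three candidate coalitions, P1 first drops its worst option, then P2 drops its worst among the remainder, leaving the selected coalition.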

(Ω) will be calculated. A general coalition enrolls all the pursuers required to capture the set of detected evaders:

$$\Omega = \frac{n!}{(n - \lambda)!\,\lambda!} \tag{10}$$

$\lambda = \mathrm{Re}_1 + \mathrm{Re}_2 + \cdots + \mathrm{Re}_N$.

The general coalitions generated will be equitably distributed among the agents playing the role Pursuer. Specifically, each general coalition will be composed of $N$ pursuit groups. From each general coalition generated through


Figure 2: Pursuers' behaviors prediction after the transition function application.

the preceding calculation, (10), a number of possible coalition formations (Φ) will be computed:

$$\Phi = \frac{\lambda!}{(\lambda - \mathrm{Re}_1)!\,\mathrm{Re}_1!} \times \frac{(\lambda - \mathrm{Re}_1)!}{(\lambda - (\mathrm{Re}_1 + \mathrm{Re}_2))!\,\mathrm{Re}_2!} \times \cdots \times \frac{(\lambda - (\mathrm{Re}_1 + \cdots + \mathrm{Re}_{N-1}))!}{(\lambda - (\mathrm{Re}_1 + \cdots + \mathrm{Re}_N))!\,\mathrm{Re}_N!} = \prod_{j=1}^{N} \frac{\left(\lambda - \sum_{k=0}^{j-1} \mathrm{Re}_k\right)!}{\left(\lambda - \sum_{k=0}^{j} \mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{11}$$

$$\mathrm{Nbrcl} = \Omega \times \Phi = \frac{n!}{(n - \lambda)!\,\lambda!} \times \prod_{j=1}^{N} \frac{\left(\lambda - \sum_{k=0}^{j-1} \mathrm{Re}_k\right)!}{\left(\lambda - \sum_{k=0}^{j} \mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{12}$$

This decentralized technique aims to balance the computation of the possible coalition formations among the pursuers. The method is further detailed in Section 7 via its application to the case study. Note that the value of each coalition generated, in relation to each pursuer it contains, will be calculated according to (5). Each pursuer shares the coalitions calculated with the others to start the coalition selection process. Secondly, we apply the Iterated Elimination of Dominated Strategies principle with the aim of finding the optimal coalition through this process, knowing that each strategy is represented by a possible coalition formation. Alternately, each pursuer eliminates the coalition with the lowest value in relation to itself and sends the update to the next pursuer concerned. Pursuers are assigned in accordance with the selected coalition. Each pursuer performs only one chase iteration. The algorithm repeats these instructions until the end of the chase life. When $C_{\mathrm{life}} = 0$ and the captures are accomplished, rewards will be attributed to each one of the participating pursuers; the rewards are determined as follows:

$$\mathrm{Rewards}_p = \frac{R(s, a)}{L} \tag{13}$$

$L$ is the number of the coalition's members.

Otherwise, in the case of capture failure, the guilty pursuers must pay some fines to the rest of the coalition's members. These fines are calculated in the following manner:

$$\gamma = (s_0, a_1, s_1, a_2, s_2, \ldots, s_h, a_h), \qquad \mathrm{Fines} = \sum_{i=w}^{h-1} R(s_i, a_{i+1}) \tag{14}$$


Table 2: The distribution of the possible coalitions' computation.

Pursuers                        P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
General coalitions              5    5    5    5    5    4    4    4    4    4
Possible coalitions generated   350  350  350  350  350  280  280  280  280  280

Figure 3: Flow chart of the algorithm. (Steps shown: agents' localization; possible coalitions' calculation; value of coalitions' calculation; dominated strategy's elimination, repeated while Nbrcl > 1; pursuers' assignment; chase iteration, repeated while C_life > 0; capture test leading to rewards on success or fines on failure.)

$\gamma$ is the set of states regarding the guilty pursuer; $0 \le w \le h$, where $w$ represents the index of the coalition's beginning.
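Sketched in code (our own naming), the reward split of (13) and the fine of (14) are straightforward: each coalition member gets an equal share of the capture reward, while a guilty pursuer forfeits the rewards accumulated since it joined the coalition at step w:

```python
def capture_reward(state_reward, coalition_size):
    """Per-pursuer reward on capture, following Eq. (13)."""
    return state_reward / coalition_size

def fines(step_rewards, w):
    """Fine of a guilty pursuer, following Eq. (14): the sum of
    R(s_i, a_{i+1}) for i = w .. h-1 along its trajectory."""
    return sum(step_rewards[w:])
```

For example, a coalition of four splitting a state reward of 100 gives each member 25, and a pursuer that joined at step w = 2 of a four-step trajectory forfeits the last two step rewards.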

Figure 3 reflects the flow chart of this pursuit algorithm, summarizing the different steps explained in this section, from the detection to the capture of the existing evaders.

7. Simulation Experiments

In order to evaluate the approach presented in this paper, we realize our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid with 100 × 100 cells. The environment also contains some fixed, solid obstacles. As regards the environmental agents, our simulations are based on ten (10) pursuers and two (02) evaders of type Re = IV; Figure 4 details how an evader of this type can be captured. Each agent is marked with an ID number. Both pursuers and evaders have the same speed (one cell per iteration) and an excellent communication system. The pursuers' teams are fully capable of determining their actual positions, and the evaders disappear after the capture is accomplished. Once the capture of an evader is performed, the coalition created for this pursuit is automatically dissolved.

Table 2 summarizes the results obtained after the application of the decentralized computation of the possible coalitions on this case study, according to the process explained in Section 6. In this case, and according to (10), the possible general coalitions (Ω) number 45, which will be distributed among the existing pursuers as shown in Table 2. From each general coalition, a number of coalitions will be generated (Φ = 70), according to (11).
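The numbers in Table 2 can be reproduced directly from (10) and (11): the 45 general coalitions are shared as evenly as possible among the 10 pursuers (five pursuers handle 5 each, the other five handle 4 each), and each general coalition expands into 70 concrete coalition formations. A sketch with our own code:

```python
from math import comb

def coalition_workload(n, re_types):
    """Per-pursuer share of the decentralized coalition computation
    (Section 6, Table 2). Returns (general coalitions, expanded
    coalitions) for each of the n pursuers."""
    lam = sum(re_types)
    omega = comb(n, lam)                 # Eq. (10): general coalitions
    phi, remaining = 1, lam
    for re in re_types:                  # Eq. (11): groupings per general coalition
        phi *= comb(remaining, re)
        remaining -= re
    base, extra = divmod(omega, n)       # equitable distribution
    shares = [base + 1] * extra + [base] * (n - extra)
    return [(s, s * phi) for s in shares]

workload = coalition_workload(10, [4, 4])
# workload == [(5, 350)] * 5 + [(4, 280)] * 5, matching Table 2
```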

Moreover, we have studied the number of possible coalitions generated in parallel by the pursuers in relation to the number of existing pursuers, as shown in Figure 5. Compared with the centralized method, in which only one pursuer computes the possible coalitions, the decentralized method significantly decreases the computation time by dividing the work among the existing pursuers.

In order to vary the types of coordination mechanisms used in our simulations, we found it useful to compare this work with our recent pursuit-evasion research based on the AGR organizational model [6]. We also compare our results with those achieved after the application of an auction mechanism, illustrated in Case-C- [8]. Note that these two methods are based on decentralized coalition formation.

Case-A-: pursuit based on the AGR organizational model [6].
Case-B-: our new approach based on the Iterated Elimination of Dominated Strategies (IEDS) principle.
Case-C-: pursuit based on an economical auction mechanism (MPMEGBTBA) [8].

The results shown in Figure 6 represent the average capturing time achieved during forty (40) different simulation case studies (episodes), from the beginning to the end of each one. In order to showcase the difference between the different cases, we take into consideration the iteration concept, which determines the number of state changes regarding each agent during the pursuits.

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. Furthermore, we note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate roles' attribution provided by this auction mechanism. However, the results obtained through the application of the IEDS


Figure 4: Example: evader of the type Re equals IV after the capture.

Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers.

coalition formation algorithm revealed an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formations and reformations of the pursuit teams.

Finally, we have focused on the study of the average pursuers' rewards obtained in each chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of rewards achieved by a pursuer, and the x-axis represents the chase iterations. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA,

Figure 6: Average capturing time after (40) different pursuits.

in which the average pursuer's rewards achieved reach 0.59 and 0.507, respectively. Otherwise, in IEDS, the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the different coordination mechanisms applied. Moreover, we can note the superiority of the results obtained through IEDS in relation to the other cases, provoked by the


Figure 7: The pursuers' rewards development.

Figure 8: Average pursuers' reward per iteration (Cases A, B, and C).

Table 3: Pursuit results.

                                                AGR      IEDS   MPMEGBTBA
Average capturing time (iterations)             144.225  78     100.57
Average pursuers' rewards obtained per
iteration                                       0.59     0.88   0.507
Average pursuers' self-confidence development   0.408    0.533  0.451

Figure 9: Pursuers' learning development during the pursuit.

dynamism of the coalition formations and the optimality of the task sharing provided by our algorithm.

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms the algorithm based on the AGR organizational model, as well as the auction mechanism based on MPMEGBTBA, regarding both the reward's development and the capturing time. The leading cause of this fact is the dynamism of our coalitional groups. This flexible mechanism improves the intelligence of the pursuers concerning the displacements and the rewards acquisition, knowing that the team reward is optimal in the case where each pursuer undertakes the best path.

8. Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we have derived our coalition algorithm from the Iterated Elimination of Dominated Strategies. This process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we have focused on the Markov Decision Process as a motion strategy of our pursuers in the environment (grid of cells). To highlight our proposal, we have developed a comparative study between our algorithm and a decentralized coalition strategy based on the AGR organizational model, as well as an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512–516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225–230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Muller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214–230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1–13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075–1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235–244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804–4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875–884, 2005.

[11] J. Thunberg, P. Ogren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506–4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165–172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482–486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272–2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452–1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155–2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107–110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413–420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154–159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347–1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57–65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215–1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478–485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709–715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541–558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: a new paradigm for robust and efficient multirobot coordination in dynamic environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, Multiagent Systems, InTech, Rijeka, Croatia, 2009.


4 Mathematical Problems in Engineering

Table 1: Application of the IEDS technique.

            Left      Center    Right
Up         (5, 4)    (3, 8)    (1, 5)
Down       (6, 6)    (6, 0)    (-3, -1)

Panels (a)-(d) of the original table repeat this payoff matrix; bold fonts reflect how the dominated strategies are deleted at each step.

player. Consequently, we can assume that the column player will eventually stop playing Right, because it is a dominated strategy (b). We can therefore ignore the Right column after its elimination. Now the row player has a dominated strategy, Up. Eventually, the row player stops playing Up, so the Up row gets eliminated (c). Finally, we have two remaining choices, Down-Left and Down-Center, and the column player notices that it can only win by playing Left (d). We can thus deduce that the IEDS solution is (Down, Left), with the payoff (6, 6).
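The elimination walk above can be reproduced mechanically. The sketch below runs IEDS on the Table 1 payoffs; the function and variable names are illustrative, not part of the paper.

```python
# Iterated Elimination of Dominated Strategies (IEDS) on the
# two-player game of Table 1. Payoffs are (row player, column player) tuples.
payoffs = {
    ("Up",   "Left"): (5, 4), ("Up",   "Center"): (3, 8), ("Up",   "Right"): (1, 5),
    ("Down", "Left"): (6, 6), ("Down", "Center"): (6, 0), ("Down", "Right"): (-3, -1),
}

def pay_row(r, c):  # payoff lookup from the row player's side
    return payoffs[(r, c)]

def pay_col(c, r):  # payoff lookup from the column player's side
    return payoffs[(r, c)]

def dominated(strats, other, player, pay):
    """Return a strategy of `player` strictly dominated by another one, if any."""
    for s in strats:
        for t in strats:
            if t != s and all(pay(t, o)[player] > pay(s, o)[player] for o in other):
                return s
    return None

rows, cols = ["Up", "Down"], ["Left", "Center", "Right"]

# Alternate eliminations until no player has a dominated strategy left,
# mirroring panels (a)-(d) of Table 1.
changed = True
while changed:
    changed = False
    s = dominated(cols, rows, 1, pay_col)   # Center dominates Right first
    if s:
        cols.remove(s)
        changed = True
    s = dominated(rows, cols, 0, pay_row)   # then Down dominates Up
    if s:
        rows.remove(s)
        changed = True

print(rows, cols)  # ['Down'] ['Left']
```

The loop stops once each player is left with a single undominated strategy, so the surviving profile is (Down, Left) with payoff (6, 6), matching the walkthrough.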

5. Markov Decision Process Principles

Markov Decision Processes (MDPs) provide a mathematical framework to model decision making in situations where outcomes are partly random and partly under the control of a decision maker. In cooperative multiagent systems, the MDP allows the formalization of sequential decision problems. This process only models the cooperative systems in which the reward function is shared by all players. An MDP is defined by the tuple ⟨N, S, A, T, R⟩ as follows:

N is the number of agents Ag_i in the system, i ∈ {1, ..., N}.
S corresponds to the set of agents' states s.
A = A_1 × A_2 × ⋯ × A_N defines the set of joint actions of the agents, where A_i is the set of local actions of the agent Ag_i.
T is the transition function; it returns the probability T(s, a, s′) that the agent goes into the state s′ if it runs the joint action a ∈ A from the state s.
R defines the reward function; R(s, a, s′) represents the reward obtained by the agent when it transits from the state s to the state s′ by the execution of the action a.
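As a concrete reading of the tuple, the five components can be sketched as a small container; the type names and the grid-cell state encoding below are illustrative assumptions, not definitions from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[int, int]        # assumed encoding: a grid cell (x, y)
JointAction = Tuple[str, ...]  # one local action per agent, e.g. ("up", "left")

@dataclass
class MDP:
    n_agents: int                                             # N
    states: Sequence[State]                                   # S
    actions: Sequence[JointAction]                            # A = A_1 x ... x A_N
    transition: Callable[[State, JointAction, State], float]  # T(s, a, s')
    reward: Callable[[State, JointAction, State], float]      # R(s, a, s')
```

With this container, T(s, a, s′) is queried as `mdp.transition(s, a, s_next)` and must define, for each pair (s, a), a probability distribution over the next states.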

5.1. Reward Function. In an MDP problem, the next states selected are the states returning the maximum definitive reward. In our proposal, we have used heuristic functions in order to calculate the immediate reward of each state. The reward function defines the goals that the pursuers have to achieve and identifies the environmental obstacles. To calculate this function, we relied on the agents' environment position detailed in Section 3, which allows a fair distribution of the rewards over the environmental cells. The calculation of the rewards in each state s concerned is performed as follows:

R(s, a) = γ,                              if E_i ⊆ s, mp(x) = 1,
R(s, a) = γ - Val(Dist(CC_P, CC_E)),      else.    (7)

γ is the maximum reward; Val(Dist(CC_P, CC_E)) represents the distance value.

Regarding the distribution of the rewards in the standard cells, we note that the reward function is inversely proportional to the distance function.

Figure 1 illustrates a part of our simulation environment detailed in Section 7. The values displayed in the different cells [V1, V2, V3] represent the gains generated by the reward function. The dynamic rewards will be awarded to any pursuer situated in the cell concerned during the pursuit:

V1: the reward that could be obtained if the pursuer concerned tracks the first evader.
V2: the reward that could be obtained if the pursuer concerned tracks the second evader.
V3: the index of the cell (occupied or free).
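A minimal sketch of the reward computation of (7) for one evader follows. The value of γ, the Chebyshev grid distance, and the function names are assumptions made for illustration; the paper does not fix them.

```python
# Sketch of reward function (7): a state satisfying the capture condition
# returns the maximum reward GAMMA; any other state returns GAMMA minus the
# value of its distance to the evader. GAMMA = 100 and the Chebyshev distance
# (one cell per move in any direction) are illustrative assumptions.
GAMMA = 100

def dist(cell_p, cell_e):
    """Chebyshev distance between two grid cells."""
    return max(abs(cell_p[0] - cell_e[0]), abs(cell_p[1] - cell_e[1]))

def reward(state, evader_cell, capture_condition):
    """R(s, a) of equation (7) for a single evader."""
    if capture_condition:            # E_i ⊆ s and mp(x) = 1
        return GAMMA
    return GAMMA - dist(state, evader_cell)

# Computing one such reward per detected evader yields the per-cell
# vector [V1, V2, V3] of Figure 1 (V3 being the occupancy flag).
```

The farther a cell lies from the evader, the smaller its reward, which matches the inverse proportionality between reward and distance noted above.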

5.2. Transition Function. The transition probabilities (ρ) describe the dynamism of the environment. They play the role of the next-state function in a problem-solving search, knowing that every state could be the possible next state according to the action undertaken in the actual state. Our approach is developed in a grid-of-cells environment where each agent can move to four different states: s_up, s_down, s_left, and s_right.

The transition probabilities of the pursuers are based on the reward degree, as shown:

∑_{s′} ρ(s′ | s, a) = 1,
ρ(s′ | s, a) = R(s′, a) / γ,
ρ(s′ | s, a) = max(ρ(s | s, a), ρ(s_up | s, a), ρ(s_down | s, a), ρ(s_right | s, a), ρ(s_left | s, a)),  ∀ s, a.    (8)
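A sketch of (8): each candidate next state is scored by its reward divided by γ, and the pursuer moves to the state with the highest probability. An explicit normalization step is added here so that the probabilities sum to 1, as the first line of (8) requires; all names are illustrative assumptions.

```python
# Transition sketch of equation (8): candidate next states are scored by
# rho(s'|s,a) = R(s',a)/gamma, normalized to a probability distribution,
# and the pursuer selects the state with maximal probability (argmax rho).
GAMMA = 100  # assumed maximum reward, as in the reward-function sketch

def transition_probabilities(candidates, reward_of):
    """candidates: next states s'; reward_of: function s' -> R(s', a)."""
    raw = {s: reward_of(s) / GAMMA for s in candidates}
    total = sum(raw.values())
    return {s: p / total for s, p in raw.items()}   # now sum_{s'} rho = 1

def next_state(candidates, reward_of):
    """Pick the next state maximizing rho, per the last line of (8)."""
    probs = transition_probabilities(candidates, reward_of)
    return max(probs, key=probs.get)

# Example on the four reachable neighbors of the current cell:
rewards = {"up": 97, "down": 95, "left": 96, "right": 94}
chosen = next_state(list(rewards), rewards.get)
print(chosen)  # up
```

Because normalization preserves the ordering of the raw scores, the selected state is simply the neighbor with the highest reward, which is what produces the trajectories of Figure 2.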


Figure 1: Reward function applied to the grid environment. Cells with a red frame: the selected states; blue agents: pursuers; green agents: evaders; black cells: cells containing obstacles.

The linkages between the evader and each pursuer shown in Figure 2 reflect the optimal trajectories provided by the application of the method proposed in this section during each different pursuit step.

6. Coalition Formation Algorithm Based on IEDS

A number of coalition formation algorithms have been developed to define which of the potential coalitions should actually be formed. To do so, they typically compute a value for each coalition, known as the coalition value, which provides an indication of the expected results that could be derived if this coalition is constituted. Then, having calculated all the coalitional values, the optimal coalition to form can be selected. We employ an iterative algorithm in order to determine the optimal coalitions of agents. It begins with a complete set of coalitions (agent-strategy combinations) and iteratively eliminates the coalitions that have lower contribution values to MAS efficiency. The pseudocode of our algorithm is shown in Algorithm 1.

First, the algorithm calculates all the possible coalitions (Nbrcl) that the pursuers can form, before their filtration as needed. The expected number of the possible coalitions to form is calculated according to the following:

Nbrcl = n! / ((n - Re_1)! Re_1!) × (n - Re_1)! / ((n - (Re_1 + Re_2))! Re_2!) × ⋯ × (n - (Re_1 + ⋯ + Re_{N-1}))! / ((n - (Re_1 + Re_2 + ⋯ + Re_N))! Re_N!)
= ∏_{j=1}^{N} (n - ∑_{k=0}^{j-1} Re_k)! / ((n - ∑_{k=0}^{j} Re_k)! Re_j!).    (9)

n is the number of pursuers in the environment; N is the number of evaders detected; Re_0 = 0.
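Each factor of (9) is a binomial coefficient: at step j, there are C(n - ∑_{k<j} Re_k, Re_j) ways to choose the j-th pursuit group from the pursuers not yet assigned. A short sketch using the standard library (the function name is illustrative):

```python
from math import comb

def nbrcl(n, re):
    """Equation (9): number of possible coalition sets for n pursuers and
    capture requirements re = [Re_1, ..., Re_N] (one entry per evader)."""
    total, remaining = 1, n
    for r in re:
        total *= comb(remaining, r)   # remaining! / ((remaining - r)! r!)
        remaining -= r                # those pursuers are now assigned
    return total

# Case study of Section 7: ten pursuers, two evaders each requiring Re = 4.
print(nbrcl(10, [4, 4]))  # 3150
```

For the case study this gives C(10, 4) × C(6, 4) = 210 × 15 = 3150 candidate coalition sets, which is what the decentralized scheme then splits among the pursuers.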

In order to distribute the calculation of the possible coalitions among the pursuers, the possible general coalitions

n: the number of pursuers
i = 0; k = 0
j: indicator of the chase iteration
Calculate the possible coalitions
while (C_life > 0) do
    Calculate the value of each coalition
    while (number of coalitions > 1) do
        Eliminate the dominated strategy of P_i
        i ← i mod n + 1
    end while
    Assign the pursuers' roles according to the selected coalition
    Chase iteration
end while
if (capture = true) then
    while (k ≤ n) do
        Update(Reward_{P_k})
        k++
    end while
else
    The guilty pursuers pay some fines
end if

Algorithm 1

(Ω) will be calculated. A general coalition enrolls all the pursuers required to capture the set of evaders detected:

Ω = n! / ((n - λ)! λ!),    (10)

where λ = Re_1 + Re_2 + ⋯ + Re_N.

The general coalitions generated will be equitably distributed among the agents playing the role Pursuer. Specifically, each general coalition will be composed of N pursuit groups. From each general coalition generated through


Figure 2: Pursuers' behavior prediction after the transition function application.

the precedent calculation, equation (10), a number of possible coalition formations, written φ below, will be computed:

φ = λ! / ((λ - Re_1)! Re_1!) × (λ - Re_1)! / ((λ - (Re_1 + Re_2))! Re_2!) × ⋯ × (λ - (Re_1 + ⋯ + Re_{N-1}))! / ((λ - (Re_1 + Re_2 + ⋯ + Re_N))! Re_N!)
= ∏_{j=1}^{N} (λ - ∑_{k=0}^{j-1} Re_k)! / ((λ - ∑_{k=0}^{j} Re_k)! Re_j!),    (11)

Nbrcl = Ω × φ = n! / ((n - λ)! λ!) × ∏_{j=1}^{N} (λ - ∑_{k=0}^{j-1} Re_k)! / ((λ - ∑_{k=0}^{j} Re_k)! Re_j!).    (12)

This decentralized technique aims to balance the computation of the possible coalition formations among the pursuers; it is further detailed in Section 7 via its application to the case study. Note that the value of each coalition generated, in relation to each pursuer contained, will be calculated according to (5). Each pursuer shares the coalitions calculated with the others to start the coalition selection process. Secondly, we apply the Iterated Elimination of Dominated Strategies principle with the aim of finding the optimal coalition through this process, knowing that each strategy is represented by a possible coalition formation. Alternately, each pursuer eliminates the coalition with the lowest value in relation to itself and sends the update to the next pursuer concerned. Pursuers are assigned in accordance with the selected coalition. Each pursuer performs only one chase iteration. The algorithm repeats these instructions until the end of the chase life. When C_life = 0 and the captures are accomplished, some rewards will be attributed to each one of the participating pursuers; the rewards are determined as follows:

Rewards_p = R(s, a) / L.    (13)

L is the number of the coalition's members.

Otherwise, in the case of capture failure, the guilty pursuers must pay some fines to the rest of the coalition's members. These fines are calculated in the following manner:

γ = (s_0, a_1, s_1, a_2, s_2, ..., s_h, a_h),
Fines = ∑_{i=w}^{h-1} R(s_i, a_{i+1}).    (14)


Table 2: The distribution of the possible coalitions' computation.

Pursuers                        P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
General coalitions               5    5    5    5    5    4    4    4    4    4
Possible coalitions generated  350  350  350  350  350  280  280  280  280  280

Figure 3: Flow chart of the algorithm (agents' localization → possible coalitions' calculation → value of coalitions' calculation → dominated strategy's elimination while Nbrcl > 1 → pursuers' assignment → chase iteration → capture test, with rewards on success and fines otherwise; the loop repeats until C_life = 0).

γ is the set of states regarding the guilty pursuer; 0 ≤ w ≤ h, where w represents the index of the coalition's beginning.

Figure 3 reflects the flow chart of this pursuit algorithm, summarizing the different steps explained in this section, from the detection to the capture of the existing evaders.
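The round-robin elimination at the heart of Algorithm 1 can be sketched as follows: pursuers take turns removing the candidate coalition that is worst from their own point of view, until a single coalition remains. The value function here is a stand-in for equation (5) of the paper, and all names are illustrative.

```python
def select_coalition(coalitions, pursuers, value):
    """Round-robin IEDS over candidate coalition formations.

    coalitions: mutable list of candidate coalition formations
    pursuers:   list of pursuer identifiers
    value:      value(coalition, pursuer) -> float, stand-in for eq. (5)
    """
    i = 0
    while len(coalitions) > 1:
        p = pursuers[i]
        worst = min(coalitions, key=lambda c: value(c, p))
        coalitions.remove(worst)        # eliminate p's dominated strategy
        i = (i + 1) % len(pursuers)     # i <- i mod n + 1, as in Algorithm 1
    return coalitions[0]

# Toy example: three candidate formations, two pursuers, tabulated values.
cands = ["C1", "C2", "C3"]
vals = {("C1", "P1"): 3, ("C2", "P1"): 1, ("C3", "P1"): 2,
        ("C1", "P2"): 2, ("C2", "P2"): 3, ("C3", "P2"): 1}
best = select_coalition(cands, ["P1", "P2"], lambda c, p: vals[(c, p)])
print(best)  # C1
```

Each pursuer only ever removes its own worst remaining option and forwards the shrunken set, which is what makes the selection decentralized: no single agent evaluates the whole table on behalf of the others.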

7. Simulation Experiments

In order to evaluate the approach presented in this paper, we realize our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid of 100 × 100 cells. We can also find some obstacles characterized by constancy and solidity. As regards the environmental agents, our simulations are based on ten (10) pursuers and two (02) evaders of type Re = IV. Figure 4 specifically details how an evader of this type can be captured. Each agent is marked with an ID number. Both pursuers and evaders have a similar speed (one cell per iteration) and an excellent communication system. The pursuers' teams are totally capable of determining their actual positions, and the evaders disappear after the capture accomplishment. If the capture of the evader is performed, the coalition created to improve this pursuit will be automatically dissolved.

Table 2 summarizes the results obtained after the application of the decentralized computation of the possible coalitions to this case study, according to the process explained in Section 6. In this case, and according to (10), there are 45 possible general coalitions (Ω), which are distributed over the existing pursuers as shown in Table 2. From each general coalition, 70 coalitions are generated according to (11).
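The case-study numbers can be checked directly from (10)-(12), with `math.comb` standing in for the factorial ratios:

```python
from math import comb

n, re = 10, [4, 4]           # ten pursuers, two evaders of type Re = 4
lam = sum(re)                # lambda = Re_1 + Re_2 = 8

omega = comb(n, lam)                                 # eq. (10): 45 general coalitions
per_general = comb(lam, re[0]) * comb(lam - re[0], re[1])  # eq. (11): 70 per general coalition
total = omega * per_general                          # eq. (12): 3150 in all

# Table 2 distributes the 45 general coalitions over the 10 pursuers
# (five pursuers handle 5 each, five handle 4 each), i.e. 5*350 + 5*280
# generated coalitions in total.
print(omega, per_general, total)  # 45 70 3150
```

The distributed totals of Table 2 (5 × 350 + 5 × 280 = 3150) agree with the centralized count, confirming that the split is lossless.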

Moreover, we have studied the number of possible coalitions generated in parallel by the pursuers in relation to the number of the existing pursuers, as shown in Figure 5. Compared with the centralized method, in which only one pursuer computes the possible coalitions, the decentralized method significantly decreases the time of this computation through its division over the number of the existing pursuers.

In order to vary the types of coordination mechanisms used in our simulations, we compare this work with our recent pursuit-evasion research activity based on the AGR organizational model [6]. We also compare our results with those achieved after the application of an auction mechanism, illustrated in Case C [8]. Note that these two methods are based on decentralized coalition formation:

Case A: pursuit based on the AGR organizational model [6].
Case B: our new approach based on the Iterated Elimination of Dominated Strategies (IEDS) principle.
Case C: pursuit based on an economical auction mechanism (MPMEGBTBA) [8].

The results shown in Figure 6 represent the average capturing time achieved during forty (40) different simulation case studies (episodes), from the beginning to the end of each one. In order to showcase the difference between the different cases, we have taken into consideration the iteration concept, which determines the number of state changes of each agent during the pursuits.

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. Furthermore, we note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate roles' attribution provided by this auction mechanism. However, the results obtained through the application of the IEDS


Figure 4: Example evader of type Re = IV after the capture.

Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers.

coalition formation algorithm revealed an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formations and reformations of the pursuit teams.

Finally, we have focused on the study of the average pursuers' rewards obtained in each chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of the rewards achieved by a pursuer, and each unit of the x-axis represents a chase iteration. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA

Figure 6: Average capturing time after forty (40) different pursuits.

in which the average pursuer's rewards achieved reach 0.59 and 0.507, respectively. Otherwise, in IEDS, the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the different coordination mechanisms applied. Moreover, we can note the superiority of the results obtained through IEDS in relation to the other cases, provoked by the


Figure 7: The pursuers' rewards development.

Figure 8: Average pursuers' reward per iteration.

Table 3: Pursuit results.

                                                   AGR      IEDS    MPMEGBTBA
Average capturing time (iterations)                144.225  78      100.57
Average pursuers' rewards obtained per iteration   0.59     0.88    0.507
Average pursuers' self-confidence development      0.408    0.533   0.451

Figure 9: Pursuers' learning development during the pursuit.

dynamism of the coalition formations and the optimality of the task sharing provided by our algorithm.

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) is better than the algorithm based on the AGR organizational model, as well as the auction mechanism based on MPMEGBTBA, regarding the reward development as well as the capturing time. The leading cause of this fact is the dynamism of our coalitional groups. This flexible mechanism improves the intelligence of the pursuers concerning the displacements and the rewards acquisition, knowing that the team reward is optimal in the case where each pursuer undertakes the best path.

8. Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact imposed by the dynamism of the coalition formations. Firstly, we have extended our coalition algorithm from the Iterated Elimination of Dominated Strategies. This process allows us to determine the optimal pursuit coalition strategy according to the Game Theory principles. Secondly, we have focused on the Markov Decision Process as a motion strategy of our pursuers in the environment (grid of cells). To highlight our proposal, we have developed a comparative study between our algorithm and a decentralized strategy of coalition based on the AGR organizational model, as well as an auction mechanism based on MPMEGBTBA. Simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512-516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225-230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Muller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214-230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1-13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075-1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235-244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804-4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875-884, 2005.

[11] J. Thunberg, P. Ogren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506-4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165-172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482-486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272-2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452-1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155-2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107-110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413-420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154-159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347-1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57-65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215-1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478-485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709-715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541-558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: a new paradigm for robust and efficient multirobot coordination in dynamic environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, Multiagent Systems, InTech, Rijeka, Croatia, 2009.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article A New Decentralized Approach of Multiagent Cooperative …downloads.hindawi.com/journals/mpe/2016/5192423.pdf · 2019-07-30 · is is an open access article distributed

Mathematical Problems in Engineering 5

[41 97 0]

[42 96 0]

[43 95 0]

[44 94 0]

[40 98 1]

[41 97 0]

[42 96 0]

[43 95 0]

[39 97 0]

[40 96 0]

[41 95 0]

[42 94 1]

[38 96 0]

[39 95 0]

[41 93 0]

[37 95 0]

[38 94 0]

[39 93 0]

[40 92 0]

[36 94 0]

[37 93 0]

[38 92 0]

[39 91 0]

[35 93 0]

[36 92 0]

[38 90 0]

[34 92 0]

[35 91 0]

[36 90 0]

[37 89 0]

[34 90 0]

[35 89 0]

[36 88 0]

[33 91 1]

Figure 1 Reward function applied to the grid environment Cells with a red frame the selected states blue agents pursuers green agentsevaders black cells cells containing obstacles

The linkages between the evader and each pursuer shownin Figure 2 reflect the optimal trajectories provided by theapplication of the method proposed in this section duringeach different pursuit step

6 Coalition Formation AlgorithmBased on IEDS

A number of coalition formation algorithms have beendeveloped to define which of the potential coalitions shouldactually be formed To do so they typically compute avalue for each coalition known as the coalition value whichprovides an indication of the expected results that could bederived if this coalition is constitutedThen having calculatedall the coalitional values the decision about the optimalcoalition to form can be selected We employ an iterativealgorithm in order to determine the optimal coalitions ofagents It begins with a complete set of coalitions (agent-strategy combinations) and iteratively eliminates the coali-tions that have lower contribution values to MAS efficiencyThe pseudocode of our algorithm is shown in Algorithm 1

First, the algorithm calculates all the possible coalitions (Nbrcl) that the pursuers can form, before filtering them as needed. The expected number of possible coalitions to form is calculated according to the following:

\[
\mathrm{Nbrcl} = \frac{n!}{(n-\mathrm{Re}_1)!\,\mathrm{Re}_1!} \times \frac{(n-\mathrm{Re}_1)!}{(n-(\mathrm{Re}_1+\mathrm{Re}_2))!\,\mathrm{Re}_2!} \times \cdots \times \frac{(n-(\mathrm{Re}_1+\cdots+\mathrm{Re}_{N-1}))!}{(n-(\mathrm{Re}_1+\cdots+\mathrm{Re}_N))!\,\mathrm{Re}_N!} = \prod_{j=1}^{N} \frac{\left(n-\sum_{k=0}^{j-1}\mathrm{Re}_k\right)!}{\left(n-\sum_{k=0}^{j}\mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{9}
\]

Here, n is the number of pursuers in the environment, N is the number of evaders detected, and Re_0 = 0.
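Equation (9) is a product of binomial coefficients: Re_1 pursuers are chosen for the first evader, Re_2 from the remainder for the second, and so on. A minimal Python sketch (the function name is ours; the values n = 10 and Re = [4, 4] mirror the case study of Section 7):

```python
from math import factorial

def nbrcl(n, re):
    """Number of possible coalitions per equation (9).

    n  -- number of pursuers in the environment
    re -- [Re_1, ..., Re_N], pursuers required per detected evader
    """
    total = 1
    captured = 0  # running sum of Re_k for k < j
    for re_j in re:
        # (n - sum_{k<j} Re_k)! / ((n - sum_{k<=j} Re_k)! * Re_j!)
        total *= factorial(n - captured) // (factorial(n - captured - re_j) * factorial(re_j))
        captured += re_j
    return total

print(nbrcl(10, [4, 4]))  # 10 pursuers, 2 evaders of type Re = IV -> 3150
```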

In order to distribute the calculation of the possible coalitions among the pursuers, the possible general coalitions

n: the number of pursuers
i = 0; k = 0
j: indicator of the chase iteration
Calculate the possible coalitions
while (C_life > 0) do
    Calculate the value of each coalition
    while (number of coalitions > 1) do
        Eliminate the dominated strategy of P_i
        i ← (i mod n) + 1
    end while
    Assign the pursuers' roles according to the selected coalition
    Chase iteration
end while
if (capture = true) then
    while (k ≤ n) do
        Update(Reward_{P_k})
        k++
    end while
else
    The guilty pursuers pay some fines
end if

Algorithm 1
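The round-robin elimination loop at the heart of Algorithm 1 can be sketched as follows. This is a minimal illustration under our own toy representation (formations as labels, per-pursuer values supplied by a callable), not the authors' implementation:

```python
def ieds_select(coalitions, value, n):
    """Round-robin Iterated Elimination of Dominated Strategies.

    coalitions  -- list of candidate coalition formations
    value(c, p) -- contribution value of formation c from pursuer p's view
    n           -- number of pursuers taking turns to eliminate
    """
    remaining = list(coalitions)
    p = 0
    while len(remaining) > 1:
        # pursuer p withdraws its dominated (lowest-valued) formation
        worst = min(remaining, key=lambda c: value(c, p))
        remaining.remove(worst)
        p = (p + 1) % n  # pass the turn to the next pursuer
    return remaining[0]

# toy example: 3 candidate formations, 2 pursuers with different preferences
vals = {("A", 0): 3, ("A", 1): 1,
        ("B", 0): 2, ("B", 1): 5,
        ("C", 0): 1, ("C", 1): 4}
best = ieds_select(["A", "B", "C"], lambda c, p: vals[(c, p)], 2)
print(best)  # pursuer 0 eliminates "C", pursuer 1 eliminates "A" -> "B"
```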

(Ω) will be calculated first. A general coalition enrolls all the pursuers required to capture the set of detected evaders:

\[
\Omega = \frac{n!}{(n-\lambda)!\,\lambda!} \tag{10}
\]

where λ = Re_1 + Re_2 + ⋯ + Re_N. The general coalitions generated will be equitably distributed among the agents playing the role Pursuer. Specifically, each general coalition will be composed of N pursuit groups. From each general coalition generated through



Figure 2: Pursuers' behaviors prediction after the transition function application.

the precedent calculation, equation (10), a number of possible coalition formations (Φ) will be computed:

\[
\Phi = \frac{\lambda!}{(\lambda-\mathrm{Re}_1)!\,\mathrm{Re}_1!} \times \frac{(\lambda-\mathrm{Re}_1)!}{(\lambda-(\mathrm{Re}_1+\mathrm{Re}_2))!\,\mathrm{Re}_2!} \times \cdots \times \frac{(\lambda-(\mathrm{Re}_1+\cdots+\mathrm{Re}_{N-1}))!}{(\lambda-(\mathrm{Re}_1+\cdots+\mathrm{Re}_N))!\,\mathrm{Re}_N!} = \prod_{j=1}^{N} \frac{\left(\lambda-\sum_{k=0}^{j-1}\mathrm{Re}_k\right)!}{\left(\lambda-\sum_{k=0}^{j}\mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{11}
\]

\[
\mathrm{Nbrcl} = \Omega \times \Phi = \frac{n!}{(n-\lambda)!\,\lambda!} \times \prod_{j=1}^{N} \frac{\left(\lambda-\sum_{k=0}^{j-1}\mathrm{Re}_k\right)!}{\left(\lambda-\sum_{k=0}^{j}\mathrm{Re}_k\right)!\,\mathrm{Re}_j!} \tag{12}
\]
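The factorization of Nbrcl into a general-coalition count times a per-general-coalition count can be checked numerically. A quick sketch under the case-study values (n = 10 pursuers, two evaders with Re = 4, so λ = 8); the variable names `omega` and `phi` are our own shorthand:

```python
from math import comb

n, re = 10, [4, 4]
lam = sum(re)            # lambda = Re_1 + ... + Re_N

omega = comb(n, lam)     # equation (10): n! / ((n - lambda)! * lambda!)

phi = 1                  # per-general-coalition count, equation (11)
captured = 0
for re_j in re:
    phi *= comb(lam - captured, re_j)
    captured += re_j

print(omega, phi, omega * phi)  # 45 70 3150, matching equation (12)
```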

This decentralized technique aims to balance the computation of the possible coalition formations among the pursuers; it is detailed further in Section 7 via its application to the case study. The value of each coalition generated, in relation to each pursuer it contains, is calculated according to (5). Each pursuer shares the coalitions it has calculated with the others to start the coalition selection process. Secondly, we apply the Iterated Elimination of Dominated Strategies principle with the aim of finding the optimal coalition, where each strategy is represented by a possible coalition formation. Alternately, each pursuer eliminates the coalition with the lowest value in relation to itself and sends the update to the next pursuer concerned. Pursuers are then assigned in accordance with the selected coalition, and each pursuer performs only one chase iteration. The algorithm repeats these instructions until the end of the chase life. When C_life = 0 and the captures are accomplished, rewards are attributed to each of the participating pursuers, determined as follows:

\[
\mathrm{Rewards}_p = \frac{R(s,a)}{L} \tag{13}
\]

where L is the number of the coalition's members. Otherwise, in the case of capture failure, the guilty pursuers must pay fines to the rest of the coalition's members. These fines are calculated in the following manner:

\[
\gamma = (s_0, a_1, s_1, a_2, s_2, \ldots, s_h, a_h), \qquad \mathrm{Fines} = \sum_{i=w}^{h-1} R(s_i, a_{i+1}) \tag{14}
\]
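Equations (13) and (14) read as follows: on a successful capture each member receives the coalition reward divided by the coalition size; on failure, a guilty pursuer refunds the rewards accumulated along its state-action trajectory from step w onward. A hedged sketch (R is a user-supplied reward function; the function names are ours):

```python
def member_reward(R, s, a, L):
    """Equation (13): per-member reward after a successful capture."""
    return R(s, a) / L

def fines(R, trajectory, w):
    """Equation (14): fines paid by a guilty pursuer.

    trajectory -- [(s_0, a_1), (s_1, a_2), ..., (s_{h-1}, a_h)]
    w          -- index of the coalition's beginning, 0 <= w <= h
    """
    return sum(R(s_i, a_next) for s_i, a_next in trajectory[w:])

# toy reward: constant 2.0 per transition
R = lambda s, a: 2.0
print(member_reward(R, "s", "a", 4))                            # 0.5
print(fines(R, [("s0", "a1"), ("s1", "a2"), ("s2", "a3")], 1))  # 4.0
```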


Table 2: The distribution of the possible coalitions' computation.

Pursuers                        P1    P2    P3    P4    P5    P6    P7    P8    P9    P10
General coalitions              5     5     5     5     5     4     4     4     4     4
Possible coalitions generated   350   350   350   350   350   280   280   280   280   280
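The split in Table 2 (45 general coalitions over 10 pursuers, 70 coalition formations each) is a near-equitable division; a small sketch, with the function name being ours:

```python
def distribute(total, n_pursuers):
    """Split `total` general coalitions as evenly as possible among pursuers."""
    base, extra = divmod(total, n_pursuers)
    # the first `extra` pursuers each take one more general coalition
    return [base + 1] * extra + [base] * (n_pursuers - extra)

shares = distribute(45, 10)
print(shares)                    # [5, 5, 5, 5, 5, 4, 4, 4, 4, 4]
print([s * 70 for s in shares])  # per-pursuer coalition counts: 350 ... 280
```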

Figure 3: Flow chart of the algorithm. Nodes: agents' localization → possible coalitions' calculation → value of coalitions' calculation → dominated strategy's elimination (repeated while Nbrcl > 1) → pursuers' assignment → chase iteration → capture? (yes: rewards; no: fines), with the whole cycle repeating until C_life = 0.

where γ is the set of states regarding the guilty pursuer and 0 ≤ w ≤ h, with w representing the index of the coalition's beginning.

Figure 3 shows the flow chart of this pursuit algorithm, summarizing the different steps explained in this section, from the detection to the capture of the existing evaders.

7. Simulation Experiments

In order to evaluate the approach presented in this paper, we run our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid of 100 × 100 cells. The environment also contains some static, solid obstacles. As regards the agents, our simulations are based on ten (10) pursuers and two (02) evaders of type Re = IV; Figure 4 details how an evader of this type is captured. Each agent is marked with an ID number. Both pursuers and evaders have the same speed (one cell per iteration) and an excellent communication system. The pursuers' teams are fully capable of determining their actual positions, and the evaders disappear once captured. When the capture of an evader is performed, the coalition created for this pursuit is automatically dissolved.

Table 2 summarizes the results obtained after applying the decentralized computation of the possible coalitions to this case study, according to the process explained in Section 6. In this case, according to (10), there are Ω = 45 possible general coalitions, which are distributed among the existing pursuers as shown in Table 2. From each general coalition, a number of coalitions are generated (Φ = 70) according to (11).

Moreover, we have studied the number of possible coalitions generated in parallel by the pursuers in relation to the number of existing pursuers, as shown in Figure 5. Compared to the centralized method, in which only one pursuer computes the possible coalitions, the decentralized method significantly decreases the computation time by dividing the work among the existing pursuers.

In order to vary the types of coordination mechanisms used in our simulations, we found it useful to compare this work with our recent pursuit-evasion research based on the AGR organizational model [6], as well as with the results achieved by the auction mechanism illustrated in Case-C [8]. Note that these two methods are based on decentralized coalition formation.

Case-A: pursuit based on the (AGR) organizational model [6].
Case-B: our new approach based on the Iterated Elimination of Dominated Strategies (IEDS) principle.
Case-C: pursuit based on an economical auction mechanism (MPMEGBTBA) [8].

The results shown in Figure 6 represent the average capturing time achieved during forty (40) different simulation case studies (episodes), from the beginning to the end of each one. In order to showcase the difference between the cases, we take into consideration the iteration concept, which counts the number of state changes of each agent during the pursuits.

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. Furthermore, we note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate roles' attribution provided by this auction mechanism. However, the application of the IEDS



Figure 4: Example of an evader of type Re = IV after the capture.

Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers (x-axis: number of pursuers, 10 to 15; y-axis: number of possible coalitions, 0 to 600,000; curves: Decentralized and Centralized).

coalition formation algorithm revealed an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formations and reformations of the pursuit teams.

Finally, we have focused on the study of the average pursuers' rewards obtained per chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of rewards achieved by a pursuer, and the x-axis represents chase iterations. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA,

Figure 6: Average capturing time after (40) different pursuits (x-axis: time in episodes, 1 to 40; y-axis: average capturing time in iterations, 40 to 200; curves: Case-A, Case-B, Case-C).

in which the average pursuer's rewards achieved reach 0.59 and 0.507, respectively. Otherwise, in IEDS, the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the coordination mechanisms applied. Moreover, we note the superiority of the results obtained through IEDS in relation to the other cases, brought about by the


Figure 7: The pursuers' rewards development (x-axis: time in iterations, 1 to 78; y-axis: pursuers' rewards development, 30 to 120; curves: Case-A, Case-B, Case-C).

Figure 8: Average pursuers' reward per iteration (three panels, Case-A, Case-B, and Case-C; x-axis: time in iterations, 0 to 50; y-axis: average pursuers' rewards obtained, −1.7 to 3.4).

Table 3: Pursuit results.

                                                    AGR       IEDS    MPMEGBTBA
Average capturing time (iterations)                 144.225   78      100.57
Average pursuers' rewards obtained per iteration    0.59      0.88    0.507
Average pursuers' self-confidence development       0.408     0.533   0.451

Figure 9: Pursuers' learning development during the pursuit (x-axis: pursuit development in %, 0 to 100; y-axis: pursuers' self-confidence development, 0 to 8; curves: Case-A, Case-B, Case-C).

dynamism of the coalition formations and the optimality of task sharing provided by our algorithm.

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms both the algorithm based on the AGR organizational model and the auction mechanism based on MPMEGBTBA, regarding both reward development and capturing time. The leading cause is the dynamism of our coalitional groups: this flexible mechanism improves the intelligence of the pursuers concerning displacements and reward acquisition, knowing that the team reward is optimal when each pursuer undertakes the best path.

8. Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we derived our coalition algorithm from the Iterated Elimination of Dominated Strategies. This process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we focused on the Markov Decision Process as a motion


strategy of our pursuers in the environment (a grid of cells). To highlight our proposal, we developed a comparative study between our algorithm and a decentralized coalition strategy based on the AGR organizational model, as well as an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512–516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225–230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Muller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214–230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1–13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075–1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235–244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804–4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875–884, 2005.

[11] J. Thunberg, P. Ogren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506–4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165–172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482–486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272–2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452–1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155–2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107–110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413–420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154–159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347–1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57–65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215–1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478–485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709–715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541–558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: a new paradigm for robust and efficient multirobot coordination in dynamic environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, Multiagent Systems, InTech, Rijeka, Croatia, 2009.


Page 6: Research Article A New Decentralized Approach of Multiagent Cooperative …downloads.hindawi.com/journals/mpe/2016/5192423.pdf · 2019-07-30 · is is an open access article distributed

6 Mathematical Problems in Engineering

[53 88 0] [52 89 0] [51 90 0] [50 91 0] [49 92 0] [48 93 0] [47 94 0] [46 95 0] [45 94 0] [44 93 0] [43 92 0] [42 91 1] [41 90 0] [40 89 0] [39 88 0]

[54 89 1] [52 91 0] [51 92 0] [50 93 0] [49 94 0] [48 95 0] [47 96 0] [46 95 0] [45 94 0] [44 93 0] [42 91 0] [41 90 0]

[55 90 0] [54 91 0] [53 92 0] [52 93 0] [51 94 0] [50 95 0] [49 96 0] [48 97 0] [47 96 0] [46 95 0] [45 94 0] [44 93 0] [43 92 0] [42 91 0] [41 90 0]

[56 91 0] [55 92 0] [54 93 0] [53 94 0] [52 95 0] [51 96 0] [50 97 0] [49 98 1] [48 97 0] [47 96 0] [46 95 0] [45 94 0] [44 93 0] [43 92 0] [42 91 0]

[57 90 0] [56 91 0] [55 92 0] [54 93 0] [53 94 0] [52 95 0] [51 96 0] [50 97 0] [49 96 0] [48 95 0] [47 94 0] [46 93 0] [45 92 0] [44 91 0] [43 90 0]

[58 89 0] [57 90 0] [56 91 0] [55 92 0] [54 93 0] [53 94 0] [52 95 0] [51 96 0] [50 95 0] [49 94 0] [48 93 0] [47 92 0] [46 91 0] [45 90 0] [44 89 0]

[59 88 0] [57 90 0] [56 91 0] [55 92 0] [54 93 0] [52 95 0] [51 94 0] [50 93 0] [49 92 0] [47 90 0] [46 89 0] [45 88 0]

[60 87 0] [59 88 0] [58 89 0] [57 90 0] [56 91 0] [55 92 0] [54 93 0] [53 94 0] [52 93 0] [51 92 0] [50 91 0] [49 90 0] [48 89 0] [47 88 0] [46 87 0]

[61 86 0] [60 87 0] [59 88 0] [58 89 0] [57 90 0] [56 91 0] [55 92 0] [54 93 0] [53 92 0] [52 91 0] [51 90 0] [50 89 0] [49 88 0] [48 87 0] [47 86 0]

[62 85 0] [61 86 0] [60 87 0] [59 88 0] [58 89 0] [57 90 0] [55 92 0] [54 91 0] [53 90 0] [52 89 0] [51 88 0] [50 87 0] [49 86 0]

[63 84 0] [62 85 0] [61 86 0] [60 87 0] [59 88 0] [58 89 0] [57 90 1] [56 91 0] [55 90 0] [54 89 0] [53 88 0] [52 87 0] [51 86 0] [50 85 0] [49 84 1]

Figure 2 Pursuersrsquo behaviors prediction after the transition function application

precedent calculation equation (10) a number of possiblecoalition formations () will be computed

= 120582(120582 minus Re1)Re1 times

120582 minus Re1(120582 minus (Re1 + Re2))Re2

times sdot sdot sdot times 120582 minus (Re1 + sdot sdot sdot + Re119873minus1)(120582 minus (Re1 + Re2 + sdot sdot sdot + Re119873))Re119873

= 119873prod119895=1

(120582 minus sum119896=119895minus1119896=0

Re119896)(120582 minus sum119896=119895

119896=0Re119896)Re119895

(11)

Nbrcl = Ω times

= 119899(119899 minus 120582)120582 times

119873prod119895=1

(120582 minus sum119896=119895minus1119896=0

Re119896)(120582 minus sum119896=119895

119896=0Re119896)Re119895

(12)

This decentralized technique aims to balance the computa-tion of the possible coalition formations among the pursuersFurthermore this method is more detailed in Section 7 viaits application to the case study Noting that the value of eachcoalition generated in relation to each pursuer contained willbe calculated according to (5) Each pursuer shares the coali-tions calculated with the others to start the coalition selection

process Secondly we apply the Iterated Elimination ofDomi-nated Strategies principle with the aim of finding the optimalcoalition through this process Knowing that each strategyis represented by a possible coalition formation Alternatelyeach pursuer eliminates the coalition with the lower value inrelation to itself and sends the update to the next pursuer con-cerned Pursuers are assigned in accordance with the selectedcoalition Each pursuer performs only one chase iterationThe algorithm repeats these instructions until the end of thechase life When 119862life = 0 and the captures are accomplishedsome rewards will be attributed to each one of the participat-ing pursuers the rewards are determined as follows

Rewards119901 = 119877 (119904 119886)119871 (13)

119871 is the number of the coalitionrsquos membersOtherwise in the case of capture failure the guilty

pursuers must pay some fines to the rest of the coalitionrsquosmembersThese fines are calculated as the followingmanner

120574 = (1199040 1198861 1199041 1198862 1199042 119904ℎ 119886ℎ) Fines = ℎminus1sum

119894=119908

119877 (119904119894 119886119894+1) (14)

Mathematical Problems in Engineering 7

Table 2 The distribution of the possible coalitionsrsquo computation

Pursuers 1198751 1198752 1198753 1198754 1198755 1198756 1198757 1198758 1198759 11987510General coalitions 5 5 5 5 5 4 4 4 4 4Possible coalitions generated 350 350 350 350 350 280 280 280 280 280

Agentsrsquo localization

Possible coalitionsrsquo calculation

Value of coalitionsrsquo calculation

Dominated strategyrsquos elimination

Pursuersrsquo assignment

Chase iteration

Capture

Rewards Fines

Yes

Yes

Yes

No

No

NoClife = 0

Nbrcl gt 1

Figure 3 Flow chart of the algorithm

120574 is the set of states regarding the guilty pursuer 0 le 119908 le ℎwhere 119908 represents the index of coalitionrsquos beginning

Figure 3 reflects the flow chart of this pursuit algorithmresuming the different steps explained in this section fromthe detection to the capture of the existing evaders

7 Simulation Experiments

In order to evaluate the approach presented in this paperwe realize our pursuit-evasion game on an example takingplace in a rectangular two-dimensional grid with 100 times 100cells Also we can find some obstacles characterized by theconstancy and the solidity As regards the environmentalagents our simulations are based on ten (10) pursuers andtwo (02) evaders of type Re = IV As shown in Figure 4it is specifically detailed how a pursuer of this type can be

captured Each agent ismarkedwith an IDnumber Both pur-suers and evaders have a similar speed (one cell per iteration)and an excellent communication systemThe pursuersrsquo teamsare totally capable of determining their actual positions andthe evaders disappeared after the capture accomplishment Ifthe capture of the evader is performed the coalition createdto improve this pursuit will be automatically dissolved

Table 2 resumes the results obtained after the applicationof the decentralized computation of the possible coalitionson this case study according to the process explained inSection 6 In this case and according to (10) the possiblegeneral coalitions (Ω) are equal to 45 coalitions which willbe distributed on the existing pursuers as shown in Table 2From each general coalition a number of coalitions will begenerated ( = 70) according to (11)

Moreover we have studied the number of possible coali-tions generated in parallel by the pursuers in relation to thenumber of the existing pursuers as shown in Figure 5 Inrelation to the centralized method in which only one pursuercomputes the possible coalitions the decentralized methoddecreases significantly the time concerning this computationthrough its division on the number of the existing pursuers

In order to vary the types of coordination mechanismsused in our simulations we have seen the usefulness tocompare this work with our recent pursuit-evasion researchactivity based onAGRorganizationalmodel [6]We have alsoseen the usefulness to compare our results with the resultsachieved after the application of an auction mechanismillustrated in Case-C- [8] Noting that these twomethods arebased on decentralized coalition formation

Case-A- is pursuit based on (AGR) organizationalmodel [6]Case-B- is our new approach based on the IteratedElimination of Dominated Strategies (IEDS) princi-pleCase-C- is a pursuit based on an economical auctionmechanism (MPMEGBTBA) [8]

The results shown in Figure 6 represent the average capturingtime achieved during forty (40) different simulation casestudies (episodes) from the beginning to the end of eachone In order to showcase the difference between the differentcases we have seen the usefulness to take into considerationthe iteration concept which determines the number of statechanges regarding each agent during the pursuits

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. Furthermore, we note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate attribution of roles provided by this auction mechanism. However, the results obtained through the application of the IEDS coalition formation algorithm revealed an average capturing time of 78 iterations.

8 Mathematical Problems in Engineering

Figure 4: Example of an evader of the type Re = IV after the capture (the listing of the [x, y, flag] cell coordinates traversed during the capture is omitted here).

Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers (x-axis: number of pursuers, 10 to 15; y-axis: number of possible coalitions, 0 to 600,000).

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formation and reformation of the pursuit teams.

Finally, we have focused on the study of the average pursuers' rewards obtained at each chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of the rewards achieved by a pursuer and the x-axis represents the chase iterations. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA, in which the average pursuer's rewards reach 0.59 and 0.507, respectively. Otherwise, in IEDS, the average result increases to 0.88.

Figure 6: Average capturing time after (40) different pursuits (x-axis: time in episodes, 1 to 40; y-axis: average capturing time in iterations, 40 to 200).

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the coordination mechanisms applied. Moreover, we can note the superiority of the results obtained through IEDS in relation to the other cases, provoked by the dynamism of the coalition formations and the optimality of the task sharing provided by our algorithm.

Figure 7: The pursuers' rewards development (x-axis: time in iterations, 1 to 78; y-axis: pursuers' rewards development, 30 to 120).

Figure 8: Average pursuers' reward per iteration, one panel per case (x-axis: time in iterations, 0 to 50; y-axis: average pursuers' rewards obtained, -1.7 to 3.4).

Table 3: Pursuit results.

                                                  AGR      IEDS   MPMEGBTBA
Average capturing time (iterations)               144.225  78     100.57
Average pursuers' rewards obtained per iteration  0.59     0.88   0.507
Average pursuers' self-confidence development     0.408    0.533  0.451

Figure 9: Pursuers' learning development during the pursuit (x-axis: pursuit development, 0 to 100%; y-axis: pursuers' self-confidence development, 0 to 8).

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms both the algorithm based on the AGR organizational model and the auction mechanism based on MPMEGBTBA, regarding the reward's development as well as the capturing time. The leading cause of this fact is the dynamism of our coalitional groups. This flexible mechanism improves the intelligence of the pursuers concerning the displacements and the reward acquisition, knowing that the team reward is optimal in the case where each pursuer undertakes the best path.

8 Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we have derived our coalition algorithm from the Iterated Elimination of Dominated Strategies. This process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we have adopted the Markov Decision Process as the motion strategy of our pursuers in the environment (grid of cells). To highlight our proposal, we have developed a comparative study between our algorithm, a decentralized coalition strategy based on the AGR organizational model, and an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512–516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225–230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Muller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214–230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1–13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075–1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235–244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804–4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875–884, 2005.

[11] J. Thunberg, P. Ogren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506–4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165–172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482–486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272–2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452–1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155–2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107–110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413–420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154–159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347–1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57–65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215–1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478–485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709–715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541–558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: a new paradigm for robust and efficient multirobot coordination in dynamic environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, InTech, Rijeka, Croatia, 2009.



Table 2: The distribution of the possible coalitions' computation.

Pursuers                       P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
General coalitions             5    5    5    5    5    4    4    4    4    4
Possible coalitions generated  350  350  350  350  350  280  280  280  280  280

Figure 3: Flow chart of the algorithm (agents' localization → possible coalitions' calculation → value of coalitions' calculation → dominated strategy's elimination → pursuers' assignment → chase iteration → capture, with rewards/fines feedback branches and the conditions Clife = 0 and Nbrcl > 1).
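To make the control flow of Figure 3 concrete, here is a minimal, self-contained sketch of the loop on a toy one-dimensional grid. Every helper, the pair-shaped coalitions, and the distance-based value function are our own placeholders standing in for the operations of Sections 5-6, not the authors' implementation.

```python
def ieds_best(options, value):
    """Keep discarding the dominated option until a single one survives
    (a one-dimensional stand-in for the elimination step of Section 6)."""
    alive = list(options)
    while len(alive) > 1:
        alive.remove(min(alive, key=value))
    return alive[0]

def pursue(pursuers, evader, max_iters=200):
    """Toy control loop mirroring Figure 3: localize agents, form the best
    coalition by iterated elimination, then chase cell by cell until capture.
    Positions are 1-D integers for brevity."""
    # Candidate coalitions: all pairs of pursuers (an illustrative choice).
    pairs = [(a, b) for i, a in enumerate(pursuers) for b in pursuers[i + 1:]]
    # A coalition's value decreases with its total distance to the evader.
    value = lambda pair: -(abs(pair[0] - evader) + abs(pair[1] - evader))
    coalition = list(ieds_best(pairs, value))
    for iteration in range(1, max_iters + 1):        # chase iterations
        coalition = [p + (1 if p < evader else -1 if p > evader else 0)
                     for p in coalition]
        if evader in coalition:                      # capture reached:
            return iteration                         # the coalition dissolves
    return None                                      # evader not captured

print(pursue([0, 3, 9, 14], evader=7))   # 2: caught on the second iteration
```

The real algorithm additionally re-forms coalitions during the chase and applies rewards and fines, which this sketch omits.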

γ is the set of states regarding the guilty pursuer, and 0 ≤ w ≤ h, where w represents the index of the coalition's beginning.

Figure 3 presents the flow chart of this pursuit algorithm, summarizing the different steps explained in this section, from the detection to the capture of the existing evaders.

7 Simulation Experiments

In order to evaluate the approach presented in this paper, we run our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid of 100 × 100 cells. The environment also contains some obstacles, characterized by their constancy and solidity. As regards the environmental agents, our simulations involve ten (10) pursuers and two (02) evaders of type Re = IV; Figure 4 details how an evader of this type can be captured. Each agent is marked with an ID number. Both pursuers and evaders have the same speed (one cell per iteration) and an excellent communication system. The pursuers' teams are fully capable of determining their actual positions, and the evaders disappear once the capture is accomplished. When the capture of an evader is performed, the coalition created for this pursuit is automatically dissolved.
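The setup above can be collected in a small configuration object; the values are those stated in the text, while the field names are ours and purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class PursuitConfig:
    """Simulation parameters reported in Section 7 (field names are ours)."""
    grid_size: tuple = (100, 100)   # rectangular two-dimensional grid of cells
    n_pursuers: int = 10
    n_evaders: int = 2
    evader_type: str = "Re = IV"
    speed: int = 1                  # cells per iteration, same for all agents
    static_obstacles: bool = True   # obstacles are constant and solid

cfg = PursuitConfig()
print(cfg.grid_size, cfg.n_pursuers, cfg.n_evaders)   # (100, 100) 10 2
```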

Table 2 resumes the results obtained after the applicationof the decentralized computation of the possible coalitionson this case study according to the process explained inSection 6 In this case and according to (10) the possiblegeneral coalitions (Ω) are equal to 45 coalitions which willbe distributed on the existing pursuers as shown in Table 2From each general coalition a number of coalitions will begenerated ( = 70) according to (11)

Moreover we have studied the number of possible coali-tions generated in parallel by the pursuers in relation to thenumber of the existing pursuers as shown in Figure 5 Inrelation to the centralized method in which only one pursuercomputes the possible coalitions the decentralized methoddecreases significantly the time concerning this computationthrough its division on the number of the existing pursuers

In order to vary the types of coordination mechanismsused in our simulations we have seen the usefulness tocompare this work with our recent pursuit-evasion researchactivity based onAGRorganizationalmodel [6]We have alsoseen the usefulness to compare our results with the resultsachieved after the application of an auction mechanismillustrated in Case-C- [8] Noting that these twomethods arebased on decentralized coalition formation

Case-A- is pursuit based on (AGR) organizationalmodel [6]Case-B- is our new approach based on the IteratedElimination of Dominated Strategies (IEDS) princi-pleCase-C- is a pursuit based on an economical auctionmechanism (MPMEGBTBA) [8]

The results shown in Figure 6 represent the average capturingtime achieved during forty (40) different simulation casestudies (episodes) from the beginning to the end of eachone In order to showcase the difference between the differentcases we have seen the usefulness to take into considerationthe iteration concept which determines the number of statechanges regarding each agent during the pursuits

In the first case (AGR) the average capturing timeobtained equals 144225 iterations Furthermore we notean interesting decrease until 10057 iterations after theapplication of MPMEGBTBA due to the appropriate rolesrsquoattribution provided by this auction mechanism Howeverthe results that occurred through the application of IEDS

8 Mathematical Problems in Engineering

[49 94 0] [48 95 0] [47 96 0]

[50 95 0] [49 96 0] [48 97 1]

[51 96 0] [50 97 1] [49 98 1]

[52 95 0] [51 96 0] [50 97 1]

[53 94 0] [52 95 0] [51 96 0]

[54 93 0] [52 95 0]

[46 95 0]

[47 96 0]

[48 97 1]

[49 96 0]

[50 95 0]

[51 94 0]

[45 94 0]

[46 95 0]

[47 96 0]

[48 95 0]

[49 94 0]

[50 93 0]

[44 93 0]

[45 94 0]

[46 95 0]

[47 94 0]

[48 93 0]

[49 92 0]

[44 93 0]

[45 94 0]

[46 93 0]

[47 92 0]

[42 91 0]

[43 92 0]

[44 93 0]

[45 92 0]

[46 91 0]

[47 90 0]

Figure 4 Example evader of the type Re equals IV after the capture

0

100000

200000

300000

400000

500000

600000

Num

ber o

f pos

sible

coal

ition

s

11 12 13 14 1510Number of pursuers

minus100000

DecentralizedCentralized

Figure 5 Centralized and decentralized coalitionsrsquo computation inrelation to the number of pursuers

coalition formation algorithm revealed an average capturingtime of 78 iterations

Figure 7 shows the development of the pursuersrsquo rewardfunction during the same pursuit period of the different casesand the outcomes reflect the improvement brought by thedynamic formations and reformations of the pursuit teams

Finally we have focused on the study of the averagepursuersrsquo rewards obtained in each case of chase iterationduring full pursuit In Figure 8 the 119909-axis represents thevalue of rewards achieved by a pursuer and each unit 119910-axisrepresents chase iterations The results shown in this figurereveal a certain similarity between AGR and MPMEGBTBA

40

60

80

100

120

140

160

180

200

The a

vera

ge ca

ptur

ing

time (

itera

tions

)

30 4010 201Time (episodes)

Case-A-Case-B-Case-C-

Figure 6 Average capturing time after (40) different pursuits

in which the average pursuerrsquos rewards achieved reach 059and 0507 respectively Otherwise in IEDS the average resultincreases until 088

The results shown in Figure 9 represent the internallearning development (self-confidence development) of thepursuers during the pursuit applied to the three cases Thepositivity of the results is due to the grouping and theequitable task sharing between the different pursuit groupsimposed by the different coordination mechanisms appliedMoreover we can note the superiority of the results obtainedthrough IEDS in relation to the other cases provoked by the

Mathematical Problems in Engineering 9

30

40

50

60

70

80

90

100

110

120

Purs

uers

rsquo rew

ards

dev

elopm

ent

10 20 30 40 50 60 70 781Time (iterations)

Case-A-Case-B-Case-C-

Figure 7 The pursuersrsquo rewards development

Case-C-

obta

ined

Aver

age p

ursu

ersrsquo

rew

ards

Time (iterations)

34

17

00

minus17

0 10 20 30 40 50

Case-B-

obta

ined

Aver

age p

ursu

ersrsquo

rew

ards

Time (iterations)

34

17

00

minus17

0 10 20 30 40 50

Case-A-

obta

ined

Aver

age p

ursu

ersrsquo

rew

ards

Time (iterations)

34

17

00

minus17

0 10 20 30 40 50

Figure 8 Average pursuersrsquo reward per iteration

Table 3 Pursuit result

AGR IEDS MPMEGBTBAAverage capturing time(iteration) 144225 78 10057

Average pursuersrsquo rewardsobtained by iteration 059 088 0507

Average pursuersrsquo self-confidence development 0408 0533 0451

0

1

2

3

4

5

6

7

8

Purs

uers

rsquo self

-con

fiden

ce d

evelo

pmen

t

20 40 60 80 1000Pursuit development ()

Case-A-Case-B-Case-C-

Figure 9 Pursuersrsquo learning development during the pursuit

dynamism of the coalition formations and the optimality oftask sharing provided by our algorithm

Table 3 summarizes the main results achieved we deducethat the pursuit algorithm based on the Iterative Eliminationof Dominated Strategies (IEDS) is better than the algorithmbased on AGR organizational model as well as the auctionmechanism based on MPMEGBTBA regarding the rewardrsquosdevelopment as well as the capturing time The leading causeof this fact is the dynamism of our coalitional groups Thisflexible mechanism improves the intelligence of the pursuersconcerning the displacements and the rewards acquisitionknowing that team reward is optimal in the case where eachpursuer undertakes the best path

8 Conclusion

This paper presents a kind of a decentralized coalitionmethod based on GameTheory principles for different typesof pursuit the proposed method demonstrates the positiveimpact imposed by the dynamismof the coalition formationsFirstly we have extended our coalition algorithm from theIterated Elimination of Dominated Strategies This processallows us to determine the optimal pursuit coalition strategyaccording to the Game Theory principles Secondly wehave focused on the Markov Decision Process as a motion

10 Mathematical Problems in Engineering

strategy of our pursuers in the environment (grid of cells)To highlight our proposal we have developed a comparativestudy between our algorithm and a decentralized strategyof coalition based on AGR organizational model as well asan auction mechanism based on MPMEGBTBA Simulationresults shown in this paper demonstrate that the algorithmbased on IEDS is feasible and effective

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This paper is supported by National Natural Science Foun-dation of China (no 61375081) and a special fund project ofHarbin science and technology innovation talents research(no RC2013XK010002)

References

[1] A Ghazikhani H R Mashadi and R Monsefi ldquoA novelalgorithm for coalition formation in multi-agent systems usingcooperative game theoryrdquo in Proceedings of the 18th IranianConference on Electrical Engineering (ICEE rsquo10) pp 512ndash516Isfahan Iran May 2010

[2] L Boongasame ldquoPreference coalition formation algorithm forbuyer coalitionrdquo in Proceedings of the 9th International JointConference on Computer Science and Software Engineering(JCSSE rsquo12) pp 225ndash230 Bangkok Thailand May 2012

[3] J Ferber O Gutknecht and F Michel ldquoFrom agents to orga-nizations an organizational view of multi-agent systemsrdquo inAgent-Oriented Software Engineering IV 4th InternationalWork-shop AOSE 2003 Melbourne Australia July 15 2003 RevisedPapers P Giorgini J Muller and J Odell Eds vol 2935of Lecture Notes in Computer Science pp 214ndash230 SpringerBerlin Germany 2004

[4] J Y Kuo H-F Yu K F-R Liu and F-W Lee ldquoMultiagentcooperative learning strategies for pursuit-evasion gamesrdquoMathematical Problems in Engineering vol 2015 Article ID964871 13 pages 2015

[5] G I Ibragimov and M Salimi ldquoPursuit-evasion differentialgame with many inertial playersrdquo Mathematical Problems inEngineering vol 2009 Article ID 653723 15 pages 2009

[6] M Souidi S Piao G Li and L Chang ldquoCoalition formationalgorithm based on organization and Markov decision processfor multi-player pursuit evasionrdquo International Journal of Mul-tiagent and Grid Systems vol 11 no 1 pp 1ndash13 2015

[7] M E-H Souidi P Songhao L Guo and C Lin ldquoMulti-agentcooperation pursuit based on an extension of AALAADINorganisational modelrdquo Journal of Experimental amp TheoreticalArtificial Intelligence vol 28 no 6 pp 1075ndash1088 2016

[8] Z-S Cai L-N Sun H-B Gao P-C Zhou S-H Piao andQ-C Huang ldquoMulti-robot cooperative pursuit based on taskbundle auctionsrdquo in Intelligent Robotics and Applications CXiong Y Huang Y Xiong and H Liu Eds vol 5314 ofLecture Notes in Computer Science pp 235ndash244 SpringerBerlin Germany 2008

[9] B Goode A Kurdila and M Roan ldquoA graph theoreticalapproach toward a switched feedback controller for pursuit-evasion scenariosrdquo in Proceedings of the American Control

Conference (ACC rsquo11) pp 4804ndash4809 San Francisco Calif USAJune 2011

[10] V Isler S Kannan and S Khanna ldquoRandomized pursuitndashevasion in a polygonal environmentrdquo IEEE Transactions onRobotics vol 21 no 5 pp 875ndash884 2005

[11] J Thunberg P Ogren and X Hu ldquoA Boolean Control Networkapproach to pursuit evasion problems in polygonal environ-mentsrdquo in Proceedings of the IEEE International Conference onRobotics and Automation (ICRA rsquo11) pp 4506ndash4511 May 2011

[12] J Li Q Pan and B Hong ldquoA new approach of multi-robotcooperative pursuit based on association rule data miningrdquoInternational Journal of Advanced Robotic Systems vol 7 no 3pp 165ndash172 2010

[13] J Liu S Liu HWu andY Zhang ldquoA pursuit-evasion algorithmbased on hierarchical reinforcement learningrdquo in Proceedingsof the International Conference on Measuring Technology andMechatronics Automation (ICMTMA rsquo09) vol 2 pp 482ndash486IEEE Hunan China April 2009

[14] J P Hespanha M Prandini and S Sastry ldquoProbabilisticpursuit-evasion games a one-step Nash approachrdquo in Proceed-ings of the 39th IEEE Conference on Decision and Control vol 3pp 2272ndash2277 Sydney Australia December 2000

[15] J Dong X Zhang and X Jia ldquoStrategies of pursuit-evasiongame based on improved potential field and differential gametheory for mobile robotsrdquo in Proceedings of the 2nd Interna-tional Conference on Instrumentation Measurement ComputerCommunication and Control (IMCCC rsquo12) pp 1452ndash1456 IEEEHarbin China December 2012

[16] F Amigoni and N Basilico ldquoA game theoretical approach tofinding optimal strategies for pursuit evasion in grid environ-mentsrdquo in Proceedings of the IEEE International Conference onRobotics and Automation River Centre pp 2155ndash2162 SaintPaul Minn USA May 2012

[17] R Liu and Z-S Cai ldquoA novel approach based on Evolution-ary Game Theoretic model for multi-player pursuit evasionrdquoin Proceedings of the International Conference on ComputerMechatronics Control and Electronic Engineering (CMCE rsquo10)vol 1 pp 107ndash110 August 2010

[18] B Khosravifar F Bouchet R Feyzi-Behnagh R Azevedo and JM Harley ldquoUsing intelligent multi-agent systems to model andfoster self-regulated learning a theoretically-based approachusing Markov decision processrdquo in Proceedings of the 27th IEEEInternational Conference on Advanced Information Networkingand Applications (AINA rsquo13) pp 413ndash420 IEEE BarcelonaSpain March 2013

[19] L Ting Z Cheng and ZWeiming ldquoPlanning for target systemstriking based on Markov decision processrdquo in Proceedingsof the IEEE International Conference on Service Operationsand Logistics and Informatics (SOLI rsquo13) pp 154ndash159 IEEEDongguan China July 2013

[20] W Lin Z Qu and M A Simaan ldquoNash strategies for pursuit-evasion differential games involving limited observationsrdquo IEEETransactions on Aerospace and Electronic Systems vol 51 no 2pp 1347ndash1356 2015

[21] E Ehsan and F Kunwar ldquoProbabilistic search and pursuitevasion on a graphrdquo Transactions on Machine Learning andArtificial Intelligence vol 3 no 3 pp 57ndash65 2015

[22] S Jia X Wang and L Shen ldquoA continuous-time markovdecision process-based method with application in a pursuit-evasion examplerdquo IEEE Transactions on Systems Man andCybernetics Systems vol 46 no 9 pp 1215ndash1225 2016

Mathematical Problems in Engineering 11

[23] C Boutilier ldquoSequential optimality and coordination in mul-tiagent systemsrdquo in Proceedings of the 16th International JointConference on Artificial Intelligence (IJCAI rsquo99) vol 1 pp 478ndash485 Stockholm Sweden August 1999

[24] E A Hansen D S Bernstein and S Zilberstein ldquoDynamicprogramming for partially observable stochastic gamesrdquo in Pro-ceedings of the 19th National Conference on Artificial Intelligencepp 709ndash715 2004

[25] K Zhang E G Collins Jr and A Barbu ldquoAn efficient stochas-tic clustering auction for heterogeneous robotic collaborativeteamsrdquo Journal of Intelligent amp Robotic Systems vol 72 no 3-4 pp 541ndash558 2013

[26] K Zhang E G Collins Jr and D Shi ldquoCentralized anddistributed task allocation in multi-robot teams via a stochasticclustering auctionrdquo ACM Transactions on Autonomous andAdaptive Systems vol 7 no 2 article 21 2012

[27] M B Dias and T Sandholm TraderBots a new paradigmfor robust and efficient multirobot coordination in dynamicenvironments [PhD thesis] The Robotics Institute CarnegieMellon University Pittsburgh Pa USA 2004

[28] Y Wang Evolutionary Game Theory Based Cooperation Algo-rithm inMulti-Agent SystemMultiagent Systems InTech RijekaCroatia 2009


8 Mathematical Problems in Engineering

Figure 4: Example of an evader of the type Re = IV after the capture (the figure shows the grid-cell coordinate triples surrounding the captured evader).

Figure 5: Centralized and decentralized coalitions' computation in relation to the number of pursuers (y-axis: number of possible coalitions; x-axis: number of pursuers, 10 to 15).
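The divergence in Figure 5 can be illustrated with a rough combinatorial sketch. Assume, for illustration only (the paper's exact enumeration is not reproduced here), that a centralized planner must score every coalition of up to m pursuers drawn from the n available, while each decentralized pursuer enumerates only the coalitions that contain itself:

```python
from math import comb

def centralized_count(n, m):
    # all coalitions of size 1..m drawn from n pursuers
    return sum(comb(n, k) for k in range(1, m + 1))

def decentralized_count(n, m):
    # one pursuer only considers coalitions containing itself:
    # pick its k-1 partners among the other n-1 pursuers
    return sum(comb(n - 1, k - 1) for k in range(1, m + 1))

for n in range(10, 16):
    print(n, centralized_count(n, 4), decentralized_count(n, 4))
```

Under these illustrative assumptions, with m = 4 the centralized count grows from 385 candidates at n = 10 to 1940 at n = 15, while the per-pursuer decentralized count stays several times smaller; the shape of this divergence matches the trend plotted in Figure 5, even though the absolute numbers depend on the enumeration actually used.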

coalition formation algorithm revealed an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formations and reformations of the pursuit teams.

Finally, we have focused on the study of the average pursuers' rewards obtained per chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of the rewards achieved by a pursuer, and each unit of the x-axis represents a chase iteration. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA,

Figure 6: Average capturing time after 40 different pursuits (y-axis: average capturing time in iterations; x-axis: time in episodes, 1 to 40; Cases A, B, and C).

in which the average pursuer's rewards achieved reach 0.59 and 0.507, respectively. In contrast, with IEDS the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit, applied to the three cases. The positivity of the results is due to the grouping and the equitable task sharing between the different pursuit groups imposed by the different coordination mechanisms applied. Moreover, we can note the superiority of the results obtained through IEDS in relation to the other cases, provoked by the

Figure 7: The pursuers' rewards development (y-axis: pursuers' rewards development; x-axis: time in iterations, 1 to 78; Cases A, B, and C).

Figure 8: Average pursuers' reward per iteration (three panels, one per Case A, B, and C; y-axis: average pursuers' rewards obtained; x-axis: time in iterations, 0 to 50).

Table 3: Pursuit results.

                                                    AGR      IEDS    MPMEGBTBA
Average capturing time (iterations)               144.225     78      100.57
Average pursuers' rewards obtained per iteration    0.59      0.88     0.507
Average pursuers' self-confidence development       0.408     0.533    0.451

Figure 9: Pursuers' learning development during the pursuit (y-axis: pursuers' self-confidence development; x-axis: pursuit development in %; Cases A, B, and C).

dynamism of the coalition formations and the optimality of task sharing provided by our algorithm.

Table 3 summarizes the main results achieved. We deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms both the algorithm based on the AGR organizational model and the auction mechanism based on MPMEGBTBA, regarding the rewards' development as well as the capturing time. The leading cause of this fact is the dynamism of our coalitional groups. This flexible mechanism improves the intelligence of the pursuers concerning the displacements and the rewards acquisition, knowing that the team reward is optimal in the case where each pursuer undertakes the best path.
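For reference, the elimination process that gives IEDS its name can be sketched on a small bimatrix game. The payoffs below are a textbook illustration (a Prisoner's-Dilemma-style game), not the pursuit-coalition payoffs used in the paper; the point is only the iterative withdrawal of strictly dominated strategies:

```python
def ieds(row_payoff, col_payoff):
    """Iterated elimination of strictly dominated pure strategies
    for a two-player bimatrix game (illustrative sketch only)."""
    rows = list(range(len(row_payoff)))
    cols = list(range(len(row_payoff[0])))
    changed = True
    while changed:
        changed = False
        # drop a row strictly dominated by another row (row player's payoffs)
        for r in rows:
            if any(all(row_payoff[o][c] > row_payoff[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r); changed = True; break
        # drop a column strictly dominated by another column (column player's payoffs)
        for c in cols:
            if any(all(col_payoff[r][o] > col_payoff[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c); changed = True; break
    return rows, cols

# Prisoner's-Dilemma-style payoffs: strategy 0 ("cooperate") is strictly dominated
R = [[3, 0], [5, 1]]  # row player's payoffs
C = [[3, 5], [0, 1]]  # column player's payoffs
print(ieds(R, C))     # -> ([1], [1]): only the dominant strategies survive
```

When no strategy is strictly dominated (e.g., matching pennies), the loop terminates immediately and all strategies survive, so the procedure is a filter rather than a full solver.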

8. Conclusion

This paper presents a decentralized coalition method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we have built our coalition algorithm on the Iterated Elimination of Dominated Strategies. This process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we have focused on the Markov Decision Process as a motion


strategy of our pursuers in the environment (a grid of cells). To highlight our proposal, we have developed a comparative study between our algorithm, a decentralized coalition strategy based on the AGR organizational model, and an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.
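The MDP motion strategy over a grid of cells can be sketched with a standard value iteration. The parameters below (a 5×5 deterministic grid, a single capture reward at the evader's cell, discount factor 0.9) are illustrative assumptions, not the paper's actual settings:

```python
GAMMA = 0.9                                  # assumed discount factor
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # 4-connected grid moves

def value_iteration(width, height, goal, n_iters=100):
    """Back up each cell's value from its best neighbor; the evader's
    cell carries the capture reward (deterministic transitions assumed)."""
    V = {(x, y): 0.0 for x in range(width) for y in range(height)}
    for _ in range(n_iters):
        for (x, y) in V:
            if (x, y) == goal:
                V[(x, y)] = 1.0              # capture reward at the evader's cell
                continue
            neighbors = [(x + dx, y + dy) for dx, dy in MOVES
                         if (x + dx, y + dy) in V]
            V[(x, y)] = GAMMA * max(V[n] for n in neighbors)
    return V

def best_move(V, pos):
    """Greedy pursuer policy: step onto the highest-valued neighboring cell."""
    x, y = pos
    return max(((x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) in V),
               key=lambda cell: V[cell])

V = value_iteration(5, 5, goal=(4, 4))
print(best_move(V, (0, 0)))                  # steps toward the evader at (4, 4)
```

In this sketch, each pursuer follows `best_move` greedily; since the converged value of a cell decays with its grid distance to the evader, the value gradient guides the pursuer along a shortest grid path toward the capture cell.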

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375081) and a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).

References

[1] A. Ghazikhani, H. R. Mashadi, and R. Monsefi, "A novel algorithm for coalition formation in multi-agent systems using cooperative game theory," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 512–516, Isfahan, Iran, May 2010.

[2] L. Boongasame, "Preference coalition formation algorithm for buyer coalition," in Proceedings of the 9th International Joint Conference on Computer Science and Software Engineering (JCSSE '12), pp. 225–230, Bangkok, Thailand, May 2012.

[3] J. Ferber, O. Gutknecht, and F. Michel, "From agents to organizations: an organizational view of multi-agent systems," in Agent-Oriented Software Engineering IV: 4th International Workshop, AOSE 2003, Melbourne, Australia, July 15, 2003, Revised Papers, P. Giorgini, J. Muller, and J. Odell, Eds., vol. 2935 of Lecture Notes in Computer Science, pp. 214–230, Springer, Berlin, Germany, 2004.

[4] J. Y. Kuo, H.-F. Yu, K. F.-R. Liu, and F.-W. Lee, "Multiagent cooperative learning strategies for pursuit-evasion games," Mathematical Problems in Engineering, vol. 2015, Article ID 964871, 13 pages, 2015.

[5] G. I. Ibragimov and M. Salimi, "Pursuit-evasion differential game with many inertial players," Mathematical Problems in Engineering, vol. 2009, Article ID 653723, 15 pages, 2009.

[6] M. Souidi, S. Piao, G. Li, and L. Chang, "Coalition formation algorithm based on organization and Markov decision process for multi-player pursuit evasion," International Journal of Multiagent and Grid Systems, vol. 11, no. 1, pp. 1–13, 2015.

[7] M. E.-H. Souidi, P. Songhao, L. Guo, and C. Lin, "Multi-agent cooperation pursuit based on an extension of AALAADIN organisational model," Journal of Experimental & Theoretical Artificial Intelligence, vol. 28, no. 6, pp. 1075–1088, 2016.

[8] Z.-S. Cai, L.-N. Sun, H.-B. Gao, P.-C. Zhou, S.-H. Piao, and Q.-C. Huang, "Multi-robot cooperative pursuit based on task bundle auctions," in Intelligent Robotics and Applications, C. Xiong, Y. Huang, Y. Xiong, and H. Liu, Eds., vol. 5314 of Lecture Notes in Computer Science, pp. 235–244, Springer, Berlin, Germany, 2008.

[9] B. Goode, A. Kurdila, and M. Roan, "A graph theoretical approach toward a switched feedback controller for pursuit-evasion scenarios," in Proceedings of the American Control Conference (ACC '11), pp. 4804–4809, San Francisco, Calif, USA, June 2011.

[10] V. Isler, S. Kannan, and S. Khanna, "Randomized pursuit-evasion in a polygonal environment," IEEE Transactions on Robotics, vol. 21, no. 5, pp. 875–884, 2005.

[11] J. Thunberg, P. Ogren, and X. Hu, "A Boolean Control Network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506–4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165–172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482–486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272–2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452–1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155–2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on Evolutionary Game Theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107–110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413–420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154–159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347–1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57–65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215–1225, 2016.

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478–485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709–715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541–558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: a new paradigm for robust and efficient multirobot coordination in dynamic environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, Multiagent Systems, InTech, Rijeka, Croatia, 2009.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article A New Decentralized Approach of Multiagent Cooperative …downloads.hindawi.com/journals/mpe/2016/5192423.pdf · 2019-07-30 · is is an open access article distributed

Mathematical Problems in Engineering 9

30

40

50

60

70

80

90

100

110

120

Purs

uers

rsquo rew

ards

dev

elopm

ent

10 20 30 40 50 60 70 781Time (iterations)

Case-A-Case-B-Case-C-

Figure 7 The pursuersrsquo rewards development

Case-C-

obta

ined

Aver

age p

ursu

ersrsquo

rew

ards

Time (iterations)

34

17

00

minus17

0 10 20 30 40 50

Case-B-

obta

ined

Aver

age p

ursu

ersrsquo

rew

ards

Time (iterations)

34

17

00

minus17

0 10 20 30 40 50

Case-A-

obta

ined

Aver

age p

ursu

ersrsquo

rew

ards

Time (iterations)

34

17

00

minus17

0 10 20 30 40 50

Figure 8 Average pursuersrsquo reward per iteration

Table 3 Pursuit result

AGR IEDS MPMEGBTBAAverage capturing time(iteration) 144225 78 10057

Average pursuersrsquo rewardsobtained by iteration 059 088 0507

Average pursuersrsquo self-confidence development 0408 0533 0451

0

1

2

3

4

5

6

7

8

Purs

uers

rsquo self

-con

fiden

ce d

evelo

pmen

t

20 40 60 80 1000Pursuit development ()

Case-A-Case-B-Case-C-

Figure 9 Pursuersrsquo learning development during the pursuit

dynamism of the coalition formations and the optimality oftask sharing provided by our algorithm

Table 3 summarizes the main results achieved we deducethat the pursuit algorithm based on the Iterative Eliminationof Dominated Strategies (IEDS) is better than the algorithmbased on AGR organizational model as well as the auctionmechanism based on MPMEGBTBA regarding the rewardrsquosdevelopment as well as the capturing time The leading causeof this fact is the dynamism of our coalitional groups Thisflexible mechanism improves the intelligence of the pursuersconcerning the displacements and the rewards acquisitionknowing that team reward is optimal in the case where eachpursuer undertakes the best path

8 Conclusion

This paper presents a kind of a decentralized coalitionmethod based on GameTheory principles for different typesof pursuit the proposed method demonstrates the positiveimpact imposed by the dynamismof the coalition formationsFirstly we have extended our coalition algorithm from theIterated Elimination of Dominated Strategies This processallows us to determine the optimal pursuit coalition strategyaccording to the Game Theory principles Secondly wehave focused on the Markov Decision Process as a motion

10 Mathematical Problems in Engineering

strategy of our pursuers in the environment (grid of cells)To highlight our proposal we have developed a comparativestudy between our algorithm and a decentralized strategyof coalition based on AGR organizational model as well asan auction mechanism based on MPMEGBTBA Simulationresults shown in this paper demonstrate that the algorithmbased on IEDS is feasible and effective

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This paper is supported by National Natural Science Foun-dation of China (no 61375081) and a special fund project ofHarbin science and technology innovation talents research(no RC2013XK010002)

References

[1] A Ghazikhani H R Mashadi and R Monsefi ldquoA novelalgorithm for coalition formation in multi-agent systems usingcooperative game theoryrdquo in Proceedings of the 18th IranianConference on Electrical Engineering (ICEE rsquo10) pp 512ndash516Isfahan Iran May 2010

[2] L Boongasame ldquoPreference coalition formation algorithm forbuyer coalitionrdquo in Proceedings of the 9th International JointConference on Computer Science and Software Engineering(JCSSE rsquo12) pp 225ndash230 Bangkok Thailand May 2012

[3] J Ferber O Gutknecht and F Michel ldquoFrom agents to orga-nizations an organizational view of multi-agent systemsrdquo inAgent-Oriented Software Engineering IV 4th InternationalWork-shop AOSE 2003 Melbourne Australia July 15 2003 RevisedPapers P Giorgini J Muller and J Odell Eds vol 2935of Lecture Notes in Computer Science pp 214ndash230 SpringerBerlin Germany 2004

[4] J Y Kuo H-F Yu K F-R Liu and F-W Lee ldquoMultiagentcooperative learning strategies for pursuit-evasion gamesrdquoMathematical Problems in Engineering vol 2015 Article ID964871 13 pages 2015

[5] G I Ibragimov and M Salimi ldquoPursuit-evasion differentialgame with many inertial playersrdquo Mathematical Problems inEngineering vol 2009 Article ID 653723 15 pages 2009

[6] M Souidi S Piao G Li and L Chang ldquoCoalition formationalgorithm based on organization and Markov decision processfor multi-player pursuit evasionrdquo International Journal of Mul-tiagent and Grid Systems vol 11 no 1 pp 1ndash13 2015

[7] M E-H Souidi P Songhao L Guo and C Lin ldquoMulti-agentcooperation pursuit based on an extension of AALAADINorganisational modelrdquo Journal of Experimental amp TheoreticalArtificial Intelligence vol 28 no 6 pp 1075ndash1088 2016

[8] Z-S Cai L-N Sun H-B Gao P-C Zhou S-H Piao andQ-C Huang ldquoMulti-robot cooperative pursuit based on taskbundle auctionsrdquo in Intelligent Robotics and Applications CXiong Y Huang Y Xiong and H Liu Eds vol 5314 ofLecture Notes in Computer Science pp 235ndash244 SpringerBerlin Germany 2008

[9] B Goode A Kurdila and M Roan ldquoA graph theoreticalapproach toward a switched feedback controller for pursuit-evasion scenariosrdquo in Proceedings of the American Control

Conference (ACC rsquo11) pp 4804ndash4809 San Francisco Calif USAJune 2011

[10] V Isler S Kannan and S Khanna ldquoRandomized pursuitndashevasion in a polygonal environmentrdquo IEEE Transactions onRobotics vol 21 no 5 pp 875ndash884 2005

[11] J Thunberg P Ogren and X Hu ldquoA Boolean Control Networkapproach to pursuit evasion problems in polygonal environ-mentsrdquo in Proceedings of the IEEE International Conference onRobotics and Automation (ICRA rsquo11) pp 4506ndash4511 May 2011

[12] J Li Q Pan and B Hong ldquoA new approach of multi-robotcooperative pursuit based on association rule data miningrdquoInternational Journal of Advanced Robotic Systems vol 7 no 3pp 165ndash172 2010

[13] J Liu S Liu HWu andY Zhang ldquoA pursuit-evasion algorithmbased on hierarchical reinforcement learningrdquo in Proceedingsof the International Conference on Measuring Technology andMechatronics Automation (ICMTMA rsquo09) vol 2 pp 482ndash486IEEE Hunan China April 2009

[14] J P Hespanha M Prandini and S Sastry ldquoProbabilisticpursuit-evasion games a one-step Nash approachrdquo in Proceed-ings of the 39th IEEE Conference on Decision and Control vol 3pp 2272ndash2277 Sydney Australia December 2000

[15] J Dong X Zhang and X Jia ldquoStrategies of pursuit-evasiongame based on improved potential field and differential gametheory for mobile robotsrdquo in Proceedings of the 2nd Interna-tional Conference on Instrumentation Measurement ComputerCommunication and Control (IMCCC rsquo12) pp 1452ndash1456 IEEEHarbin China December 2012

[16] F Amigoni and N Basilico ldquoA game theoretical approach tofinding optimal strategies for pursuit evasion in grid environ-mentsrdquo in Proceedings of the IEEE International Conference onRobotics and Automation River Centre pp 2155ndash2162 SaintPaul Minn USA May 2012

[17] R Liu and Z-S Cai ldquoA novel approach based on Evolution-ary Game Theoretic model for multi-player pursuit evasionrdquoin Proceedings of the International Conference on ComputerMechatronics Control and Electronic Engineering (CMCE rsquo10)vol 1 pp 107ndash110 August 2010

[18] B Khosravifar F Bouchet R Feyzi-Behnagh R Azevedo and JM Harley ldquoUsing intelligent multi-agent systems to model andfoster self-regulated learning a theoretically-based approachusing Markov decision processrdquo in Proceedings of the 27th IEEEInternational Conference on Advanced Information Networkingand Applications (AINA rsquo13) pp 413ndash420 IEEE BarcelonaSpain March 2013

[19] L Ting Z Cheng and ZWeiming ldquoPlanning for target systemstriking based on Markov decision processrdquo in Proceedingsof the IEEE International Conference on Service Operationsand Logistics and Informatics (SOLI rsquo13) pp 154ndash159 IEEEDongguan China July 2013

[20] W Lin Z Qu and M A Simaan ldquoNash strategies for pursuit-evasion differential games involving limited observationsrdquo IEEETransactions on Aerospace and Electronic Systems vol 51 no 2pp 1347ndash1356 2015

[21] E Ehsan and F Kunwar ldquoProbabilistic search and pursuitevasion on a graphrdquo Transactions on Machine Learning andArtificial Intelligence vol 3 no 3 pp 57ndash65 2015

[22] S Jia X Wang and L Shen ldquoA continuous-time markovdecision process-based method with application in a pursuit-evasion examplerdquo IEEE Transactions on Systems Man andCybernetics Systems vol 46 no 9 pp 1215ndash1225 2016

Mathematical Problems in Engineering 11

[23] C Boutilier ldquoSequential optimality and coordination in mul-tiagent systemsrdquo in Proceedings of the 16th International JointConference on Artificial Intelligence (IJCAI rsquo99) vol 1 pp 478ndash485 Stockholm Sweden August 1999

[24] E A Hansen D S Bernstein and S Zilberstein ldquoDynamicprogramming for partially observable stochastic gamesrdquo in Pro-ceedings of the 19th National Conference on Artificial Intelligencepp 709ndash715 2004

[25] K Zhang E G Collins Jr and A Barbu ldquoAn efficient stochas-tic clustering auction for heterogeneous robotic collaborativeteamsrdquo Journal of Intelligent amp Robotic Systems vol 72 no 3-4 pp 541ndash558 2013

[26] K Zhang E G Collins Jr and D Shi ldquoCentralized anddistributed task allocation in multi-robot teams via a stochasticclustering auctionrdquo ACM Transactions on Autonomous andAdaptive Systems vol 7 no 2 article 21 2012

[27] M B Dias and T Sandholm TraderBots a new paradigmfor robust and efficient multirobot coordination in dynamicenvironments [PhD thesis] The Robotics Institute CarnegieMellon University Pittsburgh Pa USA 2004

[28] Y Wang Evolutionary Game Theory Based Cooperation Algo-rithm inMulti-Agent SystemMultiagent Systems InTech RijekaCroatia 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article A New Decentralized Approach of Multiagent Cooperative …downloads.hindawi.com/journals/mpe/2016/5192423.pdf · 2019-07-30 · is is an open access article distributed

10 Mathematical Problems in Engineering

strategy of our pursuers in the environment (grid of cells)To highlight our proposal we have developed a comparativestudy between our algorithm and a decentralized strategyof coalition based on AGR organizational model as well asan auction mechanism based on MPMEGBTBA Simulationresults shown in this paper demonstrate that the algorithmbased on IEDS is feasible and effective

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This paper is supported by National Natural Science Foun-dation of China (no 61375081) and a special fund project ofHarbin science and technology innovation talents research(no RC2013XK010002)

References

[1] A Ghazikhani H R Mashadi and R Monsefi ldquoA novelalgorithm for coalition formation in multi-agent systems usingcooperative game theoryrdquo in Proceedings of the 18th IranianConference on Electrical Engineering (ICEE rsquo10) pp 512ndash516Isfahan Iran May 2010

[2] L Boongasame ldquoPreference coalition formation algorithm forbuyer coalitionrdquo in Proceedings of the 9th International JointConference on Computer Science and Software Engineering(JCSSE rsquo12) pp 225ndash230 Bangkok Thailand May 2012

[3] J Ferber O Gutknecht and F Michel ldquoFrom agents to orga-nizations an organizational view of multi-agent systemsrdquo inAgent-Oriented Software Engineering IV 4th InternationalWork-shop AOSE 2003 Melbourne Australia July 15 2003 RevisedPapers P Giorgini J Muller and J Odell Eds vol 2935of Lecture Notes in Computer Science pp 214ndash230 SpringerBerlin Germany 2004

[4] J Y Kuo H-F Yu K F-R Liu and F-W Lee ldquoMultiagentcooperative learning strategies for pursuit-evasion gamesrdquoMathematical Problems in Engineering vol 2015 Article ID964871 13 pages 2015

[5] G I Ibragimov and M Salimi ldquoPursuit-evasion differentialgame with many inertial playersrdquo Mathematical Problems inEngineering vol 2009 Article ID 653723 15 pages 2009

[6] M Souidi S Piao G Li and L Chang ldquoCoalition formationalgorithm based on organization and Markov decision processfor multi-player pursuit evasionrdquo International Journal of Mul-tiagent and Grid Systems vol 11 no 1 pp 1ndash13 2015

[7] M E-H Souidi P Songhao L Guo and C Lin ldquoMulti-agentcooperation pursuit based on an extension of AALAADINorganisational modelrdquo Journal of Experimental amp TheoreticalArtificial Intelligence vol 28 no 6 pp 1075ndash1088 2016

[8] Z-S Cai L-N Sun H-B Gao P-C Zhou S-H Piao andQ-C Huang ldquoMulti-robot cooperative pursuit based on taskbundle auctionsrdquo in Intelligent Robotics and Applications CXiong Y Huang Y Xiong and H Liu Eds vol 5314 ofLecture Notes in Computer Science pp 235ndash244 SpringerBerlin Germany 2008

[9] B Goode A Kurdila and M Roan ldquoA graph theoreticalapproach toward a switched feedback controller for pursuit-evasion scenariosrdquo in Proceedings of the American Control

Conference (ACC rsquo11) pp 4804ndash4809 San Francisco Calif USAJune 2011

[10] V Isler S Kannan and S Khanna ldquoRandomized pursuitndashevasion in a polygonal environmentrdquo IEEE Transactions onRobotics vol 21 no 5 pp 875ndash884 2005

[11] J. Thunberg, P. Ögren, and X. Hu, "A Boolean control network approach to pursuit evasion problems in polygonal environments," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 4506–4511, May 2011.

[12] J. Li, Q. Pan, and B. Hong, "A new approach of multi-robot cooperative pursuit based on association rule data mining," International Journal of Advanced Robotic Systems, vol. 7, no. 3, pp. 165–172, 2010.

[13] J. Liu, S. Liu, H. Wu, and Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," in Proceedings of the International Conference on Measuring Technology and Mechatronics Automation (ICMTMA '09), vol. 2, pp. 482–486, IEEE, Hunan, China, April 2009.

[14] J. P. Hespanha, M. Prandini, and S. Sastry, "Probabilistic pursuit-evasion games: a one-step Nash approach," in Proceedings of the 39th IEEE Conference on Decision and Control, vol. 3, pp. 2272–2277, Sydney, Australia, December 2000.

[15] J. Dong, X. Zhang, and X. Jia, "Strategies of pursuit-evasion game based on improved potential field and differential game theory for mobile robots," in Proceedings of the 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC '12), pp. 1452–1456, IEEE, Harbin, China, December 2012.

[16] F. Amigoni and N. Basilico, "A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2155–2162, Saint Paul, Minn, USA, May 2012.

[17] R. Liu and Z.-S. Cai, "A novel approach based on evolutionary game theoretic model for multi-player pursuit evasion," in Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE '10), vol. 1, pp. 107–110, August 2010.

[18] B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, and J. M. Harley, "Using intelligent multi-agent systems to model and foster self-regulated learning: a theoretically-based approach using Markov decision process," in Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA '13), pp. 413–420, IEEE, Barcelona, Spain, March 2013.

[19] L. Ting, Z. Cheng, and Z. Weiming, "Planning for target system striking based on Markov decision process," in Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI '13), pp. 154–159, IEEE, Dongguan, China, July 2013.

[20] W. Lin, Z. Qu, and M. A. Simaan, "Nash strategies for pursuit-evasion differential games involving limited observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 2, pp. 1347–1356, 2015.

[21] E. Ehsan and F. Kunwar, "Probabilistic search and pursuit evasion on a graph," Transactions on Machine Learning and Artificial Intelligence, vol. 3, no. 3, pp. 57–65, 2015.

[22] S. Jia, X. Wang, and L. Shen, "A continuous-time Markov decision process-based method with application in a pursuit-evasion example," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 9, pp. 1215–1225, 2016.

Mathematical Problems in Engineering 11

[23] C. Boutilier, "Sequential optimality and coordination in multiagent systems," in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1, pp. 478–485, Stockholm, Sweden, August 1999.

[24] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, "Dynamic programming for partially observable stochastic games," in Proceedings of the 19th National Conference on Artificial Intelligence, pp. 709–715, 2004.

[25] K. Zhang, E. G. Collins Jr., and A. Barbu, "An efficient stochastic clustering auction for heterogeneous robotic collaborative teams," Journal of Intelligent & Robotic Systems, vol. 72, no. 3-4, pp. 541–558, 2013.

[26] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction," ACM Transactions on Autonomous and Adaptive Systems, vol. 7, no. 2, article 21, 2012.

[27] M. B. Dias and T. Sandholm, TraderBots: A New Paradigm for Robust and Efficient Multirobot Coordination in Dynamic Environments [Ph.D. thesis], The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2004.

[28] Y. Wang, Evolutionary Game Theory Based Cooperation Algorithm in Multi-Agent System, Multiagent Systems, InTech, Rijeka, Croatia, 2009.
