
Research Article
A Decentralized Partially Observable Decision Model for Recognizing the Multiagent Goal in Simulation Systems

Shiguang Yue,1,2 Kristina Yordanova,2 Frank Krüger,2 Thomas Kirste,2 and Yabing Zha1

1College of Information System and Management, National University of Defense Technology, Changsha 410073, China
2Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

Correspondence should be addressed to Shiguang Yue; yshg052@hotmail.com

Received 29 February 2016; Revised 17 July 2016; Accepted 24 July 2016

Academic Editor: Lu Zhen

Copyright © 2016 Shiguang Yue et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Multiagent goal recognition is important in many simulation systems. Many of the existing modeling methods need detailed domain knowledge of agents' cooperative behaviors and a training dataset to estimate policies. To solve these problems, we propose a novel decentralized partially observable decision model (Dec-POMDM), which models cooperative behaviors by joint policies. In this compact way, we only focus on the distribution of joint policies. Additionally, a model-free algorithm, cooperative colearning based on Sarsa, is exploited to estimate agents' policies under the assumption of rationality, which makes the training dataset unnecessary. In the inference, considering that the Dec-POMDM is discrete and its state space is large, we implement a marginal filter (MF) under the framework of the Dec-POMDM, where the initial world states and results of actions are uncertain. In the experiments, a new scenario is designed based on the standard predator-prey problem: we increase the number of preys, and our aim is to recognize the real target of the predators. Experiment results show that (a) our method recognizes goals well even when they change dynamically; (b) the Dec-POMDM outperforms supervised trained HMMs in terms of precision, recall, and F-measure; and (c) the MF infers goals more efficiently than the particle filter under the framework of the Dec-POMDM.

1. Introduction

With the fast development of computational software and artificial intelligence techniques, agent-based simulation systems become more and more popular for staff training, policy analysis and evaluation, and even entertainment. In developing these systems, people always need to create human-like agents who can make decisions and interact with other agents or humans autonomously. For example, in the famous real-time strategy game StarCraft, the AI players have to construct buildings, collect resources, produce units, and defeat their enemies [1]. Unfortunately, even though many decision and planning algorithms have been applied to improve the intelligence of these agents, they are still easily defeated, especially when human players play the same scenario against them several times. One important reason is that these agents are unable to recognize the goals of their opponents or friends; on the other hand, that is exactly what human players usually do in the game [2]. Obviously, if agents know the goals of others, they can make counter decisions more efficiently.

Because goal recognition is significant for creating human-like agents and for decision support, many related models and algorithms have been proposed and applied in different fields, such as hidden Markov models (HMMs) [3], partially observable Markov decision processes (POMDPs) [4], Markov logic networks (MLNs) [5], and particle filtering (PF) [6, 7]. However, most of the existing research focuses on single-agent scenarios. In some scenarios, missions are so complex that a number of agents have to constitute a group and achieve their joint goal through cooperation, and our aim is to identify the joint goal of the group rather than that of one member. In most cases, it does not work to directly apply single-agent methods to multiagent goal recognition, because we have to consider the relations or interactions between agents, and the state space is usually very large.



There are three fundamental components in the framework of multiagent goal recognition: (a) modeling the agents' behaviors, the environment, and the observations for the recognizer; (b) estimating the parameters of the model, obtained through learning or other methods; and (c) inferring the goals from the observations. In the past, work has been done on all these aspects. However, we still face several difficulties in recognizing multiagent goals in simulation systems.

(a) For modeling behaviors, we usually have little knowledge about the details of the agents' cooperation, such as the decomposition of the complex task, the allocation of the subtasks, the communication, and other details. Even if this information is available, it is hard to represent all of it formally in a model in practice.

(b) For learning parameters, sometimes a training dataset for supervised or unsupervised learning cannot be provided. Even if we have a training set, unsupervised learning is still infeasible because the state space in multiagent scenarios is always very large. Additionally, supervised learning may suffer from the overfitting problem, which will be shown in our experiments.

(c) For inferring goals, traditional exact filters, such as an HMM filter, are infeasible because the state space is large. The widely applied PF is available for computing the posterior distribution of goals, but it may fail when there are not sufficient particles, and increasing the number of particles consumes much more computing time.

To solve the problems above, we present a solution for recognizing multiagent goals in simulation systems. The core of our method is a novel decentralized partially observable Markov decision model (Dec-POMDM). After modeling the agents' behaviors, the environment, and the observations for the recognizer by the Dec-POMDM, we use an existing multiagent reinforcement learning (MARL) algorithm to estimate the behavior parameters and a marginal filter (MF) to infer the joint goal of agents. Our method has several advantages considering the above problems.

(a) For the modeling problem, the Dec-POMDM represents the agents' behaviors in a compact way. The Dec-POMDM can be regarded as an extension of the well-known decentralized partially observable Markov decision process (Dec-POMDP) [8]. As in the Dec-POMDP, all details of cooperation are hidden in joint policies in the Dec-POMDM. In this implicit way of behavior modeling, we only need to concern ourselves with the selection of primitive actions given goals and situations. Further knowledge on interactions between agents is unnecessary. Another advantage of the Dec-POMDM is that it can make use of the large number of existing algorithms for the Dec-POMDP, which will be explained later.

(b) For the problem of estimating the agents' joint policies, the MARL algorithm does not need a training dataset. We borrow the definition of goals from the domain of planning under uncertainty and associate each goal with a reward function. Then we assume that agents achieve their joint goal by executing optimal policies, which can bring them the maximum cumulative reward. The optimal policies define cooperative behaviors well and can be computed accurately or approximated by any algorithm for the Dec-POMDP. In this way, the training dataset is unnecessary. In this paper, cooperative colearning based on the Sarsa algorithm is exploited because it does not need information about the model, which may be difficult to get in complex scenarios [9]. Actually, heuristic algorithms such as memory-bounded dynamic programming (MBDP) and joint equilibrium-based search for policies (JESP) may also work if we have enough information about the environment model [10, 11].

(c) For the inference, the MF outperforms the PF when the state space is discrete and large, which has been proved in [12, 13]. Additionally, we will also show that the MF solves the inference failure problem of the PF in our research. Another contribution is that we implement the MF under the framework of the Dec-POMDM, in which the filtering process is different from the work in [12, 13].

To validate the Dec-POMDM together with the MARL and MF in multiagent goal recognition, we modify the classic predator-prey problem and design a new scenario [9]. In this scenario, there is more than one prey, and predators have to choose one prey and capture it by moving on a grid map. Additionally, predators may change their goal halfway. Our method is applied to recognize the real goal of the predators based on the observed noisy traces. We also use the simulation model to generate different training datasets, which consist of different numbers of labeled traces. With these datasets, we compare the performances of our method and of the widely applied hidden Markov model (HMM) whose transition matrix is obtained through supervised learning (the MF is applied to do inference for both models). In the inference part, results computed by the MF are compared to those of the PF with different numbers of particles. All performances are evaluated in terms of precision, recall, and F-measure.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 gives the formal definition of the Dec-POMDM, the model's DBN representation, the cooperative colearning algorithm, and the baseline used for comparison. Section 4 introduces how to use the MF to infer the joint goal. Section 5 presents the scenarios, settings, and results of our experiments. Subsequently, we draw conclusions and discuss future works in Section 6.

2. Related Work

In many simulation systems, such as training systems and commercial digital games, the effects of actions are uncertain, which is usually caused by two reasons: (a) agents execute erroneous actions with a given probability; (b) environment states are impacted by events which are not under the control of the agents. Because of this uncertainty in simulation systems, a goal cannot be achieved by a fixed sequence of primitive actions as in classical planning. Instead, we have to find a set of policies which define the distribution of selecting actions within given situations. Since policies essentially reveal the goal of planning, goals can be inferred as discrete parameters or hidden states after knowing their corresponding policies, and this process is usually implemented under the Markov decision framework.

2.1. Recognizing Goals of a Single Agent. People have proposed many methods for recognizing the goals of a single agent; some of them are foundations of methods for multiagent goal recognition. Baker et al. proposed a computational framework based on Bayesian inverse planning for recognizing mental states such as goals [14]. They assumed that the agent is rational: actions are selected based on an optimal or approximately optimal value function given the beliefs about the world, and the posterior distribution of goals is computed by Bayesian inference. The core of Baker's method is that policies are computed by the planner based on the standard Markov decision process (MDP), which does not model the observing process of the agent. Thus, Ramírez and Geffner extended Baker's work by applying the goal-POMDP in formalizing the problem [4]. Compared to the MDP, the POMDP models the relation between the real world state and the observation of the agent explicitly; compared to the POMDP, the goal-POMDP defines the set of goal states. Besides, Ramírez and Geffner also solved the inference problem even when observations are incomplete. The works in [4, 14] are very promising, but both of them suffer two limitations: (a) the input for goal recognition is an action sequence; however, sometimes we only have observations of environment states from real or virtual sensors, and translating observations of states to actions is not easy; (b) the goal is estimated as a static parameter; however, it may be interrupted and changed within one episode.

Recently, the computational state space model (CSSM) became more and more popular for human behavior modeling [15, 16]. In the CSSM, transition models of the underlying dynamic system can be described by any computable function using compact algorithmic representations. Krüger et al. also discussed the performance of applying the CSSM to intention recognition in different real scenarios [16, 17]. In this research, (a) intentions, as well as actions and environment states, are modeled as hidden states, which can be inferred by an online filtering algorithm; (b) observations reflect not only primitive actions but also environment states. The limitation of the research in [17] is that goal inference is not implemented in scenarios where the results of actions are uncertain. Another related work on human behavior modeling under the MDP framework was done by Tastan et al. [18]. By making use of inverse reinforcement learning (IRL) and the PF, they learned the opponent's motion model and tracked it in the game Unreal 2004. This work made the following contributions: (a) the features for decision in the pursuit problem were abstracted; (b) IRL was used to learn the reward function of the opponent; (c) the solved decision model was regarded as the motion function in the PF. However, IRL relies on a large dataset, and Tastan's method is proposed for tracking but not for goal recognition.

2.2. Multiagent Goal Recognition Based on Inverse Planning. The inverse planning theory can also be used in the multiagent domain. Baker et al. inferred relational goals between agents (such as chasing and fleeing) by using the multiagent MDP framework to model interactions between agents and the environment [19]. In this model, each agent selected actions based on the world state, its goal, and its beliefs about other agents' goals. Mental state inference is done by inverse planning under the assumption that all agents are approximately rational. Ullman et al. also successfully applied this theory to more complex social goals, such as helping and hindering, where an agent's goals depend on the goals of other agents [20]. In the military domain, Riordan et al. borrowed Baker's idea and applied Bayesian inverse planning to infer intents in multi-Unmanned Aerial Systems (UASs) [21]. Additionally, IRL was also used to learn the reward function. Even though Baker's theory is quite promising, it can only work when every agent has accurate knowledge of the world state, because the multiagent MDP does not model the observing process. Besides, Bayesian inverse planning does not allow the goal to change. Another related work under the Markov decision framework in multiagent settings was done by Doshi et al. [22]. Although their main aim is to learn the agents' behavior models without recognizing goals, the process of estimating mental states is very similar to Bayesian approaches for probabilistic plan recognition. In Doshi's work, the interactive partially observable Markov decision process (I-POMDP) was used to model interactions between agents. The I-POMDP is an extension of the POMDP for multiagent settings. Compared to the POMDP, the I-POMDP defines an interactive state space which combines the traditional physical state space with explicit models of the other agents sharing the environment, in order to predict their behavior. Thus, the I-POMDP is applicable in situations where agents may have identical or conflicting objectives. However, the I-POMDP has to deal with the problem "what do you think that I think that you think," which makes finding optimal or approximately optimal policies very hard [23]. Actually, in many multiagent scenarios, such as football games or first-person shooting games, the agents being recognized share a common goal. This makes the Dec-POMDP framework sufficient for modeling cooperative behaviors. Additionally, the increasing number of works on planning theory based on the Dec-POMDP can provide us with a large number of planners [24, 25].

2.3. Multiagent Goal Recognition Based on DBN Filtering. If all actions and world states (of the agent and the environment) are defined as variables with time labels, the MDP can be regarded as a special case of directed probabilistic graphical models (PGMs). With this idea, some researchers ignore the reward function, concern themselves only with the policies, and infer goals under the dynamic Bayesian framework. For example, Saria and Mahadevan presented a theoretical framework for online probabilistic plan recognition in cooperative multiagent systems. This model extends the Abstract Hidden Markov Model and consists of a Hierarchical Multiagent Markov Process that allows reasoning about the interaction among multiple cooperating agents. The Rao-Blackwellized particle filter (RBPF) is also used for the inference [26, 27]. Pfeffer et al. [28] studied the problem of monitoring goals, team structure, and state in a dynamic environment: an urban warfare field where uncoordinated or loosely coordinated units attempt to attack a target. They used the standard DBN to model cooperation behaviors (communication, group constitution) and the world states. An extension of the PF, named factored particle filtering, is also exploited in the inference. We also proposed a Logical Hierarchical Hidden Semi-Markov Model (LHHSMM) to recognize goals as well as cooperation modes of a team in a complex environment, where the observation was noisy and partially missing and the goal was changeable. The LHHSMM is a branch of the Statistical Relational Learning (SRL) methods, which combine the PGM and first-order logic; it also represents the team behaviors in a hierarchical way. The inference for the LHHSMM was done by a logical particle filter. These works based on directed PGM theory have the advantage that they can use filtering algorithms. However, they suffer some problems: (a) constructing the graph model needs a lot of domain knowledge, and we have to vary the structure in different applications; (b) the graph structure will be very complex when the number of agents is large, which makes parameter learning and goal inference time consuming, sometimes even infeasible; (c) they need a training dataset. Other models based on data-driven training, such as Markov logic networks and deep learning, have the same problems listed above [29, 30].

3. The Model

We propose the Dec-POMDM for formalizing the world states, behaviors, and goals in the problem. In this section, we first introduce the formal definition of the Dec-POMDM and explain the relations among variables in this model by a DBN representation. Then the planning algorithm for finding the policies is given.

3.1. Formalization. One of the foundations of our Dec-POMDM is the widely applied Dec-POMDP. However, the Dec-POMDP is proposed for solving decision problems, and there is no definition of the goal or of the observation model for the recognizer in the Dec-POMDP. Additionally, the multiagent joint goal may be terminated because of achievement or interruption. Thus, we design the Dec-POMDM as a combination of three parts: (a) the standard Dec-POMDP; (b) the joint goal and goal termination variable; (c) the observation model for the recognizer. The Dec-POMDP is the foundation of the Dec-POMDM.

A Dec-POMDM is a tuple $\langle D, S, A, T, R, \Omega, O, \lambda, h, I, G, E, \Upsilon, Y, Z, C, B \rangle$, where

(i) $D = \{1, 2, \ldots, n\}$ is the set of $n$ agents;
(ii) $S$ is a finite set of world states $s$, which contains all necessary information for making a decision;
(iii) $A$ is the finite set of joint actions;
(iv) $T$ is the state transition function;
(v) $R$ is the reward function;
(vi) $\Omega$ is the finite set of joint observations for agents making a decision;
(vii) $O$ is the observation probability function for agents making a decision;
(viii) $\lambda$ is the discount factor;
(ix) $h$ is the horizon of the problem;
(x) $I$ is the initial state distribution at stage $t = 0$;
(xi) $G$ is the set of possible joint goals;
(xii) $E$ is the set of goal termination variables;
(xiii) $\Upsilon$ is the observation function for the recognizer;
(xiv) $Y$ is the finite set of joint observations for the recognizer;
(xv) $Z$ is the goal selection function;
(xvi) $C$ is the goal termination function;
(xvii) $B$ is the initial goal distribution at stage $t = 0$.

Symbols including $D$, $S$, $T$, $\Omega$, $O$, $\lambda$, $h$, and $I$ in the Dec-POMDM have the same meanings as those in the Dec-POMDP; more definition details and explanations can be found in [9, 27]. The reward function is defined as $R: G \times S \times A \times S \rightarrow \mathbb{R}$, which shows that the reward depends on the joint goal. The goal set $G$ consists of all possible joint goals. The goal termination variable set $E = \{0, 1\}$ indicates whether the current goal will be continued in the next step (if $e \in E$ is 0, the goal will be continued; otherwise a new goal will be selected again in the next step). The observation function for the recognizer is defined as $\Upsilon: S \times Y \rightarrow [0, 1]$, where $\Upsilon(s, y) = p(y \mid s)$ is the probability that the recognizer observes $y \in Y$ while the real world state is $s \in S$. The goal selection function is defined as $Z: S \times G \rightarrow [0, 1]$, where $Z(s, goal) = p(goal \mid s)$ is the conditional probability that agents select $goal \in G$ as the new goal while the world state is $s$. The goal termination function is defined as $C: S \times G \rightarrow [0, 1]$, where $C(s, goal) = p(e = 1 \mid s, goal)$ is the conditional probability that agents terminate their $goal \in G$ while the world state is $s$.
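To make the tuple concrete, the sketch below collects these components in one Python container. The field names and the choice to expose $T$, $O$, $\Upsilon$, $Z$, and $C$ as probability functions are illustrative assumptions for this sketch, not part of the model definition above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Any          # an element of S
JointAction = Tuple  # an element of A, one primitive action per agent
Goal = Any           # an element of G
Obs = Any            # an observation (for the agents or for the recognizer)

@dataclass
class DecPOMDM:
    """Container for the tuple <D, S, A, T, R, Omega, O, lambda, h, I, G, E, Upsilon, Y, Z, C, B>."""
    agents: List[int]                                      # D = {1, ..., n}
    states: List[State]                                    # S
    joint_actions: List[JointAction]                       # A
    T: Callable[[State, JointAction, State], float]        # state transition probability
    R: Callable[[Goal, State, JointAction, State], float]  # goal-dependent reward R: G x S x A x S -> R
    agent_observations: List[Obs]                          # Omega
    O: Callable[[State, Obs], float]                       # observation probability for the agents
    discount: float                                        # lambda
    horizon: int                                           # h
    initial_states: Dict[State, float]                     # I, initial state distribution
    goals: List[Goal]                                      # G
    terminations: Tuple[int, int]                          # E = (0, 1)
    Upsilon: Callable[[State, Obs], float]                 # recognizer observation probability p(y | s)
    recognizer_observations: List[Obs]                     # Y
    Z: Callable[[State, Goal], float]                      # goal selection probability p(goal | s)
    C: Callable[[State, Goal], float]                      # goal termination probability p(e = 1 | s, goal)
    initial_goals: Dict[Goal, float]                       # B, initial goal distribution
```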

In the Dec-POMDP, the policy of the $i$th agent is defined as $\Pi_i: \Omega^*_i \times A_i \rightarrow [0, 1]$, where $A_i \in A$ is the set of possible actions of agent $i$ and $\Omega^*_i$ is the set of observation sequences. Thus, given an observation sequence $o^i_1, o^i_2, \ldots, o^i_t$, agent $i$ selects an action with a probability defined by $\Pi_i$. Since the selection of actions depends on the history of the observations, the Dec-POMDP does not satisfy the Markov assumption. This attribute makes inferring goals online very hard: (a) if we precompute policies and store them offline, this requires a very large memory because of the combination of possible observations; (b) if we compute policies online, the filtering algorithm is infeasible because the weights of possible states cannot be updated with only the last observation. One possible solution is to define an interior belief variable for each agent to filter the world state, but this makes the inference process much more complex. In this paper, we simply assume that all agents are memoryless, as in the work in [9]. Then the policy of the $i$th agent is defined as $\Pi_i: G \times \Omega_i \times A_i \rightarrow [0, 1]$, where $\Omega_i$ is the set of possible observations of agent $i$. The definition of policies in the Dec-POMDM shows that (a) an agent does not need the history of observations for making decisions; (b) the selection of actions depends on the goal at that stage. In the next section, we further explain the relations of variables in the Dec-POMDM by its DBN representation.

Figure 1: The DBN representation of the Dec-POMDM in two adjacent time slices.

3.2. The DBN Representation. After estimating the agents' policies by a multiagent reinforcement learning algorithm, we do not need the reward function in the inference process. Thus, in the DBN representation of the Dec-POMDM, there are six sorts of variables in total: the goal, the goal termination, the action, the observations for agents making decisions, the state, and the observations for the recognizer. In this section, we first analyze how these variables are affected by other factors. Then we give the DBN representation in two adjacent time slices.

(A) Goal ($goal$). The goal $goal_t$ at time $t$ depends on the goal $goal_{t-1}$, the goal termination variable $e_{t-1}$, and the state $s_{t-1}$. If $e_{t-1} = 0$, agents keep their goal at time $t$; otherwise $goal_t$ is selected depending on $s_{t-1}$.

(B) Goal Termination Variable ($e$). The goal termination variable $e_t$ at time $t$ depends on the goal $goal_t$ and the state $s_t$ at the same time. The $goal_t$ is terminated with the probability $p(e_t = 1 \mid s_t, goal_t)$.

(C) Action ($a$). The action $a^i_t$ selected by agent $i$ at time $t$ depends on the goal $goal_t$ and its observation at the same time, and the distribution of actions is defined by $\Pi_i$.

(D) Observations for Agents Making Decisions ($o$). The observation $o^i_t$ of agent $i$ at time $t$ reflects the real world state $s_{t-1}$ at time $t - 1$, and agent $i$ observes $o^i_t$ with the probability $p(o^i_t \mid s_{t-1})$.

(E) State ($s$). The world state $s_t$ at time $t$ depends on the state $s_{t-1}$ at time $t - 1$ and the actions of all agents at time $t$. The distribution of the updated states can be computed by the state transition function $T$.

(F) Observations for the Recognizer ($y$). The observation $y_t$ for the recognizer at time $t$ reflects the real world state $s_t$ at the same time, and the recognizer observes $y_t$ with the probability $p(y_t \mid s_t)$.

The DBN representation of the Dec-POMDM in two adjacent time slices presents all dependencies among the variables discussed above, as shown in Figure 1.

In Figure 1, only the actions and observations of agent $i$ and agent $j$ are presented for simplicity; each agent has no knowledge about others and can only make decisions based on its own observations. Although the Dec-POMDM has a hierarchical structure, it models the task decomposition and allocation in an implicit way: all information about cooperation is hidden in the joint policies. From the filtering theory point of view, the joint policies actually play the role of the motion function. Thus, estimating policies is a key problem for goal inference.
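Read together, dependencies (A) to (F) describe a generative procedure for one time slice of Figure 1. The sketch below samples one such slice; it assumes, purely for illustration, that the model exposes its conditional distributions as dictionaries (model.Z(s), model.O(i, s), model.T(s, actions), and model.Upsilon(s) return {value: probability} maps, and model.C(s, goal) returns the termination probability), and that pi[i] is the memoryless policy table of agent i keyed by (goal, observation).

```python
import random

def sample_from(dist):
    """Draw a key from a {value: probability} dictionary (assumed helper)."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r <= acc:
            return value
    return value  # fall back to the last key on rounding error

def sample_slice(model, pi, goal_prev, e_prev, s_prev):
    """Sample (goal_t, e_t, s_t, y_t) for one time slice, following dependencies (A)-(F)."""
    # (A) keep the goal if e_{t-1} = 0; otherwise re-select it from Z(. | s_{t-1})
    goal_t = goal_prev if e_prev == 0 else sample_from(model.Z(s_prev))
    # (D) each agent observes the previous world state through its own observation model
    obs = {i: sample_from(model.O(i, s_prev)) for i in model.agents}
    # (C) each agent picks an action from its memoryless policy, conditioned on the goal
    actions = tuple(sample_from(pi[i][(goal_t, obs[i])]) for i in model.agents)
    # (E) the world state is updated by the transition function T
    s_t = sample_from(model.T(s_prev, actions))
    # (B) the termination variable depends on the new state and the current goal
    e_t = 1 if random.random() < model.C(s_t, goal_t) else 0
    # (F) the recognizer observes the new state through Upsilon
    y_t = sample_from(model.Upsilon(s_t))
    return goal_t, e_t, s_t, y_t
```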

3.3. Learning the Policy. Because the Dec-POMDP is essentially a DBN, we could simply use data-driven methods to learn parameters from a training dataset. However, a training dataset is not always available. Besides, when the number of agents is large, the DBN structure becomes large and complex, which makes supervised or unsupervised learning time consuming. To solve these problems, we assume that the agents to be recognized are rational, which is reasonable when there is no history of the agents. Then we can use an existing planner based on the Dec-POMDP framework to find the optimal or approximately optimal policies for each possible goal.


Various algorithms have been proposed for solving Dec-POMDP problems. Roughly, these algorithms can be divided into two categories: (a) model-based algorithms under the general name of dynamic programming; (b) model-free algorithms under the general name of reinforcement learning [8]. In this paper, we select a multiagent reinforcement learning algorithm, cooperative colearning based on Sarsa [9], because it does not need a state transition function of the world.

The main idea of cooperative colearning is that at each step one chooses a subgroup of agents and updates their policies to optimize the task given that the rest of the agents have fixed plans; after a number of iterations, the joint policies converge to a Nash equilibrium. In this paper, we only consider settings where agents are homogeneous: all agents share the same observation model and policies. Thus, we only need to define one POMDP $M$ for all agents. All parameters of $M$ can be obtained directly from the given Dec-POMDP except for the transition function $T'$. Later we will show how to compute $T'$ from $T$, which is the transition function of the Dec-POMDP. The Dec-POMDP problem can be solved by the following steps [9].

Step 1. We set $t = 0$ and start from an arbitrary policy $\Pi^0$.

Step 2. We select an arbitrary agent and compute the state transition function $T'$ of $M$, considering that the policies of all agents are $\Pi^t$, except the selected agent that is refining the plan.

Step 3. We compute $\Pi^*$, which is the optimal policy of $M$, and set $\Pi^{t+1} = \Pi^*$.

Step 4. We update the policy of each agent to $\Pi^{t+1}$, set $t = t + 1$, and return to Step 2.
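A minimal skeleton of this alternating scheme, under the homogeneity assumption used in this paper, might look as follows. The callables initial_policy, build_pomdp_for_agent (which computes $T'$ with the other agents fixed to $\Pi^t$), and solve_single_agent_pomdp are placeholders for this sketch; the round-robin choice of the agent to refine is just one way to realize "select an arbitrary agent".

```python
def cooperative_colearning(dec_pomdm, initial_policy, build_pomdp_for_agent,
                           solve_single_agent_pomdp, max_iterations=1000):
    """Alternating best-response loop corresponding to Steps 1-4 (sketch)."""
    policy = initial_policy                                            # Step 1: arbitrary Pi^0
    for t in range(max_iterations):
        agent = t % len(dec_pomdm.agents)                              # Step 2: pick an agent to refine
        single_agent_pomdp = build_pomdp_for_agent(dec_pomdm, agent, policy)
        best_response = solve_single_agent_pomdp(single_agent_pomdp)   # Step 3: optimal policy Pi* of M
        new_policy = {i: best_response for i in dec_pomdm.agents}      # Step 4: homogeneous agents share it
        if new_policy == policy:                                       # fixed point, i.e., an equilibrium
            return new_policy
        policy = new_policy
    return policy
```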

In Step 2, the state transition function of the POMDP for any agent can be computed by formula (1) (we assume that we refine the plan for the first agent):

$$T'(s, a_1, s') = \sum_{(o_2, \ldots, o_n)} T\bigl(s, a_1, \Pi^t(o_2), \ldots, \Pi^t(o_n), s'\bigr) \cdot \prod_{i=2}^{n} O'(s, o_i), \qquad (1)$$

where $O'$ is the observation function defined in $M$, $a_i$ is the action of the $i$th agent, $o_i$ is the observation of the $i$th agent, and $T(s, a_1, \Pi^t(o_2), \ldots, \Pi^t(o_n), s')$ is the probability that the state transits from $s$ to $s'$ while the action of the first agent is $a_1$ and the other agents choose actions based on their observations and the policy $\Pi^t$.
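A direct, if expensive, evaluation of formula (1) can be sketched as below for finite observation sets; the callable interfaces are assumptions of this sketch, and the other agents' policy is taken as deterministic here for brevity, whereas $\Pi^t$ may in general be stochastic.

```python
from itertools import product

def refined_transition(T, O_prime, policy, observations, s, a1, s_next, n_agents):
    """Compute T'(s, a1, s') by marginalizing over the other agents' observations, as in (1).

    T(s, joint_action, s_next) and O_prime(s, o) return probabilities; policy(o) returns
    the action an agent takes after observing o (deterministic simplification).
    """
    total = 0.0
    for obs_tuple in product(observations, repeat=n_agents - 1):    # all (o_2, ..., o_n)
        joint_action = (a1,) + tuple(policy(o) for o in obs_tuple)  # other agents act through Pi^t
        weight = 1.0
        for o in obs_tuple:                                         # prod_{i=2}^{n} O'(s, o_i)
            weight *= O_prime(s, o)
        total += T(s, joint_action, s_next) * weight
    return total
```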

Unfortunately, computing formula (1) is always very difficult in complex scenarios. Thus, we apply the Sarsa algorithm for finding the optimal policy of $M$ (in Steps 2 and 3) [8]. In this process, the POMDP problem is mapped to an MDP problem by regarding the observation as the world state. Then the agents get feedback from the environment, and we do not need to compute the updated reward function and state transition function.
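The corresponding Sarsa backup, with the observation standing in for the MDP state, can be written as below; the learning rate alpha is an assumed hyperparameter, and gamma matches the discount factor $\lambda = 0.8$ used later in the experiments.

```python
def sarsa_update(Q, o, a, reward, o_next, a_next, alpha=0.1, gamma=0.8):
    """One Sarsa backup where the agent's own observation plays the role of the state.

    Q is a dictionary mapping (observation, action) pairs to values.
    """
    td_target = reward + gamma * Q.get((o_next, a_next), 0.0)
    Q[(o, a)] = Q.get((o, a), 0.0) + alpha * (td_target - Q.get((o, a), 0.0))
    return Q
```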

4. Inference

In the Dec-POMDM, the goal is defined as a hidden state indirectly reflected by observations. In this case, many filtering algorithms can be used to infer the goals. However, in the multiagent setting the world state space is large because of the combinations of agents' states and goals. Besides, the Dec-POMDM models a discrete system. To solve this problem, we use an MF algorithm to infer multiagent joint goals under the framework of the Dec-POMDM. It has been proved efficient when the state space is large and discrete.

Nyolt et al. discussed the theory of the MF and how to apply it in a causal model in [12, 13]. Their main idea still works in our paper, but there are also differences between the Dec-POMDM and the causal model: (a) the initial distribution of states in our model is uncertain; (b) the results of actions in our model are uncertain; (c) we do not model durations of actions, for simplicity. Thus, we have to modify the MF for causal models and make it applicable to the Dec-POMDM.

When the MF is applied to infer goals under the framework of the Dec-POMDM, the set of particles is defined as $\{x^i_t\}_{i=1,\ldots,N_t}$, where $x^i_t = \langle goal^i_t, e^i_t, s^i_t \rangle$, $N_t$ is the number of particles at time $t$, and the weight of the $i$th particle is $w^i_t$. The detailed procedure of goal inference is given in Algorithm 1. $m_t$ is the set $\{x^i_t, w^i_t\}_{i=1,\ldots,N_t}$, which contains all particles and their weights at each stage. When we initialize $\{x^i_0, w^i_0\}_{i=1,\ldots,N_0}$, the weights are computed by

$$w^i_0 = p(s^i_0 \mid y_0)\, p(goal^i_0)\, p(e^i_0 \mid goal^i_0, s^i_0). \qquad (2)$$

$p(goal^i_0)$ and $p(e^i_0 \mid goal^i_0, s^i_0)$ are provided by the model, and $p(s^i_0 \mid y_0)$ is computed by

$$p(s^i_0 \mid y_0) \propto p(y_0 \mid s^i_0)\, p(s^i_0). \qquad (3)$$

$map$ is a temporary memory of particles and their weights which are transited from one specific particle. Because a world state $s_t$ may be reached after the execution of different actions from a given $s_{t-1}$, we have to use a LOOKUP operation to query the record in $map$ which covers the new particle $x_t$. The operation LOOKUP($x_t$, $map$) searches for $x_t$ in $map$; if there is a record $\langle x^j_t, w^j_t \rangle$ in $map$ which covers $x_t$, the operation returns this record; otherwise it returns empty. This process is time consuming if we scan $map$ for every query. One alternative method is to build a matrix for each $map$ which records the indices of all reached particles. Then, if the index of $x_t$ is null, we add a new record in $map$ and update the index matrix; otherwise, we can read the index of $x_t$ from the matrix and merge the weight directly. After we finish building $map$, its index matrix can be deleted to release memory. We also need to note that this solution saves computing time but needs extra memory.

The operation PUT($map$, $\langle x_t, w_t \rangle$) adds a new record $\langle x_t, w_t \rangle$ to $map$ and indexes this new record. The generated $map$ contains a group of particles and corresponding weights. Some of these particles may have already been covered in $m_t$. Thus, we have to use a MERGE operation to get a new $m_t$ by merging $map$ with the existing $m_t$.


Set t = 0; m_0 ← {x^i_0, w^i_0}_{i=1..N_0}                        // Initialization
For t = 1 : max_t                                                 // we have max_t + 1 observations in total
    m_t ← Null
    Foreach (x^i_{t-1}, w^i_{t-1}) ∈ m_{t-1}
        map ← Null
        If e^i_{t-1} = 1
            Foreach goal_t which satisfies p(goal_t | s^i_{t-1}) > 0
                Foreach (a_1, a_2, ..., a_n) which satisfies prod_{k=1}^{n} p(a_k | goal_t, s^i_{t-1}) > 0
                    Foreach s_t which satisfies T(s^i_{t-1}, s_t) > 0
                        Foreach e_t which satisfies p(e_t | s_t, goal_t) > 0
                            w_t = w^i_{t-1} · p(e_t | s_t, goal_t) · T(s^i_{t-1}, s_t) · prod_{k=1}^{n} p(a_k | goal_t, s^i_{t-1}) · p(goal_t | s^i_{t-1})
                            x_t ← (goal_t, s_t, e_t)              // a temporary particle
                            ⟨x^j_t, w^j_t⟩ ← LOOKUP(x_t, map)     // find the record equal to x_t in map
                            If ⟨x^j_t, w^j_t⟩ is not empty
                                ⟨x^j_t, w^j_t⟩ ← ⟨x^j_t, w^j_t + w_t⟩   // merge the weight
                            Else
                                PUT(map, ⟨x_t, w_t⟩)              // add a new record to map
        Else
            goal_t ← goal^i_{t-1}
            Foreach (a_1, a_2, ..., a_n) which satisfies prod_{k=1}^{n} p(a_k | goal_t, s^i_{t-1}) > 0
                Foreach s_t which satisfies T(s^i_{t-1}, s_t) > 0
                    Foreach e_t which satisfies p(e_t | s_t, goal_t) > 0
                        w_t = w^i_{t-1} · p(e_t | s_t, goal_t) · T(s^i_{t-1}, s_t) · prod_{k=1}^{n} p(a_k | goal_t, s^i_{t-1})
                        x_t ← (goal_t, s_t, e_t)                  // a temporary particle
                        ⟨x^j_t, w^j_t⟩ ← LOOKUP(x_t, map)         // find the record equal to x_t in map
                        If ⟨x^j_t, w^j_t⟩ is not empty
                            ⟨x^j_t, w^j_t⟩ ← ⟨x^j_t, w^j_t + w_t⟩ // merge the weight
                        Else
                            PUT(map, ⟨x_t, w_t⟩)                  // add a new record to map
        MERGE(m_t, map)
    UPDATE(m_t)
    PRUNE(m_t)
    NORMALIZE(m_t)
    DELETE(m_{t-1})

Algorithm 1: Goal inference based on the marginal filter under the framework of the Dec-POMDM.
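For readers who prefer code to pseudocode, a compact sketch of one filtering step follows. It folds LOOKUP, PUT, and MERGE into dictionary updates and omits the PRUNE step. The interfaces model.Z, model.joint_action_dist, model.T, model.C_dist, and model.Upsilon_prob are assumed dictionary-returning conveniences for this sketch, not definitions from the paper.

```python
def marginal_filter_step(m_prev, y_t, model, policy):
    """One step of Algorithm 1: expand every particle, merge duplicates, weight by y_t, normalize.

    m_prev maps particles (goal, e, s) to weights.
    """
    m_t = {}
    for (goal_prev, e_prev, s_prev), w_prev in m_prev.items():
        # re-select the goal only if the previous goal was terminated
        goal_dist = model.Z(s_prev) if e_prev == 1 else {goal_prev: 1.0}
        for goal_t, p_goal in goal_dist.items():
            for joint_a, p_a in model.joint_action_dist(policy, goal_t, s_prev).items():
                for s_t, p_s in model.T(s_prev, joint_a).items():
                    for e_t, p_e in model.C_dist(s_t, goal_t).items():
                        w = w_prev * p_goal * p_a * p_s * p_e
                        if w > 0.0:
                            key = (goal_t, e_t, s_t)
                            m_t[key] = m_t.get(key, 0.0) + w   # LOOKUP / PUT / MERGE in one update
    # UPDATE: weight each merged particle by the recognizer observation, then NORMALIZE
    for key in m_t:
        m_t[key] *= model.Upsilon_prob(y_t, key[2])
    total = sum(m_t.values())
    return {key: w / total for key, w in m_t.items()} if total > 0.0 else m_t
```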

In this MERGE process, if a particle $x_t$ in $map$ has not appeared in $m_t$, we directly put $x_t$ and its corresponding weight $w_t$ into $m_t$; otherwise, we add $w_t$ to the weight of the record in $m_t$ which covers $x_t$. Similarly, an index matrix can also be used to save computing time in the MERGE operation.

Under the framework of the Dec-POMDM, we update $w^i_t$ by

$$w^i_t = w^i_t \cdot p(y_t \mid s^i_t), \qquad (4)$$

where $w^i_t$ and $x^i_t = \langle goal^i_t, e^i_t, s^i_t \rangle$ are the weight and the particle of the $i$th record in $m_t$, respectively, and $p(y_t \mid s^i_t)$ is the observation function for the recognizer in the model. The details of the PRUNE operation can be found in [13], and we can use the existing pruning technique directly in our paper.

5. Experiments

5.1. The Predator-Prey Problem. The predator-prey problem is a classic problem to evaluate multiagent decision algorithms [9]. However, it cannot be used directly in this paper because the predators have only one possible goal. Thus, we modify the standard problem by setting more than one prey on the map, and our aim is to recognize the real target of the predators at each time. Figure 2 shows the grid-based map in our experiments.

In our scenario, the map consists of 5 × 5 grids, two predators (red diamonds, Predator PX and Predator PY), and two preys (blue stars, Prey PA and Prey PB). The two predators select one of the preys as their common goal and move around to capture it. As shown in Figure 2, the observation of the predators is not perfect: a predator only knows the exact position of itself and of others which are in the nearest 8 grids. If another agent is out of its observing range, the predator only knows its direction (8 possible directions). For the situation in Figure 2, Predator PX observes that none is near to itself, Prey PB is in the north, Prey PA is in the southeast, and Predator PY is in the south; Predator PY observes that none is near itself, Prey PB and Predator PX are in the north, and Prey PA is in the east. In each step, all predators and preys can move into one of the four adjacent grids (north, south, east, and west) or stay at the current grid. When two or more agents try to get into the same grid or try to get out of the map, they have to stay in the current grid. The predators achieve their goal if and only if both of them are adjacent to their target. Additionally, the predators may also change their goal before they capture a prey. The recognizer can get the exact positions of the two preys, but its observations of the predators are noisy. We need to compute the posterior distribution of the predators' goals from the observation trace.

Figure 2: The grid-based map in the predator-prey problem.

The modified predator-prey problem can be modeled by the Dec-POMDM. Some important elements are as follows:

(a) $D$: the two predators;
(b) $S$: the positions of the predators and preys;
(c) $A$: five actions for each predator (moving in the four directions and staying);
(d) $G$: Prey PA or Prey PB;
(e) $\Omega$: the directions of agents far away and the real positions of agents nearby;
(f) $Y$: the real positions of the preys and the noisy positions of the predators;
(g) $R$: the predators get a reward of +1 once they achieve their goal; otherwise the immediate reward is 0;
(h) $h$: infinite horizon.

With the definition above, the effects of the predators' actions are uncertain, and the state transition function depends on the distribution of the preys' actions. Thus, the actions of the preys actually play the role of events in discrete dynamic systems.
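As an illustration of how these elements induce the state transition on the 5 × 5 map, the sketch below advances all agents by one joint move; the coordinate convention and the way simultaneous conflicts are resolved (every agent involved stays put) are assumptions made only for this sketch.

```python
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0), "stay": (0, 0)}

def step_positions(positions, actions, size=5):
    """Apply one joint move: agents that would leave the map or collide stay in their grid.

    positions: dict agent_name -> (x, y); actions: dict agent_name -> one of MOVES.
    """
    proposed = {}
    for name, (x, y) in positions.items():
        dx, dy = MOVES[actions[name]]
        nx, ny = x + dx, y + dy
        # trying to move off the map means staying in the current grid
        proposed[name] = (nx, ny) if 0 <= nx < size and 0 <= ny < size else (x, y)
    new_positions = {}
    for name, target in proposed.items():
        # two or more agents heading for the same grid all stay where they were
        contested = sum(1 for other in proposed.values() if other == target) > 1
        new_positions[name] = positions[name] if contested else target
    return new_positions
```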

5.2. Settings. In this section, we provide additional details of the scenario and of the parameters used in the policy learning and inference algorithms.

(A) The Scenario. The preys are senseless: they select each action with equal probability. The initial positions of the agents are uniform. The initial goal distribution is $p(goal_0 = \text{Prey PA}) = 0.6$ and $p(goal_0 = \text{Prey PB}) = 0.4$.

We simplify the goal termination function in the following way: if the predators capture their target, the goal is achieved; otherwise, the predators change their goal with a probability of 0.05 in every state. The observation for the recognizer, $y_t \in Y$, reveals the real position of each predator with a probability of 0.5, and the observation may be one of the 8 neighbors of the current grid, each with a probability of 0.5/8. When the predator is on the edge of the map, the observation may be out of the map. The observed results of the agents do not affect each other.
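Under these settings, the recognizer's observation of a single predator can be sampled as below; the uniform split of the remaining probability mass of 0.5 over the 8 neighboring grids, including neighbors that lie outside the map, follows the description above.

```python
import random

def observe_predator(x, y):
    """Sample the recognizer's noisy observation of a predator located at (x, y)."""
    if random.random() < 0.5:
        return (x, y)                      # the true position is reported with probability 0.5
    neighbors = [(x + dx, y + dy)          # the 8 surrounding grids, possibly outside the map
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return random.choice(neighbors)        # each neighbor is reported with probability 0.5 / 8
```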

(B) Multiagent Reinforcement Learning. The discount factor is $\lambda = 0.8$. A predator selects an action $a_i$ given the observation $o^i_t$ with the probability

$$p(a_i \mid o^i_t) = \frac{\exp\bigl(Q(o^i_t, a_i)/\beta\bigr)}{\sum_{j=1}^{5} \exp\bigl(Q(o^i_t, a_j)/\beta\bigr)}, \qquad (5)$$

where $\beta = 0.1$ is the Boltzmann temperature. We set $\beta > 0$ as a constant, which means that the predators always select approximately optimal actions. In our scenario, the $Q$-value converges after 750 iterations. In the learning process, if the predators cannot achieve their goal in 10000 steps, we reinitialize their positions and begin the next episode.
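Formula (5) is a standard Boltzmann (softmax) selection over the five actions; a minimal sketch, assuming a Q table keyed by (observation, action) pairs:

```python
import math
import random

def boltzmann_action(Q, obs, actions, beta=0.1):
    """Select an action with probability proportional to exp(Q(o, a) / beta), as in formula (5)."""
    weights = [math.exp(Q.get((obs, a), 0.0) / beta) for a in actions]
    r, acc = random.random() * sum(weights), 0.0
    for a, w in zip(actions, weights):
        acc += w
        if r <= acc:
            return a
    return actions[-1]  # numerical fallback

# Example call with the five moves of the scenario:
# a = boltzmann_action(Q, obs, ["north", "south", "east", "west", "stay"])
```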

(C) The Marginal Filter. In the MF inference, a particle consists of the joint goal, the goal termination variable, and the positions of the predators and preys. Although there are 25 possible positions for each predator or prey, after getting a new observation we can identify the positions of the preys, and there are only 9 possible positions for each predator. Thus, the number of possible values of particles at one step never exceeds 9 × 9 × 2 × 2 = 324 after the UPDATE operation. In our experiments, we simply set the maximum number of particles to 324; then we do not need to prune any possible particles, which means that we exploit an exact inference. We also make use of the real settings of the Dec-POMDM and of the real policies of the predators in the MF inference.


Table 1: The details of two traces.

Trace number | Duration | Target | Interrupted
1 | t ∈ [1, 7] | Prey PA | Yes
  | t ∈ [8, 13] | Prey PB | No
4 | t ∈ [1, 10] | Prey PB | No

5.3. Results and Discussion. With the learned policy, we ran the simulation model repeatedly and generated a test dataset consisting of 100 traces. There are 11.83 steps on average in one trace, and the number of steps in one trace varies from 6 to 34 with a standard deviation of 5.36. We also found that the predators changed their goals at least once in 35% of the test traces. To validate our method and compare it with baselines, we did experiments on three aspects: (a) to discuss the details of the recognition results obtained with our method, we computed the recognition results of two specific traces; (b) to show the advantages of the Dec-POMDM in modeling, we compared the recognition performances when the problem was modeled as a Dec-POMDM and as an HMM; (c) to show the efficiency of the MF under the framework of the Dec-POMDM, we compared the recognition performances when the goal was inferred by the MF and by the PF.

In the second and the third parts, performances were evaluated by the statistical metrics precision, recall, and F-measure. Their meanings and computation details can be found in [31]. The value of each of the three metrics is between 0 and 1; a higher value means a better performance. Since these metrics can only evaluate the recognition results at a single step and traces in the dataset had different time lengths, we defined a positive integer $k$ ($k = 1, 2, \ldots, 5$). The metric with $k$ means that the corresponding observation sequences are $\{y^j_t\}$ with $j \in \{1, \ldots, 100\}$ and $t \in \{1, \ldots, \lceil k \cdot length_j / 5 \rceil\}$. Here $y^j_{t \in \{1,\ldots,\lceil k \cdot length_j/5 \rceil\}}$ is the observation sequence from time 1 to time $\lceil k \cdot length_j / 5 \rceil$ of the $j$th trace, and $length_j$ is the length of the $j$th trace. We need to recognize $goal_t$ for each observation sequence. Thus, metrics with different $k$ show the performances of the algorithms in different phases of the simulation. Additionally, we regarded the goal with the largest probability as the final recognition result.
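To make the evaluation protocol concrete, the sketch below computes one metric point for a given $k$: each trace is truncated to its first $\lceil k \cdot length/5 \rceil$ observations, the recognizer is run on the prefix, and precision, recall, and F-measure are computed. The macro-averaging over goals is an assumption of this sketch, since the paper defers the exact metric definitions to [31].

```python
import math

def metrics_at_k(traces, recognize, k):
    """Score a recognizer on truncated traces.

    traces: list of (observations, true_goal) pairs; recognize: callable returning a goal.
    Returns (precision, recall, f_measure), macro-averaged over the goals.
    """
    pairs = []
    for observations, true_goal in traces:
        cutoff = math.ceil(k * len(observations) / 5)
        pairs.append((true_goal, recognize(observations[:cutoff])))
    goals = {t for t, _ in pairs} | {p for _, p in pairs}
    precisions, recalls = [], []
    for g in goals:
        tp = sum(1 for t, p in pairs if t == g and p == g)
        fp = sum(1 for t, p in pairs if t != g and p == g)
        fn = sum(1 for t, p in pairs if t == g and p != g)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure
```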

(A) The Recognition Results of the Specific Traces. To show the details of the recognition results obtained with our method, we selected two specific traces from the dataset (number 1 and number 4). These two traces were selected because Trace number 1 was the first trace where the goal was changed before it was achieved, and Trace number 4 was the first trace where the goal was kept until it was achieved. The detailed information about the two traces is shown in Table 1.

In Trace number 1, the predators selected Prey PA as their first goal from $t = 1$ to $t = 7$. Then the goal was changed to Prey PB, which was achieved at $t = 13$. In Trace number 4, the predators selected Prey PB as their initial goal. This goal was kept until it was achieved at $t = 14$. Given the real policies and the other parameters of the Dec-POMDM, including $T$, $O$, $\Upsilon$, $I$, $Z$, $C$, and $B$, we used the MF to compute the posterior distribution of goals at each time. The recognition results are shown in Figure 3.

In Figure 3(a), the probability of the real goal (Prey PA) increases fast during the initial period. At $t = 3$, the probability of Prey PA exceeds 0.9 and keeps a high value until $t = 7$. When the goal is changed at $t = 8$, our method has a very fast response because the predators select highly certain joint actions at this time. In Figure 3(b), the probability of the false goal increases at $t = 2$, and the probability of the real goal (Prey PB) is low at first. The failure happens because the observations support that the predators selected joint actions with small probability if the goal is Prey PB. Anyway, the probability of the real goal increases continuously and exceeds that of Prey PA after $t = 5$. With the recognition results of the two specific traces, we conclude that the Dec-POMDM and MF can perform well regardless of the goal change.

(B) Comparison of the Performances of the Dec-POMDM and HMMs. To show the advantages of the Dec-POMDM, we modeled the predator-prey problem with the well-known HMM as a baseline. In the HMM, we set the hidden state $x$ as the tuple $\langle goal, p_X, p_Y, p_A, p_B \rangle$, where $p_X$, $p_Y$, $p_A$, and $p_B$ are the positions of Predator PX, Predator PY, Prey PA, and Prey PB, respectively. The observation model for the recognizer in the HMM was the same as that in the Dec-POMDM. Thus, there were $25 \times 24 \times 23 \times 22 \times 2 = 607200$ possible states in this HMM, and the dimension of the transition matrix was $607200 \times 607200$. Since the state space and the transition matrix were too large, an unsupervised learning method such as the Baum-Welch algorithm was infeasible for this problem. Instead, we used a simple supervised learning method: counting the state transitions based on labeled training datasets. With the real policies of the predators, we ran the simulation model repeatedly and generated five training datasets. The detailed information of these datasets is shown in Table 2.

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way, we got new datasets, HMM-100b and HMM-200b, which did not cover any trace in the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions. Then the MF was used to infer the goal. However, in the inference process, we may reach some states that have not been explored in the training dataset (HMM-200a only explores 492642 states, but there are 607200 possible states in total). In this case, we assumed that the hidden state would transit to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compare the performances of the Dec-POMDM and the HMMs. The recognition metrics are shown in Figure 4.

Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF: (a) recognition results of Trace number 1; (b) recognition results of Trace number 4.

Table 2: Information of the training datasets for the HMM.

Name of the dataset | Number of traces | Number of explored states | Percentage of traces in the test dataset covered by the training dataset
HMM-50 | 50000 | 279477 | 0%
HMM-100a | 100000 | 384623 | 79%
HMM-100b | 100000 | 384621 | 0%
HMM-200a | 200000 | 492642 | 79%
HMM-200b | 200000 | 492645 | 0%

Figure 4 shows that the results of the Dec-POMDM and of the HMMs trained on different datasets are similar in terms of precision, recall, and F-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after $k = 4$; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though the number of traces in HMM-200a was twice as large as that in HMM-100a. The main reason is that the training datasets were overfitted. Actually, there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a. In this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on the three metrics, almost as well as the overfitted trained models; (b) the learned HMM suffered an overfitting problem, and there will be a serious performance decline if the training dataset does not cover most traces in the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also show that, when we model the problem with the HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. Actually, the dataset HMM-200a, which consisted of 200000 traces, only covered 81.13% of all possible states, and only 71.46% of the states in HMM-200a had been reached more than once. Thus, we can conclude that HMM-200a would have a poor performance if the agents selected actions with higher uncertainty. Besides, there is no doubt that the size of the training dataset will be extremely large if most states are to be reached a large number of times. In real applications, it is very hard to obtain a training dataset of such size, especially when all traces have to be labeled.

We also performed a Wald test over the Dec-POMDM and the HMMs trained with the different training datasets to prove that their recognition results come from different distributions. Given our test dataset, there were 100 goals to be recognized for each value of k. Let TX be the set of samples obtained from the Dec-POMDM; we set TX_i = 1 if the recognition result of the Dec-POMDM is correct on test case i (i = 1, 2, ..., 100) and TX_i = 0 otherwise. Let TY be the set of samples obtained from the baseline model (one of the HMMs); we set TY_i = 1 if the recognition result of the baseline model is correct on test case i and TY_i = 0 otherwise. We define TD_i = TX_i − TY_i and let δ = E(TD_i) = E(TX_i) − E(TY_i) = P(TX_i = 1) − P(TY_i = 1); the null hypothesis is δ = 0, which means the recognition results from the different models follow the same distribution. A more detailed test process can be found in [32]. The matrix of p values is shown in Table 3.
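A minimal sketch of such a paired Wald test on binary recognition outcomes is given below. It assumes two equal-length 0/1 vectors of per-case correctness and uses the standard normal approximation; it is only an illustration of the test described above, not necessarily the exact procedure of [32].

```python
import math

def wald_test(tx, ty):
    """Two-sided Wald test of the null hypothesis delta = E(TX_i) - E(TY_i) = 0.

    tx, ty: equal-length lists of 0/1 recognition outcomes
    (1 = goal recognized correctly on that test case).
    Returns the Wald statistic and its two-sided p value.
    """
    n = len(tx)
    td = [x - y for x, y in zip(tx, ty)]
    delta_hat = sum(td) / n
    # Sample variance of the paired differences
    var = sum((d - delta_hat) ** 2 for d in td) / (n - 1)
    se = math.sqrt(var / n)
    if se == 0.0:
        return 0.0, 1.0  # identical results on every case, e.g., at k = 1
    w = delta_hat / se
    # Two-sided p value under the standard normal approximation
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(w) / math.sqrt(2.0))))
    return w, p
```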

The p values in Table 3 show that the recognition results of the Dec-POMDM follow different distributions from those of HMM-50, HMM-100b, and HMM-200b, respectively, with high confidence. We cannot reject the null hypothesis when the baseline is an overfitted trained model. The Wald test results are consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and the test results showed the same trend.


Figure 4: Recognition metrics of the Dec-POMDM and the HMMs (HMM-50, HMM-100a, HMM-100b, HMM-200a, and HMM-200b) as functions of k. (a) Precision; (b) Recall; (c) F-measure.

Table 3: The p values of the Wald test over the Dec-POMDM and HMMs.

Comparison | k = 1 | k = 2 | k = 3 | k = 4 | k = 5
Dec-POMDM versus HMM-50 | 1 | <0.01 | <0.01 | <0.01 | <0.01
Dec-POMDM versus HMM-100a | 1 | 0.0412 | 0.0478 | 0.2450 | <0.01
Dec-POMDM versus HMM-100b | 1 | <0.01 | <0.01 | <0.01 | <0.01
Dec-POMDM versus HMM-200a | 1 | 0.3149 | 0.1929 | 0.4127 | 1
Dec-POMDM versus HMM-200b | 1 | <0.01 | <0.01 | <0.01 | <0.01

(C) Comparison of the Performances of the MF and the PF. Here we exploited the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared their performances to the MF. All inference was done under the framework of the Dec-POMDM. We have to note that, when the PF uses weighted particles to approximate the posterior distribution of the goal, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case we simply reset all weights to 1/Np to continue the inference, where Np is the number of particles in the PF. The recognition metrics of the MF and PF are shown in Figure 5.
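The weight-reset fallback can be sketched as follows; the function name, the obs_prob callback, and the list-based weight representation are illustrative assumptions rather than the actual implementation.

```python
def update_weights(particles, weights, observation, obs_prob):
    """One PF weight update with the reset fallback described above.

    obs_prob(observation, state) returns p(y_t | s_t); if every particle
    receives zero likelihood, the weights are reset to 1/Np so that the
    filter can continue instead of collapsing.
    """
    new_weights = [w * obs_prob(observation, p) for p, w in zip(particles, weights)]
    total = sum(new_weights)
    if total == 0.0:
        np_ = len(particles)
        return [1.0 / np_] * np_   # all particles died: reset uniformly
    return [w / total for w in new_weights]
```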

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1000, 2000, 4000, 6000, and 16000 particles, respectively. All filters had similar precision. However, considering the recall and the F-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the rest of the PFs. We got these results because an exact MF (without the pruning step) is used in this section, and the PF approximates the real posterior distribution better with more particles.

Similar to the testing method used in Part (B), here we also performed the Wald test on the MF and the PF with different numbers of particles. The matrix of the p values is shown in Table 4.

The null hypothesis is that the recognition results of the baseline and the MF follow the same distribution. Generally, with a larger k and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.


Table 4: The p values of the Wald test over the MF and PFs.

Comparison | k = 1 | k = 2 | k = 3 | k = 4 | k = 5
MF versus PF-1000 | 1 | 0.0176 | <0.01 | <0.01 | <0.01
MF versus PF-2000 | 1 | 0.3637 | 0.2450 | 0.0300 | <0.01
MF versus PF-4000 | 1 | 1 | 0.0910 | 0.0412 | 0.0115
MF versus PF-6000 | 1 | 0.4127 | 0.1758 | 0.1758 | 0.0218
MF versus PF-16000 | 1 | 0.5631 | 0.0412 | 1 | 0.1531

Figure 5: Recognition metrics of the MF and the PF with 1000, 2000, 4000, 6000, and 16000 particles as functions of k. (a) Precision; (b) Recall; (c) F-measure.


Since there was a variance in the results inferred by the PF (the PF performs approximate inference), we ran the PF with 6000 particles 10 times. The mean value with the 90% belief interval of the metrics at different k is shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We need to note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bound of the 90% belief interval also indicates the extremum of the PF. We can see that the metrics of the MF are better than the mean of the PF when k > 1 and even better than the maximum value of the PF except for k = 4. Actually, the MF also outperforms 90% of the runs of the PF at k = 4. Additionally, the MF needs on average only 75.78% of the time needed by the PF for inference at each step. Thus the MF consumes less time and has a better performance than the PF with 6000 particles.
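For illustration, the empirical interval can be computed from the repeated runs with a nearest-rank percentile, as in the following sketch; the function name and the percentile convention are our own assumptions, not necessarily those used for Figure 6.

```python
import math

def empirical_interval(samples, lower=0.05, upper=0.95):
    """Mean and empirical 90% belief interval of a metric over repeated runs.

    samples: metric values (e.g., recall at one value of k) from independent
    PF runs.  With the nearest-rank convention and only 10 runs, the 5th and
    95th percentiles fall on the minimum and maximum, which is why the
    interval bound also marks the extremum of the PF.
    """
    xs = sorted(samples)
    n = len(xs)
    mean = sum(xs) / n
    lo = xs[max(0, math.ceil(lower * n) - 1)]
    hi = xs[min(n - 1, math.ceil(upper * n) - 1)]
    return mean, lo, hi
```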

Generally, the computational complexities of the PF and the MF are both linear functions of the number of particles. When there are multiple possible results of an action, the MF consumes more time than the PF when their numbers of particles are equal. However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete. Actually, the number of possible states in the MF after the UPDATE operation is never more than 156 in the test dataset. At the same time, the PF has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.
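The following simplified sketch illustrates why the MF particle set stays small: successor states that coincide are merged by summing their weights instead of being duplicated. It omits the goal and termination variables tracked by the full marginal filter in Section 4 and is only meant to convey the merging idea; the names used are hypothetical.

```python
from collections import defaultdict

def mf_predict(weighted_states, successors):
    """Marginal-filter style prediction step.

    weighted_states: dict mapping a discrete state to its weight.
    successors(state): yields (next_state, probability) pairs.
    Identical successor states are merged, so the number of kept entries
    never exceeds the number of distinct reachable states.
    """
    merged = defaultdict(float)
    for state, weight in weighted_states.items():
        for nxt, prob in successors(state):
            merged[nxt] += weight * prob   # merge instead of duplicating
    return dict(merged)
```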


Figure 6: Metrics of the PF with 6000 particles (mean, extremum, and 90% belief interval) and the MF as functions of k. (a) Precision; (b) Recall; (c) F-measure.


6. Conclusion and Future Work

In this paper, we propose a novel model, the Dec-POMDM, for solving multiagent goal recognition problems, and we present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM presents the agents' cooperation in a compact way: details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that the agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem there are multiple possible joint goals, and agents may change their goals before they are achieved. With this scenario we compare our method to other baselines, including the HMM and the PF. The experiment results show that (a) the Dec-POMDM together with the MARL and MF algorithms can recognize the multiagent goal well whether the goal changes or not; (b) the Dec-POMDM outperforms the HMM in terms of precision, recall, and F-measure; and (c) the MF can infer goals more efficiently than the PF.

In the future we plan to apply the Dec-POMDM to more complex scenarios. Further research on pruning techniques for the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and by the DFG research training group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.
[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.
[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.
[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.
[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.
[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.
[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.
[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.
[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.
[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.
[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.
[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.
[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.
[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.
[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.
[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.
[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.
[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.
[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.
[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.
[21] B. Riordan, S. Brimi, N. Schurr, et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.
[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib, et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.
[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.
[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-031.pdf.
[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.
[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.
[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.
[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.
[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.
[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.
[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.
[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.


Page 2: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

2 Discrete Dynamics in Nature and Society

There are three fundamental components in the frame-work of multiagent goal recognition (a) modeling the agentsrsquobehaviors the environment and the observations for the rec-ognizer (b) estimating the parameters of the model obtainedthrough learning or other methods and (c) inferring thegoals from the observations In the past people have donesome works on all these aspects However we still have somedifficulties in recognizing multiagent goals in simulationsystems

(a) For modeling behaviors we usually have little knowl-edge about the details of agentsrsquo cooperation such asthe decomposition of the complex task the allocationof the subtasks the communication and other detailsEven though this information is available it is hard topresent all of them formally in a model in practice

(b) For learning parameters sometimes a training datasetfor supervised or unsupervised learning cannot beprovided Even if we have a training set the unsu-pervised learning is still infeasible because the statespace in multiagent scenarios is always very largeAdditionally the supervised learningmay suffer fromthe overfitting problem which will be shown in ourexperiments

(c) For inferring goals traditional exact filters such as anHMM filter are infeasible because the state space islargeThewidely applied PF is available for computingthe posterior distribution of goals but it may failwhen there are not sufficient particles and increasingthe number of particles will consume much morecomputing time

To solve the problems above we present a solution forrecognizing multiagent goals in simulation systemsThe coreof our method is a novel decentralized partially observableMarkov decision model (Dec-POMDM) After modeling theagentsrsquo behaviors the environment and the observationsfor the recognizer by the Dec-POMDM we use an existingmultiagent reinforcement learning (MARL) algorithm toestimate the behavior parameters and a marginal filter (MF)to infer the joint goal of agents Our method has severaladvantages considering the above problems

(a) For the modeling problem the Dec-POMDMpresents the agentsrsquo behaviors in a compact way TheDec-POMDM can be regarded as an extension ofthe well-known decentralized partially observableMarkov decision process (Dec-POMDP) [8] As intheDec-POMDP all details of cooperation are hiddenin joint policies in the Dec-POMDM In this implicitway of behavior modeling we only need to concernourselves with the selection of primitive actions withgiven goals and situations Further knowledge oninteractions between agents is unnecessary Anotheradvantage of the Dec-POMDM is that it can makeuse of the large amount of existing algorithms for theDec-POMDP which will be explained later

(b) For the problem of estimating the agentsrsquo joint pol-icies the MARL algorithm does not need a training

dataset We borrow the definition of goals from thedomain of planning under uncertainty and associateeach goal with a reward function Then we assumethat agents achieve their joint goal by executingoptimal policies which can bring them themaximumcumulative rewardThe optimal policies define coop-erative behaviorswell and can be computed accuratelyor approximated by any algorithm for the Dec-POMDP In this way the training dataset is unneces-sary In this paper the cooperative colearning basedon the Sarsa algorithm is exploited because it doesnot need information of themodel whichmay be dif-ficult to get in complex scenarios [9] Actually heuris-tic algorithms such as memory-bounded dynamicprogramming (MBDP) and joint equilibrium-basedsearch for policies (JESP) may also work if we haveenough information of the environment model [1011]

(c) For the inference the MF outperforms the PF whenthe state space is discrete and large which has beenproved in [12 13] Additionally we will also showthat the MF solves the inference failure problem ofthe PF in our research Another contribution is thatwe implement the MF under the framework of Dec-POMDM in which filtering process is different fromthe work in [12 13]

To validate the Dec-POMDM together with the MARLandMF inmultiagent goal recognition wemodify the classicpredator-prey problem and design a new scenario [9] Inthis scenario there are more than one prey and predatorshave to choose one prey and capture it by moving on a gridmap Additionally predators may change their goal on thehalf way Our method is applied to recognize the real goalof predators based on the observed noisy traces We also usethe simulation model to generate different training datasetswhich consist of different numbers of labeled traces Withthese datasets we compare performances of our method andthe widely applied hidden Markov model (HMM) whosetransition matrix is obtained through supervised learning(the MF is applied doing inference for both models) In theinference part results computed by the MF are comparedto those of PF with different numbers of particles Allperformances are evaluated in terms of precision recall and119865-measure

The rest of the paper is organized as follows Section 2introduces related work Section 3 gives the formal definitionof the Dec-POMDM the modelrsquos DBN representation thecooperative colearning algorithm and the baseline usedfor comparison Section 4 introduces how to use the MFto infer the joint goal Section 5 presents the scenariossettings and results of our experiments Subsequently wedraw conclusions and discuss future works in Section 6

2 Related Work

In many simulation systems such as training systems andcommercial digital games effects of actions are uncertainwhich is usually caused by two reasons (a) agents execute

Discrete Dynamics in Nature and Society 3

erroneous actions with a given probability (b) environmentstates are impacted by some events which are not undercontrol of the agents Because of this uncertainty in sim-ulation systems a goal cannot be achieved by a certainsequence of primitive actions as what we do in the classicalplanning Instead we have to find a set of policies whichdefine the distribution of selecting actions within givensituations Since policies essentially reveal the goal of plan-ning goals can be inferred as discrete parameters or hiddenstates after knowing their corresponding policies and thisprocess is usually implemented under the Markov decisionframework

21 Recognizing Goals of a Single Agent People have pro-posed many methods for recognizing goals of a single agentsome of them are foundations of methods for multiagent goalrecognition Baker et al proposed a computational frame-work based on Bayesian inverse planning for recognizingmental states such as goals [14] They assumed that theagent is rational actions are selected based on an optimalor approximate optimal value function given the beliefsabout the world and the posterior distribution of goals iscomputed by Bayesian inference The core of Bakerrsquos methodis that policies are computed by the planner based on thestandard Markov decision process (MDP) which does notmodel the observing process of the agent Thus Ramırez andGeffner extended Bakerrsquos work by applying the goal-POMDPin formalizing the problem [4] Compared to the MDP thePOMDP models the relation between real world state andobservation of the agent explicitly compared to the POMDPthe goal-POMDP defines the set of goal states BesidesRamırez and Geffner also solved the inference problem evenwhen observations are incomplete Works in [4 14] are verypromising but both of them suffer two limitations (a) theinput for goal recognition is an action sequence howeversometimes we only have observations of environment statesfrom real or virtual sensors and translating observations ofstates to actions is not easy (b) the goal is estimated as a staticparameter however it may be interrupted and changed inone episode

Recently the computational state space model (CSSM)became more and more popular for human behavior model-ing [15 16] In the CSSM transitionmodels of the underlyingdynamic systemcan be described by any computable functionusing compact algorithmic representations Kruger et alalso discussed the performances of applying the CSSM onintention recognition in different real scenarios [16 17] Inthis research (a) intentions as well as actions and envi-ronment states are modeled as hidden states which canbe inferred by online filtering algorithm (b) observationsreflect not only primitive actions but also environment statesThe limitation of the research in [17] is that goal inferenceis not implemented in scenarios where results of actionsare uncertain Another related work on human behaviormodeling under the MDP framework was done by Tastan etal [18] By making use of the inverse reinforcement learning(IRL) and the PF they learned the opponentrsquos motion modeland tracked it in the game Unreal 2004 This work made

the following contributions (a) the features for decision inthe pursuit problem were abstracted (b) IRL was used tolearn the reward function of the opponent (c) the solveddecision model was regarded as the motion function inthe PF However IRL relies on a large dataset and Tastanrsquosmethod is proposed for tracking but not for goal recognition

22 Multiagent Goal Recognition Based on Inverse PlanningThe inverse planning theory can also be used in the multi-agent domain Baker et al inferred relational goals betweenagents (such as chasing and fleeing) by using multiagentMDP framework to model interactions between agents andthe environment [19] In this model each agent selectedactions based on the world state its goal and its beliefsabout other agentsrsquo goals Mental state inference is doneby inverse planning under the assumption that all agentsare approximately rational Ullman et al also successfullyapplied this theory in more complex social goals such ashelping and hindering where an agentrsquos goals depend onthe goals of other agents [20] In the military domainRiordan et al borrowed Bakerrsquos idea and applied Bayesianinverse planning to inferred intents in multi-UnmannedAerial Systems (UASs) [21] Additionally IRL was also usedto learn reward function Even though Bakerrsquos theory is quitepromising it can only work when every agent has accurateknowledge of the world state because the multiagent MDPdoes not model the observing process Besides Bayesianinverse planning does not allow the goal to change Anotherrelated work under the Markov decision framework inmultiagent settings was done by Doshi et al [22] Althoughtheir main aim is to learn the agentsrsquo behavior modelswithout recognizing goals the process of estimating mentalstates is very similar to Bayesian approaches for probabilisticplan recognition In Doshirsquos work the interactive partiallyobservable Markov decision process (I-POMDP) was used tomodel interactions between agents I-POMDP is an extensionof POMDP formultiagent settings Comparing to POMDP I-POMDP defines an interactive state space which combinesthe traditional physical state space with explicit models ofother agents sharing the environment in order to predict theirbehavior Thus I-POMDP is applicable in situations whereagents may have identical or conflicting objectives HoweverI-POMDP has to deal with the problem ldquowhat do you thinkthat I think that you thinkrdquo which makes finding optimal orapproximately optimal policies very hard [23] Actually inmany multiagent scenarios such as the football game or thefirst person shooting game the agents being recognized sharea common goal This makes the Dec-POMDP frameworksufficient for modeling cooperation behaviors Additionallythe increasing interestsnumber of works of planning theorybased onDec-POMDP can provide us with a large number ofplanners [24 25]

23 Multiagent Goal Recognition Based on DBN Filtering Ifall actions and world states (the agent and the environment)are defined as variables with time labels the MDP can beregarded as a special case of directed probabilistic graphicalmodels (PGMs) With this idea some people ignore the

4 Discrete Dynamics in Nature and Society

reward function but only concern themselves with the poli-cies and infer goals under the dynamic Bayesian frameworkFor example Saria and Mahadevan presented a theoreticalframework for online probabilistic plan recognition in coop-erative multiagent systems This model extends the AbstractHidden Markov Model and consists of a Hierarchical Mul-tiagent Markov Process that allows reasoning about theinteraction among multiple cooperating agents The Rao-Blackwellized particle filtering (RBPF) is also used for theinference [26 27] Pfeffer et al [28] studied the problem ofmonitoring goals team structure and state in a dynamicenvironment an urban warfare field where uncoordinatedor loosely coordinated units attempt to attack a target Theyused the standard DBN to model cooperation behaviors(communication group constitution) and the world statesAn extension of the PF named factored particle filtering isalso exploited in the inference We also proposed a LogicalHierarchical Hidden Semi-Markov Model (LHHSMM) torecognize goals as well as cooperation modes of a team ina complex environment where the observation was noisyand partially missing and the goal was changeable TheLHHSMM is a branch of the Statistical Relational Learning(SRL) method which combines the PGM and the first orderlogic it also presents the team behaviors in a hierarchicalway The inference for the LHHSMM was done by a logicalparticle filer These works based on directed PGM theoryhave the advantage that they can use filtering algorithmHowever they suffer some problems (a) constructing thegraph model needs a lot of domain knowledge and we haveto vary the structure in different applications (b) the graphstructure will be very complex when the number of agents islarge which will make parameter learning and goal inferencetime consuming sometimes even infeasible (c) they need atraining dataset Othermodels based on data-driven trainingsuch as the Markov logic networks and deep learning havethe same problems listed above [29 30]

3 The Model

We propose the Dec-POMDM for formalizing the worldstates behaviors and goals in the problem In this sectionwe first introduce the formal definition of the Dec-POMDMand explain relations among variables in thismodel by aDBNrepresentation Then the planning algorithm for finding outthe policies is given

31 Formalization One of foundations of our POMDM is thewidely applied Dec-POMDP However the Dec-POMDP isproposed for solving decision problem and there is no defini-tion of the goal and the observation model for the recognizerin the Dec-POMDP Additionally the multiagent joint goalmay be terminated because of achievement or interruptionThus we design the Dec-POMDM as a combination of threeparts (a) the standard Dec-POMDP (b) the joint goal andgoal termination variable (c) the observation model for therecognizer The Dec-POMDP is the foundation of the Dec-POMDM

A Dec-POMDM is a tuple ⟨119863 119878 119860 119879 119877Ω119874 120582 ℎ 119868 119866

119864 Υ 119884 119885 119862 119861⟩ where

(i) 119863 = 1 2 119899 is the set of 119899 agents(ii) 119878 is a finite set of world states 119904 which contains all

necessary information for making a decision(iii) 119860 is the finite set of joint actions(iv) 119879 is the state transition function(v) 119877 is the reward function(vi) Ω is the finite set of joint observations for agents

making a decision(vii) 119874 is the observation probability function for agents

making a decision(viii) 120582 is the discount factor(ix) ℎ is the horizon of the problem(x) 119868 is the initial state distribution at stage 119905 = 0(xi) 119866 is the set of possible joint goals(xii) 119864 is the set of goal termination variables(xiii) Υ is the observation function for the recognizer(xiv) 119884 is the finite set of joint observations for the recog-

nizer(xv) 119885 is the goal selection function(xvi) 119862 is the goal termination function(xvii) 119861 is the initial goal distribution at stage 119905 = 0

Symbols including 119863 119878 119879 Ω 119874 120582 ℎ 119868 in the Dec-POMDM have the same meanings as those in the Dec-POMDP More definition details and explanations can befound in [9 27] The reward function is defined as 119877 119866 times

119878 times 119860 times 119878 rarr R which shows that the reward depends on thejoint goal the goal set 119866 consists of all possible joint goalsthe goal termination variable set 119864 = 0 1 indicates whetherthe current goal will be continued in the next step (if 119890 isin 119864

is 0 the goal will be continued otherwise a new goal will beselected again in the next step) the observation function forthe recognizer is defined asΥ 119878times119884 rarr [0 1]Υ(119904 119910) = 119901(119910 |

119904) is the probability that the recognizer observes 119910 isin 119884 whilethe real worlds state is 119904 isin 119878 the goal selection function isdefined as 119885 119878 times 119866 rarr [0 1] 119885(119904 119892119900119886119897) = 119901(119892119900119886119897 | 119904) is theconditional probability that agents select 119892119900119886119897 isin 119866 as the newgoal while theworld state is 119904 the goal termination function isdefined as119862 119878times119866 rarr [0 1]119862(119904 119892119900119886119897) = 119901(119890 = 1 | 119904 119892119900119886119897) isthe conditional probability that agents terminate their 119892119900119886119897 isin119866 while the world state is 119904

In the Dec-POMDP the policy of the 119894th agent is definedas Π119894 Ωlowast

119894times 119860119894rarr [0 1] where 119860

119894isin 119860 is the set of possible

actions of agent 119894 Ωlowast119894is the set of observation sequences

Thus given an observation sequence 119900119894

1 119900119894

2 119900

119894

119905 the

agent 119894 selects an action with a probability defined by Π119894

Since the selection of actions depends on the history of theobservations the Dec-POMDP does not satisfy the Markovassumption This attribute makes inferring goals online veryhard (a) if we precompute policies and store them offline itwill require a very large memory because of the combination

Discrete Dynamics in Nature and Society 5

goaltminus1

etminus1

oi

tminus1oj

tminus1

ai

tminus1

aj

tminus1Agent i

Stminus1

ytminus1

Time t minus 1

goalt

et

oi

toj

tai

t

aj

t

St

yt

Time t

Figure 1 The DBN representation of the Dec-POMDM in two adjacent time slices

of possible observations (b) if we compute policies online thefiltering algorithm is infeasible because weights of possiblestates cannot be updated with only the last observation Onepossible solution is to define an interior belief variable foreach agent to filter the world state but it will make theinferring process much more complex In this paper wesimply assume that all agents are memoryless as the work in[9]Then the policy of the 119894th agent is defined asΠ119894 119866timesΩ

119894times

119860119894rarr [0 1] where Ω

119894is the set of possible observations of

agent 119894 The definition of policies in the Dec-POMDM showsthat (a) an agent does not need the history of observationsfor making decisions (b) selection of actions depends on thegoal at that stage In the next section we will further explainthe relations of variables in the Dec-POMDM by its DBNrepresentation

32 The DBN Representation After estimating the agentsrsquopolicies by a multiagent reinforcement learning algorithmwe do not need the reward function in the inference processThus in the DBN representation of the Dec-POMDM thereare six sorts of variables in total the goal the goal termi-nation the action observations for agents making decisionstate and observations for the recognizer In this section wefirst analyze how these variables are affected by other factorsThen we give the DBN representation in two adjacent timeslices

(A) Goal (119892119900119886119897) The goal 119892119900119886119897119905at time 119905 depends on the goal

119892119900119886119897119905minus1

the goal termination variable 119890119905minus1

and the state 119904119905minus1

If 119890119905minus1

= 0 agents keep their goal at time 119905 otherwise 119892119900119886119897119905is

selected depending on 119904119905minus1

(B) Goal Termination Variable (119890) The goal terminationvariable 119890

119905at time 119905 depends on the goal 119892119900119886119897

119905and state 119904

119905

at the same timeThe 119892119900119886119897119905is terminated with the probability

119901(119890119905= 1 | 119904

119905 119892119900119886119897119905)

(C) Action (119886) The action 119886119894

119905selected by agent 119894 at time 119905

depends on the goal 119892119900119886119897119905and its observation at the same

time and the distribution of actions is defined by Π119894

(D) Observations for Agents Making Decision (119900) The obser-vation 119900

119894

119905of agent 119894 at time 119905 reflects the real world state 119904

119905minus1

at time 119905 minus 1 and the agent 119894 observes 119900119894119905with the probability

119901(119900119894

119905| 119904119905minus1

)

(E) State (119904)The world state 119904119905at time 119905 depends on the state

119904119905minus1

at time 119905 minus 1 and the actions of all agents at time 119905 Thedistribution of the updated states can be computed by thestate transition function 119879

(F) Observations for the Recognizer (119910) The observation 119910119905

for the recognizer at time 119905 reflects the real world state 119904119905

at the same time and the recognizer observes 119910119905with the

probability 119901(119910119905| 119904119905)

The DBN representation of the Dec-POMDM in twoadjacent time slices presents all dependencies among vari-ables discussed above as is shown in Figure 1

In Figure 1 only actions and observations of agent 119894 andagent 119895 are presented for simplicity and each agent has noknowledge about others and can only make decision basedon its own observations Although the Dec-POMDM hasa hierarchical structure it models the task decompositionand allocation in an inexplicit way all information aboutcooperation is hidden in the joint policies From filteringtheory point of view the joint policies actually play the roleof the motion function Thus estimating policies is a keyproblem for goal inference

33 Learning the Policy Because the Dec-POMDP is essen-tially a DBN we can simply use some data-driven methodsto learn parameters from a training dataset However thetraining dataset is not always available in some cases Besideswhen the number of agents is large the DBN structurewill become large and complex which makes supervisedor unsupervised learning time consuming To solve theseproblems we assume that the agents to be recognized arerational which is reasonable when there is no history ofagents Then we can use an existing planner based onthe Dec-POMDP framework to find out the optimal orapproximately optimal policies for each possible goal

6 Discrete Dynamics in Nature and Society

Various algorithms have been proposed for solving Dec-POMDP problems Roughly these algorithms can be dividedinto two categories (a) model-based algorithms under thegeneral name of dynamic programming (b) model-free algo-rithms under the general name of reinforcement learning [8]In this paper we select a multiagent reinforcement learningalgorithm cooperative colearning based on Sarsa [9] becauseit does not need a state transition function of the world

The main idea of cooperative colearning is that at eachstep one chooses a subgroup of agents and updates theirpolicies to optimize the task given the rest of the agentshave fixed plans then after a number of iterations the jointpolicies can converge to a Nash equilibrium In this paperwe only consider settings where agents are homogeneousAll agents share the same observation model and policiesThus we only need to define one POMDP 119872 for all agentsAll parameters of119872 can be obtained directly from the givenDec-POMDP except for the transition function 119879

1015840 Later wewill show how to compute 1198791015840 from 119879 which is the transitionfunction of the Dec-POMDPThe Dec-POMDP problem canbe solved by the following steps [9]

Step 1 We set 119905 = 0 and start from an arbitrary policy Π0

Step 2 We select an arbitrary agent and compute the statetransition function 119879

1015840 of 119872 considering that policies of allagents are Π

119905 except the selected agent that is refining theplan

Step 3 We computeΠlowast which is the optimal policy of119872 andset Π119905+1 = Π

lowast

Step 4 Weupdate the policy of each agent toΠ119905+1 set 119905 = 119905+1and return to Step 2

In Step 2 the state transition function of the POMDP forany agent can be computed by formula (1) (we assume that werefine the plan for the first agent)

1198791015840(119904 1198861 1199041015840) = sum

(1199002119900119899)

119879 (119904 1198861 Π119905(1199002) Π

119905(119900119899) 1199041015840)

sdot

119899

prod

119894=2

1198741015840(119904 119900119899)

(1)

where 1198741015840 is the observation function defined in 119872 119886119894is the

action of the 119894th agent 119900119894 is the observation of the 119894th agent119879(119904 1198861 Π119905(1199002) Π

119905(119900119899) 1199041015840) is the probability that the state

transits from 119904 to 1199041015840 while the action of the first agent is 119886

1

and other agents choose actions based on their observationsand policy Π119905

Unfortunately computing formula (1) is always verydifficult in complex scenarios Thus we apply the Sarsaalgorithm for finding out the optimal policy of119872 (in Steps 2and 3) [8] In this process the POMDP problem is mapped tothe MDP problem by regarding the observation as the worldstate Then agents get feedback from the environment andwe do not need to compute the updated reward function andstate transition function

4 Inference

In the Dec-POMDM the goal is defined as a hidden stateindirectly reflected by observations In this case many filter-ing algorithms can be used to infer the goals However in themultiagent setting the world state space is large because ofcombinations of agentsrsquo states and goals Besides the Dec-POMDMmodels a discrete system To solve this problem weuse a MF algorithm to infer multiagent joint goals under theframework of the Dec-POMDM It has been proved efficientwhen the state space is large and discrete

Nyolt et al discussed the theory of the MF and howto apply it in a casual model in [12 13] Their main ideacan still work in our paper but there are also differencesbetween the Dec-POMDM and the causal model (a) theinitial distribution of states in our model is uncertain (b)the results of actions in our model are uncertain (c) we donot model duration of actions for simplicity Thus we haveto modify the MF for casual models and make it available forthe Dec-POMDM

When the MF is applied to infer goals under the frame-work of the Dec-POMDM the set of particles is definedas 119909119894

119905119894=1119873

119905 where 119909119894

119905= ⟨119892119900119886119897

119894

119905 119890119894

119905 119904119894

119905⟩ 119873119905is the number

of particles at time 119905 and the weight of 119894th particle is119908119894

119905 The detailed procedures of goal inference are given in

Algorithm 1119898119905is the set of 119909

119894

119905 119908119894

119905119894=1119873

119905 which contains all par-ticles and their weights at each stage When we initialize119909119894

0 119908119894

0119894=1119873

0 the weights are computed by

119908119894

0= 119901 (119904

119894

0| 1199100) 119901 (119892119900119886119897

119894

0) 119901 (119890119894

0| 119892119900119886119897

119894

0 119904119894

0) (2)

119901(119892119900119886119897119894

0) and 119901(119890119894

0| 119892119900119886119897

119894

0 119904119894

0) are provided by the model and

119901(119904119894

0| 1199100) is computed by

119901 (119904119894

0| 1199100) prop 119901 (119910

0| 119904119894

0) 119901 (119904119894

0) (3)

119898119886119901 is a temp memory of particles and their weights whichare transited from one specific particle Because a worldstate 119904

119905may be after the execution of different actions from

given 119904119905minus1

we have to use a LOOKUP operation to querythe record in 119898119886119901 which covers the new particle 119909

119905 The

operation LOOKUP(119909119905 119898119886119901) searches 119909

119905in 119898119886119901 if there

is a record ⟨119909119895

119905 119908119895

119905⟩ in 119898119886119901 which covers 119909

119905 the operation

returns this record otherwise it returns empty This processis time consuming if we scan the 119898119886119901 for every query Onealternative method is to build a matrix for each 119898119886119901 whichrecords the indices of all reached particles Then if the indexof 119909119905is null we add a new record in 119898119886119901 and update the

index matrix otherwise we can read the index of 119909119905from the

matrix andmerge the weight directly After we finish building119898119886119901 its index matrix can be deleted to release memory Wealso need to note that this solution saves computing time butneeds extra memory

The operation PUT(119898119886119901 ⟨119909119905 119908119905⟩) adds a new record

⟨119909119905 119908119905⟩ in 119898119886119901 and indexes this new record The generated

119898119886119901 contains a group of particles and correspondingweightsSome of these particles may have been covered in 119898

119905 Thus

we have to use a MERGE operation to get a new 119898119905by

Discrete Dynamics in Nature and Society 7

Set 119905 = 01198980larr 119909

119894

0 119908119894

0119894=11198730 Initialization

For 119905 = 1 max 119905119898119905larr 119873119906119897119897 We totally have max 119905 + 1 observations

Foreach 119909119894

119905minus1 119908119894

119905minus1 isin 119898

119905minus1

119898119886119901 larr997888 119873119906119897119897

If 119890119894119905minus1

= 1

Foreach 119892119900119886119897119905which satisfies that 119901(119892119900119886119897

119905| 119904119894

119905minus1) gt 0

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1) sdot 119901 (119892119900119886119897

119905| 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in119898119886119901

Else119892119900119886119897119905larr997888 119892119900119886119897

119894

119905minus1

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in map

MERGE(119898119905 119898119886119901)

UPDATE(119898119905)

PRUNE(119898119905)

NORMALIZE(119898119905)

DELETE(119898119905minus1

)

Algorithm 1 Goal inference based on the marginal filter under the framework of Dec-POMDM

merging119898119886119901 and the existing119898119905 In this process if a particle

119909119905in 119898119886119901 has not appeared in 119898

119905 we directly put 119909

119905and

its corresponding weight 119908119905into 119898

119905 otherwise we need to

add 119908119905into the weight of the record in 119898

119905which covers 119909

119905

Similarly an indexmatrix can also be used to save computingtime in the MERGE operation

Under the framework of Dec-POMDM we update 119908119894119905by

119908119894

119905= 119908119894

119905sdot 119901 (119910119905| 119904119894

119905) (4)

where 119908119894119905and 119909

119894

119905= ⟨119892119900119886119897

119894

119905 119890119894

119905 119904119894

119905⟩ are the weight and particle

respectively 119894th of record in 119898119905 and 119901(119910

119905| 119904119894

119905) is the

observation function for the recognizer in the model Thedetails of PRUNE operation can be found in [13] and we canuse the existing pruning technique directly in our paper

5 Experiments

51ThePredator-Prey Problem Thepredator-prey problem isa classic problem to evaluate multiagent decision algorithms

[9] However it cannot be used directly in this paper becausepredators have only one possible goal Thus we modify thestandard problem by setting more than one prey on the mapand our aim is to recognize the real target of the predatorsat each time Figure 2 shows the grid-based map in ourexperiments

In our scenario the map consists of 5 times 5 grids twopredators (red diamonds Predator PX and Predator PY)and two preys (blue stars Prey PA and Prey PB) The twopredators select one of the preys as their common goal andmove around to capture it As is shown in Figure 2 theobservation of the predators is not perfect a predator onlyknows the exact position of itself and others which are in thenearest 8 grids If another agent is out of its observing rangethe predator only knows its direction (8 possible directions)For the situation in Figure 2 Predator PX observes that noneis near to itself Prey PB is in the north Prey PA is in thesoutheast and Predator PY is in the south Predator PYobserves that none is near itself Prey PB and Predator PYare in the north and Prey PA is in the east In each step

8 Discrete Dynamics in Nature and Society

PA

PB

PX

PY

PX

PY PA

PB

PY PA

PB PX

Observation of Predator PY

Observation of Predator PX

agents making a decisionΩ observations for

Figure 2 The grid-based map in the predator-prey problem

all predators and preys can get into one of the four adjacentgrids (north south east and west) or stay at the currentgrid When two or more agents try to get into the samegrid or try to get out of the map they have to stay in thecurrent grid The predators can achieve their goal if and onlyif both of them are adjacent to their target Additionally thepredators may also change their goal before they capture aprey The recognizer can get the exact positions of the twopreys but its observations of the predators are noisyWe needto compute the posterior distribution of predatorsrsquo goals withthe observation trace

The modified predator-prey problem can be modeled bythe Dec-POMDM Some important elements are as follows

(a) 119863 the two predators(b) 119878 the positions of predators and preys(c) 119860 five actions for each predator moving into four

directions and staying(d) 119866 Prey PA or Prey PB(e) Ω the directions of agents far away and the real

positions of agents nearby(f) 119884 the real positions of preys and the noisy positions

of predators(g) 119877 predators getting a reward +1 once they achieve

their goal otherwise the immediate reward is 0(h) ℎ infinite horizons

With the definition above the effects of predatorrsquos actionsare uncertain and the state transition function depends onthe distribution of preysrsquo actions Thus actions of preysactually play the role of events in discrete dynamic systems

52 Settings In this section we provide additional details forthe scenario and the parameters used in the policy learningand inference algorithm

(A) The Scenario The preys are senseless they select eachaction with equal probability Initial positions of agents are

uniform The initial goal distribution is that 119901(1198921199001198861198970

=

119875119903119890119910119860) = 06 and 119901(1198921199001198861198970= 119875119903119890119910119861) = 04

We simplify the goal termination function in the follow-ing way if predators capture their target the goal is achievedotherwise the predators change their goal with a probabilityof 005 for every stateThe observation for the recognizer 119910

119905isin

119884 reveals the real position of each predator with a probabilityof 05 and the observation may be one of 8 neighbors of thecurrent grid with a probability of 058 When the predator ison the edge of the map the observation may be out of themap The observed results of the agents do not affect eachother

(B) Multiagent Reinforcement Learning. The discount factor is $\lambda = 0.8$. A predator selects an action $a_i$ given its observation $o^i_t$ with the probability

$$p(a_i \mid o^i_t) = \frac{\exp\bigl(Q(o^i_t, a_i)/\beta\bigr)}{\sum_{k=1}^{5}\exp\bigl(Q(o^i_t, a_k)/\beta\bigr)}, \qquad (5)$$

where $\beta = 0.1$ is the Boltzmann temperature. We set $\beta > 0$ as a constant, which means that the predators always select approximately optimal actions. In our scenario, the $Q$-value converges after 750 iterations. In the learning process, if the predators cannot achieve their goal within 10000 steps, we reinitialize their positions and begin the next episode.
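A minimal sketch of this Boltzmann (softmax) action selection, assuming a Q-table indexed by observation-action pairs; the names `boltzmann_action` and `q_table` are illustrative:

```python
import math
import random

def boltzmann_action(q_table, obs, actions, beta=0.1):
    """Sample an action according to (5): p(a | o) proportional to exp(Q(o, a) / beta).

    q_table maps (observation, action) pairs to Q-values; a small, constant
    temperature beta makes the policy close to greedy, i.e., approximately optimal.
    """
    m = max(q_table[(obs, a)] for a in actions)                      # for numerical stability
    weights = [math.exp((q_table[(obs, a)] - m) / beta) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]
```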

(C) The Marginal Filter. In the MF inference, a particle consists of the joint goal, the goal termination variable, and the positions of the predators and preys. Although there are 25 possible positions for each predator or prey, after getting a new observation we can identify the positions of the preys, and there are only 9 possible positions for each predator. Thus, the number of possible values of the particles at one step never exceeds 9 × 9 × 2 × 2 = 324 after the UPDATE operation. In our experiments, we simply set the maximal number of particles to 324; then we do not need to prune any possible particles, which means that we exploit an exact inference. We also make use of the real settings of the Dec-POMDM and the real policies of the predators in the MF inference.
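The distinguishing feature of the marginal filter is that it keeps at most one weighted entry per distinct state, merging weights instead of duplicating particles. A compressed sketch of the prediction, MERGE, and UPDATE steps is given below under our own assumptions (particle layout and helper names are ours; the pruning step is omitted because the experiments use the exact MF):

```python
from collections import defaultdict

def mf_update(particles, transitions, obs_likelihood, y_t):
    """One marginal-filter step over a discrete state space.

    particles: dict mapping a hashable particle (goal, e, positions) -> weight.
    transitions(particle): yields (next_particle, probability) pairs under the model.
    obs_likelihood(particle, y_t): p(y_t | s_t), the recognizer's observation model.
    Identical successor particles are merged by summing their weights, so the
    number of entries is bounded by the number of reachable states (here 324)
    rather than growing through duplication as in a particle filter.
    """
    predicted = defaultdict(float)
    for particle, weight in particles.items():
        for nxt, prob in transitions(particle):
            predicted[nxt] += weight * prob              # MERGE step: no duplicates kept
    updated = {p: w * obs_likelihood(p, y_t) for p, w in predicted.items()}
    total = sum(updated.values())                        # assumed > 0 for a consistent trace
    return {p: w / total for p, w in updated.items() if w > 0.0}
```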


Table 1: The details of the two traces.

Trace number    Durations       Targets    Interrupted
1               t ∈ [1, 7]      Prey PA    Yes
                t ∈ [8, 13]     Prey PB    No
4               t ∈ [1, 10]     Prey PB    No

5.3. Results and Discussion. With the learned policy, we ran the simulation model repeatedly and generated a test dataset consisting of 100 traces. There are 11.83 steps on average in one trace, and the number of steps in one trace varies from 6 to 34 with a standard deviation of 5.36. We also found that the predators changed their goals at least once in 35% of the test traces. To validate our method and compare it with baselines, we did experiments on three aspects: (a) to discuss the details of the recognition results obtained with our method, we computed the recognition results of two specific traces; (b) to show the advantages of the Dec-POMDM in modeling, we compared the recognition performances when the problem was modeled as a Dec-POMDM and as an HMM; (c) to show the efficiency of the MF under the framework of the Dec-POMDM, we compared the recognition performances when the goal was inferred by the MF and by the PF.

In the second and the third parts, performances were evaluated by the statistical metrics precision, recall, and $F$-measure. Their meanings and computation details can be found in [31]. The value of each metric is between 0 and 1; a higher value means a better performance. Since these metrics can only evaluate the recognition results at a single step, and traces in the dataset had different time lengths, we defined a positive integer $k$ ($k = 1, 2, \ldots, 5$). The metric with $k$ means that the corresponding observation sequences are $y^{j}_{1:\lceil k \cdot length_j/5\rceil}$, $j \in \{1, \ldots, 100\}$. Here $y^{j}_{1:\lceil k \cdot length_j/5\rceil}$ is the observation sequence from time 1 to time $\lceil k \cdot length_j/5 \rceil$ of the $j$th trace, and $length_j$ is the length of the $j$th trace. We need to recognize $goal_t$ for each observation sequence. Thus, metrics with different $k$ show the performances of the algorithms in different phases of the simulation. Additionally, we regarded the goal with the largest probability as the final recognition result.
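A minimal sketch of this evaluation protocol follows. The exact metric computation is the one of [31]; here we assume a common macro-averaged variant over the two goals, and the names `prefix_length`, `metrics_at_k`, and `recognize` are illustrative:

```python
import math
from collections import Counter

def prefix_length(trace_length, k):
    """Observation prefix used for the metric with k: times 1 .. ceil(k * length / 5)."""
    return math.ceil(k * trace_length / 5)

def metrics_at_k(traces, recognize, k, goals=("Prey PA", "Prey PB")):
    """Macro-averaged precision, recall, and F-measure at fraction k/5 of each trace.

    traces: list of (observations, true_goal); recognize maps an observation
    prefix to the goal with the largest posterior probability.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for observations, true_goal in traces:
        predicted = recognize(observations[:prefix_length(len(observations), k)])
        if predicted == true_goal:
            tp[true_goal] += 1
        else:
            fp[predicted] += 1
            fn[true_goal] += 1
    def safe(n, d):
        return n / d if d else 0.0
    precision = sum(safe(tp[g], tp[g] + fp[g]) for g in goals) / len(goals)
    recall = sum(safe(tp[g], tp[g] + fn[g]) for g in goals) / len(goals)
    f_measure = safe(2 * precision * recall, precision + recall)
    return precision, recall, f_measure
```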

(A) The Recognition Results of the Specific Traces. To show the details of the recognition results obtained with our method, we selected two specific traces from the dataset (number 1 and number 4). These two traces were selected because Trace number 1 was the first trace where the goal was changed before it was achieved, and Trace number 4 was the first trace where the goal was kept until it was achieved. The detailed information about the two traces is shown in Table 1.

In Trace number 1, the predators selected Prey PA as their first goal from $t = 1$ to $t = 7$. Then the goal was changed to Prey PB, which was achieved at $t = 13$. In Trace number 4, the predators selected Prey PB as their initial goal; this goal was kept until it was achieved at $t = 10$. Given the real policies and the other parameters of the Dec-POMDM, including $T$, $O$, $\Upsilon$, $I$, $Z$, $C$, and $B$, we used the MF to compute the posterior distribution of goals at each time. The recognition results are shown in Figure 3.

In Figure 3(a), the probability of the real goal (Prey PA) increases fast during the initial period. At $t = 3$, the probability of Prey PA exceeds 0.9 and keeps a high value until $t = 7$. When the goal is changed at $t = 8$, our method responds very fast, because the predators select highly certain joint actions at this time. In Figure 3(b), the probability of the false goal increases at $t = 2$, and the probability of the real goal (Prey PB) is low at first. This failure happens because the observations suggest that the predators selected joint actions which have small probability if the goal is Prey PB. Nevertheless, the probability of the real goal increases continuously and exceeds that of Prey PA after $t = 5$. From the recognition results of the two specific traces, we conclude that the Dec-POMDM and the MF can perform well regardless of whether the goal changes.

(B) Comparison of the Performances of the Dec-POMDM and HMMs. To show the advantages of the Dec-POMDM, we modeled the predator-prey problem as the well-known HMM as a baseline. In the HMM, we set the hidden state $x$ as the tuple $\langle goal, p_X, p_Y, p_A, p_B \rangle$, where $p_X$, $p_Y$, $p_A$, and $p_B$ are the positions of Predator PX, Predator PY, Prey PA, and Prey PB, respectively. The observation model for the recognizer in the HMM was the same as that in the Dec-POMDM. Thus, there were 25 × 24 × 23 × 22 × 2 = 607200 possible states in this HMM, and the dimension of the transition matrix was 607200 × 607200. Since the state space and the transition matrix were too large, an unsupervised learning method such as the Baum-Welch algorithm was infeasible in this problem. Instead, we used simple supervised learning, counting the state transitions in labeled training datasets. With the real policies of the predators, we ran the simulation model repeatedly and generated five training datasets. The detailed information on these datasets is shown in Table 2.

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way, we obtained the new datasets HMM-100b and HMM-200b, which did not cover any trace in the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions. Then the MF was used to infer the goal. However, in the inference process we may reach some states that have not been explored in the training dataset (HMM-200a only explores 492642 states, but there are 607200 possible states in total). In this case, we assumed that the hidden state would transit to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compared the performances of the Dec-POMDM and the HMMs. The recognition metrics are shown in Figure 4.
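A minimal sketch of this counting-based transition estimate with a uniform fallback for unexplored states; the sparse-dictionary layout and helper names are our own, since the paper does not prescribe an implementation:

```python
from collections import defaultdict

def estimate_transitions(labeled_traces):
    """Estimate p(x' | x) by counting consecutive hidden states in labeled traces."""
    counts = defaultdict(lambda: defaultdict(int))
    for states in labeled_traces:                       # each trace is a list of hidden states
        for x, x_next in zip(states, states[1:]):
            counts[x][x_next] += 1
    return {x: {x2: c / sum(nxt.values()) for x2, c in nxt.items()}
            for x, nxt in counts.items()}

def transition_prob(model, x, x_next, num_states=607200):
    """p(x' | x): counted frequency if x was explored, otherwise a uniform fallback."""
    if x in model:
        return model[x].get(x_next, 0.0)
    return 1.0 / num_states                             # unexplored state: uniform assumption
```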

Figure 4 shows the recognition results of the Dec-POMDM and of the HMMs trained on the different datasets.

Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF ((a) Trace number 1; (b) Trace number 4; each panel plots the probabilities of Prey PA and Prey PB over time).

Table 2: Information on the training datasets for the HMM.

Name of the dataset    Number of traces    Number of explored states    Percentage of traces in the test dataset covered by the training dataset
HMM-50                 50000               279477                       0%
HMM-100a               100000              384623                       79%
HMM-100b               100000              384621                       0%
HMM-200a               200000              492642                       79%
HMM-200b               200000              492645                       0%

The results are similar in terms of precision, recall, and $F$-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after $k = 4$; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though the number of traces in HMM-200a was twice as large as that in HMM-100a. The main reason is that the training datasets were overfitted. Actually, there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a. In this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on the three metrics, almost as well as the overfitted trained models; (b) the learned HMMs suffered from an overfitting problem, and there will be a serious performance decline if the training dataset does not cover most traces in the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also show that, when we model the problem with the HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. Actually, the dataset HMM-200a, which consisted of 200000 traces, only covered 81.13% of all possible states, and only 71.46% of the states in HMM-200a had been reached more than once. Thus, we can conclude that HMM-200a will have a poor performance if the agents select actions with higher uncertainty. Besides, there is no doubt that the size of the training dataset will be extremely large if most states are to be reached a large number of times. In real applications, it is very hard to obtain a training dataset of such a large size, especially when all traces have to be labeled.

We also performed a Wald test over the Dec-POMDM and the HMMs with the different training datasets to prove that their recognition results come from different distributions. Given our test dataset, there were 100 goals to be recognized for each value of $k$. Let $TX$ be the set of samples obtained from the Dec-POMDM; we set $TX_i = 1$ if the recognition result of the Dec-POMDM is correct on test case $i$ ($i = 1, 2, \ldots, 100$) and $TX_i = 0$ otherwise. Let $TY$ be the set of samples obtained from the baseline model (one of the HMMs); we set $TY_i = 1$ if the recognition result of the baseline model is correct on test case $i$ and $TY_i = 0$ otherwise. We define $TD_i = TX_i - TY_i$ and let $\delta = \mathrm{E}(TD_i) = \mathrm{E}(TX_i) - \mathrm{E}(TY_i) = P(TX_i = 1) - P(TY_i = 1)$. The null hypothesis is $\delta = 0$, which means that the recognition results from the different models follow the same distribution. A more detailed test process can be found in [32]. The matrix of $p$ values is shown in Table 3.
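A compact sketch of this paired Wald test on the per-case correctness indicators, following the standard construction described in [32]; the function name and the normal-approximation details are ours:

```python
import math
from statistics import NormalDist

def paired_wald_test(tx, ty):
    """Two-sided Wald test of delta = E(TX_i) - E(TY_i) = 0.

    tx, ty: equally long lists of 0/1 correctness indicators for the two models.
    Returns the p value based on the asymptotic normality of the mean difference.
    """
    n = len(tx)
    d = [x - y for x, y in zip(tx, ty)]
    delta_hat = sum(d) / n
    var_hat = sum((di - delta_hat) ** 2 for di in d) / n   # plug-in variance estimate
    if var_hat == 0.0:
        return 1.0 if delta_hat == 0.0 else 0.0
    w = delta_hat / math.sqrt(var_hat / n)                  # Wald statistic
    return 2 * (1 - NormalDist().cdf(abs(w)))
```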

The $p$ values in Table 3 show that the recognition results of the Dec-POMDM follow distributions different from those of HMM-50, HMM-100b, and HMM-200b, respectively, with a high confidence. We cannot reject the null hypothesis when the baseline is an overfitted trained model. The Wald test results are consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and the test results showed the same trend.

Figure 4: Recognition metrics of the Dec-POMDM and the HMMs ((a) precision; (b) recall; (c) $F$-measure; each panel plots the metric against $k$ for HMM-50, HMM-100a, HMM-100b, HMM-200a, HMM-200b, and the Dec-POMDM).

Table 3: The p values of the Wald test over the Dec-POMDM and the HMMs.

                               k = 1    k = 2     k = 3     k = 4     k = 5
Dec-POMDM versus HMM-50        1        <0.01     <0.01     <0.01     <0.01
Dec-POMDM versus HMM-100a      1        0.0412    0.0478    0.2450    <0.01
Dec-POMDM versus HMM-100b      1        <0.01     <0.01     <0.01     <0.01
Dec-POMDM versus HMM-200a      1        0.3149    0.1929    0.4127    1
Dec-POMDM versus HMM-200b      1        <0.01     <0.01     <0.01     <0.01

(C) Comparison of the Performances of the MF and the PF. Here we exploited the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared their performances to the MF. All inference was done under the framework of the Dec-POMDM. We have to note that, when the PF uses weighted particles to approximate the posterior distribution of the goal, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case, we simply reset all weights to $1/N_p$ to continue the inference, where $N_p$ is the number of particles in the PF. The recognition metrics of the MF and the PF are shown in Figure 5.
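A small sketch of this weight-reset safeguard inside a particle-filter reweighting step; the surrounding PF machinery is assumed, and only the reset rule comes from the text above:

```python
def reweight_with_reset(weights, likelihoods):
    """Multiply PF weights by observation likelihoods; reset if all collapse to 0.

    weights, likelihoods: lists of equal length Np. When no particle is
    consistent with the observation, all products are 0 and the weights are
    reset to the uniform value 1 / Np so that the inference can continue.
    """
    new_weights = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(new_weights)
    if total == 0.0:
        n_p = len(weights)
        return [1.0 / n_p] * n_p
    return [w / total for w in new_weights]
```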

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1000, 2000, 4000, 6000, and 16000 particles, respectively. All filters had similar precision. However, considering the recall and the $F$-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the rest of the PFs. We got these results because an exact MF (without the pruning step) is used in this section, and the PF can approximate the real posterior distribution better with more particles.

Similar to the testing method used in Part (B), here we also performed the Wald test on the MF and the PFs with different numbers of particles. The matrix of the $p$ values is shown in Table 4.

The null hypothesis is that the recognition results of the baseline and the MF follow the same distribution.

Table 4: The p values of the Wald test over the MF and the PFs.

                      k = 1    k = 2     k = 3     k = 4     k = 5
MF versus PF-1000     1        0.0176    <0.01     <0.01     <0.01
MF versus PF-2000     1        0.3637    0.2450    0.0300    <0.01
MF versus PF-4000     1        1         0.0910    0.0412    0.0115
MF versus PF-6000     1        0.4127    0.1758    0.1758    0.0218
MF versus PF-16000    1        0.5631    0.0412    1         0.1531

Figure 5: Recognition metrics of the MF and the PF ((a) precision; (b) recall; (c) $F$-measure; each panel plots the metric against $k$ for PF-1000, PF-2000, PF-4000, PF-6000, PF-16000, and the MF).

Generally, with a larger $k$ and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there was a variance in the results inferred by the PF (due to the fact that the PF performs approximate inference), we ran the PF with 6000 particles 10 times. The mean value, with the 90% belief interval, of the metrics at different $k$ is shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We need to note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bound of the 90% belief interval also indicates the extremum of the PF. We can see that the metrics of the MF are better than the mean of the PF when $k > 1$, and even better than the maximum value of the PF except for $k = 4$. Actually, the MF also outperforms 90% of the runs of the PF at $k = 4$. Additionally, the MF only needs on average 75.78% of the time which is needed by the PF for inference at each step. Thus, the MF consumes less time and has a better performance than the PF with 6000 particles.
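A minimal sketch of such an empirical belief interval over repeated PF runs, using simple nearest-rank percentiles of the observed values; the function name is illustrative, and with only 10 runs the endpoints lie close to the extrema, as noted above:

```python
def empirical_interval(values, coverage=0.90):
    """Mean and empirical central interval of a metric over repeated runs.

    values: the metric from each PF run (here 10 runs). The interval endpoints
    are taken directly from the sorted empirical distribution, since the
    underlying distribution of the metric is unknown.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = int(((1 - coverage) / 2) * (n - 1))
    hi = int((1 - (1 - coverage) / 2) * (n - 1))
    mean = sum(ordered) / n
    return mean, ordered[lo], ordered[hi]
```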

Generally, the computational complexities of the PF and the MF are both linear functions of the number of particles. When there are multiple possible results of an action, the MF consumes more time than the PF when their numbers of particles are equal.

Figure 6: Metrics of the PF with 6000 particles and the MF ((a) precision; (b) recall; (c) $F$-measure; each panel shows the MF, the mean of the PF, the extremum of the PF, and the 90% belief interval of the PF against $k$).

However, since the MF does not duplicate particles, it needs many fewer particles than the PF when the state space is large and discrete. Actually, the number of possible states in the MF after the UPDATE operation is never more than 156 in the test dataset. At the same time, the PF has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.

6. Conclusion and Future Work

In this paper, we propose a novel model for multiagent goal recognition, the Dec-POMDM, and present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM presents the agents' cooperation in a compact way: details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that the agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and the agents may change their goals before they are achieved. With this scenario, we compare our method to baselines including the HMM and the PF. The experimental results show that the Dec-POMDM, together with the MARL and MF algorithms, can recognize the multiagent goal well whether the goal changes or not; that the Dec-POMDM outperforms the HMM in terms of precision, recall, and $F$-measure; and that the MF can infer goals more efficiently than the PF.

In the future, we plan to apply the Dec-POMDM in more complex scenarios. Further research on the pruning technique of the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and by the DFG Research Training Group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79-84, Stanford, Calif, USA, October 2011.
[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29-36, July 2010.
[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631-2637, January 2007.
[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009-2014, Barcelona, Spain, July 2011.
[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32-39, Stanford, Calif, USA, October 2011.
[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602-607, IEEE, Lugano, Switzerland, March 2012.
[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.
[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.
[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463-468, Washington, DC, USA, 2002.
[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009-2015, Hyderabad, India, January 2007.
[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705-711, Acapulco, Mexico, August 2003.
[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16-32, 2015.
[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526-533, Vietri sul Mare, Italy, November 2015.
[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009.
[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066-2071, July 2011.
[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283-288, Barcelona, Spain, February 2013.
[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.
[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100-107, IEEE, Granada, Spain, September 2012.
[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447-1452, Austin, Tex, USA, July 2008.
[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874-1882, December 2009.
[21] B. Riordan, S. Brimi, N. Schurr et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49-56, Sundance, Utah, USA, March 2011.
[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib et al., Eds., pp. 205-224, Elsevier, Waltham, Mass, USA, 2014.
[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.
[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-0.3.1.pdf.
[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011-2018, July 2015.
[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451-499, 2002.
[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287-296, June 2004.
[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99-106, 2009.
[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87-133, 2012.
[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37-43, Raleigh, NC, USA, October 2014.
[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.
[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.



erroneous actions with a given probability (b) environmentstates are impacted by some events which are not undercontrol of the agents Because of this uncertainty in sim-ulation systems a goal cannot be achieved by a certainsequence of primitive actions as what we do in the classicalplanning Instead we have to find a set of policies whichdefine the distribution of selecting actions within givensituations Since policies essentially reveal the goal of plan-ning goals can be inferred as discrete parameters or hiddenstates after knowing their corresponding policies and thisprocess is usually implemented under the Markov decisionframework

21 Recognizing Goals of a Single Agent People have pro-posed many methods for recognizing goals of a single agentsome of them are foundations of methods for multiagent goalrecognition Baker et al proposed a computational frame-work based on Bayesian inverse planning for recognizingmental states such as goals [14] They assumed that theagent is rational actions are selected based on an optimalor approximate optimal value function given the beliefsabout the world and the posterior distribution of goals iscomputed by Bayesian inference The core of Bakerrsquos methodis that policies are computed by the planner based on thestandard Markov decision process (MDP) which does notmodel the observing process of the agent Thus Ramırez andGeffner extended Bakerrsquos work by applying the goal-POMDPin formalizing the problem [4] Compared to the MDP thePOMDP models the relation between real world state andobservation of the agent explicitly compared to the POMDPthe goal-POMDP defines the set of goal states BesidesRamırez and Geffner also solved the inference problem evenwhen observations are incomplete Works in [4 14] are verypromising but both of them suffer two limitations (a) theinput for goal recognition is an action sequence howeversometimes we only have observations of environment statesfrom real or virtual sensors and translating observations ofstates to actions is not easy (b) the goal is estimated as a staticparameter however it may be interrupted and changed inone episode

Recently the computational state space model (CSSM)became more and more popular for human behavior model-ing [15 16] In the CSSM transitionmodels of the underlyingdynamic systemcan be described by any computable functionusing compact algorithmic representations Kruger et alalso discussed the performances of applying the CSSM onintention recognition in different real scenarios [16 17] Inthis research (a) intentions as well as actions and envi-ronment states are modeled as hidden states which canbe inferred by online filtering algorithm (b) observationsreflect not only primitive actions but also environment statesThe limitation of the research in [17] is that goal inferenceis not implemented in scenarios where results of actionsare uncertain Another related work on human behaviormodeling under the MDP framework was done by Tastan etal [18] By making use of the inverse reinforcement learning(IRL) and the PF they learned the opponentrsquos motion modeland tracked it in the game Unreal 2004 This work made

the following contributions (a) the features for decision inthe pursuit problem were abstracted (b) IRL was used tolearn the reward function of the opponent (c) the solveddecision model was regarded as the motion function inthe PF However IRL relies on a large dataset and Tastanrsquosmethod is proposed for tracking but not for goal recognition

22 Multiagent Goal Recognition Based on Inverse PlanningThe inverse planning theory can also be used in the multi-agent domain Baker et al inferred relational goals betweenagents (such as chasing and fleeing) by using multiagentMDP framework to model interactions between agents andthe environment [19] In this model each agent selectedactions based on the world state its goal and its beliefsabout other agentsrsquo goals Mental state inference is doneby inverse planning under the assumption that all agentsare approximately rational Ullman et al also successfullyapplied this theory in more complex social goals such ashelping and hindering where an agentrsquos goals depend onthe goals of other agents [20] In the military domainRiordan et al borrowed Bakerrsquos idea and applied Bayesianinverse planning to inferred intents in multi-UnmannedAerial Systems (UASs) [21] Additionally IRL was also usedto learn reward function Even though Bakerrsquos theory is quitepromising it can only work when every agent has accurateknowledge of the world state because the multiagent MDPdoes not model the observing process Besides Bayesianinverse planning does not allow the goal to change Anotherrelated work under the Markov decision framework inmultiagent settings was done by Doshi et al [22] Althoughtheir main aim is to learn the agentsrsquo behavior modelswithout recognizing goals the process of estimating mentalstates is very similar to Bayesian approaches for probabilisticplan recognition In Doshirsquos work the interactive partiallyobservable Markov decision process (I-POMDP) was used tomodel interactions between agents I-POMDP is an extensionof POMDP formultiagent settings Comparing to POMDP I-POMDP defines an interactive state space which combinesthe traditional physical state space with explicit models ofother agents sharing the environment in order to predict theirbehavior Thus I-POMDP is applicable in situations whereagents may have identical or conflicting objectives HoweverI-POMDP has to deal with the problem ldquowhat do you thinkthat I think that you thinkrdquo which makes finding optimal orapproximately optimal policies very hard [23] Actually inmany multiagent scenarios such as the football game or thefirst person shooting game the agents being recognized sharea common goal This makes the Dec-POMDP frameworksufficient for modeling cooperation behaviors Additionallythe increasing interestsnumber of works of planning theorybased onDec-POMDP can provide us with a large number ofplanners [24 25]

23 Multiagent Goal Recognition Based on DBN Filtering Ifall actions and world states (the agent and the environment)are defined as variables with time labels the MDP can beregarded as a special case of directed probabilistic graphicalmodels (PGMs) With this idea some people ignore the

4 Discrete Dynamics in Nature and Society

reward function but only concern themselves with the poli-cies and infer goals under the dynamic Bayesian frameworkFor example Saria and Mahadevan presented a theoreticalframework for online probabilistic plan recognition in coop-erative multiagent systems This model extends the AbstractHidden Markov Model and consists of a Hierarchical Mul-tiagent Markov Process that allows reasoning about theinteraction among multiple cooperating agents The Rao-Blackwellized particle filtering (RBPF) is also used for theinference [26 27] Pfeffer et al [28] studied the problem ofmonitoring goals team structure and state in a dynamicenvironment an urban warfare field where uncoordinatedor loosely coordinated units attempt to attack a target Theyused the standard DBN to model cooperation behaviors(communication group constitution) and the world statesAn extension of the PF named factored particle filtering isalso exploited in the inference We also proposed a LogicalHierarchical Hidden Semi-Markov Model (LHHSMM) torecognize goals as well as cooperation modes of a team ina complex environment where the observation was noisyand partially missing and the goal was changeable TheLHHSMM is a branch of the Statistical Relational Learning(SRL) method which combines the PGM and the first orderlogic it also presents the team behaviors in a hierarchicalway The inference for the LHHSMM was done by a logicalparticle filer These works based on directed PGM theoryhave the advantage that they can use filtering algorithmHowever they suffer some problems (a) constructing thegraph model needs a lot of domain knowledge and we haveto vary the structure in different applications (b) the graphstructure will be very complex when the number of agents islarge which will make parameter learning and goal inferencetime consuming sometimes even infeasible (c) they need atraining dataset Othermodels based on data-driven trainingsuch as the Markov logic networks and deep learning havethe same problems listed above [29 30]

3 The Model

We propose the Dec-POMDM for formalizing the worldstates behaviors and goals in the problem In this sectionwe first introduce the formal definition of the Dec-POMDMand explain relations among variables in thismodel by aDBNrepresentation Then the planning algorithm for finding outthe policies is given

31 Formalization One of foundations of our POMDM is thewidely applied Dec-POMDP However the Dec-POMDP isproposed for solving decision problem and there is no defini-tion of the goal and the observation model for the recognizerin the Dec-POMDP Additionally the multiagent joint goalmay be terminated because of achievement or interruptionThus we design the Dec-POMDM as a combination of threeparts (a) the standard Dec-POMDP (b) the joint goal andgoal termination variable (c) the observation model for therecognizer The Dec-POMDP is the foundation of the Dec-POMDM

A Dec-POMDM is a tuple ⟨119863 119878 119860 119879 119877Ω119874 120582 ℎ 119868 119866

119864 Υ 119884 119885 119862 119861⟩ where

(i) 119863 = 1 2 119899 is the set of 119899 agents(ii) 119878 is a finite set of world states 119904 which contains all

necessary information for making a decision(iii) 119860 is the finite set of joint actions(iv) 119879 is the state transition function(v) 119877 is the reward function(vi) Ω is the finite set of joint observations for agents

making a decision(vii) 119874 is the observation probability function for agents

making a decision(viii) 120582 is the discount factor(ix) ℎ is the horizon of the problem(x) 119868 is the initial state distribution at stage 119905 = 0(xi) 119866 is the set of possible joint goals(xii) 119864 is the set of goal termination variables(xiii) Υ is the observation function for the recognizer(xiv) 119884 is the finite set of joint observations for the recog-

nizer(xv) 119885 is the goal selection function(xvi) 119862 is the goal termination function(xvii) 119861 is the initial goal distribution at stage 119905 = 0

Symbols including 119863 119878 119879 Ω 119874 120582 ℎ 119868 in the Dec-POMDM have the same meanings as those in the Dec-POMDP More definition details and explanations can befound in [9 27] The reward function is defined as 119877 119866 times

119878 times 119860 times 119878 rarr R which shows that the reward depends on thejoint goal the goal set 119866 consists of all possible joint goalsthe goal termination variable set 119864 = 0 1 indicates whetherthe current goal will be continued in the next step (if 119890 isin 119864

is 0 the goal will be continued otherwise a new goal will beselected again in the next step) the observation function forthe recognizer is defined asΥ 119878times119884 rarr [0 1]Υ(119904 119910) = 119901(119910 |

119904) is the probability that the recognizer observes 119910 isin 119884 whilethe real worlds state is 119904 isin 119878 the goal selection function isdefined as 119885 119878 times 119866 rarr [0 1] 119885(119904 119892119900119886119897) = 119901(119892119900119886119897 | 119904) is theconditional probability that agents select 119892119900119886119897 isin 119866 as the newgoal while theworld state is 119904 the goal termination function isdefined as119862 119878times119866 rarr [0 1]119862(119904 119892119900119886119897) = 119901(119890 = 1 | 119904 119892119900119886119897) isthe conditional probability that agents terminate their 119892119900119886119897 isin119866 while the world state is 119904

In the Dec-POMDP the policy of the 119894th agent is definedas Π119894 Ωlowast

119894times 119860119894rarr [0 1] where 119860

119894isin 119860 is the set of possible

actions of agent 119894 Ωlowast119894is the set of observation sequences

Thus given an observation sequence 119900119894

1 119900119894

2 119900

119894

119905 the

agent 119894 selects an action with a probability defined by Π119894

Since the selection of actions depends on the history of theobservations the Dec-POMDP does not satisfy the Markovassumption This attribute makes inferring goals online veryhard (a) if we precompute policies and store them offline itwill require a very large memory because of the combination

Discrete Dynamics in Nature and Society 5

goaltminus1

etminus1

oi

tminus1oj

tminus1

ai

tminus1

aj

tminus1Agent i

Stminus1

ytminus1

Time t minus 1

goalt

et

oi

toj

tai

t

aj

t

St

yt

Time t

Figure 1 The DBN representation of the Dec-POMDM in two adjacent time slices

of possible observations (b) if we compute policies online thefiltering algorithm is infeasible because weights of possiblestates cannot be updated with only the last observation Onepossible solution is to define an interior belief variable foreach agent to filter the world state but it will make theinferring process much more complex In this paper wesimply assume that all agents are memoryless as the work in[9]Then the policy of the 119894th agent is defined asΠ119894 119866timesΩ

119894times

119860119894rarr [0 1] where Ω

119894is the set of possible observations of

agent 119894 The definition of policies in the Dec-POMDM showsthat (a) an agent does not need the history of observationsfor making decisions (b) selection of actions depends on thegoal at that stage In the next section we will further explainthe relations of variables in the Dec-POMDM by its DBNrepresentation

32 The DBN Representation After estimating the agentsrsquopolicies by a multiagent reinforcement learning algorithmwe do not need the reward function in the inference processThus in the DBN representation of the Dec-POMDM thereare six sorts of variables in total the goal the goal termi-nation the action observations for agents making decisionstate and observations for the recognizer In this section wefirst analyze how these variables are affected by other factorsThen we give the DBN representation in two adjacent timeslices

(A) Goal (119892119900119886119897) The goal 119892119900119886119897119905at time 119905 depends on the goal

119892119900119886119897119905minus1

the goal termination variable 119890119905minus1

and the state 119904119905minus1

If 119890119905minus1

= 0 agents keep their goal at time 119905 otherwise 119892119900119886119897119905is

selected depending on 119904119905minus1

(B) Goal Termination Variable (119890) The goal terminationvariable 119890

119905at time 119905 depends on the goal 119892119900119886119897

119905and state 119904

119905

at the same timeThe 119892119900119886119897119905is terminated with the probability

119901(119890119905= 1 | 119904

119905 119892119900119886119897119905)

(C) Action (119886) The action 119886119894

119905selected by agent 119894 at time 119905

depends on the goal 119892119900119886119897119905and its observation at the same

time and the distribution of actions is defined by Π119894

(D) Observations for Agents Making Decision (119900) The obser-vation 119900

119894

119905of agent 119894 at time 119905 reflects the real world state 119904

119905minus1

at time 119905 minus 1 and the agent 119894 observes 119900119894119905with the probability

119901(119900119894

119905| 119904119905minus1

)

(E) State (119904)The world state 119904119905at time 119905 depends on the state

119904119905minus1

at time 119905 minus 1 and the actions of all agents at time 119905 Thedistribution of the updated states can be computed by thestate transition function 119879

(F) Observations for the Recognizer (119910) The observation 119910119905

for the recognizer at time 119905 reflects the real world state 119904119905

at the same time and the recognizer observes 119910119905with the

probability 119901(119910119905| 119904119905)

The DBN representation of the Dec-POMDM in twoadjacent time slices presents all dependencies among vari-ables discussed above as is shown in Figure 1

In Figure 1 only actions and observations of agent 119894 andagent 119895 are presented for simplicity and each agent has noknowledge about others and can only make decision basedon its own observations Although the Dec-POMDM hasa hierarchical structure it models the task decompositionand allocation in an inexplicit way all information aboutcooperation is hidden in the joint policies From filteringtheory point of view the joint policies actually play the roleof the motion function Thus estimating policies is a keyproblem for goal inference

33 Learning the Policy Because the Dec-POMDP is essen-tially a DBN we can simply use some data-driven methodsto learn parameters from a training dataset However thetraining dataset is not always available in some cases Besideswhen the number of agents is large the DBN structurewill become large and complex which makes supervisedor unsupervised learning time consuming To solve theseproblems we assume that the agents to be recognized arerational which is reasonable when there is no history ofagents Then we can use an existing planner based onthe Dec-POMDP framework to find out the optimal orapproximately optimal policies for each possible goal

6 Discrete Dynamics in Nature and Society

Various algorithms have been proposed for solving Dec-POMDP problems Roughly these algorithms can be dividedinto two categories (a) model-based algorithms under thegeneral name of dynamic programming (b) model-free algo-rithms under the general name of reinforcement learning [8]In this paper we select a multiagent reinforcement learningalgorithm cooperative colearning based on Sarsa [9] becauseit does not need a state transition function of the world

The main idea of cooperative colearning is that at eachstep one chooses a subgroup of agents and updates theirpolicies to optimize the task given the rest of the agentshave fixed plans then after a number of iterations the jointpolicies can converge to a Nash equilibrium In this paperwe only consider settings where agents are homogeneousAll agents share the same observation model and policiesThus we only need to define one POMDP 119872 for all agentsAll parameters of119872 can be obtained directly from the givenDec-POMDP except for the transition function 119879

1015840 Later wewill show how to compute 1198791015840 from 119879 which is the transitionfunction of the Dec-POMDPThe Dec-POMDP problem canbe solved by the following steps [9]

Step 1 We set 119905 = 0 and start from an arbitrary policy Π0

Step 2 We select an arbitrary agent and compute the statetransition function 119879

1015840 of 119872 considering that policies of allagents are Π

119905 except the selected agent that is refining theplan

Step 3 We computeΠlowast which is the optimal policy of119872 andset Π119905+1 = Π

lowast

Step 4 Weupdate the policy of each agent toΠ119905+1 set 119905 = 119905+1and return to Step 2

In Step 2 the state transition function of the POMDP forany agent can be computed by formula (1) (we assume that werefine the plan for the first agent)

1198791015840(119904 1198861 1199041015840) = sum

(1199002119900119899)

119879 (119904 1198861 Π119905(1199002) Π

119905(119900119899) 1199041015840)

sdot

119899

prod

119894=2

1198741015840(119904 119900119899)

(1)

where 1198741015840 is the observation function defined in 119872 119886119894is the

action of the 119894th agent 119900119894 is the observation of the 119894th agent119879(119904 1198861 Π119905(1199002) Π

119905(119900119899) 1199041015840) is the probability that the state

transits from 119904 to 1199041015840 while the action of the first agent is 119886

1

and other agents choose actions based on their observationsand policy Π119905

Unfortunately computing formula (1) is always verydifficult in complex scenarios Thus we apply the Sarsaalgorithm for finding out the optimal policy of119872 (in Steps 2and 3) [8] In this process the POMDP problem is mapped tothe MDP problem by regarding the observation as the worldstate Then agents get feedback from the environment andwe do not need to compute the updated reward function andstate transition function

4 Inference

In the Dec-POMDM the goal is defined as a hidden stateindirectly reflected by observations In this case many filter-ing algorithms can be used to infer the goals However in themultiagent setting the world state space is large because ofcombinations of agentsrsquo states and goals Besides the Dec-POMDMmodels a discrete system To solve this problem weuse a MF algorithm to infer multiagent joint goals under theframework of the Dec-POMDM It has been proved efficientwhen the state space is large and discrete

Nyolt et al discussed the theory of the MF and howto apply it in a casual model in [12 13] Their main ideacan still work in our paper but there are also differencesbetween the Dec-POMDM and the causal model (a) theinitial distribution of states in our model is uncertain (b)the results of actions in our model are uncertain (c) we donot model duration of actions for simplicity Thus we haveto modify the MF for casual models and make it available forthe Dec-POMDM

When the MF is applied to infer goals under the frame-work of the Dec-POMDM the set of particles is definedas 119909119894

119905119894=1119873

119905 where 119909119894

119905= ⟨119892119900119886119897

119894

119905 119890119894

119905 119904119894

119905⟩ 119873119905is the number

of particles at time 119905 and the weight of 119894th particle is119908119894

119905 The detailed procedures of goal inference are given in

Algorithm 1119898119905is the set of 119909

119894

119905 119908119894

119905119894=1119873

119905 which contains all par-ticles and their weights at each stage When we initialize119909119894

0 119908119894

0119894=1119873

0 the weights are computed by

119908119894

0= 119901 (119904

119894

0| 1199100) 119901 (119892119900119886119897

119894

0) 119901 (119890119894

0| 119892119900119886119897

119894

0 119904119894

0) (2)

119901(119892119900119886119897119894

0) and 119901(119890119894

0| 119892119900119886119897

119894

0 119904119894

0) are provided by the model and

119901(119904119894

0| 1199100) is computed by

119901 (119904119894

0| 1199100) prop 119901 (119910

0| 119904119894

0) 119901 (119904119894

0) (3)

119898119886119901 is a temp memory of particles and their weights whichare transited from one specific particle Because a worldstate 119904

119905may be after the execution of different actions from

given 119904119905minus1

we have to use a LOOKUP operation to querythe record in 119898119886119901 which covers the new particle 119909

119905 The

operation LOOKUP(119909119905 119898119886119901) searches 119909

119905in 119898119886119901 if there

is a record ⟨119909119895

119905 119908119895

119905⟩ in 119898119886119901 which covers 119909

119905 the operation

returns this record otherwise it returns empty This processis time consuming if we scan the 119898119886119901 for every query Onealternative method is to build a matrix for each 119898119886119901 whichrecords the indices of all reached particles Then if the indexof 119909119905is null we add a new record in 119898119886119901 and update the

index matrix otherwise we can read the index of 119909119905from the

matrix andmerge the weight directly After we finish building119898119886119901 its index matrix can be deleted to release memory Wealso need to note that this solution saves computing time butneeds extra memory

The operation PUT(119898119886119901 ⟨119909119905 119908119905⟩) adds a new record

⟨119909119905 119908119905⟩ in 119898119886119901 and indexes this new record The generated

119898119886119901 contains a group of particles and correspondingweightsSome of these particles may have been covered in 119898

119905 Thus

we have to use a MERGE operation to get a new 119898119905by

Discrete Dynamics in Nature and Society 7

Set 119905 = 01198980larr 119909

119894

0 119908119894

0119894=11198730 Initialization

For 119905 = 1 max 119905119898119905larr 119873119906119897119897 We totally have max 119905 + 1 observations

Foreach 119909119894

119905minus1 119908119894

119905minus1 isin 119898

119905minus1

119898119886119901 larr997888 119873119906119897119897

If 119890119894119905minus1

= 1

Foreach 119892119900119886119897119905which satisfies that 119901(119892119900119886119897

119905| 119904119894

119905minus1) gt 0

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1) sdot 119901 (119892119900119886119897

119905| 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in119898119886119901

Else119892119900119886119897119905larr997888 119892119900119886119897

119894

119905minus1

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in map

MERGE(119898119905 119898119886119901)

UPDATE(119898119905)

PRUNE(119898119905)

NORMALIZE(119898119905)

DELETE(119898119905minus1

)

Algorithm 1 Goal inference based on the marginal filter under the framework of Dec-POMDM

merging map and the existing m_t. In this process, if a particle x_t in map has not appeared in m_t, we directly put x_t and its corresponding weight w_t into m_t; otherwise, we need to add w_t to the weight of the record in m_t which covers x_t. Similarly, an index matrix can also be used to save computing time in the MERGE operation.

Under the framework of the Dec-POMDM, we update w^i_t by

    w^i_t = w^i_t · p(y_t | s^i_t),    (4)

where w^i_t and x^i_t = ⟨goal^i_t, e^i_t, s^i_t⟩ are the weight and the particle of the i-th record in m_t, respectively, and p(y_t | s^i_t) is the observation function for the recognizer in the model. The details of the PRUNE operation can be found in [13], and we can use the existing pruning technique directly in our paper.

5. Experiments

5.1. The Predator-Prey Problem. The predator-prey problem is a classic problem for evaluating multiagent decision algorithms [9]. However, it cannot be used directly in this paper, because the predators have only one possible goal. Thus, we modify the standard problem by placing more than one prey on the map, and our aim is to recognize the real target of the predators at each time. Figure 2 shows the grid-based map in our experiments.

In our scenario, the map consists of 5 × 5 grids, two predators (red diamonds: Predator PX and Predator PY), and two preys (blue stars: Prey PA and Prey PB). The two predators select one of the preys as their common goal and move around to capture it. As shown in Figure 2, the observation of the predators is not perfect: a predator only knows the exact positions of itself and of the agents in the nearest 8 grids. If another agent is out of its observing range, the predator only knows its direction (8 possible directions). For the situation in Figure 2, Predator PX observes that none is near itself, Prey PB is in the north, Prey PA is in the southeast, and Predator PY is in the south; Predator PY observes that none is near itself, Prey PB and Predator PX are in the north, and Prey PA is in the east.

Figure 2: The grid-based map in the predator-prey problem. Also shown are the observations (Ω) of Predator PX and Predator PY used for making a decision.

In each step, all predators and preys can move into one of the four adjacent grids (north, south, east, and west) or stay in the current grid. When two or more agents try to move into the same grid, or try to move off the map, they have to stay in their current grids. The predators achieve their goal if and only if both of them are adjacent to their target. Additionally, the predators may change their goal before they capture a prey. The recognizer can get the exact positions of the two preys, but its observations of the predators are noisy. We need to compute the posterior distribution of the predators' goals from the observation trace.

The modified predator-prey problem can be modeled by the Dec-POMDM. Some important elements are as follows:

(a) D: the two predators.
(b) S: the positions of the predators and preys.
(c) A: five actions for each predator: moving in the four directions and staying.
(d) G: Prey PA or Prey PB.
(e) Ω: the directions of agents far away and the real positions of agents nearby.
(f) Y: the real positions of the preys and the noisy positions of the predators.
(g) R: the predators get a reward of +1 once they achieve their goal; otherwise the immediate reward is 0.
(h) h: infinite horizon.

With the definition above, the effects of the predators' actions are uncertain, and the state transition function depends on the distribution of the preys' actions. Thus, the actions of the preys actually play the role of events in discrete dynamic systems.
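For illustration, a minimal Python sketch of the grid dynamics just described follows. The function and constant names are ours; collision handling is simplified (agents involved in a conflict simply stay, and chained conflicts are not resolved), and 4-adjacency is assumed for "adjacent to their target".

```python
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0), "stay": (0, 0)}
SIZE = 5  # the 5 x 5 map

def step_positions(positions, actions):
    """positions: dict agent_id -> (x, y); actions: dict agent_id -> move name."""
    desired = {}
    for agent, (x, y) in positions.items():
        dx, dy = MOVES[actions[agent]]
        nx, ny = x + dx, y + dy
        # Trying to leave the map means staying in the current grid.
        desired[agent] = (nx, ny) if 0 <= nx < SIZE and 0 <= ny < SIZE else (x, y)
    new_positions = {}
    for agent, cell in desired.items():
        # If two or more agents want the same cell, all of them stay put
        # (chained conflicts are not resolved in this simplified sketch).
        contested = sum(1 for other in desired.values() if other == cell) > 1
        new_positions[agent] = positions[agent] if contested else cell
    return new_positions

def goal_achieved(predators, target, positions):
    """The joint goal is achieved iff both predators are 4-adjacent to the target prey (our assumption)."""
    tx, ty = positions[target]
    return all(abs(positions[p][0] - tx) + abs(positions[p][1] - ty) == 1 for p in predators)
```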

5.2. Settings. In this section we provide additional details of the scenario and of the parameters used in the policy learning and inference algorithms.

(A) The Scenario. The preys are senseless: they select each action with equal probability. The initial positions of the agents are uniformly distributed. The initial goal distribution is p(goal_0 = Prey PA) = 0.6 and p(goal_0 = Prey PB) = 0.4.

We simplify the goal termination function in the following way: if the predators capture their target, the goal is achieved; otherwise, the predators change their goal with a probability of 0.05 in every state. The observation for the recognizer, y_t ∈ Y, reveals the real position of each predator with a probability of 0.5, and the observation may be one of the 8 neighbors of the current grid, each with a probability of 0.5/8. When the predator is on the edge of the map, the observation may lie outside the map. The observed results of the agents do not affect each other.
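As a small illustration of this observation noise, the sketch below samples a noisy predator position for the recognizer; the helper name is ours, and the 8-neighbour enumeration mirrors the 0.5 versus 0.5/8 split described above.

```python
import random

def observe_predator(pos):
    """Sample the recognizer's noisy observation of one predator.

    With probability 0.5 the true grid is reported; otherwise one of its 8 neighbours is
    reported uniformly (probability 0.5/8 each), possibly outside the map near the border.
    """
    x, y = pos
    if random.random() < 0.5:
        return (x, y)
    neighbours = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return random.choice(neighbours)
```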

(B) Multiagent Reinforcement Learning. The discount factor is λ = 0.8. A predator selects an action a_i given the observation o^i_t with the probability

    p(a_i | o^i_t) = exp(Q(o^i_t, a_i)/β) / Σ_{j=1}^{5} exp(Q(o^i_t, a_j)/β),    (5)

where β = 0.1 is the Boltzmann temperature. We set β > 0 as a constant, which means that the predators always select approximately optimal actions. In our scenario the Q-value converges after 750 iterations. In the learning process, if the predators cannot achieve their goal within 10,000 steps, we reinitialize their positions and begin the next episode.
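A minimal sketch of the Boltzmann action selection of (5) is shown below; q_values stands for the learned row Q(o^i_t, ·) over the five actions, and the function name is ours.

```python
import math
import random

def boltzmann_action(q_values, beta=0.1):
    """Select one of the actions according to the Boltzmann distribution of (5).

    q_values: list of Q(o_t, a) for the available actions; beta: temperature (0.1 in the paper).
    Returns the sampled action index and the full probability vector.
    """
    m = max(q_values)  # subtract the max for numerical stability; the distribution is unchanged
    weights = [math.exp((q - m) / beta) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    r, acc = random.random(), 0.0
    for action, p in enumerate(probs):
        acc += p
        if r <= acc:
            return action, probs
    return len(probs) - 1, probs
```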

(C) The Marginal Filter. In the MF inference, a particle consists of the joint goal, the goal termination variable, and the positions of the predators and preys. Although there are 25 possible positions for each predator or prey, after getting a new observation we can identify the positions of the preys, and there are only 9 possible positions for each predator. Thus, the number of possible values of particles at one step never exceeds 9 × 9 × 2 × 2 = 324 after the UPDATE operation. In our experiments we simply set the maximum number of particles to 324; then we do not need to prune any possible particles, which means that we perform exact inference. We also use the real settings of the Dec-POMDM and the real policies of the predators in the MF inference.


Table 1: The details of two traces.

    Trace number    Durations       Targets    Interrupted
    1               t ∈ [1, 7]      Prey PA    Yes
                    t ∈ [8, 13]     Prey PB    No
    4               t ∈ [1, 10]     Prey PB    No

5.3. Results and Discussion. With the learned policy, we ran the simulation model repeatedly and generated a test dataset consisting of 100 traces. A trace contains 11.83 steps on average, and the number of steps per trace varies from 6 to 34 with a standard deviation of 5.36. We also found that the predators changed their goals at least once in 35% of the test traces. To validate our method and compare it with baselines, we did experiments on three aspects: (a) to discuss the details of the recognition results obtained with our method, we computed the recognition results of two specific traces; (b) to show the advantages of the Dec-POMDM in modeling, we compared the recognition performances when the problem was modeled as a Dec-POMDM and as an HMM; (c) to show the efficiency of the MF under the framework of the Dec-POMDM, we compared the recognition performances when the goal was inferred by the MF and by the PF.

In the second and third parts, performances were evaluated by the statistical metrics precision, recall, and F-measure. Their meanings and computation details can be found in [31]. The value of each metric is between 0 and 1, and a higher value means a better performance. Since these metrics can only evaluate the recognition results at a single step, and traces in the dataset had different time lengths, we defined a positive integer k (k = 1, 2, ..., 5). The metric with k means that the corresponding observation sequences are y^j_{t ∈ [1, ⌈k·length_j/5⌉]} for j ∈ [1, 100]. Here y^j_{t ∈ [1, ⌈k·length_j/5⌉]} is the observation sequence from time 1 to time ⌈k·length_j/5⌉ of the j-th trace, and length_j is the length of the j-th trace. We need to recognize goal_t for each observation sequence. Thus, metrics with different k show the performance of the algorithms in different phases of the simulation. Additionally, we regarded the goal with the largest probability as the final recognition result.
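The sketch below shows one plausible way to evaluate a recognizer at the prefix cutoffs ⌈k·length_j/5⌉; macro-averaging precision and recall over the goal labels is our assumption about the metric definitions in [31], not a detail stated in this paper.

```python
import math

def prefix_metrics(traces, recognize, k):
    """Evaluate recognition at the prefix cutoff ceil(k * length_j / 5) of every trace.

    traces: list of (observations, goal_sequence) pairs, where goal_sequence[t] is the goal
    active at step t+1; recognize: function mapping an observation prefix to a goal label
    (e.g. the MAP goal of the filter). Macro-averaged precision/recall/F is an assumption.
    """
    y_true, y_pred = [], []
    for obs, goals in traces:
        cut = math.ceil(k * len(obs) / 5)
        y_true.append(goals[cut - 1])
        y_pred.append(recognize(obs[:cut]))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == lab and t == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == lab and t != lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if p != lab and t == lab)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(labels)
    recall = sum(recalls) / len(labels)
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure
```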

(A) The Recognition Results of the Specific Traces. To show the details of the recognition results obtained with our method, we selected two specific traces from the dataset (number 1 and number 4). These two traces were selected because Trace number 1 was the first trace where the goal was changed before it was achieved, and Trace number 4 was the first trace where the goal was kept until it was achieved. The detailed information about the two traces is shown in Table 1.

In Trace number 1, the predators selected Prey PA as their first goal from t = 1 to t = 7. Then the goal was changed to Prey PB, which was achieved at t = 13. In Trace number 4, the predators selected Prey PB as their initial goal. This goal was kept until it was achieved at t = 10. Given the real policies and the other parameters of the Dec-POMDM, including T, O, Υ, I, Z, C, and B, we used the MF to compute the posterior distribution of the goals at each time. The recognition results are shown in Figure 3.

In Figure 3(a), the probability of the real goal (Prey PA) increases quickly during the initial period. At t = 3, the probability of Prey PA exceeds 0.9 and keeps a high value until t = 7. When the goal is changed at t = 8, our method responds very quickly because the predators select highly informative joint actions at that time. In Figure 3(b), the probability of the false goal increases at t = 2, and the probability of the real goal (Prey PB) is low at first. This failure happens because the observations suggest that the predators selected joint actions which have a small probability if the goal is Prey PB. Nevertheless, the probability of the real goal increases continuously and exceeds that of Prey PA after t = 5. From the recognition results of these two traces, we conclude that the Dec-POMDM and the MF perform well regardless of whether the goal changes.

(B) Comparison of the Performances of the Dec-POMDM and HMMs. To show the advantages of the Dec-POMDM, we modeled the predator-prey problem as the well-known HMM as a baseline. In the HMM, we set the hidden state x to the tuple ⟨goal, p_X, p_Y, p_A, p_B⟩, where p_X, p_Y, p_A, and p_B are the positions of Predator PX, Predator PY, Prey PA, and Prey PB, respectively. The observation model for the recognizer in the HMM was the same as that in the Dec-POMDM. Thus, there were 25 × 24 × 23 × 22 × 2 = 607,200 possible states in this HMM, and the dimension of the transition matrix was 607,200 × 607,200. Since the state space and the transition matrix were too large, an unsupervised learning method such as the Baum-Welch algorithm was infeasible for this problem. Instead, we used simple supervised learning: counting the state transitions in labeled training datasets. With the real policies of the predators, we ran the simulation model repeatedly and generated five training datasets. The detailed information about these datasets is shown in Table 2.

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way we obtained the new datasets HMM-100b and HMM-200b, which do not cover any trace in the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions, and the MF was then used to infer the goal. However, during inference we may reach states that have not been explored in the training dataset (HMM-200a explores only 492,642 states, whereas there are 607,200 possible states in total). In this case we assumed that the hidden state transits to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compared the performances of the Dec-POMDM and the HMMs; the recognition metrics are shown in Figure 4.
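The counting estimator with a uniform fallback for unexplored states can be sketched as follows; the sparse dictionary-of-counts layout and the helper names are ours, used only to avoid materializing the 607,200 × 607,200 matrix.

```python
from collections import defaultdict

def estimate_transition_table(traces, n_states):
    """Supervised counting estimate of p(x' | x) from labeled hidden-state traces.

    traces: list of hidden-state sequences [x_0, x_1, ...], where each state is a hashable
    tuple <goal, p_X, p_Y, p_A, p_B>. Unexplored states fall back to a uniform distribution.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in traces:
        for x, x_next in zip(states, states[1:]):
            counts[x][x_next] += 1

    def transition_prob(x, x_next):
        row = counts.get(x)
        if not row:                           # state never seen in training: uniform fallback
            return 1.0 / n_states
        return row.get(x_next, 0) / sum(row.values())

    return transition_prob
```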

Figure 4 compares the recognition results of the Dec-POMDM with those of the HMMs trained on the different datasets.

Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF. (a) Recognition results of Trace number 1. (b) Recognition results of Trace number 4.

Table 2: Information about the training datasets for the HMM.

    Name of the dataset    Number of traces    Number of explored states    Percentage of test traces covered by the training dataset
    HMM-50                 50,000              279,477                      0%
    HMM-100a               100,000             384,623                      79%
    HMM-100b               100,000             384,621                      0%
    HMM-200a               200,000             492,642                      79%
    HMM-200b               200,000             492,645                      0%

Overall, the models performed similarly in terms of precision, recall, and F-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after k = 4; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though HMM-200a contains twice as many traces as HMM-100a. The main reason is that these training datasets were overfitted: there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a. In this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on the three metrics, almost as well as the overfitted trained models, and (b) the learned HMM suffers from an overfitting problem, with a serious performance decline whenever the training dataset does not cover most traces in the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also show that, when the problem is modeled as an HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. The dataset HMM-200a, which consists of 200,000 traces, covers only 81.13% of all possible states, and only 71.46% of the states in HMM-200a have been reached more than once. Thus, we can conclude that HMM-200a will perform poorly if the agents select actions with higher uncertainty. Besides, the training dataset would have to be extremely large for most states to be reached a large number of times. In real applications it is very hard to obtain a training dataset of such size, especially when all traces must be labeled.

We also performed a Wald test over the Dec-POMDM and the HMMs trained on the different datasets to check whether their recognition results come from different distributions. Given our test dataset, there were 100 goals to be recognized for each value of k. Let TX be the set of samples obtained from the Dec-POMDM: we set TX_i = 1 if the recognition result of the Dec-POMDM is correct on test case i (i = 1, 2, ..., 100) and TX_i = 0 otherwise. Let TY be the set of samples obtained from the baseline model (one of the HMMs): we set TY_i = 1 if the recognition result of the baseline model is correct on test case i and TY_i = 0 otherwise. We define TD_i = TX_i − TY_i and let δ = E(TD_i) = E(TX_i) − E(TY_i) = P(TX_i = 1) − P(TY_i = 1). The null hypothesis is δ = 0, which means that the recognition results of the two models follow the same distribution. A more detailed description of the test can be found in [32]. The matrix of p values is shown in Table 3.
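A sketch of this paired Wald test on binary recognition outcomes is given below. Using the plug-in standard error of the paired differences is one standard formulation following [32]; it is our reading, not code from the paper.

```python
import math

def paired_wald_test(tx, ty):
    """Two-sided Wald test for delta = P(TX=1) - P(TY=1) on paired binary outcomes.

    tx, ty: equal-length lists of 0/1 recognition outcomes (1 = correct) for the two models.
    Returns the Wald statistic and the two-sided p-value.
    """
    n = len(tx)
    d = [x - y for x, y in zip(tx, ty)]
    delta_hat = sum(d) / n
    var_hat = sum((di - delta_hat) ** 2 for di in d) / n
    if var_hat == 0:
        # All paired differences are identical; the test is degenerate.
        return (0.0, 1.0) if delta_hat == 0 else (float("inf"), 0.0)
    w = delta_hat / math.sqrt(var_hat / n)
    # Two-sided p-value from the standard normal CDF.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(w) / math.sqrt(2.0))))
    return w, p_value
```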

The p values in Table 3 show that the recognition results of the Dec-POMDM follow different distributions from those of HMM-50, HMM-100b, and HMM-200b, respectively, with high confidence. We cannot reject the null hypothesis when the baseline is an overfitted trained model. The Wald test results are consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and its results showed the same trend.

Figure 4: Recognition metrics of the Dec-POMDM and HMMs. (a) Precision. (b) Recall. (c) F-measure.

Table 3: The p values of the Wald test over the Dec-POMDM and HMMs.

                                     k = 1    k = 2     k = 3     k = 4     k = 5
    Dec-POMDM versus HMM-50          1        <0.01     <0.01     <0.01     <0.01
    Dec-POMDM versus HMM-100a        1        0.0412    0.0478    0.2450    <0.01
    Dec-POMDM versus HMM-100b        1        <0.01     <0.01     <0.01     <0.01
    Dec-POMDM versus HMM-200a        1        0.3149    0.1929    0.4127    1
    Dec-POMDM versus HMM-200b        1        <0.01     <0.01     <0.01     <0.01

(C) Comparison of the Performances of the MF and the PF. Here we used the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared their performances with the MF. All inference was done under the framework of the Dec-POMDM. We note that, since the PF approximates the posterior distribution of the goal with weighted particles, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case we simply reset all weights to 1/Np to continue the inference, where Np is the number of particles in the PF. The recognition metrics of the MF and the PF are shown in Figure 5.

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1,000, 2,000, 4,000, 6,000, and 16,000 particles, respectively. All filters had similar precision. However, considering the recall and the F-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the other PFs. We obtained these results because an exact MF (without a pruning step) is used in this section, and the PF approximates the real posterior distribution better with more particles.

Similarly to the testing method used in Part (B), here we also performed the Wald test on the MF and the PF with different numbers of particles. The matrix of p values is shown in Table 4.

The null hypothesis is that the recognition results of the baseline and of the MF follow the same distribution.

Table 4: The p values of the Wald test over the MF and PFs.

                          k = 1    k = 2     k = 3     k = 4     k = 5
    MF versus PF-1000     1        0.0176    <0.01     <0.01     <0.01
    MF versus PF-2000     1        0.3637    0.2450    0.0300    <0.01
    MF versus PF-4000     1        1         0.0910    0.0412    0.0115
    MF versus PF-6000     1        0.4127    0.1758    0.1758    0.0218
    MF versus PF-16000    1        0.5631    0.0412    1         0.1531

Figure 5: Recognition metrics of the MF and PF. (a) Precision. (b) Recall. (c) F-measure.

Generally, with a larger k and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there is variance in the results inferred by the PF (because the PF performs approximate inference), we ran the PF with 6,000 particles 10 times. The mean values and the 90% belief intervals of the metrics at different k are shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF only 10 times, the bounds of the 90% belief interval also indicate the extrema of the PF. We can see that the metrics of the MF are better than the mean of the PF when k > 1 and even better than the maximum value of the PF except for k = 4; in fact, the MF also outperforms 90% of the runs of the PF at k = 4. Additionally, the MF needs on average only 75.78% of the time needed by the PF for inference at each step. Thus, the MF consumes less time and performs better than the PF with 6,000 particles.

Generally, the computational complexities of the PF and the MF are both linear in the number of particles. When an action has multiple possible results, the MF consumes more time than the PF when their numbers of particles are equal.

Figure 6: Metrics of the PF with 6,000 particles and the MF. (a) Precision. (b) Recall. (c) F-measure.

However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete. In fact, the number of possible states in the MF after the UPDATE operation never exceeds 156 in the test dataset. At the same time, the PF has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.

6. Conclusion and Future Work

In this paper we propose the Dec-POMDM, a novel model for the multiagent goal recognition problem, together with its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM represents the agents' cooperation in a compact way: details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that the agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem there are multiple possible joint goals, and the agents may change their goals before they are achieved. On this scenario we compare our method with the baselines, including the HMM and the PF. The experimental results show that the Dec-POMDM, together with the MARL and MF algorithms, recognizes the multiagent goal well whether the goal changes or not; that the Dec-POMDM outperforms the HMM in terms of precision, recall, and F-measure; and that the MF infers goals more efficiently than the PF.

In the future we plan to apply the Dec-POMDM to more complex scenarios. Further research on the pruning technique of the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

14 Discrete Dynamics in Nature and Society

Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and by the DFG Research Training Group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.

[10] S. Seuken and S. Zilberstein, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr, et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib, et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-0.3.1.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.

[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.


Page 4: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

4 Discrete Dynamics in Nature and Society

reward function but only concern themselves with the poli-cies and infer goals under the dynamic Bayesian frameworkFor example Saria and Mahadevan presented a theoreticalframework for online probabilistic plan recognition in coop-erative multiagent systems This model extends the AbstractHidden Markov Model and consists of a Hierarchical Mul-tiagent Markov Process that allows reasoning about theinteraction among multiple cooperating agents The Rao-Blackwellized particle filtering (RBPF) is also used for theinference [26 27] Pfeffer et al [28] studied the problem ofmonitoring goals team structure and state in a dynamicenvironment an urban warfare field where uncoordinatedor loosely coordinated units attempt to attack a target Theyused the standard DBN to model cooperation behaviors(communication group constitution) and the world statesAn extension of the PF named factored particle filtering isalso exploited in the inference We also proposed a LogicalHierarchical Hidden Semi-Markov Model (LHHSMM) torecognize goals as well as cooperation modes of a team ina complex environment where the observation was noisyand partially missing and the goal was changeable TheLHHSMM is a branch of the Statistical Relational Learning(SRL) method which combines the PGM and the first orderlogic it also presents the team behaviors in a hierarchicalway The inference for the LHHSMM was done by a logicalparticle filer These works based on directed PGM theoryhave the advantage that they can use filtering algorithmHowever they suffer some problems (a) constructing thegraph model needs a lot of domain knowledge and we haveto vary the structure in different applications (b) the graphstructure will be very complex when the number of agents islarge which will make parameter learning and goal inferencetime consuming sometimes even infeasible (c) they need atraining dataset Othermodels based on data-driven trainingsuch as the Markov logic networks and deep learning havethe same problems listed above [29 30]

3 The Model

We propose the Dec-POMDM for formalizing the worldstates behaviors and goals in the problem In this sectionwe first introduce the formal definition of the Dec-POMDMand explain relations among variables in thismodel by aDBNrepresentation Then the planning algorithm for finding outthe policies is given

31 Formalization One of foundations of our POMDM is thewidely applied Dec-POMDP However the Dec-POMDP isproposed for solving decision problem and there is no defini-tion of the goal and the observation model for the recognizerin the Dec-POMDP Additionally the multiagent joint goalmay be terminated because of achievement or interruptionThus we design the Dec-POMDM as a combination of threeparts (a) the standard Dec-POMDP (b) the joint goal andgoal termination variable (c) the observation model for therecognizer The Dec-POMDP is the foundation of the Dec-POMDM

A Dec-POMDM is a tuple ⟨119863 119878 119860 119879 119877Ω119874 120582 ℎ 119868 119866

119864 Υ 119884 119885 119862 119861⟩ where

(i) 119863 = 1 2 119899 is the set of 119899 agents(ii) 119878 is a finite set of world states 119904 which contains all

necessary information for making a decision(iii) 119860 is the finite set of joint actions(iv) 119879 is the state transition function(v) 119877 is the reward function(vi) Ω is the finite set of joint observations for agents

making a decision(vii) 119874 is the observation probability function for agents

making a decision(viii) 120582 is the discount factor(ix) ℎ is the horizon of the problem(x) 119868 is the initial state distribution at stage 119905 = 0(xi) 119866 is the set of possible joint goals(xii) 119864 is the set of goal termination variables(xiii) Υ is the observation function for the recognizer(xiv) 119884 is the finite set of joint observations for the recog-

nizer(xv) 119885 is the goal selection function(xvi) 119862 is the goal termination function(xvii) 119861 is the initial goal distribution at stage 119905 = 0

Symbols including 119863 119878 119879 Ω 119874 120582 ℎ 119868 in the Dec-POMDM have the same meanings as those in the Dec-POMDP More definition details and explanations can befound in [9 27] The reward function is defined as 119877 119866 times

119878 times 119860 times 119878 rarr R which shows that the reward depends on thejoint goal the goal set 119866 consists of all possible joint goalsthe goal termination variable set 119864 = 0 1 indicates whetherthe current goal will be continued in the next step (if 119890 isin 119864

is 0 the goal will be continued otherwise a new goal will beselected again in the next step) the observation function forthe recognizer is defined asΥ 119878times119884 rarr [0 1]Υ(119904 119910) = 119901(119910 |

119904) is the probability that the recognizer observes 119910 isin 119884 whilethe real worlds state is 119904 isin 119878 the goal selection function isdefined as 119885 119878 times 119866 rarr [0 1] 119885(119904 119892119900119886119897) = 119901(119892119900119886119897 | 119904) is theconditional probability that agents select 119892119900119886119897 isin 119866 as the newgoal while theworld state is 119904 the goal termination function isdefined as119862 119878times119866 rarr [0 1]119862(119904 119892119900119886119897) = 119901(119890 = 1 | 119904 119892119900119886119897) isthe conditional probability that agents terminate their 119892119900119886119897 isin119866 while the world state is 119904

In the Dec-POMDP the policy of the 119894th agent is definedas Π119894 Ωlowast

119894times 119860119894rarr [0 1] where 119860

119894isin 119860 is the set of possible

actions of agent 119894 Ωlowast119894is the set of observation sequences

Thus given an observation sequence 119900119894

1 119900119894

2 119900

119894

119905 the

agent 119894 selects an action with a probability defined by Π119894

Since the selection of actions depends on the history of theobservations the Dec-POMDP does not satisfy the Markovassumption This attribute makes inferring goals online veryhard (a) if we precompute policies and store them offline itwill require a very large memory because of the combination

Discrete Dynamics in Nature and Society 5

goaltminus1

etminus1

oi

tminus1oj

tminus1

ai

tminus1

aj

tminus1Agent i

Stminus1

ytminus1

Time t minus 1

goalt

et

oi

toj

tai

t

aj

t

St

yt

Time t

Figure 1 The DBN representation of the Dec-POMDM in two adjacent time slices

of possible observations (b) if we compute policies online thefiltering algorithm is infeasible because weights of possiblestates cannot be updated with only the last observation Onepossible solution is to define an interior belief variable foreach agent to filter the world state but it will make theinferring process much more complex In this paper wesimply assume that all agents are memoryless as the work in[9]Then the policy of the 119894th agent is defined asΠ119894 119866timesΩ

119894times

119860119894rarr [0 1] where Ω

119894is the set of possible observations of

agent 119894 The definition of policies in the Dec-POMDM showsthat (a) an agent does not need the history of observationsfor making decisions (b) selection of actions depends on thegoal at that stage In the next section we will further explainthe relations of variables in the Dec-POMDM by its DBNrepresentation

32 The DBN Representation After estimating the agentsrsquopolicies by a multiagent reinforcement learning algorithmwe do not need the reward function in the inference processThus in the DBN representation of the Dec-POMDM thereare six sorts of variables in total the goal the goal termi-nation the action observations for agents making decisionstate and observations for the recognizer In this section wefirst analyze how these variables are affected by other factorsThen we give the DBN representation in two adjacent timeslices

(A) Goal (119892119900119886119897) The goal 119892119900119886119897119905at time 119905 depends on the goal

119892119900119886119897119905minus1

the goal termination variable 119890119905minus1

and the state 119904119905minus1

If 119890119905minus1

= 0 agents keep their goal at time 119905 otherwise 119892119900119886119897119905is

selected depending on 119904119905minus1

(B) Goal Termination Variable (119890) The goal terminationvariable 119890

119905at time 119905 depends on the goal 119892119900119886119897

119905and state 119904

119905

at the same timeThe 119892119900119886119897119905is terminated with the probability

119901(119890119905= 1 | 119904

119905 119892119900119886119897119905)

(C) Action (119886) The action 119886119894

119905selected by agent 119894 at time 119905

depends on the goal 119892119900119886119897119905and its observation at the same

time and the distribution of actions is defined by Π119894

(D) Observations for Agents Making Decision (119900) The obser-vation 119900

119894

119905of agent 119894 at time 119905 reflects the real world state 119904

119905minus1

at time 119905 minus 1 and the agent 119894 observes 119900119894119905with the probability

119901(119900119894

119905| 119904119905minus1

)

(E) State (119904)The world state 119904119905at time 119905 depends on the state

119904119905minus1

at time 119905 minus 1 and the actions of all agents at time 119905 Thedistribution of the updated states can be computed by thestate transition function 119879

(F) Observations for the Recognizer (119910) The observation 119910119905

for the recognizer at time 119905 reflects the real world state 119904119905

at the same time and the recognizer observes 119910119905with the

probability 119901(119910119905| 119904119905)

The DBN representation of the Dec-POMDM in twoadjacent time slices presents all dependencies among vari-ables discussed above as is shown in Figure 1

In Figure 1 only actions and observations of agent 119894 andagent 119895 are presented for simplicity and each agent has noknowledge about others and can only make decision basedon its own observations Although the Dec-POMDM hasa hierarchical structure it models the task decompositionand allocation in an inexplicit way all information aboutcooperation is hidden in the joint policies From filteringtheory point of view the joint policies actually play the roleof the motion function Thus estimating policies is a keyproblem for goal inference

33 Learning the Policy Because the Dec-POMDP is essen-tially a DBN we can simply use some data-driven methodsto learn parameters from a training dataset However thetraining dataset is not always available in some cases Besideswhen the number of agents is large the DBN structurewill become large and complex which makes supervisedor unsupervised learning time consuming To solve theseproblems we assume that the agents to be recognized arerational which is reasonable when there is no history ofagents Then we can use an existing planner based onthe Dec-POMDP framework to find out the optimal orapproximately optimal policies for each possible goal

6 Discrete Dynamics in Nature and Society

Various algorithms have been proposed for solving Dec-POMDP problems Roughly these algorithms can be dividedinto two categories (a) model-based algorithms under thegeneral name of dynamic programming (b) model-free algo-rithms under the general name of reinforcement learning [8]In this paper we select a multiagent reinforcement learningalgorithm cooperative colearning based on Sarsa [9] becauseit does not need a state transition function of the world

The main idea of cooperative colearning is that at eachstep one chooses a subgroup of agents and updates theirpolicies to optimize the task given the rest of the agentshave fixed plans then after a number of iterations the jointpolicies can converge to a Nash equilibrium In this paperwe only consider settings where agents are homogeneousAll agents share the same observation model and policiesThus we only need to define one POMDP 119872 for all agentsAll parameters of119872 can be obtained directly from the givenDec-POMDP except for the transition function 119879

1015840 Later wewill show how to compute 1198791015840 from 119879 which is the transitionfunction of the Dec-POMDPThe Dec-POMDP problem canbe solved by the following steps [9]

Step 1 We set 119905 = 0 and start from an arbitrary policy Π0

Step 2 We select an arbitrary agent and compute the statetransition function 119879

1015840 of 119872 considering that policies of allagents are Π

119905 except the selected agent that is refining theplan

Step 3 We computeΠlowast which is the optimal policy of119872 andset Π119905+1 = Π

lowast

Step 4 Weupdate the policy of each agent toΠ119905+1 set 119905 = 119905+1and return to Step 2

In Step 2 the state transition function of the POMDP forany agent can be computed by formula (1) (we assume that werefine the plan for the first agent)

1198791015840(119904 1198861 1199041015840) = sum

(1199002119900119899)

119879 (119904 1198861 Π119905(1199002) Π

119905(119900119899) 1199041015840)

sdot

119899

prod

119894=2

1198741015840(119904 119900119899)

(1)

where 1198741015840 is the observation function defined in 119872 119886119894is the

action of the 119894th agent 119900119894 is the observation of the 119894th agent119879(119904 1198861 Π119905(1199002) Π

119905(119900119899) 1199041015840) is the probability that the state

transits from 119904 to 1199041015840 while the action of the first agent is 119886

1

and other agents choose actions based on their observationsand policy Π119905

Unfortunately computing formula (1) is always verydifficult in complex scenarios Thus we apply the Sarsaalgorithm for finding out the optimal policy of119872 (in Steps 2and 3) [8] In this process the POMDP problem is mapped tothe MDP problem by regarding the observation as the worldstate Then agents get feedback from the environment andwe do not need to compute the updated reward function andstate transition function

4 Inference

In the Dec-POMDM the goal is defined as a hidden stateindirectly reflected by observations In this case many filter-ing algorithms can be used to infer the goals However in themultiagent setting the world state space is large because ofcombinations of agentsrsquo states and goals Besides the Dec-POMDMmodels a discrete system To solve this problem weuse a MF algorithm to infer multiagent joint goals under theframework of the Dec-POMDM It has been proved efficientwhen the state space is large and discrete

Nyolt et al discussed the theory of the MF and howto apply it in a casual model in [12 13] Their main ideacan still work in our paper but there are also differencesbetween the Dec-POMDM and the causal model (a) theinitial distribution of states in our model is uncertain (b)the results of actions in our model are uncertain (c) we donot model duration of actions for simplicity Thus we haveto modify the MF for casual models and make it available forthe Dec-POMDM

When the MF is applied to infer goals under the frame-work of the Dec-POMDM the set of particles is definedas 119909119894

119905119894=1119873

119905 where 119909119894

119905= ⟨119892119900119886119897

119894

119905 119890119894

119905 119904119894

119905⟩ 119873119905is the number

of particles at time 119905 and the weight of 119894th particle is119908119894

119905 The detailed procedures of goal inference are given in

Algorithm 1119898119905is the set of 119909

119894

119905 119908119894

119905119894=1119873

119905 which contains all par-ticles and their weights at each stage When we initialize119909119894

0 119908119894

0119894=1119873

0 the weights are computed by

119908119894

0= 119901 (119904

119894

0| 1199100) 119901 (119892119900119886119897

119894

0) 119901 (119890119894

0| 119892119900119886119897

119894

0 119904119894

0) (2)

119901(119892119900119886119897119894

0) and 119901(119890119894

0| 119892119900119886119897

119894

0 119904119894

0) are provided by the model and

119901(119904119894

0| 1199100) is computed by

119901 (119904119894

0| 1199100) prop 119901 (119910

0| 119904119894

0) 119901 (119904119894

0) (3)

119898119886119901 is a temp memory of particles and their weights whichare transited from one specific particle Because a worldstate 119904

119905may be after the execution of different actions from

given 119904119905minus1

we have to use a LOOKUP operation to querythe record in 119898119886119901 which covers the new particle 119909

119905 The

operation LOOKUP(119909119905 119898119886119901) searches 119909

119905in 119898119886119901 if there

is a record ⟨119909119895

119905 119908119895

119905⟩ in 119898119886119901 which covers 119909

119905 the operation

returns this record otherwise it returns empty This processis time consuming if we scan the 119898119886119901 for every query Onealternative method is to build a matrix for each 119898119886119901 whichrecords the indices of all reached particles Then if the indexof 119909119905is null we add a new record in 119898119886119901 and update the

index matrix otherwise we can read the index of 119909119905from the

matrix andmerge the weight directly After we finish building119898119886119901 its index matrix can be deleted to release memory Wealso need to note that this solution saves computing time butneeds extra memory

The operation PUT(119898119886119901 ⟨119909119905 119908119905⟩) adds a new record

⟨119909119905 119908119905⟩ in 119898119886119901 and indexes this new record The generated

119898119886119901 contains a group of particles and correspondingweightsSome of these particles may have been covered in 119898

119905 Thus

we have to use a MERGE operation to get a new 119898119905by

Discrete Dynamics in Nature and Society 7

Set 119905 = 01198980larr 119909

119894

0 119908119894

0119894=11198730 Initialization

For 119905 = 1 max 119905119898119905larr 119873119906119897119897 We totally have max 119905 + 1 observations

Foreach 119909119894

119905minus1 119908119894

119905minus1 isin 119898

119905minus1

119898119886119901 larr997888 119873119906119897119897

If 119890119894119905minus1

= 1

Foreach 119892119900119886119897119905which satisfies that 119901(119892119900119886119897

119905| 119904119894

119905minus1) gt 0

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1) sdot 119901 (119892119900119886119897

119905| 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in119898119886119901

Else119892119900119886119897119905larr997888 119892119900119886119897

119894

119905minus1

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in map

MERGE(119898119905 119898119886119901)

UPDATE(119898119905)

PRUNE(119898119905)

NORMALIZE(119898119905)

DELETE(119898119905minus1

)

Algorithm 1 Goal inference based on the marginal filter under the framework of Dec-POMDM

merging119898119886119901 and the existing119898119905 In this process if a particle

119909119905in 119898119886119901 has not appeared in 119898

119905 we directly put 119909

119905and

its corresponding weight 119908119905into 119898

119905 otherwise we need to

add 119908119905into the weight of the record in 119898

119905which covers 119909

119905

Similarly an indexmatrix can also be used to save computingtime in the MERGE operation

Under the framework of Dec-POMDM we update 119908119894119905by

119908119894

119905= 119908119894

119905sdot 119901 (119910119905| 119904119894

119905) (4)

where 119908119894119905and 119909

119894

119905= ⟨119892119900119886119897

119894

119905 119890119894

119905 119904119894

119905⟩ are the weight and particle

respectively 119894th of record in 119898119905 and 119901(119910

119905| 119904119894

119905) is the

observation function for the recognizer in the model Thedetails of PRUNE operation can be found in [13] and we canuse the existing pruning technique directly in our paper

5 Experiments

51ThePredator-Prey Problem Thepredator-prey problem isa classic problem to evaluate multiagent decision algorithms

[9] However it cannot be used directly in this paper becausepredators have only one possible goal Thus we modify thestandard problem by setting more than one prey on the mapand our aim is to recognize the real target of the predatorsat each time Figure 2 shows the grid-based map in ourexperiments

In our scenario the map consists of 5 times 5 grids twopredators (red diamonds Predator PX and Predator PY)and two preys (blue stars Prey PA and Prey PB) The twopredators select one of the preys as their common goal andmove around to capture it As is shown in Figure 2 theobservation of the predators is not perfect a predator onlyknows the exact position of itself and others which are in thenearest 8 grids If another agent is out of its observing rangethe predator only knows its direction (8 possible directions)For the situation in Figure 2 Predator PX observes that noneis near to itself Prey PB is in the north Prey PA is in thesoutheast and Predator PY is in the south Predator PYobserves that none is near itself Prey PB and Predator PYare in the north and Prey PA is in the east In each step

Figure 2: The grid-based map in the predator-prey problem, showing Predators PX and PY, Preys PA and PB, and the observations Ω available to each predator for making a decision.

In each step, all predators and preys can move into one of the four adjacent grids (north, south, east, and west) or stay in the current grid. When two or more agents try to move into the same grid, or an agent tries to move off the map, they have to stay in the current grid. The predators achieve their goal if and only if both of them are adjacent to their target. Additionally, the predators may also change their goal before they capture a prey. The recognizer can get the exact positions of the two preys, but its observations of the predators are noisy. We need to compute the posterior distribution of the predators' goals from the observation trace.

The modified predator-prey problem can be modeled by the Dec-POMDM. Some important elements are as follows:

(a) D: the two predators.
(b) S: the positions of the predators and preys.
(c) A: five actions for each predator, namely, moving in the four directions and staying.
(d) G: Prey PA or Prey PB.
(e) Ω: the directions of agents far away and the real positions of agents nearby.
(f) Y: the real positions of the preys and the noisy positions of the predators.
(g) R: the predators get a reward of +1 once they achieve their goal; otherwise, the immediate reward is 0.
(h) h: infinite horizon.

With the definitions above, the effects of the predators' actions are uncertain, and the state transition function depends on the distribution of the preys' actions. Thus, the actions of the preys actually play the role of events in discrete dynamic systems.
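As an illustration of the transition dynamics described above, the sketch below implements the movement and collision rules of the modified problem. It is a sketch under our own assumptions: the agent names ("PX", "PY"), the coordinate convention, and the function names are ours, and "adjacent" is read as sharing an edge with the target cell.

```python
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0),
         "west": (-1, 0), "stay": (0, 0)}
GRID = 5  # the 5 x 5 map

def step(positions, actions):
    """Apply one joint move. positions/actions are dicts keyed by agent name.
    Agents that would leave the map or collide stay where they are."""
    proposals = {}
    for agent, (x, y) in positions.items():
        dx, dy = MOVES[actions.get(agent, "stay")]
        nx, ny = x + dx, y + dy
        # moving off the map means staying put
        proposals[agent] = (nx, ny) if 0 <= nx < GRID and 0 <= ny < GRID else (x, y)

    # any cell claimed by two or more agents is a collision: everyone involved stays
    new_positions = {}
    for agent, cell in proposals.items():
        clash = sum(1 for other in proposals.values() if other == cell) > 1
        new_positions[agent] = positions[agent] if clash else cell
    return new_positions

def goal_achieved(positions, target):
    """The predators capture the target iff both occupy cells adjacent to it."""
    tx, ty = positions[target]
    def adjacent(p):
        return abs(p[0] - tx) + abs(p[1] - ty) == 1
    return adjacent(positions["PX"]) and adjacent(positions["PY"])
```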

5.2. Settings. In this section, we provide additional details about the scenario and the parameters used in the policy learning and inference algorithms.

(A) The Scenario. The preys are senseless: they select each action with equal probability. The initial positions of the agents are uniformly distributed. The initial goal distribution is p(goal_0 = Prey PA) = 0.6 and p(goal_0 = Prey PB) = 0.4.

We simplify the goal termination function in the following way: if the predators capture their target, the goal is achieved; otherwise, the predators change their goal with a probability of 0.05 in every state. The observation for the recognizer, y_t ∈ Y, reveals the real position of each predator with a probability of 0.5, and the observation may be one of the 8 neighbors of the current grid, each with a probability of 0.5/8. When the predator is on the edge of the map, the observation may be outside the map. The observed results of the agents do not affect each other.
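The recognizer's noisy observation of a predator described above can be sketched as follows. This is a minimal illustration; the helper name is ours, and the uniform split of the remaining 0.5 probability mass over the 8 neighbouring cells follows our reading of the text.

```python
import random

def observe_predator(x, y):
    """Return a noisy recognizer observation of a predator at cell (x, y):
    the true cell with probability 0.5, otherwise one of its 8 neighbours
    (each with probability 0.5 / 8); near the border the observation may
    fall outside the 5 x 5 map, as noted in the paper."""
    if random.random() < 0.5:
        return (x, y)
    neighbours = [(x + dx, y + dy)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if not (dx == 0 and dy == 0)]
    return random.choice(neighbours)
```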

(B) Multiagent Reinforcement Learning. The discount factor is λ = 0.8. A predator selects an action a^i given the observation o^i_t with probability

p(a^i | o^i_t) = exp(Q(o^i_t, a^i)/β) / Σ_{a'} exp(Q(o^i_t, a')/β),            (5)

where the sum runs over the five actions and β = 0.1 is the Boltzmann temperature. We set β > 0 as a constant, which means that the predators always select approximately optimal actions. In our scenario, the Q-value converges after 750 iterations. In the learning process, if the predators cannot achieve their goal within 10,000 steps, we reinitialize their positions and begin the next episode.
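A Boltzmann (softmax) action selection of this kind can be sketched as below. The sketch assumes Q is a table (dictionary) indexed by (observation, action); the function and parameter names are our own.

```python
import math
import random

def boltzmann_action(Q, obs, actions, beta=0.1):
    """Sample an action with probability proportional to exp(Q(obs, a)/beta),
    as in formula (5); a small positive beta concentrates the choice on the
    approximately optimal action."""
    qs = [Q.get((obs, a), 0.0) / beta for a in actions]
    m = max(qs)
    prefs = [math.exp(q - m) for q in qs]   # subtract the max for numerical stability
    r = random.uniform(0.0, sum(prefs))
    cumulative = 0.0
    for a, p in zip(actions, prefs):
        cumulative += p
        if r <= cumulative:
            return a
    return actions[-1]
```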

(C) The Marginal Filter. In the MF inference, a particle consists of the joint goal, the goal termination variable, and the positions of the predators and preys. Although there are 25 possible positions for each predator or prey, after getting a new observation we can identify the positions of the preys, and there are only 9 possible positions for each predator. Thus, the number of possible values of a particle at one step never exceeds 9 × 9 × 2 × 2 = 324 after the UPDATE operation. In our experiments, we simply set the maximum number of particles to 324; then we do not need to prune any possible particles, which means that we perform exact inference. We also make use of the real settings of the Dec-POMDM and the real policies of the predators in the MF inference.


Table 1: The details of two traces.

Trace number    Durations       Targets    Interrupted
1               t ∈ [1, 7]      Prey PA    Yes
                t ∈ [8, 13]     Prey PB    No
4               t ∈ [1, 10]     Prey PB    No

5.3. Results and Discussion. With the learned policy, we ran the simulation model repeatedly and generated a test dataset consisting of 100 traces. A trace contains 11.83 steps on average, and the number of steps in one trace varies from 6 to 34 with a standard deviation of 5.36. We also found that the predators changed their goals at least once in 35% of the test traces. To validate our method and compare it with the baselines, we did experiments on three aspects: (a) to discuss the details of the recognition results obtained with our method, we computed the recognition results of two specific traces; (b) to show the advantages of the Dec-POMDM in modeling, we compared the recognition performances when the problem was modeled as a Dec-POMDM and as an HMM; (c) to show the efficiency of the MF under the framework of the Dec-POMDM, we compared the recognition performances when the goal was inferred by the MF and by the PF.

In the second and the third parts, performances were evaluated by the statistical metrics precision, recall, and F-measure. Their meanings and computation details can be found in [31]. The value of each metric is between 0 and 1, and a higher value means a better performance. Since these metrics can only evaluate the recognition results at a single step, and the traces in the dataset have different time lengths, we defined a positive integer k (k = 1, 2, ..., 5). The metric with index k means that the corresponding observation sequences are y^j_{t ∈ [1, ⌈k·length_j/5⌉]} for j ∈ [1, 100]. Here y^j_{t ∈ [1, ⌈k·length_j/5⌉]} is the observation sequence from time 1 to time ⌈k·length_j/5⌉ of the j-th trace, and length_j is the length of the j-th trace. We need to recognize goal_t for each observation sequence. Thus, metrics with different k show the performances of the algorithms in different phases of the simulation. Additionally, we regarded the goal with the largest probability as the final recognition result.
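The evaluation protocol with index k can be sketched as follows: for every trace, the recognizer is scored on the observation prefix of length ⌈k·length_j/5⌉. The function names, the accuracy-style score, and the form of the recognize callable are our assumptions; per-goal precision, recall, and F-measure follow by restricting to the cases predicted or labelled as each goal.

```python
import math

def prefix_length(trace_length, k):
    """Length of the observation prefix used for the metric with index k."""
    return math.ceil(k * trace_length / 5)

def evaluate_at_k(traces, recognize, k):
    """traces: list of (observations, true_goal_sequence);
    recognize(observations) returns the goal with the largest posterior
    probability after the last observation of the prefix."""
    correct = 0
    for observations, true_goals in traces:
        horizon = prefix_length(len(observations), k)
        predicted = recognize(observations[:horizon])
        correct += int(predicted == true_goals[horizon - 1])
    return correct / len(traces)
```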

(A) The Recognition Results of the Specific Traces. To show the details of the recognition results obtained with our method, we selected two specific traces from the dataset (number 1 and number 4). These two traces were selected because Trace number 1 was the first trace in which the goal was changed before it was achieved, and Trace number 4 was the first trace in which the goal was kept until it was achieved. The detailed information about the two traces is shown in Table 1.

In Trace number 1, the predators selected Prey PA as their first goal from t = 1 to t = 7. Then the goal was changed to Prey PB, which was achieved at t = 13. In Trace number 4, the predators selected Prey PB as their initial goal. This goal was kept until it was achieved at t = 14. Given the real policies and the other parameters of the Dec-POMDM, including T, O, Υ, I, Z, C, and B, we used the MF to compute the posterior distribution of goals at each time. The recognition results are shown in Figure 3.

In Figure 3(a), the probability of the real goal (Prey PA) increases quickly during the initial period. At t = 3, the probability of Prey PA exceeds 0.9 and keeps a high value until t = 7. When the goal is changed at t = 8, our method responds very quickly, because the predators select highly certain joint actions at this time. In Figure 3(b), the probability of the false goal increases at t = 2, and the probability of the real goal (Prey PB) is low at first. This failure happens because the observations suggest that the predators selected joint actions that would have a small probability if the goal were Prey PB. Nevertheless, the probability of the real goal increases continuously and exceeds that of Prey PA after t = 5. With the recognition results of the two specific traces, we conclude that the Dec-POMDM and the MF can perform well regardless of goal changes.

(B) Comparison of the Performances of the Dec-POMDM and HMMs. To show the advantages of the Dec-POMDM, we modeled the predator-prey problem with the well-known HMM as a baseline. In the HMM, we set the hidden state x as the tuple ⟨goal, p_X, p_Y, p_A, p_B⟩, where p_X, p_Y, p_A, and p_B are the positions of Predator PX, Predator PY, Prey PA, and Prey PB, respectively. The observation model for the recognizer in the HMM was the same as that in the Dec-POMDM. Thus, there were 25 × 24 × 23 × 22 × 2 = 607,200 possible states in this HMM, and the dimension of the transition matrix was 607,200 × 607,200. Since the state space and the transition matrix were too large, an unsupervised learning method such as the Baum-Welch algorithm was infeasible for this problem. Instead, we used simple supervised learning: counting the state transitions in labeled training datasets. With the real policies of the predators, we ran the simulation model repeatedly and generated five training datasets. The detailed information about these datasets is shown in Table 2.

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way, we got the new datasets HMM-100b and HMM-200b, which did not cover any trace in the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions. Then the MF was used to infer the goal. However, in the inference process we may reach some states that have not been explored in the training dataset (HMM-200a only explores 492,642 states, while there are 607,200 possible states in total). In this case, we assumed that the hidden state would transit to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compare the performances of the Dec-POMDM and the HMMs. The recognition metrics are shown in Figure 4.
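Estimating the HMM transition matrix by counting, as done for these baselines, can be sketched as follows. The sketch stores a sparse dictionary of counts rather than the full 607,200 × 607,200 matrix; the function and variable names are ours.

```python
from collections import defaultdict

def count_transitions(labeled_traces):
    """labeled_traces: list of hidden-state sequences, each state being the
    tuple (goal, p_X, p_Y, p_A, p_B). Returns p(next_state | state) as a
    nested dict; states never seen in training must be handled separately
    (e.g., with a uniform fallback, as described in the paper)."""
    counts = defaultdict(lambda: defaultdict(int))
    for states in labeled_traces:
        for current, nxt in zip(states[:-1], states[1:]):
            counts[current][nxt] += 1

    transition = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        transition[current] = {nxt: c / total for nxt, c in successors.items()}
    return transition
```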

Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF. (a) Recognition results of Trace number 1; (b) recognition results of Trace number 4.

Table 2: Information about the training datasets for the HMM.

Name of the dataset    Number of traces    Number of explored states    Percentage of traces in the test dataset covered by the training dataset
HMM-50                 50,000              279,477                      0%
HMM-100a               100,000             384,623                      79%
HMM-100b               100,000             384,621                      0%
HMM-200a               200,000             492,642                      79%
HMM-200b               200,000             492,645                      0%

Figure 4 shows that the results of the Dec-POMDM and the HMMs trained on different datasets are similar in terms of precision, recall, and F-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after k = 4; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though the number of traces in HMM-200a was twice as large as that in HMM-100a. The main reason is that these training datasets were overfitted: there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a. In this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on the three metrics, almost as well as the overfitted trained models, and (b) the learned HMMs suffered from an overfitting problem, and there will be a serious performance decline if the training dataset does not cover most traces in the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also showed that, when we model the problem with the HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. Actually, the dataset HMM-200a, which consisted of 200,000 traces, only covered 81.13% of all possible states, and only 71.46% of the states in HMM-200a had been reached more than once. Thus, we can conclude that HMM-200a will have a poor performance if agents select actions with higher uncertainty. Besides, there is no doubt that the size of the training dataset will be extremely large if most states are to be reached a large number of times. In real applications, it is very hard to obtain a training dataset of such a large size, especially when all traces must be labeled.

We also performed a Wald test over the Dec-POMDM and the HMMs with different training datasets to check whether their recognition results come from different distributions. Given our test dataset, there were 100 goals to be recognized for each value of k. Let TX be the set of samples obtained from the Dec-POMDM: we set TX_i = 1 if the recognition result of the Dec-POMDM is correct on test case i (i = 1, 2, ..., 100) and TX_i = 0 otherwise. Let TY be the set of samples obtained from the baseline model (one of the HMMs): we set TY_i = 1 if the recognition result of the baseline is correct on test case i and TY_i = 0 otherwise. We define TD_i = TX_i − TY_i and let δ = E(TD_i) = E(TX_i) − E(TY_i) = P(TX_i = 1) − P(TY_i = 1); the null hypothesis is δ = 0, which means that the recognition results from the two models follow the same distribution. A more detailed description of the test process can be found in [32]. The matrix of p values is shown in Table 3.
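The paired Wald test described above can be sketched as follows: δ is estimated by the mean of the per-case differences TD_i, its standard error from the sample variance, and the p value from the normal approximation. This is a sketch following our reading of the standard Wald test (cf. [32]); the function names are ours.

```python
import math
from statistics import NormalDist

def paired_wald_test(tx, ty):
    """tx, ty: lists of 0/1 correctness indicators for the two models on the
    same test cases. Returns the two-sided p value for the null hypothesis
    delta = E(TX_i) - E(TY_i) = 0."""
    n = len(tx)
    d = [x - y for x, y in zip(tx, ty)]
    delta_hat = sum(d) / n
    var_hat = sum((di - delta_hat) ** 2 for di in d) / n   # plug-in variance estimate
    se = math.sqrt(var_hat / n)
    if se == 0.0:
        # all differences identical: p = 1 if they are all zero, otherwise reject
        return 1.0 if delta_hat == 0.0 else 0.0
    w = delta_hat / se
    return 2 * (1 - NormalDist().cdf(abs(w)))
```

Note that when both models are correct on every test case (as happens at k = 1), all differences are zero and the p value is 1, consistent with the first column of Table 3.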

The p values in Table 3 show that the recognition results of the Dec-POMDM follow different distributions from those of HMM-50, HMM-100b, and HMM-200b, respectively, with high confidence. We cannot reject the null hypothesis when the baseline is an overfitted trained model. The Wald test results are consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and the test results showed the same trend.


Figure 4: Recognition metrics of the Dec-POMDM and the HMMs. (a) Precision; (b) recall; (c) F-measure.

Table 3: The p values of the Wald test over the Dec-POMDM and the HMMs.

                                k = 1    k = 2    k = 3    k = 4    k = 5
Dec-POMDM versus HMM-50         1        <0.01    <0.01    <0.01    <0.01
Dec-POMDM versus HMM-100a       1        0.0412   0.0478   0.2450   <0.01
Dec-POMDM versus HMM-100b       1        <0.01    <0.01    <0.01    <0.01
Dec-POMDM versus HMM-200a       1        0.3149   0.1929   0.4127   1
Dec-POMDM versus HMM-200b       1        <0.01    <0.01    <0.01    <0.01

(C) Comparison of the Performances of the MF and the PF. Here, we used the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared their performances to the MF. All inference was done under the framework of the Dec-POMDM. We have to note that, when the PF uses weighted particles to approximate the posterior distribution of the goal, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case, we simply reset all weights to 1/N_p to continue the inference, where N_p is the number of particles in the PF. The recognition metrics of the MF and the PF are shown in Figure 5.
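The weight-reset safeguard used for the PF baseline can be sketched as follows. This is a minimal illustration under our assumptions; in a standard sampling importance resampling filter this check would run after the weight update and before resampling.

```python
def reset_if_degenerate(weights):
    """If every particle weight has collapsed to 0 (no particle explains the
    observation), restart with uniform weights 1/Np, as done for the PF baseline."""
    if all(w == 0.0 for w in weights):
        np_particles = len(weights)
        return [1.0 / np_particles] * np_particles
    return weights
```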

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1000, 2000, 4000, 6000, and 16000 particles, respectively. All filters had similar precision. However, considering the recall and the F-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the rest of the PFs. We got these results because an exact MF (without the pruning step) is used in this section, and the PF approximates the real posterior distribution better with more particles.

Similar to the testing method used in Part (B), here we also performed the Wald test on the MF and the PF with different numbers of particles. The matrix of p values is shown in Table 4.

Table 4: The p values of the Wald test over the MF and the PFs.

                      k = 1    k = 2    k = 3    k = 4    k = 5
MF versus PF-1000     1        0.0176   <0.01    <0.01    <0.01
MF versus PF-2000     1        0.3637   0.2450   0.0300   <0.01
MF versus PF-4000     1        1        0.0910   0.0412   0.0115
MF versus PF-6000     1        0.4127   0.1758   0.1758   0.0218
MF versus PF-16000    1        0.5631   0.0412   1        0.1531

Figure 5: Recognition metrics of the MF and the PF. (a) Precision; (b) recall; (c) F-measure.

The null hypothesis is that the recognition results of the baseline and the MF follow the same distribution. Generally, with larger k and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there was variance in the results inferred by the PF (due to the fact that the PF performs approximate inference), we ran the PF with 6000 particles 10 times. The mean value, together with the 90% belief interval of the metrics at different k, is shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We need to note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bounds of the 90% belief interval also indicate the extrema of the PF. We can see that the metrics of the MF are better than the mean of the PF when k > 1, and even better than the maximum value of the PF except for k = 4. Actually, the MF also outperforms 90% of the runs of the PF at k = 4. Additionally, the MF only needs, on average, 75.78% of the time needed by the PF for inference at each step. Thus, the MF consumes less time and has a better performance than the PF with 6000 particles.

Figure 6: Metrics of the PF with 6000 particles and the MF, showing the mean of the PF, the extrema of the PF, and the 90% belief interval of the PF. (a) Precision; (b) recall; (c) F-measure.

Generally, the computational complexities of the PF and the MF are both linear functions of the number of particles. When an action has multiple possible results, the MF consumes more time than the PF if their numbers of particles are equal. However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete. Actually, the number of possible states in the MF after the UPDATE operation never exceeds 156 in the test dataset. At the same time, the PF has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.

6. Conclusion and Future Work

In this paper, we propose a novel model for solving multiagent goal recognition problems, the Dec-POMDM, and present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM represents the agents' cooperation in a compact way: details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that the agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and the agents may change their goals before they are achieved. With this scenario, we compare our method to other baselines, including the HMM and the PF. The experimental results show that the Dec-POMDM, together with the MARL and MF algorithms, can recognize the multiagent goal well whether the goal changes or not; that the Dec-POMDM outperforms the HMM in terms of precision, recall, and F-measure; and that the MF can infer goals more efficiently than the PF.

In the future, we plan to apply the Dec-POMDM to more complex scenarios. Further research on the pruning technique of the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and the DFG Research Training Group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79-84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29-36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631-2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009-2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32-39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602-607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463-468, Washington, DC, USA, 2002.

[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009-2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705-711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16-32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526-533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066-2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283-288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100-107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447-1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874-1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49-56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib et al., Eds., pp. 205-224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-031.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011-2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451-499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287-296, June 2004.

[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99-106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87-133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37-43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.



Discrete Dynamics in Nature and Society 7

Set 119905 = 01198980larr 119909

119894

0 119908119894

0119894=11198730 Initialization

For 119905 = 1 max 119905119898119905larr 119873119906119897119897 We totally have max 119905 + 1 observations

Foreach 119909119894

119905minus1 119908119894

119905minus1 isin 119898

119905minus1

119898119886119901 larr997888 119873119906119897119897

If 119890119894119905minus1

= 1

Foreach 119892119900119886119897119905which satisfies that 119901(119892119900119886119897

119905| 119904119894

119905minus1) gt 0

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1) sdot 119901 (119892119900119886119897

119905| 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in119898119886119901

Else119892119900119886119897119905larr997888 119892119900119886119897

119894

119905minus1

Foreach = 1198861 1198862 119886

119899which satisfies thatprod119899

119896=1119901(119886119896| 119892119900119886119897

119905 119904119894

119905minus1) gt 0

Foreach 119904119905which satisfies that 119879(119904119894

119905minus1 119904119905) gt 0

Foreach 119890119905which satisfies that 119901(119890

119905| 119904119905 119892119900119886119897

119905) gt 0

119908119905= 119908119894

119905minus1sdot 119901 (119890119905| 119904119905 119892119900119886119897

119905) sdot 119879 (119904

119894

119905minus1 119904119905) sdot

119899

prod

119896=1

119901 (119886119896| 119892119900119886119897

119905 119904119894

119905minus1)

119909119905larr 119892119900119886119897

119905 119904119905 119890119905 A temp memory of a particle

⟨119909119895

119905 119908119895

119905⟩ larr LOOKUP(119909

119905 119898119886119901) Find out the record which is equal to 119909

119905in119898119886119901

If ⟨119909119895119905 119908119895

119905⟩ is not empty

⟨119909119895

119905 119908119895

119905⟩ larr ⟨119909

119895

119905 119908119895

119905+ 119908119905⟩ Merge the weight

ElsePUT(119898119886119901 ⟨119909

119905 119908119905⟩) Add a new record in map

MERGE(119898119905 119898119886119901)

UPDATE(119898119905)

PRUNE(119898119905)

NORMALIZE(119898119905)

DELETE(119898119905minus1

)

Algorithm 1 Goal inference based on the marginal filter under the framework of Dec-POMDM

merging119898119886119901 and the existing119898119905 In this process if a particle

119909119905in 119898119886119901 has not appeared in 119898

119905 we directly put 119909

119905and

its corresponding weight 119908119905into 119898

119905 otherwise we need to

add 119908119905into the weight of the record in 119898

119905which covers 119909

119905

Similarly an indexmatrix can also be used to save computingtime in the MERGE operation

Under the framework of Dec-POMDM we update 119908119894119905by

119908119894

119905= 119908119894

119905sdot 119901 (119910119905| 119904119894

119905) (4)

where 119908119894119905and 119909

119894

119905= ⟨119892119900119886119897

119894

119905 119890119894

119905 119904119894

119905⟩ are the weight and particle

respectively 119894th of record in 119898119905 and 119901(119910

119905| 119904119894

119905) is the

observation function for the recognizer in the model Thedetails of PRUNE operation can be found in [13] and we canuse the existing pruning technique directly in our paper

5 Experiments

51ThePredator-Prey Problem Thepredator-prey problem isa classic problem to evaluate multiagent decision algorithms

[9] However it cannot be used directly in this paper becausepredators have only one possible goal Thus we modify thestandard problem by setting more than one prey on the mapand our aim is to recognize the real target of the predatorsat each time Figure 2 shows the grid-based map in ourexperiments

In our scenario the map consists of 5 times 5 grids twopredators (red diamonds Predator PX and Predator PY)and two preys (blue stars Prey PA and Prey PB) The twopredators select one of the preys as their common goal andmove around to capture it As is shown in Figure 2 theobservation of the predators is not perfect a predator onlyknows the exact position of itself and others which are in thenearest 8 grids If another agent is out of its observing rangethe predator only knows its direction (8 possible directions)For the situation in Figure 2 Predator PX observes that noneis near to itself Prey PB is in the north Prey PA is in thesoutheast and Predator PY is in the south Predator PYobserves that none is near itself Prey PB and Predator PYare in the north and Prey PA is in the east In each step

8 Discrete Dynamics in Nature and Society

PA

PB

PX

PY

PX

PY PA

PB

PY PA

PB PX

Observation of Predator PY

Observation of Predator PX

agents making a decisionΩ observations for

Figure 2 The grid-based map in the predator-prey problem

all predators and preys can get into one of the four adjacentgrids (north south east and west) or stay at the currentgrid When two or more agents try to get into the samegrid or try to get out of the map they have to stay in thecurrent grid The predators can achieve their goal if and onlyif both of them are adjacent to their target Additionally thepredators may also change their goal before they capture aprey The recognizer can get the exact positions of the twopreys but its observations of the predators are noisyWe needto compute the posterior distribution of predatorsrsquo goals withthe observation trace

The modified predator-prey problem can be modeled bythe Dec-POMDM Some important elements are as follows

(a) 119863 the two predators(b) 119878 the positions of predators and preys(c) 119860 five actions for each predator moving into four

directions and staying(d) 119866 Prey PA or Prey PB(e) Ω the directions of agents far away and the real

positions of agents nearby(f) 119884 the real positions of preys and the noisy positions

of predators(g) 119877 predators getting a reward +1 once they achieve

their goal otherwise the immediate reward is 0(h) ℎ infinite horizons

With the definition above the effects of predatorrsquos actionsare uncertain and the state transition function depends onthe distribution of preysrsquo actions Thus actions of preysactually play the role of events in discrete dynamic systems

52 Settings In this section we provide additional details forthe scenario and the parameters used in the policy learningand inference algorithm

(A) The Scenario The preys are senseless they select eachaction with equal probability Initial positions of agents are

uniform The initial goal distribution is that 119901(1198921199001198861198970

=

119875119903119890119910119860) = 06 and 119901(1198921199001198861198970= 119875119903119890119910119861) = 04

We simplify the goal termination function in the follow-ing way if predators capture their target the goal is achievedotherwise the predators change their goal with a probabilityof 005 for every stateThe observation for the recognizer 119910

119905isin

119884 reveals the real position of each predator with a probabilityof 05 and the observation may be one of 8 neighbors of thecurrent grid with a probability of 058 When the predator ison the edge of the map the observation may be out of themap The observed results of the agents do not affect eachother

(B) Multiagent Reinforcement Learning The discount factoris 120582 = 08 The predator selects an action 119886

119894given the

observation 119900119894

119905with a probability

119901 (119886119894| 119900119894

119905) =

exp (119876 (119900119894

119905 119886119894) 120573)

sum5

119894=1exp (119876 (119900

119894

119905 119886119894) 120573)

(5)

where 120573 = 01 is the Boltzmann temperature We set 120573 gt

0 as a constant which means that predators always selectapproximately optimal actions In our scenario the 119876-valueconverges after 750 iterations In the learning process ifpredators cannot achieve their goal in 10000 steps we willreinitialize their positions and begin next episode

(C) The Marginal Filter In the MF inference a particleconsists of the joint goal the goal termination variable andthe positions of predators and preys Although there are 25possible positions for each predator or prey after gettingnew observation we can identify the positions of preys andthere are only 9 possible positions for each predator Thusthe number of possible values of particles at one step neverexceeds 9 times 9 times 2 times 2 = 324 after the UPDATE operation Inour experiments we simply set the max number of particlesas 324 then we do not need to prune any possible particleswhichmeans that we exploit an exact inferenceWe alsomakeuse of real settings in the Dec-POMDM and the real policiesof predators in the MF inference

Discrete Dynamics in Nature and Society 9

Table 1 The details of two traces

Trace number Durations Targets Interrupted

1 119905 isin [1 7] Prey PA Yes119905 isin [8 13] Prey PB No

4 119905 isin [1 10] Prey PB No

53 Results and Discussion With the learned policy we runthe simulation model repeatedly and generated a test datasetconsisting of 100 traces There are 1183 steps averagely in onetrace and the number of steps in one trace varies from 6to 34 with a standard deviation of 536 We also found thatthe predators changed their goals for at least once in 35of the test traces To validate our method and compare itwith baselines we did experiments on three aspects (a) todiscuss the details of the recognition results obtained withour method we computed the recognition results of twospecific traces by our method (b) to show the advantages oftheDec-POMDM inmodeling we compared the recognitionperformances when the problem was modeled as a Dec-POMDM and as an HMM (c) to show efficiency of the MFunder the framework of the Dec-POMDM we compared therecognizing performances when the goal was inferred by theMF and the PF

In the second and the third parts performances wereevaluated by statistic metrics precision recall and 119865-measure Their meanings and computation details can befound in [31] The value of the three metrics is between 0and 1 a higher value means a better performance Sincethese metrics can only evaluate the recognition results ata single step and traces in the dataset had different timelengths we defined a positive integer 119896 (119896 = 1 2 5)The metric with 119896means that the corresponding observationsequences are 119910

119895isin1100

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil Here 119910119895

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil

is theobservation sequence from time 1 to time lceil119896 lowast 119897119890119899119892119905ℎ

1198955rceil of

the 119895th trace and 119897119890119899119892119905ℎ119895 is the length of the 119895th trace And

we need to recognize 119892119900119886119897119905for each observation sequence

Thus metrics with different 119896 show the performances ofalgorithms in different phases of the simulation Additionallywe regarded the destination with largest probability as thefinal recognition result

(A)The Recognition Results of the Specific Traces To show thedetails of the recognition results obtained with our methodwe selected two specific traces from the dataset (number 1and number 4)These two traces were selected because Tracenumber 1 was the first trace where the goal was changedbefore it was achieved and number 4 was the first tracewhere the goal was kept until it was achieved The detailedinformation about the two traces is shown in Table 1

In Trace number 1 predators selected Prey PA as theirfirst goal from 119905 = 1 to 119905 = 7 Then the goal was changedto Prey PB which was achieved at 119905 = 13 In Trace number 4predators selected Prey PB as their initial goal This goal waskept until it was achieved at 119905 = 14 Given the real policiesand other parameters of the Dec-POMDM including 119879 119874Υ 119868 119885 119862 and 119861 we used the MF to compute the posterior

distribution of goals at each time The recognition results areshown in Figure 3

In Figure 3(a) the probability of the real goal (PreyPA) increases fast during the initial period At 119905 = 3 theprobability of Prey PA exceeds 09 and keeps a high value until119905 = 7 When the goal is changed at 119905 = 8 our method hasa very fast response because predators select highly certainjoint actions at this time In Figure 3(b) the probability ofthe false goal increases at 119905 = 2 and the probability ofthe real goal (Prey PB) is low at first The failure happensbecause the observations support that predators selected jointactions with small probability if the goal is Prey PB Anywaythe probability of the real goal increases continuously andexceeds that of Prey PA after 119905 = 5 With the recognitionresults of the two specific traces we conclude that the Dec-POMDM and MF can perform well regardless of the goalchange

(B) Comparison of the Performances of the Dec-POMDM and HMMs. To show the advantages of the Dec-POMDM, we modeled the predator-prey problem as the well-known HMM as a baseline. In the HMM, we set the hidden state x to the tuple ⟨goal, p_X, p_Y, p_A, p_B⟩, where p_X, p_Y, p_A, and p_B are the positions of Predator PX, Predator PY, Prey PA, and Prey PB, respectively. The observation model for the recognizer in the HMM was the same as that in the Dec-POMDM. Thus, there were 25 × 24 × 23 × 22 × 2 = 607,200 possible states in this HMM, and the dimension of the transition matrix was 607,200 × 607,200. Since the state space and the transition matrix were too large, an unsupervised learning method such as the Baum-Welch algorithm was infeasible for this problem. Instead, we used simple supervised learning: counting the state transitions in labeled training datasets. With the real policies of the predators, we ran the simulation model repeatedly and generated five training datasets. The detailed information about these datasets is shown in Table 2.

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way, we obtained the new datasets HMM-100b and HMM-200b, which do not cover any trace in the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions; then the MF was used to infer the goal. However, in the inference process we may reach states that have not been explored in the training dataset (HMM-200a only explores 492,642 states, but there are 607,200 possible states in total). In this case, we assumed that the hidden state transits to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compared the performance of the Dec-POMDM and the HMMs; the recognition metrics are shown in Figure 4.
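A counting estimator of this kind can be sketched as follows; the sparse dictionary storage and the helper names are our assumptions, and the uniform fallback for unexplored states mirrors the assumption described above:

```python
from collections import defaultdict

def estimate_transitions(labeled_traces):
    """Estimate a sparse HMM transition model by counting state transitions.

    labeled_traces: iterable of state sequences; each state is a hashable tuple
                    <goal, p_X, p_Y, p_A, p_B> (assumed encoding).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in labeled_traces:
        for s, s_next in zip(states, states[1:]):
            counts[s][s_next] += 1
    model = {}
    for s, successors in counts.items():
        total = sum(successors.values())
        model[s] = {s2: c / total for s2, c in successors.items()}
    return model

def transition_prob(model, s, s_next, n_states=607200):
    """P(s_next | s); states never explored in training fall back to a uniform distribution."""
    if s not in model:
        return 1.0 / n_states
    return model[s].get(s_next, 0.0)
```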



Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF. (a) Trace number 1: probability of each goal (Prey A, Prey B) over time t = 1, ..., 13. (b) Trace number 4: probability of each goal over time t = 1, ..., 10.

Table 2: Information about the training datasets for the HMM.

Name of the dataset | Number of traces | Number of explored states | Percentage of test-dataset traces covered by the training dataset
HMM-50   | 50,000  | 279,477 | 0%
HMM-100a | 100,000 | 384,623 | 79%
HMM-100b | 100,000 | 384,621 | 0%
HMM-200a | 200,000 | 492,642 | 79%
HMM-200b | 200,000 | 492,645 | 0%

Figure 4 shows that the results of the Dec-POMDM and of the HMMs trained on different datasets are similar in terms of precision, recall, and F-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after k = 4; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though HMM-200a contains twice as many traces as HMM-100a. The main reason is that these training datasets were overfitted: there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a. In this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on all three metrics, almost as well as the overfitted models, and (b) the learned HMM suffered from overfitting, with a serious performance decline when the training dataset does not cover most traces in the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also suggest that, when the problem is modeled as an HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. The dataset HMM-200a, which consists of 200,000 traces, only covers 81.13% of all possible states, and only 71.46% of the states in HMM-200a have been reached more than once. Thus, we can expect HMM-200a to perform poorly if the agents select actions with higher uncertainty. Moreover, the training dataset would have to be extremely large for most states to be reached a large number of times. In real applications, it is very hard to obtain a training dataset of such size, especially when all traces must be labeled.

We also performed a Wald test over the Dec-POMDM and the HMMs trained on the different datasets to test whether their recognition results come from different distributions. Given our test dataset, there are 100 goals to be recognized for each value of k. Let TX be the set of samples obtained from the Dec-POMDM: we set TX_i = 1 if the recognition result of the Dec-POMDM is correct on test case i (i = 1, 2, ..., 100) and TX_i = 0 otherwise. Let TY be the set of samples obtained from the baseline model (one of the HMMs): we set TY_i = 1 if the recognition result of the baseline is correct on test case i and TY_i = 0 otherwise. We define TD_i = TX_i − TY_i and let δ = E(TD_i) = E(TX_i) − E(TY_i) = P(TX_i = 1) − P(TY_i = 1). The null hypothesis is δ = 0, which means that the recognition results of the two models follow the same distribution. A more detailed description of the test procedure can be found in [32]. The matrix of p values is shown in Table 3.
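For reference, a minimal version of this paired Wald test (using the usual plug-in variance estimator, as in [32]) looks as follows; the inputs are the 0/1 correctness indicators of the two models on the 100 test cases:

```python
import math

def wald_test(tx, ty):
    """Two-sided Wald test of H0: delta = E(TX_i - TY_i) = 0.

    tx, ty: equal-length lists of 0/1 correctness indicators (here n = 100).
    Returns the p value under the standard normal approximation.
    """
    n = len(tx)
    d = [x - y for x, y in zip(tx, ty)]               # paired differences TD_i
    delta_hat = sum(d) / n
    var_hat = sum((di - delta_hat) ** 2 for di in d) / (n - 1)
    se = math.sqrt(var_hat / n)                       # estimated standard error
    if se == 0.0:                                     # both models agree on every case
        return 1.0
    w = delta_hat / se                                # Wald statistic
    return math.erfc(abs(w) / math.sqrt(2))          # p = 2 * (1 - Phi(|w|))
```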

The p values in Table 3 show that the recognition results of the Dec-POMDM follow different distributions from those of HMM-50, HMM-100b, and HMM-200b, respectively, with high confidence. We cannot reject the null hypothesis when the baseline is an overfitted model. The Wald test results are thus consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and its results showed the same trend.


Figure 4: Recognition metrics of the Dec-POMDM and the HMMs as functions of k: (a) precision, (b) recall, (c) F-measure. Curves: HMM-50, HMM-100a, HMM-100b, HMM-200a, HMM-200b, and Dec-POMDM.

Table 3: The p values of the Wald test over the Dec-POMDM and the HMMs.

Comparison | k = 1 | k = 2 | k = 3 | k = 4 | k = 5
Dec-POMDM versus HMM-50   | 1 | <0.01  | <0.01  | <0.01  | <0.01
Dec-POMDM versus HMM-100a | 1 | 0.0412 | 0.0478 | 0.2450 | <0.01
Dec-POMDM versus HMM-100b | 1 | <0.01  | <0.01  | <0.01  | <0.01
Dec-POMDM versus HMM-200a | 1 | 0.3149 | 0.1929 | 0.4127 | 1
Dec-POMDM versus HMM-200b | 1 | <0.01  | <0.01  | <0.01  | <0.01

(C) Comparison of the Performances of the MF and the PF. Here we used the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared their performances to the MF; all inference was done under the framework of the Dec-POMDM. We have to note that, since the PF uses weighted particles to approximate the posterior distribution of the goal, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case, we simply reset all weights to 1/Np to continue the inference, where Np is the number of particles in the PF. The recognition metrics of the MF and the PF are shown in Figure 5.
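The weight-reset safeguard can be made concrete with a short sketch of one PF step; `transition_sample` and `obs_likelihood` stand for the Dec-POMDM simulation and observation models and are placeholder names, not functions from the paper:

```python
import random

def pf_step(particles, weights, y_t, transition_sample, obs_likelihood):
    """One particle-filter step with the weight-reset fallback described above."""
    n = len(particles)
    # propagate every particle through the (stochastic) Dec-POMDM dynamics
    particles = [transition_sample(p) for p in particles]
    # reweight by the recognizer's observation likelihood p(y_t | state)
    weights = [w * obs_likelihood(y_t, p) for w, p in zip(weights, particles)]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / n] * n        # all weights collapsed: reset to 1/Np
    else:
        weights = [w / total for w in weights]
    # multinomial resampling duplicates likely particles and drops unlikely ones
    particles = random.choices(particles, weights=weights, k=n)
    return particles, [1.0 / n] * n
```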

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1,000, 2,000, 4,000, 6,000, and 16,000 particles, respectively. All filters had similar precision. However, considering the recall and the F-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the other PFs. We obtained these results because an exact MF (without the pruning step) was used in this section, and the PF approximates the true posterior distribution better with more particles.

Similar to the testing method used in Part (B), we also performed the Wald test on the MF and the PF with different numbers of particles. The matrix of p values is shown in Table 4.

The null hypothesis is that the recognition results of the baseline and of the MF follow the same distribution.


Table 4: The p values of the Wald test over the MF and the PFs.

Comparison | k = 1 | k = 2 | k = 3 | k = 4 | k = 5
MF versus PF-1000  | 1 | 0.0176 | <0.01  | <0.01  | <0.01
MF versus PF-2000  | 1 | 0.3637 | 0.2450 | 0.0300 | <0.01
MF versus PF-4000  | 1 | 1      | 0.0910 | 0.0412 | 0.0115
MF versus PF-6000  | 1 | 0.4127 | 0.1758 | 0.1758 | 0.0218
MF versus PF-16000 | 1 | 0.5631 | 0.0412 | 1      | 0.1531

Figure 5: Recognition metrics of the MF and the PF as functions of k: (a) precision, (b) recall, (c) F-measure. Curves: PF-1000, PF-2000, PF-4000, PF-6000, PF-16000, and MF.

Generally, with a larger k and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there was variance in the results inferred by the PF (the PF performs approximate inference), we ran the PF with 6,000 particles 10 times. The mean value and the 90% belief interval of the metrics at different k are shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We need to note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bounds of the 90% belief interval also indicate the extremum of the PF. We can see that the metrics of the MF are better than the mean of the PF when k > 1, and even better than the maximum value of the PF except at k = 4; actually, the MF still outperforms 90% of the PF runs at k = 4. Additionally, the MF needs on average only 75.78% of the time needed by the PF for inference at each step. Thus, the MF consumes less time and performs better than the PF with 6,000 particles.
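The empirical belief interval can be computed directly from the per-run metric values; a minimal sketch (the quantile convention is our choice, and with only 10 runs the bounds coincide with the minimum and maximum, i.e., the extremum curves in Figure 6):

```python
def empirical_interval(values, lower=0.05, upper=0.95):
    """Empirical belief interval of a metric over repeated PF runs (no distributional assumption).

    values: per-run metric values at one k, e.g. the 10 recall values of PF-6000.
    """
    xs = sorted(values)
    n = len(xs)
    lo = xs[max(0, int(lower * n))]          # with n = 10 this is xs[0], the minimum
    hi = xs[min(n - 1, int(upper * n))]      # with n = 10 this is xs[9], the maximum
    mean = sum(xs) / n
    return lo, hi, mean
```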

Generally, the computational complexity of both the PF and the MF is a linear function of the number of particles. When an action has multiple possible outcomes, the MF consumes more time than the PF if their numbers of particles are equal.


Figure 6: Metrics of the PF with 6,000 particles and the MF as functions of k: (a) precision, (b) recall, (c) F-measure. Curves: MF, mean of the PF, extremum of the PF, and the 90% belief interval of the PF.

However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete. In fact, the number of possible states in the MF after the UPDATE operation never exceeds 156 in the test dataset. The PF, in contrast, has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.
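The efficiency gain comes from how hypotheses are stored: instead of carrying duplicate particles, the MF keeps each distinct discrete state once and merges weights. The following is a simplified sketch of one MF step without the pruning operation (the `successors` and `obs_likelihood` functions are placeholders for the Dec-POMDM dynamics and the recognizer's observation model, not names from the paper):

```python
def mf_step(state_weights, y_t, successors, obs_likelihood):
    """One marginal-filter step over a discrete state space (pruning omitted).

    state_weights : dict mapping a hashable state to its current weight
    successors    : function(state) -> iterable of (next_state, transition_probability)
    obs_likelihood: function(y_t, state) -> p(y_t | state)
    """
    merged = {}
    for s, w in state_weights.items():
        for s_next, p in successors(s):
            if p <= 0.0:
                continue
            # identical successor states are merged instead of being duplicated
            merged[s_next] = merged.get(s_next, 0.0) + w * p
    # weight update with the new observation, then normalization
    updated = {s: w * obs_likelihood(y_t, s) for s, w in merged.items()}
    total = sum(updated.values())
    return {s: w / total for s, w in updated.items()} if total > 0.0 else updated
```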

6. Conclusion and Future Work

In this paper, we propose the Dec-POMDM, a novel model for the multiagent goal recognition problem, and we present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM represents the agents' cooperation in a compact way: details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving Dec-POMDP problems.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that the agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and the agents may change their goals before they are achieved. On this scenario, we compare our method to the baselines, including the HMM and the PF. The experiment results show that the Dec-POMDM, together with the MARL and MF algorithms, recognizes the multiagent goal well whether or not the goal changes; that the Dec-POMDM outperforms the HMM in terms of precision, recall, and F-measure; and that the MF infers goals more efficiently than the PF.

In the future, we plan to apply the Dec-POMDM to more complex scenarios. Further research on pruning techniques for the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369, and by the DFG research training group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.

[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-0.3.1.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.


[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.


53 Results and Discussion With the learned policy we runthe simulation model repeatedly and generated a test datasetconsisting of 100 traces There are 1183 steps averagely in onetrace and the number of steps in one trace varies from 6to 34 with a standard deviation of 536 We also found thatthe predators changed their goals for at least once in 35of the test traces To validate our method and compare itwith baselines we did experiments on three aspects (a) todiscuss the details of the recognition results obtained withour method we computed the recognition results of twospecific traces by our method (b) to show the advantages oftheDec-POMDM inmodeling we compared the recognitionperformances when the problem was modeled as a Dec-POMDM and as an HMM (c) to show efficiency of the MFunder the framework of the Dec-POMDM we compared therecognizing performances when the goal was inferred by theMF and the PF

In the second and the third parts performances wereevaluated by statistic metrics precision recall and 119865-measure Their meanings and computation details can befound in [31] The value of the three metrics is between 0and 1 a higher value means a better performance Sincethese metrics can only evaluate the recognition results ata single step and traces in the dataset had different timelengths we defined a positive integer 119896 (119896 = 1 2 5)The metric with 119896means that the corresponding observationsequences are 119910

119895isin1100

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil Here 119910119895

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil

is theobservation sequence from time 1 to time lceil119896 lowast 119897119890119899119892119905ℎ

1198955rceil of

the 119895th trace and 119897119890119899119892119905ℎ119895 is the length of the 119895th trace And

we need to recognize 119892119900119886119897119905for each observation sequence

Thus metrics with different 119896 show the performances ofalgorithms in different phases of the simulation Additionallywe regarded the destination with largest probability as thefinal recognition result

(A)The Recognition Results of the Specific Traces To show thedetails of the recognition results obtained with our methodwe selected two specific traces from the dataset (number 1and number 4)These two traces were selected because Tracenumber 1 was the first trace where the goal was changedbefore it was achieved and number 4 was the first tracewhere the goal was kept until it was achieved The detailedinformation about the two traces is shown in Table 1

In Trace number 1 predators selected Prey PA as theirfirst goal from 119905 = 1 to 119905 = 7 Then the goal was changedto Prey PB which was achieved at 119905 = 13 In Trace number 4predators selected Prey PB as their initial goal This goal waskept until it was achieved at 119905 = 14 Given the real policiesand other parameters of the Dec-POMDM including 119879 119874Υ 119868 119885 119862 and 119861 we used the MF to compute the posterior

distribution of goals at each time The recognition results areshown in Figure 3

In Figure 3(a) the probability of the real goal (PreyPA) increases fast during the initial period At 119905 = 3 theprobability of Prey PA exceeds 09 and keeps a high value until119905 = 7 When the goal is changed at 119905 = 8 our method hasa very fast response because predators select highly certainjoint actions at this time In Figure 3(b) the probability ofthe false goal increases at 119905 = 2 and the probability ofthe real goal (Prey PB) is low at first The failure happensbecause the observations support that predators selected jointactions with small probability if the goal is Prey PB Anywaythe probability of the real goal increases continuously andexceeds that of Prey PA after 119905 = 5 With the recognitionresults of the two specific traces we conclude that the Dec-POMDM and MF can perform well regardless of the goalchange

(B) Comparison of the Performances of the Dec-POMDM andHMMs To show the advantages of the Dec-POMDM wemodeled the predator-prey problem as thewell-knownHMMas a baseline In the HMM we set the hidden state 119909 as thetuple ⟨119892119900119886119897 119901

119883 119901119884 119901119860 119901119861⟩ where 119901

119883 119901119884 119901119860 and 119901

119861are

positions of Predator PX Predator PY Prey PA and PreyPB respectively The observation model for the recognizer inthe HMM was the same as that in the Dec-POMDM Thusthere were 25 times 24 times 23 times 22 times 2 = 607200 possible statesin this HMM and the dimension of the transition matrixwas 607200 times 607200 Since the state space and transitionmatrix were too large an unsupervised learning methodsuch as Balm-Welch algorithmwas infeasible in this problemInstead we used a simple supervised learning counting thestate transitions based on labeled training datasets Withthe real policies of predators we run the simulation modelrepeatedly and generated five training datasets The detailedinformation of these datasets is shown in Table 2

The datasets HMM-50 HMM-100a and HMM-200awere generated in a random and incremental way (HMM-100a contains HMM-50 and HMM-200a contains HMM-100a) Since HMM-100a and HMM-200a both covered 79of traces of the test dataset which might cause an overfittingproblem we removed the traces covered by the test datasetin HMM-100a and HMM-200a and compensated them byextra traces In this way we got new datasets HMM-100band HMM-200b which did not cover any trace in the testdataset With these five labeled datasets we estimate thetransition matrix by counting state transitions Then the MFwas used to infer the goal However in the inference processwe may reach some states that have not been explored in thetraining dataset (HMM-200a only explores 492642 statesbut there are totally 607200 possible states) In this case weassumed that the hidden state would transit to a possiblestate with a uniform distribution The rest of the parametersin the HMM inference were the same as those in the Dec-POMDM inference We compare performances of the Dec-POMDMand theHMMsThe recognitionmetrics are shownin Figure 4

Figure 4 shows that comparing the results of the Dec-POMDM and the HMMs trained by different datasets is

10 Discrete Dynamics in Nature and Society

Prey APrey B

2 3 4 5 6 7 8 9 10 11 12 131Time

0010203040506070809

1Pr

obab

ility

(a) Recognition results of Trace number 1

Prey APrey B

0010203040506070809

1

Prob

abili

ty

2 3 4 5 6 7 8 9 101Time

(b) Recognition results of Trace number 4

Figure 3 Recognition results of two specific traces computed by the Dec-POMDM and MF

Table 2 Information of training datasets for the HMM

Name of the dataset Number of traces Number ofexplored states

Percentage of traces in the test datasetcovered by the training dataset

HMM-50 50000 279477 0HMM-100a 100000 384623 79HMM-100b 100000 384621 0HMM-200a 200000 492642 79HMM-200b 200000 492645 0

similar in terms of precision recall and 119865-measure Moreprecisely HMM-200a had the highest performance HMM-100a performed comparable to our Dec-POMDM but Dec-POMDM performed better after 119896 = 4 HMM-50 had theworst performance Generally there was no big differencebetween performances of the Dec-POMDM HMM-100aandHMM-200a even though the number of traces inHMM-200a was twice as large as that in HMM-100a The main rea-son is that the training datasetswere overfittedActually therewas a very serious performance decline after we removedthe traces covered in the test dataset from HMM-200a andHMM-100a In this case HMM-200b performed better thanHMM-100b but worse than our Dec-POMDM The resultsin Figure 4 showed that (a) our Dec-POMDM performedwell on three metrics almost as well as the overfitted trainedmodel (b) the learnedHMMsuffered an overfitting problemand there will be a serious performance decline if the trainingdataset does not cover most traces in the test dataset

The curves of HMM-50 HMM-100b and HMM-200balso showed that when we model the problem throughthe HMM it may be possible to improve the recognitionperformances by increasing the size of the training datasetHowever this solution is infeasible in practice Actually thedataset HMM-200a which consisted of 200000 traces onlycovered 8113 of all possible states and only 7146 of thestates in HMM-200a had been reached more than onceThus we can conclude that HMM-200a will have a poorperformance if agents select actions with higher uncertaintyBesides there is no doubt that the size of the training dataset

will be extremely large if most states are reached a largenumber of times In real applications it is very hard to obtaina training dataset with so large size especially when all tracesare labeled

We also performed a Wald test over the Dec-POMDMandHMMswith a different training dataset to prove that theirrecognition results came from different distributions Givenour test dataset there were 100 goals to be recognized foreach value of 119896 Let 119879119883 be the set of samples obtained fromthe Dec-POMDM then we set 119879119883

119894= 1 if the recognition

result of the Dec-POMDM is correct on the test case 119894 (119894 =1 2 100) and 119879119883

119894= 0 otherwise let 119879119884 be the set

of samples obtained from the baseline model (one of theHMMs) then we set 119879119884

119894= 1 if the recognition result of

the Dec-POMDM is correct on the test case 119894 and 119879119884119894= 0

otherwise We define 119879119863119894= 119879119883119894minus 119879119884119894 and let 120575 = E(119879119863

119894) =

E(119879119883119894) minus E(119879119884

119894) = 119875(119879119883

119894= 1) minus 119875(119879119884

119894= 1) the null

hypothesis is 120575 = 0 which means the recognition resultsfrom different models follow the same distribution A moredetailed test process can be found in [32] The matrix of 119901values is shown in Table 3

The 119901 values in Table 3 show that recognition results oftheDec-POMDM follow different distributions from those ofHMM-50 HMM-100b and HMM-200b respectively with ahigh confidence We cannot reject the null hypothesis whenthe baseline is an overfitted trained model The Wald testresults are consistent with the metrics in Figure 4 We alsoperformed the Wilcoxon test and the test results showed thesame trend

Discrete Dynamics in Nature and Society 11

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1Pr

ecisi

on

(a) Precision

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

0010203040506070809

1

Reca

ll

1 2 25 3 35 4 45 515k

(b) Recall

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 4 Recognition metrics of the Dec-POMDM and HMMs

Table 3 The 119901 values of the Wald test over the Dec-POMDM and HMMs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

Dec-POMDM versus HMM-50 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-100 1 00412 00478 02450 lt001Dec-POMDM versus HMM-100b 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-200 1 03149 01929 04127 1Dec-POMDM versus HMM-200b 1 lt001 lt001 lt001 lt001

(C) Comparison of the Performances of the MF and thePF Here we exploited the PF as the baseline The modelinformation used in the PF is the same as that in the MFWe evaluated the PF with different number of particles andcompared their performances to the MF All inference wasdone under the framework of Dec-POMDMWe have to notethat when the PF used weighted particles to approximate theposterior distribution of the goal it is possible that all weightsdecrease to 0 if the number of particles is not large enoughIn this case we simply reset all weights 1Np to continue theinference where Np is the number of particles in the PF Therecognition metrics of the MF and PF are shown in Figure 5

In Figure 5 the red solid curve indicates themetrics of theMF The green blue purple black and cyan dashed curves

indicate the metrics of PF with 1000 2000 4000 6000 and16000 particles respectively All filters had similar precisionHowever considering the recall and the 119865-measure the MFhad the best performance and the PFwith the largest numberof particles performed better than the rest of PFWe got theseresults because an exact MF (without pruning step) is usedin this section and the PF can approximate the real posteriordistribution better with more particles

Similar to the testingmethod we used in Part (B) here wealso performed theWald test on theMF and PFwith differentnumber of particles The matrix of the 119901 values is shown inTable 4

The null hypothesis is that the recognition results of thebaseline and the MF follow the same distribution Generally

12 Discrete Dynamics in Nature and Society

Table 4 The 119901 values of the Wald test over the MF and PFs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

MF versus PF-1000 1 00176 lt001 lt001 lt001MF versus PF-2000 1 03637 02450 00300 lt001MF versus PF-4000 1 1 00910 00412 00115MF versus PF-6000 1 04127 01758 01758 00218MF versus PF-16000 1 05631 00412 1 01531

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

1 2 25 3 35 4 45 515k

0010203040506070809

1

Prec

ision

(a) Precision

15 2 25 3 35 4 45 51k

0010203040506070809

1

Reca

ll

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

(b) Recall



Set t = 0, m_0 ← {⟨x^i_0, w^i_0⟩}_{i=1,...,N_0}   // Initialization
For t = 1, ..., max_t:   // we have max_t + 1 observations in total
    m_t ← Null
    Foreach ⟨x^i_{t-1}, w^i_{t-1}⟩ ∈ m_{t-1}:
        map ← Null
        If e^i_{t-1} = 1:
            Foreach goal_t which satisfies p(goal_t | s^i_{t-1}) > 0:
                Foreach ⟨a_1, a_2, ..., a_n⟩ which satisfies ∏_{k=1}^{n} p(a_k | goal_t, s^i_{t-1}) > 0:
                    Foreach s_t which satisfies T(s^i_{t-1}, s_t) > 0:
                        Foreach e_t which satisfies p(e_t | s_t, goal_t) > 0:
                            w_t = w^i_{t-1} · p(e_t | s_t, goal_t) · T(s^i_{t-1}, s_t) · ∏_{k=1}^{n} p(a_k | goal_t, s^i_{t-1}) · p(goal_t | s^i_{t-1})
                            x_t ← ⟨goal_t, s_t, e_t⟩   // a temporary particle
                            ⟨x^j_t, w^j_t⟩ ← LOOKUP(x_t, map)   // find the record in map which is equal to x_t
                            If ⟨x^j_t, w^j_t⟩ is not empty:
                                ⟨x^j_t, w^j_t⟩ ← ⟨x^j_t, w^j_t + w_t⟩   // merge the weight
                            Else:
                                PUT(map, ⟨x_t, w_t⟩)   // add a new record to map
        Else:
            goal_t ← goal^i_{t-1}
            Foreach ⟨a_1, a_2, ..., a_n⟩ which satisfies ∏_{k=1}^{n} p(a_k | goal_t, s^i_{t-1}) > 0:
                Foreach s_t which satisfies T(s^i_{t-1}, s_t) > 0:
                    Foreach e_t which satisfies p(e_t | s_t, goal_t) > 0:
                        w_t = w^i_{t-1} · p(e_t | s_t, goal_t) · T(s^i_{t-1}, s_t) · ∏_{k=1}^{n} p(a_k | goal_t, s^i_{t-1})
                        x_t ← ⟨goal_t, s_t, e_t⟩   // a temporary particle
                        ⟨x^j_t, w^j_t⟩ ← LOOKUP(x_t, map)   // find the record in map which is equal to x_t
                        If ⟨x^j_t, w^j_t⟩ is not empty:
                            ⟨x^j_t, w^j_t⟩ ← ⟨x^j_t, w^j_t + w_t⟩   // merge the weight
                        Else:
                            PUT(map, ⟨x_t, w_t⟩)   // add a new record to map
        MERGE(m_t, map)
    UPDATE(m_t)
    PRUNE(m_t)
    NORMALIZE(m_t)
    DELETE(m_{t-1})

Algorithm 1: Goal inference based on the marginal filter under the framework of the Dec-POMDM.

In the MERGE operation, map is merged into the existing m_t. In this process, if a particle x_t in map has not appeared in m_t, we directly put x_t and its corresponding weight w_t into m_t; otherwise, we add w_t to the weight of the record in m_t which covers x_t. Similarly, an index matrix can also be used to save computing time in the MERGE operation.

Under the framework of the Dec-POMDM, we update w^i_t by

    w^i_t = w^i_t · p(y_t | s^i_t),   (4)

where w^i_t and x^i_t = ⟨goal^i_t, e^i_t, s^i_t⟩ are the weight and the particle, respectively, of the i-th record in m_t, and p(y_t | s^i_t) is the observation function for the recognizer in the model. The details of the PRUNE operation can be found in [13], and we use the existing pruning technique directly in our paper.
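
To make the recursion concrete, the following Python sketch illustrates one marginal-filter step in the spirit of Algorithm 1 and update (4). It is our own illustrative code, not the authors' implementation; the methods goal_prior, joint_action_dist, transition, termination, and obs_likelihood are hypothetical stand-ins for the Dec-POMDM distributions, and the PRUNE step is omitted.

from collections import defaultdict

def mf_step(particles, y_t, model):
    """One marginal-filter step under the Dec-POMDM.

    particles: dict mapping (goal, e, s) -> weight (the set m_{t-1})
    y_t:       current observation of the recognizer
    model:     hypothetical object bundling the Dec-POMDM distributions
    """
    m_t = defaultdict(float)
    for (goal, e, s), w in particles.items():
        # If the previous goal terminated (e == 1), branch over new goals;
        # otherwise keep the old goal (the goal prior factor is then 1).
        goals = model.goal_prior(s) if e == 1 else [(goal, 1.0)]
        for g, p_g in goals:
            for joint_a, p_a in model.joint_action_dist(g, s):      # prod_k p(a_k | g, s)
                for s_next, p_T in model.transition(s, joint_a):    # T(s, s')
                    for e_next, p_e in model.termination(s_next, g):
                        w_new = w * p_e * p_T * p_a * p_g
                        # LOOKUP/PUT collapse into a dict update: equal particles merge.
                        m_t[(g, e_next, s_next)] += w_new
    # UPDATE: reweight by the recognizer's observation model (4), then NORMALIZE.
    m_t = {x: w * model.obs_likelihood(y_t, x[2]) for x, w in m_t.items()}
    total = sum(m_t.values())
    return {x: w / total for x, w in m_t.items()} if total > 0 else dict(m_t)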

5. Experiments

5.1. The Predator-Prey Problem. The predator-prey problem is a classic problem for evaluating multiagent decision algorithms [9]. However, it cannot be used directly in this paper because the predators have only one possible goal. Thus, we modify the standard problem by placing more than one prey on the map, and our aim is to recognize the real target of the predators at each time step. Figure 2 shows the grid-based map used in our experiments.

In our scenario, the map consists of 5 × 5 grids, two predators (red diamonds, Predator PX and Predator PY), and two preys (blue stars, Prey PA and Prey PB). The two predators select one of the preys as their common goal and move around to capture it. As shown in Figure 2, the observation of the predators is not perfect: a predator only knows the exact position of itself and of the agents in the nearest 8 grids. If another agent is out of its observing range, the predator only knows its direction (8 possible directions). For the situation in Figure 2, Predator PX observes that none is near itself, Prey PB is in the north, Prey PA is in the southeast, and Predator PY is in the south. Predator PY observes that none is near itself, Prey PB and Predator PX are in the north, and Prey PA is in the east.


Figure 2: The grid-based map in the predator-prey problem (Ω: observations for agents making a decision; insets show the observation of Predator PX and the observation of Predator PY).

In each step, all predators and preys can move into one of the four adjacent grids (north, south, east, and west) or stay in the current grid. When two or more agents try to move into the same grid, or an agent tries to move off the map, they have to stay in their current grids. The predators achieve their goal if and only if both of them are adjacent to their target. Additionally, the predators may also change their goal before they capture a prey. The recognizer can get the exact positions of the two preys, but its observations of the predators are noisy. We need to compute the posterior distribution of the predators' goals from the observation trace.

The modified predator-prey problem can be modeled by the Dec-POMDM. Some important elements are as follows:

(a) D: the two predators.
(b) S: the positions of the predators and preys.
(c) A: five actions for each predator (moving in the four directions and staying).
(d) G: Prey PA or Prey PB.
(e) Ω: the directions of agents far away and the real positions of agents nearby.
(f) Y: the real positions of the preys and the noisy positions of the predators.
(g) R: the predators get a reward of +1 once they achieve their goal; otherwise, the immediate reward is 0.
(h) h: infinite horizon.

With the definition above, the effects of the predators' actions are uncertain, and the state transition function depends on the distribution of the preys' actions. Thus, the actions of the preys actually play the role of events in discrete dynamic systems.
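
As a concrete illustration of these dynamics, the short sketch below is our own illustrative code, not part of the original model implementation; the 5 × 5 grid size, the collision rule, and the 4-neighborhood notion of "adjacent" follow the description above, and the preys' uniformly random moves act as the uncertain events.

import random

MOVES = {"stay": (0, 0), "north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}
GRID = 5  # 5 x 5 map

def step(positions, predator_actions):
    """positions: dict agent -> (x, y); predator_actions: dict predator -> action name.
    Preys pick each of the five actions with equal probability (they are senseless)."""
    intents = {}
    for agent, (x, y) in positions.items():
        act = predator_actions.get(agent) or random.choice(list(MOVES))
        dx, dy = MOVES[act]
        nx, ny = x + dx, y + dy
        # Agents trying to leave the map stay in place.
        intents[agent] = (nx, ny) if 0 <= nx < GRID and 0 <= ny < GRID else (x, y)
    # Agents that target the same grid all stay in their current grids.
    new_positions = {}
    for agent, target in intents.items():
        clash = sum(1 for t in intents.values() if t == target) > 1
        new_positions[agent] = positions[agent] if clash else target
    return new_positions

def goal_achieved(positions, target_prey, predators=("PX", "PY")):
    """The goal is achieved iff both predators are adjacent (4-neighborhood) to the target prey."""
    tx, ty = positions[target_prey]
    return all(abs(positions[p][0] - tx) + abs(positions[p][1] - ty) == 1 for p in predators)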

5.2. Settings. In this section, we provide additional details of the scenario and of the parameters used in the policy learning and inference algorithms.

(A) The Scenario. The preys are senseless: they select each action with equal probability. The initial positions of the agents are uniform. The initial goal distribution is p(goal_0 = Prey PA) = 0.6 and p(goal_0 = Prey PB) = 0.4.

We simplify the goal termination function in the following way: if the predators capture their target, the goal is achieved; otherwise, the predators change their goal with a probability of 0.05 in every state. The observation for the recognizer y_t ∈ Y reveals the real position of each predator with a probability of 0.5, and the observation may be one of the 8 neighbors of the current grid, each with a probability of 0.5/8. When the predator is on the edge of the map, the observation may lie outside the map. The observed results of the agents do not affect each other.

(B) Multiagent Reinforcement Learning. The discount factor is λ = 0.8. The predator selects an action a_i given the observation o^i_t with probability

    p(a_i | o^i_t) = exp(Q(o^i_t, a_i)/β) / Σ_{i=1}^{5} exp(Q(o^i_t, a_i)/β),   (5)

where β = 0.1 is the Boltzmann temperature. We set β > 0 as a constant, which means that the predators always select approximately optimal actions. In our scenario, the Q-value converges after 750 iterations. In the learning process, if the predators cannot achieve their goal within 10000 steps, we reinitialize their positions and begin the next episode.
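
For clarity, the following small Python sketch shows Boltzmann action selection as in (5). It is illustrative only; the flat list of Q-values for the five actions is our assumption, not the authors' data layout.

import math
import random

def boltzmann_action(q_values, beta=0.1):
    """Select an action index from the Q-values of the 5 actions for the current observation.

    q_values: list of Q(o, a) for each action a; beta: Boltzmann temperature (0.1 in the paper).
    """
    # Subtract the max for numerical stability; this does not change the distribution.
    m = max(q_values)
    exps = [math.exp((q - m) / beta) for q in q_values]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample an action according to p(a | o).
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]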

(C) The Marginal Filter. In the MF inference, a particle consists of the joint goal, the goal termination variable, and the positions of the predators and preys. Although there are 25 possible positions for each predator or prey, after getting a new observation we can identify the positions of the preys, and there are only 9 possible positions for each predator. Thus, the number of possible values of particles at one step never exceeds 9 × 9 × 2 × 2 = 324 after the UPDATE operation. In our experiments, we simply set the maximum number of particles to 324; then we do not need to prune any possible particles, which means that we perform exact inference. We also use the real settings of the Dec-POMDM and the real policies of the predators in the MF inference.


Table 1: The details of two traces.

Trace number    Durations       Targets    Interrupted
1               t ∈ [1, 7]      Prey PA    Yes
                t ∈ [8, 13]     Prey PB    No
4               t ∈ [1, 10]     Prey PB    No

5.3. Results and Discussion. With the learned policy, we ran the simulation model repeatedly and generated a test dataset consisting of 100 traces. There are 11.83 steps on average in one trace, and the number of steps in one trace varies from 6 to 34 with a standard deviation of 5.36. We also found that the predators changed their goals at least once in 35% of the test traces. To validate our method and compare it with baselines, we did experiments on three aspects: (a) to discuss the details of the recognition results obtained with our method, we computed the recognition results of two specific traces; (b) to show the advantages of the Dec-POMDM in modeling, we compared the recognition performances when the problem was modeled as a Dec-POMDM and as an HMM; (c) to show the efficiency of the MF under the framework of the Dec-POMDM, we compared the recognition performances when the goal was inferred by the MF and by the PF.

In the second and third parts, the performances were evaluated by the statistical metrics precision, recall, and F-measure. Their meanings and computation details can be found in [31]. Each of the three metrics takes values between 0 and 1, and a higher value means a better performance. Since these metrics can only evaluate the recognition results at a single step, and the traces in the dataset have different time lengths, we defined a positive integer k (k = 1, 2, ..., 5). The metric with k means that the corresponding observation sequences are y^j_{t ∈ [1, ⌈k·length_j/5⌉]}, j ∈ [1, 100]. Here y^j_{t ∈ [1, ⌈k·length_j/5⌉]} is the observation sequence from time 1 to time ⌈k·length_j/5⌉ of the j-th trace, and length_j is the length of the j-th trace. We need to recognize goal_t for each observation sequence. Thus, metrics with different k show the performances of the algorithms in different phases of the simulation. Additionally, we regarded the goal with the largest posterior probability as the final recognition result.
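
The evaluation protocol can be summarized by the sketch below. It is our own illustrative code; predict_goal stands for whichever recognizer is being tested, and the macro-averaging over the two goals is one common convention, since the paper defers the exact metric definitions to [31].

import math
from collections import Counter

def prefix_metrics(traces, k, predict_goal, goals=("PA", "PB")):
    """traces: list of (observations, true_goal_at_cutoff) pairs; k in 1..5.

    predict_goal(obs_prefix) -> predicted goal; returns macro precision, recall, F-measure.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for obs, true_goal in traces:
        cutoff = math.ceil(k * len(obs) / 5)   # observe the first ceil(k*length/5) steps
        pred = predict_goal(obs[:cutoff])
        if pred == true_goal:
            tp[pred] += 1
        else:
            fp[pred] += 1
            fn[true_goal] += 1
    precisions, recalls = [], []
    for g in goals:
        p = tp[g] / (tp[g] + fp[g]) if (tp[g] + fp[g]) else 0.0
        r = tp[g] / (tp[g] + fn[g]) if (tp[g] + fn[g]) else 0.0
        precisions.append(p)
        recalls.append(r)
    precision = sum(precisions) / len(goals)
    recall = sum(recalls) / len(goals)
    f_measure = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_measure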

(A) The Recognition Results of the Specific Traces. To show the details of the recognition results obtained with our method, we selected two specific traces from the dataset (number 1 and number 4). These two traces were selected because Trace number 1 was the first trace where the goal was changed before it was achieved, and Trace number 4 was the first trace where the goal was kept until it was achieved. The detailed information about the two traces is shown in Table 1.

In Trace number 1, the predators selected Prey PA as their first goal from t = 1 to t = 7. Then the goal was changed to Prey PB, which was achieved at t = 13. In Trace number 4, the predators selected Prey PB as their initial goal. This goal was kept until it was achieved at t = 10. Given the real policies and the other parameters of the Dec-POMDM, including T, O, Υ, I, Z, C, and B, we used the MF to compute the posterior distribution of goals at each time step. The recognition results are shown in Figure 3.

In Figure 3(a), the probability of the real goal (Prey PA) increases fast during the initial period. At t = 3, the probability of Prey PA exceeds 0.9, and it keeps a high value until t = 7. When the goal is changed at t = 8, our method responds very quickly, because the predators select highly certain joint actions at this time. In Figure 3(b), the probability of the false goal increases at t = 2, and the probability of the real goal (Prey PB) is low at first. This failure happens because the observations suggest that the predators selected joint actions which would have small probability if the goal were Prey PB. Nevertheless, the probability of the real goal increases continuously and exceeds that of Prey PA after t = 5. From the recognition results of the two specific traces, we conclude that the Dec-POMDM and the MF perform well regardless of whether the goal changes.

(B) Comparison of the Performances of the Dec-POMDM and HMMs. To show the advantages of the Dec-POMDM, we modeled the predator-prey problem with the well-known HMM as a baseline. In the HMM, we set the hidden state x as the tuple ⟨goal, p_X, p_Y, p_A, p_B⟩, where p_X, p_Y, p_A, and p_B are the positions of Predator PX, Predator PY, Prey PA, and Prey PB, respectively. The observation model for the recognizer in the HMM was the same as that in the Dec-POMDM. Thus, there were 25 × 24 × 23 × 22 × 2 = 607200 possible states in this HMM, and the dimension of the transition matrix was 607200 × 607200. Since the state space and the transition matrix were too large, an unsupervised learning method such as the Baum-Welch algorithm was infeasible for this problem. Instead, we used simple supervised learning: counting the state transitions in labeled training datasets. With the real policies of the predators, we ran the simulation model repeatedly and generated five training datasets. The detailed information about these datasets is shown in Table 2.
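
A minimal sketch of this counting-based estimation is shown below. It is illustrative only; the sparse-dictionary representation is our assumption, since a dense 607200 × 607200 matrix would not fit in memory.

from collections import defaultdict

def estimate_transitions(traces):
    """Estimate p(x' | x) by counting transitions of the hidden state x = (goal, pX, pY, pA, pB).

    traces: iterable of hidden-state sequences from labeled training data.
    Returns a nested dict: transitions[x][x_next] = estimated probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in traces:
        for x, x_next in zip(states, states[1:]):
            counts[x][x_next] += 1
    transitions = {}
    for x, nexts in counts.items():
        total = sum(nexts.values())
        transitions[x] = {x_next: c / total for x_next, c in nexts.items()}
    return transitions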

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way, we obtained the new datasets HMM-100b and HMM-200b, which did not cover any trace of the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions. Then the MF was used to infer the goal. However, in the inference process we may reach states that have not been explored in the training dataset (HMM-200a only explores 492642 states, but there are in total 607200 possible states). In this case, we assumed that the hidden state transits to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compared the performances of the Dec-POMDM and the HMMs. The recognition metrics are shown in Figure 4.

Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF. (a) Recognition results of Trace number 1; (b) recognition results of Trace number 4.

Table 2: Information about the training datasets for the HMM.

Name of the dataset    Number of traces    Number of explored states    Percentage of traces in the test dataset covered by the training dataset
HMM-50                 50000               279477                       0%
HMM-100a               100000              384623                       79%
HMM-100b               100000              384621                       0%
HMM-200a               200000              492642                       79%
HMM-200b               200000              492645                       0%

Figure 4 shows that the recognition results of the Dec-POMDM and of the HMMs trained with the different datasets are similar in terms of precision, recall, and F-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after k = 4; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though the number of traces in HMM-200a was twice as large as that in HMM-100a. The main reason is that the trained HMMs were overfitted to their training datasets. In fact, there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a: in this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on all three metrics, almost as well as the overfitted trained models, and (b) the learned HMM suffered from an overfitting problem, and there will be a serious performance decline if the training dataset does not cover most traces of the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also show that, when we model the problem with an HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. The dataset HMM-200a, which consists of 200000 traces, only covers 81.13% of all possible states, and only 71.46% of the states in HMM-200a have been reached more than once. Thus, we can expect HMM-200a to perform poorly if the agents select actions with higher uncertainty. Besides, there is no doubt that the size of the training dataset would have to be extremely large if most states were to be reached a large number of times. In real applications, it is very hard to obtain a labeled training dataset of such a size.

We also performed a Wald test over the Dec-POMDM and the HMMs with the different training datasets to check whether their recognition results come from different distributions. Given our test dataset, there are 100 goals to be recognized for each value of k. Let TX be the set of samples obtained from the Dec-POMDM; we set TX_i = 1 if the recognition result of the Dec-POMDM is correct on test case i (i = 1, 2, ..., 100) and TX_i = 0 otherwise. Let TY be the set of samples obtained from the baseline model (one of the HMMs); we set TY_i = 1 if the recognition result of the baseline model is correct on test case i and TY_i = 0 otherwise. We define TD_i = TX_i − TY_i and let δ = E(TD_i) = E(TX_i) − E(TY_i) = P(TX_i = 1) − P(TY_i = 1). The null hypothesis is δ = 0, which means that the recognition results from the two models follow the same distribution. A more detailed description of the test can be found in [32]. The matrix of p values is shown in Table 3.
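
A minimal sketch of this paired Wald test is given below. It is our own illustration following the standard construction in [32]; the normal approximation and the plug-in standard error are assumptions about the exact procedure, which the paper does not spell out.

import math
from statistics import NormalDist

def wald_test(tx, ty):
    """Two-sided Wald test for delta = E(TX_i) - E(TY_i) = 0.

    tx, ty: lists of 0/1 correctness indicators of equal length (one entry per test case).
    Returns (delta_hat, p_value).
    """
    n = len(tx)
    td = [a - b for a, b in zip(tx, ty)]
    delta_hat = sum(td) / n
    # Plug-in estimate of the standard error of the mean difference.
    var = sum((d - delta_hat) ** 2 for d in td) / n
    se = math.sqrt(var / n)
    if se == 0:
        # All paired differences identical; the Wald statistic is undefined in this degenerate case.
        return delta_hat, float("nan")
    w = delta_hat / se
    p_value = 2 * (1 - NormalDist().cdf(abs(w)))
    return delta_hat, p_value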

The p values in Table 3 show that the recognition results of the Dec-POMDM follow different distributions from those of HMM-50, HMM-100b, and HMM-200b, respectively, with high confidence. We cannot reject the null hypothesis when the baseline is one of the overfitted trained models. The Wald test results are consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and its results showed the same trend.


Figure 4: Recognition metrics of the Dec-POMDM and HMMs (HMM-50, HMM-100a, HMM-100b, HMM-200a, and HMM-200b). (a) Precision; (b) Recall; (c) F-measure.

Table 3: The p values of the Wald test over the Dec-POMDM and the HMMs.

                               k = 1    k = 2     k = 3     k = 4     k = 5
Dec-POMDM versus HMM-50        1        <0.01     <0.01     <0.01     <0.01
Dec-POMDM versus HMM-100a      1        0.0412    0.0478    0.2450    <0.01
Dec-POMDM versus HMM-100b      1        <0.01     <0.01     <0.01     <0.01
Dec-POMDM versus HMM-200a      1        0.3149    0.1929    0.4127    1
Dec-POMDM versus HMM-200b      1        <0.01     <0.01     <0.01     <0.01

(C) Comparison of the Performances of the MF and the PF. Here we used the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared its performance to the MF. All inference was done under the framework of the Dec-POMDM. We have to note that, since the PF uses weighted particles to approximate the posterior distribution of the goal, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case, we simply reset all weights to 1/Np to continue the inference, where Np is the number of particles in the PF. The recognition metrics of the MF and the PF are shown in Figure 5.
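
For reference, a bootstrap-style particle-filter step with the weight-reset fallback described above could look as follows. This is a simplified sketch, not the authors' implementation; sample_next_particle and obs_likelihood are hypothetical stand-ins for sampling from the Dec-POMDM dynamics and evaluating the recognizer's observation model.

import random

def pf_step(particles, weights, y_t, sample_next_particle, obs_likelihood):
    """One particle-filter step: propagate, reweight, reset if degenerate, resample.

    particles: list of (goal, e, s) tuples; weights: list of floats summing to 1.
    """
    n = len(particles)
    # Propagate each particle through the (sampled) Dec-POMDM dynamics.
    proposed = [sample_next_particle(p) for p in particles]
    # Reweight by the recognizer's observation likelihood.
    new_weights = [w * obs_likelihood(y_t, p[2]) for w, p in zip(weights, proposed)]
    total = sum(new_weights)
    if total == 0:
        # All weights vanished: reset them to 1/Np, as described in the text.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    # Multinomial resampling duplicates heavy particles (the step the MF avoids).
    resampled = random.choices(proposed, weights=new_weights, k=n)
    return resampled, [1.0 / n] * n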

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1000, 2000, 4000, 6000, and 16000 particles, respectively. All filters had similar precision. However, considering the recall and the F-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the other PFs. We obtained these results because an exact MF (without the pruning step) was used in this section, while the PF approximates the real posterior distribution better with more particles.

Similar to the testing method used in Part (B), here we also performed the Wald test on the MF and the PFs with different numbers of particles. The matrix of p values is shown in Table 4.

The null hypothesis is that the recognition results of the baseline and the MF follow the same distribution.


Table 4: The p values of the Wald test over the MF and the PFs.

                     k = 1    k = 2     k = 3     k = 4     k = 5
MF versus PF-1000    1        0.0176    <0.01     <0.01     <0.01
MF versus PF-2000    1        0.3637    0.2450    0.0300    <0.01
MF versus PF-4000    1        1         0.0910    0.0412    0.0115
MF versus PF-6000    1        0.4127    0.1758    0.1758    0.0218
MF versus PF-16000   1        0.5631    0.0412    1         0.1531

Figure 5: Recognition metrics of the MF and the PF (PF-1000, PF-2000, PF-4000, PF-6000, PF-16000, and MF). (a) Precision; (b) Recall; (c) F-measure.

Generally, with larger k and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there was variance in the results inferred by the PF (because the PF performs approximate inference), we ran the PF with 6000 particles 10 times. The mean values and the 0.90 belief intervals of the metrics at different k are shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We need to note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bounds of the 90% belief interval also indicate the extrema of the PF. We can see that the metrics of the MF are better than the mean of the PF when k > 1, and even better than the maximum value of the PF except for k = 4; in fact, the MF also outperforms 90% of the runs of the PF at k = 4. Additionally, the MF needs on average only 75.78% of the time needed by the PF for inference at each step. Thus, the MF consumes less time and has a better performance than the PF with 6000 particles.

Generally, the computational complexities of the PF and the MF are both linear functions of the number of particles.


Figure 6: Metrics of the PF with 6000 particles and the MF (MF, mean of the PF, extrema of the PF, and 90% belief interval of the PF). (a) Precision; (b) Recall; (c) F-measure.

When there are multiple possible results of an action, the MF consumes more time than the PF if their numbers of particles are equal. However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete. In fact, the number of possible states in the MF after the UPDATE operation never exceeds 156 in the test dataset. The PF, in contrast, has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.

6. Conclusion and Future Work

In this paper, we propose a novel model for multiagent goal recognition, the Dec-POMDM, and present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM represents the agents' cooperation in a compact way: details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that the agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and the agents may change their goals before they are achieved. With this scenario, we compare our method to other baselines, including the HMM and the PF. The experiment results show that the Dec-POMDM, together with the MARL and MF algorithms, can recognize the multiagent goal well whether or not the goal changes; the Dec-POMDM outperforms the HMM in terms of precision, recall, and F-measure; and the MF infers goals more efficiently than the PF.

In the future, we plan to apply the Dec-POMDM to more complex scenarios. Further research on pruning techniques for the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and by the DFG Research Training Group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.

[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr, et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib, et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-0.3.1.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.

[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

8 Discrete Dynamics in Nature and Society

PA

PB

PX

PY

PX

PY PA

PB

PY PA

PB PX

Observation of Predator PY

Observation of Predator PX

agents making a decisionΩ observations for

Figure 2 The grid-based map in the predator-prey problem

all predators and preys can get into one of the four adjacentgrids (north south east and west) or stay at the currentgrid When two or more agents try to get into the samegrid or try to get out of the map they have to stay in thecurrent grid The predators can achieve their goal if and onlyif both of them are adjacent to their target Additionally thepredators may also change their goal before they capture aprey The recognizer can get the exact positions of the twopreys but its observations of the predators are noisyWe needto compute the posterior distribution of predatorsrsquo goals withthe observation trace

The modified predator-prey problem can be modeled bythe Dec-POMDM Some important elements are as follows

(a) 119863 the two predators(b) 119878 the positions of predators and preys(c) 119860 five actions for each predator moving into four

directions and staying(d) 119866 Prey PA or Prey PB(e) Ω the directions of agents far away and the real

positions of agents nearby(f) 119884 the real positions of preys and the noisy positions

of predators(g) 119877 predators getting a reward +1 once they achieve

their goal otherwise the immediate reward is 0(h) ℎ infinite horizons

With the definition above the effects of predatorrsquos actionsare uncertain and the state transition function depends onthe distribution of preysrsquo actions Thus actions of preysactually play the role of events in discrete dynamic systems

52 Settings In this section we provide additional details forthe scenario and the parameters used in the policy learningand inference algorithm

(A) The Scenario The preys are senseless they select eachaction with equal probability Initial positions of agents are

uniform The initial goal distribution is that 119901(1198921199001198861198970

=

119875119903119890119910119860) = 06 and 119901(1198921199001198861198970= 119875119903119890119910119861) = 04

We simplify the goal termination function in the follow-ing way if predators capture their target the goal is achievedotherwise the predators change their goal with a probabilityof 005 for every stateThe observation for the recognizer 119910

119905isin

119884 reveals the real position of each predator with a probabilityof 05 and the observation may be one of 8 neighbors of thecurrent grid with a probability of 058 When the predator ison the edge of the map the observation may be out of themap The observed results of the agents do not affect eachother

(B) Multiagent Reinforcement Learning The discount factoris 120582 = 08 The predator selects an action 119886

119894given the

observation 119900119894

119905with a probability

119901 (119886119894| 119900119894

119905) =

exp (119876 (119900119894

119905 119886119894) 120573)

sum5

119894=1exp (119876 (119900

119894

119905 119886119894) 120573)

(5)

where 120573 = 01 is the Boltzmann temperature We set 120573 gt

0 as a constant which means that predators always selectapproximately optimal actions In our scenario the 119876-valueconverges after 750 iterations In the learning process ifpredators cannot achieve their goal in 10000 steps we willreinitialize their positions and begin next episode

(C) The Marginal Filter In the MF inference a particleconsists of the joint goal the goal termination variable andthe positions of predators and preys Although there are 25possible positions for each predator or prey after gettingnew observation we can identify the positions of preys andthere are only 9 possible positions for each predator Thusthe number of possible values of particles at one step neverexceeds 9 times 9 times 2 times 2 = 324 after the UPDATE operation Inour experiments we simply set the max number of particlesas 324 then we do not need to prune any possible particleswhichmeans that we exploit an exact inferenceWe alsomakeuse of real settings in the Dec-POMDM and the real policiesof predators in the MF inference

Discrete Dynamics in Nature and Society 9

Table 1 The details of two traces

Trace number Durations Targets Interrupted

1 119905 isin [1 7] Prey PA Yes119905 isin [8 13] Prey PB No

4 119905 isin [1 10] Prey PB No

53 Results and Discussion With the learned policy we runthe simulation model repeatedly and generated a test datasetconsisting of 100 traces There are 1183 steps averagely in onetrace and the number of steps in one trace varies from 6to 34 with a standard deviation of 536 We also found thatthe predators changed their goals for at least once in 35of the test traces To validate our method and compare itwith baselines we did experiments on three aspects (a) todiscuss the details of the recognition results obtained withour method we computed the recognition results of twospecific traces by our method (b) to show the advantages oftheDec-POMDM inmodeling we compared the recognitionperformances when the problem was modeled as a Dec-POMDM and as an HMM (c) to show efficiency of the MFunder the framework of the Dec-POMDM we compared therecognizing performances when the goal was inferred by theMF and the PF

In the second and the third parts performances wereevaluated by statistic metrics precision recall and 119865-measure Their meanings and computation details can befound in [31] The value of the three metrics is between 0and 1 a higher value means a better performance Sincethese metrics can only evaluate the recognition results ata single step and traces in the dataset had different timelengths we defined a positive integer 119896 (119896 = 1 2 5)The metric with 119896means that the corresponding observationsequences are 119910

119895isin1100

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil Here 119910119895

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil

is theobservation sequence from time 1 to time lceil119896 lowast 119897119890119899119892119905ℎ

1198955rceil of

the 119895th trace and 119897119890119899119892119905ℎ119895 is the length of the 119895th trace And

we need to recognize 119892119900119886119897119905for each observation sequence

Thus metrics with different 119896 show the performances ofalgorithms in different phases of the simulation Additionallywe regarded the destination with largest probability as thefinal recognition result

(A)The Recognition Results of the Specific Traces To show thedetails of the recognition results obtained with our methodwe selected two specific traces from the dataset (number 1and number 4)These two traces were selected because Tracenumber 1 was the first trace where the goal was changedbefore it was achieved and number 4 was the first tracewhere the goal was kept until it was achieved The detailedinformation about the two traces is shown in Table 1

In Trace number 1 predators selected Prey PA as theirfirst goal from 119905 = 1 to 119905 = 7 Then the goal was changedto Prey PB which was achieved at 119905 = 13 In Trace number 4predators selected Prey PB as their initial goal This goal waskept until it was achieved at 119905 = 14 Given the real policiesand other parameters of the Dec-POMDM including 119879 119874Υ 119868 119885 119862 and 119861 we used the MF to compute the posterior

distribution of goals at each time The recognition results areshown in Figure 3

In Figure 3(a) the probability of the real goal (PreyPA) increases fast during the initial period At 119905 = 3 theprobability of Prey PA exceeds 09 and keeps a high value until119905 = 7 When the goal is changed at 119905 = 8 our method hasa very fast response because predators select highly certainjoint actions at this time In Figure 3(b) the probability ofthe false goal increases at 119905 = 2 and the probability ofthe real goal (Prey PB) is low at first The failure happensbecause the observations support that predators selected jointactions with small probability if the goal is Prey PB Anywaythe probability of the real goal increases continuously andexceeds that of Prey PA after 119905 = 5 With the recognitionresults of the two specific traces we conclude that the Dec-POMDM and MF can perform well regardless of the goalchange

(B) Comparison of the Performances of the Dec-POMDM andHMMs To show the advantages of the Dec-POMDM wemodeled the predator-prey problem as thewell-knownHMMas a baseline In the HMM we set the hidden state 119909 as thetuple ⟨119892119900119886119897 119901

119883 119901119884 119901119860 119901119861⟩ where 119901

119883 119901119884 119901119860 and 119901

119861are

positions of Predator PX Predator PY Prey PA and PreyPB respectively The observation model for the recognizer inthe HMM was the same as that in the Dec-POMDM Thusthere were 25 times 24 times 23 times 22 times 2 = 607200 possible statesin this HMM and the dimension of the transition matrixwas 607200 times 607200 Since the state space and transitionmatrix were too large an unsupervised learning methodsuch as Balm-Welch algorithmwas infeasible in this problemInstead we used a simple supervised learning counting thestate transitions based on labeled training datasets Withthe real policies of predators we run the simulation modelrepeatedly and generated five training datasets The detailedinformation of these datasets is shown in Table 2

The datasets HMM-50, HMM-100a, and HMM-200a were generated in a random and incremental way (HMM-100a contains HMM-50, and HMM-200a contains HMM-100a). Since HMM-100a and HMM-200a both covered 79% of the traces of the test dataset, which might cause an overfitting problem, we removed the traces covered by the test dataset from HMM-100a and HMM-200a and compensated for them with extra traces. In this way, we obtained the new datasets HMM-100b and HMM-200b, which did not cover any trace in the test dataset. With these five labeled datasets, we estimated the transition matrix by counting state transitions. Then the MF was used to infer the goal. However, in the inference process we may reach states that have not been explored in the training dataset (HMM-200a only explores 492642 states, but there are 607200 possible states in total). In this case, we assumed that the hidden state would transit to a possible state with a uniform distribution. The rest of the parameters in the HMM inference were the same as those in the Dec-POMDM inference. We compared the performances of the Dec-POMDM and the HMMs. The recognition metrics are shown in Figure 4.
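A minimal sketch of this counting-based estimation, assuming a sparse dictionary representation (a dense 607200 x 607200 matrix would be infeasible) and the uniform fallback for states never explored in training; the function names are hypothetical.

    from collections import defaultdict

    def count_transitions(traces):
        # traces: labeled hidden-state sequences [x_1, ..., x_T] from the training dataset.
        counts = defaultdict(lambda: defaultdict(int))
        for trace in traces:
            for x_prev, x_next in zip(trace[:-1], trace[1:]):
                counts[x_prev][x_next] += 1
        return counts

    def transition_prob(counts, x_prev, x_next, num_states=607200):
        # Row-normalized counts; if x_prev was never explored during training,
        # assume a uniform transition over all possible states.
        row = counts.get(x_prev)
        if not row:
            return 1.0 / num_states
        return row.get(x_next, 0) / sum(row.values())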



Figure 3: Recognition results of two specific traces computed by the Dec-POMDM and MF: (a) recognition results of Trace number 1; (b) recognition results of Trace number 4.

Table 2: Information of training datasets for the HMM.

Name of the dataset | Number of traces | Number of explored states | Percentage of traces in the test dataset covered by the training dataset
HMM-50 | 50000 | 279477 | 0%
HMM-100a | 100000 | 384623 | 79%
HMM-100b | 100000 | 384621 | 0%
HMM-200a | 200000 | 492642 | 79%
HMM-200b | 200000 | 492645 | 0%

Figure 4 shows that the results of the Dec-POMDM and the HMMs trained with different datasets are similar in terms of precision, recall, and $F$-measure. More precisely, HMM-200a had the highest performance; HMM-100a performed comparably to our Dec-POMDM, but the Dec-POMDM performed better after $k = 4$; HMM-50 had the worst performance. Generally, there was no big difference between the performances of the Dec-POMDM, HMM-100a, and HMM-200a, even though the number of traces in HMM-200a was twice as large as that in HMM-100a. The main reason is that the training datasets were overfitted. Actually, there was a very serious performance decline after we removed the traces covered by the test dataset from HMM-200a and HMM-100a. In this case, HMM-200b performed better than HMM-100b but worse than our Dec-POMDM. The results in Figure 4 show that (a) our Dec-POMDM performed well on the three metrics, almost as well as the overfitted trained models, and (b) the learned HMM suffered from an overfitting problem, and there will be a serious performance decline if the training dataset does not cover most traces in the test dataset.

The curves of HMM-50, HMM-100b, and HMM-200b also show that, when we model the problem with the HMM, it may be possible to improve the recognition performance by increasing the size of the training dataset. However, this solution is infeasible in practice. Actually, the dataset HMM-200a, which consisted of 200000 traces, only covered 81.13% of all possible states, and only 71.46% of the states in HMM-200a had been reached more than once. Thus, we can conclude that HMM-200a will perform poorly if agents select actions with higher uncertainty. Besides, there is no doubt that the size of the training dataset will be extremely large if most states are to be reached a large number of times. In real applications, it is very hard to obtain a training dataset of such size, especially when all traces must be labeled.

We also performed a Wald test over the Dec-POMDM and the HMMs with the different training datasets to verify that their recognition results come from different distributions. Given our test dataset, there were 100 goals to be recognized for each value of $k$. Let $TX$ be the set of samples obtained from the Dec-POMDM; we set $TX_{i} = 1$ if the recognition result of the Dec-POMDM is correct on test case $i$ ($i = 1, 2, \ldots, 100$) and $TX_{i} = 0$ otherwise. Let $TY$ be the set of samples obtained from the baseline model (one of the HMMs); we set $TY_{i} = 1$ if the recognition result of the baseline is correct on test case $i$ and $TY_{i} = 0$ otherwise. We define $TD_{i} = TX_{i} - TY_{i}$ and let $\delta = \mathrm{E}(TD_{i}) = \mathrm{E}(TX_{i}) - \mathrm{E}(TY_{i}) = P(TX_{i} = 1) - P(TY_{i} = 1)$; the null hypothesis is $\delta = 0$, which means that the recognition results from the different models follow the same distribution. A more detailed description of the test procedure can be found in [32]. The matrix of $p$ values is shown in Table 3.
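A sketch of this paired Wald test, following the construction above; the handling of ties (e.g., at $k = 1$, where both models are always correct) is our assumption.

    import numpy as np
    from scipy.stats import norm

    def wald_test(tx, ty):
        # tx, ty: 0/1 indicators of correct recognition on the same 100 test cases
        # for the Dec-POMDM and a baseline model, respectively.
        d = np.asarray(tx, float) - np.asarray(ty, float)
        delta_hat = d.mean()                      # estimate of delta = E(TD_i)
        se = d.std(ddof=1) / np.sqrt(len(d))      # standard error of the mean difference
        if se == 0.0:
            return 1.0                            # identical results: no evidence against delta = 0
        w = delta_hat / se
        return 2.0 * norm.sf(abs(w))              # two-sided p value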

The $p$ values in Table 3 show that the recognition results of the Dec-POMDM follow different distributions from those of HMM-50, HMM-100b, and HMM-200b, respectively, with high confidence. We cannot reject the null hypothesis when the baseline is an overfitted trained model. The Wald test results are consistent with the metrics in Figure 4. We also performed the Wilcoxon test, and the test results showed the same trend.

Figure 4: Recognition metrics of the Dec-POMDM and HMMs: (a) precision; (b) recall; (c) $F$-measure.

Table 3: The $p$ values of the Wald test over the Dec-POMDM and HMMs.

 | $k = 1$ | $k = 2$ | $k = 3$ | $k = 4$ | $k = 5$
Dec-POMDM versus HMM-50 | 1 | <0.01 | <0.01 | <0.01 | <0.01
Dec-POMDM versus HMM-100a | 1 | 0.0412 | 0.0478 | 0.2450 | <0.01
Dec-POMDM versus HMM-100b | 1 | <0.01 | <0.01 | <0.01 | <0.01
Dec-POMDM versus HMM-200a | 1 | 0.3149 | 0.1929 | 0.4127 | 1
Dec-POMDM versus HMM-200b | 1 | <0.01 | <0.01 | <0.01 | <0.01

(C) Comparison of the Performances of the MF and the PF. Here we used the PF as the baseline. The model information used in the PF is the same as that in the MF. We evaluated the PF with different numbers of particles and compared their performances to the MF. All inference was done under the framework of the Dec-POMDM. We have to note that, when the PF uses weighted particles to approximate the posterior distribution of the goal, it is possible that all weights decrease to 0 if the number of particles is not large enough. In this case, we simply reset all weights to $1/N_{p}$ to continue the inference, where $N_{p}$ is the number of particles in the PF. The recognition metrics of the MF and PF are shown in Figure 5.
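This weight reset can be expressed as a small guard in the PF's weight normalization step; the following is a sketch under our assumptions about the implementation, not the authors' code.

    import numpy as np

    def normalize_or_reset(weights):
        # If every particle weight has collapsed to 0 (no particle explains the
        # observation), reset all weights to 1/Np and continue the inference;
        # otherwise normalize the weights to sum to 1.
        w = np.asarray(weights, dtype=float)
        if w.sum() == 0.0:
            return np.full(w.size, 1.0 / w.size)
        return w / w.sum()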

In Figure 5, the red solid curve indicates the metrics of the MF. The green, blue, purple, black, and cyan dashed curves indicate the metrics of the PF with 1000, 2000, 4000, 6000, and 16000 particles, respectively. All filters had similar precision. However, considering the recall and the $F$-measure, the MF had the best performance, and the PF with the largest number of particles performed better than the other PFs. We obtained these results because an exact MF (without the pruning step) is used in this section, and the PF approximates the real posterior distribution better with more particles.

Similar to the testing method used in Part (B), here we also performed the Wald test on the MF and the PF with different numbers of particles. The matrix of the $p$ values is shown in Table 4.

The null hypothesis is that the recognition results of the baseline and the MF follow the same distribution.


Table 4: The $p$ values of the Wald test over the MF and PFs.

 | $k = 1$ | $k = 2$ | $k = 3$ | $k = 4$ | $k = 5$
MF versus PF-1000 | 1 | 0.0176 | <0.01 | <0.01 | <0.01
MF versus PF-2000 | 1 | 0.3637 | 0.2450 | 0.0300 | <0.01
MF versus PF-4000 | 1 | 1 | 0.0910 | 0.0412 | 0.0115
MF versus PF-6000 | 1 | 0.4127 | 0.1758 | 0.1758 | 0.0218
MF versus PF-16000 | 1 | 0.5631 | 0.0412 | 1 | 0.1531

Figure 5: Recognition metrics of the MF and PF: (a) precision; (b) recall; (c) $F$-measure.

Generally, with larger $k$ and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there was variance in the results inferred by the PF (because the PF performs approximate inference), we ran the PF with 6000 particles 10 times. The mean value, with the 0.90 belief interval of the metric values at different $k$, is shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bounds of the 90% belief interval also indicate the extremum of the PF. We can see that the metrics of the MF are better than the mean of the PF when $k > 1$ and even better than the maximum value of the PF except for $k = 4$; actually, the MF still outperforms 90% of the runs of the PF at $k = 4$. Additionally, the MF needs on average only 75.78% of the time required by the PF for inference at each step. Thus, the MF consumes less time and performs better than the PF with 6000 particles.
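A small sketch of how the empirical summary over the 10 PF runs could be computed; the percentile-based interval is an assumption about the empirical procedure.

    import numpy as np

    def empirical_summary(metric_runs, level=0.90):
        # metric_runs: values of one metric (e.g. recall at a fixed k) from repeated PF runs.
        runs = np.asarray(metric_runs, dtype=float)
        alpha = (1.0 - level) / 2.0
        lo, hi = np.percentile(runs, [100 * alpha, 100 * (1 - alpha)])
        return {"mean": runs.mean(),
                "belief_interval": (lo, hi),
                "extremum": (runs.min(), runs.max())}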

Generally, the computational complexities of the PF and the MF are both linear functions of the number of particles.


Figure 6: Metrics of the PF with 6000 particles and the MF (mean of the PF, extremum of the PF, and 90% belief interval of the PF): (a) precision; (b) recall; (c) $F$-measure.

When there are multiple possible results of an action, the MF consumes more time than the PF if their numbers of particles are equal. However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete: in the test dataset, the number of possible states in the MF after the UPDATE operation is never more than 156. At the same time, the PF has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.
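The difference can be seen in a schematic UPDATE step of a marginal filter over a discrete state space: successor states that coincide are merged by summing their weights, so the support never grows through duplication. This is a sketch under our assumptions, not the authors' implementation (which may additionally prune low-probability states).

    from collections import defaultdict

    def mf_update(prior, successors, likelihood):
        # prior: dict mapping each distinct state to its probability.
        # successors(state): iterable of (next_state, prob) pairs, one per possible
        #   result of the action taken in `state` (this is where the MF branches).
        # likelihood(next_state): observation likelihood p(y_t | next_state).
        posterior = defaultdict(float)
        for state, p in prior.items():
            for next_state, p_trans in successors(state):
                # identical successor states are merged instead of duplicated
                posterior[next_state] += p * p_trans * likelihood(next_state)
        total = sum(posterior.values())
        if total == 0.0:
            return dict(prior)          # degenerate case: keep the previous belief
        return {s: w / total for s, w in posterior.items()}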

6. Conclusion and Future Work

In this paper, we propose a novel model for multiagent goal recognition, the Dec-POMDM, and present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM represents the agents' cooperation in a compact way; details of the cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and agents may change their goals before they are achieved. In this scenario, we compare our method to other baselines, including the HMM and the PF. The experiment results show that the Dec-POMDM together with the MARL and MF algorithms can recognize the multiagent goal well whether the goal changes or not, that the Dec-POMDM outperforms the HMM in terms of precision, recall, and $F$-measure, and that the MF infers goals more efficiently than the PF.

In the future, we plan to apply the Dec-POMDM in more complex scenarios. Further research on pruning techniques for the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and by the DFG Research Training Group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.

[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-0.3.1.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.


[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.


Page 9: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

Discrete Dynamics in Nature and Society 9

Table 1 The details of two traces

Trace number Durations Targets Interrupted

1 119905 isin [1 7] Prey PA Yes119905 isin [8 13] Prey PB No

4 119905 isin [1 10] Prey PB No

53 Results and Discussion With the learned policy we runthe simulation model repeatedly and generated a test datasetconsisting of 100 traces There are 1183 steps averagely in onetrace and the number of steps in one trace varies from 6to 34 with a standard deviation of 536 We also found thatthe predators changed their goals for at least once in 35of the test traces To validate our method and compare itwith baselines we did experiments on three aspects (a) todiscuss the details of the recognition results obtained withour method we computed the recognition results of twospecific traces by our method (b) to show the advantages oftheDec-POMDM inmodeling we compared the recognitionperformances when the problem was modeled as a Dec-POMDM and as an HMM (c) to show efficiency of the MFunder the framework of the Dec-POMDM we compared therecognizing performances when the goal was inferred by theMF and the PF

In the second and the third parts performances wereevaluated by statistic metrics precision recall and 119865-measure Their meanings and computation details can befound in [31] The value of the three metrics is between 0and 1 a higher value means a better performance Sincethese metrics can only evaluate the recognition results ata single step and traces in the dataset had different timelengths we defined a positive integer 119896 (119896 = 1 2 5)The metric with 119896means that the corresponding observationsequences are 119910

119895isin1100

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil Here 119910119895

119905isin1lceil119896lowast119897119890119899119892119905ℎ1198955rceil

is theobservation sequence from time 1 to time lceil119896 lowast 119897119890119899119892119905ℎ

1198955rceil of

the 119895th trace and 119897119890119899119892119905ℎ119895 is the length of the 119895th trace And

we need to recognize 119892119900119886119897119905for each observation sequence

Thus metrics with different 119896 show the performances ofalgorithms in different phases of the simulation Additionallywe regarded the destination with largest probability as thefinal recognition result

(A)The Recognition Results of the Specific Traces To show thedetails of the recognition results obtained with our methodwe selected two specific traces from the dataset (number 1and number 4)These two traces were selected because Tracenumber 1 was the first trace where the goal was changedbefore it was achieved and number 4 was the first tracewhere the goal was kept until it was achieved The detailedinformation about the two traces is shown in Table 1

In Trace number 1 predators selected Prey PA as theirfirst goal from 119905 = 1 to 119905 = 7 Then the goal was changedto Prey PB which was achieved at 119905 = 13 In Trace number 4predators selected Prey PB as their initial goal This goal waskept until it was achieved at 119905 = 14 Given the real policiesand other parameters of the Dec-POMDM including 119879 119874Υ 119868 119885 119862 and 119861 we used the MF to compute the posterior

distribution of goals at each time The recognition results areshown in Figure 3

In Figure 3(a) the probability of the real goal (PreyPA) increases fast during the initial period At 119905 = 3 theprobability of Prey PA exceeds 09 and keeps a high value until119905 = 7 When the goal is changed at 119905 = 8 our method hasa very fast response because predators select highly certainjoint actions at this time In Figure 3(b) the probability ofthe false goal increases at 119905 = 2 and the probability ofthe real goal (Prey PB) is low at first The failure happensbecause the observations support that predators selected jointactions with small probability if the goal is Prey PB Anywaythe probability of the real goal increases continuously andexceeds that of Prey PA after 119905 = 5 With the recognitionresults of the two specific traces we conclude that the Dec-POMDM and MF can perform well regardless of the goalchange

(B) Comparison of the Performances of the Dec-POMDM andHMMs To show the advantages of the Dec-POMDM wemodeled the predator-prey problem as thewell-knownHMMas a baseline In the HMM we set the hidden state 119909 as thetuple ⟨119892119900119886119897 119901

119883 119901119884 119901119860 119901119861⟩ where 119901

119883 119901119884 119901119860 and 119901

119861are

positions of Predator PX Predator PY Prey PA and PreyPB respectively The observation model for the recognizer inthe HMM was the same as that in the Dec-POMDM Thusthere were 25 times 24 times 23 times 22 times 2 = 607200 possible statesin this HMM and the dimension of the transition matrixwas 607200 times 607200 Since the state space and transitionmatrix were too large an unsupervised learning methodsuch as Balm-Welch algorithmwas infeasible in this problemInstead we used a simple supervised learning counting thestate transitions based on labeled training datasets Withthe real policies of predators we run the simulation modelrepeatedly and generated five training datasets The detailedinformation of these datasets is shown in Table 2

The datasets HMM-50 HMM-100a and HMM-200awere generated in a random and incremental way (HMM-100a contains HMM-50 and HMM-200a contains HMM-100a) Since HMM-100a and HMM-200a both covered 79of traces of the test dataset which might cause an overfittingproblem we removed the traces covered by the test datasetin HMM-100a and HMM-200a and compensated them byextra traces In this way we got new datasets HMM-100band HMM-200b which did not cover any trace in the testdataset With these five labeled datasets we estimate thetransition matrix by counting state transitions Then the MFwas used to infer the goal However in the inference processwe may reach some states that have not been explored in thetraining dataset (HMM-200a only explores 492642 statesbut there are totally 607200 possible states) In this case weassumed that the hidden state would transit to a possiblestate with a uniform distribution The rest of the parametersin the HMM inference were the same as those in the Dec-POMDM inference We compare performances of the Dec-POMDMand theHMMsThe recognitionmetrics are shownin Figure 4

Figure 4 shows that comparing the results of the Dec-POMDM and the HMMs trained by different datasets is

10 Discrete Dynamics in Nature and Society

Prey APrey B

2 3 4 5 6 7 8 9 10 11 12 131Time

0010203040506070809

1Pr

obab

ility

(a) Recognition results of Trace number 1

Prey APrey B

0010203040506070809

1

Prob

abili

ty

2 3 4 5 6 7 8 9 101Time

(b) Recognition results of Trace number 4

Figure 3 Recognition results of two specific traces computed by the Dec-POMDM and MF

Table 2 Information of training datasets for the HMM

Name of the dataset Number of traces Number ofexplored states

Percentage of traces in the test datasetcovered by the training dataset

HMM-50 50000 279477 0HMM-100a 100000 384623 79HMM-100b 100000 384621 0HMM-200a 200000 492642 79HMM-200b 200000 492645 0

similar in terms of precision recall and 119865-measure Moreprecisely HMM-200a had the highest performance HMM-100a performed comparable to our Dec-POMDM but Dec-POMDM performed better after 119896 = 4 HMM-50 had theworst performance Generally there was no big differencebetween performances of the Dec-POMDM HMM-100aandHMM-200a even though the number of traces inHMM-200a was twice as large as that in HMM-100a The main rea-son is that the training datasetswere overfittedActually therewas a very serious performance decline after we removedthe traces covered in the test dataset from HMM-200a andHMM-100a In this case HMM-200b performed better thanHMM-100b but worse than our Dec-POMDM The resultsin Figure 4 showed that (a) our Dec-POMDM performedwell on three metrics almost as well as the overfitted trainedmodel (b) the learnedHMMsuffered an overfitting problemand there will be a serious performance decline if the trainingdataset does not cover most traces in the test dataset

The curves of HMM-50 HMM-100b and HMM-200balso showed that when we model the problem throughthe HMM it may be possible to improve the recognitionperformances by increasing the size of the training datasetHowever this solution is infeasible in practice Actually thedataset HMM-200a which consisted of 200000 traces onlycovered 8113 of all possible states and only 7146 of thestates in HMM-200a had been reached more than onceThus we can conclude that HMM-200a will have a poorperformance if agents select actions with higher uncertaintyBesides there is no doubt that the size of the training dataset

will be extremely large if most states are reached a largenumber of times In real applications it is very hard to obtaina training dataset with so large size especially when all tracesare labeled

We also performed a Wald test over the Dec-POMDMandHMMswith a different training dataset to prove that theirrecognition results came from different distributions Givenour test dataset there were 100 goals to be recognized foreach value of 119896 Let 119879119883 be the set of samples obtained fromthe Dec-POMDM then we set 119879119883

119894= 1 if the recognition

result of the Dec-POMDM is correct on the test case 119894 (119894 =1 2 100) and 119879119883

119894= 0 otherwise let 119879119884 be the set

of samples obtained from the baseline model (one of theHMMs) then we set 119879119884

119894= 1 if the recognition result of

the Dec-POMDM is correct on the test case 119894 and 119879119884119894= 0

otherwise We define 119879119863119894= 119879119883119894minus 119879119884119894 and let 120575 = E(119879119863

119894) =

E(119879119883119894) minus E(119879119884

119894) = 119875(119879119883

119894= 1) minus 119875(119879119884

119894= 1) the null

hypothesis is 120575 = 0 which means the recognition resultsfrom different models follow the same distribution A moredetailed test process can be found in [32] The matrix of 119901values is shown in Table 3

The 119901 values in Table 3 show that recognition results oftheDec-POMDM follow different distributions from those ofHMM-50 HMM-100b and HMM-200b respectively with ahigh confidence We cannot reject the null hypothesis whenthe baseline is an overfitted trained model The Wald testresults are consistent with the metrics in Figure 4 We alsoperformed the Wilcoxon test and the test results showed thesame trend

Discrete Dynamics in Nature and Society 11

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1Pr

ecisi

on

(a) Precision

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

0010203040506070809

1

Reca

ll

1 2 25 3 35 4 45 515k

(b) Recall

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 4 Recognition metrics of the Dec-POMDM and HMMs

Table 3 The 119901 values of the Wald test over the Dec-POMDM and HMMs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

Dec-POMDM versus HMM-50 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-100 1 00412 00478 02450 lt001Dec-POMDM versus HMM-100b 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-200 1 03149 01929 04127 1Dec-POMDM versus HMM-200b 1 lt001 lt001 lt001 lt001

(C) Comparison of the Performances of the MF and thePF Here we exploited the PF as the baseline The modelinformation used in the PF is the same as that in the MFWe evaluated the PF with different number of particles andcompared their performances to the MF All inference wasdone under the framework of Dec-POMDMWe have to notethat when the PF used weighted particles to approximate theposterior distribution of the goal it is possible that all weightsdecrease to 0 if the number of particles is not large enoughIn this case we simply reset all weights 1Np to continue theinference where Np is the number of particles in the PF Therecognition metrics of the MF and PF are shown in Figure 5

In Figure 5 the red solid curve indicates themetrics of theMF The green blue purple black and cyan dashed curves

indicate the metrics of PF with 1000 2000 4000 6000 and16000 particles respectively All filters had similar precisionHowever considering the recall and the 119865-measure the MFhad the best performance and the PFwith the largest numberof particles performed better than the rest of PFWe got theseresults because an exact MF (without pruning step) is usedin this section and the PF can approximate the real posteriordistribution better with more particles

Similar to the testingmethod we used in Part (B) here wealso performed theWald test on theMF and PFwith differentnumber of particles The matrix of the 119901 values is shown inTable 4

The null hypothesis is that the recognition results of thebaseline and the MF follow the same distribution Generally

12 Discrete Dynamics in Nature and Society

Table 4 The 119901 values of the Wald test over the MF and PFs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

MF versus PF-1000 1 00176 lt001 lt001 lt001MF versus PF-2000 1 03637 02450 00300 lt001MF versus PF-4000 1 1 00910 00412 00115MF versus PF-6000 1 04127 01758 01758 00218MF versus PF-16000 1 05631 00412 1 01531

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

1 2 25 3 35 4 45 515k

0010203040506070809

1

Prec

ision

(a) Precision

15 2 25 3 35 4 45 51k

0010203040506070809

1

Reca

ll

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

(b) Recall

1 2 25 3 35 4 45 515k

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 5 Recognition metrics of the MF and PF

with larger 119896 and smaller number of particles we canreject the null hypothesis with higher confidence which isconsistent with the results in Figure 5

Since there was a variance in the results inferred by thePF (this is due to the fact that the PF performs approximateinference) we run the PFwith 6000 particles for 10 timesThemean value with 090 belief interval of the values of metricsat different 119896 is shown in Figure 6

The blue solid line indicates the mean value of the PFmetrics and the cyan area shows the 90 belief intervalWe need to note that since we do not know the underlyingdistribution of themetrics an empirical distributionwas usedto compute the belief interval At the same time because we

run PF for 10 times the bound of the 90 belief interval alsoindicates the extremum of PF We can see that the metricsof the MF are better than the mean of the PF when 119896 gt 1and even better than the maximum value of PF except for119896 = 4 Actually the MF also outperforms 90 of the runsof the PF at 119896 = 4 Additionally the MF only needs average of7578 of the time which is needed by the PF for inference ateach step Thus the MF consumes less time and has a betterperformance than the PF with 6000 particles

Generally the computational complexities of the PF andthe MF are both linear functions of number of particlesWhen there are multiple possible results of an action theMF consumes more time than the PF when their numbers of

Discrete Dynamics in Nature and Society 13

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1Pr

ecisi

on

(a) Precision

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

Reca

ll

(b) Recall

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

F-m

easu

re

(c) 119865-measure

Figure 6 Metrics of the PF with 6000 particles and the MF

particles are equal However since the MF does not duplicateparticles it needs much less particles than the PF when thestate space is large and discrete Actually the number ofpossible states in the MF after UPDATE operation is nevermore than 156 in the test dataset At the same time the PF hasto duplicate particles in the resampling step to approximatethe exact distribution which makes it inefficient under theframework of Dec-POMDM

6 Conclusion and Future Work

In this paper we propose a novelmodel for solvingmultiagentgoal recognition problems of the Dec-POMDM and presentits corresponding learning and inference algorithms whichsolve a multiagent goal recognition problem

First we use the Dec-POMDM to model the generalmultiagent goal recognition problem The Dec-POMDMpresents the agentsrsquo cooperation in a compact way detailsof cooperation are unnecessary in the modeling process Itcan also make use of existing algorithms for solving the Dec-POMDP problem

Second we show that the existing MARL algorithm canbe used to estimate the agentsrsquo policies in the Dec-POMDMassuming that agents are approximately rationalThismethod

does not need a training dataset and the state transitionfunction of the environment

Third we use the MF to infer goals under the frameworkof the Dec-POMDM and we show that the MF is moreefficient than the PFwhen the state space is large and discrete

Last we also design a modified predator-prey problem totest our method In this modified problem there are multiplepossible joint goals and agents may change their goals beforethey are achieved With this scenario we compare ourmethod to other baselines including the HMM and the PFThe experiment results show that the Dec-POMDM togetherwithMARL andMF algorithms can recognize themultiagentgoal well whether the goal is changed or not the Dec-POMDMoutperforms theHMM in terms of precision recalland 119865-measure and the MF can infer goals more efficientlythan the PF

In the future we plan to apply the Dec-POMDM in morecomplex scenarios Further research on pruning technique ofthe MF is also planned

Competing Interests

The authors declare that there are no competing interestsregarding the publication of this paper

14 Discrete Dynamics in Nature and Society

Acknowledgments

This work is sponsored by the National Natural ScienceFoundation of China under Grants no 61473300 and no61573369 andDFG research training group 1424 ldquoMultimodalSmart Appliance Ensembles for Mobile Applicationsrdquo

References

[1] G Synnaeve and P Bessiere ldquoA Bayesian model for planrecognition in RTS games applied to starcraftrdquo in Proceedings ofthe 7th AAAI Conference on Artificial Intelligence and InteractiveDigital Entertainment (AIIDE rsquo11) pp 79ndash84 Stanford CalifUSA October 2011

[2] F Kabanza P Bellefeuille F Bisson A R Benaskeur andH Irandoust ldquoOpponent behaviour recognition for real-timestrategy gamesrdquo in Proceedings of the 24th AAAI Conference onPlan Activity and Intent Recognition pp 29ndash36 July 2010

[3] F SoutheyW Loh and DWilkinson ldquoInferring complex agentmotions from partial trajectory observationsrdquo in Proceedings ofthe 20th International Joint Conference on Artificial Intelligence(IJCAI rsquo07) pp 2631ndash2637 January 2007

[4] M Ramırez and H Geffner ldquoGoal recognition over POMDPsinferring the intention of a POMDP agentrdquo in Proceedings ofthe 22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) pp 2009ndash2014 Barcelona Spain July 2011

[5] E Y Ha J P Rowe B W Mott and J C Lester ldquoGoalrecognition with markov logic networks for player-adaptivegamesrdquo in Proceedings of the 7th AAAI Conference on ArtificialIntelligence and Interactive Digital Entertainment (AIIDE rsquo11)pp 32ndash39 Stanford Calif USA October 2011

[6] K Yordanova F Kruger and T Kirste ldquoContext aware approachfor activity recognition based on precondition-effect rulesrdquo inProceedings of the IEEE International Conference on PervasiveComputing and Communications Workshops (PERCOM Work-shops rsquo12) pp 602ndash607 IEEE Lugano Switzerland March 2012

[7] Q Yin S Yue Y Zha and P Jiao ldquoA semi-Markov decisionmodel for recognizing the destination of amaneuvering agent inreal time strategy gamesrdquo Mathematical Problems in Engineer-ing vol 2016 Article ID 1907971 12 pages 2016

[8] M Wiering and M van Otterlo Eds Reinforcement LearningState-of-the-Art Springer Science and Business Media 2012

[9] B Scherrer andC Francois ldquoCooperative co-learning amodel-based approach for solving multi-agent reinforcement prob-lemsrdquo in Proceedings of the 14th IEEE International Conferenceon Tools with Artificial Intelligence (ICTAI rsquo02) pp 463ndash468Washington DC USA 2002

[10] S Seuken and Z Shlomo ldquoMemory-bounded dynamic pro-gramming for DEC-POMDPsrdquo in Proceedings of the 20thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo07) pp 2009ndash2015 Hyderabad India January 2007

[11] R Nair M Tambe M Yokoo D Pynadath and S MarsellaldquoTaming decentralized POMDPs towards efficient policy com-putation for multiagent settingsrdquo in Proceedings of the 18thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo03) pp 705ndash711 Acapulco Mexico August 2003

[12] M Nyolt F Kruger K Yordanova A Hein and T KirsteldquoMarginal filtering in large state spacesrdquo International Journalof Approximate Reasoning vol 61 pp 16ndash32 2015

[13] M Nyolt and T Kirste ldquoOn resampling for Bayesian filters indiscrete state spacesrdquo in Proceedings of the IEEE 27th Interna-tional Conference on Tools with Artificial Intelligence (ICTAI rsquo15)pp 526ndash533 Vietri sul Mare Italy November 2015

[14] C L Baker R Saxe and J B Tenenbaum ldquoAction understand-ing as inverse planningrdquo Cognition vol 113 no 3 pp 329ndash3492009

[15] L M Hiatt A M Harrison and J G Trafton ldquoAccommodatinghuman variability in human-robot teams through theory ofmindrdquo in Proceedings of the 22nd International Joint Conferenceon Artificial Intelligence (IJCAI rsquo11) pp 2066ndash2071 July 2011

[16] F Kruger K Yordanova A Hein and T Kirste ldquoPlan synthesisfor probabilistic activity recognitionrdquo in Proceedings of the 5thInternational Conference on Agents and Artificial Intelligence(ICAART rsquo13) pp 283ndash288 Barcelona Spain February 2013

[17] F Kruger M Nyolt K Yordanova A Hein and T KirsteldquoComputational state space models for activity and intentionrecognition A feasibility studyrdquo PLoS ONE vol 9 no 11 ArticleID e109381 2014

[18] B Tastan Y Chang and G Sukthankar ldquoLearning to interceptopponents in first person shooter gamesrdquo in Proceedings ofthe IEEE International Conference on Computational Intelligenceand Games (CIG rsquo12) pp 100ndash107 IEEE Granada SpainSeptember 2012

[19] C L Baker N D Goodman and J B Tenenbaum ldquoTheory-based social goal inferencerdquo in Proceedings of the 30th AnnualConference of the Cognitive Science Society pp 1447ndash1452Austin Tex USA July 2008

[20] T D Ullman C L Baker O Macindoe O Evans N D Good-man and J B Tenenbaum ldquoHelp or hinder bayesian modelsof social goal inferencerdquo in Proceedings of the 23rd AnnualConference onNeural Information Processing Systems (NIPS rsquo09)pp 1874ndash1882 December 2009

[21] B Riordan S Brimi N Schurr et al ldquoInferring user intent withBayesian inverse planning making sense of multi-UAS missionmanagementrdquo in Proceedings of the 20th Annual Conference onBehavior Representation in Modeling and Simulation (BRiMSrsquo11) pp 49ndash56 Sundance Utah USA March 2011

[22] P Doshi XQu andAGoodie ldquoDecision-theoretic planning inmulti-agent settings with application to behavioral modelingrdquoin Plan Activity and Intent RecognitionTheory and Practice GSukthankar R P Goldman C Geib et al Eds pp 205ndash224Elsevier Waltham Mass USA 2014

[23] F Wu Research on multi-agent system planning problem basedon decision theory [PhD thesis] University of Science andTech-nology of China Hefei China 2011

[24] F A Oliehoek M T J Spaan P Robbel and J V Messias TheMADP Toolbox 031 2015 httpwwwfransoliehoeknetdocsMADPToolbox-031pdf

[25] M Liu C Amato X Liao L Carin and J P How ldquoStick-breaking policy learning in Dec-POMDPsrdquo in Proceedings ofthe 24th International Joint Conference on Artificial Intelligence(IJCAI rsquo15) pp 2011ndash2018 July 2015

[26] H H Bui S Venkatesh and GWest ldquoPolicy recognition in theabstract hiddenMarkov modelrdquo Journal of Artificial IntelligenceResearch vol 17 pp 451ndash499 2002

[27] S Saria and S Mahadevan ldquoProbabilistic plan recognition inmulti-agent systemsrdquo in Proceedings of the 24th InternationalConference onAutomated Planning and Scheduling pp 287ndash296June 2004

Discrete Dynamics in Nature and Society 15

[28] A Pfeffer S Das D Lawless and B Ng ldquoFactored reasoningfor monitoring dynamic team and goal formationrdquo InformationFusion vol 10 no 1 pp 99ndash106 2009

[29] A Sadilek and H Kautz ldquoLocation-based reasoning aboutcomplex multi-agent behaviorrdquo Journal of Artificial IntelligenceResearch vol 43 pp 87ndash133 2012

[30] WMin E Y Ha J Rowe BMott and J Lester ldquoDeep learning-based goal recognition in open-ended digital gamesrdquo in Pro-ceedings of the 10th AAAI Conference on Artificial Intelligenceand Interactive Digital Entertainment (AIIDE rsquo14) pp 37ndash43Raleigh NC USA October 2014

[31] S-G Yue P Jiao Y-B Zha andQ-J Yin ldquoA logical hierarchicalhidden semi-Markov model for team intention recognitionrdquoDiscrete Dynamics in Nature and Society vol 2015 Article ID975951 19 pages 2015

[32] L Wasserman Ed All of Statistics A Concise Course in Statis-tical Inference Springer Science and Business Media 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

10 Discrete Dynamics in Nature and Society

Prey APrey B

2 3 4 5 6 7 8 9 10 11 12 131Time

0010203040506070809

1Pr

obab

ility

(a) Recognition results of Trace number 1

Prey APrey B

0010203040506070809

1

Prob

abili

ty

2 3 4 5 6 7 8 9 101Time

(b) Recognition results of Trace number 4

Figure 3 Recognition results of two specific traces computed by the Dec-POMDM and MF

Table 2 Information of training datasets for the HMM

Name of the dataset Number of traces Number ofexplored states

Percentage of traces in the test datasetcovered by the training dataset

HMM-50 50000 279477 0HMM-100a 100000 384623 79HMM-100b 100000 384621 0HMM-200a 200000 492642 79HMM-200b 200000 492645 0

similar in terms of precision recall and 119865-measure Moreprecisely HMM-200a had the highest performance HMM-100a performed comparable to our Dec-POMDM but Dec-POMDM performed better after 119896 = 4 HMM-50 had theworst performance Generally there was no big differencebetween performances of the Dec-POMDM HMM-100aandHMM-200a even though the number of traces inHMM-200a was twice as large as that in HMM-100a The main rea-son is that the training datasetswere overfittedActually therewas a very serious performance decline after we removedthe traces covered in the test dataset from HMM-200a andHMM-100a In this case HMM-200b performed better thanHMM-100b but worse than our Dec-POMDM The resultsin Figure 4 showed that (a) our Dec-POMDM performedwell on three metrics almost as well as the overfitted trainedmodel (b) the learnedHMMsuffered an overfitting problemand there will be a serious performance decline if the trainingdataset does not cover most traces in the test dataset

The curves of HMM-50 HMM-100b and HMM-200balso showed that when we model the problem throughthe HMM it may be possible to improve the recognitionperformances by increasing the size of the training datasetHowever this solution is infeasible in practice Actually thedataset HMM-200a which consisted of 200000 traces onlycovered 8113 of all possible states and only 7146 of thestates in HMM-200a had been reached more than onceThus we can conclude that HMM-200a will have a poorperformance if agents select actions with higher uncertaintyBesides there is no doubt that the size of the training dataset

will be extremely large if most states are reached a largenumber of times In real applications it is very hard to obtaina training dataset with so large size especially when all tracesare labeled

We also performed a Wald test over the Dec-POMDMandHMMswith a different training dataset to prove that theirrecognition results came from different distributions Givenour test dataset there were 100 goals to be recognized foreach value of 119896 Let 119879119883 be the set of samples obtained fromthe Dec-POMDM then we set 119879119883

119894= 1 if the recognition

result of the Dec-POMDM is correct on the test case 119894 (119894 =1 2 100) and 119879119883

119894= 0 otherwise let 119879119884 be the set

of samples obtained from the baseline model (one of theHMMs) then we set 119879119884

119894= 1 if the recognition result of

the Dec-POMDM is correct on the test case 119894 and 119879119884119894= 0

otherwise We define 119879119863119894= 119879119883119894minus 119879119884119894 and let 120575 = E(119879119863

119894) =

E(119879119883119894) minus E(119879119884

119894) = 119875(119879119883

119894= 1) minus 119875(119879119884

119894= 1) the null

hypothesis is 120575 = 0 which means the recognition resultsfrom different models follow the same distribution A moredetailed test process can be found in [32] The matrix of 119901values is shown in Table 3

The 119901 values in Table 3 show that recognition results oftheDec-POMDM follow different distributions from those ofHMM-50 HMM-100b and HMM-200b respectively with ahigh confidence We cannot reject the null hypothesis whenthe baseline is an overfitted trained model The Wald testresults are consistent with the metrics in Figure 4 We alsoperformed the Wilcoxon test and the test results showed thesame trend

Discrete Dynamics in Nature and Society 11

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1Pr

ecisi

on

(a) Precision

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

0010203040506070809

1

Reca

ll

1 2 25 3 35 4 45 515k

(b) Recall

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 4 Recognition metrics of the Dec-POMDM and HMMs

Table 3 The 119901 values of the Wald test over the Dec-POMDM and HMMs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

Dec-POMDM versus HMM-50 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-100 1 00412 00478 02450 lt001Dec-POMDM versus HMM-100b 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-200 1 03149 01929 04127 1Dec-POMDM versus HMM-200b 1 lt001 lt001 lt001 lt001

(C) Comparison of the Performances of the MF and thePF Here we exploited the PF as the baseline The modelinformation used in the PF is the same as that in the MFWe evaluated the PF with different number of particles andcompared their performances to the MF All inference wasdone under the framework of Dec-POMDMWe have to notethat when the PF used weighted particles to approximate theposterior distribution of the goal it is possible that all weightsdecrease to 0 if the number of particles is not large enoughIn this case we simply reset all weights 1Np to continue theinference where Np is the number of particles in the PF Therecognition metrics of the MF and PF are shown in Figure 5

In Figure 5 the red solid curve indicates themetrics of theMF The green blue purple black and cyan dashed curves

indicate the metrics of PF with 1000 2000 4000 6000 and16000 particles respectively All filters had similar precisionHowever considering the recall and the 119865-measure the MFhad the best performance and the PFwith the largest numberof particles performed better than the rest of PFWe got theseresults because an exact MF (without pruning step) is usedin this section and the PF can approximate the real posteriordistribution better with more particles

Similar to the testingmethod we used in Part (B) here wealso performed theWald test on theMF and PFwith differentnumber of particles The matrix of the 119901 values is shown inTable 4

The null hypothesis is that the recognition results of thebaseline and the MF follow the same distribution Generally

12 Discrete Dynamics in Nature and Society

Table 4 The 119901 values of the Wald test over the MF and PFs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

MF versus PF-1000 1 00176 lt001 lt001 lt001MF versus PF-2000 1 03637 02450 00300 lt001MF versus PF-4000 1 1 00910 00412 00115MF versus PF-6000 1 04127 01758 01758 00218MF versus PF-16000 1 05631 00412 1 01531

[Figure 5: Recognition metrics of the MF and PF. Panels (a) Precision, (b) Recall, and (c) F-measure, each plotted against k for PF-1000, PF-2000, PF-4000, PF-6000, PF-16000, and the MF.]

Generally, with larger k and a smaller number of particles, we can reject the null hypothesis with higher confidence, which is consistent with the results in Figure 5.

Since there was variance in the results inferred by the PF (this is due to the fact that the PF performs approximate inference), we ran the PF with 6000 particles 10 times. The mean value with the 0.90 belief interval of the metrics at different k is shown in Figure 6.

The blue solid line indicates the mean value of the PF metrics, and the cyan area shows the 90% belief interval. We need to note that, since we do not know the underlying distribution of the metrics, an empirical distribution was used to compute the belief interval. At the same time, because we ran the PF 10 times, the bound of the 90% belief interval also indicates the extremum of the PF. We can see that the metrics of the MF are better than the mean of the PF when k > 1 and even better than the maximum value of the PF except for k = 4. Actually, the MF also outperforms 90% of the runs of the PF at k = 4. Additionally, the MF needs on average only 75.78% of the time needed by the PF for inference at each step. Thus, the MF consumes less time and has better performance than the PF with 6000 particles.
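An empirical 90% belief interval of this kind can be computed directly from the per-run metric values, for example as in the following sketch; the array of per-run F-measures is hypothetical and only illustrates the procedure.

```python
import numpy as np

# Hypothetical F-measure values of 10 independent PF runs at one value of k.
runs = np.array([0.71, 0.68, 0.74, 0.70, 0.66, 0.73, 0.69, 0.72, 0.67, 0.75])

mean = runs.mean()
# Empirical 90% belief interval: 5th and 95th percentiles of the observed runs.
lower, upper = np.percentile(runs, [5, 95])
# With only 10 runs, the interval bounds nearly coincide with the extrema.
extremum = (runs.min(), runs.max())
print(mean, (lower, upper), extremum)
```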

Generally, the computational complexities of the PF and the MF are both linear functions of the number of particles. When there are multiple possible results of an action, the MF consumes more time than the PF when their numbers of particles are equal.


[Figure 6: Metrics of the PF with 6000 particles and the MF. Panels (a) Precision, (b) Recall, and (c) F-measure, each plotted against k, showing the MF, the mean of the PF, the extremum of the PF, and the 90% belief interval of the PF.]

However, since the MF does not duplicate particles, it needs far fewer particles than the PF when the state space is large and discrete. Actually, the number of possible states in the MF after the UPDATE operation is never more than 156 in the test dataset. At the same time, the PF has to duplicate particles in the resampling step to approximate the exact distribution, which makes it inefficient under the framework of the Dec-POMDM.
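The difference can be illustrated with a minimal marginal-filter update that keeps each discrete state only once and merges the weights of identical successors. This is only a sketch of the idea under assumed interfaces (successors enumerates the possible results of an action with their probabilities, observation_likelihood is the observation model, and states are assumed hashable, e.g., tuples), not the filter implementation used in the paper.

```python
from collections import defaultdict

def mf_update(belief, observation, successors, observation_likelihood):
    """One marginal-filter step over a discrete state space.
    belief: dict mapping a hashable state to its probability.
    successors(state): iterable of (next_state, transition_prob) pairs (assumed).
    observation_likelihood(state, obs): assumed observation model."""
    new_belief = defaultdict(float)
    for state, prob in belief.items():
        for next_state, t_prob in successors(state):
            # Identical successor states are merged here instead of being kept
            # as duplicated particles, so the support of the belief stays small.
            new_belief[next_state] += (prob * t_prob *
                                       observation_likelihood(next_state, observation))
    total = sum(new_belief.values())
    if total == 0.0:
        return dict(belief)  # keep the previous belief if nothing explains the observation
    return {s: w / total for s, w in new_belief.items()}
```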

6. Conclusion and Future Work

In this paper, we propose a novel model for multiagent goal recognition, the Dec-POMDM, and present its corresponding learning and inference algorithms.

First, we use the Dec-POMDM to model the general multiagent goal recognition problem. The Dec-POMDM represents the agents' cooperation in a compact way: details of cooperation are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem.

Second, we show that an existing MARL algorithm can be used to estimate the agents' policies in the Dec-POMDM, assuming that agents are approximately rational. This method needs neither a training dataset nor the state transition function of the environment.

Third, we use the MF to infer goals under the framework of the Dec-POMDM, and we show that the MF is more efficient than the PF when the state space is large and discrete.

Last, we also design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and agents may change their goals before they are achieved. With this scenario, we compare our method to other baselines, including the HMM and the PF. The experiment results show that the Dec-POMDM together with the MARL and MF algorithms can recognize the multiagent goal well whether the goal is changed or not; the Dec-POMDM outperforms the HMM in terms of precision, recall, and F-measure; and the MF can infer goals more efficiently than the PF.

In the future, we plan to apply the Dec-POMDM in more complex scenarios. Further research on the pruning technique of the MF is also planned.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and DFG research training group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessiere, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.

[10] S. Seuken and Z. Shlomo, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr, et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib, et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-031.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.

[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.


Page 11: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

Discrete Dynamics in Nature and Society 11

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1Pr

ecisi

on

(a) Precision

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

0010203040506070809

1

Reca

ll

1 2 25 3 35 4 45 515k

(b) Recall

HMM-50HMM-100aHMM-100b

HMM-200aHMM-200bDec-POMDM

1 2 25 3 35 4 45 515k

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 4 Recognition metrics of the Dec-POMDM and HMMs

Table 3 The 119901 values of the Wald test over the Dec-POMDM and HMMs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

Dec-POMDM versus HMM-50 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-100 1 00412 00478 02450 lt001Dec-POMDM versus HMM-100b 1 lt001 lt001 lt001 lt001Dec-POMDM versus HMM-200 1 03149 01929 04127 1Dec-POMDM versus HMM-200b 1 lt001 lt001 lt001 lt001

(C) Comparison of the Performances of the MF and thePF Here we exploited the PF as the baseline The modelinformation used in the PF is the same as that in the MFWe evaluated the PF with different number of particles andcompared their performances to the MF All inference wasdone under the framework of Dec-POMDMWe have to notethat when the PF used weighted particles to approximate theposterior distribution of the goal it is possible that all weightsdecrease to 0 if the number of particles is not large enoughIn this case we simply reset all weights 1Np to continue theinference where Np is the number of particles in the PF Therecognition metrics of the MF and PF are shown in Figure 5

In Figure 5 the red solid curve indicates themetrics of theMF The green blue purple black and cyan dashed curves

indicate the metrics of PF with 1000 2000 4000 6000 and16000 particles respectively All filters had similar precisionHowever considering the recall and the 119865-measure the MFhad the best performance and the PFwith the largest numberof particles performed better than the rest of PFWe got theseresults because an exact MF (without pruning step) is usedin this section and the PF can approximate the real posteriordistribution better with more particles

Similar to the testingmethod we used in Part (B) here wealso performed theWald test on theMF and PFwith differentnumber of particles The matrix of the 119901 values is shown inTable 4

The null hypothesis is that the recognition results of thebaseline and the MF follow the same distribution Generally

12 Discrete Dynamics in Nature and Society

Table 4 The 119901 values of the Wald test over the MF and PFs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

MF versus PF-1000 1 00176 lt001 lt001 lt001MF versus PF-2000 1 03637 02450 00300 lt001MF versus PF-4000 1 1 00910 00412 00115MF versus PF-6000 1 04127 01758 01758 00218MF versus PF-16000 1 05631 00412 1 01531

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

1 2 25 3 35 4 45 515k

0010203040506070809

1

Prec

ision

(a) Precision

15 2 25 3 35 4 45 51k

0010203040506070809

1

Reca

ll

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

(b) Recall

1 2 25 3 35 4 45 515k

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 5 Recognition metrics of the MF and PF

with larger 119896 and smaller number of particles we canreject the null hypothesis with higher confidence which isconsistent with the results in Figure 5

Since there was a variance in the results inferred by thePF (this is due to the fact that the PF performs approximateinference) we run the PFwith 6000 particles for 10 timesThemean value with 090 belief interval of the values of metricsat different 119896 is shown in Figure 6

The blue solid line indicates the mean value of the PFmetrics and the cyan area shows the 90 belief intervalWe need to note that since we do not know the underlyingdistribution of themetrics an empirical distributionwas usedto compute the belief interval At the same time because we

run PF for 10 times the bound of the 90 belief interval alsoindicates the extremum of PF We can see that the metricsof the MF are better than the mean of the PF when 119896 gt 1and even better than the maximum value of PF except for119896 = 4 Actually the MF also outperforms 90 of the runsof the PF at 119896 = 4 Additionally the MF only needs average of7578 of the time which is needed by the PF for inference ateach step Thus the MF consumes less time and has a betterperformance than the PF with 6000 particles

Generally the computational complexities of the PF andthe MF are both linear functions of number of particlesWhen there are multiple possible results of an action theMF consumes more time than the PF when their numbers of

Discrete Dynamics in Nature and Society 13

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1Pr

ecisi

on

(a) Precision

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

Reca

ll

(b) Recall

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

F-m

easu

re

(c) 119865-measure

Figure 6 Metrics of the PF with 6000 particles and the MF

particles are equal However since the MF does not duplicateparticles it needs much less particles than the PF when thestate space is large and discrete Actually the number ofpossible states in the MF after UPDATE operation is nevermore than 156 in the test dataset At the same time the PF hasto duplicate particles in the resampling step to approximatethe exact distribution which makes it inefficient under theframework of Dec-POMDM

6 Conclusion and Future Work

In this paper we propose a novelmodel for solvingmultiagentgoal recognition problems of the Dec-POMDM and presentits corresponding learning and inference algorithms whichsolve a multiagent goal recognition problem

First we use the Dec-POMDM to model the generalmultiagent goal recognition problem The Dec-POMDMpresents the agentsrsquo cooperation in a compact way detailsof cooperation are unnecessary in the modeling process Itcan also make use of existing algorithms for solving the Dec-POMDP problem

Second we show that the existing MARL algorithm canbe used to estimate the agentsrsquo policies in the Dec-POMDMassuming that agents are approximately rationalThismethod

does not need a training dataset and the state transitionfunction of the environment

Third we use the MF to infer goals under the frameworkof the Dec-POMDM and we show that the MF is moreefficient than the PFwhen the state space is large and discrete

Last we also design a modified predator-prey problem totest our method In this modified problem there are multiplepossible joint goals and agents may change their goals beforethey are achieved With this scenario we compare ourmethod to other baselines including the HMM and the PFThe experiment results show that the Dec-POMDM togetherwithMARL andMF algorithms can recognize themultiagentgoal well whether the goal is changed or not the Dec-POMDMoutperforms theHMM in terms of precision recalland 119865-measure and the MF can infer goals more efficientlythan the PF

In the future we plan to apply the Dec-POMDM in morecomplex scenarios Further research on pruning technique ofthe MF is also planned

Competing Interests

The authors declare that there are no competing interestsregarding the publication of this paper

14 Discrete Dynamics in Nature and Society

Acknowledgments

This work is sponsored by the National Natural ScienceFoundation of China under Grants no 61473300 and no61573369 andDFG research training group 1424 ldquoMultimodalSmart Appliance Ensembles for Mobile Applicationsrdquo

References

[1] G Synnaeve and P Bessiere ldquoA Bayesian model for planrecognition in RTS games applied to starcraftrdquo in Proceedings ofthe 7th AAAI Conference on Artificial Intelligence and InteractiveDigital Entertainment (AIIDE rsquo11) pp 79ndash84 Stanford CalifUSA October 2011

[2] F Kabanza P Bellefeuille F Bisson A R Benaskeur andH Irandoust ldquoOpponent behaviour recognition for real-timestrategy gamesrdquo in Proceedings of the 24th AAAI Conference onPlan Activity and Intent Recognition pp 29ndash36 July 2010

[3] F SoutheyW Loh and DWilkinson ldquoInferring complex agentmotions from partial trajectory observationsrdquo in Proceedings ofthe 20th International Joint Conference on Artificial Intelligence(IJCAI rsquo07) pp 2631ndash2637 January 2007

[4] M Ramırez and H Geffner ldquoGoal recognition over POMDPsinferring the intention of a POMDP agentrdquo in Proceedings ofthe 22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) pp 2009ndash2014 Barcelona Spain July 2011

[5] E Y Ha J P Rowe B W Mott and J C Lester ldquoGoalrecognition with markov logic networks for player-adaptivegamesrdquo in Proceedings of the 7th AAAI Conference on ArtificialIntelligence and Interactive Digital Entertainment (AIIDE rsquo11)pp 32ndash39 Stanford Calif USA October 2011

[6] K Yordanova F Kruger and T Kirste ldquoContext aware approachfor activity recognition based on precondition-effect rulesrdquo inProceedings of the IEEE International Conference on PervasiveComputing and Communications Workshops (PERCOM Work-shops rsquo12) pp 602ndash607 IEEE Lugano Switzerland March 2012

[7] Q Yin S Yue Y Zha and P Jiao ldquoA semi-Markov decisionmodel for recognizing the destination of amaneuvering agent inreal time strategy gamesrdquo Mathematical Problems in Engineer-ing vol 2016 Article ID 1907971 12 pages 2016

[8] M Wiering and M van Otterlo Eds Reinforcement LearningState-of-the-Art Springer Science and Business Media 2012

[9] B Scherrer andC Francois ldquoCooperative co-learning amodel-based approach for solving multi-agent reinforcement prob-lemsrdquo in Proceedings of the 14th IEEE International Conferenceon Tools with Artificial Intelligence (ICTAI rsquo02) pp 463ndash468Washington DC USA 2002

[10] S Seuken and Z Shlomo ldquoMemory-bounded dynamic pro-gramming for DEC-POMDPsrdquo in Proceedings of the 20thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo07) pp 2009ndash2015 Hyderabad India January 2007

[11] R Nair M Tambe M Yokoo D Pynadath and S MarsellaldquoTaming decentralized POMDPs towards efficient policy com-putation for multiagent settingsrdquo in Proceedings of the 18thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo03) pp 705ndash711 Acapulco Mexico August 2003

[12] M Nyolt F Kruger K Yordanova A Hein and T KirsteldquoMarginal filtering in large state spacesrdquo International Journalof Approximate Reasoning vol 61 pp 16ndash32 2015

[13] M Nyolt and T Kirste ldquoOn resampling for Bayesian filters indiscrete state spacesrdquo in Proceedings of the IEEE 27th Interna-tional Conference on Tools with Artificial Intelligence (ICTAI rsquo15)pp 526ndash533 Vietri sul Mare Italy November 2015

[14] C L Baker R Saxe and J B Tenenbaum ldquoAction understand-ing as inverse planningrdquo Cognition vol 113 no 3 pp 329ndash3492009

[15] L M Hiatt A M Harrison and J G Trafton ldquoAccommodatinghuman variability in human-robot teams through theory ofmindrdquo in Proceedings of the 22nd International Joint Conferenceon Artificial Intelligence (IJCAI rsquo11) pp 2066ndash2071 July 2011

[16] F Kruger K Yordanova A Hein and T Kirste ldquoPlan synthesisfor probabilistic activity recognitionrdquo in Proceedings of the 5thInternational Conference on Agents and Artificial Intelligence(ICAART rsquo13) pp 283ndash288 Barcelona Spain February 2013

[17] F Kruger M Nyolt K Yordanova A Hein and T KirsteldquoComputational state space models for activity and intentionrecognition A feasibility studyrdquo PLoS ONE vol 9 no 11 ArticleID e109381 2014

[18] B Tastan Y Chang and G Sukthankar ldquoLearning to interceptopponents in first person shooter gamesrdquo in Proceedings ofthe IEEE International Conference on Computational Intelligenceand Games (CIG rsquo12) pp 100ndash107 IEEE Granada SpainSeptember 2012

[19] C L Baker N D Goodman and J B Tenenbaum ldquoTheory-based social goal inferencerdquo in Proceedings of the 30th AnnualConference of the Cognitive Science Society pp 1447ndash1452Austin Tex USA July 2008

[20] T D Ullman C L Baker O Macindoe O Evans N D Good-man and J B Tenenbaum ldquoHelp or hinder bayesian modelsof social goal inferencerdquo in Proceedings of the 23rd AnnualConference onNeural Information Processing Systems (NIPS rsquo09)pp 1874ndash1882 December 2009

[21] B Riordan S Brimi N Schurr et al ldquoInferring user intent withBayesian inverse planning making sense of multi-UAS missionmanagementrdquo in Proceedings of the 20th Annual Conference onBehavior Representation in Modeling and Simulation (BRiMSrsquo11) pp 49ndash56 Sundance Utah USA March 2011

[22] P Doshi XQu andAGoodie ldquoDecision-theoretic planning inmulti-agent settings with application to behavioral modelingrdquoin Plan Activity and Intent RecognitionTheory and Practice GSukthankar R P Goldman C Geib et al Eds pp 205ndash224Elsevier Waltham Mass USA 2014

[23] F Wu Research on multi-agent system planning problem basedon decision theory [PhD thesis] University of Science andTech-nology of China Hefei China 2011

[24] F A Oliehoek M T J Spaan P Robbel and J V Messias TheMADP Toolbox 031 2015 httpwwwfransoliehoeknetdocsMADPToolbox-031pdf

[25] M Liu C Amato X Liao L Carin and J P How ldquoStick-breaking policy learning in Dec-POMDPsrdquo in Proceedings ofthe 24th International Joint Conference on Artificial Intelligence(IJCAI rsquo15) pp 2011ndash2018 July 2015

[26] H H Bui S Venkatesh and GWest ldquoPolicy recognition in theabstract hiddenMarkov modelrdquo Journal of Artificial IntelligenceResearch vol 17 pp 451ndash499 2002

[27] S Saria and S Mahadevan ldquoProbabilistic plan recognition inmulti-agent systemsrdquo in Proceedings of the 24th InternationalConference onAutomated Planning and Scheduling pp 287ndash296June 2004

Discrete Dynamics in Nature and Society 15

[28] A Pfeffer S Das D Lawless and B Ng ldquoFactored reasoningfor monitoring dynamic team and goal formationrdquo InformationFusion vol 10 no 1 pp 99ndash106 2009

[29] A Sadilek and H Kautz ldquoLocation-based reasoning aboutcomplex multi-agent behaviorrdquo Journal of Artificial IntelligenceResearch vol 43 pp 87ndash133 2012

[30] WMin E Y Ha J Rowe BMott and J Lester ldquoDeep learning-based goal recognition in open-ended digital gamesrdquo in Pro-ceedings of the 10th AAAI Conference on Artificial Intelligenceand Interactive Digital Entertainment (AIIDE rsquo14) pp 37ndash43Raleigh NC USA October 2014

[31] S-G Yue P Jiao Y-B Zha andQ-J Yin ldquoA logical hierarchicalhidden semi-Markov model for team intention recognitionrdquoDiscrete Dynamics in Nature and Society vol 2015 Article ID975951 19 pages 2015

[32] L Wasserman Ed All of Statistics A Concise Course in Statis-tical Inference Springer Science and Business Media 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

12 Discrete Dynamics in Nature and Society

Table 4 The 119901 values of the Wald test over the MF and PFs

119896 = 1 119896 = 2 119896 = 3 119896 = 4 119896 = 5

MF versus PF-1000 1 00176 lt001 lt001 lt001MF versus PF-2000 1 03637 02450 00300 lt001MF versus PF-4000 1 1 00910 00412 00115MF versus PF-6000 1 04127 01758 01758 00218MF versus PF-16000 1 05631 00412 1 01531

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

1 2 25 3 35 4 45 515k

0010203040506070809

1

Prec

ision

(a) Precision

15 2 25 3 35 4 45 51k

0010203040506070809

1

Reca

ll

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

(b) Recall

1 2 25 3 35 4 45 515k

PF-1000PF-2000PF-4000

PF-6000PF-16000MF

0010203040506070809

1

F-m

easu

re

(c) 119865-measure

Figure 5 Recognition metrics of the MF and PF

with larger 119896 and smaller number of particles we canreject the null hypothesis with higher confidence which isconsistent with the results in Figure 5

Since there was a variance in the results inferred by thePF (this is due to the fact that the PF performs approximateinference) we run the PFwith 6000 particles for 10 timesThemean value with 090 belief interval of the values of metricsat different 119896 is shown in Figure 6

The blue solid line indicates the mean value of the PFmetrics and the cyan area shows the 90 belief intervalWe need to note that since we do not know the underlyingdistribution of themetrics an empirical distributionwas usedto compute the belief interval At the same time because we

run PF for 10 times the bound of the 90 belief interval alsoindicates the extremum of PF We can see that the metricsof the MF are better than the mean of the PF when 119896 gt 1and even better than the maximum value of PF except for119896 = 4 Actually the MF also outperforms 90 of the runsof the PF at 119896 = 4 Additionally the MF only needs average of7578 of the time which is needed by the PF for inference ateach step Thus the MF consumes less time and has a betterperformance than the PF with 6000 particles

Generally the computational complexities of the PF andthe MF are both linear functions of number of particlesWhen there are multiple possible results of an action theMF consumes more time than the PF when their numbers of

Discrete Dynamics in Nature and Society 13

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1Pr

ecisi

on

(a) Precision

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

Reca

ll

(b) Recall

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

F-m

easu

re

(c) 119865-measure

Figure 6 Metrics of the PF with 6000 particles and the MF

particles are equal However since the MF does not duplicateparticles it needs much less particles than the PF when thestate space is large and discrete Actually the number ofpossible states in the MF after UPDATE operation is nevermore than 156 in the test dataset At the same time the PF hasto duplicate particles in the resampling step to approximatethe exact distribution which makes it inefficient under theframework of Dec-POMDM

6 Conclusion and Future Work

In this paper we propose a novelmodel for solvingmultiagentgoal recognition problems of the Dec-POMDM and presentits corresponding learning and inference algorithms whichsolve a multiagent goal recognition problem

First we use the Dec-POMDM to model the generalmultiagent goal recognition problem The Dec-POMDMpresents the agentsrsquo cooperation in a compact way detailsof cooperation are unnecessary in the modeling process Itcan also make use of existing algorithms for solving the Dec-POMDP problem

Second we show that the existing MARL algorithm canbe used to estimate the agentsrsquo policies in the Dec-POMDMassuming that agents are approximately rationalThismethod

does not need a training dataset and the state transitionfunction of the environment

Third we use the MF to infer goals under the frameworkof the Dec-POMDM and we show that the MF is moreefficient than the PFwhen the state space is large and discrete

Last we also design a modified predator-prey problem totest our method In this modified problem there are multiplepossible joint goals and agents may change their goals beforethey are achieved With this scenario we compare ourmethod to other baselines including the HMM and the PFThe experiment results show that the Dec-POMDM togetherwithMARL andMF algorithms can recognize themultiagentgoal well whether the goal is changed or not the Dec-POMDMoutperforms theHMM in terms of precision recalland 119865-measure and the MF can infer goals more efficientlythan the PF

In the future we plan to apply the Dec-POMDM in morecomplex scenarios Further research on pruning technique ofthe MF is also planned

Competing Interests

The authors declare that there are no competing interestsregarding the publication of this paper

14 Discrete Dynamics in Nature and Society

Acknowledgments

This work is sponsored by the National Natural ScienceFoundation of China under Grants no 61473300 and no61573369 andDFG research training group 1424 ldquoMultimodalSmart Appliance Ensembles for Mobile Applicationsrdquo

References

[1] G Synnaeve and P Bessiere ldquoA Bayesian model for planrecognition in RTS games applied to starcraftrdquo in Proceedings ofthe 7th AAAI Conference on Artificial Intelligence and InteractiveDigital Entertainment (AIIDE rsquo11) pp 79ndash84 Stanford CalifUSA October 2011

[2] F Kabanza P Bellefeuille F Bisson A R Benaskeur andH Irandoust ldquoOpponent behaviour recognition for real-timestrategy gamesrdquo in Proceedings of the 24th AAAI Conference onPlan Activity and Intent Recognition pp 29ndash36 July 2010

[3] F SoutheyW Loh and DWilkinson ldquoInferring complex agentmotions from partial trajectory observationsrdquo in Proceedings ofthe 20th International Joint Conference on Artificial Intelligence(IJCAI rsquo07) pp 2631ndash2637 January 2007

[4] M Ramırez and H Geffner ldquoGoal recognition over POMDPsinferring the intention of a POMDP agentrdquo in Proceedings ofthe 22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) pp 2009ndash2014 Barcelona Spain July 2011

[5] E Y Ha J P Rowe B W Mott and J C Lester ldquoGoalrecognition with markov logic networks for player-adaptivegamesrdquo in Proceedings of the 7th AAAI Conference on ArtificialIntelligence and Interactive Digital Entertainment (AIIDE rsquo11)pp 32ndash39 Stanford Calif USA October 2011

[6] K Yordanova F Kruger and T Kirste ldquoContext aware approachfor activity recognition based on precondition-effect rulesrdquo inProceedings of the IEEE International Conference on PervasiveComputing and Communications Workshops (PERCOM Work-shops rsquo12) pp 602ndash607 IEEE Lugano Switzerland March 2012

[7] Q Yin S Yue Y Zha and P Jiao ldquoA semi-Markov decisionmodel for recognizing the destination of amaneuvering agent inreal time strategy gamesrdquo Mathematical Problems in Engineer-ing vol 2016 Article ID 1907971 12 pages 2016

[8] M Wiering and M van Otterlo Eds Reinforcement LearningState-of-the-Art Springer Science and Business Media 2012

[9] B Scherrer andC Francois ldquoCooperative co-learning amodel-based approach for solving multi-agent reinforcement prob-lemsrdquo in Proceedings of the 14th IEEE International Conferenceon Tools with Artificial Intelligence (ICTAI rsquo02) pp 463ndash468Washington DC USA 2002

[10] S Seuken and Z Shlomo ldquoMemory-bounded dynamic pro-gramming for DEC-POMDPsrdquo in Proceedings of the 20thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo07) pp 2009ndash2015 Hyderabad India January 2007

[11] R Nair M Tambe M Yokoo D Pynadath and S MarsellaldquoTaming decentralized POMDPs towards efficient policy com-putation for multiagent settingsrdquo in Proceedings of the 18thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo03) pp 705ndash711 Acapulco Mexico August 2003

[12] M Nyolt F Kruger K Yordanova A Hein and T KirsteldquoMarginal filtering in large state spacesrdquo International Journalof Approximate Reasoning vol 61 pp 16ndash32 2015

[13] M Nyolt and T Kirste ldquoOn resampling for Bayesian filters indiscrete state spacesrdquo in Proceedings of the IEEE 27th Interna-tional Conference on Tools with Artificial Intelligence (ICTAI rsquo15)pp 526ndash533 Vietri sul Mare Italy November 2015

[14] C L Baker R Saxe and J B Tenenbaum ldquoAction understand-ing as inverse planningrdquo Cognition vol 113 no 3 pp 329ndash3492009

[15] L M Hiatt A M Harrison and J G Trafton ldquoAccommodatinghuman variability in human-robot teams through theory ofmindrdquo in Proceedings of the 22nd International Joint Conferenceon Artificial Intelligence (IJCAI rsquo11) pp 2066ndash2071 July 2011

[16] F Kruger K Yordanova A Hein and T Kirste ldquoPlan synthesisfor probabilistic activity recognitionrdquo in Proceedings of the 5thInternational Conference on Agents and Artificial Intelligence(ICAART rsquo13) pp 283ndash288 Barcelona Spain February 2013

[17] F Kruger M Nyolt K Yordanova A Hein and T KirsteldquoComputational state space models for activity and intentionrecognition A feasibility studyrdquo PLoS ONE vol 9 no 11 ArticleID e109381 2014

[18] B Tastan Y Chang and G Sukthankar ldquoLearning to interceptopponents in first person shooter gamesrdquo in Proceedings ofthe IEEE International Conference on Computational Intelligenceand Games (CIG rsquo12) pp 100ndash107 IEEE Granada SpainSeptember 2012

[19] C L Baker N D Goodman and J B Tenenbaum ldquoTheory-based social goal inferencerdquo in Proceedings of the 30th AnnualConference of the Cognitive Science Society pp 1447ndash1452Austin Tex USA July 2008

[20] T D Ullman C L Baker O Macindoe O Evans N D Good-man and J B Tenenbaum ldquoHelp or hinder bayesian modelsof social goal inferencerdquo in Proceedings of the 23rd AnnualConference onNeural Information Processing Systems (NIPS rsquo09)pp 1874ndash1882 December 2009

[21] B Riordan S Brimi N Schurr et al ldquoInferring user intent withBayesian inverse planning making sense of multi-UAS missionmanagementrdquo in Proceedings of the 20th Annual Conference onBehavior Representation in Modeling and Simulation (BRiMSrsquo11) pp 49ndash56 Sundance Utah USA March 2011

[22] P Doshi XQu andAGoodie ldquoDecision-theoretic planning inmulti-agent settings with application to behavioral modelingrdquoin Plan Activity and Intent RecognitionTheory and Practice GSukthankar R P Goldman C Geib et al Eds pp 205ndash224Elsevier Waltham Mass USA 2014

[23] F Wu Research on multi-agent system planning problem basedon decision theory [PhD thesis] University of Science andTech-nology of China Hefei China 2011

[24] F A Oliehoek M T J Spaan P Robbel and J V Messias TheMADP Toolbox 031 2015 httpwwwfransoliehoeknetdocsMADPToolbox-031pdf

[25] M Liu C Amato X Liao L Carin and J P How ldquoStick-breaking policy learning in Dec-POMDPsrdquo in Proceedings ofthe 24th International Joint Conference on Artificial Intelligence(IJCAI rsquo15) pp 2011ndash2018 July 2015

[26] H H Bui S Venkatesh and GWest ldquoPolicy recognition in theabstract hiddenMarkov modelrdquo Journal of Artificial IntelligenceResearch vol 17 pp 451ndash499 2002

[27] S Saria and S Mahadevan ldquoProbabilistic plan recognition inmulti-agent systemsrdquo in Proceedings of the 24th InternationalConference onAutomated Planning and Scheduling pp 287ndash296June 2004

Discrete Dynamics in Nature and Society 15

[28] A Pfeffer S Das D Lawless and B Ng ldquoFactored reasoningfor monitoring dynamic team and goal formationrdquo InformationFusion vol 10 no 1 pp 99ndash106 2009

[29] A Sadilek and H Kautz ldquoLocation-based reasoning aboutcomplex multi-agent behaviorrdquo Journal of Artificial IntelligenceResearch vol 43 pp 87ndash133 2012

[30] WMin E Y Ha J Rowe BMott and J Lester ldquoDeep learning-based goal recognition in open-ended digital gamesrdquo in Pro-ceedings of the 10th AAAI Conference on Artificial Intelligenceand Interactive Digital Entertainment (AIIDE rsquo14) pp 37ndash43Raleigh NC USA October 2014

[31] S-G Yue P Jiao Y-B Zha andQ-J Yin ldquoA logical hierarchicalhidden semi-Markov model for team intention recognitionrdquoDiscrete Dynamics in Nature and Society vol 2015 Article ID975951 19 pages 2015

[32] L Wasserman Ed All of Statistics A Concise Course in Statis-tical Inference Springer Science and Business Media 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 13: Research Article A Decentralized Partially …downloads.hindawi.com/journals/ddns/2016/5323121.pdfResearch Article A Decentralized Partially Observable Decision Model for Recognizing

Discrete Dynamics in Nature and Society 13

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1Pr

ecisi

on

(a) Precision

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

Reca

ll

(b) Recall

MFMean of PF

Extremum of PF90 belief interval of PF

15 2 25 3 35 4 51 45k

05055

06065

07075

08085

09095

1

F-m

easu

re

(c) 119865-measure

Figure 6 Metrics of the PF with 6000 particles and the MF

particles are equal However since the MF does not duplicateparticles it needs much less particles than the PF when thestate space is large and discrete Actually the number ofpossible states in the MF after UPDATE operation is nevermore than 156 in the test dataset At the same time the PF hasto duplicate particles in the resampling step to approximatethe exact distribution which makes it inefficient under theframework of Dec-POMDM

6 Conclusion and Future Work

In this paper we propose a novelmodel for solvingmultiagentgoal recognition problems of the Dec-POMDM and presentits corresponding learning and inference algorithms whichsolve a multiagent goal recognition problem

First we use the Dec-POMDM to model the generalmultiagent goal recognition problem The Dec-POMDMpresents the agentsrsquo cooperation in a compact way detailsof cooperation are unnecessary in the modeling process Itcan also make use of existing algorithms for solving the Dec-POMDP problem

Second we show that the existing MARL algorithm canbe used to estimate the agentsrsquo policies in the Dec-POMDMassuming that agents are approximately rationalThismethod

does not need a training dataset and the state transitionfunction of the environment

Third we use the MF to infer goals under the frameworkof the Dec-POMDM and we show that the MF is moreefficient than the PFwhen the state space is large and discrete

Last we also design a modified predator-prey problem totest our method In this modified problem there are multiplepossible joint goals and agents may change their goals beforethey are achieved With this scenario we compare ourmethod to other baselines including the HMM and the PFThe experiment results show that the Dec-POMDM togetherwithMARL andMF algorithms can recognize themultiagentgoal well whether the goal is changed or not the Dec-POMDMoutperforms theHMM in terms of precision recalland 119865-measure and the MF can infer goals more efficientlythan the PF

In the future we plan to apply the Dec-POMDM in morecomplex scenarios Further research on pruning technique ofthe MF is also planned

Competing Interests

The authors declare that there are no competing interestsregarding the publication of this paper

14 Discrete Dynamics in Nature and Society

Acknowledgments

This work is sponsored by the National Natural ScienceFoundation of China under Grants no 61473300 and no61573369 andDFG research training group 1424 ldquoMultimodalSmart Appliance Ensembles for Mobile Applicationsrdquo

References

[1] G Synnaeve and P Bessiere ldquoA Bayesian model for planrecognition in RTS games applied to starcraftrdquo in Proceedings ofthe 7th AAAI Conference on Artificial Intelligence and InteractiveDigital Entertainment (AIIDE rsquo11) pp 79ndash84 Stanford CalifUSA October 2011

[2] F Kabanza P Bellefeuille F Bisson A R Benaskeur andH Irandoust ldquoOpponent behaviour recognition for real-timestrategy gamesrdquo in Proceedings of the 24th AAAI Conference onPlan Activity and Intent Recognition pp 29ndash36 July 2010

[3] F SoutheyW Loh and DWilkinson ldquoInferring complex agentmotions from partial trajectory observationsrdquo in Proceedings ofthe 20th International Joint Conference on Artificial Intelligence(IJCAI rsquo07) pp 2631ndash2637 January 2007

[4] M Ramırez and H Geffner ldquoGoal recognition over POMDPsinferring the intention of a POMDP agentrdquo in Proceedings ofthe 22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) pp 2009ndash2014 Barcelona Spain July 2011

[5] E Y Ha J P Rowe B W Mott and J C Lester ldquoGoalrecognition with markov logic networks for player-adaptivegamesrdquo in Proceedings of the 7th AAAI Conference on ArtificialIntelligence and Interactive Digital Entertainment (AIIDE rsquo11)pp 32ndash39 Stanford Calif USA October 2011

[6] K Yordanova F Kruger and T Kirste ldquoContext aware approachfor activity recognition based on precondition-effect rulesrdquo inProceedings of the IEEE International Conference on PervasiveComputing and Communications Workshops (PERCOM Work-shops rsquo12) pp 602ndash607 IEEE Lugano Switzerland March 2012

[7] Q Yin S Yue Y Zha and P Jiao ldquoA semi-Markov decisionmodel for recognizing the destination of amaneuvering agent inreal time strategy gamesrdquo Mathematical Problems in Engineer-ing vol 2016 Article ID 1907971 12 pages 2016

[8] M Wiering and M van Otterlo Eds Reinforcement LearningState-of-the-Art Springer Science and Business Media 2012

[9] B Scherrer andC Francois ldquoCooperative co-learning amodel-based approach for solving multi-agent reinforcement prob-lemsrdquo in Proceedings of the 14th IEEE International Conferenceon Tools with Artificial Intelligence (ICTAI rsquo02) pp 463ndash468Washington DC USA 2002

[10] S Seuken and Z Shlomo ldquoMemory-bounded dynamic pro-gramming for DEC-POMDPsrdquo in Proceedings of the 20thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo07) pp 2009ndash2015 Hyderabad India January 2007

[11] R Nair M Tambe M Yokoo D Pynadath and S MarsellaldquoTaming decentralized POMDPs towards efficient policy com-putation for multiagent settingsrdquo in Proceedings of the 18thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo03) pp 705ndash711 Acapulco Mexico August 2003

[12] M Nyolt F Kruger K Yordanova A Hein and T KirsteldquoMarginal filtering in large state spacesrdquo International Journalof Approximate Reasoning vol 61 pp 16ndash32 2015

[13] M Nyolt and T Kirste ldquoOn resampling for Bayesian filters indiscrete state spacesrdquo in Proceedings of the IEEE 27th Interna-tional Conference on Tools with Artificial Intelligence (ICTAI rsquo15)pp 526ndash533 Vietri sul Mare Italy November 2015

[14] C L Baker R Saxe and J B Tenenbaum ldquoAction understand-ing as inverse planningrdquo Cognition vol 113 no 3 pp 329ndash3492009

[15] L M Hiatt A M Harrison and J G Trafton ldquoAccommodatinghuman variability in human-robot teams through theory ofmindrdquo in Proceedings of the 22nd International Joint Conferenceon Artificial Intelligence (IJCAI rsquo11) pp 2066ndash2071 July 2011

[16] F Kruger K Yordanova A Hein and T Kirste ldquoPlan synthesisfor probabilistic activity recognitionrdquo in Proceedings of the 5thInternational Conference on Agents and Artificial Intelligence(ICAART rsquo13) pp 283ndash288 Barcelona Spain February 2013

[17] F Kruger M Nyolt K Yordanova A Hein and T KirsteldquoComputational state space models for activity and intentionrecognition A feasibility studyrdquo PLoS ONE vol 9 no 11 ArticleID e109381 2014

[18] B Tastan Y Chang and G Sukthankar ldquoLearning to interceptopponents in first person shooter gamesrdquo in Proceedings ofthe IEEE International Conference on Computational Intelligenceand Games (CIG rsquo12) pp 100ndash107 IEEE Granada SpainSeptember 2012

[19] C L Baker N D Goodman and J B Tenenbaum ldquoTheory-based social goal inferencerdquo in Proceedings of the 30th AnnualConference of the Cognitive Science Society pp 1447ndash1452Austin Tex USA July 2008

[20] T D Ullman C L Baker O Macindoe O Evans N D Good-man and J B Tenenbaum ldquoHelp or hinder bayesian modelsof social goal inferencerdquo in Proceedings of the 23rd AnnualConference onNeural Information Processing Systems (NIPS rsquo09)pp 1874ndash1882 December 2009


Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369 and by the DFG Research Training Group 1424 "Multimodal Smart Appliance Ensembles for Mobile Applications".

References

[1] G. Synnaeve and P. Bessière, "A Bayesian model for plan recognition in RTS games applied to StarCraft," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 79–84, Stanford, Calif, USA, October 2011.

[2] F. Kabanza, P. Bellefeuille, F. Bisson, A. R. Benaskeur, and H. Irandoust, "Opponent behaviour recognition for real-time strategy games," in Proceedings of the 24th AAAI Conference on Plan, Activity, and Intent Recognition, pp. 29–36, July 2010.

[3] F. Southey, W. Loh, and D. Wilkinson, "Inferring complex agent motions from partial trajectory observations," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2631–2637, January 2007.

[4] M. Ramírez and H. Geffner, "Goal recognition over POMDPs: inferring the intention of a POMDP agent," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2009–2014, Barcelona, Spain, July 2011.

[5] E. Y. Ha, J. P. Rowe, B. W. Mott, and J. C. Lester, "Goal recognition with Markov logic networks for player-adaptive games," in Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '11), pp. 32–39, Stanford, Calif, USA, October 2011.

[6] K. Yordanova, F. Krüger, and T. Kirste, "Context aware approach for activity recognition based on precondition-effect rules," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops '12), pp. 602–607, IEEE, Lugano, Switzerland, March 2012.

[7] Q. Yin, S. Yue, Y. Zha, and P. Jiao, "A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games," Mathematical Problems in Engineering, vol. 2016, Article ID 1907971, 12 pages, 2016.

[8] M. Wiering and M. van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer Science and Business Media, 2012.

[9] B. Scherrer and C. Francois, "Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems," in Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 463–468, Washington, DC, USA, 2002.

[10] S. Seuken and S. Zilberstein, "Memory-bounded dynamic programming for DEC-POMDPs," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2009–2015, Hyderabad, India, January 2007.

[11] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, "Taming decentralized POMDPs: towards efficient policy computation for multiagent settings," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 705–711, Acapulco, Mexico, August 2003.

[12] M. Nyolt, F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Marginal filtering in large state spaces," International Journal of Approximate Reasoning, vol. 61, pp. 16–32, 2015.

[13] M. Nyolt and T. Kirste, "On resampling for Bayesian filters in discrete state spaces," in Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI '15), pp. 526–533, Vietri sul Mare, Italy, November 2015.

[14] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.

[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton, "Accommodating human variability in human-robot teams through theory of mind," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2066–2071, July 2011.

[16] F. Krüger, K. Yordanova, A. Hein, and T. Kirste, "Plan synthesis for probabilistic activity recognition," in Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART '13), pp. 283–288, Barcelona, Spain, February 2013.

[17] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste, "Computational state space models for activity and intention recognition: a feasibility study," PLoS ONE, vol. 9, no. 11, Article ID e109381, 2014.

[18] B. Tastan, Y. Chang, and G. Sukthankar, "Learning to intercept opponents in first person shooter games," in Proceedings of the IEEE International Conference on Computational Intelligence and Games (CIG '12), pp. 100–107, IEEE, Granada, Spain, September 2012.

[19] C. L. Baker, N. D. Goodman, and J. B. Tenenbaum, "Theory-based social goal inference," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1447–1452, Austin, Tex, USA, July 2008.

[20] T. D. Ullman, C. L. Baker, O. Macindoe, O. Evans, N. D. Goodman, and J. B. Tenenbaum, "Help or hinder: Bayesian models of social goal inference," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1874–1882, December 2009.

[21] B. Riordan, S. Brimi, N. Schurr et al., "Inferring user intent with Bayesian inverse planning: making sense of multi-UAS mission management," in Proceedings of the 20th Annual Conference on Behavior Representation in Modeling and Simulation (BRiMS '11), pp. 49–56, Sundance, Utah, USA, March 2011.

[22] P. Doshi, X. Qu, and A. Goodie, "Decision-theoretic planning in multi-agent settings with application to behavioral modeling," in Plan, Activity, and Intent Recognition: Theory and Practice, G. Sukthankar, R. P. Goldman, C. Geib et al., Eds., pp. 205–224, Elsevier, Waltham, Mass, USA, 2014.

[23] F. Wu, Research on multi-agent system planning problem based on decision theory [Ph.D. thesis], University of Science and Technology of China, Hefei, China, 2011.

[24] F. A. Oliehoek, M. T. J. Spaan, P. Robbel, and J. V. Messias, The MADP Toolbox 0.3.1, 2015, http://www.fransoliehoek.net/docs/MADPToolbox-0.3.1.pdf.

[25] M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, "Stick-breaking policy learning in Dec-POMDPs," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2011–2018, July 2015.

[26] H. H. Bui, S. Venkatesh, and G. West, "Policy recognition in the abstract hidden Markov model," Journal of Artificial Intelligence Research, vol. 17, pp. 451–499, 2002.

[27] S. Saria and S. Mahadevan, "Probabilistic plan recognition in multi-agent systems," in Proceedings of the 24th International Conference on Automated Planning and Scheduling, pp. 287–296, June 2004.

[28] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Factored reasoning for monitoring dynamic team and goal formation," Information Fusion, vol. 10, no. 1, pp. 99–106, 2009.

[29] A. Sadilek and H. Kautz, "Location-based reasoning about complex multi-agent behavior," Journal of Artificial Intelligence Research, vol. 43, pp. 87–133, 2012.

[30] W. Min, E. Y. Ha, J. Rowe, B. Mott, and J. Lester, "Deep learning-based goal recognition in open-ended digital games," in Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '14), pp. 37–43, Raleigh, NC, USA, October 2014.

[31] S.-G. Yue, P. Jiao, Y.-B. Zha, and Q.-J. Yin, "A logical hierarchical hidden semi-Markov model for team intention recognition," Discrete Dynamics in Nature and Society, vol. 2015, Article ID 975951, 19 pages, 2015.

[32] L. Wasserman, Ed., All of Statistics: A Concise Course in Statistical Inference, Springer Science and Business Media, 2013.
