An innovative multi-agent search-and-rescue path planning approach

Jean Berger a, Nassirou Lo b

a DRDC Valcartier, Quebec City, Canada
b T-OptLogic Ltd., Quebec City, Canada

Article info

Available online 24 June 2014

Keywords: Search path planning; Search and rescue; Multi-agent; Linear programming

Abstract

Search and rescue path planning is known to be computationally hard, and most techniques developed to solve practical size problems have been unsuccessful in estimating an optimality gap. A mixed-integer linear programming (MIP) formulation is proposed to optimally solve the multi-agent discrete search and rescue (SAR) path planning problem, maximizing the cumulative probability of success in detecting a target. It extends a single agent decision model to a multi-agent setting, capturing anticipated feedback information resulting from possible observation outcomes during projected path execution while expanding possible agent actions to all possible neighboring move directions, considerably augmenting computational complexity. A network representation is further exploited to alleviate problem modeling, ease constraint specification, and speed up computation. The proposed MIP approach uses CPLEX problem-solving technology to promptly provide near-optimal solutions for realistic problems, while offering a robust upper bound derived from Lagrangean integrality constraint relaxation. A modeling extension to a closed-loop environment, incorporating real-time action outcomes over a receding time horizon, can even be envisioned given the acceptable run-time performance. A generalized parameter-driven objective function is then proposed and discussed to suitably define a variety of user-defined objectives. Computational results reporting the performance of the approach clearly show its value.

Crown Copyright © 2014 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Search and rescue path planning is an increasingly important problem for a variety of civilian and military domains such as homeland security and emergency management. The basic discrete SAR or optimal searcher path problem involving a stationary target is known to be NP-Hard [1]. SAR may be generally characterized through multiple dimensions and attributes including: one-sided search, in which targets are non-responsive toward the searcher's actions; two-sided search, describing target behavior diversity (cooperative, non-cooperative or anti-cooperative); stationary vs. moving target search; discrete vs. continuous time and space search (effort indivisibility/divisibility); observation model; static/dynamic as well as open- and closed-loop decision models; pursued objectives; and target and searcher multiplicity and diversity. Early work on related search problems emerges from search theory [2,3]. Search-theoretic approaches mostly relate to the effort (time spent per visit) allocation decision problem rather than path construction. Based upon a mathematical framework, efforts have increasingly been devoted to algorithmic contributions to handle more complex dynamic problem settings and variants [4,5–7]. In counterpart, many contributions on search path planning may be found in the robotics literature in the areas of robot motion planning [8] and, namely, terrain acquisition [9,10] and coverage path planning [11–13]. Robot motion planning explored search path planning, primarily providing constrained shortest path type solutions for coverage [14,15] problem instances. These studies typically examine uncertain search environment problems with limited prior domain knowledge, involving unknown sparsely distributed static targets and obstacles. Separate work on robot search algorithms is also referenced on the pursuit-evasion [16] theme, although the nature and complexity of the problem are somewhat different. Recent taxonomies and comprehensive surveys on target search problems from search theory, artificial intelligence/distributed robotic control, and pursuit-evasion problem perspectives may be found in [17,5,18–20] and [16] respectively.

Exact problem-solving methods for sequential decision search problem formulations show computational complexity that scales exponentially. For instance, dynamic programming [5,20,7,21] or tree-based search techniques [22,23] may satisfactorily work under specific constraints and conditions but ultimately face the curse of dimensionality, showing poor scalability even for moderate size problems. A MIP-based approach/formulation has recently been proposed as well to solve a related constrained pursuit-evasion problem [24]. But its problem attributes, constraints and complexity prove distinctive from target search, while the approach remains sub-optimal and problem-solving is limited to small size problems. This paved the way to the development of efficient heuristic and approximate methods. Some early approaches simply reduce computational complexity by relaxing some hard constraints to keep the problem manageable. Methods inspired from search theory propose procedures mainly based on branch and bound [21,7] or path-finding A*-type techniques and variants. Despite the development of many heuristics and approximate problem-solving techniques for the SAR problem [5,20], published procedures still deliver approximate solutions and mostly fail to provably estimate the real optimality gap for practical size problems, questioning their real expected relative efficiency.

In this paper, we propose a new exact mixed-integer linear programming formulation to optimally solve the multi-agent discrete search path planning problem for a stationary object. In the problem model, 'open-loop with anticipated feedback' refers to offline planning that captures information resulting from predicted agent observations (projected cell visit action outcomes) as opposed to real feedback. Anticipated feedback augments pure open-loop formulations, which simply ignore information feedback, while significantly improving solution quality and mitigating the computational complexity limitations traditionally associated with closed-loop problem formulations (e.g. dynamic programming, and partially observable Markov decision processes). This contribution aims at extending a single agent search path planning decision model [25] to a multi-agent environment in which feasible agent actions are further expanded to any possible neighboring move direction, while capturing anticipated feedback information resulting from possible observation outcomes occurring from projected path execution. In that setting, the open-loop with anticipated feedback information (observations) decision model involves n agents (searchers) with imperfect sensing capability (but free of false-positive observations) searching an area (grid) to maximize the cumulative probability of success in detecting a target, given a time horizon and a prior cell occupancy probability distribution. The model takes advantage of anticipated feedback information resulting from observation outcomes along the path to update target occupancy beliefs and make better decisions. A network flow representation significantly reduces modeling complexity (e.g. constraint specification) as well as implementation and computational costs. The new decision model relies on an abstract network representation, coupled to a parallel computing capability (e.g. using the CPLEX solver [26]) to gain additional speed-up. The novelty lies in a new exact linear model and the fast computation of near optimal solutions of practical size problems, providing a tight upper bound on solution quality through Lagrangean programming relaxation. The computable upper bound constitutes an objective measure to fairly estimate and compare the performance gap against various techniques. Computational results prove the proposed approach very efficient. The small computational run-time naturally enables extension of the open-loop model (with anticipated feedback) to a closed-loop formulation in which action outcomes from the previous episode may be explicitly incorporated in real time to update the target occupancy belief distribution. As a result, an updated solution can be dynamically computed by periodically solving new problem instances, taking advantage of feedback information (from real observation outcomes) over short rolling horizons. The idea is to readily exploit episodic feedback information whenever available. In that case, the associated computational run-time corresponds to the time required to visit a cell. This way of embracing constructive dynamic planning in real time through inexpensive computational effort is largely preferable to dynamic programming techniques aimed at computing an exhaustive optimal policy, mapping suitable actions to any possible posterior state at a prohibitive computational cost. The proposed approach rather determines the best sequence of moves given the current state, updating the path solution resulting from partial path execution by repeatedly solving a new problem instance characterizing the follow-on state. Similarly, large time horizon problems can be solved efficiently, optimizing multiple problem instances over receding horizons.

The paper is organized as follows. Section 2 first introduces the problem definition, describing the main characteristics of the open-loop search path planning problem with anticipated feedback. Then the main solution concept for the problem is presented in Section 3. It describes a new mixed-integer linear programming network flow formulation combined with a network representation to efficiently compute a near-optimal solution. The proposed CPLEX-based problem-solving technique and some implementation issues are then briefly reported in Section 4. Section 5 reports and discusses computational results depicting the value of the proposed method. Finally, a conclusion is given in Section 6.

2. Problem

2.1. General description

The discrete centralized search and rescue path planning problem involves a team of n homogeneous stand-off sensor agents searching for a stationary target in a bounded environment over a given time horizon. From a search and rescue mission perspective, the goal consists in maximizing the cumulative probability of success in detecting a target within a given region. Represented through a grid, the search region characterizes an area defined as a set of cells N, describing possible target locations. Presumably occupying a single cell, the precise location of the target is assumed unknown. A prior target location probability density distribution, for which cell occupancy probabilities sum up to one, can be derived from domain knowledge. It reflects possible individual cell occupancy, defining a grid cognitive map or uncertainty grid. Should the target be located outside the search areas of interest, a special inaccessible and invisible virtual cell would simply be added to the basic problem description to preserve the sum-of-probability property. The cognitive map constitutes a knowledge base describing a particular world state, including variables such as target occupancy belief distribution, time, agent position and orientation. An example of a cognitive map at a specific point in time is illustrated in Fig. 1.

The duration of a cell visit or service time is assumed constant, specifying the period of each episode. Vehicles are assumed to visit different cell locations at the same time, and to fly at slightly different altitudes to avoid colliding with one another. A search path solution consists in constructing an agent path plan selecting base-level control actions to maximize target detection.

2.2. Agent path planning

Episodic agent search path planning decisions are based on the agent's position (cell location), specific orientation {N, S, E, W, NE, SE, SW, NW} and speed, determining possible legal moves to adjacent cell locations. For example, the 3-move agent investigated in [25] is limited to three possible moving directions with respect to its current heading, namely, ahead, right or left, as depicted in Fig. 2. In this work, agent movement or maneuvering capability is generalized to all degrees of freedom, permitting free motion along any possible direction to explore its neighborhood. An agent can therefore legally move toward its neighboring cells, offering eight alternate possible directions at each time step. This additional capability expands an agent path solution space by a factor (8/3)^T over a 3-move planning agent for a given time horizon T, significantly increasing computational complexity.

The primary goal consists in planning base-level control action moves to maximize the probability of success (target detection) over the entire grid.

2.3. Cumulative probability of success

In the proposed open-loop SAR model, the probability to successfully detect the target resulting from n agent path solution executions on the grid is defined as the sum over cells of the product of the probability of detection reflected from cell visits and the target cell occupancy belief dictated by the cognitive map (grid) [5,27,28]. The cumulative probability of success (CPOS) for team path solutions (sequences of cell visits) represents the probability that detection occurs for the first time over one of the time intervals defining horizon T. It relates the probability of first detection to the binary observation outcome z_{ct} (1: positive, 0: negative) from a visit of cell c over time interval t, the target cell c occupancy state X_c (1: positive, 0: negative) and the past observation outcomes (history) Z_{t-1} up to the end of interval t-1:

CPOS = \sum_{c \in N} \sum_{t} p(z_{ct} = 1, X_c = 1, Z_{t-1} = 0)    (1)

where Z_{t-1} = 0 corresponds to a negative observation outcome history {z_{c(0)0} = 0, z_{c(1)1} = 0, ..., z_{c(t-1)t-1} = 0} before time interval t, meaning that the target has not been observed so far.

Exploiting the conditional probability property (p(A, B) = p(A|B) p(B)) and the fact that the current cell visit observation outcome given target occupancy is independent of past observation outcomes, Eq. (1) is further developed, leading to the expression

CPOS = \sum_{c \in N} \sum_{t} p(z_{ct} = 1 | X_c = 1) \, p(X_c = 1 | Z_{t-1} = 0) \, p(Z_{t-1} = 0)    (2)

Using Bayes' theorem, the posterior probability/belief of target cell c occupancy given past negative observation outcomes is given by

p(X_c = 1 | Z_{t-1} = 0) = \frac{p(Z_{t-1} = 0 | X_c = 1) \, p(X_c = 1)}{p(Z_{t-1} = 0)}    (3)

By substituting expression (3) in (2), CPOS can be revisited as follows:

CPOS = \sum_{c \in N} \sum_{t} p(z_{ct} = 1 | X_c = 1) \, p(Z_{t-1} = 0) \, \frac{p(Z_{t-1} = 0 | X_c = 1) \, p(X_c = 1)}{p(Z_{t-1} = 0)}
     = \sum_{c \in N} \sum_{t} p(z_{ct} = 1 | X_c = 1) \, p(Z_{t-1} = 0 | X_c = 1) \, p(X_c = 1)    (4)

CPOS can then be finally expressed in a more convenient form as:

CPOS = \sum_{c \in N} \sum_{t} p_{cc} \, p_{ct} = \sum_{c \in N} \sum_{t} pos_{ct}    (5)

where pos_{ct} represents the probability of successfully detecting the target for the first time over period t during a visit in cell c. p_{ct} refers to the 'non-normalized' posterior probability/belief of cell c target occupancy during time interval t, which incorporates the 'anticipated' information feedback that would result from past visits, as derived from (3) and (4). As for p_{cc}, it is the probability, on a specific visit of cell c, to correctly detect the target given that the target is present in cell c (p(z_{ct} = 1 | X_c = 1)); p_{cc} depends on cell c. It should be emphasized that the CPOS definition, referring to first target detection, requires the assumption of no 'false positive' detection from sensors (i.e. p(z_{ct} = 1 | X_c = 0) = 0) to make sense, otherwise one could not claim that the target has been found for sure. Accordingly, an agent sensor is assumed to be false-positive free, meaning that a vacant cell visit always results in a negative observation outcome by the sensing agent. Conversely, based on this assumption, a positive observation confirms that the target has been found and that the search task may be interrupted. This condition does not however preclude the occurrence of false negative outcomes ('miss'), as agent sensors are not perfect. In the current setting, the sensor range defining visibility or footprint (coverage of observable cells given the current sensor position) is limited to the cell being searched.
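To make Eqs. (5) and (6) concrete, the following sketch (not the authors' code) evaluates CPOS for a fixed multi-agent visit schedule on a toy grid; the grid size, detection probability and schedule below are illustrative assumptions only.

    # Illustrative sketch: evaluating CPOS (Eq. (5)) for a fixed multi-agent visit
    # schedule, assuming false-positive-free sensors with constant detection probability.

    def cumulative_pos(prior, p_detect, visit_schedule):
        """prior: dict cell -> initial occupancy belief p_c0 (sums to at most one)
        p_detect: dict cell -> p_cc, probability of detection given target presence
        visit_schedule: list over time intervals; each entry is the set of cells
        visited by the team during that interval (at most one agent per cell)."""
        belief = dict(prior)              # 'non-normalized' beliefs p_ct
        cpos = 0.0
        for visited in visit_schedule:
            for c in visited:
                pos_ct = p_detect[c] * belief[c]     # pos_ct = p_cc * p_ct
                cpos += pos_ct
                belief[c] *= (1.0 - p_detect[c])     # anticipated-feedback update, Eq. (6)
        return cpos

    # Toy 2x2 grid with assumed numbers: two agents, two episodes.
    prior = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}
    p_detect = {c: 0.8 for c in prior}
    schedule = [{(0, 0), (0, 1)}, {(1, 0), (0, 0)}]
    print(round(cumulative_pos(prior, p_detect, schedule), 4))   # 0.784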

3. Mixed-integer linear programming model formulation

3.1. Network representation

A network representation is used to simplify modeling and constraint specification as well as problem-solving, as it eliminates the need to explicitly capture all constraints. These include maximum path length or deadline, admissible/legal moves, and disconnected subtour elimination, which may significantly impact run-time when handled explicitly.

Let G_k = (V_k, A_k) be the grid network, a directed acyclic graph associated with agent k ∈ η = {1, ..., n}, where V_k = ∪_{t∈T} V_kt is the set of vertices associated to agent states (i.e. position and orientation state variables during a given episode t ∈ T = {0, 1, 2, ..., |T|-1}), and A_k is the set of arcs (i, j) with i, j ∈ V_k, reflecting possible agent state transitions between consecutive episodes over the grid, corresponding to a legal move m selected from the action set A = {left, ahead, right}. N_kt = N is the set of possible cell locations {1, ..., |N|} over the grid during episode t, whereas O_kt = O refers to the set of possible agent orientations/headings {E, NE, N, NW, W, SW, S, SE} during episode t.

Fig. 1. Uncertainty grid/cognitive map at time step t. The 4-agent team beliefs are displayed through multi-level shaded cell areas. Projected agent plans are represented as possible paths.

Fig. 2. Agent's region of interest displayed as a forward move projection span (possible paths), for a 3-move agent over a 3-step time horizon.


As a result, V_k = ∪_{t∈T} V_kt = ∪_{t∈T} (N_kt × O_kt). The nodes o and d are additional fictitious origin and destination location vertices defining legal path ends in the graph. An excerpt from the abstracted representation of the agent network over two consecutive episodes is given in Fig. 3. An integer binary flow decision variable x_ijk is associated to each arc (i, j) ∈ A_k. The agent k path solution includes the arcs (i, j) ∈ A_k for which x_ijk = 1. Given an initial agent state i_0(k), a path may be defined over the grid network by traveling along arcs connecting o to d, instantiating flow decision variables to build feasible paths and then, consequently, assigning the visit decision variables involved in the objective function. Agent state vertex duplication over |T| episodes is aimed at eliminating disjoint solution subtours, otherwise difficult to handle explicitly, and provides a directed acyclic graph to represent a legal solution through binary integer flow decision variables, including a multi-cycle path (possible occurrence of many visits on the same cell). Duplication implicitly satisfies the path length constraint as well. The significant gain obtained through duplication clearly exceeds the cost incurred by slightly degraded model readability due to the utilization of more complex notations. The agent network includes |O| |N| |T| nodes and |O| |N| |T| |A| arcs. It is assumed that a cell c can be visited at most V_c times.

3.2. Mathematical modeling

A mathematical mixed-integer linear programming (MIP) formulation is proposed for the discrete stationary target search and rescue (SAR) path planning problem. It extends the single agent model [25] to a multi-agent setting while incorporating any possible agent action move.

The open-loop decision model explicitly captures, ahead of time, the anticipated information feedback resulting from projected action execution to update the target cell occupancy probability (belief). Accordingly, based on the completion of a projected visit in cell c during time interval t, the posterior probability of cell containment p_{c', t+1} for any cell c' is related to its prior belief p_{c't} by:

p_{c', t+1} = (1 - p_{cc} \, \delta_{cc'}) \, p_{c't}    (6)

where δ_{cc'} = 1 if c' = c and 0 otherwise. p_{c't} refers to the probability/belief of cell c' target occupancy during time interval t, which incorporates the 'anticipated' information feedback that would result from past visits. Eq. (6) derives from (4) and (5) while exploiting the conditional independence property in computing p_{c', t+1}.
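Repeatedly applying Eq. (6) to a cell visited l times gives the closed form exploited later in constraint (10), with β_c = 1/(1 - p_{cc}) as defined below:

p_{ct} = (1 - p_{cc})^{l} \, p_{c0} = \frac{p_{c0}}{\beta_c^{\,l}}

so that the anticipated belief depends only on the initial belief p_{c0} and the cumulative number of projected visits l.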

The variables and parameters defining the decision model are given as follows:

η: set of homogeneous agents {1, 2, ..., n}
N: set of cells defining the grid search area {1, 2, ..., |N|}
T: set of time intervals defining the time horizon {0, 1, ..., |T|-1}
V_c: maximum number of visits on cell c
p_cc: conditional probability of 'correct' target detection on a visit in cell c given that the target is located in c
β_c: 1/(1 - p_cc)
p_ct: 'non-normalized' belief of cell c target occupancy during time interval t; {p_c0} refers to the initial belief distribution of target occupancy over the grid
pos_ct: probability of success (finding the target) resulting from the observation of cell c at the end of time interval t
CPOS: objective function defining the cumulative probability of success
v_clt: binary decision variable corresponding to a cumulative number of visits l on cell c at the end of time interval t – v_clt = 1 (otherwise 0)
y_ct: binary decision variable reflecting agent position in episode t. It indicates that cell c is visited during time interval t – y_ct = 1 (otherwise 0)
x_ijk: state transition binary variable. x_ijk = 1 reflects agent k network state transition from state i to j between consecutive episodes. The agent k path solution includes the arcs (i, j) ∈ A_k for which x_ijk = 1

The MIP decision model may be formulated as follows:

\max_{\{pos_{ct}\}} \; CPOS = \sum_{c \in N} \sum_{t \in T} pos_{ct}    (7)

subject to the linear convex constraint set:

Cell visits:

\sum_{0 \le l \le V_c} v_{clt} = 1 \quad \forall c \in N, \; \forall t \in T    (8)

\sum_{0 \le l \le V_c} l \, v_{clt} = \sum_{0 \le t' \le t} y_{ct'} \quad \forall c \in N, \; \forall t \in T    (9)

Belief update:

p_{c,t+1} = \sum_{0 \le l \le V_c} \frac{p_{c0}}{\beta_c^{\,l}} \, v_{clt} \quad \forall c \in N, \; \forall t \in T    (10)

Probability of success:

pos_{ct} - p_{cc} \, p_{ct} \le M (1 - y_{ct}) \quad \forall c \in N, \; \forall t \in T, \; M > 1    (11)

pos_{ct} \le y_{ct} \quad \forall c \in N, \; \forall t \in T    (12)

Initial probability:

p_{c0} = p_c(t = 0) \quad \forall c \in N    (13)

Network coupling:

y_{ct} = \sum_{k \in \eta} \; \sum_{i_t(c) \in V_k} \; \sum_{j_{t+1} \in V_k} x_{i_t(c) \, j_{t+1} \, k} \quad c \in N, \; t \in T, \; (i_t(c), j_{t+1}) \in A_k    (14)

Fig. 3. Agent grid network (directed acyclic graph) excerpt, over consecutive episodes t and t+1, for a 3×3-cell grid. Nodes depict agent state (position, orientation, episode) whereas arcs capture node transitions between episodes defined by possible legal moves. Squares refer to grid cells enclosing 8 possible agent orientations. A |T|-move path may be constructed by moving along arcs from stage 0 to stage |T|-1.


Initial agent position:

x_{o \, i_0(k) \, k} = 1 \quad \forall k \in \eta, \; i_0(k) \in V_k    (15)

y_{c0} = \sum_{k \in \eta} \delta_{c \, y_0(k)} \quad \forall c \in N, \; \forall k \in \eta    (16)

Initial/final path condition:

\sum_{i \in V_k} x_{oik} = 1 \quad \forall k \in \eta    (17)

\sum_{i \in V_k} x_{idk} = 1 \quad \forall k \in \eta    (18)

Flow conservation:

\sum_{i \in V_k \cup \{o\}} x_{ijk} - \sum_{i \in V_k \cup \{d\}} x_{jik} = 0 \quad \forall k \in \eta, \; \forall j \in V_k, \; (i, j) \in A_k    (19)

Maximum path length:

\sum_{i \in V_k} \; \sum_{j \in V_k \setminus \{i\}} x_{ijk} = |T| \quad \forall k \in \eta, \; (i, j) \in A_k    (20)

Decision variables:

pos_{ct}, \, p_{ct} \in [0, 1]; \quad y_{ct}, \, v_{clt} \in \{0, 1\}, \; c \in N, \; t \in T, \; \forall l \in \{0, \ldots, V_c\}; \quad x_{ijk} \in \{0, 1\} \; \forall k \in \eta, \; (i, j) \in A_k    (21)

The objective function shown in Eq. (7) defines the cumulative probability of success over the agent path solution and time horizon |T|. Constraints are governed through Eqs. (8)–(21). For a given path solution, constraints (8) represent the cumulative number of visits paid on site c by the end of time interval t. Constraints (9) simply link that number to the visits paid on c so far. It should be noticed that simultaneous visits by multiple agents on a specific cell over a given time interval are implicitly prevented, enforced by the fact that y_ct ≤ 1, limiting to at most one the number of visits a cell can receive during an episode. For cell coverage purposes, we assume a maximum number of visits V_c to be performed on site c. The bound V_c can be pre-computed or selected arbitrarily large. The target occupancy probability update is governed by constraint set (10); it is the explicit form of Eq. (6), relating belief and the number of conducted visits. Constraint sets (11) and (12) determine the probability of success contributions. Both inequalities mutually reflect the requirement that a cell be visited to ensure a feasible observation and an admissible success contribution aligned with the objective function. M is a constant. The initial probability distribution is specified in (13). Constraint sets (14)–(20) reflect model and network coupling as well as the flow constraints imposed on/by the agent network. Constraints (14) link cell visits to the agent path network, connecting outgoing arcs from network nodes (states) on stage t to the cell c being visited during episode t. Accordingly, arcs (i_t(c), j_{t+1}) relate to any agent state transition starting from position c at stage t. Agent k's initial state i_0(k) and position y_0(k), as well as the related network connection, are captured in constraints (15)–(16). Constraints (17) and (18) guarantee that the path solution departure and final arrival points are uniquely defined. Flow conservation, governed by constraints (19), aims at balancing the number of incoming and outgoing arcs for a given node. Constraints (20) guarantee a |T|-move path solution for an agent, but turn out to be unnecessary as these conditions are implicitly satisfied by the agent network construction. Binary and continuous domain variables are then defined in (21).
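For illustration only (and not the authors' implementation, which uses CPLEX directly), the visit-counting and probability-of-success constraints (8)–(13) can be stated with a generic MIP modeling layer such as PuLP on a toy instance; all instance data below are assumptions, and the network-coupling constraints (14)–(20) are omitted.

    # Hedged sketch of constraints (8)-(13) for a toy instance, using PuLP
    # as a stand-in modeling layer (the paper itself uses CPLEX).
    import pulp

    N = range(4)                 # cells of a toy 2x2 grid, flattened
    T = range(3)                 # time intervals 0..|T|-1
    Vc = 2                       # assumed maximum number of visits per cell
    p0 = [0.4, 0.3, 0.2, 0.1]    # assumed prior beliefs p_c0
    pcc = 0.8                    # assumed detection probability p_cc
    M = 2.0                      # big-M constant (> 1)

    m = pulp.LpProblem("sar_fragment", pulp.LpMaximize)
    v = pulp.LpVariable.dicts("v", (N, range(Vc + 1), T), cat="Binary")
    y = pulp.LpVariable.dicts("y", (N, T), cat="Binary")
    pos = pulp.LpVariable.dicts("pos", (N, T), lowBound=0, upBound=1)
    p = pulp.LpVariable.dicts("p", (N, range(len(T) + 1)), lowBound=0, upBound=1)

    m += pulp.lpSum(pos[c][t] for c in N for t in T)                     # objective (7)
    for c in N:
        m += p[c][0] == p0[c]                                            # (13)
        for t in T:
            m += pulp.lpSum(v[c][l][t] for l in range(Vc + 1)) == 1      # (8)
            m += (pulp.lpSum(l * v[c][l][t] for l in range(Vc + 1))
                  == pulp.lpSum(y[c][tp] for tp in range(t + 1)))        # (9)
            m += p[c][t + 1] == pulp.lpSum(p0[c] * (1 - pcc) ** l * v[c][l][t]
                                           for l in range(Vc + 1))       # (10)
            m += pos[c][t] - pcc * p[c][t] <= M * (1 - y[c][t])          # (11)
            m += pos[c][t] <= y[c][t]                                    # (12)
    # Constraints (14)-(20) would tie y to the flow variables x over the agent network.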

3.3. Single team network simplification

Given agent homogeneity, a single 'team' (n agent) T-stage network G = (V, A) representing possible team paths may alternatively be used, requiring minor network adjustments to concurrently incorporate agent action multiplicity subject to non-simultaneous visits on a same cell. The resort to a single team network rather than a multiple network-agent mapping provides additional speed-up, a reduction in the number of decision variables and significant computational savings (by a factor n). The resulting team directed acyclic graph G = (V, A) captures agent multiplicity by replacing the agent-indexed integer flow decision variables x_ijk with x_ij, slightly modifying some key flow constraints:

\sum_{i \in V} x_{ij} - \sum_{l=0}^{V_c} l \, v_{cl} = 0 \quad \forall c \in N, \; (i, j(c)) \in A

x_{oi} = \sum_{k \in \eta} \delta_{i \, i_0(k)} \quad \forall i \in V; \qquad \sum_{i \in V} x_{oi} = n; \qquad \sum_{i \in V} x_{id} = n

\sum_{i \in V \cup \{o\}} x_{ij} - \sum_{i \in V \cup \{d\}} x_{ji} = 0 \quad \forall j \in V, \; (i, j) \in A

x_{ij} \in \{0, 1\} \quad \forall (i, j) \in A

\delta_{ij} = 1 \text{ if } i = j, \; 0 \text{ otherwise}

The expected computational gain comes at the low cost of reconstructing individual agent paths from the computed agent-free decision variables of the team network solution. The agent path reconstruction procedure is described next.

3.3.1. Agent path reconstruction

A particular agent path is reconstructed using the team network and its instantiated integer flow decision variables x_uv. A legal T-move agent k path is simply generated by moving along the computed team solution arcs from its departure state node i_0(k) (combining initial cell and orientation) in stage 1, adding the related cell to the evolving path, up to stage T, before finally converging to the destination node d. Decision variables are progressively decremented as the path expands. The agent path reconstruction algorithm is straightforward and fast (O(nT)), as summarized below:

For k = 1..n do                                      -- cycle over agents
    u = i_0(k); path_k = ∅; t = 1
    While (t ≤ T)
        select a state transition (u, v) ∈ A such that x_uv > 0
        path_k.cell(t) = cell_u
        t = t + 1
        x_uv = x_uv - 1; u = v
    end While                                        -- T-move path construction
end For                                              -- agent k path solution

The path solution path_k in the above procedure is composed of a sequence of T cell visits. The path element path_k.cell(t) refers to the specific cell (cell_u) visited by agent k in period t.
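A direct transcription of this procedure is sketched below; the flow dictionary, adjacency structure and cell mapping are illustrative assumptions about how the team-network solution might be stored.

    # Sketch of the O(nT) agent path reconstruction from the team-network flows.
    def reconstruct_paths(x, arcs_out, cell_of, start_nodes, T):
        """x: dict arc (u, v) -> integer flow value from the team solution
        arcs_out: dict node u -> list of successor nodes v
        cell_of: dict node -> grid cell it encloses
        start_nodes: list of initial state nodes i0(k), one per agent
        T: number of moves per agent path"""
        flow = dict(x)                       # work on a copy; flows are decremented
        paths = []
        for u in start_nodes:                # cycle over agents
            path = []
            for _ in range(T):
                # pick any outgoing arc still carrying flow
                v = next(w for w in arcs_out[u] if flow.get((u, w), 0) > 0)
                path.append(cell_of[u])      # record the cell visited at this step
                flow[(u, v)] -= 1
                u = v                        # T-move path construction
            paths.append(path)
        return paths

Decrementing a shared copy of the flows ensures that an arc is not reused by more agents than its computed flow allows.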

3.4. Dynamic planning and time horizon

Dynamic problem solutions can be computed constructively over receding horizons by repeatedly exploiting real information feedback as it becomes available and launching a new optimization to progressively improve solution quality. Aside from the explicit inclusion of real information feedback, large time horizon problems are similarly solved through repeated fast subproblem optimizations over receding horizons, as pictured in Fig. 4. The time horizon is divided into time intervals and the corresponding subproblems are sequentially solved over respective episodes of period ΔT. Accordingly, a subproblem solution periodically expands the overall current partial path solution, progressively incorporating a small fraction of its solution moves (subperiod δT), while updating the objective function with new path contributions. Limited move insertions define overlapping episodes, mitigating the effects of myopic path planning. A new subproblem is then periodically solved subject to the revisited objective function updated from the previous episode, accounting for the partial solution being progressively built. The process is then reiterated until the time horizon has been covered. The strategy consists in taking advantage of the fast computation of reasonable time horizon subproblems over a limited number of episodes to quickly compute a near optimal solution to the original problem.

It should be mentioned that the approach would be suitable if and only if the planning time horizon (in general) or the period ΔT (receding horizon) is larger than or equal to √N, the dimension of the grid. This condition allows total cell belief visibility over the entire grid to permit optimal planning over a given planning time horizon (the agent always perceives the whole grid). Despite this condition, though, when the problem time horizon exceeds the planning time horizon an optimal solution is not guaranteed, as local optimizations myopically carried out over limited periods ΔT may still slightly degrade the real optimal path solution. However, the execution of that path solution would anyway be very limited in practice, since intermediate observed outcomes would invalidate the solution and likely call for path re-planning well before the time horizon deadline. The proposed near optimal approach over receding horizons nonetheless remains simple and easy to operationalize in practice when large problem time horizons must be considered.
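The receding-horizon strategy of Fig. 4 can be summarized by the following loop; solve_subproblem and apply_moves are placeholders standing in for one MIP optimization over a ΔT window and for the belief/state update, and the window lengths are assumptions chosen by the user.

    # Sketch of the receding-horizon strategy (Fig. 4). solve_subproblem() is a
    # placeholder for one MIP optimization over a window of delta_T episodes;
    # only the first small_dt moves of each subproblem solution are committed.
    def receding_horizon_plan(belief, agent_states, T_total, delta_T, small_dt,
                              solve_subproblem, apply_moves):
        committed = []                                  # overall path solution
        t = 0
        while t < T_total:
            window = min(delta_T, T_total - t)
            plan = solve_subproblem(belief, agent_states, window)
            head = plan[:small_dt]                      # overlapping episodes
            committed.extend(head)
            # update beliefs and agent states with the committed moves
            # (real observation outcomes would be injected here in a closed loop)
            belief, agent_states = apply_moves(belief, agent_states, head)
            t += small_dt
        return committed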

3.5. Discussion

The proposed formulation confers many advantages over alternate modeling procedures, as the linear model allows one to efficiently compute a bound on the optimal solution quality through Lagrangean programming relaxation. This provides a comparative measure to carry out performance gap analysis over alternate solutions, as well as the ability to trade off solution quality and run-time for heuristic methods operating under tight temporal constraints. Problem-solving may be naturally achieved using well-known efficient techniques from the IBM CPLEX software package [26].

In other respects, objective function (7), as advocated in [27,28,5], is quite legitimate in principle to reflect an acceptable measure of performance for target detection. However, a naive utilization may nonetheless lead to undesirable situations, raising some questionable legitimacy concerns in practice for search-and-rescue domains. In effect, Eq. (7) fails to discriminate different solutions demonstrating either a similar sum of success contributions, or an objective-invariant property over some feasible move permutations. This is partly due to the invariance property of the objective function against cell visit ordering. Solution symmetry may naturally occur for paths presenting multiple cell visits (cycles) or subpath proximity in which directions are reversible or specific cell visits are interchangeable. Assuming a constant p_cc, a trivial example is the circular path solution where an agent achieves a round-way trip, performing single visits (e.g. c_1: p_1 = 0.2, c_2: p_2 = 0.8). In that case, the agent trajectory may be clockwise (e.g. 0.8, 0.2) or counter-clockwise (e.g. 0.2, 0.8). Based on (7), both solutions show the same quality, when in fact one of them (e.g. 0.8, 0.2) might be clearly preferable in the context of a search-and-rescue mission. As a result, the clockwise sequence of visits starting with the higher belief (0.8, 0.2) might suitably lead to earlier detection and thereby improve the chance of target survival over the so-called 'equivalent' counter-clockwise path plan. Therefore, a more general objective function might be more appropriate to suit particular needs in further discriminating solutions (tie-breaking), such as optimizing cumulative probability of success, time-weighted cumulative probability of success or expected target detection time. In that respect, a generalized parameter-driven objective function to suitably define a variety of user-defined objectives is proposed (|T| > 1):

\max_{\{pos_{ct}\}} \; \left(1 + \frac{\gamma}{|T| - 1}\right) \sum_{c \in N} \sum_{t \in T} \left(1 - \frac{\gamma \, t}{|T|}\right) pos_{ct}    (22)

subject to the same constraint sets (8)–(10), (12)–(21), except for inequality (11), which is revised as follows:

(1 + \gamma)(pos_{ct} - p_{cc} \, p_{ct}) \le M (1 - y_{ct}) \quad \forall c \in N, \; \forall t \in T, \; M > |T|

The latter formulation is necessary to support both minimization and maximization problems. The discount parameter γ ∈ [0, 1] ∪ {-|T|} in (22) tends to reduce probability of success objective contributions over time. It biases the time-weighted objective definition toward specific problem dimensions. When γ = 0, the generalized function mimics the cumulative probability of success objective introduced in (7), while γ = ε (e.g. 0.01) proposes a slightly time-weighted variant of the probability of success contributions to ultimately discriminate CPOS-based solutions with identical visits (but different ordering), maximizing target detection earlier. The latter form corresponds to the dominant CPOS objective, modulated by average CPOS(t) values over intermediate time periods t. It provides a tie-breaking mechanism modifying the basic objective function to reduce the impact of the original CPOS objective function multimodality and path solution symmetry. Alternatively, γ = -|T| specifies an expected detection time minimization problem. When |T| = 1, the solutions are virtually equivalent for all the aforementioned objectives.
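The tie-breaking effect of γ in (22) on the clockwise/counter-clockwise example above can be checked numerically; the detection probability and the two-cell beliefs below are the illustrative values used in the text, not experimental settings.

    # Effect of the discount parameter gamma in objective (22) on the
    # two-cell round-trip example (beliefs 0.8 and 0.2, constant p_cc).
    def weighted_objective(pos_by_t, gamma, T):
        """pos_by_t: probability-of-success contribution of each time interval."""
        scale = 1 + gamma / (T - 1)
        return scale * sum((1 - gamma * t / T) * pos for t, pos in enumerate(pos_by_t))

    pcc, T = 0.5, 2
    clockwise = [pcc * 0.8, pcc * 0.2]    # visit the 0.8-belief cell first
    counter = [pcc * 0.2, pcc * 0.8]      # visit the 0.2-belief cell first
    for gamma in (0.0, 0.01):
        print(gamma, weighted_objective(clockwise, gamma, T),
              weighted_objective(counter, gamma, T))
    # gamma = 0 gives identical scores; a small gamma > 0 favours the plan
    # that visits the higher-belief cell first, i.e. earlier expected detection.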

4. MIP algorithm – CPLEX solver

The IBM ILOG CPLEX parallel Optimizer version 12.2.0.0 [26] was used, essentially exploiting various optimized problem-solving techniques for large size problems. CPLEX solves the (exact) mixed integer programming (MIP) problem model, implicitly computing an upper bound on solution quality through integrality constraint relaxation, referred to as Lagrangean programming relaxation (LP).

Additional speed-up can be contemplated for implementation efficiency purposes. Accordingly, simplifications involve a further reduction of the number of decision variables and constraints. This includes the suppression of the belief update p_ct (by virtue of its explicit form described by Eq. (10)) and probability of success pos_ct variables, and their respective related constraints, from the model. It consists in substituting the content of those variables directly in the revisited objective function through the introduction of a set of new binary integer variables w_clt (and related constraints), expressed as the product of two binary integer variables, namely, the cumulative number of visits variable v_clt and the agent position variable y_ct:

Fig. 4. A large time horizon T is defined over T/δT receding horizons of period ΔT. Moves computed in subperiods δT form the final path solution to the original problem.


\max_{\{w_{clt}\}} \; \sum_{c \in N} \sum_{t \in T} \sum_{0 \le l \le V_c} \underbrace{\frac{p_{cc} \, p_{c0}}{\beta_c^{\,l-1}} \, w_{clt}}_{pos_{ct}}

with w_{clt} = v_{clt} \, y_{ct}, linearized through

w_{clt} \le v_{clt}
w_{clt} \le y_{ct}
w_{clt} \ge v_{clt} + y_{ct} - 1

However, as the LP integrality constraint relaxation on the new variables during problem-solving tends to violate the intended quadratic relationship, initially deteriorating solution quality by increasing both the optimality gap and the run-time, the constraints on the new variables have rather been specified as logical constraints, a feature option offered by the CPLEX solver. As a result, the approach significantly reduced the search space during the branching process of the algorithm, reporting an order of magnitude gain in run-time.
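The linearization of the product w_clt = v_clt · y_ct quoted above is the standard one for a product of two binaries; for illustration it can be written with the same toy modeling layer used earlier (the paper itself posts these relations as CPLEX logical constraints instead).

    # Standard linearization of w = v * y for binary v, y (the w_clt variables above);
    # shown with PuLP for illustration only.
    import pulp

    m = pulp.LpProblem("product_linearization", pulp.LpMaximize)
    v = pulp.LpVariable("v", cat="Binary")
    y = pulp.LpVariable("y", cat="Binary")
    w = pulp.LpVariable("w", cat="Binary")
    m += w <= v            # w can be 1 only if v is 1
    m += w <= y            # w can be 1 only if y is 1
    m += w >= v + y - 1    # w must be 1 when both v and y are 1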

5. Computational experiment

A computational experiment has been conducted to test the approach on a variety of scenarios. The value of the proposed MIP approach is assessed in terms of optimality gap and run-time. Computed solutions are reported against the relative target probability detection optimality gap shown at the end of horizon |T|:

Opt \; gap = \frac{CPOS^{*} - CPOS_{a}}{CPOS^{*}}    (23)

where CPOS* is the optimal cumulative probability of success defined in (1), or a tight upper bound (LP solution), and CPOS_a is the performance of our approach for a given scenario. The smaller the optimality gap, the better the performance.

5.1. Simulations

Computer simulations were conducted under the following conditions:

• Prior cell occupancy belief distribution for grid size N: exponential, uniform, cluster; N = 10×10
• Homogeneous sensor agents:
  Actions: 8 moves
  V_c = 5 for all cells c
  Sensor parameters: p_cc = 0.8 for all cells
• Hardware platform:
  Intel(R) Xeon(R) CPU X5670
  Shared-memory multi-processing: 8 processors, 2.93 GHz
  Random Access Memory: 16 GB, 64-bit binary representation (double precision)

It should be noted that, as target cell occupancy probabilities sum up to one, performance analysis for large grids turns out to be less attractive. Accordingly, the larger the grid in general, the smaller (arbitrarily negligible) the related target cell occupancy beliefs, inevitably leading either to significant visit payoffs for a limited number of prominently noticeable cells sparsely distributed over a large area, or alternatively to nearly similar cell visit rewards, for which any sub-optimal algorithm would likely demonstrate highly competitive (near similar) performance behavior. In both cases, this would result in a large and costly fraction of the total effort and time being dedicated to the planning and construction of long and unimportant subpath segments, leading ultimately to marginal or insignificant gains. Consequently, grid instances larger than 10×10 should be further downsized and aggregated to embrace minimal belief coverage, to ensure substantial analysis and solution performance evaluation. This is why this study limited its investigation to the exploration of 10×10 grid instances.

5.2. Results

A sample of random simulation results is reported in Table 1 for a few 10×10 grid, 8-move multi-agent scenarios over horizon T. Each entry corresponds to a separate problem instance. The subscript 'CL' on an instance identifier refers to a clustered belief distribution. Team size (number of agents) and time horizon are specified in the second and third columns respectively. Performance in terms of cumulative probability of success (CPOS) and optimality gap for the optimal CPLEX solver – MIP is reported in the fourth column. Run-time, expressed in seconds, is shown in the last column.

Computational results show that an optimal solution is computable in approximately a minute of run-time, except for instance C (5 agents), where 142.7 s were necessary, against 28 s to get less than a 1% gap. Solutions reported for the 5-agent instances F–I labeled by the * symbol are computed much faster using the static model [29], in which the maximum belief coverage \sum_{c \in N} p_{c0} \left(1 - \sum_{l=0}^{V_c} (1 - p_{cc})^{l} \, v_{cl|T|}\right) obtained in searching over the grid reaches more than 95%, meaning that the most promising cells have already been covered and that the best solutions from both decision models are nearly similar to one another, making extensive path solution computation unnecessary for the proposed decision model (no expected gain). Complete computation with the decision model nonetheless shows a 0% gap for those instances.

Results surprisingly indicate that 8-move near optimal multi-agent solutions may generally be computed on a timescale of seconds. It is interesting to note a run-time ratio of roughly an order of magnitude (approximately 10) between the 5-agent and 2-agent problem instances, despite their relative solution space size, which is exponential (8^{nT}/8^{n'T'} ≈ 10^9).

Table 1
Performance of the CPLEX Solver (MIP) for a sample of 8-move, 2- and 5-agent data sets (10×10 grid).

Instance   # Agents   Time horizon |T|   CPOS     Opt gap %   CPLEX solver run-time (s)
A_CL       2          20                 0.3623   0           3.2
           5          10                 0.7582   0           35.9
A          2          20                 0.3630   0           4.9
           5          10                 0.7523   0           42.9
B          2          20                 0.4021   0           6.1
           5          10                 0.7650   0           58.1
C          2          20                 0.3692   0           2.4
           5          10                 0.7473   0 (1)       142.7 (28.8)
D          2          20                 0.4021   0           6.3
           5          10                 0.7651   0           57.8
E_CL       2          20                 0.5545   0           4.0
           5          10                 0.8905   0           22.0
F_CL       2          20                 0.7560   0           29.8
           5          10                 0.9780   0           10.4*
G          2          12                 0.6595   0           1.4
           5          10                 0.9468   0           2.8*
H_CL       2          12                 0.7087   0           2.4
           5          10                 0.9547   0           2.4*
I_CL       2          12                 0.8208   0           8.9
           5          10                 0.9872   0           1.6*
J          2          20                 0.3165   0           5.4
           5          10                 0.6138   0           63.3
K          2          20                 0.2521   0           4.4
           5          10                 0.5736   0           38.13


Providing the best or a near optimal solution and a measurable gain (upper bound through Lagrangean integrality constraint relaxation) for practical size problems, the approach may be repeatedly reused in dynamic settings exploiting intermediate sensor readings, given its small run-time. However, even though 5-agent scenarios involving a time horizon T larger than 12 are generally computationally prohibitive and might require several minutes to ensure convergence to solution optimality, T = 10-move planning scenarios are sufficient to dynamically build a path plan one step at a time, as the grid always remains entirely visible to the planner during planning. It should also be noticed that the reported path solutions for the 5-agent T = 10 scenarios mostly cover a significant portion of the interesting cells, as illustrated by the CPOS performance results. Upgrading computational power through faster hardware and augmented parallel processing might further extend the computable T.

6. Conclusion

An innovative mixed-integer linear programming (MIP) approach has been proposed to solve a probabilistic open-loop multi-agent search and rescue path planning problem with anticipated feedback, in which agent actions are subject to any neighboring move directions. The small computational cost naturally allows dynamic planning in a closed-loop environment setting where real information feedback resulting from past sensor agent observations is exploited to compute a revisited solution over a rolling horizon during the next cycle (episode). The novelty of the approach lies in a revisited combination of an extended problem formulation, an original network representation, and a refined problem-solving procedure based on linear programming CPLEX technology to efficiently compute near-optimal solutions for practical size problems, usually handled through heuristic methods. For the first time, an upper bound estimate on the optimal solution, naturally derived from the approach, may be used for convergence or performance comparison analysis purposes, and/or for trading off solution quality and execution time. Experimental results demonstrate the value of the proposed approach for practical size problems, proving problem-solving to be feasible in reasonable time.

Future research directions will consist in considering generalized sensor footprints and increasingly complex observation models (e.g. false positives), while extending the search to moving targets. Alternate research work will explore search problem modeling variants involving heterogeneous sensing agents.

References

[1] Trummel KE, Weisinger JR. The complexity of the optimal searcher path problem. Oper Res 1986;34(2):324–7.

[2] Benkoski S, Monticino M, Weisinger J. A survey of the search theory literature. Naval Res Logist 1991;38:469–94.

[3] Stone L. What's happened in search theory since the 1975 Lanchester Prize? Oper Res 1989;37(3).

[4] Hohzaki R, Iida K. Optimal search plan for a moving target when a search path is given. Math Jpn 1995;41(1):175–84.

[5] Lau H. Optimal search in structured environments. Sydney: University of Technology; 2007 (Ph.D. thesis).

[6] Lau H, Dissanayake G. Probabilistic search for a moving target in an indoor environment. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems; 2006. p. 3393–8.

[7] Lau H, Dissanayake G. Optimal search for multiple targets in a built environment. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems. Edmonton, Alberta, Canada; 2005. p. 228–3.

[8] Latombe J-C. Robot motion planning, ser. International series in engineering and computer science; robotics: vision, manipulation and sensors. Boston: Kluwer Academic Publishers; 1991.

[9] Choo C, Smith J, Nasrabadi N. An efficient terrain acquisition algorithm for a mobile robot. In: Proceedings of the IEEE international conference on robotics and automation. Sacramento, CA; 1991. p. 306–11.

[10] Sankaranarayanan A, Masuda I. Sensor based terrain acquisition: a new, hierarchical algorithm and a basic theory. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems. Raleigh; 1992. p. 1515–23.

[11] Svennebring J, Koenig S. Building terrain-covering ant robots: a feasibility study. Auton Robots 2004;16(3):313–32.

[12] Wong S, MacDonald B. A topological coverage algorithm for mobile robots. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems. Las Vegas; 2003. p. 1685–90.

[13] Yang S, Luo C. A neural network approach to complete coverage path planning. IEEE Trans Syst Man Cybern B Cybern 2004;34(1):718–24.

[14] Rekleitis I, New AP, Rankin ES, Choset H. Efficient boustrophedon multi-robot coverage: an algorithmic approach. Ann Math Artif Intell 2008;52:109–42.

[15] Wagner IA, Lindenbaum M, Bruckstein AM. Distributed covering by ant-robots using evaporating traces. IEEE Trans Robot Autom 1999;15(5):918–33.

[16] Chung T, Hollinger G, Isler V. Search and pursuit-evasion in mobile robotics: a survey. Auton Robots 2011;31(4):299–316.

[17] Royset JO, Sato H. Route optimization for multiple searchers. Naval Res Logist 2010;57(8):701–17.

[18] Chung TH, Burdick J. Analysis of search decision making using probabilistic search strategies. IEEE Trans Robot 2012;28(10):132–44.

[19] Jin Y, Liao Y, Minai A, Polycarpou M. Balancing search and target response in cooperative unmanned aerial vehicle (UAV) teams. IEEE Trans Syst Man Cybern Part B 2006;36(3):571–87.

[20] Hollinger GA. Search in the physical world. CMU-RI-TR-10-20 [Ph.D. thesis]. Robotics Institute, Carnegie Mellon University; 2010.

[21] Chung T. On probabilistic search decisions under searcher motion constraints. In: Workshop on algorithmic foundation of robotics VIII. Guanajuato, Mexico; 2009. p. 501–16.

[22] Hollinger GA, Singh S. GSST: anytime guaranteed search with spanning trees. Auton Robots 2010;29(1):99–118.

[23] Washburn AR. Branch and bound methods for a search problem. Naval Res Logist 1998;45:243–57.

[24] Thunberg J, Ögren P. A mixed integer linear programming approach to pursuit evasion problems with optional connectivity constraints. Auton Robots 2011;31(4):333–43.

[25] Berger J, Lo N, Noel M. Exact solution for search-and-rescue path planning. In: Proceedings of the 2nd international conference on information computer applications. Rome, Italy; February 2013.

[26] IBM ILOG CPLEX Optimizer. Available: ⟨http://www-01.ibm.com/software/integration/optimization/cplex-optimization-studio/cplex-optimizer/cplex-performance/⟩ and IBM ILOG CPLEX V12.1, 2009. Available: ⟨ftp://public.dhe.ibm.com/software/websphere/ilog/docs/optimization/cplex/ps_usrmancplex.pdf⟩.

[27] Morin M, Abi-Zeid I, Lang P, Lamontagne L, Maupin P. The optimal searcher path problem with a visibility criterion in discrete time and space. In: Proceedings of the international conference on information fusion. Seattle, USA; 2009.

[28] Morin M, Papillon AP, Laviolette F, Abi-Zeid I, Quimper CG. Constraint programming for path planning with uncertainty: solving the optimal search path problem. In: Proceedings of the 18th conference on principles and practice of constraint programming. Québec, QC, Canada; 2012. p. 988–1003.

[29] Lo N, Berger J, Noel M. Toward optimizing static target search path planning. In: Proceedings of the 5th IEEE symposium on computational intelligence for security and defence applications. Ottawa, Canada; July 2012.
