
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2011; 23:1146–1168
Published online 14 January 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.1694

A distributed evolutionary approach for multisite mapping on grids

I. De Falco, U. Scafuri and E. Tarantino∗,†

Institute of High Performance Computing and Networking, National Research Council of Italy, Via P. Castellino 111, 80131 Naples, Italy

SUMMARY

In this paper attention is concentrated on the mapping of computationally intensive multi-task applications onto shared computational grids. This problem, already known to be NP-complete in parallel systems, becomes even more arduous in such environments. To find a near-optimal mapping solution, a parallel version of a Differential Evolution algorithm is presented and evaluated on different applications and operating conditions of the grid nodes. The purpose is to select for a given application the mapping solutions that minimize the greatest among the time intervals which each node dedicates to the execution of the tasks assigned to it. The experiments, carried out with applications represented as task interaction graphs, demonstrate the ability of the evolutionary tool to perform multisite grid mapping, and show that the parallel approach is more effective than the sequential version both in enhancing the quality of the solution and in the time needed to obtain it. Copyright © 2011 John Wiley & Sons, Ltd.

Received 12 June 2009; Revised 25 October 2010; Accepted 6 November 2010

KEY WORDS: grid computing; multisite mapping; differential evolution

1. INTRODUCTION

The last decade has witnessed a rapidly growing interest in computing grids [1]. Grids enable the aggregation and the sharing of geographically dispersed ‘autonomous’ computing nodes which normally use only a small percentage of their computational power to execute local workloads. This means that, even by aggregating the idle CPU cycles of these nodes, a considerable computational power is available which could be profitably exploited to execute very challenging applications.

To exploit these idle cycles efficiently, programmers must deal with the classical problems of parallel computing as well as with grid-specific ones, such as resource discovery, mapping and task scheduling [2]. The resource discovery phase has to determine the amount, type and status of the available resources. The mapping phase must select, in accordance with any user requirements, the nodes that best match the application needs with the available grid resources. Finally, the last phase establishes the schedule timing of the tasks on each node.

In this work the first and third phases are assumed to be already solved. In fact, the discovery phase is avoided by assuming that both the characteristics and the status of the different nodes and the task requests are known. The third phase is bypassed by selecting only the nodes which, according to their states, have at their disposal the computing resources needed to execute the tasks being assigned. This last assumption means that all the tasks of the same application are implicitly co-scheduled, i.e. contemporaneously loaded into the queues of runnable processes of the nodes they are assigned to. Naturally, co-scheduling represents one of the main requirements to be taken into account when scheduling a multi-task application and is necessary because, in the absence of information about the communication timing, the execution of communicating tasks

∗Correspondence to: E. Tarantino, ICAR–CNR, Via P. Castellino 111, 80131 Naples, Italy.
†E-mail: [email protected]

Copyright © 2011 John Wiley & Sons, Ltd.


proceeds only if their simultaneous allocation is guaranteed [3–5]. This premised, attention is focused on the mapping phase only.

The mapping problem, already known to be NP-complete for parallel systems [6, 7], becomes even more arduous in a distributed heterogeneous grid framework if we consider the ever increasing number of sites and nodes per site, due to the ever growing computational demands of large and diverse groups of applications.

To maximize the application throughput, multisite mappers must simultaneously leverage the collective computational resources of the nodes participating in the grid. In this way, applications that would otherwise wait for nodes to become available on a single site can potentially run earlier by aggregating disjoint resources across multiple sites. This procedure can result in remarkable reductions in queue waiting times [8].

In this paper a multisite mapping of parallel communicating tasks of computationally intensive applications, modelled as Task Interaction Graphs (TIGs) [9, 10], onto non-dedicated grid nodes is presented. Although several approaches have been investigated in the literature [11], the problem of optimal mapping being NP-complete, metaheuristic algorithms [12, 13] turn out to be highly appropriate tools for tackling it [14, 15]. Hence, the development of mapping tools based on metaheuristic optimization techniques, among which sequential evolutionary algorithms [15–21], represents an interesting area of research in spite of their problems of convergence speed. In fact, their strength is that they are able to produce good quality solutions for large and complex workflow applications [22].

In most papers the allocation of applications to resources is performed with respect to one or more optimization criteria, such as minimal completion time, minimal cost of assigned resources or maximal throughput. Here, for each application to be mapped we aim at minimizing the greatest among the time intervals which each node dedicates to the execution of the tasks assigned to it.

This view leads towards the discovery of solutions which do not assign each task to the most powerful available node if, due to task overlapping, its employment does not contribute to reducing the greatest of the above time intervals. In this way, the manager uses resources suited to the task requirements and avoids keeping busy the most powerful nodes, which could be more profitably exploited for further applications.

In [23] a Differential Evolution (DE) [24, 25] approach is proposed to successfully deal with the mapping problem. Here, we present a parallel approach based on a DE algorithm to enhance the mapping solution and to reduce the response time of the automatic mapper. This faster tool, being more attractive and applicable in real situations where time can be a discriminating factor, is also more suitable for incorporation into grid workflow management systems.

A comparison with other methods is not reported in this paper because, as far as we know, there are no papers which assume the same operating conditions as ours. Even in [11] it is remarked that heuristics developed to perform the mapping function are often difficult to compare because of different underlying assumptions. Hence, the experimental phase is aimed at evaluating the capability of our evolutionary mapper to perform grid mapping, and at showing that the parallel approach is more effective than the sequential version both in improving the solution quality and in the time needed to obtain it.

The structure of the paper is as follows: Section 2 outlines the mapping models and the positioning of the proposed approach together with the related works; Section 3 presents our grid framework and Section 4 illustrates our evolutionary tool for the mapping problem. Section 5 describes the distributed DE (DDE) algorithm, while Section 6 details the experiments on the test problems faced and discusses the results attained. Finally, Section 7 draws conclusions and outlines future work.

2. MAPPING PROBLEMS IN GRIDS

2.1. Mapping models

Generally each mapping model includes three aspects: the program model, the architecture model and the cost function model. The first describes the computation tasks and their communication


patterns, the second accounts for the processors and their communication facilities, whereas the third involves the function which is used to determine the cost of a mapping. The different choices about these three aspects give rise to a set of taxonomies for mapping tools. The program model is especially important when classifying mapping strategies: it takes into account the relationship among the tasks making up the parallel applications to be placed on the grid nodes.

In the case of dependent tasks, two program models are generally used to represent the structure of the computations and communication patterns of the parallel application as program graphs [26]: the Directed Acyclic Graph (DAG) [27] and the TIG.

The DAG model is well suited for computations where interactions take place only at the beginning and at the end of each task. The parallel applications are commonly modelled as a directed graph where the nodes represent the program tasks and the edges represent both the precedence relationship and the communication among tasks.

On the other hand, the TIG model is appropriate when temporal dependencies in the execution of tasks are not explicitly addressed: the tasks are considered simultaneously executable and their communications can take place at any time, in general according to iterative and non-deterministic patterns. The parallel application is modelled as an undirected graph where the nodes represent the program tasks and the edges the mutual communication among tasks.
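As an illustration of the TIG model described above, the following sketch (not taken from the paper; all names are ours) represents a TIG as an undirected weighted graph: each task carries a computational weight and each undirected edge a communication volume.

```python
# Hypothetical sketch of a Task Interaction Graph (TIG): nodes are tasks
# with a computational weight (instructions to execute); undirected edges
# carry the communication volume between task pairs.

class TIG:
    def __init__(self):
        self.tasks = {}    # task id -> instruction count
        self.edges = {}    # frozenset({k, m}) -> communication volume

    def add_task(self, task_id, instructions):
        self.tasks[task_id] = instructions

    def add_edge(self, k, m, volume):
        # undirected: {k, m} and {m, k} denote the same edge
        self.edges[frozenset((k, m))] = volume

    def comm(self, k, m):
        # 0 if tasks k and m do not communicate
        return self.edges.get(frozenset((k, m)), 0)

# Example: 3 tasks; pairs (0, 1) and (1, 2) communicate
tig = TIG()
for t, instr in [(0, 10_000), (1, 25_000), (2, 8_000)]:
    tig.add_task(t, instr)
tig.add_edge(0, 1, 500)
tig.add_edge(1, 2, 300)
print(tig.comm(1, 0))  # 500: symmetric, since the graph is undirected
```

The undirected-edge representation makes the symmetry of task communications explicit, matching the symmetric communication matrix introduced later in Section 3.3.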

The architecture model is usually represented by a Processor Optimization Graph where nodes represent the processors and undirected edges represent bidirectional physical communication links. In this case it is important to know whether the available processors are homogeneous or heterogeneous, whether the communications can take place on a fully connected network or make use of switches or buses, and so on. In heterogeneous systems such as grids, a weight is associated with each processor indicating its processing speed, and the processors are not fully connected; rather, links connect the local sites together, and their performance affects that of the whole grid.

The choice of the program model has a direct consequence on the cost function which should be optimized. In fact, in the case of a DAG the goal is to minimize the overall execution time, also called makespan, and the ‘ready to execute’ times of each task should be accounted for. In a TIG, instead, the execution times are difficult to evaluate due to run-time dependencies, and the cost functions may be broadly categorized as belonging to one of two models: the former, called Min–max, aims at minimizing the cost of the most loaded processor, while the latter, called ‘summed cost’, tries to minimize at the same time the load unbalance among nodes and the total communication cost [26].
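The Min–max model just mentioned can be sketched as follows (an illustrative example under our own naming assumptions, not the paper's cost function): given a mapping of tasks to nodes, the quantity to minimize is the cost of the most loaded node.

```python
# Illustrative sketch of a Min-max cost evaluation: the cost of a mapping
# is the cost of its most loaded node. Names are assumptions for the sake
# of the example.

def min_max_cost(mapping, task_cost):
    """mapping: dict task -> node; task_cost: dict task -> cost units."""
    load = {}
    for task, node in mapping.items():
        load[node] = load.get(node, 0) + task_cost[task]
    return max(load.values())

# Example: three tasks on two nodes; node n0 carries tasks 0 and 2
mapping = {0: "n0", 1: "n1", 2: "n0"}
task_cost = {0: 4, 1: 7, 2: 5}
print(min_max_cost(mapping, task_cost))  # 9: n0 is the most loaded (4+5)
```

A Min–max mapper searches for the assignment minimizing this value, which is the flavour of objective adopted by the approach in this paper.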

Although the DAG model is more popular, it has been pointed out in [28] that the TIG is more flexible in describing different program structures.

2.2. Our approach

We have decided to make use of the TIG program model in our work, with a Min–max approach. No assumptions have been made about the communication timing between the processes since we have hypothesized the co-scheduling of tasks. Moreover, we have considered a heterogeneous environment.

Although computationally intensive, metaheuristics can be applied as general-purpose tools to all program models [22]. Their use is justified from the points of view of the generality of the mapping tool and the quality of the solutions achieved in all situations that are more complicated than DAGs, as is the case for TIGs. For this reason an evolutionary algorithm has been chosen as the engine of our mapper and, aiming at reducing the response time, a distributed version is investigated in this paper. This means that our evolutionary-based tool is quite general and can also deal with complex workflow applications for which other mapping algorithms cannot be used at all. Of course, for specific program models, such as DAGs, this general-purpose mapping framework might achieve worse performance than special-purpose ones, but in most cases the other mapping algorithms cannot work.

As regards the specific features of the mapping tool designed, if the hierarchical taxonomy by Dong and Akl [29] is considered, it lies in the global, static, suboptimal, heuristic, adaptive taxon. If, instead, the classification by Yu et al. [30] is used, then our mapper can be classified as best-effort-based with use of metaheuristics.


It is also to be underlined that, unlike all the other existing evolutionary approaches, which simply search for a single site onto which to map the application [17], we deal with a multisite approach.

Furthermore, as a distinctive issue with respect to other strategies in the literature [29], we consider each node making up the sites as the lowest computational unit.

Finally, since we are dealing with a shared grid, we explicitly take the local loads of the nodes into account.

2.3. Related works

Concerning the state of the art in the literature, many algorithms have been proposed which are specific to the different operating environments [11, 31, 32]. Mapping tools based on metaheuristic optimization techniques have received great interest due to their capability to deal with NP-complete problems [14, 15]. These metaheuristics include ant colony optimization [14, 33], tabu search [34], simulated annealing [35] and evolutionary algorithms [36, 37]. Many experiments have been conducted with these latter methods.

Some of them are very effective for a class of problems only, as for instance Min–min, Max–min [29] and XSufferage [38] for independent task workloads [11, 15, 32, 33, 39–41]. It should be noted that they work very well for homogeneous platforms, whereas their performance is affected in heterogeneous environments like grids.

Several evolutionary-based techniques have been used to face task allocation in heterogeneous or grid environments, but they are related to dedicated platforms [11, 14–19, 37, 42–44].

Some other approaches have addressed multisite application scheduling with co-allocation but, differently from our model, all the sites of the grid framework are assumed to be homogeneous [45–47], or else resource reservation is considered [48].

Other models are relative to heterogeneous environments as well but refer to single-site mapping, and the fitness function is calculated as a weighted makespan for each parallel computer so as to obtain a load balance on the grid nodes [35, 49], or to small four-node grids for which the completion time of an application is taken into account [50].

With reference to the program models reported in Section 2.1, the DAG workflow problem has been extensively investigated and a number of scheduling heuristics have been proposed. These heuristics can be classified into several categories [29, 51], such as list scheduling algorithms [52] and group scheduling algorithms [53, 54].

In [55, 56], on the basis of a DAG form of the workflow, the mapping of a heavy communication workload within a Service Level Agreement (SLA) is attained by reserving the resources that meet the user's deadline, whereas in our technique no advance reservation is considered.

In [57] Aziz and El-Rewini present a framework for resource allocation and task scheduling in grid environments for intensive applications, which can be represented as DAGs, while keeping the makespan as low as the current state of the grid resources allows.

These approaches based on DAGs differ from ours in the objective and working conditions and are not applicable to parallel applications structured as TIGs.

Finally, the problem of allocating TIGs to heterogeneous computing systems to minimize application completion time is addressed in some papers.

In [58] an approach is presented which is restricted to a single local area network, such as Ethernet, and to fully interconnected multiprocessor systems, so that it cannot handle heterogeneous systems like grids.

A generalized algorithm able to allocate TIGs as well as DAGs is proposed in [59], which computes an initial deployment and then improves it by pairwise exchange. The algorithm is very general, its complexity is high and the connections are homogeneous, so that it is unsuitable for grid environments.

The work by Jain et al. [60] proposes a fast mapping heuristic, called FastMap, designed to run on a hierarchy of schedulers. It is assumed that the schedulers are provided with a priori knowledge of the resources. Nevertheless, even in this case a comparison is not possible since the mapping aims at minimizing the application execution time and, as a further difference from our


approach, schedulers can make advance reservations. Moreover, dedicated sets of resources are considered, and this entails that this heuristic is not exploitable on grids made up of shared nodes.

In [61] a launch-time heuristic, called WASE, for the static scheduling of TIGs on a grid made up of non-dedicated computational resources is proposed, with the goal of minimizing task computational requirements, maximizing the throughput between communicating tasks, and evaluating on-the-fly the resource availability to minimize the aging effect on the resource state. This heuristic is characterized by a preprocessing phase, which finds a set of sub-graphs, called clusters, grouping the tasks with high inter-communication costs, and a scheduling phase in which the clusters are arranged in descending order with respect to their minimal computational requests. Differently from our approach, WASE presents the restriction that the tasks with high traffic are grouped into a set which is submitted to the computing clusters in the same LAN. This heuristic becomes ineffective in producing an efficient starting clustering when the application tasks involved do not present similar features.

3. OUR GRID FRAMEWORK

3.1. The architecture

Although computational grids are intended to provide ‘on demand’ the resources needed to execute the submitted applications, they can be differentiated on the basis both of the user typology and of the sharing/accessing rules adopted for them.

For example, grids have been built among the different branches of the same organization which have computing systems dimensioned on their own average workloads but are unable to deal with peak computational requests. By aggregating these sparse resources, the grid structure allows complying with these peak demands by means of the exclusive use of the assigned grid nodes.

Another example of computational grid originates from the Seti@home project [62], in which millions of users make their unemployed resources available during their idle times.

From a logical viewpoint, the grid architecture considered in this paper presents similarities with both the grids described above. In fact, like the first grid, it is structured in sites and permits the execution of an application presented by a user that does not locally have the requested resources, whereas, like the second one, it allows access to the grid nodes during their idle times.

Thus, any site of our grid contains different computing systems, each made up of non-dedicated processing nodes. This means that each node executes concurrently its local and grid workloads.

3.2. The task scheduling model

From a conceptual perspective, our task scheduling model is realized as reported in Figure 1. From the figure we hypothesize the existence of a unique Grid mapper (GM), whose interface is resident on a grid node, to which any node, by means of its Grid loader (GL), must forward the mapping requests of all the grid applications submitted by its users.

We suppose that each grid node, by collecting information about its computation and communication activities in a prefixed time interval, is able to provide an estimate of its average workload. Under such a hypothesis, any node, periodically or when the average current local load varies, sends via its Node resource status module (NRS) this workload to the Grid resource status (GRS) agent of the GM.

We also assume that any grid node has two distinct task queues, one for the local application tasks and the other for the grid application tasks. When a user has to run an application, the Application manager (AM) acts in the following way: if the application is submitted for local execution, it will be put into the local queue by means of its Local loader (LL) and executed according to the scheduling policy of the node; conversely, if the application must be run on the grid, the AM forwards the grid application request to the GL, which extracts the related application information (number of tasks, the amount of computations and communications for each task and so on) and sends this information to the Grid application requests module (GAR) of the GM. The latter, on the basis of available grid resources and task requirements, activates the execution of


Figure 1. Our multisite task scheduling model.

the parallel evolutionary mapping procedure and returns the allocation retrieved to the GL of the calling node. At this point, the GL sends to the GLs of the nodes present in the received mapping solution the list and the information relative to the application tasks that each of these loaders has to load into its own node grid queue.

If a single task were assigned to a node with available resources, it might be idly waiting for events, e.g. communications, and node resources would remain temporarily unexploited. Hence, to avoid this and make the best use of the available resources, the possibility of allocating more tasks, also belonging to different applications, on the same node is allowed.

As regards the grid processes, inserted in the grid queue on the basis of their submission order, a pre-emptive FIFO scheduling policy is adopted. In particular, a grid process is running if and only if there are neither ready processes in the local queue nor ready processes which precede it in the grid queue, while it is de-scheduled and returns to the ready state as soon as either a local process or a grid process previously submitted becomes executable. It is worth noting that a running process is de-scheduled and becomes blocked also when it is waiting for a communication.
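The scheduling rule above can be condensed into a single predicate; the following sketch (our own illustration, not code from the paper) expresses when a grid process is allowed to run under the pre-emptive FIFO policy.

```python
# Illustrative sketch of the pre-emptive FIFO policy for grid processes:
# a grid process may run only when no local process is ready and no
# earlier-submitted grid process is ready.

def may_run(grid_process, local_ready, grid_queue, ready):
    """grid_queue is ordered by submission; ready maps process -> bool."""
    if any(ready[p] for p in local_ready):
        return False  # local work always pre-empts grid work
    for p in grid_queue:
        if p == grid_process:
            return ready[grid_process]
        if ready[p]:
            return False  # an earlier grid process pre-empts this one
    return False  # process not present in the grid queue

# Example: g1 precedes g2 in the grid queue; no local process is ready
ready = {"l1": False, "g1": True, "g2": True}
print(may_run("g2", ["l1"], ["g1", "g2"], ready))  # False: g1 precedes g2
print(may_run("g1", ["l1"], ["g1", "g2"], ready))  # True
```

De-scheduling on communication waits would simply mark the process as not ready, which the same predicate then handles.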

3.3. Definitions and assumptions

Some assumptions about the application tasks and the available resources must be made to perform the mapping on grids. In [63] the MACA mapping algorithm, based on the affinity between the features of the hardware grid resources and the characteristics of the parallel application tasks, is presented. Similarly to [63], to address the mapping problem in our premised grid, we need information about the number and the status of both accessible and demanded resources. Consequently, we assume to have a grid application subdivided into P tasks (demanded resources) to be mapped onto the N nodes of the grid (accessible resources).

We have to know the power (the number of instructions computed per time unit), the network bandwidth and the average load in terms of computations and communications of each grid node in any given time span. In fact, in a shared-resource computing environment, the available power of each node and the bandwidth between couples of nodes vary over time due to the load from concurrent users. In particular, we need to know a priori the number of instructions α_i computed per time unit on each node i. Furthermore, we assume to have cognition of the communication bandwidth β_ij between any couple of nodes i and j. It should be noted that β_ij is the generic element of an N × N symmetric matrix b with very high values on the main diagonal, i.e. β_ii is the bandwidth between two tasks on the same node.

Copyright � 2011 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2011; 23:1146–1168DOI: 10.1002/cpe

Page 7: A distributed evolutionary approach for multisite mapping on grids

1152 I. DE FALCO, U. SCAFURI AND E. TARANTINO

In general grids address non-dedicated resources which have their own local workloads, thus affecting a node's local performance. We must therefore consider these load conditions to evaluate the expected computation time. There exist several prediction models to face the challenge of non-dedicated resources [64–66]. For example, concerning the computational power, we suppose to know the average computational load ℓ^c_i(Δt) of the node i in a given time span Δt, with ℓ^c_i(Δt) ∈ [0.0, 1.0], where 0.0 means a node fully unloaded and 1.0 a node locally loaded at 100%. Hence (1 − ℓ^c_i(Δt)) · α_i represents the fraction of the power of node i available for executing grid tasks. Analogously, we assume to have information about the average bandwidth load ℓ^b_ij(Δt) ∈ [0.0, 1.0] relative to the communication channel between the nodes i and j where, as before, 0.0 implies a bandwidth completely unused and 1.0 a communication channel totally busy due to the communications requested by the other tasks on these nodes. Similarly, (1 − ℓ^b_ij(Δt)) · β_ij denotes the fraction of bandwidth available for the communications of the grid tasks. A statistical estimation of the average workload is provided by the NRS module, outlined in Section 3.2.
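The two availability formulas above can be written down directly; the sketch below is a minimal illustration (variable names transliterate the symbols of this section; the example figures are ours, not from the paper).

```python
# Sketch of the available-resource formulas of Section 3.3: given the
# nominal power alpha_i of node i and its average computational load
# lc_i in [0, 1], the power left for grid tasks is (1 - lc_i) * alpha_i;
# analogously for the bandwidth beta_ij under load lb_ij.

def available_power(alpha_i, lc_i):
    assert 0.0 <= lc_i <= 1.0
    return (1.0 - lc_i) * alpha_i

def available_bandwidth(beta_ij, lb_ij):
    assert 0.0 <= lb_ij <= 1.0
    return (1.0 - lb_ij) * beta_ij

# Example: a node computing 1000 MI per time unit, locally loaded at 40%,
# leaves 600 MI per time unit for grid tasks
print(available_power(1000.0, 0.4))
print(available_bandwidth(100.0, 0.25))  # a quarter-loaded 100-unit link
```

These per-node and per-link availabilities are exactly the quantities that the fitness function of the evolutionary mapper must work with.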

As regards the resources requested by the application, we assume to know for each task k the respective number of instructions γ_k to be executed and the number of communications ψ_km between the kth and the mth tasks ∀ m ≠ k. Obviously, ψ_km is the generic element of a P × P symmetric matrix w with all null elements on the main diagonal.

All this information can be obtained either by static program analysis, or by using smart compilers or other tools which automatically generate it. For example, the specialized Globus Toolkit [67, 68] includes the Resource Specification Language, which constitutes an XML format to define application requirements [69]. In addition, in [70] the Prophesy instrumentation tool, which collects aggregate information about sections of an application code for developing performance models, is described. This modelling technique can be used in the analysis of grid applications.

An estimation of the application requests is assumed to be supplied by the GL module, described in Section 3.2.

4. DIFFERENTIAL EVOLUTION FOR MAPPING

4.1. The technique

DE [24, 25] is one of the most powerful and reliable stochastic optimization techniques in current use for problems with real parameters. DE is a very simple technique and takes few control parameters, which makes it easy to use. Nonetheless, DE exhibits remarkable performance in optimizing a wide variety of multidimensional and multimodal objective functions in terms of final accuracy, convergence speed and robustness, and outperforms many of the already existing stochastic and direct search global optimization techniques [71, 72]. In particular, its effectiveness in scheduling independent tasks on heterogeneous environments is investigated in [73].

Within a multidimensional search space, a fixed number of solutions are randomly initialized, then evolved over time to explore the search space and to locate the optima of the fitness function, denoted with Φ, which evaluates the optimality of a potential solution. The definition of this fitness function is closely correlated to the optimization problem under investigation. The population of DE is subject to the three operators of mutation, crossover and selection.

In particular, given a minimization problem within a q-dimensional search space of real parameters, DE faces it by starting with a population of M randomly chosen potential solution vectors, each made up of q real values, one for each problem dimension. The population is evolved from one generation to the next by creating new individuals that combine vectors randomly chosen within the current population (mutation). The resulting vectors are then mixed with a predetermined target vector (crossover), thus producing the trial vector. Many different transformation schemes have been defined by the inventors to produce the candidate trial vector [24, 25]. To make the strategy explicit, they established a notation for each DE technique with a string of the form DE/x/y/z. In it, DE stands for Differential Evolution, x is a string which denotes the vector to be perturbed (best = the best individual in the current population, rand = a randomly chosen one, rand-to-best = a random one, but the current best participates in the perturbation too), y is the number of


difference vectors taken for the perturbation of x (either 1 or 2), while z is the crossover method (exp = exponential, bin = binomial). For example, the DE/rand/1/bin model involves a random individual being perturbed by using one difference vector and by applying binomial crossover. A difference vector is the subtraction of two solution vectors within the current population. More specifically, to create the new ith individual in the next population, three integer numbers r1, r2 and r3 in [1, . . . , M], differing from one another and different from i, are randomly chosen. Once the mutated trial vector x*_i = x_r3 + F · (x_r1 − x_r2), where F is the mutation weighting factor which controls the magnitude of the differential variation, is created, it will undergo binomial genewise crossover.

Another integer number s in the range [1, q] is randomly generated. For each gene of the trial vector a random number ρ in [0.0, 1.0] is generated, and if this is lower than the crossover rate CR (a control parameter of the DE set by the user, in the same range as ρ) or the position j under account is exactly s, then the jth gene of the new trial x′_i is generated as

x′_ij = x*_ij = x_r3j + F · (x_r1j − x_r2j)    (1)

otherwise the gene of the original vector is kept: x′_ij = x_ij. Finally, the selection phase takes place: this new trial individual x′_i is compared against x_i in the current population and, if fitter, replaces it in the next population; otherwise the original survives into the new population. Such a comparison is performed on the basis of Φ. This basic scheme is repeated for a maximum number of generations g or until some stopping criterion is fulfilled. The pseudocode of this procedure, specialized for the mapping problem, is reported at the end of Section 4.3.
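The mutation, crossover and selection steps just described can be sketched, for the DE/rand/1/bin strategy, as follows. This is an illustrative Python fragment, not the paper's actual tool; the function name and default parameter values are our own.

```python
import random

def de_rand_1_bin(pop, fitness, F=0.5, CR=0.7):
    """One generation of DE/rand/1/bin over a population of real-valued
    vectors; `fitness` maps a vector to a scalar to be minimized."""
    M, q = len(pop), len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # three mutually distinct indices, all different from i
        r1, r2, r3 = random.sample([r for r in range(M) if r != i], 3)
        s = random.randrange(q)  # one position always taken from the mutant
        trial = []
        for j in range(q):
            if random.random() < CR or j == s:  # binomial crossover
                trial.append(pop[r3][j] + F * (pop[r1][j] - pop[r2][j]))
            else:
                trial.append(target[j])
        # greedy selection: the trial survives only if it is not worse
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop
```

Because selection is greedy per individual, the best fitness in the population can never worsen from one generation to the next.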

4.2. Encoding

In general, any mapping solution should be represented by a vector l of P integers ranging in [1, N]. To obtain l, the real values provided by DE in [1, N + 1] are truncated before evaluation. The truncated value ⌊l_i⌋ denotes the node onto which the task i is mapped by the proposed solution.
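A minimal sketch of this decoding step (the function name and the guard against the open upper bound are illustrative):

```python
def decode(x, N):
    """Truncate the real genes, provided by DE in [1, N + 1), to integer
    node indices in [1, N]; min() guards the upper bound."""
    return [min(int(g), N) for g in x]
```

For instance, `decode([27.6, 98.2, 1.0, 280.9], 280)` returns `[27, 98, 1, 280]`.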

As long as the mapping is considered by characterizing the tasks only by means of their computational needs γ_k, this is an NP-complete optimization problem in which the allocation of a task does not affect that of the others, unless one attempts to load more tasks on the same node. If, instead, the communications ψ_km are also taken into account, the mapping problem becomes by far more complicated. In fact, the allocation of a task on a given node can cause the optimal mapping to require other tasks to be allocated on the same node or in the same site, so as to decrease their communication times and thus their execution times, taking advantage of the higher communication bandwidths existing within any site compared with those between sites.

This problem is a typical example of epistasis, i.e. a situation in which the value taken on by a variable influences those of other variables. Such a situation is also deceptive, since very frequently a solution l1 can be transformed into another with better fitness l2 only by passing through intermediate solutions, worse than both l1 and l2, which would be discarded. This is a problem since DE accepts a new solution only if it is better than the current one. To overcome this limitation we have introduced a new operator, named site mutation, applied with a probability p_m any time a new individual must be generated. When this mutation is to be carried out, a position in the current solution l is randomly chosen. Let us suppose that its value refers to a node belonging to a site Si. This value is equiprobabilistically modified into another value which is related to a node of another site, say Sj. Then, any other task assigned to Si in the current solution is let to randomly migrate to a node of Sj by inserting into the related position a random value within the bounds for Sj. If site mutation does not take place, the classical transformations typical of DE must be applied.
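One possible reading of the site mutation operator is sketched below. The site table and helper names are hypothetical (only a few sites of the paper's grid are encoded), and the paper does not prescribe this exact implementation:

```python
import random

# Hypothetical site table: label -> (first node, last node), 1-based indices.
SITES = {"A1": (1, 24), "A2": (25, 40), "E1": (177, 200), "G2": (265, 280)}

def site_of(node):
    return next(s for s, (lo, hi) in SITES.items() if lo <= node <= hi)

def site_mutation(l):
    """Pick one task at random; move it, and every other task currently
    placed in the same site, onto random nodes of another site chosen
    equiprobably among the remaining ones."""
    l = list(l)
    src = site_of(l[random.randrange(len(l))])
    dst = random.choice([s for s in SITES if s != src])
    lo, hi = SITES[dst]
    for k, node in enumerate(l):
        if site_of(node) == src:  # the whole site group migrates
            l[k] = random.randint(lo, hi)
    return l
```

After one application, the chosen source site is completely vacated, which is what lets the operator jump across the deceptive intermediate solutions in one step.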

4.3. Fitness

The two major parties in grid computing, namely resource consumers, who submit various applications, and resource providers, who share their resources, can usually have different motivations


and demands to satisfy when they join the grid. As an example, users could be interested in the total cost to run their application, while providers could pay more attention to the throughput of their resources in a particular time interval. Thus, objective functions may have to incorporate different goals.

Use of resources. We make use of the information about the number and the status of both the available and the requested resources. On the basis of the grid software architecture depicted in Section 3.2, the former are sent to GRS by the NRS of all the grid nodes and the latter are sent to GAR by the GL of the node on which the application has been submitted.

Denoting with θ^comp_ij and θ^comm_ij, respectively, the computation and communication times requested to execute task i on the node j it is assigned to, the total time needed to execute i on j is

θ_ij = θ^comp_ij + θ^comm_ij

It is evaluated on the basis of the computation power and the bandwidth which remain available once the current workload has been deducted. Let θ^s_j be the summation of the θ_ij over all the tasks assigned to the jth node by the current mapping. This value is the time spent by node j in executing computations and communications of all the tasks of the application under investigation and assigned to it by the proposed solution. Clearly, θ^s_j is equal to zero for all the nodes not included in the vector l.

Considering that all the tasks are co-scheduled, the minimum time required to complete the application execution is given by the maximum value among all the θ^s_j. Then, the fitness function is

Φ(l) = max_{j∈[1,N]} {θ^s_j}    (2)

The goal of the evolutionary algorithm is to search for the smallest value among these maxima, i.e. to find the mapping which minimizes the maximal time interval which each node dedicates to the execution of the tasks assigned to it.

If during the DE generation of new individuals the offspring has the same fitness value as its parent, then the individual is selected for which the quantity

Φ*(l) = Σ_{j=1}^{N} θ^s_j    (3)

is greater. Such a quantity represents the total amount of time dedicated by the grid to the execution of the application. Obviously, this mechanism takes place also for the selection of the best individual in the population. Such a choice aims at favouring mappings that best exploit the shared resources, i.e. deployments that do not use the most powerful nodes for each task if, due to task overlapping, their employment does not reduce the above maximal time interval. This avoids keeping the most powerful nodes busy without benefit, when they could be more profitably exploited for further applications.

It should be noted that, though the fitness values of the proposed mappings are not related to the completion time of the application, Φ and Φ* can be seen respectively as the lower and upper bounds of the application execution time.
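Equations (2) and (3) can be sketched as follows. The names are illustrative; `task_time(i, j)` stands for θ_ij, which the paper computes from the residual power and bandwidth of node j:

```python
def node_times(mapping, task_time):
    """Per-node utilization time: for each node j used by `mapping`
    (task index -> node index), the sum of the task_time(i, j) of the
    tasks assigned to it."""
    times = {}
    for i, j in enumerate(mapping):
        times[j] = times.get(j, 0.0) + task_time(i, j)
    return times

def phi(mapping, task_time):       # eq. (2): maximal node utilization
    return max(node_times(mapping, task_time).values())

def phi_star(mapping, task_time):  # eq. (3): total grid time, used as tie-break
    return sum(node_times(mapping, task_time).values())
```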

The pseudocode of our DE for mapping is shown in the following Algorithm 1, in which the particular case of the DE/rand/1/bin strategy with a fixed number of generations is outlined.

5. THE DISTRIBUTED ALGORITHM

Our DDE algorithm is based on the classical coarse-grained approach to Evolutionary Algorithms [74], in which a collection of networked subpopulations cooperates on the solution of a problem through a migration operator used to exchange individuals among them. It consists of a locally linked strategy, known as a stepping-stone model [75], in which each DE instance is connected to a number of instances according to the degree of connectivity of the underlying topology. Each subpopulation


Algorithm 1
Require:
    Information on number, features and estimated average local load of grid nodes
    Information on number and requirements of parallel application tasks
    Control parameters: M, F, CR and p_m
    Stopping condition: number of generations g
Initialization:
    Initialize a population X = (x_1, . . . , x_M)
    Evaluate the fitness function Φ for each individual x_i
Start the optimization:
while (maximal number of generations g is not reached) do
    for i = 1 to M do
        choose a random real number p_sm ∈ [0.0, 1.0]
        if (p_sm < p_m) (where p_m is the site mutation probability)
            apply site mutation to the current individual x_i to generate the new one x′_i
        else
            choose three integers r1, r2 and r3 in [1, M], different from one another and from i
            choose an integer number s in [1, q]
            for j = 1 to q do
                choose a random real number ρ ∈ [0.0, 1.0]
                if ((ρ < CR) OR (j = s))
                    x′_ij = x_r3j + F · (x_r1j − x_r2j)
                else
                    x′_ij = x_ij
                endif
            endfor
        endif
        evaluate the fitness function Φ(x′_i)
        if Φ(x′_i) ≤ Φ(x_i)
            insert x′_i in the new population
        else
            insert x_i in the new population
        endif
    endfor
endwhile
Output:
    Best mapping of the application tasks onto grid nodes

is, thus, kept relatively 'isolated' from all the others, in the sense that it can communicate with them only through its neighbours.

Decisions must be taken for the migrant selection, i.e. the choice of the elements to be sent, and for the replacement, i.e. the individuals to be replaced by the arriving migrants. Different strategies can be devised: the migrants can be selected either according to better fitness values or randomly, and they might replace the worst individuals, substitute them only if better, or finally replace any individual (apart from the very best ones, of course) in the neighbouring subpopulation. Consistent with the biological events, it was noted that the number of migrants should not be high and that the migration should occur after a period of stasis, otherwise the search in a subpopulation might be strongly perturbed by these continuously incoming elements [74].

This mechanism allows attaining both exploration and exploitation, which are the basic features of a good search. Exploration means wandering through the search space so as to prevent premature convergence to local optima. Exploitation implies that one area is thoroughly examined, so that we can confidently state whether or not this area is promising. In such a way, good solutions will spread within the network with successive diffusions, so more and more processors will try to sample that area (exploitation), while, at the same time, there will exist clusters of processors investigating different subareas of the search space (exploration). Therefore, a suitable percentage of migrants that each subpopulation sends to its neighbours, called the Migration Rate (MR), and an appropriate exchange frequency between neighbouring subpopulations, every MI generations, named the Migration Interval, are to be introduced to best exploit the potential of this cooperating island model.
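One migration event in such a stepping-stone model might be sketched as follows. This is a deliberate simplification: each island sends its MR best individuals to a single ring successor, which overwrites its own worst ones; the paper does not fix the selection and replacement policies to exactly this form.

```python
def migrate(islands, fitness, MR=1):
    """One migration event on a ring of subpopulations: each island sends
    copies of its MR best individuals to its successor, which overwrites
    its own MR worst individuals with them."""
    migrants = [sorted(isl, key=fitness)[:MR] for isl in islands]
    for k, group in enumerate(migrants):
        dest = islands[(k + 1) % len(islands)]
        dest.sort(key=fitness, reverse=True)  # worst individuals first
        dest[:MR] = [list(m) for m in group]  # replace the worst with copies
    return islands
```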


Figure 2. The ring topology.

Within this general framework we have implemented a distributed version of DE, which consists of a set of classical DE schemes, running in parallel, assigned to different processing elements arranged in a ring topology as shown in Figure 2, in which each generic DE instance (in black) has exactly two neighbouring communicating subpopulations (in grey).

This distributed algorithm is the procedure activated by the GM tool, described in Section 3.2, to find the mapping solution.

It is important to remark here that implementing a parallel or distributed version of DE is not just a matter of speeding up execution. In fact, it is well known in the field of evolutionary algorithms that a parallel or distributed version of such an algorithm can reach final solutions different from those obtained by the corresponding sequential version, where different does not necessarily mean better. This is due to the above-described interactions among the subpopulations, which determine a completely different dynamic behaviour in the distributed case with respect to the sequential one. Thus, to evaluate the effectiveness of a distributed version of an evolutionary algorithm, it is important to take both the classical speedup and the final solution quality into account.

6. EXPERIMENTS AND FINDINGS

In our experimental environment the grid arrangement is designed so as to simplify the checking of the proposed mapping solutions. In particular, by conveniently choosing the computing capabilities, the communication bandwidths and the load conditions of the grid nodes, it is possible in some test cases, by planning simulations with suitable application requirements, to compute optimal or suboptimal solutions. The different tests are conceived to allow a simple comparison between a calculation by hand and the solution provided by the mapping tool. Moreover, the development of these applications is carried out to guide readers in understanding how the evolutionary mapping adapts to changes in grid operating conditions and in task requirements. Once the effectiveness of the tool has been verified in these particular situations, we can be confident of its ability to perform well also when this comparative analysis is impracticable, because the optimal or suboptimal solutions are difficult to deduce.

Our simulations refer to a grid structure, outlined in Figure 3, which aggregates 280 nodes subdivided into seven sites denoted A, B, C, D, E, F and G with 48, 48, 24, 56, 40, 32 and 32 nodes, respectively. All the sites, with the exception of C, are composed of two or three sub-sites, in each of which we have a cluster with a different number of nodes and a different performance.

Without loss of generality we suppose that each cluster is made up of homogeneous nodes. The internal numbers in Figure 3 denote the cluster size and the nominal power μ expressed in millions of instructions per second (MIPS), respectively. For example, the site E is made up of two sub-sites: E1, with a cluster consisting of 24 nodes with μ = 600, and E2, composed of 16 nodes with μ = 500. It can be noticed that the power of the clusters decreases from 3000 to 100 MIPS when going from A1 to G2.

Hereafter we indicate the nodes by means of the external indices shown in Figure 3, so that, for instance, 27 is the third node in A2 while 98 is the second node in C. The correspondence between the node indices and the sites they belong to is also reported in Table I.

We have hypothesized three communication bandwidths: the bandwidth β_ii available when tasks are mapped on the same node (intranode communication), the bandwidth β_ij between nodes i and j belonging to the same site or sub-site (intrasite communication) and the bandwidth, still denoted with β_ij, when nodes i and j belong to different sites or sub-sites (intersite communication).


Figure 3. The grid architecture.

Table I. The correspondence between site and node indices.

Sites   A1      A2      A3      B1      B2      C        D1       D2
Nodes   1–24    25–40   41–48   49–80   81–96   97–120   121–136  137–160

Sites   D3       E1       E2       F1       F2       G1       G2
Nodes   161–176  177–200  201–216  217–240  241–248  249–264  265–280

Table II. Intersite and intrasite bandwidths expressed in Mbit/s.

A1 A2 A3 B1 B2 C D1 D2 D3 E1 E2 F1 F2 G1 G2

A1  100
A2   50  100
A3   75   75  100
B1    4    4    4  200
B2    4    4    4   75  200
C     6    6    6    6    6  300
D1    8    8    8    8    8    8  400
D2    8    8    8    8    8    8   50  400
D3    8    8    8    8    8    8   75   75  400
E1   10   10   10   10   10   10   10   10   10  800
E2   10   10   10   10   10   10   10   10   10  100  800
F1   20   20   20   20   20   20   20   20   20   10   10  1000
F2   20   20   20   20   20   20   20   20   20   10   10   200  2000
G1   30   30   30   30   30   30   30   30   30   16   16    30    30  4000
G2   30   30   30   30   30   30   30   30   30   16   16    30    30   400  10000

Besides, we presume that all the β_ii's have the same very high value (100 Gbit/s), so as to make the intranode communication time negligible with respect to that of the intrasite and intersite communications.

For each site, the bandwidth of the output link is equal to that of the input link. In our case the intersite and intrasite bandwidths are reported in Table II.

Note that the intrasite bandwidth increases from 100 to 10 000 Mbit/s when going from A1 to G2, and the bandwidth between clusters belonging to different sub-sites is lower than the intrasite one. For example, each node of E1 communicates with a node of E2 with a bandwidth of 100 Mbit/s, while both sub-sites have an intrasite bandwidth equal to 800 Mbit/s.
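This three-level bandwidth rule can be expressed as a simple lookup. The fragment below encodes only a few entries of Table II for illustration; the names and the `site` helper are ours:

```python
# Illustrative fragment of Table II (Mbit/s); the full matrix is omitted.
INTRANODE = 100_000  # 100 Gbit/s, making intranode transfers negligible
INTRA = {"A1": 100, "E1": 800, "E2": 800, "G2": 10_000}
INTER = {frozenset(("E1", "E2")): 100, frozenset(("A1", "E1")): 10,
         frozenset(("A1", "G2")): 30, frozenset(("E1", "G2")): 16}

def bandwidth(i, j, site):
    """beta_ij for nodes i and j; `site` maps a node index to its label."""
    if i == j:
        return INTRANODE          # intranode communication
    si, sj = site(i), site(j)
    return INTRA[si] if si == sj else INTER[frozenset((si, sj))]
```

Using a frozenset key makes the intersite lookup symmetric, matching the equality of input and output link bandwidths.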

As each node operates concurrently on local and grid workloads, we assume that the average load of the available grid resources, in terms of both computation and bandwidth, is known for the time span of interest.


Table III. The set of the investigated DE operators.

Operators   DE1             DE2                     DE3             DE4
String      DE/rand/1/bin   DE/rand-to-best/1/exp   DE/best/2/exp   DE/rand/2/exp

Table IV. Parameter values for the operators.

Operator DE1 DE2 DE3 DE4

F    1.8   0.85   0.5   0.5
CR   0.3   1      0.7   0.5

Once the grid characteristics have been defined and three operating conditions hypothesized for its nodes, i.e. completely discharged, deterministically loaded or randomly loaded, various applications must be conceived to demonstrate the effectiveness of our evolutionary approach. In particular, applications characterized by task requests ranging from computation-bound to communication-bound, passing through a balance between computations and communications, should be considered. Note that in the performed experiments we suppose that the average local load of a node is constant during the entire execution time of the task allocated to it. Obviously, a time-varying load would only require a different calculation and would not invalidate the proposed approach.

A DDE-based tool able to effectively allocate the tasks of a parallel application on grid nodes distributed over different sites has been implemented. Since we are not able to perform a comparison with other approaches, the experimental activity aims to ascertain the degree of effectiveness of our parallel evolutionary tool and to evaluate the goodness of the proposed mapping solutions.

Preliminarily, a comparison against a sequential version on four different DE operators is carried out. This comparison is effected at parity of number of evaluations, investigating not only the best solution but also the average quality of the solutions and the average number of generations needed to attain them. The sequential and the parallel versions, the latter using the Message Passing Interface (MPI) [76], are both written in the C language. All the experiments for the sequential algorithm have been effected on a 1.5 GHz Pentium 4 processor, while for the parallel algorithm a cluster of 5 such processors, interconnected by a Fast Ethernet switch, has been employed.

The experimentation has been effected on an application structure with P = 48, whose detailed description will be outlined later, considering three different simulations characterized by various node working conditions: absence of load (Simulation 1), fixed overheads (Simulation 2) and random overheads (Simulation 3). For each problem and version, 10 executions have been carried out, so as to investigate the dependence of the results on the random seed.

Four different DE operators have been compared in both a sequential and a parallel version. In Table III the investigated operators are listed with the corresponding notation string, as described in Section 4.1.

The parameters of each DDE have been set as follows: M = 50, g = 10000, p_m = 0.2, MI = 100 and MR = 1, i.e. the best individual. The choice of the control parameters CR and F is frequently based on empirical evidence and practical experience [71]. In this case, to avoid a tuning phase, the control values suggested by the inventors in [77] have been exploited. These values for the different operators are reported in Table IV.

As we have five subpopulations in the parallel approach, the parameters for the corresponding sequential versions, excluding MI and MR which are not needed, are the same except for M, which is equal to 250 so as to leave the total number of evolutionary evaluations unchanged. This set of parameters is left unvaried for all the experiments carried out.

In Tables V and VI the results are reported for the sequential and parallel versions as a function of the different operators. In particular, the best found value Φ_b, the number of times this value has been obtained n_b, the average best value over the 10 tests effected ⟨Φ⟩ and the related standard


Table V. Findings for the sequential version.

Sequential

Operator DE1 DE2 DE3 DE4

Simulation 1
    Φ_b   96.166    83.35     83.35     83.35
    n_b   2         2         4         6
    ⟨Φ⟩   113.818   89.616    86.655    85.044
    σ     13.169    3.515     4.339     3.56
    ⟨g⟩   4580      7167      5125      6694

Simulation 2
    Φ_b   102.835   102.835   102.835   102.835
    n_b   4         4         10        10
    ⟨Φ⟩   110.36    112.799   102.835   102.835
    σ     16.662    10.919    0         0
    ⟨g⟩   5578      8551      5651      7076

Simulation 3
    Φ_b   287.347   262.88    243.075   263.408
    n_b   1         1         1         2
    ⟨Φ⟩   293.186   275.226   250.723   278.225
    σ     10.798    13.073    4.041     19.023
    ⟨g⟩   8506      9455      8305      9319

Table VI. Findings for the parallel version.

Parallel

Operator DE1 DE2 DE3 DE4

Simulation 1
    Φ_b   83.766    83.291    83.291    83.291
    n_b   2         2         4         4
    ⟨Φ⟩   109.244   83.338    83.326    83.326
    σ     19.963    0.026     0.031     0.031
    ⟨g⟩   3236      4760      6264      6262

Simulation 2
    Φ_b   102.835   102.835   102.835   102.835
    n_b   8         10        10        10
    ⟨Φ⟩   104.835   102.835   102.835   102.835
    σ     4.471     0         0         0
    ⟨g⟩   7007      1480      1721      1942

Simulation 3
    Φ_b   226.474   214.381   196.247   202.8
    n_b   1         1         1         1
    ⟨Φ⟩   275.791   235.735   205.944   219.220
    σ     37.479    23.421    7.58      11.149
    ⟨g⟩   7624      6730      8395      8530

deviation σ, and the average number of generations ⟨g⟩ needed to achieve the best solution in any test are outlined.

As can be seen from these tables, the operators DE3 and DE4 have similar behaviour, although DE3 shows a slightly better performance. It is also clear that the parallel version is more effective in terms of quality of the best and average solutions, standard deviations and average number of generations. Besides, the improvement also concerns the response time. In particular, the execution of one run of the evolutionary mapper takes 340 s and 110 s for the sequential and parallel versions, respectively, yielding a speed-up in time equal to 3.1. It is worth noting that these execution times are obtained with non-dedicated Pentium 4 processors and thus could be noticeably reduced using more powerful systems.

This established, for all the successive experimentations we report and comment on only the best results, achieved using the operator DE3 in the parallel version, on two application structures, the first of which has already been investigated in the previous comparison phase. Thus, a complete set of tasks ranging from computation-bound to communication-bound has been considered.


Figure 4. The folded-torus structure of the application.

6.1. Structure 1

Simulation 1. The first set of experiments was carried out by considering an application made up of P = 48 tasks, numbered from 1 to 48 and structured as a folded torus (Figure 4).

The tasks depicted with the same colour present the same requirements in terms of computations and communications. In particular, the tasks in the first and last rows have to execute γ_k = 240 Giga Instructions (GI) and exchange 100 Mbit with the two neighbouring tasks placed on the same row. The tasks in the second and fifth rows have to perform γ_k = 40 GI and exchange 6000 Mbit with each of the tasks placed on the same row, while the tasks in the remaining two rows must carry out γ_k = 500 Mega Instructions (MI) and exchange 200 Gbit with each of the two neighbouring tasks allotted on the same row. A further communication of 10 Mbit takes place between any couple of neighbouring tasks deployed along the same column.

For the first simulation it is supposed that all the nodes are completely discharged (λ_ci(Δt) = 0.0 ∀i) and all the communication channels are unused (λ_βij(Δt) = 0.0 ∀(i, j)).

The best task allocation found by the parallel version is graphically depicted in Figure 5. The number in each circle denotes the grid node onto which the corresponding application task has been placed. As can be seen, the tasks of the first and last rows have been mapped on A1, those of the second and the fifth rows on E1, and the tasks of the third and fourth rows on G2.

Remembering that Φ is the maximal resource utilization time, which also represents the maximal sum of execution times on a node among all the co-allocated tasks on the basis of the proposed mapping, all the considerations made to explain the value of Φ take into account the task/node allocation which implies the highest execution time.

The resource utilization time is Φ = 83.291 s and this value is the optimal one. In fact, it can be noted that all the 16 tasks in the first and last rows, which require the highest amount of computation, i.e. 240 GI, have been mapped on 16 nodes of the sub-site A1, which contains the most powerful nodes. Any other allocation of these tasks would have yielded an execution time higher than that proposed. For example, if at least one of the nodes with the immediately inferior power, i.e. those of A2 (2000 MIPS), were chosen, this time would equal 120 s (240 GI/2000 MIPS = 120 s). Analogous considerations apply to the communication time for the tasks in the third and fourth rows if the nodes of G2 were not picked.

The value Φ = 83.291 s is obtained by adding the computation time to execute 40 GI on the nodes of E1 (40 GI/600 MIPS = 66.666 s), the communication time to exchange 6000 Mbit with the neighbours on the same row (2·(6000 Mbit/(800 Mbit/s)) = 15 s), and the communication times needed to exchange 10 Mbit with the column neighbours in the upper row (10 Mbit/(10 Mbit/s) = 1.0 s) and in the lower one (10 Mbit/(16 Mbit/s) = 0.625 s).
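This breakdown can be checked numerically (the variable names are ours; units: GI = 1000 MI, powers in MIPS, volumes in Mbit, bandwidths in Mbit/s):

```python
comp = 40_000 / 600      # 40 GI on a 600-MIPS node of E1: 66.666 s
row  = 2 * (6000 / 800)  # 6000 Mbit to each of two row neighbours at 800 Mbit/s: 15 s
up   = 10 / 10           # 10 Mbit to the upper-row neighbour on A1 at 10 Mbit/s
down = 10 / 16           # 10 Mbit to the lower-row neighbour on G2 at 16 Mbit/s
phi = comp + row + up + down  # about 83.29 s
```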


Figure 5. The best task-to-node allocation of the application.

Simulation 2. This simulation has been carried out considering the same application requirements as in the previous case, but assuming a load in terms of both computations and communications. In particular, all the nodes are discharged except the nodes in [1,5] and [31,48], which have λ_ci(Δt) = 0.7, while λ_βij(Δt) = 0.8 for all the communication channels among the nodes i and j in the range [271,280] ∈ G2. For example, considering that the total available bandwidth of each node in G2 is 10 000 Mbit/s, as seen in Table II, this bandwidth load amounts to 8000 Mbit/s.

The best mapping found is

l* = {  17  18  19  20  21  22  23  24  73  74  75  76  77  78  79  80
       249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264
       101 102 103 104 105 106 107 108   9  10  11  12  13  14  15  16 }

with Φ = 102.8358 s. The tasks in the first and last rows have still been mapped on A1, those in the second on B1,

the tasks in the third and fourth rows on G1, while the ones in the fifth row on C. However, here the mapping proposed for Φ appropriately selects 16 out of the 19 discharged nodes in A1 (nodes in the range [6,24]).

Considering as an example node 252, onto which task 20 is allotted, the execution time is obtained by adding the computation time to execute 500 MI on a node of G1 (500 MI/200 MIPS = 2.5 s), the communication time needed for task 20 to exchange 200 Gbit with its two neighbour tasks 19 and 21, situated on the same row and mapped, respectively, on nodes 251 and 253, still on G1 (2·(200 Gbit/(4000 Mbit/s)) = 100 s), and the time necessary to exchange 10 Mbit along the column with task 12 on node 76 of B1 (10 Mbit/(30 Mbit/s) = 0.3333 s) and with task 28 on node 260, still belonging to G1 (10 Mbit/(4000 Mbit/s) = 0.0025 s).
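The same arithmetic for node 252 can be verified directly (names are ours; volumes in Mbit, bandwidths in Mbit/s):

```python
comp   = 500 / 200             # 500 MI on a 200-MIPS node of G1: 2.5 s
row    = 2 * (200_000 / 4000)  # 200 Gbit to each row neighbour at 4000 Mbit/s: 100 s
col_b1 = 10 / 30               # 10 Mbit to task 12 on node 76 (B1)
col_g1 = 10 / 4000             # 10 Mbit to task 28 on node 260 (G1)
phi = comp + row + col_b1 + col_g1  # about 102.8358 s
```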

In the same way, to deploy the tasks requiring high communication bandwidths, the mapping avoids choosing the nodes of G2 in [271,280], characterized by a bandwidth load equal to 80%, and appropriately selects nodes of G1. It is simple to ascertain that the allocation of these tasks on nodes not belonging to G1 would have yielded a value of Φ higher than 102.8358 s even if only the communications had been considered.

Simulation 3. Once it is established that the system works properly in predetermined conditions, we can test it over more realistic scenarios. With respect to the previous simulations, here the task computational requirements of the first row and the communication needs of the third and fourth rows are also changed. In particular, the tasks in the first row have to perform γ_k = 480 GI, while those in the third and fourth rows have to exchange ψ_ij = 180 Gbit.


Table VII. Average loads for the nodes of A1.

0.11959   0.11954   0.18894   0.34883   0.34001   0.26160   0.16695   0.14103
0.29403   0.28682   0.33949   0.14205   0.33668   0.15112   0.24191   0.31184
0.16307   0.32558   0.36197   0.12919   0.189464  0.10624   0.22343   0.15149

The other task requirements remain the same as in the above cases. These variations have been introduced to render the tasks of the first row crucial for evaluating the resource utilization time. This choice aims to facilitate the 'a posteriori' estimate of the proposed solutions.

This simulation has been carried out by changing the load in terms of both computations and bandwidth. In particular, all the nodes are discharged except those in [1,50], for which λ_ci(Δt) is randomly chosen in the range [0.1,0.4], while the communication channels are all unused, with the exception of the channels relative to the nodes in [151,280], in which λ_βij(Δt) is randomly picked in [0.1,0.9]. Obviously, in these conditions the optimal solution is unknown.

The best deployment achieved is

l∗ = { 2 12 1 8 20 22 14 24 126 127 131 133 133 134 121 135
       274 267 268 266 269 269 271 271 278 275 272 268 272 270 279 280
       182 200 184 185 186 187 188 189 25 26 11 37 13 30 23 48 }

with θ = 196.247 s.

Owing to the hypothesized computation and bandwidth loads, it is difficult to explain the resulting resource utilization time as was done in the previous tests. However, some considerations can be reported that help to prove the effectiveness of the mapping. It is worth noting that, to attain the reported value of θ, the first eight tasks, which have to execute γk = 480 GI and thus are the most time-consuming, must be allocated on nodes with an available power not lower than 480 GI/193.917 s = 2475.3 MIPS. To compute this value, the minimal communication time for these tasks in the absence of communication load must first be subtracted from 196.247 s. This time turns out to be equal to 2.33 s, i.e. 2 s to communicate along the row and 0.33 s to exchange 10 Mbit along the column, assuming the highest intersite bandwidth of 30 Mbit/s. The above power requirement means that the first eight tasks must be allotted on nodes of A1 with a computational load lower than 17.49%.

The resulting random loads for the nodes of A1 are shown in Table VII. As can be noted from the table, only 10 nodes satisfy the above load threshold and, among them, the solution has correctly selected those with the lowest load, which are outlined in bold. This explanation has been carried out with reference to the nodes of A1, as any different deployment of the first eight tasks on nodes not belonging to this site would have entailed a value of θ higher than that proposed, even if merely the execution time for their computation requests had been considered.
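The threshold reasoning above can be reproduced numerically. The short Python sketch below is our own illustration (variable names are not from the paper): it recomputes the load threshold from the figures quoted in the text and counts the eligible nodes in Table VII.

```python
# Recomputing the admission threshold for the first eight tasks
# (Simulation 3, Structure 1); all numbers are taken from the text.
theta = 196.247                      # resource utilization time of the best mapping (s)
comm = 2.0 + 10.0 / 30.0             # 2 s along the row + 10 Mbit over a 30 Mbit/s link
budget = theta - comm                # time left for computing 480 GI
mips_needed = 480e9 / budget / 1e6   # minimal residual power (MIPS)
load_max = 1.0 - mips_needed / 3000.0  # A1 nodes deliver 3000 MIPS at peak

table_vii = [0.11959, 0.11954, 0.18894, 0.34883, 0.34001, 0.26160, 0.16695,
             0.14103, 0.29403, 0.28682, 0.33949, 0.14205, 0.33668, 0.15112,
             0.24191, 0.31184, 0.16307, 0.32558, 0.36197, 0.12919, 0.189464,
             0.10624, 0.22343, 0.15149]
eligible = [l for l in table_vii if l < load_max]
print(round(100 * load_max, 2), len(eligible))   # → 17.49 10
```

The check confirms both the 17.49% threshold stated in the text and the count of 10 eligible nodes in Table VII.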

6.2. Structure 2

After a tuning phase, for the second structure each DDE instance has M = 30, whereas the values of all the other parameters involved remain unchanged. This experimental phase has been carried out by considering a master/slave application made up of P = 27 tasks: a master denoted with 27 plus 26 slaves numbered from 1 to 26 and divided into three groups: G1, arranged in a ring topology of 12 tasks; G2, organized as a totally connected architecture made up of 5 tasks; and G3, consisting of 9 tasks disposed in a mesh topology (see Figure 6).

The master task 27 has γk = 1 GI to execute, 10 Mbit to exchange with task 17 and 10 Gbit to exchange with each of tasks 12 and 26, these being the last tasks of each group.

Based on the amount of computations and communications, the 12 tasks of the ring are further subdivided into 3 subgroups, each composed of 4 tasks: G11[1,4], G12[5,8] and G13[9,12]. As regards the computations, each of the tasks of the first subgroup G11 must execute 500 GI, those of G12 have γk = 40 GI and the tasks of G13 must perform γk = 300 MI. Concerning


Figure 6. The mixed structure of the application.

Figure 7. The best task-to-node allocation of the application.

the communications, the tasks of G11, G12 and G13 exchange 100 Mbit, 9 Gbit and 600 Gbit, respectively, except for the couples of tasks (1,12), (4,5) and (8,9), which link the first and the last tasks of the three subgroups of the ring and exchange 10 Mbit.

Each of the five tasks in the totally connected topology has to carry out γk = 30 GI and exchanges ψij = 9 Gbit with each of the other tasks in its group.

According to the computations and communications to perform, the tasks of the mesh topology are also subdivided into three subgroups, each constituted by the tasks in the same row: G31[18,20], G32[21,23] and G33[24,26]. The tasks of these three subgroups have γk equal to 500 GI, 40 GI and 300 MI, respectively. Moreover, each task exchanges 10 Mbit with the tasks to which it is linked along the column while, with its partners along the row, it exchanges 100 Mbit, 9 Gbit and 600 Gbit depending on whether it belongs to the first, the second or the third subgroup. The execution of the parallel version of the evolutionary tool takes 80 s.
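To make the structure concrete, the whole application can be written down as a task interaction graph. The Python sketch below is our own encoding of the requirements just listed (γ expressed in GI, communications in Mbit); the dictionary layout and names are illustrative choices, not the paper's data structures.

```python
from itertools import combinations

# Structure-2 application as a TIG: gamma maps task -> GI, psi maps an
# undirected task pair -> Mbit to exchange (layout is our own sketch).
gamma = {t: 500 for t in range(1, 5)}           # G11: 500 GI each
gamma.update({t: 40 for t in range(5, 9)})      # G12: 40 GI
gamma.update({t: 0.3 for t in range(9, 13)})    # G13: 300 MI
gamma.update({t: 30 for t in range(13, 18)})    # G2: 30 GI
gamma.update({t: 500 for t in range(18, 21)})   # G31: 500 GI
gamma.update({t: 40 for t in range(21, 24)})    # G32: 40 GI
gamma.update({t: 0.3 for t in range(24, 27)})   # G33: 300 MI
gamma[27] = 1                                   # master: 1 GI

psi = {}
for a, b in [(1, 2), (2, 3), (3, 4)]: psi[(a, b)] = 100            # G11 arcs
for a, b in [(5, 6), (6, 7), (7, 8)]: psi[(a, b)] = 9_000          # G12: 9 Gbit
for a, b in [(9, 10), (10, 11), (11, 12)]: psi[(a, b)] = 600_000   # G13: 600 Gbit
for a, b in [(1, 12), (4, 5), (8, 9)]: psi[(a, b)] = 10            # subgroup junctions
for a, b in combinations(range(13, 18), 2): psi[(a, b)] = 9_000    # G2 complete graph
for row, amt in ((18, 100), (21, 9_000), (24, 600_000)):           # mesh rows
    psi[(row, row + 1)] = psi[(row + 1, row + 2)] = amt
for col in (18, 19, 20):                                           # mesh columns
    psi[(col, col + 3)] = psi[(col + 3, col + 6)] = 10
psi[(27, 17)] = 10                                                 # master links
psi[(27, 12)] = psi[(27, 26)] = 10_000                             # 10 Gbit each

print(len(gamma), len(psi))   # → 27 37
```

A quick count confirms the 27 tasks of the description and yields 37 distinct communication arcs (12 in the ring, 10 in the clique, 12 in the mesh, 3 for the master).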

Simulation 1. In this experiment it is assumed that all the nodes and the communication channels are completely discharged and, hence, ℓci(Δt) = 0.0 and ℓβij(Δt) = 0.0 ∀(i, j) ∈ [1,280]. The best allocation found is reported graphically in Figure 7.

The resource utilization time θ is equal to 168.8667 s, with the master process allocated on node 264 of G1.

Considering the supposed amount of computations and the supposed number of communications, the tasks decisive in determining the resource utilization time are those belonging to G11 and G31 for the computations and to G13 and G33 for the communications. The value θ corresponds to an optimal solution. In fact, the tasks requiring the highest amount of communications are correctly placed on nodes of G2, while the computationally heaviest ones are deployed on nodes of A1. The tasks belonging to G31 determine the resource utilization time. To make the evaluation explicit we can refer to just one of these tasks: θ is attained by adding the computation time to execute 500 GI of task 19 mapped on node 20 of A1 (500 GI/3000 MIPS = 166.6667 s), the communication time to exchange 100 Mbit with task 18 on node 8 and with task 20 on node 23 (2 × (100 Mbit/(100 Mbit/s)) = 2 s), and finally the time needed to exchange 10 Mbit with task 22 deployed on node 29 of


A2, which exploits the intersite bandwidth of 50 Mbit/s between two sub-sites belonging to A (10 Mbit/(50 Mbit/s) = 0.2 s). It is simple to ascertain that any different allocation of the tasks of the groups G11([1,4]) and G31([18,20]), which are the heaviest from the computational point of view, on nodes not belonging to the site A1 would entail an execution time much greater than that proposed. For instance, if the nodes of A2 were used, the execution time due to just the computation would be equal to 250 s. Analogous considerations can also be adduced for the communication time if the tasks of the subgroups G13 and G33, requiring the highest amount of communications, were not placed on the nodes of G2, which have the highest intrasite bandwidths.
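The derivation of θ above reduces to a few lines of arithmetic. The following Python check (our own, not the paper's code) reproduces the figure from the quantities stated in the text:

```python
# Reproducing theta for Simulation 1 (Structure 2): task 19 on a fully
# discharged 3000 MIPS node of A1 dominates the resource utilization time.
comp = 500e9 / 3000e6       # 500 GI at 3000 MIPS -> 166.6667 s
row = 2 * (100.0 / 100.0)   # 100 Mbit to each of tasks 18 and 20 at 100 Mbit/s
col = 10.0 / 50.0           # 10 Mbit to task 22 over the 50 Mbit/s intersite link
theta = comp + row + col
print(round(theta, 4))      # → 168.8667
```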

Simulation 2. This simulation has been carried out by varying the computational and bandwidth loads. In particular, all the nodes are discharged except the nodes in the ranges [1,10] and [166,184], which have ℓci(Δt) = 0.7, while all the node channels are unused apart from those in [271,280], which present ℓβij(Δt) = 0.8.

The best mapping found is

l∗ = { 20 17 18 11 45 45 46 46 263 263 263 263 111 112 106 113 109
       24 12 21 28 30 30 249 250 250 252 }

with θ = 168.8667 s and the master process allocated on node 252 of G1.

The value of θ for the proposed solution is still the minimal one. In fact, since once more the

maximal execution time is determined by task 19, placed on node 12 of A1, the computation time (166.6667 s), the communication overhead to exchange 100 Mbit along the row with task 18 on node 24 and with task 20 on node 21, both of A1 (2 s), and the communication time to exchange 10 Mbit with task 22 allotted on node 30 of A2 (0.2 s) hold the same values as in the previous case.

In this case the tool provides a mapping which excludes the loaded nodes and, among the discharged ones, as occurred in the former simulation, selects those with the most suitable characteristics in terms of both computations and bandwidths, when a sufficient number of such nodes is available. This experiment is useful to demonstrate the general ability of the evolutionary tool to adapt to the status of the grid resources.

Simulation 3. The test has been performed by assuming a random load in terms of both computations and communications. In particular, all the nodes are discharged except the nodes in the range [1,50], for which ℓci(Δt) is randomly set in [0.01,0.5], and those in [151,280], for which ℓβij(Δt) is hypothesized to be set randomly in [0,0.5].

The best solution is

l∗ = { 1 14 5 9 90 91 89 90 277 278 279 279 126 127 128 129 136
       18 20 21 110 104 105 272 275 267 276 }

with θ = 188.7065 s and the master process allocated on node 276 of G2.

This simulation is very close to real conditions since the loads of nodes and bandwidths are

randomly distributed. Unfortunately, in this situation it is extremely difficult, if not prohibitive, for a user, even a particularly skilled one, to identify the criteria on the basis of which the placement should be performed to attain a suboptimal mapping.

To explain the effectiveness of the solution found, we report in Table VIII the values of the random loads for the nodes of A1 only, to verify that those involved in the proposed solution are the most appropriate ones. In particular, the nodes with a computational load lower than 11% are outlined in bold.

There are seven tasks which must execute 500 GI. Thus, with reference to the computation time only, to obtain a value lower than θ = 188.7181 s, seven nodes with residual power greater than

Table VIII. Average loads for the nodes of A1.

0.10287   0.01781   0.34801   0.41478   0.10738   0.12005   0.27454   0.15823
0.01317   0.28179   0.37737   0.39167   0.16647   0.06551   0.16098   0.38713
0.02767   0.07952   0.44071   0.06635   0.10096   0.41668   0.43331   0.33134


2649.454 MIPS, i.e. with an average load lower than 11.684%, have to be selected. As can be noted from Table VIII, only the nine nodes reported in bold have an average computational load, in percentage, lower than this threshold.

In the same table, the nodes picked for the execution of the tasks belonging to the subgroups G11[1,4] and G31[18,20] are underlined. It is easy to verify that all these nodes present an average load lower than 11% and that not all the nodes with the lowest average loads have been selected. This implies that the proposed mapping is a suboptimal solution. When during the evolution an optimal node is discarded, it is possible that other nodes are also avoided. In our case the solution has picked four out of the six available nodes with the lowest average computational load. This fact is tied to the fitness function, which chooses for the execution of any single task the node with the appropriate power, and not simply the most powerful one, if this choice does not contribute to a reduction in the maximal among the time intervals which each node dedicates to the execution of the tasks assigned to it. This leads to a better employment of the grid resources, leaving free, when possible, the most powerful nodes, which could be more profitably used for other applications. Hence, if there is a task whose execution slows down the performance of the whole application, as in our case task 3 on node 5, it is senseless to further speed up the execution time of the other co-allocated tasks belonging to the same application. Thus, considering that, in the starting hypotheses, it is difficult to manually sketch out a good mapping, the solution found can be regarded, on the basis of the considerations adduced, as a satisfactory suboptimal one.
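The behaviour just described, preferring an 'appropriate' node over the most powerful one, follows directly from a max-based fitness. The sketch below is a toy reformulation of that principle (our own names and numbers, computation time only, communications omitted), not the paper's implementation:

```python
# Fitness = largest per-node utilization time over the nodes of the mapping.
def fitness(mapping, gamma, residual_mips):
    """mapping: task -> node; gamma: task -> GI; residual_mips: node -> MIPS."""
    node_time = {}
    for task, node in mapping.items():
        t = gamma[task] * 1e9 / (residual_mips[node] * 1e6)
        node_time[node] = node_time.get(node, 0.0) + t
    return max(node_time.values())

gamma = {1: 500, 2: 40}                          # a heavy and a light task
speed = {"fast": 3000, "medium": 2678, "spare": 2678}
a = fitness({1: "medium", 2: "fast"}, gamma, speed)
b = fitness({1: "medium", 2: "spare"}, gamma, speed)
# Moving the light task off the fastest node leaves the fitness unchanged,
# so the mapper may leave "fast" free for other applications:
print(a == b, round(a, 4))   # → True 186.7065
```

Both placements have the same cost because the heavy task on the "medium" node dominates, which is exactly why accelerating the other co-allocated tasks is senseless.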

The value of θ is attained by considering the computation time of the task executed on the node with the highest average load, which in our case is task 3 mapped on node 5 with ℓci(Δt) = 0.10738. The value of 188.7065 s is obtained by adding the computation time (500 GI/2678 MIPS = 186.7065 s), the time to exchange 100 Mbit with task 2 on node 14 (100 Mbit/(100 Mbit/s) = 1 s) and with task 4 on node 9 (100 Mbit/(100 Mbit/s) = 1 s).
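Again, the reported value can be checked with elementary arithmetic. In this Python fragment (ours), the residual power of node 5 is the rounded 2678 MIPS quoted in the text:

```python
# Checking theta for Simulation 3 (Structure 2): task 3 on node 5,
# a 3000 MIPS node at 0.10738 average load (~2678 MIPS residual).
comp = 500e9 / 2678e6                   # computation time of task 3
comm = 100.0 / 100.0 + 100.0 / 100.0    # 100 Mbit each to tasks 2 and 4 at 100 Mbit/s
theta = comp + comm
print(round(theta, 4))                  # → 188.7065
```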

7. CONCLUSIONS AND FUTURE WORK

This paper faces the multisite mapping problem of computationally challenging multi-task applications, modelled as TIGs, onto shared and heterogeneous computational grids by means of a DDE algorithm. In particular, for each application the goal is to minimize the greatest among the time intervals which the nodes in the mapping solution dedicate to the execution of the tasks assigned to them. The results demonstrate that the approach is effective and show that the parallel algorithm performs better than the comparably sized sequential version in terms of both the quality of the solution and the time to attain it.

A question might arise about the actual ability of a unique mapper to satisfy all the requests coming from a whole grid. In [78] very interesting statistical information is reported on the applications submitted to a set of nine well-known grid environments. Namely, reference is made to parameters such as the inter-arrival time between successive applications, run time, wait time and application parallelism. It is interesting to read that on average an application is submitted to a grid every 51 s, yet 90% of them are single-processor, so that a grid mapper would be invoked every 510 s on average. Moreover, that paper also reveals that around 90% of the submitted applications have a task parallelism not higher than 15. Our mapper takes about 80 s to map a 27-task application and about 110 s for a 48-task one, hence it is very likely capable of managing a number of requests like those described in the above report. Nonetheless, the aim here is just to show that a DDE-based approach to mapping in grids is effective, but a future goal might also be the design of a multisite mapper model, in which each site runs an instance and the different mappers update the required information at the same time. This would require a synchronization mechanism.
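The capacity argument can be spelled out numerically; the figures below are those quoted from [78] and from our own timings:

```python
# Back-of-the-envelope check of the mapper-load argument.
inter_arrival = 51.0        # one application submitted to a grid every 51 s [78]
multi_task_share = 0.10     # only ~10% of applications need the multi-task mapper [78]
mapper_period = inter_arrival / multi_task_share   # one mapping request per 510 s
worst_mapping_time = 110.0  # our mapper on a 48-task application (s)
print(mapper_period, worst_mapping_time < mapper_period)   # → 510.0 True
```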

Future work will also include a dynamic measurement of the load of the grid nodes and of the related bandwidth overhead.

Finally, since Quality of Service (QoS) assumes an important role for many grid applications, we intend to enrich our tool so that it will be able to manage multiple QoS requirements, such as those on performance, reliability, bandwidth, cost, response time and so on.


REFERENCES

1. Berman F. High-performance schedulers. The Grid: Blueprint for a Future Computing Infrastructure, Foster I, Kesselman C (eds.). Morgan Kaufmann: San Francisco, CA, U.S.A., 1998; 279–307.

2. Schopf JM. Ten actions when grid scheduling: The user as a grid scheduler. Grid Resource Management: State of the Art and Future Trends, Nabrzyski J, Schopf JM, Weglarz J (eds.). Kluwer Academic Publishers: Norwell, MA, U.S.A., 2004; 15–23.

3. Mateescu G. Quality of service on the grid via metascheduling with resource co-scheduling and co-reservation. International Journal of High Performance Computing Applications 2003; 17(3):209–218.

4. Qu C. A grid advance reservation framework for co-allocation and co-reservation across heterogeneous local resource management systems. Proceedings of the Seventh International Conference on Parallel Processing and Applied Mathematics (Lecture Notes in Computer Science, vol. 4967), Wyrzykowski R, Chapman BM, Subhlok J, Fernandes de Mello R, Yang TL (eds.). Springer: Berlin/Heidelberg, 2007; 770–779.

5. Elmroth E, Tordsson J. Grid resource brokering algorithms enabling advance reservations and resource selection based on performance predictions. Future Generation Computer Systems 2008; 24(6):585–593.

6. Ibarra OH, Kim CE. Heuristic algorithms for scheduling independent tasks on non-identical processors. Journal of the Association for Computing Machinery 1977; 24(2):280–289.

7. Fernandez-Baca D. Allocating modules to processors in a distributed system. IEEE Transactions on Software Engineering 1989; 15(11):1427–1436.

8. Zhang W, Fang B, Hu M, Liu X, Zhang H, Gao L. Multisite co-allocation scheduling algorithms for parallel jobs in computing grid environments. Science in China Series F: Information Sciences 2006; 49(6):906–926.

9. Long DL, Clarke LA. Task interaction graphs for concurrency analysis. Proceedings of the 11th International Conference on Software Engineering, Pittsburgh, PA, U.S.A., 1989; 44–52.

10. Sadayappan P, Ercal F, Ramanujam J. Cluster partitioning approaches to mapping parallel programs onto a hypercube. Parallel Computing 1990; 13:1–16.

11. Braun TD, Siegel HJ, Beck N, Bölöni LL, Maheswaran M, Reuther AI, Robertson JP, Theys MD, Yao B. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing 2001; 61(6):810–837.

12. Mühlenbein H, Voigt HM. Gene pool recombination in genetic algorithms. Metaheuristics: Theory and Applications, Kelly JP, Osman IH (eds.). Kluwer Academic Publishers: Norwell, MA, U.S.A., 1996; 53–62.

13. Glover FW, Kochenberger GA (eds.). Handbook of Metaheuristics. Kluwer Academic Publishers: Norwell, MA, U.S.A., 2003.

14. Abraham A, Buyya R, Nath B. Nature's heuristics for scheduling jobs on computational grids. Proceedings of the Eighth IEEE International Conference on Advanced Computing and Communication, Cochin, India, 2000; 45–52.

15. Xhafa F, Alba E, Dorronsoro B, Duran B. Efficient batch job scheduling in grids using cellular memetic algorithms. Journal of Mathematical Modelling and Algorithms 2008; 7(2):217–236.

16. Kim S, Weissman JB. A genetic algorithm based approach for scheduling decomposable data grid applications. Proceedings of the International Conference on Parallel Processing (ICPP'04), Montreal, Canada, 2004; 406–413.

17. Bose A, Wickman B, Wood C. MARS: A metascheduler for distributed resources in campus grids. Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (GRID'04), Pittsburgh, PA, U.S.A., 2004; 110–118.

18. Song S, Kwok YK, Hwang K. Security-driven heuristics and a fast genetic algorithm for trusted grid job scheduling. Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS), Denver, CO, U.S.A., April 2005; 65.

19. Onwubolu G, Davendra G. Scheduling flow shops using differential evolution algorithm. European Journal of Operational Research 2006; 171:674–692.

20. Kwok Y-K, Maciejewski AA, Siegel HJ, Ahmad I, Ghafoor A. A semi-static approach to mapping dynamic iterative tasks onto heterogeneous computing systems. Journal of Parallel and Distributed Computing 2006; 66(1):77–98.

21. Dorronsoro B, Bouvry P, Cañero JA, Maciejewski AA, Siegel HJ. Multi-objective robust static mapping of independent tasks on grids. Proceedings of the World Congress on Computational Intelligence (WCCI), Barcelona, Spain, July 2010.

22. Yu J, Buyya R, Ramamohanarao K. Workflow scheduling algorithms for grid computing. Metaheuristics for Scheduling in Distributed Computing Environments, Studies in Computational Intelligence, Xhafa F, Abraham A et al. (eds.). Springer: Berlin/Heidelberg, 2008; 173–214.

23. De Falco I, Della Cioppa A, Scafuri U, Tarantino E. Multiobjective Differential Evolution for mapping in a grid environment. Proceedings of the High Performance Computing Conference (Lecture Notes in Computer Science, vol. 4782), Perrott R, Dongarra J, Karczewski K, Wasniewski J (eds.). Springer: Berlin, 2007; 322–333.

24. Price K, Storn R. Differential evolution. Dr. Dobb's Journal 1997; 22(4):18–24.

25. Storn R, Price K. Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 1997; 11(4):341–359.

26. Hluchy L, Senar M, Dobrucky M, Viet TD, Ripoli A, Cortes A. Mapping and scheduling of parallel programs. Parallel Program Development for Cluster Computing: Methodology, Tools and Integrated Environments, Advances


in Computation: Theory and Practice, vol. 5, chapter 3, Boriotti S, Dennis D (eds.). Nova Science Publishers Inc: Huntington, NY, U.S.A., 2001; 45–68.

27. Kasahara H, Narita S. Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Transactions on Computers 1984; 33(11):1023–1029.

28. Chan F, Cao J, Sun Y. Graph scaling: A technique for automating program construction and deployment in ClusterGOP. Proceedings of the Fifth International Workshop on Advanced Parallel Processing Technologies (Lecture Notes in Computer Science, vol. 2834). Springer: Berlin/Heidelberg, 2003; 254–264.

29. Dong F, Akl SG. Scheduling algorithms for grid computing: State of the art and open problems. Technical Report 2006-504, School of Computing, Queen's University, Kingston, Ontario, Canada, 2006.

30. Yu J, Buyya R, Ramamohanarao K. Workflow scheduling algorithms for grid computing. Metaheuristics for Scheduling in Distributed Computing Environments, Xhafa F, Abraham A (eds.). Springer: Berlin, Germany, 2008.

31. Munir EU, Li J-Z, Shi S-F, Rasool Q. Performance analysis of task scheduling heuristics in grid. Proceedings of the International Conference on Machine Learning and Cybernetics, Hong Kong, 2007; 3093–3098.

32. Izakian H, Abraham A, Snasel V. Comparison of heuristics for scheduling independent tasks on heterogeneous distributed environments. Proceedings of the International Joint Conference on Computational Sciences and Optimization, Sanya, Hainan, 2009; 8–12.

33. Ritchie G, Levine J. A hybrid ant algorithm for scheduling independent jobs in heterogeneous computing environments. Proceedings of the 23rd Workshop of the UK Planning and Scheduling Special Interest Group (PlanSIG 2004), University College Cork, Ireland, December 2004.

34. Xhafa F, Carretero J, Dorronsoro B, Alba E. A tabu search algorithm for scheduling independent jobs in computational grids. Computing and Informatics Journal, Special Issue on Intelligent Computational Methods and Models 2009; 28:1001–1014.

35. Yarkhan A, Dongarra JJ. Experiments with scheduling using simulated annealing in a grid environment. Proceedings of the Third International Workshop on Grid Computing (Lecture Notes in Computer Science, vol. 2536). Springer: Berlin/Heidelberg, 2002; 232–242.

36. Page AJ, Naughton TJ. Framework for task scheduling in heterogeneous distributed computing using genetic algorithms. Artificial Intelligence Review 2004; 24:137–146.

37. Carretero J, Xhafa F, Abraham A. Genetic algorithm based schedulers for grid computing systems. International Journal of Innovative Computing, Information and Control 2007; 3(5):1053–1071.

38. Casanova H, Legrand A, Zagorodnov D, Berman F. Heuristics for scheduling parameter sweep applications in grid environments. Proceedings of the Ninth Heterogeneous Computing Workshop, Cancun, Mexico, 2000; 349–363.

39. Beaumont O, Legrand A, Robert Y. Scheduling divisible workloads on heterogeneous platforms. Parallel Computing 2003; 29(9):1121–1152.

40. Fujimoto N, Hagihara K. A comparison among grid scheduling algorithms for independent coarse-grained tasks. Proceedings of the Symposium on Applications and the Internet Workshops, Tokyo, Japan, 2004; 674.

41. Sugavanam P, Siegel HJ, Maciejewski AA, Oltikar M, Mehta A, Pichel R, Horiuchi A, Shestak V, Al-Otaibi M, Krishnamurthy Y, Ali S, Zhang J, Aydin M, Lee P, Guru K, Raskey M, Pippin D. Robust static allocation of resources for independent tasks under makespan and dollar cost constraints. Journal of Parallel and Distributed Computing 2007; 67:400–416.

42. Wang L, Siegel HJ, Roychowdhury VP, Maciejewski AA. Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach. Journal of Parallel and Distributed Computing 1997; 47:8–22.

43. Theys MD, Braun TD, Kwok Y-K, Siegel HJ, Maciejewski AA. Mapping of tasks onto distributed heterogeneous computing systems using a genetic algorithm approach. Solutions to Parallel and Distributed Computing Problems: Lessons from Biological Sciences, Zomaya AY (ed.). Wiley: New York, NY, U.S.A., 2001; 135–178.

44. Shestak V, Chong EKP, Siegel HJ, Maciejewski AA, Benmohamed L, Wang I-J, Daley R. A hybrid branch-and-bound and evolutionary approach for allocating strings of applications to heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing 2008; 68(4):410–426.

45. Sinaga JMP, Mohamed HH, Epema DHJ. A dynamic co-allocation service in multicluster systems. Proceedings of the Tenth Workshop on Job Scheduling Strategies for Parallel Processing (Lecture Notes in Computer Science, vol. 3277), Feitelson DG, Rudolph L, Schwiegelshohn U (eds.). Springer: Berlin, 2005; 194–209.

46. Bucur AID, Epema DHJ. The maximal utilization of processor co-allocation in multicluster systems. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, 2003; 60–69.

47. Mohamed HH, Epema DHJ. The design and implementation of the KOALA co-allocating grid scheduler. Proceedings of the European Grid Conference (Lecture Notes in Computer Science, vol. 3470). Springer: Berlin/Heidelberg, 2005; 640–650.

48. Zhang W, Cheng AMK, Hu M. Multisite co-allocation algorithms for computational grid. Proceedings of the Parallel and Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, April 2006.

49. Di Martino V, Mililotti M. Suboptimal scheduling in a grid using genetic algorithms. Parallel Computing 2004; 30:553–565.

50. Gao Y, Huang JZ, Rong H. Adaptive grid job scheduling with genetic algorithm. Future Generation Computer Systems 2005; 21:151–161.


51. Hall R, Rosenberg AL, Venkataramani A. A comparison of DAG-scheduling strategies for internet-based computing. Proceedings of the 22nd International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, U.S.A., 2007; 1–9.

52. Topcuoglu H, Hariri S, Wu M. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 2002; 13(3):260–274.

53. Muthuvelu N, Liu J, Soe NL, Venugopal SR, Sulistio A, Buyya R. A dynamic job grouping-based scheduling for deploying applications with fine-grained tasks on global grids. Proceedings of the Third Australasian Workshop on Grid Computing and e-Research, Australia, 2005.

54. Cao H, Jin H, Wu X, Wu S, Shi X. DAGMap: Efficient and dependable scheduling of DAG workflow job in grid. The Journal of Supercomputing 2010; 51:201–223.

55. Dang DM. Mapping heavy communication workflows onto grid resources within an SLA context. Lecture Notes in Computer Science, vol. 4208, 2006; 727–736.

56. Dang DM, Hsu DF. Mapping heavy communication grid-based workflows onto grid resources within an SLA context using metaheuristics. International Journal of High Performance Computing Applications 2008; 22(3):330–346.

57. Aziz A, El-Rewini H. Grid resource allocation and task scheduling for resource intensive applications. Proceedings of the International Conference on Parallel Processing, Columbus, OH, U.S.A., 2006; 58–65.

58. Hui C-C, Chanson ST. Allocating task interaction graphs to processors in heterogeneous networks. IEEE Transactions on Parallel and Distributed Systems 1997; 8(9):908–925.

59. Chaudhary V, Aggarwal JK. A generalized scheme for mapping parallel algorithms. IEEE Transactions on Parallel and Distributed Systems 1993; 4(3):328–346.

60. Jain A, Sanyal S, Das SK, Biswas R. FastMap: A distributed scheme for mapping large scale applications onto computational grids. Proceedings of the Second International Workshop on Challenges of Large Applications in Distributed Environments, Arlington, TX, U.S.A., 2004; 118–127.

61. Baraglia R, Ferrini R, Ricci L, Tonellotto N, Yahyapour R. A launch-time scheduling heuristic for parallel applications on wide area networks. Journal of Grid Computing 2008; 6:159–175.

62. Sullivan III T, Werthimer D, Bowyer S, Cobb J, Gedye D, Anderson D. A new major SETI project based on Project Serendip data and 100 000 personal computers. Proceedings of the Fifth International Conference on Bioastronomy, Cosmovici CB, Bowyer S, Werthimer D (eds.). Compositori: Bologna, Italy, 1997.

63. Ki-Hyung K, Sang-Ryoul H. Mapping cooperating grid applications by affinity for resource characteristics. (Lecture Notes in Computer Science, vol. 3397). Springer: Berlin, 2005; 313–322.

64. Wolski R, Spring N, Hayes J. The network weather service: A distributed resource performance forecasting service for metacomputing. Future Generation Computer Systems 1999; 15(5–6):757–768.

65. Gong L, Sun XH, Watson E. Performance modeling and prediction of non-dedicated network computing. IEEE Transactions on Computers 2002; 51(9):1041–1055.

66. Sanjay HA, Vadhiyar S. Performance modeling of parallel applications for grid scheduling. Journal of Parallel and Distributed Computing 2008; 68:1135–1145.

67. Foster I, Kesselman C, Tuecke S. The anatomy of the Grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications 2001; 15(3):200–222.

68. Foster I, Kesselman C. The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann: San Francisco, CA, 2003.

69. Adzigogov L, Soldatos J, Polymenakos L. EMPEROR: An OGSA grid meta-scheduler based on dynamic resource predictions. Journal of Grid Computing 2005; 3(1–2):19–37.

70. Taylor V, Wu X, Geisler J, Li X, Lan Z, Hereld M, Judson I, Stevens R. Prophesy: An infrastructure for performance analysis and modeling of parallel and grid applications. ACM SIGMETRICS Performance Evaluation Review 2003; 30(4):13–18.

71. Nobakhti A, Wang H. A simple self-adaptive Differential Evolution algorithm with application on the ALSTOM gasifier. Applied Soft Computing 2008; 8:350–370.

72. Das S, Konar A, Chakraborty UK, Abraham A. Differential Evolution with a neighborhood-based mutation operator: A comparative study. IEEE Transactions on Evolutionary Computation 2009; 13(3):526–553.

73. Kromer P, Snasel V, Platos P, Abraham A, Izakian H. Scheduling independent tasks on heterogeneous distributed environments. Proceedings of the International Conference on Intelligent Networking and Collaborative Systems (InCoS 2009), Barcelona, Spain, 2009; 170–174.

74. Cantú-Paz E. A summary of research on parallel genetic algorithms. Technical Report 95007, University of Illinois, Urbana-Champaign, IL, U.S.A., July 1995.

75. Mühlenbein H. Evolution in time and space—The parallel genetic algorithm. Foundations of Genetic Algorithms, Rawlins GJE (ed.). Morgan Kaufmann: San Mateo, CA, 1992; 316–337.

76. Snir M, Otto S, Huss-Lederman S, Walker D, Dongarra J. MPI: The Complete Reference—The MPI Core, vol. 1. MIT Press: Cambridge, MA, 1998.

77. Storn R. Available at: http://read.pudn.com/downloads76/sourcecode/math/279783/de36.c__.htm [24 October 2010].

78. Iosup A, Li H, Jan M, Anoep S, Dumitrescu C, Wolters L, Epema DHJ. The Grid Workloads Archive. Future Generation Computer Systems 2008; 24(7):672–686.
