
A Survey of Task Mapping on Production Grids

XAVIER GREHANT and ISABELLE DEMEURE, Institut Telecom, Telecom ParisTech, CNRS, LTCI
SVERRE JARP, CERN openlab

Grids designed for computationally demanding scientific applications started experimental phases ten years ago and have been continuously delivering computing power to a wide range of applications for more than half of this time. The observation of their emergence and evolution reveals actual constraints and successful approaches to task mapping across administrative boundaries. Beyond differences in distributions, services, protocols, and standards, a common architecture is outlined. Application-agnostic infrastructures built for resource registration, identification, and access control dispatch delegation to grid sites. Efficient task mapping is managed by large, autonomous applications or collaborations that temporarily infiltrate resources for their own benefit.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems; C.4 [Computer Systems Organization]: Performance of Systems—Design studies; K.6.0 [Management of Computing and Information Systems]: General

General Terms: Design, Management, Performance, Reliability

Additional Key Words and Phrases: Computing Grids, Resource Allocation, Task Scheduling, Task Mapping, Grid Architecture, Resource Utilization, Meta-Scheduling, Pull Model, Late Binding

ACM Reference Format:
Grehant, X., Demeure, I., and Jarp, S. 2013. A survey of task mapping on production grids. ACM Comput. Surv. 45, 3, Article 37 (June 2013), 25 pages.
DOI: http://dx.doi.org/10.1145/2480741.2480754

1. INTRODUCTION

1.1. Scope

The focus of this article is on task mapping in production grids. The historical perspective aims to help identify fundamental constraints behind grid resource allocation and the logic of the evolution. This survey is written as a bibliographical basis for further research towards efficiency in grids.

In this section, we define important concepts and specify the limits of our study.

Definition 1 (Grid [Foster et al. 2001]). Grids coordinate resource sharing and problem solving in dynamic, multi-institutional virtual organizations.


In the original definition, virtual organizations federate individuals to participate in grids.

Virtual Organizations enable disparate groups of organizations and/or individuals to share resources in a controlled fashion, so that members may collaborate to achieve a shared goal [Foster et al. 2001].

In practice, resource users are distinguished from resource providers.

Definition 2 (Virtual Organization). A virtual organization (VO) is a collaboration of individual users, perceived from the outside as a single user because they identify as such, who run the same applications and have common objectives.

Definition 3 (Grid Site). A grid site is a set of grid resources under a single administrative domain, including computing and storage resources.

An institution or organization controls the management of its own sites, including configuration, access control, resource allocation mechanisms, and policies.

Definition 4 (Production Grid). A production grid is a solution implemented and used for the execution of large-scale applications from independent collaborations (VOs) on resources that span independent institutions (grid sites).

Definition 5 (Task Mapping [Kim et al. 2007]). An important research problem is how to assign resources to tasks (match) and order the execution of tasks on the resources (schedule) to maximize some performance criterion of a heterogeneous computing system. This procedure of matching and scheduling is called mapping or resource allocation.

The mechanisms developed for resource allocation include dynamic scheduling and configuration, market simulations, feedback control, and a variety of heuristics [Milojicic et al. 2000; Ricci et al. 2006; Chen and Zhang 2009]. This survey identifies the mechanisms used in production grids. The analysis of production grid designs, their emergence, and evolution helps apprehend the fundamental constraints that should be taken into account when pursuing research on grid resource allocation for computationally demanding applications.

This article is of limited relevance to the following connected areas.

—Clusters. The term grid is often misused to designate clusters inside a single organization. A cluster is a set of connected computers inside the same administrative domain. Definition 1 eliminates the confusion.

—Research Grids. Research grids aggregate resources from research teams in distributed systems. Research teams lend slices of their own local resources to the community. In return, they access distributed slices from other members to test their systems on large-scale testbeds. PlanetLab, Grid5000, and FutureGrid belong in this category [Robert 2003; Fiuczynski 2006; Crowcroft et al. 2008]. The distinction between research and production grids is not a judgement on the readiness of their resource allocation systems. The allocation systems of PlanetLab and Grid5000 (OAR) are both production quality and in use 24/7 [Peterson et al. 2005; Georgiou and Richard 2009]. Users of a research grid experiment with the deployment of networked services, while users of a production grid run computations. The allocation problem in research grids is not covered in this survey.

—Prototypes. The goal is to capture constraints that govern grids in order for the reader to build insight into the relevance of research hypotheses in this area. Therefore, the analysis does not cover grid prototypes, that is, solutions proposed as part of research on grids and not used in production.


—Desktop grids. A desktop grid (such as SETI@home or any system based on BOINC) is the distribution of computations from a single application to personal computers [Anderson 2003]. Task mapping faces fewer constraints on a desktop grid than on a grid with many users and applications, and where resource providers are entire institutions. However, the reader familiar with desktop grids may find that some elements of this analysis also apply to their context.

—Enterprise clouds. Companies such as Amazon Web Services sell access to computing resources. Every paying user obtains dedicated and isolated virtual storage devices and processors, which she manages just like her own cluster. In grids, the responsibilities of resource users and providers are not so well distinguished. At the end of the article, we see how grids are evolving towards a closer relationship with computing clouds.

1.2. Outline

The remainder of this article is organized as follows. Section 2 describes the aggregation of multiple sites into a grid. Distribution of control introduces a number of challenges, such as security, user identification, information flow, seamless resource integration, and central monitoring. Initially, grid designs were focused on solving these issues rather than on optimizing computing performance. Section 3 identifies constraints on allocation and their impact on performance. To improve performance, major users centralize control on temporarily accessed resources. Section 4 presents this trend and characterizes the resulting environments on which efficient allocation can take place.

2. FEDERATING RESOURCES

Grid infrastructures take on a distinctive challenge. They aggregate resources from different institutions to run multiple applications from multiple users. Section 2.1 presents major grids, their applications, and their participants, and Section 2.2 presents the motivation for their federation in grids. They share workload constraints (Section 2.3) and parallelization methods (Section 2.4). The initial idea was to replicate the structure of cluster management systems (Section 2.5) to face the same problems in the wide area (Section 2.6).

2.1. Infrastructures

Grids started an initial exploratory phase in 2000 [Ruda 2001]. Production grids were designed to serve as large-scale computing resources for computationally demanding scientific workloads. Around 2004, they started to be used continuously for a wide range of applications. In production grids, academic or other publicly funded institutions voluntarily offer a certain amount of their computing resources to external scientific projects.

A few grid infrastructures scale to tens of thousands of nodes. They aggregate computing, data warehousing, and networking facilities from several institutions. They distribute middleware to operate resources and manage users and applications. Grid-specific components are often open-source academic projects.

The following is a non-exhaustive list of grid infrastructures.

—LCG. LCG is the computing grid for the Large Hadron Collider, the CERN particle accelerator in Switzerland [Andreetto 2004; Codispoti et al. 2009]. It aggregates sites mainly in Europe, but also in Taiwan and Korea. It groups together over 41,000 CPUs from 240 sites. The LCG is supported by the European Commission and more than 90 organizations from over 30 countries. Participating regions identify their contributions separately, like GridPP in the U.K. [the GridPP Collaboration 2006]. LCG resources also support applications from different scientific domains. It forms a general-purpose European grid. EGEE (Enabling Grids for E-sciencE) carried out its coordination at CERN from 2003 to 2009, and EGI (European Grid Initiative), a federation of national projects, has done so since 2010.

—OSG. The Open Science Grid started in 2005. It is funded by U.S. LHC software and computing programs, the National Science Foundation (NSF), and the U.S. Department of Energy. It continued Grid3, started in 2003 [Avery 2007].

—TeraGrid. TeraGrid started in 2001 with funds from the NSF to establish a Distributed Terascale Facility (DTF) [Pennington 2002; Hart 2011]. It brings together nine major national computer centers in the U.S. It is deployed on several teraFLOPS systems, including a petaFLOPS system. TeraGrid is continually increasing in capacity thanks to NSF grants.

—NorduGrid. NorduGrid was funded in 2001 by the NORDUNet2 program to build a grid for countries in northern Europe based on the ARC middleware [Ellert et al. 2007]. The NORDUNet2 program aimed to respond to the “American challenge” of the Next Generation Initiative (NGI) and Internet2 (I2). NorduGrid provides around 5,000 CPUs over 50 sites.

Each of these grids is distinguished by the hardware resources integrated, the organizations supplying these resources, the projects supported, and the middleware.

These infrastructures must not be confused with software development projects like Gridbus1, Globus2, VDT3, Unicore [Streit et al. 2010], and the Naregi Grid Middleware [Miura 2006; Sakane et al. 2009], which distribute consistent sets of grid middleware components and are active in the standardization effort [Foster et al. 2002]. Grid infrastructures use and redistribute some of these components [Asadzadeh et al. 2004].

2.2. Objectives

Participants in grids usually have one of the following intents. They either want to consolidate CPU cycles or avoid data movement.

Consolidated resources from multiple institutions make it possible to solve problems of common interest, where the efforts of a single institution would require unreasonable execution time. The resolution of NUG30, a quadratic assignment problem, illustrates the use of a grid for high performance computing (HPC) [Anstreicher et al. 2000; Goux et al. 2000]. The intent in HPC is to maximize the execution speed of each application. However, supercomputers are better suited for HPC in general. They offer a lower latency between computing units and schedule processes at a lower level.

By contrast with HPC, high throughput computing (HTC) systems intend to maximize the sustained, cumulative amount of computation executed. A grid consolidates resource allocation from multiple institutions and accepts applications from external collaborations. Its overall throughput is potentially higher than the sum of the throughputs of participating institutions acting separately. However, an improvement is realized only if allocation is sufficiently accurate and responsive.

Grids have been prominently driven by the will to analyze unprecedented amounts of data. High energy physics (HEP) experiments at CERN, the European Center for Nuclear Research, and Fermilab, an American proton-antiproton collider, attract collaborations of particle physicists who account for most users of LCG, OSG, TeraGrid, and NorduGrid [Terekhov 2002; Graham et al. 2004]. They search for interesting events in the vast amount of data generated by detectors.

1. gridbus.org.
2. globus.org.
3. Virtual Data Toolkit: vdt.cs.wisc.edu.


The dozens of petabytes of data generated by the Large Hadron Collider (LHC) are entirely replicated on a few primary sites and partially on secondary sites to avoid further data movement. Grids are primarily interesting for distributed data analysis. Computations run on the computer centers that store the data to avoid transferring data to every individual user. The gain depends on the capability to appropriately distribute data in the first place and to map tasks according to data location.

An increasing number of non-HEP applications are now using production grids [Lin and Yen 2010]. More data movement is expected in their case, as most of the CPUs accessible on existing infrastructures still coincide with HEP data centers. However, grids provide affordable and scalable resources to computationally intensive projects, because they are backed by publicly funded institutions willing to support science in general. From the very beginning, non-HEP use cases have received a lot of interest in the grid community and have helped make grids cross-disciplinary by design [Kasam et al. 2007; Kranzlmuller 2009].

2.3. Applicable Workloads

In order to characterize the candidate applications to be run on a grid, we introduce a few definitions.

Definition 6 (Divisible [Blazewicz et al. 1999]). A divisible task is a computation which can be divided with arbitrary granularity into independent parts solved in parallel by distributed computers.

Definition 7 (Embarrassingly Parallel [Harwood 2003]). A problem of size N is embarrassingly parallel if it is quite easy to achieve a computational speedup of N without any interprocess communication.

Definition 8 (Partially Data Parallel [Harwood 2003]). A partially data parallel problem divides the input data into a number of completely independent parts. The same computation is undertaken on each part. It may require pre- and postprocessing and redundant computations to avoid communication.

Divisible, embarrassingly parallel applications are most relevant to grids, because they can be parallelized with appropriate granularity and do not generate communication overhead [Glatard et al. 2007b; Legrand et al. 2006]. More precisely, partially data parallel applications are the most frequent. For instance, a sky map is decomposed into regions of arbitrary sizes for parallel analysis [Annis et al. 2002]. The output of a particle physics detector is decomposed into sets of events, that is, particles created at the collision point.
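To illustrate how a partially data parallel workload maps onto independent grid jobs, the following sketch splits an input into independent chunks, processes each chunk without interprocess communication, and merges the results in a postprocessing step. The function names and the toy computation are purely illustrative and are not taken from any production middleware.

    # Minimal sketch of a partially data parallel decomposition:
    # split the input into independent parts, process each part in
    # isolation, then merge the partial results.
    from concurrent.futures import ProcessPoolExecutor

    def split(data, n_parts):
        """Divide the input into independent chunks of roughly equal size."""
        size = (len(data) + n_parts - 1) // n_parts
        return [data[i:i + size] for i in range(0, len(data), size)]

    def analyze(chunk):
        """Stand-in for the per-part computation (e.g., one sky region)."""
        return sum(x * x for x in chunk)

    def postprocess(partial_results):
        """Merge step required by partially data parallel problems."""
        return sum(partial_results)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        with ProcessPoolExecutor() as pool:
            partials = list(pool.map(analyze, split(data, n_parts=8)))
        print(postprocess(partials))

On a grid, each call to analyze would correspond to an independent job, and postprocess would run once all jobs have returned their outputs.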

2.4. Job: The Element of an Application

Definition 9 (Job [Savva et al. 2005]). A computational job is a uniquely identifiable task, or a number of tasks running under a workflow system, which may involve the execution of one or more processes or computer programs.

In practice, resource allocation or task mapping (Definition 5) consists in placing jobs on computers, typically general-purpose desktop computers or batch servers. VO members may prepare jobs manually. As an alternative, VOs port their applications to the grid with application-specific frameworks that spawn jobs [Machado 2004].

An important part of the job description is the required resources, for example, CPU requirements. Requirements may specify the operating system flavor, application software, and data. A job does not come with substantial data. If the required data is not already present on the node, the job placement triggers its transfer from remote storage.


Fig. 1. Generic batch system.

2.5. On Single Administrative Domains

This section looks at local resource management systems. Such systems manage grid sites' internal resources. Grids initially attempted to replicate their principles.

Pure batch systems fundamentally differ from systems designed with interactive execution in mind. The former submit jobs while the latter connect to resources.

2.5.1. Batch Job Submission. Batch schedulers use mature techniques to essentially manipulate job queues [Castagnera et al. 1994]. They dispatch jobs to balance the load between execution nodes and order them according to precedence policies. In the following, we list some of the most popular batch scheduler systems.

—LSF. Load Sharing Facility started with Utopia, whose authors created Platform Computing, a company that now has 360 employees [Zhou et al. 1993].

—PBS. Portable Batch System is the flagship product of Altair, a 1,200-employee corporation. It was originally developed at NASA in 1993 [Humphrey 2006]. PBS proposes different scheduling policies implemented in two different systems, Torque and Maui, and a language to configure them [Bode et al. 2000].

—OGE. The Distributed Queuing System (DQS) was initially developed at Florida State University in 1993. In 2000, Sun acquired all rights on its commercial version and renamed it Sun Grid Engine (SGE). When Oracle acquired Sun in 2009, SGE became Oracle Grid Engine (OGE).

Figure 1 shows the standard features of a generic batch system. A master assigns jobs to queues depending on priorities and expected execution times. When actual execution times are very different from expectations, the master reorders jobs within and between queues. After a job starts to execute, the master can still checkpoint and migrate it if the job or the operating system provides the capability.
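The queue manipulation just described can be pictured with a toy scheduler. The sketch below is an assumption-laden illustration (class and attribute names such as ToyBatchMaster are hypothetical, not the code of any cited batch system): it routes jobs to a short or long queue based on their declared runtime and dispatches them to the least loaded node.

    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Job:
        priority: int            # lower value = dispatched earlier
        expected_runtime: float  # seconds, as declared at submission
        name: str = field(compare=False)

    class ToyBatchMaster:
        """Toy master: one short queue, one long queue, load-balanced nodes."""
        def __init__(self, nodes, short_limit=600.0):
            self.short, self.long = [], []       # two priority queues
            self.load = {n: 0.0 for n in nodes}  # pending work per node
            self.short_limit = short_limit

        def submit(self, job):
            target = self.short if job.expected_runtime <= self.short_limit else self.long
            heapq.heappush(target, job)

        def dispatch(self):
            # Serve the short queue first, then the long one.
            target = self.short or self.long
            if not target:
                return None
            job = heapq.heappop(target)
            node = min(self.load, key=self.load.get)   # least loaded node
            self.load[node] += job.expected_runtime
            return node, job

    master = ToyBatchMaster(["node1", "node2"])
    master.submit(Job(priority=1, expected_runtime=120.0, name="fast-analysis"))
    master.submit(Job(priority=2, expected_runtime=7200.0, name="full-reconstruction"))
    print(master.dispatch())
    print(master.dispatch())

Reordering between queues, checkpointing, and migration are deliberately left out of this sketch; they depend on capabilities of the job and the operating system, as noted above.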

2.5.2. Connection to Resources. Condor is a particular resource management system. Widely used for resource allocation in clusters, its principles contrast with those of other cluster management systems. Condor does not push jobs to nodes. Instead, it connects users and resources.


Fig. 2. Condor.

Condor started in the 1980s with the intent of selecting idle CPUs for use by active users.

Figure 2 shows how this is done. When some resource is needed, a ClassAd4 is emitted. On the other hand, when a CPU is idle, it also emits a ClassAd. A Collector collects ClassAds for the Matchmaker to find best matches. The Schedd, for Scheduler Daemon, is the process that sends the user ClassAd. The Startd, for Start Daemon, sends the resource ClassAd. When a match is decided, the Matchmaker notifies the Schedd with the address of the Startd. The Schedd and the Startd spawn two processes responsible for the communication [Raman et al. 1998].
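A minimal way to picture this matchmaking is a symmetric requirements check: a job ad and a machine ad each carry attributes and a requirement predicate, and the matchmaker pairs ads that satisfy each other. The sketch below is illustrative only; it does not use Condor's actual ClassAd language, and the attribute names are assumptions.

    # Illustrative matchmaking in the spirit of ClassAds: both sides publish
    # attributes plus a requirement predicate, and a match needs both
    # predicates to hold. This is a simplification of Condor's mechanism.
    job_ad = {
        "attrs": {"ImageSize_MB": 512, "Owner": "alice"},
        "requirements": lambda other: other["Arch"] == "x86_64"
                        and other["Memory_MB"] >= 512,
    }
    machine_ads = [
        {"attrs": {"Arch": "x86_64", "Memory_MB": 256, "Name": "slot1"},
         "requirements": lambda other: other["ImageSize_MB"] <= 1024},
        {"attrs": {"Arch": "x86_64", "Memory_MB": 2048, "Name": "slot2"},
         "requirements": lambda other: other["ImageSize_MB"] <= 1024},
    ]

    def match(job, machines):
        """Return the first machine ad that mutually satisfies the job ad."""
        for machine in machines:
            if (job["requirements"](machine["attrs"])
                    and machine["requirements"](job["attrs"])):
                return machine["attrs"]["Name"]
        return None

    print(match(job_ad, machine_ads))   # -> "slot2"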

Some grid sites use Condor as an alternative to batch schedulers. Meta-schedulers use Condor matchmaking to match job requirements with grid site capabilities [Thain et al. 2002].

2.6. Allocation Concerns

In general-purpose distributed computing, resource allocation includes the management of resource reservation, co-allocation, interprocess communication, priorities, dependencies, interactive jobs, and process migration [Casavant and Kuhl 1988]. Grids must find new ways to address these concerns.

—Resource Reservation. Given that a central resource management system cannot pervade grid sites, reserving resources from the grid eventually means reserving resources from a grid site. Negotiation and reservation protocols exist [Czajkowski et al. 2002]. However, the general consensus is that, in order to avoid intrusion, resource providers only offer best effort, that is, no guarantees. Guarantees with limited intrusion to resource owners would require dynamic reallocation inside and across grid sites [Cherkasova et al. 2006].

4. From Classified Advertisement.


—Co-allocation. Job co-allocation, that is, the co-allocation of several jobs to the same node in order to improve resource utilization, is subject to grid site policies. Resource co-allocation, that is, the synchronous allocation of several nodes to a set of jobs, would require reservation mechanisms [Czajkowski et al. 1999].

—Interprocess Communication (IPC). IPC occurs if jobs communicate at runtime. This would require simultaneous resource allocation and a communication path with low latency between execution nodes.

—Priorities. Institutions prioritize the projects they support. They regulate priorities at the level of their local schedulers and by declining job submissions based on the issuer's identity.

—Dependencies. Some tasks use the result of others. Task dependency graphs can be specified for bulk submission on single administrative domains. To execute dependent jobs on multiple sites, users must wait for a job's completion before submitting a new job that requires its result [Deelman et al. 2005].

—Interactive Jobs. Some tasks require inputs from the application or the user to proceed. Condor supports this kind of communication. However, grid sites restrict connections initiated from the outside. In addition, users cannot in general reserve resources or predict execution windows. Therefore, the job must initiate all communications.

—Process Migration. Condor, LSF, or a grid site operator can migrate running jobs inside a single site if the operating system or the job provides the capability [Milojicic et al. 2000]. Production grids do not support migration by users because it would be an intrusion on the resource provider. They do not support migration across grid sites because not all sites support migration. In general, if a resource is preempted, jobs are simply discarded.

All these concerns are solved on local clusters. On grids, additional constraints make their resolution more complex: latency between sites is inherently high; access to data is not uniform; the development of the infrastructure is separated from the development of applications; and grid sites are independent administrative domains. Each of these four constraints would require proper investigation. The following section shows the particular importance of the grid site's administrative independence.

3. DISRUPTIONS TO RESOURCE ALLOCATION

This section explains how the independence of resource providers determines a de facto choice for a resource allocation model. Despite ideals of ubiquitous resource presence inspired by power grids, it appears that computing grids emerged with only basic batch-allocation capabilities. They revive the history of computer systems at a different scale [Ceruzzi 1994].

Before addressing resource allocation, grids are shaped by other concerns that result from the independence of grid sites: the need for a site to screen users, apply its own policies, and integrate its own systems and resources. These determinants suffice to draw a simple picture of the level of freedom left to resource allocation.

Section 3.1 shows how the independence of grid sites forces job management delegation. Delegation breaks the continuity of job control and its connection to the user (Section 3.2). In addition, it introduces unpredictable delays (Section 3.3). To ensure interoperability of all grid sites, only the minimal control remains supported by all local management systems (Section 3.4).

3.1. Delegation

Global resource allocation is disrupted by grid site autonomy. In spite of their different nature, production grids reproduce the central control designed for single administrative domains.


Fig. 3. Delegation of job submission to a Globus Resource Allocation Manager (GRAM).

Under a single administrative domain, a central scheduler assigns each job to a node (see Section 2.5.1). Similarly, in a grid, a central meta-scheduler or broker assigns each job to a grid site [Czaijkowski et al. 2004].

Definition 10 (Broker [Asadzadeh et al. 2004]). A grid resource broker is a service with which the end users interact and that performs resource discovery, scheduling, and the processing of application jobs on the distributed grid resources.

The authority of the meta-scheduler ends at the boundaries of independent resource providers. Since interference is proscribed, resource aggregation is reduced to a simple superposition of grid sites.

On a grid site, a gatekeeper screens jobs and their owners' identities.

Definition 11 (Gatekeeper [Gra 2002]). The gatekeeper is a process that exists before any request is submitted. When the gatekeeper receives an allocation request from a client, it mutually authenticates with the client, maps the requester to a local user, starts a job manager on the local host as the local user, and passes the allocation arguments to the newly created job manager.

Grid sites screen jobs because they only accept jobs submitted by trusted users, and provide differentiated service levels to different users. Job allocation is simply delegated by the broker to a grid site via its gatekeeper, because this is the simplest way to let grid sites screen jobs and enforce their own access policies.

In most infrastructures, authentication and delegation are processed by the Globus Resource Allocation Manager (GRAM, Figure 3) [Foster 2006]. Submitted jobs queue at the grid broker. The broker never submits to an execution node directly. Instead, it finds an appropriate site and submits a job to its gatekeeper, along with a certificate to identify the job owner. If the gatekeeper refuses the job, it sends it back to the broker. If it accepts the job, it starts a job manager process. The job manager submits the job to the local batch system and collects job status information.
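The control flow described above (broker selects a site, gatekeeper authenticates and maps the user, a job manager submits to the local batch system) can be summarized in a few lines. The class and method names below are hypothetical; they mirror the sequence in the text rather than the actual GRAM interfaces.

    class ToyBatch:
        """Stand-in for a site's local batch system."""
        def __init__(self):
            self.jobs = {}
        def submit(self, job, user):
            job_id = len(self.jobs) + 1
            self.jobs[job_id] = (job, user, "queued")
            return job_id
        def status(self, job_id):
            return self.jobs[job_id][2]

    class JobManager:
        """Toy job manager: submits to the local batch system and reports status."""
        def __init__(self, local_batch, local_user):
            self.local_batch, self.local_user = local_batch, local_user
        def submit(self, job):
            job_id = self.local_batch.submit(job, self.local_user)
            return {"job_id": job_id, "status": self.local_batch.status(job_id)}

    class Gatekeeper:
        """Toy gatekeeper: authenticate, map to a local account, start a job manager."""
        def __init__(self, site, grid_map, local_batch):
            self.site, self.grid_map, self.local_batch = site, grid_map, local_batch
        def handle(self, job, certificate):
            if certificate not in self.grid_map:          # authentication (simplified)
                return ("refused", job)                   # job goes back to the broker
            local_user = self.grid_map[certificate]       # map requester to a local user
            return ("accepted", JobManager(self.local_batch, local_user).submit(job))

    gk = Gatekeeper("site-A", {"/DC=org/CN=alice": "vo001"}, ToyBatch())
    print(gk.handle({"executable": "analysis.sh"}, "/DC=org/CN=alice"))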


Fig. 4. Condor-G.

As a consequence of grid sites' autonomy, no single component in a grid controls resource allocation from job submission to end node assignment. The set of grid nodes is not considered as a whole: a node is never compared for assignment with another node from another site.

Resources are segregated between grid sites. They are also disconnected from users.

3.2. User and Job Disconnected

In single administrative domains, interactive jobs and runtime job migration are possible using a direct connection between the user and the execution node. In grids, jobs are delegated to sites. Execution nodes are not directly accessible from outside of their sites. The connection is lost.

3.2.1. From Condor to Condor-G. With Condor (Section 2.5.2 and Figure 2), daemons take responsibility for a task on both the user node (the shadow) and the execution node (the starter). These daemons maintain a connection between each other through which users or applications control remote computations at runtime. If the code is linked with appropriate libraries, both sides communicate seamlessly. In addition, the starter can checkpoint the task while it is running and send checkpoints to the shadow, possibly for migration to another execution node.

This connection, runtime management, and migration are lost with Condor-G. Condor-G was produced in 2001 to introduce the matchmaking mechanism in grid brokers [Frey et al. 2002]. The G initially meant Globus, because Condor-G was designed to integrate with GRAM. Now it means Grid, to denote the system's generality.

With Condor-G (Figure 4), the grid broker not only includes a matchmaker but has also taken over the schedd that would be run by the user in Condor. The schedd launches a grid manager, the process responsible for a job. This means that the user fully delegates the job to the broker, including potential localization, control, and communication at execution time. However, the grid manager also drops these capabilities. It simply submits the job to the grid site's gatekeeper.


As with standard GRAM, the gatekeeper decides to accept or return the job to the broker. If it accepts the job, it forwards it to the grid monitor, which deploys a job manager process. The job manager, in turn, submits to the grid site's local batch system of choice, which can be other than Condor.

The grid manager and job manager in Condor-G are the counterparts of the shadow and starter in Condor. Their functionality is reduced. They submit jobs and do not manage them at runtime. Communication between the two is reduced to notifications of job status.

3.2.2. gLite, a Middleware Distribution. Condor-G illustrates the loss of user control over jobs. Condor-G is integrated in distributions like gLite, the LCG middleware [Litmaath 2007].

In a deployment of gLite, the broker is replicated to handle the load. On job submission, a broker replica submits a Condor schedd to the selected site. Once running on the site, the schedd sends a simple ClassAd to the broker to request the associated job. The schedd then forwards the job to the local batch system. It knows which broker replica sent the job. Therefore, it can communicate back the job status. In this case, the Condor matchmaker is not used to select the appropriate site but only to identify the request of a single schedd. This construction makes little use of its components' individual functionalities.

On that account, gLite also distributes a simpler system as a more recent alternative. It consists of two simple Java servlets, ICE and CREAM.5 ICE implements the practical functionality of a gLite grid broker, and CREAM interfaces with a local batch system on a grid site. ICE/CREAM reproduces the functionality of task mapping that has been achieved in practice in a central component that federates multiple administrative domains. The result is simple batch job submission to grid sites with access control and site selection.

3.2.3. DIET, a Hierarchy of Agents. DIET (Distributed Interactive Engineering Toolbox) is a middleware distribution developed at LIP, a French laboratory on parallel computing. DIET is used in bioinformatics, cosmology, and meteorology. Users submit their jobs to a Master Agent that forwards the request along a tree of Local Agents before the job hits an execution node. Each agent dispatches the load between agents of the next level and aggregates the replies to send them back to the client [Bouziane et al. 2010].

3.3. Out-of-Date Matching

A grid broker does not map jobs onto execution nodes. Instead, it matches job requirements against site resource advertisements. Advertised resources may no longer hold when the job is scheduled to run.

3.3.1. Job Description. A job carries a description of the hardware and software flavor and configuration, resources, and data that it expects to find on the execution node. This is done in a format chosen by the infrastructure. gLite uses Condor ClassAds (see Section 2.5.2), renamed JDL (Job Description Language). NorduGrid's middleware, ARC6, has its own format called RSF (Resource Selection Framework). The Open Grid Forum, a standardization consortium with representatives from industry and academia, recommends another variant: JSDL (Job Submission Description Language) [Savva et al. 2005].
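To make the contents of a job description concrete, the following sketch expresses one as a Python dictionary that loosely mirrors the kind of attributes found in JDL-style descriptions. All field names and values are assumptions chosen for illustration; they are not the syntax of JDL, RSF, or JSDL.

    # Hypothetical job description, loosely mirroring JDL-style attributes:
    # executable, sandbox files, and requirements on the execution node.
    job_description = {
        "executable": "analysis.sh",
        "arguments": ["run2011", "region7"],
        "input_sandbox": ["analysis.sh", "config.cfg"],
        "output_sandbox": ["out.log", "err.log"],
        "requirements": {
            "os_flavor": "ScientificLinux",     # operating system flavor
            "software": ["ROOT"],               # application software expected on the node
            "min_cpu_minutes": 720,             # queue must allow this much CPU time
            "input_data": ["lfn:/grid/vo/run2011/region7.dat"],  # fetched if absent
        },
    }
    print(job_description["requirements"])

The matching of such a description against site advertisements is sketched after Section 3.3.2.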

3.3.2. Resource Description. Sites, on the other hand, advertise their resources. They do not publish specific information for every node. Instead, they group their nodes into homogeneous clusters called computing elements [Chien 2004].

5. Computing Resource Execution And Management.
6. Advanced Resource Connector.


Definition 12 (Computing Element [Andreozzi et al. 2007]). As a common abstraction, the computing element (CE) refers to the characteristics, resource set, and policies of a single queue of the underlying management system.

At the grid level, computing capabilities appear as computing elements (each being a set of job slots to which policies and status information are associated) that are reachable from a specific network endpoint.

A CE description includes information on operating system flavor, number of processes, and queue length. The same configuration is maintained on all nodes of the CE. A grid broker does not differentiate between them. Therefore, each computing element has its own batch system. A site usually contains one or a few computing elements.

There are several CE description formats. The GLUE schema, which stands for Grid Laboratory Uniform Environment, is the most used by early grids. GLUE is specialized for grid resources [Andreozzi et al. 2007]. The Common Information Model (CIM) applies to distributed computing in general. It is recommended by the DMTF7 and used notably in Naregi.8 Different formats can be converted to ClassAds for processing by the Condor matchmaker [Baranovski et al. 2003].

A grid broker delegates each job to the grid site that appears to have the least loaded CE that matches the job requirements. Each computing element typically updates its information every five to fifteen minutes. Added to a job's queuing time on the site, this gives the obsolescence of the broker's resource perception. In the meantime, the state of resources changes.
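The matching step can be sketched as follows: filter CE advertisements against a job's requirements, then rank the remaining CEs by load, bearing in mind that the advertised numbers may be five to fifteen minutes old. The attribute names are illustrative assumptions, not the GLUE or CIM schemas themselves.

    # Toy broker-side matching: filter CE ads on job requirements, then pick
    # the least loaded candidate. The published figures may already be stale.
    ce_ads = [
        {"name": "ce1.site-a", "os_flavor": "ScientificLinux",
         "max_cpu_minutes": 2880, "waiting_jobs": 40, "free_slots": 0},
        {"name": "ce2.site-b", "os_flavor": "ScientificLinux",
         "max_cpu_minutes": 1440, "waiting_jobs": 3, "free_slots": 25},
        {"name": "ce3.site-c", "os_flavor": "Debian",
         "max_cpu_minutes": 4320, "waiting_jobs": 0, "free_slots": 100},
    ]
    job_requirements = {"os_flavor": "ScientificLinux", "min_cpu_minutes": 720}

    def matches(ce, req):
        return (ce["os_flavor"] == req["os_flavor"]
                and ce["max_cpu_minutes"] >= req["min_cpu_minutes"])

    def select_ce(ads, req):
        candidates = [ce for ce in ads if matches(ce, req)]
        # Rank by queue length, then by free slots (both possibly out of date).
        candidates.sort(key=lambda ce: (ce["waiting_jobs"], -ce["free_slots"]))
        return candidates[0]["name"] if candidates else None

    print(select_ce(ce_ads, job_requirements))   # -> "ce2.site-b"

Because the broker never sees individual nodes and works on stale aggregates, this is the finest granularity at which it can decide, which leads to the consequences discussed next.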

3.3.3. Consequences. Out-of-date matching is not fault tolerant. For instance, some CE failures result in jobs being considered terminated as they arrive. Before the malfunction is acknowledged, jobs keep flowing to the CE and failing. This situation is called a black hole.

Out-of-date matching proscribes fine-grained resource allocation. CEs and jobs describe themselves in broad terms. The state of transient resources, such as cache, memory, disk, and CPU, cannot be communicated.

Multiple brokers are instantiated to scale up when workload increases. Theoretically, protocols exist to coordinate multiple brokers that control job execution time. However, because matching is out of date, no instance knows precisely when a job is going to run, and precise coordination cannot take place [Ranjan 2007].

For these reasons, performance optimization is not possible at the broker level. A central decision system that preserves the independence of grid sites fails to improve efficiency in their aggregation. The following section shows that it cannot support richer workloads or service levels than individual sites can already support.

3.4. Intersecting the Capabilities of Local Systems

A grid broker must deal with any possible type of local resource management system present on grid sites.

There are two ways to integrate interfaces: standards and translators.

3.4.1. Standards. In a first approach, grid brokers and grid sites must implement standard interfaces9 [Foster 2005].

7. Distributed Management Task Force: www.dmtf.org.
8. National Research Grid Initiative: www.naregi.org.
9. Standardization organizations involved in grid computing include OGF (Open Grid Forum: ogf.org), IETF (Internet Engineering Task Force: ietf.org), OASIS (Organization for the Advancement of Structured Information Standards: oasis-open.org), and DMTF (Distributed Management Task Force: dmtf.org).


Once a consensus is reached and all systems comply with it, functionality is not lost along the path [Foster et al. 2002]. However, the process is long, and standards require wide acceptance.

3.4.2. Translators. The second approach is pragmatic. Before standard interfaces are implemented for all components in use, these components have to communicate anyway. Instructions from the grid broker, forwarded to the site, must be understood by the variety of local batch systems available.

GAHP (Grid ASCII Helper Protocol) is a translation protocol initially developed as part of Condor-G for compatibility with Globus. It translates instructions from the grid broker to various implementations of a site's gatekeeper, and from the site's job manager, that is, the process that controls a job on a grid site (Section 3.2), to various local batch systems [Nakada et al. 2005].

Unfortunately, the vocabulary of translators is reduced to the intersection of the capabilities of the systems they interface. With a few variants, it comes down in practice to submit to submit a job, cancel to cancel a job submission, and status to get the status of a job submission [Rebatto 2005; Nishandar et al. 2005].
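The practical effect of this lowest-common-denominator vocabulary can be sketched as a translator that exposes only submit, cancel, and status, and maps them onto whatever the local batch system provides. The command templates below are simplified placeholders, not the actual GAHP protocol or the full syntax of these batch systems.

    # Toy translator reduced to the intersection of local batch capabilities:
    # submit, cancel, and status. Anything richer (reservation, migration,
    # runtime control) has no common counterpart and is simply not exposed.
    BACKENDS = {
        # simplified command templates for illustration only
        "pbs":    {"submit": "qsub {script}", "cancel": "qdel {job}", "status": "qstat {job}"},
        "lsf":    {"submit": "bsub {script}", "cancel": "bkill {job}", "status": "bjobs {job}"},
        "condor": {"submit": "condor_submit {script}",
                   "cancel": "condor_rm {job}", "status": "condor_q {job}"},
    }

    def translate(backend, operation, **kwargs):
        """Map one of the three common operations onto a local batch command line."""
        if operation not in ("submit", "cancel", "status"):
            raise ValueError(f"{operation!r} is not supported across all local systems")
        return BACKENDS[backend][operation].format(**kwargs)

    print(translate("pbs", "submit", script="job.sh"))
    print(translate("condor", "status", job="1234.0"))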

3.5. Partial Conclusion

The attempt to centrally orchestrate grid resources results in a simple superposition of grid sites, no communication or control from the user at runtime, a delayed, broad-grained knowledge of the resources, and the simplest existing batch functionalities.

In practice, grid users have the option to skip the broker and submit directly to grid sites. Experienced users often choose this option.

In the next section, a recent trend shows that a decentralized allocation is possible.

4. USER-DRIVEN ALLOCATION

This section studies a decentralized approach that has recently gained momentum. In this approach, a large part of the allocation is processed by federated users and application frameworks.

This new trend is motivated by user concerns that are not satisfied by traditional infrastructures (Section 4.1). A way to bypass the standard allocation process is identified in Section 4.2. Section 4.3 shows how frameworks for large applications use this bypass, as do VO allocation systems (Section 4.4). A Condor mechanism was found to be well suited for this purpose (Section 4.5). Section 4.6 shows how major VOs start to change their way of allocating their tasks. This turn of events redefines the paradigms of grid resource allocation (Section 4.7).

4.1. User Concerns

Users are concerned with problem-solving and performance.

—Problem-solving. Users and applications have constraints on task mapping: job dependencies, interactivity, interprocess communication, prioritization, etc. Given the loose coupling of grid resources, the best-suited applications are divisible into independent jobs. The absence of support for task-mapping constraints further restricts the class of applications that can be handled directly.

—Performance. The low-level resources allocated to individual jobs determine performance, even with independent, standalone jobs [Kim et al. 2007]. A grid assembles heterogeneous nodes. Task mapping must take into account their differences in conjunction with job profiles. For example, a processor with little cache would not handle memory-intensive tasks well, but it could handle together a CPU-intensive task and a task that only downloads data [Xiao et al. 2000].


Fig. 5. Late binding on a grid.

Attempts at centralized task mapping do not achieve support for problem-solving and performance. Between the authority of resource users and resource providers, little remains to be controlled by a third party. Still, grid infrastructures centralize the access to different sites. Starting there, some large VOs and application frameworks have the capacity to address their own concerns by themselves.

The following identifies a common model, implementations, benefits, and the new constraints in these environments.

4.2. Late Binding

The initial intent of grids is to consolidate resources from multiple administrative domains for the benefit of multiple independent applications and users. The previous sections show that although grids connect their users with a wide range of resources, they do not address complex problems on their behalf, such as performance optimization and support for interdependent jobs. Consequently, a growing number of VOs and large applications choose to address their own concerns with late binding.

Definition 13 (Late Binding). Late binding is an allocation strategy in which a task is selected to run only once a placeholder is scheduled for it on identified resources.

Figure 5 illustrates late binding on a grid. An allocation system is responsible for a set of jobs. (1) It submits monitors to the grid broker. (2) Each monitor follows the normal job flow. The grid broker delegates it to a site gatekeeper. (3) The gatekeeper forwards it to a batch system. The batch system assigns it to an execution node. (4) Once running on the execution node, the monitor connects back to the allocation system to receive actual jobs and handle their execution.

Late binding is also known as a pull model. The job is pulled from the resource, as opposed to being pushed by the broker.
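A minimal way to express the pull model is a monitor process that, once started on an execution node by the local batch system, calls back the allocation system and asks for work until none is left. The sketch below runs everything in one process for illustration; in a real deployment the monitor and the allocation system would communicate over the network, and all names here are assumptions rather than any particular VO framework.

    import queue, threading, time

    # Toy late binding: an allocation system holds the real jobs; monitors are
    # submitted through the normal grid path and, once running, pull jobs from it.
    class AllocationSystem:
        def __init__(self, jobs):
            self.jobs = queue.Queue()
            for j in jobs:
                self.jobs.put(j)

        def next_job(self):
            """Called back by a running monitor; returns None when no work is left."""
            try:
                return self.jobs.get_nowait()
            except queue.Empty:
                return None

    def monitor(node_name, allocation_system):
        """Stand-in for the process started on an execution node (step 4 in Figure 5)."""
        while True:
            job = allocation_system.next_job()   # connection initiated from inside the site
            if job is None:
                break
            time.sleep(0.01)                     # stand-in for running the actual job
            print(f"{node_name} finished {job}")

    system = AllocationSystem([f"job-{i}" for i in range(6)])
    nodes = [threading.Thread(target=monitor, args=(f"node-{n}", system)) for n in range(3)]
    for t in nodes: t.start()
    for t in nodes: t.join()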


Late binding allows for controlled start time, placement, and runtime communication.

—Runtime Communication. Monitors initiate connections from inside a grid site and across the site's firewall. This allows the user to stay connected to end resources and jobs.

—Placement. Before starting any jobs, monitors inspect resources, their location, and their configuration.

—Controlled Start Time. Once a monitor starts to run on an execution node, establishes a connection with the user (or the user's allocation system), and checks the node configuration, it can start jobs immediately on user request (or on request from the user's allocation system).

4.3. Allocation by Applications

An application that runs on a grid uses a module to spawn jobs. This module either simply submits jobs to the grid or schedules them with late binding.

In the absence of late binding, unpredictable delays before execution must be expected. A study on a major infrastructure shows that half of the jobs wait for more than five minutes between submission and execution, and 5% wait for more than 15 minutes [Glatard et al. 2007a].

With late binding, application schedulers control start time and directly interact with end nodes. They address job dependencies, interaction with the user, interaction between jobs, real-time control, priorities, and fault tolerance [Moscicki 2006].

DIANE10 is a framework for scheduling applications on grids [Moscicki 2003; 2011]. The project started in 2002 to port CERN applications to the DataGrid, the ancestor of LCG. Other notable applications include genome sequencing and in-silico drug discovery against malaria and bird flu [Moscicki et al. 2004; Lee et al. 2006].

DIANE uses late binding to minimize makespan, that is, the time until the last job completes. It keeps queues in front of each execution node and dynamically reassigns jobs to different queues. Reassignment is triggered when new monitors start execution and bring new resources to the framework or when the progress on a node is unexpectedly slow [Moscicki 2006].
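The reassignment policy can be pictured as follows: each known node has a queue of pending jobs, and when a node is observed to progress unexpectedly slowly, part of its backlog is moved to the least loaded fast peer. The heuristic and all names below are a simplified illustration, not DIANE's actual scheduler.

    # Simplified illustration of queue rebalancing for makespan reduction:
    # pending work is shifted away from a node whose observed rate is low.
    node_queues = {
        "node-a": ["j1", "j2", "j3", "j4", "j5", "j6"],
        "node-b": ["j7"],
        "node-c": ["j8", "j9"],
    }
    observed_rate = {"node-a": 0.2, "node-b": 1.0, "node-c": 0.9}  # jobs per minute

    def rebalance(queues, rates, slow_threshold=0.5):
        """Move half of the backlog of any slow node to the least loaded fast node."""
        fast_nodes = [n for n, r in rates.items() if r >= slow_threshold]
        for node, rate in rates.items():
            if rate < slow_threshold and len(queues[node]) > 1 and fast_nodes:
                target = min(fast_nodes, key=lambda n: len(queues[n]))
                half = len(queues[node]) // 2
                moved = queues[node][half:]
                queues[node] = queues[node][:half]
                queues[target].extend(moved)
        return queues

    print(rebalance(node_queues, observed_rate))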

The more jobs an application spawns, the more useful DIANE is. High job liquidity results in substantial gains from coordinated scheduling. Comparable or greater liquidity is reached by centralizing jobs from the members of a scientific collaboration.

4.4. Allocation by Collaborations

A growing number of virtual organizations (VOs, Definition 2) take care of scheduling for their members.

4.4.1. Examples. Particle physics is the area with the largest VOs and the earliest use of grids. In particle physics, a VO is a collaboration that builds a detector and analyzes its data.

—At Fermilab, near Chicago, CDF11 and D012 study the results of proton-antiproton collisions and were scheduled to analyze data until 2009. MINOS13 analyzes neutrino oscillations.

10. DIstributed ANalysis Environment.
11. Collider Detector at Fermilab: www-cdf.fnal.gov.
12. www-d0.fnal.gov.
13. Main Injector Neutrino Oscillation Search: www-numi.fnal.gov.


—At SLAC, the Stanford Linear Accelerator Center, BaBar14 analyzes the violation of charge and parity (CP) symmetry in the decays of B mesons.

—At CERN, the European Center for Nuclear Research in Geneva, four VOs are finishing building detectors and preparing for data analysis for the coming years. ATLAS15 and CMS16 are two general-purpose detectors for analyzing proton-proton and heavy ion collisions. ALICE17 studies Pb-Pb collisions generating a quark-gluon plasma, as in the early universe, and LHCb18 studies collisions of baryons containing the beauty quark for CP violation measurements and rare decay observations.

Other scientific domains are also represented. For example, BIOMED is a VO with about 300 users working in three areas: medical imaging, bioinformatics, and drug discovery. BIOMED maintains application software for use on grids and domain-specific grid portals [Maigne et al. 2004].

According to the EGI operations portal19, in 2011 there were more than 200 active VOs registered with EGI, with a total of more than 12,000 users. Most VOs have under 50 users, and several have only two or three users.

4.4.2. When Scheduling by VOs is Appropriate. VOs differ from one another in size and organization. Maintaining their own scheduling system is appropriate for large VOs with in-house software development, a unique goal towards which everyone contributes, and large workloads. This is the case mostly in high energy physics (HEP). In other areas, VOs are often smaller and less organized. When they apply, the following elements give a VO the authority to schedule jobs for its users.

Identity. VOs provide grid users with an identity recognized by resource providers. Grid sites identify VOs to define bulk resource supply contracts and authenticate job owners [Foster et al. 2001]. This applies to all VOs.

Priorities. In practice, grid middleware integrates authorization and prioritization capabilities on a per-VO basis. Users can bypass any VO-specific tool and access grid resources directly, avoiding VO-specific control and effectively using an arbitrary share of the load granted to their VO. However, users may choose to rely on a VO scheduling system that selects and configures resources on their behalf to provide a higher level of reliability and adaptation, and that prioritizes jobs on a per-user basis. The share of resources managed by the VO system grows with the benefit of VO management perceived by individual users.

Cooperation. The competition between individual users for access to computing resources is manageable as long as users have the same objective in mind and the same incentives. Typically, VO members collaborate for a shared goal. All members are listed as authors on many publications, and all members would share a Nobel Prize.

De-multiplied Effort. Job submission is often naturally delegated to a few expert members. The gap is narrow between centralizing the skills to manage grid jobs and developing a single allocation system for the whole VO.

Knowledge. VOs use a limited number of applications and algorithms. Their jobs have known resource requirements. VO administrators know which software configuration works on the execution node. They can evaluate resources to accurately map their jobs.

14. From BB: www-public.slac.stanford.edu/babar.
15. A Toroidal LHC ApparatuS: atlas.ch.
16. Compact Muon Solenoid: cms.cern.ch.
17. A Large Ion Collider Experiment: aliceinfo.cern.ch.
18. Large Hadron Collider beauty: lhcb.web.cern.ch.
19. cic.gridops.org.


Liquidity. Finally, VOs can be large. For example, 1,900 physicists collaborate in ATLAS. The number of users and jobs and the amount of resources used raise opportunities for performance optimization.

Since VOs do not control the hardware, late binding is currently their only option for scheduling.

4.4.3. Late Binding by VOs. VOs request the highest possible/useful number of grid nodes and maximize the computing throughput on these nodes.

Grid nodes are requested by submitting monitors, as described in Section 4.2. A monitor pretends to be a job and runs as long as allowed. Grid sites set maximum durations, typically of 48 or 72 hours, after which the monitor must terminate.

For this time period, monitors receive actual jobs from the VO and control their execution. It appears to resource owners that jobs run for days. In fact, actual jobs last from a few minutes to a few hours. They bypass the queues of traditional grid submission and are directly allocated on end nodes following a strategy defined by the collaboration.
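Because a monitor must terminate when the site's maximum wall-clock time expires, a VO scheduler typically stops handing it jobs that would not finish within the remaining lease. The following is a hedged sketch of that check, with an assumed 48-hour lease and a safety margin chosen for illustration.

    import time

    LEASE_SECONDS = 48 * 3600          # site-imposed maximum, e.g. 48 hours
    start_time = time.time()           # set when the monitor starts on the node

    def remaining_lease():
        return LEASE_SECONDS - (time.time() - start_time)

    def should_accept(job_expected_seconds, safety_margin=600):
        """Only pull a job if it is expected to finish before the lease expires."""
        return job_expected_seconds + safety_margin < remaining_lease()

    # Example: a 2-hour job is accepted at the start of a 48-hour lease.
    print(should_accept(2 * 3600))     # -> True

When the remaining lease becomes too short for any pending job, the monitor simply terminates and the VO submits a fresh one.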

VO schedulers first appeared in the LHC experiment collaborations due to their large sizes, but VOs in other domains are also starting to use, or at least consider, this approach [Glatard and Camarasu-Pop 2010]. VO schedulers are not often mentioned in the literature. The following gives a brief overview.

4.5. Sudden Success of an Old Condor Mechanism

Condor is designed to use temporarily idling nodes, a strategy known as CPU scavenging. To cope with transient resources, Condor implements late binding. Besides, since 2001, Condor has offered a mechanism called glide-ins to form personal pools with nodes controlled by external batch schedulers. Since 2006, glide-ins have been used to implement late binding in grids.

4.5.1. Glide-ins for Personal Pools. Glide-ins are portable shell scripts that a user submits to external batch schedulers. Once running on an end node, a glide-in launches a process equivalent to a Condor startd that makes itself known to a matchmaker.

A standard Condor pool takes care of an organization's resources and users (Section 2.5.2 and Figure 2). The matchmaker serves all members of the organization. A Condor pool constituted with glide-ins is intended for personal use. The matchmaker matches ClassAds from the sender's schedd with ClassAds from the glide-ins' startds.

4.5.2. Glide-ins for Late Binding on Grids. Only recently, a few allocation systems equipped with a Condor matchmaker started to systematically send glide-ins for late binding to grid nodes.

Once resources are identified, the remaining difficulty is to provide inbound connectivity to grid sites. The starter initiates a connection with a proxy outside of the grid site, the GCB (Generic Connection Broker), and receives an identifier. To communicate with the starter, the shadow sends a message to the GCB using this identifier, and the GCB redirects the packets [Son and Livny 2003; Beckles et al. 2005].

With glide-ins and inbound connectivity, VOs can constitute a personal pool with resources from different sites. Resources can even be accessed on multiple grids with different job submission systems. Late binding solves, at the VO level, the problem of interoperability between grids. Standardization is a less practicable alternative [Elmroth and Tordsson 2009].

4.5.3. glideCAF. The Central Analysis Farm (CAF) of CDF, the particle physics VO and Fermilab experiment, was extended in 2005 to use grid resources with glideCAF, which integrates late binding with Condor glide-ins [Belforte et al. 2006]. In 2010, CDF switched to glideinWMS (Section 4.5.5) [Zvada et al. 2010].


4.5.4. Cronus. An individual initiative in ATLAS gradually gained momentum and led to Cronus in 2006. A year later, Cronus allocated a substantial part of ATLAS jobs and controlled a dynamic pool of about 8,500 CPUs infiltrated on LCG, OSG, and NorduGrid, until 2008, when ATLAS chose to finance its own official late binding system (Panda, Section 4.6.2).

In addition to short-circuiting grid submission delays, Cronus manages data distribution and load takeover.

—Data Distribution. Considering that ATLAS jobs do not consume network bandwidth, Cronus downloads data in the background from major storage systems. Succeeding jobs do not wait for data transfers. Instead, they are placed where their data is already present (a simplified sketch of data-aware placement follows this list). NorduGrid plans data downloads in advance, before jobs are scheduled for execution, but LCG does not: jobs start execution by requesting the data and stay idle until the download is complete. Cronus saves this idle time.

—Load Takeover. Cronus lets glide-ins take over the jobs of others whose lease is about to expire. This avoids 80% of job failures.
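The following is a simplified view of data-aware placement in the spirit of the Data Distribution item above (all node and file names are illustrative, and this is not Cronus code): jobs are assigned in priority to nodes that already hold their input, and a background prefetch is noted otherwise.

    # Illustrative data-aware placement: prefer nodes that already hold the
    # input of a job; otherwise pick any idle node and record a prefetch.
    node_data = {
        "node-1": {"run2011/region7.dat"},
        "node-2": {"run2011/region8.dat"},
        "node-3": set(),
    }
    idle_nodes = ["node-1", "node-2", "node-3"]

    def place(job_name, input_file):
        with_data = [n for n in idle_nodes if input_file in node_data[n]]
        node = with_data[0] if with_data else idle_nodes[0]
        idle_nodes.remove(node)
        prefetch = None if input_file in node_data[node] else (node, input_file)
        return node, prefetch

    print(place("analysis-7", "run2011/region7.dat"))  # data already present
    print(place("analysis-9", "run2011/region9.dat"))  # background prefetch needed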

4.5.5. glideinWMS. glideinWMS is a project started in 2007 by U.S. CMS, the American part of the CMS collaboration. It extends glideCAF and the Cronus information system [Sfiligoi 2007; Sfiligoi et al. 2009].

4.6. An Evolution of VO Strategies

4.6.1. AliEn and Job Agents. Since 2001, the ALICE Environment (AliEn) has been the distributed data analysis environment of ALICE, a CERN detector and collaboration [Buncic et al. 2003; Saiz et al. 2003; Anderlik et al. 2007].

Definition 14 (Job Agent [Buncic et al. 2004]). The job agent is a Web service allowing users to interact with the running job, to send a signal, or to inspect the output.

Prior to job execution, the job agent can automatically install the software packages required by the job. Jobs are classified on a central server in a number of queues. When a job agent requests a job, a queue is selected, known to contain the most appropriate class of jobs, and one of the first jobs found on this queue is sent. With AliEn, job agents installed on computing elements (CEs, Definition 12) monitor their status and pull jobs when appropriate. This method protects against faulty configurations and "black holes", that is, the situation where jobs keep flowing to a faulty CE (Section 3.3.3). In 2004, AliEn served as a basis for the design of gLite, the LCG middleware [Laure et al. 2004]. However, gLite did not use job agents (gLite brokers push jobs to CEs), and AliEn continued to be developed and installed concurrently for ALICE.
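The server-side selection described above can be sketched as a set of class-labeled queues from which an agent's request is served. The class names and the matching rule below are assumptions for illustration, not AliEn's actual task queue.

    from collections import deque

    # Toy central task queue in the spirit of the description above: jobs are
    # grouped by class on the server, and a pulling agent's request is answered
    # with a job from the most appropriate class it can run.
    queues = {
        "short-analysis":  deque(["a1", "a2"]),
        "long-simulation": deque(["s1"]),
    }

    def request_job(agent_capabilities):
        """Serve a pulling job agent: pick a compatible class, pop its first job."""
        for job_class in agent_capabilities:          # ordered by preference
            if queues.get(job_class):
                return job_class, queues[job_class].popleft()
        return None                                   # nothing appropriate: agent stays idle

    print(request_job(["short-analysis", "long-simulation"]))  # -> ('short-analysis', 'a1')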

4.6.2. DIRAC and Pilots. DIRAC, Distributed Infrastructure with Remote Agent Control, started in 2003 for LHCb, a CERN detector and collaboration [van Herwijnen et al. 2003; Tsaregorodtsev et al. 2004]. On the LHCb computing facility, DIRAC uses job agents similar to AliEn's. On grid sites outside of LHCb control, DIRAC uses a mechanism similar to Condor glide-ins. It submits portable job agents, called pilot agents, to the broker. Pilot agents start on execution nodes and contact local CEs [Paterson et al. 2006; Casajus et al. 2010].

The pilot/glide-in mechanism is becoming the rule for large VOs. AliEn quickly made the move to pilot agents. In late 2005, the U.S. Department of Energy funded a project, Panda, that uses the same architecture as DIRAC to manage ATLAS jobs on OSG [Wenaus et al. 2006; Nilsson 2007].

In summary, late binding has come in two different ways: natively, through middleware that implements pulling, and indirectly, through user-controlled glide-ins or pilot agents. It changes the level at which the resources available to a VO are managed.

4.7. A New Deal for Old Problems

Late binding on grids, as opposed to the use of dedicated clusters, poses new constraints on the deployment of task-mapping systems. The pool of available resources has different characteristics that affect task-mapping algorithms. Besides, it raises new questions and opportunities on the problems of accountability, fairness, and grid interoperability.

4.7.1. New Constraints for Schedulers. The following characteristics describe a pool of resources available to a VO scheduler via late binding.

—Convoluted Communication. TCP or UDP communication can be initiated only from inside a grid site. Solutions are asynchronous, via message passing and polling, or involve registration to a proxy and packet forwarding [Son and Livny 2003] (a sketch follows this list).

—Limited Node Control. A job is unprivileged on a grid site. The same is true for glide-ins and pilot agents. Isolating jobs inside virtual machines might allow for more control by the VOs [Keahey et al. 2004; Grehant et al. 2007].

Using late binding, computers accessed over the wide area may appear as a dedicated, local cluster, although a few differences remain. First, grid resources are allotted for a limited period. Second, communication between remote nodes exhibits some latency.

—Transience. Nodes leave the pool when their maximum lease period expires. Other nodes join the pool when newly submitted pilot agents/glide-ins start execution. VOs operate on pools where nodes constantly come and go.

—Latency. Both data access and communication between peers of a distributed system are affected. Problems arise that were not present in the local-area context. Access to data is heterogeneous; therefore, allocation must be data driven. In addition, no single point has complete knowledge of the whole system state.
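A minimal sketch of the outbound-only pattern mentioned under Convoluted Communication follows: a process on a worker node cannot accept inbound connections, so it registers with a VO-side proxy and then polls it, pushing results back the same way. The proxy URL, endpoints, and message format are assumptions chosen for illustration.

import json
import time
import urllib.request

PROXY = "https://vo-proxy.example.org"     # hypothetical VO-side rendezvous service
PILOT_ID = "pilot-1234"

def post(path, payload):
    req = urllib.request.Request(f"{PROXY}{path}",
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Outbound registration: the proxy learns that the pilot exists without ever
# opening a connection towards the worker node.
post("/register", {"pilot": PILOT_ID})

while True:
    # Poll for work or control messages; outbound requests are the only direction that works.
    msg = post("/poll", {"pilot": PILOT_ID})
    if msg.get("type") == "shutdown":
        break
    if msg.get("type") == "task":
        result = {"task": msg["id"], "status": "done"}   # run the task here
        post("/result", {"pilot": PILOT_ID, **result})
    time.sleep(30)                                       # avoid hammering the proxy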

The upper part of Figure 6 represents a network of computers taken from a local cluster (top). Data is accessible everywhere, and communication is seamless. The lower part shows a network of computers taken from remote grid sites. Some are tightly connected, while other links exhibit some latency, and large data is not accessible everywhere.

These new constraints are the price to pay in order to avoid the delays and faults experienced on traditional job submission systems. Still, by letting VOs manage their own job queues, late binding allows them to obtain better performance and predictability [Glatard and Camarasu-Pop 2010; Tan et al. 2010; Moscicki et al. 2011].

4.7.2. Accountability and Fairness. Late binding raises new questions on accountability of resource usage and fairness of allocation. What will be the impact if all users implement aggressive pilot submissions without middleware control? Can users still be accountable for their activity, and can the allocation be fair? In accepting incoming jobs, resource providers make sure their resources are fairly shared between the VOs they choose to support. However, they filter by VO rather than by user. This is how Cronus (Section 4.5.4) initially allowed an individual physicist well versed in Condor to obtain faster, more exclusive access to a large number of CPUs, and progressively to collect resources on behalf of many other members of the Atlas VO. Still, is it reasonable that all VOs try to allocate a maximum number of resources all the time in a shared system? Glide-ins/pilot agents could be over-provisioned and resources engaged without actually being used.

Fig. 6. Constraints on wide area (bottom) compared to constraints on clusters (top).

The middleware and the resource provider see the sender of the pilot. However, late binding hides the end user. When accounting is done by the Gatekeeper, the VO is still accountable, but the sender of a pilot takes responsibility on behalf of end users. The gLExec mechanism maps individual grid users to local UNIX users. On grid sites that support gLExec, pilot agents have the option to run jobs under individual users' identities instead of the pilot identity [Sfiligoi et al. 2008]. On OSG, Fermilab execution nodes have had gLExec installed since 2006, and glideCAF has used it since then. gLExec has been included in gLite distributions since 2009. On grid sites that support gLExec, pilot senders can protect themselves from accountability for potential misuses or security risks caused by their users.

4.7.3. Interoperability between Grids and with Clouds

Definition 15. Computing clouds are Internet services that provide computing resources allocated on demand and dedicated for the time of the reservation.

Computing clouds are Internet service providers with Web-service interfaces to access resources and connect them. Amazon Web Services (aws.amazon.com) is one of the earliest and currently most prominent examples. Other popular clouds include Rackspace (rackspace.com) and, more recently, Windows Azure (windowsazure.com).

Resources are available immediately, and their connectivity is not restricted. Clouds do not make it harder than local clusters for VOs to manage reservation, co-allocation, priorities, dependencies, and interactive jobs. The only perceived differences are the differences in latency and bandwidth. Late binding achieves a similar result by connecting VOs and end-resources on grids.

Pilot agents and glide-ins can be sent to different grids and even to commercial clouds, as long as the sender knows how to obtain CPUs on these systems. The VO scheduler, to which they connect back, does not see whether the execution node is on a grid or on a cloud. The problem of interoperability between grids and clouds receives some attention in the scientific community [Kim et al. 2009]. Late binding lets execution nodes from various infrastructures interoperate. Computing clouds also offer other services, such as storage nodes or queuing services. Prior knowledge of their existence and their protocols is still required, and middleware is still necessary to keep track of them [Blanco et al. 2009].
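The following sketch illustrates this point: only the provisioning backend differs between a grid CE and a cloud provider, while the pilots and the VO scheduler stay the same. Both backends are placeholders; real broker commands and cloud provider APIs are not shown.

from abc import ABC, abstractmethod

class PilotBackend(ABC):
    # Anything that can start pilots which call back to the VO scheduler.
    @abstractmethod
    def start_pilots(self, count, pilot_cmd): ...

class GridBackend(PilotBackend):
    def __init__(self, ce):
        self.ce = ce
    def start_pilots(self, count, pilot_cmd):
        # In practice: build a job description and submit it through the site broker.
        print(f"submitting {count} pilot jobs running '{pilot_cmd}' to CE {self.ce}")

class CloudBackend(PilotBackend):
    def __init__(self, image):
        self.image = image
    def start_pilots(self, count, pilot_cmd):
        # In practice: call the provider API to boot instances whose start-up
        # script launches the pilot.
        print(f"booting {count} instances of image {self.image} running '{pilot_cmd}'")

def provision(backends, total, pilot_cmd):
    # Spread pilot requests across infrastructures; payloads are bound later.
    share, rest = divmod(total, len(backends))
    for i, backend in enumerate(backends):
        backend.start_pilots(share + (1 if i < rest else 0), pilot_cmd)

provision([GridBackend("ce.site-a.example.org"), CloudBackend("vo-pilot-image")],
          total=5, pilot_cmd="./pilot.sh")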

5. CONCLUSION

Grids are worldwide aggregations of computational resources and demand from multiple institutions. Systems that attempted to centrally orchestrate grid resources and users did not count on the autonomy of administrative domains. The need to avoid obtrusiveness confined them to a simplistic superposition of grid sites, instead of an efficient consolidation of resource management.

However, some applications generate enough liquidity to graft their own opportunistic task-mapping mechanisms. They have the interest and the capability to check acquired resources against their tasks. They operate fine-grained, dynamic mapping. They realize the move from resource allocation across autonomous applications to usage-centric allocation across autonomous resource providers, without a priori knowledge of resource topology.

ACKNOWLEDGMENTS

Many thanks to Sanjay Padhi (UoW, CMS), Predrag Buncic (CERN, ALICE), Maarten Litmaath (CERN, EGEE), Jakub Moscicki (CERN, LHCb, ATLAS) and Stuart Paterson (CERN, LHCb) for sharing their expertise, experience, and vision.

REFERENCES

ANDERLIK, C., GREGERSEN, A. R., KLEIST, J., PETERS, A., AND SAIZ, P. 2007. Alice - arc integration. In Proceedings of the International Conference on Computing in High Energy Physics.

ANDERSON, D. P. 2003. Public computing: Reconnecting people to science. In Proceedings of the Conference on Shared Knowledge and the Web.

ANDREETTO, P. 2004. Practical approaches to grid workload and resource management in the egee project. In Proceedings of the Conference on Computing in High Energy and Nuclear Physics (CHEP'04). Vol. 2, 899–902.

ANDREOZZI, S., BURKE, S., DONNO, F., FIELD, L., FISHER, S., JENSEN, J., KONYA, B., LITMAATH, M., MAMBELLI, M., SCHOPF, J. M., VILJOEN, M., WILSON, A., AND ZAPPI, R. 2007. Glue schema specification version 1.3. http://glueschema.forge.cnaf.infn.it/Spec/V13.

ANNIS, J., ZHAO, Y., VOECKLER, J., WILDE, M., KENT, S., AND FOSTER, I. 2002. Applying chimera virtual data concepts to cluster finding in the sloan sky survey. In Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing'02). IEEE Computer Society Press, Los Alamitos, CA. 1–14.

ANSTREICHER, K. M., BRIXIUS, N. W., GOUX, J.-P., AND LINDEROTH, J. 2000. Solving large quadratic assignment problems on computational grids. Tech. rep., MetaNEOS project, Iowa City, IA.

ASADZADEH, P., BUYYA, R., KEI, C. L., NAYAR, D., AND VENUGOPAL, S. 2004. Global grids and software toolkits: A study of four grid middleware technologies. Tech. rep. GRIDS-TR-2004-5, Grid Computing and Distributed Systems Laboratory, University of Melbourne, Australia. July 1.

AVERY, P. 2007. Open science grid: Building and sustaining general cyberinfrastructure using a collaborative approach. First Monday 12, 6.

BARANOVSKI, A., GARZOGLIO, G., KREYMER, A., LUEKING, L., MURTHI, V., MHASHIKAR, P., RATNIKOV, F., ROY, A., ROCKWELL, T., TANNENBAUM, S. S. T., TEREKHOV, I., WALKER, R., AND WUERTHWEIN, F. 2003. Management of grid jobs and information within samgrid. In Proceedings of the U.K. e-Science All Hands Meeting.

BECKLES, B., SON, S., AND KEWLEY, J. 2005. Current methods for negotiating firewalls for the condor system. In Proceedings of the 4th U.K. e-Science All Hands Meeting.

BELFORTE, S., HSU, S.-C., LIPELES, E., NORMAN, M., THWEIN, F. W., LUCCHESI, D., SARKAR, S., AND SFILIGOI, I. 2006. Glidecaf: A late binding approach to the grid. In Proceedings of the Conference on Computing in High Energy and Nuclear Physics (CHEP'06).

BLANCO, C. V., HUEDO, E., MONTERO, R. S., AND LLORENTE, I. M. 2009. Dynamic provision of computing resources from grid infrastructures and cloud providers. In Proceedings of the Workshops at the Grid and Pervasive Computing Conference (GPC'09). IEEE Computer Society, Washington, DC, 113–120.

BLAZEWICZ, J., DROZDOWSKI, M., AND MARKIEWICZ, M. 1999. Divisible task scheduling concept and verification. Parallel Comput. 25, 1, 87–98.

BODE, B., HALSTEAD, D. M., KENDALL, R., LEI, Z., AND JACKSON, D. 2000. The portable batch scheduler and the maui scheduler on linux clusters. In Proceedings of the 4th Annual Showcase & Conference (LINUX-00). The USENIX Association, Berkeley, CA, 217–224.

BOUZIANE, H. L., PEREZ, C., AND PRIOL, T. 2010. Extending software component models with the master-worker paradigm. Parallel Comput. 36, 86–103.

BUNCIC, P., PETERS, A. J., AND SAIZ, P. 2003. The alien system, status and perspectives. In Proceedings of the Conference on Computing in High-Energy Physics.

BUNCIC, P., PETERS, A. J., SAIZ, P., AND GROSSE-OETRINGHAUS, J. 2004. The architecture of the alien system. In Proceedings of the Conference on Computing in High Energy and Nuclear Physics (CHEP'04).

CASAJUS, A., GRACIANI, R., PATERSON, S., TSAREGORODTSEV, A., AND THE LHCB DIRAC TEAM. 2010. Dirac pilot framework and the dirac workload management system. J. Physics: Conf. Series 219, 6, 062049.

CASAVANT, T. L. AND KUHL, J. G. 1988. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans. Softw. Eng. 14, 2, 141–154.

CASTAGNERA, K., CHENG, D., FATOOHI, R., HOOK, E., KRAMER, B., MANNING, C., MUSCH, J., NIGGLEY, C., SAPHIR, W., SHEPPARD, D., SMITH, M., STOCKDALE, I., WELCH, S., WILLIAMS, R., AND YIP, D. 1994. Nas experiences with a prototype cluster of workstations. In Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing'94). ACM Press, New York, NY, 410–419.

CERUZZI, P. E. 1994. From batch to interactive: The evolution of computing systems, 1957-1969. In Proceedings of the IFIP 13th World Computer Congress. 279–284.

CHEN, W.-N. AND ZHANG, J. 2009. An ant colony optimization approach to a grid workflow scheduling problem with various qos requirements. Trans. Sys. Man Cyber Part C 39, 29–43.

CHERKASOVA, L., GUPTA, D., RYABINKIN, E., KURAKIN, R., DOBRETSOV, V., AND VAHDAT, A. 2006. Optimizing grid site manager performance with virtual machines. In Proceedings of the 3rd USENIX Workshop on Real Large Distributed Systems (WORLDS'06).

CHIEN, A. A. 2004. The Grid 2. Computing Elements 2nd Ed. Morgan Kaufman, Burlington, MA, Chapter 28, 567–591.

CODISPOTI, G., GRANDI, G., FANFANI, A., SPIGA, D., CINQUILLI, M., FARINA, F., MICCIO, E., FANZAGO, F., SCIABA, A., AND LACAPRARA, S. E. A. 2009. Use of the glite-wms in cms for production and analysis. In Proceedings of the 17th International Conference on Computing in High Energy and Nuclear Physics.

CROWCROFT, J. A., HAND, S. M., HARRIS, T. L., HERBERT, A. J., PARKER, M. A., AND PRATT, I. A. 2008. Futuregrid: A program for long-term research into grid systems architecture. Tech. rep., University of Cambridge.

CZAJKOWSKI, K., FOSTER, I., AND KESSELMAN, C. 2004. Resource and Service Management 2nd Ed. Morgan Kaufman, Burlington, MA, Chapter 18, 259–283.

CZAJKOWSKI, K., FOSTER, I., KESSELMAN, C., SANDER, V., S, V., AND TUECKE, S. 2002. Snap: A protocol for negotiating service level agreements and coordinating resource management in distributed systems. In Proceedings of the 8th Workshop on Job Scheduling Strategies for Parallel Processing. 153–183.

CZAJKOWSKI, K., FOSTER, I. T., AND KESSELMAN, C. 1999. Resource co-allocation in computational grids. In Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing (HPDC'99). IEEE Computer Society.

DEELMAN, E., SINGH, G., SU, M.-H., BLYTHE, J., GIL, Y., KESSELMAN, C., MEHTA, G., VAHI, K., BERRIMAN, G. B., GOOD, J., LAITY, A., JACOB, J. C., AND KATZ, D. S. 2005. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Sci. Program. 13, 3, 219–237.

ELLERT, M., GRONAGER, M., KONSTANTINOV, A., KONYA, B., LINDEMANN, J., LIVENSON, I., NIELSEN, J. L., NIINIMAKI, M., SMIRNOVA, O., AND WAANANEN, A. 2007. Advanced resource connector middleware for lightweight computational grids. Future Gener. Comput. Syst. 23, 2, 219–240.

ELMROTH, E. AND TORDSSON, J. 2009. A standards-based grid resource brokering service supporting advance reservations, coallocation, and cross-grid interoperability. Concurrency Comput. Pract. Exp. 21, 18, 2298–2335.

FIUCZYNSKI, M. E. 2006. Planetlab: Overview, history, and future directions. Oper. Syst. Rev. 40, 1, 6–10.

FOSTER, I., KESSELMAN, C., NICK, J., AND TUECKE, S. 2002. The physiology of the grid: An open grid services architecture for distributed systems integration. In Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid.

FOSTER, I., KESSELMAN, C., AND TUECKE, S. 2001. The anatomy of the grid: Enabling scalable virtual organizations. Int. J. Supercomput. Appl. 15, 3.

FOSTER, I. T. 2005. Service oriented science. Science 308, 5723, 214–217.

FOSTER, I. T. 2006. Globus toolkit version 4: Software for service-oriented systems. In Proceedings of the IFIP International Conference on Network and Parallel Computing. Lecture Notes in Computer Science, vol. 3779, Springer-Verlag, 2–13.

FREY, J., TANNENBAUM, T., FOSTER, I., LIVNY, M., AND TUECKE, S. 2002. Condor-G: A computation management agent for multi-institutional grids. Cluster Comput. 5, 237–246.

GEORGIOU, Y. AND RICHARD, O. 2009. Grid5000: An experimental grid platform for computer science. Tech. rep., MESCAL, Laboratoire Informatique et Distribution (ID)-IMAG.

GLATARD, T. AND CAMARASU-POP, S. 2010. Modelling pilot-job applications on production grids. In Proceedings of the International Conference on Parallel Processing (Euro-Par'09). Springer-Verlag, Berlin, Heidelberg, 140–149.

GLATARD, T., LINGRAND, D., MONTAGNAT, J., AND RIVEILL, M. 2007a. Impact of the execution context on grid job performances. In Proceedings of the International Workshop on Context-Awareness and Mobility in Grid Computing (WCAMG'07). IEEE, 713–718.

GLATARD, T., MONTAGNAT, J., AND PENNEC, X. 2007b. Optimizing jobs timeouts on clusters and production grids. In Proceedings of the International Symposium on Cluster Computing and the Grid (CCGrid'07). IEEE, 100–107.

GOUX, J.-P., LINDEROTH, J., AND YODER, M. 2000. Metacomputing and the master-worker paradigm. Tech. rep., Argonne National Labs. Oct. 17.

GRAHAM, G., CAVANAUGH, R., COUVARES, P., SMET, A. D., AND LIVNY, M. 2004. The Grid 2. Distributed Data Analysis: Federated Computing for High-Energy Physics 2nd Ed. Morgan Kaufman, Burlington, MA, Chapter 10, 136–145.

GREHANT, X., PERNET, O., JARP, S., DEMEURE, I., AND TOFT, P. 2007. Xen management with smartfrog: On-demand supply of heterogeneous, synchronized execution environments. In Proceedings of the Workshop on Virtualization in High-Performance Cluster and Grid Computing (VHPC '07). Lecture Notes in Computer Science, vol. 4854, Springer.

HART, D. L. 2011. Measuring teragrid: Workload characterization for a high-performance computing federation. Int. J. High Perform. Comput. Appl.

HARWOOD, A. 2003. Networks and Parallel Processing Complexity. Melbourne School of Engineering, Department of Computer Science and Software Engineering.

HUMPHREY, M. 2006. Altair's PBS - altair's PBS professional update. In Proceedings of the ACM/IEEE Conference on Super Computing (SC). ACM Press, 28.

KASAM, V., ZIMMERMANN, M., MAASS, A., SCHWICHTENBERG, H., WOLF, A., JACQ, N., BRETON, V., AND HOFMANN-APITIUS, M. 2007. Design of new plasmepsin inhibitors: A virtual high throughput screening approach on the egee grid. J. Chem. Inf. Model. 47, 5, 1818–28.

KEAHEY, K., DOERING, K., AND FOSTER, I. 2004. From sandbox to playground: Dynamic virtual environments in the grid. In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing (GRID'04). IEEE Computer Society, Washington, DC, 34–42.

KIM, H., EL KHAMRA, Y., JHA, S., AND PARASHAR, M. 2009. An autonomic approach to integrated hpc grid and cloud usage. In Proceedings of the 5th IEEE International Conference on e-Science (E-Science'09). IEEE Computer Society, Washington, DC, 366–373.

KIM, J.-K., SHIVLE, S., SIEGEL, H. J., MACIEJEWSKI, A. A., BRAUN, T. D., SCHNEIDER, M., TIDEMAN, S., CHITTA, R., DILMAGHANI, R. B., JOSHI, R., KAUL, A., SHARMA, A., SRIPADA, S., VANGARI, P., AND YELLAMPALLI, S. S. 2007. Dynamically mapping tasks with priorities and multiple deadlines in a heterogeneous environment. J. Parallel Distrib. Comput. 67, 2, 154–169.

KRANZLMULLER, D. 2009. The future european grid infrastructure roadmap and challenges. In Proceedings of the Information Technology Interfaces.

LAURE, E., HEMMER, F., AIMAR, A., BARROSO, M., BUNCIC, P., MEGLIO, A. D., GUY, L., KUNSZT, P., BECO, S., PACINI, F., PRELZ, F., SGARAVATTO, M., EDLUND, A., MULMO, O., GROEP, D., FISHER, S., AND LIVNY, M. 2004. Middleware for the next generation grid infrastructure. In Proceedings of Computing in High Energy Physics.

LEE, H.-C., HO, L.-Y., CHEN, H.-Y., WU, Y.-T., AND LIN, S. C. 2006. Efficient handling of large scale in-silico screening using diane. In Proceedings of the Enabling Grids for E-Science Conference (EGEE'06).

LEGRAND, A., SU, A., AND VIVIEN, F. 2006. Minimizing the stretch when scheduling flows of biological requests. In Proceedings of the Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'06). ACM Press, New York, NY, 103–112.

LIN, S. C. AND YEN, E., Eds. 2010. Data Driven E-science: Use Cases and Successful Applications of Distributed. Springer.

LITMAATH, M. 2007. Glite job submission chain v.1.2. http://litmaath.home.cern.ch/ litmaath/UI-WMS-CE-WN.

MAIGNE, L., HILL, D., CALVAT, P., BRETON, V., REUILLON, R., LAZARO, D., LEGRE, Y., AND DONNARIEIX, D. 2004. Parallelization of monte carlo simulations and submission to a grid environment. Parallel Process. Lett. 14, 2, 177–196.

MACHADO, M. 2004. Enable existing applications for grid: Batch anywhere, independent concurrent batch, and parallel batch. Tech. rep., IBM. June.

MILOJICIC, D. S., DOUGLIS, F., PAINDAVEINE, Y., WHEELER, R., AND ZHOU, S. 2000. Process migration. ACM Comput. Surv. 32, 3, 241–299.

MIURA, K. 2006. Overview of japanese science grid project naregi. Progress in Informatics 3, 1349-8614, 67–75.

MOSCICKI, J. T. 2006. Efficient job handling in the grid: short deadline, interactivity, fault tolerance and parallelism. In Proceedings of the EGEE User Forum.

MOSCICKI, J. 2003. Diane - distributed analysis environment for grid-enabled simulation and analysis of physics data. In Proceedings of the Nuclear Science Symposium Conference on Record (NSS). IEEE.

MOSCICKI, J., LEE, H. C., GUATELLI, S., LIN, S., AND PIA, M. G. 2004. Biomedical applications on the grid: Efficient management of parallel jobs. In Proceedings of the IEEE Nuclear Science Symposium Conference on Record (NSS). IEEE.

MOSCICKI, J., LAMANNA, M., BUBAK, M., AND SLOOT, P. 2011. Processing moldable tasks on the grid: Late job binding with lightweight user-level overlay. Int. J. Grid Comput. Theory, Methods Appl. (FGCS).

MOSCICKI, J. 2011. Understanding and mastering dynamics in computing grids: Processing moldable tasks with user-level overlay. Ph.D. dissertation, FNWI: Informatics Institute (II).

NAKADA, H., YAMADA, M., ITOU, Y., MATSUOKA, S., FREY, J., AND NAKANO, Y. 2005. Design and implementation of condor-unicore bridge. In Proceedings of the 18th International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05). IEEE Computer Society, Washington, DC, 307.

NILSSON, P. 2007. Experience from a pilot based system for atlas. In Proceedings of the Computing in High Energy Physics (CHEP'07).

NISHANDAR, A., LEVINE, D., JAIN, S., GARZOGLIO, G., AND TEREKHOV, I. 2005. Extending the cluster-grid interface using batch system abstraction and idealization. In Proceedings of the International Symposium on Cluster, Cloud and Grid Computing.

PATERSON, S., SOLER, P., AND PARKES, C. 2006. Lhcb distributed data analysis on the computing grid. Ph.D. dissertation, University of Glasgow, Scotland.

PENNINGTON, R. 2002. Terascale clusters and the teragrid. In Proceedings of the International Conference on High Performance Computing and Grid in Asia Pacific Region. 407–413.

PETERSON, L., BAVIER, A., FIUCZYNSKI, M., MUIR, S., AND ROSCOE, T. 2005. Towards a comprehensive PlanetLab architecture. Tech. rep. PDN–05–030, PlanetLab Consortium. June.

RAMAN, R., LIVNY, M., AND SOLOMON, M. 1998. Matchmaking: Distributed resource management for high throughput computing. In Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing.

RANJAN, R. 2007. Coordinated resource provisioning in federated grids. Ph.D. dissertation, The University of Melbourne, Australia.

REBATTO, D. 2005. Egee batch local ascii helper (blahp). In Proceedings of the HEPiX Meeting.

RICCI, R., OPPENHEIMER, D. L., LEPREAU, J., AND VAHDAT, A. 2006. Lessons from resource allocators for large-scale multiuser testbeds. Oper. Syst. Rev. 40, 1, 25–32.

ROBERT, Y. E. A. 2003. Grid'5000 plateforme de recherche experimentale en informatique. Tech. rep. Inria, July.

RUDA, M. 2001. Integrating grid tools to build a computing resource broker: Activities of datagrid wp1. In Proceedings of the Conference on Computing in High Energy Physics (CHEP'01).

SAIZ, P., APHECETCHE, L., BUNCIC, P., PISKAC, R., REVSBECH, J. E., AND SEGO, V. 2003. Alien - alice environment on the grid. Nucl. Instrum. Meth. A502, 437–440.

SAKANE, E., HIGASHIDA, M., AND SHIMOJO, S. 2009. An application of the NAREGI grid middleware to a nationwide joint-use environment for computing. In High Performance Computing on Vector Systems 2008, M. Resch, S. Roller, K. Benkert, M. Galle, W. Bez, H. Kobayashi, and T. Hirayama, Eds. Springer Berlin Heidelberg, 55–64.

SAVVA, A., ANJOMSHOAA, A., BRISARD, F., DRESCHER, M., FELLOWS, D., LY, A., MCGOUGH, S., AND PULSIPHER, D. 2005. Job submission description language (jsdl) specification. http://forge.gridforum.org/projects/jsdl-wg. GFD-R.056.

SFILIGOI, I. 2007. glideinwms - a generic pilot-based workload management system. In Proceedings of the Conference on Computing in High Energy Physics (CHEP'07).

SFILIGOI, I., BRADLEY, D. C., HOLZMAN, B., MHASHILKAR, P., PADHI, S., AND WURTHWEIN, F. 2009. The pilot way to grid resources using glideinwms. In Proceedings of the WRI World Congress on Computer Science and Information Engineering Vol. 02. IEEE Computer Society, Washington, DC, 428–432.

SFILIGOI, I., QUINN, G., GREEN, C., AND THAIN, G. 2008. Pilot job accounting and auditing in open science grid. In Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (GRID'08). IEEE Computer Society, Washington, DC, 112–117.

SON, S. AND LIVNY, M. 2003. Recovering internet symmetry in distributed computing. In Proceedings of the 3rd International Symposium on Cluster Computing and the Grid (CCGRID'03). IEEE Computer Society, Washington, DC, 542.

STREIT, A., BALA, P., BECK-RATZKA, A., BENEDYCZAK, K., BERGMANN, S., BREU, R., DAIVANDY, J., DEMUTH, B., EIFER, A., GIESLER, A., HAGEMEIER, B., HOLL, S., HUBER, V., LAMLA, N., MALLMANN, D., MEMON, A., MEMON, M., RAMBADT, M., RIEDEL, M., ROMBERG, M., SCHULLER, B., SCHLAUCH, T., SCHREIBER, A., SODDEMANN, T., AND ZIEGLER, W. 2010. Unicore 6 recent and future advancements. Ann. Telecommun. 65, 757–762. DOI: 10.1007/s12243-010-0195-x.

TAN, W.-J., CHING, C. T. M., CAMARASU-POP, S., CALVAT, P., AND GLATARD, T. 2010. Two experiments with application-level quality of service on the egee grid. In Proceedings of the 2nd Workshop on Grids Meets Autonomic Computing (GMAC'10). ACM, New York, NY, 11–20.

TEREKHOV, I. 2002. Meta-computing at d0. Nuclear Instruments and Methods in Physics Research (ACAT-02) 502/2-3, NIMA14225, 402–406.

THAIN, D., TANNENBAUM, T., AND LIVNY, M. 2002. Condor and the grid. In Grid Computing: Making the Global Infrastructure a Reality. John Wiley & Sons Inc, Hoboken, NJ.

THE GRIDPP COLLABORATION. 2006. Gridpp: Development of the U.K. computing grid for particle physics. J. Physics G: Nuclear Particle Physics 32, N1–N20.

TSAREGORODTSEV, A., GARONNE, V., AND STOKES-REES, I. 2004. Dirac: A scalable lightweight architecture for high throughput computing. In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing (GRID'04). IEEE Computer Society, Washington, DC, 19–25.

VAN HERWIJNEN, E., CLOSIER, J., FRANK, M., GASPAR, C., LOVERRE, F., PONCE, S., GRACIANI DIAZ, R., GALLI, D., MARCONI, U., VAGNONI, V., BROOK, N., BUCKLEY, A., HARRISON, K., SCHMELLING, M., EGEDE, U., TSAREGORODTSEV, A., GARONNE, V., BOGDANCHIKOV, B., KOROLKO, I., WASHBROOK, A., PALACIOS, J. P., KLOUS, S., SABORIDO, J. J., KHAN, A., PICKFORD, A., SOROKO, A., ROMANOVSKI, V., PATRICK, G., KUZNETSOV, G., AND GANDELMAN, M. 2003. Dirac - distributed infrastructure with remote agent control. In Proceedings of the Conference on Computing in High Energy Physics.

WENAUS, T., LIVNY, M., AND WURTHWEIN, F. K. 2006. Preliminary plans for just-in-time workload management in the osg extensions program. Tech. rep., US Atlas. October. Based on SAP proposal of March 2006.

XIAO, L., ZHANG, X., AND QU, Y. 2000. Effective load sharing on heterogeneous networks of workstations. In IPPS: 14th International Parallel Processing Symposium. IEEE Computer Society Press, Los Alamitos, 431–438.

ZHOU, S., ZHENG, X., WANG, J., AND DELISLE, P. 1993. Utopia: A load sharing facility for large, heterogeneous distributed computer systems. Softw. Pract. Exp. 23, 12, 1305–1336.

ZVADA, M., BENJAMIN, D., AND SFILIGOI, I. 2010. Cdf glideinwms usage in grid computing of high energy physics. J. Physics: Conf. Series 219.

Received May 2010; revised April 2011, February 2012; accepted April 2012
