Grid resource management policies for load-balancing and energy-saving by vacation queuing theory

Computers and Electrical Engineering 35 (2009) 966–979



Yin Fei *, Jiang Changjun, Deng Rong, Yuan Jianjun

Department of Computer Science and Engineering, Tongji University, 4800 Caoan Road, Shanghai 201804, China
The Key Laboratory of "Embedded System and Service Computing", Ministry of Education, 4800 Caoan Road, Shanghai 201804, China

Article info

Article history: Available online 10 March 2009

Keywords: Grid; Resource management; Load-balancing; Energy-saving; M/G/1 vacation queue with closedown, startup

0045-7906/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compeleceng.2008.09.008

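* Corresponding author. Address: Department of Computer Science and Technology, No. 13 Dormitory, Caoan Road 4800, Tongji University Jiading Campus, Shanghai 201804, China. Tel.: +86 13601747205; fax: +86 21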

E-mail addresses: [email protected], hhaa

Abstract

Resource management is the central component of a grid system. Analysis of the LCG workload log file, including the daily cycles of job arrival and resource utilization, shows that idle sites in the Grid are a source of load imbalance and energy waste. Here we focus on these two issues: balancing the workload by transferring jobs to idle sites at prime time, to minimize the response time and maximize the resource utilization; and power management by switching idle sites to sleeping mode at non-prime time, to minimize the energy consumption. We form an M/G/1 queue model with server vacations, startup and closedown to analyze the performance metrics that guide the design of the load-balancing and energy-saving policies. We provide our Adaptive Receiver Initiated (ARI) load-balancing strategy and a power-management policy for energy-saving. The simulation experiments confirm the accuracy of our analysis, and the comparison results indicate that our policies are suitable for large-scale heterogeneous grid environments.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

A Grid [1] is a very large-scale, generalized distributed network computing system that can scale to Internet-size environments with machines distributed across multiple organizations and administrative domains. The emergence of a variety of new applications demands that Grids support efficient data and resource management mechanisms. Resource management [2] is the central component of a grid system. Its basic responsibility is to accept requests from users, match user requests to available resources to which the user has access, and schedule the matched resources. Applications may request resources from the Grid. Such resource requests are considered as jobs by the Grid. Depending on the application, the job may specify quality of service (QoS) requirements. The resource manager is required to make resource management decisions while maximizing the QoS metrics delivered to the clients [3].

With the Grid becoming a viable high-performance alternative to the traditional supercomputing environment, various aspects of effective Grid resource utilization are gaining significance. With its multitude of heterogeneous resources, proper scheduling and efficient load-balancing across the Grid are required to improve the performance of the system. Due to uneven job arrival patterns and unequal processing capacities and capabilities, the processors in one grid site may be overloaded while others in a different grid site may be under-utilized. It is therefore desirable to dispatch jobs to idle or lightly loaded sites in the grid environment to achieve better resource utilization and reduce the average job response time. This is a natural extension of the existing work on load-balancing in traditional distributed systems.


Grid computing tends to push for high performance. Unfortunately, the "last drop" of performance tends to be the most expensive; that is, the last 10% increase in performance requires a disproportionately large amount of resources. The Earth Simulator, one of the world's fastest supercomputers with 640 computing sites, consumes 7 MW of power [4]. In particular, energy consumption, and the resultant heat dissipation, is becoming an important limiting factor; reducing energy consumption saves money and increases reliability, among other things.

In this paper, we focus on two aspects of resource management in the grid environment: load-balancing and energy-saving. By analyzing the Large Hadron Collider Computing Grid (LCG) log file [5], we learn the statistical properties of the daily job arrival cycle, which shows an activity peak around midday. Even in that period, more than 10% of the sites remain idle, which is a source of load imbalance. Meanwhile, few jobs arrive at non-prime time and at least 40% of the sites are idle, which makes energy consumption a critical concern. Queuing systems in which the server works on primary and secondary customers arise naturally. As far as the primary customers are concerned, the server working on the secondary customers is equivalent to the server taking a vacation [6]. The vacation can be interpreted as an idle site receiving load for load-balancing, or as an idle site being switched to sleeping mode for energy-saving. We form an M/G/1 queue model with server vacations, startup and closedown to analyze the performance metrics, and we present our policies based on this analysis. The Adaptive Receiver Initiated (ARI) load-balancing strategy for the grid environment considers the job migration cost, resource heterogeneity and network dynamics. Power management is an energy-saving policy that explores the tradeoff between site energy consumption and the satisfaction of QoS requests. The simulation experiments demonstrate the performance of our policies.

The rest of the paper is organized as follows. In Section 2, we present the related work on load-balancing and energy-saving in recent years. In Section 3, we analyze the LCG log file. In Section 4, we develop a novel queuing analytical model. In Section 5, we introduce our two policies, and we give the experimental results in Section 6. Finally, Section 7 concludes the paper.

2. Related work

In general, any load-balancing algorithm consists of four basic policies: a transfer policy, a selection policy, a location policy, and an information policy [7]. The transfer policy decides if there is a need to initiate load-balancing across the system. By using workload information, it determines when a site becomes eligible to act as a sender (transfer a job to another site) or as a receiver (retrieve a job from another site). Once the transfer policy decides that a site is a sender, a selection policy selects a task for transfer. Many studies (e.g. [8,9]) have shown that (1) migration of an executing job is often difficult in practice, (2) the operation is generally expensive in most systems, and (3) there are no significant benefits of such a mechanism. The location policy determines a suitably underloaded processor. In other words, it locates complementary sites to/from which a site can send/receive workload to improve the overall system performance. Location-based policies can be broadly classified as sender initiated, receiver initiated, or symmetrically initiated [7,10–13]. In Ref. [14], five dynamic load-balancing strategies designed to support highly parallel systems are presented and compared. They are the Sender (Receiver) Initiated Diffusion (SID/RID), the Hierarchical Balancing Method (HBM), which organizes the system into a hierarchy of subsystems, the Gradient Model (GM), which employs a gradient map to guide the migration of tasks, and the Dimension Exchange Method (DEM), which requires a synchronization phase prior to load-balancing. The results indicate that the RID approach performs well and can most easily be scaled to support highly parallel systems. Further, while balancing the load, certain types of information, such as the number of jobs waiting in queue, the job arrival rate, and the CPU processing rate at each processor as well as at neighboring processors, may be exchanged among the processors to improve the overall performance.

Based on the information that can be used, load-balancing algorithms are classified as static, dynamic, or adaptive [7,15–17]. Based on the degree of centralization, load-scheduling algorithms can be classified as centralized or decentralized [7,17]. In the centralized approach, the load scheduling is done by only one site in the distributed system, which acts as the central controller. It has a global view of the load information in the system and decides how to allocate jobs to each of the sites. The rest of the sites execute the jobs assigned by the controller. The centralized approach is more beneficial when the communication cost is less significant, e.g., in a shared-memory multi-processor environment. Many authors argue that this approach is not scalable, because when the system size increases, the central controller may become a system bottleneck and a single point of failure. In the decentralized approach, all sites in the distributed system are involved in making the decision. It is commonly agreed that distributed algorithms are more scalable and have better fault tolerance. However, since the load-balancing decisions are distributed, it is costly to let each site obtain the dynamic state information of the whole system, and decentralized algorithms suffer from the communication overhead incurred by frequent information exchange between processors. Most algorithms [13,18] therefore use only partial information stored in the local site to make a suboptimal decision.

As to the energy-saving issue, power management techniques have been studied extensively in the context of CPU, memory and disk management [19,20]. Nowadays, a major research issue in distributed mobile computing is the design of efficient mechanisms for minimizing energy consumption in wireless terminals. A Markov model to analyze the sleeping/active dynamics in sensor nodes was developed in [21,22]. In high-performance computing, low-power high-performance clusters have been developed to stem the ever-increasing demand for energy. Ref. [23] investigates the tradeoff between energy and performance (execution time) for HPC applications on a real small-scale power-scalable cluster.


3. Workload analysis of the grid log file

We put forward our resource management policies for the grid environment based on the statistical properties learned from a real-world log file. We take the LCG [24] as an example. The LCG testbed has approximately 170 active sites with a total of 24,515 CPUs and 3 Pbytes of storage, and is primarily used for high-energy physics data processing. The LCG data was graciously provided by the e-Science Group of HEP at Imperial College London. The Real Time Monitor developed by Imperial College London monitors jobs from all the major RBs in the LCG testbed. The log file we used contains 11 days (from November 20 to 30, 2005) of activity from the multiple sites that make up the LCG. The file contains one line per completed job with the following fields: UNIX timestamp of submit time, group name, user ID, compute element name, and job runtime. Workload analysis makes it possible to obtain a model of user behavior. Such a model is essential for understanding how the different parameters change the resource center usage.

3.1. Job arrival time

The users of grids have their own habits of requesting resources and submitting jobs, which are referred to as patterns. Here, we take the daily cycle as an example. The daily cycle refers to the habit of submitting more jobs during daytime than at night, and to the considerably distinct submission distributions during the day and the night. Fig. 1 shows the daily arrival patterns of jobs for LCG. There is an obvious daily cycle: fewer jobs are submitted during the night, and there is an activity peak around midday, 3 pm–6 pm. These patterns might blur in grid environments because users live in different time zones; however, they are still important to the local sites (and the local schedulers). Similar patterns can be found through the week, e.g., users tend to submit more jobs on weekdays than at the weekend.
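The daily-cycle analysis can be illustrated with a minimal sketch that bins job submit times by hour of day. It assumes a whitespace-separated log whose first field is the UNIX submit timestamp, as in the LCG file described in Section 3; the function name `hourly_arrivals`, the sample lines and the choice of UTC are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter
from datetime import datetime, timezone

def hourly_arrivals(lines):
    """Count jobs per hour of day (0-23) from log lines.

    Assumes the first whitespace-separated field of each line is the
    UNIX submit timestamp; hours are taken in UTC (an assumption).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ts = int(float(fields[0]))
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour] += 1
    return [counts.get(h, 0) for h in range(24)]

# Hypothetical sample: timestamp, group, user, compute element, runtime
sample = [
    "1132498800 atlas u1 ce01 120",  # 2005-11-20 15:00:00 UTC
    "1132500600 cms u2 ce02 300",    # 2005-11-20 15:30:00 UTC
    "1132455600 lhcb u3 ce03 60",    # 2005-11-20 03:00:00 UTC
]
profile = hourly_arrivals(sample)
```

Plotting `profile` for each weekday would give a curve of the kind summarized in Fig. 1; for a real log, the local time zone of each site may be the more meaningful choice.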

3.2. Fraction of grid sites state

As the daily cycle shows, most jobs arrive during the day and only a few at night, so the sites in the Grid are obviously busier during daytime than at night. Even at prime time, more than 10% of the sites are idle, as shown in Fig. 2. In this circumstance, it may happen that some jobs wait for service in the queue of one resource while at the same time another capable resource is idle. A load-balancing algorithm whose goal is to minimize the expected turnaround time of the tasks will tend to prevent the system from reaching such a state.

According to Ref. [25], the probability that the system is in a state in which at least one job waits for service and at least one server is idle is

$$p_{ib} = \sum_{i=1}^{M} C_M^i Q_i H_{M-i} = (1 - p_0^M)\left(1 - p_0^M - (1 - p_0)^M\right) \tag{1}$$

For a large-scale distributed system such as the Grid, the number of sites $M$ is huge:

$$\lim_{M \to \infty} p_{ib} = \lim_{M \to \infty} (1 - p_0^M)\left(1 - p_0^M - (1 - p_0)^M\right) = 1 \tag{2}$$

So the idle processors in the grid environment will result in load imbalance.
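A short numerical sketch of Eqs. (1) and (2) follows. It evaluates the closed form as printed for growing $M$; `p0` is taken to be the probability that a single site is idle (an assumption, since the symbol is not defined in this section), and the function name and the value 0.4 are illustrative.

```python
def prob_idle_while_waiting(p0, m):
    """Closed form of Eq. (1): (1 - p0^M) * (1 - p0^M - (1 - p0)^M)."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    return (1 - p0**m) * (1 - p0**m - (1 - p0)**m)

# Hypothetical idle probability p0 = 0.4, evaluated for growing system sizes
values = {m: prob_idle_while_waiting(0.4, m) for m in (2, 8, 32, 128)}
```

For any fixed `p0` strictly between 0 and 1, both `p0**m` and `(1 - p0)**m` vanish as `m` grows, which is the limit stated in Eq. (2).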

[Figure: number of arriving jobs (0–8000) versus hour of day (1–24), one series per day, Monday to Sunday.]

Fig. 1. Job arrival daily cycle for one week.

[Figure: fraction of grid sites (0%–100%) versus hour of day (1–23), split by workload level: high, med, low, idle.]

Fig. 2. Fraction of grid sites by workload in one day.


At non-prime time, fewer jobs are submitted and most sites are idle. Although users of grid computing are mostly interested in performance, energy and power consumption have become critical concerns. For a large-scale distributed system like the Grid, reducing energy consumption saves money and increases reliability. Undoubtedly, a tradeoff exists between site energy-saving and grid performance.

For the idle sites in the Grid, we focus on the following two issues:

(i) transferring jobs to idle sites at prime time for workload-balancing;
(ii) switching the idle sites to sleeping mode at non-prime time for energy-saving.

4. Vacation queuing model

Queuing systems with server vacations have attracted much attention from numerous researchers [6]. Server vacations are useful for systems in which the server wants to utilize its idle time for different purposes. Consider a processor in a computer or communication system that is used to produce a variety of items. When it becomes idle, it undertakes secondary tasks to utilize the idle time. While the vacation is going on, any items arriving at the processor have to wait. When the processor is serving the secondary tasks, it is said to be on vacation for the primary tasks. Here, the start of a vacation depends on the state of the queue (it happens only when the queue becomes empty after a busy period). Typical questions of interest are: how do the secondary tasks affect the waiting time of the primary tasks? How long a vacation should be scheduled after the end of each busy period?

Given the statistical properties of site status in the grid environment, the problem of utilizing the sites' idle fraction arises. A queuing system with vacations can be used to describe the utilization of the idle fraction of a site for different purposes. Since arrival patterns and service rates can be estimated by code profiling and statistical prediction, it is assumed in this study that the arrival pattern and service rate are known a priori. Without loss of generality, we assume that jobs arrive randomly at a site and that the inter-arrival time is exponentially distributed with mean $1/\lambda$. However, utilizing the idle fraction comes at a cost: increased average queue length and response time for newly arriving jobs. A tradeoff exists between the utilization of the site idle fraction and the job QoS requirements. Here, we develop a novel queuing analytical model which enables us to explore the tradeoff and to investigate the performance.

4.1. System model

We consider a queuing architecture of a grid system in which $M$ sites are connected via a network to process independent jobs submitted by users. The jobs arrive randomly at a site, the inter-arrival time being exponentially distributed with mean $1/\lambda$. The service times of jobs are independent and identically distributed random variables with a common distribution function $S(t)$ and first and second moments $E[S]$ and $E[S^2]$. Each site is modeled as an M/G/1 Markov chain, with the number of jobs queued for processing at each processor representing the state of the system. All arriving jobs are processed in a first-come first-served manner. For the system to remain stable, the traffic intensity must be less than 1 ($\rho = \lambda E[S] < 1$).

Because switching the state of a processing site is costly and inefficient, the N-policy M/G/1 vacation queuing model with closedown and startup times is used here to describe how to utilize the site's idle fraction for different purposes at different periods of the day [26,27]. We consider the exhaustive service regime, i.e., once the server has started serving jobs, it continues to serve the queue until the queue empties.

Because arrivals are Poisson, the queue regenerates each time it empties and the cycles are i.i.d. Each regeneration cycle consists of four periods, as shown in Fig. 3:

(i) Closedown period: the site sets up a timer with timeout value $C$ at the end of a busy period. If no job arrives before the timer expires, the site takes a vacation. The probability that no job arrives during the closedown interval is $c_0 = \int_0^\infty e^{-\lambda t}\, dC(t) = c^*(\lambda)$.

Fig. 3. N policy vacation M/G/1 queuing with closedown and startup.


(ii) Vacation period: when the site on vacation finds $N$ jobs waiting in the queue, it terminates the vacation. Let $V$ denote the length of the vacation period.

(iii) Startup period, a random time interval: the startup time elapses before the service activity is actually resumed. The distribution and generating function pertaining to the startup time are $u_j = \int_0^\infty \frac{(\lambda t)^j}{j!} e^{-\lambda t}\, dU(t),\ j \ge 0$, and $U^*(\lambda(1 - z)) = \sum_{j=0}^{\infty} u_j z^j$, with the first two moments $E[U]$ and $E[U^2]$.

(iv) Busy period: the queue size at the beginning of a busy period impacts the duration of this busy period.

4.2. The initial queue size

Let $L_n$ denote the number of jobs left behind by the $n$th departing job; $\{L_n, n \ge 1\}$ is the embedded Markov chain of the queue length process $L_v(t)$. Here,

$$L_{n+1} = \begin{cases} L_n - 1 + A, & L_n \ge 1 \\ Q_b - 1 + A, & L_n = 0 \end{cases}$$

where $A$ is the number of jobs that arrive during a service time, with distribution and generating function

$$a_j = \int_0^\infty \frac{(\lambda t)^j}{j!} e^{-\lambda t}\, dB(t),\quad j \ge 0; \qquad A^*(\lambda(1 - z)) = \sum_{j=0}^{\infty} a_j z^j$$

$Q_b$ is the number of jobs present when the processor begins to serve. $\{Q_b = 1\}$ means that one job arrives during the closedown period; in that case, the site serves jobs until it becomes idle again. If no job arrives during the closedown period, the number of jobs present when the processor starts service includes the jobs that arrive during the vacation and those that arrive during the startup period, $Q_b = Q_V + Q_U$. The site does not start up until $N$ jobs are waiting in the queue. The probability distribution of $Q_b$ is

$$b_j = P\{Q_b = j\} = \begin{cases} 1 - c^*(\lambda), & j = 1 \\ 0, & 1 < j < N \\ c^*(\lambda)\, u_{j-N}, & j \ge N \end{cases}$$

The probability generating function of $Q_b$ is

$$Q_b(z) = \sum_{j=0}^{\infty} b_j z^j = (1 - c^*(\lambda)) z + c^*(\lambda) z^N U^*(\lambda(1 - z))$$

$$E[Q_b] = 1 - c^*(\lambda) + c^*(\lambda)(N + \lambda E[U]) \tag{3}$$
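As a small worked instance of Eq. (3), the sketch below computes the expected initial queue size. It assumes a deterministic closedown timer, for which the no-arrival probability reduces to $e^{-\lambda C}$; the function names and the numbers are illustrative, not taken from the paper.

```python
import math

def closedown_no_arrival_prob(lam, timeout):
    """c*(lambda) for a deterministic closedown timer of length `timeout`.

    Assumption: with Poisson arrivals of rate lam, no job arrives within a
    fixed interval of length `timeout` with probability exp(-lam * timeout).
    """
    return math.exp(-lam * timeout)

def expected_initial_queue(lam, n_threshold, c_star, mean_startup):
    """E[Q_b] from Eq. (3): 1 - c* + c* (N + lam E[U])."""
    return (1 - c_star) + c_star * (n_threshold + lam * mean_startup)

# Hypothetical values: lam = 2 jobs/s, timer 0.5 s, N = 5, E[U] = 0.25 s
c_star = closedown_no_arrival_prob(2.0, 0.5)
e_qb = expected_initial_queue(2.0, 5, c_star, 0.25)
```

The two extremes are easy to check by hand: with `c_star = 0` every cycle starts with a single job, and with `c_star = 1` every cycle starts with the $N$ accumulated jobs plus the arrivals during startup.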

4.3. The expected length of busy period and idle period

The duration of a busy period is determined by the queue size at its beginning. If there are $m$ jobs in the system at the beginning of a busy period, the subsequent busy period consists of $m$ independent busy periods, each denoted by $B_i$. All $B_i$ are i.i.d. and have the same distribution as the busy period of an M/G/1 queue. The LST of the busy period is

$$B_v^*(s) = (1 - c^*(\lambda)) B^*(s) + c^*(\lambda) [B^*(s)]^N U^*(\lambda(1 - B^*(s)))$$

where $B^*(s)$ is the LST of the M/G/1 busy period, with mean $E[B] = E[S](1 - \rho)^{-1}$:

$$E[B_v] = E[E[B \mid Q_b]] = E[Q_b E[B]] = E[B] E[Q_b] = \{(1 - c^*(\lambda)) + c^*(\lambda)[\lambda E[U] + N]\}\, \frac{E[S]}{1 - \rho} \tag{4}$$

The vacation lasts until $N$ arrivals have occurred, so its LST is $V^*(s) = \left(\frac{\lambda}{\lambda + s}\right)^N$ and

$$E[V] = \frac{N}{\lambda} \tag{5}$$


If a job arrives during the closedown time, service starts immediately, without waiting for the accumulation of $N$ jobs and without a startup time. The mean length of a closedown period ended by a job arrival is

$$E[C] = (1 - c^*(\lambda)) \lambda^{-1} \tag{6}$$

Otherwise, the site takes a vacation until $N$ jobs have arrived, and a startup is needed before the busy period begins. The startup occurs only if the vacation has been taken, which contributes the term $\lambda c^*(\lambda) E[U]$ to $E[Q_b]$.

The mean length $E[R_c]$ of a busy cycle is given by

$$E[R_c] = (1 - c^*(\lambda)) \lambda^{-1} + c^*(\lambda)[E[V] + E[U]] + E[B_v] = \frac{1 - c^*(\lambda) + c^*(\lambda)(N + \lambda E[U])}{\lambda(1 - \rho)} = \frac{E[Q_b]}{\lambda(1 - \rho)} \tag{7}$$
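The identity in Eq. (7) can be checked numerically. The sketch below adds up the closedown, vacation, startup and busy parts of one regeneration cycle and compares the sum with the closed form $E[Q_b]/(\lambda(1-\rho))$. It uses the standard M/G/1 mean busy period $E[S]/(1-\rho)$; the function name and parameter values are illustrative assumptions.

```python
def regeneration_cycle(lam, mean_service, n_threshold, c_star, mean_startup):
    """Mean lengths of the periods that make up one regeneration cycle.

    lam: arrival rate; mean_service: E[S]; n_threshold: N;
    c_star: probability of no arrival during closedown; mean_startup: E[U].
    """
    rho = lam * mean_service
    if not 0.0 < rho < 1.0:
        raise ValueError("traffic intensity must satisfy 0 < rho < 1")
    e_qb = (1 - c_star) + c_star * (n_threshold + lam * mean_startup)  # Eq. (3)
    e_bv = e_qb * mean_service / (1 - rho)   # E[Q_b] ordinary busy periods
    e_closedown = (1 - c_star) / lam         # Eq. (6)
    e_vacation = n_threshold / lam           # Eq. (5)
    e_cycle = e_closedown + c_star * (e_vacation + mean_startup) + e_bv
    return {"E_Qb": e_qb, "E_Bv": e_bv, "E_C": e_closedown,
            "E_V": e_vacation, "E_Rc": e_cycle}

# Hypothetical parameters: lam = 2, E[S] = 0.3, N = 4, c* = 0.6, E[U] = 0.5
periods = regeneration_cycle(2.0, 0.3, 4, 0.6, 0.5)
closed_form = periods["E_Qb"] / (2.0 * (1 - 2.0 * 0.3))
```

With these numbers both `periods["E_Rc"]` and `closed_form` come out at 4.25 up to rounding, which is the agreement Eq. (7) asserts.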

The number of jobs served during the busy period depends on the initial queue size $Q_b$. With $E[M]$ the number of jobs served during the busy period of an M/G/1 queue,

$$E[M_b] = E[M] E[Q_b] \tag{8}$$

If the vacation time is used to serve the secondary tasks, the number of jobs served is

$$E[M_v] = E[V]/E[S] \tag{9}$$

4.4. The distribution function of waiting time

Let us define the LST $W_v^*(s)$ of the distribution function of the waiting time in our model. According to the decomposition property, the time a job spends in the queue is $W_v = W_d + W$, where $W$ is the waiting time in the corresponding M/G/1 system without vacations and $W_d$ is the extra delay caused by closedown, vacation, and startup. $E[Q_b]$ is the expected number of jobs that arrive before service starts:

(i) A job that arrives during the closedown time is served immediately. The probability is $\frac{1 - c^*(\lambda)}{E[Q_b]}$.

(ii) Jobs that arrive during the vacation time wait for the residual life of the vacation, $\frac{1 - V^*(s)}{s E[V]}$, and the whole startup time $U^*(s)$. The probability is $\frac{\lambda c^*(\lambda) E[V]}{E[Q_b]}$.

(iii) Jobs that arrive during the startup time wait for the residual life $\frac{1 - U^*(s)}{s E[U]}$. The probability is $\frac{\lambda c^*(\lambda) E[U]}{E[Q_b]}$.

The LST $W_d^*(s)$ is

$$W_d^*(s) = \frac{1 - c^*(\lambda)}{E[Q_b]} + \frac{\lambda c^*(\lambda) E[V]}{E[Q_b]} \cdot \frac{1 - V^*(s)}{s E[V]}\, U^*(s) + \frac{\lambda c^*(\lambda) E[U]}{E[Q_b]} \cdot \frac{1 - U^*(s)}{s E[U]}$$

$$E[W_v] = E[W] + E[W_d] = \frac{\lambda E[S^2]}{2(1 - \rho)} + \frac{c^*(\lambda)\{N(N - 1) + 2N\lambda E[U] + \lambda^2 E[U^2]\}}{2\lambda E[Q_b]} \tag{10}$$
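Eq. (10) can be evaluated directly once the moments are known. The following sketch splits the mean waiting time into the ordinary M/G/1 part and the extra delay due to closedown, vacation and startup; the function name, argument names and numerical values are illustrative assumptions.

```python
def mean_waiting_time(lam, es, es2, n_threshold, c_star, eu, eu2):
    """Return (E[W], E[W_d]) following Eq. (10).

    es, es2: first and second moments of the service time;
    eu, eu2: first and second moments of the startup time.
    """
    rho = lam * es
    if not 0.0 < rho < 1.0:
        raise ValueError("traffic intensity must satisfy 0 < rho < 1")
    e_w = lam * es2 / (2 * (1 - rho))                # M/G/1 without vacations
    e_qb = (1 - c_star) + c_star * (n_threshold + lam * eu)
    n = n_threshold
    e_wd = c_star * (n * (n - 1) + 2 * n * lam * eu + lam**2 * eu2) / (2 * lam * e_qb)
    return e_w, e_wd

# Hypothetical case: exponential service with mean 0.25 (so E[S^2] = 0.125),
# lam = 2, N = 3, vacation always taken (c* = 1), no startup time
base, extra = mean_waiting_time(2.0, 0.25, 0.125, 3, 1.0, 0.0, 0.0)
```

In this special case the extra delay reduces to $(N-1)/(2\lambda)$, and setting `c_star = 0` removes the extra delay altogether.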

5. Resource management based on vacation queuing theory

A Grid is a system of high diversity, made up of various applications, middleware components, and resources. The Grid Scheduler (GS) is designed to receive applications from Grid users and to select feasible resources for these applications based on certain objective functions and predicted resource performance.

Information about the status of available resources is very important for a GS to make a proper schedule, especially when the heterogeneous and dynamic nature of the Grid is taken into account. The QoS requests of an application are a combination of metrics. Metrics measure specific quantifiable attributes of the system components. Here we focus on the timeliness metrics, which measure the specifications related to the timing constraints of the work, especially the average response time.

5.1. Load-balancing for grid environment

A reduction in the probability of workload imbalance in the grid environment improves the average response time of the jobs, and it can be achieved by transferring jobs from busy sites to idle ones. An idle site that has completed service with no other jobs in its waiting queue receives jobs from other sites. The activity of receiving jobs from other sites for execution can be regarded as its secondary task. We define a site as being on vacation when it executes jobs transferred from others. In the last section, we formed the N-policy vacation queuing model with server closedown and startup, and derived important system characteristics, such as the expected length of the idle period and the busy period, the number of jobs served in the busy period and the average response time of jobs. Here, we apply these results to design our load-balancing strategy. ARI is a dynamic, receiver initiated, decentralized algorithm. The RI approach can easily be scaled to support a large-scale grid environment and performs better than the SI approach. ARI is a modified version of RI in which we consider the characteristics of the Grid, including resource heterogeneity, job migration cost and network dynamics.

5.1.1. ARI load-balancing strategy

Each site $s_i$ in the Grid maintains a set $NSet_i$ of $p$ neighboring sites, which is used for load-balancing. The neighbors of each site are chosen in terms of communication delay. For a site $s_i$, a site $s_j$ is considered a neighboring site as long as $\{j \in NSet(i) \mid d_{i,j} < \varepsilon \min_{1 \le k \le M,\, k \ne j}(d_{i,k})\}$. We have found in our experiments that $\varepsilon = 1.5$ yields very good results, and this value is used throughout the paper. The neighbor set of a site changes with the dynamics of the grid environment.
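The neighbor-set rule can be sketched as follows. The code keeps every site whose delay from site `i` is within a factor `eps` of the smallest delay from `i`; taking the minimum over all other sites `k != i` is an assumption about the index condition in the definition, and the delay matrix is made up for illustration.

```python
def neighbor_set(i, delay, eps=1.5):
    """Sites j with delay[i][j] < eps * (smallest delay from site i).

    delay is a square matrix of pairwise communication delays.
    Assumption: the minimum runs over all other sites k != i.
    """
    others = [k for k in range(len(delay)) if k != i]
    d_min = min(delay[i][k] for k in others)
    return [j for j in others if delay[i][j] < eps * d_min]

# Hypothetical symmetric delay matrix for four sites (seconds)
delay = [
    [0.00, 0.10, 0.14, 0.40],
    [0.10, 0.00, 0.30, 0.12],
    [0.14, 0.30, 0.00, 0.20],
    [0.40, 0.12, 0.20, 0.00],
]
neighbors_of_0 = neighbor_set(0, delay)
neighbors_of_3 = neighbor_set(3, delay)
```

Because the threshold is relative to each site's own closest neighbor, the relation need not be symmetric: here site 1 is a neighbor of site 3 under the rule, while site 3 is not a neighbor of site 0.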

In order to incorporate the dynamically changing environment of the Grid, each site $s_i$ in the system calculates its status parameters, which are the estimated arrival rate, service rate, and load, at each periodic interval of time $T_i$ (the status exchange interval). Each $s_i$ exchanges its status information with the other sites in its neighbor set. The instant at which this information exchange takes place is called a status exchange instant. Site $s_i$ calculates its status information at the status exchange instant $T_n$ using the following relationships:

$$\lambda_i(T_n) = \alpha \lambda_i(T_{n-1}) + (1 - \alpha)\left(Arr_i(T_n - T_{n-1})/(T_n - T_{n-1})\right),$$
$$\mu_i(T_n) = \beta \mu_i(T_{n-1}) + (1 - \beta)\left(Dep_i(T_n - T_{n-1})/(T_n - T_{n-1})\right), \quad (\alpha = \beta = 0.5) \tag{11}$$
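The status update of Eq. (11) is an exponentially weighted average of the previous estimate and the rate observed over the last exchange interval. A minimal sketch follows, assuming that the previous service-rate estimate plays the same role in the second relation as the previous arrival-rate estimate does in the first; names and numbers are illustrative.

```python
def update_rates(prev_lam, prev_mu, arrivals, departures, interval,
                 alpha=0.5, beta=0.5):
    """Smoothed arrival and service rate estimates at a status exchange instant.

    arrivals / departures: jobs that arrived / departed during the last
    interval of length `interval` (T_n - T_{n-1}).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    lam = alpha * prev_lam + (1 - alpha) * (arrivals / interval)
    mu = beta * prev_mu + (1 - beta) * (departures / interval)
    return lam, mu

# Hypothetical interval of 5 s with 40 arrivals and 50 departures
lam_n, mu_n = update_rates(prev_lam=10.0, prev_mu=12.0,
                           arrivals=40, departures=50, interval=5.0)
```

With equal weights of 0.5, each new estimate is simply the midpoint between the old estimate and the newly observed rate.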

The load index, which measures the current loading of a site, is an important issue in designing a dynamic load-balancing algorithm. Thomas Kunz [20] reports that the simple CPU queue length is the most effective load index:

$$L_i(T_n) = Q_i(T_n) \tag{12}$$

An idle site $s_I$ collects information from its neighbor sites about their processing capacity and current load to calculate the average load. $s_I$ receives the extra load from those sites whose load is greater than the average load, so that all neighbor sites finish at the same time, and takes into account the sites' heterogeneity in terms of processing capacity. Let $sp_i$ denote the weight of the processing capacity of site $i$, which is a normalized measure of its speed. The normalized average workload is

$$NSL_{avg} = \frac{\sum_{k \in NSet_i} sp_k L_k(T)}{\sum_{k \in NSet_i} sp_k} \tag{13}$$
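Eq. (13) and the heavy-load test that follows it can be sketched in a few lines. The code transcribes the weighted average as printed, with the capacity weight multiplying the load; function names and the sample values are illustrative assumptions.

```python
def normalized_average_load(speeds, loads):
    """NSL_avg of Eq. (13): sum(sp_k * L_k) / sum(sp_k) over the neighbor set."""
    if len(speeds) != len(loads) or not speeds:
        raise ValueError("speeds and loads must be non-empty and equally long")
    return sum(sp * load for sp, load in zip(speeds, loads)) / sum(speeds)

def heavy_sites(speeds, loads):
    """Indices j whose weighted load sp_j * L_j exceeds NSL_avg."""
    avg = normalized_average_load(speeds, loads)
    return [j for j, (sp, load) in enumerate(zip(speeds, loads)) if sp * load > avg]

# Hypothetical neighbor set: capacity weights and current queue lengths
speeds = [1.0, 2.0, 1.0]
loads = [10, 4, 2]
avg = normalized_average_load(speeds, loads)
heavy = heavy_sites(speeds, loads)
```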

$s_j$ is considered a heavily loaded site if $NSL_{avg} < sp_j L_j(T)$.

$s_I$ tries to receive extra load from all sites in its neighbor set so that they finish at almost the same time, taking into account the sites' heterogeneity in terms of processing capacity:

$$\frac{L_j(T) - l_j}{\mu_j(T)} = \frac{L'_I(T) + l_j}{\mu_I(T)} \tag{14}$$

Here, $l_j$ is the number of jobs that will be transferred from $s_j$ to $s_I$, and $L_{tol}$ is the total number of jobs transferred from the neighbor set:

$$L_{tol} = \sum_{j \in NSet_i} l_j \tag{15}$$
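Solving Eq. (14) for the transfer amount gives $l_j = (\mu_I L_j - \mu_j L'_I)/(\mu_I + \mu_j)$. The sketch below applies this to each heavily loaded neighbor in turn and sums the result as in Eq. (15). Treating $L'_I$ as the receiver's load after the transfers already planned, and clamping negative amounts to zero, are assumptions made for illustration.

```python
def transfer_amount(load_j, mu_j, load_recv, mu_recv):
    """l_j from Eq. (14): both sites then need the same time to drain."""
    amount = (mu_recv * load_j - mu_j * load_recv) / (mu_recv + mu_j)
    return max(0.0, amount)  # assumption: never transfer a negative amount

def plan_transfers(receiver_mu, heavy):
    """heavy: list of (site_id, load, mu). Returns ({site_id: l_j}, L_tol)."""
    receiver_load = 0.0  # the idle receiver starts with an empty queue
    plan = {}
    for site_id, load, mu in heavy:
        l_j = transfer_amount(load, mu, receiver_load, receiver_mu)
        plan[site_id] = l_j
        receiver_load += l_j  # assumption: L'_I grows as transfers accumulate
    return plan, sum(plan.values())

# Hypothetical heavy neighbors of an idle site with service rate 4 jobs/s
plan, total = plan_transfers(4.0, [("s1", 12, 2.0), ("s2", 10, 4.0)])
```

For the first neighbor the balance is easy to verify by hand: after moving 8 jobs, the sender needs (12 - 8)/2 = 2 s and the receiver 8/4 = 2 s.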

A job is migrated only if its expected finish time (EFT) on the destination site is less than its expected finish time on the source site. According to M/G/1 queue theory, the EFT of job $k$ on site $s_j$ is

$$EFT_k^j = \frac{E[S_j^2]}{2E[S_j]} + k E[S_j] \tag{16}$$

The expected finish time of job $k$ on site $s_I$ can be calculated by estimating the transferred workload and the migration time $M_k^{I,j}$ of job $k$ from $s_j$ to $s_I$:

$$EFT_k^I = M_k^{I,j} + L'_I(T) E[S_I] \tag{17}$$

Job $k$ is transferred if and only if $EFT_k^j > EFT_k^I$, and site $s_j$ marks it as paused. The pseudo-code is given in Fig. 4.

An idle site that receives extra workload from its neighbor set helps to achieve load balance and to improve site utilization and throughput, which is essential for the Grid at prime time. However, it increases the average queue length and response time for newly arriving jobs: while the site is handling transferred jobs, newly arriving jobs are paused. We must still satisfy their QoS requests. Following the analysis in the last section, the expected finish time of the $h$th arriving job in the pausing queue is

$$EFT_h^I = (N - 1 - h)/\lambda_I(T) + \tilde{S} + (h + 1)/\mu_I(T) \tag{18}$$

Here, $N$ is the number of jobs assumed to arrive during the vacation period, which depends on $L_{tol}$: $N = L_{tol}\, \lambda_I(T)/\mu_I(T)$. $\tilde{S}$ is the estimated length of the startup period. If $EFT_k^I > Qos_{FT}(k)$, the site stops executing migrated jobs and begins the startup work. The pseudo-code is given in Fig. 5.
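The finish-time estimates of Eqs. (16) to (18) and the two decisions built on them can be sketched as below. The migration test follows the rule stated before Eq. (16) and in Fig. 4, that is, a job moves only when the receiver is expected to finish it sooner. All function names and numbers are illustrative assumptions.

```python
def eft_on_source(k, es, es2):
    """Eq. (16): residual service time plus k mean service times."""
    return es2 / (2 * es) + k * es

def eft_on_receiver(migration_time, receiver_load, es_receiver):
    """Eq. (17): migration time plus the work already moved to the receiver."""
    return migration_time + receiver_load * es_receiver

def should_migrate(eft_source, eft_receiver):
    """Migrate only if the receiver is expected to finish the job sooner."""
    return eft_receiver < eft_source

def eft_paused_job(h, n_threshold, lam, mu, startup):
    """Eq. (18): expected finish time of the h-th job in the pausing queue."""
    return (n_threshold - 1 - h) / lam + startup + (h + 1) / mu

def must_terminate_vacation(eft, qos_deadline):
    """Stop serving migrated jobs once a paused job would miss its deadline."""
    return eft > qos_deadline

# Hypothetical job at queue position 6 on a source with E[S] = 1, E[S^2] = 2
src = eft_on_source(6, 1.0, 2.0)
dst = eft_on_receiver(0.5, 3, 1.0)
paused = eft_paused_job(0, 5, 2.0, 4.0, 0.5)
```

The same comparison pattern serves both phases: during closedown it decides whether a job is worth moving, and during the vacation it decides when the receiver has to return to its own queue.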

Procedure Migration by S_I

    While no new task arrives
        1. Collect status information of the neighbor set
        2. Estimate the average load of the neighbor set
        3. For each site S_j whose load is greater than the average load
            i. determine how much load can be migrated to S_I such that the load on the sites finishes at almost the same time
            ii. estimate the finish time of tasks on the overloaded site S_j and migrate a task only if EFT_k^j > EFT_k^I
            iii. migrate the task and sign it
    If a new job arrives
        Terminate migration and start serving
    else
        Execute migrated jobs

Fig. 4. Pseudo-code for closedown period of ARI.

Procedure Termination by S_I

    For every newly arriving task
        Estimate the expected finish time EFT_k^I
        if EFT_k^I > Qos_FT(k)
            Terminate executing migrated tasks

Fig. 5. Pseudo-code for vacation period of ARI.


When site $s_I$ terminates the execution of transferred jobs, it performs the startup work before executing the jobs waiting in the pausing queue. It checks the status of the transferred jobs. If a job $g$ has not been executed, $s_I$ informs $g$'s source site. The estimated length of the startup period $\tilde{S}$ depends on the delay of the network. The pseudo-code is given in Fig. 6.

5.1.2. Performance evaluation for ARI

ARI is a receiver initiated load-balancing strategy, which takes advantage of a site's idle fraction to serve jobs migrated from heavily loaded sites at prime time. When the site takes a vacation, it serves the migrated jobs; when the vacation is over, the site serves its own jobs. The utilization of the site is $Util = (E[V] + E[B_v])/E[R_c]$.

The throughput measures the number of jobs served per unit time in a service cycle, which depends on the sum of the jobs served during the vacation time and the busy time: $Thou = (E[M_b] + E[M_v])/E[R_c]$.
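The two metrics can be evaluated from the quantities of Section 4. The sketch below transcribes the expressions as printed, using the standard M/G/1 results that a busy period has mean $E[S]/(1-\rho)$ and serves $1/(1-\rho)$ jobs on average. Note that $E[V]$ enters unweighted by the no-arrival probability, so the sketch is most meaningful when that probability is close to 1; names and values are illustrative assumptions.

```python
def ari_utilization_throughput(lam, mean_service, n_threshold, c_star, mean_startup):
    """Return (Util, Thou) for one site under ARI, as defined in Section 5.1.2."""
    rho = lam * mean_service
    if not 0.0 < rho < 1.0:
        raise ValueError("traffic intensity must satisfy 0 < rho < 1")
    e_qb = (1 - c_star) + c_star * (n_threshold + lam * mean_startup)  # Eq. (3)
    e_bv = e_qb * mean_service / (1 - rho)   # mean busy period of the model
    e_v = n_threshold / lam                  # Eq. (5)
    e_rc = e_qb / (lam * (1 - rho))          # Eq. (7)
    e_mb = e_qb / (1 - rho)                  # Eq. (8) with E[M] = 1/(1 - rho)
    e_mv = e_v / mean_service                # Eq. (9)
    util = (e_v + e_bv) / e_rc
    thou = (e_mb + e_mv) / e_rc
    return util, thou

# Hypothetical site: lam = 1, E[S] = 0.5, N = 2, vacation always taken, no startup
util, thou = ari_utilization_throughput(1.0, 0.5, 2, 1.0, 0.0)
```

In this example the site is never idle, and it serves its own jobs at rate 1 plus migrated jobs at rate 1.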

Procedure Mop-up by S_I

    For every migrated task
        Switch task.status
            Case finished: cancel it by its source
            Case unfinished: resume it by its source

Fig. 6. Pseudo-code for startup period of ARI.

Procedure Migration by S_I

    While no new task arrives
        1. Collect status information of the neighbor set
        2. Estimate the traffic intensity of the neighbor set
        For the sites in S_I's neighbor set
            If ρ_j < 0.3 for all j ∈ NSet(i)
                Switch to sleeping mode
            else
                Switch to low-energy mode

Fig. 7. Pseudo-code for energy-saving policy.


5.2. Energy-saving policy for grid environment

At non-prime time, when a site $s_I$ becomes idle, it calls the status exchange procedure. It collects status information from the sites in its neighbor set, including the current arrival rate and service rate, from which $s_I$ calculates the traffic intensity $\rho_{j \in NSet(i)}(j) = \lambda_j/\mu_j$. If $\forall \rho_{j \in NSet(i)}(j) < 0.3$ and no new job arrives during this procedure, $s_I$ switches to sleeping mode. The pseudo-code is given in Fig. 7.
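The sleep decision can be sketched as a single predicate over the neighbors' estimated rates. The 0.3 threshold is the one used in the text; the function name and the sample rates are illustrative assumptions.

```python
def should_sleep(neighbor_rates, new_job_arrived, threshold=0.3):
    """Decide whether an idle site may switch to sleeping mode.

    neighbor_rates: list of (arrival_rate, service_rate) pairs, one per
    site in the neighbor set. The site sleeps only if every neighbor's
    traffic intensity is below the threshold and no job arrived meanwhile.
    """
    if new_job_arrived:
        return False
    return all(lam / mu < threshold for lam, mu in neighbor_rates)

# Hypothetical neighbor sets
quiet = [(1.0, 5.0), (2.0, 10.0)]   # intensities 0.2 and 0.2
mixed = [(1.0, 5.0), (4.0, 10.0)]   # intensities 0.2 and 0.4
```

Here `should_sleep(quiet, False)` is true, while either a busy neighbor (`mixed`) or a job arriving during the status exchange keeps the site awake.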

It has long been recognized that energy conservation usually comes at the cost of degraded performance such as longerresponse time and lower throughput. When the site is sleeping, the new arrival jobs will wait in the queue. The wakeup pol-icy for sleeping site depends on the QoS request of the new arrival jobs just as the load-balancing strategies we providedpreviously. At the startup step, the site will switch to the normal mode and start serving.

The performance metric defined in this section complements the ones derived in Section 4. Here, we derive the gain in energy at a site should the energy-saving mechanism be activated. Considering the possible site states, we distinguish four levels of energy consumption, from highest to lowest:

Chigh: experienced during service.
Cstartup: experienced during startup.
Clow: the low level observed when the site is idle but not sleeping.
Csleep: experienced when the site is sleeping.

Without power management, the energy consumption per unit of time is Clow for idle periods and Chigh for busy periods. The energy consumption is En = ρChigh + (1 − ρ)Clow, where ρ = λE[S].

With power management, the energy consumption per unit of time during busy periods is also Chigh. During the startup period the consumption is Cstartup, during the vacation period it is Csleep, and for the rest of the time it is Clow. The energy consumption is

$$E_m = \frac{E[B]}{E[R_c]}\,C_{high} + \frac{1}{\lambda E[R_c]}\,C_{low} + \frac{N}{\lambda E[R_c]}\,C_{sleep} + \frac{E[U]}{E[R_c]}\,C_{startup} \qquad (19)$$

The saving in energy per unit of time when a site enables power saving is Es = En − Em.

Table 1
Parameter values

Parameter   Explanation                                        Value
M           Site number                                        128
NJ          Job number                                         1000
1/λ         Mean inter-arrival time                            [0.01–0.1]
s           Mean service time                                  [0.1–2.0]
ρ           Traffic intensity                                  [0.1–0.9]
cv          Service coefficient of variation (heterogeneity)   [0.1–0.9]
C           Communication delay                                [0.05–0.5]
J           Job size (MB)                                      [5–50]
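The energy expressions of Section 5.2 can be compared numerically with a short Python sketch. It follows our reading of Eq. (19), in which the four terms weight Chigh, Clow, Csleep and Cstartup by E[B]/E[Rc], 1/(λE[Rc]), N/(λE[Rc]) and E[U]/E[Rc]; all function and argument names are hypothetical, and any consumption values passed in are illustrative rather than measured.

```python
def energy_without_pm(rho, c_high, c_low):
    """En = rho * Chigh + (1 - rho) * Clow, with rho = lambda * E[S]."""
    return rho * c_high + (1.0 - rho) * c_low


def energy_with_pm(e_b, e_u, e_rc, lam, n,
                   c_high, c_low, c_sleep, c_startup):
    """Em following our reading of Eq. (19).

    e_b: E[B], e_u: E[U], e_rc: E[Rc], lam: arrival rate, n: the N of Eq. (19).
    """
    return ((e_b / e_rc) * c_high
            + (1.0 / (lam * e_rc)) * c_low
            + (n / (lam * e_rc)) * c_sleep
            + (e_u / e_rc) * c_startup)


def energy_saving(e_n, e_m):
    """Es = En - Em, the saving per unit of time."""
    return e_n - e_m
```

As an illustration only, taking ρ = 0.4, E[B] = 4, E[U] = 1, E[Rc] = 10, λ = 1, N = 4 and consumption levels 100, 60, 10 and 120 for Chigh, Clow, Csleep and Cstartup gives En = 76 and Em = 62, so Es = 14.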


6. Experimental results

In this section, we use simulation experiments to evaluate the load-balancing and energy-saving policies presented previously. Our simulation model consists of heterogeneous sites connected by communication channels. Job arrivals are assumed to follow a Poisson process and job execution times a uniform distribution. The parameter values for the simulation are shown in Table 1. All time units are in seconds, so performance metrics such as the average response time (ART) are also measured in seconds.
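A workload matching these assumptions could be generated as in the following sketch, which is not the authors' simulator. Poisson arrivals are produced from exponential inter-arrival times, and execution times are drawn uniformly; the function name, the seed argument and the uniform bounds are hypothetical choices.

```python
import random


def generate_jobs(num_jobs, mean_interarrival, exec_low, exec_high, seed=0):
    """Return a list of (arrival_time, execution_time) pairs in seconds.

    Inter-arrival times are exponential with the given mean (a Poisson
    arrival process); execution times are uniform on [exec_low, exec_high].
    """
    rng = random.Random(seed)
    now = 0.0
    jobs = []
    for _ in range(num_jobs):
        now += rng.expovariate(1.0 / mean_interarrival)
        jobs.append((now, rng.uniform(exec_low, exec_high)))
    return jobs
```

For instance, `generate_jobs(1000, 0.05, 0.1, 2.0)` yields 1000 jobs with a mean inter-arrival time inside the [0.01–0.1] range of Table 1.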

6.1. Performance of ARI

We compare ARI with two well-known load-balancing policies, SI and RI from Ref. [7], to reveal the strength of the proposed strategy. ARI takes into account resource heterogeneity and communication delay when balancing the workload in the neighbor set.

6.1.1. Accuracy of estimate finish time

For ARI, whether the idle site sI receives job k from the heavily loaded site sj depends on $EFT_k^j$ and $EFT_k^I$. Likewise, vacation termination depends on $EFT_h^I$ (see Eq. (18)) and QosFT(h). The accuracy of EFT therefore determines the performance of ARI. The formulas for EFT given in Section 5 are derived from the analysis in Section 4. Here, we examine the deviation between a job's actual departure time and its EFT while varying the traffic intensity (ρ). The deviation is defined as

$$\delta = \frac{\sqrt{(Deptime_k - EFT_k)^2}}{ART}, \qquad ART = \frac{1}{NJ}\sum_{i=1}^{NJ}(Deptime_i - Arrtime_i).$$

As can be seen from Fig. 8, in most cases the deviation remains under 1, and the estimate becomes more accurate as ρ increases. We conclude that EFT accurately estimates the finish time for most jobs, which ensures the performance of ARI.
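The deviation metric and the ART used to normalize it can be computed as in this sketch. The square root of a squared difference is evaluated as an absolute value; the function names are hypothetical, and the inputs are assumed to be per-job arrival, departure and estimated finish times in seconds.

```python
def average_response_time(arrivals, departures):
    """ART = (1/NJ) * sum_i (Deptime_i - Arrtime_i)."""
    if not arrivals or len(arrivals) != len(departures):
        raise ValueError("need one departure per arrival, and at least one job")
    return sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)


def eft_deviation(departure, eft, art):
    """delta = sqrt((Deptime_k - EFT_k)^2) / ART, i.e. |Deptime_k - EFT_k| / ART."""
    return abs(departure - eft) / art
```

With illustrative arrivals [0, 1, 2] and departures [2, 4, 3], the ART is 2 s, so a job that departs at 4 s with an EFT of 3 s has a deviation of 0.5.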

6.1.2. ART for different traffic intensity

The traffic intensity measures the ratio of the mean site service time to the mean job inter-arrival time. According to the statistical properties of the job arrival cycle and the workload variety of the grid environment, more jobs arrive at prime time on weekdays than at other times. In this experiment, we compare the ART of jobs under different load-balancing policies, with the service coefficient of variation set to cv = 0.5, where

$$cv = \frac{\sqrt{\sum_{1 \le i \le M}(MEAN_{sp} - sp_i)^2}}{MEAN_{sp}}.$$

Here, "non" means that no load-balancing strategy is applied. As shown in Fig. 9, load-balancing can reduce the ART by more than 40%. Our experiment also confirms that receiver-initiated policies work much better than the sender-initiated policy as ρ increases. ARI outperforms the other policies across traffic intensities in most cases and is well suited to a grid environment whose load intensity varies with the daily job arrival cycle.
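The service coefficient of variation used to quantify heterogeneity can be evaluated as sketched below. The code follows the expression as it appears in this section, which takes the square root of the sum of squared deviations without dividing by M; this differs from the usual standard-deviation-over-mean definition, and we do not know which one the simulator used. The function name is hypothetical, and `speeds` stands for the per-site service capacities sp_i.

```python
import math


def service_cv(speeds):
    """cv = sqrt(sum_i (MEANsp - sp_i)^2) / MEANsp, as written in the text."""
    if not speeds:
        raise ValueError("need at least one site")
    mean_sp = sum(speeds) / len(speeds)
    if mean_sp <= 0:
        raise ValueError("mean service capacity must be positive")
    return math.sqrt(sum((mean_sp - sp) ** 2 for sp in speeds)) / mean_sp
```

A homogeneous system, in which every site has the same capacity, gives cv = 0 under either definition.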

6.1.3. Throughput for different heterogeneity of sites

We model resource heterogeneity by setting the mean service capacity of the sites randomly, with the degree of heterogeneity measured by cv. For this experiment, we set ρ = 0.5. As seen in Fig. 10, ARI significantly improves system throughput over the other strategies. When cv is low, jobs are distributed evenly across the sites and the ART of the different policies is almost the same. As cv increases, the load-balancing policies perform better. The RI strategy differs from its counterpart SI in the task migration phase, which gives it better performance in highly heterogeneous systems. When a capable site is idle, it receives extra load from a heavily loaded site, which exploits the idle site's processing capacity to improve throughput. ARI migrates jobs taking the sites' capacity heterogeneity into account, so it is more suitable for a heterogeneous grid environment.

Fig. 8. Accuracy of estimate finishing time of ARI. [Plot: y-axis, estimate depart time deviation (0 to 2.5); series ρ = 0.3, ρ = 0.5, ρ = 0.7.]

Fig. 9. ART comparison between different load-balancing policies by various traffic intensity. [Plot: x-axis, traffic intensity of system (0.1 to 0.9); y-axis, ART (s); series non, si, ri, ari.]

Fig. 10. Throughput comparison between different load-balancing policies with service coefficient of variation. [Plot: x-axis, heterogeneity of nodes (0.1 to 0.9); y-axis, throughput; series non, si, ri, ari.]

6.1.4. Utilization for grid sites

The utilization measures the ratio of a site's total available time to its total simulated time (see Section 5.1.2). In this experiment, we set cv = 0.7 and ρ = 0.7. ARI transfers jobs from heavily loaded sites to idle sites, which balances the workload among sites and improves the utilization of idle sites. Fig. 11 shows that the variation in utilization among sites is relatively small, which means that the workload is distributed evenly among the sites of the grid environment.

6.1.5. Effect of job size

ARI takes the job migration cost into account, and job size largely determines this cost through the communication delay. It is therefore important to measure the performance of the algorithms while varying the job size. In Fig. 12, as the job size increases, ARI performs increasingly better than the other policies in terms of ART. The results indicate that ARI outperforms the others when the job migration cost is large, which is the case in a large-scale grid environment.

Fig. 11. Utilization comparison between 128 sites for ARI. [Plot: x-axis, nodes (1 to 121); y-axis, utilization (0 to 1.2); series non, si, ri, ari.]

Fig. 12. ART comparison between different load-balancing policies by various job sizes. [Plot: x-axis, size of jobs (M): 5, 10, 20, 50; y-axis, ART (s); series si, ri, ari.]


6.2. Performance of power management

Our power management policy is based on exhaustive vacation theory. If no new jobs arrive at an idle site during the closedown period, the site switches to sleeping mode. Sites in sleeping mode consume less energy, which also improves reliability. The simulation experiments show the performance of our policy.

6.2.1. Fraction of grid sites

At non-prime time, switching a site to sleeping mode saves energy. Here, we use the time-out power management policy: the site goes to sleep if no jobs arrive before the end of the closedown period. This is a tradeoff between power saving and the jobs' QoS requests. Fig. 13 shows that 80% of the idle sites at non-prime time switch to sleeping mode, which achieves the power saving goal based on Eq. (19).
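The time-out rule can be illustrated with a short sketch. Under Poisson arrivals with rate λ, the probability that no job arrives during an interval of length D is exp(−λD); we assume here, for illustration only, that the closedown period has a fixed length D, which the model of Section 4 does not require. The function names are hypothetical.

```python
import math


def sleeps_after_timeout(idle_start, closedown, next_arrival):
    """True if no job arrives before the closedown period ends."""
    return next_arrival >= idle_start + closedown


def prob_sleep(arrival_rate, closedown):
    """P(no Poisson arrival during a fixed closedown of length D) = exp(-lambda * D)."""
    if arrival_rate < 0 or closedown < 0:
        raise ValueError("rate and closedown length must be non-negative")
    return math.exp(-arrival_rate * closedown)
```

The shorter the closedown period or the lower the arrival rate, the more likely the site is to reach sleeping mode, which is consistent with sleeping being used at non-prime time.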

6.2.2. The ART for different traffic intensity

An idle site switches to sleeping mode only when the traffic intensity of every one of its neighbor sites is smaller than 0.3. While the site is sleeping, newly arriving jobs wait in the queue until it wakes up, and the ART of these jobs is extended correspondingly. Here, we compare the ART of our energy-saving policy with that of the non-energy-saving mechanism by varying the traffic intensity over [0.1–0.3]. The results in Fig. 14 show that the ART increases slightly as the traffic intensity increases. The difference between ARTes and ARTnon is acceptable, which ensures that the jobs' QoS requests are satisfied.

Fig. 13. Utilization comparison between 128 sites for ARI. [Plot: x-axis ticks 1 to 23; y-axis, fraction of grid sites with power management (0% to 100%); series high, med, low, idle, sleep.]

Fig. 14. ART comparison between energy-saving policy and non-energy-saving policy as the traffic intensity varies. [Plot: x-axis, traffic intensity (0.1 to 0.3); y-axis, ART (s); series energy-saving, non-energy-saving.]


7. Conclusions

In this paper, we first analyze the LCG log file. The daily job arrival cycle and the fraction of resource states expose the problems of load-balancing and energy-saving. A queuing system with vacations can describe the use of a site's idle fraction for different purposes, which induces a tradeoff between the utilization of the idle fraction and the jobs' QoS requirements. We develop a novel queuing analytical model with server vacations, startup and closedown to derive the performance metrics that enable us to explore this tradeoff. ARI is a modified version of RI that considers the job migration cost, resource heterogeneity and network dynamics when balancing load. The power-management policy switches idle sites to sleeping mode at non-prime time to save energy while considering the QoS requests of newly arriving jobs. The simulation results show that ARI outperforms other existing load-balancing strategies, especially in a large-scale heterogeneous grid environment. The power management policy switches most idle sites to sleep at non-prime time with acceptable response time. In the future, we will extend this work by adding fault tolerance to the resource management system, as fault tolerance is a very important characteristic of any distributed system.

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (Grant No. 60534060) and the Shanghai Science and Technology Research Plan (Grant No. 06JC14065). We are grateful to Gidon Moont, David Colling, and the e-Science group of HEP at Imperial College London, who graciously provided us with the LCG Grid trace used in this study. We also thank all the reviewers for their many valuable suggestions, which improved the quality of this paper.

References

[1] Foster I, Kesselman C. The Grid: Blueprint for a future computing infrastructure. Los Altos: Morgan Kaufmann; 1999.
[2] Krauter K, Buyya R, Maheswaran M. A taxonomy and survey of grid resource management systems for distributed computing. Software – Practice Exp 2002;32:135–64.
[3] Maheswaran M. Quality of service driven resource management algorithms for network computing. In: Proceedings of the 1999 international conference on parallel and distributed processing technologies and applications (PDPTA'99). June 1999. p. 1090–6.
[4] Feng W-C. Making a case for efficient supercomputing. ACM Queue 2003;1(7):55–64.
[5] The LCG log file. <http://www.cs.huji.ac.il/labs/parallel/workload/l_lcg/index.html>.
[6] Doshi BT. Queueing systems with vacations – a survey. Queueing Systems 1986;1(1):29–66.
[7] Shivaratri NG, Krueger P, Singhal M. Load distributing for locally distributed systems. Computer 1992;25(12):33–44.
[8] Kim C, Kameda H. An algorithm for optimal static load balancing in distributed computer systems. IEEE Trans Comput 1992;41(3):381–4.
[9] Zhu W, Socko P, Kiepuszewski B. Migration impact on load balancing – an experience on amoeba. Operat Systems Rev 1997;31(1):43–53.
[10] Feng Y, Li D, Wu H, Zhang Y. A dynamic load balancing algorithm based on distributed database system. In: Proceedings of the fourth international conference on high-performance computing in the Asia–Pacific region. May 2000. p. 949–52.
[11] Willebeek-LeMair M, Reeves A. Strategies for dynamic load balancing on highly parallel computers. IEEE Trans Parallel Distr Systems 1993;9(4):979–93.
[12] Lin H, Raghavendra C. A dynamic load-balancing policy with a central job dispatcher (LBC). IEEE Trans Software Eng 1992;18(2):148–58.
[13] Shah R, Veeravalli B, Misra M. On the design of adaptive and decentralized load-balancing algorithms with load estimation for computational grid environments. IEEE Trans Parallel Distr Systems 2007;18(12):1675–86.
[14] Willebeek-LeMair MH, Reeves AP. Strategies for dynamic load balancing on highly parallel computers. IEEE Trans Parallel Distr Systems 1993;4(9):979–93.
[15] Watts J, Taylor S. A practical approach to dynamic load balancing. IEEE Trans Parallel Distr Systems 1998;9(3):235–48.
[16] Manimaran G, Siva Ram Murthy C. An efficient dynamic scheduling algorithm for multiprocessor real-time systems. IEEE Trans Parallel Distr Systems 1998;9(3):312–9.
[17] Zaki MJ, Parthasarathy WLS. Customized dynamic load balancing for a network of workstations. J Parallel Distr Comput 1997;43(2):156–62.
[18] Sanders P. Analysis of nearest neighbor load balancing algorithms for random loads. Parallel Comput 1999;25(80):1013–33.
[19] Helmbold D, Long DDE, Sherrod B. A dynamic disk spin-down technique for mobile computing. In: International conference on mobile computing and networking (MobiCom); 1999.
[20] Lebeck AR et al. Power aware page allocation. In: ASPLOS 2000. p. 105–16.
[21] Zheng R, Hou JC, Sha L. Performance analysis of power management policies in wireless networks. IEEE Trans Wireless Commun 2006;5(6):1351–61.
[22] Fallahi A, Hossain E, Alfa AS. QoS and energy trade off in distributed energy-limited mesh/relay networks: a queuing analysis. IEEE Trans Parallel Distr Systems 2006;17(6):576–92.
[23] Freeh VW, Lowenthal DK, Pan F, Kappiah N, Springer R, Rountree BL, et al. Analyzing the energy-time trade-off in high-performance computing applications. IEEE Trans Parallel Distr Systems 2007;18(6):835–48.
[24] The Worldwide LHC Computing Grid project. <http://lcg.web.cern.ch/LCG/>.
[25] Livny M, Melman M. Load balancing in homogeneous broadcast distributed systems. In: Proceedings of ACM computer network performance symposium. New York: ACM; 1982. p. 47–55.
[26] Ke J-C. The optimal control of an M/G/1 queuing system with server vacations, startup and breakdowns. Comput Indust Eng 2003;44:567–79.
[27] Chae KC, Lee HW. MX/G/1 vacation models with N-policy: heuristic interpretation of the mean waiting time. J Operat Soc 1995;46:258–64.
