

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 16, NO. 2, APRIL 2019 729

Spatial Task Scheduling for Cost Minimization in Distributed Green Cloud Data Centers

Haitao Yuan, Member, IEEE, Jing Bi, Senior Member, IEEE, and MengChu Zhou, Fellow, IEEE

Abstract— The infrastructure resources in distributed green cloud data centers (DGCDCs) are shared by multiple heterogeneous applications to provide flexible services to global users in a high-performance and low-cost way. It is highly challenging to minimize the total cost of a DGCDC provider in a market where bandwidth prices of Internet service providers (ISPs), electricity prices, and the availability of renewable green energy all vary with geographical locations. Unlike existing studies, this paper proposes a spatial task scheduling and resource optimization (STSRO) method to minimize the total cost of the provider by cost-effectively scheduling all arriving tasks of heterogeneous applications to meet tasks’ delay-bound constraints. STSRO exploits spatial diversity in DGCDCs. In each time slot, the cost minimization problem for DGCDCs is formulated as a constrained optimization one and solved by the proposed simulated annealing-based bat algorithm (SBA). Trace-driven experiments demonstrate that STSRO achieves lower total cost and higher throughput than two typical scheduling methods.

Note to Practitioners—This paper investigates the cost minimization problem for DGCDCs while meeting delay-bound constraints for all arriving tasks. Previous task scheduling methods do not jointly investigate the spatial diversity in bandwidth prices of ISPs, electricity prices, and the availability of renewable green energy. Therefore, they fail to cost-effectively schedule all arriving tasks of heterogeneous applications within their delay-bound constraints. In this paper, a new method that overcomes the shortcomings of the existing methods is proposed. It is obtained by using the proposed SBA that solves a constrained optimization problem. Simulation results demonstrate that, compared with two typical scheduling methods, it increases the throughput and decreases the cost. It can be readily implemented and integrated into real-world industrial DGCDCs. Future work needs to investigate the indeterminacy of renewable energy and the uncertainty in arriving tasks with rough deep neural network approaches on STSRO.

Manuscript received June 6, 2018; accepted July 14, 2018. Date of publication August 6, 2018; date of current version April 5, 2019. This paper was recommended for publication by Associate Editor A. E. Smith and Editor L. Shi upon evaluation of the reviewers’ comments. This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant 2016RC030, in part by the China Postdoctoral Science Foundation under Grant 2017T100034 and Grant 2016M600912, and in part by the National Natural Science Foundation of China under Grant 61703011. The work of H. Yuan was supported by China Scholarship Council. (Corresponding author: Jing Bi.)

H. Yuan is with the School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China (e-mail: [email protected]).

J. Bi is with the School of Software Engineering in Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China (e-mail: [email protected]).

M. Zhou is with the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASE.2018.2857206

Index Terms— Bat algorithm, cost minimization, distributed computing, green data centers, hybrid metaheuristic optimization, simulated annealing (SA), task scheduling.

I. INTRODUCTION

Cloud computing has been increasingly adopted by enterprises, governments, and academic institutes in recent years [1]–[4]. It is implemented in cloud data centers (CDCs) that manage a great number of large-scale infrastructures that typically include millions of servers and cooling facilities [5], [6]. The infrastructure resources in CDCs are shared to concurrently support multiple applications that flexibly deliver services to users around the world. Each CDC usually needs tens of megawatts of energy for powering and cooling tens of thousands of servers [7]. To provide high availability and low latency, each application is usually deployed in multiple CDCs located in different geographical locations [8]. In addition, considering performance and cost, each CDC is connected to multiple Internet service providers (ISPs) that transmit gigantic data among distributed CDCs and millions of users [9]. It is shown that ISP bandwidth and energy cost account for a majority of a CDC provider’s operational expense [10].

Recently, we have witnessed a dramatic increase in the types and number of applications [11]. Recent data show that the energy consumed by data centers in the U.S. was roughly 78 billion kWh in 2017, which accounts for about 2.9% of the total energy consumed in the U.S. [12]. This percentage is expected to increase to 4% in 2020. In 2017, over 80% of energy in the U.S. was produced by burning nonrenewable fossil fuel, e.g., coal, petroleum, and natural gas. This causes irreversible harm and pollution to the global environment. Therefore, a growing number of CDC providers, e.g., Microsoft and Amazon, install renewable energy facilities to reduce the environmental pollution caused by the usage of fossil fuel and migrate to distributed green CDCs (DGCDCs) [13]. As tasks of applications soar, the energy cost of the DGCDCs is skyrocketing.

There have been studies from both industry and academia on energy optimization problems [14], [15]. However, users’ tasks must first traverse the wide-area network, which includes multiple ISPs, before they can reach back-end DGCDCs. For example, Google’s wide-area network delivers multiple applications, e.g., video, search, and mail, to global users [16]. DGCDCs also need to pay ISPs for users’ tasks and response data transmitted among users and DGCDCs [17]. Currently, typical DGCDCs transmit more than a petabyte of data each day, and therefore they suffer from huge ISP bandwidth cost [18]. In addition, the bandwidth price of each ISP is determined by a service-level agreement (SLA) [19] specified between users and a DGCDC provider, and therefore bandwidth prices of some ISPs are much higher than those of others. However, existing studies do not consider the diversity in bandwidth prices of ISPs, and therefore may lead to high cost. Besides, DGCDCs are usually located in different sites where the prices of electricity also exhibit spatial diversity [20]. Moreover, wind speed, solar radiation, on-site air density, the maximum available energy, and the number of servers in each GCDC all vary with geographical locations [21]. Thus, it becomes a big challenge to minimize the total cost of a DGCDC provider in a market where ISP bandwidth price, electricity price, and availability of renewable green energy all show spatial diversity.

1545-5955 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Therefore, it is essential to provide a spatial task scheduling and resource optimization (STSRO) method to minimize the total cost of a DGCDC provider by exploiting the spatial diversity of ISP bandwidth and energy cost while strictly meeting delay-bound constraints of tasks of all applications. In this paper, spatial task scheduling means a task scheduling method that considers the spatial diversity of ISP bandwidth and energy cost. This paper incorporates the spatial diversity in the above-mentioned factors into a constrained optimization problem and solves it with the proposed simulated annealing-based bat algorithm (SBA) to offer a real-time near-optimal solution. It jointly specifies the optimal allocation of all arriving tasks among multiple ISPs and determines the optimal setting of each server in each GCDC. It considers the variations of many factors, including electricity price, wind speed, solar radiation, and on-site air density, and can intelligently schedule all arriving tasks to DGCDCs within their delay-bound constraints.

The remainder of this paper is organized as follows. The related studies are discussed in Section II and the contribution of our work is comparatively pointed out. Section III presents the system architecture of DGCDCs. Section IV formulates the STSRO problem. Section V describes the proposed SBA algorithm to solve this problem and realize STSRO that minimizes the total cost of a DGCDC provider. Trace-driven experiments with real-life data are conducted to evaluate the STSRO in Section VI. Section VII concludes this paper and gives the future direction.

II. RELATED WORK

A. Task Scheduling

A growing number of studies have focused on the task scheduling problem in data centers in recent years [22]–[25]. Shah-Mansouri et al. [22] formulate a utility maximization problem that considers delay, prices of cloud services, and energy consumption, and produces the optimal scheduling for mobile users’ tasks. In addition, the optimal pricing strategy is determined for the cloud provider. It effectively balances the tradeoff between the delay and energy consumption. However, the scheduling method in [22] is coarse-grained because it cannot determine the optimal setting of each server in each GCDC. Maqsood et al. [23] present multiple algorithms to jointly realize data allocation and task scheduling in a unified way. Besides, a feasible system model for Network-on-Chip architectures is proposed to effectively capture the energy consumption by considering caches, processing cores, and the Network-on-Chip subsystem. However, it only considers the optimization of energy consumption in data centers. Nir et al. [24] propose an energy and cost-aware task scheduling model that allows mobile devices to schedule some tasks to cloud resources. Its optimal solution can significantly reduce the total cost of the cloud provider. However, it does not consider the energy produced by renewable sources. Wang et al. [25] propose a task scheduling model that describes tasks’ performance needs with the minimum frequency. It schedules the arriving tasks to active servers and adjusts the execution frequencies of server cores to reduce the energy consumption. Zhang and Zhou [26] propose an effective two-stage strategy to increase the scheduling performance of tasks and realize the load balancing of virtual machines (VMs) in cloud. The Bayes classifier is adopted to classify tasks according to the historical scheduling information. Then, an optimal number of VMs is dynamically created to execute the matched tasks, and therefore, the performance of task scheduling is improved.

Different from the above-mentioned studies, this paper jointly determines the optimal allocation of all tasks among multiple available ISPs and specifies the optimal setting of each server in each GCDC in each time slot. All tasks of each application are smartly scheduled to DGCDCs while guaranteeing that tasks’ delay-bound constraints are strictly met.

B. Green Cloud

Recently, an increasing number of studies have been proposed to adopt green energy in large-scale data centers [13], [27]–[31]. Farahnakian et al. [13] propose a VM consolidation method based on live migration of VMs to reduce the energy consumption of data centers while meeting the desired quality of service. However, it only considers the energy optimization and ignores the availability of green energy. Deng et al. [27] propose an online algorithm to achieve the eco-aware energy optimization and load scheduling for distributed data centers. A stochastic optimization problem is obtained and tackled with the Lyapunov optimization theory. Then, an online control algorithm is proposed to achieve the minimization of the eco-aware energy cost of data centers while meeting the tasks’ performance requirement. However, it focuses on the long-term time average of the eco-aware power cost. Nguyen and Cheriet [28] design an environment-aware method for virtual slices to cope with the intermittence of renewable energy. A virtual slice allocation problem that considers renewable energy availability, VM locations, and network capacity is designed and solved to effectively reduce the environmental footprint. Hasan et al. [29] apply virtualization technologies of renewable energy to tackle its availability uncertainty problem in data centers. A greenSLA algorithm is proposed to design a green SLA for data centers according to the availability and intermittent nature of green energy. Qiu et al. [30] propose a genetic-based algorithm for chip multiprocessors with phase-change memory in green clouds. It realizes the tradeoff of the memory usage efficiency and the total execution time by scheduling tasks to cores. However, it aims to reduce the energy consumption by only optimizing the memory usage and ignores the adoption of green energy. Kiani and Ansari [31] propose a novel workload distribution method to maximize the profit of geographically distributed green data centers. The performance of data centers is modeled and evaluated based on a G/D/1 queue. Specifically, a convex optimization problem is formulated by considering SLAs and the diversity of electricity prices in distributed data centers. Hassan et al. [32] present a novel and efficient mechanism to realize the cost-effective resource sharing in a cloud federation. In the mechanism, interactions among cloud providers in a federation are modeled as a coalition game that aims to maximize the total profit or social welfare of cloud providers. Besides, a comprehensive analysis of revenue and costs of cloud providers when they join in a federation is presented.

Different from the above-mentioned studies, this paper jointly considers the spatial diversity of DGCDCs’ many factors including electricity price, wind speed, solar radiation, on-site air density, the maximum available energy, and the number of servers in each GCDC. Then, it smartly schedules all tasks of each application to DGCDCs located in multiple geographical locations within their delay-bound constraints.

C. Cost Minimization

A growing number of studies have been proposed to achieve the cost minimization for CDCs [33]–[36]. Chen and Chang [33] propose a cloud framework to realize user-oriented energy optimization for retail electricity providers. A linear programming model is designed to minimize the multiperiod global cost and stabilize the renewable energy consumption for enhanced integration. However, it does not consider the spatial diversity of renewable energy sources or the delay-bound constraints of tasks of multiple applications. Canali et al. [34] propose an allocation model for virtual elements (VEs) and aim to minimize the energy consumption of a software-defined CDC. Besides, the energy consumption is modeled by incorporating VEs’ computing costs on physical servers, migrating costs across servers, and data transferring costs among VEs. However, the proposed method can only be applied to a single data center. Chen et al. [35] present a streaming workflow scheduling algorithm that considers characteristics of streaming workflow and the price diversity of geodistributed data centers. It aims to minimize the total cost for streaming big data processing provided that the latency requirement is strictly met. However, it ignores the spatial diversity of ISP bandwidth price, electricity price, and availability of renewable green energy. Shi et al. [36] propose an offline algorithm to minimize the cost of energy consumed by a data center and improve applications’ quality of service. The data center management is formulated as a cost minimization problem by incorporating energy cost, switching cost, and delay cost and solved by a dynamic programming based algorithm.

Fig. 1. System architecture of DGCDCs.

Different from the above-mentioned studies, this paper aims to minimize the total cost of a DGCDC provider by exploiting the spatial diversity of ISP bandwidth price, electricity price, and availability of renewable green energy while strictly meeting delay-bound constraints of tasks of all applications.

III. MOTIVATION AND SYSTEM ARCHITECTURE

This section presents the system architecture of DGCDCs illustrated in Fig. 1. A typical cloud provider manages multiple DGCDCs in different geographical locations and provides different types of applications to users around the world. Each GCDC typically hosts a server cluster whose size ranges from several hundred to several thousand servers. To guarantee the response time, robustness, and availability, each GCDC is connected to multiple available ISPs that transfer data among DGCDCs and users. In addition, similar to [17], it is assumed that replicas (e.g., programs and data) for each application have been copied and distributed across all DGCDCs. Thus, applications and their data are consistent with each other, and therefore tasks of each application can be independently executed within each GCDC. Besides, it is assumed that the servers of each application are homogeneous while the servers of different applications are heterogeneous in hardware.

In Fig. 1, users around the world send their various tasks to DGCDCs through multiple types of electronic devices, e.g., smartphones, computers, servers, and laptops. DGCDCs run as follows. In each GCDC, users’ tasks are executed based on the first-come-first-serve (FCFS) policy [37]. Tasks of each application are enqueued into their corresponding FCFS queue. The information about all queues is sent to the task scheduler. Besides, each GCDC can obtain electricity from multiple power sources (power grid, solar, and wind energy suppliers) and periodically transmits the related information to the task scheduler. The information includes the price of the power grid, wind speed, solar irradiance, on-site air density, average peak (idle) power of each server, and so on. Based on the above-mentioned information, the task scheduler executes STSRO to jointly specify the optimal allocation of all arriving tasks among multiple available ISPs and determine the optimal setting of each server in each GCDC. Then, the setting information of servers is adopted to configure them in each GCDC.
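The per-application FCFS queues described above can be illustrated with a minimal Python sketch. The class name `GCDCQueues` and its methods are hypothetical helpers introduced only for illustration; the paper does not specify an implementation, and the report format sent to the task scheduler is an assumption.

```python
from collections import deque


class GCDCQueues:
    """Per-application FCFS task queues of one GCDC (hypothetical helper)."""

    def __init__(self, applications):
        # One FCFS queue per application deployed in this GCDC.
        self.queues = {app: deque() for app in applications}

    def enqueue(self, app, task):
        # Arriving tasks join the tail of their application's queue.
        self.queues[app].append(task)

    def next_task(self, app):
        # FCFS: the task that arrived first is executed first.
        return self.queues[app].popleft() if self.queues[app] else None

    def report(self):
        # Queue lengths, standing in for the queue information
        # that is sent to the task scheduler (assumed format).
        return {app: len(queue) for app, queue in self.queues.items()}


gcdc = GCDCQueues(["video", "search"])
gcdc.enqueue("video", "task-1")
gcdc.enqueue("video", "task-2")
gcdc.enqueue("search", "task-3")
```

A double-ended queue is used here only because it gives constant-time appends and pops at both ends; any ordered container would express the same policy.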

There are several existing architectures [31], [38] that are similar to the system architecture of DGCDCs. In the architecture in [31], users’ arriving requests are scheduled to geographically dispersed data centers powered by on-grid and renewable energy produced by solar panels or a wind farm. In addition, in [31], tasks are enqueued into brown and green queues that are processed by servers powered by brown and green energy, respectively. Note that servers powered by brown energy can only execute tasks in brown queues while servers powered by green energy can only execute tasks in green queues. The work in [38] presents a hierarchical architecture to efficiently manage the workload in geodistributed data centers. The architecture consists of an upper layer and a lower layer. The upper layer exchanges information among different locations and distributes VMs among data centers while the lower layer schedules the workload within each data center.

However, different from them, our architecture also incorporates multiple ISPs delivering tasks between users and DGCDCs, and therefore, it is more realistic than the existing architectures. Besides, in our architecture, servers in each GCDC can obtain electricity from multiple energy sources including power grid, solar panels, and wind farms. In addition, our architecture includes a centralized task scheduler that periodically executes STSRO based on the collected information about all task queues, power grid, and renewable green energy. Based on our architecture, Section IV formulates the STSRO problem. Section V describes the proposed SBA algorithm that solves this problem and realizes STSRO to minimize the total cost of a DGCDC provider while strictly meeting delay-bound constraints of tasks of all applications.

IV. PROBLEM FORMULATION

Based on the architecture of DGCDCs, the cost minimization problem is formulated. The objective is to minimize the total cost of a DGCDC provider, denoted by $\Theta$. $\Theta$ consists of two parts, $\Phi$ and $\Psi$. Here, $\Phi$ denotes the ISP bandwidth cost of transmitting data between users and DGCDCs, and $\Psi$ denotes the DGCDCs’ energy cost brought by the execution of tasks scheduled to DGCDCs in time slot $\tau$

$$\Theta = \Phi + \Psi. \tag{1}$$

$\Phi$ is calculated as follows:

$$\Phi = \sum_{k=1}^{K}\left(b_{\tau}^{k}\left(\sum_{c=1}^{C}\sum_{n=1}^{N}\left(\lambda_{\tau}^{k,c,n} s_{n} L\right)\right)\right) \tag{2}$$

where $L$ denotes the length of each time slot, $K$ denotes the number of available ISPs, $C$ denotes the number of DGCDCs, and $N$ denotes the number of applications deployed in each GCDC. Besides, $b_{\tau}^{k}$ denotes the unit bandwidth price of ISP $k$ in time slot $\tau$, $\lambda_{\tau}^{k,c,n}$ denotes the arriving rate of tasks of application $n$ delivered to GCDC $c$ through ISP $k$ in time slot $\tau$, and $s_{n}$ denotes the average size of each task of application $n$.
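As a minimal sketch of how the bandwidth cost in (2) could be evaluated, the following Python function sums the transmitted volume per ISP and multiplies it by that ISP's unit price. The function name, the nested-list layout `rates[k][c][n]`, and the numeric values are assumptions made for illustration; they are not taken from the paper.

```python
def bandwidth_cost(prices, rates, sizes, slot_length):
    """ISP bandwidth cost per (2).

    prices[k]:      unit bandwidth price of ISP k in the slot
    rates[k][c][n]: arriving rate of tasks of application n
                    delivered to GCDC c through ISP k
    sizes[n]:       average task size of application n
    slot_length:    length L of the time slot
    """
    total = 0.0
    for k, price in enumerate(prices):
        # Data volume carried by ISP k during the slot.
        volume = sum(
            rates[k][c][n] * sizes[n] * slot_length
            for c in range(len(rates[k]))
            for n in range(len(sizes))
        )
        total += price * volume
    return total


# Illustrative values only: 2 ISPs, 2 GCDCs, 1 application.
prices = [0.5, 1.0]
rates = [[[10.0], [20.0]], [[5.0], [0.0]]]
sizes = [2.0]
print(bandwidth_cost(prices, rates, sizes, slot_length=5.0))  # prints 200.0
```

In this toy instance, routing the 5 tasks per unit time from the second ISP to the first would halve their bandwidth cost, which is the kind of price diversity the formulation is meant to exploit.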

The energy cost $\Psi$ is calculated as follows:

$$\Psi = \sum_{c=1}^{C}\left(p_{\tau}^{c}\left(\max\left(E_{\tau}^{c} - E_{\tau}^{c,s} - E_{\tau}^{c,w},\, 0\right)\right)\right) \tag{3}$$

where $p_{\tau}^{c}$ denotes the price of electricity produced by thermal power generation in GCDC $c$ in time slot $\tau$. $\mu_{\tau}^{c,n}$ denotes the service rate of tasks of application $n$ in GCDC $c$ in time slot $\tau$. $E_{\tau}^{c}$ denotes the total energy consumed by the execution of tasks of all applications in GCDC $c$ in time slot $\tau$, and it is calculated as

$$\begin{aligned}
E_{\tau}^{c} &= \sum_{n=1}^{N}\left(\frac{g_{n}^{c}\mu_{\tau}^{c,n} + h_{n}^{c}\lambda_{\tau}^{c,n}\left(1-\delta\left(\lambda_{\tau}^{c,n},\mu_{\tau}^{c,n}\right)\right)}{\sigma_{n}^{c}}\, L\right) \\
g_{n}^{c} &= P^{c,n} + \left(\gamma^{c}-1\right)\hat{P}^{c,n} \\
h_{n}^{c} &= \hat{P}^{c,n} - P^{c,n} \\
\delta\left(\lambda_{\tau}^{c,n},\mu_{\tau}^{c,n}\right) &= \frac{1-\rho_{\tau}^{c,n}}{1-\left(\rho_{\tau}^{c,n}\right)^{\beta_{n}^{c}+1}}\left(\rho_{\tau}^{c,n}\right)^{\beta_{n}^{c}} \\
\rho_{\tau}^{c,n} &= \frac{\lambda_{\tau}^{c,n}}{\mu_{\tau}^{c,n}}
\end{aligned} \tag{4}$$

where $\lambda_{\tau}^{c,n}$ denotes the arriving rate of tasks of application $n$ in GCDC $c$ in time slot $\tau$. $\delta(\lambda_{\tau}^{c,n}, \mu_{\tau}^{c,n})$ denotes the loss probability of tasks of application $n$ in time slot $\tau$. $\sigma_{n}^{c}$ denotes the number of tasks executed by each switched-ON server for application $n$ per minute in GCDC $c$. $P^{c,n}$ and $\hat{P}^{c,n}$ denote the average idle and peak power of each server for application $n$ in GCDC $c$, respectively. $\gamma^{c}$ denotes the value of power usage effectiveness of GCDC $c$. $\beta_{n}^{c}$ denotes the capacity of the task queue of each server for application $n$ in GCDC $c$, and it is the maximum number of tasks that all servers of application $n$ can execute. There are many existing studies that adopt the M/M/1 queueing system to evaluate the performance of servers of applications in data centers. Similar to [39]–[42], servers of application $n$ in each GCDC are modeled as an $M/M/1/\beta_{n}^{c}/\infty$ queueing system.

$E_{\tau}^{c,s}$ denotes the solar energy consumed by the execution of tasks of all applications in GCDC $c$ in time slot $\tau$. Following [43], we have

$$E_{\tau}^{c,s} = \kappa^{c}\psi^{c} I_{\tau}^{c} L \tag{5}$$

where $\kappa^{c}$ denotes the conversion rate of solar radiation to electricity in GCDC $c$, $\psi^{c}$ denotes the active irradiation area of solar panels, and $I_{\tau}^{c}$ denotes the solar irradiance in time slot $\tau$.
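The energy model in (4) and the solar supply in (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and list-based argument layout are hypothetical, and the special case for a utilization of exactly 1 uses the limit of the closed form (all queue states equally likely), which the paper does not spell out.

```python
def loss_probability(lam, mu, beta):
    """Task loss probability of an M/M/1/beta queue, the delta term in (4)."""
    rho = lam / mu
    if rho == 1.0:
        # Assumption: limit of the closed form as rho approaches 1.
        return 1.0 / (beta + 1)
    return (1.0 - rho) / (1.0 - rho ** (beta + 1)) * rho ** beta


def consumed_energy(lam, mu, sigma, beta, p_idle, p_peak, pue, slot_length):
    """Energy consumed in one GCDC per (4); list arguments are indexed by application."""
    total = 0.0
    for n in range(len(lam)):
        g = p_idle[n] + (pue - 1.0) * p_peak[n]
        h = p_peak[n] - p_idle[n]
        # Rate of tasks that are actually served (not lost).
        served = lam[n] * (1.0 - loss_probability(lam[n], mu[n], beta[n]))
        total += (g * mu[n] + h * served) / sigma[n] * slot_length
    return total


def solar_energy(conversion_rate, panel_area, irradiance, slot_length):
    """Solar energy available in one GCDC per (5)."""
    return conversion_rate * panel_area * irradiance * slot_length
```

Note that `mu[n] / sigma[n]` is the number of switched-ON servers, so the expression charges every such server its idle and overhead power and adds the idle-to-peak gap in proportion to utilization.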

$E_{\tau}^{c,w}$ denotes the wind energy consumed by the execution of tasks of all applications in GCDC $c$ in time slot $\tau$. Following [43], we have

$$E_{\tau}^{c,w} = \frac{1}{2}\eta^{c}\alpha^{c}\varsigma^{c}\left(\upsilon_{\tau}^{c}\right)^{3} L \tag{6}$$

where $\eta^{c}$ denotes the conversion rate of wind to electricity in GCDC $c$, $\alpha^{c}$ denotes the on-site air density in GCDC $c$, $\varsigma^{c}$ denotes the rotor area of wind turbines in GCDC $c$, and $\upsilon_{\tau}^{c}$ denotes the wind speed in GCDC $c$.

Let $\Omega^{k}$ denote the bandwidth capacity limit of ISP $k$. The total bandwidth allocated to all tasks that are transmitted through ISP $k$ must be less than or equal to $\Omega^{k}$ in time slot $\tau$, that is

$$\sum_{c=1}^{C}\sum_{n=1}^{N}\left(\lambda_{\tau}^{k,c,n} s_{n}\right) \leq \Omega^{k}. \tag{7}$$
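With the wind term in (6) available, the grid energy charge of one GCDC in (3) and the ISP capacity check in (7) can be sketched as below. The function names and argument layout are hypothetical and chosen only for illustration; the consumed and solar energy are passed in as plain numbers so that the block stays self-contained.

```python
def wind_energy(conversion_rate, air_density, rotor_area, wind_speed, slot_length):
    """Wind energy available in one GCDC per (6)."""
    return 0.5 * conversion_rate * air_density * rotor_area * wind_speed ** 3 * slot_length


def grid_energy_cost(price, consumed, solar, wind):
    """Cost of one GCDC in (3): only energy not covered by solar and wind is bought."""
    return price * max(consumed - solar - wind, 0.0)


def within_isp_capacity(rates_k, sizes, capacity):
    """Check (7) for one ISP; rates_k[c][n] is the rate routed to GCDC c for application n."""
    used = sum(
        rates_k[c][n] * sizes[n]
        for c in range(len(rates_k))
        for n in range(len(sizes))
    )
    return used <= capacity
```

The `max(..., 0.0)` term reflects that surplus renewable energy is not credited in (3): a GCDC with more green energy than it consumes simply pays nothing for grid power.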

Let $M_{c,n}$ denote the number of available servers for application n in GCDC c. The number of switched-ON servers for application n in GCDC c is $\mu_\tau^{c,n}/\sigma_n^c$ in time slot τ. Therefore

$$\frac{\mu_\tau^{c,n}}{\sigma_n^c} \le M_{c,n}. \tag{8}$$

Let $\Omega_c$ denote the maximum amount of available energy in GCDC c. The amount of energy consumed by the execution of tasks of all applications in GCDC c must be less than or equal to $\Omega_c$ in time slot τ. Therefore

$$\sum_{n=1}^{N}\left(\frac{g_n^c \mu_\tau^{c,n} + h_n^c \lambda_\tau^{c,n}\left(1-\delta\left(\lambda_\tau^{c,n},\mu_\tau^{c,n}\right)\right)}{\sigma_n^c}\,L\right) \le \Omega_c. \tag{9}$$

In time slot τ, to guarantee the stability of the task queue of application n in GCDC c, $\lambda_\tau^{c,n}$ must be less than or equal to $\mu_\tau^{c,n}$. Therefore

$$\lambda_\tau^{c,n} = \sum_{k=1}^{K}\lambda_\tau^{k,c,n} \le \mu_\tau^{c,n}. \tag{10}$$

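In time slot τ, the sum of $\lambda_\tau^{k,c,n}$ over all ISPs and GCDCs must be equal to the arriving rate of tasks of application n, $\lambda_\tau^n$. Therefore

$$\lambda_\tau^n = \sum_{c=1}^{C}\lambda_\tau^{c,n} = \sum_{c=1}^{C}\sum_{k=1}^{K}\lambda_\tau^{k,c,n}. \tag{11}$$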

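Let $\xi_n$ denote the response time constraint of tasks of application n. In time slot τ, the average response time of tasks of application n in GCDC c cannot exceed its constraint $\xi_n$, i.e.,

$$\frac{L_\tau^{c,n}}{\mu_\tau^{c,n}\left(1-Q_\tau^{c,n,0}\right)} \le \xi_n \tag{12}$$

where

$$L_\tau^{c,n} = \frac{\rho_\tau^{c,n}}{1-\rho_\tau^{c,n}} - \frac{\left(\beta_n^c+1\right)\left(\rho_\tau^{c,n}\right)^{\beta_n^c+1}}{1-\left(\rho_\tau^{c,n}\right)^{\beta_n^c+1}}$$

$$Q_\tau^{c,n,0} = \frac{1-\rho_\tau^{c,n}}{1-\left(\rho_\tau^{c,n}\right)^{\beta_n^c+1}}$$

$$\rho_\tau^{c,n} = \frac{\lambda_\tau^{c,n}}{\mu_\tau^{c,n}}.$$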
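The left-hand side of (12) can be evaluated with a few lines of code. The following Python sketch is only an illustration: the function names are hypothetical, the inputs are made-up rates in consistent units, and the branch for a utilization of exactly 1 (where the closed-form expressions become 0/0) uses the standard limiting values of the finite-capacity queue, which the text above does not state.

```python
def mm1k_response_time(lam: float, mu: float, beta: int) -> float:
    """Average response time of an M/M/1/beta queue, i.e., the left-hand side of (12)."""
    if lam <= 0 or mu <= 0 or beta < 1:
        raise ValueError("lam and mu must be positive and beta must be at least 1")
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        # Limiting values as rho -> 1 (an addition of this sketch to avoid 0/0).
        mean_tasks = beta / 2.0
        q_empty = 1.0 / (beta + 1)
    else:
        tail = rho ** (beta + 1)
        mean_tasks = rho / (1.0 - rho) - (beta + 1) * tail / (1.0 - tail)
        q_empty = (1.0 - rho) / (1.0 - tail)
    return mean_tasks / (mu * (1.0 - q_empty))


def meets_delay_bound(lam: float, mu: float, beta: int, xi: float) -> bool:
    """Constraint (12): the average response time must not exceed xi."""
    return mm1k_response_time(lam, mu, beta) <= xi


# Hypothetical example: 8 tasks arrive and 10 tasks can be served per unit of time.
response = mm1k_response_time(lam=8.0, mu=10.0, beta=20)
```

For a very large queue capacity the result approaches 1/(μ − λ), the response time of an unbounded M/M/1 queue, which gives a quick sanity check on the formulas.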

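Then, based on (1)–(12), the cost minimization problem for DGCDCs is given as

$$\textbf{P1}:\ \operatorname{Min}\ \{F\}$$

s.t.

$$\sum_{c=1}^{C}\sum_{n=1}^{N}\left(\lambda_\tau^{k,c,n} s_n\right) \le B_k \tag{13}$$

$$\frac{\mu_\tau^{c,n}}{\sigma_n^c} \le M_{c,n} \tag{14}$$

$$\sum_{n=1}^{N}\left(\frac{g_n^c \mu_\tau^{c,n} + h_n^c \lambda_\tau^{c,n}\left(1-\delta\left(\lambda_\tau^{c,n},\mu_\tau^{c,n}\right)\right)}{\sigma_n^c}\,L\right) \le \Omega_c \tag{15}$$

$$\lambda_\tau^{c,n} = \sum_{k=1}^{K}\lambda_\tau^{k,c,n} \le \mu_\tau^{c,n} \tag{16}$$

$$\lambda_\tau^n = \sum_{c=1}^{C}\lambda_\tau^{c,n} = \sum_{c=1}^{C}\sum_{k=1}^{K}\lambda_\tau^{k,c,n} \tag{17}$$

$$\frac{L_\tau^{c,n}}{\mu_\tau^{c,n}\left(1-Q_\tau^{c,n,0}\right)} \le \xi_n \tag{18}$$

$$\lambda_\tau^{k,c,n} \ge 0,\quad \mu_\tau^{c,n} > 0 \quad (1 \le k \le K,\ 1 \le c \le C,\ 1 \le n \le N). \tag{19}$$

Here, F denotes the total cost of the DGCDC provider, and $B_k$ and $\Omega_c$ denote the bandwidth capacity limit of ISP k and the maximum amount of available energy in GCDC c, respectively.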

Constraint (19) specifies the valid ranges of decision variables including $\lambda_\tau^{k,c,n}$ and $\mu_\tau^{c,n}$. We also assume that time slot-related parameters (e.g., $p_\tau^c$, $r_\tau^n$, $b_\tau^k$, $I_\tau^c$, and $\upsilon_\tau^c$) are already well predicted with existing prediction algorithms (e.g., stacked autoencoder deep neural network [44]) at the beginning of each time slot τ. The method to solve P1 is described next, and its optimal solution jointly specifies the optimal allocation of all arriving tasks among multiple ISPs and determines the optimal setting of each server in each GCDC. In this way, the cost of a DGCDC provider is minimized while delay-bound constraints of all tasks are strictly met.

V. SIMULATED ANNEALING-BASED BAT ALGORITHM

It is worth noting that the total cost F in P1 is nonlinear with respect to the continuous decision variables. Thus, P1 is a constrained nonlinear program. We adopt a penalty function method [45] to transform it into an unconstrained nonlinear program

$$\textbf{P2}:\ \underset{\lambda_\tau^{k,c,n},\,\mu_\tau^{c,n}}{\operatorname{Min}}\ \left\{\hat{f} = \phi\,\mho + F\right\} \tag{20}$$

where $\hat{f}$ denotes the augmented objective function, ϕ is a large positive constant, and $\mho$ denotes the penalty of the violation of all constraints. Let $\mathbf{x}$ denote the vector of decision variables including $\lambda_\tau^{k,c,n}$ and $\mu_\tau^{c,n}$. Let ξ and ω be two positive constants. $\mho$ is obtained as follows:

$$\mho = \sum_{z=1}^{Z}\left(\max\{0,-u_z(\mathbf{x})\}\right)^{\xi} + \sum_{y=1}^{Y}\left|w_y(\mathbf{x})\right|^{\omega}. \tag{21}$$

In (21), each inequality constraint z (1 ≤ z ≤ Z) is transformed into $u_z(\mathbf{x}) \ge 0$. If it is not violated, its penalty is 0; otherwise, its penalty is $(-u_z(\mathbf{x}))^{\xi}$. Each equality constraint y (1 ≤ y ≤ Y) is transformed into $w_y(\mathbf{x}) = 0$. If it is not violated, its penalty is 0; otherwise, its penalty is $|w_y(\mathbf{x})|^{\omega}$. In this way, we obtain the unconstrained problem P2. There are several typical algorithms (e.g., conjugate gradient method [46] and sequential quadratic programming [47]) to solve it. However, they usually depend on the first-order or second-order derivatives of $\hat{f}$, and therefore, they are only suitable for specific optimization problems with these mathematical structures [48]. In addition, their optimization processes are complex, and the quality of their final solutions is often unsatisfactory.
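The penalty transformation of (20) and (21) can be sketched in Python as follows. This is a minimal illustration on a toy problem, not the constraint set of P1: the names `penalty`, `augmented_objective`, and `omega` (used here for the second exponent) are hypothetical, and the default exponents and weight are simply example values.

```python
from typing import Callable, Sequence

Constraint = Callable[[Sequence[float]], float]


def penalty(x: Sequence[float], inequalities: Sequence[Constraint],
            equalities: Sequence[Constraint], xi: float = 2.0, omega: float = 2.0) -> float:
    """Penalty of (21): inequalities are written as u_z(x) >= 0, equalities as w_y(x) = 0."""
    violated = sum(max(0.0, -u(x)) ** xi for u in inequalities)
    unbalanced = sum(abs(w(x)) ** omega for w in equalities)
    return violated + unbalanced


def augmented_objective(x: Sequence[float], cost: Constraint,
                        inequalities: Sequence[Constraint],
                        equalities: Sequence[Constraint], phi: float = 1e20) -> float:
    """Augmented objective of (20): phi times the penalty plus the original cost."""
    return phi * penalty(x, inequalities, equalities) + cost(x)


# Toy problem (hypothetical): minimize x0 + x1 subject to x0 >= 1 and x0 + x1 = 3.
cost = lambda x: x[0] + x[1]
inequalities = [lambda x: x[0] - 1.0]
equalities = [lambda x: x[0] + x[1] - 3.0]

feasible = penalty([2.0, 1.0], inequalities, equalities)    # both constraints hold
infeasible = penalty([0.0, 1.0], inequalities, equalities)  # both constraints violated
```

With a very large weight, any violated constraint dominates the augmented objective, so a search method that only compares objective values is steered toward feasible points before it starts trading off cost.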

Metaheuristic algorithms have several advantages, e.g., robustness, wide applicability, and easy implementation. Thus, they have been commonly applied to solve different types of complex optimization problems. However, each metaheuristic algorithm has its own pros and cons [49], [50]. As a typical example, simulated annealing (SA) has been proven to be effective in solving continuous and discrete constrained optimization problems. Its Metropolis acceptance rule allows some moves that worsen the objective function value in order to escape from local optima. It has been demonstrated that SA can eventually obtain global optima if the temperature cooling rate is carefully selected. However, its main disadvantage is that its convergence can be very slow [50]. Besides, the bat algorithm (BA) is commonly applied due to its many advantages, e.g., quick convergence. However, it may easily become trapped in local optima during its exploration and exploitation processes [49]. Thus, its final solutions are usually of low quality when it is applied to solve large-scale optimization problems with high-dimensional search spaces. To improve the efficiency and global search accuracy of BA, we adopt a hybrid metaheuristic algorithm named SBA, which incorporates SA's Metropolis acceptance rule into BA. In this way, the diversity of solutions is increased to improve BA's performance, thereby yielding an improved variant of BA. The pseudocode of SBA is given in Algorithm 1.

Algorithm 1 SBA
1: Initialize positions and velocities of all bats
2: Initialize frequency $f_i$, pulse rate $r_i$, and loudness $A_i$ of bat i
3: Calculate the fitness value of each position, and store the optimal position in $x^*$
4: Set the initial temperature $\Theta_0$
5: t ← 0
6: while t ≤ T do
7:     Calculate the adaptation value $\Gamma_i$ of bat i in current temperature $\Theta_t$ based on (22)
8:     Determine an alternative optimal position $x'^{*}$ from positions of all bats with the roulette strategy
9:     Adjust frequencies and update positions and velocities of all bats based on (23)–(25)
10:    Calculate the new fitness value of bat i
11:    if rand > $r_i$ then
12:        Choose a position from the best positions for bat i
13:        Produce a new position around it with random walk based on (26)
14:    end if
15:    if rand < $A_i$ && $\hat{f}(x_i) < \hat{f}(x^*)$ then
16:        Accept the new solution for bat i
17:        Increase $r_i$ and decrease $A_i$ based on (27)
18:    end if
19:    Rank bats and determine currently optimal position $x^*$
20:    $\Theta_{t+1} \leftarrow \Theta_t \varphi$
21:    t ← t + 1
22: end while
23: Output the best position of all bats, $x^*$

The details of Algorithm 1 are described here. Line 1 initializes positions and velocities of all bats. Let $x_i^t$ and $v_i^t$ denote the position and velocity of bat i in iteration t. $x_i^t$ and $v_i^t$ are d-dimensional vectors that include the decision variables. The first $K \times C \times N$ elements of each vector store $\lambda_\tau^{k,c,n}$. The next $C \times N$ elements of each vector store $\mu_\tau^{c,n}$. Therefore, $d = C \times N \times (K+1)$. Line 2 initializes frequency $f_i$, pulse rate $r_i$, and loudness $A_i$ of bat i. Here, $f_i \in [f_{\min}, f_{\max}]$, $r_i \in [r_{\min}, r_{\max}]$, and $A_i \in [A_{\min}, A_{\max}]$. Line 3 calculates the fitness value of each position and stores the optimal position in $x^*$. Line 4 sets the initial temperature $\Theta_0$. Let T denote the maximum number of iterations. Let I denote the number of bats. Let $\Theta_t$ denote the current temperature in iteration t, and let $\Gamma_i$ denote the adaptation value of position $x_i$ of bat i. Line 7 calculates $\Gamma_i$ in current temperature $\Theta_t$ based on (22). Line 8 determines an alternative optimal position $x'^{*}$ selected from positions of all bats with a roulette strategy [51]

$$\Gamma_i = \frac{e^{-\left(\hat{f}(x_i)-\hat{f}(x^*)\right)/\Theta_t}}{\sum_{i=1}^{I} e^{-\left(\hat{f}(x_i)-\hat{f}(x^*)\right)/\Theta_t}}. \tag{22}$$
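The adaptation values of (22) and the roulette selection of Line 8 can be sketched in Python as follows. The function names are hypothetical, and the fitness values and temperature are made-up numbers; the sketch assumes a minimization problem in which the best fitness is never larger than any bat's fitness.

```python
import math
import random


def adaptation_values(fitness, best_fitness: float, temperature: float):
    """Normalized Boltzmann-style weights of (22), one per bat."""
    weights = [math.exp(-(f - best_fitness) / temperature) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def roulette_select(probabilities, rng: random.Random) -> int:
    """Pick an index with probability equal to its adaptation value."""
    threshold = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if threshold < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point round-off


# Hypothetical fitness values of three bats; the second bat holds the best position.
values = adaptation_values([3.0, 1.0, 2.0], best_fitness=1.0, temperature=1.0)
chosen = roulette_select(values, random.Random(0))
```

At a high temperature the adaptation values are almost uniform, so any bat may serve as the alternative optimal position; as the temperature falls, the selection concentrates on the bats closest to the best fitness.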

Line 9 adjusts frequencies and updates positions and velocities of all bats based on (23)–(25)

$$f_i = f_{\min} + (f_{\max} - f_{\min})\chi \tag{23}$$

$$v_i^{t+1} = v_i^t + \left(x_i^t - x'^{*}\right) f_i \tag{24}$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}. \tag{25}$$

Line 10 calculates the new fitness value of bat i. Lines 11–14 choose a position ($x_{\text{old}}$) from the best positions for bat i and produce a new position ($x_{\text{new}}$) around it with random walk [52] if rand > $r_i$ is met

$$x_{\text{new}} = x_{\text{old}} + \varepsilon A^t \tag{26}$$

where ε (ε ∈ [−1, 1]) is a random number, and $A^t = (1/I)\sum_{i=1}^{I} A_i^t$ is the average loudness of all bats at iteration t.

Lines 15–18 accept the new solution for bat i, increase $r_i$, and decrease $A_i$ based on (27) if rand < $A_i$ and $\hat{f}(x_i) < \hat{f}(x^*)$

$$A_i^{t+1} = \zeta A_i^t, \quad r_i^{t+1} = r_i^0\left[1 - e^{-\theta t}\right]. \tag{27}$$

Line 19 ranks bats and determines the current optimal position $x^*$. Finally, the best position of all bats, $x^*$, is output as the final solution.
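To make the flow of Algorithm 1 concrete, the following Python sketch applies it to a toy sphere function rather than to P2. It is a loose illustration under several assumptions that are not stated in the paper: positions are clipped to a box, a candidate position stays tentative until Lines 15–18 accept it (as in common BA implementations), and the default parameter values are small illustrative choices rather than the tuned settings of the experiments.

```python
import math
import random


def sba(objective, dim, lower, upper, bats=20, iterations=200,
        f_min=0.0, f_max=1.0, zeta=0.9, theta=0.9,
        temp0=1.0, cooling=0.975, seed=0):
    """Minimize `objective` over [lower, upper]^dim, loosely following Algorithm 1."""
    rng = random.Random(seed)

    def clip(x):
        return [min(upper, max(lower, xj)) for xj in x]

    # Lines 1-2: positions, velocities, initial pulse rates, and loudness.
    xs = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(bats)]
    vs = [[0.0] * dim for _ in range(bats)]
    r0 = [rng.random() for _ in range(bats)]
    rates = [0.0] * bats
    loud = [rng.uniform(1.0, 2.0) for _ in range(bats)]
    # Line 3: fitness of each position and the best position x*.
    fit = [objective(x) for x in xs]
    b = min(range(bats), key=lambda i: fit[i])
    best_x, best_f = xs[b][:], fit[b]
    temp = temp0  # Line 4
    for t in range(iterations + 1):  # Lines 5-6: while t <= T
        # Lines 7-8: adaptation values of (22), then roulette selection of x'*.
        weights = [math.exp(-(f - best_f) / temp) for f in fit]
        guide = xs[rng.choices(range(bats), weights=weights)[0]][:]
        mean_loud = sum(loud) / bats
        for i in range(bats):
            # Line 9: frequency, velocity, and position updates of (23)-(25).
            freq = f_min + (f_max - f_min) * rng.random()
            vs[i] = [v + (x - g) * freq for v, x, g in zip(vs[i], xs[i], guide)]
            cand = clip([x + v for x, v in zip(xs[i], vs[i])])
            # Lines 11-14: random walk of (26) around the best position.
            if rng.random() > rates[i]:
                cand = clip([xb + rng.uniform(-1.0, 1.0) * mean_loud for xb in best_x])
            cand_f = objective(cand)  # Line 10
            # Lines 15-18: accept, then update loudness and pulse rate by (27).
            if rng.random() < loud[i] and cand_f < best_f:
                xs[i], fit[i] = cand, cand_f
                best_x, best_f = cand[:], cand_f  # Line 19
                loud[i] *= zeta
                rates[i] = r0[i] * (1.0 - math.exp(-theta * t))
        temp = max(temp * cooling, 1e-12)  # Line 20, floored to avoid division by zero
    return best_x, best_f


sphere = lambda x: sum(xj * xj for xj in x)
best_x, best_f = sba(sphere, dim=3, lower=-5.0, upper=5.0)
```

The sketch only ever replaces the stored best position with a strictly better one, so the returned fitness cannot be worse than the best fitness of the initial population.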

VI. PERFORMANCE EVALUATION

The following experiments evaluate STSRO with real-life data. STSRO is coded and implemented with MATLAB 2017, and it runs on a computer with an Intel Xeon E5-2699AV4 CPU at 2.4 GHz and a 32-GB DDR4 memory.

A. Parameter Setting

We adopt publicly available realistic tasks of three applications in the Google cluster trace¹ for one day on May 10, 2011. Fig. 2 shows the task arriving rates of three applications (types 1, 2, and 3) that are sampled every 5 min. Besides, we adopt realistic electricity prices for one day on May 10, 2011 in the capital region of New York, U.S.²

¹ https://github.com/google/cluster-data
² http://www.nyiso.com/public/index.jsp


Fig. 2. Task arriving rates of three applications.

TABLE I

PARAMETER SETTING OF ENERGY SOURCES

Fig. 3. Solar irradiance of three GCDCs.

Fig. 4. Wind speed of three GCDCs.

Here, K = 3, C = 3, and N = 3. Based on [53], the parameter setting of energy sources including thermal power, wind energy, and solar energy is given in Table I. Besides, this paper collects data about solar irradiance³ and wind speed⁴ for one day on May 10, 2011. The solar irradiance and the wind speed in three GCDCs are shown in Figs. 3 and 4.

In addition, $\sigma_n^c$, $P_{c,n}$, $\hat{P}_{c,n}$, $\beta_n^c$, and $M_{c,n}$ are set in Table II. The bandwidth prices of three ISPs are shown in Fig. 5.

³ http://www.nrel.gov/midc/srrl_bms/
⁴ http://www.nrel.gov/midc/nwtc_m2/

Fig. 5. Bandwidth prices of three ISPs.

Fig. 6. Electricity prices of three GCDCs.

Fig. 7. Total cost ($/s) and penalty in each time slot.

Besides, the electricity prices of three GCDCs are shown in Fig. 6. According to [54], the bandwidth capacity limits of the three ISPs are $B_1 = 4\times10^{6}$ Mb/s, $B_2 = 5\times10^{6}$ Mb/s, and $B_3 = 6\times10^{6}$ Mb/s. In addition, $s_1 = 8$ Mb, $s_2 = 5$ Mb, $s_3 = 2$ Mb, $\xi_1 = 0.15$ s, $\xi_2 = 0.2$ s, and $\xi_3 = 0.25$ s.

It is worth noting that many metaheuristic algorithms are sensitive to the setting of their parameters. Therefore, based on the parameter setting in previous studies [49], [50], [52], numerous experiments are conducted to investigate and obtain the optimal setting of parameters in SBA by using the grid search method [55]. Finally, the parameter setting of SBA is determined as follows: $f_{\min} = 0$, $f_{\max} = 100$, $r_{\min} = 0$, $r_{\max} = 1$, $A_{\min} = 1$, and $A_{\max} = 100$. Besides, I = 50, χ ∈ [0, 1], ζ = θ = 0.9, the initial temperature is $\Theta_0 = 10^{12}$, $T = 10^{3}$, and the cooling rate is $\varphi = 0.975$. In addition, the penalty weight is $\phi = 10^{20}$ and the penalty exponents are ξ = ω = 2.
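The effect of the cooling settings can be checked with a short calculation. The following Python sketch reads the reported settings as an initial temperature of 10^12, a cooling rate of 0.975, and 10^3 iterations, and applies the geometric update of Line 20 of Algorithm 1; the function name `temperature` is hypothetical.

```python
def temperature(t: int, temp0: float = 1e12, cooling: float = 0.975) -> float:
    """Temperature after t applications of the update in Line 20 of Algorithm 1."""
    return temp0 * cooling ** t


final_temperature = temperature(1000)  # roughly 10 under these settings
```

Under this reading, the temperature falls by about eleven orders of magnitude over the run, so the roulette selection is nearly uniform in early iterations and becomes strongly biased toward the best positions near the end.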

B. Experimental Results

Fig. 7 illustrates the total cost of a DGCDC provider and the penalty of the final solutions obtained by STSRO. The penalty in each time slot is zero, which demonstrates that STSRO can find a high-quality solution that meets all constraints in P1. Besides, given the augmented objective in (20), Fig. 7 indicates that STSRO can minimize the total cost of a DGCDC provider. The occupied bandwidth of three ISPs connecting to DGCDCs is considered next. As shown in Fig. 5, the bandwidth prices of ISPs are different from each other. Fig. 8 illustrates that the occupied bandwidth of each ISP differs significantly because of the variation in bandwidth prices of ISPs. The reason is that STSRO aims to minimize the total cost of a DGCDC provider by specifying the optimal allocation of all arriving tasks among multiple ISPs. It is observed that the number of tasks that traverse ISP 1 is the largest while the number of tasks that traverse ISP 3 is the smallest among three ISPs. The result is consistent with the bandwidth prices of three ISPs, i.e., ISP 1's bandwidth price is the lowest while ISP 3's bandwidth price is the highest.

TABLE II

PARAMETER SETTING OF THREE GCDCS

Fig. 8. Occupied bandwidth of three ISPs.

Fig. 9. Consumption of energy produced by thermal power generation.

The consumption of energy produced by thermal power generation in three GCDCs is shown in Fig. 9. As shown in Fig. 6, electricity prices of three GCDCs are also different from each other. Fig. 9 shows that the consumption of energy produced by thermal power generation in three GCDCs varies due to the difference in electricity prices of three GCDCs. Similarly, the reason is that STSRO aims to minimize the total cost of a DGCDC provider by determining the optimal setting of each server in each GCDC. It is shown that the consumption of energy produced by thermal power generation in GCDC 3 is the largest while that in GCDC 1 is the smallest among three GCDCs. The result is consistent with electricity prices of three GCDCs, i.e., GCDC 3's electricity price is the lowest while GCDC 1's electricity price is the highest.

Fig. 10 illustrates the total energy consumption of three GCDCs. It is observed that the total energy consumption of each GCDC is much less than its corresponding amount of the maximum available energy in most of the time slots. Besides, it shows that STSRO can dynamically and efficiently utilize the available energy by optimally determining the setting of each server in each GCDC. Fig. 11 illustrates the number of switched-ON servers in GCDCs 1, 2, and 3, respectively. It is observed that the number of switched-ON servers for each type in each GCDC does not exceed its corresponding limit. In addition, it is also shown that the number of switched-ON servers for the same type differs considerably across the three GCDCs. For example, the number of switched-ON servers for type 2 in GCDC 1 is clearly larger than that for type 2 in GCDCs 2 and 3. This is because GCDC 1's electricity price is the smallest among three GCDCs. Thus, the result demonstrates that STSRO can minimize the total cost of the DGCDC provider by determining the optimal number of switched-ON servers in each GCDC.

To show the performance of SBA, this paper compares it with two typical metaheuristic algorithms including SA and BA. The reasons for choosing them for comparison are described as follows. It is demonstrated that SA can converge to a global optimum in theory by careful design of the cooling rate of temperature because it can escape from a local optimum [50]. Thus, the comparison between SBA and SA can demonstrate the accuracy of SBA's final solution. In addition, it is also shown that BA's convergence speed is quick [49]. Thus, the comparison between SBA and BA can demonstrate SBA's convergence speed. Note that SA and BA are both sensitive to the setting of their parameters. Therefore, similar to SBA, numerous experiments are conducted to determine the optimal setting of parameters of SA and BA based on the grid search method [55] and the parameter setting in previous studies [49], [50], [52].

Fig. 12 shows the comparison of the execution time of SBA, BA, and SA. The average execution time of SA is 7.13 s, which is 4.27 times that of SBA (1.67 s) and 39.61 times that of BA (0.18 s). Although BA's execution time is the smallest, this is caused by its fast trapping into a local optimum. Fig. 13 presents the total cost comparison of each iteration of SBA, BA, and SA in the 50th time slot. Here, each iteration in SBA represents Lines 7–21 in Algorithm 1. The iterations of BA and SA have a similar meaning to that of SBA. Fig. 14 shows the penalty in each iteration in each time slot, which is calculated based on (21).

Fig. 10. Total energy consumption of three GCDCs. (a) GCDC 1. (b) GCDC 2. (c) GCDC 3.

Fig. 11. Number of switched-ON servers in three GCDCs. (a) GCDC 1. (b) GCDC 2. (c) GCDC 3.

Fig. 12. Comparison of execution time.

Fig. 13. Total cost of each iteration in the 50th time slot.

Fig. 14. Penalty of each iteration.

It is shown that BA converges after the fewest iterations compared with SBA and SA. Nevertheless, Fig. 14 illustrates that the penalty of BA's final solution is extremely large (about 4×10⁶). This result shows that its final solution cannot satisfy all the constraints in P1. Therefore, BA's final solution is the worst due to its quick trapping into a local optimum. SA requires about 766 iterations to converge to its final solution, and the total cost of its final solution is 263.31$. SBA only requires 241 iterations to converge to its final solution, and its corresponding final total cost is 232.26$. Therefore, SBA decreases the total cost of a DGCDC provider by 31.05$ in far fewer iterations than SA. Besides, Fig. 14 presents that the penalty of SBA's final solution is zero. It means that SBA can obtain a high-quality solution meeting all the constraints in P1. Therefore, Figs. 12–14 show that the adoption of SA's Metropolis acceptance rule in SBA can increase the diversity of solutions and improve the efficiency and global search accuracy of BA.
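The reported ratios and savings follow directly from the figures quoted above, as the following Python check shows. The variable names are hypothetical, and the relative saving of about 11.8% is a value derived here from the quoted costs rather than a number stated in the paper.

```python
# Figures quoted in the text: average execution times (s), final costs ($), iterations.
sa_time, sba_time, fastest_time = 7.13, 1.67, 0.18
sa_cost, sba_cost = 263.31, 232.26
sa_iterations, sba_iterations = 766, 241

time_ratio_sa_sba = round(sa_time / sba_time, 2)          # 4.27
time_ratio_sa_fastest = round(sa_time / fastest_time, 2)  # 39.61
cost_saving = round(sa_cost - sba_cost, 2)                # 31.05
relative_saving = (sa_cost - sba_cost) / sa_cost          # about 0.118
fewer_iterations = sa_iterations - sba_iterations         # 525
```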

Fig. 15. Throughput of STSRO, A1, and A2. (a) Type 1. (b) Type 2. (c) Type 3.

Fig. 16. Total cost of STSRO, A1, and A2.

To demonstrate the effectiveness of STSRO, it is compared with two typical scheduling approaches [27], [54] in terms of the total cost and throughput of the DGCDC provider.

1) Method A1, similar to the cheap-first scheduling in [27], schedules tasks to DGCDCs according to the order of their electricity prices. Therefore, the GCDC with the lowest electricity price executes the largest number of tasks while the one with the highest electricity price executes the smallest number of tasks.

2) Method A2, similar to the renewable energy-first scheduling in [54], schedules tasks to DGCDCs according to the order of their amount of renewable energy. Therefore, the GCDC with the largest amount of renewable energy executes the largest number of tasks while the one with the least amount executes the smallest number of tasks.
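The ordering idea behind the two baselines can be sketched as a greedy fill, shown below in Python. This is only one plausible reading of A1 and A2, not the implementation of [27] or [54]: the sketch assumes that each GCDC has a fixed task capacity per time slot, that GCDCs are filled one after another in the chosen order, and that tasks exceeding the total capacity are refused. All names and numbers are hypothetical.

```python
def greedy_schedule(tasks: float, capacities, keys, reverse: bool = False):
    """Fill GCDCs in the order given by `keys`; return (allocation, refused tasks)."""
    order = sorted(range(len(capacities)), key=lambda c: keys[c], reverse=reverse)
    allocation = [0.0] * len(capacities)
    remaining = tasks
    for c in order:
        allocation[c] = min(remaining, capacities[c])
        remaining -= allocation[c]
    return allocation, remaining


# Hypothetical data for three GCDCs.
capacities = [100.0, 100.0, 100.0]
prices = [3.0, 1.0, 2.0]
renewable = [50.0, 10.0, 30.0]

# A1-style ordering: cheapest electricity first.
a1_allocation, a1_refused = greedy_schedule(250.0, capacities, prices)
# A2-style ordering: largest amount of renewable energy first.
a2_allocation, a2_refused = greedy_schedule(350.0, capacities, renewable, reverse=True)
```

Because such an ordering looks at a single criterion, it ignores the other price and capacity dimensions, which is why tasks can be refused or routed expensively even when a cheaper joint allocation exists.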

Fig. 15 compares STSRO with A1 and A2 in terms of throughput, i.e., the number of tasks scheduled in time slot τ. It is observed in Fig. 15 that the throughput of STSRO is larger than those of A1 and A2 in each time slot for each application. For example, for the type 1 application, the throughput of STSRO is larger than that of A1 and A2 by 52.94% and 52.11% on average, respectively. The reason is that the bandwidth capacity of each ISP, the number of available servers for each application, and the maximum amount of available energy in each GCDC are all limited in each time slot. Therefore, some arriving tasks are refused and not executed in DGCDCs when using A1 or A2. Thus, Fig. 15 shows that the throughput of DGCDCs is drastically increased with STSRO.

Besides, with A1 and A2, an average allocation policy is adopted by ISPs [27], [54]. It means that all tasks are evenly allocated among multiple ISPs. Fig. 16 illustrates the total cost of STSRO, A1, and A2, respectively. To guarantee the execution performance of tasks, the penalty cost is usually specified in the SLA for task refusal [56]. It is determined after the negotiation between users and a DGCDC provider, and all refused tasks bring the penalty to a DGCDC provider. Let $\vartheta_\tau^n$ denote the penalty paid by the provider if a task of application n is refused in time slot τ. $\vartheta_\tau^n$ in SLAs is typically larger than the maximum cost caused by the execution of each task of application n among DGCDCs in time slot τ. It motivates the DGCDC provider to strictly meet the delay-bound constraints of all tasks. Similar to Fig. 15, the total cost in time slot τ is obtained by calculating the sum of the cost brought by executed tasks in DGCDCs and the penalty due to refused tasks in time slot τ. It is shown in Fig. 16 that compared with A1 and A2, the total cost of STSRO can be reduced by 30.58% and 30.82% on average, respectively. The reason is that STSRO schedules tasks among ISPs and DGCDCs by jointly considering the spatial diversity in bandwidth prices of ISPs, electricity prices, and the availability of renewable green energy in DGCDCs.

VII. CONCLUSION

DGCDCs need a huge amount of bandwidth and energy to execute multiple applications. Existing studies investigate the energy cost minimization problem in DGCDCs. The spatial diversity of bandwidth prices of ISPs, electricity prices, and the availability of renewable green energy brings an opportunity to minimize the total cost of a DGCDC provider. A nonlinear optimization problem is formulated and solved by the proposed SBA. In this way, this paper proposes an STSRO method to minimize the total cost of a DGCDC provider by exploiting such spatial diversity in DGCDCs. STSRO can cost-effectively schedule all arriving tasks of heterogeneous applications while strictly meeting their delay-bound constraints. Experimental results demonstrate that it can drastically increase the throughput and reduce the total cost of a DGCDC provider in comparison with two recent scheduling methods provided that delay-bound constraints of all tasks are strictly met. In future work, we plan to consider the indeterminacy of renewable energy and the uncertainty in arriving tasks with rough deep neural network approaches. Other recent intelligent optimization methods should be explored (see [57]–[61]).

REFERENCES

[1] M. H. Ghahramani, M. Zhou, and C. T. Hon, “Toward cloud computing QoS architecture: Analysis of cloud systems and cloud services,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 1, pp. 6–18, Jan. 2017.

[2] M. Armbrust et al., “A view of cloud computing,” Commun. ACM, vol. 53, no. 4, pp. 50–58, 2010.

[3] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599–616, 2009.

[4] M. Lin, A. Wierman, L. L. Andrew, and E. Thereska, “Dynamic right-sizing for power-proportional data centers,” IEEE/ACM Trans. Netw., vol. 21, no. 5, pp. 1378–1391, Oct. 2013.

[5] J. Shuja et al., “Survey of techniques and architectures for designing energy-efficient data centers,” IEEE Syst. J., vol. 10, no. 2, pp. 507–519, Jun. 2016.

[6] D. A. Chekired and L. Khoukhi, “Smart grid solution for charging and discharging services based on cloud computing scheduling,” IEEE Trans. Ind. Informat., vol. 13, no. 6, pp. 3312–3321, Dec. 2017.


[7] M. Erol-Kantarci and H. T. Mouftah, “Energy-efficient information and communication infrastructures in the smart grid: A survey on interactions and open issues,” IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 179–197, 1st Quart., 2015.

[8] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, “Energy-efficient resource allocation and provisioning framework for cloud data centers,” IEEE Trans. Netw. Service Manage., vol. 12, no. 3, pp. 377–391, Sep. 2015.

[9] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali, “Cloud computing: Distributed Internet computing for IT and scientific research,” IEEE Internet Comput., vol. 13, no. 5, pp. 10–13, Sep./Oct. 2009.

[10] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, “The cost of a cloud: Research problems in data center networks,” ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73, Jan. 2009.

[11] A. Beloglazov, J. Abawajy, and R. Buyya, “Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing,” Future Gener. Comput. Syst., vol. 28, no. 5, pp. 755–768, May 2012.

[12] U.S. Energy Information Administration. Accessed: Jun. 2018. [Online]. Available: http://www.eia.gov/

[13] F. Farahnakian et al., “Using ant colony system to consolidate VMs for green cloud computing,” IEEE Trans. Services Comput., vol. 8, no. 2, pp. 187–198, Mar. 2015.

[14] Z. Á. Mann, “Allocation of virtual machines in cloud data centers—A survey of problem models and optimization algorithms,” ACM Comput. Surv., vol. 48, no. 1, pp. 11:1–11:34, Aug. 2015.

[15] D. Boru, D. Kliazovich, F. Granelli, P. Bouvry, and A. Y. Zomaya, “Energy-efficient data replication in cloud computing datacenters,” Cluster Comput., vol. 18, no. 1, pp. 385–402, Mar. 2015.

[16] C. Y. Hong et al., “Achieving high utilization with software-driven WAN,” ACM SIGCOMM Comput. Commun. Rev., vol. 43, no. 4, pp. 15–26, Oct. 2013.

[17] H. Yuan, J. Bi, W. Tan, and B. H. Li, “CAWSAC: Cost-aware workload scheduling and admission control for distributed cloud data centers,” IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, pp. 976–985, Apr. 2015.

[18] Z. Zhang, M. Zhang, A. Greenberg, Y. C. Hu, R. Mahajan, and B. Christian, “Optimizing cost and performance in online service provider networks,” in Proc. 7th USENIX Symp. Netw. Syst. Design Implement., 2010, pp. 33–48.

[19] U. Franke and M. Buschle, “Experimental evidence on decision-making in availability service level agreements,” IEEE Trans. Netw. Service Manage., vol. 13, no. 1, pp. 58–70, Mar. 2016.

[20] J. Luo, L. Rao, and X. Liu, “Temporal load balancing with service delay guarantees for data center energy cost optimization,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 3, pp. 775–784, Mar. 2014.

[21] A. Kiani and N. Ansari, “A fundamental tradeoff between total and brown power consumption in geographically dispersed data centers,” IEEE Commun. Lett., vol. 20, no. 10, pp. 1955–1958, Oct. 2016.

[22] H. Shah-Mansouri, V. W. Wong, and R. Schober, “Joint optimal pricing and task scheduling in mobile cloud computing systems,” IEEE Trans. Wireless Commun., vol. 16, no. 8, pp. 5218–5232, Aug. 2017.

[23] T. Maqsood, N. Tziritas, T. Loukopoulos, S. A. Madani, S. U. Khan, and C.-Z. Xu, “Leveraging on deep memory hierarchies to minimize energy consumption and data access latency on single-chip cloud computers,” IEEE Trans. Sustain. Comput., vol. 2, no. 2, pp. 154–166, Apr./Jun. 2017.

[24] M. Nir, A. Matrawy, and M. St-Hilaire, “Economic and energy considerations for resource augmentation in mobile cloud computing,” IEEE Trans. Cloud Comput., vol. 6, no. 1, pp. 99–113, Jan./Mar. 2018.

[25] S. Wang, Z. Qian, J. Yuan, and I. You, “A DVFS based energy-efficient tasks scheduling in a data center,” IEEE Access, vol. 5, pp. 13090–13102, Jul. 2017.

[26] P. Zhang and M. Zhou, “Dynamic cloud task scheduling based on a two-stage strategy,” IEEE Trans. Autom. Sci. Eng., vol. 15, no. 2, pp. 772–783, Apr. 2018.

[27] X. Deng, D. Wu, J. Shen, and J. He, “Eco-aware online power management and load scheduling for green cloud datacenters,” IEEE Syst. J., vol. 10, no. 1, pp. 78–87, Mar. 2016.

[28] K. K. Nguyen and M. Cheriet, “Environment-aware virtual slice provisioning in green cloud environment,” IEEE Trans. Services Comput., vol. 8, no. 3, pp. 507–519, May 2015.

[29] M. S. Hasan, Y. Kouki, T. Ledoux, and J.-L. Pazat, “Exploiting renewable sources: When green SLA becomes a possible reality in cloud computing,” IEEE Trans. Cloud Comput., vol. 5, no. 2, pp. 249–262, Apr./Jun. 2017.

[30] M. Qiu, Z. Ming, J. Li, K. Gai, and Z. Zong, “Phase-change memory optimization for green cloud with genetic algorithm,” IEEE Trans. Comput., vol. 64, no. 12, pp. 3528–3540, Dec. 2015.

[31] A. Kiani and N. Ansari, “Profit maximization for geographically dispersed green data centers,” IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 703–711, Mar. 2018.

[32] M. M. Hassan, M. A. Al-Wadud, and G. Fortino, “A socially optimal resource and revenue sharing mechanism in cloud federations,” in Proc. IEEE 19th Int. Conf. Comput. Supported Cooperat. Work Design, May 2015, pp. 620–625.

[33] Y.-W. Chen and J. M. Chang, “EMaaS: Cloud-based energy management service for distributed renewable energy integration,” IEEE Trans. Smart Grid, vol. 6, no. 6, pp. 2816–2824, Nov. 2015.

[34] C. Canali, L. Chiaraviglio, R. Lancellotti, and M. Shojafar, “Joint minimization of the energy costs from computing, data transmission, and migrations in cloud data centers,” IEEE Trans. Green Commun. Netw., vol. 2, no. 2, pp. 1–16, Jun. 2018.

[35] W. Chen, I. Paik, and Z. Li, “Cost-aware streaming workflow allocation on geo-distributed data centers,” IEEE Trans. Comput., vol. 66, no. 2, pp. 256–271, Feb. 2017.

[36] L. Shi, Y. Shi, X. Wei, X. Ding, and Z. Wei, “Cost minimization algorithms for data center management,” IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 1, pp. 60–71, Jan. 2017.

[37] S. Garg, J. Aryal, H. Wang, T. Shah, G. Kecskemeti, and R. Ranjan,“Cloud computing based bushfire prediction for cyber–physical emer-gency applications,” Future Gener. Comput. Syst., vol. 79, pp. 354–363,Feb. 2017.

[38] A. Forestiero, C. Mastroianni, M. Meo, G. Papuzzo, andM. Sheikhalishahi, “Hierarchical approach for efficient workloadmanagement in geo-distributed data centers,” IEEE Trans. GreenCommun. Netw., vol. 1, no. 1, pp. 97–111, Mar. 2017.

[39] H. Yuan, J. Bi, M. Zhou, and A. C. Ammari, “Time-aware multi-application task scheduling with guaranteed delay constraints ingreen data center,” IEEE Trans. Autom. Sci. Eng., vol. 15, no. 3,pp. 1138–1151, Sep. 2017.

[40] Z. Liu, M. Lin, A. Wierman, S. Low, and L. L. H. Andrew, “Greeninggeographical load balancing,” IEEE/ACM Trans. Netw., vol. 23, no. 2,pp. 657–671, Apr. 2015.

[41] J. Yao, H. Guan, J. Luo, L. Rao, and X. Liu, “Adaptive powermanagement through thermal aware workload balancing in Internetdata centers,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 9,pp. 2400–2409, Sep. 2015.

[42] H. Shao et al., “Optimal load balancing and energy cost manage-ment for Internet data centers in deregulated electricity markets,”IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 10, pp. 2659–2669,Oct. 2014.

[43] M.-F. Hsieh, F.-S. Hsu, and D. G. Dorrell, “Winding changeoverpermanent-magnet generators for renewable energy applications,” IEEETrans. Magn., vol. 48, no. 11, pp. 4168–4171, Nov. 2012.

[44] M. Khodayar, O. Kaynak, and M. E. Khodayar, “Rough deep neuralarchitecture for short-term wind speed forecasting,” IEEE Trans. Ind.Informat., vol. 13, no. 6, pp. 2770–2779, Dec. 2017.

[45] M. Shokrian and K. A. High, “Application of a multi objective multi-leader particle swarm optimization algorithm on NLP and MINLPproblems,” Comput. Chem. Eng., vol. 60, no. 1, pp. 57–75, Jan. 2014.

[46] C. B. Khadse, M. A. Chaudhari, and V. B. Borghate, “Electromagneticcompatibility estimator using scaled conjugate gradient backpropagationbased artificial neural network,” IEEE Trans. Ind. Informat., vol. 13,no. 3, pp. 1036–1045, Jun. 2017.

[47] W. Sheng, K.-Y. Liu, S. Cheng, X. Meng, and W. Dai, “A trustregion SQP method for coordinated voltage control in smart distri-bution grid,” IEEE Trans. Smart Grid, vol. 7, no. 1, pp. 381–391,Jan. 2016.

[48] W. Tian, H. Zhou, and W. Deng, “A class of second order differenceapproximations for solving space fractional diffusion equations,” Math.Comput., vol. 84, no. 294, pp. 1703–1727, Jan. 2015.

[49] A. H. Gandomi and X.-S. Yang, “Chaotic bat algorithm,” J. Comput. Sci., vol. 5, no. 2, pp. 224–232, Mar. 2014.

[50] V. F. Yu and S.-W. Lin, “Multi-start simulated annealing heuristic for the location routing problem with simultaneous pickup and delivery,” Appl. Soft Comput., vol. 24, no. 1, pp. 284–290, Nov. 2014.

[51] H. Hallawi, J. Mehnen, and H. He, “Multi-capacity combinatorial ordering GA in application to cloud resources allocation and efficient virtual machines consolidation,” Future Gener. Comput. Syst., vol. 69, pp. 1–10, Apr. 2017.


[52] D. Rodrigues et al., “A wrapper approach for feature selection based on bat algorithm and optimum-path forest,” Expert Syst. Appl., vol. 41, no. 5, pp. 2250–2258, Apr. 2014.

[53] M. Ghamkhari and H. Mohsenian-Rad, “Energy and performance management of green data centers: A profit maximization approach,” IEEE Trans. Smart Grid, vol. 4, no. 2, pp. 1017–1025, Jun. 2013.

[54] J. Bi, H. Yuan, W. Tan, and B. H. Li, “TRS: Temporal request scheduling with bounded delay assurance in a green cloud data center,” Inf. Sci., vol. 360, no. 1, pp. 57–72, Sep. 2016.

[55] Z. Huang, Z. Wang, and H. Zhang, “Multilevel feature moving average ratio method for fault diagnosis of the microgrid inverter switch,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 2, pp. 177–185, Apr. 2017.

[56] D. Ardagna, B. Panicucci, and M. Passacantando, “Generalized Nash equilibria for the service provisioning problem in cloud systems,” IEEE Trans. Services Comput., vol. 6, no. 4, pp. 429–442, Oct./Dec. 2013.

[57] W. Dong and M. C. Zhou, “A supervised learning and control method to improve particle swarm optimization algorithms,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 7, pp. 1149–1159, Jul. 2017.

[58] X. Guo, S. Liu, G. Tian, and M. C. Zhou, “Disassembly sequence optimization for large-scale products with multiresource constraints using scatter search and Petri nets,” IEEE Trans. Cybern., vol. 46, no. 11, pp. 2435–2446, Nov. 2016.

[59] Q. Kang, S. Feng, M. Zhou, A. C. Ammari, and K. Sedraoui, “Optimal load scheduling of plug-in hybrid electric vehicles via weight-aggregation multi-objective evolutionary algorithms,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 9, pp. 2557–2568, Sep. 2017.

[60] Y. Hou, N. Wu, M. Zhou, and Z. Li, “Pareto-optimization for scheduling of crude oil operations in refinery via genetic algorithm,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 3, pp. 517–530, Mar. 2017.

[61] G. Tian, Y. Ren, and M. Zhou, “Dual-objective scheduling of rescue vehicles to distinguish forest fires via differential evolution and particle swarm optimization combined algorithm,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 11, pp. 3009–3021, Nov. 2016.

Haitao Yuan (S’15–M’17) received the B.S. and M.S. degrees in software engineering from Northeastern University, Shenyang, China, in 2010 and 2012, respectively, and the Ph.D. degree in control science and engineering from Beihang University, Beijing, China, in 2016.

In 2015, he was a visiting doctoral student with the New Jersey Institute of Technology (NJIT), Newark, NJ, USA. He is currently an Assistant Professor with the School of Software Engineering, Beijing Jiaotong University, Beijing. He is currently also a visiting research scholar with the Department of Electrical and Computer Engineering, NJIT, under the financial support from China Scholarship Council. His research interests include cloud computing, data center, software-defined networking, optimization algorithms, and big data.

Dr. Yuan was the recipient of the 2011 Google Excellence Scholarship.

Jing Bi (M’13–SM’16) received the Ph.D. degree from Northeastern University, Shenyang, China, in 2011.

She was a Post-Doctoral Researcher with the Department of Automation, Tsinghua University, Beijing, China. She was a Research Assistant with IBM Research, Beijing, where she was involved in cloud computing. She is currently an Associate Professor with the School of Software Engineering, Faculty of Information Technology, Beijing University of Technology, Beijing. She is currently also a visiting research scholar with the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ, USA. She has authored or co-authored more than 60 refereed journal and conference papers. Her recent research interests include distributed computing, cloud computing, large-scale data analytics, machine learning, and performance optimization.

Dr. Bi was the recipient of the IBM Ph.D. Fellowship Award.

MengChu Zhou (S’88–M’90–SM’93–F’03) received the B.S. degree in control engineering from the Nanjing University of Science and Technology, Nanjing, China, in 1983, the M.S. degree in automatic control from the Beijing Institute of Technology, Beijing, China, in 1986, and the Ph.D. degree in computer and systems engineering from Rensselaer Polytechnic Institute, Troy, NY, USA, in 1990.

In 1990, he joined the New Jersey Institute of Technology, Newark, NJ, USA, where he is currently a Distinguished Professor of Electrical and Computer Engineering. He has authored or co-authored 700 publications including 12 books, more than 400 journal papers (over 300 in IEEE transactions), and 28 book chapters. He holds 11 patents. His research interests are in Petri nets, intelligent automation, Internet of Things, big data, web services, and intelligent transportation.

Dr. Zhou is a Life Member of the Chinese Association for Science and Technology, USA, and served as its President in 1999. He is a Fellow of the International Federation of Automatic Control, the American Association for the Advancement of Science, and the Chinese Association of Automation. He was a recipient of the Humboldt Research Award for U.S. Senior Scientists from Alexander von Humboldt Foundation, the Franklin V. Taylor Memorial Award, and the Norbert Wiener Award from the IEEE Systems, Man and Cybernetics Society. He is the Founding Editor of the IEEE PRESS BOOK SERIES ON SYSTEMS SCIENCE AND ENGINEERING and the Editor-in-Chief of the IEEE/CAA JOURNAL OF AUTOMATICA SINICA.