
J Grid Computing (2012) 10:447–473. DOI 10.1007/s10723-012-9219-2

Energy-Efficient Thermal-Aware Autonomic Management of Virtualized HPC Cloud Infrastructure

Ivan Rodero · Hariharasudhan Viswanathan · Eun Kyung Lee · Marc Gamell · Dario Pompili · Manish Parashar

Received: 1 October 2011 / Accepted: 3 July 2012 / Published online: 27 July 2012. © Springer Science+Business Media B.V. 2012

Abstract Virtualized datacenters and clouds are being increasingly considered for traditional High-Performance Computing (HPC) workloads that have typically targeted Grids and conventional HPC platforms. However, maximizing energy efficiency and utilization of datacenter resources, and minimizing undesired thermal behavior while ensuring application performance and other Quality of Service (QoS) guarantees for HPC applications requires careful consideration of important and extremely challenging tradeoffs. Virtual Machine (VM) migration is one of the

I. Rodero (B) · H. Viswanathan · E. K. Lee · M. Gamell · D. Pompili · M. Parashar
NSF Cloud and Autonomic Computing Center, Rutgers Discovery Informatics Institute, Rutgers University, 94 Brett Road, Piscataway, NJ 08854, USA
e-mail: [email protected]

H. Viswanathan, e-mail: [email protected]

E. K. Lee, e-mail: [email protected]

M. Gamell, e-mail: [email protected]

D. Pompili, e-mail: [email protected]

M. Parashar, e-mail: [email protected]

most common techniques used to alleviate thermal anomalies (i.e., hotspots) in cloud datacenter servers as it reduces load and, hence, the server utilization. In this article, the benefits of using other techniques such as voltage scaling and pinning (traditionally used for reducing energy consumption) for thermal management over VM migrations are studied in detail. As no single technique is the most efficient to meet temperature/performance optimization goals in all situations, an autonomic approach that performs energy-efficient thermal management while ensuring the QoS delivered to the users is proposed. To address the problem of VM allocation that arises during VM migrations, an innovative application-centric energy-aware strategy for VM allocation is proposed. The proposed strategy ensures high resource utilization and energy efficiency through VM consolidation while satisfying application QoS by exploiting knowledge obtained through application profiling along multiple dimensions (CPU, memory, and network bandwidth utilization). To support our arguments, we present the results obtained from an experimental evaluation on real hardware using HPC workloads under different scenarios.

Keywords Cloud infrastructure · Virtualization · Thermal management · Energy-efficiency


1 Introduction

1.1 Motivation and Background

Virtualized datacenters and clouds provide the abstraction of nearly-unlimited computing resources through the elastic use of consolidated resource pools. These platforms are being increasingly considered for traditional High-Performance Computing (HPC) workloads that have typically targeted Grids and conventional HPC platforms. The scale and overall complexity of modern datacenters are growing at an alarming rate (current datacenters contain tens to hundreds of thousands of computing and storage devices running complex applications) and, hence, energy consumption, heat generation, and cooling requirements have become critical concerns both in terms of the growing operating costs as well as their environmental and societal impacts [1]. Addressing these concerns while balancing multiple requirements, including performance, Quality of Service (QoS), and reliability, is thus an important task for service providers.

Thermal awareness, which is the knowledge of the temperature distribution inside a datacenter, is essential to maximize energy and cooling efficiency as well as to minimize server failure rates. Hence, in this article, we propose techniques to acquire and characterize the thermal behavior of a datacenter so as to bestow self-protection and self-healing capabilities on datacenter management systems. Thermal-aware datacenter management is aimed at minimizing both the impact on the environment and the Total Cost of Ownership (TCO) of datacenters without any increase in the number of Service Level Agreement (SLA) violations.

Prior work in the field of thermal management explores efficient methods for improving heat extraction through cooling system optimization, as in [20, 44–46], or methods for controlling heat generation [31, 48, 52] that focus on how to distribute and migrate workloads (if necessary) in such a way as to avoid undesired thermal behavior (e.g., overheating of computing equipment). However, maximizing energy efficiency and utilization of datacenter resources, minimizing undesired thermal behavior, and ensuring QoS guarantees for HPC applications are conflicting goals. Careful consideration of extremely challenging tradeoffs among these goals is essential for addressing them simultaneously.

1.2 Architecture and Goals

The long-term goal of our approach is to autonomically manage datacenters using the information from sensors and taking decisions at different levels (through controllers) based on the optimization goals (e.g., performance, energy efficiency, cost). To do this, we consider an architecture composed of layers belonging to different abstract components with different responsibilities but with the same common objectives. The architecture (see Fig. 1) is composed of four layers.

Fig. 1 Layered architecture model


These layers are the environment layer (which detects, localizes, characterizes, and tracks thermal hotspots using a hybrid sensing infrastructure composed of in-built as well as external scalar temperature and humidity sensors), the physical resource layer (which manages the hardware and software components of servers), the virtualization layer (which instantiates, configures, and manages VMs), and the application layer (which is aware of the workload's and applications' characteristics and behavior). For autonomic reactive thermal management we focus on the possible interactions among all four layers based on temperature, power consumption, and application QoS requirements. Note that the environment layer, comprising the hybrid sensing infrastructure, monitors both the micro (chip and server level) and the macro (rack and datacenter level) phenomena, i.e., temperature distribution, in the datacenter. The proposed approach represents a significant and transformative shift towards cross-layer autonomics for datacenter management problems, which have so far been considered mostly in terms of individual layers.

1.3 Approach

One component of our proposed solution continuously monitors temperature changes and handles thermal emergencies (using VM migrations, voltage and/or frequency scaling, and pinning) while ensuring that application performance is not degraded. Another component strives to achieve energy efficiency while adhering to SLAs by co-locating compatible workloads (or the VMs hosting the workloads) on physical machines when VM migration is chosen as the thermal-management strategy. Compatibility among workloads is determined through extensive HPC application profiling. The first component involves cross-layer interactions among the virtualization, physical, and environment layers, while the second component requires interactions between the application and virtualization layers. These two components together help us realize our envisioned cross-layer thermal-aware autonomic management of datacenters through careful consideration of trade-offs among the requirements from energy, application QoS, and operating environment perspectives. Our approach is twofold:

(1) Reactive thermal management: VM migration is one of the most common techniques used to alleviate thermal anomalies (i.e., hotspots) in cloud datacenter servers by reducing the load and, therefore, decreasing the server utilization. Switching off idle servers is also a typical policy, along with VM migration, for power management. In this article, we explore ways to take actions at the server side before performing costly migrations of VMs, including techniques that can be used as temporary actions that may facilitate migrations without incurring an additional penalty in terms of server thermal behavior. We aim at optimizing the energy efficiency of datacenters while ensuring the QoS delivered to the users. Specifically, we focus on exploiting VM Monitor (VMM) configurations (such as pinning techniques, i.e., CPU affinity) in Xen platforms as well as Dynamic Voltage and Frequency Scaling (DVFS) at the physical resource layer to alleviate undesired thermal behavior of the datacenter's hardware components. However, the same mechanism is not always the most efficient to meet the desired optimization goals.

(2) Application-centric VM allocation: To address the problem of VM allocation, which arises in two situations (the initial mapping of VMs to physical servers, and the migration of VMs from one physical server to another in reaction to undesired thermal behavior and/or SLA violations), we propose an application-centric energy-aware VM allocation model. Our model can help reduce the number of SLA violations, avoid undesired thermal behavior, and minimize the energy costs by improving resource utilization.

In contrast to existing VM allocation solutions that are aimed at satisfying only the resource utilization requirement of an application along only one dimension (CPU utilization), our approach to VM consolidation considers the application's resource utilization requirements along multiple dimensions, i.e., CPU, memory, disk I/O, and


network subsystems. In addition, our approach is application centric, as in [15, 17, 56], and, hence, enables co-location of compatible VMs on servers during consolidation. This co-location minimizes the usual adverse effects of resource contention and virtualization overhead on application performance while retaining the benefits of server consolidation. The aforementioned application awareness is acquired through HPC benchmark profiling.

1.4 Contributions

The main contributions of this work are: (1) a study of different reactive thermal management techniques for virtualized and instrumented datacenters from the energy perspective towards the design of an autonomic approach, (2) a study of the tradeoffs between performance, energy efficiency, and thermal efficiency of these techniques for HPC workloads, (3) the development of an autonomic reactive thermal-management solution that leverages the knowledge gained in the aforementioned studies to choose the most appropriate techniques for alleviating thermal anomalies, (4) the creation of an empirical VM allocation model from data obtained by running HPC workloads extensively on a system with a general-purpose rack server configuration, and (5) the development of a proactive application-centric VM allocation algorithm that uses this empirical model. To support the arguments of our approach, we present the results obtained from simulation and experimental evaluation on real hardware using HPC workloads under different scenarios.

To evaluate our proposed techniques for thermal management of HPC cloud infrastructure, we conducted simulations and experiments on real hardware using parallel workload traces from real-world HPC production systems. The results obtained from simulations using real-world production HPC workload traces show that our proactive VM allocation solution significantly contributes to energy efficiency (a 12 % reduction in energy consumption) and/or optimization of application performance (an 18 % reduction in execution time), depending on the optimization goals. Our reactive thermal management solution in conjunction with the proactive consolidation approach outperforms other traditional thermal management approaches, such as load redistribution (VM migrations) and Temperature-Aware (TA) VM placement, in terms of energy consumption (up to 9 % less than TA's), makespan (up to 12 % less than TA's), maximum server operating temperatures (up to 10 % less than that achieved by the load redistribution technique), and percentage of time for which servers operate at unsafe temperatures (up to 25 % less than that achieved by the load redistribution technique).

The rest of the article is organized as follows. In Section 2, we review related work. In Section 3, we discuss the temporal-spatial characteristics of hotspots and our empirical-model-based approach to reactive thermal management. In Section 4, we describe our empirical VM allocation model and present our VM allocation algorithm, which uses this model. In Section 5, we discuss our evaluation methodology as well as the results we obtained from simulations and empirical experiments on real hardware. Finally, in Section 6, we conclude the article and outline directions for future work.

2 Related Work

In this section, we first review the state of the art in thermal management [38, 45] at the environment as well as the application layers. Second, we review power management techniques (such as DVFS and pinning at the physical resource layer) that we propose to exploit for energy-efficient thermal management in place of costly VM migrations. Third, we discuss prior work in the field of energy-efficient VM allocation, which is key for VM migration decisions in reaction to thermal anomalies. Finally, we review existing cross-layer solutions, which are characterized by pair-wise interactions between layers.

2.1 Thermal Management at the Environment Layer

Prior research efforts in this area focus on the management of heat extraction in a datacenter [20, 44–46]. In [16], Greenberg et al. profiled and benchmarked the energy usage of 22 datacenters


and concluded that the key to energy efficiency is air circulation management (for effective and efficient cooling). Lee et al. [26] propose a proactive control approach that jointly optimizes the air conditioner compressor duty cycle and fan speed to prevent heat imbalance and to minimize the cost of cooling in datacenters. As many datacenters employ raised floors with perforated tiles to distribute the chilled air to racks, researchers have tried to gain valuable insights into efficient airflow distribution strategies in such datacenter layouts [20, 44–46]. Other research efforts were aimed at improving the efficiency of cooling systems through thermal profiling [27, 28, 47], which is the extraction of knowledge about air and heat circulation using measurements from scalar sensors and mathematical models. However, capturing these complex thermodynamic phenomena using compute-intensive models [5, 37] (e.g., computational fluid dynamics) is prohibitive in terms of computational overhead.

2.2 Thermal Management at the Application Layer

Another popular approach to thermal management has been controlling the heat generation inside a datacenter [4, 18, 31, 32, 52] through thermal-aware workload placement. Moore et al. [31] propose the use of "offline experiments" to characterize the thermodynamic phenomena (heat recirculation) inside the datacenter and schedule workloads by taking the temperature distribution into account. In [32], Moore et al. eliminate the need for the aforementioned offline experiments and propose an online machine-learning-based method to model the thermal behavior of the datacenter. Bash et al. [4] propose a policy to place the workload in areas of a datacenter that are easier to cool, which results in cooling power savings. They use scalar temperature sensor measurements alone to derive two metrics that help decide whether or not to place workload on a server. Tang et al. [52] develop a linear, low-complexity process model to predict the equipment inlet temperatures in a datacenter given a server utilization vector and formalize (mathematically) the problem of minimizing the datacenter cooling cost as the problem of

minimizing the maximal (peak) inlet temperature through task assignment. In [33], Mukherjee et al. explore spatio-temporal thermal-aware job scheduling as an extension to spatial thermal-aware solutions such as [18, 31, 32, 52].

2.3 Power Management at the Physical Resource Layer

Several research efforts propose methods to jointly manage power and performance at the physical resource layer. One of the most widely used techniques for reducing power consumption over the last decades is DVFS. Researchers have developed different DVFS scheduling algorithms and mechanisms to save energy while provisioning resources under deadline restrictions. Chen et al. [7] address resource provisioning and propose power management strategies with SLA constraints based on steady-state queuing analysis and feedback control theory. They use server turn on/off and DVFS for enhancing power savings. Ranganathan et al. [40] highlight the current issue of under-utilization and over-provisioning of servers and present a solution for peak power budget management across a server ensemble to avoid excessive over-provisioning, considering DVFS and memory/disk scaling. Rusu et al. [43] propose a cluster-wide on/off policy based on dynamic reconfiguration and DVFS. They focus on power, execution time, and server capacity characterization to provide energy management. Rodero et al. [41] studied application-centric aggressive power management of datacenter resources for HPC workloads considering power management mechanisms and controls available at different levels and for different subsystems (i.e., CPU, memory, disk, network). Kephart et al. [22] and Das et al. [9] address the coordination of multiple autonomic managers for power/performance tradeoffs by using a utility function approach in a non-virtualized environment.

Virtual machine monitor configurations (e.g., within the Xen hypervisor) such as pinning (i.e., CPU affinity) have been proposed earlier (as in [50]) to optimize VM performance. In this article, we investigate the use of power management and performance optimization techniques (DVFS and pinning) for mitigating thermal anomalies before


considering costly VM migrations. To the best of our knowledge, none of the existing approaches have investigated and exploited pinning to mitigate the effects of thermal anomalies with energy efficiency and performance as optimization goals.

2.4 Energy Management at the Virtualization Layer

A large body of work in datacenter energy management addresses the problem of request distribution at the VM management level in such a way that the performance goals are met and the energy consumption is minimized. VM consolidation techniques involve filling up physical servers with VMs (using heuristics like first fit, best fit, etc.) until high server subsystem (CPU, memory, disk storage, network interface) utilization is achieved while still ensuring that the individual VMs' subsystem utilization requirements are met. Apparao et al. [3] present a study on the impact of consolidating several applications on a single server running Xen.

Server resource provisioning or VM allocation can be static or dynamic. It is static when a VM is being allocated physical resources for the first time; in this case, the problem of VM allocation reduces to a deterministic or statistical bin/vector-packing problem [2], depending on how the VM utilization requirement is characterized. In the dynamic case, VMs are first consolidated using any simple bin-packing heuristic and the variations in the VMs' utilization requirements are handled through live VM migrations [6, 19, 51, 53, 54] or through dynamic server resource provisioning [29, 34] whenever necessary.

2.5 SLA-Management at the Virtualization Layer

VM migrations are performed either reactively [42] or proactively [6] in such a way as to avoid equipment overheating and/or SLA violations. Kochut et al. [23] provide an estimate of the expected improvement in response time due to a migration decision and determine which VMs are the best candidates to be placed together. In [19], Hermenier et al. determine the order in which the VM migrations should occur in addition to

deciding which VMs to migrate so as to minimize the impact on application performance in terms of execution time. Stoess et al. [51] developed a multi-tiered infrastructure that enables intra-node virtual CPU (vCPU) migration and inter-node live VM migration for workload consolidation and thermal balancing. Voorsluys et al. [55] present an evaluation of the effects of live migration of virtual machines on the performance of applications running inside Xen VMs.

On-demand server resource provisioning techniques monitor the workloads on a set of VMs and adjust the instantaneous resources availed by the VMs. Song et al. [49] propose an adaptive and dynamic scheme for adjusting resources (specifically, CPU and memory) among virtual machines on a single server to share the physical resources efficiently. Menasce et al. [29] proposed an autonomic controller and showed how it can be used to dynamically allocate CPUs in virtualized environments with varying workload levels by optimizing a global utility function. Meng et al. [30] exploited statistical multiplexing of VMs to enable joint VM provisioning and consolidation based on aggregated capacity needs. However, all the aforementioned techniques are aimed at satisfying resource utilization level guarantees and do not consider application-level performance (execution time).

2.6 Cross-Layer Solutions for Power-, Thermal-, and QoS-Management

Nathuji et al. [35] investigate the integration of power management at the physical layer and virtualization technologies. In particular, they propose VirtualPower to support the isolated and independent operation of virtual machines and to control the coordination among virtual machines to reduce power consumption. In [34], Nathuji et al. consider the heterogeneity of the underlying platforms (in terms of processor and memory subsystem architecture) to efficiently map the workloads to the best-fitting platforms. Laszewski et al. [25] present a scheduling algorithm for VMs in a cluster to reduce power consumption using DVFS. Kumar et al. [24] present vManage, a practical coordination approach that loosely couples platform


and virtualization management, aimed at improving energy savings and QoS and at reducing VM migrations.

Heath et al. [18] propose emulation tools (Mercury and Freon) for investigating the thermal implications of power management. In [39], Ramos et al. present C-Oracle, a software infrastructure that dynamically predicts the temperature and performance impact of different thermal management reactions (such as load redistribution and dynamic voltage and frequency scaling) into the future, allowing the thermal management policy to select the best reaction.

Recently, researchers have started to focus on application- or workload-aware VM consolidation that not only achieves all the objectives of its traditional resource-utilization- and energy-aware counterparts but also ensures minimum degradation of application performance due to resource multiplexing and virtualization overhead [15, 17, 56]. Application-aware VM allocation is not only aimed at energy-efficient VM consolidation but also at co-locating VMs that are "compatible" so that further gains can be achieved in terms of energy savings and virtualization overhead. In [17, 56], the authors propose to consolidate VMs with similar memory content on the same hosts for higher memory sharing, and Govindan et al. [15] propose to consolidate based on inter-process communication patterns. Zhu et al. [57] propose pSciMapper, a power-aware method for consolidating virtual machines that host scientific workflow tasks. They use a dimensionality reduction technique, Kernel Canonical Correlation Analysis (KCCA), to associate temporal features extracted from time series data about the resource requirements of a workflow task with its power consumption and execution time (performance). Information about the tasks' power consumption and performance is exploited in an online consolidation algorithm.

All the aforementioned cross-layer solutions are characterized only by pair-wise interaction between two layers: physical and virtualization layers, physical and environment layers, or application and virtualization layers. In our proposed thermal-aware management solution, interactions between the application and the virtualization layers are leveraged for proactive VM allocation, while interactions among all four layers are exploited for reactive thermal management. In other words, in our thermal management solution, the environment layer alerts the virtualization layer and the physical layer about undesired thermal behavior. These two layers then jointly decide whether to use power management techniques at the physical layer or VM migration, with help from the application layer, to alleviate the thermal anomalies.

3 Reactive Thermal Management in Datacenters

In this section, we briefly discuss the characteristics of the thermal hotspots that are a major cause of thermal inefficiency in datacenters. We then introduce three different techniques to mitigate the effects of hotspots and a characterization of these techniques based on empirical experimentation.

3.1 Characteristics of Thermal Hotspots

Hotspots can be detected using internal and external temperature sensors when the measurements cross a specific temperature value defined as the "hotspot threshold". Hotspots are difficult to localize accurately in space and are hard to predict in time: this is because heat transfer via air circulation (convection) and through the server blades and racks (conduction) are phenomena

Fig. 2 26 TelosB motes deployed on a rack to observe the characteristics of hotspots


Fig. 3 Temporal and spatial measurement data (temperature) among 13 nodes: (a) seven servers running (nodes 7–13); (b) four servers running (nodes 7–10). Each panel plots temperature [°C] against time [min] and node number.

difficult to model. In addition, hotspots change their positions in time and space depending on several factors, such as the distribution and intensity of the running workloads, server heat dissipation characteristics, airflow circulation in the datacenter, etc. Thus, it is crucial to understand the characteristics of hotspots for energy-efficient thermal management.

The temporal correlation of the measured temperature data collected from 13 TelosB sensor nodes (deployed in front of the outlet fans of each server in a dual-rack system), each placed on one of 13 vertically arranged servers (as shown in Fig. 2), is depicted in Fig. 3. These TelosB sensor nodes are wirelessly connected and built with an IEEE 802.15.4-compliant CC2420 radio (2.4 GHz) as well as with a few sensors (temperature, humidity, and light). Figure 3a and b correspond to the sets of experiments using TelosB nodes in which only servers 7–13 and 7–10 are running, respectively. These results show that some idle servers that are in the proximity of operational servers experience an increase in temperature much higher than that of other idle servers, which indicates that a hotspot affecting a server may in fact be caused by another server running on the same rack or on a nearby one. The increase in the temperature near the outlet fans of idle servers (due to conduction and convection) decreases the ability of the air around idle CPUs to extract heat efficiently when the idle servers become busy. This results in higher server operating temperatures, which in turn increases the risk of failures.

The only solution to counter this effect is decreasing the CRAC outlet air temperature, which is not energy efficient.

Figure 4 shows the thermal behavior of a single server in the presence of a hotspot and the corresponding power consumption. It also shows the impact of VM migration on the server's temperature and power consumption. The temperature sensors on the TelosB nodes placed near the server outlet fan (external) and near the CPU (internal) are used to observe heat propagation in space and time. The increase of the external temperature, which is controlled with a heat source (from second 600 to 1,400), results in an increase of the server's internal temperature. The figures illustrate the impact of migrating one VM on the server's temperature and power consumption (at second 900, reacting to the hotspot). Comparing the internal server temperature (Fig. 4a) and power consumption (Fig. 4b), we observe that there is a correlation between the two metrics.

3.2 Reactive Thermal Management Approaches

A typical mechanism used to reduce a server's temperature (e.g., to react to a hotspot) consists of decreasing the heat generated by the server, which, based on our measurements, is highly correlated with the server's power consumption. As the CPU¹ is the most power-consuming

¹ We use the term CPU to refer to each core of a multi-core processor.


Fig. 4 Server's thermal and power behavior in the presence of a hotspot: (a) temperature (internal server temperature and environment/hotspot, in °C); (b) power consumption (in W); both over time (s).

component of a server, we consider CPU power (as a simplification) to describe the different techniques analyzed in this article. Equation 1 shows a simplified dynamic power dissipation model for a CPU, where C is the capacitance of the processor (which we consider fixed), α is an activity factor (also known as switching activity), and V and f are the operational voltage and frequency, respectively [21].

P_cpu ∼ C × α × V² × f.    (1)

Therefore, the power consumption of a server can be decreased either by reducing the activity of the CPUs or by reducing the frequency/voltage of the CPUs (via DVFS). In the following sections, we discuss different techniques that use this model to reduce server power consumption (and thus the heat generated). We do not take into account cooling or placement issues; rather, we focus on the energy efficiency of thermal management approaches on the servers considering HPC workloads and virtualized environments, under the assumption that cooling and placement do not change.
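To make the role of these two knobs concrete, the following minimal Python sketch evaluates the simplified model in (1) for a hypothetical quad-core server; the capacitance constant, activity factors, and voltage/frequency pairs are illustrative assumptions, not values measured on our testbed.

```python
# Minimal sketch of the simplified dynamic power model in (1):
# P_cpu ~ C * alpha * V^2 * f. All constants below are illustrative
# assumptions (not measured values from the testbed).

C = 1.0e-9  # effective switched capacitance (arbitrary units, assumed fixed)

def cpu_power(alpha, volts, freq_hz):
    """Dynamic power of one CPU (core) under the simplified model."""
    return C * alpha * volts**2 * freq_hz

# Hypothetical voltage/frequency operating points (V, Hz)
P_HIGH = (1.25, 2.40e9)   # nominal
P_LOW  = (1.05, 1.60e9)   # DVFS-reduced

# Four cores fully active at the nominal operating point (reference)
ref = 4 * cpu_power(alpha=1.0, volts=P_HIGH[0], freq_hz=P_HIGH[1])

# DVFS: all four cores stay active but run at the lower V/f pair
dvfs = 4 * cpu_power(alpha=1.0, volts=P_LOW[0], freq_hz=P_LOW[1])

# Pinning: VMs pinned to 3 cores, while the freed core idles and is
# scaled down by the OS power management (low residual activity)
pinning = 3 * cpu_power(1.0, *P_HIGH) + 1 * cpu_power(0.05, *P_LOW)

for name, p in [("reference", ref), ("DVFS 1.60 GHz", dvfs), ("pin to 3 CPUs", pinning)]:
    print(f"{name:15s} relative CPU power = {p / ref:.2f}")
```

Under these assumed numbers, DVFS cuts the V²·f term on every core, whereas pinning trades one fully idle, scaled-down core for higher contention on the remaining ones, mirroring the qualitative trends reported in Section 3.3.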

3.2.1 VM Migration

This technique consists of moving a running VM, its guest OS, and all of its applications to a different server. Migrating VMs reduces the CPU activity α in (1) and, if a CPU is freed and the OS implements dynamic CPU power management, it can also reduce the frequency/voltage. We assume that OS power management is enabled by default. When we need to migrate a VM or multiple VMs to react to a hotspot, one of the following four scenarios is possible:

– Another server is available to host the VM(s) that are being migrated: the migration can be performed at the penalty of some overhead (energy, latency, bandwidth).

– A server has been powered down: we can perform the migration after powering the server up, at the additional penalty of booting.

– All servers already have some load but some of them have higher thermal efficiency: we can perform the migration, but with a possible penalty due to resource sharing between the migrated workload and the existing workload on the destination server.

– No other server is available to host the new VM(s): in this case, we can only suspend the VM(s) until a server becomes available.

3.2.2 Processor DVFS

This is an effective technique to reduce processor power dissipation, supported by most current processors. DVFS reduces the processor frequency/voltage but not the activity factor. However, CPU-intensive workloads running at low frequencies may experience a significant penalty on their execution time. Within a single server we can use DVFS in two ways:

– By reducing the frequency/voltage of all CPUs simultaneously: the workload execution time may increase depending on the frequency/voltage reduction and the workload characteristics (e.g.,


I/O-intensive workloads may not be significantly penalized).

– By reducing the frequency/voltage of a subset of CPUs: if the different VMs are independent, they may complete their workload at different times. If the VMs are coupled (e.g., MPI parallel applications), the workload running on slower CPUs may penalize the workload running on faster CPUs. However, some architectures have restrictions (e.g., frequency changes may apply only to paired CPUs).

3.2.3 VMM Configuration (Pinning)

We propose to use this technique to react to hotspots as an alternative to VM migration and DVFS. VMMs may allow virtual CPUs (vCPUs) of VMs to be assigned to physical CPUs (pCPUs) using two different approaches: without and with affinity (pinning). In the former, the VMM determines how vCPUs are assigned to pCPUs; in the latter, the VMM allows hard assignments of the vCPUs to one or more pCPUs. Pinning techniques are typically used where the characteristics of the workload would benefit from execution on specific CPUs (e.g., for cache locality).

We propose to reduce the activity of one or more CPUs by pinning the VMs to the other CPUs. As we assume that the OS performs dynamic CPU power management by default, when a CPU is freed from VM activity, its frequency or voltage can also be reduced. However, the activity of the running CPUs may be increased, resulting in a penalty on the workload's execution performance caused by higher resource sharing.
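As an illustration of this reaction, the sketch below computes a pinning plan that frees a chosen number of pCPUs by packing all vCPUs onto the remaining ones; the VM layout and the way the plan would be applied (e.g., through the hypervisor's CPU-affinity facility) are assumptions for illustration, not the exact mechanism used in our testbed.

```python
# Sketch: build a pinning plan that frees `n_free` physical CPUs by
# packing every vCPU onto the remaining pCPUs (round-robin). The VM/vCPU
# layout below is hypothetical; applying the plan would go through the
# hypervisor's CPU-affinity interface.

from itertools import cycle

def pinning_plan(vms, total_pcpus, n_free):
    """Return {(vm, vcpu): pcpu} leaving the last n_free pCPUs idle."""
    if n_free >= total_pcpus:
        raise ValueError("at least one pCPU must remain available")
    usable = list(range(total_pcpus - n_free))  # pCPUs that keep running VMs
    targets = cycle(usable)
    plan = {}
    for vm, n_vcpus in vms.items():
        for vcpu in range(n_vcpus):
            plan[(vm, vcpu)] = next(targets)
    return plan

# Example: four single-vCPU VMs on a quad-core server, freeing one core
# (the "pinning VMs to 3 CPUs" configuration of Section 3.3).
vms = {"vm1": 1, "vm2": 1, "vm3": 1, "vm4": 1}
for (vm, vcpu), pcpu in pinning_plan(vms, total_pcpus=4, n_free=1).items():
    print(f"{vm} vCPU{vcpu} -> pCPU{pcpu}")
```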

3.3 Characterizing Thermal Management Techniques

We evaluated the three VM management techniques discussed previously using two Dell servers, each with an Intel quad-core Xeon X3220 processor (which operates at four frequencies ranging from 1.6 GHz to 2.4 GHz), 4 GB of memory, two hard disks, and two 1 Gb Ethernet interfaces. This is intended to represent a general-purpose rack server configuration, widely used in virtualized datacenters. The servers were racked one on top of the other and ran the CentOS Linux operating system with a patched 2.6.18 kernel with

Xen version 3.1. To empirically measure the "instantaneous" power consumption of the servers we used a "Watts Up? .NET" power meter. This power meter has an accuracy of ±1.5 % of the measured power with a sampling rate of 1 Hz. The meter was mounted between the wall power and the server. We estimate the consumed energy by integrating the actual power measurements over time. We used built-in temperature sensors and the Linux-based hardware monitoring tool 'lm-sensors' to measure CPU temperatures, and TelosB motes to measure both the internal (sensors placed inside the chassis, near the CPU) and external (sensors placed on the back of the servers) temperatures of the server, as described in Section 3.1. We used the CPUfreq kernel module and its user-space tools to dynamically adjust the voltage and frequency pairs. We used a Sunbeam SFH111 heater (directed at the servers) in order to emulate a controlled thermal hotspot (e.g., potentially from other servers), which facilitated the experiments since the nodes were isolated with disproportionate cooling.
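Since the meter reports power samples at 1 Hz, the energy figures can be obtained with a simple numerical integration; the sketch below applies the trapezoidal rule to a hypothetical list of (timestamp, watts) samples.

```python
# Sketch: estimate energy (J) from 1 Hz power samples by trapezoidal
# integration. The sample list is hypothetical, not measured data.

def energy_joules(samples):
    """samples: list of (time_s, power_w) tuples sorted by time."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)  # trapezoid area = avg power * dt
    return total

samples = [(0, 150.0), (1, 152.5), (2, 151.0), (3, 149.5)]
print(f"energy over {samples[-1][0]} s: {energy_joules(samples):.1f} J")
```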

Figures 5, 6, and 7 show the thermal behavior and power consumption of a server when running an HPC workload with the different VM management techniques considered in this article. Specifically, we used the HPL Linpack benchmark, which is one of the most representative HPC benchmarks (it intensively uses different resources such as CPU and memory), since the goal of this experiment was to gain insights regarding typical behaviors and trends. All the techniques and configurations have been evaluated under the same conditions. The experiment consists of running HPL in 4 VM instances with the same configuration and applying a given technique 800 s after starting the experiment (which is long enough to reach the steady state). The figures focus on the initial part of the experiment to better show the trends and only plot the Bezier curves for readability. We obtained the external temperature of the server with a sensor placed on the back side of the server, and the internal temperature (in °C) of the server with a sensor placed inside the server. The internal sensor provides the average temperature of the server's components (not only the CPU temperature, which can be obtained from its internal sensors).


Fig. 5 Thermal behavior and power consumption using VM migration: (a) internal temperature, (b) external temperature, (c) power consumption, over time (s), for the reference run and for migrating 1, 2, and 3 VMs.

Fig. 6 Thermal behavior and power consumption using processor DVFS: (a) internal temperature, (b) external temperature, (c) power consumption, over time (s), for 4 CPUs at 2.40 GHz (reference), 2 CPUs at 1.60 GHz, 4 CPUs at 2.13 GHz, and 4 CPUs at 1.60 GHz.

Fig. 7 Thermal behavior and power consumption using pinning techniques (pinning 4 VMs to different numbers of CPUs, from 3 to 1): (a) internal temperature, (b) external temperature, (c) power consumption, over time (s).


We can appreciate in the figures that the internal and external temperatures are strongly correlated and follow similar trends. However, the internal temperature is almost 8 °C higher than the external temperature. Temperature and power are also correlated, but variations in temperature are slower than in power. Overall, the largest reductions in temperature and power are obtained using VM migration. The higher the number of VMs migrated, the more significant the decrease in temperature and power. The reduction of temperature and power is more moderate using DVFS than using VM migration. The highest reduction in temperature and power is obtained operating all 4 CPUs at 1.60 GHz. This is consistent with (1), where the higher the operational frequency, the higher the temperature and power consumption. However, running 2 CPUs at 1.60 GHz and 2 CPUs at 2.40 GHz obtains slightly worse results than running all CPUs at 2.13 GHz.

The reduction in temperature and power using pinning is lower than that obtained using VM migration but higher than that obtained using DVFS. The reduction in temperature and power is higher when the VMs are pinned to fewer CPUs. Although the different techniques yield different reductions in temperature and power, the plots obtained with the three mechanisms follow similar patterns. Hence, we can conclude that all of them can reduce temperature and power consumption effectively; however, we focus on the tradeoffs between performance, energy efficiency, and thermal efficiency (i.e., temperature reduction).

In order to measure the energy consumed by the servers in the experiments that perform VM migrations, we have considered the energy consumed by the original server during the whole execution of the experiment plus the energy consumed by the destination machine from when the VMs start migrating to the destination machine (i.e., the shadowed area in Fig. 8a and b).

In the case that the destination machine already hosts other VMs, we take into account the energy consumed due to the increase of power with respect to the power consumed by the server before the migration. Although other intermediate scenarios exist, in our experiments we only consider these two scenarios as they provide simplified but meaningful performance bounds. Although VM migration seems better in terms of thermal efficiency, we look at the tradeoffs between thermal efficiency and other dimensions such as performance and energy efficiency.

Table 1 summarizes the experimental results. It shows (i) the obtained makespan (the time needed to complete the workload, which may include the migration overhead), (ii) the energy consumed, (iii) the Energy Delay Product (EDP), which is a good metric for energy efficiency because it captures the effect of energy management on performance [14], and (iv) the reduction of temperature ("Temp. ↓" in Table 1). For the reference execution (regular execution without any specific technique) we present absolute values, and for the different techniques we present the results relative to the reference execution (in the form of % increase), except for the temperature reduction. The best results of each technique are shown in bold.
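For concreteness, the snippet below shows how the relative metrics in Table 1 can be derived from a measurement, with EDP computed as energy multiplied by makespan (the reference values are taken from Table 1; the sample measurement is illustrative, not a value from the table).

```python
# Sketch: compute EDP (J*s) and the percentage increases relative to the
# reference execution, as reported in Table 1. The sample measurement is
# illustrative.

def edp(energy_j, makespan_s):
    """Energy Delay Product: energy consumed times time to completion."""
    return energy_j * makespan_s

ref_makespan, ref_energy = 2239.0, 501023.0      # reference run from Table 1
ref_edp = edp(ref_energy, ref_makespan)          # 1,121,790,497 J*s

# Hypothetical measurement for some technique/configuration
makespan, energy = 2655.0, 582300.0

pct = lambda new, old: 100.0 * (new - old) / old
print(f"makespan +{pct(makespan, ref_makespan):.2f} %")
print(f"energy   +{pct(energy, ref_energy):.2f} %")
print(f"EDP      +{pct(edp(energy, makespan), ref_edp):.2f} %")
```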

As we commented previously, VM migration is the technique that achieves the highest reduction in temperature. At the price of some migration overhead, we also achieve a shorter makespan by migrating two VMs. However, it does not achieve the highest energy efficiency. The configuration where the destination server is empty is better than the one where the destination server is full in terms of makespan, but it is worse in terms of energy efficiency. The configuration where the destination is full is the best case in terms of energy efficiency (except for the migration of 2 VMs) but the worst in terms of makespan. However, when the destination server is switched off, the penalty of starting up the server on both makespan and energy efficiency is significant. When the destination server is full, the number of migrated VMs does not significantly influence either the makespan or the energy efficiency. However, the higher the number of migrated VMs, the higher the reduction of temperature. Hence, the potential increase of temperature and the penalty on the existing VMs on the destination server should be analyzed in order to identify the tradeoffs between increasing the number of migrated VMs and the negative effects on the destination machine.

In most cases, DVFS obtains worse results than the other techniques, in contrast to


Fig. 8 Power consumption of the source server (running 4 VMs) and the destination server (initially idle) when migrating one VM. The migration starting time is depicted with the dashed vertical line.

other approaches that focus only on DVFS, mainly for non-virtualized environments. To obtain a similar reduction of temperature with DVFS, the penalty on both makespan and energy efficiency is higher. As we commented previously, running all CPUs at 2.13 GHz works better than running 2 CPUs at 1.60 GHz. In fact, if the workloads of the different VMs were dependent, the execution time of the workload when using 2 CPUs at 1.60 GHz and when using 4 CPUs at 1.60 GHz would be similar. Pinning the VMs to 3 CPUs penalizes the makespan by only 18.57 % (which is similar to the penalty when performing VM migration) but gives the highest energy efficiency. However, the reduction in temperature is lower than the one achieved with VM migration. A higher reduction in temperature is achieved when pinning the VMs to 2 CPUs; however, in such a scenario the makespan increases significantly and becomes comparable to running all CPUs at 2.13 GHz (which is the best case for DVFS). Pinning the VMs to only one CPU achieves a higher reduction in temperature, but the penalty on both makespan and energy efficiency (due to resource sharing and the associated problems such as context switches) is not acceptable.

Table 1 Experimental results using different VM management techniques

Technique - configuration           Makespan (s)   Energy (J)   EDP             Temp. ↓
Reference (regular execution)       2,239 s        501,023 J    1,121,790,497   –
Migrate 1 VM – destination empty    +19.60 %       +52.82 %     +82.79 %        4 °C
Migrate 1 VM – destination full     +37.51 %       +31.92 %     +81.41 %        4 °C
Migrate 1 VM – destination off      +26.52 %       +57.40 %     +83.10 %        4 °C
Migrate 2 VMs – destination empty   +14.15 %       +51.86 %     +73.36 %        9 °C
Migrate 2 VMs – destination full    +37.51 %       +30.92 %     +80.04 %        9 °C
Migrate 2 VMs – destination off     +21.08 %       +56.44 %     +73.68 %        9 °C
Migrate 3 VMs – destination empty   +16.43 %       +51.71 %     +76.65 %        15 °C
Migrate 3 VMs – destination full    +37.60 %       +25.89 %     +73.12 %        15 °C
Migrate 3 VMs – destination off     +23.35 %       +56.29 %     +76.96 %        15 °C
DVFS 4 CPUs @1.60 GHz               +78.38 %       +46.18 %     +160.76 %       8 °C
DVFS 4 CPUs @2.13 GHz               +53.46 %       +36.28 %     +109.15 %       4 °C
DVFS 2 CPUs @1.60 GHz               +60.02 %       +51.14 %     +141.87 %       3 °C
Pinning VMs to 3 CPUs               +18.57 %       +16.23 %     +37.83 %        2.5 °C
Pinning VMs to 2 CPUs               +55.69 %       +37.03 %     +113.35 %       6 °C
Pinning VMs to 1 CPU                +165.74 %      +108.89 %    +455.11 %       10 °C

"Destination empty" means that the destination server is running but idle, "destination full" means that the destination server has 4 VMs running on the 4 CPUs, and "destination off" means that the destination server is switched off


Thus, the threshold for applying the pinning technique is 2 CPUs for the HPL workload, which results in a temperature decrease of 6 °C. This temperature decrease may be sufficient to react to many moderate hotspots.

Overall, the obtained results show that, depending on the temperature reduction required to mitigate the effects of a hotspot and on the optimization goals (i.e., performance or energy efficiency), VM migration and pinning are the most effective techniques. The results also show that when servers are available to which VMs can be migrated and the main objective is optimizing performance (i.e., minimizing the makespan), it may be better to migrate VMs rather than use other techniques. However, when the focus is energy efficiency, pinning may be preferable to VM migration. Furthermore, when VM migration is not feasible, pinning is the most effective mechanism to reduce the server's temperature while balancing performance and energy efficiency.
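The decision logic just described can be summarized in a small rule-based sketch; the thresholds and the set of available actions are assumptions distilled from the measurements in Table 1, not part of the published controller.

```python
# Sketch of the reactive technique-selection logic described above.
# Thresholds and return labels are illustrative assumptions.

def choose_reaction(required_temp_drop_c, goal, migration_feasible):
    """Pick a thermal-management action for a hotspot.

    required_temp_drop_c: temperature reduction needed (degrees C)
    goal: "performance" (minimize makespan) or "energy" (minimize energy/EDP)
    migration_feasible: True if a destination server can host the VMs
    """
    if not migration_feasible:
        # Pinning balances performance and energy when migration is impossible;
        # pin to fewer CPUs only if a larger temperature drop is needed.
        return "pin VMs to 2 CPUs" if required_temp_drop_c > 2.5 else "pin VMs to 3 CPUs"
    if goal == "performance":
        # Migration gave the shortest makespans and the largest temperature drops.
        return "migrate VMs"
    # Energy-oriented goal: pinning to 3 CPUs had the best EDP for moderate hotspots.
    return "pin VMs to 3 CPUs" if required_temp_drop_c <= 2.5 else "migrate VMs"

print(choose_reaction(required_temp_drop_c=5, goal="energy", migration_feasible=True))
```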

4 Proactive VM Allocation Model and Implementation

The purpose of our proactive VM allocation approach is to increase the resource utilization and, hence, the energy efficiency. It aims at maximizing the system throughput by allocating the maximum possible number of VMs per node without penalizing the applications' performance. In other words, the objective is to find the trade-off between the applications' performance and the overall datacenter energy consumption when different numbers and combinations of a variety of VMs are allocated to a physical server. The performance of an application is measured in terms of its average execution time, which is defined as the ratio of the maximum application execution time when a number of VMs are running simultaneously to the number of VMs. This metric gives an insight into the gains obtained by multiplexing VMs (i.e., running them in parallel) over running them sequentially one after the other. It is important to note that by considering the average execution time of VMs we do not focus on minimizing the execution time of each application individually; rather, we strive to improve the QoS by minimizing the number of SLA violations.
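As a concrete reading of this metric, the short example below computes the average execution time for a hypothetical set of co-located VMs and contrasts it with the runtime of a VM running alone; all numbers are illustrative.

```python
# Sketch: the average-execution-time metric used to assess consolidation.
# avg_time = max(execution time of the co-scheduled VMs) / number of VMs.
# The execution times below are hypothetical.

def average_execution_time(exec_times_s):
    """Maximum completion time of the multiplexed VMs divided by their number."""
    return max(exec_times_s) / len(exec_times_s)

# Four VMs multiplexed on one server: each slows down somewhat, but the
# server finishes all four after max(...) seconds.
multiplexed = [2600.0, 2650.0, 2700.0, 2750.0]
alone = 2239.0   # runtime of one VM running by itself

print(f"average execution time, 4 VMs multiplexed: {average_execution_time(multiplexed):.0f} s")
print(f"effective time per VM if run one after another: {alone:.0f} s")
```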

As the best number of VMs per node may be different for different application types (based on their usage of different subsystems), we consider the applications' profiles. Specifically, we focus on finding the best partition and allocation of VMs when the different VMs run applications of different types. In our approach, we assume that the applications' profiles are known in advance (e.g., specified by the user in the job definition). To find the allocation for a set of VMs that best matches the energy efficiency/performance goals while ensuring QoS guarantees, we rely on a model based on empirical data from experiments. We have developed a methodology composed of the following steps:

1. Profile a comprehensive set of applications (standard HPC benchmark workloads);

2. Run the benchmarks exhaustively (all possible allocations based on the number of VMs and application type) and collect data;

3. Create a model (database) with all the data collected during the benchmarking process, including execution time and energy consumed;

4. Implement an algorithm that, given (i) the model, (ii) an optimization goal (either minimize energy consumption or minimize execution time), (iii) a set of servers with their current allocations, and (iv) a set of VMs along with their characteristics, returns a set of partitions and allocations of the VMs on the servers.

4.1 Application Profiling

The methodology involves profiling an application's (and, hence, a VM's) behavior as I/O-intensive, memory-intensive, and/or CPU-intensive based on its usage of the different subsystems. Most of the standard profiling utilities are designed for comparing the computation efficiency of the applications on the systems on which they are running and, therefore, their outputs are not very useful from the subsystem usage point of view. We profiled standard HPC benchmarks with respect to their behaviors and subsystem usage on individual servers. To collect run-time OS-level


metrics for CPU utilization, hard disk I/O, and network I/O, we used different mechanisms such as "mpstat", "iostat", "netstat" or "PowerTOP" from Intel. We also patched the Linux kernel 2.6.18 with "perfctr" so that we could read hardware performance counters on-line with relatively small overhead. We instrumented the applications with PAPI and, as the server architecture does not support a total memory LD/ST counter, we counted the number of L2 cache misses, which indicates (approximately) the activity of memory.

We chose a comprehensive set of HPC benchmark workloads. Each workload stresses one or more of the following subsystems: CPU, memory, disk (storage), and network interface. They can be classified as:

– CPU intensive, e.g., HPL Linpack, which solves a (random) dense linear system in double-precision arithmetic, and FFTW, which computes the discrete Fourier transform.

– Memory intensive, e.g., sysbench, which is a multi-threaded benchmark developed originally to evaluate systems running a database under intensive load.

– I/O intensive, e.g., b_eff_io, which is an MPI-I/O application, and bonnie++, which focuses on hard-drive and file-system performance.

An application usually demands the services of a given subsystem in discrete time windows. However, if the average demand for a subsystem X is significant, we consider the application to be X-intensive. Figure 9 (left) shows the different subsystem utilizations of a CPU-intensive workload. Note that an application can also be deemed to be intensive along multiple dimensions if the demands for resources from multiple subsystems are significant. Figure 9 (right) shows the different subsystem utilizations of a network- and CPU-intensive workload. The utilization of a particular subsystem by two different VMs running applications may either overlap (resulting in contention) or not overlap (be contention free).
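The labeling rule can be captured in a few lines; the utilization traces and the significance threshold below are hypothetical placeholders for the profiling data collected with the tools listed in Section 4.1.

```python
# Sketch: label an application profile as X-intensive for every subsystem
# whose average utilization exceeds a significance threshold. The traces
# and the 0.5 threshold are hypothetical.

def classify_profile(utilization, threshold=0.5):
    """utilization: dict subsystem -> list of samples in [0, 1]."""
    return sorted(
        subsystem
        for subsystem, samples in utilization.items()
        if sum(samples) / len(samples) >= threshold
    )

profile = {
    "cpu":     [0.9, 0.95, 0.85, 0.9],   # busy in almost every window
    "memory":  [0.2, 0.3, 0.25, 0.2],
    "disk":    [0.05, 0.0, 0.1, 0.05],
    "network": [0.6, 0.7, 0.65, 0.6],
}
print(classify_profile(profile))   # ['cpu', 'network'] -> CPU- and network-intensive
```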

4.2 Benchmarking

The benchmarking was conducted in the experimental setup described in Section 3.3. In addition to using a single server type, we made some additional assumptions, such as a single process per VM, to reduce the complexity. To run multiple processes (e.g., MPI applications), multiple VMs are required.

Fig. 9 Sub-system utilization over time for a CPU-intensive workload (left) and a CPU-plus-network-intensive workload (right)


In order to acquire sufficient data to create a VM allocation model, firstly, we conducted a set of base tests that consolidate different VM instances running applications of the same type in a single server. This allowed us to find out, on the one hand, the optimal scenarios to either maximize performance or minimize energy consumption and, on the other hand, the maximum number of VMs that can be consolidated in a single server without adversely impacting energy consumption and the applications' performance. We ran the base experiments with different numbers of VMs (up to 16) running the same application type for each of the application profiles.

From the base tests, we obtained a set of optimal scenarios, i.e., the optimal number of VMs for the shortest average execution times and for minimum energy consumption. The second part of the benchmarking consists of running all the possible combinations of workload types with different numbers of VMs. The combinations excluded those that do not include at least one VM of each workload type, as well as the base tests themselves. The experiments took several days to complete and were conducted using a platform that we developed to automatically run the benchmarks and process the data.
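The set of mixed-workload test cases can be enumerated mechanically; the sketch below generates every (Ncpu, Nmem, Nio) combination that fits on one server, skipping the degenerate cases covered by the base tests. The per-server cap of 16 VMs follows the base tests, while reducing the profiles to three workload-type labels is a simplification for illustration.

from itertools import product

def benchmark_combinations(max_vms=16):
    """Enumerate the mixed test cases (Ncpu, Nmem, Nio) to benchmark."""
    combos = []
    for counts in product(range(max_vms + 1), repeat=3):
        if 0 in counts:            # missing a workload type: covered by the base tests
            continue
        if sum(counts) > max_vms:  # does not fit on a single server
            continue
        combos.append(dict(zip(("Ncpu", "Nmem", "Nio"), counts)))
    return combos

print(len(benchmark_combinations()))   # number of mixed-workload runs required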

4.3 Database

In order to make our model available for proactive VM consolidation, the information collected from the benchmarking (base and combined tests) was stored in a database. As the amount of information was manageable using text files, we used a plain-text file with comma-separated values (CSV) instead of an actual database management system. Table 2 summarizes the information contained in the database.

In addition to the information listed in Table 2, we store other relevant information from the base experiments, such as the number of VMs of the optimal scenarios and reference execution times, in an auxiliary file. As the records of the database are accessed using binary search, the search cost is O(log(num_tests)). Therefore, we sorted the records of the database in ascending order by a search key, which is composed of the parameters that indicate the number of VMs of each workload type (Ncpu, Nmem, Nio).

Table 2 Summary of the information stored in the database

Field                     Description
Ncpu                      #VMs running a CPU-intensive benchmark
Nmem                      #VMs running a memory-intensive benchmark
Nio                       #VMs running an I/O-intensive benchmark
relTimeVM (for each VM)   Relative execution time for each VM (relative time = total exec time / exec time on 1 VM)
Time                      Total execution time of the outcome (s)
Energy                    Energy consumed to run the outcome (J)
MaxPower                  Maximum power dissipation measured (W)
EDP                       Energy delay product (J × s)
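A minimal sketch of this storage scheme is shown below: it loads the CSV model, sorts it by the (Ncpu, Nmem, Nio) key, and retrieves a record with a binary search. The file name and the exact column labels are assumptions made for illustration.

import csv
from bisect import bisect_left

def load_model(path="allocation_model.csv"):
    """Load the benchmarking database and sort it by the (Ncpu, Nmem, Nio) key."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: (int(r["Ncpu"]), int(r["Nmem"]), int(r["Nio"])))
    keys = [(int(r["Ncpu"]), int(r["Nmem"]), int(r["Nio"])) for r in rows]
    return rows, keys

def lookup(rows, keys, ncpu, nmem, nio):
    """Binary-search the record for a given VM mix; returns None if absent."""
    i = bisect_left(keys, (ncpu, nmem, nio))
    if i < len(keys) and keys[i] == (ncpu, nmem, nio):
        return rows[i]
    return None

# Example usage (assuming the CSV columns of Table 2):
# rows, keys = load_model()
# record = lookup(rows, keys, ncpu=2, nmem=1, nio=1)
# if record:
#     print(record["Time"], record["Energy"], record["relTimeVM"])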

4.4 VM Allocation Algorithm

Our VM allocation algorithm takes advantage of the model (in the form of a database) that is described above. It has two main objectives: (i) minimize energy consumption and (ii) maximize performance (i.e., minimize workload execution time). As these are two conflicting objectives, we use a parameter α to adjust the possible tradeoff between energy efficiency and performance; α is a real number in [0, 1], where α emphasizes the energy-efficiency goal and 1 − α emphasizes performance. For example, if α = 0.7 the algorithm will try to minimize the energy consumption first (70 % of preference) and then the performance, but with less intensity (30 % of preference). The allocation algorithm focuses on the objectives discussed above and does not consider specific policies such as those based on priorities. The input parameters of the algorithm are: (i) the database with the allocation model, (ii) values from the base experiments (which can be extracted from the auxiliary file), (iii) a set of VMs and, for each of them, the application's profile and maximum execution time (QoS guarantees), and (iv) the optimization goal (α).
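One way to turn this tradeoff into a single quantity is sketched below: the estimated energy and time of a candidate allocation are normalized and combined with the α weight. The normalization by reference values from the base experiments is our own assumption, made so that the two terms are comparable.

def allocation_score(est_energy_j, est_time_s, ref_energy_j, ref_time_s, alpha=0.5):
    """Score of a candidate allocation (lower is better): alpha weights the
    energy-efficiency goal and (1 - alpha) the performance goal."""
    assert 0.0 <= alpha <= 1.0
    return (alpha * (est_energy_j / ref_energy_j)
            + (1.0 - alpha) * (est_time_s / ref_time_s))

# alpha = 0.7: 70 % of the preference goes to saving energy, 30 % to speed
print(allocation_score(est_energy_j=14250, est_time_s=1380,
                       ref_energy_j=12000, ref_time_s=1200, alpha=0.7))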


First, as mentioned earlier, the algorithm sets up the knowledge database by loading it in memory and sorting it using a three-column key (i.e., Ncpu, Nmem, and Nio) in order to accelerate the search. Then, the estimated runtime and energy for each combination of the VMs are calculated and the minimum of them is taken. Note that, if there are N VMs in the input set, it can have BN partitions, where BN is the Bell number. An exhaustive search over these BN distinct partitions is guaranteed to provide an optimal solution. Each partition of the set contains several disjoint subsets, and each of these subsets contains a certain number of CPU-, memory-, and/or I/O-intensive VMs. The number of CPU-, memory-, and I/O-intensive VMs in the input is used to search the database entries associated with them. The estimated execution time of a subset is computed by multiplying the original input runtime (which is an input parameter) by relTimeVM. Similarly, the estimated energy consumption of a subset is computed by multiplying the estimated runtime by the average power (also stored in the database). The computational complexity of the whole algorithm is O(BN · log(NDB)), where N is the number of VMs to allocate and NDB is the number of elements in the knowledge database. The algorithm returns the allocation of VMs that best matches the input optimization goal while satisfying the QoS constraints. The algorithm can be relaxed by disregarding the QoS guarantees, but this might not be acceptable for a production system.

Fig. 10 VM allocation algorithm

To find the best partitions of the input set of VMs for allocation in individual servers, we used a brute-force search algorithm over the servers with their current VM allocations. In cases where the number of partitions of the input set was large, we used the search algorithm discussed in [36], which is efficient in terms of complexity. If two partitions have the same rank in different servers, we select the first server in the list. Figure 10 shows the main components and control flow of our VM allocation algorithm. A minimal sketch of the exhaustive partition search is given below.
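The sketch enumerates every partition of the input VM set and keeps the one with the lowest score according to a caller-supplied evaluation function (for instance, the α-weighted score built from the database lookups). It is our own brute-force illustration of the exhaustive search described above, not the optimized generator of [36].

def set_partitions(items):
    """Yield every partition of `items` (there are B_N of them for N items)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # place `first` into each existing subset ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new subset of its own
        yield [[first]] + partition

def best_partition(vms, evaluate):
    """Exhaustively score all partitions and return the cheapest one."""
    return min(set_partitions(vms), key=evaluate)

# Toy example: 4 VMs labelled by profile, scored by the number of subsets used
vms = ["cpu-1", "cpu-2", "mem-1", "io-1"]
print(best_partition(vms, evaluate=len))   # -> one subset containing all four VMs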

5 Evaluation

In this section, we discuss the performance of our proposed proactive VM allocation and reactive thermal management strategies. Firstly, we evaluated the possible energy savings and performance tradeoffs that can be achieved at the datacenter level using our proactive VM allocation algorithm described in the previous section. We conducted the simulations using parallel workload traces from real HPC production systems. Then, we conducted an experimental evaluation of our proposed reactive thermal management solution and compared it with other existing techniques.

5.1 Simulation

To evaluate the performance and energy efficiency of our VM allocation algorithm we used workload traces from real HPC production systems like the Grid Observatory [13], which collects, publishes, and analyzes logs of the behavior of the EGEE Grid [10]. As the readily available traces are in different formats and include data that are not useful for our purpose, they were pre-processed before use in the simulations. Firstly, we converted the input traces to the Standard Workload Format (SWF) [11] and, as they are usually composed of multiple files, we combined them into a single file. Then, we cleaned the traces, now in SWF format, in order to eliminate failed jobs, cancelled jobs, and anomalies. As the traces from different systems did not provide all the information needed for our analysis, we needed to complete them using a model based on the benchmarking of HPC applications (see Section 4). We randomly assigned one of the four possible benchmark profiles to each request in the input trace, following a uniform distribution by bursts. The bursts of job requests were sized (randomly) from 1 to 5 job requests. These traces are intended to illustrate the submission of scientific HPC workflows, which are composed of sets of jobs with the same resource requirements.

As the EGEE Grid is a large-scale heterogeneous distributed system composed of a large number of nodes, we scaled and adapted the job requests to the characteristics of our system model and evaluation methodology. Specifically, we assigned 1 to 4 VMs per job request rather than the original CPU demand, and we defined the QoS requirements (maximum response time) per application type and not for each specific request.
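The trace-completion step can be illustrated with the short sketch below, which assigns a benchmark profile to each cleaned job request in random bursts of 1 to 5 requests and draws 1 to 4 VMs per request. The profile names and the record layout are illustrative; the real traces are SWF records.

import random

PROFILES = ["cpu", "mem", "io", "cpu+net"]   # the four benchmark profiles (labels illustrative)

def annotate_trace(num_jobs, seed=1):
    """Assign profiles to job requests in bursts of 1-5 and 1-4 VMs per request."""
    rng = random.Random(seed)
    jobs, i = [], 0
    while i < num_jobs:
        burst = min(rng.randint(1, 5), num_jobs - i)   # size of the current burst
        profile = rng.choice(PROFILES)                 # one profile for the whole burst
        for _ in range(burst):
            jobs.append({"id": i, "profile": profile, "vms": rng.randint(1, 4)})
            i += 1
    return jobs

print(annotate_trace(num_jobs=8)[:4])   # first few annotated requests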

Data obtained from real experiments on representative server hardware were used to populate the database and to create the empirical model described in the previous section. Therefore, in our simulations we used a system model composed of several servers with the same characteristics as our real server testbed. To compute the estimated execution times and energy consumption we used the information from our allocation model. Given a specific partition with a subset of VMs running their associated application types, we look up our model database and use the matching values proportionally. As VM allocations may vary over time, we compute the estimated execution time and energy consumption as the weighted average of the values associated with each time interval. We also assume a fixed power dissipation of 125 W when a server is idle.

Fig. 11 Possible VM allocation outcome over time

Figure 11 illustrates a possible VM allocation outcome over time for a server. The application type associated with each VM is also shown. Different time intervals (A, B, C) have different VM allocations and, therefore, the estimated execution time of the applications and the energy consumption for each interval will be different. For example, the execution time of VM1 will be computed considering the relative weight of each allocation (70 % of allocation A and 30 % of allocation B) as follows: ExecTimeVM1 = 0.7 · 1200 s + 0.3 · 1800 s = 1380 s, and the energy consumption for the whole outcome will be: Energy = 0.35 · 15 KJ + 0.15 · 20 KJ + 0.5 · 12 KJ = 14.25 KJ. As we focus on studying the impact of VM allocation on performance/energy efficiency, we do not consider the overhead for scheduling and resource provisioning.
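The interval-weighted estimates in this example can be computed with the small helpers below; the (weight, value) pair representation is our own simplification.

def weighted_time(intervals):
    """Estimated execution time from (weight, time_s) pairs, where each weight
    is the fraction of the VM's lifetime spent under that allocation."""
    return sum(w * t for w, t in intervals)

def weighted_energy(intervals):
    """Estimated energy for the whole outcome from (weight, energy_j) pairs."""
    return sum(w * e for w, e in intervals)

# The worked example from the text:
print(weighted_time([(0.7, 1200), (0.3, 1800)]))                    # 1380.0 s
print(weighted_energy([(0.35, 15e3), (0.15, 20e3), (0.5, 12e3)]))   # 14250.0 J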

We evaluate the impact of our approach in terms of the following metrics: makespan (in seconds), which is the difference between the earliest time of submission of any of the workload tasks and the latest time of completion of any of its tasks, energy consumption (in Joules), and percentage of SLA violations. The number of SLA violations was calculated by summing the number of missed deadlines of all applications. The deadline here refers to the maximum response time as specified by the QoS requirements.

We have conducted our simulations using different allocation strategies that have different goals. Specifically, we have evaluated the following allocation strategies:

– FIRST-FIT (FF), in which job requests are allocated following the first-fit policy based on CPU slots. This means that an incoming job request is allocated to the first available server until the number of allocated VMs is equal to the number of CPUs (VM multiplexing on CPUs is not allowed). FIRST-FIT-2 (FF-2) and FIRST-FIT-3 (FF-3) are two variants of FIRST-FIT that allow multiplexing up to 2 and 3 VMs on each CPU, respectively (a minimal first-fit sketch is given after this list).

– Proactive (PA), in which job requests are allocated to servers following the algorithm described in Section 4. We consider the following variations:

– α = 1 (PA-1): the goal is minimizing the energy consumed;

– α = 0 (PA-0): the goal is minimizing the execution time;

– α = 0.5 (PA-0.5): the goal is finding the best tradeoff between execution time and energy consumption. Note that, although existing literature has addressed the optimization of both metrics using dynamic concurrency throttling over parallel regions [8], in this paper we assume that energy efficiency and performance are, in general, conflicting goals, as we have shown in our virtualized scenario.
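The baseline placement rule can be sketched in a few lines; the server representation (a list of VM counts) and the 8-CPU server size are assumptions made only for illustration.

def first_fit(job_vm_counts, servers, cpus_per_server=8, multiplex=1):
    """Place each job's VMs on the first server with enough free slots.

    `servers` holds the current VM count per server; `multiplex` is the number
    of VMs allowed per CPU (1 for FF, 2 for FF-2, 3 for FF-3)."""
    capacity = cpus_per_server * multiplex
    placement = []
    for vms in job_vm_counts:
        for s, used in enumerate(servers):
            if used + vms <= capacity:
                servers[s] += vms
                placement.append(s)
                break
        else:
            placement.append(None)   # no server can currently host the request
    return placement

servers = [0, 0, 0]
print(first_fit([4, 6, 3, 8], servers))   # -> [0, 1, 0, 2]
print(servers)                            # -> [7, 6, 8]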

Figures 12, 13 and 14 show the results obtained using the different VM allocation strategies for handling the workload traces described previously. Furthermore, in order to control the pressure of the system load, we modeled two Clouds of different sizes rather than using different input traces with different arrival rates. The SMALLER Cloud system is the reference one and the LARGER Cloud system is over-dimensioned (by approximately 15 %), which means that the former is expected to be more loaded than the latter. The input trace used in the simulations requests a total of 10,000 VMs.

Fig. 12 Makespan (s) per allocation strategy for the LARGER and SMALLER systems

As we can observe in Fig. 12, the PA strategy can provide up to 18 % shorter execution times. This is due to the fact that application awareness results in fewer contentions for resources, as only the most compatible VMs are consolidated. With the FIRST-FIT strategy the execution times are longer due to high resource contention, especially when multiplexing 3 VMs on the same CPU. Furthermore, Fig. 12 shows that the PA strategy with the performance optimization goal reduces the execution times by more than 3 % in comparison to the same strategy with the energy optimization goal. We can also appreciate that the execution times in the SMALLER system are higher than the execution times in the LARGER system due to higher load pressure. This is especially evident in the case of FF-3, the FIRST-FIT strategy with multiplexing (up to 3 VMs per CPU), due to possible additional resource contention.

Fig. 13 Energy consumption (J) per allocation strategy for the LARGER and SMALLER systems

Figure 13 shows that the PA strategy reduces energy consumption by around 12 % on average with respect to FIRST-FIT (with and without VM multiplexing). In fact, makespan and energy consumption follow a similar pattern in the LARGER system. Furthermore, Fig. 13 shows that the PA strategy with the energy optimization goal saves almost 3 % more energy than the same strategy with the performance optimization goal. However, with the goal of finding the best tradeoff it provides intermediate results (but the variations are not very significant, i.e., <2 %). Although the makespan in the SMALLER system is higher than the makespan in the LARGER system, the energy consumption in the SMALLER system is lower than the energy consumption in the LARGER system, as in the SMALLER system there are fewer servers consuming energy and there are more opportunities for consolidation. However, with the FIRST-FIT strategy the resource contention penalizes the energy efficiency significantly when multiplexing of 2 or 3 VMs is allowed on the same CPU.

Figure 14 shows that the percentage of SLA violations with the PA strategies is also lower compared to the traditional schemes. This means that the PA strategy can maintain or even provide better QoS guarantees than the traditional approaches. Furthermore, we can observe in Fig. 14 a correlation between execution time and SLA violations: the higher the makespan, the higher the percentage of SLA violations. We can also appreciate that the strategies evaluated present similar behaviors under higher load conditions. We do not show in this article the results obtained with other possible configurations of the PA strategy (e.g., α = 0.75) since the variation in the results was not significant enough.

5.2 Validation

We performed experiments on real hardware in order to validate our proposed approach for energy-efficient reactive thermal management and to compare it with other existing VM management strategies. We considered 50 ◦C as the thermal emergency temperature. Although the existing literature considers a datacenter emergency temperature of 105–135 ◦F (40–57 ◦C) [12] and per-component red-line temperatures of 72 ◦C for the CPU and 67 ◦C for the disk [38], we determined experimentally that there were hardware failures when the servers' operating temperature rose above this threshold. Specifically, we evaluated the following thermal management strategies:

– FIRST-FIT (FF), in which job requests are allocated following the first-fit policy based on CPU slots, the same as the one used in our simulations. This strategy provides an insight into the worst performance (lower bound) when a naive VM allocation strategy is followed without any regard to the thermal behavior of the underlying hardware.

– RANDOM (RND), in which job requests are allocated to servers randomly. Servers with the least load (smallest number of VMs running on them) are chosen with a high probability to host the VMs corresponding to the current set of job requests. This strategy spreads the job requests uniformly across all servers but may not guarantee a uniform thermal behavior.

Fig. 14 Percentage of SLA violations per allocation strategy for the LARGER and SMALLER systems

– TEMPERATURE-AWARE (TA), in which the "coolest" among all servers whose operating temperatures are within a pre-specified "safety" threshold (i.e., under the threshold of 50 ◦C) is chosen to host the VMs corresponding to the current set of job requests. This strategy spreads the thermal load uniformly across all servers but does not ensure a uniform load distribution (in terms of the number of job requests allocated per server). A minimal sketch of this selection rule is given after this list.

– VM-MIGRATION, in which the VMs are initially allocated based on the FF approach. When a server reaches unsafe operating temperatures or thermal hotspots (regions that are more than 50 ◦C in temperature in the testbed used) are detected, VMs are migrated and the FF approach is used again to determine where to reallocate the VMs.

– ENERGY-AWARE THERMAL MANAGEMENT (EATM), in which VMs are initially allocated based on the proactive model described in Section 4; the cross-layer reactive thermal management described in Section 3 is applied when a server reaches an unsafe temperature or thermal hotspots are detected, and the proactive VM allocation model is used again to determine where to migrate the VMs (when migration is chosen as the most appropriate reaction to the thermal anomaly).
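The TA selection rule mentioned above can be sketched as follows; the per-server temperature readings are assumed to come from the internal sensors, and the list-based representation is illustrative.

def temperature_aware_choice(server_temps, safety_threshold=50.0):
    """Return the index of the coolest server below the safety threshold,
    or None if every server is too hot (the request then stays blocked
    until temperatures drop)."""
    eligible = [(temp, idx) for idx, temp in enumerate(server_temps)
                if temp < safety_threshold]
    if not eligible:
        return None
    return min(eligible)[1]

print(temperature_aware_choice([46.2, 51.0, 43.8, 49.9]))   # -> 2 (the coolest eligible server)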

In all the strategies we limit the multiplexing factor (i.e., the number of VMs that can run simultaneously on a physical CPU) to 2 VMs/CPU. The experiments were conducted on a virtualized cluster composed of 8 of the nodes described in Section 3.1, stacked in one rack and running the Xen hypervisor. A "Watts Up? .NET" power meter was used for instantaneous power consumption measurements, as described in Section 3.3, and internal sensors were used for detecting thermal anomalies. The cluster was operated in a poorly cooled environment and no cooling system optimization was done simultaneously. We used a workload based on the one described in Section 5.1 but scaled to the size of the testbed (providing an average load of around 75 %, which is a typical target for HPC virtualized clusters).

Table 3 shows the experimental results for the different strategies described above. "%Unsafe" refers to the fraction of time that a server's operating temperature was higher than 50 ◦C (aggregated over all the servers' "unsafe" fractions of time). Figures 15 and 16 show the temperature in ◦C of each server over time and the power dissipation of the cluster over time, respectively, for three representative strategies (i.e., RND, TA, and EATM).
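For reference, the per-server "unsafe" fraction can be computed as sketched below; evenly spaced temperature samples and the aggregation as a plain sum over servers are our assumptions about the bookkeeping.

def unsafe_fraction(temperature_samples, threshold=50.0):
    """Fraction of (evenly spaced) samples in which a server's operating
    temperature exceeded the 50 degree C safety threshold."""
    over = sum(1 for t in temperature_samples if t > threshold)
    return over / len(temperature_samples)

samples = [44.0, 47.5, 51.2, 53.0, 49.0, 48.2]
print(f"{100 * unsafe_fraction(samples):.2f} % unsafe")   # -> 33.33 % unsafe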

The overall results show that EATM outperforms the other strategies. EATM provides a shorter makespan (2.25 % and 12 % shorter than FF's and TA's, respectively), lower energy consumption (around 9 % lower than both FF's and TA's), and a lower EDP. We can appreciate that TA achieves lower operating temperatures than FF does, but results in a longer makespan, which leads to higher energy consumption and, hence, a higher EDP. This is because when all the available servers' operating temperatures are higher than the safety threshold, jobs cannot be scheduled and are blocked until the temperatures drop.

Table 3 Experimental results using the different thermal management strategies

Strategy         Makespan (s)   Energy (KJ)   EDP × 10^6   Power (W) min/avg/max   Temperature (◦C) min/avg/max   % Unsafe
FIRST-FIT        2,994          2,487         7,341        663 / 835 / 1,093       37.67 / 48.27 / 59.14          16.07
RANDOM           2,960          2,459         7,232        667 / 833 / 1,058       35.27 / 47.31 / 57.50          18.62
THERMAL-AWARE    3,282          2,502         8,211        679 / 817 / 1,050       32.45 / 45.13 / 54.12          7.25
VM-MIGRATION     3,180          2,407         7,654        482 / 768 / 1,067       29.50 / 44.61 / 57.74          9.12
EATM             2,928          2,281         6,719        662 / 781 / 942         32.17 / 43.17 / 52.64          6.87

Fig. 15 Temperature (in ◦C) of each server over time: (a) RANDOM strategy, (b) THERMAL-AWARE strategy, (c) EATM strategy

Fig. 16 Cluster power dissipation (W) over time: (a) RANDOM strategy, (b) THERMAL-AWARE strategy, (c) EATM strategy

The VM-MIGRATION technique outperforms the FF, RND, and TA strategies. However, it results in a longer makespan and a higher energy footprint than EATM. This is due to the computation overhead of VM migrations and the fact that the opportunities for VM migrations are low when the system is already heavily loaded. The average power consumption and minimum average temperature are lower for VM-MIGRATION than for EATM due to the periods in which unused servers (which are only a few and for a very short duration) are switched off. VM-MIGRATION may work better for lightly loaded scenarios where the probability of having idle servers is high. However, we believe that EATM would be more effective from a holistic perspective. Note that, in this paper, we focus on HPC workloads and, therefore, VM migration overheads are specific to such scenarios and cannot be generalized for all types of workloads.

The average power dissipation while using EATM is 7 % and 5 % lower than those incurred by FF/RND and TA, respectively, but larger than the one incurred by VM-MIGRATION. However, EATM provides a lower maximum power dissipation with respect to VM-MIGRATION due to the overhead of migrating VMs when the temperature is high. The maximum power dissipation is also significantly lower for EATM with respect to the other strategies (e.g., 16 % and 11 % lower than those incurred by FF and TA, respectively).

EATM provides a better thermal behavior (i.e., 10 % and 2 % lower average temperature compared to FF and TA, respectively, and a significantly lower maximum temperature) as it reacts to thermal anomalies and also consolidates the workload better using an application-centric approach. In contrast to EATM, the TA policy reduces the maximum temperature but does not control the average temperature because it does nothing to mitigate thermal hotspots (it only avoids placing new job requests on the server), which results in a higher average temperature with respect to EATM. It can be clearly seen in Fig. 15a that the hottest servers are those in the middle of the rack due to heat propagation and probably some affinity of workloads to these servers in the policy implementation. We can also observe some clear thermal hotspots with temperatures up to 57 ◦C. As far as the thermal behavior of servers when using the TA strategy is concerned, higher temperature zones are uniformly spread and thermal hotspots are of lower intensity. Similarly, while using EATM, higher temperature zones are uniformly spread, but these temperatures are lower compared to those achieved with TA and are well under the safety threshold (due to reactive thermal management).

The fraction of time that servers run in unsafe conditions while using EATM is much shorter than those achieved while using FF (less than 50 %) and TA (less than 5 %), which indicates that EATM achieves the objective of reducing the probability of hardware failure while providing a positive tradeoff between performance and energy.

6 Conclusions and Future Work

In this article, we presented and evaluated i) a reactive cross-layer thermal management solution, which alleviates undesired thermal anomalies (i.e., hotspots) in virtualized HPC cloud infrastructure, and ii) a proactive application-centric strategy for VM allocation, which aims at maximizing resource utilization and energy efficiency while satisfying quality of service guarantees for HPC applications. The reactive thermal management solution considers the tradeoffs among performance, energy efficiency, and thermal efficiency while using different mechanisms such as VM migration, DVFS, and CPU pinning to alleviate thermal anomalies. The VM allocation algorithm leverages an empirical model for the average energy consumption and execution time derived from measurements on real hardware running HPC workloads.

The results obtained from simulations using real production HPC workload traces show that our proactive VM allocation solution significantly contributes to energy efficiency (12 % reduction in energy consumption) and/or optimization of the application performance (18 % reduction in execution time), depending on the optimization goals. Our reactive Energy-Aware Thermal Management (EATM) solution, in conjunction with the aforementioned allocation approach, outperforms other traditional thermal management approaches such as load redistribution (VM migrations) and Temperature-Aware (TA) VM placement in terms of energy consumption (up to 9 % less than TA's), makespan (up to 12 % less than TA's), maximum server operating temperatures (up to 10 % less than that achieved by the load redistribution technique), and percentage of time for which servers operate at unsafe temperatures (up to 25 % less than that achieved by the load redistribution technique). As an extension of the work presented in this article, we are currently working on a proactive VM allocation model that uses the concept of heat imbalance (the difference between the heat generated by servers and the heat extracted by the cooling system) to project future temperatures and to place workloads on servers. Also, this paper deals with physical resources (a testbed of servers) that belong to the same datacenter. As part of our study and validation of proactive thermal management solutions (such as the aforementioned VM allocation model), we are conducting experiments on a heterogeneous testbed (in terms of computing as well as cooling resources) spanning two sites of the NSF CAC at Rutgers University and the University of Florida.

Acknowledgements The research presented in this work is supported in part by the National Science Foundation (NSF) via grant numbers IIP 0758566, CCF-0833039, DMS-0835436, CNS 0426354, IIS 0430826, CNS 0723594 and CSR-1117263, by the Department of Energy ExaCT Combustion Co-Design Center via subcontract number 4000110839 from UT Battelle and via grant numbers DE-SC0007455 and DE-FG02-06ER54857, and by an IBM Faculty Award, and was conducted as part of the NSF Cloud and Autonomic Computing Center at Rutgers University.

References

1. Report to congress on server and data center energy efficiency. Tech. rep., U.S. Environmental Protection Agency (2007)

2. Ajiro, Y., Tanaka, A.: Improving packing algorithms for server consolidation. In: Proc. of Computer Measurement Group Conf. (CMG), San Diego, CA, pp. 399–406 (2007)

3. Apparao, P., Iyer, R., Zhang, X., Newell, D., Adelmeyer, T.: Characterization & analysis of a server consolidation benchmark. In: Proc. of ACM SIGPLAN/SIGOPS Conf. on Virtual Execution Environments (VEE), Seattle, WA, pp. 21–30 (2008)

4. Bash, C., Forman, G.: Cool job allocation: Measuring the power savings of placing jobs at cooling-efficient locations in the data center. In: Proc. of USENIX Annual Technical Conf. (ATEC), Santa Clara, CA, pp. 363–368 (2007)

5. Beitelmal, A., Patel, C.: Thermo-fluids provisioning of a high performance high density data center. Distrib. Parallel Dat. 21(2–3), 227–238 (2007)


6. Bobroff, N., Kochut, A., Beaty, K.: Dynamic placement of virtual machines for managing SLA violations. In: Proc. of IFIP/IEEE Symp. on Integrated Network Management (IM), Munich, Germany, pp. 119–128 (2007)

7. Chen, Y., Das, A., Qin, W., Sivasubramaniam, A., Wang, Q., Gautam, N.: Managing server energy and operational costs in hosting centers. In: Proc. of ACM Intl. Conf. on Measurement and Modeling of Computer Systems (SIGMETRICS), Banff, Canada, pp. 303–314 (2005)

8. Curtis-Maury, M., Shah, A., Blagojevic, F., Nikolopoulos, D.S., de Supinski, B.R., Schulz, M.: Prediction models for multi-dimensional power-performance optimization on many cores. In: Proc. of the 17th Intl. Conf. on Parallel Architectures and Compilation Techniques, pp. 250–259 (2008)

9. Das, R., Kephart, J.O., Lefurgy, C., Tesauro, G., Levine, D.W., Chan, H.: Autonomic multi-agent management of power and performance in data centers. In: Proc. of Intl. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, pp. 107–114 (2008)

10. Enabling Grid for E-sciencE (2010). http://www.eu-egee.org/. Accessed 1 Oct 2011

11. Feitelson, D.: Parallel workload archive (2010). http://www.cs.huji.ac.il/labs/parallel/workload/. Accessed 1 Oct 2011

12. Garday, D., Housley, J.: Thermal storage system provides emergency data center cooling. Tech. rep., Intel Corporation (2007)

13. Grid Observatory (2010). http://www.grid-observatory.org/. Accessed 1 Oct 2011

14. Gonzalez, R., Horowitz, M.: Energy dissipation in general purpose microprocessors. IEEE J. Solid-State Circuits 31(9), 1277–1284 (1996)

15. Govindan, S., Nath, A.R., Das, A., Urgaonkar, B., Sivasubramaniam, A.: Xen and co.: communication-aware CPU scheduling for consolidated Xen-based hosting platforms. In: Proc. of the Intl. Conf. on Virtual Execution Environments (VEE), San Diego, CA, pp. 126–136 (2007)

16. Greenberg, S., Mills, E., Tschudi, B.: Best practices for data centers: lessons learned from benchmarking 22 data centers. In: Proc. of American Council for an Energy-Efficient Economy (ACEEE), Pacific Grove, CA (2006)

17. Gupta, D., Lee, S., Vrable, M., Savage, S., Snoeren, A.C., Varghese, G., Voelker, G.M., Vahdat, A.: Difference engine: harnessing memory redundancy in virtual machines. Commun. ACM 53(10), 85–93 (2010)

18. Heath, T., Centeno, A.P., George, P., Ramos, L., Jaluria, Y., Bianchini, R.: Mercury and freon: temperature emulation and management for server systems. In: Proc. of Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), San Jose, CA, pp. 106–116 (2006)

19. Hermenier, F., Lorca, X., Menaud, J.M., Muller, G., Lawall, J.: Entropy: a consolidation manager for clusters. In: Proc. of the ACM SIGPLAN/SIGOPS Conf. on Virtual Execution Environments (VEE), Washington, DC, pp. 41–50 (2009)

20. Kang, S., Schmidt, R.R., Kelkar, K., Patankar, S.: A methodology for the design of perforated tiles in raised floor data centers using computational flow analysis. IEEE Trans. Compon. Packag. Technol. 24(2), 177–183 (2001)

21. Kaxiras, S., Martonosi, M.: Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool (2008)

22. Kephart, J.O., Chan, H., Das, R., Levine, D.W., Tesauro, G., Rawson, F., Lefurgy, C.: Coordinating multiple autonomic managers to achieve specified power-performance tradeoffs. In: Proc. of Intl. Conf. on Autonomic Computing (ICAC), Jacksonville, FL, pp. 24–34 (2007)

23. Kochut, A., Beaty, K.: On strategies for dynamic resource management in virtualized server environments. In: Proc. of the Intl. Symp. on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Istanbul, Turkey, pp. 193–200 (2007)

24. Kumar, S., Talwar, V., Kumar, V., Ranganathan, P., Schwan, K.: vManage: loosely coupled platform and virtualization management in data centers. In: Proc. of Intl. Conf. on Autonomic Computing (ICAC), Barcelona, Spain, pp. 127–136 (2009)

25. Laszewski, G., Wang, L., Younge, A.J., He, X.: Power-aware scheduling of virtual machines in DVFS-enabled clusters. In: Proc. of IEEE Intl. Conf. on Cluster Computing (CLUSTER), New Orleans, LA, pp. 1–10 (2009)

26. Lee, E.K., Kulkarni, I., Pompili, D., Parashar, M.: Proactive thermal management in green datacenter. J. Supercomput. 51(1), 1–31 (2010)

27. Liu, J., Priyantha, B., Zhao, F., Liang, C., Wang, Q., James, S.: Towards discovering data center genome using sensor networks. In: Proc. of the Workshop on Embedded Networked Sensors (HotEmNets), Charlottesville, VA (2008)

28. Mannas, E., Jones, S.: Add thermal monitoring to reduce data center energy consumption (2009). http://pdfserv.maxim-ic.com/en/an/AN4334.pdf. Accessed 1 Oct 2011

29. Menasce, D.A., Bennani, M.N.: Autonomic virtualized environments. In: Proc. of Intl. Conf. on Autonomic and Autonomous Systems (ICAS), Santa Clara, CA, pp. 1–28 (2006)

30. Meng, X., Isci, C., Kephart, J., Zhang, L., Bouillet, E., Pendarakis, D.: Efficient resource provisioning in compute clouds via VM multiplexing. In: Proc. of Intl. Conf. on Autonomic Computing (ICAC), Washington, DC, pp. 11–20 (2010)

31. Moore, J., Chase, J., Ranganathan, P., Sharma, R.: Making scheduling "cool": temperature-aware workload placement in data centers. In: Proc. of USENIX Annual Technical Conf. (ATEC), pp. 61–75 (2005)

32. Moore, J.D., Chase, J.S., Ranganathan, P.: Weatherman: automated, online and predictive thermal mapping and management for data centers. In: Proc. of Intl. Conf. on Autonomic Computing (ICAC), Dublin, Ireland, pp. 155–164 (2006)

33. Mukherjee, T., Banerjee, A., Varsamopoulos, G., Gupta, S., Rungta, S.: Spatio-temporal thermal-aware job scheduling to minimize energy consumption in virtualized heterogeneous data centers. Comput. Networks 53(17), 2888–2904 (2009)

34. Nathuji, R., Isci, C., Gorbatov, E.: Exploiting platform heterogeneity for power efficient data centers. In: Proc. of Intl. Conf. on Autonomic Computing (ICAC), Jacksonville, FL, pp. 1–5 (2007)

35. Nathuji, R., Schwan, K.: VirtualPower: coordinated power management in virtualized enterprise systems. In: Proc. of ACM SIGOPS Symp. on Operating Systems Principles (SOSP), pp. 265–278 (2007)

36. Orlov, M.: Efficient generation of set partitions. Tech. rep., Engineering and Computer Sciences, University of Ulm (2002)

37. Patel, C., Bash, C., Belady, L., Stahl, L., Sullivan, D.: Computational fluid dynamics modeling of high compute density data centers to assure system inlet air specifications. In: Proc. of Pacific Rim/ASME International Electronic Packaging Technical Conference (IPACK), Kauai, HI (2001)

38. Rambo, J., Joshi, Y.: Modeling of data center airflow and heat transfer: state of the art and future trends. Distrib. Parallel Dat. 21(2–3), 193–225 (2007)

39. Ramos, L., Bianchini, R.: C-oracle: predictive thermal management for data centers. In: Proc. of Intl. Symp. on High-Performance Computer Architecture (HPCA), pp. 111–122 (2008)

40. Ranganathan, P., Leech, P., Irwin, D., Chase, J.: Ensemble-level power management for dense blade servers. SIGARCH Comput. Archit. News 34(2), 66–77 (2006)

41. Rodero, I., Chandra, S., Parashar, M., Muralidhar, R., Seshadri, H., Poole, S.: Investigating the potential of application-centric aggressive power management for HPC workloads. In: Proc. of the IEEE Intl. Conf. on High Performance Computing (HiPC), Goa, India, pp. 1–10 (2010)

42. Rodero, I., Lee, E.K., Pompili, D., Parashar, M., Gamell, M., Figueiredo, R.J.: Exploiting VM technologies for reactive thermal management in instrumented datacenters. In: Workshop on Energy Efficient Grids, Clouds, and Clusters, in conjunction with IEEE Grid, Brussels, Belgium, pp. 321–328 (2010)

43. Rusu, C., Ferreira, A., Scordino, C., Watson, A.: Energy-efficient real-time heterogeneous server clusters. In: Proc. of IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS), St. Louis, MO, pp. 418–428 (2006)

44. Schmidt, R.R., Cruz, E.: Raised floor computer data center: effect on rack inlet temperatures of exiting both the hot and cold aisle. In: Proc. of Itherm Conference (ITHERM), San Diego, CA (2002)

45. Schmidt, R.R., Cruz, E.E., Iyengar, M.K.: Challenges of data center thermal management. IBM J. Res. Develop. 49(4/5), 709–723 (2005)

46. Schmidt, R.R., Karki, K., Kelkar, K., Radmehr, A., Patankar, S.: Measurements and predictions of the flow distribution through perforated tiles in raised floor data centers. In: Proc. of Pacific Rim/ASME International Electronic Packaging Technical Conference (IPACK), Kauai, HI (2001)

47. Sharma, R., Bash, C., Patel, R.: Dimensionless parameters for evaluation of thermal design and performance of large-scale data centers. In: Proc. of ASME/AIAA Joint Thermophysics and Heat Transfer Conference, St. Louis, MO (2002)

48. Sharma, R.K., Bash, C.E., Patel, C.D., Friedrich, R.J., Chase, J.S.: Balance of power: dynamic thermal management for internet data centers. IEEE Internet Comput. 9(1), 42–49 (2005)

49. Song, Y., Sun, Y., Wang, H., Song, X.: An adaptive resource flowing scheme amongst VMs in a VM-based utility computing. In: Proc. of IEEE Intl. Conf. on Computer and Information Technology (ICCIT), Fukushima, Japan, pp. 1053–1058 (2007)

50. Steinder, M., Whalley, I., Carrera, D., Gaweda, I., Chess, D.: Server virtualization in autonomic management of heterogeneous workloads. In: Proc. of IEEE Symp. on Integrated Network Management, pp. 139–148 (2007)

51. Stoess, J., Lang, C., Bellosa, F.: Energy management for hypervisor-based virtual machines. In: Proc. of USENIX Annual Technical Conf. (ATEC), Santa Clara, CA, pp. 1–14 (2007)

52. Tang, Q., Gupta, S.K.S., Varsamopoulos, G.: Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: a cyber-physical approach. IEEE Trans. Parallel Distrib. Syst. 19(11), 1458–1472 (2008)

53. Verma, A., Ahuja, P., Neogi, A.: pMapper: power and migration cost aware application placement in virtualized systems. In: Proc. of ACM/IFIP/USENIX Intl. Conf. on Middleware (MIDDLEWARE), Leuven, Belgium, pp. 243–264 (2008)

54. Verma, A., Ahuja, P., Neogi, A.: Power-aware dynamic placement of HPC applications. In: Proc. of Intl. Conf. on Supercomputing (ICS), Island of Kos, Greece, pp. 175–184 (2008)

55. Voorsluys, W., Broberg, J., Venugopal, S., Buyya, R.: Cost of virtual machine live migration in clouds: a performance evaluation. In: Proc. of Intl. Conf. on Cloud Computing (CloudCom), Beijing, China, pp. 254–265 (2009)

56. Wood, T., Tarasuk-Levin, G., Shenoy, P., Desnoyers, P., Cecchet, E., Corner, M.D.: Memory buddies: exploiting page sharing for smart colocation in virtualized data centers. In: Proc. of the ACM SIGPLAN/SIGOPS Conf. on Virtual Execution Environments (VEE), Washington, DC, pp. 31–40 (2009)

57. Zhu, Q., Zhu, J., Agrawal, G.: Power-aware consolidation of scientific workflows in virtualized environments. In: Proc. of Intl. Conf. on High Performance Computing, Networking, Storage and Analysis (SC), New Orleans, LA, pp. 1–12 (2010)