energy aware consolidation for cloud computing aware consolidation for cloud computing shekhar...

Energy Aware Consolidation for Cloud Computing

Shekhar SrikantaiahPennsylvania State University

Aman KansalMicrosoft Research

Feng ZhaoMicrosoft Research

Abstract

Consolidation of applications in cloud computing envi-ronments presents a significant opportunity for energyoptimization. As a first step toward enabling energy effi-cient consolidation, we study the inter-relationships be-tween energy consumption, resource utilization, and per-formance of consolidated workloads. The study revealsthe energy performance trade-offs for consolidation andshows that optimal operating points exist. We model theconsolidation problem as a modified bin packing prob-lem and illustrate it with an example. Finally, we outlinethe challenges in finding effective solutions to the con-solidation problem.

1 IntroductionOne of the major causes of energy inefficiency in datacenters is the idle power wasted when servers run at lowutilization. Even at a very low load, such as 10% CPUutilization, the power consumed is over 50% of the peakpower [1]. Similarly, if the disk, network, or any suchresource is the performance bottleneck, the idle powerwastage in other resources goes up. In the cloud comput-ing approach multiple data center applications are hostedon a common set of servers. This allows for consoli-dation of application workloads on a smaller number ofservers that may be kept better utilized, as different work-loads may have different resource utilization footprintsand may further differ in their temporal variations. Con-solidation thus allows amortizing the idle power costsmore efficiently.

However, effective consolidation is not as trivial aspacking the maximum workload in the smallest numberof servers, keeping each resource (CPU, disk, network,etc) on every server at 100% utilization. In fact such anapproach may increase the energy used per unit serviceprovided, as we show later.

Performing consolidation to optimize energy usagewhile providing required performance raises several con-cerns. Firstly, consolidation methods must carefully de-cide which workloads should be combinedon a com-mon physical server. Workload resource usage, perfor-mance, and energy usages are not additive. Understand-ing the nature of their composition is thus critical todecide which workloads can be packed together. Sec-ondly, there exists anoptimal performance and energypoint. This happens because consolidation leads to per-formance degradation that causes the execution time to

increase, eating into the energy savings from reduced idleenergy. Further, the optimal point changes with accept-able degradation in performance and application mix.Determining the optimal point and tracking it as work-loads change, thus becomes important for energy effi-cient consolidation.

This paper exposes some of the complexities in per-forming consolidation for power optimization, and pro-poses viable research directions to address the chal-lenges involved. We experimentally study how perfor-mance, energy usage, and resource utilization changesas multiple workloads with varying resource usages arecombined on common servers. We use these exper-imental insights to infer optimal operating points thatminimize energy usage with and without performanceconstraints. We then discuss consolidation as a modi-fied multi-dimensional bin-packing problem of allocat-ing and migrating workloads to achieve energy opti-mal operation. To concretely illustrate this research di-rection, we present a computationally efficient heuris-tic to perform consolidation in a simplified scenario.There are many issues that affect consolidation, includ-ing server and workload behavior, security restrictionsrequiring co-location of certain application components,and power line redundancy restrictions. The paper fo-cuses only on a manageable but important subspacespanned by CPU and disk resource combinations. Themany issues that need to be addressed to achieve a real-world implementation of such consolidation methods arealso discussed to help point out fruitful research direc-tions.

2 Understanding ConsolidationUnderstanding the impact of consolidating applicationson the key observable characteristics of execution, in-cluding resource utilization, performance, and energyconsumption, is important to design an effective consol-idation strategy.

To explore this impact, we used the experimental setupshown in Figure 1. A cloud consisting ofm = 4 physi-cal servers hostingk controlled applications services re-quests from controlled clients. Each client generates re-quests at a desired rate. Each request consists of jobswith specified levels of resource usage on each serverresource (here, each server is considered as comprisingof two resources: processor and disk). Each server isconnected to a power meter (WattsUp Pro ES) to track

power consumption. The Wattsup meter samples powerat a maximum rate of 1 Hz, and hence any desired utiliza-tion state must be sustained for more than 1 second. Wedesigned the request processing to maintain a uniformutilization state for 60 seconds and averaged the readingsover this period to get accurate energy data. The resourceutilization is tracked using the operating system’s builtin instrumentation accessed through the Xperf utility totrack processor and disk usage.�� !� �� "�

Figure 1:Experimental setup.

Consolidation influences utilization of resources in anon-trivial manner. Clearly, energy usage does not lin-early add when workloads are combined, due to a sig-nificant percentage of idle energy. But utilization andperformance also change in a non-trivial manner. Perfor-mance degradation occurs with consolidation because ofinternal conflicts among consolidated applications, suchas cache contentions, conflicts at functional units of theCPU, disk scheduling conflicts, and disk write bufferconflicts.

To study the impact of consolidation with multipleresources, we measured performance and energy whilevarying both CPU and disk utilizations. First, an appli-cation with 10% CPU utilization and 10% disk utiliza-tion is started. Then, it is combined with workloads ofvarying CPU and disk utilizations, ranging from 10% to90% in each resource. The results are plotted in Figure2(a). The performance degradation observed along theCPU utilization axis is not as significant as that observedalong the disk utilization axis implying that increasingdisk utilization is a limiting factor for consolidated per-formance on this server.

Energy consumptionper transactionof a consolidatedworkload is influenced by both resource utilization andperformance. Typical variation of energy per transactionwith utilization (of a single resource) can be expectedto result in a “U”-shaped curve. When the resource uti-lization is low, idle power is not amortized effectivelyand hence the energy per transaction is high. At highresource utilization on the other hand, energy consump-tion is high due to performance degradation and longerexecution time.

Figure 2(b) plots the energy consumption per transac-tion of the consolidated workload, for the same scenario.We can make a few important observations from this re-sult. Firstly, energy consumption per transaction is moresensitive to variations in CPU utilization (as seen by thedeeper “U” along the CPU axis) than variations in diskutilization as seen by the relatively flat shape of the curve

#$ %$ &$ '$ ($ )$ *$ +$ ,$$#$%$&$'$($)$*$+$,$

#$ %$ &$ '$ ($ )$ *$ +$ ,$-./012.23456463715807983

$:#$ #$:%$ %$:&$ &$:'$ '$:($ ($:)$ )$:*$ *$:+$ +$:,$

(a) Performance degradation

;< =< >< ?<@<<;A=B>C?

;< A< =< B< >< C< ?< D< @<EFGHIJKLMNGHOHPFQPROSTF

<U; ;UA AU= =UB BU> >UC CU?

(b) Energy consumptionFigure 2: Performance degradation and energy consumption withvarying combined CPU and disk utilizations.

along the axis of disk utilization. Secondly, there ex-ists an optimal combination of CPU and disk utilizationswhere the energy per transaction is minimum. This oc-curs at 70% CPU utilization and 50% disk utilization forthis setup, though the numbers could depend on the spe-cific machines and workloads used. This energy optimalcombination may vary if we further impose bounds onthe performance degradation tolerable to us, but we canalways find an optimal combination of resource utiliza-tions that minimizes energy per transaction taking intoconsideration both the resources. Note also that this op-timal point may be significantly different than that deter-mined by considering each resource in isolation.

3 Consolidation ProblemThe goal of energy aware consolidation is to keep serverswell utilized such that the idle power costs are efficientlyamortized but without taking an energy penalty due tointernal contentions.

2

The problem of loading servers to a desired utiliza-tion level for each resource may be naıvely modeled asa multi-dimensional bin packing problem where serversare bins with each resource (CPU, disk, network, etc) be-ing one dimension of the bin. The bin size along each di-mension is given by the energy optimal utilization level.Each hosted application with known resource utilizationscan be treated as an object with given size in each dimen-sion. Minimizing the number of bins should minimizethe idle power wastage. However, that is not true in gen-eral, causing the energy aware consolidation problem todiffer from traditional vector bin packing:Performance degradation: Unlike objects packed inbins, the objects here have a characteristic not modeledalong the bin dimensions: performance. When two ob-jects are packed together, their performance (such asthroughput) degrades. Since performance degradationincreases energy per unit work, minimizing the numberof bins may not necessarily minimize energy. This as-pect must be accounted for in the solution.Power variation: In a traditional bin packing problem,if the minimum number of bins isn, then filling upn−1bins completely and placing a small object in then-th binmay be an optimal solution. Here, even given the optimalnumber of bins, the actual allocation of the objects mayaffect the power consumed.

Designing an effective consolidation problem thus re-quires a new solution. The actual implementation mustalso address several other challenges discussed in sec-tion 4. Before discussing those issues, we consider anexample heuristic for consolidation, to help illustrate theproblem concretely (though not necessarily providing afinal or optimal solution).

3.1 Consolidation AlgorithmThis algorithm aims to find a minimal energy allocationof workloads to servers. Minimum energy consumptionoccurs at a certain utilization and performance. How-ever, that performance may not be within the desired per-formance constraints and hence we design the algorithmto minimize energy subject to a performance constraint.This constraint also helps account for the first differencementioned above from traditional bin-packing: if we setthe performance constraint to that needed for optimal en-ergy consumption, the algorithm will not attempt to min-imize the number of bins beyond the number that min-imizes power. If the performance constraint is tighterthan that required for optimal energy, the number of binsis automatically never reduced below the optimal.

Also, while energy and performance change in a non-trivial manner when workloads are combined, it is worthnoting that the resource utilizations themselves are ap-proximately additive, especially at utilizations that arelower than the optimal levels for each resource. Using

utilizations as the bin dimensions thus allows our algo-rithm to operate with a closed form calculation of binoccupancy when combining objects.

The algorithm proceeds as follows: The first step de-termines the optimal point, from profiling data such asshown in Figure 2(b). Next, as each request arrives, itis allocated to a server, resulting in the desired workloaddistribution across servers. The crux lies in using a sim-ple, efficient and scalable heuristic for bin packing. Theheuristic used here maximizes the sum of the Euclideandistances of the current allocations to the optimal point ateach server. This heuristic is based on the intuition thatwe can use both dimensions of a bin to the fullest (where“full” is defined as the optimal utilization point) after thecurrent allocation is done, if we are left with maximumempty space in each dimension after the allocation. Then-dimensional Euclidean distance (δe) provides a scalarmetric that depends on the distances to the optimal pointalong each dimension. If the request cannot be allocated,a new server is turned on and all requests are re-allocatedusing the same heuristic, in an arbitrary order. This ap-proach is adaptive to changing workloads (temporal vari-ations in requests) as we allocate newer requests to ap-propriate servers. It also works seamlessly in presenceof heterogeneity: in that case the optimal combinationswould be different for each type of server.

To illustrate the heuristic, consider the simple exam-ple shown in Table 1. The current status of two serversis shown: server A is currently executing at [30, 30],which represents 30% CPU utilization and 30% diskutilization, and Server B is at [40, 10]. A new re-quest with workload requirement [10,10] is to be allo-cated. Suppose the optimal point for both A and B is[80, 50]. If the application is allocated to A, the sum ofthe Euclidean distances to the respective optimal points,δAe ([40, 40]− [80, 50])+ δB

e ([40, 10]− [80, 50]) = 97.8,is greater than that achieved by allocating the request toB. Thus the request is allocated to A, leaving A runningat [40, 40] and B at [40, 10].

CPU Disk Opt CPU Opt Disk δe

∑δe

A orig 30 30 80 50 53.8 97.8A after 40 40 80 50 41.2B orig 40 10 80 50 56.6 96.2B after 50 20 80 50 42.4

Table 1:Scenario to illustrate proposed heuristic.

Note that an optimal algorithm may find better allo-cations. In the example above, as a result of the alloca-tion chosen, it is feasible to allocate subsequent requestssuch as [10, 40], [20, 40], [30, 40], [40, 30], and [40,40], but not requests such as [50, 10], [50, 20], whichwould have been feasible if the current request of [10,10]was allocated to B. At each decision point the heuristicchooses an allocation that would leave the system capa-

3

ble of servicing the widest range of future requests. Anoptimal scheme however, would choose the allocationsbased on all requests simultaneously rather than allocat-ing the current request without changing existing alloca-tions. In general, while it is possible to develop betterheuristic algorithms, the algorithm presented here servesas an illustration to show that energy savings are possibleover the base case. We reserve a more detailed study ofthe pathological cases for this algorithm (like extremelyrandom disk accesses) for future work.

The optimal scheme may have a very high computa-tional overhead, making it infeasible to be operated at therequest arrival rate. However, for the purpose of compar-ison, we implemented the optimal scheme based on anexhaustive search of all possible allocations for a verysmall cloud consisting of four servers.

# Apps Total CPU utilization Total disk utilizationMix1 6 84.87 85.86Mix2 6 93.72 53.87Mix3 6 78.79 150.58Mix4 6 91.37 108.92

Table 2:Various mixes of applications considered for our multipro-grammed workload.

3.2 Experimental EvaluationsTable 2 shows the characteristics of various mixes of ap-plications used in the comparison of the example heuris-tic with the optimal. The application mixes represent arange of consolidated workloads. Mix1 and Mix4 haveequal CPU and disk utilizations but vary in the abso-lute total resource used. Mix2 and Mix3 are skewed to-ward high CPU utilization and high disk utilization re-spectively. The energy consumption per transaction ofconsolidated workloads is experimentally measured us-ing the setup of Figure 1.

0123456

Mix-1 Mix-2 Mix-3 Mix-4

Energ

y (J) p

er tra

nsac

tion

Optimal Proposed

(a) tolerance = 20%.

0123456

Mix-1 Mix-2 Mix-3 Mix-4

Energ

y (J) p

er tra

nsac

tion

Optimal Proposed

(b) tolerance =∞.

Figure 3: Comparison of our heuristic scheme with the optimalscheme.

Energy per transaction with a maximum performancedegradation tolerance of 20% and infinite tolerance areplotted in Figures 3(a) and 3(b) respectively. Clearly,greater energy savings are achieved by both schemes athigher tolerance in performance degradation. The en-ergy used by the proposed heuristic is about 5.4% morethan optimal (not considering the energy spent on com-puting the optimal) on an average at 20% tolerance. It ap-

proaches the optimal at infinite tolerance in performancedegradation for the test scenario.

4 DiscussionThe consolidation problem discussed above helps ap-preciate some of the key aspects of the performance-utilization-energy relationship. Our ongoing effort in de-signing a solution has helped uncover several complica-tions, not addressed in the illustrative algorithm above,that lead to a rich set of research issues.

Multi-tiered Applications:Unlike the application de-scribed in the previous section, a realistic applicationmay use multiple servers such as a front-end request clas-sification server, a back-end database server to access re-quested data, an authentication server to verify relevantcredentials and so on. Such an application is commonlyreferred to as a multi-tiered application. Each tier has adifferent resource footprint and the footprints change ina correlated manner as the workload changes. The con-solidation problem involves allocating the multiple tiersof all applications across a subset of available physicalservers such that each application may operate at its de-sired performance level with minimum total energy us-age. The allocation may have to be adapted dynamicallyas workloads evolve over time, either via intelligent allo-cation of new requests, or via explicit migration of exist-ing state from one physical server to another.

Composability Profiles:We showed how the perfor-mance, resource utilization, and energy vary as multipleapplications are consolidated on a common server withvarying workload serviced. The utilization and perfor-mance was tracked using Xperf and power using an ex-ternal power meter. In a real cloud computing scenario,the overhead of running performance tracking and de-ploying external power meters may be unacceptable andbetter alternatives may be needed [4]. Also, measuringthe energy-performance relationship itself is non-trivialas the number of applications and the number of possi-ble combinations of their multiple tiers, across multipleserver types, may be very large. The use of extra runningtime for profiling of an application’s performance andenergy behavior with varying mixes may not be feasibledue to cost reasons. Run time variations may even renderprofiles less relevant. Thus, consolidation research alsoinvolves developing efficient methods to understand thecomposability behavior of applications at run time.

Migration and Resume Costs:While the examplein the previous section assumed an application servingsmall stateless requests, that is not the only type of ap-plications. Other applications may maintain a signifi-cant persistent state, such as a long lived chat session[1] or a video streaming session. Such applications in-volve a non-trivial overhead in migration from one ma-chine to another. Migration may involve initiating the re-

4

quired configuration and virtual machine needed for themigrating application, incurring additional costs. Also,the dynamic nature of allocation implies that the sleepand wake-up costs of the servers need to be considered.When multiple servers are running at low utilization dueto workload dwindling off, their applications may beconsolidated to fewer servers allowing some physicalservers to be set to sleep mode. However, the migrationcosts and sleep/wake-up costs may sometimes overshootthe benefit from shutdown of servers. Efficient meth-ods to dynamically adapt to the workload variations withrealistic migration and server resume costs for differentcategories of applications is an interesting problem.

Server Heterogeneity and Application Affinities:Weinitially assumed that any application may be hosted onany machine. However, in practice, certain applicationsmay have affinities for certain physical resources. For in-stance, it may be more efficient to run a database serverdirectly on the physical server that hosts the disk drives,since the database software may be optimized for the spe-cial read-write characteristics of disks and may directlyaccess the raw storage sectors on the disks. Runningthe database over network connected storage with addi-tional intermittent layers of storage drivers and networkfile system abstractions may make it inefficient. Asidefrom configuration, servers in a cloud facility may varyin energy-performancecharacteristics due to age of hard-ware. These variations are crucial to be exploited in con-solidation as the energy usage of an application woulddepend on which server is used and in what applicationmix. The consolidator may not always run an applica-tion on the server most efficient for it since a global opti-mization is performed. Migration and resume costs mayrestrict the application from migrating to a more efficientserver. Ensuring globally optimal operation in a dynamicand heterogeneous setting is an important problem.

Application Feedback:Our performance-energy studycontrolled the workload to measure energy and perfor-mance, assuming constant quality of service is providedat each performance level. However, certain applicationsmay change their behavior in response to resource avail-ability. For instance, a streaming application may be de-signed to lower the video streaming rate based on sys-tem load, the network stack may have built in conges-tion control, and so on. Consolidation may reduce re-source availability by increasing utilization, incurringanacceptable level of performance degradation to save en-ergy. However, if the application changes its behaviorin response to resource availability, such feedback mayresult in complex behavior that makes it hard to guar-antee the performance constraints. Designing consoli-dation methods for such applications or designing appli-cations to be consolidation-friendly is an interesting re-search challenge.

Additionally, the example methods discussed hereonly considered the CPU and disk resources. Other re-sources such as network and memory should also beconsidered, as these may be the bottleneck resourcesfor certain applications. Operational constraints such asco-location of certain data and components for security,power line redundancies, and cooling load distributionmay affect the design as well. Consolidation methodsmay need to account for several variations such as themultiple types of requests served by an applications -their expected processing times and the exact set of tiersused to service them. Some of the applications may useexternal services, not hosted by the cloud operator, fur-ther making the relationship more complex.

5 Related workConsolidation of resources and controlling resource uti-lizations has been researched in recent times. A frame-work for adaptive control of virtualized resources inutility computing environments is presented in [5]. Inthe past, cluster resource management has posed simi-lar problems to that of consolidation of virtualized re-sources. For instance, [7] proposes a two level resourcedistribution and scheduling scheme, and [8] proposes acluster level feedback control scheme for performanceoptimization. Power aware load balancing and migrationof activity within a processor, among multicore proces-sors and servers has also been studied in [3, 2, 6]. How-ever, there has been little work on joint power and perfor-mance aware schemes for multi-dimensional resource al-location. Specifically, characterizing power-performancecharacteristics of consolidated workloads with respect tomultiple resources has not been adequately addressed.

References

[1] CHEN, G., et al. Energy-aware server provisioning and loaddispatching for connection-intensive internet services.In NSDI(2008).

[2] GOMAA , M., et al. Heat-and-run: leveraging smt and cmp to man-age power density through the operating system.SIGOPS Oper.Syst. Rev. 38, 5 (2004).

[3] HEO, S., BARR, K., AND ASANOVIC, K. Reducing power den-sity through activity migration. InISLPED(2003).

[4] K ANSAL , A., AND ZHAO, F. Fine-grained energy profiling forpower-aware application design. InWorkshop on Hot Topics inMeasurement and Modeling of Computer Systems(2008).

[5] PADALA , P., et al. Adaptive control of virtualized resources inutility computing environments. InEuropean Conference on Com-puter Systems(2007).

[6] RANGANATHAN , P., et al. Ensemble-level power management fordense blade servers. InISCA(2006).

[7] SHEN, K., et al. Integrated resource management for cluster-basedinternet services.SIGOPS Oper. Syst. Rev. 36, SI (2002), 225–238.

[8] WANG, X., AND CHEN, M. Cluster-level feedback power controlfor performance optimization. InHPCA(2008).

5

energy aware consolidation for cloud computing aware consolidation for cloud computing shekhar...

Documents