

Power Control for Data Centers

Ming Chen
Oct. 8th, 2009

ECE 692 Topic Presentation

Why power control in Data Centers?

Power is one of the most important computing resources.

Facility over-utilized
− Dangerous: can lead to system failure and overheating.
− Power must stay below the facility capacity.

Facility under-utilized
− Power facilities are expensive; the investment should be economically amortized.
− Provision workloads so that the power facility is fully utilized.

SHIP: Scalable Hierarchical Power Control for Large-Scale Data Centers

Xiaorui Wang, Ming Chen (University of Tennessee, Knoxville, TN)
Charles Lefurgy, Tom W. Keller (IBM Research, Austin, TX)

Introduction

Power overload may cause system failures.
− Power provisioning CANNOT guarantee that overload never occurs.
− Over-provisioning may cause unnecessary expense.

Power control for an entire data center is necessary.

Data centers are expanding to meet new business requirements.
− Expanding the power facility is cost-prohibitive.
− Upgrades of power/cooling systems lag far behind.
− Example: the NSA data center.

Challenges

Scalability: can one centralized controller handle thousands of servers?

Coordination: if multiple controllers are designed, how do they interact with each other?

Stability and accuracy: workload is time-varying and unpredictable.

Performance: how to allocate power budgets among different servers, racks, etc.?

State of the Art

Reduce power by improving energy efficiency: [Lefurgy], [Nathuji], [Zeng], [Lu], [Brooks], [Horvath], [Chen]
− Does NOT enforce a power budget.

Power control for a single server [Lefurgy], [Skadron], [Minerick] or a rack [Wang], [Ranganathan], [Femal]
− Cannot be directly applied to entire data centers.

No “Power” Struggles presents a multi-level power manager [Raghavendra].
− NOT designed around the power supply hierarchy.
− NO rigorous overall stability analysis.
− Only simulation results, for 180 servers.

What is This Paper About?

SHIP: a highly Scalable Hierarchical Power control architecture for large-scale data centers
− Scalability: decompose the power control of a data center into three levels.
− Coordination: the hierarchy follows the power distribution system of the data center.
− Stability and accuracy: theoretically guaranteed by Model Predictive Control (MPC) theory.
− Performance: differentiate power budgets based on performance demands, i.e., utilization.

Power Distribution Hierarchy

A simplified example of a three-level data center:
− Data center level
− PDU level
− Rack level

Thousands of servers in total.

Control Architecture

[Figure: three-level control architecture. Each rack has a rack power controller (RPC) with a power monitor (PM) and, per server, a utilization monitor (UM) and frequency modulator (FM); each PDU has a PDU power controller with a PDU-level power monitor; a data-center-level controller sits on top.]

Rack level (from the authors' HPCA'08 paper):
− Controlled variable: the total power of the rack.
− Manipulated variable: the CPU frequency of each server.

PDU level (this paper):
− Controlled variable: the total power of the PDU.
− Manipulated variable: the power budget of each rack.

Data center level (this paper):
− Controlled variable: the total power of the data center.
− Manipulated variable: the power budget of each PDU.
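To make the division of labor concrete, here is a minimal sketch of how the three loops nest. This is illustrative only, not the authors' code: each loop uses a toy proportional or proportional-share rule in place of the paper's MPC controllers, and the data-center period is an assumed value (the rack and PDU periods follow the testbed settings reported later).

```python
# Toy sketch of the three-level hierarchy: budgets flow down, measured power flows up.
# Periods: 5 s (rack) and 30 s (PDU) follow the testbed settings reported later;
# the data-center period is an assumption for illustration.
RACK_PERIOD_S, PDU_PERIOD_S, DC_PERIOD_S = 5, 30, 120

def rack_loop(rack_power, rack_budget, cpu_freq):
    """Rack level: nudge the servers' CPU frequency toward the rack's power budget."""
    error = rack_budget - rack_power
    return cpu_freq + 0.01 * error           # toy proportional step (the paper uses MPC)

def pdu_loop(rack_powers, pdu_budget):
    """PDU level: split the PDU budget among its racks in proportion to demand."""
    total = sum(rack_powers)
    return [pdu_budget * p / total for p in rack_powers]

def dc_loop(pdu_powers, dc_budget):
    """Data-center level: split the data-center budget among its PDUs."""
    total = sum(pdu_powers)
    return [dc_budget * p / total for p in pdu_powers]

print(dc_loop([350.0, 420.0, 280.0], dc_budget=1000.0))   # per-PDU budgets (kW, assumed)
```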

PDU-level Power Model

System model:
$$pp(k+1) = pp(k) + \sum_{i=1}^{N} \Delta pr_i(k)$$

Uncertainties:
$$\Delta pr_i(k) = g_i \, \Delta br_i(k)$$
where $g_i$ is the power change ratio.

Actual model:
$$pp(k+1) = pp(k) + [\,g_1 \ \dots \ g_N\,]\,[\,\Delta br_1(k) \ \dots \ \Delta br_N(k)\,]^T$$

Notation: $pp(k)$ is the total power of the PDU; $\Delta pr_i(k)$ is the power change of rack $i$; $\Delta br_i(k)$ is the change of the power budget for rack $i$.
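A small numerical sketch of this model (all values are assumed; in practice the $g_i$ are unknown a priori and the controller is designed as if $g_i = 1$):

```python
import numpy as np

# PDU-level power model: pp(k+1) = pp(k) + [g_1 ... g_N] * [dbr_1(k) ... dbr_N(k)]^T
# The g_i ("power change ratios") are uncertainties; values below are made up.

g = np.array([0.9, 1.2, 1.0])        # actual power change ratios of 3 racks (assumed)
pp = 900.0                           # current total PDU power in watts (assumed)
dbr = np.array([-20.0, 15.0, 5.0])   # budget changes handed to the racks this period (W)

pp_next = pp + g @ dbr               # predicted PDU power after one control period
print(pp_next)                       # 900 + (-18 + 18 + 5) = 905.0 W
```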

Model Predictive Control (MPC)

Design steps:
− Design a dynamic model for the controlled system.
− Design the controller.
− Analyze the stability and accuracy.

Control objective:
$$\min_{\{br_j(k)\,\mid\,1 \le j \le N\}} \left( pp(k+1) - P_s \right)^2$$
subject to
$$P_{min,j} \le br_j(k) \le P_{max,j}, \quad 1 \le j \le N$$
$$pp(k+1) \le P_s$$
where $P_s$ is the power set point (budget) of the PDU.

MPC Controller Design

[Figure: the MPC controller pipeline. A least-squares solver minimizes a cost function built from the system model, a reference trajectory, and the constraints. Inputs: the power budget $P_s$ and the measured power $pp(k)$; outputs: the budget changes $\Delta br_1(k) \dots \Delta br_N(k)$.]

Cost function:
$$V(k) = \sum_{i=1}^{P} \| pp(k+i|k) - ref(k+i|k) \|^2_{Q(i)} + \sum_{i=0}^{M-1} \| \Delta br(k+i|k) + br(k+i|k) - P_{max} \|^2_{R(i)}$$

The first term is the tracking error: $ref(k+i|k)$ is an ideal trajectory along which the PDU power should converge to the budget $P_s$. The second term is the control penalty, which keeps the allocated rack budgets close to the racks' estimated maximum power consumptions $P_{max}$ (a simplified numerical sketch follows below).
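A rough sense of how the least-squares solver block could look in practice: a minimal one-step sketch with assumed numbers. The paper uses longer prediction/control horizons, the utilization-based weights from the cost function, and also constrains the rack budgets to sum to no more than the PDU budget, which would require a general QP solver rather than the box-bounded least squares used here.

```python
import numpy as np
from scipy.optimize import lsq_linear

# One-step sketch of the PDU-level MPC problem as a bounded least-squares solve.
# All numbers are assumed, not taken from the paper.

g_design = np.ones(3)                  # g_i assumed to be 1 at design time
pp = 940.0                             # measured PDU power (W)
Ps = 900.0                             # PDU power set point / budget (W)
br = np.array([320.0, 310.0, 310.0])   # current rack budgets (W)
br_min, br_max = 200.0, 360.0          # per-rack budget limits (W)

q, r = 1.0, 0.1                        # tracking weight Q and control penalty R

# Stack the tracking-error row and the control-penalty rows: min ||A x - b||^2
A = np.vstack([np.sqrt(q) * g_design, np.sqrt(r) * np.eye(3)])
b = np.concatenate([[np.sqrt(q) * (Ps - pp)], np.zeros(3)])

res = lsq_linear(A, b, bounds=(br_min - br, br_max - br))
print(br + res.x)                      # new rack budgets for the next control period
```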

Stability

Local stability
− g_i is assumed to be 1 at design time, but is unknown a priori.
− The controller remains stable for 0 < g_i < 14.8, i.e., as long as the actual power change is less than 14.8 times the allocated budget change.

Global stability
− Decouple controllers at different levels by running them on different time scales.
− The period of an upper-level control loop must be longer than the settling time of the loop below it.
− This condition is sufficient but not necessary.

System Implementation

Physical testbed
− 10 Linux servers
− Power meter (Wattsup)
  • error: 1.5%
  • sampling period: 1 sec
− Workload: HPL, SPEC
− Controllers: period of 5 s for the rack level, 30 s for the PDU level

Simulator (C++)
− Simulates large-scale data centers at all three levels.
− Utilization trace files from 5,415 servers in real data centers.
− Power model based on experiments on real servers.

Precise Power Control (Testbed)

[Figure: power over time (0 to 2400 s). Left: total PDU power (600 to 1200 W) converges to the budget. Right: the power of Rack 1, Rack 2, and Rack 3 (200 to 360 W) converges to their budgets.]

Power can be precisely controlled at the budget.
The budget can be reached within 4 control periods.
The power of each rack is controlled at its budget.
Budgets are proportional to $P_{max}$.
Tested for many power set points (see the paper for more results).

Power Differentiation (Testbed)

[Figure: power budgets of Rack 1 and Rack 2 over time (0 to 2400 s, 200 to 320 W). Initially, budgets are allocated in proportion to the estimated maximum consumptions; later, budgets are differentiated by utilization (rack CPU utilizations: 100%, 80%, 50%).]

Capability to differentiate budgets based on workload, to improve performance.
Utilization is used as the optimization weight.
Other possible differentiation metrics: response time, throughput.

Simulation for Large-scale Data Centers

[Figure, left: total data-center power vs. time (in control periods) for a simulated data center with 6 PDUs and 270 racks, driven by real data traces; the power converges to the 750 kW set point.]

[Figure, right: controlled power vs. power set point (600 to 780 kW) for 3 randomly generated data centers driven by real data traces; each tracks its set point.]

Budget Differentiation for PDUs

[Figure, left: CPU utilization (%) of PDU1 to PDU6 over 14 control periods. Right: the difference (kW) between each PDU's allocated budget and its estimated maximum power consumption over the same periods, with PDU5 and PDU2 highlighted.]

Power differentiation in large-scale data centers:
− Minimize the difference from the estimated maximum power consumption.
− Utilization is the weight.
− The order of the differences is consistent with the order of the utilizations.

Scalability of SHIP

[Figure: execution time of the MPC controller vs. the number of servers (0 to 3,000), for a single centralized controller and for SHIP. The centralized controller takes 0.09 s for 50 servers and 0.39 s for 100, and its execution time grows rapidly (65.9 s, 452.1 s, 3,223.6 s, and 10,997.5 s at larger scales, the last marking the maximum scale of the centralized approach), while the overhead of SHIP remains small.]

Centralized vs. SHIP:
− Levels: one vs. multiple
− Computation overhead: large vs. small
− Communication overhead: long vs. short
− Scalability: no vs. yes

Conclusion

SHIP: a highly Scalable HIerarchical Power control architecture for large-scale data centers
− Three levels: rack, PDU, and data center.
− MIMO controllers based on optimal control theory (MPC).
− Theoretically guaranteed stability and accuracy.
− Discussion of coordination among controllers.

Experiments on a physical testbed and a simulator
− Precise power control.
− Budget differentiation.
− Scalable to large-scale data centers.

Power Provisioning for a Warehouse-sized Computer

Xiaobo Fan, Wolf-Dietrich Weber, Luiz Andre Barroso

Acknowledgments: the organization and contents of some slides are based on Xiaobo Fan's slides (PDF).

Introduction

Strong economic incentives to fully utilize power facilities
− The investment is best amortized.
− Capacity upgrades are possible without any new power facility investment.

Power facilities cost $10-$20 per watt and are amortized over roughly 10 to 18 years, while electricity costs less than $0.8 per watt-year, so facility utilization (e.g., 0.5 vs. 0.85) largely determines how well that investment pays off.

Over-subscribing the facility runs the risk of outages or costly violations of SLAs.

The problem: power provisioning given the budget.

Reasons for Facility Under-utilization

Staged deployment
− New facilities are rarely fully populated.

Fragmentation

Conservative machine power rating (nameplate)

Statistical effects
− The larger the machine population, the lower the probability of simultaneous peaks.

Variable load

What is This Paper About?

Investigate the potential of over-subscription to increase power facility utilization.
− A lightweight and accurate model for estimating power.
− Long-term characterization of the simultaneous power usage of a large number of machines.

Study of techniques for saving energy as well as peak power:
− Power capping (physical testbed)
− DVS (simulation)
− Reducing idle power (simulation)

Data Center Power Distribution

[Figure: typical power distribution hierarchy. The main supply feeds a transformer, ATS, and switchboard (about 1,000 kW), backed by a generator and UPS units; STS/PDUs (about 200 kW) feed panels (about 50 kW); each rack circuit carries about 2.5 kW. Rack level: 40-80 servers; PDU level: 20-40 racks; data center level: 5-10 PDUs.]

Power Estimation Model

The model is built per family of machines; the main interest is the power of a group of machines. Direct measurements are not always available.

Input: CPU utilization u.

Models (sketched below):
− $P = P_{idle} + (P_{busy} - P_{idle})\,u$
− $P = P_{idle} + (P_{busy} - P_{idle})\,(2u - u^r)$
− Measure and derive $\langle P_{idle}, P_{busy}, r \rangle$ for each machine family.
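A minimal sketch of the two models with made-up calibration constants (the paper measures and derives the parameters per machine family):

```python
# Sketch of the two CPU-utilization-based power models from this slide.
# Parameter values are illustrative assumptions, not measured values.

P_IDLE = 150.0     # watts at idle (assumed)
P_BUSY = 250.0     # watts at full utilization (assumed)
R = 1.4            # empirically fitted exponent (assumed value)

def power_linear(u):
    """Linear model: P_idle + (P_busy - P_idle) * u, with u in [0, 1]."""
    return P_IDLE + (P_BUSY - P_IDLE) * u

def power_empirical(u):
    """Empirical model: P_idle + (P_busy - P_idle) * (2u - u^r)."""
    return P_IDLE + (P_BUSY - P_IDLE) * (2 * u - u ** R)

print(power_linear(0.5), power_empirical(0.5))   # e.g. 200.0 W vs. about 212 W
```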

Model Validation

PDU-level validation example (800 machines).

Almost constant offset between the model and the meter
− Caused by loads not accounted for in the model, e.g., networking equipment.

The relative error is below 1%:
$$error = \frac{1}{n} \sum_{i} \frac{\left| EM_i - c - M_i \right|}{M_i}$$
where $EM_i$ is the estimated power, $M_i$ the measured power, and $c$ the constant offset.
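A small sketch of this validation computation on placeholder data (the traces and the offset below are assumptions, not the paper's measurements):

```python
import numpy as np

# Sketch of the PDU-level validation: compare estimated vs. metered power after
# removing a constant offset c (unmodeled loads such as networking equipment).

estimated = np.array([410.0, 455.0, 498.0, 470.0])   # model output per interval (kW), assumed
measured  = np.array([425.0, 470.0, 512.0, 486.0])   # PDU meter readings (kW), assumed

c = np.mean(estimated - measured)                    # constant offset between the two
rel_error = np.mean(np.abs(estimated - c - measured) / measured)
print(rel_error)                                     # well below 1% for this toy data
```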

Analysis Setup

Data center setup
− More than 5,000 servers selected for each workload.
− Rack: 40 machines; PDU: 800 machines; cluster: 5,000+ machines.

Monitoring period: 6 months, sampled every 10 minutes.

Distribution of power usage
− Aggregate power at each time interval at different levels.
− Normalize to the aggregated peak power.

Workloads:
− Websearch: online serving, computation-intensive, correlated with time of day.
− Webmail: disk I/O intensive.
− Mapreduce: offline batch jobs, with less correlation between activity and time of day.
− Real data center: machines picked randomly from the data centers.
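The characterization slides that follow aggregate power at these three granularities. A sketch of the aggregation step, using synthetic per-machine traces instead of the paper's 6-month data (group sizes follow the slide: 40 per rack, 800 per PDU, 5,000+ per cluster):

```python
import numpy as np

# Sketch of the characterization methodology: aggregate per-machine power at
# rack / PDU / cluster granularity per time interval, then normalize to the
# aggregated peak. Synthetic traces stand in for the paper's real data.

rng = np.random.default_rng(0)
n_machines, n_intervals = 5000, 1000
peak_per_machine = 250.0                                        # watts (assumed)
power = rng.uniform(125, 250, size=(n_machines, n_intervals))   # per-machine power traces

def normalized_aggregate(power, group_size):
    """Sum power over groups of `group_size` machines; normalize to the aggregated peak."""
    usable = (power.shape[0] // group_size) * group_size
    groups = power[:usable].reshape(-1, group_size, power.shape[1]).sum(axis=1)
    return groups / (group_size * peak_per_machine)

for name, size in [("rack (40)", 40), ("PDU (800)", 800), ("cluster (5000)", 5000)]:
    x = normalized_aggregate(power, size)
    # Larger aggregates show a narrower dynamic range and a lower normalized peak.
    print(f"{name}: min {x.min():.3f}, max {x.max():.3f}")
```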

Webmail

[Figure: CDFs of normalized power at rack, PDU, and cluster level, with peaks of roughly 92%, 88%, and 86% of aggregated peak; the cluster-level range is about 72% to 86%, while the rack-level range extends down to about 65%.]

The higher the level, the narrower the range.
− It is more difficult to improve facility utilization at lower levels.

The peak lowers as more machines are aggregated.
− 16% more machines could be deployed.

Websearch

[Figure: CDFs of normalized power at rack, PDU, and cluster level, with peaks of roughly 98% and 93% of aggregated peak; the cluster-level range is about 52% to 93%, while the rack-level range extends down to about 45%.]

The peak lowers as more machines are aggregated.
− 7% more machines could be deployed.

The higher the level, the narrower the range.
− It is more difficult to improve facility utilization at lower levels.

Real Data Centers

Clusters have a much narrower dynamic range than racks.

Clusters peak at 72% of aggregated peak.
− 39% more machines could be deployed.

Mapreduce shows similar results.

Summary of Characterization

Workload          Avg power   Power range   Machine increase
Websearch         68%         52%-93%       7%
Webmail           78%         72%-86%       16%
Mapreduce         70%         54%-90%       11%
Real data center  60%         51%-72%       39%

Average power: how well the power facility is utilized.
Dynamic range: how difficult it is to improve facility utilization.
Peak power: the potential for over-subscribed deployment.

Power Capping

[Figure: CDF of power. Capping at a value below the observed peak trades a small fraction of time spent in power capping for a substantial saving in peak power.]

A small fraction of time spent in power capping yields a substantial saving in peak power.

Power capping also provides a safety valve when the workload behaves unexpectedly.
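A sketch of this trade-off on a synthetic power trace (the trace and the candidate caps are assumptions; the paper uses 6 months of real traces):

```python
import numpy as np

# Pick a cap from the CDF of a power trace and report the peak reduction
# vs. the fraction of time spent capped.

rng = np.random.default_rng(1)
trace = rng.normal(650, 60, size=10_000).clip(400, 850)   # cluster power (kW), synthetic

for percentile in (99, 98, 95):
    cap = np.percentile(trace, percentile)
    time_capped = np.mean(trace > cap)              # fraction of intervals above the cap
    peak_saving = 1 - cap / trace.max()             # peak power saved by provisioning at the cap
    print(f"cap at p{percentile}: {time_capped:.1%} of time capped, "
          f"{peak_saving:.1%} lower peak")
```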

Results for Power Capping

Applicable to workloads with loose SLAs or low priority; Websearch and Webmail are therefore excluded. Implemented by de-scheduling tasks or by DVFS.

CPU Voltage/Frequency Scaling (DVS)

Motivation
− A large portion of dynamic power is consumed by the CPU.
− DVS is widely available in modern CPUs.

[Figure: CPU power vs. utilization, with a utilization threshold below which DVS is triggered.]

Method (sketched below)
− Oracle-style policy.
− Thresholds: 5%, 20%, 50%.
− Simulation.
− CPU power is halved when DVS is triggered.
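A sketch of the oracle-style policy described above, on synthetic data (the utilization trace and the CPU/non-CPU power split are assumptions, not the paper's model):

```python
import numpy as np

# Oracle-style DVS policy: whenever CPU utilization falls below a threshold,
# CPU power is halved for that interval; non-CPU power is unaffected.

rng = np.random.default_rng(2)
util = rng.beta(2, 5, size=10_000)          # per-interval CPU utilization (synthetic)
cpu_power = 80 + 120 * util                 # watts attributed to the CPU (assumed model)
other_power = 120.0                         # non-CPU power, unaffected by DVS (assumed)

def with_dvs(threshold):
    """Halve CPU power in every interval whose utilization is below the threshold."""
    scaled = np.where(util < threshold, cpu_power / 2, cpu_power)
    return scaled + other_power

baseline = cpu_power + other_power
for threshold in (0.05, 0.20, 0.50):
    p = with_dvs(threshold)
    print(f"threshold {threshold:.0%}: energy -{1 - p.mean() / baseline.mean():.1%}, "
          f"peak -{1 - p.max() / baseline.max():.1%}")
```

As the slide that follows notes, the energy savings come out larger than the peak reductions, since high-utilization intervals (which set the peak) are left untouched.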


Results for DVS

Energy savings are larger than the peak power reductions.

The biggest savings are in data centers.

The benefits vary with the workload.

Lower Idle Power

Motivation
− Idle power is high (more than 50% of peak).
− Most of the time is spent at non-peak activity levels.

What if idle power were 10% of peak instead of 60%, keeping peak power unchanged?
− Evaluated in simulation.

[Figure: CPU power vs. utilization for the two idle-power levels, 0.6 and 0.1 of peak.]
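A quick worked example of why this helps, under the simple linear power model from the earlier estimation slide (the 30% utilization operating point is an assumed value for illustration):

$$P(u) = P_{idle} + (P_{peak} - P_{idle})\,u$$
$$P_{idle} = 0.6\,P_{peak}: \quad P(0.3) = 0.6 + 0.4 \times 0.3 = 0.72\,P_{peak}$$
$$P_{idle} = 0.1\,P_{peak}: \quad P(0.3) = 0.1 + 0.9 \times 0.3 = 0.37\,P_{peak}$$

Power at this low-activity point roughly halves, while $P(1) = P_{peak}$ is unchanged.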


Conclusions

Power provisioning is important to amortize facility investment.

Load variation and statistical effects lead to facility under-utilization.

Over-subscribing deployments is more attractive at the cluster level than at the rack level.

Three simple strategies to improve facility utilization: power capping, DVS, and lower idle power

Comparison of the Two Papers

Target: both papers address the power capacity of data centers.

Goal
− SHIP: control power to the budget to avoid facility over-utilization.
− Power Provisioning: give power provisioning guidelines to avoid facility under-utilization.

Methodology
− SHIP: MIMO optimal control.
− Power Provisioning: statistical analysis.

Solutions
− SHIP: a complete control-based solution.
− Power Provisioning: strategies suggested based on analysis of real data.

Experiments
− SHIP: physical testbed and simulation based on real trace files.
− Power Provisioning: detailed analysis of real trace files, plus simulations.

Critiques

Paper 1
− The workload is not typical of real data centers.
− The power model could also include CPU utilization.
− No convincing baseline is compared against.

Paper 2
− Power provisioning vs. performance violations is not quantified.
− The power model is workload-sensitive.
− What is the estimation accuracy at the rack level?
− Quantitative analysis of idle power and peak power reduction is missing.

Thank you!