The past, present, and future of Green Computing
Kirk W. Cameron, SCAPE Laboratory, Virginia Tech


Page 1: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Kirk W. Cameron
SCAPE Laboratory
Virginia Tech

The past, present, and future of Green Computing

Page 2: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Enough About Me

• Associate Professor, Virginia Tech
• Co-founder, Green500
• Co-founder, MiserWare
• Founding member, SPECpower
• Consultant for EPA Energy Star for Servers
• IEEE Computer "Green IT" columnist
• Over $4M in federally funded "Green" research
• SystemG supercomputer

Page 3: Kirk W. Cameron SCAPE Laboratory Virginia Tech

What is SCAPE?

• Scalable Performance Laboratory
– Founded in 2001 by Cameron
• Vision
– Improve the efficiency of high-end systems
• Approach
– Exploit and create technologies for high-end systems
– Conduct quality research to solve important problems
– When appropriate, commercialize technologies
– Educate and train the next generation of HPC/CS researchers

Page 4: Kirk W. Cameron SCAPE Laboratory Virginia Tech

The Big Picture (Today)

• Past: Challenges
– Need to measure and correlate power data
– Save energy while maintaining performance
• Present
– Software/hardware infrastructure for power measurement
– Intelligent power management (CPU MISER, Memory MISER)
– Integration with other toolkits (PAPI, Prophesy)
• Future: Research + Commercialization
– Management Infrastructure for Energy Reduction
– MiserWare, Inc.
– Holistic power management

Page 5: Kirk W. Cameron SCAPE Laboratory Virginia Tech


1882 - 2001

Page 6: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Prehistory

• Embedded systems
• General-purpose microarchitecture
– Circa 1999, power becomes a disruptive technology
– Moore's Law + clock-frequency arms race
– Simulators emerge (e.g., Princeton's Wattch)
– Related work continues today (CMPs, SMT, etc.)

1882 - 2001

Page 7: Kirk W. Cameron SCAPE Laboratory Virginia Tech


2002

Page 8: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Server Power

• IBM Austin
– Energy-aware commercial servers [Keller et al.]
• LANL
– Green Destiny [Feng et al.]
• Observations
– IBM targets commercial apps
– Feng et al. achieve power savings in exchange for performance loss

2002

Page 9: Kirk W. Cameron SCAPE Laboratory Virginia Tech

HPC Power

• My observations
– Power will become disruptive to HPC
– Laptops are outselling PCs
– Commercial power-aware techniques are not appropriate for HPC

2002

At roughly $800,000 per megawatt per year, annual electricity cost scales directly with power draw (e.g., 12 MW x $800,000/MW-yr = $9.6 million/yr):

System                      Power (MW)    Cost per year
Thinking Machines CM-5      0.005         $4,000
Residential A/C             0.015         $12,000
Intel ASCI Red              0.850         $680,000
High-speed train            10            $8 million
Earth Simulator             12            $9.6 million

(For scale, a conventional power plant generates about 300 megawatts.)

Page 10: Kirk W. Cameron SCAPE Laboratory Virginia Tech

HPPAC Emerges

• SCAPE Project
– High-performance, power-aware computing
– Two initial goals
• Measurement tools
• Power/energy savings
– Big goals… no funding (risk all startup funds)

2002

Page 11: Kirk W. Cameron SCAPE Laboratory Virginia Tech


2003 - 2004

Page 12: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Cluster Power

• IBM Austin
– On evaluating request-distribution schemes for saving energy in server clusters, ISPASS '03 [Lefurgy et al.]
– Improving server performance on transaction-processing workloads by enhanced data placement, SBAC-PAD '04 [Rubio et al.]
• Rutgers
– Energy conservation techniques for disk array-based servers, ICS '04 [Bianchini et al.]
• SCAPE
– High-performance, power-aware computing, SC04
– Power measurement + power/energy savings

2003 - 2004

Page 13: Kirk W. Cameron SCAPE Laboratory Virginia Tech

PowerPack Framework

[Architecture diagram] PowerPack combines hardware power/energy profiling with software power/energy control on a high-performance, power-aware cluster:
• Hardware profiling: per-node multi-meters measure DC power from the power supply, and a Baytech power strip with its management unit measures AC power from the outlet.
• Data collection: a multi-meter control thread coordinates one measurement (MM) thread per meter; samples flow into a data log, data repository, and analysis stage to produce power/energy profiling data.
• Software control: PowerPack libraries (profile/control) sit beneath applications and microbenchmarks; a DVS control thread coordinates per-node DVS threads.

PowerPack measurement: scalable, synchronized, and accurate.

2003 - 2004
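The crux of the measurement side is keeping many sampling threads aligned with the application timeline so power samples can later be correlated with code regions. The sketch below only illustrates that pattern; meter_read_watts() is a hypothetical stand-in for the real multi-meter driver, and this is not the PowerPack implementation.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the real multi-meter read call. */
static double meter_read_watts(int meter_id) { return 30.0 + meter_id; }

static atomic_int sampling = 1;   /* cleared by the control thread to stop all MM threads */

typedef struct { int meter_id; FILE *log; } mm_arg_t;

/* One "MM thread" per meter: poll at a fixed rate, timestamping each sample
 * against a shared clock so traces from different meters/nodes can be aligned. */
static void *mm_thread(void *p) {
    mm_arg_t *a = p;
    while (atomic_load(&sampling)) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        fprintf(a->log, "%ld.%09ld meter%d %.2f W\n",
                (long)ts.tv_sec, ts.tv_nsec, a->meter_id,
                meter_read_watts(a->meter_id));
        usleep(10000);            /* ~100 samples per second */
    }
    return NULL;
}

int main(void) {
    pthread_t t[2];
    mm_arg_t arg[2] = { {0, stdout}, {1, stdout} };
    for (int i = 0; i < 2; i++)          /* control thread: launch one MM thread per meter */
        pthread_create(&t[i], NULL, mm_thread, &arg[i]);
    sleep(1);                            /* ...run the instrumented code segment here... */
    atomic_store(&sampling, 0);          /* stop sampling and flush logs */
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    return 0;
}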

Page 14: Kirk W. Cameron SCAPE Laboratory Virginia Tech

After frying multiple components…

Page 15: Kirk W. Cameron SCAPE Laboratory Virginia Tech

PowerPack Framework (DC Power Profiling)

Multi-meters + 32-node Beowulf cluster

if (node .eq. root) then
   call pmeter_init(xmhost, xmport)      ! connect to the meter host and open a new log
   call pmeter_log(pmlog, NEW_LOG)
endif

<CODE SEGMENT>

if (node .eq. root) then
   call pmeter_start_session(pm_label)   ! label the profile for the following code segment
endif

<CODE SEGMENT>

if (node .eq. root) then
   call pmeter_pause()                   ! stop measurement, close the log, clean up
   call pmeter_log(pmlog, CLOSE_LOG)
   call pmeter_finalize()
endif

Page 16: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Power Profiles – Single Node

• The CPU is typically the largest consumer of power (under load)

Power consumption distribution, system idle vs. memory-performance-bound 171.swim:

Component        Idle (39 W total)    171.swim (59 W total)
CPU              14%                  35%
Memory           10%                  16%
Disk             11%                  7%
NIC              1%                   1%
Other chipset    8%                   5%
Fans             23%                  15%
Power supply     33%                  21%
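To read the shares in absolute terms, multiply each percentage by the quoted system power; the short sketch below does exactly that, assuming nothing beyond the numbers on this slide.

#include <stdio.h>

/* Convert the measured shares above into absolute watts, assuming each share
 * applies to the quoted total system power (39 W idle, 59 W under 171.swim). */
int main(void) {
    const char *part[] = {"CPU", "Memory", "Disk", "NIC", "Other chipset", "Fans", "Power supply"};
    double idle_pct[] = {14, 10, 11, 1, 8, 23, 33};
    double swim_pct[] = {35, 16, 7, 1, 5, 15, 21};
    double idle_total = 39.0, swim_total = 59.0;

    printf("%-14s %10s %10s\n", "Component", "Idle (W)", "swim (W)");
    for (int i = 0; i < 7; i++)
        printf("%-14s %10.1f %10.1f\n", part[i],
               idle_pct[i] / 100.0 * idle_total,
               swim_pct[i] / 100.0 * swim_total);
    return 0;
}

For example, CPU power grows from about 5.5 W at idle to roughly 20.7 W under 171.swim, while power-supply loss stays near 12–13 W in both cases.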

Page 17: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Power Profiles – Single Node

[Figure: power consumption (0–40 W) of CPU, memory, disk, and NIC for different workloads: idle, 171.swim (memory-bound), 164.gzip (CPU-bound), cp (disk-bound), and scp (network-bound). Note: only the power consumed by the CPU, memory, disk, and NIC is considered here.]

Page 18: Kirk W. Cameron SCAPE Laboratory Virginia Tech

NAS PB FT – Performance Profiling

[Figure: the FT timeline alternates compute phases with reduce (comm) and all-to-all (comm) phases.]

About 50% of the time is spent in communication.

Page 19: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Power Profile of FT Benchmark (class B, NP=4)

[Figure: CPU, memory, disk, and NIC power (0–30 W) over time (0–200 s), annotated with startup, initialize, and iterations 1–3.]

Power profiles reflect performance profiles.

Page 20: Kirk W. Cameron SCAPE Laboratory Virginia Tech

One FFT Iteration

[Figure: CPU and memory power (0–30 W) during one FT iteration (roughly 110–150 s), aligned with the phases evolve, fft (cffts1, cffts2), and transpose_x_yz (transpose_local, mpi_all-to-all with send-recv/wait, transpose_finish).]

Page 21: Kirk W. Cameron SCAPE Laboratory Virginia Tech

2005 - Present

Page 22: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Intuition confirmed

2005 - Present

Page 23: Kirk W. Cameron SCAPE Laboratory Virginia Tech

HPPAC Tool Progress

• PowerPack
– Modularized PowerPack and SysteMISER
– Extended analytics for broader applicability
– Extended to support thermals
• SysteMISER
– Improved analytics to weigh tradeoffs at runtime
– Automated cluster-wide DVS scheduling
– Support for automated power-aware memory

2005 - Present

Page 24: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Predicting CPU Power

[Figure: estimated vs. measured CPU power (0–30 W) over time (0–100 s).]

2005 - Present
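The slides do not spell out the estimator, but a common approach in this line of work is a linear model over performance-event rates on top of an idle baseline. The sketch below only illustrates that idea; the event choices and coefficients are invented placeholders, not PowerPack's fitted values.

#include <stdio.h>

/* Illustrative linear power model: P_cpu ~ P_idle + sum_i (c_i * rate_i),
 * where rate_i are per-second performance-event rates sampled over an
 * interval.  The coefficients are placeholders; in practice they would be
 * fit against measured power traces. */
typedef struct {
    double p_idle;        /* baseline CPU power in watts */
    double c_instr;       /* watts per (billion retired instructions / s) */
    double c_l2miss;      /* watts per (million L2 misses / s) */
} cpu_power_model;

static double estimate_cpu_power(const cpu_power_model *m,
                                 double ginstr_per_s, double ml2miss_per_s)
{
    return m->p_idle + m->c_instr * ginstr_per_s + m->c_l2miss * ml2miss_per_s;
}

int main(void)
{
    cpu_power_model m = { .p_idle = 10.0, .c_instr = 6.0, .c_l2miss = 0.5 };  /* placeholders */
    /* Example interval: 1.8 Ginstr/s and 4 M L2 misses/s observed via counters. */
    printf("estimated CPU power: %.1f W\n", estimate_cpu_power(&m, 1.8, 4.0));
    return 0;
}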

Page 25: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Predicting Memory Power

[Figure: estimated vs. measured memory power (0–12 W) over time (0–100 s).]

2005 - Present

Page 26: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Correlating Thermals – BT

2005 - Present

Page 27: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Correlating Thermals – MG

2005 - Present

Page 28: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Tempest Results – FT

2005 - Present

Page 29: Kirk W. Cameron SCAPE Laboratory Virginia Tech

SysteMISER

• Our software approach to reducing energy
– Management Infrastructure for Energy Reduction
• Power/performance
– measurement
– prediction
– control

The Heat Miser.

2005 - Present

Page 30: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Power-Aware DVS Scheduling Strategies

CPUSPEED daemon:
[example]$ start_cpuspeed
[example]$ mpirun -np 16 ft.B.16

Internal scheduling:
MPI_Init();
<CODE SEGMENT>
setspeed(600);
<CODE SEGMENT>
setspeed(1400);
<CODE SEGMENT>
MPI_Finalize();

External scheduling:
[example]$ psetcpuspeed 600
[example]$ mpirun -np 16 ft.B.16

NEMO & PowerPack Framework for saving energy

2005 - Present
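The setspeed() calls above come from the SCAPE tooling. As a rough illustration of what such a call can reduce to on Linux, the sketch below writes a target frequency to the standard cpufreq sysfs interface; it assumes the "userspace" governor is active and the process may write the file, and it is not the actual CPU MISER code.

#include <stdio.h>

/* Minimal setspeed()-style helper: write a target frequency (MHz) to the
 * Linux cpufreq sysfs interface for one core. */
static int setspeed_mhz(int cpu, int mhz)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "%d\n", mhz * 1000);   /* cpufreq expects kHz */
    fclose(f);
    return 0;
}

int main(void)
{
    /* Drop to 600 MHz before a communication-bound phase,
     * then restore 1400 MHz before the next compute phase. */
    setspeed_mhz(0, 600);
    /* ... communication-bound code segment ... */
    setspeed_mhz(0, 1400);
    return 0;
}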

Page 31: Kirk W. Cameron SCAPE Laboratory Virginia Tech

CPU MISER Scheduling (FT)

36% energy savings with less than 1% performance loss. See the SC2004 and SC2005 publications.

[Figure: normalized energy and delay (0.0–1.2) for FT.C.8 under the auto governor, fixed frequencies of 600/800/1000/1200/1400 MHz, and CPU MISER.]

2005 - Present

Page 32: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Where else can we save energy?

• Processor (DVS)
– Where everyone starts
• NIC
– A very small portion of system power
• Disk
– A good choice (our future work)
• Power supply
– A very good choice (for an EE or ME)
• Memory
– Only 20–30% of system power, but…

2005 - Present

Page 33: Kirk W. Cameron SCAPE Laboratory Virginia Tech

The Power of Memory

[Figure: effects of increased memory on system power (90 W CPU, 9 W per 4 GB DIMM). As memory per processor grows from 0 to 256 GB, the percentage of system power consumed by memory rises while the CPU's share falls.]

2005 - Present
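The crossover in the chart follows directly from the stated assumptions; the sketch below reproduces the arithmetic, additionally assuming (for illustration only) that just CPU and DIMM power are counted.

#include <stdio.h>

/* Reproduce the arithmetic behind the chart above: 90 W per CPU and 9 W per
 * 4 GB DIMM, counting only CPU and memory power for simplicity. */
int main(void)
{
    const double cpu_w = 90.0, dimm_w = 9.0, dimm_gb = 4.0;

    for (int gb = 0; gb <= 256; gb += 32) {
        double mem_w = (gb / dimm_gb) * dimm_w;       /* total DIMM power */
        double total = cpu_w + mem_w;
        printf("%3d GB/processor: memory %4.0f W (%5.1f%%), CPU %5.1f%%\n",
               gb, mem_w, 100.0 * mem_w / total, 100.0 * cpu_w / total);
    }
    /* Memory power overtakes CPU power at 90 W / 9 W * 4 GB = 40 GB per processor. */
    return 0;
}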

Page 34: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Memory Management Policies

[Figure: memory devices online (0–10) over time (minutes, 1–20) under three policies: Default, Static, and Dynamic.]

Memory MISER = Page Allocation Shaping + Allocation Prediction + Dynamic Control

2005 - Present

Page 35: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Memory MISER Evaluation of Prediction and Control

[Figure: memory online vs. memory demand (0–8 GB) over time (0–35,000 s).]

Prediction and control look good, but are we guaranteeing performance?

2005 - Present

Page 36: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Memory MISER Evaluation of Prediction and Control

Stable, accurate prediction using a PID controller.

But what about big (capacity) spikes?

[Figure: memory online vs. memory demand (0–8 GB) over a zoomed-in window (22,850–22,950 s).]

2005 - Present
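The slide credits a PID controller for the stable tracking. A generic discrete PID loop of the kind that could drive "devices online" is sketched below; the gains, per-device capacity, and headroom are placeholder assumptions, not Memory MISER's actual parameters.

#include <stdio.h>

typedef struct { double kp, ki, kd, integral, prev_err; } pid_ctl;

/* One control step: error = (demand + headroom) - memory currently online. */
static double pid_step(pid_ctl *c, double error, double dt)
{
    c->integral += error * dt;
    double deriv = (error - c->prev_err) / dt;
    c->prev_err = error;
    return c->kp * error + c->ki * c->integral + c->kd * deriv;
}

int main(void)
{
    const double device_gb = 0.5;     /* capacity per power-manageable memory device (assumed) */
    const double headroom = 0.5;      /* slack above demand to absorb small spikes */
    pid_ctl c = { .kp = 0.8, .ki = 0.1, .kd = 0.05 };

    double online_gb = 8.0;           /* start with all memory online */
    double demand[] = { 6.0, 5.5, 3.0, 2.5, 2.5, 4.0, 7.5, 7.0, 3.0, 2.0 };

    for (int t = 0; t < 10; t++) {
        double target = demand[t] + headroom;
        online_gb += pid_step(&c, target - online_gb, 1.0);
        /* Quantize to whole devices and never drop below current demand. */
        int devices = (int)((online_gb + device_gb - 1e-9) / device_gb);
        if (devices * device_gb < demand[t]) devices = (int)(demand[t] / device_gb) + 1;
        online_gb = devices * device_gb;
        printf("t=%2d  demand=%.1f GB  online=%.1f GB (%d devices)\n",
               t, demand[t], online_gb, devices);
    }
    return 0;
}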

Page 37: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Memory MISER Evaluation of Prediction and Control

Memory MISER guarantees performance under "worst-case" conditions.

[Figure: memory online vs. memory used (0–8 GB) during a demand spike (16,940–17,060 s).]

2005 - Present

Page 38: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Memory MISER Evaluation: Energy Reduction

[Figure: FLASH memory demand (0–6 GB) and memory devices online (8–48) over time (t0–t4), annotated with phases of stable PID control, high-frequency cyclic allocation/deallocation, tiered increases in memory allocation, and pinned pages (OS) decreasing efficiency.]

30% total system energy savings with less than 1% performance loss.

2005 - Present

Page 39: Kirk W. Cameron SCAPE Laboratory Virginia Tech


Present - 2012

Page 40: Kirk W. Cameron SCAPE Laboratory Virginia Tech

SystemG Supercomputer @ VT

Page 41: Kirk W. Cameron SCAPE Laboratory Virginia Tech

SystemG Stats

• 325 Mac Pro compute nodes, each with two quad-core 2.8 GHz Intel Xeon processors.
• Each node has 8 GB of RAM; each core has 6 MB of cache.
• Mellanox 40 Gb/s end-to-end InfiniBand adapters and switches.
• LINPACK result: 22.8 TFLOPS (trillion floating-point operations per second).
• Over 10,000 power and thermal sensors.
• Variable power modes: DVFS control (2.4 and 2.8 GHz), fan-speed control, concurrency throttling, etc. (Check /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies.)
• Intelligent power distribution unit: Raritan Dominion PX (remotely controls the servers and network devices, and monitors current, voltage, power, and temperature through Raritan's KVM switches and secure console servers).
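For reference, the available DVFS states can be read straight from that sysfs file. The following minimal check uses only the standard Linux cpufreq interface, nothing SystemG-specific.

#include <stdio.h>

/* Read the DVFS frequencies the kernel exposes for one core via the standard
 * Linux cpufreq sysfs interface (values are reported in kHz). */
int main(void)
{
    const int cpu = 0;                       /* any core; substitute the cpuX of interest */
    char path[128], line[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_available_frequencies", cpu);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }
    if (fgets(line, sizeof line, f))
        printf("cpu%d available frequencies (kHz): %s", cpu, line);
    fclose(f);
    return 0;
}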

Page 42: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Deployment Details

[Rack layout: each 24 U rack holds three 8 U layers of nodes plus several 1 U units.]

* 13 racks total, with 24 nodes per rack and 8 nodes per layer.
* 5 PDUs per rack (Raritan model DPCS12-20). Each PDU in SystemG has a unique IP address; users can access it via IPMI to retrieve information and to control it, e.g., remotely shutting down and restarting machines or recording system AC power.
* There are two types of switch:
1) Ethernet: 1 Gb/s Ethernet switches, with 36 nodes sharing one switch.
2) InfiniBand: 40 Gb/s InfiniBand switches, with 24 nodes (one rack) sharing one switch.

Page 43: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Data Collection System and LabVIEW

[Sample LabVIEW diagram and corresponding front panel.]

Page 44: Kirk W. Cameron SCAPE Laboratory Virginia Tech

A Power Profile for the HPCC Benchmark Suite

Page 45: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Published Papers And Useful Links

Papers:

1. Rong Ge, Xizhou Feng, Shuaiwen Song, Hung-Ching Chang, Dong Li, Kirk W. Cameron, "PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications," IEEE Transactions on Parallel and Distributed Systems, Apr. 2009.
2. Shuaiwen Song, Rong Ge, Xizhou Feng, Kirk W. Cameron, "Energy Profiling and Analysis of the HPC Challenge Benchmarks," The International Journal of High Performance Computing Applications, Vol. 23, No. 3, pp. 265-276, 2009.

NI system setup details:
http://sine.ni.com/nips/cds/view/p/lang/en/nid/202545
http://sine.ni.com/nips/cds/view/p/lang/en/nid/202571

Page 46: Kirk W. Cameron SCAPE Laboratory Virginia Tech

The future…

• PowerPack
– Streaming sensor data from any source
• PAPI integration
– Correlated to various systems and applications
• Prophesy integration
– Analytics to provide a unified interface
• SysteMISER
– Study effects of power-aware disks and NICs
– Study effects of emergent architectures (CMT, SMT, etc.)
– Co-schedule power modes for energy savings

Present - 2012

Page 47: Kirk W. Cameron SCAPE Laboratory Virginia Tech
Page 48: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Outreach

• See http://green500.org
• See http://thegreengrid.org
• See http://www.spec.org/specpower/
• See http://hppac.cs.vt.edu

Page 49: Kirk W. Cameron SCAPE Laboratory Virginia Tech

Acknowledgements

• My SCAPE team
– Dr. Xizhou Feng (PhD 2006)
– Dr. Rong Ge (PhD 2008)
– Dr. Matt Tolentino (PhD 2009)
– Mr. Dong Li (PhD student, exp. 2010)
– Mr. Song Shuaiwen (PhD student, exp. 2010)
– Mr. Chun-Yi Su, Mr. Hung-Ching Chang
• Funding sources
– National Science Foundation (CISE: CCF, CNS)
– Department of Energy (SC)
– Intel

Page 50: Kirk W. Cameron SCAPE Laboratory Virginia Tech


Thank you very much.

http://scape.cs.vt.edu

Thanks to our sponsors: NSF (CAREER, CCF, CNS), DOE (SC), Intel

[email protected]