
Page 1: Energy to Solution

Energy to Solution:

A New Mission for Parallel Computing

By Ernst A. Graf

Euro-Par 2013 Keynote, August 29, 2013, Aachen

Arndt Bode

Chairman of the Board, Leibniz-Rechenzentrum of the

Bavarian Academy of Sciences and Humanities and

Technische Universität München

Page 2: Energy to Solution

• Why is Energy to Solution relevant for Euro-Par?

• Complexity of HPC Datacenters: LRZ as an example

• Energy to Solution Analysis and Optimization: A Holistic Approach

• First Steps at the Leibniz Supercomputing Centre

Page 3: Energy to Solution

Why "Energy to Solution"?

Euro-Par Mission

„Euro-Par is an annual series of international conferences dedicated to the

promotion and advancement of all aspects of parallel and distributed computing“

Algorithms

Theory

Software Technology

Hardware

Applications from Scientific to Mobile

What about cost?

The cost of energy is rising dramatically

Annual energy cost (at German prices) for the TOP 10 of the TOP500: about 150 M€ p.a.

(66.758 MW list power, at least 100 MW including infrastructure)

We need to consider "Energy to Solution" in a holistic approach

Engineering approach: co-design of building, infrastructure, system hardware/software, and application


Page 4: Energy to Solution

Calculation basis: power (MW) taken from the June 2013 TOP500 list

Power (MW)   System       Processor          Linpack (TFLOP/s)
17.808       Tianhe-2     Xeon + Phi         33,862.7
 8.209       Titan        Opteron + NVIDIA   17,590.0
 7.890       Sequoia      Power BG/Q         17,173.2
12.660       K computer   SPARC              10,510.0
 3.945       Mira         Power BG/Q          8,586.6
 4.510       Stampede     Xeon + Phi          5,168.1
 2.301       JUQUEEN      Power BG/Q          5,008.9
 1.972       Vulcan       Power BG/Q          4,293.3
 3.423       SuperMUC     Xeon                2,897.0
 4.040       Tianhe-1A    Xeon + NVIDIA       2,566.0

Accumulated power: 66.758 MW (mostly without infrastructure: cooling, UPS, cable losses, storage, interconnect, ...)

Accumulated Linpack performance: 107,655.8 TFLOP/s (about 107.7 PFLOP/s)

Page 5: Energy to Solution

Energy cost 2012 (NUS consulting)


US-$ cents per kWh

Italy 20.23

Germany 15.15

Spain 13.52

UK 12.45

Belgium 11.92

Australia 11.68

Austria 11.05

Poland 9.30

US 8.89

France 8.76

Finland 8.64

Sweden 7.95

Canada 7.58

Basis:

1 MW for 450 h delivery
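As a rough plausibility check of the roughly 150 M€ p.a. figure quoted earlier, here is a short back-of-envelope Python sketch; the 100 MW total, continuous operation, and the ~0.15 €/kWh price are assumptions for illustration (cf. the 15.15 US-$ cents/kWh German figure above), not values stated in the talk:

    # Back-of-envelope annual energy cost for the TOP 10 (assumed values).
    power_mw = 100.0            # list power is 66.758 MW; ~100 MW assumed incl. infrastructure
    hours_per_year = 8760       # continuous operation assumed
    price_eur_per_kwh = 0.15    # rough German price level (assumption)
    energy_kwh = power_mw * 1000 * hours_per_year       # ~876 million kWh per year
    cost_eur = energy_kwh * price_eur_per_kwh            # ~131 million EUR per year
    print(f"~{cost_eur / 1e6:.0f} M EUR per year")       # same order of magnitude as the ~150 M EUR quoted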

Page 6: Energy to Solution

SuperMUC

Page 7: Energy to Solution


SuperMUC - Phase 2 (Animation)


Page 8: Energy to Solution

SuperMUC General Configuration - the traditional view

18 thin node islands (each > 8,000 cores): Sandy Bridge-EP (SB-EP), 16 cores/node, 2 GB/core

1 fat node island (8,200 cores): Westmere-EX (WM-EX), 40 cores/node, 6.4 GB/core; also used as migration system

Interconnect: non-blocking within an island, pruned tree (4:1) between islands

I/O nodes and NAS, 80 Gbit/s

$HOME: 1.5 PB at 10 GB/s; snapshots/replicas: 1.5 PB (separate fire section)

GPFS for $WORK and $SCRATCH: 10 PB at 200 GB/s

Archive and backup: ~30 PB; disaster recovery site

Visualization and Internet connectivity

Page 9: Energy to Solution

What's special about SuperMUC

SuperMUC is the most powerful pure x86 ISA system in the world:
general purpose, standard programming interface, "easy" to port to, future-safe for many applications

SuperMUC is the most energy-efficient x86-based supercomputer in the world:
dark center infrastructure at LRZ
direct warm-water cooling
energy-aware scheduling with xCAT and LoadLeveler
contract including energy for 5 years

(cf. Dagstuhl Perspectives Workshop, May 21-25, 2012)

Page 10: Energy to Solution

• Heat flux > 90% to water; very low chilled water requirement

• Power advantage over an air-cooled node:

• warm-water cooled: ~10% (cold-water cooled: ~15%)

• due to lower component temperatures and no fans

• Typical operating conditions: Tair = 25 – 35°C, Twater = 18 – 45°C

IBM iDataPlex dx360 M4

Source: Torsten Bloth, IBM Lab Services - © IBM Corporation

Page 11: Energy to Solution

IBM System x iDataPlex Direct Water Cooled Rack

iDataPlex DWC rack with water-cooled nodes (front view and rear view of the water manifolds)

Source: Torsten Bloth, IBM Lab Services - © IBM Corporation

Page 12: Energy to Solution

LRZ Power Distribution

HKLS (German abbreviation for heating, cooling, ventilation, and sanitary installations)

Page 13: Energy to Solution

Layout of the cooling infrastructure (schematic)

Three water loops: warm water (WW) 30/36 °C, cold water (KW) 14/20 °C, and chilled water (KKW) 4/10 °C, linked through heat exchangers, chillers, roof-top re-cooling units, a well, and water/glycol circuits, and serving the direct-liquid-cooled racks, the air-handling and heating systems of the computer rooms, and the institute, lecture hall, and CAVE areas.

Page 14: Energy to Solution

LRZ: Cold Water Distribution Infrastructure


Page 15: Energy to Solution

LRZ: Warm Water Distribution Infrastructure


Page 16: Energy to Solution

LRZ: Cooling tower infrastructure on the roof


Page 17: Energy to Solution

LRZ: Energy Consumption - Some Case Studies

Three-month analysis of the total power requirement of LRZ: February to May 2013

Corresponding Power Usage Effectiveness

Considering the Cost of Non-Optimized Infrastructure

Effects of brown-outs: disturbance of the system

The infrastructure copes with brown-outs

Brown-outs cost money for additional infrastructure power

Coincidence of a brown-out and failures in the infrastructure

The system remains in operation (fault tolerant)

Cost of power and personnel


Page 18: Energy to Solution

Power Requirements at LRZ (02-05/2013)


Page 19: Energy to Solution

PUE at LRZ (02-05/2013)

PUE = Total Facility Power / IT Equipment Power
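As a purely hypothetical numeric example (not an LRZ measurement): a facility drawing 3.4 MW in total while its IT equipment draws 3.0 MW would have

\[
\mathrm{PUE} \;=\; \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}}
\;=\; \frac{3.4\ \text{MW}}{3.0\ \text{MW}} \;\approx\; 1.13 .
\]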


Page 20: Energy to Solution

Waste Heat into the Cooling Infrastructure from SuperMUC (February 2013)


Page 21: Energy to Solution

Power Requirement for SuperMUC (2/2013)


Page 22: Energy to Solution

(1) Test of the Warm-Water Cooling Infrastructure on Feb 27, 2013
(ΔT = -20 K: 10 °C instead of 30 °C)

Power


Page 23: Energy to Solution

(2) Test of the Warm-Water Cooling Infrastructure on Feb 27, 2013

Waste heat in the warm water and power for the cooling machines


Page 24: Energy to Solution

(1) Brown-Out June 18, 2013


Page 25: Energy to Solution

(2) Brown-Out June 18, 2013


Page 26: Energy to Solution

Intermediate Summary: the Cooling Infrastructure

An integrated, heterogeneous solution (air, chilled-water, and warm-water cooling)

It works even in the presence of external disturbances, internal failures, or misregulation (large capacity of the cooling infrastructure, skilled personnel)

It is efficient in the absence of disturbances

Need for an integrated, fine-grained monitoring and control tool

Today: 3+ databases

Advantage of LRZ's integrated cooling technology compared to rack-based systems (which would require more than 200 cooling towers)

But: we have not yet considered the influence of system load

varying load depending on algorithms, applications, and usage strategies

processor P- and C-states

influence of tools


Page 27: Energy to Solution

Energy Efficient HPC, the Whole Picture

Energy efficient infrastructure
• Reduce the power losses in the power supply chain
• Exploit your possibilities for using compressor-less cooling and use energy-efficient cooling technologies (e.g. direct liquid cooling)
• Re-use the waste heat of IT systems

Energy efficient hardware
• Use the newest semiconductor technology
• Use energy-saving processor and memory technologies
• Consider using special hardware or accelerators tailored for solving specific scientific problems or numerical algorithms

Energy aware software environment
• Monitor the energy consumption of the compute systems and the cooling infrastructure
• Use energy-aware system software to exploit the energy-saving features of your target platform
• Monitor and optimize the performance of your scientific applications

Energy efficient applications
• Use the most efficient algorithms
• Use the best libraries
• Use the most efficient programming paradigm

Page 28: Energy to Solution

Energy-aware System Software:

Minimizing Energy to Solution for Parallel Applications

For minimum energy to solution, run a serial application on a low-power platform.

For minimum energy to solution of a parallel application, the energy saving due to frequency scaling must be greater than the energy consumed by the unused processors in their lowest energy state and by the un-core system components.
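To make this trade-off concrete, the following small Python sketch uses a toy power model; the cubic dynamic-power term and all numbers are invented assumptions for illustration, not measurements from SuperMUC or from the applications below:

    # Toy energy-to-solution model under frequency scaling (all parameters assumed).
    # Dynamic power is modeled as k * f^3, while idle cores and un-core components
    # contribute a constant power drawn for the whole (longer) runtime.
    def energy_to_solution(freq_ghz, work=1.0, k=3.0, p_static=35.0):
        """Energy in joules for a compute-bound task whose runtime is work / freq."""
        runtime = work / freq_ghz                 # lower frequency -> longer runtime
        power = k * freq_ghz ** 3 + p_static      # dynamic + static/un-core power (W, assumed)
        return power * runtime

    for f in (1.2, 1.6, 2.0, 2.4, 2.7):
        print(f"{f:.1f} GHz -> {energy_to_solution(f):.1f} J")
    # In this toy model the minimum lies at an intermediate frequency: scaling down only
    # helps while the dynamic-power saving outweighs the extra static/un-core energy of
    # the longer run.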

Example 1: Geophysical Application SeisSol

• 40 E7-4870 cores (one node)

• MPI

• ondemand Linux governor

Example 2: CFD Application HYDRO

• 256 Intel E5-2680 cores (16 nodes)

• MPI

• ondemand Linux governor

Page 29: Energy to Solution

New Roles in Energy to Solution for HPC

HPC HW/SW system vendor(s):
  develop efficient system parts (C-states, P-states)
  use the best cooling technologies
  develop energy-to-solution tools

Data center:
  procure an appropriate building, infrastructure, HPC system, and energy contracts
  optimize "the whole thing"
  define usage strategies

Algorithm and library designer:
  design the best algorithm / library
  evaluate energy to solution for different architectures
  cooperate with the data center's usage strategy

Application end user:
  make the right choice of target architecture, ISA, and algorithm
  make use of "energy to solution" tools for your program


Page 30: Energy to Solution

C- and P-States


Schematic of a microprocessor: cores 1..n with clock control (frequencies F1..Fk), caches, MMUs, and non-core components.

C-state: describes which parts of the microprocessor are active (C0: all parts active)

P-state: describes the clock frequency of the active parts
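On Linux, the governor and frequency settings that energy-aware system software works with are exposed through the cpufreq sysfs interface; the following minimal Python sketch only reads a few of these files (which attributes exist depends on kernel and driver, so treat it as illustrative):

    # Read a few cpufreq attributes for CPU 0 (Linux only; availability depends on the driver).
    from pathlib import Path

    cpufreq = Path("/sys/devices/system/cpu/cpu0/cpufreq")
    for name in ("scaling_governor", "scaling_cur_freq", "scaling_available_governors"):
        attr = cpufreq / name
        if attr.exists():
            print(f"{name}: {attr.read_text().strip()}")
        else:
            print(f"{name}: not exposed on this system")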

Page 31: Energy to Solution

SuperMUC Usage by Research Areas


Page 32: Energy to Solution

SuperMUC Usage by Jobsize

Job size (cores)    Share of usage
0-64                 1.4%
65-128               3.3%
129-256              4.4%
257-512             12.0%
513-1024            20.7%
1025-2048           13.7%
2049-4096           12.4%
4097-8192           10.7%
8193-16384           9.7%
16385-32768          8.9%


Page 33: Energy to Solution


(1) LRZ: Simopek and PowerDAM


Page 34: Energy to Solution


(2) LRZ: Simopek and PowerDAM


Page 35: Energy to Solution


SuperMUC next steps

Universe SuperMUC subcluster: installation in April / May 2013

SuperMUC Manycore: autumn 2013, based on Intel Phi

SuperMUC Phase 2014/2015: contract signed, full system 6.46 PFLOPs

LRZ in exascale projects: DEEP/DEEP-ER (Intel Phi and EXTOLL)

Mont-Blanc 1 and 2 (ARM technology)

EESI

The successor to SuperMUC needs strong support for users:

Scalability issues for the "mega-core system"

New "HPC styles": Big Data

Real-time HPC

Integrated Visualization

Steering


Page 36: Energy to Solution


Energy to Solution - Summary

TCO for HPC needs to include an "engineering approach"

We need:

Cooperation of all stakeholders (building to algorithm)

Codesign (prediction of requirements and parameters)

Tools for an "optimal compromise" between application performance and cost

Experiments and experience with all sorts of new technologies

On the basis of today's technology, EXASCALE is not affordable:

the average of the TOP 10 systems of June 2013, extrapolated to exascale, gives an energy cost at German prices of around 1.5 B€ p.a.!
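A rough extrapolation sketch in Python, using the accumulated TOP 10 figures from the table earlier in the talk; the infrastructure overhead factor and the €/kWh price are assumptions for illustration, not values from the talk:

    # Back-of-envelope exascale extrapolation (assumed overhead factor and price).
    pflops_top10 = 107.656                 # accumulated Linpack of the June 2013 TOP 10 (PFLOP/s)
    mw_top10 = 66.758                      # accumulated list power of the TOP 10 (MW)
    gflops_per_watt = (pflops_top10 * 1e6) / (mw_top10 * 1e6)   # ~1.6 GFLOP/s per watt
    mw_for_exaflops = 1e9 / gflops_per_watt / 1e6                # ~620 MW list power for 1 EFLOP/s
    mw_with_infra = mw_for_exaflops * 1.5                        # assumed ~1.5x for infrastructure
    cost_eur = mw_with_infra * 1000 * 8760 * 0.15                # kW * hours/year * EUR/kWh (assumed)
    print(f"~{cost_eur / 1e9:.1f} B EUR per year")               # on the order of the quoted ~1.5 B EUR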

In cooperation with:

Helmut Breinlinger, Detlef Labrenz, Axel Auweter, Herbert Huber, Albert Kirnberger, Torsten Wilde, Jeanette Wilde,

Hayk Shoukourian, Reinhold Bader, Matthias Brehm, Werner Baur, Victor Apostolescu, IBM-Team, Intel-Team, YIT-Team,

Bauamt München-Team, Herzog und Partner-Team and many other building, infrastructure, system providers and

cooperation partners