9/3/2013 Torsten Wilde, Axel Auweter, Hayk Shoukourian (Leibniz Supercomputing Centre of the Bavarian Academy of Science BAdW-LRZ) ENA-HPC 2013, Dresden, Germany The 4 Pillar Framework for Energy Efficient HPC Data Centers - a foundation for energy efficiency efforts in high performance computing - Open Access Paper: http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-013-0244-6


Page 1: The 4 Pillar Framework for Energy Efficient HPC Data Centers · 2017. 2. 14. · HPC System Hardware Pillar 3 HPC System Software Pillar 4 HPC Applications Johnson Controls WinCC

9/3/2013

Torsten Wilde, Axel Auweter, Hayk Shoukourian

(Leibniz Supercomputing Centre of the Bavarian Academy of Science – BAdW-LRZ)

ENA-HPC 2013, Dresden, Germany

The 4 Pillar Framework for Energy Efficient HPC Data Centers

- a foundation for energy efficiency efforts in high performance computing -

Open Access Paper: http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-013-0244-6

Page 2:

Outline

Background

4 Pillar Framework

Usage examples

Summary

Page 3:

Why Care About Energy Efficiency?

Page 4:

New challenges

Page 5:

The need for a foundation

BAdW-LRZ data center goal: reducing the total cost of ownership (TCO) over the lifetime of its HPC systems

Questions to answer: What does it mean? What parts of your data center are involved? What are the inter-system connections? How to get there and where to start? What solutions exist?

Need a way to understand and categorize all internal and external efforts to improve data center energy efficiency.

Need a foundation for:

Defining final vision for BAdW-LRZ

Planning future work

Presenting current efforts to outside stakeholders

Page 6:

Towards the 4 Pillar Framework

Which aspects of an HPC data center play an important part in improving energy efficiency?

1. Building Infrastructure

2. System Hardware

3. System Software

4. Applications

Page 7:

The 4 Pillar Framework - a foundation for energy efficiency efforts in high performance computing

External Influences/Constraints: Neighboring Buildings, Utility Providers

Data Center (Goal: Reduce Total Cost of Ownership)

Pillar 1: Building Infrastructure

Goal: Improve Key Performance Indicators

· Reduce power losses in the supply chain

· Improve cooling technologies

· Reuse waste heat from IT systems

· Verify actions taken by monitoring all relevant information

Pillar 2: HPC System Hardware

Goal: Reduce Hardware Power Consumption

· Use the newest semiconductor technologies

· Use energy saving processor and memory technologies

· Consider using special hardware or accelerators designed for specific scientific problems

· Provide sensors for thorough power measurements

Pillar 3: HPC System Software

Goal: Optimize Resource Usage, Tune System

· Provide workload management according to site goals

· Exploit the energy saving features of the platforms by tuning the systems with respect to the applications’ needs

· Shut down idle nodes

· Monitor the energy consumption of all components in the compute systems

Pillar 4: HPC Applications

Goal: Optimize Application Performance

· Use the most efficient algorithms

· Use the best libraries (tuned and optimized for the system)

· Use the most efficient programming paradigms
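
The four pillars and their goals can also be written down as a small data structure, which makes it easier to sort individual efforts into the framework. The following is only a sketch: the pillar names, goals, and measures are copied from the slide, while the `Pillar` class, the `group_by_pillar` helper, and the example efforts are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pillar:
    """One pillar of the framework: number, name, goal, and listed measures."""
    number: int
    name: str
    goal: str
    measures: tuple


# Names, goals, and measures are copied from the slide.
FRAMEWORK = (
    Pillar(1, "Building Infrastructure", "Improve Key Performance Indicators", (
        "Reduce power losses in the supply chain",
        "Improve cooling technologies",
        "Reuse waste heat from IT systems",
        "Verify actions taken by monitoring all relevant information",
    )),
    Pillar(2, "HPC System Hardware", "Reduce Hardware Power Consumption", (
        "Use newest semiconductor technologies",
        "Use of energy saving processor and memory technologies",
        "Consider special hardware or accelerators for specific scientific problems",
        "Provide sensors for thorough power measurements",
    )),
    Pillar(3, "HPC System Software", "Optimize Resource Usage, Tune System", (
        "Provide workload management according to site goals",
        "Tune the systems with respect to the applications' needs",
        "Shut down idle nodes",
        "Monitor the energy consumption of all components in the compute systems",
    )),
    Pillar(4, "HPC Applications", "Optimize Application Performance", (
        "Use the most efficient algorithms",
        "Use the best libraries (tuned and optimized for the system)",
        "Use most efficient programming paradigms",
    )),
)


def group_by_pillar(efforts):
    """Map each pillar name to the sorted list of efforts that touch it.

    `efforts` maps an effort name to the set of pillar numbers it involves.
    """
    names = {p.number: p.name for p in FRAMEWORK}
    grouped = {p.name: [] for p in FRAMEWORK}
    for effort, numbers in efforts.items():
        for number in numbers:
            if number not in names:
                raise ValueError(f"unknown pillar number: {number}")
            grouped[names[number]].append(effort)
    return {name: sorted(items) for name, items in grouped.items()}


if __name__ == "__main__":
    # Hypothetical efforts and pillar assignments, for illustration only.
    example = {
        "hypothetical cooling retrofit": {1},
        "hypothetical energy-aware scheduling study": {2, 3},
    }
    for pillar, items in group_by_pillar(example).items():
        print(f"{pillar}: {items}")
```

An effort that appears under more than one pillar is exactly the kind of cross-pillar work the framework is meant to make visible.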

Page 8:

The 4 Pillar Framework is a tool for:

Operational Aspects:

Identifying data center specific areas of improvement

Planning future work

Identifying required internal and external resources

Research Aspects:

Classifying current research efforts (in paper)

Guiding energy efficiency efforts

Presenting results and plans to stakeholders

Page 9:

How to plan your work - From general to specific, part 1

• PRACE PP and 1IP (CooLMUC) (EtS, adsorption cooling req. hot water cooling – SuperMUC, DVFS, energy efficient scheduling)

External Influences/Constraints: Neighboring Buildings, Utility Providers

Data Center: Current Status at LRZ

Pillar 1: Building Infrastructure

· Building Management & Infrastructure
· Infrastructure Monitoring

Pillar 2: HPC System Hardware

· Hardware Management
· System Hardware Monitoring

Pillar 3: HPC System Software

· System Management Software
· System Software Monitoring

Pillar 4: HPC Applications

· Performance Analysis Tools
· Performance Monitoring

Further boxes in the diagram:

· Heat Reuse Solutions
· System Tuning for Energy Efficiency
· Power Monitoring
· Application Specific Tuning Efforts

Page 10:

How to plan your work - From general to specific, part 2

External Influences/Constraints: Neighboring Buildings, Utility Providers

Data Center: Current Status at LRZ

Pillar 1: Building Infrastructure

· Johnson Controls, WinCC
· BACNET, OPC, etc.

Pillar 2: HPC System Hardware

· Board Management Controllers, Paddle Cards
· SNMP, IPMI, etc.

Pillar 3: HPC System Software

· xCAT, LoadLeveler, Clustware, Slurm
· sysfs, procfs, etc.

Pillar 4: HPC Applications

· PerSyst, IPM, Scalasca, etc.
· PAPI, Likwid, etc.

Further boxes in the diagram:

· SuperMUC Office Heating
· CooLMUC Adsorption Cooling
· DVFS Support in LoadLeveler / Slurm
· Power Data Aggregation Monitor (PowerDAM V1.0)
· AutoTune, AutoPin

Labels: CS, IBM, HPC APP, IBM / Megware
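
The slide names sysfs and procfs among the system software interfaces. As a minimal sketch of what reading one such value can look like, the snippet below collects the current per-core clock frequency from the Linux cpufreq files under `/sys/devices/system/cpu`. It assumes a Linux node whose cpufreq driver exposes `scaling_cur_freq` (reported in kHz). The function name is hypothetical and this is not the tooling used at LRZ.

```python
from pathlib import Path


def read_cpu_frequencies_khz(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu name: current frequency in kHz} for every readable core.

    Assumption: the Linux cpufreq subsystem exposes
    cpuN/cpufreq/scaling_cur_freq below `sysfs_cpu_root`.
    Cores without that file, or with unreadable content, are skipped.
    """
    frequencies = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_cur_freq"
    for freq_file in sorted(Path(sysfs_cpu_root).glob(pattern)):
        cpu_name = freq_file.parent.parent.name  # e.g. "cpu0"
        try:
            frequencies[cpu_name] = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            continue
    return frequencies


if __name__ == "__main__":
    for cpu, khz in read_cpu_frequencies_khz().items():
        print(f"{cpu}: {khz / 1000:.0f} MHz")
```

On a machine without cpufreq support the function simply returns an empty dictionary. Passing a different root directory makes it easy to test against recorded data.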

Page 11:

Energy Efficient Data Center Vision for BAdW-LRZ

What’s your goal?

What does it mean for your data center?

Key Performance Indicators

Energy To Solution (EtS)

Total Cost of Ownership (TCO)

External Influences/Constraints: Neighboring Buildings, Utility Providers

Data Center: Reduce Total Cost of Ownership

Pillar 1: Building Infrastructure

· Building Management & Infrastructure
· Infrastructure Monitoring

Pillar 2: HPC System Hardware

· Hardware Management
· System Hardware Monitoring

Pillar 3: HPC System Software

· System Management Software
· System Software Monitoring

Pillar 4: HPC Applications

· Performance Analysis Tools
· Performance Monitoring

Further boxes in the diagram:

· Advanced Heat Reuse Technologies
· System Scheduler
· Infrastructure Aware Resource Management & Scheduling
· Modeling Simulation & Optimization
· Data Center Data Acquisition Monitor
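
Energy To Solution (EtS) appears on this slide as one of the key figures. A common reading, assumed here, is the energy an application consumes from start to finish, which is the integral of its measured power draw over its run time. The sketch below approximates that integral from timestamped power samples with the trapezoidal rule. The function name and the sample values are hypothetical.

```python
def energy_to_solution_joules(timestamps_s, power_w):
    """Approximate the energy (in joules) from power samples.

    Uses the trapezoidal rule: each interval contributes the mean of its
    two power readings times the interval length.
    timestamps_s: sample times in seconds, non-decreasing.
    power_w: power readings in watts, one per timestamp.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power readings differ in length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


if __name__ == "__main__":
    # Hypothetical samples: a 20 s run sampled every 10 s.
    joules = energy_to_solution_joules([0, 10, 20], [100.0, 200.0, 100.0])
    print(f"{joules} J = {joules / 3.6e6} kWh")
```

The accuracy of such an estimate depends on the sampling rate of the power sensors, which is one reason the framework asks for thorough power measurements in the hardware pillar.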

Page 12:

Next steps: SIMOPEK and FEPA

External Influences/Constraints: Neighboring Buildings, Utility Providers

Data Center: Current Status at LRZ

Pillar 1: Building Infrastructure

Pillar 2: HPC System Hardware

Pillar 3: HPC System Software

Pillar 4: HPC Applications

Labels: CS, IBM, HPC APP

SuperMUC:

· Cooling: 10% Air, 90% Water; 80% Hot Water, 20% Cold Water
· Power Data: PDUs and paddle cards
· LoadLeveler (energy tags), xCAT
· Non-uniform application mix
· Power: Idle 0.7 MW, Average 2.4 MW, Max 3.7 MW
· Focus: Hot Water Infrastructure

JCI: BACNET, OPC, DB, event logs

WinCC: DB

Data Collection (low resolution data) by PowerDAM V2.0

Building Heating

Modeling, Simulation, Optimization by MYNTS

Modeling, Simulation, Optimization

High resolution data

SIMOPEK

FEPA
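
To put the SuperMUC power figures on this slide into perspective, they can be converted into yearly energy. The sketch below uses the three values from the slide (0.7 MW idle, 2.4 MW average, 3.7 MW maximum). The assumption that each level is sustained for a full non-leap year of 8760 hours is an illustrative simplification, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year, continuous operation

# Power levels in MW, taken from the slide.
SUPERMUC_POWER_MW = {"idle": 0.7, "average": 2.4, "max": 3.7}


def annual_energy_mwh(power_mw, hours=HOURS_PER_YEAR):
    """Energy in MWh if `power_mw` is drawn constantly for `hours` hours."""
    if power_mw < 0 or hours < 0:
        raise ValueError("power and hours must be non-negative")
    return power_mw * hours


if __name__ == "__main__":
    for label, mw in SUPERMUC_POWER_MW.items():
        print(f"{label:>7}: {mw} MW -> {annual_energy_mwh(mw):,.0f} MWh per year")
    idle_share = SUPERMUC_POWER_MW["idle"] / SUPERMUC_POWER_MW["average"]
    print(f"idle power is {idle_share:.0%} of average power")
```

At the average draw this comes to roughly 21,000 MWh per year, and the idle draw alone is a bit under a third of the average, which illustrates why shutting down idle nodes is listed under Pillar 3.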

Page 13:

SIMOPEK 2nd Framework Instance

External Influences/Constraints: Neighboring Buildings, Utility Providers

Data Center: Current Status at LRZ

Pillar 1: Building Infrastructure

Pillar 2: HPC System Hardware

Pillar 3: HPC System Software

Pillar 4: HPC Applications

Labels: CS, IBM, HPC APP

CooLMUC:

· Cooling: 100% Hot Water
· Power Data: PDUs
· Slurm, Clustware

JCI: BACNET, OPC, DB, event logs

WinCC: DB

Data Collection (low resolution data) by PowerDAM V2.0

Advanced Adsorption Chiller

Modeling, Simulation, Optimization by MYNTS

SIMOPEK

Synthetic benchmarks

Megware, SorTech AG

Page 14:

Links:

www.simopek.de

PowerDAM paper:

ICT4S 2013: http://dx.doi.org/10.3929/ethz-a-007337628

Open Access 4 Pillar Framework Paper:

http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-013-0244-6


Page 15:

Summary

The 4 Pillar Framework is a tool for:

Identifying data center specific areas of improvement

Planning future work

Identifying required internal and external resources

Classifying current research efforts (in paper)

Guiding energy efficiency efforts

Presenting results and plans to stakeholders

Page 16:

Why use the 4 Pillar Framework?

Open Access Paper: http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-013-0244-6

HPC Resort & Spa