power measurement techniques for energy-efficient ... · power measurement techniques for...

21
Collaborative Research Center 912: HAEC Highly Adaptive Energy-Efficient Computing 22.06.2017 Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, and Accuracy Thomas Ilsche, Robert Schöne, Joseph Schuchart, Daniel Hackenberg, Marc Simon, Yiannis Georgiou, Wolfgang E. Nagel: Center for Information Services and High Performance Computing (ZIH), TU Dresden EnaHPC 2017 – June 22, 2017 – Frankfurt

Upload: truongnhi

Post on 29-May-2018

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing22.06.2017

Power Measurement Techniques for Energy-Efficient

Computing: Reconciling Scalability, Resolution, and Accuracy

Thomas Ilsche, Robert Schöne, Joseph Schuchart, Daniel Hackenberg, Marc Simon,

Yiannis Georgiou, Wolfgang E. Nagel:Center for Information Services and High Performance Computing (ZIH), TU Dresden

EnaHPC 2017 – June 22, 2017 – Frankfurt

Page 2: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Why Measure Energy / Power?2

Energy monitoring, accounting

Power capping

Energy efficiency analysis/optimizations

Imagine performance optimization with a clock that only updates once

a second and has a 10% error

Page 3: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Key Criteria for Energy Measurements3

1. Temporal granularity Analyze short program phases

2. Spatial granularity Distinguish individual components

3. Well-defined accuracy Energy savings ≫ uncertainty

4. Scalability Usage in HPC systems

5. Cost Applicable to large production system or small experimental system

Page 4: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Related work4

1. Temporal granularity Common solutions between 1 s and 1 ms

2. Spatial granularity DC power meters, e.g. PowerPack, PowerMon2, PowerInsight

3. Well-defined accuracy Professional AC power meter

4. Scalability Vendor power meter, integrated PDU measurement

5. Cost Embedded CPU measurements, RAPL

Page 5: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Advancing Energy Measurement

Vendor collaboration

Highly scalable

Verified accuracy

Good temporal and spatial

resolution

Deployed in production HPC

system with 1456 nodes

5

Custom-built system

Very high temporal resolution

Good spatial resolution

Well understood accuracy

Page 6: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Systems Under Test

1456 Bullx DLC nodes

2 × Intel Xeon E5-2680 v3

64 – 256 GiB memory

128 GB SSD

Bull SCS4

6

Single node workstation

2 × Intel Xeon E5 2690 v3

256 GiB memory

800 GB SSD

Ubuntu 16.04 Server

Page 7: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Instrumentation points

Total DC node power

2 × CPUs

4 × DRAM DIMM groups

7

2 × sockets (CPU + Memory)

Mainboard 12V, 5V, 3.3V ATX

GPU via PCIe and 8+6 Molex

SSD 12V, 5V

Sum of all fans

All DC power consumers

Page 8: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HAEC Instrumentation Points8

AC power

grid

CEE 7/4plug

PSU 12 VMolexplug

VR 1.5 V SocketComponent

Hall Effect Sensors Shunts Voltage Regulators

Page 9: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HAEC Measurement Chain9

Sensor

• Shunts at all DC consumers

• Also capture all voltages

Analog processing

• Amplification to common voltage range

• Filtering

Data Acquisition

• Two NI DAQ cards

• Up to 500 kSa/s (for four sensors)

• All sensors 7 kSa/s

Digital processing

• Out-of-band measurement

• Analysis & integration

Page 10: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

10

HDEEM Hardware

10

Page 11: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HDEEM Power Acquisition Scheme11

2xCPU, 4xRAM measurement

at voltage regulators

1000 samples/s

A/D converter

8000 samples/s

blade sensor

analog filter

500 Hz

analo

g

sig

nal

SM

B/I2C

digital filter

100 samples/s

FPGAshort-term

buffer

digital filter

1000 samples/sI2C

Baseboard

Management

Controller AS

RA

M

GPIO PCIE

in-band compute

node1000 samples/s

out-of-band IPMI

Dataheap

collection and

history charts1 sample/s

IPM

I

SLURM

accounting and

profiling1 sample/s

Page 12: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Calibration & Verification HAEC12

Measuring shunts in the setup

using a variable load resistor

Calibration factors in

amplifiers

Error < 2% compared to

calibrated reference

measurement

Mostly within the uncertainty

of the reference

Page 13: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Calibration HDEEM13

Calibration of the deployed HPC system

Determine calibration factor for each node

Program correction factors for raw sensor

values

Page 14: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Verification HDEEM14

Independent measurement

with calibrated power meter

Challenging worst-case

aliasing workloads

Maximum error 3 %

Page 15: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Application Energy Measurement15

Plugin metric for

Score-P

All measurement is

asynchronous

No perturbation

NAS parallel

benchmarks

OpenMP & MPI

BT solver

ApplicationMeasurement

system

Energy

recording

initialize

event

event

event

finalize

start measurement

stop measurement

power trace

Page 16: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HDEEM Example (NPB on 1024 nodes)16

Page 17: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HDEEM Example (NPB on 1024 nodes)17

Page 18: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HDEEM Example (NPB on 1024 nodes)18

Page 19: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

HAEC Example19

Page 20: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

Summary20

Application optimization needs good metrics

Energy efficiency optimizations brings together hardware and software

Demonstrate two energy measurement infrastructures

Very high resolution at small scale

500 kSa/s – observe applications regions of < 100 μs

Separate measurements per DC-component

High resolution measurement at HPC production scale (1456 nodes)

1 kSa/s per node power

100 Sa/s socket / DRAM measurement

Both measurements verified with high accuracy errors < 2 % / < 3 %

Page 21: Power Measurement Techniques for Energy-Efficient ... · Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution, ... Professional AC power

Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

Thomas Ilsche

21