Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound




Page 1: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound

LLNL-PRES-552151

This work has been authored by Lawrence Livermore National Security, LLC under contract DE-AC52-07NA27344 with the U.S. Department of Energy. Accordingly, the United States Government retains and the publisher, by accepting this work for dissemination, acknowledges that the United States Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the disseminated form of this work or allow others to do so, for United States Government purposes.

Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound

HPPAC 2012

Barry Rountree, Dong H. Ahn, Bronis R. de Supinski, David K. Lowenthal, Martin Schulz

Monday, May 21st

Page 2: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound

Lawrence Livermore National Laboratory LLNL-PRES-552151

Computing under a power bound forces us to rethink performance

Traditional
• All components can operate at highest power level simultaneously
• Power provisioned for “worst case”
• Users are happily oblivious (about power)
• Few if any applications limited by power

Exascale (if not sooner)
• Not all components can operate at highest power level simultaneously
• Power provisioning is best effort
• Users must tune power for performance
• Nearly every application limited by power

Page 3: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Computing under a power bound forces us to rethink performance

Traditional
• Utilization measured in node-hours
• Weak-scaling jobs perform best using as many nodes as possible
• Running all components as fast as possible reliably leads to top performance

Exascale (if not sooner)
• Utilization measured in kilowatt-hours
• Weak-scaling jobs may perform optimally with fewer, faster nodes
• Running all components as fast as possible cannot be done. Running most components at identical speeds is suboptimal

Page 4: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


An Unexpected Power Bound: Merlot cluster at LLNL

[Figure: Power (Watts) by processor on rzmerl (Early April), running Linpack + Intel Turbo Boost. GHz scale: non-turbo (2.6 GHz), max turbo (3.3 GHz).]

Average Processor Power Bound: sum of processor power draw divided by processor count must be at or below this level.
• Each processor uses some amount of power
• Total processor power divided by processor count should be less than the bound

Short-term solution: Disable Turbo Boost globally
• Lost performance

Mid-term solution: Buy more power
• (This does not scale)

Average Processor Power Bound: rzmerl (Mid April), exascale (?)

Long-term solution: Schedule power to optimize performance
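The average processor power bound described on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the per-processor readings are hypothetical, not measurements from rzmerl.

```python
def average_power(per_processor_watts):
    """Sum of processor power draw divided by processor count."""
    if not per_processor_watts:
        raise ValueError("need at least one processor reading")
    return sum(per_processor_watts) / len(per_processor_watts)


def within_bound(per_processor_watts, bound_watts):
    """True if the average draw is at or below the average processor power bound."""
    return average_power(per_processor_watts) <= bound_watts


# Hypothetical readings in watts, one per processor.
draws = [72.0, 80.0, 95.0, 88.0]

print(average_power(draws))       # 83.75
print(within_bound(draws, 85.0))  # True
print(within_bound(draws, 80.0))  # False
```

Note that the 95 W processor exceeds an 85 W bound on its own while the set as a whole still satisfies it, because the bound applies to the average rather than to each processor.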

Page 5: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Scheduling Power with Processor Hardware: Intel’s RAPL

Running Average Power Limit (RAPL)
• Measures cumulative joules (power × time)
• Three separate power meters
• Clamping on package and DRAM power

Turbo suppression

Effective frequency

libmsr currently under development

Page 6: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Domains and Features of Running Average Power Limit Technology

Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B

Introduced on Sandy Bridge processors

Onboard energy meters measure accumulated joules.

Divide by time to get average power.

Can place a user-specified limit on average power over a user-specified time window.
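The "divide by time" step can be sketched as follows. This is a minimal illustration that does not touch hardware: the function name is hypothetical, the raw readings are made up, and both the 32-bit counter width and the 2^-16 joule energy unit are assumptions that should be checked against the Software Developer's Manual for the processor in question.

```python
COUNTER_BITS = 32  # assumption: the energy counter is 32 bits wide and wraps


def average_watts(raw_start, raw_end, joules_per_unit, seconds):
    """Average power between two raw energy-counter readings.

    The subtraction is done modulo the counter width so that a single
    wraparound between the two readings is handled correctly.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    delta_units = (raw_end - raw_start) % (1 << COUNTER_BITS)
    return delta_units * joules_per_unit / seconds


unit = 2.0 ** -16  # assumed energy unit in joules per counter increment

# 51 joules accumulated over one second.
print(average_watts(1000, 1000 + 51 * 65536, unit, 1.0))  # 51.0

# The counter wrapped between the two readings: 2 joules over half a second.
print(average_watts(2**32 - 65536, 65536, unit, 0.5))     # 4.0
```

If the counter can wrap more than once between readings, the result is an underestimate, so the sampling interval has to be short relative to the wrap period.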

Page 7: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Bounding Package Power with RAPL

• Setting LOCK fixes power limits until reboot
• Limits are ignored until enable bits are set
• Power limit is enforced using average watts over a user-specified window

Resolution: ~1 ms
Max window: ~46 ms
Watts granularity: 0.125 W
Minimum power bound: 51 W

Two windows allow tweaking peak and average power
• Higher bound, smaller window for peak power
• Lower bound, wider window for average power

Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B
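To see why two windows are useful, the sketch below models a limit as "average watts over the last N samples must not exceed a bound" and applies a high bound with a small window and a lower bound with a wider window to two traces. It is a software model of the semantics only: real hardware throttles the processor rather than reporting violations, and the function name, traces, limits and window sizes here are all hypothetical.

```python
from collections import deque


def violations(samples_watts, limit_watts, window_samples):
    """Indices at which the running average over the last
    `window_samples` samples exceeds `limit_watts`."""
    window = deque(maxlen=window_samples)
    flagged = []
    for i, watts in enumerate(samples_watts):
        window.append(watts)
        if len(window) == window_samples:
            if sum(window) / window_samples > limit_watts:
                flagged.append(i)
    return flagged


# Hypothetical power samples (watts) taken at a fixed interval.
spike = [50, 50, 120, 50, 50, 50]      # brief burst
sustained = [70, 70, 70, 70, 70, 70]   # steady moderate load

# Higher bound, smaller window: catches the burst only.
print(violations(spike, 80, 2))      # [2, 3]
print(violations(sustained, 80, 2))  # []

# Lower bound, wider window: catches the sustained load only.
print(violations(spike, 65, 5))      # []
print(violations(sustained, 65, 5))  # [4, 5]
```

The burst averages 64 W over five samples, so it passes the wide window even though it breaks the narrow one, which is the distinction between peak and average power that the two windows are meant to capture.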

Page 8: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Bounding DRAM Power with RAPL

Similar interface for DRAM power control

Only one power limit supported

Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B

Page 9: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Processors are Heterogeneous Under a Power Bound

rzzin, mg.C.8, 64 processors, 34 power bounds

No Power Bound

Processors take similar time

Significant variation in power

Power variation expected and acceptable

51W Power Bound

Processors require same amount of power

Individual processor efficiency has not changed

Efficiency variation manifests as performance variation

Processors are heterogeneous under a power bound

Where should the hot processors go?

Is it worth paying a premium for efficient processors?

Page 10: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Wide Variation in Application Package Power Draw

[Figure: Average Watts by processor]

rzmerl, NPB C.8, 234 processors

Wide variation in power consumption across applications

Provisioning power for the most power-hungry application leaves the remaining applications node-bound, not power-bound

Processors ordered by cg.C.8 average PKG power
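The node-bound versus power-bound distinction can be made concrete with a small calculation. This is a sketch under stated assumptions: the function name, the power budgets and the per-processor wattages are hypothetical, and only the processor count of 234 comes from the slide.

```python
def classify(power_budget_watts, watts_per_processor, processors_available):
    """Return 'node-bound' if the budget can power every available
    processor at this application's draw, else 'power-bound'."""
    affordable = int(power_budget_watts // watts_per_processor)
    if affordable >= processors_available:
        return "node-bound"
    return "power-bound"


PROCESSORS = 234  # processor count shown on the slide

# Budget provisioned for a hypothetical worst-case 80 W application:
# a lighter 55 W application runs out of processors before power.
print(classify(80 * PROCESSORS, 55, PROCESSORS))  # node-bound

# With a tighter hypothetical budget the 80 W application runs out
# of power first, while the 55 W application still does not.
print(classify(15000, 80, PROCESSORS))  # power-bound
print(classify(15000, 55, PROCESSORS))  # node-bound
```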

Page 11: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Wide Variation in Application DRAM Power Draw

[Figure: Average Watts by processor]

rzmerl, NPB C.8, 234 processors

Memory power substantially lower than package power

Processors ordered by cg.C.8 average PKG power

Page 12: Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power  Bound


Exascale Is Not Only Bigger: Exascale Is Fundamentally Different

Overprovision hardware
• Processors are cheap and plentiful
• Power is not

Measure performance at max power consumption
• May require turning off nodes
• Running out of nodes before running out of power means application is not power-bound

Expect heterogeneous processor performance
• Put most-efficient nodes on the critical path if possible
• Put least-efficient nodes where they will do the least harm
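One simple way to act on the placement advice is to rank nodes by measured efficiency and tasks by how critical they are, then pair them off in order. The sketch below assumes such rankings exist; the function name, the efficiency scores and the criticality scores are hypothetical, and the talk does not prescribe this particular policy.

```python
def assign(efficiency_by_node, criticality_by_task):
    """Pair the most critical task with the most efficient node,
    the next most critical with the next most efficient, and so on."""
    if len(efficiency_by_node) < len(criticality_by_task):
        raise ValueError("not enough nodes for the tasks")
    nodes = sorted(efficiency_by_node, key=efficiency_by_node.get, reverse=True)
    tasks = sorted(criticality_by_task, key=criticality_by_task.get, reverse=True)
    return dict(zip(tasks, nodes))


# Hypothetical scores: higher efficiency means more performance
# under the same power bound; higher criticality means closer to
# the critical path.
efficiency = {"n0": 0.9, "n1": 1.2, "n2": 1.0}
criticality = {"halo": 1, "solver": 3, "io": 2}

print(assign(efficiency, criticality))
# {'solver': 'n1', 'io': 'n2', 'halo': 'n0'}
```

A greedy pairing like this ignores communication topology and any change in criticality over the run, so it is a starting point rather than a scheduler.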
