


MeterPU: A Generic Measurement Abstraction API

– Enabling Energy-tuned Skeleton Backend Selection

To appear in Proc. REPARA’15 workshop at ISPA’15, Helsinki, Finland. IEEE.

Lu Li, Christoph Kessler
lu.li@liu.se, christoph.kessler@liu.se


Motivation

Energy Measurement
  Necessary for energy modeling and optimization.
  Difficult, especially for GPUs.

Figure: Power visualization (Li et al., 2015 [4]) for a CUDA program, illustrating the "capacitor effect". Data obtained with Nvidia NVML.
[Plot: power (mW), temperature, and fan speed versus time (s).]

Requires numerical postprocessing (Burtscher et al., 2014 [1]) to calculate the real energy cost.

Goal: make energy measurement as simple as measuring time in the good old days.
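To illustrate why sampled power data needs postprocessing before it becomes an energy figure, here is a minimal sketch that integrates power samples over time with the trapezoidal rule. The `PowerSample` record and the `energy_millijoules` function are hypothetical names introduced for this example. The sketch performs plain integration only; it does not apply the sensor-lag ("capacitor effect") correction discussed by Burtscher et al. [1].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sample record: timestamp in seconds, power in milliwatts.
struct PowerSample {
    double time_s;
    double power_mw;
};

// Integrate power over time with the trapezoidal rule.
// mW * s = mJ, so the result is in millijoules.
double energy_millijoules(const std::vector<PowerSample>& samples)
{
    double energy = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double dt = samples[i].time_s - samples[i - 1].time_s;
        assert(dt >= 0.0);  // samples must be ordered by time
        energy += 0.5 * (samples[i].power_mw + samples[i - 1].power_mw) * dt;
    }
    return energy;
}
```

With fewer than two samples the function returns 0, since no time interval has been observed.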

MeterPU

A software multimeter
  A generic measurement abstraction, hiding platform-specific measurement details.
  Four simple functions to unify the measurement interface for various metrics on different hardware components.
    Time, Energy on CPU, GPU
    Easy to extend: FLOPS, cache misses etc.

On top of native measurement libraries (plug-ins).
  CPU time: clock_gettime()
  GPU time: cudaEventRecord()
  CPU and DRAM energy: Intel PCM library.
  GPU energy: Nvidia NVML library.
  ...
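As an illustration of the kind of native call a CPU time plug-in sits on top of, the following sketch times a code region with POSIX `clock_gettime()`. The helpers `elapsed_seconds` and `time_region` and the choice of `CLOCK_MONOTONIC` are assumptions made for this example; the slides do not say which clock or wrapper MeterPU uses.

```cpp
#include <cassert>
#include <time.h>   // POSIX clock_gettime

// Hypothetical helper: seconds elapsed between two timespec readings.
double elapsed_seconds(const timespec& begin, const timespec& end)
{
    return static_cast<double>(end.tv_sec - begin.tv_sec)
         + static_cast<double>(end.tv_nsec - begin.tv_nsec) * 1e-9;
}

// Hypothetical helper: time a callable with the monotonic clock.
template <class Func>
double time_region(Func&& region)
{
    timespec begin{}, end{};
    int rc = clock_gettime(CLOCK_MONOTONIC, &begin);
    assert(rc == 0);  // the clock must be available
    region();
    rc = clock_gettime(CLOCK_MONOTONIC, &end);
    assert(rc == 0);
    (void)rc;         // silence unused warning when NDEBUG is defined
    return elapsed_seconds(begin, end);
}
```

A caller would write something like `double s = time_region([] { cpu_func(); });`. This is the bookkeeping that the four-function interface is meant to hide.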


MeterPU API

    template <class Type>  // analogous to the switch on a real multimeter
    class Meter
    {
    public:
        void start();  // start a measurement
        void stop();   // stop a measurement
        void calc();   // calculate a metric value of a code region
        typename Meter_Traits<Type>::ResultType const &
        get_value() const;  // get the calculated metric value

    private:
        // Platform-specific measurement details hidden ...
    };
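The interface above suggests a traits-based design in which the template argument selects both the measurement mechanism and the result type. Below is a minimal stand-alone sketch of that pattern, not MeterPU's actual implementation. The `Wall_Time` tag, the `Sample`, `read`, and `diff` members, and the use of `std::chrono::steady_clock` are assumptions introduced for illustration; only the names `Meter`, `Meter_Traits`, `ResultType`, and the four member functions come from the slide.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical tag type selecting a wall-clock time meter.
struct Wall_Time {};

// Traits map a meter tag to its sample and result types.
template <class Type> struct Meter_Traits;

template <> struct Meter_Traits<Wall_Time> {
    using Clock      = std::chrono::steady_clock;
    using Sample     = Clock::time_point;
    using ResultType = double;  // microseconds
    static Sample read() { return Clock::now(); }
    static ResultType diff(Sample begin, Sample end) {
        return std::chrono::duration<double, std::micro>(end - begin).count();
    }
};

template <class Type>
class Meter {
    using Traits = Meter_Traits<Type>;
public:
    void start() { begin_ = Traits::read(); }  // take the first reading
    void stop()  { end_ = Traits::read(); }    // take the second reading
    void calc() {
        assert(end_ >= begin_);                // stop() must follow start()
        value_ = Traits::diff(begin_, end_);
    }
    typename Traits::ResultType const& get_value() const { return value_; }
private:
    typename Traits::Sample begin_{}, end_{};
    typename Traits::ResultType value_{};
};
```

Under this design, supporting a new metric would mean adding one more `Meter_Traits` specialization, while user code written against `start()`, `stop()`, `calc()`, and `get_value()` stays unchanged.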


A MeterPU Application: Measure CPU Time

    #include <MeterPU.h>

    int main()
    {
        using namespace MeterPU;
        Meter<CPU_Time> meter;  // select the meter type

        meter.start();  // Measurement start!

        cpu_func();     // Do something here

        meter.stop();   // Measurement stop!

        meter.calc();   // calculate the metric value
        BUILD_CPU_TIME_MODEL(meter.get_value());
    }


A MeterPU Application: Measure GPU Energy

    #include <MeterPU.h>

    int main()
    {
        using namespace MeterPU;
        // Only one line differs!
        Meter<NVML_Energy<> > meter;

        meter.start();  // Measurement start!

        cuda_func<<<..., ...>>>(...);
        cudaDeviceSynchronize();

        meter.stop();   // Measurement stop!

        meter.calc();
        BUILD_GPU_ENERGY_MODEL(meter.get_value());
    }


Measure Combinations of CPUs and GPUs

    #include <MeterPU.h>

    int main()
    {
        using namespace MeterPU;
        // Only one line differs!
        Meter<System_Energy<0> > meter;

        meter.start();  // Measurement start!

        async_cpu_func();
        cuda_func<<<..., ...>>>(...);
        wait_for_cpu_func_to_finish();
        cudaDeviceSynchronize();

        meter.stop();   // Measurement stop!

        meter.calc();
        BUILD_SYSTEM_ENERGY_MODEL(meter.get_value());
    }


Unification → Reuse Legacy Autotuning Framework

Figure 1: Unification allows an empirical autotuning framework to switch to multiple meter types on different hardware components.
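The idea behind Figure 1 can be sketched as a tuning routine that is generic in the meter type, so the same selection logic works whether the meter reports time or energy. The code below is an illustrative sketch only, not SkePU's autotuning framework. `CountingMeter` and `select_backend` are hypothetical names, and `CountingMeter` reads a software counter instead of real hardware so that the example stays deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical meter with the four-function interface from the API slide.
// It counts abstract "cost units" instead of reading real hardware.
class CountingMeter {
public:
    static long& counter() { static long c = 0; return c; }
    void start() { begin_ = counter(); }
    void stop()  { end_ = counter(); }
    void calc()  { value_ = static_cast<double>(end_ - begin_); }
    double const& get_value() const { return value_; }
private:
    long begin_ = 0, end_ = 0;
    double value_ = 0.0;
};

// Measure every candidate backend with the given meter type and return
// the index of the cheapest one. Works with any MeterType offering
// start(), stop(), calc(), and get_value().
template <class MeterType>
std::size_t select_backend(const std::vector<std::function<void()>>& backends)
{
    assert(!backends.empty());
    std::size_t best = 0;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < backends.size(); ++i) {
        MeterType meter;
        meter.start();
        backends[i]();
        meter.stop();
        meter.calc();
        const double cost = static_cast<double>(meter.get_value());
        if (cost < best_cost) { best_cost = cost; best = i; }
    }
    return best;
}
```

Switching the optimization goal then amounts to changing the template argument at the call site, which mirrors the "only one line differs" point made on the earlier slides.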


SkePU Reduce Skeleton (A Single Skeleton Call)

[Figure: (a) Time tuning for Reduce: time (us), average of 100 runs, versus problem size for reduce. (b) Energy tuning for Reduce: energy (mJ), average of 1000 runs, versus problem size for reduce. Series in both panels: CPU, OMP, CUDA, Selection.]

For further results, see the paper.
The empirical autotuning framework built for time optimization is reused for energy optimization.


LU decomposition (Multiple Skeleton Calls)

[Figure: (a) Time tuning for LU decomposition: time (us), average of 3000 runs, versus problem size for LU decomposition. (b) Energy tuning for LU decomposition: energy (mJ), average of 1000 runs, versus problem size for LU decomposition. Series in both panels: CPU, OMP, CUDA, Selection.]

Easy switching for optimization goals.


Related Work

Other measurement abstraction software:
  EML (Cabrera et al., 2015 [2])
  REPARA (Kiss et al., 2015 [3])

MeterPU:
  Requires far fewer lines of code (LOC) to use.
  Negligible overhead.
    Only one function call.
    Can even be inlined for some meter types.


Pluggable to Other Work

Some MeterPU Use Cases
  Energy-tuned SkePU
  Energy comparison between SkePU’s Myriad Backend and Nvidia GPU
  Global Composition Framework
  TunePU
  ...


Summary of Contributions

MeterPU hides the complexity of platform-specific energy measurement, especially for GPUs.
MeterPU enables reuse of legacy empirical autotuning frameworks, such as the one in SkePU.
With MeterPU, SkePU offers the first energy-tuned skeletons, as far as we know.
Switching the optimization goal becomes easy, which facilitates building time and energy models for multi-objective optimization.

Open source, download at: http://www.ida.liu.se/labs/pelab/meterpu


Bibliography

[1] Martin Burtscher, Ivan Zecena, and Ziliang Zong. Measuring GPU power with the K20 built-in sensor. In Proc. Workshop on General Purpose Processing Using GPUs (GPGPU-7). ACM, March 2014.

[2] Alberto Cabrera, Francisco Almeida, Javier Arteaga, and Vicente Blanco. Energy measurement library (EML) usage and overhead analysis. In 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 554–558, March 2015.

[3] Akos Kiss, Marco Danelutto, Zoltan Herczeg, Akos Kiss, Peter Molnar, Robert Sipka, Massimo Torquati, and Laszlo Vidacs. D6.4: REPARA performance and energy monitoring library. Technical report, © The REPARA Consortium, March 2015.

[4] Lu Li and Christoph Kessler. Validating Energy Compositionality of GPU Computations. In Proc. HIPEAC Workshop on Energy Efficiency with Heterogeneous Computing (EEHCO-2015), January 2015.

