The price performance of performance models



Page 1: The price performance of performance models

11/2/21 | Technical University of Darmstadt, Germany | Felix Wolf | 1

Felix Wolf, Technical University of Darmstadt

ModSim 2021

The price performance of performance models

Application System

Photo: Alex Becker / TU Darmstadt

Page 2: The price performance of performance models


Acknowledgement

TU Darmstadt

• Alexandru Calotoiu

• Alexander Geiß

• Alexander Graf

• Daniel Lorenz

• Benedikt Naumann

• Thorsten Reimann

• Marcus Ritter

• Sergei Shudler

ETH Zurich

• Alexandru Calotoiu

• Marcin Copik

• Tobias Grosser

• Torsten Hoefler

• Nicolas Wicki

LLNL

• David Beckingsale

• Christopher Earl

• Ian Karlin

• Martin Schulz

Page 3: The price performance of performance models


Scaling your code can harbor performance surprises*…

[Figure: two panels, each divided into Communication and Computation]

*Goldsmith et al., 2007

Page 4: The price performance of performance models


Performance model

[Figure: plot of the model $3 \cdot 10^{-4} p^2 + c$; x-axis: Processes ($2^9$ to $2^{13}$); y-axis: Time [s] (0 to 21)]

Formula that expresses a relevant performance metric as a function of one or more execution parameters

Identify kernels

• Incomplete coverage

Create models

• Laborious, difficult

Analytical (i.e., manual) creation challenging for entire programs

$t = f(p)$

$t = 3 \cdot 10^{-4} p^2 + c$

Page 5: The price performance of performance models


Empirical performance modeling

Performance measurements with different execution parameters x1,...,xn

Measurements: $t_1, t_2, t_3, \ldots, t_{n-2}, t_{n-1}, t_n$

Machine learning

$t = f(x_1, \ldots, x_n)$

Alternative metrics: FLOPs, data volume…
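As a minimal sketch of the idea on this slide, the snippet below fits one fixed model shape, $t = c_0 + c_1 \cdot p^2$, to a handful of measurements by ordinary least squares. The measurement values are synthetic and hypothetical (generated from the deck's example $3 \cdot 10^{-4} p^2 + c$ with an assumed constant of 2), and the helper name `fit_two_coefficients` is an illustration, not part of Extra-P.

```python
def fit_two_coefficients(xs, ts, basis):
    """Least-squares fit of t = c0 + c1 * basis(x) for one fixed basis function."""
    b = [basis(x) for x in xs]
    n = len(xs)
    mean_b = sum(b) / n
    mean_t = sum(ts) / n
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    cov = sum((bi - mean_b) * (ti - mean_t) for bi, ti in zip(b, ts))
    c1 = cov / var_b
    c0 = mean_t - c1 * mean_b
    return c0, c1


# Hypothetical, noise-free measurements (assumption: generated from
# t = 3e-4 * p^2 + 2 purely for illustration).
processes = [512, 1024, 2048, 4096, 8192]
times = [3e-4 * p ** 2 + 2.0 for p in processes]

c0, c1 = fit_two_coefficients(processes, times, lambda p: p ** 2)
print(f"t(p) = {c0:.3f} + {c1:.2e} * p^2")
```

With real measurements the model shape is not known in advance, which is why the deck goes on to search over a family of candidate shapes.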

Page 6: The price performance of performance models


Challenges

Applications

System

Run-to-run variation / noise

Cost of the required experiments

Page 7: The price performance of performance models


How to deal with noisy data

• Introduce prior into learning process

• Assumption about the probability distribution generating the data

Time ~ Effort:

• Computation

• Memory access

• Communication

• I/O

Page 8: The price performance of performance models

Typical algorithmic complexities in HPC

| Algorithm | Computation | Communication |
|---|---|---|
| Samplesort | $t(p) \sim p^2$ | $t(p) \sim p^2 \log_2^2(p)$ |
| Naïve N-body | $t(p) \sim p$ | $t(p) \sim p$ |
| FFT | $t(p) \sim c$ | $t(p) \sim \log_2(p)$ |
| LU | $t(p) \sim c$ | $t(p) \sim c$ |
| … | … | … |

Page 9: The price performance of performance models


Performance model normal form (PMNF)

$f(x) = \sum_{k=1}^{n} c_k \cdot x^{i_k} \cdot \log_2^{j_k}(x)$

Single parameter [Calotoiu et al., SC13]

$f(x_1, \ldots, x_m) = \sum_{k=1}^{n} c_k \prod_{l=1}^{m} x_l^{i_{k_l}} \cdot \log_2^{j_{k_l}}(x_l)$

Multiple parameters [Calotoiu et al., Cluster’16]

Heuristics to reduce search space

Parameter selection

Search space configuration

Linear regression + cross-validation

Quality assurance
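The search over PMNF hypotheses can be sketched for the simplest case: one parameter and one non-constant term, $c_0 + c_1 \cdot x^{i} \cdot \log_2^{j}(x)$. Each candidate is fitted by linear regression and ranked by leave-one-out cross-validation, as the slide suggests. The exponent sets, the relative squared error, and all function names below are assumptions chosen for illustration; they are not Extra-P's actual defaults or API.

```python
import math
from itertools import product


def fit(xs, ts, term):
    """Least-squares fit of t = c0 + c1 * term(x)."""
    b = [term(x) for x in xs]
    n = len(xs)
    mean_b, mean_t = sum(b) / n, sum(ts) / n
    var_b = sum((v - mean_b) ** 2 for v in b)
    c1 = sum((v - mean_b) * (t - mean_t) for v, t in zip(b, ts)) / var_b
    return mean_t - c1 * mean_b, c1


def loo_error(xs, ts, term):
    """Mean squared relative error under leave-one-out cross-validation."""
    total = 0.0
    for k in range(len(xs)):
        c0, c1 = fit(xs[:k] + xs[k + 1:], ts[:k] + ts[k + 1:], term)
        prediction = c0 + c1 * term(xs[k])
        total += ((prediction - ts[k]) / ts[k]) ** 2
    return total / len(xs)


def search(xs, ts, exponents=(0, 0.5, 1, 1.5, 2, 3), log_exponents=(0, 1, 2)):
    """Return (i, j, c0, c1) of the hypothesis with the lowest LOO error."""
    best = None
    for i, j in product(exponents, log_exponents):
        if i == 0 and j == 0:
            continue  # a pure constant is already covered by c0
        term = lambda x, i=i, j=j: x ** i * math.log2(x) ** j
        error = loo_error(xs, ts, term)
        if best is None or error < best[0]:
            best = (error, i, j, term)
    _, i, j, term = best
    c0, c1 = fit(xs, ts, term)
    return i, j, c0, c1


# Hypothetical noise-free data generated from t = 5 + 0.1 * p * log2(p).
xs = [4, 8, 16, 32, 64]
ts = [5 + 0.1 * x * math.log2(x) for x in xs]
i, j, c0, c1 = search(xs, ts)
print(f"best hypothesis: {c0:.2f} + {c1:.2f} * p^{i} * log2(p)^{j}")
```

On noisy data several hypotheses can score similarly, which is where the heuristics and quality-assurance steps listed on the slide come in.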

Page 10: The price performance of performance models


Extra-P 4.0

Available at: https://github.com/extra-p/extrap

Page 11: The price performance of performance models

MPI implementations

[Shudler et al., IEEE TPDS 2019]

| Platform | Juqueen | Juropa | Piz Daint |
|---|---|---|---|
| Allreduce [s], Expectation: $O(\log p)$ | | | |
| Model | $O(\log p)$ | $O(p^{0.5})$ | $O(p^{0.67} \log p)$ |
| $R^2$ | 0.87 | 0.99 | 0.99 |
| Match | ✔ | ~ | ✘! |
| Comm_dup [B], Expectation: $O(1)$ | | | |
| Model | 2.2e5 | 256 | 3770 + 18p |
| $R^2$ | 1 | 1 | 0.99 |
| Match | ✔ | ✔ | ✘ |

Page 12: The price performance of performance models

Kripke - example w/ multiple parameters

SweepSolver

Main computation kernel

Expectation – Performance depends on problem size: $t \sim d \cdot g$

Actual model: $t = 5 + d \cdot g + 0.005 \cdot \sqrt[3]{p} \cdot d \cdot g$

MPI_Testany

Main communication kernel: 3D wave-front communication pattern

Expectation – Performance depends on cubic root of process count: $t \sim \sqrt[3]{p}$

Actual model: $t = 7 + \sqrt[3]{p} + 0.005 \cdot \sqrt[3]{p} \cdot d \cdot g$

Kernels must wait on each other

Smaller compounded effect discovered

*Coefficients have been rounded for convenience

Page 13: The price performance of performance models

Experiments can be expensive

Need $5^m$ experiments, $m$ = #parameters

[Figure: grid of measurement points; x-axis: Processes (4, 8, 16, 32, 64); y-axis: Problem size per process (10, 20, 30, 40, 50); annotated regions: Low memory, High jitter]

Page 14: The price performance of performance models

Multi-parameter modeling in Extra-P

Generation of candidate models and selection of best fit

Find best single-parameter model

Combine them in the most plausible way (+, *, none)

$\mathit{Time} = c_1 \cdot n \cdot \log(n) \cdot p$

$\mathit{FLOPs} = c_2 \cdot n \cdot \log(n)$

Candidate models:

$c_1 + c_2 \cdot p$

$c_1 + c_2 \cdot p^2$

$c_1 + c_2 \cdot \log(p)$

$c_1 + c_2 \cdot p \cdot \log(p)$

$c_1 + c_2 \cdot p^2 \cdot \log(p)$

$c_1 \cdot \log(p) + c_2 \cdot p$

$c_1 \cdot \log(p) + c_2 \cdot p \cdot \log(p)$

$c_1 \cdot \log(p) + c_2 \cdot p^2$

$c_1 \cdot \log(p) + c_2 \cdot p^2 \cdot \log(p)$

$c_1 \cdot p + c_2 \cdot p \cdot \log(p)$

$c_1 \cdot p + c_2 \cdot p^2$

$c_1 \cdot p + c_2 \cdot p^2 \cdot \log(p)$

$c_1 \cdot p \cdot \log(p) + c_2 \cdot p^2$

$c_1 \cdot p \cdot \log(p) + c_2 \cdot p^2 \cdot \log(p)$

$c_1 \cdot p^2 + c_2 \cdot p^2 \cdot \log(p)$
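The candidate list on this slide follows a simple pattern: five building-block terms, each either paired with a constant or paired with one other term. A small sketch that enumerates the same fifteen shapes as strings is shown below; the term set is read off the slide, while the function name and the string representation are illustrative assumptions.

```python
from itertools import combinations

# Building-block terms as they appear in the slide's candidate list.
TERMS = ["log(p)", "p", "p * log(p)", "p^2", "p^2 * log(p)"]


def candidate_hypotheses(terms=TERMS):
    """Enumerate two-coefficient hypotheses: constant + term, or term + term."""
    constant_plus_term = [f"c1 + c2 * {t}" for t in terms]
    term_plus_term = [f"c1 * {a} + c2 * {b}" for a, b in combinations(terms, 2)]
    return constant_plus_term + term_plus_term


hypotheses = candidate_hypotheses()
for h in hypotheses:
    print(h)
```

Each hypothesis would then be fitted to the measurements and the best-fitting one kept; that fitting step is omitted here.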

Page 15: The price performance of performance models

How many data points do we really need?

[Figure: grid of measurement points; x-axis: Processes (4, 8, 16, 32, 64); y-axis: Problem size per process (10, 20, 30, 40, 50)]
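To put numbers on the question, the sketch below compares a full grid with five values per parameter against a sparse layout that measures one line of five points per parameter, all lines sharing a common base point. The sparse layout is an assumption made for illustration; it happens to reproduce the smallest measurement counts (9, 13, 17) shown in the synthetic evaluation figures later in the deck.

```python
def full_grid_points(num_params, values_per_param=5):
    """Every combination of parameter values is measured."""
    return values_per_param ** num_params


def sparse_min_points(num_params, values_per_param=5):
    """One line of points per parameter; the lines share a single base point."""
    return values_per_param + (num_params - 1) * (values_per_param - 1)


for m in range(1, 5):
    print(m, full_grid_points(m), sparse_min_points(m))
```

The full grid grows exponentially with the number of parameters, while the sparse layout grows linearly.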

Page 16: The price performance of performance models

Learning cost-effective sampling strategies

[Ritter et al., IPDPS’20]

Function generator

Noise module

Reinforcement learning agent

Selected parameter values

Synthetic measurement

Extra-P

Evaluation

Feedback

Prediction

Ground truth

Empirical model

Page 17: The price performance of performance models

Heuristic parameter-value selection strategy

Measure min. amount of points required for modeling

Create a model using Extra-P

If cost < budget?

yes: Gather one additional measurement (assumed to be cheapest), create new model, check budget again

no: Final model
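The flow on this slide can be sketched as a loop. Everything passed in (`cost`, `measure`, `create_model`) is a hypothetical callback standing in for real experiments and for Extra-P; the sketch also assumes that a measurement is only taken if it still fits into the budget, which is slightly stricter than the slide's "cost < budget" check.

```python
def heuristic_modeling(minimal_set, extra_candidates, budget, cost, measure, create_model):
    """Start from the minimal measurement set, then add the cheapest extra
    points one at a time, remodeling after each, until the budget is used up."""
    measurements = {point: measure(point) for point in minimal_set}
    spent = sum(cost(point) for point in minimal_set)
    model = create_model(measurements)

    remaining = sorted(extra_candidates, key=cost)  # cheapest first
    while remaining and spent + cost(remaining[0]) <= budget:
        point = remaining.pop(0)
        measurements[point] = measure(point)
        spent += cost(point)
        model = create_model(measurements)  # new model with the extra point
    return model, measurements, spent


# Toy setup: points are (processes, problem size); cost is assumed to be
# proportional to processes * size; the "model" is a stand-in that only
# counts the measurements it was built from.
minimal = [(4, 10), (8, 10), (16, 10), (4, 20), (4, 30)]
extras = [(8, 20), (16, 30), (8, 30)]
model, measurements, spent = heuristic_modeling(
    minimal, extras, budget=700,
    cost=lambda pt: pt[0] * pt[1],
    measure=lambda pt: 1.0 + 0.01 * pt[0] * pt[1],
    create_model=lambda data: len(data),
)
print(model, spent)
```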

Page 18: The price performance of performance models


Synthetic data evaluation

[Figure: panels Training and Evaluation; labels: ±5%, ±10%, ±15%, ±20%]

Page 19: The price performance of performance models

Synthetic evaluation results

[Figure: panels: 1 parameter, 5% noise; 2 parameters, 5% noise; y-axis: Percentage of accurate models* (20% to 100%); x-axis: Measurements used / Percentage of cost (5 / 100%, 9 / 12%, 11 / 13%, 15 / 17%, 25 / 100%); series: Repetitions 2, 4, 6]

Page 20: The price performance of performance models

Synthetic evaluation results

[Figure: 3 parameters, 5% noise; y-axis: Percentage of accurate models* (20% to 100%); x-axis: Measurements used / Percentage of cost (13 / 1.7%, 15 / 1.8%, 25 / 2.2%, 75 / 11%, 125 / 100%); series: Repetitions 2, 4, 6]

Page 21: The price performance of performance models

Synthetic evaluation results

[Figure: 4 parameters, 5% noise; y-axis: Percentage of accurate models* (20% to 100%); x-axis: Measurements used / Percentage of cost (17 / 0.1%, 18 / 0.12%, 25 / 0.17%, 125 / 1.2%, 250 / 4.2%, 625 / 100%); series: Repetitions 2, 4, 6]

Page 22: The price performance of performance models

Synthetic evaluation results

[Figure: 4 parameters, 1% noise; y-axis: Percentage of accurate models* (20% to 100%); x-axis: Measurements used / Percentage of cost (17 / 0.1%, 18 / 0.12%, 25 / 0.17%, 125 / 1.2%, 250 / 4.2%, 625 / 100%); series: Repetitions 2, 4, 6]

Page 23: The price performance of performance models

Case studies

| Application | #Parameters | Extra points | Cost savings [%] | Prediction error [%] |
|---|---|---|---|---|
| FASTEST | 2 | 0 | 70 | 2 |
| Kripke | 3 | 3 | 99 | 39 |
| Relearn | 2 | 0 | 85 | 11 |

Page 24: The price performance of performance models

Noise-resilient performance modeling

[Ritter et al., IPDPS’21]

• Performance measurements frequently affected by noise

• Regression struggles with increased amounts of noise – especially w/ more parameters

• Neural networks are resilient to noise – when noise is part of their training

[Figure: Original vs. Reconstructed. 1 parameter (x): 100% application behavior + noise. 3 parameters (x, y, z): 100% application behavior + noise]

Adapted from: https://developer.nvidia.com/blog/ai-can-now-fix-your-grainy-photos-by-only-looking-at-grainy-photos/

Page 25: The price performance of performance models


Noise-resilient adaptive modeling

DNNs often better at guessing models in the presence of noise

Performance measurements Estimate noise

Noise low?

Final model

Classic (sparse) modeler

Select best

DNN modeler

Performance model

yes

no

Performance model

Transfer learning

Adapted. Original © 2021 IEEE
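A rough sketch of the adaptive flow follows: estimate the noise, use only the classic regression modeler when noise is low, and otherwise build both models and keep the better one. The noise estimate used here (largest relative deviation of repetitions from their mean, averaged over measurement points), the `error` callback used for "select best", and the stand-in modelers are all assumptions for illustration; the 5% threshold echoes the case-study slide of this deck, which states that only the regression modeler is used below 5% noise.

```python
from statistics import mean


def estimate_noise(repetitions_per_point):
    """Average, over measurement points, of the largest relative deviation
    of any repetition from the mean of its point (an assumed noise metric)."""
    levels = []
    for reps in repetitions_per_point:
        m = mean(reps)
        levels.append(max(abs(r - m) for r in reps) / m)
    return mean(levels)


def adaptive_model(repetitions_per_point, regression_modeler, dnn_modeler,
                   error, threshold=0.05):
    """Return (model, noise). Low noise: regression only. Otherwise: best of both."""
    noise = estimate_noise(repetitions_per_point)
    regression = regression_modeler(repetitions_per_point)
    if noise < threshold:
        return regression, noise
    dnn = dnn_modeler(repetitions_per_point)
    return min((regression, dnn), key=error), noise


# Stand-in modelers and error scores, purely hypothetical.
scores = {"regression model": 0.30, "dnn model": 0.10}
quiet = [[10.0, 10.1, 9.9], [20.0, 20.2, 19.8]]
noisy = [[10.0, 13.0, 7.0], [20.0, 26.0, 14.0]]

quiet_model, quiet_noise = adaptive_model(
    quiet, lambda d: "regression model", lambda d: "dnn model", scores.get)
noisy_model, noisy_noise = adaptive_model(
    noisy, lambda d: "regression model", lambda d: "dnn model", scores.get)
print(quiet_model, round(quiet_noise, 3))
print(noisy_model, round(noisy_noise, 3))
```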

Page 26: The price performance of performance models

Noise-resilient performance modeling

Synthetic evaluation

[Figure: 2 parameters; x-axis: Noise level (20%, 50%, 100%); y-axis: Median relative error (0% to 30%), at unseen point, two ticks in each dimension; series: Adaptive, Regression]

Adapted. Original © 2021 IEEE

Page 27: The price performance of performance models

Noise-resilient performance modeling

Case studies – Results

[Figure: y-axis: Median relative error (0% to 80%); applications: Kripke (Vulcan), FASTEST (SuperMUC), RELeARN; series: Adaptive, Regression; data labels: 13.45%, 16.23%, 7.12%, 22.28%, 69.79%, 7.12%]

Noise < 5%: only the regression modeler is used

Adapted. Original © 2021 IEEE

Page 28: The price performance of performance models

Noise-resilient performance modeling

Case studies – Noise

Adapted. Original © 2021 IEEE

Page 29: The price performance of performance models

Optimized measurement point selection

[Naumann et al., in preparation]

Sparse Better?

Page 30: The price performance of performance models

Optimized measurement point selection

via Gaussian Process Regression (GPR) – Introduction

From: Leclercq, Florent. (2018). Bayesian optimization for likelihood-free cosmological inference. Physical Review D. 98. 10.1103/PhysRevD.98.063511. Used with permission.

Page 31: The price performance of performance models

Optimized measurement point selection

Gaussian Processes

Gaussian Process (GP): Series of normal distributions

$GP(t) = N(m(t), c(t, t'))$

Mean function: $m(t)$

Covariance function: $c(t, t')$

Page 32: The price performance of performance models

Optimized measurement point selection

Gaussian Processes

Idea: Use covariance function

High variance

High uncertainty

Promising measurement candidates

Measurement costs have to be considered!

Cost-benefit calculation
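One way to picture the cost-benefit idea is a one-dimensional Gaussian process whose posterior variance is high far away from existing measurements. The sketch below computes that variance with a squared-exponential covariance function and picks the candidate with the best variance-per-cost ratio. The kernel choice, the length scale, the ratio as selection criterion, and the cost function are all assumptions for illustration and not necessarily what the work in preparation uses.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance function c(t, t')."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def posterior_variance(x_new, measured, noise=1e-6):
    """GP posterior variance at x_new given the measured locations."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(measured)] for i, a in enumerate(measured)]
    k_star = [rbf(x_new, a) for a in measured]
    weights = solve(K, k_star)
    return rbf(x_new, x_new) - sum(w * k for w, k in zip(weights, k_star))


def pick_next(candidates, measured, cost):
    """Cost-benefit rule (assumed): maximize posterior variance per unit cost."""
    return max(candidates, key=lambda x: posterior_variance(x, measured) / cost(x))


# Locations are log2 of the process count; cost is assumed to grow with 2^x.
measured = [2.0, 3.0, 4.0]
chosen = pick_next([3.0, 10.0], measured, cost=lambda x: 2 ** x)
print(chosen)
```

A point that has already been measured has almost no remaining variance, so it loses against an unexplored point even when the unexplored point is far more expensive.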

Page 33: The price performance of performance models

Results

Synthetic evaluation (3 parameters)

[Figure: % of accurate models (axis 86 to 100) at 5%, 7.5%, and 10% noise; series: Sparse, GPR, Hybrid. Relative error < 20% (at unseen point, 2 ticks in each direction)]

[Figure: Additional measurements (axis 0 to 60) at 5%, 7.5%, and 10% noise; series: Sparse, GPR, Hybrid. Measurements needed (in addition to the minimal measurement set)]

Page 34: The price performance of performance models

Results

[Figure: % of budget used (axis 0 to 40) at 5%, 7.5%, and 10% noise; series: Sparse, GPR, Hybrid. Budget needed (compared to complete measurement set)]

Page 35: The price performance of performance models

Parameter selection

[Copik et al., PPoPP’21]

• The more parameters, the more experiments

• Modeling parameters without performance impact is harmful

Taint analysis

Program

Input parameters

Which parameter influences which function?

Taint labels
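A toy illustration of the taint idea: values carry labels naming the input parameters they were derived from, and whenever such a value controls a loop bound, the labels are recorded for the enclosing function. This is only a hand-rolled sketch in plain Python; the `Tainted` class, the `loop` helper, and the example program are hypothetical and unrelated to the actual tooling behind the cited paper.

```python
class Tainted:
    """A number that remembers which input parameters influenced it."""

    def __init__(self, value, labels):
        self.value = value
        self.labels = frozenset(labels)

    def _combine(self, other, op):
        if isinstance(other, Tainted):
            return Tainted(op(self.value, other.value), self.labels | other.labels)
        return Tainted(op(self.value, other), self.labels)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)


influences = {}  # function name -> labels that reached one of its loop bounds


def loop(function_name, bound):
    """Record the taint labels of a loop bound, then behave like range()."""
    influences.setdefault(function_name, set()).update(bound.labels)
    return range(bound.value)


def program(n, iterations, verbose):
    for _ in loop("compute", n * n):
        pass
    for _ in loop("exchange", iterations + 1):
        pass
    # 'verbose' never reaches a loop bound, so it gets no entry.


program(Tainted(4, {"n"}), Tainted(3, {"iterations"}), Tainted(1, {"verbose"}))
relevant = set().union(*influences.values())
print(influences, relevant)
```

Parameters that never show up in `relevant` are candidates to drop from the experiment design, which reduces the number of required measurements.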

Page 36: The price performance of performance models

Selected papers

| Topic | Bibliography |
|---|---|
| Foundation (single model parameter) | Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. SC13. |
| MPI case study | Sergei Shudler, Yannick Berens, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations. IEEE TPDS, 30(8):1768–1785, 2019. |
| Multiple model parameters | Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. IEEE Cluster 2016. |
| Cost-effective sampling strategies | Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. IPDPS 2020. |
| Noise resilience | Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. IPDPS 2021. |
| Taint-based performance modeling | Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler: Extracting Clean Performance Models from Tainted Programs. PPoPP 2021. |

Page 37: The price performance of performance models


Thank you!

Q&A