
11/2/21 | Technical University of Darmstadt, Germany | Felix Wolf | 1

Felix Wolf, Technical University of Darmstadt | ModSim 2021

The price performance of performance models

[Title slide graphic: application and system. Photo: Alex Becker / TU Darmstadt]


Acknowledgement

TU Darmstadt

• Alexandru Calotoiu

• Alexander Geiß

• Alexander Graf

• Daniel Lorenz

• Benedikt Naumann

• Thorsten Reimann

• Marcus Ritter

• Sergei Shudler

ETH Zurich

• Alexandru Calotoiu

• Marcin Copik

• Tobias Grosser

• Torsten Hoefler

• Nicolas Wicki

LLNL

• David Beckingsale

• Christopher Earl

• Ian Karlin

• Martin Schulz


Scaling your code can harbor performance surprises*…

[Diagram: the ratio of communication to computation shifts as the code scales]

*Goldsmith et al., 2007


Performance model

[Plot: time in s (0–21) over processes (2⁹–2¹³), following 3·10⁻⁴ · p² + c]

Formula that expresses a relevant performance metric as a function of one or more execution parameters

Analytical (i.e., manual) creation is challenging for entire programs:

• Identify kernels – incomplete coverage

• Create models – laborious, difficult

t = f(p) = 3·10⁻⁴ · p² + c


Empirical performance modeling

Performance measurements with different execution parameters x1,...,xn

t1, t2, t3, …, tn-2, tn-1, tn

Machine learning

t = f(x1, …, xn)

Alternative metrics: FLOPs, data volume…
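As a minimal sketch of this idea (synthetic data and plain least squares, not Extra-P's actual machinery), the scaling exponent of noisy timings can be recovered by linear regression in log-log space:

```python
# Synthetic sketch: recover the exponent of t(p) = a * p^b from noisy
# timings by linear regression in log-log space. Data and noise level
# are made up; this is not Extra-P's algorithm.
import numpy as np

def fit_power_law(p, t):
    """Fit t = a * p^b; returns (a, b)."""
    X = np.vstack([np.ones_like(p, dtype=float), np.log(p)]).T
    (log_a, b), *_ = np.linalg.lstsq(X, np.log(t), rcond=None)
    return np.exp(log_a), b

rng = np.random.default_rng(0)
p = np.array([2.0 ** k for k in range(9, 14)])          # 512 ... 8192 processes
t = 3e-4 * p ** 2 * rng.normal(1.0, 0.02, size=p.size)  # noisy quadratic time
a, b = fit_power_law(p, t)                              # b should be close to 2
```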


Challenges

• Applications / system

• Run-to-run variation / noise

• Cost of the required experiments


How to deal with noisy data

• Introduce a prior into the learning process: an assumption about the probability distribution generating the data

• Computation, memory access, communication, I/O

[Diagram axes: time, effort]


Typical algorithmic complexities in HPC

Algorithm     | Computation | Communication
Samplesort    | t(p) ~ p²   | t(p) ~ p² log₂²(p)
Naïve N-body  | t(p) ~ p    | t(p) ~ p
FFT           | t(p) ~ c    | t(p) ~ log₂(p)
LU            | t(p) ~ c    | t(p) ~ c
…             | …           | …


Performance model normal form (PMNF)

Single parameter [Calotoiu et al., SC13]:

f(x) = Σ_{k=1}^{n} c_k · x^{i_k} · log₂^{j_k}(x)

Multiple parameters [Calotoiu et al., Cluster’16]:

f(x1, …, xm) = Σ_{k=1}^{n} c_k · Π_{l=1}^{m} x_l^{i_kl} · log₂^{j_kl}(x_l)

Workflow: parameter selection → search space configuration → linear regression + cross-validation → quality assurance, with heuristics to reduce the search space
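A toy version of this search illustrates the mechanics. The exponent sets, the restriction to one PMNF term plus a constant, and the noise-free data below are simplifying assumptions; Extra-P's real search space and quality assurance are far richer:

```python
# Toy single-parameter PMNF search: try each term p^i * log2(p)^j from a
# small assumed exponent set, fit coefficients by linear regression, and
# keep the hypothesis with the lowest leave-one-out cross-validation error.
import itertools
import numpy as np

EXP_I = [0, 0.5, 1, 2, 3]   # candidate polynomial exponents i_k
EXP_J = [0, 1, 2]           # candidate logarithmic exponents j_k

def design(p, i, j):
    # columns: constant c0 and one PMNF term p^i * log2(p)^j for c1
    return np.vstack([np.ones_like(p), p ** i * np.log2(p) ** j]).T

def loo_error(p, t, i, j):
    # leave-one-out cross-validation error of hypothesis (i, j)
    err = 0.0
    for k in range(p.size):
        mask = np.arange(p.size) != k
        coef, *_ = np.linalg.lstsq(design(p[mask], i, j), t[mask], rcond=None)
        pred = design(p[k:k + 1], i, j) @ coef
        err += float((pred[0] - t[k]) ** 2)
    return err

def fit_pmnf(p, t):
    hypotheses = [(i, j) for i, j in itertools.product(EXP_I, EXP_J)
                  if (i, j) != (0, 0)]
    best = min(hypotheses, key=lambda ij: loo_error(p, t, *ij))
    coef, *_ = np.linalg.lstsq(design(p, *best), t, rcond=None)
    return best, coef

p = np.array([2.0 ** k for k in range(9, 14)])   # 512 ... 8192 processes
t = 4.0 + 3e-4 * p ** 2                          # t = c + 3*10^-4 * p^2
(i, j), coef = fit_pmnf(p, t)                    # recovers i = 2, j = 0
```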


Extra-P 4.0

Available at: https://github.com/extra-p/extrap


MPI implementations [Shudler et al., IEEE TPDS 2019]

Allreduce [s] – Expectation: O(log p)

Juqueen:   Model O(log p),          R² 0.87, Match ✔
Juropa:    Model O(p^0.5),          R² 0.99, Match ~
Piz Daint: Model O(p^0.67 log p),   R² 0.99, Match ✘

Comm_dup [B] – Expectation: O(1)

Juqueen:   Model 2.2e5,        R² 1,    Match ✔
Juropa:    Model 256,          R² 1,    Match ✔
Piz Daint: Model 3770 + 18p,   R² 0.99, Match ✘


Kripke – example w/ multiple parameters

SweepSolver – main computation kernel

Expectation: performance depends on problem size, t ~ d · g

Actual model*: t = 5 + d · g + 0.005 · ∛p · d · g

MPI_Testany – main communication kernel: 3D wave-front communication pattern

Expectation: performance depends on the cubic root of the process count, t ~ ∛p

Actual model*: t = 7 + ∛p + 0.005 · ∛p · d · g

Kernels must wait on each other – a smaller compounded effect discovered

*Coefficients have been rounded for convenience


Experiments can be expensive

Need 5^m experiments, m = #parameters

[Plot: problem size per process (10–50) over processes (4–64); low memory at one extreme, high jitter at the other]


Multi-parameter modeling in Extra-P

Find the best single-parameter model for each parameter, then combine them in the most plausible way (+, *, none):

Time = c1 · n · log(n) · p

FLOPs = c2 · n · log(n)

Generation of candidate models and selection of best fit:

c1 + c2 · p
c1 + c2 · p²
c1 + c2 · log(p)
c1 + c2 · p · log(p)
c1 + c2 · p² · log(p)
c1 · log(p) + c2 · p
c1 · log(p) + c2 · p · log(p)
c1 · log(p) + c2 · p²
c1 · log(p) + c2 · p² · log(p)
c1 · p + c2 · p · log(p)
c1 · p + c2 · p²
c1 · p + c2 · p² · log(p)
c1 · p · log(p) + c2 · p²
c1 · p · log(p) + c2 · p² · log(p)
c1 · p² + c2 · p² · log(p)


How many data points do we really need?

[Plot: measurement grid of problem size per process (10–50) over processes (4–64)]


Learning cost-effective sampling strategies [Ritter et al., IPDPS’20]

A reinforcement learning agent selects parameter values; a function generator with a noise module produces synthetic measurements; Extra-P builds an empirical model; the evaluation compares the prediction with the ground truth and returns feedback to the agent.


Heuristic parameter-value selection strategy

1. Measure the minimum number of points required for modeling

2. Create a model using Extra-P

3. If cost < budget: gather one additional measurement (assumed to be the cheapest) and create a new model; repeat step 3

4. Otherwise: final model
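The loop above can be sketched as follows; the callback names (`cost`, `measure`, `remodel`) are hypothetical stand-ins for the corresponding Extra-P steps:

```python
# Sketch of the budget-constrained loop above. `cost`, `measure`, and
# `remodel` are hypothetical callbacks standing in for the real steps.
def model_within_budget(candidates, cost, measure, remodel, budget):
    """Greedily add the cheapest remaining configuration while it fits."""
    spent = 0.0
    model = remodel()                         # model from the minimal point set
    for cfg in sorted(candidates, key=cost):  # always try the cheapest next
        if spent + cost(cfg) > budget:
            break
        measure(cfg)                          # gather one additional point
        spent += cost(cfg)
        model = remodel()                     # create a new model
    return model

points = []                                   # measured configurations
cand = [(4, 50), (8, 40), (16, 30)]           # (processes, problem size)
cost = lambda cfg: cfg[0] * cfg[1]            # toy cost proxy: core-hours
model = model_within_budget(cand, cost, points.append,
                            lambda: len(points), budget=600)
# only the two cheapest configurations fit into the budget of 600
```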


Synthetic data evaluation

[Diagram: synthetic functions perturbed by ±5%, ±10%, ±15%, ±20% noise, split into training and evaluation sets]


Synthetic evaluation results

[Bar chart: percentage of accurate models (20–100%) for 1 parameter, 5% noise and 2 parameters, 5% noise; measurements used / percentage of cost: 5 / 100%, 9 / 12%, 11 / 13%, 15 / 17%, 25 / 100%; repetitions: 2, 4, 6]


Synthetic evaluation results

[Bar chart: percentage of accurate models for 3 parameters, 5% noise; measurements used / percentage of cost: 13 / 1.7%, 15 / 1.8%, 25 / 2.2%, 75 / 11%, 125 / 100%; repetitions: 2, 4, 6]


Synthetic evaluation results

[Bar chart: percentage of accurate models for 4 parameters, 5% noise; measurements used / percentage of cost: 17 / 0.1%, 18 / 0.12%, 25 / 0.17%, 125 / 1.2%, 250 / 4.2%, 625 / 100%; repetitions: 2, 4, 6]


Synthetic evaluation results

[Bar chart: percentage of accurate models for 4 parameters, 1% noise; measurements used / percentage of cost: 17 / 0.1%, 18 / 0.12%, 25 / 0.17%, 125 / 1.2%, 250 / 4.2%, 625 / 100%; repetitions: 2, 4, 6]


Case studies

Application | #Parameters | Extra points | Cost savings [%] | Prediction error [%]
FASTEST     | 2           | 0            | 70               | 2
Kripke      | 3           | 3            | 99               | 39
Relearn     | 2           | 0            | 85               | 11


Noise-resilient performance modeling [Ritter et al., IPDPS’21]

• Performance measurements are frequently affected by noise

• Regression struggles with increased amounts of noise – especially w/ more parameters

• Neural networks are resilient to noise – when noise is part of their training

[Illustration: original vs. reconstructed behavior for 1 and 3 parameters, separating 100% application behavior from noise. Adapted from: https://developer.nvidia.com/blog/ai-can-now-fix-your-grainy-photos-by-only-looking-at-grainy-photos/]


Noise-resilient adaptive modeling

DNNs are often better at guessing models in the presence of noise.

Performance measurements → estimate noise. If the noise is low, the classic (sparse) modeler produces a performance model; otherwise the DNN modeler (trained via transfer learning) does. The best of the candidate models is selected as the final model.

Adapted. Original © 2021 IEEE
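A minimal sketch of this routing decision, assuming noise is estimated as the mean coefficient of variation across repeated runs and using a hypothetical 5% threshold:

```python
# Sketch of the adaptive routing. The 5% threshold and the coefficient-of-
# variation noise estimate are assumptions for illustration.
import statistics

def estimate_noise(repetitions):
    """Mean coefficient of variation across repeated runs per configuration."""
    cvs = [statistics.stdev(r) / statistics.mean(r) for r in repetitions]
    return statistics.mean(cvs)

def choose_modeler(repetitions, threshold=0.05):
    # low noise -> classic sparse modeler, high noise -> DNN modeler
    return "sparse" if estimate_noise(repetitions) < threshold else "dnn"

runs = [[1.00, 1.01, 0.99], [2.02, 1.98, 2.00]]   # ~1% run-to-run variation
choice = choose_modeler(runs)                     # -> "sparse"
```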


Noise-resilient performance modeling – synthetic evaluation

[Chart: median relative error (0–30%) of Adaptive vs. Regression at noise levels of 20%, 50%, and 100%; 2 parameters, evaluated at an unseen point, two ticks in each dimension]

Adapted. Original © 2021 IEEE


Noise-resilient performance modeling – case studies: results

Median relative error (Adaptive vs. Regression):

Kripke (Vulcan): 13.45% vs. 22.28%

FASTEST (SuperMUC): 16.23% vs. 69.79%

RELeARN: 7.12% vs. 7.12% (noise < 5%: only the regression modeler is used)

Adapted. Original © 2021 IEEE


Noise-resilient performance modeling – case studies: noise

Adapted. Original © 2021 IEEE


Optimized measurement point selection [Naumann et al., in preparation]

[Charts: the sparse selection strategy vs. a possibly better one?]


Optimized measurement point selection via Gaussian Process Regression (GPR) – introduction

From: Leclercq, Florent (2018). Bayesian optimization for likelihood-free cosmological inference. Physical Review D 98, 063511. Used with permission.


Optimized measurement point selection – Gaussian processes

Gaussian Process (GP): a series of normal distributions

GP(t) = N(m(t), c(t, t′))

with mean function m and covariance function c


Optimized measurement point selection – Gaussian processes

Idea: use the covariance function. High variance means high uncertainty, which marks promising measurement candidates. Measurement costs have to be considered as well, leading to a cost-benefit calculation.
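The cost-benefit idea can be sketched with scikit-learn's `GaussianProcessRegressor` (an assumed dependency, not necessarily what Extra-P uses): fit a GP to the points measured so far and pick the candidate whose predictive uncertainty is largest per unit cost:

```python
# Sketch: GP-based selection of the next measurement point.
# scikit-learn is an assumed dependency; the kernel choice and log2
# scaling are illustrative, not taken from the talk.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_point(measured_x, measured_t, candidates, cost):
    """Return the candidate with the best uncertainty-per-cost ratio."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  optimizer=None, normalize_y=True)
    gp.fit(np.log2(measured_x), measured_t)            # model measured points
    _, std = gp.predict(np.log2(candidates), return_std=True)
    benefit = std / np.array([cost(c) for c in candidates.ravel()])
    return candidates[np.argmax(benefit)]

x = np.array([[4.0], [8.0], [16.0]])                   # processes measured so far
t = np.array([1.0, 2.1, 4.3])                          # observed runtimes
cand = np.array([[32.0], [64.0]])                      # possible next configs
pick = next_point(x, t, cand, cost=float)              # cost proxy: core count
```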


Results – synthetic evaluation (3 parameters)

[Bar chart: % of accurate models (relative error < 20%, at an unseen point, 2 ticks in each direction; scale 86–100) for Sparse, GPR, and Hybrid at 5%, 7.5%, and 10% noise]

[Bar chart: measurements needed in addition to the minimal measurement set (scale 0–60) for Sparse, GPR, and Hybrid at 5%, 7.5%, and 10% noise]


Results

[Bar chart: % of budget used compared to the complete measurement set (scale 0–40) for Sparse, GPR, and Hybrid at 5%, 7.5%, and 10% noise]


Parameter selection [Copik et al., PPoPP’21]

• The more parameters, the more experiments

• Modeling parameters without performance impact is harmful

Taint analysis: attach taint labels to the program's input parameters to determine which parameter influences which function.


Selected papers

• Foundation (single model parameter): Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. SC13.

• MPI case study: Sergei Shudler, Yannick Berens, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations. IEEE TPDS, 30(8):1768–1785, 2019.

• Multiple model parameters: Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. IEEE Cluster 2016.

• Cost-effective sampling strategies: Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. IPDPS 2020.

• Noise resilience: Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. IPDPS 2021.

• Taint-based performance modeling: Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler: Extracting Clean Performance Models from Tainted Programs. PPoPP 2021.


Thank you!

Q&A
