
11/2/21 | Technical University of Darmstadt, Germany | Felix Wolf | 1

Felix Wolf, Technical University of Darmstadt | ModSim 2021

The price performance of performance models

[Title slide graphic: application and system. Photo: Alex Becker / TU Darmstadt]


Acknowledgement

TU Darmstadt

• Alexandru Calotoiu

• Alexander Geiß

• Alexander Graf

• Daniel Lorenz

• Benedikt Naumann

• Thorsten Reimann

• Marcus Ritter

• Sergei Shudler

ETH Zurich

• Alexandru Calotoiu

• Marcin Copik

• Tobias Grosser

• Torsten Hoefler

• Nicolas Wicki

LLNL

• David Beckingsale

• Christopher Earl

• Ian Karlin

• Martin Schulz


Scaling your code can harbor performance surprises*…

[Diagram: the ratio of communication to computation shifts as the code scales]

*Goldsmith et al., 2007


Performance model

[Plot: time in s (0–21) over processes (2⁹–2¹³), following 3·10⁻⁴ · p² + c]

Formula that expresses a relevant performance metric as a function of one or more execution parameters

Analytical (i.e., manual) creation is challenging for entire programs:

• Identify kernels – incomplete coverage

• Create models – laborious, difficult

t = f(p) = 3·10⁻⁴ · p² + c


Empirical performance modeling

Performance measurements with different execution parameters x1,...,xn

t1, t2, t3, …, tn-2, tn-1, tn

Machine learning

t = f(x1, …, xn)

Alternative metrics: FLOPs, data volume…
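As a minimal sketch of this idea (synthetic data and plain least squares, not Extra-P's actual machinery), the scaling exponent of noisy timings can be recovered by linear regression in log-log space:

```python
# Synthetic sketch: recover the exponent of t(p) = a * p^b from noisy
# timings by linear regression in log-log space. Data and noise level
# are made up; this is not Extra-P's algorithm.
import numpy as np

def fit_power_law(p, t):
    """Fit t = a * p^b; returns (a, b)."""
    X = np.vstack([np.ones_like(p, dtype=float), np.log(p)]).T
    (log_a, b), *_ = np.linalg.lstsq(X, np.log(t), rcond=None)
    return np.exp(log_a), b

rng = np.random.default_rng(0)
p = np.array([2.0 ** k for k in range(9, 14)])          # 512 ... 8192 processes
t = 3e-4 * p ** 2 * rng.normal(1.0, 0.02, size=p.size)  # noisy quadratic time
a, b = fit_power_law(p, t)                              # b should be close to 2
```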


Challenges

• Applications / system

• Run-to-run variation / noise

• Cost of the required experiments


How to deal with noisy data

• Introduce a prior into the learning process: an assumption about the probability distribution generating the data

• Computation, memory access, communication, I/O

[Diagram axes: time, effort]


Typical algorithmic complexities in HPC

Algorithm     | Computation | Communication
Samplesort    | t(p) ~ p²   | t(p) ~ p² log₂²(p)
Naïve N-body  | t(p) ~ p    | t(p) ~ p
FFT           | t(p) ~ c    | t(p) ~ log₂(p)
LU            | t(p) ~ c    | t(p) ~ c
…             | …           | …


Performance model normal form (PMNF)

Single parameter [Calotoiu et al., SC13]:

f(x) = Σ_{k=1}^{n} c_k · x^{i_k} · log₂^{j_k}(x)

Multiple parameters [Calotoiu et al., Cluster’16]:

f(x1, …, xm) = Σ_{k=1}^{n} c_k · Π_{l=1}^{m} x_l^{i_kl} · log₂^{j_kl}(x_l)

Workflow: parameter selection → search space configuration → linear regression + cross-validation → quality assurance, with heuristics to reduce the search space
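A toy version of this search illustrates the mechanics. The exponent sets, the restriction to one PMNF term plus a constant, and the noise-free data below are simplifying assumptions; Extra-P's real search space and quality assurance are far richer:

```python
# Toy single-parameter PMNF search: try each term p^i * log2(p)^j from a
# small assumed exponent set, fit coefficients by linear regression, and
# keep the hypothesis with the lowest leave-one-out cross-validation error.
import itertools
import numpy as np

EXP_I = [0, 0.5, 1, 2, 3]   # candidate polynomial exponents i_k
EXP_J = [0, 1, 2]           # candidate logarithmic exponents j_k

def design(p, i, j):
    # columns: constant c0 and one PMNF term p^i * log2(p)^j for c1
    return np.vstack([np.ones_like(p), p ** i * np.log2(p) ** j]).T

def loo_error(p, t, i, j):
    # leave-one-out cross-validation error of hypothesis (i, j)
    err = 0.0
    for k in range(p.size):
        mask = np.arange(p.size) != k
        coef, *_ = np.linalg.lstsq(design(p[mask], i, j), t[mask], rcond=None)
        pred = design(p[k:k + 1], i, j) @ coef
        err += float((pred[0] - t[k]) ** 2)
    return err

def fit_pmnf(p, t):
    hypotheses = [(i, j) for i, j in itertools.product(EXP_I, EXP_J)
                  if (i, j) != (0, 0)]
    best = min(hypotheses, key=lambda ij: loo_error(p, t, *ij))
    coef, *_ = np.linalg.lstsq(design(p, *best), t, rcond=None)
    return best, coef

p = np.array([2.0 ** k for k in range(9, 14)])   # 512 ... 8192 processes
t = 4.0 + 3e-4 * p ** 2                          # t = c + 3*10^-4 * p^2
(i, j), coef = fit_pmnf(p, t)                    # recovers i = 2, j = 0
```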


Extra-P 4.0

Available at: https://github.com/extra-p/extrap


MPI implementations [Shudler et al., IEEE TPDS 2019]

Allreduce [s] – Expectation: O(log p)

Juqueen:   Model O(log p),          R² 0.87, Match ✔
Juropa:    Model O(p^0.5),          R² 0.99, Match ~
Piz Daint: Model O(p^0.67 log p),   R² 0.99, Match ✘

Comm_dup [B] – Expectation: O(1)

Juqueen:   Model 2.2e5,        R² 1,    Match ✔
Juropa:    Model 256,          R² 1,    Match ✔
Piz Daint: Model 3770 + 18p,   R² 0.99, Match ✘


Kripke – example w/ multiple parameters

SweepSolver – main computation kernel

Expectation: performance depends on problem size, t ~ d · g

Actual model*: t = 5 + d · g + 0.005 · ∛p · d · g

MPI_Testany – main communication kernel: 3D wave-front communication pattern

Expectation: performance depends on the cubic root of the process count, t ~ ∛p

Actual model*: t = 7 + ∛p + 0.005 · ∛p · d · g

Kernels must wait on each other – a smaller compounded effect discovered

*Coefficients have been rounded for convenience


Experiments can be expensive

Need 5^m experiments, m = #parameters

[Plot: problem size per process (10–50) over processes (4–64); low memory at one extreme, high jitter at the other]


Multi-parameter modeling in Extra-P

Find the best single-parameter model for each parameter, then combine them in the most plausible way (+, *, none):

Time = c1 · n · log(n) · p

FLOPs = c2 · n · log(n)

Generation of candidate models and selection of best fit:

c1 + c2 · p
c1 + c2 · p²
c1 + c2 · log(p)
c1 + c2 · p · log(p)
c1 + c2 · p² · log(p)
c1 · log(p) + c2 · p
c1 · log(p) + c2 · p · log(p)
c1 · log(p) + c2 · p²
c1 · log(p) + c2 · p² · log(p)
c1 · p + c2 · p · log(p)
c1 · p + c2 · p²
c1 · p + c2 · p² · log(p)
c1 · p · log(p) + c2 · p²
c1 · p · log(p) + c2 · p² · log(p)
c1 · p² + c2 · p² · log(p)


How many data points do we really need?

[Plot: measurement grid of problem size per process (10–50) over processes (4–64)]


Learning cost-effective sampling strategies [Ritter et al., IPDPS’20]

A reinforcement learning agent selects parameter values; a function generator with a noise module produces synthetic measurements; Extra-P builds an empirical model; the evaluation compares the prediction with the ground truth and returns feedback to the agent.


Heuristic parameter-value selection strategy

1. Measure the minimum number of points required for modeling

2. Create a model using Extra-P

3. If cost < budget: gather one additional measurement (assumed to be the cheapest) and create a new model; repeat step 3

4. Otherwise: final model
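The loop above can be sketched as follows; the callback names (`cost`, `measure`, `remodel`) are hypothetical stand-ins for the corresponding Extra-P steps:

```python
# Sketch of the budget-constrained loop above. `cost`, `measure`, and
# `remodel` are hypothetical callbacks standing in for the real steps.
def model_within_budget(candidates, cost, measure, remodel, budget):
    """Greedily add the cheapest remaining configuration while it fits."""
    spent = 0.0
    model = remodel()                         # model from the minimal point set
    for cfg in sorted(candidates, key=cost):  # always try the cheapest next
        if spent + cost(cfg) > budget:
            break
        measure(cfg)                          # gather one additional point
        spent += cost(cfg)
        model = remodel()                     # create a new model
    return model

points = []                                   # measured configurations
cand = [(4, 50), (8, 40), (16, 30)]           # (processes, problem size)
cost = lambda cfg: cfg[0] * cfg[1]            # toy cost proxy: core-hours
model = model_within_budget(cand, cost, points.append,
                            lambda: len(points), budget=600)
# only the two cheapest configurations fit into the budget of 600
```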


Synthetic data evaluation

[Diagram: synthetic functions perturbed by ±5%, ±10%, ±15%, ±20% noise, split into training and evaluation sets]


Synthetic evaluation results

[Bar chart: percentage of accurate models (20–100%) for 1 parameter, 5% noise and 2 parameters, 5% noise; measurements used / percentage of cost: 5 / 100%, 9 / 12%, 11 / 13%, 15 / 17%, 25 / 100%; repetitions: 2, 4, 6]


Synthetic evaluation results

[Bar chart: percentage of accurate models for 3 parameters, 5% noise; measurements used / percentage of cost: 13 / 1.7%, 15 / 1.8%, 25 / 2.2%, 75 / 11%, 125 / 100%; repetitions: 2, 4, 6]


Synthetic evaluation results

[Bar chart: percentage of accurate models for 4 parameters, 5% noise; measurements used / percentage of cost: 17 / 0.1%, 18 / 0.12%, 25 / 0.17%, 125 / 1.2%, 250 / 4.2%, 625 / 100%; repetitions: 2, 4, 6]


Synthetic evaluation results

[Bar chart: percentage of accurate models for 4 parameters, 1% noise; measurements used / percentage of cost: 17 / 0.1%, 18 / 0.12%, 25 / 0.17%, 125 / 1.2%, 250 / 4.2%, 625 / 100%; repetitions: 2, 4, 6]


Case studies

Application | #Parameters | Extra points | Cost savings [%] | Prediction error [%]
FASTEST     | 2           | 0            | 70               | 2
Kripke      | 3           | 3            | 99               | 39
Relearn     | 2           | 0            | 85               | 11


Noise-resilient performance modeling [Ritter et al., IPDPS’21]

• Performance measurements are frequently affected by noise

• Regression struggles with increased amounts of noise – especially w/ more parameters

• Neural networks are resilient to noise – when noise is part of their training

[Illustration: original vs. reconstructed behavior for 1 and 3 parameters, separating 100% application behavior from noise. Adapted from: https://developer.nvidia.com/blog/ai-can-now-fix-your-grainy-photos-by-only-looking-at-grainy-photos/]


Noise-resilient adaptive modeling

DNNs are often better at guessing models in the presence of noise.

Performance measurements → estimate noise. If the noise is low, the classic (sparse) modeler produces a performance model; otherwise the DNN modeler (trained via transfer learning) does. The best of the candidate models is selected as the final model.

Adapted. Original © 2021 IEEE
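A minimal sketch of this routing decision, assuming noise is estimated as the mean coefficient of variation across repeated runs and using a hypothetical 5% threshold:

```python
# Sketch of the adaptive routing. The 5% threshold and the coefficient-of-
# variation noise estimate are assumptions for illustration.
import statistics

def estimate_noise(repetitions):
    """Mean coefficient of variation across repeated runs per configuration."""
    cvs = [statistics.stdev(r) / statistics.mean(r) for r in repetitions]
    return statistics.mean(cvs)

def choose_modeler(repetitions, threshold=0.05):
    # low noise -> classic sparse modeler, high noise -> DNN modeler
    return "sparse" if estimate_noise(repetitions) < threshold else "dnn"

runs = [[1.00, 1.01, 0.99], [2.02, 1.98, 2.00]]   # ~1% run-to-run variation
choice = choose_modeler(runs)                     # -> "sparse"
```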


Noise-resilient performance modeling – synthetic evaluation

[Chart: median relative error (0–30%) of Adaptive vs. Regression at noise levels of 20%, 50%, and 100%; 2 parameters, evaluated at an unseen point, two ticks in each dimension]

Adapted. Original © 2021 IEEE


Noise-resilient performance modeling – case studies: results

Median relative error (Adaptive vs. Regression):

Kripke (Vulcan): 13.45% vs. 22.28%

FASTEST (SuperMUC): 16.23% vs. 69.79%

RELeARN: 7.12% vs. 7.12% (noise < 5%: only the regression modeler is used)

Adapted. Original © 2021 IEEE


Noise-resilient performance modeling – case studies: noise

Adapted. Original © 2021 IEEE


Optimized measurement point selection [Naumann et al., in preparation]

[Charts: the sparse selection strategy vs. a possibly better one?]


Optimized measurement point selection via Gaussian Process Regression (GPR) – introduction

From: Leclercq, Florent (2018). Bayesian optimization for likelihood-free cosmological inference. Physical Review D 98, 063511. Used with permission.


Optimized measurement point selection – Gaussian processes

Gaussian Process (GP): a series of normal distributions

GP(t) = N(m(t), c(t, t′))

with mean function m and covariance function c


Optimized measurement point selection – Gaussian processes

Idea: use the covariance function. High variance means high uncertainty, which marks promising measurement candidates. Measurement costs have to be considered as well, leading to a cost-benefit calculation.
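The cost-benefit idea can be sketched with scikit-learn's `GaussianProcessRegressor` (an assumed dependency, not necessarily what Extra-P uses): fit a GP to the points measured so far and pick the candidate whose predictive uncertainty is largest per unit cost:

```python
# Sketch: GP-based selection of the next measurement point.
# scikit-learn is an assumed dependency; the kernel choice and log2
# scaling are illustrative, not taken from the talk.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_point(measured_x, measured_t, candidates, cost):
    """Return the candidate with the best uncertainty-per-cost ratio."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  optimizer=None, normalize_y=True)
    gp.fit(np.log2(measured_x), measured_t)            # model measured points
    _, std = gp.predict(np.log2(candidates), return_std=True)
    benefit = std / np.array([cost(c) for c in candidates.ravel()])
    return candidates[np.argmax(benefit)]

x = np.array([[4.0], [8.0], [16.0]])                   # processes measured so far
t = np.array([1.0, 2.1, 4.3])                          # observed runtimes
cand = np.array([[32.0], [64.0]])                      # possible next configs
pick = next_point(x, t, cand, cost=float)              # cost proxy: core count
```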


Results – synthetic evaluation (3 parameters)

[Bar chart: % of accurate models (relative error < 20%, at an unseen point, 2 ticks in each direction; scale 86–100) for Sparse, GPR, and Hybrid at 5%, 7.5%, and 10% noise]

[Bar chart: measurements needed in addition to the minimal measurement set (scale 0–60) for Sparse, GPR, and Hybrid at 5%, 7.5%, and 10% noise]


Results

[Bar chart: % of budget used compared to the complete measurement set (scale 0–40) for Sparse, GPR, and Hybrid at 5%, 7.5%, and 10% noise]


Parameter selection [Copik et al., PPoPP’21]

• The more parameters, the more experiments

• Modeling parameters without performance impact is harmful

Taint analysis: attach taint labels to the program's input parameters to determine which parameter influences which function.


Selected papers

• Foundation (single model parameter): Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. SC13.

• MPI case study: Sergei Shudler, Yannick Berens, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations. IEEE TPDS, 30(8):1768–1785, 2019.

• Multiple model parameters: Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. IEEE Cluster 2016.

• Cost-effective sampling strategies: Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. IPDPS 2020.

• Noise resilience: Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. IPDPS 2021.

• Taint-based performance modeling: Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler: Extracting Clean Performance Models from Tainted Programs. PPoPP 2021.


Thank you!

Q&A
