towards multi-objective, region-based auto-tuning for ...€¦ · auto-tuning for parallel programs...

19
Towards Multi-objective, Region-based Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman, Thomas Fahringer Research Center HPC / Distributed and Parallel Systems Group, University of Innsbruck

Upload: others

Post on 25-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Towards Multi-objective, Region-based Auto-tuning for Parallel Programs

Philipp GschwandtnerJuan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman, Thomas Fahringer

Research Center HPC / Distributed and Parallel Systems Group, University of Innsbruck

Page 2: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Why is it so Hard to Optimize Codes for Parallel Systems?

2

If you ran your code on different hardware how would you alter your code? I/O and process scheduling, cache sizes and policies, interconnect, …

Complexity, non-determinism, effort to predict program and system behavior dynamic reallocation of cores and memory, clock speed, external load and resource

contention, etc. operating system, queueing systems, software libraries, etc.

Modern system architectures are too complex for humans beings to manually find the best code transformation sequences for program optimization.

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 3: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

3

Intel Westmere AMD Barcelona

Impact of Parallelism on Loop Tiling of Matrix Multiply

100% >150%Relative Execution Time 100% >150%Relative Execution Time

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 4: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Search Space for Code Optimizations and Parallelization Strategies

4

code transformations loop tiling, scheduling, interchange, data transpose, data distribution, …

library settings MPI tuning parameters (several hundred)

compiler selection and flag settings optimization flags (gcc: approx. 200)

target machine configuration number of cores, multi-threading settings, frequencies & turbo boost, …

huge search space to explore for manual optimization use auto-tuning instead

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 5: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Automatic Optimization Landscape

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference5

Able to re-write your application? Use a domain-specific language! toolchain will/should/might take care of optimization for you facilitates performance portability and separation of concerns EU H2020 Project AllScale – www.allscale.eu

Unable to re-write, but you can manually annotate? manually mark interesting code regions, leave rest up to toolchain

Unable to manually annotate? use a compiler! fully automatic

Page 6: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

6

Insieme Source-to-source compiler C/C++, OpenMP, MPI, OpenCL, Cilk uniform intermediate representation analysis and transformation framework

(polyhedral model, pattern matching, etc.)

Insieme runtime system scheduling & runtime auto-tuning compiler-aided decision making

process

Insieme – Compiler and Runtime Research Platform

Fron

tend

Back

end

INSPIRE

Static Optimizer

IR Toolbox IRSMC/C++

OpenMPCilk

OpenCLMPI

and extensionsCompiler

Dyn. Optimizer

Scheduler

Monitoring

Exec. Engine

Runtime

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 7: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Insieme – Compiler and Runtime Research Platform

7

Foundation for research in parallel languages and extensions optimizing parallel applications

static, dynamic, and hybrid approaches search- and ML-based auto-tuning

constraint-based program analysis region-based multi-objective optimization

Further Information: https://www.insieme-compiler.org

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 8: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Insieme Auto-tuning Architecture

8

Input Code

Parallel Target Platform

Analyzer1

Optimizer

2

CodeRegions

ConfigurationsMeasure-

ments3

Backend

BestSolutions

4

Multi-Versioned

Code

5

Runtime System

DynamicSelection6

compile time runtime

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 9: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Multi-objective Optimization Using Pareto Optimality

9

(0,0) Objective 1

Objective 2 Configurations

Pareto Set

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 10: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

10

Our algorithm: RS-GDE3

Combination of metaheuristics

generalized differential evolution (conducts the actual iterative search)

search space reduction based on rough sets domain independent

Optimization Algorithm

generate & evaluate new points (GDE3)within bounds (RS-GDE3)

select new population

(GDE3)

compute new bounds

(RS)

Initial Population Final Populationimproved

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 11: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Objective Space and Problem Space

11

Optimize 3 objectives simultaneously runtime (measured) efficiency (cost x = 𝑥𝑥 × 𝑡𝑡𝑝𝑝(𝑥𝑥)) energy (measured e.g. via Intel RAPL)

Several tuning parameters 2D/3D loop tiling degree of parallelism (number of threads) clock frequency (DVFS)

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 12: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

12

Target architecture: 4x Intel Xeon Sandy Bridge with 32 cores

total available clock frequencies (1.2-2.6 GHz)

Tuning algorithms: hierarchical search (regular grid) random search RS-GDE3 (our algorithm)

Multiple codes with various kernels matrix multiply, stencil, n-body, multi-grid,

etc.

Quality of solution S hypervolume: V S ∈ [0, … 1] number of points: |𝑆𝑆| degree of dominance: |𝑆𝑆|′

Efficiency of algorithms number of evaluated configurations N time-to-solution energy-to-solution

V S , �S, �E for stochastic algorithms

Experimental Setup

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 13: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Performance Comparison

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference13

BenchmarkHierarchical Search Random Search RS-GDE3

N |S| |S|‘ V(S) N |S| |S|‘ V(S) N |S| |S|‘ V(S)

mm 18432 18 2% 0.00 15000 4.4 0% 0.33 956.2 23.4 98% 0.48

dsyrk 18432 21 5% 0.00 15000 2.2 11% 0.17 1149.6 24.8 98% 0.32

jacobi-2d 15876 31 78% 0.69 15000 17.2 5% 0.55 1243.6 29.8 75% 0.76

3d-stencil 15876 30 22% 0.75 15000 24.8 60% 0.61 981.4 28.2 77% 0.76

n-body 15876 26 0% 0.50 15000 30.0 17% 0.70 1801.4 29.6 87% 0.77

Page 14: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

14

RS-GDE3 evaluates less than 10-7 of all possible configurations and only 7 % of the configurations required by random search

Comparing the fastest solutions 70 % more energy efficient than

hierarchical, 45 % faster 24 % more energy efficient than

random search, 14 % faster The hypervolume also shows the higher

quality of the Pareto fronts computed by RS-GDE3

Highlights

0

0,2

0,4

0,6

0,8

1

Runtime Energy

RS-GDE3

Random

Hierachical

00,10,20,30,40,50,6

V(S)

RS-GDE3

Random

Hierarchical

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference

Page 15: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Region-aware Auto-tuning

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference15

Most existing auto-tuners global tuning finds fixed parameter values for the entire program assumes that fixed parameter values achieve best performance across all regions

Programs may consist of many code regions code region specific parameter values may yield best performance changing parameter values implies overhead region performance may be inter-dependent

Region based auto-tuning increases search space

Page 16: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Recombination Algorithm

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference16

R0 R1 R2 R3 R4 R5Config. 0

R0 R1 R2 R3 R4 R5Config. 1

R0 R1 R2 R3 R4 R5Config. 2

Population 0

R0 R1 R2 R3 R4 R5Config. 0

R0 R1 R2 R3 R4 R5Config. 1

R0 R1 R2 R3 R4 R5Config. 2

Population 1

R0 R1 R2 R3 R4 R5Config. 0

R0 R1 R2 R3 R4 R5Config. 1

R0 R1 R2 R3 R4 R5Config. 2

Population 2

differential evolution

recombination differential evolution

R0 R1 R2 R3 R4 R5Config. 0

R0 R1 R2 R3 R4 R5Config. 1

R0 R1 R2 R3 R4 R5Config. 2

Offspring 0

R0 R1 R2 R3 R4 R5Config. 0

R0 R1 R2 R3 R4 R5Config. 1

R0 R1 R2 R3 R4 R5Config. 2

Offspring 1

Page 17: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Region-based Hypervolume Comparison

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference17

00,10,20,30,40,50,60,70,80,9

1

mg hp bt

Random RS-GDE3 Region RS-GDE Recombination GPT

Page 18: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

Conclusion

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference18

motivated multi-objective auto-tuning with compilers discussed a suitable generic optimizer (RS-GDE3) for arbitrary numbers of

objectives demonstrated its effectiveness on tuning multiple parallel code regions for

three objectives (runtime, efficiency, energy) Region-aware auto-tuner code versions up to 7x faster, up to 10x less energy consuming, or up to 61x more efficient than the parallel, unoptimized version

Page 19: Towards Multi-objective, Region-based Auto-tuning for ...€¦ · Auto-tuning for Parallel Programs Philipp Gschwandtner Juan Durillo, Klaus Kofler, Herbert Jordan, Peter Thoman,

19

www.uibk.ac.at/fz-hpc https://www.insieme-compiler.org https://dps.uibk.ac.at

Ref: Juan J. Durillo, Philipp Gschwandtner, Klaus Kofler, Thomas Fahringer: Multi-objective region-aware optimization of parallel programs. J. of Parallel Computing, Vol. 83, p. 3-21, Elsevier, 2019.

16.09.2019Philipp Gschwandtner | Multi-objective Region-based Auto-tuning | ÖMG Conference