University of Houston Extending Global Optimizations in the OpenUH Compiler for OpenMP Open64 Workshop, CGO ‘08

Post on 18-Dec-2015


Page 1:

University of Houston

Extending Global Optimizations in the OpenUH Compiler for OpenMP

Open64 Workshop, CGO '08

Page 2: Goals

• Exploit the compiler analysis and optimizations for OpenMP programs

• Enable high level optimizations by taking OpenMP semantics into consideration

• Build a general framework for OpenMP compiler optimizations

Page 3: OpenUH Compiler based on Open64

[Figure: the OpenUH compilation flow, built on the Open64 compiler infrastructure. Components shown in the diagram:]

• Source code w/ OpenMP directives
• FRONTENDS (C/C++, Fortran 90, OpenMP)
• IPA (Inter Procedural Analyzer)
• OMP_PRELOWER (Preprocess OpenMP)
• LNO (Loop Nest Optimizer)
• LOWER_MP (Transformation of OpenMP)
• WOPT (global scalar optimizer)
• WHIRL2C & WHIRL2F (IR-to-source for non-Itanium)
• CG (code for IA-32, IA-64, Opteron)
• Source code with runtime library calls
• A native compiler
• Object files
• Linking
• A portable OpenMP runtime library
• Executables

Page 4: OpenUH Compiler based on Open64

[Figure: the same compiler infrastructure diagram as on Page 3.]

Page 5: Motivation

Compiler flags    -O3       -O3 -mp3
PRE-example       7.42      46.8
NAS FT            18.45     26.17
NAS UA            130.31    220.15

Why is the performance different?

Page 6: A PRE Example

Page 7: A PRE Example

[Figure: the PRE example shown twice; one version is annotated "copy propagation", the other "no copy propagation!"]
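As a reminder of what partial redundancy elimination (PRE) does, the following is a minimal hand-written sketch in C. It is not the example from the slide, whose code is not part of this transcript; the names `before_pre`, `after_pre` and `check_pre_equivalence` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical illustration of PRE; not the code shown on the slide. */

/* a + b is computed on the `cond` path and again after the join, so the
   second computation is redundant on that path only (partially redundant). */
int before_pre(int a, int b, int cond)
{
    int x = 0;
    if (cond)
        x = a + b;
    int y = a + b;      /* recomputed even when cond was true */
    return x * y;
}

/* What PRE conceptually produces: insert the computation on the path that
   lacked it, then reuse the temporary t after the join. */
int after_pre(int a, int b, int cond)
{
    int x = 0;
    int t;
    if (cond) {
        t = a + b;
        x = t;
    } else {
        t = a + b;      /* inserted computation */
    }
    int y = t;          /* the now fully redundant a + b is replaced by t */
    return x * y;
}

/* Self-check: both versions must agree on every path. */
void check_pre_equivalence(void)
{
    for (int cond = 0; cond <= 1; cond++)
        assert(before_pre(2, 3, cond) == after_pre(2, 3, cond));
}
```

Calling `check_pre_equivalence()` exercises both paths. The transformation relies on `a` and `b` not changing between the two computations, which is the kind of fact a compiler cannot take for granted for shared variables in an OpenMP region.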

Page 8: Parallel Data Flow Analysis

• Compilers need to further optimize OpenMP codes

• Most current OpenMP compilers perform optimizations after OpenMP constructs have been lowered to threaded codes
  – Have to restrict the traditional optimizations inside an OpenMP construct, not crossing synchronizations
    • Need to enable global optimizations
  – Missed opportunity to perform high-level OpenMP optimizations
    • Such as barrier elimination
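To make "barrier elimination" concrete, here is a small sketch in C of the effect such an optimization aims for, written by hand with the `nowait` clause. The kernel and the names `fill_with_barrier` and `fill_without_barrier` are hypothetical; the sketch assumes the three arrays do not overlap. It shows the result only, not how a compiler would prove the barrier redundant.

```c
#include <assert.h>

/* Hypothetical kernel: both loops read `in`; the first writes only `a`,
   the second writes only `b`. The arrays are assumed not to overlap. */
void fill_with_barrier(int n, const double *in, double *a, double *b)
{
    assert(n >= 0 && in && a && b);
    #pragma omp parallel
    {
        /* A worksharing loop ends with an implicit barrier. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * in[i];

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = in[i] + 1.0;
    }
}

/* Same computation, with the barrier between the loops removed. This is
   safe here because the second loop never touches `a`. */
void fill_without_barrier(int n, const double *in, double *a, double *b)
{
    assert(n >= 0 && in && a && b);
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * in[i];

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = in[i] + 1.0;
    }
}
```

Both functions produce the same arrays; the second simply lets threads that finish their share of the first loop start on the second one immediately. If the second loop read `a`, dropping the barrier would no longer be valid.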

Page 9: Solution Method

• Based on the OpenMP Memory Model
  – Relaxed Consistency
  – Flush is the key operation!

• Design a Parallel Control Flow Graph (PCFG) to represent an OpenMP program

Page 10:

A: an OpenMP section example

    a = 0; b = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            a = 1;
            #pragma omp flush(a, b)
            if (b == 0) {
                Critical1;
                a = 0;
                #pragma omp flush(a)
            } else
                else1;
        }

        #pragma omp section
        {
            b = 1;
            #pragma omp flush(a, b)
            if (a == 0) {
                Critical2;
                b = 0;
                #pragma omp flush(b)
            } else
                else2;
        }
    }

B: The corresponding PCFG

[Figure: the PCFG for example A. Node labels: Entry; a=1 and b=1; Flush(a,b) in each section; If (b == 0) with a=0 and Flush(a), or Else…; If (a == 0) with b=0 and Flush(b), or Else…; Barrier.]

Legend: super node, composite node, basic node, parallel edge, sequential edge, conflict edge.
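The node and edge kinds named in the PCFG legend could be represented roughly as follows. This is a sketch under stated assumptions, not OpenUH's data structures: the type and function names are hypothetical, shared variables are tracked as bitmasks, all sections are assumed to belong to the same `parallel sections` construct, and a conflict edge is assumed to join two nodes in different sections that touch the same shared variable with at least one write.

```c
#include <assert.h>
#include <stdbool.h>

/* Kinds taken from the legend on the slide; everything else is assumed. */
enum pcfg_node_kind { PCFG_BASIC, PCFG_COMPOSITE, PCFG_SUPER };
enum pcfg_edge_kind { PCFG_SEQUENTIAL, PCFG_PARALLEL, PCFG_CONFLICT };

/* One bit per shared variable of the example. */
#define VAR_A (1u << 0)
#define VAR_B (1u << 1)

struct pcfg_node {
    enum pcfg_node_kind kind;
    int section;        /* id of the enclosing omp section, -1 if none */
    unsigned reads;     /* shared variables read in this node */
    unsigned writes;    /* shared variables written in this node */
};

/* Assumed rule: a conflict edge is needed between two nodes in different
   concurrent sections when one writes a variable the other reads or writes. */
bool pcfg_needs_conflict_edge(const struct pcfg_node *x,
                              const struct pcfg_node *y)
{
    assert(x && y);
    if (x->section < 0 || y->section < 0 || x->section == y->section)
        return false;
    return (x->writes & (y->reads | y->writes)) != 0 ||
           (y->writes & x->reads) != 0;
}
```

For the example on this page, the node for `a = 1` in the first section and the node for `if (a == 0)` in the second section would be joined by a conflict edge under this rule, while `a = 1` and `b = 1` would not.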

Page 11:

[Figure: the global optimizer pipeline, shown twice: once built on the CFG and once built on the PCFG. Each takes an input WHIRL tree and produces an output WHIRL tree.]

With CFG:

• CFG: construct CFG; control flow analyses; flow free alias analysis
• HSSA: construct HSSA representation; points-to and pointer alias analysis; create CODEMAP representation
• IVR, CP, DCE: PREOPT SSA-based optimizations; "Flow free copy propagation"
• Emit: emit new WHIRL from optimized CFG/SSA

With PCFG:

• PCFG: construct CFG; control flow analyses; parallel control flow analysis; flow free alias analysis
• HSSA: construct HSSA representation; phi insertion for conflict edges; points-to and pointer alias analysis; create CODEMAP representation
• IVR, CP, DCE: SSA-based optimizations; "Flow free copy propagation"
• SSAPRE: perform PRE on OpenMP code
• Emit: emit new WHIRL from optimized CFG/SSA
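The following sketch illustrates why definitions arriving over conflict edges matter for copy and constant propagation, using the variable `a` from the Page 10 example. It is a toy model, not the OpenUH implementation: definitions are bits in a mask, the function names are hypothetical, and it assumes that a value may be propagated to a use only when exactly one definition reaches that use.

```c
#include <assert.h>
#include <stdbool.h>

/* Each bit stands for one definition of `a` in the Page 10 example. */
enum {
    DEF_A_INIT  = 1 << 0,   /* a = 0 before the parallel sections */
    DEF_A_SEC1  = 1 << 1,   /* a = 1 at the start of the first section */
    DEF_A_CRIT1 = 1 << 2    /* a = 0 inside the first section's if-branch */
};

/* Definitions reaching a use: those arriving along sequential edges plus,
   when conflict edges are modelled, those made by concurrent nodes. */
unsigned reaching_defs(unsigned sequential, unsigned conflict,
                       bool use_conflict_edges)
{
    return use_conflict_edges ? (sequential | conflict) : sequential;
}

/* Assumed rule: propagation is allowed only if exactly one definition
   reaches the use (exactly one bit set). */
bool single_def_reaches(unsigned defs)
{
    return defs != 0 && (defs & (defs - 1)) == 0;
}

/* The use of `a` in `if (a == 0)` in the second section. */
void check_use_in_second_section(void)
{
    unsigned seq = DEF_A_INIT;
    unsigned conf = DEF_A_SEC1 | DEF_A_CRIT1;

    /* Sequential edges only: a looks like the constant 0, so the test
       would be folded to true, which is wrong. */
    assert(single_def_reaches(reaching_defs(seq, conf, false)));

    /* With conflict edges: three definitions merge, so no propagation. */
    assert(!single_def_reaches(reaching_defs(seq, conf, true)));
}
```

In SSA terms, the merge of several definitions at the use is what a phi function records, which is one way to read "phi insertion for conflict edges" in the pipeline above.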

Page 12: Conclusion

• Implementation in the OpenUH compiler is in progress

• Improve the scalability of OpenMP programs

• A framework for conducting more aggressive optimizations for Cluster OpenMP

• Can be used in conjunction with data race detection tools