meta optimization improving compiler heuristics with machine learning mark stephenson, una-may...

Meta OptimizationImproving Compiler Heuristics

with Machine Learning

Mark Stephenson, Una-May O’Reilly,Martin Martin, and Saman AmarasingheMIT Computer Architecture Group

Presented by Utku Aydonat

Motivation

• Compiler writers are faced with many challenges:– Many compiler problems are NP-hard– Modern architectures are inextricably complex

• Simple models can’t capture architecture intricacies

– Micro-architectures change quickly

Motivation

• Heuristics alleviate complexity woes– Find good approximate solutions for a large

class of applications– Find solutions quickly

• Unfortunately…– They require a lot of trial-and-error tweaking to

achieve suitable performance– Fine tuning is a tedious process

Priority Functions

• A single priority or cost function often dictates the efficacy of a heuristic

• Priority Function: A function of the factors that affect a given problem

• Priority functions rank the options available to a compiler heuristic

Priority Functions

• Graph coloring register allocation (selecting nodes to spill)

• List scheduling (identifying instructions in worklist to schedule first)

• Hyperblock formation (selecting paths to include)

• Data Prefetching (inserting prefetch instructions)

List Scheduling

Machine Learning

• They propose using machine learning techniques to automatically search the priority function space– Feedback directed optimization– Find a function that works well for a broad

range of applications– Needs to be applied only once

Generic Programming

• Modeled after Darwinism (Survival of the Fittest).• Parse trees of operators and operands describe the potential

priority functions.• A population is a collection of parse trees for one

generation.• After testing, several members of the population are

selected for reproduction via crossover, which swaps a random node from each of 2 parse trees.

• Other parse trees are selected for mutation, in which a random node is replaced by a random expression.

Generic Programming

Genetic ProgrammingCreate initial population(initial solutions)

Evaluation

Selection

END

Generation of variants(mutation and crossover)

Generations < Limit?

Generic Programming

• The system selects the smaller of several expressions that are equally fit so that the parse trees do not grow exponentially.

• Tournament selection is used repeatedly to select parse trees for crossover.– Choose N expressions at random from the population

and select the one with the highest fitness.

• Dynamic subset selection (DSS) is used to reduce fitness evaluations to achieve suitable solution.

Meta Optimization

• Just as with Natural Selection, the fittest individuals are more likely to survive and reproduce.

• Expressions creating fastest code are the fittest.

• Create 399 random expressions based on parameters.

• It also seeded with the compiler writer’s best guesses

Meta Optimization

• Find predictable regions of control flow

• Prioritize paths based on several characteristics– The priority function we want to optimize

• Add paths to hyperblock in priority order

Case Study I: Hyperblock Formation

Hyperblock Formation

Case Study I: IMPACT’s Function

free hazard is if :1

hazard has if :25.0

i

ii path

pathhazard

jNj

ii heightdep

heightdepratiodep

_max

__

1

)__1.2(_ iiiii ratioopratiodephazardratioexecpriority

jNj

ii heightop

heightopratioop

_max

__

1

Favor frequentlyExecuted paths Favor short paths

Favor parallel pathsPenalize paths with hazards

Hyperblock Formation

• What are the important characteristic of a hyperblock formation priority function?

• IMPACT uses four characteristics

• Extract all the characteristics you can think of and have a machine learning algorithm find the priority function

Hyperblock Formationx1 Maximum ops over paths x2 Dependence height

x3 Number of paths x4 Number of operations

x5 Does path have subroutine calls? x6 Number of branches

x7 Does path have unsafe calls? x8 Path execution ratio

x9 Does path have pointer derefs? x10 Average ops executed in path

x11 Issue width of processor x12 Average predictability of branches in path

… xN Predictability product of branches in path

)function( 1 Ni xxpriority

)__1.2(_ iiiii ratioopratiodephazardratioexecpriority

Hyperblock ResultsCompiler Specialization

1.5

4

1.2

3

0

0.5

1

1.5

2

2.5

3

3.51

29

.co

mp

ress

g7

21

en

cod

e

g7

21

de

cod

e

hu

ff_

de

c

hu

ff_

en

c

raw

cau

dio

raw

da

ud

io

toa

st

mp

eg

2d

ec

Ave

rag

e

Sp

ee

du

p

Train data set Alternate data set

Hyperblock ResultsA General Purpose Priority Function

1.44

1.25

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5de

codr

le4

codr

le4

g721

deco

de

g721

enco

de

raw

daud

io

raw

caud

io

toas

t

mpe

g2de

c

124.

m88

ksim

129.

com

pres

s

huff

_enc

huff

_dec

Ave

rage

Sp

ee

du

p


Cross ValidationTesting General Purpose Applicability

1.09

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6u

ne

pic

djp

eg

rast

a

02

3.e

qn

tott

13

2.ij

pe

g

05

2.a

lvin

n

14

7.v

ort

ex

08

5.c

c1 art

13

0.li

osd

em

o

mip

ma

p

Ave

rag

e

Sp

ee

du

p

Case Study II: Register AllocationA General Purpose Priority Function

1.03

1.03

0.85

0.9

0.95

1

1.05

1.1

1.151

29

.co

mp

ress

g7

21

de

cod

e

g7

21

en

cod

e

hu

ff_e

nc

hu

ff_d

ec

raw

cau

dio

raw

da

ud

io

mp

eg

2d

ec

Ave

rag

e

Sp

ee

du

p


1.02 1.

03

0.9

0.95

1

1.05

1.1

1.15

1.2

de

cod

rle

4

cod

rle

4

12

4.m

88

ksim

un

ep

ic

djp

eg

02

3.e

qn

tott

13

2.ij

pe

g

14

7.v

ort

ex

08

5.c

c1

13

0.li

Ave

rag

e

Sp

ee

du

p

Speedup (32-Regs) Speedup (64-Regs)

Register Allocation ResultsCross Validation

Case Study III: PrefetchingA General Purpose Priority Function

Prefecthing ResultsCross Validation

Conclusion

• Machine learning techniques can identify effective priority functions

• ‘Proof of concept’ by evolving three well known priority functions

• Human cycles v. computer cycles

My Conclusions

• Heuristics to improve heuristics– How to choose population size, mutation rate,

tournament size?

• Does it guarantee better results?• Requires a lot of experiments for the results to

converge. – Do we have that opportunity?

• It is very effective for optimizing a specific application space.

Thank you!

(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)

(mul 0.6727 num_paths) (mul 1.1609

(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean

0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean

num_branches_max) num_paths)))))))

GP Hyperblock SolutionsGeneral Purpose

Intron that doesn’t affect solution







Favor paths that don’t have pointer dereferences







Favor highly parallel (fat) paths







If a path calls a subroutine that may have side effects, penalize it

Case Study I: IMPACT’s Algorithm

A

B C

E

F

G

4k 24k

4k 22k 2k

2k 25

28k

28k

D

10

10

Path exec haz ops dep pr

A-B-D-F-G ~0 1.0 13 4 ~0

A-B-F-G 0.14 1.0 10 4 0.21

A-C-F-G 0.79 1.0 9 2 1.44

A-C-E-F-G 0.07 0.25 13 5 0.02

A-C-E-G ~0 0.25 11 3 ~0

meta optimization improving compiler heuristics with machine learning mark stephenson, una-may...

Documents

random expressions

parse trees of operators

collection of parse

cost function

random node

potential priority functions

compiler problems

n expressions