meta optimization improving compiler heuristics with machine learning mark stephenson, una-may...
TRANSCRIPT
Meta OptimizationImproving Compiler Heuristics
with Machine Learning
Mark Stephenson, Una-May O’Reilly,Martin Martin, and Saman AmarasingheMIT Computer Architecture Group
Presented by Utku Aydonat
Motivation
• Compiler writers are faced with many challenges:– Many compiler problems are NP-hard– Modern architectures are inextricably complex
• Simple models can’t capture architecture intricacies
– Micro-architectures change quickly
Motivation
• Heuristics alleviate complexity woes– Find good approximate solutions for a large
class of applications– Find solutions quickly
• Unfortunately…– They require a lot of trial-and-error tweaking to
achieve suitable performance– Fine tuning is a tedious process
Priority Functions
• A single priority or cost function often dictates the efficacy of a heuristic
• Priority Function: A function of the factors that affect a given problem
• Priority functions rank the options available to a compiler heuristic
Priority Functions
• Graph coloring register allocation (selecting nodes to spill)
• List scheduling (identifying instructions in worklist to schedule first)
• Hyperblock formation (selecting paths to include)
• Data Prefetching (inserting prefetch instructions)
List Scheduling
Machine Learning
• They propose using machine learning techniques to automatically search the priority function space– Feedback directed optimization– Find a function that works well for a broad
range of applications– Needs to be applied only once
Generic Programming
• Modeled after Darwinism (Survival of the Fittest).• Parse trees of operators and operands describe the potential
priority functions.• A population is a collection of parse trees for one
generation.• After testing, several members of the population are
selected for reproduction via crossover, which swaps a random node from each of 2 parse trees.
• Other parse trees are selected for mutation, in which a random node is replaced by a random expression.
Generic Programming
Genetic ProgrammingCreate initial population(initial solutions)
Evaluation
Selection
END
Generation of variants(mutation and crossover)
Generations < Limit?
Generic Programming
• The system selects the smaller of several expressions that are equally fit so that the parse trees do not grow exponentially.
• Tournament selection is used repeatedly to select parse trees for crossover.– Choose N expressions at random from the population
and select the one with the highest fitness.
• Dynamic subset selection (DSS) is used to reduce fitness evaluations to achieve suitable solution.
Meta Optimization
• Just as with Natural Selection, the fittest individuals are more likely to survive and reproduce.
• Expressions creating fastest code are the fittest.
• Create 399 random expressions based on parameters.
• It also seeded with the compiler writer’s best guesses
Meta Optimization
• Find predictable regions of control flow
• Prioritize paths based on several characteristics– The priority function we want to optimize
• Add paths to hyperblock in priority order
Case Study I: Hyperblock Formation
Hyperblock Formation
Case Study I: IMPACT’s Function
free hazard is if :1
hazard has if :25.0
i
ii path
pathhazard
jNj
ii heightdep
heightdepratiodep
_max
__
1
)__1.2(_ iiiii ratioopratiodephazardratioexecpriority
jNj
ii heightop
heightopratioop
_max
__
1
Favor frequentlyExecuted paths Favor short paths
Favor parallel pathsPenalize paths with hazards
Hyperblock Formation
• What are the important characteristic of a hyperblock formation priority function?
• IMPACT uses four characteristics
• Extract all the characteristics you can think of and have a machine learning algorithm find the priority function
Hyperblock Formationx1 Maximum ops over paths x2 Dependence height
x3 Number of paths x4 Number of operations
x5 Does path have subroutine calls? x6 Number of branches
x7 Does path have unsafe calls? x8 Path execution ratio
x9 Does path have pointer derefs? x10 Average ops executed in path
x11 Issue width of processor x12 Average predictability of branches in path
… xN Predictability product of branches in path
)function( 1 Ni xxpriority
)__1.2(_ iiiii ratioopratiodephazardratioexecpriority
Hyperblock ResultsCompiler Specialization
1.5
4
1.2
3
0
0.5
1
1.5
2
2.5
3
3.51
29
.co
mp
ress
g7
21
en
cod
e
g7
21
de
cod
e
hu
ff_
de
c
hu
ff_
en
c
raw
cau
dio
raw
da
ud
io
toa
st
mp
eg
2d
ec
Ave
rag
e
Sp
ee
du
p
Train data set Alternate data set
Hyperblock ResultsA General Purpose Priority Function
1.44
1.25
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5de
codr
le4
codr
le4
g721
deco
de
g721
enco
de
raw
daud
io
raw
caud
io
toas
t
mpe
g2de
c
124.
m88
ksim
129.
com
pres
s
huff
_enc
huff
_dec
Ave
rage
Sp
ee
du
p
Train data set Alternate data set
Cross ValidationTesting General Purpose Applicability
1.09
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6u
ne
pic
djp
eg
rast
a
02
3.e
qn
tott
13
2.ij
pe
g
05
2.a
lvin
n
14
7.v
ort
ex
08
5.c
c1 art
13
0.li
osd
em
o
mip
ma
p
Ave
rag
e
Sp
ee
du
p
Case Study II: Register AllocationA General Purpose Priority Function
1.03
1.03
0.85
0.9
0.95
1
1.05
1.1
1.151
29
.co
mp
ress
g7
21
de
cod
e
g7
21
en
cod
e
hu
ff_e
nc
hu
ff_d
ec
raw
cau
dio
raw
da
ud
io
mp
eg
2d
ec
Ave
rag
e
Sp
ee
du
p
Train data set Alternate data set
1.02 1.
03
0.9
0.95
1
1.05
1.1
1.15
1.2
de
cod
rle
4
cod
rle
4
12
4.m
88
ksim
un
ep
ic
djp
eg
02
3.e
qn
tott
13
2.ij
pe
g
14
7.v
ort
ex
08
5.c
c1
13
0.li
Ave
rag
e
Sp
ee
du
p
Speedup (32-Regs) Speedup (64-Regs)
Register Allocation ResultsCross Validation
Case Study III: PrefetchingA General Purpose Priority Function
Prefecthing ResultsCross Validation
Conclusion
• Machine learning techniques can identify effective priority functions
• ‘Proof of concept’ by evolving three well known priority functions
• Human cycles v. computer cycles
My Conclusions
• Heuristics to improve heuristics– How to choose population size, mutation rate,
tournament size?
• Does it guarantee better results?• Requires a lot of experiments for the results to
converge. – Do we have that opportunity?
• It is very effective for optimizing a specific application space.
Thank you!
(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)
(mul 0.6727 num_paths) (mul 1.1609
(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean
0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean
num_branches_max) num_paths)))))))
GP Hyperblock SolutionsGeneral Purpose
Intron that doesn’t affect solution
(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)
(mul 0.6727 num_paths) (mul 1.1609
(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean
0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean
num_branches_max) num_paths)))))))
GP Hyperblock SolutionsGeneral Purpose
Favor paths that don’t have pointer dereferences
GP Hyperblock SolutionsGeneral Purpose
(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)
(mul 0.6727 num_paths) (mul 1.1609
(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean
0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean
num_branches_max) num_paths)))))))
Favor highly parallel (fat) paths
GP Hyperblock SolutionsGeneral Purpose
(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)
(mul 0.6727 num_paths) (mul 1.1609
(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean
0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean
num_branches_max) num_paths)))))))
If a path calls a subroutine that may have side effects, penalize it
Case Study I: IMPACT’s Algorithm
A
B C
E
F
G
4k 24k
4k 22k 2k
2k 25
28k
28k
D
10
10
Path exec haz ops dep pr
A-B-D-F-G ~0 1.0 13 4 ~0
A-B-F-G 0.14 1.0 10 4 0.21
A-C-F-G 0.79 1.0 9 2 1.44
A-C-E-F-G 0.07 0.25 13 5 0.02
A-C-E-G ~0 0.25 11 3 ~0
Case Study I: IMPACT’s Algorithm
A
B C
E
F
G
4k 24k
4k 22k 2k
2k 25
28k
28k
D
10
10
Path exec haz ops dep pr
A-B-D-F-G ~0 1.0 13 4 ~0
A-B-F-G 0.14 1.0 10 4 0.21
A-C-F-G 0.79 1.0 9 2 1.44
A-C-E-F-G 0.07 0.25 13 5 0.02
A-C-E-G ~0 0.25 11 3 ~0